All the software tools and databases are freely available.

DrugRepo is a computational pipeline to repurpose drugs for new indications. The pipeline comprises several steps: compound-target data analysis, structural analysis, gene-disease relationship analysis and pathway analysis. It can repurpose ~0.8 million compounds across 606 diseases (including various cancers, cardiovascular and kidney diseases).



Citation: Wang et al. DrugRepo: a novel approach to repurposing drugs based on chemical and genomic features. Sci Rep. 2022. 12(1):21116. doi: 10.1038/s41598-022-24980-2


ENDS is an online tool for Epistemic Nonparametric Drug-response Scoring. It presents a class of non-parametric models for the curve fitting and scoring of drug dose–responses. To allow a more objective representation of drug sensitivity, these epistemic models are free of the parametric assumptions attached to the linear fit and allow the parallel computation of indices such as the half-maximal inhibitory concentration and the area under the curve. Specifically, three non-parametric models (spline (npS), monotonic and Bayesian) and the parametric logistic model are implemented. Other indices pertinent to the npS, including the maximum effective dose and the drug–response span gradient, are also provided to facilitate the interpretation of the fit.
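To make the idea of a non-parametric fit with derived indices concrete, here is a minimal sketch, not ENDS's own implementation. It assumes a monotone (non-increasing) fit obtained by the pool-adjacent-violators algorithm, a half-maximal concentration read off by linear interpolation on the log10 dose scale, and an area under the curve from the trapezoid rule. The function names and the example doses and viabilities are hypothetical.

```python
import math

def pava_nonincreasing(y):
    """Least-squares non-increasing fit to y (equal weights) by pooling adjacent violators."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge while an earlier block's mean is below the later block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit

def ic50_log_interp(doses, response, level=50.0):
    """Dose at which the fitted response crosses `level`, interpolated on log10(dose)."""
    logd = [math.log10(d) for d in doses]
    for i in range(len(logd) - 1):
        hi, lo = response[i], response[i + 1]
        if hi >= level >= lo and hi != lo:
            t = (hi - level) / (hi - lo)
            return 10 ** (logd[i] + t * (logd[i + 1] - logd[i]))
    return None  # the curve never crosses the level in the tested range

def auc_trapezoid(doses, response):
    """Trapezoid-rule area under the curve on log10(dose), divided by the dose range."""
    logd = [math.log10(d) for d in doses]
    area = sum((logd[i + 1] - logd[i]) * (response[i] + response[i + 1]) / 2
               for i in range(len(logd) - 1))
    return area / (logd[-1] - logd[0])

# Hypothetical example: percent viability at five doses (nM).
doses = [1, 10, 100, 1000, 10000]
viability = [98, 102, 70, 30, 12]
fitted = pava_nonincreasing(viability)
print(fitted, ic50_log_interp(doses, fitted), auc_trapezoid(doses, fitted))
```

The monotone fit only pools the first two points, where the raw viability rises slightly with dose; both indices are then computed from the same fitted curve, which is the "parallel indexing" the description refers to.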


Source code:


[1] Amiryousefi, A.; Williams, B.; Jafari, M.; Tang, J. The ENDS of Assumptions; an Online Tool for the Epistemic Nonparametric Drug-Response Scoring. Bioinformatics 2022, btac217.



drda: An R package for dose-response data analysis
Analysis of dose-response data is an important step in many scientific disciplines, including but not limited to pharmacology, toxicology, and epidemiology. The R package drda is designed to facilitate the analysis of dose-response data by implementing efficient and accurate functions with a familiar interface. With drda, it is possible to fit models by the method of least squares, perform goodness of fit tests, and conduct model selection. Compared to other similar packages, drda provides, in general, more accurate estimates in the least-squares sense. This result is achieved by a smart choice of the starting point in the optimization algorithm and by implementing the Newton method with a trust region with analytical gradients and Hessian matrices. In this article, drda is presented through the description of its methodological components and examples of its user-friendly functions. Performance is finally evaluated using a real, large-scale drug sensitivity screening dataset.
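The least-squares objective and its analytical gradient can be sketched as follows. This is an illustration in Python rather than drda's R code, and the four-parameter logistic parameterization on the log-dose scale (`alpha`, `delta`, `eta`, `phi`) is an assumption made here, not something stated in the text. The example data are hypothetical.

```python
import math

def logistic4(x, alpha, delta, eta, phi):
    """Four-parameter logistic curve evaluated at log-dose x."""
    return alpha + delta / (1.0 + math.exp(-eta * (x - phi)))

def rss_and_gradient(theta, xs, ys):
    """Residual sum of squares and its analytical gradient w.r.t. (alpha, delta, eta, phi)."""
    alpha, delta, eta, phi = theta
    rss = 0.0
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        u = 1.0 / (1.0 + math.exp(-eta * (x - phi)))  # sigmoid part of the curve
        r = y - (alpha + delta * u)                   # residual
        du = u * (1.0 - u)                            # derivative of the sigmoid
        # Partial derivatives of the curve w.r.t. alpha, delta, eta, phi.
        partials = (1.0, u, delta * du * (x - phi), -delta * du * eta)
        rss += r * r
        for k in range(4):
            grad[k] -= 2.0 * r * partials[k]
    return rss, grad

# Hypothetical data: log10 doses and normalized responses.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.98, 0.90, 0.60, 0.25, 0.12]
theta0 = (1.0, -0.9, 1.5, 0.2)
print(rss_and_gradient(theta0, xs, ys))
```

An optimizer that uses exact derivatives, such as a trust-region Newton method, would call a function like `rss_and_gradient` at every iteration instead of approximating the gradient numerically.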

R package: 

[1] Malyutina et al. drda: An R package for dose-response data analysis. Journal of Statistical Software. 2022. In press. 



Minimal information for Chemosensitivity assays (MICHA): A next-generation pipeline to enable the FAIRification of drug screening experiments

Chemosensitivity assays are commonly used for preclinical drug discovery and clinical trial optimization. However, data from independent assays are often discordant, largely due to uncharacterized variation in the experimental materials and protocols. We report here the launch of MICHA (Minimal Information for Chemosensitivity Assays), accessed via a web application. Distinguished from existing efforts that often lack support from data integration tools, MICHA can automatically extract publicly available information to facilitate the assay annotation, including: 1) compounds, 2) samples, 3) reagents, and 4) data processing methods. For example, MICHA provides an integrative web server and database to obtain compound annotation including chemical structures, targets, and disease indications. In addition, the annotation of cell line samples, assay protocols and literature references can be greatly eased by retrieving manually curated catalogues. Once the annotation is complete, MICHA can export a report that conforms to the FAIR principle (Findable, Accessible, Interoperable and Reusable) of drug screening studies. To consolidate the utility of MICHA, we provide FAIRified protocols from five major cancer drug screening studies, as well as six recently conducted COVID-19 studies. With the MICHA webserver and database, we envisage a wider adoption of a community-driven effort to improve the open access of drug sensitivity assays.

Web application:

[1] Tanoli et al. Minimal information for Chemosensitivity assays (MICHA): A next-generation pipeline to enable the FAIRification of drug screening experiments. Brief Bioinform. 2021:bbab350. doi: 10.1093/bib/bbab350.



DrugComb - an integrative cancer drug combination data portal

Drug combination therapy has the potential to enhance efficacy, reduce dose-dependent toxicity and prevent the emergence of drug resistance. However, discovery of synergistic and effective drug combinations has been a laborious and often serendipitous process. In recent years, identification of combination therapies has been accelerated due to the advances in high-throughput drug screening, but informatics approaches for systems-level data management and analysis are needed. To contribute toward this goal, we created an open-access data portal called DrugComb, where the results of drug combination screening studies are accumulated, standardized and harmonized. Through the data portal, we provide a web server to analyze and visualize users' own drug combination screening data. Users can also participate in a crowdsourcing data curation effort by depositing their data at DrugComb. To initiate the data repository, we collected 437 932 drug combinations tested on a variety of cancer cell lines. We showed that linear regression approaches, when considering chemical fingerprints as predictors, have the potential to achieve high accuracy in predicting the sensitivity of drug combinations. All the data and informatics tools are freely available in DrugComb to enable a more efficient utilization of data resources for future drug combination discovery.


[1] Nucleic Acids Res. 2021 Jun 1; gkab438. doi: 10.1093/nar/gkab438 

[2] Nucleic Acids Res. 2019 Jul 2;47(W1):W43-W51. doi: 10.1093/nar/gkz337

Database link:


SynergyFinder Plus (SynergyFinder+)

Combinatorial therapies have been recently proposed for improving anticancer treatment efficacy. The SynergyFinder R package, developed in our group, is a software tool to analyze pre-clinical drug combination datasets. We report the major updates of the R package to improve the interpretation and annotation of drug combination screening results. Compared to the existing implementations, the novelty of the updated SynergyFinder R package consists of 1) extending to higher-order drug combination data analysis and implementing dimension reduction techniques for visualizing the synergy landscape for an unlimited number of drugs in a combination; 2) statistical analysis of drug combination synergy and sensitivity with confidence intervals and p-values; 3) incorporating a synergy barometer to harmonize multiple synergy scoring methods into a consensus metric of synergy; 4) evaluating drug combination synergy and sensitivity simultaneously to provide an unbiased interpretation of the clinical potential. Furthermore, we provide the annotation of drugs and cell lines that are tested in an experiment, including their chemical information, targets and signaling network information. These annotations should improve the interpretation of the mechanisms of action of drug combinations. To facilitate the use of the R package by the drug discovery community, we also provide a web server with a user-friendly interface that enables a more flexible and versatile analysis of drug combination data.
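The description above does not name the synergy scoring methods involved. As an illustration only, the sketch below uses Bliss independence, one widely used reference model, and extends it to more than two drugs. It is written in Python, is not SynergyFinder's implementation, and uses hypothetical function names and inhibition values.

```python
def bliss_expected(effects):
    """Expected combined fractional inhibition if the drugs act independently.

    `effects` holds single-drug fractional inhibitions in [0, 1];
    the expectation is 1 - prod(1 - e_i), which works for any number of drugs.
    """
    surviving = 1.0
    for e in effects:
        surviving *= (1.0 - e)
    return 1.0 - surviving

def bliss_excess(effects, observed):
    """Observed minus expected inhibition; positive values suggest synergy."""
    return observed - bliss_expected(effects)

# Hypothetical two-drug and three-drug examples.
print(bliss_expected([0.3, 0.4]))           # pairwise expectation
print(bliss_excess([0.3, 0.4], 0.75))       # excess over the expectation
print(bliss_expected([0.3, 0.4, 0.5]))      # higher-order expectation
```

A consensus metric of the kind described would combine scores from several such reference models; this sketch covers only a single model.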

Web applications:


Web application development site: 


R package:

R package development site:  


[1] Genomics, Proteomics & Bioinformatics. 2022. doi: 
[2] Bioinformatics. 2017 Aug 1;33(15):2413-2415. doi: 10.1093/bioinformatics/btx162.
[3] Comput Struct Biotechnol J. 2015 Sep 25;13:504-13. doi: 10.1016/j.csbj.2015.09.001

SynergyFinder Plus: towards a better interpretation and annotation of drug combination screening datasets


FAIRification of drug target interaction data

Knowledge of the full target space of bioactive substances, approved and investigational drugs as well as chemical probes, provides important insights into therapeutic potential and possible adverse effects. The existing compound-target bioactivity data resources are often incomparable due to non-standardized and heterogeneous assay types and variability in endpoint measurements. To extract higher value from the existing and future compound target-profiling data, we implemented an open-data web platform, named Drug Target Commons (DTC), which features tools for crowd-sourced compound-target bioactivity data annotation, standardization, curation, and intra-resource integration. We demonstrate the unique value of DTC with several examples related to both drug discovery and drug repurposing applications and invite researchers to join this community effort to increase the reuse and extension of compound bioactivity data.


[1] Cell Chem Biol. 2018 Feb 15;25(2):224-229.e2. doi: 10.1016/j.chembiol.2017.11.009

[2] Database (Oxford). 2018 Jan 1;2018:1-13. doi: 10.1093/database/bay083

[3] Brief Bioinform. 2021 Mar 22;22(2):1656-1678. doi: 10.1093/bib/bbaa003

Drug Target Commons: A Community Effort to Build a Consensus Knowledge Base for Drug-Target Interactions


We carried out a systematic evaluation of target selectivity profiles across three recent large-scale biochemical assays of kinase inhibitors and further compared these standardized bioactivity assays with data reported in the widely used databases ChEMBL and STITCH. Our comparative evaluation revealed relative benefits and potential limitations among the bioactivity types, as well as pinpointed biases in the database curation processes. Ignoring such issues in data heterogeneity and representation may lead to biased modeling of drugs' polypharmacological effects as well as to unrealistic evaluation of computational strategies for the prediction of drug-target interaction networks. Toward making use of the complementary information captured by the various bioactivity types, including IC50, K(i), and K(d), we also introduce a model-based integration approach, termed KIBA, and demonstrate here how it can be used to classify kinase inhibitor targets and to pinpoint potential errors in database-reported drug-target interactions. An integrated drug-target bioactivity matrix across 52,498 chemical compounds and 467 kinase targets, including a total of 246,088 KIBA scores, has been made freely available.
Citation: Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis.  J Chem Inf Model. 2014 Mar 24;54(3):735-43. doi: 10.1021/ci400709d. 

Download the dataset:



Target inhibition network analysis using Minimization and Maximization Averaging

A recent trend in drug development is to identify drug combinations or multi-target agents that effectively modify multiple nodes of disease-associated networks. Such polypharmacological effects may reduce the risk of emerging drug resistance by means of attacking the disease networks through synergistic and synthetic lethal interactions. However, due to the exponentially increasing number of potential drug and target combinations, systematic approaches are needed for prioritizing the most potent multi-target alternatives on a global network level. We took a functional systems pharmacology approach toward the identification of selective target combinations for specific cancer cells by combining large-scale screening data on drug treatment efficacies and drug-target binding affinities. Our model-based prediction approach, named TIMMA, takes advantage of the polypharmacological effects of drugs and infers combinatorial drug efficacies through system-level target inhibition networks. Case studies in MCF-7 and MDA-MB-231 breast cancer and BxPC-3 pancreatic cancer cells demonstrated how the target inhibition modeling allows systematic exploration of functional interactions between drugs and their targets to maximally inhibit multiple survival pathways in a given cancer type. The TIMMA prediction results were experimentally validated by means of systematic siRNA-mediated silencing of the selected targets and their pairwise combinations, showing increased ability to identify not only such druggable kinase targets that are essential for cancer survival either individually or in combination, but also synergistic interactions indicative of non-additive drug efficacies. These system-level analyses were enabled by a novel model construction method utilizing maximization and minimization rules, as well as a model selection algorithm based on sequential forward floating search. 
Compared with an existing computational solution, TIMMA showed both enhanced prediction accuracies in cross validation as well as significant reduction in computation times. Such cost-effective computational-experimental design strategies have the potential to greatly speed-up the drug testing efforts by prioritizing those interventions and interactions warranting further study in individual cancer cases.


[1] Bioinformatics. 2015 Jun 1;31(11):1866-8. doi: 10.1093/bioinformatics/btv067.

[2] PLoS Comput Biol. 2013;9(9):e1003226. doi: 10.1371/journal.pcbi.1003226.

Target Inhibition Networks: Predicting Selective Combinations of Druggable Targets to Block Cancer Survival Pathways


Bayesian Analysis of Population Structure 

During the most recent decade many Bayesian statistical models and software for answering questions related to the genetic structure underlying population samples have appeared in the scientific literature. Most of these methods utilize molecular markers for the inferences, while some are also capable of handling DNA sequence data. In a number of earlier works, we have introduced an array of statistical methods for population genetic inference that are implemented in the software BAPS. However, the complexity of biological problems related to genetic structure analysis keeps increasing such that in many cases the current methods may provide either inappropriate or insufficient solutions. We discuss the necessity of enhancing the statistical approaches to face the challenges posed by the ever-increasing amounts of molecular data generated by scientists over a wide range of research areas and introduce an array of new statistical tools implemented in the most recent version of BAPS. With these methods it is possible, e.g., to fit genetic mixture models using user-specified numbers of clusters and to estimate levels of admixture under a genetic linkage model. Also, alleles representing a different ancestry compared to the average observed genomic positions can be tracked for the sampled individuals, and a priori specified hypotheses about genetic population structure can be directly compared using Bayes' theorem. In general, we have improved further the computational characteristics of the algorithms behind the methods implemented in BAPS facilitating the analyses of large and complex datasets. In particular, analysis of a single dataset can now be spread over multiple computers using a script interface to the software. The Bayesian modelling methods introduced in this article represent an array of enhanced tools for learning the genetic structure of populations. 
Their implementations in the BAPS software are designed to meet the increasing need for analyzing large-scale population genetics data. The software is freely downloadable for Windows, Linux and Mac OS X systems.
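Comparing a priori specified hypotheses with Bayes' theorem amounts to turning each hypothesis's marginal likelihood and prior into a posterior probability. The sketch below shows that arithmetic in Python; it is not BAPS code. The log marginal likelihood values are invented for illustration, and the function name is hypothetical.

```python
import math

def posterior_probabilities(log_marginal_likelihoods, priors=None):
    """Posterior probability of each hypothesis given its log marginal likelihood.

    Uses a uniform prior when `priors` is None. Works in log space and subtracts
    the maximum before exponentiating so that very negative values do not underflow.
    """
    n = len(log_marginal_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n
    log_joint = [lml + math.log(p) for lml, p in zip(log_marginal_likelihoods, priors)]
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log marginal likelihoods for three population-structure hypotheses.
print(posterior_probabilities([-1050.0, -1052.0, -1060.0]))
```

With a uniform prior, the ratio of two posterior probabilities equals the Bayes factor between the corresponding hypotheses.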


[1] Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics. 2008 Dec 16;9:539. doi: 10.1186/1471-2105-9-539

[2] Bayesian analysis of population structure based on linked molecular information. Math Biosci. 2007 Jan;205(1):19-31. doi: 10.1016/j.mbs.2006.09.015.

[3] Identifying currents in the gene pool for bacterial populations using an integrative approach. PLoS Comput Biol. 2009 Aug;5(8):e1000455. doi: 10.1371/journal.pcbi.1000455

[4] Hyper-recombination, diversity, and antibiotic resistance in pneumococcus. Science. 2009 Jun 12;324(5933):1454-7. doi: 10.1126/science.1171908

Download the source code (v5.2):

For more recent updates, please contact Prof. Jukka Corander

Hyper-recombination, diversity, and antibiotic resistance in pneumococcus

DOI: 10.1126/science.1171908

Homologous recombination is frequent in many bacteria, but few studies have addressed whether subpopulations within a species are more or less likely to undergo this process and whether it has consequences for their evolution. Taking a large data set from the pathogen Streptococcus pneumoniae, Hanage et al. (p. 1454) discovered a group of strains characterized by an anomalous sequence of housekeeping genes. This sequence appeared to have been horizontally acquired from other pneumococci and related species and was associated with resistance to all classes of antibiotics for which data are available. Thus, hyper-recombination (in contrast to hypermutation) is important in the evolution and spread of antibiotic resistance and may play a role in determining the emergence of species clusters and the phenotypes associated with them.


T-RFLP Bayesian Analysis of Population Structures in Bacteria

The investigation of microbial communities is an essential part of the study of the biosphere. Flexible molecular fingerprinting tools such as terminal-restriction fragment length polymorphism (T-RFLP) analysis are often applied to characterize microbial populations. However, such data have so far been primarily analyzed using conventional clustering methods. Here we introduce a Bayesian model-based method for comparing microbial communities using T-RFLP data. Such datasets generally have several challenging features, e.g. sparseness, missing values and structurally zero-valued observations. Our framework takes these features into account through a Bayesian latent class mixture model for the observations. To make inferences under the model, we use a recent Markov chain Monte Carlo (MCMC)-based method for Bayesian model selection. To assess the introduced method, we analyze both simulated and real datasets. The simulations show that our approach compares favorably to standard statistical clustering tools, such as k-means, hierarchical clustering, and Autoclass. The developed tool is freely available as the software package T-BAPS.

Citation: T-BAPS: a Bayesian statistical tool for comparison of microbial communities using terminal-restriction fragment length polymorphism (T-RFLP) data. Stat Appl Genet Mol Biol. 2007;6:Article30. doi: 10.2202/1544-6115.1303

Paper download:
