John Patrick Mpindi graduated from the University of Turku in 2008 with bioinformatics as his major subject. He started his PhD project in Olli Kallioniemi’s group at VTT Medical Biotechnology Turku in 2009 and joined FIMM in 2011.
During his thesis project, John Patrick developed bioinformatic tools for analyzing large-scale biomedical datasets. His thesis consists of four publications, three of which have already been published. The last manuscript has been resubmitted to Nature correspondence. He has concentrated on two types of biomedical data: microarray-based gene expression datasets and drug sensitivity and resistance testing datasets.
In the thesis study, John Patrick developed a novel algorithm for detecting outliers in gene expression datasets. The method, called the gene tissue index (GTI) outlier method, and its utility in detecting genes overexpressed in tumour samples are described in the first two articles of his thesis. He tested the algorithm on several gene expression datasets and was able to show that the method is superior to many existing methods in detecting oncogenic outlier genes when the sample number per gene varies. Applying the algorithm revealed several promising new oncogenes and cancer-type-specific biomarkers.
Interestingly, the method was developed by modifying and adapting algorithms originally developed for economics problems.
– During my bachelor’s degree studies, I was really interested in statistical problems in economics, and my future plans consisted of a career in finance, perhaps working in a bank. When the similarities between gene expression data and financial data hit me, I realized the possibilities of adapting the existing economic methodology to biomedicine. This was something that nobody had thought of doing before, said John Patrick.
– In my mind, genes with different expression levels are like countries with different incomes, while sample numbers are comparable to the population size of a country. Detecting outliers can be done using similar approaches regardless of the nature of the data.
In the last two publications of his thesis, he concentrated on high-throughput drug screening dose-response data and showed that the choice of normalization method has a highly significant effect on the quality and reproducibility of the data, especially when the hit rate is high. By comparing datasets from several independent studies, the team was able to demonstrate that drug testing data are highly consistent between studies, but only when both the laboratory protocols and the bioinformatic data processing and analysis are standardized.
Integrated analysis of large-scale biomedical datasets requires programming skills as well as special knowledge of both biology and analysis algorithms. This combination of skills is valued in the scientific community and, like many other bioinformaticians, John Patrick holds an impressive track record of collaborative projects and co-authored publications.
John Patrick emphasizes the importance of efficient and well-standardized methods in personalized medicine research.
– We need to be able to produce high-quality data with only one experiment, since new patient samples are usually difficult to obtain.
– I believe that we can deliver on the promise of personalized medicine, but only if many research centres all over the world collaborate and share their data. This is only meaningful if the bioinformatics methods and laboratory assay protocols used are broadly standardized.
John Patrick’s future plans are not yet settled, but since he truly enjoys analyzing biomedical data, he wants to continue along that path during his post-doctoral career, be that in academia or industry.
The public examination of John Patrick Mpindi’s doctoral dissertation took place on 11 March 2016 at 12:00 in Lecture hall 3 at Biomedicum Helsinki 1, Haartmaninkatu 8. Dr. Bissan Al-Lazikani (The Institute of Cancer Research, London) served as the opponent and Professor Olli Kallioniemi as the custos.
The dissertation is also available in electronic form.