3.2018, Exactum B222, Kumpula Campus.
Please fill in this form before 15.3.2018.
Dirichlet Mixtures, the Dirichlet Process, and the Topography of Amino Acid Multinomial Space
14.00-14.30 Coffee break
14.30 Pasi Rastas: Lep-MAP3: Robust linkage mapping even for low-coverage whole genome sequencing data
14.55 Ari Löytynoja: Short template switch events explain mutation clusters in the human genome
Dirichlet Mixtures, the Dirichlet Process, and the Topography of Amino Acid Multinomial Space
Stephen Altschul, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
Abstract
The Dirichlet Process is used to estimate probability distributions that are mixtures of an unknown and unbounded number of components. Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, and we have used the Dirichlet Process to construct such distributions. The resulting mixtures describe multiple alignment data substantially better than do those previously derived. They consist of over 500 components, in contrast to fewer than 40 previously, and provide a novel perspective on protein structure. Individual protein positions should be seen not as falling into one of several categories, but rather as arrayed near probability ridges winding through amino-acid multinomial space.
Veli Mäkinen