A get-together event for bioinformatics researchers takes place on Friday 23.3.2018 in Exactum B222, Kumpula Campus.


Please fill in this form before 15.3.2018.



Dirichlet Mixtures, the Dirichlet Process, and the Topography of Amino Acid Multinomial Space

Stephen Altschul, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA


The Dirichlet Process is used to estimate probability distributions that are mixtures of an unknown and unbounded number of components.  Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, and we have used the Dirichlet Process to construct such distributions.  The resulting mixtures describe multiple alignment data substantially better than do those previously derived.  They consist of over 500 components, in contrast to fewer than 40 previously, and provide a novel perspective on protein structure.  Individual protein positions should be seen not as falling into one of several categories, but rather as arrayed near probability ridges winding through amino-acid multinomial space.


Veli Mäkinen