Learn More
In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, where the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon is given a firm(More)
Here, a protein atom-ligand fragment interaction library is described. The library is based on experimentally solved structures of protein-ligand and protein-protein complexes deposited in the Protein Data Bank (PDB) and it is able to characterize binding sites given a ligand structure suitable for a protein. A set of 30 ligand fragment types were defined(More)
In this paper we propose a method of constructing a hierarchical classification based on the notion of stochastic complexity. Minimization of stochastic complexity amounts to maximization of the information content of the classification. A dendrogram is obtained by first finding the classification which minimizes stochastic complexity and then by step-wise(More)
In many pattern recognition/classification problem the true class conditional model and class probabilities are approximated for reasons of reducing complexity and/or of statistical estimation. The approximated classifier is expected to have worse performance, here measured by the probability of correct classification. We present an analysis valid in(More)
We introduce a novel Markov chain Monte Carlo algorithm for estimation of posterior probabilities over discrete model spaces. Our learning approach is applicable to families of models for which the marginal likelihood can be analytically calculated, either exactly or approximately, given any fixed structure. It is argued that for certain model neighborhood(More)
Microbiologists have traditionally applied hierarchical clustering algorithms as their mathematical tool of choice to unravel the taxonomic relationships between micro-organisms. However, the interpretation of such hierarchical classifications suffers from being subjective, in that a variety of ad hoc choices must be made during their construction. On the(More)
We introduce a Bayesian theoretical formulation of the statistical learning problem concerning the genetic structure of populations. The two key concepts in our derivation are exchangeability in its various forms and random allocation models. Implications of our results to empirical investigation of the population structure are discussed.
In this document we introduce a software package BinClass for the clas-siication of binary vectors and analysis of the classiication results. First we will give brief introduction to the mathematical foundations and theory of clustering, cumulative classiication and mixture classiication. We also introduce methods for analysis of the classiications(More)