Florencia G. Leonardi

In this paper we address the problem of identifying differences between populations of trees. Besides the theoretical relevance of this problem, we are interested in testing if trees characterizing protein sequences from different families constitute samples of significantly different distributions. In this context, trees are obtained by modelling protein(More)
The completion of the genome sequence of Plasmodium falciparum revealed that close to 60% of the annotated genome corresponds to hypothetical proteins and that many genes, whose metabolic pathways or biological products are known, have not been predicted from sequence similarity searches. Recently, using global gene expression of the asexual blood stages of(More)
MOTIVATION: A central problem in genomics is to determine the function of a protein using the information contained in its amino acid sequence. Variable length Markov chains (VLMC) are a promising class of models that can effectively classify proteins into families, and they can be estimated in linear time and space. RESULTS: We introduce a new algorithm,(More)
DOI: 10.1214/11-AOAS511 © Institute of Mathematical Statistics. CONTEXT TREE SELECTION AND(More)
The completion of the genome sequence of Plasmodium falciparum revealed that close to 60% of the annotated genome corresponds to hypothetical proteins and that many genes, whose metabolic pathways or biological products are known biochemically, had not been predicted. Recently, using global gene expression of the asexual blood stages of P. falciparum at 1h(More)
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as(More)
Abstract. We find upper bounds for the probability of underestimation and overestimation errors in penalized likelihood context tree estimation. The bounds are explicit and apply to processes of not necessarily finite memory. We allow for general penalizing terms and we give conditions on the maximal depth of the estimated trees in order to get strongly(More)
A methodology based on progressive steps has been developed so that students are prepared to design and implement typical industrial projects, such as digital filters, voice processing algorithms, and others, and are also able to correlate this knowledge with other disciplines. They start with an analog system, described by a differential equation,(More)