PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment

Abstract

MOTIVATION: The last decade has seen remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of the sequences submitted to UniProtKB, the most comprehensive protein database, are labelled 'Unknown protein' or similar. Moreover, the functionally annotated portion is reported to contain 30-40% errors. Here, we introduce Protein ANNotation with Z-score (PANNZER), a high-throughput tool for more reliable functional annotation. PANNZER predicts Gene Ontology (GO) classes and free-text descriptions of protein function. It uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation.

RESULTS: In free-text description line prediction, PANNZER outperformed all competing methods by a clear margin. In GO prediction, it shows a clear improvement over our older method, which performed well in the CAFA 2011 challenge.
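The abstract names weighted k-nearest neighbour voting but does not spell out the scoring. The following is a minimal, generic sketch of similarity-weighted k-NN label voting, not PANNZER's actual scoring function; the statistical testing step is not reproduced here. The function name `weighted_knn_labels`, the similarity values, and the choice of k are hypothetical, and the GO identifiers are used only as example labels.

```python
from collections import defaultdict


def weighted_knn_labels(neighbours, k=3):
    """Score labels by a similarity-weighted vote over the k most similar neighbours.

    neighbours: list of (similarity, set_of_labels) pairs for one query sequence.
    Returns a dict mapping each label to its share of the total neighbour weight.
    This is a generic illustration, not the method described in the paper.
    """
    # Keep only the k neighbours with the highest similarity.
    top = sorted(neighbours, key=lambda n: n[0], reverse=True)[:k]
    total = sum(weight for weight, _ in top)
    if total <= 0:
        return {}
    scores = defaultdict(float)
    for weight, labels in top:
        for label in labels:
            scores[label] += weight
    # Normalise so that a label carried by every neighbour scores 1.0.
    return {label: score / total for label, score in scores.items()}


# Hypothetical search hits: (similarity to the query, annotated GO classes).
hits = [
    (0.9, {"GO:0003677", "GO:0005634"}),
    (0.6, {"GO:0003677"}),
    (0.5, {"GO:0016301"}),
    (0.1, {"GO:0005634"}),
]
scores = weighted_knn_labels(hits, k=3)
best = max(scores, key=scores.get)
```

With these made-up hits, the label shared by the two most similar neighbours receives the highest score, and the weakest hit is ignored because it falls outside the top k.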

DOI: 10.1093/bioinformatics/btu851

Citation Velocity: 16

Averaging 16 citations per year over the last 3 years.


Cite this paper

@article{Koskinen2015PANNZERHF, title={PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment}, author={Patrik Koskinen and Petri T{\"{o}}r{\"{o}}nen and Jussi Nokso-Koivisto and Liisa Holm}, journal={Bioinformatics}, year={2015}, volume={31}, number={10}, pages={1544--1552} }