Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi

@article{Schoch2014FindingNI,
  title={Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi},
  author={Conrad L. Schoch and Barbara Robbertse and Vincent Robert and Duong Vu and Gianluigi Cardinali and Laszlo Irinyi and Wieland Meyer and R. Henrik Nilsson and Karen W. Hughes and Andrew N. Miller and Paul M. Kirk and Kessy Abarenkov and M. Catherine Aime and Hiran A. Ariyawansa and Martin I. Bidartondo and Teun Boekhout and Bart Buyck and Qing Cai and Jie Chen and Ana Crespo and Pedro W. Crous and Ulrike Damm and Z. Wilhelm De Beer and Bryn T. M. Dentinger and Pradeep Kumar Divakar and Margarita Due{\~n}as and Nicolas Feau and Kate{\v{r}}ina Ol{\v{s}}a Fliegerov{\'a} and Miguel Angel Garc{\'i}a and Zai-Wei Ge and Gareth W. Griffith and Johannes Z. Groenewald and Marizeth Groenewald and Martin Grube and Marieka Gryzenhout and C{\'e}cile Gueidan and Liangdong Guo and Sarah Hambleton and Richard C. Hamelin and Karen Hansen and Val{\'e}rie Hofstetter and Seung-Beom Hong and Jos Houbraken and Kevin David Hyde and Patrik Inderbitzin and Peter R. Johnston and Samantha Chandranath Karunarathna and Urmas K{\~o}ljalg and G{\'a}bor M. Kov{\'a}cs and Ekaphan Kraichak and Krisztina Krizsan and Cletus P. Kurtzman and Karl-Henrik Larsson and Steven D. Leavitt and Peter M. Letcher and Kare Liimatainen and Jiankui Liu and D. Jean Lodge and Janet Jennifer Luangsa-ard and H. Thorsten Lumbsch and Sajeewa S. N. Maharachchikumbura and Dimuthu S. Manamgoda and Mar{\'i}a Paz Mart{\'i}n and Andrew M. Minnis and Jean-Marc Moncalvo and Giuseppina Mul{\`e} and Karen K. Nakasone and Tuula Niskanen and Ibai Olariaga and Tam{\'a}s Papp and Tam{\'a}s Petkovits and Raquel Pino-Bodas and Martha J. Powell and Huzefa A. Raja and Dirk Redecker and J. M. Sarmiento-Ramirez and Keith A. Seifert and Bhushan Shrestha and Soili Stenroos and Benjamin Stielow and Sung-Oui Suh and Kazuaki Tanaka and Leho Tedersoo and M. Teresa Telleria and Dhanushka Udayanga and Wendy A. Untereiner and Javier Di{\'e}guez Uribeondo and Krishna V. Subbarao and Csaba V{\'a}gv{\"o}lgyi and Cobus M. Visagie and Kerstin Voigt and Donald M. Walker and Bevan Simon Weir and Michael Wei{\ss} and Nalin N. Wijayawardene and Michael J. Wingfield and J. P. Xu and Zhu-Liang Yang and Ning Zhang and Wen-Ying Zhuang and Scott Federhen},
  journal={Database: The Journal of Biological Databases and Curation},
  year={2014},
  volume={2014}
}
DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing requires fast and effective methods for en masse species assignments. In this article, we focus on selecting and re…

Dnabarcoder: An open‐source software package for analysing and predicting DNA sequence similarity cutoffs for fungal sequence identification

A new tool, dnabarcoder, is presented to predict local similarity cutoffs and to measure the resolving power of a biomarker for sequence identification across different clades of fungi; the results suggest that extracting the ITS region from ITS barcodes may improve taxonomic assignment accuracy.

Improving taxonomic accuracy for fungi in public sequence databases: applying ‘one name one species’ in well-defined genera with Trichoderma/Hypocrea as a test case

Recent taxonomic information was applied to carry out a complete taxonomic audit of the genus Trichoderma in the NCBI Taxonomy database, together with a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions.

Improving ITS sequence data for identification of plant pathogenic fungi

A concerted effort is reported to identify high-quality reference sequences for various plant pathogenic fungi, to re-annotate incorrectly or insufficiently annotated public ITS sequences from these fungal lineages, and to enrich the sequences with geographical and ecological metadata.

A long-read amplicon approach to scaling up the metabarcoding of lichen herbarium specimens

This study highlights the potential and challenges of using new sequencing technologies on collection specimens to generate DNA sequences for reference databases and presents a method that further decreases lichen specimen metabarcoding costs.

PacBio amplicon sequencing for metabarcoding of mixed DNA samples from lichen herbarium specimens

With increasing data output and reducing sequencing cost, PacBio amplicon sequencing is seen as a promising approach for the generation of reference sequences for lichenised fungi as well as the characterisation of lichen-associated fungal communities.

A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts

This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi that supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial datasets.

Unambiguous identification of fungi: where do we stand and how accurate and precise is fungal DNA barcoding?

A conceptual framework for the identification of fungi is provided, encouraging the approach of integrative (polyphasic) taxonomy for species delimitation, i.e. the combination of genealogy- and phenotype-based approaches to catalog the global diversity of fungi and establish initial species hypotheses.

Publicly Available and Validated DNA Reference Sequences Are Critical to Fungal Identification and Global Plant Protection Efforts: A Use-Case in Colletotrichum.

It is demonstrated that species-level identification is elusive for a subset of samples regardless of analytical approach, which may be explained by novel species diversity in the dataset, incomplete lineage sorting, and a lack of accumulated synapomorphies at these loci.

Read quality-based trimming of the distal ends of public fungal DNA sequences is nowhere near satisfactory

This study investigates whether public fungal ITS sequences are subjected to sufficient trimming in their distal (5’ and 3’) ends prior to deposition in the public repositories, and provides a set of recommendations on how to manage the sequence trimming problem.

Caveats of fungal barcoding: a case study in Trametes s.lat. (Basidiomycota: Polyporales) in Vietnam reveals multiple issues with mislabelled reference sequences and calls for third-party annotations

This study demonstrates that accurate identification of fungi through molecular barcoding is currently not a fast-track approach that can be achieved through automated pipelines, and calls for the implementation of third-party annotations or analogous approaches in primary sequence repositories.
...

References

Taxonomic Reliability of DNA Sequences in Public Sequence Databases: A Fungal Perspective

The present study uses a large set of fungal DNA sequences from the inclusive International Nucleotide Sequence Database to show that the taxon sampling of fungi is far from complete, that about 20% of the entries may be incorrectly identified to species level, and that the majority of entries lack descriptive and up-to-date annotations.

PHYMYCO-DB: A Curated Database for Analyses of Fungal Diversity and Evolution

The PHYMYCO-DB offers the tools necessary to extract high quality fungal sequences for each of the 5 fungal phyla, at all taxonomic levels, and launch alignments of personal sequences along with stored data.

Reference databases for taxonomic assignment in metagenomics

An overview of existing reference resources for both types of markers is given, highlighting strengths and possible shortcomings of their use for metagenomics purposes, and a new database of well annotated and phylogenetically classified ITS1 sequences is presented, to be used as a reference collection in metagenomic studies of environmental fungal communities.

UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi.

The UNITE database, an open-access database dedicated to the reliable identification of ECM fungi, comprises well-annotated fungal ITS sequences from well-defined herbarium specimens that include full herbarium reference identification data, collector/source and ecological data.

Towards a unified paradigm for sequence‐based identification of fungi

All fungal species represented by at least two ITS sequences in the international nucleotide sequence databases are now given a unique, stable name of the accession number type, and the term ‘species hypothesis’ (SH) is introduced for the taxa discovered in clustering on different similarity thresholds.

Identification of Fungal DNA Barcode Targets and PCR Primers Based on Pfam Protein Families and Taxonomic Hierarchy

The identified targets have essential housekeeping functions, like the well-known phylogenetic or barcode markers, and most have a better resolution potential to differentiate species among fully sequenced genomes than most currently used markers.

CREST – Classification Resources for Environmental Sequence Tags

Analysis of cross-validation and environmental datasets indicates that CREST performs better than alignment-free methods, with a higher recall rate (sensitivity) as well as precision, and with the ability to accurately identify most sequences from novel taxa.

PlutoF—a Web Based Workbench for Ecological and Taxonomic Research, with an Online Implementation for Fungal ITS Sequences

The web-based workbench PlutoF is described, which is designed to bridge the gap between the needs of contemporary research in biology and the existing software resources and databases.

Genetypes: a concept to help integrate molecular phylogenetics and taxonomy

The term “genetype” is proposed as a label for any sequence data from types (including from holotypes, secondary types, topotypes, etc.) to bring awareness to the situation.
...