MAKER-P: A Tool Kit for the Rapid Creation, Management, and Quality Control of Plant Genome Annotations

Authors: Michael Campbell, Meiyee Law, Carson Holt, Joshua C. Stein, Gaurav D. Moghe, David E. Hufnagel, Jikai Lei, Rujira Achawanantakun, Dian Jiao, Carolyn J. Lawrence-Dill, Doreen H. Ware, Shin-Han Shiu, Kevin L. Childs, Yanni Sun, Ning Jiang, and Mark Yandell
Journal: Plant Physiology
Pages: 513-524

MAKER-P annotates the entire Arabidopsis and maize genomes in less than 3 h with comparable quality to the current TAIR10 and maize V2 annotation builds. We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit… 

Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes

MAKER-P is used to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h on the iPlant Cyberinfrastructure, demonstrating the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes.

Gene prediction and annotation in Penstemon (Plantaginaceae): A workflow for marker development from extremely low-coverage genome sequencing

Combining bioinformatics tools into a workflow that produces annotations can be useful for creating potential phylogenetic markers from thousands of sequences even when genome coverage is extremely low and reference data are only available from distant relatives.

Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data

Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes, and is able to generate better gene predictions compared to three HMM-based programs using their respective available HMMs.

FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences

FINDER annotates genes directly from raw expression data in a fully automated way. It can process eukaryotic genomes of all sizes and requires no manual supervision, which makes it well suited to bench researchers with limited experience in handling computational tools.

Improved maize reference genome with single-molecule technologies

The assembly and annotation of a reference genome of maize is reported, using single-molecule real-time sequencing and high-resolution optical mapping to identify transposable element lineage expansions that are unique to maize.

An improved de novo assembly and annotation of the tomato reference genome using single-molecule sequencing, Hi-C proximity ligation and optical maps

The latest tomato reference genome (SL4.4.0) is presented. It was assembled de novo from PacBio long reads, scaffolded using Hi-C contact maps, and validated using Bionano optical maps and 10X linked-read sequences.

Genome Annotation and Curation Using MAKER and MAKER‐P

This unit describes how to use the genome annotation and curation tools MAKER and MAKER‐P to annotate protein‐coding and noncoding RNA genes in newly assembled genomes, update/combine legacy

Plant genome and transcriptome annotations: from misconceptions to simple solutions

This comprehensive review covers the typical ontologies used in the plant sciences, useful databases and resources for functional annotation, and what to expect from an annotated plant genome. It also presents a recipe and reference chart outlining the typical steps used to annotate plant genomes and transcriptomes with publicly available resources.

The complex sequence landscape of maize revealed by single molecule technologies

The assembly and annotation of the maize genome is reported, using Single Molecule Real-Time (SMRT) sequencing and a high-resolution genome map and revealing a prevalence of deletions in regions of low gene density and in maize lineage-specific genes.

Using multiple reference genomes to identify and resolve annotation inconsistencies

A high-throughput method based on pairwise comparisons of annotations is presented. It detects potential split-gene misannotations and quantifies support for merging the affected genes into a single gene model. Its utility is demonstrated using gene annotations of three maize reference genomes.



The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools

Recent developments include several new genome releases, progress on functional annotation of the genome and the release of several new tools including Textpresso for Arabidopsis which provides the capability to carry out full text searches on a large body of research literature.

MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

MAKER2 is the first annotation engine specifically designed for second-generation genome projects, which scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality.

Gene Discovery and Tissue-Specific Transcriptome Analysis in Chickpea with Massively Parallel Pyrosequencing and Web Resource Development

The strategy for optimizing de novo assembly presented here may further facilitate transcriptome sequencing and characterization in other organisms, and may help to accelerate research in various areas of genomics and the implementation of breeding programs in chickpea.

Quantitative measures for the management and comparison of annotated genomes

A suite of quantitative measures is developed to better characterize changes to a genome's annotations between releases and to prioritize problematic annotations for manual review. The usefulness of these measures for genome annotation management is demonstrated.

The Construction of Arabidopsis Expressed Sequence Tag Assemblies (A New Resource to Facilitate Gene Identification)

The application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a “contig” or assembly is described.

Evolutionary and Expression Signatures of Pseudogenes in Arabidopsis and Rice

Few Ψs are found to have expression evidence, and their expression levels tend to be lower than those of annotated genes, indicating that Ψ expression may be due to insufficient time for complete degeneration of regulatory signals. Larger protein domain families also have significantly more Ψs in general.

Using native and syntenically mapped cDNA alignments to improve de novo gene finding

This work incorporates several different evidence sources into the gene finder AUGUSTUS, a widely used and essential tool for analyzing newly sequenced genomes, and predicts at least one splice form exactly correctly in 57% of human genes.

Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data

A whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software is described, demonstrating how recent improvements in sequencing technology, especially increasing read lengths and paired-end reads from longer fragments, have a major impact on assembly contiguity.

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana

This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

REGIA, An EU Project on Functional Genomics of Transcription Factors From Arabidopsis thaliana

The project involves the preparation of both a TF gene array for expression analysis and a normalised full length open reading frame (ORF) library of TFs in a yeast two hybrid vector; the applications of these resources should extend beyond the scope of this programme.