Content discovery and retrieval services at the European Nucleotide Archive

  title={Content discovery and retrieval services at the European Nucleotide Archive},
  author={Nicole Silvester and Blaise T. F. Alako and Clara Amid and Ana Cerde{\~n}o-T{\'a}rraga and Iain Cleland and Richard Gibson and Neil Goodgame and Petra ten Hoopen and Simon Kay and Rasko Leinonen and Weizhong Li and Xin Liu and Rodrigo Lopez and Nima Pakseresht and Swapna Pallreddy and Sheila Plaister and Rajesh Radhakrishnan and Marc Rossello and Alexander Senf and Dimitriy Smirnov and Ana Luisa Toribio and Daniel Vaughan and Vadim Zalunin and Guy Cochrane},
  journal={Nucleic Acids Research},
  pages={D23--D29}
The European Nucleotide Archive (ENA) is Europe's primary resource for nucleotide sequence information. With the growing volume and diversity of public sequencing data comes the need for increased sophistication in data organisation, presentation and search services so as to maximise its discoverability and usability. In response to this, ENA has been introducing and improving checklists for use during submission and expanding its search facilities to provide targeted… 

Biocuration of functional annotation at the European nucleotide archive
This article reports on ENA in 2015 regarding general activity, notable published data sets and major achievements, followed by a focus on sustainable biocuration of functional annotation, an area which has particularly felt the pressure of sequencing growth.
The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI
The EBI Search engine (referred to here as ‘EBI Search’) is presented: an easy-to-use, fast text search and indexing system with powerful data navigation and retrieval capabilities.
The European Bioinformatics Institute in 2016: Data growth and integration
The Embassy Cloud service, which allows users to run large analyses in a virtual environment next to EMBL-EBI's vast public data resources, is launched.
Value, but high costs in post-deposition data curation
It is concluded that such a third party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach.
The EMBL-EBI bioinformatics web and programmatic tools framework
The EMBL-EBI Job Dispatcher framework has provided free access to a range of mainstream sequence analysis applications, and new tools and updates, such as NCBI BLAST+, InterProScan 5 and PfamScan, ensure that the framework remains relevant to today's biological community.
Update on Genomic Databases and Resources at the National Center for Biotechnology Information.
The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes,
Bioinformatics’ resources: focus on curated databases
An overview of SIB’s resources and competence areas is provided, with a strong focus on curated databases and SIB's most popular and widely used resources.
DNA data bank of Japan (DDBJ) progress report
The activities of the DDBJ Center over the past year including submissions to databases and improvements in services for data retrieval, analysis, and integration are reported on.
The RNASeq-er API—a gateway to systematically updated analysis of public RNA-seq data
A Web service, built on Representational State Transfer (REST), that gives access to the results of a systematically and continually updated standardized alignment, as well as gene- and exon-level expression quantification, of all public bulk (and, in the near future, also single-cell) RNA-Seq runs in 264 species in the European Nucleotide Archive.
The SIB Swiss Institute of Bioinformatics’ resources: focus on curated databases
An overview of SIB's resources and competence areas is provided, with a strong focus on curated databases and SIB's most popular and widely used resources.


Assembly information services in the European Nucleotide Archive
The European Nucleotide Archive content and growth over 2013 is reviewed, rapidly developing services for genome assembly information are described and further major developments over the last year are outlined.
Updates to BioSamples database at European Bioinformatics Institute
Infrastructural improvements include a new user interface with ontological and keyword queries, a new query API, a new data submission API, complete RDF data download and a supporting SPARQL endpoint, accessioning at the point of submission to the European Nucleotide Archive and the European Genome-phenome Archive, and improved query response times.
The International Nucleotide Sequence Database Collaboration
This article outlines INSDC services and updates the reader on developments in 2011, including the newly launched BioProject database and improved handling of assembly information.
The International Nucleotide Sequence Database Collaboration
The INSDC is introduced, data growth patterns are outlined and the challenges of increased growth are discussed, with particular emphasis on INSDC strategy.
EBI metagenomics—a new resource for the analysis and archiving of metagenomic data
A new metagenomics resource is developed that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored in the European Nucleotide Archive.
RNAcentral: an international database of ncRNA sequences
  • Anton I. Petrov, Simon J. E. Kay, Richard Gibson, Eugene Kulesha, Dan Staines, Elspeth A. Bruford, Mathew W. Wright, …, K. Pruitt
  • Nucleic Acids Res. 2015
The first release of RNAcentral is presented: a database that collates and integrates information from an international consortium of established RNA sequence databases and contains over 8.1 million sequences.
RNAcentral: A vision for an international database of RNA sequences.
This article proposes the creation of a new open public resource, termed RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases.
Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications
To establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, the minimum information about any (x) sequence (MIxS) specification is presented.
The complete genome sequence of a Neanderthal from the Altai Mountains
Kay Prüfer, Fernando Racimo, Nick Patterson, Flora Jay, Sriram Sankararaman, Susanna Sawyer, Anja Heinze, Gabriel Renaud, Peter H. Sudmant, Cesare de Filippo, Heng Li, Swapan Mallick, Michael