Pfam: the protein families database
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in
The Pfam protein families database
TLDR
The latest version (4.3) of Pfam contains 1815 families, which match 63% of proteins in SWISS-PROT 37 and TrEMBL 9.
Initial sequencing and analysis of the human genome.
TLDR
The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
The Pfam protein families database: towards a more sustainable future
TLDR
Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set, and the facility to view the relationship between families within a clan has been improved by the introduction of a new tool.
Pfam: clans, web tools and services
TLDR
Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are presented.
miRBase: microRNA sequences, targets and gene nomenclature
TLDR
The miRBase database aims to provide integrated interfaces to comprehensive microRNA sequence data, annotation and predicted gene targets, and acts as an independent arbiter of microRNA gene nomenclature.
An introduction to hidden Markov models.
This unit introduces the concept of hidden Markov models in computational biology. It describes them using simple biological examples, requiring as little mathematical knowledge as possible. The unit
The Pfam protein families database in 2019
TLDR
A significant comparison to a structural classification database was carried out, leading to the creation of 825 new families based on that database's set of uncharacterized families (EUFs), and Pfam entries were connected to the Sequence Ontology (SO) by mapping the Pfam type definitions to SO terms.
The Pfam protein families database
TLDR
The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2)
TLDR
The 8,667,507 base pair linear chromosome of Streptomyces coelicolor is reported, containing the largest number of genes so far discovered in a bacterium.