DivergentSet, a Tool for Picking Non-redundant Sequences from Large Sequence Collections*
@article{Widmann2006DivergentSetAT,
  title={DivergentSet, a Tool for Picking Non-redundant Sequences from Large Sequence Collections*},
  author={J. Widmann and M. Hamady and Rob Knight},
  journal={Molecular \& Cellular Proteomics},
  year={2006},
  volume={5},
  pages={1520--1532}
}
DivergentSet addresses the important but so far neglected bioinformatics task of choosing a representative set of sequences from a larger collection. We found that using a phylogenetic tree to guide the construction of divergent sets of sequences can be up to 2 orders of magnitude faster than the naive method of using a full distance matrix. By providing a user-friendly interface (available online) that integrates the tasks of finding additional sequences, building and refining the divergent…
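To make the "naive method of using a full distance matrix" concrete, here is a minimal Python sketch of one way such a baseline could look. It is not the DivergentSet implementation: the distance measure (fraction of mismatched positions between pre-aligned sequences), the greedy farthest-point selection rule, and the names `p_distance`, `full_distance_matrix`, and `pick_divergent` are all assumptions made for illustration. The point it illustrates is the cost: every pair of sequences is compared up front, which is the quadratic step a tree-guided approach avoids.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences (assumed metric)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def full_distance_matrix(seqs):
    """Compute all pairwise distances; this is the O(n^2) step of the naive method."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = p_distance(seqs[i], seqs[j])
    return dist


def pick_divergent(seqs, k):
    """Greedily pick k sequence indices that are far apart (hypothetical selection rule)."""
    n = len(seqs)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of sequences")
    if k == 1:
        return [0]
    dist = full_distance_matrix(seqs)
    # Seed with the most distant pair of sequences.
    i, j = max(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    chosen = [i, j]
    while len(chosen) < k:
        remaining = [c for c in range(n) if c not in chosen]
        # Add the candidate whose nearest already-chosen sequence is farthest away.
        best = max(remaining, key=lambda c: min(dist[c][s] for s in chosen))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    example = ["AAAA", "AAAT", "TTTT", "AATT"]
    print(pick_divergent(example, 3))
```

With real data the input would first need a multiple sequence alignment, and a model-based distance would usually replace the raw mismatch fraction.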