DivergentSet, a Tool for Picking Non-redundant Sequences from Large Sequence Collections

@article{Widmann2006DivergentSetAT,
  title={DivergentSet, a Tool for Picking Non-redundant Sequences from Large Sequence Collections},
  author={J. Widmann and M. Hamady and Rob Knight},
  journal={Molecular \& Cellular Proteomics},
  year={2006},
  volume={5},
  pages={1520--1532}
}
  • J. Widmann, M. Hamady, Rob Knight
  • Published 2006
  • Biology, Medicine
  • Molecular & Cellular Proteomics
  • DivergentSet addresses the important but so far neglected bioinformatics task of choosing a representative set of sequences from a larger collection. We found that using a phylogenetic tree to guide the construction of divergent sets of sequences can be up to two orders of magnitude faster than the naive method of using a full distance matrix. By providing a user-friendly interface (available online) that integrates the tasks of finding additional sequences, building and refining the divergent…
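
The abstract contrasts tree-guided selection with a naive approach based on a full distance matrix. As a rough illustration of that naive baseline only, the sketch below picks a divergent subset greedily from all pairwise distances. It is not the authors' implementation: the function names, the use of p-distance on pre-aligned sequences, and the farthest-point selection rule are all assumptions made for this example.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Full symmetric matrix of pairwise p-distances (O(n^2) comparisons)."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(seqs[i], seqs[j])
    return d


def greedy_divergent_set(seqs, k):
    """Return indices of k sequences chosen by a farthest-point heuristic.

    Hypothetical helper: seeds with the most distant pair, then repeatedly
    adds the sequence whose nearest already-chosen neighbour is farthest away.
    """
    n = len(seqs)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of sequences")
    if n == 1:
        return [0]
    d = distance_matrix(seqs)
    # Seed with the pair of sequences that are farthest apart.
    i, j = max(combinations(range(n), 2), key=lambda p: d[p[0]][p[1]])
    chosen = [i] if k == 1 else [i, j]
    while len(chosen) < k:
        rest = [c for c in range(n) if c not in chosen]
        best = max(rest, key=lambda c: min(d[c][s] for s in chosen))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "TTTT", "AATT"]  # toy aligned sequences, not real data
    print(greedy_divergent_set(toy, 3))
```

Because every pair of sequences is compared, the cost of this baseline grows quadratically with the size of the collection, which is the expense the abstract says a guide tree can reduce.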
    13 Citations

    MotifCluster: an interactive online tool for clustering and visualizing sequences using shared motifs
    • 3
    Global patterns in bacterial diversity
    • 1,172
    PyCogent: a toolkit for making sense from sequence
    • 195
    Phylogeography of microbial phototrophs in the dry valleys of the high Himalayas and Antarctica
    • 71
