Data mining for proteins characteristic of clades


A synapomorphy is a phylogenetic character that provides evidence of shared descent. Ideally a synapomorphy is ubiquitous within the clade of related organisms and nonexistent outside the clade, implying that it arose after divergence from other extant species and before the last common ancestor of the clade. With the recent proliferation of genetic sequence data, molecular synapomorphies have assumed great importance, yet there is no convenient means to search for them over entire genomes. We have developed a new program called Conserv, which can rapidly assemble orthologous sequences and rank them by various metrics, such as degree of conservation or divergence from out-group orthologs. We have used Conserv to conduct a large-scale search for molecular synapomorphies for bacterial clades. The search discovered sequences unique to clades such as Actinobacteria, Firmicutes and gamma-Proteobacteria, and shed light on several open questions, such as whether Symbiobacterium thermophilum belongs with Actinobacteria or Firmicutes. We conclude that Conserv can quickly marshal evidence relevant to evolutionary questions that would be much harder to assemble with other tools.
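The "ubiquitous within the clade, nonexistent outside" criterion can be illustrated with a small sketch. This is not Conserv's algorithm or interface; it is a hypothetical toy, assuming orthology has already been resolved into a presence table that maps each protein family to the set of genomes containing it. The names `presence_fractions`, `rank_candidates`, and the scoring rule (in-group fraction minus out-group fraction) are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple


def presence_fractions(
    genomes_with_family: Set[str], in_group: Set[str], out_group: Set[str]
) -> Tuple[float, float]:
    """Return the fraction of in-group and out-group genomes that contain the family."""
    in_frac = len(genomes_with_family & in_group) / len(in_group) if in_group else 0.0
    out_frac = len(genomes_with_family & out_group) / len(out_group) if out_group else 0.0
    return in_frac, out_frac


def rank_candidates(
    presence: Dict[str, Set[str]], in_group: Set[str], out_group: Set[str]
) -> List[Tuple[str, float]]:
    """Rank families by (in-group fraction - out-group fraction), best first.

    Hypothetical score: 1.0 means present in every in-group genome and in no
    out-group genome, i.e. an ideal synapomorphy candidate.
    """
    scored = []
    for family, genomes in presence.items():
        in_frac, out_frac = presence_fractions(genomes, in_group, out_group)
        scored.append((family, in_frac - out_frac))
    # Sort by descending score, breaking ties by family name for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up genome and family labels, purely for demonstration.
    in_group = {"g1", "g2", "g3"}
    out_group = {"o1", "o2"}
    presence = {
        "famA": {"g1", "g2", "g3"},  # in every in-group genome, no out-group genome
        "famB": {"g1", "o1"},        # patchy in both groups
        "famC": {"o1", "o2"},        # out-group only
    }
    for family, score in rank_candidates(presence, in_group, out_group):
        print(f"{family}\t{score:+.2f}")
```

A real search would also have to weigh sequence conservation and divergence from out-group orthologs, the metrics the abstract mentions, rather than presence and absence alone.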

Cite this paper

@article{Bern2006DataMF, title={Data mining for proteins characteristic of clades}, author={Marshall W. Bern and David Goldberg and Eugenia Lyashenko}, journal={Nucleic Acids Research}, year={2006}, volume={34}, pages={4342 - 4353} }