A new class of metrics appropriate for measuring effective similarity relations between sequences, say one type of similarity per metric, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it minorizes every metric in the class (that is, it is universal in that it…
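Because Kolmogorov complexity is noncomputable, the distance described above cannot be evaluated directly. A common practical stand-in is the normalized compression distance (NCD), which substitutes the output length of a real compressor for the complexity terms. The sketch below is an illustration under that assumption, using `zlib` as the compressor; it is not the construction from the abstract, and the function names are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a rough complexity proxy."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 suggest similar inputs,
    values near 1 suggest unrelated inputs.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(rng: random.Random, length: int) -> bytes:
    """Generate a pseudo-random DNA-like sequence (illustrative test data only)."""
    return bytes(rng.choice(b"ACGT") for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)
    x = random_dna(rng, 2000)
    # y is x with one substituted base; z is an independent sequence.
    y = x[:1000] + b"N" + x[1001:]
    z = random_dna(rng, 2000)
    print("near-copy:", ncd(x, y))
    print("unrelated:", ncd(x, z))
```

A real compressor only approximates the ideal quantity, so the result depends on the compressor and on input length; zlib's 32 KB window, for instance, limits how far apart shared content can be detected.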
MOTIVATION: Genomics and proteomics studies routinely depend on homology searches based on the strategy of finding short seed matches which are then extended. The exploding genomic data growth presents a dilemma for DNA homology search techniques: increasing seed size decreases sensitivity whereas decreasing seed size slows down computation. RESULTS: We…
A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we…
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete…
This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences. All these problems reduce to the task of finding a pattern that, with some error, occurs in one set of strings (the Closest String…
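To make the Closest String problem concrete: given equal-length strings, the goal is a center string whose largest Hamming distance to any input is as small as possible. The sketch below is a deliberately naive exhaustive search, exponential in the string length and usable only on toy inputs; it is not one of the algorithms from the paper, and the function names are hypothetical.

```python
from itertools import product


def hamming(a, b) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, alphabet="ACGT"):
    """Return (center, radius) minimizing the maximum Hamming distance.

    Tries every string of the right length over `alphabet`, so the cost is
    len(alphabet) ** length candidates. Illustration only.
    """
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all input strings must have the same length")

    best, best_radius = None, length + 1
    for candidate in product(alphabet, repeat=length):
        radius = max(hamming(candidate, s) for s in strings)
        if radius < best_radius:
            best, best_radius = "".join(candidate), radius
    return best, best_radius


if __name__ == "__main__":
    center, radius = closest_string(["AAAA", "AATT", "TTAA"])
    print(center, radius)
```

In the demo inputs, "AATT" and "TTAA" differ in all four positions, so no center can be closer than distance 2 to both; the search confirms that a radius of 2 is achievable.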
Arc-annotated sequences are useful in representing the structural information of RNA sequences. In general, RNA secondary and tertiary structures can be represented as a set of nested arcs and a set of crossing arcs, respectively. Since RNA functions are largely determined by molecular conformation, and therefore by secondary and tertiary structures, the…
Extending the single optimized spaced seed of PatternHunter to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search technology full circle.
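The idea of a spaced seed can be sketched as follows: a seed is a mask of required-match positions (`1`) and don't-care positions (`0`), and a hit is a pair of offsets where query and target agree at every required position. The code below is a minimal illustration of hit finding with one or more seeds. The masks are toy values chosen for the example, not PatternHunter's optimized seeds, the function names are hypothetical, and the extension step that follows a hit is omitted.

```python
from collections import defaultdict


def seed_key(sequence: str, start: int, seed: str) -> str:
    """Characters of `sequence` at the '1' positions of `seed`, from `start`."""
    return "".join(sequence[start + i] for i, c in enumerate(seed) if c == "1")


def spaced_seed_hits(query: str, target: str, seeds):
    """Return sorted (query_offset, target_offset) pairs hit by any seed."""
    hits = set()
    for seed in seeds:
        span = len(seed)
        # Index every window of the target by its masked key.
        index = defaultdict(list)
        for t in range(len(target) - span + 1):
            index[seed_key(target, t, seed)].append(t)
        # Look up every window of the query in that index.
        for q in range(len(query) - span + 1):
            for t in index.get(seed_key(query, q, seed), ()):
                hits.add((q, t))
    return sorted(hits)


if __name__ == "__main__":
    target = "ACGTA"
    query = "ACTTA"  # differs from target at position 2 only
    print(spaced_seed_hits(query, target, ["111"]))   # contiguous seed, weight 3
    print(spaced_seed_hits(query, target, ["1101"]))  # spaced seed, weight 3
```

In this toy case the contiguous seed finds nothing because every length-3 window covers the mismatch, while the spaced seed of the same weight skips over it and reports a hit at offsets (0, 0). Using several seeds simply takes the union of their hits.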
This paper studies various algorithmic issues in reconstructing a species tree from gene trees under the duplication and the mutation cost model. This is a fundamental problem in computational molecular biology. Our main results are as follows. 1. A linear time algorithm is presented for computing all the losses in duplications associated with the least…
The RSR2015 database, designed to evaluate text-dependent speaker verification systems under different durations and lexical constraints, has been collected and released by the Human Language Technology (HLT) department at the Institute for Infocomm Research (I²R) in Singapore. English speakers were recorded with a balanced diversity of accents commonly found in…
While achieving the best compression ratios for DNA sequences, our new DNACompress program significantly improves the running time of all previous DNA compression programs.