Molecular biologists use algorithms that compare and otherwise analyze sequences that represent genetic and protein molecules. Most of these algorithms, however, operate on the basic sequence and do not incorporate the additional information that is often known about the molecule and its pieces. This research describes schemes to combinatorially annotate…

We present algorithms that reduce the time and space needed to solve problems of finding all motifs common to a set of sequences. In particular, we give algorithms that (1) require time and space linear in the size of the input, (2) succinctly encode the output so that the time and space requirements depend on the number of motifs, not directly on motif…

- Andrew David Smith, Patricia A Evans, Examining Board, Joseph D Horton, Bradford G Nickerson, Maryhelen Stevenson +3 others
- 2004

Discovering patterns in strings is a central task in analyzing molecular sequences. One pattern discovery problem is to find a pattern that occurs as a substring in each member of a given set of strings. Additionally, occurrences of this pattern are allowed to have up to some specified number of errors, so the occurrences may not exactly match the pattern.…

Problems associated with finding strings that are within a specified Hamming distance of a given set of strings occur in several disciplines. In this paper, we use techniques from parameterized complexity to assess non-polynomial time algorithmic options and complexity for the COMMON APPROXIMATE SUBSTRING (CAS) problem. Our analyses indicate under which…
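As a rough illustration of the kind of condition involved, the sketch below checks whether a candidate string lies within a given Hamming distance of some substring of every input string. It is not taken from the paper: the function names (`hamming`, `occurs_within`, `is_common_approximate_substring`) and the example sequences are hypothetical, and the exact problem definition used by the authors may differ.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern: str, text: str, d: int) -> bool:
    """True if some length-len(pattern) substring of text is within distance d."""
    m = len(pattern)
    return any(
        hamming(pattern, text[i:i + m]) <= d
        for i in range(len(text) - m + 1)
    )


def is_common_approximate_substring(pattern: str, strings: list[str], d: int) -> bool:
    """True if pattern occurs, with at most d mismatches, in every string.

    Assumption for this sketch: "errors" means mismatches only
    (no insertions or deletions).
    """
    return all(occurs_within(pattern, s, d) for s in strings)


# Hypothetical example data.
example = ["ACGTAC", "TCGAAC", "GGCGTA"]
print(is_common_approximate_substring("CGT", example, 1))
print(is_common_approximate_substring("CGT", example, 0))
```

This check runs in time proportional to the total input length times the pattern length. The hard part of the problem is searching for a suitable pattern, which the sketch does not attempt.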

We describe three applications in computational learning theory of techniques and ideas recently introduced in the study of parameterized computational complexity. (1) Using parameterized problem reducibilities, we show that P-sized DNF (CNF) formulas can be exactly learned in time polynomial in the number of variables by extended equivalence queries if…

The closest substring problem, where a short string is sought that minimizes the number of mismatches between it and each of a given set of strings, is a minimization problem with a polynomial time approximation scheme [6]. In this paper, both this problem and its maximization complement, where instead the number of matches is maximized, are examined and…
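To make the problem statement concrete, here is a minimal brute-force sketch. It is neither the approximation scheme cited as [6] nor a method from this paper. It assumes the objective is to minimize the largest mismatch count over all input strings, that the sought length is given, and that every input string is at least that long. The names `min_distance` and `closest_substring` and the default DNA alphabet are hypothetical.

```python
from itertools import product


def min_distance(candidate: str, text: str) -> int:
    """Fewest mismatches between candidate and any same-length substring of text."""
    m = len(candidate)
    return min(
        sum(x != y for x, y in zip(candidate, text[i:i + m]))
        for i in range(len(text) - m + 1)
    )


def closest_substring(strings: list[str], length: int, alphabet: str = "ACGT"):
    """Try every string of the given length over the alphabet (exponential time).

    Returns (candidate, radius), where radius is the largest of the
    per-string minimum mismatch counts, under the assumptions stated above.
    """
    best, best_radius = None, None
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        radius = max(min_distance(candidate, s) for s in strings)
        if best_radius is None or radius < best_radius:
            best, best_radius = candidate, radius
    return best, best_radius


# Hypothetical example data.
candidate, radius = closest_substring(["ACGTAC", "TCGAAC", "GGCGTA"], 3)
print(candidate, radius)
```

The search enumerates all `len(alphabet) ** length` candidates, so it is only practical for very short patterns. Under the same assumptions, the number of matches for a candidate against a string is `length` minus its mismatch count.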