In a clustering problem, one has to partition a set of elements into homogeneous and well-separated subsets. From a graph-theoretic point of view, a cluster graph is a vertex-disjoint union of cliques. The clustering problem is the task of making the fewest changes to the edge set of an input graph so that it becomes a cluster graph. We study the complexity of …
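The cluster-graph definition above can be checked directly. The following is a minimal Python sketch, not taken from the paper: the function name `is_cluster_graph` and the edge-list input format are assumptions made for illustration. It tests whether every connected component is a clique, which is equivalent to being a vertex-disjoint union of cliques.

```python
from collections import deque


def is_cluster_graph(n, edges):
    """Return True if the undirected graph on vertices 0..n-1 is a
    vertex-disjoint union of cliques (hypothetical helper, for illustration)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        # Collect the connected component of `start` with a BFS.
        component = [start]
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    component.append(w)
                    queue.append(w)
        # A component of size s is a clique iff every vertex has degree s - 1.
        size = len(component)
        if any(len(adj[u]) != size - 1 for u in component):
            return False
    return True
```

For example, a triangle plus a disjoint edge is a cluster graph, while a path on three vertices is not, because its two endpoints are in the same component but not adjacent.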
Post-translational modifications (PTMs) are of great biological importance. Most existing approaches perform a restrictive search that can only take into account a few types of PTMs and ignore all others. We describe an unrestrictive PTM search algorithm that searches for all types of PTMs at once in a blind mode, i.e., without knowing which PTMs exist in a …
We study the subtree isomorphism problem: Given trees H and G, find a subtree of G which is isomorphic to H or decide that there is no such subtree. We give an $O\left(\frac{k^{1.5}}{\log k}\, n\right)$-time algorithm for this problem, where k and n are the number of vertices in H and G respectively. This improves over the $O(k^{1.5} n)$ algorithms of Chung and Matula. We also give a …
Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S′ of S is the subset C ⊆ Σ of the symbols occurring in S′. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains such as rule induction for natural language processing or …
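To make the fingerprint definition concrete, here is a naive Python sketch that enumerates the distinct character sets of all substrings of a string. It is a quadratic-time illustration of the definition only, not the method of the paper; the function name `substring_fingerprints` is an assumption.

```python
def substring_fingerprints(s):
    """Return the set of distinct fingerprints (character sets) over all
    non-empty substrings of s. Naive O(len(s)^2) enumeration."""
    fingerprints = set()
    for i in range(len(s)):
        current = set()
        # Extend the substring s[i..j] one character at a time.
        for j in range(i, len(s)):
            current.add(s[j])
            fingerprints.add(frozenset(current))
    return fingerprints
```

For the string "aba", the substrings "a", "b", "ab", "ba", and "aba" yield only three distinct fingerprints: {a}, {b}, and {a, b}.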
The following probabilistic process models the generation of noisy clustering data: Clusters correspond to disjoint sets of vertices in a graph. Every two vertices from the same set are connected by an edge with probability p, and every two vertices from different sets are connected by an edge with probability r < p. The goal of the clustering problem is to …
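The generation process described above can be simulated in a few lines. This is a minimal Python sketch under stated assumptions: the function name `planted_partition_graph`, the list-of-lists cluster format, and the `seed` parameter are illustrative choices, not part of the paper.

```python
import random


def planted_partition_graph(clusters, p, r, seed=None):
    """Sample a random graph from disjoint vertex sets `clusters`.

    Two vertices in the same set are joined with probability p; two
    vertices in different sets are joined with probability r (r < p).
    Returns a set of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Map each vertex to the index of the cluster that contains it.
    label = {}
    for index, cluster in enumerate(clusters):
        for v in cluster:
            label[v] = index

    vertices = sorted(label)
    edges = set()
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            prob = p if label[u] == label[v] else r
            if rng.random() < prob:
                edges.add((u, v))
    return edges
```

With p = 1 and r = 0 the output is exactly the noise-free cluster graph; values of p and r closer together produce noisier data in which the original sets are harder to recover.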
We present an index for top-k most frequent document retrieval whose space is $|CSA| + o(n) + D \log\frac{n}{D} + O(D)$ bits, and its query time is $O(\log k \log^{2+\epsilon} n)$ per reported document, where D is the number of documents, n is the sum of lengths of the documents, and |CSA| is the space of the compressed suffix array for the documents. This improves over previous results …
Real scaled matching is the problem of finding all locations in the text where the pattern, proportionally enlarged according to an arbitrary real-valued scale, appears. It is an important problem that was originally inspired by computer vision. In this paper, we present a new, more precise and realistic definition for one-dimensional real …
Given two undirected trees T and P, the Subtree Homeomorphism Problem is to find whether T has a subtree t that can be transformed into P by removing entire subtrees, as well as repeatedly removing a degree-2 node and adding the edge joining its two neighbors. In this paper we extend the Subtree Homeomorphism Problem to a new optimization problem by …