Publications
On Prediction Using Variable Order Markov Models
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models.
  • 383
  • 53
Variations on probabilistic suffix trees: statistical modeling and prediction of protein families
We present a method for modeling protein families by means of probabilistic suffix trees (PSTs).
  • 186
  • 20
Within the twilight zone: a sensitive profile-profile comparison tool based on information theory
  • G. Yona, M. Levitt
  • Computer Science, Medicine
  • Journal of Molecular Biology
  • 1 February 2002
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their …
  • 302
  • 18
ProtoMap: automatic classification of protein sequences and hierarchy of protein families
The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database into groups of related proteins.
  • 186
  • 12
Automatic prediction of protein domains from sequence information using a hybrid learning system
We describe a novel method for detecting the domain structure of a protein from sequence information alone.
  • 78
  • 10
Modeling protein families using probabilistic suffix trees
We present a method for modeling protein families by means of probabilistic suffix trees (PSTs).
  • 71
  • 10
BIOZON: a system for unification, management and analysis of heterogeneous biological data
We present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore.
  • 105
  • 9
QualComp: a new lossy compressor for quality scores based on rate distortion theory
We present a new scheme for the lossy compression of the quality scores, to address the problem of storage.
  • 49
  • 6
Global self-organization of all known protein sequences reveals inherent biological signatures
A global classification of all currently known protein sequences is performed.
  • 86
  • 5
BIOZON: a hub of heterogeneous biological data
A unified biological database that integrates heterogeneous data types such as proteins, structures, domain families, protein–protein interactions and cellular pathways, and establishes the relationships between them.
  • 52
  • 5