# Using a VOM model for reconstructing potential coding regions in EST sequences

@article{Shmilovici2007UsingAV, title={Using a VOM model for reconstructing potential coding regions in EST sequences}, author={Armin Shmilovici and Irad Ben-Gal}, journal={Computational Statistics}, year={2007}, volume={22}, pages={49-69} }

This paper presents a method for annotating coding and noncoding DNA regions by using variable order Markov (VOM) models. A main advantage of using VOM models is that their order may vary for different sequences, depending on the sequences’ statistics. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper presents a modified VOM model for…
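The core idea of a VOM model, that the context length used for each prediction depends on how much training data supports it, can be illustrated with a short Python sketch. This is not the authors' modified VOM model: the class name `VOMModel`, the pruning rule (`min_count`), the maximum depth, and the Laplace smoothing are all illustrative assumptions. Two such models, one trained on coding and one on noncoding fragments, are compared through a log-likelihood ratio.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class VOMModel:
    """Minimal variable-order Markov model (hypothetical sketch, not the paper's).

    Contexts of up to max_depth symbols are counted; a context is kept only if
    it was seen at least min_count times, so the effective order adapts to the data.
    """

    def __init__(self, max_depth=5, min_count=5):
        self.max_depth = max_depth
        self.min_count = min_count
        self.counts = {}

    def fit(self, sequences):
        raw = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for i, sym in enumerate(seq):
                # Count sym after each preceding context of length 0..max_depth.
                for d in range(min(i, self.max_depth) + 1):
                    raw[seq[i - d:i]][sym] += 1
        # Prune rare contexts; the empty (order-0) context is always kept.
        self.counts = {
            ctx: dict(c)
            for ctx, c in raw.items()
            if ctx == "" or sum(c.values()) >= self.min_count
        }
        return self

    def prob(self, history, sym):
        # Back off from the longest kept suffix of the history to shorter ones.
        for d in range(min(len(history), self.max_depth), -1, -1):
            ctx = history[len(history) - d:]
            if ctx in self.counts:
                c = self.counts[ctx]
                total = sum(c.values())
                # Laplace smoothing over the 4-letter DNA alphabet.
                return (c.get(sym, 0) + 1) / (total + len(ALPHABET))
        raise RuntimeError("model has not been fitted")

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(seq[:i], s)) for i, s in enumerate(seq))


def coding_score(seq, coding, noncoding):
    """Log-likelihood ratio; positive values favour the coding model."""
    return coding.log_likelihood(seq) - noncoding.log_likelihood(seq)


# Toy usage with made-up training fragments.
coding = VOMModel(max_depth=3, min_count=2).fit(["GCCGCCGCCGCCGCCGCC"])
noncoding = VOMModel(max_depth=3, min_count=2).fit(["ATATATATATATATATAT"])
print(coding_score("GCCGCCGCC", coding, noncoding) > 0)  # True
```

Pruning rare contexts is what lets the order vary: where data are sparse, as with short or low-quality EST reads, predictions fall back to shorter, better-estimated contexts instead of relying on a fixed high order.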

## 18 Citations

### Gene-finding with the VOM model

- Computer Science · J. Comput. Methods Sci. Eng.
- 2007

Experiments with the proposed gene-finder (GF) on three prokaryotic genomes indicate its potential advantage on the detection of short genes.

### Single Species Gene Finding

- Computer Science
- 2015

This chapter covers five of the most commonly used mathematical models in single-species gene finding: hidden Markov models, generalized hidden Markov models, interpolated Markov models, neural networks, and decision trees.

### MicroRNA Prediction Using a Fixed-Order Markov Model Based on the Secondary Structure Pattern

- Biology · PLoS ONE
- 2012

A new miRNA prediction algorithm, FOMmiR, is presented; it recognizes mature miRNAs directly from hairpin sequences, and the location of the strongest signal it detects offers a new understanding of the underlying biological recognition.

### Classical and Quantum Algorithms for Constructing Text from Dictionary Problem

- Computer Science · Nat. Comput.
- 2021

The classical algorithm is optimal up to a log factor, and the quantum algorithm shows a speed-up compared to any classical algorithm in the case of non-constant string lengths in the dictionary.

### Representing higher-order dependencies in networks

- Computer Science · Science Advances
- 2016

The higher-order network (HON) representation is proposed; its advantages include accuracy, scalability, and direct compatibility with the existing suite of network analysis methods, and the paper illustrates how HON can be applied to a broad variety of tasks, such as random walking, clustering, and ranking.

### Distributions of pattern statistics in sparse Markov models

- Computer Science, Mathematics · Annals of the Institute of Statistical Mathematics
- 2019

A method for efficiently computing pattern distributions through Markov chains with minimal state spaces is extended to the sparse Markov framework, which better handles the trade-off between the bias of having too few model parameters and the variance of having too many.

### High-Order Entropy-Based Population Diversity Measures in the Traveling Salesman Problem

- Computer Science · Evolutionary Computation
- 2020

Three types of population diversity measures that account for high-order dependencies between variables are proposed, in order to investigate how effective it is to consider such dependencies.

### A boosting method with asymmetric mislabeling probabilities which depend on covariates

- Computer Science · Comput. Stat.
- 2012

A new boosting method for a kind of noisy data is developed, where the probability of mislabeling depends on the label of a case. The mechanism of the model is based on a simple idea and gives…

### Representing Big Data as Networks: New Methods and Insights

- Computer Science · ArXiv
- 2017

This dissertation proposes the higher-order network, a critical piece for representing higher-order interaction data; it introduces a scalable algorithm for building the network and visualization tools for interactive exploration, and it presents broad real-world applications of the higher-order network.

### Measuring the Efficiency of the Intraday Forex Market with a Universal Data Compression Algorithm

- Computer Science
- 2009

A universal variable order Markov (VOM) model is presented and used to test the weak form of the Efficient Market Hypothesis; the Forex market turns out to be efficient, at least most of the time.

## References

Showing 10 of 35 references.

### ESTScan: A Program for Detecting, Evaluating, and Reconstructing Potential Coding Regions in EST Sequences

- Biology · ISMB
- 1999

It is shown that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and is able to accurately correct frameshift errors.

### Modeling sequencing errors by combining Hidden Markov models

- Biology, Computer Science · ECCB
- 2003

This research improves the detection of translation start and stop sites by integrating a more complex mRNA model with codon usage bias based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs.

### A VOM based gene-finder that specializes in short genes

- Engineering · 2004 23rd IEEE Convention of Electrical and Electronics Engineers in Israel
- 2004

The proposed VOM gene-finder outperforms traditional gene-finders that are based on fifth-order Markov models for short newly sequenced bacterial genomes.

### Interpolated markov chains for eukaryotic promoter recognition

- Biology · Bioinform.
- 1999

A new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes based on three interpolated Markov chains of different order which are trained on coding, non-coding and promoter sequences is described.
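An interpolated Markov chain mixes probability estimates from several Markov orders instead of committing to one. The Python sketch below shows that idea together with the "one chain per sequence class" scheme described above. The fixed, equal mixing weights, the Laplace smoothing, the names `InterpolatedMarkovChain` and `classify`, and the toy training strings are all assumptions; the cited work's weighting scheme and chain orders are not reproduced.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


class InterpolatedMarkovChain:
    """Linear interpolation of Markov estimates of order 0..max_order.

    Fixed mixing weights are an illustrative assumption; interpolated Markov
    models in the literature usually derive their weights from training data.
    """

    def __init__(self, max_order=2, weights=None):
        self.max_order = max_order
        self.weights = weights or [1.0] * (max_order + 1)
        # counts[k][context][symbol] for contexts of length k
        self.counts = [defaultdict(Counter) for _ in range(max_order + 1)]

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                for k in range(min(i, self.max_order) + 1):
                    self.counts[k][seq[i - k:i]][sym] += 1
        return self

    def _order_prob(self, k, context, sym):
        c = self.counts[k].get(context, Counter())
        # Laplace-smoothed order-k estimate.
        return (c[sym] + 1) / (sum(c.values()) + len(ALPHABET))

    def prob(self, history, sym):
        # Only orders whose full context fits in the history take part;
        # their weights are renormalised to sum to 1.
        orders = range(min(len(history), self.max_order) + 1)
        total_w = sum(self.weights[k] for k in orders)
        return sum(
            self.weights[k] / total_w
            * self._order_prob(k, history[len(history) - k:], sym)
            for k in orders
        )

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(seq[:i], s)) for i, s in enumerate(seq))


def classify(seq, models):
    """Return the label of the chain that gives seq the highest likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(seq))


# Toy usage: one chain per class, each with its own (hypothetical) order.
models = {
    "coding": InterpolatedMarkovChain(max_order=2).fit(["GCCGCCGCCGCCGCC"]),
    "noncoding": InterpolatedMarkovChain(max_order=1).fit(["ATATATATATATATA"]),
    "promoter": InterpolatedMarkovChain(max_order=3).fit(["TATAAATATAAATATAAA"]),
}
print(classify("GCCGCCGCC", models))  # coding
```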

### ExonHunter: a comprehensive approach to gene finding

- Computer Science · ISMB
- 2005

ExonHunter is a new, comprehensive gene-finding system that outperforms existing systems and introduces several new ideas and approaches, including a new method for modeling the length distribution of intergenic regions in hidden Markov models.

### DIANA-EST: a statistical analysis

- Biology · Bioinform.
- 2001

The goal of this work is the development of a new program called DNA Intelligent Analysis for ESTs (DIANA-EST) based on a combination of Artificial Neural Networks and statistics for the characterization of the coding regions within ESTs and the reconstruction of the encoded protein.

### Finding borders between coding and noncoding DNA regions by an entropic segmentation method.

- Computer Science · Physical Review Letters
- 2000

The method is found to be highly accurate in finding borders between coding and noncoding regions, and it requires no prior training on known data sets.
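Entropic segmentation is commonly built on the Jensen-Shannon divergence: a sequence is cut where the composition of the two halves differs most. The Python sketch below finds that single best cut on plain nucleotide composition. The alphabet used in the cited work, its statistical significance threshold, and its recursive application to the resulting segments are not reproduced; `best_split` and `shannon_entropy` are hypothetical helper names.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (bits) of a Counter of symbol counts."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0)


def best_split(seq):
    """Return (i, D) for the cut point i maximising the Jensen-Shannon divergence
    D = H(S) - (i/n) H(S[:i]) - ((n - i)/n) H(S[i:])."""
    n = len(seq)
    total = Counter(seq)
    h_total = shannon_entropy(total)
    left = Counter()
    best_i, best_d = None, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1   # left now holds the composition of seq[:i]
        right = total - left    # Counter subtraction drops zero counts
        d = (h_total
             - (i / n) * shannon_entropy(left)
             - ((n - i) / n) * shannon_entropy(right))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


print(best_split("A" * 50 + "C" * 50))  # (50, 1.0)
```

Because no model is fitted in advance, the criterion depends only on the sequence being segmented, which is consistent with the "no prior training" property noted above.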

### Variations on probabilistic suffix trees: statistical modeling and prediction of protein families

- Computer Science, Biology · Bioinform.
- 2001

Exhaustive evaluations show that the PST model detects much more related sequences than pairwise methods such as Gapped-BLAST, and is almost as sensitive as a hidden Markov model that is trained from a multiple alignment of the input sequences, while being much faster.

### Assessment of protein coding measures.

- Computer Science · Nucleic Acids Research
- 1992

This paper reviews and synthesizes the coding measures underlying published algorithms and concludes that a very simple and obvious measure, counting oligomers, is more effective than any of the more sophisticated measures.
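One common form of oligomer counting is an in-frame hexamer log-likelihood ratio: hexamers read at codon-aligned positions are scored by how much more frequent they are in coding than in noncoding training data. The Python sketch below is an illustration under that assumption, not the exact measure defined in the review; the pseudocount, the helper names (`hexamer_table`, `hexamer_score`, `best_frame`), and the toy training strings are hypothetical.

```python
import math
from collections import Counter

VOCAB = 4 ** 6  # number of possible DNA hexamers


def inframe_hexamers(seq):
    """Yield hexamers starting at positions 0, 3, 6, ... (codon-aligned)."""
    for i in range(0, len(seq) - 5, 3):
        yield seq[i:i + 6]


def hexamer_table(sequences):
    """Count in-frame hexamers over a set of training sequences."""
    return Counter(h for s in sequences for h in inframe_hexamers(s))


def hexamer_score(seq, coding, noncoding, pseudo=1.0):
    """Sum of log(P_coding(h) / P_noncoding(h)) over in-frame hexamers."""
    n_c = sum(coding.values())
    n_n = sum(noncoding.values())
    score = 0.0
    for h in inframe_hexamers(seq):
        p_c = (coding[h] + pseudo) / (n_c + pseudo * VOCAB)
        p_n = (noncoding[h] + pseudo) / (n_n + pseudo * VOCAB)
        score += math.log(p_c / p_n)
    return score


def best_frame(seq, coding, noncoding):
    """Score the three forward reading frames and return (frame, score)."""
    scores = [(f, hexamer_score(seq[f:], coding, noncoding)) for f in range(3)]
    return max(scores, key=lambda fs: fs[1])


coding = hexamer_table(["GCC" * 20])
noncoding = hexamer_table(["ATA" * 20])
print(best_frame("AGCCGCCGCCGCC", coding, noncoding)[0])  # 1
```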

### EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance

- Biology · BMC Bioinformatics
- 2002

EasyGene is a new automated gene-finding method that estimates the statistical significance of a predicted gene based on a hidden Markov model (HMM) that is automatically estimated for a new genome.