Michael G. Sadovsky

  • Citations Per Year
Learn More
A new method to compare two (or several) symbol sequences is developed. The method is based on the comparison of the frequencies of the small fragments of the compared sequences; it requires neither string editing, nor other transformations of the compared objects. The comparison is executed through a calculation of the specific entropy of a frequency(More)
The information capacity of nucleotide sequences is defined through the calculation of specific entropy of their frequency dictionary. The specificentropy of the frequency dictionary is calculated against the reconstructeddictionary; this latter bears the most probable continuations of the shorterstrings. This developed measure allows to distinguish the(More)
The information capacity of nucleotide sequences is defined through the specific entropy of frequency dictionary of a sequence determined with respect to another one containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one, and from ordered entity. A comparison of sequences based on their(More)
Information capacity of nucleotide sequences measures the unexpectedness of a continuation of a given string of nucleotides, thus having a sound relation to a variety of biological issues. A continuation is defined in a way maximizing the entropy of the ensemble of such continuations. The capacity is defined as a mutual entropy of real frequency dictionary(More)
Statistical analysis of biological macromolecules is an important tool of modern biophysics and bioinformatics. Let us call any sequence that the researcher regards as an entity (a gene, an exon or intron, a gene cluster, or a genome) a genetic text (GT). Researchers identify various structures within the GT; the method for identifying them is usually(More)