Corpus ID: 245986712

Multiple Genome Analytics Framework: The Case of All SARS-CoV-2 Complete Variants

@article{Xylogiannopoulos2022MultipleGA,
  title={Multiple Genome Analytics Framework: The Case of All SARS-CoV-2 Complete Variants},
  author={Konstantinos F. Xylogiannopoulos},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.05198}
}
Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The SARS-CoV-2 pandemic has made these problems more demanding: because of constant mutations, hundreds or thousands of new genome variants are discovered every week, and there is a pressing need for fast and accurate analyses. The requirement for computational tools for genomic analyses…

References

String Matching in DNA Databases

TLDR
Index-based matching: when a fixed string s is to be searched repeatedly, it is worthwhile to construct an index over s, such as a suffix tree, a suffix array or, more recently, one based on the Burrows-Wheeler transform (BWT).
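As a minimal sketch of the index-based approach, assuming a plain in-memory string and a deliberately naive construction, the Python below builds a suffix array over s once and then answers each query by binary search over the sorted suffixes. The names `build_suffix_array` and `find_occurrences` are hypothetical; a production index would use a linear-time construction or a BWT-based FM-index instead of sorting suffix slices.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions lexicographically by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of `pattern` using binary search over `sa`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix in [start, lo) begins with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ACG"))  # [0, 4, 9]
```

Once the array exists, each query costs O(m log n) character comparisons, which is why building the index pays off when s is searched many times.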

Evaluation and Improvement of Fast Algorithms for Exact Matching on Genome Sequences

TLDR
The most efficient recent solutions to the online exact matching problem, as applied to searching genome sequences, are reviewed, and some new variants of an efficient string matching algorithm are proposed.
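The reviewed paper's own variants are not reproduced here. As a hedged illustration of what an online exact matcher looks like, the sketch below implements the classic Boyer-Moore-Horspool algorithm, a swapped-in representative of this family: it compares each window right to left and uses a bad-character table to skip ahead. `horspool_search` is a hypothetical name.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Boyer-Moore-Horspool: return the start positions of every exact occurrence."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table built from the first m-1 pattern characters;
    # characters absent from the table allow a full shift of m.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the current window right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift according to the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

On a four-letter DNA alphabet the bad-character shifts tend to be short, because each character is likely to recur in the pattern; alphabet size is therefore a key factor when comparing online matchers on genome sequences.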

Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads

TLDR
Robust detection of human repeat expansions from careful alignments of long but error-prone reads to a reference genome is reported, which may help to elucidate the many genetic diseases whose causes remain unknown.
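The cited work detects expansions by aligning long, error-prone reads to a reference genome, which is not attempted here. As a much simpler, exact-matching illustration of what a tandem repeat is, the sketch below scans an error-free sequence for maximal runs of back-to-back copies of a known repeat unit (for example a CAG triplet). The function `tandem_runs` and its `min_copies` threshold are hypothetical.

```python
def tandem_runs(seq: str, unit: str, min_copies: int = 2) -> list[tuple[int, int]]:
    """Return (start, copies) for each maximal run of >= min_copies adjacent copies of `unit`.

    Exact matching only: any sequencing error breaks a run, unlike alignment-based tools.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    runs = []
    i = 0
    while i <= len(seq) - k:
        # Count consecutive copies of the unit starting at position i.
        copies = 0
        while seq.startswith(unit, i + copies * k):
            copies += 1
        if copies >= min_copies:
            runs.append((i, copies))
            i += copies * k  # skip past the whole run
        else:
            i += 1
    return runs
```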

Repeated patterns detection in big data using classification and parallelism on LERP Reduced Suffix Arrays

TLDR
The Probabilistic Existence of LERP theorem is proven, and a formula is introduced that accurately estimates an upper bound on the LERP value using only the length of the string and the size of its alphabet.
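A minimal sketch of repeated-pattern detection on a LERP-reduced suffix array, where LERP stands for Longest Expected Repeated Pattern: suffixes are truncated to `lerp` characters before sorting, and every prefix shared by two adjacent truncated suffixes is recorded as a repeat. The paper's upper-bound formula, its classification scheme and its parallelism are not reproduced; `lerp` is simply supplied by the caller, and `repeated_patterns` is a hypothetical name.

```python
from collections import defaultdict


def repeated_patterns(text: str, lerp: int) -> dict[str, list[int]]:
    """Return every substring of length 1..lerp occurring at least twice, with sorted positions.

    `lerp` is an assumed, caller-supplied bound; no estimate of it is computed here.
    """
    n = len(text)
    # LERP-reduced suffix array: sort start positions by their first `lerp` characters.
    sa = sorted(range(n), key=lambda i: text[i:i + lerp])
    found = defaultdict(set)
    for a, b in zip(sa, sa[1:]):
        # Longest common prefix of two adjacent truncated suffixes, capped at lerp.
        lcp = 0
        while lcp < lerp and a + lcp < n and b + lcp < n and text[a + lcp] == text[b + lcp]:
            lcp += 1
        # Every prefix of the shared part is a repeated pattern at both positions.
        for length in range(1, lcp + 1):
            found[text[a:a + length]].update((a, b))
    return {p: sorted(pos) for p, pos in found.items()}
```

Truncating suffixes at `lerp` characters bounds the memory spent per suffix, at the cost of ignoring repeats longer than `lerp`; an accurate upper-bound estimate of LERP is what makes that cut safe.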

Identification of common molecular subsequences

Analyzing very large time series using suffix arrays

TLDR
It is argued that MLERP is a very useful tool for detecting all repeated patterns in a time series regardless of its size and hardware limitations.

NR-grep: a fast and flexible pattern-matching tool

TLDR
NR-grep is a new pattern-matching tool designed for efficient search of complex patterns, based on a single, uniform concept: the bit-parallel simulation of a nondeterministic suffix automaton. It can find anything from simple patterns to regular expressions, either exactly or allowing errors in the matches.
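The bit-parallel suffix-automaton idea behind NR-grep is the BNDM (Backward Nondeterministic DAWG Matching) algorithm of Navarro and Raffinot. The sketch below covers only exact matching of a plain string; NR-grep's extensions to classes, regular expressions and approximate matching are omitted, and `bndm_search` is a hypothetical name. Python's unbounded integers stand in for the machine word, so pattern length is not limited here.

```python
def bndm_search(text: str, pattern: str) -> list[int]:
    """Exact BNDM search: return the start positions of every occurrence of pattern."""
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    mask = (1 << m) - 1
    # Bit (m - 1 - i) of B[c] is set wherever pattern[i] == c.
    B: dict[str, int] = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    matches = []
    pos = 0
    while pos <= n - m:
        j, last, D = m, m, mask
        # Read the window right to left; set bits of D mark pattern factors
        # that still match the characters read so far.
        while D:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & (1 << (m - 1)):       # characters read so far form a pattern prefix
                if j > 0:
                    last = j             # shift that would align the pattern with that prefix
                else:
                    matches.append(pos)  # whole window read: exact occurrence
            D = (D << 1) & mask
        pos += last
    return matches
```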

A Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity

TLDR
This study reveals a family of endonucleases that use dual-RNAs for site-specific DNA cleavage and highlights the potential to exploit the system for RNA-programmable genome editing.

Fast Pattern Matching in Strings

TLDR
An algorithm is presented that finds all occurrences of one given string within another in running time proportional to the sum of the lengths of the strings; a theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha\alpha^R\}^*$, can be recognized in linear time.
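The algorithm summarized here is Knuth-Morris-Pratt (KMP). In the minimal Python sketch below, a failure function records, for each pattern prefix, the length of its longest proper border (a prefix that is also a suffix), so the text pointer never moves backward and the total work is proportional to the sum of the two lengths. The function names are illustrative, and the palindrome-recognition application is not shown.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next character extends one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start positions of all occurrences of pattern in text in O(n + m) time."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    matches, k = [], 0  # k = number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return matches
```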

The exact string matching algorithms efficiency review

TLDR
It is concluded that suffix-automaton and hybrid algorithms are the fastest and make the fewest attempts, while hashing approaches make the fewest character comparisons.
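To illustrate the hashing family the review refers to, the sketch below uses the classic Rabin-Karp algorithm, a swapped-in example rather than any specific algorithm from the review. A rolling hash lets each window be checked in constant time, and characters are compared only when hashes agree, which is why such methods perform few comparisons. The `base` and `mod` values are arbitrary assumptions, and `rabin_karp` is a hypothetical name.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start positions of every occurrence of pattern using a rolling hash."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for pos in range(n - m + 1):
        # Equal hashes are only candidates; confirm with a direct comparison.
        if p_hash == t_hash and text[pos:pos + m] == pattern:
            matches.append(pos)
        if pos < n - m:
            # Roll the window: drop text[pos], append text[pos + m].
            t_hash = ((t_hash - ord(text[pos]) * high) * base + ord(text[pos + m])) % mod
    return matches
```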