André Yoshiaki Kashiwabara

EGene is a generic, flexible and modular pipeline generation system that makes pipeline construction a modular job. EGene allows third-party programs to be used and integrated according to the needs of distinct projects, without requiring any previous programming or formal language experience. EGene comes with CoEd, a visual tool to…
This paper presents a novel approach to the problem of splice site prediction by applying stochastic grammar inference. We used four grammar inference algorithms to infer 1465 grammars, and used 10-fold cross-validation to select the best grammar for each algorithm. The corresponding grammars were embedded into a classifier and used to run splice site…
BACKGROUND: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide when a given probability is sufficient is Bayesian binary classification, where the probability of the model characterizing the sequence family of interest is compared to that of an alternative…
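The comparison described in this abstract can be illustrated with a log-odds score. The following is only a minimal sketch, not the method of the paper: it assumes two hypothetical independent-symbol models (`gc_rich` and `uniform`, with made-up probabilities) and a threshold of 0, which corresponds to equal prior probabilities for the two classes.

```python
import math

def log_likelihood(seq, probs):
    """Log-probability of seq under a model where symbols are independent."""
    return sum(math.log(probs[ch]) for ch in seq)

def log_odds(seq, family, background):
    """Log of P(seq | family model) / P(seq | background model)."""
    return log_likelihood(seq, family) - log_likelihood(seq, background)

def classify(seq, family, background, threshold=0.0):
    """Accept seq as a family member when the log-odds exceeds the threshold."""
    return log_odds(seq, family, background) > threshold

# Hypothetical parameters, chosen only for illustration.
gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

if __name__ == "__main__":
    for s in ("GCGCGC", "ATATAT"):
        print(s, classify(s, gc_rich, uniform))
```

Working in log space avoids numerical underflow on long sequences, and raising the threshold makes the classifier more conservative about accepting a sequence.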
The identification of transcription factor binding sites (TFBS), also called motifs, in DNA sequences is the first step to understanding how gene regulation works. Recognizing these patterns in the promoter regions of co-expressed genes is key to this. Although there are several algorithms for this purpose, the problem is…
The development of new genomic sequencing techniques has led to the generation of a huge volume of biological data. In this context, it is important to develop new pattern recognition methods and improve their accuracy in order to support the analysis of this huge volume of data. In particular, a valuable piece of information in genomic sequences is their nucleotides…
This work presents a new approach for the classification of genomic sequences from measurements of complex networks and information theory. The approach considers the nucleotides, dinucleotides and trinucleotides of a genomic sequence. For each of them, the entropy, sum entropy and maximum entropy values are calculated. For each of them, the method also generates a…
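As an illustration of the entropy measurement mentioned in this abstract, the sketch below computes the Shannon entropy of the nucleotide, dinucleotide and trinucleotide distributions of a sequence. It rests on assumptions not stated in the abstract: k-mers are counted with overlap, entropy is measured in bits, and the function name `kmer_entropy` is hypothetical. The sum entropy and maximum entropy values are not covered.

```python
import math
from collections import Counter

def kmer_entropy(seq, k):
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

if __name__ == "__main__":
    sequence = "ACGTACGTTTGACA"  # made-up example sequence
    for k in (1, 2, 3):  # nucleotides, dinucleotides, trinucleotides
        print(k, kmer_entropy(sequence, k))
```

The three values could then serve as features of a sequence alongside other measurements.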
Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by…
  • A Gruber, Ahagon, +25 authors D J Gapped
  • 2003
DNA reads generated by large-scale sequencing projects have to be processed before further analyses in order to perform vector/primer masking, low-quality trimming and contaminant removal. This sequential processing involves several steps and the use of different computer programs, each one following its own calling convention and input/output formats. As a…