Convolutional neural networks for classification of alignments of non-coding RNA sequences

@article{Aoki2018ConvolutionalNN,
  title={Convolutional neural networks for classification of alignments of non-coding RNA sequences},
  author={Genta Aoki and Yasubumi Sakakibara},
  journal={Bioinformatics},
  year={2018},
  volume={34},
  pages={i237--i244}
}
Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. Key Method: Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary‐structure information specific to ncRNAs and furthermore with mapping profiles of next‐generation…
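
The link between pairwise classification and clustering can be illustrated with a minimal Python sketch. It assumes a hypothetical pairwise predicate (`toy_classifier`, standing in for a trained classifier) and groups sequences by taking connected components over the pairs predicted positive. The connected-components step, the function names, and the toy labels are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def cluster_from_pairs(names, is_same_family):
    """Group items into clusters from pairwise positive/negative decisions.

    Assumption: a positive pair is treated as an edge, and clusters are the
    connected components of the resulting graph (union-find).
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if is_same_family(a, b):
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy data: family labels stand in for a trained pairwise classifier.
toy_labels = {
    "seq1": "tRNA",
    "seq2": "tRNA",
    "seq3": "snoRNA",
    "seq4": "snoRNA",
    "seq5": "miRNA",
}


def toy_classifier(a, b):
    # Positive class: both sequences belong to the same (toy) family.
    return toy_labels[a] == toy_labels[b]


clusters = cluster_from_pairs(list(toy_labels), toy_classifier)
print(clusters)
```

With a real classifier the pairwise decisions would be noisy, so a single false positive could merge two families under this scheme; a score-based hierarchical clustering would be one alternative.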

Deep forest ensemble learning for classification of alignments of non-coding RNA sequences based on multi-view structure representations

A novel deep fusion learning framework, the GcForest fusion method (GCFM), is proposed to classify alignments of ncRNA sequences for accurate clustering of ncRNAs; GCFM is also applied to construct a phylogenetic tree of ncRNAs and to predict the probability of interactions between RNAs.

Classification of Long Noncoding RNA Elements Using Deep Convolutional Neural Networks and Siamese Networks

An efficient approach to convert the RNA sequences into images characterizing their base-pairing probability is proposed, and this research also considers the folding potential of the ncRNAs in addition to their primary sequence.

Classification of Noncoding RNA Elements Using Deep Convolutional Neural Networks

An efficient approach to convert the RNA sequences into images characterizing their base-pairing probability is proposed, and classifying RNA sequences is converted to an image classification problem that can be efficiently solved by available CNN-based classification models.

Bacterial classification with convolutional neural networks based on different data reduction layers

Random projection with an activation function, which carries the data with reasonable variety and some randomness, is suggested in place of the pooling layers, and feature reduction is achieved while keeping high accuracy in classifying bacteria into taxonomic levels.

Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs

Different types of ncRNAs are discussed, computational approaches proposed in the last 10 years to distinguish coding-RNA from ncRNA are reviewed, and sub-cellular localization determination datasets are summarized to aid the development and evaluation of novel computational methodologies.

Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning

A pre-training algorithm is adopted for the effective embedding of RNA bases to acquire semantically rich representations and this algorithm is applied to two fundamental RNA sequence problems: structural alignment and clustering.

Informative RNA-base embedding for functional RNA structural alignment and clustering by deep representation learning

A pre-training algorithm is adopted for the effective embedding of RNA bases to acquire semantically rich representations, and it is used to achieve accuracy superior to that of existing state-of-the-art methods in RNA structural alignment and RNA family clustering tasks.

iSS-CNN: Identifying splicing sites using convolution neural network

DeepPPF: A deep learning framework for predicting protein family

References

Convolutional neural network architectures for predicting DNA–protein binding

A systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets is presented, finding that adding convolutional kernels to a network is important for motif-based tasks and creating a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures.

Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics

A new representation and feature extraction method for biological sequences that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction is introduced.

Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures

This work describes a new similarity measure for the hierarchical clustering of ncRNAs that keeps its high performance even when the sequence identity of family members is less than 60% and that approximates structural alignment in a simplified manner.

Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks

An open source package Basset is introduced to apply CNNs to learn the functional activity of DNA sequences from genomics data and offers a powerful computational approach to annotate and interpret the noncoding genome.

SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing

An algorithm termed SHARAKU is developed to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs to allow for the detection of common processing patterns of small derived RNA families expressed in the brain.

RNAscClust: clustering RNA sequences using structure conservation and graph based motifs

This work presents RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account, and shows that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments.

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

This work shows that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery.

Deep Motif: Visualizing Genomic Sequence Classifications

This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task and proposes an optimization driven strategy to extract "motifs", or symbolic patterns which visualize the positive class learned by the network.

GraphClust: alignment-free structural clustering of local RNA secondary structures

A novel linear-time, alignment-free method for comparing and clustering RNAs according to sequence and structure is presented; it is comparable to state-of-the-art sequence–structure methods while achieving speedups of several orders of magnitude.

DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition

DAFS, a novel algorithm that simultaneously aligns and folds RNA sequences based on maximizing expected accuracy of a predicted common secondary structure and its alignment, is developed and extended to consider pseudoknots in RNA structural alignments by integrating IPknot for predicting a pseudoknotted structure.