Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

@article{Gupta2017DilatedCF,
  title={Dilated Convolutions for Modeling Long-Distance Genomic Dependencies},
  author={Ankit Gupta and Alexander M. Rush},
  journal={bioRxiv},
  year={2017}
}
We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA’s 3-dimensional conformation. To study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions using dilated convolutional neural networks, and compare them to…
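As a rough illustration of why dilation helps with long-distance context, the sketch below computes the receptive field of a stack of stride-1 1D convolutions and applies a single dilated filter to a toy signal. The kernel size of 3, the doubling dilation schedule, and the helper names `receptive_field` and `dilated_conv1d` are assumptions chosen for illustration; the abstract does not state the authors' actual architecture.

```python
def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D cross-correlation with `dilation` positions between kernel taps."""
    span = (len(kernel) - 1) * dilation  # distance from first to last tap
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Hypothetical 4-layer stacks with kernel size 3 (not the paper's settings).
dense = receptive_field(3, [1, 1, 1, 1])    # 9 positions
dilated = receptive_field(3, [1, 2, 4, 8])  # 31 positions

# One filter with dilation 2 sums positions i, i + 2, i + 4.
out = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], 2)  # [9, 12, 15]
```

With the same number of layers and parameters, the doubling schedule covers 31 input positions instead of 9, which is the property that makes dilated stacks attractive for long DNA contexts.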
Projection layers improve deep learning models of regulatory DNA function
TLDR
Analysis of the learned projection weights shows that the inclusion of this layer simplifies the network’s internal representation of the occurrence of motifs, notably by projecting features representing forward and reverse-complement motifs to similar positions in the lower dimensional feature space output by the layer.
Accelerating Protein Design Using Autoregressive Generative Models
TLDR
This work borrows from recent advances in natural language processing and speech synthesis to develop a generative deep neural network-powered autoregressive model for biological sequences that captures functional constraints without relying on an explicit alignment structure.
Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
TLDR
DeepARC, an accurate and interpretable attention-based hybrid approach that combines a convolutional neural network and a recurrent neural network to predict TFBS, is developed; it captures more information about the motif and makes motif searching interpretable through the attention weight graph.
DeepARC: An Attention-based Hybrid Model for Predicting Transcription Factor Binding Sites from Positional Embedded DNA Sequence
  • Jia-li Chen, L. Deng
  • Biology, Computer Science
    2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
  • 2020
TLDR
A position-based embedding strategy embeds a DNA sequence into a matrix of distributed representations containing position information; these representations are then fed into a CNN-BiLSTM-Attention-based framework to classify whether a sequence contains a TFBS.
PIPENN: Protein Interface Prediction with an Ensemble of Neural Nets
TLDR
PIPENN’s ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on all interaction types; the work shows that no single DL architecture performs best on all instances, but that an ensemble of DL architectures consistently achieves peak prediction performance.
Modeling Genome Data Using Bidirectional LSTM
  • Neda Tavakoli
  • Computer Science
    2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC)
  • 2019
TLDR
This paper proposes to use deep bidirectional LSTM for sequence modeling as an approach to perform locality-sensitive hashing (LSH)-based sequence alignment and shows that the introduced LSTM-based model achieves higher accuracy as the number of epochs increases.
Learning the Regulatory Code of Gene Expression
TLDR
The approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance.
Protein Design and Variant Prediction Using Autoregressive Generative Models
TLDR
This work introduces a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments, and successfully designs and tests a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library.
Deep SNP: An End-to-end Deep Neural Network with Attention-based Localization for Break-point Detection in SNP Array Genomic data
TLDR
It is shown that Deep SNP is capable of successfully predicting the presence or absence of a breakpoint in large genomic windows and outperforms state-of-the-art neural network models.

References

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences
TLDR
The DanQ model, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting noncoding function de novo from sequence, improves considerably upon other models across several metrics.
Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
TLDR
This work shows that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery.
Predicting effects of noncoding variants with deep learning–based sequence model
TLDR
A deep learning–based algorithmic framework, DeepSEA, is developed that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity and improving prioritization of functional variants.
Fast and Accurate Entity Recognition with Iterated Dilated Convolutions
TLDR
Iterated Dilated Convolutional Neural Networks (ID-CNNs), which have better capacity than traditional CNNs for large context and structured prediction, are proposed, which are more accurate than Bi-LSTM-CRFs while attaining 8x faster test time speeds.
An Integrated Encyclopedia of DNA Elements in the Human Genome
TLDR
The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Fast and Accurate Sequence Labeling with Iterated Dilated Convolutions
TLDR
Iterated dilated convolutional neural networks (ID-CNNs), which have better capacity than traditional CNNs for large context and structured prediction, are proposed, which are not only more accurate than Bi-LSTM-CRFs, but also 8x faster at test time on long sequences.
Expanding the ‘central dogma’: the regulatory role of nonprotein coding genes and implications for the genetic liability to schizophrenia
TLDR
This work discusses the possibility that ncRNA regulation of schizophrenia risk genes may underlie the diverse findings of genetic linkage studies, including the finding that protein-altering gene polymorphisms are not generally found in schizophrenia.
Multi-Scale Context Aggregation by Dilated Convolutions
TLDR
This work develops a new convolutional network module that is specifically designed for dense prediction, and shows that the presented context module increases the accuracy of state-of-the-art semantic segmentation systems.