PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets
@article{Deshpande2019PLITAA,
  title={PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets},
  author={Sumukh Deshpande and James Shuttleworth and Jianhua Yang and Sandy Taramonli and Matthew England},
  journal={Computers in Biology and Medicine},
  year={2019},
  volume={105},
  pages={169--181}
}
22 Citations
Computational methods for annotation of plant regulatory non-coding RNAs using RNA-seq
- Biology · Briefings Bioinform.
- 2021
This review discusses major plant endogenous, regulatory ncRNAs in an RNA sample followed by computational strategies applied to discover each class of ncRNAs using RNA-seq, to present a comprehensive bioinformatics toolbox for plant ncRNA researchers.
Long Non-coding RNA for Plants Using Big Data Analytics—A Review
- Biology
- 2019
The role of emergent systems and databases for storing plant lncRNA data is presented, along with the importance of Big Data analytics for data storage and of machine learning algorithms for implementation.
Systematic and computational identification of Androctonus crassicauda long non-coding RNAs
- Biology · Scientific reports
- 2021
A stringent step-by-step filtering pipeline and machine learning-based tools were used to identify the specific Androctonus crassicauda lncRNAs and analyze the features of predicted scorpion lncRNAs, uncovering that lower protein-coding potential, lower GC content, shorter transcript length, and fewer isoforms per gene are outstanding features of A. crassicauda transcripts.
PtLnc-BXE: Prediction of plant lncRNAs using a Bagging-XGBoost-ensemble method with multiple features
- Biology · ArXiv
- 2019
A plant lncRNA prediction approach, PtLnc-BXE, is presented, which combines multiple sequence features in two steps to develop an ensemble model and outperforms other state-of-the-art plant lncRNA prediction methods, achieving higher AUC on the benchmark datasets.
Biogenesis, Functions, Interactions, and Resources of Non-Coding RNAs in Plants
- Biology · International journal of molecular sciences
- 2022
The biogenesis, biological functions, and interactions with DNA, RNA, protein, and microorganisms of five major regulatory ncRNAs in plants are described, and tools for analysis and prediction of plant ncRNAs are summarized, as well as databases.
Common Features in lncRNA Annotation and Classification: A Survey
- Biology · Non-coding RNA
- 2021
It is found that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences, and the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
The computational approaches of lncRNA identification based on coding potential: Status quo and challenges
- Biology · Computational and structural biotechnology journal
- 2020
ItLnc-BXE: A Bagging-XGBoost-Ensemble Method With Comprehensive Sequence Features for Identification of Plant lncRNAs
- Biology · IEEE Access
- 2020
The results show that ItLnc-BXE outperforms other state-of-the-art plant lncRNA identification methods, achieving better and robust performance, and the results indicate that dicots-based and monocot-based models can be used to accurately identify lncRNAs in lower plant species, such as mosses and algae.
Multi-feature fusion for deep learning to predict plant lncRNA-protein interaction.
- Computer Science · Genomics
- 2020
References
SHOWING 1-10 OF 53 REFERENCES
PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme
- Biology · BMC Bioinformatics
- 2013
PLEK is an efficient alignment-free computational tool to distinguish lncRNAs from mRNAs in RNA-seq transcriptomes of species lacking reference genomes and is especially suitable for PacBio or 454 sequencing data and large-scale transcriptome data.
lncScore: alignment-free identification of long noncoding RNA from assembled novel transcripts
- Biology · Scientific reports
- 2016
Compared to other state-of-the-art alignment-free tools (e.g. CPAT, CNCI, and PLEK), lncScore outperforms them on accurately distinguishing lncRNAs from mRNAs, especially partial-length mRNAs in the human and mouse datasets.
CANTATAdb: A Collection of Plant Long Non-Coding RNAs
- Biology · Plant & cell physiology
- 2016
An online database of lncRNAs in 10 model plant species is created and their potential roles in splicing modulation and deregulation of microRNA functions are investigated to better characterize them.
lncRNA-MFDL: identification of human long non-coding RNAs by fusing multiple features and using deep learning.
- Biology · Molecular bioSystems
- 2015
A powerful predictor to identify lncRNAs by fusing multiple features of the open reading frame, k-mer, the secondary structure and the most-like coding domain sequence and using deep learning classification algorithms is developed, showing that lncRNA-MFDL is a powerful tool for identifying lncRNAs.
PredcircRNA: computational classification of circular RNA from other long non-coding RNA using hybrid features.
- Biology, Computer Science · Molecular bioSystems
- 2015
This study presented a machine learning approach, named PredcircRNA, focused on distinguishing circular RNA from other lncRNAs using multiple kernel learning, and showed that the proposed method can classify circular RNA from other types of lncRNAs with an accuracy of 0.778.
NONCODE 2016: an informative and valuable data source of long non-coding RNAs
- Biology · Nucleic Acids Res.
- 2016
In this update, NONCODE has added six new species, bringing the total to 16 species altogether and introduced three important new features: conservation annotation; the relationships between lncRNAs and diseases; and an interface to choose high-quality datasets through predicted scores, literature support and long-read sequencing method support.
Long Non-coding RNAs and Their Biological Roles in Plants
- Biology · Genom. Proteom. Bioinform.
- 2015
GENCODE: the reference human genome annotation for The ENCODE Project.
- Biology, Computer Science · Genome research
- 2012
This work has examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.
Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
- Biology · Nucleic acids research
- 2013
The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan.
CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model
- Biology, Computer Science · Nucleic acids research
- 2013
A novel alignment-free method, Coding Potential Assessment Tool (CPAT), is presented that rapidly recognizes coding and noncoding transcripts from a large pool of candidates and is approximately four orders of magnitude faster than Coding-Potential Calculator and Phylo Codon Substitution Frequencies.