Corpus ID: 232035603

Multichannel LSTM-CNN for Telugu Technical Domain Identification

Sunil Gundapu and R. Mamidi
With the rapid growth of text information, retrieving domain-oriented information from text data has a broad range of applications in Information Retrieval and Natural Language Processing. Thematic keywords give a compressed representation of the text. Domain Identification plays a significant role in Machine Translation, Text Summarization, Question Answering, Information Extraction, and Sentiment Analysis. In this paper, we propose a Multichannel LSTM-CNN methodology… 
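The CNN side of a multichannel LSTM-CNN can be illustrated with a single filter: a 1-D convolution slides over the token-embedding sequence and max-over-time pooling keeps the strongest response. The sketch below is didactic and not the paper's implementation (real channels use many filters of several widths, plus LSTM-encoded inputs, and learned weights):

```python
def conv_maxpool(embeddings, kernel, bias=0.0):
    """Apply one 1-D convolution filter over a token-embedding sequence,
    then max-over-time pooling -- the CNN building block of one channel
    in a multichannel LSTM-CNN (didactic sketch; weights are given,
    not learned)."""
    width = len(kernel)   # filter width in tokens
    dim = len(kernel[0])  # embedding dimension
    feats = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        s = sum(window[i][j] * kernel[i][j]
                for i in range(width) for j in range(dim))
        feats.append(max(0.0, s + bias))  # ReLU activation
    return max(feats)                     # max-over-time pooling
```

Each filter yields one scalar feature per sentence; concatenating the features from all filters across channels gives the fixed-length vector fed to the classifier.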


Machine Learning on Wikipedia Text for the Automatic Identification of Vocational Domains of Significance for Displaced Communities

Despite their educational level and professional qualifications, a significant percentage of highly skilled migrants and refugees find employment in low-skill vocations throughout the world.

Efficient English Translation Method and Analysis Based on the Hybrid Neural Network

A hybrid neural network is proposed that combines a convolutional neural network (CNN) and long short-term memory (LSTM) and introduces an attention mechanism based on the encoder-decoder structure to improve translation accuracy, especially for long sentences.



Multichannel CNN with Attention for Text Classification

Attention-based Multichannel Convolutional Neural Network (AMCNN) is proposed for text classification, which utilizes a bi-directional long short-term memory to encode the history and future information of words into high dimensional representations, so that the information of both the front and back of the sentence can be fully expressed.
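The attention step in such models can be reduced to softmax pooling over a sequence of hidden-state vectors. The sketch below assumes the relevance scores are given; in AMCNN they would come from a learned scoring function:

```python
import math

def attention_pool(states, scores):
    """Softmax-attention pooling over hidden-state vectors (minimal
    sketch of the attention idea; `scores` are assumed precomputed,
    not learned). Returns the weighted sum and the attention weights."""
    m = max(scores)                              # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over positions
    dim = len(states[0])
    pooled = [sum(w * s[j] for w, s in zip(weights, states))
              for j in range(dim)]
    return pooled, weights
```

With equal scores this reduces to mean pooling; higher scores concentrate the pooled vector on the corresponding positions.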

Generalization and network design strategies

Telugu Text Categorization using Language Models

This paper proposes language-dependent and language-independent models for categorizing Telugu documents, using a variant of the k-nearest neighbors algorithm for the categorization process.
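A language-independent k-NN text categorizer can be sketched with bag-of-words vectors and cosine similarity. This is a minimal illustration, not the paper's variant; whitespace tokenization is an assumption here (real Telugu pipelines need proper tokenization):

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(doc, train, k=3):
    """Assign `doc` the majority label among its k most similar training
    documents (simple k-NN sketch; `train` is a list of (text, label)
    pairs and tokenization is plain whitespace splitting)."""
    q = Counter(doc.split())
    ranked = sorted(train,
                    key=lambda tl: cosine(q, Counter(tl[0].split())),
                    reverse=True)
    top = [label for _, label in ranked[:k]]
    return Counter(top).most_common(1)[0][0]
```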

Learning Word Vectors for 157 Languages

This paper describes how high quality word representations for 157 languages were trained on the free online encyclopedia Wikipedia and data from the common crawl project, and introduces three new word analogy datasets to evaluate these word vectors.

Indian Language Text Representation and Categorization Using Supervised Learning Algorithm

The objective of the work is the representation and categorization of Indian-language text documents using text mining techniques, applying a naive Bayes classifier, a k-nearest-neighbor classifier, and a decision tree for text categorization.

Enriching Word Vectors with Subword Information

A new approach is proposed based on the skip-gram model, where each word is represented as a bag of character n-grams and word vectors are the sum of these n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
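The bag-of-character-n-grams idea can be shown in a few lines. Below is a hedged sketch of fastText-style n-gram extraction (default n-gram range 3-6 mirrors the reference implementation; the function name is ours):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams in the subword (fastText-style) manner.
    The word is wrapped in boundary markers '<' and '>' so prefixes and
    suffixes are distinguished from word-internal n-grams; the whole
    wrapped word is also kept as its own feature."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    grams.append(wrapped)
    return grams
```

A word's vector is then the sum of the vectors of its n-grams, which lets the model build representations for words unseen during training — useful for a morphologically rich language such as Telugu.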

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

An attention-based model that automatically learns to describe the content of images is introduced; it can be trained deterministically using standard backpropagation techniques and stochastically by maximizing a variational lower bound.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
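The gating that enforces constant error flow can be seen in a single scalar LSTM cell step. This is a didactic sketch with hand-supplied weights, not the original formulation's training procedure:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell (didactic sketch). `w` maps each
    gate name to (input weight, recurrent weight, bias)."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate
    # Additive cell update: the "constant error carousel" that lets
    # gradients flow across long time lags.
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

Because the cell state is updated additively rather than by repeated multiplication, the error signal is not forced to vanish or explode over long sequences.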

Ontology Based Text Categorization - Telugu Documents

A new method of ontology based text classification for Telugu documents and retrieval system is introduced which effectively discriminates between non-important terms with respect to sentence semantics and terms which hold the concepts that represent the sentence meaning.

SMOTE: Synthetic Minority Over-sampling Technique

A combination of over-sampling the minority (abnormal) class and under-sampling the majority class can achieve better classifier performance (in ROC space); these methods are evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
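The core of SMOTE — synthesizing a new minority example by interpolating between a minority point and one of its k nearest minority neighbours — fits in a short sketch. This is a minimal illustration under simplifying assumptions (one sample at a time, brute-force neighbour search), not a full SMOTE implementation:

```python
import random

def smote_sample(minority, k=3, rng=None):
    """Generate one synthetic minority example: pick a random minority
    point, find its k nearest minority neighbours (brute force), and
    interpolate a random fraction of the way toward one of them.
    `minority` is a list of equal-length numeric tuples."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
```

Because the synthetic point lies on the segment between two real minority examples, it stays inside the minority region rather than simply duplicating existing points as plain over-sampling would.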