Corpus ID: 207847524

Char-RNN and Active Learning for Hashtag Segmentation

T. Glushkova and E. Artemova
We explore the ability of a character-level recurrent neural network (char-RNN) to segment hashtags. To avoid any manual annotation, we generate a synthetic training dataset from frequent n-grams that satisfy predefined morpho-syntactic patterns. An active learning strategy limits the size of the training dataset by selecting an informative training subset. The approach does not require any language-specific settings and is compared for two languages, which…
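
The two ideas in the abstract, synthetic training data built from n-grams and selection of an informative subset, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `make_example`, `segment`, and `least_confident` are hypothetical, the per-character boundary labeling is one common way to frame segmentation, and uncertainty sampling stands in for the active learning strategy, which the abstract does not specify.

```python
from typing import List, Tuple


def make_example(ngram: List[str]) -> Tuple[str, List[int]]:
    """Glue the words of an n-gram into a pseudo-hashtag with per-character labels.

    Assumed labeling scheme: 1 marks a character that starts a new word
    (except the very first word), 0 marks every other character.
    """
    chars: List[str] = []
    labels: List[int] = []
    for word_index, word in enumerate(ngram):
        for char_index, ch in enumerate(word.lower()):
            chars.append(ch)
            labels.append(1 if word_index > 0 and char_index == 0 else 0)
    return "".join(chars), labels


def segment(hashtag: str, labels: List[int]) -> List[str]:
    """Split a hashtag at every position whose label is 1."""
    words: List[str] = []
    current = ""
    for ch, label in zip(hashtag, labels):
        if label == 1 and current:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


def least_confident(probabilities: List[List[float]], k: int) -> List[int]:
    """Uncertainty sampling (an assumed strategy, not taken from the paper).

    Each inner list holds a model's predicted boundary probabilities for one
    example. Examples whose probabilities lie closest to 0.5 on average are
    treated as the most informative; the indices of the k most uncertain
    examples are returned, most uncertain first.
    """
    def confidence(probs: List[float]) -> float:
        return sum(abs(p - 0.5) for p in probs) / max(len(probs), 1)

    ranked = sorted(range(len(probabilities)),
                    key=lambda i: confidence(probabilities[i]))
    return ranked[:k]
```

In this sketch a character-level model would be trained on the `(hashtag, labels)` pairs, and `least_confident` would choose which synthetic examples to add to the training subset in each round.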

Hashtag Segmentation: A Comparative Study Involving the Viterbi, Triangular Matrix and Word Breaker Algorithms

The Word Breaker algorithm, which can ascertain the meaningful tokens in the form of words, before proceeding with the segmentation of the remaining characters, is considered superior to both the Viterbi and Triangular Matrix algorithms, particularly when it comes to the detection of unknown words.

Research on automatic detection of text-oriented counterfactual statements

Experiments show that the proposed detection method can accurately detect and locate the cause and result in the counterfactual text.

ALRt: An Active Learning Framework for Irregularly Sampled Temporal Data

Neural Word Segmentation Learning for Chinese

A novel neural framework is proposed that thoroughly eliminates context windows and can utilize the complete segmentation history. It employs a gated combination neural network over characters to produce distributed representations of word candidates, which are then given to a long short-term memory (LSTM) language scoring model.

A Neural Architecture for Dialectal Arabic Segmentation

This paper shows how a neural segmenter can be trained on only 350 annotated tweets, without any normalization or use of lexical features or lexical resources.

Active Discriminative Text Representation Learning

It is argued that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations), in contrast to traditional AL approaches which specify higher level objectives.

Processing and Normalizing Hashtags

This work’s normalization scripts allow for the lexical consolidation and segmentation of hashtags, potentially leading to improved semantic classification in Twitter text.

A Gap-Based Framework for Chinese Word Segmentation via Very Deep Convolutional Networks

This paper proposes a gap-based framework for segmenting a given sentence, which outperforms the best character-based and word-based methods on 5 benchmarks without any further post-processing module (e.g. Conditional Random Fields) or beam search.

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

A novel neural network architecture is introduced that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN and CRF, thus making it applicable to a wide range of sequence labeling tasks.

Neural Networks Incorporating Dictionaries for Chinese Word Segmentation

This paper seeks to address the problem of incorporating dictionaries into neural networks for the Chinese word segmentation task and proposes two different methods that extend the bi-directional long short-term memory neural network to perform the task.

Fast and Accurate Neural Word Segmentation for Chinese

This paper presents a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the drawbacks of current neural models; it performs segmentation much faster and even more accurately than state-of-the-art neural models on Chinese benchmark datasets.

Learning Character-level Representations for Part-of-Speech Tagging

A deep neural network is proposed that learns character-level representations of words and associates them with the usual word representations to perform POS tagging, producing state-of-the-art POS taggers for two languages.

Towards Deep Semantic Analysis of Hashtags

A context-aware approach is proposed that segments hashtags and links the entities in them to knowledge base (KB) entries, based on the context within the tweet. It demonstrates the effectiveness of the technique in improving overall entity linking in tweets via the additional semantic information provided by segmenting and linking entities in a hashtag.