Corpus ID: 51969623

Unsupervised Learning of Sentence Representations Using Sequence Consistency

@article{Brahma2018UnsupervisedLO,
  title={Unsupervised Learning of Sentence Representations Using Sequence Consistency},
  author={Siddhartha Brahma},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.04217}
}
Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose ConsSent, a simple yet surprisingly powerful unsupervised method to learn such representations by enforcing consistency constraints on sequences of tokens. We consider two classes of such constraints -- constraints on a single sequence that forms a sentence, and constraints between two sequences that form a sentence when merged. We learn sentence encoders by training them to distinguish between consistent… 
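The excerpt does not spell out how consistent and inconsistent examples are constructed, nor which encoder architecture is used, so the following is only a minimal sketch of the general idea: train an encoder to classify token sequences as consistent (a real sentence) or inconsistent (here, a hypothetical perturbation by random token shuffling). The names ConsistencyModel and make_batch, the BiLSTM-with-max-pooling encoder, and the toy random-id corpus are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumptions, not the paper's exact setup): learn a sentence
# encoder by training it to separate consistent token sequences (real
# sentences) from inconsistent ones (here, randomly shuffled copies).
import random
import torch
import torch.nn as nn

class ConsistencyModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hid_dim, 1)  # consistent vs. inconsistent

    def encode(self, tokens):
        # Sentence representation: max-pool the BiLSTM states (one common choice).
        states, _ = self.encoder(self.embed(tokens))
        return states.max(dim=1).values

    def forward(self, tokens):
        return self.classifier(self.encode(tokens)).squeeze(-1)

def make_batch(sentences):
    """Pair each real (consistent) sentence with a shuffled (inconsistent) copy."""
    seqs, labels = [], []
    for s in sentences:
        seqs.append(s); labels.append(1.0)          # consistent
        perturbed = s[:]
        random.shuffle(perturbed)                   # illustrative perturbation
        seqs.append(perturbed); labels.append(0.0)  # inconsistent
    max_len = max(len(s) for s in seqs)
    padded = [s + [0] * (max_len - len(s)) for s in seqs]  # 0 = padding index
    return torch.tensor(padded), torch.tensor(labels)

if __name__ == "__main__":
    vocab_size = 1000
    # Toy corpus of random token ids; real training would tokenize unlabelled text.
    corpus = [[random.randint(1, vocab_size - 1) for _ in range(random.randint(5, 12))]
              for _ in range(256)]
    model = ConsistencyModel(vocab_size)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for step in range(10):
        tokens, labels = make_batch(random.sample(corpus, 32))
        loss = loss_fn(model(tokens), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # model.encode(tokens) then yields the sentence vectors used for transfer.
```

Once trained this way, only the encoder would be kept: the binary classification head is discarded and the pooled states serve as general-purpose sentence embeddings for downstream transfer tasks.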
7 Citations


Mining Discourse Markers for Unsupervised Sentence Representation Learning
TLDR: This work proposes a method to automatically discover sentence pairs with relevant discourse markers, and applies it to massive amounts of data, to use as supervision for learning transferable sentence embeddings.
On Losses for Modern Language Models
TLDR: It is shown that NSP is detrimental to training due to its context splitting and shallow semantic signal, and it is demonstrated that using multiple tasks in a multi-task pre-training framework provides better results than using any single auxiliary task.
Computational Linguistics: 16th International Conference of the Pacific Association for Computational Linguistics, PACLING 2019, Hanoi, Vietnam, October 11–13, 2019, Revised Selected Papers
TLDR: Revised selected papers including "Evaluating Vectors for Lexemes and Synsets Toward Expansion of Japanese WordNet" and "Context-Guided Self-supervised Relation Embeddings."
BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks
TLDR: This work pretrains a simple CNN using a GAN-style learning scheme and Wikipedia data and integrates it with standard TLMs, showing that on the GLUE tasks the combination of the pretrained CNN with ALBERT outperforms the original ALBERT and achieves performance similar to that of the SOTA.
SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders
TLDR: A training regime is proposed for encoders that can effectively handle word sequences of various kinds, including complete sentences as well as incomplete sentences and phrases.
Combining neural and knowledge-based approaches to Named Entity Recognition in Polish
TLDR: This work shows that combining a neural NER model and an entity linking model with a knowledge base is more effective in recognizing named entities than using the NER model alone.

References

Showing 1-10 of 25 references
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
TLDR: It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
An efficient framework for learning sentence representations
TLDR: This work reformulates the problem of predicting the context in which a sentence appears as a classification problem, and proposes a simple and efficient framework for learning sentence representations from unlabelled data.
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
TLDR: This work presents a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model and demonstrates that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the…
DisSent: Sentence Representation Learning from Explicit Discourse Relations
TLDR: It is demonstrated that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT.
Fake Sentence Detection as a Training Task for Sentence Encoding
TLDR: It is found that the BiLSTM trains much faster on fake sentence detection using smaller amounts of data (1M instead of 64M sentences) and the learned representations capture many syntactic and semantic properties expected from good sentence representations.
Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features
TLDR: This work presents a simple but efficient unsupervised objective to train distributed representations of sentences, which outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
A large annotated corpus for learning natural language inference
TLDR: The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
Learning Distributed Representations of Sentences from Unlabelled Data
TLDR: A systematic comparison of models that learn distributed phrase or sentence representations from unlabelled data finds that the optimal approach depends critically on the intended application.
Distributed Representations of Words and Phrases and their Compositionality
TLDR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.