Corpus ID: 19168290

cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information

@inproceedings{Cao2018cw2vecLC,
  title={cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information},
  author={Shaosheng Cao and Wei Lu and Jun Zhou and Xiaolong Li},
  booktitle={AAAI},
  year={2018}
}
We propose cw2vec, a novel method for learning Chinese word embeddings. [...] Empirical results on the word similarity, word analogy, text classification, and named entity recognition tasks show that the proposed approach consistently outperforms state-of-the-art approaches such as word-based word2vec and GloVe, character-based CWE, component-based JWE, and pixel-based GWE.
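At its core, cw2vec reduces a word to the concatenated stroke sequence of its characters and slides an n-gram window over that sequence. A minimal Python sketch of the extraction step, assuming a stroke database that maps each character to its stroke codes (the two dictionary entries and the window sizes below are illustrative; the paper buckets strokes into five classes, encoded here as digits 1-5):

```python
# Stroke-code lookup: each character maps to its stroke-class sequence.
# Entries here are illustrative stand-ins for a real stroke database.
STROKES = {
    "大": "134",  # horizontal (1), left-falling (3), right-falling (4)
    "人": "34",   # left-falling (3), right-falling (4)
}

def stroke_ngrams(word, n_min=3, n_max=12):
    """Concatenate the stroke codes of every character in `word`,
    then collect all n-grams with n_min <= n <= n_max."""
    seq = "".join(STROKES[ch] for ch in word)
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(seq[i:i + n] for i in range(len(seq) - n + 1))
    return grams

print(stroke_ngrams("大人"))  # n-grams over the combined stroke string "13434"
```

Each stroke n-gram gets its own vector, and words are scored against context words through these n-gram vectors in a skip-gram-style objective.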
Learning chinese word embeddings from character structural information
TLDR
This work employs an attention mechanism to capture the semantic structure of Chinese words and proposes a novel framework, the Attention-based multi-Layer Word Embedding model (ALWE), which learns to share subword information between distinct words selectively and adaptively.
An Adaptive Wordpiece Language Model for Learning Chinese Word Embeddings
TLDR
A novel approach called BPE+ adaptively generates variable-length grams, breaking the limitation of fixed-size stroke n-grams; empirical results verify that this method significantly outperforms several state-of-the-art methods.
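BPE+ builds on byte-pair encoding. As a reference point, here is a sketch of plain BPE applied to stroke-code strings: it yields variable-length grams by repeatedly merging the most frequent adjacent pair (generic BPE for illustration, not the paper's exact BPE+ variant):

```python
from collections import Counter

def bpe_merges(sequences, num_merges):
    """Plain BPE over stroke-code strings: repeatedly merge the most
    frequent adjacent pair into one longer gram, producing
    variable-length grams instead of fixed-size n-grams."""
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        for s in seqs:
            i = 0
            while i < len(s) - 1:
                if s[i] == a and s[i + 1] == b:
                    s[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges

print(bpe_merges(["13434", "1343", "3434"], num_merges=3))  # toy stroke corpus
```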
Radical and Stroke-Enhanced Chinese Word Embeddings Based on Neural Networks
TLDR
The proposed Radical and Stroke-enhanced Word Embeddings (RSWE), a novel method based on neural networks for learning Chinese word embeddings with joint guidance from semantic and morphological internal information, outperforms existing state-of-the-art approaches.
VCWE: Visual Character-Enhanced Word Embeddings
TLDR
A model to learn Chinese word embeddings via three-level composition: a convolutional neural network extracts intra-character compositionality from the visual shape of a character; a recurrent neural network with self-attention composes character representations into word embeddings; and the Skip-Gram framework captures non-compositionality directly from contextual information.
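The three levels map naturally onto a small neural module. A sketch in PyTorch covering levels one and two, with the Skip-Gram objective left as a comment (the 32x32 bitmap resolution and all layer sizes are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class VCWESketch(nn.Module):
    """Three-level composition in miniature: CNN over character bitmaps,
    then a GRU with self-attention over the character sequence. The
    resulting word vectors would be trained with a Skip-Gram objective
    (level three, not shown)."""
    def __init__(self, dim=128):
        super().__init__()
        # Level 1: intra-character features from 32x32 glyph bitmaps.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 8 * 8, dim),
        )
        # Level 2: compose characters into a word with self-attention.
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.att = nn.Linear(dim, 1)

    def forward(self, bitmaps):  # bitmaps: (batch, n_chars, 1, 32, 32)
        b, n = bitmaps.shape[:2]
        chars = self.cnn(bitmaps.flatten(0, 1)).view(b, n, -1)
        h, _ = self.rnn(chars)
        w = torch.softmax(self.att(h), dim=1)  # attention over characters
        return (w * h).sum(dim=1)              # word vector
```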
Learning Chinese Word Embeddings from Stroke, Structure and Pinyin of Characters
TLDR
A novel method, ssp2vec, is proposed that predicts contextual words from the feature substrings of target words for learning Chinese word embeddings; the proposed method is shown to obtain better results than state-of-the-art approaches.
Learning Chinese word representation better by cascade morphological n-gram
TLDR
By overlaying component and stroke n-gram vectors on word vectors, this paper improves Chinese word embeddings so as to preserve as much morphological information as possible at different granularity levels.
Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings
TLDR
This work proposes a continuously enhanced word embedding model that starts with fine-grained strokes and adjacent-stroke information and enhances subcharacter embeddings by incorporating vector representations of the relationships between strokes.
Pronunciation-Enhanced Chinese Word Embedding
TLDR
This study proposes a pronunciation-enhanced Chinese word embedding learning method, where the pronunciations of context characters and target characters are simultaneously encoded into the embeddings.
Hierarchical Joint Learning for Chinese Word Embeddings
TLDR
This work proposes a method called HJWE, which simultaneously predicts the target word and the characters and sub-characters within it, and shows that this method performs best on the word similarity, word analogy, and text classification tasks.
Attention Enhanced Chinese Word Embeddings
TLDR
A new Chinese word embedding method called AWE is introduced, which utilizes an attention mechanism to enhance Mikolov's CBOW, along with an extension, P&AWE; both far exceed the CBOW model and achieve state-of-the-art performance on the word similarity task.
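The gist of attention-enhanced CBOW is to replace CBOW's uniform average of context vectors with a learned weighting. A minimal sketch of the weighting step, using a plain dot-product score (one simple choice; AWE's actual parameterization may differ):

```python
import numpy as np

def attentive_context(context_vecs, query_vec):
    """Weight each context word vector by a softmax over compatibility
    scores with a query (here a plain dot product), instead of the
    uniform average used by vanilla CBOW."""
    scores = context_vecs @ query_vec          # (n_context,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax attention weights
    return weights @ context_vecs              # attention-weighted context
```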

References

Showing 1-10 of 50 references
Improving Word Embeddings with Convolutional Feature Learning and Subword Information
TLDR
A convolutional neural network architecture is introduced that allows us to measure structural information of context words and incorporate subword features conveying semantic, syntactic and morphological information related to the words.
Joint Learning of Character and Word Embeddings
TLDR
A character-enhanced word embedding model (CWE) is presented to address the issues of character ambiguity and non-compositional words, and the effectiveness of CWE on word relatedness computation and analogical reasoning is evaluated.
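One common presentation of CWE's composition is to mix the word-level vector with the average of its characters' vectors; a one-line sketch (the equal 1/2 weighting reflects the basic variant as commonly presented; the paper's position- and cluster-based variants refine how character vectors are chosen):

```python
import numpy as np

def cwe_word_vector(word_vec, char_vecs):
    """CWE-style composition: blend the word's own vector with the
    mean of its component characters' vectors."""
    return 0.5 * (word_vec + np.mean(char_vecs, axis=0))
```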
Improve Chinese Word Embeddings by Exploiting Internal Structure
TLDR
This paper proposes a similarity-based method to learn Chinese word and character embeddings jointly by exploiting the similarity between a word and its component characters, with semantic knowledge obtained from other languages.
Improved Learning of Chinese Word Embeddings with Semantic Knowledge
TLDR
The basic idea is to take the semantic knowledge about words and their component characters into account when designing composition functions, and experiments show that this approach outperforms two strong baselines on word similarity, word analogy, and document classification tasks.
Multi-Granularity Chinese Word Embedding
TLDR
Quantitative evaluation demonstrates the superiority of MGE in word similarity computation and analogical reasoning, and qualitative analysis further shows its capability to identify finer-grained semantic meanings of words.
Charagram: Embedding Words and Sentences via Character n-grams
TLDR
It is demonstrated that Charagram embeddings outperform more complex architectures based on character-level recurrent and convolutional neural networks, achieving new state-of-the-art performance on several similarity tasks.
Enriching Word Vectors with Subword Information
TLDR
A new approach based on the skip-gram model in which each word is represented as a bag of character n-grams, a word's vector being the sum of its n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
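Concretely, the model pads each word with boundary markers, collects character n-grams (lengths 3 to 6 in the paper), and represents the word as the sum of the n-gram vectors. A minimal sketch (`ngram_vecs` is assumed to be a dict from n-gram to vector; the full model also hashes n-grams into buckets and adds the word's own vector):

```python
import numpy as np

def subword_vector(word, ngram_vecs, n_min=3, n_max=6, dim=100):
    """Represent a word as the sum of its character n-gram vectors,
    with '<' and '>' marking word boundaries as in the paper."""
    padded = f"<{word}>"
    total = np.zeros(dim)
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            total += ngram_vecs.get(padded[i:i + n], np.zeros(dim))
    return total
```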
Word Representations: A Simple and General Method for Semi-Supervised Learning
TLDR
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Learning word embeddings efficiently with noise-contrastive estimation
TLDR
This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.
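Noise-contrastive estimation turns density estimation into binary classification: the model learns to distinguish the observed target word from k samples drawn from a known noise distribution (e.g. the unigram distribution). A simplified sketch of the per-pair loss (bias terms from the paper's log-bilinear model are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(context_vec, target_vec, noise_vecs, log_pn_target, log_pn_noise, k):
    """NCE loss for one (context, target) pair with k noise samples.
    log_pn_* are log-probabilities under the noise distribution; the
    classifier's logit is the model score minus log(k * P_noise(w))."""
    logit = lambda v, log_pn: v @ context_vec - (np.log(k) + log_pn)
    loss = -np.log(sigmoid(logit(target_vec, log_pn_target)))
    for v, lp in zip(noise_vecs, log_pn_noise):
        loss -= np.log(1.0 - sigmoid(logit(v, lp)))
    return loss
```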
Learning Chinese Word Representations From Glyphs Of Characters
TLDR
The character glyph features are learned directly from the bitmaps of characters by a convolutional auto-encoder (convAE), and the glyph features improve Chinese word representations that are already enhanced by character embeddings.
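A convolutional auto-encoder compresses each character bitmap to a low-dimensional code and is trained to reconstruct the bitmap; the bottleneck code then serves as the character's glyph feature. A sketch in PyTorch (layer sizes and the 32x32 resolution are illustrative assumptions, not the paper's convAE configuration):

```python
import torch.nn as nn

class GlyphConvAE(nn.Module):
    """Convolutional auto-encoder over character bitmaps; the encoder's
    bottleneck vector is used as the glyph feature downstream."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(                              # 1x32x32 in
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),  # -> 16x16x16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(), # -> 32x8x8
            nn.Flatten(), nn.Linear(32 * 8 * 8, feat_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, bitmaps):
        z = self.encoder(bitmaps)   # glyph feature
        return self.decoder(z), z   # reconstruction + feature
```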