Corpus ID: 195345074

KaWAT: A Word Analogy Task Dataset for Indonesian

@article{Kurniawan2019KaWATAW,
  title={KaWAT: A Word Analogy Task Dataset for Indonesian},
  author={Kemal Kurniawan},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.09912}
}
We introduced KaWAT (Kata Word Analogy Task), a new word analogy task dataset for Indonesian. We used it to evaluate several existing pretrained Indonesian word embeddings, as well as embeddings trained on an Indonesian online news corpus. We also tested these embeddings on two downstream tasks and found that pretrained word embeddings helped either by reducing the number of training epochs or by yielding significant performance gains.
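
As an illustration of what evaluating embeddings on a word analogy task typically involves, the following is a minimal sketch of the common vector-offset scoring scheme: for a question "a is to b as c is to ?", pick the vocabulary word closest in cosine similarity to b - a + c. The abstract does not state the exact protocol used for KaWAT, so this scheme is an assumption. The function names and the toy 2-dimensional vectors are invented for illustration and are not taken from the dataset or from any real embedding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Return the word closest to emb[b] - emb[a] + emb[c], excluding a, b, and c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions answered correctly.

    Questions containing out-of-vocabulary words are skipped (an assumption;
    some evaluations count them as errors instead).
    """
    answerable = [q for q in questions if all(w in emb for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answerable)
    return correct / len(answerable)


# Hypothetical toy embedding: hand-picked 2-d vectors, not real data.
toy_emb = {
    "raja": [1.0, 1.0],     # king
    "ratu": [1.0, -1.0],    # queen
    "pria": [0.5, 1.0],     # man
    "wanita": [0.5, -1.0],  # woman
    "kota": [-1.0, 0.2],    # city (distractor)
}

# Hypothetical analogy questions in (a, b, c, expected d) form.
toy_questions = [
    ("pria", "wanita", "raja", "ratu"),
    ("raja", "ratu", "pria", "wanita"),
]

accuracy = analogy_accuracy(toy_emb, toy_questions)
print(accuracy)
```

With real embeddings the vocabulary is far larger, so the search over candidates is usually vectorized rather than run as a Python loop.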
