Corpus ID: 5892596

Cross-Lingual Dataless Classification for Many Languages

@inproceedings{Song2016CrossLingualDC,
  title={Cross-Lingual Dataless Classification for Many Languages},
  author={Y. Song and Shyam Upadhyay and H. Peng and D. Roth},
  booktitle={IJCAI},
  year={2016}
}
  • Y. Song, Shyam Upadhyay, H. Peng, D. Roth
  • Published in IJCAI 2016
  • Computer Science
  • Dataless text classification [Chang et al., 2008] is a classification paradigm which maps documents into a given label space without requiring any annotated training data. This paper explores a crosslingual variant of this paradigm, where documents in multiple languages are classified into an English label space. We use CLESA (cross-lingual explicit semantic analysis) to embed both foreign language documents and an English label space into a shared semantic space, and select the best label(s…
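
The embed-then-select idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the concept list, the tiny word-to-concept indexes (`EN_INDEX`, `ES_INDEX`), and the helper names `embed`, `cosine`, and `classify` are all hypothetical stand-ins, and the weights are made up. A real CLESA index would presumably be built from Wikipedia articles aligned across languages; here a few hand-written vectors play that role, and cosine similarity is assumed as the comparison measure.

```python
import math
from collections import Counter

# Hypothetical shared concept space (stand-ins for aligned Wikipedia articles).
CONCEPTS = ["Football", "Election", "Stock market"]

# Hypothetical word-to-concept weights; illustrative values only.
EN_INDEX = {
    "sports": [1.0, 0.0, 0.0],
    "politics": [0.0, 1.0, 0.0],
    "finance": [0.0, 0.0, 1.0],
}
ES_INDEX = {
    "gol": [0.9, 0.0, 0.0],
    "partido": [0.6, 0.4, 0.0],
    "votos": [0.0, 0.9, 0.0],
    "bolsa": [0.0, 0.0, 0.8],
}


def embed(text, index):
    """Sum the concept vectors of the known words in `text`."""
    vec = [0.0] * len(CONCEPTS)
    for word, count in Counter(text.lower().split()).items():
        # Words missing from the index contribute nothing.
        for i, weight in enumerate(index.get(word, [])):
            vec[i] += count * weight
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(doc, labels, doc_index=ES_INDEX, label_index=EN_INDEX):
    """Return the English label closest to the foreign-language document."""
    doc_vec = embed(doc, doc_index)
    scores = {lab: cosine(doc_vec, embed(lab, label_index)) for lab in labels}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = classify("gol partido gol", ["sports", "politics", "finance"])
    print(best, scores)
```

No labeled training documents are used anywhere in the sketch: the only supervision is the label names themselves, embedded in the same concept space as the documents.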
    15 Citations

    Cross-lingual Dataless Classification for Languages with Small Wikipedia Presence
    Toward any-language zero-shot topic classification of textual documents
    Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Polylingual Text Classification
    Multi-label dataless text classification with topic modeling
    N-Gram Graphs for Text Classification: A Distributed Approach
    • 2017

    References

    Semi-Supervised Representation Learning for Cross-Lingual Text Classification
    Cross-Language Text Classification Using Structural Correspondence Learning
    A co-classification approach to learning from multilingual corpora
    Importance of Semantic Representation: Dataless Classification
    On Dataless Hierarchical Text Classification
    Inducing Crosslingual Distributed Representations of Words
    Semi-Supervised Learning for Natural Language
    A Wikipedia-Based Multilingual Retrieval Model