Corpus ID: 220919910

The Unreasonable Effectiveness of Machine Learning in Moldavian versus Romanian Dialect Identification

  • Mihaela Găman, Radu Tudor Ionescu
  • Published 2020
  • Computer Science
  • ArXiv
  • In this work, we provide a follow-up on the Moldavian versus Romanian Cross-Dialect Topic Identification (MRC) shared task of the VarDial 2019 Evaluation Campaign. The shared task included two sub-task types: one that consisted in discriminating between the Moldavian and the Romanian dialects and one that consisted in classifying documents by topic across the two dialects of Romanian. Participants achieved impressive scores, e.g. the top model for Moldavian versus Romanian dialect…
    7 Citations
