String Kernels for Native Language Identification: Insights from Behind the Curtains

@article{Ionescu2016StringKF,
  title={String Kernels for Native Language Identification: Insights from Behind the Curtains},
  author={Radu Tudor Ionescu and Marius Popescu and Aoife Cahill},
  journal={Computational Linguistics},
  year={2016},
  volume={42},
  pages={491--525}
}
The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using multiple kernel learning. Despite the fact that the approach based on string kernels performs so well…
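
To make the idea of character p-gram features concrete, the following is a minimal Python sketch of a p-spectrum string kernel, which scores two texts by the p-grams they share. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the specific kernel variants used in the paper are not reproduced, and the unweighted sum in `summed_kernel` is only a stand-in for multiple kernel learning, which would learn the combination weights.

```python
from collections import Counter
from math import sqrt


def char_pgrams(text, p):
    """Count every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p):
    """p-spectrum kernel: dot product of the two p-gram count vectors."""
    cs, ct = char_pgrams(s, p), char_pgrams(t, p)
    return sum(count * ct[gram] for gram, count in cs.items())


def normalized_kernel(s, t, p):
    """Spectrum kernel scaled to [0, 1] so that text length matters less."""
    denom = sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    # A text shorter than p has no p-grams; define the similarity as 0.
    return spectrum_kernel(s, t, p) / denom if denom else 0.0


def summed_kernel(s, t, p_min, p_max):
    """Assumption: unweighted sum over p-gram lengths p_min..p_max.

    Multiple kernel learning would learn a weight for each kernel instead.
    """
    return sum(spectrum_kernel(s, t, p) for p in range(p_min, p_max + 1))


if __name__ == "__main__":
    essay_a = "I am agree with this statement"
    essay_b = "I am not agree with that opinion"
    print(normalized_kernel(essay_a, essay_b, 3))
    print(summed_kernel(essay_a, essay_b, 2, 4))
```

A matrix of such kernel values between all pairs of training texts could then be passed to any kernel classifier that accepts a precomputed kernel.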

Results and Topics from this paper.

Key Quantitative Results

  • In the Arabic native language identification task, string kernels show an increase of more than 17% over the best accuracy reported so far.
  • In addition, in a cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state-of-the-art system by 32.3%.

Citations

Publications citing this paper.

Can string kernels pass the test of time in Native Language Identification?

Learning to Identify Arabic and German Dialects using Multiple Kernels

Local frame match distance: A novel approach for exemplar gesture recognition

  • 2017 25th European Signal Processing Conference (EUSIPCO)
  • 2017

UnibucKernel: An Approach for Arabic Dialect Identification Based on Multiple String Kernels

SC-UPB at the VarDial 2019 Evaluation Campaign: Moldavian vs. Romanian Cross-Dialect Topic Identification

Cristian Onose, Dumitru-Clementin Cercel, Stefan Trausan-Matu
  • 2019
