A comparative study of word embeddings and other features for lexical complexity detection in French
@inproceedings{Soler2018ACS, title={A comparative study of word embeddings and other features for lexical complexity detection in French}, author={Aina Gar{\'i} Soler and Marianna Apidianaki and A. Allauzen}, booktitle={JEPTALNRECITAL}, year={2018} }
Lexical complexity detection is an important step for automatic text simplification which serves to make informed lexical substitutions. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well-suited for complexity prediction. Our results on a synonym ranking task show that embeddings perform better than other features in isolation, but do not outperform frequency-based systems in this…
3 Citations
Lexical Complexity Prediction: An Overview
- Computer Science
- ACM Computing Surveys
- 2022
This survey presents an overview of computational approaches to lexical complexity prediction, focusing on work carried out on English data. It covers traditional machine learning classifiers and deep neural networks, as well as a variety of features, such as those inspired by the psycholinguistics literature.
archer at SemEval-2021 Task 1: Contextualising Lexical Complexity
- Linguistics, Psychology
- SEMEVAL
- 2021
This paper presents a system for assessing the lexical complexity of single words, combining linguistic and psycholinguistic variables in a set of experiments involving random forest and XGBoost regressors.
The Generative Nature of Commonsense Knowledge: Insights from Machine Learning
- Computer Science
The main finding of this paper is that the knowledge base that directly facilitates both human agreement and the model’s measure of fit is by its very nature generative, and only truly exists in representation as it is applied.
References
A model to predict lexical complexity and to grade words (Un modèle pour prédire la complexité lexicale et graduer les mots) [in French]
- Linguistics
- JEP/TALN/RECITAL
- 2014
This article identifies a set of predictors of lexical complexity, assesses their efficiency with a correlational analysis, and integrates the best of those variables into a model able to predict the difficulty of words for learners of French.
Towards Automatic Lexical Simplification in Spanish: An Empirical Study
- Linguistics
- PITR@NAACL-HLT
- 2012
This paper presents the results of an analysis of a parallel corpus of original and simplified texts in Spanish, gathered for the purpose of developing an automatic simplification system for this language intended for individuals with cognitive disabilities.
Enriching Word Vectors with Subword Information
- Computer Science
- TACL
- 2017
A new approach based on the skipgram model is proposed, in which each word is represented as a bag of character n-grams and the word's vector is the sum of these n-gram representations. It achieves state-of-the-art performance on word similarity and analogy tasks.
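The composition described in this entry can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `char_ngrams` and `word_vector`, the `<` and `>` boundary markers, the n-gram length range of 3 to 6, and the toy vectors are all assumptions made for the example.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers.

    The markers and the default length range are assumptions of this sketch.
    """
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def word_vector(word, ngram_vectors, dim):
    """Sum the vectors of the word's n-grams.

    n-grams that are missing from the (hypothetical) lookup table
    contribute nothing to the sum.
    """
    total = [0.0] * dim
    for gram in char_ngrams(word):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Toy 2-dimensional n-gram vectors, made up for illustration only.
toy_table = {"<ch": [1.0, 0.0], "hat": [0.5, 2.0]}
chat_vector = word_vector("chat", toy_table, dim=2)
```

Because the vector is built from subword units, a word never seen in training still receives a representation as long as some of its n-grams are known.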
SemEval-2012 Task 1: English Lexical Simplification
- Linguistics
- *SEMEVAL
- 2012
This is the first time such a shared task has been organized. Its goal is to provide a framework for the evaluation of lexical simplification systems and to foster research on context-aware lexical simplification approaches.
A Comparison of Techniques to Automatically Identify Complex Words.
- Computer Science
- ACL
- 2013
Experiments compare three complex word (CW) identification techniques: simplifying everything, frequency thresholding, and training a support vector machine. They show that thresholding does not perform significantly differently from the more naive technique of simplifying everything.
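Frequency thresholding, one of the techniques compared in this entry, can be sketched as below. The function name, the toy frequency counts, and the threshold value are hypothetical and chosen only for illustration; the corpus and threshold used in the paper are not given here.

```python
def is_complex(word, frequencies, threshold):
    """Label a word as complex when its corpus frequency is below the threshold.

    Words missing from the frequency table are treated as having
    frequency 0, which is an assumption of this sketch.
    """
    return frequencies.get(word, 0) < threshold


# Made-up counts for illustration only.
toy_frequencies = {"maison": 5000, "demeure": 40}
labels = {w: is_complex(w, toy_frequencies, threshold=100)
          for w in toy_frequencies}
```

In this toy setup, only the rarer word falls below the threshold and would be passed on to a simplification step.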
FLELex: a graded lexical resource for French foreign learners
- Linguistics
- LREC
- 2014
FLELex is the first graded lexicon for French as a foreign language (FFL) that reports word frequencies by difficulty level (according to the CEFR scale). The resource is made freely available to the community for a variety of purposes.
UOW-SHEF: SimpLex – Lexical Simplicity Ranking based on Contextual and Psycholinguistic Features
- Linguistics
- *SEMEVAL
- 2012
SimpLex operates on the basis of a linear weighted ranking function composed of context-sensitive and psycholinguistic features. It outperforms a very strong baseline and ranked first on the shared task at SemEval-2012.
Statistical Estimation of Word Acquisition with Application to Readability Prediction
- Computer Science
- EMNLP
- 2009
A novel statistical model for document readability that is based on the logistic Rasch model and the quantiles of word acquisition age distributions is presented, and it is demonstrated that the estimated acquisition distributions are very effective in predicting both global and local document readability.
MANULEX: A grade-level lexical database from French elementary school readers
- Education
- Behavior research methods, instruments, & computers: a journal of the Psychonomic Society, Inc
- 2004
MANULEX is a Web-accessible database that provides grade-level word frequency lists of nonlemmatized and lemmatized words computed from the 1.9 million words taken from 54 French elementary school readers.
The effects of syntactic and lexical complexity on the comprehension of elementary science texts
- Linguistics
- 2011
In this study we examined the effects of syntactic and lexical complexity on third-grade students' comprehension of science texts. A total of 16 expository texts were designed to represent systematic…