• Corpus ID: 69755969

A comparative study of word embeddings and other features for lexical complexity detection in French

Aina Garí Soler, Marianna Apidianaki, A. Allauzen
Lexical complexity detection is an important step in automatic text simplification, which serves to make informed lexical substitutions. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well-suited for complexity prediction. Our results on a synonym ranking task show that embeddings perform better than other features in isolation, but do not outperform frequency-based systems in this…

Lexical Complexity Prediction: An Overview

An overview of computational approaches to lexical complexity prediction, focusing on work carried out on English data, is presented. It covers traditional machine learning classifiers and deep neural networks, as well as a variety of features, such as those inspired by the psycholinguistics literature.

archer at SemEval-2021 Task 1: Contextualising Lexical Complexity

This paper presents a system created to assess the lexical complexity of single words, combining linguistic and psycholinguistic variables in a set of experiments involving random forest and XGBoost regressors.

The Generative Nature of Commonsense Knowledge: Insights from Machine Learning

The main finding of this paper is that the knowledge base that directly facilitates both human agreement and the model’s measure of fit is by its very nature generative, and only truly exists in representation as it is applied.



A model to predict lexical complexity and to grade words (Un modèle pour prédire la complexité lexicale et graduer les mots) [in French]

This article identifies a set of predictors of lexical complexity whose efficiency is assessed with a correlational analysis. The best of those variables are integrated into a model able to predict the difficulty of words for learners of French.

Towards Automatic Lexical Simplification in Spanish: An Empirical Study

This paper presents the results of an analysis of a parallel corpus of original and simplified texts in Spanish, gathered for the purpose of developing an automatic simplification system for this language intended for individuals with cognitive disabilities.

Enriching Word Vectors with Subword Information

A new approach based on the skipgram model, in which each word is represented as a bag of character n-grams and the word vector is the sum of these n-gram representations. It achieves state-of-the-art performance on word similarity and analogy tasks.

SemEval-2012 Task 1: English Lexical Simplification

This is the first time such a shared task has been organized. Its goal is to provide a framework for the evaluation of systems for lexical simplification and to foster research on context-aware lexical simplification approaches.

A Comparison of Techniques to Automatically Identify Complex Words.

Experiments compare three complex word (CW) identification techniques: simplifying everything, frequency thresholding, and training a support vector machine. They show that thresholding does not perform significantly differently from the more naive technique of simplifying everything.

FLELex: a graded lexical resource for French foreign learners

FLELex is the first graded lexicon for French as a foreign language (FFL) that reports word frequencies by difficulty level (according to the CEFR scale). It is made freely available to the community for use in a variety of purposes.

UOW-SHEF: SimpLex – Lexical Simplicity Ranking based on Contextual and Psycholinguistic Features

SimpLex uses a linear weighted ranking function composed of context-sensitive and psycholinguistic features. It outperforms a very strong baseline and ranked first in the shared task at SemEval-2012.

Statistical Estimation of Word Acquisition with Application to Readability Prediction

A novel statistical model for document readability, based on the logistic Rasch model and the quantiles of word acquisition age distributions, is presented. The estimated acquisition distributions are shown to be very effective in predicting both global and local document readability.

MANULEX: A grade-level lexical database from French elementary school readers

MANULEX is a Web-accessible database that provides grade-level word frequency lists of nonlemmatized and lemmatized words computed from the 1.9 million words taken from 54 French elementary school readers.

The effects of syntactic and lexical complexity on the comprehension of elementary science texts

In this study we examined the effects of syntactic and lexical complexity on third-grade students' comprehension of science texts. A total of 16 expository texts were designed to represent systematic