Linguistic Features for Readability Assessment

@inproceedings{Deutsch2020LinguisticFF,
  title     = {Linguistic Features for Readability Assessment},
  author    = {Tovly Deutsch and Masoud Jasbi and Stuart Shieber},
  booktitle = {Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA)},
  year      = {2020}
}
Readability assessment aims to automatically classify text by the level appropriate for learning readers. Traditional approaches to this task utilize a variety of linguistically motivated features paired with simple machine learning models. More recent methods have improved performance by discarding these features and utilizing deep learning models. However, it is unknown whether augmenting deep learning models with linguistically motivated features would improve performance further. This paper…
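The feature-augmentation idea described in the abstract can be sketched as follows. This is a hypothetical minimal illustration, not the paper's actual implementation: `linguistic_features` computes three classic handcrafted measures (average sentence length, average word length, type-token ratio), and `augment_embedding` simply concatenates them onto a document embedding, such as a pooled transformer output (here a stand-in vector).

```python
import re

def linguistic_features(text):
    """Compute a small, hypothetical set of classic readability features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    avg_sentence_length = len(words) / max(len(sentences), 1)
    avg_word_length = sum(len(w) for w in words) / max(len(words), 1)
    type_token_ratio = len({w.lower() for w in words}) / max(len(words), 1)
    return [avg_sentence_length, avg_word_length, type_token_ratio]

def augment_embedding(doc_embedding, text):
    """Concatenate handcrafted features onto a document embedding vector."""
    return list(doc_embedding) + linguistic_features(text)

# Usage: a stand-in for a pooled transformer embedding of the document
emb = [0.1, -0.2, 0.3]
vec = augment_embedding(emb, "The cat sat. The cat ran away quickly!")
# vec now holds the 3 embedding dimensions followed by the 3 features
```

The resulting vector could then be fed to any downstream classifier; whether this concatenation actually helps is precisely the empirical question the paper investigates.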
Related Papers

Semi-Supervised Joint Estimation of Word and Document Readability
This work proposes to jointly estimate word and document difficulty through a graph convolutional network (GCN) in a semi-supervised fashion; results reveal that the GCN-based method achieves higher accuracy than strong baselines and stays robust even with a smaller amount of labeled data.
BERT Embeddings for Automatic Readability Assessment
The proposed method outperforms classical approaches to readability assessment on English and Filipino datasets, and can serve as a substitute feature set for low-resource languages like Filipino, which have limited semantic and syntactic NLP tools for explicitly extracting feature values for the task.
A Simple Post-Processing Technique for Improving Readability Assessment of Texts using Word Mover's Distance
This study improves the conventional methodology of automatic readability assessment by incorporating the Word Mover's Distance of ranked texts as an additional post-processing step to further ground the difficulty level assigned by a model.
Annotation Curricula to Implicitly Train Non-Expert Annotators
The results show that using a simple heuristic to order instances can already significantly reduce total annotation time while preserving high annotation quality, providing a novel way to improve data collection.
Knowledge-Rich BERT Embeddings for Readability Assessment
This study proposes an alternative way of utilizing the information-rich embeddings of BERT models: a joint-learning method that combines them with handcrafted linguistic features for readability assessment. The proposed method outperforms classical approaches to readability assessment.
Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability Assessment
This work proposes to incorporate linguistic features into neural network models by learning syntactic dense embeddings: a correlation graph is formed among the features and used to learn their embeddings, so that similar features are represented by similar embeddings.
Predicting Lexical Complexity in English Texts
This paper analyzes previous work on this task and investigates the properties of complex word identification datasets for English.
Trends, Limitations and Open Challenges in Automatic Readability Assessment Research
A brief survey of contemporary research on computational models for readability assessment that identifies the common approaches, discusses their shortcomings, and outlines challenges for future work.
Supervised and Unsupervised Neural Approaches to Text Readability
This study presents a set of novel neural supervised and unsupervised approaches for determining the readability of documents, exposes their strengths and weaknesses, and compares them to current state-of-the-art classification approaches to readability.

References

Showing 1-10 of 25 references.
Supervised and Unsupervised Neural Approaches to Text Readability
This study presents a set of novel neural supervised and unsupervised approaches for determining the readability of documents, exposes their strengths and weaknesses, and compares them to current state-of-the-art classification approaches to readability.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Automatic analysis of syntactic complexity in second language writing
X. Lu (2010)
The system takes a written language sample as input and produces fourteen indices of syntactic complexity based on measures designed with advanced second language proficiency research in mind; it was developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners.
Reading Level Assessment Using Support Vector Machines and Statistical Language Models
This paper uses support vector machines to combine features from traditional reading level measures, statistical language models, and other language processing tools to produce a better method of assessing reading level.
On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition
It is shown that developmental measures from Second Language Acquisition research, when combined with traditional readability features such as word length and sentence length, provide a good indication of text readability across different grades.
The CELEX lexical database (1995)
HuggingFace's Transformers: State-of-the-art Natural Language Processing
The Transformers library is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API, along with a curated collection of pretrained models made by and available for the community.
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Hierarchical Attention Networks for Document Classification
Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.