Corpus ID: 9831081

Automatic Construction of Large Readability Corpora

Jorge Wagner, Rodrigo Wilkens, Aline Villavicencio
CL4LC@COLING 2016

This work presents a framework for the automatic construction of large Web corpora classified by readability level. We compare different machine learning classifiers for the task of readability assessment, focusing on Portuguese and English texts, and analyse the impact of variables such as the feature inventory used in the resulting corpus. In a comparison between shallow and deeper features, the former already produce F-measures of over 0.75 for Portuguese texts, but the use of additional features… 
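
The abstract contrasts shallow and deeper features and reports results as F-measure. As a hedged illustration only, the Python sketch below computes a few common shallow (surface-level) readability features and the per-class F-measure. The feature names, the regex-based tokenisation, and the helpers `shallow_features` and `f_measure` are hypothetical assumptions, not the authors' feature inventory or evaluation code.

```python
from __future__ import annotations

import re


def shallow_features(text: str) -> dict[str, float]:
    """Compute a few illustrative shallow readability features.

    Assumption: these surface counts stand in for 'shallow' features
    (no parsing needed); they are NOT the paper's feature inventory.
    """
    # Naive sentence split on terminal punctuation; a real pipeline
    # would use a proper sentence tokeniser.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Word tokens: runs of Unicode letters, so Portuguese accents are kept.
    words = re.findall(r"[^\W\d_]+", text)
    n_sent = max(len(sentences), 1)  # avoid division by zero on empty input
    n_words = max(len(words), 1)
    return {
        "words_per_sentence": len(words) / n_sent,
        "chars_per_word": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }


def f_measure(gold: list[str], pred: list[str], label: str) -> float:
    """F1 (harmonic mean of precision and recall) for one readability level."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `shallow_features("O gato dorme. O cão corre muito!")` finds two sentences and seven words, giving 3.5 words per sentence. Surface features like these are cheap to compute over a large Web crawl, which is one reason they are a natural baseline before adding deeper features.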

