Corpus ID: 233181823

EXPATS: A Toolkit for Explainable Automated Text Scoring

@article{Manabe2021EXPATSAT,
  title={EXPATS: A Toolkit for Explainable Automated Text Scoring},
  author={Hitoshi Manabe and Masato Hagiwara},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.03364}
}
Automated text scoring (ATS) tasks, such as automated essay scoring and readability assessment, are important educational applications of natural language processing. Due to the interpretability of their models and predictions, traditional machine learning (ML) algorithms based on handcrafted features are still in wide use for ATS tasks. Practitioners often need to experiment with a variety of models (including deep and traditional ML ones), features, and training objectives (regression and…
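Although the abstract is truncated here, the regression-versus-classification framing it mentions is concrete enough to sketch. The example below is a hypothetical illustration, not the EXPATS API: the same handcrafted features feed either a regression objective over continuous scores or a classification objective over discrete grade bands. The feature extractor, essays, scores, and grade labels are all invented for illustration.

```python
# Hypothetical sketch (not the EXPATS API): ATS framed either as regression
# (predict a continuous score) or as classification (predict a grade band),
# using simple handcrafted features of the kind classical ATS systems rely on.
from sklearn.linear_model import LinearRegression, LogisticRegression

def handcrafted_features(text: str) -> list[float]:
    """Toy feature extractor: length-based features often used in classical ATS."""
    tokens = text.split()
    n_tokens = len(tokens)
    avg_word_len = sum(len(t) for t in tokens) / max(n_tokens, 1)
    return [n_tokens, avg_word_len]

essays = ["Short essay .", "A somewhat longer essay with more elaborate vocabulary ."]
scores = [2.0, 4.0]   # continuous scores -> regression objective
grades = [0, 1]       # discrete grade bands -> classification objective

X = [handcrafted_features(e) for e in essays]

regressor = LinearRegression().fit(X, scores)     # regression-style ATS
classifier = LogisticRegression().fit(X, grades)  # classification-style ATS

print(regressor.predict(X), classifier.predict(X))
```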

Citations

Do Deep Neural Nets Display Human-like Attention in Short Answer Scoring?

Deep Learning (DL) techniques have been increasingly adopted for Automatic Text Scoring in education. However, these techniques often suffer from an inability to explain and justify how a…

Black-box Error Diagnosis in Deep Neural Networks for Computer Vision: a Survey of Tools

This paper focuses on the application of DNNs to computer vision tasks and presents a survey of tools that support the black-box performance diagnosis paradigm; it illustrates the features and gaps of current proposals, discusses relevant research directions, and briefly reviews diagnosis tools in sectors other than CV.

References

Showing 1–10 of 28 references

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

The Language Interpretability Tool (LIT) is an open-source platform for visualization and understanding of NLP models; it integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis.

Automatic Text Scoring Using Neural Networks

Introduces a model that forms word representations by learning the extent to which specific words contribute to the text's score, using Long Short-Term Memory networks to represent the meaning of texts.

Automated Essay Scoring: A Survey of the State of the Art

An overview of the major milestones made in automated essay scoring research since its inception is presented.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
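As a concrete illustration of the "one additional output layer" idea in an ATS setting, the sketch below fine-tunes a BERT encoder with a single-output regression head via the Hugging Face transformers library. The model name, essay text, and gold score are illustrative assumptions, not details taken from the EXPATS paper.

```python
# Hedged sketch: BERT plus one regression output layer for essay scoring.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=1,                 # single output -> regression-style scoring head
    problem_type="regression",
)

batch = tokenizer(["An example essay to be scored."],
                  return_tensors="pt", truncation=True, padding=True)
labels = torch.tensor([[3.5]])    # gold score (illustrative)

outputs = model(**batch, labels=labels)   # MSE loss when problem_type="regression"
outputs.loss.backward()                   # an optimizer step would follow in training
```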

AllenNLP: A Deep Semantic Natural Language Processing Platform

Describes AllenNLP, a library for applying deep learning methods to NLP research, which addresses common engineering issues through easy-to-use command-line tools, declarative configuration-driven experiments, and modular NLP abstractions.

A Neural Approach to Automated Essay Scoring

This paper develops an approach based on recurrent neural networks to learn the relation between an essay and its assigned score, without any feature engineering.

Assessing Chinese Readability using Term Frequency and Lexical Chain

This paper derives features from the result of lexical chaining to capture lexical cohesive information, using the E-HowNet lexical database to compute semantic similarity between high-frequency nouns.

Enhancing Automated Essay Scoring Performance via Fine-tuning Pre-trained Language Models with Combination of Regression and Ranking

A new way to fine-tune pre-trained language models with multiple losses of the same task is found to improve AES performance; the model outperforms not only state-of-the-art neural models by nearly 3 percent but also the latest statistical model.
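The PyTorch sketch below shows one plausible way to combine a regression loss with a pairwise ranking loss over predicted essay scores, which is the general idea behind fine-tuning with "multiple losses of the same task". The weighting scheme, the margin-free ranking term, and all tensors are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch: regression (MSE) loss combined with a pairwise ranking loss.
import torch
import torch.nn.functional as F

def combined_loss(pred_scores: torch.Tensor,
                  gold_scores: torch.Tensor,
                  ranking_weight: float = 0.5) -> torch.Tensor:
    # Regression term: mean squared error against gold scores.
    mse = F.mse_loss(pred_scores, gold_scores)

    # Pairwise ranking term: penalize essay pairs whose predicted ordering
    # disagrees with the gold ordering.
    diff_pred = pred_scores.unsqueeze(0) - pred_scores.unsqueeze(1)
    diff_gold = gold_scores.unsqueeze(0) - gold_scores.unsqueeze(1)
    sign = torch.sign(diff_gold)
    ranking = F.relu(-sign * diff_pred).mean()

    return (1 - ranking_weight) * mse + ranking_weight * ranking

pred = torch.tensor([2.8, 4.1, 3.0], requires_grad=True)  # model predictions
gold = torch.tensor([3.0, 4.0, 2.0])                      # gold scores
loss = combined_loss(pred, gold)
loss.backward()
```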

Explainable Automated Essay Scoring: Deep Learning Really Has Pedagogical Value

This study shows that SHAP implementations that are faster by up to three orders of magnitude are as accurate as the slower model-agnostic one, and evaluates the impact of deep learning (multi-layer perceptron neural networks) on AES performance.
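To make the speed/fidelity contrast concrete, the sketch below compares SHAP's fast tree-specific explainer against the slower model-agnostic KernelExplainer on a tree-ensemble scorer. The synthetic feature matrix and the choice of GradientBoostingRegressor are assumptions for illustration and do not reproduce the study's setup.

```python
# Hedged sketch: fast model-specific vs. slow model-agnostic SHAP explainers.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # e.g. 5 handcrafted essay features
y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0])   # synthetic essay scores

model = GradientBoostingRegressor().fit(X, y)

# Fast, tree-specific explainer (orders of magnitude faster for tree ensembles).
tree_values = shap.TreeExplainer(model).shap_values(X[:10])

# Slow, model-agnostic explainer over the same predictions, for comparison.
kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
kernel_values = kernel_explainer.shap_values(X[:10])
```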

Should You Fine-Tune BERT for Automated Essay Scoring?

It is found that fine-tuning BERT produces performance similar to classical models at significant additional cost; the paper also reviews promising areas of research on student essays where the unique characteristics of Transformers may provide benefits over classical methods that justify the costs.