While attention noisily predicts input components’ overall importance to a model, it is by no means a fail-safe indicator: there are many cases where this correlation does not hold, and gradient-based rankings of attention weights predict their effects better than their magnitudes do.
The role untrained human evaluators play in NLG evaluation is examined, and three approaches for quickly training evaluators to better identify GPT3-authored text are explored; while evaluation accuracy improved up to 55%, the improvement was not statistically significant across the three domains.
This senior comprehensive project analyzed three systems that predict how users will rate specific books, and built a system that makes these predictions using data gathered from the Amazon Book Reviews and BookCrossing datasets and the GoogleBooks and GoodReads APIs.