Lexical Generalization Improves with Larger Models and Longer Training

@article{Bandel2022LexicalGI,
  title={Lexical Generalization Improves with Larger Models and Longer Training},
  author={Elron Bandel and Yoav Goldberg and Yanai Elazar},
  journal={ArXiv},
  year={2022},
  volume={abs/2210.12673}
}
While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Excessive reliance on such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset), and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find…
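For readers unfamiliar with the term, a "lexical overlap heuristic" is a shortcut in which a model treats high word overlap between two inputs as evidence for a positive label, regardless of meaning. Below is a minimal Python sketch (not from the paper; the threshold and function names are illustrative) of such a heuristic as a degenerate NLI classifier, of the kind that contrastive or challenge sets are designed to expose:

```python
def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    shared = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return shared / len(hypothesis_tokens)


def heuristic_nli(premise: str, hypothesis: str, threshold: float = 0.9) -> str:
    """Predict 'entailment' purely from surface overlap, ignoring semantics.

    Illustrative only: this is the failure mode the paper probes, not a
    method it proposes.
    """
    if lexical_overlap(premise, hypothesis) >= threshold:
        return "entailment"
    return "non-entailment"


# The heuristic fails exactly where overlap is high but the label is not
# entailment, e.g. when the arguments are swapped:
print(heuristic_nli("The doctor visited the lawyer",
                    "The lawyer visited the doctor"))  # wrongly: 'entailment'
```

A model that has internalized this shortcut will make the same mistake on such high-overlap, non-entailed pairs; the paper's finding is that larger models and longer training reduce this susceptibility.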
