Corpus ID: 235399978

Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking

Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Yu Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela

We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform. Our platform evaluates NLP models directly instead of relying on self-reported metrics or predictions on a single dataset. Under this paradigm, models are submitted to be evaluated in the cloud, circumventing the issues of reproducibility, accessibility, and backwards compatibility that often hinder benchmarking in NLP. This allows…

