Corpus ID: 4421027

Datasheets for Datasets

@article{Gebru2018DatasheetsFD,
  title={Datasheets for Datasets},
  author={Timnit Gebru and Jamie Morgenstern and Briana Vecchione and Jennifer Wortman Vaughan and Hanna Wallach and Hal {Daum{\'e} III} and Kate Crawford},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.09010}
}
  • Timnit Gebru, J. Morgenstern, +4 authors K. Crawford
  • Published 2018
  • Computer Science
  • ArXiv
  • The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet…
    233 Citations

    MT-Adapted Datasheets for Datasets: Template and Repository
    • 3
    • Highly Influenced
    A System Framework for Personalized and Transparent Data-Driven Decisions
    • 2
    Intrinsic Evaluation of Summarization Datasets
    The Best of Both Worlds: Challenges in Linking Provenance and Explainability in Distributed Machine Learning
    • 2
    Accountable Data Analytics Start with Accountable Data: The LiQuID Metadata Model
    Dataset Reuse: Toward Translating Principles to Practice
    • Highly Influenced
    Towards Standardization of Data Licenses: The Montreal Data License
    • 6
    Pitfalls in Machine Learning Research: Reexamining the Development Cycle

    References

    DataHub: Collaborative Data Science & Dataset Version Management at Scale
    • 120
    Identification of Reproducible Subsets for Data Citation, Sharing and Re-Use
    • 32
    The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards
    • 87
    Model Cards for Model Reporting
    • 250
    Baselines and a datasheet for the Cerema AWP dataset
    • 6
    Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science
    • 123
    Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
    • 985
    QuAC : Question Answering in Context
    • 244
    Increasing Trust in AI Services through Supplier's Declarations of Conformity
    • 92