Data Lake Management: Challenges and Opportunities

@article{Nargesian2019DataLM,
  title={Data Lake Management: Challenges and Opportunities},
  author={F. Nargesian and Erkang Zhu and R. Miller and Ken Q. Pu and Patricia C. Arocena},
  journal={Proc. VLDB Endow.},
  year={2019},
  volume={12},
  pages={1986--1989}
}
  • F. Nargesian, Erkang Zhu, R. Miller, Ken Q. Pu, Patricia C. Arocena
  • Published 2019
  • Computer Science
  • Proc. VLDB Endow.
  • The ubiquity of data lakes has created fascinating new challenges for data management research. In this tutorial, we review the state of the art in data management for data lakes. We consider how data lakes are introducing new problems, including dataset discovery, and how they are changing the requirements for classic problems, including data extraction, data cleaning, data integration, data versioning, and metadata management. PVLDB Reference Format: Fatemeh Nargesian, Erkang Zhu, Renée J…
    17 Citations

    A Zone Reference Model for Enterprise-Grade Data Lake Management
    Finding Related Tables in Data Lakes for Interactive Data Science
    • 7
    Data-driven domain discovery for structured datasets
    Semantic Data Understanding with Character Level Learning
    • Michael J. Mior, Ken Q. Pu
    • Computer Science
    • 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)
    • 2020
    Artificial Intelligence for Digital Agriculture at Scale: Techniques, Policies, and Challenges
    • 3
    Distributed Data Strategies to Support Large-Scale Data Analysis Across Geo-Distributed Data Centers
    Workflow Provenance in the Lifecycle of Scientific Machine Learning
    NoSQL Schema Evolution and Data Migration: State-of-the-Art and Opportunities
    • 1
    RulER: Scaling Up Record-level Matching Rules
    Adaptive Top-k Overlap Set Similarity Joins

    References

    SHOWING 1-10 OF 53 REFERENCES
    Constance: An Intelligent Data Lake System
    • 102
    CLAMS: Bringing Quality to Data Lakes
    • 39
    Draining the Data Swamp: A Similarity-based Approach
    • 12
    Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets
    • 20
    Aurum: A Data Discovery System
    • 41
    Big data integration
    • D. Srivastava
    • Computer Science
    • 2013 IEEE 29th International Conference on Data Engineering (ICDE)
    • 2013
    • 200
    Making Open Data Transparent: Data Discovery on Open Data
    • 9
    DataHub: Collaborative Data Science & Dataset Version Management at Scale
    • 117
    The Data Civilizer System
    • 85