Data Lake Management: Challenges and Opportunities

@article{Nargesian2019DataLM,
  title={Data Lake Management: Challenges and Opportunities},
  author={Fatemeh Nargesian and Erkang Zhu and Ren{\'e}e J. Miller and Ken Q. Pu and Patricia C. Arocena},
  journal={PVLDB},
  year={2019},
  volume={12},
  pages={1986--1989}
}
The ubiquity of data lakes has created fascinating new challenges for data management research. In this tutorial, we review the state of the art in data management for data lakes. We consider how data lakes are introducing new problems, including dataset discovery, and how they are changing the requirements for classic problems, including data extraction, data cleaning, data integration, data versioning, and metadata management. PVLDB Reference Format: Fatemeh Nargesian, Erkang Zhu, Renée J…