Personal Data Lake with Data Gravity Pull
@article{Walker2015PersonalDL, title={Personal Data Lake with Data Gravity Pull}, author={Coral Walker and Hassan H. Alrehamy}, journal={2015 IEEE Fifth International Conference on Big Data and Cloud Computing}, year={2015}, pages={160--167} }
This paper presents Personal Data Lake, a unified storage facility for storing, analyzing and querying personal data. A data lake stores data regardless of format and thus provides an intuitive way to store personal data fragments of any type. Metadata management is a central part of the lake architecture. For structured/semi-structured data fragments, metadata may contain information about the schema of the data so that the data can be transformed into queryable data objects when required. For…
62 Citations
Modeling Data Lake Metadata with a Data Vault
- Computer Science, IDEAS
- 2018
This paper instantiates the metadata conceptual model into relational and document-oriented logical and physical models, respectively, and compares the physical models in terms of metadata storage and query response time.
Data Lake Ingestion Management
- Computer Science, ArXiv
- 2021
A metadata model that covers external data sources, data ingestion processes, ingested data, dataset veracity and dataset security is proposed, and a metadata management system is introduced through which users can easily consult the different elements stored in the data lake (DL).
Textual Data Analysis from Data Lakes
- Computer Science, ADBIS
- 2019
This thesis proposes a methodological approach to enable textual data analyses from data lakes through an efficient metadata system.
Metadata Systems for Data Lakes: Models and Features.
- Computer Science
- 2019
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows storing data without any predefined schema.…
Constance: An Intelligent Data Lake System
- Computer Science, SIGMOD Conference
- 2016
Constance is a Data Lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources that discovers, extracts, and summarizes the structural metadata from the data sources, and annotates data and metadata with semantic information to avoid ambiguities.
Semantic Profiling in Data Lake
- Computer Science
- 2018
This work presents a new metadata extension to data lake systems based on semantic profiling, which attempts to recognize the meaning of the data ingested into the data lake, and shows that Semantic Ingestion is a promising approach for enriching the data sets in a data lake.
Metadata Management for Data Lakes
- Computer Science, ADBIS
- 2019
A metadata conceptual schema which considers different types (structured, semi-structured and unstructured) of raw or processed data is presented and is implemented in two DBMSs to validate the proposal.
Towards Information Profiling: Data Lake Content Metadata Management
- Computer Science, 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)
- 2016
This work formally defines a metadata management process which identifies the key activities required to effectively handle information profiling, and demonstrates the value and feasibility of this approach using a prototype implementation handling a real-life case-study from the OpenML DL.
Metadata Systems for Data Lakes: Models and Features
- Computer Science, ADBIS
- 2019
Data querying and analysis depend on a metadata system that must be efficient and comprehensive, yet metadata management in data lakes remains an open issue, and criteria for evaluating its effectiveness are more or less nonexistent.
Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake
- Computer Science
- 2020
This thesis introduces a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake, and presents a novel approach for interest-based RDF update propagation that consistently maintains a full or partial replication of large datasets and deals with co-evolution.
References
A comparison of a graph database and a relational database: a data provenance perspective
- Computer Science, ACM SE '10
- 2010
This paper reports on a comparison of one such NoSQL graph database called Neo4j with a common relational database system, MySQL, for use as the underlying technology in the development of a software system to record and query data provenance information.
Scalable SQL and NoSQL data stores
- Computer Science, SGMD
- 2011
This paper examines a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers, and contrasts the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions.
Binary RDF for scalable publishing, exchanging and consumption in the web of data
- Computer Science, WWW
- 2012
This article discusses an ongoing doctoral thesis addressing efficient formats for publication, exchange and consumption of RDF on a large scale, and proposes a binary serialization format for RDF, called HDT.
Data Wrangling: The Challenging Journey from the Wild to the Lake
- Computer Science, CIDR
- 2015
This paper proposes that what is really needed is a curated data lake, where the lake contents have undergone a curation process that enables their use and delivers the promise of ad-hoc data accessibility to users beyond the enterprise IT staff.
The Future of Social Is Personal: The Potential of the Personal Data Store
- Computer Science
- 2014
This chapter argues that technical architectures that facilitate the longitudinal, decentralised and individual-centric personal collection and curation of data will be an important, but partial,…
Semantic database modeling: survey, applications, and research issues
- Computer Science, CSUR
- 1987
This paper provides a tutorial introduction to the primary components of semantic models, which are the explicit representation of objects, attributes of and relationships among objects, type constructors for building complex types, ISA relationships, and derived schema components.
Graph Databases
- Computer Science
- 2013
This practical book shows you how to apply the schema-free graph model to real-world problems and design and implement a graph database that brings the power of graphs to bear on a broad range of problem domains.
Taking Care of Digital Collections and Data: ‘Curation’ and Organisational Choices for Research Libraries
- Business
- 2009
The article introduces the issues dealt with in the LIBER Workshop ‘Curating Research’ to be held in The Hague on 17 April 2009 and this corresponding issue of LIBER Quarterly.
Constructions from Dots and Lines
- Computer Science, ArXiv
- 2010
The world of graphs in computing is explored and situations in which graphical models are beneficial are exposed.
Social Collective Intelligence
- Computer Science, Computational Social Sciences
- 2014
The book will provide a cohesive and holistic treatment of Social Collective Intelligence, including challenges emerging in various disciplines (computer science, sociology, ethics) and opportunities for innovating in various application areas, and will offer insight and knowledge into the challenges and opportunities provided by this new, exciting field of investigation.