Personal Data Lake with Data Gravity Pull

@inproceedings{Walker2015PersonalDL,
  title={Personal Data Lake with Data Gravity Pull},
  author={Coral Walker and Hassan H. Alrehamy},
  booktitle={2015 IEEE Fifth International Conference on Big Data and Cloud Computing},
  year={2015},
  pages={160--167}
}
This paper presents Personal Data Lake, a unified storage facility for storing, analyzing and querying personal data. A data lake stores data regardless of format and thus provides an intuitive way to store personal data fragments of any type. Metadata management is a central part of the lake architecture. For structured/semi-structured data fragments, metadata may contain information about the schema of the data so that the data can be transformed into queryable data objects when required. For… 
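The idea that schema information held in metadata lets a stored fragment be turned into queryable objects on demand can be illustrated with a minimal sketch. It is not the paper's implementation: the `Fragment` class, the `to_records` helper, the metadata layout, and the sample step-count data are all hypothetical and chosen only for illustration.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A raw data fragment stored as-is, plus its metadata (hypothetical layout)."""
    raw: bytes
    metadata: dict = field(default_factory=dict)


def to_records(fragment: Fragment) -> list[dict]:
    """Apply the schema kept in the metadata to a raw CSV fragment.

    The schema is assumed to map column names to type constructors,
    listed in the same order as the CSV columns.
    """
    schema = fragment.metadata["schema"]
    reader = csv.reader(io.StringIO(fragment.raw.decode("utf-8")))
    records = []
    for row in reader:
        # Pair each (column name, type) with the matching cell and cast it.
        records.append(
            {name: cast(value) for (name, cast), value in zip(schema.items(), row)}
        )
    return records


# The lake keeps the bytes untouched; only the metadata describes their structure.
fragment = Fragment(
    raw=b"2015-08-01,8042\n2015-08-02,10311\n",
    metadata={"format": "csv", "schema": {"date": str, "steps": int}},
)

# Transformation happens only when a query needs it.
records = to_records(fragment)
busy_days = [r["date"] for r in records if r["steps"] > 10000]
print(busy_days)
```

In this sketch the raw bytes are never rewritten at ingestion time; the schema is consulted only when a query requires typed records, which is one way to read the "when required" wording of the abstract.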

Modeling Data Lake Metadata with a Data Vault
TLDR
This paper instantiates the metadata conceptual model into relational and document-oriented logical and physical models, respectively, and compares the physical models in terms of metadata storage and query response time.
Data Lake Ingestion Management
TLDR
A metadata model is proposed that covers external data sources, data ingestion processes, ingested data, dataset veracity and dataset security, and a metadata management system is introduced through which users can easily consult the different elements stored in the data lake.
Textual Data Analysis from Data Lakes
TLDR
This thesis proposes a methodological approach to enable textual data analyses from data lakes through an efficient metadata system.
Metadata Systems for Data Lakes: Models and Features.
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows storing data without any predefined schema.
Constance: An Intelligent Data Lake System
TLDR
Constance is a Data Lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources that discovers, extracts, and summarizes the structural metadata from the data sources, and annotates data and metadata with semantic information to avoid ambiguities.
Semantic Profiling in Data Lake
TLDR
A new metadata extension to data lake systems by semantic profiling, which attempts to recognize the meaning of the data which is ingested into the Data Lake, and shows that Semantic Ingestion is a promising approach for enriching the data sets in a data lake.
Metadata Management for Data Lakes
TLDR
A metadata conceptual schema which considers different types (structured, semi-structured and unstructured) of raw or processed data is presented and is implemented in two DBMSs to validate the proposal.
Towards Information Profiling: Data Lake Content Metadata Management
TLDR
This work formally defines a metadata management process which identifies the key activities required to effectively handle information profiling, and demonstrates the value and feasibility of this approach using a prototype implementation handling a real-life case-study from the OpenML DL.
Metadata Systems for Data Lakes: Models and Features
TLDR
Data querying and analysis depend on a metadata system that must be efficient and comprehensive. Metadata management in data lakes remains an open issue, and criteria for evaluating its effectiveness are largely nonexistent.
Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake
TLDR
This thesis introduces a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake, and presents a novel approach for interest-based RDF update propagation that consistently maintains a full or partial replication of large datasets and deals with co-evolution.

References

SHOWING 1-10 OF 24 REFERENCES
A comparison of a graph database and a relational database: a data provenance perspective
TLDR
This paper reports on a comparison of one such NoSQL graph database called Neo4j with a common relational database system, MySQL, for use as the underlying technology in the development of a software system to record and query data provenance information.
Scalable SQL and NoSQL data stores
TLDR
This paper examines a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers, and contrasts the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions.
Binary RDF for scalable publishing, exchanging and consumption in the web of data
TLDR
This article discusses an ongoing doctoral thesis addressing efficient formats for publication, exchange and consumption of RDF on a large scale, and proposes a binary serialization format for RDF, called HDT.
Data Wrangling: The Challenging Yourney from the Wild to the Lake
TLDR
This paper proposes that what is really needed is a curated data lake, where the lake contents have undergone a curation process that enables their use and delivers the promise of ad-hoc data accessibility to users beyond the enterprise IT staff.
The Future of Social Is Personal: The Potential of the Personal Data Store
This chapter argues that technical architectures that facilitate the longitudinal, decentralised and individual-centric personal collection and curation of data will be an important, but partial,
Semantic database modeling: survey, applications, and research issues
TLDR
This paper provides a tutorial introduction to the primary components of semantic models, which are the explicit representation of objects, attributes of and relationships among objects, type constructors for building complex types, ISA relationships, and derived schema components.
Graph Databases
TLDR
This practical book shows you how to apply the schema-free graph model to real-world problems and design and implement a graph database that brings the power of graphs to bear on a broad range of problem domains.
Taking Care of Digital Collections and Data: ‘Curation’ and Organisational Choices for Research Libraries
TLDR
The article introduces the issues dealt with in the LIBER Workshop ‘Curating Research’ to be held in The Hague on 17 April 2009 and this corresponding issue of LIBER Quarterly.
Constructions from Dots and Lines
TLDR
The world of graphs in computing is explored and situations in which graphical models are beneficial are exposed.
Social Collective Intelligence
TLDR
The book will provide a cohesive and holistic treatment of Social Collective Intelligence, including challenges emerging in various disciplines (computer science, sociology, ethics) and opportunities for innovating in various application areas, and will provide insight and knowledge into the challenges and opportunities offered by this new, exciting field of investigation.