Jena: implementing the semantic web recommendations
The new Semantic Web recommendations for RDF, RDFS and OWL have, at their heart, the RDF graph. Jena2, a second-generation RDF toolkit, is similarly centered on the RDF graph. RDFS and OWL reasoning
Efficient RDF Storage and Retrieval in Jena2
This paper describes the persistence subsystem of Jena2, which is intended to support large datasets. Query optimization for RDF is identified as a promising area for future research.
Jena Property Table Implementation
This paper describes the design and implementation of property tables for Jena, an RDF Semantic Web toolkit. One design goal is to make Jena property tables look like normal relational database tables.
Data integration flows for business intelligence
This paper describes the requirements for data integration flows in the next generation of operational BI systems, the limitations of current technologies, the research challenges in meeting these requirements, and a framework for addressing those challenges.
Optimizing analytic data flows for multiple execution engines
This paper focuses on optimizing flows for a single objective, namely performance, over multiple execution engines that span a DBMS, a Map-Reduce engine, and an orchestration engine (e.g., NoSQL plus SQL).
Parallel algorithms for the execution of relational database operations
This paper presents and analyzes algorithms for parallel processing of relational database operations in a general multiprocessor framework, and introduces an analysis methodology that incorporates I/O, CPU, and message costs and can be adjusted to fit different multiprocessor architectures.
Leveraging Business Process Models for ETL Design
This paper proposes the use of business process models for a conceptual view of ETL, and shows how to link this conceptual view to existing business processes and how to translate from it to a logical ETL view that can be optimized.
Optimizing ETL workflows for fault-tolerance
This paper describes the QoX optimizer that considers multiple design strategies and finds an ETL design that satisfies multiple objectives, and defines the optimizer search space, cost functions, and search algorithms.
QoX-driven ETL design: reducing the cost of ETL consulting engagements
A novel approach to ETL design is presented that incorporates a suite of quality metrics, termed QoX, at all stages of the design process, which helps reduce the cost of ETL consulting engagements while obtaining optimal designs.
HFMS: Managing the lifecycle and complexity of hybrid analytic data flows
A Hybrid Flow Management System (HFMS) is presented: an independent software layer over a number of independent execution engines and storage repositories. It simplifies the design of analytic data flows and includes optimization and executor modules that produce optimized executable flows able to run across multiple execution engines.