Eugene J. Shekita

XML is quickly becoming the de facto standard for data exchange over the Internet. This is creating a new set of data management requirements involving XML, such as the need to store and query XML documents. Researchers have proposed using relational database systems to satisfy these requirements by devising ways to "shred" XML documents into(More)
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact(More)
Big data may contain great value, but it also brings many challenges to computing theory, architecture, frameworks, knowledge discovery algorithms, and domain-specific tools and applications. Beyond the 4-V or 5-V characteristics of big datasets, the data processing is inexact, incremental, and inductive in nature. This brings new research(More)
The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As part of this analysis, the log often needs to be joined with reference data such as information about users.(More)
Jaql is a declarative scripting language for enterprise data analysis powered by a scalable runtime that leverages Hadoop’s MapReduce parallel programming framework. Jaql is used in IBM’s Cognos Consumer Insight [6], the announced InfoSphere BigInsights [3], as well as several research projects. Through these interactions and use-cases, we have focused on(More)
XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potential, some mechanism is needed to publish relational data as XML documents. Towards that goal, one of the(More)
XML has emerged as the standard data exchange format for Internet-based business applications. This has created the need to publish existing business data, stored in relational databases, as XML. A general way to publish relational data as XML is to provide XML views over relational data, and allow business partners to query these views using an XML query(More)
This paper presents an overview of EXODUS, an extensible database system project that is addressing data management problems posed by a variety of challenging new applications. The goal of the project is to facilitate the fast development of high-performance, application-specific database systems. EXODUS provides certain kernel facilities, including a(More)
This paper describes the design of the object-oriented storage component of EXODUS, an extensible database management system currently under development at the University of Wisconsin. The basic abstraction in the EXODUS storage system is the storage object, an uninterpreted variable-length(More)
The eXtended Markup Language (XML) is quickly emerging as the universal format for publishing and exchanging data on the World Wide Web. As a result, data sources, including object-relational databases, are now faced with a new class of users and applications; customers and programs that would like to deal directly with XML data rather than being forced to(More)