Newly emerged event-based online social services, such as Meetup and Plancast, have experienced increased popularity and rapid growth. From these services, we observed a new type of social network: the event-based social network (EBSN). An EBSN contains not only online social interactions, as in conventional online social networks, but also…
To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions and employ a "think like a vertex" programming model to support iterative graph computation. This…
The Starburst project, at IBM's Almaden Research Center, is improving the design of relational database management systems and enhancing their performance, while building an extensible system to better support nontraditional applications (such as engineering, geographic, and office applications) and to serve as a testbed for future improvements in database…
Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential for marketplace competitiveness. This need to perform deep analysis over huge data repositories poses a…
Hadoop has become an attractive platform for large-scale data analytics. In this paper, we identify a major performance bottleneck of Hadoop: its inability to colocate related data on the same set of nodes. To overcome this bottleneck, we introduce CoHadoop, a lightweight extension of Hadoop that allows applications to control where data are stored.
A database management system architecture is described that facilitates the implementation of data management extensions for relational database systems. The architecture defines two classes of data management extensions: alternative ways of storing relations, called relation "storage methods", and access paths, integrity constraints, or triggers…
Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separately, causing shared content to be indexed multiple times. In this paper, we describe a new document representation model where related documents are organized as a tree, allowing shared…
In this paper we describe the design, implementation, and performance of an incremental join facility that has been added as an extension to the Starburst extensible DBMS. This facility provides an efficient access path for joins that materialize many-to-one relationships, and it works by maintaining hidden pointer fields embedded in related tuples. The…