Christian A. Lang

High-end solid state disks (SSDs) provide much faster access to data compared to conventional hard disk drives. We present a technique for using solid-state storage as a caching layer between RAM and hard disks in database management systems. By caching data that is accessed frequently, disk I/O is reduced. For random I/O, the potential performance gains ...
Solid state disks (SSDs) provide much faster random access to data compared to conventional hard disk drives. Therefore, the response time of a database engine could be improved by moving the objects that are frequently accessed in a random fashion to the SSD. Considering the price and limited storage capacity of solid state disks, the database ...
Online Analytical Processing (OLAP) has been a valuable tool for analyzing trends in business information. While the multi-dimensional cube model used by OLAP is ideal for analyzing structured business data, it is not suitable for representing and analyzing complex semi-structured data, such as XML documents. The need for analyzing XML documents is gaining ...
In this paper, we describe the IBM Research system for analysis, indexing, and retrieval of video, which was applied to the TREC-2002 video retrieval benchmark. The system explores methods for fully automatic content analysis, shot boundary detection, multi-modal feature extraction, statistical modeling for semantic concept detection, and speech recognition ...
Bloom filters are widely used in many applications, including database management systems. With a certain allowable error rate, this data structure provides an efficient solution for membership queries. The error rate is inversely proportional to the size of the Bloom filter. Currently, Bloom filters are stored in main memory because the low locality of ...
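
To make the membership-query idea in the Bloom filter abstract concrete, here is a minimal in-memory sketch in Python. It is not the method of the paper, whose details are truncated above. The class name `BloomFilter`, the SHA-256 double-hashing scheme, and the helper `expected_false_positive_rate` are all assumptions chosen for illustration. The helper uses the usual textbook approximation (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted items, which shows the error rate falling as the filter grows.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over a fixed-size bit array (illustrative sketch)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (an assumption of this sketch): derive all bit
        # positions from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the stride is never 0
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit selected for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Textbook approximation (1 - e^(-k*n/m))^k of the false-positive rate."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes
```

An inserted key is always reported as possibly present, so the only error is a false positive. Doubling `num_bits` for the same number of items lowers the approximate false-positive rate, at the cost of touching bits scattered across a larger array.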
Recent advances in solid state technology have led to the introduction of solid state drives (SSDs). Today's SSDs store data persistently using NAND flash memory and support good random IO performance. Current work in exploiting flash in database systems has primarily focused on using its random IO capability for second-level buffer pools below main memory. ...
The wide spread of databases for managing structured data, compounded with the expanded reach of the Internet, has brought forward interesting data retrieval and analysis scenarios to RDBMS. In such settings, queries often take the form of k-constrained optimization, with a Boolean constraint and a numeric optimization expression ...
P. Chowdhary, K. Bhaskaran, N. S. Caswell, H. Chang, T. Chao, S.-K. Chen, M. Dikun, H. Lei, J.-J. Jeng, S. Kapoor, C. A. Lang, G. Mihaila, I. Stanoi, L. Zeng
Business process integration and monitoring provides an invaluable means for an enterprise to adapt to changing conditions. However, developing such applications using traditional methods is challenging because of ...
Decision support (DSS) workloads generally contain multiple large concurrent scan operations. These are often executed as relational table scans, which can consume a large share of I/O bandwidth. This is especially true for ad hoc queries, where the workload is not known in advance. Common database management systems have only limited ability to reuse memory buffer ...
Decision support systems are characterized by large concurrent scan operations. A significant percentage of these scans are executed as index-based scans of the data. This is especially true when the data is physically clustered on the index columns using the various clustering schemes employed by database engines. Common database management systems have ...