• Publications
Diversifying search results
We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according...
  • 968
  • 160
Split query processing in Polybase
This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data stored in a Hadoop cluster using the standard SQL query language. Unlike other database systems...
  • 129
  • 10
Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format
"Sparse" data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems. If one uses the normal "horizontal" schema to...
  • 100
  • 9
Turbocharging DBMS buffer pool using SSDs
Flash solid-state drives (SSDs) are changing the I/O landscape, which has largely been dominated by traditional hard disk drives (HDDs) for the last 50 years. In this paper we propose and...
  • 87
  • 9
Mixed Mode XML Query Processing
Querying XML documents typically involves both tree-based navigation and pattern matching similar to that used in structured information retrieval domains. In this paper, we show that for good...
  • 101
  • 7
Query optimization in Microsoft SQL Server PDW
In recent years, Massively Parallel Processors have increasingly been used to manage and query vast amounts of data. Dramatic performance improvements are achieved through distributed execution of...
  • 28
  • 5
Froid: Optimization of Imperative Programs in a Relational Database
For decades, RDBMSs have supported declarative SQL as well as imperative functions and procedures as ways for users to express data processing tasks. While the evaluation of declarative SQL has...
  • 22
  • 4
When Free Is Not Really Free: What Does It Cost to Run a Database Workload in the Cloud?
The current computing trend towards cloud-based Database-as-a-Service (DaaS) as an alternative to traditional on-site relational database management systems (RDBMSs) has largely been driven by the...
  • 19
  • 4
Generating labels from clicks
The ranking function used by search engines to order results is learned from labeled training data. Each training point is a (query, URL) pair that is labeled by a human judge who assigns a score of...
  • 49
  • 3
ROX: Relational Over XML
An increasing percentage of the data needed by business applications is being generated in XML format. Storing the XML in its native format will facilitate new applications that exchange business...
  • 54
  • 3