• Publications
Diversifying search results
TLDR
We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information and where both queries and documents may belong to more than one category according to this taxonomy.
  • 974
  • 158
Split query processing in Polybase
TLDR
This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data stored in a Hadoop cluster using the standard SQL query language.
  • 137
  • 10
Turbocharging DBMS buffer pool using SSDs
TLDR
We propose and systematically explore designs for using an SSD to improve the performance of a DBMS buffer manager; the designs differ in how they deal with the dirty pages evicted from the buffer pool.
  • 87
  • 9
Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format
TLDR
In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields.
  • 99
  • 8
Mixed Mode XML Query Processing
TLDR
We show that for good performance, a native XML query processing system should support query plans that mix these two processing paradigms, and provide a cost model for identifying efficient combinations of the techniques.
  • 101
  • 7
Query optimization in Microsoft SQL Server PDW
TLDR
We leverage existing QO technology in Microsoft SQL Server to implement a cost-based optimizer for distributed query execution in an MPP.
  • 28
  • 5
Froid: Optimization of Imperative Programs in a Relational Database
TLDR
We present Froid, an extensible framework for optimizing imperative programs in relational databases.
  • 22
  • 4
When Free Is Not Really Free: What Does It Cost to Run a Database Workload in the Cloud?
TLDR
The current computing trend towards cloud-based Database-as-a-Service (DaaS) as an alternative to traditional on-site relational database management systems (RDBMSs) has largely been driven by the perceived simplicity and cost-effectiveness of migrating to a DaaS.
  • 21
  • 4
Generating labels from clicks
TLDR
The ranking function used by search engines to order results is learned from labeled training data.
  • 50
  • 3
ROX: Relational Over XML
TLDR
This paper explores the feasibility of accessing natively-stored XML data through traditional SQL interfaces, called Relational Over XML (ROX), in order to avoid the costly conversion of legacy applications to XQuery.
  • 54
  • 2