Summarizing Source Code using a Neural Attention Model
This paper presents the first completely data-driven approach for generating high-level summaries of source code, which uses Long Short-Term Memory (LSTM) networks with attention to produce sentences that describe C# code snippets and SQL queries.
Learning a Neural Semantic Parser from User Feedback
We present an approach to rapidly and easily build natural language interfaces to databases for new domains, whose performance improves over time based on user feedback, and requires minimal
Packet Transactions: High-Level Programming for Line-Rate Switches
This paper introduces the notion of a packet transaction: a sequential packet-processing code block that is atomic and isolated from other such code blocks, and that can run at line rate on emerging programmable switching chips.
Synthesizing highly expressive SQL queries from input-output examples
This paper presents a new scalable and efficient algorithm for synthesizing SQL queries from input-output examples. It develops a language of abstract queries, i.e., queries with uninstantiated operators, that can efficiently express a large space of SQL queries.
Mapping Language to Code in Programmatic Context
This work introduces CONCODE, a new large dataset with over 100,000 examples consisting of Java classes from online code repositories, and develops a new encoder-decoder architecture that models the interaction between the method documentation and the class environment.
Cosette: An Automated Prover for SQL
This paper presents COSETTE, a fully automated prover that can determine the equivalence of SQL queries. The authors believe that this tool represents a major step towards building provably-correct query optimizers for real-world database systems.
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm: it derives a summary of the program and generates executable code from that summary, using the Hadoop, Spark, or Flink API.
PipeGen: Data Pipe Generator for Hybrid Analytics
PipeGen automatically generates data pipes between DBMSs by leveraging their existing ability to transfer data via disk files in common formats such as CSV, and extends that functionality with efficient binary data transfer capabilities.
Understanding Database Performance Inefficiencies in Real-world Web Applications
This study examined 27 real-world open-source applications built on top of the popular Ruby on Rails ORM framework to understand their database-related performance inefficiencies, suggested techniques to alleviate these issues, and measured the potential performance gain.
Optimizing database-backed applications with query synthesis
This paper presents QBS, a system that automatically transforms fragments of application logic into SQL queries, and demonstrates that this approach can convert a variety of imperative constructs into relational specifications and significantly improve application performance asymptotically by orders of magnitude.