code2vec: learning distributed representations of code
A neural model that represents a snippet of code as a single fixed-length continuous distributed vector (a code vector), which can be used to predict semantic properties of the snippet. It is the first model to successfully predict method names based on a large, cross-project corpus.
code2seq: Generating Sequences from Structured Representations of Code
This model represents a code snippet as the set of compositional paths in its abstract syntax tree and uses attention to select the relevant paths while decoding. It significantly outperforms previous models that were specifically designed for programming languages, as well as state-of-the-art NMT models.
Code completion with statistical language models
The main idea is to reduce code completion to the natural-language processing problem of predicting probabilities of sentences. A simple and scalable static analysis extracts sequences of method calls from a large codebase and indexes them into a statistical language model.
Effective typestate verification in the presence of aliasing
A novel framework for verification of typestate properties is presented, including several new techniques to precisely treat aliases without undue performance costs. The framework includes a flow-sensitive, context-sensitive, integrated verifier that utilizes a parametric abstract domain combining typestate and aliasing information.
On the Practical Computational Power of Finite Precision RNNs for Language Recognition
It is shown that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU.
Tracelet-based code search in executables
This work presents a novel technique for computing similarity between functions based on decomposition of functions into tracelets: continuous, short, partial traces of an execution, and employs a simple rewriting engine to establish tracelet similarity in the face of low-level compiler transformations.
Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples
We present a novel algorithm that uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN. We do this using Angluin's L* algorithm.
Practical concurrent binary search trees via logical ordering
The experimental results show that the algorithms with lock-free contains and on-time deletion are practical and often comparable to the state of the art.
Statistical similarity of binaries
A new statistical approach for measuring the similarity between two procedures is presented, using similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures.
Chameleon: adaptive selection of collections
CHAMELEON, a low-overhead automatic tool that assists the programmer in choosing the appropriate collection implementation for her application, is presented. For some applications, using CHAMELEON leads to a significant improvement in the memory footprint of the application.