• Publications
On the naturalness of software
TLDR
We show that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations.
  • 375
  • 54
An Investigation into Coupling Measures for C++
TLDR
This paper proposes a comprehensive suite of measures to quantify the level of class coupling during the design of object-oriented systems.
  • 395
  • 43
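Class coupling measures generally count how classes depend on one another. As a hedged illustration only, the sketch below computes two generic counts, fan-out (how many other classes a class uses) and fan-in (how many other classes use it), from a hypothetical dependency map; it is not the suite of measures proposed in the paper, and the input format and function name `coupling` are assumptions.

```python
def coupling(dependencies):
    """Compute generic fan-out and fan-in counts per class.

    dependencies: dict mapping each class name to the set of class names it
    uses (hypothetical input format). Self-references are ignored.
    """
    classes = set(dependencies)
    for used in dependencies.values():
        classes |= set(used)

    # Fan-out (efferent coupling): distinct other classes this class uses.
    fan_out = {c: len(set(dependencies.get(c, ())) - {c}) for c in classes}

    # Fan-in (afferent coupling): distinct other classes that use this class.
    fan_in = {c: 0 for c in classes}
    for c, used in dependencies.items():
        for u in set(used) - {c}:
            fan_in[u] += 1
    return fan_out, fan_in


deps = {"Order": {"Customer", "Item"}, "Item": {"Product"}, "Customer": set()}
fan_out, fan_in = coupling(deps)
print(fan_out, fan_in)
```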
Don't touch my code!: examining the effects of ownership on software quality
Ownership is a key aspect of large-scale software development. We examine the relationship between different ownership measures and software failures in two large software projects: Windows Vista and …
  • 286
  • 36
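Ownership measures are typically computed from version-control history. A minimal sketch, assuming a hypothetical input of one author name per commit to a component: it reports the commit share of the top contributor and splits contributors into "minor" and "major" at a threshold. The 5% threshold and the function name `ownership_metrics` are illustrative assumptions here, not a statement of the paper's exact definitions.

```python
from collections import Counter


def ownership_metrics(authors, minor_threshold=0.05):
    """Ownership measures for one component.

    authors: list of author names, one entry per commit (hypothetical format).
    minor_threshold: commit share below which a contributor counts as minor
    (assumed value).
    """
    counts = Counter(authors)
    total = sum(counts.values())
    shares = {a: c / total for a, c in counts.items()}
    return {
        "ownership": max(shares.values()),  # commit share of the top contributor
        "minor": sum(1 for s in shares.values() if s < minor_threshold),
        "major": sum(1 for s in shares.values() if s >= minor_threshold),
        "contributors": len(shares),
    }


commits = ["alice"] * 45 + ["bob"] * 4 + ["carol"] * 1
print(ownership_metrics(commits))
```

Here alice holds 90% of the commits, bob 8% (major), and carol 2% (minor).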
The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs
TLDR
The field of automated software repair lacks a set of common benchmark problems.
  • 179
  • 35
Mining email social networks
TLDR
Communication and coordination activities are central to large software projects, but they are difficult to observe and study in traditional (closed-source, commercial) settings because of the prevalence of informal, direct communication modes.
  • 548
  • 34
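Open-source mailing-list archives make this communication observable: participants become nodes, and replies become edges. A minimal sketch, assuming messages have already been parsed into hypothetical `(message_id, sender, in_reply_to)` tuples; linking a sender to the author of the message they reply to is one illustrative rule, not necessarily the construction used in the paper.

```python
from collections import defaultdict


def reply_network(messages):
    """Build an undirected, weighted reply graph.

    messages: list of (message_id, sender, in_reply_to) tuples, where
    in_reply_to is None for thread starters (hypothetical format).
    Returns {(person_a, person_b): number_of_replies_between_them}.
    """
    author_of = {mid: sender for mid, sender, _ in messages}
    weights = defaultdict(int)
    for _, sender, parent in messages:
        if parent is None or parent not in author_of:
            continue  # thread starter, or parent missing from the archive
        other = author_of[parent]
        if other == sender:
            continue  # ignore self-replies
        weights[tuple(sorted((sender, other)))] += 1
    return dict(weights)


def degree(weights):
    """Number of distinct communication partners per participant."""
    deg = defaultdict(int)
    for a, b in weights:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


msgs = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "alice", "m2"),
    ("m4", "carol", "m1"),
    ("m5", "alice", "m3"),  # self-reply, ignored
]
graph = reply_network(msgs)
print(graph, degree(graph))
```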
On the naturalness of software
TLDR
We begin with the conjecture that software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations — and thus, like natural language, it is also likely to be repetitive and predictable.
  • 253
  • 31
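A common way to test whether text is "repetitive and predictable" is to train an n-gram language model on it and measure cross-entropy (average bits per predicted token): lower values mean the model is less surprised. The sketch below is a minimal bigram model with add-one (Laplace) smoothing, written only to illustrate that idea; the whitespace tokenization, the smoothing choice, and the names `train_bigram` and `cross_entropy` are assumptions, not the paper's implementation.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and their left contexts in a training token sequence."""
    contexts = Counter(tokens[:-1])  # how often each token precedes another
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams, set(tokens)


def cross_entropy(model, tokens):
    """Average bits per predicted token under an add-one-smoothed bigram model."""
    contexts, bigrams, vocab = model
    v = len(vocab | set(tokens))  # vocabulary size used for smoothing
    pairs = list(zip(tokens, tokens[1:]))
    bits = 0.0
    for prev, cur in pairs:
        # Laplace smoothing gives unseen bigrams a small nonzero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        bits -= math.log2(p)
    return bits / len(pairs)


train = "for i in range ( n ) : total += i".split() * 20
model = train_bigram(train)
familiar = "for i in range ( n ) : total += i".split()
shuffled = "n i : ( total range for ) += in i".split()
print(cross_entropy(model, familiar), cross_entropy(model, shuffled))
```

The familiar token order yields a much lower cross-entropy than the same tokens shuffled, which is the kind of signal the naturalness argument relies on.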
How, and why, process metrics are better
TLDR
We analyze the applicability and efficacy of process and code metrics for defect prediction in a release-oriented setting across a large number of releases from a diverse set of projects.
  • 228
  • 29
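Process metrics are derived from a file's change history rather than from its source text. As a hedged illustration, the sketch below computes two simple per-file process measures, commit count and number of distinct developers, from hypothetical `(author, files_changed)` commit records; the record format and metric names are assumptions and do not reproduce the paper's metric set.

```python
from collections import defaultdict


def process_metrics(commits):
    """Per-file process metrics.

    commits: iterable of (author, [file paths]) tuples (hypothetical format).
    """
    n_commits = defaultdict(int)
    developers = defaultdict(set)
    for author, files in commits:
        for path in set(files):  # count a file at most once per commit
            n_commits[path] += 1
            developers[path].add(author)
    return {
        path: {"commits": n_commits[path], "distinct_devs": len(developers[path])}
        for path in n_commits
    }


history = [
    ("alice", ["a.c", "b.c"]),
    ("bob", ["a.c"]),
    ("alice", ["a.c", "a.c"]),  # duplicate path within one commit
]
print(process_metrics(history))
```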
A large scale study of programming languages and code quality in github
TLDR
We use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static vs. dynamic typing and strong vs. weak typing on software quality.
  • 272
  • 24
Are deep neural networks the best choice for modeling source code?
TLDR
We present a fast, nested language modeling toolkit specifically designed for software, with the ability to add & remove text, and mix & swap out many models.
  • 161
  • 24
Gender and Tenure Diversity in GitHub Teams
TLDR
We show that gender and tenure diversity are positive and significant predictors of productivity, together explaining a sizable fraction of the data variability.
  • 198
  • 23
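Team diversity has to be quantified before it can be used as a predictor. A minimal sketch using two standard measures: Blau's index (1 minus the sum of squared category shares) for a categorical attribute such as gender, and the coefficient of variation (standard deviation divided by mean) for a continuous attribute such as tenure. Whether these match the paper's exact operationalization is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def blau_index(categories):
    """Blau's index: 0 for a homogeneous team, higher when more mixed."""
    counts = Counter(categories)
    n = len(categories)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def tenure_cv(tenures):
    """Coefficient of variation (population std / mean) of member tenures."""
    return pstdev(tenures) / mean(tenures)


team_gender = ["f", "m", "m", "m"]
team_tenure = [1, 3]  # e.g. years of prior activity (assumed unit)
print(blau_index(team_gender), tenure_cv(team_tenure))
```

For the example team, Blau's index is 1 - (0.25² + 0.75²) = 0.375 and the tenure coefficient of variation is 1 / 2 = 0.5.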