Fuse: A Reproducible, Extendable, Internet-Scale Corpus of Spreadsheets

@inproceedings{Barik2015FuseAR,
  title={Fuse: A Reproducible, Extendable, Internet-Scale Corpus of Spreadsheets},
  author={Titus Barik and Kevin Lubick and Justin Smith and John Slankas and Emerson R. Murphy-Hill},
  booktitle={2015 IEEE/ACM 12th Working Conference on Mining Software Repositories},
  year={2015},
  pages={486--489}
}
Spreadsheets are perhaps the most ubiquitous form of end-user programming software. This paper describes a corpus, called Fuse, containing 2,127,284 URLs that return spreadsheets (and their HTTP server responses), and 249,376 unique spreadsheets, contained within a public web archive of over 26.83 billion pages. Obtained using nearly 60,000 hours of…