The General Index of Software Engineering Papers

Authors: Zeinab Abou Khalil and Stefano Zacchiroli
Published in: 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)
We introduce the General Index of Software Engineering Papers, a dataset of fulltext-indexed papers from the most prominent scientific venues in the field of Software Engineering. The dataset includes both complete bibliographic information and indexed n-grams (sequences of contiguous words after removal of stopwords and non-words, for a total of 577 276 382 unique n-grams in this release) of length 1 to 5 for 44 581 papers retrieved from 34 venues over the 1971–2020 period. The dataset serves…
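The n-gram definition in the abstract (contiguous word sequences of length 1 to 5, taken after stopwords and non-words are removed) can be illustrated with a minimal sketch. This is not the authors' indexing pipeline: the stopword list, the `tokenize` and `extract_ngrams` helper names, and the rule that a "word" is a purely alphabetic token are all assumptions made for illustration.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (assumption: the dataset's real list is not given here).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "from", "we"}


def tokenize(text):
    """Lowercase the text, then drop stopwords and non-words.

    Assumption: a "word" is a purely alphabetic token once surrounding
    punctuation is stripped, so numbers and hyphenated tokens are discarded.
    """
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        if word.isalpha() and word not in STOPWORDS:
            tokens.append(word)
    return tokens


def extract_ngrams(text, min_n=1, max_n=5):
    """Count every contiguous n-gram of length min_n..max_n over the filtered tokens."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


sample = "We introduce the General Index of Software Engineering Papers"
counts = extract_ngrams(sample)
print(counts.most_common(5))
```

Because filtering happens before the n-grams are formed, words that were separated by a stopword in the original text become adjacent: in the sample, "general index software" counts as a 3-gram even though "of" sat between "index" and "software". Whether the dataset treats such cases the same way is an assumption of this sketch.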

The POSTGRES next generation database management system
Giant, free index to world's research papers released online.
Happy Birthday! A trend analysis on past MSR papers
Reports on a text mining exercise applied to the complete corpus of MSR papers, reflecting on where the discipline has come from, where it is now, and where it should be going.
A systematic mapping study on mining software repositories
  • In Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016
Research trends on distance learning: a text mining-based literature review from 2008 to 2018
Today’s dynamic distance learning environments offer a flexible, comfortable, and lifelong learning experience, independent of space and time. In this way, it also supports and develops existing tr...
Standing on shoulders or feet? An extended study on the usage of the MSR data papers
An examination of the usage of data papers published in the Mining Software Repositories proceedings, in terms of use frequency, users, and use purpose, concludes that data papers have provided the foundation for a significant number of studies, but that there is room for improvement in their utilization.
Research Publication Trends in Software Engineering
Results show that software testing is attracting more research attention than development, maintenance, refactoring, and management.
Removing the Barriers to Research: An Introduction to Open Access for Librarians
Open-access literature is online, free of charge, and free of most copyright and licensing restrictions. Open access will solve the pricing crisis, which prevents libraries from buying access to all
Finding Trends in Software Research
Reports that text mining methods can detect large-scale trends within the authors' community, and argues that it is important to have automatic agents that can update their understanding of the community whenever new data arrives.