Empirical evidence of large-scale diversity in API usage of object-oriented software

@article{Mendez2013EmpiricalEO,
  title={Empirical evidence of large-scale diversity in API usage of object-oriented software},
  author={Diego Mendez and Beno{\^i}t Baudry and Martin Monperrus},
  journal={2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM)},
  year={2013},
  pages={43-52}
}
In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on “usage diversity”, defined as the different statically observable combinations of methods called on the same object. We present empirical evidence that there is significant usage diversity for many classes. For instance, we observe in our dataset that Java's String is used in 2460 different ways. We discuss the reasons for this observed diversity and the consequences for software…
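
To make the definition of usage diversity concrete, here is a minimal sketch of how it could be counted. It is an illustration, not the authors' tooling: the `usage_diversity` function and the sample observations are hypothetical, and it assumes that a "combination of methods" is an unordered set, so call order and repeated calls are ignored.

```python
from collections import defaultdict


def usage_diversity(observations):
    """Return, per class, the number of distinct method-call sets observed.

    `observations` is an iterable of (class_name, methods_called) pairs,
    one pair per object found by some static analysis (assumed, not shown).
    """
    distinct = defaultdict(set)
    for class_name, methods_called in observations:
        # Assumption: a usage is the *set* of methods called on one object,
        # so order and repetition do not create a new usage.
        distinct[class_name].add(frozenset(methods_called))
    return {cls: len(usages) for cls, usages in distinct.items()}


# Hypothetical observations, invented for illustration only.
observations = [
    ("String", ["length", "charAt"]),
    ("String", ["charAt", "length", "length"]),  # same set as the line above
    ("String", ["equals"]),
    ("String", ["substring", "indexOf"]),
    ("StringBuilder", ["append", "toString"]),
    ("StringBuilder", ["append", "append", "toString"]),
]

diversity = usage_diversity(observations)
print(diversity)
```

In this toy input, `String` has three distinct usages and `StringBuilder` has one. The figure of 2460 reported for Java's String would correspond to the same kind of count taken over the paper's full dataset.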

Analysis and Exploitation of Natural Software Diversity: The Case of API Usages
TLDR
The empirical evidence that there is a significant usage diversity for many classes in the API is presented, and it is shown that one can use this API usage diversity to reason on the core design of object-oriented classes.
On the Impact of Order Information in API Usage Patterns
TLDR
This paper presents a meta-modelling framework that automates the labor-intensive, and therefore time-consuming and expensive, process of manually cataloging API usage patterns from code repositories.
A Large-Scale Study on Repetitiveness, Containment, and Composability of Routines in Open-Source Projects
TLDR
A large-scale study on the repetitiveness, containment, and composability of source code at the semantic level by collecting 8,764,971 unique subroutines as basic units for code searching/synthesis.
Highlighting Current Issues in API Usage Mining to Enhance Software Reusability
TLDR
This paper makes a theoretical comparison of the API usage pattern mining and highlights unresolved issues along with proper suggestions to address them.
Investigating Order Information in API-Usage Patterns: A Benchmark and Empirical Study
TLDR
This work presents a benchmark consisting of an episode mining algorithm that can be configured to learn all three types of patterns mentioned above, and empirically quantifies the importance of the order information encoded in sequential and partial-order patterns for representing correct co-occurrences of code elements in real code.
Understanding the API usage in Java
Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs
TLDR
This study examines the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies on them, 2.3M dependencies in total, and finds a reuse-core in APIs that is sufficient for most clients.
The Multiple Facets of Software Diversity
TLDR
This survey includes classical work about design and data diversity for fault tolerance, as well as the cybersecurity literature that investigates randomization at different system levels, with an emphasis on the most recent advances in the field.

References

Detecting Missing Method Calls in Object-Oriented Software
TLDR
A new system is proposed that automatically detects missing method calls during both the software development and quality assurance phases; it has a low false positive rate (<5%) and is able to find missing method calls in the source code of the Eclipse IDE.
Usage Patterns of the Java Standard API
  • Homan Ma, R. Amor, E. Tempero
  • Computer Science, Economics
    2006 13th Asia Pacific Software Engineering Conference (APSEC'06)
  • 2006
TLDR
A corpus-based approach is taken to help determine the "typical" usage of the Java Standard API, and finds that, in an extensive corpus of open-source software, only about 50% of the classes in the Standard API are used at all, and around 21% of the methods are used.
Large-scale, AST-based API-usage analysis of open-source Java projects
TLDR
An approach to large-scale API-usage analysis of open-source Java projects, which is motivated by API migration and which is also instantiated for the SourceForge open-source repository in a certain way.
Understanding the shape of Java software
TLDR
The results of the first in-depth study of the structure of Java programs are presented, finding evidence that some relationships follow power-laws, while others do not.
Detecting missing method calls as violations of the majority rule
TLDR
A new system is proposed that searches for missing method calls in software based on the other method calls that are observable, showing that the voting theory concept of majority rule holds for method calls.
A study of the uniqueness of source code
TLDR
The first study of the uniqueness of source code is presented, examining a collection of 6,000 software projects and measuring the degree to which each project can be `assembled' solely from portions of this corpus, thus providing a precise measure of `uniqueness' that is called syntactic redundancy.
A fault model for subtype inheritance and polymorphism
TLDR
This paper presents a model for the appearance and realization of OO faults and defines and discusses specific categories of inheritance and polymorphic faults, which can be used to support empirical investigations of object-oriented testing techniques, to inspire further research into object-oriented testing and analysis, and to help improve design and development of object.
Data mining library reuse patterns using generalized association rules
  • Amir Michail
  • Computer Science
    Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium
  • 2000
TLDR
The paper improves upon earlier research using "association rules" by taking into account the inheritance hierarchy using "generalized association rules", and shows how data mining can be used to discover library reuse patterns in existing applications.
Dual ecological measures of focus in software development
TLDR
This work analogizes the developer-artifact contribution network to a predator-prey food web, and draws upon ideas from ecology to produce a novel, and conceptually unified view of measuring focus and ownership, which are theoretically well-founded and yield novel predictive, conceptual, and actionable value in software projects.
Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and Zipf's Law
TLDR
The extent to which software reuse can occur is an intrinsic property of a problem domain, and better tools and culture can have only marginal impact on reuse rates if the domain is inherently resistant to reuse.