Empirical evidence of large-scale diversity in API usage of object-oriented software

@article{Mendez2013EmpiricalEO,
  title={Empirical evidence of large-scale diversity in API usage of object-oriented software},
  author={Diego Mendez and Beno{\^i}t Baudry and Martin Monperrus},
  journal={2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM)},
  year={2013},
  pages={43--52}
}
In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on “usage diversity”, defined as the different statically observable combinations of methods called on the same object. We present empirical evidence that there is significant usage diversity for many classes. For instance, we observe in our dataset that Java's String is used in 2460 different ways. We discuss the reasons for this observed diversity and the consequences on software…
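The definition of usage diversity above can be illustrated with a minimal sketch. It is not the authors' tooling: the `observations` records and the `usage_diversity` helper are hypothetical, and the sketch assumes that a "combination" is an unordered set of method names, so call order and repeated calls are ignored.

```python
from collections import defaultdict

# Hypothetical input: one (declared type, methods called on that object) record
# per variable, as a static analysis might emit. The data is invented for illustration.
observations = [
    ("java.lang.String", ["length", "charAt"]),
    ("java.lang.String", ["charAt", "length"]),  # same combination, different order
    ("java.lang.String", ["equals"]),
    ("java.lang.String", ["length", "charAt", "substring"]),
    ("java.util.List", ["add", "size"]),
    ("java.util.List", ["add", "size"]),
]


def usage_diversity(records):
    """Count the distinct method combinations observed per type."""
    combos = defaultdict(set)
    for type_name, methods in records:
        # Assumption: a combination is an unordered set of method names.
        combos[type_name].add(frozenset(methods))
    return {type_name: len(seen) for type_name, seen in combos.items()}


diversity = usage_diversity(observations)
for type_name, count in sorted(diversity.items()):
    print(f"{type_name}: {count} distinct usage(s)")
```

Under this reading, the figure of 2460 reported for Java's String would correspond to the size of the set collected for that one type across the whole dataset.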

Analysis and Exploitation of Natural Software Diversity: The Case of API Usages

TLDR
The empirical evidence that there is a significant usage diversity for many classes in the API is presented, and it is shown that one can use this API usage diversity to reason on the core design of object-oriented classes.

On the Impact of Order Information in API Usage Patterns

TLDR
This paper presents a meta-modelling framework that automates the labor-intensive, and therefore time-consuming and expensive, process of manually cataloging API usage patterns from code repositories.

A Large-Scale Study on Repetitiveness, Containment, and Composability of Routines in Open-Source Projects

TLDR
A large-scale study on the repetitiveness, containment, and composability of source code at the semantic level by collecting 8,764,971 unique subroutines as basic units for code searching/synthesis.

Investigating Order Information in API-Usage Patterns: A Benchmark and Empirical Study

TLDR
This work presents a benchmark consisting of an episode mining algorithm that can be configured to learn all three types of patterns mentioned above, and empirically quantifies the importance of the order information encoded in sequential and partial-order patterns for representing correct co-occurrences of code elements in real code.

Understanding the API usage in Java

Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs

TLDR
This study examines the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies on them, summing up to 2.3M dependencies, and finds a reuse-core in APIs that is sufficient to provide for most clients.

The Multiple Facets of Software Diversity

TLDR
This survey includes classical work about design and data diversity for fault tolerance, as well as the cybersecurity literature that investigates randomization at different system levels, with an emphasis on the most recent advances in the field.

KOWALSKI: Collecting API Clients in Easy Mode

TLDR
KOWALSKI is a tool that takes the name of an API, finds and downloads client binaries by exploiting the Maven dependency management system, and creates a typed call graph that allows developers to identify hotspots in the API.

References

Detecting Missing Method Calls in Object-Oriented Software

TLDR
A new system is proposed that automatically detects missing method calls during both the software development and quality assurance phases; it has a low false positive rate (<5%) and is able to find missing method calls in the source code of the Eclipse IDE.

Usage Patterns of the Java Standard API

  • Homan Ma, R. Amor, E. Tempero
  • Computer Science, Economics
    2006 13th Asia Pacific Software Engineering Conference (APSEC'06)
  • 2006
TLDR
A corpus-based approach is taken to help determine the "typical" usage of the Java Standard API; it finds that, in an extensive corpus of open-source software, only about 50% of the classes in the Standard API are used at all, and around 21% of the methods are used.

Large-scale, AST-based API-usage analysis of open-source Java projects

TLDR
An approach to large-scale API-usage analysis of open-source Java projects, which is motivated by API migration and is instantiated in a certain way for the SourceForge open-source repository.

Predicting class testability using object-oriented metrics

  • M. Bruntink, A. Deursen
  • Computer Science, Mathematics
    Source Code Analysis and Manipulation, Fourth IEEE International Workshop on
  • 2004
TLDR
The goal of this work is to define and evaluate a set of metrics that can be used to assess the testability of the classes of a Java system.

Understanding the shape of Java software

TLDR
The results of the first in-depth study of the structure of Java programs are presented, finding evidence that some relationships follow power-laws, while others do not.

Detecting missing method calls as violations of the majority rule

TLDR
A new system is proposed that searches for missing method calls in software based on the other method calls that are observable, showing that the voting theory concept of majority rule holds for method calls.

A study of the uniqueness of source code

TLDR
The first study of the uniqueness of source code is presented, examining a collection of 6,000 software projects and measuring the degree to which each project can be 'assembled' solely from portions of this corpus, thus providing a precise measure of 'uniqueness' that is called syntactic redundancy.

A fault model for subtype inheritance and polymorphism

TLDR
This paper presents a model for the appearance and realization of OO faults and defines and discusses specific categories of inheritance and polymorphic faults, which can be used to support empirical investigations of object-oriented testing techniques, to inspire further research into object-oriented testing and analysis, and to help improve design and development of object.

Data mining library reuse patterns using generalized association rules

  • Amir Michail
  • Computer Science
    Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium
  • 2000
TLDR
The paper improves upon earlier research using "association rules" by taking into account the inheritance hierarchy using "generalized association rules", and shows how data mining can be used to discover library reuse patterns in existing applications.

Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and Zipf's Law

TLDR
The extent to which software reuse can occur is an intrinsic property of a problem domain, and better tools and culture can have only marginal impact on reuse rates if the domain is inherently resistant to reuse.