Empirical evidence of large-scale diversity in API usage of object-oriented software

@article{Mendez2013EmpiricalEO,
  title={Empirical evidence of large-scale diversity in API usage of object-oriented software},
  author={Diego Mendez and Benoit Baudry and Martin Monperrus},
  journal={2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM)},
  year={2013},
  pages={43--52}
}
In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on “usage diversity”, defined as the different statically observable combinations of methods called on the same object. We present empirical evidence that there is significant usage diversity for many classes. For instance, we observe in our dataset that Java's String is used in 2460 different ways. We discuss the reasons for this observed diversity and the consequences on software…
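The definition of usage diversity above can be illustrated with a minimal sketch. It assumes that a "combination" is an unordered set of method names and that the (class, called methods) pairs have already been extracted by some static analysis; the function name `usage_diversity` and the toy observations are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def usage_diversity(observations):
    """Count the distinct method-call sets observed per class.

    `observations` is an iterable of (class_name, method_names) pairs,
    one pair per object usage found in the analyzed code.
    """
    usages = defaultdict(set)
    for class_name, methods in observations:
        # Assumption: call order is ignored, so a usage is a frozenset.
        usages[class_name].add(frozenset(methods))
    return {name: len(sets) for name, sets in usages.items()}


# Hypothetical toy data, not drawn from the paper's dataset.
observations = [
    ("String", ["length", "charAt"]),
    ("String", ["charAt", "length"]),  # same set as above, different order
    ("String", ["equals"]),
    ("StringBuilder", ["append", "toString"]),
]

diversity = usage_diversity(observations)
print(diversity)
```

In this toy input, String has two distinct usages and StringBuilder has one. Treating usages as ordered sequences instead of sets would be a different design choice and would generally yield higher counts.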
Analysis and Exploitation of Natural Software Diversity: The Case of API Usages
In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on "usage diversity", defined as the different statically observable combinations of…
On the Impact of Order Information in API Usage Patterns
TLDR
This paper presents a meta-modelling framework that automates the labor-intensive, and therefore expensive, process of manually cataloging API usage patterns from code repositories.
A Large-Scale Study on Repetitiveness, Containment, and Composability of Routines in Open-Source Projects
TLDR
A large-scale study on the repetitiveness, containment, and composability of source code at the semantic level by collecting 8,764,971 unique subroutines as basic units for code searching/synthesis.
Investigating Order Information in API-Usage Patterns: A Benchmark and Empirical Study
TLDR
This work presents a benchmark consisting of an episode mining algorithm that can be configured to learn all three types of patterns mentioned above, and empirically quantifies the importance of the order information encoded in sequential and partial-order patterns for representing correct co-occurrences of code elements in real code.
Understanding the API usage in Java
TLDR
A large-scale, comprehensive, empirical analysis of the actual usage of APIs in Java, a modern, mature, and widely used programming language, to understand how APIs are employed in practical development and to explore their potential applications based on the results of API usage analysis.
Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs
TLDR
This study examines the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies towards them, summing up to 2.3M dependencies, and finds a reuse-core in APIs that is sufficient to provide for most clients.
API usage pattern recommendation for software development
TLDR
The approach represents the source code as a network of object usages, where an object usage is a set of method calls invoked on a single API class, and automatically extracts usage patterns by clustering the data based on the co-existence relations between object usages.
The Multiple Facets of Software Diversity
TLDR
This survey includes classical work about design and data diversity for fault tolerance, as well as the cybersecurity literature that investigates randomization at different system levels, with an emphasis on the most recent advances in the field.
KOWALSKI: Collecting API Clients in Easy Mode
TLDR
KOWALSKI is a tool that takes the name of an API, then finds and downloads client binaries by exploiting the Maven dependency management system, and creates a typed call graph that allows developers to identify hotspots in the API.
Analyzing the Change-Proneness of APIs and web APIs
TLDR
It is shown that changes to APIs are more likely to appear if APIs are affected by the ComplexClass, SpaghettiCode, and SwissArmyKnife antipatterns, and this result suggests that software engineers should design interfaces with high external cohesion (measured with the IUC metric) to avoid frequent changes.

References

SHOWING 1-10 OF 26 REFERENCES
Detecting Missing Method Calls in Object-Oriented Software
TLDR
A new system is proposed that automatically detects missing method calls during both the software development and quality assurance phases, has a low false positive rate (<5%), and is able to find missing method calls in the source code of the Eclipse IDE.
Usage Patterns of the Java Standard API
TLDR
A corpus-based approach is taken to help determine the "typical" usage of the Java Standard API, and finds that, in an extensive corpus of open-source software, only about 50% of the classes in the Standard API are used at all, and around 21% of the methods are used.
Large-scale, AST-based API-usage analysis of open-source Java projects
TLDR
An approach to large-scale API-usage analysis of open-source Java projects, which is motivated by API migration and is also instantiated for the SourceForge open-source repository in a certain way.
Predicting class testability using object-oriented metrics
TLDR
The goal of this work is to define and evaluate a set of metrics that can be used to assess the testability of the classes of a Java system.
Understanding the shape of Java software
TLDR
The results of the first in-depth study of the structure of Java programs are presented, finding evidence that some relationships follow power-laws, while others do not.
Detecting missing method calls as violations of the majority rule
TLDR
A new system is proposed that searches for missing method calls in software based on the other method calls that are observable, showing that the voting theory concept of majority rule holds for method calls.
A study of the uniqueness of source code
TLDR
The first study of the uniqueness of source code is presented, examining a collection of 6,000 software projects and measuring the degree to which each project can be 'assembled' solely from portions of this corpus, thus providing a precise measure of 'uniqueness' that is called syntactic redundancy.
A fault model for subtype inheritance and polymorphism
TLDR
This paper presents a model for the appearance and realization of OO faults and defines and discusses specific categories of inheritance and polymorphic faults, which can be used to support empirical investigations of object-oriented testing techniques, to inspire further research into object-oriented testing and analysis, and to help improve design and development of object…
Data mining library reuse patterns using generalized association rules
  • A. Michail
  • Computer Science
  • Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium
  • 2000
TLDR
The paper improves upon earlier research using "association rules" by taking into account the inheritance hierarchy using "generalized association rules", and shows how data mining can be used to discover library reuse patterns in existing applications.
Dual ecological measures of focus in software development
TLDR
This work analogizes the developer-artifact contribution network to a predator-prey food web, and draws upon ideas from ecology to produce a novel and conceptually unified view of measuring focus and ownership, with measures that are theoretically well-founded and yield novel predictive, conceptual, and actionable value in software projects.