GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences

Catherine Tony, Nicolás E. Díaz Ferreyra, and Riccardo Scandariato
GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically those caused by incorrect cryptographic API call sequences in GitHub. We also analyze the suitability of GitHub data to train a learning…


LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

This work presents a dataset of 150 NL prompts that can be leveraged for assessing the security performance of large language models, and shows how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.



The Impact of Developer Experience in Using Java Cryptography

It is found that, in general, the experience of developers in using JCA does not correlate with their performance, and that none of the factors considered (the number or frequency of committed lines of code, the number of JCA APIs developers use, or the number of projects they are involved in) correlates with developer performance in this domain.

Evaluating Large Language Models Trained on Code

It is found that repeated sampling from the GPT language model is a surprisingly effective strategy for producing working solutions to difficult prompts, and the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics are discussed.

Java Cryptography Uses in the Wild

An exploratory study of how crypto APIs are used in open-source Java projects, what types of misuses exist, and why developers make such mistakes. It concludes that using crypto APIs is still problematic for developers, but that blindly blaming them for such misuses may lead to erroneous conclusions.

Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions

This work systematically investigates the prevalence and conditions that can cause GitHub Copilot to recommend insecure code, and explores Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains.

A Dataset of Parametric Cryptographic Misuses

This work created a collection of 201 misuses found in real-world applications along with a classification of those misuses, and integrated the dataset into MUBench, a benchmark for API misuse detection.

MUBench: A Benchmark for API-Misuse Detectors

Using MUBench, a dataset of 89 API misuses collected from 33 real-world projects and a survey, the prevalence of API misuses is analyzed, finding that they are rare but almost always cause crashes.

Negative Results on Mining Crypto-API Usage Rules in Android Apps

This work proposes to mine a large dataset of updates within about 40 000 real-world app lineages to infer API usage rules, and yields negative results on the assumption that API usage updates tend to correct misuses.

Deep API learning

DeepAPI is proposed, a deep-learning-based approach to generating API usage sequences for a given natural language query. It adapts a neural language model named RNN Encoder-Decoder and generates an API sequence based on the context vector.

CogniCryptGEN: generating code for the secure usage of crypto APIs

CogniCryptGEN is a code generator that proactively assists developers in using Java crypto APIs correctly and is seen as significantly simpler to use than the same template-based solution.

Inferring crypto API rules from code changes

This work applies a new approach to extract security fixes from thousands of code changes to the Java Crypto API and shows that it is effective: over 80% of the code changes are security fixes identifying security rules.