GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences
@article{Tony2022GitHubCH, title={GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences}, author={Catherine Tony and Nicol{\'a}s E. D{\'i}az Ferreyra and Riccardo Scandariato}, journal={ArXiv}, year={2022}, volume={abs/2211.13498} }
GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically caused by incorrect cryptographic API call sequences in GitHub. We also analyze the suitability of GitHub data to train a learning…
One Citation
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations
- Computer Science
- 2023
This work presents a dataset of 150 NL prompts that can be leveraged for assessing the security performance of large language models, and shows how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.
References
SHOWING 1-10 OF 26 REFERENCES
The Impact of Developer Experience in Using Java Cryptography
- Computer Science
- 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
- 2019
It is found that, in general, the experience of developers in using JCA does not correlate with their performance, and none of the factors such as the number or frequency of committed lines of code, the number of JCA APIs developers use, or the number of projects they are involved in correlate with developer performance in this domain.
Evaluating Large Language Models Trained on Code
- Computer Science
- ArXiv
- 2021
It is found that repeated sampling from the GPT language model is a surprisingly effective strategy for producing working solutions to difficult prompts, and the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics are discussed.
Java Cryptography Uses in the Wild
- Computer Science
- ESEM
- 2020
An exploratory study of how crypto APIs are used in open-source Java projects, what types of misuses exist, and why developers make such mistakes. It concludes that using crypto APIs is still problematic for developers, but that blindly blaming them for such misuses may lead to erroneous conclusions.
Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions
- Computer Science
- 2022 IEEE Symposium on Security and Privacy (SP)
- 2022
This work systematically investigates the prevalence and conditions that can cause GitHub Copilot to recommend insecure code, and explores Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains.
A Dataset of Parametric Cryptographic Misuses
- Computer Science
- 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)
- 2019
This work created a collection of 201 misuses found in real-world applications along with a classification of those misuses, and integrated the dataset into MUBench, a benchmark for API misuse detection.
MUBench: A Benchmark for API-Misuse Detectors
- Computer Science
- 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)
- 2016
Using MUBench, a dataset of 89 API misuses collected from 33 real-world projects and a survey, the prevalence of API misuses is analyzed, finding that they are rare, but almost always cause crashes.
Negative Results on Mining Crypto-API Usage Rules in Android Apps
- Computer Science
- 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)
- 2019
This work proposes to mine a large dataset of updates within about 40 000 real-world app lineages to infer API usage rules, and yields negative results on the assumption that API usage updates tend to correct misuses.
Deep API learning
- Computer Science
- SIGSOFT FSE
- 2016
DeepAPI is proposed, a deep learning based approach to generate API usage sequences for a given natural language query that adapts a neural language model named RNN Encoder-Decoder, and generates an API sequence based on the context vector.
CogniCryptGEN: generating code for the secure usage of crypto APIs
- Computer Science
- CGO
- 2020
CogniCryptGEN is a code generator that proactively assists developers in using Java crypto APIs correctly and is seen as significantly simpler to use than the same template-based solution.
Inferring crypto API rules from code changes
- Computer Science
- PLDI
- 2018
This work applies a new approach to extract security fixes from thousands of code changes to the Java Crypto API and shows that it is effective: over 80% of the code changes are security fixes identifying security rules.