Type4Py: Practical Deep Similarity Learning-Based Type Inference for Python

Amir M. Mir, Evaldas Latoskinas, Sebastian Proksch, Georgios Gousios
2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)

Dynamic languages, such as Python and JavaScript, trade static typing for developer flexibility and productivity. The lack of static typing can cause run-time exceptions and is a major factor in weak IDE support. To alleviate these issues, PEP 484 introduced optional type annotations for Python. As retrofitting types to existing codebases is error-prone and laborious, machine learning (ML)-based approaches have been proposed to enable automatic type inference based on existing, partially…

Guess What: Test Case Generation for Javascript with Unsupervised Probabilistic Type Inference

Search-based test case generation approaches make use of static type information to determine which data types should be used for the creation of new test cases. Dynamically typed languages like…



TypeWriter: neural type prediction with search-based validation

TypeWriter is presented, the first combination of probabilistic type prediction with search-based refinement of predicted types, which can fully annotate between 14% and 44% of the files in a randomly selected corpus, while ensuring type correctness.

Learning type annotation: is big data enough?

This work presents TypeBert, demonstrating that even with simple token-sequence inductive bias used in BERT-style models and enough data, type-annotation performance of the most sophisticated models can be surpassed.

NL2Type: Inferring JavaScript Function Types from Natural Language Information

NL2Type is presented, a learning-based approach for predicting likely type signatures of JavaScript functions using a recurrent, LSTM-based neural model that, after learning from an annotated code base, predicts function types for unannotated code.

Deep learning type inference

DeepTyper is proposed, a deep learning model that understands which types naturally occur in certain contexts and relations and can provide type suggestions, which can often be verified by the type checker, even if it could not infer the type initially.

Typilus: neural type hints

A graph neural network model is presented that predicts types by probabilistically reasoning over a program’s structure, names, and patterns and can employ one-shot learning to predict an open vocabulary of types, including rare and user-defined ones.

ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-based Type Inference

A light-weight static analyzer pipeline is developed and accompanied by the ManyTypes4Py dataset, a large Python dataset for machine learning (ML)-based type inference that contains a total of 5,382 Python projects with more than 869K type annotations.

Python probabilistic type inference with natural language support

This work proposes to use probabilistic inference to allow the beliefs of individual type hints to be propagated, aggregated, and eventually converge on probabilities of variable types in Python programs.

PyART: Python API Recommendation in Real-Time

This paper proposes a novel approach, PyART, to recommend APIs for Python programs in real-time, which features a light-weight analysis to derive so-called optimistic data-flow, which is neither sound nor complete, but simulates the local data-flow information humans can derive.

Python 3 types in the wild: a tale of two type systems

This paper reviews MyPy and PyType, two canonical static type checking and inference tools, and their distinct approaches to type analysis, and evaluates the types and tools on a corpus of public GitHub repositories.

OptTyper: Probabilistic Type Inference by Optimising Logical and Natural Constraints

A framework for probabilistic type inference that combines logic and learning is introduced: logical constraints on the types are extracted from the program, and deep learning is applied to predict types from surface-level code properties that are statistically associated, such as variable names.