Testing Framework for Black-box AI Models

Aniya Aggarwal, Samiullah Shaikh, Sandeep Hans, Swastik Haldar, Rema Ananthanarayanan, Diptikalyan Saha
2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)

With the widespread adoption of AI models for important decision making, ensuring the reliability of such models remains an important challenge. In this paper, we present an end-to-end generic framework for testing AI models that performs automated test generation for different modalities, such as text, tabular, and time-series data, and across various properties, such as accuracy, fairness, and robustness. Our tool has been used for testing industrial AI models and was very effective in uncovering issues…
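
The abstract describes automated test generation for black-box models against properties such as robustness. As a minimal sketch, assuming only query access to a `predict` function over numeric tabular rows, a robustness test can perturb each feature by a small relative amount and flag rows whose predicted label flips. The names `robustness_test` and `toy_model`, and the corner-of-the-box perturbation scheme, are hypothetical illustrations, not the tool's actual generator or API.

```python
from itertools import product
from typing import Callable, List, Sequence

Row = List[float]


def robustness_test(
    predict: Callable[[Row], int],
    rows: Sequence[Row],
    epsilon: float = 0.01,
) -> List[int]:
    """Return indices of rows whose predicted label changes under small perturbations.

    Black-box check (hypothetical sketch): only predict() is queried. Each
    feature is scaled by (1 - epsilon) or (1 + epsilon), and every corner of
    that box is tried.
    """
    failures = []
    for i, row in enumerate(rows):
        original = predict(row)
        for signs in product((-1.0, 1.0), repeat=len(row)):
            perturbed = [x * (1 + s * epsilon) for x, s in zip(row, signs)]
            if predict(perturbed) != original:
                failures.append(i)
                break  # one counterexample is enough to flag this row
    return failures


def toy_model(row: Row) -> int:
    # Hypothetical model under test: approve (1) if income exceeds debt by more than 10.
    income, debt = row
    return int(income - debt > 10)


rows = [[50.0, 10.0], [20.0, 9.9], [30.0, 25.0]]
print(robustness_test(toy_model, rows))  # → [1]
```

Only the box corners are checked, which is exhaustive for this monotone toy model but merely a heuristic in general; the number of corners also grows as 2^n in the number of features, so a practical generator would sample or search instead.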


Automated Testing of AI Models

The capability of the AITEST tool is extended to include testing techniques for image and speech-to-text models, along with interpretability testing for tabular models, making it a comprehensive framework for testing AI models.

Quantitative AI Risk Assessments: Opportunities and Challenges

A quantitative AI Risk Assessment will quantitatively assess the risks of an existing model in a manner analogous to how a home inspector might assess the energy efficiency of an already-built home or a physician might assess overall patient health based on a battery of tests.

pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing

A new approach for privacy-preserving and verifiable convolutional neural network (CNN) testing in a distrustful multi-stakeholder environment is proposed, and a new quadratic matrix program (QMP)-based arithmetic circuit with a single multiplication gate is presented for expressing 2-D convolution operations between multiple filters and inputs in a batch manner.
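
pvCNN's own QMP-based circuit is not reproduced here. To illustrate only the underlying linear-algebra fact that a batch of 2-D convolutions between several filters and several inputs can be written as one matrix product, the sketch below uses the standard im2col rewriting in plain Python. The helper names are hypothetical, and no claim is made that pvCNN's construction is im2col.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_direct(image: Matrix, kernel: Matrix) -> Matrix:
    """Reference 'valid' 2-D convolution (cross-correlation, as in CNNs), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(ow)]
        for i in range(oh)
    ]


def patches(image: Matrix, kh: int, kw: int) -> Matrix:
    """Flatten every kh x kw window of the image, row-major, one window per row."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [image[i + a][j + b] for a in range(kh) for b in range(kw)]
        for i in range(oh)
        for j in range(ow)
    ]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def batched_conv(images: List[Matrix], kernels: List[Matrix]) -> Matrix:
    """Every filter/input convolution at once as ONE product: (F x k) @ (k x N*P)."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    filters = [[w for row in k for w in row] for k in kernels]  # F x k, k = kh*kw
    windows = [p for img in images for p in patches(img, kh, kw)]  # (N*P) x k
    cols = [list(c) for c in zip(*windows)]  # im2col matrix: k x (N*P)
    return matmul(filters, cols)  # row f holds filter f applied to every input


images = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]
kernels = [[[1, 0], [0, -1]], [[1, 1], [1, 1]]]
print(batched_conv(images, kernels))
# → [[-4, -4, -4, -4, 0, 0, 0, 0], [12, 16, 24, 28, 2, 2, 2, 2]]
```

Each row of the result concatenates the flattened outputs of one filter over all inputs, so slicing row f at positions n*P to (n+1)*P recovers `conv2d_direct(images[n], kernels[f])`.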

A survey on software test automation return on investment, in organizations predominantly from Bengaluru, India

A survey of industry test professionals from both product and services organizations is presented, along with an analysis of their responses, to understand the return on investment (ROI) of test automation.

Software Fairness: An Analysis and Survey

A clear view of the state-of-the-art in software fairness analysis is provided, including the need to study intersectional/sequential bias, policy-based bias handling, and human-in-the-loop, socio-technical bias mitigation.



Black box fairness testing of machine learning models

This work proposes a methodology for auto-generating test inputs for the task of detecting individual discrimination, combining two well-established techniques, symbolic execution and local explainability, for effective test case generation.
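
Individual discrimination is commonly defined as follows: an input is a discriminatory test case if changing only its protected attribute changes the model's decision. The sketch below implements just that black-box oracle, assuming a hypothetical `predict` function and a protected attribute at a known index with a known set of values; it is not the symbolic-execution and local-explainability search the work uses to generate candidate inputs.

```python
from typing import Callable, List, Sequence

Row = List[float]


def is_discriminatory(
    predict: Callable[[Row], int],
    x: Row,
    protected_index: int,
    protected_values: Sequence[float],
) -> bool:
    """True if changing only the protected attribute of x changes the decision."""
    baseline = predict(x)
    for value in protected_values:
        if value == x[protected_index]:
            continue
        variant = list(x)
        variant[protected_index] = value  # every other attribute stays fixed
        if predict(variant) != baseline:
            return True
    return False


def biased_model(x: Row) -> int:
    # Hypothetical model: x = [age, income, gender]; gender (index 2) leaks into the score.
    age, income, gender = x
    score = income / 1000 + (5 if gender == 1 else 0)
    return int(score >= 50)


candidates = [[30, 47000, 0], [30, 60000, 0], [40, 20000, 1]]
found = [x for x in candidates if is_discriminatory(biased_model, x, 2, [0, 1])]
print(found)  # → [[30, 47000, 0]]
```

Only the first candidate sits close enough to the decision boundary for the protected attribute to flip the outcome; a test generator's job is to find such inputs efficiently rather than by enumerating candidates.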

Machine Learning Testing: Survey, Landscapes and Horizons

This paper provides a comprehensive survey of techniques for testing machine learning systems, i.e., Machine Learning Testing (ML testing) research, covering 144 papers on testing properties, testing components, and application scenarios.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT, a new language representation model, is introduced. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

The potential for artificial intelligence in healthcare

The complexity and rise of data in healthcare means that artificial intelligence (AI) will increasingly be applied within the field. Several types of AI are already being employed by payers…

Neural network credit scoring models

  • D. West
  • Comput. Oper. Res.
  • 2000

Fairness through awareness

A framework for fair classification is presented, comprising (1) a (hypothetical) task-specific metric for determining the degree to which individuals are similar with respect to the classification task at hand, and (2) an algorithm for maximizing utility subject to the fairness constraint that similar individuals are treated similarly.
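
The constraint "similar individuals are treated similarly" is formalized in this work as a Lipschitz condition: a classifier M that maps individuals to distributions over outcomes should satisfy D(M(x), M(y)) ≤ d(x, y) for every pair, where d is the task-specific similarity metric and D is a distance between output distributions. The sketch below checks that condition over a finite set of individuals, using total variation distance as D. The metric `d` and both classifiers are hypothetical stand-ins, since the paper treats the metric as given rather than prescribing one.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Dict[int, float]  # outcome -> probability


def total_variation(p: Dist, q: Dist) -> float:
    """Statistical (total variation) distance between two outcome distributions."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


def lipschitz_violations(
    M: Callable[[float], Dist],
    d: Callable[[float, float], float],
    individuals: Sequence[float],
    tol: float = 1e-9,
) -> List[Tuple[float, float]]:
    """Pairs (x, y) where D(M(x), M(y)) > d(x, y), with D = total variation."""
    return [
        (x, y)
        for x, y in combinations(individuals, 2)
        if total_variation(M(x), M(y)) > d(x, y) + tol  # tol absorbs float rounding
    ]


def d(x: float, y: float) -> float:
    # Hypothetical task-specific metric: individuals are normalized credit scores.
    return abs(x - y)


def smooth(x: float) -> Dist:
    # Randomized classifier: approve (1) with probability equal to the score.
    return {1: x, 0: 1 - x}


def threshold(x: float) -> Dist:
    # Deterministic cutoff at 0.5: near-identical individuals can get opposite outcomes.
    return {1: 1.0, 0: 0.0} if x >= 0.5 else {1: 0.0, 0: 1.0}


individuals = [0.2, 0.49, 0.51, 0.9]
print(lipschitz_violations(smooth, d, individuals))  # → []
print(lipschitz_violations(threshold, d, individuals))
# → [(0.2, 0.51), (0.2, 0.9), (0.49, 0.51), (0.49, 0.9)]
```

The pair (0.49, 0.51) shows why randomization matters here: the hard threshold treats two almost identical individuals as maximally different, while the randomized classifier moves its output distribution only as far as the metric allows.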

A Neural Network Scheme for Long-Term Forecasting of Chaotic Time Series

A topology and training scheme are proposed for a novel artificial neural network, named the “Hybrid-connected Complex Neural Network” (HCNN), which is able to capture the dynamics embedded in chaotic time series and to predict long horizons of such series.

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models…
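
CheckList organizes behavioral tests into test types, including Minimum Functionality Tests (MFT), Invariance tests (INV), and Directional Expectation tests (DIR). The sketch below is a plain-Python invariance test under stated assumptions: a hypothetical keyword-based sentiment model and a name-swapping perturbation that should not change the predicted label. It does not use the `checklist` library's API; `swap_names`, `invariance_test`, and `toy_sentiment` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

NAMES = ["John", "Maria", "Priya"]  # hypothetical perturbation vocabulary


def swap_names(sentence: str) -> List[str]:
    """Label-preserving perturbation: replace a known name with each other name."""
    variants = []
    for name in NAMES:
        if name in sentence.split():
            variants.extend(sentence.replace(name, other) for other in NAMES if other != name)
    return variants


def invariance_test(
    predict: Callable[[str], str],
    sentences: Sequence[str],
    perturb: Callable[[str], List[str]],
) -> List[Tuple[str, str]]:
    """INV test: report (original, perturbed) pairs whose predicted label differs."""
    return [(s, t) for s in sentences for t in perturb(s) if predict(t) != predict(s)]


def toy_sentiment(text: str) -> str:
    # Hypothetical buggy model: the token "priya" wrongly counts as negative.
    words = text.lower().split()
    score = words.count("great") - words.count("terrible") - words.count("priya")
    return "positive" if score > 0 else "negative"


sentences = ["John had a great flight", "Maria had a terrible flight"]
print(invariance_test(toy_sentiment, sentences, swap_names))
# → [('John had a great flight', 'Priya had a great flight')]
```

An invariance test needs no gold labels: the model's own prediction on the unperturbed sentence serves as the expected answer, which is how behavioral testing can expose failures that held-out accuracy would miss.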

Metamorphic Testing of a Deep Learning Based Forecaster

Metamorphic testing is applied to an in-use deep-learning-based forecasting application that uses past data on system characteristics to predict future outages; 65.9% of the bugs were caught through the metamorphic relations.
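
Metamorphic testing sidesteps the missing oracle by checking a relation between the output on a source input and the output on a transformed follow-up input. The relations used in that study are not listed here, so the sketch below assumes a hypothetical one: for a forecaster expected to be translation-equivariant, adding a constant c to every past value should add c to the forecast. The simple moving-average forecaster and its clipped "buggy" variant are illustrative stand-ins for the deep-learning model.

```python
import math
from typing import Callable, Sequence


def moving_average_forecast(history: Sequence[float], window: int = 3) -> float:
    """Stand-in forecaster: predict the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def buggy_forecast(history: Sequence[float]) -> float:
    # Hypothetical bug: forecasts are silently clipped at 100.
    return min(moving_average_forecast(history), 100.0)


def holds_shift_relation(
    forecast: Callable[[Sequence[float]], float],
    history: Sequence[float],
    c: float,
) -> bool:
    """Assumed MR: forecast(history + c) should equal forecast(history) + c."""
    source = forecast(history)                      # source test case
    follow_up = forecast([x + c for x in history])  # follow-up test case
    return math.isclose(follow_up, source + c, rel_tol=1e-9, abs_tol=1e-9)


history = [10.0, 12.0, 11.0, 13.0, 12.5]
print(holds_shift_relation(moving_average_forecast, history, 100.0))  # → True
print(holds_shift_relation(buggy_forecast, history, 5.0))             # → True
print(holds_shift_relation(buggy_forecast, history, 100.0))           # → False
```

No expected forecast values are needed: the relation itself acts as the oracle, and the clipping bug surfaces only for follow-up inputs that push the forecast past the hidden limit.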

A Survey on Metamorphic Testing

This article provides a comprehensive survey on metamorphic testing: it summarises the research results and application areas, and analyses common practice in empirical studies of metamorphic testing as well as the main open challenges.