Corpus ID: 237532193

Let the CAT out of the bag: Contrastive Attributed explanations for Text

Saneem A. Chemmengath, A. Azad, Ronny Luss, Amit Dhurandhar
Contrastive explanations for understanding the behavior of black-box models have gained a lot of attention recently, as they provide potential for recourse. In this paper, we propose Contrastive Attributed explanations for Text (CAT), a method that provides contrastive explanations for natural language text data with a novel twist: we build and exploit attribute classifiers, leading to more semantically meaningful explanations. To ensure that our contrastive generated text has the fewest…
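The abstract describes searching for a contrastive version of the input text with the fewest changes, guided by attribute classifiers. As a loose illustration of that idea only (not the paper's actual method), the toy below substitutes single words from a fixed attribute vocabulary into a sentence until a keyword-count classifier flips its prediction; the classifier and vocabulary are invented stand-ins.

```python
# Toy sketch of a minimal contrastive edit search (assumption: the real CAT
# method uses learned attribute classifiers and a language model; here a
# keyword-count classifier and a fixed attribute vocabulary stand in).

POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}

def classify(tokens):
    """Toy sentiment classifier: sign of positive-minus-negative keyword count."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"

def contrastive_edit(tokens, attribute_words):
    """Return the first single-word substitution (index, replacement) that
    flips the prediction, i.e. a fewest-words route to the contrast class."""
    original = classify(tokens)
    for i in range(len(tokens)):
        for cand in attribute_words:
            edited = tokens[:i] + [cand] + tokens[i + 1:]
            if classify(edited) != original:
                return i, cand
    return None
```

A one-word substitution is the smallest possible edit here; the real method additionally constrains the edit to stay fluent and semantically meaningful.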

References

Plug and Play Language Models: A Simple Approach to Controlled Text Generation
The Plug and Play Language Model (PPLM) for controllable language generation is proposed, which combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM.
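The plug-and-play idea can be caricatured without any neural network. The real PPLM perturbs the LM's hidden activations using attribute-classifier gradients; the sketch below instead simply reweights next-token probabilities by an attribute score at decode time. All tokens, probabilities, and scores are made up for illustration.

```python
# Hypothetical one-step "LM": next-token probabilities after the word "the".
base_lm = {"the": {"movie": 0.5, "disaster": 0.3, "delight": 0.2}}

# Toy positivity "attribute classifier" scoring single tokens in [0, 1].
positivity = {"movie": 0.5, "disaster": 0.1, "delight": 0.9}

def steered_next(context, strength=5.0):
    """Reweight the base LM's next-token distribution by the attribute
    score raised to `strength`, renormalize, and return the argmax token
    together with the steered distribution."""
    weighted = {t: p * positivity[t] ** strength
                for t, p in base_lm[context].items()}
    z = sum(weighted.values())
    dist = {t: w / z for t, w in weighted.items()}
    return max(dist, key=dist.get), dist
```

Without steering the LM prefers "movie"; with the attribute reweighting, the positive token "delight" wins despite its lower base probability, which is the qualitative effect PPLM achieves without retraining the LM.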
Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives
A novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network is proposed and it is argued that such explanations are natural for humans and are used commonly in domains such as health care and criminology.
Pathologies of Neural Models Make Interpretations Difficult
This work uses input reduction, which iteratively removes the least important word from the input, to expose pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods.
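Input reduction as summarized above is easy to sketch: repeatedly drop the least-important token as long as the model's prediction is unchanged. The classifier and importance function below are toy stand-ins for a real model and a real attribution method.

```python
def input_reduction(tokens, predict, importance):
    """Iteratively remove the least-important token while the model's
    prediction stays the same; the leftover words are the 'reduced' input,
    which for real models is often nonsensical to humans."""
    label = predict(tokens)
    reduced = list(tokens)
    while len(reduced) > 1:
        # Index of the token with the lowest importance score.
        idx = min(range(len(reduced)), key=lambda i: importance(reduced[i]))
        trial = reduced[:idx] + reduced[idx + 1:]
        if predict(trial) != label:
            break  # removing this token would change the prediction
        reduced = trial
    return reduced
```

On the toy keyword model, everything except the decisive keyword is stripped away; the paper's point is that real neural models keep far stranger word subsets.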
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
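LIME's recipe for text is: perturb the input by dropping random token subsets, query the black-box model, and fit a local interpretable surrogate. The real method fits a weighted sparse linear model; the sketch below uses a cruder per-token mean-difference estimate in the same spirit, against an invented keyword model.

```python
import random

def lime_like_attribution(tokens, predict_proba, n_samples=500, seed=0):
    """Simplified LIME-style sketch: sample random token subsets, then score
    each token by the difference in the model's output between samples that
    keep it and samples that drop it. (Real LIME instead fits a weighted
    sparse linear model over the same perturbations.)"""
    rng = random.Random(seed)
    kept_sum = [0.0] * len(tokens); kept_n = [0] * len(tokens)
    drop_sum = [0.0] * len(tokens); drop_n = [0] * len(tokens)
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in tokens]
        sample = [t for t, keep in zip(tokens, mask) if keep]
        p = predict_proba(sample)
        for i, keep in enumerate(mask):
            if keep:
                kept_sum[i] += p; kept_n[i] += 1
            else:
                drop_sum[i] += p; drop_n[i] += 1
    return [kept_sum[i] / max(kept_n[i], 1) - drop_sum[i] / max(drop_n[i], 1)
            for i in range(len(tokens))]
```

For a model whose output depends only on one keyword, that keyword's score dominates and the other tokens score near zero, mirroring what LIME's surrogate weights would show.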
Examples are not enough, learn to criticize! Criticism for Interpretability
Motivated by the Bayesian model criticism framework, MMD-critic is developed, which efficiently learns prototypes and criticism, designed to aid human interpretability.
Leveraging Latent Features for Local Explanations
The new definition of "addition" uses latent features to move beyond the limitations of previous explanations and resolves an open question laid out in Dhurandhar et al.
A Unified Approach to Interpreting Model Predictions
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
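SHAP's attributions rest on the classic Shapley value, which can be computed exactly for tiny games by averaging each feature's marginal contribution over all orderings; SHAP's contribution is approximating this efficiently for real models. The two-feature value function below is an invented example for illustration.

```python
from itertools import permutations

def shapley_values(n, value):
    """Exact Shapley values for n players: average each player's marginal
    contribution over every ordering (O(n!) orderings, so toy sizes only)."""
    perms = list(permutations(range(n)))
    phi = [0.0] * n
    for order in perms:
        present = set()
        for i in order:
            before = value(present)
            present = present | {i}
            phi[i] += value(present) - before
    return [p / len(perms) for p in phi]

# Hypothetical additive game: feature 0 contributes 2, feature 1 contributes 1.
v = lambda s: (2.0 if 0 in s else 0.0) + (1.0 if 1 in s else 0.0)
```

A useful sanity check is the efficiency axiom: the attributions sum to the value of the full coalition minus the value of the empty one.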
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT (SBERT) is presented: a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.
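The cosine-similarity comparison SBERT enables is itself a one-line computation. The 3-dimensional vectors below are made-up stand-ins for real sentence embeddings (actual SBERT vectors are hundreds of dimensions).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical toy "sentence embeddings" for three sentences.
emb = {
    "a man is eating food":  [0.9, 0.1, 0.0],
    "a person eats a meal":  [0.8, 0.2, 0.1],
    "the stock market fell": [0.0, 0.2, 0.9],
}
```

With semantically meaningful embeddings, paraphrases score close to 1 while unrelated sentences score near 0, which is the property SBERT is trained to provide.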
Hierarchical interpretations for neural network predictions
This work introduces the use of hierarchical interpretations to explain DNN predictions through the proposed method, agglomerative contextual decomposition (ACD), and demonstrates that ACD enables users both to identify the more accurate of two DNNs and to better trust a DNN's outputs.
Efficient Data Representation by Selecting Prototypes with Importance Weights
This paper presents algorithms with strong theoretical guarantees to mine data sets and select prototypes, and presents a fast ProtoDash algorithm that works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016).