ConceptGlassbox: Guided Concept-Based Explanation for Deep Neural Networks

In Fig. 1, the process of explaining individual predictions is presented. It illustrates that when explanations are conveyed in terms of concepts that match the user's own understanding of those concepts, the user can make better-informed decisions with the assistance of a machine learning model. In the following, we present ConceptGlassbox, a novel framework for local interpretability that facilitates the interactive acquisition of transparent, high-level concept definitions in the latent space and provides an explanation in the form of a small set of concepts contributing to the prediction of the instance being explained. The primary goal of ConceptGlassbox is to build an interpretable model over the learned concepts that is faithful to the black-box model.

Fig. 1

Explaining individual predictions. a A set of images similar to the image being explained is retrieved. b Each image is segmented, resulting in a pool of segments, all coming from images similar to the one being explained. The activation space of one bottleneck layer of a state-of-the-art CNN classifier is used as a similarity space. Similar segments are clustered in the activation space, and outliers are removed to increase the coherency of the clusters. c ConceptGlassbox incorporates user feedback to form high-level intuitive concepts. d ConceptGlassbox builds an interpretable model on the learnt concepts and highlights the intuitive concepts, learnt from the interpretable model, that led to the explanation of the instance being explained

Fidelity-Interpretability Trade-Off

Let \(x'\in \mathbb {R}^{d}\) be the initial representation of an instance being explained. More specifically, we adopt a perspective whereby an explanation is represented by a model \(f\in F\), selected from a class of inherently interpretable models, including but not limited to linear models and decision trees. Model f is constructed over high-level intuitive concepts. Throughout this work, we refer to the model to be explained as z. For a classification task, \(z(x')\) is the probability that instance \(x'\) belongs to a particular class. To define locality around \(x'\), we use \(\pi _{x'}(t)\) as a proximity measure between an instance t and \(x'\). Additionally, let \(L(z, f, \pi _{x'})\) be a measure of how inaccurately f approximates z in the locality specified by \(\pi _{x'}\). We also introduce a complexity measure \(\Omega \) for the explanation model f. For example, \(\Omega (f)\) could be the number of non-zero weights in the case of a linear model or the depth of the tree in the case of a decision tree. To ensure that the explanation is both interpretable and locally faithful, we must minimize \(L(z, f, \pi _{x'})\) while keeping \(\Omega (f)\) low enough for the explanation to be easily interpretable by humans. The explanation produced by ConceptGlassbox is obtained as follows:

$$\begin{aligned} \zeta (x') = \mathop {\arg \min }\limits _{f\in F} L(z,f,\pi _{x'})+\Omega (f) \end{aligned}$$

(1)

Our objective is to train a two-stage prediction function, denoted as f, aimed at approximating the behavior of z in the proximity of \(x'\), leveraging a training dataset \(\{x_{i},y_{i}\}_{i=1}^{N}\), where x represents the input feature vector and y corresponds to the prediction made by z. The first stage of the prediction function involves a concept definition function, denoted as g, which maps the embeddings of x to binary concepts \(c\in \{0,1\}^{n}\), where n is the number of discovered concepts. The second stage, f, maps these concepts to the predictions y obtained from z. Our objective is to obtain an interpretable function f that is locally faithful to z while interactively training g to model the user's knowledge about concepts.
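To make the two-stage structure concrete, the following sketch (illustrative only; the helper names, and the choice of sparse logistic regression or a shallow decision tree as f, are our assumptions rather than the paper's exact implementation) shows how a concept definition function g and an interpretable model f over binary concepts could be composed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def g(embedding, concept_classifiers):
    """First stage: map a single instance embedding (1-D array) to a binary
    concept vector using one trained detector h_c per concept."""
    x = np.asarray(embedding).reshape(1, -1)
    return np.array([int(h.predict(x)[0]) for h in concept_classifiers])

def fit_interpretable_model(concept_vectors, blackbox_labels, use_tree=False):
    """Second stage: fit an inherently interpretable model f over concepts,
    using the black-box predictions z(x) as labels."""
    f = (DecisionTreeClassifier(max_depth=4) if use_tree
         else LogisticRegression(penalty="l1", solver="liblinear"))
    return f.fit(concept_vectors, blackbox_labels)
```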

Sampling for Local Exploration

We aim to minimize the locality-aware loss \(L(z,f,\pi _{x'})\) outlined in Eq. 1. To approximate the behavior of z in the vicinity of \(x'\), we estimate \(L(z,f,\pi _{x'})\) by drawing samples weighted by \(\pi _{x'}\). Specifically, we randomly select a set of instances \(I_{x'}\) from \(\{x_{i},y_{i}\}_{i=1}^{N}\) and assign a weight to each sampled instance based on its proximity to \(x'\) (see Fig. 1(a)). Instances in the vicinity of \(x'\) are assigned a higher weight, while those farther away receive a lower weight. In this work, we set the size of the sample \(I_{x'}\) to 500, deferring the exploration of a dynamic sample size to future work. Using the dataset \(I_{x'}\), we optimize Eq. 1 to derive an explanation \(\zeta (x')\). The ConceptGlassbox framework thus generates a locally faithful explanation, where the notion of locality is encapsulated by \(\pi _{x'}\).
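A minimal sketch of the weighted sampling step, under the assumption that instances are available as embedding vectors and that \(\pi _{x'}\) takes the form of an exponential kernel over Euclidean distance (the kernel form and width are our own choices, not specified above):

```python
import numpy as np

def sample_neighbourhood(X, x_prime, sample_size=500, kernel_width=1.0, seed=0):
    """Randomly sample instances and weight them by proximity to x_prime.

    X: (N, d) array of instance embeddings; x_prime: (d,) embedding of the
    instance being explained. Returns the sampled rows and their weights pi_x'.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    samples = X[idx]
    dists = np.linalg.norm(samples - x_prime, axis=1)
    weights = np.exp(-(dists ** 2) / (kernel_width ** 2))  # closer => higher weight
    return samples, weights
```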

Automatic Concept Discovery

The main goal of the concept extraction phase is to automatically extract meaningful concepts that are later refined through user feedback. ConceptGlassbox takes a trained classifier and an image to be explained. It then extracts the concepts present in \(I_{x'}\), where, in image data, concepts typically take the form of segments. To extract concepts from \(I_{x'}\), ConceptGlassbox starts by segmenting each image using a semantic image segmentation technique, which aims to assign a meaningful class to each pixel (see Fig. 1(b)). After examining several segmentation techniques, ConceptGlassbox adopts DeepLabv3+ [40], which has been widely used due to its superior performance on dense datasets. To ensure the meaningfulness of the extracted concepts, we cluster segments into a number of clusters such that segments of the same cluster represent a particular concept. We define the similarity between segments as the Euclidean distance between their corresponding activation maps obtained from an intermediate layer of model z. Each segment is resized to the input size of z. All segments are passed through z to obtain their layer representations and then clustered using the MeanShift clustering algorithm [41]. We then retain only the top 70 segments within each cluster, selected based on their smallest Euclidean distance from the cluster center, while discarding the remaining segments. We exclude two types of clusters. The first type comprises clusters consisting of segments sourced from only one image or a very small number of images. These clusters pose an issue as they represent uncommon concepts within the target class. For example, if numerous segments of the same type of tree appear in just one image, they may form a cluster due to their similarity; however, such clusters do not represent common concepts in the dataset. The second type encompasses clusters containing fewer than L segments. In this work, we use a constant value for L equal to \(0.4\sqrt{|c|}\), where \(|c|\) is the number of segments in cluster c, leaving the exploration of different values for L to future work. The main problem with clusters of few segments is that the concepts they present are uncommon in the neighborhood of the image being explained. To achieve a balance, we retain three categories of clusters: (a) high-frequency clusters (segments appearing in over half of the discovery images), (b) medium-frequency clusters with moderate popularity (segments appearing in more than one-quarter of the discovery images and a cluster size larger than the number of discovery images), and (c) high-popularity clusters (cluster size exceeding twice the number of discovery images). The output of this phase is a set of clusters representing the learnt concepts, denoted \(C=\{c_{1},\ldots,c_{n}\}\), with centers \(\{s_{1},\ldots,s_{n}\}\), where n is the number of concept clusters remaining after the exclusion criteria.
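The discovery step could be sketched as follows, assuming placeholder helpers `segment_image` (standing in for DeepLabv3+) and `activations` (returning the bottleneck-layer embedding of a resized segment); the exclusion rules described above would be applied to the returned clusters afterwards:

```python
import numpy as np
from sklearn.cluster import MeanShift

def discover_concepts(images, segment_image, activations, top_n=70):
    """Cluster segment activations into candidate concepts.

    segment_image(img) -> list of segment crops (placeholder for DeepLabv3+).
    activations(seg)   -> 1-D activation vector from a bottleneck layer of z.
    """
    acts = []
    for img in images:
        for seg in segment_image(img):
            acts.append(activations(seg))
    acts = np.vstack(acts)

    labels = MeanShift().fit_predict(acts)  # group similar segments
    clusters = {}
    for lab in np.unique(labels):
        members = np.where(labels == lab)[0]
        centre = acts[members].mean(axis=0)
        # keep only the top_n segments closest to the cluster centre
        order = np.argsort(np.linalg.norm(acts[members] - centre, axis=1))
        clusters[lab] = {"segments": members[order[:top_n]], "centre": centre}
    return clusters  # rare / tiny clusters are filtered out afterwards
```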

Interactive Concept Learning

To learn g, which maps the embeddings of x to \(c\in \{0,1\}^{n}\), we do the following. For each segment \(s\in c\), the hidden-layer activations \(a=z_{l}(s)\) at layer l are extracted and stored along with the corresponding concept label. For each candidate concept \(c\in C\), we train a binary logistic classifier \(h_{c}\) to detect the presence of concept c. Each classifier \(h_{c}\) is trained on the dataset \(D_{c}\), which comprises a combination of segments carefully balanced to include instances both with and without the presence of concept c. We define \(D_{c}=D^{+}_{c} \cup D^{-}_{c}\), where \(D^{+}_{c}=\{(z_{l}(s^{1}),y_{c}^{1}),\ldots, (z_{l}(s^{m}),y_{c}^{m})\mid y_{c}=1\}\) and \(D^{-}_{c}=\{(z_{l}(s^{1}),y_{c}^{1}),\ldots, (z_{l}(s^{m}),y_{c}^{m})\mid y_{c}=0\}\), and \(y_{c}\in \{0,1\}\) indicates the absence or presence of concept c in a segment. Negative examples \(D^{-}_{c}\) for each concept c are randomly selected from the other concept clusters such that the numbers of examples in \(D^{+}_{c}\) and \(D^{-}_{c}\) are equal. We then use these concept classifiers, for each image \(I_{x'}^{i} \in I_{x'}\), to create a binary vector \(v=(r_{1}, r_{2},\ldots, r_{n})\) representing the presence or absence of each concept \(c_{j}\in C\) in \(I_{x'}^{i}\), where \(r_{j}=h_{c_{j}}(I_{x'}^{i})\) and \(r_{j}\in \{0,1\}\).
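A hedged sketch of the per-concept detectors and the resulting concept vector; the aggregation rule that marks a concept as present when any segment of the image fires its detector is our own simplification, not stated above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_concept_classifiers(concept_activations, seed=0):
    """concept_activations: dict concept_id -> (m, d) activations of its segments.

    Returns one binary logistic classifier h_c per concept, trained on a
    balanced set of positive segments and negatives drawn from other concepts.
    """
    rng = np.random.default_rng(seed)
    classifiers = {}
    for c, pos in concept_activations.items():
        others = np.vstack([a for k, a in concept_activations.items() if k != c])
        neg = others[rng.choice(len(others), size=len(pos),
                                replace=len(others) < len(pos))]
        X = np.vstack([pos, neg])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
        classifiers[c] = LogisticRegression(max_iter=1000).fit(X, y)
    return classifiers

def concept_vector(image_segment_activations, classifiers):
    """Binary vector v: concept c is marked present if any segment of the image
    is classified as containing c by its detector h_c."""
    return np.array([int(clf.predict(image_segment_activations).any())
                     for clf in classifiers.values()])
```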

To incorporate the user's knowledge when learning function g, we interactively learn the representative set of segments associated with each concept. On the one hand, asking the user about the intuitiveness of every segment in every cluster results in a frustrating experience that treats the user as a mere information oracle. On the other hand, a design that ignores the needs of the machine learning model in favour of end-user flexibility may be equally frustrating if the user cannot properly train the model to recognize a desired concept. To assist a user in effectively steering a machine learning model while preserving the user's flexibility and controllability, we cluster all segments in each cluster \(c_{i}\) using the k-means clustering algorithm [42]. In this work, we employ a static value of k, specifically \(k = 15\), leaving the investigation of alternative values for future research. This phase yields d clusters \(C' =\{c'_{1},\ldots, c'_{d}\}\) with centres \(\{s'_{1}, s'_{2},\ldots, s'_{d}\}\), where d is the total number of clusters obtained by applying the k-means algorithm to C. Next, the user is asked whether a segment \(s'_{i}\) should be associated with concept \(c_{j}\). It is posited that the intuitiveness of function g is achieved when the user acknowledges the suggested association between the segments in a particular cluster \(c'_{i}\), represented by segment \(s'_{i}\), and a concept \(c_{j}\) for every (i, j) cluster-concept association in g. To learn a g that satisfies intuitiveness, we do the following. We define a binary matrix \(A\in \{0,1\}^{d\times n}\), where \(A_{ij}=1\) represents the association of the segments in cluster \(c'_{i}\) with concept j and \(A_{ij}=0\) represents the dissociation of cluster \(c'_{i}\) from concept j. We first initialize the matrix A by associating each cluster \(c'_{i}\) containing a centre segment \(s_{j}\) with the corresponding concept \(c_{j}\). Algorithm 1, which is adapted from [43], outlines the procedure for associating clusters with concepts. The algorithm constructs g on \(C'\) incrementally by suggesting cluster-concept proposals \((i^*,j^*)\) that the user either accepts or rejects. The algorithm generates proposals from pairs (i, j) that have not been previously explored. A predetermined number of proposals is produced for each concept before proceeding to the next one. In this work, we utilize a constant number of proposals per concept, denoted \(numproposals=10\). More specifically, each concept \(c_j\) is associated with two lists. The first is the explored list \(l_{j}\), which includes the clusters that have already been suggested to the user as possible associations with concept \(c_j\). The second is the unexplored list \(u_{j}\), which contains the set of clusters that have not yet been proposed for concept \(c_j\).

If the user approves the proposed cluster-concept pairing, the proposed cluster is incorporated into the definition of the concept, and the cluster-concept matrix is updated such that \(A_{i^*j^*}=1\). If the user rejects the proposal, the matrix remains unchanged. Initially, each list \(l_{j}\) is initialized with the single cluster i for which \(A_{ij}=1\), while \(u_{j}\) contains the remaining clusters not included in \(l_{j}\). Algorithm 1 outlines the process of modelling user feedback and proposing cluster-concept associations. The algorithm adapts its proposals based on the user's prior acceptance of proposed associations and iteratively refits model f after each update to function g. To do so, a matrix intuit is maintained to store the labels of the proposals that have been accepted or rejected by the user. This matrix is initialized with \(intuit_{ij}=1\) if \(A_{ij}=1\) in the concept definitions initialized by the user, and \(intuit_{ij}=0\) otherwise. If the user agrees with a suggested cluster-concept association, the matrix is updated by setting \(intuit_{ij}\) to 1. However, if the user disagrees, the matrix remains unchanged. It is noteworthy that a given cluster may be associated with multiple concepts. The primary challenge is to suggest cluster-concept associations that are both faithful to the model and intuitive to the user. If a proposal accurately reflects the black-box model but is not intuitive to the user, it will not be accepted, and the overall performance of f will not improve. Conversely, if a proposal is unfaithful, it will not enhance the performance of f even if the user accepts it. Therefore, the objective is to generate a substantial number of proposals that exhibit interpretability while maintaining a high level of predictive accuracy. To accomplish this goal, two scores are computed for each proposal: a fidelity score (SFid) and an intuitiveness score (SIntuit). \(SFid_{ij}\) assesses the degree to which model f accurately captures the behavior of z in the vicinity of \(x'\) when cluster i is associated with concept j.

Algorithm 1

Algorithm for interactively proposing intuitive and interpretable concepts with human feedback
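Algorithm 1 itself is not reproduced here; the sketch below gives a simplified, assumption-laden rendering of the bookkeeping it performs, with the user's response and the proposal-scoring step abstracted as callbacks (names and structure are illustrative):

```python
import numpy as np

def run_proposal_loop(A, intuit, explored, unexplored, propose, ask_user,
                      refit_f, num_proposals=10):
    """A, intuit: (d, n) binary matrices over clusters x concepts.

    explored, unexplored: per-concept lists of cluster indices (l_j, u_j).
    propose(j, explored_j, unexplored_j) -> candidate cluster i* for concept j.
    ask_user(i, j) -> True if the user accepts associating cluster i with concept j.
    refit_f() refits the interpretable model after each accepted update to g.
    """
    n_concepts = A.shape[1]
    for j in range(n_concepts):
        for _ in range(num_proposals):
            if not unexplored[j]:
                break
            i_star = propose(j, explored[j], unexplored[j])
            unexplored[j].remove(i_star)
            explored[j].append(i_star)
            if ask_user(i_star, j):          # accepted: extend concept definition
                A[i_star, j] = 1
                intuit[i_star, j] = 1
                refit_f()
            # rejected proposals leave A and intuit unchanged
    return A, intuit
```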

\(SFid_{ij}\) is calculated for each concept \(c_j\), and the scores are ranked to prioritize proposals that enhance the fidelity of the model. The \(SFid_{ij}\) score is computed by updating f under the assumption that the user approves the (i, j) proposal. Our objective with \(SIntuit_{ij}\) is to evaluate the likelihood of the user accepting the association between cluster i and concept j. We calculate \(SIntuit_{ij}\) for each concept \(c_j\) and prioritize proposals based on the ranking of these scores to increase the chances of acceptance by the user. To achieve coherence in our concepts, we assume that the user is more inclined to approve the association of cluster i with concept j if another, similar cluster \(i'\) has already been associated with concept j. The similarity between two clusters is defined by the Euclidean distance (denoted D) between the centres of the two clusters. We determine the likelihood of the user accepting the association of cluster i with concept j as follows:

$$\begin{aligned} SIntuit_{ij} = \exp \left( -\frac{\sum _{i'} intuit_{i'j}\, D(s'_{i},s'_{i'})}{\sum _{i'} intuit_{i'j}} \right) \end{aligned}$$

(2)

Our objective is to produce cluster-concept proposals that are both highly intuitive and faithful. To achieve this, we rank the proposals based on the Pareto front of the trade-off between intuitiveness and fidelity, selecting the one with the highest rank.
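A small sketch of the Pareto-based selection, treating both \(SFid_{ij}\) and \(SIntuit_{ij}\) as higher-is-better scores; the tie-break among non-dominated proposals is our own illustrative choice, not specified above:

```python
import numpy as np

def pareto_front(scores):
    """scores: (m, 2) array of (SFid, SIntuit) per proposal. Returns indices of
    non-dominated proposals (no other proposal is >= on both and > on one)."""
    front = []
    for i, s in enumerate(scores):
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            front.append(i)
    return front

def pick_proposal(scores):
    """Select one proposal from the Pareto front, here the one with the best
    sum of min-max normalized scores (an illustrative tie-break)."""
    front = pareto_front(scores)
    norm = (scores - scores.min(axis=0)) / (np.ptp(scores, axis=0) + 1e-12)
    return front[int(np.argmax(norm[front].sum(axis=1)))]
```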

Fig. 2

A shallow concept-based explanation decision tree, with a depth of 4, is employed to elucidate the prediction of an image depicting the coastline

Constructing Local Explanation

ConceptGlassbox constructs the explanation model over a dataset consisting of the concept representation v of each image \(I^{i}_{x'}\in I_{x'}\) along with the class prediction obtained from z. ConceptGlassbox is based on the view that a satisfactory explanation should explain both the prediction of the instance being explained and a user-defined counterfactual decision. The framework employs two distinct explanation models for explaining instances in terms of high-level concepts. Moreover, ConceptGlassbox offers a counterfactual explanation, which delineates the minimal alteration of the feature values required to change the prediction to a predefined output [44]. The first model is a decision tree classifier, favoured for its interpretability: concept rules can be derived from a root-to-leaf path of the tree, and counterfactuals can be extracted via symbolic reasoning. To search swiftly for counterfactuals, the framework considers all paths in the decision tree that lead to a user-specified decision and selects the path with the fewest split conditions unsatisfied by \(x'\). As the depth of the decision tree increases, its prediction accuracy improves, but its interpretability decreases because the number of nodes grows rapidly. Therefore, a shallow decision tree is favoured for its enhanced comprehensibility. In this work, a fixed depth of 4 is used, with the investigation of dynamic depth left for future work. Figure 2 illustrates an explanation tree for an image predicted as a coast. The explanation tree indicates that the image has been classified as a coast due to the presence of the concepts ‘mountain’ and ‘sea’. ConceptGlassbox also provides a counterfactual explanation for a user-defined counterfactual decision (i.e., snowy mountain): the path in the decision tree in which the presence of the concepts ‘mountain’, ‘sea’, and ‘tree’ leads to a snowy-mountain prediction for the instance being explained. The second explanation model used by ConceptGlassbox is logistic regression, favoured for the interpretability afforded by its concept weights. To obtain a counterfactual explanation from the logistic regression model, the following approach is taken. First, the concept representation \(x''\) of the instance being explained \(x'\) is obtained. Let \(min_{y'}(x'')\) be the vector obtained by modifying the minimum number of concepts in \(x''\) such that \(f(min_{y'}(x''))=y'\) and \(f(x'')=y\), where \(y'\) is a user-specified counterfactual decision and \(y\ne y'\). A perturbation of \(x''\) is thus a minimal change in its concepts that flips the prediction to \(y'\). We compute all such perturbations of \(x''\) and select the perturbation that exhibits the highest predicted probability of class \(y'\).
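For the logistic regression model, the perturbation search described above could be sketched as a brute-force enumeration over concept flips (feasible because concept vectors are short). Selecting the flip set with the highest predicted probability of the target class follows the description above, but the function name, enumeration strategy, and interfaces are our assumptions:

```python
from itertools import combinations
import numpy as np

def logistic_counterfactual(f, x_concepts, target_class, max_flips=None):
    """f: fitted classifier over binary concepts (e.g. sklearn LogisticRegression).
    x_concepts: 1-D numpy array of 0/1 concept indicators for the instance.
    Returns the perturbed vector reaching target_class with the fewest concept
    flips, breaking ties by the predicted probability of target_class."""
    n = len(x_concepts)
    max_flips = max_flips or n
    target_idx = list(f.classes_).index(target_class)
    for k in range(1, max_flips + 1):            # fewest flips first
        best, best_prob = None, -1.0
        for flip in combinations(range(n), k):
            cand = x_concepts.copy()
            cand[list(flip)] = 1 - cand[list(flip)]   # flip the chosen concepts
            proba = f.predict_proba(cand.reshape(1, -1))[0, target_idx]
            if f.predict(cand.reshape(1, -1))[0] == target_class and proba > best_prob:
                best, best_prob = cand, proba
        if best is not None:
            return best, best_prob
    return None, 0.0
```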
