Semantic Explanation for Deep Neural Networks Using Feature Interactions

Published: 15 November 2021

Abstract

Given the promising results obtained by deep-learning techniques in multimedia analysis, the explainability of predictions made by networks has become important in practical applications. We present a method to generate semantic and quantitative explanations that are easily interpretable by humans. Previous work on obtaining such explanations has focused on the contributions of each feature, taking their sum to be the prediction result for a target variable; the lack of discriminative power due to this simple additive formulation led to low explanatory performance. Our method considers not only individual features but also their interactions, for a more detailed interpretation of the decisions made by networks. The algorithm is based on the factorization machine, a prediction method that calculates factor vectors for each feature. We conducted experiments on multiple datasets with different models to validate our method, achieving higher performance than the previous work. We show that including interactions makes the generated explanations richer and able to convey more information. We show examples of produced explanations in a simple visual format and verify that they are easily interpretable and plausible.

1 INTRODUCTION

Deep learning techniques have developed quickly in recent years, achieving high performance in such diverse areas as computer vision [16, 30, 46] and natural language processing [29]. Although research has shown the discriminative power of deep neural networks (DNNs), the grounds on which such networks make decisions have not been fully clarified. This is a major problem: without explanations for decisions, users cannot be certain that their models learn the proper knowledge. Moreover, convincing explanations for models have additional value in business situations, where they can be used to improve services or products. Thus, many studies have addressed the explainability problem [1, 3, 34, 47].

There are various methods for explaining networks. In the case of convolutional neural networks (CNNs), one of the most popular strategies is to visualize the areas where the model is paying attention [36]. These methods measure gradients of CNN weights in response to the change in output; they are not able to capture semantic information. To solve this problem, several network analysis studies have attempted to produce more semantic and interpretable visualization methods [52]. However, these were done by incorporating explanation modules into models that were designed for other tasks, sacrificing performance. Other studies developed post hoc explaining models using knowledge distillation: explainable models were trained to imitate models designed for the original tasks [6].

Chen et al. [3] also employed knowledge distillation to produce semantic and quantitative explanations: explaining models could produce explanations for prediction while learning to output the same values as the original predictors. One of the methods they introduced describes attributes by the addition of contributions of visual concepts. More specifically, after they trained a predictor for multi-label classification, they obtained the contributions of each pre-defined visual concept to a target attribute’s classification result by calculating the product of a classification probability and its weight. They then trained an “explainer” so that the sum of the contributions became close to the classification result of the target attribute from the original predictor. In this way, they produced humanly comprehensible explanations without harming the performance of the original tasks. Nevertheless, their method has a drawback: the performance of the explainer is lower than that of the original problem-solving model (the predictor) by a large margin. This is due to the simplicity of the composition of the explainer; that is, the target prediction by the explainer is simply the sum of the contributions of the visual concepts.

In order to improve the performance while preserving or even enhancing the explanation, we propose a method of producing explanations using feature interactions. We were inspired by the factorization machine (FM) [33]. FM combines the advantages of support vector machines and factorization models. It has been a popular option for predictive tasks such as click-through rate prediction and advertisement recommendation. Although FM is not a deep-learning-based method, some recent methods using DNNs have incorporated it into their architectures [5, 11]. In our work, we extend the method of Chen et al. [3], described above, to make the best use of FM. There have been few works that introduce feature interactions as explanations of DNNs. In contrast to the interactions computed in these existing methods, ours calculates factor vectors, which are used for calculating interactions, for every input. This calculation method enables us to consider feature interactions as explanations.

The flow of our proposed architecture is as follows. We first train a predictor to predict all the attributes; then we train an explainer that can take feature interactions into account. Weights for attributes and factor vectors are obtained from the explainer to calculate the contributions of each individual attribute and their interactions. These contributions are our explanations for the prediction results. This method enables us to analyze explanations quantitatively and semantically. The sum of the contributions and a bias term is taken as the output value of the explainer, and the explainer is trained to output values close to those of the prediction model that it mimics. In other words, the explainer model gains knowledge from the prediction model and imitates its decision logic. Note that the interactions in our method are different from those of FM in that factor vectors are not simple embeddings of features, but variables derived from each input.

The experiments, including both classification and regression tasks, verify the effectiveness of our method. Because each dataset has different types of input/output and tasks (classification and regression), we have designed and trained different models on each one. Our method, although intended mainly for visual analyses, is applicable to multimedia data, so we also tested it on a TV ads dataset. A comparison between our approach and models without interactions shows that ours can achieve higher performance. Furthermore, visualized explanations are displayed to show that these explanations are humanly interpretable and become richer thanks to interactions. The contributions of this work are as follows:

  • We propose a method of explaining networks with the aid of feature interactions that yields easily interpretable semantic and quantitative explanations. The weight factor vector of FM is dynamically tuned depending on the input data.

  • We demonstrate different kinds of networks for the multiple datasets used in our experiments so as to cope with various types of input and output.

  • Our method performs better on the datasets than does a similar method without interactions.

2 RELATED WORKS

2.1 Explainability in Deep Learning

Visualization: When a CNN is the base of a model, visualization methods are often used to explain the decisions of the networks. When the areas where the model is focusing are highlighted, it becomes clear which parts affect the prediction result significantly. One visualizing strategy is to illustrate gradients of weights in layers caused by back propagation through trained models [8, 27, 37, 48, 49]. In this strategy, inverse operations are defined for every layer in the CNN model; gradients can then be taken from any layer to analyze their nature individually. Class activation map (CAM) is one of the most common visualization techniques to highlight areas corresponding to specific classes, and its successor, Grad-CAM [36], is based on the same principles. Grad-CAM makes heatmaps from activation maps weighted by gradients of the class score, instead of by the learned weights as in CAM. One drawback of these types of approaches is that they are mainly targeted at classification problems.

Another major explanatory strategy aimed at visualization employs perturbation methods. These methods learn the relationship between inputs and outputs by adding changes to the inputs and calculating the extent to which the outputs are altered by the change. Perturbation has been implemented in various ways, such as sliding an occlusion mask over an input image [34, 48, 54], masking a word in input sentences, or masking a feature from a hidden layer [21]. Zintgraf et al. [55] proposed a method for calculating the relevance between regions in an input image and classes. They estimated the conditional probability of the presence of a certain class under the condition that a part of an image was perturbed; the difference between the probability and the original prediction result for the whole image was assumed to be the contribution made by the perturbed area to the classification result.

Model revision for interpretable features: Many reported studies, instead of taking a post hoc approach, attempt to improve the model itself so that interpretable features can be obtained [42]. Generative models have developed so remarkably that the generated images are almost completely natural, but the lack of interpretability in the latent space has limited the range of applications. In order to interpret generative-model decision-making and control attributes in generated images, disentanglement has been explored [4, 15]. Zhang et al. [52] proposed an interpretable CNN in which each filter in a high convolutional layer represents a specific object part by introducing a loss for each filter.

Joint training: Visualization methods that involve analyzing pre-trained models have limited expressiveness. On the other hand, incorporating interpretability into models can damage their discriminative power. To overcome these problems, approaches have emerged that introduce additional tasks [17, 19, 31]. In this approach, not only the original model but also additional models solving other tasks are trained jointly. One of the most intrinsic strategies is to generate explanations in text format. Hendricks et al. [14] have proposed a phrase-critic model to refine candidate explanatory sentences by comparing their accumulated scores for each noun they contain and selecting the one that is the most class- and image-relevant. Zellers et al. [50] have formulated a new task called visual commonsense reasoning, the task of answering questions with a thorough visual understanding. They have collected data that contains questions, answers, and rationales and introduced a method for solving the task that consists of grounding, contextualization, and reasoning. Another joint training method is explanation by prototypes: the prediction result of an input image is explained by a subset of the training datasets [2, 22, 28].

Knowledge distillation: To enhance interpretability, it has been suggested that an explainable model could distill knowledge from a prediction model [6, 12]. Various ways of distilling knowledge into more interpretable models, such as decision trees and graphs, have been investigated [10, 51]. This kind of strategy is applicable not only to images but also to other domains such as videos [18]. Our work is closely related to one of the distillation methods proposed by Chen et al. [3], which describes prediction results using pre-defined visual concepts, with a prior weight to prevent biased interpretation. We describe this method in more detail in Section 3. Our modification of it achieves higher performance by enhancing its discriminative power. In addition, our method can generate more detailed explanations.

2.2 Feature Interaction

Implicit interaction interpretation: There have been some approaches to detect and interpret feature interactions [9, 35]. Some works have tackled interpretation of complex models using features [26, 39]. However, feature interactions were not discussed in these methods. One of the most recent works about detecting interactions in neural architectures is [44], the goal of which is closely related to ours. Our method, however, aims to consider interactions explicitly and is furthermore capable of comprehensive interpretation that covers not only interactions but also each feature separately.

Learning explicit interaction: In order to improve prediction performance on the high-dimensional and sparse data often seen in recommender systems, factorization models such as Matrix Factorization [20, 41] have gained popularity. Rendle [33] proposed FM, which combines the advantages of support vector machines and factorization models. FM works as a general predictor with any real-valued feature vector. By using factorized parameters, it can model interactions between all the input parameters even in the case of sparsity. As DNNs have gained popularity, there have been many other attempts to integrate interactions in deep learning models [5, 11, 23, 32, 40, 45, 53].

3 APPROACH

In this section, we explain how our model works in detail. First, we provide an overview of our whole architecture, and then illustrate its components individually.

3.1 Overview

An overview of our model is shown in Figure 1. Our goal is to explain prediction results by the contributions from every individual attribute and their interactions; hence, our model is a combination of two sub-models that we term the “performer” and “explainer,” following [3]. First, a prediction task is solved by the performer; then, an explanation is generated by the explainer using the predicted values. The prediction results are explained by the contributions of single features and their interactions. More precisely, we acquire from the explainer a vector whose size is the number of explanatory attributes plus the number of attribute pairs. The vector represents numerical contributions to the prediction result, by which we can tell how much each attribute or interaction contributes to the result, and we treat the vector as an “explanation” for the network being explained.

Fig. 1. Overview of our model. It is composed of a performer model that performs a prediction task and an explainer model that explains the prediction result. The explainer distills knowledge from the performer. Face image is from CelebA dataset [25].

We first define the variables used in Figure 1 and the following sections. Given an input instance $I$, we denote the output value from the performer by $y$ and the output value from the explainer by $\hat{y}$. The predictor of the target attribute in the performer is denoted by $F$, and those of the other attributes by $f_1, \dots, f_n$, where $n$ represents the number of attributes used to explain the prediction. The attributes’ prediction results are $y_1, \dots, y_n$, whereas $\boldsymbol{\alpha} \in \mathbb{R}^{n}$ is a vector that denotes the weights for single attributes and $V \in \mathbb{R}^{n \times l}$ is a matrix containing the factor vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$. The explainer consists of $g$ (for producing $\boldsymbol{\alpha}$) and $h$ (for producing $V$). The size of a factor vector is set to $l$. These variables are used to calculate the contributions, as we explain in the next section.

3.2 Explainer Algorithm

In this section, we illustrate how the explainer works and how to acquire explanations for the predictor results. We formulate the explainer model as

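$$\hat{y} = b + \sum_{i=1}^{n} \alpha_i y_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, y_i y_j, \tag{1}$$

where $\hat{y}$ is the explainer output, $y_i$ is the performer's predicted value for the $i$th attribute, $\alpha_i$ is the explainer's weight for that attribute, $\mathbf{v}_i \in \mathbb{R}^{l}$ is the $i$th factor vector, and $\langle \cdot, \cdot \rangle$ means an inner product of two vectors. The first term in the equation, $b$, is a bias, the second refers to the contributions from attributes, and the third to contributions made by attribute interactions. In the second term, $\alpha_i y_i$ is the product of a weight from the explainer and a predicted value from the performer, measuring the extent to which the $i$th attribute contributes to the target attribute. The third term is newly added to the formula of the previous method [3]. $\langle \mathbf{v}_i, \mathbf{v}_j \rangle \, y_i y_j$ represents the interaction between the $i$th and $j$th attributes. It is defined as the product of their predicted values from the performer with the inner product of the $i$th and $j$th factor vectors.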

It is important to note that the weights and the factor vectors are dynamically calculated depending on the attributes as well as the input instance and the parameters in the explainer, which means that every set of input data produces a different set of factor vectors; this contrasts with the usual linear regression or FM model. These variables thus give us more expressive power than do traditional methods.
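As a minimal sketch of how the explanation described above could be assembled, the following pure-Python snippet computes the explainer output as a bias plus single-attribute contributions plus pairwise interaction contributions. The function name `explain`, its argument names, and the toy values are assumptions made for illustration only; in the actual model the weights and factor vectors would be produced for each input by g and h.

```python
from itertools import combinations


def explain(bias, weights, factors, preds):
    """Return (output, single_contribs, pair_contribs) for one input.

    weights[i]  : explainer weight for attribute i (hypothetical name)
    factors[i]  : factor vector of attribute i, a list of length l
    preds[i]    : performer's predicted value for attribute i
    """
    n = len(preds)
    # Contribution of each single attribute: weight times predicted value.
    single = [weights[i] * preds[i] for i in range(n)]
    # Contribution of each attribute pair: inner product of the two factor
    # vectors times the product of the two predicted values.
    pair = {}
    for i, j in combinations(range(n), 2):
        dot = sum(a * b for a, b in zip(factors[i], factors[j]))
        pair[(i, j)] = dot * preds[i] * preds[j]
    output = bias + sum(single) + sum(pair.values())
    return output, single, pair


# Toy example with n = 3 attributes and factor vectors of size l = 2.
preds = [0.5, 1.0, 0.0]
weights = [2.0, -1.0, 4.0]
factors = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
output, single, pair = explain(0.5, weights, factors, preds)
print(output, single, pair)
```

The explanation here has n + n(n-1)/2 entries (3 single attributes and 3 pairs in the toy case), which matches the vector size described in the overview.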

We will now take a closer look at the interaction term. A straightforward approach to computing the contribution of the interactions would be calculating each interaction for every pair of attributes and then adding them up. However, this would be a very time-consuming operation, because its computational complexity is $O(ln^2)$. To reduce the computational load, we follow the reformulation used in the original FM [33]. The third term in Equation (1) is calculated as follows in an actual model, omitting the intermediate steps of the derivation:

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, y_i y_j = \frac{1}{2} \sum_{f=1}^{l} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, y_i \right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2} \, y_i^{2} \right], \tag{2}$$

where $v_{i,f}$ is the $f$th element of the $i$th factor vector and $y_i$ is the performer's predicted value for the $i$th attribute. This reduces the complexity to $O(ln)$. In the training phase, interactions are calculated in this way. By contrast, in the testing phase, we calculate all pairwise interactions so as to clarify which kinds of attribute interactions make large contributions and which do not.
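The equivalence between the explicit pairwise sum and the reformulation borrowed from FM [33] can be checked numerically. The sketch below is illustrative only: the function names and the random toy inputs are assumptions, and the sizes (39 attributes, factor vectors of size 2) simply mirror those reported for Experiment 1.

```python
import random


def interactions_pairwise(factors, preds):
    """Sum the interaction of every attribute pair explicitly (quadratic in n)."""
    n = len(preds)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(factors[i], factors[j]))
            total += dot * preds[i] * preds[j]
    return total


def interactions_fast(factors, preds):
    """Same quantity via the FM reformulation (linear in n)."""
    size = len(factors[0])
    total = 0.0
    for f in range(size):
        # Square of the sum minus the sum of squares, per factor dimension.
        s = sum(v[f] * y for v, y in zip(factors, preds))
        sq = sum((v[f] * y) ** 2 for v, y in zip(factors, preds))
        total += s * s - sq
    return 0.5 * total


rng = random.Random(0)  # fixed seed so the toy inputs are reproducible
n, size = 39, 2
factors = [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(n)]
preds = [rng.random() for _ in range(n)]

slow = interactions_pairwise(factors, preds)
fast = interactions_fast(factors, preds)
print(abs(slow - fast) < 1e-9)
```

The fast form only yields the total interaction, which is why the explicit pairwise form is still needed at test time when individual pair contributions are to be inspected.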

3.3 Training Process

First, the performer is trained to solve a prediction task, for example, multilabel classification or regression. We use cross entropy as the performer’s loss function for classification tasks and mean squared error for regression tasks. Then, the explainer is trained using the output values from the performer. The value predicted by the explainer, which is calculated in the manner illustrated in the previous section, is expected to be similar to the prediction result of the performer because we train the explainer to mimic the behavior of the performer. The loss function for training the explainer is defined as follows:

$$\mathrm{Loss} = \lVert \hat{y} - y \rVert^{2} + \mathrm{PriorLoss}, \tag{3}$$

where $y$ is the performer output and $\hat{y}$ is the explainer output. The first term minimizes the squared error between the performer and explainer outputs. PriorLoss was proposed in [3] as a solution for the problem of biased interpretation: simply minimizing the error between the performer and explainer outputs makes the explainer select fewer attributes, resulting in biased explanations. To avoid this, the prior weights are approximated as the derivatives of the performer output with respect to the attribute predictions, and the difference between the priors and the weights is minimized. PriorLoss is defined in terms of the L2 norm of this difference, scaled by a factor that depends on the current epoch t and a constant coefficient. We use this loss function in some of our experiments to penalize the weights of the additive function of attribute contributions.
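The structure of this training objective can be sketched as follows. This is a simplified illustration under stated assumptions, not the formulation of [3]: the prior term is taken to be a squared L2 distance between prior weights and explainer weights, and its scaling (which in the text depends on the epoch and a constant) is not reproduced but passed in directly as the hypothetical argument `prior_coeff`. All names are made up for this example.

```python
def explainer_loss(performer_out, explainer_out, weights, priors, prior_coeff):
    """Distillation error plus an optional penalty pulling weights toward priors.

    performer_out, explainer_out : per-sample outputs of the two models
    weights, priors              : explainer weights and prior weights
    prior_coeff                  : scaling of the prior term (assumed given)
    """
    n_samples = len(performer_out)
    # Mean squared error between performer and explainer outputs.
    mse = sum((p - e) ** 2 for p, e in zip(performer_out, explainer_out)) / n_samples
    # Assumed form of the prior term: squared L2 distance to the priors.
    prior_penalty = sum((w - a) ** 2 for w, a in zip(priors, weights))
    return mse + prior_coeff * prior_penalty


performer_out = [1.0, 0.0]
explainer_out = [0.5, 0.0]
weights = [1.0, 2.0]
priors = [0.0, 2.0]

# A coefficient of 0 switches the prior term off entirely.
loss_without_prior = explainer_loss(performer_out, explainer_out, weights, priors, 0.0)
loss_with_prior = explainer_loss(performer_out, explainer_out, weights, priors, 0.5)
print(loss_without_prior, loss_with_prior)
```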

4 EXPERIMENTS

We used three datasets to validate our method. Each dataset has a different domain of input data and a different type of annotation. Testing our model in these different settings allows us to verify that it is useful in many applications.

4.1 Experiment 1: CelebA Dataset

The first dataset is the CelebA dataset [25]. This is a face attribute dataset including 200K celebrity images, each with 40 attribute annotations such as “Eyeglasses” and “Smiling.” In this experiment, we set “Attractive,” “Heavy Makeup,” “Male,” and “Young” as the attributes to be explained by the rest. These global attributes are selected as targets because they can be intuitively explained by combinations of other local features.

The model architecture is illustrated in Figure 2. The input is an image. We use VGG16 [38] as a base model for the performer and ResNet152 [13] for the explainer; both are pretrained on ImageNet [7]. $F$ and $f_i$ serve as the performer’s prediction heads that output predictions for each attribute. The explainer’s prediction heads, $g$ and $h$, which regress the weight and factor vector for each attribute, share the same architecture with different parameters following ResNet152. Layers composing these models are listed in Table 1. The number of explanatory attributes $n$ is 39. The dimension of factor vectors $l$ is set to 2.

Fig. 2. Model architecture used for the experiments on CelebA dataset and DeepFashion dataset. The Face image is from CelebA dataset [25].

Table 1. Architectural Specification of the Performer and Explainer’s Prediction Head

Layer      Specification
Linear
ReLU
Dropout
Linear
ReLU
Dropout
Linear     n-1l

  • The performer’s prediction head follows VGG16, and the explainer’s follows ResNet152. The dimension of the last output is l only in h to regress factor vectors.

We first train a performer with a cross-entropy loss and then an explainer with the loss function of Equation (3). In PriorLoss, the constant coefficient is set to 10.

4.2 Experiment 2: DeepFashion Dataset

The second dataset is the DeepFashion dataset [24]. The DeepFashion database includes many benchmarks available for various purposes. We select the Category and Attribute Prediction Benchmark because it contains rich annotations suitable for explanations. Coarse annotation has five types of attributes: texture, figure, shape, part, and style. Because the annotation includes as many as 1,000 attributes, we reduce the number of attributes and data. The benchmark includes many kinds of clothes (denim jacket, long skirt, T-shirt, etc.). In order to limit the number of attributes, we use only the data for tops. Then the 100 most frequent attributes are selected, for example, “Print,” “Knit,” and “Shirt”; the rest are discarded. As a result, the number of data points is about 140K. In this experiment, we select “Classic,” “Basic,” “Cute,” and “Soft” as attributes to be explained by the other attributes, as they describe clothes’ global features.

The model architecture used in this experiment and its training process are the same as those of Experiment 1. The input is an image. The number of explanatory attributes n is 99. The dimension of factor vectors is 2, as in Experiment 1.

4.3 Experiment 3: TV Ads Dataset

The last dataset used is the TV ads dataset, a collection of 14,990 commercial videos that were actually broadcast on TV in Japan between January 2006 and April 2016. Each video was evaluated and annotated by 600 participants. The dataset was collected to predict the following four impressional and emotional effects:

  • Favorability rating (F): how much participants liked the content of the advertisement itself

  • Interest rating (I): how much participants became interested in the product/service

  • Willingness rating (W): how much participants felt like buying the product/service

  • Recognition rating (R): how much participants remembered the advertisement

Besides the videos, the dataset contains metadata such as information about the casts featured in the ad. In addition, scores are given to 26 attributes that describe the ad, such as “Good story” and “Impressive.” In the present experiment, we attempt to explain each of the four effects by these attributes.

The effects and the attributes are continuous values, not binary labels, and the performer’s prediction task is necessarily a regression problem, in contrast to the previous two experiments. Hence, a different architecture is needed. We illustrate the model in Figure 3. Input data consists of frame deep features extracted from video, sound, metadata, cast data, text in frames, and narration data. As the base model for both the performer and the explainer, we employ a multimodal fusion model using an attention mechanism proposed in previous research [43]. F regresses one of the four effects; f outputs an n-dimensional vector (where n is 26) that represents the predicted attributes. In contrast to the model in Figure 2, F and f output the target prediction and explanatory attributes prediction independently. The architecture of the explainer is otherwise similar to that in Figure 2: g and h share the base model, and its branches produce the attribute weights and the factor vectors, respectively. We set the PriorLoss coefficient in Equation (3) to 0; that is, we do not employ PriorLoss in this experiment. As in the other experiments, l is set to 2.

Fig. 3. The model architecture used in Experiment 3. The performer solves regression tasks. The performer and explainer have the same multimodal prediction module.

4.4 Results

We show the accuracy or correlation coefficients of each experiment in Tables 2, 4, and 6. We also show the conditional entropy in Tables 3, 5, and 7. The first row shows the result from the explainer in the method of Chen et al. [3], the second row shows our method’s explainer, and the third row shows the performer. In the experiment, we compared our method to the previous method to show that feature interaction improves the explainer’s performance. Conditional entropy of explanation presented in the previous work [3] is not appropriate for evaluating our method because the weights of attributes and their interactions were not approximated as they were in the previous one.

Table 2. Results of Experiment 1 on the CelebA Dataset

Method                             Attractive   Heavy Makeup   Male    Young
Explainer w/o interaction [3]      0.789        0.899          0.960   0.812
Explainer w/ interaction (ours)    0.815        0.911          0.970   0.875
Performer                          0.819        0.912          0.977   0.881

  • The evaluation metric is classification accuracy.

Table 3. Results of Experiment 1 on the CelebA Dataset

Method                             Attractive   Heavy Makeup   Male    Young
Explainer w/o interaction [3]      9.81         9.80           9.81    9.81
Explainer w/ interaction (ours)    9.80         9.81           9.81    9.82
Performer                          9.85         9.86           9.81    9.88

  • The evaluation metric is the conditional entropy of the prediction.

For Experiments 1 and 2, we reimplemented the previous method to use it as a comparative method. There could be a slight difference between our implementation and that of Chen et al. [3], since their paper misses some details about the model; nevertheless, as the first and third rows in Table 2 show, the performer and explainer implemented by us achieve almost the same performances as those reported in the paper. This implies that our implementation can accurately reproduce the method of Chen et al. [3].

Table 2 shows the results of the experiment on the CelebA dataset. As mentioned above, our proposed method is compared with an interaction-free method from the literature. The table shows that explainers perform better with feature interaction whatever the target attribute, and attain accuracies close to those of performers. This indicates that feature interaction is capable of increasing both explainability and the model’s discriminative power at the same time. To verify that our model using interaction can produce reasonable explanations, we display an example in Figures 4 and 5, picked from test data in the CelebA dataset. These are explanations of why the performer judged the image to be “Attractive.” The horizontal axis is the attribute label and the vertical axis is the contribution to the prediction. Figure 4 shows an explanation produced with the method of Chen et al. [3]. The 20 largest contributions are sorted in descending order. Figure 5 shows an explanation produced by our method. The top row shows the contribution from single attributes and the bottom row shows that from attribute interactions (the 20 largest for each). The previous method already achieved quantitative and semantic explanations; however, ours is able to consider not only single-attribute contributions but also interactions, resulting in less biased and more insightful explanations. Examining Figure 5 in more detail, the explainer suggests that attributes such as “Double chin” and “Bushy eyebrows” contribute to “Attractive” for this face image and so do the attribute interactions, including “No beard & Young” and “Male & Young.” The explanation is reasonable and easily interpretable by humans. In addition, the explanation shows that contributions made by feature interactions such as “No beard & Young” are larger than those made by single features such as “Double chin.” This suggests that the performer emphasizes the feature interaction when it performs prediction, and that our method is able to detect this successfully.
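The ranking step behind such charts, keeping the largest contributions and sorting them in descending order, can be sketched as follows. The function name `top_contributions` is hypothetical, and the numeric values are invented purely for illustration; they are not results from the experiments.

```python
def top_contributions(contribs, k=20):
    """Return the k (label, value) pairs with the largest contributions,
    sorted in descending order of contribution."""
    ranked = sorted(contribs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Invented toy values, mixing single attributes and attribute pairs.
contribs = {
    "Double chin": 0.02,
    "Bushy eyebrows": 0.05,
    "No beard & Young": 0.31,
    "Male & Young": 0.12,
}
top_two = top_contributions(contribs, k=2)
print(top_two)
```

In practice, single-attribute and interaction contributions would be ranked separately, since the two rows of the figure use different scales.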

Fig. 4. Example of explanations produced by the explainer without interactions in experiments on the CelebA dataset. This and the one below explain a prediction result for “Attractive.”

Fig. 5. Example of explanations produced by the explainer with interactions in experiments on the CelebA dataset [25]. Note that the scale between the two rows is different.

Table 4 shows the results of the experiment on the DeepFashion dataset. As can be seen, introducing interactions to the explainer does not improve prediction performance in this domain. There are two possible reasons for this. The first is that the annotations are so coarse that the interactions contain more error. Since an interaction is the product of the probabilities of two attributes and the inner product of their factor vectors, error tends to be amplified. The second possible reason is that the task itself is so simple that explanation models can easily converge to the optimum regardless of variation of explanatory variables. We find that conditions such as the number of attributes or complexity of the model are strongly related to the quality of explanation and prediction performance and thus need to be carefully designed. For a more detailed analysis, we present an example of explanations produced by the method of Chen et al. and by ours in Figures 6 and 7, respectively. These explanations were produced to explain why the image displayed on the top is “Classic.” For example, according to Figure 6, the second most important reason for being “Classic” is “New York,” although it is hard to determine whether the garment can be categorized as “New York.” Furthermore, Figure 7 shows that the interactions “Collar & Button” and “Collar & Pleated” are the most significant factors, although “Button” and “Pleated” are not visible in the image. Similarly, other examples contain wrong attributes in their explanations.

Fig. 6. An example of explanations produced by the explainer without interactions in the experiment on the DeepFashion dataset. This chart and the one below show explanations of a prediction result for “Classic.” Note that the garment of interest in this image is a jacket.

Fig. 7. An example of explanations produced by the explainer with interactions in experiments on the DeepFashion dataset [24]. Note that the scale between the two rows is different.

Table 4. Results of Experiment 2 on the DeepFashion Dataset

Method                             Classic   Basic    Cute     Soft
Explainer w/o interaction [3]      0.9700    0.9909   0.9921   0.9920
Explainer w/ interaction (ours)    0.9700    0.9906   0.9920   0.9920
Performer                          0.9699    0.9909   0.9921   0.9920

  • The evaluation metric is classification accuracy.

Table 6 compares prediction results of the experiment on the TV ads dataset. Unlike the previous two experiments, the results are evaluated by Pearson’s correlation coefficients, since the targets are continuous values. It can be seen that the explainer achieves higher performance when interactions are incorporated, except on the Favorability rating. This implies that considering interactions is valid for various tasks including regression. Figures 8 and 9 give examples of the explanations produced by the two methods explaining the ad’s Favorability rating. For the interaction-free explanation, attributes such as “Familiar” and “Empathetic” are dominant causes. The explanation with interactions similarly takes “Familiar” as one of the most important reasons; however, it differs in that the second most emphasized attribute is “Celebrity/Character,” which is aligned with our intuition. Although in this case the effects of interaction are much less important than in the other two experiments, our method is able to produce reasonable quantitative and semantic explanations just as in the other cases.

Fig. 8. An example of explanations produced by the explainer without interactions in experiments on the TV ads dataset. This explains the favorability of a TV ad featuring famous actors promoting a popular over-the-counter medicine.

Fig. 9. An example of explanations produced by the explainer with interactions in experiments on the TV ads dataset. Note that the scale differs between the two rows.

Tables 3, 5, and 7 show that the conditional entropy of our explainer is almost the same as that of the explainer without interactions and that of the performer. However, as pointed out in [3], this metric is not directly related to the ground truth of explanations. We believe that higher accuracy and correlation coefficients are more important, because they indicate that the distillation from the performer is more successful.

Table 5. Results of Experiment 2 on the DeepFashion Dataset

| Method                          | Classic | Basic | Cute | Soft |
|---------------------------------|---------|-------|------|------|
| Explainer w/o interaction [3]   | 9.79    | 9.79  | 9.79 | 9.79 |
| Explainer w/ interaction (ours) | 9.80    | 9.80  | 9.80 | 9.80 |
| Performer                       | 9.82    | 9.83  | 9.83 | 9.81 |

  • The evaluation metric is the conditional entropy of the prediction.

Table 6. Results of Experiment 3 on TV Ads Dataset

| Method                          | F     | I     | W     | R     |
|---------------------------------|-------|-------|-------|-------|
| Explainer w/o interaction [3]   | 0.631 | 0.552 | 0.724 | 0.653 |
| Explainer w/ interaction (ours) | 0.613 | 0.568 | 0.728 | 0.674 |
| Performer                       | 0.687 | 0.692 | 0.816 | 0.716 |

  • The evaluation metric is Pearson’s correlation coefficient. The columns refer to Favorability, Interest, Willingness, and Recognition.

Table 7. Results of Experiment 3 on TV Ads Dataset

| Method                          | F    | I    | W    | R    |
|---------------------------------|------|------|------|------|
| Explainer w/o interaction [3]   | 6.81 | 6.82 | 6.82 | 6.81 |
| Explainer w/ interaction (ours) | 6.82 | 6.82 | 6.81 | 6.80 |
| Performer                       | 6.82 | 6.81 | 6.81 | 6.82 |

  • The evaluation metric is the conditional entropy of the prediction. The columns refer to Favorability, Interest, Willingness, and Recognition.

For more experimental results, please refer to Figures 10, 11, 12, and 13 in the appendix.

4.5 Discussion

In the previous section, we reviewed our experimental results and argued that our method with interactions is able to produce more accurate and insightful explanations than a similar one without them. Experiments 1 and 3 demonstrated better prediction results by explainers with interactions, while Experiment 2 resulted in almost the same performance. Here, we consider one aspect of the architecture in more detail: the regression model used in Experiment 3 on the TV ads dataset. This model is distinguished from the other two in that a sigmoid function is attached to the end of g and h, which produce the weights and the factor vectors, respectively. We add a sigmoid to this model because the prediction performance drops sharply otherwise, as illustrated in Table 8. Our original intention was to allow weights and factor vectors to take negative values to give the models more flexibility, as in [3] and in the other two experiments. However, we find that the range of the weights and vectors needs to be restricted to produce plausible explanations while maintaining an acceptable level of performance. We suppose that whether an explainer needs a sigmoid or another appropriate activation function depends on the type of prediction task: when explainer models are built, the fine details of their design will depend on the problems for which they are intended.
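The design choice discussed above can be sketched as follows. This is a simplified illustration only: the heads g and h are modeled here as single linear maps (the matrices `W_g` and `W_h` are hypothetical stand-ins for whatever networks are actually used), and the point is solely to show how a trailing sigmoid restricts the weights and factor vectors to the interval (0, 1).

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def explainer_heads(features, W_g, W_h, n_attr, k, use_sigmoid=True):
    """Map a feature vector to attribute weights (head g) and factor vectors (head h).

    W_g has shape (d, n_attr) and W_h has shape (d, n_attr * k); both are
    hypothetical linear stand-ins for the heads g and h.
    """
    weights = features @ W_g                       # shape (n_attr,)
    factors = (features @ W_h).reshape(n_attr, k)  # shape (n_attr, k)
    if use_sigmoid:
        # Restrict both outputs to (0, 1), as in the regression model.
        weights, factors = sigmoid(weights), sigmoid(factors)
    return weights, factors

if __name__ == "__main__":
    x = np.array([1.0, -2.0])
    W_g = np.array([[1.0, -1.0], [0.5, 0.5]])
    W_h = np.array([[2.0, 0.0], [0.0, 1.0]])
    print(explainer_heads(x, W_g, W_h, n_attr=2, k=1, use_sigmoid=False))
    print(explainer_heads(x, W_g, W_h, n_attr=2, k=1, use_sigmoid=True))
```

Without the sigmoid, the same inputs yield unbounded and possibly negative weights and factor vectors, which corresponds to the more flexible variant used in the other two experiments.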

Table 8. Performance of Our Interaction-including Model on the TV Ads Dataset with and without a Sigmoid

| Method                | F     | I     | W     | R     |
|-----------------------|-------|-------|-------|-------|
| Explainer w/o sigmoid | 0.579 | 0.518 | 0.633 | 0.514 |
| Explainer w/ sigmoid  | 0.613 | 0.568 | 0.728 | 0.674 |

  • The columns refer to Favorability, Interest, Willingness, and Recognition.

5 CONCLUSION

In this article, we have proposed a method to add explainability to an existing prediction model regardless of the type of prediction task. More specifically, our method can produce quantitative and semantic explanations that are easily interpretable. Our method builds on previous work by Chen et al. [3], which attempted to explain a prediction result as the sum of contributions from attributes, without including interactions. However, that method had a defect, in that there was a trade-off between performance and explainability. Inspired by the factorization machine, we addressed this problem by introducing feature interactions to the method. We verified the effectiveness of our proposal through experiments on multiple datasets with multiple prediction problems. We conducted qualitative and quantitative evaluations of our explainer and showed it to be superior to the model without interactions.

In future work, interactions between attributes in different domains, in addition to those between attributes in the same domain, may be considered to obtain more insightful explanations.

Appendix

A RESULTS FROM EXPERIMENT 1

Fig. 10. Example of explanations produced by the explainer without interactions in experiments on the CelebA dataset. This chart and the one below explain a prediction result for “Attractive.”

Fig. 11. Example of explanations produced by the explainer with interactions in experiments on the CelebA dataset [25]. Note that the scale differs between the two rows.

Fig. 12. Another example of explanations produced by the explainer without interactions in experiments on the CelebA dataset [25]. This chart and the one below explain a prediction result for “Attractive.”

Fig. 13. Example of explanations produced by the explainer with interactions in experiments on the CelebA dataset, corresponding to Figure 12. Note that the scale differs between the two rows.

Footnotes

  1. Zenyaku Kogyo Co., Ltd. October 3 2005. Jikininn.

REFERENCES

[1] Arik Sercan O. and Pfister Tomas. 2020. ProtoAttend: Attention-based prototypical learning. Journal of Machine Learning Research 21, 210 (2020), 1–35.
[2] Chen Chaofan, Li Oscar, Tao Daniel, Barnett Alina, Rudin Cynthia, and Su Jonathan K. 2019. This looks like that: Deep learning for interpretable image recognition. In Advances in Neural Information Processing Systems, Vol. 32. 8930–8941.
[3] Chen Runjin, Chen Hao, Ren Jie, Huang Ge, and Zhang Quanshi. 2019. Explaining neural networks semantically and quantitatively. In Proceedings of the IEEE/CVF International Conference on Computer Vision.
[4] Chen Xi, Duan Yan, Houthooft Rein, Schulman John, Sutskever Ilya, and Abbeel Pieter. 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, Vol. 29. 2172–2180.
[5] Cheng Heng-Tze, Koc Levent, Harmsen Jeremiah, Shaked Tal, Chandra Tushar, Aradhye Hrishi, Anderson Glen, Corrado Greg, Chai Wei, Ispir Mustafa, Anil Rohan, Haque Zakaria, Hong Lichan, Jain Vihan, Liu Xiaobing, and Shah Hemal. 2016. Wide and deep learning for recommender systems. In Proceedings of the Workshop on Deep Learning for Recommender Systems. 7–10.
[6] Choi Edward, Bahadori Mohammad Taha, Kulas Joshua A., Schuetz Andy, Stewart Walter F., and Sun Jimeng. 2016. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. In Proceedings of the International Conference on Neural Information Processing Systems. 3512–3520.
[7] Deng Jia, Dong Wei, Socher Richard, Li Li-Jia, Li Kai, and Fei-Fei Li. 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248–255.
[8] Erhan Dumitru, Bengio Yoshua, Courville Aaron, and Vincent Pascal. 2009. Visualizing higher-layer features of a deep network. Technical Report, Université de Montréal.
[9] Friedman Jerome H. and Popescu Bogdan E. 2008. Predictive learning via rule ensembles. Annals of Applied Statistics 2, 3 (2008), 916–954.
[10] Frosst Nicholas and Hinton Geoffrey E. 2017. Distilling a Neural Network Into a Soft Decision Tree. arXiv:1711.09784.
[11] Guo Huifeng, Tang Ruiming, Ye Yunming, Li Zhenguo, and He Xiuqiang. 2017. DeepFM: A factorization-machine based neural network for CTR prediction. In Proceedings of the International Joint Conference on Artificial Intelligence. 1725–1731.
[12] Harradon Michael, Druce Jeff, and Ruttenberg Brian. 2018. Causal Learning and Explanation of Deep Neural Networks via Autoencoded Activations. arXiv:1802.00541.
[13] He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[14] Hendricks Lisa Anne, Hu Ronghang, Darrell Trevor, and Akata Zeynep. 2018. Grounding visual explanations. In Proceedings of the European Conference on Computer Vision.
[15] Higgins Irina, Matthey Loïc, Pal Arka, Burgess Christopher, Glorot Xavier, Botvinick Matthew, Mohamed Shakir, and Lerchner Alexander. 2017. beta-VAE: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations.
[16] Jiao Licheng, Zhang Fan, Liu Fang, Yang Shuyuan, Li Lingling, Feng Zhixi, and Qu Rong. 2019. A survey of deep learning-based object detection. IEEE Access 7 (2019), 128837–128868.
[17] Kanehira Atsushi and Harada Tatsuya. 2019. Learning to explain with complemental examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[18] Kanehira Atsushi, Takemoto Kentaro, Inayoshi Sho, and Harada Tatsuya. 2019. Multimodal explanations by predicting counterfactuality in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Kim Jinkyu, Rohrbach Anna, Darrell Trevor, Canny John, and Akata Zeynep. 2018. Textual explanations for self-driving vehicles. In Proceedings of the European Conference on Computer Vision.
[20] Koren Yehuda. 2008. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 426–434.
[21] Li Jiwei, Monroe Will, and Jurafsky Dan. 2017. Understanding Neural Networks through Representation Erasure. arXiv:1612.08220.
[22] Li Oscar, Liu Hao, Chen Chaofan, and Rudin Cynthia. 2018. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In Proceedings of the AAAI Conference on Artificial Intelligence.
[23] Lian Jianxun, Zhou Xiaohuan, Zhang Fuzheng, Chen Zhongxia, Xie Xing, and Sun Guangzhong. 2018. XDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1754–1763.
[24] Liu Ziwei, Luo Ping, Qiu Shi, Wang Xiaogang, and Tang Xiaoou. 2016. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[25] Liu Ziwei, Luo Ping, Wang Xiaogang, and Tang Xiaoou. 2015. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision. 3730–3738.
[26] Lou Yin, Caruana Rich, Gehrke Johannes, and Hooker Giles. 2013. Accurate intelligible models with pairwise interactions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 623–631.
[27] Mahendran Aravindh and Vedaldi Andrea. 2015. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[28] Ming Yao, Xu Panpan, Qu Huamin, and Ren Liu. 2019. Interpretable and steerable sequence learning via prototypes. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 903–913.
[29] Otter Daniel W., Medina Julian R., and Kalita Jugal K. 2021. A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems 32, 2 (2021), 604–624.
[30] Pan Zhaoqing, Yu Weijie, Yi Xiaokai, Khan Asifullah, Yuan Feng, and Zheng Yuhui. 2019. Recent progress on generative adversarial networks (GANs): A survey. IEEE Access 7 (2019), 36322–36333.
[31] Park Dong Huk, Hendricks Lisa Anne, Akata Zeynep, Rohrbach Anna, Schiele Bernt, Darrell Trevor, and Rohrbach Marcus. 2018. Multimodal explanations: Justifying decisions and pointing to the evidence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[32] Qu Yanru, Fang Bohui, Zhang Weinan, Tang Ruiming, Niu Minzhe, Guo Huifeng, Yu Yong, and He Xiuqiang. 2018. Product-based neural networks for user response prediction over multi-field categorical data. ACM Transactions on Information Systems 37, 1 (2018), Article 5, 35 pages.
[33] Rendle Steffen. 2010. Factorization machines. In Proceedings of the IEEE International Conference on Data Mining. 995–1000.
[34] Ribeiro Marco Tulio, Singh Sameer, and Guestrin Carlos. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1135–1144.
[35] Ribeiro Marco Tulio, Singh Sameer, and Guestrin Carlos. 2018. Anchors: High-precision model-agnostic explanations. In Proceedings of the AAAI Conference on Artificial Intelligence.
[36] Selvaraju Ramprasaath R., Cogswell Michael, Das Abhishek, Vedantam Ramakrishna, Parikh Devi, and Batra Dhruv. 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 618–626.
[37] Shrikumar Avanti, Greenside Peyton, and Kundaje Anshul. 2017. Learning important features through propagating activation differences. In Proceedings of the International Conference on Machine Learning, Vol. 70. 3145–3153.
[38] Simonyan Karen and Zisserman Andrew. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations.
[39] Singh Chandan, Murdoch W. James, and Yu Bin. 2019. Hierarchical interpretations for neural network predictions. In Proceedings of the International Conference on Learning Representations.
[40] Song Weiping, Shi Chence, Xiao Zhiping, Duan Zhijian, Xu Yewen, Zhang Ming, and Tang Jian. 2019. AutoInt: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the ACM International Conference on Information and Knowledge Management. 1161–1170.
[41] Srebro Nathan, Rennie Jason D. M., and Jaakkola Tommi S. 2004. Maximum-margin matrix factorization. In Proceedings of the International Conference on Neural Information Processing Systems. 1329–1336.
[42] Stone Austin, Wang Hua-Yan, Stark Michael, Liu Yi, Phoenix D. Scott, and George Dileep. 2017. Teaching compositionality to CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5058–5067.
[43] Tao Li, Wang Xueting, Kawahara Tatsuya, and Yamasaki Toshihiko. 2020. Television advertisement analysis using attention-based multimodal network. In Proceedings of the Annual Conference of JSAI. 1H4OS12b01–1H4OS12b01.
[44] Tsang Michael, Cheng Dehua, Liu Hanpeng, Feng Xue, Zhou Eric, and Liu Yan. 2020. Feature interaction interpretability: A case for explaining ad-recommendation systems via neural interaction detection. In Proceedings of the International Conference on Learning Representations.
[45] Wang Ruoxi, Fu Bin, Fu Gang, and Wang Mingliang. 2017. Deep and cross network for ad click predictions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Article 12, 7 pages.
[46] Wang Zhihao, Chen Jian, and Hoi Steven C. H. 2020. Deep learning for image super-resolution: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2020), 3365–3387.
[47] Xie Ning, Ras Gabrielle, Gerven Marcel van, and Doran Derek. 2020. Explainable Deep Learning: A Field Guide for the Uninitiated. arXiv:2004.14545.
[48] Zeiler Matthew, Krishnan Dilip, Taylor Graham, and Fergus Robert. 2010. Deconvolutional networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2528–2535.
[49] Zeiler Matthew D. and Fergus Rob. 2014. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision. 818–833.
[50] Zellers R., Bisk Y., Farhadi A., and Choi Y. 2019. From recognition to cognition: Visual commonsense reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6720–6731.
[51] Zhang Quanshi, Cao Ruiming, Shi Feng, Wu Ying Nian, and Zhu Song-Chun. 2018. Interpreting CNN knowledge via an explanatory graph. In Proceedings of the AAAI Conference on Artificial Intelligence.
[52] Zhang Q., Wu Y. N., and Zhu S. 2018. Interpretable convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8827–8836.
[53] Zhang Weinan, Du Tianming, and Wang Jun. 2016. Deep learning over multi-field categorical data. In Advances in Information Retrieval. 45–57.
[54] Zhou Bolei, Khosla Aditya, Lapedriza Àgata, Oliva Aude, and Torralba Antonio. 2015. Object detectors emerge in deep scene CNNs. In Proceedings of the International Conference on Learning Representations.
[55] Zintgraf Luisa M., Cohen Taco S., Adel Tameem, and Welling Max. 2017. Visualizing deep neural network decisions: Prediction difference analysis. In Proceedings of the International Conference on Learning Representations.

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3s
October 2021
324 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3492435

Copyright © 2021 Copyright held by the owner/author(s).

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

  • Published: 15 November 2021
  • Accepted: 1 July 2021
  • Revised: 1 May 2021
  • Received: 1 December 2020

      Qualifiers

      • research-article
      • Refereed
