LETTER
Communicated by Si Wu
Progressive Interpretation Synthesis: Interpreting
Task Solving by Quantifying Previously Used
and Unused Information
Zhengqi He
zhengqi.he@riken.jp
Lab for Neural Computation and Adaptation, RIKEN Center for Brain Science,
Saitama 351-0198, Japan
Taro Toyoizumi
taro.toyoizumi@riken.jp
Lab for Neural Computation and Adaptation, RIKEN Center for Brain Science,
Saitama 351-0198, Japan, and Department of Mathematical Informatics, Graduate
School of Information Science and Technology, the University of Tokyo,
Tokyo 113-8656, Japan
A deep neural network is a good task solver, but it is difficult to make
sense of its operation. People have different ideas about how to interpret
its operation. We look at this problem from a new perspective where the
interpretation of task solving is synthesized by quantifying how much
and what previously unused information is exploited in addition to the
information used to solve previous tasks. First, after learning several
tasks, the network acquires several information partitions related to each
task. We propose that the network then learns the minimal information
partition that supplements previously learned information partitions to
more accurately represent the input. This extra partition is associated
with unconceptualized information that has not been used in previous
tasks. We manage to identify what unconceptualized information is used
and quantify the amount. To interpret how the network solves a new task,
we quantify as meta-information how much information from each par-
tition is extracted. We implement this framework with the variational in-
formation bottleneck technique. We test the framework with the MNIST
and the CLEVR data sets. The framework is shown to be able to com-
pose information partitions and synthesize experience-dependent inter-
pretation in the form of meta-information. This system progressively
improves the resolution of interpretation upon new experience by con-
verting a part of the unconceptualized information partition to a task-
related partition. It can also provide a visual interpretation by imaging the part of previously unconceptualized information that is needed to solve a new task.
Neural Computation 35, 38–57 (2023)
https://doi.org/10.1162/neco_a_01542
© 2022 Massachusetts Institute of Technology.
Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
1 Introduction
Deep neural networks (DNNs) have made great achievements in fields
such as image recognition (Krizhevsky, Sutskever, & Hinton, 2017), speech
recognition (Hinton et al., 2012), natural language processing (Vaswani et al., 2017), and game playing beyond human-level performance (Silver et al., 2016). DNNs, however, are famously black-box models. Their failure under certain circumstances, such as adversarial attack (Goodfellow, Shlens, & Szegedy, 2014), motivates increasing research into understanding how DNNs solve tasks, that is, model interpretation. More recent research also suggests that better model interpretation can be useful for, for example, explaining model behavior, knowledge mining, ethics, and trust (Doshi-Velez & Kim, 2017; Lipton, 2018).
Researchers have proposed different approaches to proceed with model
interpretation; for example, concerning the interpretation style, the post
hoc style tries to separate the model training step and model interpreta-
tion step, and the concurrent style aims simultaneously for task perfor-
mance as well as interpretation (Lipton, 2018). As for the applicability of
interpretation methods, the model-specific type targets a certain class of
models, and with the model-agnostic type, the interpretation method does
not depend on the model (Arrieta et al., 2020). Considering the scope of in-
terpretation, global interpretation gives information about how the task is
solved from a broader view, and local interpretation is more focused on cer-
tain examples or parts of the model (Doshi-Velez & Kim, 2017). There are
also diverse forms of interpretation, such as the information feature (Chen,
Song, Wainwright, & Jordan, 2018), the relevance feature (Bach et al., 2015),
a hot spot of attention (Hudson & Manning, 2018), or gradient informa-
tion (Sundararajan, Taly, & Yan, 2017). Another stream of research proposes
that interpretable models are usually simple ones: for example, discrete-
state models (Hou & Zhou, 2018), shallower decision trees (Freitas, 2014;
Wu et al., 2017), graph models (Zhang, Cao, Shi, Wu, & Zhu, 2017), or a
small number of neurons (Lechner et al., 2020). (See Arrieta et al., 2020, for
a more detailed overview.)
One particular dimension for model interpretation related to our let-
ter is how much preestablished human knowledge is needed. Methods
that require high human involvement, such as interpretation with human
predefined concepts (Koh et al., 2020; Chen, Bei, & Rudin, 2020) or with
large human-annotated data sets (Kim, Tapaswi, & Fidler, 2018), implicitly
assume the background knowledge of an average human to make sense
of the interpretation, which is hard to define rigorously. In contrast, existing human-agnostic methods transfer interpretation into some measurable form, such as the depth of the decision tree (Freitas, 2014; Wu et al., 2017).
Cependant, how well this kind of measure is related to human-style interpre-
tation is under debate.
Within the human-agnostic dimension of interpretation, we extend the
discussion with two new perspectives. One perspective starts with the sim-
ple idea that interpretation should be experience dependent. Motivated by
this idea, we focus on the situation where the model learns a sequence of
tasks by assuming that later tasks can be explained using earlier experi-
ences. In other words, model interpretation in our framework is defined
as meta-information describing how the information used to solve the new
task is related to previous ones. The second perspective is motivated by the
idea that interpretation should be able to handle the out-of-experience situ-
ation. In a situation where a new task cannot be fully solved by experience,
the model interpretation method should be able to report new knowledge,
mimicking a human explaining what is newly learned. We demonstrate that
this framework can cast insight into how later tasks can be solved based on
previous experience on MNIST and CLEVR data sets (Johnson et al., 2017)
and express ignorance when experience is not applicable.
Our work is related to the concept bottleneck model (CBM) and con-
cept whitening model (CWM; Koh et al., 2020; Chen et al., 2020) in
the sense that meaningful interpretation of the current task depends on
previously learned knowledge. Cependant, these methods do not capture
reasonable interpretation when the human-defined concepts alone are in-
sufficient to solve downstream tasks (Margeloiu et al., 2021). In our frame-
travail, we add the unconceptualized region to take care of information not
yet associated with tasks. Moreover, a recent study also shows that contam-
ination of concept-irrelevant information in the predefined feature space
can hamper interpretation (Mahinpei et al., 2021). We implement infor-
mation bottleneck (IB; Tishby, Pereira, & Bialek, 2000) as a remedy to this
information leak problem. Our method also shares similarities with varia-
tional information bottleneck for interpretation (VIBI) method (Bang, Xie,
Lee, Wu, & Xing, 2019) and the multiview information bottleneck method
(Wang, Boudreau, Luo, Tan, & Zhou, 2019) in the sense that these methods
use IB to obtain minimal latent representation from previously given repre-
sentations. Cependant, unlike the multiview IB method for problem solving,
the goal of our framework is to synthesize interpretation. In addition, our
framework does so using macroscopic task-level representations, which is
different from microscopic input-level representations used in VIBI.
2 Insight into Interpretation
This section discusses the intuition behind our framework for model inter-
pretation.
2.1 Interpretation as Meta-Information. To quantify how a new task
is solved using the experience of previous tasks, we evaluate meta-
information. We define meta-information as a vector of mutual information,
where each element of the vector describes how much the corresponding
information partition is used for the new task.
2.1.1 Interpreting Using the Right Level. In this work, a machine learns a
series of different tasks. The aim is to ascribe an interpretation of how the
model solves the new task based on previous experience. If we did this us-
ing low-level features, such as the intensity and color of each pixel, the task
description would become complicated. Instead, we aim to give an inter-
pretation at a more abstract level—for example, “This new task is solved by
combining the knowledge about tasks 2 and 4.” To achieve this goal, infor-
mation about the input is partitioned at the task level. We therefore prepare
information partitions that encode useful features for each task.
2.1.2 Inducing Independence. These partitions have to satisfy certain con-
ditions. If these information partitions are redundant, we will have arbi-
trariness in assigning meta-information since a task can equally be solved
using different partitions (Wibral, Priesemann, Kay, Lizier, & Phillips, 2017).
Therefore, inducing independence among partitions is preferred for hav-
ing unambiguous meta-information. Useful methods are widely available
in machine learning fields such as independent component analysis (Bell
& Sejnowski, 1995; Hyvärinen & Oja, 2000) and variational autoencoders
(Kingma & Welling, 2013).
2.1.3 Meaning Assignment. We have defined meta-information as a vector of Shannon information measured in bits (i.e., how
much each information partition is used). Although the number of bits itself
has no meaning, each entry of the vector is linked to a corresponding task.
Thus, the meta-information can be mapped to the relevance of previous
tasks.
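To make this mapping concrete, the meta-information can be stored as a small table keyed by partition name and read back as task relevance. The snippet below is an illustrative sketch only; the 0.1 Nat display threshold is an arbitrary choice, and the values are the ones later reported for the MNIST parity task in section 5.1.2.

```python
# Illustrative sketch: a meta-information vector and its mapping back to tasks.
# Values (in Nat) are those reported for the MNIST parity task in section 5.1.2;
# the 0.1 Nat threshold is an arbitrary choice for display purposes.
meta_information = {"digit": 0.702, "else": 0.002}

for partition, nats in meta_information.items():
    relevance = "used" if nats > 0.1 else "not used"
    print(f"partition '{partition}': {nats:.3f} Nat ({relevance})")
```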
2.2 Progressive Nature of Interpretation.
2.2.1 Progressive Interpretation. One important but usually ignored prop-
erty of interpretation is that we interpret based on experience (National
Research Council, 2002; Bada & Olusegun, 2015). Progressively learning
multiple tasks is not a rare setting in machine learning (Andreas, Rohrbach,
Darrell, & Klein, 2016; Rusu et al., 2016; Parisi, Kemker, Part, Kanan, &
Wermter, 2019), which is usually referred to as “lifelong learning,” “sequen-
tial learning,” or “incremental learning.” However, these studies usually
focus on avoiding catastrophic forgetting and do not investigate how pro-
gressiveness contributes to interpretation. In one example, Kim et al. (2018)
point out that interpretability emerges when lower-level modules are pro-
gressively made use of. We propose that interpretation should be synthe-
sized in a progressive manner, where the model behavior is interpreted by
how much the current task is related to previously experienced tasks.
2.2.2 Knowing You Don’t Know. An experience-based progressive in-
terpretation framework may inevitably encounter the situation when its
previous experience does not help interpret the current task. To solve this
problem, we introduce an unconceptualized partition, storing information
not yet included in the existing information partitions. We noticed that
this unconceptualized partition generates a “knowing you don’t know”
type of interpretation—a meta-cognition ability that allows a person to re-
flect on their knowledge, including what they don’t know (Glucksberg &
McCloskey, 1981). Accordingly, the framework should be designed to express “knowing you don’t know” when faced with out-of-experience tasks.
We now formalize our insights in the language of information theory in
the following sections.
3 The Progressive Interpretation Framework
Assume we have a model with stochastic input X, which is statistically the same regardless of the task. Task i is defined as predicting a series of stochastic
labels Zi. Its corresponding internal representation is Yi. The progressive
interpretation framework is formalized iteratively as follows:
1. Assume that after task n, a model has a minimal internal representation Y = {Y1, Y2, . . . , Yn, Yelse} that encodes input X. Yi describes the internal representation learned to solve task i. Yelse describes the internal representation encoding X that is not yet used to solve any task. The optimization in the ideal case yields independence among the previous task-relevant partitions:

I(Yi; Yj) = 0,  (i ≠ j ∈ [1, n] ∪ else).

Here, we define the notation [1, n] to be {1, 2, 3, . . . , n}.
2. Then the model is faced with the new task n + 1 and learns to predict Zn+1. After learning Zn+1, the model distills the necessary part Y(i∩n+1) from each partition Yi (i ∈ [1, n] ∪ else) for solving task n + 1. This is achieved by minimizing

I(Y(i∩n+1); Yi),  (i ∈ [1, n] ∪ else)

while maintaining the best task performance, that is, by ideally maintaining all task-relevant information:

I(∪_{i=1}^{n,else} Yi; Zn+1) = I(∪_{i=1}^{n,else} Y(i∩n+1); Zn+1).
3. The interpretation is defined as the meta-information of how much the individual partitions {Yi} for previous tasks i ∈ [1, n] ∪ else are used to solve task n + 1. Namely, the composition of the mutual information I(Y(i∩n+1); Yi) over the different partitions i ∈ [1, n] ∪ else is the meta-information we use to interpret the global operation of the neural network. The local interpretation for each example is then available from {Y(i∩n+1)}.
4. After task n + 1, the model updates the representation partition by splitting Yelse into the newly added representation Y(else∩n+1) and its complement, Yelse \ Y(else∩n+1). The former is then denoted as Yn+1 and the latter as the new Yelse. The model continues in this way with further iteration and interpretation of subsequent tasks.

The process is shown in Figure 1.

Figure 1: A schematic plot showing the intuition of the progressive interpretation framework. Interpretation in our framework is based on the meta-information that specifies from which partitions the needed information comes to solve a new task, Zn+1. The map has a resolution at the level of task partitions Yi, where partitions are made independent of each other. Independence among task partitions ensures the uniqueness of the needed information. Anything the model has not yet learned to use stays in the unconceptualized region Yelse. The more tasks the model has encountered, the smaller the unconceptualized region becomes. Thus, later tasks lead to better interpretation.
4 Implementation
Our particular interest is in the system involving neural networks. Since our
framework is information-theoretic, all types of neural networks are treated
equally as segments of information processing pipelines. Which type of
neural network to choose depends on the specific problem.
The neural network implementation of progressive interpretation proceeds as loops over the four steps set out in section 3. In step 1, we
assume a network already has information maps for task 1-to-n. Then we
extract the unconceptualized partition that is unrelated to task 1-to-n by
Chiffre 2: Information flow graph of the progressive interpretation framework.
Yellow areas are representations, and green and blue arrows represent neural
réseaux. Green ones are put under training while blue ones are fixed. The red
square with IB represents the information bottleneck.
IB. In step 2, the model learns a new task, n + 1. Then the interpretation
is gained by knowing how much information is needed from each subre-
gion as in step 3. In step 4, we repeat step 1 with a new map for task n + 1
and prepare for the next loop. By adding new tasks and looping over the
steps, a progressively more informative interpretation can be gained. Le
information flow chart implemented in the following sections is shown in
Chiffre 2.
4.1 Information Bottleneck. In our framework, IB plays an important role in manipulating information flow. To predict label Z from statistical input X with inner representation Y, IB would maximize

maxθ [I(Y; Z) − γ I(Y; X)],  Y = fθ(X, ε),   (4.1)

where γ ∈ [0, 1] is the scaling factor controlling the balance between the task performance (when γ is small) and having a nonredundant information representation (when γ is large). f is a neural network parameterized by the parameter θ, and ε is a noise term that is important for suppressing task-irrelevant information in X.

We choose the variational information bottleneck (VIB) implementation (Alemi, Fischer, Dillon, & Murphy, 2016; Chalk, Marre, & Tkacik, 2016; Li & Eisner, 2019) with loss function

L(p, q, r) = E_{Y,Z}[− log q(Z | Y)] + γ E_X{KL[p(Y | X), r(Y)]}   (4.2)

to optimize the encoding distribution p(Y|X), the decoding distribution q(Z|Y), and the prior distribution r(Y) for p. E_X describes taking the expectation over the random variable X. Note that E_{Y,Z} = E_X E_{Z|X} E_{Y|X}. During the optimization, E_X E_{Z|X} is computed by averaging over N training samples of input {x_j | j = 1, . . . , N} and label {z_j | j = 1, . . . , N}. E_{Y|X} is the average over the encoding distribution p(Y|X), which is computed using the mapping Y = fθ(X, ε) of the encoding neural network. Y can be a vector of either continuous or discrete variables (Li & Eisner, 2019) (see appendix section 3 for details). For clarity, we further simplify the notation of the loss function to

L = Q(Z|Y) + γ KL(Y)   (4.3)

for future use, where the Q term corresponds to the log-likelihood term trying to approximate Z from the internal representation Y. The KL term corresponds to the KL-divergence term controlling the expressiveness of Y.
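As a minimal sketch of how the loss in equation 4.3 can be written in PyTorch, the module below assumes a Gaussian encoder p(Y|X) with a standard normal prior r(Y) and a categorical decoder q(Z|Y); the layer sizes and γ value are arbitrary, and this is not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBLoss(nn.Module):
    """Sketch of L = Q(Z|Y) + gamma * KL(Y) (eq. 4.3) with a Gaussian encoder."""
    def __init__(self, x_dim=784, y_dim=16, z_classes=10, gamma=1e-3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * y_dim))  # outputs (mu, log_var) of p(Y|X)
        self.decoder = nn.Linear(y_dim, z_classes)               # logits of q(Z|Y)
        self.gamma = gamma

    def forward(self, x, z):
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        y = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)     # Y = f_theta(X, eps)
        q_term = F.cross_entropy(self.decoder(y), z)                 # E[-log q(Z|Y)], in Nat
        # E_X KL( p(Y|X) || r(Y) ) with r(Y) = N(0, I), averaged over the batch
        kl_term = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=-1).mean()
        return q_term + self.gamma * kl_term
```

Here γ plays the same role as in equation 4.1, trading prediction of Z against compression of Y.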
4.2 Task Training and Information Partition Splitting. Suppose a new model with task input X learns its first task to predict label Z1. It is not difficult to train a neural network for this task by the optimization minθ D( f1,θ(X) || Z1), where D is a distance function, such as KL divergence or mean-square error, which is decided by the problem. f1,θ is an encoder network parameterized by θ. After training, we will be able to obtain the representation of task 1 as Y1 = f1(X, ε), where f1 indicates the neural network f1,θ after optimizing θ.

Our next problem is how to obtain the task-1-unrelated representation Yelse, which ideally satisfies I(Y1; Yelse) = 0, to complement the intermediate representation about the input. Here, we propose that Yelse can be obtained via the implementation of IB on an autoencoding task:

maxθ [I(Y1, Yelse; X) − γ I(Yelse; X)],  Yelse = felse,θ(X, ε),   (4.4)

where γ is again the scaling factor controlling the trade-off between including and excluding different information. Note that the learned f1 function is fixed while felse,θ is trained. The intuition behind equation
4.4 is described as follows. I(Y1; Yelse) > 0 implies redundant information about Y1 contained in Yelse. This redundant information would not improve I(Y1, Yelse; X). However, removing this redundant information can decrease I(Yelse; X), thus contributing to our optimization goal. Note that we assume γ is less than one.

With the simplified notation of the VIB introduced above, the loss function

L = Q(X | Y1, Yelse) + γ KL(Yelse)   (4.5)

is minimized. The loss function seeks to autoencode X given the previously learned Y1 (which is fixed) together with Yelse, while controlling the expressiveness of Yelse.
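Under the same Gaussian-VIB assumptions as above, the loss in equation 4.5 could be assembled as below: the trained task-1 encoder is frozen, only Yelse passes through the bottleneck, and a decoder reconstructs X from the concatenation of Y1 and Yelse. The module is an illustrative sketch (a Gaussian reconstruction term stands in for Q(X | Y1, Yelse)), not the released code.

```python
import torch
import torch.nn as nn

class ElseSplitLoss(nn.Module):
    """Sketch of L = Q(X | Y1, Yelse) + gamma * KL(Yelse) (eq. 4.5)."""
    def __init__(self, f1: nn.Module, x_dim=784, y1_dim=10, yelse_dim=16, gamma=5e-4):
        super().__init__()
        self.f1 = f1                                      # trained task-1 encoder, kept fixed
        for p in self.f1.parameters():
            p.requires_grad_(False)
        self.f_else = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                    nn.Linear(256, 2 * yelse_dim))       # (mu, log_var) of Yelse
        self.decoder = nn.Sequential(nn.Linear(y1_dim + yelse_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))              # reconstructs X
        self.gamma = gamma

    def forward(self, x):
        y1 = self.f1(x).detach()                          # Y1 contributes no gradient
        mu, log_var = self.f_else(x).chunk(2, dim=-1)
        y_else = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        recon = self.decoder(torch.cat([y1, y_else], dim=-1))
        q_term = ((recon - x) ** 2).sum(dim=-1).mean()    # Gaussian stand-in for -log q(X|Y1,Yelse)
        kl_term = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=-1).mean()
        return q_term + self.gamma * kl_term
```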
4.3 New Task Interpretation. Now assume the model has internal representation Y = {Y1, Y2, . . . , Yn, Yelse} after learning tasks 1 to n. When the new task n + 1 is introduced, the model learns to predict Zn+1. Task n + 1 relevant information can be extracted from Y by the IB as follows:

maxθ [I(∪_{i=1}^{n,else} Y(i∩n+1); Zn+1) − γ Σ_{i=1}^{n,else} I(Y(i∩n+1); Yi)],  Y(i∩n+1) = f(i∩n+1),θ(Yi, ε),   (4.6)

where Y(i∩n+1), (i ∈ [1, n] ∪ else), is the information needed from Yi to solve task n + 1. Again, ε is the noise term required to eliminate information irrelevant to task n + 1. Since Y(i∩n+1) = f(i∩n+1),θ(Yi, ε) depends on Yi, together with IB, Y(i∩n+1) is then a minimum subpartition of Yi required for task n + 1. We again implement the variational IB loss function with simplified notation:

L = Q(Zn+1 | ∪_{i=1}^{n,else} Y(i∩n+1)) + (γ/(n + 1)) Σ_{i=1}^{n,else} KL(Y(i∩n+1)).   (4.7)

The loss function seeks to maximize the prediction of Zn+1 while controlling the information needed from each Yi. Index i specifies a representation partition. After getting {Y(i∩n+1)}, we can derive an interpretation as the meta-information I(Y(i∩n+1); Yi) needed from each partition Yi, as defined in section 3. We can also look into the representations of Y(i∩n+1) to gain insight into how task n + 1 is solved for each example.

Y(else∩n+1) is the information needed from the unconceptualized partition Yelse to solve task n + 1. We can rewrite this as Yn+1 and define the new unconceptualized partition as Yelse ← Yelse \ Y(else∩n+1). We can then go back to step 1 and continue the iteration for task n + 2.
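A sketch of equation 4.7 along the same lines: each (frozen) partition Yi gets its own small Gaussian bottleneck producing Y(i∩n+1), the decoder predicts Zn+1 from their concatenation, and the per-partition KL terms, which upper-bound I(Y(i∩n+1); Yi), can be read out as a variational proxy for the meta-information. The module and its dimensions are assumptions of this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NewTaskIB(nn.Module):
    """Sketch of eq. 4.7: one IB filter per partition Y_i, with per-partition KL read-out."""
    def __init__(self, partition_dims: dict, filt_dim=8, z_classes=4, gamma=1e-3):
        super().__init__()
        # one small Gaussian bottleneck per partition i in {1, ..., n, 'else'}
        self.filters = nn.ModuleDict({name: nn.Linear(dim, 2 * filt_dim)
                                      for name, dim in partition_dims.items()})
        self.decoder = nn.Linear(filt_dim * len(partition_dims), z_classes)  # q(Z_{n+1} | union)
        self.scale = gamma / len(partition_dims)           # the gamma/(n+1) factor of eq. 4.7

    def forward(self, partitions: dict, z_new):
        codes, kl_terms = [], {}
        for name, y_i in partitions.items():
            mu, log_var = self.filters[name](y_i.detach()).chunk(2, dim=-1)  # Y_i is a fixed input
            codes.append(mu + torch.exp(0.5 * log_var) * torch.randn_like(mu))
            # variational upper bound on I(Y_(i∩n+1); Y_i) for this partition
            kl_terms[name] = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=-1).mean()
        q_term = F.cross_entropy(self.decoder(torch.cat(codes, dim=-1)), z_new)
        loss = q_term + self.scale * sum(kl_terms.values())
        return loss, kl_terms                              # kl_terms is read out as meta-information
```

After training, the per-partition terms give one number per partition, which is the kind of vector reported for the CLEVR tasks in Table 1.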
5 Experiments
5.1 MNIST Data Set. We first illustrate our progressive interpretation
framework on the MNIST data set (60,000/10,000 train/test splits). We set
task 1 as digit recognition. For task 2, we propose three kinds of tasks: deter-
mining if a number is even or odd (parity task), predicting the sum of pixel
intensities (ink task), or a task that involves both digit information and pixel
intensity information with a certain resolution (see below). First, we train a network f1 to perform digit recognition, and then we train an autoencoder with IB, training a network felse to obtain a digit-independent partition. Then
we extend the network to train on a second task and obtain interpretation
from the information flow. We choose continuous latent representation for
this section. (See appendix sections 1 et 2 for implementation details.)
5.1.1 IB Removes Task-Relevant Information from the Unconceptualized Re-
gion. Unconceptualized representation can be obtained after the autoen-
coding step. We can check what has been learned by scanning this latent
code. Figure 3a shows the scanning result of the top five latent represen-
tation units, ordered by descending mutual information with X. Note that
changing these features does not change the digit. Moreover, the mutual information between Ydigit and Yelse is estimated by training a neural network that predicts Ydigit from Yelse. The estimated information is smaller than 0.1 Nat when γ is larger than 5e-4, which indicates that digit information is removed from the unconceptualized region by IB.
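The check described above can be sketched as a standard variational bound: train a small classifier q and use I(digit; Yelse) ≥ H(digit) − E[−log q(digit | Yelse)]. For simplicity the sketch below predicts the digit label rather than Ydigit itself and assumes balanced classes and shared train/evaluation data, so it is only a rough stand-in for the procedure used in this letter.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def digit_info_lower_bound(y_else: torch.Tensor, digit: torch.Tensor, epochs: int = 20) -> float:
    """Rough variational lower bound on I(digit; Y_else) in Nat:
    I >= H(digit) - E[-log q(digit | Y_else)], with q a small trained classifier."""
    clf = nn.Sequential(nn.Linear(y_else.shape[1], 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(clf(y_else), digit)   # E[-log q(digit | Y_else)], natural log
        loss.backward()
        opt.step()
    h_digit = math.log(10.0)                         # entropy of a balanced 10-class label
    with torch.no_grad():
        ce = F.cross_entropy(clf(y_else), digit).item()
    return h_digit - ce
```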
5.1.2 The Framework Explains How a New Task is Solved. After the auto-
encoding step, we proceed to solve either the parity task or ink task to
study the interpretation that the framework provides. For the parity task,
the mutual information values from Ydigit and Yelse are 0.702 Nat and 0.002 Nat, respectively, and for the ink task, 1.498 Nat and 2.045 Nat. The result shows
that the parity task doesn’t need information from Yelse, while the ink task
does. Clues of how the tasks are solved can also be found by looking into
the representation obtained after IB. For the parity task, different digits are
clustered into two groups according to their parity. For the ink task, digits
are aligned in an order corresponding to their actual average ink amount
(0 > 8 > 2 > 3 > 6 > 5 > 9 > 4 > 7 > 1), as Figure 4 shows.
5.1.3 Experience-Dependence of the ELSE Partition. After learning the digit
and the ink tasks, we can update the autoencoder felse to exclude the ink-
task-related information. On the one hand, Yink (the first row of Figure 3b)
represents the average pixel intensity. On the other hand, this information
is suppressed in Yelse (rows 2–5). The suppression can be measured by fea-
ture correlation between Yink and Yelse. Before the ink task, the correlations
sont (0.295, 0.414, 0.080, 0.492, 0.100) for the five units visualized, but after
the ink task, the correlations become (0.030, 0.194, 0.019, 0.028, 0.001). We
Chiffre 3: Latent code scanning of unconceptualized representation after au-
toencoding, before (un) and after (b) ink task (except the first row). The recon-
structed images plotted as the activity (columns) of one of the coding units
(rows) are varied with others fixed. (c) Shows how the average ink intensity
varies when we scan the latent code of the same five units as in panels a and b.
also present the result of the average ink intensity versus the latent code
of the five units. It can clearly be seen that before the ink task, the knowl-
edge of average intensity is distributed across all five units. Cependant, after
the ink task, the knowledge of average intensity is extracted as Yink and
removed from Yelse (see Figure 3c). The result indicates that the unconcep-
tualized region is experience dependent, and information about the already
learned task is excluded. Unlike other frameworks such as variational au-
toencoder (Kingma & Welling, 2013) and infoGAN (Chen et al., 2016), lequel
usually have no explicit control over partitioning latent representation, notre
framework allows latent representation reorganization through progressive
tasks.
Chiffre 4: VIB latent code distribution of different digits for the parity task
Ydigit∩parity (left) and ink task Ydigit∩ink (right). The x-axis shows the code value,
and the y-axis shows the code density; different colors represent different digits
ranging from 0 à 9. For the parity task, the latent code formed two clusters,
one for even numbers and one for odd numbers. And for the ink task, digits are
aligned in the order of the average amount of ink.
5.1.4 Quantitative Benchmark of Interpretation. Suivant, we ask if our pro-
posed interpretation is quantitatively useful. Because we are not aware of
task-level, human-agnostic interpretation algorithms directly comparable
to ours, we study how the interpretation changes as we systematically mod-
ify the required type of information for task 2. Task 2 is designed to require
both digit information and digit-independent ink information involving
different resolutions. For digit information, we have four resolutions: d1,
d2, d5, and d10. Par exemple, d5 means that 10 digits are separated into five
equally sized groups, and the task is to tell which group the image belongs
à. Par conséquent, (0, 0.693, 1.609, 2.303) Nat of information about digits is the-
oretically needed, respectivement. For digit-independent ink information, nous
also have four resolutions (according to the percentile-based grouping for
each digit by the amounts of ink used): s1, s2, s3, and s4, which theoretically
require (0, 0.693, 1.099, 1.386) Nat of information. By combining them, nous
get 16 possibilities for task 2; the interpretation measured as mutual infor-
mation and the corresponding theoretical values are shown in Figure 5. Le
figure shows that the information needed from Ydigit, I(Ydigit∩2; Ydigit), can be precisely predicted. The required nondigit information I(Yelse∩2; Yelse) from Yelse via autoencoding correlates with the required amount to solve the task.
Cependant, due to the imperfection of the variational IB algorithm to purely
extract relevant information, more than the theoretically required amount
of information from Yelse is used for good performance. This problem can
be practically remedied by allowing Yelse to be retrained by adding an aux-
iliary autoencoding task when learning task 2. Since input data are avail-
able during task 2, adding an auxiliary autoencoding task during task 2
Chiffre 5: Mutual information from Ydigit and Yelse used to solve the benchmark
tasks. Blue/orange bars are mutual information without/with auxiliary autoen-
coding, and the theory indicating the required amount of information is plot-
ted with the green dotted line. Inside text boxes are task-correct rates without/
with auxiliary autoencoding. The title of each panel represents different task
types combining four digit resolutions—d1, d2, d5, and d10—and four digit-
independent ink resolution—s1, s2, s3, and s4—forming a 4-by-4 matrix.
training increases task performance without needing extra data. (Voir
appendix section 9 for further discussion.)
5.2 CLEVR Data Set. Dans cette section, we demonstrate the progressive
interpretation framework on the CLEVR data set (Johnson et al., 2017), un
large collection of 3D-rendered scenes (70,000/15,000 train/test splits) avec
multiple objects with compositionally different properties. The CLEVR data
set was originally designed for a visual question-answering task, but we
train the model without using natural language. For example, we train the
model to classify the color of an object or conduct a multiple-choice (MC)
task using only pictures. For the MC task, the model is trained on a large set
of four pictures and learns to choose one of the four pictures that includes
a target object (100,000/20,000 train/test splits).
Dans cette section, we divide the tasks into two groups. In task group 1, le
model that is pretrained to tell objects apart learns to recognize three of the important properties (position, color, and material) among shape, size,
color, matériel, and position. In task group 2, the model is asked to perform
an MC task selecting a picture according to a specific context, for example,
“Choose the picture with red cubes,” which needs information learned or
not yet learned in task 1. For task group 1, we first use convolutional neu-
ral networks (CNNs) to report the image properties by supervised learning
and then obtain the unconceptualized region via autoencoding. After that,
task group 2 is performed with interpretation synthesized. We choose dis-
crete latent representation for this section. (See appendix sections 1 et 2
for implementation details.)
5.2.1 Interpretation by Information Flow. The result of interpretation by information flow is shown in Table 1. The mutual information I(Y(i∩MC); Yi) for i ∈ {posi, color, material, else} is measured in Nat per object, where MC
represents the multiple-choice task. Different rows represent different ques-
tion types. We sample five random initializations of the networks for each
task and present both the average and standard deviations. The theoretical
amount of information required for feature i is shown in parentheses. Nous
can interpret how the model is solving the task by calculating mutual infor-
mation coming from each information partition. For example, the task to
“choose the picture with green metal” needs 0.345 Nat of information from
the color domain and 0.686 Nat from the material domain. As expected, dans-
formation coming from other domains is judged as irrelevant to this task. If
the task is to “choose the picture with a small yellow object,” the model then
needs 0.343 Nat from the color domain, plus 0.70 Nat of information from
the unconceptualized region since the model has not yet explicitly learned
about object size. If the task is “choose the picture with a large sphere,” the
model finds out that all previously learned properties are useless and has
to pick 0.31 Nat of information from the unconceptualized region. This is
because neither size nor shape information has been used in previous tasks.
5.2.2 Single-Example Interpretation and Unconceptualized Representation.
After getting the model, it is also possible to synthesize interpretation for
a single example by looking into the discrete representation Y(i∩MC) pour
i ∈ {posi, color, matériel, else}. A typical example is shown in Figure 6. Ce
example corresponds to a “small yellow object.” We can see the model dis-
criminates if the object has the color “yellow” while neglecting position
and material information. To solve the problem, the model also needs in-
formation from the unconceptualized partition, which is representing the
size “small.” The behavior of the model is consistent with the expectation
of the question regarding the “small yellow object.”
We examine the correctness of the unconceptualized representation by
comparing it with the true label. For example, if the task is “choose the
small yellow object,” the unconceptualized region should represent the
size “small.” We can cross-check by calculating their mutual information,
which is 0.662 Nat per object. For the case “choosing a red cube,” mutual
information with the label “cube” is 0.432 Nat per object. For the case “choosing cylinder on the right side,” mutual information with the label “cylinder” is 0.408 Nat per object. All of these numbers exceed the chance level (the 99, 95, and 90 percentiles by chance are 0.637, 0.495, and 0.368 Nat, respectively, for balanced binary random variables like size, and 0.583, 0.449, and 0.332 Nat for cases with three alternatives like shape).

Table 1: Table for Task 2 Interpretation. Rows are the question types (Green Metal, Left Rubber, Small Yellow, Red Cube, Right Cylinder, Large Sphere); columns report the mutual information drawn from the Position, Color, Material, and Unknown (unconceptualized) partitions, together with the correct rate. Note: The information unit is Nat/object; the value inside parentheses is the theoretical value.

Figure 6: Single-example interpretation of the task “choose the small yellow object.” The left column shows input pictures, and the middle column shows masks colored according to object IDs. We overlaid the masks with the object IDs for visual aid. The right column shows the binary activity summarizing the information at layer Y(i∩MC). The x-axis corresponds to object ID, and the y-axis represents four kinds of representations: position Y(posi∩MC), color Y(color∩MC), material Y(material∩MC), and else Y(else∩MC), where the dimension with the highest mutual information is plotted. The red square represents the lower-frequency binary representation, and the white space represents the counterpart.
5.2.3 Visualizing the Unconceptualized Representation. After getting the un-
conceptualized representation useful for the new task, we can continue
the framework by splitting that representation into the learned useful part
and its complement. Separating this new useful representation is nontriv-
ial because labels of the MC task jointly depend on multiple image prop-
erties. While previous methods (Koh et al., 2020; Chen et al., 2020) need
feature-specific labels to learn a new property, the proposed framework
automatically segregates a new, useful representation from previously learned representations. Furthermore, the proposed system can visualize what new representation has just been learned.

Figure 7: Visualizing the newly learned YMC about size after learning the task “choose the picture with a small yellow object.” As can be seen from the result, changing YMC of a small object renders a big counterpart of the same object, and changing YMC of a big object renders a small counterpart of the same object.
Here, we demonstrate the result after learning the task “choose the picture with a small yellow object.” We have mentioned that after learning this new task, the model is expected to learn a new concept about size as the new representation YMC = Y(else∩MC). Note, again, that we never provided the model labels specifically about size. Then we can continue the framework by performing another round of autoencoding, which splits Yelse into YMC and Yelse \ YMC. After that, the model explains what property is newly learned by generating the image of an object and changing its size as the newly learned latent representation YMC is altered (see Figure 7). This visualization also helps humans interpret the operation of the model.
Information about other studies on the CLEVR data set can be found in
appendix sections 4 to 8. We also offer more discussion about our method in
appendix section 9 and discuss limitations of our method in appendix sec-
tion 10. The source code of this project can be found at https://github.com/
hezq06/progressive_interpretation.
6 Conclusion
This letter proposes a progressive framework based on information the-
ory to synthesize interpretation. We show that interpretation involves
independence, is progressive, and can be given at a macroscopic level
using meta-information. Changing the receiver of the interpretation from
a human to a target model helps define interpretation clearly. Our inter-
pretation framework divides the input representations into independent
partitions by tasks and synthesizes interpretation for the next task. This
framework can also visualize what conceptualized and unconceptualized
partitions code by generating images. The framework is implemented with
a VIB technique and is tested on the MNIST and the CLEVR data sets. The
framework can solve the task and synthesize nontrivial interpretation in
the form of meta-information. The framework is also able to progressively
form meaningful new representation partitions. Our information-theoretic
framework capable of forming quantifiable interpretations is expected to
inspire future understanding-driven deep learning.
Acknowledgments
We thank Ho Ka Chan, Yuri Kinoshita, and Qian-Yuan Tang for useful dis-
cussions about the work. This study was supported by Brain/MINDS from
the Japan Agency for Medical Research and Development (AMED) under
grant JP15dm0207001, Japan Society for the Promotion of Science (JSPS) un-
der KAKENHI grant JP18H05432, and the RIKEN Center for Brain Science.
References
Alemi, A. A., Fischer, I., Dillon, J. V., & Murphy, K. (2016). Deep variational information
bottleneck. arXiv:1612.00410.
Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016). Neural module networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp. 39–48).
Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., . . .
Herrera, F. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies,
opportunities and challenges toward responsible AI. Information Fusion, 58, 82–
115. 10.1016/j.inffus.2019.12.012
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015).
On pixel-wise explanations for non-linear classifier decisions by layer-wise rele-
vance propagation. PLOS One, 10(7), e0130140.
Bada, S. O., & Olusegun, S. (2015). Constructivism learning theory: A paradigm for
teaching and learning. Journal of Research and Method in Education, 5(6), 66–70.
Bang, S., Xie, P., Lee, H., Wu, W., & Xing, E. (2019). Explaining a black-box using a deep
variational information bottleneck approach. arXiv:1902.06918.
Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to
blind separation and blind deconvolution. Neural Computation, 7(6), 1129–1159.
10.1162/neco.1995.7.6.1129
Chalk, M., Marre, O., & Tkacik, G. (2016). Relevant sparse codes with variational in-
formation bottleneck. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, & R. Garnett
(Eds.), Advances in neural information processing systems, 29. Curran.
Chen, J., Song, L., Wainwright, M. J., & Jordan, M. I. (2018). Learning to explain: An
information-theoretic perspective on model interpretation. arXiv:1802.07814.
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016).
InfoGAN: Interpretable representation learning by information maximizing gen-
erative adversarial nets. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, & R. Gar-
nett (Eds.), Advances in neural information processing systems, 29 (pp. 2180–2188).
Curran.
Chen, Z., Bei, Y., & Rudin, C. (2020). Concept whitening for interpretable im-
age recognition. Nature Machine Intelligence, 2(12), 772–782. 10.1038/s42256-020
-00265-z
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine
learning. arXiv:1702.08608.
Freitas, A. A. (2014). Comprehensible classification models: A position paper. ACM
SIGKDD Explorations Newsletter, 15(1), 1–10. 10.1145/2594473.2594475
Glucksberg, S., & McCloskey, M. (1981). Decisions about ignorance: Knowing that
you don’t know. Journal of Experimental Psychology: Human Learning and Memory,
7(5), 311. 10.1037/0278-7393.7.5.311
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial
examples. arXiv:1412.6572.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., . . . Kingsbury,
B. (2012). Deep neural networks for acoustic modeling in speech recognition: The
shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–
97. 10.1109/MSP.2012.2205597
Hou, B.-J., & Zhou, Z.-H. (2018). Learning with interpretable structure from RNN.
arXiv:1810.10708.
Hudson, D. A., & Manning, C. D. (2018). Compositional attention networks for machine
reasoning. arXiv:1803.03067.
Hyvärinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and
applications. Neural Networks, 13(4–5), 411–430.
Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., &
Girshick, R. (2017). CLEVR: A diagnostic dataset for compositional language and
elementary visual reasoning. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (pp. 2901–2910).
Kim, S. W., Tapaswi, M., & Fidler, S. (2018). Visual reasoning by progressive module
networks. arXiv:1806.02453.
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv:1312.6114.
Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E., Kim, B., & Liang, P.
(2020). Concept bottleneck models. In Proceedings of the International Conference on
Machine Learning (pp. 5338–5348).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with
deep convolutional neural networks. Communications of the ACM, 60(6), 84–90.
10.1145/3065386
Lechner, M., Hasani, R., Amini, A., Henzinger, T. A., Rus, D., & Grosu, R. (2020).
Neural circuit policies enabling auditable autonomy. Nature Machine Intelligence,
2(10), 642–652. 10.1038/s42256-020-00237-3
Li, X. L., & Eisner, J. (2019). Specializing word embeddings (for parsing) by information
bottleneck. arXiv:1910.00163.
Lipton, Z. C. (2018). The mythos of model interpretability. Queue, 16(3), 31–57.
10.1145/3236386.3241340
Mahinpei, A., Clark, J., Lage, I., Doshi-Velez, F., & Pan, W. (2021). Promises and pitfalls
of black-box concept learning models. arXiv:2106.13314.
Margeloiu, A., Ashman, M., Bhatt, U., Chen, Y., Jamnik, M., & Weller, A. (2021). Do
concept bottleneck models learn as intended? arXiv:2105.04289.
National Research Council. (2002). Learning and understanding: Improving advanced
study of mathematics and science in US high schools. National Academies Press.
Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., & Wermter, S. (2019). Continual lifelong
learning with neural networks: A review. Neural Networks, 113, 54–71. 10.1016/
j.neunet.2019.01.012
Rusu, A. A., Rabinowitz, N. C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., . . . Hadsell, R. (2016). Progressive neural networks. arXiv:1606.04671.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., . . .
Hassabis, D. (2016). Mastering the game of Go with deep neural networks and
tree search. Nature, 529(7587), 484–489. 10.1038/nature16961
Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks.
arXiv:1703.01365.
Tishby, N., Pereira, F. C., & Bialek, W. (2000). The information bottleneck method.
arXiv:physics/0004057.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017). Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems, 30 (pp. 5998–6008). Curran.
Wang, Q., Boudreau, C., Luo, Q., Tan, P.-N., & Zhou, J. (2019). Deep multi-view in-
formation bottleneck. In Proceedings of the 2019 SIAM International Conference on
Data Mining (pp. 37–45).
Wibral, M., Priesemann, V., Kay, J. W., Lizier, J. T., & Phillips, W. A. (2017). Partial
information decomposition as a unified approach to the specification of neural
goal functions. Brain and Cognition, 112, 25–38. 10.1016/j.bandc.2015.09.004
Wu, M., Hughes, M. C., Parbhoo, S., Zazzi, M., Roth, V., & Doshi-Velez, F. (2017). Be-
yond sparsity: Tree regularization of deep models for interpretability. arXiv:1711.06178.
Zhang, Q., Cao, R., Shi, F., Wu, Y. N., & Zhu, S.-C. (2017). Interpreting CNN knowledge
via an explanatory graph. arXiv:1708.01785.
Received May 24, 2022; accepted August 10, 2022.