Letter
Communicated by Arindam Banerjee
Multilinear Common Component Analysis
via Kronecker Product Representation
Kohei Yoshikawa
yoshikawa.kohei615@gmail.com
Shuichi Kawano
skawano@ai.lab.uec.ac.jp
Graduate School of Informatics and Engineering, The University of
Electro-Communications, Chofu-shi, Tokyo 182-8585, Japan
We consider the problem of extracting a common structure from multi-
ple tensor data sets. For this purpose, we propose multilinear common
component analysis (MCCA) based on Kronecker products of mode-wise
covariance matrices. MCCA constructs a common basis represented by
linear combinations of the original variables that lose little information
of the multiple tensor data sets. We also develop an estimation algorithm
for MCCA that guarantees mode-wise global convergence. Numerical
studies are conducted to show the effectiveness of MCCA.
1 Introduction
Various statistical methodologies for extracting useful information from a
large amount of data have been studied over the decades since the appear-
ance of big data. In the present era, it is important to discover a common
structure of multiple data sets. In an early study, Flury (1984) focused on
the structure of the covariance matrices of multiple data sets and discussed
the heterogeneity of the structure. The author reported that population
covariance matrices differ among multiple data sets in practical applica-
tions. Many methodologies have been developed for treating the hetero-
geneity between covariance matrices of multiple data sets (see Flury, 1986,
1988; Flury & Gautschi, 1986; Pourahmadi, Daniels, & Park, 2007; Wang,
Banerjee, & Boley, 2011; Park & Konishi, 2020).
Among such methodologies, common component analysis (CCA; Wang
et al., 2011) is an effective tool for statistics. The central idea of CCA is to
reduce the number of dimensions of data while losing as little information
of the multiple data sets as possible. To reduce the number of dimensions,
CCA reconstructs the data with a few new variables that are linear combi-
nations of the original variables. For considering the heterogeneity between
covariance matrices of multiple data sets, CCA assumes that there is a dif-
ferent covariance matrix for each data set. There have been many papers
on various statistical methodologies using multiple covariance matrices:
Neural Computation 33, 2853–2880 (2021) © 2021 Massachusetts Institute of Technology.
https://doi.org/10.1162/neco_a_01425
Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
discriminant analysis (Bensmail & Celeux, 1996), spectral decomposition
(Boik, 2002), and a likelihood ratio test for multiple covariance matrices
(Manly & Rayner, 1987). It should be noted that principal component anal-
ysis (PCA) (Pearson, 1901; Jolliffe, 2002) is a technique similar to CCA. In
fact, CCA is a generalization of PCA; PCA can only be applied to one data
set, whereas CCA can be applied to multiple data sets.
Meanwhile, in various fields of research, including machine learning and
computer vision, the main interest has been in tensor data, which has a mul-
tidimensional array structure. In order to apply the conventional statistical
methodologies, such as PCA, to tensor data, a simple approach is to first
transform the tensor data into vector data and then apply the methodol-
ogy. However, such an approach causes the following problems:
1. In losing the tensor structure of the data, the approach ignores the
higher-order inherent relationships of the original tensor data.
2. Transforming tensor data to vector data substantially increases the
number of features. It also has a high computational cost.
To overcome these problems, statistical methodologies for tensor data anal-
yses have been proposed that take the tensor structure of the data into
consideration. Such methods enable us to accurately extract higher-order
inherent relationships in a tensor data set. In particular, many existing statis-
tical methodologies have been extended for tensor data, for example, mul-
tilinear principal component analysis (MPCA) (Lu et al., 2008) and sparse
PCA for tensor data analysis (Allen, 2012; Wang, Sun, Chen, Pang, & Zhou,
2012; Lai, Xu, Chen, Yang, & Zhang, 2014), as well as others (see Carroll &
Chang, 1970; Harshman, 1970; Kiers, 2000; Badeau & Boyer, 2008; Kolda &
Bader, 2009).
In this letter, we extend CCA to tensor data analysis, proposing multi-
linear common component analysis (MCCA). MCCA discovers the com-
mon structure of multiple data sets of tensor data while losing as little of
the information of the data sets as possible. To identify the common struc-
ture, we estimate a common basis constructed as linear combinations of
the original variables. For estimating the common basis, we develop a new
estimation algorithm based on the idea of CCA. In developing the estima-
tion algorithm, two issues must be addressed: the convergence properties
of the algorithm and its computational cost. To determine the convergence
特性, we investigate first the relationship between the initial values
of the parameters and global optimal solution and then the monotonic con-
vergence of the estimation algorithm. These analyses reveal that our pro-
posed algorithm guarantees convergence of the mode-wise global optimal
solution under some conditions. To analyze the computational efficacy, 我们
calculate the computational cost of our proposed algorithm.
The rest of the letter is organized as follows. In section 2, we review
the formulation and the minimization problem of CCA. In section 3, we
formulate the MCCA model by constructing the covariance matrices of
tensor data, based on a Kronecker product representation. Then we for-
mulate the estimation algorithm for MCCA in section 4. In section 5, we
present the theoretical properties for our proposed algorithm and ana-
lyze the computational cost. The efficacy of the MCCA is demonstrated
through numerical experiments in section 6. Concluding remarks are pre-
sented in section 7. Technical proofs are provided in the appendixes. Our
implementation of MCCA and supplementary materials are available at
https://github.com/yoshikawa-kohei/MCCA.
2 Common Component Analysis
Suppose that we obtain data matrices X_{(g)} = [x_{(g)1}, \ldots, x_{(g)N_g}]^\top \in R^{N_g \times P} with
N_g observations and P variables for g = 1, \ldots, G, where x_{(g)i} is the P-
dimensional vector corresponding to the ith row of X_{(g)} and G is the number
of data sets. Then the sample covariance matrix in group g is

S_{(g)} = \frac{1}{N_g} \sum_{i=1}^{N_g} \left( x_{(g)i} - \bar{x}_{(g)} \right) \left( x_{(g)i} - \bar{x}_{(g)} \right)^\top, \quad g = 1, \ldots, G,   (2.1)

where S_{(g)} \in S^P_+, in which S^P_+ is the set of symmetric positive-definite matri-
ces of size P \times P, and \bar{x}_{(g)} = \frac{1}{N_g} \sum_{i=1}^{N_g} x_{(g)i} is the P-dimensional mean vector in
group g.
The main idea of the CCA model is to find the common structure of mul-
tiple data sets by projecting the data onto a common lower-dimensional
space with the same basis as the data sets. Wang et al. (2011) assumed that
the covariance matrices S_{(g)} for g = 1, \ldots, G can be decomposed to a prod-
uct of latent covariance matrices and an orthogonal matrix for the linear
transformation as

S_{(g)} = V \Sigma_{(g)} V^\top + E_{(g)}, \quad \text{s.t. } V^\top V = I_R,   (2.2)

where \Sigma_{(g)} \in S^R_+ is the latent covariance matrix in group g, V \in R^{P \times R} is an
orthogonal matrix for the linear transformation, E_{(g)} \in S^P_+ is the error matrix
in group g, and I_R is the identity matrix of size R \times R. E_{(g)} = \sum_{i=1}^{N_g} e_{(g)i} e_{(g)i}^\top consists
of the sum of outer products for independent random vectors e_{(g)i} with mean
E[e_{(g)i}] = 0 and covariance matrix Cov[e_{(g)i}] (> O) (i = 1, 2, \ldots, N_g). V de-
termines the R-dimensional common subspace of the multiple data sets. In
particular, by assuming R < P, the CCA can discover the latent structures
of the data sets. Wang et al. (2011) referred to the model, equation 2.2, as
common component analysis (CCA).

The parameters V and \Sigma_{(g)} (g = 1, \ldots, G) are estimated by solving the
minimization problem,
G(cid:2)
g=1
min
V,(cid:2)
(g)
g=1,...,G
(cid:4)S(g)
− V(cid:2)
(g)V
(cid:2)(cid:4)2
F
,
(cid:2)
s.t. V
V = IR,
(2.3)
where (cid:4) · (cid:4)F denotes the Frobenius norm. The estimator of latent covari-
ance matrices (cid:2)
(g) for g = 1, . . . , G can be obtained by solving the mini-
= V(cid:2)S(g)V. By using the estimated value ˆ(cid:2)
mization problem as ˆ(cid:2)
(g), the
minimization problem can be reformulated as the following maximization
problem:
(g)
⎧
⎨
(cid:2)
⎩V
G(cid:2)
(cid:3)
g=1
max
V
tr
(cid:2)
S(g)VV
S(g)
(cid:4)
⎫
⎬
,
V
⎭
(cid:2)
s.t. V
V = IR,
(2.4)
where tr(·) denotes the trace of a matrix. A crucial issue in solving the
maximization problem 2.4 is nonconvexity. Indeed, the maximization
problem is nonconvex since it is defined on a set of orthogonal
matrices, which is a nonconvex set. In general, it is difficult to find the global
optimal solution of nonconvex optimization problems. To overcome this
drawback, Wang et al. (2011) proposed an estimation algorithm in which
the estimated parameters are guaranteed to constitute the global optimal
solution under some conditions.
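A natural way to attack problem 2.4 numerically is a fixed-point iteration that repeatedly replaces V by the R leading eigenvectors of \sum_{g} S_{(g)} V V^\top S_{(g)}. The R sketch below (with our own names) illustrates only this update; the carefully chosen initialization of Wang et al. (2011) is replaced by a crude identity-based start, so the global-optimality guarantee does not carry over to this sketch.

```r
# One fixed-point update for the CCA problem (2.4):
# V <- R leading eigenvectors of sum_g S_g %*% V %*% t(V) %*% S_g.
cca_update <- function(S_list, V) {
  M <- Reduce(`+`, lapply(S_list, function(S) S %*% tcrossprod(V) %*% S))
  eigen(M, symmetric = TRUE)$vectors[, seq_len(ncol(V)), drop = FALSE]
}

cca_fit <- function(S_list, R, n_iter = 100) {
  V <- diag(nrow(S_list[[1]]))[, seq_len(R), drop = FALSE]  # crude starting point
  for (s in seq_len(n_iter)) V <- cca_update(S_list, V)
  V
}
```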
3 Multilinear Common Component Analysis
In this section, we introduce a mathematical formulation of the MCCA, an
extension of the CCA in terms of tensor data analysis. Moreover, we for-
mulate an optimization problem of MCCA and investigate its convergence
properties.
Suppose that we independently obtain Mth order tensor data \mathcal{X}_{(g)i} \in
R^{P_1 \times P_2 \times \cdots \times P_M} for i = 1, \ldots, N_g. We set the data sets of the tensors \mathcal{X}_{(g)} =
[\mathcal{X}_{(g)1}, \mathcal{X}_{(g)2}, \ldots, \mathcal{X}_{(g)N_g}] \in R^{P_1 \times P_2 \times \cdots \times P_M \times N_g} for g = 1, \ldots, G, where G is the
number of data sets. Then the sample covariance matrix in group g for the
tensor data set is defined by

S^*_{(g)} := S^{(1)}_{(g)} \otimes S^{(2)}_{(g)} \otimes \cdots \otimes S^{(M)}_{(g)},   (3.1)
(cid:14)
+, in which P =
∈ SPk
k=1 Pk, ⊗ denotes the Kronecker product op-
+ is the sample covariance matrix for kth mode in group
M
∈ SP
where S∗
(g)
erator, and S(k)
(g)
g defined by
S(k)
(g) :=
1
(cid:14)
j(cid:7)=k Pj
Ng
(cid:15)
X(k)
(g)i
Ng(cid:2)
i=1
(cid:16) (cid:15)
− ¯X(k)
(g)
X(k)
(g)i
− ¯X(k)
(g)
(cid:16)(cid:2)
.
(3.2)
(cid:14)
×(
(cid:14)
(g)
∈
×(
¯X
∈ RPk
= 1
Ng
j(cid:7)=k Pj ) is the mode-k unfolded matrix of X
(cid:5)
Here, X(k)
(g)i, and ¯X(k)
(g)i
(g)
(cid:14)
×(
Ng
RPk
j(cid:7)=k Pj ) is the mode-k unfolded matrix of
i=1
that the mode-k unfolding from an Mth order tensor X ∈ RP1
a matrix X(k) ∈ RPk
j(cid:7)=k Pj ) means that the tensor element (p1
maps to matrix element (pk
(cid:14)
t−1
, p2
m=1,m(cid:7)=k Pm, in which p1
(g)i. Note
×···×PM to
, . . . , pM)
t=1,t(cid:7)=k(pt − 1)Lt with Lt =
, l), where l = 1 +
, . . . , pM denote the indices of the Mth order
tensor X . For a more detailed description of tensor operations, see Kolda
and Bader (2009). A representation of the tensor covariance matrix by Kro-
necker products is often used (Kermoal, Schumacher, Pedersen, Mogensen,
& Frederiksen, 2002; Yu et al., 2004; Werner, Jansson, & Stoica, 2008).
X
×P2
, p2
(cid:5)
M
To formulate CCA in terms of tensor data analysis, we consider CCA for
the kth mode covariance matrix in group g as follows,

S^{(k)}_{(g)} = V^{(k)} \Sigma^{(k)}_{(g)} V^{(k)\top} + E^{(k)}_{(g)}, \quad \text{s.t. } V^{(k)\top} V^{(k)} = I_{R_k},   (3.3)

where \Sigma^{(k)}_{(g)} \in S^{R_k}_+ is the latent kth mode covariance matrix in group g, V^{(k)} \in
R^{P_k \times R_k} is an orthogonal matrix for the linear transformation, and E^{(k)}_{(g)} \in S^{P_k}_+
is the error matrix in group g. E^{(k)}_{(g)} = \sum_{i=1}^{N_g} e^{(k)}_{(g)i} e^{(k)\top}_{(g)i} consists of the sum of outer
products for independent random vectors e^{(k)}_{(g)i} with mean E[e^{(k)}_{(g)i}] = 0 and
covariance matrix Cov[e^{(k)}_{(g)i}] (> O) (i = 1, 2, \ldots, N_g). Since S^*_{(g)} can be de-
composed to a Kronecker product of S^{(k)}_{(g)} for k = 1, \ldots, M in formula 3.1, we
obtain the following model,

S^*_{(g)} = V^* \Sigma^*_{(g)} V^{*\top} + E^*_{(g)}, \quad \text{s.t. } V^{*\top} V^* = I_R,   (3.4)

where R = \prod_{k=1}^{M} R_k, V^* = V^{(1)} \otimes V^{(2)} \otimes \cdots \otimes V^{(M)}, \Sigma^*_{(g)} = \Sigma^{(1)}_{(g)} \otimes \Sigma^{(2)}_{(g)} \otimes
\cdots \otimes \Sigma^{(M)}_{(g)}, and E^*_{(g)} is the error matrix in group g. We refer to this model as
multilinear common component analysis (MCCA).
To find the R-dimensional common subspace between the multiple ten-
sor data sets, MCCA determines V^{(1)}, V^{(2)}, \ldots, V^{(M)}. As with CCA, we ob-
tain the estimate of \Sigma^*_{(g)} for g = 1, \ldots, G as \hat{\Sigma}^*_{(g)} = V^{*\top} S^*_{(g)} V^*. With respect
to V^*, we can obtain the estimate by solving the following maximization
problem, which is similar to equation 2.4:

\max_{V^*} \; \operatorname{tr}\left\{ V^{*\top} \left( \sum_{g=1}^{G} S^*_{(g)} V^* V^{*\top} S^*_{(g)} \right) V^* \right\}, \quad \text{s.t. } V^{*\top} V^* = I_R.   (3.5)
However, the number of parameters will be very large when we try to
solve this problem directly, which results in a high computational cost.
Moreover, it may not be possible to discover the inherent relationships
among the variables in each mode simply by solving problem 3.5.
To solve the maximization problem efficiently and identify the inherent
关系, the maximization problem 3.5 can be decomposed into the
mode-wise maximization problems represented in the following lemma.
Lemma 1. An estimate of the parameters V^{(k)} for k = 1, 2, \ldots, M in the maxi-
mization problem 3.5 can be obtained by solving the following maximization prob-
lem for each mode:

\max_{V^{(k)}, \, k=1,2,\ldots,M} \; \sum_{g=1}^{G} \prod_{k=1}^{M} \operatorname{tr}\left( V^{(k)\top} S^{(k)}_{(g)} V^{(k)} V^{(k)\top} S^{(k)}_{(g)} V^{(k)} \right), \quad \text{s.t. } V^{(k)\top} V^{(k)} = I_{R_k}.   (3.6)
However, we cannot simultaneously solve this problem for V^{(k)}, k =
1, 2, \ldots, M. Thus, by summarizing the terms unrelated to V^{(k)} in maximiza-
tion problem 3.6, we can obtain the maximization problem for the kth mode,

\max_{V^{(k)}} f_k(V^{(k)}) = \max_{V^{(k)}} \operatorname{tr}\left( V^{(k)\top} M(V^{(k)}) V^{(k)} \right), \quad \text{s.t. } V^{(k)\top} V^{(k)} = I_{R_k},   (3.7)

where M(V^{(k)}) = \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} V^{(k)} V^{(k)\top} S^{(k)}_{(g)}, in which w^{(-k)}_{(g)} is given by

w^{(-k)}_{(g)} = \prod_{j \neq k} \operatorname{tr}\left( V^{(j)\top} S^{(j)}_{(g)} V^{(j)} V^{(j)\top} S^{(j)}_{(g)} V^{(j)} \right).   (3.8)
Although an estimate of V(k) can be obtained by solving maximization prob-
lem 3.7, this problem is nonconvex, since V^{(k)} is assumed to be an orthog-
onal matrix. Thus, the maximization problem has several local maxima.
However, by choosing the initial values of parameters in the estimation
near the global optimal solution, we can obtain the global optimal solu-
tion. In section 4, we develop not only an estimation algorithm but also an
initialization method for choosing the initial values of the parameters near
the global optimal solution. The initialization method helps guarantee the
convergence of our algorithm to the mode-wise global optimal solution.
4 Estimation
Our estimation algorithm consists of two steps: initializing the parameters
and iteratively updating the parameters. The initialization step gives us the
initial values of the parameters near the global optimal solution for each
mode. Next, by iteratively updating the parameters, we can monotonically
increase the value of the objective function 3.7 until convergence.
4.1 Initialization. The first step is to initialize the parameters V^{(k)} for
each mode. We define an objective function f'_k(V^{(k)}) = \operatorname{tr}\left( V^{(k)\top} M^{I(k)} V^{(k)} \right)
for k = 1, \ldots, M, where M^{I(k)} = \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)}. Next, we adopt a max-
imizer of f'_k(V^{(k)}) as initial values of the parameters V^{(k)}. To obtain the maxi-
mizer, we need an initial value of w^{(k)} = \left( w^{(-k)}_{(1)}, w^{(-k)}_{(2)}, \ldots, w^{(-k)}_{(G)} \right)^\top. The initial
value for w^{(k)} is obtained by solving the quadratic programming problem,

\min_{w^{(k)}} \; w^{(k)\top} \lambda^{(k)}_0 \lambda^{(k)\top}_0 w^{(k)}, \quad \text{s.t. } w^{(k)} > 0, \; w^{(k)\top} \lambda^{(k)}_1 \lambda^{(k)\top}_1 w^{(k)} = 1,   (4.1)

where

\lambda^{(k)}_0 = \left[ \sum_{i=R_k+1}^{P_k} \lambda^{(k)}_{(1)i}, \sum_{i=R_k+1}^{P_k} \lambda^{(k)}_{(2)i}, \ldots, \sum_{i=R_k+1}^{P_k} \lambda^{(k)}_{(G)i} \right]^\top,
\lambda^{(k)}_1 = \left[ \sum_{i=1}^{P_k} \lambda^{(k)}_{(1)i}, \sum_{i=1}^{P_k} \lambda^{(k)}_{(2)i}, \ldots, \sum_{i=1}^{P_k} \lambda^{(k)}_{(G)i} \right]^\top,   (4.2)

in which \lambda^{(j)}_{(g)i} is the ith largest eigenvalue of S^{(j)}_{(g)} S^{(j)}_{(g)}.

Using the initial value of w^{(k)}, we can obtain the initial value of the pa-
rameter V^{(k)}_0 by maximizing f'_k(V^{(k)}) for each mode. The maximizer consists
of R_k eigenvectors, corresponding to the R_k largest eigenvalues, obtained by
eigenvalue decomposition of M^{I(k)}. The theoretical justification for this
initialization is discussed in section 5.
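In R, the eigenvalue sums that make up \lambda^{(k)}_0 and \lambda^{(k)}_1 in equation 4.2 can be assembled as in the sketch below (the names are our own); the resulting quadratic program 4.1 is then handed to a QP solver, the authors using ipop from the kernlab package for this step.

```r
# Eigenvalue-sum vectors lambda0 and lambda1 of equation 4.2 for mode k.
# S_k_list: mode-k covariance matrices S^(k)_(g), one per group; R_k: reduced dimension.
init_lambda <- function(S_k_list, R_k) {
  sums <- sapply(S_k_list, function(S) {
    ev <- eigen(S %*% S, symmetric = TRUE, only.values = TRUE)$values  # decreasing order
    c(tail  = sum(ev[-seq_len(R_k)]),  # eigenvalues R_k + 1, ..., P_k
      total = sum(ev))                 # eigenvalues 1, ..., P_k
  })
  list(lambda0 = sums["tail", ], lambda1 = sums["total", ])
}
```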
4.2 Iterative Update of Parameters. The second step is to update the pa-
rameters V^{(k)} for each mode. We update the parameters such that the objective
function f_k(V^{(k)}) is maximized. Let V^{(k)}_s be the value of V^{(k)} at step s. Then
we solve the surrogate maximization problem,

\max_{V^{(k)}_{s+1}} \; \operatorname{tr}\left( V^{(k)\top}_{s+1} M(V^{(k)}_s) V^{(k)}_{s+1} \right), \quad \text{s.t. } V^{(k)\top}_{s+1} V^{(k)}_{s+1} = I_{R_k}.   (4.3)

The solution of equation 4.3 consists of R_k eigenvectors, corresponding
to the R_k largest eigenvalues, obtained by eigenvalue decomposition of
M(V^{(k)}_s). By iteratively updating the parameters, the objective function
f_k(V^{(k)}) is monotonically increased, which allows it to be maximized. The
monotonically increasing property is discussed in section 5.

Our estimation procedure comprises the above estimation steps. The
procedure is summarized as algorithm 1.
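To fix ideas, a compact R sketch of the update phase is given below. It assumes the mode-wise covariance matrices are precomputed and the V^{(k)} are already initialized (for example, by the procedure of section 4.1); the convergence check of algorithm 1 is replaced by a fixed number of sweeps, and all names are our own.

```r
# Iterative updates of section 4.2 for all modes.
# S: list over groups g of lists over modes k of the matrices S^(k)_(g).
# V: list over modes k of Pk x Rk orthogonal starting matrices.
mcca_iterate <- function(S, V, n_sweeps = 50) {
  G <- length(S); M <- length(V)
  tr_term <- function(S_kg, V_k) {                      # tr(V' S V V' S V), cf. eq. 3.8
    VSV <- crossprod(V_k, S_kg %*% V_k)
    sum(diag(VSV %*% VSV))
  }
  for (s in seq_len(n_sweeps)) {
    for (k in seq_len(M)) {
      w <- sapply(seq_len(G), function(g)               # weights w^(-k)_(g) of eq. 3.8
        prod(vapply(seq_len(M)[-k], function(j) tr_term(S[[g]][[j]], V[[j]]), numeric(1))))
      Mk <- Reduce(`+`, lapply(seq_len(G), function(g)  # M(V^(k)_s) of eq. 3.7
        w[g] * S[[g]][[k]] %*% tcrossprod(V[[k]]) %*% S[[g]][[k]]))
      V[[k]] <- eigen(Mk, symmetric = TRUE)$vectors[, seq_len(ncol(V[[k]])), drop = FALSE]
    }
  }
  V
}
```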
5 Theory
This section presents the theoretical and computational analyses for algo-
rithm 1. The theoretical analyses consist of two steps. First, we prove that the
initial values of the parameters obtained in section 4.1 are relatively close to
the global optimal solution. If the initial values are close to the global maxi-
mum, then we can obtain the global optimal solution even if the maximiza-
tion problem is nonconvex. Second, we prove that the iterative updates of
the parameters in section 4.2 monotonically increase the value of objective
function 3.7 by solving the surrogate problem 4.3. From the monotonically
increasing property, the estimated parameters always converge at a sta-
tionary point. The combination of these two results enables us to obtain
the mode-wise global optimal solution. In the computational analysis, we
calculate the computational cost of MCCA and then compare it with that of
conventional methods. By comparing the costs, we investigate the compu-
tational efficacy of MCCA.
5.1 Analysis of Upper and Lower Bounds. This section aims to pro-
vide the upper and lower bounds of the maximization problem 3.7. From
the bounds, we find that the initial values in section 4.1 are relatively close
to the global optimal solution. Before providing the bounds, we define a
contraction ratio.

Definition 1. Let f'^{max}_k be the global maximum of f'_k(V^{(k)}) and M^{(k)} =
\operatorname{tr}\left[ M^{I(k)} \right]. Then a contraction ratio of data for the kth mode is defined by

\alpha^{(k)} = \frac{f'^{max}_k}{M^{(k)}} = \frac{\operatorname{tr}\left( V^{(k)\top}_0 M^{I(k)} V^{(k)}_0 \right)}{\operatorname{tr}\left[ M^{I(k)} \right]}.   (5.1)

Note that a contraction ratio \alpha^{(k)} satisfies 0 \le \alpha^{(k)} \le 1 and \alpha^{(k)} = 1 if and
only if R_k = P_k.

Using f'^{max}_k and the contraction ratio \alpha^{(k)}, we have the following theo-
rem that reveals the upper and lower bounds of the global maximum in
problem 3.7.

Theorem 1. Let f^{max}_k be the global maximum of f_k(V^{(k)}). Then

\alpha^{(k)} f'^{max}_k \le f^{max}_k \le f'^{max}_k,   (5.2)

where \alpha^{(k)} is the contraction ratio defined in equation 5.1 and f'^{max}_k is the global
maximum of f'_k(V^{(k)}).

This theorem indicates that f'^{max}_k \to f^{max}_k when \alpha^{(k)} \to 1. Thus, it is im-
portant to obtain an \alpha^{(k)} that is as close as possible to one. Note that \alpha^{(k)}
depends on V^{(k)}_0 and w^{(k)}, and V^{(k)}_0 depends on w^{(k)}. From this dependency,
if we could set the initial value of w^{(k)} such that \alpha^{(k)} is as large as possible,
then we could obtain an initial value of V^{(k)}_0 that attains a value near f^{max}_k.
The following theorem shows that we can compute the initial value of w^{(k)}
such that \alpha^{(k)} is maximized.

Theorem 2. Let \lambda^{(k)}_0 and \lambda^{(k)}_1 be the vectors consisting of eigenvalues defined
in equation 4.2. For w^{(k)} = \left( w^{(-k)}_{(1)}, w^{(-k)}_{(2)}, \ldots, w^{(-k)}_{(G)} \right)^\top (k = 1, 2, \ldots, M), suppose
that the estimate \hat{w}^{(k)} is obtained by solving equation 4.1 for k = 1, 2, \ldots, M. Then
\hat{w}^{(k)} maximizes \alpha^{(k)}.

In fact, \alpha^{(k)} is very close to one with the initial values given in theorem 2
even if R_k is small. This resembles the cumulative contribution ratio in PCA.
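By definition 1, \alpha^{(k)} is the share of \operatorname{tr}(M^{I(k)}) captured by the R_k leading eigenvalues of M^{I(k)}, since the maximizer V^{(k)}_0 consists of the corresponding eigenvectors. It can therefore be computed from a single eigenvalue decomposition, as in the following R sketch with names of our own choosing.

```r
# Contraction ratio alpha^(k) of definition 1.
# S_k_list: mode-k covariance matrices S^(k)_(g); w: weights w^(-k)_(g); R_k: reduced dimension.
contraction_ratio <- function(S_k_list, w, R_k) {
  MI <- Reduce(`+`, Map(function(S, wg) wg * S %*% S, S_k_list, w))  # M^I(k)
  ev <- eigen(MI, symmetric = TRUE, only.values = TRUE)$values
  sum(ev[seq_len(R_k)]) / sum(ev)
}
```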
5.2 Convergence Analysis. We next verify that our proposed pro-
cedure for iteratively updating parameters maximizes the optimization
问题 3.7. In algorithm 1, the parameter V(k)
s+1 can be obtained by solving
the surrogate maximization problem 4.3. Theorem 3 shows that we can
monotonically increase the value of the function fk(V(k)) in equation 3.7 经过
algorithm 1.
Theorem 3. Let V(k)
价值观, obtained by eigenvalue decomposition of M(V(k)
s+1 be Rk eigenvectors, corresponding to the Rk largest eigen-
s ). 然后
fk(V(k)
s ) ≤ fk(V(k)
s+1).
(5.3)
From theorem 1, we obtain initial values of the parameters that are near
the global optimal solution. By combining theorems 1 和 3, the solution
from algorithm 1 can be characterized by the following corollary.
Corollary 1. Consider the maximization problem 3.7. Suppose that the initial
value of the parameter is obtained by V^{(k)}_0 = \arg\max_{V^{(k)}} f'_k(V^{(k)}), and the parameter
V^{(k)}_s is repeatedly updated by algorithm 1. Then the mode-wise global maximum
for the maximization problem 3.7 is achieved when all the contraction ratios \alpha^{(k)}
for k = 1, 2, \ldots, M go to one.

Algorithm 1 does not guarantee the global solution due to the fundamen-
tal problem of nonconvexity, but it is enough for pragmatic purposes. We
investigate the issue of convergence to the global solution through numerical
studies in section 6.3.
5.3 Computational Analysis. First, we analyze the computational cost.
To simplify the analysis, we assume P = \max_j P_j for j = 1, 2, \ldots, M. This
implies that P is the upper bound of R_j for all j. We then calculate the upper
bound of the computational complexity.

The expensive computations of each iteration in algorithm 1 con-
sist of three parts: the formulation of M(V^{(k)}_s), the eigenvalue decomposi-
tion of M(V^{(k)}_s), and updating the latent covariance matrices \Sigma^{(k)}_{(g)}. These steps
are O(GM^2P^3), O(P^3), and O(GMP^3), respectively. The total computational
complexity per iteration is then O(GM^2P^3).

Next, we analyze the memory requirement of algorithm 1. MCCA repre-
sents the original tensor data with fewer parameters by projecting the data
onto a lower-dimensional space. This requires the P_k \times R_k projection matri-
ces V^{(k)} for k = 1, 2, \ldots, M. MCCA projects the data with size of N\left( \prod_{k=1}^{M} P_k \right)
to N\left( \prod_{k=1}^{M} R_k \right), where N = \sum_{g=1}^{G} N_g. Thus, the required size for the pa-
rameters is \sum_{k=1}^{M} P_k R_k + N\left( \prod_{k=1}^{M} R_k \right).
Table 1: Comparisons of the Computational Complexity and the Memory Requirement.

Method   Computational Complexity   Memory Requirement
PCA      O(P^{3M})                  R\left( \prod_{k=1}^{M} P_k \right) + NR
CCA      O(GP^{3M})                 R\left( \prod_{k=1}^{M} P_k \right) + NR
MPCA     O(NMP^{M+1})               \sum_{k=1}^{M} P_k R_k + N\left( \prod_{k=1}^{M} R_k \right)
MCCA     O(GM^2P^3)                 \sum_{k=1}^{M} P_k R_k + N\left( \prod_{k=1}^{M} R_k \right)
MPCA requires the same amount of memory as MCCA. Meanwhile, CCA and
PCA need a projection matrix, which is of size R\left( \prod_{k=1}^{M} P_k \right). The required
size for the parameters is then R\left( \prod_{k=1}^{M} P_k \right) + NR.
To compare the computational cost clearly, the upper bounds of compu-
tational complexity and the memory requirement are summarized in Table
1. Table 1 shows that the computational complexity of MCCA is superior
to that of the other algorithms and that the complexity of MCCA is not limited
by the sample size. In contrast, the MPCA algorithm is affected by the sample
size (Lu, Plataniotis, & Venetsanopoulos, 2008). In addition, MCCA and
MPCA require a large amount of memory when the number of modes in
a data set is large, but their memory requirements are much smaller than
those of PCA and CCA.
6 Experiments
To demonstrate the efficacy of MCCA, we applied MCCA, PCA, CCA, and
MPCA to image compression tasks.
6.1 Experimental Setting. For the experiments, we prepared the follow-
ing three image data sets:
MNIST data set consists of data of handwritten digits 0, 1, . . . , 9 at im-
age sizes of 28 × 28 pixels. The data set includes a training data set
of 60,000 images and a test data set of 10,000 images. We used the
first 10 training images of the data set for each group. The MNIST
data set (Lecun, Bottou, Bengio, & Haffner, 1998) is available at
http://yann.lecun.com/exdb/mnist/.
AT&T (ORL) face data set contains gray-scale facial images of 40 people.
The data set has 10 images sized 92 × 112 pixels for each person. We
used images resized by a factor of 0.5 to improve the efficiency of the
experiments.
The AT&T face data set is available at https://git-disl.github.io/GTDLBench/datasets/att_face_dataset/.
All the credits of this data set go to AT&T Laboratories Cambridge.
Cropped AR database has color facial images of 100 people. These im-
ages are cropped around the face. The size of the images is 120 × 165 × 3
pixels. The data set contains 26 images in each group, 12 of which
are images of people wearing sunglasses or scarves. We used the
cropped facial images of 50 males who were not wearing sunglasses
or scarves. Due to memory limitations, we resized these images by
a factor of 0.25. The AR database (Martinez & Benavente, 1998; Mar-
tinez & Kak, 2001) is available at http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html.
The data set characteristics are summarized in Table 2.

Table 2: Summary of the Data Sets.

Data Set     Group Size              Sample Size (/Group)   Number of Dimensions   Number of Groups
MNIST        Small                   10                      28 × 28 = 784          10
AT&T (ORL)   Small, Medium, Large    10                      46 × 56 = 2576         10, 20, 40
Cropped AR   Small, Medium, Large    14                      30 × 41 × 3 = 7380     10, 25, 50
To compress these images, we performed dimensionality reductions by
MCCA, PCA, CCA, and MPCA, as follows. We vectorized the tensor data
set before performing PCA and CCA. In MCCA, the images were com-
pressed and reconstructed according to the following steps:

1. Prepare the multiple image data sets \mathcal{X}_{(g)} \in R^{P_1 \times P_2 \times \cdots \times P_M \times N_g} for
   g = 1, 2, \ldots, G.
2. Compute the covariance matrix of \mathcal{X}_{(g)} for g = 1, 2, \ldots, G.
3. From these covariance matrices, compute the linear transforma-
   tion matrices V_i \in R^{P_i \times R_i} for i = 1, 2, \ldots, M for mapping to the
   (R_1, R_2, \ldots, R_M)-dimensional latent space.
4. Map the ith sample \mathcal{X}_{(g)i} to \mathcal{X}_{(g)i} \times_1 V_1^\top \times_2 V_2^\top \cdots \times_M V_M^\top \in
   R^{R_1 \times R_2 \times \cdots \times R_M}, where the operator \times_i is the i-mode product of a
   tensor (Kolda & Bader, 2009).
5. Reconstruct the ith sample as \tilde{\mathcal{X}}_{(g)i} = \mathcal{X}_{(g)i} \times_1 V_1 V_1^\top \times_2 V_2 V_2^\top \cdots \times_M V_M V_M^\top.

Meanwhile, PCA and MPCA each require a single data set. Thus, we ag-
gregated the data sets as \mathcal{X} = [\mathcal{X}_{(1)}, \mathcal{X}_{(2)}, \ldots, \mathcal{X}_{(G)}] \in R^{P_1 \times P_2 \times \cdots \times P_M \times N} with
N = \sum_{g=1}^{G} N_g and performed PCA and MPCA for data set \mathcal{X}.
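Steps 4 and 5 rely only on the i-mode product, which is easy to express in R. The sketch below uses helper names of our own, with the i-mode product implemented via the unfold–multiply–fold construction of Kolda and Bader (2009), to project one sample tensor onto the latent space and reconstruct it.

```r
# i-mode product X x_i A: multiply the mode-i unfolding of X by A and fold back,
# so mode i changes from dim(X)[i] to nrow(A) (Kolda & Bader, 2009).
mode_product <- function(X, A, i) {
  d <- dim(X)
  perm <- c(i, seq_along(d)[-i])
  Y <- A %*% matrix(aperm(X, perm), nrow = d[i])          # A %*% X_(i)
  aperm(array(Y, dim = c(nrow(A), d[-i])), order(perm))   # fold back to tensor form
}

# Step 4: map a sample tensor Xi to the (R1, ..., RM)-dimensional latent space.
compress_sample <- function(Xi, V)
  Reduce(function(Z, k) mode_product(Z, t(V[[k]]), k), seq_along(V), Xi)

# Step 5: reconstruct the sample with the projection matrices V_k %*% t(V_k).
reconstruct_sample <- function(Xi, V)
  Reduce(function(Z, k) mode_product(Z, tcrossprod(V[[k]]), k), seq_along(V), Xi)
```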
6.2 Performance Assessment. For MCCA and MPCA, the reduced di-
mensions R1 and R2 were chosen as the same number, and then we fixed
R3 as two. All computations were performed by the software R (version 3.6)
(R Core Team, 2019). In the initialization of MCCA, solving the quadratic
programming problem was carried out using the function ipop in the pack-
age kernlab. MPCA was implemented as the function mpca in the package
rTensor. (The implementations of MCCA, PCA, and CCA are available at
https://github.com/yoshikawa-kohei/MCCA.)
To assess their performances, we calculated the reconstruction error rate
(RER) under the same compression ratio (CR). RER is defined by

\mathrm{RER} = \frac{\left\| \mathcal{X} - \hat{\mathcal{X}} \right\|_F^2}{\left\| \mathcal{X} \right\|_F^2},   (6.1)

where \hat{\mathcal{X}} = [\hat{\mathcal{X}}_{(1)}, \hat{\mathcal{X}}_{(2)}, \ldots, \hat{\mathcal{X}}_{(G)}] is the aggregated data set of the reconstructed
tensors \hat{\mathcal{X}}_{(g)} = [\tilde{\mathcal{X}}_{(g)1}, \tilde{\mathcal{X}}_{(g)2}, \ldots, \tilde{\mathcal{X}}_{(g)N_g}] for g = 1, 2, \ldots, G and \|\mathcal{X}\|_F is the
norm of a tensor \mathcal{X} \in R^{P_1 \times P_2 \times \cdots \times P_M} computed by

\left\| \mathcal{X} \right\|_F = \sqrt{ \sum_{p_1=1}^{P_1} \sum_{p_2=1}^{P_2} \cdots \sum_{p_M=1}^{P_M} x^2_{p_1, p_2, \ldots, p_M} },   (6.2)

in which x_{p_1, p_2, \ldots, p_M} is an element (p_1, p_2, \ldots, p_M) of \mathcal{X}. In addition, we de-
fined CR as

\mathrm{CR} = \frac{\text{The number of required parameters}}{N \cdot \prod_{k=1}^{M} P_k}.   (6.3)

The number of required parameters for MCCA and MPCA is \sum_{k=1}^{M} P_k R_k +
N\left( \prod_{k=1}^{M} R_k \right), whereas that for CCA and PCA is R\left( \prod_{k=1}^{M} P_k \right) + NR.
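Both criteria are straightforward to evaluate; for instance, in R, with arrays of identical dimensions and with function names of our own:

```r
# Reconstruction error rate of equation 6.1.
rer <- function(X, X_hat) sum((X - X_hat)^2) / sum(X^2)

# Compression ratio of equation 6.3 for MCCA and MPCA, with mode sizes P = (P1, ..., PM),
# reduced dimensions R = (R1, ..., RM), and N samples in total.
cr_mcca <- function(P, R, N) (sum(P * R) + N * prod(R)) / (N * prod(P))
```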
Figure 1 plots the RER obtained by estimating various reduced dimen-
sions for the AT&T (ORL) data set with group sizes of small, medium, and
large. As the figures for the results of the other data sets were similar to
Figure 1, we show them in the supplementary materials S1.
From Figure 1, we observe that the RER of MCCA is the smallest
for any value of CR. This indicates that MCCA performs better than the
other methods. In addition, note that CCA performs better than MPCA only
for fairly small values of CR, even though it is a method for vector data,
whereas MPCA performs better for larger values of CR. This implies the
limitations of CCA for vector data.
Next we consider group size by comparing panels a, b, and c in Figure 1.
The value of CR at the intersection of CCA and MPCA increases with
Figure 1: Plots of RER versus CR for the AT&T (ORL) data set of various group
sizes: (a) small, (b) medium, and (c) large.
increasing the group size. This indicates that MPCA has more trouble ex-
tracting an appropriate latent space as the group size increases. Since MPCA
does not consider the group structure, it is not possible to properly estimate
the covariance structure when the group size is large.
Figure 2 shows the comparison of runtime for the AT&T (ORL) data set
with group sizes of small, medium, and large. Although Table 1 indicates the
superiority of MCCA in computational complexity, Figure 2 shows
that MCCA is slower than MPCA for any size of data set. This probably
arises from the difference in the implementations of MCCA and MPCA: MCCA
is implemented by our hand-built source code, while MPCA is done by
the package rTensor. But when we compare MCCA with CCA, MCCA is
Figure 2: Plots of runtime versus CR for the AT&T (ORL) data set of various
group sizes: (a) small, (b) medium, and (c) large.
superior to CCA in terms of both computational complexity and the run-
time comparisons.
Figure 3 plots the reconstructed images for the AT&T (ORL) data set with
the medium group size. This figure can be obtained by performing the four
methodologies when we set R1 = R2 = 5 and R3 = 2. By setting the number
of the ranks in this way, we can compare the images with almost the same
CR. PCA, CCA, and MPCA can recover the average structure of face images,
but they cannot deal with changes in the angle of the face. MCCA can also
recover the detailed differences in each image.
6.3 Behavior of Contraction Ratio. We examined the behavior of the con-
traction ratio α(k). We performed MCCA on the AT&T (ORL) data set with
the medium group size and computed α(1) and α(2) with the various pairs
of reduced dimensions (R1, R2) ∈ {1, 2, . . . , 25} × {1, 2, . . . , 25}.
Figure 4 shows the values of α(1) and α(2) for all pairs of R1 and R2. As
shown, α(1) and α(2) were invariant to variations in R2 and R1, respectively.
Therefore, to facilitate visualization of changes in α(k), we draw Figure 5,
Figure 3: The reconstructed images for the AT&T (ORL) data set with the
medium group sizes under almost the same CR. Image source: AT&T
Laboratories Cambridge.
Figure 4: α(1) and α(2) versus pairs of reduced dimensions (R1, R2).
which represents α(1) and α(2) for R2 = 1 and R1 = 1, respectively. From
these, we observe that when both R1 and R2 are greater than eight, both
α(1) and α(2) are close to one.
6.4 Efficacy of Solving the Quadratic Programming Problem. We in-
vestigated the usefulness of determining the initial value of w(k) by solv-
ing the quadratic programming problem 4.1. We applied MCCA to the
Figure 5: α(1) and α(2) versus R1 and R2, respectively.
AT&T (ORL) data set with the small, medium, and large numbers of groups.
In addition, we used the smaller group size of three. For determining
the initial value of w(k), we consider three methods: solving the quadratic
programming problem 4.1 (MCCA:QP); setting all values of w(k) to one
(MCCA:FIX); and setting the values by random sampling according to the
uniform distribution U(0, 1) (MCCA:RANDOM). We computed the α(k)
with the reduced dimensions R1 = R2 (∈ {1, 2, . . . , 10}) for each of these
methods.
To evaluate the performance of these methods, we compared the val-
ues of α(k) and the number of iterations in the estimation. The number of
iterations in the estimation is the number of repetitions of lines 7 to 9 in al-
gorithm 1. For MCCA:RANDOM, we performed 50 trials and calculated
averages of each of these indices.
Figure 6 shows the comparisons of α(1) and α(2) when the initializa-
tion was performed by MCCA:QP, MCCA:FIX, and MCCA:RANDOM for
the AT&T (ORL) data set with a group size of three. It was confirmed that
MCCA:QP provides the largest values of α(1) and α(2). Figure 7 shows the
number of iterations. MCCA:QP gives the smallest number of iterations
for almost all values of the reduced dimensions. This result indicates that
MCCA:QP converges to a solution faster than the other initialization meth-
ods. However, when the reduced dimension is greater than or equal to eight,
the other methods are competitive with MCCA:QP. A lack of difference in
the number of iterations could result from the closeness of the initial values
and the global optimal solution. Note that when R1 and R2 are greater
than or equal to eight, α(1) and α(2) are sufficiently close to one, based on
Figure 6. This indicates that the initial values are close to the global optimal
solution obtained from theorem 1. Thus, the result shows almost the same
number of iterations for the three methods.
Figure 6: Comparisons of α(1) and α(2) computed by using the initial values ob-
tained from the initializations MCCA:QP, MCCA:FIX, and MCCA:RANDOM
with the AT&T (ORL) data set for a group size of three.
Figure 7: Comparison of the number of iterations when the initialization
was performed by MCCA:QP, MCCA:FIX, and MCCA:RANDOM with the
AT&T (ORL) data set for a group size of three.
Figures 8 and 9 show comparisons for the AT&T (ORL) data set with the
medium group size. Since the figures for the results of other group sizes are
similar to Figures 8 and 9, we show them in the supplementary materials
S2. Figure 8 shows results similar to those in Figure 6, whereas Figure 9 shows
competitive performances for all reduced dimensions.
Figure 8: Comparisons of α(1) and α(2) computed using the initial values ob-
tained from the initialization of MCCA:QP, MCCA:FIX, and MCCA:RANDOM
with the AT&T (ORL) data set and the medium group size.
Figure 9: Comparison of the number of iterations when the initialization
was performed by MCCA:QP, MCCA:FIX, and MCCA:RANDOM with the
AT&T (ORL) data set and the medium group size.
7 Conclusion
We have developed multilinear common component analysis (MCCA)
by introducing a covariance structure based on the Kronecker product. To
efficiently solve the nonconvex optimization problem for MCCA, we have
proposed an iterative updating algorithm that exhibits some superior the-
oretical convergence properties. Numerical experiments have shown the
usefulness of MCCA.
Specifically, MCCA was shown to be competitive among the initializa-
tion methods in terms of the number of iterations. As the number of groups
increases, the overall number of samples increases. This may be the reason
why all methods required almost the same number of iterations for small,
medium, and large groups. Note that in this study, we used the Kronecker
product representation to estimate the covariance matrix for tensor data
sets. Greenewald, Zhou, and Hero (2019) used the Kronecker sum repre-
sentation for estimating the covariance matrix, and it would be interesting
to extend the MCCA to this and other covariance representations.
Appendix A: Proof of Lemma 1
We provide two basic lemmas about Kronecker products before we prove
lemma 1.
Lemma 2. For matrices A, B, C, and D such that matrix products AC and BD
can be calculated, the following equation holds:

(A ⊗ B)(C ⊗ D) = AC ⊗ BD.

Lemma 3. For square matrices A and B, the following equation holds:

tr(A ⊗ B) = tr(A)tr(B).

These lemmas are known as the mixed-product property and the spec-
trum property, respectively. See Harville (1998) for detailed proofs.
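Both identities are easy to confirm numerically, for example with small random matrices in R:

```r
# Spot-check of lemmas 2 and 3.
set.seed(1)
A <- matrix(rnorm(6), 2, 3); C <- matrix(rnorm(12), 3, 4)   # AC is well defined
B <- matrix(rnorm(4), 2, 2); D <- matrix(rnorm(4), 2, 2)    # square, so traces exist
all.equal(kronecker(A, B) %*% kronecker(C, D), kronecker(A %*% C, B %*% D))  # lemma 2
all.equal(sum(diag(kronecker(B, D))), sum(diag(B)) * sum(diag(D)))           # lemma 3
```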
Proof of Lemma 1. For the maximization problem 3.5, move the summa-
tion over index g out of the tr(·) and replace S^*_{(g)} with S^{(1)}_{(g)} \otimes S^{(2)}_{(g)} \otimes
\cdots \otimes S^{(M)}_{(g)} and V^* with V^{(1)} \otimes V^{(2)} \otimes \cdots \otimes V^{(M)}, respectively. Then

\max_{V^{(k)}, \, k=1,2,\ldots,M} \sum_{g=1}^{G} \operatorname{tr}\Big[ \left( V^{(1)} \otimes \cdots \otimes V^{(M)} \right)^\top \left( S^{(1)}_{(g)} \otimes \cdots \otimes S^{(M)}_{(g)} \right) \left( V^{(1)} \otimes \cdots \otimes V^{(M)} \right)
\left( V^{(1)} \otimes \cdots \otimes V^{(M)} \right)^\top \left( S^{(1)}_{(g)} \otimes \cdots \otimes S^{(M)}_{(g)} \right) \left( V^{(1)} \otimes \cdots \otimes V^{(M)} \right) \Big],
\quad \text{s.t. } V^{(k)\top} V^{(k)} = I_{R_k}.
By lemmas 2 and 3, we have

\max_{V^{(k)\top} V^{(k)} = I_{R_k}, \, k=1,2,\ldots,M} \sum_{g=1}^{G} \operatorname{tr}\left( V^{(1)\top} S^{(1)}_{(g)} V^{(1)} V^{(1)\top} S^{(1)}_{(g)} V^{(1)} \right) \cdots \operatorname{tr}\left( V^{(M)\top} S^{(M)}_{(g)} V^{(M)} V^{(M)\top} S^{(M)}_{(g)} V^{(M)} \right)
= \max_{V^{(k)\top} V^{(k)} = I_{R_k}, \, k=1,2,\ldots,M} \sum_{g=1}^{G} \prod_{k=1}^{M} \operatorname{tr}\left( V^{(k)\top} S^{(k)}_{(g)} V^{(k)} V^{(k)\top} S^{(k)}_{(g)} V^{(k)} \right).

This leads to the maximization problem in lemma 1. □
Appendix B: Proof of Theorem 1
Theorem 1 can be easily shown from the following lemma.
Lemma 4. Consider the maximization problem

\max_{V^{(k)}} f'_k(V^{(k)}) = \max_{V^{(k)}} \operatorname{tr}\left\{ V^{(k)\top} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)} \right) V^{(k)} \right\}.   (B.1)

Let M^{(k)} = \operatorname{tr}\left[ \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)} \right]. Then

\frac{f'_k(V^{(k)})^2}{M^{(k)}} \le f_k(V^{(k)}) \le f'_k(V^{(k)}).
Proof of Lemma 4. First, we prove f_k(V^{(k)}) \le f'_k(V^{(k)}). For any orthog-
onal matrix V^{(k)} \in R^{P_k \times R_k}, we can always find an orthogonal matrix
V^{(k)}_\perp \in R^{P_k \times (P_k - R_k)} that satisfies V^{(k)\top} V^{(k)}_\perp = O. Then the equation
V^{(k)} V^{(k)\top} + V^{(k)}_\perp V^{(k)\top}_\perp = I_{P_k} holds. By definition,

f_k(V^{(k)}) = \operatorname{tr}\left\{ V^{(k)\top} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} V^{(k)} V^{(k)\top} S^{(k)}_{(g)} \right) V^{(k)} \right\}
\le \operatorname{tr}\left\{ V^{(k)\top} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} \left( V^{(k)} V^{(k)\top} + V^{(k)}_\perp V^{(k)\top}_\perp \right) S^{(k)}_{(g)} \right) V^{(k)} \right\}
= \operatorname{tr}\left\{ V^{(k)\top} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)} \right) V^{(k)} \right\}
= f'_k(V^{(k)}).
Thus, we have obtained f_k(V^{(k)}) \le f'_k(V^{(k)}).

Next, we prove \frac{f'_k(V^{(k)})^2}{M^{(k)}} \le f_k(V^{(k)}). We define the following block
matrices:

A = \left[ \sqrt{w^{(-k)}_{(1)}} \, S^{(k)\,1/2}_{(1)} V^{(k)} V^{(k)\top} S^{(k)\,1/2}_{(1)}, \ldots, \sqrt{w^{(-k)}_{(G)}} \, S^{(k)\,1/2}_{(G)} V^{(k)} V^{(k)\top} S^{(k)\,1/2}_{(G)} \right],
B = \left[ \sqrt{w^{(-k)}_{(1)}} \, S^{(k)}_{(1)}, \ldots, \sqrt{w^{(-k)}_{(G)}} \, S^{(k)}_{(G)} \right].

Note that since S^{(k)}_{(g)} is a symmetric positive-definite matrix, S^{(k)}_{(g)} can be
decomposed to S^{(k)}_{(g)} = S^{(k)\,1/2}_{(g)} S^{(k)\,1/2}_{(g)}. We calculate the traces of AA, AB, and BB,
respectively:

\operatorname{tr}(AA) = \sum_{g=1}^{G} w^{(-k)}_{(g)} \operatorname{tr}\left( S^{(k)\,1/2}_{(g)} V^{(k)} V^{(k)\top} S^{(k)\,1/2}_{(g)} S^{(k)\,1/2}_{(g)} V^{(k)} V^{(k)\top} S^{(k)\,1/2}_{(g)} \right)
= \sum_{g=1}^{G} w^{(-k)}_{(g)} \operatorname{tr}\left( V^{(k)\top} S^{(k)}_{(g)} V^{(k)} V^{(k)\top} S^{(k)}_{(g)} V^{(k)} \right)
= \operatorname{tr}\left\{ V^{(k)\top} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} V^{(k)} V^{(k)\top} S^{(k)}_{(g)} \right) V^{(k)} \right\} = f_k(V^{(k)}),

\operatorname{tr}(AB) = \sum_{g=1}^{G} w^{(-k)}_{(g)} \operatorname{tr}\left( S^{(k)\,1/2}_{(g)} V^{(k)} V^{(k)\top} S^{(k)\,1/2}_{(g)} S^{(k)}_{(g)} \right)
= \sum_{g=1}^{G} w^{(-k)}_{(g)} \operatorname{tr}\left( S^{(k)\,1/2}_{(g)} V^{(k)} V^{(k)\top} S^{(k)\,1/2}_{(g)} S^{(k)\,1/2}_{(g)} S^{(k)\,1/2}_{(g)} \right)
= \sum_{g=1}^{G} w^{(-k)}_{(g)} \operatorname{tr}\left( V^{(k)\top} S^{(k)}_{(g)} S^{(k)}_{(g)} V^{(k)} \right)
= \operatorname{tr}\left\{ V^{(k)\top} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)} \right) V^{(k)} \right\} = f'_k(V^{(k)}),

\operatorname{tr}(BB) = \operatorname{tr}\left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)} \right) = M^{(k)}.
From the Cauchy–Schwarz inequality, we have

f_k(V^{(k)}) M^{(k)} = \operatorname{tr}(AA)\operatorname{tr}(BB) \ge \left[ \operatorname{tr}(AB) \right]^2 = f'_k(V^{(k)})^2.

By dividing both sides of the inequality by M^{(k)}, we obtain \frac{f'_k(V^{(k)})^2}{M^{(k)}} \le f_k(V^{(k)}). □

Proof of Theorem 1. Let f'^{max}_k be the global maximum of f'_k(V^{(k)}) and
V^{(k)}_0 = \arg\max_{V^{(k)}} f'_k(V^{(k)}). From lemma 4 and the definition of \alpha^{(k)}, we have

\alpha^{(k)} f'^{max}_k = \frac{f'_k(V^{(k)}_0)^2}{M^{(k)}} \le f_k(V^{(k)}_0).

Let f^{max}_k be the global maximum of f_k(V^{(k)}). It then holds that f_k(V^{(k)}_0) \le
f^{max}_k. Thus,

\alpha^{(k)} f'^{max}_k \le f^{max}_k.

Let V^{(k)}_{0*} = \arg\max_{V^{(k)}} f_k(V^{(k)}). From lemma 4, we have

f^{max}_k = f_k(V^{(k)}_{0*}) \le f'_k(V^{(k)}_{0*}).

Since f'_k(V^{(k)}_{0*}) \le f'^{max}_k, we have

f^{max}_k \le f'^{max}_k.

Thus, we have obtained \alpha^{(k)} f'^{max}_k \le f^{max}_k \le f'^{max}_k. □
Appendix C: Proof of Theorem 2

Proof of Theorem 2. By definition,

\alpha^{(k)} = \frac{f'^{max}_k}{M^{(k)}} = \frac{\operatorname{tr}\left[ V^{(k)\top}_0 \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)} \right) V^{(k)}_0 \right]}{\operatorname{tr}\left[ \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} S^{(k)}_{(g)} \right]}.   (C.1)

By using the eigenvalue representation, we can rewrite the numerator of
\alpha^{(k)} as

f'^{max}_k = \sum_{g=1}^{G} w^{(-k)}_{(g)} \sum_{i=1}^{R_k} \lambda^{(k)}_{(g)i}.

The denominator of \alpha^{(k)} can be represented as the sum of eigenvalues as
follows:

M^{(k)} = \sum_{g=1}^{G} w^{(-k)}_{(g)} \sum_{i=1}^{P_k} \lambda^{(k)}_{(g)i}.

Thus, we can transform \alpha^{(k)} as follows:

\alpha^{(k)} = \frac{\sum_{g=1}^{G} w^{(-k)}_{(g)} \sum_{i=1}^{R_k} \lambda^{(k)}_{(g)i}}{\sum_{g=1}^{G} w^{(-k)}_{(g)} \sum_{i=1}^{P_k} \lambda^{(k)}_{(g)i}}.

When we set

\lambda^{(k)}_0 = \left[ \sum_{i=R_k+1}^{P_k} \lambda^{(k)}_{(1)i}, \sum_{i=R_k+1}^{P_k} \lambda^{(k)}_{(2)i}, \ldots, \sum_{i=R_k+1}^{P_k} \lambda^{(k)}_{(G)i} \right]^\top,
\lambda^{(k)}_1 = \left[ \sum_{i=1}^{P_k} \lambda^{(k)}_{(1)i}, \sum_{i=1}^{P_k} \lambda^{(k)}_{(2)i}, \ldots, \sum_{i=1}^{P_k} \lambda^{(k)}_{(G)i} \right]^\top,
w^{(k)} = \left( w^{(-k)}_{(1)}, w^{(-k)}_{(2)}, \ldots, w^{(-k)}_{(G)} \right)^\top,

we can reformulate \alpha^{(k)} as

\alpha^{(k)} = \frac{\left( \lambda^{(k)}_1 - \lambda^{(k)}_0 \right)^\top w^{(k)}}{\lambda^{(k)\top}_1 w^{(k)}}.
Thus, we obtain the following maximization problem:

\max_{w^{(k)}} \; \frac{\left( \lambda^{(k)}_1 - \lambda^{(k)}_0 \right)^\top w^{(k)}}{\lambda^{(k)\top}_1 w^{(k)}}, \quad \text{s.t. } w^{(k)} > 0.

Note that the constraints can be obtained by the definition of w^{(k)}. In addi-
tion, this maximization problem can be reformulated as

\max_{w^{(k)}} \frac{\left( \lambda^{(k)}_1 - \lambda^{(k)}_0 \right)^\top w^{(k)}}{\lambda^{(k)\top}_1 w^{(k)}}
= \max_{w^{(k)}} \left\{ 1 - \frac{\lambda^{(k)\top}_0 w^{(k)}}{\lambda^{(k)\top}_1 w^{(k)}} \right\}
= \min_{w^{(k)}} \frac{\lambda^{(k)\top}_0 w^{(k)}}{\lambda^{(k)\top}_1 w^{(k)}}.

Since \lambda^{(k)\top}_0 w^{(k)} / \lambda^{(k)\top}_1 w^{(k)} is nonnegative, solving the optimization problem
for the squared function of the objective function maintains generality.
Thus, we can consider the following minimization problem:

\min_{w^{(k)}} \; \frac{w^{(k)\top} \lambda^{(k)}_0 \lambda^{(k)\top}_0 w^{(k)}}{w^{(k)\top} \lambda^{(k)}_1 \lambda^{(k)\top}_1 w^{(k)}}, \quad \text{s.t. } w^{(k)} > 0.

In addition, from the invariance under multiplication of w^{(k)} by a constant,
we obtain the following objective function of the quadratic programming
problem:

\min_{w^{(k)}} \; w^{(k)\top} \lambda^{(k)}_0 \lambda^{(k)\top}_0 w^{(k)}, \quad \text{s.t. } w^{(k)} > 0, \; w^{(k)\top} \lambda^{(k)}_1 \lambda^{(k)\top}_1 w^{(k)} = 1. \quad □
Appendix D: Proof of Theorem 3
Proof of Theorem 3. We define the following block matrices:

A_s = \left[ \sqrt{w^{(-k)}_{(1)}} \, S^{(k)\,1/2}_{(1)} V^{(k)}_s V^{(k)\top}_s S^{(k)\,1/2}_{(1)}, \ldots, \sqrt{w^{(-k)}_{(G)}} \, S^{(k)\,1/2}_{(G)} V^{(k)}_s V^{(k)\top}_s S^{(k)\,1/2}_{(G)} \right].

Here, we calculate the traces of A_sA_s, A_sA_{s+1}, and A_{s+1}A_{s+1}. The calcula-
tions of \operatorname{tr}(A_sA_s) and \operatorname{tr}(A_{s+1}A_{s+1}) are the same as that of \operatorname{tr}(AA) by replac-
ing V^{(k)} with V^{(k)}_s and V^{(k)} with V^{(k)}_{s+1}, respectively, in lemma 4. Thus, we
obtain

\operatorname{tr}(A_sA_s) = f_k(V^{(k)}_s),
\operatorname{tr}(A_sA_{s+1}) = \sum_{g=1}^{G} w^{(-k)}_{(g)} \operatorname{tr}\left( S^{(k)\,1/2}_{(g)} V^{(k)}_s V^{(k)\top}_s S^{(k)\,1/2}_{(g)} S^{(k)\,1/2}_{(g)} V^{(k)}_{s+1} V^{(k)\top}_{s+1} S^{(k)\,1/2}_{(g)} \right)
= \sum_{g=1}^{G} w^{(-k)}_{(g)} \operatorname{tr}\left( V^{(k)\top}_{s+1} S^{(k)}_{(g)} V^{(k)}_s V^{(k)\top}_s S^{(k)}_{(g)} V^{(k)}_{s+1} \right)
= \operatorname{tr}\left\{ V^{(k)\top}_{s+1} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} V^{(k)}_s V^{(k)\top}_s S^{(k)}_{(g)} \right) V^{(k)}_{s+1} \right\},

\operatorname{tr}(A_{s+1}A_{s+1}) = f_k(V^{(k)}_{s+1}).

Since V^{(k)}_{s+1} = \arg\max_{V^{(k)}} \operatorname{tr}\left\{ V^{(k)\top} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} V^{(k)}_s V^{(k)\top}_s S^{(k)}_{(g)} \right) V^{(k)} \right\}, we have

f_k(V^{(k)}_s) = \operatorname{tr}\left\{ V^{(k)\top}_s \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} V^{(k)}_s V^{(k)\top}_s S^{(k)}_{(g)} \right) V^{(k)}_s \right\}
\le \operatorname{tr}\left\{ V^{(k)\top}_{s+1} \left( \sum_{g=1}^{G} w^{(-k)}_{(g)} S^{(k)}_{(g)} V^{(k)}_s V^{(k)\top}_s S^{(k)}_{(g)} \right) V^{(k)}_{s+1} \right\}
= \operatorname{tr}(A_sA_{s+1}).

From the positivity of both sides of the inequality, it holds that

f_k(V^{(k)}_s)^2 \le \left[ \operatorname{tr}(A_sA_{s+1}) \right]^2.

In addition, from the Cauchy–Schwarz inequality, we have

f_k(V^{(k)}_s) f_k(V^{(k)}_{s+1}) = \operatorname{tr}(A_sA_s)\operatorname{tr}(A_{s+1}A_{s+1}) \ge \left[ \operatorname{tr}(A_sA_{s+1}) \right]^2.

Thus,

f_k(V^{(k)}_s) f_k(V^{(k)}_{s+1}) \ge \left[ \operatorname{tr}(A_sA_{s+1}) \right]^2 \ge f_k(V^{(k)}_s)^2.

Then, we have obtained f_k(V^{(k)}_s)^2 \le f_k(V^{(k)}_s) f_k(V^{(k)}_{s+1}). By dividing both
sides of the inequality by f_k(V^{(k)}_s), we obtain the inequality f_k(V^{(k)}_s) \le
f_k(V^{(k)}_{s+1}). □
Acknowledgments
We thank the reviewer for helpful comments and constructive suggestions.
S.K. was supported by JSPS KAKENHI grants JP19K11854 and JP20H02227,
and MEXT KAKENHI grants JP16H06429, JP16K21723, and JP16H06430.
References
Allen, G. (2012). Sparse higher-order principal components analysis. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (pp. 27–36).
Badeau, R., & Boyer, R. (2008). Fast multilinear singular value decomposition for structured tensors. SIAM Journal on Matrix Analysis and Applications, 30(3), 1008–1021.
Bensmail, H., & Celeux, G. (1996). Regularized gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91(436), 1743–1748.
Boik, R. J. (2002). Spectral models for covariance matrices. Biometrika, 89(1), 159–182.
Carroll, J. D., & Chang, J.-J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 35(3), 283–319.
Flury, B. N. (1984). Common principal components in K groups. Journal of the American Statistical Association, 79(388), 892–898.
Flury, B. N. (1986). Asymptotic theory for common principal component analysis. Annals of Statistics, 14(2), 418–430.
Flury, B. N. (1988). Common principal components and related multivariate models. New York: Wiley.
Flury, B. N., & Gautschi, W. (1986). An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form. SIAM Journal on Scientific and Statistical Computing, 7(1), 169–184.
Greenewald, K., Zhou, S., & Hero III, A. (2019). Tensor graphical Lasso (TeraLasso). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(5), 901–931.
Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. UCLA Working Papers in Phonetics, 16(1), 84.
Harville, D. A. (1998). Matrix algebra from a statistician's perspective. New York: Springer-Verlag.
Jolliffe, I. (2002). Principal component analysis. New York: Springer-Verlag.
Kermoal, J. P., Schumacher, L., Pedersen, K. I., Mogensen, P. E., & Frederiksen, F. (2002). A stochastic MIMO radio channel model with experimental validation. IEEE Journal on Selected Areas in Communications, 20(6), 1211–1226.
Kiers, H. A. (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14(3), 105–122.
Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3), 455–500.
Lai, Z., Xu, Y., Chen, Q., Yang, J., & Zhang, D. (2014). Multilinear sparse principal component analysis. IEEE Transactions on Neural Networks and Learning Systems, 25(10), 1942–1950.
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. In Proceedings of the IEEE, 86(11), 2278–2324.
Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2008). MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks, 19(1), 18–39.
Manly, B. F. J., & Rayner, J. C. W. (1987). The comparison of sample covariance matrices using likelihood ratio tests. Biometrika, 74(4), 841–847.
Martinez, A., & Benavente, R. (1998). The AR face database (CVC Technical Report 24).
Martinez, A. M., & Kak, A. C. (2001). PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 228–233.
Park, H., & Konishi, S. (2020). Sparse common component analysis for multiple high-dimensional datasets via noncentered principal component analysis. Statistical Papers, 61, 2283–2311.
Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559–572.
Pourahmadi, M., Daniels, M. J., & Park, T. (2007). Simultaneous modelling of the Cholesky decomposition of several covariance matrices. Journal of Multivariate Analysis, 98(3), 568–587.
R Core Team (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Wang, H., Banerjee, A., & Boley, D. (2011). Common component analysis for multiple covariance matrices. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 956–964). New York: ACM.
Wang, S., Sun, M., Chen, Y., Pang, E., & Zhou, C. (2012). STPCA: Sparse tensor principal component analysis for feature extraction. In Proceedings of the 21st International Conference on Pattern Recognition (pp. 2278–2281). Piscataway, NJ: IEEE.
Werner, K., Jansson, M., & Stoica, P. (2008). On estimation of covariance matrices with Kronecker product structure. IEEE Transactions on Signal Processing, 56(2), 478–491.
Yu, K., Bengtsson, M., Ottersten, B., McNamara, D., Karlsson, P., & Beach, M. (2004). Modeling of wide-band MIMO radio channels based on NLOS indoor measurements. IEEE Transactions on Vehicular Technology, 53(3), 655–665.

Received November 20, 2020; accepted April 27, 2021.