Treed Gaussian Process Regression for Solving
Offline Data-Driven Continuous Multiobjective
Optimization Problems
Atanu Mazumdar
University of Jyvaskyla, Faculty of Information Technology, Finland
atanu.a.mazumdar@jyu.fi
Manuel López-Ibáñez
Alliance Manchester Business School, University of Manchester, UK
manuel.lopez-ibanez@manchester.ac.uk
Tinkle Chugh
Department of Computer Science, University of Exeter, UK
t.chugh@exeter.ac.uk
Jussi Hakanen
University of Jyvaskyla, Faculty of Information Technology, Finland
jussi.hakanen@jyu.fi
Kaisa Miettinen
University of Jyvaskyla, Faculty of Information Technology, Finland
kaisa.miettinen@jyu.fi
https://doi.org/10.1162/evco_a_00329
Abstract
For offline data-driven multiobjective optimization problems (MOPs), no new data is
available during the optimization process. Approximation models (or surrogates) are
first built using the provided offline data, and an optimizer, for example, a multiob-
jective evolutionary algorithm, can then be utilized to find Pareto optimal solutions to
the problem with surrogates as objective functions. In contrast to online data-driven
MOPs, these surrogates cannot be updated with new data and, hence, the approxi-
mation accuracy cannot be improved by considering new data during the optimiza-
tion process. Gaussian process regression (GPR) models are widely used as surrogates
because of their ability to provide uncertainty information. However, building GPRs
becomes computationally expensive when the size of the dataset is large. Using sparse
GPRs reduces the computational cost of building the surrogates. However, sparse GPRs
are not tailored to solve offline data-driven MOPs, where good accuracy of the surro-
gates is needed near Pareto optimal solutions. Treed GPR (TGPR-MO) surrogates for
offline data-driven MOPs with continuous decision variables are proposed in this pa-
per. The proposed surrogates first split the decision space into subregions using re-
gression trees and build GPRs sequentially in regions close to Pareto optimal solutions
in the decision space to accurately approximate tradeoffs between the objective func-
tions. TGPR-MO surrogates are computationally inexpensive because GPRs are built
only in a smaller region of the decision space utilizing a subset of the data. The TGPR-
MO surrogates were tested on distance-based visualizable problems with various data
sizes, sampling strategies, numbers of objective functions, and decision variables. Ex-
perimental results showed that the TGPR-MO surrogates are computationally cheaper
and can handle datasets of large size. Furthermore, TGPR-MO surrogates produced
solutions closer to Pareto optimal solutions compared to full GPRs and sparse GPRs.
Keywords
Gaussian processes, Kriging, regression trees, metamodelling, surrogate, Pareto
optimality.
Manuscript received: 13 April 2022; revised: 6 November 2022, 23 February 2023, 23 March 2023; accepted: 31
March 2023.
© 2023 Massachusetts Institute of Technology.
Published under a Creative Commons
Attribution 4.0 International (CC BY 4.0) license.
Evolutionary Computation xx(x): 1–25
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
–
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
1
Introduction
A multiobjective optimization problem (MOP) consists of two or more conflicting ob-
jective functions that are optimized simultaneously. The solutions of an MOP are called
Pareto optimal when the value of any objective function cannot be further improved
without degrading some of the others; thus, tradeoffs exist among the objective func-
tions. Not all real-world MOPs have analytical functions or simulation models avail-
able. Instead, the MOP may need to be formulated by utilizing the data acquired from
a phenomenon, for example, real-life processes, physical experiments, and sensors. To
solve this MOP, first the underlying objective functions are approximated by fitting surro-
gates (also known as metamodels) on the data available. Next, a multiobjective evo-
lutionary algorithm (MOEA) can be used to approximate the set of Pareto optimal
solutions by considering the surrogates as objective functions.
Data-driven optimization problems are classified into online and offline ones (Jin
et al., 2019). For online data-driven optimization, new data can be acquired during the
optimization process, for example, by conducting further experiments or simulations
to update the surrogates (Shahriari et al., 2016). In contrast, for offline data-driven opti-
mization problems, no new data can be acquired, and the optimization has to proceed by
using only the existing data (Wang et al., 2019). Ideally, the underlying objective values
of the solutions should be better than the objective values in the provided dataset. All
the available data should be utilized to build surrogates with good approximation while
solving offline data-driven MOPs. However, when the size of the data set is large, for ex-
ample, in Wang et al. (2016) and Wang and Jin (2020), building certain surrogates, such
as Gaussian processes, can become computationally expensive. Because the surrogates
cannot be updated with new data while solving offline data-driven MOPs, the approx-
imation accuracy of the surrogates directly influences the closeness of the solutions to
the Pareto optimal set. Most of the previous works on offline data-driven optimization,
such as Wang et al. (2019, 2016), Wang and Jin (2020), and Yang et al. (2020), considered
different surrogate modelling techniques for solving offline data-driven optimization
problems. However, these works did not consider the tradeoffs between the underlying
objective functions while building the surrogates. Other developed approaches, for ex-
ample, Mazumdar et al. (2019, 2022), Rahat et al. (2018), and Hughes (2001), relied on the
uncertainty in the prediction of Gaussian processes to improve the quality of solutions.
The previous works on offline data-driven optimization dealing with large datasets, for
example, Wang et al. (2016) and Wang and Jin (2020) did not quantify the uncertainty,
which is a critical component in solving offline data-driven MOPs, as shown in Mazum-
dar et al. (2019, 2020, 2022). This paper proposes an approach to build computationally
cheap surrogates with a high approximation accuracy at the region around the Pareto
optimal set. The surrogates use a combination of regression trees and Gaussian process
regression models and provide uncertainty in the prediction. The proposed surrogates
are tailored toward solving offline data-driven MOPs and are capable of handling large
datasets (tested for a maximum size of 50,000).
To study offline data-driven MOPs, data is sampled from benchmark problems for
which the Pareto optimal set is known. Knowledge of the Pareto optimal set enables
quality comparisons of the solutions obtained by different optimization approaches
and surrogates. While solving an offline data-driven MOP, generally two types of in-
dicators are used to quantify the quality of the solutions (Mazumdar et al., 2019, 2020).
The approximation accuracy is measured in root mean squared error (RMSE) between
the approximated objective values of the surrogates and the underlying objective val-
ues. In addition, the hypervolume (Zitzler and Thiele, 1998) indicator is used to measure
2
Evolutionary Computation Volume xx, Number x
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
–
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
Treed GPR for Offline Data-Driven MOPs
how close the approximated Pareto front (after evaluating with the underlying objective
functions) is to the Pareto front of the underlying MOP.
Kriging, or Gaussian process regression (GPR), (Forrester et al., 2008) is a popu-
lar choice of surrogate (Rasmussen and Williams, 2006) for solving offline problems
because it also provides information about the uncertainty in the approximation. Previ-
ous works (Hughes, 2001; Rahat et al., 2018; Mazumdar et al., 2019, 2022) have shown
that utilizing the uncertainty prediction from GPR surrogates produces solutions with
improved accuracy and hypervolume. The uncertainty prediction from GPR makes it
an attractive choice of surrogates for solving offline data-driven MOPs. Most of the ap-
proaches for solving offline data-driven MOPs build a global GPR surrogate for each
objective using all the provided data. However, full GPRs have high computational and
memory complexity when the size of the dataset (or sample size) is large. An alterna-
tive to full GPRs is to use sparse GPRs (Snelson and Ghahramani, 2005; Titsias, 2009),
which build approximation models with a small subset of data called support or induc-
ing points. In Titsias (2009), a variational method was proposed to select the inducing
points by gradient descent or greedy search algorithm. However, this method becomes
computationally expensive when the dataset is large. On the other hand, using fewer
points for building the surrogates reduces the approximation accuracy of the surrogates
compared to full GPRs (Titsias, 2009). Other works, such as Emmerich et al. (2006), pro-
posed building GPR using only K-nearest neighbor samples from the point that is to be
predicted.
While performing multiobjective optimization using surrogates, it is desirable to
obtain a good approximation accuracy at the region around the Pareto set, which is
referred to as the trade-off region in this paper. The accuracy of the surrogates in other re-
gions of the decision space (excluding the trade-off region) is not of utmost importance.
A possible approach to reduce the computational cost is to build local GPR surrogates
for the underlying objective functions exclusively in the trade-off region. Using local
GPRs reduces the overall computational cost of building GPRs while providing a good
approximation accuracy at the trade-off region. Previous works (Kim et al., 2005; Das
and Srivastava, 2010; van Stein et al., 2015) partitioned the decision space into regions
and fitted GPRs in each region. Partitioning the decision space is relatively inexpensive,
and each GPR utilizes smaller sets of data. This concept was extended by Gramacy and
Lee (2008) to fit GPRs in each region or leaf nodes partitioned by a Bayesian treed model
previously proposed by Chipman et al. (1998). In Assael et al. (2014), treed GPRs were
proposed to deal with cases where the noise is different for different samples. All the
previous works build GPRs in all the regions of the decision space. These types of GPRs
become computationally expensive, depending on the number of regions and amount
of data in each region. Moreover, these surrogates do not consider the trade-off region
of the offline data-driven MOP during the building process.
This paper proposes treed GPR surrogates for multiobjective optimization (TGPR-
MO) which have a high accuracy around the trade-off region and are tailored for solving
offline data-driven MOPs with continuous decision variables. First, computationally in-
expensive regression tree surrogates are built using the provided dataset. The regression
tree surrogates provide a less accurate approximation of the underlying objective func-
tions (compared to GPRs) (Loh, 2011) and split the decision space into regions. A region
is defined as the part of the decision space that is enclosed by the leaf node of the fitted
regression trees. The approximation accuracy of the prediction by the leaf node is the
accuracy of the region. Next, an MOEA is executed considering the regression trees as
objective functions. The solutions of the MOEA are not very accurate but provide the
Evolutionary Computation Volume xx, Number x
3
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
–
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
approximate location of the trade-off region. After a certain number of generations of
the MOEA, the accuracy of the trees’ prediction is improved by building GPRs at leaf
nodes within the trade-off region. The GPRs built improve the accuracy of the solutions
in the region of the decision space corresponding to the leaf node they replace. The final
surrogates consist of regression trees with GPRs at a few leaf nodes providing accurate
approximations exclusively in the trade-off region.
The TGPR-MO surrogates were tested on several instances of distance-based vi-
sualizable test problems (DBMOPP) with different numbers of decision variables and
objective functions. Various sizes of the initial dataset with different sampling strate-
gies were used in the tests. Numerical experiments showed that TGPR-MO surrogates
significantly reduced the building time for surrogates when using large size data while
solving offline data-driven MOPs. TGPR-MO surrogates also produced solutions with
a better hypervolume compared to sparse GPR surrogates.
The rest of the paper is arranged as follows. The background of offline data-driven
MOPs, GPRs, and regression trees is given in Section 2. The proposed TGPR-MO surro-
gates are detailed in Section 3. The experimental results consisting of a comparison of
TGPR-MO surrogates with other surrogates are presented in Section 4. Section 5 con-
cludes and discusses the future research perspectives.
2 Background
For an offline data-driven MOP, the starting point for solving the problem is (precol-
lected) data. In this paper, we assume that the available data is the output of a process
or phenomenon. The process of generating the data is referred to as the underlying ob-
jective functions of an MOP. The underlying MOP that has to be solved is considered to
be of the following form:
minimize
subject to
(f1(x), . . . , fK (x))
x ∈ (cid:2),
(1)
where K ≥ 2 is the number of objective functions and (cid:2) is the feasible region in the
decision space (cid:4)n. For a feasible decision vector x (consisting of n decision variables),
the corresponding objective vector is f(x) = (f1(x), . . . , fK (x)), where f1(x), . . . , fK (x)
are the objective (function) values.
A solution x ∈ (cid:2) dominates another solution x(cid:5) ∈ (cid:2) if fi (x) ≤ fi (x(cid:5)
) for all i =
1, . . . , K and fi (x) < fi (x(cid:5)
) for at least one i = 1, . . . , K. If a solution of an MOP is not
dominated by any other feasible solutions, it is called nondominated. Solving an MOP
using a multiobjective optimization algorithm, for example, an MOEA typically pro-
duces a set of mutually nondominated solutions. The solutions of the optimization
problem defined in Equation (1) that are nondominated in the whole set (cid:2) are called
Pareto optimal solutions.
A generic way to solve an offline data-driven MOP with an MOEA as the optimizer
is shown in Figure 1. As explained in Jin et al. (2019) and Wang et al. (2019), the solution
process can be divided into three components: (a) data collection, (b) formulating the
MOP and surrogate building, and (c) optimization with surrogates.
The optimization process starts with an offline dataset consisting of N samples. A
sample consists of a decision vector x and its corresponding objective vector f(x) as a
tuple of two matrices: (X, Y ) where X ∈ (cid:4)N×n and Y ∈ (cid:4)N×K . Each row in X and Y is
a decision vector and its corresponding objective vector, respectively. Next, surrogates
are built using all or a subset of the data. The surrogates are considered as objective
4
Evolutionary Computation Volume xx, Number x
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
.
/
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
Treed GPR for Offline Data-Driven MOPs
Figure 1: Flowchart of a generic offline data-driven multiobjective optimization
approach.
functions by an MOEA to solve the offline data-driven MOP. For simplicity of describing
a surrogate model for a single objective i, let yi ∈ (cid:4)N×1 be the vector of objective values
fi (x) for all decision vectors x ∈ X.
2.1 Gaussian Process Regression
In this work, a surrogate model is built for each objective function. For simplicity in
terminologies, y is considered here instead of yi to introduce the Gaussian process re-
gression (GPR) and regression trees. GPRs have been widely used in surrogate-assisted
optimization and time-series analysis (Jin et al., 2019; Chugh et al., 2019). The major
advantage of using a GPR is its ability to provide the distribution about its prediction
(or the uncertainty). A GPR is a multivariate normal distribution with a mean μ and a
covariance matrix C:
y ∼ N (μ, C).
For simplicity in calculations, a mean of zero is considered without loss of generality.
The covariance matrix C uses a covariance (or kernel) function to define correlation be-
tween two samples, x and x(cid:5)
. In this work, we use the Matérn 5/2 kernel because it
does not make the covariance functions unrealistically smooth compared to other ker-
nels such as the Gaussian (or RBF) kernel. Therefore, it is more appropriate for solving
practical optimisation problems (Snoek et al., 2012). The kernel is defined as:
(2)
⎛
⎞
⎛
⎞
κ (x, x
(cid:5), (cid:2)) = σ 2
f
⎝1 +
√
n(cid:4)
5
j =1
rj + 5
3
n(cid:4)
j =1
⎠ exp
r 2
j
√
⎝−
5
n(cid:4)
j =1
rj
⎠ + σ 2
t δxx(cid:5)
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
.
/
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
j
|
lj
and xj and x(cid:5)
where rj = |xj −x(cid:5)
j are the j th components of the decision vector of x
and x(cid:5), (cid:2) = (σf, l1, . . . , ln, σt) is the set of parameters in the GPR model and δxx(cid:5) is the
Kronecker delta function. The notation |xj − x(cid:5)
| represents the Euclidean distance be-
j
tween xj and x(cid:5)
j . The parameters σf, lj and σt represent the amplitude, length scale of
the j th variable, and noise in the data, respectively. For more details on the significance
of these parameters, see Rasmussen and Williams (2006).
To build a GPR model, the parameters mentioned above can be estimated by max-
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
imizing the marginal likelihood function:
(cid:7)
(cid:8)
(cid:9)C−1y
y
p(y | X, (cid:2)) =
1√
|2π C| exp
After estimating the parameters, the model defined in Equation (2) can be used for the
posterior predictive distribution at a new decision vector x∗
. The GPR model provides
a posterior predictive distribution which is also Gaussian:
∗, (cid:2))−κ (x
p (y∗|x
∗, X, (cid:2))C−1y, κ (x
∗, X, y, (cid:2)) = N
(cid:9)C−1κ (X, x
(cid:10)
. (4)
∗, X, (cid:2))
(cid:9)
κ (x
− 1
2
∗, (cid:2))
∗, x
(3)
.
Evolutionary Computation Volume xx, Number x
5
329
A. Mazumdar et al.
The posterior mean in the equation above is κ (x∗, X, (cid:2))C−1y, and the variance repre-
senting the uncertainty is κ (x∗, x∗, (cid:2)) − κ (x∗, X, (cid:2))
(cid:9)C−1κ (X, x∗, (cid:2)).
2.2 Regression Trees
A regression tree recursively partitions the decision space such that the provided sam-
ples with similar objective function values are grouped together (Loh, 2011). Each
node φ of the tree contains data Qφ = (Xφ, yφ ) with N φ samples, where each row of
Xφ ∈ (cid:4)N φ ×n is a decision vector xi and each row of yφ ∈ (cid:4)N φ ×1 is its corresponding objec-
tive value yi, i = 1, . . . , N φ. Node φ is split into nodes φleft and φright according to param-
eter θ = (j, t ), which specifies a j th decision variable (1 ≤ j ≤ n) and a threshold t, by
partitioning Qφ into two disjoint subsets Qφleft
) | (xi, yi ) ∈ Qφ ∧ xij ≤ t}
= {(Xφleft , yφleft
and Qφright
. That is, split parameter θ partitions the samples (rows) in Qφ
according to the value of variable xj of each decision vector x ∈ Xφ.
= Qφ \ Qφleft
θ
θ
θ
Given a node φ and a split parameter θ , the quality of the split or the total loss is
calculated as:
G(Qφ, θ ) = N φleft
N φ
H (Qφleft
θ
) + N φright
N φ
H (Qφright
θ
),
(5)
are the number of samples in Qφleft
where N φleft
H (Qφ ) is a loss function. For instance, the loss function for mean-squared error is:
and Qφright
and N φright
, respectively, and
θ
θ
H (Qφ ) = 1
N φ
(cid:4)
yi ∈Qφ
(yi − ¯yφ )2,
(6)
where ¯yφ is the mean objective value in Qφ. An optimal split θ ∗
of a node φ can be
found by minimizing the loss function defined in Equation (5) using a single objective
optimization algorithm. In the proposed TGPR-MO surrogates, the splitting process is
recursed for both Qφleft
θ ∗ until splitting a node would produce a leaf node with
less than a predefined minimum number of samples, Nmin.
θ ∗ and Qφright
For predicting any given decision variable value, the regression tree is traversed to
the respective leaf node. The prediction of the tree at the leaf node l, which contains
training subset Ql with N l number of samples, is ¯yl = 1
N l
yi ∈Ql yi.
(cid:11)
3 Treed GPRs for Multiobjective Optimization (TGPR-MO)
The time complexity of a full GPR is cubic, which is polynomial. A suitable alternative
to reduce the computational cost is to split the dataset and build GPRs exclusively in the
trade-off region. The proposed TGPR-MO surrogates are based on this modelling ap-
proach. The following subsections describe the building process of the proposed TGPR-
MO surrogates. A detailed accuracy and complexity analysis and comparison with full
GPRs and sparse GPRs are also provided.
3.1
Building the TGPR-MO Surrogates
In a generalized treed GPR surrogate, first a regression tree model is built using the pro-
vided data as described in Section 2.2. The splitting at the nodes is done by minimizing
the total loss function (Equation 5). The subset of data at the lth leaf node is Ql, where
l = 1, . . . , L and L is the total number of leaf nodes in the regression tree built. A GPR
is fitted at every leaf node of the regression tree using the data Ql = (Xl, yl ). The GPRs
are built in a similar fashion as described in Section 2.1 by maximizing the marginal
6
Evolutionary Computation Volume xx, Number x
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
Treed GPR for Offline Data-Driven MOPs
(cid:9)
y∗|x∗, Xl, yl, (cid:2)
likelihood at the leaf node l, as defined in Equation (3). The posterior predictive distri-
(cid:10)
bution of the lth GPR is p
, as defined in Equation (4). Building GPRs
with smaller subsets of data reduces the overall cost of building surrogates compared
to building a GPR with the entire dataset. However, building GPRs at all the leaf nodes
becomes expensive when there are too many of them, and the data subset at each leaf
node is large. Moreover, for solving an offline data-driven MOP, an accurate approxi-
mation of the global landscape of the underlying objective functions is not required. A
good approximation of the local landscape near the trade-off region is sufficient.
While performing multiobjective optimization using surrogates, the approxima-
tion accuracy in the trade-off region is crucial and directly influences the quality of the
approximated Pareto optimal solutions. Hence, to obtain better quality solutions, it is
desirable to have accurate approximations in the trade-off region. While building the
regression trees, the splits at each node divide the decision space. The leaf nodes of
the regression trees are the smallest regions the decision space is split into. In addition,
the initial approximation of regression tree surrogates (though not highly accurate) pro-
vides information about the trade-offs between the objective functions. After building
the regression trees with all the provided data, an MOEA is run considering them as
objective functions. The solutions found by the MOEA are not accurate, but they pro-
vide an approximation of the Pareto front. The accuracy of the trade-off region is the
approximation accuracy of the region enclosed by the leaf nodes that predicts the ap-
proximated Pareto front. Later, local GPRs are built exclusively in the leaf nodes repre-
senting the trade-off region and achieve an accurate approximation only in the neigh-
borhood of the Pareto set. Next, the algorithm to build the treed GPR surrogates for
multiobjective optimization (TGPR-MO) tailored to solve offline data-driven MOPs is
introduced.
Algorithm 1 starts with a dataset containing N samples with K objective functions
and n decision variables. First, K regression trees (one per objective function) are built
with all the provided data. Just the parameter Nmin is adjusted, and the depth of the
trees is not controlled. After building the regression trees, there can be a maximum of
N
leaf nodes. Initially, there are no GPRs present at the leaf nodes, and the predictions
Nmin
are from the regression trees (i.e., ¯yl). Next, a population is initialized and an MOEA is
executed that iteratively builds GPRs at specific leaf nodes. Each iteration consists of
running the MOEA for Gmax generations (that is a predefined parameter) and building
a GPR at one leaf node in every tree. In every generation, offspring individuals are
produced using crossover and mutation operators and evaluated using the treed GPRs.
Selection of individuals is then performed using MOEA-specific selection criterion, and
the process is repeated for Gmax generations within an iteration. The inner workings of
the proposed algorithm are exemplified in two-dimensional decision spaces in Section
1 and Figure 1 of the supplementary material.
After the MOEA completes Gmax generations, the approximation errors of the solu-
tions (for each objective) are compared. The comparison is done by first calculating the
loss function value, here the mean squared error, as defined in Equation (6), of the leaf
node predicting the objective values of the solutions found by the MOEA. For the treed
GPR surrogate corresponding to the j th objective, let lj
s denote the leaf nodes
containing the s solutions found by the MOEA and let H (Qlj
s ) denote the
loss function value of those leaf nodes, then one GRP is built at the leaf node i∗
with the
i ), using its subset of samples Qlj
maximum loss value, that is, i∗ = arg maxi=1,...,sH (Qlj
i∗ .
If multiple solutions fall within the same leaf node, the loss value is calculated once and
only one GPR is built for that leaf node. The process of building GPRs is repeated for
1 ), . . . , H (Qlj
1 , . . . , lj
Evolutionary Computation Volume xx, Number x
7
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
8
Evolutionary Computation Volume xx, Number x
329
Treed GPR for Offline Data-Driven MOPs
j = 1, . . . , K treed GPRs. The process of adding GPRs to the leaf nodes is repeated until
any of the following criteria are met:
• Number of iterations has reached the predefined maximum of Imax.
• All the solutions in the MOEA’s population fall in the leaf nodes that have
GPRs built.
The regression trees perform two tasks: (1) they provide an approximation of the
underlying objective functions, and (2) they split the decision space into smaller regions.
Using regression trees over other clustering algorithms is advantageous since they are
optimized for reducing the prediction error. Other clustering algorithms, for example,
K-means clustering and mean shift clustering, utilize a loss function in the decision
space and not in the objective space. Therefore, regression trees provide a good predic-
tion of the approximated objective functions in addition to splitting the decision space
into subregions. In the initial iterations, the MOEA first finds solutions using the ap-
proximation provided by the regression tree surrogates (that may have poor accuracy).
These solutions have a higher approximation error, but they provide information about
the trade-off between the objective functions. To improve the approximation accuracy,
GPRs are built for solutions provided by the trees that have the maximum approxima-
tion error. This ensures that GPRs are built exclusively in the region of the Pareto set and
where the tree’s approximation is the worst, simultaneously. In later iterations, if the de-
cision vector in the MOEA’s population falls within the path of the leaf node where a
GPR is already built, the posterior predictive mean of the GPR is used as the final pre-
diction of the surrogate. Otherwise, the prediction is the mean value at the leaf node of
the regression trees. Local GPRs are built sequentially because building a local GPR at a
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Figure 2: The flowchart for solving offline data-driven MOPs with TGPR-MO surrogates
with an MOEA.
Evolutionary Computation Volume xx, Number x
9
329
A. Mazumdar et al.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
.
/
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Figure 3: Progress of the building process of TGPR-MO surrogates for a bi-objective
DTLZ2 (a)–(c) and a DBMOPP (d)–(f) problem. The plots show the solutions (in blue)
of the MOEA (predicted by the treed GPR surrogates) at different iterations I of the
building process. The Pareto front of the underlying MOP is shown in orange ‘(cid:8)’.
leaf node improves the accuracy of the solutions in the following iteration. Thus, many
solutions found in the previous iteration (belonging to different leaf nodes of the trees)
may be eliminated as newer and better solutions are discovered. Eliminating solutions
reduces the number of local GPRs required to be built to approximate the underlying
objective functions.
The flowchart for solving offline data-driven MOPs with TGPR-MO surrogates is
shown in Figure 2. The approach starts with a dataset, and regression trees are built
for each objective. The blocks within the dotted lines represent the building process
of TGPR-MO surrogates as described in Algorithm 1. After Algorithm 1 is completed,
the surrogates are composed of regression trees with GPRs at some of the leaf nodes.
The built TGPR-MO surrogates are then used for multiobjective optimization using a
second MOEA. Or the MOEA used in Algorithm 1 may simply continue to run from
the last population for an additional number of generations without further modifying
the surrogates.
The building process of TGPR-MO surrogates is illustrated in Figure 3. The figure
shows the solutions obtained by the MOEA in some of the iterations of Algorithm 1 for
two bi-objective test problems. The two problems are DTLZ2 with n = 10 (Figure 3a–
3c) from the DTLZ (Deb et al., 2005) test suite and a distance-based visualizable test
10
Evolutionary Computation Volume xx, Number x
329
Treed GPR for Offline Data-Driven MOPs
Table 1: Configurations of the DBMOPP problems used.
Problem
Configuration
Dimension (n) Objectives (K )
P1
P2
P3
P4
number of disconnected set regions = 0,
number of local fronts = 0,
number of dominance resistance regions = 0,
number of discontinuous regions = 0
number of disconnected set regions = 1,
number of local fronts = 0,
number of dominance resistance regions = 0,
number of discontinuous regions = 0
number of disconnected set regions = 2,
number of local fronts = 0,
number of dominance resistance regions = 0,
number of discontinuous regions = 0
number of disconnected set regions = 0,
number of local fronts = 0,
number of dominance resistance regions = 1,
number of discontinuous regions = 0
2, 5, 7, and 10
3, 5, and 7
2, 5, 7, and 10
3, 5, and 7
2, 5, 7, and 10
3, 5, and 7
2, 5, 7, and 10
3, 5, and 7
problem (DBMOPP) (Fieldsend et al., 2019) with n = 2 (Figure 3d–3f). The problem con-
figuration for the DBMOPP problem was the same as problem P1, which can be found
in Table 1. For each problem, a dataset of size N = 2000 was generated by Latin hyper-
cube sampling (LHS). The MOEA used was RVEA (Cheng et al., 2016) with standard
parameter settings. The first column shows the solutions obtained by the MOEA in the
first iteration before any GPRs are built at the leaf nodes of the trees. As the objective
values of the solutions are predicted by the regression trees, the obtained solutions do
not form a smooth Pareto front. In the second column, it can be observed that the solu-
tions are obtained after four more iterations. Few GPRs are built at the leaf nodes, and
the approximated Pareto front gradually becomes more smooth. At this iteration, the
objective values of the solutions are predicted by the regression trees and GPRs at leaves
for either or both the objective functions. The last column shows the final iteration of
the surrogate building process, when all solutions in the MOEA’s population are pre-
dicted by leaf nodes that contain GPRs. Building the surrogates for the DTLZ2 problem
consumes more iterations than the DBMOPP problem instance. This behavior is due to
the Pareto set of DTLZ problems that is located in a larger region of the decision space.
Hence, more GPRs are required to be built at the leaves of the trees to approximate the
tradeoff region accurately.
Figure 4 shows an illustration of the leaf node that is considered for building GPRs
for two subsequent iterations in a bi-objective minimization problem. The iteration
I ∈ {1, 2} and the objective j ∈ {1, 2} are also indicated. The red-colored leaf nodes are
chosen for building GPRs at each iteration. In the first iteration, the solutions are pre-
dicted by nodes 4 and 5 for the first objective and node 5 for the second objective. Node
4 and node 5 for the first and second objectives’ tree have the highest RMSE. Hence,
GPRs are built using the subset of data in those leaf nodes. In the second iteration, the
solutions predicted by the trees are from node 4 and node 5 (with the GPR built) for
the first objective. For the second objective, the solutions are predicted by leaf node 4
Evolutionary Computation Volume xx, Number x
11
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
.
/
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
.
/
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Figure 4: Illustration of the building process of GPRs at leaf nodes of the regression trees
in two subsequent iterations. Red leaf nodes denote a GPR was built using the subset
of data.
and node 5 (with the GPR built). The process of building GPRs is repeated at node 4
and node 5 for the first and second objective’s trees, respectively. The process will be
repeated until the stopping criteria mentioned before are satisfied.
3.2 Accuracy Analysis
The fitness landscape of a bi-objective DBMOPP problem with n = 2 (with the same
configuration as P1 in Table 1) is illustrated in Figure 5. The figure shows the accuracy
of the approximation of full GPR, sparse GPR, and TGPR-MO surrogates. The accuracy
is measured as the RMSE (described in subsection 4.1) between the approximated ob-
jective values of the surrogates and the evaluated values of the underlying objective
functions. The number of samples in the dataset was N = 2000 with Latin hypercube
sampling (LHS). The contour lines represent the underlying objective functions’ land-
scape in Figure 5a and the approximated objective functions’ landscape in Figures 5b–
5d. A further close-up of the tradeoff regions for the sparse GPR and TGPR-MO
surrogates are shown in Figures 5e and 5f, respectively. The color grading represents
12
Evolutionary Computation Volume xx, Number x
329
Treed GPR for Offline Data-Driven MOPs
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
.
/
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Figure 5: Contour plots of the (a) underlying objective function, landscape approxi-
mated by (b) full GPRs, (c) sparse GPRs, and (d) the proposed TGPR-MO surrogates.
Close up of the Pareto set for (e) sparse GPRs and (f) the proposed TGPR-MO surro-
gates. The contour lines (in solid and dotted) indicate the value of objective functions
f1(x) and f2(x), respectively. Color shade represents the RMSE in the approximation
(darker is higher RMSE), and the blue dots show the Pareto set.
Evolutionary Computation Volume xx, Number x
13
329
A. Mazumdar et al.
the RMSE between the approximated and the underlying objective functions, and the
blue dots represent the Pareto set.
It can be observed that full GPRs (that use all the data) in Figure 5b have the highest
accuracy in all the approximated regions of the decision space. For the sparse GPR sur-
rogates in Figure 5c, the accuracy is good in all the regions with deterioration near the
Pareto set. In the case of the proposed TGPR-MO surrogates in Figure 5d, the accuracy
is low in most of the regions of the decision space except near the Pareto set. It can also
be observed that the contour lines are linear in most regions because of the prediction
provided by the leaf nodes of the trees. However, near the Pareto set the GPRs at the
leaf nodes are used for predictions, thus making the contours nonlinear. Comparing
the accuracy near the Pareto set for sparse GPR and TGPR-MO surrogates in Figures 5e
and 5f, it is evident that TGPR-MO surrogates provide a better approximation accuracy
compared to sparse GPR surrogates.
3.3 Complexity
While building TGPR-MO surrogates, a maximum of K GPRs are built in every itera-
− 1 samples, because a split at the node of a tree
tion. These GPRs can have Nmin to 2Nmin
occurs when the subset size is larger than or equal to 2Nmin. Thus every leaf node has at
− 1 samples. As these local GPRs have a
least Nmin samples and a maximum of 2Nmin
smaller number of samples, the computational cost is low. The complexity of building
TGPR-MO surrogates can vary depending on the function landscape, the number of
decision variables, and the provided data. The worst-case complexity will be achieved
− 1 samples
when GPRs are built at all the leaf nodes of the regression tree with 2Nmin
− 1)3).
at each leaf node. The complexity of building a GPR at a leaf node is O((2Nmin
As the total number of leaf nodes (in the worst case) is
−1 , the total complexity of
− 1)2). For an MOP with K objective
building GPRs at all the leaf nodes is O(N (2Nmin
− 1)2) with a memory com-
functions, the worst case time complexity is O(KN (2Nmin
plexity of O(KN (2Nmin
− 1)).
N
2Nmin
Even in the worst case, the computational cost of building TGPR-MO surrogates
is significantly smaller than building full GPRs that have time and memory complex-
ities of O(N 3) and O(N 2), respectively. The complexity for building sparse GPRs with
M (where M < N) inducing points is O(KN M 2). However, as mentioned previously,
finding the induction points uses a gradient descent or greedy search algorithm that
becomes expensive when the sample size is large, increasing the overall computational
cost. In contrast, the complexity of TGPR-MO surrogates during optimization is primar-
ily due to evaluating the individuals using the regression trees and GPRs at the leaves,
which is significantly lower. The optimization results and comparisons of the proposed
TPGR-MO surrogates with other surrogate models are shown in the next section.
4
Experimental Results
The goal of the proposed TGPR-MO surrogates is to build computationally cheaper
surrogates when dealing with large datasets for solving offline data-driven MOPs. This
section reports the tests performed to find whether TGPR-MO surrogates have signifi-
cant improvement over sparse GPRs and full GPRs in computation time and the quality
of the solutions (in hypervolume and accuracy). The starting point of the experiments
was data generated from distance-based visualizable test problems (DBMOPP) (Field-
send et al., 2019) for a better understanding of the behavior of the surrogates. For
the DBMOPP test problems, the solutions in the decision and objective spaces can be
simultaneously visualized. Thus, the search behavior can be observed with the progress
14
Evolutionary Computation Volume xx, Number x
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
Treed GPR for Offline Data-Driven MOPs
of the optimization process. In addition, the complexity of these problems can be con-
trolled by the features (e.g., varying density, number of local fronts, dominance resis-
tance regions, numbers of objective functions, and decision variables). These advan-
tages provide more control over the problems and make them adjustable to reflect the
needs of real-world optimization problems. These features of DBMOPP test problems
prove to be more advantageous compared to DTLZ (Deb et al., 2005) benchmark prob-
lems. A detailed description of the experiment settings, results, and in-depth analysis
are provided in the following subsections.
4.1
Experimental Setup
All the approaches for solving the offline data-driven MOP were coded in Python uti-
lizing the DESDEO framework (Misitano et al., 2021) (https://desdeo.it.jyu.fi).1 The
experiments were executed on one node of an HPC cluster equipped with AMD Rome
CPUs, each node having 128 cores running at 2.6 GHz, with 256 GiB of memory. Each
individual run was executed on one CPU core and allocated a maximum memory us-
age of 2 GiB. The regression tree was built using the sklearn Python package (Pedregosa
et al., 2011). For building the GPRs at the leaf nodes, the GPy (GPy, 2012) Python package
was used.
Benchmark Problems
4.1.1
Four DBMOPP problems P1–4, as shown in Table 1, were used. The problem instances
and data were generated by the code provided by Fieldsend et al. (2019). All combi-
nations of numbers of objective functions (K ∈ {3, 5, 7}) and numbers of decision vari-
ables (n ∈ {2, 5, 7, 10}) were used for the tests. The total number of problem instances
tested was 48, and every problem instance represented a different type of problem
characteristic.
4.1.2 Datasets
For generating the data, Latin hypercube sampling (LHS) and multivariate normal sam-
pling (MVNS) (Forrester et al., 2008) were used. In MVNS sampling, the objective func-
tions were considered independent with mean at the mid-point of the decision space,
that is, zero for DBMOPP problems. The variance of the sampling distribution was set to
0.1 for all the objective functions. Using MVNS sampling tests the ability of the surrogate
models and optimization algorithm to handle biased (or skewed) datasets (Mazumdar
et al., 2022). Sample sizes of initial data (N ∈ {2000, 10000, 50000}) were chosen for the
tests. A total of 288 cases were tested, and 31 sets of data with a random seed were
generated for each test case. Each of these datasets was the starting point of the three
different surrogate models that were tested. These individual runs were independent,
and the results were used to compare the surrogates’ performances statistically.
Settings for TGPR-MO Surrogates
4.1.3
The MOEA for building TGPR-MO surrogates was RVEA (Cheng et al., 2016), a
decomposition-based MOEA (Zhang and Li, 2007; Deb and Jain, 2014). Decomposition-
based MOEAs have shown to be effective in solving offline data-driven MOPs with
more than three objective functions. However, the surrogate is not limited to RVEA and
one can use any MOEA of choice. The parameters for RVEA were kept the same as sug-
gested by Cheng et al. (2016). The parameter Gmax was set to 50. The maximum number
10n . Such a value of Imax was chosen considering
of iterations was set to Imax
= N
Nmin
= N
1Source code available at https://github.com/industrial-optimization-group/TreedGP_MOEA
Evolutionary Computation Volume xx, Number x
15
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
one GPR is built at a leaf node for all the trees in each iteration (as the maximum number
of leaf nodes is N
). The loss function used for building the trees was MSE as mentioned
Nmin
= 10n. These parameters were chosen because sufficient de-
in Equation (6) with Nmin
sign points are necessary for building GPRs at the leaves. Based on recommendations
from the GPR literature (Chapman et al., 1994; Jones et al., 1998), at least 10n points are
required at each leaf node. The kernel used for building the GPRs at the leaf nodes was
Mátern 5/2 with automatic relevance determination enabled.
4.1.4 Other Surrogates Tested
The proposed TGPR-MO surrogates were compared with sparse GPR and full GPR
surrogates. The same GPy package and kernel were used for building both of these
surrogates as for the TGPR-MO surrogates. For sparse GPR surrogates, the number of
induction points was set to M = 10n. Here, the overall process of solving an offline data-
driven MOP with a specific surrogate is referred to as an approach for simplicity. It should
be noted that random forest was not considered in the tests as the focus of this paper is
on GPR surrogates.
Parameter Settings of MOEA (for Solving the Offline Data-Driven MOP)
4.1.5
For solving the offline data-driven MOP with surrogates as objective functions, RVEA
with the same default parameter settings was used. The termination criterion for RVEA
was 1,000 generations, which was sufficient for all the approaches tested to converge. To
generate a uniform Pareto front, the reference vectors are rearranged or adapted after a
certain number of generations in RVEA. The reference vector adaptation rate was set to
once every 100 generations.
Performance Indicators
4.1.6
The quality of the solutions obtained by the MOEA was measured in terms of their
hypervolume (HV) after evaluating them with the underlying objective functions of
1 , y∗
the test problems. For computing the HV indicator, the reference point of y∗ = (y∗
2 ,
. . . , y∗
K ) in the objective space was chosen, such that it is dominated by all the solutions.
In this paper, the reference point used was y∗ = (2
K ) because this
point is always dominated in DBMOPP problems. The multivariate RMSE of the ob-
jective values of the solutions obtained by the MOEA with their respective underlying
objective values was used to measure the accuracy. The multivariate RMSE is the Eu-
clidean distance between the approximated and evaluated underlying objective func-
j =1( ˆfj,i − fj,i )2, where s is the
tion values of the solutions and is given by 1
s
number of solutions, ˆfj,i and fj,i are the approximated and the underlying objective
value, respectively, for the ith solution and j th objective. Finally, the computational
cost of the various methods was measured as the time taken in seconds to build the
surrogates.
K, . . . , 2
(cid:12)(cid:11)
K, 2
s
i=1
(cid:11)
√
√
√
K
In this paper, multivariate RMSE is referred to as RMSE for simplicity. The hyper-
volume and RMSE indicators are used here to benchmark the performance of TGPR-
MO surrogates for solving an offline data-driven MOP. Evaluating the solutions with
the underlying objective functions in an offline data-driven MOP may not be possible
in real life.
4.2 Results and Discussions
While running the experiments, it was observed that building full GPR surrogates with
10,000 and 50,000 sample sizes consistently gave out-of-memory errors. The storage
16
Evolutionary Computation Volume xx, Number x
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
Treed GPR for Offline Data-Driven MOPs
complexity of full GPRs is O(KN 2), and each element of the array is a 64-bit float. Hence,
the memory requirement for three objective functions and sample sizes of 10,000 and
50,000 is 2.3 GiB and 56 GiB, respectively. Thus, building GPR surrogates using all the
provided offline data will become almost impossible with readily available computing
resources when the sample size is large. Hence, the results for full GPR surrogates are
not included for sample sizes of 10,000 and 50,000.
A pairwise Wilcoxon two-tailed significance test was conducted to compare the
performance of the different approaches. The calculated p-values were Bonferonni cor-
rected, and α = 0.05 was considered for rejecting the null hypothesis (an approach is
not significantly better or worse than another approach). The median values were com-
pared to determine whether an approach is significantly better or worse than another
one, provided that the p-value is less than α. A pairwise comparison of the approaches
with a scoring system was used for ranking the approaches. An approach is given a
score of +1 if it is significantly better than the other approach. A score of −1 is given
to the approach if it is significantly worse than the other approach. If the approach is
not significantly better or worse than the other approach, a score of zero is given to
both approaches. The sum of the scores is used for ranking all the approaches (a higher
score gives a better rank) for the indicator being compared. A rank of “1” indicates that
an approach has performed significantly better than all other approaches. Equal ranks
indicate that those approaches are not significantly different in their performance.
The performance of the approaches is summarized in Table 2. The ranks of the ap-
proaches for three different indicators are color-coded in green, yellow, and red in the
order of best to worst. It should be noted that these rankings are categorized with re-
spect to the sample size and sampling strategies. The total number of instances is 96
(48 for LHS and MVNS each) for each sample size. Each instance consists of the com-
bination of the various problem settings, that is, the number of samples (N ), sampling
strategy, number of objective functions (K), number of decision variables (n), and prob-
lem configuration. The number of instances in which full GPRs or sparse GPRs perform
better, worse, or not significantly different compared to the proposed TGPR-MO surro-
gates is denoted by “+,” “-,” and “≈,” respectively. For TGPR-MO surrogates, only the
number of instances where it performed significantly better than both full GPRs and
sparse GPRs (or it ranked the best) is shown.
The rows measuring “Time” in Table 2 show that the proposed TGPR-MO surro-
gates were computationally the cheapest compared to the full GPR and sparse GPR for
different sample sizes and sampling strategies. The number of instances TGPR-MO per-
formed significantly better than the other two surrogates in each subcategory is close to
48 (that was the total number of instances in each subcategory). The TGPR-MO surro-
gates performed better than sparse GPR surrogates in hypervolume for all sample sizes
and sampling strategies. However, full GPRs performed the best in hypervolume for a
sample size of 2,000. For 2,000 samples, sparse GPRs outperformed TGPR-MO surro-
gates for both sampling strategies in RMSE, and full GPR performed the best. The RMSE
of the solutions for TGPR-MO surrogates was better than sparse GPRs for sample sizes
of 10,000 and 50,000 for LHS sampling only, whereas for MVNS sampling, the sparse
GPR slightly outperformed TGPR-MO surrogates for 10,000 and 50,000 samples.
For a smaller sample size, one may choose full GPRs because they give the best per-
formance in hypervolume and RMSE. However, full GPRs become almost impossible
to build due to their high computational cost when the sample size is large. One can
use TGPR-MO surrogates for larger sample sizes because they perform better in hy-
pervolume and RMSE and excellently in computation time compared to sparse GPRs.
Evolutionary Computation Volume xx, Number x
17
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
Table 2: Summary of pairwise comparison of the hypervolume (HV), RMSE, and time
(seconds) for the different approaches. The number of instances in which full GP and
sparse GP surrogates perform better, worse, and not significantly different compared
to the TGP-MO surrogates is indicated by “+,” “-,” and “≈,” respectively. The full GP
runs failed due to memory overflow in the instances marked “—.” The approaches’
performance is ranked by the color code green, yellow, and red in the order of best to
worst, respectively.
Sample
size
Sampling
strategy
Indicator
Full GP
+ / - / ≈
Sparse GP
+ / - / ≈
TGP-MO
+
Surrogate type
2,000
LHS
MVNS
10,000
LHS
MVNS
50,000
LHS
MVNS
HV
RMSE
Time (s)
HV
RMSE
Time (s)
HV
RMSE
Time (s)
HV
RMSE
Time (s)
HV
RMSE
Time (s)
HV
RMSE
Time (s)
9 / 9 / 30
43 / 2 / 3
0 / 48 / 0
17 / 14 / 17
34 / 7 / 7
0 / 48 / 0
—
—
—
—
—
—
—
—
—
—
—
—
4 / 38 / 6
19 / 21 / 8
1 / 46 / 1
14 / 28 / 6
20 / 16 / 12
1 / 47 / 0
4 / 33 / 11
19 / 23 / 6
0 / 48 / 0
11 / 29 / 8
21 / 16 / 11
0 / 48 / 0
3 / 35 / 10
15 / 24 / 9
0 / 48 / 0
10 / 29 / 9
23 / 19 / 6
0 / 48 / 0
9
1
46
12
2
47
33
23
48
29
16
48
35
24
48
29
19
48
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
However, for non-uniform sampling strategies, for example, MVNS, TGPR-MO surro-
gates suffer slightly in RMSE compared to sparse GPR surrogates. Sparse GPR surro-
gates have better RMSE than TGPR-MO surrogates because the variational parameters
are selected by minimizing the KL divergence. Thus, the inducing inputs selected are
not skewed even if the provided dataset is, for example, in MVNS sampling.
The performances of a few selected instances are shown in Table 3. The table shows
the median and the standard deviation of hypervolume, RMSE, and time taken to build
the surrogates for the three different approaches for 31 runs. The instances shown in the
table were chosen based on the maximum difference between the median hypervolume
of the best and the second best performing surrogates. One instance was selected from
each sample size, sampling strategy, and number of objective functions. The figures in
bold represent the best performing approaches. It can be observed that the proposed
TGPR-MO surrogates performed the best in building time in the instances shown. It
can also be observed that TGPR-MO surrogates produce solutions with an improved
hypervolume and RMSE for sample sizes of 10,000 and 50,000 compared to sparse GPR
surrogates. It can be observed that the building times of TGPR-MO surrogates are less
18
Evolutionary Computation Volume xx, Number x
329
Treed GPR for Offline Data-Driven MOPs
-
O
M
R
P
G
T
1
0
+
E
0
4
1
.
1
0
+
E
0
2
1
.
1
0
+
E
2
6
2
.
1
0
+
E
2
1
1
.
1
0
+
E
3
8
2
.
1
0
+
E
3
5
.
1
1
0
+
E
1
3
1
.
0
0
+
E
5
4
2
.
1
0
+
E
9
3
1
.
0
0
+
E
0
0
8
.
1
0
+
E
7
2
1
.
1
0
+
E
9
6
1
.
1
0
+
E
9
4
2
.
0
0
+
E
0
8
.
8
1
0
+
E
3
9
5
.
1
0
+
E
1
1
.
2
1
0
+
E
8
7
3
.
1
0
+
E
2
3
1
.
1
0
+
E
2
4
1
.
0
0
+
E
2
5
1
.
1
0
+
E
3
7
4
.
1
0
+
E
9
1
1
.
1
0
+
E
4
9
8
.
1
0
+
E
2
0
3
.
)
s
(
e
m
T
i
e
s
r
a
p
S
R
P
G
1
0
+
E
4
6
8
.
1
0
+
E
6
1
2
.
2
0
+
E
1
5
5
.
2
0
+
E
3
4
1
.
2
0
+
E
0
1
.
2
1
0
+
E
6
4
3
.
2
0
+
E
6
5
3
.
2
0
+
E
9
1
1
.
2
0
+
E
1
5
5
.
1
0
+
E
4
6
7
.
2
0
+
E
6
3
.
4
1
0
+
E
2
3
6
.
3
0
+
E
5
4
1
.
2
0
+
E
2
2
2
.
3
0
+
E
2
5
6
.
3
0
+
E
4
2
1
.
3
0
+
E
7
3
9
.
3
0
+
E
8
3
1
.
3
0
+
E
2
1
4
.
2
0
+
E
9
6
6
.
3
0
+
E
8
3
2
.
2
0
+
E
5
6
4
.
3
0
+
E
1
5
3
.
2
0
+
E
4
5
6
.
l
l
u
F
R
P
G
2
0
+
E
6
3
1
.
1
0
+
E
5
8
1
.
2
0
+
E
6
1
6
.
2
0
+
E
0
4
1
.
2
0
+
E
0
2
3
.
1
0
+
E
8
3
5
.
2
0
+
E
5
2
3
.
1
0
+
E
6
7
6
.
2
0
+
E
4
3
.
5
2
0
+
E
1
0
1
.
2
0
+
E
7
5
3
.
1
0
+
E
0
4
4
.
—
—
—
—
—
—
-
O
M
R
P
G
T
E
S
M
R
e
s
r
a
p
S
R
P
G
l
l
u
F
R
P
G
-
O
M
R
P
G
T
e
s
r
a
p
S
R
P
G
l
l
u
F
R
P
G
e
m
u
l
o
v
r
e
p
y
H
1
0
-
E
7
6
.
2
1
0
-
E
8
3
.
1
1
0
-
E
3
3
.
9
1
0
-
E
6
6
.
1
1
0
-
E
6
6
.
1
1
0
-
E
1
1
.
1
1
0
-
E
5
5
.
8
1
0
-
E
9
9
.
1
0
0
+
E
4
0
.
2
1
0
-
E
8
7
.
2
1
0
-
E
0
8
.
9
1
0
-
E
8
3
.
2
1
0
-
E
0
5
.
1
2
0
-
E
5
0
.
5
1
0
-
E
9
0
.
7
1
0
-
E
8
2
.
2
1
0
-
E
3
7
.
9
2
0
-
E
2
8
.
8
1
0
-
E
9
0
.
9
1
0
-
E
2
5
.
1
1
0
-
E
3
8
.
1
1
0
-
E
2
2
.
1
1
0
-
E
2
2
.
1
1
0
-
E
2
4
.
1
0
0
+
E
4
0
.
1
1
0
-
E
2
7
.
5
2
0
-
E
8
1
.
4
1
0
-
E
4
7
.
2
0
0
+
E
9
7
.
1
0
0
+
E
5
7
.
1
1
0
-
E
8
3
.
4
0
0
+
E
6
1
.
2
1
0
-
E
0
7
.
9
0
0
+
E
2
5
.
1
1
0
-
E
9
0
.
6
0
0
+
E
6
7
.
2
1
0
-
E
9
5
.
1
0
0
+
E
6
7
.
3
1
0
-
E
5
5
.
8
1
0
-
E
5
9
.
8
1
0
-
E
5
8
.
4
0
0
+
E
0
1
.
2
1
0
-
E
2
2
.
2
0
0
+
E
6
5
.
1
1
0
-
E
4
8
.
1
0
0
+
E
8
6
.
1
1
0
-
E
0
4
.
3
0
0
+
E
2
3
.
2
0
0
+
E
0
1
.
1
0
0
+
E
1
9
.
2
0
0
+
E
4
3
.
1
1
0
-
E
9
6
.
3
2
0
-
E
1
1
.
5
2
0
-
E
6
2
.
8
0
0
+
E
8
0
.
1
1
0
-
E
5
6
.
4
0
0
+
E
6
8
.
1
1
0
-
E
5
9
.
1
1
0
-
E
0
8
.
7
1
0
-
E
7
0
.
2
—
—
—
—
—
—
1
0
+
E
8
4
.
7
0
0
+
E
3
7
.
2
4
0
+
E
4
6
.
1
2
0
+
E
9
5
.
5
6
0
+
E
3
2
.
9
4
0
+
E
2
3
.
9
1
0
+
E
7
2
.
6
0
0
+
E
0
1
.
5
4
0
+
E
0
3
.
1
3
0
+
E
0
2
.
1
6
0
+
E
2
8
.
8
5
0
+
E
0
3
.
5
1
0
+
E
8
5
.
7
1
0
-
E
3
9
.
5
4
0
+
E
5
6
.
1
2
0
+
E
6
2
.
8
6
0
+
E
9
5
.
8
5
0
+
E
1
6
.
2
1
0
+
E
7
2
.
6
0
0
+
E
1
3
.
3
4
0
+
E
5
7
.
1
2
0
+
E
5
1
.
3
6
0
+
E
4
2
.
9
5
0
+
E
1
0
.
1
1
0
+
E
9
4
.
5
0
0
+
E
5
3
.
9
4
0
+
E
7
2
.
1
3
0
+
E
3
2
.
1
6
0
+
E
5
5
.
6
6
0
+
E
4
2
.
1
1
0
+
E
5
6
.
5
0
0
+
E
5
3
.
6
4
0
+
E
6
0
.
1
3
0
+
E
0
1
.
2
6
0
+
E
2
4
.
6
5
0
+
E
2
6
.
2
1
0
+
E
6
2
.
6
0
0
+
E
8
7
.
9
4
0
+
E
7
2
.
1
2
0
+
E
9
0
.
6
6
0
+
E
1
7
.
6
5
0
+
E
0
9
.
1
1
0
+
E
7
9
.
4
0
0
+
E
0
4
.
5
4
0
+
E
3
2
.
1
3
0
+
E
0
5
.
2
6
0
+
E
5
5
.
6
6
0
+
E
2
2
.
1
1
0
+
E
4
5
.
7
0
0
+
E
2
0
.
4
4
0
+
E
6
4
.
1
2
0
+
E
3
7
.
6
6
0
+
E
1
2
.
9
5
0
+
E
0
0
.
1
1
0
+
E
1
7
.
5
0
0
+
E
4
1
.
4
4
0
+
E
6
2
.
1
3
0
+
E
8
0
.
2
6
0
+
E
1
8
.
7
5
0
+
E
6
3
.
5
—
—
—
—
—
—
n
5
0
1
5
0
1
0
1
7
5
0
1
0
1
0
1
5
5
3
5
7
3
5
7
3
5
7
3
5
7
3
P
4
P
3
P
3
P
2
P
2
P
3
P
4
P
1
P
3
P
3
P
3
P
K
m
e
l
b
o
r
P
y
g
e
t
a
r
t
s
g
n
i
l
p
m
a
S
e
l
p
m
a
S
e
z
i
s
0
0
0
,
2
S
H
L
S
N
V
M
S
N
V
M
S
H
L
0
0
0
,
0
1
n
i
(
n
o
i
t
a
i
v
e
d
d
r
a
d
n
a
t
s
r
i
e
h
t
d
n
a
e
m
i
t
d
n
a
,
E
S
M
R
,
e
m
u
l
o
v
r
e
p
y
h
n
a
i
d
e
m
e
h
t
g
n
i
w
o
h
s
s
e
c
n
a
t
s
n
i
t
s
e
t
d
e
t
c
e
l
e
s
f
o
n
o
s
i
r
a
p
m
o
C
:
3
e
l
b
a
T
.
d
l
o
b
n
i
n
w
o
h
s
e
r
a
s
e
h
c
a
o
r
p
p
a
g
n
i
m
r
o
f
r
e
p
t
s
e
b
e
h
T
.
s
n
u
r
t
s
e
t
1
3
e
h
t
f
o
)
s
c
i
l
a
t
i
Evolutionary Computation Volume xx, Number x
19
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
.
d
e
u
n
i
t
n
o
C
:
3
e
l
b
a
T
-
O
M
R
P
G
T
e
s
r
a
p
S
R
P
G
l
l
u
F
R
P
G
-
O
M
R
P
G
T
e
s
r
a
p
S
R
P
G
l
l
u
F
R
P
G
-
O
M
R
P
G
T
e
s
r
a
p
S
R
P
G
l
l
u
F
R
P
G
)
s
(
e
m
T
i
E
S
M
R
e
m
u
l
o
v
r
e
p
y
H
1
0
+
E
6
1
3
.
0
0
+
E
5
2
.
9
1
0
+
E
0
9
3
.
1
0
+
E
0
1
.
1
1
0
+
E
7
7
8
.
1
0
+
E
4
4
.
3
1
0
+
E
9
2
5
.
1
0
+
E
7
0
2
.
1
0
+
E
8
6
8
.
1
0
+
E
4
1
4
.
2
0
+
E
1
9
1
.
1
0
+
E
9
5
7
.
4
0
+
E
6
1
2
.
3
0
+
E
7
7
.
1
4
0
+
E
9
5
3
.
3
0
+
E
9
2
.
6
4
0
+
E
5
0
5
.
3
0
+
E
1
4
.
9
3
0
+
E
9
1
8
.
2
0
+
E
5
3
6
.
4
0
+
E
4
3
1
.
2
0
+
E
6
9
7
.
4
0
+
E
7
8
1
.
3
0
+
E
4
2
2
.
—
—
—
—
—
—
1
0
-
E
1
9
.
3
2
0
-
E
3
6
.
8
1
0
-
E
8
1
.
6
1
0
-
E
0
2
.
1
1
0
-
E
4
4
.
7
2
0
-
E
9
5
.
8
2
0
-
E
3
7
.
9
2
0
-
E
4
0
.
5
1
0
-
E
3
4
.
1
1
0
-
E
9
4
.
1
2
0
-
E
1
8
.
9
1
0
-
E
6
3
.
1
1
0
-
E
9
2
.
8
1
0
-
E
8
0
.
6
1
0
-
E
4
6
.
8
1
0
-
E
0
1
.
2
1
0
-
E
0
2
.
9
2
0
-
E
3
2
.
9
0
0
+
E
7
7
.
1
1
0
-
E
1
1
.
9
0
0
+
E
3
7
.
2
0
0
+
E
3
1
.
1
0
0
+
E
5
7
.
3
0
0
+
E
5
4
.
1
—
—
—
—
—
—
1
0
+
E
0
3
.
7
0
0
+
E
2
6
.
2
4
0
+
E
5
6
.
1
2
0
+
E
4
0
.
4
6
0
+
E
5
6
.
8
5
0
+
E
5
6
.
1
1
0
+
E
9
5
.
7
1
0
-
E
1
4
.
4
4
0
+
E
6
7
.
1
2
0
+
E
3
0
.
3
6
0
+
E
6
2
.
9
5
0
+
E
8
1
.
1
1
0
+
E
4
0
.
6
0
0
+
E
8
2
.
7
4
0
+
E
5
4
.
1
2
0
+
E
0
1
.
4
6
0
+
E
9
4
.
7
5
0
+
E
2
1
.
1
1
0
+
E
7
2
.
5
1
0
+
E
4
1
.
1
4
0
+
E
5
2
.
1
3
0
+
E
4
0
.
2
6
0
+
E
5
5
.
6
6
0
+
E
5
0
.
1
—
—
—
—
—
—
n
0
1
0
1
0
1
5
5
5
K
m
e
l
b
o
r
P
y
g
e
t
a
r
t
s
e
z
i
s
g
n
i
l
p
m
a
S
e
l
p
m
a
S
3
5
7
3
5
7
3
P
3
P
1
P
3
P
3
P
3
P
S
N
V
M
S
H
L
0
0
0
,
0
5
20
Evolutionary Computation Volume xx, Number x
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
Treed GPR for Offline Data-Driven MOPs
than those of sparse GPR by an order of about 102 and 103 for samples size of 10,000
and 50,000 respectively. The building time also increases with the number of objective
functions for sparse GPR surrogates.
The total number of samples utilized to build TGPR-MO surrogates with the num-
ber of iterations for six different problem instances with sample sizes of 2,000, 10,000,
and 50,000 is shown in Figure 6. The solid line shows the mean number of samples used,
and the shaded region denotes the 95% confidence interval of the runs for each treed
GPR surrogate (for each objective). The plots have been extended in the iteration axis
to the maximum allowed iterations, I max = N
10n . The iteration axis is broken when I max
is large to reduce the width of the plots. It can be observed that the number of samples
utilized converges before the maximum iterations and varies with the objective. Certain
problem instances required more samples during the building process than others, es-
pecially due to the number of decision variables and the characteristics of the problem.
If the trade-off region is located in a smaller region of the decision space, the building
process converges quickly and therefore consumes fewer samples.
The quantity of data available at the trade-off region affects the accuracy and hy-
pervolume of the solutions obtained that is a general challenge while solving offline
data-driven MOPs. The proposed TGPR-MO surrogates split the decision space into
subregions and approximate the underlying objective functions at the trade-off region.
It is best suitable when the Pareto set is located in a smaller region of the decision space.
When the Pareto set is in a larger region, the prediction of TGPR-MO surrogates is from
multiple leaf node GPRs and has discontinuities near the splits of the regression trees.
Therefore the approximated Pareto front has some discontinuities compared to sparse
GPRs or full GPRs. Further tests with the DTLZ (Deb et al., 2005) benchmark problems
are given in the supplementary material.
5 Conclusions
This paper proposed tailored surrogate models that can be built with a lower computa-
tional cost compared to sparse GPRs and full GPRs for solving offline data-driven MOPs
when dealing with large datasets. The proposed TGPR-MO surrogates performed sig-
nificantly better than sparse GPR surrogates in hypervolume, RMSE, and computation
time for most of the instances of the DBMOPP problems. The full GPR surrogates failed
to complete the runs for larger sample sizes due to memory restrictions. Thus, it can
be concluded that the proposed TGPR-MO surrogates are best suited for solving offline
data-driven MOPs with a large size dataset.
A GPR at a leaf node of a tree approximates certain regions of the decision space.
This feature can be exploited while solving offline data-driven MOPs with preferences
from the decision maker in an a priori or interactive fashion. The provided preferences
for objective functions can be utilized to build GPRs at the leaf nodes, and only the
solutions that follow the preferences can be approximated. Developing an interactive
framework utilizing the TGPR-MO surrogates will be one of the future works.
Utilizing the correlation between the objective functions using multi-target regres-
sion trees (Osojnik et al., 2018) and multi-output GPRs (Borchani et al., 2015) is an in-
teresting future research direction. Tests and comparisons of the uncertainty prediction
provided by the proposed TGPR-MO surrogates will be conducted for solving offline
data-driven MOPs. Selecting a suitable kernel for a given dataset is an active research
topic. Integrating automatic kernel selection into the TGPR-MO surrogates will be quite
beneficial. One of the major drawbacks of the TGPR-MO surrogates is the discontinuity
between the leaf node GPRs. Tackling discontinuity in the prediction (van Stein et al.,
Evolutionary Computation Volume xx, Number x
21
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
Figure 6: The median and 95% confidence interval of the total number of samples uti-
lized (or the sum of the samples available at the leaf nodes that are considered for build-
ing GPRs) with iterations for three different DBMOPP problem instances as displayed
according to the following format (N, sampling strategy, problem, K, n). The iteration
axis is extended to the maximum possible iteration N
10n . The iteration axes for plots 6c–6f
are broken to accommodate the plots.
22
Evolutionary Computation Volume xx, Number x
329
Treed GPR for Offline Data-Driven MOPs
2015; Wang et al., 2017) at the partition of the decision space (as provided by the tree)
will be a future task. Further improvements can be made in the way the trees partition
the decision space. Instead of using a traditional loss function, a nonlinear trade-off
criterion can be formulated to split the nodes. Such splitting criteria will enable the
TGPR-MO surrogates to approximate the trade-off region with fewer samples. Testing
the TGPR-MO surrogates for solving real-life offline data-driven MOPs will be a future
task.
Acknowledgments
This research was partly supported by the Academy of Finland (grant number 311877
and 322221) and is related to the thematic research area DEMO (Decision Analytics uti-
lizing Causal Models and Multiobjective Optimization, http://www.jyu.fi/demo) of
the University of Jyväskylä. The numerical experiments were performed on Mahti su-
percomputer provided by CSC (https://www.csc.fi).
References
Assael, J.-A. M., Wang, Z., Shahriari, B., and de Freitas, N. (2014). Heteroscedastic treed Bayesian
optimisation. CoRR. Retrieved from arXiv:1410.7172.
Borchani, H., Varando, G., Bielza, C., and Larrañaga, P. (2015). A survey on multi-output regres-
sion. WIREs Data Mining and Knowledge Discovery, 5(5):216–233. 10.1002/widm.1157
Chapman, W., Welch, W., Bowman, K., Sacks, J., and Walsh, J. (1994). Arctic sea ice variability:
Model sensitivities and a multidecadal simulation. Journal of Geophysical Research: Oceans,
99(C1):919–935. 10.1029/93JC02564
Cheng, R., Jin, Y., Olhofer, M., and Sendhoff, B. (2016). A reference vector guided evolutionary
algorithm for many-objective optimization. IEEE Transactions on Evolutionary Computation,
20(5):773–791. 10.1109/TEVC.2016.2519378
Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART model search. Journal
of the American Statistical Association, 93(443):935–948. 10.1080/01621459.1998.10473750
Chugh, T., Sindhya, K., Hakanen, J., and Miettinen, K. (2019). A survey on handling computa-
tionally expensive multiobjective optimization problems with evolutionary algorithms. Soft
Computing, 23:3137–3166. 10.1007/s00500-017-2965-0
Das, K., and Srivastava, A. N. (2010). Block-GP: Scalable Gaussian process regression for multi-
modal data. In 2010 IEEE International Conference on Data Mining, pp. 791–796.
Deb, K., and Jain, H. (2014). An evolutionary many-objective optimization algorithm using
reference-point-based nondominated sorting approach, Part I: Solving problems with box
constraints. IEEE Transactions on Evolutionary Computation, 18(4):577–601. 10.1109/TEVC
.2013.2281535
Deb, K., Thiele, L., Laumanns, M., and Zitzler, E. (2005). Scalable test problems for evolutionary
multiobjective optimization. In A. Abraham, L. Jain, and R. Goldberg (Eds.), Evolutionary
multiobjective optimization: Theoretical advances and applications, pp. 105–145. Springer.
Emmerich, M., Giannakoglou, K., and Naujoks, B. (2006). Single- and multiobjective evolutionary
optimization assisted by Gaussian random field metamodels. IEEE Transactions on Evolution-
ary Computation, 10(4):421–439. 10.1109/TEVC.2005.859463
Fieldsend, J. E., Chugh, T., Allmendinger, R., and Miettinen, K. (2019). A feature rich distance-
based many-objective visualisable test problem generator. In Proceedings of the Genetic and
Evolutionary Computation Conference, pp. 541–549.
Evolutionary Computation Volume xx, Number x
23
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
.
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
A. Mazumdar et al.
Forrester, A., Sobester, A., and Keane, A. (2008). Engineering design via surrogate modelling. John
Wiley & Sons.
GPy (2012). GPy: A Gaussian process framework in python. http://github.com/SheffieldML/
GPy.
Gramacy, R. B., and Lee, H.K.H. (2008). Bayesian treed Gaussian process models with an applica-
tion to computer modeling. Journal of the American Statistical Association, 103(483):1119–1130.
10.1198/016214508000000689
Hughes, E. J. (2001). Evolutionary multi-objective ranking with uncertainty and noise. In Proceed-
ings of Evolutionary Multi-Criterion Optimization, pp. 329–343.
Jin, Y., Wang, H., Chugh, T., Guo, D., and Miettinen, K. (2019). Data-driven evolutionary optimiza-
tion: An overview and case studies. IEEE Transactions on Evolutionary Computation, 23(3):442–
458. 10.1109/TEVC.2018.2869001
Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive
black-box functions. Journal of Global Optimization, 13:455–492. 10.1023/A:1008306431147
Kim, H.-M., Mallick, B. K., and Holmes, C. C. (2005). Analyzing nonstationary spatial data using
piecewise Gaussian processes. Journal of the American Statistical Association, 100(470):653–668.
10.1198/016214504000002014
Loh, W.-Y. (2011). Classification and regression trees. WIREs Data Mining and Knowledge Discovery,
1(1):14–23. 10.1002/widm.8
Mazumdar, A., Chugh, T., Hakanen, J., and Miettinen, K. (2020). An interactive framework for of-
fline data-driven multiobjective optimization. In Proceedings of Bioinspired Optimization Meth-
ods and Their Applications, pp. 97–109.
Mazumdar, A., Chugh, T., Hakanen, J., and Miettinen, K. (2022). Probabilistic selection ap-
proaches in decomposition-based evolutionary algorithms for offline data-driven mul-
tiobjective optimization. IEEE Transactions on Evolutionary Computation, 26(5):1182–1191.
10.1109/TEVC.2022.3154231
Mazumdar, A., Chugh, T., Miettinen, K., and López-Ibáñez, M. (2019). On dealing with uncertain-
ties from Kriging models in offline data-driven evolutionary multiobjective optimization. In
Proceedings of Evolutionary Multi-Criterion Optimization, pp. 463–474.
Misitano, G., Saini, B. S., Afsar, B., Shavazipur, B., and Miettinen, K. (2021). DESDEO: The mod-
ular and open source framework for interactive multiobjective optimization. IEEE Access,
9:148277–148295. 10.1109/ACCESS.2021.3123825
Osojnik, A., Panov, P., and Džeroski, S.
(2018). Tree-based methods for online multi-
target regression. Journal of Intelligent Information Systems, 50:315–339. 10.1007/s10844-017
-0462-7
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et
al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research,
12:2825–2830.
Rahat, A.A.M., Wang, C., Everson, R. M., and Fieldsend, J. E. (2018). Data-driven multi-objective
optimisation of coal-fired boiler combustion systems. Applied Energy, 229:446–458. 10.1016/
j.apenergy.2018.07.101
Rasmussen, C. E., and Williams, C.K.I. (2006). Gaussian processes for machine learning. MIT Press.
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the human
out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175.
10.1109/JPROC.2015.2494218
24
Evolutionary Computation Volume xx, Number x
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
.
/
/
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
/
.
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329
Treed GPR for Offline Data-Driven MOPs
Snelson, E., and Ghahramani, Z. (2005). Sparse Gaussian processes using pseudo-inputs. In Pro-
ceedings of the 18th International Conference on Neural Information Processing Systems, pp. 1257–
1264.
Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine
learning algorithms. In Advances in neural information processing systems, pp. 2951–2959. Cur-
ran Associates.
Titsias, M. K. (2009). Variational learning of inducing variables in sparse Gaussian processes. In
Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, pp. 567–
574.
van Stein, B., Wang, H., Kowalczyk, W., Bäck, T., and Emmerich, M. (2015). Optimally weighted
cluster kriging for big data regression. In E. Fromont, T. De Bie, and M. van Leeuwen (Eds.),
Advances in intelligent data analysis XIV, pp. 310–321. Springer.
Wang, H., and Jin, Y. (2020). A random forest-assisted evolutionary algorithm for data-driven
constrained multiobjective combinatorial optimization of trauma systems. IEEE Transactions
on Cybernetics, 50(2):536–549. 10.1109/TCYB.2018.2869674
Wang, H., Jin, Y., and Jansen, J. O. (2016). Data-driven surrogate-assisted multiobjective evo-
lutionary optimization of a trauma system. IEEE Transactions on Evolutionary Computation,
20(6):939–952. 10.1109/TEVC.2016.2555315
Wang, H., Jin, Y., Sun, C., and Doherty, J. (2019). Offline data-driven evolutionary optimization us-
ing selective surrogate ensembles. IEEE Transactions on Evolutionary Computation, 23(2):203–
216. 10.1109/TEVC.2018.2834881
Wang, H., van Stein, B., Emmerich, M., and Bäck, T. (2017). Time complexity reduction in effi-
cient global optimization using cluster kriging. In Proceedings of the Genetic and Evolutionary
Computation Conference, pp. 889–896.
Yang, C., Ding, J., Jin, Y., and Chai, T. (2020). Offline data-driven multiobjective optimization:
Knowledge transfer between surrogates and generation of final solutions. IEEE Transactions
on Evolutionary Computation, 24(3):409–423. 10.1109/TEVC.2019.2925959
Zhang, Q., and Li, H. (2007). MOEA/D: A multiobjective evolutionary algorithm based on
decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712–731. 10.1109/
TEVC.2007.892759
Zitzler, E., and Thiele, L. (1998). Multiobjective optimization using evolutionary algorithms—A
comparative case study. In Parallel Problem Solving from Nature, pp. 292–301.
Evolutionary Computation Volume xx, Number x
25
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
/
/
e
d
u
e
v
c
o
a
r
t
i
c
e
-
p
d
l
f
/
d
o
i
/
/
/
.
1
0
1
1
6
2
e
v
c
o
_
a
_
0
0
3
2
9
2
1
5
5
3
4
0
e
v
c
o
_
a
_
0
0
3
2
9
p
d
.
/
f
b
y
g
u
e
s
t
t
o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3
329