Treed Gaussian Process Regression for Solving
Offline Data-Driven Continuous Multiobjective
Optimization Problems

Atanu Mazumdar
University of Jyvaskyla, Faculty of Information Technology, Finland

atanu.a.mazumdar@jyu.fi

Manuel López-Ibáñez
Alliance Manchester Business School, University of Manchester, UK

manuel.lopez-ibanez@manchester.ac.uk

Tinkle Chugh
Department of Computer Science, University of Exeter, UK

t.chugh@exeter.ac.uk

Jussi Hakanen
University of Jyvaskyla, Faculty of Information Technology, Finland

jussi.hakanen@jyu.fi

Kaisa Miettinen
University of Jyvaskyla, Faculty of Information Technology, Finland

kaisa.miettinen@jyu.fi

https://doi.org/10.1162/evco_a_00329

Abstract

For offline data-driven multiobjective optimization problems (MOPs), no new data is available during the optimization process. Approximation models (or surrogates) are first built using the provided offline data, and an optimizer, for example, a multiobjective evolutionary algorithm, can then be utilized to find Pareto optimal solutions to the problem with surrogates as objective functions. In contrast to online data-driven MOPs, these surrogates cannot be updated with new data and, hence, the approximation accuracy cannot be improved by considering new data during the optimization process. Gaussian process regression (GPR) models are widely used as surrogates because of their ability to provide uncertainty information. However, building GPRs becomes computationally expensive when the size of the dataset is large. Using sparse GPRs reduces the computational cost of building the surrogates. However, sparse GPRs are not tailored to solve offline data-driven MOPs, where good accuracy of the surrogates is needed near Pareto optimal solutions. Treed GPR (TGPR-MO) surrogates for offline data-driven MOPs with continuous decision variables are proposed in this paper. The proposed surrogates first split the decision space into subregions using regression trees and build GPRs sequentially in regions close to Pareto optimal solutions in the decision space to accurately approximate tradeoffs between the objective functions. TGPR-MO surrogates are computationally inexpensive because GPRs are built only in a smaller region of the decision space utilizing a subset of the data. The TGPR-MO surrogates were tested on distance-based visualizable problems with various data sizes, sampling strategies, numbers of objective functions, and decision variables. Experimental results showed that the TGPR-MO surrogates are computationally cheaper and can handle datasets of large size. Furthermore, TGPR-MO surrogates produced solutions closer to Pareto optimal solutions compared to full GPRs and sparse GPRs.

Keywords

Gaussian processes, Kriging, regression trees, metamodelling, surrogate, Pareto
optimality.

Manuscript received: 13 April 2022; revised: 6 November 2022, 23 February 2023, 23 March 2023; accepted: 31 March 2023.
© 2023 Massachusetts Institute of Technology.
Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Evolutionary Computation xx(X): 1–25


1 Introduction

A multiobjective optimization problem (MOP) consists of two or more conflicting objective functions that are optimized simultaneously. The solutions of an MOP are called Pareto optimal when the value of any objective function cannot be further improved without degrading some of the others; thus, tradeoffs exist among the objective functions. Not all real-world MOPs have analytical functions or simulation models available. Instead, the MOP may need to be formulated by utilizing the data acquired from a phenomenon, for example, real-life processes, physical experiments, and sensors. To solve this MOP, first the underlying objective functions are approximated by fitting surrogates (also known as metamodels) on the available data. Next, a multiobjective evolutionary algorithm (MOEA) can be used to approximate the set of Pareto optimal solutions by considering the surrogates as objective functions.

Data-driven optimization problems are classified into online and offline ones (Jin et al., 2019). For online data-driven optimization, new data can be acquired during the optimization process, for example, by conducting further experiments or simulations to update the surrogates (Shahriari et al., 2016). In contrast, for offline data-driven optimization problems, no new data can be acquired, and the optimization has to proceed by using only the existing data (Wang et al., 2019). Ideally, the underlying objective values of the solutions should be better than the objective values in the provided dataset. All the available data should be utilized to build surrogates with a good approximation while solving offline data-driven MOPs. However, when the size of the dataset is large, for example, in Wang et al. (2016) and Wang and Jin (2020), building certain surrogates, such as Gaussian processes, can become computationally expensive. Because the surrogates cannot be updated with new data while solving offline data-driven MOPs, the approximation accuracy of the surrogates directly influences the closeness of the solutions to the Pareto optimal set. Most of the previous works on offline data-driven optimization, such as Wang et al. (2019, 2016), Wang and Jin (2020), and Yang et al. (2020), considered different surrogate modelling techniques for solving offline data-driven optimization problems. However, these works did not consider the tradeoffs between the underlying objective functions while building the surrogates. Other developed approaches, for example, Mazumdar et al. (2019, 2022), Rahat et al. (2018), and Hughes (2001), relied on the uncertainty in the prediction of Gaussian processes to improve the quality of solutions. The previous works on offline data-driven optimization dealing with large datasets, for example, Wang et al. (2016) and Wang and Jin (2020), did not quantify the uncertainty, which is a critical component in solving offline data-driven MOPs, as shown in Mazumdar et al. (2019, 2020, 2022). This paper proposes an approach to build computationally cheap surrogates with a high approximation accuracy in the region around the Pareto optimal set. The surrogates use a combination of regression trees and Gaussian process regression models and provide uncertainty in the prediction. The proposed surrogates are tailored toward solving offline data-driven MOPs and are capable of handling large datasets (tested for a maximum size of 50,000).

To study offline data-driven MOPs, data is sampled from benchmark problems for which the Pareto optimal set is known. Knowledge of the Pareto optimal set enables quality comparisons of the solutions obtained by different optimization approaches and surrogates. While solving an offline data-driven MOP, generally two types of indicators are used to quantify the quality of the solutions (Mazumdar et al., 2019, 2020). The approximation accuracy is measured as the root mean squared error (RMSE) between the approximated objective values of the surrogates and the underlying objective values. In addition, the hypervolume (Zitzler and Thiele, 1998) indicator is used to measure how close the approximated Pareto front (after evaluating with the underlying objective functions) is to the Pareto front of the underlying MOP.



Kriging, or Gaussian process regression (GPR) (Forrester et al., 2008), is a popular choice of surrogate (Rasmussen and Williams, 2006) for solving offline problems because it also provides information about the uncertainty in the approximation. Previous works (Hughes, 2001; Rahat et al., 2018; Mazumdar et al., 2019, 2022) have shown that utilizing the uncertainty prediction from GPR surrogates produces solutions with improved accuracy and hypervolume. The uncertainty prediction from GPR makes it an attractive choice of surrogate for solving offline data-driven MOPs. Most of the approaches for solving offline data-driven MOPs build a global GPR surrogate for each objective using all the provided data. However, full GPRs have high computational and memory complexity when the size of the dataset (or sample size) is large. An alternative to full GPRs is to use sparse GPRs (Snelson and Ghahramani, 2005; Titsias, 2009), which build approximation models with a small subset of data called support or inducing points. In Titsias (2009), a variational method was proposed to select the inducing points by a gradient descent or greedy search algorithm. However, this method becomes computationally expensive when the dataset is large. On the other hand, using fewer points for building the surrogates reduces the approximation accuracy of the surrogates compared to full GPRs (Titsias, 2009). Other works, such as Emmerich et al. (2006), proposed building GPRs using only the K-nearest neighbor samples of the point that is to be predicted.

While performing multiobjective optimization using surrogates, it is desirable to obtain a good approximation accuracy in the region around the Pareto set, which is referred to as the trade-off region in this paper. The accuracy of the surrogates in other regions of the decision space (excluding the trade-off region) is not of utmost importance. A possible approach to reduce the computational cost is to build local GPR surrogates for the underlying objective functions exclusively in the trade-off region. Using local GPRs reduces the overall computational cost of building GPRs while providing a good approximation accuracy in the trade-off region. Previous works (Kim et al., 2005; Le and Srivastava, 2010; van Stein et al., 2015) partitioned the decision space into regions and fitted GPRs in each region. Partitioning the decision space is relatively inexpensive, and each GPR utilizes a smaller set of data. This concept was extended by Gramacy and Lee (2008) to fit GPRs in each region or leaf node partitioned by a Bayesian treed model previously proposed by Chipman et al. (1998). In Assael et al. (2014), treed GPRs were proposed to deal with cases where the noise is different for different samples. All the previous works build GPRs in all the regions of the decision space. These types of GPRs become computationally expensive, depending on the number of regions and the amount of data in each region. Moreover, these surrogates do not consider the trade-off region of the offline data-driven MOP during the building process.

This paper proposes treed GPR surrogates for multiobjective optimization (TGPR-MO) which have a high accuracy around the trade-off region and are tailored for solving offline data-driven MOPs with continuous decision variables. First, computationally inexpensive regression tree surrogates are built using the provided dataset. The regression tree surrogates provide a less accurate approximation of the underlying objective functions (compared to GPRs) (Loh, 2011) and split the decision space into regions. A region is defined as the part of the decision space that is enclosed by a leaf node of the fitted regression trees. The approximation accuracy of the prediction by the leaf node is the accuracy of the region. Next, an MOEA is executed considering the regression trees as objective functions. The solutions of the MOEA are not very accurate but provide the approximate location of the trade-off region.


After a certain number of generations of the MOEA, the accuracy of the trees' prediction is improved by building GPRs at leaf nodes within the trade-off region. The GPRs built improve the accuracy of the solutions in the region of the decision space corresponding to the leaf node they replace. The final surrogates consist of regression trees with GPRs at a few leaf nodes, providing accurate approximations exclusively in the trade-off region.

The TGPR-MO surrogates were tested on several instances of distance-based visualizable test problems (DBMOPP) with different numbers of decision variables and objective functions. Various sizes of the initial dataset with different sampling strategies were used in the tests. Numerical experiments showed that TGPR-MO surrogates significantly reduced the time for building surrogates on large datasets while solving offline data-driven MOPs. TGPR-MO surrogates also produced solutions with a better hypervolume compared to sparse GPR surrogates.

The rest of the paper is arranged as follows. The background of offline data-driven MOPs, GPRs, and regression trees is given in Section 2. The proposed TGPR-MO surrogates are detailed in Section 3. The experimental results consisting of a comparison of TGPR-MO surrogates with other surrogates are presented in Section 4. Section 5 concludes and discusses future research perspectives.

2 Background

For an offline data-driven MOP, the starting point for solving the problem is (precollected) data. In this paper, we assume that the available data is the output of a process or phenomenon. The process of generating the data is referred to as the underlying objective functions of an MOP. The underlying MOP that has to be solved is considered to be of the following form:

    minimize   (f_1(x), . . . , f_K(x))
    subject to x ∈ Ω,                                                    (1)

where K ≥ 2 is the number of objective functions and Ω is the feasible region in the decision space R^n. For a feasible decision vector x (consisting of n decision variables), the corresponding objective vector is f(x) = (f_1(x), . . . , f_K(x)), where f_1(x), . . . , f_K(x) are the objective (function) values.

A solution x ∈ Ω dominates another solution x′ ∈ Ω if f_i(x) ≤ f_i(x′) for all i = 1, . . . , K and f_i(x) < f_i(x′) for at least one i = 1, . . . , K. If a solution of an MOP is not dominated by any other feasible solution, it is called nondominated. Solving an MOP using a multiobjective optimization algorithm, for example, an MOEA, typically produces a set of mutually nondominated solutions. The solutions of the optimization problem defined in Equation (1) that are nondominated in the whole set Ω are called Pareto optimal solutions.
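The dominance relation translates directly into code. The following minimal NumPy sketch (illustrative, not part of the original paper) checks dominance between two objective vectors and filters the mutually nondominated rows of an objective matrix:

```python
import numpy as np

def dominates(fa, fb):
    """True if objective vector fa dominates fb (minimization)."""
    return np.all(fa <= fb) and np.any(fa < fb)

def nondominated(Y):
    """Boolean mask of the mutually nondominated rows of Y (s x K)."""
    mask = np.ones(Y.shape[0], dtype=bool)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[0]):
            if i != j and dominates(Y[j], Y[i]):
                mask[i] = False  # row i is dominated by row j
                break
    return mask

Y = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 3.0], [4.0, 1.0]])
print(nondominated(Y))  # [ True  True False  True]
```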
A generic way to solve an offline data-driven MOP with an MOEA as the optimizer is shown in Figure 1. As explained in Jin et al. (2019) and Wang et al. (2019), the solution process can be divided into three components: (a) data collection, (b) formulating the MOP and surrogate building, and (c) optimization with surrogates. The optimization process starts with an offline dataset consisting of N samples. A sample consists of a decision vector x and its corresponding objective vector f(x) as a tuple of two matrices: (X, Y), where X ∈ R^{N×n} and Y ∈ R^{N×K}. Each row in X and Y is a decision vector and its corresponding objective vector, respectively. Next, surrogates are built using all or a subset of the data. The surrogates are considered as objective functions by an MOEA to solve the offline data-driven MOP. For simplicity of describing a surrogate model for a single objective i, let y_i ∈ R^{N×1} be the vector of objective values f_i(x) for all decision vectors x ∈ X.

Figure 1: Flowchart of a generic offline data-driven multiobjective optimization approach.

2.1 Gaussian Process Regression

In this work, a surrogate model is built for each objective function. For simplicity in terminology, y is considered here instead of y_i to introduce Gaussian process regression (GPR) and regression trees. GPRs have been widely used in surrogate-assisted optimization and time-series analysis (Jin et al., 2019; Chugh et al., 2019). The major advantage of using a GPR is its ability to provide the distribution of its prediction (or the uncertainty). A GPR is a multivariate normal distribution with a mean μ and a covariance matrix C:

    y ∼ N(μ, C).

For simplicity in calculations, a mean of zero is considered without loss of generality. The covariance matrix C uses a covariance (or kernel) function to define the correlation between two samples, x and x′. In this work, we use the Matérn 5/2 kernel because it does not make the covariance functions unrealistically smooth compared to other kernels such as the Gaussian (or RBF) kernel. Therefore, it is more appropriate for solving practical optimisation problems (Snoek et al., 2012). The kernel is defined as:

    κ(x, x′, Θ) = σ_f² (1 + √5 Σ_{j=1}^n r_j + (5/3) Σ_{j=1}^n r_j²) exp(−√5 Σ_{j=1}^n r_j) + σ_t² δ_{xx′},    (2)

where r_j = |x_j − x′_j| / l_j, x_j and x′_j are the jth components of the decision vectors x and x′, Θ = (σ_f, l_1, . . . , l_n, σ_t) is the set of parameters in the GPR model, and δ_{xx′} is the Kronecker delta function. The notation |x_j − x′_j| represents the Euclidean distance between x_j and x′_j. The parameters σ_f, l_j, and σ_t represent the amplitude, the length scale of the jth variable, and the noise in the data, respectively. For more details on the significance of these parameters, see Rasmussen and Williams (2006).
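A direct NumPy transcription of Equation (2) may help; this is a sketch reconstructed from the equation as stated above (in the experiments reported later, the authors rely on GPy's built-in Matérn 5/2 kernel rather than a hand-written one):

```python
import numpy as np

def matern52_ard(x, x_prime, sigma_f, lengthscales, sigma_t):
    """Matern 5/2 ARD kernel of Equation (2) for two decision vectors.

    r_j = |x_j - x'_j| / l_j; the noise term sigma_t^2 enters only when
    x and x' are the same point (Kronecker delta).
    """
    r = np.abs(x - x_prime) / lengthscales            # r_j for j = 1..n
    s = r.sum()
    k = sigma_f**2 * (1.0 + np.sqrt(5.0) * s + (5.0 / 3.0) * (r**2).sum()) \
        * np.exp(-np.sqrt(5.0) * s)
    if np.array_equal(x, x_prime):                    # delta_{xx'}
        k += sigma_t**2
    return k
```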
To build a GPR model, the parameters mentioned above can be estimated by maximizing the marginal likelihood function:

    p(y | X, Θ) = (1 / √|2πC|) exp(−(1/2) yᵀ C⁻¹ y).    (3)

After estimating the parameters, the model defined in Equation (2) can be used for the posterior predictive distribution at a new decision vector x∗. The GPR model provides a posterior predictive distribution which is also Gaussian:

    p(y∗ | x∗, X, y, Θ) = N( κ(x∗, X, Θ) C⁻¹ y, κ(x∗, x∗, Θ) − κ(x∗, X, Θ) C⁻¹ κ(X, x∗, Θ) ).    (4)

The posterior mean in the equation above is κ(x∗, X, Θ) C⁻¹ y, and the variance representing the uncertainty is κ(x∗, x∗, Θ) − κ(x∗, X, Θ) C⁻¹ κ(X, x∗, Θ).

2.2 Regression Trees

A regression tree recursively partitions the decision space such that the provided samples with similar objective function values are grouped together (Loh, 2011). Each node φ of the tree contains data Q^φ = (X^φ, y^φ) with N^φ samples, where each row of X^φ ∈ R^{N^φ×n} is a decision vector x_i and each row of y^φ ∈ R^{N^φ×1} is its corresponding objective value y_i, i = 1, . . . , N^φ. Node φ is split into nodes φ_left and φ_right according to a parameter θ = (j, t), which specifies the jth decision variable (1 ≤ j ≤ n) and a threshold t, by partitioning Q^φ into two disjoint subsets Q^{φ_left}_θ = {(x_i, y_i) ∈ Q^φ | x_ij ≤ t} and Q^{φ_right}_θ = Q^φ \ Q^{φ_left}_θ. That is, the split parameter θ partitions the samples (rows) in Q^φ according to the value of variable x_j of each decision vector x ∈ X^φ.

Given a node φ and a split parameter θ, the quality of the split, or the total loss, is calculated as:

    G(Q^φ, θ) = (N^{φ_left} / N^φ) H(Q^{φ_left}_θ) + (N^{φ_right} / N^φ) H(Q^{φ_right}_θ),    (5)

where N^{φ_left} and N^{φ_right} are the numbers of samples in Q^{φ_left}_θ and Q^{φ_right}_θ, respectively, and H(Q^φ) is a loss function. For instance, the loss function for the mean squared error is:

    H(Q^φ) = (1 / N^φ) Σ_{y_i ∈ Q^φ} (y_i − ȳ^φ)²,    (6)

where ȳ^φ is the mean objective value in Q^φ. An optimal split θ∗ of a node φ can be found by minimizing the loss function defined in Equation (5) using a single objective optimization algorithm. In the proposed TGPR-MO surrogates, the splitting process is recursed for both Q^{φ_left}_θ∗ and Q^{φ_right}_θ∗ until splitting a node would produce a leaf node with less than a predefined minimum number of samples, N_min.

For predicting any given decision vector, the regression tree is traversed to the respective leaf node. The prediction of the tree at the leaf node l, which contains the training subset Q^l with N^l samples, is ȳ^l = (1 / N^l) Σ_{y_i ∈ Q^l} y_i.
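A minimal sketch of this partitioning with scikit-learn, the package used for the trees in Section 4.1; min_samples_leaf plays the role of N_min, the default "squared_error" criterion corresponds to the loss in Equation (6), and the toy objective function is an assumption for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, N = 2, 500
N_min = 10 * n                               # N_min = 10n, as in Section 4.1.3
X = rng.uniform(-1.0, 1.0, size=(N, n))      # decision vectors
y = np.sum(X**2, axis=1)                     # a toy objective function

# Only the minimum leaf size is set; the depth is not controlled (Algorithm 1).
tree = DecisionTreeRegressor(min_samples_leaf=N_min)
tree.fit(X, y)

leaf_of = tree.apply(X)                      # leaf index of every training sample
sizes = [np.sum(leaf_of == leaf) for leaf in np.unique(leaf_of)]
print(min(sizes), max(sizes))                # expected within [N_min, 2*N_min - 1]
```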
3 Treed GPRs for Multiobjective Optimization (TGPR-MO)

The time complexity of building a full GPR is cubic in the size of the dataset. A suitable alternative to reduce the computational cost is to split the dataset and build GPRs exclusively in the trade-off region. The proposed TGPR-MO surrogates are based on this modelling approach. The following subsections describe the building process of the proposed TGPR-MO surrogates. A detailed accuracy and complexity analysis and a comparison with full GPRs and sparse GPRs are also provided.

3.1 Building the TGPR-MO Surrogates

In a generalized treed GPR surrogate, first a regression tree model is built using the provided data as described in Section 2.2. The splitting at the nodes is done by minimizing the total loss function (Equation 5). The subset of data at the lth leaf node is Q^l, where l = 1, . . . , L and L is the total number of leaf nodes in the regression tree built. A GPR is fitted at every leaf node of the regression tree using the data Q^l = (X^l, y^l). The GPRs are built in a similar fashion as described in Section 2.1 by maximizing the marginal likelihood at the leaf node l, as defined in Equation (3). The posterior predictive distribution of the lth GPR is p(y∗ | x∗, X^l, y^l, Θ), as defined in Equation (4). Building GPRs with smaller subsets of data reduces the overall cost of building surrogates compared to building a GPR with the entire dataset. However, building GPRs at all the leaf nodes becomes expensive when there are too many of them and the data subset at each leaf node is large. Moreover, for solving an offline data-driven MOP, an accurate approximation of the global landscape of the underlying objective functions is not required. A good approximation of the local landscape near the trade-off region is sufficient.

While performing multiobjective optimization using surrogates, the approximation accuracy in the trade-off region is crucial and directly influences the quality of the approximated Pareto optimal solutions. Hence, to obtain better quality solutions, it is desirable to have accurate approximations in the trade-off region. While building the regression trees, the splits at each node divide the decision space. The leaf nodes of the regression trees are the smallest regions the decision space is split into. In addition, the initial approximation of the regression tree surrogates (though not highly accurate) provides information about the tradeoffs between the objective functions. After building the regression trees with all the provided data, an MOEA is run considering them as objective functions. The solutions found by the MOEA are not accurate, but they provide an approximation of the Pareto front. The accuracy of the trade-off region is the approximation accuracy of the region enclosed by the leaf nodes that predict the approximated Pareto front. Later, local GPRs are built exclusively in the leaf nodes representing the trade-off region and achieve an accurate approximation only in the neighborhood of the Pareto set. Next, the algorithm to build the treed GPR surrogates for multiobjective optimization (TGPR-MO) tailored to solve offline data-driven MOPs is introduced.

Algorithm 1 starts with a dataset containing N samples with K objective functions and n decision variables. First, K regression trees (one per objective function) are built with all the provided data. Just the parameter N_min is adjusted, and the depth of the trees is not controlled.
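Fitting one local GPR on a leaf subset can be sketched with GPy, the package reported in Section 4.1; the kernel choice matches the setting of Section 4.1.3, while the function name and data handling are illustrative:

```python
import GPy

def fit_leaf_gpr(X_leaf, y_leaf):
    """Fit a local GPR on the data subset Q^l = (X^l, y^l) of one leaf node.

    The hyperparameters are set by maximizing the marginal likelihood of
    Equation (3); Matern 5/2 with ARD matches Section 4.1.3, and the
    Gaussian likelihood variance plays the role of sigma_t^2 in Eq. (2).
    """
    n = X_leaf.shape[1]
    kernel = GPy.kern.Matern52(input_dim=n, ARD=True)
    model = GPy.models.GPRegression(X_leaf, y_leaf.reshape(-1, 1), kernel)
    model.optimize()  # type-II maximum likelihood
    return model

# usage sketch: posterior predictive mean and variance, Equation (4)
# mean, var = model.predict(x_new.reshape(1, -1))
```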
After building the regression trees, there can be a maximum of N/N_min leaf nodes. Initially, there are no GPRs present at the leaf nodes, and the predictions are from the regression trees (i.e., ȳ^l). Next, a population is initialized and an MOEA is executed that iteratively builds GPRs at specific leaf nodes. Each iteration consists of running the MOEA for G_max generations (a predefined parameter) and building a GPR at one leaf node in every tree. In every generation, offspring individuals are produced using crossover and mutation operators and evaluated using the treed GPRs. Selection of individuals is then performed using the MOEA-specific selection criterion, and the process is repeated for G_max generations within an iteration. The inner workings of the proposed algorithm are exemplified in two-dimensional decision spaces in Section 1 and Figure 1 of the supplementary material.

After the MOEA completes G_max generations, the approximation errors of the solutions (for each objective) are compared. The comparison is done by first calculating the loss function value, here the mean squared error as defined in Equation (6), of the leaf nodes predicting the objective values of the solutions found by the MOEA. For the treed GPR surrogate corresponding to the jth objective, let l^j_1, . . . , l^j_s denote the leaf nodes containing the s solutions found by the MOEA and let H(Q^{l^j_1}), . . . , H(Q^{l^j_s}) denote the loss function values of those leaf nodes; then one GPR is built at the leaf node i∗ with the maximum loss value, that is, i∗ = arg max_{i=1,...,s} H(Q^{l^j_i}), using its subset of samples Q^{l^j_{i∗}}. If multiple solutions fall within the same leaf node, the loss value is calculated once and only one GPR is built for that leaf node. The process of building GPRs is repeated for the j = 1, . . . , K treed GPRs. The process of adding GPRs to the leaf nodes is repeated until either of the following criteria is met:

• The number of iterations has reached the predefined maximum of I_max.
• All the solutions in the MOEA's population fall in leaf nodes that have GPRs built.

The regression trees perform two tasks: (1) they provide an approximation of the underlying objective functions, and (2) they split the decision space into smaller regions. Using regression trees over other clustering algorithms is advantageous since they are optimized for reducing the prediction error. Other clustering algorithms, for example, K-means clustering and mean shift clustering, utilize a loss function in the decision space and not in the objective space. Therefore, regression trees provide a good prediction of the approximated objective functions in addition to splitting the decision space into subregions.
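The leaf-selection step just described can be sketched as follows, assuming a fitted scikit-learn tree for one objective; names such as gpr_leaves are hypothetical, and the loss is the per-leaf MSE of Equation (6):

```python
import numpy as np

def select_leaf_to_refine(tree, X_train, y_train, solutions, gpr_leaves):
    """Pick the leaf with maximum loss H(Q^l), Equation (6), among the
    leaves that contain the MOEA's current solutions and have no GPR yet."""
    train_leaf = tree.apply(X_train)            # leaf of every training sample
    candidates = set(tree.apply(solutions)) - set(gpr_leaves)
    best_leaf, best_loss = None, -np.inf
    for leaf in candidates:
        y_l = y_train[train_leaf == leaf]       # subset Q^l at this leaf
        loss = np.mean((y_l - y_l.mean()) ** 2)
        if loss > best_loss:
            best_leaf, best_loss = leaf, loss
    return best_leaf                            # None if all leaves have GPRs
```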
In the initial iterations, the MOEA first finds solutions using the approximation provided by the regression tree surrogates (which may have poor accuracy). These solutions have a higher approximation error, but they provide information about the tradeoffs between the objective functions. To improve the approximation accuracy, GPRs are built for solutions provided by the trees that have the maximum approximation error. This ensures that GPRs are built exclusively in the region of the Pareto set and where the trees' approximation is the worst, simultaneously. In later iterations, if a decision vector in the MOEA's population falls within the path of a leaf node where a GPR is already built, the posterior predictive mean of the GPR is used as the final prediction of the surrogate. Otherwise, the prediction is the mean value at the leaf node of the regression trees. Local GPRs are built sequentially because building a local GPR at a leaf node improves the accuracy of the solutions in the following iteration. Thus, many solutions found in the previous iteration (belonging to different leaf nodes of the trees) may be eliminated as newer and better solutions are discovered. Eliminating solutions reduces the number of local GPRs required to be built to approximate the underlying objective functions.

Figure 2: The flowchart for solving offline data-driven MOPs with TGPR-MO surrogates with an MOEA.

The flowchart for solving offline data-driven MOPs with TGPR-MO surrogates is shown in Figure 2. The approach starts with a dataset, and regression trees are built for each objective. The blocks within the dotted lines represent the building process of TGPR-MO surrogates as described in Algorithm 1. After Algorithm 1 is completed, the surrogates are composed of regression trees with GPRs at some of the leaf nodes. The built TGPR-MO surrogates are then used for multiobjective optimization using a second MOEA; or the MOEA used in Algorithm 1 may simply continue to run from the last population for an additional number of generations without further modifying the surrogates.
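The prediction rule just described (posterior mean where a leaf GPR exists, leaf mean otherwise) can be sketched as below; gprs is an assumed mapping from leaf indices to fitted local GPRs:

```python
import numpy as np

def tgpr_mo_predict(tree, gprs, x):
    """Predict one objective for decision vector x with a TGPR-MO surrogate.

    Leaves with a local GPR return its posterior mean and variance;
    leaves without one fall back to the regression tree's leaf mean.
    """
    x = np.asarray(x).reshape(1, -1)
    leaf = tree.apply(x)[0]
    if leaf in gprs:                           # a GPR was built at this leaf
        mean, var = gprs[leaf].predict(x)      # Equation (4)
        return mean.item(), var.item()
    return float(tree.predict(x)[0]), None     # leaf mean; no uncertainty
```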
The building process of TGPR-MO surrogates is illustrated in Figure 3. The figure shows the solutions obtained by the MOEA in some of the iterations of Algorithm 1 for two bi-objective test problems. The two problems are DTLZ2 with n = 10 (Figures 3a–3c) from the DTLZ (Deb et al., 2005) test suite and a distance-based visualizable test problem (DBMOPP) (Fieldsend et al., 2019) with n = 2 (Figures 3d–3f). The problem configuration for the DBMOPP problem was the same as for problem P1, which can be found in Table 1. For each problem, a dataset of size N = 2000 was generated by Latin hypercube sampling (LHS). The MOEA used was RVEA (Cheng et al., 2016) with standard parameter settings.

Figure 3: Progress of the building process of TGPR-MO surrogates for a bi-objective DTLZ2 (a)–(c) and a DBMOPP (d)–(f) problem. The plots show the solutions (in blue) of the MOEA (predicted by the treed GPR surrogates) at different iterations I of the building process. The Pareto front of the underlying MOP is shown in orange.

Table 1: Configurations of the DBMOPP problems used. All problems were tested with dimensions n ∈ {2, 5, 7, 10} and numbers of objectives K ∈ {3, 5, 7}.

P1: number of disconnected set regions = 0, number of local fronts = 0, number of dominance resistance regions = 0, number of discontinuous regions = 0
P2: number of disconnected set regions = 1, number of local fronts = 0, number of dominance resistance regions = 0, number of discontinuous regions = 0
P3: number of disconnected set regions = 2, number of local fronts = 0, number of dominance resistance regions = 0, number of discontinuous regions = 0
P4: number of disconnected set regions = 0, number of local fronts = 0, number of dominance resistance regions = 1, number of discontinuous regions = 0

The first column of Figure 3 shows the solutions obtained by the MOEA in the first iteration, before any GPRs are built at the leaf nodes of the trees. As the objective values of the solutions are predicted by the regression trees, the obtained solutions do not form a smooth Pareto front. In the second column, it can be observed that the solutions are obtained after four more iterations. A few GPRs are built at the leaf nodes, and the approximated Pareto front gradually becomes smoother. At this iteration, the objective values of the solutions are predicted by the regression trees and GPRs at leaves for either or both of the objective functions. The last column shows the final iteration of the surrogate building process, when all solutions in the MOEA's population are predicted by leaf nodes that contain GPRs. Building the surrogates for the DTLZ2 problem consumes more iterations than for the DBMOPP problem instance. This behavior is due to the Pareto set of DTLZ problems being located in a larger region of the decision space. Hence, more GPRs are required to be built at the leaves of the trees to approximate the trade-off region accurately.

Figure 4 shows an illustration of the leaf nodes that are considered for building GPRs in two subsequent iterations for a bi-objective minimization problem. The iteration I ∈ {1, 2} and the objective j ∈ {1, 2} are also indicated. The red-colored leaf nodes are chosen for building GPRs at each iteration. In the first iteration, the solutions are predicted by nodes 4 and 5 for the first objective and node 5 for the second objective. Node 4 and node 5 of the first and second objectives' trees have the highest RMSE. Hence, GPRs are built using the subsets of data in those leaf nodes. In the second iteration, the solutions predicted by the trees are from node 4 and node 5 (with the GPR built) for the first objective. For the second objective, the solutions are predicted by leaf node 4 and node 5 (with the GPR built). The process of building GPRs is repeated at node 4 and node 5 for the first and second objectives' trees, respectively. The process is repeated until the stopping criteria mentioned before are satisfied.

Figure 4: Illustration of the building process of GPRs at leaf nodes of the regression trees in two subsequent iterations. Red leaf nodes denote that a GPR was built using the subset of data.
3.2 Accuracy Analysis

The fitness landscape of a bi-objective DBMOPP problem with n = 2 (with the same configuration as P1 in Table 1) is illustrated in Figure 5. The figure shows the accuracy of the approximations of full GPR, sparse GPR, and TGPR-MO surrogates. The accuracy is measured as the RMSE (described in Section 4.1) between the approximated objective values of the surrogates and the evaluated values of the underlying objective functions. The number of samples in the dataset was N = 2000 with Latin hypercube sampling (LHS). The contour lines represent the underlying objective functions' landscape in Figure 5a and the approximated objective functions' landscapes in Figures 5b–5d. Further close-ups of the trade-off regions for the sparse GPR and TGPR-MO surrogates are shown in Figures 5e and 5f, respectively. The color grading represents the RMSE between the approximated and the underlying objective functions, and the blue dots represent the Pareto set.

Figure 5: Contour plots of (a) the underlying objective functions and the landscapes approximated by (b) full GPRs, (c) sparse GPRs, and (d) the proposed TGPR-MO surrogates. Close-ups of the Pareto set for (e) sparse GPRs and (f) the proposed TGPR-MO surrogates. The contour lines (solid and dotted) indicate the values of the objective functions f_1(x) and f_2(x), respectively. The color shade represents the RMSE in the approximation (darker is higher RMSE), and the blue dots show the Pareto set.
These GPRs can have Nmin to 2Nmin occurs when the subset size is larger than or equal to 2Nmin. Thus every leaf node has at − 1 samples. As these local GPRs have a least Nmin samples and a maximum of 2Nmin smaller number of samples, the computational cost is low. The complexity of building TGPR-MO surrogates can vary depending on the function landscape, the number of decision variables, and the provided data. The worst-case complexity will be achieved − 1 samples when GPRs are built at all the leaf nodes of the regression tree with 2Nmin − 1)3). at each leaf node. The complexity of building a GPR at a leaf node is O((2Nmin As the total number of leaf nodes (in the worst case) is −1 , the total complexity of − 1)2). For an MOP with K objective building GPRs at all the leaf nodes is O(N (2Nmin − 1)2) with a memory com- functions, the worst case time complexity is O(KN (2Nmin plexity of O(KN (2Nmin − 1)). N 2Nmin Even in the worst case, the computational cost of building TGPR-MO surrogates is significantly smaller than building full GPRs that have time and memory complex- ities of O(N 3) and O(N 2), respectively. The complexity for building sparse GPRs with M (where M < N) inducing points is O(KN M 2). However, as mentioned previously, finding the induction points uses a gradient descent or greedy search algorithm that becomes expensive when the sample size is large, increasing the overall computational cost. In contrast, the complexity of TGPR-MO surrogates during optimization is primar- ily due to evaluating the individuals using the regression trees and GPRs at the leaves, which is significantly lower. The optimization results and comparisons of the proposed TPGR-MO surrogates with other surrogate models are shown in the next section. 4 Experimental Results The goal of the proposed TGPR-MO surrogates is to build computationally cheaper surrogates when dealing with large datasets for solving offline data-driven MOPs. This section reports the tests performed to find whether TGPR-MO surrogates have signifi- cant improvement over sparse GPRs and full GPRs in computation time and the quality of the solutions (in hypervolume and accuracy). The starting point of the experiments was data generated from distance-based visualizable test problems (DBMOPP) (Field- send et al., 2019) for a better understanding of the behavior of the surrogates. For the DBMOPP test problems, the solutions in the decision and objective spaces can be simultaneously visualized. Thus, the search behavior can be observed with the progress 14 Evolutionary Computation Volume xx, Number x l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / d o i / / . / 1 0 1 1 6 2 e v c o _ a _ 0 0 3 2 9 2 1 5 5 3 4 0 e v c o _ a _ 0 0 3 2 9 p d / . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 329 Treed GPR for Offline Data-Driven MOPs of the optimization process. In addition, the complexity of these problems can be con- trolled by the features (e.g., varying density, number of local fronts, dominance resis- tance regions, numbers of objective functions, and decision variables). These advan- tages provide more control over the problems and make them adjustable to reflect the needs of real-world optimization problems. These features of DBMOPP test problems prove to be more advantageous compared to DTLZ (Deb et al., 2005) benchmark prob- lems. A detailed description of the experiment settings, results, and in-depth analysis are provided in the following subsections. 
4.1 Experimental Setup All the approaches for solving the offline data-driven MOP were coded in Python uti- lizing the DESDEO framework (Misitano et al., 2021) (https://desdeo.it.jyu.fi).1 The experiments were executed on one node of an HPC cluster equipped with AMD Rome CPUs, each node having 128 cores running at 2.6 GHz, with 256 GiB of memory. Each individual run was executed on one CPU core and allocated a maximum memory us- age of 2 GiB. The regression tree was built using the sklearn Python package (Pedregosa et al., 2011). For building the GPRs at the leaf nodes, the GPy (GPy, 2012) Python package was used. Benchmark Problems 4.1.1 Four DBMOPP problems P1–4, as shown in Table 1, were used. The problem instances and data were generated by the code provided by Fieldsend et al. (2019). All combi- nations of numbers of objective functions (K ∈ {3, 5, 7}) and numbers of decision vari- ables (n ∈ {2, 5, 7, 10}) were used for the tests. The total number of problem instances tested was 48, and every problem instance represented a different type of problem characteristic. 4.1.2 Datasets For generating the data, Latin hypercube sampling (LHS) and multivariate normal sam- pling (MVNS) (Forrester et al., 2008) were used. In MVNS sampling, the objective func- tions were considered independent with mean at the mid-point of the decision space, that is, zero for DBMOPP problems. The variance of the sampling distribution was set to 0.1 for all the objective functions. Using MVNS sampling tests the ability of the surrogate models and optimization algorithm to handle biased (or skewed) datasets (Mazumdar et al., 2022). Sample sizes of initial data (N ∈ {2000, 10000, 50000}) were chosen for the tests. A total of 288 cases were tested, and 31 sets of data with a random seed were generated for each test case. Each of these datasets was the starting point of the three different surrogate models that were tested. These individual runs were independent, and the results were used to compare the surrogates’ performances statistically. Settings for TGPR-MO Surrogates 4.1.3 The MOEA for building TGPR-MO surrogates was RVEA (Cheng et al., 2016), a decomposition-based MOEA (Zhang and Li, 2007; Deb and Jain, 2014). Decomposition- based MOEAs have shown to be effective in solving offline data-driven MOPs with more than three objective functions. However, the surrogate is not limited to RVEA and one can use any MOEA of choice. The parameters for RVEA were kept the same as sug- gested by Cheng et al. (2016). The parameter Gmax was set to 50. The maximum number 10n . Such a value of Imax was chosen considering of iterations was set to Imax = N Nmin = N 1Source code available at https://github.com/industrial-optimization-group/TreedGP_MOEA Evolutionary Computation Volume xx, Number x 15 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / d o i / / . / 1 0 1 1 6 2 e v c o _ a _ 0 0 3 2 9 2 1 5 5 3 4 0 e v c o _ a _ 0 0 3 2 9 p d / . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 329 A. Mazumdar et al. one GPR is built at a leaf node for all the trees in each iteration (as the maximum number of leaf nodes is N ). The loss function used for building the trees was MSE as mentioned Nmin = 10n. These parameters were chosen because sufficient de- in Equation (6) with Nmin sign points are necessary for building GPRs at the leaves. 
Based on recommendations from the GPR literature (Chapman et al., 1994; Jones et al., 1998), at least 10n points are required at each leaf node. The kernel used for building the GPRs at the leaf nodes was Mátern 5/2 with automatic relevance determination enabled. 4.1.4 Other Surrogates Tested The proposed TGPR-MO surrogates were compared with sparse GPR and full GPR surrogates. The same GPy package and kernel were used for building both of these surrogates as for the TGPR-MO surrogates. For sparse GPR surrogates, the number of induction points was set to M = 10n. Here, the overall process of solving an offline data- driven MOP with a specific surrogate is referred to as an approach for simplicity. It should be noted that random forest was not considered in the tests as the focus of this paper is on GPR surrogates. Parameter Settings of MOEA (for Solving the Offline Data-Driven MOP) 4.1.5 For solving the offline data-driven MOP with surrogates as objective functions, RVEA with the same default parameter settings was used. The termination criterion for RVEA was 1,000 generations, which was sufficient for all the approaches tested to converge. To generate a uniform Pareto front, the reference vectors are rearranged or adapted after a certain number of generations in RVEA. The reference vector adaptation rate was set to once every 100 generations. Performance Indicators 4.1.6 The quality of the solutions obtained by the MOEA was measured in terms of their hypervolume (HV) after evaluating them with the underlying objective functions of 1 , y∗ the test problems. For computing the HV indicator, the reference point of y∗ = (y∗ 2 , . . . , y∗ K ) in the objective space was chosen, such that it is dominated by all the solutions. In this paper, the reference point used was y∗ = (2 K ) because this point is always dominated in DBMOPP problems. The multivariate RMSE of the ob- jective values of the solutions obtained by the MOEA with their respective underlying objective values was used to measure the accuracy. The multivariate RMSE is the Eu- clidean distance between the approximated and evaluated underlying objective func- j =1( ˆfj,i − fj,i )2, where s is the tion values of the solutions and is given by 1 s number of solutions, ˆfj,i and fj,i are the approximated and the underlying objective value, respectively, for the ith solution and j th objective. Finally, the computational cost of the various methods was measured as the time taken in seconds to build the surrogates. K, . . . , 2 (cid:12)(cid:11) K, 2 s i=1 (cid:11) √ √ √ K In this paper, multivariate RMSE is referred to as RMSE for simplicity. The hyper- volume and RMSE indicators are used here to benchmark the performance of TGPR- MO surrogates for solving an offline data-driven MOP. Evaluating the solutions with the underlying objective functions in an offline data-driven MOP may not be possible in real life. 4.2 Results and Discussions While running the experiments, it was observed that building full GPR surrogates with 10,000 and 50,000 sample sizes consistently gave out-of-memory errors. The storage 16 Evolutionary Computation Volume xx, Number x l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / d o i / / / . 1 0 1 1 6 2 e v c o _ a _ 0 0 3 2 9 2 1 5 5 3 4 0 e v c o _ a _ 0 0 3 2 9 p d / . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 329 Treed GPR for Offline Data-Driven MOPs complexity of full GPRs is O(KN 2), and each element of the array is a 64-bit float. 
Hence, the memory requirement for three objective functions and sample sizes of 10,000 and 50,000 is 2.3 GiB and 56 GiB, respectively. Thus, building GPR surrogates using all the provided offline data will become almost impossible with readily available computing resources when the sample size is large. Hence, the results for full GPR surrogates are not included for sample sizes of 10,000 and 50,000. A pairwise Wilcoxon two-tailed significance test was conducted to compare the performance of the different approaches. The calculated p-values were Bonferonni cor- rected, and α = 0.05 was considered for rejecting the null hypothesis (an approach is not significantly better or worse than another approach). The median values were com- pared to determine whether an approach is significantly better or worse than another one, provided that the p-value is less than α. A pairwise comparison of the approaches with a scoring system was used for ranking the approaches. An approach is given a score of +1 if it is significantly better than the other approach. A score of −1 is given to the approach if it is significantly worse than the other approach. If the approach is not significantly better or worse than the other approach, a score of zero is given to both approaches. The sum of the scores is used for ranking all the approaches (a higher score gives a better rank) for the indicator being compared. A rank of “1” indicates that an approach has performed significantly better than all other approaches. Equal ranks indicate that those approaches are not significantly different in their performance. The performance of the approaches is summarized in Table 2. The ranks of the ap- proaches for three different indicators are color-coded in green, yellow, and red in the order of best to worst. It should be noted that these rankings are categorized with re- spect to the sample size and sampling strategies. The total number of instances is 96 (48 for LHS and MVNS each) for each sample size. Each instance consists of the com- bination of the various problem settings, that is, the number of samples (N ), sampling strategy, number of objective functions (K), number of decision variables (n), and prob- lem configuration. The number of instances in which full GPRs or sparse GPRs perform better, worse, or not significantly different compared to the proposed TGPR-MO surro- gates is denoted by “+,” “-,” and “≈,” respectively. For TGPR-MO surrogates, only the number of instances where it performed significantly better than both full GPRs and sparse GPRs (or it ranked the best) is shown. The rows measuring “Time” in Table 2 show that the proposed TGPR-MO surro- gates were computationally the cheapest compared to the full GPR and sparse GPR for different sample sizes and sampling strategies. The number of instances TGPR-MO per- formed significantly better than the other two surrogates in each subcategory is close to 48 (that was the total number of instances in each subcategory). The TGPR-MO surro- gates performed better than sparse GPR surrogates in hypervolume for all sample sizes and sampling strategies. However, full GPRs performed the best in hypervolume for a sample size of 2,000. For 2,000 samples, sparse GPRs outperformed TGPR-MO surro- gates for both sampling strategies in RMSE, and full GPR performed the best. 
The RMSE of the solutions for TGPR-MO surrogates was better than that for sparse GPRs for sample sizes of 10,000 and 50,000 with LHS sampling only, whereas for MVNS sampling, the sparse GPRs slightly outperformed TGPR-MO surrogates for 10,000 and 50,000 samples. For a smaller sample size, one may choose full GPRs because they give the best performance in hypervolume and RMSE. However, full GPRs become almost impossible to build due to their high computational cost when the sample size is large. One can use TGPR-MO surrogates for larger sample sizes because they perform better in hypervolume and RMSE, and excellently in computation time, compared to sparse GPRs.

Table 2: Summary of the pairwise comparison of the hypervolume (HV), RMSE, and time (seconds) for the different approaches. The number of instances in which full GP and sparse GP surrogates perform better, worse, and not significantly different compared to the TGP-MO surrogates is indicated by "+," "-," and "≈," respectively. The full GP runs failed due to memory overflow in the instances marked "—." The approaches' performance is ranked by the color code green, yellow, and red in the order of best to worst, respectively.

Sample size  Sampling  Indicator  Full GP (+/-/≈)  Sparse GP (+/-/≈)  TGP-MO (+)
2,000        LHS       HV         9 / 9 / 30       4 / 38 / 6         9
2,000        LHS       RMSE       43 / 2 / 3       19 / 21 / 8        1
2,000        LHS       Time (s)   0 / 48 / 0       1 / 46 / 1         46
2,000        MVNS      HV         17 / 14 / 17     14 / 28 / 6        12
2,000        MVNS      RMSE       34 / 7 / 7       20 / 16 / 12       2
2,000        MVNS      Time (s)   0 / 48 / 0       1 / 47 / 0         47
10,000       LHS       HV         —                4 / 33 / 11        33
10,000       LHS       RMSE       —                19 / 23 / 6        23
10,000       LHS       Time (s)   —                0 / 48 / 0         48
10,000       MVNS      HV         —                11 / 29 / 8        29
10,000       MVNS      RMSE       —                21 / 16 / 11       16
10,000       MVNS      Time (s)   —                0 / 48 / 0         48
50,000       LHS       HV         —                3 / 35 / 10        35
50,000       LHS       RMSE       —                15 / 24 / 9        24
50,000       LHS       Time (s)   —                0 / 48 / 0         48
50,000       MVNS      HV         —                10 / 29 / 9        29
50,000       MVNS      RMSE       —                23 / 19 / 6        19
50,000       MVNS      Time (s)   —                0 / 48 / 0         48

However, for non-uniform sampling strategies, for example, MVNS, TGPR-MO surrogates suffer slightly in RMSE compared to sparse GPR surrogates. Sparse GPR surrogates have a better RMSE than TGPR-MO surrogates because the variational parameters are selected by minimizing the KL divergence. Thus, the inducing inputs selected are not skewed even if the provided dataset is, for example, with MVNS.

The performances of a few selected instances are shown in Table 3. The table shows the median and the standard deviation of the hypervolume, RMSE, and time taken to build the surrogates for the three different approaches over 31 runs. The instances shown in the table were chosen based on the maximum difference between the median hypervolumes of the best and the second best performing surrogates. One instance was selected for each sample size, sampling strategy, and number of objective functions. The figures in bold represent the best performing approaches. It can be observed that the proposed TGPR-MO surrogates performed the best in building time in the instances shown.
It can also be observed that TGPR-MO surrogates produce solutions with an improved hypervolume and RMSE for sample sizes of 10,000 and 50,000 compared to sparse GPR surrogates. It can be observed that the building times of TGPR-MO surrogates are less 18 Evolutionary Computation Volume xx, Number x 329 Treed GPR for Offline Data-Driven MOPs - O M R P G T 1 0 + E 0 4 1 . 1 0 + E 0 2 1 . 1 0 + E 2 6 2 . 1 0 + E 2 1 1 . 1 0 + E 3 8 2 . 1 0 + E 3 5 . 1 1 0 + E 1 3 1 . 0 0 + E 5 4 2 . 1 0 + E 9 3 1 . 0 0 + E 0 0 8 . 1 0 + E 7 2 1 . 1 0 + E 9 6 1 . 1 0 + E 9 4 2 . 0 0 + E 0 8 . 8 1 0 + E 3 9 5 . 1 0 + E 1 1 . 2 1 0 + E 8 7 3 . 1 0 + E 2 3 1 . 1 0 + E 2 4 1 . 0 0 + E 2 5 1 . 1 0 + E 3 7 4 . 1 0 + E 9 1 1 . 1 0 + E 4 9 8 . 1 0 + E 2 0 3 . ) s ( e m T i e s r a p S R P G 1 0 + E 4 6 8 . 1 0 + E 6 1 2 . 2 0 + E 1 5 5 . 2 0 + E 3 4 1 . 2 0 + E 0 1 . 2 1 0 + E 6 4 3 . 2 0 + E 6 5 3 . 2 0 + E 9 1 1 . 2 0 + E 1 5 5 . 1 0 + E 4 6 7 . 2 0 + E 6 3 . 4 1 0 + E 2 3 6 . 3 0 + E 5 4 1 . 2 0 + E 2 2 2 . 3 0 + E 2 5 6 . 3 0 + E 4 2 1 . 3 0 + E 7 3 9 . 3 0 + E 8 3 1 . 3 0 + E 2 1 4 . 2 0 + E 9 6 6 . 3 0 + E 8 3 2 . 2 0 + E 5 6 4 . 3 0 + E 1 5 3 . 2 0 + E 4 5 6 . l l u F R P G 2 0 + E 6 3 1 . 1 0 + E 5 8 1 . 2 0 + E 6 1 6 . 2 0 + E 0 4 1 . 2 0 + E 0 2 3 . 1 0 + E 8 3 5 . 2 0 + E 5 2 3 . 1 0 + E 6 7 6 . 2 0 + E 4 3 . 5 2 0 + E 1 0 1 . 2 0 + E 7 5 3 . 1 0 + E 0 4 4 . — — — — — — - O M R P G T E S M R e s r a p S R P G l l u F R P G - O M R P G T e s r a p S R P G l l u F R P G e m u l o v r e p y H 1 0 - E 7 6 . 2 1 0 - E 8 3 . 1 1 0 - E 3 3 . 9 1 0 - E 6 6 . 1 1 0 - E 6 6 . 1 1 0 - E 1 1 . 1 1 0 - E 5 5 . 8 1 0 - E 9 9 . 1 0 0 + E 4 0 . 2 1 0 - E 8 7 . 2 1 0 - E 0 8 . 9 1 0 - E 8 3 . 2 1 0 - E 0 5 . 1 2 0 - E 5 0 . 5 1 0 - E 9 0 . 7 1 0 - E 8 2 . 2 1 0 - E 3 7 . 9 2 0 - E 2 8 . 8 1 0 - E 9 0 . 9 1 0 - E 2 5 . 1 1 0 - E 3 8 . 1 1 0 - E 2 2 . 1 1 0 - E 2 2 . 1 1 0 - E 2 4 . 1 0 0 + E 4 0 . 1 1 0 - E 2 7 . 5 2 0 - E 8 1 . 4 1 0 - E 4 7 . 2 0 0 + E 9 7 . 1 0 0 + E 5 7 . 1 1 0 - E 8 3 . 4 0 0 + E 6 1 . 2 1 0 - E 0 7 . 9 0 0 + E 2 5 . 1 1 0 - E 9 0 . 6 0 0 + E 6 7 . 2 1 0 - E 9 5 . 1 0 0 + E 6 7 . 3 1 0 - E 5 5 . 8 1 0 - E 5 9 . 8 1 0 - E 5 8 . 4 0 0 + E 0 1 . 2 1 0 - E 2 2 . 2 0 0 + E 6 5 . 1 1 0 - E 4 8 . 1 0 0 + E 8 6 . 1 1 0 - E 0 4 . 3 0 0 + E 2 3 . 2 0 0 + E 0 1 . 1 0 0 + E 1 9 . 2 0 0 + E 4 3 . 1 1 0 - E 9 6 . 3 2 0 - E 1 1 . 5 2 0 - E 6 2 . 8 0 0 + E 8 0 . 1 1 0 - E 5 6 . 4 0 0 + E 6 8 . 1 1 0 - E 5 9 . 1 1 0 - E 0 8 . 7 1 0 - E 7 0 . 2 — — — — — — 1 0 + E 8 4 . 7 0 0 + E 3 7 . 2 4 0 + E 4 6 . 1 2 0 + E 9 5 . 5 6 0 + E 3 2 . 9 4 0 + E 2 3 . 9 1 0 + E 7 2 . 6 0 0 + E 0 1 . 5 4 0 + E 0 3 . 1 3 0 + E 0 2 . 1 6 0 + E 2 8 . 8 5 0 + E 0 3 . 5 1 0 + E 8 5 . 7 1 0 - E 3 9 . 5 4 0 + E 5 6 . 1 2 0 + E 6 2 . 8 6 0 + E 9 5 . 8 5 0 + E 1 6 . 2 1 0 + E 7 2 . 6 0 0 + E 1 3 . 3 4 0 + E 5 7 . 1 2 0 + E 5 1 . 3 6 0 + E 4 2 . 9 5 0 + E 1 0 . 1 1 0 + E 9 4 . 5 0 0 + E 5 3 . 9 4 0 + E 7 2 . 1 3 0 + E 3 2 . 1 6 0 + E 5 5 . 6 6 0 + E 4 2 . 1 1 0 + E 5 6 . 5 0 0 + E 5 3 . 6 4 0 + E 6 0 . 1 3 0 + E 0 1 . 2 6 0 + E 2 4 . 6 5 0 + E 2 6 . 2 1 0 + E 6 2 . 6 0 0 + E 8 7 . 9 4 0 + E 7 2 . 1 2 0 + E 9 0 . 6 6 0 + E 1 7 . 6 5 0 + E 0 9 . 1 1 0 + E 7 9 . 4 0 0 + E 0 4 . 5 4 0 + E 3 2 . 1 3 0 + E 0 5 . 2 6 0 + E 5 5 . 6 6 0 + E 2 2 . 1 1 0 + E 4 5 . 7 0 0 + E 2 0 . 4 4 0 + E 6 4 . 1 2 0 + E 3 7 . 6 6 0 + E 1 2 . 9 5 0 + E 0 0 . 1 1 0 + E 1 7 . 5 0 0 + E 4 1 . 4 4 0 + E 6 2 . 1 3 0 + E 8 0 . 2 6 0 + E 1 8 . 7 5 0 + E 6 3 . 
The total number of samples utilized to build the TGPR-MO surrogates as a function of the number of iterations is shown in Figure 6 for six different problem instances with sample sizes of 2,000, 10,000, and 50,000. The solid line shows the mean number of samples used, and the shaded region denotes the 95% confidence interval of the runs for each treed GPR surrogate (that is, for each objective). The plots have been extended along the iteration axis to the maximum allowed number of iterations, $I_{\max} = N/(10n)$.
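As a concrete illustration of this iteration budget, a worked evaluation for one of the larger experimental settings (N = 50,000 samples and n = 5 decision variables) is:

$$I_{\max} = \frac{N}{10\,n} = \frac{50\,000}{10 \times 5} = 1000.$$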
The iteration axis is broken when $I_{\max}$ is large to reduce the width of the plots. It can be observed that the number of samples utilized converges before the maximum number of iterations is reached and varies with the objective. Certain problem instances required more samples during the building process than others, especially due to the number of decision variables and the characteristics of the problem. If the trade-off region is located in a smaller part of the decision space, the building process converges quickly and therefore consumes fewer samples.

Figure 6: The median and 95% confidence interval of the total number of samples utilized (or the sum of the samples available at the leaf nodes that are considered for building GPRs) with iterations for three different DBMOPP problem instances, displayed in the format (N, sampling strategy, problem, K, n). The iteration axis is extended to the maximum possible iteration, $N/(10n)$. The iteration axes of plots 6c–6f are broken to accommodate the plots.

The quantity of data available in the trade-off region affects the accuracy and hypervolume of the solutions obtained, which is a general challenge when solving offline data-driven MOPs. The proposed TGPR-MO surrogates split the decision space into subregions and approximate the underlying objective functions in the trade-off region. They are best suited when the Pareto set is located in a small region of the decision space. When the Pareto set spans a larger region, the prediction of the TGPR-MO surrogates comes from multiple leaf node GPRs and has discontinuities near the splits of the regression trees. Therefore, the approximated Pareto front has some discontinuities compared to those obtained with sparse GPRs or full GPRs. Further tests with the DTLZ (Deb et al., 2005) benchmark problems are given in the supplementary material.

5 Conclusions

This paper proposed tailored surrogate models that can be built at a lower computational cost than sparse GPRs and full GPRs for solving offline data-driven MOPs with large datasets. The proposed TGPR-MO surrogates performed significantly better than sparse GPR surrogates in hypervolume, RMSE, and computation time for most of the instances of the DBMOPP problems. The full GPR surrogates failed to complete the runs for larger sample sizes due to memory restrictions. Thus, it can be concluded that the proposed TGPR-MO surrogates are best suited for solving offline data-driven MOPs with a large dataset.

A GPR at a leaf node of a tree approximates a certain region of the decision space. This feature can be exploited when solving offline data-driven MOPs with preferences from the decision maker in an a priori or interactive fashion. The provided preferences for the objective functions can be utilized to build GPRs at the leaf nodes, and only the solutions that follow the preferences need to be approximated. Developing an interactive framework utilizing the TGPR-MO surrogates will be one of the future works.

Utilizing the correlation between the objective functions using multi-target regression trees (Osojnik et al., 2018) and multi-output GPRs (Borchani et al., 2015) is an interesting future research direction. Tests and comparisons of the uncertainty prediction provided by the proposed TGPR-MO surrogates will be conducted for solving offline data-driven MOPs. Selecting a suitable kernel for a given dataset is an active research topic, and integrating automatic kernel selection into the TGPR-MO surrogates would be quite beneficial. One of the major drawbacks of the TGPR-MO surrogates is the discontinuity between the leaf node GPRs. Tackling the discontinuity in the prediction (van Stein et al., 2015; Wang et al., 2017) at the partitions of the decision space (as provided by the tree) will be a future task.
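To make the leaf-node structure and the source of these discontinuities concrete, the following is a minimal, hypothetical sketch of the general treed-GPR idea: a regression tree partitions the decision space, and an independent GPR is fit to the data in each leaf. It is not the authors' exact TGPR-MO building algorithm (which grows the tree and builds leaf GPRs sequentially near Pareto optimal solutions); the tree settings, kernel, and one-dimensional data are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import GPy

# Illustrative 1-D dataset standing in for one objective of an offline dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 1))
y = np.sin(8.0 * X[:, 0]) + 0.1 * rng.normal(size=500)

# Step 1: partition the decision space with a regression tree
# (depth and leaf size are illustrative choices, not the paper's settings).
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=50).fit(X, y)
train_leaves = tree.apply(X)  # leaf index of every training point

# Step 2: fit an independent GPR on the data that falls into each leaf.
leaf_gprs = {}
for leaf in np.unique(train_leaves):
    sel = train_leaves == leaf
    gpr = GPy.models.GPRegression(X[sel], y[sel].reshape(-1, 1),
                                  GPy.kern.Matern52(input_dim=1))
    gpr.optimize()
    leaf_gprs[leaf] = gpr

def predict(X_new):
    """Route each query point to the GPR of its leaf. Because adjacent
    leaves hold different GPRs, the prediction can be discontinuous
    across the tree's split boundaries."""
    leaves = tree.apply(X_new)
    mean = np.empty((X_new.shape[0], 1))
    for leaf in np.unique(leaves):
        sel = leaves == leaf
        mu, _ = leaf_gprs[leaf].predict(X_new[sel])
        mean[sel] = mu
    return mean

print(predict(np.linspace(0.0, 1.0, 5).reshape(-1, 1)).ravel())
```

The sketch only conveys the partition-and-fit structure: each query is answered by the GPR of a single leaf, so predictions need not agree on the two sides of a split, which is exactly the discontinuity discussed above.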
Further improvements can be made in the way the trees partition the decision space. Instead of using a traditional loss function, a nonlinear trade-off criterion can be formulated to split the nodes. Such splitting criteria will enable the TGPR-MO surrogates to approximate the trade-off region with fewer samples. Testing the TGPR-MO surrogates on real-life offline data-driven MOPs also remains as future work.

Acknowledgments

This research was partly supported by the Academy of Finland (grant numbers 311877 and 322221) and is related to the thematic research area DEMO (Decision Analytics utilizing Causal Models and Multiobjective Optimization, http://www.jyu.fi/demo) of the University of Jyväskylä. The numerical experiments were performed on the Mahti supercomputer provided by CSC (https://www.csc.fi).

References

Assael, J.-A. M., Wang, Z., Shahriari, B., and de Freitas, N. (2014). Heteroscedastic treed Bayesian optimisation. CoRR. Retrieved from arXiv:1410.7172.

Borchani, H., Varando, G., Bielza, C., and Larrañaga, P. (2015). A survey on multi-output regression. WIREs Data Mining and Knowledge Discovery, 5(5):216–233. 10.1002/widm.1157

Chapman, W., Welch, W., Bowman, K., Sacks, J., and Walsh, J. (1994). Arctic sea ice variability: Model sensitivities and a multidecadal simulation. Journal of Geophysical Research: Oceans, 99(C1):919–935. 10.1029/93JC02564

Cheng, R., Jin, Y., Olhofer, M., and Sendhoff, B. (2016). A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Transactions on Evolutionary Computation, 20(5):773–791. 10.1109/TEVC.2016.2519378

Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART model search. Journal of the American Statistical Association, 93(443):935–948. 10.1080/01621459.1998.10473750

Chugh, T., Sindhya, K., Hakanen, J., and Miettinen, K. (2019). A survey on handling computationally expensive multiobjective optimization problems with evolutionary algorithms. Soft Computing, 23:3137–3166. 10.1007/s00500-017-2965-0

Das, K., and Srivastava, A. N. (2010). Block-GP: Scalable Gaussian process regression for multimodal data. In 2010 IEEE International Conference on Data Mining, pp. 791–796.

Deb, K., and Jain, H. (2014). An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: Solving problems with box constraints. IEEE Transactions on Evolutionary Computation, 18(4):577–601. 10.1109/TEVC.2013.2281535

Deb, K., Thiele, L., Laumanns, M., and Zitzler, E. (2005). Scalable test problems for evolutionary multiobjective optimization. In A. Abraham, L. Jain, and R. Goldberg (Eds.), Evolutionary multiobjective optimization: Theoretical advances and applications, pp. 105–145. Springer.
Emmerich, M., Giannakoglou, K., and Naujoks, B. (2006). Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Transactions on Evolutionary Computation, 10(4):421–439. 10.1109/TEVC.2005.859463

Fieldsend, J. E., Chugh, T., Allmendinger, R., and Miettinen, K. (2019). A feature rich distance-based many-objective visualisable test problem generator. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 541–549.

Forrester, A., Sobester, A., and Keane, A. (2008). Engineering design via surrogate modelling. John Wiley & Sons.

GPy (2012). GPy: A Gaussian process framework in Python. http://github.com/SheffieldML/GPy

Gramacy, R. B., and Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483):1119–1130. 10.1198/016214508000000689

Hughes, E. J. (2001). Evolutionary multi-objective ranking with uncertainty and noise. In Proceedings of Evolutionary Multi-Criterion Optimization, pp. 329–343.

Jin, Y., Wang, H., Chugh, T., Guo, D., and Miettinen, K. (2019). Data-driven evolutionary optimization: An overview and case studies. IEEE Transactions on Evolutionary Computation, 23(3):442–458. 10.1109/TEVC.2018.2869001

Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492. 10.1023/A:1008306431147

Kim, H.-M., Mallick, B. K., and Holmes, C. C. (2005). Analyzing nonstationary spatial data using piecewise Gaussian processes. Journal of the American Statistical Association, 100(470):653–668. 10.1198/016214504000002014

Loh, W.-Y. (2011). Classification and regression trees. WIREs Data Mining and Knowledge Discovery, 1(1):14–23. 10.1002/widm.8

Mazumdar, A., Chugh, T., Hakanen, J., and Miettinen, K. (2020). An interactive framework for offline data-driven multiobjective optimization. In Proceedings of Bioinspired Optimization Methods and Their Applications, pp. 97–109.

Mazumdar, A., Chugh, T., Hakanen, J., and Miettinen, K. (2022). Probabilistic selection approaches in decomposition-based evolutionary algorithms for offline data-driven multiobjective optimization. IEEE Transactions on Evolutionary Computation, 26(5):1182–1191. 10.1109/TEVC.2022.3154231

Mazumdar, A., Chugh, T., Miettinen, K., and López-Ibáñez, M. (2019). On dealing with uncertainties from Kriging models in offline data-driven evolutionary multiobjective optimization. In Proceedings of Evolutionary Multi-Criterion Optimization, pp. 463–474.

Misitano, G., Saini, B. S., Afsar, B., Shavazipour, B., and Miettinen, K. (2021). DESDEO: The modular and open source framework for interactive multiobjective optimization. IEEE Access, 9:148277–148295. 10.1109/ACCESS.2021.3123825

Osojnik, A., Panov, P., and Džeroski, S. (2018). Tree-based methods for online multi-target regression. Journal of Intelligent Information Systems, 50:315–339. 10.1007/s10844-017-0462-7
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Rahat, A. A. M., Wang, C., Everson, R. M., and Fieldsend, J. E. (2018). Data-driven multi-objective optimisation of coal-fired boiler combustion systems. Applied Energy, 229:446–458. 10.1016/j.apenergy.2018.07.101

Rasmussen, C. E., and Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175. 10.1109/JPROC.2015.2494218

Snelson, E., and Ghahramani, Z. (2005). Sparse Gaussian processes using pseudo-inputs. In Proceedings of the 18th International Conference on Neural Information Processing Systems, pp. 1257–1264.

Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951–2959. Curran Associates.

Titsias, M. K. (2009). Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pp. 567–574.

van Stein, B., Wang, H., Kowalczyk, W., Bäck, T., and Emmerich, M. (2015). Optimally weighted cluster kriging for big data regression. In E. Fromont, T. De Bie, and M. van Leeuwen (Eds.), Advances in intelligent data analysis XIV, pp. 310–321. Springer.

Wang, H., and Jin, Y. (2020). A random forest-assisted evolutionary algorithm for data-driven constrained multiobjective combinatorial optimization of trauma systems. IEEE Transactions on Cybernetics, 50(2):536–549. 10.1109/TCYB.2018.2869674

Wang, H., Jin, Y., and Jansen, J. O. (2016). Data-driven surrogate-assisted multiobjective evolutionary optimization of a trauma system. IEEE Transactions on Evolutionary Computation, 20(6):939–952. 10.1109/TEVC.2016.2555315

Wang, H., Jin, Y., Sun, C., and Doherty, J. (2019). Offline data-driven evolutionary optimization using selective surrogate ensembles. IEEE Transactions on Evolutionary Computation, 23(2):203–216. 10.1109/TEVC.2018.2834881

Wang, H., van Stein, B., Emmerich, M., and Bäck, T. (2017). Time complexity reduction in efficient global optimization using cluster kriging. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 889–896.

Yang, C., Ding, J., Jin, Y., and Chai, T. (2020). Offline data-driven multiobjective optimization: Knowledge transfer between surrogates and generation of final solutions. IEEE Transactions on Evolutionary Computation, 24(3):409–423. 10.1109/TEVC.2019.2925959

Zhang, Q., and Li, H. (2007). MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712–731. 10.1109/TEVC.2007.892759

Zitzler, E., and Thiele, L. (1998). Multiobjective optimization using evolutionary algorithms—A comparative case study. In Parallel Problem Solving from Nature, pp. 292–301.
