A Revisit of Infinite Population Models for
Evolutionary Algorithms on Continuous
Optimization Problems
Bo Song
Department of Electrical and Electronic Engineering, the University of Hong Kong,
Pokfulam, Hong Kong
bsong@connect.hku.hk
Victor O.K. Li
Department of Electrical and Electronic Engineering, the University of Hong Kong,
Pokfulam, Hong Kong
vli@eee.hku.hk
https://doi.org/10.1162/evco_a_00249
Manuscript received: 17 October 2018; revised: 26 January 2019; accepted: 30 January 2019.
© 2019 Massachusetts Institute of Technology. Evolutionary Computation 28(1): 55–85
Abstract
Infinite population models are important tools for studying population dynamics of
evolutionary algorithms. They describe how the distributions of populations change
between consecutive generations. Generally, infinite population models are derived
from Markov chains by exploiting symmetries between individuals in the population
and analyzing the limit as the population size goes to infinity. In this article, we study
the theoretical foundations of infinite population models of evolutionary algorithms
on continuous optimization problems. First, we show that the convergence proofs in
a widely cited study were in fact problematic and incomplete. We further show that
the modeling assumption of exchangeability of individuals cannot yield the transition
equation. Then, in order to analyze infinite population models, we build an analyti-
cal framework based on convergence in distribution of random elements which take
values in the metric space of infinite sequences. The framework is concise and math-
ematically rigorous. It also provides an infrastructure for studying the convergence of
the stacking of operators and of iterating the algorithm which previous studies failed to
address. Finally, we use the framework to prove the convergence of infinite population
models for the mutation operator and the k-ary recombination operator. We show that
these operators can provide accurate predictions for real population dynamics as the
population size goes to infinity, provided that the initial population is identically and
independently distributed.
Keywords
Evolutionary algorithms, infinite population models, population dynamics, conver-
gence in distribution, theoretical analysis.
1 Introduction
Evolutionary algorithms (EAs) are general purpose optimization algorithms with great
successes in real-world applications. They are inspired by the evolutionary process
in nature. A certain number of candidate solutions to the problem at hand are mod-
eled as individuals in a population. The algorithm evolves the population by mutation,
crossover, and natural selection so that individuals with more preferable objective func-
tion values have higher survival probability. By the “survival of the fittest” principle, it
is likely that after many generations the population will contain individuals with high
fitness values such that they are satisfactory solutions to the problem at hand.
Though conceptually simple, the underlying evolutionary processes and the behav-
iors of EAs remain to be fully understood. The difficulties lie in the fact that EAs are cus-
tomizable population-based iterative stochastic algorithms, and the objective function
also has great influence on their behaviors. A successful model of EAs should describe
both the mechanisms of the algorithm and the influence from the objective function.
One way to study EAs is to model them as dynamical systems. The idea is to pick a
certain quantity of interest first, such as the distribution of the population or a certain
statistic about it. Then, transitions in the state space of the picked quantity are modeled.
A transition matrix (when the state space is finite) or a difference equation (when the
state space is not finite) for a Markov chain is derived to describe how the picked quantity
changes between consecutive generations.
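To make the dynamical-systems view concrete, the toy sketch below applies a transition matrix repeatedly to a distribution over states. The three-state space and the particular column-stochastic matrix are assumptions made only for this illustration; they are not taken from the article.

```python
import numpy as np

# Toy illustration (assumed 3-state space): a column-stochastic transition matrix T
# maps the distribution over population states from one generation to the next.
T = np.array([[0.8, 0.3, 0.1],
              [0.1, 0.6, 0.2],
              [0.1, 0.1, 0.7]])
p = np.array([1.0, 0.0, 0.0])        # initial distribution over the 3 states
for _ in range(5):
    p = T @ p                        # one generation: p_{k+1} = T p_k
print(p, p.sum())                    # still a probability vector
```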
In order to characterize the population dynamics accurately, the state space of the
Markov chain tends to grow rapidly as the population size increases. Consequently, even
for time-homogeneous EAs with moderate population size, the Markov chain is often
too large to be analyzed or simulated. To overcome this issue, some researchers turn
to studying the limiting behaviors of EAs as the population size goes to infinity. The
idea is to exploit some kind of symmetry in the state space (such as all individuals
having the same marginal distribution), and prove that in the limit the Markov chain
can be described by a more compact model. Models built in this way are called infinite
population models (IPMs).
In questo articolo, we follow this line of research and study IPMs of EAs on continuous
space. More specifically, we aim at rigorously proving the convergence of IPMs. In this
study, by “convergence” we usually mean that IPMs characterize limiting behaviors of
real EAs. “An IPM converges” loosely means that as the population size goes to infinity,
the population dynamics of the real EA converge in a sense to the population dynamics
predicted by this model. This usage is different from conventional ones where it means
that the EA eventually locates and gets stuck in some local or global optima. Conver-
gence results are the foundations and justifications of IPMs.
The main results of the article can be summarized as follows. First, we show that
a widely cited study on the convergence of IPMs was problematic. This is mainly because
the core assumption of exchangeability of individuals in their proof cannot lead to
the convergence conclusion. Then, we build an analytical framework from a different
perspective and show that it defines convergence in general settings. Next, to show
the effectiveness of our framework, we prove the convergence of IPM of simple EA
with mutation and crossover operators when the initial population follows an identi-
cal and independent distribution (i.i.d.). Finally, we discuss the results and point out
that the convergence of IPM of simple EA with proportionate selection is yet to be
developed.
To our knowledge, there are very few research efforts which directly studied the
convergence of IPMs. Among them, Vose (1999B) and Qi and Palmieri (1994UN,B) are the
classic ones. We focus on Qi and Palmieri (1994UN,B) in this article as their results are for
EA on continuous solution space and most relevant here. In the first part of their study,
the authors built a model of simple EA with only mutation and proportionate selection.
A transition equation is constructed which describes how the probability density func-
tions (p.d.f.s) of the marginal distribution of the population change between consecutive
generations. The authors proved that if individuals are exchangeable in a population,
as the population size goes to infinity, the marginal p.d.f.s of populations of simple EA
will converge point-wise to the p.d.f.s calculated by the following transition equation:

$$ f_{x_{k+1}}(x) = \frac{\int_F f_{x_k}(y)\, g(y)\, f_w(x \mid y)\, dy}{\int_F f_{x_k}(y)\, g(y)\, dy}, \qquad (1) $$
where F is the solution space, fxk is the predicted marginal p.d.f. of the kth generation, G
is the objective function to be maximized, and fw(X|sì) is the conditional p.d.f. decided
by the mutation operator. In the second part of the research, the authors further ana-
lyzed the crossover operator and modified the transition equation to include all three
operators in the simple EA.
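As a concrete illustration of what the transition equation computes, the sketch below iterates (1) numerically on a grid for a hypothetical one-dimensional instance. The bounded bell-shaped objective, the Gaussian mutation kernel standing in for $f_w(x \mid y)$, and all parameter values are assumptions made only for this example; whether iterating (1) is justified as a prediction of the finite-population EA is exactly the question examined later in the article.

```python
import numpy as np

# Illustrative 1-D setup (all choices are assumptions for demonstration only).
xs = np.linspace(-5.0, 5.0, 401)          # discretized solution space F
dx = xs[1] - xs[0]
g = 1.0 + np.exp(-0.5 * xs**2)            # a bounded, positive objective g(x)
sigma_w = 0.5                             # assumed mutation "width" of f_w(x|y)

# Gaussian mutation kernel f_w(x|y) on the grid: K[i, j] = f_w(xs[i] | xs[j]).
K = np.exp(-0.5 * ((xs[:, None] - xs[None, :]) / sigma_w) ** 2) / (sigma_w * np.sqrt(2 * np.pi))

def transition(f_k):
    """One step of Eq. (1): selection re-weights f_k by g, mutation convolves with f_w."""
    weighted = f_k * g                    # f_{x_k}(y) g(y)
    numerator = K @ weighted * dx         # integral of f_{x_k}(y) g(y) f_w(x|y) dy
    denominator = np.sum(weighted) * dx   # integral of f_{x_k}(y) g(y) dy
    return numerator / denominator

f = np.exp(-0.5 * ((xs - 2.0) / 1.0) ** 2)  # an arbitrary initial marginal p.d.f.
f /= np.sum(f) * dx
for _ in range(10):                         # iterate the IPM for 10 generations
    f = transition(f)
print("mass:", np.sum(f) * dx, "mean:", np.sum(xs * f) * dx)
```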
In Section 2, we will examine Qi’s model and show that their study is unsound and
incomplete. First, we provide a counterexample to show that in the authors’ proof a key
assertion about the law of large numbers (LLN) for exchangeable random vectors is
generally not true. Therefore, the whole proof is unsound. Moreover, we show that
the modeling assumption of exchangeability of individuals cannot yield the transition
equation in general. This means that under the authors’ modeling assumption, the con-
clusion (1) cannot be reached.
Furthermore, we show that the authors’ proofs in Qi and Palmieri (1994a,b) are in-
complete. The authors did not address the convergence of the stacking of operators
and of recursively iterating the algorithm. In essence, the authors attempted to prove
the convergence of the IPM for only one iteration step. Even if the proof for (1) is cor-
rect, it only shows that as $n \to \infty$ the marginal p.d.f. of the $(k+1)$th population con-
verges point-wise to $f_{x_{k+1}}(x)$, provided that the marginal p.d.f. of the $k$th generation is
$f_{x_k}(x)$. However, this convergence does not automatically hold for all subsequent gen-
erations. Consequently, (1) cannot be iterated to make predictions for subsequent ($> k+1$)
generations.
Besides Qi and Palmieri (1994UN,B), we found no other studies that attempted to
prove the convergence of IPMs for EAs on continuous space. Therefore, in Section 3 we
propose a general analytical framework. The novelty of our framework is that from the
very start of the analysis, we model generations of the population as random elements
taking values in the metric space of infinite sequences, and we use convergence in dis-
tribution instead of point-wise convergence to define the convergence of IPMs.
To illustrate the effectiveness of our framework, we perform convergence analysis
of IPM of the simple EA in Sections 4 and 5. In Section 4, we adopt a “stronger”
modeling assumption that individuals of the same generation in the IPM are identically
and independently distributed (i.i.d.), and we give sufficient conditions under which
the IPM is convergent. For general EA, this assumption may seem restricted at first
sight, but it turns out to be a reasonable one. In Section 5, we analyze the mutation
operator and the k-ary recombination operator. We show that these commonly used
operators have the property of producing i.i.d. populations, in the sense that if the initial
population is i.i.d., as the population size goes to infinity, in the limit all subsequent
generations are also i.i.d. This means that for these operators, the transition equation in
the IPM can predict the real population dynamics as the population size goes to infinity.
We also show that our results hold even if these operators are stacked together and
iterated repeatedly by the algorithm. Finally, in Section 6 we conclude the article and
propose future research.
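As a purely illustrative reading of the mutation and $k$-ary recombination operators discussed above, the sketch below generates every offspring independently given the current population, so the new population is c.i.i.d. The uniform parent sampling, the averaging combination rule, and the Gaussian perturbation are assumptions for this example; the article itself treats these operators abstractly.

```python
import numpy as np

rng = np.random.default_rng(0)

def kary_recombination(pop, k=2):
    """Each offspring is produced independently given `pop`: draw k parents
    uniformly at random (an assumed sampling scheme) and combine them.
    Because offspring are generated i.i.d. conditionally on `pop`, the new
    population is c.i.i.d., hence exchangeable."""
    n, d = pop.shape
    parents = pop[rng.integers(0, n, size=(n, k))]   # shape (n, k, d)
    return parents.mean(axis=1)                      # assumed combination rule: average

def mutation(pop, sigma=0.1):
    """Perturb every individual independently with the same conditional p.d.f."""
    return pop + rng.normal(0.0, sigma, size=pop.shape)

pop = rng.normal(size=(100, 2))          # an i.i.d. initial population in R^2
for _ in range(5):
    pop = mutation(kary_recombination(pop, k=3))
print(pop.mean(axis=0))
```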
To be complete, regarding Qi and Palmieri (1994UN,B), there is a comment by Yong
et al. (1998) with a reply published. Tuttavia, the comment was mainly about the latter
part of Qi and Palmieri (1994UN), where the authors analyzed the properties of EAs based
on the IPM. It did not discuss the proof for the model itself. For IPMs of EAs on discrete
optimization problems, extensive research was done by Vose et al. in a series of studies
(Nix and Vose, 1992; Vose, 1999B, 1999UN, 2004). The problems under consideration were
discrete optimization problems with finite solution space. The starting point of the au-
thors’ analysis was to model each generation of the population as an “incidence vector,"
which describes for each point in the solution space the proportion of the population it
occupies. Based on this representation the authors derived transition equations between
incidence vectors of consecutive generations and analyzed their properties as the popu-
lation size goes to infinity. However, for EAs on continuous solution space, the analyses
of Vose et al. are not immediately applicable. This is because for continuous optimiza-
tion problems the solution space is not denumerable. Therefore, the population cannot
be described by a finite-dimensional incidence vector.
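For a finite solution space the incidence vector is straightforward to construct; the toy snippet below (with an assumed three-point solution space and an arbitrary population) only illustrates the representation that is unavailable once the solution space becomes continuous.

```python
import numpy as np

# Assumed toy solution space with three points and a population of size 8.
solution_space = ["a", "b", "c"]
population = ["a", "c", "a", "b", "a", "c", "c", "a"]

# Incidence vector: the fraction of the population occupying each point.
incidence = np.array([population.count(s) for s in solution_space]) / len(population)
print(dict(zip(solution_space, incidence)))   # {'a': 0.5, 'b': 0.125, 'c': 0.375}
```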
2 Discussion of the Works of Qi et al.
In this section we analyze the results of Qi and Palmieri (1994UN,B). We begin by intro-
ducing some preliminaries for the analysis. Then, in Section 2.2, following the notations
and derivations in the authors’ papers, we provide a counterexample to show that the
convergence proof for the transition equation in Qi and Palmieri (1994UN) is problematic.
We further show that the modeling assumption of exchangeability cannot yield the tran-
sition equation in general. In Section 2.3, we show that the analyses in Qi and Palmieri
(1994UN,B) are incomplete. The authors did not prove the convergence of IPMs in the
cases where operators are stacked together and the algorithm is iterated for multiple
generations.
2.1 Preliminaries
In the authors’ paper (Qi and Palmieri, 1994UN), the problem to be optimized is
$$ \arg\max_{x}\ g(x) \quad \text{s.t.}\ x \in F \subseteq \mathbb{R}^m, \qquad (2) $$

where $F$ is the solution space and $g$ is some given objective function. The analysis intends to be general; therefore, no explicit form of $g$ is assumed. The algorithm to be analyzed is the simple EA with proportionate selection and mutation. Let $X_k = (x^j_k)_{j=1}^N$ denote the $k$th generation produced by the EA, where $N$ is the population size. To generate the $(k+1)$th population, proportionate selection produces intermediate populations $X'_k$ following the conditional probability that

$$ P\big(x'^{\,i}_k = x^j_k \mid X_k\big) = \frac{g(x^j_k)}{\sum_{l=1}^{N} g(x^l_k)}, \quad \text{for all } i, j = 1, 2, \ldots, N. \qquad (3) $$

After selection, each individual in $X'_k$ is mutated to generate individuals in $X_{k+1}$. Mutation follows the conditional probability that

$$ f\big(x^i_{k+1} = x \mid x'^{\,i}_k = y\big) = f_w(x \mid y). \qquad (4) $$
Overall the algorithm is illustrated in Figure 1.
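Since the pseudocode of Figure 1 is not reproduced in this text, the following sketch spells out one generation of the simple EA as described by (3) and (4). A Gaussian kernel is assumed for $f_w(x \mid y)$, and the objective and parameter values are placeholders rather than choices made by Qi and Palmieri.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_generation(X, g, sigma_w=0.3):
    """One iteration of the simple EA: proportionate selection as in (3),
    then mutation as in (4) with an assumed Gaussian f_w(x|y)."""
    fitness = g(X)                                   # must be positive and bounded
    probs = fitness / fitness.sum()                  # selection probabilities of (3)
    N = len(X)
    selected = X[rng.choice(N, size=N, p=probs)]     # intermediate population X'_k
    return selected + rng.normal(0.0, sigma_w, size=X.shape)   # mutated X_{k+1}

g = lambda x: 1.0 + np.exp(-0.5 * (x - 1.0) ** 2)    # placeholder objective, bounded and positive
X = rng.uniform(-4.0, 4.0, size=50)                  # initial population, N = 50
for _ in range(20):
    X = one_generation(X, g)
print("population mean after 20 generations:", X.mean())
```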
After presenting the optimization problem and the algorithm, the authors proved
the convergence of the IPM if the distributions of individuals in the population are ex-
changeable. It is the main result in Qi and Palmieri (1994UN).
Theorem 1 (Theorem 1 in Qi and Palmieri, 1994UN): Assume that the fitness function g(X)
In (2) and the mutation operator of simple EA described by (4) satisfy the following conditions:
Figure 1: The pseudocode of the simple EA.
≤ g(X) ≤ gmax < ∞, ∀x ∈ F. 1. 0 < gmin 2. supx,y∈Rd fw(x|y) ≤ M < ∞. Then as n → ∞, the time history of the simple EA can be described by a sequence of random vectors (xk ) ∞ k=0 with densities (cid:2) fxk+1 (x) = (cid:2) Ffxk (y)g(y)fw(x|y) dy Ffxk (y)g(y) dy . (5) In Theorem 1, fxk is the marginal p.d.f. of the kth generation predicted by the IPM. It should be emphasized that in Qi and Palmieri (1994a,b), the authors proved this the- orem under the assumption that simple EA has exchangeable individuals in the popu- lation. Though not explicitly stated in the theorem, the assumption of exchangeability is the core assumption in their proof and an integral part of their formulation of the theorem. For analyses in this article, we use the concept of exchangeability in probability theory. Its definition and some basic facts are listed. Definition 1 (Exchangeable random variables, Definition 1.1.1 in Taylor et al., 1985): A finite set of random variables {xi}n i=1 is said to be exchangeable if the joint distribution of (xi )n i=1 is invariant with respect to permutations of the indices 1, 2, . . . , n. A collection of ran- dom variables {xα : α ∈ (cid:2)} is said to be exchangeable if every finite subset of {xα : α ∈ (cid:2)} is exchangeable. Definition 1 can also be extended to cover exchangeable random vectors or ex- changeable random elements by replacing the term “random variables” in the defini- tion with the respective term. One property of exchangeability is that if {xi}n i=1 are n ex- changeable random elements, then the joint distributions of any 1 ≤ k ≤ n distinct ones of them are always the same (Proposition 1.1.1 in Taylor et al., 1985). When k = 1 this property indicates that {xi}n i=1 have the same marginal distribution. Another property is Evolutionary Computation Volume 28, Number 1 59 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 B. Song and V.O.K. Li that a collection of random elements are exchangeable if and only if they are condition- ally independent and identically distributed (c.i.i.d.) given some σ -field G (Theorem 1.2.2 in Taylor et al., 1985). Conversely, a collection of c.i.i.d. random elements are al- ways exchangeable. Finally, it is obvious that i.i.d. random elements are exchangeable, but the converse is not necessarily true. It can be seen that the simple EA generates c.i.i.d. individuals given the current population. Therefore, the individuals within the same generation are exchangeable, and they have the same marginal distribution. It is the core assumption of the proof in Qi and Palmieri (1994a,b) and Condition 3 in the Theorem 1. 2.2 Convergence Proof of the Transition Equation In this section, we analyze the proof of Theorem 1 and show that it is incorrect. The proof by Qi et al. is in Appendix A of Qi and Palmieri (1994a). In the proof, the authors assumed that individuals in the same generation are exchangeable; therefore, they have the same marginal distribution. After a series of derivation steps, the authors managed to obtain a transition equation between the density functions of xi (cid:4) (cid:4) k+1 and Xk: fxi k+1 (x) = FN fXk (y1, y2, . . . , yn) g(yj )fw(x|yj ) N(cid:3) 1 N l=1 g(yl ) where in (6), ξ dy1dy2 . . . dyn for any fixed i, j = E (cid:5) (cid:6) , ξ k (x) ηN k k (x) (cid:2) g(xj k ) for any fixed j, k )fw(x|xj N(cid:7) g(xl k ). 
l=1 ηN k (cid:2) 1 N l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l (6) (7) (8) f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . Eq. (6) is correct. It accurately describes how the marginal p.d.f. for any individual in the next generation can be calculated from the joint p.d.f. of individuals in the cur- rent generation. Noticing that ηN k is the average of the exchangeable random variables {g(xj j =1, by the LLN for exchangeable random variables, the authors asserted that k )}N lim N→∞ ηN k = η k almost surely (a.s.). The authors further asserted that η E[η k is itself a random variable, satisfying k] = E[g(xj k )] for any j. (9) (10) Eqs. (9) and (10) correspond to (A13) and (A14) in Appendix A of Qi and Palmieri (1994a), respectively. The authors’ proof is correct until this step. However, the authors then asserted that f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 η k is independent of ηN k = g(xj η1 k k ) for all j = 1, 2, . . . , N. for any finite N. In particular, η k is independent of Based on this assertion the authors then proved that for all k and x, (cid:5) (cid:8) (cid:8) (cid:8) (cid:8)E lim N→0 (cid:6) ξ k (x) ηN k − E [ξ E [η k (x)] k] (cid:8) (cid:8) (cid:8) (cid:8) = 0. (11) (12) 60 Evolutionary Computation Volume 28, Number 1 A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Space As a result, the p.d.f. in (6) converges point-wisely to E[ξ E[η hand side of (5). Hence the authors claimed that Theorem 1 is proved. k (x)] k ] , which is equal to the right In the following, we provide a counterexample to show that assertion (11) is not true when N ≥ 2 (N = 1 is the degenerate case). Then, we carry out further analysis to show that under the modeling assumption of exchangeability, conclusion (12) or equivalently Theorem 1 cannot be true in general. 2.2.1 On Assertion (11) We first reformulate the assertion. Since {xl k changeable (Property 1.1.2 in Taylor et al., 1985). Let yl = g(xl premises of Theorem 1 imply l=1 are exchangeable, {g(xl }N k )}N l=1 are ex- k ), l = 1, . . . , N . Then the Let y = η {yl}N ≤ yl ≤ gmax. l=1 are exchangeable and gmin k. According to (8), (9), and (10), y has the properties that l=1 yl = y, a.s., (cid:3) (cid:9) N 1 N lim N→∞ E(y) = E(yl ) for any l. (13) (14) (15) Since g is a general function, there are no other restrictions for {yl}N (11) is equivalent to the following assertion: l=1 and y. Therefore, For any {yl}N l=1 and y satisfying (13), (14), and (15), y and (cid:7)N l=1 1 N yl are independent for any finite N. In particular, y is independent of yl for any l. However, we use the following counterexample (modified from Example 1.1.1 and related discussions on pages 11–12 in Taylor et al. (1985), to show that assertion (16) is not true. Therefore, (11) is not true. 2.2.2 Counterexample Let {zl}∞ l=1 be a sequence of i.i.d. random variables satisfying (16) − gmax − gmin 4 for all l. Let y be a random variable independent of {zl}∞ gmax ≤ zl ≤ gmax − gmin 4 ≤ y ≤ 3gmax + 3gmin 4 and E(zl ) = 0 l=1 satisfying + gmin 4 and E(y) = gmax + gmin 2 . Finally, let yl = zl + y for all l. N (cid:3) It can easily be verified that {yl}∞ l=1 and y satisfy (13) and (15). Since zl is bounded, E(|zl|) < ∞ for any l. By the strong law of large numbers (SLLN) for i.i.d. random vari- l=1 zl → 0 a.s. 
as N → ∞; therefore, (14) is also satisfied, that is, y is the limit ables, 1 (cid:3) (cid:3) N N N of 1 l=1 zl and y is inde- N pendent of {zl}∞ N l=1 yl is not independent of y except for some degenerate cases (e.g., when y equals to a constant). In particular, in general yl = y + zl is not independent of y for any l. Therefore, assertion (16) is not true. Equivalently, as- sertion (11) is not true. This renders the authors’ proof for (12) invalid. l=1 yl as N → ∞. However, because 1 l=1, it can be seen that 1 l=1 yl = y + 1 (cid:3) (cid:3) N N N N Evolutionary Computation Volume 28, Number 1 61 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 B. Song and V.O.K. Li Further Analysis 2.2.3 In the following, we carry out further analysis to show that (12) cannot be true even considering other methods of proof and adding new sufficient conditions. Therefore, in general, Theorem 1 cannot be true. ξ k (x) ηN k . We prove the following lemma. To begin with, consider the random variable Lemma 1: E (cid:10) (cid:11) (cid:10) → E (cid:11) ξ k (x) η k ξ k (x) ηN k as N → ∞. Proof: According to (9), ηN k surely. Since h(x) = 1 x is continuous on (0, ∞), a.s.→ η k, since gmin ≤ ηN k ≤ gmax, 0 < gmin ≤ η k ≤ gmax almost h(ηN k ) = 1 ηN k a.s.→ h(η k ) = 1 η k (Proposition 47.2 in Port, 1994). Then ξ k (x) ηN k ξ a.s.→ k (x) η k (Proposition 47.4 (ii) in Port, 1994). Finally, by the conditions in Theorem 1, 0 ≤ ξ k (x) ηN k ≤ Mgmax gmin inated Convergence Theorem (Proposition 11.30 in Port, 1994), E N → ∞. (cid:10) . By the Lebesgue’s Dom- → E (cid:10) (cid:11) (cid:11) ξ ξ k (x) η k k (x) ηN k as (cid:3) Now by Lemma 1, (12) is equivalent to (cid:13) (cid:12) E ξ k (x) η k = E [ξ E [η k (x)] k] . ((cid:3)) Now it is clear that if the only assumption is exchangeability, ((cid:10)) is not true even considering other methods of proof. Of course, if (11) is true, ξ k are indepen- dent, then ((cid:10)) is true. However, as already shown by the counterexample, (11) is not true in general. Therefore, ((cid:10)), and equivalently Theorem 1, are in general not true. k (x) and η A natural question then arises: Is it possible to introduce some reasonable suffi- cient conditions such that ((cid:10)) can be proved? One of such conditions frequently used is that η k )]. However, the following analysis shows that given the modeling assumption of exchangeability, this condition is not true in general. Therefore, it cannot be introduced. = E[g(xj k For exchangeable random variables {g(xl V(ηN k ) V(η k ) = lim N→∞ k )}N l=1, we have ⎡ ⎢ ⎢ ⎢ ⎣ E ⎧ ⎪⎪⎪⎪⎨ ⎪⎪⎪⎪⎩ = lim N→∞ N(cid:3) l=1 g(xl k ) N ⎤ 2 ⎥ ⎥ ⎥ ⎦ ⎡ ⎢ ⎢ ⎢ ⎣E N(cid:3) l=1 − g(xl k ) N ⎤ 2 ⎥ ⎥ ⎥ ⎦ ⎫ ⎪⎪⎪⎪⎬ ⎪⎪⎪⎪⎭ (cid:29) = lim N→∞ 1 N 2 (cid:9) N(cid:7) l=1 (cid:28) (cid:27) g(xl k ) V + N(cid:7) N(cid:7) i=1 j =1,j (cid:11)=i C g(xi k ), g(xj k ) (17) ⎫ ⎬ (cid:30) ⎭ 62 Evolutionary Computation Volume 28, Number 1 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Space Table 1: Population dynamics of EAn under operator H. k EAn EA1 ... EAn ... EA∞ 0 P1 0 ... Pn 0 ... P∞ 0 1 2 . . . … … H1(P1 0) ... Hn(Pn 0 ) ... H∞(P∞ 0)) 0 )) H1(H1(P1 ... Hn(Hn(Pn ... 
0 ) H∞(H∞(P∞ 0 )) … l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 (cid:9) (cid:27) V (cid:28) g(x1 k ) N = lim N→∞ + N − 1 N C (cid:27) g(x1 k ), g(x2 k ) (cid:31) (cid:28) (cid:27) g(x1 k ), g(x2 k ) (cid:28) , = C (18) (19) where V(x) is the variance of x and C(x, y) is the covariance of x and y. (17) is by the boundedness of ηN k and the Lebesgue’s Dominated Convergence Theorem, (18) is by the exchangeability of {xj }N j =1, and (19) is by the boundedness of g and pushing N k to infinity. Now it is clear that if the only modeling assumption is exchangeability, k does not con- there is no guarantee that C verge to a constant. Thus this condition cannot be introduced as a sufficient condition for ((cid:10)). = 0. Therefore, in general ηN k ), g(x2 k ) g(x1 (cid:27) (cid:28) 2.3 The Issue of the Stacking of Operators and Iterating the Algorithm In the following, we discuss IPMs from another perspective and show that the proofs in Qi and Palmieri (1994a,b) are incomplete, as they only consider convergence of one iteration. Because discussion in this section relates very closely to our proposed frame- work, we will use our notation from here on in our article consistently, which is dif- ferent from the previous two sections where we followed strictly the notations of Qi and Palmieri (1994a,b). k,i )n = (xn Consider an EA with only one operator. Let the operator be denoted by H. When the population size is n, denote this EA by EAn and the operator it actually uses by Hn. Let Pn i=1 denote the kth generation produced by EAn. Then the transition rules k between consecutive generations produced by EAn can be described by Pn k ). In Table 1, we write down the population dynamics of EAn. Each row in Table 1 shows k is expanded as [Hn]k (Pn the population dynamics produced by EAn. In the table Pn 0 ). Let EA∞ denote the IPM, and P∞ 0 ) denote the populations predicted by k EA∞. Then we can summarize the results in Qi and Palmieri (1994a) in the following way. = [H∞]k (P∞ = Hn(Pn k+1 Assume that the initial population comes from a known sequence of individuals, ∞ = (xi ) i=1. For EAn, its initial population Pn 0 consists of the first n represented by P0 = P0. This setting represents the fact that elements of P0; that is, Pn 0 0 EAn and EA∞ use the same initial population. Hn can be viewed as operators on P0 which takes only the first n elements to produce the next generation. Then the authors i=1. Let P∞ = (xi )n Evolutionary Computation Volume 28, Number 1 63 B. Song and V.O.K. Li essentially proved that Hn(P0) m.p.w.−→ H∞(P0) as n → ∞, (20) where m.p.w. stands for point-wise convergence of marginal p.d.f.s. However, apart from the fact that this proof is problematic, the authors’ proof cov- ers only one iteration step, corresponding to the column-wise convergence of the k = 1 column in Table 1. The problem is that even if (21) is true, it does not automatically lead m.p.w.−→ [H∞]k (P0) as n → ∞. In to the conclusion that for the arbitrary kth step, [Hn]k (P0) other words, one has to study whether the transition equation for one step can be it- erated recursively to predict populations after multiple steps. In Table 1, this problem corresponds to whether other columns have similar column-wise convergence property when the convergence of the k = 1 column is proved. To give an example, consider the column of k = 2 in Table 1. 
To prove column-wise convergence, the authors need to prove that given (20), Hn(Pn 1 ) m.p.w.−→ H∞(P ∞ 1 ), or equivalently (21) [Hn]2(P0) m.p.w.−→ [H∞]2(P0) (22) as n → ∞. Comparing (20) with (21) and (22), (21) has the same sequence of operators but with a sequence of converging inputs, (22) has the same input but with a sequence of different operators. Therefore, they are not necessarily true even if (20) is proved. A similar problem exists when considering the arbitrary kth generation. We call this problem the issue of iterating the algorithm. As studies in Qi and Palmieri (1994a,b) ignored this issue, we believe their proofs are incomplete. The issue of the stacking of operators is similar. Given some operator H satisfy- m.p.w.−→ G∞(P0) as n → ∞, it is not nec- ing (20) and some operator G satisfying Gn(P0) m.p.w.−→ H∞(G∞(P0)) as n → ∞. However, the authors in Qi essarily true that Hn(Gn(P0)) and Palmieri (1994b) totally ignored this issue and combined the transition equations for selection, mutation and crossover together (in Section III of Qi and Palmieri 1994b) without any justification. ≡ fx(cid:6) · fx(cid:6) In addition, there are several statements in the authors’ proofs in Qi and Palmieri (1994b) that are questionable. First, in the first paragraph of Appendix A (the proof for Theorem 1 in that paper), the authors considered a pair of parents xk and x(cid:6) k for the uni- form crossover operator. xk and x(cid:6) k are “drawn from the population independently with the same density of fxk k .” Then, the authors claimed that “the joint density of xk and x(cid:6) k is therefore fxk k .” This is simply not true. Two individuals drawn indepen- dently from the same population are conditionally independent, they are not necessarily independent. In fact, without the i.i.d. assumption, it is very likely that individuals in the same population are dependent. Therefore, the joint density function of xk and x(cid:6) k is not necessarily fxk k , and the authors’ proof for Theorem 1 in Qi and Palmieri (1994b) is dubious at best. On the other hand, if the authors’ modeling assumption is i.i.d. of in- dividuals for the uniform crossover operator, this assumption is incompatible with the modeling assumption of exchangeability in Qi and Palmieri (1994a) for selection and mutation. Therefore, combining the transition equations for all these three operators is problematic, because the i.i.d. assumption cannot hold beyond one iteration step. · fx(cid:6) Another issue in Qi and Palmieri (1994b) is that the uniform crossover operator pro- duces two dependent offspring at the same time. As a result, after uniform crossover, the intermediate population is not even exchangeable because it has pair-wise dependency between individuals. Then the same incompatible assumption problem arises: that is, 64 Evolutionary Computation Volume 28, Number 1 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Space the transition equation for the uniform crossover operator cannot be combined with the transition equations for selection and mutation. Besides, the transition equation for the uniform crossover operator cannot be iterated beyond one step and still hold i.i.d. or exchangeability as its modeling assumption. 
In summary, several issues arise from previous studies on IPMs for EAs on continu- ous optimization problems. Therefore, new frameworks and proof methods are needed for analyzing the convergence of IPMs and studying the issue of the stacking of opera- tors and iterating the algorithm. 3 Proposed Framework In this section, we present our proposed analytical framework. In constructing the framework we strive to achieve the following three goals. 1. The framework should be general enough to cover real-world operators and to characterize the evolutionary process of real EA. 2. The framework should be able to define the convergence of IPMs and serve as justifications of using them. The definition should match one’s intuition and at the same time be mathematically rigorous. 3. The framework should provide an infrastructure to study the issue of the stack- ing of operators and iterating the algorithm. The contents of this section roughly reflect the pursuit of the first two goals. The third goal is reflected in the sufficient conditions for convergence and i.i.d. IPM con- struction in Section 4 and the analyses of the simple EA in Section 5. More specifically, in Section 3.1, we introduce notations and preliminaries for the remainder of this ar- ticle. In Section 3.2, we present our framework. In the framework, each generation is modeled by a random sequence. This approach unifies the spaces of random elements modeling populations of different sizes. In Section 3.3, we define the convergence of the IPM as convergence in distribution on the space of random sequences. We summarize and discuss our framework in Section 3.4. 3.1 Notations and Preliminaries In the remainder of this article, we focus on the unconstrained continuous optimization problem arg max x g(x) s.t. x ∈ Rd , (23) where g is some given objective function. Our framework is general enough such that it does not require other conditions on the objective function g. However, to prove the convergence of IPMs for mutation and recombination, conditions such as those in The- orem 1 are sometimes needed. We will introduce them when they are required. From now on we use N to denote the set of nonnegative integers and N+ the set of positive integers. For any two real numbers a and b, let a ∧ b be the smaller one of them and a ∨ b be the larger one of them. Let x, y be random elements of some measurable space ((cid:4), F ). We use L(x) to represent the law of x. If x and y follow the same law, that is, P(x ∈ A) = P(y ∈ A) for every A ∈ F, we write L(x) = L(y). Note that L(x) = L(y) and x = y have different meanings. In particular, x = y indicates dependency between x and y. We use the notation (xi )n ∞ i=m represents the infinite sequence (xm, xm+1, . . .). {xi}n i=m to represent the array (xm, xm+1, . . . , xn). When n = ∞, i=m represent the i=m and {xi}∞ (xi ) Evolutionary Computation Volume 28, Number 1 65 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 B. Song and V.O.K. Li (cid:3) collections {xm, xm+1, . . . , xn} and {xi|i = m, m + 1, . . .}, respectively. When the range is clear, we use (xi )i and {xi}i or (xi ) and {xi} for short. Let S denote the solution space Rd . This simplifies the notation system when we discuss the spaces Sn and S∞. In the following, we define metrics and σ -fields on S, Sn and S∞ and state properties of the corresponding measurable spaces. 
i=1(xi − yi )2] 2 . Let S denote the Borel σ -field on S generated by the open sets under ρ. Together (S, S ) defines a measurable space. S is equipped with the ordinary metric ρ(x, y) = [ (cid:3) d 1 · Similarly, Sn is equipped with the metric ρn(x, y) = [ 1 2 , and the corre- sponding Borel σ -field under ρn is denoted by S (cid:6)n. Together (Sn, S (cid:6)n) is the measurable space for n tuples. n i=1 ρ2(xi, yi )] Next, consider the space of infinite sequences S∞ = {(x1, x2, . . .) | xi ∈ S, i ∈ N+}. It under 1+ρ(xi ,yi ) . The Borel σ -field on S∞ (cid:3)∞ i=1 ρ(xi ,yi ) 1 2i is equipped with the metric ρ∞(x, y) = ρ∞ is denoted by S (cid:6)∞ . Then (Sn, S (cid:6)∞ ) is the measurable space for infinite sequences. Since S is separable and complete, it can be proved that Sn and S∞ are also sep- arable and complete (Appendix M6 in Billingsley, 1999). In addition, because of sep- arability, the Borel σ -fields S (cid:6)n and S (cid:6)∞ are equal to S n and S ∞ , respectively. In other words, the Borel σ -fields S (cid:6)n and S (cid:6)∞ generated by the collection of open sets under the corresponding metrics coincide with the product σ -fields generated by all measur- able rectangles (S n) and all measurable cylinder sets (S ∞ ), respectively (Lemma 1.2 in Kallenberg, 2002). Therefore, from now on we write S n and S ∞ for the corresponding Borel σ -fields. Finally, let M, Mn and M∞ denote the set of all random elements of S, Sn and S∞ , respectively. Let πn : S∞ → Sn be the natural projection: πn(x) = (x1, x2, . . . , xn). Since given x ∈ M∞ , (πn ◦ x) : (cid:4) → Sn defines a random element of Sn projected from S∞ , we also use πn to denote the mapping: πn : M∞ → Mn where πn(x) = (x1, x2, . . . , xn). By defini- tion, πn is the operator which truncates random sequences to random vectors. Given A ⊂ S∞ , we use πn(A) to denote the projection of A; that is, πn(A) = {x ∈ Sn : x = πn(y) for some y ∈ A}. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 3.2 Analytical Framework for EA and IPMs In this section, we present an analytical framework for the EA and IPMs. First, the mod- eling assumptions are stated. We deal only with operators that generate c.i.i.d. individ- uals. Then, we present an abstraction of the EA and IPMs. This abstraction serves as the basis for building our framework. Finally, the framework is presented. It unifies the range spaces of the random elements and defines the convergence of IPMs. 3.2.1 Modeling Assumptions We assume that the EA on the problem (23) is time homogeneous and Markovian, such that the next generation depends only on the current one, and the transition rule from the kth generation to the (k + 1)th generation is invariant with respect to k ∈ N. We further assume that individuals in the next generation are c.i.i.d. given the current gen- eration. As this assumption is the only extra assumption introduced in the framework, it may need some further explanation. The main reason for introducing this assumption is to simplify the analysis. Con- ditional independence implies exchangeability, therefore individuals in the same gen- eration k ∈ N+ are always exchangeable. As a result, it is possible to exploit the sym- metry in the population and study the transition equations of marginal distributions. 
Besides, it is because of conditional independence that we can easily expand the random 66 Evolutionary Computation Volume 28, Number 1 A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Space elements modeling finite-sized populations to random sequences, and therefore define convergence in distribution for random elements of the corresponding metric space. In addition, many real world operators in EAs satisfy this assumption, such as the pro- portionate selection operator and the crossover operator analyzed in Qi and Palmieri (1994a,b). Finally, we emphasize that exchangeability is a property that facilitates the analysis of convergence across multiple iterations and different operators. Because of the generational nature of EA, we want a property that leads to convergence for one it- eration and also holds as a premise for the analysis of the next iteration. Exchangeability serves as this property in our analysis. By exchangeability the convergence results can be extended to further iterations and stacked with results from other operators. However, we admit that there are some exceptions to our assumption. A most no- table one may be the mutation operator, though it does not pose significant difficul- ties. The mutation operator perturbs each individual in the current population inde- pendently, according to a common conditional p.d.f. If the current population is not exchangeable, then after mutation the resultant population is not exchangeable, either. Therefore, it seems that mutation does not produce c.i.i.d. individuals. However, con- sidering the fact that mutation is often used along with other operators, as long as these other operators generate c.i.i.d. populations, the individuals after mutation will be c.i.i.d., too. Therefore, a combined operator of mutation and any other operator sat- isfying the c.i.i.d. assumption can satisfy our assumption. An example can be seen in Qi and Palmieri (1994a), where mutation is analyzed together with proportionate selec- tion. On the other hand, an algorithm which only uses mutation is very simple. It can be readily modeled and analyzed without much difficulty. Perhaps more significant exceptions are operators such as selection without replace- ment, or the crossover operator which produces two dependent offspring at the same time. In fact, for these operators not satisfying the c.i.i.d. assumption, it is still possible to expand the random elements modeling finite-sized population to random sequences. For example, the random elements can be padded with some fixed constants or ran- dom elements of known distributions to form the random sequences. In this way, our definition of the convergence of IPMs can still be applied. However, whether in this scenario convergence in distribution for these random sequences can still yield mean- ingful results similar to the transition equation is another research problem. It may need further investigation. Nonetheless, our assumption is equivalent to the exchangeability assumption generally used in previous studies. The Abstraction of EA and IPMs 3.2.2 Given the modeling assumptions, we develop an abstraction to describe the population dynamics of the EA and IPMs. Let the EA with population size n be denoted by EAn, and the kth (k ∈ N) gener- ∈ M ∈ Mn, where xn = (xn k,i )n ation it produces be modeled as a random element Pn k,i k is a random element representing the ith individual in Pn k . Without loss of generality, assume that the EA has two operators, G and H. 
In each iteration, the EA first employs G on the current population to generate an intermediate population, on which it then employs H to generate the next population. Notice that here G and H are just terms representing the operators in the real EA. They facilitate describing the evolutionary process. For EAn, G and H are actually instantiated as functions from Mn to Mn, de- noted by Gn and Hn, respectively. For example, if G represents proportionate selection, the function Gn : Mn → Mn is the actual operator in EAn generating n c.i.i.d. individu- als according to the conditional probability (3). Of course, for the above abstraction to i=1 Evolutionary Computation Volume 28, Number 1 67 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 B. Song and V.O.K. Li be valid, the operators used in EAn should actually produce random elements in Mn; that is, the newly generated population should be measurable on (Sn, S n). As most op- erators in real EAs satisfy this condition and this is the assumption implicitly taken in previous studies, we assume that this condition is automatically satisfied. Given these notations, the evolutionary process of EAn can be described by the k , k ∈ 0 is known and the generation of Pn k=0, where the initial population Pn ∞ sequence (Pn k ) N+ follows the recurrence equation Pn k+1 = (Hn ◦ Gn)(Pn k ). (24) Then understanding the population dynamics of the EA can be achieved by studying the distributions and properties of Pn k . Let the IPM of the EA be denoted by EA∞. The population dynamics it produces ∞ 0 is known and the generation k=0, where P∞ can be described by the sequence (P∞ k of P∞ ) k , k ∈ N+ follows the recurrence equation ∈ M∞ = (H∞ ◦ G∞)(P (25) in which G∞, H∞ : M∞ → M∞ are operators in EA∞ modeled after G and H. Then, the convergence of EA∞ basically requires that (Pn for every genera- k ) tion k. n=1 converges to P∞ P ∞ k ∞ k ), ∞ k+1 The Proposed Framework 3.2.3 As stated before, for each generation k ∈ N, the elements of the sequence (P1 k, . . .) and the limit P∞ k are all random elements of different metric spaces. Therefore, the core of developing our model is to expand Pn k to random sequences, while ensuring that this expansion will not affect modeling the evolutionary process of the real EA. The result k=0 for each n ∈ N+, which of this step is the sequence of random sequences (Qn k completely describes the population dynamics of EAn. For the population dynamics of EA∞, we just let Q∞ k = P∞ k . ∈ M∞ k, P2 ∞ ) The expansion of Pn k , Qn our framework. In the following, we present them rigorously. k and the relationships between Pn k , and Q∞ k are the core of l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . The Expansion of Pn k 3.2.4 We start by decomposing each of Gn and Hn to two operators. One operator is from S∞ to Sn. It corresponds to how to convert random sequences to random vectors. A natural choice is the projection operator πn. To model the evolutionary process, we also have to define how to expand random vectors to random sequences. In other words, we have to define the expansions of Gn and Hn, which are functions from Sn to S∞ . 
Definition 2 (The expansion of operator): For an operator Tn : Mn → Mn satisfying the condition that for any x ∈ Mn, the elements of Tn(x) are c.i.i.d. given x, the expansion of Tn is the operator Tn : Mn → M∞, satisfying that for any x ∈ Mn, f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 1. Tn(x) = (πn ◦ 2. The elements of Tn)(x). Tn(x) are c.i.i.d. given x. Tn is the expansion of Tn. Condition 1 ensures that Tn In Definition 2, the operator can be safely replaced by πn ◦ Tn. Condition 2 ensures that the paddings for the sequence are generated according to the same conditional probability distribution as that used by Tn to generate new individuals. In other words, if the operator ˙Tn : Mn → M describes how Tn generates each new individual from the current population, Tn is equivalent to 68 Evolutionary Computation Volume 28, Number 1 A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Space Figure 2: Relationships between Pn k , Qn k , and Q∞ k . invoking ˙Tn independently on the current population for n times, and Tn is equivalent to invoking ˙Tn independently for infinite times. Finally, because Tn satisfies the condition in the premise, the expansion By Definition 2, the operators in EAn can be decomposed as Gn = πn ◦ Hn = πn ◦ the sequence of random sequences [Qn k equation Gn and Hn, respectively. Then, the evolutionary process of EAn can be described by ∞ ∞ k=0, satisfying the recurrence i=0 Hn ◦ πn ◦ Tn always exists. = (yn ∈ M∞ = ( k,i ) ] Gn)(Pn Qn k ), k+1 k follows the recurrence equation (24), and Qn 0 (26) 0, 0, 0, . . .). It can also be = (Pn where Pn proved that = πn(Qn (27) Essentially, (26) and (27) describe how the algorithm progresses in the order k )k, and it k+1, . . .. It fully characterizes the population dynamics (Pn k+1, Pn k , Qn k , Pn Pn k k ). . . . , Qn is clear that the extra step of generating Qn k does not introduce modeling errors. , there is no need for expansion. For convenience we For EA∞, because P∞ k ∈ M∞ simply let ∞ Q k ∞ = P k (28) for k ∈ N. k and Q∞ k , Qn In summary, the relationships between Pn k are better illustrated in Fig- ure 2. This is the core of our framework for modeling the EA and IPMs. For clarity, we also show the intermediate populations generated by G (denoted by P(cid:6)n k ), their ex- k ), and their counterparts generated by G∞ (denoted by Q(cid:6)∞ pansions (denoted by Q(cid:6)n k ), respectively. How they fit in the evolutionary process can be clearly seen in the figure. In Figure 2, a solid arrow with an operator on it means that the item at the arrow head equals the result of applying the operator on the item at the arrow tail. For exam- = ple, from the figure it can be read that Qn 0 ). Dashed arrow with a question mark 1 on it signals the place to check whether convergence in distribution holds. For example, when k = 2, it should be checked whether (Qn 2 ) Finally, one distinction needs special notice. For EAm and EAn (m (cid:11)= n), consider k . It is clear that Gm : Mm → Mm and Gn : Mn → Mn the operators to generate Pm are two different operators because their domains and ranges are all different. The dis- tinction still exists when we consider Qn k , though it is more subtle and likely to be ig- , it is clear nored. In Figure 2, if we consider the operator ! Gn uses the same mechanism to generate new individuals as the one used in that Gn = Gn ◦ πn, and Q(cid:6)n k ) describes the same population dynamics as that gener- k ! ated by P(cid:6)n Gn are both functions from k k ). 
However, if we choose m (cid:11)= n, n=1 converges to Q∞ Gn : M∞ → M∞ 2 as n → ∞. Gn = πn ◦ ! k and Pn ! Gm and = Gn(Pn Hn(P(cid:6)n Gn(Qn = ! ∞ Evolutionary Computation Volume 28, Number 1 69 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / / e d u e v c o a r t i c e - p d l f / / / / / 2 8 1 5 5 2 0 2 0 3 5 6 e v c o _ a _ 0 0 2 4 9 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 B. Song and V.O.K. Li to M∞ ! Gm and . Therefore, checking domains and ranges are not enough to discern ! M∞ Gm and ! ! Gn lies in the contents of Gn. It is important to realize that the distinction between ! Gn use m and n individuals in the current population to generate the functions. the new population, respectively, although the new population contains infinite number of individuals. In short, EAm and EAn are the EA instantiated with different population sizes. Mathematically, the corresponding population dynamics are modeled by stochas- tic processes involving different operators, even though their domains and ranges may be the same. The same conclusion also holds for the operator H. ! Gm and 3.3 Convergence of IPMs Given the framework modeling the EA and IPMs, first, we define convergence in distri- bution for random elements of S∞. This is standard material. Then, the convergence of IPMs is defined by requiring that the sequence (Q1 for every k ∈ N. k, . . .) converges to Q∞ k, Q2 k 3.3.1 Convergence in Distribution k are random elements of S∞ As Qn , in the following we define convergence in distribu- tion for sequences of S∞ -valued random elements. Convergence in distribution is equiv- alent to weak convergence of induced probability measures of the random elements. We use the former theory because when modeling individuals and populations as random elements, the former theory is more intuitive and straightforward. The following mate- rials are standard. They contain the definition of convergence in distribution for random elements, as well as some useful definitions and theorems which are used in our analy- sis of the simple EA. Most of the materials are collected from the theorems and examples in Sections 1–3 of Billingsley (1999). The definition of Prokhorov metric is collected from Section 11.3 in Dudley (2002). Let x, y, xn, n ∈ N+ be random elements defined on a hidden probability space ((cid:4), F, P) taking values in some separable metric space T. T is coupled with the Borel σ -field T . Let (T(cid:6), T (cid:6)) be a separable measurable space other than (T, T ). Definition 3 (Convergence in distribution): If the sequence (xn) that E [h(xn)] → E [h(x)] for every bounded, continuous function h : T → R, we say (xn) d→ x. converges in distribution to x, and write xn ∞ n=1 satisfies the condition ∞ n=1 For (cid:6) > 0, let A(cid:6) = {y ∈ T : D(X, sì) < (cid:6) for some x ∈ A}. Then it is well known that convergence in distribution on separable metric spaces can be metricized by the Prokhorov metric. Definition 4 (Prokhorov metric): For two random elements x and y, the Prokhorov metric is defined as ρd(x, y) = inf{(cid:6) > 0 : P(x ∈ A) ≤ P(y ∈ A(cid:6) ) + (cid:6), ∀A ∈ T }.
Call a set A in T an x-continuity set if P(x ∈ ∂A) = 0, where ∂A is the boundary set
of A.
Theorem 2 (The Portmanteau theorem): The following statements are equivalent.
1. $x_n \xrightarrow{d} x$.
2. $\limsup_n P(x_n \in F) \le P(x \in F)$ for all closed sets $F \in \mathcal{T}$.
3. $\liminf_n P(x_n \in G) \ge P(x \in G)$ for all open sets $G \in \mathcal{T}$.
4. $P(x_n \in A) \to P(x \in A)$ for all $x$-continuity sets $A \in \mathcal{T}$.
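As a quick numerical illustration of item 4, take $x_n$ to be the standardized mean of $n$ uniform random variables, so that $x_n \xrightarrow{d} x$ with $x \sim N(0,1)$ by the central limit theorem (a standard example assumed here, not one used in the article), and let $A = (-\infty, 0]$, which is an $x$-continuity set.

```python
import numpy as np

rng = np.random.default_rng(2)

def prob_xn_in_A(n, trials=20_000):
    """Monte Carlo estimate of P(x_n <= 0) for x_n = standardized mean of n uniforms."""
    u = rng.uniform(0.0, 1.0, size=(trials, n))
    xn = (u.mean(axis=1) - 0.5) * np.sqrt(12.0 * n)   # standardized: mean 0, variance 1
    return np.mean(xn <= 0.0)

for n in (1, 5, 50, 500):
    print(n, prob_xn_in_A(n))   # approaches P(x <= 0) = 0.5 for x ~ N(0, 1)
```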
Theorem 3 (The mapping theorem): Suppose $h : (T, \mathcal{T}) \to (T', \mathcal{T}')$ is a measurable function. Denote by $D_h$ the set of discontinuities of $h$. If $x_n \xrightarrow{d} x$ and $P(D_h) = 0$, then $h(x_n) \xrightarrow{d} h(x)$.

Let $a, a_n$ be random elements of $T$, and $b, b_n$ be random elements of $T'$; then $(a\ b)^T$ and $(a_n\ b_n)^T$ are random elements of $T \times T'$. Note that $T \times T'$ is separable.

Theorem 4 (Convergence in distribution for product spaces): If $a$ is independent of $b$ and $a_n$ is independent of $b_n$ for all $n \in \mathbb{N}^+$, then $(a_n\ b_n)^T \xrightarrow{d} (a\ b)^T$ if and only if $a_n \xrightarrow{d} a$ and $b_n \xrightarrow{d} b$.
Theorem 4 is adapted from Theorem 2.8 (ii) in Billingsley (1999).
Let $z, z_n$, $n \in \mathbb{N}^+$ be random elements of $S^\infty$.

Theorem 5 (Finite-dimensional convergence): $z_n \xrightarrow{d} z$ if and only if $\pi_m(z_n) \xrightarrow{d} \pi_m(z)$ for any $m \in \mathbb{N}^+$.
Theorem 5 basically asserts that convergence in distribution for countably infinite
dimensional random elements can be studied through their finite-dimensional projec-
zioni. It is adapted from Example 1.2 and Example 2.4 in Billingsley (1999). In Billings-
ley (1999), the metric space under consideration is R∞
. Tuttavia, as both R and S are
separable, it is not difficult to adapt the proofs for R∞
to a proof for Theorem 5. Note
that $\pi_m(z)$ are random elements defined on $(\Omega, \mathcal{F}, P)$ taking values in $(S^m, \mathcal{S}^m)$, and
$P[\pi_m(z) \in A] = P(z \in A \times S \times S \times \cdots)$ for every $A \in \mathcal{S}^m$. The same is true for $\pi_m(z_n)$.
3.3.2 Convergence of IPM
As convergence in distribution is properly defined, we can use the theory to define
convergence of IPMs. The idea is that IPM is convergent (thus justifiable) if and only if it
can predict the limit distribution of the population dynamics of EAn for every generation
k ∈ N as the population size n goes to infinity. It captures the limiting behaviors of real
EAs.
Definition 5 (Convergence of IPMs): An infinite population model EA$^\infty$ is convergent if and only if for every $k \in \mathbb{N}$, $Q^n_k \xrightarrow{d} Q^\infty_k$ as $n \to \infty$, where $Q^n_k$, $Q^\infty_k$ and the underlying $P^n_k$, $P^\infty_k$ are generated according to (26), (28), (24), and (25).
Definition 5 is essentially the core of our proposed framework. It defines the con-
vergence of IPM and is rigorous and clear.
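Definition 5 cannot be verified exactly by simulation, but Theorem 5 suggests a finite sanity check: compare the empirical distribution of a fixed coordinate of $Q^n_k$ across increasing population sizes $n$. The sketch below does this for a toy simple EA; the objective, the mutation scale, and the use of the $n = 400$ run as a stand-in for the $n \to \infty$ limit are all assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
g = lambda x: 1.0 + np.exp(-0.5 * x**2)   # placeholder objective (bounded, positive)

def first_individual_after_k(n, k, runs=3000, sigma_w=0.3):
    """Empirical samples of the first individual of generation k of the simple EA
    (proportionate selection + Gaussian mutation) with population size n."""
    out = np.empty(runs)
    for r in range(runs):
        pop = rng.normal(size=n)                           # i.i.d. initial population
        for _ in range(k):
            w = g(pop); w = w / w.sum()
            pop = pop[rng.choice(n, size=n, p=w)] + rng.normal(0.0, sigma_w, n)
        out[r] = pop[0]
    return out

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical distributions."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

k = 3
reference = first_individual_after_k(400, k)   # proxy for the n -> infinity limit
for n in (2, 10, 50, 200):
    # distances should shrink toward the Monte Carlo noise level as n grows
    print(n, ks_two_sample(first_individual_after_k(n, k), reference))
```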
3.4 Summary
In this section, we built a framework to analyze the convergence of IPMs. The most
significant feature of the framework is that we model the populations as random se-
quences, thereby unifying the ranges of the random elements in a common metric space.
Then, we gave a rigorous definition for the convergence of IPMs based on the theory of
convergence in distribution.
Our framework is general. It only requires that operators produce c.i.i.d. individu-
als. In fact, any EA and IPM satisfying this assumption can be put into the framework.
Tuttavia, to obtain meaningful results, the convergence of IPMs has to be proved. Questo
may require extra analyses on IPM and the inner mechanisms of the operators. These
analyses are presented in Sections 4 E 5.
Finally, there is one question worth discussing. In our framework, the expansion of
operator is carried out by padding the finite population with c.i.i.d. individuals follow-
ing the same marginal distribution. Then a question naturally arises: why not pad the
finite population with some other random elements, or just with the constant 0? Questo
idea deserves consideration. After all, if the expansion is conducted by padding 0s, the requirement of c.i.i.d. can be discarded, and the framework and the convergence of IPMs stay the same. However, we did not choose this approach. The reason is that padding the population with c.i.i.d. individuals facilitates the analysis of the IPM. For example, in our analysis in Sections 4 and 5, the sufficient conditions for the convergence of IPMs require us to consider Φm(Q^n_k), where Φ is the operator under analysis. Φm uses the first m elements of Q^n_k to generate new individuals. Now if m > n and Q^n_k is expanded from P^n_k by padding 0s, Φm(Q^n_k) does not make sense, because the m individuals used by Φm include (m − n) 0s. This restricts our options in proving the convergence of IPMs.
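The difference between the two padding choices can be made concrete with a small numerical sketch. The following toy example is ours (it assumes S = R^2, a hypothetical Gaussian marginal with mean 3, and an m-ary operator that simply averages its first m inputs); it only illustrates why zero-padding distorts what Φm sees, whereas c.i.i.d. padding does not.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2          # dimension of S = R^d (assumed)
n, m = 50, 80  # finite population size n, operator arity m > n

# Hypothetical marginal distribution of the population: N(3, I_d).
population = rng.normal(3.0, 1.0, size=(n, d))

# Expansion by padding with c.i.i.d. individuals from the same marginal.
expanded_iid = np.vstack([population, rng.normal(3.0, 1.0, size=(m - n, d))])

# Expansion by padding with zeros (the alternative discussed above).
expanded_zero = np.vstack([population, np.zeros((m - n, d))])

# A toy m-ary operator that averages its first m inputs (a stand-in for Phi_m).
def phi_m(pop):
    return pop[:m].mean(axis=0)

print("with c.i.i.d. padding:", phi_m(expanded_iid))   # close to the true mean (3, 3)
print("with zero padding   :", phi_m(expanded_zero))   # pulled toward 0 by the (m - n) zeros
```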
4 Sufficient Conditions for Convergence of IPMs and I.I.D. IPM Construction
In this section, we give applicable sufficient conditions for the convergence of IPMs. Section 4.1 presents the conditions themselves. To appreciate why they are needed, consider the framework in Figure 2. To prove the convergence of an IPM, by Definition 5, we should check whether Q^n_k d→ Q^∞_k as n → ∞ for every k ∈ N. However, this direct approach is usually not viable. To manually check the convergence for all values of k is wearisome and sometimes difficult. This is because as k increases, the distributions of Q^n_k and Q^∞_k change. Therefore, the method needed to prove Q^n_k d→ Q^∞_k as n → ∞ may be different from the method needed to prove Q^n_{k+1} d→ Q^∞_{k+1} as n → ∞. Of course, after proving the cases for several values of k, it may be possible to discover some patterns in the proofs, which can be extended to cover other values of k, thus proving the convergence of the IPM. But this process is still tedious and uncertain.
In view of this, a "smarter" way to prove the convergence of the IPM may be the following method. First, the convergence of the IPM for one iteration step of each operator is proved. Then, the results are combined and extended to cover the whole population dynamics. The idea is that if the convergence holds for one generation number k, then it can be passed on automatically to all subsequent generations. For example, in Figure 2, consider the operators G∞ and Gn ◦ πn. The first step is to prove that

if Q^n_k d→ Q^∞_k as n → ∞, then Q′^n_k d→ Q′^∞_k as n → ∞.   (29)

In other words, G∞ can model Gn ◦ πn for one iteration step. Then, after obtaining similar results for H∞ and Hn ◦ πn, we combine the results together and the convergence of the overall IPM is proved.

However, this approach still seems difficult because we have to prove that this pass-on relation (29) holds for every k. In essence, this corresponds to whether the operators in the IPM can be stacked together and iterated for any number of steps. This is the issue of the stacking of operators and iterating the algorithm. Therefore, in Section 4.1, we give sufficient conditions for this to hold. These conditions are important. If they hold, proving the convergence of the overall IPM can be broken down to proving the convergence of one iteration step of each operator in the IPM. This greatly reduces the difficulty in deriving the proof.
To model real EAs, IPM has to be constructed reasonably. As shown in Section 2,
exchangeability cannot yield the transition equation for the simple EA. This creates the
research problem of finding a suitable modeling assumption to derive IPM. Therefore,
in Section 4.2, we discuss the issue and propose to use i.i.d. as the modeling assumption
in IPM.
4.1 Sufficient Conditions for Convergence of IPMs
To derive sufficient conditions for the convergence of the overall IPM, the core step is to derive conditions under which the operators in the IPM can be stacked and iterated.

As before, let EAn and EA∞ denote the EA with population size n and the IPM under analysis, respectively. Let Φ be an operator in the EA, and Φn : M∞ → M∞ and Φ∞ : M∞ → M∞ be its corresponding expanded operators in EAn and EA∞, respectively. Note that Φn and Φ∞ generate random elements of S∞. To give an example, Φn and Φ∞ may correspond to πn ◦ Gn and G∞ in Figure 2, respectively.

We define a property under which Φ∞ can be stacked with some other operator Ψ∞ satisfying the same property without affecting the convergence of the overall IPM. In other words, for an EA using Ψ and Φ as its operators, we can prove the convergence of the IPM by studying Ψ and Φ separately. We call this property "the stacking property." It is worth noting that if Ψ = Φ, then this property guarantees that Φ∞ can be iterated any number of times. Therefore, it also resolves the issue of iterating the algorithm.

Let Aα be random elements in M∞ for α ∈ N+ ∪ {∞}. We have the following results.

Definition 6 (The stacking property): Given U ⊂ M∞, if for any converging sequence An d→ A∞ ∈ U, Φn(An) d→ Φ∞(A∞) ∈ U as n → ∞ always holds, then we say that Φ∞ has the stacking property on U.

Theorem 6: If Ψ∞ and Φ∞ have the stacking property on U, then Ψ∞ ◦ Φ∞ has the stacking property on U.

Proof: For any converging sequence An d→ A∞ ∈ U ⊂ M∞, because Φ∞ has the stacking property on U, we have Φn(An) d→ Φ∞(A∞) ∈ U. Then, (Φn(An))n is also a converging sequence. Since Ψ∞ has the stacking property on U, by definition we immediately have (Ψn ◦ Φn)(An) d→ (Ψ∞ ◦ Φ∞)(A∞) ∈ U. □
By Theorem 6, any composition of Ψ∞ and Φ∞ has the stacking property on U. In particular, (Φ∞)^m has the stacking property on U. The stacking property essentially guarantees that the convergence on U can be passed on to subsequent generations.
Theorem 7 (Sufficient condition 1): For an EA consisting of a single operator Φ, let Φ be modeled by Φ∞ in the IPM EA∞, and let Φ∞ have the stacking property on some space U ⊂ M∞. If the initial populations of both the EA and EA∞ follow the same distribution PX for some X ∈ U, then EA∞ converges.

Proof: Note that for EAn and EA∞, the kth populations they generate are (Φn)^k(X) and (Φ∞)^k(X), respectively. By Theorem 6, (Φ∞)^k has the stacking property on U. Because the sequence (X, X, . . .) converges to X ∈ U, by Definition 6, (Φn)^k(X) d→ (Φ∞)^k(X) ∈ U as n → ∞. Since this holds for any k ∈ N, by Definition 5, EA∞ converges. □
By Theorems 6 and 7, we can prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property. Comparing with (29), it is clear that the stacking property is a sufficient condition. This is because the stacking property requires that (Φn(An))n converges to a point in U for any converging sequence (An)n satisfying An d→ A∞ ∈ U, while (29) requires the convergence to hold only for the specific converging sequence (Q^n_k)n. Since (Q^n_k)n is generated by the algorithm, it may have special characteristics regarding convergence rate, distributions, etc. On the other hand, checking the stacking property may be easier than proving (29). This is because the stacking property is independent of the generation number k.
Another point worth discussing is the introduction of U in Definition 6. Obviously, if we omit U (or equivalently let U = M∞), the stacking property becomes "stronger," because if it holds, the convergence of the IPM is proved for the EA starting from any initial population. However, in that case the condition is so restrictive that the stacking property cannot be proved for many operators.
In Definition 6, it is required that Φn(An) d→ Φ∞(A∞) ∈ U as n → ∞. The sequence under investigation is (Φn(An))n, which is a sequence of changing operators (Φn)n on a sequence of changing inputs (An)n. As both the operators and the inputs change, the convergence of (Φn(An))n may still be difficult to prove. Therefore, in the following, we further derive two sufficient conditions for the stacking property.

First, let Bα,β = Φβ(Aα), where α, β ∈ N+ ∪ {∞}. Then, we have the following sufficient conditions for the stacking property.
Theorem 8 (Sufficient condition 2): For a space U and all converging sequences An d→ A∞ ∈ U, if the following two conditions

1. ∃M ∈ N+ such that for all m > M, Bn,m d→ B∞,m uniformly as n → ∞, i.e. sup_{m>M} ρd(Bn,m, B∞,m) → 0 as n → ∞,
2. B∞,m d→ B∞,∞ ∈ U as m → ∞,

are both met, then Φ∞ has the stacking property on U.

Theorem 9 (Sufficient condition 3): For a space U and all converging sequences An d→ A∞ ∈ U, if the following two conditions

1. ∃N ∈ N+ such that for all n > N, Bn,m d→ Bn,∞ uniformly as m → ∞, i.e. sup_{n>N} ρd(Bn,m, Bn,∞) → 0 as m → ∞,
2. Bn,∞ d→ B∞,∞ ∈ U as n → ∞,

are both met, then Φ∞ has the stacking property on U.
Since Theorems 8 and 9 are symmetric in m and n, proving one of them leads to the other. In the following, we prove Theorem 8. Recall that ρd is the Prokhorov metric (Definition 4) and ∨ denotes the maximum of its operands.

Proof: ∀ε > 0, by condition 1 in Theorem 8, ∃N s.t. sup_{m>M} ρd(Bn,m, B∞,m) < ε/2 for all n > N. By condition 2 in Theorem 8, ∃M′ s.t. ρd(B∞,m, B∞,∞) < ε/2 for all m > M′. Now for all l > M ∨ N ∨ M′,

ρd(Bl,l, B∞,∞) ≤ ρd(Bl,l, B∞,l) + ρd(B∞,l, B∞,∞) ≤ ε/2 + ε/2 = ε.

Therefore, Bn,n d→ B∞,∞ as n → ∞. □
To understand these two theorems, consider the relationships between Aα and Bα,β illustrated by Figure 3. In the figure, the solid arrow represents the premise in Definition 6, that is, An d→ A∞ ∈ U as n → ∞. The double-line arrow represents the direction to be proved for the stacking property on U, i.e., Bn,n d→ B∞,∞ ∈ U as n → ∞. The dashed arrows are the directions to be checked for Theorem 8 to hold. The wavy arrows are the directions to be checked for Theorem 9 to hold.
Figure 3: Relationships between Aα and Bα,β.

Now it is clear that Theorems 8 and 9 bring benefits. For example, for Theorem 8, instead of proving the convergence for a sequence generated by changing operators and inputs (Bn,n d→ B∞,∞), this sufficient condition considers the convergence of sequences generated by changing operators on the same input (B∞,m d→ B∞,∞) and of the sequence generated by the same operator on changing inputs (Bn,m d→ B∞,m).

The reason we introduce M and N in Theorems 8 and 9, respectively, is to exclude some of the starting columns and rows in Figure 3, if necessary. This is useful in proving the convergence of the IPM of the k-ary recombination operator.
4.2 The I.I.D. Assumption
In this section, we address the issue of how to construct IPM. This issue also corresponds
to how to choose the space U for the stacking property.
Before introducing the i.i.d. assumption, let us give an example. Consider the space U = {x ∈ M∞ | P[x = (c, c, . . .)] = 1 for some c ∈ S}. If the initial population follows some distribution from U, then the population consists of all identical individuals. If an EA with proportionate selection and crossover operates on this initial population, then all subsequent populations stay the same as the initial population. An IPM of this EA can be easily constructed, and it can be easily proved that the stacking property holds as long as the EA chooses its initial population from U. However, this is not a very interesting case. This is because U is too small to model real EAs.

On the other hand, if U = {x ∈ M∞ | x is exchangeable}, U may be too big to derive meaningful results. This can be seen from our analysis in Section 2, which shows that under exchangeability it is not possible to derive transition equations of marginal distributions for the simple EA.
Therefore, choosing U should strike a balance between the capacity and the complexity of the IPM. In the following analysis, we choose U to be U_I = {x ∈ M∞ | x is i.i.d.}.
IPMs of EAs are constructed using the i.i.d. assumption, and we prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property on U_I.
We choose U_I for the following reasons. First, in the real world, many EAs generate i.i.d. initial populations; therefore this assumption is realistic. Second, i.i.d. random elements have the same marginal distribution; therefore, the IPM can be described by transition equations of marginal distributions. Finally, there is abundant literature on the convergence laws and limit theorems of i.i.d. sequences; therefore, the difficulty in constructing the IPM can be greatly reduced compared with using other modeling assumptions.
In the following, we show how to construct the IPM under the i.i.d. assumption. This process also relates to condition 2 in Theorem 8. It essentially describes how the IPM generates new populations.

Let the operator in the EA be Φ, and the corresponding operator in EAm be Φm : M∞ → M∞. Recall that in our framework we only study EAs consisting of c.i.i.d. operators; therefore Φm generates c.i.i.d. outputs by using the first m elements of its input. The process by which Φm generates each output can be described by the conditional p.d.f. fΦm(x | y1, y2, . . . , ym). Let a = (a1, a2, . . .) ∈ M∞ be the input and b = (b1, b2, . . .) = Φm(a) be the output. Then the distribution of b can be completely described by its finite-dimensional p.d.f.s

fπl(b)(x1, . . . , xl) = ∫_{S^m} ∏_{i=1}^{l} fΦm(xi | y1, . . . , ym) · fπm(a)(y1, . . . , ym) dy1 · · · dym   (30)

for every l ∈ N+.

To derive the IPM Φ∞ for Φ, consider the case where l = 1 and a ∈ U_I in (30). Noting that in this case fπm(a)(y1, . . . , ym) = ∏_{i=1}^{m} fa1(yi), we have

fb1(x) = ∫_{S^m} fΦm(x | y1, . . . , ym) · ∏_{i=1}^{m} fa1(yi) dy1 · · · dym.   (31)

Now taking m → ∞, (31) in the limit becomes the transition equation describing how Φ∞ generates each new individual. Let the transition equation be

fb1 = TΦ[fa1],   (32)

and let c = (c1, c2, . . .) = Φ∞(a). Then how Φ∞ generates l individuals can be described by the finite-dimensional p.d.f.s of c:

fπl(c)(x1, . . . , xl) = ∏_{i=1}^{l} TΦ[fa1](xi)   (33)

for every l ∈ N+. Overall, (33) describes the mapping Φ∞ : U_I → U_I.
To better understand the construction, it is important to realize that for Φ∞ both the input and the output are i.i.d. In other words, Φ∞ generates i.i.d. population dynamics to simulate the real population dynamics produced by Φ, only that the transition equation in Φ∞ is derived by mimicking how Φ generates each new individual on i.i.d. inputs and taking the population size to infinity. In fact, if the stacking property on U_I is proved and the initial population is i.i.d., Φ∞ will always take i.i.d. inputs and produce i.i.d. outputs. The behaviors of Φ∞ on U_I are well-defined. On the other hand, Φ∞(A ∉ U_I)
is not defined in the construction. This leaves us freedom. We can define Φ∞(A ∉ U_I) freely to facilitate proving the stacking property of Φ∞. In particular, Bn,∞ for n ∈ N+ in Figure 3 can be defined freely to facilitate the analysis.
In fact, under the i.i.d. assumption, deriving the transition equation for most operators is the easy part. The more difficult part is to prove the stacking property of Φ∞ on U_I. To give an example, consider the transition equation (5) constructed in Qi and Palmieri (1994a), which models the joint effects of proportionate selection and mutation. As our analysis in Section 2 shows, it does not hold under the assumption of exchangeability. However, if the modeling assumption is i.i.d., the transition equation can be immediately proved (see our analysis in Section 2). This also applies to the transition equation built by the same authors for the uniform crossover operator (in Theorem 1 of Qi and Palmieri, 1994b), where the transition equation is in fact constructed under the i.i.d. assumption. Therefore, in the following analyses, we do not refer to the explicit form of the transition equation unless it is needed. We only assume that the transition equation is successfully constructed, and that it has the form (32), which is derived from (31) as m → ∞.
The construction of the IPM also relates partly to condition 2 in Theorem 8. Comparing with this condition, it can be seen that for a successfully constructed Φ∞, the following two facts are proved in the construction (m.p.w. stands for point-wise convergence of marginal p.d.f.s):

1. B∞,m m.p.w.−→ B∞,∞ as m → ∞.
2. B∞,∞ ∈ U_I.

Obviously, these two facts are not sufficient for this condition to hold. One still needs to prove B∞,m d→ B∞,∞ as m → ∞. In other words, one has to consider the convergence of finite-dimensional distributions.
Finally, we sometimes write x for x1, . . . , xl and y for y1, . . . , ym when the length is clear from the context. For example, (30) can be rewritten as fπl(b)(x) = ∫_{S^m} ∏_{i=1}^{l} fΦm(xi | y) · fπm(a)(y) dy, where m is the population size on which the operator operates.
5 Analysis of the Simple EA
In this section, we use the sufficient conditions to prove the convergence of IPMs for
various simple EA operators. The operators of mutation and k-ary recombination are
readily analyzed in Sections 5.1 and 5.2, respectively. In Section 5.3, we summarize this
section and discuss our results.
5.1 Analysis of the Mutation Operator
Having derived sufficient conditions for the stacking property and constructed the IPM, we prove the convergence of the IPM of the mutation operator first. Mutation adds an i.i.d. random vector to each individual in the population. If the current population is A ∈ M∞, then the population after mutation, B = Φm(A), satisfies L(B) = L(A + X) for all m ∈ N+, where X ∈ U_I is a random element decided by the mutation operator. As the content of the mutation operator does not depend on m, we just write Φ to represent Φm. To give an example, X may be the sequence (x1, x2, . . .) with all xi ∈ M mutually independent and xi ∼ N(0, Id) for all i ∈ N+, where N(a, B) is the multivariate normal distribution with mean a and covariance matrix B, and Id is the d-dimensional identity matrix. Note that every time Φ is invoked, it generates perturbations independently.
For example, let A1 and A2 be two populations; then we can write Φ(Ai) = Ai + Xi for i = 1, 2, where L(X1) = L(X2) = L(X) and {Xi}i=1,2 are mutually independent and independent of {Ai}i=1,2.
Next, consider Φ∞. Recall that as an IPM, Φ∞ simulates real population dynamics by taking i.i.d. inputs and producing i.i.d. outputs. If the marginal p.d.f.s of A and X are fa and fx, respectively, then Φ∞(A) generates i.i.d. individuals whose p.d.f.s are fa ∗ fx, where ∗ stands for convolution. Given the construction, we can prove the stacking property of Φ∞.
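As a quick numerical illustration of the statement above (ours, with assumed one-dimensional marginals fa = N(1, 1) and fx = N(0, 0.5^2)), the empirical marginal of a mutated i.i.d. population should match the convolution fa ∗ fx, here N(1, 1.25):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100000

# Assumed marginals for illustration: f_a = N(1, 1), mutation noise f_x = N(0, 0.5^2).
population = rng.normal(1.0, 1.0, size=n)   # i.i.d. population A
noise = rng.normal(0.0, 0.5, size=n)        # i.i.d. mutation perturbations X
mutated = population + noise                # B = Phi(A) = A + X

# The convolution f_a * f_x is N(1, 1 + 0.25); compare moments empirically.
print("empirical mean/variance  :", mutated.mean(), mutated.var())
print("convolution mean/variance:", 1.0, 1.25)
```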
Theorem 10 (Mutation): Let Φ be the mutation operator, and Φ∞ be the corresponding operator in the IPM constructed under the i.i.d. assumption. Then Φ∞ has the stacking property on U_I.
Proof: We use the notations and premises in Theorem 8. Refer to Figure 3. In particular, the sequence (An) and the limit A∞ are given and An d→ A∞ ∈ U_I as n → ∞.

Apparently,

(Φ1(A∞), Φ2(A∞), . . .) = (Φ(A∞), Φ(A∞), . . .) d→ Φ(A∞) = Φ∞(A∞) ∈ U_I.

Therefore, condition 2 in Theorem 8 is satisfied.

Noting that condition 1 in Theorem 8 is equivalent to Φ(An) d→ Φ(A∞), we prove this condition by proving that πi[Φ(An)] d→ πi[Φ(A∞)] for all i ∈ N+. Then by Theorem 5, condition 1 in Theorem 8 holds. Then, as both conditions in Theorem 8 are satisfied, this theorem is proved.
Now, we prove πi[Φ(An)] d→ πi[Φ(A∞)] for all i ∈ N+. First, note that Φ(Aα) = Aα + Xα for all α ∈ N+ ∪ {∞}, where {Xα ∈ M∞} are i.i.d. and independent of {Aα ∈ M∞}. In addition, for every α, L(Xα) = L(X).

Since L(Xα) = L(X), it is apparent that Xn d→ X∞. Then by Theorem 5, we have πi(Xn) d→ πi(X∞) and πi(An) d→ πi(A∞).

Consider the product space S^i × S^i. It is both separable and complete. Since πi(Aα) and πi(Xα) are independent, by Theorem 4, it follows that

(πi(An), πi(Xn))T d→ (πi(A∞), πi(X∞))T.   (34)

Note that

πi[Φ(Aα)] = [I I] (πi(Aα), πi(Xα))T = h((πi(Aα), πi(Xα))T),   (35)

where I is the identity matrix of appropriate dimension and h : S^i × S^i → S^i is the function satisfying h((x, y)T) = [I I](x, y)T = x + y. Apparently h is continuous. Then by (34), (35), and Theorem 3, πi[Φ(An)] d→ πi[Φ(A∞)] for any i ∈ N+. □
In the proof, we concatenate the input (An) and the randomness (Xn) of the mutation operator in a common product space, and represent Φ as a continuous function in that space. This technique is also used when analyzing other operators.
5.2 Analysis of k-ary Recombination
Consider the k-ary recombination operator and denote it by Φ. In EAm, the operator is denoted by Φm. Φm works as follows. To generate a new individual, it first samples k individuals from the current m-sized population randomly with replacement. Assume
the current population consists of {x1, . . . , xm}, and the selected k parents are {y1, . . . , yk}; then {y1, . . . , yk} follows the probability

P(yi = xj) = 1/m   for all i ∈ {1, . . . , k}, j ∈ {1, . . . , m}.   (36)

After the k parents are selected, Φm produces a new individual x following the formula

x = ∑_{i=1}^{k} Ui yi,   (37)

where {U1, . . . , Uk} are random elements of R^{d×d} (recall that x and yi are random elements of S = R^d modeling individuals in our framework). {U1, . . . , Uk} are also independent of {yi}i, and the joint distribution of (Ui)i is decided by the inner mechanism of Φ. Overall, Φm generates the next population by repeatedly using this procedure to generate new individuals independently.
Our formulation may seem strange at first sight, but it covers many real-world recombination operators. For example, consider k = 2 and U1 = U2 = (1/2)I. This operator is the crossover operator taking the mean of its two parents. On the other hand, if k = 2 and the distributions of U1 and U2 satisfy

U1 = Diag(s1, s2, . . . , sd),   U2 = I − U1,

where Diag constructs a diagonal matrix from its inputs and {si} are i.i.d. random variables taking values in {0, 1} satisfying P(si = 0) = P(si = 1) = 1/2, then this operator is the uniform crossover operator, which sets the value at each position from one of the two parents with probability 1/2.
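The following short sketch (ours) instantiates the formulation (36)–(37) for the uniform-crossover case just described, with k = 2, U1 = Diag(s1, . . . , sd), U2 = I − U1, and an assumed small population in R^d:

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, k = 4, 6, 2                      # dimension, population size, arity (assumed)
population = rng.normal(size=(m, d))   # current population {x_1, ..., x_m}

def uniform_crossover_offspring(pop):
    # (36): sample k = 2 parents uniformly with replacement.
    idx = rng.integers(0, len(pop), size=k)
    y1, y2 = pop[idx[0]], pop[idx[1]]
    # Uniform-crossover instance of (37): U1 = Diag(s), U2 = I - U1, s_i ~ Bernoulli(1/2).
    s = rng.integers(0, 2, size=d)
    return s * y1 + (1 - s) * y2       # x = U1 y1 + U2 y2

offspring = np.array([uniform_crossover_offspring(population) for _ in range(3)])
print(offspring)   # each coordinate of each offspring is copied from one of its two parents
```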
Consider the IPM Φ∞. As stated in Section 4.2, we do not give the explicit form of the transition equation in Φ∞. We assume that the IPM is successfully constructed, and that the transition equation is derived by taking m → ∞ in (31). The reason for this approach is not only that deriving the transition equation is generally easier than proving the convergence of the IPM, but also that the formulation in (36) and (37) encompasses many real-world k-ary recombination operators. We do not delve into the details of the mechanisms of these operators and derive a transition equation for each one of them. Instead, our approach is general in that, as long as the IPM is successfully constructed, our analysis of the convergence of the IPM can always be applied.
The following theorem is the primary result of our analysis of the k-ary recombination operator.

Theorem 11 (k-ary recombination): Let Φ be the k-ary recombination operator, and Φ∞ be the corresponding operator in the IPM constructed under the i.i.d. assumption. Then Φ∞ has the stacking property on U_I.
Proof: We use the notations and premises in Theorem 9. Refer to Figure 3. In particular, the sequence (An) and the limit A∞ are given and An d→ A∞ ∈ U_I as n → ∞.

We prove that

πi(Bn,n) d→ πi(B∞,∞)   (38)

as n → ∞ for any i ∈ N+. Then by Theorem 5, the conclusion follows.

The overall idea to prove (38) is that we first prove convergence in distribution for the k · i selected parents; then, because the recombination operator is continuous, (38) follows.
First, we decompose the operator πi ◦ Φm : M∞ → M^i. πi ◦ Φm generates the i c.i.i.d.
outputs one by one. This generation process can also be viewed as first selecting the i groups of k parents at once from the first m elements of the input (in total the intermediate output is k · i parents, not necessarily distinct), and then producing the i outputs one by one by using each group of k parents. In the following, we describe this process mathematically.

Consider Ξm : M∞ → M^{k·i}. Let x = (x1, x2, . . .) ∈ M∞ and y = (y1, . . . , yk·i) = Ξm(x). Let Ξm be described by the probability

P(yj = xl) = 1/m   for all j ∈ {1, . . . , k · i} and l ∈ {1, . . . , m}.   (39)

In essence, Ξm describes how to select the k · i parents from x.
Consider Ψ : M^{k·i} → M^i. Let

u = (u1,1, u1,2, . . . , u1,k, u2,1, u2,2, . . . , u2,k, . . . , ui,1, ui,2, . . . , ui,k) ∈ M^{k·i}.

Let v = (v1, . . . , vi) = Ψ(u). Let Ψ be described by

vj = ∑_{l=1}^{k} Uj,l uj,l   for all j ∈ {1, . . . , i},   (40)

in which L[(Uj,1, . . . , Uj,k)] = L[(U1, . . . , Uk)], where {Ul} are decided by the recombination operator Φ as in (37), and (Uj,1, . . . , Uj,k) are independent for different j. In essence, Ψ describes how to generate the i individuals from the k · i parents.

Now it is obvious that πi ◦ Φm = Ψ ◦ Ξm. Therefore,

πi(Bm,α) = (πi ◦ Φm)(Aα) = (Ψ ◦ Ξm)(Aα)   (41)

for all m ∈ N+ and α ∈ N+ ∪ {∞}.

Next, consider πi ◦ Φ∞ : M∞ → M^i. Let Ξ∞ = πk·i; we prove that

L[(πi ◦ Φ∞)(A)] = L[(Ψ ◦ Ξ∞)(A)],   ∀A ∈ U_I.   (42)
Eq. (42) is almost obvious because both operators generate i.i.d. outputs, and the marginal p.d.f.s of both outputs follow the same distribution, decided by Ψ acting on k i.i.d. parents from A. In other words, Ψ ◦ Ξ∞ is a model of πi ◦ Φ∞ on i.i.d. inputs. The outputs they generate on the same i.i.d. input follow the same distribution.

Since A∞ ∈ U_I, by (42),

L[πi(B∞,∞)] = L[(πi ◦ Φ∞)(A∞)] = L[(Ψ ◦ Ξ∞)(A∞)].   (43)

Then (38) is equivalent to

(Ψ ◦ Ξn)(An) d→ (Ψ ◦ Ξ∞)(A∞)   (44)

as n → ∞ for any i ∈ N+.

To prove (44), we prove the following two conditions.
1. ∃N ∈ N+ such that for all n > N, Ξm(An) d→ Ξ∞(An) uniformly as m → ∞; that is, sup_{n>N} ρd[Ξm(An), Ξ∞(An)] → 0 as m → ∞.
2. Ξ∞(An) d→ Ξ∞(A∞) as n → ∞ and Ξ∞(A∞) is i.i.d.
These two conditions correspond to the conditions in Theorem 9. Since Ξα is a mapping from M∞ to M^{k·i}, we cannot directly apply Theorem 9. However, it is easy to extend the proof of Theorem 9 to prove that these two conditions lead to Ξn(An) d→ Ξ∞(A∞) as n → ∞. Then, by (40) it is apparent that Ψ is a continuous function of its input and inner randomness. By concatenating the input and the inner randomness using the same technique as that used in the proof of Theorem 10, (44) can be proved. Then this theorem is proved. □

In the remainder of the proof, we prove conditions 1 and 2. These conditions can be understood by replacing the top line in Figure 3 with Ξm.
Proof of Condition 2
Since Ξ∞ = πk·i : S∞ → S^{k·i} (recall that πk·i can be viewed both as a mapping from S∞ to S^{k·i} and from M∞ to M^{k·i}), Ξ∞ is continuous (see Example 1.2 in Billingsley, 1999). Since An d→ A∞, by Theorem 3, Ξ∞(An) d→ Ξ∞(A∞). Apparently, Ξ∞(A∞) is i.i.d. Therefore condition 2 is proved.

It is worth noting that this simple proof comes partly from our extension of Ψ ◦ Ξ∞ to inputs A ∉ U_I. In fact, the only requirement for Ξ∞ is (42); that is, Ψ ◦ Ξ∞ should model πi ◦ Φ∞ on i.i.d. inputs. By defining Ξ∞ to be πk·i, it can take non-i.i.d. inputs such as An. Thus this condition can be proved. In Figure 3, this corresponds to our freedom of defining Bn,∞, n ∈ N+.
Proof of Condition 1
To prove condition 1, we first give another representation of Ξm(Aα), where m > k · i and α ∈ N+ ∪ {∞}. This representation is based on the following mutually exclusive cases.

1. The k · i parents chosen from Aα by Ξm are distinct.
2. There are duplicates among the k · i parents chosen from Aα by Ξm.

Let sm,α be random variables taking values in {0, 1}, with probability

p(m) = P(sm,α = 1) = P(Ξm chooses k · i distinct parents from Aα) = m(m − 1) · · · (m − k · i + 1)/m^{k·i}.   (45)

Let xm,α ∈ M^{k·i} follow the conditional distribution of the k · i parents when sm,α = 1, and ym,α ∈ M^{k·i} follow the conditional distribution of the k · i parents when sm,α = 0. Then Ξm(Aα) can be further represented as

Ξm(Aα) = sm,α · xm,α + (1 − sm,α) · ym,α.   (46)

For our purpose, it is not necessary to explicitly describe the distributions of xm,α and ym,α. The only useful fact is that, by the exchangeability of Aα,

L(xm,α) = L[Ξ∞(Aα)].   (47)

To put it another way, xm,α and Ξ∞(Aα) both follow the same distribution of k · i distinct individuals from the current exchangeable population Aα. Also note that {sm,α}α are i.i.d. random variables. They are independent of xm,α and ym,α.
Now consider P[Ξm(An) ∈ A] for any A ∈ S^{k·i}. By conditioning on whether the k · i parents are distinct, we have

P[Ξm(An) ∈ A] = p(m) · P(xm,n ∈ A) + [1 − p(m)] · P(ym,n ∈ A).

Then by (47),

P[Ξm(An) ∈ A] − P[Ξ∞(An) ∈ A] = [p(m) − 1] · P[Ξ∞(An) ∈ A] + [1 − p(m)] · P(ym,n ∈ A).   (48)

Since p(m), P[Ξ∞(An) ∈ A] and P(ym,n ∈ A) are all less than or equal to 1,

p(m) − 1 ≤ [p(m) − 1] · P(Ξ∞(An) ∈ A) ≤ P[Ξm(An) ∈ A] − P[Ξ∞(An) ∈ A] ≤ [p(m) − 1] · P(Ξ∞(An) ∈ A) + [1 − p(m)] ≤ 1 − p(m),

i.e. |P[Ξm(An) ∈ A] − P[Ξ∞(An) ∈ A]| ≤ 1 − p(m) for all A. Taking the supremum over all A, we have

sup_{A∈S^{k·i}} |P[Ξm(An) ∈ A] − P[Ξ∞(An) ∈ A]| ≤ 1 − p(m).   (49)

The left-hand side of (49) is the total variation distance between Ξm(An) and Ξ∞(An). It is an upper bound of the Prokhorov distance (see Gibbs and Su, 2002, for its definition and properties). Since the bound 1 − p(m) is uniform with respect to n and p(m) → 1 as m → ∞, we have

sup_n ρd[Ξm(An), Ξ∞(An)] ≤ 1 − p(m) → 0 as m → ∞.   (50)

This is exactly condition 1. Therefore this theorem is proved.
Alternatively, if we do not want to use the total variation distance, we have the following result for any Ξ∞(A∞)-continuity set A ∈ S^{k·i}:

|P[Ξn(An) ∈ A] − P[Ξ∞(A∞) ∈ A]| ≤ |P[Ξn(An) ∈ A] − P[Ξ∞(An) ∈ A]| + |P[Ξ∞(An) ∈ A] − P[Ξ∞(A∞) ∈ A]| ≤ 1 − p(n) + |P[Ξ∞(An) ∈ A] − P[Ξ∞(A∞) ∈ A]|.   (51)

Since we already proved Ξ∞(An) d→ Ξ∞(A∞), by statement 4 in Theorem 2, |P[Ξ∞(An) ∈ A] − P[Ξ∞(A∞) ∈ A]| → 0. Then apparently (51) converges to 0. Noting that A is arbitrary, by applying statement 4 in Theorem 2 again, Ξn(An) d→ Ξ∞(A∞) is proved. □
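The key quantity in the bound (49)–(50) is p(m) from (45). A quick numerical check (ours; k and i are assumed example values) confirms that for a fixed number k · i of sampled parents, p(m) → 1 as m → ∞, so the uniform bound 1 − p(m) indeed vanishes:

```python
from math import prod

def p_distinct(m, ki):
    """Probability (45) that k*i parents sampled uniformly with replacement
    from m individuals are all distinct."""
    return prod((m - j) / m for j in range(ki))

ki = 6  # e.g. k = 2 parents per offspring and i = 3 offspring (assumed values)
for m in (10, 100, 1000, 10000):
    p = p_distinct(m, ki)
    print(f"m = {m:6d}   p(m) = {p:.4f}   bound 1 - p(m) = {1 - p:.4f}")
```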
We give a brief discussion of the proof. In our opinion, the most critical step of our proof is decomposing the k-ary recombination operator into two suboperators: one responsible for selecting parents (Ξ), the other responsible for combining them (Ψ). Moreover, for parent selection, the suboperator does not use the information of fitness values. Rather, it selects parents "blindly" according to its own rule (uniform sampling with replacement). This makes the operator Ξ easier to analyze, because the way it selects parents does not rely on its input. Therefore, we can prove the uniform convergence in (50).
Another point worth mentioning is the choice of Theorem 9 in our proof. Though Theorems 8 and 9 are symmetric, the difficulties of proving them are quite different. In fact, it is very difficult to prove the uniform convergence condition in Theorem 8.
Finally, our proof can be easily extended to cover k-ary recombination operators that use uniform sampling without replacement to select parents for each offspring. The overall proof framework roughly stays the same.
5.3 Summary
In this section, we analyzed the simple EA within the proposed framework. As the analysis shows, although the convergence of an IPM is rigorously defined, actually proving the convergence for operators usually takes a lot of effort. We carried out the analysis under the IPM construction and sufficient conditions from Section 4, and used various techniques to analyze the mutation operator and the k-ary recombination operator. It can be seen that although the sufficient conditions provide general directions for the proofs, there are still many details to be worked out in order to analyze different operators.
To appreciate the significance of our work, it is worth noting that in Qi and Palmieri (1994a,b), the convergence of the IPMs of the mutation operator, the uniform crossover operator, and the proportionate selection operator was not properly proved, and the issue of stacking of operators and iterating the algorithm was not addressed at all. In this article, however, we have proved the convergence of IPMs of several general operators. Since these general operators cover the operators studied in Qi and Palmieri (1994a,b) as special cases, the convergence of the IPMs of mutation and uniform crossover is actually proved in this article. Besides, our proof does not depend on the explicit form of the transition equation of the IPM. As long as the IPM is constructed under the i.i.d. assumption, our proof is valid.

As a consequence of our result, consider the explicit form of the transition equation for the uniform crossover operator derived in Section II of Qi and Palmieri (1994b). As the authors' proof was problematic and incomplete, the derivation of the transition equation was not well founded. However, it can be seen that the authors' derivation is in fact equivalent to constructing the IPM under the i.i.d. assumption. Since we have already proved the convergence of the IPM of the k-ary recombination operator, the analysis in Qi and Palmieri (1994b) regarding the explicit form of the transition equation can be retained.
6 Conclusion and Future Research
In this article, we revisited the existing literature on the theoretical foundations of IPMs,
and proposed an analytical framework for IPMs based on convergence in distribution
for random elements taking values in the metric space of infinite sequences. Under the
framework, commonly used operators such as mutation and recombination were ana-
lyzed. Our approach and analyses are new. There are many topics worth studying for
future research.
Perhaps the most immediate topic is to analyze the proportionate selection operator
in our framework. The reason that the mutation operator and the k-ary recombination
operator can be readily analyzed is partly because they do not use the information of
the fitness value. Also, to generate a new individual, these operators draw information from a fixed number of parents. On the other hand, to generate each new individual, the proportionate selection operator actually gathers and uses the fitness values of the whole population. This makes analyzing proportionate selection difficult.
We think further analysis on proportionate selection can be conducted in the fol-
lowing two directions.
1. In the analyses we tried to prove the stacking property on U_I for the IPM of proportionate selection. Apart from more efforts trying to prove/disprove this property, it is worth considering modifying the space U_I. For example, we can incorporate the rate of convergence into the space. If we can prove the stacking property on U_I ∩ U, where U is the space of converging sequences with rate O(h(n)), it is also a meaningful result.

2. Another strategy is to bypass the sufficient conditions and return to Definition 5 to prove Q^n_k d→ Q^∞_k for every k. This is the original method. In essence, it requires studying the convergence of nested integrals.
Apart from proportionate selection, it is also worth studying whether other oper-
ators, such as ranking selection, can be analyzed in our framework. As many of these
operators do not generate c.i.i.d. offspring, it makes deriving the IPM and proving its
convergence difficult, if not impossible. In this regard, we believe new techniques of
modeling and extensions of the framework are fruitful directions for further research.
Finally, it is possible to extend the concept of "incidence vectors" proposed by Vose to the continuous search space. After all, as noted by Vose himself, incidence vectors can also be viewed as marginal p.d.f.s of individuals. As a consequence, the cases of EAs on discrete and continuous solution spaces indeed bear some resemblance. By an easy extension, the incidence vectors in the continuous space can be defined as functions of the form ∑ ci δ(xi), where δ is the Dirac function and ci is the rational number representing the fraction of times xi appears in the population. If similar analyses based on this extension can be carried out, many results in Nix and Vose (1992) and Vose (1999a,b, 2004) can be extended to the continuous space.
References
Billingsley, P. (1999). Convergence of probability measures. 2nd ed. New York: John Wiley & Sons.
Dudley, R. M. (2002). Real analysis and probability. 2nd ed. Cambridge: Cambridge University
Press.
Gibbs, A. L., and Su, F. E. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3): 419–435.
Kallenberg, O. (2002). Foundations of modern probability. 2nd ed. New York: Springer.
Nix, A., and Vose, M. D. (1992). Modeling genetic algorithms with Markov chains. Annals of Math-
ematics and Artificial Intelligence, 5(1): 79–88.
Port, S. C. (1994). Theoretical probability for applications. New York: John Wiley & Sons.
Qi, X., and Palmieri, F. (1994UN). Theoretical analysis of evolutionary algorithms with an infinite
population size in continuous space. Part I: Basic properties of selection and mutation. IEEE
Transactions on Neural Networks, 5(1): 102–119.
Qi, X., and Palmieri, F. (1994B). Theoretical analysis of evolutionary algorithms with an infinite
population size in continuous space. Part II: Analysis of the diversification role of crossover.
IEEE Transactions on Neural Networks, 5(1): 120–129.
Taylor, R. L., Daffer, P. Z., and Patterson, R. F. (1985). Limit theorems for sums of exchangeable random
variables. Totowa, NJ: Rowman & Allanheld.
Vose, M. D. (1999UN). The simple genetic algorithm: Foundations and theory. Cambridge, MA: MIT
Press.
Vose, M. D. (1999b). What are genetic algorithms? A mathematical perspective. In L. Davis, K. De Jong, M. Vose, and L. Whitley (Eds.), Evolutionary algorithms, pp. 251–276. The IMA Volumes in Mathematics and Its Applications, vol. 111. New York: Springer.
Vose, M. D. (2004). Infinite population GA tutorial. Technical Report ut-cs-04-533. The University
of Tennessee, Knoxville.
Yong, G., Xiaofeng, Q., and Palmieri, F. (1998). Comments on “Theoretical analysis of evolution-
ary algorithms with an infinite population size in continuous space. I. Basic properties of
selection and mutation” [with reply]. IEEE Transactions on Neural Networks, 9(2): 341–343.