
PERSPECTIVE

Do Infants Really Learn Phonetic Categories?

Naomi H. Feldman1*, Sharon Goldwater2*, Emmanuel Dupoux3,4, and Thomas Schatz1

1Department of Linguistics and UMIACS, University of Maryland, College Park, MD, USA
2School of Informatics, University of Edinburgh, UK
3Cognitive Machine Learning (ENS – EHESS – PSL Research University – CNRS – INRIA), Paris, France
4Facebook A.I. Research, Paris, France
*These authors contributed equally to this work.

an open access journal

Keywords: language acquisition, speech perception, computational modeling, representation learning

ABSTRACT

Early changes in infants’ ability to perceive native and nonnative speech sound contrasts are
typically attributed to their developing knowledge of phonetic categories. We critically
examine this hypothesis and argue that there is little direct evidence of category knowledge in
infancy. We then propose an alternative account in which infants’ perception changes
because they are learning a perceptual space that is appropriate to represent speech, without
yet carving up that space into phonetic categories. If correct, this new account has substantial
implications for understanding early language development.

INTRODUCTION

Infants’ perception of speech becomes specialized for the native language even before their
first birthday. Discrimination of native contrasts improves, and discrimination of nonnative
contrasts declines (Kuhl et al., 2006; Werker & Tees, 1984). These changes are often assumed
to reflect the development of adultlike perceptual patterns, and more specifically of adultlike
phonetic category representations: linguistically relevant categories that are phoneme-length
and correspond roughly to the consonants and vowels of a language (Best, 1994; Kuhl et al.,
1992; Werker et al., 2007; Zevin, 2012).1 These assumptions have been motivated by the
close ties observed in adults between native language phonetic categories and language-
specific patterns of discrimination along phonetically relevant dimensions, as shown schemat-
ically in Figure 1 (Liberman et al., 1957).

If early changes in discrimination result from early knowledge of phonetic categories
(discrete units, with or without explicit labels, that roughly correspond to linguistically relevant
sounds like [r], as in rock, and [l], as in lock), then infants must learn these categories
by their first birthday. The categories would then drive changes to their perceptual space
(Figure 2a). However, phonetic categories are difficult to learn from the speech infants hear
(Antetomaso et al., 2017; Bion et al., 2013), raising doubts about the feasibility of early

1 Contextual variants of a phoneme are generally treated as different categories, with phonetic categories cor-
responding roughly to allophones (Dillon et al., 2013; Werker & Curtin, 2005; but see Pegg & Werker, 1997).

Citation: Feldman, N. H., Goldwater, S.,
Dupoux, E., & Schatz, T. (2021). Do
Infants Really Learn Phonetic
Categories? Open Mind: Discoveries
in Cognitive Science, 5, 113–131.
https://doi.org/10.1162/opmi_a_00046

DOI:
https://doi.org/10.1162/opmi_a_00046

Received: 31 January 2020
Accepted: 6 August 2021

Competing Interests:
The authors declare no conflict
of interest.

Corresponding Authors:
Naomi H. Feldman
nhf@umd.edu
Sharon Goldwater
sgwater@inf.ed.ac.uk

Copyright: © 2021
Massachusetts Institute of Technology
Published under a Creative Commons
Attribution 4.0 International
(CC BY 4.0) license

The MIT Press


Do Infants Learn Phonetic Categories?

Feldman et al.

Figure 1. Hypothetical identification and discrimination functions in two-alternative forced
choice tasks.

phonetic category learning. Early phonetic category learning has been questioned before
(Jusczyk, 1992), yet only a few concrete alternative accounts of infants’ changes in discrimination
have been proposed (Guenther & Gjaja, 1996; Herrmann et al., 1995; Matusevych
et al., 2020; Schatz et al., 2021).

Here we critically examine the evidence for phonetic category learning in infancy and highlight
recent developments in speech technology which, we argue, can inspire an alternative account of
early perceptual learning where phonetic categories are not involved. Under this account, early
changes in discrimination are caused by a learning process that—without recourse to phonetic
categories—transforms the acoustic similarity space, changing the perceptual distances between


Figure 2. Phonetic category learning vs. perceptual space learning. (A) Under standard phonetic category learning theories, infants identify
categories early. As a result, perception becomes warped along phonetically relevant dimensions (Dimension 1) and discrimination decreases
along phonetically irrelevant dimensions (Dimension 2). (B) An alternative theory is that learners’ perceptual space undergoes substantial
changes before phonetic categories are learned. In this simplistic example, perceptual learning collapses the dimension of lower variance,
decreasing discrimination along Dimension 2. As described later, we believe perceptual space learning actually involves more complex
transformations.

OPEN MIND: Discoveries in Cognitive Science


sounds (Figure 2b). Phonetic categories are learned later, or more gradually, by carving up this
learned space. We refer to the earlier phase of learning as perceptual space learning2 and discuss
several algorithms that might be used to implement such learning, including learning without
any discrete units, or with units that do not correspond meaningfully to phones. Changes in
discrimination driven by knowledge of phonetic categories could in principle also be consid-
ered a type of perceptual space learning, but here we restrict the term to mean learning with-
out phonetic categories. We do not argue conclusively against the early phonetic category
learning hypothesis; instead, we argue that perceptual space learning, which has thus far
received little attention in the language acquisition literature, should be seriously considered
as a plausible alternative theory of what causes infants’ perceptual changes.

Attributing infants’ perceptual changes to perceptual space learning would have major im-
plications for theories of language acquisition. Phonetic category learning has conventionally
been thought to occur before (Werker et al., 2009) or alongside (Swingley, 2009) word learning,
enabling word forms to be composed of sequences of phones from the earliest stages. This hy-
pothesized trajectory makes phonetic category learning a difficult problem because it cannot
draw on extensive knowledge of word meanings, which would provide information about
which sounds in a language are meaningfully different (Trubetzkoy, 1939). However, if phonetic
category learning occurs later in childhood, it could draw on a broad array of word meanings
and minimal pairs, making it an easier problem (McMurray et al., 2018). Perceptual space learn-
ing would also have broad implications for other areas of language acquisition, such as under-
standing when and how infants notice that words are mispronounced (Curtin et al., 2009;
Fennell & Werker, 2003; Rost & McMurray, 2009; Stager & Werker, 1997), studying whether
infant-directed speech is optimized for phonetic learning (Cristia & Seidl, 2014; Eaves et al.,
2016; Kuhl et al., 1997; McMurray et al., 2013), or understanding the challenges of adult second
language learning (Flege & Hillenbrand, 1986; Francis & Nusbaum, 2002; Lipski et al., 2012;
Underbakke et al., 1988; Ylinen et al., 2009). More generally, it would radically change our view
of what children know at the beginning of their second year, a period when they rapidly acquire
aspects of language related to grammar and meaning.

CHILDREN’S PERCEPTUAL LEARNING


The primary evidence for phonetic category learning in infancy comes from experiments that
measure infants’ discrimination of native and nonnative sound contrasts. The discrimination
tasks do not inherently require category knowledge (Box 1), but they do reveal changes in
discrimination that are suggestive of category learning (as articulated by Zevin, 2012).
Discrimination of nonnative speech contrasts generally declines during the first year of life:
by 10–12 months for consonants and by 6–8 months for vowels (Anderson et al., 2003;
Best & McRoberts, 2003; Best et al., 1995; Bosch & Sebastián-Gallés, 2003; Burns et al.,
2007; Kuhl et al., 1992; Segal et al., 2016; Tsuji & Cristia, 2014; Werker & Lalonde, 1988;
Werker & Tees, 1984). During the same time period, discrimination of native contrasts gener-
ally improves (Burns et al., 2007; Kuhl et al., 2006; Narayan et al., 2010; Tsao et al., 2006).
Although there are exceptions to this pattern (Best et al., 1988; L. Liu & Kager, 2014, 2016;
Mattock & Burnham, 2006; Mazuka et al., 2014; Mugitani et al., 2009; Polka & Bohn, 1996;
Polka et al., 2001; Sundara et al., 2006; Yeung et al., 2013), it is clear that infants’ perception
becomes more native-like as they are exposed to their native language.


2 In machine learning the usual term is unsupervised representation learning, but we want to avoid confusion
caused by the broader meaning of representation in cognitive science.


Box 1. DO INFANT DISCRIMINATION TASKS REQUIRE CATEGORY KNOWLEDGE?

Most tests of infant speech perception have used one of two paradigms. In a habituation exper-
iment, infants experience repeated trials in which they hear a habituation stimulus—exemplars from
one phonetic category—while viewing a visual display. Once their looking time to habituation trials
falls below a threshold, discrimination is measured as the extent to which they look longer at change
trials (with exemplars from another category) than at same trials (with exemplars from the habituated
category). Infants need to be able to discriminate a contrast in order to show different looking behavior
toward change trials and same trials. However, infants can succeed at this task without knowing
phonetic categories, as long as they perceive the stimuli on change trials to be acoustically anom-
alous, relative to the habituation trials. Similar considerations hold for the oddball paradigm used by
Hochmann and Papeo (2014).

The other paradigm that is frequently used to measure infant speech perception is the condi-
tioned head turn (CHT) procedure, in which infants face an experimenter who is playing with toys
and hear a background stimulus from a loudspeaker on the side of the room. On change trials, the
stimulus changes to an exemplar from the other phonetic category, and they can look toward the
loudspeaker and see toys light up and start to move. On same trials, when the category does not
change, looking toward the loudspeaker does not yield any visual reward. After an initial conditioning
phase, discrimination is assessed by measuring head turns on change trials, relative to same
trials. As in habituation experiments, infants need to be able to discriminate a contrast in order to
show different looking behavior toward change trials and same trials. However, because this paradigm
involves a decision of whether to perform a head turn, it resembles identification tasks in
some ways. Particularly striking are studies showing that when trained on a phonetic contrast, infants
can generalize to novel speakers during test in a CHT paradigm (Kuhl, 1979, 1983). This
seems to suggest that infants already know that phonetic differences, but not speaker differences,
signal a category distinction.

However, it is possible that the categorical patterns of generalization reflect learning that has
occurred during the experiment. The visual reinforcements that infants see during a CHT exper-
iment provide a reward signal that could engage reinforcement learning mechanisms, which ap-
pear to be particularly successful in driving auditory perceptual learning in adults (Lim et al.,
2019; Lim & Holt, 2011; Tricomi et al., 2006). In line with this, Kuhl (1979) notes that the infants
initially make head turns toward stimuli that vary from the background stimulus along irrelevant
dimensions, such as speaker or pitch, but that this tendency lessens over the course of the experiment.
She hypothesizes that learning has occurred during the experiment and suggests that

the infant demonstrates a proclivity to try to discover a criterial attribute which separates
the two categories. The infant, in effect, displays a tendency to be a “natural sorter,” and
is attracted to a dimension which makes a set of multidimensional auditory stimuli fit
into easily recognized perceptual groupings. (p. 1674)

In other words, Kuhl hypothesizes that it is the functional equivalence of different exemplars with
respect to the visual reinforcement in the CHT paradigm that supports learning of new cue weights.
Given that this learning could occur within the experiment itself, the categorical head-turn behavior
that infants exhibit within this paradigm does not necessarily support the strong hypothesis that they
come into the lab with well-formed phonetic categories (see Apfelbaum & McMurray, 2011, for a
similar argument). Whether, and at what age, children use the same strategy to learn phonetic cat-
egories in more naturalistic settings remains an open question.


A category-based account of these perceptual changes would entail that learners group
stimuli into discrete units that correspond roughly to the phones of a language. As shown in
Figure 2a, the categories would then drive changes in the perceptual space (Bonnasse-Gahot
& Nadal, 2008; Kuhl, 1979). However, there are reasons to question whether categories are
the driving force behind infants’ perceptual changes. Box 2 distinguishes three perceptual ef-
fects that are often associated with category knowledge. If all three are direct results of cate-
gory knowledge, then they should develop in tandem, as categories are learned. Given the
substantial evidence that discrimination of nonnative contrasts declines sharply relative to na-
tive contrasts during infants’ first year (Effect 3), one might also expect to find sharpening cat-
egory boundaries (Effect 1) or sharpening discrimination peaks along phonetically relevant
dimensions (Effect 2) in young infants. Yet there is little evidence that these effects develop
during the same time period.

Box 2. PERCEPTUAL EFFECTS ASSOCIATED WITH CATEGORIES

Three types of perceptual effects are typically assumed to arise from category knowledge. Although
there is substantial evidence that the first two are closely tied to knowledge of categories, or at least
distinct clusters of sounds, we argue that the third effect is more general, and need not reflect such
knowledge.

Effect 1 is a sharp category boundary in identification tasks (Liberman et al., 1957; Figure 1).
Performing an identification task requires category knowledge, given the use of category labels in
the task. However, changes in steepness of the category boundary during learning could arise either
from changes in category knowledge, or from children’s improving ability to perform an identifica-
tion task. These two possibilities can be disambiguated through a phenomenon known as cue
weighting, which refers to the relative steepness of the identification curve across different dimensions.
Changes in cue weighting have been tied to category learning across many studies (Francis
et al., 2000; Francis et al., 2008; Francis & Nusbaum, 2002; Holt & Lotto, 2006; Idemaru & Holt,
2011, 2014; Lehet & Holt, 2017; R. Liu & Holt, 2015; Yang & Sundara, 2019; Ylinen et al., 2009),
and cue weights are also key to many models of categorization (Kruschke, 1992; Love et al., 2004;
Nosofsky, 1986; Toscano & McMurray, 2010), suggesting that this effect is closely tied to category
knowledge.

Effect 2 is a discrimination peak near the category boundary (Liberman et al., 1957; Figure 1).
While some models do attribute peaks in discrimination near the category boundary to category
knowledge (Feldman et al., 2009; Kuhl, 1993; Lacerda, 1995), other models have suggested that this
effect may only require distinct clusters in the distribution of sounds in the acoustic space (like the
distributions in the third panel in Figure 2a) even if the clusters are not recognized as discrete units
(Guenther & Gjaja, 1996; Herrmann et al., 1995; Shi et al., 2010). Moreover, categories with high
variability (Figure 2a, second panel) may not yield a distinctive discrimination peak (Kronrod et al.,
2016). Thus, we take the discrimination peak to index how tightly clustered the distribution of sounds
is in listeners’ perceptual space. Whether well-separated clusters of sounds constitute perceptual
categories is a matter of some debate; to avoid overloading terminology, we simply refer to these
as clusters of sounds in a perceptual space.

Effect 3 is listeners’ differential ability to discriminate sounds along different dimensions. For example,
English listeners discriminating instances of [r] and [l] are more sensitive to differences in the
third formant than to differences in the second formant, whereas Japanese listeners have roughly
equal sensitivity to both dimensions (Iverson et al., 2003). Listeners can retain sensitivity to cues even
when they stop using those cues to categorize sounds (Lehet & Holt, 2020), so changes in sensitivity


in discrimination tasks are not necessarily the same thing as changes in cue weighting. In theory, it is
possible to lose or gain the ability to discriminate along certain dimensions even without representing
well-separated clusters of sounds in a perceptual space (Figure 2b; Figure 3; Figure 4); that is the
possibility we explore in this article.

The scope of this last effect merits consideration, because although discrimination is typically
assumed to be better along phonetically relevant dimensions than along phonetically irrelevant dimensions
(cf. Goldstone, 1994), there are exceptions to this generalization (Best et al., 1988).
Moreover, even if there were no exceptions, predicting exactly which contrasts are difficult to discriminate
requires knowing the dimensions of listeners’ perceptual space. The second formant in
tokens of [l] or [r] may be a different perceptual dimension than the second formant in vowels, for
instance. For the purposes of this article, we take the primary signature of Effect 2 to be a peak in
discrimination near a category boundary. Absent evidence of the development of such a peak, we
tentatively assume that any changes in discrimination could instead be instances of Effect 3.

Identification tasks are challenging to carry out with infants, but the few studies that have
directly measured English-learning infants’ categorization have found extremely shallow iden-
tification boundaries (Burnham, 1986; Burnham et al., 1991). Boundaries become steeper—as
measured through aggregated data and individual participants’ identification functions—
between 3 and 7 years, with differences even between 6- or 7-year-olds and adults in some
cases (Burnham, 1986; Burnham et al., 1991; Chen et al., 2017; Hazan & Barrett, 2000;
Krause, 1982; Kuijpers, 1996; McMurray et al., 2018; Ohde & Haley, 1997; Simon &
Fourcin, 1978; Zlatin & Koenigsknecht, 1975). These changes could be partly due to chil-
dren’s improving ability to perform identification tasks, but task difficulty is not the only factor.
Across much of the range between 3-year-olds and adults, the increase in category boundary
steepness depends on the category being tested (Slawinski & Fitzgerald, 1998) and on the spe-
cific phonetic dimensions along which those categories are tested (Greenlee, 1980; Hazan &
Barrett, 2000; Nittrouer, 1992; Nittrouer & Miller, 1997; Nittrouer & Studdert-Kennedy, 1987;
Ohde et al., 1995; Ohde & Haley, 1997), indicating that children are reweighting different
dimensions as cues to category membership. These differential changes in category boundary
steepness strongly suggest that at least some category learning occurs later in childhood.

Discrimination peaks along phonetically relevant dimensions sharpen in tandem with the
changes in category boundary steepness later in childhood (Chen et al., 2017; Medina et al.,
2010), whereas in infants, evidence for the development of discrimination peaks is mixed.

Figure 3. Example illustrating how different perceptual space learning methods could lead to different perceived distances between the
same original points. Here, both methods map points from a two-dimensional space to a one-dimensional line. The mapping is shown explicitly
for only four points; distances along the line correspond to perceptual distances in the learned space. (A) In the linear mapping, the
brown stars are mapped to the same location, so the distinction between these points is lost, whereas the red squares remain distinct. (B) In the
nonlinear mapping, the opposite holds.


Figure 4. Perceptual space learning can make category learning easier. The marker shapes/colors represent ground truth category labels,
which are unknown to the learner; the dotted line highlights the transformation. The decision boundary is simpler after transforming the space.

Newborn and 6-month-old English and Swedish learners show cross-linguistic differences in
vowel perception for [i] and [y] (Kuhl et al., 1992; Moon et al., 2013), and English-learning
6-month-olds’ discrimination is worse near a prototypical [i] than near a nonprototypical [i],
similar to adults (Grieser & Kuhl, 1989; Kuhl, 1991). These studies are suggestive, but do not
provide direct evidence that between-category discrimination peaks are developing in infancy.
In consonants, there are cross-linguistic differences in infants’ voice onset time (VOT) discrimination
(Eilers et al., 1979; Streeter, 1976), with a clear peak in discrimination near the phonetic
category boundary in English-learning 1- and 4-month-old infants (Eimas et al., 1971). However,
a meta-analysis of infant studies with English learners did not find evidence that the VOT discrimination
peak sharpens over the first year of life (Galle & McMurray, 2014). Moreover, the
discrimination peak is also present in nonhuman animals (Kuhl, 1981; Kuhl & Miller, 1975; Kuhl
& Padden, 1982), suggesting that it arises from an auditory discontinuity. Whether auditory discontinuities
constitute knowledge of categories, and how they relate to subsequent perceptual
learning, is less clear (Chládková & Paillereau, 2020). One study did find that French-learning
infants’ VOT discrimination changes between 4 and 8 months in the direction that would be
expected if they were learning phonetic categories (Hoonhorst et al., 2009), providing some
evidence of a developing discrimination peak. Overall, however, there is little convincing evidence
that peaks in discrimination along phonetically relevant dimensions sharpen substantially
during infants’ first year.

The literature thus suggests that different perceptual changes occur at different ages. Infants’
discrimination changes substantially during the first year (Effect 3), but changes that are diag-
nostic of category learning (Effect 1) and of increasing perceptual separation between clusters
of sounds (Effect 2) are most clearly documented later in childhood. Existing accounts never-
theless attribute both infant and childhood perceptual changes to category learning (Burnham,
1986; Zevin, 2012). We question this interpretation for two reasons. First, as we argue in the
next section, general changes in discrimination are compatible with various perceptual space
learning algorithms that do not require phonetic categories at all. Second, for phonetic categories
to be the cause of those drastic early perceptual changes, one must either posit well-developed
categories (in which case the missing evidence of Effect 2 is puzzling), or suppose
that noisy, poorly developed categories can drive a drastic reshaping of the perceptual space
to yield Effect 3, even though those same category representations are too noisy to yield dis-
crimination peaks along phonetically relevant dimensions (Effect 2).

For these reasons, we believe it is time for the field to consider the possibility that infants’
perceptual changes primarily reflect a perceptual space learning process. Early perceptual
development would look more like Figure 2b, or a more sophisticated variant (discussed
following). Learning phonetic categories to carve up this perceptual space could then extend


well into childhood and even adolescence. Although there is, as yet, little empirical evidence
to distinguish this hypothesis from the early phonetic category learning hypothesis, the latter
makes stronger assumptions about the nature of early representations that have yet to be
clearly validated.

COMPUTATIONAL APPROACHES TO PERCEPTUAL SPACE LEARNING

Although cognitive scientists have proposed a handful of perceptual space learning models for
speech (Gauthier et al., 2007; Guenther & Gjaja, 1996; Herrmann et al., 1995; Nixon &
Tomaschek, 2021; Westermann & Reck Miranda, 2004), perceptual space learning is more
actively studied in the machine learning community, where it is well-known that modified
representations of input features can be learned without access to, and without necessarily
resulting in, categorical knowledge. This type of learning has been used in many domains,
including vision and speech (Chung et al., 2019; Erhan et al., 2010; Kamper et al., 2015;
Ranzato et al., 2007; Schneider et al., 2019; van den Oord et al., 2018; Yu et al., 2010),
and there is even a series of recent speech technology challenge tasks devoted to the topic
(Dunbar et al., 2017, 2019; Versteegh et al., 2015).

Perceptual space learning is popular in machine learning because it can improve a system’s
ability to learn from the raw signal: for speech, for example, spectral information or even waveforms.
In contrast, cognitive models often use more abstract features (such as formants) as input.
However, starting from abstract features skips over a critical part of the learning process,
wherein infants must learn which of the many dimensions of raw speech are relevant to pro-
cessing their native language. We argue that this aspect of learning, which most cognitive
models do not consider at all, could explain many of the perceptual changes seen in young
infants.

To illustrate, consider a well-known method for perceptual space learning: principal component
analysis (PCA). PCA reduces the dimensionality of data in order to learn a more compact
representation that still preserves the most important information. For example, in the
speech domain each input data point might represent a short (10 ms) slice of speech using
a vector where each dimension represents the value of some acoustic measure such as spectral
energy. Some of these dimensions may vary independently, while others may be highly correlated
or simply record random noise, so most of the information can be represented using
a smaller number of dimensions. PCA identifies the orthogonal dimensions of greatest variation
in the original data, rotates these to align with the axes of the vector space, and discards
dimensions with low variation. That is, it learns a representation that is optimized to capture
the greatest amount of variance in the data.
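
As a rough illustration of this kind of collapse (a sketch only, not a model proposed in this article), the following Python code finds the direction of greatest variance in synthetic two-dimensional data and projects points onto it. The synthetic data, the two-dimensional setting, and the names `fit_principal_axis`, `project`, and `learned_distance` are all assumptions made for illustration; real speech input would have many more dimensions and would normally be handled with a numerical library.

```python
import math
import random


def fit_principal_axis(points):
    """Return (mean, unit vector) for the direction of greatest variance in 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return (mean_x, mean_y), (math.cos(angle), math.sin(angle))


def project(point, mean, axis):
    """Coordinate of `point` along the retained axis (the discarded axis is dropped)."""
    return (point[0] - mean[0]) * axis[0] + (point[1] - mean[1]) * axis[1]


# Hypothetical input: two correlated acoustic measures plus a little noise.
random.seed(0)
points = []
for _ in range(500):
    t = random.gauss(0.0, 3.0)
    points.append((t, 0.5 * t + random.gauss(0.0, 0.2)))

mean, axis = fit_principal_axis(points)


def learned_distance(p, q):
    """Distance between two points after projection into the learned 1-D space."""
    return abs(project(p, mean, axis) - project(q, mean, axis))


along_pair = ((0.0, 0.0), (2.0, 1.0))     # differs along the high-variance direction
across_pair = ((0.0, 0.0), (-0.2, 0.4))   # differs along the low-variance direction

print("along: ", math.dist(*along_pair), "->", learned_distance(*along_pair))
print("across:", math.dist(*across_pair), "->", learned_distance(*across_pair))
```

In this toy setting, the pair that differs along the high-variance direction stays well separated after projection, whereas the pair that differs along the low-variance direction becomes nearly indistinguishable, which is the sense in which discrimination along one dimension is lost.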

The transformation learned by PCA is linear, since it simply rotates the axes of the space
before collapsing some dimensions. However, many perceptual space learning methods are
more powerful, in that they learn a nonlinear transformation, warping the original space in
potentially arbitrary ways (Figure 3).3 The result is that points that were close together in the
input space may end up far apart in the learned space or vice versa. Therefore, if discrimination
depends on distance in some perceptual space (Shepard, 1987), perceptual space learning
could lead to changes in discrimination.
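
The contrast between linear and nonlinear mappings can be made concrete with a small sketch in the spirit of Figure 3. The two mappings below (projection onto the first axis, and distance from the origin) and the four points are hypothetical choices made purely for illustration; they are not the mappings used to draw the figure, and neither is learned from data here.

```python
import math


def linear_map(point):
    """Linear 2-D -> 1-D mapping: keep the first coordinate, discard the second."""
    x, _ = point
    return x


def nonlinear_map(point):
    """Nonlinear 2-D -> 1-D mapping: distance of the point from the origin."""
    x, y = point
    return math.hypot(x, y)


def perceptual_distance(mapping, p, q):
    """Distance between two points after mapping them onto the 1-D line."""
    return abs(mapping(p) - mapping(q))


stars = ((1.0, 0.0), (1.0, 2.0))     # same first coordinate, different radius
squares = ((3.0, 4.0), (5.0, 0.0))   # different first coordinate, same radius

for name, pair in (("stars", stars), ("squares", squares)):
    print(name,
          "linear:", perceptual_distance(linear_map, *pair),
          "nonlinear:", perceptual_distance(nonlinear_map, *pair))
```

Under the linear mapping the two stars coincide while the squares stay apart; under the nonlinear mapping the reverse holds. The same input points can therefore be easy or hard to discriminate depending on which transformation the learner has acquired.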

Although perceptual space learning is not directly optimized for categorization, it can
nevertheless help with later category learning by factoring out irrelevant features or warping

3 Although both of the illustrated methods reduce dimensionality, perceptual space learning can also maintain
or even increase dimensions; the key property is that it changes the shape of the input space.


the space in a way that makes the category structure more obvious (Figure 4). This effect has
been demonstrated both in cognitive models of auditory learning (Gauthier et al., 2007; Roark
et al., 2020) and in machine learning models, where “pretraining” a system’s perceptual space
on a generic unsupervised task (such as predicting the next input in a sequence) can improve
performance on a variety of downstream tasks (such as question answering or phone classification)
(Chung et al., 2019; Devlin et al., 2019; Erhan et al., 2010; Peters et al., 2018;
Schneider et al., 2019). While it is theoretically possible that systems pretrained on speech
could be implicitly learning phonetic categories, evidence from models that do learn quan-
tized representations (latent categories) suggests otherwise: the learned units are typically
far more granular than phonetic categories, and often cannot even be well-characterized as
sub-phones or subsets of phonetic categories (Baevski et al., 2020; Baevski, Schneider, & Auli,
2019; Chorowski et al., 2019; Hsu et al., 2021; Schatz et al., 2021).

These recent successes in machine learning have led to a proliferation of new work on
perceptual space learning algorithms. Thus, cognitive scientists should be considering not just
whether perceptual space learning could explain infants’ early perceptual development, but
more specifically which algorithms might provide good models for infant learning. These
algorithms differ in the source of the learning signal and the cognitive plausibility and
domain-specificity of the mechanism. For example, self-organizing maps (Kohonen, 1989,
2001) are an early method for nonlinear dimensionality reduction, based on competitive learning.
More popular in the speech community are autoencoder neural networks (Chorowski
et al., 2019; van Niekerk et al., 2020), which can be viewed as a domain-general learning
mechanism inspired by memory encoding: they learn to encode each input into an internal
representation that allows the original input to be reconstructed as closely as possible. Other
recent algorithms aim to predict missing or upcoming stretches of speech, with the learning
signal coming from prediction errors, another cognitively plausible domain-general mechanism
(Baevski, Auli, & Mohamed, 2019; Baevski et al., 2020; Baevski, Schneider, & Auli,
2019; Chung et al., 2019; Hsu et al., 2021).

There have also been recent proposals for more domain-specific perceptual space learning
methods that rely on a noisy top-down signal provided by knowledge of some word-like units
(Kamper et al., 2015; Renshaw et al., 2015; Riad et al., 2018; Thiollière et al., 2015). These
units can be found by searching for stretches of speech that form similar pairs or clusters,
without any knowledge of phones (Jansen & Van Durme, 2011; McInnes & Goldwater, 2011; Park
& Glass, 2008; Räsänen & Blandon, 2020). Assuming that the clusters represent different
instances of the same word, the learner can then adjust its current representation of the low-level
speech features to make these instances even closer together in perceptual space. Preliminary
evidence suggests that models using this mechanism can learn representations that
demonstrate some of the effects seen in infants (Matusevych et al., 2020). At a high level, this is
essentially the mechanism proposed by Jusczyk (1992), and, unlike the other methods
described above, it does use a form of categorical knowledge (word categories) to guide
learning. Whereas we argue in the next section that phonetic categories are difficult to learn
due to high acoustic overlap, word-like units are likely to have fewer near acoustic neighbors
than phones (Swingley, 2009), which could make them easier for infants to discover in
naturalistic speech (cf. Jusczyk & Aslin, 1995; Jusczyk et al., 1999).
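
As a rough illustration of the word-pair mechanism described above, the following Python sketch learns per-dimension weights from pairs of tokens that are assumed to be instances of the same word, so that same-word tokens end up closer together in the reweighted space. It is a deliberately simplified, diagonal reweighting and not the neural network models used in the cited work; the toy data, the word labels, and all function names are assumptions made for this example.

```python
import random

rng = random.Random(1)

# Hypothetical toy setup: dimension 0 carries word identity,
# dimension 1 is irrelevant variability (e.g., talker noise).
WORD_PROTOTYPES = {"word_a": 0.0, "word_b": 1.0, "word_c": 2.0}


def make_token(word):
    """Generate one noisy two-dimensional token of a word."""
    return [WORD_PROTOTYPES[word] + rng.gauss(0.0, 0.1), rng.gauss(0.0, 1.0)]


def within_pair_weights(pairs, eps=1e-6):
    """Weight each dimension by the inverse of its mean squared
    difference within same-word pairs, normalised to sum to 1."""
    dims = len(pairs[0][0])
    raw = []
    for d in range(dims):
        mean_sq_diff = sum((a[d] - b[d]) ** 2 for a, b in pairs) / len(pairs)
        raw.append(1.0 / (mean_sq_diff + eps))
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(u, v, weights):
    """Euclidean distance after rescaling each dimension by its weight."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, u, v)) ** 0.5


def same_to_different_ratio(tokens, weights):
    """Mean same-word distance divided by mean different-word distance."""
    same, different = [], []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            (word_i, vec_i), (word_j, vec_j) = tokens[i], tokens[j]
            dist = weighted_distance(vec_i, vec_j, weights)
            (same if word_i == word_j else different).append(dist)
    return (sum(same) / len(same)) / (sum(different) / len(different))


# Pairs that the learner believes are two instances of the same word.
training_pairs = [(make_token(w), make_token(w))
                  for w in WORD_PROTOTYPES for _ in range(50)]
learned_weights = within_pair_weights(training_pairs)

test_tokens = [(w, make_token(w)) for w in WORD_PROTOTYPES for _ in range(20)]
uniform_weights = [0.5, 0.5]
print("learned weights:", learned_weights)
print("ratio with uniform weights:",
      same_to_different_ratio(test_tokens, uniform_weights))
print("ratio with learned weights:",
      same_to_different_ratio(test_tokens, learned_weights))
```

A lower same-to-different ratio means that tokens of the same word sit closer together relative to tokens of different words. Note that the learner never receives phone labels; the only top-down signal is which tokens were paired.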

REVISITING PHONETIC CATEGORY LEARNING

Learners eventually develop sharp identification boundaries and discrimination peaks,
providing evidence of well-separated categories (Box 2). Under a phonetic category learning account

OPEN MIND: Discoveries in Cognitive Science


of infants’ perceptual changes, much of the category learning process happens in infancy.
Under a perceptual space learning account, category learning might occur later or more grad-
ually, and even if it begins in infancy, it is not the primary driver of infants’ perceptual changes.
Either way, there must be a mechanism for learning phonetic categories.

Distributional learning (Maye et al., 2002) has emerged as a leading hypothesis for a mech-
anism that could operate in infancy. Infants discriminate stimuli better after hearing a bimodal
distribution—with two distinct clusters of sounds—along the relevant phonetic dimension than
after hearing a unimodal distribution (Cristia, 2011; Maye et al., 2002; Maye et al., 2008;
Wanrooij et al., 2014; Yoshida et al., 2010; see Cristia, 2018, for a meta-analysis). This ability
to track acoustic distributions of sounds could support category learning if phonetic categories
corresponded to well-separated clusters of sounds.
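
To make the logic of distributional learning concrete, the sketch below compares a one-cluster and a two-cluster Gaussian description of sounds along a single phonetic dimension and picks the one preferred by the Bayesian information criterion (BIC). This is only a minimal illustration under idealized assumptions: the data are synthetic and well separated, the learner is a plain expectation-maximization fit of a two-component mixture, and the function names and parameter values are invented for the example rather than taken from any of the cited models.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def one_gaussian_loglik(xs):
    """Log-likelihood of the data under a single fitted Gaussian."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum(log_normal_pdf(x, mean, var) for x in xs)


def two_gaussian_fit(xs, iterations=100, var_floor=1e-3):
    """Fit a two-component mixture with EM; return (log-likelihood, means)."""
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[n // 4], ordered[(3 * n) // 4]]  # start at the quartiles
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]

    def component_logs(x):
        return [math.log(weights[k]) + log_normal_pdf(x, means[k], variances[k])
                for k in range(2)]

    for _ in range(iterations):
        # E-step: how strongly each component claims each sound.
        resp = []
        for x in xs:
            logs = component_logs(x)
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate each component from its weighted sounds.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2
                         for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, var_floor)
            weights[k] = max(nk / n, 1e-12)

    loglik = sum(log_sum_exp(component_logs(x)) for x in xs)
    return loglik, sorted(means)


def bic(loglik, n_params, n):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n) - 2 * loglik


def preferred_number_of_clusters(xs):
    """Return 1 or 2, whichever description has the lower BIC."""
    n = len(xs)
    bic_one = bic(one_gaussian_loglik(xs), 2, n)
    bic_two = bic(two_gaussian_fit(xs)[0], 5, n)
    return 2 if bic_two < bic_one else 1


rng = random.Random(0)
bimodal = ([rng.gauss(-2.0, 0.6) for _ in range(200)]
           + [rng.gauss(2.0, 0.6) for _ in range(200)])
unimodal = [rng.gauss(0.0, 2.0) for _ in range(400)]
print("bimodal exposure:", preferred_number_of_clusters(bimodal), "cluster(s)")
print("unimodal exposure:", preferred_number_of_clusters(unimodal), "cluster(s)")
```

The sketch succeeds on the bimodal data only because the two clusters barely overlap. Shrinking the distance between the cluster means, or increasing their spread, removes the advantage of the two-cluster description, which is the situation the following paragraphs describe for naturalistic speech.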

However, while some contrasts in laboratory speech are well-separated acoustically (Lisker
& Abramson, 1964), categories overlap substantially in naturalistic speech, as in the second
panel of Figure 2a (Antetomaso et al., 2017; Bard & Anderson, 1982; Bion et al., 2013;
Hitczenko et al., 2020; Pollack & Pickett, 1963; Swingley, 2019).4 Most models that have
tested the feasibility of distributional learning for identifying phonetic categories have
simplified the learning problem, for example, by using artificial data with low variability (McMurray
et al., 2009; Pajak et al., 2013; Vallabha et al., 2007), focusing only on subsets of the
categories infants would need to acquire (Adriaans & Swingley, 2017; de Boer & Kuhl, 2003;
Gauthier et al., 2007), or limiting the training data to a single speaker (Miyazawa et al.,
2010; Miyazawa et al., 2011). Similar models that were tested on more realistic datasets
showed much worse performance at learning phonetic categories (Adriaans & Swingley,
2012; Jones et al., 2012; Schatz et al., 2021). Thus, the distributional sensitivity that
infants exhibit in simplified laboratory settings may not be sufficient to learn phonetic categories
in naturalistic settings. This may still be true even after perceptual space learning (as in the
second panels of Figure 2b and Figure 4).

Aside from distributional information, phonetic category learners can draw on additional
sources of information, such as word forms or meanings (Swingley, 2009). Infants recognize
word forms in fluent speech (Bortfeld et al., 2005; Jusczyk & Aslin, 1995; Jusczyk et al., 1999)
and know some word meanings (Bergelson & Swingley, 2012); both can affect infants’
discrimination in laboratory settings (Feldman, Myers, et al., 2013; Yeung & Werker, 2009).
However, unsupervised phonetic category learning models that use contextual information
have again done better when trained in idealized settings than in more naturalistic settings
(Antetomaso et al., 2017; Feldman et al., 2013; Frank et al., 2014; C.-Y. Lee et al., 2015).

These differences between naturalistic and idealized settings make category-based
accounts of infants’ perceptual changes less parsimonious than previously believed. When
categories are heavily overlapping along some dimensions, as in the second panel of
Figure 2a, separating them—even imperfectly, as in the third panel of Figure 2a—requires find-
ing better dimensions for representing the sounds in the underlying perceptual space. Such a
transformation is similar to perceptual space learning, but is driven by category knowledge.
Thus, both the category-based account and the perceptual space learning account require the
same two learning processes. What is at stake is the interdependence and relative timing of
those processes. If phonetic category learning is as difficult as the above evidence suggests, it
might be more feasible for older children, who can draw on more knowledge of higher level

4 Although the degree of overlap depends on the specific dimensions measured, we know of no language-
universal set of dimensions that reliably yields well-separated phonetic categories (see also Chládková &
Paillereau, 2020).


linguistic structure (McMurray et al., 2018) and benefit from using a learned perceptual space
with fewer irrelevant dimensions.

EMPIRICAL EVIDENCE FOR PERCEPTUAL SPACE LEARNING

There is not yet any direct evidence for a perceptual space learning process in infants.
However, evidence from adults lends plausibility to such an account. After hearing nonspeech
stimuli in which two auditory dimensions are perfectly correlated, listeners can discriminate
between stimuli that follow the same correlation as in training, but not those that violate the
correlation (Stilp et al., 2010; Stilp & Kluender, 2012), suggesting that correlations among
dimensions can drive auditory perceptual space learning. The integration of perceptual
dimensions for perceiving speech is not always determined by experience (Kingston et al., 2008;
S. Lee & Katz, 2016), but several studies have suggested that an experience-based perceptual
space learning process could play a role (Holt et al., 2001; Nearey, 1997; Schertz et al.,
2020) and could interact in nontrivial ways with subsequent learning of cue weights (Roark
et al., 2020; Roark & Holt, 2019; Scharinger et al., 2013).
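
One way to see how correlated dimensions could reshape a perceptual space is a small principal component sketch: if a learner keeps only the axis along which the exposure stimuli vary, pairs that follow the trained correlation remain distinct while pairs that violate it collapse. The code below is an illustrative assumption, not the analysis used in the cited experiments; the stimulus values, the small amount of added noise, and the function names are all hypothetical.

```python
import math
import random


def principal_axes(points):
    """Return (major_axis, minor_axis, major_var, minor_var) for 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Closed-form eigendecomposition of the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    return major, minor, half_trace + radius, half_trace - radius


def learned_distance(a, b, axis):
    """Distance between two stimuli after projecting onto one retained axis."""
    return abs((a[0] - b[0]) * axis[0] + (a[1] - b[1]) * axis[1])


# Exposure stimuli: two auditory dimensions that are almost perfectly
# correlated (a little noise keeps the covariance matrix non-degenerate).
rng = random.Random(0)
exposure = []
for _ in range(500):
    value = rng.uniform(-1.0, 1.0)
    exposure.append((value, value + rng.gauss(0.0, 0.02)))

major, minor, major_var, minor_var = principal_axes(exposure)

consistent_pair = ((-0.5, -0.5), (0.5, 0.5))   # follows the correlation
violating_pair = ((-0.5, 0.5), (0.5, -0.5))    # violates the correlation
print("variance along major / minor axis:", major_var, minor_var)
print("consistent pair distance:", learned_distance(*consistent_pair, major))
print("violating pair distance:", learned_distance(*violating_pair, major))
```

Both test pairs are equally far apart in the original two-dimensional space, so any difference in the learned distances comes from the retained axis alone.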

Adults are additionally sensitive to temporal structure within perceptual dimensions. Their
attention to dimensions in visual perception, such as color or shape, is affected by the
temporal statistics within each dimension (Zhao et al., 2013), that is, conditional probabilities,
which infants are sensitive to in auditory perception (Saffran et al., 1996). This attentional
benefit may well have an analogue in the auditory domain, given that auditory exposure to
temporal regularities elicits increased MEG amplitude in auditory cortex relative to random
sequences (Barascud et al., 2016). Although there is not yet evidence linking this attentional
benefit of temporal structure to infants’ early perceptual changes, such a strategy could poten-
tially be effective at identifying informative perceptual dimensions, because language has con-
siderable internal structure.
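
The conditional probabilities mentioned above can be estimated from a sequence with simple counts. The following sketch computes the probability of each element given the preceding one in a continuous stream built from a few three-syllable nonsense words. The word list, the stream length, and the function names are illustrative assumptions and do not reproduce the stimuli or procedure of any cited study.

```python
import random
from collections import Counter

# Illustrative nonsense words; each syllable belongs to exactly one word.
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"),
         ("bi", "da", "ku"), ("pa", "do", "ti")]


def make_stream(n_words, rng):
    """Concatenate randomly chosen words, never repeating a word immediately."""
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(syllables):
    """Estimate P(next | current) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: count / first_counts[pair[0]]
            for pair, count in pair_counts.items()}


stream = make_stream(300, random.Random(0))
tps = transitional_probabilities(stream)

word_final = {word[-1] for word in WORDS}
within_word = [p for (first, _), p in tps.items() if first not in word_final]
across_word = [p for (first, _), p in tps.items() if first in word_final]
print("lowest within-word probability:", min(within_word))
print("highest across-word probability:", max(across_word))
```

In this toy stream the conditional probabilities are high inside a word and lower across word boundaries, so a learner tracking them has a cue to where the predictable structure lies.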

THE WAY FORWARD

To begin testing which type of theory best accounts for early perceptual development in
speech, it is important to take seriously the complexity of speech produced in naturalistic
environments. Naturalistic speech varies along many more acoustic dimensions than are
typically manipulated in stimuli for speech perception experiments, or represented in phonetic
learning models, and several studies have already shown that considering the variability of
naturalistic speech can change our understanding of perceptual development (Antetomaso
et al., 2017; Bion et al., 2013; Hitczenko et al., 2020). Methods for working with speech in
naturalistic settings have been developed in the context of engineering applications, and
naturalistic speech corpora now exist in numerous languages. By adapting these tools (e.g.,
Räsänen, 2011; Räsänen & Rasilo, 2015; Schatz, 2016; Schatz et al., 2013, 2021; Schatz
et al., 2018; Schatz & Feldman, 2018), cognitive scientists can begin investigating the role
of perceptual space learning in explaining how infants’ perception of speech becomes special-
ized for their native language.5

To date, we know of only a handful of models that have been evaluated against infant
behavioral data after training on natural continuous speech. Schatz et al. (2021) trained a
bottom-up distributional learner (specifically, a Dirichlet process Gaussian mixture model)
on low-level spectral representations of speech from Japanese or English. The model reproduced
infants’ discrimination of [r] and [l], but the units it learned did not resemble phonetic categories.

5 A complete model would also need to incorporate social factors (Conboy et al., 2015; Kuhl et al., 2003; Lytle
et al., 2018; Tripp et al., 2021).


Matusevych et al. (2020) found that a recurrent neural network that optimized its hidden repre-
sentations to represent correspondences between tokens of the same word achieved perfor-
mance comparable to the model from Schatz et al. (2021). The success of these models
suggests that alternatives to the phonetic category learning hypothesis, including perceptual
space learning models that have no sub-word categories at all, are well worth exploring. In
contrast, we are not aware of a phonetic category–based model that has been trained on
continuous, unsegmented speech and used to predict cross-linguistic patterns of infants’
discrimination (see Schatz et al., 2021, supplementary discussion 1, for further discussion of this gap in
the literature).

Parallels between the phonetic learning and machine learning literatures provide other rea-
sons to be optimistic about perceptual space learning theories. Perceptual space learning
algorithms that rely on word-like units (Kamper et al., 2015; Renshaw et al., 2015; Riad
et al., 2018; Thiollière et al., 2015) are reminiscent of proposals that the words infants segment
from fluent speech can constrain phonetic category learning (Feldman, Griffiths, et al., 2013;
Swingley, 2009). The distributional learning strategy that Schatz et al. (2021) used is similar to
that proposed by Maye et al. (2002) to learn phonetic categories. Both of these strategies have
struggled to scale to more realistic data under a phonetic category learning account
(Antetomaso et al., 2017; Bion et al., 2013; Taniguchi et al., 2016), but perform well once
the constraint that phonetic categories need to be learned is dropped.

Jusczyk (1992) proposed over 25 years ago that phonetic learning might not rely on
phonetic categories, but this idea has largely been disregarded in the literature on phonetic
learning. Here we have argued that this idea is consistent with a large body of empirical literature
on infant phonetic learning and have connected the proposal to recent trends in speech
technology that provide paths toward a formal theory. The time course of phonetic category
learning has major implications for our understanding of language acquisition as a whole, and as
such we hope this article will inspire serious consideration of the perceptual space learning
hypothesis and encourage the kind of rigorous empirical and computational tests that can
ultimately distinguish it from the currently popular alternative.

ACKNOWLEDGMENTS


We thank Adam Albright, Richard Aslin, Yevgen Matusevych, Bob McMurray, and two anon-
ymous reviewers for insightful comments.

FUNDING INFORMATION

NHF, National Science Foundation (https://dx.doi.org/10.13039/100000001), Award ID:
BCS-1734245. SG, Economic and Social Research Council
(https://dx.doi.org/10.13039/501100000269), Award ID: ES/R006660/1. SG, James S. McDonnell
Foundation (https://dx.doi.org/10.13039/100000913), Award ID: Scholar Award 220020374. ED,
Agence Nationale pour la Recherche, Award ID: ANR-17-EURE-0017 Frontcog. ED, Agence
Nationale pour la Recherche, Award ID: ANR-10-IDEX-0001-02 PSL*. ED, Agence Nationale
pour la Recherche, Award ID: ANR-19-P3IA-0001 PRAIRIE 31A Institute. ED, Facebook AI
Research, Award ID: Research Grant.

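AUTHOR CONTRIBUTIONS

NHF: Conceptualization: Lead; Funding acquisition: Lead; Investigation: Lead; Writing – original
draft: Lead; Writing – review & editing: Lead. SG: Conceptualization: Lead; Funding
acquisition: Lead; Investigation: Lead; Writing – original draft: Lead; Writing – review & editing: Lead.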


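ED: Conceptualization: Supporting; Funding acquisition: Supporting; Writing – review &
editing: Supporting. TS: Conceptualization: Supporting; Writing – review & editing: Supporting.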

REFERENCES

Adriaans, F., & Swingley, D. (2012). Distributional learning of
vowel categories is supported by prosody in infant-directed
speech. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.),
Proceedings of the 34th Annual Conference of the Cognitive
Science Society (pp. 72–77). Cognitive Science Society.

Adriaans, F., & Swingley, D. (2017). Prosodic exaggeration within
infant-directed speech: Consequences for vowel learnability.
Journal of the Acoustical Society of America, 141(3070),
3070–3078. https://doi.org/10.1121/1.4982246, PubMed: 28599541

Anderson, J. L., Morgan, J. L., & White, K. S. (2003). A statistical basis
for speech sound discrimination. Language and Speech, 46(2–3),
155–182. https://doi.org/10.1177/00238309030460020601,
PubMed: 14748443

Antetomaso, S., Miyazawa, K., Feldman, N., Elsner, M., Hitczenko,
K., & Mazuka, R. (2017). Modeling phonetic category learning
from natural acoustic data. In M. LaMendola & J. Scott (Eds.),
Proceedings of the 41st Boston University Conference on
Language Development (pp. 32–35). Cascadilla Press.

Apfelbaum, K. S., & McMurray, B. (2011). Using variability to guide
dimensional weighting: Associative mechanisms in early word
learning. Cognitive Science, 35(6), 1105–1138.
https://doi.org/10.1111/j.1551-6709.2011.01181.x, PubMed: 21609356

Baevski, A., Auli, M., & Mohamed, A. (2019). Effectiveness of
self-supervised pre-training for speech recognition. ArXiv.
https://arxiv.org/abs/1911.03912

Baevski, A., Schneider, S., & Auli, M. (2019). vq-wav2vec:
Self-supervised learning of discrete speech representations. In
International Conference on Learning Representations.
OpenReview.net.

Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). wav2vec
2.0: A framework for self-supervised learning of speech
representations. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, &
H. Lin (Eds.), Advances in Neural Information Processing Systems
33 (pp. 12449–12460). Curran Associates.

Barascud, N., Pearce, M. T., Griffiths, T. D., Friston, K. J., & Chait,
M. (2016). Brain responses in humans reveal ideal observer-like
sensitivity to complex acoustic patterns. Proceedings of the
National Academy of Sciences, 113(5), E616–E625.
https://doi.org/10.1073/pnas.1508523113, PubMed: 26787854

Bard, E. G., & Anderson, A. H. (1982). The unintelligibility of
speech to children. Journal of Child Language, 10(2), 265–292.
https://doi.org/10.1017/S0305000900007777, PubMed: 6874768

Bergelson, E., & Swingley, D. (2012). At 6–9 months, human infants
know the meanings of many common nouns. Proceedings of the
National Academy of Sciences, 109(9), 3253–3258.
https://doi.org/10.1073/pnas.1113380109, PubMed: 22331874

Best, C. T. (1994). Emergence of native-language influences. In J. C.
Goodman & H. C. Nusbaum (Eds.), The development of speech
perception: The transition from speech sounds to spoken words
(pp. 167–224). MIT Press.

Best, C. T., & McRoberts, G. W. (2003). Infant perception of
non-native consonant contrasts that adults assimilate in different
ways. Language and Speech, 46(2–3), 183–216.
https://doi.org/10.1177/00238309030460020701, PubMed: 14748444

Best, C. T., McRoberts, G. W., LaFleur, R., & Silver-Isenstadt, J.
(1995). Divergent developmental patterns for infants’ perception
of two nonnative consonant contrasts. Infant Behavior and
Development, 18(3), 339–350.
https://doi.org/10.1016/0163-6383(95)90022-5

Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Examination
of perceptual reorganization for nonnative speech contrasts: Zulu
click discrimination by English-speaking adults and infants.
Journal of Experimental Psychology: Human Perception and
Performance, 14(3), 345–360.
https://doi.org/10.1037/0096-1523.14.3.345, PubMed: 2971765

Bion, R. A. H., Miyazawa, K., Kikuchi, H., & Mazuka, R. (2013).
Learning phonemic vowel length from naturalistic recordings of
Japanese infant-directed speech. PLoS ONE, 8(2), Article e51594.
https://doi.org/10.1371/journal.pone.0051594, PubMed: 23437036

Bonnasse-Gahot, L., & Nadal, J.-P. (2008). Neural coding of
categories: information efficiency and optimal population codes.
Journal of Computational Neuroscience, 25(1), 169–187.
https://doi.org/10.1007/s10827-007-0071-5, PubMed: 18236147

Bortfeld, H., Morgan, J. L., Golinkoff, R. M., & Rathbun, K. (2005).
Mommy and me: Familiar names help launch babies into
speech-stream segmentation. Psychological Science, 16(4),
298–304. https://doi.org/10.1111/j.0956-7976.2005.01531.x,
PubMed: 15828977

Bosch, L., & Sebastián-Gallés, N. (2003). Simultaneous bilingualism
and the perception of a language-specific vowel contrast in the first
year of life. Language and Speech, 46(2–3), 217–243.
https://doi.org/10.1177/00238309030460020801, PubMed: 14748445

Burnham, D. K. (1986). Developmental loss of speech perception:
Exposure to and experience with a first language. Applied
Psycholinguistics, 7(3), 207–240.
https://doi.org/10.1017/S0142716400007542

Burnham, D. K., Earnshaw, L. J., & Clark, J. E. (1991). Development
of categorical identification of native and non-native bilabial
stops: Infants, children and adults. Journal of Child Language,
18, 231–260. https://doi.org/10.1017/S0305000900011041,
PubMed: 1874826

Burns, T. C., Yoshida, K. A., Hill, K., & Werker, J. F. (2007). The
development of phonetic representation in bilingual and
monolingual infants. Applied Psycholinguistics, 28(3), 455–474.
https://doi.org/10.1017/S0142716407070257

Chen, F., Peng, G., Yan, N., & Wang, L. (2017). The development of
categorical perception of Mandarin tones in four- to seven-year-old
children. Journal of Child Language, 44(6), 1413–1434.
https://doi.org/10.1017/S0305000916000581, PubMed: 27916015

Chládková, K., & Paillereau, N. (2020). The what and when of
universal perception: A review of early speech sound acquisition.
Language Learning, 70(4), 1136–1182.
https://doi.org/10.1111/lang.12422

Chorowski, J., Weiss, R. J., Bengio, S., & van den Oord, A. (2019).
Unsupervised speech representation learning using wavenet
autoencoders. IEEE/ACM Transactions on Audio, Speech, and
Language Processing, 27(12), 2041–2053.
https://doi.org/10.1109/TASLP.2019.2938863


Chung, Y.-A., Hsu, W.-N., Tang, H., & Glass, J. (2019). An
unsupervised autoregressive model for speech representation learning. In
Proceedings of Interspeech (pp. 146–150). International Speech
Communication Association.
https://doi.org/10.21437/Interspeech.2019-1473

Conboy, B. T., Brooks, R., Meltzoff, A. N., & Kuhl, P. K. (2015).
Social interaction in infants’ learning of second-language
phonetics: An exploration of brain-behavior relations. Developmental
Neuropsychology, 40(4), 216–229.
https://doi.org/10.1080/87565641.2015.1014487, PubMed: 26179488

Cristia, A. (2011). Fine-grained variation in caregivers’ /s/ predicts
their infants’ /s/ category. Journal of the Acoustical Society of
America, 129(5), 3271–3280.
https://doi.org/10.1121/1.3562562, PubMed: 21568428

Cristia, A. (2018). Can infants learn phonology in the lab? A
meta-analytic answer. Cognition, 170, 312–327.
https://doi.org/10.1016/j.cognition.2017.09.016, PubMed: 29102857

Cristia, A., & Seidl, A. (2014). The hyperarticulation hypothesis
of infant-directed speech. Journal of Child Language, 41(4),
913–934. https://doi.org/10.1017/S0305000912000669,
PubMed: 23406830

Curtin, S., Fennell, C., & Escudero, P. (2009). Weighting of vowel
cues explains patterns of word-object associative learning.
Developmental Science, 12(5), 725–731.
https://doi.org/10.1111/j.1467-7687.2009.00814.x, PubMed: 19702765

de Boer, B., & Kuhl, P. K. (2003). Investigating the role of
infant-directed speech with a computer model. Acoustics Research
Letters Online, 4(4), 129–134. https://doi.org/10.1121/1.1613311

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT:
Pre-training of deep bidirectional transformers for language
understanding. In Proceedings of the 2019 Conference of the
North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1 (Long and
Short Papers) (pp. 4171–4186). Association for Computational
Linguistics.

Dillon, B., Dunbar, E., & Idsardi, W. (2013). A single stage
approach to learning phonological categories: Insights from
Inuktitut. Cognitive Science, 37(4), 344–377.
https://doi.org/10.1111/cogs.12008, PubMed: 23137418

Dunbar, E., Algayres, R., Karadayi, J., Bernard, M., Benjumea, J.,
Cao, X.-N., Miskic, L., Dugrain, C., Ondel, L., Black, A. W.,
Besacier, L., Sakti, S., & Dupoux, E. (2019). The zero resource
speech challenge 2019: TTS without T. In Interspeech 2019: 20th
Annual Congress of the International Speech Communication
Association. https://doi.org/10.21437/Interspeech.2019-2904

Dunbar, E., Cao, X. N., Benjumea, J., Karadayi, J., Bernard, M.,
Besacier, L., Anguera, X., & Dupoux, E. (2017). The zero
resource speech challenge 2017. In 2017 IEEE Automatic Speech
Recognition and Understanding Workshop (pp. 323–330). IEEE.
https://doi.org/10.1109/ASRU.2017.8268953

Eaves, B. S., Jr., Feldman, N. H., Griffiths, T. L., & Shafto, P. (2016).
Infant-directed speech is consistent with teaching. Psychological
Review, 123(6), 758–771. https://doi.org/10.1037/rev0000031,
PubMed: 27088361

Eilers, R. E., Gavin, W., & Wilson, W. R. (1979). Linguistic
experience and phonemic perception in infancy: A crosslinguistic
study. Child Development, 50(1), 14–18.
https://doi.org/10.2307/1129035, PubMed: 446199

Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971).
Speech perception in infants. Science, 171(3968), 303–306.
https://doi.org/10.1126/science.171.3968.303, PubMed: 5538846

Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P.,
& Bengio, S. (2010). Why does unsupervised pre-training help
deep learning? Journal of Machine Learning Research, 11(19),
625–660.

Feldman, N. H., Griffiths, T. L., Goldwater, S., & Morgan, J. L.
(2013). A role for the developing lexicon in phonetic category
acquisition. Psychological Review, 120(4), 751–778.
https://doi.org/10.1037/a0034245, PubMed: 24219848

Feldman, N. H., Griffiths, T. L., & Morgan, J. L. (2009). The
influence of categories on perception: Explaining the perceptual
magnet effect as optimal statistical inference. Psychological
Review, 116(4), 752–782. https://doi.org/10.1037/a0017196,
PubMed: 19839683

Feldman, N. H., Myers, E. B., White, K. S., Griffiths, T. L., &
Morgan, J. L. (2013). Word-level information influences phonetic
learning in adults and infants. Cognition, 127(3), 427–438.
https://doi.org/10.1016/j.cognition.2013.02.007, PubMed: 23562941

Fennell, C. T., & Werker, J. F. (2003). Early word learners’ ability to
access phonetic detail in well-known words. Language and
Speech, 46(2), 245–264.
https://doi.org/10.1177/00238309030460020901, PubMed: 14748446

Flege, J. E., & Hillenbrand, J. (1986). Differential use of temporal
cues to the /s/-/z/ contrast by native and non-native speakers of
English. Journal of the Acoustical Society of America, 79(2),
508–517. https://doi.org/10.1121/1.393538, PubMed: 3950204

Francis, A. L., Baldwin, K., & Nusbaum, H. C. (2000). Effects of
training on attention to acoustic cues. Perception and
Psychophysics, 62(8), 1668–1680.
https://doi.org/10.3758/BF03212164, PubMed: 11140187

Francis, A. L., Kaganovich, N., & Driscoll-Huber, C. (2008).
Cue-specific effects of categorization training on the relative
weighting of acoustic cues to consonant voicing in English. Journal of
the Acoustical Society of America, 124(2), 1234–1251.
https://doi.org/10.1121/1.2945161, PubMed: 18681610

Francis, A. L., & Nusbaum, H. C. (2002). Selective attention and
the acquisition of new phonetic categories. Journal of Experimental
Psychology: Human Perception and Performance, 28(2), 349–366.
https://doi.org/10.1037/0096-1523.28.2.349, PubMed: 11999859

Frank, S., Feldman, N. H., & Goldwater, S. (2014). Weak semantic
context helps phonetic learning in a model of infant language
acquisition. In Proceedings of the 52nd Annual Meeting of the
Association for Computational Linguistics (pp. 1073–1083).
Association for Computational Linguistics.
https://doi.org/10.3115/v1/P14-1101

Galle, M. E., & McMurray, B. (2014). The development of voicing
categories: A quantitative review of over 40 years of infant
speech perception research. Psychonomic Bulletin and Review,
21(4), 884–906. https://doi.org/10.3758/s13423-013-0569-y,
PubMed: 24550074

Gauthier, B., Shi, R., & Xu, Y. (2007). Learning phonetic categories
by tracking movements. Cognition, 103(1), 80–106.
https://doi.org/10.1016/j.cognition.2006.03.002, PubMed: 16650399

Goldstone, R. (1994). Influences of categorization on perceptual
discrimination. Journal of Experimental Psychology: General,
123(2), 178–200. https://doi.org/10.1037/0096-3445.123.2.178,
PubMed: 8014612

Greenlee, M. (1980). Learning the phonetic cues to the
voiced-voiceless distinction: A comparison of child and adult speech.
Journal of Child Language, 7(3), 459–468.
https://doi.org/10.1017/S0305000900002786, PubMed: 7440672

Grieser, D., & Kuhl, P. K. (1989). Categorization of speech by
infants: Support for speech-sound prototypes. Developmental
Psychology, 25(4), 577–588.
https://doi.org/10.1037/0012-1649.25.4.577


Guenther, F. H., & Gjaja, M. N. (1996). The perceptual magnet
effect as an emergent property of neural map formation. Journal of
the Acoustical Society of America, 100(2), 1111–1121.
https://doi.org/10.1121/1.416296, PubMed: 8759964

Hazan, V., & Barrett, S. (2000). The development of phonemic
categorization in children aged 6–12. Journal of Phonetics, 28(4),
377–396. https://doi.org/10.1006/jpho.2000.0121

Herrmann, M., Bauer, H.-U., & Der, R. (1995). The “perceptual
magnet” effect: A model based on self-organizing feature maps.
In L. S. Smith & P. J. B. Hancock (Eds.), Proceedings of the 3rd
Neural Computation and Psychology Workshop (pp. 107–116).
Springer. https://doi.org/10.1007/978-1-4471-3579-1_9

Hitczenko, K., Mazuka, R., Elsner, M., & Feldman, N. H. (2020).
When context is and isn’t helpful: A corpus study of naturalistic
speech. Psychonomic Bulletin and Review, 27(4), 640–676.
https://doi.org/10.3758/s13423-019-01687-6, PubMed: 32166605

Hochmann, J.-R., & Papeo, L. (2014). The invariance problem in
infancy: A pupillometry study. Psychological Science, 25(11),
2038–2046. https://doi.org/10.1177/0956797614547918,
PubMed: 25269621

Holt, L. L., & Lotto, A. J. (2006). Cue weighting in auditory
categorization: Implications for first and second language
acquisition. Journal of the Acoustical Society of America, 119(5),
3059–3071. https://doi.org/10.1121/1.2188377, PubMed: 16708961

Holt, L. L., Lotto, A. J., & Kluender, K. R. (2001). Influence of
fundamental frequency on stop-consonant voicing perception: A
case of learned covariation or auditory enhancement? Journal
of the Acoustical Society of America, 109(2), 764–774.
https://doi.org/10.1121/1.1339825, PubMed: 11248980

Hoonhorst, I., Colin, C., Markessis, E., Radeau, M., Deltenre, P., &
Serniclaes, W. (2009). French native speakers in the making:
From language-general to language-specific voicing boundaries.
Journal of Experimental Child Psychology, 104(4), 353–366.
https://doi.org/10.1016/j.jecp.2009.07.005, PubMed: 19709671

Hsu, W.-N., Bolte, B., Tsai, Y.-H. H., Lakhotia, K., Salakhutdinov,
R., & Mohamed, A. (2021). HuBERT: Self-supervised speech
representation learning by masked prediction of hidden units.
ArXiv. https://arxiv.org/abs/2106.07447

Idemaru, K., & 霍尔特, L. L. (2011). Word recognition reflects dimension-
based statistical learning. 实验心理学杂志:
Human Perception and Performance, 37(6), 1939–1956. https://
doi.org/10.1037/a0025641, 考研: 22004192

Idemaru, K., & 霍尔特, L. L. (2014). Specificity of dimension-based
statistical learning in word recognition. 实验杂志
心理学: Human Perception and Performance, 40(3), 1009–1021.
https://doi.org/10.1037/a0035269, 考研: 24364708

Iverson, P。, Kuhl, 磷. K., Akahane-Yamada, R。, Diesch, E., Tohkura,
Y。, Kettermann, A。, & Siebert, C. (2003). A perceptual interfer-
ence account of acquisition difficulties for non-native phonemes.
认识, 87(1), B47–B57. https://doi.org/10.1016/S0010-0277
(02)00198-1, 考研: 12499111

Jansen, A。, & Van Durme, 乙. (2011). Efficient spoken term discovery
using randomized algorithms. In IEEE Workshop on Automatic
Speech Recognition and Understanding (PP. 401–406). IEEE.
https://doi.org/10.1109/ASRU.2011.6163965

琼斯, C。, Meakins, F。, & Muawiyath, S. (2012). Learning vowel
categories from maternal speech in Gurindji Kriol. 语言
学习, 62(4), 1052–1078. https://doi.org/10.1111/j.1467
-9922.2012.00725.X

Jusczyk, 磷. 瓦. (1992). Developing phonological categories from the
speech signal. 在C中. A. Ferguson, L. Menn, & C. Stoel-Gammon

(编辑。), Phonological development: 楷模, 研究, implications
(PP. 17–64). 约克.

Jusczyk, 磷. W., & Aslin, 右. 氮. (1995). Infants’ detection of the sound
patterns of words in fluent speech. 认知心理学, 29(1),
1–23. https://doi.org/10.1006/cogp.1995.1010, 考研: 7641524
Jusczyk, 磷. W., Houston, D. M。, & Newsome, 中号. (1999). The begin-
nings of word segmentation in English-learning infants. 认知的
心理学, 39(3–4), 159–207. https://doi.org/10.1006/cogp
.1999.0716, 考研: 10631011

Kamper, H., Elsner, M., Jansen, A., & Goldwater, S. (2015).
Unsupervised neural network based feature extraction using
weak top-down constraints. In Proceedings of the 40th IEEE
International Conference on Acoustics, Speech and Signal
Processing (pp. 5818–5822). IEEE.
https://doi.org/10.1109/ICASSP.2015.7179087

Kingston, J., Diehl, R. L., Kirk, C. J., & Castleman, W. A. (2008). On the
internal perceptual structure of distinctive features: The [voice]
contrast. Journal of Phonetics, 36(1), 28–54.
https://doi.org/10.1016/j.wocn.2007.02.001, PubMed: 19657466

Kohonen, T. (1989). Self-organization and associative memory.
Springer. https://doi.org/10.1007/978-3-642-88163-3

Kohonen, T. (2001). Self-organizing maps (3rd ed.). Springer.
https://doi.org/10.1007/978-3-642-56927-2

Krause, S. E. (1982). Vowel duration as a perceptual cue to postvocalic
consonant voicing in young children and adults. Journal of
the Acoustical Society of America, 71(4), 990–995.
https://doi.org/10.1121/1.387580, PubMed: 7085987

Kronrod, Y., Coppess, E., & Feldman, N. H. (2016). A unified account
of categorical effects in phonetic perception. Psychonomic
Bulletin and Review, 23(6), 1681–1712.
https://doi.org/10.3758/s13423-016-1049-y, PubMed: 27220996

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist
model of category learning. Psychological Review, 99(1), 22–44.
https://doi.org/10.1037/0033-295X.99.1.22, PubMed: 1546117

Kuhl, P. K. (1979). Speech perception in early infancy: Perceptual
constancy for spectrally dissimilar vowel categories. Journal of
the Acoustical Society of America, 66(6), 1668–1679.
https://doi.org/10.1121/1.383639, PubMed: 521551

Kuhl, P. K. (1981). Discrimination of speech by nonhuman animals:
Basic auditory sensitivities conducive to the perception of
speech-sound categories. Journal of the Acoustical Society of
America, 70(2), 340–349. https://doi.org/10.1121/1.386782

Kuhl, P. K. (1983). Perception of auditory equivalence classes for
speech in early infancy. Infant Behavior and Development, 6(2–3),
263–285. https://doi.org/10.1016/S0163-6383(83)80036-8

Kuhl, P. K. (1991). Human adults and human infants show a “perceptual
magnet effect” for the prototypes of speech categories,
monkeys do not. Perception and Psychophysics, 50(2), 93–107.
https://doi.org/10.3758/BF03212211, PubMed: 1945741

Kuhl, P. K. (1993). Early linguistic experience and phonetic perception:
Implications for theories of developmental speech perception.
Journal of Phonetics, 21(1–2), 125–139.
https://doi.org/10.1016/S0095-4470(19)31326-9

Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A.,
Kozhevnikova, E. V., Ryskina, V. L., Stolyarova, E. I., Sundberg,
U., & Lacerda, F. (1997). Cross-language analysis of phonetic
units in language addressed to infants. Science, 277(5326),
684–686. https://doi.org/10.1126/science.277.5326.684,
PubMed: 9235890

Kuhl, P. K., & Miller, J. D. (1975). Speech perception by the chinchilla:
Voiced-voiceless distinction in alveolar plosive consonants.
Science, 190(4209), 69–72.
https://doi.org/10.1126/science.1166301, PubMed: 1166301

Kuhl, P. K., & Padden, D. M. (1982). Enhanced discriminability at
the phonetic boundaries for the voicing feature in macaques.
Perception and Psychophysics, 32(6), 542–550.
https://doi.org/10.3758/BF03204208, PubMed: 7167352

Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., &
Iverson, P. (2006). Infants show a facilitation effect for native
language phonetic perception between 6 and 12 months.
Developmental Science, 9(2), F13–F21.
https://doi.org/10.1111/j.1467-7687.2006.00468.x, PubMed: 16472309

Kuhl, P. K., Tsao, F.-M., & Liu, H.-M. (2003). Foreign-language
experience in infancy: Effects of short-term exposure and social
interaction on phonetic learning. Proceedings of the National
Academy of Sciences, 100(15), 9096–9101.
https://doi.org/10.1073/pnas.1532872100, PubMed: 12861072

Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., & Lindblom,
B. (1992). Linguistic experience alters phonetic perception in infants
by 6 months of age. Science, 255(5044), 606–608.
https://doi.org/10.1126/science.1736364, PubMed: 1736364

Kuijpers, C. T. L. (1996). Perception of the voicing contrast by
Dutch children and adults. Journal of Phonetics, 24(3), 367–382.
https://doi.org/10.1006/jpho.1996.0020

Lacerda, F. (1995). The perceptual-magnet effect: An emergent
consequence of exemplar-based phonetic memory. In K. Elenius &
P. Branderud (Eds.), Proceedings of the XIIIth International Congress
of Phonetic Sciences (Vol. 2, pp. 140–147). KTH and Stockholm
University.

Lee, C.-Y., O’Donnell, T. J., & Glass, J. R. (2015). Unsupervised
lexicon discovery from acoustic input. Transactions of the
Association for Computational Linguistics, 3, 389–403.
https://doi.org/10.1162/tacl_a_00146

Lee, S., & Katz, J. (2016). Perceptual integration of acoustic cues to
laryngeal contrasts in Korean fricatives. Journal of the Acoustical
Society of America, 139(2), 605–611.
https://doi.org/10.1121/1.4926435, PubMed: 26936544

Lehet, M., & Holt, L. L. (2017). Dimension-based statistical learning
affects both speech perception and production. Cognitive
Science, 41(S1), 885–912. https://doi.org/10.1111/cogs.12413,
PubMed: 27666146

Lehet, M., & Holt, L. L. (2020). Nevertheless, it persists: Dimension-based
statistical learning and normalization of speech impact different
levels of perceptual processing. Cognition, 202, Article
104328. https://doi.org/10.1016/j.cognition.2020.104328,
PubMed: 32502867

Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C.
(1957). The discrimination of speech sounds within and across
phoneme boundaries. Journal of Experimental Psychology, 54(5),
358–368. https://doi.org/10.1037/h0044417, PubMed: 13481283

Lim, S.-J., Fiez, J. A., & Holt, L. L. (2019). Role of the striatum in
incidental learning of sound categories. Proceedings of the
National Academy of Sciences, 116(10), 4671–4680.
https://doi.org/10.1073/pnas.1811992116, PubMed: 30782817

Lim, S.-J., & Holt, L. L. (2011). Learning foreign sounds in an alien
world: Videogame training improves non-native speech categorization.
Cognitive Science, 35(7), 1390–1405.
https://doi.org/10.1111/j.1551-6709.2011.01192.x, PubMed: 21827533

Lipski, S. C., Escudero, P., & Benders, T. (2012). Language experience
modulates weighting of acoustic cues for vowel perception:
An event-related potential study. Psychophysiology, 49(5),
638–650. https://doi.org/10.1111/j.1469-8986.2011.01347.x,
PubMed: 22335401

Lisker, L., & Abramson, A. S. (1964). A cross-language study of
voicing in initial stops: Acoustical measurements. Word, 20(3),
384–422. https://doi.org/10.1080/00437956.1964.11659830

Liu, L., & Kager, R. (2014). Perception of tones by infants learning a
non-tone language. Cognition, 133(2), 385–394.
https://doi.org/10.1016/j.cognition.2014.06.004, PubMed: 25128796

Liu, L., & Kager, R. (2016). Perception of a native vowel contrast by
Dutch monolingual and bilingual infants: A bilingual perceptual
lead. International Journal of Bilingualism, 20(3), 335–345.
https://doi.org/10.1177/1367006914566082

Liu, R., & Holt, L. L. (2015). Dimension-based statistical learning of
vowels. Journal of Experimental Psychology: Human Perception
and Performance, 41(6), 1783–1798.
https://doi.org/10.1037/xhp0000092, PubMed: 26280268

Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A
network model of category learning. Psychological Review,
111(2), 309–332. https://doi.org/10.1037/0033-295X.111.2.309,
PubMed: 15065912

Lytle, S. R., Garcia-Sierra, A., & Kuhl, P. K. (2018). Two are better
than one: Infant language learning from video improves in the
presence of peers. Proceedings of the National Academy of
Sciences, 115(40), 9859–9866.
https://doi.org/10.1073/pnas.1611621115, PubMed: 30275298

Mattock, K., & Burnham, D. (2006). Chinese and English infants’
tone perception: Evidence for perceptual reorganization.
Infancy, 10(3), 241–265.
https://doi.org/10.1207/s15327078in1003_3

Matusevych, Y., Schatz, T., Kamper, H., Feldman, N. H., &
Goldwater, S. (2020). Evaluating computational models of infant
phonetic learning across languages. In S. Denison, M. Mack, Y.
Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual
Conference of the Cognitive Science Society (pp. 571–577).
Cognitive Science Society.

Maye, J., Weiss, D. J., & Aslin, R. N. (2008). Statistical phonetic
learning in infants: Facilitation and feature generalization.
Developmental Science, 11(1), 122–134.
https://doi.org/10.1111/j.1467-7687.2007.00653.x, PubMed: 18171374

Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to
distributional information can affect phonetic discrimination.
Cognition, 82(3), B101–B111.
https://doi.org/10.1016/S0010-0277(01)00157-3, PubMed: 11747867

Mazuka, R., Hasegawa, M., & Tsuji, S. (2014). Development of
non-native vowel discrimination: Improvement without exposure.
Developmental Psychobiology, 56(2), 192–209.
https://doi.org/10.1002/dev.21193, PubMed: 24374789

McInnes, F. R., & Goldwater, S. (2011). Unsupervised extraction of
recurring words from infant-directed speech. In L. Carlson, C.
Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual
Conference of the Cognitive Science Society (pp. 2006–2011).
Cognitive Science Society.

McMurray, B., Aslin, R. N., & Toscano, J. C. (2009). Statistical
learning of phonetic categories: Insights from a computational
approach. Developmental Science, 12(3), 369–378.
https://doi.org/10.1111/j.1467-7687.2009.00822.x, PubMed: 19371359

McMurray, B., Danelz, A., Rigler, H., & Seedorff, M. (2018).
Speech categorization develops slowly through adolescence.
Developmental Psychology, 54(8), 1472–1491.
https://doi.org/10.1037/dev0000542, PubMed: 29952600

McMurray, B., Kovack-Lesh, K. A., Goodwin, D., & McEchron, W.
(2013). Infant directed speech and the development of speech
perception: Enhancing development or an unintended consequence?
Cognition, 129(2), 362–378.
https://doi.org/10.1016/j.cognition.2013.07.015, PubMed: 23973465

Medina, V., Hoonhorst, I., Bogliotti, C., & Serniclaes, W. (2010).
Development of voicing perception in French: Comparing
adults, adolescents, and children. Journal of Phonetics, 38(4),
493–503. https://doi.org/10.1016/j.wocn.2010.06.002

Miyazawa, K., Kikuchi, H., & Mazuka, R. (2010). Unsupervised
learning of vowels from continuous speech based on self-organized
phoneme acquisition model. In Proceedings of
Interspeech (pp. 2914–2917).
https://doi.org/10.21437/Interspeech.2010-757

Miyazawa, K., Miura, H., Kikuchi, H., & Mazuka, R. (2011). The
multi timescale phoneme acquisition model of the self-organizing
based on the dynamic features. In Proceedings of
Interspeech (pp. 749–752). International Speech Communication
Association. https://doi.org/10.21437/Interspeech.2011-286

Moon, C., Lagercrantz, H., & Kuhl, P. K. (2013). Language experience
in utero affects vowel perception after birth: A two-country
study. Acta Pediatrica, 102(2), 156–160.
https://doi.org/10.1111/apa.12098, PubMed: 23173548

Mugitani, R., Pons, F., Fais, L., Dietrich, C., Werker, J. F., & Amano,
S. (2009). Perception of vowel length by Japanese- and English-
learning infants. Developmental Psychology, 45(1), 236–247.
https://doi.org/10.1037/a0014043, PubMed: 19210005

Narayan, C. R., Werker, J. F., & Beddor, P. S. (2010). Interaction
between acoustic salience and language experience in
developmental speech perception: Evidence from nasal place
discrimination. Developmental Science, 13(3), 407–420.
https://doi.org/10.1111/j.1467-7687.2009.00898.x, PubMed: 20443962

Nearey, T. M. (1997). Speech perception as pattern recognition.
Journal of the Acoustical Society of America, 101(6), 3241–3254.
https://doi.org/10.1121/1.418290, PubMed: 9193041

Nittrouer, S. (1992). Age-related differences in perceptual effects of
formant transitions within syllables and across syllable bound-
aries. Journal of Phonetics, 20(3), 351–382. https://doi.org/10
.1016/S0095-4470(19)30639-4

Nittrouer, S., & Miller, M. E. (1997). Predicting developmental shifts
in perceptual weighting schemes. Journal of the Acoustical
Society of America, 101(4), 2253–2266.
https://doi.org/10.1121/1.418207, PubMed: 9104027

Nittrouer, S., & Studdert-Kennedy, M. (1987). The role of coarticulatory
effects in the perception of fricatives by children and adults.
Journal of Speech and Hearing Research, 30(3), 319–329.
https://doi.org/10.1044/jshr.3003.319, PubMed: 3669639

Nixon, J. S., & Tomaschek, F. (2021). Prediction and error in early
infant speech learning: A speech acquisition model. Cognition,
212, Article 104697.
https://doi.org/10.1016/j.cognition.2021.104697, PubMed: 33798952

Nosofsky, R. M. (1986). Attention, similarity, and the identification-
categorization relationship. Journal of Experimental Psychology,
115(1), 39–57. https://doi.org/10.1037/0096-3445.115.1.39,
PubMed: 2937873

Ohde, R. N., & Haley, K. L. (1997). Stop-consonant and vowel perception
in 3- and 4-year-old children. Journal of the Acoustical
Society of America, 102(6), 3711–3722.
https://doi.org/10.1121/1.420135, PubMed: 9407663

Ohde, R. N., Haley, K. L., Vorperian, H. K., & McMahon, C. W.
(1995). A developmental study of the perception of onset spectra
for stop consonants in different vowel environments. Journal of
the Acoustical Society of America, 97(6), 3800–3812.
https://doi.org/10.1121/1.412395, PubMed: 7790658

Pajak, B., Bicknell, K., & Levy, R. (2013). A model of generalization
in distributional learning of phonetic categories. In Proceedings
of the Fourth Annual Workshop on Cognitive Modeling and
Computational Linguistics (pp. 11–20). Association for Computational
Linguistics.

Park, A. S., & Glass, J. R. (2008). Unsupervised pattern discovery in
speech. IEEE Transactions on Audio, Speech and Language
Processing, 16(1), 186–197.
https://doi.org/10.1109/TASL.2007.909282

Pegg, J. E., & Werker, J. F. (1997). Adult and infant perception of
two English phones. Journal of the Acoustical Society of America,
102(6), 3742–3753. https://doi.org/10.1121/1.420137,
PubMed: 9407666

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee,
K., & Zettlemoyer, L. (2018). Deep contextualized word representations.
In Proceedings of the Conference of the North American
Chapter of the Association for Computational Linguistics: Human
Language Technologies (pp. 2227–2237). Association for
Computational Linguistics. https://doi.org/10.18653/v1/N18-1202

Polka, L., & Bohn, O.-S. (1996). A cross-language comparison of
vowel perception in English-learning and German-learning
infants. Journal of the Acoustical Society of America, 100(1),
577–592. https://doi.org/10.1121/1.415884, PubMed: 8675849

Polka, L., Colantonio, C., & Sundara, M. (2001). A cross-language
comparison of /d/-/ð/ perception: Evidence for a new developmental
pattern. Journal of the Acoustical Society of America,
109(5), 2190–2201. https://doi.org/10.1121/1.1362689,
PubMed: 11386570

Pollack, I., & Pickett, J. M. (1963). The intelligibility of excerpts
from conversation. Language and Speech, 6(3), 165–171.
https://doi.org/10.1177/002383096300600305

Ranzato, M., Poultney, C., Chopra, S., & Cun, Y. (2007). Efficient
learning of sparse representations with an energy-based model.
In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in
Neural Information Processing Systems 19 (pp. 1137–1144).
MIT Press.

Räsänen, O. (2011). A computational model of word segmentation
from continuous speech using transitional probabilities of atomic
acoustic events. Cognition, 120(2), 149–176.
https://doi.org/10.1016/j.cognition.2011.04.001, PubMed: 21524739

Räsänen, O., & Blandon, M. C. (2020). Unsupervised discovery of
recurring speech patterns using probabilistic adaptive metrics. In
Proceedings of Interspeech (pp. 4871–4875). International
Speech Communication Association.
https://doi.org/10.21437/Interspeech.2020-1738

Räsänen, O., & Rasilo, H. (2015). A joint model of word segmentation
and meaning acquisition through cross-situational learning.
Psychological Review, 122(4), 792–829.
https://doi.org/10.1037/a0039702, PubMed: 26437151

Renshaw, D., Kamper, H., Jansen, A., & Goldwater, S. (2015). A
comparison of neural network methods for unsupervised representation
learning on the zero resource speech challenge. In
Proceedings of Interspeech. International Speech Communication
Association. https://doi.org/10.21437/Interspeech.2015-644

Riad, R., Dancette, C., Karadayi, J., Zeghidour, N., Schatz, T., &
Dupoux, E. (2018). Sampling strategies in Siamese Networks
for unsupervised speech representation learning. ArXiv.
https://arxiv.org/abs/1804.11297

Roark, C. L., & Holt, L. L. (2019). Perceptual dimensions influence
auditory category learning. Attention, Perception, and
Psychophysics, 81(4), 912–926.
https://doi.org/10.3758/s13414-019-01688-6, PubMed: 30761504

Roark, C. L., Plaut, D. C., & Holt, L. L. (2020). A neural network
model of the effect of prior experience with regularities on subsequent
category learning. In S. Denison, M. Mack, Y. Xu, &
B. C. Armstrong (Eds.), Proceedings of the 42nd Annual
Conference of the Cognitive Science Society (pp. 1817–1823).
Cognitive Science Society.

Rost, G. C., & McMurray, B. (2009). Speaker variability augments
phonological processing in early word learning. Developmental
Science, 12(2), 339–349.
https://doi.org/10.1111/j.1467-7687.2008.00786.x, PubMed: 19143806

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning
by 8-month-old infants. Science, 274(5294), 1926–1928.
https://doi.org/10.1126/science.274.5294.1926, PubMed: 8943209

Scharinger, M., Henry, M. J., & Obleser, J. (2013). Prior experience
with negative spectral correlations promotes information integration
during auditory category learning. Memory and Cognition,
41(5), 752–768. https://doi.org/10.3758/s13421-013-0294-9,
PubMed: 23354998

Schatz, T. (2016). ABX-discriminability measures and applications
(Unpublished doctoral dissertation). Université Paris 6.

Schatz, T., Bach, F., & Dupoux, E. (2018). Evaluating automatic
speech recognition systems as quantitative models of cross-
lingual phonetic category perception. Journal of the Acoustical
Society of America, 143(5), EL372–EL378.
https://doi.org/10.1121/1.5037615, PubMed: 29857692

Schatz, T., & Feldman, N. H. (2018). Neural network vs. HMM
speech recognition systems as models of human cross-linguistic
phonetic perception. In Proceedings of the Conference on Cognitive
Computational Neuroscience. Cognitive Science Society.
https://doi.org/10.32470/CCN.2018.1240-0

Schatz, T., Feldman, N. H., Goldwater, S., Cao, X.-N., & Dupoux,
E. (2021). Early phonetic learning without phonetic categories.
Proceedings of the National Academy of Sciences, 118(7), Article
e2001844118. https://doi.org/10.1073/pnas.2001844118,
PubMed: 33510040

Schatz, T., Peddinti, V., Bach, F., Jansen, A., Hermansky, H., &
Dupoux, E. (2013). Evaluating speech features with the
minimal-pair ABX task: Analysis of the classical MFC/PLP pipeline.
In Proceedings of Interspeech (pp. 1781–1785).
International Speech Communication Association.
https://doi.org/10.21437/Interspeech.2013-441

Schertz, J., Carbonell, K., & Lotto, A. J. (2020). Language specificity
in phonetic cue weighting: Monolingual and bilingual perception
of the stop voicing contrast in English and Spanish. Phonetica,
77(3), 186–208. https://doi.org/10.1159/000497278,
PubMed: 31018217

Schneider, S., Baevski, A., Collobert, R., & Auli, M. (2019). wav2vec:
Unsupervised pre-training for speech recognition. ArXiv.
https://arxiv.org/abs/1904.05862v4

Segal, O., Hejli-Assi, S., & Kishon-Rabin, L. (2016). The effect of
listening experience on the discrimination of /ba/ and /pa/ in
Hebrew-learning and Arabic-learning infants. Infant Behavior
and Development, 42, 86–99.
https://doi.org/10.1016/j.infbeh.2015.10.002, PubMed: 26708235

Shepard, R. N. (1987). Toward a universal law of generalization for
psychological science. Science, 237(4820), 1317–1323.
https://doi.org/10.1126/science.3629243, PubMed: 3629243

Shi, L., Griffiths, T. L., Feldman, N. H., & Sanborn, A. N. (2010).
Exemplar models as a mechanism for performing Bayesian inference.
Psychonomic Bulletin and Review, 17(4), 443–464.
https://doi.org/10.3758/PBR.17.4.443, PubMed: 20702863

Simon, C., & Fourcin, A. J. (1978). Cross-language study of speech-
pattern learning. Journal of the Acoustical Society of America,
63(3), 925–935. https://doi.org/10.1121/1.381772

Slawinski, E. B., & Fitzgerald, L. K. (1998). Perceptual development
of the categorization of the /r-w/ contrast in normal children.
Journal of Phonetics, 26(1), 27–43.
https://doi.org/10.1006/jpho.1997.0057

Stager, C. L., & Werker, J. F. (1997). Infants listen for more phonetic
detail in speech perception than in word-learning tasks. Nature,
388(6640), 381–382. https://doi.org/10.1038/41102,
PubMed: 9237755

Stilp, C. E., & Kluender, K. R. (2012). Efficient coding and statistically
optimal weighting of covariance among acoustic attributes
in novel sounds. PLoS ONE, 7(1), Article e30845.
https://doi.org/10.1371/journal.pone.0030845, PubMed: 22292057

Stilp, C. E., Rogers, T. T., & Kluender, K. R. (2010). Rapid efficient
coding of correlated complex acoustic properties. Proceedings of
the National Academy of Sciences, 107, 21914–21919.
https://doi.org/10.1073/pnas.1009020107, PubMed: 21098293

Streeter, L. A. (1976). Language perception of 2-month-old infants
shows effects of both innate mechanisms and experience. Nature,
259(5538), 39–41. https://doi.org/10.1038/259039a0,
PubMed: 1256541

Sundara, M., Polka, L., & Genesee, F. (2006). Language-experience
facilitates discrimination of /d-ð/ in monolingual and bilingual
acquisition of English. Cognition, 100(2), 369–388.
https://doi.org/10.1016/j.cognition.2005.04.007, PubMed: 16115614

Swingley, D. (2009). Contributions of infant word learning to
language development. Philosophical Transactions of the Royal
Society B, 364(1536), 3617–3632.
https://doi.org/10.1098/rstb.2009.0107, PubMed: 19933136

Swingley, D. (2019). Learning phonology from surface distributions,
considering Dutch and English vowel duration. Language
Learning and Development, 15(3), 199–216.
https://doi.org/10.1080/15475441.2018.1562927, PubMed: 31607832

Taniguchi, T., Nagasaka, S., & Nakashima, R. (2016).
Nonparametric Bayesian double articulation analyzer for direct
language acquisition from continuous speech signals. IEEE
Transactions on Cognitive and Developmental Systems, 8(3),
171–185. https://doi.org/10.1109/TCDS.2016.2550591

Thiollière, R., Dunbar, E., Synnaeve, G., Versteegh, M., &
Dupoux, E. (2015). A hybrid dynamic time warping-deep neural
network architecture for unsupervised acoustic modeling. In
Proceedings of Interspeech (pp. 3169–3173). International Speech
Communication Association.
https://doi.org/10.21437/Interspeech.2015-640

Toscano, J. C., & McMurray, B. (2010). Cue integration with categories:
Weighting acoustic cues in speech using unsupervised
learning and distributional statistics. Cognitive Science, 34(3),
434–464. https://doi.org/10.1111/j.1551-6709.2009.01077.x,
PubMed: 21339861

Tricomi, E., Delgado, M. R., McCandliss, B. D., McClelland, J. L., &
Fiez, J. A. (2006). Performance feedback drives caudate activation
in a phonological learning task. Journal of Cognitive
Neuroscience, 18(6), 1029–1043.
https://doi.org/10.1162/jocn.2006.18.6.1029, PubMed: 16839308

Tripp, A., Feldman, N. H., & Idsardi, W. J. (2021). Social inference
may guide early lexical learning. Frontiers in Psychology, 12,
Article 645247. https://doi.org/10.3389/fpsyg.2021.645247,
PubMed: 34093326

Trubetzkoy, N. S. (1939). Grundzüge der Phonologie. Vandenhoeck
und Ruprecht.

Tsao, F.-M., Liu, H.-M., & Kuhl, P. K. (2006). Perception of native
and non-native affricate-fricative contrasts: Cross-language tests
on adults and infants. Journal of the Acoustical Society of America,
120(4), 2285–2294. https://doi.org/10.1121/1.2338290,
PubMed: 17069324

Tsuji, S., & Cristia, A. (2014). Perceptual attunement in vowels: A
meta-analysis. Developmental Psychobiology, 56(2), 179–191.
https://doi.org/10.1002/dev.21179, PubMed: 24273029

Underbakke, M., Polka, L., Gottfried, T. L., & Strange, W. (1988).
Trading relations in the perception of /r/-/l/ by Japanese learners
of English. Journal of the Acoustical Society of America, 84(1),
90–100. https://doi.org/10.1121/1.396878, PubMed: 3411058

Vallabha, G. K., McClelland, J. L., Pons, F., Werker, J. F., & Amano,
S. (2007). Unsupervised learning of vowel categories from infant-
directed speech. Proceedings of the National Academy of
Sciences, 104(33), 13273–13278.
https://doi.org/10.1073/pnas.0705369104, PubMed: 17664424

van den Oord, A., Li, Y., & Vinyals, O. (2018). Representation
learning with contrastive predictive coding. ArXiv.
https://arxiv.org/abs/1807.03748v1

van Niekerk, B., Nortje, L., & Kamper, H. (2020). Vector-quantized
neural networks for acoustic unit discovery in the zerospeech
2020 challenge. In Proceedings of Interspeech (pp. 4836–4840).
International Speech Communication Association.
https://doi.org/10.21437/Interspeech.2020-1693

Versteegh, M., Thiollière, R., Schatz, T., Cao, X.-N., Anguera, X.,
Jansen, A., & Dupoux, E. (2015). The zero resource speech
challenge 2015. In Proceedings of Interspeech (pp. 3169–3173).
International Speech Communication Association.
https://doi.org/10.21437/Interspeech.2015-638

Wanrooij, K., Boersma, P., & van Zuijen, T. L. (2014). Fast phonetic
learning occurs already in 2-to-3-month old infants: An ERP
study. Frontiers in Psychology, 5(77), 1–12.
https://doi.org/10.3389/fpsyg.2014.00077, PubMed: 24701203

Werker, J. F., Byers-Heinlein, K., & Fennell, C. T. (2009). Bilingual
beginnings to learning words. Philosophical Transactions of the
Royal Society B, 364(1536), 3649–3663.
https://doi.org/10.1098/rstb.2009.0105, PubMed: 19933138

Werker, J. F., & Curtin, S. (2005). PRIMIR: A developmental framework
of infant speech processing. Language Learning and
Development, 1(2), 197–234.
https://doi.org/10.1080/15475441.2005.9684216

Werker, J. F., & Lalonde, C. E. (1988). Cross-language speech perception:
Initial capabilities and developmental change.
Developmental Psychology, 24(5), 672–683.
https://doi.org/10.1037/0012-1649.24.5.672

Werker, J. F., Pons, F., Dietrich, C., Kajikawa, S., Fais, L., & Amano,
S. (2007). Infant-directed speech supports phonetic category
learning in English and Japanese. Cognition, 103(1), 147–162.
https://doi.org/10.1016/j.cognition.2006.03.006, PubMed: 16707119

Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception:
Evidence for perceptual reorganization during the first year
of life. Infant Behavior and Development, 7(1), 49–63.
https://doi.org/10.1016/S0163-6383(84)80022-3

Westermann, G., & Reck Miranda, E. (2004). A new model of sensorimotor
coupling in the development of speech. Brain and
Language, 89(2), 393–400.
https://doi.org/10.1016/S0093-934X(03)00345-6, PubMed: 15068923

Yang, M., & Sundara, M. (2019). Cue-shifting between acoustic
cues: Evidence for directional asymmetry. Journal of Phonetics,
75, 27–42. https://doi.org/10.1016/j.wocn.2019.04.002

Yeung, H. H., Chen, K. H., & Werker, J. F. (2013). When does
native language input affect phonetic perception? The precocious
case of lexical tone. Journal of Memory and Language,
68(2), 123–139. https://doi.org/10.1016/j.jml.2012.09.004

Yeung, H. H., & Werker, J. F. (2009). Learning words’ sounds
before learning how words sound: 9-month-olds use distinct
objects as cues to categorize speech information. Cognition,
113(2), 234–243. https://doi.org/10.1016/j.cognition.2009.08.010,
PubMed: 19765698

Ylinen, S., Uther, M., Latvala, A., Vepsäläinen, S., Iverson, P.,
Akahane-Yamada, R., & Näätänen, R. (2009). Training the brain
to weight speech cues differently: A study of Finnish second-
language users of English. Journal of Cognitive Neuroscience,
22(6), 1319–1332. https://doi.org/10.1162/jocn.2009.21272,
PubMed: 19445609

Yoshida, K. A., Pons, F., Maye, J., & Werker, J. F. (2010).
Distributional phonetic learning at 10 months of age. Infancy,
15(4), 420–433. https://doi.org/10.1111/j.1532-7078.2009.00024.x,
PubMed: 32693519

Yu, D., Deng, L., & Dahl, G. (2010, December 10). Roles of
pre-training and fine-tuning in context-dependent DBN-HMMs
for real-world speech recognition [Paper presentation]. Neural
Information Processing Systems (NIPS) Workshop on Deep
Learning and Unsupervised Feature Learning, Whistler, BC.

Zevin, J. D. (2012). A sensitive period for shibboleths: The long tail
and changing goals of speech perception over the course of
development. Developmental Psychobiology, 54(6), 632–642.
https://doi.org/10.1002/dev.20611, PubMed: 22714710

Zhao, J., Al-Aidroos, N., & Turk-Browne, N. B. (2013). Attention is
spontaneously biased toward regularities. Psychological Science,
24(5), 667–677. https://doi.org/10.1177/0956797612460407,
PubMed: 23558552

Zlatin, M. A., & Koenigsknecht, R. A. (1975). Development of the
voicing contrast: Perception of stop consonants. Journal of
Speech and Hearing Research, 18(3), 541–553.
https://doi.org/10.1044/jshr.1803.541, PubMed: 1186163
