REPORT

Speech Segmentation and Cross-Situational
Word Learning in Parallel

Rodrigo Dal Ben1, Isabella Toselli Prequero1, Débora de Hollanda Souza1, and Jessica F. Hay2

an open access journal

Keywords: statistical learning, speech segmentation, cross-situational word learning, word learning

1Universidade Federal de São Carlos, São Carlos, São Paulo, Brazil
2University of Tennessee, Knoxville, Knoxville, TN, USA

ABSTRACT

Language learners track conditional probabilities to find words in continuous speech and to
map words and objects across ambiguous contexts. It remains unclear, however, whether
learners can leverage the structure of the linguistic input to do both tasks at the same time. To
explore this question, we combined speech segmentation and cross-situational word learning
into a single task. In Experiment 1, when adults (N = 60) simultaneously segmented continuous
speech and mapped the newly segmented words to objects, they demonstrated better
performance than when either task was performed alone. However, when the speech stream
had conflicting statistics, participants were able to correctly map words to objects, but were at
chance level on speech segmentation. In Experiment 2, we used a more sensitive speech
segmentation measure to find that adults (N = 35), exposed to the same conflicting speech
stream, correctly identified non-words as such, but were still unable to discriminate between
words and part-words. Again, mapping was above chance. Our study suggests that learners
can track multiple sources of statistical information to find and map words to objects in noisy
environments. It also prompts questions on how to effectively measure the knowledge arising
from these learning experiences.

INTRODUCTION

Learning a new language requires mastering several complex tasks. Research has shown that
language learners can use statistical cues from their linguistic environment to overcome some
of these challenges. For example, learners can track conditional probabilities between
syllables to discover words from continuous speech and between words and objects to learn the
meaning of novel words across ambiguous situations. The present study explores how tracking
conditional probabilities in audiovisual input may help learners to solve both tasks
simultaneously. We combine two well-established statistical learning tasks, speech segmentation
(e.g., Romberg & Saffran, 2010; Saffran et al., 1996) and cross-situational word learning
(e.g., Smith & Yu, 2008; Yu & Smith, 2007), into a single paradigm.

Faced with continuous speech and only a few words in isolation (∼10%; Brent & Siskind,
2001), one of the crucial challenges for language learners is to segment streams of words into
discrete units. Conditional probabilities between syllables (i.e., transitional probabilities; Krogh
et al., 2013; Romberg & Saffran, 2010; Saffran et al., 1996) provide one cue that aids
segmentation (for evidence of other cues, see Hay & Saffran, 2012; Johnson et al., 2014). In natural

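Citation: Dal Ben, R., Prequero, I. T.,
Souza, D. de H., & Hay, J. F. (2023).
Speech Segmentation and Cross-
Situational Word Learning in Parallel.
Open Mind: Discoveries in Cognitive
Science, 7, 510–533.
https://doi.org/10.1162/opmi_a_00095

DOI:
https://doi.org/10.1162/opmi_a_00095

Received: 6 July 2023
Accepted: 6 July 2023

Competing Interests: The authors
declare no conflict of interests.

Corresponding Author:
Rodrigo Dal Ben
dalbenwork@gmail.com

Copyright: © 2023
Massachusetts Institute of Technology
Published under a Creative Commons
Attribution 4.0 International
(CC BY 4.0) license

The MIT Press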

Segmentation and Word Learning Dal Ben et al.

speech, syllables that form words tend to have a higher likelihood of co-occurrence (higher
Transitional Probabilities, TPs) in comparison to syllables across word boundaries (Swingley,
1999; but see Yang, 2004), which provides a potential cue to segmentation. For example, in
the sequence pretty#baby the TP of pre to ty is greater than the TP of ty to ba; this difference in
TP could signal a word boundary for learners (Saffran et al., 1996). There is now a vast empirical
literature showing that language learners can track differences in TPs across syllable sequences
to segment continuous speech into discrete words (for reviews see Cannistraci et al., 2019;
Cunillera & Guilera, 2018; but see Black & Bergmann, 2017). The experimental task in these
studies usually starts by familiarizing participants with a continuous speech stream in which TP
is the main cue to word boundaries. For instance, some syllables always occur together
(creating a word), sometimes occur together (creating a part-word or a low TP word), or
never occur together (creating a non-word). Following familiarization, participants’ preferences
for words, part-words, or non-words are measured. By and large, participants differentiate words
from foils (part-words or non-words), suggesting that they successfully tracked TP information to
find words in the continuous speech stream.
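
As a minimal illustration of the statistic described above, the following Python sketch estimates forward transitional probabilities, TP(x → y) = count(xy) / count(x), from a syllable sequence. The toy stream and the function name are assumptions made for illustration only; they are not the stimuli or the code used in the studies cited.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TP for each adjacent pair: count(x followed by y) / count(x as first element)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical toy stream: "pretty" is followed once by "baby" and once by "doggy".
stream = ["pre", "ty", "ba", "by", "pre", "ty", "do", "gy"]
tps = transitional_probabilities(stream)

within_word = tps[("pre", "ty")]   # syllables inside a word
across_words = tps[("ty", "ba")]   # syllables spanning a word boundary
print(within_word, across_words)
```

In this toy stream the within-word pair has a higher TP than the pair spanning the word boundary, which is the contrast a learner could use as a segmentation cue.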

Phonotactic probability (PP), the conditional probability of a syllable occurring in a given
position of a word from a given language (Vitevitch & Luce, 2004), is another statistical cue to
word boundaries (Benitez & Saffran, 2021; Mattys & Jusczyk, 2001; Mattys et al., 1999). For
instance, in the same sound sequence pretty#baby, the English PPs1 of the words pretty and
baby are comparable (≈ 0.0440, ≈ 0.0050, respectively) and both are higher than the PP of
the part-word ty#ba (≈ 0.0022), which could signal word boundaries to language learners. The
combined information of TPs and PPs can promote speech segmentation (when both cues point
to word boundaries) or impair it (when they provide conflicting information
about word boundaries). Evidence suggests that this happens when TP is combined with legal
versus illegal PPs (Finn & Hudson Kam, 2008), with high versus low PPs (Mersad & Nazzi,
2011), and even with subtle differences in high PPs (Dal Ben et al., 2021). In previous work,
we argued that careful consideration of phonotactics from participants’ natural languages
should be an integral part of the stimuli design of statistical speech segmentation studies
(Dal Ben et al., 2021). This is especially true when studying adults, who will promptly bring
their extensive learning history and expectations from their natural languages’ PPs to the
experimental task (Steber & Rossi, 2020; Sundara et al., 2022).

Assigning meaning to words is another challenge for language learners. There is evidence
that, early in development, recently segmented words (with stronger TPs) are treated as better
candidate labels on subsequent mapping tasks (Graf Estes et al., 2007; Hay et al., 2011). Although
the benefit of high TP sequences during word learning appears to diminish across
development (Mirman et al., 2008; Shoaib et al., 2018), learners continue to be remarkably successful
both at segmenting speech using TP information (Saffran et al., 1996; but see Black &
Bergmann, 2017) and at making one-to-one mappings between labels and referents (Graf
Estes, 2009; Graf Estes et al., 2007; Lany & Saffran, 2010). In addition, across the lifespan,
language learners rely on phonotactics from their natural languages when learning novel
words, with words with stronger PPs being learned faster and more accurately than words with
weaker PPs (Graf Estes et al., 2011; Storkel et al., 2013; but see Cristia, 2018). However, this
might not be true when learning novel words in ambiguous situations (Dal Ben et al., 2022).

In everyday life, several words are presented with several potential referents at the same
time, creating ambiguous learning experiences (Quine, 1960). A growing empirical literature

1 Phonotactic probabilities calculated using Vitevitch and Luce (2004) online calculator.

Open Mind: Discoveries in Cognitive Science


shows that learners can track word-object co-occurrences across ambiguous situations to find
the meaning of words (for a recent meta-analysis, see Dal Ben et al., 2019; but see Smith et al.,
2014). The experimental task in these studies usually familiarizes participants with a series of
ambiguous trials. On each trial, two (or more) words are presented with two (or more) objects.
On any given trial, there is insufficient information to solve the ambiguity. However, if
participants compare word-object conditional probabilities across trials, word-object relations can
be learned2 (Smith & Yu, 2008; Yu & Smith, 2007).
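
The logic of comparing word-object conditional probabilities across trials can be sketched as follows. This is only an illustrative tally of co-occurrences under assumed toy trials (the object names `obj_A`, `obj_B`, `obj_C` and the pairings are hypothetical); it is not a claim about the learning mechanism participants use, nor the code of any cited study.

```python
from collections import Counter, defaultdict


def cooccurrence_probabilities(trials):
    """P(object | word) = trials containing both word and object / trials containing the word."""
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for words, objects in trials:
        for word in words:
            word_counts[word] += 1
            for obj in objects:
                pair_counts[word][obj] += 1
    return {
        word: {obj: n / word_counts[word] for obj, n in objs.items()}
        for word, objs in pair_counts.items()
    }


# Hypothetical 2 x 2 trials: each trial alone is ambiguous (two words, two objects).
trials = [
    (("sute", "viko"), ("obj_A", "obj_B")),
    (("sute", "bara"), ("obj_A", "obj_C")),
    (("viko", "bara"), ("obj_B", "obj_C")),
]
probs = cooccurrence_probabilities(trials)

# Across trials, each word has exactly one object that always co-occurs with it.
best_guess = {word: max(p, key=p.get) for word, p in probs.items()}
print(best_guess)
```

No single trial identifies a referent, but aggregating over the three trials leaves one object per word with a co-occurrence probability of 1.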

The evidence that statistical information can promote both speech segmentation and
cross-situational word learning prompts the question of whether these processes unfold in sequence
or in parallel. Related evidence for the latter is reported by Cunillera, Laine, et al. (2010).
Adults were familiarized with a continuous speech stream and, simultaneously, with a stream
of objects. When the first word was being played, its corresponding object was displayed on
screen; when the second word started, its corresponding object replaced the previous one,
and so on. From this dynamic presentation, participants were able to segment words from
the continuous speech and to map them to their corresponding objects in parallel. Moreover, in
a follow-up study, François et al. (2017) replicated the findings and showed neurophysiological
markers for online simultaneous speech segmentation and mapping. Although these studies
have shown that segmentation and mapping can happen in parallel (see also Shukla et al.,
2011 for a related task with infants), both used non-ambiguous word learning tasks.

Intuitively, adding mapping ambiguity could make the simultaneous task too challenging.
However, Yurovsky et al. (2012) have shown that adults can simultaneously segment labels
from phrases and map them to objects across ambiguous presentations. Using an adaptation
of the cross-situational word learning paradigm (Yu & Smith, 2007), adults were exposed to
scenes with two novel objects. On each trial, they would see only one object and hear a
sentence that included a word labeling it among other function words. When the position
and the onset of labels in the sentences matched the patterns of their natural language (i.e.,
final position, label preceded by a small set of words), participants were able to segment the
labels and to map them to objects. Despite the additional demands that ambiguity might
impose, the authors argued that the parallel solution of segmentation and mapping might
happen in continuous iterations, as even partial speech segmentation would reduce mapping
ambiguity and vice-versa (for similar evidence with multilingual adults see Tachakourt,
2023; for related evidence with other linguistic cues, see Feldman, Griffiths, et al., 2013;
Feldman, Myers, et al., 2013). This is in line with proposals by Räsänen and Rasilo (2015).
In a comprehensive combination of computational simulations and reanalyses of empirical
data, the authors argue that tracking cross-modal conditional probabilities between words
and objects in ambiguous situations may boost both speech perception and word learning,
in comparison to tracking only TPs or word-object co-occurrences (for a similar argument,
see Jones et al., 2010). Moreover, recent meta-analytic findings show that infants effectively
integrate audio and visual information, from a variety of sources, when learning language (e.g.,
Cox et al., 2022; but see Frank et al., 2007, Johnson & Tyler, 2010, and Thiessen, 2010 for
potential limits of this integration).

Here we further explore whether the integration of transitional probabilities, phonotactic
probabilities, and word-object co-occurrences would promote speech segmentation and word

2 Here we do not join the productive debate between hypothesis-testing and aggregation as learning
mechanisms for cross-situational word learning (e.g., Yurovsky & Frank, 2015), as we believe it is beyond the scope of
our study.


learning across ambiguous presentations. Our study is guided by three main questions. First,
we ask whether words can be segmented and mapped at the same time across dynamic
ambiguous presentations. To answer this question, we adapted the design by Cunillera, Laine,
et al. (2010) to combine a speech stream with several new objects in an ambiguous fashion.
Second, we ask whether phonotactic properties of our stimuli would impact speech
segmentation and cross-situational word learning in parallel. Answering this question allows us to
better understand how multiple linguistic statistics can be combined when learning novel words
across ambiguous situations (Saffran, 2020; Smith et al., 2018). Third, we ask whether this joint
task would improve segmentation and mapping in comparison to separate tasks. To answer
this question, we compared our current findings to data from our previous studies testing
speech segmentation (Dal Ben et al., 2021) and cross-situational word learning (Dal Ben
et al., 2022) separately, but using the same stimuli (same TP and phonotactic properties)
and population.

EXPERIMENT 1

To investigate whether words can be segmented and mapped simultaneously and whether
differences in phonotactics would impact this joint performance, we exposed participants to
continuous speech streams with varying distributions of phonotactics and TPs. Simultaneously,
we also presented them with a series of objects, two at a time, that corresponded to the words
in the speech streams. Critically, one of the languages had TPs and phonotactics aligned,
consistently pointing to word boundaries. In another language, words and part-words had
balanced phonotactics, with TPs being the only informative statistic to word boundaries. In
a third language, TPs and phonotactics were in conflict: TPs pointed to word boundaries
and phonotactic information pointed to syllables within-words (part-words).

To investigate whether the joint task would improve segmentation and mapping in
comparison to separate tasks, we compared segmentation and mapping performance in the present
combined task with performance in the individual tasks (i.e., speech segmentation only and
cross-situational word learning only; Dal Ben et al., 2021, 2022, respectively).

Method


Participants. Sixty native Brazilian-Portuguese-speaking adults (Mage = 21.37 years, ± 3.27
SD, 32 female) participated. None of the participants reported any visual or auditory
impairments that could interfere with the task. Participants were recruited online at the official
Facebook group of Universidade Federal de São Carlos, where data were collected. They received
no compensation for their in-person participation. The study was conducted according to the
Declaration of Helsinki and the Ethics Committee of the host university approved the research
(#1.484.847). Participants were randomly assigned to one of three groups.

Stimuli and Design

Auditory Stimuli. Three frequency-balanced languages from Dal Ben et al. (2021) were used
(see Table 1). Each language contained six statistically defined disyllabic pseudo-words
(TP = 1), which served as labels in our task. Test words and part-words in all languages were
frequency balanced (Aslin et al., 1998). In each language, half of the words were repeated
300 times (labeled H in Table 1) and the other half were repeated 150 times (labeled L in
Table 1). The recombination of syllables from the words with higher frequency generated three
part-words, used during the test phase, that had lower TPs (TP = 0.5), but that were balanced in
frequency with the test words (150 repetitions each; Aslin et al., 1998).


Table 1. Words and Part-words (grapheme and IPA) and their Phonotactic Probabilities (PP+ or PP−) and Frequency (High or Low) for
Balanced, Aligned, and Conflict Languages

Language  Phase            Items       Grapheme [IPA]                             PP   TP   Freq
Balanced  Familiarization  Words       sute [sute], viko [viko], bara [baʁa]      H+   -    H
          Familiarization  Words       nipe [nipe], tadi [tad͡ʒi], mide [mide]     H−   -    L
          Test             Words       nipe [nipe], tadi [tad͡ʒi], mide [mide]     H−   1.0  -
          Test             Part-words  teba [teba], kosu [kosu], ravi [ʁavi]      H−   0.5  -
Aligned   Familiarization  Words       dini [d͡ʒini], deta [deta], pemi [pemi]     H+   -    H
          Familiarization  Words       sute [sute], viko [viko], bara [baʁa]      H+   -    L
          Test             Words       sute [sute], viko [viko], bara [baʁa]      H+   1.0  -
          Test             Part-words  nipe [nipe], tadi [tad͡ʒi], mide [mide]     H−   0.5  -
Conflict  Familiarization  Words       teba [teba], kosu [kosu], ravi [ʁavi]      H−   -    H
          Familiarization  Words       nipe [nipe], tadi [tad͡ʒi], mide [mide]     H−   -    L
          Test             Words       nipe [nipe], tadi [tad͡ʒi], mide [mide]     H−   1.0  -
          Test             Part-words  sute [sute], viko [viko], bara [baʁa]      H+   0.5  -

In addition, all words and part-words had legal and high phonotactic probabilities in
Brazilian-Portuguese. Following previous research (Dal Ben et al., 2021), we decided to use
only syllable sequences with high phonotactics (instead of legal vs. illegal or high vs. low; Finn
& Hudson Kam, 2008; Mersad & Nazzi, 2011) so that all syllable sequences would be
phonotactically plausible in the participants’ native language. However, some syllable sequences
had higher phonotactic probability than others (Table 1, PP+ or PP−). Phonotactics were
calculated using Vitevitch and Luce’s (2004) algorithm and Estivalet and Meunier’s (2015) database
of Brazilian-Portuguese biphones. Briefly, we divided the sum of the log (base 10) of token
frequency of each biphone on each word position by the total log frequency of words with
biphones in that given position (e.g., /mæ/ in the third biphone divided by the total log
frequency of all words with at least three biphones). Then, using a custom search engine, we
created six novel disyllabic words with consonant–vowel structure (CVCV) and with the
highest possible phonotactic probability before becoming actual words in Brazilian-Portuguese
(labeled PP+; Table 1). Finally, we recombined their biphones to create six other novel words
that had slightly less probable, but still high, phonotactic probabilities (labeled PP−; Table 1).
For a full description of the phonotactic calculations, see Dal Ben et al. (2021) and Vitevitch
and Luce (2004).
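
The positional biphone calculation summarized above can be sketched roughly as follows. This is a simplified reading of the verbal description, not the authors' implementation or the Vitevitch and Luce calculator: the three-word lexicon, its frequencies, the use of letters as stand-ins for phonemes, and the function name are all hypothetical.

```python
import math
from collections import defaultdict


def positional_biphone_probabilities(lexicon):
    """Map (position, biphone) to summed log10 frequency of words with that biphone in that
    position, divided by summed log10 frequency of all words with any biphone in that position.

    lexicon: dict mapping a word (tuple of phoneme symbols) to its token frequency.
    """
    biphone_logfreq = defaultdict(float)
    position_logfreq = defaultdict(float)
    for phonemes, frequency in lexicon.items():
        log_frequency = math.log10(frequency)
        for position in range(len(phonemes) - 1):
            biphone = (phonemes[position], phonemes[position + 1])
            biphone_logfreq[(position, biphone)] += log_frequency
            position_logfreq[position] += log_frequency
    return {
        key: value / position_logfreq[key[0]]
        for key, value in biphone_logfreq.items()
    }


# Hypothetical mini-lexicon with invented token frequencies (log10 = 3, 2, and 1).
toy_lexicon = {
    ("p", "a", "t", "o"): 1000,
    ("p", "a"): 100,
    ("g", "a", "t", "o"): 10,
}
pp = positional_biphone_probabilities(toy_lexicon)
print(pp[(0, ("p", "a"))])  # share of first-position log frequency carried by "pa"
```

In this toy lexicon, the first-position biphone "pa" accounts for 5 of the 6 log-frequency units of words that have a first biphone, so it receives a higher positional probability than "ga".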


Figure 1. (A) Displays the familiarization phase, with dynamic trials combining the continuous
speech stream with two objects at a time. (B) Displays a trial of the speech segmentation test
(two-alternative forced-choice). (C) Displays a trial of the mapping test (four-alternative forced-choice).

Languages were synthesized using the MBROLA speech synthesizer with a Portuguese
female voice3 (Dutoit et al., 1996). Prosodic cues were minimized by setting the pitch constant
at 180 Hz, the intensity at 77 dB, and the duration of each word to 696 ms (cf. Cunillera,
Laine, et al., 2010). The total duration of each language was 15 min 39 s and 424 ms.

Following our previous studies, TPs and phonotactics were combined to create three
languages. The Balanced language had test words (TP = 1.0) and part-words (TP = 0.5) with
balanced phonotactic probabilities (Mwords = 0.0072, Mpart-words = 0.0075; Table 1); this language
served as a control. The Aligned language had test words with higher phonotactic probabilities
in comparison to part-words (Mwords = 0.0085, Mpart-words = 0.0072; Table 1). Thus, both TPs
and phonotactics signaled word boundaries. Finally, in the Conflict language, test words had
lower phonotactic probabilities in comparison to part-words (Mwords = 0.0072, Mpart-words =
0.0085; Table 1). Thus, TPs highlighted word boundaries whereas phonotactics highlighted
part-words.

Visual Stimuli. Six novel objects, used by Dal Ben et al. (2022), were also used in the present
experiment. They were realistic, colorful, 3D objects that are part of the NOUN object base
(Horst & Hout, 2016) and were chosen based on their high degree of novelty (M = 77%) and
discriminability (M = 90%). For each language, objects and words were randomly paired,
forming six word-object pairs. All stimuli are openly available at https://osf.io/rs2bm/.

Design. Our paradigm (Figure 1) was an adaptation of Cunillera, Laine, et al. (2010) and
combined speech segmentation and cross-situational word learning in the same task. It had two
phases: familiarization and test. During familiarization, one of the languages (Balanced,
Aligned, Conflict) was played while objects were displayed on the computer screen. We
matched words from the speech stream and objects on the screen in such a way that, at
any given time, two objects were displayed while their corresponding words were presented
(≈ 1392 ms; Figure 1). For example, when the first word was first presented, the objects

3 We used the MBROLA database br4 (available at: https://github.com/numediart/MBROLA-voices).


corresponding to the first and second words were displayed; when the third word was played,
the first two objects were replaced by two other objects and so on. This created a highly
dynamic adaptation of the classic 2 × 2 cross-situational word learning arrangement (for a
video sample, see https://osf.io/rs2bm/; cf. Smith & Yu, 2008). Importantly, the onset and offset
of the words and objects were desynchronized (± 100, ± 150, or ± 200 ms) to avoid additional
cues to speech segmentation (Cunillera, Càmara, et al., 2010). In addition, the entire audio
stream had a fade-in and fade-out effect of 500 ms to minimize cues for the initial and final
words’ boundaries. Finally, to minimize fatigue from this extensive exposure (a total of 1350
word-object presentations, or 675 2 × 2 “trials”, over ≈ 15 minutes), we divided the
familiarization into five blocks. Each block had 270 word-object presentations (60 for each high
frequency word-object pair and 30 for each low frequency pair) and lasted a little over
3 minutes. Between blocks, participants were given a 5-second pause on a screen displaying
the task progress (e.g., “Block 2 of 5”).
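
The familiarization counts reported above follow from the design parameters, as the short check below shows. The variable names are ours; the values (three high- and three low-frequency pairs, 300 and 150 repetitions, five blocks, 696 ms per word) are taken from the text, and the block duration ignores pauses and fades.

```python
HIGH_FREQ_PAIRS = 3       # word-object pairs repeated 300 times
LOW_FREQ_PAIRS = 3        # word-object pairs repeated 150 times
HIGH_REPETITIONS = 300
LOW_REPETITIONS = 150
N_BLOCKS = 5
WORD_DURATION_MS = 696

total_presentations = HIGH_FREQ_PAIRS * HIGH_REPETITIONS + LOW_FREQ_PAIRS * LOW_REPETITIONS
two_by_two_trials = total_presentations // 2          # two words (and two objects) per "trial"
presentations_per_block = total_presentations // N_BLOCKS
high_per_block = HIGH_REPETITIONS // N_BLOCKS
low_per_block = LOW_REPETITIONS // N_BLOCKS
block_minutes = presentations_per_block * WORD_DURATION_MS / 1000 / 60

print(total_presentations, two_by_two_trials, presentations_per_block)
print(high_per_block, low_per_block, round(block_minutes, 2))
```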

Following familiarization, two tests were performed, always in the same order:
segmentation and mapping. The segmentation test followed a two-alternative forced-choice structure.
On each trial, a frequency-balanced word (i.e., a low frequency word, TP = 1, 150 repetitions)
and a part-word (TP = 0.5, 150 repetitions) were played with a pause of 500 ms between them.
Participants were prompted to indicate which one was a word from the speech stream they
had just heard. The order of presentation of words and part-words was counterbalanced across
trials. Each of the three low frequency words was tested six times across 18 test trials, with
each word being tested against each part-word twice4.
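
One way to build a trial list with this structure is sketched below, using the Balanced language items as an example. It assumes that the two tests of each word and part-word pairing differ in presentation order (word first vs. part-word first); the text states only that order was counterbalanced, so this detail, the function name, and the shuffling seed are assumptions.

```python
import random
from itertools import product


def build_segmentation_trials(words, part_words, seed=0):
    """Return (first_item, second_item) pairs: every word vs. every part-word, in both orders."""
    trials = []
    for word, part_word in product(words, part_words):
        trials.append((word, part_word))   # word played first
        trials.append((part_word, word))   # part-word played first
    random.Random(seed).shuffle(trials)    # fixed seed only to make the sketch reproducible
    return trials


test_words = ["nipe", "tadi", "mide"]      # low-frequency words, TP = 1.0
test_part_words = ["teba", "kosu", "ravi"]  # part-words, TP = 0.5
segmentation_trials = build_segmentation_trials(test_words, test_part_words)

trials_per_word = {w: sum(w in trial for trial in segmentation_trials) for w in test_words}
word_first = sum(trial[0] in test_words for trial in segmentation_trials)
print(len(segmentation_trials), trials_per_word, word_first)
```

Three words crossed with three part-words in two orders yields the 18 trials described, with each word appearing in six of them.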

The mapping test followed a four-alternative forced-choice structure. Each trial began with
four objects displayed in the corners of the screen: one target object (co-occurrence
probability = 1 with target word) and three distractors (co-occurrence probability = 0.2 with target
word). After 1 second, a target word was played and participants were prompted to select
the matching object. Each of the 6 word-object pairs (3 high frequency words and 3 low
frequency words) was tested twice across 12 trials.

Procedure. The experiment was conducted in a sound-attenuated room and was computer
administered using PsychoPy2 (Peirce et al., 2019). Auditory stimuli were played on
high-definition neutral headphones (AKG K240 powered by Fiio e10K dac/amp). All responses were
entered on an adapted numeric keyboard with only the keys: 1, 2, 3, 4, Return, +, and − (to
increase or decrease the audio volume). At the beginning of the experiment, music with the
same intensity as the experimental stimuli (77 dB) was played and participants were instructed
to adjust the volume to a comfortable level.

Next, they were instructed that they would hear a new language and see new objects and
that their task was to discover which words corresponded to which objects. Following
familiarization, they were tested on segmentation and mapping. The first two trials of each testing
phase were warm-up trials used to familiarize participants with the structure of the tasks. For
example, before the segmentation test trials began, participants were presented with two
practice trials with a common word from Brazilian-Portuguese versus a nonsense word (e.g., pato
[duck] vs. tafi). Similarly, before the mapping test trials began, participants were presented with
two practice trials during which they heard a familiar word and were presented with 4 familiar

4 The decision to test each word six times was based on our previous investigation of speech segmentation
only (Dal Ben et al., 2021). Whereas this number of repetitions is higher in comparison to similar studies (e.g.,
Cunillera, Laine, et al., 2010; François et al., 2017), follow-up analyses revealed that trial number did not
predict performance in either Experiment 1 or 2. Full analysis available at: https://osf.io/rs2bm/.


objects (e.g., “pato” + picture of a duck, house, cat, ball). In addition, after each test phase,
participants were asked to estimate their performance by indicating if the percentage of correct
responses was between 0–25%, 25–50%, 50–75%, or 75–100%. Participants’ compliance with
instructions was continuously assessed using a CCTV system. At the end, participants
answered a questionnaire about their educational background and language abilities.

Data Analysis. After excluding inattentive responses, defined as test trials with reaction times
greater than 3 SDs away from the mean (segmentation: 15 trials, 1% of the data; mapping: 17
trials, 2% of the data), we fitted mixed-effects logistic regressions using the lme4 package for R
(Bates et al., 2015; R Core Team, 2021) and Spearman’s correlations, also in R, to explore
speech segmentation performance, cross-situational word learning performance, the relationship
between them, and self-evaluation. Specific models, outcomes, and predictors are described
in the next section. Given the exploratory nature of our investigation, we report effect size
estimations and confidence intervals, but not p-values (Scheel et al., 2021). All scripts and data
are openly available at https://osf.io/rs2bm/.
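
The reaction-time exclusion rule can be illustrated with a minimal sketch. The authors' scripts are in R and are available at the repository above; the Python version below is only an illustration with invented reaction times, and it assumes that the mean and SD are computed over the whole set of trials passed in (the text does not say whether the criterion was applied per participant or per test).

```python
from statistics import mean, stdev


def exclude_rt_outliers(reaction_times, n_sd=3.0):
    """Keep trials whose reaction time lies within n_sd sample SDs of the mean."""
    m = mean(reaction_times)
    sd = stdev(reaction_times)
    return [rt for rt in reaction_times if abs(rt - m) <= n_sd * sd]


# Hypothetical reaction times in seconds: 20 typical trials and one very slow response.
toy_rts = [0.8, 0.9, 1.0, 1.1, 1.2] * 4 + [15.0]
kept = exclude_rt_outliers(toy_rts)
print(len(toy_rts), len(kept))
```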

Results and Discussion

Speech Segmentation. To analyze speech segmentation performance, our mixed-effects logistic
regression had selection of the target word (either correct or incorrect) as our outcome variable
and chance level (logit of 0.5) and language (Balanced, Aligned, Conflict, respectively) as
predictor variables. Our initial model had a maximal random structure with stimuli as random
slopes and participants as random intercepts5 (Barr et al., 2013), but this model did not
converge. We then pruned it to include only random intercepts for stimuli and participants6.

Participants from the Balanced language were much more likely to select the words over
the part-words at test (Odds Ratio = 10.95, 95% CI [4.19, 28.57]7; M = 0.85, SD = 0.16;
Figure 2). Participants from the Aligned language, in which both TP and phonotactic
probability pointed to word boundaries, were even more likely to select words over part-words
(change in OR = 1.61, 95% CI [0.41, 6.27]; M = 0.87, SD = 0.18). On the other hand,
participants from the Conflict language, in which TP and phonotactic probabilities worked against
each other, were equally likely to select words and part-words (change in OR = 0.13, 95% CI
[0.04, 0.42]; M = 0.57, SD = 0.3). These results are in line with our previous findings that
adults not only track both TP and PP at the same time, but that these statistics can be combined
to improve (i.e., Aligned language) or impair (i.e., Conflict language) speech segmentation (Dal
Ben et al., 2021).
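
To make the odds ratios easier to read, the sketch below converts them into approximate probabilities of selecting the word. It assumes that the Balanced estimate expresses odds relative to chance (odds of 1, i.e., a probability of 0.5) and that the Aligned and Conflict estimates are multiplicative changes relative to Balanced. Model-implied probabilities are conditional on the random effects, so they need not match the raw means reported above; the function name is ours.

```python
def odds_to_probability(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


BALANCED_ODDS = 10.95          # odds ratio against chance (odds of 1)
ALIGNED_CHANGE = 1.61          # change in OR relative to Balanced
CONFLICT_CHANGE = 0.13         # change in OR relative to Balanced

p_balanced = odds_to_probability(BALANCED_ODDS)
p_aligned = odds_to_probability(BALANCED_ODDS * ALIGNED_CHANGE)
p_conflict = odds_to_probability(BALANCED_ODDS * CONFLICT_CHANGE)

print(round(p_balanced, 3), round(p_aligned, 3), round(p_conflict, 3))
```

Under these assumptions, the Conflict estimate corresponds to a probability close to 0.5, consistent with the description of chance-level segmentation in that language.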

In addition, segmentation performance and self-evaluation (Figure 2) were positively
correlated for the Balanced (rs = 0.45) and Aligned (rs = 0.48) languages, but not for the Conflict
language (rs = 0.12). This suggests that being exposed to a continuous speech stream in which TPs
and PPs were either aligned or balanced within words formed clearer word representations,
which allowed participants to estimate their knowledge of the words more accurately from the
speech.

To explore whether our joint task impacts speech segmentation, we compared the present
data with data from a previous investigation testing speech segmentation only (Dal Ben et al.,
2021). Because we used the exact same languages as previous studies, we fit separate

5 lme4 syntax: selection ∼ chance level + language + (stimuli|participants).
6 lme4 syntax: selection ∼ chance level + language + (1|stimuli) + (1|participants).
7 Regression tables are available at https://osf.io/rs2bm/.


Figure 2. (A) Mean number of correct word selections for Balanced (M = 0.84, SD = 0.15), Aligned
(M = 0.87, SD = 0.18), and Conflict (M = 0.57, SD = 0.3) languages on the segmentation test of
Experiment 1. Solid points represent the overall mean, error bars represent 95% CIs (non-parametric
bootstrap). Points represent the mean for each participant. Shaded areas depict the distribution of
individual responses. The dashed line displays the chance level (0.5). (B) Correlations between
segmentation and self-evaluation (upper panel; rs Balanced = 0.45; rs Aligned = 0.48; rs Conflict = 0.12)
for Balanced, Aligned, and Conflict languages on Experiment 1. The size of dots indicates the
number of participants that overlap in given coordinates (from 1 to 4).

mixed-effects logistic regressions8 for each language (Balanced, Aligned, Conflict), having
the selection of target words (correct or incorrect) as our outcome variable, experiment
(segmentation only or simultaneous task) as a predictor variable, and participants as random
intercepts.

For the Balanced language, participants in the simultaneous task were approximately three
times more likely to choose the target word compared to the separate task (change in OR =
3.21, 95% CI [1.50, 6.88]; Figure 3). The difference was even higher for the Aligned language:
participants from the simultaneous task were almost five times more likely to make correct
selections in comparison to the separate task (change in OR = 4.93, 95% CI [1.34, 18.16]).
On the other hand, in the Conflict language, although participants in the simultaneous task still

8 lme4 syntax for each language: word selection ∼ experiment + (1|participants).


Figure 3. Mean number of correct word selections for Balanced (separate: M = 0.68, SD = 0.2;
simultaneous: M = 0.84, SD = 0.15), Aligned (separate: M = 0.68, SD = 0.27; simultaneous: M =
0.87, SD = 0.18), and Conflict (separate: M = 0.43, SD = 0.23; simultaneous: M = 0.57, SD = 0.3)
languages for an experiment testing speech segmentation only (WS only; Dal Ben et al., 2021)
and on our current simultaneous task (WS & CSWL). Solid points represent the overall mean,
error bars represent 95% CIs (non-parametric bootstrap). Points represent the mean for each
participant. Shaded areas depict the distribution of individual responses. Dashed line displays the chance
level (0.5).

outperformed participants from the separate task, the improvement was much less pronounced
(change in OR = 1.96, 95% CI [0.83, 4.64]).
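The odds ratios above can be read more easily on the log-odds scale. The following is a minimal sketch, assuming the reported intervals are symmetric (Wald-type) on the log scale; that assumption, the function name `summarize`, and the dictionary layout are ours for illustration and are not taken from the authors' analysis scripts. It recovers an approximate standard error from each interval and checks whether the interval excludes an odds ratio of 1.

```python
import math

# Odds ratios and 95% CIs as reported in the text (simultaneous vs. separate task).
reported = {
    "Balanced": (3.21, 1.50, 6.88),
    "Aligned": (4.93, 1.34, 18.16),
    "Conflict": (1.96, 0.83, 4.64),
}

def summarize(odds_ratio, lower, upper, z=1.96):
    """Recover log-scale quantities from a reported OR and its 95% CI."""
    # Assumption: the interval is symmetric around log(OR) on the log-odds scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)  # geometric mean of the two bounds
    return {
        "log_or": math.log(odds_ratio),
        "se": se,
        "midpoint": midpoint,
        "excludes_one": lower > 1 or upper < 1,
    }

summaries = {name: summarize(*values) for name, values in reported.items()}
for name, s in summaries.items():
    print(name, round(s["midpoint"], 2), round(s["se"], 2), s["excludes_one"])
```

Under this assumption the geometric midpoint of each interval lands very close to the reported odds ratio, and only the Conflict interval includes 1, which is consistent with the weaker improvement described for that language.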

These results show that adults will use any statistic available (phonetic and audiovisual
co-occurrences) to find words in continuous speech. Moreover, the improvement in segmentation
in our current task indicates that adults benefit from tracking multiple statistical sources. This
provides initial empirical support for the model proposed by Räsänen and Rasilo (2015) and is
in line with recent research on language development in natural environments (Clerkin et al.,
2017; Smith et al., 2018; Yu et al., 2021).

Cross-situational Word Learning. To analyze cross-situational word learning, our mixed-effects
logistic regression9 had selection of the target object (either correct or incorrect) as the
outcome variable; chance level (logit of 0.25), language (Balanced, Aligned, Conflict), the
frequency of word-object pairs (low or high), and their interaction as predictor
variables; and stimuli and participants as random intercepts.

Across all languages and pair frequencies, participants were much more likely to select the
correct object in comparison to the distractors (Figure 4; full regression table available at
https://osf.io/rs2bm/). Mapping and self-evaluation (Figure 4) were positively correlated for
all languages. They were strongly correlated for the Balanced language (rs = 0.9), and moderately
for the Aligned (rs = 0.59) and the Conflict languages (rs = 0.53). This suggests that

9 lme4 syntax: object selection ∼ chance level + language * pair frequency + (1|stimuli) + (1|participant).


Figure 4. (A) Mean number of correct high and low frequency object selections for Balanced,
Aligned, and Conflict languages on the cross-situational word learning test of Experiment 1 (Balanced:
Mlow = 0.85, SD = 0.26, Mhigh = 0.75, SD = 0.3; Aligned: Mlow = 0.89, SD = 0.18, Mhigh = 0.84,
SD = 0.28; Conflict: Mlow = 0.49, SD = 0.31, Mhigh = 0.56, SD = 0.32). Solid points represent the
overall mean; error bars represent 95% CIs (non-parametric bootstrap). Shaded areas depict the
distribution of individual responses. The dashed line displays the chance level (0.25). (B) Correlations
between cross-situational word learning and self-evaluation for Balanced, Aligned, and Conflict
languages (rs Balanced = 0.9; rs Aligned = 0.59; rs Conflict = 0.52) on Experiment 1. The size of dots
indicates the number of participants that overlap in given coordinates (from 1 to 7).

participants from all languages were able to form clear word-object relationships. It was
surprising to see that participants from the Conflict language, who performed at chance on the
speech segmentation task, were able to form strong word-object relationships, a point to
which we return later.

To explore whether our simultaneous task impacts mapping performance, we compared the
present data with data from a previous experiment that tested only cross-situational word
learning, using the same stimuli and population (Dal Ben et al., 2022). We fitted one
mixed-effects logistic model that had mapping (correct or incorrect) as the outcome variable,
the interaction between experiment (separate or simultaneous task) and language (Balanced,
Aligned, Conflict) as a predictor, and participants as random intercepts10.

10 lme4 syntax: object selection ∼ experiment:language + (1|participant).


Figure 5. Mean number of correct object selections in an experiment testing cross-situational
word learning only (CSWL only; M = 0.65, SD = 0.24; Dal Ben et al., 2022) and in the Balanced
(M = 0.79, SD = 0.28), Aligned (M = 0.86, SD = 0.23), and Conflict (M = 0.69, SD = 0.3) languages
from the present, simultaneous, experiment. Solid points represent the overall mean; error bars
represent 95% CIs (non-parametric bootstrap). Points represent the mean for each participant.
Shaded areas depict the distribution of individual responses. The dashed line displays the chance
level (0.25).
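The figure captions report 95% CIs from a non-parametric bootstrap. As a rough illustration of that idea, the sketch below computes a percentile bootstrap interval for a mean. The percentile method, the number of resamples, the seed, and the example scores are all assumptions made for illustration; the source does not state which bootstrap variant or settings were used.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample participants with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical per-participant proportions correct (not the study's data).
scores = [0.50, 0.75, 0.83, 0.58, 0.92, 0.67, 0.42, 0.75, 1.00, 0.67]
ci_low, ci_high = bootstrap_ci(scores)
print(round(statistics.fmean(scores), 2), round(ci_low, 2), round(ci_high, 2))
```

Because the interval is built from resampled means rather than a normal approximation, it can be asymmetric around the observed mean, which is useful for proportions that sit close to 0 or 1.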

Overall, cross-situational word learning improved for all languages during the parallel task
in comparison to the separate task (Figure 5). The improvement was greatest for participants
from the Aligned language (change in OR = 7.39, 95% CI [2.10, 25.98]), followed by participants
from the Balanced language (change in OR = 3.34, 95% CI [1.37, 5.52]). Although less
pronounced, there was also an improvement for the Conflict language (change in OR = 1.60,
95% CI [0.50, 5.05]), which indicates that participants can benefit from word-object
co-occurrence even when TP and phonotactics point to different word boundaries.


Relationship Between Speech Segmentation and Word Mapping. To explore potential relationships
between speech segmentation and word mapping, we ran Spearman’s correlations between
word and object selections (average scores per participant) for each Language. We found
moderate positive correlations between segmentation and mapping for all Languages (rs Balanced =
0.49; rs Aligned = 0.52; rs Conflict = 0.42; Figure 6A). Overall, participants who were better at
segmentation were also better at mapping. To further explore whether that was true for participants
from the Conflict Language, we performed a median split of segmentation performance
(Mdn = 0.66, IQR = 0.4) and ran Spearman correlation tests for each group separately
(Figure 6B). Participants who successfully segmented the speech (above median) were also
successful in mapping words to objects (rs = 0.46). However, we found no relationship between
segmentation and mapping for those who performed poorly on segmentation (below the
median; rs = 0.003).
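The correlation and median-split analysis described above can be sketched as follows. This is an illustrative implementation using only the standard library, with Spearman's coefficient computed as a Pearson correlation on average ranks. The function names, the choice to place scores equal to the median in the lower group, and the example scores are assumptions; the source does not specify how ties at the median were handled.

```python
import statistics

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def median_split(segmentation, mapping):
    """Correlate segmentation and mapping separately above and below the median."""
    cutoff = statistics.median(segmentation)
    pairs = list(zip(segmentation, mapping))
    above = [p for p in pairs if p[0] > cutoff]
    below = [p for p in pairs if p[0] <= cutoff]  # assumption: ties go below
    return {"above": spearman(*zip(*above)), "below": spearman(*zip(*below))}

# Hypothetical per-participant scores (not the study's data).
segmentation = [0.33, 0.42, 0.50, 0.58, 0.66, 0.75, 0.83, 0.92]
mapping = [0.50, 0.25, 0.50, 0.42, 0.58, 0.67, 0.75, 0.92]
split = median_split(segmentation, mapping)
print(round(spearman(segmentation, mapping), 2), split)
```

Note that a median split halves the sample in each group, so the within-group coefficients are estimated far less precisely than the overall correlation.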

Our design does not inform us about potential learning sequences. Intuitively, strong
speech segmentation skills should lead to strong word mapping, which is supported to some
extent by the positive correlation between word and object selections for participants above
the median in the Conflict language, but not for those below the median. Interestingly,
simulations by Räsänen and Rasilo (2015) favor a simultaneous performance in which speech


Figure 6. (A) Correlations between speech segmentation and mapping for Balanced, Aligned,
and Conflict Languages on Experiment 1 (rs Balanced = 0.49; rs Aligned = 0.52; rs Conflict = 0.42).
The size of dots indicates the number of participants that overlap in each coordinate (from 1 to
6). (B) Correlations between speech segmentation and mapping in the Conflict Language for
participants with speech segmentation above the median (Mdn = 0.66, IQR = 0.4; rs = 0.45) and below
the median (rs = 0.003). The size of dots indicates the number of participants that overlap in each
coordinate (from 1 to 2).

segmentation and mapping feed back into each other, driving performance on both tasks. In this
regard, the absence of a relationship between segmentation and mapping for participants
below the median in the Conflict Language indicates that these performances could be
independent from one another.

Overall, results from the present experiment suggest that not only can adults simultaneously
track conditional probabilities between audio and visual stimuli to segment words from
continuous speech streams and map them to referents under ambiguous learning contexts,
but that both segmentation and mapping improve when a greater set of cues, even from
different modalities, is available (Figures 3 and 5). Such results provide preliminary empirical
evidence for the model of simultaneous segmentation and ambiguous mapping proposed by
Räsänen and Rasilo (2015) and Jones et al. (2010).

Our results also indicate that phonotactic probabilities, or how familiar syllables’ positional
probabilities are in the native language of the participants, also impact such joint performance.


When transitional and phonotactic probabilities worked together to signal word boundaries,
segmentation and mapping improved (Aligned language) in contrast to when the phonotactic
probabilities were balanced among test items (Balanced language). However, the impact of
phonotactics was most pronounced when it conflicted with TP information. In the Conflict
language, overall, participants failed to show a preference for words when compared to
part-words at test (Figure 2). Nevertheless, they were able to map words and objects (Figure 4).
How could this happen?

If we assume that segmentation is a necessary pre-step to cross-situational mapping, then
this result is hard to explain. However, if adults use whatever informative statistics they have at
hand to solve linguistic ambiguity, they would take advantage of both transitional and
phonotactic statistics and word-object co-occurrences in the Aligned and Balanced languages. On
the other hand, in the Conflict language, statistics were not consistent enough to promote
segmentation, but co-occurrences between word syllables and objects were consistent enough to
promote mapping and, to some extent, speech segmentation, even without clear and explicit
word representations. It is worth noting that objects were consistently paired with words only,
and not with part-words. This might have provided some participants with enough information
for speech segmentation. It might also have decreased the influence of statistical cues on
segmentation (both TPs and PPs). Nevertheless, if word-object co-occurrence was the main source
of information for speech segmentation, we should have seen similar levels of segmentation in
all languages.

Moreover, our two-alternative forced-choice test might not have been sensitive enough to
capture the weaker and implicit word representations that might have arisen in the Conflict
language, providing us with only a partial picture of participants’ speech segmentation. Our
two-alternative forced-choice trials contrasted words, which had stronger TPs and weaker phonotactic
probability, with part-words, which had weaker TPs but stronger phonotactic probability. The contrast
between recently acquired TP knowledge and language-specific phonotactic knowledge
learned across the lifespan may have impaired word selection (Finn & Hudson Kam,
2008). With this in mind, we replicate the current experiment, but using an arguably more
sensitive speech segmentation measurement.

Finally, it is worth noting that our careful selection and combination of syllables to create
disyllabic words with varying TP and PP contrasts introduced an important confound to our
study: none of our words shared syllables. As all syllables were unique to a given word,
tracking co-occurrences between individual syllables and objects would be enough to solve the
mapping task, but not the segmentation task. This learning strategy would greatly reduce
mapping complexity: participants could ignore half of the syllables and all linguistic
regularities (i.e., TP and PP). Whereas this strategy may be computationally the simplest, it seems
that, as a group, participants in the Balanced and Aligned languages did track word-level
statistics, as indicated by their segmentation performance. However, when faced with conflicting
linguistic regularities, participants in the Conflict language might have defaulted to this more
simplistic learning strategy and solved the mapping task without relying on word
representations. Importantly, this confound extends to Experiment 2 and we further discuss it in the
General Discussion.

EXPERIMENT 2

In an attempt to capture the potentially nuanced word form knowledge implicitly arising from
experience with the Conflict language, in the current experiment we use a more sensitive word
segmentation test: go/no-go (François et al., 2017). In this test, each item is presented and
evaluated separately, one at a time. By avoiding the contrast between stimuli (i.e., word,
part-word, non-word) with different statistics (TP and phonotactics) at test and by adding a new
stimulus type (i.e., non-words), we aim for a more fine-grained understanding of word
representations in the Conflict language. The mapping test is the same as in Experiment 1.

Method

This experiment was a replication of Experiment 1, but it was fully online due to the COVID-
19 pandemic. Differences in methodology are described below.

Participants. Forty-five adults, all native speakers of Brazilian Portuguese, with no reported
visual or auditory impairment that could interfere with the task, participated. However, 10
participants were excluded from the final analyses because they failed or missed attention
check questions, or reported using their mobile phones or taking notes during the experiment
(see Data Analysis for further details). The final sample consisted of 35 adults (Mage = 23.51,
± 4.01 SD, 22 female). As in Experiment 1, participants were recruited at the official Facebook
group of the Universidade Federal de São Carlos and received no compensation for their
participation. The study was conducted according to the Declaration of Helsinki and the Ethics
Committee of the host university approved the research (#3.085.914).

Stimuli and Design. We used the Conflict language from Experiment 1, with the same
word-object pairs. As a brief reminder, in this Language, words had high TPs (TP = 1; Table 1) and
lower phonotactic probabilities (Mwords = 0.0072), while part-words had lower TPs (TP = 0.5)
and higher phonotactic probabilities (Mpart-words = 0.0085). In addition, we created three
additional non-words with balanced phonotactics by recombining the initial syllables of words (i.e.,
/visu/, /tami/, /rako/; PPs = 0.0080, 0.0074, 0.0069, respectively). Because their syllables never
occurred together in the Language, their TP was zero.
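To make the three TP levels concrete (1 for words, 0.5 for part-words, 0 for non-words), the sketch below computes forward transitional probabilities over a toy syllable stream. Everything about the stream is an assumption for illustration: it is built from only the three test words reported later in this section, in a fixed order chosen so that each word is followed equally often by the other two. It is not the familiarization stream used in the study, and the items `peta` and `nita` are hypothetical.

```python
from collections import Counter

def syllabify(word):
    """Split a CVCV toy item into its two CV syllables."""
    return [word[:2], word[2:]]

def transitional_probabilities(syllables):
    """TP(x -> y) = count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy stream (assumption): three words, no immediate repetitions.
a, b, c = "nipe", "tadi", "mide"
cycle = [a, b, c, a, c, b]      # each word is followed once by each other word
words = cycle * 50 + [a]        # closing word balances the boundary counts
stream = [syl for w in words for syl in syllabify(w)]
tps = transitional_probabilities(stream)

def tp_of(item):
    """TP between the two syllables of a disyllabic item (0 if never adjacent)."""
    first, second = syllabify(item)
    return tps.get((first, second), 0.0)

# Word, a boundary-spanning pair, and a pair of word-initial syllables.
print(tp_of("nipe"), tp_of("peta"), tp_of("nita"))
```

In this toy stream the within-word pair has a TP of 1, the pair spanning a word boundary has a TP of 0.5, and two word-initial syllables that never occur next to each other have a TP of 0, mirroring the three stimulus types.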

A design similar to that of Experiment 1 was used here, with four main differences. First, given
the online nature of the study, before beginning the experimental task, participants were
instructed to move to a quiet room, to turn off any electronic devices (e.g., cellphone, TV),
to wear earphones, and not to take notes during the experiment. Second, the segmentation
test followed a go/no-go structure: test words (i.e., /nipe/, /tadi/, /mide/), part-words (i.e.,
/sute/, /viko/, /bara/), and non-words (i.e., /visu/, /tami/, /rako/) were presented one at a time
and participants were instructed to indicate whether they were or were not words from the
language they had just heard (by pressing “s” or “n”, corresponding to “sim” [yes] or
“não” [no] in Portuguese). Each stimulus was tested 6 times (total of 54 trials). Third, attention
checks were conducted during the familiarization and segmentation test. At each familiarization
block, participants were prompted to answer five simple questions (i.e., “Are you alive?”,
“Are you sleeping?”, “Are you breathing?”, “Are you dead?”, “Are you awake?”). Between
segmentation test trials, attention checks displayed either a Portuguese word or a made-up word
(e.g., “mesa” [table], “drevo”) printed on the screen and participants were prompted to
indicate if the word existed in Portuguese or not. During both familiarization and test, participants
indicated their answers for attention checks by pressing the “s” or “n” keys on the keyboard.
Fourth, at the end of the experiment, we checked for compliance with instructions by asking
participants whether they had used a cellphone or taken notes during the
experiment.

Procedure. The experiment was entirely online, hosted on Pavlovia and programmed using
PsychoPy3 (Bridges et al., 2020). After agreeing to participate, participants were instructed
to avoid distractions (see previous section), answered a questionnaire about their educational
background and language abilities, and then started the experimental task. As in Experiment 1,
they were exposed to three phases: familiarization, segmentation test, and mapping test (same
as Experiment 1). In addition, attention checks (described before) were presented between
familiarization blocks and between trials during the segmentation test.

Data Analysis. We followed analytical steps similar to those of Experiment 1. We first excluded
participants who reported using their mobile phones during the experiment (n = 3) and those
(n = 2) who failed two or more attention checks (out of five questions) during familiarization.
Another five participants were excluded because their reaction times to attention checks in the
familiarization or segmentation tests were greater than 3 SDs from the mean. For the remaining
participants (n = 35), we excluded trials with reaction times greater than 3 SDs away from the
mean (segmentation: 32 trials overall, 1% of the data; mapping: 7 trials overall, 1% of the
data). The final data were entered in mixed-effects logistic regressions. The outcome, predictors,
and random effects for each model are described in the next section.
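The trial-level exclusion rule can be sketched as below. This is a minimal illustration, assuming the mean and SD are computed over all trials passed to the function and that the sample SD is used; the source does not say whether the criterion was applied per participant, per test, or over the pooled data, and the reaction times shown are hypothetical.

```python
import statistics

def exclude_outlier_trials(reaction_times, n_sd=3.0):
    """Drop trials whose RT lies more than `n_sd` SDs away from the mean RT."""
    mean_rt = statistics.fmean(reaction_times)
    sd_rt = statistics.stdev(reaction_times)  # sample SD (n - 1 denominator)
    kept = [rt for rt in reaction_times if abs(rt - mean_rt) <= n_sd * sd_rt]
    return kept, len(reaction_times) - len(kept)

# Hypothetical reaction times in milliseconds (not the study's data).
rts = [500] * 10 + [520] * 10 + [5000]
kept, n_removed = exclude_outlier_trials(rts)
print(len(kept), n_removed)
```

One caveat of this rule is that extreme values inflate the SD they are judged against, so with very few trials a single outlier may fail to exceed the 3 SD threshold.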

Results and Discussion

Speech Segmentation. To analyze speech segmentation, we fitted a mixed-effects logistic
regression with evaluations of words, part-words, and non-words as the outcome variable.
Selection of words and rejections of part-words and non-words were coded as correct
responses. Predictors were the chance level (logit of 0.5) and stimuli type (words, part-words,
non-words); stimuli and participants were random intercepts11.
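The response coding for the go/no-go test can be sketched as follows. The stimuli, the six repetitions per item, and the “s”/“n” response keys come from the Stimuli and Design section; the function names and the simulated participant are hypothetical and only illustrate how a bias toward answering “yes” shows up in per-type accuracy.

```python
# Test items listed in the Stimuli and Design section.
STIMULI = {
    "word": ["nipe", "tadi", "mide"],
    "part-word": ["sute", "viko", "bara"],
    "non-word": ["visu", "tami", "rako"],
}
REPETITIONS = 6  # each stimulus was tested six times

def build_trials():
    """Return the full (unshuffled) list of (item, stimulus_type) trials."""
    return [
        (item, kind)
        for kind, items in STIMULI.items()
        for item in items
        for _ in range(REPETITIONS)
    ]

def is_correct(stimulus_type, response):
    """Accepting a word ("s") or rejecting anything else ("n") counts as correct."""
    expected = "s" if stimulus_type == "word" else "n"
    return response == expected

def accuracy_by_type(trials, responses):
    """Proportion of correct responses for each stimulus type."""
    totals, hits = {}, {}
    for (_, kind), response in zip(trials, responses):
        totals[kind] = totals.get(kind, 0) + 1
        hits[kind] = hits.get(kind, 0) + is_correct(kind, response)
    return {kind: hits[kind] / totals[kind] for kind in totals}

trials = build_trials()
# Hypothetical participant who answers "s" (yes) on every trial.
always_yes = accuracy_by_type(trials, ["s"] * len(trials))
print(len(trials), always_yes)
```

Because only a third of the trials are words, a participant who always answers “yes” is fully correct on words and fully incorrect on the other two types, which is why accuracy is best inspected by stimulus type as well as overall.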

We replicated the results from the Conflict language in Experiment 1. Overall, participants’
performance was at chance level (M = 0.51, SD = 0.15; Figure 7A). The analyses by stimuli type
(Figure 7C) reveal a slight tendency for evaluating words as such (OR = 1.21, 95% CI [0.61,
2.4]), a stronger tendency for correctly rejecting non-words (change in OR = 1.68, 95%
CI [0.67, 4.2]), and a much less accurate judgment when rejecting part-words (change in
OR = 0.41, 95% CI [0.16, 1.03]). As in Experiment 1, there was no correlation between speech
segmentation and self-evaluation (rs Conflict = 0.08).

These results indicate that participants might have tracked both transitional and phonotactic
statistics from familiarization, but used them differently when evaluating stimuli during test. For
instance, they might have relied on TP information when evaluating words (higher TP and
lower phonotactics) and phonotactic information when evaluating part-words (lower TP
and higher phonotactics). Finally, the lack of familiarity with non-words (no TP information),
and the balanced phonotactic statistics, might have generated correct non-word rejections.
Overall, our nuanced results could indicate that the go/no-go procedure is not sensitive
enough to capture implicit word representation arising from speech segmentation of a
language with conflicting statistics, a point we return to in the General Discussion.

Cross-situational Word Learning. To model mapping performance, our mixed-effects logistic
regression had object selection (correct or incorrect) as the outcome variable, chance level
(logit of 0.25) and target stimuli frequency (150 or 300 repetitions) as predictors, and stimuli
and participants as random intercepts12. As in Experiment 1, participants correctly mapped
both high and low frequency words above chance level (Mhigh = 0.56, SDhigh = 0.31; Mlow =
0.49, SDlow = 0.31; Figure 8), with small differences in the likelihood of correctly selecting
high or low frequency word-object pairs (ORhigh = 1.51, 95% CI [0.75, 3.02]; change in

11 lme4 syntax: selection ∼ chance level + stimuli type + (1|stimuli) + (1|participant).
12 lme4 syntax: selection ∼ chance level + target frequency + (1|stimuli) + (1|participant).


Figure 7. (A) Mean number of correct word selections and part-word and non-word rejections on
Experiment 2 (M = 0.51, SD = 0.15). The solid point represents the overall mean; error bars
represent 95% CIs (non-parametric bootstrap). Points represent the mean for each participant.
The shaded area depicts the distribution of individual responses. The dashed line displays the
chance level (0.5). (B) Correlations between segmentation and self-evaluation (rs Conflict = 0.08)
on Experiment 2. The size of dots indicates the number of participants that overlap in given
coordinates (from 1 to 2). (C) Evaluation by stimuli type (word, part-word, non-word). Positive scores
represent correct selection of words (M = 0.54, SD = 0.28) and rejection of part-words (M =
0.35, SD = 0.25) and non-words (M = 0.65, SD = 0.27). Negative scores represent incorrect
rejections of words and selection of part-words and non-words.

ORlow = 0.68, 95% CI [0.35, 1.36]). Again, we found a moderate positive correlation between
mapping and self-evaluation (rs = 0.67; Figure 8).

Relationship Between Speech Segmentation and Word Mapping. As in Experiment 1, we ran
Spearman’s correlation tests between word and object selections (average scores per participant)
to explore potential relationships between speech segmentation and word mapping. We found
a weak positive correlation between segmentation and mapping (rs = 0.32; Figure 9). Again,
overall, participants who were better at segmentation were also better at mapping. Further
exploration by speech segmentation median split (Mdn = 0.5, IQR = 0.24) revealed little
difference between participants above the median (rs = 0.05) and below the median (rs = 0.11),
with no correlation between segmentation and mapping for either group.


Figure 8. (A) Mean number of correct object selections for high (M = 0.56, SD = 0.31) and low
(M = 0.49, SD = 0.31) frequency pairs on Experiment 2. The solid point represents the overall
mean; error bars represent 95% CIs (non-parametric bootstrap). The shaded area depicts the
distribution of individual responses. The dashed line displays the chance level (0.25). (B) Correlations
between mapping and self-evaluation (rs Conflict = 0.67) on Experiment 2. The size of dots indicates
the number of participants that overlap in given coordinates (from 1 to 4).

The current experiment was designed to further evaluate the effects of the conflict between
transitional and phonotactic statistics on simultaneous speech segmentation and
cross-situational word learning. Overall, we replicated Experiment 1: speech segmentation, as
measured by a go/no-go test, was at chance level, but word-object mapping performance was
above chance. Nevertheless, our more sensitive word segmentation test provided some
nuanced information about stimulus representations.

We found that participants were likely to correctly evaluate non-words as such. This
indicates that how participants represented words and part-words was most likely the result of the
interplay between phonotactic and transitional probabilities. For instance, stronger
phonotactics combined with a probabilistic transitional probability (TP = 0.5) led participants to
incorrectly evaluate part-words as words. On the other hand, the weaker phonotactics combined
with deterministic transitional information (TP = 1) prompted only a slight tendency to correctly
evaluate words as such.

As in Experiment 1, speech segmentation performance and self-evaluation indicate that the
conflict between transitional and phonotactic probabilities impaired the formation of clear
word representations, which could have impaired participants’ accuracy when estimating their
knowledge of words from speech. Again, however, despite the absence of clear word
representations, participants were able to map words to objects. Consistent word-object
co-occurrences might have provided sufficient information to promote mapping and some
level of segmentation, despite conflicting phonetic information (Räsänen & Rasilo, 2015).
In addition, as in Experiment 1, syllables were not shared between words. Participants could
have tracked co-occurrences between individual syllables and objects to solve the mapping
task, without relying on any word-level phonetic information. Whereas using this strategy
would allow participants to solve the mapping task, it would not allow them to gain any
word-level information. Thus, if this was the entirety of the explanation, speech segmentation
performance should have been at chance level for words, part-words, and non-words.


Figure 9. (A) Correlation between segmentation and mapping (rs = 0.32) on Experiment 2. (B)
Correlations between speech segmentation and mapping for participants with speech segmentation
above the median (Mdn = 0.5, IQR = 0.24; rs = 0.05) and below the median (rs = 0.11).

However, during the go/no-go test, participants were more likely to correctly select words as
such and to correctly reject non-words, indicating that they tracked word-level statistics to
some degree.

Next, we discuss a possible design to overcome this important confound as well as some of
the limitations of our study. Moreover, we discuss how our preliminary findings broaden our
understanding of statistical learning from multiple cues and prompt further research on the
topic.

GENERAL DISCUSSION

In the present study, we explored whether adults could segment speech streams into words
and map them to objects simultaneously by tracking conditional probabilities across
ambiguous presentations. We also investigated the effects of word-level phonotactics in segmentation
and mapping. Phonotactics were either balanced, aligned, or in conflict with transitional
probabilities. We found that participants were successful at both the segmentation and mapping
tasks when transitional and phonotactic probabilities were either aligned or balanced across
words. In contrast, when transitional and phonotactic probabilities were in conflict, we did not
find evidence for speech segmentation, but we still found evidence for mapping.


Our results offer preliminary support for the idea that a greater set of cues, even in different
modalities, supports speech segmentation and word learning in ambiguous situations (Jones
et al., 2010; Räsänen & Rasilo, 2015). Not only were participants able to segment and map
words simultaneously by tracking conditional probabilities, but their overall performance was
stronger in this simultaneous task in comparison to separate tasks of segmentation and
cross-situational word learning. This adds to the evidence showing that language learners
benefit from combining several sources of linguistic information when learning a new
language (Choi et al., 2018; Johnson, 2016; Saffran, 2020; Smith et al., 2018; Tachakourt,
2023; Yurovsky et al., 2012). This combination might be especially useful when dealing with
ambiguity, as even a partial solution to one linguistic challenge could reduce ambiguity in
other linguistic challenges (for related evidence, see Feldman, Griffiths, et al., 2013; Feldman,
Myers, et al., 2013).

Our design does not provide insights into specific learning strategies used by our
participants. For instance, they could have used an aggregation strategy, gradually segmenting
speech and using the segmented words as anchors for further segmentation and mapping.
Or they could have used hypothesis testing from the start, electing syllable sequences and
testing their co-occurrence with each other and with objects over time. Participants might
have also used a blend of these strategies depending on the level of ambiguity they were
facing (Yurovsky & Frank, 2015). In addition, our selection of unique syllables for each word
introduced an important confound to our study. Participants could have solved the mapping
task even when ignoring half of the syllables and all word-level statistics. In contrast, more
efficient segmentation performance for all languages (except for the Conflict language on
Experiment 1) suggests that participants tracked word-level statistics to some degree.
Furthermore, the positive correlation between segmentation and mapping, found in all languages of
both experiments, suggests that speech segmentation, which relied on word-level statistics,
and cross-situational mapping were in close interaction, potentially feeding back into each other
over time (in line with proposals by Räsänen & Rasilo, 2015).

Interestingly, the conflict between phonotactics and transitional information might have
impaired the formation of clear and explicit word representations, but not the formation
of strong word-object relationships. Whereas this could point to independent processing
of phonetic and audiovisual statistics, it could also be that participants did form clear, but
implicit, word representations that our explicit measurements (either a two-alternative
forced-choice or go/no-go) were not able to capture. For example, despite the conflict
between transitional and phonotactic statistics, participants were still able to consistently
reject non-words during Experiment 2, showing that stimuli with different degrees of
statistical information were treated differently. It is worth reinforcing that we manipulated TPs and
PPs differently. Whereas TPs were determined when designing the experimental stimuli, PPs
were estimated from participants’ native language. This led to differences in strength
between cues that might have had different effects on participants’ processing. For example,
they could have relied more on TPs to find word boundaries and on PPs to form strong word
representations. A more direct way to assess these implicit processes and to overcome the
confound of not repeating syllables across words may be to use EEG measures during the
passive familiarization phase.
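
As a rough illustration of how TPs can be fixed at the design stage, the following minimal Python sketch computes forward transitional probabilities, TP(B|A) = frequency(AB) / frequency(A), over a syllable stream. The syllables, word labels, presentation order, and function name are hypothetical and serve only as an example; they are not the stimuli or code used in this study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TP for each adjacent pair: count(A followed by B) / count(A as first element)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable never starts a pair, so it is excluded from the denominator.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical toy language with three disyllabic "words" (illustrative only).
words = {"A": ["ba", "du"], "B": ["ki", "to"], "C": ["lu", "me"]}
order = "ABCACBABCBAC"  # assumed presentation order, no immediate repetitions
stream = [syllable for label in order for syllable in words[label]]

tps = transitional_probabilities(stream)
within_word = {tuple(w) for w in words.values()}
for pair, tp in sorted(tps.items()):
    kind = "within-word" if pair in within_word else "between-word"
    print(pair, round(tp, 2), kind)
```

In this toy stream, every within-word transition has a TP of 1.0, whereas every transition that spans a word boundary has a lower TP, which is the contrast that TP-based segmentation is assumed to exploit.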

Recording neural activity during familiarization could inform us about how the alignment
or conflict between TP and PP is processed and what happens when these statistics are
violated (e.g., Elmer et al., 2021; François et al., 2017). In addition, the use of neural
entrainment analysis could provide direct evidence as to whether participants track word-level statistics
(disyllabic words) or individual syllables (e.g., Batterink & Paller, 2017; Choi et al., 2020). In
an ongoing EEG study in one of our labs, we are measuring (a) whether participants will show
similar ERPs to violations of transitional and phonotactic information presented in the
Balanced and in the Conflict languages and (b) the temporal entrainment of their neural activity
during familiarization.

Another invaluable source of information on the learning mechanisms involved in
simultaneous speech segmentation and mapping is the cognitive processes underlying such
performance. For example, auditory and visual memory have been shown to predict
cross-situational word learning (Vlach & DeBrock, 2017; Vlach & Johnson, 2013). Differences in
attention have also been found to impact statistical learning (Smith & Yu, 2013; Yurovsky
et al., 2013). Future research could measure these and other cognitive processes to better
understand their role in statistical language learning.

Our study was exploratory in nature. Building on our promising initial findings, future
replications should put our findings to the test. They could also address some of the
shortcomings of the present investigation. Future investigations could, for example, manipulate
both TPs and PPs in a similar way, leading to a finer control of their contrast. Both cues
could be defined experimentally (e.g., Benitez & Saffran, 2021), or estimated directly from
participants’ natural language. The latter estimation could lead to the design of natural
continuous speech (in line with Hay et al., 2011) that comprises the variability in transitional
and phonotactic probabilities that learners face in the wild, increasing the ecological validity
of statistical learning findings (cf. Smith et al., 2014). Future research could also have
participants from more heterogeneous backgrounds, for example, by recruiting participants of
different ages, in different countries, with different socioeconomic status. We tested fairly
homogeneous samples of young college students from a single language background. The
trends we found may not generalize to other populations (Simons et al., 2017). Also, it could
be that the statistical learning mechanisms involved in this simultaneous task have
different roles across development (Choi et al., 2018; Danielson et al., 2017; Smith et al.,
2018). Future research could investigate simultaneous statistical language learning across
development to bridge the gaps between young adults, infants, and older adults. Finally,
although highly dynamic, our task comprises only a small sampling of the challenges
(i.e., segmentation and mapping) and statistics (i.e., conditional probabilities) available for
language learners in natural environments. Future studies could improve ecological validity
by, for example, combining statistical, prosodic, and semantic information (Hay et al., 2011;
Karaman & Hay, 2018), or diving into natural environments (Bogaerts et al., 2022; Yu et al.,
2021).

Learning languages is difficult. To overcome many linguistic challenges, learners can rely
on several cues. Here we provide preliminary evidence that adults can track conditional
probabilities to simultaneously find words in continuous speech and map them to objects across
ambiguous situations. We also show that the level of pre-experimental familiarity with words
can impact their representation. By doing so, we contribute to a more nuanced understanding
of how statistical cues interact to promote language learning.

ACKNOWLEDGMENTS

This work was supported by grants from FAPESP (#2015/26389-7, #2018/04226-7) and CAPES
(#001) to RDB; FAPESP (#2018/18748-5) to ITP; from INCT-ECCE (National Institute on
Cognition, Behavior and Teaching; CNPq #573972/2008-7, #465686/2014-1, FAPESP
#2008/57705-8, #2014/50909-8) to DHS; and from the NICHD (#R01HD083312) to JFH.
RDB is now at Ambrose University.

AUTHOR CONTRIBUTIONS

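RDB: Conceptualization, Methodology, Software, Investigation, Visualization, Data Formal
Analysis, Writing – Original Draft; ITP: Investigation, Data Formal Analysis, Writing – Original
Draft; DHS & JFH: Conceptualization, Writing – Review & Editing, Supervision, Project
Administration, Funding Acquisition.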

DATA AVAILABILITY STATEMENT

The materials, code, and data from this study are openly available on Open Science
Framework at https://osf.io/rs2bm/.

REFERENCES

Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of
conditional probability statistics by 8-month-old infants.
Psychological Science, 9(4), 321–324.
https://doi.org/10.1111/1467-9280.00063

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random
effects structure for confirmatory hypothesis testing: Keep it
maximal. Journal of Memory and Language, 68(3), 255–278.
https://doi.org/10.1016/j.jml.2012.11.001, PubMed: 24403724

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear
mixed-effects models using lme4. Journal of Statistical Software,
67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Batterink, L. J., & Paller, K. A. (2017). Online neural monitoring of
statistical learning. Cortex, 90, 31–45.
https://doi.org/10.1016/j.cortex.2017.02.004, PubMed: 28324696

Benitez, V. L., & Saffran, J. R. (2021). Two for the price of one:
Concurrent learning of words and phonotactic regularities from
continuous speech. PLoS ONE, 16(6), Article e0253039.
https://doi.org/10.1371/journal.pone.0253039, PubMed: 34115799

Black, A., & Bergmann, C. (2017). Quantifying infants’ statistical
word segmentation: A meta-analysis. In G. Gunzelmann, A.
Howes, T. Tenbrink, & E. J. Davelaar (Eds.), Proceedings of the
39th Annual Conference of the Cognitive Science Society
(pp. 124–129). Cognitive Science Society.

Bogaerts, L., Siegelman, N., Christiansen, M. H., & Frost, R. (2022).
Is there such a thing as a ‘good statistical learner’? Trends in
Cognitive Sciences, 26(1), 25–37.
https://doi.org/10.1016/j.tics.2021.10.012, PubMed: 34810076

Brent, M. R., & Siskind, J. M. (2001). The role of exposure to
isolated words in early vocabulary development. Cognition,
81(2), B33–B44.
https://doi.org/10.1016/S0010-0277(01)00122-6, PubMed: 11376642

Bridges, D., Pitiot, A., MacAskill, M. R., & Peirce, J. W. (2020). The
timing mega-study: Comparing a range of experiment generators,
both lab-based and online. PeerJ, 8, Article e9414.
https://doi.org/10.7717/peerj.9414, PubMed: 33005482

Cannistraci, R. A., Dal Ben, R., Karaman, F., Esfahani, S. P., & Hay,
J. F. (2019). Statistical learning approaches to studying language
development. In J. S. Horst & J. von Koss Torkildsen (Eds.),
International handbook of language acquisition (pp. 51–75).
Routledge. https://doi.org/10.4324/9781315110622-4

Choi, D., Batterink, L. J., Black, A. K., Paller, K. A., & Werker, J. F.
(2020). Preverbal infants discover statistical word patterns at
similar rates as adults: Evidence from neural entrainment.
Psychological Science, 31(9), 1161–1173.
https://doi.org/10.1177/0956797620933237, PubMed: 32865487

Choi, D., Black, A. K., & Werker, J. F. (2018). Cascading and
multisensory influences on speech perception development.
Mind, Brain, and Education, 12(4), 212–223.
https://doi.org/10.1111/mbe.12162

Clerkin, E. M., Hart, E., Rehg, J. M., Yu, C., & Smith, L. B.
(2017). Real-world visual statistics and infants’ first-learned
object names. Philosophical Transactions of the Royal Society
of London, Series B: Biological Sciences, 372(1711), Article
20160055. https://doi.org/10.1098/rstb.2016.0055, PubMed: 27872373

Cox, C. M. M., Keren-Portnoy, T., Roepstorff, A., & Fusaroli, R.
(2022). A Bayesian meta-analysis of infants’ ability to perceive
audio–visual congruence for speech. Infancy, 27(1), 67–96.
https://doi.org/10.1111/infa.12436, PubMed: 34542230

Cristia, A. (2018). Can infants learn phonology in the lab? A
meta-analytic answer. Cognition, 170, 312–327.
https://doi.org/10.1016/j.cognition.2017.09.016, PubMed: 29102857

Cunillera, T., Càmara, E., Laine, M., & Rodríguez-Fornells, A.
(2010). Speech segmentation is facilitated by visual cues.
Quarterly Journal of Experimental Psychology, 63(2), 260–274.
https://doi.org/10.1080/17470210902888809, PubMed: 19526435

Cunillera, T., & Guilera, G. (2018). Twenty years of statistical learning:
From language, back to machine learning. Scientometrics, 117(1),
1–8. https://doi.org/10.1007/s11192-018-2856-x

Cunillera, T., Laine, M., Càmara, E., & Rodríguez-Fornells, A. (2010).
Bridging the gap between speech segmentation and word-to-world
mappings: Evidence from an audiovisual statistical learning
task. Journal of Memory and Language, 63(3), 295–305.
https://doi.org/10.1016/j.jml.2010.05.003

Dal Ben, R., Souza, D. d. H., & Hay, J. F. (2019). Cross-situational
word learning: Systematic review and meta-analysis. Manuscript
in preparation. https://doi.org/10.17605/OSF.IO/GU9RB

Dal Ben, R., Souza, D. d. H., & Hay, J. F. (2021). When statistics
collide: The use of transitional and phonotactic probability cues
to word boundaries. Memory & Cognition, 49(7), 1300–1310.
https://doi.org/10.3758/s13421-021-01163-4, PubMed: 33751490

Dal Ben, R., Souza, D. d. H., & Hay, J. F. (2022). Combining
statistics: The role of phonotactics on cross-situational word learning.
Psicologia: Reflexao e Critica, 35(1), Article 30.
https://doi.org/10.1186/s41155-022-00234-y, PubMed: 36169750

Danielson, D. K., Bruderer, A. G., Kandhadai, P., Vatikiotis-Bateson,
E., & Werker, J. F. (2017). The organization and reorganization of
audiovisual speech perception in the first year of life. Cognitive
Development, 42, 37–48.
https://doi.org/10.1016/j.cogdev.2017.02.004, PubMed: 28970650

Dutoit, T., Pagel, V., Pierret, N., Bataille, F., & van der Vrecken, O.
(1996). The MBROLA project: Towards a set of high quality
speech synthesizers free of use for non commercial purposes.
In Proceedings of Fourth International Conference on Spoken
Language Processing (Vol. 3, pp. 1393–1396). IEEE.
https://doi.org/10.1109/ICSLP.1996.607874

Elmer, S., Valizadeh, S. A., Cunillera, T., & Rodriguez-Fornells, A.
(2021). Statistical learning and prosodic bootstrapping
differentially affect neural synchronization during speech segmentation.
NeuroImage, 235, Article 118051.
https://doi.org/10.1016/j.neuroimage.2021.118051, PubMed: 33848624

Estivalet, G. L., & Meunier, F. (2015). The Brazilian Portuguese
Lexicon: An instrument for psycholinguistic research. PLoS ONE,
10(12), Article e0144016.
https://doi.org/10.1371/journal.pone.0144016, PubMed: 26630138

Feldman, N. H., Griffiths, T. L., Goldwater, S., & Morgan, J. L.
(2013). A role for the developing lexicon in phonetic category
acquisition. Psychological Review, 120(4), 751–778.
https://doi.org/10.1037/a0034245, PubMed: 24219848

Feldman, N. H., Myers, E. B., White, K. S., Griffiths, T. L., &
Morgan, J. L. (2013). Word-level information influences phonetic
learning in adults and infants. Cognition, 127(3), 427–438.
https://doi.org/10.1016/j.cognition.2013.02.007, PubMed: 23562941

Finn, A. S., & Hudson Kam, C. L. (2008). The curse of knowledge:
First language knowledge impairs adult learners’ use of novel
statistics for word segmentation. Cognition, 108(2), 477–499.
https://doi.org/10.1016/j.cognition.2008.04.002, PubMed: 18533142

François, C., Cunillera, T., Garcia, E., Laine, M., & Rodriguez-Fornells,
A. (2017). Neurophysiological evidence for the interplay
of speech segmentation and word-referent mapping during
novel word learning. Neuropsychologia, 98, 56–67.
https://doi.org/10.1016/j.neuropsychologia.2016.10.006, PubMed: 27732869

Frank, M. C., Mansinghka, V., Gibson, E., & Tenenbaum, J. B.
(2007). Word segmentation as word learning: Integrating stress
and meaning with distributional cues. In H. Caunt-Nulton, S.
Kulatilake, & I. Woo (Eds.), Proceedings of the 31st Annual
Boston University Conference on Language Development
(pp. 218–229). Boston University.

Graf Estes, K. (2009). From tracking statistics to learning words:
Statistical learning and lexical acquisition. Linguistics and Language
Compass, 3(6), 1379–1389.
https://doi.org/10.1111/j.1749-818X.2009.00164.x

Graf Estes, K., Edwards, J., & Saffran, J. R. (2011). Phonotactic
constraints on infant word learning. Infancy, 16(2), 180–197.
https://doi.org/10.1111/j.1532-7078.2010.00046.x, PubMed: 21297877

Graf Estes, K., Evans, J. L., Alibali, M. W., & Saffran, J. R. (2007).
Can infants map meaning to newly segmented words? Statistical
segmentation and word learning. Psychological Science, 18(3),
254–260. https://doi.org/10.1111/j.1467-9280.2007.01885.x,
PubMed: 17444923

Hay, J. F., Pelucchi, B., Graf Estes, K., & Saffran, J. R. (2011). Linking
sounds to meanings: Infant statistical learning in a natural
language. Cognitive Psychology, 63(2), 93–106.
https://doi.org/10.1016/j.cogpsych.2011.06.002, PubMed: 21762650

Hay, J. F., & Saffran, J. R. (2012). Rhythmic grouping biases constrain
infant statistical learning. Infancy, 17(6), 610–641.
https://doi.org/10.1111/j.1532-7078.2011.00110.x, PubMed: 23730217

Horst, J. S., & Hout, M. C. (2016). The Novel Object and Unusual
Name (NOUN) Database: A collection of novel images for use
in experimental research. Behavior Research Methods, 48(4),
1393–1409. https://doi.org/10.3758/s13428-015-0647-3,
PubMed: 26424438

Johnson, E. K. (2016). Constructing a proto-lexicon: An integrative
view of infant language development. Annual Review of
Linguistics, 2, 391–412.
https://doi.org/10.1146/annurev-linguistics-011415-040616

Johnson, E. K., Seidl, A., & Tyler, M. D. (2014). The edge factor in
early word segmentation: Utterance-level prosody enables word
form extraction by 6-month-olds. PLoS ONE, 9(1), Article
e83546. https://doi.org/10.1371/journal.pone.0083546,
PubMed: 24421892

Johnson, E. K., & Tyler, M. D. (2010). Testing the limits of statistical
learning for word segmentation. Developmental Science, 13(2),
339–345. https://doi.org/10.1111/j.1467-7687.2009.00886.x,
PubMed: 20136930

Jones, B. K., Johnson, M., & Frank, M. C. (2010). Learning words
and their meanings from unsegmented child-directed speech.
In Human Language Technologies: The Annual Conference of
the North American Chapter of the Association for Computational
Linguistics (pp. 501–509). Association for Computational
Linguistics. https://aclanthology.org/N10-1074/

Karaman, F., & Hay, J. F. (2018). The longevity of statistical learning:
When infant memory decays, isolated words come to the rescue.
Journal of Experimental Psychology: Learning, Memory, and
Cognition, 44(2), 221–232. https://doi.org/10.1037/xlm0000448,
PubMed: 28782968

Krogh, L., Vlach, H. A., & Johnson, S. P. (2013). Statistical learning
across development: Flexible yet constrained. Frontiers in
Psychology, 3, Article 598.
https://doi.org/10.3389/fpsyg.2012.00598, PubMed: 23430452

Lany, J., & Saffran, J. R. (2010). From statistics to meaning: Infants’
acquisition of lexical categories. Psychological Science, 21(2),
284–291. https://doi.org/10.1177/0956797609358570,
PubMed: 20424058

Mattys, S. L., & Jusczyk, P. W. (2001). Do infants segment words or
recurring contiguous patterns? Journal of Experimental
Psychology: Human Perception and Performance, 27(3), 644–655.
https://doi.org/10.1037/0096-1523.27.3.644, PubMed: 11424651

Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999).
Phonotactic and prosodic effects on word segmentation in
infants. Cognitive Psychology, 38(4), 465–494.
https://doi.org/10.1006/cogp.1999.0721, PubMed: 10334878

Mersad, K., & Nazzi, T. (2011). Transitional probabilities and
positional frequency phonotactics in a hierarchical model of speech
segmentation. Memory & Cognition, 39(6), 1085–1093.
https://doi.org/10.3758/s13421-011-0074-3, PubMed: 21312017

Mirman, D., Magnuson, J. S., Graf Estes, K., & Dixon, J. A. (2008).
The link between statistical segmentation and word learning in
adults. Cognition, 108(1), 271–280.
https://doi.org/10.1016/j.cognition.2008.02.003, PubMed: 18355803

Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R.,
Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2:
Experiments in behavior made easy. Behavior Research Methods,
51(1), 195–203. https://doi.org/10.3758/s13428-018-01193-y,
PubMed: 30734206

Quine, W. V. O. (1960). Word and object. MIT Press.

R Core Team. (2021). R: A language and environment for statistical
computing. R Foundation for Statistical Computing.

Räsänen, O., & Rasilo, H. (2015). A joint model of word
segmentation and meaning acquisition through cross-situational learning.
Psychological Review, 122(4), 792–829.
https://doi.org/10.1037/a0039702, PubMed: 26437151

Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and
language acquisition. Wiley Interdisciplinary Reviews: Cognitive
Science, 1(6), 906–914. https://doi.org/10.1002/wcs.78,
PubMed: 21666883

Saffran, J. R. (2020). Statistical language learning in infancy. Child
Development Perspectives, 14(1), 49–54.
https://doi.org/10.1111/cdep.12355, PubMed: 33912228

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical
learning by 8-month-old infants. Science, 274(5294), 1926–1928.
https://doi.org/10.1126/science.274.5294.1926, PubMed: 8943209

Scheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why
hypothesis testers should spend less time testing hypotheses.
Perspectives on Psychological Science, 16(4), 744–755.
https://doi.org/10.1177/1745691620966795, PubMed: 33326363

Shoaib, A., Wang, T., Hay, J. F., & Lany, J. (2018). Do infants learn
words from statistics? Evidence from English-learning infants
hearing Italian. Cognitive Science, 42(8), 3083–3099.
https://doi.org/10.1111/cogs.12673, PubMed: 30136301

Shukla, M., White, K. S., & Aslin, R. N. (2011). Prosody guides the
rapid mapping of auditory word forms onto visual objects in
6-mo-old infants. Proceedings of the National Academy of
Sciences of the United States of America, 108(15), 6038–6043.
https://doi.org/10.1073/pnas.1017617108, PubMed: 21444800

Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on
Generality (COG): A proposed addition to all empirical papers.
Perspectives on Psychological Science, 12(6), 1123–1128.
https://doi.org/10.1177/1745691617708630, PubMed: 28853993

Smith, L. B., Jayaraman, S., Clerkin, E., & Yu, C. (2018). The
developing infant creates a curriculum for statistical learning. Trends in
Cognitive Sciences, 22(4), 325–336.
https://doi.org/10.1016/j.tics.2018.02.004, PubMed: 29519675

Smith, L. B., Suanda, S. H., & Yu, C. (2014). The unrealized promise
of infant statistical word-referent learning. Trends in Cognitive
Sciences, 18(5), 251–258.
https://doi.org/10.1016/j.tics.2014.02.007, PubMed: 24637154

Smith, L. B., & Yu, C. (2008). Infants rapidly learn word-referent
mappings via cross-situational statistics. Cognition, 106(3),
1558–1568. https://doi.org/10.1016/j.cognition.2007.06.010,
PubMed: 17692305

Smith, L. B., & Yu, C. (2013). Visual attention is not enough:
Individual differences in statistical word-referent learning in infants.
Language Learning and Development, 9(1), 25–49.
https://doi.org/10.1080/15475441.2012.707104, PubMed: 24403867

Steber, S., & Rossi, S. (2020). So young, yet so mature?
Electrophysiological and vascular correlates of phonotactic processing in
18-month-olds. Developmental Cognitive Neuroscience, 43,
Article 100784. https://doi.org/10.1016/j.dcn.2020.100784,
PubMed: 32510350

Storkel, H. L., Bontempo, D. E., Aschenbrenner, A. J., Maekawa, J.,
& Lee, S.-Y. (2013). The effect of incremental changes in
phonotactic probability and neighborhood density on word learning by
preschool children. Journal of Speech, Language, and Hearing
Research, 56(5), 1689–1700.
https://doi.org/10.1044/1092-4388(2013/12-0245), PubMed: 23882005

Sundara, M., Zhou, Z. L., Breiss, C., Katsuda, H., & Steffman, J.
(2022). Infants’ developing sensitivity to native language
phonotactics: A meta-analysis. Cognition, 221, Article 104993.
https://doi.org/10.1016/j.cognition.2021.104993, PubMed: 34953268

Swingley, D. (1999). Conditional probability and word discovery: A
corpus analysis of speech to infants. In M. Hahn & S. C. Stoness
(Eds.), Proceedings of the 21st Annual Conference of the
Cognitive Science Society (pp. 724–729). Psychology Press.
https://doi.org/10.4324/9781410603494-131

Tachakourt, Y. (2023). Simultaneous speech segmentation and
cross-situational statistical learning in monolinguals, bilinguals,
and multilinguals. Journal of Applied Language and Cultural
Studies, 6(1), 110–134.
https://revues.imist.ma/index.php/JALCS/article/view/36374/18533

Thiessen, E. D. (2010). Effects of visual information on adults’ and
infants’ auditory statistical learning. Cognitive Science, 34(6),
1093–1106. https://doi.org/10.1111/j.1551-6709.2010.01118.x,
PubMed: 21564244

Vitevitch, M. S., & Luce, P. A. (2004). A Web-based interface to
calculate phonotactic probability for words and nonwords in
English. Behavior Research Methods, Instruments, & Computers,
36(3), 481–487. https://doi.org/10.3758/BF03195594,
PubMed: 15641436

Vlach, H. A., & DeBrock, C. A. (2017). Remember dax? Relations
between children’s cross-situational word learning, memory, and
language abilities. Journal of Memory and Language, 93,
217–230. https://doi.org/10.1016/j.jml.2016.10.001,
PubMed: 28503024

Vlach, H. A., & Johnson, S. P. (2013). Memory constraints on
infants’ cross-situational statistical learning. Cognition, 127(3),
375–382. https://doi.org/10.1016/j.cognition.2013.02.015,
PubMed: 23545387

Yang, C. D. (2004). Universal Grammar, statistics or both? Trends in
Cognitive Sciences, 8(10), 451–456.
https://doi.org/10.1016/j.tics.2004.08.006, PubMed: 15450509

Yu, C., & Smith, L. B. (2007). Rapid word learning under
uncertainty via cross-situational statistics. Psychological Science,
18(5), 414–420.
https://doi.org/10.1111/j.1467-9280.2007.01915.x, PubMed: 17576281

Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s
view redefines the problem of referential uncertainty in early
word learning. Proceedings of the National Academy of Sciences
of the United States of America, 118(52), Article e2107019118.
https://doi.org/10.1073/pnas.2107019118, PubMed: 34933998

Yurovsky, D., & Frank, M. C. (2015). An integrative account of
constraints on cross-situational learning. Cognition, 145, 53–62.
https://doi.org/10.1016/j.cognition.2015.07.013, PubMed: 26302052

Yurovsky, D., Yu, C., & Smith, L. B. (2012). Statistical speech
segmentation and word learning in parallel: Scaffolding from
child-directed speech. Frontiers in Psychology, 3, Article 374.
https://doi.org/10.3389/fpsyg.2012.00374, PubMed: 23162487

Yurovsky, D., Yu, C., & Smith, L. B. (2013). Competitive processes
in cross-situational word learning. Cognitive Science, 37(5),
891–921. https://doi.org/10.1111/cogs.12035, PubMed: 23607610