Transactions of the Association for Computational Linguistics, 2 (2014) 93–104. Action Editor: Stefan Riezler.
Submitted 12/2013; Published 2/2014. © 2014 Association for Computational Linguistics.
Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars

Benjamin Börschinger 1,2 and Mark Johnson 1,3
1 Department of Computing, Macquarie University, Sydney, Australia
2 Department of Computational Linguistics, Heidelberg University, Heidelberg, Germany
3 Santa Fe Institute, Santa Fe, USA
{benjamin.borschinger|mark.johnson}@mq.edu.au

Abstract

Stress has long been established as a major cue in word segmentation for English infants. We show that enabling a current state-of-the-art Bayesian word segmentation model to take advantage of stress cues noticeably improves its performance. We find that the improvements range from 10% to 4%, depending on both the use of phonotactic cues and, to a lesser extent, the amount of evidence available to the learner. We also find that, in particular early on, stress cues are much more useful for our model than phonotactic cues by themselves, consistent with the finding that children do seem to use stress cues before they use phonotactic cues. Finally, we study how the model's knowledge about stress patterns evolves over time. We not only find that our model correctly acquires the most frequent patterns relatively quickly but also that the Unique Stress Constraint that is at the heart of a previously proposed model does not need to be built in but can be acquired jointly with word segmentation.

1 Introduction

Among the first tasks a child language learner has to solve is picking out words from the fluent speech that constitutes its linguistic input.[1] For English, stress has long been claimed to be a useful cue in infant word segmentation (Jusczyk et al., 1993; Jusczyk et al., 1999b), following the demonstration of its effectiveness in adult speech processing (Cutler et al., 1986). Several studies have investigated the role of stress in word segmentation using computational models, using both neural network and "algebraic" (as opposed to "statistical") approaches (Christiansen et al., 1998; Yang, 2004; Lignos and Yang, 2010; Lignos, 2011; Lignos, 2012). Bayesian models of word segmentation (Brent, 1999; Goldwater, 2007), however, have until recently completely ignored stress. The sole exception in this respect is Doyle and Levy (2013), who added stress cues to the Bigram model (Goldwater et al., 2009), demonstrating that this leads to an improvement in segmentation performance. In this paper, we extend their work and show how to integrate stress cues into the flexible Adaptor Grammar framework (Johnson et al., 2007). This allows us both to start from a stronger baseline model and to investigate how the role of stress cues interacts with other aspects of the model. In particular, we find that phonotactic cues to word-boundaries interact with stress cues, indicating synergistic effects for small inputs and partial redundancy for larger inputs. Overall, we find that stress cues add roughly 6% token f-score to a model that does not account for phonotactics and 4% to a model that already incorporates phonotactics. Relatedly, and in line with the finding that stress cues are used by infants before phonotactic cues (Jusczyk et al., 1999a), we observe that phonotactic cues require more input than stress cues to be used efficiently. A closer look at the knowledge acquired by our models shows that the Unique Stress Constraint of Yang (2004) can be acquired jointly with segmenting the input instead of having to be pre-specified; and that our models correctly identify the predominant stress pattern of the input but underestimate the frequency of iambic words, which have been found to be missegmented by infant learners.

[1] The datasets and software to replicate our experiments are available from http://web.science.mq.edu.au/~bborschi/
The outline of the paper is as follows. In Section 2 we review prior work. In Section 3 we introduce our own models. In Section 4 we explain our experimental evaluation and its results. Section 5 discusses our findings, and Section 6 concludes and provides some suggestions for future research.

2 Background and related work

Lexical stress is the "accentuation of syllables within words" (Cutler, 2005) and has long been argued to play an important role in adult word recognition. Following Cutler and Carter (1987)'s observation that stressed syllables tend to occur at the beginnings of words in English, Jusczyk et al. (1993) investigated whether infants acquiring English take advantage of this fact. Their study demonstrated that this is indeed the case for 9 month olds, although they found no indication of using stressed syllables as cues for word boundaries in 6 month olds. Their findings have been replicated and extended in subsequent work (Jusczyk et al., 1999b; Thiessen and Saffran, 2003; Curtin et al., 2005; Thiessen and Saffran, 2007). A brief summary of the key findings is as follows: English infants treat stressed syllables as cues for the beginnings of words from roughly 7 months of age, suggesting that the role played by stress needs to be acquired, and that this requires antecedent segmentation by non-stress-based means (Thiessen and Saffran, 2007). They also exhibit a preference for low-pass filtered stress-initial words from this age, suggesting that it is indeed stress and not other phonetic or phonotactic properties that are treated as a cue for word-beginnings (Jusczyk et al., 1993). In fact, phonotactic cues seem to be used later than stress cues (Jusczyk et al., 1999a) and seem to be outweighed by stress cues (Mattys and Jusczyk, 2000).

The earliest computational model for word segmentation incorporating stress cues we are aware of is the recurrent network model of Christiansen et al. (1998) and Christiansen and Curtin (1999). They only reported a word-token f-score of 44% (roughly, segmentation accuracy: see Section 4), which is considerably below the performance of subsequent models, making a direct comparison complicated. Yang (2004) introduced a simple incremental algorithm that relies on stress by embodying a Unique Stress Constraint (USC) that allows at most a single stressed syllable per word. On pre-syllabified child directed speech, he reported a word token f-score of 85.6% for a non-statistical algorithm that exploits the USC. While the USC has been argued to be near-to-universal and follows from the "culminative function of stress" (Fromkin, 2001; Cutler, 2005), the high score Yang reported crucially depends on every word token carrying stress, including function words. More recently, Lignos (2010, 2011, 2012) further explored Yang's original algorithm, taking into account that function words should not be assumed to possess lexical stress cues. While his scores are in line with those reported by Yang, the importance of stress for this learner was more modest, providing a gain of around 2.5% (Lignos, 2011). Also, the Yang/Lignos learner is unable to acquire knowledge about the role stress plays in the language, e.g. that stress tends to fall on particular positions within words.

Doyle and Levy (2013) extend the Bigram model of Goldwater et al. (2009) by adding stress-templates to the lexical generator. A stress-template indicates how many syllables the word has, and which of these syllables (if any) are stressed. This allows the model to acquire knowledge about the stress patterns of its input by assigning different probabilities to the different stress-templates. However, Doyle and Levy (2013) do not directly examine the probabilities assigned to the stress-templates; they only report that their model does slightly prefer stress-initial words over the baseline model, by calculating the fraction of stress-initial word types in the output segmentations of their models. They also demonstrate that stress cues do indeed aid segmentation, although their reported gain of 1% in token f-score is even smaller than that reported by Lignos (2011). Our own approach differs from theirs in assuming phonemic rather than pre-syllabified input (although our model could, trivially, be run on syllabified input as well) and makes use of Adaptor Grammars instead of the Goldwater et al. (2009) Bigram model, providing us with a flexible framework for exploring the usefulness of stress in different models.
Adaptor Grammar (Johnson et al., 2007) is a grammar-based formalism for specifying non-parametric hierarchical models. Previous work explored the usefulness of, for example, syllable-structure (Johnson, 2008b; Johnson and Goldwater, 2009) or morphology (Johnson, 2008b; Johnson, 2008a) in word segmentation. The closest work to our own is Johnson and Demuth (2010), who investigate the usefulness of tones for Mandarin phonemic segmentation. Their way of adding tones to a model of word segmentation is very similar to our way of incorporating stress.

3 Models

We give an intuitive description of the mathematical background of Adaptor Grammars in 3.1, referring the reader to Johnson et al. (2007) for technical details. The models we examine are derived from the collocational model of Johnson and Goldwater (2009) by varying three parameters, resulting in 6 models: two baselines that do not take advantage of stress cues and either do or do not use phonotactics, as described in Section 3.2; and four stress models that differ with respect to the use of phonotactics, and as to whether they embody the Unique Stress Constraint introduced by Yang (2004). We describe these models in Section 3.3.

3.1 Adaptor Grammars

Briefly, an Adaptor Grammar (AG) can be seen as a probabilistic context-free grammar (PCFG) with a special set of adapted non-terminals. We use underlining to distinguish adapted non-terminals (X) from non-adapted non-terminals (Y). The distribution for each adapted non-terminal X is drawn from a Pitman-Yor Process which takes as its base-distribution the tree-distribution over trees rooted in X as defined by the PCFG. As an effect, each adapted non-terminal can be seen as having associated with it a cache of previously-generated subtrees that can be reused without having to be regenerated using the individual PCFG rules. This allows AGs to learn reusable sub-trees such as words, sequences of words, or smaller units such as Onsets and Codas. Thus, while ordinary PCFGs have a finite number of parameters (one probability for each rule), Adaptor Grammars in addition have a parameter for every possible complete tree rooted in any of its adapted non-terminals, leading to a potentially infinite number of such parameters. The Pitman-Yor Process induces a rich-get-richer dynamics, biasing the model towards identifying a small set of units that can be reused as often as possible. In the case of word segmentation, the model will try to identify as compact a lexicon as possible to segment the unsegmented input.
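To make the rich-get-richer caching dynamics of the Pitman-Yor Process concrete, the following sketch computes the predictive probability of a subtree at an adapted non-terminal. This is our own illustration under a simplifying assumption (all copies of a subtree are seated at a single table in the underlying Chinese-restaurant representation); it is not the sampler used in the paper.

    from collections import Counter

    def pyp_predictive(tree, cache, base_prob, alpha=1.0, d=0.5):
        # Probability of generating `tree` at an adapted non-terminal with
        # concentration alpha and discount d, given `cache`, a Counter of
        # previously generated subtrees (one table per distinct subtree,
        # a simplification).
        n = sum(cache.values())   # total number of cached subtree tokens
        t = len(cache)            # number of distinct cached subtrees
        reuse = max(cache[tree] - d, 0.0) / (n + alpha)
        regenerate = (alpha + d * t) / (n + alpha) * base_prob(tree)
        return reuse + regenerate

Reusing a frequently cached subtree, such as a common word, costs roughly its relative frequency, whereas a novel tree has to be rebuilt rule by rule under the typically much smaller base probability; this is what biases the model towards a compact, heavily reused lexicon.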
3.2 Baseline models

Our starting point is the state-of-the-art AG model for word segmentation, Johnson and Goldwater (2009)'s colloc3-syll model, reproduced in Figure 1.[2] The model assumes that words are grouped into larger collocational units that themselves can be grouped into even larger collocational units. This accounts for the fact that in natural language, there are strong word-to-word dependencies that need to be accounted for if severe undersegmentations of the form "isinthe" are to be avoided (Goldwater, 2007; Johnson and Goldwater, 2009; Börschinger et al., 2012). It also uses a language-independent form of syllable structure to constrain the space of possible words. Finally, this model can learn word-initial onsets and word-final codas. In a language like English, this ability provides additional cues to word-boundaries, as certain onsets are much more likely to occur word-initially than medially (e.g. "bl" in "black"), and analogously for certain codas (e.g. "dth" in "width" or "ngth" in "strength").

    Collocations3 → Collocation3+       (1)
    Collocation3 → Collocation2+        (2)
    Collocation2 → Collocation+         (3)
    Collocation → Word+                 (4)
    Word → SyllIF                       (5)
    Word → SyllI (Syll) (Syll) SyllF    (6)
    SyllIF → (OnsetI) RhymeF            (7)
    SyllI → (OnsetI) Rhyme              (8)
    SyllF → (Onset) RhymeF              (9)
    CodaF → Consonant+                  (10)
    RhymeF → Vowel (CodaF)              (11)
    OnsetI → Consonant+                 (12)
    Syll → (Onset) Rhyme                (13)
    Rhyme → Vowel (Coda)                (14)
    Onset → Consonant+                  (15)
    Coda → Consonant+                   (16)

Figure 1: The baseline model. We use regular-expression notation to abbreviate multiple rules. X{n} stands for up to n repetitions of X, brackets indicate optionality, and X+ stands for one or more repetitions of X. An underlined X indicates an adapted non-terminal (the underlining is not reproduced in this rendering). Rules that introduce terminals for the pre-terminals Vowel and Consonant are omitted. Refer to the main text for an explanation of the grammar.

We define an additional baseline model by replacing rules (5) and (6) with (17), and deleting rules (7) to (12). This removes the model's ability to use phonotactic cues to word-boundaries.

    Word → Syll (Syll) (Syll) (Syll)    (17)

We refer to the model in Figure 1 as the colloc3-phon model, and the model that results from substituting and removing rules as described as the colloc3-nophon model. Alternatively, one could limit the model's ability to capture word-to-word dependencies by removing rules (1) to (3). This results in the colloc model (Johnson, 2008b) that has previously been found to behave similarly to the Bigram model used in Doyle and Levy (2013) (Johnson, 2008b; Börschinger et al., 2012). We performed experiments with the colloc model as well and found similar results to Doyle and Levy (2013), which are, while overall worse, similar in trend to the results obtained for the colloc3 models. For the rest of the paper, therefore, we will focus on variants of the colloc3 model.

[2] We follow Johnson and Goldwater (2009) in limiting the length of possible words to four syllables to speed up runtime. In pilot experiments, this choice did not have a noticeable effect on segmentation performance.

3.3 Stress-based models

In order for stress cues to be helpful, the model must have some way of associating the position of stress with word-boundaries. Intuitively, the reason stress helps infants in segmenting English is that a stressed syllable is a reliable indicator of the beginning of a word (Jusczyk et al., 1993). More generally, if there is a (reasonably) reliable relationship between the position of stressed syllables and beginnings (or endings) of words, a learner might exploit this relationship. In a Bayesian model, this intuition can be captured by modifying the lexical generator, that is, the distribution that generates Words. Here, changing the lexical generator corresponds to modifying the rules expanding Word. A straightforward way to modify it accordingly is to enumerate all possible sequences of stressed and unstressed syllables.[3] A lexical generator like this is given in Figure 2.

    Word → {SSyll|USyll}{1,4}    (18)
    SSyll → (Onset) RhymeS       (19)
    USyll → (Onset) RhymeU       (20)
    RhymeS → Vowel ∗ (Coda)      (21)
    RhymeU → Vowel (Coda)        (22)
    Onset → Consonant+           (23)
    Coda → Consonant+            (24)

Figure 2: Description of the all-stress-patterns model. We use X{m,n} for "at least m and at most n repetitions of X" and {X|Y} for "either X or Y". Stress is associated with a vowel by suffixing it with the special terminal symbol ∗, leading to a distinction between stressed (SSyll) and unstressed (USyll) syllables. A word can consist of any possible sequence of up to four syllables, as indicated by the regular-expression notation. By additionally adding initial and final variants of SSyll and USyll as in Figure 1, phonotactics can be combined with stress cues.

In the data, stress cues are represented using a special terminal "∗" that follows a stressed vowel, as illustrated in Figure 3. In the grammar, "∗" is constrained to only surface following a Vowel, rendering a syllable in which it occurs stressed (SSyll). Syllables that do not contain a "∗" are considered unstressed (USyll). By performing inference for the probabilities assigned to the different expansions of rule (18), our models can, for example, learn that a bi-syllabic word that is stress-initial (a trochee) is more probable than one that puts stress on the second syllable (an iamb). This (partly) captures the tendency of English for stress-initial words and thus provides an additional cue for identifying words; and it is exactly the kind of preference infant learners of English seem to acquire (Jusczyk et al., 1993).

[3] This is, in essence, also the strategy chosen by Doyle and Levy (2013).
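Rule schema (18) abbreviates a finite set of ordinary PCFG rules, one per stress pattern of up to four syllables, and the USC variants introduced below simply omit the patterns with more than one stressed syllable. A minimal sketch of the expansion (our own code; the rule encoding as tuples is ours):

    from itertools import product

    def expand_word_rules(max_syllables=4, usc=False):
        # Enumerate Word -> {SSyll|USyll}{1,4} as individual PCFG rules.
        # With usc=True, patterns with more than one stressed syllable
        # are excluded, building the Unique Stress Constraint into the
        # lexical generator.
        rules = []
        for n in range(1, max_syllables + 1):
            for pattern in product(("SSyll", "USyll"), repeat=n):
                if usc and pattern.count("SSyll") > 1:
                    continue
                rules.append(("Word",) + pattern)
        return rules

    print(len(expand_word_rules()))          # 30 expansions in total
    print(len(expand_word_rules(usc=True)))  # 14 with at most one SSyll

The probabilities of these expansions are exactly the quantities whose posterior values we examine in Section 5.3.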
    grammar                     phon   stress   usc
    colloc3-nophon               -       -       -
    colloc3-phon                 •       -       -
    colloc3-nophon-stress        -       •       -
    colloc3-phon-stress          •       •       -
    colloc3-nophon-stress-usc    -       •       •
    colloc3-phon-stress-usc      •       •       •

Table 1: The different models used in our experiments. "phon" indicates whether phonotactics are used, "stress" whether stress cues are used and "usc" whether the Unique Stress Constraint is assumed.

    orthographic   the do-gie
    no-stress      dhah daogiy
    stress         dhah dao*giy

Figure 3: Illustration of the input representation we choose. We indicate primary stress according to the dictionary with bold-face in the orthography (lost in this rendering; the stressed syllable is "do"). The phonemic transcription uses ARPABET and is produced using an extended version of CMUDict. Primary stress is indicated by inserting the special symbol "*" after the vowel of a stressed syllable.

We can combine this lexical generator with the colloc3-nophon baseline, resulting in the colloc3-nophon-stress model. We can also add phonotactics to the lexical generator in Figure 2 by adding initial and final variants of SSyll and USyll, analogous to rules (5) to (12) in Figure 1. This yields the colloc3-phon-stress model. We can also add the Unique Stress Constraint (USC) (Yang, 2004) by excluding all variants of rule (18) that generate two or more stressed syllables. For example, while the lexical generator for the colloc3-nophon-stress model will include the rule Word → SSyll SSyll, the lexical generator embodying the USC lacks this rule. We refer to the models that include the USC as the colloc3-nophon-stress-usc and colloc3-phon-stress-usc models. A compact overview of the six different models is given in Table 1.

4 Experiments

We evaluate our models on several corpora of child directed speech. We first describe the corpora we used, then the experimental methodology employed, and finally the experimental results. As the trend is comparable across all corpora, we only discuss in detail results obtained on the Alex corpus. For completeness, however, Table 3 reports the "standard" evaluation of performing inference over all of the three corpora.

4.1 Corpora and corpus creation

Following Christiansen et al. (1998) and Doyle and Levy (2013), we use the Korman corpus (Korman, 1984) as one of our corpora. It comprises child-directed speech for very young infants, aged between 6 and 16 weeks, and, like all other corpora used in this paper, is available through the CHILDES database (MacWhinney, 2000). We derive a phonemicized version of the corpus using an extended version of CMUDict (Carnegie Mellon University, 2008),[4] as we were unable to obtain the stress-annotated version of this corpus used in previous experiments. The phonemicized version is produced by replacing each orthographic word in the transcript with the first pronunciation given by the dictionary. CMUDict also annotates lexical stress, and we use this information to add stress cues to the corpus. We only code primary lexical stresses in the input, ignoring secondary stresses, in line with experimental work that indicates that human listeners are capable of reliably distinguishing primary and secondary stress (Mattys, 2000). Due to the very low frequency of words with 3 or more syllables in these corpora, this choice has very little effect on the number of stress cues available in the input. Our version of the Korman corpus contains, in total, 11413 utterances. Unlike Christiansen et al. (1998), Yang (2004), and Doyle and Levy (2013), we follow Lignos and Yang (2010) in making the more realistic assumption that the 94 mono-syllabic function words listed by Selkirk (1984) never surface with lexical stress. As function words account for roughly 50% of the tokens but only roughly 5% of the types in our corpora, this means that the type and token distribution of stress patterns differs dramatically in all our corpora, as can be seen from Table 2.

We also added stress information to the Brent-Bernstein-Ratner corpus (Bernstein-Ratner, 1987; Brent, 1999), following the procedure just outlined.

[4] http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict.0.7a
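For concreteness, the annotation procedure just described can be sketched as follows. This is our own illustrative reimplementation (the dictionary snippet, word list, and function names are ours), not the scripts actually used to build the corpora:

    # Hypothetical CMUDict-style entries: ARPABET phones, with stress
    # digits on vowels (1 = primary, 2 = secondary, 0 = unstressed).
    PRONUNCIATIONS = {"the": ["DH", "AH0"], "doggie": ["D", "AO1", "G", "IY0"]}
    FUNCTION_WORDS = {"the", "a", "it"}  # stand-in for Selkirk (1984)'s 94 words

    def stress_annotate(word):
        # Replace an orthographic word by its first pronunciation and
        # insert "*" after the primary-stressed vowel; function words
        # and secondary stresses are left unstressed.
        phones = PRONUNCIATIONS[word.lower()]
        out = []
        for p in phones:
            if p.endswith("1") and word.lower() not in FUNCTION_WORDS:
                out.append(p[:-1] + "*")
            else:
                out.append(p.rstrip("012"))
        return "".join(out).lower()

    print(stress_annotate("the"), stress_annotate("doggie"))
    # -> dhah dao*giy, the representation shown in Figure 3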
    Pattern    brent         korman        alex
               Tok    Typ    Tok    Typ    Tok    Typ
    W+         .48    .07    .47    .08    .44    .05
    S W*       .49    .86    .49    .86    .52    .87
    W S W*     .03    .07    .03    .06    .04    .07
    Other      .00    .00    .00    .00    .00    .00

Table 2: Relative frequencies for stress patterns for the corpora used in our study. X* stands for 0 or more, X+ for one or more repetitions of X, and S for a stressed and W for an unstressed syllable. Note the stark asymmetry between type and token frequencies for unstressed words. Up to two decimal places, patterns other than the ones given have relative frequency 0.00 (frequencies might not sum to 1 as an artefact of rounding to 2 decimal places).

This corpus is a de-facto standard for evaluating models of Bayesian word segmentation (Brent, 1999; Goldwater, 2007; Goldwater et al., 2009; Johnson and Goldwater, 2009), comprising in total 9790 utterances.

As our third corpus, we use the Alex portion of the Providence corpus (Demuth et al., 2006; Börschinger et al., 2012). A major benefit of the Providence corpus is that the video-recordings from which the transcripts were produced are available through CHILDES alongside the transcripts. This will allow future work to rely on even more realistic stress cues that can be derived directly from the acoustic signal. While beyond the scope of this paper, we believe choosing a corpus that makes richer information available will be important for future work on stress (and other acoustic) cues. Another major benefit of the Alex corpus is that it provides longitudinal data for a single infant, rather than being a concatenation of transcripts collected from multiple children, such as the Korman and the Brent-Bernstein-Ratner corpus. In total, the Alex corpus comprises 17948 utterances.

Note that despite the differences in age of the infants and overall make-up of the corpora, the distribution of stress patterns across the corpora is roughly the same, as shown by Table 2 for the first 10,000 utterances of each of the corpora. This suggests that the distribution of stress patterns, both at a token and type level, is a robust property of English child-directed speech.

4.2 Evaluation procedure

The aim of our experiments is to understand the contribution of stress cues to the Bayesian word segmentation models described in Section 3. To get an idea of how input size interacts with this, we look at prefixes of the corpora with increasing sizes (100, 200, 500, 1000, 2000, 5000, and 10,000 utterances). In addition, we are interested in understanding what kind of stress pattern preferences our models acquire. For this, we also collect samples of the probabilities assigned to the different expansions of rule (18), allowing us to examine this directly. The standard evaluation of segmentation models involves having them segment their input in an unsupervised manner and evaluating performance on how well they segmented that input. We additionally evaluate the models on a test set for each corpus. Use of a separate test set has previously been suggested as a means of testing how well the knowledge a learner acquired generalizes to novel utterances (Pearl et al., 2011), and is required for the kind of comparison across different sizes of input we are interested in, to determine whether the role of stress cues interacts with the input size. We create the test-sets by taking the final 1000 utterances of each corpus. These 1000 utterances are segmented by the model after it has performed inference on its input, without making any further changes to the lexicon that the model has induced. In other words, the model has to segment each of the test utterances using only the lexicon (and any additional knowledge about co-occurrences, phonotactics, and stress) it has acquired from the training portion of the corpus during inference.

We measure segmentation performance using the standard metric of token f-score (Brent, 1999), which is the harmonic mean of token precision and recall. Token f-score provides an overall impression of how accurately individual word tokens were identified. To illustrate, if the gold segmentation is "the dog", the segmentation "the do g" has a token precision of 1/3 (one out of three predicted words is correct), a token recall of 1/2 (one of the two gold words was correctly identified), and a token f-score of 0.4.
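As a worked illustration of this metric (our own helper, not part of the paper's evaluation scripts), token precision, recall, and f-score can be computed by comparing word spans:

    def token_scores(gold, predicted):
        # Score a predicted segmentation against the gold one; both are
        # the same string segmented by spaces. A predicted token counts
        # as correct only if both of its boundaries match a gold word.
        def spans(segmentation):
            result, i = set(), 0
            for word in segmentation.split():
                result.add((i, i + len(word)))
                i += len(word)
            return result
        g, p = spans(gold), spans(predicted)
        correct = len(g & p)
        precision, recall = correct / len(p), correct / len(g)
        f = 2 * precision * recall / (precision + recall) if correct else 0.0
        return precision, recall, f

    print(token_scores("the dog", "the do g"))  # (0.333..., 0.5, 0.4)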
4.3 Inference

For inference, we closely follow Johnson and Goldwater (2009): we put vague priors on all the hyper-parameters of our models and run 4 chains for 1000 iterations, collecting 20 samples from each chain with a lag of 10 iterations between each sample after a burn-in of 800 iterations, using both batch-initialization and table-label resampling to ensure good convergence of the sampler. We construct a single segmentation from the posterior samples using their minimum Bayes risk decoding, providing a single score for each condition.
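The exact decoding follows Johnson and Goldwater (2009); as a rough illustration of the idea, one simple approximation (our sketch, not necessarily the procedure implemented there) keeps each word boundary that appears in the majority of posterior samples, which minimizes the expected number of boundary errors:

    from collections import Counter

    def majority_boundaries(sampled_segmentations):
        # `sampled_segmentations`: one set of boundary positions per
        # posterior sample (here, 80 samples = 4 chains x 20 samples).
        # Keep boundaries whose estimated posterior probability > 0.5.
        votes = Counter(b for sample in sampled_segmentations for b in sample)
        n = len(sampled_segmentations)
        return {b for b, count in votes.items() if count / n > 0.5}

    print(majority_boundaries([{3}, {3}, {2, 3}, {3, 6}]))  # {3}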
    p   s   usc   alex           korman         brent
                  train   test   train   test   train   test
    -   -   -     .81     .81    .85     .83    .82     .82
    •   -   -     .85     .84    .86     .84    .86     .86
    -   •   -     .86     .87    .87     .86    .86     .87
    •   •   -     .88     .88    .88     .87    .87     .87
    -   •   •     .87     .88    .87     .88    .86     .87
    •   •   •     .88     .88    .88     .87    .87     .88

Table 3: Token f-scores on both train and test portions for all three corpora when inference is performed over the full corpus. Note that the benefit of stress is clearer when evaluating on the test set, and that overall, performance of the different models is comparable across all three corpora. Models are coded according to the key in Table 1 ("p" = phon, "s" = stress).

4.4 Experimental conditions

Each of our six models is evaluated on inputs of increasing size, starting at 100 and ending at 10,000 utterances, allowing us to investigate both how performance and "knowledge" of the learner vary as a function of input size. For completeness, we also report the "standard" evaluation, i.e. performance of our models on all corpora when trained on the entire input, in Table 3. We will focus our discussion on the results obtained on the Alex corpus, which are depicted in Figure 4, where the input size is depicted on the x-axis, and the segmentation f-score for the test-set on the y-axis.

Figure 4: Segmentation performance of the different models, across different input sizes and as evaluated on the test-set for the Alex corpus. The no-stress baselines are given in red, the stress-models without the Unique Stress Constraint (USC) in green, and the ones including the USC in black. Solid lines indicate models that use, dashed lines models that do not use, phonotactics. Refer to the text for discussion. [Plot not reproduced; x-axis: number of input utterances (100–10,000), y-axis: segmentation f-score (0.65–0.85).]

5 Discussion

We find a clear improvement for the stress-models over both the colloc3-nophon and the colloc3-phon models. As can be seen in Table 3, the overall trend is the same for all three corpora, both when evaluating on the input and on the separate test-set.[5] Note how the relative gain for stress is roughly 1% higher when evaluating on the test-set; this might have to do with Jusczyk (1997)'s observation that the advantage of stress "might be more evident for relatively unexpected or unfamiliarized strings" (Jusczyk, 1997). A closer look at Figure 4 indicates further interesting differences between the colloc3-nophon and the colloc3-phon models that only become evident when considering different input sizes.

[5] We performed Wilcoxon rank sum tests on the individual scores of the 4 independent chains for each model on the full training datasets and found that the stress-models were always significantly more accurate (p < 0.05) than the baseline models except when evaluating on the training data for the Korman and Brent corpora.

5.1 Stress cues without phonotactics

For the colloc3-nophon models, we observe a relatively stable improvement by adding stress cues of 6-7%, irrespective of input size and whether or not the Unique Stress Constraint (USC) is assumed. The sole exception to this occurs when the learner only gets to see 100 utterances: in this case, the colloc3-nophon-stress model only shows a 3% improvement, whereas the colloc3-nophon-stress-usc model obtains a boost of roughly 8%. Noticeable consistent differences between the colloc3-nophon-stress and colloc3-nophon-stress-usc models, however, all but disappear starting from around 500 utterances. This is somewhat surprising, considering that it is the USC that was argued by Yang (2004) to be key for taking advantage of stress.[6] We take this behaviour to indicate that even with as little evidence as 200 to 500 utterances, a Bayesian ideal learner can effectively infer that something like the USC is true of English. This also becomes clear when examining how the learners' preferences for different stress patterns evolve over time, as we do in Section 5.3 below.

[6] On data in which function words are marked for stress (as in Yang (2004) and Doyle and Levy (2013)), the USC yields extremely high scores across all models, simply because roughly every second word is a function word. Given that this assumption is extremely unnatural, we do not take this as an argument for the USC.

5.2 Stress cues and phonotactics

Overall, the models including phonotactic cues perform better than those that do not rely on phonotactics. However, the overall gain contributed by stress to the colloc3-phon baseline is smaller, although this seems to depend on the size of the input. While phonotactics by itself appears to be a powerful cue, yielding a noticeable 4-5% improvement over the colloc3-nophon baseline, the learner seems to require at least around 500 utterances before the colloc3-phon model becomes clearly more accurate than the colloc3-nophon model. In contrast, even for only 100 utterances stress cues by themselves provide a 3% improvement to the colloc3-nophon model, indicating that they can be taken advantage of earlier. While the number of utterances processed by a Bayesian ideal learner is not directly related to developmental stages, this observation is consistent with the psycholinguists' claim that phonotactics are used by infants for word segmentation after they have begun to use stress for segmentation (Jusczyk et al., 1999a).

Turning to the interaction between stress and phonotactics, we see that there is no consistent advantage of including the USC in the model. This is, in fact, even clearer than for the colloc3-nophon model, where at least for small inputs of size 100 the USC added almost 5% in performance. For the colloc3-phon models, we only observe a 1-2% improvement by adding the USC up until 500 utterances. This further strengthens the point that even in the absence of such an innate constraint, a statistical learner can take advantage of stress cues and, as we show below, actually acquire something like the USC from the input.

The 4% difference between the colloc3-phon-stress/colloc3-phon-stress-usc models and the colloc3-phon baseline is smaller than the 7% difference between the colloc3-nophon and colloc3-nophon-stress models. This shows that there is a redundancy between phonotactic and stress cues in large amounts of data, as their joint contribution to the colloc3-nophon baseline is less than the sum of their individual contributions at 10,000 utterances, of 4% (for phonotactics) and 7% (for stress).

Unlike for the colloc3-nophon models, we also see a clear impact of input size. In particular, at 100 utterances the addition of stress cues leads to an 8–10% improvement, depending on whether or not the USC is assumed, whereas for the colloc3-nophon model we only observed a 3–8% improvement. This is particularly striking when we consider that by themselves, the phonotactic cues only contribute a 1% improvement to the colloc3-nophon baseline when trained on the 100 utterance corpus, indicating a synergistic interaction (rather than redundancy) between phonotactics and stress for small inputs.
This effect disappears starting from around 1000 utterances; for inputs of size 1000 and larger, the net gain of stress drops from roughly 10% to a 3–4% improvement. That is, while we did not notice any relationship between input size and impact of stress cues for the colloc3-nophon model, we do see such an interaction for the combination of phonotactics and stress cues, which, taken together, lead to a larger relative gain in performance on smaller inputs than on large ones.

5.3 Acquisition of stress patterns

In addition to acquiring a lexicon, the Bayesian learner acquires knowledge about the possible stress patterns of English words. The fact that this knowledge is explicitly represented through the PCFG rules and their probabilities that define the lexical generator allows us to study the generalisations about stress the model actually acquires. While Doyle and Levy (2013) suggest carrying out such an analysis, they restrict themselves to estimating the fraction of stress patterns in the segmented output. As shown in Table 2, however, the type and token distributions of stress patterns can differ substantially. We therefore investigate the stress preferences acquired by our learner by examining the probabilities assigned to the different expansions of rule (18), aggregating the probabilities of the individual rules into patterns. For example, the rules Word → SSyll (USyll){0,3} correspond to the pattern "stress on the first syllable", whereas the rules Word → USyll{1,4} correspond to the pattern "unstressed word". By computing the respective probabilities, we get the overall probability assigned by a learner to the pattern.
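This aggregation amounts to summing rule probabilities by pattern; a minimal sketch (our own encoding of the expansions as tuples of syllable categories):

    def pattern_probabilities(rule_probs):
        # `rule_probs` maps expansions of rule (18), e.g. the tuple
        # ("SSyll", "USyll"), to their inferred probabilities. Patterns
        # with a single stress on the third or fourth syllable are not
        # broken out here.
        totals = {"stress-initial": 0.0, "stress-second": 0.0,
                  "unstressed": 0.0, "violates-usc": 0.0}
        for expansion, p in rule_probs.items():
            stresses = expansion.count("SSyll")
            if stresses > 1:
                totals["violates-usc"] += p
            elif stresses == 0:
                totals["unstressed"] += p
            elif expansion[0] == "SSyll":
                totals["stress-initial"] += p
            elif expansion[1] == "SSyll":
                totals["stress-second"] += p
        return totals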
Figure 5 provides this information for several different rule patterns. Additionally, these plots include the empirical type (red dotted) and token proportions (red double-dashed) for the input corpus.

Figure 5: Evolution of the knowledge the learner acquires on the Alex corpus. The red dotted line indicates the empirical type distribution of a specific pattern, and the double-dashed line the empirical token distribution. Top-Left: Stress-initial pattern, Top-Right: Unstressed Words, Bottom-Left: Stress-second pattern, Bottom-Right: Patterns that violate the USC. [Plots not reproduced; each panel shows P(pattern) against the number of input utterances (100–10,000).]

Note how for the two major patterns, all models successfully track the type, rather than the token, frequency, correctly developing a preference for stress-initial over unstressed words, despite the comparable token frequency of these two patterns. This is compatible with a recent proposal by Thiessen and Saffran (2007), who argue that infants infer the stress pattern over their lexicon. For a Bayesian model such as ours or Goldwater et al. (2009)'s, there is no need to pre-specify that the distribution ought to be learned over types rather than tokens, as the models automatically interpolate between type and token statistics according to the properties of their input (Goldwater et al., 2006). In addition, a Bayesian framework provides a simple answer to the question of how a learner might identify the role of stress in its language without already having acquired at least some words. By combining different kinds of cues, e.g. distributional, phonotactic, and prosodic, in a principled manner, a Bayesian learner can jointly segment its input and learn the appropriate role of each cue, without having to pre-specify specific preferences that might differ across languages.

The iambic rule pattern that puts stress on the second syllable is much more infrequent on a token level. All models track this low token frequency, underestimating the type frequency of this pattern by a fair amount. This suggests that learning this pattern correctly requires considerably more input than for the other patterns. Indeed, the iambic pattern is known to pose problems for infants when they start using stress as an effective cue. It is only from roughly 10 months of age that infants successfully segment iambic words (Jusczyk et al., 1999b). Not surprisingly, the USC doesn't aid in learning about this pattern, because it is completely silent on where stress might fall (and does not noticeably improve segmentation performance to begin with).

Finally, we can also investigate whether the models that lack the USC nevertheless learn that words contain at most one lexically stressed syllable. The bottom-right graph in Figure 5 plots the probability assigned by the models to patterns that violate the USC. This includes, for example, the rules Word → SSyll SSyll and Word → SSyll USyll SSyll. Note how the probabilities assigned to these rules approach zero, indicating that the learner becomes more certain that there are no words that contain more than one syllable with lexical stress. As we argued above, this suggests that a Bayesian learner can acquire the USC from a modest amount of data: it will properly infer that the unnatural patterns are simply not supported by the input. To summarize, by examining the internal state of the Bayesian learners we can characterise how their knowledge about the stress preferences of their languages develops, rather than merely measuring how well they perform word segmentation. We find that the iambic pattern that has been observed to pose problems for infant learners is also harder for the Bayesian learner to acquire, arguably due to its extremely low token-frequency.

6 Conclusion and Future Work

We have presented Adaptor Grammar models of word segmentation that are able to take advantage of stress cues and are able to learn from phonemic input. We find that phonotactics and stress interact in interesting ways, and that stress cues make a stable contribution to existing word segmentation models, improving their performance by 4-6% token f-score. We also find that the USC introduced by Yang (2004) need not be prebuilt into a model but can be acquired by a Bayesian learner from the data. Similarly, we directly investigate the stress preferences acquired by our models and find that for stress-initial and unstressed words, they track type rather than token frequencies. The rare stress-second pattern seems to require more input to be properly acquired, which is compatible with infant development data.

An important goal for future research is to evaluate segmentation models on typologically different languages and to study the relative usefulness of different cues cross-lingually. For example, languages such as French lack lexical stress; it would be interesting to know whether in such a case, phonotactic (or other) cues are more important. Relatedly, recent work such as Börschinger et al. (2013) has found that artificially created data often masks the complexity exhibited by real speech. This suggests that future work should use data directly derived from the acoustic signal to account for contextual effects, rather than using dictionary look-up or other heuristics. In using the Alex corpus, for which good quality audio is available, we have taken a first step in this direction.
Acknowledgements

This research was supported by the Australian Research Council's Discovery Projects funding scheme (project numbers DP110102506 and DP110102593). We'd like to thank Professor Dupoux and our other colleagues at the Laboratoire de Sciences Cognitives et Psycholinguistique in Paris for hosting us while this research was performed, as well as the Mairie de Paris, the fondation Pierre Gilles de Gennes, the Ecole des Hautes Etudes en Sciences Sociales, the Ecole Normale Supérieure, the Region Ile de France, the European Research Council (ERC-2011-AdG-295810 BOOTPHON), the Agence Nationale pour la Recherche (ANR-2010-BLAN-1901-1 BOOTLANG, ANR-10-IDEX-0001-02 and ANR-10-LABX-0087) and the Fondation de France. We'd also like to thank three anonymous reviewers for helpful comments and suggestions.

References

N. Bernstein-Ratner. 1987. The phonology of parent-child speech. In K. Nelson and A. van Kleeck, editors, Children's Language, volume 6. Erlbaum, Hillsdale, NJ.

Benjamin Börschinger, Katherine Demuth, and Mark Johnson. 2012. Studying the effect of input size for Bayesian word segmentation on the Providence corpus. In Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), pages 325–340. Coling 2012 Organizing Committee.

Benjamin Börschinger, Mark Johnson, and Katherine Demuth. 2013. A joint model of word segmentation and phonological variation for English word-final /t/-deletion. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1508–1516. Association for Computational Linguistics.

M. Brent. 1999. An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning, 34:71–105.

M. Christiansen and S. Curtin. 1999. The power of statistical learning: No need for algebraic rules. In Proceedings of the 21st Annual Conference of the Cognitive Science Society.

Morten H. Christiansen, Joseph Allen, and Mark S. Seidenberg. 1998. Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes, 13(2-3):221–268.

Suzanne Curtin, Toben H. Mintz, and Morten H. Christiansen. 2005. Stress changes the representational landscape: Evidence from word segmentation. Cognition, 96(3):233–262.

Anne Cutler and David M. Carter. 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2(3):133–142.

Anne Cutler, Jacques Mehler, Dennis Norris, and Juan Segui. 1986. The syllable's differing role in the segmentation of French and English. Journal of Memory and Language, 25(4):385–400.

Anne Cutler. 2005. Lexical stress. In David B. Pisoni and Robert E. Remez, editors, The Handbook of Speech Perception, pages 264–289. Blackwell Publishing.

K. Demuth, J. Culbertson, and J. Alter. 2006. Word-minimality, epenthesis, and coda licensing in the acquisition of English. Language and Speech, 49:137–174.

Gabriel Doyle and Roger Levy. 2013. Combining multiple information types in Bayesian word segmentation. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 117–126. Association for Computational Linguistics.

Victoria Fromkin, editor. 2001. Linguistics: An Introduction to Linguistic Theory. Blackwell, Oxford, UK.

Sharon Goldwater, Tom Griffiths, and Mark Johnson. 2006. Interpolating between types and tokens by estimating power-law generators. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 459–466. MIT Press.

Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. 2009. A Bayesian framework for word segmentation: Exploring the effects of context. Cognition, 112(1):21–54.

Sharon Goldwater. 2007. Nonparametric Bayesian Models of Lexical Acquisition. Ph.D. thesis, Brown University.

Mark Johnson and Katherine Demuth. 2010. Unsupervised phonemic Chinese word segmentation using Adaptor Grammars. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 528–536. Coling 2010 Organizing Committee.

Mark Johnson and Sharon Goldwater. 2009. Improving nonparametric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 317–325. Association for Computational Linguistics.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Adaptor Grammars: A framework for specifying compositional nonparametric Bayesian models. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 641–648. MIT Press, Cambridge, MA.

Mark Johnson. 2008a. Unsupervised word segmentation for Sesotho using Adaptor Grammars. In Proceedings of the Tenth Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pages 20–27. Association for Computational Linguistics.

Mark Johnson. 2008b. Using Adaptor Grammars to identify synergies in the unsupervised acquisition of linguistic structure. In Proceedings of the 46th Annual Meeting of the Association of Computational Linguistics, pages 398–406. Association for Computational Linguistics.

Peter W. Jusczyk, Anne Cutler, and Nancy J. Redanz. 1993. Infants' preference for the predominant stress patterns of English words. Child Development, 64(3):675–687.

Peter W. Jusczyk, E. A. Hohne, and A. Bauman. 1999a. Infants' sensitivity to allophonic cues for word segmentation. Perception and Psychophysics, 61:1465–1476.

Peter W. Jusczyk, Derek M. Houston, and Mary Newsome. 1999b. The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39(3-4):159–207.

Peter Jusczyk. 1997. The Discovery of Spoken Language. MIT Press, Cambridge, MA.

Myron Korman. 1984. Adaptive aspects of maternal vocalizations in differing contexts at ten weeks. First Language, 5:44–45.

Constantine Lignos and Charles Yang. 2010. Recession segmentation: simpler online word segmentation using limited resources. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 88–97. Association for Computational Linguistics.

Constantine Lignos. 2011. Modeling infant word segmentation. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pages 29–38. Association for Computational Linguistics.

Constantine Lignos. 2012. Infant word segmentation: An incremental, integrated model. In Proceedings of the West Coast Conference on Formal Linguistics 30.

Brian MacWhinney. 2000. The CHILDES project: Tools for analyzing talk: Volume I: Transcription format and programs, volume II: The database. Computational Linguistics, 26(4):657–657.

Sven L. Mattys and Peter W. Jusczyk. 2000. Phonotactic cues for segmentation of fluent speech by infants. Cognition, 78(2):91–121.

Sven L. Mattys. 2000. The perception of primary and secondary stress in English. Perception and Psychophysics, 62(2):253–265.

Lisa Pearl, Sharon Goldwater, and Mark Steyvers. 2011. Online learning mechanisms for Bayesian models of word segmentation. Research on Language and Computation, 8(2):107–132.

Elisabeth O. Selkirk. 1984. Phonology and Syntax: The Relation Between Sound and Structure. MIT Press.

Erik D. Thiessen and Jenny R. Saffran. 2003. When cues collide: use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology, 39(4):706.

Erik D. Thiessen and Jenny R. Saffran. 2007. Learning to learn: Infants' acquisition of stress-based strategies for word segmentation. Language Learning and Development, 3(1):73–100.

Carnegie Mellon University. 2008. The CMU pronouncing dictionary, v. 0.7a.

Charles Yang. 2004. Universal grammar, statistics or both? Trends in Cognitive Sciences, 8(10):451–456.