Transactions of the Association for Computational Linguistics, 1 (2013) 63–74. Action Editor: Brian Roark.
Submitted 9/2012; Published 3/2013. © 2013 Association for Computational Linguistics.
Unsupervised Dependency Parsing with Acoustic Cues

John K Pate†‡ (j.k.pate@sms.ed.ac.uk)
Sharon Goldwater† (sgwater@inf.ed.ac.uk)
† ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
‡ Department of Computing, Macquarie University, Sydney, NSW 2109, Australia

Abstract

Unsupervised parsing is a difficult task that infants readily perform. Progress has been made on this task using text-based models, but few computational approaches have considered how infants might benefit from acoustic cues. This paper explores the hypothesis that word duration can help with learning syntax. We describe how duration information can be incorporated into an unsupervised Bayesian dependency parser whose only other source of information is the words themselves (without punctuation or parts of speech). Our results, evaluated on both adult-directed and child-directed utterances, show that using word duration can improve parse quality relative to words-only baselines. These results support the idea that acoustic cues provide useful evidence about syntactic structure for language-learning infants, and motivate the use of word duration cues in NLP tasks with speech.

1 Introduction

Unsupervised learning of syntax is difficult for NLP systems, yet infants perform this task routinely. Previous work in NLP has focused on using the implicit syntactic information available in part-of-speech (POS) tags (Klein and Manning, 2004), punctuation (Seginer, 2007; Spitkovsky et al., 2011b; Ponvert et al., 2011), and syntactic similarities between related languages (Cohen and Smith, 2009; Cohen et al., 2011). However, these approaches likely use the data in a very different way from children: neither POS tags nor punctuation are observed during language acquisition (although see Spitkovsky et al. (2011a) and Christodoulopoulos et al. (2012) for encouraging results using unsupervised POS tags), and many children learn in a broadly monolingual environment. This paper explores a possible source of information that NLP systems typically ignore: word duration, or the length of time taken to pronounce each word.

There are good reasons to think that word duration might be useful for learning syntax. First, the well-established Prosodic Bootstrapping hypothesis (Gleitman and Wanner, 1982) proposes that infants
use acoustic-prosodic cues (such as word duration) to help them identify syntactic structure, because prosodic and syntactic structures sometimes coincide. More recently, we proposed (Pate and Goldwater, 2011) that infants might use word duration as a direct cue to syntactic structure (i.e., without requiring intermediate prosodic structure), because words in high-probability syntactic structures tend to be pronounced more quickly (Gahl and Garnsey, 2004; Gahl et al., 2006; Tily et al., 2009).

Like most recent work on unsupervised parsing, we focus on learning syntactic dependencies. Our work is based on Headden et al. (2009)'s Bayesian version of the Dependency Model with Valence (DMV) (Klein and Manning, 2004), using interpolated backoff techniques to incorporate multiple information sources per token. However, whereas Headden et al. used words and POS tags as input, we use words and word duration information, presenting three variants of their model that use this information in slightly different ways.¹

¹ By using neither gold-standard nor learned POS tags as input, our work differs from nearly all previous work on unsupervised dependency parsing. While learned tags might be plausible in a model of language acquisition, gold tags certainly are not.

To our knowledge, this is the first work to incorporate acoustic cues into an unsupervised system for learning full syntactic parses. The methods in this paper were inspired by our previous approach (Pate and Goldwater, 2011), which showed that word duration measurements could improve the performance of an unsupervised lexicalized syntactic chunker over a words-only baseline. However, that work was limited to HMM-like sequence models, tested on adult-directed speech (ADS) only, and none of the models outperformed uniform-branching baselines. Here, we extend our results to full dependency parsing, and experiment on transcripts of both spontaneous ADS and child-directed speech (CDS). Our models using word duration outperform words-only baselines, along with the Common Cover Link parser of Seginer (2007), and the Unsupervised Partial Parser of Ponvert et al. (2011), unsupervised lexicalized parsers that have obtained state-of-the-art results on standard newswire treebanks (though their performance here is worse, as our input lacks punctuation). We also outperform uniform-branching baselines.

2 Syntax and Word Duration

Before presenting our models and experiments, we first discuss why word duration might be a useful cue to syntax. This section reviews the two possible reasons mentioned above: duration as a cue to prosodic structure, or as a cue to predictability.

2.1 Prosodic Bootstrapping

Prosody is the structure of speech as conveyed by rhythm and intonation, which are, in turn, conveyed by such measurable phenomena as variation in fundamental frequency, word duration, and spectral tilt. Prosodic structure is typically analyzed as imposing a shallow, hierarchical grouping structure on speech, with the ends of prosodic phrases (constituents) being cued in part by lengthening the last word of the phrase (Beckman and Pierrehumbert, 1986).

The Prosodic Bootstrapping hypothesis (Gleitman and Wanner, 1982) points out that prosodic phrases are often also syntactic phrases, and proposes that language-acquiring infants exploit this correlation. Specifically, if infants can learn about prosodic phrase structure using word duration (and fundamental frequency), they may be able to identify syntactic phrases more easily using word strings and prosodic trees than using word strings alone.

Several behavioral experiments support the connection between prosody and syntax and the prosodic bootstrapping hypothesis specifically. For example, there is evidence that adults use prosodic information for syntactic disambiguation (Millotte et al., 2007; Price et al., 1991) and to help in learning the syntax of an artificial language (Morgan et al., 1987), while infants can use acoustic-prosodic cues for utterance-internal clause segmentation (Seidl, 2007).

On the computational side, we are aware of only our previous HMM-based chunkers (Pate and Goldwater, 2011), which learned shallow syntax from words, words and word durations, or words and hand-annotated prosody. Using these chunkers, we found that using words plus prosodic annotation worked better than just words, and words plus word duration worked even better. While these results are consistent with the prosodic bootstrapping hypothesis, we suggested that predictability bootstrapping (see below) might be a more plausible explanation.

Other computational work has combined prosody with syntax, but only in supervised systems, and typically using hand-annotated prosodic information. For example, Huang and Harper (2010) used annotated prosodic breaks as a kind of punctuation in a supervised PCFG, while prosodic breaks learned in a semi-supervised way have been used as features for parse reranking (Kahn et al., 2005) or PCFG state-splitting (Dreyer and Shafran, 2007). In contrast to these methods, our approach observes neither parse trees nor prosodic annotations.

2.2 Predictability Bootstrapping

On the basis of our HMM chunkers, we introduced the predictability bootstrapping hypothesis (Pate and Goldwater, 2011): the idea that word durations could be a useful cue to syntactic structure not (or not only) because they provide information about prosodic structure, but because they are a direct cue to syntactic predictability. It is well-established that talkers tend to pronounce words more quickly when they are more predictable, as measured by, e.g., word frequency, n-gram probability, or whether or not the word has been previously mentioned (Aylett and Turk, 2004; Bell et al., 2009). However, syntactic probability also seems to matter, with studies showing that verbs tend to be pronounced more quickly when they are in their preferred syntactic frame (transitive vs. intransitive, or direct object vs. sentential complement) (Gahl and Garnsey, 2004; Gahl et al., 2006; Tily et al., 2009). While this syntactic evidence is only for verbs, together with the evidence for effects of other notions of predictability, it suggests that such syntactic effects may also be widespread. If so, the duration of a word could give clues as to whether it is being used in a high-probability or low-probability structure, and thus what the correct structure is.

[Figure 1: Example unlabeled dependency parse of the sentence "you threw it right at the basket".]

We found that our syntactic chunkers benefited more from duration information than prosodic annotations, providing some preliminary evidence in favor of predictability bootstrapping, but not ruling out prosodic bootstrapping.

So, we are left with two plausible mechanisms by which word duration could help with learning syntax. Slow pronunciations may cue the end of a prosodic phrase, which is sometimes also the end of a syntactic phrase. Alternatively, slow pronunciations may indicate that the hidden syntactic structure is low probability, facilitating the induction of a probabilistic grammar. This paper will not seek to determine which mechanism is useful, instead taking the presence of two possible mechanisms as encouraging for the prospect of incorporating word duration into unsupervised parsing.

3 Models²

² The implementation of these models is available at http://github.com/jpate/predictabilityParsing

As mentioned, we will be incorporating word duration into unsupervised dependency parsing, producing analyses like the one in Figure 1. Each arc is between two words, with the head at the non-arrow end of the arc, and the dependent at the arrow end. One word, the root, depends on no word, and all other words depend on exactly one word. Following previous work on unsupervised dependency parsing, we will not label the arcs.

3.1 Dependency Model with Valence

All of our models are ultimately based on the Dependency Model with Valence (DMV) of Klein and Manning (2004), a generative, probabilistic model for projective (i.e. no crossing arcs), unlabeled dependency parses, such as the one in Figure 1.

The DMV generates
dependency parses using three probability distributions, which together comprise model parameters $\theta$. First, the root of the sentence is drawn from $P_{\mathrm{root}}$. Second, we decide whether to stop generating dependents of the head $h$ in direction $\mathit{dir} \in \{\mathit{left}, \mathit{right}\}$ with probability $P_{\mathrm{stop}}(\cdot \mid h, \mathit{dir}, v)$, where $v$ is T if $h$ has a $\mathit{dir}$-ward dependent and F otherwise. If we decide to stop, then $h$ takes no more dependents in the direction of $\mathit{dir}$. If we don't stop, we use the third probability distribution $P_{\mathrm{choose}}(d \mid h, \mathit{dir})$ to determine which dependent $d$ to generate. The second and third steps repeat for each generated word until all words have stopped generating in both directions.

The DMV was the first unsupervised parsing model to outperform a uniform-branching baseline on the Wall Street Journal corpus. It was trained using EM to obtain a maximum-likelihood estimate of the parameters $\theta$, and learned from POS tags to avoid rare events. However, all work on syntactic predictability effects on word duration has been lexicalized (looking at, e.g., the transitivity bias of particular verbs). In addition, it is unlikely that children have access to the correct parts of speech when first learning syntactic structure. Thus, we want a DMV variant that learns from words rather than POS tags. We therefore adopt several extensions to the DMV due to Headden et al. (2009), described next.

3.2 The DMV with Backoff

Headden et al. (2009) sought to improve the DMV by incorporating lexical information in addition to POS tags. However, arcs between particular words are rare, so they modified the DMV in two ways to deal with this sparsity. First, they switched from MLE to a Bayesian approach, estimating a probability distribution over model parameters $\theta$ and dependency trees $T$ given the training corpus $C$ and a prior distribution $\alpha$ over models: $P(T, \theta \mid C, \alpha)$.

Headden et al. avoided overestimating the probability of rare events that happen to occur in the training data by picking $\alpha$ to assign low probability to models $\theta$ which give high probability to rare events. Accordingly, models that overcommit to rare events will contribute little to the final average over models. Specifically, Headden et al. use Dirichlet priors, with $\alpha$ being the Dirichlet hyperparameters.

Headden et al.'s second innovation was to adapt interpolated backoff methods from language modeling with n-grams, where one can estimate the probability of word $w_n$ given word $w_{n-1}$ by interpolating between unigram and bigram probability estimates:

\[
\hat{P}(w_n \mid w_{n-1}) = \lambda P(w_n \mid w_{n-1}) + (1 - \lambda) P(w_n)
\]

with $\lambda \in [0, 1]$. Ideally, $\lambda$ should be large when $w_{n-1}$ is frequent, and small when $w_{n-1}$ is rare. Headden et al. (2009) apply this method to the DMV by backing off from Choose and Stop distributions that condition on both head word and POS to distributions that condition on only the head POS.

In the equation above, $\lambda$ is a scalar parameter. However, it actually specifies a probability distribution over the decision to back off ($B$) or not back off ($\neg B$), and we can use different notation to reflect this view. Specifically, $\lambda_{\mathrm{stop}}(\cdot)$ and $\lambda_{\mathrm{choose}}(\cdot)$ will represent our backoff distributions for the Stop and Choose decision, respectively. Using $h_p$ and $d_p$ to represent head and dependent POS tag and $h_w$ and $d_w$ to represent head and dependent word, one of the models Headden et al. explored estimates:

\[
\hat{P}_{\mathrm{choose}}(d_p \mid h_w, h_p, \mathit{dir}, \mathit{val}) = \lambda_{\mathrm{choose}}(\neg B \mid h_w, h_p, \mathit{dir})\, P_{\mathrm{choose}}(d_p \mid h_w, h_p, \mathit{dir}) + \lambda_{\mathrm{choose}}(B \mid h_w, h_p, \mathit{dir})\, P_{\mathrm{choose}}(d_p \mid h_p, \mathit{dir}) \tag{1}
\]

with an analogous backoff for $P_{\mathrm{stop}}$. We can see from Equation 1 that $\hat{P}_{\mathrm{choose}}$ backs off from a distribution that conditions on $h_w$ to a distribution that marginalizes out $h_w$, and that the extent of backoff varies across $h_w$; we can use this to back off more when we have less evidence about $h_w$. This model only conditions on words; it does not generate them in the dependents. This means it is actually a conditional, rather than fully generative, model of observed POS tags and unobserved syntax conditioned on the observed words.

Since identifying the true posterior distribution $P(T, \theta \mid C, \alpha)$ is intractable, Headden et al. use Mean-field Variational Bayes (Kurihara and Sato, 2006; Johnson, 2007), which finds an approximation to the posterior using an iterative EM-like algorithm. In the E-step of VBEM, expected counts $E(r_i)$ are gathered for each latent variable using the Inside-Outside algorithm, exactly as in the E-step of traditional EM. The Maximization step differs from the M-step of EM in two ways. First, the expected counts for each value of the latent variable $r_i$ are incremented by the hyperparameter $\alpha_i$. Second, the numerator and denominator are scaled by the function $\exp(\psi(\cdot))$, where $\psi$ is the digamma function, which reduces the probability of rare events. Specifically, the $P_{\mathrm{choose}}$ distribution is estimated using expectations for each arc $a_{d_p,h,\mathit{dir}}$ from head $h$ to dependent POS tag $d_p$ in direction $\mathit{dir}$, and the update equation for $P_{\mathrm{choose}}$ from iteration $n$ to $n+1$ is:

\[
\hat{P}^{\,n+1}_{\mathrm{choose}}(d_p \mid h, \mathit{dir}) = \frac{\exp\left(\psi\left(E_n(a_{d_p,h,\mathit{dir}}) + \alpha_{d_p,h,\mathit{dir}}\right)\right)}{\exp\left(\psi\left(\sum_c \left(E_n(a_{c,h,\mathit{dir}}) + \alpha_{c,h,\mathit{dir}}\right)\right)\right)} \tag{2}
\]

where $h$ is the head POS tag for the backoff distribution, and the head (word, POS) pair for the no-backoff distribution. The update equation for $P_{\mathrm{stop}}$ is analogous.

Now consider the update equations for $\lambda_{\mathrm{choose}}$:

\[
\hat{\lambda}^{\,n+1}_{\mathrm{choose}}(\neg B \mid h_w, h_p, \mathit{dir}) = \frac{\exp\left(\psi\left(\alpha_{\neg B} + \sum_c E_n(a_{c,h_w,h_p,\mathit{dir}})\right)\right)}{\exp\left(\psi\left(\alpha_B + \alpha_{\neg B} + \sum_c E_n(a_{c,h_w,h_p,\mathit{dir}})\right)\right)}
\]

\[
\hat{\lambda}^{\,n+1}_{\mathrm{choose}}(B \mid h_w, h_p, \mathit{dir}) = \frac{\exp\left(\psi\left(\alpha_B\right)\right)}{\exp\left(\psi\left(\alpha_B + \alpha_{\neg B} + \sum_c E_n(a_{c,h_w,h_p,\mathit{dir}})\right)\right)}
\]

Only the $\neg B$ numerator includes the expected counts, so as we see $h_w$ in direction $\mathit{dir}$ more often, the $\neg B$ numerator will swamp the $B$ numerator. By picking $\alpha_B$ larger than $\alpha_{\neg B}$, we can bias our $\lambda$ distribution to prefer backing off until we expect at least $\alpha_B - \alpha_{\neg B}$ arcs out of $h_w$ with tag $h_p$ in the direction of $\mathit{dir}$.

To obtain good performance, Headden et al. replaced each word that appeared fewer than 100 times in the training data with the token "UNK." We will also use such an UNK cutoff.

3.3 DMV with Duration

We explore three models. One is a straightforward application of the DMV with Backoff to words and
(quantized) word duration, and the other two are fully-generative variants. We also consider using words and POS tags as input to these models. Backoff models are given two streams of information, providing two of word identity, POS tag, or word duration for each observed token. We call one stream the "backoff" stream, and the other the "extra" stream. Backoff models learn a probability distribution conditioning on both streams, backing off to condition on only the backoff stream.

Our first words and duration model takes the duration as the extra stream and the word identity as the backoff stream, and, using $h_a$ to represent the acoustic information for the head, defines:

\[
\hat{P}_{\mathrm{choose}}(d_w \mid h_w, h_a, \mathit{dir}) = \lambda_{\mathrm{choose}}(\neg B \mid h_w, h_a, \mathit{dir})\, P_{\mathrm{choose}}(d_w \mid h_w, h_a, \mathit{dir}) + \lambda_{\mathrm{choose}}(B \mid h_w, h_a, \mathit{dir})\, P_{\mathrm{choose}}(d_w \mid h_w, \mathit{dir}) \tag{3}
\]

with an analogous backoff scheme for $P_{\mathrm{stop}}$. We will refer to this conditional model as "Cond." in our experiments. This equation is similar to Equation 1, except it uses words and duration instead of words and POS tags, and backs off to, not away from, words. We back off to the sparse words, rather than the less sparse duration, because duration provides almost no information about syntax in isolation.³

³ Preliminary dev-set experiments confirmed this intuition, as models that backed off to word duration performed poorly.

Directly modelling the extra stream among the dependents may allow us to capture selectional restrictions in POS and words models, or exploit effects of syntactic predictability on dependent duration. We therefore explore variants that generate both streams in the dependents. First, we examine a model ("Joint") that generates them jointly:

\[
\hat{P}_{\mathrm{choose}}(d_w, d_a \mid h_w, h_a, \mathit{dir}) = \lambda_{\mathrm{choose}}(\neg B \mid h_w, h_a, \mathit{dir})\, P_{\mathrm{choose}}(d_w, d_a \mid h_w, h_a, \mathit{dir}) + \lambda_{\mathrm{choose}}(B \mid h_w, h_a, \mathit{dir})\, P_{\mathrm{choose}}(d_w, d_a \mid h_w, \mathit{dir}) \tag{4}
\]

However, this joint model will have a very large state-space and may suffer from the same data sparsity, so we also explore a model ("Indep.") that generates the extra and backoff streams independently:

\[
\hat{P}_{\mathrm{choose}}(d_w, d_a \mid h_w, h_a, \mathit{dir}) = \lambda_{\mathrm{choose}}(\neg B \mid h_w, h_a, \mathit{dir})\, P^{\mathrm{backoff}}_{\mathrm{choose}}(d_w \mid h_w, h_a, \mathit{dir})\, P^{\mathrm{extra}}_{\mathrm{choose}}(d_a \mid h_w, h_a, \mathit{dir}) + \lambda_{\mathrm{choose}}(B \mid h_w, h_a, \mathit{dir})\, P^{\mathrm{backoff}}_{\mathrm{choose}}(d_w \mid h_w, \mathit{dir})\, P^{\mathrm{extra}}_{\mathrm{choose}}(d_a \mid h_w, \mathit{dir}) \tag{5}
\]

We also modified the DMV with Backoff to handle heavily lexicalized models. In Headden et al. (2009), arcs between words that never appear in the same sentence are given probability mass only by virtue of the backoff distribution to POS tags, which all appear in the same sentence at least once. We want to avoid relying on POS tags, and we also want to use held-out development and test sets to avoid implicitly overfitting the data when exploring different model structures. To this end, we add one extra $\alpha_{\mathrm{UNK}}$ hyperparameter to the Dirichlet prior of $P_{\mathrm{choose}}$ for each combination of conditioning events. This hyperparameter reserves probability mass for a head $h$ to take a word $d_w$ as a dependent if $h$ and $d_w$ never appeared together in the training data. The amount of probability mass reserved decreases as we see $h_w$ more often. This is implemented in training by adding $\alpha_{\mathrm{UNK}}$ to the denominator of the $P_{\mathrm{choose}}$ update equation for each $h$ and $\mathit{dir}$. At test time, if a word $d_w$ appears as an unseen dependent for head $h$, $h$ takes $d_w$ as a dependent with probability:

\[
\hat{P}_{\mathrm{choose}}(d_w \mid h, \mathit{dir}) = \frac{\exp\left(\psi\left(\alpha_{\mathrm{UNK}}\right)\right)}{\exp\left(\psi\left(\alpha_{\mathrm{UNK}} + \sum_c \left(E_{\mathrm{last}}(r_{c,h,\mathit{dir}}) + \alpha_{c,h,\mathit{dir}}\right)\right)\right)} \tag{6}
\]

Here, $h$ may be a word, (word, POS) pair, or (word, duration) pair. Since this event by definition never occurs in the training data, $\alpha_{\mathrm{UNK}}$ does not appear in the numerator during training.⁴

⁴ Note also that $\alpha_{\mathrm{UNK}}$ is different from a global UNK cutoff, which is imposed in preprocessing, and so affects every occurrence of an UNK'd word in the model. $\alpha_{\mathrm{UNK}}$ affects only dependents in $P_{\mathrm{choose}}$, and treats a dependent as UNK iff it did not occur on that particular side of that particular head word in any sentence. We used both global UNK cutoffs (optimized on the dev set) and these $\alpha_{\mathrm{UNK}}$ hyperparameters.

Finally, the conditional model ignores the extra stream in $P_{\mathrm{root}}$, and the generative models estimate $P_{\mathrm{root}}$ over both streams jointly and independently, respectively.

4 Experimental Setup

4.1 Datasets

We evaluate on three datasets: wsj10, sentences of length 10 or less from the Wall Street Journal portion of the Penn Treebank; swbdnxt10, sentences of length 10 or less from the Switchboard dataset of ADS used by Pate and Goldwater (2011); and brent, part of the Brent corpus of CDS (Brent and Siskind, 2001). Table 1 presents corpus statistics.

                          Train      Dev     Test
wsj10       Word tokens  42,505    1,765    2,571
            Word types    7,804      818    1,134
            Sentences     6,007      233      357
swbdnxt10   Word tokens  24,998    2,980    3,052
            Word types    2,647      760      767
            Sentences     3,998      488      491
brent       Word tokens  20,954    2,127    2,206
            Word types    1,390      482      488
            Sentences     6,249      424      449

Table 1: Statistics for our three corpora.

4.1.1 wsj10

We present a new evaluation of the DMV with Backoff on wsj10, which does not have any acoustic information, simply to verify that $\alpha_{\mathrm{UNK}}$ performs sensibly on a standard corpus. Additionally, Headden et al. (2009) use an intensive initializer that relies on dozens of random restarts, and so, strictly speaking, only show that the backoff technology is useful for good initializations. Our new evaluation will show that the backoff technology provides a substantial benefit even for harmonic initialization.

wsj10 was created in the standard way; all punctuation and traces were removed, and sentences containing more than ten tokens were discarded. For our fully lexicalized version of wsj10, all words were lowercased, and numbers were replaced with the token "NUMBER."⁵ Following standard practice, we used sections 2-21 for training, section 22 for development, and section 23 for test. wsj10 contains hand-annotated constituency parses, not dependency parses, so we used the standard "constituency-to-dependency" conversion tool of Johansson and Nugues (2007) to obtain high-quality CoNLL-style dependency parses.

⁵ Numbers were treated in this way only in wsj10.

4.1.2 swbdnxt10

Next, we evaluate on swbdnxt10, which contains all sentences up to length 10 from the same sections of the swbdnxt version of Switchboard used by Pate and Goldwater (2011). Short sentences are usually formulaic discourse responses (e.g. "oh ok"), so this dataset also excludes sentences shorter than three words. As our models successfully use word durations, this evaluation provides an important replication of the basic result from Pate and Goldwater (2011) with a different kind of syntactic model.

swbdnxt10 has a forced alignment of a dictionary-based phonetic transcription of each utterance to audio, providing our word duration information. As a very simple model of hyper-articulation and hypo-articulation, we classify a word as in the longest third duration, shortest third, or middle third. To minimize effects of word form, this classification was based on vowel count (counting a diphthong as one vowel): each word with n vowels is classified as in the shortest, longest, or middle tercile of duration among words with n vowels.

Like wsj10, swbdnxt10 is annotated only with constituency parses, so to provide approximate "gold-standard" dependencies, we used the same constituency-to-dependency conversion tool as for wsj10. We evaluated 200 randomly-selected sentences to check the accuracy of the conversion tool, which was designed for newspaper text. Excluding arcs involving words with no clear role in dependency structure (such as "um"), about 86% of the arcs were correct. While this rate is uncomfortably low, it is still much higher than unsupervised dependency parsers typically achieve, and so may provide a reasonable measure of relative dependency parse quality among competing systems.

4.1.3 brent

We also evaluated our models on the "Large Brent" dataset introduced in Rytting et al. (2010), a portion of the Brent corpus of child-directed speech (Brent and Siskind, 2001). We call this corpus brent. It consists of utterances from four of the mothers in Brent and Siskind's (2001) study, and, like
swbdnxt10, has a forced alignment from which we obtain duration terciles. Rytting et al. (2010) used a 90%/10% train/test partition. We extracted every ninth utterance from the original training partition to create a dev set, producing an 80%/10%/10% partition. We also separated clitics from their base word. This dataset only has 186 sentences longer than ten words, with a maximum length of 22 words, so we discarded only sentences shorter than three words from the evaluation sets.

The Brent corpus is distributed via CHILDES (MacWhinney, 2000) with automatic dependency annotations. However, these are not hand-corrected, and rely on a different tokenization of the dataset than is present on the transcription tier. To produce a reliable gold-standard,⁶ we annotated all sentences of length 2 or greater from the development and test sets with dependencies drawn from the Stanford Typed Dependency set (de Marneffe and Manning, 2008) using the annotation tool used for the Copenhagen Dependency Treebank (Kromann, 2003).

⁶ Available at http://homepages.inf.ed.ac.uk/s0930006/brentDep/

4.2 Parameters

In all experiments, hyperparameters for $P_{\mathrm{root}}$, $P_{\mathrm{stop}}$, and $P_{\mathrm{choose}}$ (and their backed-off distributions, and including $\alpha_{\mathrm{UNK}}$) were 1, $\alpha_B$ was 10, and $\alpha_{\neg B}$ was 1. VBEM was run on the training set until the data log-likelihood changed by less than 0.001%, and then the parameters were held fixed and used to obtain Viterbi parses for the evaluation sentences. Finally, we explored different global UNK cutoffs, replacing each word that appeared less than $c$ times with the token UNK. We ran each model for each $c \in \{0, 1, 25, 50, 100\}$, and picked the best-scoring $c$ on the development set for running on the test set and presentation here. We used a harmonic initializer similar to the one in Klein and Manning (2004).

4.3 Evaluation

In addition to evaluating the various incarnations of the DMV with backoff and input types, we compare to uniform branching baselines, the Common Cover Link (CCL) parser of Seginer (2007), and the Unsupervised Partial Parser (UPP) of Ponvert et al. (2011). The UPP produces a constituency parse from words and punctuation using a series of finite-state chunkers; we use the best-performing (Probabilistic Right Linear Grammar) version. The CCL parser produces a constituency parse using a novel "Cover Link" representation, scoring these links heuristically. Both CCL and UPP rely on punctuation (though according to Ponvert et al. (2011), UPP less so), which our input is missing. The left-headed "LH" (right-headed "RH") baseline assumes that each word takes the first word to its right (left) as a dependent, and corresponds to a uniform right-branching (left-branching) constituency baseline.

We evaluate the output of all models in terms of both constituency scores and dependency accuracy. Our wsj10 and swbdnxt10 corpora are originally annotated for constituency structure, with the dependency gold standard derived as described above, while our brent corpus is originally annotated for dependency structure, with the constituency gold standard derived by defining a constituent to span a head and each of its dependents (ignoring any one-word "constituents"). As the CCL and UPP parsers don't produce dependencies, only constituency scores are provided.

For constituency scores, we present the standard unlabeled Precision, Recall, and F-measure scores. For dependency scores, we present Directed attachment accuracy, Undirected attachment accuracy, and the "Neutral Edge Detection" (NED) score introduced by Schwartz et al. (2011). Directed attachment accuracy counts an arc as a true positive if it correctly identifies both a head and a dependent, whereas undirected attachment accuracy ignores arc direction in counting true positives. NED counts an arc as a true positive if it would be a true positive under the Undirected attachment score, or if the proposed head is the gold-standard grandparent of the proposed dependent. This avoids penalizing parses for flipping an arc, such as making determiners, rather than nouns, the head of noun phrases.

To assess statistical significance, we carried out stratified shuffling tests, with 10,000 random shuffles, for all measures. Tables indicate significant differences between the backoff models and the most competitive baseline model on that measure, indicated by an italic score. A star (∗) indicates p < 0.05, and a dagger (†) indicates p < 0.01. To see the direction of a significant difference (i.e. whether the backoff model is better or worse than the baseline), look to the scores themselves.

wsj10 (left half)
                                 Dependency                Constituency
Model     Input/variant   UNK    Dir.    Undir.  NED       P       R       F
EM        Wds             25     32.5    52.5    67.0      49.5    48.5    49.0
EM        POS             -      46.4    63.8    78.1      59.2    58.1    58.6
VB        Wds             25     29.4    52.4    70.5      51.3    52.6    52.0
VB        POS             -      43.5    61.9    77.3      59.7    57.1    58.4
Wds+POS   Cond.           50     49.9†   66.1†   79.6∗     64.2†   61.9†   63.0†
Wds+POS   Joint           50     46.0    63.7    79.0      62.0†   59.1    60.5∗
Wds+POS   Indep.          25     52.5†   68.0†   83.5†     63.5†   61.5†   62.5†
LH                        -      26.0    55.8    74.3      53.1    69.6    60.3
RH                        -      31.2    56.4    61.4      25.8    33.8    29.3
CCL                       -      -       -       -         50.8    40.7    45.2
UPP                       -      -       -       -         52.8    37.2    43.7

swbdnxt10 (right half)
                                 Dependency                Constituency
Model     Input/variant   UNK    Dir.    Undir.  NED       P       R       F
EM        Wds             25     30.6    50.9    66.8      45.4    47.1    46.3
EM        POS             -      53.0    65.0    76.8      52.5    52.9    52.7
VB        Wds             25     36.1    54.9    72.7      49.0    50.0    49.5
VB        POS             -      51.3    62.5    74.3      47.1    46.6    46.8
Wds+POS   Cond.           100    45.5†   62.4†   77.8      58.4†   58.9†   58.7†
Wds+POS   Joint           1      49.4†   63.7    79.6†     60.0†   52.9    56.3†
Wds+POS   Indep.          100    55.7†   65.8    74.6†     61.5†   57.9†   59.6†
LH                        -      24.1    50.8    72.7      60.8    82.5    70.0
RH                        -      29.2    52.0    57.9      22.2    30.1    25.5
CCL                       -      -       -       -         53.6    47.4    50.3
UPP                       -      -       -       -         60.0    46.6    52.4

Table 2: Performance on wsj10 and swbdnxt10 for models using words and POS tags only. Bold scores indicate the best performance of all models and baselines on that measure. †: significantly different from best non-uniform baseline (italics) by a stratified shuffling test, p < 0.01; ∗: p < 0.05.

5 Results

In all results, when a model sees only one kind of information, that is expressed by writing out the abbreviation for the relevant stream: "Wds" for words, "POS" for Part-Of-Speech, "Dur" for word duration. For baseline models that see two streams, the abbreviations are joined by a "×" symbol (as they treat input pairs as atoms drawn in the cross-product of the two streams' vocabulary). For the backoff models, the abbreviations are joined by a "+" symbol (as they combine the information sources with a weighted sum), with the "extra" stream name first.

5.1 Results: wsj10

The left half of Table 2 presents results on wsj10. For the baseline models, the first column with horizontal text indicates the input, while for the backoff (Wds+POS) models, the first column with horizontal text indicates whether and how the extra stream is modeled in dependents (as described in Section 3.3).

The EM model with POS input is largely a replication of the original DMV, differing in the use of separate train, dev, and test sets, and possibly the details of the harmonic initializer. Our replication achieves an undirected attachment score of 63.8 on the test set, similar to the score of 64.5 reported by Klein and Manning (2004) when training and evaluating on all of wsj10. Cohen et al. (2008) use the same train/dev/test partition that we do, and report a directed attachment score of 45.8, similar to our directed attachment score of 46.4.

The VB model which learns from POS tags does not outperform the EM model which learns from POS tags, suggesting that data sparsity does not hurt the DMV when using POS tags. As expected, the words-only models perform much worse than both the POS input models and the uniform LH baseline. VB does improve the words-only constituency performance.

The Cond. and Indep. backoff models outperform the POS-only baseline on all measures, but the Joint backoff model does not demonstrate a clear advantage over the POS-only baseline on any measure. The success of the Indep. model indicates that modelling dependent word identity does provide enough information to justify the increase in sparsity. The failure of the Joint model to provide a further improvement indicates that the extra information in the full joint over dependents does not justify the large increase in parameters. We also see that several models outperform the LH baseline on dependencies, but the advantage is much less in F-score, underscoring the loss of information in the conversion of dependencies to constituencies. Finally, all models outperform CCL and UPP on F-score, emphasizing their reliance on the punctuation we removed.
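The three dependency measures reported in these tables (defined in Section 4.3) can be sketched in a few lines of Python. This is an illustrative sketch rather than the evaluation code used for the paper: the function name `attachment_scores`, the `ROOT` marker, and the head-index encoding are assumptions made for the example, and the choice to score the root word like any other word and to apply the grandparent rule only to real-word heads is a simplification.

```python
from typing import Dict, List

ROOT = -1  # hypothetical marker meaning "this word has no head" (it is the root)


def attachment_scores(gold: List[int], pred: List[int]) -> Dict[str, float]:
    """Directed, undirected, and NED accuracy for one sentence.

    gold[i] and pred[i] give the index of the head of word i, or ROOT.
    Assumption: every word, including the root, counts as one scored arc.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    directed = undirected = ned = 0
    for dep, head in enumerate(pred):
        # Directed: the proposed head is exactly the gold head.
        dir_ok = head == gold[dep]
        # Undirected: also accept the gold arc with its direction flipped,
        # i.e. the proposed head is a gold dependent of this word.
        flipped = head != ROOT and gold[head] == dep
        undir_ok = dir_ok or flipped
        # NED: additionally accept the gold grandparent as the proposed head.
        gold_parent = gold[dep]
        grandparent_ok = (
            head != ROOT and gold_parent != ROOT and gold[gold_parent] == head
        )
        directed += dir_ok
        undirected += undir_ok
        ned += undir_ok or grandparent_ok

    n = len(gold)
    return {"directed": directed / n, "undirected": undirected / n, "ned": ned / n}


# Toy sentence "the dog barks" with gold arcs barks -> dog -> the.
gold_heads = [1, 2, ROOT]
# Proposed parse makes the determiner the head of the noun: barks -> the -> dog.
pred_heads = [2, 0, ROOT]
scores = attachment_scores(gold_heads, pred_heads)
```

In this toy case only the root is right under the directed measure, the flipped determiner arc is additionally credited under the undirected measure, and NED also credits attaching "the" to its gold grandparent "barks", which is the behaviour Section 4.3 describes for determiner-headed noun phrases.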
                                 Dependency                Constituency
Model     Input/variant   UNK    Dir.    Undir.  NED       P       R       F
EM        Wds             25     30.6    50.9    66.8      45.4    47.1    46.3
EM        Wds×Dur         25     26.1    46.5    62.0      45.6    48.7    47.1
VB        Wds             25     36.4    55.1    73.0      49.1    50.0    49.6
VB        Wds×Dur         25     31.8    51.7    71.3      49.2    55.9    52.3
Dur+Wds   Cond.           25     32.6†   55.1    74.5†     59.1†   71.4†   64.7†
Dur+Wds   Joint           50     31.8†   51.8†   70.8∗     54.4†   60.5†   57.3†
Dur+Wds   Indep.          50     40.3†   59.1†   76.0†     56.1†   61.7†   58.8†
LH                        -      24.1    50.8    72.7      60.8    82.5    70.0
RH                        -      29.2    52.0    57.9      22.2    30.1    25.5
CCL                       -      -       -       -         53.6    47.4    50.3
UPP                       -      -       -       -         60.0    46.6    52.4

[Scatterplot: "Switchboard Model Performance". Axes: Undirected Attachment Score and Constituency F-score. Points: Wds, Wds×Dur, Cond., Joint, Indep., LH.]

Table 3: Performance on swbdnxt10 for models using words and duration. The scatterplot includes a subset of the information in the table: F-score and undirected attachment accuracy for backoff models and VB and LH baseline. Bold, italics, and significance annotations as in Table 2.

5.2 Results: swbdnxt10

The right half of Table 2 presents performance figures on swbdnxt10 for input involving words and POS tags. As expected, the EM and VB baselines perform best when learning from gold-standard POS tags, and we again see no benefit for the VB POS-only model compared to the EM POS-only model. The POS-only baselines far outperform the uniform-attachment baselines on the dependency measures; to our knowledge this is the first demonstration outside the newspaper domain that the DMV outperforms a uniform branching strategy on these measures.

The other comparisons among systems listed in Table 2 are largely inconclusive. Models do comparatively well on either the constituency or dependency evaluation, but not both. The backoff models outperform the baseline POS-only models in the constituency evaluation, but underperform or match those same models in the dependency evaluation. Conversely, most models outperform the LH baseline in the dependency evaluation, but not in the constituency evaluation. There are probably two causes for the ambiguity in these results. First, the noise in the dependency gold-standard may have overwhelmed any advantage from backoff. Second, as we saw with wsj10, the conversion from dependencies to constituencies removes information, which may explain the failure of any model to outperform the LH baseline in the constituency evaluation.

Table 3 presents performance figures on swbdnxt10 for input involving words and duration, including a scatterplot of Undirected attachment against constituency F-score for the interesting comparisons. In the scatterplot, models up and to the right performed better, and we see that the negative correlation between the dependency and constituency evaluations persists in words and duration input. VB substantially outperforms EM in the baselines, indicating that good smoothing is helpful when learning from words. Other comparisons are again ambiguous; the dependency evaluation is noisy, and backoff models outperform baseline models on the constituency evaluation but not the LH baseline. Still, the backoff models outperform all words-only baselines in constituency score, with two performing slightly worse in dependency score and one performing much better. So there is some evidence that word duration is useful, but we will find clearer evidence on the brent corpus.

5.3 Results: brent

Table 4 presents results on the brent dataset. VB is even more effective than in the other datasets for improving performance among baseline models, leading to double-digit improvements on some measures. Moreover, the best dev-set UNK cutoff drops to 1 for all VB models, indicating that, on this dataset, VB provides good smoothing even in models without backoff. This difference between datasets is likely related to differences in vocabulary diversity; the
type:token ratio in the brent training set is about 1:15, compared to 1:5 and 1:9 in the wsj10 and swbdnxt10 training sets, respectively.

More importantly for our main hypothesis, all three backoff models using words and duration outperform the words-only baselines (including CCL and UPP) on all dependency measures (the most accurate measures on this corpus, which has hand-annotated dependencies), and the Cond. model also wins on F-score.

                         Dependency              Constituency
Model            UNK   Dir.    Undir.  NED     P       R       F
EM Wds           25    36.9    56.3    70.7    52.4    69.5    59.8
EM Wds×Dur       25    31.3    51.1    66.9    50.7    64.7    56.9
VB Wds           1     51.2    64.2    77.3    63.3    68.1    66.0
VB Wds×Dur       1     47.0    60.5    74.0    66.2    64.9    65.5
Dur+Wds Cond.    1     53.1*   65.5*   78.7*   65.4    68.6    67.0*
Dur+Wds Joint    1     50.7    63.0    76.3    65.6    65.4†   65.5
Dur+Wds Indep.   1     53.2    66.7†   79.6†   61.5†   67.9    64.5
LH               -     28.3    53.6    78.3    47.9    85.6    61.4
RH               -     27.2    48.8    61.1    26.2    46.8    33.6
CCL              -     -       -       -       41.7    58.8    48.8
UPP              -     -       -       -       56.8    63.8    60.1

[Figure: scatterplot "Brent Model Performance"; axes: Undirected Attachment Score and Constituency F-score; legend: Wds, Wds×Dur, Cond., Joint, Indep., LH.]

Table 4: Performance on brent for models using words and duration. The scatterplot includes a subset of the information in the table: F-score and undirected attachment accuracy for backoff models and VB and LH baseline. Bold, italics, and significance annotations as in Table 2.

6 Conclusion

In this paper, we showed how to use the DMV with Backoff and two fully-generative variants to explore the utility of word duration in fully lexicalized unsupervised dependency parsing. Although other researchers have incorporated features beyond words and POS tags into DMV-like models (e.g., semantics: Naseem and Barzilay (2011); morphology: Berg-Kirkpatrick et al. (2009)), we believe this is the first example based on Headden et al. (2009)'s backoff method. As far as we know, our work is also the first test of a DMV-based model on transcribed conversational speech and the first to outperform uniform-branching baselines without using either POS tags or punctuation in the input. Our results show that fully-lexicalized models can do well if they are smoothed properly and exploit multiple cues.

Our experiments also suggest that CDS is especially easy to learn from. Model performance on the brent dataset was generally higher than on swbdnxt10, with a much lower UNK threshold. This latter point, and the fact that brent has a much lower word type/token ratio than the other datasets, suggest that CDS provides more and clearer evidence about words' syntactic behavior.

Finally, our results provide more evidence, using a different, more powerful syntactic model than that of Pate and Goldwater (2011), that word duration is a useful cue for unsupervised parsing. We found that several ways of incorporating duration were useful, although the extra sparsity of Joint emissions was not justified in any of our investigations. Our results are consistent with both the prosodic and predictability bootstrapping hypotheses of language acquisition, providing the first computational support for these using a full syntactic parsing model and tested on child-directed speech. While our models do not provide a mechanistic account of how children might use duration information to help with learning syntax, they do show that this information is useful in principle, even without any knowledge of latent prosodic structure or its relationship to syntax. In addition, our results suggest it may be useful to explore using word duration to enrich NLP tasks in speech-related technologies, such as syntactically-inspired language models for text-to-speech generation. In the future, we also hope to investigate why duration is helpful, designing experiments to tease apart the role of prosody and predictability in learning syntax.
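The two corpus statistics referred to above, the word type/token ratio and the UNK cutoff, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the toy utterances, and in particular the rule that a word seen fewer than `cutoff` times is replaced by "UNK". The exact preprocessing used in the experiments is not restated in this section and may differ.

```python
from collections import Counter


def type_token_ratio(utterances):
    """Return (number of distinct word types) / (number of word tokens)."""
    tokens = [word for utt in utterances for word in utt]
    return len(set(tokens)) / len(tokens)


def apply_unk_cutoff(utterances, cutoff):
    """Replace every word seen fewer than `cutoff` times with "UNK".

    Assumption: "fewer than" is only one plausible reading of a cutoff;
    the rule used in the reported experiments may be different.
    """
    counts = Counter(word for utt in utterances for word in utt)
    return [
        [word if counts[word] >= cutoff else "UNK" for word in utt]
        for utt in utterances
    ]


# Hypothetical toy corpus (not drawn from brent, swbdnxt10, or wsj10).
corpus = [
    ["you", "want", "the", "ball"],
    ["you", "want", "the", "doggie"],
    ["look", "at", "the", "ball"],
]

ratio = type_token_ratio(corpus)            # 7 types over 12 tokens
unked = apply_unk_cutoff(corpus, cutoff=2)  # singleton words become "UNK"
```

Under this reading, a cutoff of 1 leaves the corpus unchanged, since every observed word occurs at least once, while higher cutoffs fold rare words into a single UNK type and so lower the type/token ratio.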
References

Matthew Aylett and Alice Turk. 2004. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1):31–56.
Mary Beckman and Janet Pierrehumbert. 1986. Intonational structure in Japanese and English. Phonology Yearbook, 3:255–309.
Alan Bell, Jason M Brenier, Michelle Gregory, Cynthia Girand, and Dan Jurafsky. 2009. Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language, 60:92–111.
Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2009. Painless unsupervised learning with features. In Proceedings of NAACL.
Michael R Brent and Jeffrey M Siskind. 2001. The role of exposure to isolated words in early vocabulary development. Cognition, 81:31–44.
Christos Christodoulopoulos, Sharon Goldwater, and Mark Steedman. 2012. Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction. In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure, pages 96–99, Montréal, Canada, June. Association for Computational Linguistics.
Shay B Cohen and Noah A Smith. 2009. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of NAACL.
Shay B Cohen, Kevin Gimpel, and Noah A Smith. 2008. Logistic normal priors for unsupervised probabilistic grammar induction. In Advances in Neural Information Processing Systems 22.
Shay B Cohen, Dipanjan Das, and Noah A Smith. 2011. Unsupervised structure prediction with non-parallel multilingual guidance. In Proceedings of EMNLP.
Marie-Catherine de Marneffe and Christopher D Manning. 2008. Stanford typed dependencies manual. Technical report.
Markus Dreyer and Izhak Shafran. 2007. Exploiting prosody for PCFGs with latent annotations. In Proceedings of Interspeech, Antwerp, Belgium, August.
Susanne Gahl and Susan M Garnsey. 2004. Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language, 80:748–775.
Susanne Gahl, Susan M Garnsey, Cynthia Fisher, and Laura Matzen. 2006. "That sounds unlikely": Syntactic probabilities affect pronunciation. In Proceedings of the 27th meeting of the Cognitive Science Society.
Lila Gleitman and Eric Wanner. 1982. Language acquisition: The state of the art. In Eric Wanner and Lila Gleitman, editors, Language acquisition: The state of the art, pages 3–48. Cambridge University Press, Cambridge, UK.
Will Headden, Mark Johnson, and David McClosky. 2009. Improved unsupervised dependency parsing with richer contexts and smoothing. In Proceedings of NAACL-HLT.
Zhongqiang Huang and Mary Harper. 2010. Appropriately handled prosodic breaks help PCFG parsing. In Proceedings of NAACL-HLT, pages 37–45, Los Angeles, California, June. Association for Computational Linguistics.
Richard Johansson and Pierre Nugues. 2007. Extended constituent-to-dependency conversion for English. In Proceedings of NODALIDA 2007.
Mark Johnson. 2007. Why doesn't EM find good HMM POS-taggers. In Proceedings of EMNLP-CoNLL, pages 296–305.
Jeremy G Kahn, Matthew Lease, Eugene Charniak, Mark Johnson, and Mari Ostendorf. 2005. Effective use of prosody in parsing conversational speech. In Proceedings of HLT-EMNLP, pages 233–240.
Dan Klein and Christopher D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of ACL, pages 479–486.
Matthias Trautner Kromann. 2003. The Danish Dependency Treebank and the DTAG treebank tool. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories, pages 217–220.
Kenichi Kurihara and Taisuke Sato. 2006. Variational Bayesian grammar induction for natural language. In Proceedings of the International Colloquium on Grammatical Inference, pages 84–96.
Brian MacWhinney. 2000. The CHILDES project: Tools for analyzing talk. Lawrence Erlbaum Associates, Mahwah, NJ, third edition.
Séverine Millotte, Roger Wales, and Anne Christophe. 2007. Phrasal prosody disambiguates syntax. Language and Cognitive Processes, 22(6):898–909.
James L Morgan, Richard P Meier, and Elissa L Newport. 1987. Structural packaging in the input to language learning: contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19:498–550.
Tahira Naseem and Regina Barzilay. 2011. Using semantic cues to learn syntax. In Proceedings of AAAI.
John K Pate and Sharon Goldwater. 2011. Unsupervised syntactic chunking with acoustic cues: computational models for prosodic bootstrapping. In Proceedings of the 2nd ACL workshop on Cognitive Modeling and Computational Linguistics.
Elias Ponvert, Jason Baldridge, and Katrin Erk. 2011. Simple unsupervised grammar induction from raw text with cascaded finite state models. In Proceedings of ACL-HLT.
Patti J Price, Mari Ostendorf, Stefanie Shattuck-Hufnagel, and Cynthia Fong. 1991. The use of prosody in syntactic disambiguation. In Proceedings of the HLT workshop on Speech and Natural Language, pages 372–377, Morristown, NJ, USA. Association for Computational Linguistics.
C Anton Rytting, Chris Brew, and Eric Fosler-Lussier. 2010. Segmenting words from natural speech: subsegmental variation in segmental cues. Journal of Child Language, 37(3):513–543.
Roy Schwartz, Omri Abend, Roi Reichart, and Ari Rappoport. 2011. Neutralizing linguistically problematic annotations in unsupervised dependency parsing evaluation. In Proceedings of the 49th ACL, pages 663–672.
Yoav Seginer. 2007. Fast unsupervised incremental parsing. In Proceedings of ACL.
Amanda Seidl. 2007. Infants' use and weighting of prosodic cues in clause segmentation. Journal of Memory and Language, 57(1):24–48.
Valentin I Spitkovsky, Hiyan Alshawi, Angel X Chang, and Daniel Jurafsky. 2011a. Unsupervised dependency parsing without gold part-of-speech tags. In Proceedings of EMNLP.
Valentin I Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2011b. Punctuation: Making a point in unsupervised dependency parsing. In Proceedings of CoNLL.
Harry Tily, Susanne Gahl, Inbal Arnon, Neal Snider, Anubha Kothari, and Joan Bresnan. 2009. Syntactic probabilities affect pronunciation variation in spontaneous speech. Language and Cognition, 1(2):147–165.