Transactions of the Association for Computational Linguistics, 2 (2014) 131–142. Action Editor: Joakim Nivre.
Submitted 11/2013; Revised 2/2014; Published 4/2014. ©2014 Association for Computational Linguistics.
Joint Incremental Disfluency Detection and Dependency Parsing

Matthew Honnibal
Department of Computing, Macquarie University
Sydney, Australia
matthew.honnibal@mq.edu.au

Mark Johnson
Department of Computing, Macquarie University
Sydney, Australia
mark.johnson@mq.edu.au

Abstract

We present an incremental dependency parsing model that jointly performs disfluency detection. The model handles speech repairs using a novel non-monotonic transition system, and includes several novel classes of features. For comparison, we evaluated two pipeline systems, using state-of-the-art disfluency detectors. The joint model performed better on both tasks, with a parse accuracy of 90.5% and 84.0% accuracy at disfluency detection. The model runs in expected linear time, and processes over 550 tokens a second.

1 Introduction

Most unscripted speech contains filled pauses (ums and uhs), and errors that are usually edited on-the-fly by the speaker. Disfluency detection is the task of detecting these infelicities in spoken language transcripts. The task has some immediate value, as disfluencies have been shown to make speech recognition output much more difficult to read (Jones et al., 2003), but has also been motivated as a module in a natural language understanding pipeline, because disfluencies have proven problematic for PCFG parsing models.

Instead of a pipeline approach, we build on recent work in transition-based dependency parsing, to perform the two tasks jointly. There have been two small studies of dependency parsing on unscripted speech, both using entirely greedy parsing strategies, without a direct comparison against a pipeline architecture (Jorgensen, 2007; Rasooli and Tetreault, 2013). We go substantially beyond these pilot studies, and present a system that compares favourably to a pipeline consisting of state-of-the-art components.

Our parser largely follows the design of Zhang and Clark (2011). We use a structured averaged perceptron model with beam-search decoding (Collins, 2002). Our feature set is based on Zhang and Clark (2011), and our transition-system is based on the arc-eager system of Nivre (2003).

We extend the transition system with a novel non-monotonic transition, Edit. It allows sentences like 'Pass the pepper uh salt' to be parsed incrementally, without the need to guess early that pepper is disfluent. This is achieved by re-processing the leftward children of the word Edit marks as disfluent. For instance, if the parser attaches the to pepper, but subsequently marks pepper as disfluent, the will be returned to the stack.

We also exploit the ease with which the model can incorporate arbitrary features, and design a set of features that capture the 'rough copy' structure of some speech repairs, which motivated the Johnson and Charniak (2004) noisy channel model.

Our main comparison is against two pipeline systems, which use the two current state-of-the-art disfluency detection systems as pre-processors to our parser, minus the custom disfluency features and transition. The joint model compared favourably to the pipeline parsers at both tasks, with an unlabelled attachment score of 90.5%, and 84.0% accuracy at detecting speech repairs. An efficient implementation is available under an open-source license.¹ The future prospects of the system are also quite promising. Because the parser is incremental, it should be well suited to unsegmented text such as the output of a speech-recognition system. We consider our main contributions to be:

• a novel non-monotonic transition system, for speech repairs and restarts,

¹http://github.com/syllog1sm/redshift
• several novel feature classes,

• direct comparison against the two best disfluency pre-processors, and

• state-of-the-art accuracy for both speech parsing and disfluency detection.

Figure 1: 'A flight to um [FP] Boston [RM] I mean [IM] Denver [RP] Tuesday.' A sentence with disfluencies annotated in the style of Shriberg (1994) and the Switchboard corpus. FP = Filled Pause, RM = Reparandum, IM = Interregnum, RP = Repair. We follow previous work in evaluating the system on the accuracy with which it identifies speech-repairs, marked reparandum above.

2 Switchboard Disfluency Annotations

The Switchboard portion of the Penn Treebank (Marcus et al., 1993) consists of telephone conversations between strangers about an assigned topic. Two annotation layers are provided: one for syntactic bracketing (MRG files), and one for disfluencies (DPS files). The disfluency layer marks elements with little or no syntactic function, such as filled pauses and discourse markers, and annotates speech repairs using the Shriberg (1994) system of reparandum/interregnum/repair. An example is shown in Figure 1.

In the syntactic annotation, edited words are covered by a special node labelled EDITED. The idea is to mark text which, if excised, would result in a grammatical sentence. The MRG files do not mark other types of disfluencies. We follow the evaluation defined by Charniak and Johnson (2001), which evaluates the accuracy of identifying speech repairs and restarts. This definition of the task is the standard in recent work. The reason for this is that filled pauses can be detected using a simple rule-based approach, and parentheticals have less impact on readability and down-stream processing accuracy.

The MRG and DPS layers have high but imperfect agreement over what tokens they mark as speech repairs: of the text annotated with both layers, 33,720 tokens are marked as disfluent in at least one layer, 32,310 are marked as disfluent by the DPS files, and 32,742 are marked as disfluent by the MRG layer.

The Switchboard annotation project was not fully completed. Because disfluency annotation is cheaper to produce, many of the DPS training files do not have matching MRG files. Only 619,236 of the 1,482,845 tokens in the DPS disfluency-detection training data have gold-standard syntactic parses. Our system requires the more expensive syntactic annotation, but we find that it out-performs the previous state-of-the-art (Qian and Liu, 2013), despite training on less than half the data.

2.1 Dependency Conversion

As is standard in statistical dependency parsing of English, we acquire our gold-standard dependencies from phrase-structure trees. We used the 2013-04-05 version of the Stanford dependency converter (de Marneffe et al., 2006). As is standard for English dependency parsing, we use the Basic Dependencies scheme, which produces strictly projective representations.

At first we feared that the filled pauses, disfluencies and meta-data tokens in the Switchboard corpus might disrupt the conversion process, by making it more difficult for the converter to recognise the underlying production rules. To test this, we performed a small experiment.

We prepared two versions of the corpus: one where EDITED nodes, filled pauses and meta-data were removed before the trees were transformed by the Stanford converter, and one where the disfluency removal was performed after the dependency conversion. The resulting corpora were largely identical: 99.54% of unlabelled and 98.7% of labelled dependencies were the same. The fact that the Stanford converter is quite robust to disfluencies was useful for our baseline joint model, which is trained on dependency trees that also included governors for disfluent words.

We follow previous work on disfluency detection by lower-casing the text and removing punctuation and partial words (words tagged XX and words ending in '-'). We also remove one-token sentences, as their syntactic analyses are trivial.

We found that two additional simple pre-processes improved our results: discarding all 'um' and 'uh' tokens; and merging 'you know' and 'i mean' into single tokens. These pre-processes can be completed on the input string without losing information: none of the 'um' or 'uh' tokens are semantically significant, and the bigrams you know and i mean have a dependency between the two tokens over 99.9% of the times they occur in the treebank, with you and I never having any children. This makes it easy to unmerge the tokens deterministically after parsing:
all incoming and outgoing arcs will point to know or mean. The same pre-processing was performed for all our parsing systems.
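As a concrete illustration, the following sketch implements this pre-processing on a token list. It is our own rendering, not the released implementation; the function and constant names are invented for the example.

```python
# Sketch of the reversible pre-processing described above (our own
# naming, not the released parser's API).

FILLED_PAUSES = {"um", "uh"}
MERGES = {("you", "know"): "you_know", ("i", "mean"): "i_mean"}

def preprocess(tokens):
    """Lower-case, drop filled pauses, merge 'you know' and 'i mean'."""
    tokens = [t.lower() for t in tokens if t.lower() not in FILLED_PAUSES]
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) in MERGES:
            out.append(MERGES[tuple(tokens[i:i + 2])])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def unmerge(tokens):
    """Undo the merges after parsing. Arcs pointing at a merged token
    are re-pointed at 'know' or 'mean', which head the bigram in over
    99.9% of treebank occurrences (arc bookkeeping not shown)."""
    merged = set(MERGES.values())
    out = []
    for t in tokens:
        out.extend(t.split("_") if t in merged else [t])
    return out

# preprocess("i mean you know it was um fine".split())
# -> ['i_mean', 'you_know', 'it', 'was', 'fine']
```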
3 Transition-based Dependency Parsing

A transition-based parser predicts the syntactic structure of a sentence incrementally, by making a sequence of classification decisions. We follow the architecture of Zhang and Clark (2011), who use beam-search for decoding, and a structured averaged perceptron for training. Despite its simplicity, this type of parser has produced highly competitive results on the Wall Street Journal: with the extended feature set described by Zhang and Nivre (2011), it achieves 93.5% unlabelled accuracy on Stanford basic dependencies (de Marneffe et al., 2006). Converting the constituency trees produced by the Charniak and Johnson (2005) reranking parser results in similar accuracy.

Briefly, the transition-based parser consists of a configuration (or 'state') which is sequentially manipulated by a set of possible transitions. For us, a state is a 4-tuple c = (σ, β, A, D), where σ and β are disjoint sets of word indices termed the stack and buffer respectively, A is the set of dependency arcs, and D is the set of word indices marked disfluent. There are no arcs to or from members of D, so the dependencies and disfluencies can be implemented as a single vector (in our parser, a token is marked as disfluent by setting it as its own head).

We use the arc-eager transition system (Nivre, 2003, 2008), which consists of four parsing actions: Shift, Left-Arc, Right-Arc and Reduce. We denote the stack with its topmost element to the right, and the buffer with its first element to the left. A vertical bar is used to indicate concatenation to the stack or buffer, e.g. σ|i indicates a stack with the topmost element i and remaining elements σ. A dependency from a governor i to a child j is denoted i→j. The four arc-eager transitions are shown in Figure 2.

The Shift action moves the first item of the buffer onto the stack. The Right-Arc does the same, but also adds an arc, so that the top two items on the stack are connected. The Reduce move and the Left-Arc both pop the stack, but the Left-Arc first adds an arc from the first word of the buffer to the word on top of the stack. Constraints on the Reduce and Left-Arc moves ensure that every word is assigned exactly one head in the final configuration.

Shift (S):     (σ, i|β, A, D) ⇒ (σ|i, β, A, D)
Left-Arc (L):  (σ|i, j|β, A, D) ⇒ (σ, j|β, A ∪ {j→i}, D)    Only if i does not have an incoming arc.
Right-Arc (R): (σ|i, j|β, A, D) ⇒ (σ|i|j, β, A ∪ {i→j}, D)
Reduce (D):    (σ|i, β, A, D) ⇒ (σ, β, A, D)                Only if i has an incoming arc.
Edit (E):      (σ|i, j|β, A, D) ⇒ (σ|[x1, ..., xn], j|β, A′, D′)
               where A′ = A \ {x→y or y→x : ∀x ∈ [i, j), ∀y ∈ N},
               D′ = D ∪ [i, j), and x1 ... xn are the former left children of i.

Figure 2: Our parser's transition system. The first four transitions are the standard arc-eager system; the fifth is our novel Edit transition.

We follow the suggestion of Ballesteros and Nivre (2013) and add a dummy token that governs root dependencies to the end of the sentence. Parsing terminates when this token is at the start of the buffer, and the stack is empty. Disfluencies are added to D via the Edit transition, E, which we now define.
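To make the transition system concrete, here is a minimal Python sketch of the state and the five transitions in Figure 2. It is our illustration, not the authors' released code: pre-condition checking and the dummy ROOT token are omitted, and heads are stored in a single vector with a disfluent token marked by making it its own head, as described above.

```python
class State:
    """Parser configuration (σ, β, A, D). heads[i] == i marks i disfluent."""
    def __init__(self, n):
        self.stack = []
        self.buffer = list(range(n))        # buffer[0] is the front
        self.heads = [None] * n             # encodes both A and D
        self.left_children = [[] for _ in range(n)]

def shift(s):                               # (σ, i|β) => (σ|i, β)
    s.stack.append(s.buffer.pop(0))

def right_arc(s):                           # adds i -> j, pushes j
    child = s.buffer.pop(0)
    s.heads[child] = s.stack[-1]
    s.stack.append(child)

def left_arc(s):                            # adds j -> i, pops i
    top = s.stack.pop()                     # pre-condition: no head yet
    s.heads[top] = s.buffer[0]
    s.left_children[s.buffer[0]].append(top)

def reduce_(s):                             # pops i; pre-condition: has head
    s.stack.pop()

def edit(s):
    """Mark [i, j) disfluent, delete arcs to and from those words, and
    return i's former leftward children to the stack."""
    i, j = s.stack.pop(), s.buffer[0]
    for w in range(i, j):                   # i and its rightward
        s.heads[w] = w                      # descendants become disfluent
    for w, h in enumerate(s.heads):         # delete arcs into the
        if h is not None and h != w and s.heads[h] == h:
            s.heads[w] = None               # now-disfluent words
    s.stack.extend(sorted(s.left_children[i]))  # re-process left children
    s.left_children[i] = []
```

Because disfluency is encoded as heads[i] == i, the set D never needs separate storage, mirroring the single-vector representation the paper describes.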
4 A Non-Monotonic Edit Transition

One of the reasons disfluent sentences are hard to parse is that there often appear to be syntactic relationships between words in the reparandum and the fluent sentence. When these relations are considered in addition to the dependencies between fluent words, the resulting structure is not necessarily a projective tree.

Figure 3 shows a simple example, where the repair square replaces the reparandum rectangle. An incremental parser could easily become 'garden-pathed' and attach the repair square to the preceding words, constructing the dependencies shown dotted in Figure 3. Rather than attempting to devise an incremental model that avoids constructing such dependencies, we allow the parser to construct these dependencies and later delete them if the governor or child are marked disfluent.

Figure 3: 'Pass me the red rectangle uh I mean square.' Example where apparent dependencies between the reparandum and the fluent sentence complicate parsing. The dotted edges are difficult for an incremental parser to avoid, but cannot be part of the final parse if it is to be a projective tree. Our solution is to make the transition system non-monotonic: the parser is able to delete edges.

Psycholinguistic models of human sentence processing have long posited repair mechanisms (Frazier and Rayner, 1982). Recently, Honnibal et al. (2013) showed that a limited amount of 'non-monotonic' behaviour can improve an incremental parser's accuracy. We here introduce a non-monotonic transition, Edit, for speech repairs.

The Edit transition marks the word i on top of the stack σ|i as disfluent, along with its rightward descendents, i.e., all words in the sequence i ... j−1, where j is the word at the start of the buffer. It then restores the words both preceding and formerly governed by i to the stack. In other words, the word on top of the stack and its rightward descendents are all marked as disfluent, and the stack is popped. We then restore its leftward children to the stack, and all dependencies to and from words marked disfluent are deleted. The transition is non-monotonic in the sense that it can delete dependencies created by a previous transition, and replace tokens onto the stack that had been popped.

Why revisit the leftward children, but not the right? We are concerned about dependencies which might be mirrored between the reparandum and the repair. The rightward subtree of the disfluency might well be incorrect, but if it is, it would still be incorrect if the word on top of the stack were actually fluent. We therefore regard these as parsing errors that we will train our model to avoid. In contrast, avoiding the Left-Arc transitions would require the parser to predict that the head is disfluent when it has not necessarily seen any evidence indicating that.

4.1 Worked Example

Figure 4 shows a gold-standard derivation for a disfluent sentence from the development data. Line 1 shows the state resulting from the initial Shift action. In the next three states, His is Left-Arced to company, which is then Shifted onto the stack, and Left-Arced to went in Line 4.

The dependency between went and company is not part of the gold-standard, because went is disfluent. The correct governor of company is the second went in the sentence. The Left-Arc move in Line 4 can still be considered correct, however, because the gold-standard analysis is still derivable from the resulting configuration, via the Edit transition. Another non-gold dependency is created in Line 6, between broke and went, before broke is Reduced from the stack in Line 7.

Lines 9 and 10 show the states before and after the Edit transition. The word on top of the stack in Line 9, went, has one leftward child, and one rightward child.

Figure 4: A gold-standard transition sequence using our Edit transition, for the sentence 'His company went broke i mean went bankrupt'. Each line specifies an action and shows the state resulting from it. Words on the stack are circled, and the arrow indicates the start of the buffer. Disfluent words are struck-through. [Action sequence: 1 S, 2 L, 3 S, 4 L, 5 S, 6 R, 7 D, 8 S, 9 L, 10 E (the first went and broke are struck through from here on), 11 L, 12 S, 12 R, 13 D.]

After the Edit transition is applied, went and its rightward child broke are both marked disfluent, and company is returned to the stack. All of the previous dependencies to and from went and broke are deleted.

Parsing then proceeds as normal, with the correct governor of company being assigned by the Left-Arc in Line 11, and bankrupt being Right-Arced to went in Line 12. To conserve space, we have omitted the dummy ROOT token, which is placed at the end of the sentence, following the suggestion of Ballesteros and Nivre (2013). The final action will be a Left-Arc from the ROOT token to went.
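Running the sketch from Section 3 on the introduction's 'Pass the pepper uh salt' example (with 'uh' already removed by pre-processing) shows the same non-monotonic behaviour in miniature. The 0-based indices and the omission of the ROOT token are assumptions of the sketch, not the paper's presentation.

```python
# pass=0 the=1 pepper=2 salt=3  ('uh' removed in pre-processing)
s = State(4)
shift(s)                    # push 'pass'
shift(s)                    # push 'the'
left_arc(s)                 # the <- pepper: the garden-path attachment
shift(s)                    # push 'pepper'
edit(s)                     # pepper marked disfluent, its arcs deleted
assert s.heads[2] == 2      # pepper is its own head, i.e. disfluent
assert s.heads[1] is None   # the arc to 'the' is gone...
assert s.stack == [0, 1]    # ...and 'the' is back on the stack
left_arc(s)                 # the <- salt
right_arc(s)                # pass -> salt
```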
4.2 Dynamic Oracle Training Algorithm

Our non-monotonic transition system introduces substantial spurious ambiguity: the gold-standard parse can be derived via many different transition sequences. Recent work has shown that this can be advantageous (Sartorio et al., 2013; Honnibal et al., 2013; Goldberg and Nivre, 2012), because difficult decisions can sometimes be delayed until more information is available.

Line 5 of Figure 4 shows a state that introduces spurious ambiguity. From this configuration, there are multiple actions that could be considered 'correct', in the sense that the gold-standard analysis can be derived from them. The Edit transition is correct because went is disfluent, but the Left-Arc and even the Right-Arc are also correct, in that there are continuations from them that lead to the gold-standard analysis.

We regard all transition sequences that can result in the correct analysis as equally valid, and want to avoid stipulating one of them during training. We achieve this by following Goldberg and Nivre (2012) in using a dynamic oracle to create partially labelled training data.² A dynamic oracle is a function that determines the cost of applying an action to a state, in terms of gold-standard arcs that are newly unreachable.

We follow Collins (2002) in training an averaged perceptron model to predict transition sequences, rather than individual transitions. This type of model is often referred to as a structured perceptron, or sometimes a global perceptron. During training, if the model does not predict the correct sequence, an update is performed, based on the gold-standard sequence and part of the sequence predicted by the current weights. Only part of the sequence is used to calculate the weight update, in order to account for search errors. We use the maximum violation strategy described by Huang et al. (2012) to select the subsequence to update from.

To train our model using the dynamic oracle, we use the latent-variable structured perceptron algorithm described by Sun et al. (2009). Beam-search is performed to find the highest-scoring gold-standard sequence, as well as the highest-scoring prediction. We use the same beam-width for both search procedures.

²The training data is partially labelled in the sense that instances can have multiple true labels. Equivalently, one might say that the transitions are latent variables, which generate the dependencies.
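A sketch of the update this training regime performs is given below. We assume beam search has already produced prefix scores and feature counts for the best-scoring gold-derivable sequence and the best-scoring prediction; those routines, and the names used here, are ours rather than the released implementation's.

```python
# Maximum-violation update (Huang et al., 2012) over sequences found by
# beam search, as in the latent-variable perceptron (Sun et al., 2009).

def max_violation_update(weights, gold_scores, pred_scores,
                         gold_feats, pred_feats):
    """gold_scores[t]/pred_scores[t]: model score of the first t+1
    transitions of each sequence; gold_feats[t]/pred_feats[t]: sparse
    feature counts (dicts) for those prefixes."""
    violations = [p - g for g, p in zip(gold_scores, pred_scores)]
    t = max(range(len(violations)), key=violations.__getitem__)
    if violations[t] <= 0:
        return          # the gold path was never outscored: no update
    for f, v in gold_feats[t].items():      # reward the gold prefix
        weights[f] = weights.get(f, 0.0) + v
    for f, v in pred_feats[t].items():      # penalise the predicted one
        weights[f] = weights.get(f, 0.0) - v
```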
4.3 Path Length Normalisation

One problem introduced by the Edit transition is that the number of actions applied to a sentence is no longer constant; it is no longer guaranteed to be 2n−1, for a sentence of length n. When the Edit transition is applied to a word with leftward children, those children are returned to the stack, and processed again. This has little to no impact on the algorithm's empirical efficiency, although worst-case complexity is no longer linear, but it does pose a problem for decoding.

The perceptron model tends to assign large positive scores to its top prediction. We thus observed a problem when comparing paths of different lengths, at the end of the sentence. Paths that included Edit transitions were longer, so the sum of their scores tended to be higher.

The same problem has been observed during incremental PCFG parsing, by Zhu et al. (2013). They introduce an additional transition, IDLE, to ensure that paths are the same length. So long as one candidate in the beam is still being processed, all other candidates apply the IDLE transition.

We adopt a simpler solution. We normalise the figure-of-merit for a candidate state, which is used to rank it in the beam, by the length of its transition history. The new figure-of-merit is the arithmetic mean of the candidate's transition scores, where previously the figure-of-merit was the sum of the candidate's transition scores.

Interestingly, Zhu et al. (2013) report that they tried exactly this, and that it was less effective than their solution. We found that the features associated with the IDLE transition were uninformative (the state is at termination, so the stack and buffer are empty), and had nothing to do with how many edit transitions were earlier applied.
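In other words, where a candidate with transition scores s1 ... sk was previously ranked by their sum, it is now ranked by their mean, so paths lengthened by Edit transitions gain no advantage simply for being longer. A one-line rendering of the change:

```python
def figure_of_merit(transition_scores):
    # Previously: return sum(transition_scores)
    return sum(transition_scores) / len(transition_scores)
```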
5 Features for the Joint Parser

Our baseline parser uses the feature set described by Zhang and Nivre (2011). The feature set contains 73 templates that mostly refer to the properties of 12 context tokens: the top of the stack (S0), its two leftmost and rightmost children (S0L, S0L2, S0R, S0R2), its parent and grandparent (S0h, S0h2), the first word of the buffer and its two leftmost children (N0, N0L, N0LL), and the next two words of the buffer (N1, N2). Atomic features consist of the word, part-of-speech tag, or dependency label for these tokens; and multiple feature atoms are often combined for feature templates. There are also features for the string-distance between S0 and N0, and the left and right valencies (total number of children) of S0 and N0, as well as the set of their children's dependency labels. We restrict these to the first and last 2 children for implementation efficiency, as we found this had no effect on accuracy. Numeric features (for distance and valency) are binned with the function λx: min(x, 5). There is only one bi-lexical feature template, which pairs the words of S0 and N0. There are also ten tri-tag templates.

Our feature set includes additional dependency label features not used by Zhang and Nivre (2011), as we found that disfluency detection errors often resulted in ungrammatical dependency label combinations. The additional templates combine the POS tag of S0 with two or three dependency labels from its left and right subtrees. Details can be found in the supplementary material.

5.1 Brown Cluster Features

The Brown clustering algorithm (Brown et al., 1992) is a well known source of semi-supervised features. The clustering algorithm is run over a large sample of unlabelled data, to generate a type-to-cluster map. This mapping is then used to generate features that sometimes generalise better than lexical features, and are helpful for out-of-vocabulary words (Turian et al., 2010).

Koo and Collins (2010) found that Brown cluster features greatly improved the performance of a graph-based dependency parser. On our transition-based parser, Brown cluster features bring a small but statistically significant improvement on the WSJ task (0.1-0.3% UAS). Other developers of transition-based parsers seem to have found similar results (personal communication). Since a Brown cluster mapping computed by Liang (2005) is easily available,³ the features are simple to implement and cheap to compute.

Our templates follow Koo and Collins (2010) in including features that refer to cluster prefix strings, as well as the full clusters. We adapt their templates to transition-based parsing by replacing 'head' with 'item on top of the stack' and 'child' with 'first word of the buffer'. The exact templates can be found in the supplementary material.

The Brown cluster features are used in our 'baseline' parser, and in the parsers we use as part of our pipeline systems. They improved development set accuracy by 0.4%. We experimented with the other feature sets in these parsers, but found that they did not improve accuracy on fluent text.

³http://www.metaoptimize.com/projects/wordreps
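A sketch of what such cluster features can look like, adapted to the stack-top (S0) and buffer-front (N0) tokens, is shown below. The prefix lengths and feature-name scheme are illustrative assumptions; the exact templates are given in the paper's supplementary material.

```python
# Illustrative Brown cluster features over S0 and N0. 'clusters' maps a
# word to its cluster bit-string, from a pre-computed mapping such as
# Liang (2005)'s.

def brown_features(s0_word, n0_word, clusters, prefix_lengths=(4, 6)):
    c0 = clusters.get(s0_word, "")
    c1 = clusters.get(n0_word, "")
    feats = {"S0c=" + c0: 1.0, "N0c=" + c1: 1.0}       # full clusters
    for p in prefix_lengths:                           # cluster prefixes
        feats["S0c%d=%s" % (p, c0[:p])] = 1.0
        feats["N0c%d=%s" % (p, c1[:p])] = 1.0
        feats["S0c%d+N0c%d=%s_%s" % (p, p, c0[:p], c1[:p])] = 1.0
    return feats
```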
5.2 Rough Copy Features

Johnson and Charniak (2004) point out that in speech repairs, the repair is often a 'rough copy' of the reparandum. The simplest case of this is where the repair is a single word repetition. It is common for the repair to differ from the reparandum by insertion, deletion or substitution of one or more words. To capture this regularity, we first extend the feature-set with three new context tokens:⁴

1. S0re: the rightmost edge of S0's descendants;
2. S0le: the leftmost edge of S0's descendants;
3. N0le: the leftmost edge of N0's descendants.

If a word has no leftward children, it will be its own left-edge, and similarly it will be its own rightward edge if it has no rightward children. Note that the token S0re is necessarily immediately before N0le, unless some of the tokens between them are disfluent. We use the S0le and N0le to compute the following rough-copy features:

1. How long is the prefix word match between S0le ... S0 and N0le ... N0? If the parser were analysing 'the red the blue square', with red on the stack and square at N0, its value would be 1.
2. How long is the prefix POS tag match between S0le ... S0 and N0le ... N0?
3. Do the words in S0le ... S0 and N0le ... N0 match exactly?
4. Do the POS tags in S0le ... S0 and N0le ... N0 match exactly? If the parser were analysing 'the red square the blue rectangle', with square on the stack and rectangle at N0, its value would be true.

The prefix-length features are binned using the function λx: min(x, 5).

⁴As is common in this type of parser, our implementation has a number of vectors for properties that are defined before parsing, such as word forms, POS tags, Brown clusters, etc. A context token is an index into these vectors, allowing features considering these properties to be computed.
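These four features can be computed directly from the two spans. The sketch below is our rendering, with span extraction from the parser state assumed; on 'the red | the blue square' it returns a word-prefix match of 1, as in the example above.

```python
def prefix_match(a, b):
    """Length of the common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def rough_copy_features(s0_span, n0_span):
    """Each span is the list of (word, tag) pairs S0le..S0 or N0le..N0."""
    words0, tags0 = zip(*s0_span)
    words1, tags1 = zip(*n0_span)
    return {
        "word_prefix=%d" % min(prefix_match(words0, words1), 5): 1.0,
        "tag_prefix=%d" % min(prefix_match(tags0, tags1), 5): 1.0,
        "words_exact=%s" % (words0 == words1): 1.0,
        "tags_exact=%s" % (tags0 == tags1): 1.0,
    }

# rough_copy_features([("the", "DT"), ("red", "JJ")],
#                     [("the", "DT"), ("blue", "JJ"), ("square", "NN")])
# -> {'word_prefix=1': 1.0, 'tag_prefix=2': 1.0,
#     'words_exact=False': 1.0, 'tags_exact=False': 1.0}
```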
5.3 Match Features

This class of features asks which pairs of the context tokens match, in word or POS tag. The context tokens in the Zhang and Nivre (2011) feature set are the top of the stack (S0), its head and grandparent (S0h, S0h2), its two left- and rightmost children (S0L, S0L2, S0R, S0R2), the first three words of the buffer (N0, N1, N2), and the two leftmost children of N0 (N0L, N0LL). We extend this set with the S0le, S0re and N0le tokens described above, and also the first left and right child of S0 and N0 (S0L0, S0R0, N0L0). All up, there are 18 context tokens, so there are (18 choose 2) = 153 token pairs. For each pair of these tokens, we add two binary features, indicating whether the two tokens match in word form or POS tag.

We also have two further classes of features: if the words do match, a feature is added indicating the word form; if the tags match, a feature is added indicating the tag. These finer grained versions help the model adjust for the fact that some words can be duplicated in grammatical sentences (e.g. 'that that'), while most rare words cannot.

5.4 Edited Neighbour Features

Disfluencies are usually string contiguous, even if they do not form a single constituent. In these situations, our model has to make multiple transitions to mark a single disfluency. For instance, if an utterance begins 'and the and a', the stack will contain two entries, for 'and' and 'the', and two Edit transitions will be required.

To mitigate this disadvantage of our model, we add four binary features. Two fire when the word or pair of words immediately preceding N0 have been marked disfluent; the other two fire when the word or pair of words immediately following S0 have been marked disfluent. These features provide an additional string-based view that the parser would otherwise be missing. Speakers tend to be disfluent in bursts: if the previous word is disfluent, the next word is more likely to be disfluent. These four features are therefore all associated with positive weights for the Edit transition. Without these features, we would miss an aspect of disfluency processing that sequence models naturally capture.
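The sketch below illustrates both feature classes. The context-token lookup and the naming scheme are our assumptions (a missing context token is represented as None).

```python
from itertools import combinations

def match_features(context):
    """context: dict mapping token names ('S0', 'N0le', ...) to a
    (word, tag) pair, or None when the token is absent."""
    feats = {}
    for (n1, t1), (n2, t2) in combinations(sorted(context.items()), 2):
        if t1 is None or t2 is None:
            continue
        if t1[0] == t2[0]:                       # word-form match
            feats["%s~%s_w" % (n1, n2)] = 1.0
            feats["%s~%s_w=%s" % (n1, n2, t1[0])] = 1.0  # finer-grained
        if t1[1] == t2[1]:                       # POS tag match
            feats["%s~%s_t" % (n1, n2)] = 1.0
            feats["%s~%s_t=%s" % (n1, n2, t1[1])] = 1.0
    return feats

def neighbour_features(s0, n0, disfluent):
    """The four edited-neighbour indicators: are the one or two words
    immediately before N0, or immediately after S0, marked disfluent?"""
    return {
        "pre1": float(n0 - 1 in disfluent),
        "pre2": float({n0 - 1, n0 - 2} <= disfluent),
        "post1": float(s0 + 1 in disfluent),
        "post2": float({s0 + 1, s0 + 2} <= disfluent),
    }
```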
6 Part-of-Speech Tagging

We adopt the standard strategy of using a POS tagger as a pre-process before parsing. Most transition-based parsers use a structured averaged perceptron model with beam-search for tagging, as this model achieves competitive accuracy and matches the standard dependency parsing architecture. Our tagger also uses this architecture.

We performed some additional feature engineering for the tagger, in order to improve its accuracy given the lack of case distinctions and punctuation in the data. Our additional features use two sources of unsupervised information. First, we follow the suggestion of Manning (2011) by using Brown cluster features to improve the tagger's accuracy on unknown words. Second, we compensate for the lack of case distinctions by including features that ask what percentage of the time a word form was seen title-cased, upper-cased and lower-cased in the Google Web 1T corpus.

Where most previous work uses cross-fold training for the tagger, to ensure that the parser is trained on tags that reflect run-time accuracies, we do online training of the tagger alongside the parser, using the current tagger model to produce tags during parser training. This had no impact on parse accuracy, and made it slightly easier to develop our tagger alongside the parser.

The tagger achieved 96.5% accuracy on the development data, but when we ran our final test experiments, we found its accuracy dropped to 96.0%, indicating some over-fitting during our feature engineering. On the development data, our parser accuracy improves by about 1% when gold-standard tags are used.

7 Experiments

We use the Switchboard portion of the Penn Treebank (Marcus et al., 1993), as described in Section 2, to train our joint models and evaluate them on dependency parsing and disfluency detection. The pre-processing and dependency conversion are described in Section 2.1. We use the standard train/dev/test split from Charniak and Johnson (2001): Sections 2 and 3 for training, and Section 4 divided into three held-out sections, the first of which is used for final evaluation.

Our parser evaluation uses the SPARSEVAL (Roark et al., 2006) metric. However, we wanted to use the Stanford dependency converter, for the reasons described in Section 2.1, so we used our own implementation. Because we do not need to deal with recognition errors, we do not need to report our parsing results using P/R/F-measures. Instead, we report an unlabelled accuracy score, which refers to the percentage of fluent words whose governors were assigned correctly. Note that words marked as disfluent cannot have any incoming or out-going dependencies,
so if a word is incorrectly marked as disfluent, all of its dependencies will be incorrect.

We follow Johnson and Charniak (2004) and others in restricting our disfluency evaluation to speech repairs, which we identify as words that have a node labelled EDITED as an ancestor. Unlike most other disfluency detection research, we train only on the MRG files, giving us 619,236 words of training data instead of the 1,482,845 used by the pipeline systems. It may be possible to improve our system's disfluency detection by leveraging the additional data that does not have syntactic annotation in some way.

All parsing models were trained for 15 iterations. We found that optimising the number of iterations on a development set led to small improvements that did not transfer to a second development set (part of Section 4, which Charniak and Johnson (2001) reserved for 'future use').

We test for statistical significance in our results by training 20 models for each experimental configuration, using different random seeds. The random seeds control how the sentences are shuffled during training, which the perceptron model is quite sensitive to. We use the Wilcoxon rank-sums non-parametric test. The standard deviation in UAS for a sample was typically around 0.05%, and 0.5% for disfluency F-measure.
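This procedure amounts to comparing two samples of 20 scores per configuration; a minimal sketch with scipy is shown below (the score lists are placeholders for illustration, not the paper's per-seed results).

```python
from scipy.stats import ranksums

# One UAS figure per training run (placeholder values).
baseline_uas = [89.91, 89.86, 89.95, 89.88, 89.93]
joint_uas = [90.88, 90.92, 90.85, 90.94, 90.90]

statistic, p_value = ranksums(joint_uas, baseline_uas)
print("Wilcoxon rank-sums: statistic=%.2f p=%.4f" % (statistic, p_value))
```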
All of our models use beam-search decoding, with a beam width of 32. We found that a beam width of 64 brought a very small accuracy improvement (about 0.1%), at the cost of 50% slower run-time. Wider beams than this brought no accuracy improvement. Accuracy seems to plateau with slightly narrower beams than on newswire text. This is probably due to the shorter sentences in Switchboard.

The baseline and pipeline systems are configured in the same way, except that the baseline parser is modified slightly to allow it to predict disfluencies, using a special dependency label, ERASED. All descendants of a word attached to its head by this label are marked as disfluent. Both the baseline and pipeline/oracle parsers use the same feature set: the Zhang and Nivre (2011) features, plus our Brown cluster features. The baseline system is a standard arc-eager transition-based parser with a structured averaged perceptron model and beam-search decoding. The model is trained in the standard way, with a 'static' oracle and maximum-violation update, following Huang et al. (2012).

7.1 Comparison with Pipeline Approaches

The accuracy of incremental dependency parsers is well established on the Wall Street Journal, but there are no dependency parsing results in the literature that make it easy to put our joint model's parsing accuracy into context. We therefore compare our joint model to two pipeline systems, which consist of a disfluency detector, followed by our dependency parser. We also evaluate parse accuracies after oracle pre-processing, to gauge the net effect of disfluencies on our parser's accuracy.

The dependency parser for the pipeline systems was trained on text with all disfluencies removed, following Charniak and Johnson (2001). The two disfluency detection systems we used were the Qian and Liu (2013) sequence-tagging model, and a version of the Johnson and Charniak (2004) noisy channel model, using the Charniak (2001) syntactic language model and the reranking features of Zwarts and Johnson (2011). They are the two best published disfluency detection systems.

8 Results

Table 1 shows the development set accuracies for our joint parser. Both the disfluency features and the Edit transition make statistically significant improvements, in disfluency F-measure, unlabelled attachment score (UAS), and labelled attachment score (LAS).

The Oracle pipeline system, which uses the gold-standard to clean disfluencies prior to parsing, shows the total impact of speech-errors on the parser. The baseline parser, which uses the Zhang and Nivre (2011) feature set plus the Brown cluster features, scores 1.8% UAS lower than the oracle. When we add the features described in Sections 5.2, 5.3 and 5.4, the gap is reduced to 1.2% (+Features). Finally, the improved transition system reduces the gap further still, to 0.8% UAS (+Edit transition). We also tested these features in the Oracle parser, but found they were ineffective on fluent text.

The w/s column shows the tokens analysed per second for each system, including disfluencies, with a single thread on a 2.4GHz Intel Xeon. The additional features reduce efficiency, but the non-monotonic Edit transition does not. The system is easily efficient enough for real-time use.
                   P      R      F      UAS    LAS    w/s
Baseline joint     79.4   70.1   74.5   89.9   86.9   711
+Features          86.0   77.2   81.3   90.5   87.5   539
+Edit transition   92.2   80.2   85.8   90.9   87.9   555
Oracle pipeline    100    100    100    91.7   88.6   782

Table 1: Development results for the joint models. For the baseline model, disfluencies reduce parse accuracy by 1.8% Unlabelled Attachment Score (UAS). Our features and Edit transition reduce the gap to 0.8%, and improve disfluency detection by 11.3% F-measure.

                          Disfl. F   UAS
Johnson et al. pipeline   82.1       90.3
Qian and Liu pipeline     83.9       90.1
Baseline joint parser     73.9       89.4
Final joint parser        84.1       90.5

Table 2: Test-set parse and disfluency accuracies. The joint parser is improved by the features and Edit transition, and is better than pre-processing the text with state-of-the-art disfluency detectors.

Table 2 shows the final evaluation. Our main comparison is with the two pipeline systems, described in Section 7.1. The Johnson and Charniak (2004) system was 1.8% less accurate at disfluency detection than the other disfluency detector we evaluated, the state-of-the-art Qian and Liu (2013) system. However, when we evaluated the two systems as pre-processors before our parser, we found that the Johnson et al. pipeline achieved 0.2% better unlabelled attachment score than the Qian and Liu pipeline. We attribute this to the use of the Charniak (2001) syntactic language model in the Johnson et al. pipeline, which would help the system produce more syntactically consistent output.

Our joint model achieved an unlabelled attachment score of 90.5%, out-performing both pipeline systems. The Baseline joint parser, which did not include the Edit transition or disfluency features, scores 1.1% below the Final joint parser. All of the parse accuracy differences were found to be statistically significant (p < 0.001). The Edit transition and disfluency features together brought a 10.1% improvement in disfluency F-measure, which was also found to be statistically significant. The final joint parser achieved 0.2% higher disfluency detection accuracy than the previous state-of-the-art, the Qian and Liu (2013) system,⁵ despite having approximately half as much training data (we require syntactic annotation, for which there is less data).

⁵Our scores refer to an updated version of the system that corrects minor pre-processing problems. We thank Qian Xian for making his code available.

Our significance testing regime involved using 20 different random seeds when training each of our models, which the perceptron algorithm is sensitive to. This could not be applied to the other two disfluency detectors, so we cannot test those differences for significance. However, we note that the 20 samples for our disfluency detector ranged in accuracy from 83.3-84.6, so we doubt that the 0.2% mean improvement over the Qian and Liu (2013) result is meaningful.

Although we did not systematically optimise on the development set, our test scores are lower than our development accuracies. Much of the over-fitting seems to be in the POS tagger, which dropped in accuracy by 0.5%.

9 Analysis of Edit Behaviour

In order to understand how the parser applies the Edit transition, we collected some additional statistics over the development data. The parser predicted 2,558 Edit transitions, which together marked 2,706 words disfluent (2,495 correctly). The Edit transition can mark multiple words disfluent when S0 has one or more rightward descendants. It turns out this case is uncommon; the parser largely assigns disfluency labels word-by-word, only sometimes marking words with rightward descendents as disfluent.

Of the 2,558 Edit transitions, there were 682 cases where at least one leftward child was returned to the stack, and the total number of leftward children returned was 1,132. The most common type of construction that caused the parser to return words to the stack were disfluent predicates, which often have subjects and discourse conjunctions as leftward children. An example of a disfluent predicate with a fluent subject is shown in Figure 4.

There were only 48 cases of the same word being returned to the stack twice.
The possibility of words being returned to the stack multiple times is what gives our system worse than linear worst-case complexity. In the worst case, the i-th word of a sentence of length n could be returned to the stack n−(i+1) times. Empirically, the Edit transition made no difference to run-time.

Once a word has been returned to the stack by the Edit transition, how does the parser end up analysing it? If it turned out that almost all of the former leftward children of disfluent words are subsequently marked as disfluent, there would be
little point in returning them to the stack; we could just mark them as disfluent in the original Edit transition. On the other hand, if they are almost all marked as fluent, perhaps they can just be attached as children to the first word of the buffer.

In fact the two cases are almost equally common. Of the 1,132 words returned to the stack, 547 were subsequently marked disfluent, and 584 were not. The parser was also quite accurate in its decisions over these tokens. Of the 547 tokens marked disfluent, 500 were correct, similar to the overall development set precision, 92.2%.

Accuracy over the words returned to the stack might be improved in future by features referring to their former heads. For instance, in 'He went broke uh became bankrupt', we do not currently have features that record the deleted dependency between he and went. We thank one of the anonymous reviewers for this suggestion.

10 Related Work

The most similar system to ours was published very recently. Rasooli and Tetreault (2013) describe a joint model of dependency parsing and disfluency detection. They introduce a second classification step, where they first decide whether to apply a disfluency transition, or a regular parsing move. Disfluency transitions operate either over a sequence of words before the start of the buffer, or a sequence of words from the start of the buffer forward. Instead of the dynamic oracle training method that we employ, they use a two-stage bootstrap-style process.

Direct comparison between our model and theirs is difficult, as they use the Penn2MALT scheme, and their parser uses greedy decoding, where we use beam search. They also use gold-standard part-of-speech tags, which would improve our scores by around 1%. The use of beam-search may explain much of our performance advantage: they report an unlabelled attachment score of 88.6, and a disfluency detection F-measure of 81.4%. Our training algorithm would be applicable to a beam-search version of their parser, as their transition-system also introduces substantial spurious ambiguity, and some non-monotonic behaviour.

A hybrid transition system would also be possible, as the two types of Edit transition seem to be complementary. The Rasooli and Tetreault system offers a token-based view of disfluencies, which is useful for examples such as 'and the and the', which would require two applications of our transition. On the other hand, our Edit transition may have the advantage for more syntactically complicated examples, particularly for disfluent verbs.

The importance of syntactic features for disfluency detection was demonstrated by Johnson and Charniak (2004). Despite this, most subsequent work has used sequence models, rather than syntactic parsers. The other disfluency system that we compare our model to, developed by Qian and Liu (2013), uses a cascade of Maximum Margin Markov Models to perform disfluency detection with minimal syntactic information.

One motivation for sequential approaches is that most applications of these models will be over unsegmented text, as segmenting unpunctuated text is a difficult task that benefits from syntactic features (Zhang et al., 2013). We consider the most promising aspect of our system to be that it is naturally incremental, so it should be straightforward to extend the system to operate on unsegmented text in subsequent work. Due to its use of syntactic features, from the joint model, the system is substantially more accurate than the previous state-of-the-art in incremental disfluency detection, 77% (Zwarts et al., 2010).

11 Conclusion

We have presented an efficient and accurate joint model of dependency parsing and disfluency detection. The model out-performs pipeline approaches using state-of-the-art disfluency detectors, and is highly efficient, processing over 550 tokens a second. Because the system is incremental, it should be straight-forward to apply it to unsegmented text. The success of an incremental, non-monotonic parser at disfluent speech parsing may also be of some psycholinguistic interest.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments.
This research was supported under the Australian Research Council's Discovery Projects funding scheme (project numbers DP110102506 and DP110102593).

References

Miguel Ballesteros and Joakim Nivre. 2013. Going to the roots of dependency parsing. Computational Linguistics, 39(1).
Peter F. Brown, Peter V. de Souza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18:467–479.

Eugene Charniak. 2001. Immediate-head parsing for language models. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 124–131. Association for Computational Linguistics, Toulouse, France.

Eugene Charniak and Mark Johnson. 2001. Edit detection and parsing for transcribed speech. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics, pages 118–126. The Association for Computational Linguistics.

Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 173–180. Association for Computational Linguistics, Ann Arbor, Michigan.

Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages 1–8. Association for Computational Linguistics.

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC).

Lyn Frazier and Keith Rayner. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14(2):178–210.

Yoav Goldberg and Joakim Nivre. 2012. A dynamic oracle for arc-eager dependency parsing. In Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012). Association for Computational Linguistics, Mumbai, India.

Matthew Honnibal, Yoav Goldberg, and Mark Johnson. 2013. A non-monotonic arc-eager transition system for dependency parsing. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 163–172. Association for Computational Linguistics, Sofia, Bulgaria.

Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151. Association for Computational Linguistics, Montréal, Canada.

Mark Johnson and Eugene Charniak. 2004. A TAG-based noisy channel model of speech repairs. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pages 33–39.

Douglas A. Jones, Florian Wolf, Edward Gibson, Elliott Williams, Evelina Fedorenko, Douglas A. Reynolds, and Marc A. Zissman. 2003. Measuring the readability of automatic speech-to-text transcripts. In INTERSPEECH. ISCA.

Fredrik Jorgensen. 2007. The effects of disfluency detection in parsing spoken language. In Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek, and Mare Koit, editors, Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007, pages 240–244.

Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1–11.

Percy Liang. 2005. Semi-supervised learning for natural language. Ph.D. thesis, MIT.

Christopher D. Manning. 2011. Part-of-speech tagging from 97% to 100%: Is it time for some linguistics? In Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing - Volume Part I, CICLing'11, pages 171–189. Springer-Verlag, Berlin, Heidelberg.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing.
In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34:513–553.

Xian Qian and Yang Liu. 2013. Disfluency detection using multi-step stacked learning. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 820–825. Association for Computational Linguistics, Atlanta, Georgia.

Mohammad Sadegh Rasooli and Joel Tetreault. 2013. Joint parsing and disfluency detection in linear time. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 124–129. Association for Computational Linguistics, Seattle, Washington, USA.

Brian Roark, Mary Harper, Eugene Charniak, Bonnie Dorr, Mark Johnson, Jeremy Kahn, Yang Liu, Mary Ostendorf, John Hale, Anna Krasnyanskaya, Matthew Lease, Izhak Shafran, Matthew Snover, Robin Stewart, and Lisa Yung. 2006. SParseval: Evaluation metrics for parsing speech. In Proceedings of the Language Resources and Evaluation Conference, pages 333–338. European Language Resources Association (ELRA), Genoa, Italy.

Francesco Sartorio, Giorgio Satta, and Joakim Nivre. 2013. A transition-based dependency parser using a dynamic parsing strategy. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 135–144. Association for Computational Linguistics, Sofia, Bulgaria.

Elizabeth Shriberg. 1994. Preliminaries to a Theory of Speech Disfluencies. Ph.D. thesis, University of California, Berkeley.

Xu Sun, Takuya Matsuzaki, Daisuke Okanohara, and Jun'ichi Tsujii. 2009. Latent variable perceptron algorithm for structured classification. In IJCAI, pages 1236–1242.

Joseph Turian, Lev-Arie Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394. Association for Computational Linguistics, Uppsala, Sweden.

Dongdong Zhang, Shuangzhi Wu, Nan Yang, and Mu Li. 2013. Punctuation prediction with transition-based parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 752–760. Association for Computational Linguistics, Sofia, Bulgaria.

Yue Zhang and Stephen Clark. 2011. Syntactic processing using the generalized perceptron and beam search. Computational Linguistics, 37(1):105–151.

Yue Zhang and Joakim Nivre. 2011. Transition-based dependency parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 188–193. Association for Computational Linguistics, Portland, USA.

Muhua Zhu, Yue Zhang, Wenliang Chen, Min Zhang, and Jingbo Zhu. 2013. Fast and accurate shift-reduce constituent parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 434–443. Association for Computational Linguistics, Sofia, Bulgaria.

Simon Zwarts and Mark Johnson. 2011. The impact of language models and loss functions on repair disfluency detection. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 703–711. Association for Computational Linguistics, Portland, USA.

Simon Zwarts, Mark Johnson, and Robert Dale. 2010. Detecting speech repairs incrementally using a noisy channel approach. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 1371–1378. Coling 2010 Organizing Committee, Beijing, China.