Transactions of the Association for Computational Linguistics, 2 (2014) 131–142. Action Editor: Joakim Nivre.
Submitted 11/2013; Revised 2/2014; Published 4/2014. ©2014 Association for Computational Linguistics.
Joint Incremental Disfluency Detection and Dependency Parsing

Matthew Honnibal
Department of Computing, Macquarie University
Sydney, Australia
matthew.honnibal@mq.edu.au

Mark Johnson
Department of Computing, Macquarie University
Sydney, Australia
mark.johnson@mq.edu.au

Abstract

We present an incremental dependency parsing model that jointly performs disfluency detection. The model handles speech repairs using a novel non-monotonic transition system, and includes several novel classes of features. For comparison, we evaluated two pipeline systems, using state-of-the-art disfluency detectors. The joint model performed better on both tasks, with a parse accuracy of 90.5% and 84.0% accuracy at disfluency detection. The model runs in expected linear time, and processes over 550 tokens a second.

1 Introduction

Most unscripted speech contains filled pauses (ums and uhs), and errors that are usually edited on-the-fly by the speaker. Disfluency detection is the task of detecting these infelicities in spoken language transcripts. The task has some immediate value, as disfluencies have been shown to make speech recognition output much more difficult to read (Jones et al., 2003), but has also been motivated as a module in a natural language understanding pipeline, because disfluencies have proven problematic for PCFG parsing models.

Instead of a pipeline approach, we build on recent work in transition-based dependency parsing, to perform the two tasks jointly. There have been two small studies of dependency parsing on unscripted speech, both using entirely greedy parsing strategies, without a direct comparison against a pipeline architecture (Jorgensen, 2007; Rasooli and Tetreault, 2013). We go substantially beyond these pilot studies, and present a system that compares favourably to a pipeline consisting of state-of-the-art components.

Our parser largely follows the design of Zhang and Clark (2011). We use a structured averaged perceptron model with beam-search decoding (Collins, 2002). Our feature set is based on Zhang and Clark (2011), and our transition-system is based on the arc-eager system of Nivre (2003).

We extend the transition system with a novel non-monotonic transition, Edit. It allows sentences like 'Pass the pepper uh salt' to be parsed incrementally, without the need to guess early that pepper is disfluent. This is achieved by re-processing the leftward children of the word Edit marks as disfluent. For instance, if the parser attaches the to pepper, but subsequently marks pepper as disfluent, the will be returned to the stack.

We also exploit the ease with which the model can incorporate arbitrary features, and design a set of features that capture the 'rough copy' structure of some speech repairs, which motivated the Johnson and Charniak (2004) noisy channel model.

Our main comparison is against two pipeline systems, which use the two current state-of-the-art disfluency detection systems as pre-processors to our parser, minus the custom disfluency features and transition. The joint model compared favourably to the pipeline parsers at both tasks, with an unlabelled attachment score of 90.5%, and 84.0% accuracy at detecting speech repairs. An efficient implementation is available under an open-source license.¹ The future prospects of the system are also quite promising. Because the parser is incremental, it should be well suited to unsegmented text such as the output of a speech-recognition system. We consider our main contributions to be:

• a novel non-monotonic transition system, for speech repairs and restarts,

¹http://github.com/syllog1sm/redshift
• several novel feature classes,

• direct comparison against the two best disfluency pre-processors, and

• state-of-the-art accuracy for both speech parsing and disfluency detection.

Figure 1: 'A flight to um [FP] Boston [RM] I mean [IM] Denver [RP] Tuesday.' A sentence with disfluencies annotated in the style of Shriberg (1994) and the Switchboard corpus. FP = Filled Pause, RM = Reparandum, IM = Interregnum, RP = Repair. We follow previous work in evaluating the system on the accuracy with which it identifies speech-repairs, marked reparandum above.

2 Switchboard Disfluency Annotations

The Switchboard portion of the Penn Treebank (Marcus et al., 1993) consists of telephone conversations between strangers about an assigned topic. Two annotation layers are provided: one for syntactic bracketing (MRG files), and one for disfluencies (DPS files). The disfluency layer marks elements with little or no syntactic function, such as filled pauses and discourse markers, and annotates speech repairs using the Shriberg (1994) system of reparandum/interregnum/repair. An example is shown in Figure 1.

In the syntactic annotation, edited words are covered by a special node labelled EDITED. The idea is to mark text which, if excised, would result in a grammatical sentence. The MRG files do not mark other types of disfluencies. We follow the evaluation defined by Charniak and Johnson (2001), which evaluates the accuracy of identifying speech repairs and restarts. This definition of the task is the standard in recent work. The reason for this is that filled pauses can be detected using a simple rule-based approach, and parentheticals have less impact on readability and down-stream processing accuracy.

The MRG and DPS layers have high but imperfect agreement over what tokens they mark as speech repairs: of the text annotated with both layers, 33,720 tokens are marked as disfluent in at least one layer, 32,310 are marked as disfluent by the DPS files, and 32,742 are marked as disfluent by the MRG layer.

The Switchboard annotation project was not fully completed. Because disfluency annotation is cheaper to produce, many of the DPS training files do not have matching MRG files. Only 619,236 of the 1,482,845 tokens in the DPS disfluency-detection training data have gold-standard syntactic parses. Our system requires the more expensive syntactic annotation, but we find that it out-performs the previous state-of-the-art (Qian and Liu, 2013), despite training on less than half the data.

2.1 Dependency Conversion

As is standard in statistical dependency parsing of English, we acquire our gold-standard dependencies from phrase-structure trees. We used the 2013-04-05 version of the Stanford dependency converter (de Marneffe et al., 2006). As is standard for English dependency parsing, we use the Basic Dependencies scheme, which produces strictly projective representations.

At first we feared that the filled pauses, disfluencies and meta-data tokens in the Switchboard corpus might disrupt the conversion process, by making it more difficult for the converter to recognise the underlying production rules. To test this, we performed a small experiment.

We prepared two versions of the corpus: one where EDITED nodes, filled pauses and meta-data were removed before the trees were transformed by the Stanford converter, and one where the disfluency removal was performed after the dependency conversion. The resulting corpora were largely identical: 99.54% of unlabelled and 98.7% of labelled dependencies were the same. The fact that the Stanford converter is quite robust to disfluencies was useful for our baseline joint model, which is trained on dependency trees that also included governors for disfluent words.

We follow previous work on disfluency detection by lower-casing the text and removing punctuation and partial words (words tagged XX and words ending in '-'). We also remove one-token sentences, as their syntactic analyses are trivial.

We found that two additional simple pre-processes improved our results: discarding all 'um' and 'uh' tokens; and merging 'you know' and 'i mean' into single tokens. These pre-processes can be completed on the input string without losing information: none of the 'um' or 'uh' tokens are semantically significant, and the bigrams you know and i mean have a dependency between the two tokens over 99.9% of the times they occur in the treebank, with you and I never having any children. This makes it easy to unmerge the tokens deterministically after parsing:
all incoming and outgoing arcs will point to know or mean. The same pre-processing was performed for all our parsing systems.
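As a concrete illustration, the following sketch implements this pre-processing on a token list. It is our own rendering, not the released implementation; the function and constant names are invented for the example.

```python
# Sketch of the reversible pre-processing described above (our own
# naming, not the released parser's API).

FILLED_PAUSES = {"um", "uh"}
MERGES = {("you", "know"): "you_know", ("i", "mean"): "i_mean"}

def preprocess(tokens):
    """Lower-case, drop filled pauses, merge 'you know' and 'i mean'."""
    tokens = [t.lower() for t in tokens if t.lower() not in FILLED_PAUSES]
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) in MERGES:
            out.append(MERGES[tuple(tokens[i:i + 2])])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def unmerge(tokens):
    """Undo the merges after parsing. Arcs pointing at a merged token
    are re-pointed at 'know' or 'mean', which head the bigram in over
    99.9% of treebank occurrences (arc bookkeeping not shown)."""
    merged = set(MERGES.values())
    out = []
    for t in tokens:
        out.extend(t.split("_") if t in merged else [t])
    return out

# preprocess("i mean you know it was um fine".split())
# -> ['i_mean', 'you_know', 'it', 'was', 'fine']
```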
3 Transition-based Dependency Parsing

A transition-based parser predicts the syntactic structure of a sentence incrementally, by making a sequence of classification decisions. We follow the architecture of Zhang and Clark (2011), who use beam-search for decoding, and a structured averaged perceptron for training. Despite its simplicity, this type of parser has produced highly competitive results on the Wall Street Journal: with the extended feature set described by Zhang and Nivre (2011), it achieves 93.5% unlabelled accuracy on Stanford basic dependencies (de Marneffe et al., 2006). Converting the constituency trees produced by the Charniak and Johnson (2005) reranking parser results in similar accuracy.

Briefly, the transition-based parser consists of a configuration (or 'state') which is sequentially manipulated by a set of possible transitions. For us, a state is a 4-tuple c = (σ, β, A, D), where σ and β are disjoint sets of word indices termed the stack and buffer respectively, A is the set of dependency arcs, and D is the set of word indices marked disfluent. There are no arcs to or from members of D, so the dependencies and disfluencies can be implemented as a single vector (in our parser, a token is marked as disfluent by setting it as its own head).

We use the arc-eager transition system (Nivre, 2003, 2008), which consists of four parsing actions: Shift, Left-Arc, Right-Arc and Reduce. We denote the stack with its topmost element to the right, and the buffer with its first element to the left. A vertical bar is used to indicate concatenation to the stack or buffer, e.g. σ|i indicates a stack with the topmost element i and remaining elements σ. A dependency from a governor i to a child j is denoted i→j. The four arc-eager transitions are shown in Figure 2.

The Shift action moves the first item of the buffer onto the stack. The Right-Arc does the same, but also adds an arc, so that the top two items on the stack are connected. The Reduce move and the Left-Arc both pop the stack, but the Left-Arc first adds an arc from the first word of the buffer to the word on top of the stack. Constraints on the Reduce and Left-Arc moves ensure that every word is assigned exactly one head in the final configuration.

Shift (S):     (σ, i|β, A, D) ⇒ (σ|i, β, A, D)
Left-Arc (L):  (σ|i, j|β, A, D) ⇒ (σ, j|β, A ∪ {j→i}, D)    Only if i does not have an incoming arc.
Right-Arc (R): (σ|i, j|β, A, D) ⇒ (σ|i|j, β, A ∪ {i→j}, D)
Reduce (D):    (σ|i, β, A, D) ⇒ (σ, β, A, D)                Only if i has an incoming arc.
Edit (E):      (σ|i, j|β, A, D) ⇒ (σ|[x1, ..., xn], j|β, A′, D′)
               where A′ = A \ {x→y or y→x : ∀x ∈ [i, j), ∀y ∈ N},
               D′ = D ∪ [i, j), and x1 ... xn are the former left children of i.

Figure 2: Our parser's transition system. The first four transitions are the standard arc-eager system; the fifth is our novel Edit transition.

We follow the suggestion of Ballesteros and Nivre (2013) and add a dummy token that governs root dependencies to the end of the sentence. Parsing terminates when this token is at the start of the buffer, and the stack is empty. Disfluencies are added to D via the Edit transition, E, which we now define.
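To make the transition system concrete, here is a minimal Python sketch of the state and the five transitions in Figure 2. It is our illustration, not the authors' released code: pre-condition checking and the dummy ROOT token are omitted, and heads are stored in a single vector with a disfluent token marked by making it its own head, as described above.

```python
class State:
    """Parser configuration (σ, β, A, D). heads[i] == i marks i disfluent."""
    def __init__(self, n):
        self.stack = []
        self.buffer = list(range(n))        # buffer[0] is the front
        self.heads = [None] * n             # encodes both A and D
        self.left_children = [[] for _ in range(n)]

def shift(s):                               # (σ, i|β) => (σ|i, β)
    s.stack.append(s.buffer.pop(0))

def right_arc(s):                           # adds i -> j, pushes j
    child = s.buffer.pop(0)
    s.heads[child] = s.stack[-1]
    s.stack.append(child)

def left_arc(s):                            # adds j -> i, pops i
    top = s.stack.pop()                     # pre-condition: no head yet
    s.heads[top] = s.buffer[0]
    s.left_children[s.buffer[0]].append(top)

def reduce_(s):                             # pops i; pre-condition: has head
    s.stack.pop()

def edit(s):
    """Mark [i, j) disfluent, delete arcs to and from those words, and
    return i's former leftward children to the stack."""
    i, j = s.stack.pop(), s.buffer[0]
    for w in range(i, j):                   # i and its rightward
        s.heads[w] = w                      # descendants become disfluent
    for w, h in enumerate(s.heads):         # delete arcs into the
        if h is not None and h != w and s.heads[h] == h:
            s.heads[w] = None               # now-disfluent words
    s.stack.extend(sorted(s.left_children[i]))  # re-process left children
    s.left_children[i] = []
```

Because disfluency is encoded as heads[i] == i, the set D never needs separate storage, mirroring the single-vector representation the paper describes.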
4 A Non-Monotonic Edit Transition

One of the reasons disfluent sentences are hard to parse is that there often appear to be syntactic relationships between words in the reparandum and the fluent sentence. When these relations are considered in addition to the dependencies between fluent words, the resulting structure is not necessarily a projective tree.

Figure 3 shows a simple example, where the repair square replaces the reparandum rectangle. An incremental parser could easily become 'garden-pathed' and attach the repair square to the preceding words, constructing the dependencies shown dotted in Figure 3. Rather than attempting to devise an incremental model that avoids constructing such dependencies, we allow the parser to construct these dependencies and later delete them if the governor or child are marked disfluent.

Figure 3: 'Pass me the red rectangle uh I mean square.' Example where apparent dependencies between the reparandum and the fluent sentence complicate parsing. The dotted edges are difficult for an incremental parser to avoid, but cannot be part of the final parse if it is to be a projective tree. Our solution is to make the transition system non-monotonic: the parser is able to delete edges.

Psycholinguistic models of human sentence processing have long posited repair mechanisms (Frazier and Rayner, 1982). Recently, Honnibal et al. (2013) showed that a limited amount of 'non-monotonic' behaviour can improve an incremental parser's accuracy. We here introduce a non-monotonic transition, Edit, for speech repairs.

The Edit transition marks the word i on top of the stack σ|i as disfluent, along with its rightward descendents, i.e., all words in the sequence i ... j−1, where j is the word at the start of the buffer. It then restores the words both preceding and formerly governed by i to the stack. In other words, the word on top of the stack and its rightward descendents are all marked as disfluent, and the stack is popped. We then restore its leftward children to the stack, and all dependencies to and from words marked disfluent are deleted. The transition is non-monotonic in the sense that it can delete dependencies created by a previous transition, and replace tokens onto the stack that had been popped.

Why revisit the leftward children, but not the right? We are concerned about dependencies which might be mirrored between the reparandum and the repair. The rightward subtree of the disfluency might well be incorrect, but if it is, it would still be incorrect if the word on top of the stack were actually fluent. We therefore regard these as parsing errors that we will train our model to avoid. In contrast, avoiding the Left-Arc transitions would require the parser to predict that the head is disfluent when it has not necessarily seen any evidence indicating that.

4.1 Worked Example

Figure 4 shows a gold-standard derivation for a disfluent sentence from the development data. Line 1 shows the state resulting from the initial Shift action. In the next three states, His is Left-Arced to company, which is then Shifted onto the stack, and Left-Arced to went in Line 4.

The dependency between went and company is not part of the gold-standard, because went is disfluent. The correct governor of company is the second went in the sentence. The Left-Arc move in Line 4 can still be considered correct, however, because the gold-standard analysis is still derivable from the resulting configuration, via the Edit transition. Another non-gold dependency is created in Line 6, between broke and went, before broke is Reduced from the stack in Line 7.

Lines 9 and 10 show the states before and after the Edit transition. The word on top of the stack in Line 9, went, has one leftward child, and one rightward child.

Figure 4: A gold-standard transition sequence using our Edit transition, for the sentence 'His company went broke i mean went bankrupt'. Each line specifies an action and shows the state resulting from it. Words on the stack are circled, and the arrow indicates the start of the buffer. Disfluent words are struck-through. [Action sequence: 1 S, 2 L, 3 S, 4 L, 5 S, 6 R, 7 D, 8 S, 9 L, 10 E (the first went and broke are struck through from here on), 11 L, 12 S, 12 R, 13 D.]

After the Edit transition is applied, went and its rightward child broke are both marked disfluent, and company is returned to the stack. All of the previous dependencies to and from went and broke are deleted.

Parsing then proceeds as normal, with the correct governor of company being assigned by the Left-Arc in Line 11, and bankrupt being Right-Arced to went in Line 12. To conserve space, we have omitted the dummy ROOT token, which is placed at the end of the sentence, following the suggestion of Ballesteros and Nivre (2013). The final action will be a Left-Arc from the ROOT token to went.
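Running the sketch from Section 3 on the introduction's 'Pass the pepper uh salt' example (with 'uh' already removed by pre-processing) shows the same non-monotonic behaviour in miniature. The 0-based indices and the omission of the ROOT token are assumptions of the sketch, not the paper's presentation.

```python
# pass=0 the=1 pepper=2 salt=3  ('uh' removed in pre-processing)
s = State(4)
shift(s)                    # push 'pass'
shift(s)                    # push 'the'
left_arc(s)                 # the <- pepper: the garden-path attachment
shift(s)                    # push 'pepper'
edit(s)                     # pepper marked disfluent, its arcs deleted
assert s.heads[2] == 2      # pepper is its own head, i.e. disfluent
assert s.heads[1] is None   # the arc to 'the' is gone...
assert s.stack == [0, 1]    # ...and 'the' is back on the stack
left_arc(s)                 # the <- salt
right_arc(s)                # pass -> salt
```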
4.2 Dynamic Oracle Training Algorithm

Our non-monotonic transition system introduces substantial spurious ambiguity: the gold-standard parse can be derived via many different transition sequences. Recent work has shown that this can be advantageous (Sartorio et al., 2013; Honnibal et al., 2013; Goldberg and Nivre, 2012), because difficult decisions can sometimes be delayed until more information is available.

Line 5 of Figure 4 shows a state that introduces spurious ambiguity. From this configuration, there are multiple actions that could be considered 'correct', in the sense that the gold-standard analysis can be derived from them. The Edit transition is correct because went is disfluent, but the Left-Arc and even the Right-Arc are also correct, in that there are continuations from them that lead to the gold-standard analysis.

We regard all transition sequences that can result in the correct analysis as equally valid, and want to avoid stipulating one of them during training. We achieve this by following Goldberg and Nivre (2012) in using a dynamic oracle to create partially labelled training data.² A dynamic oracle is a function that determines the cost of applying an action to a state, in terms of gold-standard arcs that are newly unreachable.

We follow Collins (2002) in training an averaged perceptron model to predict transition sequences, rather than individual transitions. This type of model is often referred to as a structured perceptron, or sometimes a global perceptron. During training, if the model does not predict the correct sequence, an update is performed, based on the gold-standard sequence and part of the sequence predicted by the current weights. Only part of the sequence is used to calculate the weight update, in order to account for search errors. We use the maximum violation strategy described by Huang et al. (2012) to select the subsequence to update from.

To train our model using the dynamic oracle, we use the latent-variable structured perceptron algorithm described by Sun et al. (2009). Beam-search is performed to find the highest-scoring gold-standard sequence, as well as the highest-scoring prediction. We use the same beam-width for both search procedures.

²The training data is partially labelled in the sense that instances can have multiple true labels. Equivalently, one might say that the transitions are latent variables, which generate the dependencies.
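A sketch of the update this training regime performs is given below. We assume beam search has already produced prefix scores and feature counts for the best-scoring gold-derivable sequence and the best-scoring prediction; those routines, and the names used here, are ours rather than the released implementation's.

```python
# Maximum-violation update (Huang et al., 2012) over sequences found by
# beam search, as in the latent-variable perceptron (Sun et al., 2009).

def max_violation_update(weights, gold_scores, pred_scores,
                         gold_feats, pred_feats):
    """gold_scores[t]/pred_scores[t]: model score of the first t+1
    transitions of each sequence; gold_feats[t]/pred_feats[t]: sparse
    feature counts (dicts) for those prefixes."""
    violations = [p - g for g, p in zip(gold_scores, pred_scores)]
    t = max(range(len(violations)), key=violations.__getitem__)
    if violations[t] <= 0:
        return          # the gold path was never outscored: no update
    for f, v in gold_feats[t].items():      # reward the gold prefix
        weights[f] = weights.get(f, 0.0) + v
    for f, v in pred_feats[t].items():      # penalise the predicted one
        weights[f] = weights.get(f, 0.0) - v
```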
4.3 Path Length Normalisation

One problem introduced by the Edit transition is that the number of actions applied to a sentence is no longer constant; it is no longer guaranteed to be 2n−1, for a sentence of length n. When the Edit transition is applied to a word with leftward children, those children are returned to the stack, and processed again. This has little to no impact on the algorithm's empirical efficiency, although worst-case complexity is no longer linear, but it does pose a problem for decoding.

The perceptron model tends to assign large positive scores to its top prediction. We thus observed a problem when comparing paths of different lengths, at the end of the sentence. Paths that included Edit transitions were longer, so the sum of their scores tended to be higher.

The same problem has been observed during incremental PCFG parsing, by Zhu et al. (2013). They introduce an additional transition, IDLE, to ensure that paths are the same length. So long as one candidate in the beam is still being processed, all other candidates apply the IDLE transition.

We adopt a simpler solution. We normalise the figure-of-merit for a candidate state, which is used to rank it in the beam, by the length of its transition history. The new figure-of-merit is the arithmetic mean of the candidate's transition scores, where previously the figure-of-merit was the sum of the candidate's transition scores.

Interestingly, Zhu et al. (2013) report that they tried exactly this, and that it was less effective than their solution. We found that the features associated with the IDLE transition were uninformative (the state is at termination, so the stack and buffer are empty), and had nothing to do with how many edit transitions were earlier applied.
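In other words, where a candidate with transition scores s1 ... sk was previously ranked by their sum, it is now ranked by their mean, so paths lengthened by Edit transitions gain no advantage simply for being longer. A one-line rendering of the change:

```python
def figure_of_merit(transition_scores):
    # Previously: return sum(transition_scores)
    return sum(transition_scores) / len(transition_scores)
```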
5 Features for the Joint Parser

Our baseline parser uses the feature set described by Zhang and Nivre (2011). The feature set contains 73 templates that mostly refer to the properties of 12 context tokens: the top of the stack (S0), its two leftmost and rightmost children (S0L, S0L2, S0R, S0R2), its parent and grandparent (S0h, S0h2), the first word of the buffer and its two leftmost children (N0, N0L, N0LL), and the next two words of the buffer (N1, N2). Atomic features consist of the word, part-of-speech tag, or dependency label for these tokens; and multiple feature atoms are often combined for feature templates. There are also features for the string-distance between S0 and N0, and the left and right valencies (total number of children) of S0 and N0, as well as the set of their children's dependency labels. We restrict these to the first and last 2 children for implementation efficiency, as we found this had no effect on accuracy. Numeric features (for distance and valency) are binned with the function λx: min(x, 5). There is only one bi-lexical feature template, which pairs the words of S0 and N0. There are also ten tri-tag templates.

Our feature set includes additional dependency label features not used by Zhang and Nivre (2011), as we found that disfluency detection errors often resulted in ungrammatical dependency label combinations. The additional templates combine the POS tag of S0 with two or three dependency labels from its left and right subtrees. Details can be found in the supplementary material.

5.1 Brown Cluster Features

The Brown clustering algorithm (Brown et al., 1992) is a well known source of semi-supervised features. The clustering algorithm is run over a large sample of unlabelled data, to generate a type-to-cluster map. This mapping is then used to generate features that sometimes generalise better than lexical features, and are helpful for out-of-vocabulary words (Turian et al., 2010).

Koo and Collins (2010) found that Brown cluster features greatly improved the performance of a graph-based dependency parser. On our transition-based parser, Brown cluster features bring a small but statistically significant improvement on the WSJ task (0.1-0.3% UAS). Other developers of transition-based parsers seem to have found similar results (personal communication). Since a Brown cluster mapping computed by Liang (2005) is easily available,³ the features are simple to implement and cheap to compute.

Our templates follow Koo and Collins (2010) in including features that refer to cluster prefix strings, as well as the full clusters. We adapt their templates to transition-based parsing by replacing 'head' with 'item on top of the stack' and 'child' with 'first word of the buffer'. The exact templates can be found in the supplementary material.

The Brown cluster features are used in our 'baseline' parser, and in the parsers we use as part of our pipeline systems. They improved development set accuracy by 0.4%. We experimented with the other feature sets in these parsers, but found that they did not improve accuracy on fluent text.

³http://www.metaoptimize.com/projects/wordreps
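A sketch of what such cluster features can look like, adapted to the stack-top (S0) and buffer-front (N0) tokens, is shown below. The prefix lengths and feature-name scheme are illustrative assumptions; the exact templates are given in the paper's supplementary material.

```python
# Illustrative Brown cluster features over S0 and N0. 'clusters' maps a
# word to its cluster bit-string, from a pre-computed mapping such as
# Liang (2005)'s.

def brown_features(s0_word, n0_word, clusters, prefix_lengths=(4, 6)):
    c0 = clusters.get(s0_word, "")
    c1 = clusters.get(n0_word, "")
    feats = {"S0c=" + c0: 1.0, "N0c=" + c1: 1.0}       # full clusters
    for p in prefix_lengths:                           # cluster prefixes
        feats["S0c%d=%s" % (p, c0[:p])] = 1.0
        feats["N0c%d=%s" % (p, c1[:p])] = 1.0
        feats["S0c%d+N0c%d=%s_%s" % (p, p, c0[:p], c1[:p])] = 1.0
    return feats
```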
5.2 Rough Copy Features

Johnson and Charniak (2004) point out that in speech repairs, the repair is often a 'rough copy' of the reparandum. The simplest case of this is where the repair is a single word repetition. It is common for the repair to differ from the reparandum by insertion, deletion or substitution of one or more words. To capture this regularity, we first extend the feature-set with three new context tokens:⁴

1. S0re: the rightmost edge of S0's descendants;
2. S0le: the leftmost edge of S0's descendants;
3. N0le: the leftmost edge of N0's descendants.

If a word has no leftward children, it will be its own left-edge, and similarly it will be its own rightward edge if it has no rightward children. Note that the token S0re is necessarily immediately before N0le, unless some of the tokens between them are disfluent. We use the S0le and N0le to compute the following rough-copy features:

1. How long is the prefix word match between S0le ... S0 and N0le ... N0? If the parser were analysing 'the red the blue square', with red on the stack and square at N0, its value would be 1.
2. How long is the prefix POS tag match between S0le ... S0 and N0le ... N0?
3. Do the words in S0le ... S0 and N0le ... N0 match exactly?
4. Do the POS tags in S0le ... S0 and N0le ... N0 match exactly? If the parser were analysing 'the red square the blue rectangle', with square on the stack and rectangle at N0, its value would be true.

The prefix-length features are binned using the function λx: min(x, 5).

⁴As is common in this type of parser, our implementation has a number of vectors for properties that are defined before parsing, such as word forms, POS tags, Brown clusters, etc. A context token is an index into these vectors, allowing features considering these properties to be computed.
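These four features can be computed directly from the two spans. The sketch below is our rendering, with span extraction from the parser state assumed; on 'the red | the blue square' it returns a word-prefix match of 1, as in the example above.

```python
def prefix_match(a, b):
    """Length of the common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def rough_copy_features(s0_span, n0_span):
    """Each span is the list of (word, tag) pairs S0le..S0 or N0le..N0."""
    words0, tags0 = zip(*s0_span)
    words1, tags1 = zip(*n0_span)
    return {
        "word_prefix=%d" % min(prefix_match(words0, words1), 5): 1.0,
        "tag_prefix=%d" % min(prefix_match(tags0, tags1), 5): 1.0,
        "words_exact=%s" % (words0 == words1): 1.0,
        "tags_exact=%s" % (tags0 == tags1): 1.0,
    }

# rough_copy_features([("the", "DT"), ("red", "JJ")],
#                     [("the", "DT"), ("blue", "JJ"), ("square", "NN")])
# -> {'word_prefix=1': 1.0, 'tag_prefix=2': 1.0,
#     'words_exact=False': 1.0, 'tags_exact=False': 1.0}
```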
5.3 Match Features

This class of features asks which pairs of the context tokens match, in word or POS tag. The context tokens in the Zhang and Nivre (2011) feature set are the top of the stack (S0), its head and grandparent (S0h, S0h2), its two left- and rightmost children (S0L, S0L2, S0R, S0R2), the first three words of the buffer (N0, N1, N2), and the two leftmost children of N0 (N0L, N0LL). We extend this set with the S0le, S0re and N0le tokens described above, and also the first left and right child of S0 and N0 (S0L0, S0R0, N0L0). All up, there are 18 context tokens, so there are (18 choose 2) = 153 token pairs. For each pair of these tokens, we add two binary features, indicating whether the two tokens match in word form or POS tag.

We also have two further classes of features: if the words do match, a feature is added indicating the word form; if the tags match, a feature is added indicating the tag. These finer grained versions help the model adjust for the fact that some words can be duplicated in grammatical sentences (e.g. 'that that'), while most rare words cannot.

5.4 Edited Neighbour Features

Disfluencies are usually string contiguous, even if they do not form a single constituent. In these situations, our model has to make multiple transitions to mark a single disfluency. For instance, if an utterance begins 'and the and a', the stack will contain two entries, for 'and' and 'the', and two Edit transitions will be required.

To mitigate this disadvantage of our model, we add four binary features. Two fire when the word or pair of words immediately preceding N0 have been marked disfluent; the other two fire when the word or pair of words immediately following S0 have been marked disfluent. These features provide an additional string-based view that the parser would otherwise be missing. Speakers tend to be disfluent in bursts: if the previous word is disfluent, the next word is more likely to be disfluent. These four features are therefore all associated with positive weights for the Edit transition. Without these features, we would miss an aspect of disfluency processing that sequence models naturally capture.
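The sketch below illustrates both feature classes. The context-token lookup and the naming scheme are our assumptions (a missing context token is represented as None).

```python
from itertools import combinations

def match_features(context):
    """context: dict mapping token names ('S0', 'N0le', ...) to a
    (word, tag) pair, or None when the token is absent."""
    feats = {}
    for (n1, t1), (n2, t2) in combinations(sorted(context.items()), 2):
        if t1 is None or t2 is None:
            continue
        if t1[0] == t2[0]:                       # word-form match
            feats["%s~%s_w" % (n1, n2)] = 1.0
            feats["%s~%s_w=%s" % (n1, n2, t1[0])] = 1.0  # finer-grained
        if t1[1] == t2[1]:                       # POS tag match
            feats["%s~%s_t" % (n1, n2)] = 1.0
            feats["%s~%s_t=%s" % (n1, n2, t1[1])] = 1.0
    return feats

def neighbour_features(s0, n0, disfluent):
    """The four edited-neighbour indicators: are the one or two words
    immediately before N0, or immediately after S0, marked disfluent?"""
    return {
        "pre1": float(n0 - 1 in disfluent),
        "pre2": float({n0 - 1, n0 - 2} <= disfluent),
        "post1": float(s0 + 1 in disfluent),
        "post2": float({s0 + 1, s0 + 2} <= disfluent),
    }
```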
6 Part-of-Speech Tagging

We adopt the standard strategy of using a POS tagger as a pre-process before parsing. Most transition-based parsers use a structured averaged perceptron model with beam-search for tagging, as this model achieves competitive accuracy and matches the standard dependency parsing architecture. Our tagger also uses this architecture.

We performed some additional feature engineering for the tagger, in order to improve its accuracy given the lack of case distinctions and punctuation in the data. Our additional features use two sources of unsupervised information. First, we follow the suggestion of Manning (2011) by using Brown cluster features to improve the tagger's accuracy on unknown words. Second, we compensate for the lack of case distinctions by including features that ask what percentage of the time a word form was seen title-cased, upper-cased and lower-cased in the Google Web 1T corpus.

Where most previous work uses cross-fold training for the tagger, to ensure that the parser is trained on tags that reflect run-time accuracies, we do online training of the tagger alongside the parser, using the current tagger model to produce tags during parser training. This had no impact on parse accuracy, and made it slightly easier to develop our tagger alongside the parser.

The tagger achieved 96.5% accuracy on the development data, but when we ran our final test experiments, we found its accuracy dropped to 96.0%, indicating some over-fitting during our feature engineering. On the development data, our parser accuracy improves by about 1% when gold-standard tags are used.

7 Experiments

We use the Switchboard portion of the Penn Treebank (Marcus et al., 1993), as described in Section 2, to train our joint models and evaluate them on dependency parsing and disfluency detection. The pre-processing and dependency conversion are described in Section 2.1. We use the standard train/dev/test split from Charniak and Johnson (2001): Sections 2 and 3 for training, and Section 4 divided into three held-out sections, the first of which is used for final evaluation.

Our parser evaluation uses the SPARSEVAL (Roark et al., 2006) metric. However, we wanted to use the Stanford dependency converter, for the reasons described in Section 2.1, so we used our own implementation. Because we do not need to deal with recognition errors, we do not need to report our parsing results using P/R/F-measures. Instead, we report an unlabelled accuracy score, which refers to the percentage of fluent words whose governors were assigned correctly. Note that words marked as disfluent cannot have any incoming or out-going dependencies,
so if a word is incorrectly marked as disfluent, all of its dependencies will be incorrect.

We follow Johnson and Charniak (2004) and others in restricting our disfluency evaluation to speech repairs, which we identify as words that have a node labelled EDITED as an ancestor. Unlike most other disfluency detection research, we train only on the MRG files, giving us 619,236 words of training data instead of the 1,482,845 used by the pipeline systems. It may be possible to improve our system's disfluency detection by leveraging the additional data that does not have syntactic annotation in some way.

All parsing models were trained for 15 iterations. We found that optimising the number of iterations on a development set led to small improvements that did not transfer to a second development set (part of Section 4, which Charniak and Johnson (2001) reserved for 'future use').

We test for statistical significance in our results by training 20 models for each experimental configuration, using different random seeds. The random seeds control how the sentences are shuffled during training, which the perceptron model is quite sensitive to. We use the Wilcoxon rank-sums non-parametric test. The standard deviation in UAS for a sample was typically around 0.05%, and 0.5% for disfluency F-measure.
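This procedure amounts to comparing two samples of 20 scores per configuration; a minimal sketch with scipy is shown below (the score lists are placeholders for illustration, not the paper's per-seed results).

```python
from scipy.stats import ranksums

# One UAS figure per training run (placeholder values).
baseline_uas = [89.91, 89.86, 89.95, 89.88, 89.93]
joint_uas = [90.88, 90.92, 90.85, 90.94, 90.90]

statistic, p_value = ranksums(joint_uas, baseline_uas)
print("Wilcoxon rank-sums: statistic=%.2f p=%.4f" % (statistic, p_value))
```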
All of our models use beam-search decoding, with a beam width of 32. We found that a beam width of 64 brought a very small accuracy improvement (about 0.1%), at the cost of 50% slower run-time. Wider beams than this brought no accuracy improvement. Accuracy seems to plateau with slightly narrower beams than on newswire text. This is probably due to the shorter sentences in Switchboard.

The baseline and pipeline systems are configured in the same way, except that the baseline parser is modified slightly to allow it to predict disfluencies, using a special dependency label, ERASED. All descendants of a word attached to its head by this label are marked as disfluent. Both the baseline and pipeline/oracle parsers use the same feature set: the Zhang and Nivre (2011) features, plus our Brown cluster features. The baseline system is a standard arc-eager transition-based parser with a structured averaged perceptron model and beam-search decoding. The model is trained in the standard way, with a 'static' oracle and maximum-violation update, following Huang et al. (2012).

7.1 Comparison with Pipeline Approaches

The accuracy of incremental dependency parsers is well established on the Wall Street Journal, but there are no dependency parsing results in the literature that make it easy to put our joint model's parsing accuracy into context. We therefore compare our joint model to two pipeline systems, which consist of a disfluency detector, followed by our dependency parser. We also evaluate parse accuracies after oracle pre-processing, to gauge the net effect of disfluencies on our parser's accuracy.

The dependency parser for the pipeline systems was trained on text with all disfluencies removed, following Charniak and Johnson (2001). The two disfluency detection systems we used were the Qian and Liu (2013) sequence-tagging model, and a version of the Johnson and Charniak (2004) noisy channel model, using the Charniak (2001) syntactic language model and the reranking features of Zwarts and Johnson (2011). They are the two best published disfluency detection systems.

8 Results

Table 1 shows the development set accuracies for our joint parser. Both the disfluency features and the Edit transition make statistically significant improvements, in disfluency F-measure, unlabelled attachment score (UAS), and labelled attachment score (LAS).

The Oracle pipeline system, which uses the gold-standard to clean disfluencies prior to parsing, shows the total impact of speech-errors on the parser. The baseline parser, which uses the Zhang and Nivre (2011) feature set plus the Brown cluster features, scores 1.8% UAS lower than the oracle. When we add the features described in Sections 5.2, 5.3 and 5.4, the gap is reduced to 1.2% (+Features). Finally, the improved transition system reduces the gap further still, to 0.8% UAS (+Edit transition). We also tested these features in the Oracle parser, but found they were ineffective on fluent text.

The w/s column shows the tokens analysed per second for each system, including disfluencies, with a single thread on a 2.4GHz Intel Xeon. The additional features reduce efficiency, but the non-monotonic Edit transition does not. The system is easily efficient enough for real-time use.
                   P      R      F      UAS    LAS    w/s
Baseline joint     79.4   70.1   74.5   89.9   86.9   711
+Features          86.0   77.2   81.3   90.5   87.5   539
+Edit transition   92.2   80.2   85.8   90.9   87.9   555
Oracle pipeline    100    100    100    91.7   88.6   782

Table 1: Development results for the joint models. For the baseline model, disfluencies reduce parse accuracy by 1.8% Unlabelled Attachment Score (UAS). Our features and Edit transition reduce the gap to 0.8%, and improve disfluency detection by 11.3% F-measure.

                          Disfl. F   UAS
Johnson et al. pipeline   82.1       90.3
Qian and Liu pipeline     83.9       90.1
Baseline joint parser     73.9       89.4
Final joint parser        84.1       90.5

Table 2: Test-set parse and disfluency accuracies. The joint parser is improved by the features and Edit transition, and is better than pre-processing the text with state-of-the-art disfluency detectors.

Table 2 shows the final evaluation. Our main comparison is with the two pipeline systems, described in Section 7.1. The Johnson and Charniak (2004) system was 1.8% less accurate at disfluency detection than the other disfluency detector we evaluated, the state-of-the-art Qian and Liu (2013) system. However, when we evaluated the two systems as pre-processors before our parser, we found that the Johnson et al. pipeline achieved 0.2% better unlabelled attachment score than the Qian and Liu pipeline. We attribute this to the use of the Charniak (2001) syntactic language model in the Johnson et al. pipeline, which would help the system produce more syntactically consistent output.

Our joint model achieved an unlabelled attachment score of 90.5%, out-performing both pipeline systems. The Baseline joint parser, which did not include the Edit transition or disfluency features, scores 1.1% below the Final joint parser. All of the parse accuracy differences were found to be statistically significant (p < 0.001). The Edit transition and disfluency features together brought a 10.1% improvement in disfluency F-measure, which was also found to be statistically significant. The final joint parser achieved 0.2% higher disfluency detection accuracy than the previous state-of-the-art, the Qian and Liu (2013) system,⁵ despite having approximately half as much training data (we require syntactic annotation, for which there is less data).

⁵Our scores refer to an updated version of the system that corrects minor pre-processing problems. We thank Qian Xian for making his code available.

Our significance testing regime involved using 20 different random seeds when training each of our models, which the perceptron algorithm is sensitive to. This could not be applied to the other two disfluency detectors, so we cannot test those differences for significance. However, we note that the 20 samples for our disfluency detector ranged in accuracy from 83.3-84.6, so we doubt that the 0.2% mean improvement over the Qian and Liu (2013) result is meaningful.

Although we did not systematically optimise on the development set, our test scores are lower than our development accuracies. Much of the over-fitting seems to be in the POS tagger, which dropped in accuracy by 0.5%.

9 Analysis of Edit Behaviour

In order to understand how the parser applies the Edit transition, we collected some additional statistics over the development data. The parser predicted 2,558 Edit transitions, which together marked 2,706 words disfluent (2,495 correctly). The Edit transition can mark multiple words disfluent when S0 has one or more rightward descendants. It turns out this case is uncommon; the parser largely assigns disfluency labels word-by-word, only sometimes marking words with rightward descendents as disfluent.

Of the 2,558 Edit transitions, there were 682 cases where at least one leftward child was returned to the stack, and the total number of leftward children returned was 1,132. The most common type of construction that caused the parser to return words to the stack were disfluent predicates, which often have subjects and discourse conjunctions as leftward children. An example of a disfluent predicate with a fluent subject is shown in Figure 4.

There were only 48 cases of the same word being returned to the stack twice.
The possibility of words being returned to the stack multiple times is what gives our system worse than linear worst-case complexity. In the worst case, the i-th word of a sentence of length n could be returned to the stack n−(i+1) times. Empirically, the Edit transition made no difference to run-time.

Once a word has been returned to the stack by the Edit transition, how does the parser end up analysing it? If it turned out that almost all of the former leftward children of disfluent words are subsequently marked as disfluent, there would be
little point in returning them to the stack; we could just mark them as disfluent in the original Edit transition. On the other hand, if they are almost all marked as fluent, perhaps they can just be attached as children to the first word of the buffer.

In fact the two cases are almost equally common. Of the 1,132 words returned to the stack, 547 were subsequently marked disfluent, and 584 were not. The parser was also quite accurate in its decisions over these tokens. Of the 547 tokens marked disfluent, 500 were correct, similar to the overall development set precision, 92.2%.

Accuracy over the words returned to the stack might be improved in future by features referring to their former heads. For instance, in 'He went broke uh became bankrupt', we do not currently have features that record the deleted dependency between he and went. We thank one of the anonymous reviewers for this suggestion.

10 Related Work

The most similar system to ours was published very recently. Rasooli and Tetreault (2013) describe a joint model of dependency parsing and disfluency detection. They introduce a second classification step, where they first decide whether to apply a disfluency transition, or a regular parsing move. Disfluency transitions operate either over a sequence of words before the start of the buffer, or a sequence of words from the start of the buffer forward. Instead of the dynamic oracle training method that we employ, they use a two-stage bootstrap-style process.

Direct comparison between our model and theirs is difficult, as they use the Penn2MALT scheme, and their parser uses greedy decoding, where we use beam search. They also use gold-standard part-of-speech tags, which would improve our scores by around 1%. The use of beam-search may explain much of our performance advantage: they report an unlabelled attachment score of 88.6, and a disfluency detection F-measure of 81.4%. Our training algorithm would be applicable to a beam-search version of their parser, as their transition-system also introduces substantial spurious ambiguity, and some non-monotonic behaviour.

A hybrid transition system would also be possible, as the two types of Edit transition seem to be complementary. The Rasooli and Tetreault system offers a token-based view of disfluencies, which is useful for examples such as 'and the and the', which would require two applications of our transition. On the other hand, our Edit transition may have the advantage for more syntactically complicated examples, particularly for disfluent verbs.

The importance of syntactic features for disfluency detection was demonstrated by Johnson and Charniak (2004). Despite this, most subsequent work has used sequence models, rather than syntactic parsers. The other disfluency system that we compare our model to, developed by Qian and Liu (2013), uses a cascade of Maximum Margin Markov Models to perform disfluency detection with minimal syntactic information.

One motivation for sequential approaches is that most applications of these models will be over unsegmented text, as segmenting unpunctuated text is a difficult task that benefits from syntactic features (Zhang et al., 2013). We consider the most promising aspect of our system to be that it is naturally incremental, so it should be straightforward to extend the system to operate on unsegmented text in subsequent work. Due to its use of syntactic features, from the joint model, the system is substantially more accurate than the previous state-of-the-art in incremental disfluency detection, 77% (Zwarts et al., 2010).

11 Conclusion

We have presented an efficient and accurate joint model of dependency parsing and disfluency detection. The model out-performs pipeline approaches using state-of-the-art disfluency detectors, and is highly efficient, processing over 550 tokens a second. Because the system is incremental, it should be straight-forward to apply it to unsegmented text. The success of an incremental, non-monotonic parser at disfluent speech parsing may also be of some psycholinguistic interest.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments.
This research was supported under the Australian Research Council's Discovery Projects funding scheme (project numbers DP110102506 and DP110102593).

References

Miguel Ballesteros and Joakim Nivre. 2013. Going to the roots of dependency parsing. Computational Linguistics, 39(1).
Peter F. Brown, Peter V. de Souza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18:467–479.

Eugene Charniak. 2001. Immediate-head parsing for language models. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 124–131. Association for Computational Linguistics, Toulouse, France.

Eugene Charniak and Mark Johnson. 2001. Edit detection and parsing for transcribed speech. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics, pages 118–126. The Association for Computational Linguistics.

Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 173–180. Association for Computational Linguistics, Ann Arbor, Michigan.

Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages 1–8. Association for Computational Linguistics.

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC).

Lyn Frazier and Keith Rayner. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14(2):178–210.

Yoav Goldberg and Joakim Nivre. 2012. A dynamic oracle for arc-eager dependency parsing. In Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012). Association for Computational Linguistics, Mumbai, India.

Matthew Honnibal, Yoav Goldberg, and Mark Johnson. 2013. A non-monotonic arc-eager transition system for dependency parsing. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 163–172. Association for Computational Linguistics, Sofia, Bulgaria.

Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151. Association for Computational Linguistics, Montréal, Canada.

Mark Johnson and Eugene Charniak. 2004. A TAG-based noisy channel model of speech repairs. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pages 33–39.

Douglas A. Jones, Florian Wolf, Edward Gibson, Elliott Williams, Evelina Fedorenko, Douglas A. Reynolds, and Marc A. Zissman. 2003. Measuring the readability of automatic speech-to-text transcripts. In INTERSPEECH. ISCA.

Fredrik Jorgensen. 2007. The effects of disfluency detection in parsing spoken language. In Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek, and Mare Koit, editors, Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007, pages 240–244.

Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1–11.

Percy Liang. 2005. Semi-supervised learning for natural language. Ph.D. thesis, MIT.

Christopher D. Manning. 2011. Part-of-speech tagging from 97% to 100%: Is it time for some linguistics? In Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing - Volume Part I, CICLing'11, pages 171–189. Springer-Verlag, Berlin, Heidelberg.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing.
In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34:513–553.

Xian Qian and Yang Liu. 2013. Disfluency detection using multi-step stacked learning. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 820–825. Association for Computational Linguistics, Atlanta, Georgia.

Mohammad Sadegh Rasooli and Joel Tetreault. 2013. Joint parsing and disfluency detection in linear time. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 124–129. Association for Computational Linguistics, Seattle, Washington, USA.

Brian Roark, Mary Harper, Eugene Charniak, Bonnie Dorr, Mark Johnson, Jeremy Kahn, Yang Liu, Mary Ostendorf, John Hale, Anna Krasnyanskaya, Matthew Lease, Izhak Shafran, Matthew Snover, Robin Stewart, and Lisa Yung. 2006. SParseval: Evaluation metrics for parsing speech. In Proceedings of the Language Resources and Evaluation Conference, pages 333–338. European Language Resources Association (ELRA), Genoa, Italy.

Francesco Sartorio, Giorgio Satta, and Joakim Nivre. 2013. A transition-based dependency parser using a dynamic parsing strategy. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 135–144. Association for Computational Linguistics, Sofia, Bulgaria.

Elizabeth Shriberg. 1994. Preliminaries to a Theory of Speech Disfluencies. Ph.D. thesis, University of California, Berkeley.

Xu Sun, Takuya Matsuzaki, Daisuke Okanohara, and Jun'ichi Tsujii. 2009. Latent variable perceptron algorithm for structured classification. In IJCAI, pages 1236–1242.

Joseph Turian, Lev-Arie Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394. Association for Computational Linguistics, Uppsala, Sweden.

Dongdong Zhang, Shuangzhi Wu, Nan Yang, and Mu Li. 2013. Punctuation prediction with transition-based parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 752–760. Association for Computational Linguistics, Sofia, Bulgaria.

Yue Zhang and Stephen Clark. 2011. Syntactic processing using the generalized perceptron and beam search. Computational Linguistics, 37(1):105–151.

Yue Zhang and Joakim Nivre. 2011. Transition-based dependency parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 188–193. Association for Computational Linguistics, Portland, USA.

Muhua Zhu, Yue Zhang, Wenliang Chen, Min Zhang, and Jingbo Zhu. 2013. Fast and accurate shift-reduce constituent parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 434–443. Association for Computational Linguistics, Sofia, Bulgaria.

Simon Zwarts and Mark Johnson. 2011. The impact of language models and loss functions on repair disfluency detection. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 703–711. Association for Computational Linguistics, Portland, USA.

Simon Zwarts, Mark Johnson, and Robert Dale. 2010. Detecting speech repairs incrementally using a noisy channel approach. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 1371–1378. Coling 2010 Organizing Committee, Beijing, China.