Transactions of the Association for Computational Linguistics, 1 (2013) 13–24. Action Editor: Giorgio Satta.
Submitted 11/2012; Published 3/2013. © 2013 Association for Computational Linguistics.
Finding Optimal 1-Endpoint-Crossing Trees

Emily Pitler, Sampath Kannan, Mitchell Marcus
Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104
{epitler,kannan,mitch}@seas.upenn.edu

Abstract

Dependency parsing algorithms capable of producing the types of crossing dependencies seen in natural language sentences have traditionally been orders of magnitude slower than algorithms for projective trees. For 95.8-99.8% of dependency parses in various natural language treebanks, whenever an edge is crossed, the edges that cross it all have a common vertex. The optimal dependency tree that satisfies this 1-Endpoint-Crossing property can be found with an $O(n^4)$ parsing algorithm that recursively combines forests over intervals with one exterior point. 1-Endpoint-Crossing trees also have natural connections to linguistics and another class of graphs that has been studied in NLP.

1 Introduction

Dependency parsing is one of the fundamental problems in natural language processing today, with applications such as machine translation (Ding and Palmer, 2005), information extraction (Culotta and Sorensen, 2004), and question answering (Cui et al., 2005). Most high-accuracy graph-based dependency parsers (Koo and Collins, 2010; Rush and Petrov, 2012; Zhang and McDonald, 2012) find the highest-scoring projective trees (in which no edges cross), despite the fact that a large proportion of natural language sentences are non-projective. Projective trees can be found in $O(n^3)$ time (Eisner, 2000), but cover only 63.6% of sentences in some natural language treebanks (Table 1).

The class of directed spanning trees covers all treebank trees and can be parsed in $O(n^2)$ with edge-based features (McDonald et al., 2005), but it is NP-hard to find the maximum scoring such tree with grandparent or sibling features (McDonald and Pereira, 2006; McDonald and Satta, 2007).

There are various existing definitions of mildly non-projective trees with better empirical coverage than projective trees that do not have the hardness of extensibility that spanning trees do. However, these have had parsing algorithms that are orders of magnitude slower than the projective case or the edge-based spanning tree case. For example, well-nested dependency trees with block degree 2 (Kuhlmann, 2013) cover at least 95.4% of natural language structures, but have a parsing time of $O(n^7)$ (Gómez-Rodríguez et al., 2011).

No previously defined class of trees simultaneously has high coverage and low-degree polynomial algorithms for parsing, allowing grandparent or sibling features.

We propose 1-Endpoint-Crossing trees, in which for any edge that is crossed, all other edges that cross that edge share an endpoint. While simple to state, this property covers 95.8% or more of dependency parses in natural language treebanks (Table 1). The optimal 1-Endpoint-Crossing tree can be found in faster asymptotic time than any previously proposed mildly non-projective dependency parsing algorithm. We show how any 1-Endpoint-Crossing tree can be decomposed into isolated sets of intervals with one exterior point (Section 3). This is the key insight that allows efficient parsing; the $O(n^4)$ parsing algorithm is presented in Section 4. 1-Endpoint-Crossing trees are a subclass of 2-planar graphs (Section 5.1), a class that has been studied
in NLP. 1-Endpoint-Crossing trees also have some linguistic interpretation (pairs of cross serial verbs produce 1-Endpoint-Crossing trees, Section 5.2).

2 Definitions of Non-Projectivity

Definition 1. Edges e and f cross if e and f have distinct endpoints and exactly one of the endpoints of f lies between the endpoints of e.

Definition 2. A dependency tree is 1-Endpoint-Crossing if for any edge e, all edges that cross e share an endpoint p.

Table 1 shows the percentage of dependency parses in the CoNLL-X training sets that are 1-Endpoint-Crossing trees. Across six languages with varying amounts of non-projectivity, 95.8-99.8% of dependency parses in treebanks are 1-Endpoint-Crossing trees.¹

We next review and compare other relevant definitions of non-projectivity from prior work: well-nested with block degree 2, gap-minding, projective, and 2-planar.

The definitions of block degree and well-nestedness are given below:

Definition 3. For each node u in the tree, a block of the node is “a longest segment consisting of descendants of u.” (Kuhlmann, 2013). The block-degree of u is “the number of distinct blocks of u”. The block degree of a tree is the maximum block degree of any of its nodes. The gap degree is the number of gaps between these blocks, and so by definition is one less than the block degree. (Kuhlmann, 2013)

Definition 4. Two trees “T1 and T2 interleave iff there are nodes l1, r1 ∈ T1 and l2, r2 ∈ T2 such that l1
[Figure 8: An example of wh-movement over a potentially unbounded number of clauses. The edges between the heads of each clause cross the edges from trace to trace, but all obey 1-Endpoint-Crossing.]

Endpoint-Crossing. Psycholinguistically, between two and three verbs is exactly where there is a large change in the sentence processing abilities of human listeners (based on both grammatical judgments and scores on a comprehension task) (Bach et al., 1986).

More speculatively, there may be a connection between the form of 1-Endpoint-Crossing trees and phases (roughly, propositional units such as clauses) in Minimalism (Chomsky et al., 1998). Figure 8 shows an example of wh-movement over a potentially unbounded number of clauses. The phase-impenetrability condition (PIC) states that only the head of the phase and elements that have moved to its edge are accessible to the rest of the sentence (Chomsky et al., 1998, p. 22). Movement is therefore required to be successive cyclic, with a moved element leaving a chain of traces at the edge of each clause on its way to its final pronounced location (Chomsky, 1981). In Figure 8, notice that the crossing edges form a repeated pattern that obeys the 1-Endpoint-Crossing property. More generally, we suspect that trees satisfying the PIC will tend to also be 1-Endpoint-Crossing. Furthermore, if the traces were not at the edge of each clause, and instead were positioned between a head and one of its arguments, 1-Endpoint-Crossing would be violated. For example, if t2 in Figure 8 were between C and said2, then the edge (t1, t2) would cross (say, said1), (said1, said2), and (C, said2), which do not all share an endpoint. An exploration of these linguistic connections may be an interesting avenue for further research.

6 Conclusions

1-Endpoint-Crossing trees characterize over 95% of structures found in natural language treebanks, and can be parsed in only a factor of n more time than projective trees. The dynamic programming algorithm for projective trees (Eisner, 2000) has been extended to handle higher order factors (McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010), adding at most a factor of n to the edge-based running time; it would be interesting to extend the algorithm presented here to include higher order factors.

1-Endpoint-Crossing is a condition on edges, while properties such as well-nestedness or block degree are framed in terms of subtrees. Three edges will always suffice as a certificate of a 1-Endpoint-Crossing violation (two vertex-disjoint edges that both cross a third). In contrast, for a property like ill-nestedness, two nodes might have a least common ancestor arbitrarily far away, and so one might need the entire graph to verify whether the sub-trees rooted at those nodes are disjoint and ill-nested. We have discussed cross-serial dependencies; a further exploration of which linguistic phenomena would and would not have 1-Endpoint-Crossing dependency trees may be revealing.

Acknowledgments

We would like to thank Julie Legate for an interesting discussion. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship, NSF Award CCF 1137084, and Army Research Office MURI grant W911NF-07-1-0216.

A Dynamic Program to find the maximum scoring 1-Endpoint-Crossing Tree

Input: Matrix $S$: $S[i,j]$ is the score of the directed edge $(i,j)$

Output: Maximum score of a 1-Endpoint-Crossing tree over vertices $[0,n]$, rooted at 0

Init: $\forall i$: $\mathrm{Int}[i,i,F,F] = \mathrm{Int}[i,i+1,F,F] = 0$ and $\mathrm{Int}[i,i,T,F] = \mathrm{Int}[i,i,F,T] = \mathrm{Int}[i,i,T,T] = -\infty$

Final: $\mathrm{Int}[0,n,F,T]$

Shorthand for booleans:

\[
\mathrm{TF}(x, S) :=
\begin{cases}
\text{exactly one of the set } S \text{ is true} & \text{if } x = T \\
\text{all of the set } S \text{ must be false} & \text{if } x = F
\end{cases}
\]

$b_i$, $b_j$, $b_x$ are true iff the corresponding boundary point has its incoming edge (parent) in that sub-problem. For the LR sub-problem, $b_i$ and $b_j$ are always false, and so omitted. For all sub-problems with the suffix AFromB, the boundary point A has its parent edge in the sub-problem solution; the other two boundary points do not. For example, L_XFromI would correspond to having booleans $b_i = b_j = F$ and $b_x = T$, with the restriction that x must be a descendant of i.
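The boolean shorthand and the base cases can be made concrete with a small sketch. The helper names `tf`, `tf_assignments`, and `init_int`, and the choice of a dictionary keyed by `(i, j, b_i, b_j)`, are illustrative assumptions and not taken from the paper; the bound `i < n` on the `Int[i,i+1,F,F]` entries is likewise assumed so that indices stay inside `[0, n]`.

```python
from itertools import product

NEG_INF = float("-inf")


def tf(x, flags):
    """TF(x, S): if x is true, exactly one flag in S is true; otherwise none are."""
    true_count = sum(bool(b) for b in flags)
    return true_count == 1 if x else true_count == 0


def tf_assignments(x, k):
    """All k-tuples of booleans satisfying TF(x, .), i.e. the range of a max over TF."""
    return [bits for bits in product((False, True), repeat=k) if tf(x, bits)]


def init_int(n):
    """Base cases of the Int table, following the Init line (hypothetical layout)."""
    table = {}
    for i in range(n + 1):
        table[(i, i, False, False)] = 0.0
        for b_i, b_j in ((True, False), (False, True), (True, True)):
            table[(i, i, b_i, b_j)] = NEG_INF
        if i < n:  # assumed bound so that i + 1 stays within [0, n]
            table[(i, i + 1, False, False)] = 0.0
    return table
```

One way to read TF in the recurrences: when several sub-problems share a boundary point, `tf_assignments(True, k)` enumerates the ways in which exactly one of the k sub-problems supplies that point's parent edge, while `tf_assignments(False, k)` yields the single assignment in which none does.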
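\[
\mathrm{Int}[i,j,F,b_j] \leftarrow \max
\begin{cases}
\mathrm{Int}[i+1,j,T,F] & \text{if } b_j = F \\
S[i,j] + \mathrm{Int}[i,j,F,F] & \text{if } b_j = T \\
\max_{k \in (i,j)} S[i,k] + \max
  \begin{cases}
  \mathrm{Int}[i,k,F,F] + \mathrm{Int}[k,j,F,b_j] \\
  \max_{\mathrm{TF}(b_j,\{b_l,b_r\})} \mathrm{LR}[i,k,j,b_l] + \mathrm{Int}[k,j,F,b_r] \\
  \max_{l \in (k,j),\ \mathrm{TF}(T,\{b_l,b_m,b_r\})} \max
    \begin{cases}
    \mathrm{R}[i,k,l,F,F,b_l] + \mathrm{Int}[k,l,F,b_m] + \mathrm{L}[l,j,k,b_r,b_j,F] \\
    \mathrm{LR}[i,k,l,b_l] + \mathrm{Int}[k,l,F,b_m] + \mathrm{Int}[l,j,b_r,b_j]
    \end{cases} \\
  \max_{l \in (i,k),\ \mathrm{TF}(T,\{b_l,b_m,b_r\})} \max
    \begin{cases}
    \mathrm{Int}[i,l,F,b_l] + \mathrm{L}[l,k,i,b_m,F,F] + \mathrm{N}[k,j,l,F,b_j,b_r] \\
    \mathrm{R}[i,l,k,F,b_l,F] + \mathrm{Int}[l,k,b_m,F] + \mathrm{L}[k,j,l,F,b_j,b_r]
    \end{cases}
  \end{cases}
\end{cases}
\]

\[ \mathrm{Int}[i,j,T,F] \leftarrow \text{symmetric to } \mathrm{Int}[i,j,F,T] \]

\[ \mathrm{Int}[i,j,T,T] \leftarrow -\infty \]

\[
\mathrm{LR}[i,j,x,b_x] \leftarrow \max
\begin{cases}
\mathrm{L}[i,j,x,F,F,b_x] \\
\mathrm{R}[i,j,x,F,F,b_x] \\
\max_{k \in (i,j),\ \mathrm{TF}(b_x,\{b_{xl},b_{xr}\}),\ \mathrm{TF}(T,\{b_{kl},b_{kr}\})} \mathrm{L}[i,k,x,F,b_{kl},b_{xl}] + \mathrm{R}[k,j,x,b_{kr},F,b_{xr}]
\end{cases}
\]

\[
\mathrm{N}[i,j,x,b_i,b_j,F] \leftarrow \max
\begin{cases}
\mathrm{Int}[i,j,b_i,b_j] \\
S[x,i] + \mathrm{N}[i,j,x,F,b_j,F] & \text{if } b_i = T \\
S[x,j] + \mathrm{N}[i,j,x,b_i,F,F] & \text{if } b_j = T \\
\max_{k \in (i,j)} S[x,k] + \mathrm{N}[i,k,x,b_i,F,F] + \mathrm{Int}[k,j,F,b_j]
\end{cases}
\]

\[
\mathrm{N}[i,j,x,F,b_j,T] \leftarrow \max
\begin{cases}
S[i,x] + \mathrm{N}[i,j,x,F,b_j,F] \\
S[x,j] + \text{N\_XFromI}[i,j,x] & \text{if } b_j = T \\
S[j,x] + \mathrm{N}[i,j,x,F,F,F] & \text{if } b_j = F \\
S[j,x] + \mathrm{Int}[i,j,F,T] & \text{if } b_j = T \\
\max_{k \in (i,j)} S[x,k] + \text{N\_XFromI}[i,k,x] + \mathrm{Int}[k,j,F,b_j] \\
\max_{k \in (i,j)} S[k,x] + \max
  \begin{cases}
  \mathrm{Int}[i,k,F,T] + \mathrm{Int}[k,j,F,b_j] \\
  \mathrm{N}[i,k,x,F,F,F] + \mathrm{Int}[k,j,T,b_j]
  \end{cases}
\end{cases}
\]

\[ \mathrm{N}[i,j,x,T,F,T] \leftarrow \text{symmetric to } \mathrm{N}[i,j,x,F,T,T] \]

\[ \mathrm{N}[i,j,x,T,T,T] \leftarrow -\infty \]

\[
\text{N\_XFromI}[i,j,x] \leftarrow \max
\begin{cases}
S[i,x] + \mathrm{N}[i,j,x,F,F,F] \\
\max_{k \in (i,j)} \max
  \begin{cases}
  S[x,k] + \text{N\_XFromI}[i,k,x] + \mathrm{Int}[k,j,F,F] \\
  S[k,x] + \mathrm{Int}[i,k,F,T] + \mathrm{Int}[k,j,F,F]
  \end{cases}
\end{cases}
\]

\[
\text{N\_IFromX}[i,j,x] \leftarrow \max
\begin{cases}
S[x,i] + \mathrm{N}[i,j,x,F,F,F] \\
\max_{k \in (i,j)} S[x,k] + \mathrm{N}[i,k,x,T,F,F] + \mathrm{Int}[k,j,F,F]
\end{cases}
\]

\[ \text{N\_XFromJ}[i,j,x] \leftarrow \text{symmetric to } \text{N\_XFromI}[i,j,x] \]

\[ \text{N\_JFromX}[i,j,x] \leftarrow \text{symmetric to } \text{N\_IFromX}[i,j,x] \]

\[
\mathrm{L}[i,j,x,b_i,b_j,F] \leftarrow \max
\begin{cases}
\mathrm{Int}[i,j,b_i,b_j] \\
S[x,i] + \mathrm{L}[i,j,x,F,b_j,F] & \text{if } b_i = T \\
S[x,j] + \mathrm{L}[i,j,x,b_i,F,F] & \text{if } b_j = T \\
\max_{k \in (i,j),\ \mathrm{TF}(b_i,\{b_l,b_r\})} S[x,k] + \max
  \begin{cases}
  \mathrm{L}[i,k,x,b_l,F,F] + \mathrm{N}[k,j,i,F,b_j,b_r] \\
  \mathrm{Int}[i,k,b_l,F] + \mathrm{L}[k,j,i,F,b_j,b_r]
  \end{cases}
\end{cases}
\]

\[
\mathrm{L}[i,j,x,F,b_j,T] \leftarrow \max
\begin{cases}
S[i,x] + \mathrm{L}[i,j,x,F,b_j,F] \\
S[x,j] + \text{L\_XFromI}[i,j,x] & \text{if } b_j = T \\
S[j,x] + \mathrm{L}[i,j,x,F,F,F] & \text{if } b_j = F \\
S[j,x] + \text{L\_JFromI}[i,j,x] & \text{if } b_j = T \\
\max_{k \in (i,j)} S[x,k] + \text{L\_XFromI}[i,k,x] + \mathrm{N}[k,j,i,F,b_j,F] \\
\max_{k \in (i,j)} S[k,x] + \max
  \begin{cases}
  \text{L\_JFromI}[i,k,x] + \mathrm{N}[k,j,i,F,b_j,F] \\
  \mathrm{L}[i,k,x,F,F,F] + \mathrm{N}[k,j,i,T,b_j,F] \\
  \max_{\mathrm{TF}(T,\{b_l,b_r\})} \mathrm{Int}[i,k,F,b_l] + \mathrm{L}[k,j,i,b_r,b_j,F]
  \end{cases}
\end{cases}
\]

\[ \mathrm{L}[i,j,x,T,b_j,T] \leftarrow \text{not reachable} \]

\[
\text{L\_XFromI}[i,j,x] \leftarrow \max
\begin{cases}
S[i,x] + \mathrm{L}[i,j,x,F,F,F] \\
\max_{k \in (i,j)} S[x,k] + \text{L\_XFromI}[i,k,x] + \mathrm{N}[k,j,i,F,F,F] \\
\max_{k \in (i,j)} S[k,x] + \max
  \begin{cases}
  \text{L\_JFromI}[i,k,x] + \mathrm{N}[k,j,i,F,F,F] \\
  \mathrm{L}[i,k,x,F,F,F] + \text{N\_IFromX}[k,j,i] \\
  \mathrm{Int}[i,k,F,T] + \mathrm{L}[k,j,i,F,F,F] \\
  \mathrm{Int}[i,k,F,F] + \text{L\_IFromX}[k,j,i]
  \end{cases}
\end{cases}
\]

\[
\text{L\_IFromX}[i,j,x] \leftarrow \max
\begin{cases}
S[x,i] + \mathrm{L}[i,j,x,F,F,F] \\
\max_{k \in (i,j)} S[x,k] + \max
  \begin{cases}
  \mathrm{L}[i,k,x,T,F,F] + \mathrm{N}[k,j,i,F,F,F] \\
  \mathrm{L}[i,k,x,F,F,F] + \text{N\_XFromI}[k,j,i] \\
  \mathrm{Int}[i,k,T,F] + \mathrm{L}[k,j,i,F,F,F] \\
  \mathrm{Int}[i,k,F,F] + \text{L\_XFromI}[k,j,i]
  \end{cases}
\end{cases}
\]

\[
\text{L\_JFromX}[i,j,x] \leftarrow \max
\begin{cases}
S[x,j] + \mathrm{L}[i,j,x,F,F,F] \\
\max_{k \in (i,j)} S[x,k] + \max
  \begin{cases}
  \mathrm{L}[i,k,x,F,F,F] + \mathrm{Int}[k,j,F,T] \\
  \mathrm{Int}[i,k,F,F] + \text{L\_JFromI}[k,j,i]
  \end{cases}
\end{cases}
\]

\[
\text{L\_JFromI}[i,j,x] \leftarrow \max
\begin{cases}
\mathrm{Int}[i,j,F,T] \\
\max_{k \in (i,j)} S[x,k] + \max
  \begin{cases}
  \mathrm{L}[i,k,x,F,F,F] + \text{N\_JFromX}[k,j,i] \\
  \mathrm{Int}[i,k,F,F] + \text{L\_JFromX}[k,j,i]
  \end{cases}
\end{cases}
\]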
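A dynamic program with this many cases is easy to get subtly wrong, so an independent reference can be useful when implementing it. The sketch below is not the paper's $O(n^4)$ algorithm: it is an exponential-time search that enumerates every head assignment, keeps the directed trees rooted at 0 that satisfy Definitions 1 and 2, and returns the best score. All function names are hypothetical, `S` is assumed to be a list of lists with `S[h][d]` the score of the edge from head `h` to dependent `d`, and the root is allowed any number of children, which may differ from the paper's exact setup.

```python
from itertools import product


def crosses(e, f):
    """Definition 1: four distinct endpoints, exactly one endpoint of f strictly inside e."""
    a, b = sorted(e)
    c, d = f
    if len({a, b, c, d}) < 4:
        return False
    return (a < c < b) != (a < d < b)


def is_one_endpoint_crossing(edges):
    """Definition 2: for every edge, all edges crossing it share an endpoint."""
    for e in edges:
        crossing = [f for f in edges if crosses(e, f)]
        if crossing and not set.intersection(*(set(f) for f in crossing)):
            return False
    return True


def is_tree_rooted_at_zero(head):
    """head[v] is the parent of v for v >= 1; each vertex must reach 0 without a cycle."""
    for v in range(1, len(head)):
        seen = set()
        while v != 0:
            if v in seen:
                return False
            seen.add(v)
            v = head[v]
    return True


def best_tree_bruteforce(S):
    """Exhaustive reference search; only feasible for very small n."""
    n = len(S) - 1
    best_score, best_edges = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        head = (None,) + heads
        if not is_tree_rooted_at_zero(head):
            continue
        edges = [(head[v], v) for v in range(1, n + 1)]
        if not is_one_endpoint_crossing(edges):
            continue
        score = sum(S[h][d] for h, d in edges)
        if score > best_score:
            best_score, best_edges = score, edges
    return best_score, best_edges
```

Comparing the table entry $\mathrm{Int}[0,n,F,T]$ of an implementation against `best_tree_bruteforce(S)[0]` on random score matrices with n up to about 6 is one way to catch a mistyped boolean or index in the recurrences.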
\[ \mathrm{R}[i,j,x,b_i,b_j,F] \leftarrow \text{symmetric to } \mathrm{L}[i,j,x,b_i,b_j,F] \]

\[ \mathrm{R}[i,j,x,b_i,F,T] \leftarrow \text{symmetric to } \mathrm{L}[i,j,x,F,b_j,T] \]

\[ \mathrm{R}[i,j,x,b_i,T,T] \leftarrow \text{not reachable} \]

\[ \text{R\_XFromJ}[i,j,x] \leftarrow \text{symmetric to } \text{L\_XFromI}[i,j,x] \]

\[ \text{R\_JFromX}[i,j,x] \leftarrow \text{symmetric to } \text{L\_IFromX}[i,j,x] \]

\[ \text{R\_IFromX}[i,j,x] \leftarrow \text{symmetric to } \text{L\_JFromX}[i,j,x] \]

\[ \text{R\_IFromJ}[i,j,x] \leftarrow \text{symmetric to } \text{L\_JFromI}[i,j,x] \]

References

E. Bach, C. Brown, and W. Marslen-Wilson. 1986. Crossed and nested dependencies in German and Dutch: A psycholinguistic study. Language and Cognitive Processes, 1(4):249–262.

F. Bernhart and P. C. Kainen. 1979. The book thickness of a graph. Journal of Combinatorial Theory, Series B, 27(3):320–331.

M. Bodirsky, M. Kuhlmann, and M. Möhl. 2005. Well-nested drawings as models of syntactic structure. In Tenth Conference on Formal Grammar and Ninth Meeting on Mathematics of Language, pages 88–1. University Press.

X. Carreras. 2007. Experiments with a higher-order projective dependency parser. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL, volume 7, pages 957–961.

N. Chomsky, Massachusetts Institute of Technology. Dept. of Linguistics, and Philosophy. 1998. Minimalist inquiries: the framework. MIT occasional papers in linguistics. Distributed by MIT Working Papers in Linguistics, MIT, Dept. of Linguistics.

N. Chomsky. 1981. Lectures on Government and Binding. Dordrecht: Foris.

F. Chung, F. Leighton, and A. Rosenberg. 1987. Embedding graphs in books: A layout problem with applications to VLSI design. SIAM Journal on Algebraic Discrete Methods, 8(1):33–58.

H. Cui, R. Sun, K. Li, M. Y. Kan, and T. S. Chua. 2005. Question answering passage retrieval using dependency relations. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 400–407. ACM.

A. Culotta and J. Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 423. Association for Computational Linguistics.

Y. Ding and M. Palmer. 2005. Machine translation using probabilistic synchronous dependency insertion grammars. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 541–548. Association for Computational Linguistics.

J. Eisner. 2000. Bilexical grammars and their cubic-time parsing algorithms. In Harry Bunt and Anton Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies, pages 29–62. Kluwer Academic Publishers, October.

S. Even and A. Itai. 1971. Queues, stacks, and graphs. In Proc. International Symp. on Theory of Machines and Computations, pages 71–86.

C. Gómez-Rodríguez and J. Nivre. 2010. A transition-based parser for 2-planar dependency structures. In Proceedings of ACL, pages 1492–1501.

C. Gómez-Rodríguez, J. Carroll, and D. Weir. 2011. Dependency parsing schemata and mildly non-projective dependency parsing. Computational Linguistics, 37(3):541–586.

T. Koo and M. Collins. 2010. Efficient third-order dependency parsers. In Proceedings of ACL, pages 1–11.

M. Kuhlmann. 2013. Mildly non-projective dependency grammar. Computational Linguistics, 39(2).

R. McDonald and F. Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL, pages 81–88.

R. McDonald and G. Satta. 2007. On the complexity of non-projective data-driven dependency parsing. In Proceedings of the 10th International Conference on Parsing Technologies, pages 121–132.

R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 523–530. Association for Computational Linguistics.

E. Pitler, S. Kannan, and M. Marcus. 2012. Dynamic programming for higher order parsing of gap-minding trees. In Proceedings of EMNLP, pages 478–488.

L. A. Ringenberg. 1967. College geometry. Wiley.

A. Rush and S. Petrov. 2012. Vine pruning for efficient multi-pass dependency parsing. In Proceedings of NAACL, pages 498–507.

S. M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8(3):333–343.

H. Zhang and R. McDonald. 2012. Generalized higher-order dependency parsing with cube pruning. In Proceedings of EMNLP, pages 320–331.