Operazioni dell'Associazione per la Linguistica Computazionale, 1 (2013) 267–278. Redattore di azioni: Brian Roark.
Submitted 3/2013; Pubblicato 7/2013. C(cid:13)2013 Associazione per la Linguistica Computazionale.
EfficientParsingforHead-SplitDependencyTreesGiorgioSattaDept.ofInformationEngineeringUniversityofPadua,Italysatta@dei.unipd.itMarcoKuhlmannDept.ofLinguisticsandPhilologyUppsalaUniversity,Swedenmarco.kuhlmann@lingfil.uu.seAbstractHeadsplittingtechniqueshavebeensuccess-fullyexploitedtoimprovetheasymptoticruntimeofparsingalgorithmsforproject-ivedependencytrees,underthearc-factoredmodel.Inthisarticleweextendthesetech-niquestoaclassofnon-projectivedependencytrees,calledwell-nesteddependencytreeswithblock-degreeatmost2,whichhasbeenprevi-ouslyinvestigatedintheliterature.Wedefineastructuralpropertythatallowsheadsplittingforthesetrees,andpresenttwoalgorithmsthatim-proveovertheruntimeofexistingalgorithmsatnosignificantlossincoverage.1IntroductionMuchoftherecentworkondependencyparsinghasbeenaimedatfindingagoodbalancebetweenac-curacyandefficiency.Foroneendofthespectrum,Eisner(1997)showedthatthehighest-scoringpro-jectivedependencytreeunderanarc-factoredmodelcanbecomputedintimeO.n3/,wherenisthelengthoftheinputstring.Laterworkhasfocusedonmak-ingprojectiveparsingviableundermoreexpressivemodels(Carreras,2007;KooandCollins,2010).Atthesametime,ithasbeenobservedthatformanystandarddatasets,thecoverageofprojectivetreesisfarfromcomplete(KuhlmannandNivre,2006),whichhasledtoaninterestinparsingal-gorithmsfornon-projectivetrees.Whilenon-project-iveparsingunderanarc-factoredmodelcanbedoneintimeO.n2/(McDonaldetal.,2005),parsingwithmoreinformedmodelsisintractable(McDonaldandSatta,2007).Thishasledseveralauthorstoinvestig-ate‘mildlynon-projective’classesoftrees,withthegoalofachievingabalancebetweenexpressivenessandcomplexity(KuhlmannandNivre,2006).Inthisarticlewefocusonaclassofmildlynon-projectivedependencystructurescalledwell-nesteddependencytreeswithblock-degreeatmost2.ThisclasswasfirstintroducedbyBodirskyetal.(2005),whoshowedthatitcorresponds,inanaturalway,totheclassofderivationtreesoflexicalizedtree-adjoin-inggrammars(JoshiandSchabes,1997).Whiletherearelinguisticargumentsagainsttherestrictiontothisclass(MaierandLichte,2011;Chen-MainandJoshi,2010),KuhlmannandNivre(2006)foundthatithasexcellentcoverageonstandarddatasets.Assum-inganarc-factoredmodel,well-nesteddependencytreeswithblock-degree(cid:20)2canbeparsedintimeO.n7/usingthealgorithmofG´omez-Rodr´ıguezetal.(2011).Recentemente,Pitleretal.(2012)haveshownthatifanadditionalrestrictioncalled1-inheritisim-posed,parsingcanbedoneintimeO.n6/,withoutanyadditionallossincoverageonstandarddatasets.Standardcontext-freeparsingmethods,whenadap-tedtotheparsingofprojectivetrees,provideO.n5/timecomplexity.TheO.n3/timeresultreportedbyEisner(1997)hasbeenobtainedbyexploitingmoresophisticateddynamicprogrammingtechniquesthat‘split’dependencytreesatthepositionoftheirheads,inordertosavebookkeeping.Splittingtechniqueshavealsobeenexploitedtospeedupparsingtimeforotherlexicalizedformalisms,suchasbilexicalcontext-freegrammarsandheadautomata(EisnerandSatta,1999).Tuttavia,toourknowledgenoat-tempthasbeenmadeintheliteraturetoextendthesetechniquestonon-projectivedependencyparsing.InthisarticleweleveragethecentralideafromEisner’salgorithmandextendittotheclassofwell-nesteddependencytreeswithblock-degreeatmost2.
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
2
6
1
5
6
6
6
6
5
/
/
T
l
UN
C
_
UN
_
0
0
2
2
6
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
268
Weintroduceastructuralproperty,calledhead-split,thatallowsustosplitthesetreesatthepositionsoftheirheads.Thepropertyisrestrictive,meaningthatitreducestheclassoftreesthatcanbegenerated.However,weshowthattherestrictiontohead-splittreescomesatnosignificantlossincoverage,anditallowsparsingintimeO.n6/,anasymptoticimprove-mentofoneorderofmagnitudeoverthealgorithmbyG´omez-Rodr´ıguezetal.(2011)fortheunrestric-tedclass.Wealsoshowthatrestrictingtheclassofhead-splittreesbyimposingthealreadymentioned1-inheritpropertydoesnotcauseanyadditionallossincoverage,andthatparsingforthecombinedclassispossibleintimeO.n5/,oneorderofmagnitudefasterthanthealgorithmbyPitleretal.(2012)forthe1-inheritclasswithoutthehead-splitcondition.Theaboveresultshaveconsequencesalsofortheparsingofotherrelatedformalisms,suchasthealreadymentionedlexicalizedtree-adjoininggram-mars.Thiswillbediscussedinthefinalsection.2HeadSplittingTointroducethebasicideaofthisarticle,webrieflydiscussinthissectiontwowell-knownalgorithmsforcomputingthesetofallprojectivedependencytreesforagiveninputsentence:thena¨ıve,CKY-stylealgorithm,andtheimprovedalgorithmwithheadsplitting,intheversionofEisnerandSatta(1999).1CKYparsingTheCKY-stylealgorithmworksinapurebottom-upway,buildingdependencytreesbycombiningsubtrees.AssuminganinputstringwDa1(cid:1)(cid:1)(cid:1)an,N(cid:21)1,eachsubtreetisrepresentedbymeansofafinitesignatureŒi;j;H(cid:141),calleditem,wherei;jaretheboundarypositionsoft’sspanoverwandhisthepositionoft’sroot.Thisistheonlyinformationweneedinordertocombinesubtreesunderthearc-factoredmodel.NotethatthenumberofpossiblesignaturesisO.n3/.ThemainstepofthealgorithmisdisplayedinFigure1(UN).Hereweintroducethegraphicalconven-tion,usedthroughoutthisarticle,ofrepresentingasubtreebyashadedarea,withanhorizontallinein-dicatingthespannedfragmentoftheinputstring,andofmarkingthepositionoftheheadbyabullet.TheillustratedstepattachesatreewithsignatureŒk;j;D(cid:141)1Eisner(1997)describesaslightlydifferentalgorithm.(UN)ahadikj)ahadij(B)ahadk)ahad(C)ahadj)ahadjFigure1:Basicstepsfor(UN)theCKY-stylealgorithmand(B,C)theheadsplittingalgorithm.asadependentofatreewithsignatureŒi;k;H(cid:141).TherecanbeO.n5/instantiationsofthisstep,andthisisalsotherunningtimeofthealgorithm.Eisner’salgorithmEisnerandSatta(1999)im-proveovertheCKYalgorithmbyreducingthenum-berofpositionrecordsinanitem.Theydothisby‘splitting’eachtreeintoaleftandarightfragment,sothattheheadisalwaysplacedatoneofthetwoboundarypositionsofafragment,asopposedtobe-ingplacedataninternalposition.Inthiswayitemsneedonlytwoindices.Leftandrightfragmentscanbeprocessedindependently,andmergedafterwards.Letusconsiderarightfragmenttwithheadah.Attachmentattofarightdependenttreewithheadadisnowperformedintwosteps.Thefirststepat-tachesaleftfragmentwithheadad,asinFigure1(B).Thisresultsinanewtypeoffragment/itemthathasbothheadsahandadplacedatitsboundaries.Thesecondstepattachesarightfragmentwithheadad,asinFigure1(C).Thenumberofpossibleinstanti-ationsofthesesteps,andtheasymptoticruntimeofthealgorithm,isO.n3/.Inthisarticleweextendthesplittingtechniquetotheclassofwell-nesteddependencytreeswithblock-degreeatmost2.Thisamountstodefiningafac-torizationforthesetreesintofragments,eachwithitsownheadatoneofitsboundarypositions,alongwithsomeunfoldingoftheattachmentoperationintointermediatesteps.Whileforprojectivetreesheadsplittingcanbedonewithoutanylossincoverage,fortheextendedclassheadsplittingturnsouttobeaproperrestriction.Theempiricalrelevanceofthiswillbediscussedin(cid:144)7.
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
2
6
1
5
6
6
6
6
5
/
/
T
l
UN
C
_
UN
_
0
0
2
2
6
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
269
3Head-SplitTreesInthissectionweintroducetheclassofwell-nesteddependencytreeswithblock-degreeatmost2,anddefinethesubclassofhead-splitdependencytrees.3.1PreliminariesFornon-negativeintegersi;jwewriteŒi;j(cid:141)tode-notethesetfi;iC1;:::;jg;wheni>j,Œi;j(cid:141)istheemptyset.ForastringwDa1(cid:1)(cid:1)(cid:1)an,wheren(cid:21)1andeachaiisalexicaltoken,andfori;j2Œ0;N(cid:141)withi(cid:20)j,wewritewi;jtodenotethesubstringaiC1(cid:1)(cid:1)(cid:1)ajofw;wi;iistheemptystring.Adependencytreetoverwisadirectedtreewhosenodesareasubsetofthetokensaiinwandwhosearcsencodeadependencyrelationbetweentwonodes.Wewriteai!ajtodenotethearc.ai;aj/int;here,thenodeaiisthehead,andthenodeajisthedependent.Ifeachtokenai,i2Œ1;N(cid:141),isanodeoft,thentiscalledcomplete.Sometimeswewritetaitoemphasizethattreetisrootedinnodeai.Ifaiisanodeoft,wealsowritetŒai(cid:141)todenotethesubtreeoftcomposedbynodeaiasitsrootandallofitsdescendantnodes.Thenodesoftuniquelyidentifyasetofmax-imalsubstringsofw,thatis,substringsseparatedbytokensnotint.Thesequenceofsuchsubstrings,orderedfromlefttoright,istheyieldoft,writtenyd.t/.Letaibesomenodeoft.Theblock-degreeofaiint,writtenbd.ai;t/,isdefinedasthenumberofstringcomponentsofyd.tŒai(cid:141)/.Theblock-degreeoft,writtenbd.t/,isthemaximalblock-degreeofitsnodes.Treetisnon-projectiveifbd.t/>1.Treetiswell-nestedif,foreachnodeaioftandforeverypairofoutgoingdependenciesai!ad1andai!ad2,thestringcomponentsofyd.tŒad1(cid:141)/andyd.tŒad2(cid:141)/donot‘interleave’inw.Moreprecisely,itisrequiredthat,ifsomecomponentofyd.tŒadi(cid:141)/,i2Œ1;2(cid:141),occursinwinbetweentwocomponentss1;s2ofyd.tŒadj(cid:141)/,j2Œ1;2(cid:141)andj¤i,thenallcomponentsofyd.tŒadi(cid:141)/occurinbetweens1;s2.Throughoutthisarticle,wheneverweconsideradependencytreetwealwaysimplicitlyassumethattisoverw,thatthasblock-degreeatmost2,andthattiswell-nested.Lettaibesuchatree,withbd.ai;tai/D2.Wecalltheportionofwinbetweenthetwosubstringsofyd.tai/thegapoftai,denotedbygap.tai/.ahad4ad3ad2ad1m.tah/Figure2:Exampleofanodeahwithblock-degree2inanon-projective,well-nesteddependencytreetah.Integerm.tah/,definedin(cid:144)3.2,isalsomarked.Example1Figure2schematicallydepictsawell-nestedtreetahwithblock-degree2;wehavemarkedtherootnodeahanditsdependentnodesadi.Foreachnodeadi,ashadedareahighlightstŒadi(cid:141).Wehavebd.ah;tah/Dbd.ad1;tah/Dbd.ad4;tah/D2andbd.ad2;tah/Dbd.ad3;tah/D1.(cid:3)3.2TheHead-SplitPropertyWesaythatadependencytreethasthehead-splitpropertyifitsatisfiesthefollowingcondition.Letah!adbeanydependencyintwithbd.ah;t/Dbd.ad;t/D2.Whenevergap.tŒad(cid:141)/containsah,itmustalsocontaingap.tŒah(cid:141)/.Intuitively,thismeansthatifyd.tŒad(cid:141)/‘crossesover’thelexicaltokenahinw,thenyd.tŒad(cid:141)/mustalso‘crossover’gap.tŒah(cid:141)/.Example2Dependencyah!ad1inFigure3viol-atesthehead-splitcondition,sinceyd.tŒad1(cid:141)/crossesoverthelexicaltokenahinw,butdoesnotcrossovergap.tŒah(cid:141)/.Theremainingoutgoingdependenciesofahtriviallysatisfythehead-splitcondition,sincethechildnodeshaveblock-degree1.(cid:3)Lettahbeadependencytreesatisfyingthehead-splitpropertyandwithbd.ah;tah/D2.Wespecifybelowaconstructionthat‘splits’tahwithrespecttothepositionoftheheadahinyd.tah/,resultingintwodependencytreessharingtherootahandhavingalloftheremainingnodesformingtwodisjointsets.Furthermore,theresultingtreeshaveblock-degreeatmost2.ahad1ad2ad3Figure3:Arcah!ad1violatesthehead-splitcondition.
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
2
6
1
5
6
6
6
6
5
/
/
T
l
UN
C
_
UN
_
0
0
2
2
6
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
270
(UN)ahad4ad3(B)ahad2ad1m.tah/Figure4:Lowertree(UN)anduppertree(B)fragmentsforthedependencytreeinFigure2.Letyd.tah/Dhwi;j;wp;qiandassumethatahisplacedwithinwi;j.(Asymmetricconstructionshouldbeusedincaseahisplacedwithinwp;q.)Themirrorimageofahwithrespecttogap.tah/,writtenm.tah/,isthelargestintegerinŒp;q(cid:141)suchthattherearenodependencieslinkingnodesinwi;H(cid:0)1andnodesinwp;m.tah/andtherearenodependencieslinkingnodesinwh;jandnodesinwm.tah/;q.Itisnothardtoseethatsuchanintegeralwaysexists,sincetahiswell-nested.Weclassifyeverydependentadofahasbeingan‘upper’dependentora‘lower’dependentofah,accordingtothefollowingconditions:(io)Ifd2Œi;H(cid:0)1(cid:141)[Œm.tah/C1;q(cid:141),thenadisanupperdependentofah.(ii)Ifd2ŒhC1;j(cid:141)[Œp;m.tah/(cid:141),thenadisalowerdependentofah.Theuppertreeoftahisthedependencytreerootedinahandcomposedofalldependenciesah!adintahwithadanupperdependentofah,alongwithallsubtreestahŒad(cid:141)rootedinthosedependents.Similarly,thelowertreeoftahisthedependencytreerootedinahandcomposedofalldependenciesah!adintahwithadalowerde-pendentofah,alongwithallsubtreestahŒad(cid:141)rootedinthosedependents.Asageneralconvention,inthisarticlewewritetU;ahandtL;ahtodenotetheupperandthelowertreesoftah,respectively.Notethat,insomedegeneratecases,thesetoflowerorupperde-pendentsmaybeempty;thentU;ahortL;ahconsistsoftherootnodeahonly.Example3ConsiderthetreetahdisplayedinFig-ure2.Integerm.tah/denotestheboundarybetweentherightcomponentofyd.tahŒad4(cid:141)/andtherightcomponentofyd.tahŒad1(cid:141)/.Nodesad3andad4arelowerdependents,andnodesad1andad2areupperdependents.TreestL;ahandtU;aharedisplayedinFigure4(UN)E(B),respectively.(cid:3)Theimportanceofthehead-splitpropertycanbeinformallyexplainedasfollows.Letah!adbeadependencyintah.Whenwetakeaparttheupperandthelowertreesoftah,theentiresubtreetahŒad(cid:141)endsupineitherofthesetwofragments.Thisallowsustorepresentupperandlowerfragmentsforsomeheadindependentlyoftheother,andtofreelyrecombinethem.Moreformally,ouralgorithmswillmakeuseofthefollowingthreeproperties,statedherewithoutanyformalproof:P1TreestU;ahandtL;aharewell-nested,haveblock-degree(cid:20)2,andsatisfythehead-splitproperty.P2TreestU;ahandtL;ahhavetheirheadahalwaysplacedatoneoftheboundariesintheiryields.P3Lett0U;ahandt00L;ahbetheupperandlowertreesofdistincttreest0ahandt00ah,respectively.Ifm.t0ah/Dm.t00ah/,thenthereexistsatreetahsuchthattU;ahDt0U;ahandtL;ahDt00L;ah.4ParsingItemsLetwDa1(cid:1)(cid:1)(cid:1)an,N(cid:21)1,betheinputstring.Weneedtocompactlyrepresenttreesthatspansubstringsofwbyrecordingonlytheinformationthatisneededtocombinethesetreesintolargertreesduringtheparsingprocess.Wedothisbyassociatingeachtreewithasignature,calleditem,whichisatupleŒi;j;P;q;H(cid:141)X,whereh2Œ1;N(cid:141)identifiesthetokenah,io;jwith0(cid:20)io(cid:20)j(cid:20)nidentifyasubstringwi;j,andp;qwithj
h>jC1Thetwocasesh
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
2
6
1
5
6
6
6
6
5
/
/
T
l
UN
C
_
UN
_
0
0
2
2
6
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
271
whichisundesiredforthespecificationofoural-gorithm.Asanexample,itemsŒi;j;(cid:0);(cid:0);iC1(cid:141)XandŒiC1;j;(cid:0);(cid:0);iC1(cid:141)Xbothrepresentadepend-encytreetaiC1withyd.taiC1/Dhwi;ji.Thisandothersimilarcasesareavoidedbythebanagainsth2fi;jC1g,whichamountstoimposingsomenormalformforitems.Inourexample,onlyitemŒi;j;(cid:0);(cid:0);iC1(cid:141)Xisavalidsignature.Finally,wedistinguishamongseveralitemtypes,indicatedbythevalueofsubscriptX.Thesetypesarespecifictoeachparsingalgorithm,andwillbedefinedinlatersections.5ParsingofHead-SplitTreesWepresentinthissectionourfirsttabularalgorithmforcomputingthesetofalldependencytreesforaninputsentencewthathavethehead-splitproperty,underthearc-factoredmodel.Recallthattaidenotesatreewithrootai,andtL;aiandtU;aiaretheloweranduppertreesoftai.Thestepsofthealgorithmarespecifiedbymeansofdeductionrulesoveritems,followingtheapproachofShieberetal.(1995).5.1BasicIdeaOuralgorithmbuildstreesstepbystep,byattachingatreetah0asadependentofatreetahandcreatingthenewdependencyah!ah0.Computationally,theworstcaseforthisoperationiswhenbothtahandtah0haveagap;Poi,foreachtreeweneedtokeeparecordofthefourboundaries,alongwiththepositionofthehead,asdonebyG´omez-Rodr´ıguezetal.(2011).Tuttavia,ifweareinterestedinparsingtreesthatsatisfythehead-splitproperty,wecanavoidrepresentingatreewithagapbymeansofasingleitem.Weinsteadfollowthegeneralideaof(cid:144)2forprojectiveparsing,andusedifferentitemsfortheupperandthelowertreesofthesourcetree.Whenweneedtoattachtah0asanupperdependentoftah,definedasin(cid:144)3.2,weperformtwoconsecutivesteps.First,weattachtL;ah0totU;ah,resultinginanewintermediatetreet1.Asasecondstep,weattachtU;ah0tot1,resultinginanewtreet2whichistU;ahwithtah0attachedasanupperdependent,asdesired.BothstepsaredepictedinFigure5;hereweintroducetheconventionofindicatingtreegroupingthroughadashedline.Asymmetricprocedurecanbeusedtoattachtah0asalowerdependenttotL;ah.TheahtU;ahah0tL;ah0+t1(UN)t1ah0ahah0tU;ah0+t2(B)Figure5:Twostepattachmentoftah0attU;ah:(UN)attach-mentoftL;ah0;(B)attachmentoftU;ah0.correctnessofthetwostepapproachfollowsfrompropertiesP1andP3in(cid:144)3.2.BypropertyP2in(cid:144)3.2,inbothstepsabovethelexicalheadsahandah0canbereadfromthebound-ariesoftheinvolvedtrees.Thenthesestepscanbeimplementedmoreefficientlythanthena¨ıvemethodofattachingtah0totahinasinglestep.Amorede-tailedcomputationalanalysiswillbeprovidedin(cid:144)5.7.Tosimplifythepresentation,werestricttheuseofheadsplittingtotreeswithagapandparsetreeswithnogapwiththena¨ıvemethod;thisdoesnotaffectthecomputationalcomplexity.5.2ItemTypesWedistinguishfivedifferenttypesofitems,indicatedbythesubscriptX2f0;l;U;=L;=Ug,asdescribedinwhatfollows.(cid:15)IfXD0,wehavepDqD(cid:0)andyd.ah/isspecifiedasin(cid:144)4.(cid:15)IfXDL,weusetheitemtorepresentsomelowertree.Wehavethereforep;q¤(cid:0)andh2fiC1;qg.(cid:15)IfXDU,weusetheitemtorepresentsomeuppertree.Wehavethereforep;q¤(cid:0)andh2fj;pC1g.(cid:15)IfXD=LorXD=U,weusetheitemtorepresentsomeintermediatestepintheparsingprocess,inwhichonlytheloweroruppertreeofsomedependenthasbeencollectedbytheheadah,andwearestillmissingtheupper(=U)orthelower(=L)tree.
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
2
6
1
5
6
6
6
6
5
/
/
T
l
UN
C
_
UN
_
0
0
2
2
6
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
272
Wefurtherspecializesymbol=Ubywriting=U<(=U>)toindicatethatthemissinguppertreeshouldhaveitsheadtotheleft(right)ofitsgap.Wealsouse=L
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
2
6
1
5
6
6
6
6
5
/
/
T
l
UN
C
_
UN
_
0
0
2
2
6
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
277
Inconnectionwiththe1-inheritproperty,thisevenincreasestotwoordersofmagnitude.However,asalreadystatedin(cid:144)2,thisimprovementispaidforbyalossincoverage;forinstance,treesoftheformshowninFigure3cannotbeparsedanylonger.7.1QuantitativeEvaluationInordertoassesstheempiricallossincoveragethattherestrictiontohead-splittreesincurs,weevaluatedthecoverageofseveralclassesofdependencytreesonstandarddatasets.FollowingPitleretal.(2012),wereportinTable1figuresforthetrainingsetsofsixlanguagesusedintheCoNLL-Xsharedtaskondependencyparsing(BuchholzandMarsi,2006).Aswecansee,theO.n6/classofhead-splittreeshasonlyslightlylowercoverageonthisdatathanthebaselineclassofwell-nesteddependencytreeswithblock-degreeatmost2.Thelossesareupto0.2percentagepointsonfiveofthesixlanguages,and0.9pointsontheDutchdata.OurevenmorerestrictedO.n5/classof1-inherithead-splittreeshasthesamecoverageasourO.n6/class,whichisexpectedgiventheresultsofPitleretal.(2012):TheirO.n6/classof1-inherittreeshasexactlythesamecoverageasthebaseline(andtherebymorecoveragethanourO.n6/class).Interestinglythough,theirO.n5/classof‘gap-minding’treeshasasignificantlysmallercoveragethanourO.n5/class.Weconcludethatourclassseemstostrikeagoodbalancebetweenexpressivenessandparsingcomplexity.7.2QualitativeEvaluationWhiletheoriginalmotivationbehindintroducingthehead-splitpropertywastoimproveparsingcomplex-ity,itisinterestingtoalsodiscussthelinguisticrelev-anceofthisproperty.Afirstinspectionofthestruc-turesthatviolatethehead-splitpropertyrevealedthatmanysuchviolationsdisappearifoneignoresgapscausedbypunctuation.Somedecisionsaboutwhatnodesshouldfunctionastheheadsofpunctuationsymbolsleadtomoregapsthanothers.Inordertoquantifytheimplicationsofthis,werecomputedthecoverageoftheclassofhead-splittreesondatasetswherewefirstremovedallpunctuation.TheresultsaregiveninTable2.WerestrictourselvestothefivenativedependencytreebanksusedintheCoNLL-Xsharedtask,ignoringtreebanksthathavebeencon-vertedfromphrasestructurerepresentations.ArabicCzechDanishSloveneTurkishwith1139122without146002Table2:Violationsagainstthehead-splitproperty(relativetotheclassofwell-nestedtreeswithblock-degree(cid:20)2)withandwithoutpunctuation.Weseethatwhenweremovepunctuationfromthesentences,thenumberofviolationsagainstthehead-splitpropertyatmostdecreases.ForDanishandSlovene,removingpunctuationevenhastheef-fectthatallwell-nesteddependencytreeswithblock-degreeatmost2becomehead-split.Overall,theabsolutenumbersofviolationsareextremelysmall—exceptforCzech,wherewehave139violationswithand46withoutpunctuation.AcloserinspectionoftheCzechsentencesrevealsthatmanyofthesefea-turerathercomplexcoordinations.Indeed,outofthe46violationsinthepunctuation-freedata,only9remainwhenoneignoresthosewithcoordination.Fortheremainingones,wehavenotbeenabletoidentifyanyclearpatterns.8ConcludingRemarksInthisarticlewehaveextendedheadsplittingtech-niques,originallydevelopedforparsingofprojectivedependencytrees,totwosubclassesofwell-nesteddependencytreeswithblock-degreeatmost2.Wehaveimprovedovertheasymptoticruntimeoftwoexistingalgorithms,atnosignificantlossincoverage.Withthesamegoalofimprovingparsingefficiencyforsubclassesofnon-projectivetrees,inveryrecentworkPitleretal.(2013)haveproposedanO.n4/timealgorithmforasubclassofnon-projectivetreesthatarenotwell-nested,usinganapproachthatisorthogonaltotheonewehaveexploredhere.Otherthanfordependencyparsing,ourresultshavealsoimplicationsformildlycontext-sensitivephrasestructureformalisms.Inparticular,theal-gorithmof(cid:144)5canbeadaptedtoparseasubclassoflexicalizedtree-adjoininggrammars,improvingtheresultbyEisnerandSatta(2000)fromO.n7/toO.n6/.Similarly,thealgorithmof(cid:144)6canbeadaptedtoparsealexicalizedversionofthetree-adjoininggrammarsinvestigatedbySattaandSchuler(1998),improvingana¨ıveO.n7/algorithmtoO.n5/.
l
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
2
6
1
5
6
6
6
6
5
/
/
T
l
UN
C
_
UN
_
0
0
2
2
6
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
278
ReferencesManuelBodirsky,MarcoKuhlmann,andMathiasM¨ohl.2005.Well-nesteddrawingsasmodelsofsyntacticstructure.InProceedingsofthe10thConferenceonFormalGrammar(FG)andNinthMeetingonMathem-aticsofLanguage(MOL),pages195–203,Edinburgh,UK.SabineBuchholzandErwinMarsi.2006.CoNLL-Xsharedtaskonmultilingualdependencyparsing.InProceedingsoftheTenthConferenceonComputationalNaturalLanguageLearning(CoNLL),pages149–164,NewYork,USA.XavierCarreras.2007.Experimentswithahigher-orderprojectivedependencyparser.InProceedingsoftheCoNLLSharedTaskSessionofEMNLP-CoNLL2007,pages957–961,Prague,CzechRepublic.JoanChen-MainandAravindK.Joshi.2010.Unavoid-ableill-nestednessinnaturallanguageandtheadequacyoftreelocal-MCTAGinduceddependencystructures.InProceedingsoftheTenthInternationalConferenceonTreeAdjoiningGrammarsandRelatedFormalisms(TAG+),NewHaven,USA.JasonEisnerandGiorgioSatta.1999.Efficientparsingforbilexicalcontext-freegrammarsandHeadAuto-matonGrammars.InProceedingsofthe37thAnnualMeetingoftheAssociationforComputationalLinguist-ics(ACL),pages457–464,CollegePark,MD,USA.JasonEisnerandGiorgioSatta.2000.AfasterparsingalgorithmforlexicalizedTree-AdjoiningGrammars.InProceedingsoftheFifthWorkshoponTreeAdjoiningGrammarsandRelatedFormalisms(TAG+),pages14–19,Paris,France.JasonEisner.1997.Bilexicalgrammarsandacubic-timeprobabilisticparser.InProceedingsoftheFifthInter-nationalWorkshoponParsingTechnologies(IWPT),pages54–65,Cambridge,MA,USA.CarlosG´omez-Rodr´ıguez,JohnCarroll,andDavidJ.Weir.2011.Dependencyparsingschemataandmildlynon-projectivedependencyparsing.ComputationalLin-guistics,37(3):541–586.AravindK.JoshiandYvesSchabes.1997.Tree-AdjoiningGrammars.InGrzegorzRozenbergandArtoSalomaa,editors,HandbookofFormalLanguages,volume3,pages69–123.Springer.TerryKooandMichaelCollins.2010.Efficientthird-orderdependencyparsers.InProceedingsofthe48thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages1–11,Uppsala,Sweden.MarcoKuhlmannandJoakimNivre.2006.Mildlynon-projectivedependencystructures.InProceedingsofthe21stInternationalConferenceonComputationalLinguistics(COLING)and44thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL)MainConferencePosterSessions,pages507–514,Sydney,Australia.WolfgangMaierandTimmLichte.2011.Characteriz-ingdiscontinuityinconstituenttreebanks.InPhilippedeGroote,MarkusEgg,andLauraKallmeyer,editors,FormalGrammar.14thInternationalConference,FG2009,Bordeaux,France,July25–26,2009,RevisedSelectedPapers,volume5591ofLectureNotesinCom-puterScience,pages167–182.Springer.RyanMcDonaldandGiorgioSatta.2007.Onthecom-plexityofnon-projectivedata-drivendependencypars-ing.InProceedingsoftheTenthInternationalConfer-enceonParsingTechnologies(IWPT),pages121–132,Prague,CzechRepublic.RyanMcDonald,FernandoPereira,KirilRibarov,andJanHajiˇc.2005.Non-projectivedependencyparsingusingspanningtreealgorithms.InHumanLanguageTechno-logyConference(HLT)andConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),pages523–530,Vancouver,Canada.EmilyPitler,SampathKannan,andMitchellMarcus.2012.Dynamicprogrammingforhigherorderparsingofgap-mindingtrees.InProceedingsofthe2012JointConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP)andComputationalNaturalLan-guageLearning(CoNLL),pages478–488,JejuIsland,RepublicofKorea.EmilyPitler,SampathKannan,andMitchellMarcus.2013.Findingoptimal1-endpoint-crossingtrees.TransactionsoftheAssociationforComputationalLin-guistics.GiorgioSattaandWilliamSchuler.1998.Restrictionsontreeadjoininglanguages.InProceedingsofthe36thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL)and17thInternationalConferenceonComputationalLinguistics(COLING),pages1176–1182,Montr´eal,Canada.StuartM.Shieber,YvesSchabes,andFernandoPereira.1995.Principlesandimplementationofdeductivepars-ing.JournalofLogicProgramming,24(1–2):3–36.
Scarica il pdf