Transactions of the Association for Computational Linguistics, 1 (2013) 267–278. Action Editor: Brian Roark.

Submitted 3/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

Efficient Parsing for Head-Split Dependency Trees

Giorgio Satta
Dept. of Information Engineering, University of Padua, Italy
satta@dei.unipd.it

Marco Kuhlmann
Dept. of Linguistics and Philology, Uppsala University, Sweden
marco.kuhlmann@lingfil.uu.se

Abstract

Head splitting techniques have been successfully exploited to improve the asymptotic runtime of parsing algorithms for projective dependency trees, under the arc-factored model. In this article we extend these techniques to a class of non-projective dependency trees, called well-nested dependency trees with block-degree at most 2, which has been previously investigated in the literature. We define a structural property that allows head splitting for these trees, and present two algorithms that improve over the runtime of existing algorithms at no significant loss in coverage.

1 Introduction

Much of the recent work on dependency parsing has been aimed at finding a good balance between accuracy and efficiency. For one end of the spectrum, Eisner (1997) showed that the highest-scoring projective dependency tree under an arc-factored model can be computed in time $O(n^3)$, where $n$ is the length of the input string. Later work has focused on making projective parsing viable under more expressive models (Carreras, 2007; Koo and Collins, 2010).

At the same time, it has been observed that for many standard data sets, the coverage of projective trees is far from complete (Kuhlmann and Nivre, 2006), which has led to an interest in parsing algorithms for non-projective trees. While non-projective parsing under an arc-factored model can be done in time $O(n^2)$ (McDonald et al., 2005), parsing with more informed models is intractable (McDonald and Satta, 2007). This has led several authors to investigate ‘mildly non-projective’ classes of trees, with the goal of achieving a balance between expressiveness and complexity (Kuhlmann and Nivre, 2006).

In this article we focus on a class of mildly non-projective dependency structures called well-nested dependency trees with block-degree at most 2. This class was first introduced by Bodirsky et al. (2005), who showed that it corresponds, in a natural way, to the class of derivation trees of lexicalized tree-adjoining grammars (Joshi and Schabes, 1997). While there are linguistic arguments against the restriction to this class (Maier and Lichte, 2011; Chen-Main and Joshi, 2010), Kuhlmann and Nivre (2006) found that it has excellent coverage on standard data sets. Assuming an arc-factored model, well-nested dependency trees with block-degree $\le 2$ can be parsed in time $O(n^7)$ using the algorithm of Gómez-Rodríguez et al. (2011). Recently, Pitler et al. (2012) have shown that if an additional restriction called 1-inherit is imposed, parsing can be done in time $O(n^6)$, without any additional loss in coverage on standard data sets.

Standard context-free parsing methods, when adapted to the parsing of projective trees, provide $O(n^5)$ time complexity. The $O(n^3)$ time result reported by Eisner (1997) has been obtained by exploiting more sophisticated dynamic programming techniques that ‘split’ dependency trees at the position of their heads, in order to save bookkeeping. Splitting techniques have also been exploited to speed up parsing time for other lexicalized formalisms, such as bilexical context-free grammars and head automata (Eisner and Satta, 1999). However, to our knowledge no attempt has been made in the literature to extend these techniques to non-projective dependency parsing.

In this article we leverage the central idea from Eisner’s algorithm and extend it to the class of well-nested dependency trees with block-degree at most 2.


We introduce a structural property, called head-split, that allows us to split these trees at the positions of their heads. The property is restrictive, meaning that it reduces the class of trees that can be generated. However, we show that the restriction to head-split trees comes at no significant loss in coverage, and it allows parsing in time $O(n^6)$, an asymptotic improvement of one order of magnitude over the algorithm by Gómez-Rodríguez et al. (2011) for the unrestricted class. We also show that restricting the class of head-split trees by imposing the already mentioned 1-inherit property does not cause any additional loss in coverage, and that parsing for the combined class is possible in time $O(n^5)$, one order of magnitude faster than the algorithm by Pitler et al. (2012) for the 1-inherit class without the head-split condition.

The above results have consequences also for the parsing of other related formalisms, such as the already mentioned lexicalized tree-adjoining grammars. This will be discussed in the final section.

2 Head Splitting

To introduce the basic idea of this article, we briefly discuss in this section two well-known algorithms for computing the set of all projective dependency trees for a given input sentence: the naïve, CKY-style algorithm, and the improved algorithm with head splitting, in the version of Eisner and Satta (1999).¹

¹ Eisner (1997) describes a slightly different algorithm.

CKY parsing. The CKY-style algorithm works in a pure bottom-up way, building dependency trees by combining subtrees. Assuming an input string $w = a_1 \cdots a_n$, $n \ge 1$, each subtree $t$ is represented by means of a finite signature $[i, j, h]$, called item, where $i, j$ are the boundary positions of $t$’s span over $w$ and $h$ is the position of $t$’s root. This is the only information we need in order to combine subtrees under the arc-factored model. Note that the number of possible signatures is $O(n^3)$.

The main step of the algorithm is displayed in Figure 1(a). Here we introduce the graphical convention, used throughout this article, of representing a subtree by a shaded area, with a horizontal line indicating the spanned fragment of the input string, and of marking the position of the head by a bullet. The illustrated step attaches a tree with signature $[k, j, d]$ as a dependent of a tree with signature $[i, k, h]$. There can be $O(n^5)$ instantiations of this step, and this is also the running time of the algorithm.

Figure 1: Basic steps for (a) the CKY-style algorithm and (b, c) the head splitting algorithm.

Eisner’s algorithm. Eisner and Satta (1999) improve over the CKY algorithm by reducing the number of position records in an item. They do this by ‘splitting’ each tree into a left and a right fragment, so that the head is always placed at one of the two boundary positions of a fragment, as opposed to being placed at an internal position. In this way items need only two indices. Left and right fragments can be processed independently, and merged afterwards.

Let us consider a right fragment $t$ with head $a_h$. Attachment at $t$ of a right dependent tree with head $a_d$ is now performed in two steps. The first step attaches a left fragment with head $a_d$, as in Figure 1(b). This results in a new type of fragment/item that has both heads $a_h$ and $a_d$ placed at its boundaries. The second step attaches a right fragment with head $a_d$, as in Figure 1(c). The number of possible instantiations of these steps, and the asymptotic runtime of the algorithm, is $O(n^3)$.

In this article we extend the splitting technique to the class of well-nested dependency trees with block-degree at most 2. This amounts to defining a factorization for these trees into fragments, each with its own head at one of its boundary positions, along with some unfolding of the attachment operation into intermediate steps. While for projective trees head splitting can be done without any loss in coverage, for the extended class head splitting turns out to be a proper restriction. The empirical relevance of this will be discussed in §7.
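To make the $O(n^3)$ bound for head splitting concrete, the following is a minimal sketch of projective arc-factored parsing in Python. It follows the common textbook formulation of Eisner's algorithm with ‘complete’ and ‘incomplete’ spans, which is not necessarily the exact left/right fragment variant of Eisner and Satta (1999) discussed above. The function name, the score-matrix layout and the artificial root at position 0 are assumptions made for this example only.

```python
NEG_INF = float("-inf")


def eisner_best_score(score):
    """Score of the best projective tree under an arc-factored model.

    score[h][d] is the (assumed) score of the arc h -> d. Position 0 is an
    artificial root; positions 1..n are the tokens of the sentence.
    """
    size = len(score)
    # complete[i][j][side] and incomplete[i][j][side] hold the best score of a
    # span i..j whose head is at the right end (side 0) or left end (side 1).
    # Every span records only its two boundaries: the head is always one of them.
    complete = [[[0.0, 0.0] if i == j else [NEG_INF, NEG_INF]
                 for j in range(size)] for i in range(size)]
    incomplete = [[[NEG_INF, NEG_INF] for _ in range(size)]
                  for _ in range(size)]

    for width in range(1, size):
        for i in range(size - width):
            j = i + width
            # Join a right-facing half from i with a left-facing half from j,
            # then add the arc between the two boundary heads.
            joined = max(complete[i][k][1] + complete[k + 1][j][0]
                         for k in range(i, j))
            incomplete[i][j][0] = joined + score[j][i]  # arc j -> i
            incomplete[i][j][1] = joined + score[i][j]  # arc i -> j
            # Extend an incomplete span with a complete one on the far side.
            complete[i][j][0] = max(complete[i][k][0] + incomplete[k][j][0]
                                    for k in range(i, j))
            complete[i][j][1] = max(incomplete[i][k][1] + complete[k][j][1]
                                    for k in range(i + 1, j + 1))
    return complete[0][size - 1][1]


# Toy example with two tokens; arcs into the root are forbidden.
example_scores = [
    [NEG_INF, 1.0, 1.0],
    [NEG_INF, NEG_INF, 5.0],
    [NEG_INF, 2.0, NEG_INF],
]
best = eisner_best_score(example_scores)  # 6.0, using arcs 0 -> 1 and 1 -> 2
```

Each of the three nested loops (span width, left boundary, split point) ranges over at most $n+1$ values, which gives the cubic running time; a CKY-style item $[i, j, h]$ would additionally have to iterate over internal head positions.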


3 Head-Split Trees

In this section we introduce the class of well-nested dependency trees with block-degree at most 2, and define the subclass of head-split dependency trees.

3.1 Preliminaries

For non-negative integers $i, j$ we write $[i, j]$ to denote the set $\{i, i+1, \ldots, j\}$; when $i > j$, $[i, j]$ is the empty set. For a string $w = a_1 \cdots a_n$, where $n \ge 1$ and each $a_i$ is a lexical token, and for $i, j \in [0, n]$ with $i \le j$, we write $w_{i,j}$ to denote the substring $a_{i+1} \cdots a_j$ of $w$; $w_{i,i}$ is the empty string.

A dependency tree $t$ over $w$ is a directed tree whose nodes are a subset of the tokens $a_i$ in $w$ and whose arcs encode a dependency relation between two nodes. We write $a_i \to a_j$ to denote the arc $(a_i, a_j)$ in $t$; here, the node $a_i$ is the head, and the node $a_j$ is the dependent. If each token $a_i$, $i \in [1, n]$, is a node of $t$, then $t$ is called complete. Sometimes we write $t_{a_i}$ to emphasize that tree $t$ is rooted in node $a_i$. If $a_i$ is a node of $t$, we also write $t[a_i]$ to denote the subtree of $t$ composed by node $a_i$ as its root and all of its descendant nodes.

The nodes of $t$ uniquely identify a set of maximal substrings of $w$, that is, substrings separated by tokens not in $t$. The sequence of such substrings, ordered from left to right, is the yield of $t$, written $\mathrm{yd}(t)$. Let $a_i$ be some node of $t$. The block-degree of $a_i$ in $t$, written $\mathrm{bd}(a_i, t)$, is defined as the number of string components of $\mathrm{yd}(t[a_i])$. The block-degree of $t$, written $\mathrm{bd}(t)$, is the maximal block-degree of its nodes. Tree $t$ is non-projective if $\mathrm{bd}(t) > 1$. Tree $t$ is well-nested if, for each node $a_i$ of $t$ and for every pair of outgoing dependencies $a_i \to a_{d_1}$ and $a_i \to a_{d_2}$, the string components of $\mathrm{yd}(t[a_{d_1}])$ and $\mathrm{yd}(t[a_{d_2}])$ do not ‘interleave’ in $w$. More precisely, it is required that, if some component of $\mathrm{yd}(t[a_{d_i}])$, $i \in [1, 2]$, occurs in $w$ in between two components $s_1, s_2$ of $\mathrm{yd}(t[a_{d_j}])$, $j \in [1, 2]$ and $j \ne i$, then all components of $\mathrm{yd}(t[a_{d_i}])$ occur in between $s_1, s_2$.

Throughout this article, whenever we consider a dependency tree $t$ we always implicitly assume that $t$ is over $w$, that $t$ has block-degree at most 2, and that $t$ is well-nested. Let $t_{a_i}$ be such a tree, with $\mathrm{bd}(a_i, t_{a_i}) = 2$. We call the portion of $w$ in between the two substrings of $\mathrm{yd}(t_{a_i})$ the gap of $t_{a_i}$, denoted by $\mathrm{gap}(t_{a_i})$.

Figure 2: Example of a node $a_h$ with block-degree 2 in a non-projective, well-nested dependency tree $t_{a_h}$. Integer $m(t_{a_h})$, defined in §3.2, is also marked.

Example 1 Figure 2 schematically depicts a well-nested tree $t_{a_h}$ with block-degree 2; we have marked the root node $a_h$ and its dependent nodes $a_{d_i}$. For each node $a_{d_i}$, a shaded area highlights $t[a_{d_i}]$. We have $\mathrm{bd}(a_h, t_{a_h}) = \mathrm{bd}(a_{d_1}, t_{a_h}) = \mathrm{bd}(a_{d_4}, t_{a_h}) = 2$ and $\mathrm{bd}(a_{d_2}, t_{a_h}) = \mathrm{bd}(a_{d_3}, t_{a_h}) = 1$. □

3.2 The Head-Split Property

We say that a dependency tree $t$ has the head-split property if it satisfies the following condition. Let $a_h \to a_d$ be any dependency in $t$ with $\mathrm{bd}(a_h, t) = \mathrm{bd}(a_d, t) = 2$. Whenever $\mathrm{gap}(t[a_d])$ contains $a_h$, it must also contain $\mathrm{gap}(t[a_h])$. Intuitively, this means that if $\mathrm{yd}(t[a_d])$ ‘crosses over’ the lexical token $a_h$ in $w$, then $\mathrm{yd}(t[a_d])$ must also ‘cross over’ $\mathrm{gap}(t[a_h])$.

Example 2 Dependency $a_h \to a_{d_1}$ in Figure 3 violates the head-split condition, since $\mathrm{yd}(t[a_{d_1}])$ crosses over the lexical token $a_h$ in $w$, but does not cross over $\mathrm{gap}(t[a_h])$. The remaining outgoing dependencies of $a_h$ trivially satisfy the head-split condition, since the child nodes have block-degree 1. □

Let $t_{a_h}$ be a dependency tree satisfying the head-split property and with $\mathrm{bd}(a_h, t_{a_h}) = 2$. We specify below a construction that ‘splits’ $t_{a_h}$ with respect to the position of the head $a_h$ in $\mathrm{yd}(t_{a_h})$, resulting in two dependency trees sharing the root $a_h$ and having all of the remaining nodes forming two disjoint sets. Furthermore, the resulting trees have block-degree at most 2.

Figure 3: Arc $a_h \to a_{d_1}$ violates the head-split condition.
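The definitions of yield, block-degree, gap and the head-split property can be checked mechanically on small trees. The sketch below is an illustration only: it assumes that a tree is given as a dictionary mapping each token position to the position of its head (with 0 for the root), and it represents a gap as a set of token positions rather than as a substring. All function names are hypothetical.

```python
from collections import defaultdict


def subtree(heads, a):
    """Positions of node a and all of its descendants, i.e. the nodes of t[a]."""
    children = defaultdict(list)
    for dependent, head in heads.items():
        children[head].append(dependent)
    stack, nodes = [a], set()
    while stack:
        x = stack.pop()
        nodes.add(x)
        stack.extend(children[x])
    return nodes


def blocks(nodes):
    """Maximal runs of adjacent positions: the string components of the yield."""
    runs = []
    for p in sorted(nodes):
        if runs and runs[-1][1] == p - 1:
            runs[-1][1] = p
        else:
            runs.append([p, p])
    return [tuple(run) for run in runs]


def block_degree(heads, a):
    return len(blocks(subtree(heads, a)))


def gap(heads, a):
    """Positions strictly between the two blocks of t[a]; empty if bd != 2."""
    b = blocks(subtree(heads, a))
    if len(b) != 2:
        return set()
    return set(range(b[0][1] + 1, b[1][0]))


def has_head_split_property(heads):
    """For every arc h -> d with bd(h) = bd(d) = 2: if gap(t[d]) contains h,
    then gap(t[d]) must also contain gap(t[h])."""
    for d, h in heads.items():
        if h == 0:
            continue
        if block_degree(heads, h) == 2 and block_degree(heads, d) == 2:
            gap_d = gap(heads, d)
            if h in gap_d and not gap(heads, h) <= gap_d:
                return False
    return True


# Arc 2 -> 1 crosses over its head 2 but not over the gap {4} of t[2].
violating = {1: 2, 2: 4, 3: 1, 4: 0, 5: 2}
# Arc 2 -> 1 crosses over both the head 2 and the gap {3} of t[2].
satisfying = {1: 2, 2: 3, 3: 0, 4: 2, 5: 1}
```

In the first tree the subtree rooted at token 1 covers positions {1, 3}, so its gap {2} contains the head but not position 4; this mirrors the situation of Example 2. In the second tree the subtree rooted at token 1 covers {1, 5}, whose gap {2, 3, 4} contains both the head and the gap of the head.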


Figure 4: Lower tree (a) and upper tree (b) fragments for the dependency tree in Figure 2.

Let $\mathrm{yd}(t_{a_h}) = \langle w_{i,j}, w_{p,q} \rangle$ and assume that $a_h$ is placed within $w_{i,j}$. (A symmetric construction should be used in case $a_h$ is placed within $w_{p,q}$.) The mirror image of $a_h$ with respect to $\mathrm{gap}(t_{a_h})$, written $m(t_{a_h})$, is the largest integer in $[p, q]$ such that there are no dependencies linking nodes in $w_{i,h-1}$ and nodes in $w_{p,m(t_{a_h})}$ and there are no dependencies linking nodes in $w_{h,j}$ and nodes in $w_{m(t_{a_h}),q}$. It is not hard to see that such an integer always exists, since $t_{a_h}$ is well-nested.

We classify every dependent $a_d$ of $a_h$ as being an ‘upper’ dependent or a ‘lower’ dependent of $a_h$, according to the following conditions:

(i) If $d \in [i, h-1] \cup [m(t_{a_h})+1, q]$, then $a_d$ is an upper dependent of $a_h$.
(ii) If $d \in [h+1, j] \cup [p, m(t_{a_h})]$, then $a_d$ is a lower dependent of $a_h$.

The upper tree of $t_{a_h}$ is the dependency tree rooted in $a_h$ and composed of all dependencies $a_h \to a_d$ in $t_{a_h}$ with $a_d$ an upper dependent of $a_h$, along with all subtrees $t_{a_h}[a_d]$ rooted in those dependents. Similarly, the lower tree of $t_{a_h}$ is the dependency tree rooted in $a_h$ and composed of all dependencies $a_h \to a_d$ in $t_{a_h}$ with $a_d$ a lower dependent of $a_h$, along with all subtrees $t_{a_h}[a_d]$ rooted in those dependents. As a general convention, in this article we write $t_{U,a_h}$ and $t_{L,a_h}$ to denote the upper and the lower trees of $t_{a_h}$, respectively. Note that, in some degenerate cases, the set of lower or upper dependents may be empty; then $t_{U,a_h}$ or $t_{L,a_h}$ consists of the root node $a_h$ only.

Example 3 Consider the tree $t_{a_h}$ displayed in Figure 2. Integer $m(t_{a_h})$ denotes the boundary between the right component of $\mathrm{yd}(t_{a_h}[a_{d_4}])$ and the right component of $\mathrm{yd}(t_{a_h}[a_{d_1}])$. Nodes $a_{d_3}$ and $a_{d_4}$ are lower dependents, and nodes $a_{d_1}$ and $a_{d_2}$ are upper dependents. Trees $t_{L,a_h}$ and $t_{U,a_h}$ are displayed in Figure 4(a) and (b), respectively. □

The importance of the head-split property can be informally explained as follows. Let $a_h \to a_d$ be a dependency in $t_{a_h}$. When we take apart the upper and the lower trees of $t_{a_h}$, the entire subtree $t_{a_h}[a_d]$ ends up in either of these two fragments. This allows us to represent upper and lower fragments for some head independently of the other, and to freely recombine them. More formally, our algorithms will make use of the following three properties, stated here without any formal proof:

P1 Trees $t_{U,a_h}$ and $t_{L,a_h}$ are well-nested, have block-degree $\le 2$, and satisfy the head-split property.

P2 Trees $t_{U,a_h}$ and $t_{L,a_h}$ have their head $a_h$ always placed at one of the boundaries in their yields.

P3 Let $t'_{U,a_h}$ and $t''_{L,a_h}$ be the upper and lower trees of distinct trees $t'_{a_h}$ and $t''_{a_h}$, respectively. If $m(t'_{a_h}) = m(t''_{a_h})$, then there exists a tree $t_{a_h}$ such that $t_{U,a_h} = t'_{U,a_h}$ and $t_{L,a_h} = t''_{L,a_h}$.

4 Parsing Items

Let $w = a_1 \cdots a_n$, $n \ge 1$, be the input string. We need to compactly represent trees that span substrings of $w$ by recording only the information that is needed to combine these trees into larger trees during the parsing process. We do this by associating each tree with a signature, called item, which is a tuple $[i, j, p, q, h]_X$, where $h \in [1, n]$ identifies the token $a_h$, $i, j$ with $0 \le i \le j \le n$ identify a substring $w_{i,j}$, and $p, q$ with $j$ […] $h > j+1$.

The two cases $h < i$ and $h > j+1$ above will be used when the root node $a_h$ of $t_{a_h}$ has not yet collected all of its dependents. Note that $h \in \{i, j+1\}$ is not used in the definition of item. This is meant to avoid different items representing the same dependency tree, which is undesired for the specification of our algorithm. As an example, items $[i, j, -, -, i+1]_X$ and $[i+1, j, -, -, i+1]_X$ both represent a dependency tree $t_{a_{i+1}}$ with $\mathrm{yd}(t_{a_{i+1}}) = \langle w_{i,j} \rangle$. This and other similar cases are avoided by the ban against $h \in \{i, j+1\}$, which amounts to imposing some normal form for items. In our example, only item $[i, j, -, -, i+1]_X$ is a valid signature.

Finally, we distinguish among several item types, indicated by the value of subscript $X$. These types are specific to each parsing algorithm, and will be defined in later sections.

5 Parsing of Head-Split Trees

We present in this section our first tabular algorithm for computing the set of all dependency trees for an input sentence $w$ that have the head-split property, under the arc-factored model. Recall that $t_{a_i}$ denotes a tree with root $a_i$, and $t_{L,a_i}$ and $t_{U,a_i}$ are the lower and upper trees of $t_{a_i}$. The steps of the algorithm are specified by means of deduction rules over items, following the approach of Shieber et al. (1995).

5.1 Basic Idea

Our algorithm builds trees step by step, by attaching a tree $t_{a_{h'}}$ as a dependent of a tree $t_{a_h}$ and creating the new dependency $a_h \to a_{h'}$. Computationally, the worst case for this operation is when both $t_{a_h}$ and $t_{a_{h'}}$ have a gap; then, for each tree we need to keep a record of the four boundaries, along with the position of the head, as done by Gómez-Rodríguez et al. (2011). However, if we are interested in parsing trees that satisfy the head-split property, we can avoid representing a tree with a gap by means of a single item. We instead follow the general idea of §2 for projective parsing, and use different items for the upper and the lower trees of the source tree.

When we need to attach $t_{a_{h'}}$ as an upper dependent of $t_{a_h}$, defined as in §3.2, we perform two consecutive steps. First, we attach $t_{L,a_{h'}}$ to $t_{U,a_h}$, resulting in a new intermediate tree $t_1$. As a second step, we attach $t_{U,a_{h'}}$ to $t_1$, resulting in a new tree $t_2$ which is $t_{U,a_h}$ with $t_{a_{h'}}$ attached as an upper dependent, as desired. Both steps are depicted in Figure 5; here we introduce the convention of indicating tree grouping through a dashed line. A symmetric procedure can be used to attach $t_{a_{h'}}$ as a lower dependent to $t_{L,a_h}$. The correctness of the two-step approach follows from properties P1 and P3 in §3.2.

Figure 5: Two-step attachment of $t_{a_{h'}}$ at $t_{U,a_h}$: (a) attachment of $t_{L,a_{h'}}$; (b) attachment of $t_{U,a_{h'}}$.

By property P2 in §3.2, in both steps above the lexical heads $a_h$ and $a_{h'}$ can be read from the boundaries of the involved trees. Then these steps can be implemented more efficiently than the naïve method of attaching $t_{a_{h'}}$ to $t_{a_h}$ in a single step. A more detailed computational analysis will be provided in §5.7. To simplify the presentation, we restrict the use of head splitting to trees with a gap and parse trees with no gap with the naïve method; this does not affect the computational complexity.

5.2 Item Types

We distinguish five different types of items, indicated by the subscript $X \in \{0, L, U, /L, /U\}$, as described in what follows.

• If $X = 0$, we have $p = q = -$ and $\mathrm{yd}(t_{a_h})$ is specified as in §4.
• If $X = L$, we use the item to represent some lower tree. We have therefore $p, q \ne -$ and $h \in \{i+1, q\}$.
• If $X = U$, we use the item to represent some upper tree. We have therefore $p, q \ne -$ and $h \in \{j, p+1\}$.
• If $X = /L$ or $X = /U$, we use the item to represent some intermediate step in the parsing process, in which only the lower or upper tree of some dependent has been collected by the head $a_h$, and we are still missing the upper ($/U$) or the lower ($/L$) tree.
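The constraints that §4 and §5.2 place on item signatures can be written down as a small validity check. The following Python sketch is only an illustration under stated assumptions: the `Item` tuple, the string labels for item types and the use of `None` for the special value $-$ are choices made for this example. The intermediate types $/L$ and $/U$ are accepted without further checks, and no ordering constraint on $p$ and $q$ is enforced.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    i: int
    j: int
    p: Optional[int]  # None stands for the special value "-"
    q: Optional[int]
    h: int
    kind: str         # one of "0", "L", "U", "/L", "/U"


def is_well_formed(item, n):
    """Check the constraints on an item for an input string of length n."""
    i, j, p, q, h, kind = item
    if not (0 <= i <= j <= n and 1 <= h <= n):
        return False
    if kind == "0":
        # No gap is recorded, and the normal form bans h in {i, j + 1}.
        return p is None and q is None and h not in (i, j + 1)
    if kind == "L":
        # Lower tree: both gap boundaries present, head at i + 1 or q.
        return p is not None and q is not None and h in (i + 1, q)
    if kind == "U":
        # Upper tree: both gap boundaries present, head at j or p + 1.
        return p is not None and q is not None and h in (j, p + 1)
    # Intermediate types: accepted as-is in this sketch.
    return kind in ("/L", "/U")


# The example of Section 4 with i = 2, j = 5: only the first item is valid.
valid = is_well_formed(Item(2, 5, None, None, 3, "0"), n=6)
banned = is_well_formed(Item(3, 5, None, None, 3, "0"), n=6)
```

Here `valid` holds and `banned` does not, since the second signature has $h = i$ and therefore violates the normal form.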


We further specialize symbol $/U$ by writing $/U_<$ ($/U_>$) to indicate that the missing upper tree should have its head to the left (right) of its gap. We also use $/L_<$ and $/L_>$ with a similar meaning.

5.3 Item Normal Form

It could happen that our algorithm produces items of type 0 that do not satisfy the normal form condition discussed in §4. To avoid this problem, we assume that every item of type 0 that is produced by the algorithm is converted into an equivalent normal form item, by means of the following rules:

$$\frac{[i, j, -, -, i]_0}{[i-1, j, -, -, i]_0} \tag{1}$$

$$\frac{[i, j, -, -, j+1]_0}{[i, j+1, -, -, j+1]_0} \tag{2}$$

5.4 Items of Type 0

We start with deduction rules that produce items of type 0. As already mentioned, we do not apply the head splitting technique in this case.

The next rule creates trees with a single node, representing the head, and no dependents. The rule is actually an axiom (there is no antecedent) and the statement $i \in [1, n]$ is a side condition.

$$\frac{}{[i-1, i, -, -, i]_0} \quad \bigl\{\, i \in [1, n] \tag{3}$$

The next rule takes a tree headed in $a_{h'}$ and makes it a dependent of a new head $a_h$. This rule implements what has been called the ‘hook trick’. The first side condition enforces that the tree headed in $a_{h'}$ has collected all of its dependents, as discussed in §4. The second side condition enforces that no cycle is created. We also write $a_h \to a_{h'}$ to indicate that a new dependency is created in the parse forest.

$$\frac{[i, j, -, -, h']_0}{[i, j, -, -, h]_0} \quad \begin{cases} h' \in [i+1, j] \\ h \notin [i+1, j] \\ a_h \to a_{h'} \end{cases} \tag{4}$$

The next two rules combine gap-free dependents of the same head $a_h$.

$$\frac{[i, k, -, -, h]_0 \quad [k, j, -, -, h]_0}{[i, j, -, -, h]_0} \tag{5}$$

$$\frac{[i, h, -, -, h]_0 \quad [h-1, j, -, -, h]_0}{[i, j, -, -, h]_0} \tag{6}$$

We need the special case in (6) to deal with the concatenation of two items that share the head $a_h$ at the concatenation point. Observe the apparent mismatch in step (6) between index $h$ in the first antecedent and index $h-1$ in the second antecedent. This is because in our normal form, both the first and the second antecedent have already incorporated a copy of the shared head $a_h$.

The next two rules collect a dependent of $a_h$ that wraps around the dependents that have already been collected. As already discussed, this operation is performed by two successive steps: we first collect the lower tree and then the upper tree. We present the case in which the shared head of the two trees is placed at the left of the gap. The case in which the head is placed at the right of the gap is symmetric.

$$\frac{[i', j', -, -, h]_0 \quad [i, i', j', j, i+1]_L}{[i, j, -, -, h]_{/U_<}} \quad \bigl\{\, h \notin [i+1, i'] \cup [j'+1, j] \tag{7}$$

$$\frac{[i', j', -, -, h]_{/U_<} \quad [i, i'+1, j', j, i'+1]_U}{[i, j, -, -, h]_0} \quad \begin{cases} h \notin [i+1, i'+1] \cup [j'+1, j] \\ a_h \to a_{i'+1} \end{cases} \tag{8}$$

Again, there is an overlap in rule (8) between the two antecedents, due to the fact that both items have already incorporated copies of the same head.

5.5 Items of Type U

We now consider the deduction rules that are needed to process upper trees. Throughout this subsection we assume that the head of the upper tree is placed at the left of the gap. The other case is symmetric.

The next rule creates an upper tree with a single node, representing its head, and no dependents. We construct an item for all possible right gap boundaries $j$.

$$\frac{}{[i-1, i, j, j, i]_U} \quad \begin{cases} i \in [1, n] \\ j \in [i+1, n] \end{cases} \tag{9}$$

The next rule adds to an upper tree a group of new dependents that do not have any gap. We present the case in which the new dependents are placed at the left of the gap of the upper tree.

$$\frac{[i, i', -, -, j]_0 \quad [i', j, p, q, j]_U}{[i, j, p, q, j]_U} \tag{10}$$
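The gap-free rules (1) to (6) can be exercised in isolation as a small closure computation over type-0 items. The sketch below is a recognizer only: it ignores arc scores and the parse forest, represents a type-0 item $[i, j, -, -, h]_0$ as the triple `(i, j, h)`, and recomputes all rule instances in each round instead of using an agenda. The function names are assumptions made for this illustration.

```python
from itertools import product


def normalize(item):
    """Apply the normal-form rules (1) and (2) to a type-0 item (i, j, h)."""
    i, j, h = item
    if h == i:        # rule (1): extend the span leftwards over the head
        return (i - 1, j, h)
    if h == j + 1:    # rule (2): extend the span rightwards over the head
        return (i, j + 1, h)
    return item


def type0_closure(n):
    """All type-0 items derivable with rules (3)-(6) for a string of length n."""
    items = {(i - 1, i, i) for i in range(1, n + 1)}       # rule (3), axioms
    while True:
        new = set()
        for (i, j, h_dep) in items:
            if i + 1 <= h_dep <= j:                        # rule (4), hook trick
                for h in range(1, n + 1):
                    if not (i + 1 <= h <= j):
                        new.add(normalize((i, j, h)))
        for (i, k, h), (k2, j, h2) in product(items, repeat=2):
            if h != h2:
                continue
            if k == k2:                                    # rule (5)
                new.add((i, j, h))
            if k == h and k2 == h - 1:                     # rule (6)
                new.add((i, j, h))
        if new <= items:
            return items
        items |= new


closure = type0_closure(3)
```

For an input of length 3, the closure contains an item `(0, 3, h)` for each head position `h` in 1..3, reflecting that every token can head some projective tree spanning the whole string. For instance, `(0, 3, 2)` is obtained by rule (6) from `(0, 2, 2)` and `(1, 3, 2)`, both of which already include a copy of the head at position 2.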
The next two rules collect a new dependent that wraps around the upper tree. Again, this operation is performed by two successive steps: we first collect the lower tree, then the upper tree. We present the case in which the shared head of the two trees is placed at the left of the gap.

$$\frac{[i', j, p, q', j]_U \quad [i, i', q', q, i+1]_L}{[i, j, p, q, j]_{/U_<}} \tag{11}$$

$$\frac{[i', j, p, q', j]_{/U_<} \quad [i, i'+1, q', q, i'+1]_U}{[i, j, p, q, j]_U} \quad \bigl\{\, a_j \to a_{i'+1} \tag{12}$$

5.6 Items of Type L

So far we have always expanded items (type 0 or U) at their external boundaries. When dealing with lower trees, we have to reverse this strategy and expand items (type L) at their internal boundaries. Apart from this difference, the deduction rules below are entirely symmetric to those in §5.5. Again, we assume that the head of the lower tree is placed at the left of the gap, the other case being symmetric.

Our first rule creates a lower tree with a single node, representing its head. We blindly guess the right boundary of the gap of such a tree.

$$\frac{}{[i-1, i, j, j, i]_L} \quad \begin{cases} i \in [1, n] \\ j \in [i+1, n] \end{cases} \tag{13}$$

The next rule adds to a lower tree a group of new dependents that do not have any gap. We present the case in which the new dependents are placed at the left of the gap of the lower tree.

$$\frac{[j', j, -, -, i+1]_0 \quad [i, j', p, q, i+1]_L}{[i, j, p, q, i+1]_L} \tag{14}$$

The next two rules collect a new dependent with a gap and embed it within the gap of our lower tree, creating a new dependency. Again, this operation is performed by two successive steps, and we present the case in which the common head of the lower and upper trees that are embedded is placed at the left of the gap, the other case being symmetric.

$$\frac{[i, j', p', q, i+1]_L \quad [j', j, p, p', j]_U}{[i, j, p, q, i+1]_{/L_<}} \tag{15}$$

$$\frac{[i, j', p', q, i+1]_{/L_<} \quad [j'-1, j, p, p', j']_L}{[i, j, p, q, i+1]_L} \quad \bigl\{\, a_{i+1} \to a_{j'} \tag{16}$$

Figure 6: Node $a_h$ satisfies both the 1-inherit and head-split conditions. Accordingly, tree $t_{a_h}$ can be split into three fragments $t_{U,a_h}$, $t_{LL,a_h}$ and $t_{LR,a_h}$.

5.7 Runtime

The algorithm runs in time $O(n^6)$, where $n$ is the length of the input sentence. The worst case is due to deduction rules that combine two items, each of which represents trees with one gap. For instance, rule (11) involves six free indices ranging over $[1, n]$, and thus could be instantiated $O(n^6)$ many times. If the head-split property does not hold, attachment of a dependent in one step results in time $O(n^7)$, as seen for instance in Gómez-Rodríguez et al. (2011).

6 Parsing of 1-Inherit Head-Split Trees

In this section we specialize the parsing algorithm of §5 to a new, more efficient algorithm for a restricted class of trees.

6.1 1-Inherit Head-Split Trees

Pitler et al. (2012) introduce a restriction on well-nested dependency trees with block-degree at most 2. A tree $t$ satisfies the 1-inherit property if, for every node $a_h$ in $t$ with $\mathrm{bd}(a_h, t) = 2$, there is at most one dependency $a_h \to a_{d^*}$ such that $\mathrm{gap}(t[a_{d^*}])$ contains $\mathrm{gap}(t[a_h])$. Informally, this means that $\mathrm{yd}(t[a_{d^*}])$ ‘crosses over’ $\mathrm{gap}(t[a_h])$, and we say that $a_{d^*}$ ‘inherits’ the gap of $a_h$. In this section we investigate the parsing of head-split trees that also have the 1-inherit property.

Example 4 Figure 6 shows a head node $a_h$ along with dependents $a_{d_i}$, satisfying the head-split condition. Only $t_{a_{d_1}}$ has its yield crossing over $\mathrm{gap}(t_{a_h})$. Thus $a_h$ also satisfies the 1-inherit condition. □

6.2 Basic Idea

Let $t_{a_h}$ be some tree satisfying both the head-split property and the 1-inherit property. Assume that the dependent node $a_{d^*}$ which inherits the gap of $t_{a_h}$ is placed within $t_{U,a_h}$. This means that, for every
dependency $a_h \to a_d$ in $t_{L,a_h}$, $\mathrm{yd}(t[a_d])$ does not cross over $\mathrm{gap}(t_{L,a_h})$. Then we can further split $t_{L,a_h}$ into two trees, both with root $a_h$. We call these two trees the lower-left tree, written $t_{LL,a_h}$, and the lower-right tree, written $t_{LR,a_h}$; see again Figure 6. The basic idea behind our algorithm is to split $t_{a_h}$ into three dependency trees $t_{U,a_h}$, $t_{LL,a_h}$ and $t_{LR,a_h}$, all sharing the same root $a_h$. This means that $t_{a_h}$ can be attached to an existing tree through three successive steps, each processing one of the three trees above. The correctness of this procedure follows from a straightforward extension of properties P1 and P3 from §3.2, stating that the tree fragments $t_{U,a_h}$, $t_{LL,a_h}$ and $t_{LR,a_h}$ can be represented and processed one independently of the others, and freely combined if certain conditions are satisfied by their yields.

In case $a_{d^*}$ is placed within $t_{L,a_h}$, we introduce the upper-left and the upper-right trees, written $t_{UL,a_h}$ and $t_{UR,a_h}$, and apply a similar idea.

6.3 Item Types

When processing an attachment, the order in which the algorithm assembles the three tree fragments of $t_{a_h}$ defined in §6.2 is not always the same. Such an order is chosen on the basis of where the head $a_h$ and the dependent $a_{d^*}$ inheriting the gap are placed within the involved trees. As a consequence, in our algorithm we need to represent several intermediate parsing states. Besides the item types from §5.2, we therefore need additional types. The specification of these new item types is rather technical, and is therefore delayed until we introduce the relevant deduction rules.

6.4 Items of Type 0

We start with the deduction rules for parsing of trees $t_{LL,a_h}$ and $t_{LR,a_h}$; trees $t_{UL,a_h}$ and $t_{UR,a_h}$ can be treated symmetrically. The yields of $t_{LL,a_h}$ and $t_{LR,a_h}$ have the form specified in §4 for the case $p = q = -$. We can therefore use items of type 0 to parse these trees, adopting a strategy similar to the one in §5.4. The main difference is that, when a tree $t_{a_{h'}}$ with a gap is attached as a dependent to the head $a_h$, we use three consecutive steps, each processing a single fragment of $t_{a_{h'}}$. We assume below that $t_{a_{h'}}$ can be split into trees $t_{U,a_{h'}}$, $t_{LL,a_{h'}}$ and $t_{LR,a_{h'}}$; the other case can be treated in a similar way.

We use rules (3), (4) and (5) from §5.4. Since in the trees $t_{LL,a_h}$ and $t_{LR,a_h}$ the head is never placed in the middle of the yield, rule (6) is not needed now and it can safely be discarded. Rule (7), attaching a lower tree, needs to be replaced by two new rules, processing a lower-left and a lower-right tree. We assume here that the common head of these trees is placed at the left boundary of the lower-left tree; we leave out the symmetric case.

Figure 7: Tree $t_{U,a_h}$ is decomposed into $t_{a_{d^*}}$ and subtrees covering substrings $\sigma_i$, $i \in [1, 4]$. Tree $t_{a_{d^*}}$ is in turn decomposed into three fragments (trees $t_{LL,a_{d^*}}$, $t_{LR,a_{d^*}}$, and $t_{U,a_{d^*}}$ in this example).

$$\frac{[i, i', -, -, i+1]_0 \quad [i', j, -, -, h]_0}{[i, j, -, -, h]_{/LR_<}} \quad \bigl\{\, h \notin [i+1, i'] \tag{17}$$

$$\frac{[j', j, -, -, i+1]_0 \quad [i, j', -, -, h]_{/LR_<}}{[i, j, -, -, h]_{/U_<}} \quad \bigl\{\, h \notin [j'+1, j] \tag{18}$$

The first antecedent in (17) encodes a lower-left tree with its head at the left boundary. The consequent item has then the new type $/LR_<$, meaning that a lower-right tree is missing that must have its head at the left. The first antecedent in (18) provides the missing lower-right tree, having the same head as the already incorporated lower-left tree. After these rules are applied, rule (8) from §5.4 can be applied to the consequent item of (18). This completes the attachment of a ‘wrapping’ dependent of $a_h$, with the incorporation of the missing upper tree and with the construction of the new dependency.

6.5 Items of Type U

We now assume that node $a_{d^*}$ is realized within $t_{U,a_h}$, so that $t_{a_h}$ can be split into trees $t_{U,a_h}$, $t_{LL,a_h}$ and $t_{LR,a_h}$. We provide deduction rules to parse $t_{U,a_h}$; this is the most involved part of the algorithm. In case $a_{d^*}$ is realized within $t_{L,a_h}$, $t_{a_h}$ must be split into $t_{L,a_h}$, $t_{UL,a_h}$ and $t_{UR,a_h}$, and a symmetrical strategy can be applied to parse $t_{L,a_h}$.
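The bookkeeping of rules (17) and (18) can be sketched as two small functions on item signatures. This is an illustration only: since $p = q = -$ in every item involved, an item is represented here as a tuple `(i, j, h, kind)`; the type labels `"/LR<"` and `"/U<"` and the function names are assumptions of this sketch, and a function returns `None` when the rule does not apply.

```python
def rule_17(lower_left, collected):
    """[i,i',-,-,i+1]_0 and [i',j,-,-,h]_0 give [i,j,-,-,h]_{/LR<},
    provided that h is not in [i+1, i']."""
    (i, i1, dep_head, kind_a), (i2, j, h, kind_b) = lower_left, collected
    if (kind_a == "0" and kind_b == "0" and i1 == i2
            and dep_head == i + 1 and not (i + 1 <= h <= i1)):
        return (i, j, h, "/LR<")
    return None


def rule_18(lower_right, pending):
    """[j',j,-,-,i+1]_0 and [i,j',-,-,h]_{/LR<} give [i,j,-,-,h]_{/U<},
    provided that h is not in [j'+1, j]."""
    (j1, j, dep_head, kind_a), (i, j2, h, kind_b) = lower_right, pending
    if (kind_a == "0" and kind_b == "/LR<" and j1 == j2
            and dep_head == i + 1 and not (j1 + 1 <= h <= j)):
        return (i, j, h, "/U<")
    return None


# The head a_4 has already collected dependents over w_{2,5}. A wrapping
# dependent headed in a_2 contributes a lower-left tree over w_{1,2} and a
# lower-right tree over w_{5,7}.
step_1 = rule_17((1, 2, 2, "0"), (2, 5, 4, "0"))
step_2 = rule_18((5, 7, 2, "0"), step_1)
```

After the two steps the item `step_2` has type `"/U<"`, so that the missing upper tree of the dependent can be supplied by rule (8). Note that both functions check that the two fragments share the same head position $i+1$ of the dependent, which is what ties the lower-left and lower-right trees together.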
Figure 8: Decomposition of $t_{U,a_h}$ as in Figure 7, with highlighted application of rules (19) and (20).

We start by observing that $\mathrm{yd}(t_{a_{d^*}})$ splits $\mathrm{yd}(t_{U,a_h})$ into at most four substrings $\sigma_i$; see Figure 7.² Because of the well-nested property, within the tree $t_{U,a_h}$ each dependent of $a_h$ other than $a_{d^*}$ has a yield that is entirely placed within one of the substrings $\sigma_i$. This means that each substring $\sigma_i$ can be parsed independently of the other substrings.

² According to our definition of $m(t_{a_h})$ in §3.2, $\sigma_3$ is always the empty string. However, here we deal with the general formulation of the problem in order to claim in §8 that our algorithm can be directly adapted to parse some subclasses of lexicalized tree-adjoining grammars.

As a first step in the process of parsing $t_{U,a_h}$, we parse each substring $\sigma_i$. We do this following the parsing strategy specified in §6.4. As a second step, we assume that each of the three fragments resulting from the decomposition of tree $t_{a_{d^*}}$ has already been parsed; see again Figure 7. We then ‘merge’ these three fragments and the trees for the segments $\sigma_i$ into a complete parse tree representing $t_{U,a_h}$. This is described in detail in what follows.

We assume that $a_h$ is placed at the left of the gap of $t_{U,a_h}$ (the right case being symmetrical) and we distinguish four cases, depending on the two ways in which $t_{a_{d^*}}$ can be split, and the two side positions of the head $a_{d^*}$ with respect to $\mathrm{gap}(t_{a_{d^*}})$.

Case 1 We assume that $t_{a_{d^*}}$ can be split into trees $t_{U,a_{d^*}}$, $t_{LL,a_{d^*}}$, $t_{LR,a_{d^*}}$, and the head $a_{d^*}$ is placed at the left of $\mathrm{gap}(t_{a_{d^*}})$; see again Figure 7.

Rule (19) below combines $t_{LL,a_{d^*}}$ with a parse for segment $\sigma_2$, which has its head $a_h$ placed at its right boundary; see Figure 8 for a graphical representation of rule (19). The result is an item of the new type $HH$. This item is used to represent an intermediate tree fragment with root of block-degree 1, where both the left and the right boundaries are heads; a dependency between these heads will be constructed later.

$$\frac{[i, i', -, -, i+1]_0 \quad [i', j, -, -, j]_0}{[i, j, -, -, j]_{HH}} \tag{19}$$

Rule (20) combines $t_{U,a_{d^*}}$ with a type 0 item representing $t_{LR,a_{d^*}}$; see again Figure 8. Note that this combination operation expands an upper tree at one of its internal boundaries, something that was not possible with the rules specified in §5.5.

$$\frac{[i, j, p', q, j]_U \quad [p, p', -, -, j]_0}{[i, j, p, q, j]_U} \tag{20}$$

Finally, we combine the consequents of (19) and (20), and process the dependency that was left pending in the item of type $HH$.

$$\frac{[i, j', p, q, j']_U \quad [j'-1, j, -, -, j]_{HH}}{[i, j, p, q, j]_U} \quad \bigl\{\, a_j \to a_{j'} \tag{21}$$

After the above steps, parsing of $t_{U,a_h}$ can be completed by combining item $[i, j, p, q, j]_U$ from (21) with items of type 0 representing parses for the substrings $\sigma_1$, $\sigma_3$ and $\sigma_4$.

Figure 9: Decomposition of $t_{U,a_h}$ as in Figure 7, with highlighted application of rules (22) and (23).

Case 2 We assume that $t_{a_{d^*}}$ can be split into trees $t_{U,a_{d^*}}$, $t_{LL,a_{d^*}}$, $t_{LR,a_{d^*}}$, and the head $a_{d^*}$ is placed at the right of $\mathrm{gap}(t_{a_{d^*}})$, as depicted in Figure 9.

Rule (22) below, graphically represented in Figure 9, combines $t_{U,a_{d^*}}$ with a type 0 item representing $t_{LL,a_{d^*}}$. This can be viewed as the symmetric version of rule (20) of Case 1, expanding an upper tree at one of its internal boundaries.

$$\frac{[i, j', p, q, p+1]_U \quad [j', j, -, -, p+1]_0}{[i, j, p, q, p+1]_U} \tag{22}$$
Table 1: Coverage of various classes of dependency trees on the training sets used in the CoNLL-X shared task (WN2 = well-nested, block-degree ≤ 2; HS = head-split; 1I = 1-inherit; 0I = 0-inherit, 'gap-minding').

Class        Runtime   Arabic          Czech            Danish          Dutch            Portuguese      Swedish
Number of trees        1,460           72,703           5,190           13,349           9,071           11,042
WN2          O(n^7)    1,458 (99.9%)   72,321 (99.5%)   5,175 (99.7%)   12,896 (96.6%)   8,650 (95.4%)   10,955 (99.2%)
Classes considered in this paper
WN2+HS       O(n^6)    1,457 (99.8%)   72,182 (99.3%)   5,174 (99.7%)   12,774 (95.7%)   8,648 (95.3%)   10,951 (99.2%)
WN2+HS+1I    O(n^5)    1,457 (99.8%)   72,182 (99.3%)   5,174 (99.7%)   12,774 (95.7%)   8,648 (95.3%)   10,951 (99.2%)
Classes considered by Pitler et al. (2012)
WN2+1I       O(n^6)    1,458 (99.9%)   72,321 (99.5%)   5,175 (99.7%)   12,896 (96.6%)   8,650 (95.4%)   10,955 (99.2%)
WN2+0I       O(n^5)    1,394 (95.5%)   70,695 (97.2%)   4,985 (96.1%)   12,068 (90.4%)   8,481 (93.5%)   10,787 (97.7%)
Projective   O(n^3)    1,297 (88.8%)   55,872 (76.8%)   4,379 (84.4%)   8,484 (63.6%)    7,353 (81.1%)   9,963 (90.2%)

Next, we combine the result of (22) with a parse for substring $\sigma_2$. The result is an item of the new type =LR>. This item is used to represent an intermediate tree fragment that is missing a lower-right tree with its head at the right. In this fragment, two heads are left pending, and a dependency relation will eventually be established between them.

\[ \frac{[i, j', p, q, p+1]_{U} \qquad [j', j, -, -, j]_{0}}{[i, j, p, q, j]_{=LR>}} \tag{23} \]

The next rule combines the consequent item of (23) with a tree $t_{LR,a_{d*}}$ having its head at the right boundary, and processes the dependency that was left pending in the =LR> item.

\[ \frac{[i, j, p', q, j]_{=LR>} \qquad [p, p'+1, -, -, p'+1]_{0}}{[i, j, p, q, j]_{U}} \qquad a_j \rightarrow a_{p'+1} \tag{24} \]

After the above rules, parsing of $t_{U,a_h}$ continues by combining the consequent item $[i, j, p, q, j]_{U}$ from rule (24) with items representing parses for the substrings $\sigma_1$, $\sigma_3$ and $\sigma_4$.

Cases 3 and 4. We informally discuss the cases in which $t_{a_{d*}}$ can be split into trees $t_{L,a_{d*}}$, $t_{UL,a_{d*}}$, $t_{UR,a_{d*}}$, for both positions of the head $a_{d*}$ with respect to $\mathrm{gap}(t_{a_{d*}})$. In both cases we can adopt a strategy similar to the one of Case 2.

We first expand $t_{L,a_{d*}}$ externally, at the side opposite to the head $a_{d*}$, with a tree fragment $t_{UL,a_{d*}}$ or $t_{UR,a_{d*}}$, similarly to rule (22) of Case 2. This results in a new fragment $t_1$. Next, we merge $t_1$ with a parse for $\sigma_2$ containing the head $a_h$, similarly to rule (23) of Case 2. This results in a new fragment $t_2$ where a dependency relation involving the heads $a_{d*}$ and $a_h$ is left pending. Finally, we merge $t_2$ with a missing tree $t_{UL,a_{d*}}$ or $t_{UR,a_{d*}}$, and process the pending dependency, similarly to rule (24). One should contrast this strategy with the alternative strategy adopted in Case 1, where the fragment of $t_{a_{d*}}$ having block-degree 2 cannot be merged with a parse for the segment containing the head $a_h$ ($\sigma_2$ in Case 1), because of an intervening fragment of $t_{a_{d*}}$ with block-degree 1 ($t_{LL,a_{d*}}$ in Case 1).

Finally, if there is no node $a_{d*}$ in $t_{U,a_h}$ that inherits the gap of $a_h$, we can split $t_{U,a_h}$ into two dependency trees, as we have done for $t_{L,a_h}$ in §6.2, and parse the two fragments using the strategy of §6.4.

6.6 Runtime

Our algorithm runs in time $O(n^5)$, where $n$ is the length of the input sentence. The reason for the improvement with respect to the $O(n^6)$ result of §5 is that we no longer have deduction rules where both antecedents represent trees with a gap. In the new algorithm, the worst case is due to rules where only one antecedent has a gap. This leads to rules involving a maximum of five indices, ranging over $[1, n]$. These rules can be instantiated in $O(n^5)$ ways.

7 Empirical Coverage

We have seen that the restriction to head-split dependency trees enables us to parse these trees one order of magnitude faster than the class of well-nested dependency trees with block-degree at most 2.

In connection with the 1-inherit property, this even increases to two orders of magnitude. However, as already stated in §2, this improvement is paid for by a loss in coverage; for instance, trees of the form shown in Figure 3 cannot be parsed any longer.

7.1 Quantitative Evaluation

In order to assess the empirical loss in coverage that the restriction to head-split trees incurs, we evaluated the coverage of several classes of dependency trees on standard data sets. Following Pitler et al. (2012), we report in Table 1 figures for the training sets of six languages used in the CoNLL-X shared task on dependency parsing (Buchholz and Marsi, 2006). As we can see, the $O(n^6)$ class of head-split trees has only slightly lower coverage on this data than the baseline class of well-nested dependency trees with block-degree at most 2. The losses are up to 0.2 percentage points on five of the six languages, and 0.9 points on the Dutch data. Our even more restricted $O(n^5)$ class of 1-inherit head-split trees has the same coverage as our $O(n^6)$ class, which is expected given the results of Pitler et al. (2012): their $O(n^6)$ class of 1-inherit trees has exactly the same coverage as the baseline (and thereby more coverage than our $O(n^6)$ class). Interestingly though, their $O(n^5)$ class of 'gap-minding' trees has a significantly smaller coverage than our $O(n^5)$ class. We conclude that our class seems to strike a good balance between expressiveness and parsing complexity.

7.2 Qualitative Evaluation

While the original motivation behind introducing the head-split property was to improve parsing complexity, it is interesting to also discuss the linguistic relevance of this property. A first inspection of the structures that violate the head-split property revealed that many such violations disappear if one ignores gaps caused by punctuation. Some decisions about what nodes should function as the heads of punctuation symbols lead to more gaps than others. In order to quantify the implications of this, we recomputed the coverage of the class of head-split trees on data sets where we first removed all punctuation. The results are given in Table 2. We restrict ourselves to the five native dependency treebanks used in the CoNLL-X shared task, ignoring treebanks that have been converted from phrase structure representations.

Table 2: Violations against the head-split property (relative to the class of well-nested trees with block-degree ≤ 2) with and without punctuation.

           Arabic   Czech   Danish   Slovene   Turkish
with       1        139     1        2         2
without    1        46      0        0         2

We see that when we remove punctuation from the sentences, the number of violations against the head-split property never increases. For Danish and Slovene, removing punctuation even has the effect that all well-nested dependency trees with block-degree at most 2 become head-split. Overall, the absolute numbers of violations are extremely small, except for Czech, where we have 139 violations with and 46 without punctuation. A closer inspection of the Czech sentences reveals that many of these feature rather complex coordinations. Indeed, out of the 46 violations in the punctuation-free data, only 9 remain when one ignores those with coordination. For the remaining ones, we have not been able to identify any clear patterns.

8 Concluding Remarks

In this article we have extended head splitting techniques, originally developed for parsing of projective dependency trees, to two subclasses of well-nested dependency trees with block-degree at most 2. We have improved over the asymptotic runtime of two existing algorithms, at no significant loss in coverage. With the same goal of improving parsing efficiency for subclasses of non-projective trees, in very recent work Pitler et al. (2013) have proposed an $O(n^4)$ time algorithm for a subclass of non-projective trees that are not well-nested, using an approach that is orthogonal to the one we have explored here.

Other than for dependency parsing, our results also have implications for mildly context-sensitive phrase structure formalisms. In particular, the algorithm of §5 can be adapted to parse a subclass of lexicalized tree-adjoining grammars, improving the result by Eisner and Satta (2000) from $O(n^7)$ to $O(n^6)$. Similarly, the algorithm of §6 can be adapted to parse a lexicalized version of the tree-adjoining grammars investigated by Satta and Schuler (1998), improving a naïve $O(n^7)$ algorithm to $O(n^5)$.
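
The statement that head splitting comes "at no significant loss in coverage" can be checked directly against the tree counts reported in Table 1. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative assumptions introduced here, and only the numbers are taken from Table 1.

```python
"""Recompute coverage figures from the tree counts reported in Table 1."""

# Number of trees per CoNLL-X training set (row "Number of trees").
TOTAL = {"Arabic": 1460, "Czech": 72703, "Danish": 5190,
         "Dutch": 13349, "Portuguese": 9071, "Swedish": 11042}

# Trees that are well-nested with block-degree at most 2 (row "WN2").
WN2 = {"Arabic": 1458, "Czech": 72321, "Danish": 5175,
       "Dutch": 12896, "Portuguese": 8650, "Swedish": 10955}

# Trees that are additionally head-split (row "WN2+HS").
WN2_HS = {"Arabic": 1457, "Czech": 72182, "Danish": 5174,
          "Dutch": 12774, "Portuguese": 8648, "Swedish": 10951}


def coverage(covered: int, total: int) -> float:
    """Return the coverage of a class in percent."""
    return 100.0 * covered / total


def coverage_loss(lang: str) -> float:
    """Percentage points lost when restricting WN2 to head-split trees."""
    return 100.0 * (WN2[lang] - WN2_HS[lang]) / TOTAL[lang]


if __name__ == "__main__":
    for lang in TOTAL:
        lost_trees = WN2[lang] - WN2_HS[lang]
        print(f"{lang:<11}"
              f"WN2 {coverage(WN2[lang], TOTAL[lang]):5.1f}%  "
              f"WN2+HS {coverage(WN2_HS[lang], TOTAL[lang]):5.1f}%  "
              f"loss {coverage_loss(lang):.2f} points ({lost_trees} trees)")
```

Because the loss is computed from raw counts, it can differ in the last digit from the value obtained by subtracting the rounded percentages printed in Table 1. The difference between the WN2 and WN2+HS counts for Czech is 139 trees, which agrees with the Czech entry in the "with" row of Table 2.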

References

Manuel Bodirsky, Marco Kuhlmann, and Mathias Möhl. 2005. Well-nested drawings as models of syntactic structure. In Proceedings of the 10th Conference on Formal Grammar (FG) and Ninth Meeting on Mathematics of Language (MOL), pages 195–203, Edinburgh, UK.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 149–164, New York, USA.

Xavier Carreras. 2007. Experiments with a higher-order projective dependency parser. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 957–961, Prague, Czech Republic.

Joan Chen-Main and Aravind K. Joshi. 2010. Unavoidable ill-nestedness in natural language and the adequacy of tree local-MCTAG induced dependency structures. In Proceedings of the Tenth International Conference on Tree Adjoining Grammars and Related Formalisms (TAG+), New Haven, USA.

Jason Eisner and Giorgio Satta. 1999. Efficient parsing for bilexical context-free grammars and Head Automaton Grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pages 457–464, College Park, MD, USA.

Jason Eisner and Giorgio Satta. 2000. A faster parsing algorithm for lexicalized Tree-Adjoining Grammars. In Proceedings of the Fifth Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+), pages 14–19, Paris, France.

Jason Eisner. 1997. Bilexical grammars and a cubic-time probabilistic parser. In Proceedings of the Fifth International Workshop on Parsing Technologies (IWPT), pages 54–65, Cambridge, MA, USA.

Carlos Gómez-Rodríguez, John Carroll, and David J. Weir. 2011. Dependency parsing schemata and mildly non-projective dependency parsing. Computational Linguistics, 37(3):541–586.

Aravind K. Joshi and Yves Schabes. 1997. Tree-Adjoining Grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3, pages 69–123. Springer.

Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1–11, Uppsala, Sweden.

Marco Kuhlmann and Joakim Nivre. 2006. Mildly non-projective dependency structures. In Proceedings of the 21st International Conference on Computational Linguistics (COLING) and 44th Annual Meeting of the Association for Computational Linguistics (ACL) Main Conference Poster Sessions, pages 507–514, Sydney, Australia.

Wolfgang Maier and Timm Lichte. 2011. Characterizing discontinuity in constituent treebanks. In Philippe de Groote, Markus Egg, and Laura Kallmeyer, editors, Formal Grammar. 14th International Conference, FG 2009, Bordeaux, France, July 25–26, 2009, Revised Selected Papers, volume 5591 of Lecture Notes in Computer Science, pages 167–182. Springer.

Ryan McDonald and Giorgio Satta. 2007. On the complexity of non-projective data-driven dependency parsing. In Proceedings of the Tenth International Conference on Parsing Technologies (IWPT), pages 121–132, Prague, Czech Republic.

Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Human Language Technology Conference (HLT) and Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 523–530, Vancouver, Canada.

Emily Pitler, Sampath Kannan, and Mitchell Marcus. 2012. Dynamic programming for higher order parsing of gap-minding trees. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing (EMNLP) and Computational Natural Language Learning (CoNLL), pages 478–488, Jeju Island, Republic of Korea.

Emily Pitler, Sampath Kannan, and Mitchell Marcus. 2013. Finding optimal 1-endpoint-crossing trees. Transactions of the Association for Computational Linguistics.

Giorgio Satta and William Schuler. 1998. Restrictions on tree adjoining languages. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL) and 17th International Conference on Computational Linguistics (COLING), pages 1176–1182, Montréal, Canada.

Stuart M. Shieber, Yves Schabes, and Fernando Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming, 24(1–2):3–36.