Transacciones de la Asociación de Lingüística Computacional, 2 (2014) 405–418. Editor de acciones: Marcos Steedman.
Submitted 4/2014; Revised 8/2014; Publicado 10/2014. C(cid:13)2014 Asociación de Lingüística Computacional.
405
ANewParsingAlgorithmforCombinatoryCategorialGrammarMarcoKuhlmannDepartmentofComputerandInformationScienceLinköpingUniversity,Swedenmarco.kuhlmann@liu.seGiorgioSattaDepartmentofInformationEngineeringUniversityofPadua,Italysatta@dei.unipd.itAbstractWepresentapolynomial-timeparsingalgo-rithmforCCG,basedonanewdecompositionofderivationsintosmall,shareableparts.Ouralgorithmhasthesameasymptoticcomplex-ity,O.n6/,asapreviousalgorithmbyVijay-ShankerandWeir(1993),butiseasiertoun-derstand,implement,andprovecorrect.1IntroductionCombinatoryCategorialGrammar(CCG;SteedmanandBaldridge(2011))isalexicalizedgrammarfor-malismthatbelongstotheclassofso-calledmildlycontext-sensitiveformalisms,ascharacterizedbyJoshi(1985).CCGhasbeensuccessfullyusedforawiderangeofpracticaltasksincludingdata-drivenparsing(ClarkandCurran,2007),wide-coveragese-manticconstruction(Bosetal.,2004;Kwiatkowskietal.,2010;LewisandSteedman,2013)andmachinetranslation(Weeseetal.,2012).SeveralparsingalgorithmsforCCGhavebeenpresentedintheliterature.Earlierproposalsshowrunningtimeexponentialinthelengthoftheinputstring(PareschiandSteedman,1987;Tomita,1988).AbreakthroughcamewiththeworkofVijay-ShankerandWeir(1990)andVijay-ShankerandWeir(1993)whoreportthefirstpolynomial-timealgorithmforCCGparsing.Untilthisday,thisalgorithm,whichweshallrefertoastheV&Walgorithm,remainstheonlypublishedpolynomial-timeparsingalgorithmforCCG.However,wearenotawareofanypracticalparserforCCGthatactuallyusesit.Wespeculatethatthishastwomainreasons:Primero,someauthorshavearguedthatlinguisticresourcesavailableforCCGcanbecoveredwithcontext-freefragmentsoftheformalism(FowlerandPenn,2010),forwhichmoreefficientparsingalgorithmscanbegiven.Sec-ond,theV&Walgorithmisconsiderablymorecom-plexthanparsingalgorithmsforequivalentmildlycontext-sensitiveformalisms,suchasTree-Adjoin-ingGrammar(JoshiandSchabes,1997),andisquitehardtounderstand,implement,andprovecorrect.TheV&Walgorithmisbasedonaspecialdecom-positionofCCGderivationsintosmallerpartsthatcanthenbesharedamongdifferentderivations.Thissharingisthekeytothepolynomialruntime.Inthisarticlewebuildonthesameidea,butdevelopanalternativepolynomial-timealgorithmforCCGparsing.ThenewalgorithmisbasedonadifferentdecompositionofCCGderivations,andisarguablysimplerthantheV&Walgorithminatleasttwore-spects:Primero,thenewalgorithmusesonlythreebasicsteps,againsttheninebasicstepsoftheV&Wparser.Second,thecorrectnessproofofthenewalgorithmissimplerthantheonereportedbyVijay-ShankerandWeir(1993).ThenewalgorithmrunsintimeO.n6/wherenisthelengthoftheinputstring,thesameastheV&Wparser.Weorganizeourpresentationasfollows.InSec-tion2weintroduceCCGandthecentralnotionofderivationtrees.InSection3westartwithasimplebutexponential-timeparserforCCG,fromwhichwederiveourpolynomial-timeparserinSection4.Section5furthersimplifiesthealgorithmandprovesitscorrectness.WethenprovideadiscussionofouralgorithmandpossibleextensionsinSection6.Sec-tion7concludesthearticle.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
406
2CombinatoryCategorialGrammarWeassumebasicfamiliaritywithCCGingeneralandtheformalismofWeirandJoshi(1988)inparticular.Inthissectionwesetupourterminologyandnotation.ACCGhastwomainparts:alexiconthatassociateswordswithcategories,andrulesthatspecifyhowcategoriescanbecombinedintoothercategories.Together,thesecomponentsgiverisetoderivationssuchastheoneshowninFigure1.2.1LexiconTheCCGlexiconisafinitesetofword–categorypairswWDX.1Categoriesarebuiltfromafinitesetofatomiccategoriesandtwobinaryoperators:forwardslash(=)andbackwardslash(=).Atomiccategoriesrepresentthesyntactictypesofcompleteconstituents;theyincludeadistinguishedcategorySforcompletesentences.Aconstituentwiththecom-plexcategoryX=YrepresentsafunctionthatseeksaconstituentofcategoryYimmediatelytoitsrightandreturnsaconstituentofcategoryX;similarmente,X=YrepresentsafunctionthatseeksaYtoitsleft.Wetreatslashesasleft-associativeoperatorsandomitunnecessaryparentheses.Bythisconvention,everycategoryXcanbewrittenasXDAjmXm(cid:1)(cid:1)(cid:1)j1X1wherem(cid:21)0,AisanatomiccategorycalledthetargetofXandthejiXiareslash–categorypairscalledtheargumentsofX.Weviewtheseargumentsasbeingarrangedinastackwithj1X1atthetopandjmXmatthebottom.ThusanotherwayofwritingthecategoryXaboveisasXDA˛,where˛isa(possiblyempty)stackofmarguments.ThenumbermiscalledthearityofX;wedenoteitbyar.X/.2.2RulesTherulesofCCGaredirectedversionsof(general-ized)functionalcomposition.Therearetwoforms,forwardrulesandbackwardrules:X=YYjdYd(cid:1)(cid:1)(cid:1)j1Y1)XjdYd(cid:1)(cid:1)(cid:1)j1Y1.>d/YjdYd(cid:1)(cid:1)(cid:1)j1Y1X=Y)XjdYd(cid:1)(cid:1)(cid:1)j1Y1.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
407
w….XiffwWDXX=YYˇXˇt1t2iffX=YYˇ)XˇYˇX=YXˇt2t1iffYˇX=Y)XˇFigure2:Recursivedefinitionofderivationtrees.Nodeslabeledwithprimaryinputcategoriesareshaded.2.3DerivationTreesThesetofderivationtreesofaCCGcanbeformallydefinedasinFigure2.ThereandintheremainderofthisarticleweuseˇandothersymbolsfromthebeginningoftheGreekalphabettodenotea(pos-siblyempty)stackofarguments.Derivationtreesconsistofunarybranchingsandbinarybranchings:unarybranchings(drawnwithdottedlines)corre-spondtolexiconentries;binarybranchingscorre-spondto(validinstancesof)compositionrules.Theyieldofaderivationtreeistheleft-to-rightsequenceofitsleaves.Thetypeofaderivationtreeisthecategoryatitsroot.3CKY-StyleParsingAlgorithmAsthepointofdepartureforourownwork,wenowintroduceastraightforward,CKY-styleparsingalgo-rithmforCCGs.ItisasimplegeneralizationofthealgorithmpresentedbyShieberetal.(1995),whichisrestrictedtogrammarswithrulesofdegree0or1.Asinthatarticle,wespecifyouralgorithmintermsofagrammaticaldeductionsystem.3.1DeductionSystemWearegivenaCCGandastringwDw1(cid:1)(cid:1)(cid:1)wntobeparsed,whereeachwiisalexicaltoken.Asageneralnotation,forintegersi;con0(cid:20)i(cid:20)j(cid:20)nwewritewŒi;j(cid:141)todenotethesubstringwiC1(cid:1)(cid:1)(cid:1)wjofw.Asusual,wetakewŒi;i(cid:141)tobetheemptystring.ItemsTheCKY-stylealgorithmusesalogicwithitemsoftheformŒX;i;j(cid:141)whereXisacategoryandi;jarefencepostpositionsinw.TheintendedinterpretationofsuchanitemistoassertthatwecanbuildaderivationtreewithyieldwŒi;j(cid:141)andtypeX.ThegoalofthealgorithmistheconstructionoftheitemŒS;0;norte(cid:141),whichassertstheexistenceofaderivationtreefortheentireinputstring.(RecallthatSisthedistinguishedcategoryforsentences.)AxiomsandInferenceRulesThestepsoftheal-gorithmarespecifiedbymeansofinferencerulesoveritems.Theserulesimplementtherecursivedefi-nitionofderivationtreesgiveninFigure2.Thecon-structionstartswithaxiomsoftheformŒX;i;iC1(cid:141)wherewiC1WDXisalexiconentry;theseitemsasserttheexistenceofaunary-branchingderivationtreeoftheformshownintheleftofFigure2foreachlexicaltokenwiC1.Thereisoneinferenceruleforeveryforwardrule(applicationorcomposition):ŒX=Y;i;j(cid:141)ŒYˇ;j;k(cid:141)ŒXˇ;i;k(cid:141)X=YYˇ)Xˇ(1)Asymmetricalruleisusedforbackwardapplicationandcomposition.However,hereandintheremainderofthearticleweonlyspecifytheforwardversionofeachruleandleavethebackwardversionimplicit.3.2CorrectnessandRuntimeThesoundnessandcompletenessoftheCKY-stylealgorithmcanbeprovedbyinductiononthenumberofinferencesandthenumberofnodesinaderivationtree,respectively.Itisnothardtoseethat,inthegeneralcase,thealgorithmusesanamountoftimeandspaceexponen-tialwithrespecttothelengthoftheinputstring,n.Thisisbecauserule(1)maybeusedtogrowthearityofprimaryinputcategoriesuptosomelinearfunctionofn,resultinginexponentiallymanycate-gories.2Notethatthisisonlypossibleifthereareruleswithdegree2ormore.Forgrammarsrestrictedtoruleswithdegree0or1,suchasthoseconsideredbyShieberetal.(1995),theruntimeofthealgorithmiscubicinn.Thisrestrictedclassofgrammarsonlyholdscontext-freegenerativepower,whilethepowerofgeneralCCGisbeyondthatofcontext-freegram-mars(Vijay-ShankerandWeir,1994).2Categorieswhosearityisnotboundedbyalinearfunc-tionofnarenotuseful,inthesensethattheycannotoccurincompletederivations.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
408
wŒi0;j0(cid:141)t0wŒi;i0(cid:141)wŒj0;j(cid:141)X=YcXˇ(a)treetwŒi00;i0(cid:141)wŒj0;j00(cid:141)Xj1Yc1wŒi;i00(cid:141)wŒj00;j(cid:141)Xˇj2Zc2Xˇ(cid:13)(b)contextcFigure3:Decompositionofderivations.4APolynomial-TimeAlgorithmWenowintroduceourpolynomial-timealgorithm.ThisalgorithmusesthesameaxiomsandthesamegoalitemastheCKY-stylealgorithm,butfeaturesnewitemsandnewinferencerules.4.1NewItemsInafirststep,inordertoavoidruntimeexponentialinnwerestrictouritemsettocategorieswhosearityisboundedbysomegrammarconstantcG:ŒX;i;j(cid:141)wherear.X/(cid:20)cGTheexactchoiceoftheconstantwillbediscussedinSection5.Withtherestricteditemset,thenewalgorithmbehavesliketheoldoneaslongasthearityofcategoriesdoesnotexceedcG.However,regla(1)aloneisnolongercomplete:Derivationswithcate-gorieswhosearityexceedscGcannotbesimulatedanymore.Toremedythisdeficiency,weintroduceanewtypeofitemtoimplementaspecificdecomposi-tionoflongderivationsintosmallerpieces.ConsideraderivationtoftheformshowninFig-ure3(a).NotethattheyieldoftiswŒi;j(cid:141).Thederivationconsistsoftwoparts,namedt0andc;theseshareacommonnodewithacategoryoftheformX=Y.NowassumethatchasthespecialpropertythatnoneofthecombinatoryrulesthatitusespopstheargumentstackofthecategoryX.Thismeansthatc,afterpoppingtheargument=Y,maypushnewargumentsandpopthemagain,butmaynever“touch”X.Wecallafragmentwiththisspecialprop-ertyaderivationcontext.(AformaldefinitionwillbegiveninSection5.2.)Thespecialpropertyofcisusefulbecauseitim-pliesthatccanbecarriedoutforanychoiceofX.Tobemorespecific,letuswriteˇforthe(possiblyempty)sequenceofargumentsthatcpushestotheargumentstackofXinplaceof=Y.Weshallreferto=Yasthebridgingargumentandtothesequenceˇastheexcessofc.Supposenowthatwereplacet0byaderivationtreewiththesameyieldbutwithatypeX0=YwhereX0¤X.ThenbecausecdoesnottouchX0weobtainanothervalidderivationtreewiththesameyieldast;thetypeofthistreewillbeX0ˇ.Forthecombinationwithc,theinternalstructureoft0isofnoimportance;theonlyimportantinforma-tionistheextentoftheyieldoft0andtheidentityofthebridgingargument=Y.Intermsofourdeductionsystem,thiscanbeexpressedasfollows:Thederiva-tioncontextccanbecombinedwithanytreet0thatisassociatedwithanitemoftheformŒX=Y;i0;j0(cid:141),whereXisanycategory.Similarly,theinternalstruc-tureofcisofnoimportanceeither,aslongastheargumentstackofthecategoryXremainsuntouched.Itsufficestorecordthefollowing:(cid:15)theextentoftheyieldoft,specifiedintermsofthepositionsiandj;(cid:15)theextentoftheyieldoft0,specifiedintermsofthepositionsi0andj0;(cid:15)thebridgingargument=Y;y(cid:15)theexcessˇ.WerepresentthesepiecesofinformationinanewtypeofitemoftheformŒ=Y;ˇ;i;i0;j0;j(cid:141).Thein-tendedinterpretationoftheseitemsistoassertthat,foranychoiceofX,ifwecanbuildaderivationtreet0withyieldwŒi0;j0(cid:141)andtypeX=Y,thenwecanalsobuildaderivationtreet0withyieldwŒi;j(cid:141)andtypeXˇ.WealsouseitemsŒ=Y;ˇ;i;i0;j0;j(cid:141)withabackwardslash,withasimilarmeaning.Likeitemsthatrepresentderivationtrees,ouritemsforderivationcontextsarearity-restricted:ŒjY;ˇ;i;i0;j0;j(cid:141)wherear.Yˇ/(cid:20)cGAswewillseeinSection5,theserestricteditemssufficetosimulateallderivationsofaCCG.Further-more,thiscanbedoneintimepolynomialinn,be-causeourencodingallowssharingofthesameitemsamongseveralderivations.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
409
ŒA;0;1(cid:141)ŒS=H=A=F;2;5(cid:141)ŒB;1;2(cid:141)ŒC=A=F;2;3(cid:141)ŒS=E;3;4(cid:141)ŒE=H=C;4;5(cid:141)ŒS=H=C;3;5(cid:141)(1)ŒS=H=A=F;2;5(cid:141)(1)ŒF=G=B;5;6(cid:141)Œ=F;=G=B;2;2;5;6(cid:141)(2)(cid:142)Œ=F;=G;1;2;5;6(cid:141)(4)ŒG;6;7(cid:141)Œ=F;»;1;2;5;7(cid:141)(4)ŒS=H=A;1;7(cid:141)(3)(cid:143)ŒS=H;0;7(cid:141)(1)ŒH;7;8(cid:141)ŒS;0;8(cid:141)(1)Figure4:AsamplederivationofthegrammaticaldeductionsystemofSection4.Inference(cid:142)triggersanewcontextitemfromatreeitem;inferencia(cid:143)reusesthetreeitem(asindicatedbythearrow),recombiningitwiththe(modified)contextitem.4.2NewInferenceRulesInourparsingalgorithm,contextitemsareintroducedwheneverthecompositionoftwocategorieswhosearitiesareboundedbycGwouldresultinacategorywhosearityexceedsthisbound:ŒX=Y;i;j(cid:141)ŒYˇ;j;k(cid:141)Œ=Y;ˇ;i;i;j;k(cid:141)8<:X=YYˇ)Xˇar.Xˇ/>cG(2)Thenewrulehasthesameantecedentsasrule(1),butratherthanextendingthederivationassertedbythefirstantecedentŒX=Y;i;j(cid:141),whichisnotpossiblebe-causeofthearitybound,ittriggersanewderivationcontext,assertedbytheitemŒ=Y;ˇ;i;i;j;k(cid:141).Fur-therapplicationsandcompositionswillextendthenewcontext,andonlywhentheexcessofthiscontexthasbecomesufficientlysmallwillitberecombinedwiththederivationthatoriginallytriggeredit.Thisisdonebythefollowingrule:ŒXjY;i0;j0(cid:141)ŒjY;ˇ;i;i0;j0;j(cid:141)ŒXˇ;i;j(cid:141)(3)Notethatthisrule(likeallrulesinthedeductionsystem)isonlydefinedonvaliditems;inparticularitonlyfiresifthearityofthecategoryXˇisboundedbycG.Theremainingrulesofthealgorithmparallelthethreerulesthatwehaveintroducedsofarbuttakeitemsthatrepresentderivationcontextsratherthanderivationtreesastheirfirstantecedents.Firstout,regla(4)extendsaderivationcontextinthesamewayasrule(1)extendsaderivationtree.ŒjY;ˇ=Z;i;i0;j0;j(cid:141)ŒZ(cid:13);j;k(cid:141)ŒjY;ˇ(cid:13);i;i0;j0;k(cid:141)8<:X=ZZ(cid:13))X(cid:13)ar.Yˇ(cid:13)/(cid:20)cG(4)Rule(5)istheobviouscorrespondentofrule(2):Ittriggersanewcontextwhentheantecedentcontextcannotbeextendedbecauseofthearitybound.ŒjY;ˇ=Z;i;i0;j0;j(cid:141)ŒZ(cid:13);j;k(cid:141)Œ=Z;(cid:13);i;i;j;k(cid:141)8<:X=ZZ(cid:13))X(cid:13)ar.Yˇ(cid:13)/>cG(5)Finalmente,andparalleltorule(3),weneedaruletorecombineacontextwiththecontextthatoriginallytriggeredit.Asitwillturnout,weonlyneedthisincaseswherethetriggeredcontexthasnoexcess.Œj1Y;ˇj2Z;i00;i0;j0;j00(cid:141)Œj2Z;»;i;i00;j00;j(cid:141)Œj1Y;ˇ;i;i0;j0;j(cid:141)(6)4.3SampleDerivationWenowillustrateouralgorithmonatoygrammar.Thegrammarhasthefollowinglexicon:w1WDAw5WDE=H=Cw2WDBw6WDF=G=Bw3WDC=A=Fw7WDGw4WDS=Ew8WDHThestartsymbolisS.Thegrammarallowsallin-stancesofapplicationandallinstancesofcomposi-tionwithdegreeboundedby2.WeletcGD3(asexplainedlaterinSection5.2).Aderivationofourdeductionsystemonthein-putstringw1(cid:1)(cid:1)(cid:1)w8isgiveninFigure4.Westartbyapplyingrule(1)twice(onceforward,onceback-ward)toobtaintheitemŒS=H=A=F;2;5(cid:141).Com-biningthisitemwiththeaxiomŒF=G=B;5;6(cid:141)isnotpossibleusingrule(1),asthiswouldresultin
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
410
acategorywitharity4,exceedingthearitybound.Wethereforeuserule(2)totriggerthecontextitemŒ=F;=G=B;2;2;5;6(cid:141)((cid:142)).Successively,weuserule(4)twicetoobtaintheitemŒ=F;»;1;2;5;7(cid:141).Atthispointweuserule(3)(withˇD»)torecombinethecontextitemwiththetreeitemthatoriginallytriggeredit((cid:143));thisyieldstheitemŒS=H=A;1;7(cid:141).Notethattherecombinationeffectivelyretrievestheportionofthestackthatwasbelowtheargument=Fwhenthecontextitemwastriggeredin(cid:142).Doubleap-plicationofrule(1)producesthegoalitemŒS;0;8(cid:141).4.4RuntimeAnalysisWenowturntoananalysisoftheruntimecomplexityofouralgorithm.Wefirstconsiderruntimecomplex-itywithrespecttothelengthoftheinputstring,n.Theruntimeisdominatedbythenumberofinstantia-tionsofrule(6)whichinvolvestwocontextitemsasantecedents.Byinspectionofthisrule,weseethatthenumberofpossibleinstantiationsisboundedbyn6.ThereforeweconcludethatthealgorithmrunsintimeO.n6/.Wenowconsiderruntimecomplexitywithrespecttothesizeoftheinputgrammar.Heretheruntimeisdominatedbythenumberofinstantiationsofrules(1)–(5).Forexample,regla(5)combinesitemsŒjY;ˇ=Z;i;i0;j0;j(cid:141)andŒZ(cid:13);j;k(cid:141):Byourrestrictionsonitems,boththearityofYˇ=ZandthearityofZ(cid:13)areupper-boundedbythecon-stantcG.NowrecallthateverycategoryXcanbewrittenasXDA˛forsomeatomiccategoryAandstackofarguments˛.LetAbethesetofatomiccategoriesintheinputgrammar,andletLbethesetofallargumentsoccurringinanycategoryofthelexicon.ByaresultofVijay-ShankerandWeir(1994,Lemma3.1),everyargumentthatmayoccurinaderivedcategoryoccursinL.Thenthenumberofpossibleinstantiationsofrule(5)aswellasrules(1)–(4),andhencetheruntimeofthealgorithm,isinO.jAjjLjcG(cid:1)jLjcG/DO.jAjjLj2cG/:NotethatbothAandLmaygrowwiththegrammarsize.AswewillseeinSection5.2,theconstantcGalsodependsonthegrammarsize.Thismeansthattheworst-caseruntimecomplexityofourparserisexponentialinthesizeoftheinputgrammar.WewillreturntothispointinSection7.5CorrectnessInthissectionweprovethecorrectnessofourparsingalgorithm.Inordertosimplifytheproofs,westartbysimplifyingouralgorithm,atthecostofmakingitlessefficient:(cid:15)Weremovetherulesforextendingtreesandcontexts,regla(1)andrule(4).(cid:15)Weconflatetherulesfortriggeringcontexts,regla(2)andrule(5),intothesingleruleŒYˇ;j;k(cid:141)Œ=Y;ˇ;i;i;j;k(cid:141)X=YYˇ)Xˇ(7)Thisruleblindlyguessestheextensionofthetriggeringtreeorcontext(specifiedbyposi-tionsiandj),ratherthanwaitingforacor-respondingitemtobederived.ThesimplifiedalgorithmisspecifiedinFigure5.WenowarguethatthisalgorithmandthealgorithmfromSection4parseexactlythesamederivationtrees,althoughtheyusedifferentparsingstrategies.First,weobservethatrule(1)intheoldalgorithmcanbesimulatedbyacombinationofotherrulesinthesimplifiedalgorithm,asfollows:ŒX=Y;i;j(cid:141)ŒYˇ;j;k(cid:141)Œ=Y;ˇ;i;i;j;k(cid:141)(7)ŒXˇ;i;k(cid:141)(3)Además,thesimplifiedalgorithmdoesnolongerneedrule(4),whoseroleisnowtakenoverbyrule(7).Toseethis,recallthatrule(4)extendsanexistingcontextwheneverthecompositionoftwocategoriesresultsinanewcategorywhosearitydoesnotexceedcG.Incontrast,regla(7)alwaystriggersanewcontextc,eveniftheresultofthecompositionofcwithsomeexistingcontextsatisfiestheabovearityrestriction.Despitethedifferenceintheadoptedstrategy,thesetworulesareequivalentintermsofstackcontent,leadingtothesamederivationtrees.5.1DefinitionsWeintroducesomeadditionalterminologyandnota-tionthatwewilluseintheproofs.Foraderivationtreetandanodeuoft,wewritetŒu(cid:141)todenotethecategoryatu,andwewritetjutodenotethesub-treeoftatu.Formally,tjuistherestrictionofttonodeuandallofitsdescendants.Eachsubtreeofaderivationtreeisanotherderivationtree.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
411
Itemsoftype1:ŒX;i;j(cid:141),ar.X/(cid:20)cGItemsoftype2:ŒjY;ˇ;i;i0;j0;j(cid:141),ar.Yˇ/(cid:20)cGAxioms:ŒX;i;iC1(cid:141),wiC1WDXGoal:ŒS;0;norte(cid:141)Inferencerules:ŒYˇ;j;k(cid:141)Œ=Y;ˇ;i;i;j;k(cid:141)X=YYˇ)Xˇ(7)ŒX=Y;i0;j0(cid:141)Œ=Y;ˇ;i;i0;j0;j(cid:141)ŒXˇ;i;j(cid:141)(3)Œj1Y;ˇj2Z;i00;i0;j0;j00(cid:141)Œj2Z;»;i;i00;j00;j(cid:141)Œj1Y;ˇ;i;i0;j0;j(cid:141)(6)Figura 5:Itemsandinferencerulesofthesimplifiedalgorithmforaninputstringw1(cid:1)(cid:1)(cid:1)wn.Definition1Lettbeaderivationtreewithrootr.ThenthassignatureŒX;i;j(cid:141)if1.theyieldoftiswŒi;j(cid:141)and2.thetypeoftisX,thatis,tŒr(cid:141)DX.Notethatwhileweusethesamenotationforsig-naturesasforitems,thesignatureofaderivationtreeisapurelystructuralconcept,whereasanitemisanobjectinthealgorithm.Acentralconceptinourproofisthenotionofspine.Recallthataderivationtreeconsistsofunarybranchingsandbinarybranchings.Ineachbinarybranching,werefertothetwochildrenofthebranch-ing’srootnodeastheprimarychildandthesec-ondarychild,dependingonwhichofthetwoisla-beledwiththeprimaryandsecondaryinputcategoryofthecorrespondingruleinstance.InFigure2,theprimarychildrenoftherootnodeareshaded.Definition2Foraderivationtreet,thespineoftistheuniquepaththatstartsattherootnodeoftandateachnodeucontinuestotheprimarychildofu.Thespineofaderivationtreealwaysendsatanodethatislabeledwithacategoryfromthelexicon.Definition3Lettbeaderivationtreewithrootr.Aderivationcontextcisobtainedbyremovingallproperdescendantsofsomenodef¤ronthespineoft,undertherestrictionthatar.tŒu(cid:141)/>ar.tŒr(cid:141)/foreverynodeuonthespineproperlybetweenfandr.Thenodefiscalledthefootnodeofc.Theyieldofcisthepairwhosefirstcomponentistheyieldofttotheleftoffandwhosesecondcomponentistheyieldofttotherightoff.Foraderivationcontextcandanodeuofc,wewritecŒu(cid:141)todenotethecategoryatu.Definition3formalizestheconceptofderivationcontextsthatweintroducedinSection4.1.First,becausefisonthespineandf¤r,thecate-gorycŒf(cid:141)takestheformX=Y.Thearityrestric-tionimpliesthatthecategoryofeverynodeuonthespineproperlybetweenfandrtakestheformXˇu,jˇuj>0,andthatthecategoryattheroottakestheformXˇ,jˇj(cid:21)0.ThusthecategoryXisneverexposedinc,exceptperhapsatr.AswewillseeinSection5.4,thisproperty,togetherwithacarefulselectionof“splitnodes”,willallowustodecomposederivationsintosmaller,shareableparts.Thebasicideaisthesameasinthetabulationofpushdownautomata(Lang,1974;NederhofandSatta,2004),wherethepushdowninourcaseistheargumentstackoftheprimaryinputcategoriesalongaspine.Theconceptsofsignatureandspinearegeneral-izedtoderivationcontextsasfollows:Definition4Letcbeaderivationcontextwithrootnoderandfootnodef.ThenchassignatureŒjY;ˇ;i;i0;j0;j(cid:141)if1.theyieldofcis.wŒi;i0(cid:141);wŒj0;j(cid:141)/;2.forsomeX,cŒf(cid:141)DXjYandcŒr(cid:141)DXˇ.Definition5Foraderivationcontextc,thespineofcisthepathfromitsrootnodetoitsfootnode.5.2GrammarConstantBeforewestartwiththeproofassuch,weturntothechoiceofthegrammarconstantcG,whichwasleftpendinginprevioussections.RecallthatweareusingcGasaboundonthearityofXintype1itemsŒX;i;j(cid:141).Sincetheseitemsareproducedbyouraxiomsfromthesetofcategoriesinthelexicon,cGmustnotbesmallerthanthemaximumarity`ofacategoryinthisfiniteset.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
412
WealsousecGasaboundonthearityofthecat-egoryYˇintype2itemsŒjY;ˇ;i;i0;j0;j(cid:141).Theseitemsareproducedbyinferencerule(7)tosimulateinstancesofcompositionoftheformX=YYˇ)Xˇ.Herethelengthofˇisboundedbythemaxi-mumdegreedofacompositionruleinthegrammar,andar.Y/isboundedbythemaximumarityaofanargumentfromthe(finite)setLofargumentsinthelexicon(recallSection4.4).ThereforecGcannotbesmallerthanaCd.Puttingeverythingtogether,weobtaintheconditioncG(cid:21)maxf`;aCdg:(8)Thenextlemmawillbeusedinseveralplaceslater.Lemma1Letcbeaderivationcontextwithsigna-tureŒjY;ˇ;i;i0;j0;j(cid:141).Thenar.Yˇ/(cid:20)cG.Proof.Letrandfbetherootandthefootnode,respectivamente,ofc.Fromthedefinitionofsignature,theremustbesomeXsuchthatcŒr(cid:141)DXˇandcŒf(cid:141)DXjY.Nowletpbetheparentnodeoff,andassumethattheruleusedatpisinstantiatedasX=YYˇ0)Xˇ0,sothatcŒp(cid:141)DXˇ0.IfpDrthenˇ0Dˇ;de lo contrario,becauseofthearityrestric-tioninthedefinitionofderivationcontexts(Defini-tion3)wehavejˇ0j>jˇj.Thenar.Yˇ/(cid:20)ar.Yˇ0/(cid:20)cG;wheretherightinequalityfollowsfromtheassump-tionthatX=YYˇ0)Xˇ0isaruleinstanceofthegrammar,andfrominequality(8).5.3SoundnessWestartthecorrectnessproofbyarguingforthesoundnessofthedeductionsysteminFigure5.Morespecifically,weshowthatforeveryitemoftype1thereexistsaderivationtreewiththesamesigna-ture,andthatforeveryitemoftype2thereexistsaderivationcontextwiththesamesignature.Thesoundnessoftheaxiomsisobvious.Rule(7)statesthat,ifwehavebuiltaderivationtreetwithsignatureŒYˇ;j;k(cid:141)thenwecanbuildaderivationcontextcwithsignatureŒ=Y;ˇ;i;i;j;k(cid:141).UndertheconditionthatthegrammaradmitstheruleinstanceX=YYˇ)Xˇ,thisinferenceissound;thecontextcanbebuiltasshowninFigure6.wŒj;k(cid:141)YˇX=YXˇtFigure6:Soundnessofrule(7).Regla(3)statesthat,ifwehavebuiltaderivationtreet0withsignatureŒX=Y;i0;j0(cid:141)andacontextcwithsignatureŒ=Y;ˇ;i;i0;j0;j(cid:141),thenwecanbuildanewtreetwithsignatureŒXˇ;i;j(cid:141).Weobtaintbysubstitutingt0forthefootnodeofc(seeFigure3(a)).Regla(6)statesthat,ifwehavebuiltaderivationcontextc1withsignatureŒj1Y;ˇj2Z;i00;i0;j0;j00(cid:141)andanothercontextc2withsignatureŒj2Z;»;i;i00;j00;j(cid:141),thenwecanbuildaderivationcontextcwithsignatureŒj1Y;ˇ;i;i0;j0;j(cid:141).Weobtaincbysubstitutingc1forthefootnodeofc2;thisisillustratedbyFigure3(b)ifweassume(cid:13)D».5.4CompletenessInthefinalpartofourcorrectnessproofwenowprovethecompletenessofthedeductionsysteminFigure5.Specificallyweshowthefollowingstrongerstatement:ForeveryderivationtreeandforeveryderivationcontextwithasignatureIsatisfyingthearityboundsforitemsofFigure5,thedeductionsysteminfersthecorrespondingitemI.Fromthisstatementwecanimmediatelyconcludethatthesys-temconstructsthegoalitemwheneverthereexistsaderivationtreewhoseyieldisthecompleteinputstringandwhosetypeisthedistinguishedcategoryS.Ourproofisbyinductiononameasurethatwecallrank.Therankofaderivationtreeorcontextisthenumberofitsnon-leafnodes.Notethatthisdefinitionimpliesthatthefootnodeofacontextisnotcountedagainstitsrank.Therankofatreeorcontextisalwaysatleast1,withrank1onlyrealizedforderivationtreesconsistingofasinglenode.5.4.1BaseCaseConsideraderivationtreetwithsignatureŒX;i;j(cid:141)andrank.t/D1.ThetreettakestheformshownintheleftofFigure2,andwehavejDiC1andwjWDX.TheitemŒX;i;j(cid:141)isthenproducedbyoneoftheaxiomsofourdeductionsystem.
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
413
5.4.2InductiveCaseThegeneralideaunderlyingtheinductivecasecanbestatedasfollows.Weconsideraderivationtreeorcontext’withsignatureIsatisfyingtheboundsstatedinFigure5foritemsoftype1or2.Wethenidentifyaspecialnodesin’’sspine,whichwecallthesplitnode.Weusesto“split”’intotwopartsthatareeitherderivationtreesorcontexts,thatbothsatisfytheboundsforitemsoftype1or2,andthatbothhaveranksmallerthantherankof’.Wethenapplytheinductionhypothesistoobtaintwoitemsthatcanbecombinedbyoneoftheinferencerulesofouralgorithm,resultinginthedesireditemIfor’.Wefirstconsiderthecaseinwhich’isatree,andlaterthecaseinwhich’isacontext.5.4.3SplittingTreesConsideraderivationtreetwithsignatureŒX0;i;j(cid:141),rootnoder,andrank.t/>1.Thenthespineoftconsistsofatleast2nodes.Nowassumethatar.X0/(cid:20)cG,thatis,ŒX0;i;j(cid:141)isavaliditem.Choosethesplitnodestobethehighest(closesttotheroot)non-rootnodeonthespineforwhichar.tŒs(cid:141)/(cid:20)cG.Nodesalwaysexists,asthearityconstraintissatisfiedatleastforthelowest(farthestfromtheroot)nodeonthespine,whichislabeledwithacategoryfromthelexicon.Considerthesubtreet0Dtjs;thussistherootnodeoft0.Becausesisaprimarynodeint,thecategoryatshasatleastoneargument.WedealherewiththecasewherethiscategorytakestheformtŒs(cid:141)DX=Y;thecasetŒs(cid:141)DX=Yissym-metrical.Thusthesignatureoft0takestheformŒX=Y;i0;j0(cid:141)wherei0;j0areintegerswithi(cid:20)i0
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
yo
a
C
_
a
_
0
0
1
9
2
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
414
nodeuinthespine,andatthesametime,nocombi-natoryrulecanreducethearityofitsprimaryinputcategorybymorethanoneunit.Considerthecontextc1thatisobtainedbyrestrict-ingctonodesandallofitsdescendants;thussistherootnodeofc1andfisthefootnode.Toseethatc1iswell-defined,notethatourchoiceofsguaranteesthatar.cŒu(cid:141)/>ar.cŒs(cid:141)/foreverynodeuthatisproperlybetweenfands.Toseethis,supposethattherewasanodeu¤ssuchthatar.cŒu(cid:141)/(cid:20)ar.cŒs(cid:141)/.Sincear.cŒs(cid:141)/Dar.cŒr(cid:141)/C1andar.cŒu(cid:141)/¤ar.cŒs(cid:141)/,byourdefinitionofs,wewouldhavear.cŒu(cid:141)/(cid:20)ar.cŒr(cid:141)/,whichcannotbebecauseinc,everynodeuproperlybetweenfandshasarityar.cŒu(cid:141)/>ar.cŒr(cid:141)/.Becausefisaprimarynodeinc,thecategoryatfhasatleastoneargument;callitj1Y.Thenodesisaprimarynodeincaswell,sotheexcessofc1takestheformˇj2Z,wherej2Zisthetopmostargumentofthecategoryats.Thusthesignatureofc1takestheformŒj1Y;ˇj2Z;i00;i0;j0;j00(cid:141)wherei00;j00areintegerswithi(cid:20)i00(cid:20)i0andj0(cid:20)j00(cid:20)j.ApplyingLemma1toc1,wegetar.Yˇj2Z/(cid:20)cG,andthereforeŒj1Y;ˇj2Z;i00;i0;j0;j00(cid:141)isavaliditem.Finally,wenotethatrank.c1/