计算语言学协会会刊, 2 (2014) 405–418. 动作编辑器: Mark Steedman.
Submitted 4/2014; 修改 8/2014; 已发表 10/2014. C(西德:13)2014 计算语言学协会.
405
ANewParsingAlgorithmforCombinatoryCategorialGrammarMarcoKuhlmannDepartmentofComputerandInformationScienceLinköpingUniversity,Swedenmarco.kuhlmann@liu.seGiorgioSattaDepartmentofInformationEngineeringUniversityofPadua,Italysatta@dei.unipd.itAbstractWepresentapolynomial-timeparsingalgo-rithmforCCG,basedonanewdecompositionofderivationsintosmall,shareableparts.Ouralgorithmhasthesameasymptoticcomplex-ity,O.n6/,asapreviousalgorithmbyVijay-ShankerandWeir(1993),butiseasiertoun-derstand,implement,andprovecorrect.1IntroductionCombinatoryCategorialGrammar(CCG;SteedmanandBaldridge(2011))isalexicalizedgrammarfor-malismthatbelongstotheclassofso-calledmildlycontext-sensitiveformalisms,ascharacterizedbyJoshi(1985).CCGhasbeensuccessfullyusedforawiderangeofpracticaltasksincludingdata-drivenparsing(ClarkandCurran,2007),wide-coveragese-manticconstruction(Bosetal.,2004;Kwiatkowskietal.,2010;LewisandSteedman,2013)andmachinetranslation(Weeseetal.,2012).SeveralparsingalgorithmsforCCGhavebeenpresentedintheliterature.Earlierproposalsshowrunningtimeexponentialinthelengthoftheinputstring(PareschiandSteedman,1987;Tomita,1988).AbreakthroughcamewiththeworkofVijay-ShankerandWeir(1990)andVijay-ShankerandWeir(1993)whoreportthefirstpolynomial-timealgorithmforCCGparsing.Untilthisday,thisalgorithm,whichweshallrefertoastheV&Walgorithm,remainstheonlypublishedpolynomial-timeparsingalgorithmforCCG.However,wearenotawareofanypracticalparserforCCGthatactuallyusesit.Wespeculatethatthishastwomainreasons:第一的,someauthorshavearguedthatlinguisticresourcesavailableforCCGcanbecoveredwithcontext-freefragmentsoftheformalism(FowlerandPenn,2010),forwhichmoreefficientparsingalgorithmscanbegiven.Sec-ond,theV&Walgorithmisconsiderablymorecom-plexthanparsingalgorithmsforequivalentmildlycontext-sensitiveformalisms,suchasTree-Adjoin-ingGrammar(JoshiandSchabes,1997),andisquitehardtounderstand,implement,andprovecorrect.TheV&Walgorithmisbasedonaspecialdecom-positionofCCGderivationsintosmallerpartsthatcanthenbesharedamongdifferentderivations.Thissharingisthekeytothepolynomialruntime.Inthisarticlewebuildonthesameidea,butdevelopanalternativepolynomial-timealgorithmforCCGparsing.ThenewalgorithmisbasedonadifferentdecompositionofCCGderivations,andisarguablysimplerthantheV&Walgorithminatleasttwore-spects:第一的,thenewalgorithmusesonlythreebasicsteps,againsttheninebasicstepsoftheV&Wparser.Second,thecorrectnessproofofthenewalgorithmissimplerthantheonereportedbyVijay-ShankerandWeir(1993).ThenewalgorithmrunsintimeO.n6/wherenisthelengthoftheinputstring,thesameastheV&Wparser.Weorganizeourpresentationasfollows.InSec-tion2weintroduceCCGandthecentralnotionofderivationtrees.InSection3westartwithasimplebutexponential-timeparserforCCG,fromwhichwederiveourpolynomial-timeparserinSection4.Section5furthersimplifiesthealgorithmandprovesitscorrectness.WethenprovideadiscussionofouralgorithmandpossibleextensionsinSection6.Sec-tion7concludesthearticle.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
406
2CombinatoryCategorialGrammarWeassumebasicfamiliaritywithCCGingeneralandtheformalismofWeirandJoshi(1988)inparticular.Inthissectionwesetupourterminologyandnotation.ACCGhastwomainparts:alexiconthatassociateswordswithcategories,andrulesthatspecifyhowcategoriescanbecombinedintoothercategories.Together,thesecomponentsgiverisetoderivationssuchastheoneshowninFigure1.2.1LexiconTheCCGlexiconisafinitesetofword–categorypairswWDX.1Categoriesarebuiltfromafinitesetofatomiccategoriesandtwobinaryoperators:forwardslash(=)andbackwardslash(=).Atomiccategoriesrepresentthesyntactictypesofcompleteconstituents;theyincludeadistinguishedcategorySforcompletesentences.Aconstituentwiththecom-plexcategoryX=YrepresentsafunctionthatseeksaconstituentofcategoryYimmediatelytoitsrightandreturnsaconstituentofcategoryX;similarly,X=YrepresentsafunctionthatseeksaYtoitsleft.Wetreatslashesasleft-associativeoperatorsandomitunnecessaryparentheses.Bythisconvention,everycategoryXcanbewrittenasXDAjmXm(西德:1)(西德:1)(西德:1)j1X1wherem(西德:21)0,AisanatomiccategorycalledthetargetofXandthejiXiareslash–categorypairscalledtheargumentsofX.Weviewtheseargumentsasbeingarrangedinastackwithj1X1atthetopandjmXmatthebottom.ThusanotherwayofwritingthecategoryXaboveisasXDA˛,where˛isa(possiblyempty)stackofmarguments.ThenumbermiscalledthearityofX;wedenoteitbyar.X/.2.2RulesTherulesofCCGaredirectedversionsof(general-ized)functionalcomposition.Therearetwoforms,forwardrulesandbackwardrules:X=YYjdYd(西德:1)(西德:1)(西德:1)j1Y1)XjdYd(西德:1)(西德:1)(西德:1)j1Y1.>d/YjdYd(西德:1)(西德:1)(西德:1)j1Y1X=Y)XjdYd(西德:1)(西德:1)(西德:1)j1Y1.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
407
w….XiffwWDXX=YYˇXˇt1t2iffX=YYˇ)XˇYˇX=YXˇt2t1iffYˇX=Y)XˇFigure2:Recursivedefinitionofderivationtrees.Nodeslabeledwithprimaryinputcategoriesareshaded.2.3DerivationTreesThesetofderivationtreesofaCCGcanbeformallydefinedasinFigure2.ThereandintheremainderofthisarticleweuseˇandothersymbolsfromthebeginningoftheGreekalphabettodenotea(pos-siblyempty)stackofarguments.Derivationtreesconsistofunarybranchingsandbinarybranchings:unarybranchings(drawnwithdottedlines)corre-spondtolexiconentries;binarybranchingscorre-spondto(validinstancesof)compositionrules.Theyieldofaderivationtreeistheleft-to-rightsequenceofitsleaves.Thetypeofaderivationtreeisthecategoryatitsroot.3CKY-StyleParsingAlgorithmAsthepointofdepartureforourownwork,wenowintroduceastraightforward,CKY-styleparsingalgo-rithmforCCGs.ItisasimplegeneralizationofthealgorithmpresentedbyShieberetal.(1995),whichisrestrictedtogrammarswithrulesofdegree0or1.Asinthatarticle,wespecifyouralgorithmintermsofagrammaticaldeductionsystem.3.1DeductionSystemWearegivenaCCGandastringwDw1(西德:1)(西德:1)(西德:1)wntobeparsed,whereeachwiisalexicaltoken.Asageneralnotation,forintegersi;jwith0(西德:20)我(西德:20)j(西德:20)nwewritewŒi;j(西德:141)todenotethesubstringwiC1(西德:1)(西德:1)(西德:1)wjofw.Asusual,wetakewŒi;我(西德:141)tobetheemptystring.ItemsTheCKY-stylealgorithmusesalogicwithitemsoftheformŒX;我;j(西德:141)whereXisacategoryandi;jarefencepostpositionsinw.TheintendedinterpretationofsuchanitemistoassertthatwecanbuildaderivationtreewithyieldwŒi;j(西德:141)andtypeX.ThegoalofthealgorithmistheconstructionoftheitemŒS;0;n(西德:141),whichassertstheexistenceofaderivationtreefortheentireinputstring.(RecallthatSisthedistinguishedcategoryforsentences.)AxiomsandInferenceRulesThestepsoftheal-gorithmarespecifiedbymeansofinferencerulesoveritems.Theserulesimplementtherecursivedefi-nitionofderivationtreesgiveninFigure2.Thecon-structionstartswithaxiomsoftheformŒX;我;iC1(西德:141)wherewiC1WDXisalexiconentry;theseitemsasserttheexistenceofaunary-branchingderivationtreeoftheformshownintheleftofFigure2foreachlexicaltokenwiC1.Thereisoneinferenceruleforeveryforwardrule(applicationorcomposition):ŒX=Y;我;j(西德:141)ŒYˇ;j;k(西德:141)ŒXˇ;我;k(西德:141)X=YYˇ)Xˇ(1)Asymmetricalruleisusedforbackwardapplicationandcomposition.However,hereandintheremainderofthearticleweonlyspecifytheforwardversionofeachruleandleavethebackwardversionimplicit.3.2CorrectnessandRuntimeThesoundnessandcompletenessoftheCKY-stylealgorithmcanbeprovedbyinductiononthenumberofinferencesandthenumberofnodesinaderivationtree,respectively.Itisnothardtoseethat,inthegeneralcase,thealgorithmusesanamountoftimeandspaceexponen-tialwithrespecttothelengthoftheinputstring,n.Thisisbecauserule(1)maybeusedtogrowthearityofprimaryinputcategoriesuptosomelinearfunctionofn,resultinginexponentiallymanycate-gories.2Notethatthisisonlypossibleifthereareruleswithdegree2ormore.Forgrammarsrestrictedtoruleswithdegree0or1,suchasthoseconsideredbyShieberetal.(1995),theruntimeofthealgorithmiscubicinn.Thisrestrictedclassofgrammarsonlyholdscontext-freegenerativepower,whilethepowerofgeneralCCGisbeyondthatofcontext-freegram-mars(Vijay-ShankerandWeir,1994).2Categorieswhosearityisnotboundedbyalinearfunc-tionofnarenotuseful,inthesensethattheycannotoccurincompletederivations.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
408
wŒi0;j0(西德:141)t0wŒi;i0(西德:141)wŒj0;j(西德:141)X=YcXˇ(A)treetwŒi00;i0(西德:141)wŒj0;j00(西德:141)Xj1Yc1wŒi;i00(西德:141)wŒj00;j(西德:141)Xˇj2Zc2Xˇ(西德:13)(乙)contextcFigure3:Decompositionofderivations.4APolynomial-TimeAlgorithmWenowintroduceourpolynomial-timealgorithm.ThisalgorithmusesthesameaxiomsandthesamegoalitemastheCKY-stylealgorithm,butfeaturesnewitemsandnewinferencerules.4.1NewItemsInafirststep,inordertoavoidruntimeexponentialinnwerestrictouritemsettocategorieswhosearityisboundedbysomegrammarconstantcG:ŒX;我;j(西德:141)wherear.X/(西德:20)cGTheexactchoiceoftheconstantwillbediscussedinSection5.Withtherestricteditemset,thenewalgorithmbehavesliketheoldoneaslongasthearityofcategoriesdoesnotexceedcG.However,规则(1)aloneisnolongercomplete:Derivationswithcate-gorieswhosearityexceedscGcannotbesimulatedanymore.Toremedythisdeficiency,weintroduceanewtypeofitemtoimplementaspecificdecomposi-tionoflongderivationsintosmallerpieces.ConsideraderivationtoftheformshowninFig-ure3(A).NotethattheyieldoftiswŒi;j(西德:141).Thederivationconsistsoftwoparts,namedt0andc;theseshareacommonnodewithacategoryoftheformX=Y.NowassumethatchasthespecialpropertythatnoneofthecombinatoryrulesthatitusespopstheargumentstackofthecategoryX.Thismeansthatc,afterpoppingtheargument=Y,maypushnewargumentsandpopthemagain,butmaynever“touch”X.Wecallafragmentwiththisspecialprop-ertyaderivationcontext.(AformaldefinitionwillbegiveninSection5.2.)Thespecialpropertyofcisusefulbecauseitim-pliesthatccanbecarriedoutforanychoiceofX.Tobemorespecific,letuswriteˇforthe(possiblyempty)sequenceofargumentsthatcpushestotheargumentstackofXinplaceof=Y.Weshallreferto=Yasthebridgingargumentandtothesequenceˇastheexcessofc.Supposenowthatwereplacet0byaderivationtreewiththesameyieldbutwithatypeX0=YwhereX0¤X.ThenbecausecdoesnottouchX0weobtainanothervalidderivationtreewiththesameyieldast;thetypeofthistreewillbeX0ˇ.Forthecombinationwithc,theinternalstructureoft0isofnoimportance;theonlyimportantinforma-tionistheextentoftheyieldoft0andtheidentityofthebridgingargument=Y.Intermsofourdeductionsystem,thiscanbeexpressedasfollows:Thederiva-tioncontextccanbecombinedwithanytreet0thatisassociatedwithanitemoftheformŒX=Y;i0;j0(西德:141),whereXisanycategory.Similarly,theinternalstruc-tureofcisofnoimportanceeither,aslongastheargumentstackofthecategoryXremainsuntouched.Itsufficestorecordthefollowing:(西德:15)theextentoftheyieldoft,specifiedintermsofthepositionsiandj;(西德:15)theextentoftheyieldoft0,specifiedintermsofthepositionsi0andj0;(西德:15)thebridgingargument=Y;和(西德:15)theexcessˇ.WerepresentthesepiecesofinformationinanewtypeofitemoftheformŒ=Y;ˇ;我;i0;j0;j(西德:141).Thein-tendedinterpretationoftheseitemsistoassertthat,foranychoiceofX,ifwecanbuildaderivationtreet0withyieldwŒi0;j0(西德:141)andtypeX=Y,thenwecanalsobuildaderivationtreet0withyieldwŒi;j(西德:141)andtypeXˇ.WealsouseitemsŒ=Y;ˇ;我;i0;j0;j(西德:141)withabackwardslash,withasimilarmeaning.Likeitemsthatrepresentderivationtrees,ouritemsforderivationcontextsarearity-restricted:ŒjY;ˇ;我;i0;j0;j(西德:141)wherear.Yˇ/(西德:20)cGAswewillseeinSection5,theserestricteditemssufficetosimulateallderivationsofaCCG.Further-more,thiscanbedoneintimepolynomialinn,be-causeourencodingallowssharingofthesameitemsamongseveralderivations.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
409
ŒA;0;1(西德:141)ŒS=H=A=F;2;5(西德:141)ŒB;1;2(西德:141)ŒC=A=F;2;3(西德:141)ŒS=E;3;4(西德:141)ŒE=H=C;4;5(西德:141)ŒS=H=C;3;5(西德:141)(1)ŒS=H=A=F;2;5(西德:141)(1)ŒF=G=B;5;6(西德:141)Œ=F;=G=B;2;2;5;6(西德:141)(2)(西德:142)Œ=F;=G;1;2;5;6(西德:141)(4)ŒG;6;7(西德:141)Œ=F;”;1;2;5;7(西德:141)(4)ŒS=H=A;1;7(西德:141)(3)(西德:143)ŒS=H;0;7(西德:141)(1)ŒH;7;8(西德:141)ŒS;0;8(西德:141)(1)Figure4:AsamplederivationofthegrammaticaldeductionsystemofSection4.Inference(西德:142)triggersanewcontextitemfromatreeitem;inference(西德:143)reusesthetreeitem(asindicatedbythearrow),recombiningitwiththe(modified)contextitem.4.2NewInferenceRulesInourparsingalgorithm,contextitemsareintroducedwheneverthecompositionoftwocategorieswhosearitiesareboundedbycGwouldresultinacategorywhosearityexceedsthisbound:ŒX=Y;我;j(西德:141)ŒYˇ;j;k(西德:141)Œ=Y;ˇ;我;我;j;k(西德:141)8<:X=YYˇ)Xˇar.Xˇ/>cG(2)Thenewrulehasthesameantecedentsasrule(1),butratherthanextendingthederivationassertedbythefirstantecedentŒX=Y;我;j(西德:141),whichisnotpossiblebe-causeofthearitybound,ittriggersanewderivationcontext,assertedbytheitemŒ=Y;ˇ;我;我;j;k(西德:141).Fur-therapplicationsandcompositionswillextendthenewcontext,andonlywhentheexcessofthiscontexthasbecomesufficientlysmallwillitberecombinedwiththederivationthatoriginallytriggeredit.Thisisdonebythefollowingrule:ŒXjY;i0;j0(西德:141)ŒjY;ˇ;我;i0;j0;j(西德:141)ŒXˇ;我;j(西德:141)(3)Notethatthisrule(likeallrulesinthedeductionsystem)isonlydefinedonvaliditems;inparticularitonlyfiresifthearityofthecategoryXˇisboundedbycG.Theremainingrulesofthealgorithmparallelthethreerulesthatwehaveintroducedsofarbuttakeitemsthatrepresentderivationcontextsratherthanderivationtreesastheirfirstantecedents.Firstout,规则(4)extendsaderivationcontextinthesamewayasrule(1)extendsaderivationtree.ŒjY;ˇ=Z;我;i0;j0;j(西德:141)ŒZ(西德:13);j;k(西德:141)ŒjY;ˇ(西德:13);我;i0;j0;k(西德:141)8<:X=ZZ(cid:13))X(cid:13)ar.Yˇ(cid:13)/(cid:20)cG(4)Rule(5)istheobviouscorrespondentofrule(2):Ittriggersanewcontextwhentheantecedentcontextcannotbeextendedbecauseofthearitybound.ŒjY;ˇ=Z;i;i0;j0;j(cid:141)ŒZ(cid:13);j;k(cid:141)Œ=Z;(cid:13);i;i;j;k(cid:141)8<:X=ZZ(cid:13))X(cid:13)ar.Yˇ(cid:13)/>cG(5)最后,andparalleltorule(3),weneedaruletorecombineacontextwiththecontextthatoriginallytriggeredit.Asitwillturnout,weonlyneedthisincaseswherethetriggeredcontexthasnoexcess.Œj1Y;ˇj2Z;i00;i0;j0;j00(西德:141)Œj2Z;”;我;i00;j00;j(西德:141)Œj1Y;ˇ;我;i0;j0;j(西德:141)(6)4.3SampleDerivationWenowillustrateouralgorithmonatoygrammar.Thegrammarhasthefollowinglexicon:w1WDAw5WDE=H=Cw2WDBw6WDF=G=Bw3WDC=A=Fw7WDGw4WDS=Ew8WDHThestartsymbolisS.Thegrammarallowsallin-stancesofapplicationandallinstancesofcomposi-tionwithdegreeboundedby2.WeletcGD3(asexplainedlaterinSection5.2).Aderivationofourdeductionsystemonthein-putstringw1(西德:1)(西德:1)(西德:1)w8isgiveninFigure4.Westartbyapplyingrule(1)twice(onceforward,onceback-ward)toobtaintheitemŒS=H=A=F;2;5(西德:141).Com-biningthisitemwiththeaxiomŒF=G=B;5;6(西德:141)isnotpossibleusingrule(1),asthiswouldresultin
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
410
acategorywitharity4,exceedingthearitybound.Wethereforeuserule(2)totriggerthecontextitemŒ=F;=G=B;2;2;5;6(西德:141)((西德:142)).Successively,weuserule(4)twicetoobtaintheitemŒ=F;”;1;2;5;7(西德:141).Atthispointweuserule(3)(withˇD”)torecombinethecontextitemwiththetreeitemthatoriginallytriggeredit((西德:143));thisyieldstheitemŒS=H=A;1;7(西德:141).Notethattherecombinationeffectivelyretrievestheportionofthestackthatwasbelowtheargument=Fwhenthecontextitemwastriggeredin(西德:142).Doubleap-plicationofrule(1)producesthegoalitemŒS;0;8(西德:141).4.4RuntimeAnalysisWenowturntoananalysisoftheruntimecomplexityofouralgorithm.Wefirstconsiderruntimecomplex-itywithrespecttothelengthoftheinputstring,n.Theruntimeisdominatedbythenumberofinstantia-tionsofrule(6)whichinvolvestwocontextitemsasantecedents.Byinspectionofthisrule,weseethatthenumberofpossibleinstantiationsisboundedbyn6.ThereforeweconcludethatthealgorithmrunsintimeO.n6/.Wenowconsiderruntimecomplexitywithrespecttothesizeoftheinputgrammar.Heretheruntimeisdominatedbythenumberofinstantiationsofrules(1)–(5).Forexample,规则(5)combinesitemsŒjY;ˇ=Z;我;i0;j0;j(西德:141)andŒZ(西德:13);j;k(西德:141):Byourrestrictionsonitems,boththearityofYˇ=ZandthearityofZ(西德:13)areupper-boundedbythecon-stantcG.NowrecallthateverycategoryXcanbewrittenasXDA˛forsomeatomiccategoryAandstackofarguments˛.LetAbethesetofatomiccategoriesintheinputgrammar,andletLbethesetofallargumentsoccurringinanycategoryofthelexicon.ByaresultofVijay-ShankerandWeir(1994,Lemma3.1),everyargumentthatmayoccurinaderivedcategoryoccursinL.Thenthenumberofpossibleinstantiationsofrule(5)aswellasrules(1)–(4),andhencetheruntimeofthealgorithm,isinO.jAjjLjcG(西德:1)jLjcG/DO.jAjjLj2cG/:NotethatbothAandLmaygrowwiththegrammarsize.AswewillseeinSection5.2,theconstantcGalsodependsonthegrammarsize.Thismeansthattheworst-caseruntimecomplexityofourparserisexponentialinthesizeoftheinputgrammar.WewillreturntothispointinSection7.5CorrectnessInthissectionweprovethecorrectnessofourparsingalgorithm.Inordertosimplifytheproofs,westartbysimplifyingouralgorithm,atthecostofmakingitlessefficient:(西德:15)Weremovetherulesforextendingtreesandcontexts,规则(1)andrule(4).(西德:15)Weconflatetherulesfortriggeringcontexts,规则(2)andrule(5),intothesingleruleŒYˇ;j;k(西德:141)Œ=Y;ˇ;我;我;j;k(西德:141)X=YYˇ)Xˇ(7)Thisruleblindlyguessestheextensionofthetriggeringtreeorcontext(specifiedbyposi-tionsiandj),ratherthanwaitingforacor-respondingitemtobederived.ThesimplifiedalgorithmisspecifiedinFigure5.WenowarguethatthisalgorithmandthealgorithmfromSection4parseexactlythesamederivationtrees,althoughtheyusedifferentparsingstrategies.First,weobservethatrule(1)intheoldalgorithmcanbesimulatedbyacombinationofotherrulesinthesimplifiedalgorithm,asfollows:ŒX=Y;我;j(西德:141)ŒYˇ;j;k(西德:141)Œ=Y;ˇ;我;我;j;k(西德:141)(7)ŒXˇ;我;k(西德:141)(3)此外,thesimplifiedalgorithmdoesnolongerneedrule(4),whoseroleisnowtakenoverbyrule(7).Toseethis,recallthatrule(4)extendsanexistingcontextwheneverthecompositionoftwocategoriesresultsinanewcategorywhosearitydoesnotexceedcG.Incontrast,规则(7)alwaystriggersanewcontextc,eveniftheresultofthecompositionofcwithsomeexistingcontextsatisfiestheabovearityrestriction.Despitethedifferenceintheadoptedstrategy,thesetworulesareequivalentintermsofstackcontent,leadingtothesamederivationtrees.5.1DefinitionsWeintroducesomeadditionalterminologyandnota-tionthatwewilluseintheproofs.Foraderivationtreetandanodeuoft,wewritetŒu(西德:141)todenotethecategoryatu,andwewritetjutodenotethesub-treeoftatu.Formally,tjuistherestrictionofttonodeuandallofitsdescendants.Eachsubtreeofaderivationtreeisanotherderivationtree.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
411
Itemsoftype1:ŒX;我;j(西德:141),ar.X/(西德:20)cGItemsoftype2:ŒjY;ˇ;我;i0;j0;j(西德:141),ar.Yˇ/(西德:20)cGAxioms:ŒX;我;iC1(西德:141),wiC1WDXGoal:ŒS;0;n(西德:141)Inferencerules:ŒYˇ;j;k(西德:141)Œ=Y;ˇ;我;我;j;k(西德:141)X=YYˇ)Xˇ(7)ŒX=Y;i0;j0(西德:141)Œ=Y;ˇ;我;i0;j0;j(西德:141)ŒXˇ;我;j(西德:141)(3)Œj1Y;ˇj2Z;i00;i0;j0;j00(西德:141)Œj2Z;”;我;i00;j00;j(西德:141)Œj1Y;ˇ;我;i0;j0;j(西德:141)(6)Figure5:Itemsandinferencerulesofthesimplifiedalgorithmforaninputstringw1(西德:1)(西德:1)(西德:1)wn.Definition1Lettbeaderivationtreewithrootr.ThenthassignatureŒX;我;j(西德:141)if1.theyieldoftiswŒi;j(西德:141)and2.thetypeoftisX,thatis,tŒr(西德:141)DX.Notethatwhileweusethesamenotationforsig-naturesasforitems,thesignatureofaderivationtreeisapurelystructuralconcept,whereasanitemisanobjectinthealgorithm.Acentralconceptinourproofisthenotionofspine.Recallthataderivationtreeconsistsofunarybranchingsandbinarybranchings.Ineachbinarybranching,werefertothetwochildrenofthebranch-ing’srootnodeastheprimarychildandthesec-ondarychild,dependingonwhichofthetwoisla-beledwiththeprimaryandsecondaryinputcategoryofthecorrespondingruleinstance.InFigure2,theprimarychildrenoftherootnodeareshaded.Definition2Foraderivationtreet,thespineoftistheuniquepaththatstartsattherootnodeoftandateachnodeucontinuestotheprimarychildofu.Thespineofaderivationtreealwaysendsatanodethatislabeledwithacategoryfromthelexicon.Definition3Lettbeaderivationtreewithrootr.Aderivationcontextcisobtainedbyremovingallproperdescendantsofsomenodef¤ronthespineoft,undertherestrictionthatar.tŒu(西德:141)/>ar.tŒr(西德:141)/foreverynodeuonthespineproperlybetweenfandr.Thenodefiscalledthefootnodeofc.Theyieldofcisthepairwhosefirstcomponentistheyieldofttotheleftoffandwhosesecondcomponentistheyieldofttotherightoff.Foraderivationcontextcandanodeuofc,wewritecŒu(西德:141)todenotethecategoryatu.Definition3formalizestheconceptofderivationcontextsthatweintroducedinSection4.1.First,becausefisonthespineandf¤r,thecate-gorycŒf(西德:141)takestheformX=Y.Thearityrestric-tionimpliesthatthecategoryofeverynodeuonthespineproperlybetweenfandrtakestheformXˇu,jˇuj>0,andthatthecategoryattheroottakestheformXˇ,jˇj(西德:21)0.ThusthecategoryXisneverexposedinc,exceptperhapsatr.AswewillseeinSection5.4,thisproperty,togetherwithacarefulselectionof“splitnodes”,willallowustodecomposederivationsintosmaller,shareableparts.Thebasicideaisthesameasinthetabulationofpushdownautomata(Lang,1974;NederhofandSatta,2004),wherethepushdowninourcaseistheargumentstackoftheprimaryinputcategoriesalongaspine.Theconceptsofsignatureandspinearegeneral-izedtoderivationcontextsasfollows:Definition4Letcbeaderivationcontextwithrootnoderandfootnodef.ThenchassignatureŒjY;ˇ;我;i0;j0;j(西德:141)if1.theyieldofcis.wŒi;i0(西德:141);wŒj0;j(西德:141)/;2.forsomeX,cŒf(西德:141)DXjYandcŒr(西德:141)DXˇ.Definition5Foraderivationcontextc,thespineofcisthepathfromitsrootnodetoitsfootnode.5.2GrammarConstantBeforewestartwiththeproofassuch,weturntothechoiceofthegrammarconstantcG,whichwasleftpendinginprevioussections.RecallthatweareusingcGasaboundonthearityofXintype1itemsŒX;我;j(西德:141).Sincetheseitemsareproducedbyouraxiomsfromthesetofcategoriesinthelexicon,cGmustnotbesmallerthanthemaximumarity`ofacategoryinthisfiniteset.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
412
WealsousecGasaboundonthearityofthecat-egoryYˇintype2itemsŒjY;ˇ;我;i0;j0;j(西德:141).Theseitemsareproducedbyinferencerule(7)tosimulateinstancesofcompositionoftheformX=YYˇ)Xˇ.Herethelengthofˇisboundedbythemaxi-mumdegreedofacompositionruleinthegrammar,andar.Y/isboundedbythemaximumarityaofanargumentfromthe(finite)setLofargumentsinthelexicon(recallSection4.4).ThereforecGcannotbesmallerthanaCd.Puttingeverythingtogether,weobtaintheconditioncG(西德:21)maxf`;aCdg:(8)Thenextlemmawillbeusedinseveralplaceslater.Lemma1Letcbeaderivationcontextwithsigna-tureŒjY;ˇ;我;i0;j0;j(西德:141).Thenar.Yˇ/(西德:20)cG.Proof.Letrandfbetherootandthefootnode,分别,ofc.Fromthedefinitionofsignature,theremustbesomeXsuchthatcŒr(西德:141)DXˇandcŒf(西德:141)DXjY.Nowletpbetheparentnodeoff,andassumethattheruleusedatpisinstantiatedasX=YYˇ0)Xˇ0,sothatcŒp(西德:141)DXˇ0.IfpDrthenˇ0Dˇ;否则,becauseofthearityrestric-tioninthedefinitionofderivationcontexts(Defini-tion3)wehavejˇ0j>jˇj.Thenar.Yˇ/(西德:20)ar.Yˇ0/(西德:20)cG;wheretherightinequalityfollowsfromtheassump-tionthatX=YYˇ0)Xˇ0isaruleinstanceofthegrammar,andfrominequality(8).5.3SoundnessWestartthecorrectnessproofbyarguingforthesoundnessofthedeductionsysteminFigure5.Morespecifically,weshowthatforeveryitemoftype1thereexistsaderivationtreewiththesamesigna-ture,andthatforeveryitemoftype2thereexistsaderivationcontextwiththesamesignature.Thesoundnessoftheaxiomsisobvious.Rule(7)statesthat,ifwehavebuiltaderivationtreetwithsignatureŒYˇ;j;k(西德:141)thenwecanbuildaderivationcontextcwithsignatureŒ=Y;ˇ;我;我;j;k(西德:141).UndertheconditionthatthegrammaradmitstheruleinstanceX=YYˇ)Xˇ,thisinferenceissound;thecontextcanbebuiltasshowninFigure6.wŒj;k(西德:141)YˇX=YXˇtFigure6:Soundnessofrule(7).规则(3)statesthat,ifwehavebuiltaderivationtreet0withsignatureŒX=Y;i0;j0(西德:141)andacontextcwithsignatureŒ=Y;ˇ;我;i0;j0;j(西德:141),thenwecanbuildanewtreetwithsignatureŒXˇ;我;j(西德:141).Weobtaintbysubstitutingt0forthefootnodeofc(seeFigure3(A)).规则(6)statesthat,ifwehavebuiltaderivationcontextc1withsignatureŒj1Y;ˇj2Z;i00;i0;j0;j00(西德:141)andanothercontextc2withsignatureŒj2Z;”;我;i00;j00;j(西德:141),thenwecanbuildaderivationcontextcwithsignatureŒj1Y;ˇ;我;i0;j0;j(西德:141).Weobtaincbysubstitutingc1forthefootnodeofc2;thisisillustratedbyFigure3(乙)ifweassume(西德:13)D”.5.4CompletenessInthefinalpartofourcorrectnessproofwenowprovethecompletenessofthedeductionsysteminFigure5.Specificallyweshowthefollowingstrongerstatement:ForeveryderivationtreeandforeveryderivationcontextwithasignatureIsatisfyingthearityboundsforitemsofFigure5,thedeductionsysteminfersthecorrespondingitemI.Fromthisstatementwecanimmediatelyconcludethatthesys-temconstructsthegoalitemwheneverthereexistsaderivationtreewhoseyieldisthecompleteinputstringandwhosetypeisthedistinguishedcategoryS.Ourproofisbyinductiononameasurethatwecallrank.Therankofaderivationtreeorcontextisthenumberofitsnon-leafnodes.Notethatthisdefinitionimpliesthatthefootnodeofacontextisnotcountedagainstitsrank.Therankofatreeorcontextisalwaysatleast1,withrank1onlyrealizedforderivationtreesconsistingofasinglenode.5.4.1BaseCaseConsideraderivationtreetwithsignatureŒX;我;j(西德:141)andrank.t/D1.ThetreettakestheformshownintheleftofFigure2,andwehavejDiC1andwjWDX.TheitemŒX;我;j(西德:141)isthenproducedbyoneoftheaxiomsofourdeductionsystem.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
413
5.4.2InductiveCaseThegeneralideaunderlyingtheinductivecasecanbestatedasfollows.Weconsideraderivationtreeorcontext’withsignatureIsatisfyingtheboundsstatedinFigure5foritemsoftype1or2.Wethenidentifyaspecialnodesin’’sspine,whichwecallthesplitnode.Weusesto“split”’intotwopartsthatareeitherderivationtreesorcontexts,thatbothsatisfytheboundsforitemsoftype1or2,andthatbothhaveranksmallerthantherankof’.Wethenapplytheinductionhypothesistoobtaintwoitemsthatcanbecombinedbyoneoftheinferencerulesofouralgorithm,resultinginthedesireditemIfor’.Wefirstconsiderthecaseinwhich’isatree,andlaterthecaseinwhich’isacontext.5.4.3SplittingTreesConsideraderivationtreetwithsignatureŒX0;我;j(西德:141),rootnoder,andrank.t/>1.Thenthespineoftconsistsofatleast2nodes.Nowassumethatar.X0/(西德:20)cG,thatis,ŒX0;我;j(西德:141)isavaliditem.Choosethesplitnodestobethehighest(closesttotheroot)non-rootnodeonthespineforwhichar.tŒs(西德:141)/(西德:20)cG.Nodesalwaysexists,asthearityconstraintissatisfiedatleastforthelowest(farthestfromtheroot)nodeonthespine,whichislabeledwithacategoryfromthelexicon.Considerthesubtreet0Dtjs;thussistherootnodeoft0.Becausesisaprimarynodeint,thecategoryatshasatleastoneargument.WedealherewiththecasewherethiscategorytakestheformtŒs(西德:141)DX=Y;thecasetŒs(西德:141)DX=Yissym-metrical.Thusthesignatureoft0takestheformŒX=Y;i0;j0(西德:141)wherei0;j0areintegerswithi(西德:20)i0
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
1
9
2
1
5
6
6
9
0
3
/
/
t
我
A
C
_
A
_
0
0
1
9
2
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
414
nodeuinthespine,andatthesametime,nocombi-natoryrulecanreducethearityofitsprimaryinputcategorybymorethanoneunit.Considerthecontextc1thatisobtainedbyrestrict-ingctonodesandallofitsdescendants;thussistherootnodeofc1andfisthefootnode.Toseethatc1iswell-defined,notethatourchoiceofsguaranteesthatar.cŒu(西德:141)/>ar.cŒs(西德:141)/foreverynodeuthatisproperlybetweenfands.Toseethis,supposethattherewasanodeu¤ssuchthatar.cŒu(西德:141)/(西德:20)ar.cŒs(西德:141)/.Sincear.cŒs(西德:141)/Dar.cŒr(西德:141)/C1andar.cŒu(西德:141)/¤ar.cŒs(西德:141)/,byourdefinitionofs,wewouldhavear.cŒu(西德:141)/(西德:20)ar.cŒr(西德:141)/,whichcannotbebecauseinc,everynodeuproperlybetweenfandshasarityar.cŒu(西德:141)/>ar.cŒr(西德:141)/.Becausefisaprimarynodeinc,thecategoryatfhasatleastoneargument;callitj1Y.Thenodesisaprimarynodeincaswell,sotheexcessofc1takestheformˇj2Z,wherej2Zisthetopmostargumentofthecategoryats.Thusthesignatureofc1takestheformŒj1Y;ˇj2Z;i00;i0;j0;j00(西德:141)wherei00;j00areintegerswithi(西德:20)i00(西德:20)i0andj0(西德:20)j00(西德:20)j.ApplyingLemma1toc1,wegetar.Yˇj2Z/(西德:20)cG,andthereforeŒj1Y;ˇj2Z;i00;i0;j0;j00(西德:141)isavaliditem.Finally,wenotethatrank.c1/