Transactions of the Association for Computational Linguistics, vol. 5, pp. 413–424, 2017. Action Editor: Brian Roark.
Submission batch: 6/2017; Published 11/2017.

© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

In-Order Transition-based Constituent Parsing

Jiangming Liu and Yue Zhang
Singapore University of Technology and Design, 8 Somapah Road, Singapore, 487372
jmliunlp@gmail.com, yuezhang@sutd.edu.sg

Abstract

Both bottom-up and top-down strategies have been used for neural transition-based constituent parsing. The parsing strategies differ in terms of the order in which they recognize productions in the derivation tree, where bottom-up strategies and top-down strategies take post-order and pre-order traversal over trees, respectively. Bottom-up parsers benefit from rich features from readily built partial parses, but lack lookahead guidance in the parsing process; top-down parsers benefit from non-local guidance for local decisions, but rely on a strong encoder over the input to predict a constituent hierarchy before its construction. To mitigate both issues, we propose a novel parsing system based on in-order traversal over syntactic trees, designing a set of transition actions to find a compromise between bottom-up constituent information and top-down lookahead information. Based on stack-LSTMs, our psycholinguistically motivated constituent parsing system achieves 91.8 F1 on the WSJ benchmark. Furthermore, the system achieves 93.6 F1 with supervised reranking and 94.2 F1 with semi-supervised reranking, which are the best results on the WSJ benchmark.

1 Introduction

Transition-based constituent parsing employs sequences of local transition actions to construct constituent trees over sentences. There are two popular transition-based constituent parsing systems, namely bottom-up parsing (Sagae and Lavie, 2005; Zhang and Clark, 2009; Zhu et al., 2013; Watanabe and Sumita, 2015) and top-down parsing (Dyer et al., 2016; Kuncoro et al., 2017). The parsing strategies differ in terms of the order in which they recognize productions in the derivation tree.

The process of bottom-up parsing can be regarded as post-order traversal over a constituent tree. For example, given the sentence in Figure 1, a bottom-up shift-reduce parser takes the action sequence in Figure 2(a)¹ to build the output, where the word sequence "The little boy" is first read, and then an NP is recognized for the word sequence. After the system reads the verb "likes" and its subsequent NP, a VP is recognized. The full order of recognition for the tree nodes is ③→④→⑤→②→⑦→⑨→⑩→⑧→⑥→⑪→①. When making local decisions, rich information is available from readily built partial trees (Zhu et al., 2013; Watanabe and Sumita, 2015; Cross and Huang, 2016), which contributes to local disambiguation. However, there is a lack of top-down guidance from lookahead information, which can be useful (Johnson, 1998; Roark and Johnson, 1999; Charniak, 2000; Liu and Zhang, 2017). In addition, binarization must be applied to trees, as shown in Figure 1(b), to ensure a constant number of actions (Sagae and Lavie, 2005), and to take advantage of lexical head information (Collins, 2003). However, such binarization requires a set of language-specific rules, which hampers adaptation of parsing to other languages.

On the other hand, the process of top-down parsing can be regarded as pre-order traversal over a tree.

¹ The action sequence is taken on unbinarized trees.


[Figure 1: Syntactic trees of the sentence "The little boy likes red tomatoes.": (a) the syntactic tree; (b) the binarized syntactic tree, where r and l mean that the head is the right branch and the left branch, respectively, and * means the constituent is not completed.]

Given the sentence in Figure 1, a top-down shift-reduce parser takes the action sequence in Figure 2(b) to build the output, where an S is first made and then an NP is generated. After that, the system makes a decision to read the word sequence "The little boy" to complete the NP. The full order of recognition for the tree nodes is ①→②→③→④→⑤→⑥→⑦→⑧→⑨→⑩→⑪. The top-down lookahead guidance contributes to non-local disambiguation. However, it is difficult to generate a constituent before its subconstituents have been realized, since no explicit features can be extracted from their subtree structures. Thanks to the use of recurrent neural networks, which make it possible to represent a sentence globally before syntactic tree construction, seminal work on neural top-down parsing directly generates bracketed constituent trees using sequence-to-sequence models (Vinyals et al., 2015). Dyer et al. (2016) design a set of top-down transition actions for standard transition-based parsing.

(a) bottom-up system
stack              buffer            action        node
[]                 [The little ...]  SHIFT         ③
[The]              [little boy ...]  SHIFT         ④
[The little]       [boy likes ...]   SHIFT         ⑤
[... little boy]   [likes red ...]   REDUCE-R-NP   ②
...                ...               ...           ...

(b) top-down system
stack              buffer            action        node
[]                 [The little ...]  NT-S          ①
[(S]               [The little ...]  NT-NP         ②
[(S (NP]           [The little ...]  SHIFT         ③
[... (NP The]      [little boy ...]  SHIFT         ④
[... The little]   [boy likes ...]   SHIFT         ⑤
[... little boy]   [likes red ...]   REDUCE        /
...                ...               ...           ...

(c) in-order system
stack              buffer            action        node
[]                 [The little ...]  SHIFT         ③
[The]              [little boy ...]  PJ-NP         ②
[The NP]           [little boy ...]  SHIFT         ④
[... NP little]    [boy likes ...]   SHIFT         ⑤
[... little boy]   [likes red ...]   REDUCE        /
...                ...               ...           ...

Figure 2: Action sequences of the three types of transition-based constituent parsing system. Details of the action systems are introduced in Section 2.1, Section 2.2 and Section 3, respectively.

In this paper, we propose a novel transition system for constituent parsing, mitigating issues of both bottom-up and top-down systems by finding a compromise between bottom-up constituent information and top-down lookahead information. The process of the proposed constituent parsing can be regarded as in-order traversal over a tree. Given the sentence in Figure 1, the system takes the action sequence in Figure 2(c) to build the output. The system reads the word "The" and then projects an NP, which is based on bottom-up evidence. After this, based on the projected NP, the system reads the word sequence "little boy", with top-down guidance from the NP. Similarly, based on the completed constituent "(NP The little boy)", the system projects an S, with bottom-up evidence.


With the S and the word "likes", the system projects a VP, which can serve as top-down guidance. The full order of recognition for the tree nodes is ③→②→④→⑤→①→⑦→⑥→⑨→⑧→⑩→⑪.

Compared to post-order traversal, in-order traversal can potentially resolve non-local ambiguity better by top-down guidance. Compared to pre-order traversal, in-order traversal can potentially resolve local ambiguity better by bottom-up evidence. Furthermore, in-order traversal is psycholinguistically motivated (Roark et al., 2009; Steedman, 2000). Empirically, a human reader comprehends sentences by making lookahead guesses for constituents. For example, when reading the word "likes", a human reader could guess that it starts a VP constituent, instead of waiting to read the object "red tomatoes", which is the procedure of a bottom-up system.

We compare our system with the two baseline systems (i.e., a top-down system and a bottom-up system) under the same neural transition-based framework of Dyer et al. (2016). Our final models outperform both the bottom-up and the top-down transition-based constituent parsers, achieving 91.8 F1 on English and 86.1 F1 on Chinese for greedy fully-supervised parsing, respectively. Furthermore, our final model obtains 93.6 F1 with supervised reranking (Choe and Charniak, 2016) and 94.2 F1 with semi-supervised reranking, achieving state-of-the-art results for constituent parsing on the English benchmark. By converting to Stanford dependencies, our final model achieves state-of-the-art results on dependency parsing, obtaining 96.2% UAS and 95.2% LAS. To our knowledge, we are the first to systematically compare top-down and bottom-up constituent parsing under the same neural framework. We release our code at https://github.com/LeonCrashCode/InOrderParser.

2 Transition-based constituent parsing

Transition-based constituent parsing takes a left-to-right scan of the input sentence, where a stack is used to maintain partially constructed phrase-structures, while the input words are stored in a buffer. Formally, a state is defined as [σ, i, f], where σ is the stack, i is the front index of the buffer, and f is a boolean value showing whether the parsing is finished. At each step, a transition action is applied to consume an input word or construct a new phrase-structure. Different parsing systems employ their own sets of actions.

2.1 Bottom-up system

We take the bottom-up system of Sagae and Lavie (2005) as our bottom-up baseline. Given a state, the set of transition actions is:

• SHIFT: pop the front word from the buffer, and push it onto the stack.

• REDUCE-L/R-X: pop the top two constituents off the stack, combine them into a new constituent with label X, and push the new constituent onto the stack.

• UNARY-X: pop the top constituent off the stack, raise it to a new constituent with label X, and push the new constituent onto the stack.

• FINISH: pop the root node off the stack and end parsing.

The bottom-up parser can be summarized as the deductive system in Figure 3(a). Given the sentence with the binarized syntactic tree in Figure 1(b), the sequence of actions SHIFT, SHIFT, SHIFT, REDUCE-R-NP, REDUCE-R-NP, SHIFT, SHIFT, SHIFT, REDUCE-R-NP, REDUCE-L-VP, SHIFT, REDUCE-L-S, REDUCE-R-S and FINISH can be used to construct its constituent tree.

2.2 Top-down system

We take the top-down system of Dyer et al. (2016) as our top-down baseline. Given a state, the set of transition actions is:

• SHIFT: pop the front word from the buffer, and push it onto the stack.

• NT-X: open a nonterminal with label X on top of the stack.

• REDUCE: repeatedly pop completed subtrees or terminal symbols from the stack until an open nonterminal is encountered; this open nonterminal is then popped and used as the label of a new constituent that has the popped subtrees as its children. This new completed constituent is pushed onto the stack as a single composite item.

The deduction system for the process is shown in Figure 3(b).² Given the sentence in Figure 1, the sequence of actions NT-S, NT-NP, SHIFT, SHIFT, SHIFT, REDUCE, NT-VP, SHIFT, NT-NP, SHIFT, SHIFT, REDUCE, REDUCE, SHIFT and REDUCE can be used to construct its constituent tree, as the following sketch replays.
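To make the actions concrete, the following is a minimal Python sketch of the top-down system replaying the example action sequence above. The state layout and names are our own illustration, not the authors' released implementation.

```python
# A minimal sketch of the top-down system; names are illustrative.
OPEN = "open"  # marks an open nonterminal on the stack

def is_open(item):
    return isinstance(item, tuple) and item[0] == OPEN

def step(stack, buffer, action):
    if action == "SHIFT":
        stack.append(buffer.pop(0))        # read the next word
    elif action.startswith("NT-"):
        stack.append((OPEN, action[3:]))   # open a nonterminal
    elif action == "REDUCE":
        children = []
        while not is_open(stack[-1]):      # pop completed subtrees
            children.append(stack.pop())
        _, label = stack.pop()             # pop the open nonterminal
        stack.append([label] + list(reversed(children)))

stack, buffer = [], "The little boy likes red tomatoes .".split()
for act in ["NT-S", "NT-NP", "SHIFT", "SHIFT", "SHIFT", "REDUCE", "NT-VP",
            "SHIFT", "NT-NP", "SHIFT", "SHIFT", "REDUCE", "REDUCE",
            "SHIFT", "REDUCE"]:
    step(stack, buffer, act)
print(stack[0])
# ['S', ['NP', 'The', 'little', 'boy'],
#  ['VP', 'likes', ['NP', 'red', 'tomatoes']], '.']
```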


² Due to unary decisions, we use completion marks to make the finish decision in all systems except the top-down system.

(a) bottom-up system
SHIFT:        [σ, i, false] ⇒ [σ|w_i, i+1, false]
REDUCE-L/R-X: [σ|s_1|s_0, i, false] ⇒ [σ|X_{s_1 s_0}, i, false]
UNARY-X:      [σ|s_0, i, false] ⇒ [σ|X_{s_0}, i, false]
FINISH:       [σ, i, false] ⇒ [σ, i, true]

(b) top-down system
SHIFT:        [σ, i, /] ⇒ [σ|w_i, i+1, /]
NT-X:         [σ, i, /] ⇒ [σ|X, i, /]
REDUCE:       [σ|X|s_j|...|s_0, i, /] ⇒ [σ|X_{s_j ... s_0}, i, /]

(c) in-order system
SHIFT:        [σ, i, false] ⇒ [σ|w_i, i+1, false]
PJ-X:         [σ|s_0, i, false] ⇒ [σ|s_0|X, i, false]
REDUCE:       [σ|s_j|X|s_{j−1}|...|s_0, i, false] ⇒ [σ|X_{s_j s_{j−1} ... s_0}, i, false]
FINISH:       [σ, i, false] ⇒ [σ, i, true]

Figure 3: The different transition systems. The start state is [φ, 0, false] and the final state is [σ, n, true].

3 In-order system

We propose a novel in-order system for transition-based constituent parsing. Similar to the bottom-up and top-down systems, the in-order system maintains a stack and a buffer for representing a state. The set of transition actions is defined as:

• SHIFT: pop the front word from the buffer, and push it onto the stack.

• PJ-X: project a nonterminal with label X on top of the stack.

• REDUCE: repeatedly pop completed subtrees or terminal symbols from the stack until a projected nonterminal is encountered; this projected nonterminal is then popped and used as the label of a new constituent. Furthermore, one more item on the top of the stack is popped and inserted as the leftmost child of the new constituent, and the popped subtrees are inserted as the rest of the children. This new completed constituent is pushed onto the stack as a single composite item.

• FINISH: pop the root node off the stack and end parsing.

The deduction system for the process is shown in Figure 3(c). Given the sentence in Figure 1, the sequence of actions SHIFT, PJ-NP, SHIFT, SHIFT, REDUCE, PJ-S, SHIFT, PJ-VP, SHIFT, PJ-NP, SHIFT, REDUCE, REDUCE, SHIFT, REDUCE, FINISH can be used to construct its constituent tree (a code sketch replaying this sequence is given at the end of this section).

Variants. The in-order system can be generalized into variants by modifying k, the number of leftmost nodes traced before the parent node. For example, given the tree "(S a b c d)", the traversal is "a S b c d" if k = 1, while the traversal is "a b S c d" if k = 2. We name the variant with a certain k value the k-in-order system. In this paper, we only investigate the in-order system with k = 1, the 1-in-order system. Note that the top-down parser can be regarded as a special case of a generalized version of the in-order parser with k = 0, and the bottom-up parser can be regarded as a special case with k = ∞.
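The 1-in-order system can be sketched in the same style as the top-down sketch above; PJ-X differs from NT-X in that REDUCE additionally pops one item below the projected nonterminal as the leftmost child. The action sequence is the example from this section; names are again illustrative, not the released implementation.

```python
# A minimal sketch of the in-order system (k = 1); names are illustrative.
PROJ = "proj"  # marks a projected nonterminal on the stack

def is_projected(item):
    return isinstance(item, tuple) and item[0] == PROJ

def step(stack, buffer, action):
    if action == "SHIFT":
        stack.append(buffer.pop(0))
    elif action.startswith("PJ-"):
        stack.append((PROJ, action[3:]))    # project a nonterminal
    elif action == "REDUCE":
        children = []
        while not is_projected(stack[-1]):  # pop completed subtrees
            children.append(stack.pop())
        _, label = stack.pop()              # pop the projected nonterminal
        leftmost = stack.pop()              # one more item: the leftmost child
        stack.append([label, leftmost] + list(reversed(children)))
    elif action == "FINISH":
        return stack.pop()                  # pop the root and end parsing

stack, buffer = [], "The little boy likes red tomatoes .".split()
root = None
for act in ["SHIFT", "PJ-NP", "SHIFT", "SHIFT", "REDUCE", "PJ-S", "SHIFT",
            "PJ-VP", "SHIFT", "PJ-NP", "SHIFT", "REDUCE", "REDUCE",
            "SHIFT", "REDUCE", "FINISH"]:
    root = step(stack, buffer, act) or root
print(root)
# ['S', ['NP', 'The', 'little', 'boy'],
#  ['VP', 'likes', ['NP', 'red', 'tomatoes']], '.']
```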


[Figure 4: Framework of our transition-based parsers: stack-LSTMs encode the stack (s_{i−2}, s_{i−1}, s_i) and the buffer (x_{j−2}, x_{j−1}, x_j), an LSTM encodes the action history (a_0, a_1, a_2), and a softmax layer predicts the next action.]

4 Neural parsing model

We employ the stack-LSTM parsing model of Dyer et al. (2016) for the three types of transition-based parsing systems in Sections 2.1, 2.2 and 3, respectively, where one stack-LSTM is used to represent the stack, another stack-LSTM is used to represent the buffer, and a vanilla LSTM is used to represent the action history, as shown in Figure 4.

4.1 Word representation

We follow Dyer et al. (2015) in representing each word using three different types of embeddings: a pretrained word embedding, ē_wi, which is not fine-tuned during the training of the parser; a randomly initialized word embedding, e_wi, which is fine-tuned; and a randomly initialized part-of-speech embedding, e_pi, which is fine-tuned. The three embeddings are concatenated, and then fed to a nonlinear layer to derive the final word representation:

x_i = f(W_input [e_pi; ē_wi; e_wi] + b_input),

where W_input and b_input are model parameters, w_i and p_i denote the form and the POS tag of the ith input word, respectively, and f is a nonlinear function. In this paper, we use ReLU for f.

[Figure 5: The composition function. (a) is for unbinarized trees and (b) is for binarized trees, where "NP-r*" means that "little boy" is a non-completed noun phrase with head "boy".]

4.2 Stack representation

We employ a bidirectional LSTM as the composition function to represent constituents on the stack.³ For top-down parsing and in-order parsing, following Dyer et al. (2016), as shown in Figure 5(a), the composition representation s_comp is computed as:

s_comp = (LSTM_fwd[e_nt, s_0, ..., s_m]; LSTM_bwd[e_nt, s_m, ..., s_0]),

where e_nt is the representation of the nonterminal and s_j (j ∈ [0, m]) is the jth child node.

³ To be fair, we use a bidirectional LSTM as the composition function for all parsing systems.
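The following is a minimal PyTorch sketch of the word representation and the unbinarized composition function; the original implementation is the authors' released code, so this is only an illustration under our own naming, with dimensions taken from Table 1.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the word representation (Section 4.1) and the
# unbinarized composition function (Section 4.2); dimensions follow Table 1.
class WordRepresentation(nn.Module):
    def __init__(self, n_words, n_tags, pre_dim=100, word_dim=32,
                 pos_dim=12, out_dim=128):
        super().__init__()
        self.pretrained = nn.Embedding(n_words, pre_dim)   # e-bar_w, frozen
        self.pretrained.weight.requires_grad = False
        self.word = nn.Embedding(n_words, word_dim)        # e_w, fine-tuned
        self.pos = nn.Embedding(n_tags, pos_dim)           # e_p, fine-tuned
        self.proj = nn.Linear(pre_dim + word_dim + pos_dim, out_dim)

    def forward(self, w, p):
        # x_i = ReLU(W_input [e_pi; e-bar_wi; e_wi] + b_input)
        x = torch.cat([self.pos(p), self.pretrained(w), self.word(w)], dim=-1)
        return torch.relu(self.proj(x))

class Composition(nn.Module):
    # s_comp = (LSTM_fwd[e_nt, s_0, ..., s_m]; LSTM_bwd[e_nt, s_m, ..., s_0])
    def __init__(self, dim=128):
        super().__init__()
        self.fwd = nn.LSTM(dim, dim // 2)
        self.bwd = nn.LSTM(dim, dim // 2)

    def forward(self, e_nt, children):
        # e_nt: [dim] vector; children: list of [dim] child vectors s_0..s_m
        seq = torch.stack([e_nt] + children).unsqueeze(1)        # (m+2, 1, dim)
        rev = torch.stack([e_nt] + children[::-1]).unsqueeze(1)
        _, (h_f, _) = self.fwd(seq)
        _, (h_b, _) = self.bwd(rev)
        return torch.cat([h_f[-1, 0], h_b[-1, 0]], dim=-1)       # [dim]
```

The stack-LSTMs that encode the stack and buffer are incremental LSTMs with a pop operation (Dyer et al., 2015); they are omitted here for brevity.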


For bottom-up parsing, we make use of the head information in the composition function by requiring that the head node always comes before the non-head node in the bidirectional LSTM, as shown in Figure 5(b).⁴ The binarized composition is computed as:

s_bcomp = (LSTM_fwd[e_nt, s_h, s_o]; LSTM_bwd[e_nt, s_o, s_h]),

where s_h and s_o are the representations of the head node and the non-head node, respectively.

⁴ A bidirectional LSTM consists of two LSTMs, making it balanced for composition. However, the two LSTMs have different parameters, so that one represents head-first information while the other represents head-last information.

4.3 Greedy action classification

Given a sentence w_0, w_1, ..., w_{n−1}, where w_i is the ith word and n is the length of the sentence, our parser makes local action classification decisions incrementally. For the kth parsing state [s_j, ..., s_1, s_0, i, false], the probability distribution p of the current action is:

p = SOFTMAX(W [h_stk; h_buf; h_ah] + b),   (*)

where W and b are model parameters. The representation of the stack information, h_stk, is:

h_stk = stack-LSTM[s_0, s_1, ..., s_j];

the representation of the buffer information, h_buf, is:

h_buf = stack-LSTM[x_i, x_{i+1}, ..., x_{n−1}],

where x is the word representation; and the representation of the action history, h_ah, is:

h_ah = LSTM[e_act_{k−1}, e_act_{k−2}, ..., e_act_0],

where e_act_{k−1} is the representation of the action taken in the (k−1)th parsing state.

Training. Our models are trained to minimize a cross-entropy loss objective with an l2 regularization term, defined by

L(θ) = − Σ_i Σ_j log p_{a_ij} + (λ/2) ||θ||²,

where θ is the set of parameters, p_{a_ij} is the probability of the jth action in the ith training example given by the model, and λ is a regularization hyper-parameter (λ = 10⁻⁶). We use stochastic gradient descent with an initial learning rate of 0.1 and a learning rate decay of 0.05.

Parameter                               Value
LSTM layers                             2
Word embedding dim                      32
English pretrained word embedding dim   100
Chinese pretrained word embedding dim   80
POS tag embedding dim                   12
Action embedding dim                    16
Stack-LSTM input dim                    128
Stack-LSTM hidden dim                   128

Table 1: Hyper-parameters.

5 Experiments

5.1 Data

We empirically compare our bottom-up, top-down and in-order parsers. The experiments are carried out on both English and Chinese. For English data, we use the standard benchmark of WSJ sections of the PTB (Marcus et al., 1993), where Sections 2-21 are taken as training data, Section 22 as development data and Section 23 for testing both dependency parsing and constituency parsing. We adopt pretrained English word embeddings generated on the AFP portion of English Gigaword.

For Chinese data, we use Version 5.1 of the Penn Chinese Treebank (CTB) (Xue et al., 2005). We use articles 001-270 and 440-1151 for training, articles 301-325 for system development, and articles 271-300 for final performance evaluation. We adopt pretrained Chinese word embeddings generated on the complete Chinese Gigaword corpus. The POS tags in both the English data and the Chinese data are automatically assigned in the same way as in the work of Dyer et al. (2016), using the Stanford tagger. We follow the work of Choe and Charniak (2016) and adopt the AFP portion of English Gigaword as the extra resource for semi-supervised reranking.

5.2 Settings

Hyper-parameters. For both English and Chinese experiments, we use the same hyper-parameters as the work of Dyer et al. (2016) without further optimization, as shown in Table 1.

Reranking experiments. Following the same reranking setting as Dyer et al. (2016) and Choe and Charniak (2016), we obtain 100 samples from our bottom-up, top-down, and in-order models (Section 4), respectively, with an exponentiation strategy (α = 0.8), using the probability distribution over actions (Equation *); a sketch of this sampling procedure is given below. We adopt the reranker of Choe and Charniak (2016) as both our English fully-supervised reranker and semi-supervised reranker, and the generative reranker of Dyer et al. (2016) as our Chinese supervised reranker.
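To make the sampling procedure concrete, here is a small sketch of the action scorer (Equation *) together with the exponentiation strategy; the class and function names are ours, and running the parser with sample_action in place of argmax to draw 100 candidates per sentence is our reading of the setup, not the authors' released code.

```python
import torch
import torch.nn as nn

# Illustrative sketch of action scoring (Equation *) and the exponentiation
# sampling strategy (alpha = 0.8) used to draw candidates for reranking.
class ActionClassifier(nn.Module):
    def __init__(self, dim=128, n_actions=60):
        super().__init__()
        self.out = nn.Linear(3 * dim, n_actions)

    def forward(self, h_stk, h_buf, h_ah):
        # p = SOFTMAX(W [h_stk; h_buf; h_ah] + b)
        return torch.softmax(self.out(torch.cat([h_stk, h_buf, h_ah], dim=-1)),
                             dim=-1)

def sample_action(p, alpha=0.8):
    # flatten the action distribution and renormalize before sampling,
    # so that the samples are more diverse than greedy decoding
    q = p ** alpha
    q = q / q.sum()
    return torch.multinomial(q, 1).item()
```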


Model             LR     LP     F1
Top-down parser   91.59  91.66  91.62
Bottom-up parser  91.89  91.83  91.86
In-order parser   91.98  91.86  91.92

Table 2: Development results (%) on WSJ Section 22.

5.3 Development experiments

Table 2 shows the development results of the three parsing systems. The bottom-up system performs slightly better than the top-down system. The in-order system outperforms both the bottom-up and the top-down systems.

Model             F1
fully-supervised
Top-down parser   91.2
Bottom-up parser  91.3
In-order parser   91.8
reranking
Top-down parser   93.3
Bottom-up parser  93.3
In-order parser   93.6

Table 3: Final results (%) on WSJ Section 23.

5.4 Results

Table 3 shows the parsing results on the English test dataset. We find that the bottom-up parser and the top-down parser have similar results under the greedy setting, and the in-order parser outperforms both of them. Also, with supervised reranking, the in-order parser achieves the best results.

English constituent results. We compare our models with previous work, as shown in Table 4. In the fully-supervised setting,⁵ the in-order parser outperforms the state-of-the-art discrete parsers (Shindo et al., 2012; Zhu et al., 2013), the state-of-the-art neural parsers (Cross and Huang, 2016; Watanabe and Sumita, 2015) and the state-of-the-art hybrid parsers (Durrett and Klein, 2015; Liu and Zhang, 2017), achieving state-of-the-art results. In the reranking setting, the in-order parser outperforms the best discrete parser (Huang, 2008) and matches the performance of Kuncoro et al. (2017), which extends the work of Dyer et al. (2016) by adding a gated attention mechanism to the composition functions. In the semi-supervised setting, the in-order parser outperforms the best semi-supervised parser (Choe and Charniak, 2016) by achieving 94.2 F1 (the oracle is 97.9 F1).

Model                         F1
fully-supervised
Socher et al. (2013)          90.4
Zhu et al. (2013)             90.4
Vinyals et al. (2015)         90.7
Watanabe and Sumita (2015)    90.7
Shindo et al. (2012)          91.1
Durrett and Klein (2015)      91.1
Dyer et al. (2016)            91.2
Cross and Huang (2016)        91.3
Liu and Zhang (2017)          91.7
Top-down parser               91.2
Bottom-up parser              91.3
In-order parser               91.8
reranking
Huang (2008)                  91.7
Charniak and Johnson (2005)   91.5
Choe and Charniak (2016)      92.6
Dyer et al. (2016)            93.3
Kuncoro et al. (2017)         93.6
Top-down parser               93.3
Bottom-up parser              93.3
In-order parser               93.6
semi-supervised reranking
Choe and Charniak (2016)      93.8
In-order parser               94.2

Table 4: Final results (%) on WSJ Section 23.

⁵ Here, we only consider work using a single model.

English dependency results. As shown in Table 5, by converting to Stanford Dependencies, without additional training data our models achieve performance similar to the state-of-the-art system (Choe and Charniak, 2016); with the same additional training data, our models achieve new state-of-the-art results on dependency parsing, with 96.2% UAS and 95.2% LAS on the standard benchmark.

Chinese constituent results. Table 6 shows the final results on the Chinese test dataset.


Model                             UAS   LAS
Kiperwasser and Goldberg (2016)†  93.9  91.9
Cheng et al. (2016)†              94.1  91.5
Andor et al. (2016)               94.6  92.8
Dyer et al. (2016) -re            95.6  94.4
Dozat and Manning (2017)†         95.7  94.0
Kuncoro et al. (2017) -re         95.7  94.5
Choe and Charniak (2016) -sre     95.9  94.1
In-order parser                   94.5  93.4
In-order parser -re               95.9  94.9
In-order parser -sre              96.2  95.2

Table 5: Stanford Dependency accuracy (%) on WSJ Section 23. † means graph-based parsing. "-re" means fully-supervised reranking and "-sre" means semi-supervised reranking.

Parser                       F1
fully-supervised
Zhu et al. (2013)            83.2
Wang et al. (2015)           83.2
Dyer et al. (2016)           84.6
Liu and Zhang (2017)         85.5
Top-down parser              84.6
Bottom-up parser             85.7
In-order parser              86.1
reranking
Charniak and Johnson (2005)  82.3
Dyer et al. (2016)           86.9
Top-down parser              86.9
Bottom-up parser             87.5
In-order parser              88.0
semi-supervised
Zhu et al. (2013)            85.6
Wang and Xue (2014)          86.3
Wang et al. (2015)           86.6

Table 6: Final results on the test set of CTB.

The in-order parser achieves the best results under the fully-supervised setting. With supervised reranking, the in-order parser outperforms the state-of-the-art models by achieving 88.0 F1 (the oracle is 93.47 F1).

Chinese dependency results. As shown in Table 7, by converting the results to dependencies,⁶ our final model achieves the best results among transition-based parsers, and obtains results comparable to the state-of-the-art graph-based models.

⁶ The Penn2Malt tool is used with Chinese head rules: https://stp.lingfil.uu.se/nivre/research/Penn2Malt.html.

Model                            UAS   LAS
Dyer et al. (2016)               85.5  84.0
Ballesteros et al. (2016)        87.7  86.2
Kiperwasser and Goldberg (2016)  87.6  86.1
Cheng et al. (2016)†             88.1  85.7
Dozat and Manning (2017)†        89.3  88.2
In-order parser                  87.4  86.4
In-order parser -re              89.4  88.4

Table 7: Dependency accuracy (%) on the CTB test set. † means graph-based parsing. "-re" means supervised reranking.

6 Analysis

We analyze the results on Section 23 of the WSJ given by our model (i.e., the in-order parser) and the two baseline models (i.e., the bottom-up parser and the top-down parser) against sentence length, span length and constituent type, respectively.

6.1 Influence of sentence length

Figure 6 shows the F1 scores of the three parsers on sentences of different lengths. Compared to the top-down parser, the bottom-up parser performs better on shorter sentences, with lengths in the range [20, 40]. This is likely because the bottom-up parser takes advantage of rich local features from partially-built trees, which are useful for parsing short sentences. However, these local structures can be insufficient for parsing long sentences due to error propagation. On the other hand, the top-down parser obtains better results on long sentences, with lengths in the range [40, 50]. This is because, as the length of a sentence increases, lookahead features become rich and can be correctly represented by the LSTM, which is beneficial for parsing non-local structures. We find that the in-order parser performs the best for both short and long sentences, showing the advantages of integrating bottom-up and top-down information.

6.2 Influence of span length

Figure 7 shows the F1 scores of the three parsers on spans of different lengths. The performance trends of the two baseline parsers are similar. Compared to the baseline parsers, the in-order parser obtains a significant improvement on long spans.


                  NP     VP     S      PP     SBAR   ADVP   ADJP   WHNP   QP
Top-down parser   92.87  92.51  91.36  87.96  86.74  85.21  75.41  96.44  89.41
Bottom-up parser  93.01  92.20  91.46  87.95  86.81  84.58  74.84  94.99  89.95
In-order parser   93.23  92.83  91.87  88.97  88.05  86.30  76.62  96.75  92.16
Improvement       +0.22  +0.32  +0.41  +1.01  +1.04  +1.09  +1.21  +0.31  +2.01

Table 8: Comparison on different phrase types.

[Figure 6: F1 score against sentence length (the number of words in a sentence, in bins of size 10, where bin 20 contains sentences with lengths in [10, 20)).]

[Figure 7: F1 score against span length.]

Linguistically, this is because in-order traversal over a tree allows the constituent type of a span to be correctly projected based on information from the beginning (the leftmost nodes) of the span. The projected constituent then constrains the construction of the long span. This differs from the top-down parser, which generates the constituent type of a span without tracing the span itself.

6.3 Influence of constituent type

Table 8 shows the F1 scores of the three parsers on frequent constituent types. The bottom-up parser performs better than the top-down parser on constituent types including NP, S, SBAR and QP. We find that the prediction of these constituent types requires explicit modeling of bottom-up structures. In other words, bottom-up information is necessary to know whether a span can be, for example, a noun phrase (NP) or a sentence (S). On the other hand, the top-down parser has better performance on WHNP, which is likely because a WHNP starts with a certain question word, making the prediction easy without bottom-up information. The in-order parser performs the best on all constituent types, demonstrating that the in-order parser can benefit from both bottom-up and top-down information.

6.4 Examples

We give output examples from the test set to qualitatively compare the performance of the three parsers using the fully-supervised models without reranking, as shown in Table 9. For example, given Sentence #2066, the bottom-up and the in-order parsers both give correct results. However, the top-down parser makes an incorrect decision to generate an S, leading to subsequent incorrect decisions on a VP to complete the S. Sentence pattern ambiguity leads the top-down guidance to over-parse the sentence by recognizing the word "Plans" as a verb, while more bottom-up information is useful for the local disambiguation.

Given Sentence #308, the bottom-up parser prefers construction of local constituents such as "once producers and customers", ignoring the possible clause SBAR, which is, however, captured by the in-order parser. The parser projects a constituent SBAR from the word "stick" and continues to complete the clause, showing that top-down lookahead information is necessary for non-local disambiguation. The in-order parser gives the correct output for Sentence #2066 and Sentence #308, showing that it can benefit from both bottom-up and top-down information.


Sent #2066: Employee Benefit Plans Inc. –
Gold:      (NP Employee Benefit Plans Inc. –)
Top-down:  (S (NP Employee Benefit) (VP Plans (NP Inc.) –))
Bottom-up: (NP Employee Benefit Plans Inc. –)
In-order:  (NP Employee Benefit Plans Inc. –)

Sent #308: ... whether the new posted prices will stick once producers and customers start to haggle.
Gold:      ... (VP will (VP stick (SBAR once (S (NP producers and customers) (VP start (S ...)))))) ...
Top-down:  ... (VP will (VP stick (SBAR once (S (NP producers and customers) (VP start (S ...)))))) ...
Bottom-up: ... (VP will (VP stick (NP once producers and customers))) ... (VP start (S ...)) ...
In-order:  ... (VP will (VP stick (SBAR once (S (NP producers and customers) (VP start (S ...)))))) ...

Sent #1715: This has both made investors uneasy and the corporations more vulnerable.
Gold:      (S (NP This) (VP has (VP both made (S (S investors uneasy) and (S the corporations ...)))) .)
Top-down:  (S (S (NP This) (VP has (S (NP both) (VP made investors uneasy)))) and (S the corporations ...) .)
Bottom-up: (S (S (NP This) (VP has (S both (VP made investors uneasy)))) and (S the corporations ...) .)
In-order:  (S (NP This) (VP has both (VP made (S (S investors uneasy) and (S the corporations ...)))) .)

Table 9: Output examples of the three parsers on the English test set (incorrect constituents are marked in red in the original paper).

In Sentence #1715, there are coordinated objects such as "investors uneasy" and "the corporations more vulnerable". All three parsers can recognize the coordination. However, the top-down and the bottom-up parsers incorrectly recognize "This has both made investors uneasy" as a complete sentence. The top-down parser incorrectly generates an S at an early stage, leaving no choice but to follow this incorrect nonterminal. The bottom-up parser, without lookahead information, makes incorrect local decisions. By contrast, the in-order parser reads the word "and" and projects a nonterminal S for coordination after completing "(S investors uneasy)". On the other hand, the in-order parser is confused about whether to project the word "made" or the word "both" into a VP, which we think could be addressed by using an in-order system variant with k = 2, as described in Section 3.

7 Related work

Our work is related to left-corner parsing. Rosenkrantz and Lewis (1970) formalize left-corner parsing in automata theory, and it has appeared frequently in the compiler literature. Roark and Johnson (1999) apply the strategy to parsing. Typical works investigate transformations of syntactic trees based on left-corner rules (Roark, 2001; Schuler et al., 2010; Van Schijndel and Schuler, 2013). In contrast, we propose a novel general transition-based in-order constituent parsing system.

Neural networks have achieved the state of the art for parsing under various grammar formalisms, including dependency (Dozat and Manning, 2017), constituent (Dyer et al., 2016; Kuncoro et al., 2017), and CCG parsing (Xu, 2016; Lewis et al., 2016). Seminal work employs transition-based methods (Chen and Manning, 2014). This method has been extended by investigating more complex representations of configurations for constituent parsing (Watanabe and Sumita, 2015; Dyer et al., 2016). Dyer et al. (2016) employ stack-LSTMs on the top-down system, which is the same as our top-down parser. Watanabe and Sumita (2015) employ tree-LSTMs to model the complex representation of the stack in a bottom-up system. We are the first to investigate in-order traversal by designing a novel transition-based system under the same neural structured-model framework.

8 Conclusion

We proposed a novel psycholinguistically motivated constituent parsing system based on in-order traversal over syntactic trees, aiming to find a compromise between bottom-up constituent information and top-down lookahead information. On the standard WSJ benchmark, our in-order system outperforms bottom-up parsing on non-local ambiguities and top-down parsing on local decisions. The resulting parser achieves state-of-the-art constituent parsing results, obtaining 94.2 F1, and state-of-the-art dependency parsing results, obtaining 96.2% UAS and 95.2% LAS.


Acknowledgments

We thank the anonymous reviewers for their detailed and constructive comments, and the action editor Brian Roark. Yue Zhang is the corresponding author.

References

Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally normalized transition-based neural networks. In ACL, pages 2442–2452.

Miguel Ballesteros, Yoav Goldberg, Chris Dyer, and Noah A. Smith. 2016. Training with exploration improves a greedy stack-LSTM parser. In EMNLP, pages 2005–2010.

Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In ACL, pages 173–180.

Eugene Charniak. 2000. A maximum-entropy-inspired parser. In NAACL, pages 132–139.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In EMNLP, pages 740–750.

Hao Cheng, Hao Fang, Xiaodong He, Jianfeng Gao, and Li Deng. 2016. Bi-directional attention with agreement for dependency parsing. In EMNLP, pages 2204–2214.

Do Kook Choe and Eugene Charniak. 2016. Parsing as language modeling. In EMNLP, pages 2331–2336.

Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637.

James Cross and Liang Huang. 2016. Span-based constituency parsing with a structure-label system and provably optimal dynamic oracles. In EMNLP, pages 1–11.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In ICLR.

Greg Durrett and Dan Klein. 2015. Neural CRF parsing. In ACL, pages 302–312.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In ACL, pages 334–343.

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recurrent neural network grammars. In NAACL, pages 199–209.

Liang Huang. 2008. Forest reranking: Discriminative parsing with non-local features. In ACL, pages 586–594.

Mark Johnson. 1998. PCFG models of linguistic tree representations. Computational Linguistics, 24(4):613–632.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association for Computational Linguistics, 4:313–327.

Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, and Noah A. Smith. 2017. What do recurrent neural network grammars learn about syntax? In EACL, pages 1249–1258.

Mike Lewis, Kenton Lee, and Luke Zettlemoyer. 2016. LSTM CCG parsing. In NAACL, pages 221–231.

Jiangming Liu and Yue Zhang. 2017. Shift-reduce constituent parsing with neural lookahead features. Transactions of the Association for Computational Linguistics, 5:45–58.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Brian Roark and Mark Johnson. 1999. Efficient probabilistic top-down and left-corner parsing. In ACL, pages 421–428.

Brian Roark, Asaf Bachrach, Carlos Cardenas, and Christophe Pallier. 2009. Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing. In EMNLP, pages 324–333.

Brian Roark. 2001. Robust Probabilistic Predictive Syntactic Processing: Motivations, Models, and Applications. Ph.D. thesis.

Daniel J. Rosenkrantz and Philip M. Lewis. 1970. Deterministic left corner parsing. In IEEE Conference Record of the 11th Annual Symposium on Switching and Automata Theory, pages 139–152.

Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In IWPT, pages 125–132.

William Schuler, Samir AbdelRahman, Tim Miller, and Lane Schwartz. 2010. Broad-coverage parsing using human-like memory constraints. Computational Linguistics, 36(1):1–30.

Hiroyuki Shindo, Yusuke Miyao, Akinori Fujino, and Masaaki Nagata. 2012. Bayesian symbol-refined tree substitution grammars for syntactic parsing. In ACL, pages 440–448.

Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013. Parsing with compositional vector grammars. In ACL, pages 455–465.


Mark Steedman. 2000. The Syntactic Process, volume 24. MIT Press.

Marten Van Schijndel and William Schuler. 2013. An analysis of frequency- and memory-based processing costs. In HLT-NAACL, pages 95–105.

Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015. Grammar as a foreign language. In ICLR.

Zhiguo Wang and Nianwen Xue. 2014. Joint POS tagging and transition-based constituent parsing in Chinese with non-local features. In ACL, pages 733–742.

Zhiguo Wang, Haitao Mi, and Nianwen Xue. 2015. Feature optimization for constituent parsing via neural networks. In ACL-IJCNLP, pages 1138–1147.

Taro Watanabe and Eiichiro Sumita. 2015. Transition-based neural constituent parsing. In ACL, pages 1169–1179.

Wenduan Xu. 2016. LSTM shift-reduce CCG parsing. In EMNLP, pages 1754–1764.

Naiwen Xue, Fei Xia, Fu-dong Chiou, and Martha Palmer. 2005. The Penn Chinese Treebank: Phrase structure annotation of a large corpus. Natural Language Engineering, 11(2):207–238.

Yue Zhang and Stephen Clark. 2009. Transition-based parsing of the Chinese Treebank using a global discriminative model. In IWPT, pages 162–171.

Muhua Zhu, Yue Zhang, Wenliang Chen, Min Zhang, and Jingbo Zhu. 2013. Fast and accurate shift-reduce constituent parsing. In ACL, pages 434–443.