Transactions of the Association for Computational Linguistics, 1 (2013) 139–150. Action Editor: Joakim Nivre.
Submitted 12/2012; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.
Efficient Stacked Dependency Parsing by Forest Reranking

Katsuhiko Hayashi and Shuhei Kondo and Yuji Matsumoto
Graduate School of Information Science
Nara Institute of Science and Technology
8916-5, Takayama, Ikoma, Nara 630-0192, Japan
{katsuhiko-h,shuhei-k,matsu}@is.naist.jp

Abstract

This paper proposes a discriminative forest reranking algorithm for dependency parsing that can be seen as a form of efficient stacked parsing. A dynamic programming shift-reduce parser produces a packed derivation forest, which is then scored by a discriminative reranker, using the 1-best tree output by the shift-reduce parser as guide features in addition to third-order graph-based features. To improve efficiency and accuracy, this paper also proposes a novel shift-reduce parser that eliminates the spurious ambiguity of arc-standard transition systems. In tests on the English Penn Treebank data, forest reranking gave a state-of-the-art unlabeled dependency accuracy of 93.12.

1 Introduction

There are two main approaches to data-driven dependency parsing: one is graph-based and the other is transition-based.

In the graph-based approach, global optimization algorithms find the highest-scoring tree with locally factored models (McDonald et al., 2005). While third-order graph-based models achieve state-of-the-art accuracy, they have $O(n^4)$ time complexity for a sentence of length $n$. Recently, some pruning techniques have been proposed to improve the efficiency of third-order models (Rush and Petrov, 2012; Zhang and McDonald, 2012).

The transition-based approach usually employs the shift-reduce parsing algorithm with linear-time complexity (Nivre, 2008). It greedily chooses the transition with the highest score, so the resulting transition sequence is not always globally optimal. The beam search algorithm improves parsing flexibility in deterministic parsing (Zhang and Clark, 2008; Zhang and Nivre, 2011), and dynamic programming makes beam search more efficient (Huang and Sagae, 2010).

There is also an alternative approach that integrates graph-based and transition-based models (Sagae and Lavie, 2006; Zhang and Clark, 2008; Nivre and McDonald, 2008; Martins et al., 2008). Martins et al. (2008) formulated their approach as stacking of parsers, where the output of the first-stage parser is provided to the second as guide features. In particular, they used a transition-based parser for the first stage and a graph-based parser for the second stage. The main drawback of this approach is that the efficiency of the transition-based parser is sacrificed because the second stage employs full parsing.

This paper proposes an efficient stacked parsing method through discriminative reranking with higher-order graph-based features, which works on the forests output by the first-stage dynamic programming shift-reduce parser and integrates non-local features efficiently with cube-pruning (Huang and Chiang, 2007). The advantages of our method are as follows:

• Unlike the conventional stacking approach, the first-stage shift-reduce parser prunes the search space of the second-stage graph-based parser.

• In addition to guide features, the second-stage graph-based parser can employ the scores of the first-stage parser which cannot be incorpo-
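\[
\begin{array}{ll}
\text{axiom}(c_0): & 0 : (0, 1, w_0) : \emptyset \\[4pt]
\text{goal}(c_{2n}): & 2n : (0, n, s_0) : \emptyset \\[4pt]
\text{shift}: & \dfrac{\overbrace{\ell : (\_, j, s_d | s_{d-1} | \ldots | s_1 | s_0) : \_}^{\text{state } p}}{\ell + 1 : (j, j+1, s_{d-1} | s_{d-2} | \ldots | s_0 | w_j) : (p)}
\end{array}
\]
i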