Transactions of the Association for Computational Linguistics, 1 (2013) 301–314. Action Editor: Jason Eisner.
Submitted 6/2012; Revised 10/2012; Published 7/2013. c
(cid:13)
2013 Association for Computational Linguistics.
Data-driven,PCFG-basedandPseudo-PCFG-basedModelsforChineseDependencyParsingWeiweiSunandXiaojunWanInstituteofComputerScienceandTechnology,PekingUniversityTheMOEKeyLaboratoryofComputationalLinguistics,PekingUniversity{ws,wanxiaojun}@pku.edu.cnAbstractWepresentacomparativestudyoftransition-,graph-andPCFG-basedmodelsaimedatil-luminatingmorepreciselythelikelycontri-butionofCFGsinimprovingChinesedepen-dencyparsingaccuracy,especiallybycom-biningheterogeneousmodels.Inspiredbytheimpactofaconstituencygrammaronde-pendencyparsing,weproposeseveralstrate-giestoacquirepseudoCFGsonlyfromde-pendencyannotations.Comparedtolinguisticgrammarslearnedfromrichphrase-structuretreebanks,welldesignedpseudogrammarsachievesimilarparsingaccuracyandhaveequivalentcontributionstoparserensemble.Moreover,pseudogrammarsincreasethedi-versityofbasemodels;therefore,togetherwithallothermodels,furtherimprovesys-temcombination.BasedonautomaticPOStagging,ourfinalmodelachievesaUASof87.23%,resultinginasignificantimprove-mentofthestateoftheart.1IntroductionPopularapproachestodependencyparsingcanbedividedintotwoclasses:grammar-freeandgrammar-based.Data-driven,grammar-freeap-proachesmakeessentialuseofmachinelearningfromlinguisticannotationsinordertoparsenewsentences.Suchapproaches,e.g.transition-based(Nivre,2008)andgraph-based(McDonald,2006;TorresMartinsetal.,2009)haveattractedthemostattentioninrecentyears.Incontrast,grammar-basedapproachesrelyonlinguisticgrammars(ineitherdependencyorconstituencyformalisms)toshapethesearchspaceforpossiblesyntacticanal-ysis.Inparticular,CFG-baseddependencyparsingexploitsamappingbetweendependencyandcon-stituencyrepresentationsandreusesparsingalgo-rithmsdevelopedforCFGtoproducedependencystructures.Inpreviouswork,data-driven,discrim-inativeapproacheshavebeenwidelydiscussedforChinesedependencyparsing.Ontheotherhand,variousPCFG-basedconstituentparsingmethodshavebeenappliedtoobtainphrase-structuresaswell.Withrichlinguisticrules,phrase-structuresofChinesesentencescanbewelltransformedtotheircorrespondingdependencystructures(Xue,2007).Therefore,PCFGparserswithsuchconversionrulescanbetakenasanothertypeofdependencyparser.WecallthemPCFG-basedparsers,inthispaper.Explicitlydefininglinguisticrulestoexpresspreciselygenericgrammaticalregularities,acon-stituencygrammarcanbeappliedtoarrangesen-tencesintoahierarchyofnestedphrases,whichde-terminesconstructionsbetweenlargerphrasesandtheirsmallercomponentphrases.Thistypeofinfor-mationisdifferentfrom,buthighlyrelatedto,theinformationcapturedbyadependencyrepresenta-tion.Aconstituencygrammar,thus,hasgreatpossi-blecontributionstodependencyparsing.Inordertopavethewayfornewandbettermethods,westudytheimpactofCFGsonChinesedependencyparsing.Aseriesofempiricalanalysisofstate-of-the-artgraph-,transition-andPCFG-basedparsersispresentedtoilluminatemorepreciselythepropertiesofheterogeneousmodels.WeshowthatCFGshaveagreatimpactondependencyparsingandPCFG-basedmodelshavecomplementarypredictivepow-erstodata-drivenmodels.Systemensembleisaneffectiveandimportanttechniquetobuildmoreaccurateparsersbasedonmultiple,diverse,weakermodels.Exploitingdiffer-
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
–
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
2
9
1
5
6
6
6
6
3
/
/
t
l
a
c
_
a
_
0
0
2
2
9
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
302
entdata-drivenmodels,e.g.transition-andgraph-basedmodels,hasreceivedthemostattentionindependencyparserensemble(NivreandMcDon-ald,2008;TorresMartinsetal.,2008;SagaeandLavie,2006).Onlyafewworksinvestigateinte-gratingdata-drivenandPCFG-basedmodels(Mc-Donald,2006).Wearguethatgrammarscansignif-icantlyincreasethediversityofbasemodels,whichplaysacentralroleinparserensemble,andthereforeleadtobetterandmorepromisinghybridsystems.Weintroduceageneralclassifierenhancingtech-nique,i.e.bootstrapaggregating(Bagging),toim-provedependencyparsingaccuracy.Thistechniquecanbeappliedtoenhanceasingle-viewparser,ortocombinemultipleheterogeneousparsers.Exper-imentsontheCoNLL09sharedtaskdatademon-strateitseffectiveness:(1)Baggingcanimprovein-dividualsingle-viewparsers,especiallythePCFG-basedone;(2)Baggingismoreeffectivethanpre-viouslyintroducedensemblemethodstocombinemulti-viewparsers;(3)Integratingdata-drivenandPCFG-basedmodelsismoreusefulthancombiningdifferentdata-drivenmodels.AlthoughPCFG-basedmodelshaveabigcon-tributiontodata-drivendependencyparsing,theyhaveaseriouslimitation:Therearenocorre-spondingconstituencyannotationsforsomedepen-dencytreebanks,e.g.ChineseDependencyTree-bank(LDC2012T05).Toovercomethislimita-tion,weproposeseveralstrategiestoacquirepseudogrammarsonlyfromdependencyannotations.Inparticular,dependencytreesareconvertedtopseudoconstituencytreesandPCFGscanbeextractedfromsuchtrees.Anothermotivationofthisstudyistoin-creasethediversityofcandidatemodelsforparserensemble.Experimentsshowthatpseudo-PCFG-basedmodelsareverycompetitive:(1)Pseudogrammarsachievesimilarorevenbetterparsingre-sultsthanlinguisticgrammarslearnedfromrichconstituencyannotations;(2)Comparedtolinguisticgrammars,welldesigned,single-viewpseudogram-marshaveanequivalentcontributiontoparseren-semble;(3)Combiningdifferentpseudogrammarsevenworkbetterforensemblethanlinguisticgram-mars;(4)Pseudo-PCFG-basedmodelsincreasethediversityofbasemodels,andthereforeleadtofur-therimprovementsforensemble.BasedonautomaticPOStagging,ourfinalmodelachievesaUASof87.23%ontheCoNLLdataand84.65%onCTB5,whichyieldrelativeerrorreduc-tionsof18-24%overthebestpublishedresultsintheliterature.2Backgroundandrelatedwork2.1Data-drivendependencyparsingThemainstreamworkonrecentdependencypars-ingfocusesondata-drivenapproachesthatautomat-icallylearntoproducedependencygraphsforsen-tencessolelyfromahand-crafteddependencytree-bank.Theadvantageofsuchmodelsisthattheyareeasilyportedtoanylanguageinwhichlabeledlinguisticresourcesexist.Practicallyallstatisti-calmodelsthathavebeenproposedinrecentyearscanbemainlydescribedaseithergraph-basedortransition-based(McDonaldandNivre,2007).BothmodelshavebeenadoptedtolearnChinesedepen-dencystructures(ZhangandClark,2011;ZhangandNivre,2011;HuangandSagae,2010;Hatorietal.,2011;Lietal.,2011,2012).Accordingtopublishedresults,graph-basedandtransition-basedparsersachievesimilaraccuracy.Inthegraph-basedframework,informativeevalu-ationresultshavebeenpresentedin(Lietal.,2011).First,secondandthirdorderprojectiveparsingmod-elsarewellevaluated.Inthetransition-basedframe-work,twoadvancedtechniqueshavebeenstud-ied.First,developingfeatureshasbeenshowncrucialtoadvancingparsingaccuracyandaveryrichfeaturesetiscarefullyevaluatedbyZhangandNivre(2011).Second,beyonddeterministicgreedysearch,principleddynamicprogrammingstrategiescanbeemployedtoexploremorepossiblehypothe-ses(HuangandSagae,2010).BothtechniqueshavebeenexaminedandshownhelpfulforChinesede-pendencyparsing.Furthermore,Hatorietal.(2011)combinedbothandobtainedastate-of-the-artsuper-visedparsingresult.2.2PCFG-baseddependencyparsingPCFG-baseddependencyparsingapproachesarebasedonthefindingthatprojectivedependencytreescanbetransformedfromconstituencytreesbyap-plyingrichlinguisticrules.Insuchapproaches,de-pendencyparsingcanberesolvedbyatwo-steppro-cess:constituentparsingandrule-basedextraction
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
–
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
2
9
1
5
6
6
6
6
3
/
/
t
l
a
c
_
a
_
0
0
2
2
9
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
303
ofdependenciesfromphrasestructures.Thead-vantageofconstituency-grammar-basedapproachisthatallthewell-studiedparsingmethodsforsuchgrammarscanbeusedfordependencyparsingaswell.Twolanguage-specificpropertiesessentiallymakePCFG-basedapproacheseasytobeappliedtoChinesedependencyparsing:(1)Chinesegram-maticiansfavorusingprojectivestructures;1(2)Chi-nesephrase-structureannotationsnormallycontainricherinformationandthusarereliablefortreecon-version.2.2.1ConstituencyparsingComparedtomanyotherlanguages,statisticalconstituentparsingforChinesehasreachedearlysuccess,duetothefactthatthelanguagehasrela-tivelyfixedwordorderandextremelypoorinflec-tionalmorphology.BothfactsallowPCFG-basedstatisticalmodelingtoperformwell.Forthecon-stituentparsing,themajorityofthestate-of-the-artparsersarebasedongenerativePCFGlearn-ing.Forexample,thewell-knownandsuccess-fulCollinsandCharniak&Johnsonparsers(Collins,2003;Charniak,2000;CharniakandJohnson,2005)implementgenerativelexicalizedstatisticalmodels.ApartfromlexicalizedPCFGparsing,unlex-icalizedparsingwithlatentvariablegrammars(PCFGLA)canalsoproducecomparableaccuracy(Matsuzakietal.,2005;Petrovetal.,2006).Latentvariablegrammarsmodelanobservedtreebankofcoarseparsetreeswithamodelovermorerefined,butunobserved,derivationtreesthatrepresentmuchmorecomplexsyntacticprocesses.Ratherthanattemptingtomanuallyspecifyfine-grainedcate-gories,previousworkshowsthatautomaticallyin-ducingthesub-categoriesfromdatacanworkquitewell.APCFGLAparserleveragesonanautomaticproceduretolearnrefinedgrammarsandarethere-foremorerobusttoparsenon-Englishlanguagesthatarenotwellstudied.ForChinese,suchaparserachievesthestate-of-the-artperformanceandde-featsmanyothertypesofparsers,includingCollinsaswellasCharniakparser(Cheetal.,2012)and1Forexample,astwopopulardependencytreebanks,theCoNLL2009dataandtheChineseDependencyTreebankbothexcluedenon-projectiveannotations.Itisworthnotingthattheformeroneisconvertedfromaconstituencytreebankwhilethelatteroneisdirectlyannotatedbylingusitics.discriminativetransition-basedmodels(ZhangandClark,2009).2.2.2CStoDSconversionIntheabsenceofdependencyandconstituencystructuresforaparticulartreebank,treebank-guidedparserdevelopersnormallyapplyrichlinguisticrulestoconvertonerepresentationformalismtoan-othertogetnecessarydatatotrainparsers.Xue(2007)examinesthelinguisticadequacyofdepen-dencystructureannotationautomaticallyconvertedfromphrasestructuretreebankswithrule-basedap-proaches.Astructuralapproachisintroducedfortheconstituencystructure(CS)todependencystruc-ture(DS)conversionfortheChineseTreebankdata,whichisthebasisoftheCoNLL2009sharedtaskdata.Byapplyingthisconversionprocedureontheoutputsofanautomaticphrasestructureparser,wecanbuildaPCFG-baseddependencyparser.2.3ParserensembleNLPsystemsbuiltonparticularsingleviewsnor-mallycapturedifferentpropertiesofanoriginalproblem,andthereforedifferinpredictivepowers.Asaresult,NLPsystemscantakeadvantageofcom-plementarystrengthsofmultipleviews.Combiningtheoutputsofseveralsystemshasbeenshowninthepasttoimproveparsingperformancesignificantly,includingintegratingphrase-structureparsers(Hen-dersonandBrill,1999),dependencyparsers(NivreandMcDonald,2008),orboth(McDonald,2006).Severalensemblemodelshavebeenproposedfortheparsingofsyntacticconstituentsanddependen-cies,includinglearning-basedstacking(NivreandMcDonald,2008;TorresMartinsetal.,2008)andlearning-freepost-inference(HendersonandBrill,1999;SagaeandLavie,2006).SurdeanuandMan-ning(2010)presentasystematicanalysisoftheseensemblemethodsandfindseveralnon-obviousfacts:•thediversityofbaseparsersismoreimportantthancomplexmodelsforlearning,and•simplestscoringmodelforvotingandrepars-ingperformsessentiallyaswellasothermorecomplexmodels.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
–
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
2
9
1
5
6
6
6
6
3
/
/
t
l
a
c
_
a
_
0
0
2
2
9
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
304
3AcomparativeanalysisofheterogeneousparsersTheinformationencodedinadependencyrepre-sentationisdifferentfromtheinformationcapturedinaconstituencyrepresentation.Whilethedepen-dencystructurerepresentshead-dependentrelationsbetweenwords,theconstituencystructurerepre-sentsthegroupingofwordsintophrases,classifiedbystructuralcategories.Thesedifferencesconcernwhatisexplicitlyencodedintherespectiverepresen-tations,andaffectsdata-drivenandPCFG-basedde-pendencyparsingmodelssubstantially.Inthissec-tion,wegiveacomparativeanalysisoftransition-,graph-andPCFG-basedmodelsaimedatilluminat-ingmorepreciselythelikelycontributionofCFGsindependencyparsing.3.1ExperimentalsetupPennChineseTreeBank(CTB)isasegmented,POStagged,andfullybracketedcorpusinthecon-stituencyformalism,andverypopulartoevaluatefundamentalNLPtasks,includingwordsegmenta-tion,POStagging,constituentparsingaswellasde-pendencyparsing.WeuseCTB6asourmaincorpusanddefinethetraining,developmentandtestsetsac-cordingtotheCoNLL2009sharedtask.Toevaluateandanalyzedependencyparsers,wedirectlyusetheCoNLLdata.CTB’ssyntacticannotationsalsoin-cludesfunctionalinformationandemptycategories.Modernparsers,e.g.CollinsandBerkeleyparsers,ignorethesetypesoflinguisticknowledge.Totrainaconstituentparser,weperformaheuristicproce-dureonthetreebankdatatodeletefunctiontagsandemptycategoriesaswellasitsassociatedredundantancestors.ManypapersreportedparsingresultsofanolderversionCTB(namelyCTB5).Tocomparewithsystemsintroducedinthesepapers,weevaluateourfinalensemblemodelonCTB5inSection5.4.Fordependencyparsing,wechooseasecondordergraph-basedparser2(Bohnet,2010)andatransition-basedparser(Hatorietal.,2011),forexperiments.Forconstituentparsing,wechooseBerkeleyparser,3awellknownimplementationoftheunlexicalizedPCFGLAmodelandBikelparser,42code.google.com/p/mate-tools/3code.google.com/p/berkeleyparser/4cis.upenn.edu/˜dbikel/software.htmlawellknownimplementationofCollins’lexical-izedmodel,forexperiments.Indata-drivenpars-ing,featuresconsistingofPOStagsareveryeffec-tive,sotypicallyPOStaggingisperformedasapre-processing.Weusethebaselinesequentialtaggerdescribedin(SunandUszkoreit,2012)toprovidesuchlexicalinformationtothegraph-basedparser.Notethatthetransition-basedparserperformsajointinferencetoacquirePOSanddependencyinforma-tionsimultaneously,sothereisnoneedtoofferextrataggingresultstoit.3.2OverallperformanceTable1(Column2-6)summarizestheoverallaccu-racyofdifferentparsers.Twotransition-basedpars-ingresultsarepresented:Thefirstoneemployasimplefeatureset(ZhangandClark,2008)andasmallbeam(16);thesecondoneemployrichfea-tures(ZhangandNivre,2011)andalargerbeam(32).Twograph-basedparsingresultsarereported;thedifferencebetweenthemiswhetherintegratere-lationlabelsintotheparsingprocedure.Roughlyspeaking,currentlystate-of-the-artdata-drivenmod-elsachievesslightlybetterprecisionthanunlexical-izedPCFG-basedmodelswithregardtounlabeleddependencyprediction.Thereisabiggapbetweenlexicalizedandunlexi-calizedparsing.Thesamephenomenonhasbeenob-servedby(Cheetal.,2012)and(ZhuangandZong,2010).Inadditiontodependencyparsing,ZhuangandZong(2010)foundthatBerkeleyparserpro-ducemuchmoreaccuratesyntacticanalysestoassistaChinesesemanticrolelabelerthanBikelparser.CharniakandStanfordparsersaretwootherwell-knownandfrequentlyusedtoolsthatcanprovidelexicalizedparsingresults.Accordingto(Cheetal.,2012),theyperformevenworsethanBikelparser,atleastforStanforddependencies.Duetothepoorparsingperformance,weonlyconcentrateontheun-lexicalizedmodelintheremainderofthispaper.Theperformanceoflabeleddependencypredic-tionoftheunlexicalizedPCFG-basedparserismuchlower.WecanlearnthattheCStoDSconversionisnotrobusttoassignfunctionalcategoriestodepen-denciesandsimplelinguisticrulesarenotcapabletodofine-grainedclassification.PreviousresearchonEnglishindicatesthatthemaindifficultyinde-pendencyparsingisthepredictionofdependency
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
–
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
2
9
1
5
6
6
6
6
3
/
/
t
l
a
c
_
a
_
0
0
2
2
9
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
305
Devel.UASLASCompl.FsibFgrdTran[b=16,Z08]82.80N/A29.0066.5579.74Tran[b=32,Z11]83.80N/A31.6168.5880.87Graph[-lab]83.66N/A29.2867.9680.82Graph[+lab]84.2480.5530.9969.1181.38Unlex82.8667.4427.9869.0781.22Lex70.3858.10——Bagging(15)Tran[b=16,Z08]83.25N/A28.6667.1778.89Tran[b=32,Z11]84.25N/A31.2169.1481.49Graph[-lab]83.81N/A29.6868.0080.62Graph[+lab]84.50N/A31.4469.4881.10Unlex84.92N/A32.3571.0883.66Bagging(8)Unlex84.35N/A31.1670.4983.57Table1:Accuracyofdifferentparsers.Thefirstblockpresentsbaselineparsers;thelasttwoblockspresentBagging-enhancedparsers,wheremisrespectivelysetto15and8.Z08andZ11distinguishdifferentfeaturesets;b=16andb=32arebeamsizes.+/-labmeanswhethertoincorporaterelationlabelstoamodel.structures,andanextrastatisticalclassifiercanbeemployedtolabelautomaticallyrecognizeddepen-dencieswithahighaccuracy.AlthoughthisissueisnotwellstudiedforChinesedependencyparsing,previousresearchonfunctiontaglabeling(SunandSui,2009)andsemanticrolelabeling(Sun,2010a)givesussomeclues.Theirresearchshowsthatbothfunctionalandpredicate-argumentstructuralinfor-mationisrelativelyeasytopredictifhigh-qualitysyntacticparsesareavailable.WemainlyfocusontheUASmetricinthefollowingexperiments.3.3ConstraintsAgrammar-basedmodelutilizesanexplicitlyde-finedformalgrammartoshapethesearchspaceforpossiblesyntactichypotheses.Parametersofasta-tisticalgrammar-basedmodelarerelatedtoagram-marrule,andasaresultspecificlanguageconstruc-tionsareconstrainedbyeachother.Forexample,parametersareassignedtorewriterulesforaCFG-basedmodel.SincethePCFG-basedmodellever-agesrewriterulestolocallyconstrainseveralpossi-bledependentsforoneheadword,itdoesrelativelybetterforlocallyconnecteddependencies.Thetra-ditionalevaluationmetrics,i.e.UASandLAS,onlyconsiderbi-lexical(firstorder)dependencies,whicharesmallestpiecesofadependencystructure.Be-sidesbi-lexicaldependencies,wereportthepredic-tionaccuracyofgrandparentandsiblingdependen-cies,i.e.secondorderdependencies.Themetricsaredefinedasfollows.•Foreveryworddwhoseparentisnottheroot,weconsiderthewordtriplehd,p,giamongdanditsparentpandgrandparentg.Awordtriplehd,p,gifromapredictedtreeisconsid-eredascorrectifitalsoapprearsinthecorre-spondinggoldtree.Basedonthisdefinition,precison,recallandf-scoreofgrandparentde-pendencycanbedefinedinanormalsense.Allpunctuationsareexcludedforcalculation.•Foreverywordhthatgovernsatleasttwochil-dren(d1,…,dn),weconsidereverywordtriplehh,di,di+1i,amonghanditssiblingdepen-dentsdiaswellasdi+1(0≤i