Transactions of the Association for Computational Linguistics, 1 (2013) 301–314. Action Editor: Jason Eisner.

Submitted 6/2012; Revised 10/2012; Published 7/2013. © 2013 Association for Computational Linguistics.

Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing

Weiwei Sun and Xiaojun Wan
Institute of Computer Science and Technology, Peking University
The MOE Key Laboratory of Computational Linguistics, Peking University
{ws,wanxiaojun}@pku.edu.cn

Abstract

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations. Compared to linguistic grammars learned from rich phrase-structure treebanks, well-designed pseudo grammars achieve similar parsing accuracy and have equivalent contributions to parser ensemble. Moreover, pseudo grammars increase the diversity of base models and therefore, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, resulting in a significant improvement of the state of the art.

1 Introduction

Popular approaches to dependency parsing can be divided into two classes: grammar-free and grammar-based. Data-driven, grammar-free approaches make essential use of machine learning from linguistic annotations in order to parse new sentences. Such approaches, e.g. transition-based (Nivre, 2008) and graph-based (McDonald, 2006; Torres Martins et al., 2009), have attracted the most attention in recent years. In contrast, grammar-based approaches rely on linguistic grammars (in either dependency or constituency formalisms) to shape the search space for possible syntactic analyses. In particular, CFG-based dependency parsing exploits a mapping between dependency and constituency representations and reuses parsing algorithms developed for CFGs to produce dependency structures. In previous work, data-driven, discriminative approaches have been widely discussed for Chinese dependency parsing. On the other hand, various PCFG-based constituent parsing methods have been applied to obtain phrase structures as well. With rich linguistic rules, phrase structures of Chinese sentences can be well transformed into their corresponding dependency structures (Xue, 2007). Therefore, PCFG parsers equipped with such conversion rules can be taken as another type of dependency parser. We call them PCFG-based parsers in this paper.

By explicitly defining linguistic rules that express generic grammatical regularities, a constituency grammar can be applied to arrange sentences into a hierarchy of nested phrases, which determines the constructions between larger phrases and their smaller component phrases. This type of information is different from, but highly related to, the information captured by a dependency representation. A constituency grammar thus has great potential contributions to dependency parsing. In order to pave the way for new and better methods, we study the impact of CFGs on Chinese dependency parsing. A series of empirical analyses of state-of-the-art graph-, transition- and PCFG-based parsers is presented to illuminate more precisely the properties of heterogeneous models. We show that CFGs have a great impact on dependency parsing and that PCFG-based models have complementary predictive powers to data-driven models.

System ensemble is an effective and important technique to build more accurate parsers based on multiple, diverse, weaker models. Exploiting different data-driven models, e.g. transition- and graph-based models, has received the most attention in dependency parser ensemble (Nivre and McDonald, 2008; Torres Martins et al., 2008; Sagae and Lavie, 2006).

Only a few works investigate integrating data-driven and PCFG-based models (McDonald, 2006). We argue that grammars can significantly increase the diversity of base models, which plays a central role in parser ensemble, and therefore lead to better and more promising hybrid systems.

We introduce a general classifier enhancing technique, bootstrap aggregating (Bagging), to improve dependency parsing accuracy. This technique can be applied to enhance a single-view parser, or to combine multiple heterogeneous parsers. Experiments on the CoNLL 2009 shared task data demonstrate its effectiveness: (1) Bagging can improve individual single-view parsers, especially the PCFG-based one; (2) Bagging is more effective than previously introduced ensemble methods for combining multi-view parsers; (3) Integrating data-driven and PCFG-based models is more useful than combining different data-driven models.

Although PCFG-based models make a big contribution to data-driven dependency parsing, they have a serious limitation: there are no corresponding constituency annotations for some dependency treebanks, e.g. the Chinese Dependency Treebank (LDC2012T05). To overcome this limitation, we propose several strategies to acquire pseudo grammars only from dependency annotations. In particular, dependency trees are converted to pseudo constituency trees, and PCFGs can be extracted from such trees. Another motivation of this study is to increase the diversity of candidate models for parser ensemble. Experiments show that pseudo-PCFG-based models are very competitive: (1) Pseudo grammars achieve similar or even better parsing results than linguistic grammars learned from rich constituency annotations; (2) Compared to linguistic grammars, well-designed, single-view pseudo grammars have an equivalent contribution to parser ensemble; (3) Combining different pseudo grammars works even better for ensemble than linguistic grammars; (4) Pseudo-PCFG-based models increase the diversity of base models, and therefore lead to further improvements for ensemble.

Based on automatic POS tagging, our final model achieves a UAS of 87.23% on the CoNLL data and 84.65% on CTB5, which yield relative error reductions of 18-24% over the best published results in the literature.

2 Background and related work

2.1 Data-driven dependency parsing

The mainstream of recent work on dependency parsing focuses on data-driven approaches that automatically learn to produce dependency graphs for sentences solely from a hand-crafted dependency treebank. The advantage of such models is that they are easily ported to any language in which labeled linguistic resources exist. Practically all statistical models that have been proposed in recent years can be mainly described as either graph-based or transition-based (McDonald and Nivre, 2007). Both models have been adopted to learn Chinese dependency structures (Zhang and Clark, 2011; Zhang and Nivre, 2011; Huang and Sagae, 2010; Hatori et al., 2011; Li et al., 2011, 2012). According to published results, graph-based and transition-based parsers achieve similar accuracy.

In the graph-based framework, informative evaluation results have been presented in (Li et al., 2011), where first, second and third order projective parsing models are well evaluated. In the transition-based framework, two advanced techniques have been studied. First, developing features has been shown crucial to advancing parsing accuracy, and a very rich feature set is carefully evaluated by Zhang and Nivre (2011). Second, beyond deterministic greedy search, principled dynamic programming strategies can be employed to explore more possible hypotheses (Huang and Sagae, 2010). Both techniques have been examined and shown helpful for Chinese dependency parsing. Furthermore, Hatori et al. (2011) combined both and obtained a state-of-the-art supervised parsing result.

2.2 PCFG-based dependency parsing

PCFG-based dependency parsing approaches are based on the finding that projective dependency trees can be derived from constituency trees by applying rich linguistic rules. In such approaches, dependency parsing can be resolved by a two-step process: constituent parsing and rule-based extraction of dependencies from phrase structures.

yo

D
oh
w
norte
oh
a
d
mi
d

F
r
oh
metro
h

t
t

pag

:
/
/

d
i
r
mi
C
t
.

metro

i
t
.

mi
d
tu

/
t

a
C
yo
/

yo

a
r
t
i
C
mi

pag
d

F
/

d
oh

i
/

.

1
0
1
1
6
2

/
t

yo

a
C
_
a
_
0
0
2
2
9
1
5
6
6
6
6
3

/

/
t

yo

a
C
_
a
_
0
0
2
2
9
pag
d

.

F

b
y
gramo
tu
mi
s
t

t

oh
norte
0
8
S
mi
pag
mi
metro
b
mi
r
2
0
2
3

303

The advantage of a constituency-grammar-based approach is that all the well-studied parsing methods for such grammars can be used for dependency parsing as well. Two language-specific properties essentially make PCFG-based approaches easy to apply to Chinese dependency parsing: (1) Chinese grammarians favor using projective structures;¹ (2) Chinese phrase-structure annotations normally contain richer information and thus are reliable for tree conversion.

¹ For example, as two popular dependency treebanks, the CoNLL 2009 data and the Chinese Dependency Treebank both exclude non-projective annotations. It is worth noting that the former is converted from a constituency treebank while the latter is directly annotated by linguists.

2.2.1 Constituency parsing

Compared to many other languages, statistical constituent parsing for Chinese reached early success, because the language has relatively fixed word order and extremely poor inflectional morphology. Both facts allow PCFG-based statistical modeling to perform well. For constituent parsing, the majority of state-of-the-art parsers are based on generative PCFG learning. For example, the well-known and successful Collins and Charniak & Johnson parsers (Collins, 2003; Charniak, 2000; Charniak and Johnson, 2005) implement generative lexicalized statistical models. Apart from lexicalized PCFG parsing, unlexicalized parsing with latent variable grammars (PCFGLA) can also produce comparable accuracy (Matsuzaki et al., 2005; Petrov et al., 2006). Latent variable grammars model an observed treebank of coarse parse trees with a model over more refined, but unobserved, derivation trees that represent much more complex syntactic processes. Rather than attempting to manually specify fine-grained categories, previous work shows that automatically inducing the sub-categories from data can work quite well. A PCFGLA parser leverages an automatic procedure to learn refined grammars and is therefore more robust for parsing non-English languages that are not well studied. For Chinese, such a parser achieves state-of-the-art performance and defeats many other types of parsers, including the Collins and Charniak parsers (Che et al., 2012) and discriminative transition-based models (Zhang and Clark, 2009).

2.2.2 CS to DS conversion

When a treebank lacks either dependency or constituency structures, treebank-guided parser developers normally apply rich linguistic rules to convert one representation formalism into the other to obtain the data needed to train parsers. Xue (2007) examines the linguistic adequacy of dependency structure annotation automatically converted from phrase structure treebanks with rule-based approaches. A structural approach is introduced for the constituency structure (CS) to dependency structure (DS) conversion of the Chinese Treebank data, which is the basis of the CoNLL 2009 shared task data. By applying this conversion procedure to the outputs of an automatic phrase structure parser, we can build a PCFG-based dependency parser.
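To make the CS-to-DS step concrete, the sketch below shows the usual head-percolation mechanics: pick a head daughter for every constituent from a head table, then attach every non-head daughter's lexical head to the head daughter's lexical head. The head table and category names are illustrative only; they are not the rules of Xue (2007) or of the CoNLL 2009 conversion.

```python
# Minimal sketch of rule-based CS-to-DS conversion by head percolation.
# The HEAD_RULES table is a toy illustration, not the published rule set.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                    # phrasal category, or POS tag for leaves
    children: list = field(default_factory=list)
    word: str = None              # set only for leaves

# search direction and preferred head categories, tried in order
HEAD_RULES = {
    "VP": ("left",  ["VV", "VC", "VE", "VP"]),
    "NP": ("right", ["NN", "NR", "NT", "PN", "NP"]),
    "IP": ("left",  ["VP", "IP"]),
}

def head_child(node):
    """Pick the head daughter of a constituent according to the head table."""
    direction, priorities = HEAD_RULES.get(node.label, ("left", []))
    kids = node.children if direction == "left" else list(reversed(node.children))
    for cat in priorities:
        for kid in kids:
            if kid.label == cat:
                return kid
    return kids[0]                # fallback: first daughter in search order

def extract_dependencies(root):
    """Return (dependent, head) index pairs; the sentence head attaches to 0.
    Word indices are 1-based in surface order."""
    arcs, counter = [], [0]

    def visit(node):              # returns the lexical head index of the subtree
        if node.word is not None:
            counter[0] += 1
            return counter[0]
        child_heads = [visit(kid) for kid in node.children]
        hk = head_child(node)
        pos = next(i for i, kid in enumerate(node.children) if kid is hk)
        for idx in child_heads:
            if idx != child_heads[pos]:
                arcs.append((idx, child_heads[pos]))   # non-heads depend on the head
        return child_heads[pos]

    arcs.append((visit(root), 0))
    return sorted(arcs)
```

Running such a converter over the automatic output of a constituent parser yields the PCFG-based dependency parser described above; the quality of the result depends entirely on how fine-grained the head rules are.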
2.3 Parser ensemble

NLP systems built on particular single views normally capture different properties of an original problem, and therefore differ in predictive power. As a result, NLP systems can take advantage of the complementary strengths of multiple views. Combining the outputs of several systems has been shown in the past to improve parsing performance significantly, including integrating phrase-structure parsers (Henderson and Brill, 1999), dependency parsers (Nivre and McDonald, 2008), or both (McDonald, 2006). Several ensemble models have been proposed for the parsing of syntactic constituents and dependencies, including learning-based stacking (Nivre and McDonald, 2008; Torres Martins et al., 2008) and learning-free post-inference (Henderson and Brill, 1999; Sagae and Lavie, 2006). Surdeanu and Manning (2010) present a systematic analysis of these ensemble methods and find several non-obvious facts:

• the diversity of base parsers is more important than complex models for learning, and
• the simplest scoring model for voting and reparsing performs essentially as well as other more complex models.

3 A comparative analysis of heterogeneous parsers

The information encoded in a dependency representation is different from the information captured in a constituency representation. While the dependency structure represents head-dependent relations between words, the constituency structure represents the grouping of words into phrases, classified by structural categories. These differences concern what is explicitly encoded in the respective representations, and affect data-driven and PCFG-based dependency parsing models substantially. In this section, we give a comparative analysis of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in dependency parsing.

3.1 Experimental setup

The Penn Chinese TreeBank (CTB) is a segmented, POS tagged, and fully bracketed corpus in the constituency formalism, and is very popular for evaluating fundamental NLP tasks, including word segmentation, POS tagging, constituent parsing as well as dependency parsing. We use CTB 6 as our main corpus and define the training, development and test sets according to the CoNLL 2009 shared task. To evaluate and analyze dependency parsers, we directly use the CoNLL data. CTB's syntactic annotations also include functional information and empty categories. Modern parsers, e.g. the Collins and Berkeley parsers, ignore these types of linguistic knowledge. To train a constituent parser, we perform a heuristic procedure on the treebank data to delete function tags and empty categories as well as their associated redundant ancestors. Many papers report parsing results on an older version of CTB (namely CTB 5). To compare with the systems introduced in these papers, we evaluate our final ensemble model on CTB 5 in Section 5.4.

For dependency parsing, we choose a second order graph-based parser² (Bohnet, 2010) and a transition-based parser (Hatori et al., 2011) for our experiments. For constituent parsing, we choose the Berkeley parser,³ a well-known implementation of the unlexicalized PCFGLA model, and the Bikel parser,⁴ a well-known implementation of Collins' lexicalized model. In data-driven parsing, features consisting of POS tags are very effective, so POS tagging is typically performed as a pre-processing step. We use the baseline sequential tagger described in (Sun and Uszkoreit, 2012) to provide such lexical information to the graph-based parser. Note that the transition-based parser performs joint inference to acquire POS and dependency information simultaneously, so there is no need to provide extra tagging results to it.

² code.google.com/p/mate-tools/
³ code.google.com/p/berkeleyparser/
⁴ cis.upenn.edu/~dbikel/software.html

3.2 Overall performance

Table 1 (Columns 2-6) summarizes the overall accuracy of the different parsers. Two transition-based parsing results are presented: the first one employs a simple feature set (Zhang and Clark, 2008) and a small beam (16); the second one employs rich features (Zhang and Nivre, 2011) and a larger beam (32). Two graph-based parsing results are reported; the difference between them is whether relation labels are integrated into the parsing procedure. Roughly speaking, current state-of-the-art data-driven models achieve slightly better precision than unlexicalized PCFG-based models with regard to unlabeled dependency prediction.

Devel.            UAS    LAS    Compl.  Fsib   Fgrd
Tran[b=16,Z08]    82.80  N/A    29.00   66.55  79.74
Tran[b=32,Z11]    83.80  N/A    31.61   68.58  80.87
Graph[-lab]       83.66  N/A    29.28   67.96  80.82
Graph[+lab]       84.24  80.55  30.99   69.11  81.38
Unlex             82.86  67.44  27.98   69.07  81.22
Lex               70.38  58.10  –       –      –
Bagging(15)
Tran[b=16,Z08]    83.25  N/A    28.66   67.17  78.89
Tran[b=32,Z11]    84.25  N/A    31.21   69.14  81.49
Graph[-lab]       83.81  N/A    29.68   68.00  80.62
Graph[+lab]       84.50  N/A    31.44   69.48  81.10
Unlex             84.92  N/A    32.35   71.08  83.66
Bagging(8)
Unlex             84.35  N/A    31.16   70.49  83.57

Table 1: Accuracy of different parsers. The first block presents baseline parsers; the last two blocks present Bagging-enhanced parsers, where m is respectively set to 15 and 8. Z08 and Z11 distinguish different feature sets; b=16 and b=32 are beam sizes. +/-lab indicates whether relation labels are incorporated into a model.

There is a big gap between lexicalized and unlexicalized parsing. The same phenomenon has been observed by (Che et al., 2012) and (Zhuang and Zong, 2010). In addition to dependency parsing, Zhuang and Zong (2010) found that the Berkeley parser produces much more accurate syntactic analyses to assist a Chinese semantic role labeler than the Bikel parser. The Charniak and Stanford parsers are two other well-known and frequently used tools that can provide lexicalized parsing results. According to (Che et al., 2012), they perform even worse than the Bikel parser, at least for Stanford dependencies. Due to the poor parsing performance, we only concentrate on the unlexicalized model in the remainder of this paper.

The performance of labeled dependency prediction of the unlexicalized PCFG-based parser is much lower. We can learn from this that the CS to DS conversion is not robust for assigning functional categories to dependencies, and that simple linguistic rules are not capable of doing fine-grained classification. Previous research on English indicates that the main difficulty in dependency parsing is the prediction of dependency structures, and that an extra statistical classifier can be employed to label automatically recognized dependencies with high accuracy. Although this issue is not well studied for Chinese dependency parsing, previous research on function tag labeling (Sun and Sui, 2009) and semantic role labeling (Sun, 2010a) gives us some clues. Their research shows that both functional and predicate-argument structural information is relatively easy to predict if high-quality syntactic parses are available. We mainly focus on the UAS metric in the following experiments.
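For readers less familiar with the word-level metrics in Table 1, the following is a minimal sketch of how UAS, LAS and complete match are typically computed. The function name and the punctuation filter (the CTB tag PU) are our own simplification; the paper does not spell out its exact filtering convention for these scores.

```python
def attachment_scores(gold, pred, punct_tags=frozenset({"PU"})):
    """gold/pred: lists of sentences; each sentence is a list of
    (head_index, relation_label, pos_tag) triples, one per word.
    Returns (UAS, LAS, unlabeled complete-match rate) in percent."""
    correct_head = correct_arc = total = complete = 0
    for g_sent, p_sent in zip(gold, pred):
        sent_ok = True
        for (gh, gl, tag), (ph, pl, _) in zip(g_sent, p_sent):
            if tag in punct_tags:          # skip punctuation tokens
                continue
            total += 1
            if gh == ph:
                correct_head += 1          # head correct -> counts for UAS
                if gl == pl:
                    correct_arc += 1       # head and label correct -> LAS
            else:
                sent_ok = False
        complete += sent_ok
    return (100.0 * correct_head / total,
            100.0 * correct_arc / total,
            100.0 * complete / len(gold))
```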

3.3 Constraints

A grammar-based model utilizes an explicitly defined formal grammar to shape the search space of possible syntactic hypotheses. Parameters of a statistical grammar-based model are associated with grammar rules, and as a result specific language constructions constrain each other. For example, parameters are assigned to rewrite rules in a CFG-based model. Since the PCFG-based model leverages rewrite rules to locally constrain the possible dependents of one head word, it does relatively better on locally connected dependencies. The traditional evaluation metrics, UAS and LAS, only consider bi-lexical (first order) dependencies, which are the smallest pieces of a dependency structure. Besides bi-lexical dependencies, we report the prediction accuracy of grandparent and sibling dependencies, i.e. second order dependencies. The metrics are defined as follows (a small sketch of their computation is given at the end of this section).

• For every word d whose parent is not the root, we consider the word triple ⟨d, p, g⟩ among d, its parent p and its grandparent g. A word triple ⟨d, p, g⟩ from a predicted tree is considered correct if it also appears in the corresponding gold tree. Based on this definition, precision, recall and f-score of grandparent dependencies can be defined in the usual sense. All punctuation is excluded from the calculation.

• For every word h that governs at least two children (d1, ..., dn), we consider every word triple ⟨h, di, di+1⟩ among h and its sibling dependents di and di+1 (0 ≤ i < n); such a triple from a predicted tree is counted as correct if it also appears in the corresponding gold tree, and precision, recall and f-score of sibling dependencies are defined accordingly.

[Figure 1 (plot omitted): error rate of unlabeled dependencies for different construction types.]
Figure 1: Nominal vs. verbal constructions.

Arguments in exocentric constructions help complete the meaning of a predicate and are taken to be obligatory and selected by their heads; adjuncts in endocentric constructions are structurally dispensable parts that provide auxiliary information and are taken to be optional and not selected by their heads. An important annotation policy of the CTB is "one grammatical relation per bracket", which means each constituent falls into one of three primitive grammatical relations: (1) head-complementation, (2) head-adjunction and (3) coordination. Additionally, an argument is attached at a level that is "closer" to the head than the adjuncts. Due to the linguistic properties of the different dependents and the annotation strategies, a grammar-based model can capture more syntactic preference properties of arguments via hard constraints, i.e. grammar rules, and is therefore more suitable for analyzing exocentric constructions.

Figure 1 shows the error rate of unlabeled dependencies for different constructions. A construction "←X←" is considered correctly predicted if and only if all dependent words and the head word of X are completely correctly found. The error rate in terms of this metric seems rather high because the units we consider are normally much larger than word pairs. From this figure, we can clearly see that the data-driven parser does better at predicting nominal constructions (NN/NR/NT/PN⁵), which rely more on optional adjuncts or modifiers; the grammar-based parser performs better at predicting verbal constructions (VC/VE/VV), which rely more on obligatory arguments. The evaluation of the nominal and verbal constructions roughly confirms the strength of the grammar-based model in predicting verbal constructions.

⁵ For the definition and illustration of these tags, please refer to the annotation guidelines (http://www.cis.upenn.edu/~chinese/posguide.3rd.ch.pdf).
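The sketch below mirrors the two triple definitions above; it is our own illustration, not the authors' evaluation script, and it skips punctuation dependents only (the treatment of punctuation heads is left open in the definitions).

```python
def second_order_f1(gold_heads, pred_heads, is_punct):
    """Grandparent and sibling f-scores for one sentence. gold_heads/pred_heads
    give the 1-based head of each word (0 = root); is_punct flags punctuation
    tokens, which are skipped. Corpus-level scores would aggregate the triple
    counts over all sentences before computing precision and recall."""
    def triples(heads):
        n = len(heads)
        grand, sib = set(), set()
        kids = {h: [] for h in range(n + 1)}
        for m in range(1, n + 1):
            if is_punct[m - 1]:
                continue
            p = heads[m - 1]
            kids[p].append(m)
            if p != 0:                          # parent is not the root
                grand.add((m, p, heads[p - 1])) # triple <d, p, g>
        for h, ds in kids.items():
            for a, b in zip(ds, ds[1:]):        # adjacent sibling dependents
                sib.add((h, a, b))              # triple <h, d_i, d_{i+1}>
        return grand, sib

    def f1(gold, pred):
        tp = len(gold & pred)
        if tp == 0:
            return 0.0
        p, r = tp / len(pred), tp / len(gold)
        return 100.0 * 2 * p * r / (p + r)

    gg, gs = triples(gold_heads)
    pg, ps = triples(pred_heads)
    return f1(gg, pg), f1(gs, ps)
```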
4 Bagging parsers

The comparative analysis highlights the fundamental diversity between data-driven and PCFG-based models. In order to exploit this diversity gain, we address the issue of parser combination. We employ a general ensemble learning technique, Bagging, to enhance a single-view parser and to combine multi-view parsers.

4.1 Applying Bagging to dependency parsing

Bagging is a machine learning ensemble meta-algorithm that improves classification and regression models in terms of stability and classification accuracy (Breiman, 1996). It also reduces variance and helps to avoid overfitting. Given a training set D of size n, Bagging generates m new training sets Di of size n′ ≤ n by sampling examples from D. m models are separately learned on the m new training sets and combined by voting (for classification) or averaging the output (for regression). Henderson and Brill (2000) successfully applied Bagging to enhance a constituent parser. Moreover, Bagging has been applied to combine multiple solutions for Chinese lexical processing (Sun, 2010b; Sun and Uszkoreit, 2012). In this paper, we apply Bagging to dependency parsing. Since training even one single parser takes hours (if not days), experiments on Bagging are time-consuming. To save time, we conduct the data-driven parsing experiments with a simple configuration. More specifically, the beam size of the transition-based parser is set to 16 and the simple feature set is used; dependency relations are not incorporated for the graph-based parser.

Bootstrapping step. In the training phase, given a training set D of size n, our model generates m new training sets Di of size λn by sampling uniformly without replacement. Each Di can be used to train a single-view parser, or multiple parsers according to different views. Using this strategy, we obtain m weak parsers, or km parsers if k views are implemented. In the parsing phase, for each sentence, the (k)m models output (k)m candidate analyses that are combined in a post-inference procedure.

Aggregating step. Different from classification problems, a simple voting scheme is not suitable for parsing, which is a typical structured prediction problem. To aggregate the outputs of the (k)m sub-models, a structured inference procedure is needed. Sagae and Lavie (2006) present a framework for combining the output of several different parsers to produce results that are superior to each of the individual parsers. We implement their method to aggregate models. Once we have obtained multiple dependency trees from the base parsers, we build a graph in which each word of the sentence is a node. We then create weighted directed edges between the nodes corresponding to words for which dependencies are obtained from each of the initial structures. The weights are the word-by-word voting results of the sub-models. Based on this graph, the sentence can be reparsed by a graph-based algorithm. Taking Chinese as a projective language, we use Eisner's algorithm (Eisner, 1996) to combine multiple dependency parses. Surdeanu and Manning (2010) indicate that reparsing performs essentially as well as other, simpler or more complex, models.
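A minimal sketch of the two steps is given below. The trainer and parser interfaces are placeholders for the actual toolkits used in the paper, and the projective reparse with Eisner's algorithm is only referenced, not reproduced.

```python
import random

def bagging_train(train_set, parser_trainers, m=15, lam=0.7, seed=1):
    """Bootstrapping step: draw m subsamples of size lam*n without replacement
    and train every base view (e.g. graph-, transition-, PCFG-based) on each.
    `parser_trainers` is a list of functions data -> parser (hypothetical)."""
    rng = random.Random(seed)
    n = len(train_set)
    models = []
    for _ in range(m):
        subsample = rng.sample(train_set, int(lam * n))
        models.extend(train(subsample) for train in parser_trainers)
    return models

def vote_arc_scores(parses, n_words):
    """Aggregating step: word-by-word voting over the predicted head of each
    word. parses: a list of head arrays (words 1..n, 0 = root).
    Returns score[h][m], the number of sub-models proposing the arc h -> m."""
    score = [[0] * (n_words + 1) for _ in range(n_words + 1)]
    for heads in parses:
        for m_pos, h in enumerate(heads, start=1):
            score[h][m_pos] += 1
    return score

# The voted scores are then handed to a projective decoder; the paper reparses
# with Eisner's (1996) algorithm, e.g. (eisner_decode is hypothetical here):
#   heads = eisner_decode(vote_arc_scores([p.parse(sent) for p in models],
#                                          len(sent)))
```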
4.2 Parameter tuning

We evaluate our combination model on the same data set used in the last section. The two hyper-parameters (λ and m) of our Bagging model are tuned on the development (validation) set. On the one hand, increasing the size of the sub-samples, i.e. λ, improves the performance of the sub-models. However, since the sub-models then overlap more, the diversity of base models for the ensemble decreases and the final prediction accuracy may go down. To evaluate the effect of λ, we separately sample 50%, 60%, 70% and 80% of the sentences from the original training data 5 times, train 5 sub-models for each parser, and combine them. The beam size of the transition-based parser is set to 16. Table 2 shows the influence of the choice of λ. For all following experiments, we set λ = 0.7.

λ                        50%    60%    70%    80%
Tran+Graph[-lab]+Unlex   83.50  85.96  86.15  85.60

Table 2: UAS of Bagging(5) models with different λ.

The second parameter for Bagging is the number of sub-models to be used for combination. Figure 2 summarizes the Bagging performance when different models are employed and different numbers (i.e. m) of subsamples are used. From this figure, we can learn the influence of the number of sub-models.

[Figure 2 (plot omitted): UAS (roughly 81.5 to 86.5) against the number of sampling data sets (3 to 9) for Graph[-lab], Tran, Unlex, Graph[-lab]+Unlex, Tran+Unlex and Graph[-lab]+Tran.]
Figure 2: Averaged UAS of different Bagging models with different numbers of sampling datasets.

4.3 Bagging single-view parsers

4.3.1 Results

Table 1 indicates that Bagging can improve individual single-view parsers, especially the Berkeley parser. If we take Bagging as a general parser enhancement technique and still consider a Bagging-enhanced parser as a single view, we conclude that the Bagging-enhanced PCFG-based method works best among the state-of-the-art approaches. For the transition-based parser, though the score over single words goes up, the score over sentences goes down. The main reason is that the reparsing algorithm is a graph-based one, which performs worse with regard to the prediction of whole sentences. The improvement for the graph-based parser is very modest. We train a Bagging(8)-enhanced Berkeley parser, which achieves equivalent overall UAS to the data-driven parsers, and compare their parsing abilities on second order dependencies. Now we can see more clearly that the Bagging-enhanced PCFG-based model performs better in the prediction of second order dependencies.

4.3.2 Related experiments on sequence models

Bagging has been applied to enhance discriminative sequence models for Chinese word segmentation (Sun, 2010b) and POS tagging (Sun and Uszkoreit, 2012). For word segmentation, experiments on discriminative Markov and semi-Markov tagging models are reported. Those experiments showed that Bagging can consistently enhance a semi-Markov model but not the Markov one. The experiments on POS tagging indicated that Bagging Markov models hurts tagging performance. It seems that the relationships among basic processing units affect Bagging.

PCFGLA parsers are built upon generative models with latent annotations. The use of automatically induced latent variables may also affect Bagging. Generative sequence models with latent annotations can also achieve good performance for Chinese POS tagging. Huang et al. (2009) described and evaluated a bigram HMM tagger that utilizes latent annotations. In contrast to the negative results of Bagging discriminative models, our auxiliary experiment shows that Bagging Huang et al.'s tagger does help Chinese POS tagging. In other words, Bagging substantially improves both HMMLA and PCFGLA models, at least for Chinese POS tagging and constituency parsing. It seems that Bagging favors the use of latent variables.
4.4 Bagging multi-view parsers

4.4.1 Results

Figure 2 clearly shows that the Bagging model taking both data-driven and PCFG-based models as basic systems outperforms the Bagging models taking either model in isolation as basic systems. The combination of a PCFG-based model and a data-driven model (either graph-based or transition-based) is more effective than the combination of two data-driven models, which has received the most attention in dependency parser ensemble. Table 3 shows the performance of reparsing on the development data. From this table, we can see that by utilizing more parsers, Bagging can enhance reparsing. According to Surdeanu and Manning (2010)'s findings, reparsing performs as well as other combination models. Our auxiliary experiments confirm this finding: learning-based stacking cannot achieve better performance. Due to space limitations, we do not describe these experiments.

Devel.                                          UAS
Reparsing (Tran[b=16,Z08]+Graph[-lab]+Unlex)    85.82
+Bagging(15)                                    86.37
bagging(reparse(g,t,c))                         86.09
reparse(bagging(g,t,c))                         85.86

Table 3: UAS of reparsing and Bagging.

4.4.2 Analysis

In our proposed model, Bagging has a two-fold effect: one is as a system combination technique and the other as a general parser enhancing technique. Two additional experiments are performed to evaluate these two effects. To illustrate the differences between the two experiments, denote the graph-based, transition-based and PCFG-based parsers as g, t and c respectively; denote the reparsing procedure as reparse and the Bagging procedure as bagging. The two experiments are as follows (a sketch of the two compositions is given after this list).

• Bagging a hybrid parser. In this experiment, for each sub-sample Di, we first train three parsers: gi, ti and ci. Then we combine these three parsers by reparsing and construct a hybrid parser reparse(gi, ti, ci). Finally, all hybrid parsers are collected to build the final parser: bagging(reparse(g, t, c)).

• Combining Bagging-enhanced parsers. In this experiment, for each model, we first train three Bagging-enhanced parsers: bagging(g), bagging(t) and bagging(c). Then these three Bagging-enhanced parsers are combined by reparsing to build the final parser: reparse(bagging(g, t, c)).

Evaluation results are presented in Table 3.
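The sketch below contrasts the two compositions. The parse interface, the greedy decode stand-in and the assumption that `subsample_parsers` is a list of m (g_i, t_i, c_i) triples are all ours; the actual system reparses with Eisner's projective algorithm rather than the greedy head picker shown here.

```python
def vote(parses, n_words):
    """Word-by-word voting: score[h][m] = number of parses with arc h -> m."""
    score = [[0] * (n_words + 1) for _ in range(n_words + 1)]
    for heads in parses:
        for m, h in enumerate(heads, start=1):
            score[h][m] += 1
    return score

def decode(score):
    """Greedy stand-in for the projective reparser: pick the best-voted head
    per word. The paper instead runs Eisner's algorithm on these weights."""
    n = len(score) - 1
    return [max(range(n + 1), key=lambda h: score[h][m]) for m in range(1, n + 1)]

def reparse(parsers, sent, parse):
    return decode(vote([parse(p, sent) for p in parsers], len(sent)))

def bagging_of_reparse(subsample_parsers, sent, parse):
    """bagging(reparse(g, t, c)): one hybrid parser per sub-sample, then vote
    over the m hybrid outputs."""
    hybrids = [reparse(trio, sent, parse) for trio in subsample_parsers]
    return decode(vote(hybrids, len(sent)))

def reparse_of_bagging(subsample_parsers, sent, parse):
    """reparse(bagging(g), bagging(t), bagging(c)): vote within each view over
    its m sub-models first, then combine the three bagged outputs."""
    bagged = [decode(vote([parse(p, sent) for p in view], len(sent)))
              for view in zip(*subsample_parsers)]
    return decode(vote(bagged, len(sent)))
```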
5 Pseudo-grammar-based models

Although the combination of data-driven and grammar-based models is very effective, it has a serious limitation: it is only applicable when constituency annotations are available for learning a grammar. However, many treebanks, e.g. the Chinese Dependency Treebank (LDC2012T05), do not have such linguistically rich structures. Our experiments also suggest that a constituency grammar can significantly increase the diversity of base models for parser ensemble, which plays a major role in boosting prediction accuracy.

In order to reduce the need for phrase-structure annotations, and to increase the diversity of candidate parsers, we study learning pseudo grammars for dependency parsing. The key idea is very simple: by converting a dependency structure into a constituency one, we can reuse the PCFGLA approach to learn pseudo grammars for dependency parsing. Figure 3 gives an example. The first tree is an original dependency parse, while the second tree is the corresponding CTB annotation. The next two trees are two automatically converted pseudo constituency trees. By applying DS to CS rules, we can acquire pseudo constituency treebanks and then learn pseudo grammars from them.

[Figure 3 (trees omitted): (1) dependency tree; (2) linguistic constituency tree; (3) flat constituency tree; (4) binarized constituency tree.]
Figure 3: An example: China encourages private entrepreneurs to invest in national infrastructure.

The basic idea of our method is to use parsing models in one formalism for parsing in another formalism. In previous work, PCFGs have been used to solve parsing problems in many other formalisms, including dependency (Collins et al., 1999), CCG (Fowler and Penn, 2010), LFG (Cahill et al., 2004) and HPSG (Zhang and Krieger, 2011) parsing.

5.1 Strategies for DS to CS conversion

The conversion from DS to CS is a non-trivial problem. One main issue in the conversion is the indeterminacy in the choice of a phrasal category given a dependency relation, and of the level and position of attachment of a dependent in the constituency structure, since dependency relations typically do not encode such information. To convert a DS to a CS, especially for dependency parsing, we should consider (1) how to transform between the topological structures, (2) how to induce a syntactic category, and (3) how to easily recover dependency trees from pseudo constituency trees. From these three aspects, we present the following strategies.

5.1.1 Topological structure

The topological structures represent the boundary information of constituents in a given sentence. Dependency structures do not directly represent such boundary information. Nevertheless, a complete subtree of a projective dependency tree can be regarded as a constituent. We can therefore construct a very flat constituent tree whose nodes are associated with the complete subtrees of a dependency parse. The third tree in Figure 3 is an example of such a conversion.

Right-to-left binarization. According to the study in (Sun, 2010a), the head words of most phrases in Chinese are located at the first or the last position. That means that for binarizing most phrases, we only need to sequentially combine the right or left parts together with their head phrases. The main exceptions are clauses, whose head predicate is located inside, since Chinese is an SVO language. To deal with these exceptions, we split each phrase whose head child is internal into three parts: left child(ren), head and right child(ren). We first sequentially combine the head and its right child(ren), which are usually objects, into intermediate phrases, then sequentially combine the left child(ren) until we reach the original parent node. For example, the first rewrite rule below is transformed into the second and third types of rules.

1. Xp → X1, ..., Xi, ..., Xm
2. X′p → Xi, Xi+1;  X′′p → X′p, Xi+2;  ...
3. X*p → Xi−1, X′···′p;  X**p → Xi−2, X*p;  ...

This right-to-left binarization strategy is consistent with most Chinese treebank annotation schemes. The fourth tree in Figure 3 is an example of a binarized pseudo tree.
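As a concrete instance of the schema above (our own worked example, not one from the paper), take m = 5 with head child X3, i.e. Xp → X1, X2, X3, X4, X5. Right-to-left binarization first attaches the right dependents to the head, then the left ones:

X′p → X3, X4
X′′p → X′p, X5
X*p → X2, X′′p
X**p → X1, X*p

so the flat five-daughter rule is replaced by four binary rules, whose top node spans the same material as the original Xp.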
5.1.2 Phrasal category

The projection principle was introduced by Chomsky to link together the levels of syntactic description. It connects syntactic structures with lexical entries: lexical structure must be represented categorically at every syntactic level, and representations at each level of syntax are projected from the lexicon in that they observe the subcategorisation properties of lexical items. According to this principle, it is reasonable to use the lexical category (POS) of the head word as the phrasal category of a phrase.

5.1.3 Auxiliary symbol

We can use auxiliary symbols to denote the head phrase position in a CFG rule. In other words, some categories may be split into subcategories according to whether they are head phrases of their parent nodes, or according to which of their children is their head phrase. Auxiliary symbols can be assigned either to one of the right-hand-side symbols or to the left-hand-side symbol. The first choice is to use an H symbol to indicate that the current phrase is the head of its parent node. The second choice is to use an L or R symbol to indicate that the head of the current node is its left or right child in a binarized tree. The following table gives an example of rules with the two kinds of auxiliary symbols.

With head symbol     With left/right symbol
Xl → Xl#H, Xr        Xl#L → Xl, Xr
Xr → Xl, Xr#H        Xr#R → Xl, Xr

5.2 Three conversions

Taking the above strategies into account, we propose three concrete DS to CS conversions:

Flat conversion with H auxiliary symbol (FlatH). As shown by the third tree in Figure 3, we can learn a grammar from very flat constituency trees where the auxiliary symbol H is used for extracting dependencies.

Right-to-left binarizing with H auxiliary symbol (BinH). Different from the flat conversion, we binarize a tree according to the right-to-left principle. The auxiliary symbol H is chosen.

Right-to-left binarizing with LR auxiliary symbol (BinLR). Different from the second type of conversion, we use auxiliary L/R symbols to denote head phrases. See the fourth tree in Figure 3 for an instance.

In practice, every constituency parse produced by parsers trained with binarized trees maps to exactly one dependency tree. However, a parser trained with flat trees may produce very bad constituency results. Sometimes a parent node has zero children assigned H, or more than one child assigned H. In the first case, we select the rightmost child as the head of such a parent; in the second case, we select the rightmost one among the children that are assigned H.
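The sketch below illustrates the FlatH, BinH and BinLR conversions. The tuple tree encoding and the "#H"/"#L"/"#R" spellings are our own; the paper does not publish code, so this is an illustration of the idea rather than the authors' converter.

```python
# Trees are (label, word) preterminals or (label, [daughters]) phrases.
# `heads` gives the 1-based head of each word (0 = root); phrasal labels are
# the head word's POS tag, following Section 5.1.2.

def dependents(heads):
    kids = [[] for _ in range(len(heads) + 1)]
    for m, h in enumerate(heads, start=1):
        kids[h].append(m)
    return kids                      # kids[0] holds the sentence root

def flat_h(i, words, tags, kids):
    """FlatH: one flat constituent per complete dependency subtree; the head
    word's preterminal daughter carries the #H mark."""
    if not kids[i]:
        return (tags[i - 1], words[i - 1])
    head_leaf = (tags[i - 1] + "#H", words[i - 1])
    left = [flat_h(j, words, tags, kids) for j in kids[i] if j < i]
    right = [flat_h(j, words, tags, kids) for j in kids[i] if j > i]
    return (tags[i - 1], left + [head_leaf] + right)

def binarized(i, words, tags, kids, scheme="BinLR"):
    """Right-to-left binarization: attach right dependents to the head first
    (nearest first), then left dependents. BinH marks the head daughter with
    #H; BinLR marks the parent with #L/#R instead. Intermediate nodes reuse
    the head POS as label; the primed symbols X', X*, ... are collapsed."""
    tag = tags[i - 1]
    cur = (tag, words[i - 1])
    mark = lambda node: (node[0] + "#H", node[1])
    for j in sorted(j for j in kids[i] if j > i):                 # right deps
        dep = binarized(j, words, tags, kids, scheme)
        cur = (tag + "#L", [cur, dep]) if scheme == "BinLR" else (tag, [mark(cur), dep])
    for j in sorted((j for j in kids[i] if j < i), reverse=True): # left deps
        dep = binarized(j, words, tags, kids, scheme)
        cur = (tag + "#R", [dep, cur]) if scheme == "BinLR" else (tag, [dep, mark(cur)])
    return cur

# Toy usage (hypothetical sentence of three words, word 2 heading 1 and 3):
# kids = dependents([2, 0, 2])
# tree = binarized(kids[0][0], ["a", "b", "c"], ["NN", "VV", "NN"], kids)
```

Recovering a dependency tree from a parsed pseudo tree simply inverts the marking: in BinH/BinLR trees the marks identify the head daughter of every binary node, and for flat trees the rightmost-H heuristic described above resolves missing or duplicated H marks.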
5.3 Evaluation

5.3.1 Equivalent parsing accuracy

Devel.   Base     Bagging(15)
CTB      83.49%   84.92%
FlatH    80.15%   83.53%
BinH     81.80%   84.64%
BinLR    82.46%   84.90%

Table 4: UAS of pseudo-grammar-based models.

Table 4 summarizes the performance of the different pseudo-grammar-based models. Compared to the linguistic grammar learned from the CTB, we can see that pseudo grammars are very competitive. Note that the FlatH/BinH/BinLR trees are derived from the CoNLL data, rather than from the original CTB. Among the different DS to CS conversion strategies, the BinLR conversion works best. More interestingly, when we enhance the PCFGLA method with Bagging, the BinLR model performs as well as the real-grammar-based model.

5.3.2 Better contribution to ensemble

The experiments above indicate that we can easily build a good grammar-based dependency parser without any constituency annotations. The following experiments on parser combination show that, compared to the linguistic grammar, the BinH and BinLR grammars make equivalent contributions to parser ensemble. Table 5 presents the ensemble performance on the development data. With Bagging, the data-driven models together with either the real-grammar-based or a pseudo-grammar-based model reach a similar UAS.

Devel.                              UAS
Tran+Graph+CTB                      86.37%
Tran+Graph+FlatH                    86.14%
Tran+Graph+BinH                     86.29%
Tran+Graph+BinLR                    86.28%
Tran+Graph+FlatH+BinH+BinLR         87.03%
Tran+Graph+CTB+FlatH                86.96%
Tran+Graph+CTB+BinH                 87.10%
Tran+Graph+CTB+BinLR                87.15%
Tran+Graph+CTB+BinH+BinLR           87.38%
Tran+Graph+CTB+FlatH+BinH+BinLR     87.35%

Table 5: UAS of different Bagging(15) models.

5.3.3 Increased parser diversity

Since pseudo grammars are very different from real grammars induced from large-scale linguistic annotations, pseudo-grammar-based parsing models behave very differently from grammar-based models. In other words, they increase the diversity of model candidates for parser ensemble. As a result, pseudo-grammar-based models lead to further improvements for parser combination. Table 5 shows that the combination of data-driven, PCFG-based and binarized pseudo-grammar-based models is significantly better than the combination of data-driven and PCFG-based models alone.

5.4 Comparison to the state of the art

Table 6 summarizes the parsing performance on the test data set, as well as the best published result, reported in Li et al. (2012). To fairly compare the performance of our parser with other systems that are built without linguistic constituency trees, we only use pseudo-PCFGs in this experiment. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, which yields a relative error reduction of 24% over the best published result. Table 6 also presents the results evaluated on the CTB5 data, which is more widely used in previous research. Li et al. (2011) and Hatori et al. (2011) respectively evaluated their graph-based and transition-based parsers; Zhang and Clark (2011) evaluated a hybrid data-driven parser. Our model is significantly better than these systems: it achieves a UAS of 84.65%, an error reduction of 18% over the best system in the literature.

CoNLL-test                        UAS
(Li et al., 2012)                 83.23%
Graph+Tran+FlatH+BinH+BinLR       87.23%
CTB5-test                         UAS
(Li et al., 2011)                 80.79%
(Hatori et al., 2011)             81.33%
(Zhang and Clark, 2011)           81.21%
Graph+Tran+FlatH+BinH+BinLR       84.65%

Table 6: UAS of different models on the test data.

6 Conclusion and Future Work

There have been several attempts to develop high-accuracy parsers in both constituency and dependency formalisms for Chinese, and many successful parsing algorithms designed for English have been applied. However, the state of the art still falls far short of that for English. This paper studies data-driven and PCFG-based models for Chinese dependency parsing. We present a comparative analysis of transition-, graph-, and PCFG-based parsers, which highlights the systematic differences between data-driven and PCFG-based models. Our analysis may benefit parser ensemble, parser co-training, active learning for treebank construction, and so on. In order to exploit the diversity gain, we address the issue of parser combination. To overcome the limitation of the lack of constituency treebanks, we study pseudo-grammar-based models. Experimental results show that combining various data-driven and PCFG-based models significantly advances the state of the art, and that by converting parse trees we can still take advantage of the constituency representation even without constituency annotations.

Acknowledgement

We would like to thank all anonymous reviewers whose valuable comments led to significant revisions. The first author would like to thank Prof. Hans Uszkoreit for discussion and feedback on an early version of this work. The work was supported by NSFC (61170166), Beijing Nova Program (2008B03) and National High-Tech R&D Program (2012AA011101).
References

Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 89–97. Beijing, China.
Leo Breiman. 1996. Bagging predictors. Machine Learning, 24(2):123–140.
Aoife Cahill, Michael Burke, Ruth O'Donovan, Josef Van Genabith, and Andy Way. 2004. Long-distance dependency resolution in automatically acquired wide-coverage PCFG-based LFG approximations. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume, pages 319–326. Barcelona, Spain.
Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the First Conference of the North American Chapter of the Association for Computational Linguistics.
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 173–180. Ann Arbor, Michigan.
Wanxiang Che, Valentin Spitkovsky, and Ting Liu. 2012. A comparison of Chinese parsers for Stanford dependencies. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 11–16. Jeju Island, Korea.
Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637.
Michael Collins, Jan Hajic, Lance Ramshaw, and Christoph Tillmann. 1999. A statistical parser for Czech. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 505–512. College Park, Maryland, USA.
Jason M. Eisner. 1996. Three new probabilistic models for dependency parsing: an exploration. In Proceedings of the 16th Conference on Computational Linguistics (COLING '96), Volume 1, pages 340–345.
Timothy A. D. Fowler and Gerald Penn. 2010. Accurate context-free parsing with combinatory categorial grammar. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 335–344. Uppsala, Sweden.
Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 1216–1224. Chiang Mai, Thailand.
John Henderson and Eric Brill. 1999. Exploiting diversity in natural language processing: Combining parsers. In Proceedings of the Fourth Conference on Empirical Methods in Natural Language Processing, pages 187–194.
John C. Henderson and Eric Brill. 2000. Bagging and boosting a treebank parser. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL 2000), pages 34–41.
Liang Huang and Kenji Sagae. 2010. Dynamic programming for linear-time incremental parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1077–1086. Uppsala, Sweden.
Zhongqiang Huang, Vladimir Eidelman, and Mary Harper. 2009. Improving a simple bigram HMM part-of-speech tagger by latent annotation and self-training. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 213–216. Boulder, Colorado.
Zhenghua Li, Ting Liu, and Wanxiang Che. 2012. Exploiting multiple treebanks for parsing with quasi-synchronous grammars. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 675–684. Jeju Island, Korea.
Zhenghua Li, Min Zhang, Wanxiang Che, Ting Liu, Wenliang Chen, and Haizhou Li. 2011. Joint models for Chinese POS tagging and dependency parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1180–1191. Edinburgh, Scotland, UK.
Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2005. Probabilistic CFG with latent annotations. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL '05), pages 75–82.
Ryan McDonald. 2006. Discriminative learning and spanning tree algorithms for dependency parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA, USA.
Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 122–131. Prague, Czech Republic.
Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34:513–553.
Joakim Nivre and Ryan McDonald. 2008. Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL-08: HLT, pages 950–958. Columbus, Ohio.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 433–440. Sydney, Australia.
Kenji Sagae and Alon Lavie. 2006. Parser combination by reparsing. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (NAACL-Short '06), pages 129–132.
Weiwei Sun. 2010a. Improving Chinese semantic role labeling with rich syntactic features. In Proceedings of the ACL 2010 Conference Short Papers, pages 168–172. Uppsala, Sweden.
Weiwei Sun. 2010b. Word-based and character-based word segmentation models: Comparison and combination. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 1211–1219. Beijing, China.
Weiwei Sun and Zhifang Sui. 2009. Chinese function tag labeling. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation. Hong Kong.
Weiwei Sun and Hans Uszkoreit. 2012. Capturing paradigmatic and syntagmatic lexical relations: Towards accurate Chinese part-of-speech tagging. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics.
Mihai Surdeanu and Christopher D. Manning. 2010. Ensemble models for dependency parsing: Cheap and good? In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 649–652. Los Angeles, California.
Andre Torres Martins, Noah Smith, and Eric Xing. 2009. Concise integer linear programming formulations for dependency parsing. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 342–350. Suntec, Singapore.
André Filipe Torres Martins, Dipanjan Das, Noah A. Smith, and Eric P. Xing. 2008. Stacking dependency parsers. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 157–166. Honolulu, Hawaii.
Nianwen Xue. 2007. Tapping the implicit information for the PS to DS conversion of the Chinese treebank. In Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories.
Yi Zhang and Hans-Ulrich Krieger. 2011. Large-scale corpus-driven PCFG approximation of an HPSG. In Proceedings of the 12th International Conference on Parsing Technologies, pages 198–208. Dublin, Ireland.
Yue Zhang and Stephen Clark. 2008. A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 562–571. Honolulu, Hawaii.
Yue Zhang and Stephen Clark. 2009. Transition-based parsing of the Chinese treebank using a global discriminative model. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), pages 162–171. Paris, France.
Yue Zhang and Stephen Clark. 2011. Syntactic processing using the generalized perceptron and beam search. Computational Linguistics, 37(1):105–151.
Yue Zhang and Joakim Nivre. 2011. Transition-based dependency parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 188–193. Portland, Oregon, USA.
Tao Zhuang and Chengqing Zong. 2010. A minimum error weighting combination strategy for Chinese semantic role labeling. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 1362–1370. Beijing, China.