Transactions of the Association for Computational Linguistics, 1 (2013) 415–428. Action Editor: Brian Roark.
Submitted 7/2013; Revised 9/2013; Published 10/2013.
© 2013 Association for Computational Linguistics.
Joint Morphological and Syntactic Analysis for Richly Inflected Languages

Bernd Bohnet (University of Birmingham, School of Computer Science), Joakim Nivre (Uppsala University, Department of Linguistics and Philology), Igor Boguslavsky (Universidad Politécnica de Madrid, Departamento de Inteligencia Artificial; Russian Academy of Sciences, Institute for Information Transmission Problems), Richárd Farkas (University of Szeged, Institute of Informatics), Filip Ginter (University of Turku, Department of Information Technology), Jan Hajič (Charles University in Prague, Institute of Formal and Applied Linguistics)

Abstract

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.

1 Introduction

Syntactic parsing of natural language has witnessed a tremendous development during the last twenty years, especially through the use of statistical models for robust and accurate broad-coverage parsing. However, as statistical parsing techniques have been applied to more and more languages, it has also been observed that typological differences between languages lead to new challenges. In particular, it has been found over and over again that languages exhibiting rich morphological structure, often together with a relatively free word order, usually obtain lower parsing accuracy, especially in comparison to English. One striking demonstration of this tendency can be found in the CoNLL shared tasks on multilingual dependency parsing, organized in 2006 and 2007, where richly inflected languages clustered at the lower end of the scale with respect to parsing accuracy (Buchholz and Marsi, 2006; Nivre et al., 2007). These and similar observations have led to an increased interest in the special challenges posed by parsing morphologically rich languages, as evidenced most clearly by a new series of workshops devoted to this topic (Tsarfaty et al., 2010), as well as a special issue in Computational Linguistics (Tsarfaty et al., 2013) and a shared task on parsing morphologically rich languages.1

One hypothesized explanation for the lower parsing accuracy observed for richly inflected languages is the strict separation of morphological and syntactic analysis assumed in many parsing frameworks (Tsarfaty et al., 2010; Tsarfaty et al., 2013). This is true in particular for data-driven dependency parsers, which tend to assume that all morphological disambiguation has been performed before syntactic analysis begins. However, as argued by Lee et al. (2011), in morphologically rich languages there is often considerable interaction between morphology and syntax, such that neither can be disambiguated without the other. Lee et al. (2011) go on to show that a discriminative model for joint morphological disambiguation and dependency parsing gives consistent improvements in morphological and syntactic accuracy, compared to a pipeline model, for Ancient Greek, Czech, Hungarian and Latin. Similarly, Bohnet and Nivre (2012) propose a model for

1 See https://sites.google.com/site/spmrl2013/home/sharedtask.
joint part-of-speech tagging and dependency parsing and report improved accuracy for Czech and German (but also for Chinese and English), although in this case the joint model is limited to basic part-of-speech tags and does not involve the full complex of morphological features. An integrated approach to morphological and syntactic analysis can also be found in grammar-based dependency parsers, such as the ETAP-3 linguistic processor (Apresian et al., 2003), where morphological disambiguation is mostly carried out together with syntactic analysis. Finally, it is worth noting that joint models of morphology and syntax have been more popular in constituency-based statistical parsing (Cowan and Collins, 2005; Tsarfaty, 2006; Cohen and Smith, 2007; Goldberg and Tsarfaty, 2008).

Another hypothesis from the literature is that the high type-token ratio resulting from large morphological paradigms leads to data sparseness when estimating the parameters of a statistical parsing model (Tsarfaty et al., 2010; Tsarfaty et al., 2013). In particular, for many words in the language, only a subset of their morphological forms will be observed at training time. This suggests that using rule-based morphological analyzers or other lexical resources may be a viable strategy to improve coverage and performance. Thus, Goldberg and Elhadad (2013) show that integrating an external wide-coverage lexicon with a treebank-trained PCFG parser improves parsing accuracy for Modern Hebrew, which is in line with earlier studies of part-of-speech tagging for morphologically rich languages (Hajič, 2000). The sparsity of lexical features can also be tackled by the use of distributional word clusters as pioneered by Koo et al. (2008).

In this paper, we present a transition-based model that jointly predicts complex morphological representations and dependency relations, generalizing the approach of Bohnet and Nivre (2012) to include the full range of morphological information. We start by investigating different ways of integrating morphological features into the model, go on to examine the effect of using rule-based morphological analyzers to derive hard or soft constraints on the morphological analysis, and finally add word cluster features to combat lexical sparsity. We evaluate our methods on data from Czech, Finnish, German, Hungarian, and Russian, five morphologically rich languages representing three different language groups. The experiments show that joint prediction of morphology and syntax, rule-based morphological analyzers, and word clusters all contribute to improved parsing accuracy, leading to new state-of-the-art results for all languages.

2 Method

In this section, we define target representations and evaluation metrics (2.1), and describe our transition-based parsing framework, consisting of an abstract transition system (2.2), a feature-based scoring function (2.3), and algorithms for decoding (2.4) and learning (2.5).

2.1 Representations and Metrics

We take an unlabeled dependency tree for a sentence x = w1, …, wn to be a directed tree T = (Vx, A), where Vx = {0, 1, …, n}, A ⊆ Vx × V+x, and 0 is the root of the tree (Kübler et al., 2009). The set Vx of nodes is the set of positive integers up to and including n, each corresponding to the linear position of a word in the sentence, plus an extra artificial root node 0. We use V+x to denote Vx − {0}. The set A of arcs is a set of pairs (i, j), where i is the head node and j is the dependent node.

To this basic representation of syntactic structure we add four labeling functions for part-of-speech tags, morphological features, lemmas, and dependency relations. The function π : V+x → P maps each node in V+x to a part-of-speech tag in the set P; the function µ : V+x → M maps each node to a morphological description in the set M; the function λ : V+x → Z* maps each node in V+x to a lemma (a string over some character set Z); and the function δ : A → D maps each arc to a dependency label in the set D. The exact nature of P, M and D depends on the data sets used, but normally P and D only contain atomic labels while the members of M are sets of atomic features encoding properties like number, case, tense, etc. For lemmas, we do not assume that there is a fixed lexicon but allow any character string as a legal value.

We define our target representation for a sentence x = w1, …, wn as a quintuple Γ = (A, π, µ, λ, δ) such that (Vx, A) is an unlabeled dependency tree; π, µ and λ label the nodes with part-of-speech tags,
Transition                                                                          Condition
LEFT-ARCd    ([σ|i, j], β, Γ) ⇒ ([σ|j], β, Γ[(j, i) ∈ A, δ(j, i) = d])              i ≠ 0
RIGHT-ARCd   ([σ|i, j], β, Γ) ⇒ ([σ|i], β, Γ[(i, j) ∈ A, δ(i, j) = d])
SHIFTp,m,l   (σ, [i|β], Γ) ⇒ ([σ|i], β, Γ[π(i) = p, µ(i) = m, λ(i) = l])
SWAP         ([σ|i, j], β, Γ) ⇒ ([σ|j], [i|β], Γ)                                   0 < i < j

to permute nodes in order to allow non-projective dependencies. Bohnet and Nivre (2012) modified this system by replacing the simple SHIFT transition by SHIFTp, which not only moves a node from the buffer to the stack but also assigns it a part-of-speech tag p, turning it into a system for joint part-of-speech tagging and dependency parsing.2 Here we add two additional parameters m and l to the SHIFT transition, so that a node moved from the buffer to the stack is assigned not only a tag p but also a morphological description m and a lemma l. In this way, we get a joint model for the prediction of part-of-speech tags, morphological features, lemmas, and dependency trees.

2 Hatori et al. (2011) previously made the same modification to the arc-standard system (Nivre, 2004), without the SWAP transition. Similarly, Titov and Henderson (2007) added a word parameter to the SHIFT transition to get a joint model of word strings and dependency trees. A similar model was considered but finally not used by Gesmundo et al. (2009).

2.3 Scoring

In transition-based parsing, we score parses in an indirect fashion by scoring transition sequences. In general, we assume that the score function factors by configuration-transition pairs:

S(x, C0,m) = Σ_{i=0}^{m} s(x, c_i, t_i)    (1)

Furthermore, when using structured learning, as first proposed for transition-based parsing by Zhang and Clark (2008), we assume that the score is given by a linear model whose feature representations decompose in the same way:

S(x, C0,m) = f(x, C0,m) · w = Σ_{i=0}^{m} f(x, c_i, t_i) · w    (2)

Here, f(x, c, t) is a high-dimensional feature vector, where each component f_i(x, c, t) is a nonnegative numerical feature (usually binary), and w is a weight vector of the same dimensionality, where each component w_i is the real-valued weight of the feature f_i(x, c, t). The choice of features to include in f(x, c, t) is discussed separately for each instantiation of the model in Sections 4–6.

2.4 Decoding

Exact decoding for transition-based parsing is hard in general.3 Early transition-based parsers mostly relied on greedy, deterministic decoding, which makes for very efficient parsing (Yamada and Matsumoto, 2003; Nivre, 2003), but research has shown that accuracy can be improved by using beam search instead (Zhang and Clark, 2008; Zhang and Nivre, 2012). While still not exact, beam search decoders explore a larger part of the search space than greedy parsers, which is likely to be especially important for joint models, where the search space is larger than for plain dependency parsing without morphology (even more so with the SWAP transition for non-projectivity). Figure 2 outlines the beam search algorithm used for decoding with our model. Different instantiations of the model will require slightly different implementations of the permissibility condition invoked in line 8, which can be used to filter out labels that are improbable or incompatible with an external lexicon, and the pruning step performed in line 13, where there may be a need to balance the amount of morphological and syntactic variation in the beam. Both these aspects will be discussed in depth in Sections 4–6.

Although the worst-case running time with constant beam size is quadratic in sentence length, the observed running time is linear for natural language data sets, due to the sparsity of non-projective dependencies (Nivre, 2009). The running time is also linear in |D| + |P × M|, which means that joint prediction only gives a linear increase in running time, often quite marginal because |D| > |P × M|. This assumes that the lemma is predicted deterministically given a tag and a morphological description, an assumption that is enforced in all our experiments.

3 While there exist exact dynamic programming algorithms for projective transition systems (Huang and Sagae, 2010; Kuhlmann et al., 2011) and even for restricted non-projective systems (Cohen et al., 2011), parsing is intractable for systems like ours that permit arbitrary non-projective trees.

2.5 Learning

In order to learn a weight vector w from a training set of sentences with gold parses, we use a variant of the structured perceptron, introduced by Collins (2002) and first used for transition-based parsing by Zhang and Clark (2008). We initialize all weights
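To make the transition system of Section 2.2 concrete, the following is a minimal Python sketch of the four transitions, with the SHIFT transition parameterized by a tag p, a morphological description m, and a lemma l. This is illustrative only, not the authors' implementation; the class and function names are our own.

```python
# Sketch of the transition system (Section 2.2). A configuration holds a
# stack, a buffer, and the partially built MS-parse: labeled arcs plus the
# part-of-speech, morphology, and lemma labelings.

class Config:
    def __init__(self, n):
        self.stack = [0]                    # artificial root node 0
        self.buffer = list(range(1, n + 1)) # word positions 1..n
        self.arcs = {}                      # (head, dep) -> dependency label
        self.pos, self.morph, self.lemma = {}, {}, {}

def shift(c, p, m, l):
    """SHIFT_{p,m,l}: move the next node to the stack and label it."""
    i = c.buffer.pop(0)
    c.pos[i], c.morph[i], c.lemma[i] = p, m, l
    c.stack.append(i)

def left_arc(c, d):
    """LEFT-ARC_d: make the second-topmost node a dependent of the top."""
    j = c.stack.pop()
    i = c.stack.pop()
    assert i != 0                           # the root cannot be a dependent
    c.arcs[(j, i)] = d
    c.stack.append(j)

def right_arc(c, d):
    """RIGHT-ARC_d: make the topmost node a dependent of the one below."""
    j = c.stack.pop()
    c.arcs[(c.stack[-1], j)] = d

def swap(c):
    """SWAP: move the second-topmost node back to the buffer."""
    j = c.stack.pop()
    i = c.stack.pop()
    c.buffer.insert(0, i)
    c.stack.append(j)
```

A terminal configuration is reached when the buffer is empty and only the root remains on the stack; the arcs and labelings then define the MS-parse Γ.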
D
o
w
N
o
UN
D
e
D
F
R
o
M
H
T
T
P
:
/
/
D
io
R
e
C
T
.
M
io
T
.
e
D
tu
/
T
UN
C
l
/
l
UN
R
T
io
C
e
–
P
D
F
/
D
o
io
/
.
1
0
1
1
6
2
/
T
l
UN
C
_
UN
_
0
0
2
3
8
1
5
6
6
6
8
5
/
/
T
l
UN
C
_
UN
_
0
0
2
3
8
P
D
.
F
B
sì
G
tu
e
S
T
T
o
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3
419
PARSE(x, w)
 1  h0.c ← cs(x)
 2  h0.s ← 0.0
 3  h0.f ← {0.0}^dim(w)
 4  BEAM ← [h0]
 5  while ∃h ∈ BEAM : h.c ∉ Ct
 6      TMP ← []
 7      for each h ∈ BEAM
 8          for each t ∈ T : PERMISSIBLE(h.c, t)
 9              h.f ← h.f + f(x, h.c, t)
10              h.s ← h.s + f(x, h.c, t) · w
11              h.c ← t(h.c)
12              TMP ← INSERT(h, TMP)
13      BEAM ← PRUNE(TMP)
14  h* ← TOP(BEAM)
15  return Γ_{h*.c}

Figure 2: Beam search algorithm for finding the best MS-parse for input sentence x with weight vector w. The symbols h.c, h.s and h.f denote, respectively, the configuration, score and feature vector of a hypothesis h; Γc denotes the MS-parse defined by c.

to 0.0, make N iterations over the training data and update the weight vector for every sentence x where the transition sequence C0,m corresponding to the gold parse is different from the highest scoring transition sequence C*0,m′.4 More precisely, we use the passive-aggressive update of Crammer et al. (2006). We also use the early update strategy found beneficial for parsing in several previous studies (Collins and Roark, 2004; Zhang and Clark, 2008; Huang and Sagae, 2010). This means that, at learning time, we terminate the beam search as soon as the hypothesis corresponding to the gold parse is pruned from the beam and then update with respect to the partial transition sequences constructed up to that point. Finally, we use the standard technique of averaging over all weight vectors seen in training, as originally proposed by Collins (2002).

4 Note that there may be more than one transition sequence corresponding to the gold parse, in which case we pick the canonical transition sequence that processes all left-dependents before right-dependents and applies the lazy swapping strategy of Nivre et al. (2009).

3 Data Sets and Resources

Throughout the paper, we experiment with data from five languages: Czech, Finnish, German, Hungarian, and Russian. For each language, we use a morphologically and syntactically annotated corpus (treebank), divided into a training set, a development set and a test set. In addition, we use a lexicon generated by a rule-based morphological analyzer, and distributional word clusters derived from a large unlabeled corpus. Below we describe the specific resources used for each language. Table 1 provides descriptive statistics about the resources.

Czech  For training and test we use the Prague Dependency Treebank (Hajič et al., 2001; Böhmová et al., 2003), Version 2.5, converted to the format used in the CoNLL 2009 shared task (Hajič et al., 2009). The morphological lexicon comes from Hajič and Hladká (1998),5 and word clusters are derived from a large web corpus (Spoustová and Spousta, 2012).

5 Downloaded from the http://lindat.cz repository as resource PID http://hdl.handle.net/11858/00-097C-0000-0015-A780-9.

Finnish  The training set is from the Turku Dependency Treebank (Haverinen et al., 2013), and the test set is the hidden test set maintained by the treebank developers. It is worth noting that, while the entire treebank has manually validated syntactic annotation, the morphological annotation is automatic except for a subset of 1204 tokens in the test set, which will be used to estimate the POS, MOR, LEM, PM and PMD scores. The estimated accuracy of the automatic annotation is 97.3% POS and 94.8% PM (Haverinen et al., 2013). Also, because of the limited amount of data, we do not use a development set for Finnish but instead use cross-validation on the training set when tuning parameters. We use the open-source morphological analyzer OMorFi (Pirinen, 2011) and word clusters derived from the entire Finnish Wikipedia.6

6 Downloaded in March 2012.

German  Training and test sets are from the Tiger Treebank (Brants et al., 2002) in the improved dependency conversion by Seeker et al. (2010). We use the SMOR morphological analyzer (Schmid et al., 2004), but because the tags and morphological features in the lexicon are not the same as in the
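Returning to decoding (Section 2.4), the algorithm of Figure 2 can be rendered in Python roughly as follows. This is a sketch for exposition, not the authors' code: the arguments `permissible`, `features`, and `prune` stand in for the permissibility condition, the feature function f, and the pruning step described in the text, and all names are our own assumptions.

```python
# Illustrative sketch of the beam search in Figure 2. Hypotheses are
# (configuration, score) pairs; feature vectors are kept sparse as dicts.
import copy

def parse(x, w, initial, transitions, is_terminal, permissible,
          features, prune):
    """Return the configuration of the highest-scoring final hypothesis."""
    beam = [(initial(x), 0.0)]
    while any(not is_terminal(c) for c, _ in beam):
        tmp = []
        for c, s in beam:
            if is_terminal(c):
                tmp.append((c, s))          # finished hypotheses carry over
                continue
            for t in transitions:
                if not permissible(c, t):   # line 8 of Figure 2
                    continue
                s2 = s + dot(features(x, c, t), w)
                tmp.append((t(copy.deepcopy(c)), s2))
        beam = prune(tmp)                   # line 13: e.g. keep the k best
    return max(beam, key=lambda h: h[1])[0]

def dot(f, w):
    """Sparse dot product: f maps feature names to values, w to weights."""
    return sum(v * w.get(k, 0.0) for k, v in f.items())
```

For early-update training, the same loop would additionally track the gold hypothesis and stop as soon as it falls out of the beam, as described in Section 2.5.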
            Treebank                                               Morphology          Clusters
            Train      Dev      Test     P     M     D    Forms    Lemmas    Tokens         Types
Czech       652,544    87,988   70,348   12   1851   49   98,360   42,058    628,332,859    477,185
Finnish     183,118    –        21,211   12   1917   47   57,127   25,280    50,207,300     257,984
German      648,296    32,065   31,692   54    257   43   76,729   55,220    1,327,701,182  1,621,083
Hungarian   1,101,871  210,068  171,466  22   1105   33   151,971  71,263    200,249,814    538,138
Russian     575,400    72,893   71,664   14    454   78   97,905   35,039    195,897,041    639,446

Table 1: Statistics about data sets and resources used in the experiments. Treebank: number of tokens in data sets; number of labels in label sets. Morphology: number of word forms and lemmas in treebank covered by morphological analyzer. Clusters: number of tokens and types in unlabeled corpus.

treebank annotation we have to rely on a heuristic mapping between the two. Word clusters are derived from the so-called Huge German Corpus.7

7 See http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/hgc.html.

Hungarian  For training and test we use the Szeged Dependency Treebank (Farkas et al., 2012). We use a finite-state morphological analyzer constructed from the morphdb.hu lexical resource (Trón et al., 2006), and word clusters come from the Hungarian National Corpus (Váradi, 2002).

Russian  Parsers are trained and tested on data from the SynTagRus Treebank (Boguslavsky et al., 2000; Boguslavsky et al., 2002). The morphological analyzer is a module of the ETAP-3 linguistic processor (Apresian et al., 2003) with a dictionary comprising more than 130,000 lexemes (Iomdin and Sizov, 2008). Word clusters have been produced on the basis of an unlabeled corpus of Russian compiled by the Russian Language Institute of the Russian Academy of Sciences and tokenized by the ETAP-3 analyzer.

4 Joint Morphology and Syntax

We start by exploring different ways of integrating morphology and syntax in a data-driven setting, that is, where our only knowledge source is the annotated training corpus. At both learning and parsing time, we preprocess sentences using a tagger that assigns (up to) kp part-of-speech tags and km morphological descriptions and a lemmatizer that assigns a single best lemma to each word. Complex morphological descriptions consisting of several atomic features are predicted as a whole, both in preprocessing and in parsing. Although it would be possible to predict each atomic morphological feature separately, we believe this would increase the risk of creating inconsistent morphological descriptions. As preprocessors, we use the tagger and lemmatizer included in the MATE tools8 trained on the same annotated training set, using 10-fold jack-knifing to get predictions for the training set itself. The tagger is a greedy left-to-right tagger trained with the same passive-aggressive online learning as the parsing system, which is run twice over the input to make more effective use of contextual features. The tagger scores are not properly normalized but tend to be in the [0,1] range for both part-of-speech tags and morphological descriptions.

8 Available at https://code.google.com/p/mate-tools/.

In this setting, we consider four different models for deriving a full MS-parse:

1. In the PIPELINE model, we set kp = km = 1, which means that the SHIFT transition always selects the 1-best tag, morphological description and lemma for each word. We use a beam size of 40 and prune by simply keeping the 40 highest scoring hypotheses at each step. As the name suggests, this is equivalent to a standard pipeline with no joint prediction.

2. The SIMPLETAG model replicates the model of Bohnet and Nivre (2012) with kp = 2, km = 1, and a score threshold for tags of 0.25, meaning that the second best tag is only considered if its score is less than 0.25 below that of the best tag. We use two-step beam pruning, where we first extract the 40 highest scoring hypotheses with distinct dependency trees and then add the 8 highest scoring remaining hypotheses (normally morphological variants of hypotheses already included) for a total beam size of 48. This
model performs joint tagging and parsing but relies on 1-best morphological features.

3. The COMPLEXTAG model is like SIMPLETAG except that we let tags represent the concatenation of ordinary tags and morphological descriptions (and retrain the preprocessing tagger on this representation). This model performs joint morphological and syntactic analysis as joint tagging and parsing with a fine-grained tag set.

4. The JOINT model has kp = km = 2, meaning that the tag and the morphological description can be selected independently by the parser. For morphological descriptions, we use a score threshold of 0.1. For beam pruning, we generalize the previous method by first extracting the 40 highest-scoring hypotheses with distinct dependency trees. For each of these, we then find the highest-scoring hypothesis with the same dependency tree but different tags or morphological features, storing these in two temporary lists TMPp, for hypotheses that differ with respect to tags, and TMPm, for hypotheses that differ only with respect to morphological features. Finally, we extract the 8 highest-scoring hypotheses from each of TMPp and TMPm and add them to the beam for a total beam size of 56. This model performs joint prediction of part-of-speech tags, morphological descriptions and dependency relations (but still relies on 1-best lemmas, like all the other models).

The procedures for beam pruning may appear both complex and ad hoc, especially for the JOINT model, but are motivated by the need to achieve a balance between morphological and syntactic ambiguity in the set of hypotheses maintained. As explained by Bohnet and Nivre (2012), just maintaining a single beam does not give enough variety in the beam. The method used for the JOINT model is one way of generalizing this technique to a fully joint model, but other strategies are certainly conceivable. Another point that may be surprising is the choice to keep kp and km as low as 2, which is fairly close to a pipeline model. Bohnet and Nivre (2012) experimented with higher values for the tag threshold but found no improvement in accuracy, and our own preliminary experiments confirmed this trend for morphological descriptions. In Section 7, we present an empirical analysis that gives further support for this choice, at least for the languages considered in this paper. Note also that the choice is not motivated by efficiency concerns, since increasing the values of kp and km has only a marginal effect on running time, as explained in Section 2.4. Finally, the choice not to consider k-best lemmas is dictated by the fact that our lemmatizer only provides a 1-best analysis.

For the first three models, we use the same feature representations as Bohnet and Nivre (2012),9 consisting of their adaptation of the features used by Zhang and Nivre (2011), the graph completion features of Bohnet and Kuhn (2012), and the special features over k-best tags introduced specifically for joint tagging and parsing by Bohnet and Nivre (2012). For the JOINT model, we simply add features over the k-best morphological descriptions analogous to the features over k-best tags.10

9 See http://stp.lingfil.uu.se/~nivre/exp/emnlp12.html.
10 A complete description of our feature representations is available at http://stp.lingfil.uu.se/~nivre/exp/tacl13.html.

Experimental results for these four models can be found in Table 2. From the PIPELINE results, we see that the 1-best accuracy of the preprocessing tagger ranges from 95.0 (Finnish) to 99.2 (Czech) for POS, and from 89.4 (Finnish) to 96.5 (Hungarian) for MOR. The lemmatizer does a good job for four of the languages (93.9–97.9) but has really poor performance on Finnish (73.7). With respect to syntactic accuracy, the PIPELINE system achieves LAS ranging from 79.9 (Finnish) to 91.8 (German) and UAS ranging from 84.4 to 93.7. It is interesting to note that the highest PMD score, which requires both morphology and syntax to be completely correct, is observed for Hungarian (86.2).

Turning to the results for SIMPLETAG, we note that our results are consistent with those reported by Bohnet and Nivre (2012), with small but consistent improvements in POS and UAS/LAS (and in the compound metrics PM and PMD) for most languages. However, the improvement in the PMD score is statistically significant only for Hungarian and Russian (p < 0.01). By contrast, the results for COMPLEXTAG confirm our hypothesis that merging tags and morphological descriptions into a single tag is not an effective way to do joint morphological and
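The two-step beam pruning used for the JOINT model can be made concrete with a short sketch. The hypothesis representation (a dict with tree, tags, morph, and score fields) and the function name are illustrative assumptions, not the actual implementation; the default sizes 40 and 8 follow the description above.

```python
# Sketch of the JOINT model's two-step pruning: first keep the k_syn best
# hypotheses with distinct dependency trees, then add up to k_var tag
# variants (TMPp) and k_var morphology-only variants (TMPm) of those trees.

def prune_joint(hyps, k_syn=40, k_var=8):
    ranked = sorted(hyps, key=lambda h: h['score'], reverse=True)
    beam, seen_trees = [], set()
    for h in ranked:                        # step 1: distinct trees
        if h['tree'] not in seen_trees:
            beam.append(h)
            seen_trees.add(h['tree'])
        if len(beam) == k_syn:
            break
    by_tree = {h['tree']: h for h in beam}  # best hypothesis per kept tree
    kept = {id(h) for h in beam}
    tmp_p, tmp_m = [], []                   # TMPp and TMPm of the text
    seen_p, seen_m = set(), set()
    for h in ranked:                        # step 2: best variant per tree
        if id(h) in kept:
            continue
        best = by_tree.get(h['tree'])
        if best is None:
            continue                        # tree not in the syntactic beam
        if h['tags'] != best['tags'] and h['tree'] not in seen_p:
            tmp_p.append(h)                 # differs in part-of-speech tags
            seen_p.add(h['tree'])
        elif (h['tags'] == best['tags'] and h['morph'] != best['morph']
              and h['tree'] not in seen_m):
            tmp_m.append(h)                 # differs only in morphology
            seen_m.add(h['tree'])
    return beam + tmp_p[:k_var] + tmp_m[:k_var]
```

With the default parameters this yields at most 40 + 8 + 8 = 56 hypotheses, matching the total beam size given for the JOINT model.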
Czech        POS   MOR   LEM   UAS   LAS   PM    PMD
PIPELINE     99.2  93.2  95.5  88.5  83.1  93.0  78.4
SIMPLETAG    99.2  93.2  95.5  88.5  83.2  93.1  78.4
COMPLEXTAG   98.8  93.3  95.5  87.1  81.2  93.3  77.3
JOINT        99.2  93.7  95.5  88.7  83.3  93.7  79.2
LEXHARD      99.3  93.0  94.3  88.7  83.4  92.9  78.3
LEXSOFT      99.4  94.5  95.9  88.8  83.5  94.4  79.8
CLUSTER      99.4  94.6  96.0  89.0  83.7  94.5  80.0
ORACLE       99.8  97.0  –     92.7  89.9  94.1  84.7
GOLD         –     –     –     89.3  84.5  –     –

Finnish      POS   MOR   LEM   UAS   LAS   PM    PMD
PIPELINE     95.0  89.4  73.7  84.4  79.9  88.8  71.5
SIMPLETAG    95.6  89.4  73.7  84.8  80.5  89.0  73.0
COMPLEXTAG   93.0  84.9  73.7  80.1  74.5  84.7  65.3
JOINT        95.4  89.2  73.7  84.8  80.6  89.1  72.6
LEXHARD      95.8  91.6  93.4  86.1  82.5  91.1  75.9
LEXSOFT      95.5  91.9  93.0  86.0  82.3  91.6  75.7
CLUSTER      95.7  92.0  94.4  86.6  83.1  91.4  75.8
ORACLE       98.0  94.8  –     91.3  89.4  91.8  83.1

German       POS   MOR   LEM   UAS   LAS   PM    PMD
PIPELINE     97.6  90.0  97.9  93.7  91.8  89.1  82.9
SIMPLETAG    98.0  90.0  97.9  93.8  91.9  89.1  83.0
COMPLEXTAG   97.3  87.6  97.9  92.3  90.1  86.9  79.7
JOINT        98.1  90.8  97.9  93.9  92.0  90.0  83.9
LEXHARD      97.0  65.6  97.9  93.7  92.0  64.6  61.3
LEXSOFT      98.4  91.9  97.9  94.0  92.1  91.2  85.1
CLUSTER      98.4  92.5  97.9  94.1  92.4  91.7  85.9
ORACLE       99.6  96.3  –     96.3  95.9  91.9  88.8
GOLD         –     –     –     94.2  92.7  –     –

Hungarian    POS   MOR   LEM   UAS   LAS   PM    PMD
PIPELINE     97.6  96.5  93.9  91.0  88.4  96.1  86.2
SIMPLETAG    97.8  96.5  93.9  91.3  88.8  96.1  86.6
COMPLEXTAG   97.5  90.9  93.9  90.6  87.7  90.9  81.2
JOINT        97.8  96.4  93.7  91.3  88.9  96.2  86.7
LEXHARD      98.5  97.3  99.0  91.5  89.1  97.1  87.4
LEXSOFT      98.5  97.6  99.0  91.4  89.1  97.4  87.7
CLUSTER      98.5  97.6  99.0  91.7  89.3  97.4  88.0
ORACLE       99.7  99.3  –     94.6  93.3  97.6  91.7
GOLD         –     –     –     91.9  89.8  –     –

Russian      POS   MOR   LEM   UAS   LAS   PM    PMD
PIPELINE     98.4  94.0  96.1  92.6  87.4  92.6  82.7
SIMPLETAG    98.5  94.0  96.1  92.6  87.5  92.6  82.9
COMPLEXTAG   97.9  91.4  96.1  91.2  85.1  90.8  79.0
JOINT        98.5  94.4  96.1  92.8  87.6  92.8  83.5
LEXHARD      98.9  95.1  94.0  93.0  88.0  94.5  84.1
LEXSOFT      98.8  95.7  96.5  92.9  87.7  95.1  84.5
CLUSTER      98.8  95.7  96.6  93.0  87.9  95.7  84.7
ORACLE       99.9  98.6  –     95.5  92.9  95.2  89.0
GOLD         –     –     –     94.0  89.1  –     –

Table 2: Test set results for all models. ORACLE = oracle scores for LEXSOFT; GOLD = accuracy for PIPELINE with gold POS, MOR, LEM. Bold marks best result per column and language (excluding ORACLE and GOLD).

syntactic analysis. Here, we see a significant drop in most scores for all languages, but in particular in the accuracy of morphological descriptions (MOR), where the score drops by 5.6 percentage points for Hungarian, 4.5 for Finnish, 2.6 for Russian, and 2.4 for German. The only exception is Czech, where MOR and PM actually improve slightly, but this comes at the expense of a substantial drop in dependency accuracy. In any case, the decrease in PMD is highly significant for all languages (p < 0.01).

Finally, we see that the JOINT model, where tags and morphological descriptions are predicted separately during the parsing process, gives significant improvements in MOR accuracy compared to the PIPELINE and SIMPLETAG models for German (+0.8), Czech (+0.5), and Russian (+0.4), with marginal improvements also in the syntactic UAS and LAS scores. For Finnish and Hungarian, on the other hand, there is actually a small drop in accuracy (and for Finnish also a drop in POS accuracy compared to SIMPLETAG). Interestingly, however, for both these languages there is nevertheless a small improvement in the joint PM score, indicating that the JOINT model in general does a better job at selecting a valid complete morphological description than the SIMPLETAG model. Since Finnish and Hungarian are the most morphologically complex languages, it is likely that the lack of a strong positive effect is due in part to sparse data, especially for Finnish where the training set is small. As we shall see in the next section, this problem can be partly overcome through the use of external lexical resources. Still, the improvement in the PMD score over the other three models is highly significant for all languages except Finnish (p < 0.01).

5 Lexical Constraints

Our starting point in this section is the JOINT model, which gave the best overall accuracy score (PMD) for all languages except Finnish. To this model we now add constraints derived from a morphological lexicon that maps each word form to a set of possible tags, morphological descriptions and lemmas. We explore two different ways of integrating these constraints:

1. In the LEXHARD model, we use the lexicon to derive hard constraints and filter out tags and
morphological descriptions that are not in the lexicon. More precisely, for word forms that are covered by the lexicon, we let the preprocessing tagger select the kp best tags and km best morphological descriptions that are in the lexicon. We do this both during training and parsing, and we use exactly the same features and beam handling as for the JOINT model in the previous section.

2. In the LEXSOFT model, we instead use soft lexical constraints by adding features that encode whether a tag or morphological description is in the lexicon or not. Again, we add these features both to the preprocessing tagger and to the joint parser, which otherwise remain exactly as before.

One additional modification that we make for both the LEXHARD and the LEXSOFT model is to completely rely on the external lexicon for the prediction of lemmas. After the parser has selected a tag and morphological description for a word, we simply predict the corresponding lemma from the lexicon, breaking ties arbitrarily in the very few cases where the word form, tag and morphological description do not determine a unique lemma, and leaving the lemma empty for word forms that are not contained in the lexicon. This means that, in contrast to the purely data-driven models, the lexicon-enriched models predict the complete morphological analysis jointly with parsing (with the lemma being derived deterministically from the tag and the morphological description). We make an exception only for German, where the lexicon provides lemmas that would require further disambiguation and where we therefore continue to use the data-driven lemmatizer.

As can be seen in Table 2, the results for the LEXHARD model are somewhat mixed. For Finnish, we see a dramatic improvement of the LEM score (from 73.7 to 93.4), indicating that the rule-based morphological analyzer is vastly superior to the data-driven lemmatizer for Finnish. There is also a very nice boost to the MOR score (+2.2) and a smaller improvement on POS (+0.4). These improvements also lead to higher syntactic accuracy, with LAS increasing from 80.6 to 82.5 and UAS from 84.8 to 86.1. For Hungarian, we have nice improvements of the LEM score (+5.3), the MOR score (+0.9) and the POS score (+0.7), but only small improvements in LAS/UAS. For Russian, we observe improvements in POS and MOR, a small drop in LEM, and again minor improvements in UAS/LAS. For Czech and German, finally, we see a drop in MOR (and in LEM for Czech and POS for German), while UAS/LAS is largely unaffected. For German, this result can probably be explained largely by the fact that the morphological descriptions in the lexicon are not fully compatible with those in the treebank, as explained in Section 3. Similarly, for Czech, we think the drop in the LEM score is due to discrepancies caused by updates in the dictionary version released in 2013, deviating from the previously published treebank.

In general, the LEXSOFT model performs considerably better, achieving the best results so far for most languages and metrics. The only clear exception is Finnish, where it performs slightly worse than LEXHARD (but better than all the other models). In addition, there is a marginal drop in POS and LAS/UAS for Russian and in UAS for Hungarian (but again only compared to LEXHARD). The results are particularly striking for German, where the soft lexical constraints are clearly beneficial (especially for the MOR score) despite not being quite compatible with the morphological descriptions in the training set. In terms of statistical significance, LEXSOFT outperforms the JOINT model with respect to the PMD score for all languages (p < 0.01). It is also significantly better than LEXHARD for all languages except Finnish (p < 0.01).

6 Word Clusters

Finally, we add word cluster features to the best model for each language (LEXHARD for Finnish, LEXSOFT for the others).11 We use Brown clusters (Brown et al., 1992), with 800 clusters for all languages, and we use the same feature representation as Bohnet and Nivre (2012). The results in Table 2 show small but consistent improvements in almost all metrics for all languages, confirming the benefit of cluster features for morphologically rich languages. It is worth noting that we see the biggest improvement for Finnish, the language with the smallest training set and therefore most likely to

11 The best model was selected according to results on the dev set (cross-validation on the training set for Finnish).
sufferfromsparsedata,wherethesyntacticaccu-racyimprovessubstantially(LAS+0.6,UAS+0.5)andlemmatizationevenmore(LEM+1.0).WealsoseeaniceimprovementinmorphologicalaccuracyforGerman(MOR+0.6,PM+0.5),whichmayberelatedtothelackofacompatiblemorphologicalanalyzerforthislanguageorsimplytothefactthattheclustersarederivedfromamuchlargercorpusforGermanthanfortheotherlanguages.ThePMDimprovementisstatisticallysignificantforalllan-guagesexceptFinnish(P<0.01).7DiscussionTheexperimentalresultsgenerallysupportthecon-clusionthatjointpredictionofmorphologyandsyn-tax,wheremorphologyincludesrichmorphologi-calfeaturesaswellasbasicpart-of-speechtags,im-provesbothmorphologicalandsyntacticaccuracy.Theeffectisespeciallyclearonthejointevalua-tionmetricsPMandPMD,whichindicatesthatthejointmodelproducesmoreinternallyconsistentrep-resentations.However,wealsoseeevidencethatthejointmodelmaysufferfromdatasparsity,asinthecaseofFinnish,whereamodelthatonlypre-dictspart-of-speechtagsjointlywithdependencyrelationsachievebetteraccuracyonsomemetrics.However,eveninthiscase,thejointmodelhasthebestresultsonthejointevaluationmetrics.Thesecondconclusionthatcanbedrawnfromtheexperimentsisthattheuseofanexternallexiconisaneffectivewayofmitigatingthesparsedataprob-lemandtherebyimprovingaccuracy.Ingeneral,however,itismoreeffectivetoaddthelexicalcon-straintsintheformoffeatures,orsoftconstraints,thantoapplythemashardconstraintsanddiscardallanalysesthatarenotlicensedbythelexicon.Inpar-ticular,thisisausefulstrategywhenthelexicalre-sourceisnotcompletelycompatiblewiththeannota-tioninthetrainingset,asseeninthecaseofGermanand(toalesserextent)Czech.TheonlyexceptiontothisgeneralizationisagainFinnish,wherethehardconstraintmodelworksmarginallybetter(exceptfortheMORandPMmetrics),whichmayagainindi-catethatthetrainingsetistoosmalltomakeopti-maluseoftheadditionalfeatures.Still,thesoftcon-straintmodelimprovessubstantiallyoverthemodelswithoutlexicalresourcesalsoforFinnish.Finally,ourexperimentsconfirmthatfeaturesbasedondistributionalword
clusters have a positive impact on syntactic accuracy, but little or no impact on morphological accuracy. This is consistent with previous findings in the literature, mainly from English (Koo et al., 2008; Sagae and Gordon, 2009), and it is interesting to see that it holds also for richly inflected languages and when added on top of features derived from external lexical resources.

One issue worth discussing is the choice to allow the joint model to consider at most 2 tags and 2 morphological descriptions per word, which may seem overly restrictive and very close to a pipeline model. As already mentioned, this was motivated by the results of Bohnet and Nivre (2012), who explored higher values without seeing any improvements, as well as by our own preliminary experiments. In an attempt to shed further light on this issue, we computed oracle scores for the LEXSOFT model, which uses soft lexical constraints but no cluster features. The oracle scores for POS and MOR tell us how often the correct analysis is actually included in the input to the joint model, while the oracle scores for UAS and LAS report the score of the best dependency tree present in the beam at termination. The results, reported in Table 2, show that the oracle scores are very high, especially for part-of-speech tags (98.0–99.9) but also for morphological descriptions (94.8–99.3). Hence, very few correct analyses are pruned away when setting the kp and km parameters to 2, and increasing the search space further is therefore unlikely to improve accuracy.

For further analysis, Table 2 reports the UAS/LAS scores of the PIPELINE system when given gold standard tags, morphological descriptions and lemmas as input.12 Viewing this as an upper bound on improvements in parsing accuracy for the joint models, and comparing with the LEXSOFT model, which like PIPELINE does not use cluster features, we see that joint prediction with (soft) lexical constraints gives an average error reduction of about 40% for UAS and about 32% for LAS, which is substantial especially given that the error reduction in the PM score (compared to the perfect morphology underlying the GOLD scores) is only about

12 Finnish had to be excluded because gold standard morphological annotation exists only for a small subset of the treebank.
27.5%. It is also worth pointing out that these improvements come at a very modest cost in computational efficiency, as the runtimes for the LEXSOFT model are on average only 15% higher than for the PIPELINE model, despite having a 40% larger beam size.13 Interestingly, however, for all languages the LAS/UAS scores are actually higher for ORACLE than for GOLD, indicating that the LEXSOFT model has in its final beam dependency trees that are better than the 1-best trees predicted with perfect morphological input, and suggesting that there is room for further improvement of the scoring model.

The final results obtained with joint prediction of morphology and syntax, external lexical constraints, and cluster features represent a new state of the art for syntactic dependency parsing for all five languages. For Czech, the best previous UAS on the standard train-test split of the PDT is 87.32, reported by Koo et al. (2010) with a parser using non-projective head automata and dual decomposition, while the best LAS is 78.82, from Nilsson et al. (2006), using a greedy arc-eager transition-based system with pseudo-projective parsing. Our best results are 1.7 percentage points better for UAS (89.0) and almost 5 percentage points better for LAS (83.7).14 For Finnish, the only previous results are from Haverinen et al. (2013), who achieve 81.01 LAS and 84.97 UAS with the graph-based parser of Bohnet (2010). We get substantial improvements with 83.1 LAS and 86.6 UAS. We also improve slightly over their best POS score, obtained with the HunPos tagger (Halácsy et al., 2007) together with the OMorFi analyzer (95.7 vs. 95.4). For German, the best previous results on the same train-test split are from Seeker and Kuhn (2012), using the graph-based parser of Bohnet (2010) in a pipeline architecture. With the same evaluation setup as in this paper, they achieve 91.50 LAS and 93.48 UAS – in the original paper, they only report results without punctuation – to be compared with 92.4 LAS and 94.1 UAS for our best model.15 In addition, our POS score of 98.4 is the highest reported for a tagger trained only on the Tiger Treebank, outperforming the previous best from Bohnet and Nivre (2012) by 0.3 percentage points. The only previous results on Hungarian using the same version of the treebank are from Farkas et al. (2012), who report 87.2 LAS and 90.1 UAS for the graph-based parser of Bohnet (2010). Our best results improve labeled accuracy by 2.1 points (89.3 LAS) and unlabeled accuracy by 1.6 points (91.7 UAS), which is again quite substantial. For Russian, Boguslavsky et al. (2011) report 86.0 LAS and 90.0 UAS using the rule-based ETAP-3 parser with an added statistical model and joint morphological and syntactic disambiguation. The scores are not strictly comparable, because we use a more recent version of the SynTagRus treebank (May 2013 vs. April 2011), but our results nevertheless show substantial improvements, in particular for UAS (93.0) but also for LAS (88.0).

8 Concluding Remarks

We have presented the first system that performs full morphological disambiguation and labeled non-projective dependency parsing in a joint model, and we have demonstrated its usefulness for parsing richly inflected languages. A thorough empirical investigation of joint prediction models, rule-based lexical constraints, and distributional word clusters has shown substantial improvements in accuracy for five languages. In the future, we hope to conduct a detailed error analysis for all languages, which may give us more insight into the benefits of different components and hopefully pave the way for further improvements.

Acknowledgments

Work partly funded by the projects LM2010013 and LH12093 of the MEYS of the Czech Republic and the National Excellence Program of the State of Hungary (TÁMOP 4.2.4.A/2-11-1-2012-0001).

13 LEXSOFT averages 0.132 ms per sentence on an Intel i7-3930K processor with 6 cores, against 0.112 ms for PIPELINE.

14 It is worth noting that there are a number of more recent parsing results for Czech, but they all use a different test set (and often a different training set), usually from one of the CoNLL shared tasks in 2006 (Buchholz and Marsi, 2006), 2007 (Nivre et al., 2007) and 2009 (Hajič et al., 2009). For the 2009 dataset, the best results are 83.73 LAS and 88.82 UAS from Bohnet and Nivre (2012), who use the SIMPLETAG model but with a beam size of 80. In our setup, we outperform this model by 0.5 points in both LAS and UAS.

15 As in the case of Czech, there are many recent results for German based on the CoNLL 2009 datasets, but the previous best is with the SIMPLETAG model of Bohnet and Nivre (2012), which we outperform by 0.5/0.3 points in LAS/UAS.
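The error-reduction figures cited in the discussion (about 40% for UAS and 32% for LAS relative to the gold-morphology upper bound) correspond to the fraction of the gap between the PIPELINE score and the upper-bound score that the joint model closes. A minimal sketch of that arithmetic follows; the scores passed in are placeholder values for illustration, not the paper's actual figures.

```python
def gap_reduction(pipeline: float, improved: float, upper_bound: float) -> float:
    """Fraction of the pipeline-to-upper-bound accuracy gap closed by the improved model."""
    gap = upper_bound - pipeline
    if gap <= 0:
        raise ValueError("upper bound must exceed the pipeline score")
    return (improved - pipeline) / gap

# Placeholder scores: pipeline UAS 90.0, joint-model UAS 91.2,
# UAS with gold morphology as input 93.0
print(round(gap_reduction(90.0, 91.2, 93.0), 2))  # prints 0.4, i.e. 40% of the gap closed
```

The same calculation applied per metric (UAS, LAS, PM) makes the relative sizes of the improvements directly comparable even when the absolute gaps differ.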
References

Ju. Apresian, I. Boguslavsky, L. Iomdin, A. Lazursky, V. Sannikov, V. Sizov, and L. Tsinman. 2003. ETAP-3 linguistic processor: A full-fledged NLP implementation of the MTT. In Proceedings of the First International Conference on Meaning-Text Theory, pages 279–288.

Igor Boguslavsky, Svetlana Grigorieva, Nikolai Grigoriev, Leonid Kreidlin, and Nadezhda Frid. 2000. Dependency treebank for Russian: Concept, tools, types of information. In Proceedings of the 18th International Conference on Computational Linguistics (COLING), pages 987–991.

Igor Boguslavsky, Ivan Chardin, Svetlana Grigorieva, Nikolai Grigoriev, Leonid Iomdin, Leonid Kreidlin, and Nadezhda Frid. 2002. Development of a dependency treebank for Russian and its possible applications in NLP. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC), pages 852–856.

Igor Boguslavsky, Leonid Iomdin, Victor Sizov, Leonid Tsinman, and Vadim Petrochenkov. 2011. Rule-based dependency parser refined by empirical and corpus statistics. In Proceedings of the International Conference on Dependency Linguistics, pages 318–327.

Alena Böhmová, Jan Hajič, Eva Hajičová, and Barbora Hladká. 2003. The Prague Dependency Treebank: A three-level annotation scenario. In Anne Abeillé, editor, Treebanks: Building and Using Parsed Corpora, pages 103–127. Kluwer.

Bernd Bohnet and Jonas Kuhn. 2012. The best of both worlds – a graph-based completion model for transition-based parsers. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 77–87.

Bernd Bohnet and Joakim Nivre. 2012. A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 1455–1465.

Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pages 89–97.

Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, and George Smith. 2002. TIGER treebank. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories (TLT), pages 24–42.

Peter F. Brown, Vincent J. Della Pietra, Peter V. de Souza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18:467–479.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 149–164.

Shay B. Cohen and Noah A. Smith. 2007. Joint morphological and syntactic disambiguation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 208–217.

Shay B. Cohen, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Exact inference for generative probabilistic non-projective dependency parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1234–1245.

Michael Collins and Brian Roark. 2004. Incremental parsing with the perceptron algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 112–119.

Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–8.

Brooke Cowan and Michael Collins. 2005. Morphology and reranking for the statistical parsing of Spanish. In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 795–802.

Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585.

Richárd Farkas, Veronika Vincze, and Helmut Schmid. 2012. Dependency parsing of Hungarian: Baseline results and challenges. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 55–65.

Andrea Gesmundo, James Henderson, Paola Merlo, and Ivan Titov. 2009. A latent variable model of synchronous syntactic-semantic parsing for multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 37–42.

Yoav Goldberg and Michael Elhadad. 2013. Word segmentation, unknown-word resolution, and morphological agreement in a Hebrew parsing system. Computational Linguistics, 39:121–160.

Yoav Goldberg and Reut Tsarfaty. 2008. A single generative model for joint morphological segmentation and
syntactic parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), pages 371–379.

Jan Hajič and Barbora Hladká. 1998. Tagging inflective languages: Prediction of morphological categories for a rich, structured tagset. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL) and the 17th International Conference on Computational Linguistics (COLING), pages 483–490.

Jan Hajič, Barbora Vidová Hladká, Jarmila Panevová, Eva Hajičová, Petr Sgall, and Petr Pajas. 2001. Prague Dependency Treebank 1.0. LDC, 2001T10.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task, pages 1–18.

Jan Hajič. 2000. Morphological tagging: Data vs. dictionaries. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 94–101.

Péter Halácsy, András Kornai, and Csaba Oravecz. 2007. HunPos – an open source trigram tagger. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics: Companion Volume, Proceedings of the Demo and Poster Sessions, pages 209–212.

Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 1216–1224.

Katri Haverinen, Jenna Nyblom, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, Tapio Salakoski, and Filip Ginter. 2013. Building the essential resources for Finnish: The Turku Dependency Treebank. Language Resources and Evaluation.

Liang Huang and Kenji Sagae. 2010. Dynamic programming for linear-time incremental parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1077–1086.

Leonid Iomdin and Viktor Sizov. 2008. Lexicographer's companion: A user-friendly software system for enlarging and updating high-profile computerized bilingual dictionaries. In Lexicographic Tools and Techniques, MONDILEX First Open Workshop, pages 42–54.

Terry Koo, Xavier Carreras, and Michael Collins. 2008. Simple semi-supervised dependency parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), pages 595–603.

Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag. 2010. Dual decomposition for parsing with non-projective head automata. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1288–1298.

Sandra Kübler, Ryan McDonald, and Joakim Nivre. 2009. Dependency Parsing. Morgan and Claypool.

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 673–682.

John Lee, Jason Naradowsky, and David A. Smith. 2011. A discriminative model for joint morphological disambiguation and dependency parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 885–894.

Jens Nilsson, Joakim Nivre, and Johan Hall. 2006. Graph transformations in data-driven dependency parsing. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 257–264.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task of EMNLP-CoNLL 2007, pages 915–932.

Joakim Nivre, Marco Kuhlmann, and Johan Hall. 2009. An improved oracle for dependency parsing with online reordering. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), pages 73–76.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

Joakim Nivre. 2004. Incrementality in deterministic dependency parsing. In Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together (ACL), pages 50–57.

Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL-IJCNLP), pages 351–359.
Tommi A. Pirinen. 2011. Modularisation of Finnish finite-state language description – towards wide collaboration in open source development of a morphological analyser. In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA), pages 299–302.

Kenji Sagae and Andrew S. Gordon. 2009. Clustering words by syntactic similarity improves dependency parsing of predicate-argument structures. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT), pages 192–201.

Helmut Schmid, Arne Fitschen, and Ulrich Heid. 2004. SMOR: A German computational morphology covering derivation, composition and inflection. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC), pages 1263–1266.

Wolfgang Seeker and Jonas Kuhn. 2012. Making ellipses explicit in dependency conversion for a German treebank. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), pages 3132–3139.

Wolfgang Seeker, Bernd Bohnet, Lilja Øvrelid, and Jonas Kuhn. 2010. Informed ways of improving data-driven dependency parsing for German. In Coling 2010: Posters, pages 1122–1130.

Johanka Spoustová and Miroslav Spousta. 2012. A high-quality web corpus of Czech. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pages 311–315.

Ivan Titov and James Henderson. 2007. A latent variable model for generative dependency parsing. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT), pages 144–155.

Viktor Trón, Péter Halácsy, Péter Rebrus, András Rung, Eszter Simon, and Péter Vajda. 2006. Morphdb.hu: Hungarian lexical database and morphological grammar. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), pages 1670–1673.

Reut Tsarfaty, Djamé Seddah, Yoav Goldberg, Sandra Kuebler, Yannick Versley, Marie Candito, Jennifer Foster, Ines Rehbein, and Lamia Tounsi. 2010. Statistical parsing of morphologically rich languages (SPMRL): What, how and whither. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 1–12.

Reut Tsarfaty, Djamé Seddah, Sandra Kübler, and Joakim Nivre. 2013. Parsing morphologically rich languages: Introduction to the special issue. Computational Linguistics, 39:15–22.

Reut Tsarfaty. 2006. Integrated morphological and syntactic disambiguation for Modern Hebrew. In Proceedings of the COLING/ACL 2006 Student Research Workshop, pages 49–54.

Tamás Váradi. 2002. The Hungarian National Corpus. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC), pages 385–389.

Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 195–206.

Yue Zhang and Stephen Clark. 2008. A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 562–571.

Yue Zhang and Joakim Nivre. 2011. Transition-based parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL).

Yue Zhang and Joakim Nivre. 2012. Analyzing the effect of global learning and beam-search on transition-based dependency parsing. In Proceedings of COLING 2012: Posters, pages 1391–1400.