Transactions of the Association for Computational Linguistics, vol. 4, pp. 343–356, 2016. Action Editor: Joakim Nivre.
Submission batch: 1/2016; Revision batch: 4/2016; Published 7/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data

Kristina Gulordava and Paola Merlo
Department of Linguistics, University of Geneva
5 Rue de Candolle, CH-1211 Genève 4
kristina.gulordava@unige.ch, paola.merlo@unige.ch

Abstract

The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effects of word order of a language on dependency parsing performance, while controlling for confounding treebank properties. The method uses artificially-generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data on twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.

1 Introduction

Fair comparative performance evaluation across languages and their treebanks is one of the difficulties for work on multi-lingual parsing (Buchholz and Marsi, 2006; Nivre et al., 2007; Seddah et al., 2011). The differences in parsing performance can be the result of disparate properties of treebanks (such as their size or average sentence length), choices in annotation schemes, and the linguistic properties of languages. Despite recent attempts to create and apply cross-linguistic and cross-framework evaluation procedures (Tsarfaty et al., 2011; Seddah et al., 2013), there is no commonly used method of analysis of parsing performance which accounts for different linguistic and extra-linguistic factors of treebanks and teases them apart.

When investigating possible causal factors for observed phenomena, one powerful method, if available, consists in intervening on the postulated causes to observe possible changes in the observed effects. In other words, if A causes B, then changing A or properties of A should result in an observable change in B. This interventionist approach to the study of causality creates counterfactual data and a type of controlled modification that is widespread in experimental methodology, but that is not widely used in fields that rely on observational data, such as corpus-driven natural language processing.

In analyses of parsing performance, it is customary to manipulate and control word-level features, such as part-of-speech tags or morphological features. These types of features can be easily omitted or modified to assess their contribution to parsing performance. However, higher-order features, such as linear word order precedence properties, are much harder to define and to manipulate. A parsing performance analysis based on controlled modification of word order, in fact, has not been reported previously. We propose such a method based on word order permutations which allows us to manipulate word order properties analogously to familiar word-level properties and study their effect on parsing performance.

Specifically, given a dependency treebank, we obtain new synthetic data by permuting the original order of words in the sentences, keeping the unordered
dependency tree constant. These permuted sentences are not necessarily grammatical in the original language. They constitute an alternative “language” which forms a minimal pair with the original one, where the only changed property is the order of words, but all the other properties of the unordered tree and the confounding variables between the two datasets are kept constant, such as the size of the training data, the average sentence length, the number of PoS tags and the dependency labels.

We perform two types of word order permutations to the treebanks in our sample: a permutation which minimises the lengths of the dependencies in a dependency tree and a permutation which minimises the variability of word order. We then compare how the parsing performances on the original and the permuted trees vary in relation to the quantified measures of the dependency length and word order variation properties of the treebanks. To quantify dependency length, we use the ratio of minimisation of the length of dependencies between words in the tree (dependency length minimisation, DLM (Gildea and Temperley, 2010)). To quantify the property intuitively referred to as variability of word order, we use the entropy of the linear precedence ordering between a head and a child in dependency arcs (Liu, 2010).

The reason to concentrate on these two word order properties comes from previous parsing results. Morphologically-rich languages are known to be hard for parsing, as rich morphology increases the percentage of new words in the test set (Nivre et al., 2007; Tsarfaty et al., 2010). These languages, however, also often exhibit very flexible word order. It has not so far been investigated how much rich morphology contributes to parsing difficulty compared to the difficulty introduced by word order variation in such languages. The length of the dependencies in the tree has also been shown to affect performance: almost all types of dependency parsers, in different measure, show degraded performance for longer sentences and longer dependencies (McDonald and Nivre, 2011).[1] We use arc direction entropy and DLM ratio, respectively, as the measures of these two word order properties because they are formally defined in the previous literature and can be quantified on a dependency treebank in any language.

[1] But see Titov and Henderson (2007) for an exception and comparison to Malt.

To preview our results, in a set of pairwise comparisons between original and permuted treebanks, we confirm the influence of word order variability and dependency length on parsing performance, at the large scale provided by fourteen different treebanks across twelve different languages.[2] Our results suggest, in addition, that word order entropy applies a stronger negative pressure on parsing performance than longer dependencies. Finally, on an example of one treebank, we show how our method can be extended to provide finer-grained analyses at the sentence level and relate the parsing errors to properties of the parsing architecture.

[2] Polish, Italian, Finnish, Spanish, French, English, Bulgarian, Latin (Vulgate, Cicero), Dutch, Ancient Greek (New Testament, Herodotus), German and Persian.

2 Parsing analysis using synthetic data

In this section, we introduce our new approach to using synthetic data for cross-linguistic analysis of parsing performance.

2.1 Methodology

Our experiments with artificial data consist in modifying a treebank $T$ to create its minimal pair $T'$ and evaluating parsing performance by comparing these pairs of treebanks. We create several kinds of artificial treebanks in the same manner: each sentence in $T'$ is a permutation of the words of the original sentence in $T$. We permute words in various ways according to the word order property whose effect on parsing we want to analyse. Importantly, we only change the order of words in a sentence. In contrast, the dependency tree structure of a permuted sentence in $T'$ is the same as in the original sentence in $T$.

For each treebank in our sample of languages and a type of permutation, we conduct two parsing evaluations: $T_{Train} \rightarrow T_{Test}$ and $T'_{Train} \rightarrow T'_{Test}$. The training-test data split for $T$ and $T'$ is always the same, that is $T'_{Train} = Permuted(T_{Train})$ and $T'_{Test} = Permuted(T_{Test})$. The parsing performance is measured as Unlabeled and Labeled Attachment Scores (UAS and LAS), the proportion of correctly attached arcs in the unlabelled or labelled tree, respectively.
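The two attachment scores defined above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each token is represented as a (head index, dependency label) pair; the function name `attachment_scores` and the toy labels are hypothetical and are not taken from the parsers or evaluation scripts used in the paper.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# This representation is an assumption made for the sketch.
Arc = Tuple[int, str]


def attachment_scores(gold: List[List[Arc]],
                      pred: List[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as proportions over all tokens of a test set."""
    total = correct_head = correct_head_and_label = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                # Unlabelled attachment: only the head has to match.
                correct_head += 1
                if g_label == p_label:
                    # Labelled attachment: head and label both match.
                    correct_head_and_label += 1
    if total == 0:
        raise ValueError("empty test set")
    return correct_head / total, correct_head_and_label / total


# Toy example: all three heads are correct, one label is wrong.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "iobj")]]
uas, las = attachment_scores(gold, pred)
```

Under this sketch, comparing a treebank with its permuted counterpart amounts to calling the function once on the output for $T_{Test}$ and once on the output for $T'_{Test}$, then subtracting the two UAS values.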
Given the training-testing setup, the differences in unlabelled attachment scores $UAS(T_{Test}) - UAS(T'_{Test})$ can be directly attributed to the differences in word order properties $o$ between $T$ and $T'$, abstracting away from other treebank properties $h$. More formally, we assume that $UAS(T) = f(o_T, h_T)$ and $UAS(T') = f(o_{T'}, h_T)$. Except for word order properties $o_T$ and $o_{T'}$, the two equations share all other treebank properties $h_T$ (such as size, average dependency length, and size of the PoS tagset), and $f$ is a function that applies to all languages, here embodied by a given parser.

Our method can be further extended to analyse parsing performance at the sentence level. Consider the pair consisting of a sentence in an original treebank and its counterpart in a permuted treebank. The two sentences share all lexical items and underlying dependencies between them: the explanation for different parsing accuracies must therefore be sought in their different word orders. In standard treebank evaluation settings, instead, exact sentence-level comparisons are not possible, as two sentences very rarely constitute a truly minimal pair with respect to any specific syntactic property. Our approach opens up the possibility of deeper understanding of parsing behaviour at the sentence level and even of individual dependencies based on large sets of minimal pairs.

2.2 Word order properties

To be able to compare parsing performance across the actual and the synthetic data, we must manipulate the causal property we want to study. In this work, we concentrate on variability of word order and length of dependencies. We define and discuss these two properties and their measures below.

Arc direction entropy

One dimension that can greatly affect parsing performance across languages is word order freedom, the ability languages have to express the same or similar meaning in the same context with a free choice of different word orders. The extent of word order freedom in a sentence is reflected in the entropy of word order, given the words and the syntactic structure of the sentence, $H(order \mid words, tree)$.

One approximation of word order entropy is the entropy of the direction of dependencies in a treebank. This measure has been proposed in several recent works to quantitatively describe the typology of word order freedom in many languages (Liu, 2010; Futrell et al., 2015b). Arc direction entropy can be used, for example, to capture the difference between adjective-noun word order properties in Germanic and Romance languages. In English, this word order is fixed, as adjectives appear almost exclusively prenominally; the adjective-noun arc direction entropy will therefore be close to 0. In Italian, by contrast, the same adjective can both precede and follow nouns; the adjective-noun arc direction entropy will be greater than 0.

We calculate the overall entropy of arc directions in a treebank conditioned on the relation type defined by the dependency label $Rel$ and the PoS tags of the head $H$ and the child $C$:

$$H(Dir \mid Rel, H, C) = \sum_{rel,h,c} p(rel, h, c)\, H(Dir \mid rel, h, c) \qquad (1)$$

$Dir$ in (1) is the order between the child and the head in the dependency arc (Left or Right). In other words, we compute the entropy of arc direction $H(Dir) = -p(L) \cdot \log p(L) - p(R) \cdot \log p(R)$ for each observed tuple $(rel, h, c)$ independently and weigh them according to the tuple frequency in the corpora.

DLM ratio

Another property that has been shown to affect parsing performance across languages and across parsers is the length of the dependencies in the tree.[3] A global measure of average dependency length of a whole treebank has been proposed in the literature on dependency length minimisation (DLM). This measure allows comparisons across treebanks with sentences of different size and across dependency trees of different topology.

[3] The length of a dependency, $DL(arc)$ below, is the number of words in the span covered by the dependency arc.

Experimental and theoretical language research has yielded a large and diverse body of evidence showing that languages, synchronically and diachronically, tend to minimise the length of their dependencies (Hawkins, 1994; Gibson, 1998; Demberg and Keller, 2008; Tily, 2010; Gulordava and
Merlo, 2015; Gulordava et al., 2015). Languages differ, however, in the degree to which they minimise dependencies. A low degree of DLM is associated with flexibility of word order and in particular with high non-projectivity, i.e., the presence of crossing arcs in a tree, a feature that has been treated in dependency parsing using local word order permutations (Hajičová et al., 2004; Nivre, 2009; Titov et al., 2009; Henderson et al., 2013). To estimate the degree of DLM in a language, we follow previous work which analysed the dependency lengths in a treebank with respect to their random and minimal potential alternatives (Temperley, 2007; Gildea and Temperley, 2010; Futrell et al., 2015a; Gulordava and Merlo, 2015).

We calculate the overall ratio of DLM in a treebank as shown in equation 2.

$$\mathrm{DLMRatio} = \frac{\sum_{s} DL_s / |s|^2}{\sum_{s} OptDL_s / |s|^2} \qquad (2)$$

For each sentence $s$ and its dependency tree $t$, we compute the overall dependency length of the original sentence $DL(s) = \sum_{arc \in t} DL(arc)$ and its minimal projective dependency length $OptDL(s) = DL(s')$, where $s'$ is obtained by reordering the words in the sentence $s$ using the algorithm described in the next section (following Gildea and Temperley (2010)). To average these values across all sentences, we normalise them by $|s|^2$, since it has been observed empirically that the relation between the dependency lengths $DL$ and $OptDL$ and the length $|s|$ of a sentence is not linear, but rather quadratic (Ferrer-i-Cancho and Liu, 2014; Futrell et al., 2015a).[4]

[4] We follow previous work in using $DL(s)$ as the measure for DLM ratio calculation. Equivalently, we could use the average length of a single dependency $\langle DL(arc) \rangle$. Given that $\langle DL(s) \rangle = |s| \cdot \langle DL(arc) \rangle$, the fact that $\langle DL(s) \rangle \sim |s|^2$ can be more naturally stated as $\langle DL(arc) \rangle \sim |s|$: the average length of a single dependency is linear with respect to the sentence length.

In the next section, we illustrate how we create two pairs of $(T, T')$ treebanks, manipulating the two word order properties just discussed.

3 Word order permutations

We create two types of permuted treebanks to optimise for the two word order parameters considered in the previous section.

3.1 Creating trees with optimal DL

Given a sentence $s$ and its dependency tree $t$ in a natural language, we employ the algorithm proposed by Gildea and Temperley (2010) to create a new artificial sentence $s'$ with a permuted order of words. The algorithm reorders the words in a sentence $s$ to yield the projective dependency tree with the minimal overall dependency length $DL(s')$.[5] To do so, it recursively places the children on the left and on the right of the head in alternation, so that the children on the same side of the head are ordered based on their sizes, with the shortest phrases closer to the head. Children of the same size are ordered between each other as found in the original sentence.

[5] In principle, an order with minimal DL can be non-projective. However, such cases are rare in natural language trees, which have limited topology. In particular, natural language trees have small average branching factors, while a non-projective order with minimal DL occurs only if at least one node of out-degree 3 is present in the tree (Chung, 1984).

This algorithm is deterministic and the dependency length of each sentence is optimised independently. We exclude from our analysis sentences with any non-final punctuation tokens and sentences with multiple roots. By definition, the DLM ratio for sentences permuted in such a way is equal to 1.

3.2 Creating trees with optimal Entropy

To obtain treebanks with a minimal arc direction entropy equal to zero, we can fix the order of each type of dependency, defined by a tuple $(rel, h, c)$. There exist therefore many possible permutations resulting in zero arc direction entropy. We choose to assign the same direction (either Left or Right) to all the dependencies. This results in two permutations yielding fully right-branching (RB) and fully left-branching (LB) treebanks. We order the children on the same side of a head in the same way as in the OptDL permutation: the shortest children are closest to the head. For RB permutation, children of the same size are kept in the order of the original sentence; for LB permutation, this order is reversed, so that the RB and LB orders are symmetrical.

These two permutations are particularly interesting, as they give us the two extremes in the space of possible tree-branching structures. Moreover, since the LB/RB word orders for each sentence are completely symmetrical, the two treebanks
constitute a minimal pair with respect to the tree-branching parameter. Importantly, there exist both predominantly right-branching (e.g. English) and left-branching natural languages (Japanese, Persian) and the comparison between LB/RB-permuted treebanks will show how much of the difference in parsing of typologically different natural languages can be attributed to their different branching directions.

Of course, the parsing sensitivity to the parameter depends on the parsing architecture. As discussed in detail below, we investigate both graph-based and transition-based architectures. For a graph-based parser, we do not expect to observe much difference in parsing performance due to directionality, given its global optimisation strategy. On the other hand, a transition-based parser relies on left-to-right processing of words and the fully right-branching or fully left-branching orders can yield different results.

3.3 Dependency Treebanks

We use a sample of fourteen dependency treebanks for twelve languages. The treebanks for Bulgarian, English, Finnish, French, German, Italian and Spanish come from the Universal Dependency Project and are annotated with the same annotation scheme (Agić et al., 2015). We use the treebank for Dutch from the CONLL 2006 shared task (Buchholz and Marsi, 2006). The Polish treebank is described in Woliński et al. (2011) and the Persian treebank in Rasooli et al. (2013). In addition, we use two Latin and two Ancient Greek dependency annotated texts (Haug and Jøhndal, 2008) because these languages are well-known for having very free word order.[6] The quantitative properties of these treebanks are presented in Table 1 (second and third column).

[6] The Latin corpora comprise works of Cicero (circa 40 BC) and Vulgate (Bible translation, 4th century AD). The Ancient Greek corpora are works of Herodotus (4th century BC) and New Testament (4th century AD). Despite the fact that they belong to the same language, these pairs of texts of different time periods show quite different word order properties (Gulordava and Merlo, 2015).

This set of treebanks includes those treebanks which had at least 3,000 sentences in their training set after eliminating sentences not fit for permutation (with punctuation tokens or multiple roots). This excluded from our analysis some otherwise typologically interesting languages such as Basque and Arabic. Where available, we used the training-test split of a treebank provided by its distributors; in other cases we split the treebank randomly with a 9-to-1 training-test set proportion.

3.4 Word order properties of original and permuted treebanks

Table 1 presents the treebanks in our sample and the values of DLM ratio and Entropy measures calculated on the training set of the original non-permuted treebanks. From these data, we confirm that the DLM ratio and Entropy measures capture different word order properties as they are not correlated (Spearman correlation r = 0.32, p > 0.1). For example, we can find languages with both low DLM ratio and high Entropy (Finnish) and high DLM ratio and low Entropy (Persian). Moreover, these two measures are not a simple reflex of genetic similarity between languages of the same family: for example, Polish (Indo-European family) and Finnish (Finno-Ugric family) are clustered together according to their word order properties.

Table 1 also shows how the DLM ratio and Entropy values change when we apply the two permutations to the treebanks. For the treebanks permuted to obtain minimal dependency length (DLM ratio = 1), we present Entropy values in the column ‘OptDL Entropy’. For the treebanks permuted to obtain minimal entropy (Entropy = 0), we present DLM ratio values in the column ‘LB/RB DLM ratio’. With respect to the values of the original treebanks, the DLM ratio and Entropy values of the artificial treebanks are much more narrowly distributed: 1.17 ± 0.02 (mean ± SD) compared to 1.19 ± 0.07 for DLM ratio and 0.59 ± 0.03 compared to 0.27 ± 0.17 for Entropy. Notice also that, on average, the treebanks in the LB/RB permuted set have both lower entropy and lower DLM ratio than the original treebanks. The treebanks in the OptDL set have lower DLM ratio, but also higher entropy than the original treebanks.

                                   Av. sentence  Original treebanks     LB/RB        OptDL
Language          Abbr.    Size    length        DLM ratio   Entropy    DLM ratio    Entropy
Polish            pl       29k      6.8          1.13        0.34       1.18         0.55
Italian           it       57k     12.1          1.13        0.18       1.18         0.60
Finnish           fi       46k      5.7          1.13        0.34       1.19         0.53
Spanish           es       63k     15.1          1.15        0.15       1.17         0.62
French            fr       72k     14.5          1.15        0.11       1.20         0.62
English           en       62k      9.5          1.17        0.09       1.16         0.58
Bulgarian         bg       30k      8.5          1.17        0.20       1.17         0.58
Vulgate (La)      la.V     63k      8.8          1.17        0.43       1.18         0.59
Dutch             nl       38k      8.4          1.17        0.26       1.12         0.52
NewTest (AG)      grc.NT   69k     10.5          1.19        0.38       1.17         0.62
German            de       65k     11.5          1.24        0.21       1.21         0.62
Cicero (La)       la.C     35k     11.6          1.26        0.42       1.15         0.61
Persian           fa       35k      9.4          1.33        0.13       1.15         0.61
Herodotus (AG)    grc.H    59k     14.4          1.33        0.46       1.20         0.64
Mean (± st. deviation)                           1.19±0.07   0.27±0.17  1.17±0.02    0.59±0.03

Table 1: Training size (in number of words), average sentence length, DLM ratio and arc direction entropy (Entropy) measures for the treebanks in our sample. The column ‘LB/RB DLM ratio’ presents the DLM ratio for LB/RB-permuted treebanks optimised for zero entropy; the column ‘OptDL Entropy’ presents the arc direction entropy for OptDL-permuted treebanks optimised for minimal DLM ratio.

3.5 Parsing setup

To evaluate the impact of word order properties on parsing performance, we use MSTParser (McDonald et al., 2006) and MaltParser (Nivre et al., 2006), two widely used representatives of two main dependency parsing architectures: a graph-based parsing architecture and a transition-based architecture. The graph-based architecture is known to be less dependent on word order and dependency length than transition-based dependency parsers, as it searches the whole space of possible parse trees and solves a global optimisation problem (McDonald and Nivre, 2011). To achieve competitive performance, the transition-based MaltParser must be provided with a list of features tailored for each treebank and each language. We used the MaltOptimizer package (Ballesteros and Nivre, 2012) to find the best features based on the training set. By contrast, MSTParser is trained on all the treebanks in our sample using the default configuration (first-order projective).

4 Experiments and results

In this section, we illustrate the power of the technique and the fine-grained analyses supported by it with a range of planned, pairwise quantitative and qualitative analyses of the parsing results.

4.1 Comparison of parsing performance between original and permuted treebanks

Table 2 presents the parsing results for MaltParser for the original treebanks and the three sets of permuted treebanks (OptDL, LB, RB). Table 3 presents the results on the same data for MSTParser. For MST, the parsing performances on the fully left-branching and right-branching treebanks are identical, as expected, when percentages are rounded at the two-digit level, which is what we report here.

                       Original     OptDL        LB           RB
Language               UAS   LAS    UAS   LAS    UAS   LAS    UAS   LAS
Polish                 92    88     94    88     94    89     94    89
Italian                94    91     91    85     94    90     95    91
Finnish                83    80     85    81     90    85     91    87
Spanish                86    81     80    72     85    76     88    80
French                 84    80     81    74     90    82     91    85
English                90    88     85    79     89    83     89    83
Bulgarian              93    89     92    85     92    85     93    87
Vulgate (La)           86    81     88    81     93    86     93    86
Dutch                  88    84     93    87     95    90     95    90
New Testament (AG)     85    79     88    81     93    85     91    73
German                 86    80     84    75     89    78     89    81
Cicero (La)            67    59     79    67     88    76     87    76
Persian                83    74     84    73     90    80     90    80
Herodotus (AG)         72    65     83    74     89    79     88    67
Average                85    80     86    79     91    83     91    83

Table 2: Percent labelled and unlabelled accuracies of MaltParser on the original treebanks, on the treebanks permuted for optimal dependency length (OptDL), and on the left-branching (LB) and right-branching (RB) permuted data that minimise entropy.

            Original     OptDL        LB/RB
Lang.       UAS   LAS    UAS   LAS    UAS   LAS
pl          93    85     95    84     95    85
it          93    88     91    85     94    87
fi          84    79     87    80     91    84
es          85    67     84    64     91    68
fr          84    70     86    68     92    71
en          88    85     87    79     91    82
bg          93    87     91    83     92    84
la.V        84    75     90    79     94    82
nl          85    79     93    85     95    88
grc.NT      84    74     89    78     93    83
de          87    67     88    66     91    69
la.C        68    54     84    67     89    72
fa          84    73     86    74     91    79
grc.H       69    57     86    71     90    76
Av.         84    74     88    76     92    79

Table 3: Percent labelled and unlabelled accuracies of MSTParser on the original treebanks, on the treebanks permuted for optimal dependency length (OptDL), and on the left/right-branching (LB/RB) permuted data.

As discussed in the introduction, a comparison between parsers in a multilingual setting is not straightforward. Instead, we attempt to understand their common behaviour with respect to the word order properties of languages. The first observation is that, in general, all three sets of permuted data are easier to parse than the original data, for both parsers. We observe an increase of +1% and +6% UAS for OptDL and LB/RB data, respectively, for Malt, and an increase of +4% and +8% UAS for OptDL and LB/RB data, respectively, for MST. The better results on the LB/RB permuted data must be due to the observation above: the LB/RB data have both lower Entropy and DLM ratio than the original data.

Overall, the performance of the parsers on our artificial treebanks confirms that the lengths of the dependencies and the word order variability are two factors that negatively affect parsing accuracy. Two illustrative examples are Latin, a language well-known for its variable word order (as confirmed by the high entropy values of 0.42 and 0.43 for our two treebanks), and German, a language known for its long dependencies (as confirmed by its high DLM ratio of 1.24). For the Cicero text, for example, we can conclude that indeed its variable word order is the primary reason for the very low parsing performances (67%–68% UAS). These numbers improve significantly when the treebanks are rearranged in a fixed word order (87%–89% UAS). This permutation reduces DLM by 0.11 and reduces entropy by 0.42, yielding the very considerable increase in UAS of 21%. The other permutation, which optimises DL, reduces DLM by 0.26, but increases entropy by 0.19. This increase in entropy dampens the beneficial effect of DL reduction and performance increases 12%, less than in the fixed-order permutation. For German, our analysis gives the same overall results. The DLM ratio in the RB/LB scenario decreases slightly (from 1.24 to 1.21) and its entropy also decreases (-0.21). The performance of the parsers on RB/LB-permuted data is better than on the original data (89%–91% against 86%–87% UAS). Moreover, when DLM is reduced (-0.24, in the OptDL permutation), but entropy is increased (from 0.21 to 0.62), we find a reduction in performance for Malt (from 86% to 84% for UAS). These data weakly suggest that the word order variability of German, minimised in the RB/LB case, has higher impact on parsing difficulty than its well-known long dependencies.

A more detailed picture emerges when we compare pairwise the original treebanks to the permuted treebanks for each of the languages. For this analysis, we use the measure of unlabelled accuracy, since attachment decisions are more directly dependent on word order than labelling decisions, which are mediated by correct attachments. Hence, we limit our analysis to the space of three parameters: DLM ratio, Entropy and UAS.

Figures 1 (OptDL) and 2 (RB) plot the differences in UAS of MaltParser between pairs of the permuted and the original treebanks for each language against the differences in DLM ratio and Entropy between these treebanks. Our dependent variable is $\Delta UAS = UAS(T') - UAS(T)$ computed from Table 2. The x-axis and the y-axis values $\Delta DLM = \mathrm{DLMRatio}(T) - \mathrm{DLMRatio}(T')$ and $\Delta Entropy = Entropy(T) - Entropy(T')$ compute the differences of the measures between the original treebank and the permuted treebank based on the numbers in Table 1. We have chosen to calculate these differences reversing the two factors, compared to the $\Delta UAS$ value, for better readability of the figures: an increase in the axes values (entropy or dependency lengths) should correspond to the decrease in difficulty of parsing and therefore to the increase of the dependent variable $\Delta UAS$. The same relative values of the measures and the parsing accuracy for MSTParser result in very similar plots, which we do not include here for reasons of space.

For the OptDL data (Figure 1), the overall picture is very coherent: the more DLs are minimised and the less entropy is added to the artificial treebank, the larger the gain in parsing performance (blue circles in the lower left corner and red circles in the upper right corner). Again, we observe an interaction between DLM ratio and Entropy parameters: for the languages with originally relatively low DLM ratio and low Entropy, such as English or Spanish, the performance on the permuted data decreases. This is because while DLM decreases, Entropy increases and, for this group of languages, the particular trade-off between these two properties leads to lower parsing accuracy.

Figure 1: Differences in UAS of MaltParser between OptDL-permuted and original pairs of treebanks for the corpora in our sample.

RB-permuted data show similar trends (Figure 2). An interesting regularity is shown by four languages (Latin Vulgate, Ancient Greek New Testament, Dutch and Persian) on the off-diagonal. Although they have different relative Entropy and DLM ratio values, which span from near minimal to maximal values, the improvement in parsing performance on these languages is very similar (as indicated by the same purple colour). This again strongly points to the fact that both DLM ratio and Entropy contribute to the observed parsing performance values.

[7] The Entropy measure is computed on a whole treebank and cannot be meaningfully compared across sentences.

We can further confirm the effect of dependency length by comparing the parsing accuracy across sentences.[7] Consider the Dutch treebank and its RB-permuted pair. For each sentence and its permuted counterpart, we can compute the difference in their dependency lengths ($\Delta DLM = DLM - DLM_{RB}$)
and compare it to the difference in parsing performance ($\Delta UAS = UAS_{RB} - UAS$). We expect to observe that $\Delta UAS$ increases when $\Delta DLM$ increases. Indeed, the parsing results on Dutch show a positive correlation between these two values (r = 0.40, p < 0.001 for Malt and r = 0.55, p < 0.001 for MST).

Figure 2: Differences in UAS of MaltParser between RB-permuted and original pairs of treebanks for the corpora in our sample.

All these analyses confirm and quantify that dependency length and, more significantly, word order variability affect parsing performance.

4.2 Sentence-level analysis of parsing performance

Referring back to the results in Table 2, we observe that MaltParser shows the same average accuracy for RB and LB-permuted data. However, some languages show significantly different results between their LB and RB-permuted data, especially in their labelled accuracy scores. The New Testament corpus, for example, is much easier to parse when it is rearranged in left-branching order (91% RB vs 93% LB UAS, 73% RB vs 85% LB LAS). Our artificial data allow us to investigate this difference in the scores by looking at parsing accuracy at the sentence level.

The differences in Malt accuracies on RB-permuted and LB-permuted data are striking, because these data have the same head-direction entropy and dependency length properties. The only word order difference is in the branching parameter, resulting in two completely symmetrical word orders for each sentence of the original treebank.

To understand the behaviour of MaltParser, and of transition-based parsers in general, we looked at the out-degree, or branching factor, of the syntactic trees. The intuition is that when many children appear on one side of a head, the parser behaviour on head-final and head-initial orders can diverge due to sequences of different operations, such as shift versus attach, that must be chosen in the two cases.[8]

[8] The MaltParser configurations for LB and RB data had the same parsing algorithm (Covington projective).

The data for the New Testament show that the branching factor plays a role in the LB/RB differences found in this treebank. For each pair of sentences with LB/RB orders, we computed the parsing accuracies (UAS and LAS) and the branching factor as the average out-degree of the dependency tree. We then tested whether the better performance on the LB data is correlated with the branching factor across the sentences ($UAS_{LB} - UAS_{RB} \sim BF$). The Pearson correlation for UAS values was 0.08 (p = 0.02), but for LAS values the correlation was 0.30 and highly significant (p < 0.001). On sentences with larger branching factors, the labelled accuracy scores on the LB data were better compared to the RB data.

We combine our result for the branching factor with an observation based on the confusion matrix of the labels, to provide a more accurate explanation of the comparatively low LAS in the RB-permuted treebank of the New Testament corpus. We found that when a verb or a noun has several one-word children, such as ‘aux’ (auxiliaries), ‘atr’ (attributes), ‘obl’ (obliques), ‘adv’ (adverbs) etc., these are frequently confused and receive the wrong label if they appear after the head (RB data), but the labels are assigned correctly if these elements appear before the head (LB data). It appears that the leftward placement of children is advantageous for the transition-based MaltParser, as at the moment of the first attachment decision for the child closest to the head it has access to a larger left context. When children appear after the head, the first one is attached before any other children are seen by the parser and the labelling decision is less informed, leading to more labelling errors.

It should be noted that it is not always possible to identify a single source of difficulty in the error analysis. Contrary to New Testament, Spanish is easier to parse when it is rearranged into the right-branching order (88% RB vs 85% LB UAS, 80% RB vs 76% LB LAS). However, the types of difficult dependencies emerging from the different branching of the LB/RB data were not similar or symmetric to that of New Testament. In the case of Spanish, we did not observe a distinct dimension of errors which would explain the 4% difference in UAS scores.[9]

5 General discussion

Our results highlight both the contributions and the challenges of the proposed method. On the one hand, the results show that we can identify and manipulate word order properties of treebanks to analyse the impact of these properties on parsing performance and suggest avenues to improve it. In this respect, our framework is similar to standard analyses of parsing performance based on separate manipulations of individual word-level features (such as omitting morphological annotation or changing coarse PoS tags to fine PoS tags). Similarly to these evaluation procedures, our approach can lead to improved parsing models or better choice of parsing model by finding out their strengths and weaknesses.

The performance of Malt and MST (Tables 2 and 3), while not directly comparable to each other due to differences in the training set-up (Malt features are optimised for each language and permutation), shows that MST performs better on average on permuted datasets than Malt. This can suggest that MST handles the high entropy of the OptDL permuted set as well as the long dependencies of LB/RB permuted sets better, or, conversely, that the MaltParser does not perform well on treebanks with high word order variability between the children attached to the same head (see Section 4.2). When two parsing systems are known to have different strengths and weaknesses they can be successfully

[9] Overall, the variance in the LB/RB performances on Spanish is relatively high and the mean difference (computed across UAS scores for sentences) is not statistically significant (t-test: p > 0.5), a result we would expect if errors cannot be imputed to clear structural fa
ctors.combinedinanensemblemodelformorerobustper-formance(SurdeanuandManning,2010).Acontributionoftheparsingperformanceanal-ysesinamultilingualsettingistheidentificationofdifficultpropertiesoftreebanks.ForCiceroandHerodotustexts,forexample,ourmethodrevealsthattheirwordorderpropertiesareimportantrea-sonsfortheverylowparsingperformances.Thisresultconfirmsintuition,butitcouldnotbefirmlyconcludedwithoutfactoringoutconfoundssuchasthesizeofthetrainingsetorthedissimilaritybe-tweenthetrainingandtestsets,whichcouldalsobereasonsforlowparsingperformance.ForGerman,ouranalysisgivesmoreunexpectedresultsandal-lowsustoconcludethatthevariabilityofwordor-derisamorenegativefactoronparsingperformancethanlongdependencies.Together,theknowledgeofwordorderpropertiesofalanguageandtheknowl-edgeofparsingperformancerelatedtotheseprop-ertiesgiveusanaprioriestimationofwhatparsingsystemcouldbebettersuitedforaparticularlan-guage.Ontheotherhand,ourmethodalsoraisessomecomplexities.Comparedtocommonlyusedpars-ingperformanceanalysesrelatedtoword-levelfea-tures,themainchallengestoasystematicanalysisofwordorderlieinitsmultifactorialnatureandinthelargechoiceofquantifiablepropertiescorrelatedwithparsingperformance.First,themultifactorialnatureofwordorderprecludesonefromconsideringwordorderpropertiesseparately.Thetwopropertieswehavelookedat—DLMratioandarcdirectionentropy—cannotbeteasedapartcompletelysinceminimisingonepropertyleadstotheincreaseoftheother.Anotherchallengeisduetothefactthatformalquantitativeapproachestostudyingwordordervari-ationcross-linguisticallyarejustbeginningtoap-pearandnotallwordorderfeaturesrelevantforparsingperformancehavebeenidentified.Inpar-ticular,ourresultssuggestthattherelativeorderbe-tweenthechildren(andnotonlytheorderbetweenheadsandtheirchildren)shouldbetakenintoac-count(Section4.2).Sin embargo,wearenotawareofpreviousworkwhichproposesameasureforthispropertyanddescribesittypologicallyonalargescale.Finally,ourmethod,whichconsistsincreatingar-
yo
D
oh
w
norte
oh
a
d
mi
d
F
r
oh
metro
h
t
t
pag
:
/
/
d
i
r
mi
C
t
.
metro
i
t
.
mi
d
tu
/
t
a
C
yo
/
yo
a
r
t
i
C
mi
–
pag
d
F
/
d
oh
i
/
.
1
0
1
1
6
2
/
t
yo
a
C
_
a
_
0
0
1
0
3
1
5
6
7
3
8
2
/
/
t
yo
a
C
_
a
_
0
0
1
0
3
pag
d
.
F
b
y
gramo
tu
mi
s
t
t
oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3
353
tificialtreebanks,canproveusefulbeyondparsingevaluation.Forinstance,ourdatacouldenrichthetrainingdatafortaskssuchasde-lexicalizedparsertransfer(McDonaldetal.,2011).Wordorderprop-ertiesplayanimportantroleincomputingsimilaritybetweenlanguagesandfindingthesourcelanguageleadingtothebestparserperformanceinthetargetlanguage(Naseemetal.,2012;RosaandZabokrt-sky,2015).Apossiblylargeartificiallypermutedtreebankwithwordorderpropertiessimilartothetargetlanguagecouldthenbeabettertrainingmatchthanasmalltreebankofanexistingtargetnaturallanguage.6RelatedworkMuchpreviousworkhasbeendedicatedtotheeval-uationofparsingperformance,alsoinamultilin-gualsetting.Thesharedtasksinmultilingualdepen-dencyparsing(BuchholzandMarsi,2006;Nivreetal.,2007)andparsingofmorphologically-richlan-guages(Tsarfatyetal.,2010;Seddahetal.,2013)collectedalargesetofparsingperformanceresults.SomestepstowardscomparabilityoftheannotationsofmultilingualtreebanksandtheparsingevaluationmeasureswereproposedandundertakeninTsarfatyetal.(2011),Seddahetal.(2013)y,mostrecently,inthecollaborativeUniversalDependencieseffort(deMarneffeetal.,2014;Nivreetal.,2016).How-ever,littleworkhassuggestedananalysisofthedif-ferencesinparsingperformanceacrosslanguagesinconnectionwiththewordorderpropertiesoftree-banks.Somepapershaveanalysedtheimpactofde-pendencylengthsonparsingperformanceinEn-glish.McDonaldandNivre(2011)demonstratedthatparsersmakemoremistakesinlongersentencesandonlongerdependencies.Rimelletal.(2009)andBenderetal.(2011)createdbenchmarktestsetsofconstructionscontaininglongdependencies,suchassubjectandobjectrelativeclauses,andanalysedparsingbehaviourontheseselectedconstructions.Otheranalysesonlong-distancedependenciescanbefoundinNivreetal.(2010)andMerlo(2015).WearenotfamiliarwithanysimilaranalysisofparsingperformanceinEnglishaddressingotherwordordervariationproperties(e.g.head-directionentropy).InGulordavaandMerlo(2015),theparsingper-formanceonseveralLatinandAncientGreektextsisanalysedwithrespecttothedependencylengthand,indirecta
mente,thehead-directionentropy.Theau-thorscomparetheparsingperformanceacrosstextsofthesamelanguage(LatinorAncientGreek)fromseparatedhistoricalperiodswhichdifferslightlyintheirwordorderproperties.10GulordavaandMerlo(2015)showthattextswithlongerdependenciesandmorevariedwordorderarehardertoparse.As-sumingthesamelexicalmaterialofthetexts,theirparticularsettingallowsamoredirectcomparisonofparsingperformancethanastandardmultilingualsettingwherelanguagesdifferinmanyaspectsotherthanwordorder.ThecalculationoftheminimaldependencylengththroughthepermutationofadependencytreebankwasproposedintheworkofTemperleyandGildea(Temperley,2007;GildeaandTemperley,2010).InthisworkandthefollowingworkofFutrelletal.(2015a),severaltypesofpermutationswereem-ployedtocomputedifferentlowerboundsonde-pendencylengthminimisationinEnglishandacrossdozensoflanguages.ArtificiallypermutedtreebankswerepreviouslyusedinFongandBerwick(2008)asstress-testdi-agnosticsforcognitiveplausibilityofparsingsys-tems.Inparticular,FongandBerwick(2008)per-mutedtheorderofwordsintheEnglishPennTree-banktoobtain‘unnatural’languages.Theirpermu-tationsincludedtransformationstohead-finalandhead-initialorders(appliedwith50%-50%propor-tiontosentencesinthetreebank)andreversingtherespectiveorderofcomplementsandadjuncts.Theparsingperformancesonthesepermutedtreebankswere0.5–1pointlowerthanontheoriginaltreebank,whichtheauthorsinterpretedastooaccuratetobeacognitivelyplausiblebehaviourforamodelofthehumanparser.Fromtheperspectiveofourpaper,thepermutedtreebanksofFongandBerwick(2008)wereconstructedtohavelongerdependenciesandhigherwordordervariation;thelowerperformancesarethereforeinagreementwithourownresults.10TheLatinandAncientGreekdataweusedinthisworkisasubsetofthedatathatGulordavaandMerlo(2015)haveanalysed,allcomingfromthePROIELtreebanks(HaugandJøhndal,2008).
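As an illustration of the head-initial and head-final transformations discussed in this section, the following is a minimal sketch of how two mirror-image word orders can be produced from one dependency tree, and of checking that both have the same total dependency length. It is not the procedure of Fong and Berwick (2008) or of any other work cited here. The function names (`head_initial_order`, `head_final_order`, `total_dependency_length`), the rule that siblings keep their original relative order, and the encoding of a sentence as a list of 1-based head indices with 0 for the root are all assumptions made for this example.

```python
from collections import defaultdict


def head_initial_order(heads):
    """Return token ids (1-based) in an order where every head precedes
    all of its dependents. Assumption: siblings keep their original order."""
    children = defaultdict(list)
    root = None
    for tok, head in enumerate(heads, start=1):
        if head == 0:
            root = tok
        else:
            children[head].append(tok)
    if root is None:
        raise ValueError("tree has no root (no token with head 0)")

    order = []
    stack = [root]
    while stack:  # iterative pre-order traversal of the tree
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(children[node]))
    return order


def head_final_order(heads):
    """Mirror image of the head-initial order: every head follows
    all of its dependents."""
    return list(reversed(head_initial_order(heads)))


def total_dependency_length(heads, order):
    """Sum of linear distances between each token and its head under `order`."""
    position = {tok: i for i, tok in enumerate(order)}
    return sum(
        abs(position[tok] - position[head])
        for tok, head in enumerate(heads, start=1)
        if head != 0
    )


if __name__ == "__main__":
    # Hypothetical 4-token tree: token 2 is the root, 1 and 4 depend on 2,
    # and 3 depends on 4.
    heads = [2, 0, 4, 2]
    rb = head_initial_order(heads)   # [2, 1, 4, 3]
    lb = head_final_order(heads)     # [3, 4, 1, 2]
    print(rb, total_dependency_length(heads, rb))
    print(lb, total_dependency_length(heads, lb))
```

Because the second order is the exact reverse of the first, every head-dependent distance is preserved and only the direction of the arcs changes, which is the sense in which two such permutations share the same dependency lengths.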
7 Conclusions

We have proposed a method to analyse parsing performance cross-linguistically. The method is based on the generation and the evaluation of artificial data obtained by permuting the sentences in a natural language treebank. The main advantage of this approach is that it teases apart the linguistic factors from the extra-linguistic factors in parsing evaluation.

First, we have shown how this method can be used to estimate the impact of two word order properties, dependency length and head-direction entropy, on parsing performance. Previous observations that longer dependencies are harder to parse are confirmed on a much larger scale than before, while controlling for confounding treebank properties. It has also been found that variability of word order is an even more prominent factor affecting performance.

Second, we have shown that the construction of artificial data opens a new way to analyse the behaviour of parsers using sentence-level observations. Sentence-level evaluations could be a very powerful tool for detailed investigations of how syntactic properties of languages affect parsing performance and could help create more cross-linguistically valid parsing techniques.

Two avenues are open for future work. First, we will investigate more properties related to word order. Specifically, we will apply the method to the non-projectivity property. On the one hand, dependency lengths and non-projectivity are correlated properties, as predicted theoretically (Ferrer-i-Cancho, 2006). Our data confirm this relation empirically: the Pearson correlation between DLM ratio and the percentage of non-projective dependencies across treebanks is 0.66 (p < 0.02). On the other hand, this correlation is not perfect and both dependency length and non-projectivity should be taken into account to fully explain the variation in parsing performance.

Second, we have not attempted in the current work to estimate the function f (see Section 2.1). This task is equivalent to automatic prediction of parsing accuracy of a treebank based on its properties. Ravi et al. (2008) have proposed an accuracy prediction method for one language (English) based on simple lexical and syntactic properties. Combining their insights with our analysis of word order could lead to a first language-independent approximation of f.

Acknowledgements

We gratefully acknowledge the partial funding of this work by the Swiss National Science Foundation, under grant 144362.

References

Željko Agić, Maria Jesus Aranzabe, Aitziber Atutxa, Cristina Bosco, Jinho Choi, Marie-Catherine de Marneffe, Timothy Dozat, Richárd Farkas, Jennifer Foster, Filip Ginter, Iakes Goenaga, Koldo Gojenola, Yoav Goldberg, Jan Hajič, Anders Trærup Johannsen, Jenna Kanerva, Juha Kuokkala, Veronika Laippala, Alessandro Lenci, Krister Lindén, Nikola Ljubešić, Teresa Lynn, Christopher Manning, Héctor Alonso Martínez, Ryan McDonald, Anna Missilä, Simonetta Montemagni, Joakim Nivre, Hanna Nurmi, Petya Osenova, Slav Petrov, Jussi Piitulainen, Barbara Plank, Prokopis Prokopidis, Sampo Pyysalo, Wolfgang Seeker, Mojgan Seraji, Natalia Silveira, Maria Simi, Kiril Simov, Aaron Smith, Reut Tsarfaty, Veronika Vincze, and Daniel Zeman. 2015. Universal dependencies 1.1.

Miguel Ballesteros and Joakim Nivre. 2012. MaltOptimizer: An optimization tool for MaltParser. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL’12), pages 58–62, Avignon, France, April.

Emily M. Bender, Dan Flickinger, Stephan Oepen, and Yi Zhang. 2011. Parser evaluation over local and non-local deep dependencies in a large corpus. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’11), pages 397–408, Edinburgh, United Kingdom, July.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X’06), pages 149–164, New York City, NY, USA, June.

F. R. K. Chung. 1984. On optimal linear arrangements of trees. Computers & Mathematics with Applications, 10(1):43–60.

Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D Manning. 2014. Universal Stanford dependencies: A cross-linguistic typology. In Proceedings of the Ninth International Conference
on Language Resources and Evaluation (LREC’14), pages 4585–4592, Reykjavik, Iceland, May.

Vera Demberg and Frank Keller. 2008. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109(2):193–210.

Ramon Ferrer-i-Cancho and Haitao Liu. 2014. The risks of mixing dependency lengths from sequences of different length. Glottotheory, 5(2):143–155.

Ramon Ferrer-i-Cancho. 2006. Why do syntactic links not cross? EPL (Europhysics Letters), 76(6):12–28.

Sandiway Fong and Robert Berwick. 2008. Treebank parsing and knowledge of language: A cognitive perspective. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, pages 539–544, Washington, DC, USA, July.

Richard Futrell, Kyle Mahowald, and Edward Gibson. 2015a. Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, 112(33):10336–10341.

Richard Futrell, Kyle Mahowald, and Edward Gibson. 2015b. Quantifying word order freedom in dependency corpora. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 91–100, Uppsala, Sweden, August.

Edward Gibson. 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1):1–76.

Daniel Gildea and David Temperley. 2010. Do grammars minimize dependency length? Cognitive Science, 34(2):286–310.

Kristina Gulordava and Paola Merlo. 2015. Diachronic trends in word order freedom and dependency length in dependency-annotated corpora of Latin and Ancient Greek. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 121–130, Uppsala, Sweden, August.

Kristina Gulordava, Paola Merlo, and Benoit Crabbé. 2015. Dependency length minimisation effects in short spans: a large-scale analysis of adjective placement in complex noun phrases. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 477–482, Beijing, China, July.

Eva Hajičová, Jiří Havelka, Petr Sgall, Kateřina Veselá, and Daniel Zeman. 2004. Issues of projectivity in the Prague Dependency Treebank. Prague Bulletin of Mathematical Linguistics, (81).

Dag T. T. Haug and Marius L. Jøhndal. 2008. Creating a parallel treebank of the Old Indo-European Bible translations. In Proceedings of the 2nd Workshop on Language Technology for Cultural Heritage Data, pages 27–34, Marrakech, Morocco, June.

John A. Hawkins. 1994. A performance theory of order and constituency. Cambridge University Press, Cambridge, UK.

James Henderson, Paola Merlo, Ivan Titov, and Gabriele Musillo. 2013. Multilingual joint parsing of syntactic and semantic dependencies with a latent variable model. Computational Linguistics, 39(4):949–998.

Haitao Liu. 2010. Dependency direction as a means of word-order typology: A method based on dependency treebanks. Lingua, 120(6):1567–1578, June.

Ryan McDonald and Joakim Nivre. 2011. Analyzing and integrating dependency parsers. Computational Linguistics, 37(1):197–230.

Ryan McDonald, Kevin Lerman, and Fernando Pereira. 2006. Multilingual dependency analysis with a two-stage discriminative parser. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL’06), pages 216–220, New York, NY, USA, June.

Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 62–72, Edinburgh, Scotland, UK, July.

Paola Merlo. 2015. Evaluation of two-level dependency representations of argument structure in long-distance dependencies. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 221–230, Uppsala, Sweden, August.

Tahira Naseem, Regina Barzilay, and Amir Globerson. 2012. Selective sharing for multilingual dependency parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 629–637, Jeju Island, Korea, July.

Joakim Nivre, Johan Hall, and Jens Nilsson. 2006. MaltParser: A data-driven parser-generator for dependency parsing. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC’06), pages 2216–2219, Genova, Italy, May.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Prague, Czech Republic, June.

Joakim Nivre, Laura Rimell, Ryan McDonald, and Carlos Gómez Rodríguez. 2010. Evaluation of dependency parsers on unbounded dependencies. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 833–841, Beijing, China, August.
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16), Portoroz, Slovenia, May.

Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Long papers, pages 351–359, Suntec, Singapore, August.

Mohammad Sadegh Rasooli, Manouchehr Kouhestani, and Amirsaeid Moloodi. 2013. Development of a Persian syntactic dependency treebank. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 306–314, Atlanta, Georgia, June.

Sujith Ravi, Kevin Knight, and Radu Soricut. 2008. Automatic prediction of parser accuracy. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 887–896, Honolulu, Hawaii, October.

Laura Rimell, Stephen Clark, and Mark Steedman. 2009. Unbounded dependency recovery for parser evaluation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP’09), pages 813–821, Suntec, Singapore, August.

Rudolf Rosa and Zdenek Zabokrtsky. 2015. Klcpos3 - a language similarity measure for delexicalized parser transfer. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 243–249, Beijing, China, July.

Djamé Seddah, Reut Tsarfaty, and Jennifer Foster, editors. 2011. Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL’11). Dublin, Ireland, October.

Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, and Eric Villemonte de la Clergerie. 2013. Overview of the SPMRL 2013 shared task: A cross-framework evaluation of parsing morphologically rich languages. In Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 146–182, Seattle, Washington, USA, October.

Mihai Surdeanu and Christopher D. Manning. 2010. Ensemble models for dependency parsing: Cheap and good? In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 649–652, Los Angeles, California, June.

David Temperley. 2007. Minimization of dependency length in written English. Cognition, 105(2):300–333.

Harry Joel Tily. 2010. The Role of Processing Complexity in Word Order Variation and Change. Ph.D. Thesis, Stanford University.

Ivan Titov and James Henderson. 2007. A latent variable model for generative dependency parsing. In Proceedings of the 10th International Conference on Parsing Technologies, IWPT’07, pages 144–155, Prague, Czech Republic, June.

Ivan Titov, James Henderson, Paola Merlo, and Gabriele Musillo. 2009. Online graph planarisation for synchronous parsing of semantic and syntactic dependencies. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09), pages 1562–1567, Pasadena, California, USA, July.

Reut Tsarfaty, Djamé Seddah, Yoav Goldberg, Sandra Kübler, Marie Candito, Jennifer Foster, Yannick Versley, Ines Rehbein, and Lamia Tounsi. 2010. Statistical parsing of morphologically rich languages (SPMRL): What, how and whither. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, SPMRL’10, pages 1–12, Los Angeles, California.

Reut Tsarfaty, Joakim Nivre, and Evelina Andersson. 2011. Evaluating dependency parsing: Robust and heuristics-free cross-annotation evaluation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP’11), pages 385–396, Edinburgh, Scotland, UK, July.

Marcin Woliński, Katarzyna Głowińska, and Marek Świdziński. 2011. A preliminary version of Skladnica - a treebank of Polish. In Proceedings of the 5th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 299–303, Poznan, Poland, November.