Transactions of the Association for Computational Linguistics, vol. 5, pp. 279–293, 2017. Action Editor: Yuji Matsumoto.

Submission batch: 5/2016; Revision batch: 10/2016; 2/2017; Published 8/2017.

© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Cross-Lingual Syntactic Transfer with Limited Resources

Mohammad Sadegh Rasooli and Michael Collins∗
Department of Computer Science, Columbia University
New York, NY 10027, USA
{rasooli,mcollins}@cs.columbia.edu

∗On leave at Google Inc. New York.

Abstract

We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. This method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.

1 Introduction

Creating manually-annotated syntactic treebanks is an expensive and time consuming task. Recently there has been a great deal of interest in cross-lingual syntactic transfer, where a parsing model is trained for some language of interest, using only treebanks in other languages. There is a clear motivation for this in building parsing models for languages for which treebank data is unavailable. Methods for syntactic transfer include annotation projection methods (Hwa et al., 2005; Ganchev et al., 2009; McDonald et al., 2011; Ma and Xia, 2014; Rasooli and Collins, 2015; Lacroix et al., 2016; Agić et al., 2016), learning of delexicalized models on universal treebanks (Zeman and Resnik, 2008; McDonald et al., 2011; Täckström et al., 2013; Rosa and Zabokrtsky, 2015), treebank translation (Tiedemann et al., 2014; Tiedemann, 2015; Tiedemann and Agić, 2016) and methods that leverage cross-lingual representations of word clusters, embeddings or dictionaries (Täckström et al., 2012; Durrett et al., 2012; Duong et al., 2015a; Zhang and Barzilay, 2015; Xiao and Guo, 2015; Guo et al., 2015; Guo et al., 2016; Ammar et al., 2016a).

This paper considers the problem of cross-lingual syntactic transfer with limited resources of monolingual and translation data. Specifically, we use the Bible corpus of Christodouloupoulos and Steedman (2014) as a source of translation data, and Wikipedia as a source of monolingual data. We deliberately limit ourselves to the use of Bible translation data because it is available for a very broad set of languages: the data from Christodouloupoulos and Steedman (2014) includes data from 100 languages. The Bible data contains a much smaller set of sentences (around 24,000) than other translation corpora, for example Europarl (Koehn, 2005), which has around 2 million sentences per language pair. This makes it a considerably more challenging corpus to work with. Similarly, our choice of Wikipedia as the source of monolingual data is motivated by the availability of Wikipedia data in a very broad set of languages.

We introduce a set of simple but effective methods for syntactic transfer, as follows:

• We describe a method for deriving cross-lingual clusters, where words from different languages with a similar syntactic or semantic role are grouped in the same cluster. These clusters can then be used as features in a shift-reduce dependency parser.

• We describe a method for transfer of lexical information from the target language into source language treebanks, using word-to-word translation dictionaries derived from parallel corpora. Lexical features from the target language can then be integrated in parsing.

• We describe a method that integrates the above two approaches with the density-driven approach to annotation projection described by Rasooli and Collins (2015).

Experiments show that our model outperforms previous work on a set of European languages from the Google universal treebank (McDonald et al., 2013). We achieve 80.9% average unlabeled attachment score (UAS) on these languages; in comparison, the methods of Zhang and Barzilay (2015), Guo et al. (2016) and Ammar et al. (2016b) have a UAS of 75.4%, 76.3% and 77.8%, respectively. All of these previous works make use of the much larger Europarl (Koehn, 2005) corpus to derive lexical representations. When using Europarl data instead of the Bible, our approach gives 83.9% accuracy, a 1.7% absolute improvement over Rasooli and Collins (2015). Finally, we conduct experiments on 38 datasets (26 languages) in the Universal Dependencies v1.3 (Nivre et al., 2016) corpus. Our method has an average unlabeled dependency accuracy of 74.8% for these languages, more than 6% higher than the method of Rasooli and Collins (2015). Thirteen datasets (10 languages) have accuracies higher than 80.0%.¹

¹The parser code is available at https://github.com/rasoolims/YaraParser/tree/transfer.

2 Background

This section gives a description of the underlying parsing models used in our experiments, the datasets used, and a baseline approach based on delexicalized parsing models.

2.1 The Parsing Model

We assume that the parsing model is a discriminative linear model, where given a sentence x, and a set of candidate parses Y(x), the output from the model is

y*(x) = argmax_{y ∈ Y(x)} θ · φ(x, y)

where θ ∈ R^d is a parameter vector, and φ(x, y) is a feature vector for the pair (x, y). In our experiments we use the shift-reduce dependency parser of Rasooli and Tetreault (2015), which is an extension of the approach in Zhang and Nivre (2011). The parser is trained using the averaged structured perceptron (Collins, 2002).

We assume that the feature vector φ(x, y) is the concatenation of three feature vectors:

• φ^(p)(x, y) is an unlexicalized set of features. Each such feature may depend on the part-of-speech (POS) tag of words in the sentence, but does not depend on the identity of individual words in the sentence.

• φ^(c)(x, y) is a set of cluster features. These features require access to a dictionary that maps each word in the sentence to an underlying cluster identity. Clusters may, for example, be learned using the Brown clustering algorithm (Brown et al., 1992). The features may make use of cluster identities in combination with POS tags.

• φ^(l)(x, y) is a set of lexicalized features. Each such feature may depend directly on word identities in the sentence. These features may also depend on part-of-speech tags or cluster information, in conjunction with lexical information.

Appendix A has a complete description of the features used in our experiments.

2.2 Data Assumptions

Throughout this paper we will assume that we have m source languages L_1 . . . L_m, and a single target language L_{m+1}. We assume the following data sources:


Source language treebanks. We have a treebank T_i for each language i ∈ {1 . . . m}.

Part-of-speech (POS) data. We have hand-annotated POS data for all languages L_1 . . . L_{m+1}. We assume that the data uses a universal POS set that is common across all languages.

Monolingual data. We have monolingual, raw text for each of the (m+1) languages. We use D_i to refer to the monolingual data for the ith language.

Translation data. We have translation data for all language pairs. We use B_{i,j} to refer to translation data for the language pair (i, j) where i, j ∈ {1 . . . (m+1)} and i ≠ j.

In our main experiments we use the Google universal treebank (McDonald et al., 2013) as our source language treebanks² (this treebank provides universal dependency relations and POS tags), Wikipedia data as our monolingual data, and the Bible from Christodouloupoulos and Steedman (2014) as the source of our translation data. In additional experiments we use the Europarl corpus as a source of translation data, in order to measure the impact of using the smaller Bible corpus.

²We also train our best performing model on the newly released universal treebank v1.3 (Nivre et al., 2016). See §4.3 for more details.

2.3 A Baseline Approach: Delexicalized Parsers with Self-Training

Given the data assumption of a universal POS set, the feature vectors φ^(p)(x, y) can be shared across languages. A simple approach is then to simply train a delexicalized parser using treebanks T_1 . . . T_m, using the representation φ(x, y) = φ^(p)(x, y) (see (McDonald et al., 2013; Täckström et al., 2013)). Our baseline approach makes use of a delexicalized parser, with two refinements:

WALS properties. We use the six properties from the World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2013) to select a subset of closely related languages for each target language. These properties are shown in Table 1. The model for a target language is trained on treebank data from languages where at least 4 out of 6 WALS properties are common between the source and target language.³ This gives a slightly stronger baseline. Our experiments showed an improvement in average labeled dependency accuracy for the languages from 62.52% to 63.18%. Table 2 shows the set of source languages used for each target language. These source languages are used for all experiments in the paper.

³There was no effort to optimize this choice; future work may consider more sophisticated sharing schemes.

Feature | Description
82A | Order of subject and verb
83A | Order of object and verb
85A | Order of adposition and noun phrase
86A | Order of genitive and noun
87A | Order of adjective and noun
88A | Order of demonstrative and noun

Table 1: The six properties from the World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2013) used to select the source languages for each target language in our experiments.

Self-training. We use self-training (McClosky et al., 2006) to further improve parsing performance. Specifically, we first train a delexicalized model on treebanks T_1 . . . T_m; then use the resulting model to parse a dataset T_{m+1} that includes target-language sentences which have POS tags but do not have dependency structures. We finally use the automatically parsed data T′_{m+1} as the treebank data and re-train the model. This last model is trained using all features (unlexicalized, clusters, and lexicalized). Self-training in this way gives an improvement in labeled accuracy from 63.18% to 63.91%.

2.4 Translation Dictionaries

Our only use of the translation data B_{i,j} for i, j ∈ {1 . . . (m+1)} is to construct a translation dictionary t(w, i, j). Here i and j are two languages, w is a word in language L_i, and the output w′ = t(w, i, j) is a word in language L_j corresponding to the most frequent translation of w into this language.

We define the function t(w, i, j) as follows: We first run the GIZA++ alignment process (Och and Ney, 2003) on the data B_{i,j}. We then keep intersected alignments between sentences in the two languages. Finally, for each word w in L_i, we define


w′ = t(w, i, j) to be the target language word most frequently aligned to w in the aligned data. If a word w is never seen aligned to a target language word w′, we define t(w, i, j) = NULL.

Target | Sources
en | de, fr, pt, sv
de | en, fr, pt
es | fr, it, pt
fr | en, de, es, it, pt, sv
it | es, fr, pt
pt | en, de, es, fr, it, sv
sv | en, fr, pt

Table 2: The selected source languages for each target language in the Google universal treebank v2 (McDonald et al., 2013). A language is chosen as a source language if it has at least 4 out of 6 WALS properties in common with the target language.

3 Our Approach

We now describe an approach that gives significant improvements over the baseline. §3.1 describes a method for deriving cross-lingual clusters, allowing us to add cluster features φ^(c)(x, y) to the model. §3.2 describes a method for adding lexical features φ^(l)(x, y) to the model. §3.3 describes a method for integrating the approach with the density-driven approach of Rasooli and Collins (2015). Finally, §4 describes experiments. We show that each of the above steps leads to improvements in accuracy.

3.1 Learning Cross-Lingual Clusters

We now describe a method for learning cross-lingual clusters. This follows previous work on cross-lingual clustering algorithms (Täckström et al., 2012). A clustering is a function C(w) that maps each word w in a vocabulary to a cluster C(w) ∈ {1 . . . K}, where K is the number of clusters. A hierarchical clustering is a function C(w, l) that maps a word w together with an integer l to a cluster at level l in the hierarchy. As one example, the Brown clustering algorithm (Brown et al., 1992) gives a hierarchical clustering. The level l allows cluster features at different levels of granularity.

A cross-lingual hierarchical clustering is a function C(w, l) where the clusters are shared across the (m+1) languages of interest. That is, the word w can be from any of the (m+1) languages. Ideally, a cross-lingual clustering should put words across different languages which have a similar syntactic and/or semantic role in the same cluster. There is a clear motivation for cross-lingual clustering in the parsing context. We can use the cluster-based features φ^(c)(x, y) on the source language treebanks T_1 . . . T_m, and these features will now generalize beyond these treebanks to the target language L_{m+1}.

We learn a cross-lingual clustering by leveraging the monolingual datasets D_1 . . . D_{m+1}, together with the translation dictionaries t(w, i, j) learned from the translation data. Figure 1 shows the algorithm that learns a cross-lingual clustering. The algorithm first prepares a multilingual corpus, as follows: for each sentence s in the monolingual data D_i, for each word in s, with probability α, we replace the word with its translation into some randomly chosen language. Once this data is created, we can easily obtain a cross-lingual clustering. The intuition behind this method is that by creating the cross-lingual data in this way, we bias the clustering algorithm towards putting words that are translations of each other in the same cluster.

Inputs: 1) Monolingual texts D_i for i = 1 . . . (m+1); 2) a function t(w, i, j) that translates a word w ∈ L_i to w′ ∈ L_j; and 3) a parameter α such that 0 < α < 1.

Algorithm:
  D = {}
  for i = 1 to m+1 do
    for each sentence s ∈ D_i do
      for p = 1 to |s| do
        Sample ā ~ [0, 1)
        if ā ≥ α then continue
        Sample j ~ unif{1, . . . , m+1} \ {i}
        w′ = t(s_p, i, j)
        if w′ ≠ NULL then set s_p = w′
      D = D ∪ {s}
  Use the algorithm of Stratos et al. (2015) on D to learn a clustering C.

Output: The clustering C.

Figure 1: An algorithm for learning a cross-lingual clustering. In our experiments we used the parameter value α = 0.3.

3.2 Treebank Lexicalization

We now describe how to introduce lexical representations φ^(l)(x, y) to the model. Our approach is simple: we take the treebank data T_1 . . . T_m for the m source languages, together with the translation lexicons t(w, i, m+1). For any word w in the source treebank data, we can look up its translation t(w, i, m+1) in the lexicon, and add this translated form to the underlying sentence. Features can now consider lexical identities derived in this way. In many cases the resulting translation will be the NULL word, leading to the absence of lexical features. However, the representations φ^(p)(x, y) and φ^(c)(x, y) still apply in this case, so the model is robust to some words having a NULL translation.

3.3 Integration with the Density-Driven Projection Method of Rasooli and Collins (2015)

In this section we describe a method for integrating our approach with the cross-lingual transfer method of Rasooli and Collins (2015), which makes use of density-driven projections.

In annotation projection methods (Hwa et al., 2005; McDonald et al., 2011), it is assumed that we have translation data B_{i,j} for a source and target language, and that we have a dependency parser in the source language L_i. The translation data consists of pairs (e, f) where e is a source language sentence, and f is a target language sentence. A method such as GIZA++ is used to derive an alignment between the words in e and f, for each sentence pair; the source language parser is used to parse e. Each dependency in e is then potentially transferred through the alignments to create a dependency in the target sentence f. Once dependencies have been transferred in this way, a dependency parser can be trained on the dependencies in the target language.

The density-driven approach of Rasooli and Collins (2015) makes use of various definitions of "density" of the projected dependencies. For example, P100 is the set of projected structures where the projected dependencies form a full projective parse tree for the sentence; P80 is the set of projected structures where at least 80% of the words in the projected structure are a modifier in some dependency. An iterative training process is used, where the parsing algorithm is first trained on the set P100 of complete structures, and where progressively less dense structures are introduced in learning.

We integrate our approach with the density-driven approach of Rasooli and Collins (2015) as follows: consider the treebanks T_1 . . . T_m created using the lexicalization method of §3.2. We add all trees in these treebanks to the set P100 of full trees used to initialize the method of Rasooli and Collins (2015). In addition we make use of the representations φ^(p), φ^(c) and φ^(l) throughout the learning process.

4 Experiments

This section first describes the experimental settings, then reports results.

4.1 Data and Tools

Data. In the first set of experiments, we consider 7 European languages studied in several pieces of previous work (Ma and Xia, 2014; Zhang and Barzilay, 2015; Guo et al., 2016; Ammar et al., 2016a; Lacroix et al., 2016). More specifically, we use the 7 European languages in the Google universal treebank (v.2; standard data) (McDonald et al., 2013). As in previous work, gold part-of-speech tags are used for evaluation. We use the concatenation of the treebank training sentences, Wikipedia data and the Bible monolingual sentences as our monolingual raw text. Table 3 shows statistics for the monolingual data. We use the Bible from Christodouloupoulos and Steedman (2014), which includes data for 100 languages, as the source of translations. We also conduct experiments with the Europarl data (both with the original set and a subset of it with the same size as the Bible) to study the effects of translation data size and domain shift. The statistics for translation data are shown in Table 4.

In a second set of experiments, we run experiments on 38 datasets (26 languages) in the more recent Universal Dependencies v1.3 corpus (Nivre et al., 2016). The full set of languages we use is listed in Table 9.⁴ We use the Bible as the translation data, and Wikipedia as the monolingual text. The standard training, development and test set splits are used in all experiments. The development sets are used for analysis, given in §5 of this paper.

⁴We excluded languages that are not completely present in the Bible of Christodouloupoulos and Steedman (2014) (Ancient Greek, Basque, Catalan, Galician, Gothic, Irish, Kazakh, Latvian, Old Church Slavonic, and Tamil). We also excluded Arabic, Hebrew, Japanese and Chinese, as these languages have tokenization and/or morphological complexity that goes beyond the scope of this paper. Future work should consider these languages.

Lang.  | en    | de    | es    | fr    | it    | pt    | sv
#Sen.  | 31.8  | 20.0  | 13.6  | 13.6  | 10.1  | 6.1   | 3.9
#Token | 750.5 | 408.2 | 402.3 | 372.1 | 311.1 | 169.3 | 60.6
#Type  | 3.8   | 6.1   | 2.7   | 2.4   | 2.1   | 1.6   | 1.3

Table 3: Sizes of the monolingual datasets for each of our languages. All numbers are in millions.

Brown Clustering Algorithm. We use the off-the-shelf Brown clustering tool⁵ (Liang, 2005) to train monolingual Brown clusters with 500 clusters. The monolingual Brown clusters are used as features over lexicalized values created in φ^(l), and in self-training experiments. We train our cross-lingual clustering with the off-the-shelf tool⁶ from Stratos et al. (2015). We set the window size to 2 with a cluster size of 500.⁷

⁵https://github.com/percyliang/brown-cluster
⁶https://github.com/karlstratos/singular
⁷Usually the original Brown clusters are better features for parsing but their training procedure does not scale well to large datasets. Therefore we use the more efficient algorithm from Stratos et al. (2015) on the larger cross-lingual datasets to obtain word clusters.

Parsing Model. We use the k-beam arc-eager dependency parser of Rasooli and Tetreault (2015), which is similar to the model of Zhang and Nivre (2011). We modify the parser such that it can use both monolingual and cross-lingual word cluster features. The parser is trained using the maximum violation update strategy (Huang et al., 2012). We use three epochs of training for all experiments. We use the DEPENDABLE tool (Choi et al., 2015) to calculate significance tests on several of the comparisons (details are given in the captions to tables 5, 6, and 9).

Data     |        | en   | de   | es   | fr   | it   | pt   | sv
Bible    | tokens | 1.5M | 665K | 657K | 732K | 613K | 670K | 696K
         | types  | 16K  | 20K  | 27K  | 22K  | 29K  | 29K  | 23K
EU-S     | tokens | 718K | 686K | 753K | 799K | 717K | 739K | 645K
         | types  | 22K  | 41K  | 31K  | 27K  | 30K  | 32K  | 39K
Europarl | tokens | 56M  | 50M  | 57M  | 62M  | 55M  | 56M  | 46M
         | types  | 133K | 400K | 195K | 153K | 188K | 200K | 366K

Table 4: Statistics for the Bible, sampled Europarl (EU-S) and Europarl datasets. Each individual Bible text file from Christodouloupoulos and Steedman (2014) consists of 24,720 sentences, except for English datasets, where two translations into English are available, giving double the amount of data. Each text file from the sampled Europarl datasets consists of 25K sentences and Europarl has approximately 2 million sentences per language pair.

L   | Baseline    | §3.1        | §3.2        | §3.3
    | LAS   UAS   | LAS   UAS   | LAS   UAS   | LAS   UAS
en  | 58.2  65.5  | 65.0  72.3  | 66.3  74.0  | 70.8  76.5
de  | 49.7  59.1  | 51.6  59.7  | 54.9  62.6  | 65.2  72.8
es  | 68.3  77.2  | 73.1  79.6  | 76.6  81.9  | 76.7  82.1
fr  | 67.3  77.7  | 69.5  79.9  | 74.4  81.9  | 75.8  82.2
it  | 69.7  79.4  | 71.6  80.0  | 74.7  82.8  | 76.1  83.3
pt  | 71.5  77.5  | 76.9  81.5  | 81.0  84.4  | 81.3  84.7
sv  | 62.6  74.2  | 63.5  75.1  | 68.2  78.7  | 71.2  80.3
avg | 63.9  72.9  | 67.3  75.5  | 70.9  78.1  | 73.9  80.3

Table 5: Performance of different models in this paper; first the baseline model, then models trained using the methods described in sections §3.1–3.3. All results make use of the Bible as a source of translation data. All differences in UAS and LAS are statistically significant with p < 0.001 using McNemar's test, with the exception of "de" UAS/LAS Baseline vs. §3.1 (i.e., 49.7 vs 51.6 UAS and 59.1 vs 59.7 LAS are not significant differences).

Word alignment. We use the intersected alignments from GIZA++ (Och and Ney, 2003) on translation data. We exclude sentences in translation data with more than 100 words.

4.2 Results on the Google Treebank

Table 5 shows the dependency parsing accuracy for the baseline delexicalized approach, and for models which add 1) cross-lingual clusters (§3.1); 2) lexical features (§3.2); and 3) integration with the density-driven method of Rasooli and Collins (2015). Each of these three steps gives significant improvements in performance. The final LAS/UAS of 73.9/80.3% is several percentage points higher than the baseline accuracy of 63.9/72.9%.
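The corpus-mixing step of Figure 1 (§3.1) is compact enough to sketch directly. In the following Python sketch, `corpora` is assumed to be a list with one entry per language, each a list of tokenized sentences, and `translate(w, i, j)` is assumed to implement the dictionary t(w, i, j) of §2.4, returning None for a NULL translation; the final clustering step (the tool of Stratos et al. (2015)) is external to this sketch.

```python
import random

def mix_corpora(corpora, translate, alpha=0.3, seed=0):
    """Build the code-switched corpus D of Figure 1: each word is, with
    probability alpha, replaced by its dictionary translation into a
    uniformly chosen other language; all sentences are then pooled."""
    rng = random.Random(seed)
    n_langs = len(corpora)  # the m + 1 languages
    mixed = []
    for i, corpus in enumerate(corpora):
        for sentence in corpus:
            s = list(sentence)
            for p, w in enumerate(s):
                if rng.random() >= alpha:
                    continue  # keep the original word s_p
                j = rng.choice([k for k in range(n_langs) if k != i])
                w_prime = translate(w, i, j)
                if w_prime is not None:  # NULL translation: keep s_p
                    s[p] = w_prime
            mixed.append(s)
    return mixed
```

The pooled output would then be passed as one corpus to a monolingual clustering algorithm, which is what biases translation pairs into shared clusters.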
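The translation dictionaries of §2.4, combined with the intersected word alignments described above, reduce to counting alignment links kept in both directions and taking the most frequent target word per source word. A minimal sketch under assumed inputs: `aligned_pairs` stands in for GIZA++ output, with one position-to-position map per alignment direction; the function name and data layout are illustrative, not the paper's code.

```python
from collections import Counter, defaultdict

def build_dictionary(aligned_pairs):
    """Return a map from each source word w to its most frequently
    aligned target word, i.e. t(w, i, j); words never aligned are
    absent from the map (a NULL translation)."""
    counts = defaultdict(Counter)
    for src, tgt, src2tgt, tgt2src in aligned_pairs:
        for s_pos, t_pos in src2tgt.items():
            # keep only links present in both alignment directions
            if tgt2src.get(t_pos) == s_pos:
                counts[src[s_pos]][tgt[t_pos]] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}
```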
f b y g u e s t t o n 0 9 S e p e m b e r 2 0 2 3 285 Lang.BibleEuroparl-SampleEuroparlDensityThisPaperDensityThisPaperDensityThisPaperLASUASLASUASLASUASLASUASLASUASLASUASen59.166.470.876.564.372.870.276.268.476.371.177.5de60.269.565.272.861.672.064.973.073.079.775.682.1es70.376.876.782.172.078.376.081.574.680.976.682.6fr69.976.975.882.271.979.075.782.576.382.777.483.9it71.178.576.183.373.280.476.282.977.083.777.484.4pt72.176.481.384.775.379.781.6184.877.382.182.185.6sv66.576.371.280.371.980.673.581.675.684.176.984.5avg67.075.773.980.370.077.674.080.474.681.376.782.9Table6:Resultsforourmethodusingdifferentsourcesoftranslationdata.“Density”referstothemethodofRasooliandCollins(2015);“Thispaper”givesresultsusingthemethodsdescribedinsections3.1–3.3ofthispaper.The“Bible”experimentsusetheBibledataofChristodouloupoulosandSteedman(2014).The“Europarl”experimentsusetheEuroparldataofKoehn(2005).The“Europarl-Sample”experimentsuse25KrandomlychosensentencesfromEuroparl;thisgivesasimilarnumberofsentencestotheBibledata.AlldifferencesinLASandUASinthistablebetweenthedensityand“thispaper”settings(i.e.,fortheBible,Europarl-SampleandEuroparlsettings)arefoundtobestatisticallysignificantaccordingtoMcNemar’ssigntest.Lang.MX14LA16ZB15GCY16AMB16RC15ThispaperSupervisedBibleEuroparlUASUASLASUASLASUASLASUASLASUASLASUASLASUASen––59.870.5––––68.476.370.876.571.177.592.093.8de74.376.054.162.555.965.057.165.273.079.765.272.875.682.179.485.3es75.578.968.378.073.079.074.680.274.680.976.782.176.082.682.386.7fr76.580.868.878.971.077.673.980.676.382.775.882.277.483.981.786.3it77.779.469.479.371.278.472.580.777.083.776.183.377.484.486.188.8pt76.6–72.578.678.681.877.081.277.382.181.384.782.185.687.689.4sv79.383.062.575.069.578.268.179.075.684.171.280.376.984.584.188.1avg\en76.7–65.975.469.376.370.577.875.682.274.480.977.783.983.587.4Table7:ComparisonofourworkusingtheBibleandEuroparldata,withpreviouswork:MX14(MaandXia,2014),LA16(Lacroixetal.,2016),ZB15(ZhangandBarzilay,2015),GCY16(Guoetal.,2016),AMB16(Amma
retal.,2016b),andRC15(RasooliandCollins,2015).“Supervised”referstotheperformanceoftheparsertrainedonfullygoldstandarddatainasupervisedfashion(i.e.thepracticalupper-boundofourmodel).“avg\en”referstotheaverageaccuracyforalldatasetsexceptEnglish.ComparisontotheDensity-DrivenApproachus-ingEuroparlDataTable6showsaccuraciesforthedensity-drivenapproachofRasooliandCollins(2015),firstusingEuroparldata8andsecondusingtheBiblealone(withnocross-lingualclustersorlex-icalization).TheBibledataisconsiderablysmallerthanEuroparl(around100timessmaller),anditcanbeseenthatresultsusingtheBibleareseveralper-centagepointslowerthantheresultsforEuroparl(75.7%UASvs.81.3%UAS).Integratingcluster-basedandlexicalizedfeaturesdescribedinthecur-rentpaperwiththedensity-drivenapproachclosesmuchofthisgapinperformance(80.3%UAS).ThuswehavedemonstratedthatwecangetclosetotheperformanceoftheEuroparl-basedmodelsusing8RasooliandCollins(2015)donotreportresultsonEnglish.WeusethesamesettingtoobtaintheEnglishresults.onlytheBibleasasourceoftranslationdata.Us-ingourapproachonthefullEuroparldatagivesanaverageUASof82.9%,animprovementfromthe81.3%UASofRasooliandCollins(2015).Table6alsoshowsresultswhenweusearandomsubsetoftheEuroparldata,inwhichthenumberofsentences(25,000)ischosentogiveaverysimilarsizetotheBible.ItcanbeseenthataccuraciesusingtheBiblevs.theEuroparl-Sampleareverysimilar(80.3%vs.80.4%UAS),suggestingthatthesizeofthetranslationcorpusismuchmoreimportantthanthegenre.ComparisontoOtherPreviousWorkTable7comparestheaccuracyofourmethodtothefollow-ingrelatedwork:1)MaandXia(2014),whode-scribeanannotationprojectionmethodbasedonen-tropyregularization;2)Lacroixetal.(2016),who l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 1 1 5 6 7 4 7 2 / / t l a c _ a _ 0 0 0 6 1 p d . 
f b y g u e s t t o n 0 9 S e p e m b e r 2 0 2 3 286 Lang.RC15ThisPaper(§3.3)BibleEuroparlLASUASLASUASLASUASen66.274.467.874.468.075.1de71.678.861.970.373.680.8es72.379.273.879.974.280.7fr73.580.872.679.975.082.3it74.982.074.081.775.382.6pt75.480.779.283.380.484.4sv73.482.067.377.273.782.2avg72.579.770.978.174.381.2Table8:Thefinalresultsbasedonautomaticpartofspeechtags.RC15referstothebestperformingmodelofRasooliandCollins(2015).describeanannotationprojectionmethodbasedontrainingonpartialtreeswithdynamicoracles;3)ZhangandBarzilay(2015),whodescribeamethodthatlearnscross-lingualembeddingsandbilingualdictionariesfromEuroparldata,andusesthesefea-turesinadiscriminativeparsingmodel;4)Guoetal.(2016),whodescribeamethodthatlearnscross-lingualembeddingsfromEuroparldataandusesashift-reduceneuralparserwiththeserepresenta-tions;5)Ammaretal.(2016b)9,whousethesameembeddingsasGuoetal.(2016),withinanLSTM-basedparser;and6)RasooliandCollins(2015)whousethedensity-drivenapproachontheEuroparldata.Ourmethodgivessignificantimprovementsoverthefirstthreemodels,inspiteofusingtheBibletranslationdataratherthanEuroparl.WhenusingtheEuroparldata,ourmethodimprovesthestate-of-the-artmodelofRasooliandCollins(2015).PerformancewithAutomaticPOSTagsForcompleteness,Table8givesresultsforourmethodwithautomaticpart-of-speechtags.Thetagsareob-tainedusingthemodelofCollins(2002)10trainedonthetrainingpartofthetreebankdataset.FutureworkshouldstudyapproachesthattransferPOStagsinadditiontodependencies.4.3ResultsontheUniversalDependenciesv1.3Table9givesresultson38datasets(26languages)fromthenewlyreleaseduniversaldependenciescor-pus(Nivreetal.,2016).Giventhenumberoftree-banksandtospeeduptraining,wepicksourcelan-9Thisworkwaslaterpublishedunderadifferenttitle(Am-maretal.,2016a)withoutincludingUASresults.10https://github.com/rasoolims/SemiSupervisedPosTaggerDatasetDensityThispaperSupervisedLASUASLASUASLASUASit74.381.379.886.188.490.7sl68.275.978.684.186.389.1es69.177.576.384.183.586.9bg66.279.572.083.685.590.5pt66.775.8
74.883.483.086.7es-ancora68.977.574.683.186.589.4fr72.077.976.682.684.587.1sv-lines67.576.773.382.481.085.4pt-br68.375.276.282.087.889.7sv65.975.771.781.383.687.7no71.778.874.381.288.090.5pl65.477.670.181.085.190.3hr55.870.265.980.976.285.1cs-cac61.170.369.078.582.487.6da63.172.868.377.880.884.3en-lines67.075.968.677.380.784.6cs59.068.167.276.484.588.7id38.055.757.876.079.885.1de61.372.864.975.780.285.8ru-syntagrus56.070.761.675.382.087.8ru56.764.865.474.871.977.7cs-cltt57.565.465.674.777.181.4ro54.667.460.774.678.285.3la54.571.655.772.843.152.5nl-lassysmall51.562.661.971.776.580.6el53.766.759.671.079.183.1et48.965.656.970.975.982.9hi34.450.649.969.989.492.9hu26.148.955.069.969.579.4en59.768.161.869.085.388.1fi-ftb50.363.256.567.573.379.7fi49.860.857.366.473.478.2la-ittb44.155.451.862.876.280.9nl40.649.450.162.070.175.0la-proiel43.660.345.061.364.972.9sl-sst42.459.247.660.663.470.4fa44.453.246.556.084.187.5tr05.318.532.751.965.678.8Average56.768.164.074.878.983.8Table9:Resultsforthedensitydrivenmethod(RasooliandCollins,2015)andoursusingtheBibledataontheuniversaldependenciesv1.3(Nivreetal.,2016).Theta-bleissortedbytheperformanceofourmethod.Thelastmajorcolumnsshowstheperformanceofthesupervisedparser.Theabbreviationsareasfollows:bg(Bulgarian),cs(Czech),da(Danish),de(German),el(Greek),en(En-glish),es(Spanish),et(Estonian),fa(Persian(Farsi)),fi(Finnish),fr(French),hi(Hindi),hr(Croatian),hu(Hun-garian),id(Indonesian),it(Italian),la(Latin),nl(Dutch),no(Norwegian),pl(Polish),pt(Portuguese),ro(Roma-nian),ru(Russian),sl(Slovenian),sv(Swedish),andtr(Turkish).AlldifferencesinLASandUASinthista-blewerefoundtobestatisticallysignificantaccordingtoMcNemar’ssigntestwithp<0.001.guagesthathaveatleast5outof6commonWALSpropertieswitheachtargetlanguage.Ourexperi-mentsarecarriedoutusingtheBibleasourtransla- l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 
1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 1 1 5 6 7 4 7 2 / / t l a c _ a _ 0 0 0 6 1 p d . f b y g u e s t t o n 0 9 S e p e m b e r 2 0 2 3 287 tiondata.AsshowninTable9,ourmethodconsis-tentlyoutperformsthedensity-drivenmethodofRa-sooliandCollins(2015)andformanylanguagestheaccuracyofourmethodgetsclosetotheaccuracyofthesupervisedparser.Inallthelanguages,ourmethodissignificantlybetterthanthedensity-drivenmethodusingtheMcNemar’stestwithp<0.001.Accuracyonsomelanguages(e.g.,Persian(fa)andTurkish(tr))islow,suggestingthatfutureworkshouldconsidermorepowerfultechniquesfortheselanguages.Therearetwoimportantfactstonote.First,thenumberoffullyprojectedtreesinsomelanguagesissolowsuchthatthedensity-drivenap-proachcannotstartwithagoodinitializationtofillinpartialdependencies.ForexampleTurkishhasonlyonefulltreewithonlysixwords,Persianwith25trees,andDutchwith28trees.Second,weob-serveverylowaccuraciesinsupervisedparsingforsomelanguagesinwhichthenumberoftrainingsen-tencesisverylow(forexample,Latinhasonly1326projectivetreesinthetrainingdata).5AnalysisWeconcludewithsomeanalysisoftheaccuracyofthemethodondifferentdependencytypes,acrossthedifferentlanguages.Table10showsprecisionandrecallondifferentdependencytypesinEnglish(usingtheGoogletreebank).TheimprovementsinaccuracywhenmovingfromthedelexicalizedmodeltotheBibleorEuroparlmodelapplyquiteuniformlyacrossalldependencytypes,withallde-pendencylabelsshowinganimprovement.Table11showsthedependencyaccuracysortedbypart-of-speechtagofthemodifierinthedepen-dency.Webreaktheresultsintothreegroups:G1languages,whereUASisatleast80%overall;G2languages,whereUASisbetween70%and80%;andG3languages,whereUASislessthan70%.Therearesomequitesignificantdifferencesinac-curacydependingonthePOSofthemodifierword.IntheG1languages,forexample,ADP,DET,ADJ,PRONandAUXallhaveover85%accuracy;incon-trastNOUN,VERB,PROPN,ADVallhaveaccu-racythatislessthan80%.AverysimilarpatternisseenfortheG2languages,withADP,DET,ADJ,andAUXagainhavinggreaterthan85%accuracy,butNOUN,VERB,PROPNandADVhavinglo
weraccuracies.TheseresultssuggestthatdifficultyvariesquitesignificantlydependingonthemodifierPOS,anddifferentlanguagesshowthesamepat-ternsofdifficultywithrespecttothemodifierPOS.Table12showsaccuracysortedbythePOStagoftheheadwordofthedependency.ByfarthemostfrequentheadPOStagsareNOUN,VERB,andPROPN(accountingfor85%ofalldependen-cies).ThetablealsoshowsthatforalllanguagegroupsG1,G2,andG3,thef1scoresforNOUN,VERBandPROPNaregenerallyhigherthanthef1scoresforotherheadPOStags.Finally,Table13showsprecisionandrecallfordifferentdependencylabelsfortheG1,G2andG3languages.Weagainseequitelargedifferencesinaccuracybetweendifferentdependencylabels.TheG1languagedependencies,withthemostfrequentlabelnmod,hasanF-scoreof75.2.Incontrast,thesecondmostfrequentlabel,case,has93.7F-score.OtherfrequentlabelswithlowaccuracyintheG1languagesareadvmod,conj,andcc.6RelatedWorkTherehasrecentlybeenagreatdealofworkonsyntactictransfer.Anumberofmethods(ZemanandResnik,2008;McDonaldetal.,2011;Cohenetal.,2011;Naseemetal.,2012;T¨ackstr¨ometal.,2013;RosaandZabokrtsky,2015)directlylearndelexicalizedmodelsthatcanbetrainedonuniversaltreebankdatafromoneormoresourcelanguages,thenappliedtothetargetlanguage.Morerecentworkhasintroducedcross-lingualrepresentations—forexamplecross-lingualword-embeddings—thatcanbeusedtoimproveperformance(ZhangandBarzilay,2015;Guoetal.,2015;Duongetal.,2015a;Duongetal.,2015b;Guoetal.,2016;Am-maretal.,2016b).Thesecross-lingualrepresen-tationsareusuallylearnedfromparalleltranslationdata.Weshowresultsofseveralmethods(ZhangandBarzilay,2015;Guoetal.,2016;Ammaretal.,2016b)inTable7ofthispaper.Theannotationprojectionapproach,wherede-pendenciesfromonelanguagearetransferredthroughtranslationalignmentstoanotherlanguage,hasbeenconsideredbyseveralauthors(Hwaetal.,2005;Ganchevetal.,2009;McDonaldetal.,2011;MaandXia,2014;RasooliandCollins,2015; l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 
Lacroix et al., 2016; Agić et al., 2016; Schlichtkrull and Søgaard, 2017).

dep.        freq. | Delexicalized       | Bible               | Europarl
                  | prec./rec.    f1    | prec./rec.    f1    | prec./rec.    f1
adpmod      10.6  | 57.2/62.7     59.8  | 67.1/71.8     69.4  | 70.3/73.8     72.0
adpobj      10.6  | 65.5/69.1     67.2  | 75.3/77.4     76.3  | 75.9/79.2     77.6
det          9.5  | 72.5/75.6     74.0  | 84.3/86.3     85.3  | 86.6/89.8     88.2
compmod      9.1  | 83.7/59.9     69.8  | 87.3/70.2     77.8  | 89.0/73.0     80.2
nsubj        8.0  | 69.7/60.0     64.5  | 82.1/77.5     79.7  | 83.0/78.1     80.5
amod         7.0  | 76.9/72.3     74.5  | 83.0/78.7     80.8  | 80.9/77.9     79.4
ROOT         4.8  | 69.3/70.4     69.8  | 85.0/85.1     85.0  | 83.8/85.8     84.8
num          4.6  | 67.8/55.3     60.9  | 70.7/55.2     62.0  | 75.0/63.0     68.5
dobj         4.5  | 60.8/80.3     69.2  | 64.0/84.9     73.0  | 68.4/86.6     76.5
advmod       4.1  | 65.9/61.9     63.8  | 72.7/68.1     70.3  | 69.6/68.8     69.2
aux          3.5  | 76.6/93.9     84.4  | 90.2/95.9     93.0  | 89.6/96.4     92.9
cc           2.9  | 67.6/61.7     64.5  | 73.1/73.1     73.1  | 73.1/73.3     73.2
conj         2.8  | 46.3/56.1     50.7  | 45.6/62.9     52.9  | 48.1/62.8     54.5
dep          2.0  | 90.5/25.8     40.1  | 99.2/33.8     50.4  | 92.0/34.4     50.1
poss         2.0  | 72.1/30.6     43.0  | 77.9/45.8     57.7  | 78.2/42.1     54.7
ccomp        1.6  | 76.2/28.4     41.3  | 88.0/61.3     72.3  | 82.3/69.1     75.1
adp          1.2  | 20.0/0.5      0.9   | 92.7/42.1     57.9  | 91.7/23.3     37.1
nmod         1.2  | 60.7/48.1     53.7  | 56.3/47.1     51.3  | 52.6/46.2     49.2
xcomp        1.2  | 66.6/48.6     56.2  | 85.1/65.3     73.9  | 78.3/71.0     74.5
mark         1.1  | 37.8/24.6     29.8  | 73.8/50.3     59.8  | 62.8/53.8     57.9
advcl        0.8  | 23.6/22.3     22.9  | 38.7/38.8     38.8  | 38.0/42.9     40.3
appos        0.8  | 8.5/43.0      14.3  | 20.4/61.0     30.6  | 26.4/61.7     37.0
auxpass      0.8  | 88.9/91.4     90.1  | 96.8/97.1     97.0  | 98.6/98.6     98.6
rcmod        0.8  | 38.2/33.3     35.6  | 46.8/54.6     50.4  | 52.7/55.0     53.8
nsubjpass    0.7  | 73.2/64.9     68.8  | 87.6/77.0     82.0  | 85.5/75.8     80.3
acomp        0.6  | 86.8/92.5     89.6  | 83.3/93.5     88.1  | 91.0/93.9     92.4
adpcomp      0.6  | 42.0/70.2     52.5  | 47.9/61.5     53.9  | 55.4/47.1     50.9
partmod      0.6  | 20.2/36.0     25.8  | 36.7/49.1     42.0  | 31.0/40.7     35.2
attr         0.5  | 67.7/86.4     75.9  | 76.5/92.1     83.6  | 72.6/92.7     81.4
neg          0.5  | 74.7/85.0     79.6  | 93.3/91.0     92.1  | 92.6/89.8     91.2
prt          0.3  | 27.4/92.2     42.2  | 32.4/96.6     48.5  | 31.9/97.4     48.1
infmod       0.2  | 30.7/72.4     43.2  | 38.4/64.4     48.1  | 42.6/63.2     50.9
expl         0.1  | 84.8/87.5     86.2  | 93.8/93.8     93.8  | 91.2/96.9     93.9
iobj         0.1  | 51.7/78.9     62.5  | 88.9/84.2     86.5  | 36.4/84.2     50.8
mwe          0.1  | 0.0/0.0       0.0   | 5.3/2.1       3.0   | 11.1/10.4     10.8
parataxis    0.1  | 5.6/19.6      8.7   | 17.3/47.1     25.3  | 14.6/45.1     22.0
cop          0.0  | 0.0/0.0       0.0   | 0.0/0.0       0.0   | 0.0/0.0       0.0
csubj        0.0  | 12.8/33.3     18.5  | 22.2/26.7     24.2  | 25.0/46.7     32.6
csubjpass    0.0  | 100.0/100.0   100.0 | 100.0/100.0   100.0 | 50.0/100.0    66.7
rel          0.0  | 100.0/6.3     11.8  | 90.9/62.5     74.1  | 66.7/37.5     48.0

Table 10: Precision, recall and F-score of different dependency relations on the English development data of the Google universal treebank. The columns show the dependency label ("dep."), frequency ("freq."), the baseline delexicalized model, and our method using the Bible and Europarl as translation data. The rows are sorted by frequency.

Other recent work (Tiedemann et al., 2014; Tiedemann, 2015; Tiedemann and Agić, 2016) has considered treebank translation, where a statistical machine translation system (e.g., MOSES (Koehn et al., 2007)) is used to translate a source-language treebank into the target language, complete with reordering of the input sentence. The lexicalization approach described in this paper is a simple form of treebank translation, where we use a word-to-word translation model. In spite of its simplicity, it is an effective approach.

POS    | G1: freq%  acc. | G2: freq%  acc. | G3: freq%  acc.
NOUN   | 22.0  77.6      | 30.0  71.2      | 25.3  58.0
ADP    | 16.9  92.3      | 10.9  92.3      | 11.2  90.6
DET    | 11.9  96.4      | 3.0   92.4      | 3.6   86.6
VERB   | 11.7  74.5      | 13.5  66.1      | 17.1  52.2
PROPN  | 8.1   79.0      | 4.7   65.2      | 6.8   49.5
ADJ    | 8.0   88.5      | 12.7  86.9      | 8.4   73.6
PRON   | 5.4   87.7      | 5.9   82.2      | 7.6   71.1
ADV    | 4.3   76.0      | 6.6   70.9      | 5.6   61.9
CONJ   | 3.6   71.8      | 4.7   63.0      | 4.2   60.4
AUX    | 2.7   91.5      | 1.7   88.9      | 3.0   70.6
NUM    | 2.2   79.5      | 2.3   68.4      | 2.0   75.7
SCONJ  | 1.8   80.5      | 1.9   77.2      | 2.6   65.0
PART   | 0.9   80.2      | 1.8   64.3      | 1.9   45.0
X      | 0.2   52.3      | 0.1   40.5      | 0.6   36.9
SYM    | 0.1   64.3      | 0.1   40.9      | 0.1   45.5
INTJ   | 0.1   78.5      | 0.0   51.7      | 0.3   60.2

Table 11: Accuracy of unlabeled dependencies by POS of the modifier word, for three groups of languages for the universal dependencies experiments in Table 9: G1 (languages with UAS ≥ 80), G2 (languages with 70 ≤ UAS < 80), and G3 (languages with UAS < 70). The rows are sorted by frequency in the G1 languages.

A number of authors have considered incorporating universal syntactic properties, such as dependency order, by selectively learning syntactic attributes from similar source languages (Naseem et al., 2012; Täckström et al., 2013; Zhang and Barzilay, 2015; Ammar et al., 2016a). Selective sharing of syntactic properties is complementary to our work; we used a very limited form of selective sharing, through the WALS properties, in our baseline approach. More recently, Wang and Eisner (2016) have developed a synthetic treebank as a universal treebank to help learn parsers for new languages. Martínez Alonso et al. (2017) try a very different approach to cross-lingual transfer, based on a ranking approach.

A number of authors (Täckström et al., 2012; Guo et al., 2015; Guo et al., 2016) have introduced methods that learn cross-lingual representations that are then used in syntactic transfer. Most of these approaches introduce constraints to a clustering or embedding algorithm that encourage words that are translations of each other to have similar representations. Our method of deriving a cross-lingual corpus (see Figure 1) is closely related to Duong et al. (2015a), Gouws and Søgaard (2015), and Wick et al. (2015).

POS    | G1: freq%  prec.  rec.   f1    | G2: freq%  prec.  rec.   f1    | G3: freq%  prec.  rec.   f1
NOUN   | 43.9  85.4   88.6   87.0       | 43.5  77.3   81.2   79.2       | 34.5  67.1   71.0   69.0
VERB   | 32.0  83.5   83.6   83.6       | 35.4  74.9   77.9   76.4       | 41.3  63.8   66.5   65.1
PROPN  | 9.1   84.0   84.0   84.0       | 4.1   67.6   63.2   65.3       | 6.4   57.2   54.8   56.0
ADJ    | 4.5   76.2   72.4   74.3       | 5.7   75.7   56.0   64.4       | 5.8   64.9   49.1   55.9
PRON   | 1.4   79.3   68.3   73.4       | 1.4   81.5   61.4   70.0       | 2.2   65.2   49.1   56.0
NUM    | 1.2   77.2   72.4   74.7       | 1.0   52.0   41.8   46.3       | 0.7   62.5   54.7   58.3
ADV    | 1.0   54.0   39.0   45.3       | 1.5   56.5   27.2   36.7       | 1.2   44.1   25.8   32.6
ADP    | 0.6   39.8   6.5    11.2       | 0.3   25.0   0.9    1.7        | 0.3   40.5   8.3    13.8
SYM    | 0.3   79.0   81.1   80.1       | 0.1   41.5   66.3   51.0       | 0.1   55.3   52.2   53.7
DET    | 0.3   36.3   22.6   27.8       | 0.1   60.6   30.6   40.7       | 0.1   67.6   25.3   36.8
AUX    | 0.2   35.7   3.7    6.6        | 0.0   17.2   6.7    9.6        | 0.8   33.3   2.2    4.2
X      | 0.1   52.4   52.2   52.3       | 0.1   42.5   41.6   42.1       | 0.4   39.7   42.7   41.1
SCONJ  | 0.1   36.8   10.0   15.7       | 0.1   45.7   5.8    10.3       | 0.1   30.0   13.5   18.7
PART   | 0.1   26.7   3.0    5.4        | 0.1   15.9   4.3    6.8        | 0.1   26.7   36.8   30.9
CONJ   | 0.1   47.8   6.5    11.4       | 0.1   3.3    0.9    1.4        | 0.1   51.7   10.2   17.0
INTJ   | 0.0   52.4   47.8   50.0       | 0.0   20.0   7.1    10.5       | 0.1   44.2   43.0   43.6

Table 12: Precision, recall and F-score of unlabeled dependency attachment for different POS tags as head, for three groups of languages for the universal dependencies experiments in Table 9: G1 (languages with UAS ≥ 80), G2 (languages with 70 ≤ UAS < 80), G3 (languages with UAS < 70). The rows are sorted by frequency in the G1 languages.

Dep.       | G1: freq%  prec.  rec.   f1   | G2: freq%  prec.  rec.   f1   | G3: freq%  prec.  rec.   f1
nmod       | 15.8  74.0   76.3   75.2      | 16.4  67.3   72.2   69.7      | 17.3  56.9   57.6   57.3
case       | 15.3  92.6   94.7   93.7      | 10.7  92.4   93.5   93.0      | 10.7  90.2   90.2   90.2
det        | 11.8  96.5   96.4   96.4      | 3.5   91.8   91.9   91.9      | 3.8   79.1   86.4   82.6
nsubj      | 6.5   85.3   86.8   86.0      | 7.5   75.5   73.5   74.5      | 7.8   61.0   63.2   62.1
amod       | 6.4   92.9   94.0   93.5      | 10.8  90.1   90.9   90.5      | 5.3   75.7   82.9   79.1
dobj       | 5.3   93.0   90.8   91.9      | 7.1   84.3   81.8   83.0      | 5.7   71.9   72.6   72.3
root       | 5.3   84.8   85.2   85.0      | 6.8   77.5   77.9   77.7      | 7.9   64.9   65.7   65.3
advmod     | 4.1   73.4   72.2   72.8      | 7.1   68.1   69.3   68.7      | 5.3   54.8   58.7   56.7
conj       | 4.0   60.4   68.1   64.0      | 5.8   50.2   56.6   53.2      | 4.2   41.3   48.1   44.5
cc         | 3.4   71.2   71.2   71.2      | 4.5   63.5   63.3   63.4      | 3.9   60.6   61.6   61.1
mark       | 3.3   85.1   87.0   86.0      | 2.2   76.2   79.6   77.9      | 3.4   70.9   71.0   71.0
acl        | 2.4   65.9   61.6   63.7      | 1.7   49.7   51.3   50.5      | 2.0   32.6   28.7   30.5
aux        | 2.2   91.5   93.6   92.5      | 1.2   86.8   91.1   88.9      | 2.2   66.4   78.2   71.8
name       | 1.9   86.5   86.2   86.4      | 1.3   75.3   72.1   73.6      | 0.8   27.8   45.1   34.4
cop        | 1.6   73.1   74.5   73.8      | 1.3   67.7   52.5   59.1      | 2.1   50.8   51.2   51.0
nummod     | 1.4   83.8   86.0   84.9      | 1.6   73.9   77.6   75.7      | 1.4   79.2   81.7   80.5
advcl      | 1.3   60.1   59.8   60.0      | 1.3   57.4   48.8   52.7      | 2.0   42.6   38.1   40.2
appos      | 1.3   73.9   64.9   69.1      | 0.8   51.2   48.9   50.0      | 0.5   31.3   32.1   31.7
mwe        | 0.9   57.7   15.6   24.6      | 0.5   66.2   15.1   24.6      | 0.3   31.9   15.6   20.9
xcomp      | 0.8   82.9   74.6   78.6      | 1.2   76.2   73.4   74.8      | 1.0   40.7   62.9   49.5
ccomp      | 0.8   72.8   70.8   71.8      | 0.6   63.1   64.1   63.6      | 1.2   42.8   40.3   41.5
neg        | 0.7   89.5   88.1   88.8      | 0.7   81.2   82.1   81.6      | 1.1   73.6   72.0   72.8
iobj       | 0.7   98.7   91.1   94.7      | 0.5   96.3   71.0   81.7      | 1.1   97.1   67.1   79.3
expl       | 0.6   90.9   84.7   87.7      | 0.7   87.3   86.8   87.1      | 0.1   62.5   45.0   52.3
auxpass    | 0.5   95.7   96.5   96.1      | 0.7   98.3   93.5   95.8      | 1.2   92.3   49.8   64.7
nsubjpass  | 0.5   94.6   89.9   92.2      | 0.7   96.1   85.0   90.2      | 0.6   94.4   67.2   78.5
parataxis  | 0.4   56.0   32.4   41.1      | 0.9   52.2   36.8   43.2      | 0.4   30.4   33.2   31.7
compound   | 0.4   74.2   66.2   69.9      | 0.6   72.5   63.6   67.8      | 4.4   84.7   51.6   64.1
csubj      | 0.2   77.0   52.5   62.4      | 0.3   88.1   57.3   69.4      | 0.2   45.9   31.3   37.2
dep        | 0.1   70.4   52.4   60.1      | 0.6   91.2   38.5   54.2      | 0.5   17.7   16.2   16.9
discourse  | 0.1   75.6   58.5   66.0      | 0.1   53.3   60.0   56.5      | 0.7   77.1   48.4   59.4
foreign    | 0.0   62.2   69.7   65.7      | 0.1   98.4   60.7   75.1      | 0.1   30.9   19.3   23.8
goeswith   | 0.0   35.7   29.4   32.3      | 0.1   75.0   19.6   31.1      | 0.0   26.1   16.7   20.3
csubjpass  | 0.0   100.0  73.9   85.0      | 0.0   93.3   71.2   80.8      | 0.1   87.5   19.7   32.2
list       | 0.0   –      –      –         | 0.0   77.0   45.6   57.3      | 0.1   71.4   18.5   29.4
remnant    | 0.0   90.0   25.7   40.0      | 0.0   27.3   10.2   14.8      | 0.1   92.3   11.8   20.9
reparandum | 0.0   –      –      –         | 0.0   –      –      –         | 0.1   100.0  34.6   51.4
vocative   | 0.0   55.6   31.3   40.0      | 0.0   57.4   52.9   55.1      | 0.1   84.5   58.6   69.2
dislocated | 0.0   88.9   30.8   45.7      | 0.0   54.5   60.0   57.1      | 0.0   92.0   48.9   63.9

Table 13: Precision, recall and F-score for different dependency labels for three groups of languages for the universal dependencies experiments in Table 9: G1 (languages with UAS ≥ 80), G2 (languages with 70 ≤ UAS < 80), G3 (languages with UAS < 70). The rows are sorted by frequency in the G1 languages.

Our work has made use of dictionaries that are automatically extracted from bilingual corpora. An alternative approach would be to use hand-crafted translation lexicons: for example, PanLex (Baldwin et al., 2010), which covers 1253 language varieties (e.g., see Duong et al. (2015b)); Google Translate (e.g., see Ammar et al. (2016c)); or Wiktionary (e.g., see Durrett et al. (2012) for an approach that uses Wiktionary for cross-lingual transfer). These resources are potentially very rich sources of information. Future work should investigate whether they can give improvements in performance.

7 Conclusions

We have described a method for cross-lingual syntactic transfer that is effective in a scenario where a large amount of translation data is not available. We have introduced a simple, direct method for deriving cross-lingual clusters, and for transferring lexical information across treebanks for different languages. Experiments with this method show that it gives improved performance over previous work that makes use of Europarl, a much larger translation corpus.

Acknowledgements

We thank the anonymous reviewers for their valuable feedback. We also thank Ryan McDonald, Karl Stratos and Oscar Täckström for their comments on the first draft.

Appendix A: Parsing Features

We used all features in Zhang and Nivre (2011, Tables 1 and 2), which describe features based on the word and part-of-speech at various positions on the stack and buffer of the transition system. In addition, we expand the Zhang and Nivre (2011, Table 1) features to include clusters, as follows: whenever a feature tests the part-of-speech of the word at position 0 of the stack or buffer, we introduce features that replace the part-of-speech with the Brown clustering bit-strings of length 4 and 6. Whenever a feature tests the word identity at position 0 of the stack or buffer, we introduce a cluster feature that replaces the word with the full cluster string. We take the cross product of all features corresponding to the choice of the 4- or 6-length bit string for the part-of-speech features.

References

Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, and Anders Søgaard. 2016. Multilingual projection for parsing truly low-resource languages. Transactions of the Association for Computational Linguistics, 4:301–312.

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah Smith. 2016a. Many languages, one parser. Transactions of the Association for Computational Linguistics, 4:431–444.

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2016b. One parser, many languages. arXiv preprint arXiv:1602.01595v1.

Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. 2016c. Massively multilingual word embeddings. arXiv preprint arXiv:1602.01925.

Timothy Baldwin, Jonathan Pool, and Susan M. Colowick. 2010. PanLex and LEXTRACT: Translating all words of all languages of the world. In Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations, pages 37–40. Association for Computational Linguistics.

Peter F. Brown, Peter V. Desouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479.

Jinho D. Choi, Joel Tetreault, and Amanda Stent. 2015. It depends: Dependency parser comparison using a web-based evaluation tool. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL, pages 26–31.
Christos Christodoulopoulos and Mark Steedman. 2014. A massively parallel corpus: The Bible in 100 languages. Language Resources and Evaluation, pages 1–21.

Shay B. Cohen, Dipanjan Das, and Noah A. Smith. 2011. Unsupervised structure prediction with non-parallel multilingual guidance. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 50–61, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.

Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages 1–8. Association for Computational Linguistics, July.

Matthew S. Dryer and Martin Haspelmath, editors. 2013. WALS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Long Duong, Trevor Cohn, Steven Bird, and Paul Cook. 2015a. Cross-lingual transfer for unsupervised dependency parsing without parallel data. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 113–122, Beijing, China, July. Association for Computational Linguistics.

Long Duong, Trevor Cohn, Steven Bird, and Paul Cook. 2015b. A neural network model for low-resource universal dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 339–348, Lisbon, Portugal, September. Association for Computational Linguistics.

Greg Durrett, Adam Pauls, and Dan Klein. 2012. Syntactic transfer using a bilingual lexicon. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1–11, Jeju Island, Korea, July. Association for Computational Linguistics.

Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 369–377, Suntec, Singapore, August. Association for Computational Linguistics.

Stephan Gouws and Anders Søgaard. 2015. Simple task-specific bilingual word embeddings. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1386–1390, Denver, Colorado, May–June. Association for Computational Linguistics.

Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu. 2015. Cross-lingual dependency parsing based on distributed representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1234–1244, Beijing, China, July. Association for Computational Linguistics.

Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu. 2016. A representation learning framework for multi-source transfer parsing. In The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, Arizona, USA.

Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151, Montréal, Canada, June. Association for Computational Linguistics.

Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering, 11(03):311–325.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pages 177–180, Stroudsburg, Pennsylvania, USA. Association for Computational Linguistics.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, pages 79–86.

Ophélie Lacroix, Lauriane Aufrant, Guillaume Wisniewski, and François Yvon. 2016. Frustratingly easy cross-lingual transfer for transition-based dependency parsing. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1058–1063, San Diego, California, June. Association for Computational Linguistics.

Percy Liang. 2005. Semi-supervised learning for natural language. Master's thesis, Massachusetts Institute of Technology.

Xuezhe Ma and Fei Xia. 2014. Unsupervised dependency parsing with transferring distribution via parallel guidance and entropy regularization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1337–1348, Baltimore, Maryland, June. Association for Computational Linguistics.

Héctor Martínez Alonso, Željko Agić, Barbara Plank, and Anders Søgaard. 2017. Parsing universal dependencies without training. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 230–240. Association for Computational Linguistics.

David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective self-training for parsing. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, HLT-NAACL '06, pages 152–159, Stroudsburg, Pennsylvania, USA. Association for Computational Linguistics.

Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 62–72, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, et al. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 92–97, Sofia, Bulgaria, August. Association for Computational Linguistics.

Tahira Naseem, Regina Barzilay, and Amir Globerson. 2012. Selective sharing for multilingual dependency parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 629–637. Association for Computational Linguistics.

Joakim Nivre, Željko Agić, Lars Ahrenberg, Maria Jesus Aranzabe, Masayuki Asahara, et al. 2016. Universal Dependencies 1.3. LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Mohammad Sadegh Rasooli and Michael Collins. 2015. Density-driven cross-lingual transfer of dependency parsers. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 328–338, Lisbon, Portugal, September. Association for Computational Linguistics.

Mohammad Sadegh Rasooli and Joel Tetreault. 2015. Yara parser: A fast and accurate dependency parser. arXiv preprint arXiv:1503.06733.

Rudolf Rosa and Zdenek Zabokrtsky. 2015. KLcpos3 - a language similarity measure for delexicalized parser transfer. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 243–249, Beijing, China, July. Association for Computational Linguistics.

Michael Schlichtkrull and Anders Søgaard. 2017. Cross-lingual dependency parsing with late decoding for truly low-resource languages. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 220–229. Association for Computational Linguistics.

Karl Stratos, Michael Collins, and Daniel Hsu. 2015. Model-based word embeddings from decompositions of count matrices. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1282–1291, Beijing, China, July. Association for Computational Linguistics.

Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 477–487. Association for Computational Linguistics.

Oscar Täckström, Ryan McDonald, and Joakim Nivre. 2013. Target language adaptation of discriminative transfer parsers. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1061–1071, Atlanta, Georgia, June. Association for Computational Linguistics.

Jörg Tiedemann and Željko Agić. 2016. Synthetic treebanking for cross-lingual dependency parsing. Journal of Artificial Intelligence Research, 55:209–248.

Jörg Tiedemann, Željko Agić, and Joakim Nivre. 2014. Treebank translation for cross-lingual parser induction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pages 130–140, Ann Arbor, Michigan, June. Association for Computational Linguistics.

Jörg Tiedemann. 2015. Improving the cross-lingual projection of syntactic dependencies. In Nordic Conference of Computational Linguistics NODALIDA 2015, pages 191–199.

Dingquan Wang and Jason Eisner. 2016. The galactic dependencies treebanks: Getting more data by synthesizing new languages. Transactions of the Association for Computational Linguistics, 4:491–505.

Michael Wick, Pallika Kanani, and Adam Pocock. 2015. Minimally-constrained multilingual embeddings via artificial code-switching. In Workshop on Transfer and Multi-Task Learning: Trends and New Perspectives, Montréal, Canada, December.

Min Xiao and Yuhong Guo. 2015. Annotation projection-based representation learning for cross-lingual dependency parsing. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 73–82, Beijing, China, July. Association for Computational Linguistics.

Daniel Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42.

Yuan Zhang and Regina Barzilay. 2015. Hierarchical low-rank tensors for multilingual transfer parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1857–1867, Lisbon, Portugal, September. Association for Computational Linguistics.

Yue Zhang and Joakim Nivre. 2011. Transition-based dependency parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 188–193, Portland, Oregon, USA, June. Association for Computational Linguistics.