Transactions of the Association for Computational Linguistics, 2 (2014) 245–258. Action Editor: Patrick Pantel.
Submitted 11/2013; Revised 3/2014; Published 10/2014. © 2014 Association for Computational Linguistics.
Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models

Jason Utt and Sebastian Padó
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart
[uttjn|pado]@ims.uni-stuttgart.de

Abstract

Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages.

In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a syntax-based model for a new language requiring only an English resource and a translation lexicon; and (b) multilingual approaches that combine crosslingual with monolingual information, subject to availability. We evaluate on two lexical semantic benchmarks in German and Croatian. We find that the models exhibit complementary profiles: crosslingual models yield higher accuracies while monolingual models provide better coverage. In addition, we show that simple multilingual models can successfully combine their strengths.

1 Introduction

Building on the Distributional Hypothesis (Harris, 1954; Miller and Charles, 1991), which states that words occurring in similar contexts are similar in meaning, distributional semantic models (DSMs) represent a word's meaning via its occurrence in context in large corpora. Vector spaces, the most widely used type of DSMs, represent words as vectors in a high-dimensional space whose dimensions correspond to features of the words' contexts. Word spaces represent the simplest case of DSMs, in which the dimensions are simply the context words (Schütze, 1992). A notable subclass of DSMs are syntax-based models (Lin, 1998; Baroni and Lenci, 2010) which use (lexicalized) syntactic relations as dimensions. They are able to model more fine-grained distinctions than word spaces and have been found to be useful for tasks such as selectional preference learning (Erk et al., 2010), verb class induction (Schulte im Walde, 2006), analogical reasoning (Turney, 2006), and alternation discovery (Joanis et al., 2006).

Despite their flexibility and usefulness, syntax-based DSMs are used less often than word-based spaces. An important reason is that their construction requires accurate parsers, which are unavailable for many languages. In addition, syntax-based DSMs are inherently more sparse than word spaces, which calls for a large corpus of well parsable data. It is thus not surprising that besides English (Baroni and Lenci, 2010), only few other languages possess large-scale syntax-based DSMs (Padó and Utt, 2012; Šnajder et al., 2013).

This paper develops methods that take advantage of the resource gradient between English and other languages, exploiting the higher-quality resources of the former to induce resources for target languages among the latter, by translating the word-link-word co-occurrences that underlie syntax-based DSMs. This directly provides a crosslingual method to construct syntax-based DSMs for target languages without any target language data, requiring only an English syntax-based DSM and a translation lexicon. Such lexicons are available for many language pairs, and we outline a method to reduce the ambiguity inherent in such dictionaries. We also describe a set of multilingual methods that can combine corpus evidence from English and the target language to further improve the performance of the obtained DSM.

We consider two target languages, German and Croatian, as examples of one close and one more remote target language.
[Figure 1: Distributional Memory sample around "push", represented as a graph (a) and as two matrices: (b) the W×LW matrix, in which push has the dimensions ⟨comp⁻¹, intend⟩ = 2, ⟨comp⁻¹, ability⟩ = 5, ⟨subj⁻¹, initiative⟩ = 3, ⟨obj⁻¹, initiative⟩ = 10, and ⟨in, direction⟩ = 1; (c) the WL×W matrix, in which, e.g., the pair ⟨push, comp⁻¹⟩ has the value 5 for ability and 2 for intend.]

For evaluation, we use two tasks, namely synonym choice and semantic similarity prediction. For both languages and tasks, monolingually constructed DSMs can provide strong baselines. We find similar patterns across tasks and target languages: the crosslingually constructed DSM can be parametrized so that it becomes superior to an existing monolingual DSM in quality, even if inferior in coverage. A simple multilingual backoff can combine the crosslingual model's high quality with the monolingual model's high coverage.

Structure of the paper. We begin by sketching the structure of Distributional Memory, a general framework for syntax-based semantic spaces, in Section 2. Our main contributions follow in Sections 3 and 4, namely, a family of models for the crosslingual and multilingual construction of DSMs. The second part of the paper is concerned with evaluation. Section 5 describes our experimental setup, after which we discuss our results for German (Section 6) and Croatian (Section 7). The paper concludes with related work (Section 8) and a general discussion (Section 9).

2 Distributional Memory: A General Model of Syntax-based Vector Spaces

2.1 Motivation and Definition

Simple syntax-based DSMs represent target words in terms of dimensions labeled with word-relation pairs (Lin, 1998; Grefenstette, 1994). Unfortunately, this representation only supports tasks that compare pairs of words with regard to their meaning (e.g., in synonymy detection or selectional preferences), but not tasks such as analogical reasoning, where sets of word pairs are compared (Turney, 2006).

To unify syntax-based DSMs, Baroni and Lenci (2010) proposed the Distributional Memory (DM) model, which captures distributional information at the more general level of word-link-word triples, stored as a third-order co-occurrence tensor. The DM tensor can be seen as a set of ordered word-link-word tuples such as ⟨pencil, obj, use⟩ associated with a scoring function σ: W × L × W → R+ that scores, for example, ⟨pencil, obj, use⟩ more highly than ⟨elephant, obj, use⟩. The DM tensor can be visualized as a directed graph whose nodes are labeled with lemmas and whose edges are labeled with links and scores. As an example, Figure (1a) shows five links for the verb push in the English DM, including subject, object, prepositional adjunct, and governing verbs.

DSMs for individual tasks can be obtained by "matricizing" the tensor into two-dimensional matrices corresponding to standard vector spaces. The matrix in Figure (1b) shows the word by link-word space (W×LW). It represents words w in terms of pairs ⟨l, w′⟩ of a link and a context word. This space models similarity among words, e.g. for thesaurus construction (Lin, 1998). The example matrix in Figure (1c) represents a word-link by word space (WL×W). It characterizes pairs ⟨w, l⟩ through context words w′, which can be understood as selectional preferences.

DM does not assume a specific source for building the graph. However, all existing DM resources were extracted from large dependency-parsed corpora such as UKWAC (Baroni et al., 2008). In the simplest case, the set of labels L is (a subset of) the dependency relations in the corpus, and the scoring function σ is a measure of association between the governor and the dependent (see Baroni and Lenci (2010) for details). However, the most robust DMs (including Baroni and Lenci's LexDM and TypeDM) use both syntactic and lexicalized links, i.e. links which contain words themselves, as well as surface form-based links, e.g., observed subject-verb-object triples in the corpus lead to a ⟨subject, verb, object⟩ edge in the DM graph.
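To make the triple representation and its matricization concrete, here is a minimal sketch in Python (our illustration; the triples and scores follow Figure 1, but all function and variable names are ours):

```python
from collections import defaultdict

# Toy DM tensor: scored word-link-word triples around "push" (cf. Figure 1a).
triples = [
    ("initiative", "subj", "push", 3.0),
    ("initiative", "obj", "push", 10.0),
    ("push", "in", "direction", 1.0),
    ("intend", "comp", "push", 2.0),
    ("ability", "comp", "push", 5.0),
]

def matricize_w_lw(triples):
    """W x LW space (Figure 1b): each word becomes a vector over
    (link, context word) pairs; inverse links (l-1) encode incoming edges."""
    space = defaultdict(dict)
    for w1, link, w2, score in triples:
        space[w1][(link, w2)] = score            # outgoing edge
        space[w2][(link + "-1", w1)] = score     # incoming edge, inverse link
    return space

space = matricize_w_lw(triples)
print(sorted(space["push"].items()))
# [(('comp-1', 'ability'), 5.0), (('comp-1', 'intend'), 2.0),
#  (('in', 'direction'), 1.0), (('obj-1', 'initiative'), 10.0),
#  (('subj-1', 'initiative'), 3.0)]
```

The WL×W space of Figure (1c) is obtained analogously by keying on the (word, link) pair instead, so that the context word becomes the dimension.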
2.2 DMs for Other Languages

Given the appealing properties of Distributional Memory, it may be surprising that not many comparable resources exist for other languages. To our knowledge, comparable resources exist only for German (Padó and Utt, 2012) and Croatian (Šnajder et al., 2013). Both studies replicate the monolingual DM construction process outlined by Baroni and Lenci for the respective languages. For German, the process is relatively unproblematic, since German is relatively well-equipped in terms of corpora and parsers. In contrast, Šnajder et al. (2013) faced serious resource scarcity while building a Croatian DM and had to go to considerable lengths to clean a large web corpus and to optimize the linguistic processing tools. The resulting DM outperforms a monolingual context word model for nouns and verbs, but performs worse than the word-based model for (generally rarer) adjectives. As a direct consequence, high-quality syntax-based DSMs can only be constructed for a limited set of languages.

3 Crosslingual Construction of DMs

3.1 Motivation

As outlined in the previous section, many languages face a bottleneck regarding both large, clean corpora and processing pipelines that produce high-accuracy dependency parses. To address this problem, we propose to induce Distributional Memories for such languages crosslingually by translating a source language DM into the target language. By adopting English as the source language we can take advantage of the resource gradient, that is, the higher maturity of English NLP techniques, such as parsers, compared to most other languages. For many languages, treebanks have become available only within the last ten years (Buchholz and Marsi, 2006), if at all, while English has been at the forefront of NLP development for several decades, and a number of highly accurate dependency parsers exist (McDonald et al., 2005; Nivre, 2006). At the same time, English arguably possesses the widest range of large and well-cleaned corpora of any language.

To make our approaches applicable to as many target languages as possible, we assume in this section that very few resources for the target language are available. The crosslingual methods we develop here work without any target language corpora, either monolingual or bilingual.

[Figure 2: Sample of the English–German dict.cc dictionary (wood, forest, timber, copse, grove vs. Holz, Wald, Gehölz, Hain); translations shown as dashed lines.]

The only knowledge we use is a simple translation lexicon, that is, a list of translation pairs without translation probabilities, as shown in Figure 2. Translation lexicons of this type are arguably the most common bilingual resource, and accurate ones exist for virtually any language pair (Soderland et al., 2009), even for languages with few available corpora. Furthermore, such translation lexicons are often crowdsourced and are available for download. For example, the website dict.cc provides numerous such lexicons for German and English.

This approach promises in particular to yield models with a quality-coverage profile complementary to that of monolingual models (Mohammad et al., 2007; Peirsman and Padó, 2011): Crosslingual DMs are extracted from source language corpora which we assume to be parsed more accurately than target language corpora. In addition, the translation process can be designed to act as a further filtering step (cf. Section 3.4 below), thus optimizing crosslingual models for higher quality at the expense of coverage. In contrast, monolingual models – in particular for under-resourced languages – often hit a quality ceiling, but can generally guarantee high coverage.

3.2 Translating DMs with Translation Lexicons

We conceptualize DM as a directed graph (see Figure 1), which allows us to phrase translation in graph terms (Mihalcea and Radev, 2011). A DM is a triple (V, E, σ) where V is a set of vertices (i.e., the vocabulary), E a set of typed edges between words, represented as word-link-word triples (cf. Section 2), and σ an edge-weighting function. We will use S and T to refer to the source and target language vocabularies, respectively, and (V_S, E_S, σ_S) and (V_T, E_T, σ_T) to denote source and target language DMs.
We can now ask how the shape of the graph changes under translation. In an ideal world, a translation lexicon would be a bijective function between the source and target language vocabularies: Tr: S → T. Then, the transformation would merely constitute a relabeling. We would then construct the German DM graph by exchanging all English node labels with German node labels, i.e., V_T = T, and creating a German edge for each English edge. [1]

[1] We build on the assumption that dependency relations are language-independent which, while incorrect, represents a reasonable simplification (McDonald et al., 2013).

3.3 Ambiguity in Unfiltered Translation

The dictionary fragment in Figure 2 shows that translation is not bijective but a many-to-many relation. In fact, taking the English–German dict.cc lexicon as an example, there is an average of 2.3 German translations for each English lemma, and an average of 1.9 English translations for each German lemma. We model this situation using two functions: Tr: S → 2^T translates source words into sets of target words, and Tr⁻¹: T → 2^S translates target words back into the source language.

The naive way to translate nodes using Tr is to use all translations for a given word. Thus, for each edge in the source DM between lemmas s1 and s2, we obtain |Tr(s1)| · |Tr(s2)| edges in the target language:

    E_T = { (t1, l, t2) | ∃(s1, l, s2) ∈ E_S : t1 ∈ Tr(s1) ∧ t2 ∈ Tr(s2) }    (1)

The score σ_T of a target edge is defined as the mean of the scores of all source edges that map to it:

    σ_T(t1, l, t2) = ( Σ_{s1 ∈ Tr⁻¹(t1), s2 ∈ Tr⁻¹(t2)} σ_S(s1, l, s2) ) / ( |Tr⁻¹(t1)| · |Tr⁻¹(t2)| )    (2)

We take the mean as it is less sensitive to outliers than maximum or minimum. In addition, unlike taking the sum, it is automatically normalized regarding the number of translations, thus penalizing words with many unrelated senses.

A look at Figures 1 and 3, however, indicates that this procedure overgenerates. This is problematic on two levels. First, the target language graph will contain a very large number of edges (e.g., using dict.cc, the edge ⟨text, sbj_tr, use⟩ has 42 German translations).

[Figure 3: Unfiltered edge translation (EN–DE). The modifiers precut and great of wood both become linked to both Holz and Wald.]

Second, the correctness of the target DM suffers. For some cases, such as copse – Gehölz, Hain, the various translations are synonymous, and Eq. (1) is appropriate. In other cases, multiple translations indicate lexical ambiguity of the source term. For example, the two translations of wood correspond to its senses as forest (Wald) and timber (Holz), respectively. In such cases, Eq. (1) confuses the senses, as the example in Figure 3 illustrates. The left-hand side shows DM edges between wood and two adjectival modifiers, namely precut (which is more plausible for the timber sense) and great (which is more plausible for the forest sense). The right-hand side shows (part of) the German translations according to Eq. (1): both Holz (timber) and Wald (forest) are linked to both adjectives, leading to spurious edges in the German DM.
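For concreteness, here is a minimal sketch of the naive translation in Eqs. (1) and (2), applied to the Figure 3 fragment (our illustration, not the authors' implementation; the scores are invented for the example):

```python
from itertools import product

def translate_naive(dm_source, tr):
    """Eq. (1): every pair of endpoint translations of a source edge
    yields a target edge; links are carried over unchanged."""
    return {(t1, link, t2)
            for (s1, link, s2) in dm_source
            for t1, t2 in product(tr.get(s1, ()), tr.get(s2, ()))}

def score_target(edge, dm_source, tr_inv):
    """Eq. (2): mean score of all source edges mapping onto this edge;
    source triples that do not occur contribute a score of 0."""
    t1, link, t2 = edge
    back1, back2 = tr_inv[t1], tr_inv[t2]
    total = sum(dm_source.get((s1, link, s2), 0.0)
                for s1 in back1 for s2 in back2)
    return total / (len(back1) * len(back2))

# The fragment from Figure 3, with invented scores.
dm_source = {("precut", "mod", "wood"): 5.0,
             ("precut", "mod", "timber"): 9.0,
             ("great", "mod", "wood"): 4.0}
tr = {"wood": {"Holz", "Wald"}, "timber": {"Holz"},
      "precut": {"zugeschnitten"}, "great": {"groß"}}
tr_inv = {"Holz": {"wood", "timber"}, "Wald": {"wood"},
          "zugeschnitten": {"precut"}, "groß": {"great"}}

for edge in sorted(translate_naive(dm_source, tr)):
    print(edge, round(score_target(edge, dm_source, tr_inv), 2))
# ('groß', 'mod', 'Holz') 2.0          <- spurious "great timber" edge
# ('groß', 'mod', 'Wald') 4.0
# ('zugeschnitten', 'mod', 'Holz') 7.0
# ('zugeschnitten', 'mod', 'Wald') 5.0
```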
3.4 Filtering by Backtranslation

Since the nature of the translation is not indicated in the translation lexicon, we exploit typical redundancies in the source DM, which often contains "quasi-synonymous" edges that express the same relation with different words, e.g., ⟨book, obj, read⟩ and ⟨novel, obj, read⟩. This allows us to score target edge candidates by how well we can "backtranslate" (Somers, 2005) them into the source language.

This idea is illustrated in Figure 4. We still assume, as above, that wood has two translations, but that precut has only one. For the English edge ⟨precut, mod, wood⟩, we obtain two German candidate edges, namely ⟨zugeschnitten, mod, Wald⟩ and ⟨zugeschnitten, mod, Holz⟩. When backtranslating these candidates, the first one, ⟨zugeschnitten, mod, Wald⟩, maps only onto the original edge. The second one, ⟨zugeschnitten, mod, Holz⟩, is backtranslated into a different source edge, ⟨precut, mod, timber⟩, which makes it more probable.

[Figure 4: Backtranslation filtering. Original and winning edges shown in boldface.]

We operationalize this by adding another condition to Eq. (1), namely that target edges must be among the highest-scoring edges for some source edge. Recall that our target scores σ_T are already defined in terms of source edge scores, so no redefinition of the scoring function is necessary:

    E_T = { (t1, l, t2) | ∃(s1, l, s2) ∈ E_S : t1 ∈ Tr(s1) ∧ t2 ∈ Tr(s2) ∧ σ_T(t1, l, t2) = max_{t ∈ Tr(s1), t′ ∈ Tr(s2)} σ_T(t, l, t′) }    (3)

where σ_T(t, l, t′) is the score as defined in Eq. (2). This filtering scheme is fairly liberal: we do not limit the number of target edges that a source edge can translate to. A stricter variant could, e.g., abstain from translating a source edge if no unique best edge exists. We leave such variants to future work.
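Eq. (3) can be read as a post-hoc filter over the naive candidate set: a candidate survives only if it is among the top-scoring candidates of some source edge. A minimal sketch, reusing the toy data and `score_target` from the previous example (again our illustration, not the authors' implementation):

```python
from itertools import product

def filter_by_backtranslation(dm_source, tr, score):
    """Eq. (3): keep a target edge candidate only if it achieves the
    maximal sigma_T among all candidates generated from some source edge.
    `score` maps a target edge to its Eq. (2) value."""
    kept = set()
    for (s1, link, s2) in dm_source:
        candidates = [(t1, link, t2)
                      for t1, t2 in product(tr.get(s1, ()), tr.get(s2, ()))]
        if not candidates:
            continue
        best = max(score(e) for e in candidates)
        # Liberal filtering: ties are all kept, and one source edge
        # may still license several target edges.
        kept.update(e for e in candidates if score(e) == best)
    return kept

# With the Figure 3/4 fragment from the previous sketch:
#   score = lambda e: score_target(e, dm_source, tr_inv)
#   filter_by_backtranslation(dm_source, tr, score)
#   -> {('zugeschnitten', 'mod', 'Holz'), ('groß', 'mod', 'Wald')}
# The backtranslatable edge to Holz wins (cf. Figure 4), and the
# spurious ('groß', 'mod', 'Holz') edge from Figure 3 is filtered out.
```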
3.5 Defining Similarity

Recall from Figure 1 that DM contains information about both "incoming" as well as "outgoing" links. Monolingually constructed DMs by default use all of these relations since the information is reliable. The situation is not as clear in a crosslingual setting. Our intuition is that selectional preferences are most informative and most likely to survive translation. For example, for verbs we expect knowledge about their arguments to be more informative than knowledge about their governors. Conversely, for nouns we want to use knowledge about the verbs that they occur with rather than their arguments or modifiers. We implement this idea by computing semantic similarity between word vectors either on complete vectors (condition "AllL") or on a filtered version that uses only inverse links for verbs and only regular links for nouns and adjectives (condition "SPrfL").

                                    Covered items
    Model                           Corr.   Cov.
    DM.DE (AllL)                    .43     .60
    DM.DE (SPrfL)                   .43     .60
    DM.XL EN→DE filter (AllL)       .42     .61
    DM.XL EN→DE filter (SPrfL)      .49     .49

Table 1: Coverage and correlation (Pearson's r) for predicting word similarity, contrasting link types (all links vs. selectional preference links).

Table 1 shows the results of preliminary experiments on a semantic similarity dataset (details in Section 5). They bear out our hypothesis: in the monolingual setting, there is almost no difference. Thus, in line with previous work, we adopt (AllL) for DM.DE. In contrast, we see a clear quality-coverage trade-off in the crosslingual scenario, with a higher quality for (SPrfL). Since this corresponds to our focus on higher precision for crosslingual models, we will adopt (SPrfL) for all crosslingual DMs.

4 Multilingual Construction of DMs

The crosslingual models described in the previous section do not use any corpus information from the target language: As previously discussed, our rationale is to make the methods as widely applicable as possible. However, this assumption may be too cautious as more corpora and parsers continually become available. In order to take advantage of such developments, this section discusses two simple methods for combining monolingually and crosslingually constructed DMs, thereby combining corpus evidence from both the source and the target language. We concentrate on methods that can be applied to DMs directly, e.g. by researchers who do not have access to the source corpora. Moreover, we combine not the graphs, but the resulting semantic similarities. [2]

[2] We conducted experiments with graph merging but found that the different topologies of the monolingual and crosslingual DMs make it difficult to merge the graphs in a manner that combines the information from both graphs.

We take our inspiration from work on combining and smoothing n-gram language models, where the usual operations are interpolation and back-off (Chen and Goodman, 1998). Note that in our case, the two models to be combined are assumed to have complementary properties, with the monolingual model having higher coverage and the crosslingual model higher quality (cf. Section 3.1). For this reason, we assume that a linear interpolation of the models' similarities for each word pair will not perform well. Our first strategy is a simple backoff combination (DM.MULTI Backoff) that starts with the crosslingual model and falls back to the monolingual model in the case of zero similarities. Our second strategy follows the intuition that both noise and sparse data tend to result in underestimated similarities. This leads us to the DM.MULTI MaxSim model: It takes the predictions from the monolingual and the crosslingual model and chooses the higher one.

Both DM.MULTI variants combine predictions from two models and implicitly assume that the predictions are drawn from the same score distribution. Since this is not guaranteed, we standardize all scores before combination, that is, we linearly transform the values so that the resulting distribution has a mean of 0 and a standard deviation of 1.
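A compact sketch of the two combination strategies, including the standardization step (our illustration; it assumes that both models provide raw similarity scores for the same word pairs, with 0 marking pairs for which a model has no evidence):

```python
import statistics

def standardize(scores):
    """Linearly transform scores to mean 0 and standard deviation 1,
    so that predictions from both models share one scale."""
    mean = statistics.mean(scores.values())
    std = statistics.pstdev(scores.values()) or 1.0  # guard: constant scores
    return {pair: (s - mean) / std for pair, s in scores.items()}

def combine(mono, cross, strategy):
    """DM.MULTI Backoff: use the crosslingual score unless it is a zero
    similarity, then fall back to the monolingual one.
    DM.MULTI MaxSim: take the higher of the two standardized scores."""
    mono_z, cross_z = standardize(mono), standardize(cross)
    if strategy == "backoff":
        # The zero test uses the raw crosslingual score, since
        # standardization shifts the zero point.
        return {pair: mono_z[pair] if cross[pair] == 0.0 else cross_z[pair]
                for pair in mono}
    return {pair: max(mono_z[pair], cross_z[pair]) for pair in mono}
```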
5 Experimental Setup

To show the benefits of our crosslingual methods, we perform experiments for the language pairs English–German and English–Croatian. These languages exemplify variability on the resource gradient: The resource situation is best for English, still relatively good for German, and most difficult for Croatian. This section outlines the experiments for German; Section 7 focuses on Croatian. We evaluate our models on two standard tasks from lexical semantics: synonym choice and the prediction of human relatedness judgments. Even though these two tasks are in-vitro, they are widely used for model selection in distributional space models, and we can compare the results of our models against previous work. The two tasks test how well the models can account for two different aspects of lexical semantics, namely a specific lexical relation (synonymy) and general semantic relatedness.

5.1 Tasks and Datasets

Our first task is synonym detection, where models have to identify the true synonym for a target word from four candidates. We use the German Reader's Digest Word Power (RDWP) dataset (Wallace and Wallace, 2005) which contains 984 items. [3] RDWP is similar to the English TOEFL data (Landauer and Dumais, 1997), but can contain short phrases among the candidates (cf. example in Table 2a).

(a) Task 1: synonym target with four candidates

    Demagoge (demagogue)
    1  Miesmacher (grinch)                          ×
    2  guter Redner (able speaker)                  ×
    3  skrupelloser Hetzer (unscrupulous agitator)  ✓
    4  Meinungsforscher (pollster)                  ×

(b) Task 2: semantic similarity (range: 0–4)

    Word Pair                                                Similarity
    Absage – ablehnen (rejection – refuse)                   3.5
    Absage – Stellenanzeige (rejection – job advertisement)  1.875
    Affe – Gepäckkontrolle (monkey – luggage inspection)     0.125

Table 2: Example items from evaluation tasks.

Our second evaluation tests how well the models predict similarities for German word pairs including closely related, somewhat related, and unrelated word pairs (cf. Table 2b). We use the Gur350 dataset [4] which contains 350 word pairs scored for relatedness by native German taggers on a five-point Likert scale between 0 (unrelated) and 4 (synonymous). Both datasets contain nouns, verbs and adjectives.

[3] Available from: http://goo.gl/PN42E
[4] Available from: http://goo.gl/3Dflf1

5.2 Procedure

Starting from a DM model, we matricize it into a word by link-word space (W×LW) and compute similarities between words with cosine similarity. In Exp. 1, we compute the semantic similarities of the target with each candidate and predict the candidate with the highest similarity to the target. For phrasal candidates, we compute the similarity between the target and all constituent words and take the maximum. We follow Mohammad et al. (2007) in assigning partial credit to a model when the candidates of a target are tied for maximal similarity. We evaluate the models on Exp. 2 by calculating the strength of the correlation between the model predictions and the human relatedness judgments. We use Pearson's correlation coefficient since it is the de facto evaluation measure in relevant earlier work. [5]

[5] We note that since the data are not normally distributed, a non-parametric correlation coefficient would be more appropriate. While we omitted them due to space limitations in this paper, we will provide Spearman ρ results for all models online at http://goo.gl/uxuffp.

On both tasks, we compare the models in two conditions. In the first condition ("All"), models are forced to make predictions for all items in the dataset even if they have no information about the item. In the second condition ("Covered"), models are allowed to abstain in the case of zero similarities. For Exp. 1, we report the accuracy (the number of correctly recognized synonyms divided by the number of attempted problems) and coverage (the ratio of items attempted; always 1 for the "All" condition). Items are considered covered if at least one candidate has a non-zero similarity to the target. In Exp. 2, we measure the correlation between the semantic similarities and human judgments for word pairs. Coverage is calculated as the percentage of items with similarity greater than 0.

Differences between models are tested for significance using bootstrap resampling (Efron and Tibshirani, 1993), always in the "All" condition.
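The decision procedure for Exp. 1 can be summarized as follows (a sketch under our reading of the setup; `similarity` stands for the cosine in the W×LW space, and the tie handling mirrors the partial-credit scheme adopted from Mohammad et al. (2007)):

```python
def predict_synonym(target, candidates, similarity):
    """Score each candidate against the target; phrasal candidates
    (given as tuples of words) score as their best-matching constituent."""
    def cand_sim(cand):
        words = cand if isinstance(cand, tuple) else (cand,)
        return max(similarity(target, w) for w in words)
    sims = {cand: cand_sim(cand) for cand in candidates}
    best = max(sims.values())
    winners = [cand for cand, s in sims.items() if s == best]
    return winners, best

def item_credit(winners, best, gold):
    """Per-item accuracy: full credit for a unique correct prediction,
    partial credit 1/k for a k-way tie that contains the gold synonym.
    Returns None for uncovered items (all similarities zero), which a
    model may skip in the "Covered" condition."""
    if best == 0.0:
        return None
    return 1.0 / len(winners) if gold in winners else 0.0
```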
5.3 Models

We consider three types of DM models (monolingual, crosslingual and multilingual), bag-of-words models, and a set of models proposed in the literature.

Monolingual model. We use DM.DE (Padó and Utt, 2012), constructed from a 900M-token web corpus, SDEWAC, parsed with MATE (Bohnet, 2010). [6] As discussed in Section 3.5, we consider all links (AllL) for the monolingual model.

Crosslingual models. The starting point for the crosslingual models is Baroni and Lenci (2010)'s English TypeDM model, extracted from approximately 3B tokens of Wikipedia and web corpus text parsed with MaltParser (Nivre, 2006). [7] DM.XL naive implements Eq. (1), and DM.XL filter implements Eq. (3). As our translation lexicon, we use the community-built English–German dict.cc online dictionary. [8] The statistics of the dictionary in Table 3 show that it is quite large and covers many adjectives and nouns, but relatively few verbs. We had to exclude much verbal data due to ill-structured or phrasal entries. Following Section 3.5, we only consider selectional preference links (SPrfL) for the crosslingual model.

                        Adj    Noun    Verbs   Total
    English             37K    78K     8K      123K
    German              35K    99K     9K      143K
    Translation pairs   77K    172K    28K     277K

Table 3: Size of the dict.cc dictionary.

[6] Available from http://goo.gl/H6gViT
[7] Available from http://goo.gl/63ajCI
[8] Available from http://goo.gl/re44Hg

Multilingual models. We consider the two models described in Section 4, namely DM.MULTI Backoff and DM.MULTI MaxSim, each combining DM.DE (AllL) with DM.XL filter (SPrfL).

Bag-of-words models. We build a standard BOW model from the same German corpus SDEWAC used for DM.DE. We assume a window of 10 context words to the left and right. We use the top 10K most frequent content words (nouns, adjectives, verbs and adverbs) as dimensions. Our second BOW model (BOW PCA500) was reduced to 500 dimensions by applying principal component analysis, a technique generally used to increase robustness to parameter choice and to combat sparsity. [9]

[9] We also built models using smaller context windows and Latent Semantic Analysis (LSA, Landauer, 1997), both with 500 dimensions and with an automatically optimized number of dimensions (Wild et al., 2008). Since these spaces did not consistently yield better results than the reported models using PCA, we do not report the results in detail.

Models from the literature. We compare our models against the state of the art, represented by the respective best models from two previous studies (Zesch et al., 2007; Mohammad et al., 2007). They comprise monolingual ontology-based models that use GermaNet, (German) Wikipedia, or both (LinGN, HPG, JC, PL); and crosslingual distributional models that represent the meaning of German lemmas in terms of English thesaurus categories (Lindist).

DM model statistics. Table 4 shows the sizes of the various DMs. The German and English monolingual DMs are markedly different: the English DM is much more compact, covering only 30K lemmas while the German DM covers 3.5M lemmas, and is at the same time much denser. This discrepancy is due to the larger English corpus and the inclusion of very low-frequency items in DM.DE. The crosslingual models created from TypeDM cover 63K lemmas in German, about twice the English coverage but still almost two orders of magnitude below the monolingual DM.DE. They become very large: naive translation increases the number of edges by a factor of 30, and filtered translation still by a factor of 13. This means filtering does reduce the size of the resulting DM, but there is still considerable overgeneration.

    Class                          Model          Nodes   Edges
    monolingual                    DM.DE (DE)     3.5M    78M
                                   TypeDM (EN)    31K     131M
    crosslingual & multilingual    DM.XL naive    63K     5B
    (DE)                           DM.XL filter   63K     1.7B

Table 4: Sizes of various DM resources.

6 Experimental Evaluation on German

The experimental results for the two experiments are shown in Tables 5 and 6, structured by model type. We observe similar patterns for the two experiments.

                                        All     Covered
    Model                               Acc     Acc     Cov
    Baselines and word-based DSMs
 1  Random                              .25     .25     1
 2  Frequency                           .31     .31     1
 3  BOW                                 .46     .46     .98
 4  BOW PCA500                          .55     .55     .98
    Syntax-based DSMs
 5  DM.DE (AllL)                        .48     .53     .84
 6  DM.XL EN→DE naive (SPrfL)           .47     .63     .58
 7  DM.XL EN→DE filter (SPrfL)          .46     .61     .58
 8  DM.MULTI Backoff (7,5)              .54     .58     .89
 9  DM.MULTI MaxSim (7,5)               .55     .59     .89
    Models from the literature
10  Lindist [MGHZ07]                    NA      .52     .45
11  HPG [MGHZ07]                        NA      .77     .22
12  JC [MGHZ07]                         NA      .44     .36

Table 5: Exp. 1: Accuracy and coverage for synonym choice on the Reader's Digest Word Choice dataset. MGHZ07: Mohammad et al. (2007). Best results for each model class in bold.

                                        All     Covered
    Model                               Corr    Corr    Cov
    Baselines and word-based DSMs
 1  Frequency                           .13     .13     1
 2  BOW                                 .20     .21     .97
 3  BOW PCA500                          .34     .37     .97
    Syntax-based DSMs
 4  DM.DE (AllL) [PU12]                 .38     .43     .60
 5  DM.XL EN→DE naive (SPrfL)           .28     .45     .49
 6  DM.XL EN→DE filter (SPrfL)          .33     .49     .49
 7  DM.MULTI Backoff (6,4)              .40     .45     .69
 8  DM.MULTI MaxSim (6,4)               .42     .47     .69
    Models from the literature
 9  LinGN [MGHZ07]                      NA      .50     .26
10  Lindist [MGHZ07]                    NA      .51     .26
11  JC GN + PL WP [ZGM07]               NA      .59     .33

Table 6: Exp. 2: Coverage and correlation (Pearson's r) for predicting word similarity on the Gur350 dataset. MGHZ07: Mohammad et al. (2007) [10], ZGM07: Zesch et al. (2007) [11], PU12: Padó and Utt (2012). Best results for each model class in bold.

[10] Mohammad et al. (2007) do not provide coverage numbers in their paper. We appreciate the support of Torsten Zesch and Saif Mohammad in recovering the necessary information.
[11] Zesch et al. (2007) report results for the subset of Gur350 in the intersection of GermaNet and Wikipedia. Thus, their models may have higher coverage on the complete Gur350, but to our knowledge these numbers have not been published.

Baselines and word-based DSMs. In both cases, uninformed baselines (random and frequency) perform badly. (In Exp. 1, the frequency baseline predicts the most frequent item as synonym; in Exp. 2, it predicts min(f(w1), f(w2)).) In contrast, word-based DSMs perform quite well, particularly the dimensionality-reduced model (BOW PCA).

Syntax-based DSMs. We see a consistent quality versus coverage tradeoff among the different classes of syntax-based DSMs. The monolingual DM.DE model is significantly outperformed by the BOW model on Exp. 1 (p < 0.01), but numerically outperforms it on Exp. 2 (difference not significant). In both tasks, the crosslingual DM.XL models outperform both DM.DE and BOW PCA in terms of quality: They achieve the numerically highest accuracy (and correlation, respectively) among all syntax-based models. This high quality comes at a low coverage, matching our intuitions about the profile of the crosslingual model.
Filtering leads to a significant improvement in Exp. 2 (p < 0.05) but not in Exp. 1.

The multilingual models (DM.MULTI) perform even better. They nearly retain the quality of the crosslingual models (accuracy of .59 vs. .63 for Exp. 1, correlation of .47 vs. .49 for Exp. 2) but attain higher coverage (89% in Exp. 1 and 69% in Exp. 2). Notably, the coverage is even higher than that of the DM.DE models, attesting to the complementarity of mono- and crosslingual information. The differences among the DM.MULTI models are small, but MaxSim does a little better and performs best overall. In Exp. 1, it does significantly better in the all-items evaluation than all other syntax-based models (p < 0.01). The differences in Exp. 2 are only significant at p < 0.05 for the model pair 6–8; we attribute this to the smaller size of the dataset.

In sum, we can construct crosslingual DMs without any use of target language corpora that mirror or even exceed the performance of monolingual DMs. If monolingual data is available, the combination of corpus evidence provides a substantial advantage over both monolingual and crosslingual models, even for German, a language with large, relatively reliably parsed corpora. Users can choose among different models with different coverage/quality tradeoffs.

Comparison to models from the literature. Models from the literature are shown at the bottom of the two tables. They generally obtain the highest accuracy (or correlation, respectively), but only cover a relatively small part of the datasets. In particular, the models with a quality higher than the DM variants (11 in Exp. 1, and 10 and 11 in Exp. 2) exhibit a coverage of less than half that of the DM.MULTI models. This appears to show the usual trade-off between hand-constructed knowledge and automatically acquired knowledge (Gildea and Jurafsky, 2002). However, we can similarly bias our DMs towards accuracy with the aid of a simple frequency filter that only permits predictions for items where all involved lemmas occur more frequently in the German corpus than some threshold. Setting these thresholds to match the coverage figures of the best ontology-based models, DM.MULTI MaxSim almost reaches the ontology-based results: On Exp. 1, for a coverage of .22 we obtain an accuracy of .68 (ontology-based model: .77), and on Exp. 2, we obtain a correlation of .60 (ontology-based model: .59) at a coverage of .33. [12] Thus, our DM models approximate the quality of ontology-based models without using any handcrafted resources.

[12] We cannot provide significance tests since we do not have item-wise predictions for the models from the literature.

Differences between Exp. 1 and 2. The two main differences between the experiments are (a) the performance of DM relative to the BOW baseline, and (b) the impact of backtranslation filtering. In Exp. 1, the BOW performs as well as DM.MULTI, and the unfiltered DM.XL has a slight edge (2% accuracy) over the filtered one. In contrast, in Exp. 2 filtering leads to a major improvement and DM.MULTI does substantially better than BOW PCA. Our analyses attribute this difference to the nature of the two tasks (cf. Section 5). Exp. 1 requires the recognition of synonyms. Here, the main determinant of success is whether the actual synonym receives the highest similarity or not, irrespective of the margin to the competing candidates. This margin does increase from 0.09 in the naive to 0.11 in the filtered DM.XL, but the overall number of correct predictions remains almost unchanged. In contrast, Exp. 2 covers the whole range from highly similar to unrelated word pairs, and the correlation evaluation is sensitive to the relative size of similarities produced by the DM across many word pairs. The improvement we see indicates that filtering improves the overall scaling of the similarities, but this effect is masked by the decision criterion in Exp. 1.

Qualitative analysis. Comparing DM.DE with DM.MULTI, the question arises: can we further characterize the benefits that the inclusion of crosslingual corpus evidence confers on monolingual models? We first inspected Exp. 1 for synonyms that were correctly recognized by DM.MULTI MaxSim but not DM.DE, and found a large number of words of foreign origin (see Table 7). These words tend to be rare in the German corpus in the form of technical, slang, or elevated register terms. Due to their low level of ambiguity as well as the fact that their English translations are often more frequent, the crosslingual model represents them more sensibly.

    Nouns        Couscous (couscous), Albino (albino)
    Adjectives   kursorisch (cursory), süffisant (smug)
    Verbs        erodieren (to erode), moussieren (to fizz)

Table 7: Words of foreign origin better represented by the multilingual model.
We then inspected Exp. 2 in a similar way but found it more difficult to identify salient improved classes, since the improvement is mostly in terms of coverage. The dataset for Exp. 2 includes proper nouns, such as Berlin/Berlin-Kreuzberg and Benedetto/Benedikt, which are unlikely to be covered by a translation lexicon. It also contains items that encode world knowledge, such as Ratzinger/Papst (pope), which have a better chance of being covered by target language corpora. This pair is not covered by the DM.XL models, but the monolingual models (DM.DE, BOW, and BOW PCA) assign it the similarities .23, .66, and .89, respectively.

7 Experimental Evaluation on Croatian

Our third experiment considers a language that is more different from English than German, namely Croatian, a Slavic language. Available resources for Croatian are more limited than for English or German. Since syntactic analysis used to be a bottleneck, the first syntax-based DSM for Croatian, DM.HR, became available only last year (Šnajder et al., 2013). As for evaluation datasets, there are no human similarity judgments, but there is a synonym choice dataset (CroSyn – see Karan et al. (2012) for details). Thus, our Croatian evaluation is a synonym choice task parallel to Exp. 1 for German. We take DM.HR as the monolingual model, which was built from a dependency-parsed Croatian web corpus of 1.2B tokens. We construct a crosslingual model by starting from Baroni and Lenci's English TypeDM and using Taktika Nova's freely available English–Croatian dictionary [13] with 105K translation pairs. After removing entries with more than one word per language, we were left with 95K pairs, considerably fewer than for English–German. We apply the methods from Section 3 for edge translation and filtering. The resulting filtered Croatian DM.XL has 47K nodes and 315M edges, about one order of magnitude smaller than the German crosslingual resource. Finally, we combined DM.HR with the crosslingual DM (as in Section 4) to obtain multilingual Croatian DMs.

[13] Available from http://goo.gl/xHUjJ

                                        All     Covered
    Model                               Acc     Acc     Cov
    Word-based DSMs
 1  BOW-LSA [SPA13]                     .66     .66     1
    Syntax-based DSMs
 2  DM.HR (AllL)                        .65     .65     .99
 3  DM.XL EN→HR naive (SPrfL)           .43     .50     .71
 4  DM.XL EN→HR filter (SPrfL)          .58     .71     .71
 5  DM.MULTI Backoff (4,2)              .69     .69     .99
 6  DM.MULTI MaxSim (4,2)               .70     .70     .99

Table 8: Experiment 3: Accuracy and coverage for synonym choice on the CroSyn dataset. SPA13: Šnajder et al. (2013). In boldface: best results.

Table 8 shows the results, which correspond closely to those for Exp. 1. A dimensionality-reduced BOW space performs competitively with the monolingual DM.HR (Šnajder et al., 2013). The crosslingual DM is again able to improve accuracy over DM.HR (by 6%) but drops in coverage. Again, the multilingual models perform best: DM.MULTI MaxSim loses only 1% accuracy compared to the crosslingual model but achieves almost perfect coverage. The differences to DM.HR and DM.XL are both significant (p < 0.01). [14]

[14] We cannot provide significances for the BOW results because we again do not have per-item predictions.

The two major differences to the German synonym choice task (Exp. 1) are that (a) filtering plays an essential role for Croatian (increase in accuracy by 15%) and (b) DM.MULTI clearly outperforms the BOW model. We attribute the difference to the semi-automatic construction of the Croatian dataset from machine-readable dictionaries. Overall, the results for Croatian are encouraging. They demonstrate that languages where parsing technology is still developing can in particular profit from cross- and multilingual methods. This is true even for relatively small translation dictionaries, matching previous results from the literature (Peirsman and Padó, 2011).

8 Related Work

Given the resource gradient between English and other languages, the crosslingual induction of linguistic information has been an active topic of research. Many studies use parallel corpora. Annotation projection (Yarowsky and Ngai, 2001) transfers source language annotation directly onto target language sentences.
It has been applied to various linguistic levels such as POS tagging and syntax (Hi and Hwa, 2005; Hwa et al., 2005, among others). Other studies use parallel data as indirect supervision for monolingual tasks. Diab and Resnik (2002) use translations as word sense labels; van der Plas and Tiedemann (2006) exploit multilingual distributional semantics for robust synonymy extraction. Naseem et al. (2009) learn unsupervised POS taggers on multilingual parallel data, exploiting the differences between languages as soft constraints. Titov and Klementiev (2012) and Kozhevnikov and Titov (2013) induce shallow semantic parsers from parallel data. Klementiev et al. (2012) approach document classification with multi-task learning, inducing a multilingual DSM.

Since parallel corpora are not available in large quantities, other studies use comparable corpora, which can provide additional features from the other language. For example, Merlo et al. (2002) improve English verb classification with new features derived from Chinese translations. De Smet and Moens (2009) learn multilingual topic models for news aggregation. Peirsman and Padó (2011) use comparable corpora to transfer selectional preferences and sentiment labels. Wikipedia can be seen as a particularly rich type of comparable corpus with additional link structure. It has been used to compute semantic relatedness (Navigli and Ponzetto, 2012; Navigli and Ponzetto, 2010) and to compute conceptual document representations for crosslingual information retrieval (Potthast et al., 2008; Cimiano et al., 2009).

Our work does not require parallel or comparable corpora. We note, however, that translation lexicons such as the ones we use can be extracted from comparable corpora (Rapp, 1999; Vulić and Moens, 2012, and many others), though few papers are concerned with translation at the level of semantic relations, as we are. Similar in this respect is Fung and Chen (2004), who translate FrameNet (Baker et al., 1998) into Chinese with a bilingual ontology. They use a relation-based pruning scheme that is somewhat comparable to our backtranslation filtering.

To our knowledge, the most similar work to ours is Mohammad et al. (2007), which also considers DSMs, albeit a different variety, namely concept-based DSMs where targets are characterized in terms of their distribution over categories of Roget's thesaurus. Like our work, their study creates crosslingual DSMs for German using a translation lexicon. It follows a different strategy, however: it collects co-occurrence counts from a German corpus and translates the context dimensions into the English Roget categories. Therefore, it crucially requires a large target language corpus, which our crosslingual methods (Section 3) avoid. Its use of a target language corpus resembles our multilingual methods (Section 4), but unlike them, it does not combine corpus evidence from both languages. In sum, we believe that our methods are more adaptable to different scenarios, being able to use whatever data is available in either language.

9 Conclusion

The appeal of syntax-based distributional spaces lies in their promise of flexible and linguistically more appropriate models for many phenomena in lexical semantics. A major obstacle to their adoption for novel languages has been the significantly higher requirements on resources compared to word spaces.

In this paper, we have demonstrated that this obstacle can be overcome by transferring English distributional information along the resource gradient into target languages such as German and Croatian. The simplest models, which are based solely on the English Distributional Memory (DM) resource and a translation lexicon, already beat monolingual DMs in quality. These crosslingual models suffer from lower coverage but can be combined with the monolingual DM, yielding a multilingual DM that maintains competitive accuracy while achieving significantly higher coverage than either individual model. The outcomes of our experiments are mostly stable across the languages and tasks presented, which leads us to assume the methodology successfully generalizes. [15]

[15] The German DMs are publicly available from http://goo.gl/uxuffp.

Directions for future research include (a) more stringent filtering of spurious edges in DM.XL models, to make the graph topology more similar to monolingual models and enable graph merging to obtain unified multilingual models; (b) the extension of our approach to more than two languages; (c) dimensionality reduction for tensor-based DSMs, both for efficiency reasons and to improve performance.
Acknowledgments

We gratefully acknowledge partial funding of our research by the DFG (SFB 732, Project D6) and the EC (Project EXCITEMENT, FP7 ICT-287923).

References

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of the joint Annual Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics, pages 86–90, Montréal, QC.

Marco Baroni and Alessandro Lenci. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):1–49.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2008. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation, 43(3):209–226.

Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 89–97, Beijing, China.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning, pages 149–164, New York, NY.

Stanley F. Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Center for Research in Computing Technology, Harvard University.

Philipp Cimiano, Antje Schultz, Sergej Sizov, Philipp Sorg, and Steffen Staab. 2009. Explicit vs. latent concept models for cross-language information retrieval. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1513–1518, Pasadena, CA.

Wim De Smet and Marie-Francine Moens. 2009. Cross-language linking of news stories on the web using interlingual topic modelling. In Proceedings of the CIKM Workshop on Social Web Search and Mining, pages 57–64, Hong Kong.

Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 255–262, Philadelphia, PA.

Bradley Efron and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman and Hall, New York, NY.

Katrin Erk, Sebastian Padó, and Ulrike Padó. 2010. A Flexible, Corpus-Driven Model of Regular and Inverse Selectional Preferences. Computational Linguistics, 36(4):723–763.

Pascale Fung and Benfeng Chen. 2004. BiFrameNet: Bilingual Frame Semantics Resources Construction by Crosslingual Induction. In Proceedings of the 20th International Conference on Computational Linguistics, pages 931–935, Geneva, Switzerland.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Gregory Grefenstette. 1994. Explorations in Automatic Thesaurus Discovery. Kluwer Academic Publishers, Boston/Norwell, MA.

Zelig S. Harris. 1954. Distributional structure. Word, 10(23):146–162.

Chenhai Hi and Rebecca Hwa. 2005. A Backoff Model for Bootstrapping Resources for Non-English Languages. In Proceedings of the joint Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 851–858, Vancouver, BC.

Rebecca Hwa, Philipp Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping Parsers via Syntactic Projection across Parallel Texts. Journal of Natural Language Engineering, 11(3):311–325.

Eric Joanis, Suzanne Stevenson, and David James. 2006. A general feature space for automatic verb classification. Natural Language Engineering, 14(03):337–367.

Mladen Karan, Jan Šnajder, and Bojana Dalbelo Bašić. 2012. Distributional semantics approach to detecting synonyms in Croatian language. In Proceedings of the Eighth Language Technologies Conference, Ljubljana, Slovenia.

Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the International Conference on Computational Linguistics, pages 1459–1474, Mumbai, India.

Mikhail Kozhevnikov and Ivan Titov. 2013. Crosslingual transfer of semantic role models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1190–1200, Sofia, Bulgaria.

Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of the joint Annual Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics, pages 768–774, Montreal, QC.
Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 523–530, Vancouver, BC.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 92–97, Sofia, Bulgaria.

Paola Merlo, Suzanne Stevenson, Vivian Tsang, and Gianluca Allaria. 2002. A multilingual paradigm for automatic verb classification. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 207–214, Philadelphia, PA.

Rada Mihalcea and Dragomir Radev. 2011. Graph-based Natural Language Processing and Information Retrieval. Cambridge University Press, Cambridge, UK.

George A. Miller and Walter G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28.

Saif Mohammad, Iryna Gurevych, Graeme Hirst, and Torsten Zesch. 2007. Crosslingual distributional profiles of concepts for measuring semantic distance. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 571–580, Prague, Czech Republic.

Tahira Naseem, Benjamin Snyder, Jacob Eisenstein, and Regina Barzilay. 2009. Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches. Journal of Artificial Intelligence Research, 36:1–45.

Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225, Uppsala, Sweden.

Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelRelate! A joint multilingual approach to computing semantic relatedness. In Proceedings of the 26th Conference on Artificial Intelligence, pages 108–114, Toronto, ON.

Joakim Nivre. 2006. Inductive Dependency Parsing. Springer, Dordrecht, Netherlands.

Sebastian Padó and Jason Utt. 2012. A distributional memory for German. In Proceedings of the KONVENS 2012 workshop on recent developments and applications of lexical-semantic resources, pages 462–470, Vienna, Austria.

Yves Peirsman and Sebastian Padó. 2011. Semantic relations in bilingual lexicons. ACM Transactions in Speech and Language Processing, 8(2):3:1–3:21.

Lonneke van der Plas and Jörg Tiedemann. 2006. Finding synonyms using automatic word alignment and measures of distributional similarity. In Proceedings of the joint Annual Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics, pages 866–873, Sydney, Australia.

Martin Potthast, Benno Stein, and Maik Anderka. 2008. A Wikipedia-based multilingual retrieval model. In Proceedings of the European Conference on Information Retrieval, pages 522–530, Glasgow, Scotland.

Reinhard Rapp. 1999. Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pages 519–526, College Park, MD.

Sabine Schulte im Walde. 2006. Experiments on the Automatic Induction of German Semantic Verb Classes. Computational Linguistics, 32(2):159–194.

Hinrich Schütze. 1992. Dimensions of meaning. In Proceedings of Supercomputing '92, pages 787–796, Minneapolis, MN.

Stephen Soderland, Oren Etzioni, Daniel S. Weld, Michael Skinner, Jeff Bilmes, et al. 2009. Compiling a Massive, Multilingual Dictionary via Probabilistic Inference. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing, pages 262–270, Suntec, Singapore.

Harold Somers. 2005. Round-trip translation: What is it good for? In Proceedings of the Australasian Language Technology Workshop, pages 127–133, Sydney, Australia.

Jan Šnajder, Sebastian Padó, and Željko Agić. 2013. Building and evaluating a distributional memory for Croatian. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 784–789, Sofia, Bulgaria.

Ivan Titov and Alexandre Klementiev. 2012. Crosslingual induction of semantic roles. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 647–656, Jeju Island, South Korea.

Peter Turney. 2006. Similarity of semantic relations. Computational Linguistics, 32(3):379–416.

Ivan Vulić and Marie-Francine Moens. 2012. Detecting highly confident word translations from comparable corpora without any prior knowledge. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 449–459, Avignon, France.
DeWitt Wallace and Lila Acheson Wallace. 2005. Reader's Digest, das Beste für Deutschland. Verlag Das Beste, Stuttgart, Germany.

Fridolin Wild, Christina Stahl, Gerald Stermsek, and Gustaf Neumann. 2008. Parameters driving effectiveness of automated essay scoring with LSA. In Proceedings of the 9th Computer-Aided Assessment Conference, pages 485–494, Loughborough, UK.

David Yarowsky and Grace Ngai. 2001. Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection across Aligned Corpora. In Proceedings of the 2nd Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pages 200–207, Pittsburgh, PA.

Torsten Zesch, Iryna Gurevych, and Max Mühlhäuser. 2007. Comparing Wikipedia and German Wordnet by evaluating semantic relatedness on multiple datasets. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, pages 205–208, Rochester, NY.