Transactions of the Association for Computational Linguistics, 2 (2014) 245–258. Action Editor: Patrick Pantel.
Submitted 11/2013; Revised 3/2014; Published 10/2014. © 2014 Association for Computational Linguistics.
Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models

Jason Utt and Sebastian Padó
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart
[uttjn|pado]@ims.uni-stuttgart.de

Abstract

Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages.

In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a syntax-based model for a new language requiring only an English resource and a translation lexicon; and (b) multilingual approaches that combine crosslingual with monolingual information, subject to availability. We evaluate on two lexical semantic benchmarks in German and Croatian. We find that the models exhibit complementary profiles: crosslingual models yield higher accuracies while monolingual models provide better coverage. In addition, we show that simple multilingual models can successfully combine their strengths.

1 Introduction

Building on the Distributional Hypothesis (Harris, 1954; Miller and Charles, 1991), which states that words occurring in similar contexts are similar in meaning, distributional semantic models (DSMs) represent a word's meaning via its occurrence in context in large corpora. Vector spaces, the most widely used type of DSMs, represent words as vectors in a high-dimensional space whose dimensions correspond to features of the words' contexts. Word spaces represent the simplest case of DSMs, in which the dimensions are simply the context words (Schütze, 1992). A notable subclass of DSMs are syntax-based models (Lin, 1998; Baroni and Lenci, 2010) which use (lexicalized) syntactic relations as dimensions. They are able to model more fine-grained distinctions than word spaces and have been found to be useful for tasks such as selectional preference learning (Erk et al., 2010), verb class induction (Schulte im Walde, 2006), analogical reasoning (Turney, 2006), and alternation discovery (Joanis et al., 2006).

Despite their flexibility and usefulness, syntax-based DSMs are used less often than word-based spaces. An important reason is that their construction requires accurate parsers, which are unavailable for many languages. In addition, syntax-based DSMs are inherently more sparse than word spaces, which calls for a large corpus of well parsable data. It is thus not surprising that besides English (Baroni and Lenci, 2010), only few other languages possess large-scale syntax-based DSMs (Padó and Utt, 2012; Šnajder et al., 2013).

This paper develops methods that take advantage of the resource gradient between English and other languages, exploiting the higher-quality resources of the former to induce resources for target languages among the latter, by translating the word-link-word co-occurrences that underlie syntax-based DSMs. This directly provides a crosslingual method to construct syntax-based DSMs for target languages without any target language data, requiring only an English syntax-based DSM and a translation lexicon. Such lexicons are available for many language pairs, and we outline a method to reduce the ambiguity inherent in such dictionaries. We also describe a set of multilingual methods that can combine corpus evidence from English and the target language to further improve the performance of the obtained DSM.

We consider two target languages, German and Croatian, as examples of one close and one more remote target language.
[Figure 1: Distributional Memory sample around "push", represented as a graph (a) and as two matrices: (b) the W×LW matrix, in which push has the dimensions ⟨comp⁻¹, intend⟩ = 2, ⟨comp⁻¹, ability⟩ = 5, ⟨subj⁻¹, initiative⟩ = 3, ⟨obj⁻¹, initiative⟩ = 10, and ⟨in, direction⟩ = 1; (c) the WL×W matrix, in which, e.g., the pair ⟨push, comp⁻¹⟩ has the value 5 for ability and 2 for intend.]

For evaluation, we use two tasks, namely synonym choice and semantic similarity prediction. For both languages and tasks, monolingually constructed DSMs can provide strong baselines. We find similar patterns across tasks and target languages: the crosslingually constructed DSM can be parametrized so that it becomes superior to an existing monolingual DSM in quality, even if inferior in coverage. A simple multilingual backoff can combine the crosslingual model's high quality with the monolingual model's high coverage.

Structure of the paper. We begin by sketching the structure of Distributional Memory, a general framework for syntax-based semantic spaces, in Section 2. Our main contributions follow in Sections 3 and 4, namely, a family of models for the crosslingual and multilingual construction of DSMs. The second part of the paper is concerned with evaluation. Section 5 describes our experimental setup, after which we discuss our results for German (Section 6) and Croatian (Section 7). The paper concludes with related work (Section 8) and a general discussion (Section 9).

2 Distributional Memory: A General Model of Syntax-based Vector Spaces

2.1 Motivation and Definition

Simple syntax-based DSMs represent target words in terms of dimensions labeled with word-relation pairs (Lin, 1998; Grefenstette, 1994). Unfortunately, this representation only supports tasks that compare pairs of words with regard to their meaning (e.g., in synonymy detection or selectional preferences), but not tasks such as analogical reasoning, where sets of word pairs are compared (Turney, 2006).

To unify syntax-based DSMs, Baroni and Lenci (2010) proposed the Distributional Memory (DM) model, which captures distributional information at the more general level of word-link-word triples, stored as a third-order co-occurrence tensor. The DM tensor can be seen as a set of ordered word-link-word tuples such as ⟨pencil, obj, use⟩ associated with a scoring function σ: W × L × W → R+ that scores, for example, ⟨pencil, obj, use⟩ more highly than ⟨elephant, obj, use⟩. The DM tensor can be visualized as a directed graph whose nodes are labeled with lemmas and whose edges are labeled with links and scores. As an example, Figure (1a) shows five links for the verb push in the English DM, including subject, object, prepositional adjunct, and governing verbs.

DSMs for individual tasks can be obtained by "matricizing" the tensor into two-dimensional matrices corresponding to standard vector spaces. The matrix in Figure (1b) shows the word by link-word space (W×LW). It represents words w in terms of pairs ⟨l, w′⟩ of a link and a context word. This space models similarity among words, e.g. for thesaurus construction (Lin, 1998). The example matrix in Figure (1c) represents a word-link by word space (WL×W). It characterizes pairs ⟨w, l⟩ through context words w′, which can be understood as selectional preferences.

DM does not assume a specific source for building the graph. However, all existing DM resources were extracted from large dependency-parsed corpora such as UKWAC (Baroni et al., 2008). In the simplest case, the set of labels L is (a subset of) the dependency relations in the corpus, and the scoring function σ is a measure of association between the governor and the dependent (see Baroni and Lenci (2010) for details). However, the most robust DMs (including Baroni and Lenci's LexDM and TypeDM) use both syntactic and lexicalized links, i.e. links which contain words themselves, as well as surface form-based links, e.g., observed subject-verb-object triples in the corpus lead to a ⟨subject, verb, object⟩ edge in the DM graph.
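To make the triple representation and its matricization concrete, here is a minimal sketch in Python (our illustration; the triples and scores follow Figure 1, but all function and variable names are ours):

```python
from collections import defaultdict

# Toy DM tensor: scored word-link-word triples around "push" (cf. Figure 1a).
triples = [
    ("initiative", "subj", "push", 3.0),
    ("initiative", "obj", "push", 10.0),
    ("push", "in", "direction", 1.0),
    ("intend", "comp", "push", 2.0),
    ("ability", "comp", "push", 5.0),
]

def matricize_w_lw(triples):
    """W x LW space (Figure 1b): each word becomes a vector over
    (link, context word) pairs; inverse links (l-1) encode incoming edges."""
    space = defaultdict(dict)
    for w1, link, w2, score in triples:
        space[w1][(link, w2)] = score            # outgoing edge
        space[w2][(link + "-1", w1)] = score     # incoming edge, inverse link
    return space

space = matricize_w_lw(triples)
print(sorted(space["push"].items()))
# [(('comp-1', 'ability'), 5.0), (('comp-1', 'intend'), 2.0),
#  (('in', 'direction'), 1.0), (('obj-1', 'initiative'), 10.0),
#  (('subj-1', 'initiative'), 3.0)]
```

The WL×W space of Figure (1c) is obtained analogously by keying on the (word, link) pair instead, so that the context word becomes the dimension.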
2.2 DMs for Other Languages

Given the appealing properties of Distributional Memory, it may be surprising that not many comparable resources exist for other languages. To our knowledge, comparable resources exist only for German (Padó and Utt, 2012) and Croatian (Šnajder et al., 2013). Both studies replicate the monolingual DM construction process outlined by Baroni and Lenci for the respective languages. For German, the process is relatively unproblematic, since German is relatively well-equipped in terms of corpora and parsers. In contrast, Šnajder et al. (2013) faced serious resource scarcity while building a Croatian DM and had to go to considerable lengths to clean a large web corpus and to optimize the linguistic processing tools. The resulting DM outperforms a monolingual context word model for nouns and verbs, but performs worse than the word-based model for (generally rarer) adjectives. As a direct consequence, high-quality syntax-based DSMs can only be constructed for a limited set of languages.

3 Crosslingual Construction of DMs

3.1 Motivation

As outlined in the previous section, many languages face a bottleneck regarding both large, clean corpora and processing pipelines that produce high-accuracy dependency parses. To address this problem, we propose to induce Distributional Memories for such languages crosslingually by translating a source language DM into the target language. By adopting English as the source language we can take advantage of the resource gradient, that is, the higher maturity of English NLP techniques, such as parsers, compared to most other languages. For many languages, treebanks have become available only within the last ten years (Buchholz and Marsi, 2006), if at all, while English has been at the forefront of NLP development for several decades, and a number of highly accurate dependency parsers exist (McDonald et al., 2005; Nivre, 2006). At the same time, English arguably possesses the widest range of large and well-cleaned corpora of any language.

To make our approaches applicable to as many target languages as possible, we assume in this section that very few resources for the target language are available. The crosslingual methods we develop here work without any target language corpora, either monolingual or bilingual.

[Figure 2: Sample of the English–German dict.cc dictionary (wood, forest, timber, copse, grove vs. Holz, Wald, Gehölz, Hain); translations shown as dashed lines.]

The only knowledge we use is a simple translation lexicon, that is, a list of translation pairs without translation probabilities, as shown in Figure 2. Translation lexicons of this type are arguably the most common bilingual resource, and accurate ones exist for virtually any language pair (Soderland et al., 2009), even for languages with few available corpora. Furthermore, such translation lexicons are often crowdsourced and are available for download. For example, the website dict.cc provides numerous such lexicons for German and English.

This approach promises in particular to yield models with a quality-coverage profile complementary to that of monolingual models (Mohammad et al., 2007; Peirsman and Padó, 2011): Crosslingual DMs are extracted from source language corpora which we assume to be parsed more accurately than target language corpora. In addition, the translation process can be designed to act as a further filtering step (cf. Section 3.4 below), thus optimizing crosslingual models for higher quality at the expense of coverage. In contrast, monolingual models – in particular for under-resourced languages – often hit a quality ceiling, but can generally guarantee high coverage.

3.2 Translating DMs with Translation Lexicons

We conceptualize DM as a directed graph (see Figure 1), which allows us to phrase translation in graph terms (Mihalcea and Radev, 2011). A DM is a triple (V, E, σ) where V is a set of vertices (i.e., the vocabulary), E a set of typed edges between words, represented as word-link-word triples (cf. Section 2), and σ an edge-weighting function. We will use S and T to refer to the source and target language vocabularies, respectively, and (V_S, E_S, σ_S) and (V_T, E_T, σ_T) to denote source and target language DMs.
We can now ask how the shape of the graph changes under translation. In an ideal world, a translation lexicon would be a bijective function between the source and target language vocabularies: Tr: S → T. Then, the transformation would merely constitute a relabeling. We would then construct the German DM graph by exchanging all English node labels with German node labels, i.e., V_T = T, and creating a German edge for each English edge. [1]

[1] We build on the assumption that dependency relations are language-independent which, while incorrect, represents a reasonable simplification (McDonald et al., 2013).

3.3 Ambiguity in Unfiltered Translation

The dictionary fragment in Figure 2 shows that translation is not bijective but a many-to-many relation. In fact, taking the English–German dict.cc lexicon as an example, there is an average of 2.3 German translations for each English lemma, and an average of 1.9 English translations for each German lemma. We model this situation using two functions: Tr: S → 2^T translates source words into sets of target words, and Tr⁻¹: T → 2^S translates target words back into the source language.

The naive way to translate nodes using Tr is to use all translations for a given word. Thus, for each edge in the source DM between lemmas s1 and s2, we obtain |Tr(s1)| · |Tr(s2)| edges in the target language:

    E_T = { (t1, l, t2) | ∃(s1, l, s2) ∈ E_S : t1 ∈ Tr(s1) ∧ t2 ∈ Tr(s2) }    (1)

The score σ_T of a target edge is defined as the mean of the scores of all source edges that map to it:

    σ_T(t1, l, t2) = ( Σ_{s1 ∈ Tr⁻¹(t1), s2 ∈ Tr⁻¹(t2)} σ_S(s1, l, s2) ) / ( |Tr⁻¹(t1)| · |Tr⁻¹(t2)| )    (2)

We take the mean as it is less sensitive to outliers than maximum or minimum. In addition, unlike taking the sum, it is automatically normalized regarding the number of translations, thus penalizing words with many unrelated senses.

A look at Figures 1 and 3, however, indicates that this procedure overgenerates. This is problematic on two levels. First, the target language graph will contain a very large number of edges (e.g., using dict.cc, the edge ⟨text, sbj_tr, use⟩ has 42 German translations).

[Figure 3: Unfiltered edge translation (EN–DE). The modifiers precut and great of wood both become linked to both Holz and Wald.]

Second, the correctness of the target DM suffers. For some cases, such as copse – Gehölz, Hain, the various translations are synonymous, and Eq. (1) is appropriate. In other cases, multiple translations indicate lexical ambiguity of the source term. For example, the two translations of wood correspond to its senses as forest (Wald) and timber (Holz), respectively. In such cases, Eq. (1) confuses the senses, as the example in Figure 3 illustrates. The left-hand side shows DM edges between wood and two adjectival modifiers, namely precut (which is more plausible for the timber sense) and great (which is more plausible for the forest sense). The right-hand side shows (part of) the German translations according to Eq. (1): both Holz (timber) and Wald (forest) are linked to both adjectives, leading to spurious edges in the German DM.
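For concreteness, here is a minimal sketch of the naive translation in Eqs. (1) and (2), applied to the Figure 3 fragment (our illustration, not the authors' implementation; the scores are invented for the example):

```python
from itertools import product

def translate_naive(dm_source, tr):
    """Eq. (1): every pair of endpoint translations of a source edge
    yields a target edge; links are carried over unchanged."""
    return {(t1, link, t2)
            for (s1, link, s2) in dm_source
            for t1, t2 in product(tr.get(s1, ()), tr.get(s2, ()))}

def score_target(edge, dm_source, tr_inv):
    """Eq. (2): mean score of all source edges mapping onto this edge;
    source triples that do not occur contribute a score of 0."""
    t1, link, t2 = edge
    back1, back2 = tr_inv[t1], tr_inv[t2]
    total = sum(dm_source.get((s1, link, s2), 0.0)
                for s1 in back1 for s2 in back2)
    return total / (len(back1) * len(back2))

# The fragment from Figure 3, with invented scores.
dm_source = {("precut", "mod", "wood"): 5.0,
             ("precut", "mod", "timber"): 9.0,
             ("great", "mod", "wood"): 4.0}
tr = {"wood": {"Holz", "Wald"}, "timber": {"Holz"},
      "precut": {"zugeschnitten"}, "great": {"groß"}}
tr_inv = {"Holz": {"wood", "timber"}, "Wald": {"wood"},
          "zugeschnitten": {"precut"}, "groß": {"great"}}

for edge in sorted(translate_naive(dm_source, tr)):
    print(edge, round(score_target(edge, dm_source, tr_inv), 2))
# ('groß', 'mod', 'Holz') 2.0          <- spurious "great timber" edge
# ('groß', 'mod', 'Wald') 4.0
# ('zugeschnitten', 'mod', 'Holz') 7.0
# ('zugeschnitten', 'mod', 'Wald') 5.0
```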
3.4 Filtering by Backtranslation

Since the nature of the translation is not indicated in the translation lexicon, we exploit typical redundancies in the source DM, which often contains "quasi-synonymous" edges that express the same relation with different words, e.g., ⟨book, obj, read⟩ and ⟨novel, obj, read⟩. This allows us to score target edge candidates by how well we can "backtranslate" (Somers, 2005) them into the source language.

This idea is illustrated in Figure 4. We still assume, as above, that wood has two translations, but that precut has only one. For the English edge ⟨precut, mod, wood⟩, we obtain two German candidate edges, namely ⟨zugeschnitten, mod, Wald⟩ and ⟨zugeschnitten, mod, Holz⟩. When backtranslating these candidates, the first one, ⟨zugeschnitten, mod, Wald⟩, maps only onto the original edge. The second one, ⟨zugeschnitten, mod, Holz⟩, is backtranslated into a different source edge, ⟨precut, mod, timber⟩, which makes it more probable.

[Figure 4: Backtranslation filtering. Original and winning edges shown in boldface.]

We operationalize this by adding another condition to Eq. (1), namely that target edges must be among the highest-scoring edges for some source edge. Recall that our target scores σ_T are already defined in terms of source edge scores, so no redefinition of the scoring function is necessary:

    E_T = { (t1, l, t2) | ∃(s1, l, s2) ∈ E_S : t1 ∈ Tr(s1) ∧ t2 ∈ Tr(s2) ∧ σ_T(t1, l, t2) = max_{t ∈ Tr(s1), t′ ∈ Tr(s2)} σ_T(t, l, t′) }    (3)

where σ_T(t, l, t′) is the score as defined in Eq. (2). This filtering scheme is fairly liberal: we do not limit the number of target edges that a source edge can translate to. A stricter variant could, e.g., abstain from translating a source edge if no unique best edge exists. We leave such variants to future work.
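Eq. (3) can be read as a post-hoc filter over the naive candidate set: a candidate survives only if it is among the top-scoring candidates of some source edge. A minimal sketch, reusing the toy data and `score_target` from the previous example (again our illustration, not the authors' implementation):

```python
from itertools import product

def filter_by_backtranslation(dm_source, tr, score):
    """Eq. (3): keep a target edge candidate only if it achieves the
    maximal sigma_T among all candidates generated from some source edge.
    `score` maps a target edge to its Eq. (2) value."""
    kept = set()
    for (s1, link, s2) in dm_source:
        candidates = [(t1, link, t2)
                      for t1, t2 in product(tr.get(s1, ()), tr.get(s2, ()))]
        if not candidates:
            continue
        best = max(score(e) for e in candidates)
        # Liberal filtering: ties are all kept, and one source edge
        # may still license several target edges.
        kept.update(e for e in candidates if score(e) == best)
    return kept

# With the Figure 3/4 fragment from the previous sketch:
#   score = lambda e: score_target(e, dm_source, tr_inv)
#   filter_by_backtranslation(dm_source, tr, score)
#   -> {('zugeschnitten', 'mod', 'Holz'), ('groß', 'mod', 'Wald')}
# The backtranslatable edge to Holz wins (cf. Figure 4), and the
# spurious ('groß', 'mod', 'Holz') edge from Figure 3 is filtered out.
```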
3.5 Defining Similarity

Recall from Figure 1 that DM contains information about both "incoming" as well as "outgoing" links. Monolingually constructed DMs by default use all of these relations since the information is reliable. The situation is not as clear in a crosslingual setting. Our intuition is that selectional preferences are most informative and most likely to survive translation. For example, for verbs we expect knowledge about their arguments to be more informative than knowledge about their governors. Conversely, for nouns we want to use knowledge about the verbs that they occur with rather than their arguments or modifiers. We implement this idea by computing semantic similarity between word vectors either on complete vectors (condition "AllL") or on a filtered version that uses only inverse links for verbs and only regular links for nouns and adjectives (condition "SPrfL").

                                    Covered items
    Model                           Corr.   Cov.
    DM.DE (AllL)                    .43     .60
    DM.DE (SPrfL)                   .43     .60
    DM.XL EN→DE filter (AllL)       .42     .61
    DM.XL EN→DE filter (SPrfL)      .49     .49

Table 1: Coverage and correlation (Pearson's r) for predicting word similarity, contrasting link types (all links vs. selectional preference links).

Table 1 shows the results of preliminary experiments on a semantic similarity dataset (details in Section 5). They bear out our hypothesis: in the monolingual setting, there is almost no difference. Thus, in line with previous work, we adopt (AllL) for DM.DE. In contrast, we see a clear quality-coverage trade-off in the crosslingual scenario, with a higher quality for (SPrfL). Since this corresponds to our focus on higher precision for crosslingual models, we will adopt (SPrfL) for all crosslingual DMs.

4 Multilingual Construction of DMs

The crosslingual models described in the previous section do not use any corpus information from the target language: As previously discussed, our rationale is to make the methods as widely applicable as possible. However, this assumption may be too cautious as more corpora and parsers continually become available. In order to take advantage of such developments, this section discusses two simple methods for combining monolingually and crosslingually constructed DMs, thereby combining corpus evidence from both the source and the target language. We concentrate on methods that can be applied to DMs directly, e.g. by researchers who do not have access to the source corpora. Moreover, we combine not the graphs, but the resulting semantic similarities. [2]

[2] We conducted experiments with graph merging but found that the different topologies of the monolingual and crosslingual DMs make it difficult to merge the graphs in a manner that combines the information from both graphs.

We take our inspiration from work on combining and smoothing n-gram language models, where the usual operations are interpolation and back-off (Chen and Goodman, 1998). Note that in our case, the two models to be combined are assumed to have complementary properties, with the monolingual model having higher coverage and the crosslingual model higher quality (cf. Section 3.1). For this reason, we assume that a linear interpolation of the models' similarities for each word pair will not perform well. Our first strategy is a simple backoff combination (DM.MULTI Backoff) that starts with the crosslingual model and falls back to the monolingual model in the case of zero similarities. Our second strategy follows the intuition that both noise and sparse data tend to result in underestimated similarities. This leads us to the DM.MULTI MaxSim model: It takes the predictions from the monolingual and the crosslingual model and chooses the higher one.

Both DM.MULTI variants combine predictions from two models and implicitly assume that the predictions are drawn from the same score distribution. Since this is not guaranteed, we standardize all scores before combination, that is, we linearly transform the values so that the resulting distribution has a mean of 0 and a standard deviation of 1.
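A compact sketch of the two combination strategies, including the standardization step (our illustration; it assumes that both models provide raw similarity scores for the same word pairs, with 0 marking pairs for which a model has no evidence):

```python
import statistics

def standardize(scores):
    """Linearly transform scores to mean 0 and standard deviation 1,
    so that predictions from both models share one scale."""
    mean = statistics.mean(scores.values())
    std = statistics.pstdev(scores.values()) or 1.0  # guard: constant scores
    return {pair: (s - mean) / std for pair, s in scores.items()}

def combine(mono, cross, strategy):
    """DM.MULTI Backoff: use the crosslingual score unless it is a zero
    similarity, then fall back to the monolingual one.
    DM.MULTI MaxSim: take the higher of the two standardized scores."""
    mono_z, cross_z = standardize(mono), standardize(cross)
    if strategy == "backoff":
        # The zero test uses the raw crosslingual score, since
        # standardization shifts the zero point.
        return {pair: mono_z[pair] if cross[pair] == 0.0 else cross_z[pair]
                for pair in mono}
    return {pair: max(mono_z[pair], cross_z[pair]) for pair in mono}
```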
5 Experimental Setup

To show the benefits of our crosslingual methods, we perform experiments for the language pairs English–German and English–Croatian. These languages exemplify variability on the resource gradient: The resource situation is best for English, still relatively good for German, and most difficult for Croatian. This section outlines the experiments for German; Section 7 focuses on Croatian. We evaluate our models on two standard tasks from lexical semantics: synonym choice and the prediction of human relatedness judgments. Even though these two tasks are in-vitro, they are widely used for model selection in distributional space models, and we can compare the results of our models against previous work. The two tasks test how well the models can account for two different aspects of lexical semantics, namely a specific lexical relation (synonymy) and general semantic relatedness.

5.1 Tasks and Datasets

Our first task is synonym detection, where models have to identify the true synonym for a target word from four candidates. We use the German Reader's Digest Word Power (RDWP) dataset (Wallace and Wallace, 2005) which contains 984 items. [3] RDWP is similar to the English TOEFL data (Landauer and Dumais, 1997), but can contain short phrases among the candidates (cf. example in Table 2a).

(a) Task 1: synonym target with four candidates

    Demagoge (demagogue)
    1  Miesmacher (grinch)                          ×
    2  guter Redner (able speaker)                  ×
    3  skrupelloser Hetzer (unscrupulous agitator)  ✓
    4  Meinungsforscher (pollster)                  ×

(b) Task 2: semantic similarity (range: 0–4)

    Word Pair                                                Similarity
    Absage – ablehnen (rejection – refuse)                   3.5
    Absage – Stellenanzeige (rejection – job advertisement)  1.875
    Affe – Gepäckkontrolle (monkey – luggage inspection)     0.125

Table 2: Example items from evaluation tasks.

Our second evaluation tests how well the models predict similarities for German word pairs including closely related, somewhat related, and unrelated word pairs (cf. Table 2b). We use the Gur350 dataset [4] which contains 350 word pairs scored for relatedness by native German taggers on a five-point Likert scale between 0 (unrelated) and 4 (synonymous). Both datasets contain nouns, verbs and adjectives.

[3] Available from: http://goo.gl/PN42E
[4] Available from: http://goo.gl/3Dflf1

5.2 Procedure

Starting from a DM model, we matricize it into a word by link-word space (W×LW) and compute similarities between words with cosine similarity. In Exp. 1, we compute the semantic similarities of the target with each candidate and predict the candidate with the highest similarity to the target. For phrasal candidates, we compute the similarity between the target and all constituent words and take the maximum. We follow Mohammad et al. (2007) in assigning partial credit to a model when the candidates of a target are tied for maximal similarity. We evaluate the models on Exp. 2 by calculating the strength of the correlation between the model predictions and the human relatedness judgments. We use Pearson's correlation coefficient since it is the de facto evaluation measure in relevant earlier work. [5]

[5] We note that since the data are not normally distributed, a non-parametric correlation coefficient would be more appropriate. While we omitted them due to space limitations in this paper, we will provide Spearman ρ results for all models online at http://goo.gl/uxuffp.

On both tasks, we compare the models in two conditions. In the first condition ("All"), models are forced to make predictions for all items in the dataset even if they have no information about the item. In the second condition ("Covered"), models are allowed to abstain in the case of zero similarities. For Exp. 1, we report the accuracy (the number of correctly recognized synonyms divided by the number of attempted problems) and coverage (the ratio of items attempted; always 1 for the "All" condition). Items are considered covered if at least one candidate has a non-zero similarity to the target. In Exp. 2, we measure the correlation between the semantic similarities and human judgments for word pairs. Coverage is calculated as the percentage of items with similarity greater than 0.

Differences between models are tested for significance using bootstrap resampling (Efron and Tibshirani, 1993), always in the "All" condition.
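The decision procedure for Exp. 1 can be summarized as follows (a sketch under our reading of the setup; `similarity` stands for the cosine in the W×LW space, and the tie handling mirrors the partial-credit scheme adopted from Mohammad et al. (2007)):

```python
def predict_synonym(target, candidates, similarity):
    """Score each candidate against the target; phrasal candidates
    (given as tuples of words) score as their best-matching constituent."""
    def cand_sim(cand):
        words = cand if isinstance(cand, tuple) else (cand,)
        return max(similarity(target, w) for w in words)
    sims = {cand: cand_sim(cand) for cand in candidates}
    best = max(sims.values())
    winners = [cand for cand, s in sims.items() if s == best]
    return winners, best

def item_credit(winners, best, gold):
    """Per-item accuracy: full credit for a unique correct prediction,
    partial credit 1/k for a k-way tie that contains the gold synonym.
    Returns None for uncovered items (all similarities zero), which a
    model may skip in the "Covered" condition."""
    if best == 0.0:
        return None
    return 1.0 / len(winners) if gold in winners else 0.0
```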
5.3 Models

We consider three types of DM models (monolingual, crosslingual and multilingual), bag-of-words models, and a set of models proposed in the literature.

Monolingual model. We use DM.DE (Padó and Utt, 2012), constructed from a 900M-token web corpus, SDEWAC, parsed with MATE (Bohnet, 2010). [6] As discussed in Section 3.5, we consider all links (AllL) for the monolingual model.

Crosslingual models. The starting point for the crosslingual models is Baroni and Lenci (2010)'s English TypeDM model, extracted from approximately 3B tokens of Wikipedia and web corpus text parsed with MaltParser (Nivre, 2006). [7] DM.XL naive implements Eq. (1), and DM.XL filter implements Eq. (3). As our translation lexicon, we use the community-built English–German dict.cc online dictionary. [8] The statistics of the dictionary in Table 3 show that it is quite large and covers many adjectives and nouns, but relatively few verbs. We had to exclude much verbal data due to ill-structured or phrasal entries. Following Section 3.5, we only consider selectional preference links (SPrfL) for the crosslingual model.

                        Adj    Noun    Verbs   Total
    English             37K    78K     8K      123K
    German              35K    99K     9K      143K
    Translation pairs   77K    172K    28K     277K

Table 3: Size of the dict.cc dictionary.

[6] Available from http://goo.gl/H6gViT
[7] Available from http://goo.gl/63ajCI
[8] Available from http://goo.gl/re44Hg

Multilingual models. We consider the two models described in Section 4, namely DM.MULTI Backoff and DM.MULTI MaxSim, each combining DM.DE (AllL) with DM.XL filter (SPrfL).

Bag-of-words models. We build a standard BOW model from the same German corpus SDEWAC used for DM.DE. We assume a window of 10 context words to the left and right. We use the top 10K most frequent content words (nouns, adjectives, verbs and adverbs) as dimensions. Our second BOW model (BOW PCA500) was reduced to 500 dimensions by applying principal component analysis, a technique generally used to increase robustness to parameter choice and to combat sparsity. [9]

[9] We also built models using smaller context windows and Latent Semantic Analysis (LSA, Landauer, 1997), both with 500 dimensions and with an automatically optimized number of dimensions (Wild et al., 2008). Since these spaces did not consistently yield better results than the reported models using PCA, we do not report the results in detail.

Models from the literature. We compare our models against the state of the art, represented by the respective best models from two previous studies (Zesch et al., 2007; Mohammad et al., 2007). They comprise monolingual ontology-based models that use GermaNet, (German) Wikipedia, or both (LinGN, HPG, JC, PL); and crosslingual distributional models that represent the meaning of German lemmas in terms of English thesaurus categories (Lindist).

DM model statistics. Table 4 shows the sizes of the various DMs. The German and English monolingual DMs are markedly different: the English DM is much more compact, covering only 30K lemmas while the German DM covers 3.5M lemmas, and is at the same time much denser. This discrepancy is due to the larger English corpus and the inclusion of very low-frequency items in DM.DE. The crosslingual models created from TypeDM cover 63K lemmas in German, about twice the English coverage but still almost two orders of magnitude below the monolingual DM.DE. They become very large: naive translation increases the number of edges by a factor of 30, and filtered translation still by a factor of 13. This means filtering does reduce the size of the resulting DM, but there is still considerable overgeneration.

    Class                          Model          Nodes   Edges
    monolingual                    DM.DE (DE)     3.5M    78M
                                   TypeDM (EN)    31K     131M
    crosslingual & multilingual    DM.XL naive    63K     5B
    (DE)                           DM.XL filter   63K     1.7B

Table 4: Sizes of various DM resources.

6 Experimental Evaluation on German

The experimental results for the two experiments are shown in Tables 5 and 6, structured by model type. We observe similar patterns for the two experiments.

                                        All     Covered
    Model                               Acc     Acc     Cov
    Baselines and word-based DSMs
 1  Random                              .25     .25     1
 2  Frequency                           .31     .31     1
 3  BOW                                 .46     .46     .98
 4  BOW PCA500                          .55     .55     .98
    Syntax-based DSMs
 5  DM.DE (AllL)                        .48     .53     .84
 6  DM.XL EN→DE naive (SPrfL)           .47     .63     .58
 7  DM.XL EN→DE filter (SPrfL)          .46     .61     .58
 8  DM.MULTI Backoff (7,5)              .54     .58     .89
 9  DM.MULTI MaxSim (7,5)               .55     .59     .89
    Models from the literature
10  Lindist [MGHZ07]                    NA      .52     .45
11  HPG [MGHZ07]                        NA      .77     .22
12  JC [MGHZ07]                         NA      .44     .36

Table 5: Exp. 1: Accuracy and coverage for synonym choice on the Reader's Digest Word Choice dataset. MGHZ07: Mohammad et al. (2007). Best results for each model class in bold.

                                        All     Covered
    Model                               Corr    Corr    Cov
    Baselines and word-based DSMs
 1  Frequency                           .13     .13     1
 2  BOW                                 .20     .21     .97
 3  BOW PCA500                          .34     .37     .97
    Syntax-based DSMs
 4  DM.DE (AllL) [PU12]                 .38     .43     .60
 5  DM.XL EN→DE naive (SPrfL)           .28     .45     .49
 6  DM.XL EN→DE filter (SPrfL)          .33     .49     .49
 7  DM.MULTI Backoff (6,4)              .40     .45     .69
 8  DM.MULTI MaxSim (6,4)               .42     .47     .69
    Models from the literature
 9  LinGN [MGHZ07]                      NA      .50     .26
10  Lindist [MGHZ07]                    NA      .51     .26
11  JC GN + PL WP [ZGM07]               NA      .59     .33

Table 6: Exp. 2: Coverage and correlation (Pearson's r) for predicting word similarity on the Gur350 dataset. MGHZ07: Mohammad et al. (2007) [10], ZGM07: Zesch et al. (2007) [11], PU12: Padó and Utt (2012). Best results for each model class in bold.

[10] Mohammad et al. (2007) do not provide coverage numbers in their paper. We appreciate the support of Torsten Zesch and Saif Mohammad in recovering the necessary information.
[11] Zesch et al. (2007) report results for the subset of Gur350 in the intersection of GermaNet and Wikipedia. Thus, their models may have higher coverage on the complete Gur350, but to our knowledge these numbers have not been published.

Baselines and word-based DSMs. In both cases, uninformed baselines (random and frequency) perform badly. (In Exp. 1, the frequency baseline predicts the most frequent item as synonym; in Exp. 2, it predicts min(f(w1), f(w2)).) In contrast, word-based DSMs perform quite well, particularly the dimensionality-reduced model (BOW PCA).

Syntax-based DSMs. We see a consistent quality versus coverage tradeoff among the different classes of syntax-based DSMs. The monolingual DM.DE model is significantly outperformed by the BOW model on Exp. 1 (p < 0.01), but numerically outperforms it on Exp. 2 (difference not significant). In both tasks, the crosslingual DM.XL models outperform both DM.DE and BOW PCA in terms of quality: They achieve the numerically highest accuracy (and correlation, respectively) among all syntax-based models. This high quality comes at a low coverage, matching our intuitions about the profile of the crosslingual model.
Filtering leads to a significant improvement in Exp. 2 (p < 0.05) but not in Exp. 1.

The multilingual models (DM.MULTI) perform even better. They nearly retain the quality of the crosslingual models (accuracy of .59 vs. .63 for Exp. 1, correlation of .47 vs. .49 for Exp. 2) but attain higher coverage (89% in Exp. 1 and 69% in Exp. 2). Notably, the coverage is even higher than that of the DM.DE models, attesting to the complementarity of mono- and crosslingual information. The differences among the DM.MULTI models are small, but MaxSim does a little better and performs best overall. In Exp. 1, it does significantly better in the all-items evaluation than all other syntax-based models (p < 0.01). The differences in Exp. 2 are only significant at p < 0.05 for the model pair 6–8; we attribute this to the smaller size of the dataset.

In sum, we can construct crosslingual DMs without any use of target language corpora that mirror or even exceed the performance of monolingual DMs. If monolingual data is available, the combination of corpus evidence provides a substantial advantage over both monolingual and crosslingual models, even for German, a language with large, relatively reliably parsed corpora. Users can choose among different models with different coverage/quality tradeoffs.

Comparison to models from the literature. Models from the literature are shown at the bottom of the two tables. They generally obtain the highest accuracy (or correlation, respectively), but only cover a relatively small part of the datasets. In particular, the models with a quality higher than the DM variants (11 in Exp. 1, and 10 and 11 in Exp. 2) exhibit a coverage of less than half that of the DM.MULTI models. This appears to show the usual trade-off between hand-constructed knowledge and automatically acquired knowledge (Gildea and Jurafsky, 2002). However, we can similarly bias our DMs towards accuracy with the aid of a simple frequency filter that only permits predictions for items where all involved lemmas occur more frequently in the German corpus than some threshold. Setting these thresholds to match the coverage figures of the best ontology-based models, DM.MULTI MaxSim almost reaches the ontology-based results: On Exp. 1, for a coverage of .22 we obtain an accuracy of .68 (ontology-based model: .77), and on Exp. 2, we obtain a correlation of .60 (ontology-based model: .59) at a coverage of .33. [12] Thus, our DM models approximate the quality of ontology-based models without using any handcrafted resources.

[12] We cannot provide significance tests since we do not have item-wise predictions for the models from the literature.

Differences between Exp. 1 and 2. The two main differences between the experiments are (a) the performance of DM relative to the BOW baseline, and (b) the impact of backtranslation filtering. In Exp. 1, the BOW performs as well as DM.MULTI, and the unfiltered DM.XL has a slight edge (2% accuracy) over the filtered one. In contrast, in Exp. 2 filtering leads to a major improvement and DM.MULTI does substantially better than BOW PCA. Our analyses attribute this difference to the nature of the two tasks (cf. Section 5). Exp. 1 requires the recognition of synonyms. Here, the main determinant of success is whether the actual synonym receives the highest similarity or not, irrespective of the margin to the competing candidates. This margin does increase from 0.09 in the naive to 0.11 in the filtered DM.XL, but the overall number of correct predictions remains almost unchanged. In contrast, Exp. 2 covers the whole range from highly similar to unrelated word pairs, and the correlation evaluation is sensitive to the relative size of similarities produced by the DM across many word pairs. The improvement we see indicates that filtering improves the overall scaling of the similarities, but this effect is masked by the decision criterion in Exp. 1.

Qualitative analysis. Comparing DM.DE with DM.MULTI, the question arises: can we further characterize the benefits that the inclusion of crosslingual corpus evidence confers on monolingual models? We first inspected Exp. 1 for synonyms that were correctly recognized by DM.MULTI MaxSim but not DM.DE, and found a large number of words of foreign origin (see Table 7). These words tend to be rare in the German corpus in the form of technical, slang, or elevated register terms. Due to their low level of ambiguity as well as the fact that their English translations are often more frequent, the crosslingual model represents them more sensibly.

    Nouns        Couscous (couscous), Albino (albino)
    Adjectives   kursorisch (cursory), süffisant (smug)
    Verbs        erodieren (to erode), moussieren (to fizz)

Table 7: Words of foreign origin better represented by the multilingual model.
We then inspected Exp. 2 in a similar way but found it more difficult to identify salient improved classes, since the improvement is mostly in terms of coverage. The dataset for Exp. 2 includes proper nouns, such as Berlin/Berlin-Kreuzberg and Benedetto/Benedikt, which are unlikely to be covered by a translation lexicon. It also contains items that encode world knowledge, such as Ratzinger/Papst (pope), which have a better chance of being covered by target language corpora. This pair is not covered by the DM.XL models, but the monolingual models (DM.DE, BOW, and BOW PCA) assign it the similarities .23, .66, and .89, respectively.

7 Experimental Evaluation on Croatian

Our third experiment considers a language that is more different from English than German, namely Croatian, a Slavic language. Available resources for Croatian are more limited than for English or German. Since syntactic analysis used to be a bottleneck, the first syntax-based DSM for Croatian, DM.HR, became available only last year (Šnajder et al., 2013). As for evaluation datasets, there are no human similarity judgments, but there is a synonym choice dataset (CroSyn – see Karan et al. (2012) for details). Thus, our Croatian evaluation is a synonym choice task parallel to Exp. 1 for German. We take DM.HR as the monolingual model, which was built from a dependency-parsed Croatian web corpus of 1.2B tokens. We construct a crosslingual model by starting from Baroni and Lenci's English TypeDM and using Taktika Nova's freely available English–Croatian dictionary [13] with 105K translation pairs. After removing entries with more than one word per language, we were left with 95K pairs, considerably fewer than for English–German. We apply the methods from Section 3 for edge translation and filtering. The resulting filtered Croatian DM.XL has 47K nodes and 315M edges, about one order of magnitude smaller than the German crosslingual resource. Finally, we combined DM.HR with the crosslingual DM (as in Section 4) to obtain multilingual Croatian DMs.

[13] Available from http://goo.gl/xHUjJ

                                        All     Covered
    Model                               Acc     Acc     Cov
    Word-based DSMs
 1  BOW-LSA [SPA13]                     .66     .66     1
    Syntax-based DSMs
 2  DM.HR (AllL)                        .65     .65     .99
 3  DM.XL EN→HR naive (SPrfL)           .43     .50     .71
 4  DM.XL EN→HR filter (SPrfL)          .58     .71     .71
 5  DM.MULTI Backoff (4,2)              .69     .69     .99
 6  DM.MULTI MaxSim (4,2)               .70     .70     .99

Table 8: Experiment 3: Accuracy and coverage for synonym choice on the CroSyn dataset. SPA13: Šnajder et al. (2013). In boldface: best results.

Table 8 shows the results, which correspond closely to those for Exp. 1. A dimensionality-reduced BOW space performs competitively with the monolingual DM.HR (Šnajder et al., 2013). The crosslingual DM is again able to improve accuracy over DM.HR (by 6%) but drops in coverage. Again, the multilingual models perform best: DM.MULTI MaxSim loses only 1% accuracy compared to the crosslingual model but achieves almost perfect coverage. The differences to DM.HR and DM.XL are both significant (p < 0.01). [14]

[14] We cannot provide significances for the BOW results because we again do not have per-item predictions.

The two major differences to the German synonym choice task (Exp. 1) are that (a) filtering plays an essential role for Croatian (increase in accuracy by 15%) and (b) DM.MULTI clearly outperforms the BOW model. We attribute the difference to the semi-automatic construction of the Croatian dataset from machine-readable dictionaries. Overall, the results for Croatian are encouraging. They demonstrate that languages where parsing technology is still developing can in particular profit from cross- and multilingual methods. This is true even for relatively small translation dictionaries, matching previous results from the literature (Peirsman and Padó, 2011).

8 Related Work

Given the resource gradient between English and other languages, the crosslingual induction of linguistic information has been an active topic of research. Many studies use parallel corpora. Annotation projection (Yarowsky and Ngai, 2001) transfers source language annotation directly onto target language sentences.
It has been applied to various linguistic levels such as POS tagging and syntax (Hi and Hwa, 2005; Hwa et al., 2005, among others). Other studies use parallel data as indirect supervision for monolingual tasks. Diab and Resnik (2002) use translations as word sense labels; van der Plas and Tiedemann (2006) exploit multilingual distributional semantics for robust synonymy extraction. Naseem et al. (2009) learn unsupervised POS taggers on multilingual parallel data, exploiting the differences between languages as soft constraints. Titov and Klementiev (2012) and Kozhevnikov and Titov (2013) induce shallow semantic parsers from parallel data. Klementiev et al. (2012) approach document classification with multi-task learning, inducing a multilingual DSM.

Since parallel corpora are not available in large quantities, other studies use comparable corpora, which can provide additional features from the other language. For example, Merlo et al. (2002) improve English verb classification with new features derived from Chinese translations. De Smet and Moens (2009) learn multilingual topic models for news aggregation. Peirsman and Padó (2011) use comparable corpora to transfer selectional preferences and sentiment labels. Wikipedia can be seen as a particularly rich type of comparable corpus with additional link structure. It has been used to compute semantic relatedness (Navigli and Ponzetto, 2012; Navigli and Ponzetto, 2010) and to compute conceptual document representations for crosslingual information retrieval (Potthast et al., 2008; Cimiano et al., 2009).

Our work does not require parallel or comparable corpora. We note, however, that translation lexicons such as the ones we use can be extracted from comparable corpora (Rapp, 1999; Vulić and Moens, 2012, and many others), though few papers are concerned with translation at the level of semantic relations, as we are. Similar in this respect is Fung and Chen (2004), who translate FrameNet (Baker et al., 1998) into Chinese with a bilingual ontology. They use a relation-based pruning scheme that is somewhat comparable to our backtranslation filtering.

To our knowledge, the most similar work to ours is Mohammad et al. (2007), which also considers DSMs, albeit a different variety, namely concept-based DSMs where targets are characterized in terms of their distribution over categories of Roget's thesaurus. Like our work, their study creates crosslingual DSMs for German using a translation lexicon. It follows a different strategy, however: it collects co-occurrence counts from a German corpus and translates the context dimensions into the English Roget categories. Therefore, it crucially requires a large target language corpus, which our crosslingual methods (Section 3) avoid. Its use of a target language corpus resembles our multilingual methods (Section 4), but unlike them, it does not combine corpus evidence from both languages. In sum, we believe that our methods are more adaptable to different scenarios, being able to use whatever data is available in either language.

9 Conclusion

The appeal of syntax-based distributional spaces lies in their promise of flexible and linguistically more appropriate models for many phenomena in lexical semantics. A major obstacle to their adoption for novel languages has been the significantly higher requirements on resources compared to word spaces.

In this paper, we have demonstrated that this obstacle can be overcome by transferring English distributional information along the resource gradient into target languages such as German and Croatian. The simplest models, which are based solely on the English Distributional Memory (DM) resource and a translation lexicon, already beat monolingual DMs in quality. These crosslingual models suffer from lower coverage but can be combined with the monolingual DM, yielding a multilingual DM that maintains competitive accuracy while achieving significantly higher coverage than either individual model. The outcomes of our experiments are mostly stable across the languages and tasks presented, which leads us to assume the methodology successfully generalizes. [15]

[15] The German DMs are publicly available from http://goo.gl/uxuffp.

Directions for future research include (a) more stringent filtering of spurious edges in DM.XL models, to make the graph topology more similar to monolingual models and enable graph merging to obtain unified multilingual models; (b) the extension of our approach to more than two languages; (c) dimensionality reduction for tensor-based DSMs, both for efficiency reasons and to improve performance.
Acknowledgments

We gratefully acknowledge partial funding of our research by the DFG (SFB 732, Project D6) and the EC (Project EXCITEMENT, FP7 ICT-287923).

References

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of the joint Annual Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics, pages 86–90, Montréal, QC.

Marco Baroni and Alessandro Lenci. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):1–49.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2008. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation, 43(3):209–226.

Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 89–97, Beijing, China.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning, pages 149–164, New York, NY.

Stanley F. Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Center for Research in Computing Technology, Harvard University.

Philipp Cimiano, Antje Schultz, Sergej Sizov, Philipp Sorg, and Steffen Staab. 2009. Explicit vs. latent concept models for cross-language information retrieval. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1513–1518, Pasadena, CA.

Wim De Smet and Marie-Francine Moens. 2009. Cross-language linking of news stories on the web using interlingual topic modelling. In Proceedings of the CIKM Workshop on Social Web Search and Mining, pages 57–64, Hong Kong.

Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 255–262, Philadelphia, PA.

Bradley Efron and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman and Hall, New York, NY.

Katrin Erk, Sebastian Padó, and Ulrike Padó. 2010. A Flexible, Corpus-Driven Model of Regular and Inverse Selectional Preferences. Computational Linguistics, 36(4):723–763.

Pascale Fung and Benfeng Chen. 2004. BiFrameNet: Bilingual Frame Semantics Resources Construction by Crosslingual Induction. In Proceedings of the 20th International Conference on Computational Linguistics, pages 931–935, Geneva, Switzerland.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Gregory Grefenstette. 1994. Explorations in Automatic Thesaurus Discovery. Kluwer Academic Publishers, Boston/Norwell, MA.

Zelig S. Harris. 1954. Distributional structure. Word, 10(23):146–162.

Chenhai Hi and Rebecca Hwa. 2005. A Backoff Model for Bootstrapping Resources for Non-English Languages. In Proceedings of the joint Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 851–858, Vancouver, BC.

Rebecca Hwa, Philipp Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping Parsers via Syntactic Projection across Parallel Texts. Journal of Natural Language Engineering, 11(3):311–325.

Eric Joanis, Suzanne Stevenson, and David James. 2006. A general feature space for automatic verb classification. Natural Language Engineering, 14(03):337–367.

Mladen Karan, Jan Šnajder, and Bojana Dalbelo Bašić. 2012. Distributional semantics approach to detecting synonyms in Croatian language. In Proceedings of the Eighth Language Technologies Conference, Ljubljana, Slovenia.

Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the International Conference on Computational Linguistics, pages 1459–1474, Mumbai, India.

Mikhail Kozhevnikov and Ivan Titov. 2013. Crosslingual transfer of semantic role models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1190–1200, Sofia, Bulgaria.

Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of the joint Annual Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics, pages 768–774, Montreal, QC.
Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 523–530, Vancouver, BC.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 92–97, Sofia, Bulgaria.

Paola Merlo, Suzanne Stevenson, Vivian Tsang, and Gianluca Allaria. 2002. A multilingual paradigm for automatic verb classification. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 207–214, Philadelphia, PA.

Rada Mihalcea and Dragomir Radev. 2011. Graph-based Natural Language Processing and Information Retrieval. Cambridge University Press, Cambridge, UK.

George A. Miller and Walter G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28.

Saif Mohammad, Iryna Gurevych, Graeme Hirst, and Torsten Zesch. 2007. Crosslingual distributional profiles of concepts for measuring semantic distance. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 571–580, Prague, Czech Republic.

Tahira Naseem, Benjamin Snyder, Jacob Eisenstein, and Regina Barzilay. 2009. Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches. Journal of Artificial Intelligence Research, 36:1–45.

Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225, Uppsala, Sweden.

Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelRelate! A joint multilingual approach to computing semantic relatedness. In Proceedings of the 26th Conference on Artificial Intelligence, pages 108–114, Toronto, ON.

Joakim Nivre. 2006. Inductive Dependency Parsing. Springer, Dordrecht, Netherlands.

Sebastian Padó and Jason Utt. 2012. A distributional memory for German. In Proceedings of the KONVENS 2012 workshop on recent developments and applications of lexical-semantic resources, pages 462–470, Vienna, Austria.

Yves Peirsman and Sebastian Padó. 2011. Semantic relations in bilingual lexicons. ACM Transactions in Speech and Language Processing, 8(2):3:1–3:21.

Lonneke van der Plas and Jörg Tiedemann. 2006. Finding synonyms using automatic word alignment and measures of distributional similarity. In Proceedings of the joint Annual Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics, pages 866–873, Sydney, Australia.

Martin Potthast, Benno Stein, and Maik Anderka. 2008. A Wikipedia-based multilingual retrieval model. In Proceedings of the European Conference on Information Retrieval, pages 522–530, Glasgow, Scotland.

Reinhard Rapp. 1999. Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pages 519–526, College Park, MD.

Sabine Schulte im Walde. 2006. Experiments on the Automatic Induction of German Semantic Verb Classes. Computational Linguistics, 32(2):159–194.

Hinrich Schütze. 1992. Dimensions of meaning. In Proceedings of Supercomputing '92, pages 787–796, Minneapolis, MN.

Stephen Soderland, Oren Etzioni, Daniel S. Weld, Michael Skinner, Jeff Bilmes, et al. 2009. Compiling a Massive, Multilingual Dictionary via Probabilistic Inference. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing, pages 262–270, Suntec, Singapore.

Harold Somers. 2005. Round-trip translation: What is it good for? In Proceedings of the Australasian Language Technology Workshop, pages 127–133, Sydney, Australia.

Jan Šnajder, Sebastian Padó, and Željko Agić. 2013. Building and evaluating a distributional memory for Croatian. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 784–789, Sofia, Bulgaria.

Ivan Titov and Alexandre Klementiev. 2012. Crosslingual induction of semantic roles. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 647–656, Jeju Island, South Korea.

Peter Turney. 2006. Similarity of semantic relations. Computational Linguistics, 32(3):379–416.

Ivan Vulić and Marie-Francine Moens. 2012. Detecting highly confident word translations from comparable corpora without any prior knowledge. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 449–459, Avignon, France.
DeWitt Wallace and Lila Acheson Wallace. 2005. Reader's Digest, das Beste für Deutschland. Verlag Das Beste, Stuttgart, Germany.

Fridolin Wild, Christina Stahl, Gerald Stermsek, and Gustaf Neumann. 2008. Parameters driving effectiveness of automated essay scoring with LSA. In Proceedings of the 9th Computer-Aided Assessment Conference, pages 485–494, Loughborough, UK.

David Yarowsky and Grace Ngai. 2001. Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection across Aligned Corpora. In Proceedings of the 2nd Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pages 200–207, Pittsburgh, PA.

Torsten Zesch, Iryna Gurevych, and Max Mühlhäuser. 2007. Comparing Wikipedia and German Wordnet by evaluating semantic relatedness on multiple datasets. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, pages 205–208, Rochester, NY.