Transactions of the Association for Computational Linguistics, vol. 5, pp. 487–500, 2017. Action Editor: Chris Quirk.
Submission batch: 3/2017; Published 11/2017.
© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00075/1567531/tacl_a_00075.pdf by guest on 07 September 2023
Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation

Benjamin Marie    Atsushi Fujita
National Institute of Information and Communications Technology
3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan
{bmarie, atsushi.fujita}@nict.go.jp

Abstract

We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English–French, both translation directions, in two domains and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis that showed the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system.

1 Introduction

In phrase-based statistical machine translation (SMT), translation models are estimated over a large amount of parallel data. In general, using more data leads to a better translation model. When no specific domain is targeted, general-domain1 parallel data from various domains may be used to train a general-purpose SMT system. However, it is well known that, in training a system to translate texts from a specific domain, using in-domain parallel data can lead to a significantly better translation quality (Carpuat et al., 2012). Indeed, when only general-domain parallel data are used, it is unlikely that the translation model can learn expressions and their translations specific to the targeted domain. Such expressions will then remain untranslated in the in-domain texts to translate.

1 As in Axelrod et al. (2011), in this paper, we use the term general-domain instead of the commonly used out-of-domain because we assume that the parallel data may contain some in-domain sentence pairs.

So far, in-domain parallel data have been harnessed to cover domain-specific expressions and their translations in the translation model. However, even if we can assume the availability of a large quantity of general-domain parallel data, at least for resource-rich language pairs, finding in-domain parallel data specific to a particular domain remains challenging. In-domain parallel data may not exist for the targeted language pairs or may not be available at hand to train a good translation model.

In order to circumvent the lack of in-domain parallel data, this paper presents a new method to adapt an existing SMT system to a specific domain by inducing an in-domain phrase table, i.e., a set of phrase pairs associated with features for decoding, from in-domain monolingual data. As we review in Section 2, most of the existing methods for inducing phrase tables are not designed, and may not perform as expected, to induce a phrase table for a specific domain for which only limited resources are available. Instead of relying on a large quantity of parallel data or highly comparable corpora, our method induces an in-domain phrase table from unaligned in-domain monolingual data through a three-step
procedure: phrase collection, phrase pair scoring, and phrase pair filtering. Incorporating our induced in-domain phrase table into an SMT system achieves substantial improvements in translating in-domain texts over a strong baseline system, which uses an in-domain bilingual lexicon.

To achieve this improvement, our proposed method for inducing an in-domain phrase table addresses several limitations of previous work by:

• dealing with source and target phrases of arbitrary length collected from in-domain monolingual data,

• proposing translations for not only unseen source phrases, but also those already seen in the general-domain parallel data, and

• making use of potentially many features computed from the monolingual data, as well as from the parallel data, in order to score and filter the candidate phrase pairs.

In the remainder of this paper, we first review previous work in Section 2, highlighting the main weaknesses of existing methods for inducing a phrase table for domain adaptation, and our motivation. In Section 3, we then present our phrase table induction method with all the necessary steps: phrase collection (Section 3.1), computing features of each phrase pair (Section 3.2), and pruning the induced phrase tables to keep their size manageable (Section 3.3). In Section 4, we describe our experiments to evaluate the impact of the induced phrase tables in translating in-domain texts. Following the description of the data (Section 4.1), we explain the tools and parameters used to induce the phrase tables (Section 4.2), our SMT systems (Section 4.3), and present additional baseline systems (Section 4.4). Our experimental results are given in Section 4.5. Section 5.1 analyzes the error distribution of the translations produced by an SMT system using our induced phrase table, followed by translation examples to further illustrate its impact in Section 5.2. Finally, Section 6 concludes this work and proposes some possible improvements to our approach.

2 Motivation

In machine translation (MT), words and phrases that do not appear in the training parallel data, i.e., out-of-vocabulary (OOV) tokens, have been recognized as one of the fundamental issues, regardless of the scenario, such as adapting existing SMT systems to a new specific domain.

One straightforward way to find translations of OOV words and phrases consists in enlarging the parallel data used to train the translation model. This can be done by retrieving parallel sentences from comparable corpora. However, these methods heavily rely on document-level information (Zhao and Vogel, 2002; Utiyama and Isahara, 2003; Fung and Cheung, 2004; Munteanu and Marcu, 2005) to reduce their search space by scoring only sentence pairs extracted from each pair of documents. Indeed, scoring all possible sentence pairs from two large monolingual corpora using costly features and a classifier, as proposed by Munteanu and Marcu (2005) for instance, is computationally too expensive.2 In many cases, we may not have access to document-level information in the given monolingual data for the targeted domain. Furthermore, even without considering computational cost, it is unlikely that a large number of parallel sentences can be retrieved from non-comparable monolingual corpora. Hewavitharana and Vogel (2016) proposed to directly extract phrase pairs from comparable sentences. However, the number of retrievable phrase pairs is strongly limited, because one can collect such comparable sentences only on a relatively small scale for the targeted language pairs and domains.

When in-domain parallel or comparable sentences cannot be easily retrieved, another possibility to find translations for OOV words is bilingual word lexicon induction using comparable or unaligned monolingual corpora (Fung, 1995; Rapp, 1995; Koehn and Knight, 2002; Haghighi et al., 2008; Daumé and Jagarlamudi, 2011; Irvine and Callison-Burch, 2013). This approach is especially useful in finding words and their translations specific to the given corpus. A recent and completely different trend of work uses an unsupervised method regarding translation as a decipherment problem to learn a bilingual word lexicon and use it as a translation model (Ravi and Knight, 2011; Dou and Knight, 2012; Nuhn et al., 2012). However, all these methods deal only with

2 For instance, using these approaches on source and target monolingual data each containing 5 million sentences means that we have to evaluate 25 × 10^12 candidate sentence pairs.
words, mainly owing to the computational complexity of dealing with arbitrary lengths of phrases.

Translations of phrases can be induced using bilingual word lexicons and considering permutations of word ordering (Zhang and Zong, 2013; Irvine and Callison-Burch, 2014). However, it is costly to thoroughly investigate all combinations of a large number of word-level translation candidates and possible permutations of word ordering. To retain only appropriate phrase pairs, Irvine and Callison-Burch (2014) proposed to exploit a set of features. Some of them, including temporal, contextual, and topic similarity features, strongly relied on the comparability of Wikipedia articles and on the availability of news articles annotated with a timestamp (Klementiev et al., 2012). We may not have such useful resources in large quantity for the targeted language pairs and domains.

Saluja et al. (2014) and Zhao et al. (2015) also proposed methods to induce a phrase table, focusing only on the OOV words and phrases: unigrams and bigrams in the source side of their development and test data that are unseen in the training data. In their approach, no new translation options are proposed for known source phrases. To generate candidate phrase pairs, for a given source phrase, Saluja et al. (2014) use only phrases from the target side of their parallel data and their morphological variants, ranked and pruned according to the forward lexical translation probabilities given by their baseline system's translation model. Their approach thus strongly relies on the accuracy of the existing translation model. For instance, if the given source phrase contains only OOV tokens, as it may happen when translating a text from a different domain, their approach cannot retrieve candidate target phrases. Furthermore, they do not make use of external monolingual data to explore unseen target phrases. Their method is consequently inadequate to produce translations for phrases from a different domain than the one of the parallel data.

While Saluja et al. (2014) used a costly graph propagation strategy to score the candidate phrase pairs, Zhao et al. (2015) used a method with a much lower computational cost and reported higher BLEU scores, using only word embeddings to score and rank many phrase pairs generated from target phrases, unigrams and bigrams, collected from monolingual corpora. The main contribution of Zhao et al. (2015) is the use of a local linear projection strategy (LLP) to obtain a cross-lingual semantic similarity score for each phrase pair. It makes the projection of source embeddings to the target embedding space by learning a translation matrix for each source phrase embedding, trained on m gold phrase pairs with source phrase embeddings similar to the one to project. After the projection, based only on the similarity over embeddings, the k nearest target phrases of the projected source phrase are retrieved. If the projection for a given source phrase is not accurate enough, very noisy phrase pairs are generated. This may be a problem especially when the given source phrase does not need to be translated (i.e., numbers, dates, molecule names, etc.). The system will translate it, because this source phrase, previously OOV, is now registered in its induced phrase table, but has only wrong translations available (see Section 4.5 for empirical evidence).

3 In-domain phrase table induction

To induce an in-domain phrase table, our approach assumes the availability of large general-domain parallel data and in-domain monolingual data of both source and target languages. For some of our configurations, we also assume the availability of an in-domain bilingual lexicon to compute features associated with each candidate phrase pair and to compute a reliability score to filter appropriate ones.

3.1 In-domain phrase collection

In a standard configuration, SMT systems extract phrases of a length up to six or seven tokens. Collecting all the n-grams of such a length from a given large monolingual corpus is feasible, but will provide a large set of source and target phrases, resulting in an enormous number of candidate phrase pairs. In the next step, we evaluate each candidate in a given set of phrase pairs; it is thus crucial to get a reasonably small set of phrases.

In contrast with previous work, we collect more meaningful phrases than arbitrary short n-grams, using the following formula presented by Mikolov et al. (2013a):

    score(wi, wj) = (freq(wi wj) − δ) / (freq(wi) × freq(wj))
where wi and wj are two consecutive tokens, freq(·) the frequency of a given word or phrase in the given monolingual corpus, and δ a discounting coefficient that prevents the retrieval of many phrases composed from infrequent words. Each bigram wi wj in the monolingual corpus is scored with this formula, and only the bigrams with a score above a predefined threshold θ are regarded as phrases. All the identified phrases are transformed into one token,3 and a new pass is performed over the monolingual corpus to obtain new phrases, also using the phrases identified in the previous passes. To further limit the number of collected phrases, we consider only phrases containing words that appear at least K times in the monolingual data. After T passes, we compile a set of phrases with (a) all the single words and (b) all the phrases with a length of up to L tokens identified during each pass.

3 This transformation is performed by simply replacing the space between the two tokens with an underscore.

Standard SMT systems for close languages directly output OOV tokens in the translation. To be as good as such systems, our approach must be able to retrieve the right translation, especially for the many domain-specific words and phrases that are identical in both source and target languages. To ensure that a source phrase that must remain untranslated has its identity in the target phrase set, we explicitly add in the target phrase set all the source phrases that also appear in the target monolingual data.

3.2 Feature engineering

Given two sets of phrases, for the source and target languages, respectively, we regard all possible combinations of source and target phrases as candidate phrase pairs. This naive coupling imperatively generates a large number of pairs that are mostly noise. Thus, the challenge here is to effectively estimate the reliability of each pair. This section describes several features to characterize each phrase pair; they are used for evaluating phrase pairs and also added in the induced phrase table to guide the decoder.

3.2.1 Cross-lingual semantic similarity

Many researchers tackled the problem of estimating cross-lingual semantic similarity between pairs of words or phrases by using their embeddings (Mikolov et al., 2013a; Chandar et al., 2014; Faruqui and Dyer, 2014; Coulmance et al., 2015; Gouws et al., 2015; Duong et al., 2016) in combination with either a seed bilingual lexicon or a set of parallel sentence pairs.

We estimate monolingual phrase embeddings via the element-wise addition of the word embeddings composing the phrase. This method performs well to estimate phrase embeddings (Mitchell and Lapata, 2010; Mikolov et al., 2013a), despite its simplicity and relatively low computational cost compared to state-of-the-art methods based on neural networks (Socher et al., 2013a; Socher et al., 2013b) or rich features (Lazaridou et al., 2015). This low computational cost is crucial in our case, as we need to evaluate a large number of candidate phrase pairs.

In order to make source and target phrase embeddings comparable, we perform a linear projection (Mikolov et al., 2013a) of the embeddings of source phrases to the target embedding space. To learn the projection, we use the method of Mikolov et al. (2013a), with the only exception that we deal with not only words but also phrases. Given training data, i.e., a gold bilingual lexicon, we obtain a translation matrix Ŵ by solving the following optimization problem with stochastic gradient descent:

    Ŵ = argmin_W Σ_i ||W xi − zi||²

where xi is the source phrase embedding of the i-th training example, zi the target phrase embedding of the corresponding gold translation, and W the translation matrix used to project xi such that W xi is as close as possible to zi in the target embedding space. One important parameter here is the number of dimensions of word/phrase embeddings. This can be different for the source and target embeddings, but must be smaller than the number of phrase pairs in the training data; otherwise the equation is not solvable. See Section 4.1 for the details about the bilingual lexicon used in our experiment.

Given a phrase pair to evaluate, the source phrase embedding is projected to the target embedding space, using Ŵ. Then, we compute the cosine similarity between the projected source phrase embedding and the target phrase embedding to evaluate the semantic similarity between these phrases; this seems to give satisfying results in this cross-lingual scenario, as shown by Mikolov et al. (2013a).
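The projection and scoring steps above can be sketched in a few lines of numpy. This is a toy illustration, not the paper's setup: the embeddings are synthetic, the dimensions are invented, and the closed-form least-squares solver stands in for the stochastic gradient descent used to learn Ŵ (both reach the same minimizer of this objective).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4-dim "source" and 3-dim "target" embedding spaces, and a
# synthetic gold lexicon of 50 (source embedding, target embedding) pairs.
n = 50
X = rng.normal(size=(n, 4))        # source phrase embeddings x_i
W_true = rng.normal(size=(3, 4))   # hidden mapping used only to build the toy data
Z = X @ W_true.T                   # target embeddings z_i of the gold translations

# Learn W = argmin_W sum_i ||W x_i - z_i||^2. The paper trains this with
# SGD; ordinary least squares reaches the same minimizer.
W, *_ = np.linalg.lstsq(X, Z, rcond=None)   # shape (4, 3); projection is x @ W

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score a candidate phrase pair: project the source embedding with W,
# then compare it to the target phrase embedding by cosine similarity.
x_new = rng.normal(size=4)
z_good = x_new @ W_true.T          # embedding of a correct translation
z_bad = -z_good                    # a clearly wrong candidate

sim_good = cosine(x_new @ W, z_good)
sim_bad = cosine(x_new @ W, z_bad)
```

On this noiseless toy data the projected source embedding lands exactly on the gold target embedding, so `sim_good` is close to 1 while the wrong candidate scores much lower.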
A translation matrix is trained for each translation direction, f→e and e→f, respectively, so that we have two cross-lingual semantic similarity features for each phrase pair.

3.2.2 Lexical translation probabilities

We assume the existence of a large amount of general-domain parallel data, and train a regular translation model with lexical translation probabilities in an ordinary way. Although in-domain phrases are likely to contain tokens that are unseen in the general-domain parallel data, lexical translation probabilities may be useful to score candidate pairs of source and target phrases that contain tokens seen in the general-domain parallel data. To compute a phrase-level score, for a target phrase e given a source phrase f, we consider all possible word alignments as follows:

    Plex(e|f) = (1/I) Σ_{i=1..I} log( (1/J) Σ_{j=1..J} p(ei|fj) )

where I and J are the lengths of e and f, respectively, and p(ei|fj) the lexical translation probability of the i-th target word ei of e given the j-th source word fj of f. Such phrase-level lexical translation probabilities are computed for both translation directions, giving us two features.

3.2.3 Other features

As demonstrated by previous work (Irvine and Callison-Burch, 2014; Irvine and Callison-Burch, 2016), features based on the frequency of the phrases in the monolingual data may help us to better score a phrase pair. We add as features the inversed frequency of the source and target phrases in the in-domain monolingual data, along with their relative difference given by the following formula:

    simf(e, f) = | log( freq(e) / Ne ) − log( freq(f) / Nf ) |

where Nx stands for the number of tokens in the in-domain monolingual data of the corresponding language.

The surface-level similarity of source and target phrases can also be a strong clue when considering the translation between two languages that are relatively close. We investigate two features concerning this: the first feature is the Levenshtein distance between the two phrases calculated regarding words as units,4 while the other is a binary feature that fires if the two phrases are identical. We shall expect both features to be very useful in cases where many domain-specific words and phrases are written in the same way in two languages; for instance, drug and molecule names in the medical domain in French and English.

We also add as features the lengths of the source and target phrases, i.e., I and J, and their ratio. Using all the above 12 features, the overall score for each pair is given by a classifier, as described in Section 3.3; this score is also added as a feature in the induced phrase table for decoding.

3.3 Phrase pair filtering

As mentioned above, phrase pairs so far generated are mostly noise. To reduce the decoder's search space when using our induced phrase table, we radically filter out inappropriate pairs. Each candidate phrase pair is assessed by the method proposed in Irvine and Callison-Burch (2013), which predicts whether a pair of words are translations of one another using a classifier. As training examples, we use a bilingual lexicon as positive examples and randomly associated phrase pairs from our phrase sets as negative examples. For classification, we use all the features presented in Section 3.2. We use the score given by the classifier to rank the target phrases for each source phrase. Only the target phrases with the top n scores are kept in the final induced phrase table.

4 Experiments

This section demonstrates the impact of the induced phrase tables in translating in-domain texts in three configurations. In the first configuration (Conf. 1), we evaluated whether our induced phrase table improves the translation of in-domain texts over the vanilla SMT system, which used only one phrase table trained from general-domain parallel data.

4 Here we did not use the character-level edit distance to measure the orthographic similarity between phrases. Even though such a feature may be useful (Koehn and Knight, 2002), its computational cost is too high to deal efficiently with billions of phrase pairs.
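The training and filtering procedure of Section 3.3 can be pictured with a self-contained sketch: a tiny logistic-regression classifier is trained on positive (lexicon) versus negative (random pairing) feature vectors, then used to keep only the n-best target phrases per source phrase. The three-dimensional features, the synthetic data, and the plain gradient-descent trainer are toy stand-ins for the 12 features of Section 3.2 and the Vowpal Wabbit classifier used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature vectors standing in for the 12 features of Section 3.2.
# Positives play the role of bilingual-lexicon pairs, negatives the role
# of randomly associated phrase pairs.
pos = rng.normal(loc=1.0, size=(200, 3))
neg = rng.normal(loc=-1.0, size=(200, 3))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Minimal logistic regression trained by batch gradient descent
# (a stand-in for Vowpal Wabbit with --link logistic).
w, b = np.zeros(3), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

def score(feats):
    """Classification score used to rank candidate target phrases."""
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))

# Filtering: for one source phrase, rank its candidate target phrases by
# classifier score and keep the n best (n = 2 here; 300 in the paper).
candidates = {"t1": np.ones(3), "t2": -np.ones(3), "t3": 1.2 * np.ones(3)}
n_best = sorted(candidates, key=lambda t: score(candidates[t]), reverse=True)[:2]
```

The score itself also travels with each surviving pair into the induced phrase table, where it serves as one more feature for the decoder.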
We then evaluated, in the second configuration (Conf. 2), whether our induced phrase table is also beneficial when used in an SMT system that already incorporates an in-domain bilingual lexicon that could be created manually or induced by some of the methods mentioned in Section 2. Finally, we evaluated in complementary experiments (Conf. 3) whether our induced phrase table can also offer useful information to improve translation quality even when used in combination with another standard phrase table generated from in-domain parallel data.

4.1 Data

Since our approach assumes the availability of large-scale general-domain parallel and monolingual corpora, we considered the French–English language pair and both translation directions for our experiments. The French–English version of the Europarl parallel corpus5 was regarded as a general-domain, and not strictly out-of-domain, corpus because many debates can be associated to a specific domain and can contain phrases specific to particular domains. As general-domain monolingual data, we used the concatenation of one side of Europarl and the 2007–2014 editions of the News Crawl corpora6 in the same language.

We focused on two domains: medical (EMEA) and science (Science). For both domains, we used the development and test sets provided for a workshop on domain adaptation of MT (Carpuat et al., 2012).7 We also used the provided in-domain parallel data for training but regarded only the target side as monolingual data. Since our primary objective is the induction of a phrase table without using in-domain parallel data, the source side of the in-domain parallel data was not used as a part of the source in-domain monolingual data, except when training an ordinary in-domain phrase table in Conf. 3. As medical domain monolingual data for the EMEA translation task, we used the French and English monolingual medical data provided for the WMT'14 medical translation task.8 None of the parallel corpora provided for the WMT'14 medical translation task was used. As science domain monolingual data for the Science translation task, we used the English side of the ASPEC parallel corpus (Nakazawa et al., 2016).9 Unfortunately, we did not find any French monolingual corpora publicly available for the Science domain that were sufficiently large for our experiments. Statistics on the data we used are presented in Table 1.

5 http://statmt.org/europarl/, release 7
6 http://statmt.org/wmt15/translation-task.html
7 http://hal3.name/damt/
8 http://www.statmt.org/wmt14/medical-task/
9 http://orchid.kuee.kyoto-u.ac.jp/ASPEC/

Domain   Data         #sent.  #tok. (En–Fr)
EMEA     development  2,022   28k–32k
         test         2,045   25k–29k
         parallel     472k    6M–7M
         monolingual          275M–255M
Science  development  1,990   52k–65k
         test         1,982   52k–65k
         parallel     66k     2M–2M
         monolingual          82M–2M
General  parallel     2M      54M–60M
         monolingual          2.8B–1.1B

Table 1: Statistics on train, development, and test data.

To induce the phrase tables from the monolingual data, we compared two bilingual lexicons: a general-domain and an in-domain lexicon. These lexicons are used to train the translation matrices (see Section 3.2.1) and to train the classifier (see Section 3.3). The general-domain lexicon (henceforth, gen-lex) is a phrase-based one extracted from the phrase table built on the general-domain parallel data (see Section 4.3). We extracted the 5,000 most frequent source phrases and their most probable translation according to the forward translation probability, p(e|f). We adopted this size as it had been proven optimal to learn the mapping between two monolingual embedding spaces (Vulić and Korhonen, 2016). For some experiments, we also simulated the availability of an in-domain bilingual lexicon. We automatically generated a lexicon for each domain (henceforth, in-lex) using the entire in-domain parallel data, in the same manner as compiling gen-lex, except that we selected the 5,000 most frequent source words in the in-domain parallel data that were not in the 5,000 most frequent words in the general-domain parallel data, in order to ensure that we obtained mostly in-domain word pairs.
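The gen-lex extraction just described amounts to a frequency cut plus an argmax over the forward translation probabilities. A minimal sketch with invented toy entries (the paper extracts the 5,000 most frequent source phrases from a real Moses phrase table; `size=2` here):

```python
from collections import Counter

# Toy stand-ins: source-phrase frequencies and forward translation
# probabilities p(e|f), as they might come from a general-domain phrase table.
freq = Counter({"the european union": 900, "in order to": 700, "toxicity": 3})
p_e_given_f = {
    "the european union": {"l'union européenne": 0.8, "l'ue": 0.2},
    "in order to": {"afin de": 0.6, "pour": 0.4},
    "toxicity": {"toxicité": 0.9, "la toxicité": 0.1},
}

def build_lexicon(freq, p_e_given_f, size):
    """Keep the `size` most frequent source phrases, each paired with its
    most probable translation under p(e|f) (gen-lex-style extraction)."""
    lexicon = {}
    for f, _ in freq.most_common(size):
        lexicon[f] = max(p_e_given_f[f], key=p_e_given_f[f].get)
    return lexicon

lex = build_lexicon(freq, p_e_given_f, size=2)
```

The in-lex variant differs only in its candidate set: word pairs whose source side is frequent in the in-domain data but absent from the 5,000 most frequent general-domain words.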
Note that we did not use phrases but words for in-lex, assuming that humans are not able to manually construct a lexicon comprising phrase pairs similar to those in phrase tables for SMT systems. For Conf. 3, as we assume the availability of in-domain parallel data, the bilingual lexicon (para-lex) used was 5,000 phrase pairs extracted from the in-domain phrase table, excluding the source phrases of gen-lex.

4.2 Tools and parameters

A summary of the data used to collect phrases and estimate word embeddings is presented by Table 2.

Side    Domain     Data         w2p  w2v
source  general    monolingual       √
        in-domain  monolingual  √    √
                   parallel     (√)  (√)
                   development  √
                   test         √
target  general    monolingual       √
        in-domain  monolingual  √    √
                   parallel     √    √

Table 2: Corpora used for extracting phrases and computing word embeddings: w2p indicates word2phrase, while w2v is for word2vec. (√) denotes that the data are used in Conf. 3 only.

For each pair of domain and translation direction, sets of source and target phrases were extracted from the in-domain monolingual data, as described in Section 3.1. As in previous work (Irvine and Callison-Burch, 2014; Saluja et al., 2014; Zhao et al., 2015), we focus on source phrases appearing in the development and test sets in order to maximize the coverage of our induced phrase table for them.10 More precisely, source phrases were collected from the concatenation of the development and test sets and the in-domain monolingual data with reliable statistics, and then only phrases appearing in the development and test sets were filtered.11 We removed phrases containing tokens unseen in the in-domain monolingual data, because we are unable to compute all our features for them. On the other hand, target phrases were collected from the in-domain monolingual data, including the target side of in-domain parallel data.

10 We are aware that this may not be practical because it requires the knowledge of the development and test sets beforehand. For instance, for the Fr→En EMEA translation task, inducing a phrase table given all the 4.5M collected source phrases would require approximately 3 months using 100 CPU threads. Increasing the value of K to collect fewer source phrases can be a reasonable alternative to significantly decrease this computation time, even though it will also necessarily decrease the coverage of the phrase table. We leave for our future work the study of a phrase table induction with source phrases extracted from source monolingual data without referring to the development and test sets.
11 As we had no French monolingual corpus for the Science domain, the development and test sets for the Science Fr→En task were concatenated with one million sentences randomly extracted from the general-domain monolingual data.

Task             source: all  source: dev+test  target  #phrase pairs
EMEA    Fr→En    4.5M         20k               437k    8.7B
        En→Fr    5.1M         11k               469k    5.2B
Science Fr→En    1.1M         28k               216k    6.0B
        En→Fr    2.3M         24k               18k     432M

Table 3: Size of the phrase sets collected from the source and target in-domain monolingual data and the number of phrases appearing only in the concatenation of the source side of the development and test sets (dev+test). "#phrase pairs" denotes the number of phrase pairs assessed by the classifier.

To identify phrases, we used the word2phrase tool included in the word2vec package,12 with the default values for δ and θ. We set K = 1 for the source language to ensure that most of the tokens would be translated, and K = 25 for the target language to limit the number of resulting phrases. We set L = 6 as this is the same maximal phrase length that we set for the phrase tables trained from the parallel data. We stopped at T = 4 passes as the fifth pass retrieved only a very small number of new phrases compared to the fourth pass. Statistics of the collected phrases for each task are presented in Table 3.

12 https://code.google.com/archive/p/word2vec/

To train the word embeddings, we used word2vec with the following parameters: -cbow 1 -window 10 -negative 15 -sample 1e-4 -iter 15 -min-count 1. Mikolov et al. (2013a) observed that better results for cross-lingual semantic similarity were obtained when using word embeddings with higher dimensions on the source side than on the target side.
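The word2phrase procedure of Sections 3.1 and 4.2 can be sketched as follows: one pass scores every bigram and joins high-scoring ones with an underscore, and repeated passes (T of them) grow longer phrases out of previously merged tokens. The corpus, δ, and θ below are toy values picked so that only "new york" clears the threshold; the real tool's defaults are tuned for large corpora.

```python
from collections import Counter

def phrase_pass(sentences, delta, theta):
    """One word2phrase-style pass: score each bigram with
    (freq(wi wj) - delta) / (freq(wi) * freq(wj)) and join bigrams whose
    score exceeds theta into a single token with an underscore."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter((s[i], s[i + 1]) for s in sentences for i in range(len(s) - 1))
    keep = {bg for bg, c in bigrams.items()
            if (c - delta) / (unigrams[bg[0]] * unigrams[bg[1]]) > theta}
    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) in keep:
                out.append(s[i] + "_" + s[i + 1])  # merge into one token
                i += 2
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged

corpus = [["new", "york", "is", "big"],
          ["new", "york", "city", "is", "nice"],
          ["big", "ideas"],
          ["a", "new", "idea"],
          ["new", "york", "hotels"]] * 5

merged = phrase_pass(corpus, delta=4, theta=0.025)
```

Feeding `merged` back into `phrase_pass` would perform the next pass, letting "new_york" combine with a following token into a trigram phrase.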
We therefore chose 800 and 300 dimensions for the source and target embeddings, respectively. The embeddings were trained on the concatenation of all the general-domain and in-domain monolingual data, as presented by Table 2. Consequently, for each pair of domain and translation direction, we have four word embedding spaces: those with 300 or 800 dimensions for source and target languages.

The reliability of each phrase pair was estimated as described in Section 3.3 to compile phrase tables of reasonable size and quality. We used Vowpal Wabbit13 to perform logistic regression with one pass, default parameters, and the --link logistic option to obtain a classification score for each phrase pair. In the final induced phrase table, we kept the 300 best target phrases14 for each source phrase according to this score.

13 https://github.com/JohnLangford/vowpal_wabbit/
14 As in Irvine and Callison-Burch (2014), we obtained better results when favoring recall over precision. We chose 300 empirically since we did not observe any further improvements when keeping more target phrases.

4.3 SMT systems

The Moses toolkit (Koehn et al., 2007)15 was used for training SMT models, parameter tuning, and decoding. The phrase tables were trained on the parallel corpus using SyMGIZA++ (Junczys-Dowmunt and Szał, 2012)16 with IBM-2 word alignment and the grow-diag-final-and heuristics. To obtain strong baseline systems, all SMT systems used three language models17 built on different sets of corpora, as shown in Table 4; each language model is a 4-gram modified Kneser-Ney smoothed one trained using lmplz (Heafield et al., 2013).18 To concentrate on the translation model, we did not use the lexical reordering model throughout the experiments, while we enabled distance-based reordering up to six words.

Data                                    LM1  LM2  LM3
Target side of in-domain parallel data  √    √    √
In-domain monolingual data                   √    √
General-domain monolingual data                   √

Table 4: Source of our three language models.

15 http://statmt.org/moses/, version 2.1.1
16 https://github.com/emjotde/symgiza-pp/
17 The one exception is the system for the Science En→Fr task, which uses only two language models as we do not have any in-domain monolingual data in addition to the target side of the in-domain parallel data.
18 https://kheafield.com/code/kenlm/estimation

Our systems used the multiple decoding paths ability of Moses; we used up to three phrase tables in one system, as summarized in Table 5. We did not add the features presented in Section 3.2 to the phrase pairs directly derived from the parallel data.19 Weights of the features were optimized with kb-mira (Cherry and Foster, 2012) using 200-best hypotheses on 15 iterations. The translation outputs were evaluated with BLEU (Papineni et al., 2002) and METEOR (Denkowski and Lavie, 2014). The results were averaged over three tuning runs. The statistical significance was measured by approximate randomization (Clark et al., 2011) using MultEval.20

Phrase table                          Conf. 1  Conf. 2  Conf. 3
Phrase table trained from
  general-domain parallel data        √        √        √
Phrase table trained from
  in-domain parallel data                               √
In-domain bilingual lexicon                    √
Phrase table induced from
  in-domain monolingual data          √        √        √

Table 5: Multiple phrase table configurations.

19 As in Irvine and Callison-Burch (2014), we got a drop of up to 0.5 BLEU points when we added our features, derived from monolingual data, to the original phrase table.
20 https://github.com/jhclark/multeval/

4.4 Additional baseline systems

To compare our work with a state-of-the-art phrase table induction method, we implemented the work of Zhao et al. (2015). Even though they did not propose their method to perform domain adaptation of an SMT system, their work is the closest to ours and does not require other external resources than those we used, i.e., parallel data and monolingual data not necessarily comparable. We implemented both global (GLP) and local (LLP) linear projection strategies and collected source and target phrases as they did. The source phrase set contains all unigrams
and bigrams in the development and test sets, while the target phrase set contains unigrams and bigrams collected from the in-domain monolingual data. They did not mention any filtering of their phrase sets, but we chose to remove all phrases containing digits or punctuation marks, since trying to retrieve the translation of numbers or punctuation marks relying only on word embeddings seems inappropriate and in fact produced worse results in our preliminary experiments. To highlight the impact of the phrase sets used, we also experimented with LLP using our phrase sets collected with word2phrase. Furthermore, to get the best possible results, we did not use the search approximations presented in Zhao et al. (2015), i.e., locality-sensitive hashing and redundant bit vectors, and used instead linear search.

For the GLP configuration, the translation matrix was trained on gen-lex, i.e., 5,000 phrase pairs extracted from the general-domain phrase table trained on parallel data. For the LLP configurations, as in Zhao et al. (2015), we trained the translation matrix for each source phrase on the 500 most similar source phrases, retrieved from the general-domain phrase table, associated to their most probable translation. For both GLP and LLP configurations, we kept the 300 best target phrases for each source phrase. Four features, phrase and lexical translation probabilities for both translation directions, were approximated using the similarity between source and target phrase embeddings for each phrase pair and included in the induced phrase table, as described by Zhao et al. (2015).

Since this approach proposes to translate all OOV unigrams and bigrams, it is likely in our scenario that some medical terms, for instance, will have no correct translations in the induced phrase table. For a comparison, we added one more baseline system, which merely uses a vanilla Moses with the -du option of Moses activated to drop all unknown words instead of copying them into the translation.

4.5 Results

The experimental results are given in Table 6. In Conf. 1, our results show that both GLP and LLP configurations performed much worse than the vanilla Moses when using phrases naively collected. This is due to the fact that the induced phrase table contains translations for every OOV unigram and bigram, even for those that do not need to be translated, such as molecule names or place names. Word embeddings are well known to be inaccurate for very infrequent words (Mikolov et al., 2013b); consequently, for some rare source phrases, even if the right translation is in the target phrase set, it is not guaranteed that it will be registered in the induced phrase table as one of the 300 best translations for the source phrase, relying only on word embeddings. The significant improvements over a vanilla Moses observed by Zhao et al. (2015) would potentially be because they translated from Arabic, and Urdu, to English. For such language pairs, one can safely try to translate every OOV token of a general-domain text, and it is unlikely to do worse than a vanilla Moses system that will leave the OOV tokens as is in the translation. As shown by the Moses-du configurations, dropping them led to a drop of up to 4.2 BLEU points for the EMEA Fr→En translation task. This suggests that OOV tokens must be carefully translated only when necessary. Many OOV tokens in our translation tasks do not need to be translated into different forms. Hence, we regard the vanilla Moses that copies the OOV tokens in the translation as a strong baseline system.

Interestingly, using the phrases collected by our method for LLP produced much better translations, even slightly better than the one produced by the vanilla Moses system for the EMEA En→Fr translation task, with an improvement of 0.2 BLEU points. This may be due to the fact that our source phrase set is not only made from OOV phrases, meaning that new useful translations may be proposed for source phrases that are already registered in the general-domain phrase table. Moreover, with our phrase sets, the decoder also has the possibility to leave some tokens untranslated, since we added each source phrase in the target phrase set if it appeared in the target monolingual data.

Instead of relying only on word embeddings, the features used in our approach helped significantly to improve the translation quality. When we added our induced phrase table to a vanilla Moses system, we observed consistent and significant improvements in translation quality, with up to 2.1 BLEU and 2.2 METEOR points of improvement for the Science En→Fr translation task.

Compared to the LLP method proposed by Zhao
et al. (2015), our approach includes more features and an additional classification step. Thus, the induction of a phrase table is much slower. For instance, for the EMEA Fr→En translation task, using the phrase sets extracted with word2phrase, our induction method (excluding phrase collection) was nearly 14 times slower (9 hours vs. 38 minutes).[21] Phrase collection using word2phrase was much faster than feature computation and phrase pair classification. For instance, it took 72 minutes to collect target phrases for the EMEA Fr→En translation task, using four iterations of word2phrase on the English in-domain monolingual data with 1 CPU thread.

[21] The experiments were performed with 20 CPU threads. Note also that computational speed was not our primary focus when implementing our approach. Optimizing our implementation may lead to significant gains in speed, while Zhao et al. (2015) have presented a search approximation able to make their approach 18 times faster than linear search.

Configuration                             | EMEA Fr→En   | EMEA En→Fr   | Science Fr→En | Science En→Fr
                                          | BLEU  METEOR | BLEU  METEOR | BLEU  METEOR  | BLEU  METEOR
vanilla Moses du                          | 24.2  27.4   | 21.7  40.0   | 22.3  29.1    | 20.4  42.7
vanilla Moses (Conf. 1)                   | 28.4  30.1   | 25.4  44.8   | 24.1  30.4    | 22.7  45.1
 + GLP IPT naive                          | 24.3  27.0   | 22.3  41.0   | 22.4  29.0    | 20.6  42.6
 + LLP IPT naive                          | 24.7  27.4   | 22.0  40.4   | 22.5  29.3    | 21.1  43.4
 + LLP IPT                                | 27.9  29.6   | 25.6  45.1   | 22.7  29.3    | 21.3  43.5
 + our IPT (gen-lex)                      | 30.2  32.1   | 27.1  46.6   | 25.4  32.0    | 24.8  47.3
 + in-domain bilingual lexicon (Conf. 2)  | 32.4  32.8   | 28.3  48.2   | 26.6  32.4    | 24.9  48.0
 + our IPT (gen-lex)                      | 33.5  32.6   | 28.8  48.6   | 28.5  33.8    | 25.2  48.4
 + our IPT (in-lex)                       | 33.8  32.9   | 29.2  48.9   | 26.9  32.7    | 25.9  49.0
 + in-domain phrase table (Conf. 3)       | 39.1  36.1   | 33.8  53.1   | 32.1  36.1    | 31.0  53.9
 + our IPT (para-lex)                     | 39.1  36.1   | 34.0  53.2   | 32.1  36.1    | 31.2  54.1

Table 6: Results (BLEU and METEOR) with an induced phrase table (IPT). The Moses du and vanilla Moses systems use only one phrase table, trained from the general-domain parallel data. The translation matrices and the classifiers have been trained with a bilingual lexicon: gen-lex, in-lex, or para-lex. The configurations denoted as "naive" use a phrase table induced from phrases collected as described in Section 4.4. Bold scores indicate the statistical significance (p < 0.01) of the gain over the baseline system (Conf. X) in each configuration.

In Conf. 2, adding an in-domain bilingual lexicon as a phrase table to the vanilla baseline system significantly boosted the performance, mainly by reducing the number of OOV tokens. Our induced phrase tables had less impact, probably due to the overlap between useful word pairs contained in both the induced phrase table and the added bilingual lexicon. However, we still observed significant improvements, which support the usefulness of the induced phrase table, with up to 1.4 and 1.0 BLEU points of improvement, respectively, for the EMEA Fr→En and Science En→Fr translation tasks, for instance. In this configuration, the in-lex phrase table led to slight but consistent improvements. It helped more than the gen-lex phrase table, except in the Science Fr→En task, for which the gen-lex phrase table yielded significantly better results than the in-lex phrase table. We can expect such differences when the classifier and the translation matrices are trained using infrequent words. Embeddings for such words are typically not as well estimated as those for frequent words, meaning that the features based on the word embeddings are less reliable and thus mislead both the classifier used for pruning and the decoder.

In Conf. 3, where the baseline system even used a phrase table trained on in-domain parallel data, we obtained mixed results, with only slight improvements for the En→Fr translation direction and no improvements for the Fr→En translation direction. This lack of improvement may be due to the more reliable features and more accurate phrase pairs contained in the phrase table directly learned from the parallel data. This may lead the decoder to prefer this table to the induced one and to give higher weights to its features during tuning.
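The embedding-based induction used by the GLP and LLP baselines can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' or Zhao et al.'s implementation: the translation matrix is fit by ordinary least squares on seed phrase pairs (in the spirit of Mikolov et al., 2013a), candidate target phrases are ranked by cosine similarity to the projected source embedding, and the 300 best are kept. All variable names and the random toy data are hypothetical.

```python
import numpy as np

def learn_translation_matrix(src_vecs, tgt_vecs):
    """Fit W by least squares so that src_vecs @ W approximates tgt_vecs,
    using seed phrase pairs (e.g., the 5,000 gen-lex pairs)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

def rank_candidates(src_vec, W, tgt_matrix, k=300):
    """Return the indices and cosine similarities of the k target phrases
    closest to the projected source phrase embedding."""
    proj = src_vec @ W
    proj /= np.linalg.norm(proj)
    tgt_norm = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    sims = tgt_norm @ proj              # cosine similarity with each target phrase
    best = np.argsort(-sims)[:k]        # k best, in decreasing similarity
    return best, sims[best]

# Toy example with random 50-dimensional phrase embeddings (hypothetical data).
rng = np.random.default_rng(0)
seed_src = rng.normal(size=(5000, 50))  # seed source phrase embeddings
seed_tgt = rng.normal(size=(5000, 50))  # their aligned target embeddings
W = learn_translation_matrix(seed_src, seed_tgt)
best, sims = rank_candidates(rng.normal(size=50), W, rng.normal(size=(10000, 50)))
```

In Zhao et al. (2015), similarities of this kind stand in for the four phrase-table probability features; in our method they are only some of the features fed to the pruning classifier.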
         EMEA Fr→En   EMEA En→Fr   Science Fr→En  Science En→Fr
         w/o    w/    w/o    w/    w/o    w/      w/o    w/
correct  53.1   55.1  52.8   54.2  54.0   57.1    55.3   57.2
SEEN     6.6    3.9   7.8    6.1   5.9    2.3     8.5    2.2
SENSE    18.1   13.3  15.6   11.5  18.9   14.0    12.9   12.9
SCORE    22.2   27.7  23.8   28.2  21.2   26.6    23.3   27.7

Table 7: Percentage of the source tokens: comparison of the translations generated with (w/) or without (w/o) our gen-lex induced phrase table (Conf. 1).

5 Error analysis

In Section 5.1, we first present an analysis of the distribution of translation errors that our systems produced, using the S4 taxonomy (Irvine et al., 2013). Then, in Section 5.2, we illustrate some translation examples for which our induced phrase tables have produced a better translation.

5.1 Analysis with the S4 taxonomy

The S4 taxonomy comprises the following four error types:

• SEEN: attempt to translate a word never seen before
• SENSE: attempt to translate a word with the wrong sense
• SCORE: a good translation for the word is available, but another one, giving a better score to the hypothesis, is chosen by the system
• SEARCH: a good translation is available for the word, but it is pruned during the search for the best hypothesis

We considered the SEEN, SENSE, and SCORE errors as in Irvine et al. (2013), but not the SEARCH errors, assuming that recent phrase-based SMT systems rarely make this type of error and that it has little impact on translation quality (Wisniewski et al., 2010; Aziz et al., 2014). We performed a Word Alignment Driven Evaluation (WADE) (Irvine et al., 2013) to count the word-level errors.

Table 7 compares the results with and without our gen-lex induced phrase tables (Conf. 1). For the four tasks, more than half of the source tokens were correctly translated according to the translation reference. Our analysis reveals that our induced phrase table helps to obtain more correct translations, as higher percentages of source words were correctly translated, despite the significant increase of SCORE errors (around 5% for all the tasks). This means that the correct translation for the source word is available, but the features associated with this translation were not informative enough for the decoder to choose it. The percentage of SEEN errors in the translations decreased significantly with the induced phrase table for all the tasks, as a result of many words and phrases unseen in the general-domain parallel data being covered by using the in-domain monolingual data. However, our method does not guarantee to find appropriate translations for these words. It is even possible that all the proposed translations are inappropriate. Nonetheless, we can see a noticeable decrease of the SENSE errors, except in the Science En→Fr task, for which we have used only a small amount of in-domain French monolingual data. As reported in Table 3, fewer target phrases were collected for this task, leading to only a small chance of obtaining the right translation for a given source phrase. The percentage of SENSE errors still remains higher than 10% for all tasks, indicating that the correct translation is not available in our phrase set or is pruned by the classifier during the phrase table induction.

From this analysis, we draw the conclusion that our approach has significantly increased the reachability of the translation reference along with the quality of the translation produced by the decoder. We expect that more informative or better estimated features can further improve our results. Improving our method to collect the target phrases or using a larger in-domain monolingual corpus would also help to reduce SENSE errors.

5.2 Translation examples

Table 8 presents examples of source phrases and their translations chosen by the decoder in the EMEA Fr→En translation task. As shown by Example #1, both LLP and gen-lex configurations can find a good translation in their induced phrase table for the phrase "au point d'injection" while the general-domain phrase table does not contain this source phrase. As a result, the vanilla Moses system produced a wrong translation using general-domain word translations.
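The per-token classification behind the Section 5.1 analysis and Table 7 can be sketched as follows. This is a simplified sketch, not the WADE implementation: it assumes each word-aligned source token comes with its reference translation, the decoder's chosen translation, the set of translations the phrase table offers for it, and a flag for whether the token was seen in training. All names and the toy data are hypothetical.

```python
from collections import Counter

def s4_category(ref, hyp, options, seen):
    """Classify one aligned source token using the S4 taxonomy (SEARCH omitted).

    ref     -- reference translation of the source token
    hyp     -- translation chosen by the decoder
    options -- translations available in the phrase table for this token
    seen    -- whether the source token occurred in the training data
    """
    if hyp == ref:
        return "correct"
    if not seen:
        return "SEEN"   # the model never saw the source token
    if ref not in options:
        return "SENSE"  # no correct translation is available at all
    return "SCORE"      # correct translation available but outscored

def s4_distribution(tokens):
    """Percentage of source tokens per category, as in Table 7."""
    counts = Counter(s4_category(*t) for t in tokens)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}

# Hypothetical aligned tokens: (reference, hypothesis, options, seen).
tokens = [
    ("injection", "injection", {"injection"}, True),
    ("glaucoma", "glaucome", set(), False),                   # SEEN error
    ("monohydrate", "monohydrated", {"monohydrated"}, True),  # SENSE error
    ("site", "point", {"site", "point"}, True),               # SCORE error
]
dist = s4_distribution(tokens)
```

In this toy run each category accounts for 25% of the tokens; in the real analysis the counts are aggregated over all word-aligned source tokens of a test set.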
System             #1                         #2              #3                               #4
source             au point d'injection       glaucome aigu   contient du lactose monohydraté  le lansoprazole n'est pas
vanilla Moses      at injection               acute glaucome  monohydraté contains lactose     the lansoprazole is not
LLP IPT            at the point of injection  acute           contains lactose                 the , is not
our IPT (gen-lex)  at the site of injection   acute glaucoma  contains lactose monohydrate     the lansoprazole is not
reference          at the injection site      acute glaucoma  contains lactose monohydrate     lansoprazole is not

Table 8: Examples of source phrases and their translations, from the test set of the EMEA Fr→En translation task, produced by the decoder using different configurations: vanilla Moses (Conf. 1) and Moses using a phrase table induced with LLP or with our method (gen-lex).

Example #2 shows a typical error made by the LLP configuration. In this example, "glaucome" is OOV: no translation is proposed for this token in the general-domain phrase table. The LLP IPT contains the source phrase "glaucome aigu", but none of the 300 best corresponding target phrases contains the token "glaucoma". However, most of them contain the meaning of "acute". This can be explained by the much higher frequency of "aigu", while the word "glaucome" is very rare, even in the in-domain monolingual data. Consequently, "aigu" has a more accurate embedding than that of "glaucome", which is then much more difficult to project correctly across languages. In contrast, our gen-lex IPT contains the translation reference for "glaucome aigu", and this translation has been used correctly by the decoder, guided by our feature set.

Example #3 is similar to Example #2: the embedding of the rare word "monohydraté" is probably not accurate enough to be correctly projected, so the correct translation is not in the LLP IPT, while our approach succeeded in translating it correctly.

Finally, Example #4 presents another common situation where an OOV token, here "lansoprazole", has to be preserved as is, and it is correctly reported in the translation by the vanilla Moses system. The LLP IPT proposes translations for "lansoprazole", most of them semantically unrelated, like the one chosen by the decoder in this configuration. We assume that the surface-level similarity features of our method helped the decoder to identify the right translation in this situation. Nonetheless, even when using our gen-lex IPT, we still observed some situations where tokens that should be preserved were actually wrongly translated, producing outputs worse than those produced by the vanilla Moses system.

6 Conclusion and future work

We presented a framework to induce a phrase table from unaligned monolingual data of specific domains. We showed that such a phrase table, when integrated into the decoder, consistently and significantly improved the translation quality for texts in the targeted domain. Our approach uses only simple features without requiring strongly comparable or annotated texts in the targeted domain.

Our method could further be improved in several ways. First, we expect better improvements by using more in-domain monolingual data or by being more careful in collecting the target phrases to use for the phrase table induction, as opposed to simply pruning them according to the word frequency. Moreover, as we saw in Section 5, scoring the phrase pairs is one of the most important issues. We need more informative features to better score the pairs of source and target phrases. Despite their high computational cost, including features based on orthographic similarity or using better estimated cross-lingual embeddings may help for this purpose.

Acknowledgments

We would like to thank the anonymous reviewers and the action editor, Chris Quirk, for their insightful comments.

References

Amittai Axelrod, Xiaodong He, and Jianfeng Gao. 2011. Domain Adaptation via Pseudo In-Domain Data Selection. In Proceedings of EMNLP, Edinburgh, Scotland, UK.

Wilker Aziz, Marc Dymetman, and Lucia Specia. 2014. Exact Decoding for Phrase-Based Statistical Machine Translation. In Proceedings of EMNLP, Doha, Qatar.
Marine Carpuat, Hal Daumé III, Alexander Fraser, Chris Quirk, Fabienne Braune, Ann Clifton, et al. 2012. Domain adaptation in machine translation: Final report. In 2012 Johns Hopkins summer workshop final report. Baltimore, MD: Johns Hopkins University.

A. P. Sarath Chandar, Stanislas Lauly, Hugo Larochelle, Mitesh M. Khapra, Balaraman Ravindran, Vikas Raykar, and Amrita Saha. 2014. An Autoencoder Approach to Learning Bilingual Word Representations. In Proceedings of NIPS, Montréal, Canada.

Colin Cherry and George Foster. 2012. Batch Tuning Strategies for Statistical Machine Translation. In Proceedings of NAACL-HLT, Montréal, Canada.

Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability. In Proceedings of ACL-HLT, Portland, OR, USA.

Jocelyn Coulmance, Jean-Marc Marty, Guillaume Wenzek, and Amine Benhalloum. 2015. Trans-gram, Fast Cross-lingual Word-embeddings. In Proceedings of EMNLP, Lisbon, Portugal.

Hal Daumé, III and Jagadeesh Jagarlamudi. 2011. Domain Adaptation for Machine Translation by Mining Unseen Words. In Proceedings of ACL-HLT, Portland, OR, USA.

Michael Denkowski and Alon Lavie. 2014. Meteor Universal: Language Specific Translation Evaluation for Any Target Language. In Proceedings of EACL, Gothenburg, Sweden.

Qing Dou and Kevin Knight. 2012. Large Scale Decipherment for Out-of-domain Machine Translation. In Proceedings of EMNLP-CoNLL, Jeju Island, Korea.

Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, and Trevor Cohn. 2016. Learning Crosslingual Word Embeddings without Bilingual Corpora. In Proceedings of EMNLP, Austin, TX, USA.

Manaal Faruqui and Chris Dyer. 2014. Improving Vector Space Word Representations Using Multilingual Correlation. In Proceedings of EACL, Gothenburg, Sweden.

Pascale Fung and Percy Cheung. 2004. Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM. In Proceedings of EMNLP, Barcelona, Spain.

Pascale Fung. 1995. Compiling Bilingual Lexicon Entries From a Non-Parallel English-Chinese Corpus. In Proceedings of the 3rd Workshop on Very Large Corpora, Cambridge, MA, USA.

Stephan Gouws, Yoshua Bengio, and Greg Corrado. 2015. BilBOWA: Fast Bilingual Distributed Representations without Word Alignments. In Proceedings of ICML, Lille, France.

Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. 2008. Learning Bilingual Lexicons from Monolingual Corpora. In Proceedings of ACL-HLT, Columbus, OH, USA.

Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn. 2013. Scalable Modified Kneser-Ney Language Model Estimation. In Proceedings of ACL, Sofia, Bulgaria.

Sanjika Hewavitharana and Stephan Vogel. 2016. Extracting parallel phrases from comparable data for machine translation. Natural Language Engineering, 22(4):549–573.

Ann Irvine and Chris Callison-Burch. 2013. Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals. In Proceedings of HLT-NAACL, Atlanta, GA, USA.

Ann Irvine and Chris Callison-Burch. 2014. Hallucinating Phrase Translations for Low Resource MT. In Proceedings of CoNLL, Baltimore, MD, USA.

Ann Irvine and Chris Callison-Burch. 2016. End-to-End Statistical Machine Translation with Zero or Small Parallel Texts. Natural Language Engineering, 22(4):517–548.

Ann Irvine, John Morgan, Marine Carpuat, Hal Daumé III, and Dragos Munteanu. 2013. Measuring Machine Translation Errors in New Domains. Transactions of the Association for Computational Linguistics, 1.

Marcin Junczys-Dowmunt and Arkadiusz Szał. 2012. SyMGiza++: Symmetrized Word Alignment Models for Machine Translation. In Security and Intelligent Information Systems (SIIS), volume 7053 of Lecture Notes in Computer Science. Springer-Verlag, Berlin/Heidelberg, Germany.

Alex Klementiev, Ann Irvine, Chris Callison-Burch, and David Yarowsky. 2012. Toward Statistical Machine Translation without Parallel Corpora. In Proceedings of EACL, Avignon, France.

Philipp Koehn and Kevin Knight. 2002. Learning a Translation Lexicon from Monolingual Corpora. In Proceedings of the ACL Workshop on Unsupervised Lexical Acquisition, Philadelphia, PA, USA.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of ACL, Prague, Czech Republic.

Angeliki Lazaridou, Georgiana Dinu, Adam Liska, and Marco Baroni. 2015. From Visual Attributes to Adjectives through Decompositional Distributional Semantics. Transactions of the Association for Computational Linguistics, 3.
Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013a. Exploiting Similarities among Languages for Machine Translation. CoRR, abs/1309.4168.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, Lake Tahoe, NV, USA.

Jeff Mitchell and Mirella Lapata. 2010. Composition in Distributional Models of Semantics. Cognitive Science, 34(8).

Dragos Stefan Munteanu and Daniel Marcu. 2005. Improving Machine Translation Performance by Exploiting Non-Parallel Corpora. Computational Linguistics, 31(4):477–504.

Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, and Hitoshi Isahara. 2016. ASPEC: Asian Scientific Paper Excerpt Corpus. In Proceedings of LREC, Portorož, Slovenia.

Malte Nuhn, Arne Mauser, and Hermann Ney. 2012. Deciphering Foreign Language by Combining Language Models and Context Vectors. In Proceedings of ACL, Jeju Island, Korea.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of ACL, Philadelphia, PA, USA.

Reinhard Rapp. 1995. Identifying Word Translations in Non-parallel Texts. In Proceedings of ACL, Cambridge, MA, USA.

Sujith Ravi and Kevin Knight. 2011. Deciphering Foreign Language. In Proceedings of ACL-HLT, Portland, OR, USA.

Avneesh Saluja, Hany Hassan, Kristina Toutanova, and Chris Quirk. 2014. Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data. In Proceedings of ACL, Baltimore, MD, USA.

Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013a. Parsing with Compositional Vector Grammars. In Proceedings of ACL, Sofia, Bulgaria.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013b. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of EMNLP, Seattle, WA, USA.

Masao Utiyama and Hitoshi Isahara. 2003. Reliable Measures for Aligning Japanese-English News Articles and Sentences. In Proceedings of ACL, Sapporo, Japan.

Ivan Vulić and Anna Korhonen. 2016. On the Role of Seed Lexicons in Learning Bilingual Word Embeddings. In Proceedings of ACL, Berlin, Germany.

Guillaume Wisniewski, Alexandre Allauzen, and François Yvon. 2010. Assessing Phrase-Based Translation Models with Oracle Decoding. In Proceedings of EMNLP, Cambridge, MA, USA.

Jiajun Zhang and Chengqing Zong. 2013. Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation. In Proceedings of ACL, Sofia, Bulgaria.

Bing Zhao and Stephan Vogel. 2002. Adaptive parallel sentences mining from web bilingual news collection. In Proceedings of IEEE ICDM, Maebashi, Japan.

Kai Zhao, Hany Hassan, and Michael Auli. 2015. Learning Translation Models from Monolingual Continuous Representations. In Proceedings of NAACL-HLT, Denver, CO, USA.