Transactions of the Association for Computational Linguistics, vol. 4, pp. 99–112, 2016. Action Editor: Philipp Koehn.
Submission batch: 11/2015; Revision batch: 2/2016; Published 4/2016.

© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Adapting to All Domains at Once: Rewarding Domain Invariance in SMT

Hoang Cuong and Khalil Sima'an and Ivan Titov
Institute for Logic, Language and Computation, University of Amsterdam
Science Park 107, 1098 XG Amsterdam, The Netherlands
{c.hoang,k.simaan,titov}@uva.nl

Abstract

Existing work on domain adaptation for statistical machine translation has consistently assumed access to a small sample from the test distribution (target domain) at training time. In practice, however, the target domain may not be known at training time or it may change to match user needs. In such situations, it is natural to push the system to make safer choices, giving higher preference to domain-invariant translations, which work well across domains, over risky domain-specific alternatives. We encode this intuition by (1) inducing latent subdomains from the training data only; (2) introducing features which measure how specialized phrases are to individual induced subdomains; (3) estimating feature weights on out-of-domain data (rather than on the target domain). We conduct experiments on three language pairs and a number of different domains. We observe consistent improvements over a baseline which does not explicitly reward domain invariance.

1 Introduction

Mismatch in phrase translation distributions between test data (target domain) and train data is known to harm performance of statistical translation systems (Irvine et al., 2013; Carpuat et al., 2014). Domain-adaptation methods (Foster et al., 2010; Bisazza et al., 2011; Sennrich, 2012b; Razmara et al., 2012; Sennrich et al., 2013; Haddow, 2013; Joty et al., 2015) aim to specialize a system estimated on out-of-domain training data to a target domain represented by a small data sample. In practice, however, the target domain may not be known at training time or it may change over time depending on user needs. In this work we address exactly the setting where we have a domain-agnostic system but we have no access to any samples from the target domain at training time. This is an important and challenging setting which, as far as we are aware, has not yet received attention in the literature.

When the target domain is unknown at training time, the system could be trained to make safer choices, preferring translations which are likely to work across different domains. For example, when translating from English to Russian, the most natural translation for the word 'code' would be highly dependent on the domain (and the corresponding word sense). The Russian words 'xifr', 'zakon' or 'programma' would perhaps be optimal choices if we consider cryptography, legal and software development domains, respectively. However, the translation 'kod' is also acceptable across all these domains and, as such, would be a safer choice when the target domain is unknown. Note that such a translation may not be the most frequent overall and, consequently, might not be proposed by a standard (i.e., domain-agnostic) phrase-based translation system.

In order to encode preference for domain-invariant translations, we introduce a measure which quantifies how likely a phrase (or a phrase pair) is to be "domain-invariant". We recall that most large parallel corpora are heterogeneous, consisting of diverse language use originating from a variety of unspecified subdomains. For example, news articles may cover sports, finance, politics, technology and a variety of other news topics. None of the subdomains may match the target domain particularly


well, but they can still reveal how domain-specific a given phrase is. For example, if we observe that the word 'code' can be translated as 'kod' across cryptography and legal subdomains observed in training data, we can hypothesize that it may work better on a new unknown domain than 'zakon', which was specific only to a single subdomain (legal). This would be a suitable decision if the test domain happens to be software development, even though no texts pertaining to this domain were included in the heterogeneous training data.

Importantly, the subdomains are usually not specified in the heterogeneous training data. Therefore, we treat the subdomains as latent, so we can induce them automatically. Once induced, we define measures of domain specificity, particularly expressing two generic properties:

Phrase domain specificity: How specific is a target or a source phrase to some of the induced subdomains?

Phrase pair domain coherence: How coherent is a source phrase and a target language translation across the induced subdomains?

These features capture two orthogonal aspects of phrase behaviour in heterogeneous corpora, with the rationale that phrase pairs can be weighted along these two dimensions. Domain specificity captures the intuition that the more specific a phrase is to certain subdomains, the less applicable it is in general. Note that specificity is applied not only to target phrases (as 'kod' and 'zakon' in the above example) but also to source phrases. When applied to a source phrase, it may give a preference towards using shorter phrases as they are inherently less domain-specific. In contrast to phrase domain specificity, phrase pair coherence reflects whether candidate target and source phrases are typically used in the same set of domains. The intuition here is that the more divergent the distributional behaviour of source and target phrases across subdomains, the less certain we are whether this phrase pair is valid for the unknown target domain. In other words, a translation rule with source and target phrases having two similar distributions over the latent subdomains is likely safer to use.

Weights for these features, alongside all other standard features, are tuned on a development set. Importantly, we show that there is no noteworthy benefit from tuning the weights on a sample from the target domain. It is enough to tune them on a mixed-domain data set sufficiently different from the training data. We attribute this attractive property to the fact that our features, unlike the ones typically considered in standard domain-adaptation work, are generic and only affect the amount of risk our system takes. In contrast, for example, in Eidelman et al. (2012), Chiang et al. (2011), Hu et al. (2014), Hasler et al. (2014), Su et al. (2015), Sennrich (2012b), Chen et al. (2013b), and Carpuat et al. (2014), features capture similarities between a target domain and each of the training subdomains. Clearly, domain adaptation with such rich features, though potentially more powerful, would not be possible without a development set closely matching the target domain.

We conduct our experiments on three language pairs and explore adaptation to 9 domain adaptation tasks in total. We observe significant and consistent performance improvements over the baseline domain-agnostic systems. This result confirms that our two features, and the latent subdomains they are computed from, are useful also for the very challenging domain adaptation setting considered in this work.

2 Domain-Invariance for Phrases

At the core of a standard state-of-the-art phrase-based system (Koehn et al., 2003; Och and Ney, 2004) lies a phrase table {⟨ẽ, f̃⟩} extracted from a word-aligned training corpus together with estimates for phrase translation probabilities P_count(ẽ|f̃) and P_count(f̃|ẽ). Typically the phrases and their probabilities are obtained from large parallel corpora, which are usually broad enough to cover a mixture of several subdomains. In such mixtures, phrase distributions may be different across different subdomains. Some phrases (whether source or target) are more specific for certain subdomains than others, while some phrases are useful across many subdomains. Moreover, for a phrase pair, the distribution over the subdomains for its source side may be similar or not to the distribution for its target side.


[Figure 1: The projection framework of phrases into a K-dimensional vector space of probabilistic latent subdomains.]

Coherent pairs seem safer to employ than pairs that exhibit different distributions over the subdomains. These two factors, domain specificity and domain coherence, can be estimated from the training corpus if we have access to subdomain statistics for the phrases. In the setting addressed here, the subdomains are not known in advance and we have to consider them latent in the training data.

Therefore, we introduce a random variable z ∈ {1, ..., K} encoding (arbitrary) K latent subdomains that generate each source and target phrase ẽ and f̃ of every phrase pair ⟨ẽ, f̃⟩. In the next section, we aim to estimate distributions P(z|ẽ) and P(z|f̃) for subdomain z over the source and target phrases respectively. In other words, we aim at projecting phrases onto a compact (K−1)-dimensional simplex of subdomains with vectors:

  ẽ ↦ ⟨P(z = 1|ẽ), ..., P(z = K|ẽ)⟩,  (1)
  f̃ ↦ ⟨P(z = 1|f̃), ..., P(z = K|f̃)⟩.  (2)

Each of the K elements encodes how well each source and target phrase expresses a specific latent subdomain in the training data. See Figure 1 for an illustration of the projection framework.

Once the projection is performed, the hidden cross-domain translation behaviour of phrases and phrase pairs can be modeled as follows:

• Domain specificity of phrases: A rule with source and target phrases having a peaked distribution over latent subdomains is likely domain-specific. Technically speaking, entropy comes as a natural choice for quantifying domain specificity. Here, we opt for the Rényi entropy and define the domain specificity as follows:

  Dα(ẽ) = 1/(1 − α) · log( Σ_{i=1}^{K} P(z = i|ẽ)^α ),
  Dα(f̃) = 1/(1 − α) · log( Σ_{i=1}^{K} P(z = i|f̃)^α ).

For convenience, we refer to Dα(·) as the domain specificity of a phrase. In this study, we choose the value of α as 2, which is the default choice (also known as the collision entropy).

• Source-target coherence across subdomains: A translation rule with source and target phrases having two similar distributions over the latent subdomains is likely safer to use. We use the Chebyshev distance for measuring the similarity between two distributions. The divergence of the two vectors for ẽ and f̃ is defined as follows:

  D(ẽ, f̃) = max_{i ∈ {1,...,K}} | P(z = i|ẽ) − P(z = i|f̃) |.

We refer to D(ẽ, f̃) as the phrase pair coherence across latent subdomains. We investigated some other similarities for phrase pair coherence (the Kullback-Leibler divergence and the Hellinger distance) but have not observed any noticeable improvements in the performance. We will discuss these experiments in the empirical section.

Once computed for every phrase pair, the two measures Dα(ẽ), Dα(f̃) and D(ẽ, f̃) will be integrated into a phrase-based SMT system as feature functions.

3 Latent Subdomain Induction

We now present our approach for inducing latent subdomain distributions P(z|ẽ) and P(z|f̃) for every source and target phrase ẽ and f̃. In our experiments, we compare using our subdomain induction framework with relying on topic distributions provided by a standard topic model, Latent Dirichlet Allocation (Blei et al., 2003). Note that unlike LDA we rely on parallel data and word alignments when
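Both measures are straightforward to compute from a phrase's posterior vector over the K subdomains. The following is a minimal Python sketch on toy distributions, not the authors' implementation; note that the collision entropy (α = 2) is lower for peaked, domain-specific distributions and highest for flat, domain-invariant ones:

```python
import math

def domain_specificity(p, alpha=2.0):
    """Renyi entropy D_alpha of a phrase's distribution over K latent
    subdomains; alpha=2 is the collision entropy used in the paper.
    Lower values indicate a more peaked (more domain-specific) phrase."""
    return (1.0 / (1.0 - alpha)) * math.log(sum(pi ** alpha for pi in p))

def pair_coherence(p_e, p_f):
    """Chebyshev distance between the source- and target-phrase
    distributions over latent subdomains; 0 means perfectly coherent."""
    return max(abs(pe - pf) for pe, pf in zip(p_e, p_f))

# Toy example with K = 4: a peaked (domain-specific) phrase vs. a flat one.
peaked = [0.85, 0.05, 0.05, 0.05]
flat = [0.25, 0.25, 0.25, 0.25]

print(domain_specificity(peaked))   # low entropy: domain-specific
print(domain_specificity(flat))     # high entropy: domain-invariant
print(pair_coherence(peaked, flat)) # maximal coordinate-wise gap: 0.6
```

In a real system these values would be computed once per phrase-table entry and added as feature functions whose weights are tuned with the other features.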


inducing domains. Our intuition is that latent variables capturing regularities in bilingual data may be more appropriate for the translation task.

Inducing these probabilities directly is rather difficult, as the task of designing a fully generative phrase-based model is known to be challenging.[1] In order to avoid this, we follow Matsoukas et al. (2009) and Cuong and Sima'an (2014a), who embed such a phrase-level model into a latent subdomain model that works at the sentence level. In other words, we associate latent domains with sentence pairs rather than with phrases, and use the posterior probabilities computed for the sentences with all the phrases appearing in the corresponding sentences.

[1] Doing that requires incorporating into the model additional hidden variables encoding phrase segmentation (DeNero et al., 2006). This would significantly complicate inference (Mylonakis and Sima'an, 2008; Neubig et al., 2011; Cohn and Haffari, 2013).

Given P(z|e, f), a latent subdomain model given sentence pairs ⟨e, f⟩, the estimation of P(z|ẽ) and P(z|f̃), for phrases ẽ and f̃, can be simplified by computing expectations for all z ∈ {1, ..., K}:

  P(z = i|ẽ) = [ Σ_{⟨e,f⟩} P(z = i|e, f) c(ẽ; e) ] / [ Σ_{i′=1}^{K} Σ_{⟨e,f⟩} P(z = i′|e, f) c(ẽ; e) ],
  P(z = i|f̃) = [ Σ_{⟨e,f⟩} P(z = i|e, f) c(f̃; f) ] / [ Σ_{i′=1}^{K} Σ_{⟨e,f⟩} P(z = i′|e, f) c(f̃; f) ].

Here, c(ẽ; e) is the count of a phrase ẽ in a sentence e in the training corpus.

Latent subdomains for sentences. We now turn to describing our latent subdomain model for sentences. We assume the following generative story for sentence pairs:

1. generate the domain z from the prior P(z);
2. choose the generation direction, f-to-e or e-to-f, with equal probability;
3. if the e-to-f direction is chosen then generate the pair relying on P(e|z)P(f|e, z);
4. otherwise, use P(f|z)P(e|f, z).

Formally, it is a uniform mixture of the generative processes for the two potential translation directions.[2] This generative story implies having two translation models (TMs) and two language models (LMs), each augmented with latent subdomains. Now, the posterior P(z|e, f) can be computed as

  P(z|e, f) ∝ P(z) · ( (1/2) P(e|z) P(f|e, z) + (1/2) P(f|z) P(e|f, z) ).  (3)

[2] Note that we effectively average between them, which is reasonable, as there is no reason to give preference to either of them.

As we aim for a simple approach, our TMs are computed through the introduction of hidden alignments a and a′ in the f-to-e and e-to-f directions respectively, in which P(f|e, z) = Σ_a P(f, a|e, z) and P(e|f, z) = Σ_{a′} P(e, a′|f, z). To make the marginalization of alignments tractable, we restrict P(f, a|e, z) and P(e, a′|f, z) to the same assumptions as IBM Model 1 (Brown et al., 1993), i.e., a multiplication of lexical translation probabilities with respect to latent subdomains. We use standard nth-order Markov models for P(e|z) and P(f|z), in which P(e|z) = Π_i P(e_i | e_{i−n}^{i−1}, z) and P(f|z) = Π_j P(f_j | f_{j−n}^{j−1}, z). Here, the notation e_{i−n}^{i−1} and f_{j−n}^{j−1} is used to denote the history of length n for the source and target words e_i and f_j, respectively.

Training. For training, we maximize the log-likelihood L of the data:

  L = Σ_{⟨e,f⟩} log( Σ_z P(z) ( (1/2) P(e|z) Σ_a P(f, a|e, z) + (1/2) P(f|z) Σ_{a′} P(e, a′|f, z) ) ).  (4)

As there is no closed-form solution, we use the expectation-maximization (EM) algorithm (Dempster et al., 1977). In the E-step, we compute the posterior distributions P(a, z|e, f) and P(a′, z|e, f) as follows:

  P(a, z|e, f) ∝ P(z) ( P(e|z) P(f, a|e, z) + P(f|z) Σ_{a′} P(e, a′|f, z) ),  (5)
  P(a′, z|e, f) ∝ P(z) ( P(e|z) Σ_a P(f, a|e, z) + P(f|z) P(e, a′|f, z) ).  (6)

In the M-step, we use the posteriors P(a, z|e, f) and P(a′, z|e, f) to re-estimate the parameters of both alignment models. This is done in a very similar way to the estimation of the standard IBM Model 1.

We use the posteriors to re-estimate LM parameters as follows:

  P(e_i | e_1^{i−1}, z) ∝ Σ_{⟨e,f⟩} P(z|e, f) c(e_1^i; e),  (7)
  P(f_i | f_1^{i−1}, z) ∝ Σ_{⟨e,f⟩} P(z|e, f) c(f_1^i; f).  (8)

To obtain better parameter estimates for word predictions and avoid overfitting, we use smoothing in the M-step. In this work, we chose to apply the expected Kneser-Ney smoothing technique (Zhang and Chiang, 2014) as it is simple and achieves state-of-the-art performance on the language modeling problem. Finally, P(z) can be simply estimated as follows:

  P(z) ∝ Σ_{⟨e,f⟩} P(z|e, f).  (9)

Hierarchical Training. In practice, we found that training the full joint model leads to brittle performance, as EM is very likely to get stuck in bad local maxima. To address this difficulty, in our implementation, we start out by first jointly training P(z), P(e|z) and P(f|z). In this way, in the E-step, we fix our model parameters and compute P(z|e, f) for every sentence pair: P(z|e, f) ∝ P(e|z) P(f|z) P(z). In the M-step, we use the posteriors to re-estimate the model parameters, as in Equations (7), (8) and (9). Once the model is trained, we fix the language modeling parameters and finally train the full model.

This parallel latent subdomain language model is less expressive and, as a result, is less likely to get stuck in a local maximum. The LMs estimated in this way will then drive the full alignment model towards better configurations in the parameter space.[3] In practice, this training scheme is particularly useful in case of learning a more fine-grained latent subdomain model with larger K.

[3] This procedure can be regarded as a form of hierarchical estimation: we start with a simpler model and then use it to drive a more expressive model. Note that we also use P(z) estimated within the parallel latent subdomain LMs to initialize P(z) for the latent subdomain alignment model.

4 Experiments

4.1 Data

We conduct experiments with large-scale SMT systems across a number of domains for three language pairs (English-Spanish, English-German and English-French). The datasets are summarized in Table 1.

Table 1: Data preparation.
  English-French: 5.01M sentence pairs; 103.39M (English) / 125.81M (French) words.
  English-Spanish: 4.00M sentence pairs; 81.48M / 89.08M words.
  English-German: 4.07M sentence pairs; 93.19M / 88.48M words.

For English-Spanish, we run experiments with training data consisting of 4M sentence pairs collected from multiple resources within the WMT 2013 MT Shared Task. These include EuroParl (Koehn, 2005), Common Crawl Corpus, UN Corpus, and News Commentary. For English-German, our training data consists of 4.1M sentence pairs collected from the WMT 2015 MT Shared Task, including EuroParl, Common Crawl Corpus and News Commentary. Finally, for English-French, we train SMT systems on a corpus of 5M sentence pairs collected from the WMT 2015 MT Shared Task, including the 10^9 French-English corpus.

We conducted experiments on 9 different domains (tasks) where the data was manually collected by TAUS.[4] Table 2 presents the translation tasks: each of the tasks deals with a specific domain, and each of these tasks has presumably a very different relevance level

[4] https://www.taus.net/.


to the training data. In this way, we test the stability of our results across a wide range of target domains.

Table 2: Data and adaptation tasks (Dev/Test sizes; word counts are source/target).
  English-French
    Professional & Business Services: Dev 2K sents (74.16K / 83.85K words); Test 5K sents (92.84K / 105.05K words).
    Leisure, Tourism and Arts: Dev 2K sents (107.45K / 117.16K words); Test 5K sents (101.82K / 114.76K words).
  English-Spanish
    Professional & Business Services: Dev 2K sents (31.70K / 34.62K words); Test 5K sents (84.1K / 93.4K words).
    Legal: Dev 2K sents (35.06K / 38.78K words); Test 5K sents (88.63K / 102.71K words).
    Financials: Dev 2K sents (37.23K / 42.89K words); Test 5K sents (99.05K / 109.81K words).
  English-German
    Professional & Business Services: Dev 2K sents (80.49K / 85.08K words); Test 5K sents (79.75K / 85.28K words).
    Legal: Dev 2K sents (50.54K / 45.99K words); Test 5K sents (124.93K / 111.70K words).
    Computer Software: Dev 2K sents (40.24K / 38.31K words); Test 5K sents (102.71K / 101.12K words).
    Computer Hardware: Dev 2K sents (37.40K / 36.98K words); Test 5K sents (103.29K / 98.04K words).

4.2 Systems

We use a standard state-of-the-art phrase-based system. The Baseline system includes the MOSES (Koehn et al., 2007) baseline feature functions, plus eight hierarchical lexicalized reordering model feature functions (Galley and Manning, 2008). The training data is first word-aligned using GIZA++ (Och and Ney, 2003) and then symmetrized with grow(-diag)-final-and (Koehn et al., 2003). We limit the phrase length to a maximum of seven words. The language models are interpolated 5-grams with Kneser-Ney smoothing, estimated by KenLM (Heafield et al., 2013) from a large monolingual corpus of nearly 2.1B English words collected within the WMT 2015 MT Shared Task. Finally, we use MOSES as a decoder (Koehn et al., 2007).

Our system is exactly the same as the baseline, plus three additional feature functions induced for the translation rules: two features for the domain specificity of phrases (both for the source side, Dα(f̃), and the target side, Dα(ẽ)), and one feature for source-target coherence across subdomains, D(ẽ, f̃). For the projection, we use K = 12. We also explored different values for K, but have not observed significant differences in the scores. In our experiments we do one iteration of EM with parallel LMs (as described in Section 3), before continuing with the full model for three more iterations. We did not observe a significant improvement from running EM any longer. Finally, we use hard EM, as it has been found to yield better models than the standard soft EM on a number of different tasks (e.g., Johnson (2007)). In other words, instead of standard 'soft' EM updates with phrase counts weighted according to the posterior P(z = i|e, f), we use the 'winner-takes-all' approach:

  P(z = i|ẽ) ∝ Σ_{⟨e,f⟩} c(i; ẑ_{⟨e,f⟩}) δ(ẽ; e),
  P(z = i|f̃) ∝ Σ_{⟨e,f⟩} c(i; ẑ_{⟨e,f⟩}) δ(f̃; f).

Here, ẑ_{⟨e,f⟩} is the winning latent subdomain for sentence pair ⟨e, f⟩:

  ẑ_{⟨e,f⟩} = argmax_{i ∈ {1,...,K}} P(z = i|e, f).

In practice, we found that using this hard version leads to better performance.[5]

[5] A more principled alternative would be to use posterior regularization (Ganchev et al., 2009).

4.3 Alternative tuning scenarios

In order to tune all systems, we use k-best batch MIRA (Cherry and Foster, 2012). We report the translation accuracy with three metrics: BLEU
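The winner-takes-all update described above can be sketched as follows. This is an illustrative toy (hypothetical pre-extracted phrase lists and sentence-level posteriors; the EM loop and alignment models are omitted), not the authors' implementation:

```python
from collections import defaultdict

def hard_phrase_posteriors(corpus, sentence_posteriors, K):
    """'Winner-takes-all' (hard EM) phrase statistics: each sentence pair
    contributes its phrase counts only to the single subdomain z_hat that
    maximizes the sentence-level posterior P(z | e, f)."""
    counts = defaultdict(lambda: [0.0] * K)
    for phrases, posterior in zip(corpus, sentence_posteriors):
        z_hat = max(range(K), key=lambda i: posterior[i])  # winning subdomain
        for phrase in phrases:
            counts[phrase][z_hat] += 1.0
    # Normalize the counts into P(z = i | phrase).
    return {ph: [c / sum(cs) for c in cs] for ph, cs in counts.items()}

# Hypothetical toy corpus: phrases per sentence pair, K = 2 posteriors.
corpus = [["the code", "code"], ["the law", "code"]]
posteriors = [[0.9, 0.1], [0.2, 0.8]]
p = hard_phrase_posteriors(corpus, posteriors, K=2)
# 'code' occurs under both winning subdomains, so its distribution is flat,
# while 'the law' is attributed entirely to the second subdomain.
```

A soft-EM variant would instead add the full posterior vector for every phrase occurrence rather than a single winning count.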


(Papineni et al., 2002), METEOR (Denkowski and Lavie, 2011) and TER (Snover et al., 2006). We mark an improvement as significant when we obtain a p-level of 5% under paired bootstrap resampling (Koehn, 2004). Note that better results correspond to larger BLEU and METEOR but to smaller TER. For every system reported, we run the optimizer at least three times, before running MultEval (Clark et al., 2011) for resampling and significance testing. Note that the scores for the systems are averages over multiple runs.

For tuning the systems we explore two kinds of development sets: (1) an in-domain development set of in-domain data that directly exemplifies the translation task (i.e., a sample of target-domain data), and (2) a mixed-domain development set which is a full concatenation of development sets from all the available domains for a language pair; this scenario is a more realistic one when no in-domain data is available. In the analysis section we also test these two scenarios against the scenario mixed-domain minus in-domain, which excludes the in-domain development set part from the mixed-domain development set. By exploring the three different development sets we hope to shed light on the importance of having samples from the target domain when using our features. If our features can indeed capture domain invariance of phrases then they should improve the performance in all three settings, including the most difficult setting where the in-domain data has been explicitly excluded from the tuning phase.

4.4 Main results

In-domain tuning scenario. Table 3 presents the results for the in-domain development set scenario. The integration of the domain-invariant feature functions into the baseline results in a significant improvement across all domains: average +0.50 BLEU on two adaptation tasks for English-French, +0.40 BLEU on three adaptation tasks for English-Spanish and +0.43 BLEU on four adaptation tasks for English-German.

Table 3: Adaptation results when tuning on the in-domain development set (BLEU↑ / METEOR↑ / TER↓, with ∆ over the baseline). The boldface indicates that the improvement over the baseline is significant.
  English-French
    Professional & Business Services: Baseline 21.4 / 28.8 / 60.0; Our System 21.5 (+0.1) / 28.9 (+0.1) / 59.7 (-0.3)
    Leisure, Tourism and Arts: Baseline 39.9 / 36.7 / 48.1; Our System 40.8 (+0.9) / 37.1 (+0.4) / 47.1 (-1.0)
  English-Spanish
    Financials: Baseline 32.5 / 37.1 / 45.6; Our System 32.8 (+0.3) / 37.2 (+0.1) / 45.4 (-0.2)
    Professional & Business Services: Baseline 24.4 / 31.7 / 54.9; Our System 24.8 (+0.4) / 31.9 (+0.2) / 54.8 (-0.1)
    Legal Services: Baseline 33.3 / 36.3 / 49.5; Our System 33.8 (+0.5) / 36.5 (+0.2) / 49.1 (-0.4)
  English-German
    Computer Software: Baseline 22.8 / 27.7 / 64.3; Our System 23.1 (+0.3) / 27.8 (+0.1) / 64.0 (-0.3)
    Computer Hardware: Baseline 20.5 / 27.7 / 61.2; Our System 20.9 (+0.4) / 27.9 (+0.2) / 61.1 (-0.1)
    Professional & Business Services: Baseline 15.3 / 25.4 / 69.2; Our System 15.7 (+0.4) / 25.6 (+0.2) / 68.6 (-0.6)
    Legal Services: Baseline 29.6 / 32.9 / 55.6; Our System 30.2 (+0.6) / 33.3 (+0.4) / 55.1 (-0.5)

Mixed-domain tuning scenario. While the improvements are robust and consistent for the in-domain development set scenario, we are especially delighted to see a similar improvement for the mixed-domain tuning scenario (Table 4). In detail, we observe an average +0.45 BLEU on two adaptation tasks for English-French, +0.47 BLEU on three adaptation tasks for English-Spanish and +0.30 BLEU on four adaptation tasks for English-German. We would like to emphasize that this performance improvement is obtained without tuning specifically for the target domain or using other domain-related meta-information in the training corpus.

Table 4: Adaptation results when tuning on the mixed-domain development set (BLEU↑ / METEOR↑ / TER↓, with ∆ over the baseline). The boldface indicates that the improvement over the baseline is significant.
  English-French
    Professional & Business Services: Baseline 20.7 / 28.3 / 59.5; Our System 20.7 (+0.0) / 28.4 (+0.1) / 59.4 (-0.1)
    Leisure, Tourism and Arts: Baseline 39.7 / 37.0 / 48.6; Our System 40.6 (+0.9) / 37.4 (+0.4) / 47.4 (-1.2)
  English-Spanish
    Financials: Baseline 33.6 / 37.5 / 45.4; Our System 34.0 (+0.4) / 37.7 (+0.2) / 45.0 (-0.4)
    Professional & Business Services: Baseline 24.4 / 31.9 / 55.3; Our System 24.9 (+0.5) / 32.0 (+0.1) / 54.9 (-0.4)
    Legal Services: Baseline 32.4 / 35.8 / 49.0; Our System 32.9 (+0.5) / 36.0 (+0.2) / 48.8 (-0.2)
  English-German
    Computer Software: Baseline 23.2 / 27.6 / 63.4; Our System 23.5 (+0.3) / 27.8 (+0.2) / 63.0 (-0.4)
    Computer Hardware: Baseline 20.8 / 27.8 / 61.5; Our System 21.0 (+0.2) / 28.0 (+0.2) / 61.2 (-0.3)
    Professional & Business Services: Baseline 13.8 / 25.2 / 72.2; Our System 13.9 (+0.1) / 25.3 (+0.1) / 72.1 (-0.1)
    Legal Services: Baseline 29.3 / 32.7 / 55.2; Our System 29.9 (+0.6) / 33.1 (+0.4) / 54.6 (-0.6)

Additional analysis. We investigate the individual contribution of each domain-invariance feature. We conduct experiments using a basic large-scale phrase-based system described in Koehn et al. (2003) as a baseline. The baseline includes two bi-directional phrase-based models (P_count(ẽ|f̃) and P_count(f̃|ẽ)), three penalties for word, phrase and distortion, and, finally, the language model. On top of the baseline, we build four different systems, each augmented with a domain-invariance feature. The first feature is the source-target coherence feature, D(ẽ, f̃), where we use the Chebyshev distance as our default option. We also investigate the performance of other metrics, including the Hellinger distance[6] and the Kullback-Leibler divergence.[7] Our second and third features are the domain specificity of phrases on the source (Dα(f̃)) and target (Dα(ẽ)) sides. Finally, we also deploy all three domain-invariance features (Dα(f̃) + Dα(ẽ) + D(ẽ, f̃)). The experiments are conducted for the task Legal on English-German.

Table 5: Improvements over the baseline (English-German, task: Legal; BLEU↑). The boldface indicates that the difference is statistically significant.
  In-domain dev
    Baseline: 28.8
    + D(ẽ, f̃): 29.1 (+0.3)
    + Dα(ẽ): 29.4 (+0.6)
    + Dα(f̃): 29.8 (+1.0)
    + Dα(f̃) + Dα(ẽ) + D(ẽ, f̃): 29.9 (+1.1)
  Mixed-domains dev
    Baseline: 28.5
    + D(ẽ, f̃): 28.8 (+0.3)
    + Dα(ẽ): 29.3 (+0.8)
    + Dα(f̃): 29.6 (+1.1)
    + Dα(f̃) + Dα(ẽ) + D(ẽ, f̃): 29.8 (+1.3)
  Mixed-domains dev (exclude Legal)
    Baseline: 28.3
    + D(ẽ, f̃): 28.6 (+0.3)
    + Dα(ẽ): 29.1 (+0.8)
    + Dα(f̃): 29.5 (+1.2)
    + Dα(f̃) + Dα(ẽ) + D(ẽ, f̃): 29.6 (+1.3)

[6] D_H(ẽ, f̃) = (1/√2) · sqrt( Σ_z ( √P(z|ẽ) − √P(z|f̃) )² ).
[7] D_KL(ẽ ∥ f̃) = Σ_z P(z|ẽ) log( P(z|ẽ) / P(z|f̃) ); D_KL(f̃ ∥ ẽ) = Σ_z P(z|f̃) log( P(z|f̃) / P(z|ẽ) ).
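The alternative coherence measures compared in this analysis (Chebyshev, Hellinger, and the two directions of Kullback-Leibler) can be sketched as follows. This is an illustrative sketch on hypothetical subdomain posteriors, assuming the distributions have been smoothed so that KL is well defined:

```python
import math

def chebyshev(p, q):
    # L-infinity distance: the default coherence measure in this work.
    return max(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    # Symmetric and bounded in [0, 1].
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q))) / math.sqrt(2)

def kl(p, q):
    # Asymmetric; assumes q has no zero entries (smoothed posteriors).
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical source/target subdomain posteriors for one phrase pair, K = 3.
p_src = [0.7, 0.2, 0.1]
p_tgt = [0.5, 0.3, 0.2]
print(chebyshev(p_src, p_tgt))
print(hellinger(p_src, p_tgt))
print(kl(p_src, p_tgt), kl(p_tgt, p_src))
```

All of these measures are zero when the two distributions are identical and grow as they diverge, which is consistent with the observation that the specific choice of metric matters little in practice.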


Table 6: Using different metrics as the measure of coherence (English-German, task: Legal; in-domain dev; BLEU↑).
  Chebyshev: 29.1 (+0.3)
  Kullback-Leibler (D_KL(ẽ ∥ f̃)): 29.2 (+0.4)
  Kullback-Leibler (D_KL(f̃ ∥ ẽ)): 29.0 (+0.2)
  Hellinger: 29.0 (+0.2)

Table 5 and Table 6 present the results. Overall, we can see that all domain-invariance features contribute to adaptation performance. Specifically, we observe the following:

• Favouring source-target coherence across subdomains (i.e., adding the feature D(ẽ, f̃)) provides a significant translation improvement of +0.3 BLEU. Which specific similarity measure is used does not seem to matter much (see Table 6). We obtain the best result (+0.4 BLEU) with the KL divergence (D_KL(ẽ ∥ f̃)). However, the differences are not statistically significant.

• Integrating a preference for less domain-specific translation phrases at the target side (Dα(ẽ)) leads to a translation improvement of +0.6 BLEU.

• Doing the same for the source side (Dα(f̃)), in turn, leads to an improvement of +1.0 BLEU.

• Augmenting the baseline by integrating all our features leads to the best result, with an improvement of +1.1 BLEU.

• The translation improvement is observed also for training with a development set of mixed domains (even for the mixed-domain minus in-domain setting, when excluding the Legal data from the mixed development set).

• The weights for all domain-invariance features, once tuned, are positive in all the experiments.

Table 7: Translation outputs produced by the basic baseline and its augmented systems with additional abstract feature functions derived from hidden domain information.

German-English (task: Legal Services)

Input: im jahr 2004 befindet der rat über die verpflichtung der elektronischen übertragung solcher aufzeichnungen.
Reference: the council shall decide in 2004 on the obligation to transmit such records electronically.
Baseline: in 2004 the council is the obligation on the electronic transfer of such records.
+ Dα(f̃): in 2004 the council is on the obligation of electronic transfer of such records.
+ Dα(ẽ): in 2004 the council is on the obligation of electronic transmission of such records.
+ D(ẽ, f̃): in 2004 the council is on the obligation of electronic transmission of such records.
+ ALL: in 2004 the council is on the obligation of electronic transmission of such records.

Input: die angemessenheit und wirksamkeit der internen verwaltungssysteme sowie die leistung der dienststellen
Reference: for assessing the suitability and effectiveness of internal management systems and the performance of departments
Baseline: the adequacy and effectiveness of internal administrative systems as well as the performance of the services
+ Dα(f̃): the adequacy and effectiveness of the internal management systems, as well as the performance of the services
+ Dα(ẽ): the adequacy and effectiveness of internal management systems, as well as the performance of the services
+ D(ẽ, f̃): the adequacy and effectiveness of the internal administrative systems as well as the performance of the services
+ ALL: the adequacy and effectiveness of internal management systems, as well as the performance of the services

Input: zur ausführung der ausgaben nimmt der anweisungsbefugte mittelbindungen vor, geht rechtliche verpflichtungen ein
Reference: to implement expenditure, the authorising officer shall make budget commitments and legal commitments
Baseline: the implementation of expenditure, the authorising officer commitments before, is a legal commitments
+ Dα(f̃): the implementation of expenditure, the authorising officer commitments, is a legal obligations
+ Dα(ẽ): the implementation of expenditure, the authorising officer commitments before, is a legal obligations
+ D(ẽ, f̃): the implementation of expenditure, the authorising officer commitments before, is a legal commitments
+ ALL: the implementation of expenditure, the authorising officer commitments before, is a legal obligations

Table 7 presents examples of translations from different systems. For example, the domain-invariant system revises the translation from "electronic transfer" to "electronic transmission" for the


English-GermanTaskBaselineOurSystem+z1+z2+z3+z4+z5+z6+z7+z8+z9+z10+z11+z12Hardware20.220.420.420.420.520.520.520.420.420.520.420.420.4Software22.823.023.023.022.822.923.123.023.023.023.023.022.8P&BServices13.313.613.613.313.513.613.613.513.513.613.513.613.5Legal28.528.728.629.128.728.628.928.828.828.928.628.628.8Table8:LatentSubdomainAnalysis(withBLEUscore).GermanphraseelektronischenÜbertragung”,andfrominternaladministrativesystems”到”internalmanagementsystemsfortheGermanphrasein-ternenverwaltungssysteme”.Therevisions,how-ever,arenotalwayssuccessful.Forinstance,addingDα(˜e)andDα(˜f)resultedinrevisingthetranslationoftheGermanphraserechtlicheverpflichtungen”到”legalobligations”,whichisaworsechoice(atleastaccordingtoBLEU)比”legalcommitmentspro-ducedbythebaseline.Wealsopresentabriefanalysisoflatentsubdomainsinducedbyourprojectionframe-work.Foreachsubdomainzweintegratethedomainposteriors(磷(z|˜e)andP(z|˜f)andthesource-targetdomain-coherencefeature(西德:12)(西德:12)(西德:12)磷(z|˜e)−P(z|˜f)(西德:12)(西德:12)(西德:12)).Wehypothesizethatwhen-everweobserveanimprovementforatranslationtaskwithdomain-informedfeatures,thismeansthatthecorrespondinglatentsubdomainzisclosetothetargettranslationdomain.TheresultsarepresentedinTable8.Apparently,amongthelatentsubdomains,z4,z5,z6,z9areclosesttothetargetdomainofHardware.Theirderivedfeaturefunctionsarehelpfulinimprovingthetranslationaccuracyforthetask.Similarly,z1,z2,z5,z6,z9andz11areclosesttoProfessional&商业,z6isclosesttoSoftware,andz3isclosesttoLegal.Meanwhile,z4,z5andz12arenotrelevanttothetaskofSoftware.Similarly,z3isnotrelevanttoProfessional&商业,andz2,z5andz10arenotrelevanttoLegal.Usingtopicmodelsinsteadoflatentdomains.Ourdomain-invarianceframeworkdemandsaccesstoposteriordistributionsoflatentdomainsforphrases.Thoughwearguedforusingourdomaininductionapproach,otherlatentvariablemodelscanbeusedtocomputetheseposteriors.Onenaturaloptionistousetopicmodels,andmorespecificallyLDA(Bleietal.,2003).Willourdomain-invarianceframeworkstillworkwithtopicmodels,andhowcl
oselyrelatedaretheinducedlatentdomainsinducedwithLDAandourmodel?Thesearethequestionswestudyinthissection.WeestimateLDAatthesentencelevelinamono-lingualregime8ononesideofeachparallelcorpus(letusassumefornowthatthisisthesourceside).Whenthemodelisestimated,weobtainthepos-teriordistributionsoftopics(wedenotethemasz,aswetreatthemasdomains)foreachsource-sidesentenceinthetrainingset.Now,aswedidwithourphraseinductionframework,weassociatetheseposteriorswitheveryphrasebothinthesourceandinthetargetsidesofthatsentencepair.Phraseandphrase-pairfeaturesdefinedinSection2arecom-putedrelyingontheseprobabilitiesaveragedovertheentiretrainingset.Wetrybothdirections,thatisalsoestimatingLDAonthetargetsideandtransfer-ringtheposteriorprobabilitiestothesourceside.InordertoestimateLDA,weusedGibbssam-plingimplementedintheMalletpackage(McCal-lum,2002)withdefaultvaluesofhyper-parameters(α=0.01andβ=0.01).Table9presentstheresultsfortheLegaltaskwiththreedifferentsys-temoptimizationsettings.BLEU,METEORandTERarereported.Astheresultsuggests,usingourinductionframeworktendstoyieldslightlybettertranslationresultsintermsofMETEORandespe-ciallyBLEU.However,usingLDAseemstoleadtoslightlybettertranslationresultintermsofTER.TopicsinLDA-likemodelsencodeco-occurrencepatternsinbag-of-wordrepresentationsofsen-tences.Incontrast,domainsinourdomain-inductionframeworkrelyonngramsandword-alignmentinformation.Consequently,thesemod-8NotethatbilingualLDAmodels(e.g.,seeHasleretal.(2014),Zhangetal.(2014))couldpotentiallyproducebetterresultsbutweleavethemforfuturework.


English-German (Task: Legal)
Dev                              Algorithms     BLEU↑   METEOR↑   TER↓
In-domain                        Our            29.9    33.1      55.5
                                 LDA (source)   29.9    33.1      55.4
                                 LDA (target)   29.9    33.1      55.3
Mixed-domains                    Our            29.8    32.9      54.9
                                 LDA (source)   29.7    32.9      54.8
                                 LDA (target)   29.7    32.9      54.8
Mixed-domains (Exclude Legal)    Our            29.6    32.8      54.6
                                 LDA (source)   29.4    32.7      54.5
                                 LDA (target)   29.4    32.7      54.6

Table 9: Comparison in latent domain induction with various algorithms.

We also investigate translation performance when we use both coherence features from LDA and coherence features from our own framework. Table 10 shows that using all the induced coherence features results in the best translation, no matter which translation metric is used. We leave the exploration of such an extension for future work.

English-German (Task: Legal)
Dev              Algorithms               BLEU↑   METEOR↑   TER↓
Mixed domains    Our features             29.8    32.9      54.9
                 LDA (source) features    29.7    32.9      54.8
                 All Features             29.8    33.0      54.7

Table 10: Combination of all features.

5 Related Work and Discussion

Domain adaptation is an important challenge for many NLP problems. A good survey of potential translation errors in MT adaptation can be found in Irvine et al. (2013). Lexical selection appears to be the most common source of errors in domain-adaptation scenarios (Irvine et al., 2013; Wees et al., 2015). Other translation errors include reordering errors (Chen et al., 2013a; Zhang et al., 2015), alignment errors (Cuong and Sima'an, 2015) and overfitting to the source domain at the parameter-tuning stage (Joty et al., 2015).

Adaptation in SMT can be regarded as injecting prior knowledge about the target translation task into the learning process. Various approaches have so far been exploited in the literature. They can be loosely categorized according to the type of prior knowledge exploited for adaptation. Often, a seed in-domain corpus exemplifying the target translation task is used as a form of prior knowledge. Various techniques can then be used for adaptation. For example, one approach is to combine a system trained on the in-domain data with another general-domain system trained on the rest of the data (e.g., see Koehn and Schroeder (2007), Foster et al. (2010), Bisazza et al. (2011), Sennrich (2012b), Razmara et al. (2012), Sennrich et al. (2013), Haddow (2013), Joty et al. (2015)). Rather than using the entire training data, it is also common to combine the in-domain system with a system trained on a selected subset of the data (e.g., see Axelrod et al. (2011), Koehn and Haddow (2012), Duh et al. (2013), Kirchhoff and Bilmes (2014), Cuong and Sima'an (2014b)).

In some other cases, the prior knowledge lies in meta-information about the training data. This could be document-annotated training information (Eidelman et al., 2012; Hu et al., 2014; Hasler et al., 2014; Su et al., 2015; Zhang et al., 2014), or domain-annotated sub-corpora (Chiang et al., 2011; Sennrich, 2012b; Chen et al., 2013b; Carpuat et al., 2014; Cuong and Sima'an, 2015). Some recent approaches perform adaptation by exploiting a target-domain development set, or even only the source side of the development set (Sennrich, 2012a; Carpuat et al., 2013; Carpuat et al., 2014; Mansour and Ney, 2014).

Recently, there was some research on adapting simultaneously to multiple domains, a goal related to ours (Clark et al., 2012; Sennrich, 2012a). For instance, Clark et al. (2012) augment a phrase-based MT system with various domain-indicator features to build a single system that performs well across a range of domains. Sennrich (2012a) proposed to cluster training data in an unsupervised fashion to build mixture models that yield good performance on multiple test domains. However, their approaches are very different from ours, which minimizes the risk associated with choosing domain-specific translations. Moreover, the present work deviates radically from earlier work in that it explores the scenario where no prior data or knowledge is available about the translation task during training time. The focus of our approach is to aim for safer translation by rewarding domain-invariance of translation rules over latent subdomains that can (still) be useful on adap-


tation tasks. The present study is inspired by Zhang et al. (2014), which exploits topic-insensitivity that is learned over documents for translation. The goal and setting we are working on are markedly different (i.e., we do not have access to meta-information about the training and translation tasks at all). The induced domain-invariance is integrated into SMT systems as feature functions, redirecting the decoder to a better search space for translation over adaptation tasks. This aims at biasing the decoder towards translations that are less domain-specific and more source-target domain coherent.

There is an interesting relation between this work and extensive prior work on minimum Bayes risk (MBR) objectives (used either at test time (Kumar and Byrne, 2004) or during training (Smith and Eisner, 2006; Pauls et al., 2009)). As with our work, the goal of MBR minimization is to select translations that are less "risky". Their risk is due to the uncertainty in model predictions, and some of this uncertainty may indeed be associated with domain-variability of translations. Still, a system trained with an MBR objective will tend to output the most frequent translation rather than the most domain-invariant one, and this, as we argued in the introduction, might not be the right decision when applying it across domains. We believe that the two classes of methods are largely complementary, and leave further investigation for future work.

At a conceptual level, our work is also related to regularizers used in learning domain-invariant neural models (Titov, 2011), specifically autoencoders. Though they also consider divergences between distributions of latent-variable vectors, they use these divergences at learning time to bias models to induce representations maximally invariant across domains. Moreover, they assume access to meta-information about domains and consider only classification problems.

6 Conclusion

This paper aims at adapting machine translation systems to all domains at once by favoring phrases that are domain-invariant, that is, safe to use across a variety of domains. While typical domain-adaptation systems expect a sample of the target domain, our approach does not require one and is directly applicable to any domain-adaptation scenario. Experiments show that the proposed approach results in modest but consistent improvements in BLEU, METEOR and TER. To the best of our knowledge, our results are the first to suggest consistent and significant improvement by a fully unsupervised adaptation method across a wide variety of translation tasks.

The proposed adaptation framework is fairly simple, leaving much space for future research. One potential direction is the introduction of additional features relying on the assignment of phrases to domains. The framework for inducing latent domains proposed in this paper should be beneficial in this future work. The implementation of our subdomain-induction framework is available at https://github.com/hoangcuong2011/UDIT.

Acknowledgements

We thank the anonymous reviewers for their constructive comments on earlier versions. We also thank Hui Zhang for his help on the expected Kneser-Ney smoothing technique. The first author is supported by the EXPERT (EXPloiting Empirical appRoaches to Translation) Initial Training Network (ITN) of the European Union's Seventh Framework Programme. The second author is supported by VICI grant nr. 277-89-002 from the Netherlands Organization for Scientific Research (NWO). We thank TAUS for providing us with suitable data.

References

Amittai Axelrod, Xiaodong He, and Jianfeng Gao. 2011. Domain adaptation via pseudo in-domain data selection. In Proceedings of EMNLP.

Arianna Bisazza, Nick Ruiz, and Marcello Federico. 2011. Fill-up versus interpolation methods for phrase-based SMT adaptation. In IWSLT.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. JMLR.

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist.

Marine Carpuat, Hal Daume III, Katharine Henry, Ann Irvine, Jagadeesh Jagarlamudi, and Rachel Rudinger. 2013. Sensespotting: Never let your parallel data tie you to an old domain. In Proceedings of ACL.


Marine Carpuat, Cyril Goutte, and George Foster. 2014. Linear mixture models for robust machine translation. In Proc. of WMT.

Boxing Chen, George Foster, and Roland Kuhn. 2013a. Adaptation of reordering models for statistical machine translation. In Proceedings of NAACL.

Boxing Chen, Roland Kuhn, and George Foster. 2013b. Vector space model for adaptation in statistical machine translation. In Proceedings of the ACL.

Colin Cherry and George Foster. 2012. Batch tuning strategies for statistical machine translation. In Proceedings of the NAACL-HLT.

David Chiang, Steve DeNeefe, and Michael Pust. 2011. Two easy improvements to lexical weighting. In Proceedings of ACL (Short Papers).

Jonathan Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of ACL (Short Papers).

Jonathan Clark, Alon Lavie, and Chris Dyer. 2012. One system, many domains: Open-domain statistical machine translation via feature augmentation.

Trevor Cohn and Gholamreza Haffari. 2013. An infinite hierarchical Bayesian model of phrasal translation. In Proceedings of the ACL.

Hoang Cuong and Khalil Sima'an. 2014a. Latent domain phrase-based models for adaptation. In Proceedings of EMNLP.

Hoang Cuong and Khalil Sima'an. 2014b. Latent domain translation models in mix-of-domains haystack. In Proceedings of COLING.

Hoang Cuong and Khalil Sima'an. 2015. Latent domain word alignment for heterogeneous corpora. In Proceedings of NAACL-HLT.

Arthur Dempster, Nan Laird, and Donald Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS, Series B, 39(1):1–38.

John DeNero, Dan Gillick, James Zhang, and Dan Klein. 2006. Why generative phrase models underperform surface heuristics. In Proc. of WMT.

Michael Denkowski and Alon Lavie. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proc. of WMT.

Kevin Duh, Graham Neubig, Katsuhito Sudoh, and Hajime Tsukada. 2013. Adaptation data selection using neural language models: Experiments in machine translation. In Proceedings of the ACL.

Vladimir Eidelman, Jordan Boyd-Graber, and Philip Resnik. 2012. Topic models for dynamic translation model adaptation. In ACL (Short Papers).

George Foster, Cyril Goutte, and Roland Kuhn. 2010. Discriminative instance weighting for domain adaptation in statistical machine translation. In Proceedings of EMNLP.

Michel Galley and Christopher D. Manning. 2008. A simple and effective hierarchical phrase reordering model. In Proceedings of EMNLP.

Kuzman Ganchev, Ben Taskar, Fernando Pereira, and Joao Gama. 2009. Posterior vs parameter sparsity in latent variable models. In Proceedings of NIPS.

Barry Haddow. 2013. Applying pairwise ranked optimisation to improve the interpolation of translation models. In Proceedings of NAACL-HLT.

Eva Hasler, Phil Blunsom, Philipp Koehn, and Barry Haddow. 2014. Dynamic topic adaptation for phrase-based MT. In Proceedings of EACL.

Kenneth Heafield, Ivan Pouzyrevsky, Jonathan Clark, and Philipp Koehn. 2013. Scalable modified Kneser-Ney language model estimation. In Proceedings of the ACL (Volume 2: Short Papers).

Yuening Hu, Ke Zhai, Vladimir Eidelman, and Jordan Boyd-Graber. 2014. Polylingual tree-based topic models for translation domain adaptation. In Proceedings of the ACL.

Ann Irvine, John Morgan, Marine Carpuat, Hal Daume III, and Dragos Munteanu. 2013. Measuring machine translation errors in new domains. In TACL.

Mark Johnson. 2007. Why doesn't EM find good HMM POS-taggers? In Proceedings of EMNLP-CoNLL.

Shafiq Joty, Hassan Sajjad, Nadir Durrani, Kamla Al-Mannai, Ahmed Abdelali, and Stephan Vogel. 2015. How to avoid unwanted pregnancies: Domain adaptation using neural network models. In Proceedings of EMNLP.

Katrin Kirchhoff and Jeff Bilmes. 2014. Submodularity for data selection in machine translation. In EMNLP.

Philipp Koehn and Barry Haddow. 2012. Towards effective use of training data in statistical machine translation. In Proceedings of the WMT.

Philipp Koehn and Josh Schroeder. 2007. Experiments in domain adaptation for statistical machine translation. In Proceedings of WMT.

Philipp Koehn, Franz Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of NAACL.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL.

Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of MT Summit.


Shankar Kumar and William J. Byrne. 2004. Minimum Bayes-risk decoding for statistical machine translation. In HLT-NAACL.

Saab Mansour and Hermann Ney. 2014. Unsupervised adaptation for statistical machine translation. In Proceedings of WMT.

Spyros Matsoukas, Antti-Veikko I. Rosti, and Bing Zhang. 2009. Discriminative corpus weight estimation for machine translation. In Proceedings of EMNLP.

Andrew Kachites McCallum. 2002. Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu.

Markos Mylonakis and Khalil Sima'an. 2008. Phrase translation probabilities with ITG priors and smoothing as learning objective. In Proceedings of EMNLP.

Graham Neubig, Taro Watanabe, Eiichiro Sumita, Shinsuke Mori, and Tatsuya Kawahara. 2011. An unsupervised model for joint phrase alignment and extraction. In Proceedings of ACL-HLT.

Franz Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Comput. Linguist., pages 19–51.

Franz Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Comput. Linguist., pages 417–449.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of ACL.

Adam Pauls, John DeNero, and Dan Klein. 2009. Consensus training for consensus decoding in machine translation. In Proceedings of EMNLP.

Majid Razmara, George Foster, Baskaran Sankaran, and Anoop Sarkar. 2012. Mixing multiple translation models in statistical machine translation. In Proceedings of ACL.

Rico Sennrich, Holger Schwenk, and Walid Aransa. 2013. A multi-domain translation model framework for statistical machine translation. In Proceedings of ACL.

Rico Sennrich. 2012a. Mixture-modeling with unsupervised clusters for domain adaptation in statistical machine translation. In Proceedings of the EAMT.

Rico Sennrich. 2012b. Perplexity minimization for translation model domain adaptation in statistical machine translation. In Proceedings of EACL.

David A. Smith and Jason Eisner. 2006. Minimum risk annealing for training log-linear models. In Proceedings of the COLING/ACL.

Matthew Snover, Bonnie Dorr, R. Schwartz, L. Micciulla, and J. Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of AMTA.

Jinsong Su, Deyi Xiong, Yang Liu, Xianpei Han, Hongyu Lin, Junfeng Yao, and Min Zhang. 2015. A context-aware topic model for statistical machine translation. In Proceedings of the ACL-IJCNLP.

Ivan Titov. 2011. Domain adaptation by constraining inter-domain variability of latent feature representation. In Proceedings of ACL.

Marlies van der Wees, Arianna Bisazza, Wouter Weerkamp, and Christof Monz. 2015. What's in a domain? Analyzing genre and topic differences in SMT. In Proceedings of ACL-IJCNLP (short paper).

Hui Zhang and David Chiang. 2014. Kneser-Ney smoothing on expected counts. In Proceedings of ACL.

Min Zhang, Xinyan Xiao, Deyi Xiong, and Qun Liu. 2014. Topic-based dissimilarity and sensitivity models for translation rule selection. JAIR.

Biao Zhang, Jinsong Su, Deyi Xiong, Hong Duan, and Junfeng Yao. 2015. Discriminative reordering model adaptation via structural learning. In IJCAI.