Transactions of the Association for Computational Linguistics, vol. 5, pp. 87–99, 2017. Action Editor: Chris Quirk.

Submission batch: 6/2016; Revision batch: 10/2016; Published 3/2017.

© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Context Gates for Neural Machine Translation

Zhaopeng Tu†  Yang Liu‡  Zhengdong Lu†  Xiaohua Liu†  Hang Li†
†Noah's Ark Lab, Huawei Technologies, Hong Kong
{tu.zhaopeng,lu.zhengdong,liuxiaohua3,hangli.hl}@huawei.com
‡Department of Computer Science and Technology, Tsinghua University, Beijing
liuyang2011@tsinghua.edu.cn

Abstract

In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.

input   jīnnián qián liǎng yuè guǎngdōng gāoxīn jìshù chǎnpǐn chūkǒu 37.6 yì měiyuán
NMT     in the first two months of this year, the export of new high level technology product was UNK-billion us dollars
⇓src    china's guangdong hi-tech exports hit 58 billion dollars
⇓tgt    china's export of high and new hi-tech exports of the export of the export of the export of the export of the export of the export of the export of the export of ···

Table 1: Source and target contexts are highly correlated to translation adequacy and fluency, respectively. ⇓src and ⇓tgt denote halving the contributions from the source and target contexts when generating the translation, respectively.

1 Introduction

Neural machine translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015) has made significant progress in the past several years. Its goal is to construct and utilize a single large neural network to accomplish the entire translation task. One great advantage of NMT is that the translation system can be completely constructed by learning from data without human involvement (cf. feature engineering in statistical machine translation (SMT)). The encoder-decoder architecture is widely employed (Cho et al., 2014; Sutskever et al., 2014), in which the encoder summarizes the source sentence into a vector representation, and the decoder generates the target sentence word by word from the vector representation. The representation of the source sentence and the representation of the partially generated target sentence (translation) at each position are referred to as the source context and the target context, respectively. The generation of a target word is determined jointly by the source context and target context.
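As a minimal illustration of this idea (our sketch, not the paper's equations; all variable names and dimensions are assumptions), the snippet below shows a toy decoding step in which a source context vector and a target context vector jointly determine the distribution over the next target word.

    # Toy decoding step: source and target contexts jointly produce P(y_i | ...).
    # All names and dimensions are illustrative assumptions, not the paper's model.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, ctx_dim = 100, 8

    source_context = rng.normal(size=ctx_dim)   # summary of the source sentence
    target_context = rng.normal(size=ctx_dim)   # summary of the partial translation

    W_src = rng.normal(size=(vocab_size, ctx_dim))
    W_tgt = rng.normal(size=(vocab_size, ctx_dim))

    # Mix both contexts, then turn the scores into a word distribution (softmax).
    logits = W_src @ source_context + W_tgt @ target_context
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # P(y_i | source context, target context)
    next_word_id = int(probs.argmax())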

Several techniques in NMT have proven to be very effective, including gating (Hochreiter and Schmidhuber, 1997; Cho et al., 2014) and attention (Bahdanau et al., 2015), which can model long-distance dependencies and complicated alignment relations in the translation process. Using an encoder-decoder framework that incorporates gating and attention techniques, it has been reported that NMT can surpass traditional SMT as measured by BLEU score (Luong et al., 2015).

Despite this success, we observe that NMT usually yields fluent but inadequate translations.[1] We attribute this to a stronger influence of the target context on generation, which results from a stronger language model than that used in SMT. One question naturally arises: what will happen if we change the ratio of influences from the source or target contexts?

Table 1 shows an example in which an attention-based NMT system (Bahdanau et al., 2015) generates a fluent yet inadequate translation (e.g., missing the translation of "guǎngdōng"). When we halve the contribution from the source context, the result further loses its adequacy by missing the partial translation "in the first two months of this year". One possible explanation is that the target context takes a higher weight and thus the system favors a shorter translation. In contrast, when we halve the contribution from the target context, the result completely loses its fluency by repeatedly generating the translation of "chūkǒu" (i.e., "the export of") until the generated translation reaches the maximum length. This example therefore indicates that source and target contexts in NMT are highly correlated to translation adequacy and fluency, respectively.

In fact, conventional NMT lacks effective control over the influence of source and target contexts. At each decoding step, NMT treats the source and target contexts equally, and thus ignores the different needs of the contexts. For example, content words in the target sentence are more related to translation adequacy, and thus should depend more on the source context. In contrast, function words in the target sentence are often more related to translation fluency (e.g., "of" after "is fond"), and thus should depend more on the target context.

In this work, we propose to use context gates to control the contributions of source and target contexts to the generation of target words (decoding) in NMT. Context gates are non-linear gating units which can dynamically select the amount of context information used in the decoding process. Specifically, at each decoding step, the context gate examines both the source and target contexts, and outputs a ratio between zero and one to determine the percentages of information to utilize from the two contexts. In this way, the system can balance the adequacy and fluency of the translation with regard to the generation of a word at each position.

Experimental results show that introducing context gates leads to an average improvement of +2.3 BLEU points over a standard attention-based NMT system (Bahdanau et al., 2015). An interesting finding is that we can replace the GRU units in the decoder with conventional RNN units and at the same time utilize context gates. The translation performance is comparable with that of the standard NMT system with GRU, but the system enjoys a simpler structure (i.e., it uses only a single gate and half of the parameters) and faster decoding (i.e., it requires only half of the matrix computations for decoding).

[1] Fluency measures whether the translation is fluent, while adequacy measures whether the translation is faithful to the original sentence (Snover et al., 2009).

Figure 1: Architecture of decoder RNN.
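The sketch below is only a hedged illustration of this behavior, not the paper's exact parameterization: it assumes a sigmoid gate computed from the previous decoding state t_{i-1}, the previously generated word y_{i-1}, and the source context s_i, which element-wise weights the source context by z_i and the target-side information by (1 - z_i). The weight names (W_z, U_z, C_z), the dimensions, and the way the gated terms are combined are assumptions made for illustration.

    # Hedged sketch of a context gate: z_i in (0, 1)^dim weights the source
    # context, (1 - z_i) weights the target-side information. Illustrative only.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(1)
    dim = 8
    W_z = rng.normal(size=(dim, dim))  # acts on the previous decoder state t_{i-1}
    U_z = rng.normal(size=(dim, dim))  # acts on the previous word embedding y_{i-1}
    C_z = rng.normal(size=(dim, dim))  # acts on the source context s_i

    def context_gate_step(t_prev, y_prev, s_i):
        # The gate looks at both contexts and outputs a per-element ratio.
        z_i = sigmoid(W_z @ t_prev + U_z @ y_prev + C_z @ s_i)
        gated_source = z_i * s_i
        gated_target = (1.0 - z_i) * (t_prev + y_prev)  # toy target-side summary
        return gated_source + gated_target

    t_prev, y_prev, s_i = rng.normal(size=dim), rng.normal(size=dim), rng.normal(size=dim)
    mixed_context = context_gate_step(t_prev, y_prev, s_i)

In a full decoder, the gated source and target information would typically feed the recurrent state update; the direct sum here only keeps the sketch self-contained.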
2 Neural Machine Translation

Suppose that x = x_1, ..., x_j, ..., x_J represents a source sentence and y = y_1, ..., y_i, ..., y_I a target sentence. NMT directly models the probability of translating the source sentence into the target sentence word by word:

P(y|x) = ∏_{i=1}^{I} P(y_i | y_{<i}, x)

               Adequacy                    Fluency
               <        =        >         <        =        >
evaluator 1    30.0%    54.0%    16.0%     28.5%    48.5%    23.0%
evaluator 2    30.0%    50.0%    20.0%     29.5%    54.5%    16.0%

Table 3: Subjective evaluation of translation adequacy and fluency.

Over GroundHog (GRU)  We then investigated the effect of the context gates on a standard NMT system with GRU as the decoding activation function (Rows 4–7). Several observations can be made. First, context gates also boost performance beyond the GRU in all cases, demonstrating our claim that context gates are complementary to the reset and update gates in GRU. Second, jointly controlling the information from both translation contexts consistently outperforms its single-side counterparts, indicating that a direct interaction between input signals from the source and target contexts is useful for NMT models.

Over GroundHog-Coverage (GRU)  We finally tested on a stronger baseline, which employs a coverage mechanism to indicate whether or not a source word has already been translated (Tu et al., 2016). Our context gate still achieves a significant improvement of 1.6 BLEU points on average, reconfirming our claim that the context gate is complementary to the improved attention model that produces a better source context representation. Finally, our best model (Row 7) outperforms the SMT baseline system using the same data (Row 1) by 3.3 BLEU points. From here on, we refer to "GroundHog (GRU)" as "GroundHog" and "Context Gate (both)" as "Context Gate" if not otherwise stated.

Subjective Evaluation  We also conducted a subjective evaluation of the benefit of incorporating context gates. Two human evaluators were asked to compare the translations of 200 source sentences randomly sampled from the test sets without knowing which system produced each translation. Table 3 shows the results of the subjective evaluation. The two human evaluators made similar judgments: in adequacy, around 30% of GroundHog translations are worse, 52% are equal, and 18% are better; in fluency, around 29% are worse, 52% are equal, and 19% are better.

System                 SAER      AER
GroundHog              67.00     54.67
  + Context Gate       67.43     55.52
GroundHog-Coverage     64.25     50.50
  + Context Gate       63.80     49.40

Table 4: Evaluation of alignment quality. The lower the score, the better the alignment quality.

5.3 Alignment Quality

Table 4 lists the alignment performance. Following Tu et al. (2016), we used the alignment error rate (AER) (Och and Ney, 2003) and its variant SAER to measure alignment quality:

SAER = 1 − (|M_A × M_S| + |M_A × M_P|) / (|M_A| + |M_S|)

where A is a candidate alignment, and S and P are the sets of sure and possible links in the reference alignment, respectively (S ⊆ P). M denotes the alignment matrix, and for both M_S and M_P we assign the elements that correspond to the existing links in S and P probability 1 and the other elements probability 0. In this way, we are able to better evaluate the quality of the soft alignments produced by attention-based NMT.

We find that context gates do not improve alignment quality when used alone. When combined with the coverage mechanism, however, they produce better alignments, especially one-to-one alignments obtained by selecting the source word with the highest alignment probability per target word (i.e., the AER score). One possible reason is that better estimated decoding states (from the context gate) and coverage information help to produce more concentrated alignments, as shown in Figure 6.
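As one concrete reading of the SAER formula, the sketch below assumes that "×" denotes element-wise multiplication, that |·| sums all matrix entries, and that M_A is the soft alignment (attention) matrix while M_S and M_P are 0/1 matrices of the sure and possible reference links; the classical AER over hard alignments is included for comparison. The toy matrices are illustrative only.

    # SAER over soft alignments and AER over hard alignments, under the reading
    # of the formula stated above. Toy data, for illustration only.
    import numpy as np

    def saer(M_A, M_S, M_P):
        overlap = (M_A * M_S).sum() + (M_A * M_P).sum()
        return 1.0 - overlap / (M_A.sum() + M_S.sum())

    def aer(A, S, P):
        # Classical AER (Och and Ney, 2003) over sets of (source, target) links.
        return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

    # Toy 2x2 example: one sure link (0, 0) and one additional possible link (1, 1).
    M_S = np.array([[1.0, 0.0], [0.0, 0.0]])
    M_P = np.array([[1.0, 0.0], [0.0, 1.0]])
    M_A = np.array([[0.9, 0.1], [0.2, 0.8]])   # soft attention weights
    print(saer(M_A, M_S, M_P))
    print(aer({(0, 0), (1, 1)}, {(0, 0)}, {(0, 0), (1, 1)}))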

Figure 6: Example alignments. Incorporating the context gate produces more concentrated alignments. (a) GroundHog-Coverage (SAER = 50.80); (b) + Context Gate (SAER = 47.35).

#   System                      Gate Inputs              MT05     MT06     MT08     Ave.
1   GroundHog                   –                        30.61    31.12    23.23    28.32
2   1 + Gating Scalar           t_{i−1}                  31.62*   31.48    23.85    28.98
3   1 + Context Gate (source)   t_{i−1}                  31.69*   31.63    24.25*   29.19
4   1 + Context Gate (both)     t_{i−1}                  32.15*   32.05*   24.39*   29.53
5                               t_{i−1}, s_i             31.81*   32.75*   25.66*   30.07
6                               t_{i−1}, s_i, y_{i−1}    33.52*   33.46*   24.85*   30.61

Table 5: Analysis of the model architectures measured in BLEU scores. "Gating Scalar" denotes the model proposed by Xu et al. (2015) for the image caption generation task, which looks at only the previous decoding state t_{i−1} and scales the whole source context s_i at the vector level. To investigate the effect of each component, we list the results of context gate variants with different inputs (e.g., the previously generated word y_{i−1}). "*" indicates a statistically significant difference (p < 0.01) from "GroundHog".

5.4 Architecture Analysis

Table 5 shows a detailed analysis of the architecture components measured in BLEU scores. Several observations can be made:

• Operation Granularity (Rows 2 and 3): Element-wise multiplication (i.e., Context Gate (source)) outperforms the vector-level scalar (i.e., Gating Scalar), indicating that precise control of each element in the context vector boosts translation performance.

• Gate Strategy (Rows 3 and 4): When fed with only the previous decoding state t_{i−1}, Context Gate (both) consistently outperforms Context Gate (source), showing that jointly controlling information from both the source and target sides is important for judging the importance of the contexts.

• Peephole connections (Rows 4 and 5): Peepholes, by which the source context s_i controls the gate, play an important role in the context gate, improving performance by 0.57 BLEU points.

• Previously generated word (Rows 5 and 6): The previously generated word y_{i−1} provides a more explicit signal for the gate to judge the importance of the contexts, leading to a further improvement in translation performance.

5.5 Effects on Long Sentences
Figure 7: Performance of translations on the test set with respect to the lengths of the source sentences. The context gate improves performance by alleviating inadequate translations of long sentences.

We follow Bahdanau et al. (2015) and group sentences of similar lengths together. Figure 7 shows the BLEU score and the averaged length of translations for each group. GroundHog performs very well on short source sentences, but degrades on long source sentences (i.e., > 30 words), which may be due to the fact that the source context is not fully interpreted. Context gates can alleviate this problem by balancing the source and target contexts, and thus improve decoder performance on long sentences. In fact, incorporating context gates boosts translation performance on all source sentence groups.

We confirm that the context gate weight z_i correlates well with translation performance. In other words, translations that contain higher z_i (i.e., the source context contributes more than the target context) at many time steps are better in translation performance. We used the mean of the sequence z_1, ..., z_i, ..., z_I as the gate weight of each sentence. We calculated the Pearson correlation between the sentence-level gate weight and the corresponding improvement in translation performance (i.e., BLEU, adequacy, and fluency scores),[9] as shown in Table 6. We observed that the context gate weight is positively correlated with translation performance improvement and that the correlation is higher on long sentences.

[9] We use the average of correlations on the subjective evaluation metrics (i.e., adequacy and fluency) by the two evaluators.

Length    BLEU     Adequacy    Fluency
< 30      0.024    0.071       0.040
> 30      0.076    0.121       0.168

Table 6: Correlation between context gate weight and improvement of translation performance. "Length" denotes the length of the source sentence. "BLEU", "Adequacy", and "Fluency" denote different metrics measuring the translation performance improvement from using context gates.
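The sentence-level statistic described above is straightforward to compute. The sketch below uses synthetic data for illustration: the gate weight of a sentence is the mean of its per-step gate activations, and numpy's corrcoef gives the Pearson correlation with a per-sentence improvement score (the data and variable names here are placeholders, not the paper's measurements).

    # Mean-of-z_i gate weight per sentence, correlated with a per-sentence gain.
    # Synthetic data; illustrative only.
    import numpy as np

    rng = np.random.default_rng(2)

    def sentence_gate_weight(z_per_step):
        # z_per_step: per-step gate values (e.g., the mean of the gate vector z_i
        # at each decoding step i); the sentence weight is their mean.
        return float(np.mean(z_per_step))

    # Hypothetical corpus: 100 sentences, each with ~20 decoding steps.
    gate_weights = np.array(
        [sentence_gate_weight(rng.uniform(0.0, 1.0, size=20)) for _ in range(100)]
    )
    bleu_gains = rng.normal(size=100)  # placeholder per-sentence BLEU improvements

    pearson_r = np.corrcoef(gate_weights, bleu_gains)[0, 1]
    print(f"Pearson correlation: {pearson_r:.3f}")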

As an example, consider this source sentence from the test set:

  zhōuliù zhèngshì yīngguó mínzhòng dào chāoshì cǎigòu de gāofēng shíkè, dāngshí 14 jiā chāoshì de guānbì lìng yīngguó zhè jiā zuìdà de liánsuǒ chāoshì sǔnshī shùbǎiwàn yīngbàng de xiāoshòu shōurù.

GroundHog translates it into:

  twenty-six london supermarkets were closed at a peak hour of the british population in the same period of time.

which misses almost all the information of the source sentence. Integrating context gates improves the translation adequacy:

  this is exactly the peak days British people buying the supermarket. the closure of the 14 supermarkets of the 14 supermarkets that the largest chain supermarket in england lost several million pounds of sales income.

The coverage mechanism further improves the translation by rectifying over-translation (e.g., "of the 14 supermarkets") and under-translation (e.g., "saturday" and "at that time"):

  saturday is the peak season of british people's purchases of the supermarket. at that time, the closure of 14 supermarkets made the biggest supermarket of britain lose millions of pounds of sales income.

6 Conclusion

We find that source and target contexts in NMT are highly correlated to translation adequacy and fluency, respectively. Based on this observation, we propose using context gates in NMT to dynamically control the contributions from the source and target contexts in the generation of a target sentence, to enhance the adequacy of NMT. By providing NMT with the ability to choose the appropriate amount of information from the source and target contexts, one can alleviate many translation problems from which NMT suffers. Experimental results show that NMT with context gates achieves consistent and significant improvements in translation quality over different NMT models.

Context gates are in principle applicable to all sequence-to-sequence learning tasks in which information from the source sequence is transformed to the target sequence (corresponding to adequacy) and the target sequence is generated (corresponding to fluency). In the future, we will investigate the effectiveness of context gates on other tasks, such as dialogue and summarization. It is also necessary to validate the effectiveness of our approach on more language pairs and other NMT architectures (e.g., using LSTM as well as GRU, or multiple layers).

Acknowledgement

This work is supported by China National 973 project 2014CB340301. Yang Liu is supported by the National Natural Science Foundation of China (No. 61522204) and the 863 Program (2015AA015407). We thank action editor Chris Quirk and three anonymous reviewers for their insightful comments.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR 2015.

Peter E. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP 2014.

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv.

Michael Collins, Philipp Koehn, and Ivona Kučerová. 2005. Clause restructuring for statistical machine translation. In ACL 2005.

Felix A. Gers and Jürgen Schmidhuber. 2000. Recurrent nets that time and count. In IJCNN 2000. IEEE.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation.

Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. In ACL 2015.

Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, and Jacob Eisenstein. 2015. Document context language models. In ICLR 2015.

Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In EMNLP 2013.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In ACL 2007.

Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In EMNLP 2015.

Tomas Mikolov and Geoffrey Zweig. 2012. Context dependent recurrent neural network language model. In SLT 2012.

Franz J. Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL 2002.

Matthew Snover, Nitin Madnani, Bonnie J. Dorr, and Richard Schwartz. 2009. Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 259–268.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS 2014.

Kristina Toutanova, H. Tolga Ilhan, and Christopher D. Manning. 2002. Extensions to HMM-based statistical word alignment models. In EMNLP 2002.

Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling coverage for neural machine translation. In ACL 2016.

Tian Wang and Kyunghyun Cho. 2016. Larger-context language modelling with recurrent neural network. In ACL 2016.

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML 2015.
