Transactions of the Association for Computational Linguistics, vol. 5, pp. 87–99, 2017. Action Editor: Chris Quirk.

Submission batch: 6/2016; Revision batch: 10/2016; Published 3/2017.

© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Context Gates for Neural Machine Translation

Zhaopeng Tu†, Yang Liu‡, Zhengdong Lu†, Xiaohua Liu†, Hang Li†
†Noah's Ark Lab, Huawei Technologies, Hong Kong
{tu.zhaopeng, lu.zhengdong, liuxiaohua3, hangli.hl}@huawei.com
‡Department of Computer Science and Technology, Tsinghua University, Beijing
liuyang2011@tsinghua.edu.cn

Abstract

In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a function word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.

1 Introduction

Neural machine translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015) has made significant progress in the past several years. Its goal is to construct and utilize a single large neural network to accomplish the entire translation task. One great advantage of NMT is that the translation system can be completely constructed by learning from data without human involvement (cf. feature engineering in statistical machine translation (SMT)). The encoder-decoder architecture is widely employed (Cho et al., 2014; Sutskever et al., 2014), in which the encoder summarizes the source sentence into a vector representation, and the decoder generates the target sentence word-by-word from the vector representation. The representation of the source sentence and the representation of the partially generated target sentence (translation) at each position are referred to as source context and target context, respectively. The generation of a target word is determined jointly by the source context and target context.

Several techniques in NMT have proven to be very effective, including gating (Hochreiter and Schmidhuber, 1997; Cho et al., 2014) and attention (Bahdanau et al., 2015), which can model long-distance dependencies and complicated alignment relations in the translation process. Using an encoder-decoder framework that incorporates gating and attention techniques, it has been reported that the performance of NMT can surpass the performance of traditional SMT as measured by BLEU score (Luong et al., 2015).

Despite this success, we observe that NMT usually yields fluent but inadequate translations.¹ We attribute this to a stronger influence of target context on generation, which results from a stronger language model than that used in SMT. One question naturally arises: what will happen if we change the ratio of influences from the source or target contexts?

¹ Fluency measures whether the translation is fluent, while adequacy measures whether the translation is faithful to the original sentence (Snover et al., 2009).

input   jīnnián qián liǎng yuè guǎngdōng gāoxīn jìshù chǎnpǐn chūkǒu 37.6 yì měiyuán
NMT     in the first two months of this year, the export of new high level technology product was UNK-billion us dollars
▽src    china's guangdong hi-tech exports hit 58 billion dollars
▽tgt    china's export of high and new hi-tech exports of the export of the export of the export of the export of the export of the export of the export of the export of ···

Table 1: Source and target contexts are highly correlated to translation adequacy and fluency, respectively. ▽src and ▽tgt denote halving the contributions from the source and target contexts when generating the translation, respectively.

Table 1 shows an example in which an attention-based NMT system (Bahdanau et al., 2015) generates a fluent yet inadequate translation (e.g., missing the translation of "guǎngdōng"). When we halve the contribution from the source context, the result further loses its adequacy by missing the partial translation "in the first two months of this year". One possible explanation is that the target context takes a higher weight and thus the system favors a shorter translation. In contrast, when we halve the contribution from the target context, the result completely loses its fluency by repeatedly generating the translation of "chūkǒu" (i.e., "the export of") until the generated translation reaches the maximum length. Therefore, this example indicates that source and target contexts in NMT are highly correlated to translation adequacy and fluency, respectively.

In fact, conventional NMT lacks effective control on the influence of source and target contexts. At each decoding step, NMT treats the source and target contexts equally, and thus ignores the different needs of the contexts. For example, content words in the target sentence are more related to the translation adequacy, and thus should depend more on the source context. In contrast, function words in the target sentence are often more related to the translation fluency (e.g., "of" after "is fond"), and thus should depend more on the target context.

In this work, we propose to use context gates to control the contributions of source and target contexts on the generation of target words (decoding) in NMT. Context gates are non-linear gating units which can dynamically select the amount of context information in the decoding process. Specifically, at each decoding step, the context gate examines both the source and target contexts, and outputs a ratio between zero and one to determine the percentages of information to utilize from the two contexts. In this way, the system can balance the adequacy and fluency of the translation with regard to the generation of a word at each position.

Figure 1: Architecture of decoder RNN.

Experimental results show that introducing context gates leads to an average improvement of +2.3 BLEU points over a standard attention-based NMT system (Bahdanau et al., 2015). An interesting finding is that we can replace the GRU units in the decoder with conventional RNN units and in the meantime utilize context gates. The translation performance is comparable with the standard NMT system with GRU, but the system enjoys a simpler structure (i.e., uses only a single gate and half of the parameters) and faster decoding (i.e., requires only half the matrix computations for decoding).²

2 Neural Machine Translation

Suppose that $x = x_1, \ldots, x_j, \ldots, x_J$ represents a source sentence and $y = y_1, \ldots, y_i, \ldots, y_I$ a target sentence. NMT directly models the probability of translation from the source sentence to the target sentence word by word:

$$P(y \mid x) = \prod_{i=1}^{I} P(y_i \mid y_{<i}, x)$$

[...]

[...] github.com/nyu-dl/dl4mt-tutorial), $t_{i-1}$ and $y_{i-1}$ are combined together with a GRU before being fed into the decoder, which can boost translation performance. We follow the practice and treat both of them as target context.
[...] for source and target contexts:

$$t_i = f\big(b \otimes (W e(y_{i-1}) + U t_{i-1}) + a \otimes C s_i\big)$$

For example, the pair $(a, b) = (1.0, 0.5)$ means fully leveraging the effect of source context while halving the effect of target context. Reducing the effect of target context (i.e., the lines (1.0, 0.8) and (1.0, 0.5)) results in longer translations, while reducing the effect of source context (i.e., the lines (0.8, 1.0) and (0.5, 1.0)) leads to shorter translations. When halving the effect of the target context, most of the generated translations reach the maximum length, which is three times the length of the source sentence in this work.

Figure 2(b) shows the results of manual evaluation on 200 source sentences randomly sampled from the test sets. Reducing the effect of source context (i.e., (0.8, 1.0) and (0.5, 1.0)) leads to more fluent yet less adequate translations. On the other hand, reducing the effect of target context (i.e., (1.0, 0.5) and (1.0, 0.8)) is expected to yield more adequate but less fluent translations. In this setting, the source words are translated (i.e., higher adequacy) while the translations are in the wrong order (i.e., lower fluency). In practice, however, we observe the side effect that some source words are translated repeatedly until the translation reaches the maximum length (i.e., lower fluency), while others are left untranslated (i.e., lower adequacy). The reason is twofold:

1. NMT lacks a mechanism that guarantees that each source word is translated.⁴ The decoding state implicitly models the notion of "coverage" by recurrently reading the time-dependent source context $s_i$. Lowering its contribution weakens the "coverage" effect and encourages the decoder to regenerate phrases multiple times to achieve the desired translation length.
2. The translation is incomplete. As shown in Table 1, NMT can get stuck in an infinite loop repeatedly generating a phrase due to the overwhelming influence of the source context. As a result, generation terminates early because the translation reaches the maximum length allowed by the implementation, even though the decoding procedure is not finished.

⁴ The recently proposed coverage-based technique can alleviate this problem (Tu et al., 2016). In this work, we consider another approach, which is complementary to the coverage mechanism.

The quantitative (Figure 2) and qualitative (Table 1) results confirm our hypothesis, i.e., source and target contexts are highly correlated to translation adequacy and fluency. We believe that a mechanism that can dynamically select information from source context and target context would be useful for NMT models, and this is exactly the approach we propose.

3 Context Gates

3.1 Architecture

Inspired by the success of gated units in RNN (Hochreiter and Schmidhuber, 1997; Cho et al., 2014), we propose using context gates to dynamically control the amount of information flowing from the source and target contexts and thus balance the fluency and adequacy of NMT at each decoding step.

Intuitively, at each decoding step $i$, the context gate looks at input signals from both the source (i.e., $s_i$) and target (i.e., $t_{i-1}$ and $y_{i-1}$) sides, and outputs a number between 0 and 1 for each element in the input vectors, where 1 denotes "completely transferring this" while 0 denotes "completely ignoring this". The corresponding input signals are then processed with an element-wise multiplication before being fed to the activation layer to update the decoding state.

Figure 3: Architecture of context gate.

Formally, a context gate consists of a sigmoid neural network layer and an element-wise multiplication operation, as illustrated in Figure 3. The context gate assigns an element-wise weight to the input signals, computed by

$$z_i = \sigma\big(W_z e(y_{i-1}) + U_z t_{i-1} + C_z s_i\big) \tag{4}$$

Here $\sigma(\cdot)$ is a logistic sigmoid function, and $W_z \in \mathbb{R}^{n \times m}$, $U_z \in \mathbb{R}^{n \times n}$, $C_z \in \mathbb{R}^{n \times n'}$ are the weight matrices. Again, $m$, $n$ and $n'$ are the dimensions of word embedding, decoding state, and source representation, respectively. Note that $z_i$ has the same dimensionality as the transferred input signals (e.g., $C s_i$), and thus each element in the input vectors has its own weight.

3.2 Integrating Context Gates into NMT

Next, we consider how to integrate context gates into an NMT model. The context gate can decide the amount of context information used in generating the next target word at each step of decoding. For example, after obtaining the partial translation "... new high level technology product", the gate looks at the translation contexts and decides to depend more heavily on the source context. Accordingly, the gate assigns higher weights to the source context and lower weights to the target context and then feeds them into the decoding activation layer. This could correct inadequate translations, such as the missing translation of "guǎngdōng", due to greater influence from the target context.

Figure 4: Architectures of NMT with various context gates, which either scale only one side of translation contexts (i.e., source context in (a) and target context in (b)) or control the effects of both sides (i.e., (c)). Panels: (a) Context Gate (source), (b) Context Gate (target), (c) Context Gate (both).

We have three strategies for integrating context gates into NMT that either affect one of the translation contexts or both contexts, as illustrated in Figure 4. The first two strategies are inspired by output gates in LSTMs (Hochreiter and Schmidhuber, 1997), which control the amount of memory content utilized. In these kinds of models, $z_i$ only affects either source context (i.e., $s_i$) or target context (i.e., $y_{i-1}$ and $t_{i-1}$):

• Context Gate (source): $t_i = f\big(W e(y_{i-1}) + U t_{i-1} + z_i \circ C s_i\big)$
• Context Gate (target): $t_i = f\big(z_i \circ (W e(y_{i-1}) + U t_{i-1}) + C s_i\big)$

where $\circ$ is an element-wise multiplication, and $z_i$ is the context gate calculated by Equation 4. This is also essentially similar to the reset gate in the GRU, which decides what information to forget from the previous decoding state before transferring that information to the decoding activation layer. The difference is that here the "reset" gate resets the context vector rather than the previous decoding state.

The last strategy is inspired by the concept of update gate from GRU, which takes a linear sum between the previous state $t_{i-1}$ and the candidate new state $\tilde{t}_i$. In our case, we take a linear interpolation between source and target contexts:

• Context Gate (both): $t_i = f\big((1 - z_i) \circ (W e(y_{i-1}) + U t_{i-1}) + z_i \circ C s_i\big)$

Figure 5: Comparison to Gating Scalar proposed by Xu et al. (2015). Panels: (a) Gating Scalar, (b) Context Gate.

4 Related Work

Comparison to (Xu et al., 2015): Context gates are inspired by the gating scalar model proposed by Xu et al. (2015) for the image caption generation task. The essential difference lies in the task requirement:

• In image caption generation, the source side (i.e., image) contains more information than the target side (i.e., caption). Therefore, they employ a gating scalar to scale only the source context.
• In machine translation, both languages should contain equivalent information. Our model jointly controls the contributions from the source and target contexts. A direct interaction between input signals from both sides is useful for balancing adequacy and fluency of NMT.

Other differences in the architecture include:

1. Xu et al. (2015) use a scalar that is shared by all elements in the source context, while we employ a gate with a distinct weight for each element. The latter offers the gate a more precise control of the context vector, since different elements retain different information.
2. We add peephole connections to the architecture, by which the source context controls the gate. It has been shown that peephole connections make precise timings easier to learn (Gers and Schmidhuber, 2000).
3. Our context gate also considers the previously generated word $y_{i-1}$ as input. The most recently generated word can help the gate to better estimate the importance of target context, especially for the generation of function words in translations that may not have a corresponding word in the source sentence (e.g., "of" after "is fond").

Experimental results (Section 5.4) show that these modifications consistently improve translation quality.

Comparison to Gated RNN: State-of-the-art NMT models (Sutskever et al., 2014; Bahdanau et al., 2015) generally employ a gated unit (e.g., GRU or LSTM) as the activation function in the decoder. One might suspect that the context gate proposed in this work is somewhat redundant, given the existing gates that control the amount of information carried over from the previous decoding state $t_{i-1}$ (e.g., reset gate in GRU). We argue that they are in fact complementary: the context gate regulates the contextual information flowing into the decoding state, while the gated unit captures long-term dependencies between decoding states. Our experiments confirm the correctness of our hypothesis: the context gate not only improves translation quality when compared to a conventional RNN unit (e.g., an element-wise tanh), but also when compared to a gated unit of GRU, as shown in Section 5.2.

Comparison to Coverage Mechanism: Recently, Tu et al. (2016) proposed adding a coverage mechanism into NMT to alleviate over-translation and under-translation problems, which directly affect translation adequacy. They maintain a coverage vector to keep track of which source words have been translated. The coverage vector is fed to the attention model to help adjust future attention. This guides NMT to focus on the untranslated source words while avoiding repetition of source content. Our approach is complementary: the coverage mechanism produces a better source context representation, while our context gate controls the effect of the source context based on its relative importance. Experiments in Section 5.2 show that combining the two methods can further improve translation performance. There is another difference
as well: the coverage mechanism is only applicable to attention-based NMT models, while the context gate is applicable to all NMT models.

Comparison to Exploiting Auxiliary Contexts in Language Modeling: A thread of work in language modeling (LM) attempts to exploit auxiliary sentence-level or document-level context in an RNN LM (Mikolov and Zweig, 2012; Ji et al., 2015; Wang and Cho, 2016). Independent of our work, Wang and Cho (2016) propose "early fusion" models of RNNs where additional information from an inter-sentence context is "fused" with the input to the RNN. Closely related to Wang and Cho (2016), our approach aims to dynamically control the contributions of required source and target contexts for machine translation, while theirs focuses on integrating auxiliary corpus-level contexts for language modelling to better approximate the corpus-level probability. In addition, we employ a gating mechanism to produce a dynamic weight at different decoding steps to combine source and target contexts, while they do a linear combination of intra-sentence and inter-sentence contexts with static weights. Experiments in Section 5.2 show that our gating mechanism significantly outperforms linear interpolation when combining contexts.

Comparison to Handling Null-Generated Words in SMT: In machine translation, there are certain syntactic elements of the target language that are missing in the source (i.e., null-generated words). In fact this was the preliminary motivation for our approach: current attention models lack a mechanism to control the generation of words that do not have a strong correspondence on the source side. The model structure of NMT is quite similar to the traditional word-based SMT (Brown et al., 1993). Therefore, techniques that have proven effective in SMT may also be applicable to NMT. Toutanova et al. (2002) extend the calculation of translation probabilities to include null-generated target words in word-based SMT. These words are generated based on both the special source token null and the neighbouring word in the target language by a mixture model. We have simplified and generalized their approach: we use context gates to dynamically control the contribution of source context. When producing null-generated words, the context gate can assign lower weights to the source context, by which the source-side information has less influence. In a sense, the context gate relieves the need for a null state in attention.

5 Experiments

5.1 Setup

We carried out experiments on Chinese-English translation. The training dataset consisted of 1.25M sentence pairs extracted from LDC corpora⁵, with 27.9M Chinese words and 34.5M English words respectively. We chose the NIST 2002 (MT02) dataset as the development set, and the NIST 2005 (MT05), 2006 (MT06) and 2008 (MT08) datasets as the test sets. We used the case-insensitive 4-gram NIST BLEU score (Papineni et al., 2002) as the evaluation metric, and sign-test (Collins et al., 2005) for the statistical significance test.

For efficient training of the neural networks, we limited the source and target vocabularies to the most frequent 30K words in Chinese and English, covering approximately 97.7% and 99.3% of the data in the two languages respectively. All out-of-vocabulary words were mapped to a special token UNK. We trained each model on sentences of length up to 80 words in the training data. The word embedding dimension was 620 and the size of a hidden layer was 1000. We trained our models until the BLEU score on the development set stopped improving.

We compared our method with representative SMT and NMT⁶ models:

• Moses (Koehn et al., 2007): an open source phrase-based translation system with default configuration and a 4-gram language model trained on the target portion of training data;
• GroundHog (Bahdanau et al., 2015): an open source attention-based NMT model with default setting. We have two variants that differ in the activation function used in the decoder RNN: 1) GroundHog (vanilla) uses a simple tanh function as the activation function, and 2) GroundHog (GRU) uses a sophisticated gate function GRU;
• GroundHog-Coverage (Tu et al., 2016)⁷: an improved attention-based NMT model with a coverage mechanism.

⁵ The corpora include LDC2002E18, LDC2003E07, LDC2003E14, Hansards portion of LDC2004T07, LDC2004T08 and LDC2005T06.
⁶ There is some recent progress on aggregating multiple models or enlarging the vocabulary (e.g., in (Jean et al., 2015)), but here we focus on the generic models.
⁷ https://github.com/tuzhaopeng/NMT-Coverage.

#  System                        #Parameters  MT05    MT06    MT08    Ave.
1  Moses                         –            31.37   30.85   23.01   28.41
2  GroundHog (vanilla)           77.1M        26.07   27.34   20.38   24.60
3  2 + Context Gate (both)       80.7M        30.86*  30.85*  24.71*  28.81
4  GroundHog (GRU)               84.3M        30.61   31.12   23.23   28.32
5  4 + Context Gate (source)     87.9M        31.96*  32.29*  24.97*  29.74
6  4 + Context Gate (target)     87.9M        32.38*  32.11*  23.78   29.42
7  4 + Context Gate (both)       87.9M        33.52*  33.46*  24.85*  30.61
8  GroundHog-Coverage (GRU)      84.4M        32.73   32.47   25.23   30.14
9  8 + Context Gate (both)       88.0M        34.13*  34.83*  26.22*  31.73

Table 2: Evaluation of translation quality measured by case-insensitive BLEU score. "GroundHog (vanilla)" and "GroundHog (GRU)" denote attention-based NMT (Bahdanau et al., 2015) using a simple tanh function or a sophisticated gate function (GRU), respectively, as the activation function in the decoder RNN. "GroundHog-Coverage" denotes attention-based NMT with a coverage mechanism to indicate whether a source word is translated or not (Tu et al., 2016). "*" indicates a statistically significant difference (p < 0.01) from the corresponding NMT variant. "2 + Context Gate (both)" denotes integrating "Context Gate (both)" into the baseline system in Row 2 (i.e., "GroundHog (vanilla)").

5.2 Translation Quality

Table 2 shows the translation performances in terms of BLEU scores. We carried out experiments on multiple NMT variants. For example, "2 + Context Gate (both)" in Row 3 denotes integrating "Context Gate (both)" into the baseline in Row 2 (i.e., GroundHog (vanilla)). For baselines, we found that the gated unit (i.e., GRU, Row 4) indeed surpasses its vanilla counterpart (i.e., tanh, Row 2), which is consistent with the results in other work (Chung et al., 2014). Clearly the proposed context gates significantly improve the translation quality in all cases, although there are still considerable differences among the variants:

Parameters: Context gates introduce a few new parameters. The newly introduced parameters include $W_z \in \mathbb{R}^{n \times m}$, $U_z \in \mathbb{R}^{n \times n}$, $C_z \in \mathbb{R}^{n \times n'}$ in Equation 4. In this work, the dimensionality of the decoding state is $n = 1000$, the dimensionality of the word embedding is $m = 620$, and the dimensionality of context representation is $n' = 2000$. The context gates only introduce 3.6M additional parameters, which is quite small compared to the number of parameters in the existing models (e.g., 84.3M in "GroundHog (GRU)").

Over GroundHog (vanilla): We first carried out experiments on a simple decoder without gating function (Rows 2 and 3), to better estimate the impact of context gates. As shown in Table 2, the proposed context gate significantly improved translation performance by 4.2 BLEU points on average. It is worth emphasizing that the context gate even outperforms a more sophisticated gating function (i.e., GRU in Row 4). This is very encouraging, since our model only has a single gate with half of the parameters (i.e., 3.6M versus 7.2M) and fewer computations (i.e., half the matrix computations to update the decoding state⁸).

⁸ We only need to calculate the context gate once via Equation 4 and then apply it when updating the decoding state. In contrast, GRU requires the calculation of an update gate, a reset gate, a proposed updated decoding state and an interpolation between the previous state and the proposed state. Please refer to (Cho et al., 2014) for more details.

Over GroundHog (GRU): We then investigated the effect of the context gates on a standard NMT with GRU as the decoding activation function (Rows 4-7). Several observations can be made. First, context gates also boost performance beyond the GRU in all cases, demonstrating our claim that context gates are complementary to the reset and update gates in GRU. Second, jointly controlling the information from both translation contexts outperforms its single-side counterparts on average, indicating that a direct interaction between input signals from the source and target contexts is useful for NMT models.

Over GroundHog-Coverage (GRU): We finally tested on a stronger baseline, which employs a coverage mechanism to indicate whether or not a source word has already been translated (Tu et al., 2016). Our context gate still achieves a significant improvement of 1.6 BLEU points on average, reconfirming our claim that the context gate is complementary to the improved attention model that produces a better source context representation. Finally, our best model (Row 9) outperforms the SMT baseline system using the same data (Row 1) by 3.3 BLEU points.

From here on, we refer to "GroundHog" for "GroundHog (GRU)", and "Context Gate" for "Context Gate (both)" if not otherwise stated.

GroundHog vs. GroundHog + Context Gate
             Adequacy                 Fluency
             <       =       >        <       =       >
evaluator1   30.0%   54.0%   16.0%    28.5%   48.5%   23.0%
evaluator2   30.0%   50.0%   20.0%    29.5%   54.5%   16.0%

Table 3: Subjective evaluation of translation adequacy and fluency.

System               SAER    AER
GroundHog            67.00   54.67
+ Context Gate       67.43   55.52
GroundHog-Coverage   64.25   50.50
+ Context Gate       63.80   49.40

Table 4: Evaluation of alignment quality. The lower the score, the better the alignment quality.

Subjective Evaluation: We also conducted a subjective evaluation of the benefit of incorporating context gates. Two human evaluators were asked to compare the translations of 200 source sentences randomly sampled from the test sets without knowing which system produced each translation. Table 3 shows the results of subjective evaluation. The two human evaluators made similar judgments: in adequacy, around 30% of GroundHog translations are worse, 52% are equal, and 18% are better; while in fluency, around 29% are worse, 52% are equal, and 19% are better.
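The parameter figure quoted in Section 5.2 follows from the shapes given with Equation 4, and the three integration strategies of Section 3.2 differ only in where the gate is applied. The following is a minimal pure-Python sketch of both points, not the authors' implementation. The helper names (`matvec`, `context_gate`, `decoder_update`, `gate_parameter_count`), the toy dimensions, and the use of tanh for the activation f are assumptions made for illustration; bias terms are left out because Equation 4 does not show any.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def context_gate(Wz, Uz, Cz, e_prev, t_prev, s_i):
    """Equation 4: z_i = sigmoid(Wz e(y_{i-1}) + Uz t_{i-1} + Cz s_i)."""
    pre = add(matvec(Wz, e_prev), matvec(Uz, t_prev), matvec(Cz, s_i))
    return [sigmoid(p) for p in pre]


def decoder_update(z, target_ctx, source_ctx, strategy):
    """Apply the gate to already-projected contexts, then squash with tanh.

    target_ctx stands for W e(y_{i-1}) + U t_{i-1}; source_ctx for C s_i.
    Using tanh for f is an assumption (the "vanilla" decoder).
    """
    if strategy == "source":
        mixed = [t + g * s for g, t, s in zip(z, target_ctx, source_ctx)]
    elif strategy == "target":
        mixed = [g * t + s for g, t, s in zip(z, target_ctx, source_ctx)]
    elif strategy == "both":
        mixed = [(1.0 - g) * t + g * s
                 for g, t, s in zip(z, target_ctx, source_ctx)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [math.tanh(v) for v in mixed]


def gate_parameter_count(m, n, n_src):
    """Number of entries in Wz (n x m), Uz (n x n) and Cz (n x n_src)."""
    return n * m + n * n + n * n_src


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    m, n, n_src = 4, 3, 6  # toy sizes, not the paper's
    Wz = random_matrix(n, m, rng)
    Uz = random_matrix(n, n, rng)
    Cz = random_matrix(n, n_src, rng)
    e_prev = [rng.uniform(-1, 1) for _ in range(m)]
    t_prev = [rng.uniform(-1, 1) for _ in range(n)]
    s_i = [rng.uniform(-1, 1) for _ in range(n_src)]
    z = context_gate(Wz, Uz, Cz, e_prev, t_prev, s_i)
    target_ctx = [0.5] * n   # placeholder for W e(y_{i-1}) + U t_{i-1}
    source_ctx = [-0.5] * n  # placeholder for C s_i
    for strategy in ("source", "target", "both"):
        print(strategy, decoder_update(z, target_ctx, source_ctx, strategy))
    # Paper's sizes: m = 620, n = 1000, n' = 2000.
    print(gate_parameter_count(620, 1000, 2000))  # 3620000, about 3.6M
```

With the paper's dimensions the count is 620,000 + 1,000,000 + 2,000,000 = 3.62M, which matches the reported 3.6M and the 3.6M gap between each baseline and its "+ Context Gate" row in Table 2.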
5.3 Alignment Quality

Table 4 lists the alignment performances. Following Tu et al. (2016), we used the alignment error rate (AER) (Och and Ney, 2003) and its variant SAER to measure the alignment quality:

$$\mathrm{SAER} = 1 - \frac{|M_A \times M_S| + |M_A \times M_P|}{|M_A| + |M_S|}$$

where $A$ is a candidate alignment, and $S$ and $P$ are the sets of sure and possible links in the reference alignment respectively ($S \subseteq P$). $M$ denotes the alignment matrix, and for both $M_S$ and $M_P$ we assign the elements that correspond to the existing links in $S$ and $P$ probability 1 and the other elements probability 0. In this way, we are able to better evaluate the quality of the soft alignments produced by attention-based NMT.

We find that context gates do not improve alignment quality when used alone. When combined with the coverage mechanism, however, they produce better alignments, especially one-to-one alignments obtained by selecting the source word with the highest alignment probability per target word (i.e., AER score). One possible reason is that better estimated decoding states (from the context gate) and coverage information help to produce more concentrated alignments, as shown in Figure 6.
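To make the SAER definition concrete, here is a minimal pure-Python sketch. It assumes that × denotes the element-wise product of two alignment matrices and that |M| is the sum of all entries of M; this section does not spell out either convention, so both should be read as assumptions. The function names and the toy matrices are hypothetical.

```python
def links_to_matrix(links, n_target, n_source):
    """Binary matrix with 1.0 at every (target, source) link, 0.0 elsewhere."""
    matrix = [[0.0] * n_source for _ in range(n_target)]
    for i, j in links:
        matrix[i][j] = 1.0
    return matrix


def total(matrix):
    """|M|: sum of all entries (assumed meaning of the bars)."""
    return sum(sum(row) for row in matrix)


def overlap(a, b):
    """|A x B|: sum of the element-wise product (assumed meaning of x)."""
    return sum(x * y
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def saer(m_a, m_s, m_p):
    """Soft alignment error rate of a soft alignment matrix m_a."""
    matched = overlap(m_a, m_s) + overlap(m_a, m_p)
    return 1.0 - matched / (total(m_a) + total(m_s))


if __name__ == "__main__":
    sure = {(0, 0), (1, 1)}
    possible = sure | {(1, 0)}  # S must be a subset of P
    m_s = links_to_matrix(sure, 2, 2)
    m_p = links_to_matrix(possible, 2, 2)
    soft = [[0.9, 0.1],         # attention weights, one row per target word
            [0.2, 0.8]]
    print(round(saer(soft, m_s, m_p), 4))
    print(saer(m_s, m_s, m_p))  # a hard alignment equal to S scores 0.0
```

Under these assumptions, a hard 0/1 alignment matrix reduces SAER to the usual AER, since the two overlaps become |A ∩ S| and |A ∩ P|.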


Figure 6: Example alignments. Incorporating context gate produces more concentrated alignments. Panels: (a) GroundHog-Coverage (SAER = 50.80), (b) + Context Gate (SAER = 47.35).

#  System                      Gate Inputs                     MT05    MT06    MT08    Ave.
1  GroundHog                   –                               30.61   31.12   23.23   28.32
2  1 + Gating Scalar           $t_{i-1}$                       31.62*  31.48   23.85   28.98
3  1 + Context Gate (source)   $t_{i-1}$                       31.69*  31.63   24.25*  29.19
4  1 + Context Gate (both)     $t_{i-1}$                       32.15*  32.05*  24.39*  29.53
5                              $t_{i-1}$, $s_i$                31.81*  32.75*  25.66*  30.07
6                              $t_{i-1}$, $s_i$, $y_{i-1}$     33.52*  33.46*  24.85*  30.61

Table 5: Analysis of the model architectures measured in BLEU scores. "Gating Scalar" denotes the model proposed by (Xu et al., 2015) in the image caption generation task, which looks at only the previous decoding state $t_{i-1}$ and scales the whole source context $s_i$ at the vector level. To investigate the effect of each component, we list the results of context gate variants with different inputs (e.g., the previously generated word $y_{i-1}$). "*" indicates a statistically significant difference (p < 0.01) from "GroundHog".

5.4 Architecture Analysis

Table 5 shows a detailed analysis of architecture components measured in BLEU scores. Several observations can be made:

• Operation Granularity (Rows 2 and 3): Element-wise multiplication (i.e., Context Gate (source)) outperforms the vector-level scalar (i.e., Gating Scalar), indicating that precise control of each element in the context vector boosts translation performance.
• Gate Strategy (Rows 3 and 4): When only fed with the previous decoding state $t_{i-1}$, Context Gate (both) consistently outperforms Context Gate (source), showing that jointly controlling information from both source and target sides is important for judging the importance of the contexts.
• Peephole connections (Rows 4 and 5): Peepholes, by which the source context $s_i$ controls the gate, play an important role in the context gate, improving the average BLEU score by 0.54.
• Previously generated word (Rows 5 and 6): The previously generated word $y_{i-1}$ provides a more explicit signal for the gate to judge the importance of contexts, leading to a further improvement in translation performance.

5.5 Effects on Long Sentences

We follow Bahdanau et al. (2015) and group sentences of similar lengths together. Figure 7 shows the BLEU score and the averaged length of translations for each group. GroundHog performs very well on short source sentences, but degrades on long source sentences (i.e., > 30), which may be due to the fact that source context is not fully interpreted. Context gates can alleviate this problem by balancing the source and target contexts, and thus improve decoder performance on long sentences. In fact, incorporating context gates boosts translation performance on all source sentence groups.

Figure 7: Performance of translations on the test set with respect to the lengths of the source sentences. Context gate improves performance by alleviating inadequate translations on long sentences.

We confirm that the context gate weight $z_i$ correlates positively with translation performance. In other words, translations that contain higher $z_i$ (i.e., source context contributes more than target context) at many time steps are better in translation performance. We used the mean of the sequence $z_1, \ldots, z_i, \ldots, z_I$ as the gate weight of each sentence. We calculated the Pearson correlation between the sentence-level gate weight and the corresponding improvement in translation performance (i.e., BLEU, adequacy, and fluency scores),⁹ as shown in Table 6. We observed that context gate weight is positively correlated with translation performance improvement and that the correlation is higher on long sentences.

⁹ We use the average of correlations on subjective evaluation metrics (i.e., adequacy and fluency) by two evaluators.

Length   BLEU    Adequacy   Fluency
< 30     0.024   0.071      0.040
> 30     0.076   0.121      0.168

Table 6: Correlation between context gate weight and improvement of translation performance. "Length" denotes the length of the source sentence. "BLEU", "Adequacy", and "Fluency" denote different metrics measuring the translation performance improvement of using context gates.

As an example, consider this source sentence from the test set:

zhōuliù zhèngshì yīngguó mínzhòng dào chāoshì cǎigòu de gāofēng shíkè, dāngshí 14 jiā chāoshì de guānbì lìng yīngguó zhè jiā zuì dà de liánsuǒ chāoshì sǔnshī shùbǎiwàn yīngbàng de xiāoshòu shōurù.

GroundHog translates it into:

twenty-six london supermarkets were closed at a peak hour of the british population in the same period of time.

which misses almost all the information of the source sentence. Integrating context gates improves the translation adequacy:

this is exactly the peak days British people buying the supermarket. the closure of the 14 supermarkets of the 14 supermarkets that the largest chain supermarket in england lost several million pounds of sales income.

Coverage mechanisms further improve the translation by rectifying over-translation (e.g., "of the 14 supermarkets") and under-translation (e.g., "saturday" and "at that time"):

saturday is the peak season of british people's purchases of the supermarket. at that time, the closure of 14 supermarkets made the biggest supermarket of britain lose millions of pounds of sales income.

6 Conclusion

We find that source and target contexts in NMT are highly correlated to translation adequacy and fluency, respectively. Based on this observation, we propose using context gates in NMT to dynamically control the contributions from the source and target contexts in the generation of a target sentence, to enhance the adequacy of NMT. By providing NMT the ability to choose the appropriate amount of information from the source and target contexts, one can alleviate many translation problems from which NMT suffers. Experimental results show that NMT with context gates achieves consistent and significant improvements in translation quality over different NMT models.

Context gates are in principle applicable to all sequence-to-sequence learning tasks in which information from the source sequence is transformed to the target sequence (corresponding to adequacy) and the target sequence is generated (corresponding to fluency). In the future, we will investigate the effectiveness of context gates on other tasks, such as dialogue and summarization. It is also necessary to validate the effectiveness of our approach on more language pairs and other NMT architectures (e.g., using LSTM as well as GRU, or multiple layers).

Acknowledgement

This work is supported by China National 973 project 2014CB340301. Yang Liu is supported by the National Natural Science Foundation of China (No. 61522204) and the 863 Program (2015AA015407). We thank action editor Chris Quirk and three anonymous reviewers for their insightful comments.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR 2015.

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP 2014.

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv.

Michael Collins, Philipp Koehn, and Ivona Kučerová. 2005. Clause restructuring for statistical machine translation. In ACL 2005.

Felix A. Gers and Jürgen Schmidhuber. 2000. Recurrent nets that time and count. In IJCNN 2000. IEEE.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation.

Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. In ACL 2015.

Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, and Jacob Eisenstein. 2015. Document context language models. In ICLR 2015.

Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In EMNLP 2013.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In ACL 2007.

Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In EMNLP 2015.

Tomas Mikolov and Geoffrey Zweig. 2012. Context dependent recurrent neural network language model. In SLT 2012.

Franz J. Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL 2002.
Matthew Snover, Nitin Madnani, Bonnie J Dorr, and Richard Schwartz. 2009. Fluency, adequacy, or HTER?: exploring different human judgments with a tunable MT metric. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 259–268.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS 2014.
Kristina Toutanova, H. Tolga Ilhan, and Christopher D. Manning. 2002. Extensions to HMM-based statistical word alignment models. In EMNLP 2002.
Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling coverage for neural machine translation. In ACL 2016.
Tian Wang and Kyunghyun Cho. 2016. Larger-context language modelling with recurrent neural network. In ACL 2016.
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML 2015.
