Transactions of the Association for Computational Linguistics, vol. 5, pp. 365–378, 2017. Action Editor: Adam Lopez.
Submission batch: 11/2016; Revision batch: 2/2017; Published 10/2017.

© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Fully Character-Level Neural Machine Translation without Explicit Segmentation

Jason Lee∗ (ETH Zürich) jasonlee@inf.ethz.ch
Kyunghyun Cho (New York University) kyunghyun.cho@nyu.edu
Thomas Hofmann (ETH Zürich) thomas.hofmann@inf.ethz.ch

∗ The majority of this work was completed while the author was visiting New York University.

Abstract

Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of the source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT'15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of the BLEU score and human judgment.

1 Introduction

Nearly all previous work in machine translation has been at the level of words. Aside from our intuitive understanding of the word as a basic unit of meaning (Jackendoff, 1992), one reason behind this is that sequences are significantly longer when represented in characters, compounding the problem of data sparsity and of modeling long-range dependencies. This has driven NMT research to be almost exclusively word-level (Bahdanau et al., 2015; Sutskever et al., 2014).

Despite their remarkable success, word-level NMT models suffer from several major weaknesses. For one, they are unable to model rare, out-of-vocabulary words, making them limited in translating languages with rich morphology such as Czech, Finnish and Turkish. If one uses a large vocabulary to combat this (Jean et al., 2015), the complexity of training and decoding grows linearly with respect to the target vocabulary size, leading to a vicious cycle.

To address this, we present a fully character-level NMT model that maps a character sequence in a source language to a character sequence in a target language. We show that our model outperforms a baseline with a subword-level encoder on DE-EN and CS-EN, and achieves a comparable result on FI-EN and RU-EN. A purely character-level NMT model with a basic encoder was proposed as a baseline by Luong and Manning (2016), but training it was prohibitively slow. We were able to train our model at a reasonable speed by drastically reducing the length of the source sentence representation using a stack of convolutional, pooling and highway layers.
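To make the length-reduction idea concrete, here is a minimal PyTorch sketch of a character-level front end that embeds characters, convolves over them, max-pools along time, and applies a highway layer. This is an illustration only: the vocabulary size, dimensions, filter width and pooling stride below are placeholder values rather than the hyperparameters used in this work, and the actual architecture is specified later in the paper.

```python
import torch
import torch.nn as nn

class CharEncoderFrontEnd(nn.Module):
    """Illustrative sketch: embed characters, convolve, then max-pool over
    time so the recurrent encoder that follows sees a much shorter sequence.
    All sizes below are placeholders, not the paper's hyperparameters."""
    def __init__(self, n_chars=300, emb=128, channels=256, width=3, stride=5):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.conv = nn.Conv1d(emb, channels, kernel_size=width, padding=width // 2)
        self.pool = nn.MaxPool1d(kernel_size=stride, stride=stride)
        self.highway_t = nn.Linear(channels, channels)  # transform gate
        self.highway_h = nn.Linear(channels, channels)  # candidate transform

    def forward(self, char_ids):                    # (batch, T_chars)
        x = self.embed(char_ids).transpose(1, 2)    # (batch, emb, T_chars)
        x = torch.relu(self.conv(x))                # (batch, channels, T_chars)
        x = self.pool(x).transpose(1, 2)            # (batch, T_chars // stride, channels)
        t = torch.sigmoid(self.highway_t(x))        # highway layer: gate between
        return t * torch.relu(self.highway_h(x)) + (1 - t) * x  # transform and carry

enc = CharEncoderFrontEnd()
out = enc(torch.randint(0, 300, (2, 100)))          # 100 characters -> 20 positions
print(out.shape)                                    # torch.Size([2, 20, 256])
```

With a pooling stride of $s$, the recurrent layers that follow process a sequence roughly $s$ times shorter than the raw character sequence, which is what makes training at the character level tractable.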


One advantage of character-level models is that they are better suited for multilingual translation than their word-level counterparts, which require a separate word vocabulary for each language. We verify this by training a single model to translate four languages (German, Czech, Finnish and Russian) to English. Our multilingual character-level model outperforms the subword-level baseline by a considerable margin in all four language pairs, strongly indicating that a character-level model is more flexible in assigning its capacity to different language pairs. Furthermore, we observe that our multilingual character-level translation even exceeds the quality of bilingual translation in three out of four language pairs, both in the BLEU score metric and in human evaluation. This demonstrates the excellent parameter efficiency of character-level translation in a multilingual setting. We also showcase our model's ability to handle intra-sentence code-switching while performing language identification on the fly.

The contributions of this work are twofold: we empirically show that (1) we can train a character-to-character NMT model without any explicit segmentation; and (2) we can share a single character-level encoder across multiple languages to build a multilingual translation system without increasing the model size.

2 Background: Attentional Neural Machine Translation

Neural machine translation (NMT) is a recently proposed approach to machine translation that builds a single neural network which takes as input a source sentence $X = (x_1, \dots, x_{T_X})$ and generates its translation $Y = (y_1, \dots, y_{T_Y})$, where $x_t$ and $y_{t'}$ are source and target symbols (Bahdanau et al., 2015; Sutskever et al., 2014; Luong et al., 2015; Cho et al., 2014a). Attentional NMT models have three components: an encoder, a decoder and an attention mechanism.

Encoder  Given a source sentence $X$, the encoder constructs a continuous representation that summarizes its meaning with a recurrent neural network (RNN). A bidirectional RNN is often used, as proposed in Bahdanau et al. (2015). A forward encoder reads the input sentence from left to right: $\overrightarrow{h}_t = \overrightarrow{f}_{\mathrm{enc}}\big(E_x(x_t), \overrightarrow{h}_{t-1}\big)$. Similarly, a backward encoder reads it from right to left: $\overleftarrow{h}_t = \overleftarrow{f}_{\mathrm{enc}}\big(E_x(x_t), \overleftarrow{h}_{t+1}\big)$, where $E_x$ is the source embedding lookup table, and $\overrightarrow{f}_{\mathrm{enc}}$ and $\overleftarrow{f}_{\mathrm{enc}}$ are recurrent activation functions such as long short-term memory units (LSTMs) (Hochreiter and Schmidhuber, 1997) or gated recurrent units (GRUs) (Cho et al., 2014b). The encoder constructs a set of continuous source sentence representations $C$ by concatenating the forward and backward hidden states at each time step: $C = \{h_1, \dots, h_{T_X}\}$, where $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$.

Attention  First introduced in Bahdanau et al. (2015), the attention mechanism lets the decoder attend more to different source symbols for each target symbol. More concretely, it computes the context vector $c_{t'}$ at each decoding time step $t'$ as a weighted sum of the source hidden states: $c_{t'} = \sum_{t=1}^{T_X} \alpha_{t' t} h_t$. Similarly to Chung et al. (2016) and Firat et al. (2016a), each attentional weight $\alpha_{t' t}$ represents how relevant the $t$-th source token $x_t$ is to the $t'$-th target token $y_{t'}$, and is computed as:

$$\alpha_{t' t} = \frac{1}{Z} \exp\Big(\mathrm{score}\big(E_y(y_{t'-1}),\, s_{t'-1},\, h_t\big)\Big), \qquad (1)$$

where $Z = \sum_{k=1}^{T_X} \exp\big(\mathrm{score}(E_y(y_{t'-1}), s_{t'-1}, h_k)\big)$ is the normalization constant. $\mathrm{score}()$ is a feed-forward neural network with a single hidden layer that scores how well the source symbol $x_t$ and the target symbol $y_{t'}$ match. $E_y$ is the target embedding lookup table and $s_{t'}$ is the target hidden state at time $t'$.

Decoder  Given a source context vector $c_{t'}$, the decoder computes its hidden state at time $t'$ as: $s_{t'} = f_{\mathrm{dec}}\big(E_y(y_{t'-1}), s_{t'-1}, c_{t'}\big)$. Then, a parametric function $\mathrm{out}_k()$ returns the conditional probability of the next target symbol being $k$: $p(y_{t'} = k \mid y_{<t'}, X)$.
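As a concrete (and deliberately minimal) rendering of the encoder just described, the sketch below uses a bidirectional GRU in PyTorch. This is a modern re-implementation for illustration, not the authors' original code, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Minimal bidirectional GRU encoder: E_x is the embedding lookup table, and
# the GRU plays the role of f_enc in both directions. C stacks the
# concatenated forward/backward hidden states [h_fwd; h_bwd] at each step.
vocab, emb_dim, hid = 1000, 64, 128      # arbitrary sizes for illustration
E_x = nn.Embedding(vocab, emb_dim)
birnn = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)

x = torch.randint(0, vocab, (1, 7))      # a toy source "sentence" of 7 symbols
C, _ = birnn(E_x(x))                     # C: (1, 7, 2 * hid), i.e. {h_1, ..., h_Tx}
```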
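The attention computation of Equation (1) can be sketched the same way. Here `score` is a single-hidden-layer feed-forward network, and concatenating $E_y(y_{t'-1})$, $s_{t'-1}$ and $h_t$ as its input is one common (Bahdanau-style additive) parameterization; the text above does not pin down the exact form, so treat that choice as an assumption.

```python
import torch
import torch.nn as nn

hid, emb_dim = 128, 64
# score(): single-hidden-layer MLP over [E_y(y_{t'-1}); s_{t'-1}; h_t]
# (one common parameterization; an assumption, not fixed by the paper here).
score = nn.Sequential(
    nn.Linear(emb_dim + hid + 2 * hid, hid),
    nn.Tanh(),
    nn.Linear(hid, 1),
)

def context_vector(y_prev_emb, s_prev, C):
    """alpha_{t't} = softmax_t(score(...)); c_{t'} = sum_t alpha_{t't} h_t."""
    T = C.size(1)
    # Broadcast the target-side query against every source state h_t.
    query = torch.cat([y_prev_emb, s_prev], dim=-1)        # (1, emb+hid)
    query = query.unsqueeze(1).expand(-1, T, -1)           # (1, T, emb+hid)
    e = score(torch.cat([query, C], dim=-1)).squeeze(-1)   # (1, T) raw scores
    alpha = torch.softmax(e, dim=-1)                       # softmax supplies Z
    return (alpha.unsqueeze(-1) * C).sum(dim=1)            # c_{t'}: (1, 2*hid)

C = torch.randn(1, 7, 2 * hid)                             # stand-in encoder states
c = context_vector(torch.randn(1, emb_dim), torch.randn(1, hid), C)
```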
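Finally, one decoder step, ending with the conditional distribution $p(y_{t'} = k \mid y_{<t'}, X)$, might look as follows. The plain linear readout standing in for $\mathrm{out}_k()$ is another illustrative assumption; the text above says only that it is a parametric function.

```python
import torch
import torch.nn as nn

vocab_tgt, emb_dim, hid = 1000, 64, 128
E_y = nn.Embedding(vocab_tgt, emb_dim)
# f_dec: a single GRU cell whose input concatenates E_y(y_{t'-1}) and c_{t'}.
f_dec = nn.GRUCell(emb_dim + 2 * hid, hid)
out = nn.Linear(hid, vocab_tgt)          # simple readout standing in for out_k()

def decoder_step(y_prev, s_prev, c):
    """s_{t'} = f_dec(E_y(y_{t'-1}), s_{t'-1}, c_{t'});
    p(y_{t'} = k | y_{<t'}, X) = softmax_k(out(s_{t'}))."""
    s = f_dec(torch.cat([E_y(y_prev), c], dim=-1), s_prev)   # (1, hid)
    p = torch.softmax(out(s), dim=-1)                        # (1, vocab_tgt)
    return s, p

s, p = decoder_step(torch.tensor([3]), torch.zeros(1, hid), torch.randn(1, 2 * hid))
```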
