Transactions of the Association for Computational Linguistics, vol. 6, pp. 225–240, 2018. Action Editor: Philipp Koehn.

Submission batch: 10/2017; Revision batch: 2/2018; Published 4/2018.

© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Scheduled Multi-Task Learning: From Syntax to Translation

Eliyahu Kiperwasser∗
Computer Science Department
Bar-Ilan University
Ramat-Gan, Israel
elikip@gmail.com

Miguel Ballesteros
IBM Research
1101 Kitchawan Road, Route 134
Yorktown Heights, NY 10598, U.S.
miguel.ballesteros@ibm.com

∗ Work carried out during summer internship at IBM Research.

Abstract

Neural encoder-decoder models of machine translation have achieved impressive results, while learning linguistic knowledge of both the source and target languages in an implicit end-to-end manner. We propose a framework in which our model begins learning syntax and translation interleaved, gradually putting more focus on translation. Using this approach, we achieve considerable improvements in terms of BLEU score on a relatively large parallel corpus (WMT14 English to German) and a low-resource (WIT German to English) setup.

1 Introduction

Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014) has recently become the state-of-the-art approach to machine translation (Bojar et al., 2016). One of the main advantages of neural approaches is the impressive ability of RNNs to act as feature extractors over the entire input (Kiperwasser and Goldberg, 2016), rather than focusing on local information. Neural architectures are able to extract linguistic properties from the input sentence in the form of morphology (Belinkov et al., 2017) or syntax (Linzen et al., 2016).

Nevertheless, as shown in Dyer et al. (2016) and Dyer (2017), systems that ignore explicit linguistic structures are incorrectly biased and tend to make overly strong linguistic generalizations. Providing explicit linguistic information (Dyer et al., 2016; Kuncoro et al., 2017; Niehues and Cho, 2017; Sennrich and Haddow, 2016; Eriguchi et al., 2017; Aharoni and Goldberg, 2017; Nadejde et al., 2017; Bastings et al., 2017; Matthews et al., 2018) has proven to be beneficial, achieving higher results in language modeling and machine translation.

Multi-task learning (MTL) consists of being able to solve synergistic tasks with a single model by jointly training multiple tasks that look alike. The final dense representations of the neural architectures encode the different objectives, and they leverage the information from each task to help the others. For example, tasks like multiword expression detection and part-of-speech tagging have been found very useful for others like combinatory categorial grammar (CCG) parsing, chunking and super-sense tagging (Bingel and Søgaard, 2017).

In order to perform accurate translations, we proceed by analogy to humans. It is desirable to acquire a deep understanding of the languages; and, once this is acquired, it is possible to learn how to translate gradually and with experience (including revisiting and re-learning some aspects of the languages). We propose a similar strategy by introducing the concept of Scheduled Multi-Task Learning (Section 4), in which we propose to interleave the different tasks.

In this paper, we propose to learn the structure of language (through syntactic parsing and part-of-speech tagging) with a multi-task learning strategy, with the intention of improving the performance of tasks like machine translation that use that structure and make generalizations. We achieve considerable improvements in terms of BLEU score on a relatively large parallel corpus (WMT14 English to German) and a low-resource (WIT German to English) setup. Our different scheduling strategies show interesting differences in performance both in the low-resource and standard setups.

2 Sequence to Sequence with Attention

Neural Machine Translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2014) directly models the conditional probability $p(y \mid x)$ of the target sequence of words $y$ given a source sequence $x$. In this paper, we base our neural architecture on the same sequence to sequence with attention model; in the following we explain the details and describe the nuances of our architecture.

2.1 Encoder

We use bidirectional LSTMs to encode the source sentences (Graves, 2012). Given a source sentence $x$, we embed the words into vectors through an embedding matrix $W_S$; the vector of the $i$-th word is $W_S x_i$. We get the representation of the $i$-th word by summarizing the information of neighboring words using bidirectional LSTMs (Bahdanau et al., 2014),

$h^F_i = \mathrm{LSTM}^F(h^F_{i-1}, W_S x_i)$ (1)

$h^B_i = \mathrm{LSTM}^B(h^B_{i+1}, W_S x_i).$ (2)

The forward and backward representations are concatenated to get the bi-directional encoder representation of word $i$ as $h_i = [h^F_i, h^B_i]$.

2.2 Decoder

The decoder generates one target word per time-step; hence, we can decompose the conditional probability as

$\log p(y \mid x) = \sum_j \log p(y_j \mid y_{<j}, x)$ (3)

[…]

$[\ldots]_j\, h_i$ (4)

$\alpha_i = \dfrac{\exp(\text{[…]})}{\sum_k \exp(\text{[…]})}.$ (5)

A vector representation ($c_j$) capturing the information relevant to this time-step is computed by a weighted sum of the encoded source vector representations using the $\alpha$ values as weights:

$c_j = \sum_i \alpha_i \cdot h_i.$ (6)

Given the sentence representation produced by the attention mechanism ($c_j$) and the decoder state capturing the translated words so far ($d_j$), the model decodes the next word in the output sequence. The decoding is done using a multi-layer perceptron which receives $c_j$ and $d_j$ and outputs a score for each word in the target vocabulary:

$g_j = \tanh(W^1_{Dec} d_j + W^1_{Att} c_j)$ (7)

$u_j = \tanh(g_j + W^2_{Dec} d_j + W^2_{Att} c_j)$ (8)

p(y_j | y […]
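As a rough illustration of the decoder step described above, the following pure-Python sketch computes softmax-normalized attention weights (the form of Eq. 5), the context vector c_j as the weighted sum of encoder states (Eq. 6), and the two tanh layers of Eqs. 7 and 8. The scoring function that feeds the softmax is not specified here, so the dot product between d_j and h_i is an assumption made only for this example; the function names, toy dimensions and weight values are likewise hypothetical, and the final projection of u_j onto the target vocabulary is left out.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def softmax(scores):
    """Exponentiate and normalize scores so they sum to 1 (form of Eq. 5)."""
    m = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(d_j, encoder_states):
    """Return attention weights and the context vector c_j (Eq. 6)."""
    # ASSUMPTION: dot-product scoring; the paper's own scoring function
    # is not reproduced here.
    scores = [dot(d_j, h_i) for h_i in encoder_states]
    alphas = softmax(scores)
    dim = len(encoder_states[0])
    c_j = [sum(a * h[k] for a, h in zip(alphas, encoder_states))
           for k in range(dim)]
    return alphas, c_j


def decoder_mlp(d_j, c_j, W1_dec, W1_att, W2_dec, W2_att):
    """Two tanh layers of Eqs. 7 and 8; the second layer adds g_j back in."""
    g_j = [math.tanh(a + b)
           for a, b in zip(matvec(W1_dec, d_j), matvec(W1_att, c_j))]
    u_j = [math.tanh(g + a + b)
           for g, a, b in zip(g_j, matvec(W2_dec, d_j), matvec(W2_att, c_j))]
    return g_j, u_j


# Toy inputs (hypothetical): three encoder states h_i and one decoder state d_j.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d_j = [0.5, -0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weight matrices

alphas, c_j = attention_context(d_j, encoder_states)
g_j, u_j = decoder_mlp(d_j, c_j, identity, identity, identity, identity)
print("alphas:", alphas)
print("c_j:", c_j)
print("u_j:", u_j)
```

In a trained model the weight matrices would be learned and the encoder states would come from the bidirectional LSTM of Section 2.1; identity matrices are used here only to keep the arithmetic easy to follow.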