Transactions of the Association for Computational Linguistics, vol. 6, pp. 225–240, 2018. Action Editor: Philipp Koehn.
Submission batch: 10/2017; Revision batch: 2/2018; Published 4/2018.
© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Scheduled Multi-Task Learning: From Syntax to Translation

Eliyahu Kiperwasser∗
Computer Science Department
Bar-Ilan University
Ramat-Gan, Israel
elikip@gmail.com

Miguel Ballesteros
IBM Research
1101 Kitchawan Road, Route 134
Yorktown Heights, NY 10598, U.S.
miguel.ballesteros@ibm.com

∗ Work carried out during summer internship at IBM Research.

Abstract

Neural encoder-decoder models of machine translation have achieved impressive results, while learning linguistic knowledge of both the source and target languages in an implicit end-to-end manner. We propose a framework in which our model begins learning syntax and translation interleaved, gradually putting more focus on translation. Using this approach, we achieve considerable improvements in terms of BLEU score on a relatively large parallel corpus (WMT14 English to German) and a low-resource (WIT German to English) setup.

1 Introduction

Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014) has recently become the state-of-the-art approach to machine translation (Bojar et al., 2016). One of the main advantages of neural approaches is the impressive ability of RNNs to act as feature extractors over the entire input (Kiperwasser and Goldberg, 2016), rather than focusing on local information. Neural architectures are able to extract linguistic properties from the input sentence in the form of morphology (Belinkov et al., 2017) or syntax (Linzen et al., 2016).

However, as shown in Dyer et al. (2016) and Dyer (2017), systems that ignore explicit linguistic structures are incorrectly biased and tend to make overly strong linguistic generalizations. Providing explicit linguistic information (Dyer et al., 2016; Kuncoro et al., 2017; Niehues and Cho, 2017; Sennrich and Haddow, 2016; Eriguchi et al., 2017; Aharoni and Goldberg, 2017; Nadejde et al., 2017; Bastings et al., 2017; Matthews et al., 2018) has proven to be beneficial, achieving higher results in language modeling and machine translation.

Multi-task learning (MTL) consists of being able to solve synergistic tasks with a single model by jointly training on multiple tasks that look alike. The final dense representations of the neural architectures encode the different objectives, and they leverage the information from each task to help the others. For example, tasks like multiword expression detection and part-of-speech tagging have been found very useful for others like combinatory categorial grammar (CCG) parsing, chunking and supersense tagging (Bingel and Søgaard, 2017).

In order to perform accurate translations, we proceed by analogy to humans. It is desirable to acquire a deep understanding of the languages; and, once this is acquired, it is possible to learn how to translate gradually and with experience (including revisiting and re-learning some aspects of the languages). We propose a similar strategy by introducing the concept of Scheduled Multi-Task Learning (Section 4), in which we propose to interleave the different tasks.

In this paper, we propose to learn the structure of language (through syntactic parsing and part-of-speech tagging) with a multi-task learning strategy, with the intention of improving the performance of tasks like machine translation that use that structure and make generalizations. We achieve considerable improvements in terms of BLEU score on a relatively large parallel corpus (WMT14 English to German) and a low-resource (WIT German to English) setup. Our different scheduling strategies show interesting differences in performance both in the low-resource and standard setups.

2 Sequence to Sequence with Attention

Neural Machine Translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2014) directly models the conditional probability p(y|X) of the target sequence of words y =