Transactions of the Association for Computational Linguistics, 1 (2013) 327–340. Action Editor: Philipp Koehn.

Submitted 1/2013; Revised 5/2013; Published 7/2013.

© 2013 Association for Computational Linguistics.

Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation

Arianna Bisazza and Marcello Federico
Fondazione Bruno Kessler
Trento, Italy
{bisazza,federico}@fbk.eu

Abstract

Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. In fact, the optimal trade-off between accuracy and complexity of decoding is nowadays reached by harshly limiting the input permutation space. We propose a method to dynamically shape such space and, thus, capture long-range word movements without hurting translation quality nor decoding time. The space defined by loose reordering constraints is dynamically pruned through a binary classifier that predicts whether a given input word should be translated right after another. The integration of this model into a phrase-based decoder improves a strong Arabic-English baseline already including state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.

1 Introduction

Word order differences are among the most important factors determining the performance of statistical machine translation (SMT) on a given language pair (Birch et al., 2009). This is particularly true in the framework of phrase-based SMT (PSMT) (Zens et al., 2002; Koehn et al., 2003; Och and Ney, 2002), an approach that remains highly competitive despite the recent advances of the tree-based approaches.

During the PSMT decoding process, the output sentence is built from left to right, while the input sentence positions can be covered in different orders. Thus, reordering in PSMT can be viewed as the problem of choosing the input permutation that leads to the highest-scoring output sentence. Due to efficiency reasons, however, the input permutation space cannot be fully explored, and is therefore limited with hard reordering constraints.

Although many solutions have been proposed to explicitly model word reordering during decoding, PSMT still largely fails to handle long-range word movements in language pairs with different syntactic structures.1 We believe this is mostly not due to deficiencies of the existing reordering models, but rather to a very coarse definition of the reordering search space. Indeed, the existing reordering constraints are rather simple and typically based on word-to-word distances. Moreover, they are uniform throughout the input sentence and insensitive to the actual words being translated. Relaxing this kind of constraints means dramatically increasing the size of the search space and making the reordering model's task extremely complex. As a result, even in language pairs where long reordering is regularly observed, PSMT quality degrades when long word movements are allowed to the decoder.

We address this problem by training a binary classifier to predict whether a given input position should be translated right after another, given the words at those positions and their contexts.

1 For empirical evidence, see for instance (Birch et al., 2009; Galley and Manning, 2008; Bisazza and Federico, 2012).


When this model is integrated into the decoder, its predictions can be used not only as an additional feature function, but also as an early indication of whether or not a given reordering path should be further explored. More specifically, at each hypothesis expansion, we consider the set of input positions that are reachable according to the usual reordering constraints, and prune it based only on the reordering model score. Then, the hypothesis can be expanded normally by covering the non-pruned positions. This technique makes it possible to dynamically shape the search space while decoding with a very high distortion limit, which can improve translation quality and efficiency at the same time.

The remainder of the paper is organized as follows. After an overview of the relevant literature, we describe in detail our word reordering model. In the following section, we introduce early pruning of reordering steps as a way to dynamically shape the input permutation space. Finally, we present an empirical analysis of our approach, including intrinsic evaluation of the model and SMT experiments on a well-known Arabic-English news translation task.

2 Previous Work

In this paper, we focus on methods that guide the reordering search during the phrase-based decoding process. See for instance (Costa-jussà and Fonollosa, 2009) for a review of pre- and post-reordering approaches that are not treated here.

Assuming a one-to-one correspondence between source and target phrases, reordering in PSMT can be viewed as the problem of searching through a set of permutations of the input sentence. Thus, two sub-problems arise: defining the set of allowed permutations (reordering constraints) and scoring the allowed permutations according to some likelihood criterion (reordering model). We begin with the latter, returning to the constraints later in this section.

2.1 Reordering modeling

In its original formulation, the PSMT approach includes a basic reordering model, called distortion cost, that exponentially penalizes longer jumps among consecutively translated phrases (f̃_{i−1}, f̃_i):

d(f̃_{i−1}, f̃_i) = e^{−|start(f̃_i) − end(f̃_{i−1}) − 1|}

A number of more sophisticated solutions have been proposed to explicitly model word reordering during decoding. These can mostly be grouped into three families: phrase orientation models, jump models and source decoding sequence models.

Phrase orientation models (Tillmann, 2004; Koehn et al., 2005; Zens and Ney, 2006; Galley and Manning, 2008), also known as lexicalized reordering models, predict the orientation of a phrase with respect to the last translated one, by classifying it as monotone, swap or discontinuous. These models have proven very useful for short and medium-range reordering and are among the most widely used in PSMT. However, their coarse classification of reordering steps makes them unsuitable to predict long-range reorderings.

Jump models (Al-Onaizan and Papineni, 2006; Green et al., 2010; Yahyaei and Monz, 2010) predict the direction and length of a jump to perform after a given input word.2 Both these works achieve their best Arabic-English results within a rather small DL: namely, 8 in (Al-Onaizan and Papineni, 2006) and 5 in (Green et al., 2010), thus failing to capture the rare but crucial long reorderings that were their main motivation. A drawback of this approach is that long jumps are typically penalized because of their low frequency compared to short jumps. This strong bias is undesirable, given that we are especially interested in detecting probable long reorderings.

Source decoding sequence models predict which input word is likely to be translated at a given state of decoding. For instance, reordered source language models (Feng et al., 2010) are smoothed n-gram models trained on a corpus of source sentences reordered to match the target word order. When integrated into the SMT system, they assign a probability to each newly translated word given the n−1 previously translated words. Finally, source word pair reordering models (Visweswariah et al., 2011) estimate, for each pair of input words i and j, the cost of translating j right after i given various features of i, j and their respective contexts. Differently from reordered source LMs, these models are discriminative and can profit from richer feature sets. At the same time, they do not employ decoding history-based features, which allows for more effective hypothesis recombination. The model we are going to present belongs to this last sub-group, which we find especially suitable to predict long reorderings.

2 In this paper, a source word denotes the word at a given position of the input sentence, rather than a word type.
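As a concrete illustration of the basic distortion cost defined above, the following minimal sketch computes it from the source-side span boundaries of consecutively translated phrases (function names are ours; positions are 1-based, as in the formula):

```python
import math

def distortion(prev_end: int, next_start: int) -> int:
    """D(f_{i-1}, f_i) = |start(f_i) - end(f_{i-1}) - 1|.
    Zero when translation proceeds monotonically over the source."""
    return abs(next_start - prev_end - 1)

def distortion_cost(prev_end: int, next_start: int) -> float:
    """Basic PSMT distortion cost d = e^{-D}: longer jumps are
    penalized exponentially, regardless of the words involved."""
    return math.exp(-distortion(prev_end, next_start))

# Monotone continuation: previous phrase ends at position 3, next starts at 4.
assert distortion(3, 4) == 0 and distortion_cost(3, 4) == 1.0
# Jumping backward over the source (previous phrase ended at 9, next starts at 4).
assert distortion(9, 4) == 6
```

The word-insensitivity visible here is exactly the coarseness that the models discussed in this section try to overcome.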


2.2 Reordering constraints

The reordering constraint originally included in the PSMT framework and implemented in our reference toolkit, Moses (Koehn et al., 2007), is called distortion limit (DL). This consists in allowing the decoder to skip, or jump, at most k words from the last translated phrase to the next one. More precisely, the limit is imposed on the distortion D between consecutively translated phrases (f̃_{i−1}, f̃_i):

D(f̃_{i−1}, f̃_i) = |start(f̃_i) − end(f̃_{i−1}) − 1| ≤ DL

Limiting the input permutation space is necessary for beam-search PSMT decoders to function in linear time. Reordering constraints are also important for translation quality because the existing models are typically not discriminative enough to guide the search over very large sets of reordering hypotheses. Despite their crucial effects on the complexity of reordering modeling, however, reordering constraints have drawn less attention in the literature. The existing reordering constraints are typically based on word-to-word distances – IBM (Berger et al., 1996) and DL (Koehn et al., 2007) – or on permutation patterns – ITG (Wu, 1997). Both kinds of constraints are uniform throughout the input sentence, and insensitive to the word being translated and to its context. This results in a very coarse definition of the reordering search space, which is problematic in language pairs with different syntactic structures.

To address this problem, Yahyaei and Monz (2010) present a technique to dynamically set the DL: they train a classifier to predict the most probable jump length after each input word, and use the predicted value as the DL after that position. Unfortunately, this method can generate inconsistent constraints leading to decoding dead-ends. As a solution, the dynamic DL is relaxed when needed to reach the first uncovered position. Translation improvements are reported only on a small-scale task with short sentences (BTEC), over a baseline that includes a very simple reordering model. In our work we develop this idea further and use a reordering model to predict which specific input words, rather than input intervals, are likely to be translated next. Moreover, our solution is not affected by the constraint inconsistency problem (see Sect. 4).

In another related work, Bisazza and Federico (2012) generate likely reorderings of the input sentence by means of language-specific fuzzy rules based on shallow syntax. Long jumps are then suggested to the PSMT decoder by reducing the distortion cost for specific pairs of input words. In comparison to the dynamic DL, that is a much finer way to define the reordering space, leading to consistent improvements of both translation quality and efficiency over a strong baseline. However, the need of specific reordering rules makes the method harder to apply to new language pairs.

3 The WaW reordering model

We model reordering as the problem of deciding whether a given input word should be translated after another (Word-after-Word). This formulation is particularly suitable to help the decoder decide whether a reordering path is promising enough to be further explored. Moreover, when translating a sentence, choosing the next source word to translate appears as a more natural problem than guessing how much to the left or to the right we should move from the current source position.

The WaW reordering model addresses a binary decision task through the following maximum-entropy classifier:

P(R_ij = Y | f_1^J, i, j) = exp[ Σ_m λ_m h_m(f_1^J, i, j, R_ij = Y) ] / Σ_{Y′} exp[ Σ_m λ_m h_m(f_1^J, i, j, R_ij = Y′) ]

where f_1^J is a source sentence of J words, h_m are feature functions and λ_m the corresponding feature weights. The outcome Y can be either 1 or 0, with R_ij = 1 meaning that the word at position j is translated right after the word at position i.
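For a binary outcome, the maximum-entropy classifier above can be scored as in the following sketch, in which features are simplified to plain strings and the trained weights λ are assumed to be given (feature names and weight values are invented for illustration):

```python
import math

def waw_probability(active_features, weights):
    """P(R_ij = 1 | f_1^J, i, j) for a binary maximum-entropy model:
    each outcome Y in {0, 1} gets exp(sum of lambda_m over features h_m
    active with that outcome); the two scores are then normalized."""
    unnorm = {
        y: math.exp(sum(weights.get((feat, y), 0.0) for feat in active_features))
        for y in (0, 1)
    }
    return unnorm[1] / (unnorm[0] + unnorm[1])

# Hypothetical active features for a candidate position pair (i, j):
feats = ["p_i=VB", "p_j=NN", "same_chunk=no"]
weights = {("p_i=VB", 1): 1.2, ("p_j=NN", 1): 0.4}  # invented values
p = waw_probability(feats, weights)
# Total weight 1.6 for Y=1 and 0 for Y=0, so p = sigmoid(1.6) ~ 0.83.
assert abs(p - 1 / (1 + math.exp(-1.6))) < 1e-9
```

With no informative features the model backs off to P = 0.5, since both outcomes receive the same unnormalized score.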


Our WaW reordering model is strongly related to that of Visweswariah et al. (2011) – hereby called Travelling Salesman Problem (TSP) model – with a few important differences: (i) we do not include in the features any explicit indication of the jump length, in order to avoid the bias on short jumps; (ii) they train a linear model with MIRA (Crammer and Singer, 2003) by minimizing the number of input words that get placed after the wrong position, while we use a maximum-entropy classifier trained by maximum likelihood; (iii) they use an off-the-shelf TSP solver to find the best source sentence permutation and apply it as pre-processing to training and test data. By contrast, we integrate the maximum-entropy classifier directly into the SMT decoder and let all its other models (phrase orientation, translation, target LM, etc.) contribute to the final reordering decision.

3.1 Features

Like the TSP model (Visweswariah et al., 2011), the WaW model builds on binary features similar to those typically employed for dependency parsing (McDonald et al., 2005): namely, combinations of surface forms or POS tags of the words i and j and their context. Our feature templates are presented in Table 1. The main novelties with respect to the TSP model are the mixed word-POS templates (rows 16-17) and the shallow syntax features. In particular, we use the chunk types of i, j and their context (18-19), as well as the chunk head words of i and j (20). Finally we add a feature to indicate whether the words i and j belong to the same chunk (21). The jump orientation – forward/backward – is included in the features that represent the words comprised between i and j (rows 6, 7, 14, 15). No explicit indication of the jump length is included in any feature.

Table 1: Feature templates used to learn whether a source position j is to be translated right after i, defined over the positions i−2, i−1, i, i+1, b, j−1, j, j+1, where b denotes the positions comprised between i and j (w: word identity, p: POS tag, c: chunk type, h: chunk head). Rows 1-7 are word-identity templates, with one template per b position (row 6) and one for the concatenation of them all (row 7); rows 8-15 are the corresponding POS-tag templates (rows 14 and 15 over b); rows 16-17 are mixed word/POS templates; rows 18-19 are chunk-type templates; row 20 combines the chunk heads of i and j; row 21 asks whether i and j belong to the same chunk.

3.2 Training data

To generate training data for the classifier, we first extract reference reorderings from a word-aligned parallel corpus. Given a parallel sentence, different heuristics may be used to convert arbitrary word alignments to a source permutation (Birch et al., 2010; Feng et al., 2010; Visweswariah et al., 2011). Similarly to this last work, we compute for each source word f_i the mean a_i of the target positions aligned to f_i, then sort the source words according to this value.3 As a difference, however, we do not discard unaligned words but assign them the mean of their neighbouring words' alignment means, so that a complete permutation of the source sentence (σ) is obtained. Table 2(a) illustrates this procedure.

3 Using the mean of the aligned indices makes the generation of reference permutations more robust to alignment errors. Admittedly, this heuristic does not handle well the case of source words that are correctly aligned to non-consecutive target words. However, this phenomenon is also not captured by standard PSMT models, which only learn continuous phrases.

Given the reference permutation, we then generate positive and negative training samples by simulating the decoding process. We traverse the source positions in the order defined by σ, keeping track of the positions that have already been covered and, for each t: 1 ≤ t ≤ J, generate:

• one positive sample (R_{σt, σt+1} = 1) for the source position that comes right after it,
• a negative sample (R_{σt, u} = 0) for each source position in {u: σt − δ + 1 […]

[…] with D > 7 and forward jumps with D > 6. In Table 4, results are compared with the ranking produced by standard distortion, which always favors shorter jumps. Two conditions are considered: DL=10, corresponding to the sampling window δ used to produce the training data, and DL=18, that is the maximum distortion of jumps that will be considered in our early-pruning SMT experiment.

Table 4: Word-to-word jump ranking accuracy (%) of standard distortion and the WaW reordering model, in different DL conditions. DL-err is the percentage of correct jumps beyond DL. The test set consists of 40K reordering decisions: one for each source word in TIDES-MT04.

Model       DL   DL-err   Top-1   Top-3   Top-3-long (back)   Top-3-long (forw.)
Distortion  10   2.4      61.8    79.6    50.7                66.0
            18   0.8      62.0    80.0    18.9                52.3
WaW         10   2.4      71.2    91.2    76.4                69.3
            18   0.8      71.2    91.8    68.0                51.8

We can see that, in terms of overall accuracies, the WaW reordering model outperforms standard distortion by a large margin (about 10% absolute). This is an important result, considering that the jump length, strongly correlating with the jump likelihood, is not directly known to our model. As regards the DL, the higher limit naturally results in a lower DL-error rate (percentage of correct jumps beyond DL): namely 0.8% instead of 2.4%. However, jump prediction becomes much harder: Top-3 accuracy of long jumps by distortion drops from 50.7% to 18.9% (backward) and from 66.0% to 52.3% (forward). Our model is remarkably robust to this effect on backward jumps, where it achieves 68.0% accuracy.

Due to the syntactic characteristics of Arabic and English, the typical long reordering pattern consists in (i) skipping a clause-initial Arabic verb, (ii) covering a long subject, then finally (iii) jumping back to translate the verb and (iv) jumping forward to continue translating the rest of the sentence (see Fig. 3 for an example).12 Deciding when to jump back to cover the verb (iii) is the hardest part of this process, and that is precisely where our model seems more helpful, while distortion always prefers to proceed monotonically, achieving a very low accuracy of 18.9%. In the case of long forward jumps (iv), instead, distortion is advantaged, as the correct choice typically corresponds to translating the first uncovered position, that is the shortest jump available from the last translated word. Even here, our model achieves an accuracy of 51.8%, only slightly lower than that of distortion (52.3%).

In summary, the WaW reordering model significantly outperforms distortion in the ranking of long jumps. In the large majority of cases, it is able to rank a correct long jump in the top 3 reordering options, which suggests that it can be effectively used for early reordering pruning.

12 Clearly, we would expect different figures from testing the model on another language pair like German-English, where the verb is often postponed in the source with respect to the target.

5.2 SMT experimental setup

Our SMT systems are built with the Moses toolkit, while word alignment is produced by the Berkeley Aligner (Liang et al., 2006). The baseline decoder includes a phrase translation model, a lexicalized reordering model, a 6-gram target language model, distortion cost, and word and phrase penalties. More specifically, the baseline reordering model is a hierarchical phrase orientation model (Tillmann, 2004; Koehn et al., 2005; Galley and Manning, 2008) trained on all the available parallel data. This variant was shown to outperform the default word-based one on an Arabic-English task. To make our baseline even more competitive, we apply early distortion cost, as proposed by Moore and Quirk (2007). This function has the same value as the standard one over a complete translation hypothesis, but it anticipates the gradual accumulation of the cost, making hypotheses of the same length more comparable to one another. Note that this option has no effect on the distortion limit, but only on the distortion cost feature function. As proposed by Johnson et al. (2007), statistically improbable phrase pairs are removed from the translation model. The language models are estimated by the IRSTLM toolkit (Federico et al., 2008) with modified Kneser-Ney smoothing (Chen and Goodman, 1999).

Feature weights are optimized by minimum BLEU-error training (Och, 2003) on dev06-nw. To reduce the effects of the optimizer instability, we tune each configuration four times and use the average of the resulting weight vectors to translate the test sets, as suggested by Cettolo et al. (2011). Finally, eval08-nw is used to select the early pruning parameters for the last experiment, while eval09-nw is always reserved as blind test.

5.3 Evaluation metrics

We evaluate global translation quality with BLEU (Papineni et al., 2002) and METEOR (Banerjee and Lavie, 2005). These metrics, however, are only indirectly sensitive to word order, and especially unlikely to capture improvements at the level of long-range reordering. For this reason, we also compute the Kendall Reordering Score, or KRS (Birch et al., 2010), which is a positive score based on the Kendall's Tau distance between the source-output permutation π and the source-reference permutation σ:

KRS(π, σ) = (1 − K(π, σ)) · BP

K(π, σ) = [ Σ_{i=1}^{n} Σ_{j=1}^{n} d(i, j) ] / [ ½ n(n−1) ]

d(i, j) = 1 if π_i < π_j and σ_i > σ_j, 0 otherwise

where BP is a sentence-level brevity penalty, similar to that of BLEU. The KRS is robust to lexical choice because it performs no comparison between output and reference words, but only between the positions of their translations. Besides, it was shown to correlate strongly with human judgements of fluency.

Our work specifically addresses long-range reordering phenomena in language pairs where these are quite rare, although crucial for preserving the source text meaning. Hence, an improvement at this level may not be detected by the general-purpose metrics.
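The KRS defined above can be sketched directly from its definition, as a naive O(n²) count of discordant position pairs (the brevity penalty is passed in rather than derived, and both permutations map each source position to its target-side rank):

```python
def krs(pi, sigma, bp=1.0):
    """Kendall Reordering Score between the source-output permutation pi
    and the source-reference permutation sigma: 1 minus the normalized
    number of discordant pairs, times a brevity penalty bp."""
    n = len(pi)
    discordant = sum(
        1
        for i in range(n)
        for j in range(n)
        if pi[i] < pi[j] and sigma[i] > sigma[j]
    )
    return (1.0 - discordant / (0.5 * n * (n - 1))) * bp

# Output ordered exactly like the reference: no discordant pairs.
assert krs([0, 1, 2, 3], [0, 1, 2, 3]) == 1.0
# One swapped pair out of n(n-1)/2 = 6 pairs: KRS = 1 - 1/6.
assert abs(krs([0, 2, 1, 3], [0, 1, 2, 3]) - (1 - 1 / 6)) < 1e-9
```

Because only positions are compared, a hypothesis with wrong lexical choices but correct word order still scores 1.0, which is exactly the robustness property noted above.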

ngtimeofallmodels.Medium-scaleevaluation.IntegratingtheWaWmodelasanadditionalfeaturefunctionresultsinsmallbutconsistentimprovementsinallDLcondi-tions,whichshowsthatthistypeofmodelconveysinformationthatismissingfromthestate-of-the-artreorderingmodels.Asregardsefficiency,thenewmodelmakesdecodingtimeincreaseby8%.AmongtheDLsettingsconsidered,DL=8iscon-firmedastheoptimalone–withorwithoutWaWmodel.RaisingtheDLto18withnospecialprun-inghasanegativeimpactonbothtranslationqualityandefficiency.Theeffectisespeciallyvisibleonthereorderingscores:thatis,from84.7to83.9KRSandfrom86.2to85.8KRS-Voneval09-nw.Runtimesarealmostdoubled:from87to164andfrom94to178ms/word,thatisa89%increase.WethenproceedtothelastexperimentwherethereorderingspaceisdynamicallyprunedbasedontheWaWmodelscore.AsexplainedinSect.4,anon-prunable-zoneofwidthϑ=5issetaroundthelastcoveredposition.Tosettheearlypruningpa-rameters,weperformagridsearchoverthevalues(1,2,3,4,5)forhistogramand(0.5,0.25,0.1)forrelativethreshold,andselectthevaluesthatachievethebestBLEUandKRSoneval08-nw,namely3(his-togram)and0.1(límite).Theresultingconfigu-rationisthenre-optimizedbyMERTondev06-nw.Thissettingimpliesthat,atagivenpointofdecod-ingwhereiisthelastcoveredposition,anewwordcanbetranslatedonlyif:•itlieswithinaDLof5fromi,or•itlieswithinaDLof18fromianditsWaWreorderingscoreisamongthetop3andatleastequalto1/10ofthebestscore(inlinearspace).AsshowninTable5,earlypruningachievesthebestresultsoverall:despitethehighDL,wereport l D o w n o a d e d f r o m h t t p : / / directo . mi t . e d u / t a c l / lartice – pdf / ¿yo? / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 2 3 1 1 5 6 6 6 5 7 / / t l a c _ a _ 0 0 2 3 1 pd . 
f por invitado 0 7 septiembre 2 0 2 3 336 DLReo.modelseval08-nweval09-nwvs-09ms/bleumetkrskrs-Vbleumetkrskrs-Vkrs-VwordUsingthemedium-sizeLM(147MEnglishtokens):5hier.lexreo,earlydisto44.735.1!83.0!84.7!50.3″38.184.685.984.759+WaWmodel44.835.183.785.451.0#38.3#85.1#86.6$85.5#648hier.lexreo,earlydisto[B]44.835.283.485.650.638.184.786.284.887+WaWmodel45.035.283.7$85.951.1#38.3#85.1#86.8#85.8#9418hier.lexreo,earlydisto44.734.9!82.4!84.9!50.338.0″83.9!85.8″84.3″164+WaWmodel44.835.282.7!85.551.0$38.3#84.2″86.285.2178+earlyreo.pruning(ϑ=5)45.035.383.7$86.3#50.938.3#84.987.0#86.2#68UsingthelargeinterpolatedLM(2130MEnglishtokens)anddoublebeam-size:8hier.lexreo,earlydisto[B]46.335.083.285.051.638.384.585.884.5257918hier.lexreo,earlydisto45.9″34.9!81.7!84.1!51.438.1!83.0!84.6!83.1!5462+WaW+reo.pruning(ϑ=5)46.335.283.485.7#52.8#38.6#84.686.6#85.5#1588Table5:EffectsofWaWreorderingmodelingandearlyreorderingpruningontranslationquality,measuredwith%BLEU,METEOR,andKendallReorderingScore:regular(KRS)andverb-specific(KRS-V).Statisticallysignificantdifferenceswithrespecttothebaseline[B]aremarkedwith#!atthep≤.05leveland$»atthep≤.10level.Decodingtimeismeasuredinmillisecondsperinputword.nolossinBLEU,METEORandKRS,butweactuallyseeseveralimprovements.Inparticular,thegainsontheblindtesteval09-nware+0.3BLEU,+0.2ME-TEORand+0.2KRS(onlyMETEORissignificant).Whilethesegainsareadmittedlysmall,werecallthatourtechniquesaffectrareandisolatedeventswhichcanhardlyemergefromthegeneral-purposeevaluationmetrics.Moreover,toourknowledge,thisisthefirsttimethataPSMTsystemisshowntomaintainagoodperformanceonthislanguagepairwhileadmittingverylong-rangereorderings.Finallyandmoreimportantly,thereorderingofverbsimprovessignificantlyonbothgenerictestsandontheVS-sentencesubset(vs-09):a 
saber,inthelatter,weachieveanotablegainof1.4KRS-V.Efficiencyisalsolargelyimprovedbyourearlyreorderingpruningtechnique:decodingtimeisre-ducedto68ms/word,correspondingtoa22%speed-upoverthebaseline.Large-scaleevaluation.Wealsoinvestigatewhetherourmethodscanbeusefulinascenariowhereefficiencyislessimportantandmoredataisavailablefortraining.Tothisend,webuildaverylargeLMbyinterpolatingthemainLMwiththreeotherLMstrainedondifferentGigawordsec-tions(seeSect.5).Además,werelaxthedecoder’sbeamsizefromthedefaultvalueof200to400hy-potheses,toreducetheriskofsearcherrorsandob-tainthebestpossiblebaselineperformance.Bycomparingthelarge-scalewiththemedium-scalebaselineinTable5,wenotethattheadditionofLMdataisespeciallybeneficialforBLEU(+1.5oneval08-nwand+1.0oneval09-nw),butnotasmuchfortheothermetrics,whichchallengesthecommonlyheldideathatmoredataalwaysimprovestranslationquality.Heretoo,relaxingtheDLwithoutspecialpruninghurtsnotonlyefficiencybutalsotranslationqual-ity:allthescoresdecreaseconsiderably,showingthateventhestrongerLMisnotsufficienttoguidesearchthroughaverylargereorderingsearchspace.Asforourenhancedsystem,itachievessimi-largainsasinthemedium-scalescenario:thatis,BLEUandMETEORarepreservedorslightlyim-proveddespitetheveryhighDL,whileallthere-orderingscoresincrease.Inparticular,wereportsta-tisticallysignificantimprovementsinthereorderingofverbs,whichiswheretheimpactofourmethodisexpectedtoconcentrate(+0.7,+0.8and+1.0KRS-Voneval08-nw,eval09-nwandvs-09,respectivamente).Theseresultsconfirmtheusefulnessofourmethodnotonlyasanoptimizationtechnique,butalsoasawaytoimprovetranslationqualityontopofaverystrongbaseline.
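Putting the two bullet conditions above together, the early pruning rule can be sketched as follows. This is a simplification of the actual decoder integration (coverage bookkeeping and hypothesis scoring are omitted), and `waw_score` stands for the classifier's probability that a candidate position is translated next; the toy scores are invented:

```python
def prune_candidates(last, candidates, waw_score, theta=5, dl=18, hist=3, thr=0.1):
    """Return the input positions that may be expanded next.

    A candidate j survives if it lies within the non-prunable zone of
    width `theta` around the last covered position `last`, or if it lies
    within the loose distortion limit `dl` AND its WaW score is among
    the top `hist` candidates and at least `thr` times the best score.
    """
    in_zone = [j for j in candidates if abs(j - last - 1) <= theta]
    outside = [j for j in candidates if theta < abs(j - last - 1) <= dl]
    if not outside:
        return sorted(in_zone)
    ranked = sorted(outside, key=waw_score, reverse=True)
    best = waw_score(ranked[0])
    kept = [j for j in ranked[:hist] if waw_score(j) >= thr * best]
    return sorted(in_zone + kept)

# Toy scores favoring a long backward jump to position 1 (e.g., a skipped verb):
scores = {1: 0.9, 20: 0.05, 25: 0.4}
keep = prune_candidates(last=12, candidates=[1, 13, 14, 20, 25],
                        waw_score=lambda j: scores.get(j, 0.0))
assert 13 in keep and 14 in keep   # non-prunable zone always survives
assert 1 in keep and 25 in keep    # high-scoring long jumps survive
assert 20 not in keep              # pruned: 0.05 < 0.1 * 0.9
```

The defaults mirror the settings selected above (ϑ=5, DL=18, histogram 3, relative threshold 0.1); the non-prunable zone is what guarantees that, unlike a dynamic DL, the constraints can never produce a decoding dead-end.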


Figure 3: Two Arabic sentences from eval09-nw whose clause-initial verbs require long-range reordering. First example, SRC (transliterated): ywASl sfyr Almmlkp AlErbyp AlsEwdyp ldY lbnAn EbdAlEzyz xwjp tHrk-h fy AtjAh (verb / subject / object / complement; gloss: continues | ambassador | Kingdom | Arabian | Saudi | to | Lebanon | Abdulaziz | Khawja | move-his | in | direction). REF: "The Kingdom of Saudi Arabia's ambassador to Lebanon Abdulaziz Khawja continues his moves towards". BASE: "continue to Saudi Arabian ambassador to Lebanon, Abdulaziz Khwja its move in the direction of". NEW: "The Kingdom of Saudi Arabia's ambassador to Lebanon, Abdulaziz Khwja continue its move in the direction of". […]

[…] jumps with D > 5 were performed to translate the 586 sentences of eval09-nw; 38 out of these were longer than 8 and mostly concentrated on the VS-sentence subset (27 jumps with D > 8 performed in vs-09).13 This and the higher reordering scores suggest that long jumps are mainly carried out to correctly reorder clause-initial verbs over long subjects.

Fig. 3 shows two Arabic sentences taken from eval09-nw that were erroneously reordered by the baseline system. The system including the WaW model and early reordering pruning, instead, produced the correct translation. The first sentence is a typical example of VSO order with a long subject: while the baseline system left the verb in its Arabic position, producing an incomprehensible translation, the new system placed it rightly between the English subject and object. This reordering involved two long jumps: one with D=9 backward and one with D=8 forward.

The second sentence displays another, less common, Arabic construction: namely VOS, with a personal pronoun object. In this case, a backward jump with D=10 and a forward jump with D=8 were necessary to achieve the correct reordering.

13 Statistics computed on the medium-LM system.

6 Conclusions

We have trained a discriminative model to predict likely reordering steps in a way that is complementary to state-of-the-art PSMT reordering models. We have effectively integrated it into a PSMT decoder as an additional feature, ensuring that its total score over a complete translation hypothesis is consistent across different phrase segmentations. Lastly, we have proposed early reordering pruning as a novel method to dynamically shape the input reordering space and capture long-range reordering phenomena that are often critical when translating between languages with different syntactic structures.

Evaluated on a popular Arabic-English news translation task against a strong baseline, our approach leads to similar or even higher BLEU, METEOR and KRS scores at a very high distortion limit (18), which is by itself an important achievement. At the same time, the reordering of verbs, measured with a novel version of the KRS, is consistently improved, while decoding gets significantly faster. The improvements are also confirmed when a very large LM is used and the decoder's beam size is doubled, which shows that our method reduces not only search errors but also model errors even when baseline models are very strong.

Word reordering is probably the most difficult aspect of SMT and an important factor of both its quality and efficiency.


word reordering during decoding, rather than before or after it. To date, however, this objective was only partially achieved. We believe there is a promising way to go between fully-integrated reordering models and monolingual pre-ordering methods. This work has started to explore it.

Acknowledgments

This work was partially funded by the European Union FP7 grant agreement 287658 (EU-BRIDGE).

References

Yaser Al-Onaizan and Kishore Papineni. 2006. Distortion models for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 529–536, Sydney, Australia, July.

Jacob Andreas, Nizar Habash, and Owen Rambow. 2011. Fuzzy syntactic reordering for phrase-based statistical machine translation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 227–236, Edinburgh, Scotland, July.

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan, June.

A. L. Berger, P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, J. R. Gillett, A. S. Kehler, and R. L. Mercer. 1996. Language translation apparatus and method of using context-based translation models. United States Patent, No. 5510981, Apr.

Alexandra Birch, Phil Blunsom, and Miles Osborne. 2009. A quantitative analysis of reordering phenomena. In StatMT '09: Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 197–205, Morristown, New Jersey, USA.

Alexandra Birch, Miles Osborne, and Phil Blunsom. 2010. Metrics for MT evaluation: evaluating reordering. Machine Translation, 24(1):15–26.

Arianna Bisazza and Marcello Federico. 2012. Modified distortion matrices for phrase-based statistical machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 478–487, Jeju Island, Korea, July.

Arianna Bisazza, Daniele Pighin, and Marcello Federico. 2012. Chunk-lattices for verb reordering in Arabic-English statistical machine translation. Machine Translation, Special Issue on MT for Arabic, 26(1-2):85–103.

Arianna Bisazza. 2013. Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation. Ph.D. thesis, University of Trento. http://eprints-phd.biblio.unitn.it/1019/.

Marine Carpuat, Yuval Marton, and Nizar Habash. 2012. Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment. Machine Translation, Special Issue on MT for Arabic, 26(1-2):105–120.

Mauro Cettolo, Nicola Bertoldi, and Marcello Federico. 2011. Methods for smoothing the optimizer instability in SMT. In MT Summit XIII: the Thirteenth Machine Translation Summit, pages 32–39, Xiamen, China.

Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 4(13):359–393.

Marta R. Costa-jussà and José A. R. Fonollosa. 2009. State-of-the-art word reordering approaches in statistical machine translation: A survey. IEICE Transactions on Information and Systems, E92-D(11):2179–2185.

Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res., 3:951–991, March.

Hal Daumé III. 2004. Notes on CG and LM-BFGS optimization of logistic regression. Paper available at http://pub.hal3.name, implementation available at http://hal3.name/megam.

Mona Diab, Kadri Hacioglu, and Daniel Jurafsky. 2004. Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks. In Daniel Marcu, Susan Dumais, and Salim Roukos, editors, HLT-NAACL 2004: Short Papers, pages 149–152, Boston, Massachusetts, USA.

Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. 2008. IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models. In Proceedings of Interspeech, pages 1618–1621, Melbourne, Australia.

Minwei Feng, Arne Mauser, and Hermann Ney. 2010. A source-side decoding sequence model for statistical machine translation. In Conference of the Association for Machine Translation in the Americas (AMTA), Denver, Colorado, USA.

Michel Galley and Christopher D. Manning. 2008. A simple and effective hierarchical phrase reordering model. In EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 848–856, Morristown, New Jersey, USA.

Spence Green, Conal Sathi, and Christopher D. Manning. 2009. NP subject detection in verb-initial Arabic clauses. In Proceedings of the Third Workshop

yo

D
oh
w
norte
oh
a
d
mi
d

F
r
oh
metro
h

t
t

pag

:
/
/

d
i
r
mi
C
t
.

metro

i
t
.

mi
d
tu

/
t

a
C
yo
/

yo

a
r
t
i
C
mi

pag
d

F
/

d
oh

i
/

.

1
0
1
1
6
2

/
t

yo

a
C
_
a
_
0
0
2
3
1
1
5
6
6
6
5
7

/

/
t

yo

a
C
_
a
_
0
0
2
3
1
pag
d

.

F

b
y
gramo
tu
mi
s
t

t

oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3

339

on Computational Approaches to Arabic Script-based Languages (CAASL3), Ottawa, Canada.

Spence Green, Michel Galley, and Christopher D. Manning. 2010. Improved models of distortion cost for statistical machine translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 867–875, Los Angeles, California.

Magnus R. Hestenes and Eduard Stiefel. 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6):409–436.

H. Johnson, J. Martin, G. Foster, and R. Kuhn. 2007. Improving translation quality by discarding most of the phrasetable. In Proceedings of EMNLP-CoNLL 07, pages 967–975.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003, pages 127–133, Edmonton, Canada.

Philipp Koehn, Amittai Axelrod, Alexandra Birch Mayne, Chris Callison-Burch, Miles Osborne, and David Talbot. 2005. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In Proc. of the International Workshop on Spoken Language Translation, October.

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177–180, Prague, Czech Republic.

Ravi Kumar and Sergei Vassilvitskii. 2010. Generalized distances between rankings. In Proceedings of the 19th International Conference on World Wide Web, pages 571–580, New York, New York, USA. ACM.

Percy Liang, Ben Taskar, and Dan Klein. 2006. Alignment by agreement. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 104–111, New York City, USA, June.

Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05, pages 523–530, Stroudsburg, Pennsylvania, USA.

Robert C. Moore and Chris Quirk. 2007. Faster beam-search decoding for phrasal statistical machine translation. In Proceedings of MT Summit XI, pages 321–327, Copenhagen, Denmark.

F. Och and H. Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 295–302, Philadelphia, PA.

Franz Josef Och. 2003. Minimum Error Rate Training in Statistical Machine Translation. In Erhard Hinrichs and Dan Roth, editors, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160–167.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association of Computational Linguistics (ACL), pages 311–318, Philadelphia, PA.

Stefan Riezler and John T. Maxwell. 2005. On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 57–64, Ann Arbor, Michigan, June.

Christoph Tillmann. 2004. A Unigram Orientation Model for Statistical Machine Translation. In Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL).

Karthik Visweswariah, Rajakrishnan Rajkumar, Ankur Gandhe, Ananthakrishnan Ramanathan, and Jiri Navratil. 2011. A word reordering model for improved machine translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 486–496, Edinburgh, Scotland, UK, July.

Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–403.

Sirvan Yahyaei and Christof Monz. 2010. Dynamic distortion in a discriminative reordering model for statistical machine translation. In International Workshop on Spoken Language Translation (IWSLT), Paris, France.

Richard Zens and Hermann Ney. 2006. Discriminative reordering models for statistical machine translation. In Proceedings on the Workshop on Statistical Machine Translation, pages 55–63, New York City, June.

R. Zens, F. J. Och, and H. Ney. 2002. Phrase-based statistical machine translation. In 25th German Conference on Artificial Intelligence (KI 2002), pages 18–32, Aachen, Germany. Springer Verlag.

yo

D
oh
w
norte
oh
a
d
mi
d

F
r
oh
metro
h

t
t

pag

:
/
/

d
i
r
mi
C
t
.

metro

i
t
.

mi
d
tu

/
t

a
C
yo
/

yo

a
r
t
i
C
mi

pag
d

F
/

d
oh

i
/

.

1
0
1
1
6
2

/
t

yo

a
C
_
a
_
0
0
2
3
1
1
5
6
6
6
5
7

/

/
t

yo

a
C
_
a
_
0
0
2
3
1
pag
d

.

F

b
y
gramo
tu
mi
s
t

t

oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3

