Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III.

Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III.

Submitted 2/2014; Revised 5/2014; Published 10/2014. c
(cid:13)

2014 Association for Computational Linguistics.

TREETALK:CompositionandCompressionofTreesforImageDescriptionsPolinaKuznetsova††StonyBrookUniversityStonyBrook,NYpkuznetsova@cs.stonybrook.eduVicenteOrdonez‡TamaraL.Berg‡‡UNCChapelHillChapelHill,Caroline du Nord{vicente,tlberg}@cs.unc.eduYejinChoi††††UniversityofWashingtonSeattle,WAyejin@cs.washington.eduAbstractWepresentanewtreebasedapproachtocomposingexpressiveimagedescriptionsthatmakesuseofnaturallyoccuringwebimageswithcaptions.Weinvestigatetworelatedtasks:imagecaptiongeneralizationandgen-eration,wheretheformerisanoptionalsub-taskofthelatter.Thehigh-levelideaofourapproachistoharvestexpressivephrases(astreefragments)fromexistingimagedescrip-tions,thentocomposeanewdescriptionbyselectivelycombiningtheextracted(andop-tionallypruned)treefragments.Keyalgo-rithmiccomponentsaretreecompositionandcompression,bothintegratingtreestructurewithsequencestructure.Ourproposedsystemattainssignificantlybetterperformancethanpreviousapproachesforbothimagecaptiongeneralizationandgeneration.Inaddition,ourworkisthefirsttoshowtheempiricalben-efitofautomaticallygeneralizedcaptionsforcomposingnaturalimagedescriptions.1IntroductionThewebisincreasinglyvisual,withhundredsofbil-lionsofusercontributedphotographshostedonline.Asubstantialportionoftheseimageshavesomesortofaccompanyingtext,rangingfromkeywords,tofreetextonwebpages,totextualdescriptionsdi-rectlydescribingdepictedimagecontent(i.e.cap-tions).Wetapintothelastkindoftext,usingnatu-rallyoccuringpairsofimageswithnaturallanguagedescriptionstocomposeexpressivedescriptionsforqueryimagesviatreecompositionandcompression.Suchautomaticimagecaptioningeffortscouldpotentiallybeusefulformanyapplications:fromautomaticorganizationofphotocollections,tofacil-itatingimagesearchwithcomplexnaturallanguagequeries,toenhancingwebaccessibilityforthevi-suallyimpaired.Ontheintellectualside,bylearn-ingtodescribethevisualworldfromnaturallyexist-ingwebdata,ourstudyextendsthedomainsoflan-guagegroundingtothehighlyexpressivelanguagethatpeopleuseintheireverydayonlineactivities.Therehasbeenarecentspikeineffortstoau-tomaticallydescribevisualcontentinnaturallan-guage(Yangetal.,2011;Kulkarnietal.,2011;Lietal.,2011;Farhadietal.,2010;Krishnamoorthyetal.,2013;ElliottandKeller,2013;YuandSiskind,2013;Socheretal.,2014).Thisreflectsthelongstandingunderstandingthatencodingthecomplex-itiesandsubtletiesofimagecontentoftenrequiresmoreexpressivelanguageconstructsthanasetoftags.Nowthatvisualrecognitionalgorithmsarebe-ginningtoproducereliableestimatesofimagecon-tent(Perronninetal.,2012;Dengetal.,2012a;Dengetal.,2010;Krizhevskyetal.,2012),thetimeseemsripetobeginexploringhigherlevelsemantictasks.Therehavebeentwomaincomplementarydirec-tionsexploredforautomaticimagecaptioning.Thefirstfocusesondescribingexactlythoseitems(e.g.,objects,attributes)thataredetectedbyvisionrecog-nition,whichsubsequentlyconfineswhatshouldbedescribedandhow(Yaoetal.,2010;Kulkarnietal.,2011;Kojimaetal.,2002).Approachesinthisdirec-tioncouldbeidealforvariouspracticalapplicationssuchasimagedescriptionforthevisuallyimpaired.However,itisnotclearwhetherthesemanticexpres-sivenessoftheseapproachescaneventuallyscaleuptothecasual,buthighlyexpressivelanguagepeo-

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

352

Target’Image’Acow!standing!dans!le!eau!je!no/ced!que!ce!funny!cow!était”staring”à”me”UN!bird!hovering!dans”le”grassYou!peut!voir!ces!beau/ful!hills!only!dans”le”countrysideObject’Ac/on’Stuff’Scene’Figure1:Harvestingphrases(astreefragments)forthetargetimagebasedon(partial)visualmatch.plenaturallyuseintheironlineactivities.InFig-ure1,forexample,itwouldbehardtocompose“Inoticedthatthisfunnycowwasstaringatme”or“Youcanseethesebeautifulhillsonlyinthecoun-tryside”inapurelybottom-upmannerbasedontheexactcontentdetected.Thekeytechnicalbottleneckisthattherangeofdescribablecontent(i.e.,objects,attributes,actions)isultimatelyconfinedbythesetofitemsthatcanbereliablyrecognizedbystate-of-the-artvisiontechniques.Theseconddirection,inacomplementaryavenuetothefirst,hasexploredwaystomakeuseoftherichspectrumofvisualdescriptionscontributedbyonlinecitizens(Kuznetsovaetal.,2012;FengandLapata,2013;Mason,2013;Ordonezetal.,2011).Intheseapproaches,thesetofwhatcanbedescribedcanbesubstantiallylargerthanthesetofwhatcanberecognized,wheretheformerisshapedanddefinedbythedata,ratherthanbyhumans.Thisallowstheresultingdescriptionstobesubstantiallymoreex-pressive,elaborate,andinterestingthanwhatwouldbepossibleinapurelybottom-upmanner.Ourworkcontributestothissecondlineofresearch.Onechallengeinutilizingnaturallyexistingmul-timodaldata,cependant,isthenoisysemanticalign-mentbetweenimagesandtext(Dodgeetal.,2012;Bergetal.,2010).Donc,wealsoinvesti-gatearelatedtaskofimagecaptiongeneralization(Kuznetsovaetal.,2013),whichaimstoimprovethesemanticimage-textalignmentbyremovingbitsoftextfromexistingcaptionsthatarelesslikelytobetransferabletootherimages.Thehigh-levelideaofoursystemistoharvestusefulbitsoftext(astreefragments)fromexist-ingimagedescriptionsusingdetectedvisualcontentsimilarity,andthentocomposeanewdescriptionbyselectivelycombiningtheseextracted(andop-tionallypruned)treefragments.Thisoverallideaofcompositionbasedonextractedphrasesisnotnewinitself(Kuznetsovaetal.,2012),cependant,wemakeseveraltechnicalandempiricalcontributions.First,weproposeanovelstochastictreecompo-sitionalgorithmbasedonextractedtreefragmentsthatintegratesbothtreestructureandsequenceco-hesionintostructuralinference.Ouralgorithmper-mitsasubstantiallyhigherleveloflinguisticexpres-siveness,flexibility,andcreativitythanthosebasedonrulesortemplates(Kulkarnietal.,2011;Yangetal.,2011;Mitchelletal.,2012),whilealsoaddress-inglong-distancegrammaticalrelationsinamoreprincipledwaythanthosebasedonhand-codedcon-straints(Kuznetsovaetal.,2012).Deuxième,weaddressimagecaptiongeneralizationasanoptionalsubtaskofimagecaptiongeneration,andproposeatreecompressionalgorithmthatper-formsalight-weightparsingtosearchfortheop-timalsetoftreebranchestoprune.Ourworkisthefirsttoreportempiricalbenefitsofautomaticallycompressedcaptionsforimagecaptioning.Theproposedapproachesattainsignificantlybet-terperformanceforbothimagecaptiongeneraliza-tionandgenerationtasksovercompetitivebaselinesandpreviousapproaches.Ourworkresultsinanim-provedimagecaptioncorpuswithautomaticgener-alization,whichispubliclyavailable.12HarvestingTreeFragmentsGivenaqueryimage,weretrieveimagesthatarevi-suallysimilartothequeryimage,thenextractpo-tentiallyusefulsegments(i.e.,phrases)fromtheircorrespondingimagedescriptions.Wethencom-poseanewimagedescriptionusingtheseretrievedtextfragments(§3).Extractionofusefulphrasesisguidedbybothvisualsimilarityandthesyn-tacticparseofthecorrespondingtextualdescrip-1http://ilp-cky.appspot.com/

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

353

tion.Thisextractionstrategy,originallyproposedbyKuznetsovaetal.(2012),attemptstomakethebestuseoflinguisticregularitieswithrespecttoobjects,actions,andscenes,makingitpossibletoobtainrichertextualdescriptionsthanwhatcur-rentstate-of-the-artvisiontechniquescanprovideinisolation.InallofourexperimentsweusethecaptionedimagecorpusofOrdonezetal.(2011),firstpre-processingthecorpusforrelevantcontentbyrunningdeformablepartmodelobjectdetec-tors(Felzenszwalbetal.,2010).Forourstudy,werundetectorsfor89objectclassessetahighconfi-dencethresholdfordetection.AsillustratedinFigure1,foraqueryimagede-tection,weextractfourtypesofphrases(astreefragments).D'abord,weretrieverelevantnounphrasesfromimageswithvisuallysimilarobjectdetections.Weusecolor,texture(LeungandMalik,1999),andshape(DalalandTriggs,2005;Lowe,2004)basedfeaturesencodedinahistogramofvectorquantizedresponsestomeasurevisualsimilarity.Second,weextractverbphrasesforwhichthecorrespondingnounphrasetakesthesubjectrole.Third,fromthoseimageswith“stuff”detections,e.g.“water”,or“sky”(typicallymassnouns),weextractpreposi-tionalphrasesbasedonsimilarityofbothvisualap-pearanceandrelativespatialrelationshipsbetweendetectedobjectsand“stuff”.Finally,weuseglobal“scene”similarity2toextractprepositionalphrasesreferringtotheoverallscene,e.g.,“attheconfer-ence,”or“inthemarket”.Weperformthisphraseretrievalprocessforeachdetectedobjectinthequeryimageandgenerateonesentenceforeachobject.Allsentencesarethencombinedtogethertoproducethefinaldescription.Optionally,weapplyimagecaptiongeneralization(viacompression)(§4)toallcaptionsinthecorpuspriortothephraseextractionandcomposition.3TreeCompositionWemodeltreecompositionasconstraintoptimiza-tion.Theinputtoouralgorithmisthesetofre-trievedphrases(i.e.,treefragments),asillustratedin§2.LetP={p0,…,pL−1}bethesetofallphrasesacrossthefourphrasetypes(objets,ac-tions,stuffandscene).Weassumeamappingfunc-2L2distancebetweenclassificationscorevectors(Xiaoetal.,2010)tionpt:[0,L)→T,whereTisthesetofphrasetypes,sothatthephrasetypeofpiispt(je).Inad-dition,letRbethesetofPCFGproductionrulesandNTbethesetofnonterminalsymbolsofthePCFG.Thegoalistofindandcombineagoodse-quenceofphrasesG,|G||T|=N=4,drawnfromP,intoafinalsentence.Moreconcretely,wewanttoselectandorderasubsetofphrases(atmostonephraseofeachphrasetype)whileconsideringboththeparsestructureandn-gramcohesionacrossphrasalboundaries.Figure2showsasimplifiedexampleofacom-posedsentencewithitscorrespondingparsestruc-ture.Forbrevity,thefigureshowsonlyonephraseforeachphrasetype,butinactualitytherewouldbeasetofcandidatephrasesforeachtype.Figure3showstheCKY-stylerepresentationoftheinternalmechanicsofconstraintoptimizationfortheexam-plecompositionfromFigure2.EachcellijoftheCKYmatrixcorrespondstoGij,asubsequenceofGstartingatpositioniandendingatpositionj.IfacellintheCKYmatrixislabeledwithanontermi-nalsymbols,itmeansthatthecorrespondingtreeofGijhassasitsroot.AlthoughwevisualizetheoperationusingaCKY-stylerepresentationinFigure3,notethatcomposi-tionrequiresmorecomplexcombinatorialdecisionsthanCKYparsingduetotwoadditionalconsidera-tions.Weare:(1)selectingasubsetofcandidatephrases,et(2)re-orderingtheselectedphrases(hencemakingtheproblemNP-hard).Donc,weencodeourproblemusingIntegerLinearPro-gramming(ILP)(RothandtauYih,2004;ClarkeandLapata,2008)andusetheCPLEX(ILOG,Inc,2006)solver.3.1ILPVariablesVariablesforSequenceStructure:Variablesαen-codephraseselectionandordering:αik=1iffphrasei∈Pisselected(1)forpositionk∈[0,N)WherekisoneoftheN=4positionsinasentence.3Additionally,wedefinevariablesforeachpairofad-jacentphrasestocapturesequencecohesion:3Thenumberofpositionsisequaltothenumberofphrasetypes,sinceweselectatmostonefromeachtype.

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

354

UN”cow in”le”countryside wasstaring”à”me in#the#grass NP PP VP PP NP S i=0$j=2$k=1$0123levelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[je,j).Hereandfortheremainderofthepaper,notation[je,j)moyens{je,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[je,j).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6ThenumberofpositionsisequaltothenumberofphrasetypesFigure2:Anexamplescenariooftreecomposition.Onlythefirstthreephrasesarechosenforthecomposition.αijk=1iffαik=αj(k+1)=1(2)VariablesforTreeStructure:Variablesβencodetheparsestructure:βijs=1iffthephrasesequenceGij(3)mapstothenonterminalsymbols∈NTWherei∈[0,N)andj∈[je,N)indexrowsandcolumnsoftheCKY-stylematrixinFigure3.Acor-respondingexampletreeisshowninFigure2,wherethephrasesequenceG02correspondstothecellla-beledwithS.WealsodefinevariablestoindicateselectedPCFGrulesintheresultingparse:βijkr=1iffβijh=βikp(4)(k+1)jq=1,Wherer=h→pq∈Randk∈[je,j).IndexkpointstotheboundaryofsplitbetweentwochildrenasshowninFigure2forthesequenceG02.AuxiliaryVariables:Fornotationalconvenience,wealsoinclude:γijk=1iffPs∈NTβijs(5)=Ps∈NTβiks=Ps∈NTβ(k+1)js=13.2ILPObjectiveFunctionWemodeltreecompositionasmaximizationofthefollowingobjectivefunction:F=XiFi×N−1Xk=0αik(6)+XijFij×N−2Xk=0αijk+Xijj−1Xk=iXr∈RFr×βijkrNP NP S Acow PP PP-VP in”le”countryside VP VP wasstaring”à”me PP in#the#grass 00″01″02″03″11″12″13″33″22″23″k=1$k=0$levelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[je,j).Hereandfortheremainderofthepaper,notation[je,j)moyens{je,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[je,j).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6Thenumberofpositionsisequaltothenumberofphrasetypeslevelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[je,j).Hereandfortheremainderofthepaper,notation[je,j)moyens{je,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[je,j).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6Thenumberofpositionsisequaltothenumberofphrasetypeslevelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[je,j).Hereandfortheremainderofthepaper,notation[je,j)moyens{je,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[je,j).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6Thenumberofpositionsisequaltothenumberofphrasetypesk=0$oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Ainsi,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,h,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)↵i0=1(14)↵ij1=1(15)En plus,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Enfin,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,et(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Ainsi,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,h,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)↵i0=1(14)↵ij1=1(15)En plus,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Enfin,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,et(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Ainsi,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,h,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)Fi(14)Fij(15)En plus,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Enfin,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,et(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Ainsi,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,h,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)Fi(14)Fij(15)En plus,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Enfin,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,et(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.Figure3:CKY-stylerepresentationofdecisionvariablesasdefinedin§3.1forthetreeexampleinFig2.Non-terminalsymbolsinboldface(inblue)andsolidarrows(alsoinblue)representthechosenPCFGrulestocom-binetheselectedsetofphrases.Nonterminalsymbolsinsmallerfont(inred)anddottedarrows(alsoinred)rep-resentpossibleotherchoicesthatarenotselected.Thisobjectiveiscomprisedofthreetypesofweights(confidencescores):Fi,Fij,Fr.4Firepresentsthephraseselectionscorebasedonvisualsimilarity,de-scribedin§2.Fijquantifiesthesequencecohe-sionacrossphraseboundaries.Forthis,weusen-gramscores(n∈[2,5])betweenadjacentphrasescomputedusingtheGoogleWeb1-Tcorpus(BrantsandFranz.,2006).Enfin,FrquantifiesPCFGrulescores(logprobabilities)estimatedfromthe1Mim-agecaptioncorpus(Ordonezetal.,2011)parsedus-ingtheStanfordparser(KleinandManning,2003).OnecanviewFiasacontentselectionscore,whileFijandFrcorrespondtolinguisticfluencyscorescapturingsequenceandtreestructurerespec-tively.Ifwesetpositivevaluesforalloftheseweights,theoptimizationfunctionwouldbebiasedtowardverboseproduction,sinceselectinganaddi-tionalphrasewillincreasetheobjectivefunction.Tocontrolforverbosity,wesetscorescorrespondingtolinguisticfluency,i.e.,FijandFrusingnegativevalues(smallerabsolutevaluesforhigherfluency),tobalancedynamicsbetweencontentselectionandlinguisticfluency.3.3ILPConstraintsSoundnessConstraints:Weneedconstraintstoenforceconsistencybetweendifferenttypesofvari-4Allweightsarenormalizedusingz-score.

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

355

ables(Equations2,4,5).ConstraintsforaproductoftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).∀ijk,αijk≤αik(7)αijk≤αj(k+1)αijk+(1−αik)+(1−αj(k+1))≥1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedbyαijkmustbeconsistentwiththeorderingofphrasesimpliedbytheβvariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,βkks)intheCKY-stylematrixwithαvariablesasfollows:∀ik,αik≤Xs∈NTiβkks(8)∀k,Xiαik=Xs∈NTβkks(9)WhereNTireferstothesetofPCFGnonterminalsthatarecompatiblewithaphrasetypept(je)ofpi.Forexample,NTi={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Ainsi,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire∀ij,Xs∈NTβijs≤1(10)Wealsorequirethat∀i,j>i,h,βijh=j−1Xk=iXr∈Rhβijkr(11)WhereRh={r∈R:r=h→pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.∀k∈[1,N),Xs∈NTβkks≤N−1Xt=kXs∈NTβ0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:∀ijXkγijk≤1(13)En plus,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Enfin,weincludeseveralconstraintstoavoiddegeneratesolutionsortootherwiseenhancethecomposedoutput.We:(1)enforcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,wedonotneedtospecificallyhandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisusingtreefragmentsasthebuildingblocksofcompositionratherthanindivid-ualwords.Therearethreepracticalbenefits:(1)syntacticandsemanticexpressiveness,(2)correct-ness,et(3)computationalefficiency.Becauseweextractphrasesfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseex-tractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefragments,theILPformulationencodedinthisworkiscom-putationallylightweight.Iftheunitofcompositionwaswords,theILPinstanceswouldbesignificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticerrors.4TreeCompressionAsnotedbyrecentstudies(MasonandCharniak,2013;Kuznetsovaetal.,2013;Jamiesonetal.,2010),naturallyexistingimagecaptionsoftenin-cludecontextualinformationthatdoesnotdirectlydescribevisualcontent,whichultimatelyhinderstheirusefulnessfordescribingotherimages.There-fore,toimprovethefidelityofthegenerateddescrip-tions,weexploreimagecaptiongeneralizationasan

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

356

Late%in%the%day,%un,er%my%sunset%shot%a2empts,%my%cat%strolled%along%the%fence%and%posed%for%this%classic%profile%Late%in%the%day%%%cat%%%posed%for%this%profile%Generaliza)on+This%bridge%stands%late%in%the%day,%un,er%my%sunset%shot%a2empts%A%cat%strolled%along%the%fence%and%posed%for%this%classic%profile%Figure4:Compressedcaptions(ontheleft)aremoreap-plicablefordescribingnewimages(ontheright).optionalpre-processingstep.Figure4illustratesaconcreteexampleofimagecaptiongeneralizationinthecontextofimagecaptiongeneration.Wecastcaptiongeneralizationassentencecom-pression.WeencodetheproblemastreepruningvialightweightCKYparsing,whilealsoincorporatingseveralotherconsiderationssuchasleaf-levelngramcohesionscoresandvisuallyinformedcontentselec-tion.Figure5showsanexamplecompression,andFigure6showsthecorrespondingCKYmatrix.Atahighlevel,thecompressionoperationresem-blesbottom-upCKYparsing,butinadditiontopars-ing,wealsoconsiderdeletionofpartsofthetrees.Whendeletingpartsoftheoriginaltree,wemightneedtore-parsetheremainderofthetree.Notethatweconsiderre-parsingonlywithrespecttotheorig-inalparsetreeproducedbyastate-of-the-artparser,henceitisonlyalight-weightparsing.54.1DynamicProgrammingInputtothealgorithmisasentence,representedasavectorx=x0xn−1=x[0:n−1],anditsPCFGparseπ(X)obtainedfromtheStanfordparser.Forsimplicityofnotation,weassumethatboththeparsetreeandthewordsequenceareencodedinx.Then,thecompressioncanbeformalizedas:5Integratingfullparsingintotheoriginalsentencewouldbeastraightforwardextensionconceptually,butmaynotbeanem-piricallybetterchoicewhenparsingforcompressionisbasedonvanillaunlexicalizedparsing.ˆy=argmaxyYiφi(X,oui)(14)Whereeachφiisapotentialfunction,correspondingtoacriteriaofthedesiredcompression:φi(X,oui)=exp(θi·fi(X,oui))(15)Whereθiistheweightforaparticularcriteria(de-scribedin§4.2),whosescoringfunctionisfi.Wesolvethedecodingproblem(Equation14)us-ingdynamicprogramming.Forthis,weneedtosolvethecompressionsub-problemsforsequencesx[je:j],whichcanbeviewedasbranchesˆy[je,j]ofthefinaltreeˆy[0:n−1].Forexample,inFigure5,thefinalsolutionisˆy[0:7],whileasub-solutionofx[4:7]correspondstoatreebranchPP.Noticethatsub-solutionˆy[3:7]representsthesamebranchasˆy[4:7]duetobranchdeletion.Somecomputedsub-solutions,e.g.,ˆy[1:4],getdroppedfromthefinalcompressedtree.WedefineamatrixofscoresD[je,j,h](Equa-tion17),wherehisoneofthenonterminalsymbolsbeingconsideredforacellindexedbyi,j,i.e.acan-didatefortherootsymbolofabranchˆy[je:j].WhenallvaluesD[je,j,h]arecomputed,wetakeˆh=argmaxhD[0,n−1,h](16)andbacktracktoreconstructthefinalcompression(theexactsolutiontoequation14).D[je,j,h]=maxk∈[je,j)r∈Rh(1)D[je,k,p]+D[k+1,j,q]+∆φ[r,ij](2)D[je,k,p]+∆φ[r,ij](3)D[k+1,j,p]+∆φ[r,ij](17)WhereRh={r∈R:r=h→pq∨r=h→p}.Indexkdeterminesasplitpointforchildbranchesofasubtreeˆy[je:j].Forexample,intheFigure5thesplitpointforchildrenofthesubtreeˆy[0:7]isk=2.Thethreecases((1)–(3))oftheaboveequationcorrespondtothefollowingtreepruningcases:PruningCase(1):Noneofthechildrenofthecur-rentnodeisdeleted.Forexample,inFigures5and6,thePCFGrulePP→INPP,correspondingtothesequence“inblackandwhite”,isretained.Anothersituationthatcanbeencounteredistreere-parsing.

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

357

Vintage!motorcycle!shot!fait!dans!black!et!blanc!JJ!NN!NN!VBN!IN!JJ!JJ!CC!NP, NN!NP!CC-JJ VP, PP NP!PP S Dele%on!probability!Rule!probability!Vision!confidence!Ngram!cohesion!(Dele%on,)case)2))(Dele%on,)case)1))01234567k=2$Figure5:CKYcompression.Boththechosenrulesandphrases(blueboldfontandbluesolidarrows)andnotchosenrulesandphrases(reditalicsmallerfontandreddashedlines)areshown.PruningCase(2)/(3):Deletionoftheleft/rightchildrespectively.Therearetwotypesofdeletion,asillustratedinFigures5and6.Thefirstcorre-spondstodeletionofachildnode.Forexample,thesecondchildNNofruleNP→NPNNisdeleted,whichyieldsdeletionof“shot”.Thesec-ondtypeisaspecialcaseofpropagatinganodetoahigher-levelofthetree.InFigure6,thissit-uationoccurswhendeletingJJ“Vintage”,whichcausesthepropagationofNNfromcell11tocell01.Forthispurpose,weexpandthesetofrulesRwithadditionalspecialrulesoftheformh→h,e.g.,NN→NN,whichallowspropagationoftreenodestohigherlevelsofthecompressedtree.64.2ModelingCompressionCriteriaThe∆φterm7inEquation17denotesthesumoflogofpotentialfunctionsforeachcriteriaq:∆φ[r,ij]=Xqθ·∆fq(r,ij)(18)Notethat∆φdependsonthecurrentruler,alongwiththehistoricalinformationbeforethecurrentstepij,suchastheoriginalrulerij,andngramsontheborderbetweenleftandrightchildbranchesofrulerij.Weusethefollowingfourcriteriafqinourmodel,whicharedemonstratedinFigures5and6.I.TreeStructure:WecapturePCFGruleprob-abilitiesestimatedfromthecorpusas∆fpcfg=logPpcfg(r).6Weassignprobabilitiesofthesespecialpropagationrulesto1sothattheywillnotaffectthefinalparsetreescore.TurnerandCharniak(2005)handledpropagationcasessimilarly.7Weuse∆todistinguishthepotentialvalueforthewholesentencefromthegainofthepotentialduringasinglestepofthealgorithm.JJ NP, NN NP S Vintage NN motorcycle NN shot VBN VP, PP done IN PP in JJ NP black CC CC-JJ and JJ white 00″11″01″Rule%probability%Ngram%cohesion%Dele6on%probability%Vision%Confidence%i”j”Figure6:CKYcompression.Boththechosenrulesandphrases(blueboldfontandbluesolidarrows)andnotchosenrulesandphrases(reditalicsmallerfontandreddashedlines)areshown.II.SequenceStructure:Weincorporatengramcohesionscoresonlyacrosstheborderbetweentwobranchesofasubtree.III.BranchDeletionProbabilities:Wecomputeprobabilitiesofdeletionforchildrenas:∆fdel=logP(rt|rij)=logcount(rt,rij)count(rij)(19)Wherecount(rt,rij)isthefrequencyinwhichrijistransformedtortbydeletionofoneofthechildren.Weestimatethisprobabilityfromatrainingcorpus,describedin§4.3.count(rij)isthecountofrijinuncompressedsentences.IV.VisionDetection(ContentSelection):Wewanttokeepwordsreferringtoactualobjectsintheimage.Thus,weuseV(xj),avisualsimilarityscore,asourconfidenceofanobjectcorrespondingtowordxj.Thissimilarityisobtainedfromthevi-sualrecognitionpredictionsof(Dengetal.,2012b).Notethatsometestinstancesincluderulesthatwehavenotobservedduringtraining.Wedefaulttotheoriginalcaptioninthosecases.Theweightsθiaresetusingatuningdataset.Wecontrolover-compressionbysettingtheweightforfdeltoasmallvaluerelativetotheotherweights.4.3HumanCompressedCaptionsAlthoughwemodelimagecaptiongeneralizationassentencecompression,inpracticalapplicationswemaywanttheoutputsofthesetwotaskstobediffer-ent.Forexample,theremaybedifferencesinwhatshouldbedeleted(namedentitiesinnewswiresum-mariescouldbeimportanttokeep,whiletheymay

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

358

Orig:”Note”le”pillows,”ils”match”le”chair”que”goes”avec”it,”plus”le”table”dans”le”picture”est”included.%SeqCompression:%Le”table”dans”le”picture.””TreePruning:”Le”chair”avec”le”table”dans”le”picture.Orig:”Only”dans”winter;me”nous”voir”ces”birds”ici”dans”le”river.”%SeqCompression:”Voir”ces”birds”dans”le”river.””TreePruning:”Thesebirds”dans”le”river.””Orig:”Le”world’smostpowerfullighthousesi@ngbeside”le”maison”avec”le”world’sthickestcurtains.SeqCompression:%Si@ngbeside”le”maison””TreePruning:”Powerfullighthousebeside”le”maison”avec”le”curtains.””Orig:”Orangecloud”sur”streetlight”C”près”LanakilaStreet”(phonecamera).””SeqCompression:%Orangestreet””TreePruning:”Phonecamera.%Relevance(problem(Orig:”There’ssomething”à propos”having”5″trucksparked”dans”front”de”mon”maison”que”makesmefeel”tous”importantClike.SeqCompression:%Front”de”mon”house.””TreePruning:”Trucks”dans”front”mon”house.%Grammar(mistakes(Figure7:Captiongeneralization:good/badexamples.beextraneousforimagecaptiongeneralization).Tolearnthesyntacticpatternsforcaptiongeneraliza-tion,wecollectasmallsetofexamplecompressedcaptions(380intotal)usingAmazonMechanicalTurk(AMT)(Snowetal.,2008).Foreachimage,weasked3turkerstofirstlistallvisibleobjectsinanimageandthentowriteacompressedcaptionbyremovingnotvisuallyverifiablebitsoftext.Wethenaligntheoriginalandcompressedcaptionstomea-sureruledeletionprobabilities,excludingmisalign-ments,similartoKnightandMarcu(2000).Notethatweremovethisdatasetfromthe1Mcaptioncor-puswhenweperformdescriptiongeneration.5ExperimentsWeusethe1McaptionedimagecorpusofOrdonezetal.(2011).Wereserve1Kimagesasatestset,andusetherestofthecorpusforphraseextraction.Weexperimentwiththefollowingapproaches:ProposedApproaches:•TREEPRUNING:Ourtreecompressionap-proachasdescribedin§4.•SEQ+TREE:Ourtreecompositionapproachasdescribedin§3.•SEQ+TREE+PRUNING:SEQ+TREEusingcompressedcaptionsofTREEPRUNINGasbuildingblocks.BaselinesforComposition:•SEQ+LINGRULE:Themostequivalenttotheoldersequence-drivensystem(Kuznetsovaetal.,2012).Usesafewminorenhancements,suchassentence-boundarystatistics,toim-provegrammaticality.•SEQ:The§3systemwithouttreemodelsandmentionedenhancementsofSEQ+LINGRULE.MethodBleuMeteorw/(w/o)penaltyPRMSEQ+LINGRULE0.152(0.152)0.130.170.095SEQ0.138(0.138)0.120.180.094SEQ+TREE0.149(0.149)0.130.140.082SEQ+PRUNING0.177(0.177)0.150.160.101SEQ+TREE+PRUNING0.140(0.189)0.160.120.088Table1:AutomaticEvaluation•SEQ+PRUNING:SEQusingcompressedcap-tionsofTREEPRUNINGasbuildingblocks.Wealsoexperimentwiththecompressionofhumanwrittencaptions,whichareusedtogenerateimagedescriptionsforthenewtargetimages.BaselinesforCompression:•SEQCOMPRESSION(Kuznetsovaetal.,2013):Inferenceoperatesoverthesequencestructure.Althoughoptimizationissubjecttoconstraintsderivedfromdependencyparse,parsingisnotanexplicitpartoftheinferencestructure.Ex-ampleoutputsareshowninFigure7.5.1AutomaticEvaluationWeperformautomaticevaluationusingtwomea-sureswidelyusedinmachinetranslation:BLEU(Pa-pinenietal.,2002)8andMETEOR(DenkowskiandLavie,2011).9Weremoveallpunctuationandcon-vertcaptionstolowercase.Weuse1Ktestim-agesfromthecaptionedimagecorpus,10andas-sumetheoriginalcaptionsasthegoldstandardcap-tionstocompareagainst.TheresultsinTable18WeusetheunigramNISTimplementation:ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v13a-20091001.tar.gz9WithequalweightbetweenprecisionandrecallinTable1.10ExceptforthoseforwhichimageURLsarebroken,orCPLEXdidnotreturnasolution.

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

359

Method-1Method-2CriteriaMethod-1preferredoverMethod-2(%)allturkersturkersw/κ>0.55turkersw/κ>0.6ImageDescriptionGenerationSEQ+TREESEQRel727272SEQ+TREESEQGmar838383SEQ+TREESEQAll686966SEQ+TREE+PRUNINGSEQ+TREERel687272SEQ+TREE+PRUNINGSEQ+TREEGmar413841SEQ+TREE+PRUNINGSEQ+TREEAll636466SEQ+TREESEQ+LINGRULEAll626462SEQ+TREE+PRUNINGSEQ+LINGRULEAll677577SEQ+TREE+PRUNINGSEQ+PRUNINGAll737575SEQ+TREE+PRUNINGHUMANAll241919ImageCaptionGeneralizationTREEPRUNINGSEQCOMPRESSION∗Rel656566Table2:HumanEvaluation:posedasabinaryquestion“whichofthetwooptionsisbetter?”withrespecttoRelevance(Rel),Grammar(Gmar),andOverall(All).AccordingtoPearson’sχ2test,allresultsarestatisticallysignificant.showthatboththeintegrationofthetreestructure(+TREE)andthegeneralizationofcaptionsusingtreecompression(+PRUNING)improvetheBLEUscorewithoutbrevitypenaltysignificantly,11whileimprovingMETEORonlymoderately(duetoanim-provementonprecisionwithadecreaseinrecall.)5.2HumanEvaluationNeitherBLEUnorMETEORdirectlymeasuregrammaticalcorrectnessoverlongdistancesandmaynotcorrespondperfectlytohumanjudgments.Therefore,wesupplementautomaticevaluationwithhumanevaluation.Forhumanevaluations,wepresenttwooptionsgeneratedfromtwocompet-ingsystems,andaskturkerstochoosetheonethatisbetterwithrespectto:relevance,grammar,andoverall.ResultsareshowninTable2with3turkerratingsperimage.Wefilteroutturkersbasedonacontrolquestion.Wethencomputetheselec-tionrate(%)ofpreferringmethod-1overmethod-2.Theagreementamongturkersisafrequentconcern.Therefore,wevarythesetofdependableusersbasedontheirCohen’skappascore(κ)againstotherusers.Itturnsout,filteringusersbasedonκdoesnotmakeabigdifferenceindeterminingthewinningmethod.Asexpected,tree-basedsystemssignificantlyout-performsequence-basedcounterparts.Forexample,11While4-gramBLEUwithbrevitypenaltyisfoundtocor-relatebetterwithhumanjudgesbyrecentstudies(ElliottandKeller,2014),wefoundthatthisisnotthecaseforourtask.Thismaybeduetothedifferencesinthegoldstandardcap-tions.Weusenaturallyexistingones,whichincludeawiderrangeofcontentandstylethancrowd-sourcedcaptions.Seq:”UN”bu&erfly”à”le”car”était”spo&éd”par”mon”nine”année”vieux”cousin.Seq+Pruning:”Le”bu&erflies”sont”un&racted”à”le”colourfulflowers”à”le”car.+Seq+Tree:”Le”bu&erflies”sont”un&racted”à”le”colourfulflowers”dans”HopeGardens.””Seq+Tree+Pruning:”Le”bu&erflies”sont”un&racted”à”le”colourfulflowers.Orig:”Le”bu&erflies”sont”un&racted”à”le”colourfulflowers”dans”HopeGardens.””SeqCompression:”Le”colourfulflowers.”””TreePruning:”Le”bu&erflies”sont”un&racted”à”le”colourfulflowers.”””Cap>onGeneraliza>on”Image”Descrip>onGenera>onFigure8:Anexampleofadescriptionpreferredoverhu-mangoldstandard.Imagedescriptionisimprovedduetocaptiongeneralization.SEQ+TREEisstronglypreferredoverSEQ,withaselectionrateof83%.Somewhatsurprisingly,im-provedgrammaticalityalsoseemstoimproverele-vancescores(72%),possiblybecauseitishardertoappreciatethesemanticrelevanceofautomaticcap-tionswhentheyarelesscomprehensible.Alsoasexpected,compositionsbasedonprunedtreefrag-mentssignificantlyimproverelevance(68–72%),whileslightlydeterioratinggrammar(38–41%).Notably,thecaptionsgeneratedbyoursystemarepreferredovertheoriginal(ownergenerated)cap-tions19–24%ofthetime.Onesuchexampleisin-cludedinFigure8:“Thebutterfliesareattractedtothecolorfulflowers.”Additionalexamples(goodandbad)arepro-videdinFigures9and10.Manyofthesecaptionsarehighlyexpressivewhileremainingsemantically

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
t

je

un
c
_
un
_
0
0
1
8
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3

360

Human:”Someflower”sur”un”bar”dans”un”hotel”dans”Grapevine,”TX.””&Seq+Tree+Pruning:”Le”flower”était”donc”vif”et”un:rac
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. image

Télécharger le PDF