Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III.

Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III.

Submitted 2/2014; Überarbeitet 5/2014; Published 10/2014. C
(cid:13)

2014 Verein für Computerlinguistik.

TREETALK:CompositionandCompressionofTreesforImageDescriptionsPolinaKuznetsova††StonyBrookUniversityStonyBrook,NYpkuznetsova@cs.stonybrook.eduVicenteOrdonez‡TamaraL.Berg‡‡UNCChapelHillChapelHill,NC{vicente,tlberg}@cs.unc.eduYejinChoi††††UniversityofWashingtonSeattle,WAyejin@cs.washington.eduAbstractWepresentanewtreebasedapproachtocomposingexpressiveimagedescriptionsthatmakesuseofnaturallyoccuringwebimageswithcaptions.Weinvestigatetworelatedtasks:imagecaptiongeneralizationandgen-eration,wheretheformerisanoptionalsub-taskofthelatter.Thehigh-levelideaofourapproachistoharvestexpressivephrases(astreefragments)fromexistingimagedescrip-tions,thentocomposeanewdescriptionbyselectivelycombiningtheextracted(andop-tionallypruned)treefragments.Keyalgo-rithmiccomponentsaretreecompositionandcompression,bothintegratingtreestructurewithsequencestructure.Ourproposedsystemattainssignificantlybetterperformancethanpreviousapproachesforbothimagecaptiongeneralizationandgeneration.Inaddition,ourworkisthefirsttoshowtheempiricalben-efitofautomaticallygeneralizedcaptionsforcomposingnaturalimagedescriptions.1IntroductionThewebisincreasinglyvisual,withhundredsofbil-lionsofusercontributedphotographshostedonline.Asubstantialportionoftheseimageshavesomesortofaccompanyingtext,rangingfromkeywords,tofreetextonwebpages,totextualdescriptionsdi-rectlydescribingdepictedimagecontent(i.e.cap-tions).Wetapintothelastkindoftext,usingnatu-rallyoccuringpairsofimageswithnaturallanguagedescriptionstocomposeexpressivedescriptionsforqueryimagesviatreecompositionandcompression.Suchautomaticimagecaptioningeffortscouldpotentiallybeusefulformanyapplications:fromautomaticorganizationofphotocollections,tofacil-itatingimagesearchwithcomplexnaturallanguagequeries,toenhancingwebaccessibilityforthevi-suallyimpaired.Ontheintellectualside,bylearn-ingtodescribethevisualworldfromnaturallyexist-ingwebdata,ourstudyextendsthedomainsoflan-guagegroundingtothehighlyexpressivelanguagethatpeopleuseintheireverydayonlineactivities.Therehasbeenarecentspikeineffortstoau-tomaticallydescribevisualcontentinnaturallan-guage(Yangetal.,2011;Kulkarnietal.,2011;Lietal.,2011;Farhadietal.,2010;Krishnamoorthyetal.,2013;ElliottandKeller,2013;YuandSiskind,2013;Socheretal.,2014).Thisreflectsthelongstandingunderstandingthatencodingthecomplex-itiesandsubtletiesofimagecontentoftenrequiresmoreexpressivelanguageconstructsthanasetoftags.Nowthatvisualrecognitionalgorithmsarebe-ginningtoproducereliableestimatesofimagecon-tent(Perronninetal.,2012;Dengetal.,2012a;Dengetal.,2010;Krizhevskyetal.,2012),thetimeseemsripetobeginexploringhigherlevelsemantictasks.Therehavebeentwomaincomplementarydirec-tionsexploredforautomaticimagecaptioning.Thefirstfocusesondescribingexactlythoseitems(e.g.,objects,attributes)thataredetectedbyvisionrecog-nition,whichsubsequentlyconfineswhatshouldbedescribedandhow(Yaoetal.,2010;Kulkarnietal.,2011;Kojimaetal.,2002).Approachesinthisdirec-tioncouldbeidealforvariouspracticalapplicationssuchasimagedescriptionforthevisuallyimpaired.However,itisnotclearwhetherthesemanticexpres-sivenessoftheseapproachescaneventuallyscaleuptothecasual,buthighlyexpressivelanguagepeo-

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

352

Target’Image’Acow!Stehen!In!Die!Wasser!ICH!no/ced!Das!Das!funny!cow!War”staring”bei”me”A!bird!hovering!In”Die”grassYou!can!sehen!diese!beau/ful!hills!nur!In”Die”countrysideObject’Ac/on’Stuff’Scene’Figure1:Harvestingphrases(astreefragments)forthetargetimagebasedon(partial)visualmatch.plenaturallyuseintheironlineactivities.InFig-ure1,forexample,itwouldbehardtocompose“Inoticedthatthisfunnycowwasstaringatme”or“Youcanseethesebeautifulhillsonlyinthecoun-tryside”inapurelybottom-upmannerbasedontheexactcontentdetected.Thekeytechnicalbottleneckisthattherangeofdescribablecontent(i.e.,objects,attributes,Aktionen)isultimatelyconfinedbythesetofitemsthatcanbereliablyrecognizedbystate-of-the-artvisiontechniques.Theseconddirection,inacomplementaryavenuetothefirst,hasexploredwaystomakeuseoftherichspectrumofvisualdescriptionscontributedbyonlinecitizens(Kuznetsovaetal.,2012;FengandLapata,2013;Mason,2013;Ordonezetal.,2011).Intheseapproaches,thesetofwhatcanbedescribedcanbesubstantiallylargerthanthesetofwhatcanberecognized,wheretheformerisshapedanddefinedbythedata,ratherthanbyhumans.Thisallowstheresultingdescriptionstobesubstantiallymoreex-pressive,elaborate,andinterestingthanwhatwouldbepossibleinapurelybottom-upmanner.Ourworkcontributestothissecondlineofresearch.Onechallengeinutilizingnaturallyexistingmul-timodaldata,Jedoch,isthenoisysemanticalign-mentbetweenimagesandtext(Dodgeetal.,2012;Bergetal.,2010).daher,wealsoinvesti-gatearelatedtaskofimagecaptiongeneralization(Kuznetsovaetal.,2013),whichaimstoimprovethesemanticimage-textalignmentbyremovingbitsoftextfromexistingcaptionsthatarelesslikelytobetransferabletootherimages.Thehigh-levelideaofoursystemistoharvestusefulbitsoftext(astreefragments)fromexist-ingimagedescriptionsusingdetectedvisualcontentsimilarity,andthentocomposeanewdescriptionbyselectivelycombiningtheseextracted(andop-tionallypruned)treefragments.Thisoverallideaofcompositionbasedonextractedphrasesisnotnewinitself(Kuznetsovaetal.,2012),Jedoch,wemakeseveraltechnicalandempiricalcontributions.First,weproposeanovelstochastictreecompo-sitionalgorithmbasedonextractedtreefragmentsthatintegratesbothtreestructureandsequenceco-hesionintostructuralinference.Ouralgorithmper-mitsasubstantiallyhigherleveloflinguisticexpres-siveness,flexibility,andcreativitythanthosebasedonrulesortemplates(Kulkarnietal.,2011;Yangetal.,2011;Mitchelletal.,2012),whilealsoaddress-inglong-distancegrammaticalrelationsinamoreprincipledwaythanthosebasedonhand-codedcon-straints(Kuznetsovaetal.,2012).Zweite,weaddressimagecaptiongeneralizationasanoptionalsubtaskofimagecaptiongeneration,andproposeatreecompressionalgorithmthatper-formsalight-weightparsingtosearchfortheop-timalsetoftreebranchestoprune.Ourworkisthefirsttoreportempiricalbenefitsofautomaticallycompressedcaptionsforimagecaptioning.Theproposedapproachesattainsignificantlybet-terperformanceforbothimagecaptiongeneraliza-tionandgenerationtasksovercompetitivebaselinesandpreviousapproaches.Ourworkresultsinanim-provedimagecaptioncorpuswithautomaticgener-alization,whichispubliclyavailable.12HarvestingTreeFragmentsGivenaqueryimage,weretrieveimagesthatarevi-suallysimilartothequeryimage,thenextractpo-tentiallyusefulsegments(i.e.,phrases)fromtheircorrespondingimagedescriptions.Wethencom-poseanewimagedescriptionusingtheseretrievedtextfragments(§3).Extractionofusefulphrasesisguidedbybothvisualsimilarityandthesyn-tacticparseofthecorrespondingtextualdescrip-1http://ilp-cky.appspot.com/

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

353

tion.Thisextractionstrategy,originallyproposedbyKuznetsovaetal.(2012),attemptstomakethebestuseoflinguisticregularitieswithrespecttoobjects,Aktionen,andscenes,makingitpossibletoobtainrichertextualdescriptionsthanwhatcur-rentstate-of-the-artvisiontechniquescanprovideinisolation.InallofourexperimentsweusethecaptionedimagecorpusofOrdonezetal.(2011),firstpre-processingthecorpusforrelevantcontentbyrunningdeformablepartmodelobjectdetec-tors(Felzenszwalbetal.,2010).Forourstudy,werundetectorsfor89objectclassessetahighconfi-dencethresholdfordetection.AsillustratedinFigure1,foraqueryimagede-tection,weextractfourtypesofphrases(astreefragments).Erste,weretrieverelevantnounphrasesfromimageswithvisuallysimilarobjectdetections.Weusecolor,texture(LeungandMalik,1999),andshape(DalalandTriggs,2005;Lowe,2004)basedfeaturesencodedinahistogramofvectorquantizedresponsestomeasurevisualsimilarity.Second,weextractverbphrasesforwhichthecorrespondingnounphrasetakesthesubjectrole.Third,fromthoseimageswith“stuff”detections,e.g.“water”,or“sky”(typicallymassnouns),weextractpreposi-tionalphrasesbasedonsimilarityofbothvisualap-pearanceandrelativespatialrelationshipsbetweendetectedobjectsand“stuff”.Finally,weuseglobal“scene”similarity2toextractprepositionalphrasesreferringtotheoverallscene,e.g.,“attheconfer-ence,”or“inthemarket”.Weperformthisphraseretrievalprocessforeachdetectedobjectinthequeryimageandgenerateonesentenceforeachobject.Allsentencesarethencombinedtogethertoproducethefinaldescription.Optionally,weapplyimagecaptiongeneralization(viacompression)(§4)toallcaptionsinthecorpuspriortothephraseextractionandcomposition.3TreeCompositionWemodeltreecompositionasconstraintoptimiza-tion.Theinputtoouralgorithmisthesetofre-trievedphrases(i.e.,treefragments),asillustratedin§2.LetP={p0,…,pL−1}bethesetofallphrasesacrossthefourphrasetypes(Objekte,ac-tions,stuffandscene).Weassumeamappingfunc-2L2distancebetweenclassificationscorevectors(Xiaoetal.,2010)tionpt:[0,L)→T,whereTisthesetofphrasetypes,sothatthephrasetypeofpiispt(ich).Inad-dition,letRbethesetofPCFGproductionrulesandNTbethesetofnonterminalsymbolsofthePCFG.Thegoalistofindandcombineagoodse-quenceofphrasesG,|G|≤|T|=N=4,drawnfromP,intoafinalsentence.Moreconcretely,wewanttoselectandorderasubsetofphrases(atmostonephraseofeachphrasetype)whileconsideringboththeparsestructureandn-gramcohesionacrossphrasalboundaries.Figure2showsasimplifiedexampleofacom-posedsentencewithitscorrespondingparsestruc-ture.Forbrevity,thefigureshowsonlyonephraseforeachphrasetype,butinactualitytherewouldbeasetofcandidatephrasesforeachtype.Figure3showstheCKY-stylerepresentationoftheinternalmechanicsofconstraintoptimizationfortheexam-plecompositionfromFigure2.EachcellijoftheCKYmatrixcorrespondstoGij,asubsequenceofGstartingatpositioniandendingatpositionj.IfacellintheCKYmatrixislabeledwithanontermi-nalsymbols,itmeansthatthecorrespondingtreeofGijhassasitsroot.AlthoughwevisualizetheoperationusingaCKY-stylerepresentationinFigure3,notethatcomposi-tionrequiresmorecomplexcombinatorialdecisionsthanCKYparsingduetotwoadditionalconsidera-tions.Weare:(1)selectingasubsetofcandidatephrases,Und(2)re-orderingtheselectedphrases(hencemakingtheproblemNP-hard).daher,weencodeourproblemusingIntegerLinearPro-gramming(ILP)(RothandtauYih,2004;ClarkeandLapata,2008)andusetheCPLEX(ILOG,Inc,2006)solver.3.1ILPVariablesVariablesforSequenceStructure:Variablesαen-codephraseselectionandordering:αik=1iffphrasei∈Pisselected(1)forpositionk∈[0,N)WherekisoneoftheN=4positionsinasentence.3Additionally,wedefinevariablesforeachpairofad-jacentphrasestocapturesequencecohesion:3Thenumberofpositionsisequaltothenumberofphrasetypes,sinceweselectatmostonefromeachtype.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

354

A”cow in”Die”countryside wasstaring”bei”me in#the#grass NP PP VP PP NP S i=0$j=2$k=1$0123levelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[ich,J).Hereandfortheremainderofthepaper,notation[ich,J)means{ich,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[ich,J).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6ThenumberofpositionsisequaltothenumberofphrasetypesFigure2:Anexamplescenariooftreecomposition.Onlythefirstthreephrasesarechosenforthecomposition.αijk=1iffαik=αj(k+1)=1(2)VariablesforTreeStructure:Variablesβencodetheparsestructure:βijs=1iffthephrasesequenceGij(3)mapstothenonterminalsymbols∈NTWherei∈[0,N)andj∈[ich,N)indexrowsandcolumnsoftheCKY-stylematrixinFigure3.Acor-respondingexampletreeisshowninFigure2,wherethephrasesequenceG02correspondstothecellla-beledwithS.WealsodefinevariablestoindicateselectedPCFGrulesintheresultingparse:βijkr=1iffβijh=βikp(4)(k+1)jq=1,Wherer=h→pq∈Randk∈[ich,J).IndexkpointstotheboundaryofsplitbetweentwochildrenasshowninFigure2forthesequenceG02.AuxiliaryVariables:Fornotationalconvenience,wealsoinclude:γijk=1iffPs∈NTβijs(5)=Ps∈NTβiks=Ps∈NTβ(k+1)js=13.2ILPObjectiveFunctionWemodeltreecompositionasmaximizationofthefollowingobjectivefunction:F=XiFi×N−1Xk=0αik(6)+XijFij×N−2Xk=0αijk+Xijj−1Xk=iXr∈RFr×βijkrNP NP S Acow PP PP-VP in”Die”countryside VP VP wasstaring”bei”me PP in#the#grass 00″01″02″03″11″12″13″33″22″23″k=1$k=0$levelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[ich,J).Hereandfortheremainderofthepaper,notation[ich,J)means{ich,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[ich,J).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6Thenumberofpositionsisequaltothenumberofphrasetypeslevelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[ich,J).Hereandfortheremainderofthepaper,notation[ich,J)means{ich,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[ich,J).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6Thenumberofpositionsisequaltothenumberofphrasetypeslevelandeachnodeofthatlevel,algorithmhastodecide,whichparsetagtochoose.Thisprocessisrepresentedbyassignmentofaparticulartagtoamatrixcell.Thechosentagmustbeaheadofarule,fiexamplecell12isassignedtagVP,correspond-ingtoruleVP!VPPP.Thisruleconnectsleafs“goingouttosea”and“intheocean”.Theprob-lemistofindtagassignmentforeachcellofthema-trix,givensomecellscanbeempty,iftheydonotconnectchildrencells.lattercorrespondtochildrenbranchesofthetreeandbelongtothepreviousdiag-onalintheleft-to-rightorder.Alsowedonottryallpossiblepairs5ofchildrenfrompreviousdiagonal.WeusetechniquesimilartotheoneusedinCKYparsingapproach.Matrixcellpairscorrespondingtochildrenpairsare,wherek2[ich,J).Hereandfortheremainderofthepaper,notation[ich,J)means{ich,i+1,…,j1}andrish pqunlessotherwisestated.Theproblemofchoosingphraseordertogetherwiththebestparsetreeofthedescriptionisacom-plexoptimizationproblem,whichwesolveusingIntegerLinearProgramming(ILP).Weuseasepa-rateILPformulationforforsentencereorderingandsalientobjectselection,whichweomitforbrevity.Asmentionedearlier,overallforeachobjectwehavefourtypesofphrases.WeuseCKY-drivenILPformulationtocombinethemtogetherintoaplausi-bledescriptionswhichobeysPCFGrules.FortheremainderofthepaperwewillcallourILPformu-lationILP-TREE.WeexploitCplex(ILOG,Inc,2006)tosolveILPproblem.Todo:[mentioncplexparameters.Forinstance,30seclimitongeneration]3.0.2ILPvariablesPhraseIndicatorVariables:Wedefinevariables↵whichindicatephraseselectionandphraseordering.↵ijk=1iffphraseioftypej(1)isselectedforpositionk5ThereisonlytwochildrenasweuseChomskyNormalForm↵ij0=1↵ij1pq2=102S=1010(NP!NPPP)=1021=1Wherek2[0,N)Todo:[checkforthewholepa-perifkrangesfrom0]indexesoneofN=4positionsinasentence6.Phraseorderingiscapturedbyindicatorvariablesforadjacentpairsofphrases:↵ijkpq(k+1)=1iff↵ijk=↵pq(k+1)=1(2)AnexampleofILP-CKYatFigure3showsselec-tionofphrasesandtheirordering:“Thelittleboat”,“goingouttosea”and“intheocean”.TreeIndicatorVariables:Wealsodefinevariables,whichareindicatorsofCKYmatrixcontent(Fig-ure3).ijs=1iffcellijofthematrixisassigned(3)parsetreesymbolsTodo:[Renamesymbolstotagsthroughoutthepa-per]Wherei2[0,N)indexesCKYmatrixdiagonalsandj2[0,Ni)indexeselementsofdiagonali.InordertomodelruleselectionateachCKYstep,wedefinevariables,whichcorrespondtoaPCFGruleusedatthegivencellijofCKYmatrix:ijkr=1iffijh=ikp(4)=(k+1)jq=1,Wherer=h pq2Randk2[ich,J).Valuekcorrespondstothechoiceofchildrenforthecurrentcell.6Thenumberofpositionsisequaltothenumberofphrasetypesk=0$oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Daher,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,H,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)↵i0=1(14)↵ij1=1(15)Zusätzlich,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Endlich,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,Und(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Daher,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,H,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)↵i0=1(14)↵ij1=1(15)Zusätzlich,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Endlich,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,Und(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Daher,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,H,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)Fi(14)Fij(15)Zusätzlich,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Endlich,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,Und(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.oftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).8ijkpqm,↵ijk↵ik(7)↵ijk↵j(k+1)↵ijk+(1↵ik)+(1↵j(k+1))1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedby↵ijkmustbeconsistentwiththeorderingofphrasesimpliedbythevariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,kks)intheCKY-stylematrixwith↵variablesasfollows:8ik,↵ikXs2Sikks(8)8k,Xi↵ik=Xs2Skks(9)WhereSireferstothesetofPCFGnonterminalsthatarecompatiblewiththephrasetypeofpi.Forexample,Si={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Daher,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire8ij,Xs2Sijs1(10)Wealsorequirethat8i,j>i,H,ijh=j1Xk=iXr2Rhijkr(11)WhereRh={r2R:r=h!pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.8k2[1,N),Xs2SkksN1Xt=kXs2S0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:8ijXkijk1(13)Fi(14)Fij(15)Zusätzlich,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Endlich,weincludeseveralconstraintstoavoiddegeneratesolutionsorotherwisetoenhancethecomposedoutput:(1)en-forcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,weneednothandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisthatbuildingblocksofcom-positionaretreefragments,ratherthanindividualwords.Therearethreepracticalbenefits:(1)syn-tacticandsemanticexpressiveness,(2)correctness,Und(3)computationalefficiency.Becauseweex-tractnicesegmentsfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseextractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefrag-ments,theILPformulationencodedinthisworkiscomputationallylightweight.Iftheunitofcompo-sitionwaswords,theILPinstanceswouldbesig-nificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticer-rors.Figure3:CKY-stylerepresentationofdecisionvariablesasdefinedin§3.1forthetreeexampleinFig2.Non-terminalsymbolsinboldface(inblue)andsolidarrows(alsoinblue)representthechosenPCFGrulestocom-binetheselectedsetofphrases.Nonterminalsymbolsinsmallerfont(inred)anddottedarrows(alsoinred)rep-resentpossibleotherchoicesthatarenotselected.Thisobjectiveiscomprisedofthreetypesofweights(confidencescores):Fi,Fij,Fr.4Firepresentsthephraseselectionscorebasedonvisualsimilarity,de-scribedin§2.Fijquantifiesthesequencecohe-sionacrossphraseboundaries.Forthis,weusen-gramscores(n∈[2,5])betweenadjacentphrasescomputedusingtheGoogleWeb1-Tcorpus(BrantsandFranz.,2006).Endlich,FrquantifiesPCFGrulescores(logprobabilities)estimatedfromthe1Mim-agecaptioncorpus(Ordonezetal.,2011)parsedus-ingtheStanfordparser(KleinandManning,2003).OnecanviewFiasacontentselectionscore,whileFijandFrcorrespondtolinguisticfluencyscorescapturingsequenceandtreestructurerespec-tively.Ifwesetpositivevaluesforalloftheseweights,theoptimizationfunctionwouldbebiasedtowardverboseproduction,sinceselectinganaddi-tionalphrasewillincreasetheobjectivefunction.Tocontrolforverbosity,wesetscorescorrespondingtolinguisticfluency,i.e.,FijandFrusingnegativevalues(smallerabsolutevaluesforhigherfluency),tobalancedynamicsbetweencontentselectionandlinguisticfluency.3.3ILPConstraintsSoundnessConstraints:Weneedconstraintstoenforceconsistencybetweendifferenttypesofvari-4Allweightsarenormalizedusingz-score.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

355

fähig(Equations2,4,5).ConstraintsforaproductoftwovariableshavebeendiscussedbyClarkeandLapata(2008).ForEquation2,weaddthefollow-ingconstraints(similarconstraintsarealsoaddedforEquations4,5).∀ijk,αijk≤αik(7)αijk≤αj(k+1)αijk+(1−αik)+(1−αj(k+1))≥1ConsistencybetweenTreeLeafsandSequences:Theorderingofphrasesimpliedbyαijkmustbeconsistentwiththeorderingofphrasesimpliedbytheβvariables.Thiscanbeachievedbyaligningtheleafcells(i.e.,βkks)intheCKY-stylematrixwithαvariablesasfollows:∀ik,αik≤Xs∈NTiβkks(8)∀k,Xiαik=Xs∈NTβkks(9)WhereNTireferstothesetofPCFGnonterminalsthatarecompatiblewithaphrasetypept(ich)ofpi.Forexample,NTi={NN,NP,…}ifpicorrespondstoan“object”(noun-phrase).Daher,Equation8en-forcesthecorrespondencebetweenphrasetypesandnonterminalsymbolsatthetreeleafs.Equation9enforcestheconstraintthatthenumberofselectedphrasesandinstantiatedtreeleafsmustbethesame.TreeCongruenceConstraints:ToensurethateachCKYcellhasatmostonesymbolwerequire∀ij,Xs∈NTβijs≤1(10)Wealsorequirethat∀i,j>i,H,βijh=j−1Xk=iXr∈Rhβijkr(11)WhereRh={r∈R:r=h→pq}.Weenforcetheseconstraintsonlyfornon-leafs.Thisconstraintforbidsinstantiationswhereanonterminalsymbolhisselectedforcellijwithoutselectingacorrespond-ingPCFGrule.Wealsoensurethatweproduceavalidtreestruc-ture.Forinstance,ifweselect3phrasesasshowninFigure3,wemusthavetherootofthetreeatthecorrespondingcell02.∀k∈[1,N),Xs∈NTβkks≤N−1Xt=kXs∈NTβ0ts(12)Wealsorequirecellsthatarenotselectedfortheresultingparsestructuretobeempty:∀ijXkγijk≤1(13)Zusätzlich,wepenalizesolutionswithouttheStagattheparserootasasoft-constraint.MiscellaneousConstraints:Endlich,weincludeseveralconstraintstoavoiddegeneratesolutionsortootherwiseenhancethecomposedoutput.We:(1)enforcethatanoun-phraseisselected(toensurese-manticrelevancetotheimagecontent),(2)allowatmostonephraseofeachtype,(3)donotallowmul-tiplephraseswithidenticalheadwords(toavoidre-dundancy),(4)allowatmostonescenephraseforallsentencesinthedescription.Wefindthathan-dlingofsentenceboundariesisimportantiftheILPformulationisbasedonlyonsequencestructure,butwiththeintegrationoftree-basedstructure,wedonotneedtospecificallyhandlesentenceboundaries.3.4DiscussionAninterestingaspectofdescriptiongenerationex-ploredinthispaperisusingtreefragmentsasthebuildingblocksofcompositionratherthanindivid-ualwords.Therearethreepracticalbenefits:(1)syntacticandsemanticexpressiveness,(2)correct-ness,Und(3)computationalefficiency.Becauseweextractphrasesfromhumanwrittencaptions,weareabletouseexpressivelanguage,andlesslikelytomakesyntacticorsemanticerrors.Ourphraseex-tractionprocesscanbeviewedatahighlevelasvisually-groundedorvisually-situatedparaphrasing.Also,becausetheunitofoperationistreefragments,theILPformulationencodedinthisworkiscom-putationallylightweight.Iftheunitofcompositionwaswords,theILPinstanceswouldbesignificantlymorecomputationallyintensive,andmorelikelytosufferfromgrammaticalandsemanticerrors.4TreeCompressionAsnotedbyrecentstudies(MasonandCharniak,2013;Kuznetsovaetal.,2013;Jamiesonetal.,2010),naturallyexistingimagecaptionsoftenin-cludecontextualinformationthatdoesnotdirectlydescribevisualcontent,whichultimatelyhinderstheirusefulnessfordescribingotherimages.There-fore,toimprovethefidelityofthegenerateddescrip-tions,weexploreimagecaptiongeneralizationasan

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

356

Late%in%the%day,%A,er%my%sunset%shot%a2empts,%my%cat%strolled%along%the%fence%and%posed%for%this%classic%profile%Late%in%the%day%%%cat%%%posed%for%this%profile%Generaliza)on+This%bridge%stands%late%in%the%day,%A,er%my%sunset%shot%a2empts%A%cat%strolled%along%the%fence%and%posed%for%this%classic%profile%Figure4:Compressedcaptions(ontheleft)aremoreap-plicablefordescribingnewimages(ontheright).optionalpre-processingstep.Figure4illustratesaconcreteexampleofimagecaptiongeneralizationinthecontextofimagecaptiongeneration.Wecastcaptiongeneralizationassentencecom-pression.WeencodetheproblemastreepruningvialightweightCKYparsing,whilealsoincorporatingseveralotherconsiderationssuchasleaf-levelngramcohesionscoresandvisuallyinformedcontentselec-tion.Figure5showsanexamplecompression,andFigure6showsthecorrespondingCKYmatrix.Atahighlevel,thecompressionoperationresem-blesbottom-upCKYparsing,butinadditiontopars-ing,wealsoconsiderdeletionofpartsofthetrees.Whendeletingpartsoftheoriginaltree,wemightneedtore-parsetheremainderofthetree.Notethatweconsiderre-parsingonlywithrespecttotheorig-inalparsetreeproducedbyastate-of-the-artparser,henceitisonlyalight-weightparsing.54.1DynamicProgrammingInputtothealgorithmisasentence,representedasavectorx=x0xn−1=x[0:n−1],anditsPCFGparseπ(X)obtainedfromtheStanfordparser.Forsimplicityofnotation,weassumethatboththeparsetreeandthewordsequenceareencodedinx.Then,thecompressioncanbeformalizedas:5Integratingfullparsingintotheoriginalsentencewouldbeastraightforwardextensionconceptually,butmaynotbeanem-piricallybetterchoicewhenparsingforcompressionisbasedonvanillaunlexicalizedparsing.ˆy=argmaxyYiφi(X,j)(14)Whereeachφiisapotentialfunction,correspondingtoacriteriaofthedesiredcompression:φi(X,j)=exp(θi·fi(X,j))(15)Whereθiistheweightforaparticularcriteria(de-scribedin§4.2),whosescoringfunctionisfi.Wesolvethedecodingproblem(Equation14)us-ingdynamicprogramming.Forthis,weneedtosolvethecompressionsub-problemsforsequencesx[ich:J],whichcanbeviewedasbranchesˆy[ich,J]ofthefinaltreeˆy[0:n−1].Forexample,inFigure5,thefinalsolutionisˆy[0:7],whileasub-solutionofx[4:7]correspondstoatreebranchPP.Noticethatsub-solutionˆy[3:7]representsthesamebranchasˆy[4:7]duetobranchdeletion.Somecomputedsub-solutions,e.g.,ˆy[1:4],getdroppedfromthefinalcompressedtree.WedefineamatrixofscoresD[ich,J,H](Equa-tion17),wherehisoneofthenonterminalsymbolsbeingconsideredforacellindexedbyi,J,i.e.acan-didatefortherootsymbolofabranchˆy[ich:J].WhenallvaluesD[ich,J,H]arecomputed,wetakeˆh=argmaxhD[0,n−1,h](16)andbacktracktoreconstructthefinalcompression(theexactsolutiontoequation14).D[ich,J,H]=maxk∈[ich,J)r∈Rh(1)D[ich,k,P]+D[k+1,j,Q]+∆φ[R,ij](2)D[ich,k,P]+∆φ[R,ij](3)D[k+1,j,P]+∆φ[R,ij](17)WhereRh={r∈R:r=h→pq∨r=h→p}.Indexkdeterminesasplitpointforchildbranchesofasubtreeˆy[ich:J].Forexample,intheFigure5thesplitpointforchildrenofthesubtreeˆy[0:7]isk=2.Thethreecases((1)–(3))oftheaboveequationcorrespondtothefollowingtreepruningcases:PruningCase(1):Noneofthechildrenofthecur-rentnodeisdeleted.Forexample,inFigures5and6,thePCFGrulePP→INPP,correspondingtothesequence“inblackandwhite”,isretained.Anothersituationthatcanbeencounteredistreere-parsing.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

357

Vintage!motorcycle!shot!done!In!black!Und!Weiß!JJ!NN!NN!VBN!IN!JJ!JJ!CC!NP, NN!NP!CC-JJ VP, PP NP!PP S Dele%on!probability!Rule!probability!Vision!confidence!Ngram!cohesion!(Dele%on,)Fall)2))(Dele%on,)Fall)1))01234567k=2$Figure5:CKYcompression.Boththechosenrulesandphrases(blueboldfontandbluesolidarrows)andnotchosenrulesandphrases(reditalicsmallerfontandreddashedlines)areshown.PruningCase(2)/(3):Deletionoftheleft/rightchildrespectively.Therearetwotypesofdeletion,asillustratedinFigures5and6.Thefirstcorre-spondstodeletionofachildnode.Forexample,thesecondchildNNofruleNP→NPNNisdeleted,whichyieldsdeletionof“shot”.Thesec-ondtypeisaspecialcaseofpropagatinganodetoahigher-levelofthetree.InFigure6,thissit-uationoccurswhendeletingJJ“Vintage”,whichcausesthepropagationofNNfromcell11tocell01.Forthispurpose,weexpandthesetofrulesRwithadditionalspecialrulesoftheformh→h,e.g.,NN→NN,whichallowspropagationoftreenodestohigherlevelsofthecompressedtree.64.2ModelingCompressionCriteriaThe∆φterm7inEquation17denotesthesumoflogofpotentialfunctionsforeachcriteriaq:∆φ[R,ij]=Xqθ·∆fq(R,ij)(18)Notethat∆φdependsonthecurrentruler,alongwiththehistoricalinformationbeforethecurrentstepij,suchastheoriginalrulerij,andngramsontheborderbetweenleftandrightchildbranchesofrulerij.Weusethefollowingfourcriteriafqinourmodel,whicharedemonstratedinFigures5and6.I.TreeStructure:WecapturePCFGruleprob-abilitiesestimatedfromthecorpusas∆fpcfg=logPpcfg(R).6Weassignprobabilitiesofthesespecialpropagationrulesto1sothattheywillnotaffectthefinalparsetreescore.TurnerandCharniak(2005)handledpropagationcasessimilarly.7Weuse∆todistinguishthepotentialvalueforthewholesentencefromthegainofthepotentialduringasinglestepofthealgorithm.JJ NP, NN NP S Vintage NN motorcycle NN shot VBN VP, PP done IN PP in JJ NP black CC CC-JJ and JJ white 00″11″01″Rule%probability%Ngram%cohesion%Dele6on%probability%Vision%Confidence%i”J”Figure6:CKYcompression.Boththechosenrulesandphrases(blueboldfontandbluesolidarrows)andnotchosenrulesandphrases(reditalicsmallerfontandreddashedlines)areshown.II.SequenceStructure:Weincorporatengramcohesionscoresonlyacrosstheborderbetweentwobranchesofasubtree.III.BranchDeletionProbabilities:Wecomputeprobabilitiesofdeletionforchildrenas:∆fdel=logP(rt|rij)=logcount(rt,rij)zählen(rij)(19)Wherecount(rt,rij)isthefrequencyinwhichrijistransformedtortbydeletionofoneofthechildren.Weestimatethisprobabilityfromatrainingcorpus,describedin§4.3.count(rij)isthecountofrijinuncompressedsentences.IV.VisionDetection(ContentSelection):Wewanttokeepwordsreferringtoactualobjectsintheimage.Thus,weuseV(xj),avisualsimilarityscore,asourconfidenceofanobjectcorrespondingtowordxj.Thissimilarityisobtainedfromthevi-sualrecognitionpredictionsof(Dengetal.,2012b).Notethatsometestinstancesincluderulesthatwehavenotobservedduringtraining.Wedefaulttotheoriginalcaptioninthosecases.Theweightsθiaresetusingatuningdataset.Wecontrolover-compressionbysettingtheweightforfdeltoasmallvaluerelativetotheotherweights.4.3HumanCompressedCaptionsAlthoughwemodelimagecaptiongeneralizationassentencecompression,inpracticalapplicationswemaywanttheoutputsofthesetwotaskstobediffer-ent.Forexample,theremaybedifferencesinwhatshouldbedeleted(namedentitiesinnewswiresum-mariescouldbeimportanttokeep,whiletheymay

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

358

Orig:”Notiz”Die”pillows,”Sie”match”Die”chair”Das”geht”mit”Es,”Plus”Die”table”In”Die”picture”Ist”included.%SeqCompression:%Der”table”In”Die”picture.””TreePruning:”Der”chair”mit”Die”table”In”Die”picture.Orig:”Nur”In”Winter;me”Wir”sehen”diese”birds”Hier”In”Die”river.”%SeqCompression:”Sehen”diese”birds”In”Die”river.””TreePruning:”Diese”birds”In”Die”river.””Orig:”Der”world’smost”mächtig”lighthousesi@ngbeside”Die”house”mit”Die”world’sthickestcurtains.SeqCompression:%Si@ngbeside”Die”house””TreePruning:”Powerfullighthousebeside”Die”house”mit”Die”curtains.””Orig:”Orange”cloud”An”streetlight”C”nearLanakilaStreet”(Telefon”camera).””SeqCompression:%Orange”street””TreePruning:”Phonecamera.%Relevance(Problem(Orig:”There’ssomething”um”having”5″LKWs”parked”In”front”von”Mein”house”Das”makesme”fühlen”alle”importantClike.SeqCompression:%Front”von”Mein”house.””TreePruning:”Trucks”In”front”Mein”house.%Grammar(mistakes(Figure7:Captiongeneralization:good/badexamples.beextraneousforimagecaptiongeneralization).Tolearnthesyntacticpatternsforcaptiongeneraliza-tion,wecollectasmallsetofexamplecompressedcaptions(380intotal)usingAmazonMechanicalTurk(AMT)(Snowetal.,2008).Foreachimage,weasked3turkerstofirstlistallvisibleobjectsinanimageandthentowriteacompressedcaptionbyremovingnotvisuallyverifiablebitsoftext.Wethenaligntheoriginalandcompressedcaptionstomea-sureruledeletionprobabilities,excludingmisalign-ments,similartoKnightandMarcu(2000).Notethatweremovethisdatasetfromthe1Mcaptioncor-puswhenweperformdescriptiongeneration.5ExperimentsWeusethe1McaptionedimagecorpusofOrdonezetal.(2011).Wereserve1Kimagesasatestset,andusetherestofthecorpusforphraseextraction.Weexperimentwiththefollowingapproaches:ProposedApproaches:•TREEPRUNING:Ourtreecompressionap-proachasdescribedin§4.•SEQ+TREE:Ourtreecompositionapproachasdescribedin§3.•SEQ+TREE+PRUNING:SEQ+TREEusingcompressedcaptionsofTREEPRUNINGasbuildingblocks.BaselinesforComposition:•SEQ+LINGRULE:Themostequivalenttotheoldersequence-drivensystem(Kuznetsovaetal.,2012).Usesafewminorenhancements,suchassentence-boundarystatistics,toim-provegrammaticality.•SEQ:The§3systemwithouttreemodelsandmentionedenhancementsofSEQ+LINGRULE.MethodBleuMeteorw/(w/o)penaltyPRMSEQ+LINGRULE0.152(0.152)0.130.170.095SEQ0.138(0.138)0.120.180.094SEQ+TREE0.149(0.149)0.130.140.082SEQ+PRUNING0.177(0.177)0.150.160.101SEQ+TREE+PRUNING0.140(0.189)0.160.120.088Table1:AutomaticEvaluation•SEQ+PRUNING:SEQusingcompressedcap-tionsofTREEPRUNINGasbuildingblocks.Wealsoexperimentwiththecompressionofhumanwrittencaptions,whichareusedtogenerateimagedescriptionsforthenewtargetimages.BaselinesforCompression:•SEQCOMPRESSION(Kuznetsovaetal.,2013):Inferenceoperatesoverthesequencestructure.Althoughoptimizationissubjecttoconstraintsderivedfromdependencyparse,parsingisnotanexplicitpartoftheinferencestructure.Ex-ampleoutputsareshowninFigure7.5.1AutomaticEvaluationWeperformautomaticevaluationusingtwomea-sureswidelyusedinmachinetranslation:BLEU(Pa-pinenietal.,2002)8andMETEOR(DenkowskiandLavie,2011).9Weremoveallpunctuationandcon-vertcaptionstolowercase.Weuse1Ktestim-agesfromthecaptionedimagecorpus,10andas-sumetheoriginalcaptionsasthegoldstandardcap-tionstocompareagainst.TheresultsinTable18WeusetheunigramNISTimplementation:ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v13a-20091001.tar.gz9WithequalweightbetweenprecisionandrecallinTable1.10ExceptforthoseforwhichimageURLsarebroken,orCPLEXdidnotreturnasolution.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

359

Method-1Method-2CriteriaMethod-1preferredoverMethod-2(%)allturkersturkersw/κ>0.55turkersw/κ>0.6ImageDescriptionGenerationSEQ+TREESEQRel727272SEQ+TREESEQGmar838383SEQ+TREESEQAll686966SEQ+TREE+PRUNINGSEQ+TREERel687272SEQ+TREE+PRUNINGSEQ+TREEGmar413841SEQ+TREE+PRUNINGSEQ+TREEAll636466SEQ+TREESEQ+LINGRULEAll626462SEQ+TREE+PRUNINGSEQ+LINGRULEAll677577SEQ+TREE+PRUNINGSEQ+PRUNINGAll737575SEQ+TREE+PRUNINGHUMANAll241919ImageCaptionGeneralizationTREEPRUNINGSEQCOMPRESSION∗Rel656566Table2:HumanEvaluation:posedasabinaryquestion“whichofthetwooptionsisbetter?”withrespecttoRelevance(Rel),Grammar(Gmar),andOverall(Alle).AccordingtoPearson’sχ2test,allresultsarestatisticallysignificant.showthatboththeintegrationofthetreestructure(+TREE)andthegeneralizationofcaptionsusingtreecompression(+PRUNING)improvetheBLEUscorewithoutbrevitypenaltysignificantly,11whileimprovingMETEORonlymoderately(duetoanim-provementonprecisionwithadecreaseinrecall.)5.2HumanEvaluationNeitherBLEUnorMETEORdirectlymeasuregrammaticalcorrectnessoverlongdistancesandmaynotcorrespondperfectlytohumanjudgments.Therefore,wesupplementautomaticevaluationwithhumanevaluation.Forhumanevaluations,wepresenttwooptionsgeneratedfromtwocompet-ingsystems,andaskturkerstochoosetheonethatisbetterwithrespectto:relevance,grammar,andoverall.ResultsareshowninTable2with3turkerratingsperimage.Wefilteroutturkersbasedonacontrolquestion.Wethencomputetheselec-tionrate(%)ofpreferringmethod-1overmethod-2.Theagreementamongturkersisafrequentconcern.Therefore,wevarythesetofdependableusersbasedontheirCohen’skappascore(κ)againstotherusers.Itturnsout,filteringusersbasedonκdoesnotmakeabigdifferenceindeterminingthewinningmethod.Asexpected,tree-basedsystemssignificantlyout-performsequence-basedcounterparts.Forexample,11While4-gramBLEUwithbrevitypenaltyisfoundtocor-relatebetterwithhumanjudgesbyrecentstudies(ElliottandKeller,2014),wefoundthatthisisnotthecaseforourtask.Thismaybeduetothedifferencesinthegoldstandardcap-tions.Weusenaturallyexistingones,whichincludeawiderrangeofcontentandstylethancrowd-sourcedcaptions.Seq:”A”bu&erfly”Zu”Die”car”War”spo&Hrsg”von”Mein”neun”Jahr”alt”cousin.Seq+Pruning:”Der”bu&erflies”Sind”A&racted”Zu”Die”colourfulflowers”Zu”Die”car.+Seq+Tree:”Der”bu&erflies”Sind”A&racted”Zu”Die”colourfulflowers”In”HopeGardens.””Seq+Tree+Pruning:”Der”bu&erflies”Sind”A&racted”Zu”Die”colourfulflowers.Orig:”Der”bu&erflies”Sind”A&racted”Zu”Die”colourfulflowers”In”HopeGardens.””SeqCompression:”Der”colourfulflowers.”””TreePruning:”Der”bu&erflies”Sind”A&racted”Zu”Die”colourfulflowers.”””Cap>onGeneraliza>onImageDescrip>onGenera>onFigure8:Anexampleofadescriptionpreferredoverhu-mangoldstandard.Imagedescriptionisimprovedduetocaptiongeneralization.SEQ+TREEisstronglypreferredoverSEQ,withaselectionrateof83%.Somewhatsurprisingly,im-provedgrammaticalityalsoseemstoimproverele-vancescores(72%),possiblybecauseitishardertoappreciatethesemanticrelevanceofautomaticcap-tionswhentheyarelesscomprehensible.Alsoasexpected,compositionsbasedonprunedtreefrag-mentssignificantlyimproverelevance(68–72%),whileslightlydeterioratinggrammar(38–41%).Vor allem,thecaptionsgeneratedbyoursystemarepreferredovertheoriginal(ownergenerated)cap-tions19–24%ofthetime.Onesuchexampleisin-cludedinFigure8:“Thebutterfliesareattractedtothecolorfulflowers.”Additionalexamples(goodandbad)arepro-videdinFigures9and10.Manyofthesecaptionsarehighlyexpressivewhileremainingsemantically

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
1
8
8
1
5
6
6
9
0
5

/

/
T

l

A
C
_
A
_
0
0
1
8
8
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

360

Human:”Someflower”An”A”bar”In”A”hotel”In”Grapevine,”TX.””&Seq+Tree+Pruning:”Der”flower”War”Also”vivid”Und”A:rac
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild
Transactions of the Association for Computational Linguistics, 2 (2014) 351–362. Action Editor: Hal Daume III. Bild

PDF Herunterladen