Transactions of the Association for Computational Linguistics, vol. 6, pp. 511–527, 2018. Action Editor: Alexander Koller.
Submission batch: 1/2018; Revision batch: 5/2018; Published 8/2018.
© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 License.
Probabilistic Verb Selection for Data-to-Text Generation

Dell Zhang†1, Jiahao Yuan‡, Xiaoling Wang‡2, and Adam Foster†
†Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
‡Shanghai Key Lab of Trustworthy Computing, East China Normal University, 3663 North Zhongshan Road, Shanghai 200062, China
1dell.z@ieee.org, 2xlwang@sei.ecnu.edu.cn

Abstract

In data-to-text Natural Language Generation (NLG) systems, computers need to find the right words to describe phenomena seen in the data. This paper focuses on the problem of choosing appropriate verbs to express the direction and magnitude of a percentage change (e.g., in stock prices). Rather than simply using the same verbs again and again, we present a principled data-driven approach to this problem based on Shannon's noisy-channel model so as to bring variation and naturalness into the generated text. Our experiments on three large-scale real-world news corpora demonstrate that the proposed probabilistic model can be learned to accurately imitate human authors' pattern of usage around verbs, significantly outperforming the state-of-the-art method.

1 Introduction

Natural Language Generation (NLG) is a fundamental task in Artificial Intelligence (AI) (Russell and Norvig, 2009). It aims to automatically turn structured data into prose (Reiter, 2007; Belz and Kow, 2009), the opposite of the better-known field of Natural Language Processing (NLP), which transforms raw text into structured data such as a logical form or a knowledge base (Jurafsky and Martin, 2009). Dubbed "algorithmic authors" or "robot journalists", NLG systems have attracted a lot of attention in recent years, thanks to the rise of big data (Wright, 2015). The use of NLG in financial services has been growing very fast. One particularly important NLG problem for summarizing financial or business data is to automatically generate textual descriptions of trends between two data points (such as stock prices).

In this paper, we elect to use relative percentages rather than absolute numbers to describe the change from one data point to another. This is because an absolute number might be considered small in one case but large in another, depending on the unit and the context (Krifka, 2007; Smiley et al., 2016). For example, 1000 British pounds are worth much more than 1000 Japanese yen; a rise of 100 US dollars in the price of a car might be negligible, but the same increase in the price of a bike would be significant.

Given two data points (e.g., on a stock chart), the percentage change can always be calculated easily. The challenge is to select the appropriate verb for any percentage change. For example, in newspapers we often see headlines like "Apple's stock had jumped 34% this year in anticipation of the next iPhone" and "Microsoft's profit climbed 28% with shift to Web-based software". The journalists writing such news stories use descriptive language, such as the verbs jump and climb, to express the direction and magnitude of a percentage change. It is of course possible to simply keep using the same neutral verbs again and again, e.g., increase and decrease for upward and downward changes respectively, as in most existing data-to-text NLG systems. However, the generated text would sound much more natural if computers could use a variety of verbs suitable for the context, like human authors do.
Expressions of percentage changes are readily available in many natural language text datasets and can be easily extracted. Therefore computers should be able to learn from such expressions how people decide which verbs to use for what kinds of percentage changes.

In this paper, we address the problem of verb selection for data-to-text NLG through a principled data-driven approach. Specifically, we show how to employ Bayesian reasoning to train a probabilistic model for verb selection based on large-scale real-world news corpora, and demonstrate its advantages over existing verb selection methods.

The rest of this paper is organized as follows. In Section 2, we review the related work in the literature. In Section 3, we describe the datasets used for our investigation. In Section 4, we present our probabilistic model for verb selection in detail. In Section 5, we conduct an experimental evaluation. In Section 6, we discuss possible extensions to the proposed approach. In Section 7, we draw conclusions.

2 Related Work

The most successful NLG applications, from the commercial perspective, have been data-to-text NLG systems which generate textual descriptions of databases or datasets (Reiter, 2007; Belz and Kow, 2009). A typical example is the automatic generation of textual weather forecasts from weather data, as used by Environment Canada and the UK Met Office (Goldberg et al., 1994; Belz, 2008; Sripada et al., 2014). The TREND system (Boyd, 1998) focuses on generating descriptions of historical weather patterns. Its method concentrates primarily on the detection of upward and downward trends in the weather data, and uses a limited set of verbs to describe different types of movements. Ramos-Soto et al. (2013) also address the surface realization of weather trend data by creating an "intermediate language" for temperature, wind, etc., and then using four different ways to verbalize temperatures based on the minimum, maximum, and trend in the time frame considered. An empirical corpus-based study of human-written weather forecasts was conducted in SUMTIME-MOUSAM (Reiter et al., 2005), and one aspect of that research focused on verb selection in weather forecasts: the authors built a classifier to predict the choice of verb based on type (speed vs. direction), information content (change or transition from one wind state to another), and near-synonym choice. There is also growing interest in using NLG to enhance accessibility, for example by describing data presented as graphs to visually impaired people; in such NLG systems, there has been exploration into generating text for trend data that is automatically adapted to users' reading levels (Moraes et al., 2014). NLG systems are widely used on financial and business data. For example, the SPOTLIGHT system developed at A.C. Nielsen automatically generated readable English text based on the analysis of large amounts of retail sales data. For another example, in 2016 Forbes reported that FactSet used NLG to automatically write hundreds of thousands of company descriptions a day. It is not difficult to imagine that different kinds of such data-to-text NLG systems could be utilized by a modern chatbot like Amazon Echo or Microsoft XiaoIce (Shum et al., 2018) to enable users to access a variety of online data resources via natural language conversation.

Typically, a complete data-to-text NLG system implements a pipeline which involves both content selection ("what to say") and surface realization ("how to say it"). In recent years, researchers have made much progress in the end-to-end joint optimization of those two aspects: Angeli et al. (2010) treat the generation process as a sequence of local decisions represented by log-linear models; Konstas and Lapata (2013) employ a probabilistic context-free grammar (PCFG) specifying the structure of the event records and complement it with an n-gram language model as well as a dependency model; the most advanced method to date is the LSTM recurrent neural network (RNN) based encoder-aligner-decoder model proposed by Mei et al. (2016), which is able to learn content selection and surface realization together directly from database-text pairs. The verb selection problem that we focus on in this paper belongs to the lexicalization step of content selection, more specifically, sentence planning. Similar to the above-mentioned joint optimization methods, our approach to verb selection is also automatic, unsupervised, and domain-independent. It would be straightforward to generalize our proposed model to select other types of words (like adjectives and adverbs), or even textual templates as used by Angeli et al. (2010), to describe numerical data.
Due to its probabilistic nature, our proposed model could be plugged into, or interpolated with, a bigger end-to-end probabilistic model (Konstas and Lapata, 2013) relatively easily, but it is not obvious how this model could fit into a neural architecture (Mei et al., 2016).

The existing work on lexicalization that is most similar to ours is a corpus-based method for verb selection developed by Smiley et al. (2016) at Thomson Reuters. They analyze the usage patterns of verbs expressing percentage changes in a very large corpus, the Reuters News Archive. For each verb, they calculate the interquartile range (IQR) of its associated percentage changes in the corpus. Given a new percentage change, their method randomly selects a verb, with equal probability, from those verbs whose IQRs cover the percentage in question. A crowdsourcing-based evaluation has demonstrated the superiority of their verb selection method over the random baseline that just chooses verbs completely at random. Notably, their method has been incorporated into Thomson Reuters Eikon™, their commercial data-to-text NLG software product for macro-economic indicators and mergers-and-acquisitions deals (Plachouras et al., 2016). We will make experimental comparisons between our proposed approach and theirs in Section 5.

3 Data

3.1 The WSJ Corpus

The first (and main) dataset that we have used to investigate the problem of verb selection is the BLLIP 1987-89 Wall Street Journal (WSJ) Corpus Release 1, which contains a three-year WSJ collection of 98,732 stories from ACL/DCI (LDC93T1), approximately 30 million words (Charniak et al., 2000).

We first utilized the Stanford CoreNLP toolkit (https://stanfordnlp.github.io/CoreNLP/) (Manning et al., 2014) to extract "relation triples" from all the documents in the dataset, via its open-domain information extraction (OpenIE) functionality. Then, with the help of part-of-speech (POS) tagging provided by the Python package NLTK (http://www.nltk.org/) (Bird et al., 2009), we filtered the extracted relation triples and retained only those expressing a percentage change in the following format:

    Google's revenue [subject] | rose [verb] | 22.2% [percentage]

Here the numerical value of the percentage change could be written using either the symbol % or the word percent. Note that all auxiliary verbs (including modal verbs) were removed, and lemmatization (Manning et al., 2008; Jurafsky and Martin, 2009) was applied to all main verbs so that the different inflectional forms of the same verb were reduced to their common base form.

After extracting 57,005 candidate triples for a total of 1,355 verbs, we eliminated rare verbs which occur less than 50 times in the dataset. Furthermore, we manually annotated the direction of each verb as upward or downward, and discarded verbs like yield which do not indicate the direction of a percentage change. The above preprocessing left us with 25 (normalized) verbs, of which 11 are upward and 14 are downward. There are 21,766 verb-percentage pairs in total.

Furthermore, it was found that most of the percentage changes in this dataset reside within the range [0%, 100%]. Only a tiny portion of percentage changes are beyond that range: 1.35% for upward verbs and 0.10% for downward verbs. Those out-of-range percentage changes are considered outliers and are excluded from our study in this paper, though the way to relax this constraint will be discussed later in Section 6.

3.2 The Reuters Corpus

We have also validated our model on a widely-used public dataset, the Reuters-21578 text categorization collection (https://goo.gl/NrOfu). It is a collection of 21,578 documents that appeared on the Reuters newswire in 1987. The documents were assembled and indexed with categories, but the category labels were not needed in this paper.

The same preprocessing as on the WSJ corpus was applied to this dataset, except that the minimum occurrence frequency of verbs was not 50 but 5 times, due to the smaller size of this dataset. After manual annotation and filtering, we ended up with 8 verbs, including 4 upward ones and 4 downward ones. There are 603 verb-percentage pairs in total.
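The filtering and normalization step described above can be sketched as follows. This is only a minimal illustration of the idea, not the authors' released code: the triple format, the regular expression, and the auxiliary-verb stoplist are our own simplifications, and we assume relation triples have already been produced upstream (e.g., by CoreNLP's OpenIE) and that NLTK's WordNet data is installed.

```python
import re
from nltk.stem import WordNetLemmatizer

# Matches "22.2%" as well as "22.2 percent".
PERCENT_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*(?:%|percent)$")
# Auxiliary and modal verbs to strip from the relation phrase.
AUXILIARIES = {"be", "is", "are", "was", "were", "been", "being", "have",
               "has", "had", "do", "does", "did", "will", "would", "shall",
               "should", "can", "could", "may", "might", "must"}
lemmatizer = WordNetLemmatizer()

def triple_to_pair(subject, relation, obj):
    """Turn one OpenIE triple into a (verb lemma, percentage) pair,
    or return None if it does not express a percentage change.
    The subject is kept for completeness but unused here (cf. Section 6)."""
    match = PERCENT_RE.match(obj.strip())
    if match is None:
        return None
    content = [t for t in relation.lower().split() if t not in AUXILIARIES]
    if not content:
        return None
    # Lemmatize the main verb, e.g., "rose"/"risen"/"rises" -> "rise".
    lemma = lemmatizer.lemmatize(content[-1], pos="v")
    return lemma, float(match.group(1))

print(triple_to_pair("Google's revenue", "had risen", "22.2%"))  # ('rise', 22.2)
```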
3.3 The Chinese Corpus

Furthermore, to verify the effectiveness of our approach in other languages, we have also made use of the Chinese Gigaword (5th edition) dataset. It is a comprehensive archive of newswire text data acquired from eight distinct sources of Chinese newswire by LDC over a number of years (LDC2011T13), and contains more than 10 million sentences.

Since we could not find any open-domain information extraction toolkit for "relation triples" in Chinese, we resorted to regular expression matching to extract, from Chinese sentences, the expressions of percentages together with their local contexts. A number of regular expression patterns were utilized to ensure that they could cover all the different ways to write a percentage in Chinese. Then, after POS tagging, we were able to identify the verb immediately preceding each percentage, if it is associated with one.

For our application, a big difference between Chinese and English is that the available choices of verbs to express upward or downward percentage changes are pretty limited in Chinese: the variation in fact mostly comes from the adverb used together with the verb. Therefore, when we talk about the problem of Chinese verb selection in this paper, we actually mean the choice of not just verbs but adverb+verb combinations, e.g., 狂升 (rise crazily) and 略降 (fall slightly). Our proposed probabilistic model for verb selection, described below in Section 4, can be extended straightforwardly to such generalized Chinese "verbs".

Similar to the preprocessing of the other datasets, rarely occurring verbs with frequency less than 50 were filtered out. In the end, we got 18 Chinese verbs, of which 14 are upward and 4 are downward. There are 2,829 verb-percentage pairs in total.
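As a rough illustration of this route (the paper does not list the actual patterns used, so the regular expression below is our own and deliberately incomplete), one can match a couple of common ways of writing a percentage in Chinese and then look for an immediately preceding adverb+verb combination via POS tagging. The sketch assumes the third-party jieba package for segmentation and tagging.

```python
import re
import jieba.posseg as pseg  # third-party: pip install jieba

# Two common ways of writing a percentage: "30%" and "百分之三十".
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?%|百分之[0-9零一二三四五六七八九十点]+)")

def extract_pairs(sentence):
    """Yield (adverb+verb, percentage) pairs from one Chinese sentence."""
    for match in PERCENT_RE.finditer(sentence):
        context = sentence[:match.start()]
        tagged = [(w.word, w.flag) for w in pseg.cut(context)]
        if tagged and tagged[-1][1].startswith("v"):      # verb just before
            verb = tagged[-1][0]
            if len(tagged) > 1 and tagged[-2][1] == "d":  # optional adverb
                verb = tagged[-2][0] + verb
            yield verb, match.group(0)

for pair in extract_pairs("该公司股价狂升30%。"):
    print(pair)  # expected to be close to ('狂升', '30%')
```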
4 Approach

In this section, we propose to formulate the task of verb selection for data-to-text NLG (see Section 1) as a supervised learning problem (Hastie et al., 2009) and to address it using Shannon's noisy-channel model (Shannon, 1948).

For each of the two possible change directions (upward and downward), we need to build a specific model. Without loss of generality, in the subsequent discussion we focus on selecting the verbs of one particular direction; the way to deal with the other direction is exactly the same. Thus a percentage change is fully specified by its magnitude in one model.

The set-up of our supervised learning problem is as follows. Suppose that we have a set of training examples D = {(x_1, w_1), ..., (x_N, w_N)}, where each example consists of a percentage change x_i paired with the verb w_i used by the human author to express that percentage change. Such training data could be obtained from a large corpus as described in Section 3. Let X denote the set of possible percentage changes: as mentioned earlier, in this paper we assume that X = [0%, 100%]. Let V denote the set of possible verbs, i.e., the vocabulary. Our task is to learn a predictive function f: X → V that can map any given percentage change x to an appropriate verb w = f(x).

Apparently, there is inherent uncertainty in the above-described process of predicting the choice of verbs for a percentage change. Making use of probabilistic reasoning, the principled approach to handling uncertainties, we argue that the function f should be determined by the posterior probability P(w|x). However, it is difficult to directly estimate the parameters of such a conditional model, aka a discriminative model, for every possible value of x, which is a continuous variable. Hence, we turn to the easier alternative often used in machine learning: constructing a generative model. Rather than directly estimating the conditional probability distribution, we instead estimate the joint probability P(x, w) over (x, w) pairs. The joint probability can be decomposed as follows:

    P(x, w) = P(w) P(x|w),   (1)

where P(w) is the prior probability distribution over verbs w, and P(x|w) is the likelihood, i.e., the probability of seeing the percentage change x given that the associated verb is w. The benefit of making the above decomposition is that the parameters of P(w) and P(x|w) can be estimated separately.

Given such a generative model, we can then use the Bayes rule to derive the posterior probability P(w|x) for any new example of x:

    P(w|x) = P(w) P(x|w) / P(x),   (2)

where

    P(x) = Σ_{w∈V} P(x, w) = Σ_{w∈V} P(w) P(x|w)   (3)

is the model evidence acting as the normalizing constant in the formula.

Intuitively, this generative model can be considered as a noisy channel (Shannon, 1948). When we see a percentage change x, we can imagine that it has been generated in two steps (Raviv, 1967). First, a verb w was chosen with the prior probability P(w). Second, the verb w was passed through a communication "channel" and corrupted by the "noise" to produce the percentage change x according to the likelihood function (aka the channel model) P(x|w). In other words, the percentage change x that we see is actually the distorted form of its associated verb w. An alternative, but equivalent, interpretation is that when a pair (x, w) is passed through the noisy channel, the verb w is lost and finally only the percentage change x is seen. The task is to recover the lost w based on the observed x.

Shannon's noisy-channel model is in fact a kind of Bayesian inference. It has been applied to many NLP tasks such as text categorization, spell checking, question answering, speech recognition, and machine translation (Jurafsky and Martin, 2009). Our application, probabilistic verb selection, is different from them because the observed data are continuous real-valued numbers, not discrete symbols. More importantly, in most of those applications, such as text categorization using the Naive Bayes algorithm (Manning et al., 2008), the objective is "decoding", i.e., to find the single most likely label w* for any given input x from the model:

    w* = argmax_{w∈V} P(w|x) = argmax_{w∈V} P(w) P(x|w) / P(x) = argmax_{w∈V} P(w) P(x|w),   (4)

and therefore the normalizing constant P(x) does not need to be calculated. However, this is actually undesirable for the task of verb selection, because it implies that a percentage change x would always be expressed by the same "optimal" verb w* corresponding to it. To achieve variation and naturalness, we must maintain the diversity of word usage. So the right method to generate a verb w for a given percentage change x is to compute the posterior probability distribution P(w|x) over all the possible verbs in the vocabulary V using Eq. (2) and then randomly sample a verb from that distribution. Although this means that the normalizing constant P(x) needs to be calculated each time, the computation is still efficient, as unlike in many other applications the vocabulary size |V| is quite small in practice (see Section 3).

In the following two subsections, we study the two components of our proposed probabilistic model for verb selection: the prior probability distribution and the likelihood function, respectively.

4.1 Prior

The prior probability distribution P(w) can simply be obtained by maximum likelihood estimation (MLE):

    P(w)_MLE = N_w / N,   (5)

where N_w is the number of training examples with the verb w, and N is the total number of training examples.

The relationship between a verb's rank and its frequency in the WSJ corpus is depicted by the log-log plot in Fig. 1, revealing that the empirical distribution of verbs follows Zipf's law (Powers, 1998), which is related to the power law (Adamic, 2000; Newman, 2005). Specifically, the frequency of the i-th most popular verb, f_i, is proportional to 1/i^s, where s is the exponent characterizing the distribution (shown as the slope of the straight line in the corresponding log-log plot). This implies that in the context of expressing percentage changes, the human choice of verbs is dominated by a few frequently used ones, and many other verbs are only used very occasionally.

[Figure 1: The empirical distribution of verbs P(w)_MLE follows Zipf's law in the WSJ corpus; log-log plots of frequency against rank for (a) upward verbs and (b) downward verbs.]

Smoothing: If we would like to intentionally boost the diversity of verb choices, we can mitigate the high skewness of the empirical distribution of verbs by smoothing (Zhai and Lafferty, 2004). A simple smoothing technique suitable for this purpose is Jelinek-Mercer smoothing (Jelinek and Mercer, 1980), which linearly interpolates the maximum likelihood estimate of a verb w's prior probability with the uniform distribution over the vocabulary of verbs V, i.e.,

    P(w) = λ P(w)_MLE + (1 − λ) (1 / |V|),   (6)

where P(w)_MLE is given by Eq. (5), and the parameter λ ∈ [0, 1] provides a means to explicitly control the trade-off between accuracy and diversity. The smaller the parameter λ is, the more diverse the generated verbs will be. When λ = 0, the prior probability is completely ignored and the selection of a verb solely depends on how compatible the verb is with the given percentage change. When λ = 1, the model backs off to the original model without smoothing. The optimal value of the parameter λ can be tuned on a development set (see Section 5.3).
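A minimal sketch of Eqs. (5) and (6) in Python (the language of the authors' released code, although the function below is our own illustration, not theirs):

```python
from collections import Counter

def fit_prior(verbs, lam=0.05):
    """Eq. (5) and Eq. (6): an MLE prior over verbs with Jelinek-Mercer
    smoothing; lam=1 recovers the unsmoothed MLE prior."""
    counts = Counter(verbs)         # N_w for each verb w
    n_total = sum(counts.values())  # N
    vocab_size = len(counts)        # |V|
    return {w: lam * n_w / n_total + (1 - lam) / vocab_size
            for w, n_w in counts.items()}

prior = fit_prior(["rise", "rise", "jump", "soar", "rise", "jump"], lam=0.5)
# prior["rise"] = 0.5 * 3/6 + 0.5 * 1/3 = 0.4167
```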
4.2 Likelihood

For each verb w ∈ V, we analyze the distribution of its associated percentage changes and calculate the following descriptive statistics: mean, standard deviation (std), skewness, kurtosis, median, and interquartile range (IQR). All those descriptive statistics for the WSJ corpus are given in Table 1.

| direction | verb | mean | std | skewness | kurtosis | median | IQR |
|---|---|---|---|---|---|---|---|
| upward | rise | 16.93 | 18.58 | 1.77 | 2.80 | 9.40 | [4.90, 22.00] |
| upward | increase | 17.05 | 18.06 | 1.76 | 3.01 | 10.45 | [5.00, 23.00] |
| upward | grow | 15.46 | 17.48 | 1.77 | 2.93 | 8.40 | [3.20, 21.00] |
| upward | climb | 17.22 | 18.32 | 1.81 | 3.26 | 10.00 | [5.57, 23.00] |
| upward | jump | 31.28 | 23.64 | 0.77 | -0.24 | 24.20 | [12.53, 48.00] |
| upward | surge | 29.03 | 25.43 | 0.85 | -0.33 | 21.00 | [8.00, 46.00] |
| upward | gain | 13.78 | 16.79 | 1.95 | 3.89 | 7.50 | [2.00, 20.00] |
| upward | soar | 39.39 | 27.68 | 0.42 | -0.94 | 35.00 | [15.20, 58.00] |
| upward | raise | 16.54 | 15.54 | 1.83 | 4.19 | 11.40 | [5.00, 22.75] |
| upward | advance | 15.83 | 15.47 | 1.87 | 3.49 | 10.55 | [6.03, 20.00] |
| upward | boost | 20.15 | 16.16 | 1.68 | 2.80 | 16.00 | [9.78, 24.99] |
| downward | fall | 17.52 | 19.93 | 1.61 | 1.86 | 8.90 | [4.18, 24.00] |
| downward | decline | 14.81 | 17.09 | 1.87 | 3.07 | 8.00 | [4.58, 19.00] |
| downward | drop | 18.36 | 19.00 | 1.51 | 1.72 | 10.00 | [5.47, 26.00] |
| downward | slip | 11.95 | 17.51 | 2.09 | 3.24 | 6.00 | [2.00, 9.12] |
| downward | plunge | 38.87 | 26.92 | 0.48 | -0.83 | 34.05 | [15.08, 58.00] |
| downward | slide | 23.09 | 22.29 | 1.00 | -0.03 | 15.00 | [5.25, 38.65] |
| downward | lose | 23.65 | 21.65 | 1.05 | 0.47 | 17.00 | [6.00, 36.98] |
| downward | tumble | 28.84 | 22.46 | 0.98 | 0.42 | 24.90 | [10.00, 39.20] |
| downward | plummet | 36.43 | 23.89 | 0.62 | -0.35 | 31.00 | [19.90, 50.00] |
| downward | ease | 11.02 | 17.27 | 2.25 | 3.97 | 5.50 | [1.95, 8.67] |
| downward | decrease | 19.72 | 18.67 | 1.25 | 0.82 | 12.00 | [5.60, 30.80] |
| downward | reduce | 25.72 | 21.81 | 1.41 | 1.21 | 20.00 | [10.00, 30.00] |
| downward | dip | 13.98 | 18.98 | 2.01 | 2.91 | 6.85 | [3.75, 10.25] |
| downward | shrink | 23.82 | 20.72 | 1.33 | 1.37 | 15.00 | [10.00, 35.00] |

Table 1: The descriptive statistics of percentage changes (in %) for each verb, in the WSJ corpus.

In addition, Fig. 2 shows the box plots of percentage changes for the top-10 (most frequent) verbs in the WSJ corpus, where the rectangular box corresponding to each verb represents the span from the first quartile to the third quartile, i.e., the interquartile range (IQR), with the segment inside the box indicating the median and the whiskers outside the box indicating the rest of the distribution (except for the points that are determined to be "outliers" using the so-called Tukey box plot method).

[Figure 2: The box plots of percentage changes (in %) for the top-10 verbs in the WSJ corpus: (a) upward verbs and (b) downward verbs.]

It can be seen that the choice of verb often implies the magnitude of the percentage change: some verbs (such as soar and plunge) are mostly used to express big changes (large medians), while some verbs (such as advance and ease) are mostly used to express small changes (small medians). Generally speaking, the former are associated with a relatively wide range of percentage changes (large IQRs) while the latter are associated with a relatively narrow range of percentage changes (small IQRs). Moreover, it is interesting to see that for almost all the verbs, the distribution of percentage changes is heavily skewed to the left side (i.e., smaller changes).

Given a new percentage change x, in order to calculate its probability of being generated from a verb w in the above-described generative model, we need to fit the likelihood function, i.e., the probability distribution P(x|w), for each verb w ∈ V, based on the training data. One common technique for this purpose is kernel density estimation (KDE) (Hastie et al., 2009), a non-parametric way to estimate the probability density function as follows:

    P(x|w) = (1 / (N_w h)) Σ_{i=1}^{N_w} K((x − x_i) / h),   (7)
where N_w is the number of training examples with the verb w, K(·) is the kernel (a non-negative function that integrates to one and has mean zero), and h > 0 is a smoothing parameter called the bandwidth. Fig. 3 shows the likelihood function P(x|w) fitted by KDE with Gaussian kernels and automatic bandwidth determination using the rule of Scott (2015), for the most popular upward and downward verbs in the WSJ corpus: rise and fall.

[Figure 3: The likelihood function P(x|w) fitted by kernel density estimation (KDE) for (a) the verb rise and (b) the verb fall.]
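Eqs. (2) and (7) can be realized in a few lines with SciPy, whose gaussian_kde uses Scott's rule for the bandwidth by default, matching the setting above. This is a sketch under our own naming conventions (reusing the fit_prior output from the earlier snippet), not the authors' implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_likelihoods(pairs):
    """Eq. (7): one Gaussian KDE per verb over its percentage changes.
    `pairs` is a list of (percentage, verb) training examples; each verb
    needs at least two distinct values for the KDE to be well-defined."""
    by_verb = {}
    for x, w in pairs:
        by_verb.setdefault(w, []).append(x)
    # Scott's rule is SciPy's default bandwidth method.
    return {w: gaussian_kde(xs) for w, xs in by_verb.items()}

def sample_verb(x, prior, likelihoods, rng=None):
    """Eq. (2): compute the posterior P(w|x) over the whole vocabulary,
    then sample from it rather than taking the argmax (see Section 4)."""
    rng = rng or np.random.default_rng()
    verbs = list(prior)
    scores = np.array([prior[w] * likelihoods[w](x)[0] for w in verbs])
    posterior = scores / scores.sum()  # divide by the evidence P(x), Eq. (3)
    return rng.choice(verbs, p=posterior)
```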
(un)theverbrise(b)theverbfallFigure3:ThelikelihoodfunctionP(X|w)fittedbykerneldensityestimation(KDE).(un)theverbrise(b)theverbfallFigure4:ThelikelihoodfunctionP(X|w)fittedbytheBetadistribution.distributionwhichisacontinuousdistributionsup-portedontheboundedinterval[0,1]:P.(X|w)=Beta(un,β)(α+β)Γ(un)Γ(β)xα−1(1−x)β−1.(8)Althoughthereexistanumberofcontinuousdis-tributionssupportedontheboundedintervalsuchasthetruncatednormaldistribution,theBetadis-tributionispickedhereasithastheabilitytotakeagreatvarietyofdifferentshapesusingonlytwoparametersαandβ.Thesetwoparameterscanbees-timatedusingthemethodofmoments,ormaximumlikelihood.Forexample,usingtheformer,wehavebα=¯x(cid:16)¯x(1−¯x)¯v−1(cid:17)andbβ=(1−¯x)(cid:16)¯x(1−¯x)¯v−1(cid:17)if¯v<¯x(1−¯x),where¯xand¯varethesamplemeanandsamplevariancerespectively.Fig.4showsthelikelihoodfunctionP(x|w)fittedbytheBetadis-tributionusingSciPy4forthemostpopularupwardanddownwardverbsintheWSJcorpus:riseandfall.5Experiments5.1BaselinesThomsonReuters:Theonlypublishedapproachthatweareawareoftothisspecifictaskofverbselec-tioninthecontextofdata-to-textNLGisthemethodadoptedbyThomsonReutersEikonTM(Smileyetal.,2016).Thisbaselinemethod’seffectivenesshasbeenverifiedthroughcrowdsourcing,aswehavementionedbefore(seeSection2).Furthermore,itisfairlynew(publishedin2016),thereforeshould4https://www.scipy.org/ l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 3 8 1 5 6 7 6 4 8 / / t l a c _ a _ 0 0 0 3 8 p d . f b y g u e s t t o n 0 9 S e p e m b e r 2 0 2 3 519 representthestateoftheartinthisfield.Notethattheirmodelwasnottakenoff-the-shelfbutre-trainedonourdatasetstoensureafaircomparisonwithourapproach.NeuralNetwork:Anotherbaselinemethodthatwehavetriedisafeed-forwardartificialneuralnet-workwithhiddenlayers,aka,amulti-layerpercep-tron(RussellandNorvig,2009;Goodfellowetal.,2016).Itisbecauseneuralnetworksarewell-knownuniversalfunctionapproximators,andtheyrepresentquiteadifferentfamilyofsupervisedlearningalgo-rithms.Unlikeourproposedprobabilisticapproachwhichisessentiallyagenerativemodel,theneuralnetworkusedinourexperimentsisadiscrimina-tivemodelwhichtakesthepercentagechangein-put(representedasasinglefloating-pointnumber)andthenpredictstheverbchoicedirectly.Sincewewouldliketohaveprobabilityestimatesforeachverb,thesoftmaxfunctionwasusedfortheoutputlayerofneurons,andthenetworkwastrainedviaback-propagationtominimizethecross-entropylossfunction.Anl2regularizationtermwasalsoaddedtothelossfunctionthatwouldshrinkmodelparam-eterstopreventoverfitting.Theactivationfunctionwassettotherectifiedlinearunit(ReLU)(Hahn-loseretal.,2000).TheAdamoptimizationalgo-rithm(KingmaandBa,2014)wasemployedasthesolver,withthesamplesshuffledaftereachiteration.Theinitiallearningratewassetto0.001,andthemaximumnumberofiterations(epochs)wassetto1500.Forourdatasets,asinglehiddenlayerof100neuronswouldbesufficientandaddingmoreneu-ronsorlayerscouldnothelp.Thiswasfoundusingthedevelopmentsetthroughalinesearchfrom20to500hiddenneuronswithstepsize20.Notethatwhenapplyingthetrainedneuralnetworktoselectverbs,weshouldusenotargmaxbutsamplingfromthepredictedprobabilitydistribution(givenbythesoftmaxfunction),inthesamewayaswedoinourproposedprobabilisticmodel(seeSection4).5.2CodeThePythoncodeforourexperiments,alongwiththedatasetsofverb-percentagepairsextractedfromthosethreecorpora(seeSection3),havebeenmadeavailabletotheresearchcommunity5.5https://goo.gl/gkj8Fa5.3AutomaticEvaluationTheendusers’perceptionofaverbselectionalgo-rithm’squalitydepend
5 Experiments

5.1 Baselines

Thomson Reuters: The only published approach that we are aware of to this specific task of verb selection in the context of data-to-text NLG is the method adopted by Thomson Reuters Eikon™ (Smiley et al., 2016). This baseline method's effectiveness has been verified through crowdsourcing, as we have mentioned before (see Section 2). Furthermore, it is fairly new (published in 2016), and therefore should represent the state of the art in this field. Note that their model was not taken off-the-shelf but re-trained on our datasets to ensure a fair comparison with our approach.

Neural Network: Another baseline method that we have tried is a feed-forward artificial neural network with hidden layers, aka a multi-layer perceptron (Russell and Norvig, 2009; Goodfellow et al., 2016), because neural networks are well-known universal function approximators and they represent quite a different family of supervised learning algorithms. Unlike our proposed probabilistic approach, which is essentially a generative model, the neural network used in our experiments is a discriminative model which takes the percentage change input (represented as a single floating-point number) and then predicts the verb choice directly. Since we would like to have probability estimates for each verb, the softmax function was used for the output layer of neurons, and the network was trained via back-propagation to minimize the cross-entropy loss function. An l2 regularization term was also added to the loss function to shrink model parameters and prevent overfitting. The activation function was set to the rectified linear unit (ReLU) (Hahnloser et al., 2000). The Adam optimization algorithm (Kingma and Ba, 2014) was employed as the solver, with the samples shuffled after each iteration. The initial learning rate was set to 0.001, and the maximum number of iterations (epochs) was set to 1500. For our datasets, a single hidden layer of 100 neurons was sufficient, and adding more neurons or layers did not help; this was found using the development set through a line search from 20 to 500 hidden neurons with step size 20. Note that when applying the trained neural network to select verbs, we should use not argmax but sampling from the predicted probability distribution (given by the softmax function), in the same way as we do in our proposed probabilistic model (see Section 4).

5.2 Code

The Python code for our experiments, along with the datasets of verb-percentage pairs extracted from the three corpora (see Section 3), has been made available to the research community (https://goo.gl/gkj8Fa).

5.3 Automatic Evaluation

The end users' perception of a verb selection algorithm's quality depends not only on how accurately the chosen verbs reflect the corresponding percentage changes but also on how diverse the chosen verbs are; these are two largely orthogonal dimensions for evaluation.

Accuracy: The easiest way to assess the accuracy of an NLG method or system is to compare the texts generated by computers with the texts written by humans for the same input data (Mellish and Dale, 1998; Reiter and Belz, 2009), using an automatic metric such as BLEU (Papineni et al., 2002). For our task of verb selection, we decided to use the metric MRR, which stands for mean reciprocal rank (Voorhees, 1999; Radev et al., 2002) and can be calculated as follows:

    MRR = (1 / |Q|) Σ_{(x'_i, w'_i)∈Q} 1 / rank(w'_i),   (9)

where Q = {(x'_1, w'_1), ..., (x'_M, w'_M)} is the set of test examples, and rank(w'_i) refers to the rank position of w'_i, the verb really used by the human author to describe the percentage change x'_i, in the list of predicted verbs ranked in descending order of their probabilities of correctness given by the model. The MRR metric is most widely used for the evaluation of automatic question answering, which is similar to automatic verb selection in the following sense: both aim to output just one suitable response (answer or verb) to any given input (question or percentage change).

Through 5-fold cross-validation (Hastie et al., 2009), we obtained the MRR scores of our proposed model (see Section 4) and the two baseline models (see Section 5.1), which are shown in Table 2. The models were trained/tested separately on each dataset (see Section 3). In each round of 5-fold cross-validation, 20% of the data would become the test set; in the remaining 80% of the data, a randomly selected 60% would be the training set and the other 20% would be the development set if parameter tuning was needed (otherwise the whole 80% would be used for training).
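Under the same naming conventions as the earlier snippets, Eq. (9) amounts to ranking the vocabulary by posterior score for each test example; the helper below is a hedged sketch, not the released evaluation code, and it assumes every test verb also occurs in training.

```python
def mean_reciprocal_rank(test_pairs, prior, likelihoods):
    """Eq. (9): average the reciprocal rank of the verb the human author
    actually used, with verbs ranked by P(w) * P(x|w) (Eq. (2) up to P(x))."""
    total = 0.0
    for x, w_true in test_pairs:
        ranked = sorted(prior,
                        key=lambda w: prior[w] * likelihoods[w](x)[0],
                        reverse=True)
        total += 1.0 / (ranked.index(w_true) + 1)
    return total / len(test_pairs)
```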
| corpus | method | upward verbs | downward verbs |
|---|---|---|---|
| WSJ | Thomson Reuters | 0.119 ± 0.002 | 0.106 ± 0.003 |
| WSJ | Neural Network | 0.581 ± 0.044 | 0.567 ± 0.013 |
| WSJ | Our Approach (λ=1, KDE) | 0.724 ± 0.011 | 0.686 ± 0.016 |
| WSJ | Our Approach (λ=1, Beta) | 0.730 ± 0.011 | 0.685 ± 0.015 |
| WSJ | Our Approach (λ=0.05, KDE) | 0.533 ± 0.018 | 0.516 ± 0.003 |
| WSJ | Our Approach (λ=0.05, Beta) | 0.527 ± 0.012 | 0.532 ± 0.011 |
| Reuters | Thomson Reuters | 0.370 ± 0.033 | 0.339 ± 0.023 |
| Reuters | Neural Network | 0.860 ± 0.050 | 0.855 ± 0.044 |
| Reuters | Our Approach (λ=1, KDE) | 0.887 ± 0.038 | 0.881 ± 0.036 |
| Reuters | Our Approach (λ=1, Beta) | 0.887 ± 0.045 | 0.872 ± 0.038 |
| Reuters | Our Approach (λ=0.05, KDE) | 0.729 ± 0.060 | 0.799 ± 0.036 |
| Reuters | Our Approach (λ=0.05, Beta) | 0.721 ± 0.070 | 0.695 ± 0.054 |
| Chinese | Thomson Reuters | 0.167 ± 0.005 | 0.345 ± 0.019 |
| Chinese | Neural Network | 0.508 ± 0.057 | 0.668 ± 0.058 |
| Chinese | Our Approach (λ=1, KDE) | 0.525 ± 0.011 | 0.702 ± 0.047 |
| Chinese | Our Approach (λ=1, Beta) | 0.528 ± 0.016 | 0.696 ± 0.042 |
| Chinese | Our Approach (λ=0.05, KDE) | 0.433 ± 0.013 | 0.656 ± 0.040 |
| Chinese | Our Approach (λ=0.05, Beta) | 0.445 ± 0.012 | 0.639 ± 0.044 |

Table 2: The accuracy of verb selection measured by MRR (mean ± std) via 5-fold cross-validation.

The parameter λ of our model controls the strength of smoothing over the prior probability (see Section 4.1) and thus dictates the trade-off between accuracy and diversity. If we focus on the accuracy only and ignore the diversity, the optimal value of λ should just be 1 (i.e., no smoothing). In order to strike a healthy balance between accuracy and diversity, we carried out a line search for the value of λ from 0 to 1 with step size 0.05 using the development set. It turned out that the smoothing effect upon diversity only becomes noticeable when λ ≤ 0.1, so we further conducted a line search from 0 to 0.1 with step size 0.01, and found that λ = 0.05 consistently yields good performance on different corpora. Actually, this phenomenon should not be very surprising, given the Zipfian distribution of verbs, which is highly skewed (see Fig. 1). Our observations in the experiments still indicate that smoothing with a non-zero λ worked better than setting λ = 0. That is to say, it would not be wise to go to extremes and ignore the prior entirely, which would unnecessarily harm the accuracy. An alternative smoothing solution for mitigating the severe skewness of the empirical prior that we also considered was to make the smoothed prior probability proportional to the logarithm of the raw prior probability, but we did not take that route as (i) we could not find a good principled interpretation for such a trick and (ii) using a small λ value like 0.05 seemed to work sufficiently well. It will be shown later that sampling verbs from the posterior probability distribution, rather than just using the one with the maximum probability, helps to alleviate the problem of prior skewness and thus prevents verb selection from being dominated by the most popular verbs.

It can be observed from the experimental results that smoothing (see Section 4.1) does reduce the accuracy of verb selection: the MRR scores with λ = 0.05 are lower than those with λ = 1. Nevertheless, as we shall soon see, strong smoothing is crucially important for achieving a good level of diversity. Furthermore, there seemed to be little performance difference between using the KDE technique or the Beta distribution to fit the likelihood function in our approach. This suggests that the latter is preferable, because it is as effective as the former but much more efficient. Therefore, in the remaining part of this paper, we shall focus on this specific version of our model (with λ = 0.05, Beta) even though it may not be the most accurate.

The MRR scores achieved by our approach are around 0.4–0.8, which implies that, on average, the first or the second verb selected by our approach would be the "correct" verb used by the human author.
Across all three corpora, our proposed probabilistic model, whether smoothed or not and whether it uses the KDE technique or the Beta distribution, outperforms the Thomson Reuters baseline by a large margin in terms of MRR. According to the Wilcoxon signed-rank test (Wilcoxon, 1945; Kerby, 2014), the performance improvements brought by our approach over the Thomson Reuters baseline are statistically significant, with (two-sided) p-values ≪ 0.0001 on the two English corpora and = 0.0027 on the Chinese corpus. With respect to the Neural Network baseline, on all three corpora its accuracy is slightly better than that of our smoothed model (λ = 0.05), though it still could not beat our original unsmoothed model (λ = 1). The major problem with the Neural Network baseline is that, similar to the probabilistic model without smoothing, its verb choices concentrate on the most frequent verbs and thus have very poor diversity. A prominent advantage of our proposed probabilistic model, in comparison with discriminative learning algorithms such as the Neural Network baseline, is that we are able to explicitly control the trade-off between accuracy and diversity by adjusting the strength of smoothing.

It is worth emphasizing that the accuracy of a verb selection method only reflects its ability to imitate how writers (journalists) use verbs, which is not necessarily the same as how readers interpret the verbs. Usually the ultimate goal of an NLG system is to successfully communicate information to readers. Previous research in NLG and psychology suggests that there is wide variation in how different people interpret verbs and words in general, and that this variation is probably much larger in the general population than amongst journalists. Specifically, the MRR metric would probably underestimate the effectiveness of a verb selection method, since a verb different from the one really used by the writer is not necessarily a less appropriate choice for the corresponding percentage change from the reader's perspective.

Diversity: Other than the accuracy of reproducing the verb choices made by human authors, verb selection methods can also be automatically evaluated in terms of diversity. Following Kingrani et al. (2015), we borrow diversity measures from ecology (Magurran, 1988) to quantitatively analyze the diversity of verb choices: each specific verb is considered a particular species. When measuring the biological diversity of a habitat, it is important to consider not only the number of distinct species present but also the relative abundance of each species. In the literature of ecology, the former is called richness and the latter is called evenness. Here we utilize the well-known Inverse Simpson Index, aka Simpson's Reciprocal Index (Simpson, 1949), which takes both richness and evenness into account:

    D = (Σ_{i=1}^{R} p_i²)^(−1),

where R is the total number of distinct species (i.e., the richness), and p_i is the proportion of the individuals belonging to the i-th species relative to the entire population. The evenness is given by the value of diversity normalized to the range between 0 and 1, so it can be calculated as D/R.
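A small self-contained sketch of these diversity measures (our own helper, directly mirroring the definitions above):

```python
from collections import Counter

def inverse_simpson(verbs):
    """Return (diversity D, richness R, evenness D/R) for a list of
    selected verbs, where D = 1 / sum_i p_i^2 over the distinct verbs."""
    counts = Counter(verbs)
    n = sum(counts.values())
    diversity = 1.0 / sum((c / n) ** 2 for c in counts.values())
    richness = len(counts)
    return diversity, richness, diversity / richness

print(inverse_simpson(["rise", "rise", "climb", "jump", "rise", "gain"]))
# (3.0, 4, 0.75): three of the six picks are "rise", so D is well below R
```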
Table 3 shows the diversity scores of the verb choices made by our approach and the Thomson Reuters baseline for 450 randomly sampled percentage changes (see Section 5.4). Overall, in terms of diversity, our approach loses to Thomson Reuters. The Neural Network baseline is omitted here because its diversity scores were very low.

| corpus | method | upward: richness | upward: evenness | upward: diversity | downward: richness | downward: evenness | downward: diversity |
|---|---|---|---|---|---|---|---|
| WSJ | Our Approach | 5 | 0.6324 | 3.162 | 5 | 0.4698 | 2.349 |
| WSJ | Thomson Reuters | 11 | 0.8771 | 9.648 | 14 | 0.6821 | 9.550 |
| Reuters | Our Approach | 3 | 0.7520 | 2.256 | 3 | 0.5933 | 1.780 |
| Reuters | Thomson Reuters | 4 | 0.6453 | 2.581 | 4 | 0.5720 | 2.288 |
| Chinese | Our Approach | 6 | 0.7965 | 4.779 | 4 | 0.5265 | 2.106 |
| Chinese | Thomson Reuters | 14 | 0.5831 | 8.164 | 4 | 0.7150 | 2.860 |

Table 3: The diversity of verb selection measured by the Inverse Simpson Index.

Discussion: Figs. 5 and 6 show the confusion matrices of our approach (λ = 0.05, Beta) on the WSJ corpus as (row-normalized) heatmaps: in the former we choose the verb with the highest posterior probability (argmax), while in the latter we sample the verb from the posterior probability distribution (see Section 4). The argmax way is dominated by a few verbs (e.g., "rise", "soar", "fall", and "plummet"). In contrast, random sampling leads to a much wider variety of verbs. The experimental results of all verb selection methods reported in this paper are generated by the sampling strategy, if not indicated otherwise. It can be seen from Fig. 6 that the verbs "soar" and "plunge" are the easiest to predict. Generally speaking, the prediction of verbs is relatively more accurate for bigger percentage changes, whether upward or downward. This is probably because there are fewer verbs available to describe such radical percentage changes (see Fig. 2) and thus the model faces less uncertainty. Most misclassification (confusion) happens when a verb is incorrectly predicted to be the most frequent one ("rise" or "fall").

[Figure 5: The confusion matrix heatmap of our approach on the WSJ corpus, choosing the verb with the highest posterior probability: (a) upward verbs and (b) downward verbs.]

[Figure 6: The confusion matrix heatmap of our approach on the WSJ corpus, sampling the verb from the posterior probability distribution: (a) upward verbs and (b) downward verbs.]

5.4 Human Evaluation
The two aspects, accuracy and diversity, are both important for the task of verb selection. Although we have shown that automatic evaluation can be carried out for either accuracy or diversity alone, there is no obvious way to assess the overall effectiveness of a verb selection method using machines only. The ultimate judgment on the quality of verb selection has to come from human assessors (Mellish and Dale, 1998; Reiter and Belz, 2009; Smiley et al., 2016).

To manually compare our approach (the version with λ = 0.05, Beta) with a baseline method (Thomson Reuters or Neural Network), we conducted a questionnaire survey with 450 multiple-choice questions. In each question, a respondent would see a pair of generated sentences describing the same percentage change with the verbs selected by two different methods respectively, and would need to judge which one sounds better than the other (or whether it is hard to tell). For example, a respondent could be shown the following pair of generated sentences:

(1) Net profit declines 3%
(2) Net profit plummets 3%

and they were then supposed to choose one of the three following options as their answer:

[a] Sentence (1) sounds better.
[b] Sentence (2) sounds better.
[c] They are equally good.

The respondents were blinded to whether the first verb or the second verb was provided by our proposed method, as their order of appearance had been randomized in advance. The questionnaire survey system withheld the information about the source of each verb until the answers from all respondents had been collected, and then it counted how many times the verb selected by our proposed method was deemed better than (>), worse than (<), or as good as (≈) the verb selected by the baseline method.

For each corpus, we produced 150 different questions, of which half were about upward verbs and half were about downward verbs. As explained above, each question compares a pair of generated sentences describing the same percentage change with different verbs. The sentence generation process is the same as that used by Smiley et al. (2016). The subjects were randomly picked from the most popular ones in the corpus (e.g., "gross domestic product"), and the percentage changes (as the objects) were randomly sampled from the corpus as well. Each of the two verb selection methods in comparison would provide one verb (as the predicate) for describing that specific percentage change. Note that in this sentence generation process, a pair of sentences was retained only if the verbs selected by the two methods were different, as it would be meaningless to compare two identical sentences.

A total of 15 college-educated people participated in the questionnaire survey. They are all bilingual, i.e., native or fluent speakers of both English and Chinese. Each person was given 30 questions: 10 questions (including 5 upward and 5 downward ones) from each corpus. We (the authors of this paper) were excluded from participating in the questionnaire survey to avoid any conscious or unconscious bias.

The results of the human evaluation are shown in Table 4.

| corpus | verbs | vs Thomson Reuters: > | < | ≈ | p-value | vs Neural Network: > | < | ≈ | p-value |
|---|---|---|---|---|---|---|---|---|---|
| WSJ | upward | 43 | 32 | 0 | 0.2480 | 53 | 22 | 0 | 0.0004 |
| WSJ | downward | 44 | 28 | 3 | 0.0764 | 42 | 32 | 1 | 0.2954 |
| WSJ | both | 87 | 60 | 3 | 0.0316 | 95 | 54 | 1 | 0.0010 |
| Reuters | upward | 37 | 28 | 10 | 0.3211 | 43 | 24 | 8 | 0.0271 |
| Reuters | downward | 39 | 31 | 5 | 0.4030 | 50 | 23 | 2 | 0.0021 |
| Reuters | both | 76 | 59 | 15 | 0.1683 | 93 | 47 | 10 | 0.0001 |
| Chinese | upward | 42 | 30 | 3 | 0.1945 | 65 | 9 | 1 | ≪0.0001 |
| Chinese | downward | 29 | 37 | 9 | 0.3891 | 37 | 34 | 4 | 0.8126 |
| Chinese | both | 71 | 67 | 12 | 0.7985 | 102 | 43 | 5 | ≪0.0001 |
| All | both | 234 | 186 | 30 | 0.0217 | 290 | 144 | 16 | ≪0.0001 |

Table 4: The results of the human evaluation, where the ">" / "<" / "≈" counts are for our approach's verb versus the baseline's, and the p-values are given by the sign test (two-sided).
Altogether, respondents preferred the verb selected by our approach 234/450 = 52% of the time, as opposed to 186/450 = 41% for the Thomson Reuters baseline; respondents preferred the verb selected by our approach 290/450 = 64% of the time, as opposed to 144/450 = 32% for the Neural Network baseline. According to the sign test (Wackerly et al., 2007), our approach works significantly better than the two baseline methods, Thomson Reuters and Neural Network: overall, the (two-sided) p-values are less than 0.05.

Discussion: Our approach exhibits greater superiority over the Thomson Reuters baseline on the English datasets than on the Chinese dataset. Since the Chinese dataset is bigger than the Reuters dataset, though smaller than the WSJ dataset, the performance difference is not caused by corpus size but by language characteristics. Remember that for Chinese we are actually predicting adverb+verb combinations (see Section 3.3). Retrospective manual inspection of the experimental results suggests that users seem to have relatively higher expectations of diversity for Chinese adverbs than for English verbs.

6 Extensions

Robustness: It is still possible, though very unlikely, for the proposed probabilistic model to generate atypical uses of a verb. A simple measure to avoid such situations is to reject the sampled verb w* if the posterior probability P(w*|x) < τ, where τ is a predefined threshold, e.g., 5%, and then resample w* until P(w*|x) ≥ τ.
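This rejection step is equivalent to truncating the posterior at τ and renormalizing before a single draw; the sketch below (our own helper, reusing the names from the earlier snippets) implements it that way.

```python
import numpy as np

def sample_verb_robust(x, prior, likelihoods, tau=0.05, rng=None):
    """Resampling w* until P(w*|x) >= tau, implemented equivalently by
    dropping low-posterior verbs and renormalizing before one draw."""
    rng = rng or np.random.default_rng()
    verbs = np.array(list(prior))
    scores = np.array([prior[w] * likelihoods[w](x)[0] for w in verbs])
    posterior = scores / scores.sum()
    keep = posterior >= tau
    if not keep.any():            # degenerate case: keep everything
        keep[:] = True
    p = posterior[keep] / posterior[keep].sum()
    return rng.choice(verbs[keep], p=p)
```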
Unlimited Range: If the magnitude of a percentage change is allowed to go beyond 100%, we would no longer be able to use the Beta distribution to fit the likelihood function P(x|w), as it is supported on a bounded interval. However, it should be straightforward to use a flexible probability distribution supported on the semi-infinite interval [0, +∞), such as the Gamma distribution.

Subject: The context, in particular the subject of the percentage change, has not been taken into account by the presented models. As illustrated by the two example sentences below, the same verb ("surge") can be used for quite different percentage changes ("181%" vs "8%") depending on the subject ("wheat price" vs "inflation").

• "According to World Bank figures, wheat prices have surged up by 181 percent in the past three years to February 2008."
• "While inflation has surged to almost 8% in 2008, it is projected by the Commission to fall in 2009."

Furthermore, the significance of a percentage change often depends on the domain, and consequently, so does the most appropriate verb to describe a percentage change. For example, a 10% increase in a stock price is interesting, while a 10% increase in body temperature is life-threatening. It is, of course, possible to incorporate the subject information into our probabilistic model by extending Eq. (2) to P(w|x, s) = P(w, s) P(x|w, s) / P(x, s), where s is the subject word in the triple. On the one hand, this should make the model more effective, for the reasons explained above. On the other hand, this would require a lot more data for reliable estimation of the model parameters, which is one of the reasons why we leave it for future work.

Language Modeling: Thanks to its probabilistic nature, our proposed model for verb selection could be seamlessly plugged into an n-gram statistical language model (Jurafsky and Martin, 2009), e.g., for the MSR Sentence Completion Challenge (https://goo.gl/yyKBYa). This might be able to reduce the language model's perplexity, as the probability of ⟨subject, verb, percentage⟩ triples could be calculated more precisely.

Hierarchical Modeling: The choice of verb to describe a particular percentage change could be affected by the style of the author, the topic of the document, and other contextual factors. To take those dimensions into account and build a finer probabilistic model for verb selection, we could embrace Bayesian hierarchical modeling (Gelman et al., 2013; Kruschke, 2014), which, for example, could let each author's model borrow "statistical power" from other authors'.

Psychology: There exist many studies in psychology on how people interpret probabilities and risks (Reagan et al., 1989; Berry et al., 2004). They could provide useful insights for further enhancing our verb selection method.

7 Conclusions

The major research contribution of this paper is a probabilistic model that can select appropriate verbs to express percentage changes with different directions and magnitudes. This model does not rely on hard-wired heuristics, but is learned from training examples (in the form of verb-percentage pairs) that are extracted from large-scale real-world news corpora. The choices of verbs made by the proposed model are found to match our intuitions about how different verbs are collocated with percentage changes of different sizes. The real challenge here is to strike the right balance between accuracy and diversity, which can be realized via smoothing. Our experiments have confirmed that the proposed model can capture human authors' pattern of usage around verbs better than the existing method currently employed by Thomson Reuters Eikon™. We hope that this probabilistic model for verb selection can help data-to-text NLG systems achieve greater variation and naturalness.

Acknowledgments

The research is partly funded by the National Key R&D Program of China (ID: 2017YFC0803700) and the NSFC grant (No. 61532021). The Titan X Pascal GPU used for our experiments was kindly donated by the NVIDIA Corporation. Prof Xuanjing Huang (Fudan) has helped with the datasets. We thank the anonymous reviewers and the action editor for their constructive and helpful comments. We also gratefully acknowledge the support of Geek.AI for this work.

References

Lada A. Adamic. 2000. Zipf, power-laws, and Pareto: A ranking tutorial. Technical report, HP Labs.

Gabor Angeli, Percy Liang, and Dan Klein. 2010. A simple domain-independent probabilistic approach to generation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 502–512.

Anja Belz and Eric Kow. 2009. System building cost vs. output quality in data-to-text generation. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG), pages 16–24.

Anja Belz. 2008. Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models. Natural Language Engineering (NLE), 14(4):431–455.

Dianne Berry, Theo Raynor, Peter Knapp, and Elisabetta Bersellini. 2004. Over the counter medicines and the need for immediate action: A further evaluation of European Commission recommended wordings for communicating risk. Patient Education and Counseling, 53(2):129–134.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.

Sarah Boyd. 1998. TREND: A system for generating intelligent descriptions of time series data. In Proceedings of the 2nd IEEE International Conference on Intelligent Processing Systems (ICIPS).

Eugene Charniak, Don Blaheta, Niyu Ge, Keith Hall, John Hale, and Mark Johnson. 2000. BLLIP 1987-89 WSJ Corpus Release 1 LDC2000T43. Web Download. Philadelphia: Linguistic Data Consortium.

Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. 2013. Bayesian Data Analysis. CRC, 3rd edition.

Eli Goldberg, Norbert Driedger, and Richard I. Kittredge. 1994. Using natural-language processing to produce weather forecasts. IEEE Expert, 9(2):45–53.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.

Richard H. R. Hahnloser, Rahul Sarpeshkar, Misha A. Mahowald, Rodney J. Douglas, and H. Sebastian Seung. 2000. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947–951.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.

Frederick Jelinek and Robert Mercer. 1980. Interpolated estimation of Markov source parameters from sparse data, pages 381–402. North-Holland Publishing.
Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall, 2nd edition.

Dave S. Kerby. 2014. The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3(1).

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Suneel Kumar Kingrani, Mark Levene, and Dell Zhang. 2015. Diversity analysis of web search results. In Proceedings of the Annual International ACM Web Science Conference (WebSci).

Ioannis Konstas and Mirella Lapata. 2013. A global model for concept-to-text generation. Journal of Artificial Intelligence Research (JAIR), 48:305–346.

Manfred Krifka. 2007. Approximate interpretations of number words: A case for strategic communication. In Cognitive Foundations of Interpretation, pages 111–126.

John K. Kruschke. 2014. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2nd edition.

Anne E. Magurran. 1988. Ecological Diversity and Its Measurement. Princeton University Press.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), System Demonstrations, pages 55–60.

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 720–730.

Chris Mellish and Robert Dale. 1998. Evaluation in the context of natural language generation. Computer Speech & Language, 12(4):349–373.

Priscilla Moraes, Kathleen McCoy, and Sandra Carberry. 2014. Adapting graph summaries to the users' reading levels. In Proceedings of the 8th International Natural Language Generation Conference (INLG), pages 64–73.

Mark E. J. Newman. 2005. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5):323–351.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 311–318.

Vassilis Plachouras, Charese Smiley, Hiroko Bretz, Ola Taylor, Jochen L. Leidner, Dezhao Song, and Frank Schilder. 2016. Interacting with financial data using natural language. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 1121–1124.

David M. W. Powers. 1998. Applications and explanations of Zipf's law. In Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning (NeMLaP/CoNLL), pages 151–160.

Dragomir R. Radev, Hong Qi, Harris Wu, and Weiguo Fan. 2002. Evaluating web-based question answering systems. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC).

Alejandro Ramos-Soto, Alberto Bugarín, Senén Barro, and Juan Taboada. 2013. Automatic generation of textual short-term weather forecasts on real prediction data. In Proceedings of the 10th International Conference on Flexible Query Answering Systems (FQAS), pages 269–280.

Josef Raviv. 1967. Decision making in Markov chains applied to the problem of pattern recognition. IEEE Transactions on Information Theory, 13(4):536–551.
Robert T. Reagan, Frederick Mosteller, and Cleo Youtz. 1989. Quantitative meanings of verbal probability expressions. Journal of Applied Psychology, 74(3):433.

Ehud Reiter and Anja Belz. 2009. An investigation into the validity of some metrics for automatically evaluating natural language generation systems. Computational Linguistics, 35(4):529–558.

Ehud Reiter, Somayajulu Sripada, Jim Hunter, Jin Yu, and Ian Davy. 2005. Choosing words in computer-generated weather forecasts. Artificial Intelligence, 167(1-2):137–169.

Ehud Reiter. 2007. An architecture for data-to-text systems. In Proceedings of the 11th European Workshop on Natural Language Generation (ENLG), pages 97–104.

Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A Modern Approach. Prentice Hall, 3rd edition.

David W. Scott. 2015. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons.

Claude E. Shannon. 1948. A mathematical theory of communication. Bell System Technical Journal, 27:623–656.

Heung-Yeung Shum, Xiaodong He, and Di Li. 2018. From Eliza to XiaoIce: Challenges and opportunities with social chatbots. arXiv preprint arXiv:1801.01957.

Edward H. Simpson. 1949. Measurement of diversity. Nature.

Charese Smiley, Vassilis Plachouras, Frank Schilder, Hiroko Bretz, Jochen L. Leidner, and Dezhao Song. 2016. When to plummet and when to soar: Corpus based verb selection for natural language generation. In Proceedings of the 9th International Natural Language Generation Conference (INLG), pages 36–39.

Somayajulu Sripada, Neil Burnett, Ross Turner, John Mastin, and Dave Evans. 2014. A case study: NLG meeting weather industry demand for quality and quantity of textual weather forecasts. In Proceedings of the 8th International Natural Language Generation Conference (INLG), pages 1–5.

Ellen M. Voorhees. 1999. The TREC-8 question answering track report. In Proceedings of the 8th Text REtrieval Conference (TREC), pages 77–82.

Dennis Wackerly, William Mendenhall, and Richard Scheaffer. 2007. Mathematical Statistics with Applications. Nelson Education.

Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83.

Alex Wright. 2015. Algorithmic authors. Communications of the ACM (CACM), 58(11):12–14.

Chengxiang Zhai and John Lafferty. 2004. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems (TOIS), 22(2):179–214.