Transactions of the Association for Computational Linguistics, vol. 5, pp. 31–44, 2017. Action Editor: Hwee Tou Ng.
Submission batch: 8/2016; Revision batch: 10/2016; Published 1/2017.
© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction

Ashutosh Modi (1,3), Ivan Titov (2,4), Vera Demberg (1,3), Asad Sayeed (1,3), Manfred Pinkal (1,3)
1 {ashutosh,vera,asayeed,pinkal}@coli.uni-saarland.de  2 titov@uva.nl
3 Universität des Saarlandes, Germany
4 ILLC, University of Amsterdam, the Netherlands

Abstract

Recent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the factors that affect human prediction by building a computational model that can predict upcoming discourse referents based on linguistic knowledge alone vs. linguistic knowledge jointly with common-sense knowledge in the form of scripts. We find that script knowledge significantly improves model estimates of human predictions. In a second study, we test the highly controversial hypothesis that predictability influences referring expression type but do not find evidence for such an effect.

1 Introduction

Being able to anticipate upcoming content is a core property of human language processing (Kutas et al., 2011; Kuperberg and Jaeger, 2016) that has received a lot of attention in the psycholinguistic literature in recent years. Expectations about upcoming words help humans comprehend language in noisy settings and deal with ungrammatical input. In this paper, we use a computational model to address the question of how different layers of knowledge (linguistic knowledge as well as common-sense knowledge) influence human anticipation.

Here we focus our attention on semantic predictions of discourse referents for upcoming noun phrases. This task is particularly interesting because it allows us to separate the semantic task of anticipating an intended referent and the processing of the actual surface form. For example, in the context of "I ordered a medium sirloin steak with fries. Later, the waiter brought ...", there is a strong expectation of a specific discourse referent, i.e., the referent introduced by the object NP of the preceding sentence, while the possible referring expression could be either "the steak I had ordered", "the steak", "our food", or "it". Existing models of human prediction are usually formulated using the information-theoretic concept of surprisal. In recent work, however, surprisal is usually not computed for DRs, which represent the relevant semantic unit, but for the surface form of the referring expressions, even though there is an increasing amount of literature suggesting that human expectations at different levels of representation have separable effects on prediction and, as a consequence, that the modelling of only one level (the linguistic surface form) is insufficient (Kuperberg and Jaeger, 2016; Kuperberg, 2016; Zarcone et al., 2016). The present model addresses this shortcoming by explicitly modelling and representing common-sense knowledge and conceptually separating the semantic (discourse referent) and the surface level (referring expression) expectations.

Our discourse referent prediction task is related to the NLP task of coreference resolution, but it substantially differs from that task in the following ways: 1) we use only the incrementally available left context, while coreference resolution uses the full text; 2) coreference resolution tries to identify the DR for a given target NP in context, while we look at the expectations of DRs based only on the context before the target NP is seen.
The distinction between referent prediction and prediction of referring expressions also allows us to study a closely related question in natural language generation: the choice of a type of referring expression based on the predictability of the DR that is intended by the speaker. This part of our work is inspired by a referent guessing experiment by Tily and Piantadosi (2009), who showed that highly predictable referents were more likely to be realized with a pronoun than unpredictable referents, which were more likely to be realized using a full NP. The effect they observe is consistent with a Gricean point of view, or the principle of uniform information density (see Section 5.1). However, Tily and Piantadosi do not provide a computational model for estimating referent predictability. Also, they do not include selectional preference or common-sense knowledge effects in their analysis.

We believe that script knowledge, i.e., common-sense knowledge about everyday event sequences, represents a good starting point for modelling conversational anticipation. This type of common-sense knowledge includes temporal structure which is particularly relevant for anticipation in continuous language processing. Furthermore, our approach can build on progress that has been made in recent years in methods for acquiring large-scale script knowledge; see Section 1.1. Our hypothesis is that script knowledge may be a significant factor in human anticipation of discourse referents. Explicitly modelling this knowledge will thus allow us to produce more human-like predictions.

Script knowledge enables our model to generate anticipations about discourse referents that have already been mentioned in the text, as well as anticipations about textually new discourse referents which have been activated due to script knowledge. By modelling event sequences and event participants, our model captures many more long-range dependencies than normal language models are able to. As an example, consider the following two alternative text passages:

We got seated, and had to wait for 20 minutes. Then, the waiter brought the ...
We ordered, and had to wait for 20 minutes. Then, the waiter brought the ...

Preferred candidate referents for the object position of "the waiter brought the ..." are instances of the food, menu, or bill participant types. In the context of the alternative preceding sentences, there is a strong expectation of instances of a menu and a food participant, respectively.

This paper represents foundational research investigating human language processing. However, it also has the potential for application in assistant technology and embodied agents. The goal is to achieve human-level language comprehension in realistic settings, and in particular to achieve robustness in the face of errors or noise. Explicitly modelling expectations that are driven by common-sense knowledge is an important step in this direction.

In order to be able to investigate the influence of script knowledge on discourse referent expectations, we use a corpus that contains frequent reference to script knowledge, and provides annotations for coreference information, script events and participants (Section 2). In Section 3, we present a large-scale experiment for empirically assessing human expectations on upcoming referents, which allows us to quantify at what points in a text humans have very clear anticipations vs. when they do not. Our goal is to model human expectations, even if they turn out to be incorrect in a specific instance. The experiment was conducted via Mechanical Turk and follows the methodology of Tily and Piantadosi (2009). In Section 4, we describe our computational model that represents script knowledge. The model is trained on the gold standard annotations of the corpus, because we assume that human comprehenders usually will have an analysis of the preceding discourse which closely corresponds to the gold standard. We compare the prediction accuracy of this model to human predictions, as well as to two baseline models, in Section 4.3. One of them uses only structural linguistic features for predicting referents; the other uses general script-independent selectional preference features. In Section 5, we test whether surprisal (as estimated from human guesses vs. computational models) can predict the type of referring expression used in the original texts in the corpus (pronoun vs. full referring expression). This experiment also has wider implications with respect to the on-going discussion of whether the referring expression choice is dependent on predictability, as predicted by the uniform information density hypothesis.
(I)(1)Pbather [decided]Ewash to take a (bath)(2)Pbath yesterday afternoon after working out. Once (I)(1)Pbather got back home, (I)(1)Pbather [walked]Eenter_bathroom to (my)(1)Pbather (bathroom)(3)Pbathroom and first quickly scrubbed the (bathroom tub)(4)Pbathtub by [turning on]Eturn_water_on the (water)(5)Pwater and rinsing (it)(4)Pbathtub clean with a rag. After (I)(1)Pbather finished, (I)(1)Pbather [plugged]Eclose_drain the (tub)(4)Pbathtub and began [filling]Efill_water (it)(4)Pbathtub with warm (water)(5)Pwater set at about 98 (degrees)(6)Ptemperature.

Figure 1: An excerpt from a story in the InScript corpus. The referring expressions are in parentheses, and the corresponding discourse referent label is given by the superscript. Referring expressions of the same discourse referent have the same color and superscript number. Script-relevant events are in square brackets and colored in orange. Event type is indicated by the corresponding subscript.

The contributions of this paper consist of:
• a large dataset of human expectations, in a variety of texts related to every-day activities.
• an implementation of the conceptual distinction between the semantic level of referent prediction and the type of a referring expression.
• a computational model which significantly improves modelling of human anticipations.
• showing that script knowledge is a significant factor in human expectations.
• testing the hypothesis of Tily and Piantadosi that the choice of the type of referring expression (pronoun or full NP) depends on the predictability of the referent.

1.1 Scripts

Scripts represent knowledge about typical event sequences (Schank and Abelson, 1977), for example the sequence of events happening when eating at a restaurant. Script knowledge thereby includes events like order, bring and eat as well as participants of those events, e.g., menu, waiter, food, guest. Existing methods for acquiring script knowledge are based on extracting narrative chains from text (Chambers and Jurafsky, 2008; Chambers and Jurafsky, 2009; Jans et al., 2012; Pichotta and Mooney, 2014; Rudinger et al., 2015; Modi, 2016; Ahrendt and Demberg, 2016) or by eliciting script knowledge via crowdsourcing on Mechanical Turk (Regneri et al., 2010; Frermann et al., 2014; Modi and Titov, 2014).

Modelling anticipated events and participants is motivated by evidence showing that event representations in humans contain information not only about the current event, but also about previous and future states; that is, humans generate anticipations about event sequences during normal language comprehension (Schütz-Bosbach and Prinz, 2007). Script knowledge representations have been shown to be useful in NLP applications for ambiguity resolution during reference resolution (Rahman and Ng, 2012).

2 Data: The InScript Corpus

Ordinary texts, including narratives, encode script structure in a way that is too complex and too implicit at the same time to enable a systematic study of script-based expectation. They contain interleaved references to many different scripts, and they usually refer to single scripts in a point-wise fashion only, relying on the ability of the reader to infer the full event chain using their background knowledge.

We use the InScript corpus (Modi et al., 2016) to study the predictive effect of script knowledge. InScript is a crowdsourced corpus of simple narrative texts. Participants were asked to write about a specific activity (e.g., a restaurant visit, a bus ride, or a grocery shopping event) which they personally experienced, and they were instructed to tell the story as if explaining the activity to a child. This resulted in stories that are centered around a specific scenario and that explicitly mention mundane details. Thus, they generally realize longer event chains associated with a single script, which makes them particularly appropriate to our purpose.

The InScript corpus is labelled with event-type, participant-type, and coreference information. Full verbs are labeled with event type information, heads of all noun phrases with participant types, using scenario-specific lists of event types (such as enter bathroom, close drain and fill water for the "taking a bath" scenario) and participant types (such as bather, water and bathtub). On average, each template offers a choice of 20 event types and 18 participant types.
(I)(1) decided to take a (bath)(2) yesterday afternoon after working out. Once (I)(1) got back home, (I)(1) walked to (my)(1) (bathroom)(3) and first quickly scrubbed the (bathroom tub)(4) by turning on the (water)(5) and rinsing (it)(4) clean with a rag. After (I)(1) finished, (I)(1) plugged XXXXXX

Figure 2: An illustration of the Mechanical Turk experiment for the referent cloze task. Workers are supposed to guess the upcoming referent (indicated by XXXXXX above). They can either choose from the previously activated referents, or they can write something new.

Figure 3: Response of workers corresponding to the story in Fig. 2. Workers guessed two already activated discourse referents (DRs), DR_4 (P_bathtub) and DR_1 (P_bather). Some of the workers also chose the "new" option and wrote different lexical variants of "bathtub drain", a new DR corresponding to the participant type "the drain".

The InScript corpus consists of 910 stories addressing 10 scenarios (about 90 stories per scenario). The corpus has 200,000 words, 12,000 verb instances with event labels, and 44,000 head nouns with participant instances. Modi et al. (2016) report an inter-annotator agreement of 0.64 for event types and 0.77 for participant types (Fleiss' kappa).

We use gold-standard event- and participant-type annotation to study the influence of script knowledge on the expectation of discourse referents. In addition, InScript provides coreference annotation, which makes it possible to keep track of the mentioned discourse referents at each point in the story. We use this information in the computational model of DR prediction and in the DR guessing experiment described in the next section. An example of an annotated InScript story is shown in Figure 1.

3 Referent Cloze Task

We use the InScript corpus to develop computational models for the prediction of discourse referents (DRs) and to evaluate their prediction accuracy. This can be done by testing how often our models manage to reproduce the original discourse referent (cf. also the "narrative cloze" task of Chambers and Jurafsky (2008), which tests whether a verb together with a role can be correctly guessed by a model). However, we do not only want to predict the "correct" DRs in a text but also to model human expectation of DRs in context. To empirically assess human expectation, we created an additional database of crowdsourced human predictions of discourse referents in context using Amazon Mechanical Turk. The design of our experiment closely resembles the guessing game of Tily and Piantadosi (2009) but extends it in a substantial way.

Workers had to read stories of the InScript corpus[1] and guess upcoming participants: for each target NP, workers were shown the story up to this NP excluding the NP itself, and they were asked to guess the next person or object most likely to be referred to. In case they decided in favour of a discourse referent already mentioned, they had to choose among the available discourse referents by clicking an NP in the preceding text, i.e., some noun with a specific, coreference-indicating color; see Figure 2. Otherwise, they would click the "New" button, and would in turn be asked to give a short description of the new person or object they expected to be mentioned. The percentage of guesses that agree with the actually referred entity was taken as a basis for estimating the surprisal.

The experiment was done for all stories of the test set: 182 stories (20%) of the InScript corpus, evenly taken from all scenarios. Since our focus is on the effect of script knowledge, we only considered those NPs as targets that are direct dependents of script-related events. Guessing started from the third sentence only, in order to ensure that a minimum of context information was available. To keep the complexity of the context manageable, we restricted guessing to a maximum of 30 targets and skipped the rest of the story (this applied to 12% of the stories). We collected 20 guesses per NP for 3346 noun phrase instances, which amounts to a total of around 67K guesses.

[1] The corpus is available at: http://www.sfb1102.uni-saarland.de/?page_id=2582
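The guess counts collected this way translate directly into probability and surprisal estimates. A minimal sketch, with guess counts invented for illustration in the spirit of Figure 3:

```python
import math
from collections import Counter

# 20 worker guesses for one target NP (hypothetical data)
guesses = ["DR_4"] * 14 + ["new:the drain"] * 5 + ["DR_1"] * 1

counts = Counter(guesses)
total = sum(counts.values())

# Unsmoothed probability of each guessed referent
probs = {dr: n / total for dr, n in counts.items()}

# Surprisal (in bits) of the referent actually used in the story
actual = "DR_4"
surprisal = -math.log2(probs[actual])
print(probs[actual], round(surprisal, 3))
```

The probabilities are unsmoothed relative frequencies, matching the paper's later use of "unsmoothed normalized guess frequencies" for the human perplexity estimates.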
Workers selected a context NP in 68% of cases and "New" in 32% of cases.

Our leading hypothesis is that script knowledge substantially influences human expectation of discourse referents. The guessing experiment provides a basis to estimate human expectation of already mentioned DRs (the number of clicks on the respective NPs in text). However, we expect that script knowledge has a particularly strong influence in the case of first mentions. Once a script is evoked in a text, we assume that the full script structure, including all participants, is activated and available to the reader.

Tily and Piantadosi (2009) are interested in second mentions only and therefore do not make use of the worker-generated noun phrases classified as "New". To study the effect of activated but not explicitly mentioned participants, we carried out a subsequent annotation step on the worker-generated noun phrases classified as "New". We presented annotators with these noun phrases in their contexts (with co-referring NPs marked by color, as in the M-Turk experiment) and, in addition, displayed all participant types of the relevant script (i.e., the script associated with the text in the InScript corpus). Annotators did not see the "correct" target NP. We asked annotators to either (1) select the participant type instantiated by the NP (if any), (2) label the NP as unrelated to the script, or (3) link the NP to an overt antecedent in the text, in the case that the NP is actually a second mention that had been erroneously labeled as new by the worker. Option (1) provides a basis for a fine-grained estimation of first-mention DRs. Option (3), which we added when we noticed the considerable number of overlooked antecedents, serves as a correction of the results of the M-Turk experiment. Out of the 22K annotated "New" cases, 39% were identified as second mentions, 55% were linked to a participant type, and 6% were classified as really novel.

4 Referent Prediction Model

In this section, we describe the model we use to predict upcoming discourse referents (DRs).

4.1 Model

Our model should not only assign probabilities to DRs already explicitly introduced in the preceding text fragment (e.g., "bath" or "bathroom" for the cloze task in Figure 2) but also reserve some probability mass for 'new' DRs, i.e., DRs activated via the script context or completely novel ones not belonging to the script. In principle, different variants of the activation mechanism must be distinguished. For many participant types, a single participant belonging to a specific semantic class is expected (referred to with "the bathtub" or "the soap"). In contrast, the "towel" participant type may activate a set of objects, elements of which then can be referred to with "a towel" or "another towel". The "bath means" participant type may even activate a group of DRs belonging to different semantic classes (e.g., "bubble bath" and "salts"). Since it is not feasible to enumerate all potential participants, for 'new' DRs we only predict their participant type ("bath means" in our example). In other words, the number of categories in our model is equal to the number of previously introduced DRs, plus the number of participant types of the script, plus 1, reserved for a new DR not corresponding to any script participant (e.g., "cell phone"). In what follows, we slightly abuse the terminology and refer to all these categories as discourse referents.

Unlike standard co-reference models, which predict co-reference chains relying on the entire document, our model is incremental; that is, when predicting a discourse referent d(t) at a given position t, it can look only in the history h(t) (i.e., the preceding part of the document), excluding the referring expression (RE) for the predicted DR. We also assume that past REs are correctly resolved and assigned to correct participant types (PTs). Typical NLP applications use automatic coreference resolution systems, but since we want to model human behavior, this might be inappropriate, since an automated system would underestimate human performance. This may be a strong assumption, but for reasons explained above, we use gold standard past REs.

We use the following log-linear model ("softmax regression"):

    p(d(t) = d | h(t)) = exp(w^T f(d, h(t))) / Σ_{d'} exp(w^T f(d', h(t))),

where f is the feature function we will discuss in the following subsection, w are model parameters, and the summation in the denominator is over the set of categories described above.
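This scoring scheme can be sketched in a few lines; the feature values, weights, and category names below are invented for illustration (the actual feature set is described in Section 4.2):

```python
import numpy as np

def referent_probabilities(weights, features_by_dr):
    """Softmax regression over candidate discourse referents.

    features_by_dr maps each candidate category (previously introduced
    DRs, script participant types, and a 'new' category) to its
    feature vector f(d, h_t)."""
    names = list(features_by_dr)
    scores = np.array([weights @ features_by_dr[d] for d in names])
    scores -= scores.max()            # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return dict(zip(names, probs))

# Hypothetical candidates at one position of a "taking a bath" story
w = np.array([0.8, 1.2, 0.5])
f = {
    "DR_bathtub":     np.array([1.0, 0.9, 0.2]),
    "DR_bather":      np.array([0.4, 0.1, 0.8]),
    "new:bath_means": np.array([0.0, 0.3, 0.1]),
}
p = referent_probabilities(w, f)
assert abs(sum(p.values()) - 1.0) < 1e-9
```

The max-shift before exponentiation does not change the resulting distribution but avoids overflow for large scores.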
Feature                  Type
Recency                  Shallow Linguistic
Frequency                Shallow Linguistic
Grammatical function     Shallow Linguistic
Previous subject         Shallow Linguistic
Previous object          Shallow Linguistic
Previous RE type         Shallow Linguistic
Selectional preferences  Linguistic
Participant type fit     Script
Predicate schemas        Script

Table 1: Summary of feature types

Some of the features included in f are a function of the predicate syntactically governing the unobservable target RE (corresponding to the DR being predicted). However, in our incremental setting, the predicate is not available in the history h(t) for subject NPs. In this case, we use an additional probabilistic model, which estimates the probability of the predicate v given the context h(t), and marginalize out its predictions:

    p(d(t) = d | h(t)) = Σ_v p(v | h(t)) · exp(w^T f(d, h(t), v)) / Σ_{d'} exp(w^T f(d', h(t), v))

The predicate probabilities p(v | h(t)) are computed based on the sequence of preceding predicates (i.e., ignoring any other words) using a recurrent neural network language model estimated on our training set.[2] The expression f(d, h(t), v) denotes the feature function computed for the referent d, given the history composed of h(t) and the predicate v.

4.2 Features

Our features encode properties of a DR as well as characterize its compatibility with the context. We face two challenges when designing our features. First, although the sizes of our datasets are respectable from the script annotation perspective, they are too small to learn a richly parameterized model. For many of our features, we address this challenge by using external word embeddings[3] and associate parameters with some simple similarity measures computed using these embeddings. Consequently, there are only a few dozen parameters which need to be estimated from scenario-specific data. Second, in order to test our hypothesis that script information is beneficial for the DR prediction task, we need to disentangle the influence of script information from general linguistic knowledge. We address this by carefully splitting the features apart, even if it prevents us from modeling some interplay between the sources of information. We will describe both classes of features below; also see a summary in Table 1.

[2] We used the RNNLM toolkit (Mikolov et al., 2011; Mikolov et al., 2010) with default settings.
[3] We use 300-dimensional word embeddings estimated on Wikipedia with the skip-gram model of Mikolov et al. (2013): https://code.google.com/p/word2vec/

4.2.1 Shallow Linguistic Features

These features are based on Tily and Piantadosi (2009). In addition, we consider a selectional preference feature.

Recency feature. This feature captures the distance l_t(d) between the position t and the last occurrence of the candidate DR d. As a distance measure, we use the number of sentences from the last mention and exponentiate this number to make the dependence more extreme; only very recent DRs will receive a noticeable weight: exp(−l_t(d)). This feature is set to 0 for new DRs.

Frequency. The frequency feature indicates the number of times the candidate discourse referent d has been mentioned so far. We do not perform any bucketing.

Grammatical function. This feature encodes the dependency relation assigned to the head word of the last mention of the DR, or a special "none" label if the DR is new.

Previous subject indicator. This binary feature indicates whether the candidate DR d is coreferential with the subject of the previous verbal predicate.

Previous object indicator. The same, but for the object position.

Previous RE type. This three-valued feature indicates whether the previous mention of the candidate DR d is a pronoun, a non-pronominal noun phrase, or has never been observed before.

4.2.2 Selectional Preferences Feature

The selectional preference feature captures how well the candidate DR d fits a given syntactic position r of a given verbal predicate v. It is computed as the cosine similarity sim_cos(x_d, x_{v,r}) of a vector-space representation of the DR x_d and a structured vector-space representation of the predicate x_{v,r}.
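A minimal sketch of this similarity computation, with x_d averaged over the head words of the candidate DR's previous mentions. The toy vectors are invented for illustration; the paper derives the actual representations from a Distributional Memory model:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def selectional_preference(emb, mention_heads, x_vr):
    """sim_cos(x_d, x_{v,r}): x_d is the average embedding of the
    head words of all previous mentions of the candidate DR."""
    x_d = np.mean([emb[w] for w in mention_heads], axis=0)
    return cosine(x_d, x_vr)

# Toy embeddings for two head words of the same DR
emb = {
    "tub":     np.array([0.9, 0.1, 0.2]),
    "bathtub": np.array([0.8, 0.2, 0.1]),
}
# Hypothetical prototype vector for the object slot of "rinse"
x_rinse_obj = np.array([0.7, 0.3, 0.2])
score = selectional_preference(emb, ["tub", "bathtub"], x_rinse_obj)
assert 0.9 < score <= 1.0   # the averaged DR vector fits this slot well
```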
The similarities are calculated using a Distributional Memory approach similar to that of Baroni and Lenci (2010). Their structured vector space representation has been shown to work well on tasks that evaluate correlation with human thematic fit estimates (Baroni and Lenci, 2010; Baroni et al., 2014; Sayeed et al., 2016) and is thus suited to our task.

The representation x_d is computed as an average of head word representations of all the previous mentions of DR d, where the word vectors are obtained from the TypeDM model of Baroni and Lenci (2010). This is a count-based, third-order co-occurrence tensor whose indices are a word w0, a second word w1, and a complex syntactic relation r, which is used as a stand-in for a semantic link. The values for each (w0, r, w1) cell of the tensor are the local mutual information (LMI) estimates obtained from a dependency-parsed combination of large corpora (ukWaC, BNC, and Wikipedia).

Our procedure has some differences from that of Baroni and Lenci. For example, for estimating the fit of an alternative new DR (in other words, x_d based on no previous mentions), we use an average over head words of all REs in the training set, a "null referent." x_{v,r} is calculated as the average of the top 20 (by LMI) r-fillers for v in TypeDM; in other words, the prototypical instrument of "rub" may be represented by summing vectors like towel, soap, eraser, coin ... If the predicate has not yet been encountered (as for subject positions), scores for all scenario-relevant verbs are emitted for marginalization.

4.2.3 Script Features

In this section, we describe features which rely on script information. Our goal will be to show that such common-sense information is beneficial in performing DR prediction. We consider only two script features.

Participant type fit

This feature characterizes how well the participant type (PT) of the candidate DR d fits a specific syntactic role r of the governing predicate v; it can be regarded as a generalization of the selectional preference feature to participant types, and also as its specialisation to the considered scenario. Given the candidate DR d, its participant type p, and the syntactic relation r, we collect all the predicates in the training set which have the participant type p in the position r. The embedding of the DR, x_{p,r}, is given by the average embedding of these predicates. The feature is computed as the dot product of x_{p,r} and the word embedding of the predicate v.

(I)(1) decided to take a (bath)(2) yesterday afternoon after working out. (I)(1) was getting ready to go out and needed to get cleaned before (I)(1) went so (I)(1) decided to take a (bath)(2). (I)(1) filled the (bathtub)(3) with warm (water)(4) and added some (bubble bath)(5). (I)(1) got undressed and stepped into the (water)(4). (I)(1) grabbed the (soap)(5) and rubbed it on (my)(1) (body)(7) and rinsed XXXXXX

Figure 4: An example of the referent cloze task. Similar to the Mechanical Turk experiment (Figure 2), our referent prediction model is asked to guess the upcoming DR.

Predicate schemas

The following feature captures a specific aspect of knowledge about prototypical sequences of events. This knowledge is called predicate schemas in the recent co-reference modeling work of Peng et al. (2015). In predicate schemas, the goal is to model pairs of events such that if a DR d participated in the first event (in a specific role), it is likely to participate in the second event (again, in a specific role). For example, in the restaurant scenario, if one observes a phrase "John ordered", one is likely to see "John waited" somewhere later in the document. Specific arguments are not that important (whether it is John or some other DR); what is important is that the argument is reused across the predicates. This would correspond to the rule X-subject-of-order → X-subject-of-eat.[4] Unlike the previous work, our dataset is small, so we cannot induce these rules directly, as there would be very few rules and the model would not generalize to new data well enough. Instead, we again encode this intuition using similarities in the real-valued embedding space.

Recall that our goal is to compute a feature ϕ(d, h(t)) indicating how likely a potential DR d is to follow, given the history h(t).

[4] In this work, we limit ourselves to rules where the syntactic function is the same on both sides of the rule. In other words, we can, in principle, encode the pattern X pushed Y → X apologized, but not the pattern X pushed Y → Y cried.
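The participant type fit feature described above can be sketched as follows; the toy embeddings and participant/predicate names are invented for illustration:

```python
import numpy as np

def participant_type_fit(emb, training_predicates, v):
    """Fit of a participant type p in role r for predicate v:
    dot product of v's embedding with x_{p,r}, the average embedding
    of the training predicates that took p in position r."""
    x_pr = np.mean([emb[u] for u in training_predicates], axis=0)
    return x_pr @ emb[v]

emb = {
    "scrubbed": np.array([0.8, 0.2, 0.1]),
    "rinsed":   np.array([0.7, 0.3, 0.0]),
    "plugged":  np.array([0.6, 0.1, 0.2]),
    "ate":      np.array([0.0, 0.1, 0.9]),
}
# Predicates that took the "bathtub" participant type as object in training
bathtub_obj = ["scrubbed", "plugged"]
fit_rinsed = participant_type_fit(emb, bathtub_obj, "rinsed")
fit_ate = participant_type_fit(emb, bathtub_obj, "ate")
assert fit_rinsed > fit_ate   # "rinsed" fits the bathtub object slot better
```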
Model       Feature Types                                                          Features
Base        Shallow Linguistic Features                                            Recency, Frequency, Grammatical function, Previous subject, Previous object
Linguistic  Shallow Linguistic Features + Linguistic Feature                       Recency, Frequency, Grammatical function, Previous subject, Previous object + Selectional Preferences
Script      Shallow Linguistic Features + Linguistic Feature + Script Features     Recency, Frequency, Grammatical function, Previous subject, Previous object + Selectional Preferences + Participant type fit, Predicate schemas

Table 2: Summary of model features

For example, imagine that the model is asked to predict the DR marked by XXXXXX in Figure 4. Predicate-schema rules can only yield previously introduced DRs, so the score ϕ(d, h(t)) = 0 for any new DR d. Let us use "soap" as an example of a previously introduced DR and see how the feature is computed. In order to choose which inference rules can be applied to yield "soap", we can inspect Figure 4. There are only two preceding predicates which have DR "soap" as their object ("rubbed" and "grabbed"), resulting in two potential rules, X-object-of-grabbed → X-object-of-rinsed and X-object-of-rubbed → X-object-of-rinsed. We define the score ϕ(d, h(t)) as the average of the rule scores. More formally, we can write

    ϕ(d, h(t)) = (1 / |N(d, h(t))|) · Σ_{(u,v,r) ∈ N(d, h(t))} ψ(u, v, r),     (1)

where ψ(u, v, r) is the score for a rule X-r-of-u → X-r-of-v, N(d, h(t)) is the set of applicable rules, and |N(d, h(t))| denotes its cardinality.[5] We define ϕ(d, h(t)) as 0 when the set of applicable rules is empty (i.e., |N(d, h(t))| = 0).

We define the scoring function ψ(u, v, r) as a linear function of a joint embedding x_{u,v} of verbs u and v:

    ψ(u, v, r) = α_r^T x_{u,v}.

The two remaining questions are (1) how to define the joint embeddings x_{u,v}, and (2) how to estimate the parameter vector α_r. The joint embedding of two predicates, x_{u,v}, can, in principle, be any composition function of embeddings of u and v, for example their sum or component-wise product. Inspired by Bordes et al. (2013), we use the difference between the word embeddings:

    ψ(u, v, r) = α_r^T (x_u − x_v),

where x_u and x_v are external embeddings of the corresponding verbs. Encoding the succession relation as translation in the embedding space has one desirable property: the scoring function will be largely agnostic to the morphological form of the predicates. For example, the difference between the embeddings of "rinsed" and "rubbed" is very similar to that of "rinse" and "rub" (Botha and Blunsom, 2014), so the corresponding rules will receive similar scores. Now, we can rewrite equation (1) as

    ϕ(d, h(t)) = α_{r(h(t))}^T · [ Σ_{(u,v,r) ∈ N(d, h(t))} (x_u − x_v) ] / |N(d, h(t))|,     (2)

where r(h(t)) denotes the syntactic function corresponding to the DR being predicted (object in our example).

As for the parameter vector α_r, there are again a number of potential ways it can be estimated. For example, one can train a discriminative classifier to estimate the parameters. However, we opted for a simpler approach: we set it equal to the empirical estimate of the expected feature vector x_{u,v} on the training set:[6]

    α_r = (1 / D_r) · Σ_{l,t} δ_r(r(h(l,t))) · Σ_{(u,v,r') ∈ N(d(l,t), h(l,t))} (x_u − x_v),     (3)

where l refers to a document in the training set, t is (as before) a position in the document, and h(l,t) and d(l,t) are the history and the correct DR for this position, respectively.

[5] In all our experiments, rather than considering all potential predicates in the history to instantiate rules, we take into account only the 2 preceding verbs. In other words, u and v can be interleaved by at most one verb, and |N(d, h(t))| is in {0, 1, 2}.
[6] This essentially corresponds to using the Naive Bayes model with the simplistic assumption that the score differences are normally distributed with spherical covariance matrices.
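Equations (1)–(3) can be sketched end to end: α_r is estimated as the average difference vector over training rules, and ϕ is the average rule score over the rules applicable to a candidate DR. The toy 3-dimensional embeddings below stand in for the paper's 300-dimensional word2vec vectors:

```python
import numpy as np

def estimate_alpha(emb, training_rules):
    """Eq. (3): empirical alpha_r = mean of x_u - x_v over the
    training rules X-r-of-u -> X-r-of-v for one syntactic role r."""
    return np.mean([emb[u] - emb[v] for u, v in training_rules], axis=0)

def schema_feature(alpha_r, emb, applicable_rules):
    """Eqs. (1)-(2): phi(d, h_t) = average of psi(u, v, r) =
    alpha_r . (x_u - x_v) over the applicable rules; 0 when no rule
    applies (e.g., for a new DR)."""
    if not applicable_rules:
        return 0.0
    scores = [alpha_r @ (emb[u] - emb[v]) for u, v in applicable_rules]
    return sum(scores) / len(scores)

# Toy embeddings: morphological variants sit close together
emb = {
    "scrubbed": np.array([0.8, 0.2, 0.1]),
    "rinsing":  np.array([0.1, 0.9, 0.0]),
    "rubbed":   np.array([0.8, 0.3, 0.1]),
    "rinsed":   np.array([0.1, 0.8, 0.1]),
    "grabbed":  np.array([0.5, 0.4, 0.3]),
}
# Training rule from Figure 1; test-time rules for DR "soap" from Figure 4
alpha_obj = estimate_alpha(emb, [("scrubbed", "rinsing")])
phi_soap = schema_feature(alpha_obj, emb, [("grabbed", "rinsed"), ("rubbed", "rinsed")])
assert phi_soap > 0   # "soap" scores as a plausible upcoming DR
```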
Scenario                        Human Model     Script Model    Linguistic Model  Tily Model
                                Acc.    Perp.   Acc.    Perp.   Acc.    Perp.     Acc.    Perp.
Grocery Shopping                74.80   2.13    68.17   3.16    53.85   6.54      32.89   24.48
Repairing a flat bicycle tyre   78.34   2.72    62.09   3.89    51.26   6.38      29.24   19.08
Riding a public bus             72.19   2.28    64.57   3.67    52.65   6.34      32.78   23.39
Getting a haircut               71.06   2.45    58.82   3.79    42.82   7.11      28.70   15.40
Planting a tree                 71.86   2.46    59.32   4.25    47.80   7.31      28.14   24.28
Borrowing book from library     77.49   1.93    64.07   3.55    43.29   8.40      33.33   20.26
Taking a bath                   81.29   1.84    67.42   3.14    61.29   4.33      43.23   16.33
Going on a train                70.79   2.39    58.73   4.20    47.62   7.68      30.16   35.11
Baking a cake                   76.43   2.16    61.79   5.11    46.40   9.16      24.07   23.67
Flying in an airplane           62.04   3.08    61.31   4.01    48.18   7.27      30.90   30.18
Average                         73.63   2.34    62.63   3.88    49.52   7.05      31.34   23.22

Table 3: Accuracies (in %) and perplexities for different models and scenarios. The script model substantially outperforms the linguistic and base models (p < 0.001, significance tested with McNemar's test (Everitt, 1992)). As expected, the human prediction model outperforms the script model (p < 0.001, significance tested by McNemar's test).

Model                                     Accuracy  Perplexity
Linguistic Model                          49.52     7.05
Linguistic Model + Predicate Schemas      55.44     5.88
Linguistic Model + Participant type fit   58.88     4.29
Full Script Model (both features)         62.63     3.88

Table 4: Accuracies from ablation experiments.

h(l,t) and d(l,t) are the history and the correct DR for this position, respectively. The term δ_r(r′) is the Kronecker delta, which equals 1 if r = r′ and 0 otherwise. D_r is the total number of rules for the syntactic function r in the training set:

D_r = Σ_{l,t} δ_r(r(h(l,t))) × |N(d(l,t), h(l,t))|.

Let us illustrate the computation with an example. Imagine that our training set consists of the document in Figure 1, and the trained model is used to predict the upcoming DR in our referent cloze example (Figure 4). The training document includes the pair X-object-of-scrubbed → X-object-of-rinsing, so the corresponding term (x_scrubbed − x_rinsing) participates in the summation (3) for α_obj. As we rely on external embeddings, which encode semantic similarities between lexical items, the dot product of this term and (x_rubbed − x_rinsed) will be high.[7] Consequently, ϕ(d, h(t)) is expected to be positive for d = "soap", thus predicting "soap" as the likely forthcoming DR.

Unfortunately, there are other terms (x_u − x_v) both in expression (3) for α_obj and in expression (2) for ϕ(d, h(t)). These terms may be irrelevant to the current prediction, as X-object-of-plugged → X-object-of-filling from Figure 1, and may not even encode any valid regularities, as X-object-of-got → X-object-of-scrubbed (again from Figure 1). This may suggest that our feature will be too contaminated with noise to be informative for making predictions. However, recall that independent random vectors in high dimensions are almost orthogonal and, assuming they are bounded, their dot products are close to zero. Consequently, the products of the relevant ("non-random") terms, in our example (x_scrubbed − x_rinsing) and (x_rubbed − x_rinsed), are likely to overcome the ("random") noise. As we will see in the ablation studies, the predicate-schema feature is indeed predictive of a DR and contributes to the performance of the full model.

[7] The score would have been even higher, should the predicate be in the morphological form rinsing rather than rinsed. However, embeddings of rinsing and rinsed would still be sufficiently close to each other for our argument to hold.

4.3 Experiments

We would like to test whether our model can produce accurate predictions and whether the model's guesses correlate well with human predictions for the referent cloze task. In order to be able to evaluate the effect of script knowledge on referent predictability, we compare three models: our full Script model uses all of the features introduced in Section 4.2; the Linguistic model relies only on the 'linguistic features' but not the script-specific ones; and the Base model includes all the shallow linguistic features. The Base model differs from the Linguistic model in that it does not model selectional preferences. Table 2 summarizes the features used in the different models.
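The significance testing mentioned in the Table 3 caption can be sketched as follows. This is our own minimal illustration with invented counts, not the paper's evaluation code: McNemar's test compares two models on the same test items and uses only the discordant pairs, i.e., items that exactly one of the two models gets right.

```python
import math

def mcnemar(b, c):
    """McNemar's chi-square test on discordant pairs.
    b: items model A got right and model B got wrong; c: the reverse.
    Returns (chi2, p) under the 1-df chi-square distribution."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # with continuity correction
    p = math.erfc(math.sqrt(chi2 / 2))      # 1-df chi-square survival function
    return chi2, p

# Invented counts: suppose one model is correct on 120 items the other
# misses, and wrong on 40 items the other gets right.
chi2, p = mcnemar(120, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
assert p < 0.001
```

With the invented counts above, the discordant pairs are heavily skewed toward one model, so the test reports a significant difference; balanced counts (e.g., 50 vs. 50) would yield a p-value close to 1.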
The dataset was randomly divided into training (70%), development (10%, 91 stories from 10 scenarios), and test (20%, 182 stories from 10 scenarios) sets. The feature weights were learned using L-BFGS (Byrd et al., 1995) to optimize the log-likelihood.

Evaluation against original referents. We calculated the percentage of correct DR predictions; see Table 3 for the averages across the 10 scenarios. We can see that the task appears hard for humans: their average performance reaches only 73% accuracy. As expected, the Base model is the weakest system (an accuracy of 31%). Modeling selectional preferences yields an extra 18% in accuracy (Linguistic model). The key finding is that the incorporation of script knowledge increases the accuracy by a further 13%, although still far behind human performance (62% vs. 73%). Besides accuracy, we use perplexity, which we computed not only for all our models but also for human predictions. This was possible as each task was solved by multiple humans; we used unsmoothed normalized guess frequencies as the probabilities. As we can see from Table 3, the perplexity scores are consistent with the accuracies: the script model again outperforms the other methods, and, as expected, all the models are weaker than humans.

As we used two sets of script features, capturing different aspects of script knowledge, we performed extra ablation studies (Table 4). The experiments confirm that both feature sets were beneficial.

Evaluation against human expectations. In the previous subsection, we demonstrated that the incorporation of selectional preferences and, perhaps more interestingly, the integration of automatically acquired script knowledge lead to improved accuracy in predicting discourse referents. Now we turn to another question raised in the introduction: does incorporation of this knowledge make our predictions more human-like? In other words, are we able to accurately estimate human expectations? This includes not only being sufficiently accurate but also making the same kind of incorrect predictions.

In this evaluation, we therefore use the human guesses collected during the referent cloze task as our target and calculate the relative accuracy of each computational model. As can be seen in Figure 5, the Script model, at approx. 53% accuracy, is a lot more accurate in predicting human guesses than the Linguistic model and the Base model.

[Figure 5: Average relative accuracies (in %) of the different models w.r.t. human predictions: Script 52.9, Linguistic 38.4, Base 34.52.]

We can also observe that the margin between the Script model and the Linguistic model is a lot larger in this evaluation than that between the Base model and the Linguistic model. This indicates that the model which has access to script knowledge is much more similar to human prediction behavior in terms of top guesses than the script-agnostic models.

Now we would like to assess whether our predictions are similar as distributions rather than only yielding similar top predictions. In order to compare the distributions, we use the Jensen-Shannon divergence (JSD), a symmetrized version of the Kullback-Leibler divergence. Intuitively, JSD measures the distance between two probability distributions; a smaller JSD value indicates more similar distributions. Figure 6 shows that the probability distributions resulting from the Script model are more similar to human predictions than those of the Linguistic and Base models.

[Figure 6: Average Jensen-Shannon divergence between human predictions and the models: Script 0.50, Linguistic 0.57, Base 0.66.]

In these experiments, we have shown that script knowledge improves predictions of upcoming referents and that the script model is the best among our models at approximating human referent predictions.
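Both comparison measures used above can be sketched in a few lines. This is our own illustration with invented guess counts, not the study's data: perplexity is computed from unsmoothed normalized guess frequencies, and the Jensen-Shannon divergence is the symmetrized Kullback-Leibler divergence between a model's distribution and the human one.

```python
import math
from collections import Counter

def guess_distribution(guesses):
    """Unsmoothed normalized guess frequencies over candidate referents."""
    counts = Counter(guesses)
    total = sum(counts.values())
    return {ref: c / total for ref, c in counts.items()}

def kl(p, q, eps=1e-12):
    """KL(p || q) in bits; eps guards zero-probability entries."""
    return sum(pv * math.log2(pv / max(q.get(ref, 0.0), eps))
               for ref, pv in p.items() if pv > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    refs = set(p) | set(q)
    m = {ref: 0.5 * (p.get(ref, 0.0) + q.get(ref, 0.0)) for ref in refs}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def perplexity(dist, true_referent, eps=1e-12):
    """2 ** (surprisal of the correct referent under dist)."""
    return 2 ** -math.log2(max(dist.get(true_referent, 0.0), eps))

# Toy example: 10 human guesses for one cloze position (invented data).
human = guess_distribution(["waiter"] * 7 + ["steak"] * 2 + ["fries"])
model = {"waiter": 0.5, "steak": 0.3, "fries": 0.2}

print(round(perplexity(human, "waiter"), 3))
print(round(js_divergence(human, model), 3))
```

A distribution that concentrates its mass on the correct referent yields perplexity close to 1, and identical distributions yield a JSD of 0, which is the sense in which lower values in Table 3 and Figure 6 mean "closer to human".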
5 Referring Expression Type Prediction Model (RE Model)

Using the referent prediction models, we next attempt to replicate Tily and Piantadosi's finding that the choice of the type of referring expression (pronoun or full NP) depends in part on the predictability of the referent.

5.1 Uniform Information Density Hypothesis

The uniform information density (UID) hypothesis suggests that speakers tend to convey information at a uniform rate (Jaeger, 2010). Applied to the choice of referring expression type, it would predict that a highly predictable referent should be encoded using a short code (here: a pronoun), while an unpredictable referent should be encoded using a longer form (here: a full NP). Information density is measured using the information-theoretic measure of the surprisal S of a message m_i:

S(m_i) = −log P(m_i | context)

UID has been very successful in explaining a variety of linguistic phenomena; see Jaeger et al. (2016). There is, however, controversy about whether UID affects pronominalization. Tily and Piantadosi (2009) report evidence that writers are more likely to refer using a pronoun or proper name when the referent is easy to guess and use a full NP when readers have less certainty about the upcoming referent; see also Arnold (2001). But other experiments (using highly controlled stimuli) have failed to find an effect of predictability on pronominalization (Stevenson et al., 1994; Fukumura and van Gompel, 2010; Rohde and Kehler, 2014). The present study hence contributes to the debate on whether UID affects referring expression choice.

5.2 A Model of Referring Expression Choice

Our goal is to determine whether referent predictability (quantified in terms of surprisal) is correlated with the type of referring expression used in the text. Here we focus on the distinction between pronouns and full noun phrases. Our data also contains a small percentage (ca. 1%) of proper names (like "John"). Due to this small class size and earlier findings that proper nouns behave much like pronouns (Tily and Piantadosi, 2009), we combined pronouns and proper names into a single class of short encodings.

For the referring expression type prediction task, we estimate the surprisal of the referent from each of our computational models from Section 4 as well as from the human cloze task. The surprisal of an upcoming discourse referent d(t) based on the previous context h(t) is thereby estimated as:

S(d(t)) = −log p(d(t) | h(t))

In order to determine whether referent predictability has an effect on referring expression type over and above other factors that are known to affect the choice of referring expression, we train a logistic regression model with referring expression type as a response variable and discourse referent predictability as well as a large set of other linguistic factors (based on Tily and Piantadosi, 2009) as explanatory variables. The model is defined as follows:

p(n(t) = n | d(t), h(t)) = exp(vᵀ g(n, d(t), h(t))) / Σ_{n′} exp(vᵀ g(n′, d(t), h(t))),

where d(t) and h(t) are defined as before, g is the feature function, and v is the vector of model parameters. The summation in the denominator is over NP types (full NP vs. pronoun/proper noun).

5.3 RE Model Experiments

We ran four different logistic regression models. These models all contained exactly the same set of linguistic predictors but differed in the estimates used for referent type surprisal and residual entropy. One logistic regression model used surprisal estimates based on the human referent cloze task, while the three other models used estimates based on the three computational models (Base, Linguistic and Script).

For our experiment, we are interested in the choice of referring expression type for those occurrences of references where a "real choice" is possible. We therefore exclude from the analysis reported below all first mentions as well as all first and second person pronouns (because there is no optionality in how to refer to first or second person). This subset contains 1345 data points.

5.4 Results

The results of all four logistic regression models are shown in Table 5. We first take a look at the results for the linguistic features. While there is a bit of variability in terms of the exact coefficient estimates between the models (this is simply due to small correlations between these predictors and the predictors for surprisal), the effect of all of these features is largely consistent across models.
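To make the link between the model of Section 5.2 and coefficients like those reported below concrete, here is a small sketch; the coefficient values and feature vectors are invented for illustration, not the fitted ones. With only two NP types, the softmax over types reduces to a binary logistic regression.

```python
import math

def np_type_prob(features, weights):
    """p(type = pronoun/proper | context) under the two-class softmax of
    Section 5.2; only the feature difference between classes matters, so
    this is a standard binary logistic regression."""
    score = sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients in the spirit of Table 5 (invented values);
# the base class is "full NP".
weights = {
    "intercept": -3.4,
    "recency": 1.3,        # recent previous mention -> pronoun more likely
    "pastExpPronoun": 1.6, # last mention was itself a pronoun
    "surprisal": -0.04,    # near zero: no predictability effect found
}

recent = {"intercept": 1.0, "recency": 2.0, "pastExpPronoun": 1.0, "surprisal": 1.5}
distant = {"intercept": 1.0, "recency": 0.0, "pastExpPronoun": 0.0, "surprisal": 4.0}

p_recent = np_type_prob(recent, weights)
p_distant = np_type_prob(distant, weights)
print(f"p(pronoun | recent mention)  = {p_recent:.3f}")
print(f"p(pronoun | distant mention) = {p_distant:.3f}")
assert p_recent > p_distant
```

Because the (invented) surprisal coefficient is near zero, varying surprisal barely moves the probability, while recency and previous-expression type move it substantially; this mirrors the pattern of significant and non-significant predictors discussed in this section.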
Table 5: Coefficients obtained from the regression analysis for the different models. Two NP types are considered: full NP and pronoun/proper noun, with base class full NP. Significance codes: '***' < 0.001, '**' < 0.01, '*' < 0.05, and '.' < 0.1.

Estimate:
                  Human    Script   Linguistic  Base
(Intercept)       -3.4     -3.418   -3.245      -3.061
recency            1.322    1.322    1.324       1.322
frequency          0.097    0.103    0.112       0.114
pastObj            0.407    0.396    0.423       0.395
pastSubj          -0.967   -0.973   -0.909      -0.926
pastExpPronoun     1.603    1.619    1.616       1.602
depTypeSubj        2.939    2.942    2.656       2.417
depTypeObj         1.199    1.227    0.977       0.705
surprisal         -0.04    -0.006    0.002      -0.131
residualEntropy   -0.009    0.023   -0.141      -0.128

Std. Error:
(Intercept)        0.244    0.279    0.321       0.791
recency            0.095    0.095    0.096       0.097
frequency          0.098    0.097    0.098       0.102
pastObj            0.293    0.294    0.295       0.3
pastSubj           0.559    0.564    0.562       0.565
pastExpPronoun     0.21     0.207    0.208       0.245
depTypeSubj        0.299    0.347    0.429       1.113
depTypeObj         0.248    0.306    0.389       1.109
surprisal          0.099    0.097    0.117       0.387
residualEntropy    0.088    0.128    0.168       0.258

Pr(>|z|):
(Intercept)       <2e-16 ***    <2e-16 ***    <2e-16 ***    0.00011 ***
recency           <2e-16 ***    <2e-16 ***    <2e-16 ***    <2e-16 ***
frequency         0.317         0.289         0.251         0.262
pastObj           0.165         0.178         0.151         0.189
pastSubj          0.0838 .      0.0846 .      0.106         0.101
pastExpPronoun    2.19e-14 ***  5.48e-15 ***  7.59e-15 ***  6.11e-11 ***
depTypeSubj       <2e-16 ***    <2e-16 ***    5.68e-10 ***  0.02994 *
depTypeObj        1.35e-06 ***  6.05e-05 ***  0.0119 *      0.525
surprisal         0.684         0.951         0.988         0.735
residualEntropy   0.916         0.859         0.401         0.619

For instance, the positive coefficient for the recency feature means that when a previous mention happened very recently, the referring expression is more likely to be a pronoun (and not a full NP).

The coefficients for the surprisal estimates of the different models are, however, not significantly different from zero. Model comparison shows that they do not improve model fit. We also used the estimated models to predict referring expression type on new data and again found that surprisal estimates from the models did not improve prediction accuracy. This even holds for our human cloze data. Hence, it cannot be interpreted as a problem with the models: even human predictability estimates are, for this dataset, not predictive of referring expression type.

We also calculated regression models for the full dataset including first and second person pronouns as well as first mentions (3346 data points). The results for the full dataset are fully consistent with the findings shown in Table 5: there was no significant effect of surprisal on referring expression type.

This result contrasts with the findings by Tily and Piantadosi (2009), who reported a significant effect of surprisal on RE type for their data. In order to replicate their settings as closely as possible, we also included residualEntropy as a predictor in our model (see the last predictor in Table 5); however, this did not change the results.

6 Discussion and Future Work

Our study on incrementally predicting discourse referents showed that script knowledge is a highly important factor in determining human discourse expectations. Crucially, the computational modelling approach allowed us to tease apart the different factors that affect human prediction, as we cannot manipulate this in humans directly (by asking them to "switch off" their common-sense knowledge). By modelling common-sense knowledge in terms of event sequences and event participants, our model captures many more long-range dependencies than normal language models. The script knowledge is automatically induced by our model from crowd-sourced scenario-specific text collections.

In a second study, we set out to test the hypothesis that uniform information density affects referring expression type. This question is highly controversial in the literature: while Tily and Piantadosi (2009) find a significant effect of surprisal on referring expression type in a corpus study very similar to ours, other studies that use a more tightly controlled experimental approach have not found an effect of predictability on RE type (Stevenson et al., 1994; Fukumura and van Gompel, 2010; Rohde and Kehler, 2014). The present study, while replicating exactly the setting of T&P in terms of features and analysis, did not find support for a UID effect on RE type. The difference in results between T&P 2009 and ours could be due to the different corpora and text sorts that were used; specifically, we would expect that larger predictability effects might be observable at script boundaries, rather than within a script, as is the case in our stories.

A next step in moving our participant prediction model towards NLP applications would be to replicate our modelling results on automatic text-to-script mapping instead of the gold-standard data used here (in order to approximate the human level of processing). Furthermore, we aim to move to more complex text types that include reference to several scripts. We plan to consider the recently published ROCStories corpus (Mostafazadeh et al., 2016), a large crowdsourced collection of topically unrestricted short and simple narratives, as a basis for these next steps in our research.
Acknowledgments

We thank the editors and the anonymous reviewers for their insightful suggestions. We would like to thank Florian Pusse for helping with the Amazon Mechanical Turk experiment. We would also like to thank Simon Ostermann and Tatjana Anikina for helping with the InScript corpus. This research was partially supported by the German Research Foundation (DFG) as part of SFB 1102 'Information Density and Linguistic Encoding', the European Research Council (ERC) as part of ERC Starting Grant BroadSem (#678254), the Dutch National Science Foundation as part of NWO VIDI 639.022.518, and the DFG once again as part of the MMCI Cluster of Excellence (EXC 284).

References

Simon Ahrendt and Vera Demberg. 2016. Improving event prediction by representing script participants. In Proceedings of NAACL-HLT.

Jennifer E. Arnold. 2001. The effect of thematic roles on pronoun use and frequency of reference continuation. Discourse Processes, 31(2):137–162.

Marco Baroni and Alessandro Lenci. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of ACL.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of NIPS.

Jan A. Botha and Phil Blunsom. 2014. Compositional morphology for word representations and language modelling. In Proceedings of ICML.

Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208.

Nathanael Chambers and Daniel Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL.

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of ACL.

Brian S. Everitt. 1992. The Analysis of Contingency Tables. CRC Press.

Lea Frermann, Ivan Titov, and Manfred Pinkal. 2014. A hierarchical Bayesian model for unsupervised induction of script knowledge. In Proceedings of EACL.

Kumiko Fukumura and Roger P. G. van Gompel. 2010. Choosing anaphoric expressions: Do people take into account likelihood of reference? Journal of Memory and Language, 62(1):52–66.

T. Florian Jaeger, Esteban Buz, Eva M. Fernandez, and Helen S. Cairns. 2016. Signal reduction and linguistic encoding. Handbook of Psycholinguistics. Wiley-Blackwell.

T. Florian Jaeger. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1):23–62.

Bram Jans, Steven Bethard, Ivan Vulić, and Marie-Francine Moens. 2012. Skip n-grams and ranking functions for predicting script events. In Proceedings of EACL.

Gina R. Kuperberg and T. Florian Jaeger. 2016. What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience, 31(1):32–59.

Gina R. Kuperberg. 2016. Separate streams or probabilistic inference? What the N400 can tell us about the comprehension of events. Language, Cognition and Neuroscience, 31(5):602–616.

Marta Kutas, Katherine A. DeLong, and Nathaniel J. Smith. 2011. A look around at what lies ahead: Prediction and predictability in language processing. Predictions in the Brain: Using Our Past to Generate a Future.

Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of Interspeech.

Tomas Mikolov, Stefan Kombrink, Anoop Deoras, Lukas Burget, and Jan Cernocký. 2011. RNNLM - recurrent neural network language modeling toolkit. In Proceedings of the 2011 ASRU Workshop.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS.

Ashutosh Modi and Ivan Titov. 2014. Inducing neural models of script knowledge. In Proceedings of CoNLL.

Ashutosh Modi, Tatjana Anikina, Simon Ostermann, and Manfred Pinkal. 2016. InScript: Narrative texts annotated with script information. In Proceedings of LREC.

Ashutosh Modi. 2016. Event embeddings for semantic script modeling. In Proceedings of CoNLL.

Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. 2016. A corpus and cloze evaluation for deeper understanding of commonsense stories. In Proceedings of NAACL.
Haoruo Peng, Daniel Khashabi, and Dan Roth. 2015. Solving hard coreference problems. In Proceedings of NAACL.

Karl Pichotta and Raymond J. Mooney. 2014. Statistical script learning with multi-argument events. In Proceedings of EACL.

Altaf Rahman and Vincent Ng. 2012. Resolving complex cases of definite pronouns: the Winograd schema challenge. In Proceedings of EMNLP.

Michaela Regneri, Alexander Koller, and Manfred Pinkal. 2010. Learning script knowledge with web experiments. In Proceedings of ACL.

Hannah Rohde and Andrew Kehler. 2014. Grammatical and information-structural influences on pronoun production. Language, Cognition and Neuroscience, 29(8):912–927.

Rachel Rudinger, Vera Demberg, Ashutosh Modi, Benjamin Van Durme, and Manfred Pinkal. 2015. Learning to predict script events from domain-specific text. In Proceedings of the International Conference on Lexical and Computational Semantics (*SEM 2015).

Asad Sayeed, Clayton Greenberg, and Vera Demberg. 2016. Thematic fit evaluation: an aspect of selectional preferences. In Proceedings of the Workshop on Evaluating Vector Space Representations for NLP (RepEval 2016).

Roger C. Schank and Robert P. Abelson. 1977. Scripts, Plans, Goals, and Understanding. Lawrence Erlbaum Associates, Potomac, Maryland.

Simone Schütz-Bosbach and Wolfgang Prinz. 2007. Prospective coding in event representation. Cognitive Processing, 8(2):93–102.

Rosemary J. Stevenson, Rosalind A. Crawley, and David Kleinman. 1994. Thematic roles, focus and the representation of events. Language and Cognitive Processes, 9(4):519–548.

Harry Tily and Steven Piantadosi. 2009. Refer efficiently: Use less informative expressions for more predictable meanings. In Proceedings of the Workshop on the Production of Referring Expressions: Bridging the Gap between Computational and Empirical Approaches to Reference.

Alessandra Zarcone, Marten van Schijndel, Jorrig Vogels, and Vera Demberg. 2016. Salience and attention in surprisal-based accounts of language processing. Frontiers in Psychology, 7:844.