Transactions of the Association for Computational Linguistics, vol. 5, pp. 31–44, 2017. Action Editor: Hwee Tou Ng.
Submission batch: 8/2016; Revision batch: 10/2016; Published 1/2017.

© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction

Ashutosh Modi (1,3), Ivan Titov (2,4), Vera Demberg (1,3), Asad Sayeed (1,3), Manfred Pinkal (1,3)
(1) {ashutosh,vera,asayeed,pinkal}@coli.uni-saarland.de  (2) titov@uva.nl
(3) Universität des Saarlandes, Germany  (4) ILLC, University of Amsterdam, the Netherlands

Abstract

Recent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the factors that affect human prediction by building a computational model that can predict upcoming discourse referents based on linguistic knowledge alone vs. linguistic knowledge jointly with common-sense knowledge in the form of scripts. We find that script knowledge significantly improves model estimates of human predictions. In a second study, we test the highly controversial hypothesis that predictability influences referring expression type but do not find evidence for such an effect.

1 Introduction

Being able to anticipate upcoming content is a core property of human language processing (Kutas et al., 2011; Kuperberg and Jaeger, 2016) that has received a lot of attention in the psycholinguistic literature in recent years. Expectations about upcoming words help humans comprehend language in noisy settings and deal with ungrammatical input. In this paper, we use a computational model to address the question of how different layers of knowledge (linguistic knowledge as well as common-sense knowledge) influence human anticipation.

Here we focus our attention on semantic predictions of discourse referents for upcoming noun phrases. This task is particularly interesting because it allows us to separate the semantic task of anticipating an intended referent from the processing of the actual surface form. For example, in the context of "I ordered a medium sirloin steak with fries. Later, the waiter brought ...", there is a strong expectation of a specific discourse referent, i.e., the referent introduced by the object NP of the preceding sentence, while the possible referring expression could be either the steak I had ordered, the steak, our food, or it. Existing models of human prediction are usually formulated using the information-theoretic concept of surprisal. In recent work, however, surprisal is usually not computed for DRs, which represent the relevant semantic unit, but for the surface form of the referring expressions, even though there is an increasing amount of literature suggesting that human expectations at different levels of representation have separable effects on prediction and, as a consequence, that the modelling of only one level (the linguistic surface form) is insufficient (Kuperberg and Jaeger, 2016; Kuperberg, 2016; Zarcone et al., 2016). The present model addresses this shortcoming by explicitly modelling and representing common-sense knowledge and conceptually separating the semantic (discourse referent) and the surface-level (referring expression) expectations.

Our discourse referent prediction task is related to the NLP task of coreference resolution, but it substantially differs from that task in the following ways: 1) we use only the incrementally available left context, while coreference resolution uses the full text; 2) coreference resolution tries to identify the DR for a given target NP in context, while we look at the expectations of DRs based only on the context before the target NP is seen.
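Surprisal, the information-theoretic quantity referred to above, is simply the negative log-probability that a predictive model assigns to the outcome that actually occurs. A minimal sketch in Python (the function name is ours, for illustration only):

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits of an outcome to which a predictive model
    assigned probability `prob`: rare continuations carry more information."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)

# A fully expected continuation is uninformative; a 1-in-4 one costs 2 bits.
assert surprisal(1.0) == 0.0
assert surprisal(0.25) == 2.0
```

In the experiments below, the probabilities plugged into this computation come either from a model or from the normalized distribution of human guesses.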

The distinction between referent prediction and prediction of referring expressions also allows us to study a closely related question in natural language generation: the choice of a type of referring expression based on the predictability of the DR that is intended by the speaker. This part of our work is inspired by a referent guessing experiment by Tily and Piantadosi (2009), who showed that highly predictable referents were more likely to be realized with a pronoun than unpredictable referents, which were more likely to be realized using a full NP. The effect they observe is consistent with a Gricean point of view, or the principle of uniform information density (see Section 5.1). However, Tily and Piantadosi do not provide a computational model for estimating referent predictability. Also, they do not include selectional preference or common-sense knowledge effects in their analysis.

We believe that script knowledge, i.e., common-sense knowledge about everyday event sequences, represents a good starting point for modelling conversational anticipation. This type of common-sense knowledge includes temporal structure, which is particularly relevant for anticipation in continuous language processing. Furthermore, our approach can build on progress that has been made in recent years in methods for acquiring large-scale script knowledge; see Section 1.1. Our hypothesis is that script knowledge may be a significant factor in human anticipation of discourse referents. Explicitly modelling this knowledge will thus allow us to produce more human-like predictions.

Script knowledge enables our model to generate anticipations about discourse referents that have already been mentioned in the text, as well as anticipations about textually new discourse referents which have been activated due to script knowledge. By modelling event sequences and event participants, our model captures many more long-range dependencies than normal language models are able to. As an example, consider the following two alternative text passages:

  We got seated, and had to wait for 20 minutes. Then, the waiter brought the ...
  We ordered, and had to wait for 20 minutes. Then, the waiter brought the ...

Preferred candidate referents for the object position of "the waiter brought the" are instances of the food, menu, or bill participant types. In the context of the alternative preceding sentences, there is a strong expectation of instances of a menu and a food participant, respectively.

This paper represents foundational research investigating human language processing. However, it also has the potential for application in assistive technology and embodied agents. The goal is to achieve human-level language comprehension in realistic settings, and in particular to achieve robustness in the face of errors or noise. Explicitly modelling expectations that are driven by common-sense knowledge is an important step in this direction.

In order to be able to investigate the influence of script knowledge on discourse referent expectations, we use a corpus that contains frequent reference to script knowledge, and provides annotations for coreference information, script events and participants (Section 2). In Section 3, we present a large-scale experiment for empirically assessing human expectations on upcoming referents, which allows us to quantify at what points in a text humans have very clear anticipations vs. when they do not. Our goal is to model human expectations, even if they turn out to be incorrect in a specific instance. The experiment was conducted via Mechanical Turk and follows the methodology of Tily and Piantadosi (2009). In Section 4, we describe our computational model that represents script knowledge. The model is trained on the gold-standard annotations of the corpus, because we assume that human comprehenders usually will have an analysis of the preceding discourse which closely corresponds to the gold standard. We compare the prediction accuracy of this model to human predictions, as well as to two baseline models, in Section 4.3. One of them uses only structural linguistic features for predicting referents; the other uses general script-independent selectional preference features. In Section 5, we test whether surprisal (as estimated from human guesses vs. computational models) can predict the type of referring expression used in the original texts in the corpus (pronoun vs. full referring expression). This experiment also has wider implications with respect to the on-going discussion of whether the referring expression choice is dependent on predictability, as predicted by the uniform information density hypothesis.

Figure 1: An excerpt from a story in the InScript corpus:

  (I)[1:bather] [decided]{wash} to take a (bath)[2:bath] yesterday afternoon after working out. Once (I)[1] got back home, (I)[1] [walked]{enter bathroom} to (my)[1] (bathroom)[3:bathroom] and first quickly scrubbed the (bathroom tub)[4:bathtub] by [turning on]{turn water on} the (water)[5:water] and rinsing (it)[4] clean with a rag. After (I)[1] finished, (I)[1] [plugged]{close drain} the (tub)[4] and began [filling]{fill water} (it)[4] with warm (water)[5] set at about 98 (degrees)[6:temperature].

Referring expressions are in parentheses, with the discourse referent index and participant type given in brackets; referring expressions of the same discourse referent share the same index. Script-relevant events are in square brackets, with the event type given in braces.

The contributions of this paper consist of:
• a large dataset of human expectations, in a variety of texts related to every-day activities;
• an implementation of the conceptual distinction between the semantic level of referent prediction and the type of a referring expression;
• a computational model which significantly improves modelling of human anticipations;
• showing that script knowledge is a significant factor in human expectations;
• testing the hypothesis of Tily and Piantadosi that the choice of the type of referring expression (pronoun or full NP) depends on the predictability of the referent.

1.1 Scripts

Scripts represent knowledge about typical event sequences (Schank and Abelson, 1977), for example the sequence of events happening when eating at a restaurant. Script knowledge thereby includes events like order, bring and eat, as well as participants of those events, e.g., menu, waiter, food, guest. Existing methods for acquiring script knowledge are based on extracting narrative chains from text (Chambers and Jurafsky, 2008; Chambers and Jurafsky, 2009; Jans et al., 2012; Pichotta and Mooney, 2014; Rudinger et al., 2015; Modi, 2016; Ahrendt and Demberg, 2016) or on eliciting script knowledge via crowdsourcing on Mechanical Turk (Regneri et al., 2010; Frermann et al., 2014; Modi and Titov, 2014).

Modelling anticipated events and participants is motivated by evidence showing that event representations in humans contain information not only about the current event, but also about previous and future states; that is, humans generate anticipations about event sequences during normal language comprehension (Schütz-Bosbach and Prinz, 2007). Script knowledge representations have been shown to be useful in NLP applications for ambiguity resolution during reference resolution (Rahman and Ng, 2012).

2 Data: The InScript Corpus

Ordinary texts, including narratives, encode script structure in a way that is at the same time too complex and too implicit to enable a systematic study of script-based expectation. They contain interleaved references to many different scripts, and they usually refer to single scripts in a point-wise fashion only, relying on the ability of the reader to infer the full event chain using their background knowledge.

We use the InScript corpus (Modi et al., 2016) to study the predictive effect of script knowledge. InScript is a crowdsourced corpus of simple narrative texts. Participants were asked to write about a specific activity (e.g., a restaurant visit, a bus ride, or a grocery shopping event) which they personally experienced, and they were instructed to tell the story as if explaining the activity to a child. This resulted in stories that are centered around a specific scenario and that explicitly mention mundane details. Thus, they generally realize longer event chains associated with a single script, which makes them particularly appropriate for our purpose.

The InScript corpus is labelled with event-type, participant-type, and coreference information. Full verbs are labeled with event-type information, and heads of all noun phrases with participant types, using scenario-specific lists of event types (such as enter bathroom, close drain and fill water for the "taking a bath" scenario) and participant types (such as bather, water and bathtub). On average, each template offers a choice of 20 event types and 18 participant types.
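To make these annotation layers concrete, the sketch below models a labelled unit of such a story in code. This is our own illustrative schema, not the corpus' actual file format; the class and field names are invented:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    """One annotated unit in an InScript-style story (illustrative schema).

    `label` holds the scenario-specific event type for verbs, or the
    participant type for NP heads; `referent_id` is the coreference
    chain index for NP mentions, and None for event mentions."""
    text: str
    label: str
    referent_id: Optional[int]

# The opening of the Figure 1 excerpt, in this schema:
story = [
    Mention("I", "bather", 1),
    Mention("decided", "wash", None),   # event-type label on the verb
    Mention("bath", "bath", 2),
]

# The coreference layer lets a model track which discourse referents
# have been introduced at each point in the story:
seen = [m.referent_id for m in story if m.referent_id is not None]
assert seen == [1, 2]
```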

The InScript corpus consists of 910 stories addressing 10 scenarios (about 90 stories per scenario). The corpus has 200,000 words, 12,000 verb instances with event labels, and 44,000 head nouns with participant instances. Modi et al. (2016) report an inter-annotator agreement of 0.64 for event types and 0.77 for participant types (Fleiss' kappa).

We use gold-standard event- and participant-type annotation to study the influence of script knowledge on the expectation of discourse referents. In addition, InScript provides coreference annotation, which makes it possible to keep track of the mentioned discourse referents at each point in the story. We use this information in the computational model of DR prediction and in the DR guessing experiment described in the next section. An example of an annotated InScript story is shown in Figure 1.

3 Referent Cloze Task

We use the InScript corpus to develop computational models for the prediction of discourse referents (DRs) and to evaluate their prediction accuracy. This can be done by testing how often our models manage to reproduce the original discourse referent (cf. also the "narrative cloze" task of Chambers and Jurafsky (2008), which tests whether a verb together with a role can be correctly guessed by a model). However, we do not only want to predict the "correct" DRs in a text, but also to model human expectation of DRs in context. To empirically assess human expectation, we created an additional database of crowdsourced human predictions of discourse referents in context using Amazon Mechanical Turk. The design of our experiment closely resembles the guessing game of Tily and Piantadosi (2009) but extends it in a substantial way.

Workers had to read stories of the InScript corpus[1] and guess upcoming participants: for each target NP, workers were shown the story up to this NP, excluding the NP itself, and they were asked to guess the next person or object most likely to be referred to. In case they decided in favour of a discourse referent already mentioned, they had to choose among the available discourse referents by clicking an NP in the preceding text, i.e., some noun with a specific, coreference-indicating color; see Figure 2. Otherwise, they would click the "New" button, and would in turn be asked to give a short description of the new person or object they expected to be mentioned. The percentage of guesses that agree with the actually referred-to entity was taken as a basis for estimating the surprisal.

[1] The corpus is available at: http://www.sfb1102.uni-saarland.de/?page_id=2582

Figure 2: An illustration of the Mechanical Turk experiment for the referent cloze task. Workers are supposed to guess the upcoming referent (indicated by XXXXXX below). They can either choose from the previously activated referents, or they can write something new:

  (I)[1] decided to take a (bath)[2] yesterday afternoon after working out. Once (I)[1] got back home, (I)[1] walked to (my)[1] (bathroom)[3] and first quickly scrubbed the (bathroom tub)[4] by turning on the (water)[5] and rinsing (it)[4] clean with a rag. After (I)[1] finished, (I)[1] plugged XXXXXX

Figure 3: Responses of workers corresponding to the story in Figure 2. Workers guessed two already activated discourse referents (DRs), DR_4 and DR_1. Some of the workers also chose the "new" option and wrote different lexical variants of "bathtub drain", a new DR corresponding to the participant type "the drain". [Bar chart showing the number of workers who chose DR_4 (P_bathtub), "the drain" (new DR), and DR_1 (P_bather).]

The experiment was done for all stories of the test set: 182 stories (20%) of the InScript corpus, taken evenly from all scenarios. Since our focus is on the effect of script knowledge, we only considered those NPs as targets that are direct dependents of script-related events. Guessing started from the third sentence only, in order to ensure that a minimum of context information was available. To keep the complexity of the context manageable, we restricted guessing to a maximum of 30 targets and skipped the rest of the story (this applied to 12% of the stories). We collected 20 guesses per NP for 3346 noun phrase instances, which amounts to a total of around 67K guesses. Workers selected a context NP in 68% of cases and "New" in 32% of cases.

Our leading hypothesis is that script knowledge substantially influences human expectation of discourse referents. The guessing experiment provides a basis to estimate human expectation of already mentioned DRs (the number of clicks on the respective NPs in the text). However, we expect that script knowledge has a particularly strong influence in the case of first mentions. Once a script is evoked in a text, we assume that the full script structure, including all participants, is activated and available to the reader.

Tily and Piantadosi (2009) are interested in second mentions only and therefore do not make use of the worker-generated noun phrases classified as "New". To study the effect of activated but not explicitly mentioned participants, we carried out a subsequent annotation step on the worker-generated noun phrases classified as "New". We presented annotators with these noun phrases in their contexts (with co-referring NPs marked by color, as in the M-Turk experiment) and, in addition, displayed all participant types of the relevant script (i.e., the script associated with the text in the InScript corpus). Annotators did not see the "correct" target NP. We asked annotators to either (1) select the participant type instantiated by the NP (if any), (2) label the NP as unrelated to the script, or (3) link the NP to an overt antecedent in the text, in the case that the NP is actually a second mention that had been erroneously labeled as new by the worker. Option (1) provides a basis for a fine-grained estimation of first-mention DRs. Option (3), which we added when we noticed the considerable number of overlooked antecedents, serves as a correction of the results of the M-Turk experiment. Out of the 22K annotated "New" cases, 39% were identified as second mentions, 55% were linked to a participant type, and 6% were classified as really novel.

4 Referent Prediction Model

In this section, we describe the model we use to predict upcoming discourse referents (DRs).

4.1 Model

Our model should not only assign probabilities to DRs already explicitly introduced in the preceding text fragment (e.g., "bath" or "bathroom" for the cloze task in Figure 2) but also reserve some probability mass for "new" DRs, i.e., DRs activated via the script context or completely novel ones not belonging to the script. In principle, different variants of the activation mechanism must be distinguished. For many participant types, a single participant belonging to a specific semantic class is expected (referred to with the bathtub or the soap). In contrast, the "towel" participant type may activate a set of objects, elements of which can then be referred to with a towel or another towel. The "bath means" participant type may even activate a group of DRs belonging to different semantic classes (e.g., bubble bath and salts). Since it is not feasible to enumerate all potential participants, for "new" DRs we only predict their participant type ("bath means" in our example). In other words, the number of categories in our model is equal to the number of previously introduced DRs, plus the number of participant types of the script, plus 1, reserved for a new DR not corresponding to any script participant (e.g., cell phone). In what follows, we slightly abuse the terminology and refer to all these categories as discourse referents.

Unlike standard coreference models, which predict coreference chains relying on the entire document, our model is incremental; that is, when predicting a discourse referent d^{(t)} at a given position t, it can look only at the history h^{(t)} (i.e., the preceding part of the document), excluding the referring expression (RE) for the predicted DR. We also assume that past REs are correctly resolved and assigned to correct participant types (PTs). Typical NLP applications use automatic coreference resolution systems, but since we want to model human behavior, this might be inappropriate, since an automated system would underestimate human performance. This may be a strong assumption, but for the reasons explained above, we use gold-standard past REs.

We use the following log-linear model ("softmax regression"):

  p(d^{(t)} = d | h^{(t)}) = exp(w^T f(d, h^{(t)})) / Σ_{d'} exp(w^T f(d', h^{(t)})),

where f is the feature function we will discuss in the following subsection, w are model parameters, and the summation in the denominator is over the set of categories described above.
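The core scoring rule above is ordinary multiclass logistic regression over the candidate categories. A minimal sketch, with invented candidate names, feature values, and weights (the real model's features are described in Section 4.2):

```python
import math

def referent_probs(feats, w):
    """Log-linear ('softmax regression') model over candidate referents:
    p(d | h) is proportional to exp(w . f(d, h)).  `feats` maps each
    candidate DR to its feature vector f(d, h); `w` is the weight vector."""
    scores = {d: sum(wi * fi for wi, fi in zip(w, f)) for d, f in feats.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {d: math.exp(s) / z for d, s in scores.items()}

# Toy example: two previously introduced DRs plus a 'new' category,
# with two made-up features (e.g., recency and a subject indicator).
feats = {"DR_bathtub": [1.0, 0.0], "DR_bather": [0.2, 1.0], "new": [0.0, 0.0]}
w = [2.0, -0.5]
p = referent_probs(feats, w)

assert abs(sum(p.values()) - 1.0) < 1e-9   # proper distribution
assert max(p, key=p.get) == "DR_bathtub"   # highest-scoring candidate wins
```

The denominator normalizes over all categories (introduced DRs, participant types of the script, and the catch-all "new" class), so probability mass reserved for unmentioned referents trades off directly against the mentioned ones.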

Feature                  Type
Recency                  Shallow linguistic
Frequency                Shallow linguistic
Grammatical function     Shallow linguistic
Previous subject         Shallow linguistic
Previous object          Shallow linguistic
Previous RE type         Shallow linguistic
Selectional preferences  Linguistic
Participant type fit     Script
Predicate schemas        Script

Table 1: Summary of feature types.

Some of the features included in f are a function of the predicate syntactically governing the unobservable target RE (corresponding to the DR being predicted). However, in our incremental setting, the predicate is not available in the history h^{(t)} for subject NPs. In this case, we use an additional probabilistic model, which estimates the probability of the predicate v given the context h^{(t)}, and marginalize out its predictions:

  p(d^{(t)} = d | h^{(t)}) = Σ_v p(v | h^{(t)}) · exp(w^T f(d, h^{(t)}, v)) / Σ_{d'} exp(w^T f(d', h^{(t)}, v))

The predicate probabilities p(v | h^{(t)}) are computed based on the sequence of preceding predicates (i.e., ignoring any other words) using a recurrent neural network language model estimated on our training set.[2] The expression f(d, h^{(t)}, v) denotes the feature function computed for the referent d, given the history composed of h^{(t)} and the predicate v.

[2] We used the RNNLM toolkit (Mikolov et al., 2011; Mikolov et al., 2010) with default settings.

4.2 Features

Our features encode properties of a DR as well as characterizing its compatibility with the context. We face two challenges when designing our features. First, although the sizes of our datasets are respectable from the script annotation perspective, they are too small to learn a richly parameterized model. For many of our features, we address this challenge by using external word embeddings[3] and associating parameters with some simple similarity measures computed using these embeddings. Consequently, there are only a few dozen parameters which need to be estimated from scenario-specific data. Second, in order to test our hypothesis that script information is beneficial for the DR prediction task, we need to disentangle the influence of script information from general linguistic knowledge. We address this by carefully splitting the features apart, even if it prevents us from modeling some interplay between the sources of information. We describe both classes of features below; see also the summary in Table 1.

[3] We use 300-dimensional word embeddings estimated on Wikipedia with the skip-gram model of Mikolov et al. (2013): https://code.google.com/p/word2vec/

4.2.1 Shallow Linguistic Features

These features are based on Tily and Piantadosi (2009). In addition, we consider a selectional preference feature.

Recency feature. This feature captures the distance l_t(d) between the position t and the last occurrence of the candidate DR d. As a distance measure, we use the number of sentences from the last mention and exponentiate this number to make the dependence more extreme; only very recent DRs will receive a noticeable weight: exp(−l_t(d)). This feature is set to 0 for new DRs.

Frequency. The frequency feature indicates the number of times the candidate discourse referent d has been mentioned so far. We do not perform any bucketing.

Grammatical function. This feature encodes the dependency relation assigned to the head word of the last mention of the DR, or a special "none" label if the DR is new.

Previous subject indicator. This binary feature indicates whether the candidate DR d is coreferential with the subject of the previous verbal predicate.

Previous object indicator. The same, but for the object position.

Previous RE type. This three-valued feature indicates whether the previous mention of the candidate DR d is a pronoun, a non-pronominal noun phrase, or has never been observed before.

4.2.2 Selectional Preferences Feature

The selectional preference feature captures how well the candidate DR d fits a given syntactic position r of a given verbal predicate v. It is computed as the cosine similarity sim_cos(x_d, x_{v,r}) of a vector-space representation of the DR, x_d, and a structured vector-space representation of the predicate, x_{v,r}. The similarities are calculated using a Distributional Memory approach similar to that of Baroni and Lenci (2010). Their structured vector-space representation has been shown to work well on tasks that evaluate correlation with human thematic fit estimates (Baroni and Lenci, 2010; Baroni et al., 2014; Sayeed et al., 2016) and is thus suited to our task.

The representation x_d is computed as an average of head word representations of all the previous mentions of DR d, where the word vectors are obtained from the TypeDM model of Baroni and Lenci (2010). This is a count-based, third-order co-occurrence tensor whose indices are a word w_0, a second word w_1, and a complex syntactic relation r, which is used as a stand-in for a semantic link. The values for each (w_0, r, w_1) cell of the tensor are the local mutual information (LMI) estimates obtained from a dependency-parsed combination of large corpora (ukWaC, BNC, and Wikipedia).

Our procedure has some differences from that of Baroni and Lenci. For example, for estimating the fit of an alternative new DR (in other words, x_d based on no previous mentions), we use an average over head words of all REs in the training set, a "null referent". x_{v,r} is calculated as the average of the top 20 (by LMI) r-fillers for v in TypeDM; in other words, the prototypical instrument of rub may be represented by summing vectors like towel, soap, eraser, coin, ... If the predicate has not yet been encountered (as for subject positions), scores for all scenario-relevant verbs are emitted for marginalization.

4.2.3 Script Features

In this section, we describe features which rely on script information. Our goal is to show that such common-sense information is beneficial in performing DR prediction. We consider only two script features.

Participant type fit. This feature characterizes how well the participant type (PT) of the candidate DR d fits a specific syntactic role r of the governing predicate v; it can be regarded as a generalization of the selectional preference feature to participant types, and also as its specialisation to the considered scenario. Given the candidate DR d, its participant type p, and the syntactic relation r, we collect all the predicates in the training set which have the participant type p in the position r. The embedding x_{p,r} of the DR is given by the average embedding of these predicates. The feature is computed as the dot product of x_{p,r} and the word embedding of the predicate v.

Figure 4: An example of the referent cloze task. As in the Mechanical Turk experiment (Figure 2), our referent prediction model is asked to guess the upcoming DR:

  (I)[1] decided to take a (bath)[2] yesterday afternoon after working out. (I)[1] was getting ready to go out and needed to get cleaned before (I)[1] went, so (I)[1] decided to take a (bath)[2]. (I)[1] filled the (bathtub)[3] with warm (water)[4] and added some (bubble bath)[5]. (I)[1] got undressed and stepped into the (water)[4]. (I)[1] grabbed the (soap)[5] and rubbed it on (my)[1] (body)[7] and rinsed XXXXXX

Model       Feature types                                              Features
Base        Shallow linguistic features                                Recency, Frequency, Grammatical function, Previous subject, Previous object
Linguistic  Shallow linguistic features + linguistic feature           Base features + Selectional preferences
Script      Shallow linguistic + linguistic features + script features Linguistic features + Participant type fit, Predicate schemas

Table 2: Summary of model features.

Predicate schemas. The following feature captures a specific aspect of knowledge about prototypical sequences of events. This knowledge is called predicate schemas in the recent coreference modeling work of Peng et al. (2015). In predicate schemas, the goal is to model pairs of events such that if a DR d participated in the first event (in a specific role), it is likely to participate in the second event (again, in a specific role). For example, in the restaurant scenario, if one observes the phrase John ordered, one is likely to see John waited somewhere later in the document. Specific arguments are not that important (whether it is John or some other DR); what is important is that the argument is reused across the predicates. This would correspond to the rule X-subject-of-order → X-subject-of-eat.[4] Unlike the previous work, our dataset is small, so we cannot induce these rules directly, as there would be very few rules and the model would not generalize to new data well enough. Instead, we again encode this intuition using similarities in the real-valued embedding space.

[4] In this work, we limit ourselves to rules where the syntactic function is the same on both sides of the rule. In other words, we can, in principle, encode the pattern X pushed Y → X apologized, but not the pattern X pushed Y → Y cried.

Recall that our goal is to compute a feature φ(d, h^{(t)}) indicating how likely a potential DR d is to follow, given the history h^{(t)}. For example, imagine that the model is asked to predict the DR marked by XXXXXX in Figure 4. Predicate-schema rules can only yield previously introduced DRs, so the score φ(d, h^{(t)}) = 0 for any new DR d. Let us use "soap" as an example of a previously introduced DR and see how the feature is computed. In order to choose which inference rules can be applied to yield "soap", we can inspect Figure 4. There are only two preceding predicates which have the DR "soap" as their object (rubbed and grabbed), resulting in two potential rules, X-object-of-grabbed → X-object-of-rinsed and X-object-of-rubbed → X-object-of-rinsed. We define the score φ(d, h^{(t)}) as the average of the rule scores. More formally, we can write

  φ(d, h^{(t)}) = (1 / |N(d, h^{(t)})|) Σ_{(u,v,r) ∈ N(d, h^{(t)})} ψ(u, v, r),    (1)

where ψ(u, v, r) is the score for a rule X-r-of-u → X-r-of-v, N(d, h^{(t)}) is the set of applicable rules, and |N(d, h^{(t)})| denotes its cardinality.[5] We define φ(d, h^{(t)}) as 0 when the set of applicable rules is empty (i.e., |N(d, h^{(t)})| = 0).

[5] In all our experiments, rather than considering all potential predicates in the history to instantiate rules, we take into account only the 2 preceding verbs. In other words, u and v can be interleaved by at most one verb, and |N(d, h^{(t)})| is in {0, 1, 2}.

We define the scoring function ψ(u, v, r) as a linear function of a joint embedding x_{u,v} of the verbs u and v: ψ(u, v, r) = α_r^T x_{u,v}. The two remaining questions are (1) how to define the joint embeddings x_{u,v}, and (2) how to estimate the parameter vector α_r. The joint embedding of two predicates, x_{u,v}, can, in principle, be any composition function of the embeddings of u and v, for example their sum or component-wise product. Inspired by Bordes et al. (2013), we use the difference between the word embeddings:

  ψ(u, v, r) = α_r^T (x_u − x_v),

where x_u and x_v are external embeddings of the corresponding verbs. Encoding the succession relation as translation in the embedding space has one desirable property: the scoring function will be largely agnostic to the morphological form of the predicates. For example, the difference between the embeddings of rinsed and rubbed is very similar to that of rinse and rub (Botha and Blunsom, 2014), so the corresponding rules will receive similar scores. Now, we can rewrite equation (1) as

  φ(d, h^{(t)}) = α_{r(h^{(t)})}^T · ( Σ_{(u,v,r) ∈ N(d, h^{(t)})} (x_u − x_v) ) / |N(d, h^{(t)})|,    (2)

where r(h^{(t)}) denotes the syntactic function corresponding to the DR being predicted (object in our example).

As for the parameter vector α_r, there are again a number of potential ways it can be estimated. For example, one could train a discriminative classifier to estimate the parameters. However, we opted for a simpler approach: we set it equal to the empirical estimate of the expected feature vector x_{u,v} on the training set:[6]

  α_r = (1 / D_r) Σ_{l,t} δ_r(r(h^{(l,t)})) Σ_{(u,v,r') ∈ N(d^{(l,t)}, h^{(l,t)})} (x_u − x_v),    (3)

where l refers to a document in the training set, t is (as before) a position in the document, and h^{(l,t)} and d^{(l,t)} are the history and the correct DR for this position, respectively.

[6] This essentially corresponds to using the Naive Bayes model with the simplistic assumption that the score differences are normally distributed with spherical covariance matrices.
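The rule score ψ and the empirical estimate of α_r in equation (3) can be sketched as follows, with random vectors standing in for the external word embeddings (all names, values, and the 50-dimensional size are our own illustrative choices):

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rule_score(alpha_r, x_u, x_v):
    """psi(u, v, r) = alpha_r . (x_u - x_v): the score of the rule
    X-r-of-u -> X-r-of-v, encoded as a translation in embedding space."""
    return dot(alpha_r, [ui - vi for ui, vi in zip(x_u, x_v)])

def estimate_alpha(pairs):
    """Set alpha_r to the empirical mean of (x_u - x_v) over all rule
    instances observed for syntactic function r, as in equation (3)."""
    dim = len(pairs[0][0])
    sums = [0.0] * dim
    for x_u, x_v in pairs:
        for i in range(dim):
            sums[i] += x_u[i] - x_v[i]
    return [s / len(pairs) for s in sums]

rng = random.Random(0)
x_scrubbed = [rng.gauss(0, 1) for _ in range(50)]
x_rinsing = [rng.gauss(0, 1) for _ in range(50)]

# Training observed one rule instance: X-object-of-scrubbed -> X-object-of-rinsing
alpha_obj = estimate_alpha([(x_scrubbed, x_rinsing)])

# A near-synonymous test pair (standing in for rubbed -> rinsed) produces a
# difference vector aligned with alpha_obj, so its rule score is high:
x_rubbed = [xi + rng.gauss(0, 0.1) for xi in x_scrubbed]
x_rinsed = [xi + rng.gauss(0, 0.1) for xi in x_rinsing]
assert rule_score(alpha_obj, x_rubbed, x_rinsed) > 0
```

Because the succession relation is a translation, morphological variants of the same verbs yield nearly the same difference vector and hence nearly the same score, which is the property the text argues for.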

Scenario                        Human Model     Script Model    Linguistic Model   Tily Model
                                Acc.    Perp.   Acc.    Perp.   Acc.    Perp.      Acc.    Perp.
Grocery Shopping                74.80   2.13    68.17   3.16    53.85   6.54       32.89   24.48
Repairing a flat bicycle tyre   78.34   2.72    62.09   3.89    51.26   6.38       29.24   19.08
Riding a public bus             72.19   2.28    64.57   3.67    52.65   6.34       32.78   23.39
Getting a haircut               71.06   2.45    58.82   3.79    42.82   7.11       28.70   15.40
Planting a tree                 71.86   2.46    59.32   4.25    47.80   7.31       28.14   24.28
Borrowing book from library     77.49   1.93    64.07   3.55    43.29   8.40       33.33   20.26
Taking bath                     81.29   1.84    67.42   3.14    61.29   4.33       43.23   16.33
Going on a train                70.79   2.39    58.73   4.20    47.62   7.68       30.16   35.11
Baking a cake                   76.43   2.16    61.79   5.11    46.40   9.16       24.07   23.67
Flying in an airplane           62.04   3.08    61.31   4.01    48.18   7.27       30.90   30.18
Average                         73.63   2.34    62.63   3.88    49.52   7.05       31.34   23.22

Table 3: Accuracies (in %) and perplexities for the different models and scenarios. The Script model substantially outperforms the Linguistic and Base models (p < 0.001, significance tested with McNemar's test (Everitt, 1992)). As expected, the human prediction model outperforms the Script model (p < 0.001, significance tested with McNemar's test).

Model                                      Accuracy   Perplexity
Linguistic Model                           49.52      7.05
Linguistic Model + Predicate Schemas       55.44      5.88
Linguistic Model + Participant type fit    58.88      4.29
Full Script Model (both features)          62.63      3.88

Table 4: Accuracies from the ablation experiments.

d(l, t) are the history and the correct DR for this position, respectively. The term δ_r(r′) is the Kronecker delta, which equals 1 if r = r′ and 0 otherwise. D_r is the total number of rules for the syntactic function r in the training set:

D_r = Σ_{l,t} δ_r(r(h(l, t))) × |N(d(l, t), h(l, t))|.

Let us illustrate the computation with an example. Imagine that our training set consists of the document in Figure 1, and that the trained model is used to predict the upcoming DR in our referent cloze example (Figure 4). The training document includes the pair X-object-of-scrubbed → X-object-of-rinsing, so the corresponding term (x_scrubbed − x_rinsing) participates in the summation (3) for α_obj. As we rely on external embeddings, which encode semantic similarities between lexical items, the dot product of this term and (x_rubbed − x_rinsed) will be high.⁷ Consequently, φ(d, h(t)) is expected to be positive for d = "soap", thus predicting "soap" as the likely forthcoming DR.

Unfortunately, there are other terms (x_u − x_v) both in expression (3) for α_obj and in expression (2) for φ(d, h(t)). These terms may be irrelevant to the current prediction, as X-object-of-plugged → X-object-of-filling from Figure 1, and may not even encode any valid regularities, as X-object-of-got → X-object-of-scrubbed (again from Figure 1). This may suggest that our feature will be too contaminated with noise to be informative for making predictions. However, recall that independent random vectors in high dimensions are almost orthogonal and, assuming they are bounded, their dot products are close to zero. Consequently, the products of the relevant ("non-random") terms, in our example (x_scrubbed − x_rinsing) and (x_rubbed − x_rinsed), are likely to overcome the ("random") noise. As we will see in the ablation studies, the predicate-schema feature is indeed predictive of a DR and contributes to the performance of the full model.

⁷ The score would have been even higher had the predicate been in the morphological form rinsing rather than rinsed. However, the embeddings of rinsing and rinsed would still be sufficiently close to each other for our argument to hold.

4.3 Experiments

We would like to test whether our model can produce accurate predictions and whether the model's guesses correlate well with human predictions for the referent cloze task. In order to evaluate the effect of script knowledge on referent predictability, we compare three models: our full Script model uses all of the features introduced in Section 4.2; the Linguistic model relies only on the 'linguistic features' but not the script-specific ones; and the Base model includes all the shallow linguistic features. The Base model differs from the Linguistic model in that it does not model selectional preferences. Table 2 summarizes the features used in the different models.

The dataset was randomly divided into training (70%), development (10%, 91 stories from 10 scenarios), and test (20%, 182 stories from 10 scenarios) sets. The feature weights were learned using L-BFGS (Byrd et al., 1995) to optimize the log-likelihood.

Evaluation against original referents. We calculated the percentage of correct DR predictions; see Table 3 for the averages across the 10 scenarios. We can see that the task appears hard for humans: their average performance reaches only 73% accuracy. As expected, the Base model is the weakest system (an accuracy of 31%). Modeling selectional preferences yields an extra 18% in accuracy (Linguistic model). The key finding is that the incorporation of script knowledge increases the accuracy by a further 13%, although still far behind human performance (62% vs. 73%). Besides accuracy, we use perplexity, which we computed not only for all our models but also for human predictions. This was possible because each task was solved by multiple humans; we used unsmoothed normalized guess frequencies as the probabilities. As we can see from Table 3, the perplexity scores are consistent with the accuracies: the Script model again outperforms the other methods, and, as expected, all the models are weaker than humans.

As we used two sets of script features, capturing different aspects of script knowledge, we performed extra ablation studies (Table 4). The experiments confirm that both feature sets were beneficial.

Evaluation against human expectations. In the previous subsection, we demonstrated that the incorporation of selectional preferences and, perhaps more interestingly, the integration of automatically acquired script knowledge lead to improved accuracy in predicting discourse referents. Now we turn to another question raised in the introduction: does the incorporation of this knowledge make our predictions more human-like? In other words, are we able to accurately estimate human expectations? This includes not only being sufficiently accurate but also making the same kind of incorrect predictions. In this evaluation, we therefore use the human guesses collected during the referent cloze task as our target and calculate the relative accuracy of each computational model. As can be seen in Figure 5, the Script model, at approx. 53% accuracy, is a lot more accurate in predicting human guesses than the Linguistic model and the Base model.
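The near-orthogonality argument above (that "relevant" difference-vector pairs dominate the sum while independent "noise" terms contribute dot products near zero) is easy to check numerically. Below is a minimal sketch; the dimensionality, vector counts, and perturbation scale are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 300  # typical word-embedding dimensionality (illustrative)

def unit(v):
    """Normalize vectors to unit length so dot products are comparable."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# A "relevant" pair: two difference vectors sharing a direction by construction,
# standing in for (x_scrubbed - x_rinsing) and (x_rubbed - x_rinsed).
base = rng.normal(size=dim)
relevant_a = unit(base + 0.1 * rng.normal(size=dim))
relevant_b = unit(base + 0.1 * rng.normal(size=dim))

# "Noise" terms: 1000 independent random difference vectors.
noise = unit(rng.normal(size=(1000, dim)))

signal = relevant_a @ relevant_b   # large: the two vectors are nearly parallel
noise_dots = noise @ relevant_b    # each near zero: near-orthogonality
print(signal, np.abs(noise_dots).mean())
```

With settings like these the signal dot product comes out close to 1, while the absolute noise dot products average roughly 1/√dim, which is the sense in which the relevant terms can overcome the accumulated random noise.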
We can also observe that the margin between the Script model and the Linguistic model is a lot larger in this evaluation than the margin between the Base model and the Linguistic model. This indicates that the model with access to script knowledge is much more similar to human prediction behavior in terms of top guesses than the script-agnostic models.

Figure 5: Average relative accuracies of the different models w.r.t. human predictions (Script: 52.9%, Linguistic: 38.4%, Base: 34.52%).

Figure 6: Average Jensen-Shannon divergence between human predictions and the models (Script: 0.50, Linguistic: 0.57, Base: 0.66).

Now we would like to assess whether our predictions are similar as distributions, rather than only yielding similar top predictions. In order to compare the distributions, we use the Jensen-Shannon divergence (JSD), a symmetrized version of the Kullback-Leibler divergence. Intuitively, JSD measures the distance between two probability distributions; a smaller JSD value indicates more similar distributions. Figure 6 shows that the probability distributions resulting from the Script model are more similar to human predictions than those of the Linguistic and Base models.

In these experiments, we have shown that script knowledge improves predictions of upcoming referents and that the Script model is the best among our models at approximating human referent predictions.
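For concreteness, the JSD comparison above can be implemented in a few lines. This is a sketch that assumes the human guesses and a model's output have already been normalized into probability distributions over the same candidate referents; the example distributions here are invented, not taken from the data:

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5*KL(p || m) + 0.5*KL(q || m), with m = (p + q) / 2.

    Base-2 logs, so the value lies in [0, 1]; 0 means identical distributions.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # convention: 0 * log(0 / x) = 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical distributions over three candidate referents.
human = [0.6, 0.3, 0.1]
model = [0.5, 0.3, 0.2]
print(jensen_shannon_divergence(human, model))
```

The symmetry of the measure (unlike plain KL divergence) is what makes it suitable for comparing model and human distributions without privileging either side.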
5 Referring Expression Type Prediction Model (RE Model)

Using the referent prediction models, we next attempt to replicate Tily and Piantadosi's finding that the choice of the type of referring expression (pronoun or full NP) depends in part on the predictability of the referent.

5.1 Uniform Information Density Hypothesis

The uniform information density (UID) hypothesis suggests that speakers tend to convey information at a uniform rate (Jaeger, 2010). Applied to the choice of referring expression type, it predicts that a highly predictable referent should be encoded using a short code (here: a pronoun), while an unpredictable referent should be encoded using a longer form (here: a full NP). Information density is measured using the information-theoretic measure of surprisal S of a message m_i:

S(m_i) = −log P(m_i | context)

UID has been very successful in explaining a variety of linguistic phenomena; see Jaeger et al. (2016). There is, however, controversy about whether UID affects pronominalization. Tily and Piantadosi (2009) report evidence that writers are more likely to refer using a pronoun or proper name when the referent is easy to guess, and use a full NP when readers have less certainty about the upcoming referent; see also Arnold (2001). But other experiments (using highly controlled stimuli) have failed to find an effect of predictability on pronominalization (Stevenson et al., 1994; Fukumura and van Gompel, 2010; Rohde and Kehler, 2014). The present study hence contributes to the debate on whether UID affects referring expression choice.

5.2 A Model of Referring Expression Choice

Our goal is to determine whether referent predictability (quantified in terms of surprisal) is correlated with the type of referring expression used in the text. Here we focus on the distinction between pronouns and full noun phrases. Our data also contains a small percentage (ca. 1%) of proper names (like "John"). Due to this small class size and earlier findings that proper nouns behave much like pronouns (Tily and Piantadosi, 2009), we combined pronouns and proper names into a single class of short encodings.

For the referring expression type prediction task, we estimate the surprisal of the referent from each of our computational models from Section 4, as well as from the human cloze task. The surprisal of an upcoming discourse referent d(t) given the previous context h(t) is thereby estimated as:

S(d(t)) = −log p(d(t) | h(t))

In order to determine whether referent predictability has an effect on referring expression type over and above other factors that are known to affect the choice of referring expression, we train a logistic regression model with referring expression type as the response variable, and with discourse referent predictability as well as a large set of other linguistic factors (based on Tily and Piantadosi, 2009) as explanatory variables. The model is defined as follows:

p(n(t) = n | d(t), h(t)) = exp(v^T g(n, d(t), h(t))) / Σ_{n′} exp(v^T g(n′, d(t), h(t))),

where d(t) and h(t) are defined as before, g is the feature function, and v is the vector of model parameters. The summation in the denominator is over NP types (full NP vs. pronoun/proper noun).

5.3 RE Model Experiments

We ran four different logistic regression models. These models all contained exactly the same set of linguistic predictors but differed in the estimates used for referent type surprisal and residual entropy. One logistic regression model used surprisal estimates based on the human referent cloze task, while the three other models used estimates based on the three computational models (Base, Linguistic, and Script).

For our experiment, we are interested in the choice of referring expression type for those occurrences of references where a "real choice" is possible. We therefore exclude from the analysis reported below all first mentions as well as all first and second person pronouns (because there is no optionality in how to refer to first or second person). This subset contains 1345 data points.

5.4 Results

The results of all four logistic regression models are shown in Table 5. We first take a look at the results for the linguistic features. While there is a bit of variability in the exact coefficient estimates between the models (this is simply due to small correlations between these predictors and the predictors for surprisal), the effect of all of these features is largely consistent across models. For instance, the positive coefficients for the recency feature mean that when a previous mention happened very recently, the referring expression is more likely to be a pronoun (and not a full NP).
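The regression setup of Sections 5.2 and 5.3 can be sketched end-to-end on synthetic data. This is an illustration of the method, not the authors' analysis pipeline: the two predictors stand in for features from Table 5, the response is generated so that only recency has a true effect (mirroring the null result for surprisal), and the fitter is a plain Newton-Raphson maximum-likelihood routine:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1345  # size of the "real choice" subset analysed in Section 5.3

# Synthetic stand-ins for two of the Table 5 predictors.
recency = rng.normal(size=n)
surprisal = -np.log(rng.uniform(0.01, 1.0, size=n))  # S(d(t)) = -log p(d(t) | h(t))

# Synthetic response: 1 = pronoun/proper name, 0 = full NP. Only recency
# carries a true effect here, so the fitted surprisal coefficient should be ~0.
true_logits = -1.0 + 1.3 * recency
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logits)))

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson updates."""
    X = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        gradient = X.T @ (y - p)                   # score vector
        hessian = (X * (p * (1 - p))[:, None]).T @ X
        w += np.linalg.solve(hessian, gradient)
    return w  # [intercept, recency coefficient, surprisal coefficient]

coefs = fit_logistic(np.column_stack([recency, surprisal]), y)
print(coefs)
```

On data generated this way, the fitted recency coefficient recovers a clearly positive value while the surprisal coefficient stays near zero, which is the pattern the regression in Table 5 is designed to detect (or, for surprisal, fails to detect).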
Estimate
Predictor        Human    Script   Linguistic  Base
(Intercept)      -3.4     -3.418   -3.245      -3.061
recency           1.322    1.322    1.324       1.322
frequency         0.097    0.103    0.112       0.114
pastObj           0.407    0.396    0.423       0.395
pastSubj         -0.967   -0.973   -0.909      -0.926
pastExpPronoun    1.603    1.619    1.616       1.602
depTypeSubj       2.939    2.942    2.656       2.417
depTypeObj        1.199    1.227    0.977       0.705
surprisal        -0.04    -0.006    0.002      -0.131
residualEntropy  -0.009    0.023   -0.141      -0.128

Std. Error
Predictor        Human    Script   Linguistic  Base
(Intercept)       0.244    0.279    0.321       0.791
recency           0.095    0.095    0.096       0.097
frequency         0.098    0.097    0.098       0.102
pastObj           0.293    0.294    0.295       0.3
pastSubj          0.559    0.564    0.562       0.565
pastExpPronoun    0.21     0.207    0.208       0.245
depTypeSubj       0.299    0.347    0.429       1.113
depTypeObj        0.248    0.306    0.389       1.109
surprisal         0.099    0.097    0.117       0.387
residualEntropy   0.088    0.128    0.168       0.258

Pr(>|z|)
Predictor        Human         Script        Linguistic    Base
(Intercept)      <2e-16 ***    <2e-16 ***    <2e-16 ***    0.00011 ***
recency          <2e-16 ***    <2e-16 ***    <2e-16 ***    <2e-16 ***
frequency         0.317         0.289         0.251         0.262
pastObj           0.165         0.178         0.151         0.189
pastSubj          0.0838 .      0.0846 .      0.106         0.101
pastExpPronoun    2.19e-14 ***  5.48e-15 ***  7.59e-15 ***  6.11e-11 ***
depTypeSubj      <2e-16 ***    <2e-16 ***    5.68e-10 ***   0.02994 *
depTypeObj        1.35e-06 ***  6.05e-05 ***  0.0119 *      0.525
surprisal         0.684         0.951         0.988         0.735
residualEntropy   0.916         0.859         0.401         0.619

Table 5: Coefficients obtained from the regression analysis for the different models. Two NP types were considered: full NP and pronoun/proper noun, with full NP as the base class. Significance codes: '***' < 0.001, '**' < 0.01, '*' < 0.05, and '.' < 0.1.

The coefficients for the surprisal estimates of the different models are, however, not significantly different from zero. Model comparison shows that they do not improve model fit. We also used the estimated models to predict referring expression type on new data and again found that surprisal estimates from the models did not improve prediction accuracy. This even holds for our human cloze data; hence, the result cannot be interpreted as a problem with the models: even human predictability estimates are, for this dataset, not predictive of referring expression type.

We also calculated regression models for the full dataset, including first and second person pronouns as well as first mentions (3346 data points). The results for the full dataset are fully consistent with the findings shown in Table 5: there was no significant effect of surprisal on referring expression type.

This result contrasts with the findings of Tily and Piantadosi (2009), who reported a significant effect of surprisal on RE type for their data. In order to replicate their settings as closely as possible, we also included residualEntropy as a predictor in our model (see the last predictor in Table 5); however, this did not change the results.

6 Discussion and Future Work

Our study on incrementally predicting discourse referents showed that script knowledge is a highly important factor in determining human discourse expectations. Crucially, the computational modelling approach allowed us to tease apart the different factors that affect human prediction, as we cannot manipulate this in humans directly (by asking them to "switch off" their common-sense knowledge). By modelling common-sense knowledge in terms of event sequences and event participants, our model captures many more long-range dependencies than normal language models. The script knowledge is automatically induced by our model from crowd-sourced scenario-specific text collections.

In a second study, we set out to test the hypothesis that uniform information density affects referring expression type. This question is highly controversial in the literature: while Tily and Piantadosi (2009) find a significant effect of surprisal on referring expression type in a corpus study very similar to ours, other studies that use a more tightly controlled experimental approach have not found an effect of predictability on RE type (Stevenson et al., 1994; Fukumura and van Gompel, 2010; Rohde and Kehler, 2014). The present study, while replicating exactly the setting of T&P in terms of features and analysis, did not find support for a UID effect on RE type. The difference between T&P's 2009 results and ours could be due to the different corpora and text sorts that were used; specifically, we would expect that larger predictability effects might be observable at script boundaries, rather than within a script, as is the case in our stories.

A next step in moving our participant prediction model towards NLP applications would be to replicate our modelling results with automatic text-to-script mapping instead of the gold-standard data used here (in order to approximate the human level of processing). Furthermore, we aim to move to more complex text types that include reference to several scripts. We plan to consider the recently published ROCStories corpus (Mostafazadeh et al., 2016), a large crowdsourced collection of topically unrestricted short and simple narratives, as a basis for these next steps in our research.

Acknowledgments

We thank the editors and the anonymous reviewers for their insightful suggestions. We would like to thank Florian Pusse for helping with the Amazon Mechanical Turk experiment. We would also like to thank Simon Ostermann and Tatjana Anikina for helping with the InScript corpus. This research was partially supported by the German Research Foundation (DFG) as part of SFB 1102 'Information Density and Linguistic Encoding', the European Research Council (ERC) as part of ERC Starting Grant BroadSem (#678254), the Dutch National Science Foundation as part of NWO VIDI 639.022.518, and the DFG once again as part of the MMCI Cluster of Excellence (EXC 284).

References

Simon Ahrendt and Vera Demberg. 2016. Improving event prediction by representing script participants. In Proceedings of NAACL-HLT.

Jennifer E. Arnold. 2001. The effect of thematic roles on pronoun use and frequency of reference continuation. Discourse Processes, 31(2):137–162.

Marco Baroni and Alessandro Lenci. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of ACL.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of NIPS.

Jan A. Botha and Phil Blunsom. 2014. Compositional morphology for word representations and language modelling. In Proceedings of ICML.

Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208.

Nathanael Chambers and Daniel Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL.

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of ACL.

Brian S. Everitt. 1992. The Analysis of Contingency Tables. CRC Press.

Lea Frermann, Ivan Titov, and Manfred Pinkal. 2014. A hierarchical Bayesian model for unsupervised induction of script knowledge. In Proceedings of EACL.

Kumiko Fukumura and Roger P. G. van Gompel. 2010. Choosing anaphoric expressions: Do people take into account likelihood of reference? Journal of Memory and Language, 62(1):52–66.

T. Florian Jaeger, Esteban Buz, Eva M. Fernandez, and Helen S. Cairns. 2016. Signal reduction and linguistic encoding. In Handbook of Psycholinguistics. Wiley-Blackwell.

T. Florian Jaeger. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1):23–62.

Bram Jans, Steven Bethard, Ivan Vulić, and Marie-Francine Moens. 2012. Skip n-grams and ranking functions for predicting script events. In Proceedings of EACL.

Gina R. Kuperberg and T. Florian Jaeger. 2016. What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience, 31(1):32–59.

Gina R. Kuperberg. 2016. Separate streams or probabilistic inference? What the N400 can tell us about the comprehension of events. Language, Cognition and Neuroscience, 31(5):602–616.

Marta Kutas, Katherine A. DeLong, and Nathaniel J. Smith. 2011. A look around at what lies ahead: Prediction and predictability in language processing. In Predictions in the Brain: Using Our Past to Generate a Future.

Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of Interspeech.

Tomas Mikolov, Stefan Kombrink, Anoop Deoras, Lukas Burget, and Jan Cernocký. 2011. RNNLM - recurrent neural network language modeling toolkit. In Proceedings of the 2011 ASRU Workshop.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS.

Ashutosh Modi and Ivan Titov. 2014. Inducing neural models of script knowledge. In Proceedings of CoNLL.

Ashutosh Modi, Tatjana Anikina, Simon Ostermann, and Manfred Pinkal. 2016. InScript: Narrative texts annotated with script information. In Proceedings of LREC.

Ashutosh Modi. 2016. Event embeddings for semantic script modeling. In Proceedings of CoNLL.

Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. 2016. A corpus and cloze evaluation for deeper understanding of commonsense stories. In Proceedings of NAACL.

Haoruo Peng, Daniel Khashabi, and Dan Roth. 2015. Solving hard coreference problems. In Proceedings of NAACL.

Karl Pichotta and Raymond J. Mooney. 2014. Statistical script learning with multi-argument events. In Proceedings of EACL.

Altaf Rahman and Vincent Ng. 2012. Resolving complex cases of definite pronouns: the Winograd schema challenge. In Proceedings of EMNLP.

Michaela Regneri, Alexander Koller, and Manfred Pinkal. 2010. Learning script knowledge with web experiments. In Proceedings of ACL.

Hannah Rohde and Andrew Kehler. 2014. Grammatical and information-structural influences on pronoun production. Language, Cognition and Neuroscience, 29(8):912–927.

Rachel Rudinger, Vera Demberg, Ashutosh Modi, Benjamin Van Durme, and Manfred Pinkal. 2015. Learning to predict script events from domain-specific text. In Proceedings of the International Conference on Lexical and Computational Semantics (*SEM 2015).

Asad Sayeed, Clayton Greenberg, and Vera Demberg. 2016. Thematic fit evaluation: an aspect of selectional preferences. In Proceedings of the Workshop on Evaluating Vector Space Representations for NLP (RepEval 2016).

Roger C. Schank and Robert P. Abelson. 1977. Scripts, Plans, Goals, and Understanding. Lawrence Erlbaum Associates, Potomac, Maryland.

Simone Schütz-Bosbach and Wolfgang Prinz. 2007. Prospective coding in event representation. Cognitive Processing, 8(2):93–102.

Rosemary J. Stevenson, Rosalind A. Crawley, and David Kleinman. 1994. Thematic roles, focus and the representation of events. Language and Cognitive Processes, 9(4):519–548.

Harry Tily and Steven Piantadosi. 2009. Refer efficiently: Use less informative expressions for more predictable meanings. In Proceedings of the Workshop on the Production of Referring Expressions: Bridging the Gap between Computational and Empirical Approaches to Reference.

Alessandra Zarcone, Marten van Schijndel, Jorrig Vogels, and Vera Demberg. 2016. Salience and attention in surprisal-based accounts of language processing. Frontiers in Psychology, 7:844.