计算语言学协会会刊, 卷. 5, PP. 379–395, 2017. 动作编辑器: Mark Steedman.

计算语言学协会会刊, 卷. 5, PP. 379–395, 2017. 动作编辑器: Mark Steedman.
提交批次: 12/2016; 修改批次: 3/2017; 已发表 11/2017.

2017 计算语言学协会. 根据 CC-BY 分发 4.0 执照.

C
(西德:13)

OrdinalCommon-senseInferenceShengZhangJohnsHopkinsUniversityzsheng2@jhu.eduRachelRudingerJohnsHopkinsUniversityrudinger@jhu.eduKevinDuhJohnsHopkinsUniversitykevinduh@cs.jhu.eduBenjaminVanDurmeJohnsHopkinsUniversityvandurme@cs.jhu.eduAbstractHumanshavethecapacitytodrawcommon-senseinferencesfromnaturallanguage:vari-ousthingsthatarelikelybutnotcertaintoholdbasedonestablisheddiscourse,andarerarelystatedexplicitly.Weproposeanevaluationofautomatedcommon-senseinferencebasedonanextensionofrecognizingtextualentail-ment:predictingordinalhumanresponsesonthesubjectivelikelihoodofaninferencehold-inginagivencontext.Wedescribeaframe-workforextractingcommon-senseknowledgefromcorpora,whichisthenusedtoconstructadatasetforthisordinalentailmenttask.Wetrainaneuralsequence-to-sequencemodelonthisdataset,whichweusetoscoreandgen-eratepossibleinferences.Further,weanno-tatesubsetsofpreviouslyestablisheddatasetsviaourordinalannotationprotocolinordertothenanalyzethedistinctionsbetweentheseandwhatwehaveconstructed.1IntroductionWeusewordstotalkabouttheworld.There-fore,tounderstandwhatwordsmean,wemusthaveapriorexplicationofhowweviewtheworld.–Hobbs(1987)ResearchersinArtiﬁcialIntelligenceand(Compu-tational)Linguisticshavelong-citedtherequire-mentofcommon-senseknowledgeinlanguageun-derstanding.1Thisknowledgeisviewedasakey1Schank(1975):Ithasbeenapparent…之内…naturallanguageunderstanding…thattheeventuallimittooursolu-tion…wouldbeourabilitytocharacterizeworldknowledge.Samboughtanewclock;TheclockrunsDavefoundanaxeinhisgarage;AcarisparkedinthegarageTomwasaccidentallyshotbyhisteammateinthearmy;TheteammatediesTwofriendswereinaheatedgameofcheckers;ApersonshootsthecheckersMyfriendsandIdecidedtogoswimmingintheocean;TheoceaniscarbonatedFigure1:Examplesofcommon-senseinferencerangingfromverylikely,likely,plausible,technicallypossible,toimpossible.componentinﬁllinginthegapsbetweenthetele-graphicstyleofnaturallanguagestatements.Weareabletoconveyconsiderableinformationinarela-tivelysparsechannel,presumablyowingtoapar-tiallysharedmodelatthestartofanydiscourse.2Common-senseinference–inferencesbasedoncommon-senseknowledge–ispossibilistic:thingseveryonemoreorlesswouldexpecttoholdinagivencontext,butwithoutthenecessarystrengthoflogicalentailment.3Becausenaturallanguagecor-poraexhibitshumanreportingbias(GordonandVanDurme,2013),systemsthatderiveknowledgeex-clusivelyfromsuchcorporamaybemoreaccuratelyconsideredmodelsoflanguage,ratherthanofthe2McCarthy(1959):aprogramhascommonsenseifitau-tomaticallydeducesforitselfasufﬁcientlywideclassofimme-diateconsequencesofanythingitistoldandwhatitalreadyknows.3ManyofthebridginginferencesofClark(1975)makeuseofcommon-senseknowledge,suchasthefollowingexampleof“Probablepart”:Iwalkedintotheroom.Thewindowslookedouttothebay.Toresolvethedeﬁnitereferencethewindows,oneneedstoknowthatroomshavewindowsisprobable.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

380

世界(Rudingeretal.,2015).Factssuchas“Aper-sonwalkingintoaroomisverylikelytobeblink-ingandbreathing”areusuallyunstatedintext,sotheirreal-worldlikelihoodsdonotaligntolanguagemodelprobabilities.4Wewouldliketohavesystemscapableofreadingasentencethatdescribesareal-worldsituationandinferringhowlikelyotherstate-mentsaboutthatsituationaretoholdtrueintherealworld,e.g.Thiscapabilityissubtlybutcruciallydistinctfromtheabilitytopredictothersentencesreportedinthesametext,asalanguagemodelmaybetrainedtodo.Wethereforeproposeamodelofknowledgeac-quisitionbasedonﬁrstderivingpossibilisticstate-mentsfromtext.Astherelativefrequencyofthesestatementssuffersthementionedreportingbias,wethenfollowupwithhumanannotationofderivedex-amples.Sinceweinitiallyareuncertainaboutthereal-worldlikelihoodofthederivedcommon-senseknowledgeholdinginanyparticularcontext,wepairitwithvariousgroundedcontextandpresenttohu-mansfortheirownassessment.Astheseexamplesvaryinassessedplausibility,weproposethetaskofordinalcommon-senseinference,whichembracesawidersetofnaturalconclusionsarisingfromlan-guagecomprehension(seeFig1).Inwhatfollows,wedescribeprioreffortsincommon-senseandtextualinference(§2).Wethenstateourpositiononhowordinalcommon-sensein-ferenceshouldbedeﬁned(§3),anddetailourownframeworkforlarge-scaleextractionandabstrac-tion,alongwithacrowdsourcingprotocolforassess-ment(§4).Thisincludesanovelneuralmodelforforwardgenerationoftextualinferencestatements.Togetherthesemethodsareappliedtocontextsde-rivedfromvariouspriortextualinferenceresources,resultingintheJHUOrdinalCommon-senseInfer-ence(JOCI)语料库,alargecollectionofdiversecommon-senseinferenceexamples,judgedtoholdwithvaryinglevelsofsubjectivelikelihood(§5).Weprovidebaselineresults(§6)forpredictionontheJOCIcorpus.54ForfurtherbackgroundseediscussionsbyVanDurme(2010),GordonandVanDurme(2013),Rudingeretal.(2015)andMisraetal.(2016).5TheJOCIcorpusisreleasedfreelyat:http://decomp.net/.2BackgroundMiningCommonSenseBuildinglargecollec-tionsofcommon-senseknowledgecanbedonemanuallyviaprofessionals(HobbsandNavarretta,1993),butatconsiderablecostintermsoftimeandexpense(磨坊主,1995;莱纳特,1995;Bakeretal.,1998;Friedlandetal.,2004).Effortshavepursuedvolunteers(辛格,2002;Havasietal.,2007)andgameswithapurpose(Chklovski,2003),butarestillleftfullyreliantonhumanlabor.Manyhavepursuedautomatingtheprocess,suchasinexpand-inglexicalhierarchies(Hearst,1992;Snowetal.,2006),constructinginferencepatterns(LinandPan-tel,2001;Berantetal.,2011),readingreferencematerials(Richardsonetal.,1998;Suchaneketal.,2007),miningsearchenginequerylogs(Pas¸caandVanDurme,2007),andmostrelevanthere:abstract-ingfrominstance-levelpredicationsdiscoveredindescriptivetexts(Schubert,2002;LiakataandPul-man,2002;Clarketal.,2003;BankoandEtzioni,2007).Inthisarticleweareconcernedwithknowl-edgeminingforpurposesofseedingatextgenera-tionprocess(constructingcommon-senseinferenceexamples).Common-senseTasksManytextualinferencetaskshavebeendesignedtorequiresomede-greeofcommon-senseknowledge,e.g.,theWino-gradSchemaChallengediscussedbyLevesqueetal.(2011).Thedataforthesetasksareeithersmaller,carefullyconstructedevaluationsetsbypro-fessionals,followingeffortsliketheFRACAStestsuite(Cooperetal.,1996),ortheyrelyoncrowd-sourcedelicitation(Bowmanetal.,2015).Crowd-sourcingisscalable,butelicitationprotocolscanleadtobiasedresponsesunlikelytocontainawiderangeofpossiblecommon-senseinferences.Hu-manscangenerallyagreeontheplausibilityofawiderangeofpossibleinferencepairs,buttheyarenotlikelytogeneratethemfromaninitialprompt.6TheconstructionofSICK(SentencesInvolvingCompositionalKnowledge)madeuseofexistingparaphrasticsentencepairs(descriptionsbydiffer-6McRaeetal.(2005):Featuressuchas或者,forexample;al-thoughlogicallypossible,donotoccurin[humanresponses][…]Althoughpeoplearecapableofverifyingthata.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

381

entpeopleofthesameimage),whichweremodi-ﬁedthroughaseriesofrule-basedtransformationsthenjudgedbyhumans(Marellietal.,2014).AswithSICK,werelyonhumansonlyforjudgingpro-videdexamples,ratherthanelicitationoftext.Un-likeSICK,ourgenerationisbasedonaprocesstar-getedspeciﬁcallyatcommonsense(see§4.1.1).PlausibilityResearchersinpsycholinguisticshaveexploredanotionofplausibilityinhumansentenceprocessing,在哪里,forinstance,argumentstopredicatesareintuitivelymoreorless“plausible”asﬁllerstodifferentthematicroles,asreﬂectedinhumanreadingtimes.Forexample,McRaeetal.(1998)lookedatmanipulationssuchas:(A)Thebosshiredbythecorporationwasper-fectforthejob.(乙)Theapplicanthiredbythecorporationwasperfectforthejob.wheretheplausibilityofabossbeingtheagent–ascomparedtopatient–ofthepredicatehiredmightbemeasuredbylookingatdelaysinreadingtimeinthewordsfollowingthepredicate.Thismeasurementisthencontrastedwiththetimingobservedinthesamepositionsin(乙).7Ratherthanmeasuringaccordingtopredictionssuchashumanreadingtimes,hereweaskanno-tatorsexplicitlytojudgeplausibilityona5-pointordinalscale(See§3).更远,oureffortmightbedescribedinthissettingasconditionalplausibil-ity,8whereplausibilityjudgmentsforagivensen-tenceareexpectedtobedependentonprecedingcontext.Furtherexplorationofconditionalplau-sibilityisaninterestingavenueofpotentialfuturework,perhapsthroughthemeasurementofhumanreadingtimeswhenusingpromptsderivedfromourordinalcommon-senseinferenceexamples.Compu-tationalmodelingof(unconditional)semanticplau-sibilityhasbeenexploredbythosesuchasPad´oetal.(2009),Erketal.(2010)andSayeedetal.(2015).TextualEntailmentAmulti-yearsourceoftex-tualinferenceexamplesweregeneratedundertheRecognizingTextualEntailment(RTE)挑战,introducedbyDaganetal.(2006):7Thisnotionofthematicplausibilityisthenrelatedtothenotionofverb-argumentselectionalpreference(Zernik,1992;Resnik,1993;ClarkandWeir,1999),andsortal(在)correctness(Thomason,1972).8Thankstotheanonymousreviewerforthisconnection.WesaythatTentailsHif,typically,ahumanreadingTwouldinferthatHismostlikelytrue.Thissomewhatinformaldeﬁnitionisbasedon(andassumes)commonhumanun-derstandingoflanguageaswellascommonbackgroundknowledge.Thisdeﬁnitionstrayedfromthemorestrictnotionofentailmentasusedbylinguisticsemanticists,suchasthoseinvolvedwithFRACAS.WhileGiampic-coloetal.(2008)extendedbinaryRTEwithan“un-known”category,theentailmentcommunityhaspri-marilyfocusedonissuessuchas“paraphrase”and“monotonicity”.AnexampleofthisistheNaturalLogicimplementationofMacCartneyandManning(2007).Languageunderstandingincontextisnotonlyun-derstandingtheentailmentsofasentence,butalsotheplausibleinferencesofthesentence,i.e.thenewposteriorontheworldafterreadingthesen-tence.Anewsentenceinadiscourseisalmostneverentailedbyanothersentenceinthediscourse,be-causesuchasentencewouldaddnonewinforma-tion.Inordertosuccessfullyprocessadiscourse,thereneedstobesomeunderstandingofwhatnewinformationcanbe,possiblyorplausibly,addedtothediscourse.Collectingsentencepairswithordi-nalentailmentconnectionsispotentiallyusefulforimprovingandtestingtheselanguageunderstandingcapabilitiesthatwouldbeneededbyalgorithmsforapplicationslikestorytelling.Garretteetal.(2011)andBeltagyetal.(2017)treatedtextualentailmentasprobabilisticlogicalin-ferenceinMarkovLogicNetworks(RichardsonandDomingos,2006).然而,thenotionofprobabil-ityintheirentailmenttaskhasasubtledistinctionfromourproblemofcommon-senseinference.Theprobabilityofbeinganentailmentgivenbyaproba-bilisticmodeltrainedforabinaryclassiﬁcation(be-inganentailmentornot)isnotnecessarilythesameasthelikelihoodofaninferencebeingtrue.Forex-ample:时间:Apersonﬂipsacoin.H:Thatﬂipcomesupheads.NohumanreadingTshouldinferthatHistrue.Amodeltrainedtomakeordinalpredictionsshouldsay:“plausible,withprobability1.0”,whereasamodeltrainedtomakebinaryentailed/not-entailed

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

382

predictionsshouldsay:“notentailed,withprobabil-ity1.0”.Thefollowingexampleexhibitsthesameproperty:时间:Ananimaleatsfood.H:Apersoneatsfood.Again,withhighconﬁdence,Hisplausible;和,withhighconﬁdence,itisalsonotentailed.Non-entailingInferenceOfthevariousnon-“entailment”textualinferencetasks,afewaremostsalienthere.Agirreetal.(2012)pilotedaTextualSimilarityevaluationwhichhasbeenreﬁnedinsub-sequentyears.Systemsproducescalarvaluescorre-spondingtopredictionsofhowsimilarthemeaningisbetweentwoprovidedsentences,e.g.,thefollow-ingpairfromSICKwasjudgedverysimilar(4.2outof5),whilealsobeingacontradiction:ThereisnobikerjumpingintheairandAlonebikerisjump-ingintheair.Theordinalapproachweadvocateforreliesonagradednotion,liketextualsimilarity.TheChoiceofPlausibleAlternative(COPA)任务(Roemmeleetal.,2011)wasareactiontoRTE,similarlymotivatedtoprobeasystem’sabilitytoun-derstandinferencesthatarenotstrictlyentailed.Asinglecontextwasprovided,withtwoalternativein-ferences,andasystemhadtojudgewhichwasmoreplausible.TheCOPAdatasetwasmanuallyelicited,andisnotlarge;wediscussthisdatafurtherin§5.TheNarrativeClozetask(ChambersandJuraf-sky,2008)requiresasystemtoscorecandidatein-ferencesastohowlikelytheyaretoappearinadocumentthatalsoincludedtheprovidedcontext.Manysuchinferencesarethennotstrictlyentailedbythecontext.Further,theClozetaskgivestheben-eﬁtofbeingabletogenerateverylargenumbersofexamplesautomaticallybysimplyoccludingpartsofexistingdocumentsandaskingasystemtopre-dictwhatismissing.TheLAMBADAdataset(Pa-pernoetal.,2016)isakintoourstrategyforauto-maticgenerationfollowedbyhumanﬁltering,butforClozeexamples.Asourconcerniswithinfer-encesthatareoftentruebutneverstatedinadoc-ument,thisapproachisnotviablehere.TheROC-Storiescorpus(Mostafazadehetal.,2016)elicitedamore“plausible”collectionofdocumentsinor-dertoretainthenarrativeClozeinthecontextofcommon-senseinference.TheROCStoriescorpuscanbeviewedasanextensionoftheideabehindtheCOPAcorpus,doneatalargerscalewithcrowd-sourcing,andwithmulti-sentencecontexts;wecon-siderthisdatasetin§5.AlongsidethenarrativeCloze,PichottaandMooney(2016)madeuseofa5-pointLikertscale(verylikelytoveryunlikely)asasecondaryevalu-ationofvariousscriptinductiontechniques.Whiletheywereconcernedwithmeasuringtheirabilitytogenerateverylikelyinferences,hereweareinter-estedingeneratingawideswathofinferencecandi-dates,includingthosethatareimpossible.3OrdinalCommon-senseInferenceOurgoalisasystemthatcanperformspeculative,common-senseinferenceaspartofunderstandinglanguage.Basedontheobservedshortfallsofpriorwork,weproposethenotionofOrdinalCommon-senseInference(OCI).OCIembracesthenotionofDaganetal.(2006),inthatweareconcernedwithhumanjudgmentsofepistemicmodality.9Asagreedbymanylinguists,modalityinnat-urallanguageisacontinuouscategory,butspeakersareabletomapareasofthisaxisintodiscretevalues(Lyons,1977;Horn,1989;deHaan,1997)–Saur´ıandPustejovsky(2009)AccordingtoHorn(1989),therearetwoscalesofepistemicmodalitywhichdifferinpolarity(posi-tivevs.negativepolarity):hcertain,likely,possibleiandhimpossible,不太可能,uncertaini.TheSquareofOpposition(SO)(Fig2)illustratesthelogicalre-lationsholdingbetweenvaluesinthetwoscales.Basedontheirlogicalrelations,wecanmakeasetofexhaustiveepistemicmodals:hverylikely,likely,可能的,impossiblei,wherehverylikely,likely,pos-sibleilieonasingle,positiveHornscale,andim-possible,acomplementaryconceptfromthecor-respondingnegativeHornscale,completestheset.Inthispaper,wefurtherreplacethevaluepossiblebythemoreﬁne-grainedvalues(technicallypossi-bleandplausible).Thisresultsina5-pointscaleoflikelihood:hverylikely,likely,plausible,techni-callypossible,impossiblei.TheOCItaskdeﬁnitiondirectlyembracessubjectivelikelihoodonsuchan9Epistemicmodality:thelikelihoodthat(someaspectof)acertainstateofaffairsis/hasbeen/willbetrue(orfalse)inthecontextofthepossibleworldunderconsideration.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

383

ordinalscale.HumansarepresentedwithacontextCandaskedwhetheraprovidedhypothesisHisverylikely,likely,plausible,technicallypossible,orim-possible.Furthermore,animportantpartofthispro-cessisthegenerationofHbyautomaticmethods,whichseekstoavoidtheelicitationbiasofmanypriorworks.AEContrariesIOSubcontrariesContradictoriescertainlikelypossibleimpossibleunlikelyuncertainPositiveNegativeFigure2:SOforepistemicmodals(Saur´ıandPuste-jovsky,2009).104FrameworkforcollectingOCIcorpusWenowdescribeourframeworkforcollectingordi-nalcommon-senseinferenceexamples.Itisnaturaltocollectthisdataintwostages.Intheﬁrststage(§4.1),weautomaticallygenerateinferencecandi-datesgivensomecontext.Weproposetwobroadapproachesusingeithergeneralworldknowledgeorneuralmethods.Inthesecondstage(§4.2),wean-notatethesecandidateswithordinallabels.4.1GenerationofCommon-senseInferenceCandidates4.1.1GenerationbasedonWorldKnowledgeOurmotivationforthisapproachwasﬁrstintro-ducedbySchubert(2002):Thereisalargelyuntappedsourceofgeneralknowledgeintexts,lyingatalevelbeneaththeexplicitassertionalcontent.Thisknowledgeconsistsofrelationshipsimpliedtobepossi-bleintheworld,或者,undercertainconditions,impliedtobenormalorcommonplaceintheworld.FollowingSchubert(2002)andVanDurmeandSchubert(2008),wedeﬁneanapproachforab-stractingoverexplicitassertionsderivedfromcor-pora,leadingtoalarge-scalecollectionofgeneralpossibilisticstatements.AsshowninFig3,this10“Contradictories”:exhaustiveandmutuallyexclusivecon-ditions.“Contraries”:non-exhaustiveandmutuallyexclusive.“Subcontraries”:exhaustiveandnon-mutuallyexclusive.approachgeneratescommon-senseinferencecan-didatesinfoursteps:(A)extractingpropositionswithpredicate-argumentstructuresfromtexts,(乙)abstractingoverpropositionstogeneratetemplatesforconcepts,(C)derivingpropertiesofconceptsviadifferentstrategies,和(d)generatingpossibilistichypothesesfromcontexts.publication.n.01person buy ____collection.n.02magazine.n.01book.n.01Noperson subscribe to ____Yesperson borrow ____ from library…YesNoYes(C) Property derivation using the decision treefeaturefeaturefeatureNo[人] borrow [书] 从 [library]person.n.01book.n.01library.n.01____ borrow book from libraryperson borrow ____ from libraryperson borrow book from ____propositional templatesabstracted proposition[约翰] borrowed [the books] 从 [the library]pred-arg structured propositionJohn borrowed the books from the library .plain text(A) 萃取(乙) AbstractionThe professor recommended [图书] for this course. 语境(d) Inference generationA person borrows the books from a library.inferenceapproximationtemplate generationextractionpropertyderivationverbalization hypothesisHypothesis generationFigure3:Generatingcommon-senseinferencesbasedongeneralworldknowledge.(A)Extractingpropositions:Firstweextractalargesetofpropositionswithpredicate-argumentstructuresfromnounphrasesandclauses,underwhichgeneralworldpresumptionsoftenlie.Toachievethisgoal,weusePredPatt11(Whiteetal.,2016;Zhangetal.,2017),whichdeﬁnesaframe-11https://github.com/hltcoe/PredPatt

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

384

workofinterpretable,language-neutralpredicate-argumentextractionpatternsfromUniversalDepen-dencies(deMarneffeetal.,2014).Fig3(A)showsanexampleextraction.WeusetheGigawordcorpus(Parkeretal.,2011)forextractingpropositionsasitisacomprehensivetextarchive.Thereexistsaversioncontainingau-tomaticallygeneratedsyntacticannotation(Ferraroetal.,2014),whichbootstrapslarge-scaleknowl-edgeextraction.WeusePyStanfordDependencies12toconvertconstituencyparsestodepedencyparses,fromwhichweextractstructuredpropositions.(乙)Abstractingpropositions:Inthisstep,weab-stractthepropositionsintoamoregeneralform.Thisinvolveslemmatization,strippinginessentialmodiﬁersandconjuncts,andreplacingspeciﬁcar-gumentswithgenerictypes.13Thismethodofab-stractionoftenyieldsgeneralpresumptionsabouttheworld.Toreducenoisefrompredicate-argumentextraction,weonlykeep1-placeand2-placepredi-catesafterabstraction.Wefurthergeneralizeindividualargumentstoconceptsbyattachingsemantic-classlabelstothem.HerewechooseWordNet(磨坊主,1995)nounsynsets14asthesemantic-classset.Whenselect-ingthecorrectsenseforanargument,weadoptafastandrelativelyaccuratemethod:alwaystakingtheﬁrstsensewhichisusuallythemostcommonlyusedsense(Suchaneketal.,2007;Pasca,2008).Bydoingso,weattach84millionabstractedproposi-tionswithsenses,covering43.7%(35,811/81,861)ofWordNetnounsenses.EachoftheseWordNetsenses,然后,isassoci-atedwithasetofabstractedpropositions.Theab-stractedpropositionsareturnedintotemplatesbyre-placingthesense’scorrespondingargumentwithaplaceholder,similartoVanDurmeetal.(2009)(seeFig3(乙)).Weremoveanytemplateassociatedwithasenseifitoccurslessthantwotimesforthatsense,12https://pypi.python.org/pypi/PyStanfordDependencies13UsingEnglishglossesofthelogicalrepresentations,ab-stractionof“along,darkcorridor”wouldyield“corridor”forexample;“asmallofﬁceattheendofalongdarkcorridor”wouldyield“ofﬁce”;and“Mrs.MacReady”wouldyield“per-son”.SeeSchubert(2002)fordetail.14Inordertoavoidtoogeneralsenses,wesetcutpointsatthedepthof4(Panteletal.,2007)totruncatethehierarchyandconsiderall81,861sensesbelowthesepoints.leaving38millionuniquetemplates.(C)DerivingpropertiesviaWordNet:Atthisstep,wewanttoassociatewitheachWordNetsenseasetofpossibleproperties.Weemploythreestrategies.TheﬁrststrategyistouseadecisiontreetopickouthighlydiscriminativepropertiesforeachWordNetsense.Speciﬁcally,foreachsetofco-hyponyms,15wetrainadecisiontreeusingtheas-sociatedtemplatesasfeatures.Forexample,inFig3(C),wetrainadecisiontreeovertheco-hyponymsofpublication.n.01.Thenthetemplate“personsubscribeto”wouldbeselectedasapropertyofmagazine.n.01,andthetemplate“personborrowfromlibrary”forbook.n.01.Thesecondstrategyselectsthemostfrequenttemplatesassoci-atedwitheachsenseaspropertiesofthatsense.ThethirdstrategyusesWordNetISArelationstoderivenewpropertiesofsenses.Forthesensebook.n.01anditshypernympublication.n.01,wegenerateaproperty“bepublication”.(d)Generatinghypotheses:AsshowninFig3(d),givenadiscoursecontext(TanenhausandSeiden-berg,1980),weﬁrstextractanargumentofthecon-text,thenselectthederivedpropertiesfortheargu-ment.Sincewedon’tassumeanyspeciﬁcsensefortheargument,thesepropertiescouldcomefromanyofitscandidatesenses.Wegeneratehypothesesbyreplacingtheplaceholderintheselectedpropertieswiththeargument,andverbalizingtheproperties.164.1.2GenerationviaNeuralMethodsInadditiontotheknowledge-basedmethodsde-scribedabove,wealsoadaptaneuralsequence-to-sequencemodel(Vinyalsetal.,2015;Bahdanauetal.,2014)togenerateinferencecandidatesgivencontexts.Themodelistrainedonsentencepairsla-beled“entailment”fromtheSNLIcorpus(Bowmanetal.,2015)(火车).这里,theSNLI“premise”istheinput(contextC),andtheSNLI“hypothesis”istheoutput(hypothesisH).Weemploytwodifferentstrategiesforforwardgenerationofinferencecandidatesgivenanycon-15Sensessharingahypernymwitheachotherarecalledco-hyponyms(e.g.,book.n.01,magazine.n.01andcollections.n.02areco-hyponymsofpublication.n.01).16Weusethepattern.enmodule(http://www.clips.ua.ac.be/pages/pattern-en)forverbalization,whichincludesdeterminingpluralityoftheargument,addingproperarticles,andconjugatingverbs.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

385

text.Thesentence-promptstrategyusestheentiresentenceinthecontextasaninput,andgeneratesoutputusinggreedydecoding.Theword-promptstrategydiffersbyusingonlyasinglewordfromthecontextasinput.Thiswordischoseninthesamefashionasthestep(d)inthegenerationbasedonworldknowledge,i.e.anargumentofthecontext.Thesecondapproachismotivatedbyourhypothesisthatprovidingonlyasinglewordcontextwillforcethemodeltogenerateahypothesisthatgeneralizesoverthemanycontextsinwhichthatwordwasseen,resultinginmorecommon-sense-likehypotheses,asinFig4.Welaterpresentthefullcontextandde-codedhypothesestocrowdsourcedannotation.dustpan;apersoniscleaning.aboyinblueandwhiteshortsissweepingwithabroomanddustpan.;ayoungmanisholdingabroom.Figure4:Examplesofsequence-to-sequencehypothesisgenerationfromsingle-wordandfull-sentenceinputs.NeuralSequence-to-SequenceModelNeuralsequence-to-sequencemodelslearntomapvariable-lengthinputsequencestovariable-lengthoutputsequences,asaconditionalprobabilityofoutputgiveninput.Forourpurposes,wewanttolearntheconditionalprobabilityofanhypothe-sissentence,H,givenacontextsentence,C,i.e.,P(H|C).Thesequence-to-sequencearchitectureconsistsoftwocomponents:anencoderandadecoder.Theencoderisarecurrentneuralnetwork(RNN)iter-atingoverinputtokens(i.e.,wordsinC),andthedecoderisanotherRNNiteratingoveroutputtokens(wordsinH).Theﬁnalstateoftheencoder,hC,ispassedtothedecoderasitsinitialstate.Weuseathree-layerstackedLSTM(statesize512)forboththeencoderanddecoderRNNcells,withindepen-dentparametersforeach.WeusetheLSTMfor-mulationofHochreiterandSchmidhuber(1997)assummarizedinVinyalsetal.(2015).ThenetworkcomputesP(H|C):磷(H|C)=len(H)Yt=1p(wt|wl D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 386 Inthecasethatthehypothesesintheinferencecandidatesdonotmakesense,orhavegrammat-icalerrors,judgescanprovideanadditionalla-bel,NA,sothatwecanﬁlterthesecandidatesinpost-processing.Thecombinationofgenerationofcommon-senseinferencecandidateswithhumanﬁl-teringseekstoavoidtheproblemofelicitationbias.5JOCICorpusWenowdescribeindepthhowwecreatedtheJHUOrdinalCommon-senseInference(JOCI)corpus.ThemainpartofthecorpusconsistsofcontextschosenfromSNLI(Bowmanetal.,2015)andROCStories(Mostafazadehetal.,2016),pairedwithhypothesesgeneratedviamethodsdescribedin§4.1.Thesepairsarethenannotatedwithordi-nallabelsusingcrowdsourcing(§4.2).Wealsoin-cludecontext-hypothesispairsdirectlytakenfromSNLIandothercorpora(e.g.,aspremise-hypothesispairs),andre-annotatethemwithordinallabels.5.1DatasourcesforContext-HypothesisPairsInordertocomparewithexistinginferencecor-pora,wechoosecontextsfromtworesources:(1)theﬁrstsentenceinthesentencepairsoftheSNLIcorpuswhicharecaptionsfromtheFlickr30kcor-pus(Youngetal.,2014),和(2)theﬁrstsentenceinthestoriesoftheROCStoriescorpus.Wethencollectcandidatesofautomaticallygen-eratedcommon-senseinferences(AGCI)againstthesecontexts.Speciﬁcally,intheSNLItrainset,thereareover150Kdifferentﬁrstsentences,involving7,414differentargumentsaccordingtopredicate-argumentextraction.Werandomlychoose4,600arguments.Foreachargument,wesampleoneﬁrstsentencethathastheargumentandcol-lectcandidatesofAGCIagainstthisascontext.WealsodothesamegenerationfortheSNLIdevelop-mentsetandtestset.WealsocollectcandidatesofAGCIagainstrandomlysampledﬁrstsentencesintheROCStoriescorpus.Collectively,thesepairsandtheirordinallabels(tobedescribedin§5.2)makeupthemainpartoftheJOCIcorpus.ThestatisticsofthissubsetareshowninTable1(ﬁrstﬁverows).Forcomprehensiveness,wealsoproducedordi-nallabelson(C,H)pairsdirectlydrawnfromex-istingcorpora.ForSNLI,werandomlyselect1000contexts(前提)fromtheSNLItrainset.Then,thecorrespondinghypothesisisoneoftheentail-ment,neutral,orcontradictionhypothesestakenfromSNLI.ForROCStories,wedeﬁnedCastheﬁrstsentenceofthestory,andHasthesecondorthirdsentence.ForCOPA,(C,H)correspondstopremise-effect.Thestatisticsareshowninthebot-tomrowsofTable1.SubsetName#pairsContextSourceHypothesisSourceAGCIagainstSNLI/ROCStories22,086SNLI-trainAGCI-WK2,456SNLI-devAGCI-WK2,362SNLI-testAGCI-WK5,002ROCStoriesAGCI-WK1,211SNLI-trainAGCI-NNSNLI993SNLI-trainSNLI-entailment988SNLI-trainSNLI-neutral995SNLI-trainSNLI-contradictionROCStories1,000ROCStories-1stROCStories-2nd1,000ROCStories-1stROCStories-3rdCOPA1,000COPA-premiseCOPA-effectTotal39,093--Table1:JOCIcorpusstatistics,whereeachsubsetconsistsofdifferentsourcesforcontext-and-hypothesispairs,eachannotatedwithcommon-senseordinalla-bels.AGCI-WKrepresentscandidatesgeneratedbasedonworldknowledge.AGCI-NNrepresentscandidatesgen-eratedvianeuralmethods.5.2CrowdsourcedOrdinalLabelAnnotationWeuseAmazonMechanicalTurktoannotatethehypotheseswithordinallabels.IneachHIT(Hu-manIntelligenceTask),aworkerispresentedwithonecontextandoneortwohypotheses,asshowninFig5.First,theannotatorseesan“InitialSen-tence”(语境),e.g.“John’sgoalwastolearnhowtodrawwell.”,andisthenaskedabouttheplausibil-ityofthehypothesis,e.g.“Apersonaccomplishesthegoal”.Inparticular,weasktheannotatorhowplausiblethehypothesisistrueduringorshortlyaf-ter,becausewithoutthisconstraint,mostsentencesaretechnicallyplausibleinsomeimaginaryworld.Ifthehypothesisdoesnotmakesense18,thework-erscanchecktheboxunderthequestionandskiptheordinalannotation.Intheannotation,about25%ofhypothesesaremarkedasnotmakingsense,andareremovedfromourdata.Withthesampledcontextsandtheauto-generated18“Notmakingsense”meansthatinferencesthatareincom-pletesentencesorgrammaticallywrong. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 387 M.LabelsContextHypothesis5[5,5,5]Johnwasexcitedtogotothefair.Thefairopens.4[4,4,3]Todaymywaterheaterbroke.Apersonlooksforaheater.3[3,3,4]John’sgoalwastolearnhowtodrawwell.Apersonaccomplishesthegoal.2[2,2,2]KellywasplayingasoccermatchforherUniversity.TheUniversityisdismantled.1[1,1,1]Abrown-hairedladydressedallinbluedenimsitsinagroupofpigeons.Peoplearemadeofthedenim.5[5,5,4]Twofemalesareplayingrugbyonaﬁeld,onewithablueuniformandonewithawhiteuniform.Twofemalesareplaysportsoutside.4[4,4,3]Agroupofpeoplehaveanoutsidecookout.Peoplearehavingconversations.3[3,3,3]Twodogsﬁghting,oneisblack,theotherbeige.Thedogsareplaying.2[2,2,3]Abareheadedmanwearingadarkbluecassock,sandals,anddarkbluesocksmountsthestonestepsleadingintoaweatheredoldbuilding.Amanisinthemiddleofhomebuilding.1[1,1,1]Askydiverhangsfromtheundercarriageofanair-planeorsomesortofairglidingdevice.Acameraisusinganobject.Table2:Examplesofcontext-and-hypothesispairswithordinaljudgmentsandMedianvalue.(Theupper5rowsaresamplesfromAGCI-WK.Thelower5rowsaresamplesfromAGCI-NN.)Initial Sentence: John ’s goal was to learn how to draw well1. The following statements is to be true during or shortly after the context of the initial sentence. A person accomplishes the goal .This statement does not make sense.1ex. The following statements is to be true during or shortly after the context of the initial sentence. The goal is a content .This statement does not make sense.Figure5:Theannotationinterface,withadrop-downlistprovidesordinallabelstoselect.hypotheses,weprepare50Kcommon-senseinfer-enceexamplesforcrowdsourcedannotationinbulk.Inordertoguaranteethequalityofannotation,wehaveeachexampleannotatedbythreeworkers.Wetakethemedianofthethreeasthegoldlabel.Table3showsthestatisticsofthecrowdsourcedefforts.#examples50,832#participatedworkers150averagecostperexample1.99¢averageworktimeperexample20.71sTable3:Statisticsofthecrowdsourcedefforts.Tomakesurenon-expertworkershaveacorrectunderstandingofourtask,beforelaunchingthelatertasksinbulk,weruntwopilotstocreateapoolofqualiﬁedworkers.Intheﬁrstpilot,wepublish100examples.Eachexampleisannotatedbyﬁvework-ers.Fromthispilot,wecollectasetof“good”ex-ampleswhichhavea100%annotationagreementamongworkers.Theordinallabelschosenbytheworkersareregardedasthegoldlabels.Inthesec-ondpilot,werandomlyselecttwo“good”(high-agreement)examplesforeachordinallabelandpub-lishanHITwiththeseexamples.Tomeasuretheworkers’agreement,wecalculatetheaverageofquadraticweightedCohen’sκscoresbetweentheworkers’annotation.Bysettingathresholdoftheaverageofκscoresto0.7,weareabletocreateapoolthathasover150qualiﬁedworkers.5.3CorpusCharacteristicsWewantacorpuswithareliableinter-annotatoragreement.Additionally,inordertoevaluateortrainacommon-senseinferencesystem,weideallyneedacorpusthatprovidesasmanyinferenceexamplesaspossibleforeveryordinallikelihoodvalue.Inthissection,weinvestigatethecharacteristicsoftheJOCIcorpus.WealsocompareJOCIwithrelatedresourcesunderourannotationprotocol.Quality:WemeasurethequalityofeachpairbycalculatingCohen’sκofworkers’annotations.TheaverageκoftheJOCIcorpusis0.54.Fig7showsthegrowthofthesizeofJOCIaswedecreasethethresholdoftheaveragedκtoﬁlterpairs.Evenifweplacearelativelystrictthreshold(>0.6),westillgetalargesubsetofJOCIwithover20Kpairs.Table2containspairsrandomlysampledfromthissubset,

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

388

impossibletech-possibleplausiblelikelyvery-likely0%10%20%30%40%50%60%70%80%90%JOCISNLI-entailmentSNLI-neutralSNLI-contradiction(A)JOCIvs.SNLIimpossibletech-possibleplausiblelikelyvery-likely0%10%20%30%40%50%60%JOCIROCStories-2ndROCStories-3rd(乙)JOCIvs.ROCStoriesimpossibletech-possibleplausiblelikelyvery-likely0%10%20%30%40%50%60%JOCICOPA-0COPA-1(C)JOCIvs.COPAFigure6:ComparisonofnormalizeddistributionsbetweenJOCIandothercorpora.qualitativelyconﬁrmingwecangenerateandcollectannotationsofpairsateachordinalcategory.0.00.20.40.60.81.00500010000150002000025000300003500040000very likelylikelyplausibletech possibleimpossibleFigure7:Datagrowthalongaveragedκscores.LabelDistribution:Webelievedatasetswithwidesupportoflabeldistributionareimportantintrainingandevaluatingsystemstorecognizeordinalscalein-ferences.Fig6ashowsthenormalizedlabeldistri-butionofJOCIvs.SNLI.Asdesired,JOCIcoversawiderangeofordinallikelihoods,withmanysam-plesineachordinalscale.NotealsohowtraditionalRTElabelsarerelatedtoordinallabels,althoughmanyinferencesinSNLIrequirenocommon-senseknowledge(e.g.paraphrases).Asexpected,entail-mentsaremostlyconsideredverylikely;neutralin-ferencesmostlyplausible;andcontradictionslikelytobeeitherimpossibleortechnicallypossible.Fig6bshowsthenormalizeddistributionsofJOCIandROCStories.ComparedwithROCStories,JOCIstillcoversawiderrangeofordinallikelihood.InROCStoriesweobservethat,while2ndsentencesareingeneralmorelikelytobetruethan3rd,alargeproportionofboth2ndand3rdsentencesareplausible,ascomparedtolikelyorverylikely.Thismatchesintuition:pragmaticsdictatesthatsubse-quentsentencesinastandardnarrativecarrynewin-formation.19Thatourprotocolpicksthisupisanencouragingsignforourordinalprotocol,aswellassuggestivethatthemakeupoftheelicitedROCSto-riescollectionisindeed“storylike.”FortheCOPAdataset,weonlymakeuseofthepairsinwhichthealternativesareplausibleeffects(ratherthancauses)ofthepremise,asourproto-colmoreeasilyaccommodatesthesepairs.20An-notatingthissectionofCOPAwithordinallabelsprovidesanenlighteningandvalidatingviewofthedataset.Fig6cshowsthenormalizeddistributionofCOPAnexttothatofJOCI(COPA-1alternativesaremarkedasmostplausible;COPA-0arenot.),Truetoitsname,themajorityofCOPAalternativesarela-beledaseitherplausibleorlikely;almostnoneareimpossible.ThisisconsistentwiththeideathattheCOPAtaskistodeterminewhichoftwopossibleoptionsisthemoreplausible.Fig8showsthejointdistributionofordinallabelson(COPA-0,COPA-1)pairs.Asexpected,thedensestareasoftheheatmaplieabovethediagonal,indicatingthatinalmostev-erypair,COPA-1receivedahigherlikelihoodjudge-mentthanCOPA-0.AutomaticGenerationComparisons:Wecom-parethelabeldistributionsofdifferentmethodsforautomaticgenerationofcommon-senseinference(AGCI)inFig9.AmongACGI-WK(generationbasedonworldknowledge)方法,theISAstrat-egyyieldsabimodaldistribution,withthemajor-ityofinferenceslabeledimpossibleorverylikely.19Ifsubsequentsentencesinastorywerealwaysverylikely,thenthosewouldbeboringtales;thereadercouldinferthecon-clusionbasedontheintroduction.Whileatthesametimeifmostsubsequentsentenceswereonlytechnicallypossible,thereaderwouldgiveupinconfusion.20Speciﬁcally,wetreatpremisesascontextsandeffectalter-nativesaspossiblehypotheses.

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

389

impossibletech-possibleplausiblelikelyvery-likelyCOPA-0impossibletech-possibleplausiblelikelyvery-likelyCOPA-1000000120002844514102124231856791840255075100Figure8:COPAheatmap.Thisislikelybecausemostcopularstatementsgen-eratedwiththeISAstrategywilleitherbecategori-callytrueorfalse.Incontrast,thedecisiontreeandfrequencybasedstrategiesgeneratemanymorehy-potheseswithintermediateordinallabels.Thissug-geststhepropositionaltemplates(learnedfromtext)capturemany“possibilistic”hypotheses,whichisouraim.ThetwoAGCI-NN(generationvianeuralmeth-ods)strategiesshowinterestingdifferencesinlabeldistributionaswell.Sequence-to-sequencedecod-ingswithfull-sentencepromptsleadtomoreverylikelylabelsthansingle-wordprompts.ThereasonmaybethatthemodelbehavesmoresimilarlytoSNLIentailmentswhenithasaccesstoallthein-formationinthecontext.Whencombined,theﬁveAGCIstrategies(threeAGCI-WKandtwoAGCI-NN)providereasonablecoverageoverallﬁvecat-egories,ascanbeseeninFig6.6PredictingOrdinalJudgmentsWewanttobeabletopredictordinaljudgmentsofthekindpresentedinthiscorpus.Ourgoalinthissectionistoestablishbaselineresultsandexplorewhatkindsoffeaturesareusefulforpredictingordi-nalcommon-senseinference.Todoso,wetrainandtestalogisticordinalregressionmodelgθ(φ(C,H)),whichoutputsordinallabelsusingfeaturesφdeﬁnedoncontext-inferencepairs.Here,gθ(·)isaregres-sionmodelwithθastrainedparameters;wetrainusingthemargin-basedmethodof(RennieandSre-bro,2005),implementedin(Pedregosa-Izquierdo,2015),21withthefollowingfeatures:21LogisticSE:http://github.com/fabianp/mordimpossibletech-possibleplausiblelikelyvery-likely0100020003000400050006000decision treefrequency basedISA based(A)DistributionofAGCI-WKimpossibletech-possibleplausiblelikelyvery-likely050100150200250300350word promptssentence prompts(乙)DistributionofAGCI-NNFigure9:LabeldistributionsofAGCI.Bagofwordsfeatures(BOW):Wecompute(1)“BOWoverlap”(sizeofwordoverlapinCandH),和(2)BOWoverlapdividedbythelengthofH.Similarityfeatures(SIM):UsingGoogle’sword2vecvectorstrainedon100billiontokensofGoogleNews,22我们(1)sumthevectorsinboththecontextandhypothesisandcomputethecosine-similarityoftheresultingtwovectors(“similarityofaverage”),和(2)computethecosine-similarityofallwordpairsacrossthecontextandinference,thenaveragethosesimilarities(“averageofsimilarity”).Seq2seqscorefeatures(S2S):WecomputethelogprobabilitylogP(H|C)underthesequence-to-sequencemodeldescribedin§4.1.2.Thereareﬁvevariants:(1)Seq2seqtrainedonSNLI“entailment”pairsonly,(2)“neutral”pairsonly,(3)“contradic-tion”pairsonly,(4)“neutral”and“contradiction”pairs,和(5)SNLIpairs(anylabel)withthecon-text(前提)replacedbyanemptystring.Seq2seqbinaryfeatures(S2S-BIN):Binaryindica-torfeaturesforeachoftheﬁveseq2seqmodelvari-ants,indicatingthatmodelachievedthelowestscoreonthecontext-hypothesispair.22TheGoogleNewsembeddingsareavailableat:https://code.google.com/archive/p/word2vec/

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
0
6
8
1
5
6
7
4
9
2

/
t

我

A
C
_
A
_
0
0
0
6
8
p
d

乙
y
G
你
e
s
t

哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3

390

Lengthfeatures(LEN):Thissetcomprisesthreefeatures:thelengthofthecontext(intokens),thedifferenceinlengthbetweenthecontextandhypoth-esis,andabinaryfeatureindicatingifthehypothesisislongerthanthecontext.6.1AnalysisWetrainandtestourregressionmodelontwosub-setsoftheJOCIcorpus,哪个,forbrevity,wecall“A”and“B.”“A”consistsof2,976sentencepairs(i.e.,context-hypothesispairs)fromSNLI-trainan-notatedwithordinallabels.ThiscorrespondstothethreerowslabeledSNLIinTable1(993+988+995=2,976pairs),andcanbeviewedasatextualentailmentdatasetre-labeledwithordinaljudgments.“B”consistsof6,375context-inferencepairs,inwhichthecontextsarethesame2,976SNLI-trainpremisesas“A”,andthehypothesesaregeneratedbasedonworldknowledge(§4.1.1);thesepairsarealsoannotatedwithordinallabels.ThiscorrespondstoasubsetoftherowlabeledAGCIinTable1.Akeydifferencebetween“A”and“B”isthatthehypothesesin“A”arehuman-elicited,whilethosein“B”areauto-generated;weareinterestedinseeingwhetherthisaffectsthetask’sdifﬁculty.23ModelA-trainA-testB-trainB-testRegression:gθ(·)2.051.962.482.74MostFrequent5.705.566.557.00Freq.Sampling4.624.295.615.54RoundedAverage2.462.392.792.89One-vs-All3.743.805.145.71Table4:Meansquarederror.ModelA-trainA-testB-trainB-testRegression:gθ(·).39*.40*.32*.27*MostFrequent.00*.00*.00*.00*Freq.Sampling.03.10.01.01RoundedAverage.00*.00*.00*.00*One-vs-All.31*.30*.28*.24*Table5:Spearman’sρ.(*p-value<.01)Tables4and5showeachmodel’sperformance(meansquarederrorandSpearman’sρ,respectively)inpredictingordinallabels.24Wecompareourordi-nalregressionmodelgθ(·)withthesebaselines:23Detailsofthedatasplitisreportedinthedatasetrelease.24MSEandSpearman’sρarebothcommonlyusedevalua-MostFrequent:Selecttheordinalclassappear-ingmostoftenintrain.FrequencySampling:Selectanordinallabelac-cordingtotheirdistributionintrain.RoundedAverage:Averageoveralllabelsfromtrainroundedtonearestordinal.One-vs-All:TrainoneSVMclassiﬁerperordinalclassandselecttheclasslabelwiththelargestcorre-spondingmargin.Wetrainthismodelwiththesamesetoffeaturesastheordinalregressionmodel.Overall,theregressionmodelachievesthelow-estMSEandhighestρ,implyingthatthisdatasetislearnableandtractable.Naturally,wewouldde-sireamodelthatachievesMSEunder1.0,andwehopethatthereleaseofourdatasetwillencouragemoreconcertedeffortinthiscommon-senseinfer-encetask.Importantly,notethatperformanceonA-testisbetterthanonB-test.Webelieve“B”isamorechallengingdatasetbecauseauto-generationofhypothesisleadstowidervarietythanelicitation.MSESpear.ρFeatureSetABABALL1.962.74.40*.27*ALL–{SIM}2.102.75.34*.25*ALL–{BOW}2.022.77.37*.25*ALL–{SIM,BOW}2.312.79.16*.20*ALL–{S2S}2.002.85.38*.22*ALL–{S2S-BIN}1.972.76.40*.26*ALL–{S2S,S2S-BIN}2.062.87.35*.21*ALL–{LEN}2.012.77.39*.25*∅+{SIM}2.063.04.35*.10∅+{BOW}2.102.89.34*.12*∅+{S2S}2.332.80.14.20*∅+{S2S-BIN}2.392.89.00.00∅+{LEN}2.392.89.00.05Table6:AblationresultsforordinalregressionmodelonA-testandB-test.(*p-value<.01forρ)Wealsorunafeatureablationtest.Table6showsthatthemostusefulfeaturesdifferforA-testandB-test.OnA-test,wheretheinferencesareelicitedfromhumans,removalofsimilarity-andbow-basedfeaturestogetherresultsinthelargestperformancedrop.OnB-test,bycontrast,remov-ingsimilarityandbowfeaturesresultsinacom-tionsinordinalpredictiontasks(Baccianellaetal.,2009;Ben-nettandLanning,2007;GaudetteandJapkowicz,2009;AgrestiandKateri,2011;PopescuandDinu,2009;Liuetal.,2015;Gellaetal.,2013). l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 391 parableperformancedroptoremovingseq2seqfea-tures.Theseobservationspointtostatisticaldiffer-encesbetweenhuman-elicitedandauto-generatedhypotheses,amotivatingpointoftheJOCIcorpus.7ConclusionsandFutureWorkInmotivatingtheneedforautomaticallybuildingcollectionsofcommon-senseknowledge,Clarketal.(2003)写了:“ChinalaunchedameteorologicalsatelliteintoorbitWednesday.”suggeststoahumanreaderthat(amongotherthings)therewasarocketlaunch;Chinaprobablyownsthesatel-lite;thesatelliteisformonitoringweather;theorbitisaroundEarth;etcTheuseof“etc”summarizesaninﬁnitenumberofotherstatementsthatahumanreaderwouldﬁndtobeverylikely,likely,technicallyplausible,orim-possible,giventheprovidedcontext.Preferablywecouldbuildsystemsthatwouldau-tomaticallylearncommon-senseexclusivelyfromavailablecorpora;extractingnotjuststatementsaboutwhatispossible,butalsotheassociatedprob-abilitiesofhowlikelycertainthingsaretoobtaininanygivencontext.Weareunawareofexistingworkthathasdemonstratedthistobefeasible.Wehavethusdescribedamulti-stageapproachtocommon-sensetextualinference:weﬁrstextractlargenumbersofpossiblestatementsfromacorpus,andusethosestatementstogeneratecontextuallygroundedcontext-hypothesispairs.Thesearepre-sentedtohumansfordirectassessmentofsubjec-tivelikelihood,ratherthanrelyingoncorpusdataalone.Asthedataisautomaticallygenerated,weseektobypassissuesinhumanelicitationbias.Fur-ther,sincesubjectivelikelihoodjudgmentsarenotdifﬁcultforhumans,ourcrowdsourcingtechniqueisbothinexpensiveandscalable.Futureworkwillextendourtechniquesforfor-wardinferencegeneration,furtherscaleuptheanno-tationofadditionalexamples,andexploretheuseoflarger,morecomplexcontexts.TheresultingJOCIcorpuswillbeusedtoimprovealgorithmsfornatu-rallanguageinferencetaskssuchasstorytellingandstoryunderstanding.AcknowledgmentsThank you to action editor Mark Steedman and the anonymous reviewers for their feedback, as well as colleagues including Lenhart Schubert, Kyle Rawl-ins, Aaron White, and Keisuke Sakaguchi. This work was supported in part by DARPA LORELEI, the National Science Foundation Graduate Research Fellowship and the JHU Human Language Tech-nology Center of Excellence (HLTCOE).ReferencesEneko Agirre, Mona Diab, Daniel Cer, and AitorGonzalez-Agirre.2012.Semeval-2012task6:Apilotonsemantictextualsimilarity.InProceedingsoftheFirstJointConferenceonLexicalandComputationalSemantics-Volume1:ProceedingsoftheMainConfer-enceandtheSharedTask,andVolume2:ProceedingsoftheSixthInternationalWorkshoponSemanticEval-uation,pages385–393.AssociationforComputationalLinguistics.AlanAgrestiandMariaKateri.2011.Categoricaldataanalysis.InInternationalencyclopediaofstatisticalscience,pages206–208.Springer.StefanoBaccianella,AndreaEsuli,andFabrizioSebas-tiani.2009.Evaluationmeasuresforordinalregres-sion.In2009NinthInternationalConferenceonIn-telligentSystemsDesignandApplications,pages283–287.InstituteofElectricalandElectronicsEngineers.DzmitryBahdanau,KyunghyunCho,andYoshuaBen-gio.2014.Neuralmachinetranslationbyjointlylearningtoalignandtranslate.arXivpreprintarXiv:1409.0473v7.CollinF.Baker,CharlesJ.Fillmore,andJohnB.Lowe.1998.TheBerkeleyFrameNetProject.InProceed-ingsofthe36thAnnualMeetingoftheAssociationforComputationalLinguisticsand17thInternationalConferenceonComputationalLinguistics,Volume1,pages86–90.AssociationforComputationalLinguis-tics.MicheleBankoandOrenEtzioni.2007.Strategiesforlifelongknowledgeextractionfromtheweb.InProceedingsofthe4thInternationalConferenceonKnowledgeCapture,pages95–102.AssociationforComputingMachinery.IslamBeltagy,StephenRoller,PengxiangCheng,KatrinErk,andRaymondJ.Mooney.2017.Representingmeaningwithacombinationoflogicalanddistribu-tionalmodels.ComputationalLinguistics.JamesBennettandStanLanning.2007.TheNetﬂixprize.InProceedingsofKDDCupandWorkshop,page35. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 392 JonathanBerant,IdoDagan,andJacobGoldberger.2011.Globallearningoftypedentailmentrules.InProceedingsofthe49thAnnualMeetingoftheAsso-ciationforComputationalLinguistics:HumanLan-guageTechnologies,pages610–619.AssociationforComputationalLinguistics.SamuelR.Bowman,GaborAngeli,ChristopherPotts,andChristopherD.Manning.2015.Alargeanno-tatedcorpusforlearningnaturallanguageinference.InProceedingsofthe2015ConferenceonEmpiricalMethodsinNaturalLanguageProcessing,pages632–642.AssociationforComputationalLinguistics.NathanaelChambersandDanJurafsky.2008.Unsuper-visedlearningofnarrativeeventchains.InProceed-ingsofACL-08:赫勒特,pages789–797.AssociationforComputationalLinguistics.TimothyChklovski.2003.Learner:ASystemforAc-quiringCommonsenseKnowledgebyAnalogy.InProceedingsofSecondInternationalConferenceonKnowledgeCapture(K-CAP2003).StephenClarkandDavidWeir.1999.Aniterativeap-proachtoestimatingfrequenciesoverasemantichier-archy.InProceedingsoftheJointSIGDATConferenceonEmpiricalMethodsinNaturalLanguageProcess-ingandVeryLargeCorpora.Citeseer.PeterClark,PhilHarrison,andJohnThompson.2003.Aknowledge-drivenapproachtotextmeaningprocess-ing.InProceedingsoftheNorthAmericanChap-teroftheAssociationforComputationalLinguis-tics:HumanLanguageTechnologiesWorkshoponTextMeaning-Volume9,pages1–6.AssociationforCom-putationalLinguistics.HerbertH.Clark.1975.Bridging.InR.C.SchankandB.L.Nash-Webber,编辑,Theoreticalissuesinnat-urallanguageprocessing.AssociationforComputingMachinery,NewYork.RobinCooper,DickCrouch,JanVanEijck,ChrisFox,JohanVanGenabith,JanJaspars,HansKamp,DavidMilward,ManfredPinkal,MassimoPoesio,andStevePulman.1996.Usingtheframework.Technicalre-port,TechnicalReportLRE62-051D-16,TheFraCaSConsortium.IdoDagan,OrenGlickman,andBernardoMagnini.2006.ThePascalrecognisingtextualentailmentchal-lenge.InMachinelearningchallenges:evaluatingpredictiveuncertainty,visualobjectclassiﬁcation,andrecognisingtextualentailment.FerdinanddeHaan.1997.TheInteractionofModalityandNegation:ATypologicalStudy.AGarlandSeries.GarlandPub.Marie-CatherinedeMarneffe,TimothyDozat,NataliaSilveira,KatriHaverinen,FilipGinter,JoakimNivre,andChristopherD.Manning.2014.Universalstan-forddependencies:Across-linguistictypology.InProceedingsoftheNinthInternationalConferenceonLanguageResourcesandEvaluation(LREC’14),pages4585–4592.EuropeanLanguageResourcesAs-sociation(ELRA).KatrinErk,SebastianPad´o,andUlrikePad´o.2010.Aﬂexible,corpus-drivenmodelofregularandinverseselectionalpreferences.ComputationalLinguistics,36(4):723–763.FrancisFerraro,MaxThomas,MatthewR.Gormley,TravisWolfe,CraigHarman,andBenjaminVanDurme.2014.ConcretelyAnnotatedCorpora.In4thWorkshoponAutomatedKnowledgeBaseConstruc-tion(AKBC).NoahS.Friedland,PaulG.Allen,GavinMatthews,MichaelWitbrock,DavidBaxter,JonCurtis,BlakeShepard,PierluigiMiraglia,JurgenAngele,SteffenStaab,etal.2004.ProjectHalo:Towardsadigitalaristotle.AImagazine,25(4):29.DanGarrette,KatrinErk,andRaymondMooney.2011.IntegratinglogicalrepresentationswithprobabilisticinformationusingMarkovlogic.InProceedingsoftheNinthInternationalConferenceonComputationalSemantics,pages105–114.AssociationforComputa-tionalLinguistics.LisaGaudetteandNathalieJapkowicz.2009.Evalua-tionmethodsforordinalclassiﬁcation.InProceedingsofthe22ndCanadianConferenceonArtiﬁcialIntel-ligence:AdvancesinArtiﬁcialIntelligence,CanadianAI’09,pages207–210.Springer-Verlag.SpandanaGella,PaulCook,andBoHan.2013.Unsu-pervisedwordusagesimilarityinsocialmediatexts.InSecondJointConferenceonLexicalandComputa-tionalSemantics(*SEM),Volume1:ProceedingsoftheMainConferenceandtheSharedTask:Seman-ticTextualSimilarity,pages248–253.AssociationforComputationalLinguistics.DaniloGiampiccolo,HoaTrangDang,BernardoMagnini,IdoDagan,ElenaCabrio,andBillDolan.2008.ThefourthPascalrecognizingtextualentail-mentchallenge.InProceedingsoftheTextAnalysisConference(TAC)2008.JonathanGordonandBenjaminVanDurme.2013.Re-portingbiasandknowledgeextraction.InAutomatedKnowledgeBaseConstruction(AKBC):The3rdWork-shoponKnowledgeExtractionattheACMConferenceonInformationandKnowledgeManagement.CatherineHavasi,RobertSpeer,andJasonAlonso.2007.ConceptNet3:aFlexible,MultilingualSemanticNet-workforCommonSenseKnowledge.InProceedingsofRecentAdvancesinNaturalLanguageProcessing.MartiA.Hearst.1992.Automaticacquisitionofhy-ponymsfromlargetextcorpora.InProceedingsof l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 393 the14thConferenceonComputationalLinguistics-Volume2,pages539–545.AssociationforComputa-tionalLinguistics.JerryR.HobbsandCostanzaNavarretta.1993.Methodologyforknowledgeacquisition(unpublishedmanuscript).http://www.isi.edu/˜hobbs/damage.text.JerryR.Hobbs.1987.Worldknowledgeandwordmean-ing.InProceedingsofthe1987WorkshoponTheoret-icalIssuesinNaturalLanguageProcessing,TINLAP’87,pages20–27.AssociationforComputationalLin-guistics.SeppHochreiterandJ¨urgenSchmidhuber.1997.Longshort-termmemory.Neuralcomputation,9(8):1735–1780.LaurenceR.Horn.1989.ANaturalHistoryofNegation.DavidHumeseries.CSLIPublications.DouglasB.Lenat.1995.CYC:Alarge-scaleinvestmentinknowledgeinfrastructure.CommunicationsoftheACM,38(11):33–38.HectorJ.Levesque,ErnestDavis,andLeoraMorgen-stern.2011.TheWinogradschemachallenge.InAAAISpringSymposium:LogicalFormalizationsofCommonsenseReasoning.MariaLiakataandStephenPulman.2002.Fromtreestopredicate-argumentstructures.InProceed-ingsofthe19thInternationalConferenceonCompu-tationalLinguistics-Volume1,pages1–7.AssociationforComputationalLinguistics.DekangLinandPatrickPantel.2001.DIRT-DiscoveryofInferenceRulesfromText.InProceedingsoftheseventhACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining,pages323–328.AssociationforComputingMachinery.QuanLiu,HuiJiang,SiWei,Zhen-HuaLing,andYuHu.2015.Learningsemanticwordembeddingsbasedonordinalknowledgeconstraints.InProceedingsofthe53rdAnnualMeetingoftheAssociationforCompu-tationalLinguisticsandthe7thInternationalJointConferenceonNaturalLanguageProcessing(ACL-IJCNLP),pages1501–1511.JohnLyons.1977.Semantics.CambridgeUniversityPress.BillMacCartneyandChristopherD.Manning.2007.Naturallogicfortextualinference.InProceedingsoftheACL-PASCALWorkshoponTextualEntailmentandParaphrasing,pages193–200.AssociationforComputationalLinguistics.MarcoMarelli,StefanoMenini,MarcoBaroni,LuisaBentivogli,RaffaellaBernardi,andRobertoZampar-elli.2014.ASICKcurefortheevaluationofcomposi-tionaldistributionalsemanticmodels.InProceedingsoftheNinthInternationalConferenceonLanguageResourcesandEvaluation,pages216–223.JohnMcCarthy.1959.Programswithcommonsense.InProceedingsoftheTeddingtonConferenceontheMechanizationofThoughtProcesses,伦敦:HerMajesty’sStationeryOfﬁce.KenMcRae,MichaelJ.Spivey-Knowlton,andMichaelK.Tanenhaus.1998.Modelingthein-ﬂuenceofthematicﬁt(andotherconstraints)inon-linesentencecomprehension.JournalOfMemoryandLanguage,38:283–312.KenMcRae,GeorgeS.Cree,MarkS.Seidenberg,andChrisMcNorgan.2005.Semanticfeatureproductionnormsforalargesetoflivingandnonlivingthings.BehaviorResearchMethods,Instruments,&Comput-ers,37(4):547–559.George A. Miller.1995. WordNet: a lexical database forEnglish. ACM 通讯, 38(11):39–41. Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell,andRossGirshick.2016.Seeingthroughthehumanreportingbias:Visualclassiﬁersfromnoisyhuman-centriclabels.InProceedingsoftheIEEEConferenceonComputerVisionandPatternRecognition,pages2930–2939.NasrinMostafazadeh,NathanaelChambers,XiaodongHe,DeviParikh,DhruvBatra,LucyVanderwende,PushmeetKohli,andJamesAllen.2016.AcorpusandClozeevaluationfordeeperunderstandingofcom-monsensestories.InProceedingsofthe2016Confer-enceoftheNorthAmericanChapteroftheAssocia-tionforComputationalLinguistics:HumanLanguageTechnologies,pages839–849.AssociationforCompu-tationalLinguistics.MariusPas¸caandBenjaminVanDurme.2007.Whatyouseekiswhatyouget:Extractionofclassattributesfromquerylogs.InProceedingsofthe20thInterna-tionalJointConferenceonArtiﬁcalIntelligence.UlrikePad´o,MatthewW.Crocker,FrankKeller,andMatthewW.Crocker.2009.Aprobabilisticmodelofsemanticplausibilityinsentenceprocessing.Cog-nitiveScience,33(5):795–838.PatrickPantel,RahulBhagat,BonaventuraCoppola,TimothyChklovski,andEduardH.Hovy.2007.ISP:Learninginferentialselectionalpreferences.InPro-ceedingsofHumanLanguageTechnologies:TheAn-nualConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguistics,pages564–571.DenisPaperno,Germ´anKruszewski,AngelikiLazari-dou,NgocQuanPham,RaffaellaBernardi,SandroPezzelle,MarcoBaroni,GemmaBoleda,andRaquelFernandez.2016.TheLAMBADAdataset:Wordpre-dictionrequiringabroaddiscoursecontext.InPro-ceedingsofthe54thAnnualMeetingoftheAssocia-tionforComputationalLinguistics(Volume1:Long l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 394 文件),pages1525–1534.AssociationforComputa-tionalLinguistics.RobertParker,DavidGraff,JunboKong,KeChen,andKazuakiMaeda.2011.EnglishGigawordFifthEdi-tion.LinguisticDataConsortium.MariusPasca.2008.Turningwebtextandsearchqueriesintofactualknowledge:Hierarchicalclassattributeex-traction.InProceedingsofthe23rdNationalConfer-enceonArtiﬁcialIntelligence.FabianPedregosa-Izquierdo.2015.FeatureextractionandsupervisedlearningonfMRI:frompracticetothe-ory.Ph.D.thesis,Universit´ePierreetMarieCurie-ParisVI.KarlPichottaandRaymondJ.Mooney.2016.Learn-ingstatisticalscriptswithLSTMrecurrentneuralnet-works.InProceedingsofthe30thAAAIConferenceonArtiﬁcialIntelligence(AAAI),pages2800–2806.MariusPopescuandLiviuP.Dinu.2009.Comparingstatisticalsimilaritymeasuresforstylisticmultivariateanalysis.InProceedingsofRecentAdvancesinNatu-ralLanguageProcessing,pages349–354.JasonD.M.RennieandNathanSrebro.2005.Lossfunc-tionsforpreferencelevels:Regressionwithdiscreteorderedlabels.InProceedingsoftheIJCAIMultidis-ciplinaryWorkshoponAdvancesinPreferenceHan-dling,pages180–186.PhilipResnik.1993.Semanticclassesandsyntacticam-biguity.InProceedingsofARPAWorkshoponHumanLanguageTechnology.MatthewRichardsonandPedroDomingos.2006.Markovlogicnetworks.Machinelearning,62(1-2):107–136.StephenD.Richardson,WilliamB.Dolan,andLucyVan-derwende.1998.MindNet:Acquiringandstructuringsemanticinformationfromtext.InProceedingsofthe36thAnnualMeetingoftheAssociationforComputa-tionalLinguisticsand17thInternationalConferenceonComputationalLinguistics,Volume2,pages1098–1102.AssociationforComputationalLinguistics.MelissaRoemmele,CosminAdrianBejan,andAn-drewS.Gordon.2011.Choiceofplausiblealterna-tives:Anevaluationofcommonsensecausalreason-ing.InAAAISpringSymposium:LogicalFormaliza-tionsofCommonsenseReasoning,pages90–95.RachelRudinger,PushpendreRastogi,FrancisFerraro,andBenjaminVanDurme.2015.Scriptinductionaslanguagemodeling.InProceedingsofthe2015Con-ferenceonEmpiricalMethodsinNaturalLanguageProcessing,pages1681–1686.AssociationforCom-putationalLinguistics.RoserSaur´ıandJamesPustejovsky.2009.FactBank:Acorpusannotatedwitheventfactuality.Languageresourcesandevaluation,43(3):227–268.AsadSayeed,VeraDemberg,andPavelShkadzko.2015.Anexplorationofsemanticfeaturesinanunsupervisedthematicﬁtevaluationframework.ItalianJournalofComputationalLinguistics,1(1).RogerC.Schank.1975.Usingknowledgetounderstand.InTINLAP’75:Proceedingsofthe1975WorkshoponTheoreticalIssuesinNaturalLanguageProcessing,pages117–121.LenhartSchubert.2002.Canwederivegeneralworldknowledgefromtexts?InProceedingsoftheSecondInternationalConferenceonHumanLanguageTech-nologyResearch,pages94–97.MorganKaufmannPublishersInc.PushSingh.2002.Thepublicacquisitionofcommon-senseknowledge.InProceedingsofAAAISpringSymposium:Acquiring(andUsing)Linguistic(andWorld)KnowledgeforInformationAccess.AAAI.RionSnow,DanielJurafsky,andAndrewY.Ng.2006.Semantictaxonomyinductionfromheterogenousevi-dence.InProceedingsofthe21stInternationalCon-ferenceonComputationalLinguisticsand44thAnnualMeetingoftheAssociationforComputationalLinguis-tics,pages801–808.AssociationforComputationalLinguistics.FabianM.Suchanek,GjergjiKasneci,andGerhardWeikum.2007.YAGO:ACoreofSemanticKnowl-edgeUnifyingWordNetandWikipedia.InProceed-ingsofthe16thInternationalConferenceonWorldWideWeb,page697.MichaelK.TanenhausandMarkS.Seidenberg.1980.Discoursecontextandsentenceperception.TechnicalReport176,CenterfortheStudyofReading,IllinoisUniversity,Urbana.RichmondH.Thomason.1972.ASemanticTheoryofSortalIncorrectness.JournalofPhilosophicalLogic,1(2):209–258,May.BenjaminVanDurmeandLenhartSchubert.2008.Openknowledgeextractionthroughcompositionallanguageprocessing.InProceedingsofthe2008ConferenceonSemanticsinTextProcessing,pages239–254.Associ-ationforComputationalLinguistics.BenjaminVanDurme,PhillipMichalak,andLenhartSchubert.2009.DerivinggeneralizedknowledgefromcorporausingWordNetabstraction.InProceed-ingsofthe12thConferenceoftheEuropeanChapteroftheACL(EACL2009),pages808–816.AssociationforComputationalLinguistics.BenjaminVanDurme.2010.Extractingimplicitknowl-edgefromtext.Ph.D.thesis,UniversityofRochester,DepartmentofComputerScience,罗切斯特,NY14627-0226.OriolVinyals,ŁukaszKaiser,TerryKoo,SlavPetrov,IlyaSutskever,andGeoffreyHinton.2015.Grammar l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 395 asaforeignlanguage.InC.Cortes,N.D.Lawrence,D.D.Lee,M.Sugiyama,andR.Garnett,编辑,Ad-vancesinNeuralInformationProcessingSystems28,pages2773–2781.CurranAssociates,Inc.AaronStevenWhite,DrewReisinger,KeisukeSak-aguchi,TimVieira,ShengZhang,RachelRudinger,KyleRawlins,andBenjaminVanDurme.2016.Uni-versaldecompositionalsemanticsonuniversaldepen-dencies.InProceedingsofthe2016ConferenceonEmpiricalMethodsinNaturalLanguageProcessing,pages1713–1723.AssociationforComputationalLin-guistics.PeterYoung,AliceLai,MicahHodosh,andJuliaHock-enmaier.2014.Fromimagedescriptionstovisualdenotations:Newsimilaritymetricsforsemanticin-ferenceovereventdescriptions.TransactionsoftheAssociationforComputationalLinguistics,2:67–78.UriZernik.1992.Closedyesterdayandclosedminds:Askingtherightquestionsofthecorpustodistinguishthematicfromsententialrelations.InProceedingsofthe14thConferenceonComputationalLinguistics-Volume4,pages1305–1311.AssociationforCompu-tationalLinguistics.ShengZhang,KevinDuh,andBenjaminVanDurme.2017.MT/IE:Cross-lingualopeninformationex-tractionwithneuralsequence-to-sequencemodels.InProceedingsofthe15thConferenceoftheEuropeanChapteroftheAssociationforComputationalLinguis-tics:Volume2,ShortPapers,pages64–70.Associa-tionforComputationalLinguistics. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 6 8 1 5 6 7 4 9 2 / / t l a c _ a _ 0 0 0 6 8 p d . f b y g u e s t t o n 0 8 S e p e m b e r 2 0 2 3 396 计算语言学协会会刊, 卷. 5, PP. 379–395, 2017. 动作编辑器: Mark Steedman. 图像

下载pdf