Transactions of the Association for Computational Linguistics, vol. 4, pp. 537–549, 2016. Action Editor: Timothy Baldwin.
Submission batch: 1/2016; Revision batch: 5/2016; Published 12/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.
Understanding Satirical Articles Using Common-Sense

Dan Goldwasser, Purdue University, Department of Computer Science, dgoldwas@purdue.edu
Xiao Zhang, Purdue University, Department of Computer Science, zhang923@purdue.edu

Abstract

Automatic satire detection is a subtle text classification task, for machines and at times, even for humans. In this paper we argue that satire detection should be approached using common-sense inferences, rather than traditional text classification methods. We present a highly structured latent variable model capturing the required inferences. The model abstracts over the specific entities appearing in the articles, grouping them into generalized categories, thus allowing the model to adapt to previously unseen situations.

1 Introduction

Satire is a writing technique for passing criticism using humor, irony or exaggeration. It is often used in contemporary politics to ridicule individual politicians, political parties or society as a whole. We restrict ourselves in this paper to such political satire articles, broadly defined as articles whose purpose is not to report real events, but rather to mock their subject matter. Satirical writing often builds on real facts and expectations, pushed to absurdity to express humorous insights about the situation. As a result, the difference between real and satirical articles can be subtle and often confusing to readers. With the recent rise of social media outlets, satirical articles have become increasingly popular and have famously fooled several leading news agencies¹. These misinterpretations can often be attributed to careless reading, as there is a clear line between unusual events finding their way to the news and satire, which intentionally places key political figures in unlikely humorous scenarios. The two can be separated by carefully reading the articles, exposing the satirical nature of the events described in such articles.

¹https://newrepublic.com/article/118013/satire-news-websites-are-cashing-gullible-outraged-readers

[Figure 1: Examples of real and satirical articles. Top, satirical news excerpt: "Vice President Joe Biden suddenly barged in, asking if anyone could 'hook [him] up with a Dixie cup' of their urine. 'C'mon, you gotta help me get some clean whiz. Shinseki, Donovan, I'm looking in your direction' said Biden." Bottom, real news excerpt: "'Do you want to hit this?' a man asked President Barack Obama in a bar in Denver Tuesday night. The president laughed but didn't indulge. It wasn't the only time Obama was offered weed on his night out."]

In this paper we follow this intuition. We look into the satire detection task (Burfoot and Baldwin, 2009), predicting if a given news article is real or satirical, and suggest that this prediction task should be defined over common-sense inferences, rather than looking at it as a lexical text classification task (Pang and Lee, 2008; Burfoot and Baldwin, 2009), which bases the decision on word-level features.

To further motivate this observation, consider the two excerpts in Figure 1.
Both excerpts mention top-ranking politicians (the President and Vice President) in a drug-related context, and contain informal slang utterances, inappropriate for the subjects' position. The difference between the two examples is apparent when analyzing the situations described in the two articles: the first example (top) describes the Vice President speaking inappropriately in a work setting, clearly an unrealistic situation. In the second (bottom) the President is spoken to inappropriately, an unlikely, yet not unrealistic, situation. From the perspective of our prediction task, it is advisable to base the prediction on a structured representation capturing the events and their participants described in the text.

The absurdity of the situation described in satirical articles is often not unique to the specific individuals appearing in the narrative. In our example, both politicians are interchangeable: placing the President in the situation described in the first excerpt would not make it less absurd. It is therefore desirable to make a common-sense inference about high-ranking politicians in this scenario.

We follow these intuitions and suggest a novel approach for the satire prediction task. Our model, COMSENSE, makes predictions by making common-sense inferences over a simplified narrative representation. Similarly to prior work (Chambers and Jurafsky, 2008; Goyal et al., 2010; Wang and McAllester, 2015), we represent the narrative structure by capturing the main entities (and tracking their mentions throughout the text), their activities, and their utterances. The result of this process is a Narrative Representation Graph (NRG). Figure 2 depicts examples of this representation for the excerpts in Figure 1.

Given an NRG, our model makes inferences quantifying how likely each of the represented events and interactions is to appear in a real, or satirical, context. Annotating the NRG for such inferences is a challenging task, as the space of possible situations is extremely large. Instead, we frame the required inferences as a highly-structured latent variable model, trained discriminatively as part of the prediction task. Without explicit supervision, the model assigns categories to the NRG vertices (for example, by grouping politicians into a single category, or by grouping inappropriate slang utterances, regardless of specific word choice). These category assignments form the infrastructure for higher-level reasoning, as they allow the model to identify the commonalities between unrelated people, their actions and their words. The model learns common-sense patterns leading to real or satirical decisions based on these categories. We express these patterns as parametrized rules (acting as global features in the prediction model), and base the prediction on their activation values. In our example, these rules can capture the combination (E_Politician) ∧ (Q_slang) → Satire, where E_Politician and Q_slang are latent variable assignments to entity and utterance categories, respectively.

Our experiments look into two variants of satire prediction: using full articles, and the more challenging sub-task of predicting if a quote is real given its speaker. We use two datasets collected 6 years apart: the first collected in 2009 (Burfoot and Baldwin, 2009), and an additional dataset collected recently. Since satirical articles tend to focus on current events, the two datasets describe different people and world events. To demonstrate the robustness of our COMSENSE approach we use the first dataset for training, and the second as out-of-domain test data. We compare COMSENSE to several competing systems, including a state-of-the-art Convolutional Neural Network (Kim, 2014). Our experiments show that COMSENSE outperforms all other models. Most interestingly, it does so with a larger margin when tested over the out-of-domain dataset, demonstrating that it is more resistant to overfitting compared to other models.

2 Related Work

The problem of building computational models dealing with humor, satire, irony and sarcasm has attracted considerable interest in the Natural Language Processing (NLP) and Machine Learning (ML) communities in recent years (Wallace et al., 2014; Riloff et al., 2013; Wallace et al., 2015; Davidov et al., 2010; Karoui et al., 2015; Burfoot and Baldwin, 2009; Tepperman et al., 2006; González-Ibáñez et al., 2011; Lukin and Walker, 2013; Filatova, 2012; Reyes et al., 2013). Most work has looked into ironic expressions in shorter texts, such as tweets and forum comments. Most related to our work is Burfoot and Baldwin (2009), which focused on satirical articles. In that work the authors suggest a text classification approach for satire detection.
In addition to using bag-of-words features, the authors also experiment with semantic validity features, which pair entities mentioned in the article, thus capturing combinations unlikely to appear in a real context. This paper follows a similar intuition; however, it looks into structured representations of this information, and studies their advantages.

Our structured representation is related to several recent reading comprehension tasks (Richardson et al., 2013; Berant et al., 2014) and to work on narrative representation, such as event chains (Chambers and Jurafsky, 2009; Chambers and Jurafsky, 2008), plot units (Goyal et al., 2010; Lehnert, 1981) and Story Intention Graphs (Elson, 2012). Unlike these works, narrative representation is not the focus of this work, but rather provides the basis for making inferences, and as a result we choose a simpler (and more robust) representation, most closely resembling event chains (Chambers and Jurafsky, 2008).

Making common-sense inferences is one of the core missions of AI, applicable to a wide range of tasks. Early work (Reiter, 1980; McCarthy, 1980; Hobbs et al., 1988) focused on logical inference and on manual construction of common-sense knowledge repositories (Lenat, 1995; Liu and Singh, 2004). More recently, several researchers have looked into automatic common-sense knowledge construction and expansion using common-sense inferences (Tandon et al., 2011; Bordes et al., 2011; Socher et al., 2013; Angeli and Manning, 2014). Several works have looked into combining NLP with common-sense (Gerber et al., 2010; Gordon et al., 2011; LoBue and Yates, 2011; Labutov and Lipson, 2012; Gordon et al., 2012). Most relevant to our work is a SemEval-2012 task (Gordon et al., 2012), looking into common-sense causality identification.

In this work we focus on a different task, satire detection in news articles. We argue that this task is inherently a common-sense reasoning task, as identifying the satirical aspects in narrative text does not require any specialized training, but instead relies heavily on common expectations of normative behavior and deviation from it in satirical text. We design our model to capture these behavioral expectations using (weighted) rules, instead of relying on lexical features as is often the case in text categorization tasks. Other common-sense frameworks typically build on existing knowledge bases representing world knowledge; however, specifying in advance the behaviors commonly associated with people based on their background and situational context, to the extent it can provide good coverage for our task, requires considerable effort. Instead, we suggest learning this information from data directly, and our model learns jointly to predict and represent the satirical elements of the article.

[Figure 2: Narrative Representation Graph (NRG) for two article snippets. (a) NRG for the satirical article: the entity "Vice President Joe Biden" connects to the predicates "barge" (with the modifier "suddenly") and "ask", and via a Quote edge to the utterance "C'mon, you gotta help me get some clean whiz - Shinseki, Donovan, I'm looking in your direction". (b) NRG for the real article: the entities "a man" and "President Barack Obama"/"The President" (collapsed by a CoRef edge) connect to the predicates "ask", "laugh" and "indulge" (the latter negated), with location ("bar in Denver") and temporal ("Tuesday night") modifiers, and a Quote edge to "Do you want to hit this?". Nodes are grouped into animate entities, predicates, and arguments/modifiers; edges carry semantic role types (A0, A1, LOC, TMP, NEG).]

3 Modeling

Given a news article, our COMSENSE system first constructs a graph-based representation of the narrative, denoted Narrative Representation Graph (NRG), capturing its participants, their actions and their utterances. We describe this process in more detail in Section 3.1. Based on the NRG, our model makes a set of inferences, mapping the NRG vertices to general categories abstracting over the specific NRG. These abstractions are formulated as latent variables in our model.
The system makes a prediction by reasoning over the abstract NRG, decomposing it into paths, where each path captures a partial view of the abstract NRG. Finally, we associate the paths with the satire decision output. The COMSENSE model then solves a global inference problem, formulated as an Integer Linear Program (ILP) instance, looking for the most likely explanation of the satire prediction output, consistent with the extracted patterns. We explain this process in detail in Section 3.2.

NRG Abstraction as Common-Sense  The main goal of the COMSENSE approach is to move away from purely lexical models, and instead base its decisions on common-sense inferences. We formulate these inferences as parameterized rules, mapping elements of the narrative, represented using the NRG, to a classification decision. The rules' ability to capture common-sense inferences hinges on two key elements. First, the abstraction of NRG nodes into typed narrative elements allows the model to find commonalities across entities and their actions. This is done by associating each NRG node with a set of latent variables. Second, constructing the decision rules according to the structure of the NRG allows us to model the dependencies between narrative elements. This is done by following the paths in the abstract NRG, generating rules by combining the latent variables representing nodes on the path, and associating them with a satire decision variable.

Computational Considerations  When setting up the learning system, there is a clear expressivity/efficiency tradeoff over these two elements. Increasing the number of latent variables associated with each NRG node would allow the model to learn a more nuanced representation. Similarly, generating rules by following longer NRG paths would allow the model to condition its satire decision on multiple entities and events jointly. The added expressivity does not come without price. Given the limited supervision afforded to the model when learning these rules, additional expressivity would result in a more difficult learning problem, which could lead to overfitting. Our experiments demonstrate this tradeoff, and in Figure 4 we show the effect of increasing the number of latent variables on performance. An additional concern with increasing the model's expressivity is computational efficiency. Satire prediction is formulated as an ILP inference process jointly assigning values to the latent variables and making the satire decision. Since ILP is exponential in the number of variables, increasing the number of latent variables would be computationally challenging. In this paper we take a straightforward approach to ensuring computational tractability by limiting the length of NRG paths considered by our model to a constant size c = 2. Assuming that we have m latent categories associated with each node, each path would generate m^c ILP variables (see Section 3.3 for details), hence the importance of limiting the length of the path. In the future we intend to study approximate inference methods that can help alleviate this computational difficulty, such as LP-approximation (Martins et al., 2009).

3.1 Narrative Representation Graph for News Articles

The Narrative Representation Graph (NRG) is a simple graph-based representation for narrative text, describing the connections between entities and their actions. The key motivation behind the NRG was to provide the structure necessary for making inferences, and as a result we chose a simple representation that does not take into account cross-event relationships, or nuanced differences between some of the event argument types. While other representations (Mani, 2012; Goyal et al., 2010; Elson, 2012) capture more information, they are harder to construct and more prone to error. We will look into adapting these models for our purpose in future work.

Since satirical articles tend to focus on political figures, we design the NRG around animate entities that drive the events described in the text, their actions (represented as predicate nodes), their contextualizing information (location modifiers, temporal modifiers, negations), and their utterances. We omitted from the graph other, non-animate entity types. In Figure 2 we show an example of this representation. Similar in spirit to previous work (Goyal et al., 2010; Chambers and Jurafsky, 2008), we represent the relations between the entities that appear in the story using a Semantic Role Labeling system (Punyakanok et al., 2008) and collapse all the entity mentions into a single entity using a Co-Reference resolution system (Manning et al., 2014). We attribute utterances to their speaker based on a previously published rule-based system (O'Keefe et al., 2012).

Formally, we construct a graph G = {V, E}, where V consists of three types of vertices: ANIMATE ENTITY (e.g., people), PREDICATE (e.g., actions) and ARGUMENT (e.g., utterances, locations). The edges E capture the relationships between vertices. The graph contains several different edge types. COREF edges collapse the mentions of the same entity into a single entity, ARGUMENT-TYPE edges connect ANIMATE ENTITY nodes to PREDICATE nodes², and PREDICATE nodes to argument nodes (modifiers). Finally, we add QUOTE edges connecting ANIMATE ENTITY nodes to utterances (ARGUMENT).

²These edges are typed according to their semantic roles.
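To make this construction concrete, the following is a minimal sketch of how an NRG could be assembled from the outputs of off-the-shelf SRL, coreference and quote-attribution tools. The input formats (coreference chains, predicate-argument tuples, speaker-utterance pairs) and all function names are our own illustration, not the paper's actual data structures; the paper's pipeline uses the systems cited above.

```python
# Hypothetical sketch: assembling a Narrative Representation Graph (NRG)
# from pre-computed SRL, coreference and quote-attribution output.
import networkx as nx

def build_nrg(coref_chains, srl_frames, quotes):
    """coref_chains: list of mention lists, one chain per entity.
    srl_frames: (predicate, role, argument) tuples, e.g.
                ("laugh", "A0", "President Barack Obama").
    quotes: (speaker_mention, utterance) pairs."""
    g = nx.MultiDiGraph()
    canonical = {}  # mention -> canonical entity node (collapses COREF)
    for chain in coref_chains:
        head = chain[0]
        g.add_node(head, ntype="ANIMATE_ENTITY")
        for mention in chain:
            canonical[mention] = head
    for pred, role, arg in srl_frames:
        g.add_node(pred, ntype="PREDICATE")  # simplification: one node per surface form
        if arg in canonical:  # animate entity argument
            g.add_edge(canonical[arg], pred, etype=role)
        else:                 # modifier node (location, time, negation, ...)
            g.add_node(arg, ntype="ARGUMENT")
            g.add_edge(pred, arg, etype=role)
    for speaker, utterance in quotes:
        g.add_node(utterance, ntype="ARGUMENT")
        g.add_edge(canonical.get(speaker, speaker), utterance, etype="QUOTE")
    return g

# Example: the real-article snippet of Figure 2(b).
g = build_nrg(
    coref_chains=[["President Barack Obama", "The President"], ["a man"]],
    srl_frames=[("ask", "A0", "a man"), ("laugh", "A0", "President Barack Obama")],
    quotes=[("a man", "Do you want to hit this?")])
```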
3.2 Satire Prediction using the Narrative Representation Graph

Satire prediction is inherently a text classification problem. Such problems are often approached using a Bag-of-Words (BoW) model, which ignores the document structure when making predictions. Instead, the NRG provides a structured representation for making the satire prediction. We begin by showing how the NRG can be used directly, and then discuss how to enhance it by mapping the graph into abstract categories.

Directly Using the NRG for Satire Prediction  We suggest a simple approach for extracting features directly from the NRG, by decomposing it into graph paths, without mapping the graph into abstract categories. This simple, word-based representation for prediction, structured according to the NRG (denoted NARRLEX), generates features by using the words in the original document corresponding to the graph decomposition. For example, consider the path connecting "a man" to an utterance in Figure 2(b). Simple features could associate the utterance's words with that entity, rather than with the President. The resulting NARRLEX model generates Bag-of-Words features based on words corresponding to NRG path vertices, conditioned on their connected entity vertex; a sketch of this feature extraction appears below.
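The following is a minimal sketch of such path-based feature extraction, assuming the build_nrg graph sketched in Section 3.1; the feature-naming scheme is our own illustration.

```python
# Hypothetical sketch: NARRLEX-style features. Each feature pairs a word
# from a path vertex with the entity vertex anchoring the path (one-edge
# paths shown; the model also uses longer paths).
def narrlex_features(g):
    feats = {}
    for ent, data in g.nodes(data=True):
        if data.get("ntype") != "ANIMATE_ENTITY":
            continue
        for _, nbr, edge in g.out_edges(ent, data=True):
            for word in str(nbr).lower().split():
                key = f"{ent}|{edge['etype']}|{word}"  # word conditioned on entity
                feats[key] = feats.get(key, 0) + 1
    return feats
```

In contrast to a plain BoW model, the same word yields different features depending on which entity it attaches to.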
Using Common-Sense for Satire Prediction  Unlike the NARRLEX model, which relies on directly observed information, our COMSENSE model performs inference over higher level patterns. In this model the prediction is a global inference process, taking into account the relationships between NRG elements (and their abstraction into categories) and the final prediction. This process is described in Figure 3.

First, the model associates with each NRG vertex a high level category that can be reused even when other, previously unseen, entities are discussed in the text. We associate a set of Boolean variables with each NRG vertex, capturing a higher level abstraction over this node. We define three types of categories corresponding to the three types of vertices, and denote them E, A, Q for Entity category, Action category and Quote category, respectively. Each category variable can take k different values. As a convention we denote X = i as a category assignment, where X ∈ {E, A, Q} is the category type, and i is its assignment. Since these category assignments are not directly observed, they are treated as latent variables in our model. This process is exemplified at the top right corner of Figure 3.

Combinations of category assignments form patterns used for determining the prediction. These patterns can be viewed as parameterized rules. Each weighted rule associates a combination with an output variable (SATIRE or REAL). Examples of such rules are provided on the right side of Figure 3. We formulate the activations of these rules as Boolean variables, whose assignments are highly interconnected. For example, the variables representing the rules (E=0) → SATIRE and (E=0) → REAL are mutually exclusive, since assigning a true value to either one entails a satire (or real) prediction. To account for this interdependency, we add constraints capturing the relations between rules.

The model makes predictions by combining the rule weights and predicting the top scoring output value. The prediction can be viewed as a derivation process: article entities are mapped to categories (e.g., ENTITY("A MAN") → (E=0) is an example of such a derivation), and combinations of categories compose into prediction patterns (e.g., (E=0) → SATIRE). We use an ILP solver to find the optimal derivation sequence. We describe the inference process as an Integer Linear Program in the following section.
“do you want to hit this?”QuoteA0a manPresident Barak ObamaLaughArguments and modifiers PredicatesAnimateEntitiesE=1Q=1A=0E=0Commonsense Prediction RulesLatent Category AssignmentsEntity(“a man”) (E=0)Entity(“president Barak Obama”) (E=1) Predicate(“laugh”) (A=0)Quote(“Do you want to hit This?”) (Q=1)(E=0) SATIRE(E=0) SATIRE(A=0) SATIRE(Q=1) SATIRE(E=1) (A=0) SATIRE(E=0) (Q=1) SATIRE(E=0) REAL(E=0) REAL(A=0) REAL(Q=1) REAL(E=1) (A=0) REAL(E=0) (Q=1) REALFigure3:ExtractingCommon-sensepredictionrules.3.3IdentifyingRelevantInteractionsusingConstrainedOptimizationWeformulatethedecisionasa0-1IntegerLinearProgrammingproblem,consistingofthreetypesofBooleanvariables:categoryassignmentsindicatorvariables,indicatorvariablesforcommon-sensepat-terns,andfinallytheoutputdecisionvariables.Eachindicatorvariableisalsorepresentedusingafeatureset,usedtoscoreitsactivation.3.3.1CategoryAssignmentVariablesEachnodeintheNRGisassignedasetofcom-petingvariables,mappingthenodetodifferentcate-goriesaccordingtoitstype.•ANIMATEENTITYCategoryVariables,de-notedhi,j,E,indicatingtheEntitycategoryiforNRGvertexj.•ACTIONCategoryVariables,denotedhi,j,UN,in-dicatingtheActioncategoryiforNRGvertexj.•QUOTECategoryVariables,denotedhi,j,Q,in-dicatingtheQuotecategoryiforNRGvertexj.Thenumberofpossiblecategoriesforeachvari-abletypeisahyper-parameterofthemodel.VariableactivationconstraintsCategoryas-signmentstothesamenodearemutuallyexclusive(anodecanonlyhaveasinglecategory).Weencodethisfactbyconstrainingthedecisionwithalinearconstraint(whereX∈{E,UN,Q}):∀jXihi,j,X=1.CategoryAssignmentFeaturesEachdeci-sionvariabledecomposesintoasetoffeatures,φ(X,Salut,j,X)capturingthewordsassociatedwiththej-thvertex,conditionedonXandi.3.3.2Common-sensePatternsVariablesWerepresentcommon-sensepredictionrulesus-inganadditionalsetofBooleanvariables,connect-ingthecategoryassignmentsvariableswiththeout-putprediction.ThespaceofpossiblevariablesisdeterminedbydecomposingtheNRGintopathsofsizeupto2,andassociatingtwoBooleanvariableswithcategoryassignmentvariablescorrespondingtotheverticesonthesepaths.Oneofthevariablesas-sociatesthesequenceofcategoryassignmentvari-ableswithaREALoutputvalue,andonewithaSATIREoutputvalue.•SingleVertexPathPatternsVariables,denotedbyhBhi,j,X,indicatingthatthecategoryassignmentcapturedbyhi,j,XisassociatedwithoutputvalueB(whereB∈{SATIRE,REAL}).•TwoVertexPathPatternsVariables,denotedbyhB(Salut,j,X1),(hk,je,X2),indicatingthatthepatterncap-turedbycategoryassignmentalongtheNRGpathofhi,j,X1andhi,j,X2isassociatedwithoutputvalueB(whereB∈{SATIRE,REAL}).DecisionConsistencyconstraintsItisclearthattheactivationofthecommon-sensePatternsVari-ablesentailstheactivationofthecategoryassign-mentvariables,correspondingtotheelementsofthecommon-sensepatterns.ForreadabilityweonlywritetheconstraintfortheSingleVertexPathVari-ables:
Features  Similar to the category assignment variable features, each decision variable decomposes into a set of features, φ(x, h^B_{h_{i,j,X}}). These features capture the words associated with each of the category assignment variables (in this example, the words associated with the j-th vertex), conditioned on the category assignments and the output prediction value (in this example, X, i and B). We also add a feature φ(h_{i,j,X}, B) capturing the connection between the output value B and the category assignment.

3.3.3 Satire Prediction Variables

Finally, we add two more Boolean variables corresponding to the output prediction: h_SATIRE and h_REAL. The activation of these two variables is mutually exclusive; we encode that by adding the constraint: h_SATIRE + h_REAL = 1. We ensure the consistency of our model by adding constraints forcing agreement between the final prediction variables and the common-sense pattern variables: h^B_{h_{i,j,X}} =⇒ h^B.

Overall Optimization Function  The Boolean variables described in the previous section define a space of competing inferences. We find the optimal output value derivation by finding the optimal set of variable assignments, by solving the following objective:

max_{y,h} Σ_i h_i · w^T φ(x, h_i, y)   s.t. C; ∀i: h_i ∈ {0,1},   (1)

where h_i ∈ H is the set of all variables defined above and C is the set of constraints defined over the activation of these variables. w is the weight vector, used to quantify the feature representation of each h, obtained using a feature function φ(·). Note that each Boolean variable acts as a 0-1 indicator variable. We formalize Eq. (1) as an ILP instance, which we solve using the highly optimized Gurobi toolkit³.

³http://www.gurobi.com/
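As an illustration of this encoding, here is a minimal sketch using the Gurobi Python API (the paper reports solving the ILP with the Gurobi toolkit, but this specific encoding, restricted to a single NRG vertex and single-vertex patterns, with made-up scores standing in for the w^T φ terms, is our own simplification).

```python
# Hypothetical sketch: Eq. (1) as a 0-1 ILP for one NRG vertex with k
# latent categories and single-vertex patterns only. `score` stands in
# for the learned w^T phi(x, h, y) terms; its values here are made up.
import gurobipy as gp
from gurobipy import GRB

k = 3  # number of latent categories for this vertex
score = {("SATIRE", i): 0.1 * i for i in range(k)}
score.update({("REAL", i): 0.2 - 0.1 * i for i in range(k)})

m = gp.Model("comsense")
h = m.addVars(k, vtype=GRB.BINARY, name="h_cat")                  # h_{i,j,X}
p = m.addVars(["SATIRE", "REAL"], k, vtype=GRB.BINARY, name="h_pat")
y = m.addVars(["SATIRE", "REAL"], vtype=GRB.BINARY, name="y")

m.addConstr(h.sum() == 1)        # one category per node (mutual exclusion)
m.addConstr(y.sum() == 1)        # h_SATIRE + h_REAL = 1
for B in ("SATIRE", "REAL"):
    for i in range(k):
        m.addConstr(p[B, i] <= h[i])   # pattern => category assignment
        m.addConstr(p[B, i] <= y[B])   # pattern => agreeing output value

m.setObjective(gp.quicksum(score[B, i] * p[B, i]
                           for B in ("SATIRE", "REAL")
                           for i in range(k)), GRB.MAXIMIZE)
m.optimize()
print("prediction:", [B for B in ("SATIRE", "REAL") if y[B].X > 0.5])
```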
4 Parameter Estimation for COMSENSE

The COMSENSE approach models the decision as interactions between high-level categories of entities, actions and utterances. However, the high level categories assigned to the NRG vertices are not observed, and as a result we view this as a weakly supervised learning problem, where the category assignments correspond to latent variable assignments. We learn the parameters of these assignments using a discriminative latent structure learning framework.

The training data is a collection D = {(x_i, y_i)}_{i=1}^n, where x_i is an article, parsed into an NRG representation, and y_i is a binary label, indicating if the article is satirical or real. Given this data we estimate the model's parameters by minimizing the following objective function:

L_D(w) = min_w (λ/2)||w||² + (1/n) Σ_{i=1}^n ξ_i   (2)

ξ_i is the slack variable, capturing the margin violation penalty for a given training example, and defined as follows:

ξ_i = max_{y,h} [f(x, h, y, w) + cost(y, y_i)] − max_h f(x, h, y_i, w),

where f(·) is a scoring function, similar to the one used in Eq. 1. The cost function is the margin that the true prediction must exceed over the competing label, and it is simply defined as the difference between the model prediction and the gold label. This formulation is an extension of the hinge loss for latent structure SVM. λ is the regularization parameter controlling the tradeoff between the ℓ2 regularizer and the slack penalty.

We optimize this objective using the stochastic sub-gradient descent algorithm (Ratliff et al., 2007; Felzenszwalb et al., 2009). We can compute the sub-gradient as follows:

∇L_D(w) = λw + Σ_{i=1}^n Φ(x_i, y_i, y*)
Φ(x_i, y_i, y*) = φ(x_i, h*, y_i) − φ(x_i, h*, y*),

where φ(x_i, h*, y*) is the set of features representing the solution obtained after solving Eq. 1⁴ and making a prediction. φ(x_i, h*, y_i) is the set of features representing the solution obtained by solving Eq. 1 while fixing the outcome of the inference process to the correct prediction (i.e., y_i). Intuitively, it can be considered as finding the best explanation for the correct label using the latent variables h.

⁴Modified to accommodate the margin constraint.

In the stochastic version of the subgradient descent algorithm we approximate ∇L_D(w) by computing the subgradient of a single example and making a local update. This version resembles the latent-structure perceptron algorithm (Sun et al., 2009). We repeatedly iterate over the training examples and, for each example, if the current w leads to a correct prediction (and satisfies the margin constraint), we only shrink w according to λ. If the model makes an incorrect prediction, the model is updated according to Φ(x_i, y_i, y*). The optimization objective L_D(w) is not convex, and the optimization procedure is guaranteed to converge to a local minimum.
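A minimal sketch of this stochastic update loop follows. The ilp_inference routine, assumed to solve Eq. 1 (optionally cost-augmented, or with the output clamped to the gold label) and to return the predicted label together with the feature vector of the best assignment, as well as all hyper-parameter values, are illustrative, not the paper's.

```python
# Hypothetical sketch of the stochastic sub-gradient update for the
# latent-structure hinge loss of Eq. (2).
import numpy as np

def train(data, ilp_inference, dim, epochs=10, lam=0.01, lr=0.1):
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y_gold in data:
            # loss-augmented prediction: best competing (y, h)
            y_pred, phi_pred = ilp_inference(x, w, cost_augmented=True)
            # best explanation of the gold label using the latent h
            _, phi_gold = ilp_inference(x, w, clamp_y=y_gold)
            w *= (1.0 - lr * lam)        # always shrink (l2 regularizer)
            if y_pred != y_gold:         # margin violated: local update
                w += lr * (phi_gold - phi_pred)
    return w
```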
5 Empirical Study

We design our experimental evaluation to help clarify several questions. First, we want to understand how our model compares with traditional text classification models. We hypothesize that these methods are more susceptible to overfitting, and design our experiments accordingly. We compare the models' performance when using in-domain data (test and training data are from the same source), and out-of-domain data, where the test data is collected from a different source. We look into two tasks. One is the satire detection task (Burfoot and Baldwin, 2009). We also introduce a new task, called "did I say that?", which only focuses on utterances and speakers. The second aspect of our evaluation focuses on the common-sense inferences learned by our model. We examine how the size of the set of categories impacts the model performance. We also provide a qualitative analysis of the learned categories using a heatmap, capturing the activation strength of learned inferences over the training data.

Prediction tasks  We look into two prediction tasks: (1) Satire Detection (denoted SD), a binary classification task, in which the model has access to the complete article; (2) "Did I say that?" (denoted DIST), a binary classification task, consisting only of entity mentions (and their surrounding context in the text) and direct quotes. The goal of DIST is to predict if a given utterance is likely to be real, given its speaker. Since not all documents contain direct quotes, we only use a subset of the documents used in the SD task.

Datasets  In both prediction tasks we look into two settings: (1) in-domain prediction, where the training and test data are collected from the same source, and (2) out-of-domain prediction, where the test data is collected from a different source. We use the data collected by Burfoot and Baldwin (2009) for training the model in both settings, and its test data for in-domain prediction (denoted TRAIN-SD'09, TEST-SD'09, TRAIN-SD'09-DIST, TEST-SD'09-DIST, respectively, for training and testing in the SD and DIST tasks). In addition, we collected a second dataset of satirical and real articles (denoted SD'16). This collection contains real articles from cnn.com and satirical articles from theonion.com, a well known satirical news website. The articles were published between 2010 and 2015, appearing in the political sections of both news websites. Following other work in the field, all datasets are highly skewed toward the negative class (real articles), as it better characterizes a realistic prediction scenario. The statistics of the datasets are summarized in Table 2.

Evaluated Systems  We compare several systems, as follows:

ALLPOS: Always predict Satire.
BB'09: Results by (Burfoot and Baldwin, 2009).
CONV: Convolutional NN; we followed (Kim, 2014), using pre-trained 300-dimensional word vectors (Mikolov et al., 2013).
LEX: SVM with unigram (LEX_U) or both unigram and bigram (LEX_U+B) features.
NARRLEX: SVM with direct NRG-based features (see Sec. 3.2).
COMSENSE: Our model. We denote the full model as COMSENSE_F, and COMSENSE_Q when using only the entity+quote based patterns.

We tuned all the models' hyper-parameters by using a small validation set, consisting of 15% of the training data.
After setting the hyper-parameters, the model was retrained using the entire dataset. We used SVM-light⁵ to train our lexical baseline systems (LEX and NARRLEX). Since the data is highly skewed towards the negative class (REAL), we adjust the learner objective function cost factor for positive examples to outweigh negative examples. The cost factor was tuned using the validation set.

⁵http://svmlight.joachims.org/

5.1 Experimental Results

Since our goal is to identify satirical articles, given significantly more real articles, we report the F-measure of the positive class. The results are summarized in Tables 1 and 3. We can see that in all cases the COMSENSE model obtains the best results. We note that in both tasks, when learning in the out-of-domain settings performance drops sharply; however, the gap between the COMSENSE model and other models increases in these settings, showing that it is less prone to overfitting compared to other models.

Interestingly, for the satire detection (SD) task, the COMSENSE_Q model performs best in the in-domain setting, and COMSENSE_F gives the best performance in the out-of-domain settings. We hypothesize that this is due to a phenomenon we call "overfitting to document structure". Lexical models tend to base the decision on word choices specific to the training data, and as a result, when tested on out-of-domain data, which describes new events and entities, performance drops sharply. Instead, the COMSENSE_Q model focuses on properties of quotations and entities appearing in the text. In the SD'09 datasets, this information helps focus the learner, as the real and satire articles are structured differently (for example, satire articles frequently contain multiple quotes). This structure is not maintained when working with out-of-domain data, and indeed in these settings the model benefits from the additional information offered by the full model.

Number of Latent Categories  Our COMSENSE model is parametrized with the number of latent categories it considers for each entity, predicate and quote. This hyper-parameter can have a strong influence on the model performance (and running time). Increasing it adds to the model's expressivity, allowing it to learn more complex patterns, but also defines a more complex learning problem (recall our non-convex learning objective function). We focused on the DIST task when evaluating different configurations, as it converged much faster than the full model. Figure 4 plots the model behavior when using different numbers of latent categories. Interestingly, the number of entity categories saturates faster than the number of quote categories. This can be attributed to the limited text describing entities.

[Figure 4: Different numbers of latent categories. F-score on the DIST task (roughly 0.31 to 0.49) as the number of quote categories (Quote Vars, 2 to 6) varies; curves correspond to EV = 1, 2, 3, where EV denotes the number of entity categories used, with a lexical baseline (Lex) shown for reference.]

Visualizing Latent COMSENSE Patterns  Given the assignment to latent categories, our model learns common-sense patterns for identifying satirical and real articles based on these categories. Ideally, these patterns could be extracted directly from the data; however, providing the resources for this additional prediction task is not straightforward. Instead, we view the category assignments as latent variables, which raises the question: what are the categories learned by the model? In this section we provide a qualitative evaluation of these categories and the prediction rules identified by the system, using the heatmap in Figure 5. For simplicity, we focus on the DIST task, which only has categories corresponding to entities and quotes.

(a) Prediction Rules  These patterns are expressed as rules, mapping category assignments to output values. In the DIST task, we consider combinations of entity and quote category pairs, denoted E_i, Q_j in the heatmap.
Table 1: Results for the SD task.

             IN DOMAIN (SD'09 + SD'09)    OUT OF DOMAIN (SD'09 + SD'16)
             P      R      F              P      R      F
ALLPOS       0.063  1      0.118          0.121  1      0.214
BB'09        0.945  0.690  0.798          -      -      -
CONV         0.822  0.531  0.614          0.517  0.310  0.452
LEX_U        0.920  0.690  0.790          0.298  0.579  0.394
LEX_U+B      0.840  0.720  0.775          0.347  0.367  0.356
NARRLEX      0.690  0.590  0.630          0.271  0.425  0.330
COMSENSE_Q   0.839  0.780  0.808          0.317  0.706  0.438
COMSENSE_F   0.853  0.700  0.770          0.386  0.693  0.496

Table 2: Dataset statistics.

Data               REAL   SATIRE
TRAIN-SD'09        2505   133
TEST-SD'09         1495   100
TEST-SD'16         3117   433
TRAIN-SD'09-DIST   1160   112
TEST-SD'09-DIST    680    85
TEST-SD'16-DIST    1964   362

[Figure 5: Visualization of the categories learned by the model. Top (red): rule activation strength for each entity/quote category pair (E0-E1 × Q0-Q2) under SATIRE and REAL predictions. Bottom left (blue): activation strength of quote topics (profanity, drugs, polite, science, legal, politics, controversy) for each category pair. Bottom right (green): activation strength of entity topics (president, liberal, conservative, anonymous, politics, speaker, law enforcement). Darker colors indicate higher values. E_i (Q_i) indicates an entity (quote) variable assigned the i-th category.]

The top part of Figure 5, in red, shows the activation strength of each of the category combinations when making predictions over the training data. Darker colors correspond to larger values, which were computed as:

cell(C_E, C_Q, B) = Σ_j h^B_{(h_{C_E,j,E}),(h_{C_Q,j,Q})} / Σ_{j,k,l} h^B_{(h_{k,j,E}),(h_{l,j,Q})}

Intuitively, each cell value in Figure 5 is the number of times each category pattern appeared in REAL or SATIRE output predictions, normalized by the overall number of pattern activations for each output. We assume that different patterns will be associated with satirical and real articles, and indeed we can see that most entities and quotes appearing in REAL articles fall into a distinctive category pattern, E0, Q0. Interestingly, there is some overlap between the two predictions in the most active SATIRE category (E1, Q0). We hypothesize that this is due to the fact that the two article types have some overlap.
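A small sketch of this normalization follows, assuming the pattern activations have been logged as (entity category, quote category, output) triples during prediction; the triple format is our own illustration.

```python
# Hypothetical sketch: heatmap cell values, i.e. the count of activations
# of each (entity category, quote category) pattern under each output
# label, normalized by the total activations for that label.
from collections import Counter

def heatmap_cells(activations):
    """activations: iterable of (e_cat, q_cat, label) triples, one per
    activated two-vertex pattern variable, e.g. (1, 0, "SATIRE")."""
    counts = Counter(activations)
    totals = Counter(label for _, _, label in activations)
    return {(e, q, label): n / totals[label]
            for (e, q, label), n in counts.items()}

cells = heatmap_cells([(0, 0, "REAL"), (0, 0, "REAL"), (1, 0, "SATIRE")])
print(cells[(0, 0, "REAL")])  # 1.0: all REAL activations fall in E0, Q0
```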
Table 3: Results for the DIST task.

             IN DOMAIN (DIST'09 + DIST'09)   OUT OF DOMAIN (DIST'09 + DIST'16)
             P      R      F                 P      R      F
ALLPOS       0.110  1      0.198             0.155  1      0.268
LEX_U        0.837  0.423  0.561             0.407  0.328  0.363
COMSENSE_Q   0.712  0.553  0.622             0.404  0.561  0.469

(b) Associating topic words with learned categories  In order to understand the entity and quote categories emerging from the training phase, we look at the activation strength of each category pattern with respect to a set of topic words. We manually identified a set of entity types and quote topics which are likely to appear in political articles, and we associate a list of words with each one of these types. For example, the entity topic PRESIDENT was associated with words such as president, vice-president, Obama, Biden, Bush, Clinton. Similarly, we associated with the quote topic PROFANITY a list of profanity words. We associate seven types with quote categories, corresponding to style and topic, namely PROFANITY, DRUGS, POLITENESS, SCIENCE, LEGAL, POLITICS, CONTROVERSY, and another set of seven types with entity categories, namely PRESIDENT, LIBERAL, CONSERVATIVE, ANONYMOUS, POLITICS, SPEAKER, LAW ENFORCEMENT.

In the bottom left part of Figure 5 (in blue), we show the activation strength of each category with respect to the set of selected quote topics. Intuitively, we count the number of times the words associated with a given topic appeared in the text span corresponding to a category assignment pair, separately for each output prediction. We normalize this value by the total number of topic word occurrences, over all category assignment pairs. Note that we only look at the text spans corresponding to quote vertices in the NRG. We provide a similar analysis for entity categories in the bottom right part of Figure 5 (in green), showing the activation strength of each category with respect to the set of selected entity topic words. As can be expected, profanity words are only associated with satirical categories, and even more interestingly, when words appear in both satirical and real predictions, they tend to fall into different categories. For example, topic words related to DRUGS can appear in real articles discussing alcohol and drug policies, but they also appear in satirical articles portraying politicians using these substances. While these are only qualitative results, we believe they provide strong intuitions for future work, especially considering the fact that the activation values do not rely on direct supervision, and only reflect the common-sense patterns emerging from the learned model.

6 Summary and Future Work

In this paper we presented a latent variable model for satire detection. We followed the observation that satire detection is inherently a semantic task, and modeled the common-sense inferences required for it using a latent variable framework. We designed our experiments specifically to examine if our model can generalize better than unstructured lexical models by testing it on out-of-domain data. Our experiments show that in these challenging settings, the performance gap between our approach and the unstructured models increases, demonstrating the effectiveness of our approach.

In this paper we restricted ourselves to a limited narrative representation. In the future we intend to study how to extend this representation to capture more nuanced information. Learning common-sense representations for prediction problems has considerable potential for NLP applications. As the NLP community considers increasingly challenging tasks focusing on semantic and pragmatic aspects, the importance of finding such common-sense representations will increase. In this paper we demonstrated the potential of common-sense representations for one application. We hope these results will serve as a starting point for other studies in this direction.

References

Gabor Angeli and Christopher D. Manning. 2014. NaturalLI: Natural logic inference for common sense reasoning. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Jonathan Berant, Vivek Srikumar, Pei-Chun Chen, Abby Vander Linden, Brittany Harding, Brad Huang, Peter Clark, and Christopher D. Manning. 2014. Modeling biological processes for reading comprehension. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In Proc. of the National Conference on Artificial Intelligence (AAAI).
Clint Burfoot and Timothy Baldwin. 2009. Automatic satire detection: Are you having a laugh? In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL).

David K. Elson. 2012. DramaBank: Annotating agency in narrative discourse. In Proc. of the International Conference on Language Resources and Evaluation (LREC).

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. 2009. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99(1).

Elena Filatova. 2012. Irony and sarcasm: Corpus generation and analysis using crowdsourcing. In Proc. of the International Conference on Language Resources and Evaluation (LREC).

Matt Gerber, Andrew S. Gordon, and Kenji Sagae. 2010. Open-domain commonsense reasoning using discourse relations from a corpus of weblog stories. In Proc. of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading.

Roberto González-Ibáñez, Smaranda Muresan, and Nina Wacholder. 2011. Identifying sarcasm in Twitter: a closer look. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Andrew S. Gordon, Cosmin Adrian Bejan, and Kenji Sagae. 2011. Commonsense causal reasoning using millions of personal stories. In Proc. of the National Conference on Artificial Intelligence (AAAI).

Andrew S. Gordon, Zornitsa Kozareva, and Melissa Roemmele. 2012. SemEval-2012 task 7: Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In Proc. of the Sixth International Workshop on Semantic Evaluation.

Amit Goyal, Ellen Riloff, and Hal Daumé III. 2010. Automatically producing plot unit representations for narrative text. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Jerry R. Hobbs, Mark Stickel, Paul Martin, and Douglas Edwards. 1988. Interpretation as abduction. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Jihen Karoui, Farah Benamara, Véronique Moriceau, Nathalie Aussenac-Gilles, and Lamia Hadrich Belguith. 2015. Towards a contextual pragmatic model to detect irony in tweets. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Igor Labutov and Hod Lipson. 2012. Humor as circuits in semantic networks. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Wendy G. Lehnert. 1981. Plot units and narrative summarization. Cognitive Science, 5(4):293–331.

Douglas B. Lenat. 1995. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38.

Hugo Liu and Push Singh. 2004. ConceptNet: a practical commonsense reasoning tool-kit. BT Technology Journal, 22(4):211–226.

Peter LoBue and Alexander Yates. 2011. Types of common-sense knowledge needed for recognizing textual entailment. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Stephanie Lukin and Marilyn Walker. 2013. Really? Well. Apparently bootstrapping improves the performance of sarcasm and nastiness classifiers for online dialogue. In Proc. of the Workshop on Language Analysis in Social Media.

Inderjeet Mani. 2012. Computational Modeling of Narrative. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

André F. T. Martins, Noah A. Smith, and Eric P. Xing. 2009. Polyhedral outer approximations with application to natural language parsing. In Proc. of the International Conference on Machine Learning (ICML).

J. McCarthy. 1980. Circumscription: a form of non-monotonic reasoning. Artificial Intelligence, 13(1,2).

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In The Conference on Advances in Neural Information Processing Systems (NIPS).
Tim O'Keefe, Silvia Pareti, James R. Curran, Irena Koprinska, and Matthew Honnibal. 2012. A sequence labelling approach to quote attribution. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.

V. Punyakanok, D. Roth, and W. Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2).

Nathan D. Ratliff, J. Andrew Bagnell, and Martin Zinkevich. 2007. (Approximate) subgradient methods for structured prediction. In Proc. of the International Conference on Artificial Intelligence and Statistics (AISTATS).

Raymond Reiter. 1980. A logic for default reasoning. Artificial Intelligence, 13(1):81–132.

Antonio Reyes, Paolo Rosso, and Tony Veale. 2013. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation, 47(1):239–268.

Matthew Richardson, Christopher J. C. Burges, and Erin Renshaw. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In Proc. of the Conference on Empirical Methods for Natural Language Processing (EMNLP).

Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In The Conference on Advances in Neural Information Processing Systems (NIPS).

Xu Sun, Takuya Matsuzaki, Daisuke Okanohara, and Junichi Tsujii. 2009. Latent variable perceptron algorithm for structured classification. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI).

Niket Tandon, Gerard De Melo, and Gerhard Weikum. 2011. Deriving a web-scale common sense fact database. In Proc. of the National Conference on Artificial Intelligence (AAAI).

Joseph Tepperman, David R. Traum, and Shrikanth Narayanan. 2006. "Yeah right": Sarcasm recognition for spoken dialogue systems. In Proc. of Interspeech.

Byron C. Wallace, Do Kook Choe, Laura Kertz, and Eugene Charniak. 2014. Humans require context to infer ironic intent (so computers probably do, too). In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Byron C. Wallace, Do Kook Choe, and Eugene Charniak. 2015. Sparse, contextually informed models for irony detection: Exploiting user communities, entities and sentiment. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).

Hai Wang, Mohit Bansal, Kevin Gimpel, and David McAllester. 2015. Machine comprehension with syntax, frames, and semantics. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).