Transactions of the Association for Computational Linguistics, vol. 3, pp. 117–129, 2015. Action Editor: Hal Daumé III.

Submission batch: 10/2014; Revision batch: 1/2015; Published 2/2015. © 2015 Association for Computational Linguistics.


Exploiting Parallel News Streams for Unsupervised Event Extraction

Congle Zhang, Stephen Soderland & Daniel S. Weld
Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA
{clzhang,soderlan,weld}@cs.washington.edu

Abstract

Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as FreeBase rarely cover event relations (e.g. "person travels to location"). Thus, the problem of extracting a wide range of events, e.g., from news streams, is an important, open challenge. This paper introduces NEWSSPIKE-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NEWSSPIKE-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NEWSSPIKE-RE generates high quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas.

1 Introduction

Relation extraction, the process of extracting structured information from natural language text, grows increasingly important for Web search and question answering. Traditional supervised approaches, which can achieve high precision and recall, are limited by the cost of labeling training data and are unlikely to scale to the thousands of relations on the Web. Another approach, distant supervision (Craven and Kumlien, 1999; Wu and Weld, 2007), creates its own training data by matching the ground instances of a knowledge base (KB) (e.g. Freebase) to the unlabeled text.

Unfortunately, while distant supervision can work well in some situations, the method is limited to relatively static facts (e.g., born-in(person, location) or capital-of(location, location)) where there is a corresponding knowledge base. But what about dynamic event relations (also known as fluents), such as travel-to(person, location) or fire(organization, person)? Since these time-dependent facts are ephemeral, they are rarely stored in a pre-existing KB. At the same time, knowledge of real-time events is crucial for making informed decisions in fields like finance and politics. Indeed, news stories report events almost exclusively, so learning to extract events is an important open problem.

This paper develops a new unsupervised technique, NEWSSPIKE-RE, to both discover event relations and extract them with high precision. The intuition underlying NEWSSPIKE-RE is that the texts of articles from two different news sources are not independent, since they are each conditioned on the same real-world events. By looking for rarely described entities that suddenly "spike" in popularity on a given date, one can identify paraphrases. Such temporal correspondence (Zhang and Weld, 2013) allows one to cluster diverse sentences, and the resulting clusters may be used to form training data in order to learn event extractors. Furthermore, one can also exploit parallel news to obtain direct negative evidence. To see this, suppose one day the news includes the following: (a) "Snowden travels to Hong Kong, off southeastern China." (b) "Snowden cannot stay in Hong Kong as Chinese officials will not allow." Since news stories are usually coherent, it is highly unlikely that travel to and stay in (which is negated) are synonymous. By leveraging such direct negative phrases, we can learn extractors capable of distinguishing heavily co-occurring but semantically different phrases, thereby avoiding many extraction errors. Our NEWSSPIKE-RE system encapsulates these intuitions in a novel graphical model making


the following contributions:

• We develop a method to discover a set of distinct, salient event relations from news streams.
• We describe an algorithm to exploit parallel news streams to cluster sentences that belong to the same event relations. In particular, we propose the temporal negation heuristic to avoid conflating co-occurring but non-synonymous phrases.
• We introduce a probabilistic graphical model to generate training for a sentential event extractor without requiring any human annotations.
• We present detailed experiments demonstrating that the event extractors, learned from the generated training data, significantly outperform several competitive baselines; e.g., our system more than doubles the area under the micro-averaged PR curve (0.80 vs. 0.30) compared to Riedel's Universal Schema (Riedel et al., 2013).

2 Previous Work

Supervised learning approaches have been widely developed for event extraction tasks such as MUC-4 and ACE. They often focus on a hand-crafted ontology and train the extractor with manually created training data. While they can offer high precision and recall, they are often domain-specific (e.g. biological events (Riedel et al., 2011; McClosky et al., 2011) and entertainment events (Benson et al., 2011; Reichart and Barzilay, 2012)), and are hard to scale over the events on the Web.

Open IE systems extract open domain relations (e.g. (Banko et al., 2007; Fader et al., 2011)) and events (e.g. (Ritter et al., 2012)). They often perform self-supervised learning of relation-independent extractions. This allows them to scale, but makes them unable to output canonicalized relations.

Distant supervised approaches have been developed to learn extractors by exploiting the facts existing in a knowledge base, thus avoiding human annotation. Wu et al. (2007) and Reschke et al. (2014) learned Infobox relations from Wikipedia, while Mintz et al. (2009) heuristically matched Freebase facts to texts. Since the training data generated by the heuristic matching is often imperfect, multi-instance learning approaches (Riedel et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012) have been developed to combat this problem. Unfortunately, most facts existing in the KBs are static facts like geographical or biographical data. They fall short of learning extractors for fluent facts such as sports results or travel and meetings by a person.

Bootstrapping is another common extraction technique (Brin, 1999; Agichtein and Gravano, 2000; Carlson et al., 2010; Nakashole et al., 2011; Huang and Riloff, 2013). This typically takes a set of seeds as input, which can be ground instances or key phrases. The algorithms then iteratively generate more positive instances and phrases. While there are many successful examples of bootstrapping, the challenge is to avoid semantic drift. Large-scale systems, therefore, often require extra processing such as manual validation between the iterations or additional negative seeds as the input.

Unsupervised approaches have been developed for relation discovery and extraction. These algorithms are usually based on some clustering assumptions over a large unlabeled corpus. Common assumptions include the distributional hypothesis used by (Hasegawa et al., 2004; Shinyama and Sekine, 2006), the latent topic assumption by (Yao et al., 2012; Yao et al., 2011), and the low rank assumption by (Takamatsu et al., 2011; Riedel et al., 2013). Since the assumptions largely rely on co-occurrence, previous unsupervised approaches tend to confuse correlated but semantically different phrases during extraction. In contrast to this, our work largely avoids these errors by exploiting the temporal negation heuristic in parallel news streams. In addition, unlike many unsupervised algorithms requiring human effort to canonicalize the clusters, our work automatically discovers events with readable names.

Paraphrasing techniques inspire our work. Some techniques, such as DIRT (Lin and Pantel, 2001) and Resolver (Yates and Etzioni, 2009), are based on the distributional hypothesis. Another common approach is to use parallel corpora, including news streams (Barzilay and Lee, 2003; Dolan et al., 2004; Zhang and Weld, 2013), multiple translations of the same story (Barzilay and McKeown, 2001) and bilingual sentence pairs (Ganitkevitch et al., 2013) to generate the paraphrases. Although these algorithms create many good paraphrases, they cannot be directly used to generate enough training data to train a relation extractor for two reasons: first, the semantics of the paraphrases is often context dependent; second, the generated paraphrases are often in


Figure 1: During its training phase, NEWSSPIKE-RE first groups parallel sentences as NewsSpikes. Next, the system automatically discovers a set of event relations. Then, a probabilistic graphical model clusters sentences from the NewsSpikes as training data for each discovered relation, which is used to learn sentential event extractors. During the testing phase, the extractor takes test sentences as input and predicts event extractions.

small clusters, and it remains challenging to merge them for the purpose of training an extractor. Our work extends previous paraphrasing techniques, notably that of Zhang and Weld (2013), but we focus on generating high-quality, positive and negative training sentences for the discovered events in order to learn extractors with high precision and recall.

3 System Overview

News articles report an enormous number of events every day. Our system, NEWSSPIKE-RE, aligns parallel news streams to identify and extract these events as shown in Figure 1. NEWSSPIKE-RE has both training and test phases. Its training phase has two main steps: event-relation discovery and training-set generation. Section 4 describes our event relation discovery algorithm, which processes time-stamped news articles to discern a set of salient, distinct event relations in the form of E = e(t1, t2), where e is a representative event phrase and the ti are types of the two arguments. NEWSSPIKE-RE generates the event phrases using an Open Information Extraction (IE) system (Fader et al., 2011), and uses a fine-grained entity recognition system, FIGER (Ling and Weld, 2012), to generate type descriptors such as "company", "politician", and "medical treatment".

The second part of NEWSSPIKE-RE's training phase, described in Section 5, is a method for building extractors for the discovered event relations. Our approach is motivated by the intuition, adapted from Zhang and Weld (2013), that articles from different news sources typically use different sentences to describe the same event, and that corresponding sentences can be identified when they mention a unique pair of real-world entities. For example, when an unusual entity pair (Selena, Norway) is suddenly seen in three articles on a single day:

Selena traveled to Norway to see her ex-boyfriend.
Selena arrived in Norway for a rendezvous with Justin.
Selena's trip to Norway was no coincidence.

It is likely that all three refer to the same event relation, travel-to(person, location),1 and can be used as positive training examples for the relation. As in Zhang & Weld (2013), we group parallel sentences sharing the same argument pair and date in a structure called a NewsSpike. However, we include all sentences mentioning the arguments (e.g. Selena's trip to Norway) in the NewsSpike (not just those yielding Open IE extractions), and use the lexicalized dependency path between the arguments (e.g. <-[poss]-trip-[prep-to]->2) as the event phrase. In this way, we can generalize extractors beyond the scope of Open IE. Formally, a NewsSpike is a tuple (a1, a2, d, S), where a1 and a2 are arguments (e.g. Selena), d is a date, and S is a set of argument-labeled sentences {(s, a1, a2, p), ...} in which s is a sentence with arguments ai and event phrase p.

It is important that non-synonymous sentences like "Selena stays in Norway" be excluded from the training data for travel-to(person, location), even if a travel-to event did apply to that argument pair. In order to select only the synonymous sentences, we develop a probabilistic graphical model, described in Section 5.2, to accurately assign sentences from NewsSpikes to each discovered event relation E. Given this annotated data, NEWSSPIKE-RE trains extractors using a multi-class logistic regression classifier.

During the testing phase, NEWSSPIKE-RE accepts arbitrary sentences (no date-stamp required), uses FIGER to identify possible arguments, and uses the classifier to predict which events (if any) hold between an argument pair. We describe the extraction process in Section 6.

1 For clarity in the paper, we refer to this relation as travel-to, even though the phrase arrive in is actually more frequent and is selected as the name of this relation by our event discovery algorithm, as shown in Table 2.
2 This dependency path will be referred to as "'s trip to".
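The NewsSpike grouping described above can be sketched as follows. This is a minimal illustration, assuming a simple tuple layout for argument-labeled sentences; the function and variable names are illustrative, not the authors' code.

```python
from collections import defaultdict

def build_newsspikes(sentences):
    """Group argument-labeled sentences into NewsSpikes keyed by
    (a1, a2, date), mirroring the tuple (a1, a2, d, S) above.
    Each input item is (sentence, a1, a2, date, event_phrase)."""
    spikes = defaultdict(list)
    for s, a1, a2, d, p in sentences:
        spikes[(a1, a2, d)].append((s, a1, a2, p))
    # A NewsSpike is only interesting when several sources mention
    # the same argument pair on the same day.
    return {k: v for k, v in spikes.items() if len(v) >= 2}

corpus = [
    ("Selena traveled to Norway to see her ex-boyfriend.",
     "Selena", "Norway", "2013-06-10", "travel to"),
    ("Selena arrived in Norway for a rendezvous with Justin.",
     "Selena", "Norway", "2013-06-10", "arrive in"),
    ("Selena's trip to Norway was no coincidence.",
     "Selena", "Norway", "2013-06-10", "'s trip to"),
    ("Obama met with Putin on Tuesday.",
     "Obama", "Putin", "2013-06-10", "meet with"),
]
spikes = build_newsspikes(corpus)
# One spike survives: (Selena, Norway, 2013-06-10) with three parallel sentences.
```

The singleton (Obama, Putin) mention is dropped because a lone sentence provides no parallel evidence.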


Figure 2: A simple example of the edge-cover algorithm with K = 2, where the Ei are event relations and the ηj are NewsSpikes. The optimal solution selects E1 with edges to η1 and η2, and E3 with an edge to η3. These two event relations cover all the NewsSpikes.

Note that NEWSSPIKE-RE is an unsupervised algorithm that requires no manual labelling of the training instances. Like distant supervision, the key is to automatically generate the training data, at which point a traditional supervised classifier may be applied to learn an extractor. Because distant supervision creates very noisy annotations, researchers often use specialized learners that model the correctness of a training example with a latent variable (Riedel et al., 2010; Hoffmann et al., 2011), but we found this unnecessary, because NEWSSPIKE-RE creates high quality training data.

4 Discovering Salient Events

The first step of NEWSSPIKE-RE is to discover a set of event relations in the form of E = e(t1, t2), where e is an event phrase, and the ti are fine-grained argument types generated by FIGER, augmented with the important types "number" and "money", which are recognized by the Stanford named entity recognition system (Finkel et al., 2005). To be most useful, the discovered event relations should cover salient events that are frequently reported in the news articles. Formally, we say that a NewsSpike η = (a1, a2, d, S) mentions E = e(t1, t2) if the type of ai is ti for each i, and one of its sentences has e as the event phrase between the arguments. To maximize the salience of the events, NEWSSPIKE-RE prefers event relations that are "mentioned" by more NewsSpikes.

In addition, the set of event relations should be distinct. For example, if the relation travel-to(person, location) is already in the set, then visit(person, location) should not be selected as a separate relation. To reduce overlap, discovered event relations should not be mentioned by the same NewsSpike.

Let E be all candidate event relations and N be all NewsSpikes. Our goal is to select the K most salient relations from E, minimizing overlap between relations. We can frame this task as a variant of the bipartite graph edge-cover problem. Let a bipartite graph G have one node Ei for each event relation in E and one node ηj for each NewsSpike in N. There is an edge between Ei and ηj if ηj mentions Ei. The edge-cover problem is to select a largest subset of edges subject to (1) at most K nodes of Ei are chosen, and all edges incident to them are chosen as the covered edges; (2) each node ηj is incident to at most one edge. The first constraint guarantees that there are exactly K event relations discovered; the second constraint ensures that no NewsSpike participates in two event relations. Figure 2 shows the optimized solution of a simple graph with K = 2, which can cover 3 edges with 2 event relations that have no overlapping NewsSpikes.

Since both the objective function and constraints are linear, we can optimize this edge-cover problem with integer linear programming (Nemhauser and Wolsey, 1988). By solving the optimization problem, NEWSSPIKE-RE finds a salient set of event relations incident to the covered edges. The discovered relations with K set to 30 are shown in Table 2 in Section 7. In addition, the covered edges give us the initial mapping between the event types and NewsSpikes, which is used to train the probabilistic model in Section 5.3.

5 Generating the Training Sentences

After NEWSSPIKE-RE has discovered a set of event relations, it then generates training instances to learn an extractor for each relation. In this section, we present our algorithm for generating the training sentences. As shown in Figure 1, the generator takes N NewsSpikes {ηi = (a1i, a2i, di, Si) | i = 1 ... N} and K event relations {Ek = ek(t1k, t2k) | k = 1 ... K} as input. For every event relation Ek, the generator identifies a subset of sentences from ∪i Si expressing the event relation as training sentences. In this section, we first characterize the paraphrased event phrases and the parallel sentences in NewsSpikes. Then we show how to encode this heuristic in a probabilistic graphical model that jointly paraphrases the event phrases and identifies a set of training sentences.

5.1 Exploiting Properties of Parallel News

Previous work (Zhang and Weld, 2013) proposed several heuristics that are useful to find similar sentences in a NewsSpike.
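The edge-cover selection of Section 4 can be sketched as a small search problem. The paper solves it as an integer linear program; for a toy instance the same objective can be evaluated by exhaustive search. All names (`best_edge_cover`, `mentions`) are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

def best_edge_cover(mentions, K):
    """Pick K event relations maximizing the number of NewsSpikes
    covered, where each NewsSpike counts for at most one relation.
    `mentions` maps relation -> set of NewsSpike ids."""
    best, best_score = None, -1
    for chosen in combinations(sorted(mentions), K):
        covered = set()
        for rel in chosen:
            covered |= mentions[rel]  # set union counts each spike once
        if len(covered) > best_score:
            best, best_score = chosen, len(covered)
    return best, best_score

# Toy instance mirroring Figure 2: E1 mentions spikes n1 and n2;
# E2 mentions n1; E3 mentions n3.
mentions = {"E1": {"n1", "n2"}, "E2": {"n1"}, "E3": {"n3"}}
chosen, score = best_edge_cover(mentions, K=2)
# -> ("E1", "E3"), covering all 3 NewsSpikes
```

Choosing (E1, E2) instead would cover only two spikes, since both relations compete for n1; the set union enforces the "at most one edge per NewsSpike" constraint.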


For example, the temporal functionality heuristic says that sentences in a NewsSpike with the same tense tend to be paraphrases. Unfortunately, these methods are too weak to generate enough data for training high quality event extractors: (1) they are "in-spike heuristics" that tend to generate small clusters from individual NewsSpikes; it remains unclear how to merge similar events occurring on different days and between different entities to increase cluster size. (2) they included heuristics to "gain precision at the expense of recall" (e.g. news articles do not state the same fact twice), because it is hard to obtain direct negative phrases inside one NewsSpike. In this paper, we exploit news streams in a cross-spike, global manner to obtain accurate positive and negative signals. This allows us to dramatically improve recall while maintaining high precision.

Our system starts from the basic observation that parallel sentences tend to be coherent. So if a NewsSpike η = (a1, a2, d, S) is an instance of an event relation E = e(t1, t2), the event phrases in its parallel sentences tend to be paraphrases. But sometimes the sentences in the NewsSpike are related but not paraphrases. For example, one day "Snowden will stay in Hong Kong" appears together with "Snowden travels to Hong Kong". Although the fact stay-in(Snowden, Hong Kong) is true, it is harmful to include "Snowden will stay in Hong Kong" in the training for travel-to(person, location).

Detecting paraphrases remains a challenge for most unsupervised approaches because they tend to cluster heavily co-occurring phrases which may turn out to be semantically different or even antonymous. (Zhang and Weld, 2013) presented a method to avoid confusion between antonyms and synonyms in NewsSpikes, but did not address the problem of related but different phrases like travel to and stay in in a NewsSpike.

To handle this, our method rests on a simple observation: when you read "Snowden travels to Hong Kong" and "Snowden cannot stay in Hong Kong as Chinese officials do not allow" in the same NewsSpike, it is unlikely that travel to and stay in are synonymous event phrases, because otherwise the two news stories would be describing opposite events. This observation leads to:

Temporal Negation Heuristic. Two event phrases p and q tend to be semantically different if they co-occur in the NewsSpike but one of them is in negated form.

The temporal negation heuristic helps in two ways: (1) it provides some direct negative phrases for the event relations; NEWSSPIKE-RE uses these to heuristically label some variables in the model. (2) It creates some useful features to implement a form of transitivity. For example, if we find that live in and stay in are frequently co-occurring and the temporal negation heuristic tells us that travel to and stay in are not paraphrases, this is evidence that live in is unlikely to be a paraphrase of travel to, even if they are heavily co-occurring.

The following section describes our implementation that uses these properties to generate high quality training. Our goal is the following: a sentence (s, a1, a2, p) from NewsSpike η = (a1, a2, d, S) should be included in the training data for event relation E = e(t1, t2) if the event phrase p is a paraphrase of e and the event relation E happens to the argument pair (a1, a2) at time d.

5.2 Joint Cluster Model

As discussed above, to identify a high quality set of training sentences from NewsSpikes, one needs to combine evidence that event phrases are paraphrases with evidence from NewsSpikes. For this purpose, we define an undirected graphical model to jointly reason about paraphrasing the event phrases and identifying the training sentences from NewsSpikes.

We first list the notation used in this section:

  E        event relation
  p ∈ P    event phrases
  s ∈ Sp   sentences w/ the event phrase p
  Yp       Is p a paraphrase for E?
  Zsp      Is s w/ p good training for E?
  Φ        factors

Let P be the union of all the event phrases from every NewsSpike. For each p ∈ P, let Sp be the set of sentences having p as its event phrase. Figure 3(a) shows the model in plate form. There are two kinds of random variables, corresponding to phrases and sentences, respectively. For each event relation E = e(t1, t2), there exists a connected component for every event phrase p ∈ P that models (1) whether p is a paraphrase of e or not (modeled using Boolean phrase variables, Yp); and (2) whether each sentence of Sp is a good training sentence for E (modeled using |Sp| Boolean sentence variables {Zsp | s ∈ Sp}).
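The temporal negation heuristic of Section 5.1 can be sketched as a pairwise check over the event phrases of a single NewsSpike. The `(phrase, negated)` representation is a simplifying assumption; detecting negation in real sentences would require a parser.

```python
def negation_pairs(spike_phrases):
    """Apply the temporal negation heuristic to one NewsSpike.
    `spike_phrases` is a list of (event_phrase, is_negated) pairs;
    a phrase co-occurring with a negated phrase is flagged as
    semantically different from it."""
    pairs = set()
    for p, p_neg in spike_phrases:
        for q, q_neg in spike_phrases:
            if p != q and p_neg != q_neg:  # exactly one side is negated
                pairs.add(frozenset((p, q)))
    return pairs

# One NewsSpike: "travels to", "cannot stay in" (negated), "arrives in".
spike = [("travel to", False), ("stay in", True), ("arrive in", False)]
diff = negation_pairs(spike)
# "stay in" is marked different from both other phrases, while
# "travel to" / "arrive in" remain candidate paraphrases.
```

These flagged pairs supply the direct negative phrases and the transitivity-style features described above.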


Figure 3: (a) The connected components depicted as a plate model, where each Y is a Boolean variable for a relation phrase and each Z is a Boolean variable for a training sentence with that phrase; (b) and (c) are example connected components for the event phrases 's trip to and stay in respectively. The goal of the model is to set Y = 1 for good paraphrases of a relation and to set Z = 1 for good training sentences.

Intuitively, the goal of the model is to find the set of good training sentences, with Zsp = 1. The union of such sentences over the different phrases, ∪p {s | Zsp = 1}, defines the training sentences for the event. Figure 3(b) and 3(c) show two example connected components for the event phrases 's trip to and stay in respectively.

Now, we can define the joint distribution over the event phrases and the sentences. The joint distribution is a function defined on factors that encode our observations about NewsSpikes as features and constraints. The phrase factor Φphrase is a log-linear function attaching to Yp with the paraphrasing features, such as whether p and e co-occur in the NewsSpikes, or whether p shares the same head word with e. They are used to distinguish whether p is a good event phrase.

A sentence should not be identified as a good training sentence if it does not contain a positive event phrase. For example, if Ystayin in Figure 3(b) takes the value of 0, then all sentences with the event phrase stay in should also take the value of 0. We implement this constraint with a joint factor Φjoint among the Yp and Zsp variables.

In addition, good training sentences occur when the NewsSpike is an event instance. To encode this observation, we need to featurize the NewsSpikes and let them bias the assignments. Our model implements this with two types of log-linear factors: (1) the unary in-spike factor Φin depends on the sentence variables and contains features about the corresponding NewsSpike. The factor is used to distinguish whether the NewsSpike is an instance of e(t1, t2), such as whether the argument types of the NewsSpike match the designated types t1, t2; (2) the pairwise cross-spike factors Φcross connect pairs of sentences. This uses features such as whether the pair of NewsSpikes for the two sentences have high textual similarity, and whether the two NewsSpikes contain negated event phrases.

We define the joint distribution for the connected component for p as follows. Let Z be the vector of sentence variables, and let x be the features. The joint distribution is:

p(Y = y, Z = z | x; Θ) def= (1/Zx) Φphrase(y, x) × Φjoint(y, z) × ∏s Φin(zs, x) × ∏s,s′ Φcross(zs, zs′, x)

where the parameter vector Θ is the weight vector of the features in Φin and Φcross, which are log-linear functions. The joint factor Φjoint is zero when Yp = 0 but some Zsp = 1; otherwise, it is set to 1. We use integer linear programming to perform MAP inference on the model, finding the predictions y, z that maximize the probability.

5.3 Learning from Heuristic Labels

We now present the learning algorithm for our joint cluster model. The goal of the learning algorithm is to set Θ for the log-linear functions in the factors in a way that maximizes the likelihood estimation. We do this in a totally unsupervised manner, since manual annotation is expensive and not scalable to large numbers of event relations. The weights are learned in three steps: (1) NEWSSPIKE-RE creates a set of heuristic labels for a subset of variables in the graphical model; (2) it uses the heuristic labels as supervision for the model; (3) it updates Θ with the perceptron learning algorithm. The weights are used to infer the values of the variables that don't have heuristic labels. The procedure is summarized in Figure 4.
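The infer-and-penalize loop of these three steps can be sketched as a structured-perceptron update on one toy component. Inference here is a brute-force argmax over candidate assignments rather than the paper's ILP, and all names (`perceptron_step`, `feats`) are illustrative assumptions.

```python
def perceptron_step(weights, feats, gold, candidates, lr=1.0):
    """One update of the perceptron-style learning of Section 5.3:
    run MAP inference under the current weights, then move the weights
    toward the heuristically labeled (gold) assignment and away from
    the prediction. `feats(a)` maps an assignment to a feature dict."""
    def score(a):
        return sum(weights.get(f, 0.0) * v for f, v in feats(a).items())
    pred = max(candidates, key=score)          # MAP inference step
    if pred != gold:                           # penalize wrong inference
        for f, v in feats(gold).items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in feats(pred).items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights

# Toy component: assignment 1 (phrase is a paraphrase) fires "cooccur";
# the heuristic label says it should win, but the initial weights prefer 0.
feats = lambda a: {"cooccur": 1.0} if a == 1 else {"bias": 1.0}
w = perceptron_step({"bias": 0.5}, feats, gold=1, candidates=[0, 1])
# After one update, the gold assignment outscores the alternative.
```

Iterating this step over all heuristically labeled components drives Θ toward weights whose MAP assignments agree with the labels.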


Input: NewsSpikes and the connected components of the model;
Heuristic Labels:
  1. find positive and negative phrases and sentences P+, P−, S+, S−;
  2. label the connected components accordingly and create {(Y_i^label, Z_i^label) | i = 1 ... M}.
Learning: Update Θ with the perceptron learning algorithm.
Output: the values of all variables in the connected components with the MAP inference.

Figure 4: Learning from Heuristic Labels

For each event relation E = e(t1, t2), NEWSSPIKE-RE creates heuristic labels as follows: (1) P+: the temporal functionality heuristic (Zhang and Weld, 2013) says that if an event phrase p co-occurs with e in the NewsSpikes, it tends to be a paraphrase of e. We add the most frequently co-occurring event phrases to P+. P+ also includes e itself. (2) P−: the temporal negation heuristic says that if p and e co-occur in the NewsSpike but one of them is in its negated form, p should be negatively labeled. We add those event phrases to P−. If a phrase p appears in both P+ and P−, we remove it from both sets. (3) S+: we first get the positive NewsSpikes from the solution of the edge-cover problem in Section 4. We treat the NewsSpike η as positive if the edge between η and E is covered. Next, every sentence with p ∈ P+ is added into S+. (4) S−: since the event relations discovered in Section 4 tend to be distinct relations, a sentence is treated as a negative sentence for E if it is heuristically labeled as positive for E′ ≠ E. In addition, S− includes all sentences with p ∈ P−.

With P+, P−, S+, S−, we define the heuristically labeled set to be {(Y_i^label, Z_i^label) | i = 1 ... M}, where M is the number of connected components with corresponding event phrases p ∈ P+ ∪ P−; Y_i^label = 1 if p ∈ P+ and Y_i^label = 0 if p ∈ P−. Z_i is labeled similarly, but note that if a sentence in the connected component doesn't exist in S+ ∪ S−, NEWSSPIKE-RE doesn't include the corresponding variable in Z_i^label. With {(Y_i^label, Z_i^label) | i = 1 ... M}, learning can be done with maximum likelihood estimation as L(Θ) = log ∏i p(Y_i = y_i^label, Z_i = z_i^label | x_i; Θ). Following (Collins, 2002), we use a fast perceptron learning approach to update Θ. It consists of iterating two steps: (1) MAP inference given the current weights; (2) penalizing the weights if the inferred assignments are different from the heuristically labeled assignments.

6 Sentential Event Extraction

As shown in Figure 1, we learn the extractors from the generated training sentences. Note that most distantly supervised approaches (Hoffmann et al., 2011; Surdeanu et al., 2012) use multi-instance, aggregate-level training (i.e. the supervision comes from labeled sets of instances instead of individually labeled sentences). Coping with the noise inherent in these multi-instance bags remains a big challenge for distant supervision. In contrast, our sentence-level training data is more direct and minimizes noise. Therefore, we implement the event extractor as a simple multi-class, L2-regularized logistic regression classifier.

For features of the classifier, we use the lexicalized dependency paths, the Open IE phrases, the minimal subtree of the dependency parse and the bag-of-words between the arguments. We also augment them with fine-grained argument types produced by FIGER (Ling and Weld, 2012). The event extractor that is learned can take individual test sentences (s, a1, a2) as input and predict whether that sentence expresses the event between (a1, a2).

7 Empirical Evaluation

Our evaluation addresses two questions. Section 7.2 considers whether our training generation algorithm identifies accurate and diverse sentences. Then, Section 7.3 investigates whether the event extractor, learned from the training sentences, outperforms other extraction approaches.

7.1 Experimental Setup

We follow the procedure described in (Zhang and Weld, 2013) to collect parallel news streams and generate the NewsSpikes: first, we get news seeds and query the Bing newswire search engine to gather additional, time-stamped news articles on a similar topic; next, we extract Open IE tuples from the news articles and group the sentences that share the same arguments and date into NewsSpikes. We collected the news stream corpus from March 1st 2013 to July 1st 2014. We split the dataset into two parts: in the training phase, we use the news streams in 2013 (named NS13) to generate the training sentences. NS13 has 33k NewsSpikes containing 173k sentences. We evaluated the extraction performance on news articles collected in 2014 (named NS14).
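The sentence-level feature map of the classifier in Section 6 can be sketched minimally. Only the bag-of-words family is shown; the real system also uses dependency paths, Open IE phrases, and FIGER types, and the function and feature names below are illustrative assumptions.

```python
def extract_features(tokens, a1, a2):
    """Toy feature map for a sentential extractor input (s, a1, a2):
    bag-of-words between the two arguments, plus the argument order."""
    i, j = tokens.index(a1), tokens.index(a2)
    lo, hi = min(i, j), max(i, j)
    feats = {"bow=" + w: 1.0 for w in tokens[lo + 1:hi]}
    feats["arg_order=" + ("a1_first" if i < j else "a2_first")] = 1.0
    return feats

feats = extract_features(
    ["Selena", "traveled", "to", "Norway", "yesterday"], "Selena", "Norway")
# -> {"bow=traveled": 1.0, "bow=to": 1.0, "arg_order=a1_first": 1.0}
```

Such sparse dictionaries can then be fed to any multi-class L2-regularized logistic regression implementation.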


In this way, we make sure the test sentences are unseen during training. There are 15 million sentences in NS14. We randomly sample 100k unique sentences having two different arguments recognized by the named entity recognition system.

For our event discovery algorithm, we set the number of event relations to be 30 and ran the algorithm on NS13. The algorithm takes 6 seconds to run on a 2.3GHz CPU. Note that most previous unsupervised relation discovery algorithms require additional manual post-processing to assign names to the output clusters. In contrast, NEWSSPIKE-RE discovers the event relations fully automatically and the output is self-explanatory. We list them together with the by-event extraction performance in Table 2. From the table, we can see that most of the discovered event relations are salient with little overlap between relations.

While we arbitrarily set K to 30 in our experiments, there is no inherent limit to the number of relation phrases as long as the news corpus provides sufficient support to learn an extractor for each relation. In future, we plan to explore much larger sets of event relations to see if the extraction accuracy is maintained.

The joint cluster model that identifies training sentences for each event relation E = e(t1, t2) uses cosine similarity between the event phrase p of a sentence and the canonical phrases of each relation as features in the phrase factors in Figure 3(a). It also includes the cosine similarity between p and a set of "anti-phrases" for the event relation which are recognized by the temporal negation heuristic. For the in-spike factor, we measure whether the fine-grained argument types of the sentence returned from the FIGER system match the required ti respectively. In addition, we implement the features from (Zhang and Weld, 2013) to measure whether the sentence is describing the event of the NewsSpike. For the cross-spike factors, we use textual similarity features between the two sets of parallel sentences to measure the distance between the pair of NewsSpikes.

7.2 Quality of the Generated Training Set

The key to a good learning system is a high-quality training set. In this section, we compare our joint model against pipeline systems that consider paraphrases and argument type matching sequentially, based on the following paraphrasing techniques.

system       |    all: #  mi.  ma. | diverse: #  mi.  ma.
Basic        |  43,718   .50  .62  |  12,701    .38  .51
Yates09      |  15,212   .78  .76  |     586    .48  .50
Ganit13      |  14,420   .74  .71  |   1,210    .53  .53
Zhang13      |  14,804   .76  .75  |     890    .63  .61
NEWSSPIKE-RE |  20,105   .88  .89  |   2,156    .71  .72
w/o cross    |  16,463   .86  .86  |   1,883    .67  .69
w/o neg      |  33,548   .76  .81  |   4,019    .64  .68

Table 1: Quality of the generated training sentences (count, micro- and macro-accuracy), where "all" includes sentences with all event phrases and "diverse" are those with distinct event phrases.

Basic is based on the temporal functionality heuristic of (Zhang and Weld, 2013). It treats all event phrases appearing in the same NewsSpike as paraphrases. Yates09 uses Resolver (Yates and Etzioni, 2009) to create clusters of phrases. Resolver measures the similarity between the phrases by means of both distributional features and textual features. We convert the sentences in NewsSpikes into tuples in the form of (a1, p, a2), and run Resolver on these tuples to generate the paraphrases. Zhang13: We used the generated paraphrase set from (Zhang and Weld, 2013). Ganit13: Ganitkevitch et al. (2013) released a large paraphrase database (PPDB) based on exploiting bilingual parallel corpora. Note that some of these paraphrasing systems do not handle dependency paths, so when p is a dependency path, we use the surface string between the arguments as the phrase. NewsSpike-RE: We also conduct ablation testing on NEWSSPIKE-RE to measure the effect of the cross-spike factors and the temporal negation heuristic: w/o Cross uses a simpler model by removing the cross-spike factors of NEWSSPIKE-RE; w/o Negation uses the same joint cluster model as NEWSSPIKE-RE but removes the features and the heuristic labels coming from the temporal negation heuristic.

We measured the micro- and macro-accuracy of each system by manually labeling 1000 randomly chosen outputs from each system.3 Annotators read each training sentence, and decided if it was a good example for a particular event. We also report the number of generated sentences. Since the extractor should generalize over sentences with dissimilar expressions, it is crucial to identify sentences with

3 Two Odesk workers were asked to label the dataset; a graduate student then reconciled any disagreements.
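The micro- and macro-accuracy reported in Table 1 can be computed as follows, on hypothetical labels rather than the paper's data.

```python
def micro_macro(labels_by_relation):
    """Micro-accuracy pools all labeled sentences; macro-accuracy
    averages per-relation accuracies, weighting relations equally.
    Labels are 1 (good training sentence) or 0 (bad)."""
    per_rel = [sum(ls) / len(ls) for ls in labels_by_relation.values()]
    pooled = [l for ls in labels_by_relation.values() for l in ls]
    micro = sum(pooled) / len(pooled)
    macro = sum(per_rel) / len(per_rel)
    return micro, macro

# Hypothetical annotations for two relations of unequal size.
micro, macro = micro_macro({
    "travel-to": [1, 1, 1, 0],   # 0.75 on 4 sentences
    "fire":      [1, 0],         # 0.50 on 2 sentences
})
# micro = 4/6 ≈ 0.667, macro = (0.75 + 0.50) / 2 = 0.625
```

The two measures diverge exactly when accuracy varies across relations of different sizes, which is why Table 1 reports both.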


diverse event phrases. Therefore we also measured the accuracy and the count in a "diverse" condition: only considering the subset of sentences with distinct event phrases.

Table 2: Performance of extractors by event relation, reporting both precision and the area under the PR curve. The # column shows the number of true extractions in the pool of sampled output. NEWSSPIKE-RE (labeled N-RE) outperforms two implementations of Riedel's Universal Schemas (see Section 7.3 for details). The advantage of NEWSSPIKE-RE over Universal Schemas is greatest on a diverse test set where each sentence has a distinct event phrase.

Event                                   F1@max recall            area u/ PR curve     area u/ diverse PR curve
                                      #    R13   R13P  N-RE     R13   R13P  N-RE      #    R13   R13P  N-RE
acquire(organization,person)         59   0.34  0.33  0.58     0.26  0.26  0.57      20   0.26  0.17  0.58
arrive in(organization,location)     95   0.11  0.40  0.56     0.01  0.12  0.42      18   0.01  0.02  0.50
arrive in(person,location)          130   0.61  0.86  0.86     0.35  0.67  0.93      18   0.26  0.33  0.80
beat(organization,organization)     178   0.42  0.85  0.90     0.14  0.64  0.84      24   0.06  0.53  0.58
beat(person,person)                 107   0.57  0.82  0.94     0.21  0.53  0.91      14   0.08  0.25  0.77
buy(organization,organization)       84   0.47  0.47  0.78     0.25  0.50  0.82      34   0.19  0.40  0.79
defend(person,person)                41   0.37  0.38  0.52     0.36  0.47  0.65      12   0.13  0.06  0.47
die at(person,number)               158   0.53  0.97  0.98     0.31  0.93  0.97      17   0.33  0.83  0.94
die(person,time)                    179   0.85  0.91  0.97     0.66  0.80  0.96      16   0.22  0.63  0.87
fire(organization,person)            39   0.36  0.33  0.53     0.32  0.45  0.88       8   0.20  0.10  0.66
hit(event,location)                  33   0.00  0.42  0.64     0.00  0.51  0.48      24   0.00  0.45  0.50
lead(person,organization/sports team) 119 0.77  0.86  0.87     0.57  0.73  0.77      14   0.30  0.36  0.62
leave(person,organization)           61   0.40  0.52  0.59     0.14  0.38  0.57      14   0.07  0.13  0.38
meet with(person,person)            137   0.74  0.86  0.92     0.48  0.73  0.88      14   0.28  0.56  0.93
nominate(person/politician,person)   44   0.12  0.38  0.54     0.13  0.44  0.77      27   0.11  0.53  0.75
pay(organization,money)             134   0.77  0.91  0.93     0.52  0.85  0.90      17   0.33  0.90  0.56
place(organization,person)           34   0.17  0.28  0.50     0.24  0.23  0.95      16   0.19  0.21  0.94
play(person/artist,person)          173   0.92  0.89  0.87     0.88  0.79  0.73      15   0.63  0.56  0.47
release(organization,person)         30   0.18  0.22  0.60     0.08  0.25  0.72      16   0.06  0.15  0.81
replace(person,person)              115   0.82  0.89  0.94     0.62  0.75  0.87      18   0.46  0.58  0.89
report(government agency,time)      140   0.37  0.84  0.91     0.09  0.74  0.83      35   0.06  0.52  0.70
report(written work,time)           130   0.64  0.85  0.83     0.43  0.82  0.74      22   0.38  0.58  0.51
return to(person/athlete,location)   45   0.14  0.34  0.50     0.03  0.30  0.49      21   0.08  0.23  0.78
shoot(person,number)                101   0.71  0.89  0.92     0.49  0.74  0.84       8   0.35  0.37  0.48
sign with(person,organization)      129   0.47  0.62  0.89     0.25  0.46  0.85      44   0.15  0.17  0.91
sign(organization,person)           110   0.45  0.71  0.85     0.26  0.63  0.79      26   0.15  0.27  0.66
unveil(organization,product)         88   0.43  0.71  0.44     0.26  0.52  0.30      22   0.31  0.22  0.63
vote(government,time)                32   0.29  0.24  0.74     0.32  0.25  0.77      19   0.35  0.22  0.83
win at(person,location)             100   0.24  0.68  0.85     0.08  0.60  0.90      40   0.01  0.42  0.90
win(person,event)                   107   0.54  0.77  0.86     0.22  0.63  0.77      19   0.03  0.26  0.78
micro average                     2,903   0.53  0.70  0.81     0.30  0.59  0.80     609   0.15  0.31  0.71
macro average                        97   0.46  0.64  0.76     0.30  0.56  0.76      20   0.20  0.37  0.70

Table 1 shows the accuracy and the number of training examples. The basic temporal system brings us 0.50/0.62 micro- and macro-accuracy overall and 0.38/0.51 in the diverse condition. This shows that NewsSpikes are promising resources for generating the training set, but that elaboration is necessary. Yates09 gets 0.78/0.76 accuracy overall because its textual features help it recognize many good sentences with similar phrases. But in the diverse condition it gets lower precision, because the distributional hypothesis fails to distinguish correlated but different phrases. Although Ganit13 and Zhang13 leverage existing paraphrase databases, it is interesting that their accuracy is still not good. This is largely because the paraphrasing often depends on the context: e.g. "Cutler hits Martellus Bennett with TD in closing seconds." is not a good example for the beat(team,team) relation, even though hit is a synonym for beat in general. These two systems show that it is not enough to use an off-the-shelf paraphrasing database for extraction.

The ablation test shows the effectiveness of the temporal negation heuristic: after turning off the relevant features and heuristic labels, the precision drops about 10 percentage points. In addition, the cross-spike factors bring NEWSSPIKE-RE about 22% more training sentences and also increase the accuracy.

We did bootstrap sampling to test the statistical significance of NEWSSPIKE-RE's improvement in accuracy over each comparison system and ablation of NEWSSPIKE-RE. For each system we computed the accuracy of 10 samples of 100 labeled outputs. We then ran the paired t-test over the accuracy numbers of each other system compared to NEWSSPIKE-RE.
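The paper runs a paired t-test over per-sample accuracies; as an illustration of the same significance-testing idea, a paired bootstrap test over the labeled judgments can be written as below. This is our own minimal sketch (function and variable names are assumptions), not the authors' code.

```python
import random

def bootstrap_pvalue(labels_a, labels_b, n_boot=10000, seed=0):
    """Paired bootstrap significance test.

    labels_a / labels_b: 0/1 correctness judgments of two systems on the
    same sampled outputs. Returns the fraction of resamples in which
    system A fails to beat system B (a one-sided p-value estimate)."""
    rng = random.Random(seed)
    n = len(labels_a)
    worse = 0
    for _ in range(n_boot):
        # resample output indices with replacement, keeping pairs aligned
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(labels_a[i] for i in idx) / n
        acc_b = sum(labels_b[i] for i in idx) / n
        if acc_a <= acc_b:
            worse += 1
    return worse / n_boot
```

A small estimated p-value (e.g. below 0.01) corresponds to the "strongly significant" improvements reported here.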


For all but w/o cross the improvement is strongly significant, with p-value less than 1%. The increase in accuracy compared to w/o cross has borderline significance (p-value 5.5%), but is a clear win given its 22% increase in training size.

7.3 Performance of the Event Extractors

Most previous relation extraction approaches either require a manually labeled training set, or work only on a pre-defined set of relations that have ground instances in KBs. The closest work to NEWSSPIKE-RE is Universal Schemas (Riedel et al., 2013), which addresses the limitation of distant supervision that the relations must exist in KBs. Their solution is to treat the surface strings, dependency paths, and relations from KBs as equal "schemas", and then to exploit the correlation between the instances and the schemas over a very large unlabeled corpus. In their paper, Riedel et al. evaluated only on static relations from Freebase and achieved state-of-the-art performance. But Universal Schemas can be adapted to handle events, by introducing the events as schemas and heuristically finding seed instances. We set up a competing system (R13) as follows: (1) we take the NYTimes corpus published between 1987 and 2007 (Sandhaus, 2008), the dataset used by Riedel et al. (2013), containing 1.8 million NYTimes articles; (2) the instances (i.e. the rows of the matrix) come from the entity pairs in the news articles; (3) there are two types of columns: some are the extraction features used by NEWSSPIKE-RE, including the lexicalized dependency paths described in Riedel et al.; others are event relations E = e(t1, t2); (4) for an entity pair (a1, a2), if there is an Open IE extraction (a1, e, a2) and the entity types of (a1, a2) match (t1, t2), we assume the event relation E is observed on that instance.

As shown in Table 1, parallel news streams are a promising resource for clustering because of the strong correlation between the instances and the event phrases. We therefore train another version of Universal Schemas, R13P, on the parallel news streams NS13. In particular, entity pairs from different NewsSpikes are used as different rows in the matrix.

We would like to measure the precision and recall of the extractors. But note that it is impossible to fully label all the sentences, so we follow the "pooling" technique described in (Riedel et al., 2013) to create the labeled dataset. For every competing system, we sample 100 top outputs for every event relation and add these to the pool. The annotators are shown these sentences and asked to judge whether each sentence expresses the event relation or not. After that, the labeled set becomes "gold" and can be used to measure precision and pseudo-recall. There are in all 6,178 distinct sentences in the pool, since some outputs are produced by multiple systems. Among them, 2,903 sentences are labeled as positive. In Table 2, the # columns show the number of true extractions in the pool for every event relation.

Similar to the diverse condition in Table 1, it is important that the extractor can correctly predict on diverse sentences that are dissimilar to each other. Thus we conducted a "diverse pooling": for each system, we report numbers for the sentences with different dependency paths between the arguments for every discovered event.

Figure 5(a) shows the precision pseudo-recall curve over all sentences for the three systems. NEWSSPIKE-RE outperforms the competing systems by a large margin. For example, the area under the curve (AUC) of NEWSSPIKE-RE for all sentences is 0.80, while that of R13P and R13 are 0.59 and 0.30: a 35% increase over R13P and 2.7 times the area compared to R13. Similar increases in AUC are observed on diverse sentences. Table 2 further lists the breakdown numbers for each event relation, as well as the micro and macro averages. Although Universal Schemas had some success on several relations, NEWSSPIKE-RE achieved the best F1 for 26 out of 30 event relations, and the best AUC for 26 out of 30. The advantage is even greater in the diverse condition. It is interesting to see that R13P performs much better than R13, since the data coming from NYTimes is much noisier. A closer look shows that Universal Schemas tends to confuse correlated but different phrases. NEWSSPIKE-RE, however, rarely made these errors, because our model can effectively exploit negative evidence to distinguish them.

7.3.1 Comparing to Distant Supervision

Although most event relations in Table 2 cannot be handled by the distant supervised approach, it is possible to match buy(org,org) to Freebase relations with appropriate database operators such as join and select (Zhang et al., 2012).
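As an illustration of composing two Freebase-style binary relations with a join to obtain seed instances for buy(org, org), the sketch below joins acquirer-event pairs with event-acquired pairs on the shared event. The relation names and tuple shapes here are our illustrative assumptions, not Freebase's API.

```python
def join_relations(companies_acquired, acquisitions):
    """Join (org1, event) tuples with (event, org2) tuples on the shared
    event key, yielding (org1, org2) seed instances for buy(org, org)."""
    acquired_by_event = {}
    for event, org2 in acquisitions:
        acquired_by_event.setdefault(event, []).append(org2)
    return [(org1, org2)
            for org1, event in companies_acquired
            for org2 in acquired_by_event.get(event, [])]
```

For example, joining ("Google", "e1") with ("e1", "YouTube") yields the seed instance ("Google", "YouTube").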


To evaluate how distant supervision performs, we introduce the system DS on NYT, based on a manual mapping of buy(org,org) to the join relation[4] in Freebase. Then we match its instances to NYTimes articles and follow the steps of Surdeanu et al. (2012) to train the extractor. The matching to NYTimes brings us 264 positive instances having 5,333 sentences, but unfortunately the sentence-level accuracy is only 13%, based on examination of 100 random sentences. Figure 5(b) shows the PR curves for all the competing systems. Distant supervision predicts the top extractions correctly because the multi-instance technique recognizes some common expressions (e.g. buy, acquire), but the precision drops dramatically since most positive expressions are overwhelmed by the noise.

[4] /organization/organization/companies_acquired joined with /business/acquisition/company_acquired

[Figure 5: Precision pseudo-recall curves for (a) all event relations; (b) buy(org,org); this figure includes the distant supervision algorithm MIML learned from matching the Freebase relation to The New York Times. NEWSSPIKE-RE has AUC 0.80, more than doubling R13 (0.30) and 35% higher than R13P (0.59) for all event relations.]

8 Conclusions and Future Work

Popular distant supervised approaches have limited ability to handle event extraction, since fluent facts are highly time dependent and often do not exist in any KB. This paper presents a novel unsupervised approach for event extraction that exploits parallel news streams. Our NEWSSPIKE-RE system automatically identifies a set of argument-typed events from a news corpus, and then learns a sentential (micro-reading) extractor for each event.

We introduced a novel temporal negation heuristic for parallel news streams that identifies event phrases that are correlated but are not paraphrases. We encoded this in a probabilistic graphical model to cluster sentences, generating high-quality training data for learning a sentential extractor. This provides negative evidence crucial to achieving high-precision training data.

Experiments show the high quality of the generated training sentences and confirm the importance of our negation heuristic. Our most important experiment shows that we can learn accurate event extractors from this training data. NEWSSPIKE-RE outperforms comparable extractors by a wide margin, more than doubling the area under a precision-recall curve compared to Universal Schemas.

In future work we plan to implement our system as an end-to-end online service. This would allow users to conveniently define events of interest, learn extractors for each event, and receive extracted facts from news streams.

Acknowledgments

We thank Hal Daume III, Xiao Ling, Luke Zettlemoyer, and the reviewers. This work was supported by ONR grant N00014-12-1-0211, the WRF/Cable Professorship, a gift from Google, and the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of DARPA, AFRL, or the US government.
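For reference, the precision/pseudo-recall curves and the area under them, as used in the evaluation above, can be computed from a system's ranked output and the pooled gold set along the lines of this sketch (function and variable names are our assumptions, not the authors' code).

```python
def pr_curve_auc(ranked_outputs, gold):
    """Compute (pseudo-recall, precision) points and the area under the curve.

    ranked_outputs: extractions sorted by confidence, best first.
    gold: the pooled set of extractions judged correct by annotators.
    Pseudo-recall divides by the pool's positives, not the unknown true total."""
    tp = 0
    points = []
    for i, ex in enumerate(ranked_outputs, 1):
        if ex in gold:
            tp += 1
        points.append((tp / len(gold), tp / i))
    # accumulate area: each new recall step contributes its precision
    auc, prev_r = 0.0, 0.0
    for r, p in points:
        auc += (r - prev_r) * p
        prev_r = r
    return points, auc
```

Walking down the ranked list, precision falls as wrong extractions appear while pseudo-recall rises toward 1.0 once all pooled positives are recovered.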


References

Eugene Agichtein and Luis Gravano. 2000. Snowball: extracting relations from large plain-text collections. In ACM DL, pages 85-94.
Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2670-2676.
Regina Barzilay and Lillian Lee. 2003. Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL), pages 16-23.
Regina Barzilay and Kathleen R McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics (ACL), pages 50-57.
Edward Benson, Aria Haghighi, and Regina Barzilay. 2011. Event discovery in social media feeds. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL), pages 389-398.
Sergey Brin. 1999. Extracting patterns and relations from the world wide web. In The World Wide Web and Databases, pages 172-183.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-10).
Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, pages 1-8.
Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB), pages 77-86.
Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Computational Linguistics, page 350.
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1535-1545.
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL), pages 363-370.
Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The paraphrase database. In Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2013), pages 758-764.
Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. 2004. Discovering relations among named entities from large corpora. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics (ACL), page 415.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT-ACL), pages 541-550.
Ruihong Huang and Ellen Riloff. 2013. Multi-faceted event recognition with bootstrapped dictionaries. In the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL), pages 41-51.
Dekang Lin and Patrick Pantel. 2001. Discovery of inference rules for question-answering. Natural Language Engineering, 7(4):343-360.
Xiao Ling and Daniel S Weld. 2012. Fine-grained entity recognition. In Association for the Advancement of Artificial Intelligence (AAAI).
David McClosky, Mihai Surdeanu, and Christopher D Manning. 2011. Event extraction as dependency parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT-ACL), pages 1626-1635.
Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1003-1011.
Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. 2011. Scalable knowledge harvesting with high precision and high recall. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM), pages 227-236.
George L Nemhauser and Laurence A Wolsey. 1988. Integer and Combinatorial Optimization, volume 18. Wiley, New York.


Roi Reichart and Regina Barzilay. 2012. Multi event extraction guided by global constraints. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL), pages 70-79.
Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher D Manning, and Daniel Jurafsky. 2014. Event extraction using distant supervision. In Language Resources and Evaluation Conference (LREC).
Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Machine Learning and Knowledge Discovery in Databases (ECML), pages 148-163.
Sebastian Riedel, David McClosky, Mihai Surdeanu, Andrew McCallum, and Christopher D Manning. 2011. Model combination for event extraction in BioNLP 2011. In Proceedings of the BioNLP Shared Task 2011 Workshop, pages 51-55.
Sebastian Riedel, Limin Yao, Benjamin M. Marlin, and Andrew McCallum. 2013. Relation extraction with matrix factorization and universal schemas. In Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL).
Alan Ritter, Oren Etzioni, Sam Clark, et al. 2012. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1104-1112.
Evan Sandhaus. 2008. The New York Times annotated corpus. Linguistic Data Consortium.
Yusuke Shinyama and Satoshi Sekine. 2006. Preemptive information extraction using unrestricted relation discovery. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL), pages 304-311.
Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP), pages 455-465.
Shingo Takamatsu, Issei Sato, and Hiroshi Nakagawa. 2011. Probabilistic matrix factorization leveraging contexts for unsupervised relation extraction. In Advances in Knowledge Discovery and Data Mining, pages 87-99.
Fei Wu and Daniel S. Weld. 2007. Autonomously semantifying wikipedia. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), pages 41-50.
Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. 2011. Structured relation discovery using generative models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1456-1466.
Limin Yao, Sebastian Riedel, and Andrew McCallum. 2012. Unsupervised relation discovery with sense disambiguation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), pages 712-720.
Alexander Yates and Oren Etzioni. 2009. Unsupervised methods for determining object and relation synonyms on the web. Journal of Artificial Intelligence Research, 34(1):255.
Congle Zhang and Daniel S Weld. 2013. Harvesting parallel news streams to generate paraphrases of event relations. In Proceedings of the 2013 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP), pages 455-465.
Congle Zhang, Raphael Hoffmann, and Daniel S Weld. 2012. Ontological smoothing for relation extraction with minimal supervision. In Association for the Advancement of Artificial Intelligence (AAAI).
