Transactions of the Association for Computational Linguistics, vol. 5, pp. 101–115, 2017. Action Editor: Mark Johnson.
Submission batch: 10/2016; Revision batch: 4/2017; Published 4/2017.
© 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Cross-Sentence N-ary Relation Extraction with Graph LSTMs

Nanyun Peng1∗, Hoifung Poon2, Chris Quirk2, Kristina Toutanova3∗, Wen-tau Yih2
1 Center for Language and Speech Processing, Computer Science Department, Johns Hopkins University, Baltimore, MD, USA
2 Microsoft Research, Redmond, WA, USA
3 Google Research, Seattle, WA, USA
npeng1@jhu.edu, kristout@google.com, {hoifung,chrisq,scottyih}@microsoft.com
∗ This research was conducted when the authors were at Microsoft Research.

Abstract

Past work in relation extraction has focused on binary relations in single sentences. Recent NLP inroads in high-value domains have sparked interest in the more general setting of extracting n-ary relations that span multiple sentences. In this paper, we explore a general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unified way of exploring different LSTM approaches and incorporating various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier. This simplifies handling of relations with arbitrary arity, and enables multi-task learning with related relations. We evaluate this framework in two important precision medicine settings, demonstrating its effectiveness with both conventional supervised learning and distant supervision. Cross-sentence extraction produced larger knowledge bases, and multi-task learning significantly improved extraction accuracy. A thorough analysis of various LSTM approaches yielded useful insight into the impact of linguistic analysis on extraction accuracy.

1 Introduction

Relation extraction has made great strides in newswire and Web domains. Recently, there has been increasing interest in applying relation extraction to high-value domains such as biomedicine. The advent of the $1000 human genome1 heralds the dawn of precision medicine, but progress in personalized cancer treatment has been hindered by the arduous task of interpreting genomic data using prior knowledge. For example, given a tumor sequence, a molecular tumor board needs to determine which genes and mutations are important, and what drugs are available to treat them. Already the research literature has a wealth of relevant knowledge, and it is growing at an astonishing rate. PubMed2, the online repository of biomedical articles, adds two new papers per minute, or one million each year. It is thus imperative to advance relation extraction for machine reading.

In the vast literature on relation extraction, past work focused primarily on binary relations in single sentences, limiting the available information. Consider the following example: "The deletion mutation on exon-19 of EGFR gene was present in 16 patients, while the L858E point mutation on exon-21 was noted in 10. All patients were treated with gefitinib and showed a partial response." Collectively, the two sentences convey the fact that there is a ternary interaction between the three entities (EGFR, L858E, and gefitinib), which is not expressed in either sentence alone. Namely, tumors with L858E mutation in EGFR gene can be treated with gefitinib. Extracting such knowledge clearly requires moving beyond binary relations and single sentences.

1 http://www.illumina.com/systems/hiseq-x-sequencing-system.html
2 https://www.ncbi.nlm.nih.gov/pubmed

N-ary relations and cross-sentence extraction have received relatively little attention in the past. Prior
work on n-ary relation extraction focused on single sentences (Palmer et al., 2005; McDonald et al., 2005) or entity-centric attributes that can be extracted largely independently (Chinchor, 1998; Surdeanu and Heng, 2014). Prior work on cross-sentence extraction often used coreference to gain access to arguments in a different sentence (Gerber and Chai, 2010; Yoshikawa et al., 2011), without truly modeling inter-sentential relational patterns. (See Section 7 for a more detailed discussion.) A notable exception is Quirk and Poon (2017), which applied distant supervision to general cross-sentence relation extraction, but was limited to binary relations.

[Figure 1: An example document graph for a pair of sentences expressing a ternary interaction (tumors with L858E mutation in EGFR gene respond to gefitinib treatment). The graph covers the sentences "The deletion mutation on exon-19 of EGFR gene was present in 16 patients, while the L858E point mutation on exon-21 was noted in 10." and "All patients were treated with gefitinib and showed a partial response.", with labeled dependency edges (e.g., DET, NSUBJ, NEXTSENT). For simplicity, we omit edges between adjacent words or representing discourse relations.]

In this paper, we explore a general framework for cross-sentence n-ary relation extraction, based on graph long short-term memory networks (graph LSTMs). By adopting the graph formulation, our framework subsumes prior approaches based on chain or tree LSTMs, and can incorporate a rich set of linguistic analyses to aid relation extraction. Relation classification takes as input the entity representations learned from the entire text, and can be easily extended for arbitrary relation arity n. This approach also facilitates joint learning with kindred relations where the supervision signal is more abundant.

We conducted extensive experiments on two important domains in precision medicine. In both distant supervision and supervised learning settings, graph LSTMs that encode rich linguistic knowledge outperformed other neural network variants, as well as a well-engineered feature-based classifier. Multi-task learning with sub-relations led to further improvement. Syntactic analysis conferred a significant benefit to the performance of graph LSTMs, especially when syntax accuracy was high.

In the molecular tumor board domain, PubMed-scale extraction using distant supervision from a small set of known interactions produced orders of magnitude more knowledge, and cross-sentence extraction tripled the yield compared to single-sentence extraction. Manual evaluation verified that the accuracy is high despite the lack of annotated examples.

2 Cross-sentence n-ary relation extraction

Let e_1, ..., e_m be entity mentions in text T. Relation extraction can be formulated as a classification problem of determining whether a relation R holds for e_1, ..., e_m in T. For example, given a cancer patient with mutation v in gene g, a molecular tumor board seeks to find if this type of cancer would respond to drug d. Literature with such knowledge has been growing rapidly; we can help the tumor board by checking if the Respond relation holds for the (d, g, v) triple.

Traditional relation extraction methods focus on binary relations where all entities occur in the same sentence (i.e., m = 2 and T is a sentence), and cannot handle the aforementioned ternary relations. Moreover, as we focus on more complex relations and n increases, it becomes increasingly rare that the related entities will be contained entirely in a single sentence. In this paper, we generalize extraction to cross-sentence, n-ary relations, where m > 2 and T can contain multiple sentences. As will be shown in our experiments section, n-ary relations are crucial for high-value domains such as biomedicine, and expanding beyond the sentence boundary enables the extraction of more knowledge.

In the standard binary-relation setting, the dominant approaches are generally defined in terms of the shortest dependency path between the two entities in question, either by deriving rich features from the path or by modeling it using deep neural
networks. Generalizing this paradigm to the n-ary setting is challenging, as there are $\binom{n}{2}$ paths. One apparent solution is inspired by Davidsonian semantics: first, identify a single trigger phrase that signifies the whole relation, then reduce the n-ary relation to n binary relations between the trigger and an argument. However, challenges remain. It is often hard to specify a single trigger, as the relation is manifested by several words, often not contiguous. Moreover, it is expensive and time-consuming to annotate training examples, especially if triggers are required, as is evident in prior annotation efforts such as GENIA (Kim et al., 2009). The realistic and widely adopted paradigm is to leverage indirect supervision, such as distant supervision (Craven and Kumlien, 1999; Mintz et al., 2009), where triggers are not available.

Additionally, lexical and syntactic patterns signifying the relation will be sparse. To handle such sparsity, traditional feature-based approaches require extensive engineering and large data. Unfortunately, this challenge becomes much more severe in cross-sentence extraction when the text spans multiple sentences.

To overcome these challenges, we explore a general relation extraction framework based on graph LSTMs. By learning a continuous representation for words and entities, LSTMs can handle sparsity effectively without requiring intense feature engineering. The graph formulation subsumes prior LSTM approaches based on chains or trees, and can incorporate rich linguistic analyses.

This approach also opens up opportunities for joint learning with related relations. For example, the Response relation over d, g, v also implies a binary sub-relation over drug d and mutation v, with the gene underspecified. Even with distant supervision, the supervision signal for n-ary relations will likely be sparser than their binary sub-relations. Our approach makes it very easy to use multi-task learning over both the n-ary relations and their sub-relations.

3 Graph LSTMs

Learning a continuous representation can be effective for dealing with lexical and syntactic sparsity. For sequential data such as text, recurrent neural networks (RNNs) are quite popular. They resemble hidden Markov models (HMMs), except that discrete hidden states are replaced with continuous vectors, and emission and transition probabilities with neural networks. Conventional RNNs with sigmoid units suffer from gradient diffusion or explosion, making training very difficult (Bengio et al., 1994; Pascanu et al., 2013). Long short-term memory networks (LSTMs) (Hochreiter and Schmidhuber, 1997) combat these problems by using a series of gates (input, forget and output) to avoid amplifying or suppressing gradients during backpropagation. Consequently, LSTMs are much more effective in capturing long-distance dependencies, and have been applied to a variety of NLP tasks. However, most approaches are based on linear chains and only explicitly model the linear context, which ignores a variety of linguistic analyses, such as syntactic and discourse dependencies.

[Figure 2: A general architecture for cross-sentence n-ary relation extraction based on graph LSTMs. Diagram labels: Word Embeddings for Input Text, w(1), ..., w(n-1), w(n); Graph LSTM; Contextual Entity Representation (concatenation); Relation Classifier, R1, ..., Rk.]

In this section, we propose a general framework that generalizes LSTMs to graphs. While there is some prior work on learning tree LSTMs (Tai et al., 2015; Miwa and Bansal, 2016), to the best of our knowledge, graph LSTMs have not been applied to any NLP task yet. Figure 2 shows the architecture of this approach. The input layer is the word embedding of input text. Next is the graph LSTM, which learns a contextual representation for each word. For the entities in question, their contextual representations are concatenated and become the input to the relation classifiers. For a multi-word entity, we simply used the average of its word representations and leave the exploration of more sophisticated aggregation approaches to future work. The layers are trained jointly with backpropagation. This framework is
agnostic to the choice of classifiers. Jointly designing classifiers with graph LSTMs would be interesting future work.

[Figure 3: The graph LSTMs used in this paper. The document graph (top) is partitioned into two directed acyclic graphs (bottom); the graph LSTMs are constructed by a forward pass (left to right) followed by a backward pass (right to left). Note that information goes from dependency child to parent. The example sentence is "All patients were treated with gefitinib and showed a partial response."]

At the core of the graph LSTM is a document graph that captures various dependencies among the input words. By choosing what dependencies to include in the document graph, graph LSTMs naturally subsume linear-chain or tree LSTMs.

Compared to conventional LSTMs, the graph formulation presents new challenges. Due to potential cycles in the graph, a straightforward implementation of backpropagation might require many iterations to reach a fixed point. Moreover, in the presence of a potentially large number of edge types (adjacent-word, syntactic dependency, etc.), parametrization becomes a key problem.

In the remainder of this section, we first introduce the document graph and show how to conduct backpropagation in graph LSTMs. We then discuss two strategies for parametrizing the recurrent units. Finally, we show how to conduct multi-task learning with this framework.

3.1 Document Graph

To model various dependencies from linguistic analysis at our disposal, we follow Quirk and Poon (2017) and introduce a document graph to capture intra- and inter-sentential dependencies. A document graph consists of nodes that represent words and edges that represent various dependencies such as linear context (adjacent words), syntactic dependencies, and discourse relations (Lee et al., 2013; Xue et al., 2015). Figure 1 shows the document graph for our running example; this instance suggests that tumors with L858E mutation in EGFR gene respond to the drug gefitinib.

This document graph acts as the backbone upon which a graph LSTM is constructed. If it contains only edges between adjacent words, we recover linear-chain LSTMs. Similarly, other prior LSTM approaches can be captured in this framework by restricting edges to those in the shortest dependency path or the parse tree.

3.2 Backpropagation in Graph LSTMs

Conventional LSTMs are essentially very deep feed-forward neural networks. For example, a left-to-right linear LSTM has one hidden vector for each word. This vector is generated by a neural network (recurrent unit) that takes as input the embedding of the given word and the hidden vector of the previous word. In discriminative learning, these hidden vectors then serve as input for the end classifiers, from which gradients are backpropagated through the whole network.

Generalizing such a strategy to graphs with cycles typically requires unrolling recurrence for a number of steps (Scarselli et al., 2009; Li et al., 2016; Liang et al., 2016). Essentially, a copy of the graph is created for each step that serves as input for the next. The result is a feed-forward neural network through time, and backpropagation is conducted accordingly.

In principle, we could adopt the same strategy. Effectively, gradients are backpropagated in a manner similar to loopy belief propagation (LBP). However, this makes learning much more expensive as each update step requires multiple iterations of backpropagation. Moreover, loopy backpropagation could suffer from the same problems encountered in LBP, such as oscillation or failure to converge.

We observe that dependencies such as coreference and discourse relations are generally sparse, so the backbone of a document graph consists of the linear chain and the syntactic dependency tree. As in belief propagation, such structures can be leveraged to make backpropagation more efficient by replacing synchronous updates, as in the unrolling strategy, with asynchronous updates, as in linear-chain LSTMs. This opens up opportunities for a variety of strategies in ordering backpropagation updates.

In this paper, we adopt a simple strategy that performed quite well in preliminary experiments, and leave further exploration to future work. Specifically, we partition the document graph into two directed acyclic graphs (DAGs). One DAG contains the left-to-right linear chain, as well as other forward-pointing dependencies. The other DAG covers the right-to-left linear chain and the backward-pointing dependencies. Figure 3 illustrates this strategy. Effectively, we partition the original graph into the forward pass (left-to-right), followed by the backward pass (right-to-left), and construct the LSTMs accordingly. When the document graph only contains linear chain edges, the graph LSTM is exactly a bi-directional LSTM (BiLSTM).

3.3 The Basic Recurrent Propagation Unit

A standard LSTM unit consists of an input vector (word embedding), a memory cell and an output vector (contextual representation), as well as several gates. The input gate and output gate control the information flowing into and out of the cell, whereas the forget gate can optionally remove information from the recurrent connection to a precedent unit.

In linear-chain LSTMs, each unit contains only one forget gate, as it has only one direct precedent (i.e., the adjacent-word edge pointing to the previous word). In graph LSTMs, however, a unit may have several precedents, including connections to the same word via different edges. We thus introduce a forget gate for each precedent, similar to the approach taken by Tai et al. (2015) for tree LSTMs.

Encoding rich linguistic analysis introduces many distinct edge types besides word adjacency, such as syntactic dependencies, which opens up many possibilities for parametrization. This was not considered in prior syntax-aware LSTM approaches (Tai et al., 2015; Miwa and Bansal, 2016). In this paper, we explore two schemes that introduce more fine-grained parameters based on the edge types.

Full Parametrization. Our first proposal simply introduces a different set of parameters for each edge type, with computation specified below.

\[
\begin{aligned}
i_t &= \sigma\Big(W_i x_t + \sum_{j \in P(t)} U_i^{m(t,j)} h_j + b_i\Big) \\
o_t &= \sigma\Big(W_o x_t + \sum_{j \in P(t)} U_o^{m(t,j)} h_j + b_o\Big) \\
\tilde{c}_t &= \tanh\Big(W_c x_t + \sum_{j \in P(t)} U_c^{m(t,j)} h_j + b_c\Big) \\
f_{tj} &= \sigma\big(W_f x_t + U_f^{m(t,j)} h_j + b_f\big) \\
c_t &= i_t \odot \tilde{c}_t + \sum_{j \in P(t)} f_{tj} \odot c_j \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]

As in standard chain LSTMs, $x_t$ is the input word vector for node t, $h_t$ is the hidden state vector for node t, the W's are the input weight matrices, and the b's are the bias vectors. $\sigma$, $\tanh$, and $\odot$ represent the sigmoid function, the hyperbolic tangent function, and the Hadamard product (pointwise multiplication), respectively. The main differences lie in the recurrence terms. In graph LSTMs, a unit might have multiple predecessors ($P(t)$), for each of which ($j$) there is a forget gate $f_{tj}$, and a typed weight matrix $U^{m(t,j)}$, where $m(t,j)$ signifies the connection type between t and j. The input and output gates ($i_t$, $o_t$) depend on all predecessors, whereas the forget gate ($f_{tj}$) only depends on the predecessor with which the gate is associated. $c_t$ and $\tilde{c}_t$ represent intermediate computation results within the memory cell, which take into account the input and forget gates, and will be combined with the output gate to produce the hidden representation $h_t$.

Full parametrization is straightforward, but it requires a large number of parameters when there are many edge types. For example, there are dozens of syntactic edge types, each corresponding to a Stanford dependency label. As a result, in our experiments we resort to using only the coarse-grained types: word adjacency, syntactic dependency, etc. Next, we will consider a more fine-grained approach by learning an edge-type embedding.

Edge-Type Embedding. To reduce the number of parameters and leverage potential correlation among fine-grained edge types, we learned a low-dimensional embedding of the edge types, and conducted an outer product of the predecessor's hidden vector and the edge-type embedding to generate a "typed hidden representation", which is a matrix. The new computation is as follows:
\[
\begin{aligned}
i_t &= \sigma\Big(W_i x_t + \sum_{j \in P(t)} U_i \times_T (h_j \otimes e_j) + b_i\Big) \\
f_{tj} &= \sigma\big(W_f x_t + U_f \times_T (h_j \otimes e_j) + b_f\big) \\
o_t &= \sigma\Big(W_o x_t + \sum_{j \in P(t)} U_o \times_T (h_j \otimes e_j) + b_o\Big) \\
\tilde{c}_t &= \tanh\Big(W_c x_t + \sum_{j \in P(t)} U_c \times_T (h_j \otimes e_j) + b_c\Big) \\
c_t &= i_t \odot \tilde{c}_t + \sum_{j \in P(t)} f_{tj} \odot c_j \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]

The U's are now $l \times l \times d$ tensors ($l$ is the dimension of the hidden vector and $d$ is the dimension for edge-type embedding), and $h_j \otimes e_j$ is a tensor product that produces an $l \times d$ matrix. $\times_T$ denotes a tensor dot product defined as $T \times_T A = \sum_d (T_{:,:,d} \cdot A_{:,d})$, which produces an $l$-dimensional vector. The edge-type embedding $e_j$ is jointly trained with the other parameters.

3.4 Comparison with Prior LSTM Approaches

The main advantages of a graph formulation are its generality and flexibility. As seen in Section 3.1, linear-chain LSTMs are a special case when the document graph is the linear chain of adjacent words. Similarly, Tree LSTMs (Tai et al., 2015) are a special case when the document graph is the parse tree.

In graph LSTMs, the encoding of linguistic knowledge is factored from the backpropagation strategy (Section 3.2), making it much more flexible, including introducing cycles. For example, Miwa and Bansal (2016) conducted joint entity and binary relation extraction by stacking an LSTM for relation extraction on top of another LSTM for entity recognition. In graph LSTMs, the two can be combined seamlessly using a document graph comprising both the word-adjacency chain and the dependency path between the two entities.

The document graph can also incorporate other linguistic information. For example, coreference and discourse parsing are intuitively relevant for cross-sentence relation extraction. Although existing systems have not yet been shown to improve cross-sentence relation extraction (Quirk and Poon, 2017), it remains an important future direction to explore incorporating such analyses, especially after adapting them to the biomedical domains (Bell et al., 2016).

3.5 Multi-task Learning with Sub-relations

Multi-task learning has been shown to be beneficial in training neural networks (Caruana, 1998; Collobert and Weston, 2008; Peng and Dredze, 2016). By learning contextual entity representations, our framework makes it straightforward to conduct multi-task learning. The only change is to add a separate classifier for each related auxiliary relation. All classifiers share the same graph LSTM representation learner and word embeddings, and can potentially help each other by pooling their supervision signals.

In the molecular tumor board domain, we applied this paradigm to joint learning of both the ternary relation (drug-gene-mutation) and its binary sub-relation (drug-mutation). Experiment results show that this provides significant gains in both tasks.

4 Implementation Details

We implemented our methods using the Theano library (Theano Development Team, 2016). We used logistic regression for our relation classifiers. Hyperparameters were set based on preliminary experiments on a small development dataset. Training was done using mini-batched stochastic gradient descent (SGD) with batch size 8. We used a learning rate of 0.02 and trained for at most 30 epochs, with early stopping based on development data (Caruana et al., 2001; Graves et al., 2013). The dimension for the hidden vectors in LSTM units was set to 150, and the dimension for the edge-type embedding was set to 3. The word embeddings were initialized with the publicly available 100-dimensional GloVe word vectors trained on 6 billion words from Wikipedia and web text3 (Pennington et al., 2014). Other model parameters were initialized with random samples drawn uniformly from the range [−1, 1].

In multi-task training, we alternated among all tasks, each time passing through all data for one task4, and updating the parameters accordingly. This was repeated for 30 epochs.

3 http://nlp.stanford.edu/projects/glove/
4 However, drug-gene pairs have much more data, so we sub-sampled the instances down to the same size as the main n-ary relation task.
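The alternating multi-task schedule described in Section 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Theano code: the names `subsample` and `multitask_schedule`, the task names in the usage note, shuffling and re-drawing the sub-sample in every epoch, and capping every task (not only drug-gene) at the size of the main task are all hypothetical choices made for the example. The parameter update itself is left to the caller.

```python
import random


def subsample(instances, target_size, rng):
    """Return at most target_size instances, drawn without replacement.

    Mirrors footnote 4 (down-sampling a large task to the main task's size).
    Applying the same cap to every task is an assumption of this sketch.
    """
    if len(instances) <= target_size:
        return list(instances)
    return rng.sample(instances, target_size)


def multitask_schedule(task_data, main_task, num_epochs, batch_size, seed=0):
    """Yield (epoch, task, batch) triples in alternating task order.

    task_data maps a task name to its list of training instances. Within
    each epoch, every task is visited in turn and all of its (sub-sampled)
    data is consumed in mini-batches before moving on to the next task.
    """
    rng = random.Random(seed)
    main_size = len(task_data[main_task])
    for epoch in range(num_epochs):
        for task, instances in task_data.items():
            # Assumption: the sub-sample is re-drawn and shuffled every epoch.
            data = subsample(instances, main_size, rng)
            rng.shuffle(data)
            for start in range(0, len(data), batch_size):
                yield epoch, task, data[start:start + batch_size]
```

Called with `batch_size=8` and `num_epochs=30`, the generator follows the settings reported above. Each yielded batch would be passed to an SGD step for that task's classifier, with the graph LSTM and word embeddings shared across tasks.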
5 Domain: Molecular Tumor Boards

Our main experiments focus on extracting ternary interactions over drugs, genes and mutations, which is important for molecular tumor boards. A drug-gene-mutation interaction is broadly construed as an association between the drug efficacy and the mutation in the given gene. There is no annotated dataset for this problem. However, due to the importance of such knowledge, oncologists have been painstakingly curating known relations from reading papers. Such a manual approach cannot keep up with the rapid growth of the research literature, and the coverage is generally sparse and not up to date. However, the curated knowledge can be used for distant supervision.

5.1 Datasets

We obtained biomedical literature from PubMed Central5, consisting of approximately one million full-text articles as of 2015. Note that only a fraction of papers contain knowledge about drug-gene-mutation interactions. Extracting such knowledge from the vast body of biomedical papers is exactly the challenge. As we will see in later subsections, distant supervision enables us to generate a sizable training set from a small number of manually curated facts, and the learned model was able to extract orders of magnitude more facts. In future work, we will explore incorporating more known facts for distant supervision and extracting from more full-text articles.

We conducted tokenization, part-of-speech tagging, and syntactic parsing using SPLAT (Quirk et al., 2012), and obtained Stanford dependencies (de Marneffe et al., 2006) using Stanford CoreNLP (Manning et al., 2014). We used the entity taggers from Literome (Poon et al., 2014) to identify drug, gene and mutation mentions.

We used the Gene Drug Knowledge Database (GDKD) (Dienstmann et al., 2015) and the Clinical Interpretations of Variants In Cancer (CIVIC) knowledge base6 for distant supervision. The knowledge bases distinguish fine-grained interaction types, which we do not use in this paper.

5 http://www.ncbi.nlm.nih.gov/pmc/
6 http://civic.genome.wustl.edu

5.2 Distant Supervision

After identifying drug, gene and mutation mentions in the text, co-occurring triples with known interactions were chosen as positive examples. However, unlike the single-sentence setting in standard distant supervision, care must be taken in selecting the candidates. Since the triples can reside in different sentences, an unrestricted selection of text spans would risk introducing many obviously wrong examples. We thus followed Quirk and Poon (2017) in restricting the candidates to those occurring in a minimal span, i.e., we retain a candidate only if there is no other co-occurrence of the same entities in an overlapping text span with a smaller number of consecutive sentences. Furthermore, we avoid picking unlikely candidates where the triples are far apart in the document. Specifically, we considered entity triples within K consecutive sentences, ignoring paragraph boundaries. K = 1 corresponds to the baseline of extraction within single sentences. We explored K ≤ 3, which captured a large fraction of candidates without introducing many unlikely ones.

Only 59 distinct drug-gene-mutation triples from the knowledge bases were matched in the text. Even from such a small set of unique triples, we obtained 3,462 ternary relation instances that can serve as positive examples. For multi-task learning, we also considered drug-gene and drug-mutation sub-relations, which yielded 137,469 drug-gene and 3,192 drug-mutation relation instances as positive examples.

We generate negative examples by randomly sampling co-occurring entity triples without known interactions, subject to the same restrictions above. We sampled the same number as positive examples to obtain a balanced dataset7.

7 We will release the dataset at http://hanover.azurewebsites.net.

5.3 Automatic Evaluation

To compare the various models in our proposed framework, we conducted five-fold cross-validation, treating the positive and negative examples from distant supervision as gold annotation. To avoid train-test contamination, all examples from a document were assigned to the same fold. Since our datasets are balanced by construction, we simply report average test accuracy on held-out folds. Obviously, the
results could be noisy (e.g., entity triples not known to have an interaction might actually have one), but this evaluation is automatic and can quickly evaluate the impact of various design choices.

Model                   Single-Sent.   Cross-Sent.
Feature-Based           74.7           77.7
CNN                     77.5           78.1
BiLSTM                  75.3           80.1
Graph LSTM-EMBED        76.5           80.6
Graph LSTM-FULL         77.9           80.7

Table 1: Average test accuracy in five-fold cross-validation for drug-gene-mutation ternary interactions. Feature-Based used the best performing model in (Quirk and Poon, 2017) with features derived from shortest paths between all entity pairs.

Model                   Single-Sent.   Cross-Sent.
Feature-Based           73.9           75.2
CNN                     73.0           74.9
BiLSTM                  73.9           76.0
BiLSTM-Shortest-Path    70.2           71.7
Tree LSTM               75.9           75.9
Graph LSTM-EMBED        74.3           76.5
Graph LSTM-FULL         75.6           76.7

Table 2: Average test accuracy in five-fold cross-validation for drug-mutation binary relations, with an extra baseline using a BiLSTM on the shortest dependency path (Xu et al., 2015b; Miwa and Bansal, 2016).

We evaluated two variants of graph LSTMs: "Graph LSTM-FULL" with full parametrization and "Graph LSTM-EMBED" with edge-type embedding. We compared graph LSTMs with three strong baseline systems: a well-engineered feature-based classifier (Quirk and Poon, 2017), a convolutional neural network (CNN) (Zeng et al., 2014; Santos et al., 2015; Wang et al., 2016), and a bi-directional LSTM (BiLSTM). Following Wang et al. (2016), we used input attention for the CNN and an input window size of 5. Quirk and Poon (2017) only extracted binary relations. We extended it to ternary relations by deriving features for each entity pair (with added annotation to signify the two entity types), and pooling the features from all pairs.

For binary relation extraction, prior syntax-aware approaches are directly applicable. So we also compared with a state-of-the-art tree LSTM system (Miwa and Bansal, 2016) and a BiLSTM on the shortest dependency path between the two entities (BiLSTM-Shortest-Path) (Xu et al., 2015b).

Table 1 shows the results for cross-sentence, ternary relation extraction. All neural-network based models outperformed the feature-based classifier, illustrating their advantage in handling sparse linguistic patterns without requiring intense feature engineering. All LSTMs significantly outperformed CNN in the cross-sentence setting, verifying the importance of capturing long-distance dependencies.

The two variants of graph LSTMs perform on par with each other, though Graph LSTM-FULL has a small advantage, suggesting that further exploration of parametrization schemes could be beneficial. In particular, the edge-type embedding might improve by pretraining on unlabeled text with syntactic parses.

Both graph variants significantly outperformed BiLSTMs (p < 0.05 by McNemar's chi-square test), though the difference is small. This result is intriguing. In Quirk and Poon (2017), the best system incorporated syntactic dependencies and outperformed the linear-chain variant (Base) by a large margin. So why didn't graph LSTMs make an equally substantial gain by modeling syntactic dependencies?

One reason is that linear-chain LSTMs can already capture some of the long-distance dependencies available in syntactic parses. BiLSTMs substantially outperformed the feature-based classifier, even without explicit modeling of syntactic dependencies. The gain cannot be entirely attributed to word embedding as LSTMs also outperformed CNNs.

Another reason is that syntactic parsing is less accurate in the biomedical domain. Parse errors confuse the graph LSTM learner, limiting the potential for gain. In Section 6, we show supporting evidence in a domain where gold parses are available.

We also reported accuracy on instances within single sentences, which exhibited a broadly similar set of trends. Note that single-sentence and cross-sentence accuracies are not directly comparable, as the test sets are different (one subsumes the other).

We conducted the same experiments on the binary sub-relation between drug-mutation pairs. Table 2
shows the results, which are similar to the ternary case: Graph LSTM-FULL consistently performed the best for both single sentence and cross-sentence instances. BiLSTMs on the shortest path substantially underperformed BiLSTMs or graph LSTMs, losing between 4-5 absolute points in accuracy, which could be attributed to the lower parsing quality in the biomedical domain. Interestingly, the state-of-the-art tree LSTMs (Miwa and Bansal, 2016) also underperformed graph LSTMs, even though they encoded essentially the same linguistic structures (word adjacency and syntactic dependency). We attributed the gain to the fact that Miwa and Bansal (2016) used separate LSTMs for the linear chain and the dependency tree, whereas graph LSTMs learned a single representation for both.

                Drug-Gene-Mut.   Drug-Mut.
BiLSTM          80.1             76.0
 +Multi-task    82.4             78.1
Graph LSTM      80.7             76.7
 +Multi-task    82.0             78.5

Table 3: Multi-task learning improved accuracy for both BiLSTMs and Graph LSTMs.

To evaluate whether joint learning with sub-relations can help, we conducted multi-task learning using Graph LSTM-FULL to jointly train extractors for both the ternary interaction and the drug-mutation, drug-gene sub-relations. Table 3 shows the results. Multi-task learning resulted in a significant gain for both the ternary interaction and the drug-mutation interaction. Interestingly, the advantage of graph LSTMs over BiLSTMs is reduced with multi-task learning, suggesting that with more supervision signal, even linear-chain LSTMs can learn to capture long-range dependencies that were made evident by parse features in graph LSTMs. Note that there are many more instances for drug-gene interaction than others, so we only sampled a subset of comparable size. Therefore, we do not evaluate the performance gain for drug-gene interaction, as in practice, one would simply learn from all available data, and the sub-sampled results are not competitive.

We included coreference and discourse relations in our document graph. However, we didn't observe any significant gains, similar to the observation in Quirk and Poon (2017). We leave further exploration to future work.

5.4 PubMed-Scale Extraction

Our ultimate goal is to extract all knowledge from available text. We thus retrained our model using the best system from automatic evaluation (i.e., Graph LSTM-FULL) on all available data. The resulting model was then used to extract relations from all PubMed Central articles.

                Single-Sent.   Cross-Sent.
Candidates      10,873         57,033
p ≥ 0.5         1,408          4,279
p ≥ 0.9         530            1,461
GDKD + CIVIC    59

Table 4: Numbers of unique drug-gene-mutation interactions extracted from PubMed Central articles, compared to that from manually curated KBs used in distant supervision. p signifies output probability.

Table 4 shows the number of candidates and extracted interactions. With as little as 59 unique drug-gene-mutation triples from the two databases8, we learned to extract orders of magnitude more unique interactions. The results also highlight the benefit of cross-sentence extraction, which yields 3 to 5 times more relations than single-sentence extraction.

Table 5 conducts a similar comparison on unique number of drugs, genes, and mutations. Again, machine reading covers far more unique entities, especially with cross-sentence extraction.

8 There are more in the databases, but these are the only ones for which we found matching instances in the text. In future work, we will explore various ways to increase the number, e.g., by matching underspecified drug classes to specific drugs.

5.5 Manual Evaluation

Our automatic evaluations are useful for comparing competing approaches, but may not reflect the true classifier precision as the labels are noisy. Therefore, we randomly sampled extracted relation instances and asked three researchers knowledgeable in precision medicine to evaluate their correctness. For each instance, the annotators were presented with the provenance: sentences with the drug, gene, and mutation highlighted. The annotators determined in
each case whether this instance implied that the given entities were related. Note that evaluation does not attempt to identify whether the relationships are true or replicated in follow-up papers; rather, it focuses on whether the relationships are entailed by the text.

We focused our evaluation efforts on the cross-sentence ternary-relation setting. We considered three settings: a probability threshold of 0.9 for a high-precision but potentially low-recall setting, a threshold of 0.5, and a random sample of all candidates. In each case, 150 instances were selected, for a total of 450 annotations. A subset of 150 instances was reviewed by two annotators, and the inter-annotator agreement was 88%.

Table 6 shows that the classifier indeed filters out a large portion of potential candidates, with estimated instance accuracy of 64% at the threshold of 0.5, and 75% at 0.9. Interestingly, LSTMs are effective at screening out many entity mention errors, presumably because they include broad contextual features.

                          Drug    Gene    Mut.
GDKD + CIVIC              16      12      41
Single-Sent. (p ≥ 0.9)    68      228     221
Single-Sent. (p ≥ 0.5)    93      597     476
Cross-Sent. (p ≥ 0.9)     103     512     445
Cross-Sent. (p ≥ 0.5)     144     1344    1042

Table 5: Numbers of unique drugs, genes and mutations in extraction from PubMed Central articles, in comparison with that in the manually curated Gene Drug Knowledge Database (GDKD) and Clinical Interpretations of Variants In Cancer (CIVIC) used for distant supervision. p signifies output probability.

             Precision   Entity Error   Relation Error
Random       17%         36%            47%
p ≥ 0.5      64%         7%             29%
p ≥ 0.9      75%         1%             24%

Table 6: Sample precision of drug-gene-mutation interactions extracted from PubMed Central articles. p signifies output probability.

6 Domain: Genetic Pathways

Model                 Precision   Recall   F1
Poon et al. (2015)    37.5        29.9     33.2
BiLSTM                37.6        29.4     33.0
Graph LSTM            41.4        30.0     34.8
Graph LSTM (GOLD)     43.3        30.5     35.8

Table 7: GENIA test results on the binary relation of gene regulation. Graph LSTM (GOLD) used gold syntactic parses in the document graph.

We also conducted experiments on extracting genetic pathway interactions using the GENIA Event Extraction dataset (Kim et al., 2009). This dataset contains gold syntactic parses for the sentences, which offered a unique opportunity to investigate the impact of syntactic analysis on graph LSTMs. It also allowed us to test our framework in supervised learning. The
original shared task evaluated on complex, nested events for nine event types, many of which are unary relations (Kim et al., 2009). Following Poon et al. (2015), we focused on gene regulation and reduced it to binary-relation classification for head-to-head comparison. We followed their experimental protocol by sub-sampling negative examples so that there were about three times as many negative examples as positive ones.

Since the dataset is not entirely balanced, we reported precision, recall, and F1. We used our best performing graph LSTM from the previous experiments. By default, automatic parses were used in the document graphs, whereas in Graph LSTM (GOLD), gold parses were used instead. Table 7 shows the results. Once again, despite the lack of intense feature engineering, linear-chain LSTMs performed on par with the feature-based classifier (Poon et al., 2015). Graph LSTMs exhibited a more commanding advantage over linear-chain LSTMs in this domain, substantially outperforming the latter (p < 0.01 by McNemar's chi-square test). Most interestingly, graph LSTMs using gold parses significantly outperformed those using automatic parses, suggesting that encoding high-quality analysis is particularly beneficial.

7 Related Work

Most work on relation extraction has been applied to binary relations of entities in a single sentence. We first review relevant work on the single-sentence binary relation extraction task, and then review related work on n-ary and cross-sentence relation extraction.

Binary relation extraction

The traditional feature-based methods rely on carefully designed features to learn good models, and often integrate diverse sources of evidence such as word sequences and syntax context (Kambhatla, 2004; GuoDong et al., 2005; Boschee et al., 2005; Suchanek et al., 2006; Chan and Roth, 2010; Nguyen and Grishman, 2014). The kernel-based methods design various subsequence or tree kernels (Mooney and Bunescu, 2005; Bunescu and Mooney, 2005; Qian et al., 2008) to capture structured information. Recently, models based on neural networks have advanced the state of the art by automatically learning powerful feature representations (Xu et al., 2015a; Zhang et al., 2015; Santos et al., 2015; Xu et al., 2015b; Xu et al., 2016).

Most neural architectures resemble Figure 2, where there is a core representation learner (blue) that takes word embeddings as input and produces contextual entity representations. Such representations are then taken by relation classifiers to produce the final predictions. Both convolutional (Zeng et al., 2014; Wang et al., 2016; Santos et al., 2015) and RNN-based architectures (Zhang et al., 2015; Socher et al., 2012; Cai et al., 2016) have been successful at effectively representing sequences of words. Most of these have focused on modeling either the surface word sequences or the hierarchical syntactic structure. Miwa and Bansal (2016) proposed an architecture that benefits from both types of information, using a surface sequence layer, followed by a dependency-tree sequence layer.

N-ary relation extraction

Early work on extracting relations between more than two arguments was done in MUC-7, with a focus on fact/event extraction from news articles (Chinchor, 1998). Semantic role labeling in the PropBank (Palmer et al., 2005) or FrameNet (Baker et al., 1998) style is also an instance of n-ary relation extraction, with extraction of events expressed in a single sentence. McDonald et al. (2005) extract n-ary relations in a biomedical domain, by first factoring the n-ary relation into pair-wise relations between all entity pairs, and then constructing maximal cliques of related entities. Recently, neural models have been applied to semantic role labeling (FitzGerald et al., 2015; Roth and Lapata, 2016). These works learned neural representations by effectively decomposing the n-ary relation into binary relations between the predicate and each argument, by embedding the dependency path between each pair, or by combining features of the two using a feed-forward network. Although some re-ranking or joint inference models have been employed, the representations of the individual arguments do not influence each other. In contrast, we propose a neural architecture that jointly represents n entity mentions, taking into account long-distance dependencies and inter-sentential information.

Cross-sentence relation extraction

Several relation extraction tasks have benefited from cross-sentence extraction, including MUC fact and event extraction (Swampillai and Stevenson, 2011), record extraction from web pages (Wick et al., 2006), extraction of facts for biomedical domains (Yoshikawa et al., 2011), and extensions of semantic role labeling to cover implicit inter-sentential arguments (Gerber and Chai, 2010). These prior works have either relied on explicit co-reference annotation, or on the assumption that the whole document refers to a single coherent event, to simplify the problem and reduce the need for powerful representations of multi-sentential contexts of entity mentions. Recently, cross-sentence relation extraction models have been learned with distant supervision, and used integrated contextual evidence of diverse types without reliance on these assumptions (Quirk and Poon, 2017), but that work focused on binary relations only and explicitly engineered sparse indicator features.

Relation extraction using distant supervision

Distant supervision has been applied to extraction of binary (Mintz et al., 2009; Poon et al., 2015) and n-ary (Reschke et al., 2014; Li et al., 2015) relations, traditionally using hand-engineered features. Neural architectures have recently been applied to distantly supervised extraction of binary relations (Zeng et al., 2015). Our work is the first to propose a neural architecture for n-ary relation extraction, where the representation of a tuple of entities is not decomposable into independent representations of the individual entities or entity pairs, and which integrates diverse information from multi-sentential context. To utilize training data more effectively, we show how multi-task learning for component binary sub-relations can
improve performance. Our learned representation combines information sources within a single sentence in a more integrated and generalizable fashion than prior approaches, and can also improve performance on single-sentence binary relation extraction.

8 Conclusion

We explore a general framework for cross-sentence n-ary relation extraction based on graph LSTMs. The graph formulation subsumes linear-chain and tree LSTMs and makes it easy to incorporate rich linguistic analysis. Experiments on biomedical domains showed that extraction beyond the sentence boundary produced far more knowledge, and encoding rich linguistic knowledge provided consistent gain.

While there is much room to improve in both recall and precision, our results indicate that machine reading can already be useful in precision medicine. In particular, automatically extracted facts (Section 5.4) can serve as candidates for manual curation. Instead of scanning millions of articles to curate from scratch, human curators would just quickly vet thousands of extractions. The errors identified by curators offer direct supervision to the machine reading system for continuous improvement. Therefore, the most important goal is to attain high recall and reasonable precision. Our current models are already quite capable.

Future directions include: interactive learning with user feedback; improving discourse modeling in graph LSTMs; exploring other backpropagation strategies; joint learning with entity linking; applications to other domains.

Acknowledgements

We thank Daniel Fried and Ming-Wei Chang for useful discussions, as well as the anonymous reviewers and editor-in-chief Mark Johnson for their helpful comments.

References

Collin Baker, Charles Fillmore, and John Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics.
Dane Bell, Gustave Hahn-Powell, Marco A. Valenzuela-Escarcega, and Mihai Surdeanu. 2016. An investigation of coreference phenomena in the biomedical domain. In Proceedings of the Tenth Edition of the Language Resources and Evaluation Conference.
Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2).
Elizabeth Boschee, Ralph Weischedel, and Alex Zamanian. 2005. Automatic information extraction. In Proceedings of the International Conference on Intelligence Analysis.
Razvan C Bunescu and Raymond J Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Rui Cai, Xiaodong Zhang, and Houfeng Wang. 2016. Bidirectional recurrent convolutional neural network for relation classification. In Proceedings of the Fifty-Fourth Annual Meeting of the Association for Computational Linguistics.
Rich Caruana, Steve Lawrence, and Lee Giles. 2001. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In Proceedings of The Fifteenth Annual Conference on Neural Information Processing Systems.
Rich Caruana. 1998. Multitask learning. In Learning to learn. Springer.
Yee Seng Chan and Dan Roth. 2010. Exploiting background knowledge for relation extraction. In Proceedings of the Twenty-Third International Conference on Computational Linguistics.
Nancy Chinchor. 1998. Overview of MUC-7/MET-2. Technical report, Science Applications International Corporation, San Diego, CA.
Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the Twenty-Fifth International Conference on Machine Learning.
Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology.
Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of the Fifth International Conference on Language Resources and Evaluation.
Rodrigo Dienstmann, In Sock Jang, Brian Bot, Stephen Friend, and Justin Guinney. 2015. Database of genomic biomarkers for cancer drugs and clinical targetability in solid tumors. Cancer Discovery, 5.
Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Semantic role labeling with neural network factors. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Matthew Gerber and Joyce Y. Chai. 2010. Beyond NomBank: A study of implicit arguments for nominal predicates. In Proceedings of the Forty-Eighth Annual Meeting of the Association for Computational Linguistics.
Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In Proceedings of The Thirty-Eighth IEEE International Conference on Acoustics, Speech and Signal Processing.
Zhou GuoDong, Su Jian, Zhang Jie, and Zhang Min. 2005. Exploring various knowledge in relation extraction. In Proceedings of the Forty-Third Annual Meeting of the Association for Computational Linguistics.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8).
Nanda Kambhatla. 2004. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. In Proceedings of the Forty-Second Annual Meeting of the Association for Computational Linguistics, Demonstration Sessions.
Jin-Dong Kim, Tomoko Ohta, Sampo Pyysalo, Yoshinobu Kano, and Jun'ichi Tsujii. 2009. Overview of BioNLP'09 shared task on event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task.
Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics, 39(4).
Hong Li, Sebastian Krause, Feiyu Xu, Andrea Moro, Hans Uszkoreit, and Roberto Navigli. 2015. Improvement of n-ary relation extraction by adding lexical semantics to distant-supervision rule learning. In Proceedings of the Seventh International Conference on Agents and Artificial Intelligence.
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In Proceedings of the Fourth International Conference on Learning Representations.
Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, and Shuicheng Yan. 2016. Semantic object parsing with graph LSTM. In Proceedings of European Conference on Computer Vision.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the Fifty-Second Annual Meeting of the Association for Computational Linguistics: System Demonstrations.
Ryan McDonald, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, and Pete White. 2005. Simple algorithms for complex relation extraction with applications to biomedical IE. In Proceedings of the Forty-Third Annual Meeting on Association for Computational Linguistics.
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the Forty-Seventh Annual Meeting of the Association for Computational Linguistics and the Fourth International Joint Conference on Natural Language Processing.
Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of the Fifty-Fourth Annual Meeting of the Association for Computational Linguistics.
Raymond J Mooney and Razvan C Bunescu. 2005. Subsequence kernels for relation extraction. In Proceedings of The Nineteenth Annual Conference on Neural Information Processing Systems.
Thien Huu Nguyen and Ralph Grishman. 2014. Employing word representations and regularization for domain adaptation of relation extraction. In Proceedings of the Fifty-Second Annual Meeting of the Association for Computational Linguistics.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1).
Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of The Thirtieth International Conference on Machine Learning.
Nanyun Peng and Mark Dredze. 2016. Improving named entity recognition for Chinese social media with word segmentation representation learning. In Proceedings of the Fifty-Fourth Annual Meeting of the Association for Computational Linguistics.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Hoifung Poon, Chris Quirk, Charlie DeZiel, and David Heckerman. 2014. Literome: PubMed-scale genomic knowledge base in the cloud. Bioinformatics, 30(19).
Hoifung Poon, Kristina Toutanova, and Chris Quirk. 2015. Distant supervision for cancer pathway extraction from text. In Pacific Symposium on Biocomputing.
Longhua Qian, Guodong Zhou, Fang Kong, Qiaoming Zhu, and Peide Qian. 2008. Exploiting constituent
dependencies for tree kernel-based semantic relation extraction. In Proceedings of the Twenty-Second International Conference on Computational Linguistics.
Chris Quirk and Hoifung Poon. 2017. Distant supervision for relation extraction beyond the sentence boundary. In Proceedings of the Fifteenth Conference of the European Chapter of the Association for Computational Linguistics.
Chris Quirk, Pallavi Choudhury, Jianfeng Gao, Hisami Suzuki, Kristina Toutanova, Michael Gamon, Wen-tau Yih, and Lucy Vanderwende. 2012. MSR SPLAT, a language analysis toolkit. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Demonstration Session.
Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher D Manning, and Daniel Jurafsky. 2014. Event extraction using distant supervision. In Proceedings of Eighth Edition of the Language Resources and Evaluation Conference.
Michael Roth and Mirella Lapata. 2016. Neural semantic role labeling with dependency path embeddings. In Proceedings of the Fifty-Fourth Annual Meeting of the Association for Computational Linguistics.
Cicero Nogueira dos Santos, Bing Xiang, and Bowen Zhou. 2015. Classifying relations by ranking with convolutional neural networks. In Proceedings of the Fifty-Third Annual Meeting of the Association for Computational Linguistics.
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks, 20(1).
Richard Socher, Brody Huval, Christopher D Manning, and Andrew Y Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
Fabian M Suchanek, Georgiana Ifrim, and Gerhard Weikum. 2006. Combining linguistic and statistical analysis to extract relations from web documents. In Proceedings of the Twelfth International Conference on Knowledge Discovery and Data Mining.
Mihai Surdeanu and Ji Heng. 2014. Overview of the English slot filling track at the TAC 2014 knowledge base population evaluation. In Proceedings of the U.S. National Institute of Standards and Technology Knowledge Base Population 2014 Workshop.
Kumutha Swampillai and Mark Stevenson. 2011. Extracting relations within and across sentences. In Proceedings of the Conference on Recent Advances in Natural Language Processing.
Kai Sheng Tai, Richard Socher, and Christopher D Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the Fifty-Third Annual Meeting of the Association for Computational Linguistics.
Theano Development Team. 2016. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688.
Linlin Wang, Zhu Cao, Gerard de Melo, and Zhiyuan Liu. 2016. Relation classification via multi-level attention CNNs. In Proceedings of the Fifty-Fourth Annual Meeting of the Association for Computational Linguistics.
Michael Wick, Aron Culotta, and Andrew McCallum. 2006. Learning field compatibilities to extract database records from unstructured text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Kun Xu, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2015a. Semantic relation classification via convolutional neural networks with simple negative sampling. In Proceedings of Conference on Empirical Methods in Natural Language Processing.
Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin. 2015b. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of Conference on Empirical Methods in Natural Language Processing.
Yan Xu, Ran Jia, Lili Mou, Ge Li, Yunchuan Chen, Yangyang Lu, and Zhi Jin. 2016. Improved relation classification by deep recurrent neural networks with data augmentation. In Proceedings of the Twenty-Sixth International Conference on Computational Linguistics.
Nianwen Xue, Hwee Tou Ng, Sameer Pradhan, Rashmi Prasad, Christopher Bryant, and Attapol Rutherford. 2015. The CoNLL-2015 shared task on shallow discourse parsing. In Proceedings of the Conference on Computational Natural Language Learning, Shared Task.
Katsumasa Yoshikawa, Sebastian Riedel, Tsutomu Hirao, Masayuki Asahara, and Yuji Matsumoto. 2011. Coreference based event-argument relation extraction on biomedical text. Journal of Biomedical Semantics, 2(5).
Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, Jun Zhao, et al. 2014. Relation classification via convolutional deep neural network. In Proceedings of the Twenty-Sixth International Conference on Computational Linguistics.
Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Shu Zhang, Dequan Zheng, Xinchen Hu, and Ming Yang. 2015. Bidirectional long short-term memory networks for relation classification. In Proceedings of Twenty-Ninth Pacific Asia Conference on Language, Information and Computation.