Transactions of the Association for Computational Linguistics, vol. 6, pp. 17–31, 2018. Action Editor: Ani Nenkova.
Submission batch: 7/2017; Revision batch: 11/2017; Published 1/2018.
© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.
Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis

Stefanos Angelidis and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
s.angelidis@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

We consider the task of fine-grained sentiment analysis from the perspective of multiple instance learning (MIL). Our neural model is trained on document sentiment labels, and learns to predict the sentiment of text segments, i.e., sentences or elementary discourse units (EDUs), without segment-level supervision. We introduce an attention-based polarity scoring method for identifying positive and negative text snippets and a new dataset which we call SPOT (as shorthand for Segment-level POlariTy annotations) for evaluating MIL-style sentiment models like ours. Experimental results demonstrate superior performance against multiple baselines, whereas a judgement elicitation study shows that EDU-level opinion extraction produces more informative summaries than sentence-based alternatives.

1 Introduction

Sentiment analysis has become a fundamental area of research in Natural Language Processing thanks to the proliferation of user-generated content in the form of online reviews, blogs, internet forums, and social media. A plethora of methods have been proposed in the literature that attempt to distill sentiment information from text, allowing users and service providers to make opinion-driven decisions.

The success of neural networks in a variety of applications (Bahdanau et al., 2015; Le and Mikolov, 2014; Socher et al., 2013) and the availability of large amounts of labeled data have led to an increased focus on sentiment classification. Supervised models are typically trained on documents (Johnson and Zhang, 2015a; Johnson and Zhang, 2015b; Tang et al., 2015; Yang et al., 2016), sentences (Kim, 2014), or phrases (Socher et al., 2011; Socher et al., 2013) annotated with sentiment labels and used to predict sentiment in unseen texts. Coarse-grained document-level annotations are relatively easy to obtain due to the widespread use of opinion grading interfaces (e.g., star ratings accompanying reviews). In contrast, the acquisition of sentence- or phrase-level sentiment labels remains a laborious and expensive endeavor despite its relevance to various opinion mining applications, e.g., detecting or summarizing consumer opinions in online product reviews. The usefulness of finer-grained sentiment analysis is illustrated in the example of Figure 1, where snippets of opposing polarities are extracted from a 2-star restaurant review. Although, as a whole, the review conveys negative sentiment, aspects of the reviewer's experience were clearly positive. This goes largely unnoticed when focusing solely on the review's overall rating.

[Rating: ★★] I had a very mixed experience at The Stand. The burger and fries were good. The chocolate shake was divine: rich and creamy. The drive-thru was horrible. It took us at least 30 minutes to order when there were only four cars in front of us. We complained about the wait and got a half-hearted apology. I would go back because the food is good, but my only hesitation is the wait.

Summary:
+ The burger and fries were good
+ The chocolate shake was divine
+ I would go back because the food is good
– The drive-thru was horrible
– It took us at least 30 minutes to order

Figure 1: An EDU-based summary of a 2-out-of-5 star review with positive and negative snippets.

In this work, we consider the problem of segment-level sentiment analysis from the perspective of Multiple Instance Learning (MIL; Keeler and Rumelhart, 1992).
Instead of learning from individually labeled segments, our model only requires document-level supervision and learns to introspectively judge the sentiment of constituent segments. Beyond showing how to utilize document collections of rated reviews to train fine-grained sentiment predictors, we also investigate the granularity of the extracted segments. Previous research (Tang et al., 2015; Yang et al., 2016; Cheng and Lapata, 2016; Nallapati et al., 2017) has predominantly viewed documents as sequences of sentences. Inspired by recent work in summarization (Li et al., 2016) and sentiment classification (Bhatia et al., 2015), we also represent documents via Rhetorical Structure Theory's (Mann and Thompson, 1988) Elementary Discourse Units (EDUs). Although definitions for EDUs vary in the literature, we follow standard practice and take the elementary units of discourse to be clauses (Carlson et al., 2003). We employ a state-of-the-art discourse parser (Feng and Hirst, 2012) to identify them.

Our contributions in this work are three-fold: a novel multiple instance learning neural model which utilizes document-level sentiment supervision to judge the polarity of its constituent segments; the creation of SPOT, a publicly available dataset which contains Segment-level POlariTy annotations (for sentences and EDUs) and can be used for the evaluation of MIL-style models like ours; and the empirical finding (through automatic and human-based evaluation) that neural multiple instance learning is superior to more conventional neural architectures and other baselines on detecting segment sentiment and extracting informative opinions in reviews.¹

¹Our code and SPOT dataset are publicly available at: https://github.com/stangelid/milnet-sent

2 Background

Our work lies at the intersection of multiple research areas, including sentiment classification, opinion mining and multiple instance learning. We review related work in these areas below.

Sentiment Classification: Sentiment classification is one of the most popular tasks in sentiment analysis. Early work focused on unsupervised methods and the creation of sentiment lexicons (Turney, 2002; Hu and Liu, 2004; Wiebe et al., 2005; Baccianella et al., 2010), based on which the overall polarity of a text can be computed (e.g., by aggregating the sentiment scores of constituent words). More recently, Taboada et al. (2011) introduced SO-CAL, a state-of-the-art method that combines a rich sentiment lexicon with carefully defined rules over syntax trees to predict sentence sentiment.

Supervised learning techniques have subsequently dominated the literature (Pang et al., 2002; Pang and Lee, 2005; Qu et al., 2010; Xia and Zong, 2010; Wang and Manning, 2012; Le and Mikolov, 2014) thanks to user-generated sentiment labels or large-scale crowd-sourcing efforts (Socher et al., 2013). Neural network models in particular have achieved state-of-the-art performance on various sentiment classification tasks due to their ability to alleviate feature engineering. Kim (2014) introduced a very successful CNN architecture for sentence-level classification, whereas other work (Socher et al., 2011; Socher et al., 2013) uses recursive neural networks to learn sentiment for segments of varying granularity (i.e., words, phrases, and sentences). We describe Kim's (2014) approach in more detail as it is also used as part of our model.

Let x_i denote the k-dimensional word embedding of the i-th word in text segment s of length n. The segment's input representation is the concatenation of word embeddings x_1, ..., x_n, resulting in word matrix X. Let X_{i:i+j} refer to the concatenation of embeddings x_i, ..., x_{i+j}. A convolution filter W ∈ R^{l×k}, applied to a window of l words, produces a new feature

  c_i = ReLU(W ∘ X_{i:i+l−1} + b),

where ReLU is the Rectified Linear Unit non-linearity, '∘' denotes the entrywise product followed by a sum over all elements, and b ∈ R is a bias term. Applying the same filter to every possible window of word vectors in the segment produces a feature map c = [c_1, c_2, ..., c_{n−l+1}]. Multiple feature maps for varied window sizes are applied, resulting in a fixed-size segment representation v via max-over-time pooling. We will refer to the application of convolution to an input word matrix X as CNN(X). A final sentiment prediction is produced using a softmax classifier and the model is trained via back-propagation using sentence-level sentiment labels.
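For concreteness, the following is a minimal numpy sketch of this convolution-and-pooling encoder. All dimensions and the random parameters are toy values chosen purely for illustration; they are not the configuration used in our experiments (see Section 5.3).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def cnn_encode(X, W, b):
    """Apply one bank of convolution filters to word matrix X.

    X: (n, k) matrix of word embeddings for a segment of n words.
    W: (d, l, k) tensor holding d filters, each spanning l words.
    b: (d,) bias terms.
    Returns a (d,) vector of max-over-time pooled feature values.
    """
    n, k = X.shape
    d, l, _ = W.shape
    # c[j, i] = ReLU(W_j o X_{i:i+l-1} + b_j): entrywise product + sum
    c = np.stack([
        relu(np.einsum('lk,dlk->d', X[i:i + l], W) + b)
        for i in range(n - l + 1)
    ], axis=1)                       # shape (d, n - l + 1), one row per filter
    return c.max(axis=1)             # max-over-time pooling -> (d,)

# Toy segment of n=7 words with k=4 dimensional embeddings.
X = rng.normal(size=(7, 4))
# Three filter banks with window sizes 3, 4, 5 (d=2 feature maps each).
banks = [(rng.normal(size=(2, l, 4)), np.zeros(2)) for l in (3, 4, 5)]
# Segment vector v = concatenation of pooled features from every bank.
v = np.concatenate([cnn_encode(X, W, b) for W, b in banks])
print(v.shape)  # (6,)
```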
The availability of large-scale datasets (Diao et al., 2014; Tang et al., 2015) has also led to the development of document-level sentiment classifiers which exploit hierarchical neural representations. These are obtained by first building representations of sentences and then aggregating those into a document feature vector (Tang et al., 2015). Yang et al. (2016) further acknowledge that words and sentences are differentially important in different contexts. They present a model which learns to attend (Bahdanau et al., 2015) to individual text parts when constructing document representations. We describe such an architecture in more detail as we use it as a point of comparison with our own model.

Given document d comprising segments (s_1, ..., s_m), a Hierarchical Network with attention (henceforth HIERNET; based on Yang et al., 2016) produces segment representations (v_1, ..., v_m) which are subsequently fed into a bidirectional GRU module (Bahdanau et al., 2015), whose resulting hidden vectors (h_1, ..., h_m) are used to produce attention weights (a_1, ..., a_m) (see Section 3.2 for more details on the attention mechanism). A document is represented as the weighted average of the segments' hidden vectors, v_d = Σ_i a_i h_i. A final sentiment prediction is obtained using a softmax classifier and the model is trained via back-propagation using document-level sentiment labels. The architecture is illustrated in Figure 2(a). In their proposed model, Yang et al. (2016) use bidirectional GRU modules to represent segments as well as documents, whereas we use a more efficient CNN encoder to compose words into segment vectors² (i.e., v_i = CNN(X_i)). Note that models like HIERNET do not naturally predict sentiment for individual segments; we discuss how they can be used for segment-level opinion extraction in Section 5.2.

²When applied to the Yelp'13 and IMDB document classification datasets, the use of CNNs results in a relative performance decrease of <2% compared to Yang et al.'s (2016) model.

Our own work draws inspiration from representation learning (Tang et al., 2015; Kim, 2014), especially the idea that not all parts of a document convey sentiment-worthy clues (Yang et al., 2016). Our model departs from previous approaches in that it provides a natural way of predicting the polarity of individual text segments without requiring segment-level annotations. Moreover, our attention mechanism directly facilitates opinion detection rather than simply aggregating sentence representations into a single document vector.

Opinion Mining: A standard setting for opinion mining and summarization (Lerman et al., 2009; Carenini et al., 2006; Ganesan et al., 2010; Di Fabbrizio et al., 2014; Gerani et al., 2014) assumes a set of documents that contain opinions about some entity of interest (e.g., a camera). The goal of the system is to generate a summary that is representative of the average opinion and speaks to its important aspects (e.g., picture quality, battery life, value). Output summaries can be extractive (Lerman et al., 2009) or abstractive (Gerani et al., 2014; Di Fabbrizio et al., 2014), and the underlying systems exhibit varying degrees of linguistic sophistication, from identifying aspects (Lerman et al., 2009) to using RST-style discourse analysis and manually defined templates (Gerani et al., 2014; Di Fabbrizio et al., 2014).

Our proposed method departs from previous work in that it focuses on detecting opinions in individual documents. Given a review, we predict the polarity of every segment, allowing for the extraction of sentiment-heavy opinions. We explore the usefulness of EDU segmentation inspired by Li et al. (2016), who show that EDU-based summaries align with near-extractive summaries constructed by news editors. Importantly, our model is trained in a weakly-supervised fashion on large-scale document classification datasets without recourse to fine-grained labels or gold-standard opinion summaries.

Multiple Instance Learning: Our models adopt a Multiple Instance Learning (MIL) framework. MIL deals with problems where labels are associated with groups of instances or bags (documents in our case), while instance labels (segment-level polarities) are unobserved. An aggregation function is used to combine instance predictions and assign labels on the bag level. The goal is either to label bags (Keeler and Rumelhart, 1992; Dietterich et al., 1997; Maron and Ratan, 1998) or to simultaneously infer bag and instance labels (Zhou et al., 2009; Wei et al., 2014; Kotzias et al., 2015). We view segment-level sentiment analysis as an instantiation of the latter variant.

Initial MIL efforts for binary classification made the strong assumption that a bag is negative only if all of its instances are negative, and positive otherwise (Dietterich et al., 1997; Maron and Ratan, 1998; Zhang et al., 2002; Andrews and Hofmann, 2004; Carbonetto et al., 2008).
Subsequent work relaxed this assumption, allowing for prediction combinations better suited to the tasks at hand. Weidmann et al. (2003) introduced a generalized MIL framework, where a combination of instance types is required to assign a bag label. Zhou et al. (2009) used graph kernels to aggregate predictions, exploiting relations between instances in object and text categorization. Xu and Frank (2004) proposed a multiple-instance logistic regression classifier where instance predictions were simply averaged, assuming equal and independent contribution toward bag classification. More recently, Kotzias et al. (2015) used sentence vectors obtained by a pre-trained hierarchical CNN (Denil et al., 2014) as features under an unweighted average MIL objective. Prediction averaging was further extended by Pappas and Popescu-Belis (2014; 2017), who used a weighted summation of predictions, an idea which we also adopt in our work.

Applications of MIL are many and varied. MIL was first explored by Keeler and Rumelhart (1992) for recognizing handwritten postcodes, where the position and value of individual digits was unknown. MIL techniques have since been applied to drug activity prediction (Dietterich et al., 1997), image retrieval (Maron and Ratan, 1998; Zhang et al., 2002), object detection (Zhang et al., 2006; Carbonetto et al., 2008; Cour et al., 2011), text classification (Andrews and Hofmann, 2004), image captioning (Wu et al., 2015), paraphrase detection (Xu et al., 2014), and information extraction (Hoffmann et al., 2011).

When applied to sentiment analysis, MIL takes advantage of supervision signals on the document level in order to train segment-level sentiment predictors. Although their work is not couched in the framework of MIL, Täckström and McDonald (2011) show how sentence sentiment labels can be learned as latent variables from document-level annotations using hidden conditional random fields. Pappas and Popescu-Belis (2014) use a multiple instance regression model to assign sentiment scores to specific aspects of products. The Group-Instance Cost Function (GICF), proposed by Kotzias et al. (2015), averages sentence sentiment predictions during training, while ensuring that similar sentences receive similar polarity labels. Their work uses a pre-trained hierarchical CNN to obtain sentence embeddings, but is not trainable end-to-end, in contrast with our proposed network. Additionally, none of the aforementioned efforts explicitly evaluate opinion extraction quality.

3 Methodology

In this section we describe how multiple instance learning can be used to address some of the drawbacks seen in previous approaches, namely the need for expert knowledge in lexicon-based sentiment analysis (Taboada et al., 2011), expensive fine-grained annotation on the segment level (Kim, 2014; Socher et al., 2013), or the inability to naturally predict segment sentiment (Yang et al., 2016).

3.1 Problem Formulation

Under multiple instance learning (MIL), a dataset D is a collection of labeled bags, each of which is a group of unlabeled instances. Specifically, each document d is a sequence (bag) of segments (instances). This sequence d = (s_1, s_2, ..., s_m) is obtained from a document segmentation policy (see Section 4 for details). A discrete sentiment label y_d ∈ [1, C] is associated with each document, where the label set is ordered and classes 1 and C correspond to maximally negative and maximally positive sentiment. It is assumed that y_d is an unknown function of the unobserved segment-level labels:

  y_d = f(y_1, y_2, ..., y_m)   (1)

Probabilistic sentiment classifiers will produce document-level predictions ŷ_d by selecting the most probable class according to class distribution p_d = ⟨p_d^(1), ..., p_d^(C)⟩. In a non-MIL framework, a classifier would learn to predict the document's sentiment by directly conditioning on its segments' feature representations or their aggregate:

  p_d = f̂_θ(v_1, v_2, ..., v_m)   (2)

In contrast, a MIL classifier will produce a class distribution p_i for each segment and additionally learn to combine these into a document-level prediction:

  p_i = ĝ_θs(v_i),   (3)
  p_d = f̂_θd(p_1, p_2, ..., p_m).   (4)

In this work, ĝ and f̂ are defined using a single neural network, described below.
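To make the contrast between Equations (2) and (3)–(4) concrete, the following toy numpy sketch instantiates ĝ as a shared softmax classifier and f̂ as uniform averaging, the simplest possible aggregator (Section 3.2 replaces it with attention). All values are randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

m, d, C = 4, 6, 5                      # m segments, d-dim features, C classes
V = rng.normal(size=(m, d))            # segment representations v_1..v_m

# Eq. (3): a shared classifier g produces one distribution per segment.
Wc, bc = rng.normal(size=(C, d)), np.zeros(C)
P = np.array([softmax(Wc @ v + bc) for v in V])    # shape (m, C)

# Eq. (4), simplest choice of f: uniform averaging of segment distributions.
p_d = P.mean(axis=0)
y_hat = int(p_d.argmax()) + 1          # predicted document class in [1, C]
print(p_d.round(2), y_hat)
```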
Figure 2: A Hierarchical Network (HIERNET) for document-level sentiment classification and our proposed Multiple Instance Learning Network (MILNET). The models use the same attention mechanism to combine segment vectors and predictions respectively. [Diagram omitted.]

3.2 Multiple Instance Learning Network

Hierarchical neural models like HIERNET have been used to predict document-level polarity by first encoding sentences and then combining these representations into a document vector. Hierarchical vector composition produces powerful sentiment predictors, but lacks the ability to introspectively judge the polarity of individual segments.

Our Multiple Instance Learning Network (henceforth MILNET) is based on the following intuitive assumptions about opinionated text. Each segment conveys a degree of sentiment polarity, ranging from very negative to very positive. Additionally, segments have varying degrees of importance, in relation to the overall opinion of the author. The overarching polarity of a text is an aggregation of segment polarities, weighted by their importance. Thus, our model attempts to predict the polarity of segments and decides which parts of the document are good indicators of its overall sentiment, allowing for the detection of sentiment-heavy opinions. An illustration of MILNET is shown in Figure 2(b); the model consists of three components: a CNN segment encoder, a softmax segment classifier and an attention-based prediction weighting module.

Segment Encoding: An encoding v_i = CNN(X_i) is produced for each segment, using the CNN architecture described in Section 2.

Segment Classification: Obtaining a separate representation v_i for every segment in a document allows us to produce individual segment sentiment predictions p_i = ⟨p_i^(1), ..., p_i^(C)⟩. This is achieved using a softmax classifier:

  p_i = softmax(W_c v_i + b_c),   (5)

where W_c and b_c are the classifier's parameters, shared across all segments. Individual distributions p_i are shown in Figure 2(b) as small bar-charts.

Document Classification: In the simplest case, document-level predictions can be produced by taking the average of segment class distributions: p_d^(c) = 1/m Σ_i p_i^(c), c ∈ [1, C]. This is, however, a crude way of combining segment sentiment, as not all parts of a document convey important sentiment clues. We opt for a segment attention mechanism which rewards text units that are more likely to be good sentiment predictors.

Our attention mechanism is based on a bidirectional GRU component (Bahdanau et al., 2015) and inspired by Yang et al. (2016). However, in contrast to their work, where attention is used to combine sentence representations into a single document vector, we utilize a similar technique to aggregate individual sentiment predictions. We first use separate GRU modules to produce forward and backward hidden vectors, which are then concatenated:

  h→_i = GRU→(v_i),   (6)
  h←_i = GRU←(v_i),   (7)
  h_i = [h→_i, h←_i],   i ∈ [1, m].   (8)

The importance of each segment is measured with the aid of a vector h_a, as follows:

  h'_i = tanh(W_a h_i + b_a),   (9)
  a_i = exp(h'_i^T h_a) / Σ_i exp(h'_i^T h_a),   (10)

where Equation (9) defines a one-layer MLP that produces an attention vector for the i-th segment. Attention weights a_i are computed as the normalized similarity of each h'_i with h_a. Vector h_a, which is randomly initialized and learned during training, can be thought of as a trained key, able to recognize sentiment-heavy segments. The attention mechanism is depicted in the dashed box of Figure 2, with attention weights shown as shaded circles. Finally, we obtain a document-level distribution over sentiment labels as the weighted sum of segment distributions (see top of Figure 2(b)):

  p_d^(c) = Σ_i a_i p_i^(c),   c ∈ [1, C].   (11)

Training: The model is trained end-to-end on documents with user-generated sentiment labels. We use the negative log likelihood of the document-level prediction as an objective function:

  L = −Σ_d log p_d^(y_d)   (12)
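A compact numpy sketch of Equations (6)–(12) follows. It is an illustrative rendering, not our training implementation: parameters are randomly initialized and untrained, dimensions are toy values, and in practice everything is learned by back-propagation.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d_in, d_h, C = 4, 6, 5, 5        # toy sizes: m segments, C ordered classes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def make_gru():
    # Wz, Uz, Wr, Ur, Wh, Uh for one direction, randomly initialized.
    return [rng.normal(scale=0.3, size=(d_h, d)) for d in (d_in, d_h) * 3]

def gru(V, params):
    """Single-direction GRU over segment vectors V: (m, d_in) -> (m, d_h)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h, out = np.zeros(d_h), []
    for v in V:
        z = sigmoid(Wz @ v + Uz @ h)              # update gate
        r = sigmoid(Wr @ v + Ur @ h)              # reset gate
        h = (1 - z) * h + z * np.tanh(Wh @ v + Uh @ (r * h))
        out.append(h)
    return np.stack(out)

V = rng.normal(size=(m, d_in))                    # v_i = CNN(X_i)
P = softmax(rng.normal(size=(m, C)))              # p_i from Eq. (5)

# Eqs. (6)-(8): bidirectional hidden vectors h_i = [forward; backward]
H = np.concatenate([gru(V, make_gru()),
                    gru(V[::-1], make_gru())[::-1]], axis=1)

# Eqs. (9)-(10): attention via a one-layer MLP and a trained key h_a
Wa = rng.normal(size=(2 * d_h, 2 * d_h))
ba, ha = np.zeros(2 * d_h), rng.normal(size=2 * d_h)
scores = np.tanh(H @ Wa.T + ba) @ ha
a = np.exp(scores - scores.max()); a /= a.sum()   # attention weights a_i

# Eq. (11): document distribution = weighted sum of segment distributions
p_doc = a @ P                                     # shape (C,)

# Eq. (12), for one document with gold label y_d (summed over D in training)
y_d = 4
loss = -np.log(p_doc[y_d - 1])
print(a.round(2), p_doc.round(2), round(float(loss), 3))
```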
4 Polarity-based Opinion Extraction

After training, our model can produce segment-level sentiment predictions for unseen texts in the form of class probability distributions. A direct application of our method is opinion extraction, where highly positive and negative snippets are selected from the original document, producing extractive sentiment summaries, as described below.

Polarity Scoring: In order to extract opinion summaries, we need to rank segments according to their sentiment polarity. We introduce a method that takes our model's confidence in the prediction into account, by reducing each segment's class probability distribution p_i to a single real-valued polarity score. To achieve this, we first define a real-valued class weight vector w = ⟨w^(1), ..., w^(C)⟩, w^(c) ∈ [−1, 1], that assigns uniformly-spaced weights to the ordered label set, such that w^(c+1) − w^(c) = 2/(C−1). For example, in a 5-class scenario, the class weight vector would be w = ⟨−1, −0.5, 0, 0.5, 1⟩. We compute the polarity score of a segment as the dot-product of the probability distribution p_i with vector w:

  polarity(s_i) = Σ_c p_i^(c) w^(c) ∈ [−1, 1]   (13)

Gated Polarity: As a way of increasing the effectiveness of our method, we introduce a gated extension that uses the attention mechanism of our model to further differentiate between segments that carry significant sentiment cues and those that do not:

  gated-polarity(s_i) = a_i · polarity(s_i),   (14)

where a_i is the attention weight assigned to the i-th segment. This forces the polarity scores of segments the model does not attend to closer to 0.
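The scoring and gating functions are straightforward to implement; the sketch below also shows how the ranked scores can drive extraction. The three class distributions are invented for illustration, loosely mimicking the EDUs of Figure 3:

```python
import numpy as np

C = 5
# Uniformly-spaced class weights with w(c+1) - w(c) = 2/(C-1):
w = np.linspace(-1.0, 1.0, C)                      # [-1, -0.5, 0, 0.5, 1]

def polarity(p):
    """Eq. (13): dot-product of class distribution p with w, in [-1, 1]."""
    return float(p @ w)

def gated_polarity(p, a):
    """Eq. (14): polarity scaled by the segment's attention weight a."""
    return a * polarity(p)

# Invented distributions for three segments (cf. Figure 3):
P = np.array([[0.55, 0.25, 0.10, 0.06, 0.04],      # confident negative
              [0.30, 0.25, 0.20, 0.15, 0.10],      # negative, flatter mass
              [0.04, 0.06, 0.10, 0.25, 0.55]])     # confident positive
a = np.array([0.3, 0.2, 0.5])                      # attention weights

print([round(polarity(p), 3) for p in P])          # [-0.605, -0.25, 0.605]
scores = [gated_polarity(p, ai) for p, ai in zip(P, a)]

# Opinion extraction: rank segments by score and keep the strongest
# positive and negative ones, subject to a compression budget (Section 6.2).
ranked = sorted(range(len(scores)), key=lambda i: scores[i])
print('most negative:', ranked[0], 'most positive:', ranked[-1])
```

Note how the second segment, despite sharing the first segment's discrete label, receives a polarity closer to 0 because its probability mass is more evenly spread.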
An illustration of our polarity scoring function is provided in Figure 3, where the class predictions (top) of three restaurant review segments are mapped to their corresponding polarity scores (bottom). We observe that our method produces the desired result; segments 1 and 2 convey negative sentiment and receive negative scores, whereas the third segment is mapped to a positive score. Although the same discrete class label is assigned to the first two, the second segment's score is closer to 0 (neutral), as its class probability mass is more evenly distributed.

Figure 3: Polarity scores (bottom) obtained from class probability distributions for three EDUs (top) extracted from a restaurant review: "The starters were quite bland." (att: 0.3), "I didn't enjoy most of them," (att: 0.2), "but the burger was brilliant!" (att: 0.5). Attention weights (top) are used to fine-tune the obtained polarities. [Plots omitted.]

Segmentation Policies: As mentioned earlier, one of the hypotheses investigated in this work regards the use of subsentential units as the basis of extraction. Specifically, our model was applied to sentences and Elementary Discourse Units (EDUs), obtained from a Rhetorical Structure Theory (RST) parser (Feng and Hirst, 2012). According to RST, documents are first segmented into EDUs corresponding roughly to independent clauses, which are then recursively combined into larger discourse spans. This results in a tree representation of the document, where connected nodes are characterized by discourse relations. We only utilize RST's segmentation, and leave the potential use of the tree structure to future work.

The example in Figure 3 illustrates why EDU-based segmentation might be beneficial for opinion extraction. The second and third EDUs correspond to the sentence: "I didn't enjoy most of them, but the burger was brilliant." Taken as a whole, the sentence conveys mixed sentiment, whereas the EDUs clearly convey opposing sentiment.

5 Experimental Setup

In this section we describe the data used to assess the performance of our model. We also give details on model training and comparison systems.

5.1 Datasets

Our models were trained on two large-scale sentiment classification collections. The Yelp'13 corpus was introduced in Tang et al. (2015) and contains customer reviews of local businesses, each associated with human ratings on a scale from 1 (negative) to 5 (positive). The IMDB corpus of movie reviews was obtained from Diao et al. (2014); each review is associated with user ratings ranging from 1 to 10. Both datasets are split into training (80%), validation (10%) and test (10%) sets. A summary of statistics for each collection is provided in Table 1.

Table 1: Document-level sentiment classification datasets used to train our models.

                      Yelp'13    IMDB
  Documents           335,018    348,415
  Average #Sentences  8.90       14.02
  Average #EDUs       19.11      37.38
  Average #Words      152        325
  Vocabulary Size     211,245    115,831
  Classes             1–5        1–10

In order to evaluate model performance on the segment level, we constructed a new dataset named SPOT (as a shorthand for Segment POlariTy) by annotating documents from the Yelp'13 and IMDB collections. Specifically, we sampled reviews from each collection such that all document-level classes are represented uniformly, and the document lengths are representative of the respective corpus. Documents were segmented into sentences and EDUs, resulting in two segment-level datasets per collection. Statistics are summarized in Table 2.

Table 2: SPOT dataset: numbers of documents and segments with polarity annotations.

                Yelp'13seg           IMDBseg
                Sent.     EDUs       Sent.     EDUs
  #Segments     1,065     2,110      1,029     2,398
  #Documents    100                  97
  Classes       {–, 0, +}            {–, 0, +}

Each review was presented to three Amazon Mechanical Turk (AMT) annotators who were asked to judge the sentiment conveyed by each segment (i.e., sentence or EDU) as negative, neutral, or positive.
We assigned labels using a majority vote, or a fourth annotator in the rare cases of no agreement (<5%). Figure 4 shows the distribution of segment labels for each document-level class.

Figure 4: Distribution of segment-level labels (negative, neutral, positive) per document-level class on the SPOT datasets (Yelp'13 and IMDB, sentences and EDUs). [Plots omitted.]

As expected, documents with positive labels contain a larger number of positive segments compared to documents with negative labels, and vice versa. Neutral segments are distributed in an approximately uniform manner across document classes. Interestingly, the proportion of neutral EDUs is significantly higher compared to neutral sentences. This observation reinforces our argument in favor of EDU segmentation, as it suggests that a sentence with positive or negative overall polarity may still contain neutral EDUs. Discarding neutral EDUs could therefore lead to more concise opinion extraction compared to relying on entire sentences.

We further experimented on two collections introduced by Kotzias et al. (2015) which also originate from the Yelp'13 and IMDB datasets. Each collection consists of 1,000 randomly sampled sentences annotated with binary sentiment labels.

5.2 Model Comparison

On the task of segment classification we compared MILNET, our multiple instance learning network, against the following methods:

Majority: Majority class applied to all instances.

SO-CAL: State-of-the-art lexicon-based system that classifies segments into positive, neutral, and negative classes (Taboada et al., 2011).

Seg-CNN: Fully-supervised CNN segment classifier trained on SPOT's labels (Kim, 2014).

GICF: The Group-Instance Cost Function model introduced in Kotzias et al. (2015). This is an unweighted average prediction aggregation MIL method that uses sentence features from a pre-trained convolutional neural model.

HIERNET: HIERNET does not explicitly generate individual segment predictions. Segment polarity scores are obtained by assigning the document-level prediction to every segment. We can then produce finer-grained polarity distinctions via gating, using the model's attention weights.

We further illustrate the differences between HIERNET and MILNET in Figure 5, which includes short descriptions and simplified equations for each model. MILNET naturally produces distinct segment polarities, while HIERNET assigns a single polarity score to every segment. In both cases, gating is a further means of identifying neutral segments. Finally, we differentiate between variants of HIERNET and MILNET according to:

Polarity source: Controls whether we assign polarities via segment-specific or document-wide predictions. HIERNET only allows for document-wide predictions. MILNET can use both.

Attention: We use models without gating (no subscript), with gating (gt subscript), as well as models trained with the attention mechanism disabled, falling back to simple averaging (avg subscript).

5.3 Model Training and Evaluation

We trained MILNET and HIERNET using Adadelta (Zeiler, 2012) for 25 epochs. Mini-batches of 200 documents were organized based on the reviews' segment and document lengths so that the amount of padding was minimized. We used 300-dimensional pre-trained word2vec embeddings. We tuned hyper-parameters on the validation sets of the document classification collections, resulting in the following configuration (unless otherwise noted).
Figure 5: System pipelines for HIERNET and MILNET, showing four distinct phases for sentiment analysis. [Diagram omitted.]

For the CNN segment encoder, we used window sizes of 3, 4 and 5 words with 100 feature maps per window size, resulting in 300-dimensional segment vectors. The GRU hidden vector dimensions for each direction were set to 50 and the attention vector dimensionality to 100. We used L2-normalization and dropout to regularize the softmax classifiers and additional dropout on the internal GRU connections. Real-valued polarity scores produced by the two models are mapped to discrete labels using two appropriate thresholds t_1, t_2 ∈ [−1, 1], so that a segment s is classified as negative if polarity(s) < t_1, neutral if t_1 ≤ polarity(s) ≤ t_2, and positive otherwise.
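The mapping from scores to discrete labels is then a simple threshold rule; the snippet below is an illustrative sketch of that rule (the names and example thresholds are our own, with t_1 and t_2 tuned on held-out data in practice):

```python
def discretize(score, t1, t2):
    """Map a real-valued polarity score in [-1, 1] to a discrete label
    using thresholds t1 <= t2 (illustrative sketch of the rule above)."""
    if score < t1:
        return 'negative'
    return 'neutral' if score <= t2 else 'positive'

print([discretize(s, -0.1, 0.1) for s in (-0.6, 0.05, 0.4)])
# ['negative', 'neutral', 'positive']
```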
6 Results

6.1 Segment Classification

Table 3 summarizes segment classification results on the four SPOT datasets. The first row reports the Majority baseline; the next two blocks report variants of HIERNET and MILNET that assign polarities via document-wide and segment-specific predictions respectively, as previously described. The final block shows the performance of SO-CAL and the Seg-CNN classifier.

Table 3: Segment classification results (in macro-averaged F1). † indicates that the system in question is significantly different from MILNETgt (approximate randomization test (Noreen, 1989), p < 0.05).

                  Yelp'13seg           IMDBseg
  Method          Sent      EDU        Sent      EDU
  Majority        19.02†    17.03†     18.32†    21.52†
  Document-level polarities:
  HIERNETavg      54.21†    50.90†     46.99†    49.02†
  HIERNET         55.33†    51.43†     48.47†    49.70†
  HIERNETgt       56.64†    58.75      62.12     57.38†
  MILNETavg       58.43†    48.63†     53.40†    51.81†
  MILNET          52.73†    53.59†     48.75†    47.18†
  MILNETgt        59.74†    59.47      61.83†    58.24†
  Segment-level polarities:
  MILNETavg       51.79†    46.77†     45.69†    38.37†
  MILNET          61.41     59.58      59.99†    57.71†
  MILNETgt        63.35     59.85      63.97     59.87
  SO-CAL          56.53†    58.16†     53.21†    60.40
  Seg-CNN         56.18†    59.96      58.32†    62.95†

When considering models that use document-level supervision, MILNET with gated, segment-specific polarities obtains the best classification performance across all four datasets. Interestingly, it performs comparably to Seg-CNN, the fully-supervised segment classifier, which provides additional evidence that MILNET can effectively identify segment polarity without the need for segment-level annotations. Our model also outperforms the strong SO-CAL baseline in all but one dataset, which is remarkable given the expert knowledge and linguistic information used to develop the latter. Document-level polarity predictions result in lower classification performance across the board. Differences between the standard hierarchical and multiple instance networks are less pronounced in this case, as MILNET loses the advantage of producing segment-specific sentiment predictions. Models without attention perform worse in most cases. The use of gated polarities benefits all model configurations, indicating the method's ability to selectively focus on segments with significant sentiment cues.

We further analyzed the polarities assigned by MILNET and HIERNET to positive, negative, and neutral segments. Figure 6 illustrates the distribution of polarity scores produced by the two models on the Yelp'13 dataset (sentence segmentation).

Figure 6: Distribution of predicted polarity scores across three classes (Yelp'13 sentences). [Plots omitted.]

In the case of negative and positive sentences, both models demonstrate appropriately skewed distributions. However, the neutral class appears to be particularly problematic for HIERNET, where polarity scores are scattered across a wide range of values. In contrast, MILNET is more successful at identifying neutral sentences, as its corresponding distribution has a single mode near zero. Attention gating addresses this issue by moving the polarity scores of sentiment-neutral segments towards zero. This is illustrated in Table 4, where we observe that gated variants of both models do a better job at identifying neutral segments. The effect is very significant for HIERNET, while MILNET benefits slightly and remains more effective overall. Similar trends were observed in all four SPOT datasets.

Table 4: F1 scores for neutral segments (Yelp'13).

                   Non-Gated   Gated
  Sent  HIERNET    4.67        36.60
        MILNET     39.61       44.60
  EDU   HIERNET    2.39        55.38
        MILNET     52.10       56.60
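The significance markers in Tables 3 and 4 rely on paired approximate randomization (Noreen, 1989). A small self-contained sketch of such a test over macro-averaged F1 is given below; it is our illustration of the technique, not the exact harness behind the reported numbers, and the example predictions are invented:

```python
import numpy as np

def macro_f1(gold, pred):
    scores = []
    for c in np.unique(gold):
        tp = np.sum((pred == c) & (gold == c))
        fp = np.sum((pred == c) & (gold != c))
        fn = np.sum((pred != c) & (gold == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(scores))

def approx_randomization(gold, pred_a, pred_b, trials=1000, seed=0):
    """Paired approximate randomization: repeatedly swap the two systems'
    predictions on a random subset of segments and count how often the
    shuffled F1 gap is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = abs(macro_f1(gold, pred_a) - macro_f1(gold, pred_b))
    hits = 0
    for _ in range(trials):
        swap = rng.random(gold.shape[0]) < 0.5
        a, b = pred_a.copy(), pred_b.copy()
        a[swap], b[swap] = pred_b[swap], pred_a[swap]
        hits += abs(macro_f1(gold, a) - macro_f1(gold, b)) >= observed
    return (hits + 1) / (trials + 1)       # p-value

gold  = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
sys_a = np.array([0, 1, 2, 0, 1, 1, 0, 1, 2, 0])   # hypothetical outputs
sys_b = np.array([0, 2, 2, 1, 1, 1, 2, 1, 0, 0])
print(approx_randomization(gold, sys_a, sys_b))
```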
Figure 7: Performance of HIERNETgt and MILNETgt for varying training sizes. [Plots omitted; the four panels (macro-F1 against training size) cover Yelp and IMDB with sentence and EDU segmentation, with Seg-CNN as a reference line.]

In order to examine the effect of training size, we trained multiple models using subsets of the original document collections. We trained on five random subsets for each training size, ranging from 100 documents to the full training set, and tested segment classification performance on SPOT. The results, averaged across trials, are presented in Figure 7. With the exception of the IMDB EDU-segmented dataset, MILNET only requires a few thousand training documents to outperform the supervised Seg-CNN. HIERNET follows a similar curve, but is inferior to MILNET. A reason for MILNET's inferior performance on the IMDB corpus (EDU-split) can be low-quality EDUs, due to the noisy and informal style of language used in IMDB reviews.

Finally, we compared MILNET against the GICF model (Kotzias et al., 2015) on their Yelp and IMDB sentence sentiment datasets.⁴ Their model requires sentence embeddings from a pre-trained neural model. We used the hierarchical CNN from their work (Denil et al., 2014) and, additionally, pre-trained HIERNET and MILNET sentence embeddings. The results in Table 5 show that MILNET outperforms all variants of GICF. Our models also seem to learn better sentence embeddings, as they improve GICF's performance on both collections.

Table 5: Accuracy scores on the sentence classification datasets introduced in Kotzias et al. (2015).

  Method     Yelp   IMDB
  GICF       86.3   86.0
  GICF_HN    92.9   86.5
  GICF_MN    93.2   91.0
  MILNET     94.0   91.9

⁴GICF only handles binary labels, which makes it unsuitable for the full-scale comparisons in Table 3. Here, we binarize our training datasets and use same-sized sentence embeddings for all four models (R^150 for Yelp, R^72 for IMDB).

6.2 Opinion Extraction

In our opinion extraction experiments, AMT workers (all native English speakers) were shown an original review and a set of extractive, bullet-style summaries produced by competing systems using a 30% compression rate. Participants were asked to decide which summary was best according to three criteria: Informativeness (Which summary best captures the salient points of the review?), Polarity (Which summary best highlights positive and negative comments?) and Coherence (Which summary is more coherent and easier to read?). Subjects were allowed to answer "Unsure" in cases where they could not discriminate between summaries. We used all reviews from our SPOT dataset and collected three responses per document. We ran four judgment elicitation studies: one comparing HIERNET and MILNET when summarizing reviews segmented as sentences; a second one comparing the two models with EDU segmentation; a third which compares EDU- and sentence-based summaries produced by MILNET; and a fourth where EDU-based summaries from MILNET were compared to a LEAD (the first N words from each document) and a RANDOM (random EDUs) baseline.

Table 6 summarizes our results, showing the proportion of participants that preferred each system.

Table 6: Human evaluation results (in percentages). † indicates that the system in question is significantly different from MILNET (sign-test, p < 0.01).

  Method        Informativeness   Polarity   Coherence
  HIERNETsent   43.7              33.6       43.5
  MILNETsent    45.7              36.7       44.6
  Unsure        10.7              29.6       11.8

  HIERNETedu    34.2†             28.0†      48.4
  MILNETedu     53.3              61.1       45.0
  Unsure        12.5              11.0       6.6

  MILNETsent    35.7†             33.4†      70.4†
  MILNETedu     55.0              51.5       23.7
  Unsure        9.3               15.2       5.9

  LEAD          34.0              19.0†      40.3
  RANDOM        22.9†             19.6†      17.8†
  MILNETedu     37.4              46.9       33.3
  Unsure        5.7               14.6       8.6
The first block in the table shows a slight preference for MILNET across criteria. The second block shows significant preference for MILNET against HIERNET on informativeness and polarity, whereas HIERNET was more often preferred in terms of coherence, although the difference is not statistically significant. The third block compares sentence and EDU summaries produced by MILNET. EDU summaries were perceived as significantly better in terms of informativeness and polarity, but not coherence. This is somewhat expected, as EDUs tend to produce more terse and telegraphic text and may seem unnatural due to segmentation errors. In the fourth block, we observe that participants find MILNET more informative and better at distilling polarity compared to the LEAD and RANDOM (EDUs) baselines. We should point out that the LEAD system is not a strawman; it has proved hard to outperform by more sophisticated methods (Nenkova, 2005), particularly on the newswire domain.

Example EDU- and sentence-based summaries produced by gated variants of HIERNET and MILNET are shown in Figure 8, with attention weights and polarity scores of the extracted segments shown in round and square brackets respectively. For both granularities, HIERNET's positive document-level prediction results in a single polarity score assigned to every segment, further adjusted using the corresponding attention weights. The extracted segments are informative, but fail to capture the negative sentiment of some segments. In contrast, MILNET is able to detect positive and negative snippets via individual segment polarities. Here, EDU segmentation produced a more concise summary with a clearer grouping of positive and negative snippets.

[Rating: ★★★★] As with any family-run hole in the wall, service can be slow. What the staff lacked in speed, they made up for in charm. The food was good, but nothing wowed me. I had the Pierogis while my friend had swedish meatballs. Both dishes were tasty, as were the sides. One thing that was disappointing was that the food was a a little cold (lukewarm). The restaurant itself is bright and clean. I will go back again when I feel like eating outside the box.

EDU-based, extracted via HIERNETgt:
(0.13) [+0.26] The food was good +
(0.10) [+0.26] but nothing wowed me. +
(0.09) [+0.26] The restaurant itself is bright and clean +
(0.13) [+0.26] Both dishes were tasty +
(0.18) [+0.26] I will go back again +

EDU-based, extracted via MILNETgt:
(0.16) [+0.12] The food was good +
(0.12) [+0.43] The restaurant itself is bright and clean +
(0.19) [+0.15] I will go back again +
(0.09) [–0.07] but nothing wowed me. −
(0.10) [–0.10] the food was a a little cold (lukewarm) −

Sentence-based, extracted via HIERNETgt:
(0.12) [+0.23] Both dishes were tasty, as were the sides +
(0.18) [+0.23] The food was good, but nothing wowed me +
(0.22) [+0.23] One thing that was disappointing was that the food was a a little cold (lukewarm) +

Sentence-based, extracted via MILNETgt:
(0.13) [+0.26] Both dishes were tasty, as were the sides +
(0.20) [+0.59] I will go back again when I feel like eating outside the box +
(0.18) [–0.12] The food was good, but nothing wowed me −

(number): attention weight; [number]: non-gated polarity score; text+: extracted positive opinion; text−: extracted negative opinion.

Figure 8: Example EDU- and sentence-based opinion summaries produced by HIERNETgt and MILNETgt.

7 Conclusions

In this work, we presented a neural network model for fine-grained sentiment analysis within the framework of multiple instance learning. Our model can be trained on large-scale sentiment classification datasets, without the need for segment-level labels. As a departure from the commonly used vector-based composition, our model first predicts sentiment at the sentence- or EDU-level and subsequently combines predictions up the document hierarchy. An attention-weighted polarity scoring technique provides a natural way to extract sentiment-heavy opinions. Experimental results demonstrate the superior performance of our model against more conventional neural architectures. Human evaluation studies also show that MILNET opinion extracts are preferred by participants and are effective at capturing informativeness and polarity, especially when using EDU segments. In the future, we would like to focus on multi-document, aspect-based extraction (Cao et al., 2017) and ways of improving the coherence of our summaries by taking into account more fine-grained discourse information (Daumé III and Marcu, 2002).
Acknowledgments

The authors gratefully acknowledge the support of the European Research Council (award number 681760). We thank TACL action editor Ani Nenkova and the anonymous reviewers whose feedback helped improve the present paper, as well as Charles Sutton, Timothy Hospedales, and members of Edinburgh NLP for helpful discussions and suggestions.

References

Stuart Andrews and Thomas Hofmann. 2004. Multiple instance learning via disjunctive programming boosting. In Advances in Neural Information Processing Systems 16, pages 65–72. Curran Associates, Inc.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the 5th Conference on International Language Resources and Evaluation, volume 10, pages 2200–2204, Valletta, Malta.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, California, USA.

Parminder Bhatia, Yangfeng Ji, and Jacob Eisenstein. 2015. Better document-level sentiment analysis from RST discourse parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2212–2218, Lisbon, Portugal.

Ziqiang Cao, Wenjie Li, Sujian Li, and Furu Wei. 2017. Improving multi-document summarization via text classification. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 3053–3058, San Francisco, California, USA.

Peter Carbonetto, Gyuri Dorkó, Cordelia Schmid, Hendrik Kück, and Nando de Freitas. 2008. Learning to recognize objects with little supervision. International Journal of Computer Vision, 77(1):219–237.

Giuseppe Carenini, Raymond Ng, and Adam Pauls. 2006. Multi-document summarization of evaluative text. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 305–312, Trento, Italy.

Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski. 2003. Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Current and New Directions in Discourse and Dialogue, pages 85–112. Springer.

Jianpeng Cheng and Mirella Lapata. 2016. Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 484–494, Berlin, Germany.

Timothee Cour, Ben Sapp, and Ben Taskar. 2011. Learning from partial labels. Journal of Machine Learning Research, 12(May):1501–1536.

Hal Daumé III and Daniel Marcu. 2002. A noisy-channel model for document compression. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 449–456, Philadelphia, Pennsylvania, USA.

Misha Denil, Alban Demiraj, and Nando de Freitas. 2014. Extraction of salient sentences from labelled documents. Technical report, University of Oxford.

Giuseppe Di Fabbrizio, Amanda Stent, and Robert Gaizauskas. 2014. A hybrid approach to multi-document summarization of opinions in reviews. In Proceedings of the 8th International Natural Language Generation Conference (INLG), pages 54–63, Philadelphia, Pennsylvania, USA.

Qiming Diao, Minghui Qiu, Chao-Yuan Wu, Alexander J. Smola, Jing Jiang, and Chong Wang. 2014. Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 193–202, New York, NY, USA.

Thomas G. Dietterich, Richard H. Lathrop, and Tomás Lozano-Pérez. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1):31–71.

Wei Vanessa Feng and Graeme Hirst. 2012. Text-level discourse parsing with rich linguistic features. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 60–68, Jeju Island, Korea.

Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 340–348, Beijing, China.

Shima Gerani, Yashar Mehdad, Giuseppe Carenini, Raymond T. Ng, and Bita Nejat. 2014. Abstractive summarization of product reviews using discourse structure. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1602–1613, Doha, Qatar.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 541–550, Portland, Oregon, USA.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177, Seattle, Washington, USA.

Rie Johnson and Tong Zhang. 2015a. Effective use of word order for text categorization with convolutional neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 103–112, Denver, Colorado, USA.

Rie Johnson and Tong Zhang. 2015b. Semi-supervised convolutional neural networks for text categorization via region embedding. In Advances in Neural Information Processing Systems 28, pages 919–927. Curran Associates, Inc.

Jim Keeler and David E. Rumelhart. 1992. A self-organizing integrated segmentation and recognition neural net. In Advances in Neural Information Processing Systems 4, pages 496–503. Morgan-Kaufmann.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1746–1751, Doha, Qatar.

Dimitrios Kotzias, Misha Denil, Nando De Freitas, and Padhraic Smyth. 2015. From group to individual labels using deep features. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 597–606, Sydney, Australia.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, pages 1188–1196, Beijing, China.

Kevin Lerman, Sasha Blair-Goldensohn, and Ryan McDonald. 2009. Sentiment summarization: Evaluating and learning user preferences. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 514–522, Athens, Greece.

Junyi Jessy Li, Kapil Thadani, and Amanda Stent. 2016. The role of discourse units in near-extractive summarization. In Proceedings of the SIGDIAL 2016 Conference, The 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 137–147, Los Angeles, California, USA.

William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text - Interdisciplinary Journal for the Study of Discourse, 8(3):243–281.

Oded Maron and Aparna Lakshmi Ratan. 1998. Multiple-instance learning for natural scene classification. In Proceedings of the 15th International Conference on Machine Learning, volume 98, pages 341–349, San Francisco, California, USA.

Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 3075–3081, San Francisco, California.

Ani Nenkova. 2005. Automatic text summarization of newswire: Lessons learned from the document understanding conference. In Proceedings of the 20th AAAI, pages 1436–1441, Pittsburgh, Pennsylvania, USA.

Eric Noreen. 1989. Computer-intensive Methods for Testing Hypotheses: An Introduction. Wiley.

Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 115–124. Association for Computational Linguistics.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages 79–86, Pittsburgh, Pennsylvania, USA.

Nikolaos Pappas and Andrei Popescu-Belis. 2014. Explaining the stars: Weighted multiple-instance learning for aspect-based sentiment analysis. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 455–466, Doha, Qatar.

Nikolaos Pappas and Andrei Popescu-Belis. 2017. Explicit document modeling through weighted multiple-instance learning. Journal of Artificial Intelligence Research, 58:591–626.

Lizhen Qu, Georgiana Ifrim, and Gerhard Weikum. 2010. The bag-of-opinions method for review rating prediction from sparse text patterns. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 913–921, Beijing, China.

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 151–161, Edinburgh, Scotland, UK.
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA.

Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

Oscar Täckström and Ryan McDonald. 2011. Discovering fine-grained sentiment with latent variable structured prediction models. In Proceedings of the 39th European Conference on Information Retrieval, pages 368–374, Aberdeen, Scotland, UK.

Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1422–1432, Lisbon, Portugal.

Peter D. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 417–424, Pittsburgh, Pennsylvania, USA.

Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, pages 90–94, Jeju Island, Korea.

Xiu-Shen Wei, Jianxin Wu, and Zhi-Hua Zhou. 2014. Scalable multi-instance learning. In Proceedings of the IEEE International Conference on Data Mining, pages 1037–1042, Shenzhen, China.

Nils Weidmann, Eibe Frank, and Bernhard Pfahringer. 2003. A two-level learning method for generalized multi-instance problems. In Proceedings of the 14th European Conference on Machine Learning, pages 468–479, Dubrovnik, Croatia.

Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2):165–210.

Jiajun Wu, Yinan Yu, Chang Huang, and Kai Yu. 2015. Deep multiple instance learning for image classification and auto-annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3460–3469, Boston, Massachusetts, USA.

Rui Xia and Chengqing Zong. 2010. Exploring the use of word relation features for sentiment classification. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1336–1344, Beijing, China.

Xin Xu and Eibe Frank. 2004. Logistic regression and boosting for labeled bags of instances. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 272–281. Springer-Verlag.

Wei Xu, Alan Ritter, Chris Callison-Burch, William B. Dolan, and Yangfeng Ji. 2014. Extracting lexically divergent paraphrases from Twitter. Transactions of the Association for Computational Linguistics, 2:435–448.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California, USA.

Matthew D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. CoRR, abs/1212.5701.

Qi Zhang, Sally A. Goldman, Wei Yu, and Jason E. Fritts. 2002. Content-based image retrieval using multiple-instance learning. In Proceedings of the 19th International Conference on Machine Learning, volume 2, pages 682–689, Sydney, Australia.

Cha Zhang, John C. Platt, and Paul A. Viola. 2006. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems 18, pages 1417–1424. MIT Press.

Zhi-Hua Zhou, Yu-Yin Sun, and Yu-Feng Li. 2009. Multi-instance learning by treating instances as non-i.i.d. samples. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1249–1256, Montréal, Quebec.