Transactions of the Association for Computational Linguistics, 2 (2014) 155–168. Action Editor: Janyce Wiebe.

Submitted 6/2013; Revised 11/2013; Published 4/2014. © 2014 Association for Computational Linguistics.

Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM

Lizhen Qu, Max Planck Institute for Informatics, lqu@mpi-inf.mpg.de
Yi Zhang, Nuance Communications, yi.zhang@nuance.com
Rui Wang, DFKI GmbH, mars198356@hotmail.com
Lili Jiang, Max Planck Institute for Informatics, ljiang@mpi-inf.mpg.de
Rainer Gemulla, Max Planck Institute for Informatics, rgemulla@mpi-inf.mpg.de
Gerhard Weikum, Max Planck Institute for Informatics, weikum@mpi-inf.mpg.de

Abstract

Extracting instances of sentiment-oriented relations from user-generated web documents is important for online marketing analysis. Unlike previous work, we formulate this extraction task as a structured prediction problem and design the corresponding inference as an integer linear program. Our latent structural SVM based model can learn from training corpora that do not contain explicit annotations of sentiment-bearing expressions, and it can simultaneously recognize instances of both binary (polarity) and ternary (comparative) relations with regard to entity mentions of interest. The empirical evaluation shows that our approach significantly outperforms state-of-the-art systems across domains (cameras and movies) and across genres (reviews and forum posts). The gold standard corpus that we built will also be a valuable resource for the community.

1 Introduction

Sentiment-oriented relation extraction (Choi et al., 2006) is concerned with recognizing sentiment polarities and comparative relations between entities from natural language text. Identifying such relations often requires syntactic and semantic analysis at both sentence and phrase level. Most prior work on sentiment analysis considers either i) subjective sentence detection (Yu and Kübler, 2011), ii) polarity classification (Johansson and Moschitti, 2011; Wilson et al., 2005), or iii) comparative relation identification (Jindal and Liu, 2006; Ganapathibhotla and Liu, 2008). In practice, however, different types of sentiment-oriented relations frequently coexist in documents. In particular, we found that more than 38% of the sentences in our test corpus contain more than one type of relation. The isolated analysis approach is inappropriate because i) it sacrifices accuracy by ignoring the intricate interplay among different types of relations; ii) it could lead to conflicting predictions such as estimating a relation candidate as both negative and comparative. Therefore, in this paper, we identify instances of both sentiment polarities and comparative relations for entities of interest simultaneously.

We assume that all the mentions of entities and attributes are given, and entities are disambiguated. It is a widely used assumption when evaluating a module in a pipeline system that the outputs of preceding modules are error-free.

To the best of our knowledge, the only existing system capable of extracting both comparisons and sentiment polarities is a rule-based system proposed by Ding et al. (2009). We argue that it is better to tackle the task by using a unified model with structured outputs. It allows us to consider a set of correlated relation instances jointly and characterize their interaction through a set of soft and hard constraints. For example, we can encode constraints to discourage an attribute from participating in a polarity relation and a comparative relation at the same time. As a result, the system extracts a set of correlated instances of sentiment-oriented relations from a given sentence. For example, with the sentence about the camera Canon 7D, “The sensor is great, but the price is higher than Nikon D7000.” the expected output is positive(Canon 7D, sensor)

Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00173/1566834/tacl_a_00173.pdf by guest on 09 September 2023

and preferred(Nikon D7000, Canon 7D, price).

However, constructing a fully annotated training corpus for this task is labor-intensive and requires strong linguistic background. We minimize this overhead by applying a simplified annotation scheme, in which annotators mark mentions of entities and attributes, disambiguate the entities, and label instances of relations for each sentence. Based on the new scheme, we have created a small Sentiment Relation Graph (SRG) corpus for the domains of cameras and movies, which significantly differs from the corpora used in prior work (Wei and Gulla, 2010; Kessler et al., 2010; Toprak et al., 2010; Wiebe et al., 2005; Hu and Liu, 2004) in the following ways: i) both sentiment polarities and comparative relations are annotated; ii) all mentioned entities are disambiguated; and iii) no subjective expressions are annotated, unless they are part of entity mentions.

The new annotation scheme raises a new challenge for learning algorithms in that they need to automatically find textual evidences for each annotated relation during training. For example, with the sentence “I like the Rebel a little better, but that is another price jump”, simply assigning a sentiment-bearing expression to the nearest relation candidate is insufficient, especially when the sentiment is not explicitly expressed.

In this paper, we propose SENTI-LSSVM, a latent structural SVM based model for sentiment-oriented relation extraction. SENTI-LSSVM is applied to find the most likely set of the relation instances expressed in a given sentence, where the latent variables are used to assign the most appropriate textual evidences to the respective instances.

In summary, the contributions of this paper are the following:

• We propose SENTI-LSSVM: the first unified statistical model with the capability of extracting instances of both binary and ternary sentiment-oriented relations.
• We design a task-specific integer linear programming (ILP) formulation for inference.
• We construct a new SRG corpus as a valuable asset for the evaluation of sentiment relation extraction.
• We conduct extensive experiments with online reviews and forum posts, showing that the SENTI-LSSVM model can effectively learn from a training corpus without explicitly annotated subjective expressions and that it significantly outperforms state-of-the-art systems.

2 Related Work

There are ample works on analyzing sentiment polarities and entity comparisons, but the majority of them studied the two tasks in isolation.

Most prior approaches for fine-grained sentiment analysis focus on polarity classification. Supervised approaches on expression-level analysis require the annotation of sentiment-bearing expressions as training data (Jin et al., 2009; Choi and Cardie, 2010; Johansson and Moschitti, 2011; Yessenalina and Cardie, 2011; Wei and Gulla, 2010). However, the corresponding annotation process is time-consuming. Although sentence-level annotations are easier to obtain, the analysis at this level cannot cope with sentences conveying relations of multiple types (McDonald et al., 2007; Täckström and McDonald, 2011; Socher et al., 2012). Lexicon-based approaches require no training data (Ku et al., 2006; Kim and Hovy, 2006; Godbole et al., 2007; Ding et al., 2008; Popescu and Etzioni, 2005; Liu et al., 2005) but suffer from inferior performance (Wilson et al., 2005; Qu et al., 2012). In contrast, our method requires no annotation of sentiment-bearing expressions for training and can predict both sentiment polarities and comparative relations.

Sentiment-oriented comparative relations have been studied in the context of user-generated discourse (Jindal and Liu, 2006; Ganapathibhotla and Liu, 2008). Approaches rely on linguistically motivated rules and assume the existence of independent keywords in sentences which indicate comparative relations. Therefore, these methods fall short of extracting comparative relations based on domain-dependent information.

Both Johansson and Moschitti (2011) and Wu et al. (2011) formulate fine-grained sentiment analysis as a learning problem with structured outputs. However, they focus only on polarity classification


of expressions and require annotation of sentiment-bearing expressions for training as well.

While ILP has been previously applied for inference in sentiment analysis (Choi and Cardie, 2009; Somasundaran and Wiebe, 2009; Wu et al., 2011), our task requires a complete ILP reformulation due to 1) the absence of annotated sentiment expressions and 2) the constraints imposed by the joint extraction of both sentiment polarity and comparative relations.

3 System Overview

This section gives an overview of the whole system for extracting sentiment-oriented relation instances. Prior to presenting the system architecture, we introduce the essential concepts and the definitions of two kinds of directed hypergraphs as the representation of correlated relation instances extracted from sentences.

3.1 Concepts and Definitions

Entity. An entity is an abstract or concrete thing, which need not be of material existence. An entity in this paper refers to either a product or a brand.

Attribute. An attribute is an object closely associated with or belonging to an entity, such as the lens of a digital camera.

Sentiment-Oriented Relation. A sentiment-oriented relation is either a sentiment polarity or a comparative relation, defined on tuples of entities and attributes. A sentiment polarity relation conveys either a positive or a negative attitude towards entities or their attributes, whereas a comparative relation indicates the preference of one entity over the other entity w.r.t. an attribute.

Relation Instance. An instance of sentiment polarity takes the form r(entity, attribute) with r ∈ {positive, negative}, such as positive(Canon 7D, sensor). The polarity instances expressed in the form of unary relations, such as “Nikon D7000 is excellent.”, are denoted as binary relations r(entity, whole), where the attribute whole indicates the entity as a whole. In contrast, an instance of comparative relation is in the form of preferred(entity, entity, attribute), e.g. preferred(Canon 7D, Nikon D7000, price). For brevity, we refer to an instance set of sentiment-oriented relations extracted from a sentence as an sSoR. The instances of the remaining relations are represented as other(entity, attribute), such as partOf(wheel, car). These relations include objective relations and the subjective relations other than sentiment-oriented relations.

Mention-Based Relation Instances. A mention-based relation instance refers to a tuple of entity mentions with a certain relation. This concept is introduced as the representation of instances in a sentence by replacing entities with the corresponding entity mentions, such as positive(“Canon SD880i”, “wide angle view”).

Figure 1: An example of MRG.

Mention-Based Relation Graph. A mention-based relation graph (or MRG) represents a collection of mention-based relation instances expressed in a sentence. As illustrated in Figure 1, an MRG is a directed hypergraph $G = \langle M, E \rangle$ with a vertex set $M$ and an edge set $E$. A vertex $m_i \in M$ denotes a mention of an entity or an attribute occurring either within the sentence or in its context. We say that a mention is from the context if it is mentioned in the previous sentence or is an attribute implied in the current sentence. An instance of a binary relation in an MRG takes the form of a binary edge $e_l = (m_i, m_a)$, where $m_i$ and $m_a$ denote an entity mention and an attribute mention respectively, and the type $l \in \{\text{positive}, \text{negative}, \text{other}\}$. A ternary edge $e_l$ indicating a comparative relation is represented as $e_l = (m_i, m_j, m_a)$, where two entity mentions $m_i$ and $m_j$ are compared with respect to the attribute mention $m_a$. We define the type $l \in \{\text{better}, \text{worse}\}$ to indicate the two possible directions of the relation and assume $m_i$ occurs before $m_j$. As a result, we have a set $L$ of five relation types: positive, negative, better, worse or other. According to these definitions, the annotations in the SRG corpus are actually MRGs and disambiguated entities. If there are multiple mentions referring to the same entity, annotators are asked to choose the


most obvious one because it saves annotation time and is less demanding for the entity recognition and disambiguation modules.

Figure 2: An example of eMRG. The textual evidences are wrapped by green dashed boxes.

Evidentiary Mention-Based Relation Graph. An evidentiary mention-based relation graph, coined eMRG, extends an MRG by associating each edge with a textual evidence to support the corresponding relation assertions (see Figure 2). Consequently, an edge in an eMRG is denoted by a pair $(a, c)$, where $a$ represents a mention-based relation instance and $c$ is the associated textual evidence. It is also referred to as an evidentiary edge.

3.2 System Architecture

Figure 3: System architecture.

As illustrated by Figure 3, at the core of our system is the SENTI-LSSVM model, which extracts sets of mention-based relations in the form of eMRGs from sentences. For a given sentence with known entity mentions, we select all possible mention sets as relation candidates, where each set includes at least one entity mention. Then we associate each relation candidate with a set of constituents or the whole sentence as the textual evidence candidates (cf. Section 6.1). Subsequently, the inference component aims to find the most likely eMRG from all possible combinations of mention-based relation instances and their textual evidences (cf. Section 6.2). The representation eMRG is chosen because it characterizes exactly the model outputs by letting each edge correspond to an instance of mention-based relation and the associated textual evidence. Finally, the model parameters are learned by an online algorithm (cf. Section 7).

Since instance sets of sentiment-oriented relations (sSoRs) are the expected outputs, we can obtain sSoRs from MRGs by using a simple rule-based algorithm. The algorithm essentially maps the mentions from an MRG into entities and attributes in an sSoR and labels the corresponding tuples with the relation types of the edges from an MRG. For instances of comparative relation, the label better or worse is mapped to the relation type preferred.

4 SENTI-LSSVM Model

The task of sentiment-oriented relation extraction is to determine the most likely sSoR in a sentence. Since sSoRs are derived from the corresponding MRGs as described in Section 3, the task is reduced to finding the most likely MRG for each sentence. Since an MRG is created by assigning relation types to a subset of all relation candidates, which are possible tuples of mentions with unknown relation types, the number of MRGs can be extremely high.

To tackle the task, one solution is to employ an edge-factored linear model in the framework of structural SVM (Martins et al., 2009; Tsochantaridis et al., 2004). The model suggests that a bag of features should be specified for each relation candidate, and then the model predicts the most likely candidate sets along with their relation types to form the optimal MRGs. As we observed, for a relation candidate, the most informative features are the words near its entity mentions in the original text. However, if we represent a candidate by all these words, it is very likely that the instances of different relation types share overly similar features, because a mention is often involved in more than one relation candidate, as shown in Figure 2. As a consequence, the instances of different relations represented by overly similar features can easily confuse the learning algorithm. Thus, it is critical to select proper constituents or sentences as textual evidences for each relation candidate in both training and testing.

Consequently, we divide the task of sentiment-oriented relation extraction into two subtasks: i) identifying the most likely MRGs; ii) assigning proper textual evidences to each edge of MRGs to support their relation assertions. It is desirable to carry out the two subtasks jointly as they could enhance each other. First, the identification of relation types requires proper textual evidences; second, the soft and hard constraints imposed by the correlated relation instances facilitate the recognition of the corresponding textual evidences. Since the eMRGs are created by attaching every MRG with a set of textual evidences, tackling the two subtasks simultaneously is equivalent to selecting the most likely eMRG from a set of eMRG candidates. It is challenging because our SRG corpus does not contain any annotation of textual evidences.

Formally, let $\mathcal{X}$ denote the set of all available sentences, and we define $y \in \mathcal{Y}(x)$ ($x \in \mathcal{X}$) as the set of labeled edges of an MRG and $\mathcal{Y} = \cup_{x \in \mathcal{X}} \mathcal{Y}(x)$. Since the assignments of textual evidences are not observed, an assignment of evidences to $y$ is denoted by a latent variable $h \in \mathcal{H}(x)$ and $\mathcal{H} = \cup_{x \in \mathcal{X}} \mathcal{H}(x)$. Then $(y, h)$ corresponds to an eMRG, and $(a, c) \in (y, h)$ is a labeled edge $a$ attached with a textual evidence $c$. Given a labeled dataset $D = \{(x_1, y_1), \ldots, (x_n, y_n)\} \in (\mathcal{X} \times \mathcal{Y})^n$, we aim to learn a discriminant function $f : \mathcal{X} \to \mathcal{Y} \times \mathcal{H}$ that outputs the optimal eMRG $(y, h) \in \mathcal{Y}(x) \times \mathcal{H}(x)$ for a given sentence $x$.

Due to the introduction of latent variables, we adopt the latent structural SVM (Yu and Joachims, 2009) for structural classification. Our discriminant function is defined as

$$f(x) = \operatorname*{argmax}_{(y,h) \in \mathcal{Y}(x) \times \mathcal{H}(x)} \beta^\top \Phi(x, y, h) \qquad (1)$$

where $\Phi(x, y, h)$ is the feature function of an eMRG $(y, h)$ and $\beta$ is the corresponding weight vector.

To ensure tractability, we also employ edge-based factorization for our model. Let $M_p$ denote a set of entity mentions and $y_r(m_i)$ be a set of edges labeled with sentiment-oriented relations incident to $m_i$; the factorization of $\Phi(x, y, h)$ is given as

$$\Phi(x, y, h) = \sum_{(a,c) \in (y,h)} \Phi_e(x, a, c) + \sum_{m_i \in M_p} \; \sum_{a, a' \in y_r(m_i),\, a \neq a'} \Phi_c(a, a') \qquad (2)$$

where $\Phi_e(x, a, c)$ is a local edge feature function for a labeled edge $a$ attached with a textual evidence $c$ and $\Phi_c(a, a')$ is a feature function capturing the co-occurrence of two labeled edges $a_{m_i}$ and $a'_{m_i}$ incident to an entity mention $m_i$.

5 Feature Space

The following features are used in the feature functions (Equation 2):

Unigrams: As mentioned before, a textual evidence attached to an edge in an MRG is either a word, phrase or sentence. We consider all lemmatized unigrams in the textual evidence as unigram features.

Context: Since web users usually express related sentiments about the same entity across sentence boundaries, we describe the sentiment flow using a set of contextual binary features. For example, if entity A is mentioned in both the previous sentence and the current sentence, a set of contextual binary features is used to indicate all possible combinations of the current and the previously mentioned sentiment-oriented relations regarding entity A.

Co-occurrence: We have mentioned the co-occurrence feature in Equation 2, indicated by $\Phi_c(a, a')$. It captures the co-occurrence of two labeled edges incident to the same entity mention. Note that the co-occurrence feature function is considered only if there is a contrast conjunction such as “but” between the non-shared entity mentions incident to the two labeled edges.

Senti-predictors: Following the idea of Qu et al. (2012), we encode the prediction results from the rule-based phrase-level multi-relation predictor (Ding et al., 2009) and from the bag-of-opinions predictor (Qu et al., 2010) as features based on the textual evidence. The output of the first predictor is an integer value, while the output of the second predictor is a sentiment relation, such as “positive”,


“negative”, “better” or “worse”. We map the relational outputs into integer values and then encode the outputs from both predictors as senti-predictor features.

Others: The commonly used part-of-speech tags are also included as features. Moreover, for an edge candidate, a set of binary features is used to denote the types of the edge and its entity mentions. For instance, a binary feature indicates whether an edge is a binary edge related to an entity mentioned in context. To characterize the syntactic dependencies between two adjacent entity mentions, we use the path in the dependency tree between the heads of the corresponding constituents, the number of words and other mentions in-between as features. Additionally, if the textual evidence is a constituent, its feature w.r.t. an edge is the dependency path to the closest mention of the edge that does not overlap with this constituent.

6 Structural Inference

In order to find the best eMRG for a given sentence with a well-trained model, we need to determine the most likely relation type for each relation candidate and support the corresponding assertions with proper textual evidences. We formulate this task as an Integer Linear Program (ILP). Instead of considering all constituents of a sentence, we empirically select a subset as textual evidences for each relation candidate.

6.1 Textual Evidence Candidates Selection

Textual evidences are selected based on the constituent trees of sentences parsed by the Stanford parser (Klein and Manning, 2003). For each mention in a sentence, we first locate a constituent in the tree with the maximal overlap by Jaccard similarity. Starting from this constituent, we consider two types of candidates: type I candidates are constituents at the highest level which contain neither any word of another mention nor any contrast conjunctions such as “but”; type II candidates are constituents at the highest level which cover exactly two mentions of an edge and do not overlap with any other mentions. For a binary edge connecting an entity mention and an attribute mention, we consider a type I candidate starting from the attribute mention. For a binary edge connecting two entity mentions, we consider type I candidates starting from both mentions. Moreover, for a comparative ternary edge, we consider both type I and type II candidates starting from the attribute mention. This strategy is based on our observation that these candidates often cover the most important information w.r.t. the covered entity mentions.

6.2 ILP Formulation

We formulate the inference problem of finding the best eMRG as an ILP problem due to its convenient integration of both soft and hard constraints. Given the model parameters $\beta$, we reformulate the score of an eMRG in the discriminant function (1) as follows,

$$\beta^\top \Phi(x, y, h) = \sum_{(a,c) \in (y,h)} s_{ac} z_{ac} + \sum_{m_i \in M_p} \; \sum_{a, a' \in y_r(m_i),\, a \neq a'} s_{aa'} z_{aa'}$$

where $s_{ac} = \beta^\top \Phi_e(x, a, c)$ denotes the score of a labeled edge $a$ attached with a textual evidence $c$, $s_{aa'} = \beta^\top \Phi_c(a, a')$ is the edge co-occurrence score, the binary variable $z_{ac}$ indicates the presence or absence of the corresponding edge, and $z_{aa'}$ indicates if two edges co-occur. As not every edge set can form an eMRG, we require that a valid eMRG should satisfy a set of linear constraints, which form our constraint space. Then function (1) is equivalent to

$$\max_{z \in \mathcal{B}} \; s^\top z + \mu z^{d} \quad \text{s.t.} \quad A \begin{bmatrix} z \\ \eta \\ \tau \end{bmatrix} \le d, \qquad z, \eta, \tau \in \mathcal{B}$$

where $\mathcal{B} = 2^{S}$ with $S = \{0, 1\}$, and $\eta$ and $\tau$ are auxiliary binary variables that help define the constraint space. The above optimization problem takes exactly the form of an ILP because both the constraints and the objective function are linear, and all variables take only integer values.

In the following, we consider two types of constraint space: 1) an eMRG with only binary edges and 2) an eMRG with both binary and ternary edges.
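
As an illustration of the ILP form above, the following minimal sketch maximizes a linear score over binary indicator variables subject to linear inequality constraints by exhaustive enumeration. It is not the solver used in the paper: the function name `solve_tiny_ilp`, the toy scores, and the single constraint row are assumptions made purely for illustration, and a practical system would pass the same `s`, `A`, and `d` to an off-the-shelf ILP solver.

```python
from itertools import product


def solve_tiny_ilp(s, A, d):
    """Maximize s^T z over z in {0,1}^n subject to A z <= d by brute force.

    Hypothetical helper for illustration only; exhaustive search is
    exponential in n and merely stands in for a real ILP solver.
    """
    best_z, best_score = None, float("-inf")
    for z in product((0, 1), repeat=len(s)):
        # Keep z only if every linear constraint row is satisfied.
        feasible = all(
            sum(a_j * z_j for a_j, z_j in zip(row, z)) <= bound
            for row, bound in zip(A, d)
        )
        if not feasible:
            continue
        score = sum(s_j * z_j for s_j, z_j in zip(s, z))
        if score > best_score:
            best_z, best_score = z, score
    return best_z, best_score


# Toy instance with made-up numbers: z[0] and z[1] are two evidence choices
# for the same edge, so at most one may be active; z[2] is an unrelated
# evidentiary edge with a negative score.
scores = [2.0, 1.5, -0.5]
A = [[1, 1, 0]]
d = [1]
best_z, best_score = solve_tiny_ilp(scores, A, d)
```

In this toy instance the single constraint row plays the role of an "at most one textual evidence per edge" restriction, so only the higher-scoring of the two competing indicators is switched on.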


eMRG with only Binary Edges: An eMRG has only binary edges if a sentence contains no attribute mention or at most one entity mention. We expect that each edge has only one relation type and is supported by a single textual evidence. To facilitate the formulation of constraints, we introduce $\eta_{el}$ to denote the presence or absence of a labeled edge $e_l$, and $\eta_{ec}$ to indicate if a textual evidence $c$ is assigned to an unlabeled edge $e$. Then the binary variable for the corresponding evidentiary edge is $z_{e_l c} = \eta_{ec} \wedge \eta_{el}$, where the ILP formulation of conjunction can be found in (Martins et al., 2009).

Let $C_e$ denote the set of textual evidence candidates of an unlabeled edge $e$. The constraint of at most one textual evidence per edge is formulated as:

$$\sum_{c \in C_e} \eta_{ec} \le 1 \qquad (3)$$

Once a textual evidence is assigned to an edge, their relation labels should match and the number of labeled edges must agree with the number of attached textual evidences. Further, we assume that a textual evidence $c$ conveys at most one relation so that an evidence will not be assigned to relations of different types, which is the main problem for the structural SVM based model. Let $\eta_{cl}$ indicate that the textual evidence $c$ is labeled by the relation type $l$. The corresponding constraints are expressed as,

$$\sum_{l \in L_e} \eta_{el} = \sum_{c \in C_e} \eta_{ec}; \qquad z_{e_l c} \le \eta_{cl}; \qquad \sum_{l \in L} \eta_{cl} \le 1$$

where $L_e$ denotes the set of all possible labels for an unlabeled edge $e$, and $L$ is the set of all relation types of MRGs (cf. Section 3).

In order to avoid a textual evidence being overly reused by multiple relation candidates, we first penalize the assignment of a textual evidence $c$ to a labeled edge $a$ by associating the corresponding $z_{ac}$ with a fixed negative cost $-\mu$ in the objective function. Then the selection of one textual evidence per edge $a$ is encouraged by associating $\mu$ to $z^{d}_{c}$ in the objective function, where $z^{d}_{c} = \bigvee_{e \in S_c} \eta_{ec}$ and $S_c$ is the set of edges for which the textual evidence $c$ serves as a candidate. The disjunction $z^{d}_{c}$ is expressed as:

$$z^{d}_{c} \ge \eta_{ec}, \; e \in S_c; \qquad z^{d}_{c} \le \sum_{e \in S_c} \eta_{ec}$$

Figure 4: Alternative structures associated with an attribute mention. (a) Binary edge structure. (b) Ternary edge structure.

This soft constraint not only encourages one textual evidence per edge, but also keeps it eligible for multiple assignments. For any two labeled edges $a$ and $a'$ incident to the same entity mention, the edge-to-edge co-occurrence is described by $z^{c}_{a,a'} = z_a \wedge z_{a'}$.

eMRG with both Binary and Ternary Edges: If there is more than one entity mention and at least one attribute mention in a sentence, an eMRG can potentially have both binary and ternary edges. In this case, we assume that each mention of attributes can participate either in binary relations or in ternary relations. The assumption holds in more than 99.9% of the sentences in our SRG corpus, thus we describe it as a set of hard constraints. Geometrically, the assumption can be visualized as the selection between two alternative structures incident to the same attribute mention, as shown in Figure 4. Note that, in the binary edge structure, we include not only the edges incident to the attribute mention but also the edge between the two entity mentions.

Let $S^{b}_{m_i}$ be the set of all possible labeled edges in a binary edge structure of an attribute mention $m_i$. Variable $\tau^{b}_{m_i} = \bigvee_{e_l \in S^{b}_{m_i}} \eta_{el}$ indicates whether the attribute mention is associated with a binary edge structure or not. In the same manner, we use $\tau^{t}_{m_i} = \bigvee_{e_l \in S^{t}_{m_i}} \eta_{el}$ to indicate the association of the attribute mention $m_i$ with a ternary edge structure from the set of all incident ternary edges $S^{t}_{m_i}$. The selection between two alternative structures is


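formulated as $\tau^{b}_{m_i} + \tau^{t}_{m_i} = 1$. As this influences only the edges incident to an attribute mention, we keep all the constraints introduced in the previous section unchanged except for constraint (3), which is modified as

$$\sum_{c \in C_e} \eta_{ec} \le \tau^{b}_{m_i}; \qquad \sum_{c \in C_e} \eta_{ec} \le \tau^{t}_{m_i}$$

Therefore, we can have either binary edges or ternary edges for an attribute mention.

7 Learning Model Parameters

Given a set of training sentences $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, the best weight vector $\beta$ of the discriminant function (1) is found by solving the following optimization problem:

$$\min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \Big[ \max_{(\hat{y}, \hat{h}) \in \mathcal{Y}(x_i) \times \mathcal{H}(x_i)} \big( \beta^\top \Phi(x_i, \hat{y}, \hat{h}) + \delta(\hat{h}, \hat{y}, y_i) \big) - \max_{\bar{h} \in \mathcal{H}(x_i)} \beta^\top \Phi(x_i, y_i, \bar{h}) \Big] + \rho |\beta| \qquad (4)$$

where $\delta(\hat{h}, \hat{y}, y)$ is a loss function measuring the discrepancies between an eMRG $(y, \bar{h})$ with gold standard edge labels $y$ and an eMRG $(\hat{y}, \hat{h})$ with inferred labeled edges $\hat{y}$ and textual evidences $\hat{h}$. Due to the sparse nature of the lexical features, we apply an L1 regularizer to the weight vector $\beta$, and the degree of sparsity is controlled by the hyperparameter $\rho$.

Since the L1 norm in the above optimization problem is not differentiable at zero, we apply the online forward-backward splitting (FOBOS) algorithm (Duchi and Singer, 2009). It requires two steps for updating the weight vector $\beta$ by using a single training sentence $x$ on each iteration $t$:

$$\beta_{t+\frac{1}{2}} = \beta_t - \varepsilon_t \Delta_t$$

$$\beta_{t+1} = \operatorname*{argmin}_{\beta} \; \frac{1}{2} \lVert \beta - \beta_{t+\frac{1}{2}} \rVert^2 + \varepsilon_t \rho |\beta|$$

where $\Delta_t$ is the subgradient computed without considering the L1 norm and $\varepsilon_t$ is the learning rate. For a labeled sentence $x$, $\Delta_t = \Phi(x, \hat{y}^*, \hat{h}^*) - \Phi(x, y, \bar{h}^*)$, where the feature functions of the corresponding eMRGs are inferred by solving $(\hat{y}^*, \hat{h}^*) = \operatorname*{argmax}_{(\hat{h}, \hat{y}) \in \mathcal{H}(x) \times \mathcal{Y}(x)} [\beta^\top \Phi(x, \hat{y}, \hat{h}) + \delta(\hat{h}, \hat{y}, y)]$ and $(y, \bar{h}^*) = \operatorname*{argmax}_{\bar{h} \in \mathcal{H}(x)} \beta^\top \Phi(x, y, \bar{h})$, as indicated in the optimization problem (4).

The former inference problem is similar to the one we considered in the previous section except for the inclusion of the loss function. We incorporate the loss function into the ILP formulation by defining the loss between an MRG $(y, h)$ and a gold standard MRG as the sum of per-edge costs. In our experiments, we consider a positive cost $\phi$ for each wrongly labeled edge $a$, so that if an edge $a$ has a different label from the gold standard, we add $\phi$ to the coefficient $s_{ac}$ of the corresponding variable $z_{ac}$ in the objective function of the ILP formulation. In addition, since the non-positive weights of edge labels in the initial learning phase often lead to eMRGs with many unlabeled edges, which harms the learning performance, we fix this by adding a constraint for the minimal number of labeled edges in an eMRG,

$$\sum_{a \in A} \sum_{c \in C_a} \eta_{ac} \ge \zeta \qquad (5)$$

where $A$ is the set of all labeled edge candidates and $\zeta$ denotes the minimal number of labeled edges. Empirically, the best way to determine $\zeta$ is to make it equal to the maximal number of labeled edges in an eMRG with the restriction that a textual evidence can be assigned to at most one edge. By considering all the edge candidates $A$ and all the textual evidence candidates $C$ as two vertex sets in a bipartite graph $\hat{G} = \langle V = (A, C), E \rangle$ (with edges in $E$ indicating which textual evidence can be assigned to which edge), $\zeta$ corresponds to exactly the size of a maximum matching of the bipartite graph.¹

To find the optimal eMRG $(y, \bar{h}^*)$, for the gold label $k$ of each edge, we consider the following set of constraints for inference since the labels of the edges are known for the training data,

$$\sum_{c \in C_e} \eta_{ec} \le 1; \qquad \eta_{ec} \le l_{ck}; \qquad \sum_{k' \in L} l_{ck'} \le 1; \qquad \sum_{e \in S_c} \eta_{ec} \le 1$$

We also include the soft constraints, which avoid a textual evidence being overly reused by multiple relations, and the constraints similar to (5) to ensure a minimal number of labeled edges and a minimal number of sentiment-oriented relations.

¹ It is computed by the Hopcroft-Karp algorithm (Hopcroft and Karp, 1973) in our implementation.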
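
The two-step FOBOS update described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it relies on the standard fact that the L1 proximal step has a closed form (component-wise soft-thresholding), and the function name `fobos_l1_step`, the dense list representation of the weight vector, and the toy numbers are all assumptions.

```python
def fobos_l1_step(beta, subgradient, learning_rate, rho):
    """One FOBOS update: a subgradient step followed by an L1 proximal step.

    The proximal step is solved in closed form by soft-thresholding each
    component with threshold learning_rate * rho.
    """
    threshold = learning_rate * rho
    updated = []
    for b, g in zip(beta, subgradient):
        half = b - learning_rate * g  # forward step: beta_{t+1/2}
        # Backward step: shrink toward zero, clipping small weights to 0.
        if half > threshold:
            new_value = half - threshold
        elif half < -threshold:
            new_value = half + threshold
        else:
            new_value = 0.0
        updated.append(new_value)
    return updated


# Toy numbers chosen for illustration only.
beta_next = fobos_l1_step(
    beta=[1.0, -0.5, 0.25],
    subgradient=[1.0, -1.0, 1.0],
    learning_rate=0.5,
    rho=0.5,
)
```

Weights whose magnitude after the forward step does not exceed the threshold are set exactly to zero, which is how the L1 regularizer yields sparse weight vectors.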


8 SRG Corpus

For evaluation we constructed the SRG corpus, which in total consists of 1686 manually annotated online reviews and forum posts in the digital camera and movie domains.² For each domain, we maintain a set of attributes and a list of entity names.

² The 107 camera reviews are from bestbuy.com and Amazon.com; the 667 camera forum posts are downloaded from forum.digitalcamerareview.com; the 138 movie reviews and 774 forum posts are from imdb.com and boards.ie respectively.

The annotation scheme for the sentiment representation requires minimal linguistic knowledge from our annotators. By focusing on the meanings of the sentences, the annotators make decisions based on their language intuition, not restricted by specific syntactic structures. Taking the example in Figure 2, the annotators only need to mark the mentions of entities and attributes from both the sentences and the context, disambiguate them, and label (“Canon 7D”, “Nikon D7000”, price) as worse and (“Canon 7D”, “sensor”) as positive, whereas in prior work, people have annotated the sentiment-bearing expressions such as “great” and linked them to the respective relation instances as well. This also enables them to annotate instances of both sentiment polarity and comparative relation, which are conveyed by not only explicit sentiment-bearing expressions like “excellent performance”, but also factual expressions implying evaluations such as “The 7V has 10x optical zoom and the 9V has 16x.”.

Table 1: Distribution of relation instances in SRG corpus.

             Camera              Movie
             Reviews   Forums    Reviews   Forums
positive     386       1539      879       905
negative     165       363       529       331
comparison   30        480       39        35

14 annotators participated in the annotation project. After a short training period, annotators worked on randomly assigned documents one at a time. For product reviews, the system lists all relevant information about the entity and the predefined attributes. For forum posts, the system shows only the attribute list. For each sentence in a document, the annotator first determines if it refers to an entity of interest. If not, the sentence is marked as off-topic. Otherwise, the annotator will identify the most obvious mentions, disambiguate them, and mark the MRGs. We evaluate the inter-annotator agreement on sSoRs in terms of Cohen’s Kappa (κ) (Cohen, 1968). An average Kappa value of 0.698 was achieved on a randomly selected set consisting of 412 sentences.

Table 1 shows the corpus distribution after normalizing the annotations into sSoRs. Camera forum posts contain the largest proportion of comparisons because they are mainly about the recommendation of digital cameras. In contrast, web users are much less interested in comparing movies, in both reviews and forums. In all subsets, positive relations play a dominant role since web users tend to express more positive attitudes online than negative ones (Pang and Lee, 2007).

9 Experiments

This section describes the empirical evaluation of SENTI-LSSVM together with two competitive baselines on the SRG corpus.

9.1 Experimental Setup

We implemented a rule-based baseline (DING-RULE) and a structural SVM (Tsochantaridis et al., 2004) baseline (SENTI-SSVM) for comparison. The former system extends the work of Ding et al. (2009), which designed several linguistically-motivated rules based on a sentiment polarity lexicon for relation identification and assumes there is only one type of sentiment relation in a sentence. In our implementation, we keep all the rules of Ding et al. (2009) and add one phrase-level rule for sentences with more than one mention. The additional rule assigns sentiment-bearing words and negators to their nearest relation candidates based on the absolute surface distance between the words and the corresponding mentions. In this case, the phrase-level sentiment-oriented relations depend only on the assigned sentiment words and negators. The latter system is based on a structural SVM and does not consider the assignment of textual evidences to relation instances during inference. The textual features of a relation candidate are all lexical and sentiment predictor features within a surface distance of four words from the mentions of the candidate.
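
The nearest-candidate rule added to the DING-RULE baseline can be pictured with the small sketch below. It is a simplified, hypothetical rendering: mentions stand in for relation candidates, each mention is reduced to a single token index, and the function name `assign_to_nearest_mention` and the tie-breaking choice are assumptions that the paper does not specify.

```python
def assign_to_nearest_mention(word_positions, mention_positions):
    """Map each sentiment word to the mention at the smallest token distance.

    Both arguments are dicts from a surface string to its token index.
    Ties go to the mention listed first in `mention_positions`.
    """
    assignment = {}
    for word, w_pos in word_positions.items():
        assignment[word] = min(
            mention_positions,
            key=lambda mention: abs(mention_positions[mention] - w_pos),
        )
    return assignment


# Token indices for the running example sentence:
# The=0 sensor=1 is=2 great=3 ,=4 but=5 the=6 price=7 is=8 higher=9
# than=10 Nikon=11 D7000=12
mentions = {"sensor": 1, "price": 7, "Nikon D7000": 11}
sentiment_words = {"great": 3, "higher": 9}
nearest = assign_to_nearest_mention(sentiment_words, mentions)
```

In this example "higher" is equally close to "price" and "Nikon D7000", so the outcome depends on the arbitrary tie-break, which hints at why a purely distance-based assignment can be fragile.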


Therefore, this baseline does not need the inference constraints of SENTI-LSSVM for the selection of textual evidences. To gain more insights into the model, we also evaluate the contribution of individual features of SENTI-LSSVM. In addition, to show if identifying sentiment polarities and comparative relations jointly works better than tackling each task on its own, we train SENTI-LSSVM for each task separately and combine their predictions according to compatibility rules and the corresponding graph scores.

For each domain and text genre, we withheld 15% of the documents for development and use the remaining for cross-validation. The hyperparameters of all systems are tuned on the development datasets. For all experiments of SENTI-LSSVM, we use ρ = 0.0001 for the L1 regularizer in Eq. (4) and ϕ = 0.05 for the loss function; and for SENTI-SSVM, ρ = 0.0001 and ϕ = 0.01. Since the relation type of off-topic sentences is certainly other, we evaluate all systems with 5-fold cross-validation only on the on-topic sentences in the evaluation dataset. Since the same sSoR can have several equivalent MRGs and the relation type other is not of our interest, we evaluate the sSoRs in terms of precision, recall and F-measure. All reported numbers are averages over the 5 folds.

9.2 Results

Table 2 shows the complete results of all systems. Here our model SENTI-LSSVM outperformed all baselines in terms of the average F-measure scores and recalls by a large margin. The F-measure on movie reviews is about 14% over the best baseline. The rule-based system has higher precision than recall in most cases. However, simply increasing the coverage of the domain-independent sentiment polarity lexicon might lead to worse performance (Taboada et al., 2011) because many sentiment-oriented relations are conveyed by domain-dependent expressions and factual expressions implying evaluations, such as “This camera does not have manual control.” Compared to DING-RULE, SENTI-SSVM performs better in the camera domain but worse for the movies due to many misclassifications of negative relation instances as other. It also wrongly predicted more positive instances as other than SENTI-LSSVM. We found that the recalls of these instances are low because they often have overly similar features with the instances of the type other linking to the same mentions. The problem gets worse in the movie domain since i) many sentences contain no explicit sentiment-bearing words; ii) the prior polarity of the sentiment-bearing words does not agree with their contextual polarity in the sentences. Consider the following example from a forum post about the movie “Superman Returns”: “Have a look at Superman: the Animated Series or Justice League Unlimited … that is how the characters of Superman and Lex Luthor should be.”. In contrast, our model minimizes the overlapping features by assigning them to the most likely relation candidates. This leads to significantly better performance. Although SENTI-SSVM has low recall for both positive and negative relations, it achieves the highest recall for the comparative relation among all systems in the movie domain and camera reviews. Since less than 1% of all instances are for comparative relations in these document sets and all models are trained to optimize the overall accuracy, SENTI-LSSVM tends to trade off the minority class for the overall better performance. This advantage disappears on the camera forum posts, where the number of instances of comparative relation is 12 times more than that in the other datasets.

All systems perform better in predicting positive relations than the negative ones. This corresponds well to the empirical findings in (Wilson, 2008) that people tend to use more complex expressions for negative sentiments than their affirmative counterparts. It is also in accordance with the distribution of these relations in our SRG corpus, which is randomly sampled from the online documents. For learning systems, it can also be explained by the fact that the training data for positive relations are considerably more than those for negative ones. The comparative relation is the hardest one to process since we found that many corresponding expressions do not contain explicit keywords for comparison.

To understand the performance of the key feature groups in our model better, we remove each group from the full SENTI-LSSVM system and evaluate the variations with movie reviews and camera forum posts, which have a relatively balanced distribution of relation types. As shown in Table 3, the features from the sentiment predictors make significant contributions for both datasets. The different drops of the performance indicate that the po-

我

D
哦
w
n
哦
A
d
e
d

F
r
哦
米
H

t
t

:
/
/

d
我
r
e
C
t
.

米

我
t
.

e
d
你

/
t

A
C
我
/

我

A
r
t
我
C
e
–
p
d

F
/

d
哦

我
/

1
0
1
1
6
2

/
t

我

A
C
_
A
_
0
0
1
7
3
1
5
6
6
8
3
4

/
t

我

A
C
_
A
_
0
0
1
7
3
p
d

乙
y
G
你
e
s
t

哦
n
0
9
S
e
p
e
米
乙
e
r
2
0
2
3

165

PositiveNegativeComparisonMicro-averagePRFPRFPRFPRFCameraForumDING-RULE56.439.046.146.224.031.642.614.021.053.430.839.0SENTI-SSVM60.235.644.844.238.541.228.040.132.943.736.739.9SENTI-LSSVM69.238.949.850.839.344.342.635.138.556.538.045.4CameraRe-viewDING-RULE83.669.075.668.638.849.630.016.921.681.158.668.1SENTI-SSVM72.675.474.063.962.563.228.038.932.568.170.469.3SENTI-LSSVM77.385.481.268.961.364.922.320.721.673.173.473.7MovieForumDING-RULE63.737.447.127.634.330.68.95.66.848.235.941.2SENTI-SSVM66.230.141.325.617.320.744.256.749.753.327.936.6SENTI-LSSVM63.344.252.129.745.636.040.145.042.449.744.647.0MovieRe-viewDING-RULE66.547.255.242.039.140.531.412.017.456.244.049.4SENTI-SSVM61.354.057.445.213.721.124.563.335.354.639.245.7SENTI-LSSVM59.079.167.653.351.452.328.334.030.957.968.862.9Table2:EvaluationresultsforDING-RULE,SENTI-SSVMandSENTI-LSSVM.Boldfaceﬁguresarestatisticallysigniﬁcantlybetterthanallothersinthesamecomparisongroupundert-testwithp=0.05.FeatureModelsMovieReviewsCameraForumsfullsystem62.945.4¬unigram63.2(+0.3)41.2(-4.2)¬context54.5(-8.4)46.0(+0.6)¬co-occurrence62.6(-0.3)44.9(-0.5)¬senti-predictors61.3(-1.6)34.3(-11.1)Table3:Micro-averageF-measureofSENTI-LSSVMwithdifferentfeaturemodelslaritiespredictedbyrulesaremoreconsistentincameraforumpoststhaninmoviereviews.Duetothecomplexityofexpressionsinthemoviere-viewsourmodelcannotbeneﬁtfromtheunigramfeaturesbutthesefeaturesareagoodcompensationforthesentimentpredictorfeaturesincamerafo-rumposts.Thesharpdropbyremovingthecontextfeaturesfromourmodelonmoviereviewsindicatesthatthesentimentsinmoviereviewsdependhighlyontherelationsoftheprevioussentences.Incon-trast,thesentiment-orientedrelationsoftheprevi-oussentencescouldbeareasonofoverﬁttingforcameraforumdata.Theedgeco-occurrencefea-turesdonotplayanimportantroleinourmodelsincethenumberofco-occurredsentiment-orientedrelationsinthesentenceswithcontrastconjunctionslike“but”issmall.However,wefoundthatallow-ingtheco-occurrenceofanysentiment-orientedre-lationswouldharmtheperform
anceofthemodel.Inaddition,ourexperimentsshowedthatthesep-aratedapproach,whichtrainsamodelforsenti-mentpolaritiesandcomparativerelationsrespec-tively,leadstoadecreasebyalmost1%intermsoftheF-measureaveragedoverallfourdatasets.ThelargestdropofF-measureis3%oncameraforumposts,sincethisdatasetcontainsthelargestpropor-tionofcomparativerelations.Wefoundthattheer-rorsareincreasedwhenthetrainedmodelsmakeconﬂictingpredictions.Inthiscase,thejointap-proachcantakeallfactorsintoaccountandmakemoreconsistentdecisionsthantheseparatedap-proaches.10ConclusionWeproposedSENTI-LSSVMmodelforextractingin-stancesofbothsentimentpolaritiesandcomparativerelations.Forevaluatingandtrainingthemodel,wecreatedanSRGcorpusbyusingalightweightan-notationscheme.Weshowedthatourmodelcanautomaticallyﬁndtextualevidencestosupportitsrelationpredictionsandachievessigniﬁcantlybet-terF-measurescoresthanalternativestate-of-the-artmethods.ReferencesYejinChoiandClaireCardie.2009.Adaptingapolaritylexiconusingintegerlinearprogrammingfordomain-speciﬁcsentimentclassiﬁcation.InProceedingsofthe2009ConferenceonEmpiricalMethodsinNatural

Language Processing: Volume 2 - Volume 2, EMNLP ’09, pages 590–598, Stroudsburg, PA, USA. Association for Computational Linguistics.
Yejin Choi and Claire Cardie. 2010. Hierarchical sequential learning for extracting opinions and their attributes. In Proceedings of the Annual meeting of the Association for Computational Linguistics, pages 269–274. Association for Computational Linguistics.
Yejin Choi, Eric Breck, and Claire Cardie. 2006. Joint extraction of entities and relations for opinion recognition. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 431–439, Stroudsburg, PA, USA. Association for Computational Linguistics.
Jacob Cohen. 1968. Weighted Kappa: Nominal Scale Agreement Provision for Scaled Disagreement or Partial Credit. Psychological bulletin, 70(4):213.
Xiaowen Ding, Bing Liu, and Philip S. Yu. 2008. A holistic lexicon-based approach to opinion mining. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 231–240, New York, NY, USA. ACM.
Xiaowen Ding, Bing Liu, and Lei Zhang. 2009. Entity discovery and assignment for opinion mining applications. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1125–1134.
John Duchi and Yoram Singer. 2009. Efficient online and batch learning using forward backward splitting. The Journal of Machine Learning Research, 10:2899–2934.
Murthy Ganapathibhotla and Bing Liu. 2008. Mining opinions in comparative sentences. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, pages 241–248, Stroudsburg, PA, USA. Association for Computational Linguistics.
Namrata Godbole, Manjunath Srinivasaiah, and Steven Skiena. 2007. Large-scale sentiment analysis for news and blogs (system demonstration). In Proceedings of the International AAAI Conference on Weblogs and Social Media.
John E Hopcroft and Richard M Karp. 1973. An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM Journal on computing, 2(4):225–231.
Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177, New York, NY, USA. ACM.
Wei Jin, Hung Hay Ho, and Rohini K. Srihari. 2009. Opinionminer: a novel machine learning system for web opinion mining and extraction. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1195–1204, New York, NY, USA. ACM.
Nitin Jindal and Bing Liu. 2006. Mining comparative sentences and relations. In Proceedings of the 21st International Conference on Artificial Intelligence - Volume 2, AAAI ’06, pages 1331–1336. AAAI Press.
Richard Johansson and Alessandro Moschitti. 2011. Extracting opinion expressions and their polarities – exploration of pipelines and joint models. In Proceedings of the Annual meeting of the Association for Computational Linguistics, volume 11, pages 101–106.
Jason S. Kessler, Miriam Eckert, Lyndsie Clark, and Nicolas Nicolov. 2010. The 2010 ICWSM JDPA sentiment corpus for the automotive domain. In 4th International AAAI Conference on Weblogs and Social Media Data Workshop Challenge (ICWSM-DWC 2010).
Soo-Min Kim and Eduard Hovy. 2006. Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in Text, SST ’06, pages 1–8, Stroudsburg, PA, USA. Association for Computational Linguistics.
Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 423–430, Stroudsburg, PA, USA. Association for Computational Linguistics.
Lun-Wei Ku, Yu-Ting Liang, and Hsin-Hsi Chen. 2006. Opinion extraction, summarization and tracking in news and blog corpora. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 100–107.
Bing Liu, Minqing Hu, and Junsheng Cheng. 2005. Opinion observer: analyzing and comparing opinions on the web. In Proceedings of the 14th international conference on World Wide Web, pages 342–351, New York, NY, USA. ACM.
André L. Martins, Noah A. Smith, and Eric P. Xing. 2009. Concise integer linear programming formulations for dependency parsing. In Proceedings of the Annual meeting of the Association for Computational Linguistics, pages 342–350.
Ryan T. McDonald, Kerry Hannan, Tyler Neylon, Mike Wells, and Jeffrey C. Reynar. 2007. Structured models for fine-to-coarse sentiment analysis. In Proceedings of the Annual meeting of the Association for Computational Linguistics.
Bo Pang and Lillian Lee. 2007. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.
Ana-Maria Popescu and Oren Etzioni. 2005. Extracting product features and opinions from reviews. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05, pages 339–346, Stroudsburg, PA, USA. Association for Computational Linguistics.
Lizhen Qu, Georgiana Ifrim, and Gerhard Weikum. 2010. The bag-of-opinions method for review rating prediction from sparse text patterns. In Chu-Ren Huang and Dan Jurafsky, editors, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), ACL Anthology, pages 913–921, Beijing, China. Tsinghua University Press.
Lizhen Qu, Rainer Gemulla, and Gerhard Weikum. 2012. A weakly supervised model for sentence-level semantic orientation analysis with multiple experts. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 149–159, Jeju Island, Korea, July. Proceedings of the Annual meeting of the Association for Computational Linguistics.
Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1201–1211.
Swapna Somasundaran and Janyce Wiebe. 2009. Recognizing stances in online debates. In Proceedings of the Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pages 226–234.
Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly D. Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.
Oscar Täckström and Ryan McDonald. 2011. Discovering fine-grained sentiment with latent variable structured prediction models. In Proceedings of the 33rd European conference on Advances in information retrieval, ECIR ’11, pages 368–374, Berlin, Heidelberg. Springer-Verlag.
Cigdem Toprak, Niklas Jakob, and Iryna Gurevych. 2010. Sentence and expression level annotation of opinions in user-generated discourse. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 575–584, Stroudsburg, PA, USA. Association for Computational Linguistics.
Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun. 2004. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the International Conference on Machine Learning, pages 104–112.
Wei Wei and Jon Atle Gulla. 2010. Sentiment learning on product reviews via sentiment ontology tree. In Proceedings of the Annual meeting of the Association for Computational Linguistics, pages 404–413.
Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05, pages 347–354, Stroudsburg, PA, USA. Association for Computational Linguistics.
Theresa Ann Wilson. 2008. Fine-grained subjectivity and sentiment analysis: recognizing the intensity, polarity, and attitudes of private states. Ph.D. thesis, University of Pittsburgh.
Yuanbin Wu, Qi Zhang, Xuanjing Huang, and Lide Wu. 2011. Structural opinion mining for graph-based sentiment representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1332–1341.
Ainur Yessenalina and Claire Cardie. 2011. Compositional matrix-space models for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 172–182.
Chun-Nam John Yu and Thorsten Joachims. 2009. Learning structural svms with latent variables. In Proceedings of the International Conference on Machine Learning, page 147.
Ning Yu and Sandra Kübler. 2011. Filling the gap: Semi-supervised learning for opinion detection across domains. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pages 200–209. Association for Computational Linguistics.
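
The evaluation in Section 9 reports precision, recall and F-measure per relation type together with a micro-average, while leaving the relation type other out of the scoring. As an illustration only, the following Python sketch shows one common way to obtain such micro-averaged scores: pool the true-positive, false-positive and false-negative counts of the relation types of interest and score once. It assumes the standard pooled-count definition of micro-averaging, which may differ in detail from the computation behind Table 2. The names Counts, prf and micro_average, as well as all counts, are hypothetical and are not taken from the paper or its implementation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-relation-type counts of extracted instances (hypothetical container)."""
    tp: int  # predicted and present in the gold standard
    fp: int  # predicted but absent from the gold standard
    fn: int  # present in the gold standard but not predicted


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1); an undefined ratio is reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def micro_average(per_type: dict[str, Counts]) -> tuple[float, float, float]:
    """Pool the counts of all relation types except "other", then score once."""
    kept = [c for name, c in per_type.items() if name != "other"]
    return prf(sum(c.tp for c in kept),
               sum(c.fp for c in kept),
               sum(c.fn for c in kept))


# Hypothetical counts for a single fold; these are NOT figures from the paper.
fold_counts = {
    "positive": Counts(tp=80, fp=20, fn=40),
    "negative": Counts(tp=30, fp=30, fn=30),
    "comparison": Counts(tp=10, fp=10, fn=10),
    "other": Counts(tp=500, fp=5, fn=5),  # ignored by micro_average
}

p, r, f = micro_average(fold_counts)
print(f"micro P={p:.3f} R={r:.3f} F={f:.3f}")
```

Because pooling weights every instance equally, frequent relation types dominate the micro-average. Under this assumption, a system can score well overall while doing poorly on a rare type such as comparison.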

Transactions of the Association for Computational Linguistics, 2 (2014) 155–168. Action Editor: Janyce Wiebe.
