Transactions of the Association for Computational Linguistics, 1 (2013) 99–110. Action Editor: Chris Callison-Burch.

Submitted 12/2012; Published 5/2013. © 2013 Association for Computational Linguistics.

Using Pivot-Based Paraphrasing and Sentiment Profiles to Improve a Subjectivity Lexicon for Essay Data

Beata Beigman Klebanov, Nitin Madnani, Jill Burstein
Educational Testing Service
660 Rosedale Road, Princeton, NJ 08541, USA
{bbeigmanklebanov,nmadnani,jburstein}@ets.org

Abstract

We demonstrate a method of improving a seed sentiment lexicon developed on essay data by using a pivot-based paraphrasing system for lexical expansion coupled with sentiment profile enrichment using crowdsourcing. Profile enrichment alone yields up to 15% improvement in the accuracy of the seed lexicon on 3-way sentence-level sentiment polarity classification of essay data. Using lexical expansion in addition to sentiment profiles provides a further 7% improvement in performance. Additional experiments show that the proposed method is also effective with other subjectivity lexicons and in a different domain of application (product reviews).

1 Introduction

In almost any sub-field of computational linguistics, creation of working systems starts with an investment in manually-generated or manually-annotated data for computational exploration. In subjectivity and sentiment analysis, annotation of training and testing data and construction of subjectivity lexicons have been the loci of costly labor investment.

Many subjectivity lexicons are mentioned in the literature. The two large manually-built lexicons for English – the General Inquirer (Stone et al., 1966) and the lexicon provided with the OpinionFinder distribution (Wiebe and Riloff, 2005) – are available for research and education only[1] and under a GNU GPL license that disallows their incorporation into proprietary materials,[2] respectively. Those wishing to integrate sentiment analysis into products, along with those studying subjectivity in languages other than English, or for specific domains such as finance, or for particular genres such as MySpace comments, reported construction of lexicons (Taboada et al., 2011; Loughran and McDonald, 2011; Thelwall et al., 2010; Rao and Ravichandran, 2009; Jijkoun and Hofmann, 2009; Pitel and Grefenstette, 2008; Mihalcea et al., 2007).

[1] http://www.wjh.harvard.edu/inquirer/j11/manual/
[2] http://www.gnu.org/copyleft/gpl.html

In this paper, we address the step of expanding a small-scale, manually-built subjectivity lexicon (a seed lexicon, typically for a domain or language in question) into a much larger but noisier lexicon using an automatic procedure. We present a novel expansion method using a state-of-the-art paraphrasing system. The expansion yields a 4-fold increase in lexicon size; yet, the expansion alone is insufficient in order to improve performance on sentence-level sentiment polarity classification.

In this paper we test the following hypothesis. We suggest that the effectiveness of the expansion is hampered by (1) introduction of opposite-polarity items, such as introducing resolute as an expansion of forceful, or remarkable as an expansion of peculiar; (2) introduction of weakly polar, neutral, or ambiguous words as expansions of polar seed words, such as generating concern as an expansion of anxiety or future as an expansion of aftermath;[3] (3) inability to distinguish between stronger or clear-cut versus weaker or ambiguous sentiment and to make a differential use of those.

[3] Table 2 and Figure 1 provide support to these assessments.

We address items (1) and (2) by enriching the lexicon with sentiment profiles (section 3), and propose


a way of effectively utilizing this information for the sentence-level sentiment polarity classification task (sections 5 and 6). Profile enrichment alone yields up to 15% increase in performance for the seed lexicon when using different machine learning algorithms; paraphraser-based expansion with sentiment profiles improves performance by an additional 7%. Overall, we observe an improvement of up to 25% in classification accuracy over the seed lexicon without profiles.

In section 7, we present comparative evaluations, demonstrating the competitiveness of the expanded and profile-enriched lexicon, as well as the effectiveness of the expansion and enrichment paradigm presented here for different subjectivity lexicons, different lexical expansion methods, and in a different domain of application (product reviews).

2 Building Subjectivity Lexicons

The goal of our sentiment analysis project is to allow for the identification of sentiment in sentences that appear in essay responses to a variety of tasks designed to test English proficiency in both native- and non-native-speaker populations in a standardized assessment as well as in an instructional setting. In order to allow for the future use of the sentiment analyzer in a proprietary product and to ensure its fit to the test-taker essay domain, we began our work with the construction of a seed lexicon relying on our materials (section 2.1). We then used a statistical paraphrasing system to expand the seed lexicon (section 2.2).

2.1 Seed Lexicon

In order to inform the process of lexicon construction, we randomly sampled 5,000 essays from a corpus of about 100,000 essays containing writing samples across many topics. Essays were responses to several different writing assignments, including graduate school entrance exams, non-native English speaker proficiency exams, and professional licensure exams. Our seed lexicon is a combination of (1) positive and negative sentiment words manually selected from a full list of word types in these data, and (2) words marked in a small-scale annotation of a sample of sentences from these data for all positive and negative words. A more detailed description of the construction of the seed lexicon can be found in Beigman Klebanov et al. (2012). The seed lexicon contains 749 single words, 406 positive and 343 negative.

2.2 Expanded Lexicon

We used a pivot-based lexical and phrasal paraphrase generation system (Madnani and Dorr, 2013). The paraphraser implements the pivot-based method as described by Bannard and Callison-Burch (2005) with several additional filtering mechanisms to increase the precision of the extracted pairs. The pivot-based method utilizes the inherent monolingual semantic knowledge from bilingual corpora: We first identify phrasal correspondences between English and a given foreign language F, then map from English to English by following translation units from English to the other language and back. For example, if the two English phrases e1 and e2 both correspond to the same foreign phrase f, then they may be considered to be paraphrases of each other with the following probability:

P(e1|e2) ≈ p(e1|f) P(f|e2)

If there are several pivot phrases that link the two English phrases, then they are all used in computing the probability:

P(e1|e2) ≈ Σ_{f'} p(e1|f') P(f'|e2)

Some examples of expansions generated by the paraphraser are shown in Table 1. More details about this kind of approach can be found in Bannard and Callison-Burch (2005).

Seed        Expansion      Seed        Expansion
abuse       exploitation   costly      onerous
accuse      reproach       dangerous   unsafe
anxiety     disquiet       improve     reinforce
conflict    crisis         invaluable  precious

Table 1: Examples of paraphraser expansions.

We use the French-English parallel corpus (approximately 1.2 million sentences) from the corpus of European parliamentary proceedings (Koehn, 2005) as the data on which pivoting is performed to extract the paraphrases. However, the base paraphrase system is susceptible


to large amounts of noise due to the imperfect bilingual word alignments. Therefore, we implement additional heuristics in order to minimize the number of noisy paraphrase pairs (Madnani and Dorr, 2013). For example, one such heuristic filters out pairs where a function word may have been inferred as a paraphrase of a content word. For the lexicon expansion experiment reported here, we use the top 15 single-word paraphrases for every word from the seed lexicon, excluding morphological variants of the seed word. This process results in an expanded lexicon of 2,994 different words, 1,666 positive and 1,761 negative (433 words are in both the positive and the negative lists). The expanded lexicon includes the seed lexicon.

3 Inducing sentiment profiles

Let γ_w be the sentiment profile of the word w.

γ_w = (p^pos_w, p^neg_w, p^neu_w)    (1)

where Σ_{i∈{pos,neg,neu}} p^i_w = 1. Thus, a sentiment profile of a word is essentially a 3-sided coin, corresponding to its probability of coming out positive, negative, and neutral, respectively.

3.1 Estimating sentiment profiles

Our goal is to estimate the profile using outcomes of multiple trials as follows. For every word, a person is shown the word and asked whether it is positive, negative, or neutral. A person's decision is modeled as flipping the coin corresponding to the word, and recording the outcome – positive, negative, or neutral. We run N = 20 such trials for every word in the expanded lexicon using the CrowdFlower crowdsourcing site,[4] for a total cost of $800. We use the maximum likelihood estimate of the sentiment profile:

p̂^i_w = n^i_w / N    (2)

where n^i_w is the number of the N trials on the word w that fell in cell i ∈ {pos, neg, neu}. Table 2 shows some estimated profiles.

[4] www.crowdflower.com

Word         p̂^pos_w   p̂^neu_w   p̂^neg_w
forceful     0         0.15      0.85
resolute     0.8       0.15      0.05
peculiar     0.05      0.15      0.8
remarkable   1         0         0
anxiety      0         0         1
concern      0.25      0.4       0.35
absurd       0         0         1
laughable    0.5       0.05      0.45
deadly       0         0         1
fateful      0.25      0.45      0.3
consequence  0.05      0.15      0.8
outcome      0.15      0.85      0

Table 2: Examples of estimated sentiment profiles. Words in gray are expansions generated from words in the preceding row; note the difference in the profiles.

Following Goodman (1965) and Quesenberry and Hurst (1964), we calculate confidence intervals for the parameters p^i_w:

(p̂^i_w)^− = (B + 2n^i_w − T) / (2(N + B))    (3)

(p̂^i_w)^+ = (B + 2n^i_w + T) / (2(N + B))    (4)

where

T = sqrt(B[B + 4n^i_w(N − n^i_w)/N])    (5)

For confidence α that all p^i_w, i ∈ {pos, neg, neu} are simultaneously within their respective intervals, the value of B is determined as the upper α/3 × 100th percentile of the χ² distribution with one degree of freedom. We use α = 0.1, resulting in B = 4.55. The resulting interval is about 0.2 around the estimated value when p̂^i_w is close to 0.5, and somewhat narrower for p̂^i_w closer to 0 or 1. We will use this information when inducing features from the profiles.

3.2 Sentiment distributions of the lexicons

The estimated sentiment profiles per word allow us to visualize the distributions of the two lexicons. In Figure 1, we plot the number of entries in the lexicon as a function of the difference in positive and negative parts of the profile, in 0.2-wide bins. Thus, a word w would be in the second-leftmost bin if −0.8 < (p̂^pos_w − p̂^neg_w) < −0.6. While the expansion process more than doubles the number of words in the highest bins for both the positive and the negative polarity, it clearly introduces a large number of words in the low- and medium bins into the lexicon. It is in this sense that the expansion process is noisy; apparently, seed words with clear and strong polarity are often expanded into low intensity, neutral, or ambiguous ones, as in pairs like absurd/laughable, deadly/fateful, anxiety/concern shown in Table 2.

Figure 1: Sentiment distributions for the seed (left) and the expanded (right) lexicons.

4 Related Work

The most popular seed expansion methods discussed in the literature are based on WordNet (Miller, 1995) or another lexicographic resource, on distributional similarity with the seeds, or on a mixture thereof (Cruz et al., 2011; Baccianella et al., 2010; Velikovich et al., 2010; Qiu et al., 2009; Mohammad et al., 2009; Esuli and Sebastiani, 2006; Kim and Hovy, 2004; Andreevskaia and Bergler, 2006; Hu and Liu, 2004; Kanayama and Nasukawa, 2006; Strapparava and Valitutti, 2004; Kamps et al., 2004; Takamura et al., 2005; Turney and Littman, 2003; Hatzivassiloglou and McKeown, 1997). The paraphrase-based expansion method is in the distributional similarity camp; we also experimented with WordNet-based expansion as described in section 7.2.

The task of assigning sentiment profiles to words in a sentiment lexicon has been addressed in the literature. SentiWordNet assigns profiles to all words in WordNet based on a propagation algorithm from a small seed set manually annotated by a small number of judges (Baccianella et al., 2010; Cerini et al., 2007). Andreevskaia and Bergler (2006) use graph propagation algorithms on WordNet to assign centrality scores in positive and negative categories; a similar approach based on web-scale co-occurrence graphs is discussed in Velikovich et al. (2010). Thelwall et al. (2010) manually annotated a set of words for strength of sentiment and used machine learning to fine-tune it. Taboada et al. (2011) produced an expert annotation of their lexicon with strength of sentiment. Subasic and Huettner (2001) manually built an affect lexicon with intensities. Wiebe and Riloff (2005) classified lexicon entries into weakly and strongly subjective, based on their relative frequency of appearance in subjective versus objective contexts in a large annotated dataset.

Our sentiment profiles are best thought of as relatively fine-grained priors for the sentiment expressed by a given word out-of-context. These reflect a mixture of strength of sentiment (p̂^pos_good > p̂^pos_decent), contextual ambiguity (concern can be interpreted as similar to worry or to care, as in "Her condition was causing concern" versus "He showed genuine concern for her"), and dominance of a polar connotation (abandon is p̂^neg = 1; it has a negative overtone even if the actual sense is not that of desert but of vacate, as in "You must abandon your office").

To the best of our knowledge, this paper presents the first attempt to integrate judgements obtained through crowdsourcing on a large scale into a sentiment lexicon, showing the effectiveness of this lexicon-enrichment procedure for a sentiment classification task.

5 Using profiles for sentence-level sentiment polarity classification

To evaluate the usefulness of the lexicons, we use them to generate features for machine learning systems, and compare performance on 3-way sentence-level sentiment polarity classification. To ensure robustness of the observed trends, we experiment with a number of machine learning algorithms: SVM Linear and RBF, Naïve Bayes, Logistic Regression (using WEKA (Hall et al., 2009)), and c5.0 Decision Trees (Quinlan, 1993).[5]

[5] Available from http://rulequest.com/

5.1 Data

We generated the data for training and testing the machine learning systems as follows. We used our


pool of 100,000 essays to sample a second, non-overlapping set of 5,000 essays, so that no essay used for lexicon development appears in this set. From these essays, we randomly sampled 550 sentences, and submitted them to sentiment polarity annotation by two experienced research assistants; 50 double-annotated sentences showed κ = 0.8. The TEST set contains the 43 agreed double-annotated sentences, and an additional 238 sampled from the 500 single-annotated sentences, 281 sentences in total. The category distribution in the TEST set is 46.6% neutral, 32.4% positive, and 21% negative.

The TRAIN set contains the remaining sentences, plus positive, negative, and neutral sentences annotated during lexicon development, for a total of 1,631 sentences. The category distribution in TRAIN is 39% neutral, 35% positive, 26% negative.

5.2 From lexicons to features

Our goal is to evaluate the impact of sentiment profiles on sentence-level sentiment polarity classification for the seed and the expanded lexicons, while also looking for the most effective ways to represent this information for machine learners.

We implement two baseline systems. One provides the machine learner with the most detailed information contained in a lexicon: BL-full has 2 features for every lexicon word, taking the values (1,0) for a positive match in a sentence, (0,1) for negative, (1,1) for a word in both positive and negative parts of the lexicon, and (0,0) otherwise.

The second baseline provides the machine learner with only summary information about the overall sentiment of the sentence. BL-sum uses only 2 features: (1) the total count of positive words in the sentence; (2) the total count of negative words in the sentence, according to the given lexicon.

For the sentiment-enriched runs, we construct a number of representations: Int-full, Int-sum, Int-bin, and Int-c. Int-full and Int-sum are parallel to the respective baseline systems. Int-full represents each lexicon word as 2 features corresponding to the word's estimated p̂^pos_w and p̂^neg_w, providing the most detailed information to the machine learner. In the Int-sum condition, we use p̂^pos_w and p̂^neg_w for every word to induce 2 features: (1) the sum of positive probabilities of all words in the sentence; (2) the sum of negative probabilities for all words in the sentence, according to the given lexicon.

For Int-bin runs, we use bins of the size of 0.2 – half of the maximal confidence interval – to group together words with close estimates. We produce 10 features. For positive bins, the 5 features count the number of words in the sentence that fall in bin i, 1 ≤ i ≤ 5, respectively, that is, words with 0.2(i−1) < p̂^pos_w ≤ 0.2i. Bin 1 also includes words with p̂^pos_w = 0, since these cannot be distinguished with high confidence from p̂^pos_w = 0.1. Note that we do not provide a scale, we merely represent different ranges with different features. This should allow the machine learners the flexibility to weight the different bins differently when inducing classifiers.

The Int-c condition represents a coarse-grained setting. We produce 4 features, two for each polarity: (1) the number of words such that 0 ≤ p̂^pos_w < 0.4; (2) the number of words such that 0.4 ≤ p̂^pos_w ≤ 1; similarly for the negative polarity.

Table 3 summarizes conditions and features.

Cond.     #F     Feature Description
BL-full   2|L|   (1_{Lpos∩S}(w), 1_{Lneg∩S}(w))
BL-sum    2      f1 = |{w : w ∈ Lpos ∩ S}|, f2 = |{w : w ∈ Lneg ∩ S}|
Int-full  2|L|   (p̂^pos_w, p̂^neg_w) ∀w ∈ A
Int-sum   2      (Σ_{w∈A} p̂^pos_w, Σ_{w∈A} p̂^neg_w)
Int-bin   10     f1 = |{w ∈ A : 0 ≤ p̂^pos_w ≤ 0.2}|, ..., f10 = |{w ∈ A : 0.8 < p̂^neg_w ≤ 1}|
Int-c     4      f1 = |{w ∈ A : 0 ≤ p̂^pos_w < 0.4}|, ..., f4 = |{w ∈ A : 0.4 ≤ p̂^neg_w ≤ 1}|

Table 3: Description of conditions. Column 2 shows the number of features. In column 3: 1 is an indicator function; L is a lexicon; Lpos is the part of the lexicon containing positive words (same with negatives); S is a sentence for which a feature vector is built; A = L ∩ S. For all w ∈ L − S in the -full conditions, w is represented with (0,0).

6 Results

Table 4 shows classification accuracies for 5 machine learning systems across 6 conditions, for the seed and the expanded lexicons.

Machine Learner      Condition   Seed    Expanded
–                    Majority    0.466   0.466
c5.0                 BL-full     0.441   0.498
                     BL-sum      0.512   0.480
                     Int-full    0.441   0.498
                     Int-sum     0.566   0.616
                     Int-bin     0.587   0.641
                     Int-c       0.530   0.577
SVM RBF              BL-full     0.466   0.466
                     BL-sum      0.527   0.495
                     Int-full    0.466   0.466
                     Int-sum     0.548   0.601
                     Int-bin     0.573   0.644
                     Int-c       0.530   0.562
SVM Linear           BL-full     0.584   0.566
                     BL-sum      0.509   0.502
                     Int-full    0.580   0.609
                     Int-sum     0.601   0.580
                     Int-bin     0.573   0.630
                     Int-c       0.569   0.569
Logistic Regression  BL-full     0.545   0.509
                     BL-sum      0.545   0.509
                     Int-full    0.534   0.502
                     Int-sum     0.555   0.584
                     Int-bin     0.584   0.616
                     Int-c       0.545   0.577
Naïve Bayes          BL-full     0.598   0.584
                     BL-sum      0.509   0.473
                     Int-full    0.598   0.580
                     Int-sum     0.545   0.605
                     Int-bin     0.559   0.626
                     Int-c       0.537   0.601

Table 4: Classification accuracies on TEST set. Majority baseline corresponds to classifying all sentences as neutral. The best performance is boldfaced. Let BL stand for the best-performing baseline (BL-full or BL-sum) for a combination of machine learner and lexicon. We use the Wilcoxon Signed-Rank test, reporting the number of signed ranks (N) and the sum of signed ranks (W). Statistically significant results at p = 0.05 are: Int-sum > BL (N=10, W=43); Int-bin > BL (N=10, W=48); Int-bin > Int-sum (N=10, W=43); Int-bin > Int-full (N=10, W=47); Int-sum > Int-full (N=10, W=37); Int-bin > Int-c (N=10, W=55); Int-sum > Int-c (N=10, W=55); Expanded > Seed under Int condition (includes Int-full, Int-sum, Int-bin, Int-c) (N=18, W=152, z=3.3). Differences between Int-full, Int-c, and BL are not significant.

Let BL denote the best-performing baseline (BL-full or BL-sum) for a combination of machine learner and lexicon. The results show that (1) Int-bin > Int-sum > BL = Int-c = Int-full; (2) Expanded > Seed under the Int condition. All inequalities are statistically significant at p = 0.05 (see caption of Table 4 for details).

First, both the seed and the expanded lexicons benefit from profile enrichment, although, as predicted, the expanded lexicon yields larger gains due to its more varied profiles: The seed lexicon gains up to 15% in accuracy (c5.0 BL-sum vs Int-bin), while the expanded lexicon gains up to 30%, as SVM RBF scores go up from 0.495 to 0.644.

Second, observe that profiling allows the expanded lexicon to leverage its improved coverage: While it is inferior to the best baseline run with the seed lexicon for all systems, it succeeds in improving the seed lexicon accuracies by 5%-12% across the different systems for the Int-bin runs. The best run of the expanded lexicon (Int-bin for SVM RBF) improves upon the best run of the seed lexicon (Int-sum for SVM-linear) by 7%, demonstrating the success of the paraphraser-based expansion once profiles are taken into account. Overall, comparing the best baseline for the seed lexicon with the Int-bin condition of the expanded lexicon, we observe an improvement between 5% (0.598 to 0.626 for Naïve Bayes) and 25% (0.512 to 0.641 for c5.0), proving the effectiveness of the paraphrase-based expansion with profile enrichment paradigm.

Third, representing profiles using 10 bins (Int-bin) provides a small but consistent improvement over the summary representation (Int-sum) that sums positivity and negativity of the sentiment-bearing words in a sentence, over a coarse-grained representation (Int-c), as well as over the full-information representation (Int-full). Even Naïve Bayes and SVM linear, known to work well with large feature sets, show better performance in the Int-bin condition for the expanded lexicon. The results indicate that an intermediate degree of detail – between summary-only and coarse-grained representation on the one hand and full-information representation on the other – is the best choice in our setting.


7 Comparative Evaluations

In this section, we present comparative evaluations of the work presented in this paper with respect to related work. This section shows that the paraphrase expansion + profile enrichment solution proposed in this paper is effective for our task beyond off-the-shelf solutions, and that its effectiveness generalizes to sentiment analysis in a different domain. We also show that profile enrichment can be effectively coupled with other methods of lexical expansion, although the paraphraser-based expansion receives a larger boost in performance from profile enrichment than the alternative expansion methods we consider.

In section 7.1, we demonstrate that the paraphrase-based expansion and profile enrichment yield superior performance on our data relative to state-of-the-art subjectivity lexicons – OpinionFinder, General Inquirer, and SentiWordNet. In section 7.2, we show that profile enrichment can be effectively coupled with other methods of lexical expansion, such as a WordNet-based expansion and an expansion that utilizes Lin's distributional thesaurus. However, we find that the paraphraser-based expansion benefits the most from profile enrichment, and attains better performance on our data than the alternative expansion methods. In section 7.3, we show that the paraphrase-based expansion and profile enrichment paradigm is effective for other subjectivity lexicons on other data. We use a dataset of product reviews annotated for sentence-level positivity and negativity as new data for evaluation (Hu and Liu, 2004). We use subsets of OpinionFinder, General Inquirer, and the sentiment lexicon from Hu and Liu (2004). We demonstrate that paraphrase-based expansion and profile enrichment improve the accuracy of sentiment classification of product reviews for every lexicon and machine learner combination; the magnitude of improvement is 5% on average.

7.1 Competitiveness of the Expanded Lexicon

Had we been able to use the OpinionFinder or the General Inquirer lexicons (OFL and GIL) as-is, how would the results have compared to those attained using our lexicons? We performed the baseline runs with both lexicons; OFL accuracies were 0.544-0.594 across machine learning systems, GIL's – 0.491-0.584 (see GIL column in Table 5).

We also experimented with using the weaksubj and strongsubj labels in OFL as somewhat parallel distinctions to the ones presented here (see section 4 – Related Work – for a more detailed discussion). We used a (1,0,0) profile for strong positives, (0.3,0,0.7) for weak positives, (0,1,0) for strong negatives, and (0,0.3,0.7) for weak negatives, and ran all the feature representations discussed in section 5.2. Table 5 column OFL shows the best run for every machine learning system, across the different feature representations, and choosing the better performing run between vanilla OFL and the version enriched with weak/strong distinctions.

Machine Learner   Seed BL   OFL     GIL     SWN     Exp.
c5.0              0.512     0.598   0.491   0.516   0.641
SVM-RBF           0.527     0.594   0.495   0.520   0.644
SVM-lin.          0.584     0.594   0.580   0.569   0.630
Log. Reg.         0.545     0.598   0.541   0.537   0.616
Naïve B.          0.598     0.573   0.584   0.587   0.626

Table 5: Performance of different lexicons on essay data using various machine learning systems. For each system and lexicon, the best performance across the applicable feature representations from section 5.2 and the variants (see text) is shown. Seed BL column shows the best baseline performance of our seed lexicon – before paraphraser expansion and profile enrichment were applied. Exp. column shows the performance of the Int-bin feature representation for the expanded lexicon after profile enrichment.

Additionally, we experimented with SentiWordNet (Baccianella et al., 2010). SentiWordNet is a resource for opinion mining built on top of WordNet, which assigns each synset in WordNet a score triplet (positive, negative, and objective), indicating the strength of each of these three properties for the words in the synset. The SentiWordNet annotations were automatically generated, starting with a set of manually labeled synsets. Currently, SentiWordNet includes an automatic annotation for all the synsets in WordNet, totaling more than 100,000 words. It is therefore the largest-scale lexicon with intensity information that is currently available.

Since SentiWordNet assigns scores to synsets and since our data is not sense-tagged, we induced Sen-


tiWordNet scores in the following ways. We part-of-speech tagged our train and test data using the Stanford tagger (Toutanova et al., 2003). Then, we took the SentiWordNet scores for the top sense for the given part-of-speech (SWN-1). In a different variant, we took a weighted average of the scores for the different senses, using the weighting algorithm provided on the SentiWordNet website[6] (SWN-2). Table 5 column SWN shows the best performance figures between SWN-1 and SWN-2, across the feature representations in section 5.2.

[6] http://sentiwordnet.isti.cnr.it/, under "Sample code."

The comparative results in Table 5 clearly show that while our vanilla seed lexicon performs comparably to off-the-shelf lexicons on our data, the paraphraser-expanded lexicon with sentiment profiles outperforms OpinionFinder, General Inquirer, and SentiWordNet.

7.2 Sentiment Profile Enrichment with Other Lexical Expansion Methods

We presented a novel lexicon expansion method using a paraphrasing system. We also experimented with more standard methods, using WordNet and distributional similarity (Beigman Klebanov et al., 2012; Esuli and Sebastiani, 2006; Kim and Hovy, 2004; Andreevskaia and Bergler, 2006; Hu and Liu, 2004; Kanayama and Nasukawa, 2006; Strapparava and Valitutti, 2004; Kamps et al., 2004; Takamura et al., 2005; Turney and Littman, 2003; Hatzivassiloglou and McKeown, 1997). Specifically, we implemented a WordNet (Miller, 1995) based expansion that uses the 3 most frequent synonyms of the top sense of the seed word (WN-e). We also implemented a method based on distributional similarity: Using Lin's proximity-based thesaurus (Lin, 1998) trained on our in-house essay data as well as on well-formed newswire texts, we took all words with the proximity score > 1.80 to any of the seed lexicon words (Lin-e). Just like the paraphraser lexicon, both perform worse than the seed lexicon in 9 out of 10 baseline runs (BL-sum and BL-full conditions for the 5 machine learners).

To test the effect of profile enrichment, all words in WN-e and Lin-e underwent profile estimation as described in section 3.1, yielding lexicons WN-e-p and Lin-e-p, respectively. Figure 2 shows the distributions.

Figure 2: Sentiment profile distributions for Lin-e-p (left) and WN-e-p (right) lexicons.

WN-e-p and Lin-e-p exhibit similar trends to those of the paraphraser. Substituting WN-e-p for Expanded data in Table 4, we find the same relationships between the different feature sets: Int-bin > Int-sum > Int-full = BL. For Lin-e-p, Int-sum deteriorates: Int-bin > Int-sum = Int-full = BL. For the 20 runs in the Int condition, Paraphraser > WN-e-p > Lin-e-p.[7] Note that this is also the order of lexicon sizes: Lin-e is the most conservative expansion (1,907 words), WN-e is the second with 2,527 words, and the lexicon expanded using paraphrasing is the largest with 2,994 words. Table 6 shows the performance of Lin-e-p, WN-e-p, and of the Expanded lexicon from Table 4 using the Int-bin feature representation. The average relative improvements over the best baseline range between 6.6% to 14.6% for the different expansion methods.

[7] All > are significant at p = 0.05 using the Wilcoxon test.

Profile induction appears to be a powerful lexicon clean-up procedure that works especially well with more aggressive and thus potentially noisier expansions: The machine learners depress low-intensity and ambiguous expansions, thereby allowing the effective utilization of the improved coverage of sentiment-bearing vocabulary.

7.3 Effectiveness of the Paraphrase Expansion with Profile Enrichment Paradigm in a Different Domain

In order to check whether the paraphrase-based expansion and profile enrichment paradigm discussed in this paper generalizes to other subjectivity lexicons


Machine Learner   Seed BL   Lin-e-p   WN-e-p   Exp.
c5.0              0.512     0.584     0.616    0.641
SVM-RBF           0.527     0.598     0.601    0.644
SVM-lin.          0.584     0.577     0.569    0.630
Log. Reg.         0.545     0.587     0.580    0.616
Naïve B.          0.598     0.591     0.623    0.626
Av. Gain          –         0.066     0.085    0.146

Table 6: Performance of WordNet-based, Lin-based, and paraphraser-based expansions with profile enrichment in the Int-bin condition. The Seed BL column shows the best baseline performance of the seed lexicon – before expansion and profile enrichment were applied. The last line shows the average relative gain over the best baseline, calculated as AG_lex = (1/|M|) Σ_{m∈M} (Lex_m − SeedBL_m) / SeedBL_m, where M = {c5.0, SVM-RBF, SVM-linear, Logistic Regression, Naïve Bayes} and lex ∈ {Lin-e-p, WN-e-p, Exp}.

and domains of application, we experimented with a product reviews dataset (Hu and Liu, 2004) and additional lexicons, as follows.

7.3.1 Lexicons

We use the OpinionFinder and General Inquirer lexicons (OFL and GIL) as before, as well as the lexicon of positive and negative sentiment and opinion words available along with the (Hu and Liu, 2004) product reviews dataset – HL.[8] Since each of these lexicons contains more than 3,000 words, enrichment of the full lexicons with profiles is beyond the financial scope of our project. We therefore restrict each of the lexicons to the size of their overlap with our seed lexicon (see 2.1); the overlaps have between 415 and 467 words. These restricted lexicons are our initial lexicons for the new experiment that parallels the role of the seed lexicon in the experiments on essay data.

For each of the 3 initial lexicons L, L ∈ {OFL, GIL, HL}, we follow the paraphrase-based expansion as described in section 2.2. This results in about a 4.5-fold expansion of each lexicon, the new lexicons L-e, L ∈ {OFL, GIL, HL}, numbering between 2,015 and 2,167 words. Both the initial and the expanded lexicons now undergo profile enrichment as described in section 3.1, producing lexicons L-p and L-e-p, L ∈ {OFL, GIL, HL}.

7.3.2 Data

We use the dataset from Hu and Liu (2004)[9] that contains reviews of 5 products from amazon.com: two digital cameras, a DVD player, an MP3 player, and a cellular phone. The reviews are annotated at the sentence level with a label that describes the particular feature that is the subject of the positive or negative evaluation and the polarity and extent of the evaluation. For example, the sentence "The phone book is very user-friendly and the speaker-phone is excellent" is labeled as PHONEBOOK[+2], SPEAKERPHONE[+2], while the sentence "I am bored with the silver look" is labeled LOOK[−1]. We used all sentences that were labeled with a numerical score for at least one feature, removing a small number of sentences labeled with both positive and negative scores for different features.[10] We used the sign of the numerical score to label the sentences as positive or negative. The resulting dataset consists of 1,695 sentences, 1,061 positive and 634 negative; accuracy for a majority baseline on this dataset is 0.626. Our experiments on this dataset are done using 5-fold cross-validation.

7.3.3 Results

Table 7 shows classification accuracies for the product review data using different lexicons and machine learners. We observe that the combination of paraphrase-based expansion and profile enrichment (L-e-p column in the table) resulted in an improved performance over the initial lexicon (L column in the table) in all cases, with an average gain of 5% in accuracy.

Furthermore, the contributions of the expansion and the profile enrichment are complementary, since their combination performs better than each in isolation. We note that profile enrichment alone for the initial lexicon did not yield an improvement. This can be explained by the fact that the initial lexicons are highly polar, so profiles provide little additional information: the percentage of words with p̂_pos ≥ 0.8 or p̂_neg ≥ 0.8 is 84%, 86% and 91% for GIL,

[8] http://www.cs.uic.edu/∼liub/FBS/sentiment-analysis.html#lexicon
[9] http://www.cs.uic.edu/∼liub/FBS/sentiment-analysis.html#datasets, the link under "Customer Review Datasets (5 products)"
[10] such as "The headset that comes with the phone has good sound volume but it hurts the ears like you cannot imagine!"
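The sentence-labeling step of section 7.3.2 can be sketched as follows; the dictionary format and the function are illustrative stand-ins (the paper does not publish code), but the logic mirrors the description: take the sign of the per-feature scores, and drop sentences that mix positive and negative scores, such as the one quoted in footnote 10.

```python
def sentence_polarity(feature_scores):
    """Return 'pos' or 'neg' from the signs of a sentence's feature scores,
    or None for mixed-polarity sentences, which are excluded."""
    signs = {1 if score > 0 else -1 for score in feature_scores.values()}
    if signs == {1}:
        return "pos"
    if signs == {-1}:
        return "neg"
    return None  # mixed polarity, e.g. the headset sentence in footnote 10

# Examples mirroring the labels quoted in the text.
assert sentence_polarity({"PHONEBOOK": 2, "SPEAKERPHONE": 2}) == "pos"
assert sentence_polarity({"LOOK": -1}) == "neg"
assert sentence_polarity({"SOUND": 2, "COMFORT": -3}) is None

# The majority baseline is the share of the most frequent class:
# 1,061 positive of 1,695 sentences gives 1061 / 1695 ≈ 0.626.
majority_baseline = 1061 / 1695
```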

OFL, and HL-derived lexicons, respectively. In contrast, for the expanded lexicons, these percentages are 51%, 53%, and 56%; these lexicons benefit from profile enrichment.

Machine Learner   L       L-p     L-e     L-e-p
L = OFL ∩ Seed, |L| = 467, |L-e| = 2,167
c5.0              0.663   0.670   0.691   0.704
SVM-RBF           0.668   0.676   0.693   0.714
SVM-lin.          0.675   0.670   0.688   0.696
Log. Reg.         0.666   0.658   0.693   0.698
Naïve B.          0.668   0.668   0.686   0.695
L = GIL ∩ Seed, |L| = 415, |L-e| = 2,015
c5.0              0.644   0.658   0.663   0.686
SVM-RBF           0.650   0.665   0.653   0.683
SVM-lin.          0.665   0.665   0.677   0.681
Log. Reg.         0.664   0.658   0.678   0.694
Naïve B.          0.669   0.666   0.678   0.703
L = HL ∩ Seed, |L| = 434, |L-e| = 2,054
c5.0              0.676   0.675   0.689   0.706
SVM-RBF           0.673   0.674   0.700   0.713
SVM-lin.          0.676   0.664   0.703   0.710
Log. Reg.         0.668   0.661   0.703   0.699
Naïve B.          0.668   0.672   0.697   0.697

Table 7: Accuracies on product review data. For each machine learner and lexicon, the best baseline performance is shown as L for the initial lexicon and as L-e for the paraphrase-expanded lexicon. L-p and L-e-p show the performance of the Int-bin feature set on the profile-enriched initial and paraphrase-expanded lexicons, respectively. The three initial lexicons L are OpinionFinder (OFL), General Inquirer (GIL), and (Hu and Liu, 2004) (HL), each intersected with our seed lexicon. Sizes of the initial and expanded lexicons are provided.

8 Conclusions

We demonstrated a method of improving a seed sentiment lexicon by using a pivot-based paraphrasing system for lexical expansion and sentiment profile enrichment using crowdsourcing. Profile enrichment alone yielded up to 15% improvement in the performance of the seed lexicon on the task of 3-way sentence-level sentiment polarity classification of test-taker essay data. While the lexical expansion on its own failed to improve upon the performance of the seed lexicon, it became much more effective on top of sentiment profiles, generating a 7% performance boost over the best profile-enriched run with the seed lexicon. Overall, paraphrase-based expansion coupled with profile enrichment yields up to a 25% improvement in accuracy.

Additionally, we showed that our paraphrase-expanded and profile-enriched lexicon performs significantly better on our data than off-the-shelf subjectivity lexicons, namely OpinionFinder, General Inquirer, and SentiWordNet. Furthermore, our results suggest that paraphrase-based expansion derives more benefit from profiles than two competing expansion mechanisms based on WordNet and on Lin's distributional thesaurus.

Finally, we demonstrated the effectiveness of the paraphraser-based expansion with profile enrichment paradigm on a different dataset. We used Hu and Liu (2004) product review data with sentence-level sentiment polarity labels. Paraphrase-based expansion with profile enrichment yielded an improved performance across all lexicons and machine learning algorithms we tried, with an average improvement rate of 5% in classification accuracy.

Recent literature argues that sentiment polarity is a property of word senses, rather than of words (Gyamfi et al., 2009; Su and Markert, 2008; Wiebe and Mihalcea, 2006), although Dragut et al. (2012) successfully operate with "mostly negative" and "mostly positive" words based on the polarity distributions of word senses. In future work, we plan to address sense disambiguation for words that have multiple senses with very different sentiment, such as stress as either anxiety (negative) or emphasis (neutral).

References

Alina Andreevskaia and Sabine Bergler. 2006. Mining WordNet for a fuzzy sentiment: Sentiment tag extraction of WordNet glosses. In Proceedings of EACL, pages 209–216, Trento, Italy.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of LREC, pages 2200–2204, Malta.

Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of ACL, pages 597–604, Ann Arbor, MI.

Beata Beigman Klebanov, Jill Burstein, Nitin Madnani, Adam Faulkner, and Joel Tetreault. 2012. Building sentiment lexicon(s) from scratch for essay data. In Proceedings of the 13th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), New Delhi, India, March.

S. Cerini, V. Compagnoni, A. Demontis, M. Formentelli, and G. Gandini. 2007. Micro-WNOp: A gold standard for the evaluation of automatically compiled lexical resources for opinion mining. In Andrea Sanso, editor, Language resources and linguistic theory: Typology, second language acquisition, pages 200–210. Franco Angeli Editore, Milano, IT.

Fermín L. Cruz, José A. Troyano, F. Javier Ortega, and Fernando Enríquez. 2011. Automatic expansion of feature-level opinion lexicons. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 125–131, Portland, Oregon, June.

Eduard Dragut, Hong Wang, Clement Yu, Prasad Sistla, and Weiyi Meng. 2012. Polarity consistency checking for sentiment dictionaries. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 997–1005, Jeju Island, Korea, July. Association for Computational Linguistics.

Andrea Esuli and Fabrizio Sebastiani. 2006. Determining term subjectivity and term orientation for opinion mining. In Proceedings of EACL, pages 193–200, Trento, Italy.

Leo A. Goodman. 1965. On Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics, 7(2):247–254.

Yaw Gyamfi, Janyce Wiebe, Rada Mihalcea, and Cem Akkaya. 2009. Integrating knowledge for subjectivity sense labeling. In Proceedings of NAACL, pages 10–18, Boulder, CO.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11.

Vasileios Hatzivassiloglou and Kathleen McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of ACL, pages 174–181, Madrid, Spain.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177, Seattle, WA.

Valentin Jijkoun and Katja Hofmann. 2009. Generating a Non-English Subjectivity Lexicon: Relations That Matter. In Proceedings of EACL, pages 398–405, Athens, Greece.

Jaap Kamps, Maarten Marx, Robert Mokken, and Maarten de Rijke. 2004. Using WordNet to measure semantic orientation of adjectives. In Proceedings of LREC, pages 1115–1118, Lisbon, Portugal.

Hiroshi Kanayama and Tetsuya Nasukawa. 2006. Fully automatic Lexicon Expansion for Domain-oriented Sentiment Analysis. In Proceedings of EMNLP, pages 355–363, Sydney, Australia.

Soo-Min Kim and Eduard Hovy. 2004. Determining the sentiment of opinions. In Proceedings of COLING, pages 1367–1373, Geneva, Switzerland.

Philip Koehn. 2005. EUROPARL: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the Machine Translation Summit.

Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of ACL, pages 768–774, Montreal, Canada.

Tim Loughran and Bill McDonald. 2011. When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks. Journal of Finance, 66:35–65.

Nitin Madnani and Bonnie Dorr. 2013. Generating Targeted Paraphrases for Improved Translation. ACM Transactions on Intelligent Systems and Technology, to appear.

Rada Mihalcea, Carmen Banea, and Janyce Wiebe. 2007. Learning multilingual subjective language via cross-lingual projections. In Proceedings of ACL, pages 976–983, Prague, Czech Republic.

George Miller. 1995. WordNet: A lexical database. Communications of the ACM, 38:39–41.

Saif Mohammad, Cody Dunne, and Bonnie Dorr. 2009. Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In Proceedings of EMNLP, pages 599–608, Singapore, August.

Guillaume Pitel and Gregory Grefenstette. 2008. Semi-automatic building method for a multidimensional affect dictionary for a new language. In Proceedings of LREC, Marrakech, Morocco.

Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. 2009. Expanding domain sentiment lexicon through double propagation. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI'09, pages 1199–1204.

C. Quesenberry and D. Hurst. 1964. Large sample simultaneous confidence intervals for multinomial proportions. Technometrics, 6:191–195.

J. R. Quinlan. 1993. C4.5: Programs for machine learning. Morgan Kaufmann Publishers.

Delip Rao and Deepak Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of EACL, pages 675–682, Athens.

Philip Stone, Dexter Dunphy, Marshall Smith, and Daniel Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.

Carlo Strapparava and Alessandro Valitutti. 2004. WordNet-Affect: an affective extension of WordNet. In Proceedings of LREC, pages 1083–1086, Lisbon, Portugal.

Fangzhong Su and Katja Markert. 2008. Eliciting Subjectivity and Polarity Judgements on Word Senses. In Proceedings of COLING, pages 825–832, Manchester, UK.

P. Subasic and A. Huettner. 2001. Affect analysis of text using fuzzy semantic typing. IEEE Transactions on Fuzzy Systems, 9(4).

Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics, 37(2):267–307.

Hiroya Takamura, Takashi Inui, and Manabu Okumura. 2005. Extracting semantic orientation of words using spin model. In Proceedings of ACL, pages 133–140, Ann Arbor, MI.

Mike Thelwall, Kevan Buckley, Georgios Paltoglou, Di Cai, and Arvid Kappas. 2010. Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12):2544–2558.

Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL, pages 252–259.

Peter Turney and Michael Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4):315–346.

Leonid Velikovich, Sasha Blair-Goldensohn, Kerry Hannan, and Ryan McDonald. 2010. The viability of Web-derived polarity lexicons. In Proceedings of NAACL, pages 777–785, Los Angeles, CA.

Janyce Wiebe and Rada Mihalcea. 2006. Word sense and subjectivity. In Proceedings of ACL, pages 1065–1072, Sydney, Australia.

Janyce Wiebe and Ellen Riloff. 2005. Creating subjective and objective sentence classifiers from unannotated texts. In Proceedings of CICLing (invited paper), pages 486–497, Mexico City.