Transactions of the Association for Computational Linguistics, vol. 6, pp. 269–285, 2018. Action Editor: Diana McCarthy.
Submission batch: 11/2017; Revision batch: 2/2018; Published 5/2018.
© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Bootstrap Domain-Specific Sentiment Classifiers from Unlabeled Corpora

Andrius Mudinas, Dell Zhang, and Mark Levene
Department of Computer Science and Information Systems
Birkbeck, University of London
London WC1E 7HX, UK
andrius@dcs.bbk.ac.uk, dell.z@ieee.org, mark@dcs.bbk.ac.uk

Abstract

There is often the need to perform sentiment classification in a particular domain where no labeled document is available. Although we could make use of a general-purpose off-the-shelf sentiment classifier or a pre-built one for a different domain, the effectiveness would be inferior. In this paper, we explore the possibility of building domain-specific sentiment classifiers with unlabeled documents only. Our investigation indicates that in the word embeddings learned from the unlabeled corpus of a given domain, the distributed word representations (vectors) for opposite sentiments form distinct clusters, though those clusters are not transferable across domains. Exploiting such a clustering structure, we are able to utilize machine learning algorithms to induce a quality domain-specific sentiment lexicon from just a few typical sentiment words ("seeds"). An important finding is that simple linear model based supervised learning algorithms (such as linear SVM) can actually work better than more sophisticated semi-supervised/transductive learning algorithms which represent the state-of-the-art technique for sentiment lexicon induction. The induced lexicon could be applied directly in a lexicon-based method for sentiment classification, but a higher performance could be achieved through a two-phase bootstrapping method which uses the induced lexicon to assign positive/negative sentiment scores to unlabeled documents first, and then uses those documents found to have clear sentiment signals as pseudo-labeled examples to train a document sentiment classifier via supervised learning algorithms (such as LSTM). On several benchmark datasets for document sentiment classification, our end-to-end pipelined approach, which is overall unsupervised (except for a tiny set of seed words), outperforms existing unsupervised approaches and achieves an accuracy comparable to that of fully supervised approaches.

1 Introduction

Sentiment analysis (Liu, 2015) is a popular research topic which has a wide range of applications, such as summarizing customer reviews, monitoring social media, and predicting stock market trends (Bollen et al., 2011). A basic task in sentiment analysis is to classify the sentiment polarity of a given piece of text (document), i.e., whether the opinion expressed in the text is positive or negative (Pang et al., 2002), which is the focus of this paper.

There are many different approaches to sentiment classification in the Natural Language Processing (NLP) literature, from simple lexicon-based methods (Ding et al., 2008; Thelwall et al., 2010; Thelwall et al., 2012) to learning-based approaches (Pang and Lee, 2004; Turney, 2002; Jo and Oh, 2011; Argamon et al., 2007; Lin and He, 2009), and also hybrid methods in between (Mudinas et al., 2012; Zhang et al., 2011). No matter which approach is taken, a sentiment classifier built for its target domain would work well only within that specific domain, but suffer a serious performance loss once the domain boundary is crossed. The same word could drastically change its sentiment polarity (and/or strength) if it is used in a different domain. For example, being "small" is likely to be negative
for a hotel room but positive for a digital camcorder, being "unexpected" may be a good thing for the ending of a movie but not for the engine of a car, and we will probably enjoy "interesting" books but not necessarily "interesting" food. Here, the domain could be defined not by the topic of the documents but by the style of writing. For example, the meanings of words like "gay" and "terrific" would depend on whether the text was written in a historical era or modern times.

When we need to perform sentiment classification in a new domain unseen before, there is usually neither a labeled dictionary available to employ lexicon-based sentiment classifiers nor a labeled corpus available to train learning-based sentiment classifiers. It is, of course, possible to resort to a general-purpose off-the-shelf sentiment classifier, or a pre-built one for a different domain. However, the effectiveness would often be unsatisfactory because of the reasons mentioned above. There have been some studies on domain adaptation or transfer learning for sentiment classification (Blitzer et al., 2007; Tan et al., 2009; Pan et al., 2010; Glorot et al., 2011; Yoshida et al., 2011; Bollegala et al., 2013; Xia et al., 2013; Yang and Eisenstein, 2015), but they still require a large amount of labeled training data from a fairly similar source domain, which is not always feasible. Those algorithms also tend to be computationally expensive and time-consuming (Mohammad and Turney, 2010; Fast et al., 2016).

In this paper, we propose an end-to-end pipelined nearly-unsupervised approach to domain-specific sentiment classification of documents for a new domain based on distributed word representations (vectors). As shown in Fig. 1, the proposed approach consists of three main stages (components): (1) domain-specific sentiment word embedding, (2) domain-specific sentiment lexicon induction, and (3) domain-specific sentiment classification of documents. Briefly speaking, given a large unlabeled corpus for a new domain, we would first set up the vector space for that domain via word embedding, then induce a sentiment lexicon in the discovered vector space from a very small set of seed words as well as a general-purpose lexicon, and finally exploit the induced lexicon in a lexicon-based document sentiment classifier to bootstrap a more effective learning-based document sentiment classifier for that domain. The second stage of our approach outperforms the state-of-the-art unsupervised method for sentiment lexicon induction (Hamilton et al., 2016), which is the most closely related work (see Section 2). The key to the superior performance of our method compared with theirs is the insight gained from our first stage that positive and negative sentiment words are largely clustered in the domain-specific vector space but these two clusters have a non-negligible overlap; therefore, semi-supervised/transductive learning algorithms could be easily misled by the examples in the overlap and would actually not work as well as simple supervised classification algorithms. Overall, the document sentiment classifier resulting from our nearly-unsupervised approach does not require any labeled document to be trained, and it can outperform the state-of-the-art unsupervised method for document sentiment classification (Eisenstein, 2017). The source code for our implemented system and the datasets for our experiments are open to the research community.1

The rest of this paper is organized as follows. In Section 2, we review previous studies on this topic. In Sections 3 to 5, we describe the three main stages of our approach respectively. In Section 6, we draw conclusions and discuss future work.

2 Related Work

Most of the early sentiment analysis systems took lexicon-based approaches to document sentiment classification which rely on pre-compiled sentiment lexicons (Owsley et al., 2006). Various methods have been proposed to automatically produce such sentiment lexicons (Hu and Liu, 2004; Ding et al., 2008). Later, the focus of research shifted to learning-based approaches (Pang et al., 2002; Pang and Lee, 2004), as supervised learning algorithms usually deliver a much higher accuracy in sentiment classification than pure lexicon-based methods. However, lexicons have not completely lost their attractiveness: they are usually easier to understand and to maintain by non-experts, and they can also be integrated into learning-based sentiment classifiers (Mudinas et al., 2012; Eisenstein, 2017).

1 https://goo.gl/8K9PbE
Figure 1: Our nearly-unsupervised approach to domain-specific sentiment classification.

The lexicon-based sentiment classifier used in our experiments is a publicly-available system called pSenti2 (Mudinas et al., 2012). In addition to a customizable sentiment lexicon, it also uses shallow NLP techniques like part-of-speech (POS) tagging and the detection of sentiment inverters and other modifiers (intensifying and diminishing adverbs).

The introduction of modern word embedding techniques like word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) has opened the possibility of new sentiment analysis methods. Given a large unlabeled corpus, such techniques can learn from word co-occurrence information and produce a vector space of hundreds of dimensions, with each word being assigned a corresponding vector. The resulting vector space helps in understanding the semantic relationships between words and allows grouping of words based on their linguistic similarities. Recently, Rothe et al. (2016) proposed the DENSIFIER method that can reduce the dimensionality of word embeddings without losing semantic information and explored its application in various domains. For the SemEval-2015 task (Rosenthal et al., 2015), DENSIFIER performed slightly worse compared to word2vec, though its training time was shorter by a factor of 21. In fact, previous studies such as (Rothe et al., 2016; Cliche, 2017) suggest that word2vec usually provides the best word embeddings for sentiment analysis tasks.

In their recent work, Hamilton et al. (2016) demonstrated that by starting from a small set of seed words and conducting label propagation over the lexical graph derived from the pairwise proximities of word embeddings, they could induce a domain-specific sentiment lexicon comparable to a hand-curated one. Intuitively, the success of their method, named SentProp, requires a relatively clear separation between sentiment words of opposite polarity in the vector space which, as we will show later, is not very realistic. Moreover, they have focused on the induction of sentiment lexicons alone, while we are trying to design an end-to-end pipeline that can turn unlabeled documents in a new domain directly into their sentiment classifications, with domain-specific sentiment lexicon induction as a key component.

Recent advances in deep learning (LeCun et al., 2015) have elevated sentiment analysis to new performance levels (Kim, 2014; Dai and Le, 2015; Hong and Fang, 2015). As reported by Dai and Le (2015), the Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) Recurrent Neural Network (RNN) can reach or surpass the performance levels of all previous baselines for sentiment classification of documents. One of the many appeals of LSTM is that it can connect previous information to the current context and allows seamless integration of pre-trained word embeddings as the first (projection) layer of the neural network.

2 https://goo.gl/pj4XAQ
Moreover, Radford et al. (2017) discovered the "sentiment unit", the single unit which can learn the perfect representation of sentiment, in a multiplicative LSTM with 4096 units, despite the fact that the LSTM was only trained for a completely different purpose: to predict the next character in the text of Amazon reviews. Our results are in line with those findings and confirm the superiority of LSTM in building document-level sentiment classifiers.

Zhang et al. (2011) tried to address the low recall problem of lexicon-based methods for Twitter sentiment classification via training a learning-based sentiment classifier using the noisy labels generated by a lexicon-based sentiment classifier (Ding et al., 2008). Although the basic idea of their work is similar to what we do in the third stage of our approach (see Section 5), there exist several notable differences. First, they adopted a single general-purpose sentiment lexicon provided by Ding et al. (2008) and used it for all domains, while we would induce a different lexicon for each different domain. Consequently, their method could have a relatively large variance in the document sentiment classification performance because of the domain mismatch (e.g., F1 = 0.874 for the "Tangled" tweets and F1 = 0.647 for the "Obama" tweets), whereas our approach would perform quite consistently over different domains. Second, they would need to strip out all the previously-known opinion words in their single general-purpose sentiment lexicon from the training documents in order to prevent the training bias and force their document sentiment classifier to exploit domain-specific features, but doing this would obviously lose the very valuable sentiment signals carried by those opinion words. In contrast, we would be able to utilize all terms in the training documents, including those opinion words that appeared in our automatically induced domain-specific lexicons, as features, when building our document sentiment classifiers. Third, they designed their method specifically for Twitter sentiment classification, while our approach would work for not only short texts such as tweets (see Section 5.2) but also long texts such as customer reviews (see Section 5.1). Fourth, they had to use an intermediate step to identify additional opinionated tweets (according to the opinion indicators extracted through the χ2 test on the results of their lexicon-based sentiment classifier) in order to handle the neutral class, but we would not require that time-consuming step as we would use the calibrated probabilistic outputs of our document sentiment classifier to detect the neutral class (see Section 5.3).

3 Domain-Specific Sentiment Word Embedding

Our approach to domain-specific document-level sentiment classification is built on top of word embeddings: distributed word representations (vectors) that could be learned from an unlabeled corpus to encode the semantic similarities between words (Goldberg, 2017).

In this section, we investigate how the embeddings of sentiment words for a particular domain would look in the domain-specific vector space. To ensure a fair comparison with the state-of-the-art sentiment lexicon induction technique SentProp3 (Hamilton et al., 2016) later in Section 4, we adopt the same publicly-available pre-trained word embeddings for the following three domains together with the corresponding sets of sentiment words (i.e., sentiment lexicons).

• Standard-English. We use the Google News word embeddings4 and the 'General Inquirer' lexicon (Stone et al., 1966) with the sentiment polarity scores collected by Warriner et al. (2013).
• Twitter. We use the word embeddings constructed by Rothe et al. (2016) and the sentiment lexicon from the SemEval-2015 Task 10E (Rosenthal et al., 2015).
• Finance. We use the word embeddings learned using an SVD-based method (Manning et al., 2008) from a collection of "8-K" financial reports5 (Lee et al., 2014) and the finance sentiment lexicon hand-crafted by Hamilton et al. (2016).

Note that the above three sentiment lexicons would be used for both the inspection of sentiment word distributions in this section and the evaluation of sentiment lexicon induction later in the next section. Furthermore, to facilitate a fair comparison with the state-of-the-art unsupervised document sentiment classification technique ProbLex-DCM6 (Eisenstein, 2017) later in Section 5, we also adopt the following two document collections which they have used.

3 https://goo.gl/BFkY8N
4 https://goo.gl/5r79l6
5 https://goo.gl/7ntr2V
6 https://goo.gl/Qr993F
• IMDB. We use 50k movie reviews in English from IMDB (Maas et al., 2011) with 25k labeled training documents.
• Amazon. We use about 28k product reviews in English across four product categories from Amazon (Blitzer et al., 2007; McAuley and Leskovec, 2013) with 8k labeled training documents.

The word embeddings for the above two domains were trained by us on the respective corpora using word2vec (Mikolov et al., 2013), which employs a two-layer neural network and is by far the most widely used word embedding technique. Specifically, we ran word2vec with skip-gram and a five-word window to construct word vectors of 500 dimensions, as recommended by previous studies.7 The sentiment lexicon made by Liu (2015) is consistently one of the best for analyzing reviews (Ribeiro et al., 2016), so it is used for both of those domains.

7 https://goo.gl/SyAdej
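As a concrete illustration of this embedding step, the following is a minimal sketch using the gensim library (our choice for illustration; the paper only specifies the word2vec settings). The corpus file name `reviews.txt` and the `min_count` value are assumptions.

```python
# A minimal sketch of domain-specific embedding training, assuming the
# gensim (v4) API and a corpus file with one tokenized review per line.
from gensim.models import Word2Vec

# Load the unlabeled corpus as lists of tokens.
sentences = [line.lower().split()
             for line in open("reviews.txt", encoding="utf-8")]

# Skip-gram (sg=1), five-word window, 500-dimensional vectors,
# matching the settings reported in this section; min_count is assumed.
model = Word2Vec(sentences, vector_size=500, window=5, sg=1,
                 min_count=5, workers=4)

vec = model.wv["good"]                          # embedding of one word
print(model.wv.most_similar("good", topn=5))    # its nearest neighbours
```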
Drawing an analogy to the well-known cluster hypothesis in Information Retrieval (IR) (Manning et al., 2008), here we put forward the cluster hypothesis for sentiment analysis: words in the same cluster behave similarly with respect to sentiment polarity in a specific domain. That is to say, we expect positive and negative sentiment words to form distinct clusters, given that they have been represented in an appropriate vector space. To verify this hypothesis, it would be useful to visualize the high-dimensional sentiment word vectors in a 2D plane. We have tried a number of dimensionality reduction techniques including t-distributed Stochastic Neighbor Embedding (t-SNE) (van der Maaten and Hinton, 2008), but found that simply using the classic Principal Component Analysis (PCA) (Bishop, 2006) works very well for this purpose.

We have found that, in general, the above cluster hypothesis holds for word embeddings within a specific domain. Fig. 2a shows that in the Standard-English domain, the sentiment words with opposite polarities form two distinct clusters. However, it can also be seen that those two clusters overlap with each other. That is because each word carries not only a sentiment value but also its linguistic and semantic information. Zooming into one of the word vector space regions (Fig. 2b) can help us understand why sentiment words with different polarities could be grouped together: 'hail', 'stormy' and 'sunny' are linguistically similar as they all describe weather conditions, yet they convey very different sentiment values. Moreover, as described by Plutchik (1984), sentiment could be grouped into multiple dimensions such as joy–sadness, anger–fear, trust–disgust and anticipation–surprise. Putting that aside, certain sentiment words can be classified sometimes as positive and sometimes as negative, depending on the context. These reasons lead to the phenomenon that many sentiment words are located in the overlapping noisy region between the two clusters in the domain-specific vector space.

Figure 2: Visualisation of the sentiment words in the Standard-English domain. (a) The global vector space showing two clusters. (b) A local region of the vector space zoomed in.

On visual inspection of the Finance (Fig. 3a) sentiment words and IMDB (Fig. 4a) sentiment words in their respective vector spaces, we can see that positive and negative words form distinct clusters which are largely separable. However, if we consider Finance sentiment words in the IMDB vector space (see Fig. 3b), positive and negative words would be mixed together and could not be separated easily.

One may be surprised that positive and negative sentiment words form their respective clusters, because most of the time they could be used in exactly the same context, which might suggest that they would result in similar word embeddings.
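To make the visualization step concrete, the following is a minimal sketch of the PCA projection behind plots like Figs. 2-4, assuming the gensim `model` from the previous sketch and two placeholder word lists `pos_words` and `neg_words` (e.g., from a known lexicon).

```python
# A minimal sketch of the 2-D visualization: project the embeddings of
# known positive/negative words with PCA and plot them as +/- markers.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = [w for w in pos_words + neg_words if w in model.wv]
X = [model.wv[w] for w in words]
X2 = PCA(n_components=2).fit_transform(X)   # 500-D -> 2-D

for (x, y), w in zip(X2, words):
    marker = "+" if w in pos_words else "_"  # plus for positive words
    plt.scatter(x, y, marker=marker, color="k")
plt.title("Sentiment words in the domain-specific vector space")
plt.show()
```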
Figure 3: Sentiment words of Finance in the same/different domain vector space. (a) In the Finance (same domain) vector space. (b) In the IMDB (different domain) vector space.

Figure 4: Sentiment words about movies in the IMDB vector space before/after filtering. (a) Original/Full. (b) Filtered.

For example, we could say "the room is good" and also "the room is bad": both are legitimate sentences. The probable reason for the cluster hypothesis to be true is that in reality people tend to use positive sentiment words together much more often than to mix them with negative sentiment words, and vice versa. For example, it would be much more common for us to see sentences like "the room is clean and tidy" than "the room is clean but messy". It is a long established fact in computational linguistics that words with similar meanings tend to occur nearby each other (Miller and Charles, 1991); sentiment words are no exception (Turney, 2002). Moreover, it has been widely observed that online customer reviews are affected by the so-called love-hate self-selection bias: users tend to rate only products which they either like or hate, leading to a lot more 1-star and 5-star ratings than other (moderate) ratings; if the product is just average or so-so, they probably will not bother to leave reviews. The polarization of online customer reviews would also encourage the clustering of sentiment words into opposite polarities.
4 Domain-Specific Sentiment Lexicon Induction

Given the word embeddings for a specific domain, we can induce a customized sentiment lexicon from a few typical sentiment words ("seeds") frequently used in that particular domain. Such an induced domain-specific sentiment lexicon plays a crucial role in the pipeline towards domain-specific document-level sentiment classification.

Table 1 shows the seed words for five different domains, which are identical to those used by Hamilton et al. (2016) except for the two additional domains IMDB and Amazon. The induction of a sentiment lexicon could then be formulated as a simple word sentiment classification problem with two classes (positive vs. negative). Each word is represented as a vector via domain-specific word embedding; the seed words are labeled with their corresponding classes while all the other words (i.e., "candidates") are unlabeled; the task here is to learn a classifier from the labeled examples first and then apply it to predict the sentiment polarity of each unlabeled candidate word. The probabilistic outputs of such a word sentiment classifier could be regarded as the measure of confidence about the predicted sentiment polarity. In the end, those candidate words with a high probability of being either positive or negative would be added to the sentiment lexicon. The final induced sentiment lexicon would include both the seed words and the selected candidate words.

As pointed out by Mudinas et al. (2012), if we simply consider all words from the given corpus as candidate words, the above described word sentiment classifier tends to assign sentiment values not only to the actual sentiment words but also to their associated product features or, more generally, the aspects of the expressed view. For example, if a lot of customers do not like the weight of a product, the word sentiment classifier may assign strong negative sentiment to "weight", yet this is not stable: the sentiment polarity of "weight" may be different when a new version of the product is released or the customer population has changed, and furthermore it probably does not apply to other products. To avoid this potential issue, it would be necessary to consider only a high-quality list of candidate words which are likely to be genuine sentiment words. Such a list of candidate words could be obtained directly from general-purpose sentiment lexicons. It is also possible to perform NLP on the target domain corpus and extract frequently-occurring adjectives or other typical sentiment indicators like emoticons as candidate words, which is beyond the scope of this paper.

To examine the effectiveness of different machine learning algorithms for building such domain-specific word sentiment classifiers, we attempt to recreate known sentiment lexicons in three domains: Standard-English, Twitter, and Finance (see Section 3), in the same way as Hamilton et al. (2016) did. Put differently, for the purpose of evaluation, we would just use a known sentiment lexicon in the corresponding domain as the list of candidate words and see how different machine learning algorithms would classify those candidate words based on their domain-specific word embeddings. For those lexicons with ternary sentiment classification (positive vs. neutral vs. negative), the class-mass normalization method (Zhu et al., 2003) used by Hamilton et al. (2016) has been applied here to identify the neutral category. The quality of each induced lexicon for a specific domain is evaluated by comparing it with its corresponding known lexicon as the ground-truth, according to the same performance metrics as in (Hamilton et al., 2016): Area Under the Receiver-Operating-Characteristic (ROC) Curve (AUC) for the binary classifications (ignoring the neutral class, as is common in previous work) and Kendall's τ rank correlation coefficient with continuous human-annotated polarity scores. Note that Kendall's τ is not suitable for the Finance domain, as its known sentiment lexicon is only binary. Therefore, our experimental setting and performance measures are all identical to those of Hamilton et al. (2016), which ensures the validity of the empirical comparison between our approach and theirs.
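The following is a minimal sketch of the induction step described above: a linear word sentiment classifier trained on the seed embeddings, with high-confidence candidates admitted to the lexicon. The names `model`, `pos_seeds`, `neg_seeds` and `candidates` are placeholders for the embeddings and word lists of Table 1, and the 0.7/0.3 cut-offs mirror the threshold used later in Section 5.

```python
# A minimal sketch of seed-based sentiment lexicon induction with a
# linear classifier (scikit-learn), assuming word vectors from word2vec.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([model.wv[w] for w in pos_seeds + neg_seeds])
y_train = np.array([1] * len(pos_seeds) + [0] * len(neg_seeds))

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

induced = {}
for w in candidates:
    if w not in model.wv:
        continue
    p_pos = clf.predict_proba(model.wv[w].reshape(1, -1))[0, 1]
    if p_pos >= 0.7:       # confidently positive candidate
        induced[w] = +1
    elif p_pos <= 0.3:     # confidently negative candidate
        induced[w] = -1
```

Raising the admission threshold shrinks the induced lexicon but makes it cleaner, the precision/recall trade-off discussed below.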
Table 1: The "seeds" for domain-specific sentiment lexicon induction.

Corpus | Positive seeds | Negative seeds
Standard-English | good, lovely, excellent, fortunate, pleasant, delightful, perfect, loved, love, happy | bad, horrible, poor, unfortunate, unpleasant, disgusting, evil, hated, hate, unhappy
Twitter | love, loved, loves, awesome, nice, amazing, best, fantastic, correct, happy | hate, hated, hates, terrible, nasty, awful, worst, horrible, wrong, sad
Finance | successful, excellent, profit, beneficial, improving, improved, success, gains, positive | negligent, loss, volatile, wrong, losses, damages, bad, litigation, failure, down, negative
IMDB | good, excellent, perfect, happy, interesting, amazing, unforgettable, genius, gifted, incredible | bad, bland, horrible, disgusting, poor, banal, shallow, disappointed, disappointing, lifeless, simplistic, bore
Amazon | IMDB domain seeds (as above) plus positive, fortunate, correct, nice | IMDB domain seeds (as above) plus negative, unfortunate, wrong, terrible, inferior

In Table 2, we compare a number of typical supervised and semi-supervised/transductive learning algorithms for word sentiment classification in the context of domain-specific sentiment lexicon induction:

• kNN: k Nearest Neighbors (Hastie et al., 2009),
• LR: Logistic Regression (Hastie et al., 2009),
• SVMlin: Support Vector Machine with the linear kernel (Joachims, 1998),
• SVMrbf: Support Vector Machine with the non-linear RBF kernel (Joachims, 1998),
• TSVM: Transductive Support Vector Machine (Joachims, 1999),
• S3VM: Semi-Supervised Support Vector Machine (Gieseke et al., 2012),
• CPLE: Contrastive Pessimistic Likelihood Estimation (Loog, 2016),
• SGT: Spectral Graph Transducer (Joachims, 2003),
• SentProp: a label propagation based classification method proposed for the SocialSent system (Hamilton et al., 2016).

The suitable parameter values of the above learning algorithms (such as the C for SVM) are found via grid search with cross-validation, and the probabilistic outputs are given by Platt scaling (Platt, 2000) if they are not provided by the original learning algorithm.

The experimental results shown in Table 2 demonstrate that in almost every single domain, simple linear model based supervised learning algorithms (LR and SVMlin) can achieve the optimal or near-optimal accuracy for the sentiment lexicon induction task, and they outperform the state-of-the-art sentiment lexicon induction method SentProp (Hamilton et al., 2016) by a large margin. The performance improvements are statistically significant (p-value < 0.05) according to the sign test. There does not seem to be any benefit in utilizing non-linear models (kNN and SVMrbf) or semi-supervised/transductive learning algorithms (TSVM, S3VM, CPLE, SGT, and SentProp). The qualitative analysis of the sentiment lexicons induced by different methods shows that they differ only on those borderline, ambiguous words (such as "soft") residing in the noisy overlapping region between the two clusters in the vector space (see Section 3). In particular, SentProp is based on label propagation over the lexical graph of words, so it could be easily misled by noisy borderline words when sentiment clusters have considerable overlap with each other, a kind of "over-fitting" (Bishop, 2006). Furthermore, according to our experiments on the same machine, those simple linear models are 70+ times faster than SentProp. The speed difference is mainly due to the fact that supervised learning algorithms only need to train on a small number of labeled words ("seeds" in our context), while semi-supervised/transductive learning algorithms need to train on not only a small number of labeled words but also a large number of unlabeled words.

It has also been observed in our experiments that there is a typical precision/recall trade-off (Manning et al., 2008) for the automatic induction of sentiment lexicons. Assuming that the classified candidate words are added to the lexicon in the descending order of their probabilities (of being either positive or negative), the induced lexicon will become noisier and noisier as it grows bigger and bigger.
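To illustrate the model selection and calibration just described, the following is a minimal sketch assuming the `(X_train, y_train)` seed matrix from the earlier induction sketch. Since a linear SVM does not natively output probabilities, Platt scaling is added via scikit-learn's `CalibratedClassifierCV`; the parameter grid is an assumption.

```python
# A minimal sketch of grid search for C plus Platt-scaled probabilities
# for a linear SVM word sentiment classifier.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Grid search with cross-validation for the SVM regularization parameter.
search = GridSearchCV(LinearSVC(),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]}, cv=3)
search.fit(X_train, y_train)

# Platt scaling (Platt, 2000): fit a sigmoid on held-out decision values
# so that the SVM can output calibrated class probabilities.
svm_prob = CalibratedClassifierCV(LinearSVC(C=search.best_params_["C"]),
                                  method="sigmoid", cv=3)
svm_prob.fit(X_train, y_train)
p = svm_prob.predict_proba(X_train[:1])   # probability for one seed word
```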
Table 2: Comparing the induced lexicons with their corresponding known lexicons (ground-truth) according to the ranking of sentiment words, measured by AUC and Kendall's τ. (kNN, LR, SVMlin and SVMrbf are supervised; TSVM, S3VM, CPLE, SGT and SentProp are semi-supervised/transductive.)

Metric | Corpus | kNN | LR | SVMlin | SVMrbf | TSVM | S3VM | CPLE | SGT | SentProp
AUC | Standard-English | 0.892 | 0.931 | 0.939 | 0.941 | 0.901 | 0.540 | 0.680 | 0.852 | 0.906
AUC | Twitter | 0.849 | 0.900 | 0.895 | 0.895 | 0.770 | 0.521 | 0.651 | 0.725 | 0.860
AUC | Finance | 0.711 | 0.944 | 0.942 | 0.932 | 0.665 | 0.561 | 0.836 | 0.725 | 0.916
τ | Standard-English | 0.469 | 0.495 | 0.498 | 0.495 | 0.487 | 0.038 | 0.162 | 0.409 | 0.440
τ | Twitter | 0.490 | 0.569 | 0.548 | 0.547 | 0.522 | 0.001 | 0.211 | 0.437 | 0.500

Fig. 5 shows that imposing a higher cut-off probability threshold (for candidate words to enter the induced lexicon) would decrease the size of the induced lexicon but increase its quality (accuracy). On one hand, the induced lexicon needs to contain a sufficient number of sentiment words, especially when detecting sentiment in short texts, as a lexicon-based method cannot reasonably classify documents with none or too few sentiment words. On the other hand, the noise (misclassified sentiment words) in the induced lexicon would obviously have a detrimental impact on the accuracy of the document sentiment classifier built on top of it. Contrary to most previous work, like that of Qiu et al. (2011), which tries to expand the sentiment lexicon as much as possible and thus maintain a high recall, we put more emphasis on precision and keep a tight control of the lexicon size. For us, having a small sentiment lexicon is affordable, because our proposed approach to document sentiment classification will be able to mitigate the low recall problem of lexicon-based methods by combining them with learning-based methods, which we shall talk about next.

5 Domain-Specific Sentiment Classification of Documents

A domain-specific sentiment lexicon, automatically induced using the above technique, provides a solid basis for building domain-specific document sentiment classifiers. For the experiments here, we use a list of 7866 candidate words constructed by merging two well-known general-purpose sentiment lexicons that are both publicly available: the 'General Inquirer' (Stone et al., 1966) and the sentiment lexicon from Liu (2012). This set of candidate words is itself a combined, general-purpose sentiment lexicon, so we name it the GI+BL lexicon. Moreover, we set the cut-off probability threshold to a generally good value, 0.7, in our sentiment lexicon induction algorithm.

Figure 5: How the accuracy and size of an induced lexicon are influenced by the cut-off probability threshold. (Axes: cut-off probability vs. accuracy and number of words.)

Comparing the IMDB vector space including all the candidate words (Fig. 4a) with that including only the high-probability candidate words (Fig. 4b), it is obvious that the positive and negative sentiment clusters become more clearly separated in the latter.

The induced sentiment lexicon on its own could be applied directly in a lexicon-based method for sentiment classification of documents, and a reasonably good performance could be achieved, as we will show later in Table 4. However, most of the time, lexicon-based sentiment classifiers are not as effective as learning-based sentiment classifiers. One reason is that the former tend to suffer from a poor recall. For example, with a limited-size sentiment lexicon, lexicon-based methods would often fail to detect the sentiment present in short texts, e.g., from Twitter, due to the lexical gap.
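For concreteness, the following is a minimal sketch of a lexicon-based document classifier in the spirit of pSenti: count lexicon hits and flip polarity after a sentiment inverter. The real pSenti also uses POS tagging and intensity modifiers (Section 2), which this sketch omits; the inverter list is illustrative, and `induced` is the {word: +1/-1} lexicon from Section 4.

```python
# A minimal sketch of lexicon-based sentiment scoring with inverter
# handling (a simplification of pSenti, not its actual implementation).
INVERTERS = {"not", "no", "never"}   # illustrative inverter list

def lexicon_score(text, lexicon):
    """Return (score, hits): summed polarity and count of sentiment words."""
    score, hits, invert = 0, 0, False
    for token in text.lower().split():
        if token in INVERTERS:
            invert = True            # flip the polarity of the next hit
            continue
        if token in lexicon:
            score += -lexicon[token] if invert else lexicon[token]
            hits += 1
        invert = False
    return score, hits

s, n = lexicon_score("the room is not clean", induced)
label = "positive" if s > 0 else "negative"
```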
Given the induced sentiment lexicon, we propose to use a lexicon-based sentiment classifier to classify unlabeled documents, and then use those classified documents containing at least three sentiment words as pseudo-labeled documents for the later training of a learning-based sentiment classifier. The condition of "at least three sentiment words" is to ensure that only reliably classified documents would be further utilised as training examples.

5.1 Sentiment Classification of Long Texts

First, we try the induced sentiment lexicons in the lexicon-based sentiment classifier pSenti (Mudinas et al., 2012) to see how good they are. Given a sentiment lexicon, pSenti is able to perform not only binary sentiment classification but also ordinal sentiment classification on a five-point scale. To measure the binary classification performance, we use both micro-averaged F1 (miF1) and macro-averaged F1 (maF1), which are commonly used in text categorization (Yang and Liu, 1999). To measure the five-point-scale classification performance, we use both Cohen's κ coefficient (Manning et al., 2008) and the Root-Mean-Square Error (RMSE) (Bishop, 2006). As the baseline, we use the combined general-purpose sentiment lexicon, GI+BL, mentioned previously in Section 4. As we can see from the results shown in Table 3, using the sentiment lexicon induced for the target domain makes the lexicon-based sentiment classifier pSenti perform better than simply employing an existing general-purpose sentiment lexicon. Moreover, using a sentiment lexicon induced from the same domain leads to a much better performance than using a sentiment lexicon induced from a different domain.

Second, to evaluate the proposed two-phase bootstrapping method, we make empirical comparisons on the IMDB and Amazon datasets using a number of representative methods for document sentiment classification:

• pSenti: a concept-level lexicon-based sentiment classifier (Mudinas et al., 2012),
• ProbLex-DCM: a probabilistic lexicon-based classifier using the Dirichlet Compound Multinomial (DCM) likelihood to reduce effective counts for repeated words (Eisenstein, 2017),
• SVMlin: Support Vector Machine with the linear kernel (Joachims, 1998),
• CNN: Convolutional Neural Network (Kim, 2014),
• LSTM: Long Short-Term Memory, a Recurrent Neural Network (RNN) that can remember values over arbitrary time intervals (Hochreiter and Schmidhuber, 1997; Dai and Le, 2015).

To apply the deep learning algorithms CNN and LSTM that have a word embedding projection layer, we fix the review size to 500 words, truncating reviews longer than that and padding reviews shorter than that with null values. As pointed out by Greff et al. (2017), the hidden layer size is an important hyperparameter of LSTM: usually, the larger the network, the better the performance but the longer the training time. In our experiments, we have used an LSTM network with 400 units on the hidden layer, which is the capacity that a PC with one Nvidia GTX 1080 Ti GPU can afford, and a dropout (Wager et al., 2013) rate of 0.5, which is the most common setting in the research literature (Srivastava et al., 2014; Hong and Fang, 2015; Cliche, 2017).
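The following is a minimal sketch of the LSTM configuration described above, written with the Keras API (an assumption; the paper does not name its framework). Here `emb` is a placeholder (vocab_size x 500) matrix of pre-trained word2vec vectors used as the projection layer, and `sequences`/`y` are the word-id sequences and 0/1 labels of the pseudo-labeled documents.

```python
# A minimal sketch of the 400-unit LSTM with a frozen embedding
# projection layer, 500-word reviews, and dropout of 0.5.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding, LSTM
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 500  # fixed review size: truncate/pad to 500 words

model = Sequential([
    Embedding(input_dim=emb.shape[0], output_dim=emb.shape[1],
              weights=[emb], input_length=MAX_LEN, trainable=False),
    LSTM(400),        # 400 hidden units, as in the paper's experiments
    Dropout(0.5),     # dropout rate of 0.5
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

X = pad_sequences(sequences, maxlen=MAX_LEN,
                  truncating="post", padding="post")
model.fit(X, y, epochs=3, batch_size=64)   # epochs/batch size assumed
```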
As shown in Table 4, the above described two-phase bootstrapping method has been demonstrated to be beneficial: the learning-based sentiment classifiers trained on pseudo-labeled data are superior to lexicon-based sentiment classifiers, including the state-of-the-art unsupervised sentiment classifier ProbLex-DCM (Eisenstein, 2017). Furthermore, the two-phase bootstrapping method is a general framework which can utilize any lexicon-based sentiment classifier to produce pseudo-labeled data. Therefore, the more sophisticated ProbLex-DCM could also be used instead of pSenti in this framework, which is likely to deliver an even higher performance. Among the three learning-based sentiment classifiers, LSTM achieved the best performance on both datasets, which is consistent with the observations in other studies like Dai and Le (2015).

Comparing the LSTM-based sentiment classifiers trained on pseudo-labeled and real labeled data, we can also see that using a large number of pseudo-labeled examples could achieve a similar effect as using 25/4 ≈ 6k and 8/2 = 4k real labeled examples for IMDB and Amazon respectively. This suggests that the unsupervised approach is actually preferable to the supervised approach if there are only a few thousand (or less) labeled examples.
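To make the two-phase bootstrapping concrete, the following is a minimal sketch of the pseudo-labeling phase, reusing the `lexicon_score` sketch from earlier in this section; `unlabeled_docs` and the zero-score exclusion are assumptions beyond the paper's stated "at least three sentiment words" rule.

```python
# A minimal sketch of phase one of the bootstrapping: label unlabeled
# documents with the lexicon-based classifier and keep only those with
# at least three sentiment words as pseudo-labeled training examples.
def pseudo_label(docs, lexicon, min_hits=3):
    """Return (texts, labels) for reliably classified documents."""
    texts, labels = [], []
    for doc in docs:
        score, hits = lexicon_score(doc, lexicon)
        if hits >= min_hits and score != 0:  # clear sentiment signal only
            texts.append(doc)
            labels.append(1 if score > 0 else 0)
    return texts, labels

train_texts, train_labels = pseudo_label(unlabeled_docs, induced)
# Phase two: train_texts/train_labels feed the SVM/CNN/LSTM training above.
```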
Table 3: Lexicon-based sentiment classification of Amazon Kitchen product reviews. (miF1, maF1, F1(pos) and F1(neg) measure binary classification; Cohen's κ and RMSE measure five-point-scale classification.)

Lexicon | miF1 | maF1 | F1(pos) | F1(neg) | Cohen's κ | RMSE
General-purpose: GI+BL | 0.745 | 0.744 | 0.764 | 0.722 | 0.235 | 1.325
Domain-specific: same domain (Kitchen) | 0.761 | 0.761 | 0.772 | 0.750 | 0.236 | 1.310
Domain-specific: different domain (Electronics) | 0.749 | 0.749 | 0.750 | 0.749 | 0.215 | 1.373
Domain-specific: different domain (Video) | 0.736 | 0.735 | 0.752 | 0.717 | 0.206 | 1.372

Table 4: Sentiment classification of long texts.

Method | IMDB AUC | IMDB F1 | Amazon AUC | Amazon F1
Unsupervised, lexicon-based:
pSenti with existing general-purpose lexicon | 0.808 | 0.705 | 0.818 | 0.747
pSenti with induced domain-specific lexicon | 0.841 | 0.768 | 0.839 | 0.771
ProbLex-DCM (Eisenstein, 2017) | 0.884 | 0.806 | 0.836 | 0.756
Unsupervised, learning-based:
SVMlin trained on pseudo-labeled data | 0.863 | 0.771 | 0.845 | 0.763
CNN trained on pseudo-labeled data | 0.879 | 0.781 | 0.849 | 0.773
LSTM trained on pseudo-labeled data | 0.890 | 0.810 | 0.850 | 0.776
Supervised, learning-based:
LSTM trained on real labeled data (full size) | 0.971 | 0.912 | 0.878 | 0.802
LSTM trained on real labeled data (1/2 size) | 0.934 | 0.862 | 0.852 | 0.752
LSTM trained on real labeled data (1/4 size) | 0.892 | 0.821 | 0.841 | 0.744
LSTM trained on real labeled data (1/8 size) | 0.850 | 0.746 | 0.831 | 0.735

5.2 Sentiment Classification of Short Texts

To evaluate our proposed approach to sentiment classification of short texts, we have carried out experiments on the Twitter sentiment classification benchmark dataset from SemEval-2017 Task 4B (Rosenthal et al., 2017), which is to classify 6185 tweets as either positive or negative. Other than the training set of 20,508 tweets, we also collected unlabeled tweets using the Twitter API. All the tweets were pre-processed by replacing emoticons with their corresponding text representations and encoding URLs by tokens. In addition to the Twitter-domain seed words listed in Table 1, we have also made use of common positive/negative emoticons, which are ubiquitous on Twitter, as additional seeds for the task of sentiment lexicon induction. Note that in all our experiments, we do not use the sentiment labels and the topic information provided in the training data.

Making use of the provided training data and our own unlabeled data collected from Twitter, we constructed the domain-specific word embeddings, induced the sentiment lexicon, and bootstrapped the pseudo-labeled tweet data to train the binary tweet sentiment classifier. As the learning algorithm, we chose LSTM with a hidden layer of 150 units, which is enough for tweets as they are quite short (with an average length of only 20 words). The official performance measures for this short-text sentiment classification task (Rosenthal et al., 2017) include Accuracy (Acc) and F1. Although our approach is nearly-unsupervised (without any reliance on labeled documents), its performance on this benchmark dataset is comparable to that of supervised methods: it would be placed roughly in the middle of all the participating systems in this competition (see Table 5).
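The following is a minimal sketch of the tweet preprocessing described above: mapping emoticons to text tokens and encoding URLs by a placeholder token. The emoticon map and token names are illustrative assumptions, not the paper's actual mapping.

```python
# A minimal sketch of tweet preprocessing: emoticons -> text tokens,
# URLs -> a single URL token.
import re

EMOTICONS = {":)": " smile_positive ", ":(": " frown_negative ",
             ":D": " smile_positive "}   # illustrative, not exhaustive

def preprocess_tweet(text):
    for emo, token in EMOTICONS.items():
        text = text.replace(emo, token)
    text = re.sub(r"https?://\S+", " URL ", text)  # encode URLs by a token
    return text.lower()

print(preprocess_tweet("Loving this! :) https://t.co/abc"))
# e.g. -> "loving this!  smile_positive   url "
```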
Table 5: Sentiment classification of short texts into two categories (SemEval-2017 Task 4B).

System | Acc | F1
Unsupervised: baseline (all positive) | 0.398 | 0.285
Unsupervised: baseline (all negative) | 0.602 | 0.376
Unsupervised: ours (LSTM) | 0.804 | 0.795
Supervised: worst system | 0.412 | 0.372
Supervised: median system | 0.802 | 0.801
Supervised: best system | 0.897 | 0.890

5.3 Detecting Neutral Sentiment

Many real-world applications of sentiment classification (e.g., on social media) are not simply a binary classification task, but involve a neutral category as well. Although many lexicon-based sentiment classifiers, including pSenti, can detect neutral sentiment, extending the above learning-based sentiment classifier (trained on pseudo-labeled data) to recognize neutral sentiment is challenging. To investigate this issue, we have done experiments on the Twitter sentiment classification benchmark dataset from SemEval-2017 Task 4C (Rosenthal et al., 2017), which is to classify 12379 tweets into an ordinal five-point scale (−2, −1, 0, +1, +2) where 0 represents the neutral class.

One common way to handle neutral sentiment is to treat the set of neutral documents as a separate class for the classification algorithm, which is the method advocated by Koppel and Schler (2006). With the pseudo-labeled training examples of three classes (−1: negative, 0: neutral, and +1: positive), we tried both standard multi-class classification (Hsu and Lin, 2002) and ordinal classification (Frank and Hall, 2001). However, neither of them could deliver a reasonable performance. After carefully inspecting the classification results, we realised that it is very difficult to have a set of representative training examples with good coverage for the neutral class. This is because the neutral class is not homogeneous: a document could be neutral because it is equally positive and negative, or because it does not contain any sentiment. In practice, the latter case is seen more often than the former, and it implies that the neutral class is more often defined by the absence of sentiment word features rather than their presence, which would be problematic for most supervised learning algorithms.

What we discovered is that the simple method of identifying neutral documents from the binary sentiment classifier's decision boundary works surprisingly well, as long as the right thresholds are found. Specifically, we take the probabilistic outputs of a binary sentiment classifier trained as before, and then put all the documents whose probability of being positive lies not close to 0, not close to 1, but in the middle range into the neutral class. It turns out that probability calibration (Niculescu-Mizil and Caruana, 2005) is crucially important for this simple method to work. Some supervised learning algorithms for classification can give poor estimates of the class probabilities, and some do not even support probability prediction. For instance, maximum-margin learning algorithms such as SVM focus on hard samples that are close to the decision boundary (the support vectors), which makes their probability prediction biased. The technique of probability calibration allows us to better calibrate the probabilities of a given classifier, or to add support for probability prediction. If a classifier is well calibrated, its probabilistic output can be directly interpreted as a confidence level on the prediction. For example, among the documents to which such a calibrated binary classifier gives a probabilistic output close to 0.8, approximately 80% would actually belong to the positive class. Using the sigmoid model of Platt (2000) with cross-validation on the pseudo-labeled training data, we carry out probability calibration for our LSTM based binary sentiment classifier. Fig. 6 shows that the calibrated probability prediction aligns with the true confidence of prediction much better than the raw probability prediction. In this case, the Brier loss (Brier, 1950), which measures the mean squared difference between the predicted probability and the actual outcome, could be reduced from 0.182 to 0.153 by probability calibration.

If we rank the estimated probabilities of being positive from low to high, the curve of probabilities would be in an "S"-shape with a distinct middle range where the slope is steeper than at the two ends, as shown in Fig. 7. The documents with their probabilities of being positive in such a middle range should be neutral. Therefore, the two elbow points in the probability curve would make appropriate thresholds for the identification of neutral sentiment, and they could be found automatically by a simple algorithm using the central difference to approximate the second derivative. Let pL and pU denote the identified thresholds (pL < pU).
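The following is a minimal sketch of the elbow-point detection just described: sort the predicted probabilities of being positive, approximate the second derivative with central differences, and take the curvature extrema on each side of the "S"-curve as the thresholds. This is one illustrative reading of the paper's description, not its exact algorithm; `probabilities` stands for the calibrated outputs of the binary classifier.

```python
# A minimal sketch of finding the neutral-range thresholds (pL, pU) via
# central second differences on the sorted probability curve.
import numpy as np

def neutral_thresholds(probs):
    p = np.sort(np.asarray(probs))            # the "S"-shaped curve
    d2 = p[2:] - 2 * p[1:-1] + p[:-2]         # central second difference
    mid = len(d2) // 2
    lower = np.argmax(d2[:mid])               # strongest upward bend
    upper = mid + np.argmin(d2[mid:])         # strongest downward bend
    return p[lower + 1], p[upper + 1]         # (pL, pU)

p_L, p_U = neutral_thresholds(probabilities)
# Documents with p_L < P(positive) < p_U are assigned the neutral class.
```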
nt,andtheycouldbefoundautomaticallybyasimplealgo-rithmusingthecentraldifferencetoapproximatethesecondderivative.LetpLandpUdenotetheidenti-fiedthresholds(pL