Transactions of the Association for Computational Linguistics, 1 (2013) 279–290. Action Editor: Lillian Lee.

Submitted 11/2012; Revised 1/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

Good, Great, Excellent: Global Inference of Semantic Intensities

Gerard de Melo
ICSI, Berkeley
demelo@icsi.berkeley.edu

Mohit Bansal
CS Division, UC Berkeley
mbansal@cs.berkeley.edu

Abstract

Adjectives like good, great, and excellent are similar in meaning, but differ in intensity. Intensity order information is very useful for language learners as well as in several NLP tasks, but is missing in most lexical resources (dictionaries, WordNet, and thesauri). In this paper, we present a primarily unsupervised approach that uses semantics from Web-scale data (e.g., phrases like good but not excellent) to rank words by assigning them positions on a continuous scale. We rely on Mixed Integer Linear Programming to jointly determine the ranks, such that individual decisions benefit from global information. When ranking English adjectives, our global algorithm achieves substantial improvements over previous work on both pairwise and rank correlation metrics (specifically, 70% pairwise accuracy as compared to only 56% by previous work). Moreover, our approach can incorporate external synonymy information (increasing its pairwise accuracy to 78%) and extends easily to new languages. We also make our code and data freely available.¹

1 Introduction

Current lexical resources such as dictionaries and thesauri do not provide information about the intensity order of words. For example, both WordNet (Miller, 1995) and Roget's 21st Century Thesaurus (thesaurus.com) present acceptable, great, and superb as synonyms of the adjective good. However, a native speaker knows that these words represent varying intensity and can in fact generally be ranked by intensity as acceptable […]

[…] $x_j$. Therefore, intuitively, our goal corresponds to maximizing the objective

\[
\sum_{i,j} \operatorname{sgn}(x_j - x_i) \cdot \operatorname{score}(a_i, a_j) \tag{4}
\]

Note that it is important to use the signum function sgn() here, because we only care about the relative order of $x_i$ and $x_j$. Maximizing $\sum_{i,j} (x_j - x_i) \cdot \operatorname{score}(a_i, a_j)$ would lead to all words being placed at the edges of the scale, because the highest scores would dominate over all other ones. We do include the score magnitudes in the objective, because they help resolve contradictions in the pairwise scores (e.g., see Figure 1). This is discussed in more detail in Section 2.2.2.

In order to maximize this non-differentiable objective, we use Mixed Integer Linear Programming (MILP), a variant of linear programming in which some but not all of the variables are constrained to be integers. Using an MILP formalization, we can find a globally optimal solution in the joint decision space, and unlike previous work, we jointly exploit global information rather than just individual local (pairwise) scores. To encode the objective in an MILP, we need to introduce additional variables $d_{ij}$, $w_{ij}$, $s_{ij}$ to capture the effect of the signum function, as explained below.

We additionally enable our MILP to make use of any external equivalence (synonymy) information $E \subseteq \{1,\dots,N\} \times \{1,\dots,N\}$ that may be available. In this context, two words are considered synonymous if they are close enough in meaning to be placed on (almost) the same position in the intensity scale. If $(i,j) \in E$, we can safely assume that $a_i$, $a_j$ have near-equivalent intensity, so we should encourage $x_i$, $x_j$ to remain close to each other. The MILP is defined as follows:

\[
\begin{aligned}
\text{maximize} \quad & \sum_{(i,j) \notin E} (w_{ij} - s_{ij}) \cdot \operatorname{score}(a_i, a_j) \;-\; \sum_{(i,j) \in E} (w_{ij} + s_{ij})\, C \\
\text{subject to} \quad
& d_{ij} = x_j - x_i && \forall i,j \in \{1,\dots,N\} \\
& d_{ij} - w_{ij} C \le 0 && \forall i,j \in \{1,\dots,N\} \\
& d_{ij} + (1 - w_{ij})\, C > 0 && \forall i,j \in \{1,\dots,N\} \\
& d_{ij} + s_{ij} C \ge 0 && \forall i,j \in \{1,\dots,N\} \\
& d_{ij} - (1 - s_{ij})\, C < 0 && \forall i,j \in \{1,\dots,N\} \\
& x_i \in [0,1] && \forall i \in \{1,\dots,N\} \\
& w_{ij} \in \{0,1\} && \forall i,j \in \{1,\dots,N\} \\
& s_{ij} \in \{0,1\} && \forall i,j \in \{1,\dots,N\}
\end{aligned}
\]

The difference variables $d_{ij}$ simply capture differences between $x_i$, $x_j$. $C$ is any very large constant greater than $\sum_{i,j} |\operatorname{score}(a_i, a_j)|$; the exact value is irrelevant. The indicator variables $w_{ij}$ and $s_{ij}$ are jointly used to determine the value of the signum function $\operatorname{sgn}(d_{ij}) = \operatorname{sgn}(x_j - x_i)$. Variables $w_{ij}$ become 1 if and only if $d_{ij} > 0$ and hence serve as indicator variables for weak-strong relationships in the output. Variables $s_{ij}$ become 1 if and only if $d_{ij} < 0$ and hence serve as indicator variables for a strong-weak relationship in the output. The objective encourages $w_{ij} = 1$ for $\operatorname{score}(a_i, a_j) > 0$ and $s_{ij} = 1$ for $\operatorname{score}(a_i, a_j) < 0$.³ When equivalence (synonymy) information is available, then for $(i,j) \in E$ both $s_{ij} = 0$ and $w_{ij} = 0$ are encouraged.

³ In order to avoid numeric instability issues due to very small $\operatorname{score}(a_i, a_j)$ values after frequency normalization, in practice we have found it necessary to rescale them by a factor of 1 over the smallest $|\operatorname{score}(a_i, a_j)| > 0$.

2.2.2 Discussion

Our MILP uses intensity evidence of all input pairs together and assimilates all the scores via global transitivity constraints to determine the positions of the input words on a continuous real-valued scale. Hence, our approach addresses drawbacks


of local or divide-and-conquer approaches, where adjectives are scored with respect to selected pivot words, and hence many adjectives that lack pairwise evidence with the pivots are not properly classified, although they may have order evidence with some third adjective that could help establish the ranking. Optional synonymy information can further help, as shown in Figure 2.

Figure 2: Equivalence Information: Knowing that $a_m$, $a_2$ are synonyms gives the MILP an indication of where to place $a_n$ on the scale with respect to $a_1$, $a_2$, $a_3$.

Moreover, our MILP also gives higher weight to pairs with higher scores, which is useful when breaking global constraint cycles as in the simple example in Figure 1. If we need to break a constraint-violating triangle or cycle, we would have to make arbitrary choices if we were ranking based on sgn(score(a, b)) alone. Instead, we can choose a better ranking based on the magnitude of the pairwise scores. A stronger score between an adjective pair doesn't necessarily mean that they should be further apart in the ranking. It means that these two words are attested together on the Web with respect to the intensity patterns more than with other candidate words. Therefore, we try to respect the order of such word pairs more in the final ranking when we are breaking constraint-violating cycles.

3 Related Work

Hatzivassiloglou and McKeown (1993) presented the first step towards automatic identification of adjective scales, thoroughly discussing the background of adjective semantics and a means of discovering clusters of adjectives that belong on the same scale, thus providing one way of creating the input for our ranking algorithm.

Inkpen and Hirst (2006) study near-synonyms and nuances of meaning differentiation (such as stylistic, attitudinal, etc.). They attempt to automatically acquire a knowledge base of near-synonym differences via an unsupervised decision-list algorithm. However, their method depends on a special dictionary of synonym differences to learn the extraction patterns, while we use only a raw Web-scale corpus.

Mohammad et al. (2013) proposed a method of identifying whether two adjectives are antonymous. This problem is related but distinct, because the degree of antonymy does not necessarily determine their position on an intensity scale. Antonyms (e.g., little, big) are not necessarily on the extreme ends of scales.

Sheinman and Tokunaga (2009) and Sheinman et al. (2012) present the most closely related previous work on adjective intensities. They collect lexico-semantic patterns via bootstrapping from seed adjective pairs to obtain pairwise intensities, albeit using search engine 'hits', which are unstable and problematic (Kilgarriff, 2007). While their approach is primarily evaluated in terms of a local pairwise classification task, they also suggest the possibility of ordering adjectives on a scale using a pivot-based partitioning approach. Although intuitive in theory, the extracted pairwise scores are frequently too sparse for this to work. Thus, many adjectives have no score with a particular headword. In our experiments, we reimplemented this approach and show that our MILP method improves over it by allowing individual pairwise decisions to benefit more from global information. Schulam and Fellbaum (2010) apply the approach of Sheinman and Tokunaga (2009) to German adjectives. Our method extends easily to various foreign languages as described in Section 5.

Another related task is the extraction of lexico-syntactic and lexico-semantic intensity-order patterns from large text corpora (Hearst, 1992; Chklovski and Pantel, 2004; Tandon and de Melo, 2010). Sheinman and Tokunaga (2009) follow Davidov and Rappoport (2008) to automatically bootstrap adjective scaling patterns using seed adjectives and Web hits. These methods thus can be used to provide the input patterns for our algorithm.

VerbOcean by Chklovski and Pantel (2004) extracts various fine-grained semantic relations (including the stronger-than relation) between pairs of verbs, using lexico-syntactic patterns over the Web.


Our approach of jointly ranking a set of words using pairwise evidence is also applicable to the VerbOcean pairs, and should help address similar sparsity issues of local pairwise decisions. Such scales will again be quite useful for language learners and language understanding tools.

de Marneffe et al. (2010) infer yes-or-no answers to questions with responses involving scalar adjectives in a dialogue corpus. They correlate adjectives with ratings in a movie review corpus to find that good appears in lower-rated reviews than excellent.

Finally, there has been a lot of work on measuring the general sentiment polarity of words (Hatzivassiloglou and McKeown, 1997; Hatzivassiloglou and Wiebe, 2000; Turney and Littman, 2003; Liu and Seneff, 2009; Taboada et al., 2011; Yessenalina and Cardie, 2011; Pang and Lee, 2008). Our work instead aims at producing a large, unrestricted number of individual intensity scales for different qualities and hence can help in fine-grained sentiment analysis with respect to very particular content aspects.

4 Experiments

4.1 Data

Input Clusters
In order to obtain input clusters for evaluation, we started out with the satellite cluster or 'dumbbell' structure of adjectives in WordNet 3.0, which consists of two direct antonyms as the poles and a number of other satellite adjectives that are semantically similar to each of the poles (Gross and Miller, 1990). For each antonymy pair, we determined an extended dumbbell set by looking up synonyms and words in related (satellite adjective and 'see-also') synonym sets. We cut such an extended dumbbell into two antonymous halves and treated each of these halves as a potential input adjective cluster.

Most of these WordNet clusters are noisy for the purpose of our task, i.e. they contain adjectives that appear unrelatable on a single scale due to polysemy and semantic drift, e.g. violent with respect to supernatural and affected. Motivated by Sheinman and Tokunaga (2009), we split such hard-to-relate adjectives into smaller scale-specific subgroups using the corpus evidence.⁴ For this, we consider an undirected edge between each pair of adjectives that has a non-zero intensity score (based on the Web-scale scoring procedure described in Section 2.1.3). The resulting graph is then partitioned into connected components such that any adjectives in a subgraph are at least indirectly connected via some path and thus much more likely to belong to the same intensity scale. While this does break up partitions whenever there is no corpus evidence connecting them, ordering the adjectives within each such partition remains a challenging task. This is because the Web evidence will still not necessarily directly relate all adjectives (in a partition) to each other. Additionally, the Web evidence may still indicate the wrong direction. Figure 3 shows the size distribution of the resulting partitions.

⁴ Note that we do not use the WordNet dataset of Sheinman and Tokunaga (2009) for evaluation, as it does not provide full scales. Instead, their annotators only made pairwise comparisons with select words, using a 5-way classification scheme (neutral, mild, very mild, intense, very intense).

Figure 3: The histogram of cluster sizes after partitioning.

Length of chain     2     3     4     5     6     7     8     9   10-14   15-17
# of chains       438   115    60    35    19    12    14     5      4       3

Figure 4: The histogram of cluster sizes in the test set.

Length of chain     3     4     5     6     7     8
# of chains        41    27    12     3     3     2

Patterns
To construct our intensity pattern set, we started with a couple of common rankable adjective seed pairs such as (good, great) and (hot, boiling) and used the Web-scale n-grams corpus (Brants and Franz, 2006) to collect the few most frequent patterns between and around these seed-pairs (in both directions). Among these, we manually chose a


small set of intuitive patterns that are linguistically useful for ordering adjectives, several of which had not been discovered in previous work. These are shown in Table 1. Note that we only collected patterns that were not ambiguous in the two orders, for example the pattern '?, not ?' is ambiguous because it can be used as both 'good, not great' and 'great, not good'. Alternatively, one can easily also use fully-automatic bootstrapping techniques based on seed word pairs (Hearst, 1992; Chklovski and Pantel, 2004; Yang and Su, 2007; Turney, 2008; Davidov and Rappoport, 2008). However, our semi-automatic approach is a simple and fast process that extracts a small set of high-quality and very general adjective-scaling patterns. This process can quickly be repeated from scratch in any other language. Moreover, as described in Section 5.1, the English patterns can also be projected automatically to patterns in other languages.

Development and Test Sets
Section 2.1 describes the method for collecting the intensity scores for adjective pairs, using Web-scale n-grams (Brants and Franz, 2006). We relied on a small development set to test the MILP structure and the pairwise score setup. For this, we manually chose 5 representative adjective clusters from the full set of clusters.

The final test set, distinct from this development set, consists of 569 word pairs in 88 clusters, each annotated by two native speakers of English. Both the gold test data and our code are freely available.⁵ To arrive at this data, we randomly drew 30 clusters each for cluster sizes 3, 4, and 5+ from the histogram of partitioned adjective clusters in Figure 3. While labeling a cluster, annotators could exclude words that they deemed unsuitable to fit on a single shared intensity scale with the rest of the cluster. Fortunately, the partitioning described earlier had already separated most such cases into distinct clusters. The annotators ordered the remaining words on a scale. Words that seemed indistinguishable in strength could share positions in their annotation.

As our goal is to compare scale formation algorithms, we did not include trivial clusters of size 2. On such trivial clusters, the Web evidence alone determines the output and hence all algorithms, including the baseline, obtain the same pairwise accuracy (defined below) of 93.3% on a separate set of 30 random clusters of size 2.

⁵ http://demelo.org/gdm/intensity/

Figure 4 shows the distribution of cluster sizes in our main gold set. The inter-annotator agreement in terms of Cohen's κ (Cohen, 1960) on the pairwise classification task with 3 labels (weaker, stronger, or equal/unknown) was 0.64. In terms of pairwise accuracy, the agreement was 78.0%.

4.2 Metrics

In order to thoroughly evaluate the performance of our adjective ordering procedure, we rely on both pairwise and ranking-correlation evaluation metrics. Consider a set of input words $A = \{a_1, a_2, \dots, a_n\}$ and two rankings for this set: a gold-standard ranking $r_G(A)$ and a predicted ranking $r_P(A)$.

4.2.1 Pairwise Accuracy

For a pair of words $a_i$, $a_j$, we may consider the classification task of choosing one of three labels (<, >, =?) for the case of $a_i$ being weaker, stronger, and equal (or unknown) in intensity, respectively, compared to $a_j$:

\[
L(a_i, a_j) =
\begin{cases}
<  & \text{if } r(a_i) < r(a_j) \\
>  & \text{if } r(a_i) > r(a_j) \\
=? & \text{if } r(a_i) = r(a_j)
\end{cases}
\]

For each pair $(a_i, a_j)$, we compute gold-standard labels $L_G(a_i, a_j)$ and predicted labels $L_P(a_i, a_j)$ as above, and then the pairwise accuracy $PW(A)$ for a particular ordering on $A$ is simply the fraction of pairs that are correctly classified, i.e. for which the predicted label is the same as the gold-standard label:

$PW(A) = \sum_{i}$ […]

[…] $r_G(a_j)$ and $r_P(a_i) > r_P(a_j)$, or $r_G(a_i)$ […] $r_G(a_j)$ and $r_P(a_i)$ […] $r_P(a_j)$.
• tied iff $r_G(a_i) = r_G(a_j)$ or $r_P(a_i) = r_P(a_j)$.

Spearman's rho correlation coefficient
For two n-sized ranked lists $\{x_i\}$ and $\{y_i\}$, the Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranks of variables:

\[
\rho = \frac{\sum_i (x_i - \bar{x}) \cdot (y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \cdot \sum_i (y_i - \bar{y})^2}}
\]

Here, $\bar{x}$ and $\bar{y}$ denote the means of the values in the respective lists. We use the standard procedure for handling ties correctly. Tied values are assigned the average of all ranks of items sharing the same value in the ranked list sorted in ascending order of the values.

Handling Inversions
While annotating, we sometimes observed that the ordering itself was very clear but the annotators disagreed about which end of a particular scale was to count as the strong one, e.g. when transitioning from soft to hard or from alpha to beta. We thus also report average absolute values of both correlation coefficients, as these properly account for anticorrelations. Our test set only contains clusters of size 3 or larger, so there is no need to account for inversions in clusters of size 2.

4.3 Results

In Table 3, we use the evaluation metrics mentioned above to compare several different approaches.

Web Baseline
The first baseline simply reflects the original pairwise Web-based intensity scores. We classify (with one of 3 labels) a given pair of adjectives using the Web-based intensity scores (as described in Section 2.1.3) as follows:

\[
L_{\text{baseline}}(a_i, a_j) =
\begin{cases}
<  & \text{if } \operatorname{score}(a_i, a_j) > 0 \\
>  & \text{if } \operatorname{score}(a_i, a_j) < 0 \\
=? & \text{if } \operatorname{score}(a_i, a_j) = 0
\end{cases}
\]

Since $\operatorname{score}(a_i, a_j)$ represents the weak-strong score of the two adjectives, a more positive value means a higher likelihood of $a_i$ being weaker (<, on the left) in intensity than $a_j$.

In Table 3, we observe that the (micro-averaged) pairwise accuracy, as defined earlier, for the original Web baseline is 48.2%, while the ranking measures are undefined because the individual pairs do not lead to a coherent scale.

Divide-and-Conquer
The divide-and-conquer baseline recursively splits a set of words into three subgroups, placed to the left (weaker), on the same position (no evidence), or to the right (stronger) of a given randomly chosen pivot word. While this approach shows only a minor improvement in terms of the pairwise accuracy (50.6%), its main benefit is that one obtains well-defined intensity scales rather than just a collection of pairwise scores.

Sheinman and Tokunaga
The approach by Sheinman and Tokunaga (2009) involves a similar divide-and-conquer based partitioning in the first phase, except that their method makes use of synonymy information from WordNet and uses all synonyms in WordNet's synset for the headword as neutral pivot elements (if the headword is not in WordNet, then the word with the maximal unigram frequency is chosen). In the second phase, their method performs pairwise comparisons within the more intense and less intense subgroups. We reimplement their approach here, using the Google N-Grams dataset instead of online Web search engine hits. We observe a small improvement over the Web baseline in terms of pairwise accuracy. Note that the
rank correlation measure scores are undefined for their approach. This is because in some cases their method placed all words on the same position in the scale, which these measures cannot handle even in their tie-corrected versions. Overall, the Sheinman and Tokunaga approach does not aggregate information sufficiently well at the global level and often fails to make use of transitive inference.

Table 3: Main test results

Method                          Pairwise Accuracy   Avg. τ   Avg. |τ|   Avg. ρ   Avg. |ρ|
Web Baseline                    48.2%               N/A      N/A        N/A      N/A
Divide-and-Conquer              50.6%               0.45     0.53       0.52     0.62
Sheinman and Tokunaga (2009)    55.5%               N/A      N/A        N/A      N/A
MILP                            69.6%               0.57     0.65       0.64     0.73
MILP with synonymy              78.2%               0.57     0.66       0.67     0.80
Inter-Annotator Agreement       78.0%               0.67     0.76       0.75     0.86

MILP
Our MILP exploits the same pairwise scores to induce significantly more accurate pairwise labels with 69.6% accuracy, a 41% relative error reduction over the Web baseline, 38% over Divide-and-Conquer, and 32% over Sheinman and Tokunaga (2009). We further see that our MILP method is able to exploit external synonymy (equivalence) information (using synonyms marked by the annotators). The accuracy of the pairwise scores as well as the quality of the overall ranking increase even further to 78.2%, approaching the human inter-annotator agreement. In terms of average correlation coefficients, we observe similar improvement trends from the MILP, but of different magnitudes, because these averages give small clusters the same weight as larger ones.

4.4 Analysis

Confusion Matrices
For a given approach, we can study the confusion matrix obtained by cross-tabulating the gold classification with the predicted classification of every unique pair of adjectives in the ground truth data. Table 4 shows the confusion matrix for the Web baseline. We observe that due to the sparsity of pairwise intensity order evidence, the baseline method predicts too many ties.

Table 4: Confusion matrix (Web baseline)

                   Predicted Class
True Class      Weaker    Tie    Stronger
Weaker             117    127          15
Tie                  5     42          15
Stronger            11    122         115

Table 5 provides the confusion matrix for the MILP (without external equivalence information) for comparison. Although the middle column still shows that the MILP predicts more ties than human annotators, we find that a clear majority of all unique pairs are now correctly placed along the diagonal. This confirms that our MILP successfully infers new ordering decisions, although it uses the same input (corpus evidence) as the baseline. The remaining ties are mostly just the result of pairs for which there simply is no evidence at all in the input Web counts. Note that this problem could for instance be circumvented by relying on a crowdsourcing approach: a few dispersed tie-breakers are enough to allow our MILP to correct many other predictions.

Table 5: Confusion matrix (MILP)

                   Predicted Class
True Class      Weaker    Tie    Stronger
Weaker             177     29          53
Tie                  9     24          29
Stronger            15     38         195

Predicted Examples
Finally, in Table 6, we provide a selection of real results obtained by our algorithm. For instance, it correctly inferred that terrifying is more intense than creepy or scary, although the Web pattern counts did not provide any explicit information about these word pairs. In some cases, however, the Web evidence did not suffice to draw the right conclusions, or it was misleading due to issues like polysemy (as for the word funny).
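The headline numbers in Section 4.3 can be re-derived from the confusion matrices in Tables 4 and 5. The following is a minimal sketch, not part of the released code: the function names are hypothetical, and the matrices are simply the cell counts as read from the two tables (rows are the true class, columns the predicted class, in the order weaker, tie, stronger).

```python
# Cell counts copied from Table 4 (Web baseline) and Table 5 (MILP).
# Rows: true class; columns: predicted class; order: weaker, tie, stronger.
WEB_BASELINE = [[117, 127, 15], [5, 42, 15], [11, 122, 115]]
MILP = [[177, 29, 53], [9, 24, 29], [15, 38, 195]]


def pairwise_accuracy(matrix):
    """Fraction of pairs on the diagonal, i.e. predicted label equals gold label."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


def predicted_ties(matrix):
    """Number of pairs placed in the middle (tie) column."""
    return sum(row[1] for row in matrix)


def relative_error_reduction(acc_old, acc_new):
    """Share of the old error rate that the new method removes."""
    return ((1 - acc_old) - (1 - acc_new)) / (1 - acc_old)


base = pairwise_accuracy(WEB_BASELINE)
milp = pairwise_accuracy(MILP)
print(f"{base:.1%} {milp:.1%}")                          # 48.2% 69.6%
print(predicted_ties(WEB_BASELINE), predicted_ties(MILP))  # 291 91
print(f"{relative_error_reduction(base, milp):.0%}")     # 41%
```

Both matrices sum to 569 pairs, matching the size of the test set, and the gold tie row sums to 62, so the 91 ties predicted by the MILP are still more than the annotators assigned.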
Accuracy    Prediction    Gold Standard
Good        hard […]

[…] 0 that determines how much weight we assign to translations as opposed to the corpus count scores. The MILP is extended by adding the following extra constraints.

\[
\begin{aligned}
& d_{ij} - w'_{ij} C_T < -d_{\max} && \forall i,j \in \{1,\dots,N\} \\
& d_{ij} + (1 - w'_{ij})\, C_T \ge -d_{\max} && \forall i,j \in \{1,\dots,N\} \\
& d_{ij} + s'_{ij} C_T > d_{\max} && \forall i,j \in \{1,\dots,N\} \\
& d_{ij} - (1 - s'_{ij})\, C_T \le d_{\max} && \forall i,j \in \{1,\dots,N\} \\
& w'_{ij} \in \{0,1\} && \forall i,j \in T \\
& s'_{ij} \in \{0,1\} && \forall i,j \in T
\end{aligned}
\]

The variables $d_{ij}$, as before, encode distances between positions of words on the scale, but now also include cross-lingual pairs of words in different languages. The new constraints encourage translational equivalents to remain close to each other, preferably within a desired (but not strictly enforced) maximum distance $d_{\max}$. The new variables $w'_{ij}$, $s'_{ij}$ are similar to $w_{ij}$, $s_{ij}$ in the standard MILP. However, the $w'_{ij}$ become 1 if and only if $d_{ij} \ge -d_{\max}$ and the $s'_{ij}$ become 1 if and only if $d_{ij} \le d_{\max}$. If both $w'_{ij}$ and $s'_{ij}$ are 1, then the two words have a small distance $-d_{\max} \le d_{ij} \le d_{\max}$. The augmented objective function explicitly encourages this for translational equivalents. Overall, this approach thus allows evidence from a language with more Web evidence to improve the process of adjective ordering in lesser-resourced languages.

6 Conclusion

In this work, we have presented an approach to the challenging and little-studied task of ranking words in terms of their intensity on a continuous scale. We address the issue of sparsity of the intensity order evidence in two ways. First, pairwise intensity scores are computed using linguistically intuitive patterns in a very large, Web-scale corpus. Next, a Mixed Integer Linear Program (MILP) expands on this further by inferring new relative relationships. Instead of making ordering decisions about word pairs independently, our MILP considers the joint decision space and factors in e.g. how two adjectives relate to some third adjective, thus enforcing global constraints such as transitivity.

Our approach is general enough to allow additional evidence such as synonymy in the MILP, and can straightforwardly be applied to other word classes (such as verbs), and to other languages (monolingually as well as cross-lingually). The overall results across multiple metrics are substantially better than previous approaches, and fairly close to human agreement on this challenging task.

Acknowledgments

We would like to thank the editor and the anonymous reviewers for their helpful feedback.
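Illustrative sketch

As a rough illustration of the joint ranking idea summarized in the conclusion, the objective of Equation (4) can be maximized by exhaustive search on a very small cluster. This is a brute-force stand-in and not the MILP formulation of the paper: it uses integer positions instead of positions in [0, 1], ignores synonymy information, and only scales to a handful of words. The function names and the three pairwise scores are hypothetical, chosen to form a contradictory cycle of the kind discussed in Section 2.2.2.

```python
from itertools import product


def sgn(value):
    """Signum: -1, 0, or 1."""
    return (value > 0) - (value < 0)


def objective(positions, scores):
    """Equation (4): sum of sgn(x_j - x_i) * score(a_i, a_j) over scored pairs."""
    return sum(sgn(positions[j] - positions[i]) * s
               for (i, j), s in scores.items())


def best_ranking(words, scores):
    """Try every assignment of integer positions (ties allowed) and keep the best."""
    n = len(words)
    best = max(product(range(n), repeat=n),
               key=lambda pos: objective(pos, scores))
    return sorted(zip(best, words))


# Hypothetical weak-strong scores, keyed by (index of a_i, index of a_j).
# The third entry contradicts the first two and closes a cycle.
words = ["good", "great", "excellent"]
scores = {(0, 1): 0.5, (1, 2): 0.4, (2, 0): 0.1}

print(best_ranking(words, scores))
# [(0, 'good'), (1, 'great'), (2, 'excellent')]
```

In this toy cluster the search gives up the weakest edge of the cycle (the score of 0.1) and keeps the two stronger ones, which is the effect that including score magnitudes in the objective is meant to have.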


References

Mohit Bansal and Dan Klein. 2011. Web-scale features for full-scale parsing. In Proceedings of ACL 2011.

Thorsten Brants and Alex Franz. 2006. The Google Web 1T 5-gram corpus version 1.1. LDC2006T13.

Thorsten Brants and Alex Franz. 2009. Web 1T 5-gram, 10 European languages, version 1. LDC2009T25.

Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the web for fine-grained semantic verb relations. In Proceedings of EMNLP 2004.

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.

Dmitry Davidov and Ari Rappoport. 2008. Unsupervised discovery of generic relationships using pattern clusters and its evaluation by automatically generated SAT analogy questions. In Proceedings of ACL 2008.

Marie-Catherine de Marneffe, Christopher D. Manning, and Christopher Potts. 2010. Was it good? It was provocative. Learning the meaning of scalar adjectives. In Proceedings of ACL 2010.

Gerard de Melo and Gerhard Weikum. 2009. Towards a universal wordnet by learning from combined evidence. In Proceedings of CIKM 2009.

Zhicheng Dou, Ruihua Song, Xiaojie Yuan, and Ji-Rong Wen. 2008. Are click-through data adequate for learning web search rankings? In Proceedings of CIKM 2008.

Derek Gross and Katherine J. Miller. 1990. Adjectives in WordNet. International Journal of Lexicography, 3(4):265–277.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1993. Towards the automatic identification of adjectival scales: Clustering adjectives according to meaning. In Proceedings of ACL 1993.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of ACL 1997.

Vasileios Hatzivassiloglou and Janyce M. Wiebe. 2000. Effects of adjective orientation and gradability on sentence subjectivity. In Proceedings of COLING 2000.

Marti Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of COLING 1992.

Diana Inkpen and Graeme Hirst. 2006. Building and using a lexical knowledge base of near-synonym differences. Computational Linguistics, 32(2):223–262.

Maurice G. Kendall. 1938. A new measure of rank correlation. Biometrika, 30(1/2):81–93.

Adam Kilgarriff. 2007. Googleology is bad science. Computational Linguistics, 33(1).

William H. Kruskal. 1958. Ordinal measures of association. Journal of the American Statistical Association, 53(284):814–861.

Jingjing Liu and Stephanie Seneff. 2009. Review sentiment scoring via a parse-and-paraphrase paradigm. In Proceedings of EMNLP 2009.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.

Saif M. Mohammad, Bonnie J. Dorr, Graeme Hirst, and Peter D. Turney. 2013. Computing lexical contrast. Computational Linguistics.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135, January.

Peter F. Schulam and Christiane Fellbaum. 2010. Automatically determining the semantic gradation of German adjectives. In Proceedings of KONVENS 2010.

Vera Sheinman and Takenobu Tokunaga. 2009. AdjScales: Visualizing differences between adjectives for language learners. IEICE Transactions on Information and Systems, 92(8):1542–1550.

Vera Sheinman, Takenobu Tokunaga, I. Julien, P. Schulam, and C. Fellbaum. 2012. Refining WordNet adjective dumbbells using intensity relations. In Proceedings of Global WordNet Conference 2012.

Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2006. Semantic taxonomy induction from heterogenous evidence. In Proceedings of COLING/ACL 2006.

Charles Spearman. 1904. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. 2009. SOFIE: a self-organizing framework for information extraction. In Proceedings of WWW 2009.

Maite Taboada, Julian Brooke, Milan Tofiloski, and Kimberly Voll. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics.

Niket Tandon and Gerard de Melo. 2010. Information extraction from web-scale n-gram data. In Proceedings of the SIGIR 2010 Web N-gram Workshop.

Peter D. Turney and Michael L. Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Trans. Inf. Syst., 21(4):315–346, October.

Peter D. Turney. 2008. A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of COLING 2008.

Xiaofeng Yang and Jian Su. 2007. Coreference resolution using semantic relatedness information from automatically discovered patterns. In Proceedings of ACL 2007.

Ainur Yessenalina and Claire Cardie. 2011. Compositional matrix-space models for sentiment analysis. In Proceedings of EMNLP 2011.
