Transactions of the Association for Computational Linguistics, 1 (2013) 353–366. Action Editor: Patrick Pantel.
Submitted 5/2013; Revised 7/2013; Published 10/2013. ©2013 Association for Computational Linguistics.

Distributional Semantics Beyond Words: Supervised Learning of Analogy and Paraphrase

Peter D. Turney
National Research Council Canada, Information and Communications Technologies
Ottawa, Ontario, Canada, K1A 0R6
peter.turney@nrc-cnrc.gc.ca

Abstract

There have been several efforts to extend distributional semantics beyond individual words, to measure the similarity of word pairs, phrases, and sentences (briefly, tuples; ordered sets of words, contiguous or noncontiguous). One way to extend beyond words is to compare two tuples using a function that combines pairwise similarities between the component words in the tuples. A strength of this approach is that it works with both relational similarity (analogy) and compositional similarity (paraphrase). However, past work required hand-coding the combination function for different tasks. The main contribution of this paper is that combination functions are generated by supervised learning. We achieve state-of-the-art results in measuring relational similarity between word pairs (SAT analogies and SemEval 2012 Task 2) and measuring compositional similarity between noun-modifier phrases and unigrams (multiple-choice paraphrase questions).

1 Introduction

Harris (1954) and Firth (1957) hypothesized that words that appear in similar contexts tend to have similar meanings. This hypothesis is the foundation for distributional semantics, in which words are represented by context vectors. The similarity of two words is calculated by comparing the two corresponding context vectors (Lund et al., 1995; Landauer and Dumais, 1997; Turney and Pantel, 2010).

Distributional semantics is highly effective for measuring the semantic similarity between individual words. On a set of eighty multiple-choice synonym questions from the Test of English as a Foreign Language (TOEFL), a distributional approach recently achieved 100% accuracy (Bullinaria and Levy, 2012). However, it has been difficult to extend distributional semantics beyond individual words, to word pairs, phrases, and sentences.

Moving beyond individual words, there are various types of semantic similarity to consider. Here we focus on paraphrase and analogy. Paraphrase is similarity in the meaning of two pieces of text (Androutsopoulos and Malakasiotis, 2010). Analogy is similarity in the semantic relations of two sets of words (Turney, 2008a).

It is common to study paraphrase at the sentence level (Androutsopoulos and Malakasiotis, 2010), but we prefer to concentrate on the simplest type of paraphrase, where a bigram paraphrases a unigram. For example, dog house is a paraphrase of kennel. In our experiments, we concentrate on noun-modifier bigrams and noun unigrams.

Analogies map terms in one domain to terms in another domain (Gentner, 1983). The familiar analogy between the solar system and the Rutherford-Bohr atomic model involves several terms from the domain of the solar system and the domain of the atomic model (Turney, 2008a).

The simplest type of analogy is proportional analogy, which involves two pairs of words (Turney, 2006b). For example, the pair ⟨cook, raw⟩ is analogous to the pair ⟨decorate, plain⟩.

If we cook a thing, it is no longer raw; if we decorate a thing, it is no longer plain. The semantic relations between cook and raw are similar to the semantic relations between decorate and plain. In the following experiments, we focus on proportional analogies.

Erk (2013) distinguished four approaches to extending distributional semantics beyond words. In the first, a single vector space representation for a phrase or sentence is computed from the representations of the individual words (Mitchell and Lapata, 2010; Baroni and Zamparelli, 2010). In the second, two phrases or sentences are compared by combining multiple pairwise similarity values (Socher et al., 2011; Turney, 2012). In the third, weighted inference rules integrate distributional similarity and formal logic (Garrette et al., 2011). In the fourth, a single space integrates formal logic and vectors (Clarke, 2012).

Taking the second approach, Turney (2012) introduced a dual-space model, with one space for measuring domain similarity (similarity of topic or field) and another for function similarity (similarity of role or usage). Similarities beyond individual words are calculated by functions that combine domain and function similarities of component words.

The dual-space model has been applied to measuring compositional similarity (paraphrase recognition) and relational similarity (analogy recognition). In experiments that tested for sensitivity to word order, the dual-space model performed significantly better than competing approaches (Turney, 2012).

A limitation of past work with the dual-space model is that the combination functions were hand-coded. Our main contribution is to show how hand-coding can be eliminated with supervised learning. For ease of reference, we will call our approach SuperSim (supervised similarity). With no modification of SuperSim for the specific task (relational similarity or compositional similarity), we achieve better results than previous hand-coded models.

Compositional similarity (paraphrase) compares two contiguous phrases or sentences (n-grams), whereas relational similarity (analogy) does not require contiguity. We use tuple to refer to both contiguous and noncontiguous word sequences.

We approach analogy as a problem of supervised tuple classification. To measure the relational similarity between two word pairs, we train SuperSim with quadruples that are labeled as positive and negative examples of analogies. For example, the proportional analogy ⟨cook, raw, decorate, plain⟩ is labeled as a positive example. A quadruple is represented by a feature vector, composed of domain and function similarities from the dual-space model and other features based on corpus frequencies. SuperSim uses a support vector machine (Platt, 1998) to learn the probability that a quadruple ⟨a, b, c, d⟩ consists of a word pair ⟨a, b⟩ and an analogous word pair ⟨c, d⟩. The probability can be interpreted as the degree of relational similarity between the two given word pairs.

We also approach paraphrase as supervised tuple classification. To measure the compositional similarity between an m-gram and an n-gram, we train the learning algorithm with (m+n)-tuples that are positive and negative examples of paraphrases. SuperSim learns to estimate the probability that a triple ⟨a, b, c⟩ consists of a compositional bigram ab and a synonymous unigram c. For instance, the phrase fish tank is synonymous with aquarium; that is, fish tank and aquarium have high compositional similarity. The triple ⟨fish, tank, aquarium⟩ is represented using the same features that we used for analogy. The probability of the triple can be interpreted as the degree of compositional similarity between the given bigram and unigram.

We review related work in Section 2. The general feature space for learning relations and compositions is presented in Section 3. The experiments with relational similarity are described in Section 4, and Section 5 reports the results with compositional similarity. Section 6 discusses the implications of the results. We consider future work in Section 7 and conclude in Section 8.
2 Related Work

In SemEval 2012, Task 2 was concerned with measuring the degree of relational similarity between two word pairs (Jurgens et al., 2012) and Task 6 (Agirre et al., 2012) examined the degree of semantic equivalence between two sentences. These two areas of research have been mostly independent, although Socher et al. (2012) and Turney (2012) present unified perspectives on the two tasks. We first discuss some work on relational similarity, then some work on compositional similarity, and lastly work that unifies the two types of similarity.

2.1 Relational Similarity

LRA (latent relational analysis) measures relational similarity with a pair–pattern matrix (Turney, 2006b). Rows in the matrix correspond to word pairs (a, b) and columns correspond to patterns that connect the pairs ("a for the b") in a large corpus. This is a holistic (noncompositional) approach to distributional similarity, since the word pairs are opaque wholes; the component words have no separate representations. A compositional approach to analogy has a representation for each word, and a word pair is represented by composing the representations for each member of the pair. Given a vocabulary of N words, a compositional approach requires N representations to handle all possible word pairs, but a holistic approach requires N^2 representations. Holistic approaches do not scale up (Turney, 2012). LRA required nine days to run.

Bollegala et al. (2008) answered the SAT analogy questions with a support vector machine trained on quadruples (proportional analogies), as we do here. However, their feature vectors are holistic, and hence there are scaling problems. Herdağdelen and Baroni (2009) used a support vector machine to learn relational similarity. Their feature vectors contained a combination of holistic and compositional features.

Measuring relational similarity is closely connected to classifying word pairs according to their semantic relations (Turney and Littman, 2005). Semantic relation classification was the focus of SemEval 2007 Task 4 (Girju et al., 2007) and SemEval 2010 Task 8 (Hendrickx et al., 2010).

2.2 Compositional Similarity

To extend distributional semantics beyond words, many researchers take the first approach described by Erk (2013), in which a single vector space is used for individual words, phrases, and sentences (Landauer and Dumais, 1997; Mitchell and Lapata, 2008; Mitchell and Lapata, 2010). In this approach, given the words a and b with context vectors a and b, we construct a vector for the bigram ab by applying vector operations to a and b. Mitchell and Lapata (2010) experiment with many different vector operations and find that element-wise multiplication performs well. The bigram ab is represented by c = a ⊙ b, where c_i = a_i · b_i. However, element-wise multiplication is commutative, so the bigrams ab and ba map to the same vector c. In experiments that test for order sensitivity, element-wise multiplication performs poorly (Turney, 2012).
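The order-insensitivity of element-wise multiplication is easy to see concretely. A minimal sketch in Python (the three-dimensional vectors are toy values invented for illustration, not vectors from a real word–context matrix):

```python
import numpy as np

# Toy context vectors for two words; real vectors would come from a
# word-context co-occurrence matrix.
fantasy = np.array([0.9, 0.1, 0.4])
world = np.array([0.2, 0.8, 0.5])

# Element-wise multiplication composes the bigram: c_i = a_i * b_i.
fantasy_world = fantasy * world
world_fantasy = world * fantasy

# The operation is commutative, so the model cannot distinguish
# "fantasy world" from "world fantasy".
assert np.allclose(fantasy_world, world_fantasy)
print(fantasy_world)  # [0.18 0.08 0.2 ]
```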
We can treat the bigram ab as a unit, as if it were a single word, and construct a context vector for ab from occurrences of ab in a large corpus. This holistic approach to representing bigrams performs well when a limited set of bigrams is specified in advance (before building the word–context matrix), but it does not scale up, because there are too many possible bigrams (Turney, 2012).

Although the holistic approach does not scale up, we can generate a few holistic bigram vectors and use them to train a supervised regression model (Guevara, 2010; Baroni and Zamparelli, 2010). Given a new bigram cd, not observed in the corpus, the regression model can predict a holistic vector for cd, if c and d have been observed separately. We show in Section 5 that this idea can be adapted to train SuperSim without manually labeled data.

Socher et al. (2011) take the second approach described by Erk (2013), in which two sentences are compared by combining multiple pairwise similarity values. They construct a variable-sized similarity matrix X, in which the element x_ij is the similarity between the i-th phrase of one sentence and the j-th phrase of the other. Since supervised learning is simpler with fixed-sized feature vectors, the variable-sized similarity matrix is then reduced to a smaller fixed-sized matrix, to allow comparison of pairs of sentences of varying lengths.

2.3 Unified Perspectives on Similarity

Socher et al. (2012) represent words and phrases with a pair, consisting of a vector and a matrix. The vector captures the meaning of the word or phrase and the matrix captures how a word or phrase modifies the meaning of another word or phrase when they are combined. They apply this matrix–vector representation to both compositions and relations.

Turney (2012) represents words with two vectors, a vector from domain space and a vector from function space.

The domain vector captures the topic or field of the word and the function vector captures the functional role of the word. This dual-space model is applied to both compositions and relations.

Here we extend the dual-space model of Turney (2012) in two ways: Hand-coding is replaced with supervised learning and two new sets of features augment domain and function space. Moving to supervised learning instead of hand-coding makes it easier to introduce new features.

In the dual-space model, parameterized similarity measures provided the input values for hand-crafted functions. Each task required a different set of hand-crafted functions. The parameters of the similarity measures were tuned using a customized grid search algorithm. The grid search algorithm was not suitable for integration with a supervised learning algorithm. The insight behind SuperSim is that, given appropriate features, a supervised learning algorithm can replace the grid search algorithm and the hand-crafted functions.

3 Features for Tuple Classification

We represent a tuple with four types of features, all based on frequencies in a large corpus. The first type of feature is the logarithm of the frequency of a word. The second type is the positive pointwise mutual information (PPMI) between two words (Church and Hanks, 1989; Bullinaria and Levy, 2007). Third and fourth are the similarities of two words in domain and function space (Turney, 2012).

In the following experiments, we use the PPMI matrix from Turney et al. (2011) and the domain and function matrices from Turney (2012).[1] The three matrices and the word frequency data are based on the same corpus, a collection of web pages gathered from university websites, containing 5 × 10^10 words.[2] All three matrices are word–context matrices, in which the rows correspond to terms (words and phrases) in WordNet.[3] The columns correspond to the contexts in which the terms appear; each matrix involves a different kind of context.

[1] The three matrices and the word frequency data are available on request from the author. The matrix files range from two to five gigabytes when packaged and compressed for distribution.
[2] The corpus was collected by Charles Clarke at the University of Waterloo. It is about 280 gigabytes of plain text.
[3] See http://wordnet.princeton.edu/ for information about WordNet.

Let ⟨x_1, x_2, ..., x_n⟩ be an n-tuple of words. The number of features we use to represent this tuple increases as a function of n.

The first set of features consists of log frequency values for each word x_i in the n-tuple. Let freq(x_i) be the frequency of x_i in the corpus. We define LF(x_i) as log(freq(x_i) + 1). If x_i is not in the corpus, freq(x_i) is zero, and thus LF(x_i) is also zero. There are n log frequency features, one LF(x_i) feature for each word in the n-tuple.

The second set of features consists of positive pointwise mutual information values for each pair of words in the n-tuple. We use the raw PPMI matrix from Turney et al. (2011). Although they computed the singular value decomposition (SVD) to project the row vectors into a lower-dimensional space, we need the original high-dimensional columns for our features. The raw PPMI matrix has 114,501 rows and 139,246 columns with a density of 1.2%. For each term in WordNet, there is a corresponding row in the raw PPMI matrix. For each unigram in WordNet, there are two corresponding columns in the raw PPMI matrix, one marked left and the other right.

Suppose x_i corresponds to the i-th row of the PPMI matrix and x_j corresponds to the j-th column, marked left. The value in the i-th row and j-th column of the PPMI matrix, PPMI(x_i, x_j, left), is the positive pointwise mutual information of x_i and x_j co-occurring in the corpus, where x_j is the first word to the left of x_i, ignoring any intervening stop words (that is, ignoring any words that are not in WordNet). If x_i (or x_j) has no corresponding row (or column) in the matrix, then the PPMI value is set to zero.

Turney et al. (2011) estimated PPMI(x_i, x_j, left) by sampling the corpus for phrases containing x_i and then looking for x_j to the left of x_i in the sampled phrases (and likewise for right). Due to this sampling process, PPMI(x_i, x_j, left) does not necessarily equal PPMI(x_j, x_i, right). For example, suppose x_i is a rare word and x_j is a common word. With PPMI(x_i, x_j, left), when we sample phrases containing x_i, we are relatively likely to find x_j in some of these phrases. With PPMI(x_j, x_i, right), when we sample phrases containing x_j, we are less likely to find any phrases containing x_i. Although, in theory, PPMI(x_i, x_j, left) should equal PPMI(x_j, x_i, right), they are likely to be unequal given a limited sample.
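To make the first two feature sets concrete, here is a minimal sketch in Python. The frequency and PPMI tables are tiny invented stand-ins; in the paper these values come from the corpus frequency data and the raw PPMI matrix described above, with missing entries treated as zero.

```python
import math

# Hypothetical corpus statistics (stand-ins for the real frequency data).
freq = {"fish": 1200, "tank": 800, "aquarium": 150}

# Hypothetical PPMI lookup keyed by (row_word, column_word, handedness).
ppmi_table = {
    ("fish", "tank", "right"): 2.7,   # "tank" seen to the right of "fish"
    ("tank", "fish", "left"): 2.5,    # sampling can make this differ from above
    ("fish", "aquarium", "right"): 1.1,
}

def LF(x):
    """Log frequency feature: log(freq(x) + 1); zero if x is unseen."""
    return math.log(freq.get(x, 0) + 1)

def PPMI(xi, xj, handedness):
    """PPMI of xj occurring on the given side of xi; zero if absent."""
    return ppmi_table.get((xi, xj, handedness), 0.0)

def lf_ppmi_features(words):
    """First two feature sets for an n-tuple: n LF values and
    2n(n-1) PPMI values, one left and one right per ordered pair."""
    feats = [LF(x) for x in words]
    for i, xi in enumerate(words):
        for j, xj in enumerate(words):
            if i != j:
                feats.append(PPMI(xi, xj, "left"))
                feats.append(PPMI(xi, xj, "right"))
    return feats

feats = lf_ppmi_features(("fish", "tank", "aquarium"))
print(len(feats))  # 3 + 2*3*2 = 15 for a triple
```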

From the n-tuple, we select all of the n(n−1) ordered pairs, ⟨x_i, x_j⟩, such that i ≠ j. We then generate two features for each pair, PPMI(x_i, x_j, left) and PPMI(x_i, x_j, right). Thus there are 2n(n−1) PPMI values in the second set of features.

The third set of features consists of domain space similarity values for each pair of words in the n-tuple. Domain space was designed to capture the topic of a word. Turney (2012) first constructed a frequency matrix, in which the rows correspond to terms in WordNet and the columns correspond to nearby nouns. Given a term x_i, the corpus was sampled for phrases containing x_i and the phrases were processed with a part-of-speech tagger, to identify nouns. If the noun x_j was the closest noun to the left or right of x_i, then the frequency count for the i-th row and j-th column was incremented. The hypothesis was that the nouns near a term characterize the topics associated with the term. The word–context frequency matrix for domain space has 114,297 rows (terms) and 50,000 columns (noun contexts, topics), with a density of 2.6%.

The frequency matrix was converted to a PPMI matrix and then smoothed with SVD. The SVD yields three matrices, U, Σ, and V. A term in domain space is represented by a row vector in U_k Σ_k^p. The parameter k specifies the number of singular values in the truncated singular value decomposition; that is, k is the number of latent factors in the low-dimensional representation of the term (Landauer and Dumais, 1997). We generate U_k and Σ_k by deleting the columns in U and Σ corresponding to the smallest singular values. The parameter p raises the singular values in Σ_k to the power p (Caron, 2001). As p goes from one to zero, factors with smaller singular values are given more weight. This has the effect of making the similarity measure more discriminating (Turney, 2012).

The similarity of two words in domain space, Dom(x_i, x_j, k, p), is computed by extracting the row vectors in U_k Σ_k^p that correspond to the words x_i and x_j, and then calculating their cosine. Optimal performance requires tuning the parameters k and p for the task (Bullinaria and Levy, 2012; Turney, 2012). In the following experiments, we avoid directly tuning k and p by generating features with a variety of values for k and p, allowing the supervised learning algorithm to decide which features to use.

From the n-tuple, we select all (1/2)n(n−1) unordered pairs, ⟨x_i, x_j⟩, such that i < j. For each pair, we generate Dom(x_i, x_j, k, p) features over n_k values of k and n_p values of p, a total of (1/2)n(n−1) n_k n_p domain similarity features. The fourth set of features is generated in the same way in function space, which is designed to capture the functional role of a word; there are likewise (1/2)n(n−1) n_k n_p function similarity features, Fun(x_i, x_j, k, p). Table 1 summarizes the four sets of features and their sizes.

Feature set                   Size of set
LF(x_i)                       n
PPMI(x_i, x_j, handedness)    2n(n−1)
Dom(x_i, x_j, k, p)           (1/2)n(n−1) n_k n_p
Fun(x_i, x_j, k, p)           (1/2)n(n−1) n_k n_p

Table 1: The four sets of features and their sizes.
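A minimal sketch of how Dom(x_i, x_j, k, p) can be computed with an off-the-shelf SVD, assuming a small random matrix as a stand-in for the real 114,297 × 50,000 domain PPMI matrix. The particular grid of k and p values is invented for illustration; the paper specifies only that features are generated for n_k values of k and n_p values of p.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 20))          # stand-in for the domain PPMI matrix
vocab = {"cook": 0, "raw": 1}     # row indices of the two words

# Truncated SVD: X ~ U_k Sigma_k V_k^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def dom(xi, xj, k, p):
    """Cosine of the rows of U_k Sigma_k^p for words xi and xj."""
    A = U[:, :k] * (s[:k] ** p)   # rows are term vectors in U_k Sigma_k^p
    u, v = A[vocab[xi]], A[vocab[xj]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Generate one feature per (k, p) setting, letting the learner choose.
ks = [5, 10, 20]                  # illustrative values of k
ps = [0.0, 0.5, 1.0]              # illustrative values of p
features = [dom("cook", "raw", k, p) for k in ks for p in ps]
print(len(features))              # n_k * n_p = 9 features for this pair
```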
The four sets of features form a hierarchy: the log frequency features are based on the frequencies of individual words in the corpus. The PPMI features are based on direct co-occurrences of two words; that is, PPMI is only greater than zero if the two words actually occur together in the corpus. Domain and function space capture indirect or higher-order co-occurrence, due to the truncated SVD (Lemaire and Denhière, 2006); that is, the values of Dom(x_i, x_j, k, p) and Fun(x_i, x_j, k, p) can be high even when x_i and x_j do not actually co-occur in the corpus. We conjecture that there are yet higher orders in this hierarchy that would provide improved similarity measures.

SuperSim learns to classify tuples by representing them with these features. SuperSim uses the sequential minimal optimization (SMO) support vector machine (SVM) as implemented in Weka (Platt, 1998; Witten et al., 2011).[4] The kernel is a normalized third-order polynomial. Weka provides probability estimates for the classes by fitting the outputs of the SVM with logistic regression models.

[4] Weka is available at http://www.cs.waikato.ac.nz/ml/weka/.

n-tuple   LF   PPMI   Dom    Fun    Total
1         1    0      0      0      1
2         2    4      110    110    226
3         3    12     330    330    675
4         4    24     660    660    1348
5         5    40     1100   1100   2245
6         6    60     1650   1650   3366

Table 2: Number of features for various tuple sizes.

4 Relational Similarity

This section presents experiments with learning relational similarity using SuperSim. The training datasets consist of quadruples that are labeled as positive and negative examples of analogies. Table 2 shows that the feature vectors have 1,348 elements.

We experiment with three datasets, a collection of 374 five-choice questions from the SAT college entrance exam (Turney et al., 2003), a modified ten-choice variation of the SAT questions (Turney, 2012), and the relational similarity dataset from SemEval 2012 Task 2 (Jurgens et al., 2012).[5]

[5] The SAT questions are available on request from the author. The SemEval 2012 Task 2 dataset is available at https://sites.google.com/site/semeval2012task2/.

Stem: word:language
Choices: (1) paint:portrait
         (2) poetry:rhythm
         (3) note:music
         (4) tale:story
         (5) week:year
Solution: (3) note:music

Table 3: A five-choice SAT analogy question.

4.1 Five-choice SAT Questions

Table 3 is an example of a question from the 374 five-choice SAT questions. Each five-choice question yields five labeled quadruples, by combining the stem with each choice. The quadruple ⟨word, language, note, music⟩ is labeled positive and the other four quadruples are labeled negative.

Since learning works better with balanced training data (Japkowicz and Stephen, 2002), we use the symmetries of proportional analogies to add more positive examples (Lepage and Shin-ichi, 1996). For each positive quadruple, ⟨a, b, c, d⟩, we add three more positive quadruples, ⟨b, a, d, c⟩, ⟨c, d, a, b⟩, and ⟨d, c, b, a⟩. Thus each five-choice question provides four positive and four negative quadruples.

We use ten-fold cross-validation to apply SuperSim to the SAT questions. The folds are constructed so that the eight quadruples from each SAT question are kept together in the same fold. To answer a question in the testing fold, the learned model assigns a probability to each of the five choices and guesses the choice with the highest probability. SuperSim achieves a score of 54.8% correct (205 out of 374).

Table 4 gives the rank of SuperSim in the list of the top ten results with the SAT analogy questions.[6] The scores ranging from 51.1% to 57.0% are not significantly different from SuperSim's score of 54.8%, according to Fisher's exact test at the 95% confidence level. However, SuperSim answers the SAT questions in a few minutes, whereas LRA requires nine days, and SuperSim learns its models automatically, unlike the hand-coding of Turney (2012).

[6] See the State of the Art page on the ACL Wiki at http://aclweb.org/aclwiki.
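The symmetry-based augmentation of Section 4.1 is mechanical. A minimal sketch, using the stem and solution from Table 3 (the helper function is illustrative, not the author's code):

```python
def expand_positive(quad):
    """Given one positive quadruple <a,b,c,d>, return the four positive
    examples licensed by the symmetries of proportional analogies."""
    a, b, c, d = quad
    return [(a, b, c, d), (b, a, d, c), (c, d, a, b), (d, c, b, a)]

stem_plus_solution = ("word", "language", "note", "music")
for q in expand_positive(stem_plus_solution):
    print(q)
# ('word', 'language', 'note', 'music')
# ('language', 'word', 'music', 'note')
# ('note', 'music', 'word', 'language')
# ('music', 'note', 'language', 'word')
```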
Algorithm     Reference                      Correct
Know-Best     Veale (2004)                   43.0
k-means       Biçici & Yuret (2006)          44.0
BagPack       Herdağdelen & Baroni (2009)    44.1
VSM           Turney & Littman (2005)        47.1
Dual-Space    Turney (2012)                  51.1
BMI           Bollegala et al. (2009)        51.1
PairClass     Turney (2008b)                 52.1
PERT          Turney (2006a)                 53.5
SuperSim      —                              54.8
LRA           Turney (2006b)                 56.1
Human         Average college applicant      57.0

Table 4: The top ten results on five-choice SAT questions.

4.2 Ten-choice SAT Questions

In addition to symmetries, proportional analogies have asymmetries. In general, if the quadruple ⟨a, b, c, d⟩ is positive, ⟨a, d, c, b⟩ is negative. For example, ⟨word, language, note, music⟩ is a good analogy, but ⟨word, music, note, language⟩ is not. Words are the basic units of language and notes are the basic units of music, but words are not necessary for music and notes are not necessary for language.

Turney (2012) used this asymmetry to convert the 374 five-choice SAT questions into 374 ten-choice SAT questions. Each choice ⟨c, d⟩ was expanded with the stem ⟨a, b⟩, resulting in the quadruple ⟨a, b, c, d⟩, and then the order was shuffled to ⟨a, d, c, b⟩, so that each choice pair in a five-choice question generated two choice quadruples in a ten-choice question. Nine of the quadruples are negative examples and the quadruple consisting of the stem pair followed by the solution pair is the only positive example. The purpose of the ten-choice questions is to test the ability of measures of relational similarity to avoid the asymmetric distractors.

We use the ten-choice questions to compare the hand-coded dual-space approach (Turney, 2012) with SuperSim. We also use these questions to perform an ablation study of the four sets of features in SuperSim. As with the five-choice questions, we use the symmetries of proportional analogies to add three more positive examples, so the training dataset has nine negative examples and four positive examples per question. We apply ten-fold cross-validation to the 374 ten-choice questions.

On the ten-choice questions, SuperSim's score is 52.7% (Table 5), compared to 54.8% on the five-choice questions (Table 4), a drop of 2.1%. The hand-coded dual-space model scores 47.9% (Table 5), compared to 51.1% on the five-choice questions (Table 4), a drop of 3.2%. The difference between SuperSim (52.7%) and the hand-coded dual-space model (47.9%) is not significant according to Fisher's exact test at the 95% confidence level. The advantage of SuperSim is that it does not need hand-coding. The results show that SuperSim can avoid the asymmetric distractors.

Algorithm     LF   PPMI   Dom   Fun   Correct
Dual-Space    0    0      1     1     47.9
SuperSim      1    1      1     1     52.7
SuperSim      0    1      1     1     52.7
SuperSim      1    0      1     1     52.7
SuperSim      1    1      0     1     45.7
SuperSim      1    1      1     0     41.7
SuperSim      1    0      0     0     5.6
SuperSim      0    1      0     0     32.4
SuperSim      0    0      1     0     39.6
SuperSim      0    0      0     1     39.3

Table 5: Feature ablation with ten-choice SAT questions.

Table 5 shows the impact of different subsets of features on the percentage of correct answers to the ten-choice SAT questions. Included features are marked 1 and ablated features are marked 0. The results show that the log frequency (LF) and PPMI features are not helpful (but also not harmful) for relational similarity. We also see that domain space and function space are both needed for good results.

4.3 SemEval 2012 Task 2

The SemEval 2012 Task 2 dataset is based on the semantic relation classification scheme of Bejar et al. (1991), consisting of ten high-level categories of relations and seventy-nine subcategories, with paradigmatic examples of each subcategory. For instance, the subcategory taxonomic in the category class inclusion has three paradigmatic examples, flower:tulip, emotion:rage, and poem:sonnet.

Jurgens et al. (2012) used Amazon's Mechanical Turk to create the SemEval 2012 Task 2 dataset in two phases. In the first phase, Turkers expanded the paradigmatic examples for each subcategory to an average of forty-one word pairs per subcategory, a total of 3,218 pairs. In the second phase, each word pair from the first phase was assigned a prototypicality score, indicating its similarity to the paradigmatic examples. The challenge of SemEval 2012 Task 2 was to guess the prototypicality scores.

SuperSim was trained on the five-choice SAT questions and evaluated on the SemEval 2012 Task 2 test dataset. For a given word pair, we created quadruples, combining the word pair with each of the paradigmatic examples for its subcategory. We then used SuperSim to compute the probabilities for each quadruple. Our guess for the prototypicality score of the given word pair was the average of the probabilities. Spearman's rank correlation coefficient between the Turkers' prototypicality scores and SuperSim's scores was 0.408, averaged over the sixty-nine subcategories in the testing set. SuperSim has the highest Spearman correlation achieved to date on SemEval 2012 Task 2 (see Table 6).

Algorithm   Reference                  Spearman
BUAP        Tovar et al. (2012)        0.014
Duluth-V2   Pedersen (2012)            0.038
Duluth-V1   Pedersen (2012)            0.039
Duluth-V0   Pedersen (2012)            0.050
UTD-SVM     Rink & Harabagiu (2012)    0.116
UTD-NB      Rink & Harabagiu (2012)    0.229
RNN-1600    Mikolov et al. (2013)      0.275
UTD-LDA     Rink & Harabagiu (2013)    0.334
Com         Zhila et al. (2013)        0.353
SuperSim    —                          0.408

Table 6: Spearman correlations for SemEval 2012 Task 2.
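The transfer procedure for SemEval 2012 Task 2 reduces to a few lines once a classifier is trained. A minimal sketch, where model_probability is a stand-in for the trained SVM's positive-class probability, and the test pairs and gold scores are invented for illustration; scipy's spearmanr handles the final evaluation:

```python
from scipy.stats import spearmanr

# Paradigmatic examples for one subcategory (taxonomic, from Section 4.3).
paradigms = [("flower", "tulip"), ("emotion", "rage"), ("poem", "sonnet")]

def model_probability(quad):
    # Stand-in for the trained SuperSim SVM's probability that the
    # quadruple <a, b, c, d> is an analogy; a real implementation would
    # build the Section 3 feature vector and query the classifier.
    return (sum(len(w) for w in quad) % 10) / 10.0

def prototypicality(pair):
    """Average analogy probability of the pair against the paradigms."""
    probs = [model_probability(pair + paradigm) for paradigm in paradigms]
    return sum(probs) / len(probs)

# Hypothetical test pairs and gold Turker scores for one subcategory.
test_pairs = [("animal", "dog"), ("furniture", "chair"), ("tool", "hammer")]
gold_scores = [0.9, 0.8, 0.7]
guesses = [prototypicality(p) for p in test_pairs]
rho, _ = spearmanr(gold_scores, guesses)
print(round(rho, 3))
```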
5 Compositional Similarity

This section presents experiments using SuperSim to learn compositional similarity. The datasets consist of triples, ⟨a, b, c⟩, such that ab is a noun-modifier bigram and c is a noun unigram. The triples are labeled as positive and negative examples of paraphrases. Table 2 shows that the feature vectors have 675 elements. We experiment with two datasets, seven-choice and fourteen-choice noun-modifier questions (Turney, 2012).[7]

[7] The seven-choice dataset is available at http://jair.org/papers/paper3640.html. The fourteen-choice dataset can be generated from the seven-choice dataset.

Stem: fantasy world
Choices: (1) fairyland
         (2) fantasy
         (3) world
         (4) phantasy
         (5) universe
         (6) ranter
         (7) souring
Solution: (1) fairyland

Table 7: A noun-modifier question based on WordNet.

5.1 Noun-Modifier Questions

The first dataset is a seven-choice noun-modifier question dataset, constructed from WordNet (Turney, 2012). The dataset contains 680 questions for training and 1,500 for testing, a total of 2,180 questions. Table 7 shows one of the questions.

The stem is a bigram and the choices are unigrams. The bigram is composed of a head noun (world), modified by an adjective or noun (fantasy). The solution is the unigram (fairyland) that belongs to the same WordNet synset as the stem.

The distractors are designed to be difficult for current approaches to composition. For example, if fantasy world is represented by element-wise multiplication of the context vectors for fantasy and world (Mitchell and Lapata, 2010), the most likely guess is fantasy or world, not fairyland (Turney, 2012).

Each seven-choice question yields seven labeled triples, by combining the stem with each choice. The triple ⟨fantasy, world, fairyland⟩ is labeled positive and the other six triples are labeled negative.

In general, if ⟨a, b, c⟩ is a positive example, then ⟨b, a, c⟩ is negative. For example, world fantasy is not a paraphrase of fairyland. The second dataset is constructed by applying this shuffling transformation to convert the 2,180 seven-choice questions into 2,180 fourteen-choice questions (Turney, 2012). The second dataset is designed to be difficult for approaches that are not sensitive to word order.
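The shuffling transformation that produces the fourteen-choice questions is simple to state in code. A minimal sketch, using the question from Table 7 (the function and data layout are illustrative, not the dataset's actual file format):

```python
def to_fourteen_choice(stem, choices):
    """Convert a seven-choice question with stem bigram (a, b) into a
    fourteen-choice question: each unigram choice c yields the triple
    <a, b, c> and the order-shuffled distractor triple <b, a, c>."""
    a, b = stem
    triples = []
    for c in choices:
        triples.append((a, b, c))  # original order
        triples.append((b, a, c))  # shuffled order: always a negative
    return triples

stem = ("fantasy", "world")
choices = ["fairyland", "fantasy", "world", "phantasy",
           "universe", "ranter", "souring"]
triples = to_fourteen_choice(stem, choices)
print(len(triples))  # 14 candidate triples; only
                     # ('fantasy', 'world', 'fairyland') is positive
```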
Table 8 shows the percentage of the testing questions that are answered correctly for the two datasets. Because vector addition and element-wise multiplication are not sensitive to word order, they perform poorly on the fourteen-choice questions. For both datasets, SuperSim performs significantly better than all other approaches, except for the holistic approach, according to Fisher's exact test at the 95% confidence level.[8]

[8] The results for SuperSim are new but the other results in Table 8 are from Turney (2012).

                                        Correct
Algorithm                       7 choices   14 choices
Vector addition                 50.1        22.5
Element-wise multiplication     57.5        27.4
Dual-Space model                58.3        41.5
SuperSim                        75.9        68.0
Holistic model                  81.6        —

Table 8: Results for the two noun-modifier datasets.

The holistic approach is noncompositional. The stem bigram is represented by a single context vector, generated by treating the bigram as if it were a unigram. A noncompositional approach cannot scale up to realistic applications (Turney, 2012). The holistic approach cannot be applied to the fourteen-choice questions, because the bigrams in these questions do not correspond to terms in WordNet, and hence they do not correspond to row vectors in the matrices we use (see Section 3).

Turney (2012) found it necessary to hand-code a soundness check into all of the algorithms (vector addition, element-wise multiplication, dual-space, and holistic). Given a stem ab and a choice c, the hand-coded check assigns a minimal score to the choice if c = a or c = b. We do not need to hand-code any checking into SuperSim. It learns automatically from the training data to avoid such choices.

5.2 Ablation Experiments

Table 9 shows the effects of ablating sets of features on the performance of SuperSim with the fourteen-choice questions. PPMI features are the most important; by themselves, they achieve 59.7% correct, although the other features are needed to reach 68.0%. Domain space features reach the second highest performance when used alone (34.6%), but they reduce performance (from 69.3% to 68.0%) when combined with other features; however, the drop is not significant according to Fisher's exact test at the 95% significance level.

Algorithm     LF   PPMI   Dom   Fun   Correct
Dual-Space    0    0      1     1     41.5
SuperSim      1    1      1     1     68.0
SuperSim      0    1      1     1     66.6
SuperSim      1    0      1     1     52.3
SuperSim      1    1      0     1     69.3
SuperSim      1    1      1     0     65.9
SuperSim      1    0      0     0     14.1
SuperSim      0    1      0     0     59.7
SuperSim      0    0      1     0     34.6
SuperSim      0    0      0     1     32.9

Table 9: Ablation with fourteen-choice questions.

Since the PPMI features play an important role in answering the noun-modifier questions, let us take a closer look at them. From Table 2, we see that there are twelve PPMI features for the triple ⟨a, b, c⟩, where ab is a noun-modifier bigram and c is a noun unigram. We can split the twelve features into three subsets, one subset for each pair of words, ⟨a, b⟩, ⟨a, c⟩, and ⟨b, c⟩. For example, the subset for ⟨a, b⟩ is the four features PPMI(a, b, left), PPMI(b, a, left), PPMI(a, b, right), and PPMI(b, a, right). Table 10 shows the effects of ablating these subsets.

⟨a, b⟩   ⟨a, c⟩   ⟨b, c⟩   Correct
1        1        1        68.0
0        1        1        59.9
1        0        1        65.4
1        1        0        67.5
1        0        0        62.6
0        1        0        58.1
0        0        1        55.6
0        0        0        52.3

Table 10: PPMI subset ablation with fourteen choices.

The results in Table 10 indicate that all three PPMI subsets contribute to the performance of SuperSim, but the ⟨a, b⟩ subset contributes more than the other two subsets. The ⟨a, b⟩ features help to increase the sensitivity of SuperSim to the order of the words in the noun-modifier bigram; for example, they make it easier to distinguish fantasy world from world fantasy.
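The subset split behind Table 10 can be stated precisely: a triple has 2n(n−1) = 12 PPMI features, four for each unordered pair of words. A minimal sketch of the grouping (feature names only; the values would come from the PPMI lookups of Section 3):

```python
from itertools import combinations, permutations

triple = ("fantasy", "world", "fairyland")  # <a, b, c>

# All twelve PPMI feature names for the triple: one left and one right
# value for each ordered pair, 2 * 3 * 2 = 12 in total.
all_feats = [(xi, xj, side)
             for xi, xj in permutations(triple, 2)
             for side in ("left", "right")]
assert len(all_feats) == 12

# Group them into the three subsets of Table 10, one per unordered pair.
subsets = {
    frozenset(pair): [f for f in all_feats if {f[0], f[1]} == set(pair)]
    for pair in combinations(triple, 2)
}
for pair, feats in subsets.items():
    print(sorted(pair), len(feats))   # each subset has four features
```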
F b y gramo tu mi s t t oh norte 0 8 S mi pag mi metro b mi r 2 0 2 3 362 Stem:searchengineChoices:(1)searchengine(2)buscar(3)engine(4)searchlanguage(5)searchwarrant(6)dieselengine(7)steamengineSolution:(1)searchengineTable11:Aquestionbasedonholisticvectors.synsets.ItwouldbeadvantageoustobeabletotrainSuperSimwithlessrelianceonexpertknowledge.Pastworkwithadjective-nounbigramshasshownthatwecanuseholisticbigramvectorstotrainasupervisedregressionmodel(Guevara,2010;BaroniandZamparelli,2010).Theoutputoftheregressionmodelisavectorrepresentationforabigramthatapproximatestheholisticvectorforthebigram;thatis,itapproximatesthevectorwewouldgetbytreat-ingthebigramasifitwereaunigram.SuperSimdoesnotgeneratevectorsasoutput,butwecanstilluseholisticbigramvectorsfortraining.Table11showsaseven-choicetrainingquestionthatwasgeneratedwithoutusingWordNetsynsets.Thechoicesoftheformabarebigrams,butwerepre-sentthemwithholisticbigramvectors;wepretendtheyareunigrams.Wecallabbigramspseudo-unigrams.AsfarasSuperSimisconcerned,thereisnodifferencebetweenthesepseudo-unigramsandtrueunigrams.ThequestioninTable11istreatedthesameasthequestioninTable7.Wegenerate680holistictrainingquestionsbyrandomlyselecting680noun-modifierbigramsfromWordNetasstemsforthequestions(searchengine),avoidinganybigramsthatappearasstemsinthetestingquestions.Thesolution(searchengine)isthepseudo-unigramthatcorrespondstothestembigram.InthematricesinSection3,eachterminWordNetcorrespondstoarowvector.ThesecorrespondingrowvectorsenableustotreatbigramsfromWordNetasiftheywereunigrams.Thedistractorsarethecomponentunigramsinthestembigram(searchandengine)andpseudo-unigramsthatshareacomponentwordwiththestem(searchwarrant,dieselengine).Toconstructtheholistictrainingquestions,weusedWordNetasaCorrectTraining7-choices14-choicesHolistic61.854.4Standard75.968.0Table12:ResultsforSuperSimwithholistictraining.sourceofbigrams,butweignoredtherichinfor-mationthatWordNetprovidesaboutthesebigrams,suchastheirsynonyms,hypernyms,hyponyms,meronyms,andglosses.Table12comparesholistictrainingtostandardtraining(thatis,trainingwithquestionslikeTable11versustrainingwithquestionslikeTable7).Thetestingsetisthestandardtestingsetinbothcases.Thereisasignificantdropinperformancewithholistictraining,buttheperformancestillsurpassesvectoraddition,element-wisemultiplication,andthehand-codeddual-spacemodel(seeTable8).Sinceholisticquestionscanbegeneratedauto-maticallywithouthumanexpertise,weexperi-mentedwithincreasingthesizeoftheholistictrain-ingdataset,growingitfrom1,000to10,000ques-tionsinincrementsof1,000.Theperformanceonthefourteen-choicequestionswithholistictrain-ingandstandardtestingvariedbetween53.3%and55.1%correct,withnocleartrendupordown.Thisisnotsignificantlydifferentfromtheperformancewith680holistictrainingquestions(54.4%).Itseemslikelythatthedropinperformancewithholistictraininginsteadofstandardtrainingisduetoadifferenceinthenatureofthestandardquestions(Table7)andtheholisticquestions(Table11).Wearecurrentlyinvestigatingthisissue.Weexpecttobeabletoclosetheperformancegapinfuturework,byimprovingtheholisticquestions.However,itispossiblethattherearefundamentallimitstoholistictraining.6DiscussionSuperSimperformsslightlybetter(notstatisticallysignificant)thanthehand-codeddual-spacemodelonrelationalsimilarityproblems(Section4),butitperformsmuchbetteroncompositionalsimilarityproblems(Section5).TheablationstudiessuggestthisisduetothePPMIfeatures,whichhavenoeffectonten-choiceSATperformance(Table5),buthavea l D o w n o a d e d f r o m h t t p : / / d i r mi C t . metro i t . 
6 Discussion

SuperSim performs slightly better (not statistically significant) than the hand-coded dual-space model on relational similarity problems (Section 4), but it performs much better on compositional similarity problems (Section 5). The ablation studies suggest this is due to the PPMI features, which have no effect on ten-choice SAT performance (Table 5), but have a large effect on fourteen-choice noun-modifier paraphrase performance (Table 9).

One advantage of supervised learning over hand-coding is that it facilitates adding new features. It is not clear how to modify the hand-coded equations for the dual-space model of noun-modifier composition (Turney, 2012) to include PPMI information.

SuperSim is one of the few approaches to distributional semantics beyond words that has attempted to address both relational and compositional similarity (see Section 2.3). It is a strength of this approach that it works well with both kinds of similarity.

7 Future Work and Limitations

Given the promising results with holistic training for noun-modifier paraphrases, we plan to experiment with holistic training for analogies. Consider the proportional analogy hard is to hard time as good is to good time, where hard time and good time are pseudo-unigrams. To a human, this analogy is trivial, but SuperSim has no access to the surface form of a term. As far as SuperSim is concerned, this analogy is much the same as the analogy hard is to difficulty as good is to fun. This strategy automatically converts simple, easily generated analogies into more complex, challenging analogies, which may be suited to training SuperSim.

This also suggests that noun-modifier paraphrases may be used to solve analogies. Perhaps we can evaluate the quality of a candidate analogy ⟨a, b, c, d⟩ by searching for a term e such that ⟨b, e, a⟩ and ⟨d, e, c⟩ are good paraphrases. For example, consider the analogy mason is to stone as carpenter is to wood. We can paraphrase mason as stone worker and carpenter as wood worker. This transforms the analogy to stone worker is to stone as wood worker is to wood, which makes it easier to recognize the relational similarity.

Another area for future work is extending SuperSim beyond noun-modifier paraphrases to measuring the similarity of sentence pairs. We plan to adapt ideas from Socher et al. (2011) for this task. They use dynamic pooling to represent sentences of varying size with fixed-size feature vectors. Using fixed-size feature vectors avoids the problem of quadratic growth and it enables the supervised learner to generalize over sentences of varying length.

Some of the competing approaches discussed by Erk (2013) incorporate formal logic. The work of Baroni et al. (2012) suggests ways that SuperSim could be developed to deal with logic.

We believe that SuperSim could benefit from more features, with greater diversity. One place to look for these features is higher levels in the hierarchy that we sketch in Section 3.

Our ablation experiments suggest that domain and function spaces provide the most important features for relational similarity, but PPMI values provide the most important features for noun-modifier compositional similarity. Explaining this is another topic for future research.

8 Conclusion

In this paper, we have presented SuperSim, a unified approach to analogy (relational similarity) and paraphrase (compositional similarity). SuperSim treats them both as problems of supervised tuple classification. The supervised learning algorithm is a standard support vector machine. The main contribution of SuperSim is a set of four types of features for representing tuples. The features work well with both analogy and paraphrase, with no task-specific modifications. SuperSim matches the state of the art on SAT analogy questions and substantially advances the state of the art on the SemEval 2012 Task 2 challenge and the noun-modifier paraphrase questions.

SuperSim runs much faster than LRA (Turney, 2006b), answering the SAT questions in minutes instead of days. Unlike the dual-space model (Turney, 2012), SuperSim requires no hand-coded similarity composition functions. Since there is no hand-coding, it is easy to add new features to SuperSim. Much work remains to be done, such as incorporating logic and scaling up to sentence paraphrases, but past work suggests that these problems are tractable.

In the four approaches described by Erk (2013), SuperSim is an instance of the second approach to extending distributional semantics beyond words, comparing word pairs, phrases, or sentences (in general, tuples) by combining multiple pairwise similarity values. Perhaps the main significance of this paper is that it provides some evidence in support of this general approach.
References

Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 Task 6: A pilot on semantic textual similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), pages 385–393, Montréal, Canada.

Ion Androutsopoulos and Prodromos Malakasiotis. 2010. A survey of paraphrasing and textual entailment methods. Journal of Artificial Intelligence Research, 38:135–187.

Marco Baroni and Roberto Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), pages 1183–1193.

Marco Baroni, Raffaella Bernardi, Ngoc-Quynh Do, and Chung-chieh Shan. 2012. Entailment above the word level in distributional semantics. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 23–32.

Isaac I. Bejar, Roger Chaffin, and Susan E. Embretson. 1991. Cognitive and Psychometric Analysis of Analogical Problem Solving. Springer-Verlag.

Ergun Biçici and Deniz Yuret. 2006. Clustering word pairs to answer analogy questions. In Proceedings of the Fifteenth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN 2006), Akyaka, Mugla, Turkey.

Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2008. WWW sits the SAT: Measuring relational similarity on the Web. In Proceedings of the 18th European Conference on Artificial Intelligence (ECAI 2008), pages 333–337, Patras, Greece.

Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2009. Measuring the similarity between implicit semantic relations from the Web. In Proceedings of the 18th International Conference on World Wide Web (WWW 2009), pages 651–660.

John Bullinaria and Joseph Levy. 2007. Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39(3):510–526.

John Bullinaria and Joseph Levy. 2012. Extracting semantic representations from word co-occurrence statistics: Stop-lists, stemming, and SVD. Behavior Research Methods, 44(3):890–907.

John Caron. 2001. Experiments with LSA scoring: Optimal rank and basis. In Proceedings of the SIAM Computational Information Retrieval Workshop, pages 157–169, Raleigh, NC.

Kenneth Church and Patrick Hanks. 1989. Word association norms, mutual information, and lexicography. In Proceedings of the 27th Annual Conference of the Association of Computational Linguistics, pages 76–83, Vancouver, British Columbia.

Daoud Clarke. 2012. A context-theoretic framework for compositionality in distributional semantics. Computational Linguistics, 38(1):41–71.

Katrin Erk. 2013. Towards a semantics for distributional representations. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), Potsdam, Germany.

John Rupert Firth. 1957. A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis, pages 1–32. Blackwell, Oxford.

Dan Garrette, Katrin Erk, and Ray Mooney. 2011. Integrating logical representations with probabilistic information using Markov logic. In Proceedings of the 9th International Conference on Computational Semantics (IWCS 2011), pages 105–114.
Dedre Gentner. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2):155–170.

Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. 2007. SemEval-2007 Task 04: Classification of semantic relations between nominals. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval 2007), pages 13–18, Prague, Czech Republic.

Emiliano Guevara. 2010. A regression model of adjective-noun compositionality in distributional semantics. In Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics (GEMS 2010), pages 33–37.

Zellig Harris. 1954. Distributional structure. Word, 10(23):146–162.

Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. 2010. SemEval-2010 Task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 33–38, Uppsala, Sweden.

Amaç Herdağdelen and Marco Baroni. 2009. BagPack: A general framework to represent semantic relations. In Proceedings of the EACL 2009 Geometrical Models for Natural Language Semantics (GEMS) Workshop, pages 33–40.

Nathalie Japkowicz and Shaju Stephen. 2002. The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5):429–449.

David A. Jurgens, Saif M. Mohammad, Peter D. Turney, and Keith J. Holyoak. 2012. SemEval-2012 Task 2: Measuring degrees of relational similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), pages 356–364, Montréal, Canada.

Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Benoît Lemaire and Guy Denhière. 2006. Effects of high-order co-occurrences on word semantic similarity. Current Psychology Letters: Behaviour, Brain & Cognition, 18(1).

Yves Lepage and Ando Shin-ichi. 1996. Saussurian analogy: A theoretical account and its application. In Proceedings of the 16th International Conference on Computational Linguistics (COLING 1996), pages 717–722.

Kevin Lund, Curt Burgess, and Ruth Ann Atchley. 1995. Semantic and associative priming in high-dimensional semantic space. In Proceedings of the 17th Annual Conference of the Cognitive Science Society, pages 660–665.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2013), Atlanta, Georgia.

Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In Proceedings of ACL-08: HLT, pages 236–244, Columbus, Ohio. Association for Computational Linguistics.

Jeff Mitchell and Mirella Lapata. 2010. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429.

Ted Pedersen. 2012. Duluth: Measuring degrees of relational similarity with the gloss vector measure of semantic relatedness. In First Joint Conference on Lexical and Computational Semantics (*SEM), pages 497–501, Montreal, Canada.

John C. Platt. 1998. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning, pages 185–208, Cambridge, MA. MIT Press.

Bryan Rink and Sanda Harabagiu. 2012. UTD: Determining relational similarity using lexical patterns. In First Joint Conference on Lexical and Computational Semantics (*SEM), pages 413–418, Montreal, Canada.
Bryan Rink and Sanda Harabagiu. 2013. The impact of selectional preference agreement on semantic relational similarity. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), Potsdam, Germany.

Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems (NIPS 2011), pages 801–809.

Richard Socher, Brody Huval, Christopher Manning, and Andrew Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), pages 1201–1211.

Mireya Tovar, J. Alejandro Reyes, Azucena Montes, Darnes Vilariño, David Pinto, and Saul León. 2012. BUAP: A first approximation to relational similarity measuring. In First Joint Conference on Lexical and Computational Semantics (*SEM), pages 502–505, Montreal, Canada.

Peter D. Turney and Michael L. Littman. 2005. Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3):251–278.

Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.

Peter D. Turney, Michael L. Littman, Jeffrey Bigham, and Victor Shnayder. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 482–489, Borovets, Bulgaria.

Peter D. Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 680–690.

Peter D. Turney. 2006a. Expressing implicit semantic relations without supervision. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (Coling/ACL-06), pages 313–320, Sydney, Australia.

Peter D. Turney. 2006b. Similarity of semantic relations. Computational Linguistics, 32(3):379–416.

Peter D. Turney. 2008a. The latent relation mapping engine: Algorithm and experiments. Journal of Artificial Intelligence Research, 33:615–655.

Peter D. Turney. 2008b. A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 905–912, Manchester, UK.

Peter D. Turney. 2012. Domain and function: A dual-space model of semantic relations and compositions. Journal of Artificial Intelligence Research, 44:533–585.

Tony Veale. 2004. WordNet sits the SAT: A knowledge-based approach to lexical analogy. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 606–612, Valencia, Spain.

Ian H. Witten, Eibe Frank, and Mark A. Hall. 2011. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. Morgan Kaufmann, San Francisco.

Alisa Zhila, Wen-tau Yih, Christopher Meek, Geoffrey Zweig, and Tomas Mikolov. 2013. Combining heterogeneous models for measuring relational similarity. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2013), Atlanta, Georgia.