Transactions of the Association for Computational Linguistics, vol. 4, pp. 31–45, 2016. Action Editor: Tim Baldwin.
Submission batch: 12/2015; Revision batch: 2/2016; Published 2/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
A Bayesian Model of Diachronic Meaning Change

Lea Frermann and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
l.frermann@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

Word meanings change over time and an automated procedure for extracting this information from text would be useful for historical exploratory studies, information retrieval or question answering. We present a dynamic Bayesian model of diachronic meaning change, which infers temporal word representations as a set of senses and their prevalence. Unlike previous work, we explicitly model language change as a smooth, gradual process. We experimentally show that this modeling decision is beneficial: our model performs competitively on meaning change detection tasks whilst inducing discernible word senses and their development over time. Application of our model to the SemEval-2015 temporal classification benchmark datasets further reveals that it performs on par with highly optimized task-specific systems.

1 Introduction

Language is a dynamic system, constantly evolving and adapting to the needs of its users and their environment (Aitchison, 2001). Words in all languages naturally exhibit a range of senses whose distribution or prevalence varies according to the genre and register of the discourse as well as its historical context. As an example, consider the word cute which according to the Oxford English Dictionary (OED, Stevenson 2010) first appeared in the early 18th century and originally meant clever or keen-witted.1 By the late 19th century cute was used in the same sense as cunning. Today it mostly refers to objects or people perceived as attractive, pretty or sweet. Another example is the word mouse which initially was only used in the rodent sense. The OED dates the computer pointing device sense of mouse to 1965. The latter sense has become particularly dominant in recent decades due to the ever-increasing use of computer technology.

1 Throughout this paper we denote words in truetype, their senses in italics, and sense-specific context words as {liza}.

The arrival of large-scale collections of historic texts (Davies, 2010) and online libraries such as the Internet Archive and Google Books have greatly facilitated computational investigations of language change. The ability to automatically detect how the meaning of words evolves over time is potentially of significant value to lexicographic and linguistic research but also to real world applications. Time-specific knowledge would presumably render word meaning representations more accurate, and benefit several downstream tasks where semantic information is crucial. Examples include information retrieval and question answering, where time-related information could increase the precision of query disambiguation and document retrieval (e.g., by returning documents with newly created senses or filtering out documents with obsolete senses).

In this paper we present a dynamic Bayesian model of diachronic meaning change. Word meaning is modeled as a set of senses, which are tracked over a sequence of contiguous time intervals. We infer temporal meaning representations, consisting of a word's senses (as a probability distribution over words) and their relative prevalence. Our model is thus able to detect that mouse had one sense until the mid-20th century (characterized by words such as {cheese, tail, rat}) and subsequently acquired a
second sense relating to computer devices. Moreover, it infers subtle changes within a single sense. For instance, in the 1970s the words {cable, ball, mousepad} were typical for the computer device sense, whereas nowadays the terms {optical, laser, usb} are more typical. Contrary to previous work (Mitra et al., 2014; Mihalcea and Nastase, 2012; Gulordava and Baroni, 2011) where temporal representations are learnt in isolation, our model assumes that adjacent representations are co-dependent, thus capturing the nature of meaning change being fundamentally smooth and gradual (McMahon, 1994). This also serves as a form of smoothing: temporally neighboring representations influence each other if the available data is sparse.

Experimental evaluation shows that our model (a) induces temporal representations which reflect word senses and their development over time, (b) is able to detect meaning change between two time periods, and (c) is expressive enough to obtain useful features for identifying the time interval in which a piece of text was written. Overall, our results indicate that an explicit model of temporal dynamics is advantageous for tracking meaning change. Comparisons across evaluations and against a variety of related systems show that despite not being designed with any particular task in mind, our model performs competitively across the board.

2 Related Work

Most work on diachronic language change has focused on detecting whether and to what extent a word's meaning changed (e.g., between two epochs) without identifying word senses and how these vary over time. A variety of methods have been applied to the task, ranging from the use of statistical tests in order to detect significant changes in the distribution of terms from two time periods (Popescu and Strapparava, 2013; Cook and Stevenson, 2010), to training distributional similarity models on time slices (Gulordava and Baroni, 2011; Sagi et al., 2009), and neural language models (Kim et al., 2014; Kulkarni et al., 2015). Other work (Mihalcea and Nastase, 2012) takes a supervised learning approach and predicts the time period to which a word belongs given its surrounding context.

Bayesian models have been previously developed for various tasks in lexical semantics (Brody and Lapata, 2009; Ó Séaghdha, 2010; Ritter et al., 2010) and word meaning change detection is no exception. Using techniques from non-parametric topic modeling, Lau et al. (2012) induce word senses (aka topics) for a given target word over two time periods. Novel senses are then detected based on the discrepancy between sense distributions in the two periods. Follow-up work (Cook et al., 2014; Lau et al., 2014) further explores methods for how to best measure this sense discrepancy. Rather than inferring word senses, Wijaya and Yeniterzi (2011) use a Topics-over-Time model and k-means clustering to identify the periods during which selected words move from one topic to another.

A non-Bayesian approach is put forward in Mitra et al. (2014, 2015) who adopt a graph-based framework for representing word meaning (see Tahmasebi et al. (2011) for a similar earlier proposal). In this model words correspond to nodes in a semantic network and edges are drawn between words sharing contextual features (extracted from a dependency parser). A graph is constructed for each time interval, and nodes are clustered into senses with Chinese Whispers (Biemann, 2006), a randomized graph clustering algorithm. By comparing the induced senses for each time slice and observing inter-cluster differences, their method can detect whether senses emerge or disappear.

Our work draws ideas from dynamic topic modeling (Blei and Lafferty, 2006b) where the evolution of topics is modeled via (smooth) changes in their associated distributions over the vocabulary. Although the dynamic component of our model is closely related to previous work in this area (Mimno et al., 2008), our model is specifically constructed for capturing sense rather than topic change. Our approach is conceptually similar to Lau et al. (2012). We also learn a joint sense representation for multiple time slices. However, in our case the number of time slices is not restricted to two and we explicitly model temporal dynamics. Like Mitra et al. (2014, 2015), we model how senses change over time. In our model, temporal representations are not independent, but influenced by their temporal neighbors, encouraging smooth change over time. We therefore induce a global and consistent set of temporal representations for each word. Our model is knowledge-lean (it does not make use of a parser) and language
independent (all that is needed is a time-stamped corpus and tools for basic pre-processing). Contrary to Mitra et al. (2014, 2015), we do not treat the tasks of inferring a semantic representation for words and their senses as two separate processes.

Evaluation of models which detect meaning change is fraught with difficulties. There is no standard set of words which have undergone meaning change, or benchmark corpus which represents a variety of time intervals and genres and is thematically consistent. Previous work has generally focused on a few hand-selected words, and models were evaluated qualitatively by inspecting their output, or the extent to which they can detect meaning changes from two time periods. For example, Cook et al. (2014) manually identify 13 target words which undergo meaning change in a focus corpus with respect to a reference corpus (both news text). They then assess how their models fare at learning sense differences for these targets compared to distractors which did not undergo meaning change. They also underline the importance of using thematically comparable reference and focus corpora to avoid spurious differences in word representations.

In this work we evaluate our model's ability to detect and quantify meaning change across several time intervals (not just two). Instead of relying on a few hand-selected target words, we use larger sets sampled from our learning corpus or found to undergo meaning change in a judgment elicitation study (Gulordava and Baroni, 2011). In addition, we adopt the evaluation paradigm of Mitra et al. (2014) and validate our findings against WordNet. Finally, we apply our model to the recently established SemEval-2015 diachronic text evaluation subtasks (Popescu and Strapparava, 2015). In order to present a consistent set of experiments, we use our own corpus throughout, which covers a wider range of time intervals and is compiled from a variety of genres and sources and is thus thematically coherent (see Section 4 for details). Wherever possible, we compare against prior art, with the caveat that the use of a different underlying corpus unavoidably influences the obtained semantic representations.

3 A Bayesian Model of Sense Change

In this section we introduce SCAN, our dynamic Bayesian model of Sense ChANge. SCAN captures how a word's senses evolve over time (e.g., whether new senses emerge), whether some senses become more or less prevalent, as well as phenomena pertaining to individual senses such as meaning extension, shift, or modification. We assume that time is discrete, divided into contiguous intervals. Given a word, our model infers its senses for each time interval and their probability. It captures the gradual nature of meaning change explicitly, through dependencies between temporally adjacent meaning representations. Senses themselves are expressed as a probability distribution over words, which can also change over time.

3.1 Model Description

We create a SCAN model for each target word c. The input to the model is a corpus of short text snippets, each consisting of a mention of the target word c and its local context w (in our experiments this is a symmetric context window of ±5 words). Each snippet is annotated with its year of origin. The model is parametrized with regard to the number of senses k ∈ [1..K] of the target word c, and the length of time intervals ∆T, which might be finely or coarsely defined (e.g., spanning a year or a decade).

We conflate all documents originating from the same time interval t ∈ [1..T] and infer a temporal representation of the target word per interval. A temporal meaning representation for time t is (a) a K-dimensional multinomial distribution over word senses φt and (b) a V-dimensional distribution over the vocabulary ψt,k for each word sense k. In addition, our model infers a precision parameter κφ, which controls the extent to which sense prevalence changes for word c over time (see Section 3.2 for details on how we model temporal dynamics).

We place individual logistic normal priors (Blei and Lafferty, 2006a) on our multinomial sense distributions φ and sense-word distributions ψk. A draw from the logistic normal distribution consists of (a) a draw of an n-dimensional random vector x from the multivariate normal distribution parametrized by an n-dimensional mean vector µ and an n × n variance-covariance matrix Σ, x ∼ N(x | µ, Σ); and (b) a mapping of the drawn parameters to the simplex through the logistic transformation φn = exp(xn) / Σn′ exp(xn′), which ensures a draw of valid multinomial parameters. The normal distributions are parametrized to encourage smooth
change in multinomial parameters over time (see Section 3.2 for details), and the extent of change is controlled through a precision parameter κ. We learn the value of κφ during inference, which allows us to model the extent of temporal change in sense prevalence individually for each target word. We draw κφ from a conjugate Gamma prior. We do not infer the sense-word precision parameter κψ on all ψk. Instead, we fix it at a high value, triggering little variation of word distributions within senses. This leads to senses being thematically coherent over time.

[Figure 1, left panel: plate diagram (not reproduced).] Right panel, the generative story:

Draw κφ ∼ Gamma(a, b)
for time interval t = 1..T do
    Draw sense distribution φt | φ−t, κφ ∼ N(½(φt−1 + φt+1), κφ)
    for sense k = 1..K do
        Draw word distribution ψt,k | ψ−t, κψ ∼ N(½(ψt−1,k + ψt+1,k), κψ)
for document d = 1..D do
    Draw sense zd ∼ Mult(φt)
    for context position i = 1..I do
        Draw word wd,i ∼ Mult(ψt,zd)

Figure 1: Left: plate diagram for the dynamic sense model for three time steps {t−1, t, t+1}. Constant parameters are shown as dashed nodes, latent variables as clear nodes, and observed variables as gray nodes. Right: the corresponding generative story.

We now describe the generative story of our model, which is depicted in Figure 1 (right), alongside its plate diagram representation (left). First, we draw the sense precision parameter κφ from a Gamma prior. For each time interval t we draw (a) a multinomial distribution over senses φt from a logistic normal prior; and (b) a multinomial distribution over the vocabulary ψt,k for each sense k, from another logistic normal prior. Next, we generate time-specific text snippets. For each snippet d, we first observe the time interval t, and draw a sense zd from Mult(φt). Finally, we generate I context words wd,i independently from Mult(ψt,zd).

3.2 Background on iGMRFs

Let φ = {φ1 ... φT} denote a T-dimensional random vector, where each φt might for example correspond to a sense probability at time t. We define a prior which encourages smooth change of parameters at neighboring times, in terms of a first order random walk on the line (graphically shown in Figure 2, and the chains of φ and ψ in Figure 1 (left)). Specifically, we define this prior as an intrinsic Gaussian Markov Random Field (iGMRF; Rue and Held 2005), which allows us to model the change of adjacent parameters as drawn from a normal distribution, e.g.:

∆φt ∼ N(0, κ^−1).  (1)

The iGMRF is defined with respect to the graph in Figure 2; it is sparsely connected with only first-order dependencies which allows for efficient inference. A second feature, which makes iGMRFs popular as priors in Bayesian modeling, is the fact that they can be defined purely in terms of the local changes between dependent (i.e., adjacent) variables, without the need to specify an overall mean of the model. The full conditionals explicitly capture these intuitions:

φt | φ−t, κ ∼ N(½(φt−1 + φt+1), (2κ)^−1),  (2)

for 1
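The two-step logistic normal draw described in Section 3.1 can be sketched in a few lines (a minimal illustration, not the authors' implementation; the dimensionality K = 3 and the standard-normal parameters µ = 0, Σ = I are arbitrary assumptions for the example):

```python
import numpy as np

def draw_logistic_normal(mu, sigma, rng):
    """Logistic normal draw: (a) sample x ~ N(mu, Sigma), then (b) map x to
    the simplex via the logistic transformation phi_n = exp(x_n) / sum_n' exp(x_n')."""
    x = rng.multivariate_normal(mu, sigma)
    e = np.exp(x - x.max())  # subtracting the max is a standard numerical-stability trick
    return e / e.sum()

rng = np.random.default_rng(0)
K = 3  # hypothetical number of senses
phi = draw_logistic_normal(np.zeros(K), np.eye(K), rng)
# phi is a valid K-dimensional multinomial parameter vector: non-negative, sums to 1
```

Because the exponential is strictly positive and the division normalizes the vector, any Gaussian draw maps to valid multinomial parameters, which is exactly what the prior requires.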
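Figure 1's generative story, combined with the first-order random walk of Eq. (1), can be forward-simulated as a sketch. This is illustrative only, not the paper's inference code: the Gamma hyperparameters a = 7, b = 1, the fixed precision κψ = 100, and all corpus sizes are invented for the example, and the random-walk increments are applied to unconstrained parameters which are then mapped to the simplex by the logistic transformation.

```python
import numpy as np

def softmax(x):
    """Logistic transformation onto the simplex, applied along the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def simulate_scan(T=5, K=3, V=10, D=20, I=10, seed=0):
    """Forward-simulate the SCAN generative story (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    kappa_phi = rng.gamma(7.0, 1.0)  # kappa_phi ~ Gamma(a, b); a, b are assumed values
    kappa_psi = 100.0                # fixed high precision -> thematically coherent senses
    # First-order random walk, Eq. (1): increments ~ N(0, kappa^{-1}), accumulated
    # over time on the unconstrained parameters, then mapped to the simplex.
    x_phi = np.cumsum(rng.normal(0.0, kappa_phi ** -0.5, size=(T, K)), axis=0)
    x_psi = np.cumsum(rng.normal(0.0, kappa_psi ** -0.5, size=(T, K, V)), axis=0)
    phi, psi = softmax(x_phi), softmax(x_psi)  # phi[t]: sense dist.; psi[t, k]: word dist.
    docs = []
    for t in range(T):                # snippets for each time interval
        for _ in range(D):
            z = rng.choice(K, p=phi[t])                 # z_d ~ Mult(phi_t)
            w = rng.choice(V, size=I, p=psi[t, z])      # w_{d,i} ~ Mult(psi_{t, z_d})
            docs.append((t, z, w))
    return phi, psi, docs
```

Because κψ is fixed at a large value, the word distributions ψt,k drift only slightly between adjacent intervals, mirroring the design choice of keeping senses thematically coherent over time, while the inferred κφ lets sense prevalence shift more freely.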