Transactions of the Association for Computational Linguistics, vol. 4, pp. 31–45, 2016. Action Editor: Tim Baldwin.
Submission batch: 12/2015; Revision batch: 2/2016; Published 2/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
A Bayesian Model of Diachronic Meaning Change

Lea Frermann and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
l.frermann@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

Word meanings change over time and an automated procedure for extracting this information from text would be useful for historical exploratory studies, information retrieval or question answering. We present a dynamic Bayesian model of diachronic meaning change, which infers temporal word representations as a set of senses and their prevalence. Unlike previous work, we explicitly model language change as a smooth, gradual process. We experimentally show that this modeling decision is beneficial: our model performs competitively on meaning change detection tasks whilst inducing discernible word senses and their development over time. Application of our model to the SemEval-2015 temporal classification benchmark datasets further reveals that it performs on par with highly optimized task-specific systems.

1 Introduction

Language is a dynamic system, constantly evolving and adapting to the needs of its users and their environment (Aitchison, 2001). Words in all languages naturally exhibit a range of senses whose distribution or prevalence varies according to the genre and register of the discourse as well as its historical context. As an example, consider the word cute which according to the Oxford English Dictionary (OED, Stevenson 2010) first appeared in the early 18th century and originally meant clever or keen-witted.[1] By the late 19th century cute was used in the same sense as cunning. Today it mostly refers to objects or people perceived as attractive, pretty or sweet. Another example is the word mouse which initially was only used in the rodent sense. The OED dates the computer pointing device sense of mouse to 1965. The latter sense has become particularly dominant in recent decades due to the ever-increasing use of computer technology.

[1] Throughout this paper we denote words in true type, their senses in italics, and sense-specific context words as {lists}.

The arrival of large-scale collections of historic texts (Davies, 2010) and online libraries such as the Internet Archive and Google Books have greatly facilitated computational investigations of language change. The ability to automatically detect how the meaning of words evolves over time is potentially of significant value to lexicographic and linguistic research but also to real world applications. Time-specific knowledge would presumably render word meaning representations more accurate, and benefit several downstream tasks where semantic information is crucial. Examples include information retrieval and question answering, where time-related information could increase the precision of query disambiguation and document retrieval (e.g., by returning documents with newly created senses or filtering out documents with obsolete senses).

In this paper we present a dynamic Bayesian model of diachronic meaning change. Word meaning is modeled as a set of senses, which are tracked over a sequence of contiguous time intervals. We infer temporal meaning representations, consisting of a word's senses (as a probability distribution over words) and their relative prevalence. Our model is thus able to detect that mouse had one sense until the mid-20th century (characterized by words such as {cheese, tail, rat}) and subsequently acquired a
second sense relating to computer device. Moreover, it infers subtle changes within a single sense. For instance, in the 1970s the words {cable, ball, mousepad} were typical for the computer device sense, whereas nowadays the terms {optical, laser, usb} are more typical. Contrary to previous work (Mitra et al., 2014; Mihalcea and Nastase, 2012; Gulordava and Baroni, 2011) where temporal representations are learnt in isolation, our model assumes that adjacent representations are co-dependent, thus capturing the nature of meaning change being fundamentally smooth and gradual (McMahon, 1994). This also serves as a form of smoothing: temporally neighboring representations influence each other if the available data is sparse.

Experimental evaluation shows that our model (a) induces temporal representations which reflect word senses and their development over time, (b) is able to detect meaning change between two time periods, and (c) is expressive enough to obtain useful features for identifying the time interval in which a piece of text was written. Overall, our results indicate that an explicit model of temporal dynamics is advantageous for tracking meaning change. Comparisons across evaluations and against a variety of related systems show that despite not being designed with any particular task in mind, our model performs competitively across the board.

2 Related Work

Most work on diachronic language change has focused on detecting whether and to what extent a word's meaning changed (e.g., between two epochs) without identifying word senses and how these vary over time. A variety of methods have been applied to the task ranging from the use of statistical tests in order to detect significant changes in the distribution of terms from two time periods (Popescu and Strapparava, 2013; Cook and Stevenson, 2010), to training distributional similarity models on time slices (Gulordava and Baroni, 2011; Sagi et al., 2009), and neural language models (Kim et al., 2014; Kulkarni et al., 2015). Other work (Mihalcea and Nastase, 2012) takes a supervised learning approach and predicts the time period to which a word belongs given its surrounding context.

Bayesian models have been previously developed for various tasks in lexical semantics (Brody and Lapata, 2009; Ó Séaghdha, 2010; Ritter et al., 2010) and word meaning change detection is no exception. Using techniques from non-parametric topic modeling, Lau et al. (2012) induce word senses (aka topics) for a given target word over two time periods. Novel senses are then detected based on the discrepancy between sense distributions in the two periods. Follow-up work (Cook et al., 2014; Lau et al., 2014) further explores methods for how to best measure this sense discrepancy. Rather than inferring word senses, Wijaya and Yeniterzi (2011) use a Topics-over-Time model and k-means clustering to identify the periods during which selected words move from one topic to another.

A non-Bayesian approach is put forward in Mitra et al. (2014, 2015) who adopt a graph-based framework for representing word meaning (see Tahmasebi et al. (2011) for a similar earlier proposal). In this model words correspond to nodes in a semantic network and edges are drawn between words sharing contextual features (extracted from a dependency parser). A graph is constructed for each time interval, and nodes are clustered into senses with Chinese Whispers (Biemann, 2006), a randomized graph clustering algorithm. By comparing the induced senses for each time slice and observing inter-cluster differences, their method can detect whether senses emerge or disappear.

Our work draws ideas from dynamic topic modeling (Blei and Lafferty, 2006b) where the evolution of topics is modeled via (smooth) changes in their associated distributions over the vocabulary. Although the dynamic component of our model is closely related to previous work in this area (Mimno et al., 2008), our model is specifically constructed for capturing sense rather than topic change. Our approach is conceptually similar to Lau et al. (2012). We also learn a joint sense representation for multiple time slices. However, in our case the number of time slices is not restricted to two and we explicitly model temporal dynamics. Like Mitra et al. (2014, 2015), we model how senses change over time. In our model, temporal representations are not independent, but influenced by their temporal neighbors, encouraging smooth change over time. We therefore induce a global and consistent set of temporal representations for each word. Our model is knowledge-lean (it does not make use of a parser) and language
independent (all that is needed is a time-stamped corpus and tools for basic pre-processing). Contrary to Mitra et al. (2014, 2015), we do not treat the tasks of inferring a semantic representation for words and their senses as two separate processes.

Evaluation of models which detect meaning change is fraught with difficulties. There is no standard set of words which have undergone meaning change or benchmark corpus which represents a variety of time intervals and genres, and is thematically consistent. Previous work has generally focused on a few hand-selected words and models were evaluated qualitatively by inspecting their output, or the extent to which they can detect meaning changes from two time periods. For example, Cook et al. (2014) manually identify 13 target words which undergo meaning change in a focus corpus with respect to a reference corpus (both news text). They then assess how their models fare at learning sense differences for these targets compared to distractors which did not undergo meaning change. They also underline the importance of using thematically comparable reference and focus corpora to avoid spurious differences in word representations.

In this work we evaluate our model's ability to detect and quantify meaning change across several time intervals (not just two). Instead of relying on a few hand-selected target words, we use larger sets sampled from our learning corpus or found to undergo meaning change in a judgment elicitation study (Gulordava and Baroni, 2011). In addition, we adopt the evaluation paradigm of Mitra et al. (2014) and validate our findings against WordNet. Finally, we apply our model to the recently established SemEval-2015 diachronic text evaluation subtasks (Popescu and Strapparava, 2015). In order to present a consistent set of experiments, we use our own corpus throughout which covers a wider range of time intervals and is compiled from a variety of genres and sources and is thus thematically coherent (see Section 4 for details). Wherever possible, we compare against prior art, with the caveat that the use of a different underlying corpus unavoidably influences the obtained semantic representations.

3 A Bayesian Model of Sense Change

In this section we introduce SCAN, our dynamic Bayesian model of Sense ChANge. SCAN captures how a word's senses evolve over time (e.g., whether new senses emerge), whether some senses become more or less prevalent, as well as phenomena pertaining to individual senses such as meaning extension, shift, or modification. We assume that time is discrete, divided into contiguous intervals. Given a word, our model infers its senses for each time interval and their probability. It captures the gradual nature of meaning change explicitly, through dependencies between temporally adjacent meaning representations. Senses themselves are expressed as a probability distribution over words, which can also change over time.

3.1 Model Description

We create a SCAN model for each target word c. The input to the model is a corpus of short text snippets, each consisting of a mention of the target word c and its local context w (in our experiments this is a symmetric context window of ±5 words). Each snippet is annotated with its year of origin. The model is parametrized with regard to the number of senses k ∈ [1..K] of the target word c, and the length of time intervals ∆T which might be finely or coarsely defined (e.g., spanning a year or a decade).

We conflate all documents originating from the same time interval t ∈ [1..T] and infer a temporal representation of the target word per interval. A temporal meaning representation for time t is (a) a K-dimensional multinomial distribution over word senses φt and (b) a V-dimensional distribution over the vocabulary ψt,k for each word sense k. In addition, our model infers a precision parameter κφ, which controls the extent to which sense prevalence changes for word c over time (see Section 3.2 for details on how we model temporal dynamics).

We place individual logistic normal priors (Blei and Lafferty, 2006a) on our multinomial sense distributions φ and sense-word distributions ψk. A draw from the logistic normal distribution consists of (a) a draw of an n-dimensional random vector x from the multivariate normal distribution parametrized by an n-dimensional mean vector µ and an n×n variance-covariance matrix Σ, x ∼ N(x|µ, Σ); and (b) a mapping of the drawn parameters to the simplex through the logistic transformation φn = exp(xn) / Σn′ exp(xn′), which ensures a draw of valid multinomial parameters. The normal distributions are parametrized to encourage smooth
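The two-step logistic normal draw can be sketched in a few lines of NumPy (an illustrative sketch, not the authors' implementation; for simplicity the covariance Σ is assumed diagonal, so the Gaussian draw factorizes per dimension):

```python
import numpy as np

def draw_logistic_normal(mu, sigma, rng):
    """Draw multinomial parameters from a logistic normal distribution:
    (a) sample x ~ N(mu, Sigma), here with diagonal covariance sigma^2 * I;
    (b) map x onto the simplex via the logistic (softmax) transformation
        phi_n = exp(x_n) / sum_n' exp(x_n')."""
    x = rng.normal(loc=mu, scale=sigma)   # step (a): Gaussian draw
    e = np.exp(x - x.max())               # subtract max for numerical stability
    return e / e.sum()                    # step (b): valid multinomial parameters

rng = np.random.default_rng(0)
mu = np.zeros(4)                          # 4 senses, uninformative mean (hypothetical setup)
phi = draw_logistic_normal(mu, sigma=1.0, rng=rng)
assert np.isclose(phi.sum(), 1.0) and (phi > 0).all()
```

Subtracting the maximum before exponentiating leaves the softmax output unchanged but avoids overflow for large draws.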
[Figure 1. Left: plate diagram for the dynamic sense model for three time steps {t−1, t, t+1}, with nodes φt−1, φt, φt+1, κφ, a, b, ψt−1, ψt, ψt+1, κψ, z, w. Constant parameters are shown as dashed nodes, latent variables as clear nodes, and observed variables as gray nodes. Right: the corresponding generative story, reproduced below.]

    Draw κφ ∼ Gamma(a, b)
    for time interval t = 1..T do
        Draw sense distribution φt | φ−t, κφ ∼ N(½(φt−1 + φt+1), κφ)
        for sense k = 1..K do
            Draw word distribution ψt,k | ψ−t, κψ ∼ N(½(ψt−1,k + ψt+1,k), κψ)
        for document d = 1..D do
            Draw sense zd ∼ Mult(φt)
            for context position i = 1..I do
                Draw word wd,i ∼ Mult(ψt,zd)

change in multinomial parameters over time (see Section 3.2 for details), and the extent of change is controlled through a precision parameter κ. We learn the value of κφ during inference, which allows us to model the extent of temporal change in sense prevalence individually for each target word. We draw κφ from a conjugate Gamma prior. We do not infer the sense-word precision parameter κψ on all ψk. Instead, we fix it at a high value, triggering little variation of word distributions within senses. This leads to senses being thematically coherent over time.

We now describe the generative story of our model, which is depicted in Figure 1 (right), alongside its plate diagram representation (left). First, we draw the sense precision parameter κφ from a Gamma prior. For each time interval t we draw (a) a multinomial distribution over senses φt from a logistic normal prior; and (b) a multinomial distribution over the vocabulary ψt,k for each sense k, from another logistic normal prior. Next, we generate time-specific text snippets. For each snippet d, we first observe the time interval t, and draw a sense zd from Mult(φt). Finally, we generate I context words wd,i independently from Mult(ψt,zd).

3.2 Background on iGMRFs

Let φ = {φ1 . . . φT} denote a T-dimensional random vector, where each φt might for example correspond to a sense probability at time t. We define a prior which encourages smooth change of parameters at neighboring times, in terms of a first order random walk on the line (graphically shown in Figure 2, and the chains of φ and ψ in Figure 1 (left)). Concretely, we define this prior as an intrinsic Gaussian Markov Random Field (iGMRF; Rue and Held 2005), which allows us to model the change of adjacent parameters as drawn from a normal distribution, e.g.:

    ∆φt ∼ N(0, κ⁻¹).    (1)

The iGMRF is defined with respect to the graph in Figure 2; it is sparsely connected with only first-order dependencies, which allows for efficient inference. A second feature, which makes iGMRFs popular as priors in Bayesian modeling, is the fact that they can be defined purely in terms of the local changes between dependent (i.e., adjacent) variables, without the need to specify an overall mean of the model. The full conditionals explicitly capture these intuitions:

    φt | φ−t, κ ∼ N(½(φt−1 + φt+1), 1/(2κ)),    (2)

for 1 < t < T.
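The pieces above can be combined in a small forward simulation (hypothetical code under simplifying assumptions, not the authors' implementation): a chain of sense parameters is drawn as a first-order random walk with increments ∆xt ∼ N(0, κ⁻¹) in the spirit of equation (1), mapped to the simplex by the logistic transformation, and then used to generate time-stamped snippets following the generative story of Figure 1. Sense-word distributions are held fixed over time here, mimicking the high fixed precision κψ; all dimensions and hyperparameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# time intervals, senses, vocabulary size, snippets per interval, context words per snippet
T, K, V, D, I = 5, 3, 50, 20, 10

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

kappa_phi = rng.gamma(shape=7.0, scale=1.0)       # kappa_phi ~ Gamma(a, b), illustrative a, b

# First-order random walk over unconstrained sense parameters (eq. (1) style increments),
# then the logistic transformation yields a valid sense distribution phi_t per interval.
x = np.zeros((T, K))
for t in range(1, T):
    x[t] = x[t - 1] + rng.normal(0.0, 1.0 / np.sqrt(kappa_phi), size=K)
phi = [softmax(x[t]) for t in range(T)]

# Sense-word distributions psi_k, fixed across time for simplicity (high kappa_psi).
psi = [softmax(rng.normal(0.0, 1.0, size=V)) for k in range(K)]

corpus = []
for t in range(T):                                # observe the time interval t
    for d in range(D):
        z = rng.choice(K, p=phi[t])               # z_d ~ Mult(phi_t)
        w = rng.choice(V, size=I, p=psi[z])       # w_{d,i} ~ Mult(psi_{t, z_d})
        corpus.append((t, z, w))

assert len(corpus) == T * D
```

Because each φt is a perturbation of φt−1, sense prevalence drifts smoothly across intervals, which is exactly the behavior the iGMRF prior is meant to encourage.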