Transactions of the Association for Computational Linguistics, vol. 4, pp. 31–45, 2016. Action Editor: Tim Baldwin.
Submission batch: 12/2015; Revision batch: 2/2016; Published 2/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
A Bayesian Model of Diachronic Meaning Change

Lea Frermann and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
l.frermann@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

Word meanings change over time and an automated procedure for extracting this information from text would be useful for historical exploratory studies, information retrieval or question answering. We present a dynamic Bayesian model of diachronic meaning change, which infers temporal word representations as a set of senses and their prevalence. Unlike previous work, we explicitly model language change as a smooth, gradual process. We experimentally show that this modeling decision is beneficial: our model performs competitively on meaning change detection tasks whilst inducing discernible word senses and their development over time. Application of our model to the SemEval-2015 temporal classification benchmark datasets further reveals that it performs on par with highly optimized task-specific systems.

1 Introduction

Language is a dynamic system, constantly evolving and adapting to the needs of its users and their environment (Aitchison, 2001). Words in all languages naturally exhibit a range of senses whose distribution or prevalence varies according to the genre and register of the discourse as well as its historical context. As an example, consider the word cute which according to the Oxford English Dictionary (OED, Stevenson 2010) first appeared in the early 18th century and originally meant clever or keen-witted.¹ By the late 19th century cute was used in the same sense as cunning. Today it mostly refers to objects or people perceived as attractive, pretty or sweet. Another example is the word mouse which initially was only used in the rodent sense. The OED dates the computer pointing device sense of mouse to 1965. The latter sense has become particularly dominant in recent decades due to the ever-increasing use of computer technology.

¹ Throughout this paper we denote words in truetype, their senses in italics, and sense-specific context words as {list}.

The arrival of large-scale collections of historic texts (Davies, 2010) and online libraries such as the Internet Archive and Google Books have greatly facilitated computational investigations of language change. The ability to automatically detect how the meaning of words evolves over time is potentially of significant value to lexicographic and linguistic research but also to real world applications. Time-specific knowledge would presumably render word meaning representations more accurate, and benefit several downstream tasks where semantic information is crucial. Examples include information retrieval and question answering, where time-related information could increase the precision of query disambiguation and document retrieval (e.g., by returning documents with newly created senses or filtering out documents with obsolete senses).

In this paper we present a dynamic Bayesian model of diachronic meaning change. Word meaning is modeled as a set of senses, which are tracked over a sequence of contiguous time intervals. We infer temporal meaning representations, consisting of a word's senses (as a probability distribution over words) and their relative prevalence. Our model is thus able to detect that mouse had one sense until the mid-20th century (characterized by words such as {cheese, tail, rat}) and subsequently acquired a
second sense relating to the computer device. Moreover, it infers subtle changes within a single sense. For instance, in the 1970s the words {cable, ball, mousepad} were typical for the computer device sense, whereas nowadays the terms {optical, laser, usb} are more typical. Contrary to previous work (Mitra et al., 2014; Mihalcea and Nastase, 2012; Gulordava and Baroni, 2011) where temporal representations are learnt in isolation, our model assumes that adjacent representations are co-dependent, thus capturing the nature of meaning change being fundamentally smooth and gradual (McMahon, 1994). This also serves as a form of smoothing: temporally neighboring representations influence each other if the available data is sparse.

Experimental evaluation shows that our model (a) induces temporal representations which reflect word senses and their development over time, (b) is able to detect meaning change between two time periods, and (c) is expressive enough to obtain useful features for identifying the time interval in which a piece of text was written. Overall, our results indicate that an explicit model of temporal dynamics is advantageous for tracking meaning change. Comparisons across evaluations and against a variety of related systems show that despite not being designed with any particular task in mind, our model performs competitively across the board.

2 Related Work

Most work on diachronic language change has focused on detecting whether and to what extent a word's meaning changed (e.g., between two epochs) without identifying word senses and how these vary over time. A variety of methods have been applied to the task ranging from the use of statistical tests in order to detect significant changes in the distribution of terms from two time periods (Popescu and Strapparava, 2013; Cook and Stevenson, 2010), to training distributional similarity models on time slices (Gulordava and Baroni, 2011; Sagi et al., 2009), and neural language models (Kim et al., 2014; Kulkarni et al., 2015). Other work (Mihalcea and Nastase, 2012) takes a supervised learning approach and predicts the time period to which a word belongs given its surrounding context.

Bayesian models have been previously developed for various tasks in lexical semantics (Brody and Lapata, 2009; Ó Séaghdha, 2010; Ritter et al., 2010) and word meaning change detection is no exception. Using techniques from non-parametric topic modeling, Lau et al. (2012) induce word senses (aka topics) for a given target word over two time periods. Novel senses are then detected based on the discrepancy between sense distributions in the two periods. Follow-up work (Cook et al., 2014; Lau et al., 2014) further explores methods for how to best measure this sense discrepancy. Rather than inferring word senses, Wijaya and Yeniterzi (2011) use a Topics-over-Time model and k-means clustering to identify the periods during which selected words move from one topic to another.

A non-Bayesian approach is put forward in Mitra et al. (2014, 2015) who adopt a graph-based framework for representing word meaning (see Tahmasebi et al. (2011) for a similar earlier proposal). In this model words correspond to nodes in a semantic network and edges are drawn between words sharing contextual features (extracted from a dependency parser). A graph is constructed for each time interval, and nodes are clustered into senses with Chinese Whispers (Biemann, 2006), a randomized graph clustering algorithm. By comparing the induced senses for each time slice and observing inter-cluster differences, their method can detect whether senses emerge or disappear.

Our work draws ideas from dynamic topic modeling (Blei and Lafferty, 2006b) where the evolution of topics is modeled via (smooth) changes in their associated distributions over the vocabulary. Although the dynamic component of our model is closely related to previous work in this area (Mimno et al., 2008), our model is specifically constructed for capturing sense rather than topic change. Our approach is conceptually similar to Lau et al. (2012). We also learn a joint sense representation for multiple time slices. However, in our case the number of time slices is not restricted to two and we explicitly model temporal dynamics. Like Mitra et al. (2014, 2015), we model how senses change over time. In our model, temporal representations are not independent, but influenced by their temporal neighbors, encouraging smooth change over time. We therefore induce a global and consistent set of temporal representations for each word. Our model is knowledge-lean (it does not make use of a parser) and language
independent (all that is needed is a time-stamped corpus and tools for basic pre-processing). Contrary to Mitra et al. (2014, 2015), we do not treat the tasks of inferring a semantic representation for words and their senses as two separate processes.

Evaluation of models which detect meaning change is fraught with difficulties. There is no standard set of words which have undergone meaning change or benchmark corpus which represents a variety of time intervals and genres, and is thematically consistent. Previous work has generally focused on a few hand-selected words and models were evaluated qualitatively by inspecting their output, or the extent to which they can detect meaning changes from two time periods. For example, Cook et al. (2014) manually identify 13 target words which undergo meaning change in a focus corpus with respect to a reference corpus (both news text). They then assess how their models fare at learning sense differences for these targets compared to distractors which did not undergo meaning change. They also underline the importance of using thematically comparable reference and focus corpora to avoid spurious differences in word representations.

In this work we evaluate our model's ability to detect and quantify meaning change across several time intervals (not just two). Instead of relying on a few hand-selected target words, we use larger sets sampled from our learning corpus or found to undergo meaning change in a judgment elicitation study (Gulordava and Baroni, 2011). In addition, we adopt the evaluation paradigm of Mitra et al. (2014) and validate our findings against WordNet. Finally, we apply our model to the recently established SemEval-2015 diachronic text evaluation subtasks (Popescu and Strapparava, 2015). In order to present a consistent set of experiments, we use our own corpus throughout which covers a wide range of time intervals, is compiled from a variety of genres and sources, and is thus thematically coherent (see Section 4 for details). Wherever possible, we compare against prior art, with the caveat that the use of a different underlying corpus unavoidably influences the obtained semantic representations.

3 A Bayesian Model of Sense Change

In this section we introduce SCAN, our dynamic Bayesian model of Sense ChANge. SCAN captures how a word's senses evolve over time (e.g., whether new senses emerge), whether some senses become more or less prevalent, as well as phenomena pertaining to individual senses such as meaning extension, shift, or modification. We assume that time is discrete, divided into contiguous intervals. Given a word, our model infers its senses for each time interval and their probability. It captures the gradual nature of meaning change explicitly, through dependencies between temporally adjacent meaning representations. Senses themselves are expressed as a probability distribution over words, which can also change over time.

3.1 Model Description

We create a SCAN model for each target word c. The input to the model is a corpus of short text snippets, each consisting of a mention of the target word c and its local context w (in our experiments this is a symmetric context window of ±5 words). Each snippet is annotated with its year of origin. The model is parametrized with regard to the number of senses k ∈ [1...K] of the target word c, and the length of time intervals ∆T which might be finely or coarsely defined (e.g., spanning a year or a decade).

We conflate all documents originating from the same time interval t ∈ [1...T] and infer a temporal representation of the target word per interval. A temporal meaning representation for time t is (a) a K-dimensional multinomial distribution over word senses φt and (b) a V-dimensional distribution over the vocabulary ψt,k for each word sense k. In addition, our model infers a precision parameter κφ, which controls the extent to which sense prevalence changes for word c over time (see Section 3.2 for details on how we model temporal dynamics).

We place individual logistic normal priors (Blei and Lafferty, 2006a) on our multinomial sense distributions φ and sense-word distributions ψk. A draw from the logistic normal distribution consists of (a) a draw of an n-dimensional random vector x from the multivariate normal distribution parametrized by an n-dimensional mean vector µ and an n×n variance-covariance matrix Σ, x ∼ N(x | µ, Σ); and (b) a mapping of the drawn parameters to the simplex through the logistic transformation φn = exp(xn) / Σn′ exp(xn′), which ensures a draw of valid multinomial parameters. The normal distributions are parametrized to encourage smooth
change in multinomial parameters over time (see Section 3.2 for details), and the extent of change is controlled through a precision parameter κ. We learn the value of κφ during inference, which allows us to model the extent of temporal change in sense prevalence individually for each target word. We draw κφ from a conjugate Gamma prior. We do not infer the sense-word precision parameter κψ on all ψk. Instead, we fix it at a high value, triggering little variation of word distributions within senses. This leads to senses being thematically coherent over time.

We now describe the generative story of our model, which is depicted in Figure 1 (right), alongside its plate diagram representation (left). First, we draw the sense precision parameter κφ from a Gamma prior. For each time interval t we draw (a) a multinomial distribution over senses φt from a logistic normal prior; and (b) a multinomial distribution over the vocabulary ψt,k for each sense k, from another logistic normal prior. Next, we generate time-specific text snippets. For each snippet d, we first observe the time interval t, and draw a sense zd from Mult(φt). Finally, we generate I context words wd,i independently from Mult(ψt,zd).

[Figure 1 plate diagram: three chained time steps with sense distributions φt−1, φt, φt+1 (precision κφ) and, per sense, word distributions ψt−1, ψt, ψt+1 (precision κψ), over D documents with sense assignments z and observed words w; K senses.]

Draw κφ ∼ Gamma(a, b)
for time interval t = 1..T do
    Draw sense distribution φt | φ−t, κφ ∼ N(½(φt−1 + φt+1), κφ)
    for sense k = 1..K do
        Draw word distribution ψt,k | ψ−t, κψ ∼ N(½(ψt−1,k + ψt+1,k), κψ)
    for document d = 1..D do
        Draw sense zd ∼ Mult(φt)
        for context position i = 1..I do
            Draw word wd,i ∼ Mult(ψt,zd)

Figure 1: Left: plate diagram for the dynamic sense model for three time steps {t−1, t, t+1}. Constant parameters are shown as dashed nodes, latent variables as clear nodes, and observed variables as gray nodes. Right: the corresponding generative story.

3.2 Background on iGMRFs

Let φ = {φ1 ... φT} denote a T-dimensional random vector, where each φt might for example correspond to a sense probability at time t. We define a prior which encourages smooth change of parameters at neighboring times, in terms of a first order random walk on the line (graphically shown in Figure 2, and the chains of φ and ψ in Figure 1 (left)). Specifically, we define this prior as an intrinsic Gaussian Markov Random Field (iGMRF; Rue and Held 2005), which allows us to model the change of adjacent parameters as drawn from a normal distribution, e.g.:

∆φt ∼ N(0, κ⁻¹).  (1)

The iGMRF is defined with respect to the graph in Figure 2; it is sparsely connected with only first-order dependencies which allows for efficient inference. A second feature, which makes iGMRFs popular as priors in Bayesian modeling, is the fact that they can be defined purely in terms of the local changes between dependent (i.e., adjacent) variables, without the need to specify an overall mean of the model. The full conditionals explicitly capture these intuitions:

φt | φ−t, κ ∼ N(½(φt−1 + φt+1), 1/(2κ)),  (2)

for 1 < t < T.
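As a concrete illustration, the two ingredients above (the first-order random-walk prior of Eq. 1 and the logistic transformation of Section 3.1) can be simulated in a few lines of Python. This is a minimal sketch, not the authors' inference code: it draws unnormalized sense scores x_t by a Gaussian random walk and maps each x_t to the simplex, yielding smoothly drifting sense distributions φ_t; the values of K, T, and kappa are illustrative.

```python
import math
import random

random.seed(0)

K = 3         # number of senses (illustrative)
T = 10        # number of time intervals (illustrative)
kappa = 50.0  # precision: larger kappa -> smoother change over time

def logistic(x):
    """Map a real vector to the simplex: phi_n = exp(x_n) / sum_n' exp(x_n')."""
    m = max(x)                        # subtract max for numerical stability
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]

# First-order random walk (Eq. 1): each increment is N(0, kappa^{-1}).
sd = kappa ** -0.5
x = [[random.gauss(0.0, 1.0) for _ in range(K)]]
for t in range(1, T):
    x.append([x[t - 1][k] + random.gauss(0.0, sd) for k in range(K)])

# Logistic transformation: valid multinomial sense distributions per interval.
phi = [logistic(xt) for xt in x]

def conditional_mean(x_prev, x_next):
    """Mean of the full conditional (Eq. 2) for an interior time step:
    the average of the temporal neighbors."""
    return [(a + b) / 2.0 for a, b in zip(x_prev, x_next)]

for t in range(T):
    assert abs(sum(phi[t]) - 1.0) < 1e-9  # each phi_t sums to one
```

Note how the full conditional in Eq. 2 depends only on the two temporal neighbors, which is what makes Gibbs-style updates over the chain cheap: each φt is resampled around the average of φt−1 and φt+1.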