Transactions of the Association for Computational Linguistics, vol. 3, pp. 43–57, 2015. Action Editor: Janyce Wiebe.
Submission batch: 7/2014; Revision batch: 12/2014; Published 1/2015. © 2015 Association for Computational Linguistics.

SPRITE: Generalizing Topic Models with Structured Priors

Michael J. Paul and Mark Dredze
Department of Computer Science
Human Language Technology Center of Excellence
Johns Hopkins University, Baltimore, MD 21218
mpaul@cs.jhu.edu, mdredze@cs.jhu.edu

Abstract

We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We show that the model learns intuitive topics, outperforming several other topic models at predictive tasks.

1 Introduction

Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision. Yet people often have expectations about topics in a given corpus and how they should be structured for a particular task. It is crucial for the user experience that topics meet these expectations (Mimno et al., 2011; Talley et al., 2011) yet black box topic models provide no control over the desired output.

This paper presents SPRITE, a family of topic models that provide a flexible framework for encoding preferences as priors for how topics should be structured. SPRITE can incorporate many types of structure that have been considered in prior work, including hierarchies (Blei et al., 2003a; Mimno et al., 2007), factorizations (Paul and Dredze, 2012; Eisenstein et al., 2011), sparsity (Wang and Blei, 2009; Balasubramanyan and Cohen, 2013), correlations between topics (Blei and Lafferty, 2007; Li and McCallum, 2006), preferences over word choices (Andrzejewski et al., 2009; Paul and Dredze, 2013), and associations between topics and document attributes (Ramage et al., 2009; Mimno and McCallum, 2008).

SPRITE builds on a standard topic model, adding structure to the priors over the model parameters. The priors are given by log-linear functions of underlying components (§2), which provide additional latent structure that we will show can enrich the model in many ways. By applying particular constraints and priors to the component hyperparameters, a variety of structures can be induced such as hierarchies and factorizations (§3), and we will show that this framework captures many existing topic models (§4).

After describing the general form of the model, we show how SPRITE can be tailored to particular settings by describing a specific model for the applied task of jointly inferring topic hierarchies and perspective (§6). We experiment with this topic+perspective model on sets of political debates and online reviews (§7), and demonstrate that SPRITE learns desired structures while outperforming many baselines at predictive tasks.

2 Topic Modeling with Structured Priors

Our model family generalizes latent Dirichlet allocation (LDA) (Blei et al., 2003b). Under LDA, there are K topics, where a topic is a categorical distribution over V words parameterized by φ_k. Each document has a categorical distribution over topics, parameterized by θ_m for the mth document. Each observed word in a document is generated by drawing a topic z from θ_m, then drawing the word from φ_z. θ and φ have priors given by Dirichlet distributions.

Our generalization adds structure to the generation of the Dirichlet parameters. The priors for these parameters are modeled as log-linear combinations of underlying components. Components are real-valued vectors of length equal to the vocabulary size V (for priors over word distributions) or length equal to the number of topics K (for priors over topic distributions).

For example, we might assume that topics about sports like baseball and football share a common prior – given by a component – with general words about sports. A fine-grained topic about steroid use in sports might be created by combining components about broader topics like sports, medicine, and crime. By modeling the priors as combinations of components that are shared across all topics, we can learn interesting connections between topics, where components provide an additional latent layer for corpus understanding.

As we'll show in the next section, by imposing certain requirements on which components feed into which topics (or documents), we can induce a variety of model structures. For example, if we want to model a topic hierarchy, we require that each topic depend on exactly one parent component. If we want to jointly model topic and ideology in a corpus of political documents (§6), we make topic priors a combination of one component from each of two groups: a topical component and an ideological component, resulting in ideology-specific topics like "conservative economics".

Components construct priors as follows. For the topic-specific word distributions φ, there are C^(φ) topic components. The kth topic's prior over φ_k is a weighted combination (with coefficient vector β_k) of the C^(φ) components (where component c is denoted ω_c). For the document-specific topic distributions θ, there are C^(θ) document components. The mth document's prior over θ_m is a weighted combination (coefficients α_m) of the C^(θ) components (where component c is denoted δ_c).

Once conditioned on these priors, the model is identical to LDA. The generative story is described in Figure 1. We call this family of models SPRITE: Structured PRIor Topic modEls.

• Generate hyperparameters: α, β, δ, ω (§3)
• For each document m, generate parameters:
  1. θ̃_mk = exp(Σ_{c=1}^{C^(θ)} α_mc δ_ck), 1 ≤ k ≤ K
  2. θ_m ∼ Dirichlet(θ̃_m)
• For each topic k, generate parameters:
  1. φ̃_kv = exp(Σ_{c=1}^{C^(φ)} β_kc ω_cv), 1 ≤ v ≤ V
  2. φ_k ∼ Dirichlet(φ̃_k)
• For each token (m, n), generate data:
  1. Topic (unobserved): z_{m,n} ∼ θ_m
  2. Word (observed): w_{m,n} ∼ φ_{z_{m,n}}

Figure 1: The generative story of SPRITE. The difference from latent Dirichlet allocation (Blei et al., 2003b) is the generation of the Dirichlet parameters.

To illustrate the role that components can play, consider an example in which we are modeling research topics in a corpus of NLP abstracts (as we do in §7.3). Consider three speech-related topics: signal processing, automatic speech recognition, and dialog systems. Conceptualized as a hierarchy, these topics might belong to a higher level category of spoken language processing. SPRITE allows the relationship between these three topics to be defined in two ways. One, we can model that these topics will all have words in common. This is handled by the topic components – these three topics could all draw from a common "spoken language" topic component, with high-weight words such as speech and spoken, which informs the prior of all three topics. Second, we can model that these topics are likely to occur together in documents. For example, articles about dialog systems are likely to discuss automatic speech recognition as a subroutine. This is handled by the document components – there could be a "spoken language" document component that gives high weight to all three topics, so that if a document draws its prior from this component, then it is more likely to give probability to these topics together.

The next section will describe how particular priors over the coefficients can induce various structures such as hierarchies and factorizations, and components and coefficients can also be provided as input to incorporate supervision and prior knowledge. The general prior structure used in SPRITE can be used to represent a wide array of existing topic models, outlined in Section 4.
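To make the generative story of Figure 1 concrete, here is a minimal sketch of it in Python with NumPy. It is not the authors' released implementation; the array shapes and names (alpha, delta, beta, omega) are illustrative assumptions that follow the figure.

    import numpy as np

    rng = np.random.default_rng(0)

    def sprite_generate(alpha, delta, beta, omega, doc_lengths):
        # alpha: (M, C_theta) document coefficients; delta: (C_theta, K) components
        # beta:  (K, C_phi) topic coefficients;      omega: (C_phi, V) components
        theta_tilde = np.exp(alpha @ delta)   # (M, K) Dirichlet parameters
        phi_tilde = np.exp(beta @ omega)      # (K, V) Dirichlet parameters
        theta = np.array([rng.dirichlet(t) for t in theta_tilde])  # doc-topic dists
        phi = np.array([rng.dirichlet(p) for p in phi_tilde])      # topic-word dists
        docs = []
        for m, n_m in enumerate(doc_lengths):
            z = rng.choice(len(phi), size=n_m, p=theta[m])         # topics z_{m,n}
            docs.append([rng.choice(phi.shape[1], p=phi[k]) for k in z])  # words
        return docs, theta, phi

    # Toy usage: 2 documents, 3 topics, 5 word types, 2 components at each level.
    M, K, V, Cd, Cp = 2, 3, 5, 2, 2
    docs, theta, phi = sprite_generate(
        rng.normal(0, 1, (M, Cd)), rng.normal(0, 1, (Cd, K)),
        rng.normal(0, 1, (K, Cp)), rng.normal(0, 1, (Cp, V)), doc_lengths=[10, 12])

Setting C^(θ) = C^(φ) = 1 with α = β = 1 recovers standard LDA, as §4.1 notes.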

3 Topic Structures

By changing the particular configuration of the hyperparameters – the component coefficients (α and β) and the component weights (δ and ω) – we obtain a diverse range of model structures and behaviors. We now describe possible structures and the corresponding priors.

3.1 Component Structures

This subsection discusses various graph structures that can describe the relation between topic components and topics, and between document components and documents, illustrated in Figure 2.

[Figure 2: (a) Dense DAG, (b) Sparse DAG, (c) Tree, (d) Factored Forest. Example graph structures describing possible relations between components (middle row) and topics or documents (bottom row). Edges correspond to non-zero values for α or β (the component coefficients defining priors over the document and topic distributions). The root node is a shared prior over the component weights (with other possibilities discussed in §3.3).]

3.1.1 Directed Acyclic Graph

The general SPRITE model can be thought of as a dense directed acyclic graph (DAG), where every document or topic is connected to every component with some weight α or β. When many of the α or β coefficients are zero, the DAG becomes sparse. A sparse DAG has an intuitive interpretation: each document or topic depends on some subset of components.

The default prior over coefficients that we use in this study is a 0-mean Gaussian distribution, which encourages the weights to be small. We note that to induce a sparse graph, one could use a 0-mean Laplace distribution as the prior over α and β, which prefers parameters such that some components are zero.

3.1.2 Tree

When each document or topic has exactly one parent (one nonzero coefficient) we obtain a two-level tree structure. This structure naturally arises in topic hierarchies, for example, where fine-grained topics are children of coarse-grained topics.

To create an (unweighted) tree, we require α_mc ∈ {0,1} and Σ_c α_mc = 1 for each document m. Similarly, β_kc ∈ {0,1} and Σ_c β_kc = 1 for each topic k. In this setting, α_m and β_k are indicator vectors which select a single component.

In this study, rather than strictly requiring α_m and β_k to be binary-valued indicator vectors, we create a relaxation that allows for easier parameter estimation. We let α_m and β_k be real-valued variables in a simplex, but place a prior over their values to encourage sparse values, favoring vectors with a single component near 1 and others near 0. This is achieved using a Dirichlet(ρ < 1) distribution as the prior over α and β, which has higher density near the boundaries of the simplex.¹

¹ This generalizes the technique used in Paul and Dredze (2012), who approximated binary variables with real-valued variables in (0,1), by using a "U-shaped" Beta(ρ < 1) distribution as the prior to encourage sparsity. The Dirichlet distribution is the multivariate extension of the Beta distribution.

For a weighted tree, α and β could be a product of two variables: an "integer-like" indicator vector with sparse Dirichlet prior as suggested above, combined with a real-valued weight (e.g., with a Gaussian prior). We take this approach in our model of topic and perspective (§6).

3.1.3 Factored Forest

By using structured sparsity over the DAG, we can obtain a structure where components are grouped into G factors, and each document or topic has one parent from each group. Figure 2(d) illustrates this: the left three components belong to one group, the right two belong to another, and each bottom node has exactly one parent from each. This is a DAG that we call a "factored forest" because the subgraphs associated with each group in isolation are trees. This structure arises in "multi-dimensional" models like SAGE (Eisenstein et al., 2011) and Factorial LDA (Paul and Dredze, 2012), which allow tokens to be associated with multiple variables (e.g. a topic along with a variable denoting positive or negative sentiment). This allows word distributions to depend on both factors.

The "exactly one parent" indicator constraint is the same as in the tree structure but enforces a tree only within each group. This can therefore be (softly) modeled using a sparse Dirichlet prior as described in the previous subsection. In this case, the subsets of components belonging to each factor have separate sparse Dirichlet priors. Using the example from Figure 2(d), the first three component indicators would come from one Dirichlet, while the latter two component indicators would come from a second.
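The effect of the Dirichlet(ρ < 1) relaxation in §3.1.2 is easy to see by sampling. The snippet below is an illustrative check, not part of the paper's implementation; ρ = 0.01 matches the value used later in §7.1.

    import numpy as np

    rng = np.random.default_rng(1)
    C = 5         # candidate parent components
    rho = 0.01    # concentration < 1 pushes mass to the simplex boundary

    beta_k = rng.dirichlet(np.full(C, rho))  # soft "choose one parent" vector
    print(np.round(beta_k, 3))  # typically one entry near 1 and the rest near 0

Draws like this behave as nearly-binary indicator vectors while remaining continuous, which is what makes the gradient-based estimation of §5 possible.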
3.2 Tying Topic and Document Components

A desirable property for many situations is for the topic and document components to correspond to each other. For example, if we think of the components as coarse-grained topics in a hierarchy, then the coefficients β enforce that topic word distributions share a prior defined by their parent ω component, while the coefficients α represent a document's proportions of coarse-grained topics, which affects the document's prior over child topics (through the δ vectors). Consider the example with spoken language topics in §2: these three topics (signal processing, speech recognition, and dialog systems) are a priori likely both to share the same words and to occur together in documents. By tying these together, we ensure that the patterns are consistent across the two types of components, and the patterns from both types can reinforce each other during inference.

In this case, the number of topic components is the same as the number of document components (C^(φ) = C^(θ)), and the coefficients (β_cz) of the topic components should correlate with the weights of the document components (δ_zc). The approach we take (§6) is to define δ and β as a product of two variables (suggested in §3.1.2): a binary mask variable (with sparse Dirichlet prior), which we let be identical for both δ and β, and a real-valued positive weight.

3.3 Deep Components

As for priors over the component weights δ and ω, we assume they are generated from a 0-mean Gaussian. While not experimented with in this study, it is also possible to allow the components themselves to have rich priors which are functions of higher level components. For example, rather than assuming a mean of zero, the mean could be a weighted combination of higher level weight vectors. This approach was used by Paul and Dredze (2013) in Factorial LDA, in which each ω component had its own Gaussian prior provided as input to guide the parameters.

4 Special Cases and Extensions

We now describe several existing Dirichlet prior topic models and show how they are special cases of SPRITE. Table 1 summarizes these models and their relation to SPRITE. In almost every case, we also describe how the SPRITE representation of the model offers improvements over the original model or can lead to novel extensions.

Model | Sec. | Document priors            | Topic priors
LDA   | 4.1  | Single component           | Single component
SCTM  | 4.2  | Single component           | Sparse binary β
SAGE  | 4.3  | Single component           | Sparse ω
FLDA  | 4.3  | Binary δ is transpose of β | Factored binary β
PAM   | 4.4  | α are supertopic weights   | Single component
DMR   | 4.5  | α are feature values       | Single component

Table 1: Topic models with Dirichlet priors that are generalized by SPRITE. The description of each model can be found in the noted section number. PAM is not equivalent, but captures very similar behavior. The described component formulations of SCTM and SAGE are equivalent, but these differ from SPRITE in that the components directly define the parameters, rather than priors over the parameters.

4.1 Latent Dirichlet Allocation

In LDA (Blei et al., 2003b), all θ vectors are drawn from the same prior, as are all φ vectors. This is a basic instance of our model with only one component at the topic and document levels, C^(θ) = C^(φ) = 1, with coefficients α = β = 1.

4.2 Shared Components Topic Models

Shared components topic models (SCTM) (Gormley et al., 2010) define topics as products of "components", where components are word distributions. To use the notation of our paper, the kth topic's word distribution in SCTM is parameterized by φ_kv ∝ Π_c ω_cv^(β_kc), where the ω vectors are word distributions (rather than vectors in R^V), and the β_kc ∈ {0,1} variables are indicators denoting whether component c is in topic k.

This is closely related to SPRITE, where topics also depend on products of underlying components. A major difference is that in SCTM, the topic-specific word distributions are exactly defined as a product of components, whereas in SPRITE, it is only the prior that is a product of components.² Another difference is that SCTM has an unweighted product of components (β is binary), whereas SPRITE allows for weighted products. The log-linear parameterization leads to simpler optimization procedures than the product parameterization. Finally, the components in SCTM only apply to the word distributions, and not the topic distributions in documents.

² The posterior becomes concentrated around the prior when the Dirichlet variance is low, in which case SPRITE behaves like SCTM. SPRITE is therefore more general.
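The contrast between SCTM and SPRITE can be stated in a few lines of code. This is a hedged illustration (NumPy, names invented here), not either paper's implementation:

    import numpy as np

    # SCTM-style: the word distribution itself is a normalized product of
    # component distributions selected by a binary beta.
    def sctm_phi(omega, beta_binary):      # omega: (C, V) word distributions
        p = np.prod(omega ** beta_binary[:, None], axis=0)
        return p / p.sum()

    # SPRITE: the log-linear combination only parameterizes a Dirichlet
    # *prior*; the word distribution is then a draw from that prior.
    def sprite_phi(omega, beta, rng):      # omega: (C, V) real-valued vectors
        return rng.dirichlet(np.exp(beta @ omega))

With a low-variance Dirichlet the SPRITE draw concentrates on its prior mean, mimicking SCTM's deterministic product (footnote 2).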
4.3 Factored Topic Models

Factored topic models combine multiple aspects of the text to generate the document (instead of just topics). One such topic model is Factorial LDA (FLDA) (Paul and Dredze, 2012). In FLDA, "topics" are actually tuples of potentially multiple variables, such as aspect and sentiment in online reviews (Paul et al., 2013). Each document distribution θ_m is a distribution over pairs (or higher-dimensional tuples if there are more than two factors), and each pair (j,k) has a word distribution φ_(j,k). FLDA uses a similar log-linear parameterization of the Dirichlet priors as SPRITE. Using our notation, the Dirichlet(φ̃_(j,k)) prior for φ_(j,k) is defined as φ̃_(j,k),v = exp(ω_jv + ω_kv), where ω_j is a weight vector over the vocabulary for the jth component of the first factor, and ω_k encodes the weights for the kth component of the second factor. (Some bias terms are omitted for simplicity.) The prior over θ_m has a similar form: θ̃_m,(j,k) = exp(α_mj + α_mk), where α_mj is document m's preference for component j of the first factor (and likewise for k of the second).

This corresponds to an instantiation of SPRITE using an unweighted factored forest (§3.1.3), where β_zc = δ_cz (§3.2, recall that δ are document components while β are the topic coefficients). Each subtopic z (which is a pair of variables in the two-factor model) has one parent component from each factor, indicated by β_z which is binary-valued. At the document level in the two-factor example, δ_j is an indicator vector with values of 1 for all pairs with j as the first component, and thus the coefficient α_mj controls the prior for all such pairs of the form (j,·), and likewise δ_k indicates pairs with k as the second component, controlling the prior over (·,k).

The SPRITE representation offers a benefit over the original FLDA model. FLDA assumes that the entire Cartesian product of the different factors is represented in the model (e.g. φ parameters for every possible tuple), which leads to issues with efficiency and overparameterization with higher numbers of factors. With SPRITE, we can simply fix the number of "topics" to a number smaller than the size of the Cartesian product, and the model will learn which subset of tuples are included, through the values of β and δ.

Finally, another existing model family that allows for topic factorization is the sparse additive generative model (SAGE) (Eisenstein et al., 2011). SAGE uses a log-linear parameterization to define word distributions. SAGE is a general family of models that need not be factored, but is presented as an efficient solution for including multiple factors, such as topic and geography or topic and author ideology. Like SCTM, φ is exactly defined as a product of ω weights, rather than our approach of using the product to define a prior over φ.
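As a sketch of the factored prior just described (again an illustrative assumption, with two factors and invented names), the Dirichlet parameters for every (j, k) pair can be built by broadcasting the two factors' ω vectors:

    import numpy as np

    def flda_style_priors(omega_f1, omega_f2):
        # phi_tilde[j, k, v] = exp(omega_f1[j, v] + omega_f2[k, v])
        return np.exp(omega_f1[:, None, :] + omega_f2[None, :, :])

    V = 6
    omega_aspect = np.random.randn(3, V)     # factor 1, e.g. review aspect
    omega_sentiment = np.random.randn(2, V)  # factor 2, e.g. sentiment
    phi_tilde = flda_style_priors(omega_aspect, omega_sentiment)  # (3, 2, V)

FLDA keeps a φ for every cell of this Cartesian product, whereas SPRITE can fix K below the product size and let β and δ select which tuples are realized.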
4.4 Topic Hierarchies and Correlations

While the two previous subsections primarily focused on word distributions (with FLDA being an exception that focused on both), SPRITE's priors over topic distributions also have useful characteristics. The component-specific δ vectors can be interpreted as common topic distribution patterns, where each component is likely to give high weight to groups of topics that tend to occur together. Each document's α weights encode which of the topic groups are present in that document.

Similar properties are captured by the Pachinko allocation model (PAM) (Li and McCallum, 2006). Under PAM, each document has a distribution over supertopics. Each supertopic is associated with a Dirichlet prior over subtopic distributions, where subtopics are the low level topics that are associated with word parameters φ. Documents also have supertopic-specific distributions over subtopics (drawn from each supertopic-specific Dirichlet prior). Each topic in a document is drawn by first drawing a supertopic from the document's distribution, then drawing a subtopic from that supertopic's document distribution.

While not equivalent, this is quite similar to SPRITE where document components correspond to supertopics. Each document's α weights can be interpreted to be similar to a distribution over supertopics, and each δ vector is that supertopic's contribution to the prior over subtopics. The prior over the document's topic distribution is thus affected by the document's supertopic weights α.

The SPRITE formulation naturally allows for powerful extensions to PAM. One possibility is to include topic components for the word distributions, in addition to document components, and to tie together δ_cz and β_zc (§3.2). This models the intuitive characteristic that subtopics belonging to similar supertopics (encoded by δ) should come from similar priors over their word distributions (since they will have similar β values). That is, children of a supertopic are topically related – they are likely to share words. This is a richer alternative to the hierarchical variant of PAM proposed by Mimno et al. (2007), which modeled separate word distributions for supertopics and subtopics, but the subtopics were not dependent on the supertopic word distributions. Another extension is to form a strict tree structure, making each subtopic belong to exactly one supertopic: a true hierarchy.

4.5 Conditioning on Document Attributes

SPRITE also naturally provides the ability to condition document topic distributions on features of the document, such as a user rating in a review. To do this, let the number of document components be the number of features, and the value of α_mc is the mth document's value of the cth feature. The δ vectors then influence the document's topic prior based on the feature values. For example, increasing α_mc will increase the prior for topic z if δ_cz is positive and decrease the prior if δ_cz is negative. This is similar to the structure used for PAM (§4.4), but here the α weights are fixed and provided as input, rather than learned and interpreted as supertopic weights. This is identical to the Dirichlet-multinomial regression (DMR) topic model (Mimno and McCallum, 2008). The DMR topic model defines each document's Dirichlet prior over topics as a log-linear function of the document's feature values and regression coefficients for each topic. The cth feature's regression coefficients correspond to the δ_c vector in SPRITE.
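A DMR-style instantiation is then just the document half of the SPRITE prior with α fixed to observed features. A minimal sketch, with assumed shapes and invented names:

    import numpy as np

    def dmr_style_topic_prior(features, delta):
        # theta_tilde[m, k] = exp(sum_c features[m, c] * delta[c, k]);
        # observed document features play the role of fixed alpha coefficients.
        return np.exp(features @ delta)

    M, K, F = 4, 10, 3
    features = np.random.randn(M, F)  # e.g. rating, ideology score, bias term
    delta = np.random.randn(F, K)     # per-feature regression weights over topics
    theta_tilde = dmr_style_topic_prior(features, delta)  # (M, K)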
5 Inference and Parameter Estimation

We now discuss how to infer the posterior of the latent variables z and parameters θ and φ, and find maximum a posteriori (MAP) estimates of the hyperparameters α, β, δ, and ω, given their hyperpriors. We take a Monte Carlo EM approach, using a collapsed Gibbs sampler to sample from the posterior of the topic assignments z conditioned on the hyperparameters, then optimizing the hyperparameters using gradient-based optimization conditioned on the samples.

Given the hyperparameters, the sampling equations are identical to the standard LDA sampler (Griffiths and Steyvers, 2004). The partial derivative of the collapsed log likelihood L of the corpus with respect to each hyperparameter β_kc is:

  ∂L/∂β_kc = ∂P(β)/∂β_kc + Σ_v ω_cv φ̃_kv × ( Ψ(n_kv + φ̃_kv) − Ψ(φ̃_kv) + Ψ(Σ_v′ φ̃_kv′) − Ψ(Σ_v′ n_kv′ + φ̃_kv′) )   (1)

where φ̃_kv = exp(Σ_c′ β_kc′ ω_c′v), n_kv is the number of times word v is assigned to topic k (in the samples from the E-step), and Ψ is the digamma function, the derivative of the log of the gamma function. The digamma terms arise from the Dirichlet-multinomial distribution, when integrating out the parameters φ. P(β) is the hyperprior. For a 0-mean Gaussian hyperprior with variance σ², ∂P(β)/∂β_kc = −β_kc/σ². Under a Dirichlet(ρ) hyperprior, when we want β to represent an indicator vector (§3.1.2), ∂P(β)/∂β_kc = (ρ−1)/β_kc.

The partial derivatives for the other hyperparameters are similar. Rather than involving a sum over the vocabulary, ∂L/∂δ_ck sums over documents, while ∂L/∂ω_cv and ∂L/∂α_mc sum over topics.

Our inference algorithm alternates between one Gibbs iteration and one iteration of gradient ascent, so that the parameters change gradually. For unconstrained parameters, we use the update rule: x^(t+1) = x^t + η_t ∇L(x^t), for some variable x and a step size η_t at iteration t. For parameters constrained to the simplex (such as when β is a soft indicator vector), we use exponentiated gradient ascent (Kivinen and Warmuth, 1997) with the update rule: x_i^(t+1) ∝ x_i^t exp(η_t ∇_i L(x^t)).

5.1 Tightening the Constraints

For variables that we prefer to be binary but have softened to continuous variables using sparse Beta or Dirichlet priors, we can straightforwardly strengthen the preference to be binary by modifying the objective function to favor the prior more heavily. Specifically, under a Dirichlet(ρ < 1) prior we will introduce a scaling parameter τ_t ≥ 1 to the prior log likelihood: τ_t log P(β) with partial derivative τ_t (ρ−1)/β_kc, which adds extra weight to the sparse Dirichlet prior in the objective. The algorithm used in our experiments begins with τ_1 = 1 and optionally increases τ over time. This is a deterministic annealing approach, where τ corresponds to an inverse temperature (Ueda and Nakano, 1998; Smith and Eisner, 2006).

As τ approaches infinity, the prior-annealed MAP objective max_β P(φ|β)P(β)^τ approaches max_β P(φ|β) max_β P(β). Annealing only the prior P(β) results in maximization of this term only, while the outer max chooses a good β under P(φ|β) as a tie-breaker among all β values that maximize the inner max (binary-valued β).³ We show experimentally (§7.2.2) that annealing the prior yields values that satisfy the constraints.

³ Other modifications could be made to the objective function to induce sparsity, such as entropy regularization (Balasubramanyan and Cohen, 2013).
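A sketch of the resulting M-step update for β, assuming topic-word counts n_kv collected from the Gibbs samples. The function names are invented; SciPy's digamma is used for Ψ, and the default variance matches the N(0, 10²) hyperprior used in §7.1:

    import numpy as np
    from scipy.special import digamma

    def grad_beta(beta, omega, n_kv, sigma2=100.0):
        # Eq. (1) with a 0-mean Gaussian hyperprior of variance sigma2.
        # beta: (K, C), omega: (C, V), n_kv: (K, V) counts from the E-step.
        phi_tilde = np.exp(beta @ omega)                         # (K, V)
        inner = digamma(n_kv + phi_tilde) - digamma(phi_tilde)
        inner += (digamma(phi_tilde.sum(1, keepdims=True))
                  - digamma((n_kv + phi_tilde).sum(1, keepdims=True)))
        return (phi_tilde * inner) @ omega.T - beta / sigma2     # (K, C)

    def exponentiated_gradient_step(x, grad, eta):
        # Update for simplex-constrained variables (Kivinen and Warmuth, 1997).
        x_new = x * np.exp(eta * grad)
        return x_new / x_new.sum(axis=-1, keepdims=True)

For the sparse Dirichlet hyperprior of §5.1, the Gaussian term of grad_beta would instead be tau_t * (rho - 1.0) / beta, with tau_t the annealed inverse temperature.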
6 A Factored Hierarchical Model of Topic and Perspective

We will now describe a SPRITE model that encompasses nearly all of the structures and extensions described in §3–4, followed by experimental results using this model to jointly capture topic and "perspective" in a corpus of political debates (where perspective corresponds to ideology) and a corpus of online doctor reviews (where perspective corresponds to the review sentiment).

First, we will create a topic hierarchy (§4.4). The hierarchy will model both topics and documents, where α_m is document m's supertopic proportions, δ_c is the cth supertopic's subtopic prior, ω_c is the cth supertopic's word prior, and β_k is the weight vector that selects the kth topic's parent supertopic, which incorporates (soft) indicator vectors to encode a tree structure (§3.1.2).

We want a weighted tree; while each β_k has only one nonzero element, the nonzero element can be a value other than 1. We do this by replacing the single coefficient β_kc with a product of two variables: b_kc β̂_kc. Here, β̂_k is a real-valued weight vector, while b_kc is a binary indicator vector which zeroes out all but one element of β_k. We do the same with the δ vectors, replacing δ_ck with b_kc δ̂_ck. The b variables are shared across both topic and document components, which is how we tie these together (§3.2). We relax the binary requirement and instead allow a positive real-valued vector whose elements sum to 1, with a Dirichlet(ρ < 1) prior to encourage sparsity (§3.1.2).

To be properly interpreted as a hierarchy, we constrain the coefficients α and β (and by extension, δ) to be positive. To optimize these parameters in a mathematically convenient way, we write β_kc as exp(log β_kc), and instead optimize log β_kc ∈ R rather than β_kc ∈ R+.

Second, we factorize (§4.3) our hierarchy such that each topic depends not only on its supertopic, but also on a value indicating perspective. For example, a conservative topic about energy will appear differently from a liberal topic about energy. The prior for a topic will be a log-linear combination of both a supertopic (e.g. energy) and a perspective (e.g. liberal) weight vector. The variables associated with the perspective component are denoted with superscript (P) rather than subscript c.

To learn meaningful perspective parameters, we include supervision in the form of document attributes (§4.5). Each document includes a positive or negative score denoting the perspective, which is the variable α^(P)_m for document m. Since α^(P) are the coefficients for δ^(P), positive values of δ^(P)_k indicate that topic k is more likely if the author is conservative (which has a positive α score in our data), and less likely if the author is liberal (which has a negative score). There is only a single perspective component, but it represents two ends of a spectrum with positive and negative weights; β^(P) and δ^(P) are not constrained to be positive, unlike the supertopics. We also set β^(P)_k = δ^(P)_k. This means that topics with positive δ^(P)_k will also have a positive β coefficient that is multiplied with the perspective word vector ω^(P).

Finally, we include "bias" component vectors denoted ω^(B) and δ^(B), which act as overall weights over the vocabulary and topics, so that the component-specific ω and δ weights can be interpreted as deviations from the global bias weights.

Figure 3 summarizes the model. This includes most of the features described above (trees, factored structures, tying topic and document components, and document attributes), so we can ablate model features to measure their effect.

• b_k ∼ Dirichlet(ρ < 1) (soft indicator)
• α^(P) is given as input (perspective value)
• δ^(P)_k = β^(P)_k
• φ̃_kv = exp(ω^(B)_v + β^(P)_k ω^(P)_v + Σ_c b_kc β̂_kc ω_cv)
• θ̃_mk = exp(δ^(B)_k + α^(P)_m δ^(P)_k + Σ_c b_kc α_mc δ̂_ck)

Figure 3: Summary of the hyperparameters in our SPRITE-based topic and perspective model (§6).
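The two prior equations in Figure 3 translate directly into NumPy; the sketch below (shapes and names assumed, not the released code) computes the word-distribution priors:

    import numpy as np

    def perspective_topic_priors(omega_bias, omega_persp, beta_persp,
                                 b, beta_hat, omega_super):
        # phi_tilde[k, v] = exp(omega_bias[v]
        #                       + beta_persp[k] * omega_persp[v]
        #                       + sum_c b[k, c] * beta_hat[k, c] * omega_super[c, v])
        super_part = (b * beta_hat) @ omega_super              # (K, V)
        return np.exp(omega_bias
                      + np.outer(beta_persp, omega_persp)
                      + super_part)

    K, C, V = 20, 5, 1000
    rng = np.random.default_rng(2)
    phi_tilde = perspective_topic_priors(
        rng.normal(0, 1, V), rng.normal(0, 1, V), rng.normal(0, 1, K),
        rng.dirichlet(np.full(C, 0.01), size=K),      # soft indicator mask b
        np.abs(rng.normal(0, 1, (K, C))),             # positive weights beta_hat
        rng.normal(0, 1, (C, V)))

The document-side prior θ̃ is built the same way from δ^(B), α^(P), δ^(P), and the tied b mask.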
7 Experiments

7.1 Datasets and Experimental Setup

We applied our models to two corpora:

• Debates: A set of floor debates from the 109th–112th U.S. Congress, collected by Nguyen et al. (2013), who also applied a hierarchical topic model to this data. Each document is a transcript of one speaker's turn in a debate, and each document includes the first dimension of the DW-NOMINATE score (Lewis and Poole, 2004), a real-valued score indicating how conservative (positive) or liberal (negative) the speaker is. This value is α^(P). We took a sample of 5,000 documents from the House debates (850,374 tokens; 7,426 types), balanced across party affiliation. We sampled from the most partisan speakers, removing scores below the median value.

• Reviews: Doctor reviews from RateMDs.com, previously analyzed using FLDA (Paul et al., 2013; Wallace et al., 2014). The reviews contain ratings on a 1–5 scale for multiple aspects. We centered the ratings around the middle value 3, then took reviews that had the same sign for all aspects, and averaged the scores to produce a value for α^(P). Our corpus contains 20,000 documents (476,991 tokens; 10,158 types), balanced across positive/negative scores.

Unless otherwise specified, K=50 topics and C=10 components (excluding the perspective component) for Debates, and K=20 and C=5 for Reviews. These values were chosen as a qualitative preference, not optimized for predictive performance, but we experiment with different values in §7.2.2. We set the step size η_t according to AdaGrad (Duchi et al., 2011), where the step size is the inverse of the sum of squared historical gradients.⁴ We place a sparse Dirichlet(ρ=0.01) prior on the b variables, and apply weak regularization to all other hyperparameters via a N(0, 10²) prior. These hyperparameters were chosen after only minimal tuning, and were selected because they showed stable and reasonable output qualitatively during preliminary development.

We ran our inference algorithm for 5000 iterations, estimating the parameters θ and φ by averaging the final 100 iterations. Our results are averaged across 10 randomly initialized samplers.⁵

⁴ AdaGrad decayed too quickly for the b variables. For these, we used a variant suggested by Zeiler (2012) which uses an average of historical gradients rather than a sum.

⁵ Our code and the data will be available at: http://cs.jhu.edu/˜mpaul.
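A sketch of an AdaGrad-style step size for the setup above (names assumed; standard AdaGrad divides by the square root of the accumulated squared gradients, and footnote 4's averaged variant for b would replace the running sum):

    import numpy as np

    class AdaGrad:
        def __init__(self, shape, base_eta=1.0, eps=1e-8):
            self.hist = np.zeros(shape)         # accumulated squared gradients
            self.base_eta, self.eps = base_eta, eps

        def step(self, grad):
            self.hist += grad ** 2
            return self.base_eta * grad / (np.sqrt(self.hist) + self.eps)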
7.2 Evaluating the Topic Perspective Model

7.2.1 Analysis of Output

Figure 4 shows examples of topics learned from the Reviews corpus. The figure includes the highest probability words in various topics as well as the highest weight words in the supertopic components and perspective component, which feed into the priors over the topic parameters. We see that one supertopic includes many words related to surgery, such as procedure and performed, and has multiple children, including a topic about dental work. Another supertopic includes words describing family members such as kids and husband. One topic has both supertopics as parents, which appears to describe surgeries that saved a family member's life, with top words including {saved, life, husband, cancer}. The figure also illustrates which topics are associated more with positive or negative reviews, as indicated by the value of δ^(P).

[Figure 4: Examples of topics (gray boxes) and components (colored boxes) learned on the Reviews corpus with 20 topics and 5 components, including supertopics labeled "Surgery" and "Family". Words with the highest and lowest values of ω^(P), the perspective component, are shown on the left, reflecting positive and negative sentiment words. The words with largest ω values in two supertopic components are also shown, with manually given labels. Arrows from components to topics indicate that the topic's word distribution draws from that component in its prior (with non-zero β value). There are also implicit arrows from the perspective component to all topics (omitted for clarity). The vertical positions of topics reflect the topic's perspective value δ^(P). Topics centered above the middle line are more likely to occur in reviews with positive scores, while topics below the middle line are more likely in negative reviews. Note that this is a "soft" hierarchy because the tree structure is not strictly enforced, so some topics have multiple parent components. Table 3 shows how strict trees can be learned by tuning the annealing parameter.]

Interpretable parameters were also learned from the Debates corpus. Consider two topics about energy that have polar values of δ^(P). The conservative-leaning topic is about oil and gas, with top words including {oil, gas, companies, prices, drilling}. The liberal-leaning topic is about renewable energy, with top words including {energy, new, technology, future, renewable}. Both of these topics share a common parent of an industry-related supertopic whose top words are {industry, companies, market, price}. A nonpartisan topic under this same supertopic has top words {credit, financial, loan, mortgage, loans}.

7.2.2 Quantitative Evaluation

We evaluated the model on two predictive tasks as well as topic quality. The first metric is perplexity of held-out text. The held-out set is based on tokens rather than documents: we trained on even numbered tokens and tested on odd tokens. This is a type of "document completion" evaluation (Wallach et al., 2009b) which measures how well the model can predict held-out tokens of a document after observing only some.

We also evaluated how well the model can predict the attribute value (DW-NOMINATE score or user rating) of the document. We trained a linear regression model using the document topic distributions θ as features. We held out half of the documents for testing and measured the mean absolute error. When estimating document-specific SPRITE parameters for held-out documents, we fix the feature value α^(P)_m = 0 for that document.

These predictive experiments do not directly measure performance at many of the particular tasks that topic models are well suited for, like data exploration, summarization, and visualization. We therefore also include a metric that more directly measures the quality and interpretability of topics. We use the topic coherence metric introduced by Mimno et al. (2011), which is based on co-occurrence statistics among each topic's most probable words and has been shown to correlate with human judgments of topic quality. This metric measures the quality of each topic, and we measure the average coherence across all topics:

  (1/K) Σ_{k=1}^{K} Σ_{m=2}^{M} Σ_{l=1}^{m−1} log [ (DF(v_km, v_kl) + 1) / DF(v_kl) ]   (2)

where DF(v,w) is the document frequency of words v and w (the number of documents in which they both occur), DF(v) is the document frequency of word v, and v_ki is the ith most probable word in topic k. We use the top M = 20 words.
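A direct reading of Eq. (2) in Python (illustrative; assumes every counted top word appears in at least one document so DF(v) > 0):

    import numpy as np

    def average_coherence(top_words, docs_as_sets):
        # top_words: K lists of the M most probable word ids per topic;
        # docs_as_sets: one set of word ids per document.
        def df(*words):  # number of documents containing all the given words
            return sum(all(w in d for w in words) for d in docs_as_sets)
        total = 0.0
        for words in top_words:
            for m in range(1, len(words)):
                for l in range(m):
                    total += np.log((df(words[m], words[l]) + 1) / df(words[l]))
        return total / len(top_words)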
This metric is limited to measuring only the quality of word clusters, ignoring the potentially improved interpretability of organizing the data into certain structures. However, it is still useful as an alternative measure of performance and utility, independent of the models' predictive abilities.

Using these three metrics, we compared to several variants (denoted in bold) of the full model to understand how the different parts of the model affect performance:

• Variants that contain the hierarchy components but not the perspective component (Hierarchy only), and vice versa (Perspective only).

• The "hierarchy only" model using only document components δ and no topic components. This is a PAM-style model because it exhibits similar behavior to PAM (§4.4). We also compared to the original PAM model.

• The "hierarchy only" model using only topic components ω and no document components. This is a SCTM-style model because it exhibits similar behavior to SCTM (§4.2).

• The full model where α^(P) is learned rather than given as input. This is a FLDA-style model that has similar behavior to FLDA (§4.3). We also compared to the original FLDA model.

• The "perspective only" model but without the ω^(P) topic component, so the attribute value affects only the topic distributions and not the word distributions. This is identical to the DMR model of Mimno and McCallum (2008) (§4.5).

• A model with no components except for the bias vectors ω^(B) and δ^(B). This is equivalent to LDA with optimized hyperparameters (learned). We also experimented with using fixed symmetric hyperparameters, using values suggested by Griffiths and Steyvers (2004): 50/K and 0.01 for topic and word distributions.

To put the results in context, we also compare to two types of baselines: (1) "bag of words" baselines, where we measure the perplexity of add-one smoothed unigram language models, we measure the prediction error using bag of words features, and we measure coherence of the unigram distribution; (2) naive baselines, where we measure the perplexity of the uniform distribution over each dataset's vocabulary, the prediction error when simply predicting each attribute as the mean value in the training set, and the coherence of 20 randomly selected words (repeated for 10 trials).
Model            | Debates: Perplexity | Prediction error | Coherence  | Reviews: Perplexity | Prediction error | Coherence
Full model       | †1555.5 ±2.3 | †0.615 ±0.001 | -342.8 ±0.9  | †1421.3 ±8.4 | †0.787 ±0.006 | -512.7 ±1.6
Hierarchy only   | †1561.8 ±1.4 | 0.620 ±0.002  | -342.6 ±1.1  | †1457.2 ±6.9 | †0.804 ±0.007 | -509.1 ±1.9
Perspective only | †1567.3 ±2.3 | †0.613 ±0.002 | -342.1 ±1.2  | †1413.7 ±2.2 | †0.800 ±0.002 | -512.0 ±1.7
SCTM-style       | 1572.5 ±1.6  | 0.620 ±0.002  | †-335.8 ±1.1 | 1504.0 ±1.9  | †0.837 ±0.002 | †-490.8 ±0.9
PAM-style        | †1567.4 ±1.9 | 0.620 ±0.002  | -347.6 ±1.4  | †1440.4 ±2.7 | †0.835 ±0.004 | -542.9 ±6.7
FLDA-style       | †1559.5 ±2.0 | 0.617 ±0.002  | -340.8 ±1.4  | †1451.1 ±5.4 | †0.809 ±0.006 | -505.3 ±2.3
DMR              | 1578.0 ±1.1  | 0.618 ±0.002  | -343.1 ±1.0  | †1416.4 ±3.0 | †0.799 ±0.003 | -511.6 ±2.0
PAM              | 1578.9 ±0.3  | 0.622 ±0.003  | †-336.0 ±1.1 | 1514.8 ±0.9  | †0.835 ±0.003 | †-493.3 ±1.2
FLDA             | 1574.1 ±2.2  | 0.618 ±0.002  | -344.4 ±1.3  | 1541.9 ±2.3  | 0.856 ±0.003  | -502.2 ±3.1
LDA (learned)    | 1579.6 ±1.5  | 0.620 ±0.001  | -342.6 ±0.6  | 1507.9 ±2.4  | 0.846 ±0.002  | -501.4 ±1.2
LDA (fixed)      | 1659.3 ±0.9  | 0.622 ±0.002  | -349.5 ±0.8  | 1517.2 ±0.4  | 0.920 ±0.003  | -585.2 ±0.9
Bag of words     | 2521.6 ±0.0  | 0.617 ±0.000  | †-196.2 ±0.0 | 1633.5 ±0.0  | 0.813 ±0.000  | †-408.1 ±0.0
Naive baseline   | 7426.0 ±0.0  | 0.677 ±0.000  | -852.9 ±7.4  | 10158.0 ±0.0 | 1.595 ±0.000  | -795.2 ±13.0

Table 2: Perplexity of held-out tokens and mean absolute error for attribute prediction using various models (± std. error). † indicates significant improvement (p < 0.05) over optimized LDA under a two-sided t-test.

Table 2 shows that the full SPRITE model substantially outperforms the LDA baseline at both predictive tasks. Generally, model variants with more structure perform better predictively.

The difference between SCTM-style and PAM-style is that the former uses only topic components (for word distributions) and the latter uses only document components (for the topic distributions). Results show that the structured priors are more important for topic than word distributions, since PAM-style has lower perplexity on both datasets. However, models with both topic and document components generally outperform either alone, including comparing the Perspective only and DMR models. The former includes both topic and document perspective components, while DMR has only a document level component.

PAM does not significantly outperform optimized LDA in most measures, likely because it updates the hyperparameters using a moment-based approximation, which is less accurate than our gradient-based optimization. FLDA perplexity is 2.3% higher than optimized LDA on Reviews, comparable to the 4% reported by Paul and Dredze (2012) on a different corpus. The FLDA-style SPRITE variant, which is more flexible, significantly outperforms FLDA in most measures.

The results are quite different under the coherence metric. It seems that topic components (which influence the word distributions) improve coherence over LDA, while document components worsen coherence. SCTM-style (which uses only topic components) does the best in both datasets, while PAM-style (which uses only documents) does the worst. PAM also significantly improves over LDA, despite worse perplexity.

The LDA (learned) baseline substantially outperforms LDA (fixed) in all cases, highlighting the importance of optimizing hyperparameters, consistent with prior research (Wallach et al., 2009a). Surprisingly, many SPRITE variants also outperform the bag of words regression baseline, even though the latter was tuned to optimize performance using heavy ℓ2 regularization, which we applied only weakly (without tuning) to the topic model features. We also point out that the "bag of words" version of the coherence metric (the coherence of the top 20 words) is higher than the average topic coherence, which is an artifact of how the metric is defined: the most probable words in the corpus also tend to co-occur together in most documents, so these words are considered to be highly coherent when grouped together.
Parameter Sensitivity We evaluated the full model at the two predictive tasks with varying numbers of topics ({12, 25, 50, 100} for Debates and {5, 10, 20, 40} for Reviews) and components ({2, 5, 10, 20}). Figure 5 shows that performance is more sensitive to the number of topics than components, with generally less variance among the latter. More topics improve performance monotonically on Debates, while performance declines at 40 topics on Reviews. The middle range of components (5–10) tends to perform better than too few (2) or too many (20) components.

[Figure 5: Predictive performance of full model with different numbers of topics K across different numbers of components, represented on the x-axis (log scale). Panels show perplexity and prediction error for Debates (K = 12, 25, 50, 100) and Reviews (K = 5, 10, 20, 40).]

Regardless of quantitative differences, the choice of parameters may depend on the end application and the particular structures that the user has in mind, if interpretability is important. For example, if the topic model is used as a visualization tool, then 2 components would not likely result in an interesting hierarchy to the user, even if this setting produces low perplexity.

Structured Sparsity We use a relaxation of the binary b that induces a "soft" tree structure. Table 3 shows the percentage of b values which are within ε = .001 of 0 or 1 under various annealing schedules, increasing the inverse temperature τ by 0.1% after each iteration (i.e. τ_t = 1.001^t) as well as 0.3% and no annealing at all (τ = 1). At τ = 0, we model a DAG rather than a tree, because the model has no preference that b is sparse. Many of the values are binary in the DAG case, but the sparse prior substantially increases the number of binary values, obtaining fully binary structures with sufficient annealing. We compare the DAG and tree structures more in the next subsection.

τ_t                 | Debates | Reviews
0 (Sparse DAG)      | 58.1%   | 42.4%
1 (Soft Tree)       | 93.2%   | 74.6%
1.001^t (Hard Tree) | 99.8%   | 99.4%
1.003^t (Hard Tree) | 100%    | 100%

Table 3: The percentage of indicator values that are sparse (near 0 or 1) when using different annealing schedules.
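The sparsity statistic reported in Table 3 is straightforward to compute; a hedged sketch (b is the soft indicator matrix, ε = .001 as in the text):

    import numpy as np

    def percent_binary(b, eps=1e-3):
        # Fraction of entries within eps of 0 or 1 (cf. Table 3).
        return float(((b < eps) | (b > 1 - eps)).mean())

An annealing schedule such as tau_t = 1.003 ** t simply scales the sparse-Dirichlet term of the gradient from §5.1 at iteration t.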
7.3 Structure Comparison

The previous subsection experimented with models that included a variety of structures, but did not provide a comparison of each structure in isolation, since most model variants were part of a complex joint model. In this section, we experiment with the basic SPRITE model for the three structures described in §3: a DAG, a tree, and a factored forest. For each structure, we also experiment with each type of component: document, topic, and both types (combined). For this set of experiments, we included a third dataset that does not contain a perspective value:

• Abstracts: A set of 957 abstracts from the ACL anthology (97,168 tokens; 8,246 types). These abstracts have previously been analyzed with FLDA (Paul and Dredze, 2012), so we include it here to see if the factored structure that we explore in this section learns similar patterns.

Based on our sparsity experiments in the previous subsection, we set τ_t = 1.003^t to induce hard structures (tree and factored) and τ = 0 to induce a DAG. We keep the same parameters as the previous subsection: K=50 and C=10 for Debates and K=20 and C=5 for Reviews. For the factored structures, we use two factors, with one factor having more components than the other: 3 and 7 components for Debates, and 2 and 3 components for Reviews (the total number of components across the two factors is therefore the same as for the DAG and tree experiments). The Abstracts experiments use the same parameters as with Debates.

Since the Abstracts dataset does not have a perspective value to predict, we do not include prediction error as a metric, instead focusing on held-out perplexity and topic coherence (Eq. 2). Table 4 shows the results of these two metrics.

Some trends are clear and consistent. Topic components always hurt perplexity, while these components typically improve coherence, as was observed in the previous subsection. It has previously been observed that perplexity and topic quality are not correlated (Chang et al., 2009). These results show that the choice of components depends on the task at hand. Combining the two components tends to produce results somewhere in between, suggesting that using both component types is a reasonable "default" setting.

Document components usually improve perplexity, likely due to the nature of the document completion setup, in which half of each document is held out. The document components capture correlations between topics, so by inferring the components that generated the first half of the document, the prior is adjusted to give more probability to topics that are likely to occur in the unseen second half.
igure3ofPaulandDredze(2012)showscomponentsthatlooksimilartothecomputationalmethodsandlinguistictheorycomponentshere,andthefactorwiththelargestnumberofcomponentsalsode-composesbyresearchtopic.TheseresultsshowthatSPRITEiscapableofrecoveringsimilarstruc-turesasFLDA,amorespecializedmodel.SPRITEisalsomuchmoreflexiblethanFLDA.WhileFLDAstrictlymodelsaone-to-onemappingoftopicstoeachpairofcomponents,SPRITEallowsmultipletopicstobelongtothesamepair(asinthesemanticsexamplesabove),andconverselySPRITEdoesnotrequirethatallpairshaveanas-sociatedtopic.ThispropertyallowsSPRITEtoscaletolargernumbersoffactorsthanFLDA,be-causethenumberoftopicsisnotrequiredtogrowwiththenumberofallpossibletuples.8RelatedWorkOurtopicandperspectivemodelisrelatedtosu-pervisedhierarchicalLDA(SHLDA)(Nguyenetal.,2013),whichlearnsatopichierarchywhilealsolearningregressionparameterstoassociatetopicswithfeaturevaluessuchaspoliticalper-spective.Thismodeldoesnotexplicitlyincorpo-rateperspective-specificwordpriorsintothetop-ics(asinourfactorizedapproach).Theregressionstructureisalsodifferent.SHLDAisa“down-stream”model,wheretheperspectivevalueisare-sponsevariableconditionedonthetopics.Incon-trast,SPRITEisan“upstream”model,wherethetopicsareconditionedontheperspectivevalue.Wearguethatthelatterismoreaccurateasagen-erativestory(theemittedwordsdependontheauthor’sperspective,nottheotherwayaround).Moreover,inourmodeltheperspectiveinfluencesboththewordandtopicdistributions(throughthetopicanddocumentcomponents,respectively).Inverseregressiontopicmodels(RabinovichandBlei,2014)usedocumentfeaturevalues(suchaspoliticalideology)toaltertheparametersofthe l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 1 2 1 1 5 6 6 7 2 8 / / t l a c _ a _ 0 0 1 2 1 p d . 
f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 55 method!words!word!corpus!learning!performance!approaches!training!proposed!based!“Linguistics”!grammar!parsing!representation!structure!grammars!parse!syntax!representations!semantics!formalism!semantic!knowledge!domain!ontology!systems!words!information!wordnet!question!dialogue!parse!treebank!parser!penn!parsers!trees!dependencies!acoustic!corpus!parsing!training!learning!corpus!large!unsupervised!corpora!method!data!semantic!knowledge!semantics!ontology!relations!lexical!concepts!concept!similarity!words!word!vector!semantic!similar!based!method!words!corpus!word!multiword!paper!based!frequency!expressions!question!questions!answer!answering!answers!qa!systems!type!parsing!parser!parse!treebank!grammar!tree!trees!structure!german!languages!french!english!multilingual!italian!structure!spanish!“Computational”!“Semantics”!“Syntax”!Figure6:Examplesoftopics(grayboxes)andcomponents(coloredboxes)learnedontheAbstractscorpuswith50topicsusingafactoredstructure.Thecomponentshavebeengroupedintotwofactors,onefactorwith3components(left)andonewith7(right),withtwoexamplesshownfromeach.Eachtopicpriordrawsfromexactlyonecomponentfromeachfactor.topic-specificworddistributions.Thisisanalter-nativetothemorecommonapproachtoregressionbasedtopicmodeling,wherethevariablesaffectthetopicdistributionsratherthantheworddistri-butions.OurSPRITE-basedmodeldoesboth:thedocumentfeaturesadjustthepriorovertopicdis-tributions(throughδ),butbytyingtogetherthedocumentandtopiccomponents(withβ),thedoc-umentfeaturesalsoaffecttheprioroverworddis-tributions.Tothebestofourknowledge,thisisthefirsttopicmodeltoconditionbothtopicandworddistributionsonthesamefeatures.Thetopicaspectmodel(PaulandGirju,2010a)isalsoatwo-dimensionalfactoredmodelthathasbeenusedtojointlymodeltopicandperspective(PaulandGirju,2010b).However,thismodeldoesnotusestructuredpriorsovertheparameters,unlikemostofthemodelsdiscussedin§4.Analternativeapproachtoincorporatinguserpreferencesandexpertiseareinteractivetopicmodels(Huetal.,2013),acomplimentaryap-proachtoSPRITE.9DiscussionandConclusionWehavepresentedSPRITE,afamilyoftopicmod-elsthatutilizestructuredpriorstoinducepre-ferredtopicstructures.SpecificinstantiationsofSPRITEaresimilarorequivalenttoseveralexist-ingtopicmodels.WedemonstratedtheutilityofSPRITEbyconstructingasinglemodelwithmanydifferentcharacteristics,includingatopichierar-chy,afactorizationoftopicandperspective,andsupervisionintheformofdocumentattributes.Thesestructureswereincorporatedintothepri-orsofboththewordandtopicdistributions,unlikemostpriorworkthatconsideredoneortheother.Ourexperimentsexploredhoweachofthesevar-iousmodelfeaturesaffectperformance,andourresultsshowedthatmodelswithstructuredpriorsperformbetterthanbaselineLDAmodels.Ourframeworkhasmadeclearadvancementswithrespecttoexistingstructuredtopicmodels.Forexample,SPRITEismoregeneralandof-ferssimplerinferencethanthesharedcompo-nentstopicmodel(Gormleyetal.,2010),andSPRITEallowsformoreflexibleandscalablefac-toredstructuresthanFLDA,asdescribedinearliersections.Bothofthesemodelsweremotivatedbytheirabilitytolearninterestingstructures,ratherthantheirperformanceatanypredictivetask.Sim-ilarly,ourgoalinthisstudywasnottoprovidestateoftheartresultsforaparticulartask,buttodemonstrateaframeworkforlearningstruc-turesthatarericherthanpreviousstructuredmod-els.Therefore,ourexperimentsfocusedonun-derstandinghowSPRITEcomparestocommonlyusedmodelswithsimilarstructures,andhowthedifferentvariantscompareunderdifferentmetrics.Ultimately,themodeldesignchoicedependsontheappl
icationandtheuserneeds.Byunifyingsuchawidevarietyoftopicmodels,SPRITEcanserveasacommonframeworkforenablingmodelexplorationandbringingapplication-specificpref-erencesandstructureintotopicmodels. l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 1 2 1 1 5 6 6 7 2 8 / / t l a c _ a _ 0 0 1 2 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 56 AcknowledgmentsWethankJasonEisnerandHannaWallachforhelpfuldiscussions,andViet-AnNguyenforpro-vidingtheCongressionaldebatesdata.MichaelPaulissupportedbyaMicrosoftResearchPhDfellowship.ReferencesD.Andrzejewski,X.Zhu,andM.Craven.2009.In-corporatingdomainknowledgeintotopicmodelingviaDirichletforestpriors.InICML.R.BalasubramanyanandW.Cohen.2013.Regular-izationoflatentvariablemodelstoobtainsparsity.InSIAMConferenceonDataMining.D.BleiandJ.Lafferty.2007.AcorrelatedtopicmodelofScience.AnnalsofAppliedStatistics,1(1):17–35.D.Blei,T.Griffiths,M.Jordan,andJ.Tenenbaum.2003a.HierarchicaltopicmodelsandthenestedChineserestaurantprocess.InNIPS.D.Blei,A.Ng,andM.Jordan.2003b.LatentDirichletallocation.JMLR.J.Chang,J.Boyd-Graber,S.Gerrish,C.Wang,andD.Blei.2009.Readingtealeaves:Howhumansinterprettopicmodels.InNIPS.J.Duchi,E.Hazan,andY.Singer.2011.Adaptivesub-gradientmethodsforonlinelearningandstochasticoptimization.JMLR,12:2121–2159.J.Eisenstein,A.Ahmed,andE.P.Xing.2011.Sparseadditivegenerativemodelsoftext.InICML.M.R.Gormley,M.Dredze,B.VanDurme,andJ.Eis-ner.2010.Sharedcomponentstopicmodels.InNAACL.T.GriffithsandM.Steyvers.2004.Findingscientifictopics.InProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica.Y.Hu,J.Boyd-Graber,B.Satinoff,andA.Smith.2013.Interactivetopicmodeling.MachineLearn-ing,95:423–469.J.KivinenandM.K.Warmuth.1997.Exponentiatedgradientversusgradientdescentforlinearpredic-tors.InformationandComputation,132:1–63.J.B.LewisandK.T.Poole.2004.Measuringbiasanduncertaintyinidealpointestimatesviatheparamet-ricbootstrap.PoliticalAnalysis,12(2):105–127.W.LiandA.McCallum.2006.Pachinkoalloca-tion:DAG-structuredmixturemodelsoftopiccor-relations.InInternationalConferenceonMachineLearning.D.MimnoandA.McCallum.2008.Topicmod-elsconditionedonarbitraryfeatureswithDirichlet-multinomialregression.InUAI.D.Mimno,W.Li,andA.McCallum.2007.MixturesofhierarchicaltopicswithPachinkoallocation.InInternationalConferenceonMachineLearning.D.Mimno,H.M.Wallach,E.Talley,M.Leenders,andA.McCallum.2011.Optimizingsemanticcoher-enceintopicmodels.InEMNLP.V.Nguyen,J.Boyd-Graber,andP.Resnik.2013.Lex-icalandhierarchicaltopicregression.InNeuralIn-formationProcessingSystems.M.J.PaulandM.Dredze.2012.FactorialLDA:Sparsemulti-dimensionaltextmodels.InNeuralInforma-tionProcessingSystems(NIPS).M.J.PaulandM.Dredze.2013.Drugextractionfromtheweb:Summarizingdrugexperienceswithmulti-dimensionaltopicmodels.InNAACL.M.PaulandR.Girju.2010a.Atwo-dimensionaltopic-aspectmodelfordiscoveringmulti-facetedtopics.InAAAI.M.J.PaulandR.Girju.2010b.Summarizingcon-trastiveviewpointsinopinionatedtext.InEmpiricalMethodsinNaturalLanguageProcessing.M.J.Paul,B.C.Wallace,andM.Dredze.2013.Whataffectspatient(dis)satisfaction?Analyzingonlinedoctorratingswithajointtopic-sentimentmodel.InAAAIWorkshoponExpandingtheBoundariesofHealthInformaticsUsingAI.M.RabinovichandD.Blei.2014.Theinverseregres-siontopicmodel.InInternationalConferenceonMachineLearning.D.Ramage,D.Hall,R.Nallapati,andC.D.Man-ning.2009.LabeledLDA:asupervisedtopicmodelforcreditattributioninmulti-labeledcorpora.InEMNLP.N.A.SmithandJ.Eisner.2006.Annealingstructuralbiasi
E. M. Talley, D. Newman, D. Mimno, B. W. Herr II, H. M. Wallach, G. A. P. C. Burns, M. Leenders, and A. McCallum. 2011. Database of NIH grants using machine-learned categories and graphical clustering. Nature Methods, 8(6):443–444.

N. Ueda and R. Nakano. 1998. Deterministic annealing EM algorithm. Neural Networks, 11(2):271–282.

B. C. Wallace, M. J. Paul, U. Sarkar, T. A. Trikalinos, and M. Dredze. 2014. A large-scale quantitative analysis of latent factors and sentiment in online doctor reviews. Journal of the American Medical Informatics Association, 21(6):1098–1103.

H. M. Wallach, D. Mimno, and A. McCallum. 2009a. Rethinking LDA: Why priors matter. In NIPS.

H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. 2009b. Evaluation methods for topic models. In ICML.

C. Wang and D. Blei. 2009. Decoupling sparsity and smoothness in the discrete hierarchical Dirichlet process. In NIPS.

M. D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. CoRR, abs/1212.5701.