Transactions of the Association for Computational Linguistics, vol. 3, pp. 529–543, 2015. Action Editor: Sebastian Riedel. Submission batch: 5/2015; Revision batch: 8/2015; Published 10/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Large-Scale Information Extraction from Textual Definitions through Deep Syntactic and Semantic Analysis
Claudio Delli Bovi, Luca Telesca and Roberto Navigli
Department of Computer Science, Sapienza University of Rome
{dellibovi,navigli}@di.uniroma1.it, luca.telesca@gmail.com

Abstract
We present DEFIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DEFIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations.[1]

1 Introduction
The problem of knowledge acquisition lies at the core of Natural Language Processing. Recent years have witnessed the massive exploitation of collaborative, semi-structured information as the ideal middle ground between high-quality, fully-structured resources and the larger amount of cheaper (but noisy) unstructured text (Hovy et al., 2013). Collaborative projects, like Freebase (Bollacker et al., 2008) and Wikidata (Vrandečić, 2012), have been under development for many years and are continuously being improved. A great deal of research also focuses on enriching available semi-structured resources, most notably Wikipedia, thereby creating taxonomies (Ponzetto and Strube, 2011; Flati et al., 2014), ontologies (Mahdisoltani et al., 2015) and semantic networks (Navigli and Ponzetto, 2012; Nastase and Strube, 2013). These solutions, however, are inherently constrained to small and often pre-specified sets of relations. A more radical approach is adopted in systems like TEXTRUNNER (Etzioni et al., 2008) and REVERB (Fader et al., 2011), which developed from the Open Information Extraction (OIE) paradigm (Etzioni et al., 2008) and focused on the unconstrained extraction of a large number of relations from massive unstructured corpora. Ultimately, all these endeavors were geared towards addressing the knowledge acquisition problem and tackling long-standing challenges in the field, such as Machine Reading (Mitchell, 2005).

While earlier OIE approaches relied mostly on dependencies at the level of surface text (Etzioni et al., 2008; Fader et al., 2011), more recent work has focused on deeper language understanding at the level of both syntax and semantics (Nakashole et al., 2012; Moro and Navigli, 2013) and tackled challenging linguistic phenomena like synonymy and polysemy. However, these issues have not yet been addressed in their entirety. Relation strings are still bound to surface text, lacking actual semantic content. Furthermore, most OIE systems do not have a clear and unified ontological structure and require additional processing steps, such as statistical inference mappings (Dutta et al., 2014), graph-based alignments of relational phrases (Grycner and Weikum, 2014), or knowledge base unification procedures (Delli Bovi et al., 2015), in order for their potential to be exploitable in real applications.

In DEFIE the key idea is to leverage the linguistic analysis of recent semantically-enhanced OIE techniques while moving from open text to smaller corpora of dense prescriptive knowledge. The aim is …

[1] http://lcl.uniroma1.it/defie
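
As a rough illustration of the dependency-driven extraction step the abstract describes (a toy sketch, not the authors' pipeline), the Python below walks a hand-specified dependency parse of a definition to produce a subject-relation-object triple; the example sentence, head indices, and extraction rule are all illustrative assumptions.

    def extract_triple(tokens, heads, subj_idx, obj_idx):
        """Walk head links from the object toward the root, collecting a
        relation string that links the two arguments."""
        path = []
        i = heads[obj_idx]
        while i is not None and i != subj_idx:
            path.append(tokens[i])
            i = heads[i]
        return tokens[subj_idx], " ".join(reversed(path)), tokens[obj_idx]

    # "Ibuprofen is a medication used for treating pain."
    tokens = ["Ibuprofen", "is", "a", "medication", "used", "for", "treating", "pain"]
    heads = [3, 3, 3, None, 3, 6, 4, 6]  # head index per token; None marks the root

    print(extract_triple(tokens, heads, 0, 7))
    # ('Ibuprofen', 'medication used treating', 'pain') -- a dependency-level
    # relation string; DEFIE additionally disambiguates arguments and content
    # words and organizes the acquired relations hierarchically.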

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 489–501, 2015. Action Editor: Sebastian Riedel. Submission batch: 4/2015; Published 8/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Approximation-Aware Dependency Parsing by Belief Propagation
Matthew R. Gormley, Mark Dredze and Jason Eisner
Human Language Technology Center of Excellence, Center for Language and Speech Processing, Department of Computer Science, Johns Hopkins University, Baltimore, MD
{mrg,mdredze,jason}@cs.jhu.edu

Abstract
We show how to train the fast dependency parser of Smith and Eisner (2008) for improved accuracy. This parser can consider higher-order interactions among edges while retaining O(n³) runtime. It outputs the parse with maximum expected recall—but for speed, this expectation is taken under a posterior distribution that is constructed only approximately, using loopy belief propagation through structured factors. We show how to adjust the model parameters to compensate for the errors introduced by this approximation, by following the gradient of the actual loss on training data. We find this gradient by back-propagation. That is, we treat the entire parser (approximations and all) as a differentiable circuit, as others have done for loopy CRFs (Domke, 2010; Stoyanov et al., 2011; Domke, 2011; Stoyanov and Eisner, 2012). The resulting parser obtains higher accuracy with fewer iterations of belief propagation than one trained by conditional log-likelihood.

1 Introduction
Recent improvements to dependency parsing accuracy have been driven by higher-order features. Such a feature can look beyond just the parent and child words connected by a single edge to also consider siblings, grandparents, etc. By including increasingly global information, these features provide more information for the parser—but they also complicate inference. The resulting higher-order parsers depend on approximate inference and decoding procedures, which may prevent them from predicting the best parse.

For example, consider the dependency parser we will train in this paper, which is based on the work of Smith and Eisner (2008). Ostensibly, this parser finds the minimum Bayes risk (MBR) parse under a probability distribution defined by a higher-order dependency parsing model. In reality, it achieves O(n³ tmax) runtime by relying on three approximations during inference: (1) variational inference by loopy belief propagation (BP) on a factor graph, (2) truncating inference after tmax iterations prior to convergence, and (3) a first-order pruning model to limit the number of edges considered in the higher-order model. Such parsers are traditionally trained as if the inference had been exact.[1]

In contrast, we train the parser such that the approximate system performs well on the final evaluation function. We treat the entire parsing computation as a differentiable circuit, and backpropagate the evaluation function through our approximate inference and decoding methods to improve its parameters by gradient descent. The system also learns to cope with model misspecification, where the model couldn't perfectly fit the distribution even absent the approximations. For standard graphical models, Stoyanov and Eisner (2012) call this approach ERMA, for "empirical risk minimization under approximations." For objectives besides empirical risk, Domke (2011) refers to it as "learning with truncated message passing."

Our primary contribution is the application of this approximation-aware learning method in the parsing setting, for which the graphical model involves a global constraint. Smith and Eisner (2008) previously showed how to run BP in this setting (by calling the inside-outside algorithm as a subroutine). We must backpropagate the downstream objective …

[1] For perceptron training, utilizing inexact inference as a drop-in replacement for exact inference can badly mislead the learner (Kulesza and Pereira, 2008; Huang et al., 2012).
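
The training idea shrinks to a toy: compute a quantity by an iterative approximation truncated at tmax steps, then descend the gradient of the loss of that truncated computation itself. A minimal sketch, assuming a one-parameter fixed-point iteration in place of loopy BP and finite differences in place of back-propagation:

    import math

    def approx_inference(theta, t_max=2):
        """Toy 'inference': iterate b <- sigmoid(theta * b), deliberately
        truncated after t_max steps (cf. truncated loopy BP)."""
        b = 0.5
        for _ in range(t_max):
            b = 1.0 / (1.0 + math.exp(-theta * b))
        return b

    def loss(theta, target=0.8):
        # evaluation function applied to the *approximate* output
        return (approx_inference(theta) - target) ** 2

    def grad(theta, eps=1e-6):
        # central differences stand in for back-propagation through the updates
        return (loss(theta + eps) - loss(theta - eps)) / (2 * eps)

    theta = 0.0
    for _ in range(500):            # gradient descent on the truncated system
        theta -= 1.0 * grad(theta)
    print(round(theta, 3), round(approx_inference(theta), 3))
    # theta is tuned so that the truncated computation itself scores well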

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 475–488, 2015. Action Editor: Diana McCarthy. Submission batch: 3/2015; Revision batch 6/2015; Published 8/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Semantic Proto-Roles
Drew Reisinger, Rachel Rudinger, Francis Ferraro, Craig Harman, Kyle Rawlins* and Benjamin Van Durme*
Johns Hopkins University
{rawlins@cogsci,vandurme@cs}.jhu.edu

Abstract
We present the first large-scale, corpus-based verification of Dowty's seminal theory of proto-roles. Our results demonstrate both the need for and the feasibility of a property-based annotation scheme of semantic relationships, as opposed to the currently dominant notion of categorical roles.

1 Introduction
For decades researchers have debated the number and character of thematic roles required for a theory of the syntax/semantics interface. AGENT and PATIENT are canonical examples, but questions emerge such as: should we have a distinct role for BENEFICIARY? What about RECIPIENT? What are the boundaries between these roles? And so on.

Dowty (1991), in a seminal article, responded to this debate by constructing the notion of a Proto-Agent and Proto-Patient, based on entailments that can be mapped to questions, such as: "Did the argument change state?", or "Did the argument have volitional involvement in the event?". Dowty argued that these properties group together in the lexicon non-categorically, in a way that aligns with classic Agent/Patient intuitions. For instance, a Proto-Patient often both changes state (but might not), and often is causally affected by another participant.

Various resources have been developed for computational linguists working on 'Semantic Role Labeling' (SRL), largely under the classical, categorical notion of role. Here we revisit Dowty's research as computational linguists desiring data for a new task, Semantic Proto-Role Labeling (SPRL), in which existing coarse-grained categorical roles are replaced by scalar judgements of Dowty-inspired properties. As the availability of supporting data is a critical component of such a task, much of our efforts here are focused on showing that everyday English speakers (untrained annotators) are able to answer basic questions about semantic relationships.

In this work we consider the following questions: (i) can crowdsourcing methods be used to empirically validate the formal linguistic theory of Dowty, following prior work in psycholinguistics (Kako, 2006b)? (ii) How might existing semantic annotation efforts be used in such a pursuit? (iii) Can the pursuit of Dowty's semantic properties be turned into a practical and scalable annotation task? (iv) Do the results of such an annotation task (at various scales, including over very large corpora) continue to confirm Dowty's proto-role hypothesis? And finally, (v) how do the resulting configurations of fine-grained role properties compare to coarser annotated roles in resources such as VerbNet?[1]

We first derive a set of basic semantic questions pertaining to Dowty-inspired properties. These questions are used in two Mechanical Turk HITs that address the above issues. In the first HIT, we build on psycholinguistic work (Kako, 2006b) to directly access 'type-level' intuitions about a lexical item, by asking subjects property-questions using made-up ("nonce") words in argument positions. Our results …

* Corresponding authors.
[1] To be clear, Dowty himself does not make direct predictions about the distribution of proto-role properties within a corpus, except insofar as a corpus is representative of the lexicon.
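
A hedged sketch of what an SPRL data point might look like under the scheme described: scalar ratings for Dowty-inspired properties instead of one categorical role. The property names, rating scale, and aggregate below are assumptions for illustration only.

    annotation = {
        "sentence": "The boy broke the window.",
        "predicate": "broke",
        "argument": "the boy",
        "properties": {              # property -> scalar judgement (1-5 assumed)
            "volition": 5,           # "Did the argument act volitionally?"
            "awareness": 5,
            "change_of_state": 1,
            "caused_by_other": 1,
        },
    }

    def proto_agent_score(props, agent_props=("volition", "awareness")):
        """Toy aggregate: mean rating over Proto-Agent properties."""
        return sum(props[p] for p in agent_props) / len(agent_props)

    print(proto_agent_score(annotation["properties"]))   # 5.0 -> Agent-like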

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 449–460, 2015. Action Editor: Diana McCarthy. Submission batch: 5/2015; Revision batch 7/2015; Published 8/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Context-aware Frame-Semantic Role Labeling
Michael Roth and Mirella Lapata
School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB
{mroth,mlap}@inf.ed.ac.uk

Abstract
Frame semantic representations have been useful in several applications ranging from text-to-scene generation, to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task and corresponding models are typically only trained on a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account sentence and discourse context. We introduce several new features which we motivate based on linguistic insights and experimentally demonstrate that they lead to significant improvements over the current state-of-the-art in FrameNet-based semantic role labeling.

1 Introduction
The goal of semantic role labeling (SRL) is to identify and label the arguments of semantic predicates in a sentence according to a set of predefined relations (e.g., "who" did "what" to "whom"). In addition to providing definitions and examples of role labeled text, resources like FrameNet (Ruppenhofer et al., 2010) group semantic predicates into so-called frames, i.e., conceptual structures describing the background knowledge necessary to understand a situation, event or entity as a whole as well as the roles participating in it. Accordingly, semantic roles are defined on a per-frame basis and are shared among predicates.

In recent years, frame representations have been successfully applied in a range of downstream tasks, including question answering (Shen and Lapata, 2007), text-to-scene generation (Coyne et al., 2012), stock price prediction (Xie et al., 2013), and social network extraction (Agarwal et al., 2014). Whereas some tasks directly utilize information encoded in the FrameNet resource, others make use of FrameNet indirectly through the output of SRL systems that are trained on data annotated with frame-semantic representations. While advances in machine learning have recently given rise to increasingly powerful SRL systems following the FrameNet paradigm (Hermann et al., 2014; Täckström et al., 2015), little effort has been devoted to improve such models from a linguistic perspective. In this paper, we explore insights from the linguistic literature suggesting a connection between discourse and role labeling decisions and show how to incorporate these in an SRL system. Although early theoretical work (Fillmore, 1976) has recognized the importance of discourse context for the assignment of semantic roles, most computational approaches have shied away from such considerations.

To see how context can be useful, consider as an example the DELIVERY frame, which states that a THEME can be handed off to either a RECIPIENT or "more indirectly" to a GOAL. While the distinction between the latter two roles might be clear for some fillers (e.g., people vs. locations), there are others where both roles are equally plausible and additional information is required to resolve the ambiguity (e.g., countries). If we hear about a letter being delivered to Greece, for instance, reliable cues might be whether the sender is a person or a country and …
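
To make the DELIVERY example concrete, here is a minimal sketch of the kind of context-sensitive indicator features such a system could use (the feature names and type categories are assumptions, not the authors' feature set):

    def role_features(filler_type, sender_type):
        """Indicator features for a RECIPIENT-vs-GOAL decision in DELIVERY."""
        return {
            "filler_is_person": filler_type == "person",
            "filler_is_country": filler_type == "country",
            "sender_is_country": sender_type == "country",  # discourse-level cue
        }

    # "a letter delivered to Greece", where the wider discourse names the sender:
    print(role_features(filler_type="country", sender_type="country"))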

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 433–447, 2015. Action Editor: Sharon Goldwater. Submission batch: 10/2014; Revision batch 3/2015; Published 8/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Modeling Word Forms Using Latent Underlying Morphs and Phonology
Ryan Cotterell, Nanyun Peng and Jason Eisner
Department of Computer Science, Johns Hopkins University
{ryan.cotterell,npeng1,eisner}@jhu.edu

Abstract
The observed pronunciations or spellings of words are often explained as arising from the "underlying forms" of their morphemes. These forms are latent strings that linguists try to reconstruct by hand. We propose to reconstruct them automatically at scale, enabling generalization to new words. Given some surface word types of a concatenative language along with the abstract morpheme sequences that they express, we show how to recover consistent underlying forms for these morphemes, together with the (stochastic) phonology that maps each concatenation of underlying forms to a surface form. Our technique involves loopy belief propagation in a natural directed graphical model whose variables are unknown strings and whose conditional distributions are encoded as finite-state machines with trainable weights. We define training and evaluation paradigms for the task of surface word prediction, and report results on subsets of 7 languages.

1 Introduction
How is plurality expressed in English? Comparing cats ([kæts]), dogs ([dOgz]), and quizzes ([kwIzIz]), the plural morpheme evidently has at least three pronunciations ([s], [z], [Iz]) and at least two spellings (-s and -es). Also, considering singular quiz, perhaps the "short exam" morpheme has multiple spellings (quizz-, quiz-).

Fortunately, languages are systematic. The realization of a morpheme may vary by context but is largely predictable from context, in a way that generalizes across morphemes. In fact, generative linguists traditionally posit that each morpheme of a language has a single representation shared across all contexts (Jakobson, 1948; Kenstowicz and Kisseberth, 1979, chapter 6). However, this string is a latent variable that is never observed. Variation appears when the phonology of the language maps these underlying representations (URs)—in context—to surface representations (SRs) that may be easier to pronounce. The phonology is usually described by a grammar that may consist of either rewrite rules (Chomsky and Halle, 1968) or ranked constraints (Prince and Smolensky, 2004). We will review this framework in section 2.

The upshot is that the observed words in a language are supposed to be explainable in terms of a smaller underlying lexicon of morphemes, plus a phonology. Our goal in this paper is to recover the lexicon and phonology (enabling generalization to new words). This is difficult even when we are told which morphemes are expressed by each word, because the unknown underlying forms of the morphemes must cooperate properly with one another and with the unknown phonological rules to produce the observed results. Because of these interactions, we must reconstruct everything jointly. We regard this as a problem of inference in a directed graphical model, as sketched in Figure 1.

This is a natural problem for computational linguistics. Phonology students are trained to puzzle out solutions for small datasets by hand. Children apparently solve it at the scale of an entire language. Phonologists would like to have grammars for many languages, not just to study each language but also to understand universal principles and differences among related languages. Automatic procedures would recover such grammars. They would also allow comprehensive evaluation and comparison of different phonological theories (i.e., what inductive biases are useful?), and would suggest models of human language learning.

Solving this problem is also practically important for NLP. What we recover is a model that can generate and help analyze novel word forms,[1] which abound in morphologically complex languages. Our approach is designed to model surface pronunciations (as needed for text-to-speech and ASR). It might also be applied in practice …

[1] An analyzer would require a prior over possible analyses. Our present model defines just the corresponding likelihoods, i.e., the probability of the observed word given each analysis.
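
The plural example above can be rendered as a toy phonology: ordered rewrite rules mapping the underlying concatenation stem + /z/ to a surface form. The ASCII symbol classes stand in for phonetic ones, and the hand-written rules are an illustrative assumption (the paper instead learns weighted finite-state machines):

    SIBILANTS = set("szSZ")          # ASCII stand-ins for s, z, sh, zh
    VOICELESS = set("ptkf")

    def surface_plural(stem):
        """Apply ordered rules to the underlying form stem + /z/."""
        last = stem[-1]
        if last in SIBILANTS:
            return stem + "Iz"       # epenthesis: quiz + z -> [kwIzIz]
        if last in VOICELESS:
            return stem + "s"        # devoicing:  cat + z -> [kæts]
        return stem + "z"            # default:    dog + z -> [dOgz]

    for stem in ["kæt", "dOg", "kwIz"]:
        print(stem, "->", surface_plural(stem))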

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 419–432, 2015. Action Editor: Philipp Koehn. Submission batch: 4/2015; Revision batch 7/2015; Published 7/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Unsupervised Identification of Translationese
Ella Rabinovich and Shuly Wintner
Department of Computer Science, University of Haifa
ellarabi@csweb.haifa.ac.il, shuly@cs.haifa.ac.il

Abstract
Translated texts are distinctively different from original ones, to the extent that supervised text classification methods can distinguish between them with high accuracy. These differences were proven useful for statistical machine translation. However, it has been suggested that the accuracy of translation detection deteriorates when the classifier is evaluated outside the domain it was trained on. We show that this is indeed the case, in a variety of evaluation scenarios. We then show that unsupervised classification is highly accurate on this task. We suggest a method for determining the correct labels of the clustering outcomes, and then use the labels for voting, improving the accuracy even further. Moreover, we suggest a simple method for clustering in the challenging case of mixed-domain datasets, in spite of the dominance of domain-related features over translation-related ones. The result is an effective, fully-unsupervised method for distinguishing between original and translated texts that can be applied to new domains with reasonable accuracy.

1 Introduction
Human-translated texts (in any language) have distinct features that distinguish them from original, non-translated texts. These differences stem either from the effect of the translation process on the translated outcomes, or from "fingerprints" of the source language on the target language product. The term translationese was coined to indicate the unique properties of translations.

Awareness of translationese can improve statistical machine translation (SMT). First, for training translation models, parallel texts that were translated in the direction of the SMT task are preferable to texts translated in the opposite direction; second, for training language models, monolingual corpora of translated texts are better than original texts.

It is possible to automatically distinguish between original (O) and translated (T) texts, with very high accuracy, by employing text classification methods. Existing approaches, however, only employ supervised machine-learning; they therefore suffer from two main drawbacks: (i) they inherently depend on data annotated with the translation direction, and (ii) they may not be generalized to unseen (related or unrelated) domains.[1] These shortcomings undermine the usability of supervised methods for translationese identification in a typical real-life scenario, where no labelled in-domain data are available.

In this work we explore unsupervised techniques for reliable discrimination of original and translated texts. More precisely, we apply dimension reduction and centroid-based clustering methods (enhanced by internal clustering evaluation), for telling O from T in an unsupervised scenario. Furthermore, we introduce a robust methodology for labelling the obtained clusters, i.e., annotating them as "original" or "translated", by inspecting similarities between the clustering outcomes and O and T prototypical examples. Rigorous experiments with four diverse corpora demonstrate that clustering of in-domain texts using lexical, content-independent features systematically yields very high accuracy, only 10 percentage points lower than the performance of supervised classification on the same data (in most cases). …

[1] We use "domain" rather freely henceforth to indicate not only the topic of a corpus but also its modality (written vs. spoken), register, genre, date, etc.
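
A minimal sketch of the unsupervised setup described, assuming function-word frequencies as the content-independent features and scikit-learn's KMeans as the centroid-based clusterer; the documents and feature list below are toy stand-ins:

    import numpy as np
    from sklearn.cluster import KMeans

    FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "as"]

    def features(doc):
        toks = doc.lower().split()
        return [toks.count(w) / len(toks) for w in FUNCTION_WORDS]

    docs = [
        "the committee noted that the report was submitted to the council",
        "it was agreed that the proposal of the member states be adopted",
        "we walked to the store and bought bread",
        "she said it would rain and it did",
    ]
    X = np.array([features(d) for d in docs])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)   # two clusters; annotating them as O vs. T would then
                    # compare cluster centroids to O/T prototype vectors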

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 359–373, 2015. Action Editor: Joakim Nivre. Submission batch: 4/2015; Published 6/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

A Graph-based Lattice Dependency Parser for Joint Morphological Segmentation and Syntactic Analysis
Wolfgang Seeker and Özlem Çetinoğlu
Institut für Maschinelle Sprachverarbeitung, University of Stuttgart
{seeker,ozlem}@ims.uni-stuttgart.de

Abstract
Space-delimited words in Turkish and Hebrew text can be further segmented into meaningful units, but syntactic and semantic context is necessary to predict segmentation. At the same time, predicting correct syntactic structures relies on correct segmentation. We present a graph-based lattice dependency parser that operates on morphological lattices to represent different segmentations and morphological analyses for a given input sentence. The lattice parser predicts a dependency tree over a path in the lattice and thus solves the joint task of segmentation, morphological analysis, and syntactic parsing. We conduct experiments on the Turkish and the Hebrew treebank and show that the joint model outperforms three state-of-the-art pipeline systems on both datasets. Our work corroborates findings from constituency lattice parsing for Hebrew and presents the first results for full lattice parsing on Turkish.

1 Introduction
Linguistic theory has provided examples from many different languages in which grammatical information is expressed via case marking, morphological agreement, or clitics. In these languages, configurational information is less important than in English since the words are overtly marked for their syntactic relations to each other. Such morphologically rich languages pose many new challenges to today's natural language processing technology, which has often been developed for English.

One of the first challenges is the question of how to represent morphologically rich languages and what the basic units of analysis are (Tsarfaty et al., 2010). The Turkish treebank (Oflazer et al., 2003), for example, represents words as sequences of inflectional groups, semantically coherent groups of morphemes separated by derivational boundaries. The treebank for Modern Hebrew (Sima'an et al., 2001) chooses morphemes as the basic unit of representation. A space-delimited word in the treebank can consist of several morphemes that may belong to independent syntactic contexts.

Both Turkish and Hebrew show high amounts of ambiguity when it comes to the correct segmentation of words into inflectional groups and morphemes, respectively. Within a sentence, however, these ambiguities can often be resolved by the syntactic and semantic context in which these words appear.

A standard (dependency) parsing system decides segmentation, morphological analysis (including POS), and syntax one after the other in a pipeline setup. While pipelines are fast and efficient, they cannot model the interaction between these different levels of analysis. It has therefore been argued that joint modeling of these three tasks is more suitable to the problem (Tsarfaty, 2006). In previous research, several transition-based parsers have been proposed to model POS/morphological tagging and parsing jointly (Hatori et al., 2011; Bohnet and Nivre, 2012; Bohnet et al., 2013). Such parsing systems have been further extended to also solve the segmentation problem in Chinese (Hatori et al., 2012; Li and Zhou, 2012; Zhang et al., 2014). Transition-based parsers are attractive since …
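
A toy morphological lattice for the classic Hebrew-style ambiguity where the written form bcl can be read as one morpheme ('onion') or as b + cl ('in' + 'shadow'). The joint parser described above would score dependency trees over each path; this sketch, with invented node numbering, merely enumerates the segmentation candidates:

    lattice = {                      # node -> list of (morph, next_node) edges
        0: [("bcl", 1), ("b", 2)],
        2: [("cl", 1)],
        1: [],                       # final node
    }

    def paths(node, prefix=()):
        """Yield every morpheme sequence along a path through the lattice."""
        if not lattice[node]:
            yield prefix
        for morph, nxt in lattice[node]:
            yield from paths(nxt, prefix + (morph,))

    for p in paths(0):
        print(p)    # ('bcl',) and ('b', 'cl') -- competing segmentations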

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 299–313, 2015. Action Editor: Kristina Toutanova. Submission batch: 1/2015; Revision batch 5/2015; Published 6/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

Improving Topic Models with Latent Feature Word Representations
Dat Quoc Nguyen, Richard Billingsley, Lan Du and Mark Johnson
Department of Computing, Macquarie University, Sydney, Australia; Santa Fe Institute, Santa Fe, New Mexico, USA
dat.nguyen@students.mq.edu.au, {richard.billingsley,lan.du,mark.johnson}@mq.edu.au

Abstract
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.

1 Introduction
Topic modeling algorithms, such as Latent Dirichlet Allocation (Blei et al., 2003) and related methods (Blei, 2012), are often used to learn a set of latent topics for a corpus, and predict the probabilities of each word in each document belonging to each topic (Teh et al., 2006; Newman et al., 2006; Toutanova and Johnson, 2008; Porteous et al., 2008; Johnson, 2010; Xie and Xing, 2013; Hingmire et al., 2013).

Conventional topic modeling algorithms such as these infer document-to-topic and topic-to-word distributions from the co-occurrence of words within documents. But when the training corpus of documents is small or when the documents are short, the resulting distributions might be based on little evidence. Sahami and Heilman (2006) and Phan et al. (2011) show that it helps to exploit external knowledge to improve the topic representations. Sahami and Heilman (2006) employed web search results to improve the information in short texts. Phan et al. (2011) assumed that the small corpus is a sample of topics from a larger corpus like Wikipedia, and then use the topics discovered in the larger corpus to help shape the topic representations in the small corpus. However, if the larger corpus has many irrelevant topics, this will "use up" the topic space of the model. In addition, Petterson et al. (2010) proposed an extension of LDA that uses external information about word similarity, such as thesauri and dictionaries, to smooth the topic-to-word distribution.

Topic models have also been constructed using latent features (Salakhutdinov and Hinton, 2009; Srivastava et al., 2013; Cao et al., 2015). Latent feature (LF) vectors have been used for a wide range of NLP tasks (Glorot et al., 2011; Socher et al., 2013; Pennington et al., 2014). The combination of values permitted by latent features forms a high dimensional space which makes it well suited to model topics of very large corpora.

Rather than relying solely on a multinomial or latent feature model, as in Salakhutdinov and Hinton (2009), Srivastava et al. (2013) and Cao et al. (2015), we explore how to take advantage of both latent feature and multinomial models by using a latent feature representation trained on a large external corpus to supplement a multinomial topic model estimated from a smaller corpus.

Our main contribution is that we propose two new latent feature topic models which integrate latent feature word representations into two Dirichlet …
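
A hedged sketch of the two-component topic-to-word distribution suggested by the abstract: a mixture of a multinomial component and a latent-feature component defined via word embeddings. The vocabulary, vectors, and mixture weight below are toy assumptions:

    import numpy as np

    vocab = ["bank", "money", "river", "water"]
    emb = np.random.RandomState(0).randn(4, 5)     # pretrained vectors (stand-in)
    topic_vec = np.random.RandomState(1).randn(5)  # latent-feature topic vector
    phi = np.array([0.4, 0.4, 0.1, 0.1])           # multinomial topic-word dist.

    lf = np.exp(emb @ topic_vec)
    lf /= lf.sum()                                 # softmax over the vocabulary

    lam = 0.6                                      # mixture weight (assumed)
    p_word_given_topic = lam * lf + (1 - lam) * phi
    print(dict(zip(vocab, p_word_given_topic.round(3))))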

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 257–270, 2015. Action Editor: Katrin Erk. Submission batch: 12/2014; Revision batch 3/2015; Published 5/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary
Jayant Krishnamurthy and Tom M. Mitchell
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213
jayantk@cs.cmu.edu, tom.mitchell@cmu.edu

Abstract
We present an approach to learning a model-theoretic semantics for natural language tied to Freebase. Crucially, our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as "Republican front-runner from Texas" whose semantics cannot be represented using the Freebase schema. Our approach directly converts a sentence's syntactic CCG parse into a logical form containing predicates derived from the words in the sentence, assigning each word a consistent semantics across sentences. This logical form is evaluated against a learned probabilistic database that defines a distribution over denotations for each textual predicate. A training phase produces this probabilistic database using a corpus of entity-linked text and probabilistic matrix factorization with a novel ranking objective function. We evaluate our approach on a compositional question answering task where it outperforms several competitive baselines. We also compare our approach against manually annotated Freebase queries, finding that our open predicate vocabulary enables us to answer many questions that Freebase cannot.

1 Introduction
Traditional knowledge representation assumes that world knowledge can be encoded using a closed vocabulary of formal predicates. In recent years, semantic parsing has enabled us to build compositional models of natural language semantics using such a closed predicate vocabulary (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005). These semantic parsers map natural language statements to database queries, enabling applications such as answering questions using a large knowledge base (Yahya et al., 2012; Krishnamurthy and Mitchell, 2012; Cai and Yates, 2013; Kwiatkowski et al., 2013; Berant et al., 2013; Berant and Liang, 2014; Reddy et al., 2014). Furthermore, the model-theoretic semantics provided by such parsers have the potential to improve performance on other tasks, such as information extraction and coreference resolution.

However, a closed predicate vocabulary has inherent limitations. First, its coverage will be limited, as such vocabularies are typically manually constructed. Second, it may abstract away potentially relevant semantic differences. For example, the semantics of "Republican front-runner" cannot be adequately encoded in the Freebase schema because it lacks the concept of a "front-runner." We could choose to encode this concept as "politician" at the cost of abstracting away the distinction between the two. As this example illustrates, these two problems are prevalent in even the largest knowledge bases.

An alternative paradigm is an open predicate vocabulary, where each natural language word or phrase is given its own formal predicate. This paradigm is embodied in both open information extraction (Banko et al., 2007) and universal schema (Riedel et al., 2013). Open predicate vocabularies have the potential to capture subtle semantic distinctions and achieve high coverage. However, we have yet to develop compelling approaches to compositional semantics within this paradigm. …
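
A minimal sketch of the matrix-factorization view behind the learned probabilistic database: a matrix of entity pairs by textual predicates is factored into low-rank embeddings, so unobserved cells receive scores. Squared loss is used here for brevity (the paper uses a ranking objective), and all data are invented:

    import numpy as np

    pairs = ["(Cruz, Texas)", "(Obama, Hawaii)"]
    preds = ["front-runner from", "politician from", "born in"]
    X = np.array([[1, 1, 0],          # observed pair-predicate co-occurrences
                  [0, 0, 1]], dtype=float)

    rng = np.random.RandomState(0)
    P = 0.1 * rng.randn(2, 3)         # entity-pair embeddings
    Q = 0.1 * rng.randn(3, 3)         # predicate embeddings

    for _ in range(2000):             # gradient steps on reconstruction loss
        G = P @ Q.T - X
        P, Q = P - 0.05 * G @ Q, Q - 0.05 * G.T @ P

    print((P @ Q.T).round(2))         # low-rank scores generalize across
                                      # semantically related predicates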

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 243–255, 2015. Action Editor: Ryan McDonald. Submission batch: 1/2015; Revision batch 4/2015; Published 5/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

Combining Minimally-supervised Methods for Arabic Named Entity Recognition
Maha Althobaiti, Udo Kruschwitz and Massimo Poesio
School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
{mjaltha,udo,poesio}@essex.ac.uk

Abstract
Supervised methods can achieve high performance on NLP tasks, such as Named Entity Recognition (NER), but new annotations are required for every new domain and/or genre change. This has motivated research in minimally supervised methods such as semi-supervised learning and distant learning, but neither technique has yet achieved performance levels comparable to those of supervised methods. Semi-supervised methods tend to have very high precision but comparatively low recall, whereas distant learning tends to achieve higher recall but lower precision. This complementarity suggests that better results may be obtained by combining the two types of minimally supervised methods. In this paper we present a novel approach to Arabic NER using a combination of semi-supervised and distant learning techniques. We trained a semi-supervised NER classifier and another one using distant learning techniques, and then combined them using a variety of classifier combination schemes, including the Bayesian Classifier Combination (BCC) procedure recently proposed for sentiment analysis. According to our results, the BCC model leads to an increase in performance of 8 percentage points over the best base classifiers.

1 Introduction
Supervised learning techniques are very effective and widely used to solve many NLP problems, including NER (Sekine and others, 1998; Benajiba et al., 2007a; Darwish, 2013). The main disadvantage of supervised techniques, however, is the need for a large annotated corpus. Although a considerable amount of annotated data is available for many languages, including Arabic (Zaghouani, 2014), changing the domain or expanding the set of classes always requires domain-specific experts and new annotated data, both of which demand time and effort. Therefore, much of the current research on NER focuses on approaches that require minimal human intervention to export the named entity (NE) classifiers to new domains and to expand NE classes (Nadeau, 2007; Nothman et al., 2013).

Semi-supervised (Abney, 2010) and distant learning approaches (Mintz et al., 2009; Nothman et al., 2013) are alternatives to supervised methods that do not require manually annotated data. These approaches have proved to be effective and easily adaptable to new NE types. However, the performance of such methods tends to be lower than that achieved with supervised methods (Althobaiti et al., 2013; Nadeau, 2007; Nothman et al., 2013).

We propose combining these two minimally supervised methods in order to exploit their respective strengths and thereby obtain better results. Semi-supervised learning tends to be more precise than distant learning, which in turn leads to higher recall than semi-supervised learning. In this work, we use various classifier combination schemes to combine the minimal supervision methods. Most previous studies have examined classifier combination schemes to combine multiple supervised-learning systems (Florian et al., 2003; Saha and Ekbal, 2013), but this research is the first to combine minimal supervision approaches. In addition, …
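
A minimal sketch of classifier combination over NER label sequences, using plain weighted voting (the paper's best-performing scheme, Bayesian Classifier Combination, is more elaborate; the predictions and weights here are made up):

    from collections import Counter

    def combine(predictions, weights):
        """predictions: one label sequence per base classifier."""
        combined = []
        for labels in zip(*predictions):
            votes = Counter()
            for lab, w in zip(labels, weights):
                votes[lab] += w
            combined.append(votes.most_common(1)[0][0])
        return combined

    semi = ["B-PER", "O", "O"]       # semi-supervised: high precision
    dist = ["B-PER", "B-LOC", "O"]   # distant learning: higher recall
    print(combine([semi, dist], weights=[0.6, 0.4]))   # ['B-PER', 'O', 'O']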

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 227–242, 2015. Action Editor: Joakim Nivre. Submission batch: 2/2015; Revision batch 4/2015; Published 5/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

Learning Composition Models for Phrase Embeddings
Mo Yu (Machine Intelligence & Translation Lab, Harbin Institute of Technology, Harbin, China; gflfof@gmail.com)
Mark Dredze (Human Language Technology Center of Excellence, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218; mdredze@cs.jhu.edu)

Abstract
Lexical embeddings can serve as useful representations for words for a variety of NLP tasks, but learning embeddings for phrases can be challenging. While separate embeddings are learned for each word, this is infeasible for every phrase. We construct phrase embeddings by learning how to compose word embeddings using features that capture phrase structure and context. We propose efficient unsupervised and task-specific learning objectives that scale our model to large datasets. We demonstrate improvements on both language modeling and several phrase semantic similarity tasks with various phrase lengths. We make the implementation of our model and the datasets available for general use.

1 Introduction
Word embeddings learned by neural language models (Bengio et al., 2003; Collobert and Weston, 2008; Mikolov et al., 2013b) have been successfully applied to a range of tasks, including syntax (Collobert and Weston, 2008; Turian et al., 2010; Collobert, 2011) and semantics (Huang et al., 2012; Socher et al., 2013b; Hermann et al., 2014). However, phrases are critical for capturing lexical meaning for many tasks. For example, Collobert and Weston (2008) showed that word embeddings yielded state-of-the-art systems on word-oriented tasks (POS, NER) but performance on phrase-oriented tasks, such as SRL, lags behind.

We propose a new method for compositional semantics that learns to compose word embeddings into phrases. In contrast to a common approach to phrase embeddings that uses pre-defined composition operators (Mitchell and Lapata, 2008), e.g., component-wise sum/multiplication, we learn composition functions that rely on phrase structure and context. Other work on learning compositions relies on matrices/tensors as transformations (Socher et al., 2011; Socher et al., 2013a; Hermann and Blunsom, 2013; Baroni and Zamparelli, 2010; Socher et al., 2012; Grefenstette et al., 2013). However, this work suffers from two primary disadvantages. First, these methods have high computational complexity for dense embeddings: O(d²) or O(d³) for composing every two components with d dimensions. The high computational complexity restricts these methods to very low-dimensional embeddings (25 or 50). While low-dimensional embeddings perform well for syntax (Socher et al., 2013a) and sentiment (Socher et al., 2013b) tasks, they do poorly on semantic tasks. Second, because of the complexity, they use supervised training with small task-specific datasets. An exception is the unsupervised objective of recursive auto-encoders (Socher et al., 2011). Yet this work cannot utilize contextual features of phrases and still poses scaling challenges.

In this work we propose a novel compositional transformation called the Feature-rich Compositional Transformation (FCT) model. FCT produces phrases from their word components. In contrast to previous work, our approach to phrase composition can efficiently utilize high dimensional embeddings (e.g. d = 200) with an unsupervised objective, both of which are critical to doing well on semantics tasks. …
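
A hedged sketch of feature-driven composition in the spirit described above: each word's embedding is rescaled by a weight computed from features of the phrase before summation. The feature templates and parameters are assumptions, and the scalar weight is a simplification (the model in the paper is richer than this):

    import numpy as np

    d = 4
    emb = {"very": np.ones(d) * 0.1, "good": np.ones(d) * 0.5}

    def features(word, pos):
        return [f"pos={pos}", f"word={word}"]

    def word_weight(word, pos, theta):
        """Linear score of binary features -> a scalar composition weight."""
        return sum(theta.get(f, 0.0) for f in features(word, pos))

    theta = {"pos=ADV": 0.3, "pos=ADJ": 1.0}     # "learned" parameters (toy)

    phrase = [("very", "ADV"), ("good", "ADJ")]
    vec = sum(word_weight(w, p, theta) * emb[w] for w, p in phrase)
    print(vec)    # phrase embedding: weighted sum of the word embeddings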

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 197–210, 2015. Action Editor: Sharon Goldwater. Submission batch: 12/2014; Revision batch 3/2015; Published 4/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

Higher-order Lexical Semantic Models for Non-factoid Answer Reranking
Daniel Fried, Peter Jansen, Gustave Hahn-Powell and Mihai Surdeanu (University of Arizona, Tucson, AZ, USA); Peter Clark (Allen Institute for Artificial Intelligence, Seattle, WA, USA)
{dfried,pajansen,hahnpowell,msurdeanu}@email.arizona.edu, peterc@allenai.org

Abstract
Lexical semantic models provide robust performance for question answering, but, in general, can only capitalize on direct evidence seen during training. For example, monolingual alignment models acquire term alignment probabilities from semi-structured data such as question-answer pairs; neural network language models learn term embeddings from unstructured text. All this knowledge is then used to estimate the semantic similarity between question and answer candidates. We introduce a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations. Using a corpus of 10,000 questions from Yahoo! Answers, we experimentally demonstrate that higher-order methods are broadly applicable to alignment and language models, across both word and syntactic representations. We show that an important criterion for success is controlling for the semantic drift that accumulates during graph traversal. All in all, the proposed higher-order approach improves five out of the six lexical semantic models investigated, with relative gains of up to +13% over their first-order variants.

1 Introduction
Open-domain question answering (QA), which finds short textual answers to natural language questions, is often viewed as the successor to keyword search (Etzioni, 2011) and one of the most difficult and widely applicable end-user applications of natural language processing (NLP). From syntactic parsing, discourse processing, and lexical semantics, QA necessitates a level of functionality across a variety of topics that make it a natural, yet challenging, proving ground for many aspects of NLP. Here, we address a particularly challenging QA subtask: open-domain non-factoid QA, where queries take the form of complex questions (e.g., manner or How questions), and answers range from single sentences to entire paragraphs. Because this task is so complex and large in scope, current state-of-the-art open-domain systems perform at only about 30% P@1, or answering roughly one out of three questions correctly (Jansen et al., 2014).

In this paper we focus on answer ranking (AR), a key component of non-factoid QA that focuses on ordering candidate answers based on the likelihood that they capture the information needed to answer a question. Unlike keyword search, Berger (2000) observed that lexical matching methods are generally insufficient for QA, where questions and answers often have little to no lexical overlap (as in the case of Where should we go for breakfast? and Zoe's Diner has great pancakes). Previous work has shown that lexical semantics (LS) models are well suited to bridging this "lexical chasm", and at least two flavors of lexical semantics have been successfully applied to QA. The first treats QA as a monolingual alignment problem, learning associations between words (or other structures) that appear in question-answer pairs (Surdeanu et al., 2011; Yao et al., 2013). The second computes the semantic similarity between question and answer using language models acquired from relevant texts (Yih et al., 2013; Jansen et al., 2014).

Here we argue that while these models begin to bridge the "lexical chasm", many still suffer from sparsity and only capitalize on direct evidence. Returning to our example question, if we also train on the QA pair What goes well with pancakes? and hash browns and toast, we can use the …
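
The chaining idea in miniature: treat direct association probabilities as edges of a graph and score indirect question-answer connections over short paths, with a hop limit as a crude stand-in for the paper's drift control. The association probabilities below are invented:

    assoc = {                          # direct term association probabilities
        "breakfast": {"pancakes": 0.5},
        "pancakes": {"hash browns": 0.4, "syrup": 0.6},
    }

    def higher_order(src, dst, max_hops=2):
        """Best product of edge probabilities over paths up to max_hops."""
        best = assoc.get(src, {}).get(dst, 0.0)
        if max_hops > 1:
            for mid, p in assoc.get(src, {}).items():
                best = max(best, p * higher_order(mid, dst, max_hops - 1))
        return best

    print(higher_order("breakfast", "hash browns"))   # 0.5 * 0.4 = 0.2
    # first-order evidence alone would give 0.0 for this pair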

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 183–196, 2015. Action Editor: Patrick Pantel. Submission batch: 9/2014; Revision batch 1/2015; Revision batch 3/2015; Published 3/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

From Visual Attributes to Adjectives through Decompositional Distributional Semantics
Angeliki Lazaridou, Georgiana Dinu*, Adam Liska and Marco Baroni
Center for Mind/Brain Sciences, University of Trento
{angeliki.lazaridou|georgiana.dinu|adam.liska|marco.baroni}@unitn.it

Abstract
As automated image analysis progresses, there is increasing interest in richer linguistic annotation of pictures, with attributes of objects (e.g., furry, brown…) attracting most attention. By building on the recent "zero-shot learning" approach, and paying attention to the linguistic nature of attributes as noun modifiers, and specifically adjectives, we show that it is possible to tag images with attribute-denoting adjectives even when no training data containing the relevant annotation are available. Our approach relies on two key observations. First, objects can be seen as bundles of attributes, typically expressed as adjectival modifiers (a dog is something furry, brown, etc.), and thus a function trained to map visual representations of objects to nominal labels can implicitly learn to map attributes to adjectives. Second, objects and attributes come together in pictures (the same thing is a dog and it is brown). We can thus achieve better attribute (and object) label retrieval by treating images as "visual phrases", and decomposing their linguistic representation into an attribute-denoting adjective and an object-denoting noun. Our approach performs comparably to a method exploiting manual attribute annotation, it outperforms various competitive alternatives in both attribute and object annotation, and it automatically constructs attribute-centric representations that significantly improve performance in supervised object recognition.

1 Introduction
As the quality of image analysis algorithms improves, there is increasing interest in annotating images with linguistic descriptions ranging from single words describing the depicted objects and their properties (Farhadi et al., 2009; Lampert et al., 2009) to richer expressions such as full-fledged image captions (Kulkarni et al., 2011; Mitchell et al., 2012). This trend has generated wide interest in linguistic annotations beyond concrete nouns, with the role of adjectives in image descriptions receiving, in particular, much attention.

Adjectives are of special interest because of their central role in so-called attribute-centric image representations. This framework views objects as bundles of properties, or attributes, commonly expressed by adjectives (e.g., furry, brown), and uses the latter as features to learn higher-level, semantically richer representations of objects (Farhadi et al., 2009).[1] Attribute-based methods achieve better generalization of object classifiers with less training data (Lampert et al., 2009), while at the same time producing semantic representations of visual concepts that more accurately model …

* Current affiliation: Thomas J. Watson Research Center, IBM, gdinu@us.ibm.com
[1] In this paper, we assume that, just like nouns are the linguistic counterpart of visual objects, visual attributes are expressed by adjectives. An informal survey of the relevant literature suggests that, when attributes have linguistic labels, they are indeed mostly expressed by adjectives. There are some attributes, such as parts, that are more naturally expressed by prepositional phrases (PPs: with a tail). Interestingly, Dinu and Baroni (2014) showed that the decomposition function we will adopt here can derive both adjective-noun and noun-PP phrases, suggesting that our approach could be seamlessly extended to visual attributes expressed by noun-modifying PPs.
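
A minimal sketch of the "visual phrase" decomposition: map an image into the linguistic vector space, then pick the adjective-noun pair whose composed vector lies closest. The vectors are random stand-ins for trained ones, and additive composition is assumed here for simplicity (the paper learns a decomposition function instead):

    import numpy as np

    rng = np.random.RandomState(0)
    adjs = {a: rng.randn(5) for a in ["furry", "brown"]}
    nouns = {n: rng.randn(5) for n in ["dog", "chair"]}

    image_vec = adjs["brown"] + nouns["dog"]   # pretend mapped image vector

    best = max(((a, n) for a in adjs for n in nouns),
               key=lambda an: -np.linalg.norm(adjs[an[0]] + nouns[an[1]]
                                              - image_vec))
    print(best)   # ('brown', 'dog') -- joint attribute and object labels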

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 157–167, 2015. Action Editor: Yuji Matsumoto. Submission batch: 9/2014; Revision batch: 12/2014; Revision batch 2/2015; Published 3/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

An Unsupervised Method for Uncovering Morphological Chains
Karthik Narasimhan, Regina Barzilay and Tommi Jaakkola
CSAIL, Massachusetts Institute of Technology
{karthikn,regina,tommi}@csail.mit.edu

Abstract
Most state-of-the-art systems today produce morphological analysis based only on orthographic patterns. In contrast, we propose a model for unsupervised morphological analysis that integrates orthographic and semantic views of words. We model word formation in terms of morphological chains, from base words to the observed words, breaking the chains into parent-child relations. We use log-linear models with morpheme and word-level features to predict possible parents, including their modifications, for each word. The limited set of candidate parents for each word renders contrastive estimation feasible. Our model consistently matches or outperforms five state-of-the-art systems on Arabic, English and Turkish.[1]

1 Introduction
Morphologically related words exhibit connections at multiple levels, ranging from orthographical patterns to semantic proximity. For instance, the words playing and played share the same stem, but also carry similar meaning. Ideally, all these complementary sources of information would be taken into account when learning morphological structures.

Most state-of-the-art unsupervised approaches to morphological analysis are built primarily around orthographic patterns in morphologically-related words (Goldwater and Johnson, 2004; Creutz and Lagus, 2007; Snyder and Barzilay, 2008; Poon et al., 2009). In these approaches, words are commonly modeled as concatenations of morphemes. This morpheme-centric view is well-suited for uncovering distributional properties of stems and affixes. But it is not well-equipped to capture semantic relatedness at the word level.

In contrast, earlier approaches that capture semantic similarity in morphological variants operate solely at the word level (Schone and Jurafsky, 2000; Baroni et al., 2002). Given two candidate words, the proximity is assessed using standard word-distributional measures such as mutual information. However, the fact that these models do not model morphemes directly greatly limits their performance.

In this paper, we propose a model to integrate orthographic and semantic views. Our goal is to build a chain of derivations for a current word from its base form. For instance, given a word playfully, the corresponding chain is play → playful → playfully. The word play is a base form of this derivation as it cannot be reduced any further. Individual derivations are obtained by adding a morpheme (ex. -ful) to a parent word (ex. play). This addition may be implemented via a simple concatenation, or it may involve transformations. At every step of the chain, the model aims to find a parent-child pair (ex. play-playful) such that the parent also constitutes a valid entry in the lexicon. This allows the model to directly compare the semantic similarity of the parent-child pair, while also considering the orthographic properties of the morphemic combination.

We model each step of a morphological chain by means of a log-linear model that enables us to incorporate a wide range of features. At the semantic level, we consider the relatedness between two words using the corresponding vector embeddings. At the orthographic level, features capture whether …

[1] Code is available at https://github.com/karthikncode/MorphoChain.
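
A toy version of candidate-parent generation for the playfully chain above (the suffix list and length filter are illustrative assumptions; the paper scores such candidates with a log-linear model over orthographic and semantic features):

    SUFFIXES = ["ly", "ful", "ing", "ed", "s"]

    def candidate_parents(word):
        """Possible parents obtained by stripping one suffix."""
        cands = []
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) > len(suf) + 2:
                cands.append((word[:-len(suf)], suf))
        return cands

    print(candidate_parents("playfully"))   # [('playful', 'ly')]
    print(candidate_parents("playful"))     # [('play', 'ful')]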

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 131–143, 2015. Action Editor: Masaaki Nagata. Submission batch: 10/2014; Revision batch 1/2015; Published 3/2015. © 2015 Association for Computational Linguistics.

Unsupervised Declarative Knowledge Induction for Constraint-Based Learning of Information Structure in Scientific Documents
Yufan Guo (DTAL, University of Cambridge, UK; yg244@cam.ac.uk)
Roi Reichart (Technion-IIT, Haifa, Israel; roiri@ie.technion.ac.il)
Anna Korhonen (DTAL, University of Cambridge, UK; alk23@cam.ac.uk)

Abstract
Inferring the information structure of scientific documents is useful for many NLP applications. Existing approaches to this task require substantial human effort. We propose a framework for constraint learning that reduces human involvement considerably. Our model uses topic models to identify latent topics and their key linguistic features in input documents, induces constraints from this information and maps sentences to their dominant information structure categories through a constrained unsupervised model. When the induced constraints are combined with a fully unsupervised model, the resulting model challenges existing lightly supervised feature-based models as well as unsupervised models that use manually constructed declarative knowledge. Our results demonstrate that useful declarative knowledge can be learned from data with very limited human involvement.

1 Introduction
Automatic analysis of scientific text can help scientists find information from literature faster, saving valuable research time. In this paper we focus on the analysis of the information structure (IS) of scientific articles where the aim is to assign each unit of an article (typically a sentence) into a category that represents the information type it conveys. By information structure we refer to a particular type of discourse structure that focuses on the functional role of a unit in the discourse (Webber et al., 2011). For instance, in the scientific literature, the functional role of a sentence could be the background or motivation of the research, the methods used, the experiments carried out, the observations on the results, or the author's conclusions.

Readers of scientific literature find information in IS-annotated articles much faster than in unannotated articles (Guo et al., 2011b). Argumentative Zoning (AZ)—an information structure scheme that has been applied successfully to many scientific domains (Teufel et al., 2009)—has improved tasks such as summarization and information extraction and retrieval (Teufel and Moens, 2002; Tbahriti et al., 2006; Ruch et al., 2007; Liakata et al., 2012; Contractor et al., 2012).

Existing approaches to information structure analysis require substantial human effort. Most use feature-based machine learning, such as SVMs and CRFs (e.g. Teufel and Moens, 2002; Lin et al., 2006; Hirohata et al., 2008; Shatkay et al., 2008; Guo et al., 2010; Liakata et al., 2012), which rely on thousands of manually annotated training sentences. Moreover, the performance of such methods is rather limited: Liakata et al. (2012) reported per-class F-scores ranging from .53 to .76 in the biochemistry and chemistry domains and Guo et al. (2013a) reported substantially lower numbers for the challenging Introduction and Discussion sections in the biomedical domain.

Guo et al. (2013a) recently applied the Generalized Expectation (GE) criterion (Mann and McCallum, 2007) to information structure analysis using expert knowledge in the form of discourse and lexical constraints. Their model produces promising results, especially for sections and categories where …
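
A minimal sketch of turning keyword features into soft constraints that steer sentence-to-category mapping; the categories and keyword lists below are invented stand-ins for the constraints the paper induces automatically from topic models:

    CONSTRAINTS = {
        "Method": ["we used", "was performed"],
        "Conclusion": ["we conclude", "suggests that"],
    }

    def constraint_scores(sentence):
        """Count constraint keywords per category, as soft evidence."""
        s = sentence.lower()
        return {cat: sum(kw in s for kw in kws)
                for cat, kws in CONSTRAINTS.items()}

    print(constraint_scores("We conclude that the assay suggests that ..."))
    # {'Method': 0, 'Conclusion': 2} -> evidence biasing the latent category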

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 117–129, 2015. Action Editor: Hal Daumé III. Submission batch: 10/2014; Revision batch 1/2015; Published 2/2015. © 2015 Association for Computational Linguistics.

Exploiting Parallel News Streams for Unsupervised Event Extraction
Congle Zhang, Stephen Soderland and Daniel S. Weld
Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA
{clzhang,soderlan,weld}@cs.washington.edu

Abstract
Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as FreeBase rarely cover event relations (e.g. "person travels to location"). Thus, the problem of extracting a wide range of events—e.g., from news streams—is an important, open challenge. This paper introduces NEWSSPIKE-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NEWSSPIKE-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NEWSSPIKE-RE generates high quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas.

1 Introduction
Relation extraction, the process of extracting structured information from natural language text, grows increasingly important for Web search and question answering. Traditional supervised approaches, which can achieve high precision and recall, are limited by the cost of labeling training data and are unlikely to scale to the thousands of relations on the Web. Another approach, distant supervision (Craven and Kumlien, 1999; Wu and Weld, 2007), creates its own training data by matching the ground instances of a knowledge base (KB) (e.g. Freebase) to the unlabeled text.

Unfortunately, while distant supervision can work well in some situations, the method is limited to relatively static facts (e.g., born-in(person, location) or capital-of(location, location)) where there is a corresponding knowledge base. But what about dynamic event relations (also known as fluents), such as travel-to(person, location) or fire(organization, person)? Since these time-dependent facts are ephemeral, they are rarely stored in a pre-existing KB. At the same time, knowledge of real-time events is crucial for making informed decisions in fields like finance and politics. Indeed, news stories report events almost exclusively, so learning to extract events is an important open problem.

This paper develops a new unsupervised technique, NEWSSPIKE-RE, to both discover event relations and extract them with high precision. The intuition underlying NEWSSPIKE-RE is that the texts of articles from two different news sources are not independent, since they are each conditioned on the same real-world events. By looking for rarely described entities that suddenly "spike" in popularity on a given date, one can identify paraphrases. Such temporal correspondence (Zhang and Weld, 2013) allows one to cluster diverse sentences, and the resulting clusters may be used to form training data in order to learn event extractors.

Furthermore, one can also exploit parallel news to obtain direct negative evidence. To see this, suppose one day the news includes the following: (a) "Snowden travels to Hong Kong, off southeastern China." (b) "Snowden cannot stay in Hong Kong as Chinese officials will not allow …" Since news stories are usually coherent, it is highly unlikely that travel to and stay in (which is negated) are synonymous. By leveraging such direct negative phrases, we can learn extractors capable of distinguishing heavily co-occurring but semantically different phrases, thereby avoiding many extraction errors. Our NEWSSPIKE-RE system encapsulates these intuitions in a novel graphical model making …
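
The "spike" intuition reduces to a simple statistic, shown here with made-up counts: an entity pair that is rarely mentioned historically but heavily mentioned on one date signals parallel sentences describing a single event.

    def spike_score(count_today, avg_daily_before, smoothing=1.0):
        """Ratio of today's mentions to the historical rate (toy statistic)."""
        return count_today / (avg_daily_before + smoothing)

    # ("Snowden", "Hong Kong"): barely mentioned before, 40 mentions on one day
    print(round(spike_score(40, 0.5), 1))   # ~26.7 -> a spike; that day's
    # sentences from different sources likely paraphrase the same event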

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 59–71, 2015. Action Editor: Hwee Tou Ng.

Transactions of the Association for Computational Linguistics, vol. 3, pp. 59–71, 2015. Action Editor: Hwee Tou Ng. Submission batch: 10/2014; Revision batch 12/2014; Revision batch 1/2015; Published 1/2015. © 2015 Association for Computational Linguistics.

A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment
Jing Wang∗, Mohit Bansal†, Kevin Gimpel†, Brian D. Ziebart∗, Clement T. Yu∗
∗University of Illinois at Chicago, Chicago, IL 60607, USA
{jwang69,bziebart,cyu}@uic.edu
†Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
{mbansal,kgimpel}@ttic.edu

Abstract

Word sense induction (WSI) seeks to automatically discover the senses of a word in a corpus via unsupervised methods. We propose a sense-topic model for WSI, which treats sense and topic as two separate latent variables to be inferred jointly. Topics are informed by the entire document, while senses are informed by the local context surrounding the ambiguous word. We also discuss unsupervised ways of enriching the original corpus in order to improve model performance, including using neural word embeddings and external corpora to expand the context of each data instance. We demonstrate significant improvements over the previous state-of-the-art, achieving the best results reported to date on the SemEval-2013 WSI task.

1 Introduction

Word sense induction (WSI) is the task of automatically discovering all senses of an ambiguous word in a corpus. The inputs to WSI are instances of the ambiguous word with its surrounding context. The output is a grouping of these instances into clusters corresponding to the induced senses. WSI is generally conducted as an unsupervised learning task, relying on the assumption that the surrounding context of a word indicates its meaning. Most previous work assumed that each instance is best labeled with a single sense, and therefore, that each instance belongs to exactly one sense cluster. However, recent work (Erk and McCarthy, 2009; Jurgens, 2013) has shown that more than one sense can be used to interpret certain instances, due to context ambiguity and sense relatedness.

To handle these characteristics of WSI (unsupervised, senses represented by token clusters, multiple senses per instance), we consider approaches based on topic models. A topic model is an unsupervised method that discovers the semantic topics underlying a collection of documents. The most popular is latent Dirichlet allocation (LDA; Blei et al., 2003), in which each topic is represented as a multinomial distribution over words, and each document is represented as a multinomial distribution over topics.

One approach would be to run LDA on the instances for an ambiguous word, then simply interpret topics as induced senses (Brody and Lapata, 2009). However, while sense and topic are related, they are distinct linguistic phenomena. Topics are assigned to entire documents and are expressed by all word tokens, while senses relate to a single ambiguous word and are expressed through the local context of that word. One possible approach would be to only keep the local context of each ambiguous word, discarding the global context. However, the topical information contained in the broader context, though it may not determine the sense directly, might still be useful for narrowing down the likely senses of the ambiguous word.

Consider the ambiguous word cold. In the sentence "His reaction to the experiments was cold", the possible senses for cold include cold temperature, a cold sensation, common cold, or a negative emotional reaction. However, if we know that the topic of the document concerns the effects of low temperatures on physical health, then the negative emotional reaction sense should become less likely. Therefore, in this case, knowing the topic helps narrow down the set of plausible senses.
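The division of labor between topic and sense suggests a simple schematic computation: the document-level topic distribution supplies a prior over senses, and the local context supplies the likelihood. The numpy sketch below captures only that intuition, under strong simplifying assumptions (fixed, pre-estimated distributions and independence of context words); in the actual sense-topic model all of these quantities are latent and inferred jointly.

    import numpy as np

    def sense_posterior(local_ctx, doc_topic_dist, p_sense_given_topic,
                        p_word_given_sense, vocab_index):
        """Schematic P(sense | document, context): a prior over senses
        from the document's topics, times the likelihood of the local
        context words under each sense."""
        # Topic-informed prior over senses: (T,) @ (T, S) -> (S,)
        prior = doc_topic_dist @ p_sense_given_topic
        # Likelihood of the local context under each sense.
        lik = np.ones_like(prior)
        for w in local_ctx:
            lik *= p_word_given_sense[:, vocab_index[w]]
        post = prior * lik
        return post / post.sum()

On the cold example, a doc_topic_dist concentrated on a low-temperature health topic would downweight the emotional-reaction sense before the local context around cold is even consulted.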

Read more »

Transactions of the Association for Computational Linguistics, vol. 3, pp. 43–57, 2015. Action Editor: Janyce Wiebe.

Transactions of the Association for Computational Linguistics, vol. 3, pp. 43–57, 2015. Action Editor: Janyce Wiebe. Submission batch: 7/2014; Revision batch 12/2014; Published 1/2015. © 2015 Association for Computational Linguistics.

SPRITE: Generalizing Topic Models with Structured Priors
Michael J. Paul and Mark Dredze
Department of Computer Science, Human Language Technology Center of Excellence
Johns Hopkins University, Baltimore, MD 21218
mpaul@cs.jhu.edu, mdredze@cs.jhu.edu

Abstract

We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We show that the model learns intuitive topics, outperforming several other topic models at predictive tasks.

1 Introduction

Topic models can be a powerful aid for analyzing large collections of text by uncovering latent interpretable structures without manual supervision. Yet people often have expectations about topics in a given corpus and how they should be structured for a particular task. It is crucial for the user experience that topics meet these expectations (Mimno et al., 2011; Talley et al., 2011), yet black box topic models provide no control over the desired output.

This paper presents SPRITE, a family of topic models that provide a flexible framework for encoding preferences as priors for how topics should be structured. SPRITE can incorporate many types of structure that have been considered in prior work, including hierarchies (Blei et al., 2003a; Mimno et al., 2007), factorizations (Paul and Dredze, 2012; Eisenstein et al., 2011), sparsity (Wang and Blei, 2009; Balasubramanyan and Cohen, 2013), correlations between topics (Blei and Lafferty, 2007; Li and McCallum, 2006), preferences over word choices (Andrzejewski et al., 2009; Paul and Dredze, 2013), and associations between topics and document attributes (Ramage et al., 2009; Mimno and McCallum, 2008).

SPRITE builds on a standard topic model, adding structure to the priors over the model parameters. The priors are given by log-linear functions of underlying components (§2), which provide additional latent structure that we will show can enrich the model in many ways. By applying particular constraints and priors to the component hyperparameters, a variety of structures can be induced such as hierarchies and factorizations (§3), and we will show that this framework captures many existing topic models (§4). After describing the general form of the model, we show how SPRITE can be tailored to particular settings by describing a specific model for the applied task of jointly inferring topic hierarchies and perspective (§6). We experiment with this topic+perspective model on sets of political debates and online reviews (§7), and demonstrate that SPRITE learns desired structures while outperforming many baselines at predictive tasks.

2 Topic Modeling with Structured Priors

Our model family generalizes latent Dirichlet allocation (LDA) (Blei et al., 2003b). Under LDA, there are K topics, where a topic is a categorical distribution over V words parameterized by φ_k. Each document has a categorical distribution over topics, parameterized by θ_m for the m-th document. Each observed word in a document is generated by drawing a topic z from θ_m, then drawing the word from φ_z. θ and φ have priors given by Dirichlet distributions.

Our generalization adds structure to the generation of the Dirichlet parameters. The priors for these parameters are modeled as log-linear combinations of underlying components. Components are real-valued vectors of length equal to the vocabulary size V (for priors over word distributions) or of length equal to the number of topics K …
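The log-linear prior construction is compact enough to show directly. In the sketch below (numpy; the variable names and the random example are ours), each topic's Dirichlet hyperparameters over the vocabulary are the exponentiated linear combination of shared component vectors, which is the general form described above; the constraints placed on the coefficients and components are what specialize the framework into hierarchies, factorizations, and the other structures listed earlier.

    import numpy as np

    def structured_dirichlet_prior(beta, omega):
        """Dirichlet hyperparameters as a log-linear function of components.
        beta:  (K, C) topic-to-component coefficients
        omega: (C, V) component vectors over the vocabulary
        Returns a (K, V) matrix of positive Dirichlet parameters."""
        return np.exp(beta @ omega)

    # Example: sample one topic's word distribution from its structured prior.
    rng = np.random.default_rng(0)
    K, C, V = 4, 2, 10
    beta = rng.normal(size=(K, C))    # constraining beta (e.g. sparsity or
    omega = rng.normal(size=(C, V))   # non-negativity) induces structure
    phi_prior = structured_dirichlet_prior(beta, omega)
    phi_0 = rng.dirichlet(phi_prior[0])   # word distribution for topic 0

A hierarchy, for instance, can emerge when each topic attaches mainly to a single component, so that components behave like parent topics over the topics they generate.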

Read more »