Transactions of the Association for Computational Linguistics, 1 (2013) 179–192. Action Editor: Johan Bos.

Transactions of the Association for Computational Linguistics, 1 (2013) 179–192. Action Editor: Johan Bos.
Submitted 1/2013; Revised 3/2013; Published 5/2013. c(cid:13)2013 Association for Computational Linguistics.

CombinedDistributionalandLogicalSemanticsMikeLewisSchoolofInformaticsUniversityofEdinburghEdinburgh,EH89AB,UKmike.lewis@ed.ac.ukMarkSteedmanSchoolofInformaticsUniversityofEdinburghEdinburgh,EH89AB,UKsteedman@inf.ed.ac.ukAbstractWeintroduceanewapproachtosemanticswhichcombinesthebenefitsofdistributionalandformallogicalsemantics.Distributionalmodelshavebeensuccessfulinmodellingthemeaningsofcontentwords,butlogicalse-manticsisnecessarytoadequatelyrepresentmanyfunctionwords.Wefollowformalse-manticsinmappinglanguagetologicalrep-resentations,butdifferinthattherelationalconstantsusedareinducedbyofflinedistri-butionalclusteringatthelevelofpredicate-argumentstructure.Ourclusteringalgorithmishighlyscalable,allowingustorunoncor-porathesizeofGigaword.Differentsensesofawordaredisambiguatedbasedontheirin-ducedtypes.Weoutperformavarietyofex-istingapproachesonawide-coveragequestionansweringtask,anddemonstratetheabilitytomakecomplexmulti-sentenceinferencesin-volvingquantifiersontheFraCaSsuite.1IntroductionMappingnaturallanguagetomeaningrepresenta-tionsisacentralchallengeofNLP.Therehasbeenmuchrecentprogressinunsuperviseddistributionalsemantics,inwhichthemeaningofawordisin-ducedbasedonitsusageinlargecorpora.Thisap-proachisusefulforarangeofkeyapplicationsin-cludingquestionansweringandrelationextraction(LinandPantel,2001;PoonandDomingos,2009;Yaoetal.,2011).Becausesuchasemanticscanbeautomicallyinduced,itescapesthelimitationofde-pendingonrelationsfromhand-builttrainingdata,knowledgebasesorontologies,whichhaveprovedoflimiteduseincapturingthehugevarietyofmean-ingsthatcanbeexpressedinlanguage.However,distributionalsemanticshaslargelyde-velopedinisolationfromtheformalsemanticsliter-ature.Whilstdistributionalsemanticshasbeenef-fectiveinmodellingthemeaningsofcontentwordssuchasnounsandverbs,itislessclearthatitcanbeappliedtothemeaningsoffunctionwords.Semanticoperators,suchasdeterminers,negation,conjunc-tions,modals,tense,mood,aspect,andpluralsareubiquitousinnaturallanguage,andarecrucialforhighperformanceonmanypracticalapplications—butcurrentdistributionalmodelsstruggletocaptureevensimpleexamples.Conversely,computationalmodelsofformalsemanticshaveshownlowrecallonpracticalapplications,stemmingfromtheirre-lianceonontologiessuchasWordNet(Miller,1995)tomodelthemeaningsofcontentwords(Bobrowetal.,2007;BosandMarkert,2005).Forexample,considerwhatisneededtoansweraquestionlikeDidGooglebuyYouTube?fromthefollowingsentences:1.GooglepurchasedYouTube2.Google’sacquisitionofYouTube3.Googleacquiredeverycompany4.YouTubemaybesoldtoGoogle5.GooglewillbuyYouTubeorMicrosoft6.Googledidn’ttakeoverYouTubeAlloftheserequireknowledgeoflexicalseman-tics(e.g.thatbuyandpurchasearesynonyms),butsomealsoneedinterpretationofquantifiers,nega-tives,modalsanddisjunction.Itseemsunlikelythat

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
2
1
9
1
5
6
6
6
4
9

/

/
t

je

un
c
_
un
_
0
0
2
1
9
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

180

distributionalorformalapproachescanaccomplishthetaskalone.Weproposeamethodformappingnaturallan-guagetofirst-orderlogicrepresentationscapableofcapturingthemeaningsoffunctionwordssuchasevery,notandor,butwhichalsousesdistributionalstatisticstomodelthemeaningofcontentwords.Ourapproachdiffersfromstandardformalseman-ticsinthatthenon-logicalsymbolsusedinthelog-icalformareclusteridentifiers.Wherestandardse-manticformalismswouldmaptheverbwritetoawrite’symbol,wemapittoaclusteridentifiersuchasrelation37,whichthenounauthormayalsomapto.Thismappingislearntbyofflineclustering.Unlikepreviousdistributionalapproaches,weperformclusteringatthelevelofpredicate-argumentstructure,ratherthansyntacticdependencystruc-ture.Thismeansthatweabstractawayfrommanysyntacticdifferencesthatarenotpresentinthese-mantics,suchasconjunctions,passives,relativeclauses,andlong-rangedependencies.Thissignifi-cantlyreducessparsity,sowehavefewerpredicatestoclusterandmoreobservationsforeach.Ofcourse,manypracticalinferencesrelyheavilyonbackgroundknowledgeabouttheworld—suchknowledgefallsoutsidethescopeofthiswork.2BackgroundOurapproachisbasedonCombinatoryCategorialGrammar(CCG;Steedman,2000),astronglylexi-calisedtheoryoflanguageinwhichlexicalentriesforwordscontainalllanguage-specificinformation.Thelexicalentryforeachwordcontainsasyntacticcategory,whichdetermineswhichothercategoriesthewordmaycombinewith,andasemanticinter-pretation,whichdefinesthecompositionalseman-tics.Forexample,thelexiconmaycontaintheentry:write‘(S\NP)/NP:λyλx.write0(X,oui)Surtout,thereisatransparentinterfacebetweenthesyntacticcategoryandthesemantics.Forex-amplethetransitiveverbentryabovedefinestheverbsyntacticallyasafunctionmappingtwonoun-phrasestoasentence,andsemanticallyasabi-naryrelationbetweenitstwoargumententities.Thismeansthatitisrelativelystraightforwardtodeterministicallymapparseroutputtoalogicalform,asintheBoxersystem(Bos,2008).ThisEverydogbarksNP↑/NNS\NPλpλq.∀x[p(X)=⇒q(X)]λx.dog0(X)λx.bark0(X)>NP↑λq.∀x[dog0(X)=⇒q(X)]>S∀x[dog0(X)=⇒bark0(X)]Figure1:AstandardlogicalformderivationusingCCG.TheNP↑notationmeansthatthesubjectistype-raised,andtakingtheverb-phraseasanargument—soisanab-breviationofS/(S\NP).Thisisnecessaryinparttosup-portacorrectsemanticsforquantifiers.InputSentenceShakespearewroteMacbeth⇓Intialsemanticanalysiswritearg0,arg1(shakespeare,macbeth)⇓EntityTypingwritearg0:PER,arg1:BOOK(shakespeare:PER,macbeth:BOOK)⇓Distributionalsemanticanalysisrelation37(shakespeare:PER,macbeth:BOOK)Figure2:Layersusedinourmodel.formofsemanticscapturestheunderlyingpredicate-argumentstructure,butfailstolicensemanyimpor-tantinferences—as,forexample,writeandauthordonotmaptothesamepredicate.Inadditiontothelexicon,thereisasmallsetofbinarycombinatorsandunaryrules,whichhaveasyntacticandsemanticinterpretation.Figure1givesanexampleCCGderivation.3OverviewofApproachWeattempttolearnaCCGlexiconwhichmapsequivalentwordsontothesamelogicalform—forexamplelearningentriessuchas:author‘N/PP[de]:λxλy.relation37(X,oui)write‘(S\NP)/NP:λxλy.relation37(X,oui)TheonlychangetothestandardCCGderivationisthatthesymbolsusedinthelogicalformarearbi-traryrelationidentifiers.Welearnthesebyfirstmap-pingtoadeterministiclogicalform(usingpredicates

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
2
1
9
1
5
6
6
6
4
9

/

/
t

je

un
c
_
un
_
0
0
2
1
9
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

181

suchasauthor’andwrite’),usingaprocesssimi-lartoBoxer,andthenclusteringpredicatesbasedontheirarguments.Thislexiconcanthenbeusedtoparsenewsentences,andintegratesseamlesslywithCCGtheoriesofformalsemantics.Typingpredicates—forexample,determiningthatwritingisarelationbetweenpeopleandbooks—hasbecomestandardinrelationclustering(Schoen-mackersetal.,2010;Berantetal.,2011;Yaoetal.,2012).WedemonstatehowtobuildatypingmodelintotheCCGderivation,bysubcategorizingalltermsrepresentingentitiesinthelogicalformwithamoredetailedtype.Thesetypesarealsoin-ducedfromtext,asexplainedinSection5,butforconveniencewedescribethemwithhuman-readablelabels,suchasPER,LOCandBOOK.Akeyadvantageoftypingisthatitallowsustomodelambiguouspredicates.FollowingBerantetal.(2011),weassumethatdifferenttypesignaturesofthesamepredicatehavedifferentmeanings,butgivenatypesignatureapredicateisunambiguous.ForexampleadifferentlexicalentryfortheverbbornisusedinthecontextsObamawasborninHawaiiandObamawasbornin1961,reflectingadistinctioninthesemanticsthatisnotobviousinthesyntax1.Typingalsogreatlyimprovestheefficiencyofclustering,asweonlyneedtocomparepredicateswiththesametypeduringclustering(forexample,wedonothavetoconsiderclusteringapredicatebetweenpeopleandplaceswithpredicatesbetweenpeopleanddates).Inthiswork,wefocusoninducingbinaryrela-tions.Manyexistingapproacheshaveshownhowtoproducegoodclusteringsof(non-event)nouns(Brownetal.,1992),anyofwhichcouldbesim-plyintegratedintooursemantics—butrelationclus-teringremainsanopenproblem(seeSection9).N-aryrelationsarebinarized,bycreatingabi-naryrelationbetweeneachpairofarguments.Forexample,forthesentenceRussiasoldAlaskatotheUnitedStates,thesystemcreatesthreebinaryrelations—correspondingtosellToSomeone(Russia,Alaska),buyFromSomeone(NOUS,Alaska),sellSome-thingTo(Russia,NOUS).Thistransformationdoesnot1Whilstthisassumptionisveryuseful,itdoesnotalwayshold—forexample,thegenitiveinShakespeare’sbookisambigu-ousbetweenownershipandauthorshiprelationsevengiventhetypesofthearguments.exactlypreservemeaning,butstillcapturesthemostimportantrelations.Notethatthisallowsustocomparesemanticrelationsacrossdifferentsyntac-tictypes—forexample,bothtransitiveverbsandargument-takingnounscanbeseenasexpressingbi-narysemanticrelationsbetweenentities.Figure2showsthelayersusedinourmodel.4InitialSemanticAnalysisTheinitialsemanticanalysismapsparseroutputontoalogicalform,inasimilarwaytoBoxer.ThesemanticformalismisbasedonSteedman(2012).Thefirststepissyntacticparsing.WeusetheC&Cparser(ClarkandCurran,2004),trainedonCCGBank(HockenmaierandSteedman,2007),us-ingtherefinedversionofHonnibaletal.(2010)whichbringsthesyntaxclosertothepredicate-argumentstructure.Anautomaticpost-processingstepmakesanumberofminorchangestotheparseroutput,whichconvertsthegrammarintoonemoresuitableforoursemantics.PP(prepositionalphrase)andPR(phrasalverbcomplement)categoriesaresub-categorisedwiththerelevantpreposition.NouncompoundswiththesameMUCnamed-entitytype(ChinchorandRobinson,1997)aremergedintoasinglenon-compositionalnode2(weotherwiseig-norenamed-entitytypes).AllargumentNPsandPPsaretype-raised,allowingustorepresentquanti-fiers.Allprepositionalphrasesaretreatedascorear-guments(i.e.giventhecategoryPP,notadjunctcat-egorieslike(N\N)/NPor((S\NP)\(S\NP))/NP),asitisdifficultfortheparsertodistinguishargu-mentsandadjuncts.InitialsemanticlexicalentriesforalmostallwordscanbegeneratedautomaticallyfromthesyntacticcategoryandPOStag(obtainedfromtheparser),asthesyntacticcategorycapturestheunderlyingpredicate-argumentstructure.WeuseaDavidsonian-stylerepresentationofarguments(Davidson,1967),whichwebinarizebycreatingaseparatepredicateforeachpairofargumentsofaword.ThesepredicatesarelabelledwiththelemmaoftheheadwordandaPropbank-styleargumentkey(KingsburyandPalmer,2002),e.g.arg0,argIn.WedistinguishnounandverbpredicatesbasedonPOS2Forexample,thisallowsustogiveBarackObamatheseman-ticsλx.barackobama(X)insteadofλx.barack(X)∧obama(X),whichismoreconvenientforcollectingdistributionalstatistics.

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
2
1
9
1
5
6
6
6
4
9

/

/
t

je

un
c
_
un
_
0
0
2
1
9
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

182

WordCategorySemanticsAutomaticauthorN/PP[de]λxλy.authorarg0,argOf(oui,X)write(S\NP)/NPλxλy.writearg0,arg1(oui,X)ManualeveryNP↑/Nλpλq.∀x[p(X)→q(X)]pas(S\NP)/(S\NP)λpλx.¬p(X)Figure3:Exampleinitiallexicalentriestag—so,forexample,wehavedifferentpredicatesforeffectasanounorverb.Thisalgorithmcanbeoverriddenwithman-uallexicalentriesforspecificclosed-classfunctionwords.Whilstitmaybepossibletolearnthesefromdata,ourapproachispragmaticastherearerelativelyfewsuchwords,andthecomplexlogicalformsrequiredwouldbedifficulttoinducefromdis-tributionalstatistics.Weaddasmallnumberoflexi-calentriesforwordssuchasnegatives(Non,notetc.),andquantifiers(numbers,chaque,every,tous,etc.).SomeexampleinitiallexicalentriesareshowninFigure3.5EntityTypingModelOurentity-typingmodelassignstypestonouns,whichisusefulfordisambiguatingpolysemouspredicates.OurapproachissimilartoO’Seaghdha(2010)inthatweaimtoclusterentitiesbasedonthenounandunarypredicatesappliedtothem(itissimpletoconvertfromthebinarypredicatestounarypredicates).Forexample,wewantthepair(bornargIn,1961)tomaptoaDATtype,et(bornargIn,Hawaii)tomaptoaLOCtype.Thisisnon-trivial,asboththepredicatesandargumentscanbeambiguousbetweenmultipletypes—buttopicmodelsofferagoodsolution(describedbelow).5.1TopicModelWeassumethatthetypeofeachargumentofapred-icatedependsonlyonthepredicateandargument,althoughRitteretal.(2010)demonstrateanadvan-tageofmodellingthejointprobabilityofthetypesofmultipleargumentsofthesamepredicate.WeusethestandardLatentDirichletAllocationmodel(Bleietal.,2003),whichperformscomparablytomorecomplexmodelsproposedinO’Seaghdha(2010).Intopic-modellingterminology,weconstructadocumentforeachunarypredicate(e.g.bornargIn),basedonallofitsargumententities(words).Weas-sumethattheseargumentsaredrawnfromasmallnumberoftypes(topics),suchasPER,DATorLOC3.Eachtypejhasamultinomialdistributionφjoverarguments(forexample,aLOCtypeismorelikelytogenerateHawaiithan1961).Eachunarypredicateihasamultinomialdistributionθiovertopics,sothebornargInpredicatewillnormallygen-erateaDATorLOCtype.SparseDirichletpriorsαandβonthemultinomialsbiasthedistributionstobepeaky.TheparametersareestimatedbyGibbssampling,usingtheMalletimplementation(McCal-lum,2002).Thegenerativestorytocreatethedatais:Foreverytypek:Drawthep(arg|k)distributionφkfromDir(β)Foreveryunarypredicatei:Drawthep(type|je)distributionθifromDir(un)Foreveryargumentj:DrawatypezijfromMult(θi)DrawanargumentwijfromMult(φθi)5.2TypinginLogicalFormInthelogicalform,allconstantsandvariablesrepre-sentingentitiesxcanbeassignedadistributionovertypespx(t)usingthetypemodel.Aninitialtypedistributionisappliedinthelexicon,usingtheφdistributionsforthetypesofnouns,andtheθidis-tributionsforthetypeofargumentsofbinarypredi-cates(invertedusingBayes’rule).Thenateachβ-reductioninthederivation,weupdateprobabilitiesofthetypestobetheproductofthetypedistribu-tionsofthetermsbeingreduced.Iftwotermsxand3Typesareinducedfromthetext,butwegivehuman-readablelabelshereforconvenience.

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
2
1
9
1
5
6
6
6
4
9

/

/
t

je

un
c
_
un
_
0
0
2
1
9
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

183

fileasuit(S\NP)/NPNP↑λy:(DOC=0.5LEGAL=0.4CLOTHES=0.01…)λx:(cid:26)PER=0.7ORG=0.2…(cid:27).filearg0,arg1(X,oui)λp.∃y:(CLOTHES=0.6LEGAL=0.3DOC=0.001…)[suit0(oui)∧p(oui)]Télécharger le PDF