Transactions of the Association for Computational Linguistics, 1 (2013) 179–192. Action Editor: Johan Bos.
Submitted 1/2013; Überarbeitet 3/2013; Published 5/2013. C(cid:13)2013 Verein für Computerlinguistik.
CombinedDistributionalandLogicalSemanticsMikeLewisSchoolofInformaticsUniversityofEdinburghEdinburgh,EH89AB,UKmike.lewis@ed.ac.ukMarkSteedmanSchoolofInformaticsUniversityofEdinburghEdinburgh,EH89AB,UKsteedman@inf.ed.ac.ukAbstractWeintroduceanewapproachtosemanticswhichcombinesthebenefitsofdistributionalandformallogicalsemantics.Distributionalmodelshavebeensuccessfulinmodellingthemeaningsofcontentwords,butlogicalse-manticsisnecessarytoadequatelyrepresentmanyfunctionwords.Wefollowformalse-manticsinmappinglanguagetologicalrep-resentations,butdifferinthattherelationalconstantsusedareinducedbyofflinedistri-butionalclusteringatthelevelofpredicate-argumentstructure.Ourclusteringalgorithmishighlyscalable,allowingustorunoncor-porathesizeofGigaword.Differentsensesofawordaredisambiguatedbasedontheirin-ducedtypes.Weoutperformavarietyofex-istingapproachesonawide-coveragequestionansweringtask,anddemonstratetheabilitytomakecomplexmulti-sentenceinferencesin-volvingquantifiersontheFraCaSsuite.1IntroductionMappingnaturallanguagetomeaningrepresenta-tionsisacentralchallengeofNLP.Therehasbeenmuchrecentprogressinunsuperviseddistributionalsemantics,inwhichthemeaningofawordisin-ducedbasedonitsusageinlargecorpora.Thisap-proachisusefulforarangeofkeyapplicationsin-cludingquestionansweringandrelationextraction(LinandPantel,2001;PoonandDomingos,2009;Yaoetal.,2011).Becausesuchasemanticscanbeautomicallyinduced,itescapesthelimitationofde-pendingonrelationsfromhand-builttrainingdata,knowledgebasesorontologies,whichhaveprovedoflimiteduseincapturingthehugevarietyofmean-ingsthatcanbeexpressedinlanguage.However,distributionalsemanticshaslargelyde-velopedinisolationfromtheformalsemanticsliter-ature.Whilstdistributionalsemanticshasbeenef-fectiveinmodellingthemeaningsofcontentwordssuchasnounsandverbs,itislessclearthatitcanbeappliedtothemeaningsoffunctionwords.Semanticoperators,suchasdeterminers,negation,conjunc-tions,modals,Zeitform,mood,aspect,andpluralsareubiquitousinnaturallanguage,andarecrucialforhighperformanceonmanypracticalapplications—butcurrentdistributionalmodelsstruggletocaptureevensimpleexamples.Conversely,computationalmodelsofformalsemanticshaveshownlowrecallonpracticalapplications,stemmingfromtheirre-lianceonontologiessuchasWordNet(Müller,1995)tomodelthemeaningsofcontentwords(Bobrowetal.,2007;BosandMarkert,2005).Forexample,considerwhatisneededtoansweraquestionlikeDidGooglebuyYouTube?fromthefollowingsentences:1.GooglepurchasedYouTube2.Google’sacquisitionofYouTube3.Googleacquiredeverycompany4.YouTubemaybesoldtoGoogle5.GooglewillbuyYouTubeorMicrosoft6.Googledidn’ttakeoverYouTubeAlloftheserequireknowledgeoflexicalseman-tics(e.g.thatbuyandpurchasearesynonyms),butsomealsoneedinterpretationofquantifiers,nega-tives,modalsanddisjunction.Itseemsunlikelythat
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
9
1
5
6
6
6
4
9
/
/
T
l
A
C
_
A
_
0
0
2
1
9
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
180
distributionalorformalapproachescanaccomplishthetaskalone.Weproposeamethodformappingnaturallan-guagetofirst-orderlogicrepresentationscapableofcapturingthemeaningsoffunctionwordssuchasevery,notandor,butwhichalsousesdistributionalstatisticstomodelthemeaningofcontentwords.Ourapproachdiffersfromstandardformalseman-ticsinthatthenon-logicalsymbolsusedinthelog-icalformareclusteridentifiers.Wherestandardse-manticformalismswouldmaptheverbwritetoawrite’symbol,wemapittoaclusteridentifiersuchasrelation37,whichthenounauthormayalsomapto.Thismappingislearntbyofflineclustering.Unlikepreviousdistributionalapproaches,weperformclusteringatthelevelofpredicate-argumentstructure,ratherthansyntacticdependencystruc-ture.Thismeansthatweabstractawayfrommanysyntacticdifferencesthatarenotpresentinthese-mantics,suchasconjunctions,passives,relativeclauses,andlong-rangedependencies.Thissignifi-cantlyreducessparsity,sowehavefewerpredicatestoclusterandmoreobservationsforeach.Ofcourse,manypracticalinferencesrelyheavilyonbackgroundknowledgeabouttheworld—suchknowledgefallsoutsidethescopeofthiswork.2BackgroundOurapproachisbasedonCombinatoryCategorialGrammar(CCG;Steedman,2000),astronglylexi-calisedtheoryoflanguageinwhichlexicalentriesforwordscontainalllanguage-specificinformation.Thelexicalentryforeachwordcontainsasyntacticcategory,whichdetermineswhichothercategoriesthewordmaycombinewith,andasemanticinter-pretation,whichdefinesthecompositionalseman-tics.Forexample,thelexiconmaycontaintheentry:write‘(S\NP)/NP:λyλx.write0(X,j)Crucially,thereisatransparentinterfacebetweenthesyntacticcategoryandthesemantics.Forex-amplethetransitiveverbentryabovedefinestheverbsyntacticallyasafunctionmappingtwonoun-phrasestoasentence,andsemanticallyasabi-naryrelationbetweenitstwoargumententities.Thismeansthatitisrelativelystraightforwardtodeterministicallymapparseroutputtoalogicalform,asintheBoxersystem(Bos,2008).ThisEverydogbarksNP↑/NNS\NPλpλq.∀x[P(X)=⇒q(X)]λx.dog0(X)λx.bark0(X)>NP↑λq.∀x[dog0(X)=⇒q(X)]>S∀x[dog0(X)=⇒bark0(X)]Figure1:AstandardlogicalformderivationusingCCG.TheNP↑notationmeansthatthesubjectistype-raised,andtakingtheverb-phraseasanargument—soisanab-breviationofS/(S\NP).Thisisnecessaryinparttosup-portacorrectsemanticsforquantifiers.InputSentenceShakespearewroteMacbeth⇓Intialsemanticanalysiswritearg0,arg1(shakespeare,macbeth)⇓EntityTypingwritearg0:PER,arg1:BOOK(shakespeare:PER,macbeth:BOOK)⇓Distributionalsemanticanalysisrelation37(shakespeare:PER,macbeth:BOOK)Figure2:Layersusedinourmodel.formofsemanticscapturestheunderlyingpredicate-argumentstructure,butfailstolicensemanyimpor-tantinferences—as,forexample,writeandauthordonotmaptothesamepredicate.Inadditiontothelexicon,thereisasmallsetofbinarycombinatorsandunaryrules,whichhaveasyntacticandsemanticinterpretation.Figure1givesanexampleCCGderivation.3OverviewofApproachWeattempttolearnaCCGlexiconwhichmapsequivalentwordsontothesamelogicalform—forexamplelearningentriessuchas:author‘N/PP[von]:λxλy.relation37(X,j)write‘(S\NP)/NP:λxλy.relation37(X,j)TheonlychangetothestandardCCGderivationisthatthesymbolsusedinthelogicalformarearbi-traryrelationidentifiers.Welearnthesebyfirstmap-pingtoadeterministiclogicalform(usingpredicates
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
9
1
5
6
6
6
4
9
/
/
T
l
A
C
_
A
_
0
0
2
1
9
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
181
suchasauthor’andwrite’),usingaprocesssimi-lartoBoxer,andthenclusteringpredicatesbasedontheirarguments.Thislexiconcanthenbeusedtoparsenewsentences,andintegratesseamlesslywithCCGtheoriesofformalsemantics.Typingpredicates—forexample,determiningthatwritingisarelationbetweenpeopleandbooks—hasbecomestandardinrelationclustering(Schoen-mackersetal.,2010;Berantetal.,2011;Yaoetal.,2012).WedemonstatehowtobuildatypingmodelintotheCCGderivation,bysubcategorizingalltermsrepresentingentitiesinthelogicalformwithamoredetailedtype.Thesetypesarealsoin-ducedfromtext,asexplainedinSection5,butforconveniencewedescribethemwithhuman-readablelabels,suchasPER,LOCandBOOK.Akeyadvantageoftypingisthatitallowsustomodelambiguouspredicates.FollowingBerantetal.(2011),weassumethatdifferenttypesignaturesofthesamepredicatehavedifferentmeanings,butgivenatypesignatureapredicateisunambiguous.ForexampleadifferentlexicalentryfortheverbbornisusedinthecontextsObamawasborninHawaiiandObamawasbornin1961,reflectingadistinctioninthesemanticsthatisnotobviousinthesyntax1.Typingalsogreatlyimprovestheefficiencyofclustering,asweonlyneedtocomparepredicateswiththesametypeduringclustering(forexample,wedonothavetoconsiderclusteringapredicatebetweenpeopleandplaceswithpredicatesbetweenpeopleanddates).Inthiswork,wefocusoninducingbinaryrela-tions.Manyexistingapproacheshaveshownhowtoproducegoodclusteringsof(non-event)nouns(Brownetal.,1992),anyofwhichcouldbesim-plyintegratedintooursemantics—butrelationclus-teringremainsanopenproblem(seeSection9).N-aryrelationsarebinarized,bycreatingabi-naryrelationbetweeneachpairofarguments.Forexample,forthesentenceRussiasoldAlaskatotheUnitedStates,thesystemcreatesthreebinaryrelations—correspondingtosellToSomeone(Russland,Alaska),buyFromSomeone(US,Alaska),sellSome-thingTo(Russland,US).Thistransformationdoesnot1Whilstthisassumptionisveryuseful,itdoesnotalwayshold—forexample,thegenitiveinShakespeare’sbookisambigu-ousbetweenownershipandauthorshiprelationsevengiventhetypesofthearguments.exactlypreservemeaning,butstillcapturesthemostimportantrelations.Notethatthisallowsustocomparesemanticrelationsacrossdifferentsyntac-tictypes—forexample,bothtransitiveverbsandargument-takingnounscanbeseenasexpressingbi-narysemanticrelationsbetweenentities.Figure2showsthelayersusedinourmodel.4InitialSemanticAnalysisTheinitialsemanticanalysismapsparseroutputontoalogicalform,inasimilarwaytoBoxer.ThesemanticformalismisbasedonSteedman(2012).Thefirststepissyntacticparsing.WeusetheC&Cparser(ClarkandCurran,2004),trainedonCCGBank(HockenmaierandSteedman,2007),us-ingtherefinedversionofHonnibaletal.(2010)whichbringsthesyntaxclosertothepredicate-argumentstructure.Anautomaticpost-processingstepmakesanumberofminorchangestotheparseroutput,whichconvertsthegrammarintoonemoresuitableforoursemantics.PP(prepositionalphrase)andPR(phrasalverbcomplement)categoriesaresub-categorisedwiththerelevantpreposition.NouncompoundswiththesameMUCnamed-entitytype(ChinchorandRobinson,1997)aremergedintoasinglenon-compositionalnode2(weotherwiseig-norenamed-entitytypes).AllargumentNPsandPPsaretype-raised,allowingustorepresentquanti-fiers.Allprepositionalphrasesaretreatedascorear-guments(i.e.giventhecategoryPP,notadjunctcat-egorieslike(N\N)/NPor((S\NP)\(S\NP))/NP),asitisdifficultfortheparsertodistinguishargu-mentsandadjuncts.InitialsemanticlexicalentriesforalmostallwordscanbegeneratedautomaticallyfromthesyntacticcategoryandPOStag(obtainedfromtheparser),asthesyntacticcategorycapturestheunderlyingpredicate-argumentstructure.WeuseaDavidsonian-stylerepresentationofarguments(Davidson,1967),whichwebinarizebycreatingaseparatepredicateforeachpairofargumentsofaword.ThesepredicatesarelabelledwiththelemmaoftheheadwordandaPropbank-styleargumentkey(KingsburyandPalmer,2002),e.g.arg0,argIn.WedistinguishnounandverbpredicatesbasedonPOS2Forexample,thisallowsustogiveBarackObamatheseman-ticsλx.barackobama(X)insteadofλx.barack(X)∧obama(X),whichismoreconvenientforcollectingdistributionalstatistics.
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
9
1
5
6
6
6
4
9
/
/
T
l
A
C
_
A
_
0
0
2
1
9
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
182
WordCategorySemanticsAutomaticauthorN/PP[von]λxλy.authorarg0,argOf(j,X)schreiben(S\NP)/NPλxλy.writearg0,arg1(j,X)ManualeveryNP↑/Nλpλq.∀x[P(X)→q(X)]nicht(S\NP)/(S\NP)λpλx.¬p(X)Figure3:Exampleinitiallexicalentriestag—so,forexample,wehavedifferentpredicatesforeffectasanounorverb.Thisalgorithmcanbeoverriddenwithman-uallexicalentriesforspecificclosed-classfunctionwords.Whilstitmaybepossibletolearnthesefromdata,ourapproachispragmaticastherearerelativelyfewsuchwords,andthecomplexlogicalformsrequiredwouldbedifficulttoinducefromdis-tributionalstatistics.Weaddasmallnumberoflexi-calentriesforwordssuchasnegatives(NEIN,notetc.),andquantifiers(Zahlen,jede,jeden,alle,usw.).SomeexampleinitiallexicalentriesareshowninFigure3.5EntityTypingModelOurentity-typingmodelassignstypestonouns,whichisusefulfordisambiguatingpolysemouspredicates.OurapproachissimilartoO’Seaghdha(2010)inthatweaimtoclusterentitiesbasedonthenounandunarypredicatesappliedtothem(itissimpletoconvertfromthebinarypredicatestounarypredicates).Forexample,wewantthepair(bornargIn,1961)tomaptoaDATtype,Und(bornargIn,Hawaii)tomaptoaLOCtype.Thisisnon-trivial,asboththepredicatesandargumentscanbeambiguousbetweenmultipletypes—buttopicmodelsofferagoodsolution(describedbelow).5.1TopicModelWeassumethatthetypeofeachargumentofapred-icatedependsonlyonthepredicateandargument,althoughRitteretal.(2010)demonstrateanadvan-tageofmodellingthejointprobabilityofthetypesofmultipleargumentsofthesamepredicate.WeusethestandardLatentDirichletAllocationmodel(Bleietal.,2003),whichperformscomparablytomorecomplexmodelsproposedinO’Seaghdha(2010).Intopic-modellingterminology,weconstructadocumentforeachunarypredicate(e.g.bornargIn),basedonallofitsargumententities(Wörter).Weas-sumethattheseargumentsaredrawnfromasmallnumberoftypes(topics),suchasPER,DATorLOC3.Eachtypejhasamultinomialdistributionφjoverarguments(forexample,aLOCtypeismorelikelytogenerateHawaiithan1961).Eachunarypredicateihasamultinomialdistributionθiovertopics,sothebornargInpredicatewillnormallygen-erateaDATorLOCtype.SparseDirichletpriorsαandβonthemultinomialsbiasthedistributionstobepeaky.TheparametersareestimatedbyGibbssampling,usingtheMalletimplementation(McCal-lum,2002).Thegenerativestorytocreatethedatais:Foreverytypek:Drawthep(arg|k)distributionφkfromDir(β)Foreveryunarypredicatei:Drawthep(type|ich)distributionθifromDir(α)Foreveryargumentj:DrawatypezijfromMult(θi)DrawanargumentwijfromMult(φθi)5.2TypinginLogicalFormInthelogicalform,allconstantsandvariablesrepre-sentingentitiesxcanbeassignedadistributionovertypespx(T)usingthetypemodel.Aninitialtypedistributionisappliedinthelexicon,usingtheφdistributionsforthetypesofnouns,andtheθidis-tributionsforthetypeofargumentsofbinarypredi-cates(invertedusingBayes’rule).Thenateachβ-reductioninthederivation,weupdateprobabilitiesofthetypestobetheproductofthetypedistribu-tionsofthetermsbeingreduced.Iftwotermsxand3Typesareinducedfromthetext,butwegivehuman-readablelabelshereforconvenience.
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
9
1
5
6
6
6
4
9
/
/
T
l
A
C
_
A
_
0
0
2
1
9
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
183
fileasuit(S\NP)/NPNP↑λy:(DOC=0.5LEGAL=0.4CLOTHES=0.01…)λx:(cid:26)PER=0.7ORG=0.2…(cid:27).filearg0,arg1(X,j)λp.∃y:(CLOTHES=0.6LEGAL=0.3DOC=0.001…)[suit0(j)∧p(j)]PDF Herunterladen