Transactions of the Association for Computational Linguistics, 1 (2013) 193–206. Action Editor: Jason Eisner.
Submitted 10/2012; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.
Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World

Jayant Krishnamurthy, Computer Science Department, Carnegie Mellon University, jayantk@cs.cmu.edu
Thomas Kollar, Computer Science Department, Carnegie Mellon University, tkollar@andrew.cmu.edu

Abstract

This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that learns to map natural language statements to their referents in a physical environment. For example, given an image, LSP can map the statement "blue mug on the table" to the set of image segments showing blue mugs on tables. LSP learns physical representations for both categorical ("blue," "mug") and relational ("on") language, and also learns to compose these representations to produce the referents of entire statements. We further introduce a weakly supervised training procedure that estimates LSP's parameters using annotated referents for entire statements, without annotated referents for individual words or the parse structure of the statement. We perform experiments on two applications: scene understanding and geographical question answering. We find that LSP outperforms existing, less expressive models that cannot represent relational language. We further find that weakly supervised training is competitive with fully supervised training while requiring significantly less annotation effort.

1 Introduction

Learning the mapping from natural language to physical environments is a central problem for natural language semantics. Understanding this mapping is necessary to enable natural language interactions with robots and other embodied systems. For example, for an autonomous robot to understand the sentence "The blue mug is on the table," it must be able to identify (1) the objects in its environment corresponding to "blue mug" and "table," and (2) the objects which participate in the spatial relation denoted by "on." If the robot can successfully identify these objects, it understands the meaning of the sentence.

The problem of learning to map from natural language expressions to their referents in an environment is known as grounded language acquisition. In embodied settings, environments consist of raw sensor data – for example, an environment could be an image collected from a robot's camera. In such applications, grounded language acquisition has two subproblems: parsing, learning the compositional structure of natural language; and perception, learning the environmental referents of individual words. Acquiring both kinds of knowledge is necessary to understand novel language in novel environments.

Unfortunately, perception is often ignored in work on language acquisition. Other variants of grounded language acquisition eliminate the need for perception by assuming access to a logical representation of the environment (Zettlemoyer and Collins, 2005; Wong and Mooney, 2006; Matuszek et al., 2010; Chen and Mooney, 2011; Liang et al., 2011). The existing work which has jointly addressed both parsing and perception has significant drawbacks, including: (1) fully supervised models requiring large amounts of manual annotation and (2) limited semantic representations (Kollar et al., 2010; Tellex et al., 2011; Matuszek et al., 2012).

This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that jointly learns to semantically parse language and perceive the world. LSP models a mapping from natural language queries to sets of objects in a real-world environment. The input to LSP is an environment containing objects, such as a segmented image (Figure 1a), and a natural language query, such as "the things to the right of the blue mug."
Figure 1: LSP applied to scene understanding. Given an environment containing a set of objects (left) and a natural language query, LSP produces a semantic parse, logical knowledge base, grounding and denotation (middle), using only language/denotation pairs (right) for training.
(a) An environment containing 4 objects (image segments).
(b) LSP predicting the environmental referents of a natural language query. Environment: (image on left). Query: "things to the right of the blue mug"; semantic parse; grounding: {(2,1), (3,1)}; denotation: {2, 3}.
(c) Training examples for weakly supervised training (language and denotation): "The mugs" {1,3}; "The objects on the table" {1,2,3}; "There is an LCD monitor" {2}; "Is the blue mug right of the monitor?" {}; "The monitor is behind the blue cup." {2}.

Given these inputs, LSP produces (1) a logical knowledge base describing objects and relationships in the environment and (2) a semantic parse of the query capturing its compositional structure. LSP combines these two outputs to produce the query's grounding, which is the set of object referents of the query's noun phrases, and its denotation, which is the query's answer (Figure 1b).[1] Weakly supervised training estimates parameters for LSP using queries annotated with their denotations in an environment (Figure 1c).

[Footnote 1: We treat declarative sentences as if they were queries about their subject, e.g., the denotation of "the mug is on the table" is the set of mugs on tables. Typically, the denotation of a sentence is either true or false; our treatment is strictly more general, as a sentence's denotation is nonempty if and only if the sentence is true.]

This work has two contributions. The first contribution is LSP, which is more expressive than previous models, representing both one-argument categories and two-argument relations over sets of objects in the environment. The second contribution is a weakly supervised training procedure that estimates LSP's parameters without annotated semantic parses, noun phrase/object mappings, or manually-constructed knowledge bases.

We perform experiments on two different applications. The first application is scene understanding, where LSP grounds descriptions of images in image segments. The second application is geographical question answering, where LSP learns to answer questions about locations, represented as polygons on a map. In geographical question answering, LSP correctly answers 34% more questions than the most comparable state-of-the-art model (Matuszek et al., 2012). In scene understanding, accuracy similarly improves by 16%. Furthermore, weakly supervised training achieves an accuracy within 6% of that achieved by fully supervised training, while requiring significantly less annotation effort.

2 Prior Work

Logical Semantics with Perception (LSP) is related to work from planning, natural language processing, computer vision and robotics. Much of the related work focuses on interpreting natural language using a fixed formal representation. Some work constructs integrated systems which execute plans in response to natural language commands (Winograd, 1970; Hsiao et al., 2003; Roy et al., 2003; Skubic et al., 2004; MacMahon et al., 2006; Levit and Roy, 2007; Kruijff et al., 2007). These systems parse natural language to a formal representation which can be executed using a set of fixed control programs. Similarly, work on semantic parsing learns to map natural language to a given formal representation. Semantic parsers can be trained using sentences annotated with their formal representation (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Kate and Mooney, 2006; Kwiatkowski et al., 2010) or various less restrictive annotations (Clarke et al., 2010; Liang et al., 2011; Krishnamurthy and Mitchell, 2012). Finally, work on grounded language acquisition leverages semantic parsing to map from natural language to a formal representation of an environment (Kate and Mooney, 2007; Chen and Mooney, 2008; Shimizu and Haas, 2009; Matuszek et al., 2010; Dzifcak et al., 2009; Cantrell et al., 2010; Chen and Mooney, 2011).
Figure 2: Overview of Logical Semantics with Perception (LSP).
(a) Perception f_per produces a logical knowledge base Γ from the environment d using an independent classifier for each category and relation. Environment d; knowledge base Γ: mug(1), mug(3), blue(1), table(4), on-rel(1,4), on-rel(3,4), ...
(b) Semantic parsing f_prs maps language z to a logical form ℓ. Language z: "blue mug on table"; logical form ℓ: λx.∃y.blue(x) ∧ mug(x) ∧ on-rel(x,y) ∧ table(y).
(c) Evaluation f_eval evaluates a logical form ℓ on a logical knowledge base Γ to produce a grounding g and denotation γ. Grounding: g = {(1,4)}; denotation: γ = {1}.

All of this work assumes that the formal environment representation is given, while LSP learns to produce this formal representation from raw sensor input.

Most similar to LSP is work on simultaneously understanding natural language and perceiving the environment. This problem has been addressed in the context of robot direction following (Kollar et al., 2010; Tellex et al., 2011) and visual attribute learning (Matuszek et al., 2012). However, this work is less semantically expressive than LSP and trained using more supervision. The G3 model (Kollar et al., 2010; Tellex et al., 2011) assumes a one-to-one mapping from noun phrases to entities and is trained using full supervision, while LSP allows one-to-many mappings from noun phrases to entities and can be trained using minimal annotation. Matuszek et al. (2012) learns only one-argument categories ("attributes") and requires a fully supervised initial training phase. In contrast, LSP models two-argument relations and allows for weakly supervised training throughout.

3 Logical Semantics with Perception

Logical Semantics with Perception (LSP) is a model for grounded language acquisition. LSP accepts as input a natural language statement and an environment, and outputs the objects in the environment denoted by the statement. The LSP model has three components: perception, parsing and evaluation (see Figure 2). The perception component constructs logical knowledge bases from low-level feature-based representations of environments. The parsing component semantically parses natural language into lambda calculus queries against the constructed knowledge base. Finally, the evaluation component deterministically executes this query against the knowledge base to produce LSP's output.

The output of LSP can be either a denotation or a grounding. A denotation is the set of entity referents for the phrase as a whole, while a grounding is the set of entity referents for each component of the phrase. The distinction between these two outputs is shown in Figure 1b. In this example, the denotation is the set of "things to the right of the blue mug," which does not include the blue mug itself. On the other hand, the grounding includes both the referents of "things" and "blue mug." Only denotations are used during training, so we ignore groundings in the following model description. However, groundings are used in our evaluation, as they are a more complete description of the model's understanding.

Formally, LSP is a linear model f that predicts a denotation γ given a natural language statement z in an environment d. As shown in Figure 3, the structure of LSP factors into perception (f_per), semantic parsing (f_prs) and evaluation (f_eval) components using several latent variables:

f(γ, Γ, ℓ, t, z, d; θ) = f_per(Γ, d; θ_per) + f_prs(ℓ, t, z; θ_prs) + f_eval(γ, Γ, ℓ)

LSP assumes access to a set of predicates that take either one argument, called categories (c ∈ C), or two arguments, called relations (r ∈ R).[2] These predicates are the interface between LSP's perception and parsing components. The perception function f_per takes an environment d and produces a logical knowledge base Γ that assigns truth values to instances of these predicates using parameters θ_per. This function uses an independent classifier to predict the instances of each predicate. The semantic parser f_prs takes a natural language statement z and produces a logical form ℓ and syntactic parse t using parameters θ_prs. The logical form ℓ is a database query expressed in lambda calculus notation, constructed by logically combining the given predicates. Finally, the evaluation function f_eval deterministically evaluates the logical form ℓ on the knowledge base Γ to produce a denotation γ. These components are illustrated in Figure 2.

[Footnote 2: The set of predicates is derived from our training data. See Section 5.3.]

The following sections describe the perception function (Section 3.1), semantic parser (Section 3.2), evaluation function (Section 3.3), and inference (Section 3.4) in more detail.
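To make the factorization concrete, the sketch below scores a single (denotation, knowledge base, logical form, parse) tuple as the sum of the three component scores. It uses plain Python dictionaries as stand-ins for the sparse feature and weight vectors; the data layouts and helper names are assumptions of this sketch, not the authors' implementation.

```python
# Minimal sketch of LSP's factored score
#   f(gamma, Gamma, l, t, z, d; theta) = f_per + f_prs + f_eval
# with dict-based sparse vectors; all names and layouts are illustrative.

def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())

def f_per(knowledge_base, environment, theta_per, phi_cat, phi_rel):
    """Sum of independent per-predicate classifier scores (Section 3.1)."""
    score = 0.0
    for category, entities in knowledge_base["categories"].items():
        for e in entities:
            score += dot(theta_per[category], phi_cat(environment, e))
    for relation, pairs in knowledge_base["relations"].items():
        for (e1, e2) in pairs:
            score += dot(theta_per[relation], phi_rel(environment, e1, e2))
    return score

def f_prs(logical_form, parse_tree, statement, theta_prs, phi_prs):
    """Linear score of a CCG parse (Section 3.2)."""
    return dot(theta_prs, phi_prs(logical_form, parse_tree, statement))

def f_eval(denotation, knowledge_base, logical_form, evaluate):
    """0 if the denotation is what the logical form evaluates to, else -inf."""
    ok = evaluate(logical_form, knowledge_base) == denotation
    return 0.0 if ok else float("-inf")

def lsp_score(denotation, knowledge_base, logical_form, parse_tree,
              statement, environment, theta, feats, evaluate):
    """Total score of one complete assignment to LSP's variables."""
    return (f_per(knowledge_base, environment, theta["per"],
                  feats["cat"], feats["rel"])
            + f_prs(logical_form, parse_tree, statement,
                    theta["prs"], feats["prs"])
            + f_eval(denotation, knowledge_base, logical_form, evaluate))
```

Because f_eval only distinguishes consistent from inconsistent denotations, any assignment whose denotation disagrees with evaluating ℓ on Γ receives score −∞, mirroring the deterministic evaluation component described above.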
Figure 3: Factor graph of LSP. The environment d and language z are given as input, from which the model predicts a logical knowledge base Γ, logical form ℓ, syntactic tree t and denotation γ.

3.1 Perception Function

The perception function f_per constructs a logical knowledge base Γ given an environment d. The perception function assumes that an environment contains a collection of entities e ∈ E_d. The knowledge base produced by perception is a collection of ground predicate instances using these entities. For example, in Figure 2a, the entire image is the environment, and each image segment is an entity. The logical knowledge base Γ contains the shown predicate instances, where the categories include blue, mug and table, and the relations include on-rel.

The perception function scores logical knowledge bases using a set of per-predicate binary classifiers. These classifiers independently assign a score to whether each entity (entity pair) is an element of each category (relation). Let γ_c ∈ Γ denote the set of entities which are elements of category c; similarly, let γ_r ∈ Γ denote the set of entity pairs which are elements of the relation r. Given these sets, the score of a logical knowledge base Γ factors into per-relation and per-category scores h:

f_per(Γ, d; θ_per) = Σ_{c∈C} h(γ_c, d; θ_per^c) + Σ_{r∈R} h(γ_r, d; θ_per^r)

The per-predicate scores are in turn given by a sum of per-element classification scores:

h(γ_c, d; θ_per^c) = Σ_{e∈E_d} γ_c(e) (θ_per^c)^T φ_cat(e)
h(γ_r, d; θ_per^r) = Σ_{(e1,e2)∈E_d×E_d} γ_r(e1,e2) (θ_per^r)^T φ_rel(e1,e2)

Each term in the above sums represents a single binary classification, determining the score for a single entity (entity pair) belonging to a particular category (relation). We treat γ_c and γ_r as indicator functions for the sets they denote, i.e., γ_c(e) = 1 for entities e in the set, and 0 otherwise. Similarly, γ_r(e1,e2) = 1 for entity pairs (e1,e2) in the set, and 0 otherwise. The features of these classifiers are given by φ_cat and φ_rel, which are feature functions that map entities and entity pairs to feature vectors. The parameters of these classifiers are given by θ_per^c and θ_per^r. The perception parameters θ_per contain one such set of parameters for every category and relation, i.e., θ_per = {θ_per^c : c ∈ C} ∪ {θ_per^r : r ∈ R}.

3.2 Semantic Parser

The goal of semantic parsing is to identify which portions of the input natural language denote entities and relationships between entities in the environment. Semantic parsing accomplishes this goal by mapping from natural language to a logical form that explicitly describes the language's entity referents using one- and two-argument predicates. The logical form is combined with instances of these predicates to produce the statement's denotation.

LSP's semantic parser is defined using Combinatory Categorial Grammar (CCG) (Steedman, 1996). The grammar of the parser is given by a lexicon Λ which maps words to syntactic categories and logical forms. For example, "mug" may have the syntactic category N for noun, and the logical form λx.mug(x), denoting the set of all entities x such that mug is true. During parsing, the logical forms for adjacent phrases are combined to produce the logical form for the complete statement.
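The sketch below shows the kind of lexicon this implies: a map from words to one or more (syntactic category, logical form) entries, encoded here as plain strings. The particular entries and encoding are illustrative; ambiguous words such as "right" receive several candidate entries, and the parser learns which one to use in context.

```python
# Illustrative CCG lexicon in the spirit of Section 3.2; entries and their
# string encoding are assumptions of this sketch, not the authors' format.

LEXICON = {
    "the":   [("N/N", "lambda f. f")],                 # semantically empty
    "mug":   [("N", "lambda x. mug(x)")],
    "blue":  [("N/N", "lambda f. lambda x. blue(x) & f(x)")],
    "table": [("N", "lambda x. table(x)")],
    "on":    [("N/PP", "lambda f. lambda x. exists y. on-rel(x,y) & f(y)")],
    "right": [("N", "lambda x. right(x)"),
              ("N/PP", "lambda f. lambda x. exists y. right-rel(x,y) & f(y)")],
    "of":    [("PP/N", "lambda f. f")],
}

def entries_for(word):
    """All candidate lexicon entries for a word (empty list if unknown)."""
    return LEXICON.get(word.lower(), [])
```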
Figure 4: Example parse of "the mugs are right of the monitor." The first row of the derivation retrieves lexical categories from the lexicon, while the remaining rows represent applications of CCG combinators. Lexical entries: the := N/N : λf.f; mugs := N : λx.mug(x); are := (S\N)/N : λf.λg.λx.g(x) ∧ f(x); right := N/PP : λf.λx.∃y.right-rel(x,y) ∧ f(y); of := PP/N : λf.f; monitor := N : λx.monitor(x). The combinators assemble these entries into the sentence-level logical form S : λx.∃y.mug(x) ∧ right-rel(x,y) ∧ monitor(y).

Figure 4 illustrates how CCG parsing produces a syntactic tree t and a logical form ℓ. The top row of the parse represents retrieving a lexicon entry for each word. Each successive row combines a pair of entries by applying a logical form to an adjacent argument. A given sentence may have multiple parses like the one shown, using a different set of lexicon entries or a different order of function applications. The semantic parser scores each such parse, learning to distinguish correct and incorrect parses.

The semantic parser in LSP is a linear model over CCG parses (ℓ, t) given language z:

f_prs(ℓ, t, z; θ_prs) = θ_prs^T φ_prs(ℓ, t, z)

Here, φ_prs(ℓ, t, z) represents a feature function mapping CCG parses to vectors of feature values. φ_prs factorizes according to the tree structure of the CCG parse; it contains features for local parsing operations which are summed to produce the feature values for a tree. If the parse tree is a terminal, then:

φ_prs(ℓ, t, z) = 1(lexicon entry)

The notation 1(x) denotes a vector with a single one entry whose position is determined by x. The terminal features are indicator features for each lexicon entry, as shown in the top row of Figure 4. These features allow the model to learn the correct syntactic and semantic function of each word. If the parse tree is a nonterminal, then:

φ_prs(ℓ, t, z) = φ_prs(left(ℓ, t, z)) + φ_prs(right(ℓ, t, z)) + 1(combinator)

These nonterminal features are defined over combinator rules in the parse tree, as in the remaining rows of Figure 4. These features allow the model to learn which adjacent parse trees are likely to combine. We refer the reader to Zettlemoyer and Collins (2005) for more information about CCG semantic parsing.

3.3 Evaluation Function

The evaluation function f_eval deterministically scores denotations given a logical form ℓ and a logical knowledge base Γ. Intuitively, the evaluation function simply evaluates the query ℓ on the database Γ to produce a denotation. The evaluation function then assigns score 0 to this denotation, and score −∞ to all other denotations.

We describe f_eval by giving a recurrence for computing the denotation γ of a logical form ℓ on a logical knowledge base Γ. This evaluation takes the form of a tree, as in Figure 2c. The base cases are:

• If ℓ = λx.c(x), then γ = γ_c.
• If ℓ = λx.λy.r(x,y), then γ = γ_r.

The denotations for more complex logical forms are computed recursively by decomposing ℓ according to its logical structure. Our logical forms contain only conjunctions and existential quantifiers; the corresponding recursive computations are:

• If ℓ = λx.ℓ_1(x) ∧ ℓ_2(x), then γ(e) = 1 iff γ_1(e) = 1 ∧ γ_2(e) = 1.
• If ℓ = λx.∃y.ℓ_1(x,y), then γ(e_1) = 1 iff ∃e_2. γ_1(e_1,e_2) = 1.

Note that a similar recurrence can be used to compute groundings: simply retain the satisfying assignments to existentially-quantified variables.

3.4 Inference

The basic inference problem in LSP is to predict a denotation γ given language z and an environment d. This inference problem is straightforward due to the deterministic structure of f_eval. The highest-scoring γ can be found by independently maximizing f_prs and f_per to find the highest-scoring logical form ℓ and logical knowledge base Γ. Deterministically evaluating the recurrence for f_eval using these values yields the highest-scoring denotation.
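The recurrence above translates directly into code. The following sketch evaluates a restricted class of one-free-variable logical forms (conjunctions of categories and relation arguments bound by a single existential) against a dictionary-based knowledge base mirroring Figure 2; the tuple encoding of logical forms is an assumption made for the sketch, not the representation used in the paper.

```python
# Minimal sketch of the evaluation recurrence in Section 3.3, assuming a toy
# nested-tuple logical-form encoding and a dict-based knowledge base.

def evaluate(form, kb):
    """Return the denotation (a set of entities) of a one-free-variable form.

    Supported forms:
      ("cat", c)        -> lambda x. c(x)
      ("and", f1, f2)   -> lambda x. f1(x) & f2(x)
      ("exists", r, f)  -> lambda x. exists y. r(x, y) & f(y)
    """
    kind = form[0]
    if kind == "cat":
        return set(kb["categories"].get(form[1], set()))
    if kind == "and":
        return evaluate(form[1], kb) & evaluate(form[2], kb)
    if kind == "exists":
        pairs = kb["relations"].get(form[1], set())
        targets = evaluate(form[2], kb)
        return {x for (x, y) in pairs if y in targets}
    raise ValueError("unknown form: %r" % (form,))

# Knowledge base mirroring Figure 2; "blue mug on table" evaluates to {1}.
kb = {"categories": {"mug": {1, 3}, "blue": {1}, "table": {4}},
      "relations": {"on-rel": {(1, 4), (3, 4)}}}
form = ("and", ("and", ("cat", "blue"), ("cat", "mug")),
               ("exists", "on-rel", ("cat", "table")))
print(evaluate(form, kb))  # -> {1}
```

As noted above, retaining the (x, y) pairs that satisfy the existential branch, rather than projecting them down to x, would yield the grounding instead of the denotation.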
Another inference problem occurs during training: identify the highest-scoring logical form and knowledge base which produce a particular denotation. Our approximate inference algorithm for this problem is described in Section 4.2.

4 Weakly Supervised Parameter Estimation

This section describes a weakly supervised training procedure for LSP, which estimates parameters using a corpus of sentences with annotated denotations. The algorithm jointly trains both the parsing and the perception components of LSP to best predict the denotations of the observed training sentences. Our approach trains LSP as a maximum margin Markov network using the stochastic subgradient method. The main difficulty is computing the subgradient, which requires computing values for the model's hidden variables, i.e., the logical knowledge base Γ and semantic parse ℓ that are responsible for the model's prediction.

4.1 Stochastic Subgradient Method

The training procedure trains LSP as a maximum margin Markov network (Taskar et al., 2004), a structured analog of a support vector machine. The training data for our weakly supervised algorithm is a collection {(z_i, γ_i, d_i)}_{i=1}^n, consisting of language z_i paired with its denotation γ_i in environment d_i. Given this data, the parameters θ = [θ_prs, θ_per] are estimated by minimizing the following objective function:

O(θ) = (λ/2) ||θ||^2 + (1/n) Σ_{i=1}^n ζ_i    (1)

where λ is a regularization parameter that controls the trade-off between model complexity and slack penalties. The slack variable ζ_i represents a margin violation penalty for the i-th training example, defined as:

ζ_i = max_{γ,Γ,ℓ,t} [ f(γ, Γ, ℓ, t, z_i, d_i; θ) + cost(γ, γ_i) ] − max_{Γ,ℓ,t} f(γ_i, Γ, ℓ, t, z_i, d_i; θ)

The above expression is the structured counterpart of the hinge loss, where cost(γ, γ_i) is the margin by which γ_i's score must exceed γ's score. We let cost(γ, γ_i) be the Hamming cost; it adds a cost of 1 for each entity e such that γ_i(e) ≠ γ(e).

We optimize this objective using the stochastic subgradient method (Ratliff et al., 2006). To compute the subgradient g_i, first compute the highest-scoring assignments to the model's hidden variables:

γ̂, Γ̂, ℓ̂, t̂ ← argmax_{γ,Γ,ℓ,t} f(γ, Γ, ℓ, t, z_i, d_i; θ^j) + cost(γ, γ_i)    (2)

Γ*, ℓ*, t* ← argmax_{Γ,ℓ,t} f(γ_i, Γ, ℓ, t, z_i, d_i; θ^j)    (3)

The first set of values (e.g., ℓ̂) are the best explanation for the denotation γ̂ which most violates the margin constraint. The second set of values (e.g., ℓ*) are the best explanation for the true denotation γ_i. The subgradient update increases the weights of features that explain the true denotation, while decreasing the weights of features that explain the denotation violating the margin. The subgradient factors into parsing and perception components: g_i = [g_i^prs, g_i^per]. The parsing subgradient is:

g_i^prs = φ_prs(ℓ̂, t̂, z_i) − φ_prs(ℓ*, t*, z_i)

The subgradient of the perception parameters θ_per factors into subgradients of the category and relation classifier parameters. Recall that θ_per = {θ_per^c : c ∈ C} ∪ {θ_per^r : r ∈ R}. Let γ̂_c ∈ Γ̂ be the best margin-violating set of entities for c, and γ_c* ∈ Γ* be the best truth-explaining set of entities. Similarly define γ̂_r and γ_r*. The subgradients of the category and relation classifier parameters are:

g_i^{c,per} = Σ_{e∈E_{d_i}} (γ̂_c(e) − γ_c*(e)) φ_cat(e)

g_i^{r,per} = Σ_{(e1,e2)∈E_{d_i}} (γ̂_r(e1,e2) − γ_r*(e1,e2)) φ_rel(e1,e2)

4.2 Inference: Computing the Subgradient

Solving the maximizations in Equations 2 and 3 is challenging because the weights placed on the denotation γ couple f_prs and f_per. Due to this coupling, exactly solving these problems requires (1) enumerating all possible logical forms ℓ, and (2) for each logical form, finding the highest-scoring logical knowledge base Γ by propagating the weights on γ back through f_eval.

We use a two-step approximate inference algorithm for both maximizations. The first step performs a beam search over CCG parses, producing k possible logical forms ℓ_1, ..., ℓ_k. The second step uses an integer linear program (ILP) to find the best logical knowledge base Γ given each parse ℓ_i. In our experiments, we parse with a beam size of 1000, then solve an ILP for each of the 10 highest-scoring logical forms. The highest-scoring parse/logical knowledge base pair is the approximate maximizer.
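The sketch below shows how the two approximate maximizations feed one stochastic subgradient step of Section 4.1. The helpers `violating_inference` and `truth_inference` stand in for Equations 2 and 3 (beam search over parses plus a per-parse search for the best knowledge base); they, the flat dict-based feature representation, and the function names are assumptions of this sketch rather than the authors' code.

```python
# Sketch of one weakly supervised subgradient step (Section 4.1), assuming
# dict-based sparse vectors and caller-supplied inference procedures.

def add_scaled(acc, feats, scale):
    """acc += scale * feats for dict-based sparse vectors."""
    for k, v in feats.items():
        acc[k] = acc.get(k, 0.0) + scale * v

def subgradient_step(theta, example, violating_inference, truth_inference,
                     joint_features, step_size, reg):
    """One stochastic subgradient step on a (z, gamma, d) training example."""
    z, gamma_true, d = example

    # Best margin-violating explanation (Equation 2) and best explanation
    # of the annotated denotation (Equation 3).
    _, kb_hat, form_hat, tree_hat = violating_inference(theta, z, d, gamma_true)
    kb_star, form_star, tree_star = truth_inference(theta, z, d, gamma_true)

    # Subgradient: features of the violating explanation minus features of
    # the truth-explaining one, plus the L2 regularization term.
    g = {}
    add_scaled(g, joint_features(form_hat, tree_hat, kb_hat, z, d), +1.0)
    add_scaled(g, joint_features(form_star, tree_star, kb_star, z, d), -1.0)
    add_scaled(g, theta, reg)

    # Move against the subgradient.
    new_theta = dict(theta)
    add_scaled(new_theta, g, -step_size)
    return new_theta
```

In this view, the update only shifts weight between the features of the two explanations; when the margin-violating explanation coincides with the truth-explaining one, the example contributes nothing beyond regularization.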
Given a logical form ℓ output by beam search, the second step of inference computes the best values for the logical knowledge base Γ and denotation γ:

max_{Γ,γ} f_per(Γ, d; θ_per) + f_eval(γ, ℓ, Γ) + ψ(γ)    (4)

Here, ψ(γ) = Σ_{e∈E_d} ψ_e γ(e) represents a set of weights on the entities in the predicted denotation γ. For Equation 2, ψ represents cost(γ, γ_i). For Equation 3, ψ is a hard constraint encoding γ = γ_i (i.e., ψ(γ) = −∞ when γ ≠ γ_i and 0 otherwise).

We encode the maximization in Equation 4 as an ILP. For each category c and relation r, we create binary variables γ_c(e_1) and γ_r(e_1,e_2) for each entity in the environment, e_1, e_2 ∈ E_d. We similarly create binary variables γ(e) for the denotation γ. Using the fact that f_per is a linear function of these variables, we write the ILP objective as:

f_per(Γ, d; θ_per) + ψ(γ) = Σ_{e_1∈E_d} Σ_{c∈C} w^c_{e_1} γ_c(e_1) + Σ_{e_1,e_2∈E_d} Σ_{r∈R} w^r_{e_1,e_2} γ_r(e_1,e_2) + Σ_{e_1∈E_d} ψ_{e_1} γ(e_1)

where the weights w^c_{e_1} and w^r_{e_1,e_2} determine how likely it is that each entity or entity pair belongs to the predicates c and r:

w^c_{e_1} = (θ_per^c)^T φ_cat(e_1)
w^r_{e_1,e_2} = (θ_per^r)^T φ_rel(e_1,e_2)

The ILP also includes constraints and additional auxiliary variables that represent f_eval. These constraints couple the denotation γ and the logical knowledge base Γ such that γ is the result of evaluating ℓ on Γ. ℓ is recursively decomposed as in Section 3.3, and each intermediate set of entities in the recurrence is given its own set of |E_d| (or |E_d|^2) variables. These variables are then logically constrained to enforce ℓ's structure.

5 Evaluation

Our evaluation performs three major comparisons. First, we examine the performance impact of weakly supervised training by comparing weakly and fully supervised variants of LSP. Second, we examine the performance impact of modelling relations by comparing against a category-only baseline, which is an ablated version of LSP similar to the model of Matuszek et al. (2012). Finally, we examine the causes of errors by performing an error analysis of LSP's semantic parser and perceptual classifiers.

Before describing our results, we first describe some necessary set-up for the experiments. These sections describe the datasets, features, construction of the CCG lexicon, and details of the models. Our datasets and additional evaluation resources are available online from http://rtw.ml.cmu.edu/tacl2013_lsp/.

5.1 Data Sets

We evaluate LSP on two applications: scene understanding (SCENE) and geographical question answering (GEOQA). These datasets are collections {(z_i, γ_i, d_i, ℓ_i, Γ_i)}_{i=1}^n, consisting of a number of natural language statements z_i with annotated denotations γ_i in environments d_i. For fully supervised training, each statement is annotated with a gold standard logical form ℓ_i, and each environment with a gold standard logical knowledge base Γ_i. Statistics of these datasets are shown in Table 1, and example environments and statements are shown in Figure 5.

The SCENE dataset consists of segmented images of indoor environments containing a number of ordinary objects such as mugs and monitors. Each image is an environment, and each image segment (bounding box) is an entity. We collected natural language descriptions of each scene from members of our lab and Amazon Mechanical Turk, asking subjects to describe the objects in each scene. The authors then manually annotated the collected statements with their denotations and logical forms. In this dataset, each image contains the same set of objects; note that this does not trivialize the task, as the model only observes visual features of the objects, which are not consistent across environments.

The GEOQA dataset consists of several maps containing entities such as cities, states, national parks, lakes and oceans. Each map is an environment, and its component entities are given by polygons of latitude/longitude coordinates marking their boundaries.[3]

[Footnote 3: Polygons were extracted from OpenStreetMap, http://www.openstreetmap.org/.]
Table 1: Statistics of the two datasets used in our evaluation, and of the generated lexicons.

Data Set Statistics                     SCENE   GEOQA
# of environments                       15      10
Mean entities / environment d           4.1     6.9
Mean # of entities in denotation γ      1.5     1.2
# of statements                         284     263
Mean words / statement                  6.3     6.3
Mean predicates / logical form          2.6     2.8
# of predicates in annotated worlds     46      38

Lexicon Statistics                      SCENE   GEOQA
# of words in lexicon                   169     288
# of lexicon entries                    712     876
Mean parses / statement                 15.0    8.9

Furthermore, each entity has one known name (e.g., "Greenville"). In this dataset, distinct entities occur on average in 1.25 environments; repeated entities are mostly large locations, such as states and oceans. The language for this dataset was contributed by members of our research group, who were instructed to provide a mixture of simple and complex geographical questions. The authors then manually annotated each question with a denotation (its answer) and a logical form.

5.2 Features

The features of both applications are intended to capture properties of entities and relations between them. As such, both applications share a set of physical features which are functions of the bounding polygons of each entity. Example category features (φ_cat) are the area and perimeter of the entity, and an example relation feature (φ_rel) is the distance between entity centroids.

The SCENE dataset additionally includes visual appearance features in φ_cat to capture visual properties of objects. These features include a Histogram of Oriented Gradients (HOG) (Dalal and Triggs, 2005) and an RGB color histogram.

The GEOQA dataset additionally includes distributional semantic features to distinguish between different types of entities (e.g., states vs. lakes) and to capture non-spatial relations (e.g., capitals of states). These features are derived from phrase co-occurrences with entity names in the Clueweb09 web corpus.[4] The category features φ_cat include indicators for the 20 contexts which most frequently occur with an entity's name (e.g., "X is a city"). Similarly, the relation features φ_rel include indicators for the 20 most frequent contexts between two entities' names (e.g., "X in eastern Y").

[Footnote 4: http://www.lemurproject.org/clueweb09/]

5.3 Lexicon Induction

One of the inputs to the semantic parser (Section 3.2) is a lexicon that lists possible syntactic and semantic functions for each word. Together, these per-word entries define the set of possible logical forms for every statement. Each word may have multiple lexicon entries to capture uncertainty about its meaning. For example, the word "right" may have entries N : λx.right(x) and N/PP : λf.λx.∃y.right-rel(x,y) ∧ f(y). The semantic parser learns to distinguish among these interpretations to produce good logical forms.

We automatically generated lexicons for both applications using part-of-speech tag heuristics.[5] These heuristics map words to lexicon entries containing category and relation predicates derived from the word's lemma. Nouns and adjectives produce lexicon entries containing either categories or relations (as shown above for "right"). Mapping these parts of speech to relations is necessary for phrases like "to the right of," where the noun "right" denotes a relation. Verbs and prepositions produce lexicon entries containing relations. Additional heuristics generate semantically-empty lexicon entries, allowing words like determiners to have no physical interpretation. Finally, there are special heuristics for forms of "to be" and, in GEOQA, to handle known entity names. The complete set of lexicon induction heuristics is available online.

[Footnote 5: We used the Stanford POS tagger (Toutanova et al., 2003).]

The automatically generated lexicon makes it difficult to compare semantic parses across models, since the correctness of a semantic parse depends on the learned perceptual classifiers. To facilitate such a comparison (Section 5.6), we filtered out lexicon entries containing predicates which were not used in any of the annotated logical forms. Statistics of the filtered lexicons are shown in Table 1.
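The following sketch illustrates the flavor of such POS-driven lexicon generation. The CCG templates are simplified guesses at what rules of this kind look like, not the authors' heuristic set (which is published with their datasets); it takes a POS tag as input rather than running a tagger.

```python
# Simplified sketch of POS-tag lexicon-generation heuristics in the spirit
# of Section 5.3; templates and coverage are illustrative assumptions.

def generate_entries(word, pos, lemma=None):
    """Map a (word, POS tag) pair to candidate (syntax, logical form) entries."""
    lemma = lemma or word.lower()
    category = "lambda x. %s(x)" % lemma
    relation = "lambda f. lambda x. exists y. %s-rel(x,y) & f(y)" % lemma
    entries = []
    if pos.startswith("NN") or pos.startswith("JJ"):
        # Nouns and adjectives may denote categories or, as in
        # "to the right of", relations.
        entries.append(("N", category))
        entries.append(("N/N", "lambda f. lambda x. %s(x) & f(x)" % lemma))
        entries.append(("N/PP", relation))
    elif pos.startswith("VB"):
        # Verbs produce relation entries.
        entries.append(("(S\\N)/N", relation))
    elif pos == "IN":
        # Prepositions produce relation entries, plus a semantically
        # empty reading for case markers like "of".
        entries.append(("PP/N", relation))
        entries.append(("PP/N", "lambda f. f"))
    elif pos == "DT":
        # Determiners have no physical interpretation.
        entries.append(("N/N", "lambda f. f"))
    return entries

# "right" tagged as a noun receives both category and relation readings.
print(generate_entries("right", "NN"))
```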
Figure 5: Example environments, statements, and model predictions from the SCENE and GEOQA datasets. Each row lists the language z and predicted logical form ℓ, the predicted grounding, and the true grounding.
- "monitor to the left of the mugs"; λx.∃y.monitor(x) ∧ left-rel(x,y) ∧ mug(y); predicted {(2,1), (2,3)}; true {(2,1), (2,3)}.
- "mug to the left of the other mug"; λx.∃y.mug(x) ∧ left-rel(x,y) ∧ mug(y); predicted {(3,1)}; true {(3,1)}.
- "objects on the table"; λx.∃y.object(x) ∧ on-rel(x,y) ∧ table(y); predicted {(1,4), (2,4), (3,4)}; true {(1,4), (2,4), (3,4)}.
- "two blue cups are placed near to the computer screen"; λx.blue(x) ∧ cup(x) ∧ computer(x) ∧ screen(x); predicted {(1)}; true {(1,2), (3,2)}.
- "What cities are in North Carolina?"; λx.∃y.city(x) ∧ in-rel(x,y) ∧ y=NC; predicted {(CH,NC), (GB,NC), (RA,NC)}; true {(CH,NC), (GB,NC), (RA,NC)}.
- "What city is east of Greensboro in North Carolina?"; λx.∃y,z.city(x) ∧ east-rel(x,y) ∧ y=GB ∧ in-rel(y,z) ∧ z=NC; predicted {(RA,GB,NC), (MB,GB,NC)}; true {(RA,GB,NC)}.
- "What cities are on the ocean?"; λx.∃y.city(x) ∧ on-rel(x,y) ∧ ocean(y); predicted {(CH,AO), (GB,AO), (MB,AO), (RA,AO)}; true {(MB,AO)}.

5.4 Models and Training

The evaluation compares three models. The first model is LSP-W, which is LSP trained using the weakly supervised algorithm described in Section 4. The second model, LSP-CAT, replicates the model of Matuszek et al. (2012) by restricting LSP to use category predicates. LSP-CAT is constructed by removing all relation predicates in lexicon entries, mapping entries like λf.λg.λx.∃y.r(x,y) ∧ g(x) ∧ f(y) to λf.λg.λx.∃y.g(x) ∧ f(y). This model is also trained using our weakly supervised algorithm. The third model, LSP-F, is LSP trained with full supervision, using the manually annotated semantic parses and logical knowledge bases in our datasets. Given these annotations, training LSP amounts to independently training a semantic parser (using sentences with annotated logical forms, {(z_i, ℓ_i)}) and a set of perceptual classifiers (using environments with annotated logical knowledge bases, {(d_i, Γ_i)}). This model measures the performance achievable with LSP given significantly more supervision.

All three variants of LSP were trained using the same hyperparameters. For SCENE, we computed subgradients in 5-example minibatches and performed 100 passes over the data using λ = 0.03. For GEOQA, we computed subgradients in 8-example minibatches, again performing 100 passes over the data using λ = 0.02. We tried varying the regularization parameter, but found that performance was relatively stable under λ ≤ 0.05. All experiments use leave-one-environment-out cross-validation to estimate model performance. We hold out each environment in turn, train each model on the remaining environments, then test on the held-out environment.

5.5 Results

We consider two prediction problems in the evaluation. The first problem is to predict the correct denotation γ_i for a statement z_i in an environment d_i. A correct prediction on this task corresponds to a correctly answered question. A weakness of this task is that it is possible to guess the right denotation without fully understanding the language. For example, given a query like "mugs on the table," it might be possible to guess the denotation based solely on "mugs," ignoring "table" altogether. The grounding prediction task corrects for this problem. Here, each model predicts a grounding, which is the set of all satisfying assignments to the variables in a logical form. For example, for the logical form λx.∃y.left-rel(x,y) ∧ mug(y), the grounding is the set of (x,y) tuples for which both left-rel(x,y) and mug(y) return true. Note that, if the predicted semantic parse is incorrect, the predicted grounding for a statement may contain a different number of variables than the true grounding; such groundings are incorrect. Figure 5 shows model predictions for the grounding task.

Performance on both tasks is measured using exact match accuracy. This metric is the fraction of examples for which the predicted set of entities (be it the denotation or grounding) exactly equals the annotated set. This is a challenging metric, as the number of possible sets grows exponentially in the number of entities in the environment. Say an environment has 5 entities and a logical form has two variables; then there are 2^5 possible denotations and 2^25 possible groundings. To quantify this difficulty, note that selecting a denotation uniformly at random achieves 6% accuracy on SCENE, and 1% accuracy on GEOQA; selecting a random grounding achieves 1% and 0% accuracy, respectively.
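To make the metric and the chance level concrete, the short sketch below computes exact match accuracy over a list of examples and the probability that a uniformly random denotation is exactly right in a single environment; the set-based data layout is assumed only for illustration.

```python
# Sketch of the exact-match metric and random-guessing chance level,
# assuming predictions and annotations are sets of entity ids (or tuples,
# for groundings).

def exact_match_accuracy(predictions, annotations):
    """Fraction of examples whose predicted set equals the annotated set."""
    assert len(predictions) == len(annotations)
    correct = sum(1 for p, a in zip(predictions, annotations)
                  if set(p) == set(a))
    return correct / len(predictions)

def random_denotation_chance(num_entities):
    """Probability that a uniformly random subset of the entities exactly
    matches the single annotated denotation: one out of 2^n subsets."""
    return 1.0 / (2 ** num_entities)

# With 5 entities there are 2**5 = 32 possible denotations, so a uniformly
# random guess is exactly right about 3% of the time for that environment.
print(random_denotation_chance(5))
```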
Table 2: Model performance on the SCENE and GEOQA datasets. LSP-CAT is an ablated version of LSP that only learns categories (similar to Matuszek et al. (2012)), LSP-F is LSP trained with annotated semantic parses and logical knowledge bases, and LSP-W is LSP trained on sentences with annotated denotations. Results are separated by the number of relations in each test natural language statement.

(a) Results on the SCENE dataset.
Denotation γ    0 rel.  1 rel.  other  total
LSP-CAT         0.94    0.45    0.20   0.51
LSP-F           0.89    0.81    0.20   0.70
LSP-W           0.89    0.77    0.16   0.67
Grounding g     0 rel.  1 rel.  other  total
LSP-CAT         0.94    0.37    0.00   0.42
LSP-F           0.89    0.80    0.00   0.65
LSP-W           0.89    0.70    0.00   0.59
% of data       23      56      21     100

(b) Results on the GEOQA dataset.
Denotation γ    0 rel.  1 rel.  other  total
LSP-CAT         0.22    0.19    0.07   0.17
LSP-F           0.64    0.53    0.21   0.48
LSP-W           0.64    0.58    0.21   0.51
Grounding g     0 rel.  1 rel.  other  total
LSP-CAT         0.22    0.19    0.00   0.16
LSP-F           0.64    0.53    0.17   0.47
LSP-W           0.64    0.58    0.15   0.50
% of data       8       72      20     100

Table 2 shows results for both applications using exact match accuracy. To better understand the performance of each model, we break down performance according to linguistic complexity. We compute the number of relations in the annotated logical form for each statement, and show separate results for 0 and 1 relations. We also include an "other" category to capture sentences with more than one relation (very infrequent), or that include quantifiers, comparatives, or other linguistic phenomena not captured by LSP.

The results from these experiments suggest three conclusions. First, we find that modelling relations is important for both applications, as (1) the majority of examples contain relational language, and (2) LSP-W and LSP-F dramatically outperform LSP-CAT on these examples. The low performance of LSP-CAT suggests that many denotations cannot be predicted from only the first noun phrase in a statement, demonstrating that both applications require an understanding of relations. Second, we find that weakly supervised training and fully supervised training perform similarly, with accuracy differences in the range of 3%–6%. Finally, we find that LSP-W performs similarly on both the denotation and complete grounding tasks; this result suggests that when LSP-W predicts a correct denotation, it does so because it has identified the correct entity referents of each portion of the statement.

5.6 Component Error Analysis

We performed an error analysis of each model component to better understand the causes of errors. Table 3 shows the accuracy of the semantic parser from each trained model. Each held-out sentence z_i is parsed to produce a logical form ℓ, which is marked correct if it exactly matches our manual annotation ℓ_i. A correct logical form implies a correct grounding for the statement when the parse is evaluated in the gold standard logical knowledge base. These results show that both LSP-W and LSP-F have reasonably accurate semantic parsers, given the restrictions on possible logical forms. Common mistakes include missing lexicon entries (e.g., "borders" is POS-tagged as a noun, so the GEOQA lexicon does not include a verb for it) and prepositional phrase attachments (e.g., the 6th example in Figure 5).

Table 4 shows the precision and recall of the individual perceptual classifiers. We computed these metrics by comparing each annotated predicate in the held-out environment with the model's predictions for the same predicate, treating each entity (or entity pair) as an independent example for classification.
Table 3: Accuracy of the semantic parser from each trained model. Upper bound is the highest accuracy achievable without modelling comparatives and other linguistic phenomena not captured by LSP.

                SCENE   GEOQA
LSP-CAT         0.21    0.17
LSP-W           0.72    0.71
LSP-F           0.73    0.75
Upper Bound     0.79    0.87

Fully supervised training appears to produce better perceptual classifiers than weakly supervised training; however, this result conflicts with the full system evaluation in Table 2, where both systems perform equally well. There are two causes for this phenomenon: uninformative adjectives and unimportant relation instances.

Uninformative adjective predicates are responsible for the low performance of the category classifiers in SCENE. Phrases like "LCD screen" in this domain are annotated with logical forms such as λx.lcd(x) ∧ screen(x). Here, lcd is uninformative, since screen already denotes a unique object in the environment. Therefore, it is not important to learn an accurate classifier for lcd. Weakly supervised training learns that lcd is meaningless, yet predicts the correct denotation for λx.lcd(x) ∧ screen(x) using its screen classifier.

The discrepancy in relation performance occurs because the relation evaluation weights every relation equally, whereas in reality some relations are more frequent. Furthermore, even within a single relation, each entity pair is not equally important – for example, people tend to ask what is in a state, but not what is in an ocean. To account for these factors, we define a reweighted relation metric using the annotated logical forms containing only one relation, of the form λx.∃y.c_1(x) ∧ r(x,y) ∧ c_2(y). Using these logical forms, we measure the performance of r on the set of x, y pairs such that c_1(x) ∧ c_2(y), then average this over all examples. Table 4 shows that, using this metric, both training regimes have similar performance. This result suggests that weakly supervised training adapts LSP's relation classifiers to the relation instances which are empirically important for grounding natural language.

Table 4: Perceptual classifier performance, measured against the gold standard logical knowledge bases. LSP-CAT is excluded from the relation evaluations, since it does not learn relations. Relations (rw) is the reweighted metric (see text for details).

                  SCENE                GEOQA
Categories        P     R     F1       P     R     F1
LSP-CAT           0.40  0.86  0.55     0.78  0.25  0.38
LSP-W             0.40  0.84  0.54     0.85  0.63  0.73
LSP-F             0.98  0.96  0.97     0.89  0.63  0.74
Relations         P     R     F1       P     R     F1
LSP-W             0.40  0.42  0.41     0.34  0.51  0.41
LSP-F             0.99  0.87  0.92     0.70  0.46  0.55
Relations (rw)    P     R     F1       P     R     F1
LSP-W             0.98  0.98  0.98     0.86  0.72  0.79
LSP-F             0.98  0.95  0.96     0.89  0.66  0.76

6 Conclusions

This paper introduces Logical Semantics with Perception (LSP), a model for mapping natural language statements to their referents in a physical environment. LSP jointly models perception and language understanding, simultaneously learning (1) to map from environments to logical knowledge bases containing instances of both one-argument categories and two-argument relations, and (2) to semantically parse natural language. Furthermore, we introduce a weakly supervised training procedure that trains LSP using only sentences with annotated denotations, without annotated semantic parses or noun phrase/entity mappings. An experimental evaluation reveals that this procedure performs nearly as well as fully supervised training, while requiring significantly less annotation effort. Our experiments also find that LSP's ability to learn relations improves performance over comparable prior work (Matuszek et al., 2012).

Acknowledgments

This research has been supported in part by DARPA under award FA8750-13-2-0005, and in part by a gift from Google. We also gratefully acknowledge the CMU Read the Web group for assistance with dataset construction, and Tom Mitchell, Manuela Veloso, Brendan O'Connor, Felix Duvallet, Robert Fisher and the anonymous reviewers for helpful discussions and comments on the paper.
References

Rehj Cantrell, Matthias Scheutz, Paul Schermerhorn, and Xuan Wu. 2010. Robust spoken instruction understanding for HRI. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction.

David L. Chen and Raymond J. Mooney. 2008. Learning to sportscast: a test of grounded language acquisition. In Proceedings of the 25th International Conference on Machine Learning.

David L. Chen and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In Proceedings of the 25th AAAI Conference on Artificial Intelligence.

James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving semantic parsing from the world's response. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning.

Navneet Dalal and Bill Triggs. 2005. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Juraj Dzifcak, Matthias Scheutz, Chitta Baral, and Paul Schermerhorn. 2009. What to do and how to do it: translating natural language directives into temporal and dynamic logic representation for goal management and action execution. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation.

Kai-yuh Hsiao, Nikolaos Mavridis, and Deb Roy. 2003. Coupling perception and simulation: Steps towards conversational robotics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.

Rohit J. Kate and Raymond J. Mooney. 2006. Using string-kernels for learning semantic parsers. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL.

Rohit J. Kate and Raymond J. Mooney. 2007. Learning language semantics from ambiguous supervision. In Proceedings of the 22nd Conference on Artificial Intelligence.

Thomas Kollar, Stefanie Tellex, Deb Roy, and Nicholas Roy. 2010. Toward understanding natural language directions. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction.

Jayant Krishnamurthy and Tom M. Mitchell. 2012. Weakly supervised training of semantic parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.

Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I. Christensen. 2007. Situated dialogue and spatial organization: What, where... and why. International Journal of Advanced Robotic Systems.

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.

Michael Levit and Deb Roy. 2007. Interpretation of spatial language in a map navigation task. IEEE Transactions on Systems, Man, and Cybernetics, Part B.

Percy Liang, Michael I. Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of the Association for Computational Linguistics.

Matthew MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: connecting language, knowledge, and action in route instructions. In Proceedings of the 21st National Conference on Artificial Intelligence.

Cynthia Matuszek, Dieter Fox, and Karl Koscher. 2010. Following directions using statistical machine translation. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction.

Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox. 2012. A joint model of language and perception for grounded attribute learning. In Proceedings of the 29th International Conference on Machine Learning.

Nathan D. Ratliff, J. Andrew Bagnell, and Martin A. Zinkevich. 2006. (Online) subgradient methods for structured prediction. In Artificial Intelligence and Statistics.

Deb Roy, Kai-Yuh Hsiao, and Nikolaos Mavridis. 2003. Conversational robots: building blocks for grounding word meaning. In Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-linguistic Data.

Nobuyuki Shimizu and Andrew Haas. 2009. Learning to follow navigational route instructions. In Proceedings of the 21st International Joint Conference on Artificial Intelligence.

Marjorie Skubic, Dennis Perzanowski, Sam Blisard, Alan Schultz, William Adams, Magda Bugajska, and Derek Brock. 2004. Spatial language for human-robot dialogs. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews.

Mark Steedman. 1996. Surface Structure and Interpretation.

Ben Taskar, Carlos Guestrin, and Daphne Koller. 2004. Max-margin Markov networks. In Advances in Neural Information Processing Systems.
Stefanie Tellex, Thomas Kollar, Steven Dickerson, Matthew Walter, Ashis Banerjee, Seth Teller, and Nicholas Roy. 2011. Understanding natural language commands for robotic navigation and mobile manipulation. In AAAI Conference on Artificial Intelligence.

Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology.

Terry Winograd. 1970. Procedures as a representation for data in a computer program for understanding natural language. Ph.D. thesis, Massachusetts Institute of Technology.

Yuk Wah Wong and Raymond J. Mooney. 2006. Learning for semantic parsing with statistical machine translation. In Proceedings of the Main Conference on Human Language Technology Conference of NAACL.

John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence.