Transactions of the Association for Computational Linguistics, vol. 3, pp. 545–558, 2015. Action Editor: Jason Eisner.

Transactions of the Association for Computational Linguistics, vol. 3, pp. 545–558, 2015. Action Editor: Jason Eisner.
Submission batch: 5/2015; Revision batch: 10/2015; Published 11/2015.

2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

c
(cid:13)

ImitationLearningofAgenda-basedSemanticParsersJonathanBerantStanfordUniversityyonatan@cs.stanford.eduPercyLiangStanfordUniversitypliang@cs.stanford.eduAbstractSemanticparsersconventionallyconstructlogicalformsbottom-upinaﬁxedorder,re-sultinginthegenerationofmanyextraneouspartiallogicalforms.Inthispaper,wecom-bineideasfromimitationlearningandagenda-basedparsingtotrainasemanticparserthatsearchespartiallogicalformsinamorestrate-gicorder.Empirically,ourparserreducesthenumberofconstructedpartiallogicalformsbyanorderofmagnitude,andobtainsa6x-9xspeedupoverﬁxed-orderparsing,whilemain-tainingcomparableaccuracy.1IntroductionSemanticparsing,thetaskofmappingnaturallanguagetosemanticrepresentations(e.g.,log-icalforms),hasemergedinrecentyearsasapromisingparadigmfordevelopingquestionan-sweringsystems(ZelleandMooney,1996;Zettle-moyerandCollins,2005;WongandMooney,2007;Kwiatkowskietal.,2010;Liangetal.,2011)andothernaturallanguageinterfaces(ZettlemoyerandCollins,2007;Tellexetal.,2011;Matuszeketal.,2012).Recently,therehavebeentwoma-jortrends:Theﬁrstistoscalesemanticparsingtolargeknowledgebases(KB)suchasFreebase(CaiandYates,2013;Kwiatkowskietal.,2013;BerantandLiang,2014).Thesecondistolearnsemanticparserswithoutrelyingonannotatedlogicalforms,butinsteadontheirdenotations(answers)(Clarkeetal.,2010;Liangetal.,2011);thislessenstheanno-tationburdenandhasbeeninstrumentalinfuelingtheﬁrsttrend(Berantetal.,2013).whatcitywasabrahamlincolnbornin3915083622020>1MType.City(cid:117)PlaceOfBirthOf.AbeLincolnType.Loc(cid:117)ContainedBy.LincolnTown…AbeLincolnLincolnTownUSSLincoln…Type.CityType.Loc…PlaceOfBirthOfPlacesLived…Figure1:Aparsingchartfortheutterance“whatcitywasabrahamlincolnbornin”.Numbersinchartcellsindicatethenumberofpossiblesemanticparsesconstructedoverthatspan,andarrowspointtosomeofthelogicalformsthatwerecon-structed.Therearemorethanonemillionpossiblesemanticparsesforthisutterance.Inthispaper,weareinterestedintrainingseman-ticparsersfromdenotationsonlargeKBs.Thechal-lengeinthissettingisthatthevocabularyofthetargetlogicallanguageoftencontainsthousandsoflogicalpredicates,andthereisamismatchbetweenthestructureofthenaturallanguageandthelogicallanguage.Asaresult,thespaceofpossibleseman-ticparsesforevenashortutterancegrowsquickly.Forexample,considertheutterance“whatcitywasabrahamlincolnbornin”.Figure1illustratesthenumberofpossiblesemanticparsesthatcanbecon-structedoversomeoftheutterancespans.Justbycombiningsemanticparsesoverthespans“city”,“lincoln”and“born”wealreadyobtain362·391·20possibleparses;attheroot,wegetoveramillionparses.1Theambiguityoflanguagethusresultsina1Evenwhentypeconstraintsareusedtopruneparses,westillproducemorethanamillionpossibleparsesattheroot.

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

546

Root:Type.City(cid:117)PlaceOfBirthOf.AbeLincolnwhatSet:Type.CitycitywasSet:PlaceOfBirthOf.AbeLincolnEntity:AbeLincolnabrahamlincolnBinary:PlaceOfBirthOfborninJoinIntersectLexLexLexFigure2:Anexamplesemanticparse,orderivation,fortheutterance“whatcitywasabrahamlincolnbornin”.Eachnodeinthetreehasacategory(e.g.,ENTITY)andalogicalform(e.g.,AbeLincoln).hardsearchproblem.Tomanagethiscombinatorialexplosion,pastapproaches(KrishnamurthyandMitchell,2012;Kwiatkowskietal.,2013;Berantetal.,2013)usedbeamsearch,wherethenumberofparses(seeFig-ure2)foreachchartcell(e.g.,(SET,3:5))iscappedatK.Typicalbottom-upparsingisemployed,wherewebuildallparsesforspansoflengthnbeforen+1,etc.Thisﬁxed-orderparsingstrategycon-structsmanyunnecessaryparsesthough.Forex-ample,itwouldcreateKparsesforthecategoryENTITYandthespanover“lincoln”,generatingthelogicalformUSSLincoln,althoughitisunlikelythatthisentitywouldbeintheﬁnallogicalform.Toovercometheproblemswithﬁxed-orderpars-ing,weturntoagenda-basedparsing(Kay,1986;CaraballoandCharniak,1998;KleinandManning,2003;PaulsandKlein,2009;AuliandLopez,2011).Inagenda-basedparsing,anagenda(priorityqueue)holdspartialparsesthatcanbeconstructednext.Ateachstep,theparsewiththehighestpriorityispoppedfromtheagendaandputintothechart.Thisgivestheparserfullcontroloverthesequenceofparsesconstructed.Butimportantly,agenda-basedparsingrequiresagoodscoringfunctionthatcanranknotjustfullparsesbutalsopartialparsesontheagenda.Howdoweobtainsuchascoringfunction?Tothisend,weborrowideasfromimitationlearn-ingforstructuredprediction(Daumeetal.,2009;Rossetal.,2011;GoldbergandNivre,2013;Changetal.,2015).Speciﬁcally,wecastagenda-basedse-manticparsingasaMarkovdecisionprocess,wherethegoalistolearnapolicy,thatgivenastate(i.e.,thecurrentchartandagenda),choosesthebestnextaction(i.e.,theparsetopopfromtheagenda).Thesupervisionsignalisusedtogenerateasequenceoforacleactions,fromwhichthemodelistrained.OurworkbearsastrongresemblancetoJiangetal.(2012),whoappliedimitationlearningtoagenda-basedparsing,butinthecontextofsyntacticpars-ing.However,twonewchallengesariseinseman-ticparsing.First,syntacticparsingassumesgoldparses,fromwhichitiseasytoderiveanoracleac-tionsequence.Incontrast,wetrainfromquestion-answerpairsonly(ratherthanparsetreesorevenlogicalforms),sogeneratinganoraclesequenceismorechallenging.Second,semanticparsersexploreamuchlargersearchspacethansyntacticparsers,duetothehighlevelofuncertaintywhentranslatingtologicalform.Thus,weholdabeamofparsesforeachchartcell,andmodifylearningforthissetup.Wegainfurtherefﬁciencybyintroducingalazyagenda,whichreducesthenumberofparsesthatneedtobescored.Forexample,thesingleactionofprocessing“born”,requiresplacing391logicalformsontheagenda,althoughonlyfewofthemwillbeused.Ourlazyagendaholdsderivationstreams,whichimplicitlyrepresenta(possiblyin-ﬁnite!)groupofrelatedparsesasasingleagendaitem,andlazilymaterializeparsesasneeded.Em-pirically,thisreducesthenumberofparsesthatarescoredattrainingtimeby35%.Last,wemakemodelingcontributionsbyaug-mentingthefeaturesetpresentedbyBerantetal.(2013)withnewfeaturesthatimprovethemappingofphrasestoKBpredicates.Weevaluateouragenda-basedparserontheWE-BQUESTIONSdataset(Berantetal.,2013)againstaﬁxed-orderparser,andobservethatourparserre-ducesthenumberofparsingactionsbyanorderofmagnitude,achievesa6x-9xspeedup,andobtainsacomparableaccuracyof49.7%.Toconclude,thispaperdescribesthreecontribu-tions:First,anovelagenda-basedsemanticparserthatlearnstochoosegoodparsingactions,train-ingfromquestion-answerpairsonly;Second,alazyagendathatpacksparsesinstreamsandreducesthenumberofgeneratedparses;Last,modelingchangesthatsubstantiallyimproveaccuracy.2SemanticParsingTaskWhileouragenda-basedsemanticparserappliesmorebroadly,ourexpositionwillbebasedonour

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

547

primarymotivation,questionansweringonaknowl-edgebase.Thesemanticparsingtaskisdeﬁnedasfollows:Given(i)aknowledgebase(KB)K,(ii)agrammarG(deﬁnedshortly),and(iii)atrainingsetofquestion-answerpairs{(xi,yi)}ni=1,outputase-manticparserthatmapsnewquestionsxtoanswersyvialatentlogicalformsz.WenowbrieﬂydescribetheKBandlogicalformsusedinthispaper.LetEdenoteasetofentities(e.g.,AbeLincoln),andletPdenoteasetofproperties(e.g.,PlaceOfBirthOf).AknowledgebaseKisasetofassertions(e1,p,e2)∈E×P×E(e.g.,(Hodgenville,PlaceOfBirthOf,AbeLincoln)).WeusetheFreebaseKB(Google,2013),whichhas41Mentities,19Kproperties,and596Massertions.ToquerytheKB,weusethelogicallanguagesimpleλ-DCS.Insimpleλ-DCS,anentity(e.g.,AbeLincoln)denotesthesingletonsetcontainingthatentity;thisisaspecialcaseofaunarypred-icate.Aproperty(aspecialcaseofabinarypredicate)canbejoinedwithaunarypredicate;e.g.,PlaceOfBirthOf.AbeLincolndenotesallen-titiesthataretheplaceofbirthofAbrahamLin-coln.Wealsohaveintersection:Type.CityuPlaceOfBirthOf.AbeLincolndenotesthesetofen-titiesthatarebothcitiesandtheplaceofbirthofAbrahamLincoln.WewriteJzKKforthedenotationofalogicalformzwithrespecttoaKBK.Foraformaldescriptionofλ-DCS,seeLiang(2013).3GrammarsandSemanticFunctionsSincewearelearningsemanticparsersfromdeno-tations,wecannotinduceagrammarfromprovidedlogicalforms(Kwiatkowskietal.,2010).Instead,weassumeasmallandﬂexiblegrammarthatspec-iﬁesthespaceoflogicalforms.Thegrammarcon-sistsofabackboneCFG,butisatypicalinthateachruleisaugmentedwithasemantic(composition)functionthatproducesavaryingnumberofderiva-tionsusingarbitrarycontext.Thisﬂexibilitypro-videsproceduralcontroloverthegenerationoflogi-calforms.Formally,agrammarisatuple(V,N,R),whereVisasetofterminals(words),Nisasetofcate-gories(suchasBINARY,ENTITY,SETandROOTinFigure2,whereROOTistherootcategory),andRisarulesetofbinaryandunaryrules,explainedbelow.Abinaryruler∈RhastheformA→BC[f],whereA∈Nistheleft-hand-side,BC∈N2istheright-hand-side(RHS),andfisasemanticfunc-tion,explainedbelow.Givenanutterancex,thegrammardeﬁnesasetofderivations(semanticparsetrees)overeveryspanxi:j=(wi,wi+1,…,wj−1).DeﬁneDtobethesetofallderivations,andletdAi:jbeaderivationoverthespanxi:jofcategoryA.Giventhederiva-tionsdBi:kanddCk:jandtheruler=A→BC[f],thesemanticfunctionf:D×D→2Dpro-ducesasetofderivationsf(dBi:k,dCk:j)overxi:jwithcategoryA.Inwords,thesemanticfunc-tiontakestwochildderivationsasinputandpro-ducesasetofcandidateoutputderivations.Foreachoutputderivationd,letd.rbetheruleused(SET→ENTITYBINARY[JOIN])andd.zbethelogicalformconstructedbyf,usuallycreatedbycombiningthelogicalformsofthechildderiva-tions(PlaceOfBirthOf.AbeLincoln).Thiscom-pletesourdescriptionofbinaryrules;unaryrulesA→B[f]andlexicalrulesA→w[f]arehandledsimilarly,wherew∈V+isasequenceofterminals.Figure3demonstratestheﬂexibilityofseman-ticfunctions.TheJOINsemanticfunctiontakesaderivationwhoselogicalformisabinarypredicate,andaderivationwhoselogicalformisaunarypred-icate,andperformsajoinoperation.LEXtakesaderivationrepresentingaphraseandoutputsmanycandidatederivations.INTERSECTtakestwoderiva-tionsandattemptstointersecttheirlogicalforms(asdeﬁnedinSection2).Inthisspeciﬁccase,noout-putderivationsareproducedbecausetheKBtypesforType.CityandReleaseDateOf.LincolnFilmdonotmatch.IncontrastwithCFGrulesforsyntacticpars-ing,ruleswithsemanticfunctionsgeneratesetsofderivationsratherthanasinglederivation.Weallowsemanticfunctionstoperformarbitraryoperationsonthechildderivations,accessexternalresourcessuchasFreebasesearchAPIandtheKB.Inprac-tice,ourgrammaremploys11semanticfunctions;inadditiontoJOIN,LEX,andINTERSECT,weuseBRIDGE,whichimplementsthebridgingoperation(seeSection8)fromBerantetal.(2013),aswellasonesthatrecognizedatesandﬁlterderivationsbasedonpart-of-speechtags,namedentitytags,etc.

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

548

Join(cid:18)Entity:AbeLincolnabrahamlincoln,Binary:PlaceOfBirthOfborn(cid:19)=(cid:40)Set:PlaceOfBirthOf.AbeLincolnEntity:AbeLincolnabrahamlincolnBinary:PlaceOfBirthOfborn(cid:41)Lex(lincoln)=(cid:40)Entity:AbeLincolnlincoln,Entity:LincolnFilmlincoln,…(cid:41)Intersect(cid:18)Set:Type.Citycity,Set:ReleaseDateOf.LincolnFilmabrahamlincolnborn(cid:19)={}Figure3:Asemanticfunction(weshowJOIN,LEXandINTERSECT)takesoneortwochildderivationsandreturnsasetofpossiblederivations.4Fixed-orderParsingWenowdescribeﬁxed-orderparsingwithbeamsearch,whichhasbeenthecommonpracticeinpastwork(KrishnamurthyandMitchell,2012;Kwiatkowskietal.,2013;Berantetal.,2013).Letxbetheinpututterance.WecallderivationsdROOT0:|x|,spanningtheutterancexandwithrootcate-gory,rootderivations,andallotherderivationspar-tialderivations.Givenascoringfunctions:D→R,abottom-upﬁxed-orderparseriteratesoverspansxi:jofincreasinglengthnandcategoriesA∈N,andusesthegrammartogeneratederivationsbasedonderivationsofsubspans.Weusebeamsearch,inwhichforeveryspanxi:jandeverycategoryAwekeepabeamthatstoresuptoKderivationsinachartcellHAi:j(wheredifferentderivationsusuallycorrespondtodifferentlogicalforms).WedenotebyHthesetofderivationsinanychartcell.Aﬁxed-orderparserisguaranteedtocomputetheKhighest-scoringderivationsifthefollowingtwoconditionshold:(i)allsemanticfunctionsreturnex-actlyonederivation,and(ii)thescoringfunctiondecomposes—thatis,thereisafunctionsrule:R→Rsuchthatforeveryruler=A→BC[f],thescoreofaderivationproducedbytheruleiss(dAi:j)=s(dBi:k)+s(dCk:j)+srule(r).Unfortunately,thetwoconditionsgenerallydonotholdinseman-ticparsing.Forexample,theINTERSECTfunctionreturnsanemptysetwhentype-checkingfails,vi-olatingcondition(i),andthescoringfunctionsof-tendependsonthedenotationsizeoftheconstructedlogicalform,violatingcondition(ii).Ingeneral,wewanttheﬂexibilityofhavingthescoringfunctiondependonthelogicalformsandsub-derivations,andthereforewewillnotbeconcernedwithexactnessinthispaper.Notethatwecouldaugmentthecat-egoriesNwiththelogicalform,butthiswouldin-creasethenumberofcategoriesexponentially.Model.Wefocusonlinearscoringfunctions:s(d)=φ(d)>θ,whereφ(d)∈RFisthefeaturevectorandθ∈RFistheparametervectortobelearned.GivenanysetofderivationsD⊆D,wecandeﬁnethecorrespondinglog-lineardistribution:pθ(d|D)=exp{φ(d)>θ}Pd0∈Dexp{φ(d0)>θ}.(1)Learning.Thetrainingdataconsistsofasetofutterance-denotation(question-answer)pairs{(xi,yi)}ni=1.Tolearnθ,weuseanonlinelearn-ingalgorithm,whereoneach(xi,yi),weusebeamsearchbasedonthecurrentparameterstoconstructasetofrootderivationsDi=HROOT0:|x|,andthentakeagradientsteponthefollowingobjective:Oi(θ)=logp(yi|xi)(2)=logXd∈Dipθ(d|Di)R(d)+λkθk1,(3)whereR(d)∈[0,1]isarewardfunctionthatmea-suresthecompatibilityofthepredicteddenotationJd.zKKandthetruedenotationyi,2.Wemarginalizeoverlatentderivations,whichareweightedbytheircompatibilitywiththeobserveddenotationyi.Themaindrawbackofﬁxed-orderparsingisthattoobtaintheKrootderivationsDi,theparsermustﬁrstconstructKderivationsforallspansandallcat-egories,manyofwhichwillnotmakeitintoanyrootderivationd∈Di.Next,wedescribeagenda-basedparsing,whosegoalistogivetheparserbettercon-trolovertheconstructedderivations.2Jd.zKKandyiarebothsetsofentities,soRistheF1score.

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

549

removeaddFigure4:Aschematicillustrationofaexecutingaparsingaction,speciﬁedbyaderivationontheagenda.First,weremoveitfromtheagendaandputitinthechart.Then,combineitwithotherchartderivationstoproducenewderivations,whichareaddedbacktotheagenda.Algorithm1Agenda-basedparsing1:procedurePARSE(x)2:INITAGENDA()3:while|Q|>0∧|HROOT0:|x||jandr=B→AC[f]∈Rdo14:fordCj,k∈HCj:kdo15:Q.addAll(f(dAi:j,dCj:k))16:forkθ.Computingthebestactionargmaxapθ(a|s)sim-plyinvolvespoppingfromthepriorityqueue.Ahistoryh=(s1,a1,…,aT,sT+1)(seeFig-ure5)isasequenceofstatesandactions,suchthats1hasanemptychartandaninitialagenda,andsT+1isaterminalstatereachedafterperformingthechartactioninwhichwechoosearootderivationaTfromHROOT0:|x|(Algorithm1).Thepolicyforchoosingparsingactionsinducesadistributionoverhistoriespθ(h)=QTt=1pθ(at|st).Atahighlevel,ourpolicyistrainedusingimi-tationlearningtomimicanoraclethattakesaopti-malactionateverystep(Daumeetal.,2009;Rossetal.,2011).Becauseinsemanticparsingwetrainfromquestionsandanswers,wedonothaveaccesstoanoracle.Instead,weﬁrstparsexbysamplingahistoryfromthecurrentpolicypθ;letd∗betherootderivationwithhighestrewardoutoftheKrootderivationsconstructed(see(2)).Wethengenerateatargethistoryhtargetfromd∗usingtwoideas—localreweightingandhistorycompression,whichweex-plainshortly.Thepolicyparametersθarethenup-datedasfollows:θ←θ+ηR(htarget)TXt=1δt(htarget),(4)δt(h)=∇θlogpθ(at|st)(5)=φ(at)−Epθ(a0t|st)[φ(a0t)].TherewardR(h)=R(aT)∈[0,1]measuresthecompatibilityofthereturnedderivation(see(2)),andηisthelearningrate.4Notethatwhileourfea-4Notethatunlikestandardpolicygradient,ourupdatesarenotinvariant(eveninexpectation)toshiftingtherewardbyaconstant.Ourupdatesdonotmaximizereward,buttherewardmerelyprovidesawaytomodulatethemagnitudeoftheup-dates.st−1stst+1at−1atFigure5:Aschematicillustrationofa(partial)historyofstatesandactions.Eachellipserepresentsastate(chartandagenda),andtheredpathmarkstheactionschosen.turesφ(a)dependontheactiononly,theupdateruletakesintoaccountallactionsthatareontheagenda.Localreweighting.Giventhereferenced∗,letI[aind∗]indicatewhetheranactionaisasub-derivationofd∗.Wesamplehtargetfromthelo-callyreweighteddistributionp+wθ(a|s)∝pθ(a|s)·exp{βI[aind∗]}forsomeβ>0.Thisisamul-tiplicativeinterpolationofthemodeldistributionpθandtheoracle.Whenβishigh,thisreducestosam-plingfromtheavailableactionsind∗.Whennoor-acleactionsareavailable,thisreducestosamplingfrompθ.Theprobabilityofahistoryisdeﬁnedasp+wθ(h)=QTt=1p+wθ(at|st).RecallweconstructKrootderivations.Aprob-lemwithlocalreweightingisthatafteraddingd∗tothechart,therearenomoreoracleactionsontheagendaandallsubsequentactionsaresimplysam-pledfromthemodel.Wefoundthatupdatingto-wardstheseactionshurtsaccuracy.Toavoidthisproblem,weproposeperforminghistorycompres-sion,describednext.Historycompression.Givend∗,wecandeﬁneforeveryhistoryhasequenceofindices(t1,t2,…)suchthatI[atiind∗]=1foreveryi.Then,thecompressedhistoryc(h)=(st1,at1,st2,at2,…)isasequenceofstatesandactionssuchthatallactionschoosesub-derivationsofd∗.Notethatc(h)isnota“realhistory”inthesensethattak-ingactionatidoesnotnecessarilyresultinstatesti+1.InFigure6,thecompressedhistoryc(h)=(s1,a1,s3,a3,s4,a4,s5).Wecannowsampleatargethistoryhtargetfor(4)fromadistributionovercompressedhistories,

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

551

a4=d∗h:a1a3s1s2s3s4s5a1a2a3a4Figure6:Anexamplehistoryofstatesandactions,whereac-tionsthatarepartofthereferencederivationd∗=a4areinred.Thecompressedhistoryisc(h)=(s1,a1,s3,a3,s4,a4,s5).Algorithm2LearningalgorithmprocedureLEARN({xi,yi}ni=1)θ←0foreachiterationτandexampleidoh0←PARSE(pθ,xi)d∗←CHOOSEORACLE(h0)htarget←PARSE(p+cwθ,xi)θ←θ+ητ,i·R(htarget)PTt=1δt(htarget)p+cθ(h)=Ph0:c(h0)=hpθ(h),wherewemarginal-izeoverallhistoriesthathavethesamecompressedhistory.Tosamplefromp+cθ,wesampleh0∼pθandreturnhtarget=c(h0).Thiswillprovideahis-torycontainingonlyactionsleadingtotheoracled∗.Inourfullmodel,wesampleahistoryfromp+cwθ,whichcombineslocalreweightingandhis-torycompression:wesampleh0∼p+wθandre-turnhtarget=c(h0).WeempiricallyanalyzelocalreweightingandhistorycompressioninSection9.Inpractice,wesetβlargeenoughsothatthebe-haviorofp+cwθisasfollows:weﬁrstconstructthereferenced∗bysamplingoracleactions.Aftercon-structingd∗,nooracleactionsareontheagenda,soweconstructK−1morerootderivations,samplingfrompθ(butnotetheseactionsarenotpartofthere-turnedcompressedhistory).Finally,thelastactionchoosesd∗fromtheKderivations.Algorithm2summarizeslearning.Weinitializeourparameterstozero,andthenparseeachexam-plebysamplingahistoryfrompθ.WechoosethederivationwithhighestrewardinHROOT0:|x|astheref-erencederivationd∗.Thisdeﬁnesp+cwθ,whichwesamplefromtoupdateparameters.Thelearningrateητ,iissetusingAdaGrad(Duchietal.,2010).6.2RelatedapproachesOurmethodisrelatedtopolicygradientinreinforce-mentlearning(Suttonetal.,1999):ifin(4),wesamplefromthemodeldistributionpθwithoutanoracle,thenourupdateisexactlythepolicygradi-entupdate,whichmaximizestheexpectedrewardEpθ(h)[R(h)].Wedonotusepolicygradientsincethegradientisalmostzeroduringthebeginningoftraining,leadingtoslowconvergence.Thiscorrob-oratesJiangetal.(2012).OurmethodextendsJiangetal.(2012)toseman-ticparsing,whichposesthefollowingchallenges:(a)Wetrainfromdenotations,andmustobtainaref-erencetoguidelearning.(b)Tocombatlexicalun-certaintywemaintainabeamofsizeKineachpars-ingstate(weshowthisisimportantinSection9).(c)Weintroducehistorycompression,whichfocusesthelearnerontheactionsthatproducethecorrectderivationratherthanincorrectonesonthebeam.Interestingly,Jiangetal.(2012)foundthatimitationlearningdidnotworkwell,andobtainedimprove-mentsfrominterpolatingwithpolicygradient.Wefoundthatimitationlearningworkedwell,andin-terpolatingwithpolicygradientdidnotofferfurtherimprovements.ApossibleexplanationisthattheuncertaintypreservedintheKderivationsineachchartcellallowedimitationlearningtogeneralizeproperly,comparedtoJiangetal.(2012),whohadjustasingleitemineachchartcell.7LazyAgendaAswesawinSection3,asinglesemanticfunction(e.g.,LEX,BRIDGE)cancreatehundredsofderiva-tions.Scoringallthesederivationswhenaddingthemtotheagendaiswasteful,becausemosthavelowprobability.Inthissection,weassumesemanticfunctionsreturnaderivationstream,i.e.,anitera-torthatlazilycomputesderivationsondemand.OurlazyagendaGwillholdderivationstreamsratherthanderivations,andtheactualagendaQwillbedeﬁnedonlyimplicitly.TheintuitionissimilartolazyK-bestparsing(HuangandChiang,2005),butisappliedtoagenda-basedsemanticparsing.Ourmainassumptionisthateveryderivationstreamg=[d1,d2,…],issortedbydecreasingscore:s(d1)≥s(d2)≥···(inpractice,thisisonlyapproximatedasweexplainattheendofthissec-tion).Wedeﬁnethescoreofaderivationstreamass(g)=s(d1).AttesttimetheonlychangetoAl-gorithm1isinline4,whereinsteadofpoppingthe

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

552

Gs(g[1])|g|U[d1]710.88[d2,d3,d4,…]510011.92Gs(g[1])|g|U[d1]710.88[d2]510.12[d3]110.006[d4,…]−2980.004Figure7:Unrollingaderivationwhere(cid:15)=0.01and|G+|=1.Thestreaminredontheleftviolatesthestoppingcondition,andsoweunrolltwoderivationsuntilallstreamssatisfythecondition.highestscoringderivation,wepopthehighestscor-ingderivationstreamandprocesstheﬁrstderivationonthestream.Then,wefeaturizeandscorethenextderivationonthestreamifthestreamisnotempty,andpushthestreambacktotheagenda.Thisguar-anteeswewillobtainthehighestscoringderivationineveryparsingaction.However,duringtrainingwesamplefromadistri-butionoverderivations,notjustreturntheargmax.Samplingfromthedistributionoverstreamscanbequiteinaccurate.Supposetheagendacontainstwoderivationstreams:g1containsonederivationwithscore1andg2contains50derivationswithscore0.Thenwewouldassigng1probabilitye1e1+e0=0.73insteadofthetruemodelprobabilitye1e1+50e0=0.05.Theissueisthattheﬁrstderivationofgisnotindicativeoftheactualprobabilitymassing.Oursolutionissimple:beforesampling(line4inAlgorithm1),weprocesstheagendatoguaranteethatthesumofprobabilitiesofallunscoredderiva-tionsissmallerthan(cid:15).LetGbethelazyagendaandG+⊆Gbethesubsetofderivationstreamsthatcontainmorethanonederivation(whereunscoredderivationsexist).Ifforeveryg∈G+,pθ(g)=Pd∈gpθ(d)≤(cid:15)|G+|,thentheprobabilitysumofallunscoredderivationissmall:Pg∈G+p(g)≤(cid:15).Toguaranteethatpθ(g)≤(cid:15)|G+|,weunrollgun-tilthisstoppingconditionissatisﬁed.Unrollingastreamfromg=[d1,d2,…]meanspoppingd1fromg,constructingasingletonderivationstreamgnew=[d1],pushinggnewtotheagendaandscoringtheremainingstreambasedonthenextderivations(g)=s(d2)(Figure7).Tocheckifp(g)≤(cid:15)|G+|,wedeﬁnethefollow-ingupperboundUonp(g),whichisbasedonthenumberofderivationsinthestream|g|:pθ(g)=Pd∈ges(d)Pg0∈GPd0∈g0es(d0)≤|g|es(g[1])Pg0∈Ges(g0[1])=Uwhereg[1]istheﬁrstderivationing.CheckingthatU≤(cid:15)|G+|iseasy,sinceitisbasedonlyontheﬁrstderivationofeverystream.Onceallstreamsmeetthiscriterion,weknowthatthetotalunscoredprob-abilityislessthan(cid:15).Aslearningprogresses,therearebemanylowprobabilityderivationswhichwecanskipentirely.Thelastmissingpieceisensuringthatstreamsaresortedwithoutexplicitlyscoringallderivations.Wemakeabestefforttopreservethisproperty.Sortingderivationstreams.Allderivationsinastreamghavethesamechildderivations,astheywereconstructedbyoneapplicationofaseman-ticfunctionf.Thus,thedifferenceintheirscoresisonlyduetonewfeaturescreatedwhenapplyingf.Wecandecomposethesenewfeaturesintotwodisjointfeaturesets.Onesetincludesfeaturesthatdependonthegrammarruleonlyandareindepen-dentoftheinpututterancex,andanotheralsode-pendsonx.Forexample,thesemanticfunctionf=LEXmapsphrases,suchas“bornin”,tolog-icalforms,suchasPlaceOfBirthOf.MostfeaturesextractedbyLEXdonotdependonx:thecon-junctionof“bornin”andPlaceOfBirthOf,thefre-quencyofthephrase“bornin”inacorpus,etc.However,somefeaturesmaydependonxaswell.Forexample,ifxis“whatcitywasabrahamlin-colnbornin”,wecanconjoinPlaceOfBirthOfwiththeﬁrsttwowords“whatcity”.Asanotherex-ample,thesemanticfunctionBRIDGEtakesunarypredicates,suchasAbeLincoln,andjoinsthemwithanytype-compatiblebinarytoproducelogi-calforms,suchasPlaceOfBirthOf.AbeLincoln.Again,afeaturesuchasthenumberofassertionsinKthatcontainPlaceOfBirthOfdoesnotde-pendonx,whileafeaturethatconjoinstheintro-ducedbinary(PlaceOfBirthOf)withthemainverb(“born”),doesdependonx(seeSection8).Ourstrategyistopre-computeallfeaturesthatareindependentofxbeforetraining,5andsortstreams5ForLEX,thisrequiresgoingoveralllexiconentriesonce.ForBRIDGE,thisrequiresgoingonceovertheKB.

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

553

basedonthesefeaturesonly,asanapproximationforthetrueorder.Let’sassumethatderivationsre-turnedbyanapplicationofasemanticfunctionfareparameterizedbyanauxiliarysetB.Forexample,whenapplyingLEXon“bornin”,Bwillincludealllexicalentriesthatmap“bornin”toabinarypredicate.WhenapplyingBRIDGEonAbeLincoln,Bwillincludeallbinarypredicatesthataretype-compatiblewithAbeLincoln.Weequipeachb∈BwithafeaturevectorφB(b)(computedbeforetrain-ing)ofallfeaturesthatareindependentofx.ThisgivesrisetoascoresB(b)=φB(b)>θthatdependsonthesemanticfunctiononly.Thus,wecansortBbeforeparsing,sothatwhenthefunctionfiscalled,wedonotneedtoinstantiatethederivations.NotethattheparametersθandthussBchangeduringlearning,sowere-sortBaftereveryiteration(ofgoingthroughalltrainingexamples),yieldinganapproximationtothetrueorderingofB.Inpractice,featuresextractedbyLEXdependmostlyonthelex-icalentryitselfandourapproximationisaccurate,whileforBRIDGEsomefeaturesdependonx,asweexplainnext.8FeaturesThefeaturesetinourmodelincludesallfeaturesde-scribedinBerantetal.(2013).6Inaddition,weaddnewlexicalizedfeaturesthatconnectnaturallan-guagephrasestobinarypredicates.InBerantetal.(2013),abinarypredicateisgeneratedusingalexiconconstructedofﬂineviaalignment,orthroughthebridgingoperation.Asmentionedabove,bridgingallowsustojoinunarypredicateswithbinarypredicatesthataretype-compatible,evenwhennowordintheutterancetrig-gersthebinarypredicate.Forexample,giventheut-terance“whatmoneytotaketosrilanka”,theparserwillidentifytheentitySriLanka,andbridgingwillproposeallpossiblebinaries,includingCurrency.Weaddafeaturetemplatethatconjoinsbinariessuggestedbybridging(Currency)withallcontentwordlemmas(“what”,“money”,“take”).Afterobservingenoughexamples,weexpectthefeaturecorrespondingto“money”andCurrencytobeup-weighted.Generatingfreelyandreweightingusing6Asinpreviouswork,somefeaturesusethefactthatthespellingofKBpredicatesisoftensimilartoEnglishwords.featurescanbeviewedasasoftwaytoexpandthelexiconduringtraining,similartolexicongeneration(ZettlemoyerandCollins,2005).Notethatthisfea-turedependsontheutterancex,andisnotusedforsortingstreams(Section7).Finally,eachfeatureisactuallyduplicated:onecopyﬁreswhenchoosingderivationsontheagenda(Algorithm1,line4),andtheothercopyﬁreswhenchoosingtheﬁnalrootderivation(line6).Wefoundthattheincreasedexpressivityfromseparatingfea-turesimprovesaccuracy.9ExperimentsWeevaluateoursemanticparserontheWEBQUES-TIONSdataset(Berantetal.,2013),whichcon-tains5,810question-answerpairs.Thequestionsareaboutpopulartopics(e.g.,“whatmoviesdoestaylorlautnerplayin?”)andanswersaresetsofentitiesobtainedthroughcrowdsourcing(allquestionsareanswerablebyFreebase).Weusetheprovidedtrain-testsplitandperformthreerandom80%-20%splitsofthetrainingdatafordevelopment.WeperformlexicallookupforFreebaseentitiesusingtheFreebaseSearchAPIandobtain20can-didateentitiesforeverynamedentityidentiﬁedbyStanfordCoreNLP(Manningetal.,2014).WeusethelexiconreleasedbyBerantetal.(2013)tore-trieveunaryandbinarypredicates.Weexecuteλ-DCSlogicalformsbyconvertingthemtoSPARQLandqueryingourlocalVirtuoso-backedcopyofFreebase.Duringtraining,weuseL1regularization,andcrudelytunehyperparametersonthedevelop-mentset(beamsizeK=200,toleranceforthelazyagenda(cid:15)=0.01,localreweightingβ=1000,andL1regularizationstrengthλ=10−5).Weevaluatedoursemanticparserusingthere-wardofthepredictions,i.e.,averageF1scoreonpredictedvs.trueentitiesoveralltestexamples.79.1MainResultsTable1providesourkeyresultcomparingtheﬁxed-orderparser(FIXEDORDER)andourproposedagenda-basedparser(AGENDAIL).Inallsubse-quenttables,Train,Dev.,andTestdenotetrain-ing,developmentandtestaccuracies,|Act.|denotes7Weusetheofﬁcialevaluationscriptfromhttp://www-nlp.stanford.edu/software/sempre/.

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

554

TestTrain|Act.||Feat.|TimeFIXEDORDER49.660.618,12718,1271,782AGENDAIL49.761.11,3461,814291Table1:Testsetresultsforthestandardﬁxed-orderparser(FIXEDORDER)andournewagenda-basedparser(AGENDAIL),whichsubstantiallyreducesparsingtimeandthenumberofparsingactionsatnocosttoaccuracy.SystemAuthorsAcc.YV14YaoandVan-Durme(2014)35.4BCFL13Berantetal.(2013)35.7BDZZ14Baoetal.(2014)37.5BWC14Bordesetal.(2014)39.2BL14BerantandLiang(2014)39.9YDZR14Yangetal.(2014)41.3BWC14+BL14Bordesetal.(2014)41.8WYWH14Wangetal.(2014)45.3WYWH14Wangetal.(2014)45.3YCHG15Yihetal.(2015)52.5FIXEDORDERthiswork49.6AGENDAILthiswork49.7Table2:ResultsontheWEBQUESTIONStestset.theaveragenumberofparsingactions(popsfromagendainAGENDAILandderivationsplacedonchartinFIXEDORDER)perutterance,|Feat.|de-notestheaveragenumberoffeaturizedderivationsperutterance,andTimeisaverageparsingtimeinmilliseconds.WefoundthatAGENDAILis6xfasterthanFIXE-DORDER,performs13xfewerparsingactions,andreducesthenumberoffeaturizedderivationsbyanorderofmagnitude,withoutlossofaccuracy.Table2presentstestsetresultsofoursystems,comparedtorecentlypublishedresults.Wenotethatmostsystemsperformquestionansweringwith-outsemanticparsing.Ourﬁxed-orderparser,FIXE-DORDER,andagenda-basedparser,AGENDAIL,obtainanaccuracyof49.6and49.7respectively.Thisimprovesaccuracycomparedtoallprevioussystems,exceptforarecentlypublishedsemanticparserpresentedbyYihetal.(2015),whoseaccu-racyis52.5.Weattributeouraccuracyimprovementcomparedtoprevioussystemstothenewfeaturesandchangestothemodel,aswediscussbelow.BCFL13alsousedaﬁxed-orderparser,butob-tainedlowerperformance.Themaindifferencesbe-tweenthesystemsarethat(i)ourmodelincludesnewfeatures(Section8)combinedwithL1regular-ization,(ii)weusetheFreebasesearchAPIratherthanstringmatching,and(iii)ourgrammargener-AlgorithmDev.|Act.||Feat.|TimeAGENDAIL48.01,4211,912214FIXEDORDER49.118,25918,2591,972AGENDA45.96,2116,320419FIXED+AGENDA47.16,2816,615775α=100047.811,27911,2791,216α=10035.63,8583,858174α=1027.01,6041,60478p+wθ43.31,7062,121238p+cθ36.83,7584,278358pθ1.212,30215,5241,497-BINARYANDLEMMA40.51,5612,110167Table3:DevelopmentsetresultsforvariantsofAGENDAIL.atesalargerspaceofderivations.9.2AnalysisTogaininsightintooursystemcomponents,weper-formextensiveexperimentsonthedevelopmentset.Comparisonwithﬁxed-orderparsing.Figure8comparesaccuracy,speedattesttime,andnumberofderivationsforAGENDAILandFIXEDORDER.ForAGENDAIL,weshowboththenumberofderivationspoppedfromtheagenda,aswellasnum-berofderivationsscored,whichisslightlyhigherduetoscoredderivationsontheagenda.Weob-servethatforsmallbeamsizes,AGENDAILsub-stantiallyoutperformsFIXEDORDER.ThisissinceAGENDAILexploitssmallbeamsmoreefﬁcientlyinintermediateparsingstates.Forlargebeamsperfor-manceissimilar.Intermsofspeedandnumberofderivations,weseethatAGENDAILisdramaticallymoreefﬁcientthanFIXEDORDER:withbeamsize200–400,itisroughlyasefﬁcientasFIXEDORDERwithbeamsize10–20.Forthechosenbeamsize(K=200),AGENDAILis9xfasterthanFIXE-DORDER.ForK=1,performanceispoorforAGENDAILandzeroforFIXEDORDER.Thishighlightsthein-herentdifﬁcultyofmappingtologicalformscom-paredtomoreshallowtasks,asmaintainingjustasinglebestderivationforeachparsingstateisnotsufﬁcient.AcommonvariantonbeamparsingistoreplacetheﬁxedbeamsizeKwithathresholdα,andpruneanyderivationwhoseprobabilityisatleastαtimessmallerthanthebestderivationinthatstate(Zhangetal.,2010;Bodenstabetal.,2011).Weimple-mentedthisbaselineandcomparedittoAGENDAIL

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

555

1102050100200400beam size01020304050accuracyFixedOrderAgendaIL1102050100200400beam size0.00.51.01.52.02.53.03.54.0time (sec)FixedOrderAgendaIL1102050100200400beam size051015202530#derivations (thousands)FixedOrderAgendaIL (scored)AgendaIL (popped)Figure8:ComparingAGENDAILandFIXEDORDERforvariousbeamsizes(left:accuracy,middle:parsingtimeattesttimeinseconds,right:numberofthousandsofderivationsscoredandpopped).Thex-axisisonalogarithmicscale.andFIXEDORDERinTable3.Weseethatforα=1000,wegetafasteralgorithm,butaminordropinperformancecomparedtoFIXEDORDER.However,thisbaselinestillfeaturizes6xmorederivationsandis6xslowerthanAGENDAIL.Impactoflearning.TheAGENDAbaselineusesanagenda-basedparsertoapproximatethegradientsof(2).Thatis,weupdateparametersasinFIXE-DORDER,butsearchforKrootderivationsusingtheagenda-basedparser,describedinAlgorithm1(wherewepopthehighestscoringderivation).WeobservethatAGENDAfeaturizes3xmorederiva-tionscomparedtoAGENDAIL,andresultsina2.1dropinaccuracy.Thisdemonstratestheimportanceofexplicitlylearningtochoosecorrectactionsdur-ingintermediatestagesofparsing.Sinceonthedevelopmentset,FIXEDORDERoutperformedAGENDAILby1.1points,weim-plementedFIXED+AGENDA,whereaﬁxed-orderparserisusedattrainingtime,butanagenda-basedparserisusedattesttime.Thisparserfeaturized3.5xmorederivationscomparedtoAGENDAIL,is3.5xslower,andhasslightlyloweraccuracy.RecallthatAGENDAILsamplesahistoryfromp+cwθ,thatis,usinglocalreweightingandhistorycompression.Table3showstheimpactofsamplingfromp+wθ(localreweighting),p+cθ(historycompres-sion),anddirectlyfrompθ,whichreducestopolicygradient.Weobservethatsamplingfrompθdirectlyaccordingtopolicygradientresultsinverylowac-curacy,asthisproducesderivationswithzerore-wardmostofthetime.Bothlocalreweightingandhistorycompressionaloneimproveaccuracy(localAcc.|Feat.|TimeTr.Dev.Tr.Dev.Tr.Dev.(cid:15)=10256.446.71,6502,1211,159345(cid:15)=10−159.547.02,0431,8901,425279(cid:15)=10−261.048.02,6001,9121,830214(cid:15)=10−361.448.53,0631,7402,110220NOSTREAM60.047.64,0494,9313,155293Table4:Accuracy,numberoffeaturizedderivations,andpars-ingtimeforboththetrainingsetanddevelopmentsetwhenvaryingthevalueofthetoleranceparameter(cid:15).reweightingismoreimportant),butbothperformworsethanAGENDAIL.Impactoflazyagenda.Wenowexaminethecon-tributionofthelazyagenda.Notethatthelazyagendaaffectstrainingtimemuchmorethantesttimefortworeasons:(a)attesttimeweonlyneedtopopthehighestscoringderivation,andtheoverheadofapriorityqueueonlygrowslogarithmicallywiththesizeoftheagenda.Duringtraining,weneedtakeafullpassovertheagendawhensampling,andthusthenumberofitemsontheagendaisimportant;(b)attesttimeweneverunrollderivationstreams,onlypopthehighestscoringderivation(seeSection7).Inbrief,usingthelazyagendaresultsina1.5xspeedupattrainingtime.Tounderstandthesavingsofthelazyagenda,wevarythevalueofthetoleranceparameter(cid:15).When(cid:15)isveryhigh,wewillneverunrollderivationstreams,becauseforallderivationstreamsU≤(cid:15)|G+|(Section7).Thiswillbefast,butsamplingcouldbeinaccurate.As(cid:15)decreases,weunrollmorederivations.WealsocomparedtotheNOSTREAMbaseline,wheretheagendaholdsderivationsratherthanderivationstreams.

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

556

AgendaIL:FixedOrder:whatcurrencydoesjamaicaaccept?2912127102085020222490018642140400whatcurrencydoesjamaicaaccept?54865248,71459532111,099761331,519403360441931593558Figure9:Numberofderivationsineverychartcellfortheutterance“whatcurrencydoesjamaicaaccept?”.AGENDAILreducesthenumberofderivationsinchartcellscomparedtoFIXEDORDER.Table4showstheresultsoftheseexperiments.Naturally,thenumberoffeaturizedderivationsintrainingincreasesas(cid:15)decreases.Inparticular,NOSTREAMresultsina2.5xincreaseinnumberoffeaturizedderivationscomparedtonounrolling((cid:15)=102),and1.5xincreasecomparedto(cid:15)=10−2,whichisthechosenvalue.Similarly,averagetrain-ingtimeisabout1.5xslowerforNOSTREAMcom-paredto(cid:15)=10−2.Accuracydoesnotchangemuchforvariousval-uesof(cid:15).Evenwhen(cid:15)=102,accuracydecreasesbyonly1.8pointscomparedto(cid:15)=10−3.Unexpect-edly,NOSTREAMyieldsaslightdropinaccuracy.Featureablation.Table3showsanablationtestonthenewfeaturetemplateweintroducedthatconjoinsbinariesandlemmasduringbridging(-BINARYANDLEMMA).Removingthisfeaturetem-platesubstantiallyreducesaccuracycomparedtoAGENDAIL,highlightingtheimportanceoflearningnewlexicalassociationsduringtraining.Example.Asaﬁnalexample,Figure9showstyp-icalparsechartsforAGENDAILandFIXEDORDER.AGENDAILgeneratesonly1,198derivations,whileFIXEDORDERconstructs15,543derivations,manyofwhichareunnecessary.Insummary,wedemonstratedthattraininganagenda-basedparsertochoosegoodparsingactionsthroughimitationlearningdramaticallyimprovesef-ﬁciencyandspeedattesttime,whilemaintainingcomparableaccuracy.10DiscussionandRelatedWorkLearning.Inthispaper,wesampledhistoriesfromadistributionthattriestotargetthereferencederivationd∗wheneverpossible.Workinimita-tionlearning(AbbeelandNg,2004;Daumeetal.,2009;Rossetal.,2011;GoldbergandNivre,2013)hasshownthatinterpolatingwiththemodel(corre-spondingtosmallerβ)canimprovegeneralization.Wewereunabletoimproveaccuracybyannealingβfrom1000to0,sounderstandingthisdynamicre-mainsanopenquestion.Parsing.Inthispaper,weavoidedcomputingKderivationsineachchartcellusinganagendaandlearningascoringfunctionforchoosingagendaitems.Acomplementaryandpurelyalgorithmicso-lutionislazyK-bestparsing(HuangandChiang,2005),orcubegrowing(HuangandChiang,2007),whichdonotinvolvelearningoranagenda.Simi-lartoourwork,cubegrowingapproximatesthebestderivationsineachchartcellinthecasewherefea-turesdonotdecomposeWorkinthepastattemptedtospeedupinferenceusingasimplemodelthatistrainedseparatelyandusedtoprunethehypothesesconsideredbythemainparsingmodel(Bodenstabetal.,2011;FitzGeraldetal.,2013).Weontheotherhandspeedupinferencebytrainingasinglemodelthatlearnstofollowgoodparsingactions.Workinagenda-basedsyntacticparsing(KleinandManning,2003;PaulsandKlein,2009)focusedonA*algorithmswhereeachderivationhasaprior-itybasedonthederivationscore(insidescore),andacompletionestimate(outsidescore).Goodesti-matesfortheoutsidescoreresultinadecreaseinthenumberofderivations.Currentlyactionsdependontheinsidescore,butwecouldaddfeaturesbasedonchartderivationstoprovide“outside”information.Addingsuchfeatureswouldpresentcomputationalchallengesasscoresontheagendawouldhavetobeupdatedastheagendaandchartaremodiﬁed.Semanticparsinghasbeengainingmomentuminrecentyears,butstilltherehasbeenrelativelylit-tleworkondevelopingfasteralgorithms,especiallycomparedtosyntacticparsing(Huang,2008;Kum-merfeldetal.,2010;RushandPetrov,2012;Lewis

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

557

andSteedman,2014).Whilewehaveobtainedsig-niﬁcantspeedups,wehopetoencouragenewideasthatexploitthestructureofsemanticparsingtoyieldbetteralgorithms.Reproducibility.Allcode,8data,andexperimentsforthispaperareavailableontheCodaLabplatformathttps://www.codalab.org/worksheets/0x8fdfad310dd84b7baf683b520b4b64d5/.AcknowledgmentsWethanktheanonymousreviewersandtheactioneditor,JasonEisner,fortheirthoroughreviewsandconstructivefeedback.Wealsogratefullyacknowl-edgethesupportoftheDARPACommunicatingwithComputers(CwC)programunderAROprimecontractno.W911NF-15-1-0462.ReferencesP.AbbeelandA.Ng.2004.Apprenticeshiplearningviainversereinforcementlearning.InInternationalConferenceonMachineLearning(ICML).M.AuliandA.Lopez.2011.EfﬁcientCCGparsing:A*versusadaptivesupertagging.InAssociationforComputationalLinguistics(ACL).J.Bao,N.Duan,M.Zhou,andT.Zhao.2014.Knowledge-basedquestionansweringasmachinetranslation.InAssociationforComputationalLinguis-tics(ACL).J.BerantandP.Liang.2014.Semanticparsingviapara-phrasing.InAssociationforComputationalLinguis-tics(ACL).J.Berant,A.Chou,R.Frostig,andP.Liang.2013.SemanticparsingonFreebasefromquestion-answerpairs.InEmpiricalMethodsinNaturalLanguagePro-cessing(EMNLP).N.Bodenstab,A.Dunlop,K.Hall,andB.Roark.2011.Beam-widthpredictionforefﬁcientcontext-freepars-ing.InAssociationforComputationalLinguistics(ACL),pages440–449.A.Bordes,S.Chopra,andJ.Weston.2014.Questionansweringwithsubgraphembeddings.InEmpiricalMethodsinNaturalLanguageProcessing(EMNLP).Q.CaiandA.Yates.2013.Large-scalesemanticparsingviaschemamatchingandlexiconextension.InAsso-ciationforComputationalLinguistics(ACL).8OursystemusestheSEMPREtoolkit(http://nlp.stanford.edu/software/sempre).S.A.CaraballoandE.Charniak.1998.Newﬁguresofmeritforbest-ﬁrstprobabilisticchartparsing.Compu-tationalLinguistics,24:275–298.K.Chang,A.Krishnamurthy,A.Agarwal,H.Daume,andJ.Langford.2015.Learningtosearchbetterthanyourteacher.arXiv.J.Clarke,D.Goldwasser,M.Chang,andD.Roth.2010.Drivingsemanticparsingfromtheworld’sre-sponse.InComputationalNaturalLanguageLearn-ing(CoNLL),pages18–27.H.Daume,J.Langford,andD.Marcu.2009.Search-basedstructuredprediction.MachineLearning,75:297–325.J.Duchi,E.Hazan,andY.Singer.2010.Adaptivesubgradientmethodsforonlinelearningandstochas-ticoptimization.InConferenceonLearningTheory(COLT).N.FitzGerald,Y.Artzi,andL.S.Zettlemoyer.2013.Learningdistributionsoverlogicalformsforrefer-ringexpressiongeneration.InEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),pages1914–1925.Y.GoldbergandJ.Nivre.2013.Trainingdeterminis-ticparserswithnon-deterministicoracles.Transac-tionsoftheAssociationforComputationalLinguistics(TACL),1.Google.2013.Freebasedatadumps(2013-06-09).https://developers.google.com/freebase/data.L.HuangandD.Chiang.2005.Betterk-bestparsing.InProceedingsoftheNinthInternationalWorkshoponParsingTechnology,pages53–64.L.HuangandD.Chiang.2007.Forestrescoring:Fasterdecodingwithintegratedlanguagemodels.InAssoci-ationforComputationalLinguistics(ACL).L.Huang.2008.Forestreranking:Discriminativepars-ingwithnon-localfeatures.InAssociationforCom-putationalLinguistics(ACL).J.Jiang,A.Teichert,J.Eisner,andH.Daume.2012.Learnedprioritizationfortradingoffaccuracyandspeed.InAdvancesinNeuralInformationProcessingSystems(NIPS).M.Kay.1986.AlgorithmSchemataandDataStruc-turesinSyntacticProcessing.ReadingsinNaturalLanguageProcessing.D.KleinandC.Manning.2003.A*parsing:Fastex-actviterbiparseselection.InHumanLanguageTech-nologyandNorthAmericanAssociationforComputa-tionalLinguistics(HLT/NAACL).J.KrishnamurthyandT.Mitchell.2012.Weaklysuper-visedtrainingofsemanticparsers.InEmpiricalMeth-odsinNaturalLanguageProcessingandComputa-tionalNaturalLanguageLearning(EMNLP/CoNLL),pages754–765.

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

:
/
/

d
i
r
e
c
t
.

i
t
.

e
d
u

/
t

a
c
l
/

a
r
t
i
c
e
–
p
d

f
/

d
o

i
/

1
0
1
1
6
2

/
t

a
c
_
a
_
0
0
1
5
7
1
5
6
6
8
0
0

/
t

a
c
_
a
_
0
0
1
5
7
p
d

b
y
g
u
e
s
t

o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3

558

J.Kummerfeld,J.Roesner,T.Dawborn,J.Haggerty,J.Curran,andS.Clark.2010.Fasterparsingbysupertaggeradaptation.InAssociationforComputa-tionalLinguistics(ACL).T.Kwiatkowski,L.Zettlemoyer,S.Goldwater,andM.Steedman.2010.InducingprobabilisticCCGgrammarsfromlogicalformwithhigher-orderuniﬁ-cation.InEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),pages1223–1233.T.Kwiatkowski,E.Choi,Y.Artzi,andL.Zettlemoyer.2013.Scalingsemanticparserswithon-the-ﬂyontol-ogymatching.InEmpiricalMethodsinNaturalLan-guageProcessing(EMNLP).M.LewisandM.Steedman.2014.A*CCGparsingwithasupertag-factoredmodel.InEmpiricalMethodsinNaturalLanguageProcessing(EMNLP).P.Liang,M.I.Jordan,andD.Klein.2011.Learningdependency-basedcompositionalsemantics.InAs-sociationforComputationalLinguistics(ACL),pages590–599.P.Liang.2013.Lambdadependency-basedcomposi-tionalsemantics.arXiv.C.D.Manning,M.Surdeanu,J.Bauer,J.Finkel,S.J.Bethard,andD.McClosky.2014.TheStanfordCoreNLPnaturallanguageprocessingtoolkit.InACLsystemdemonstrations.C.Matuszek,N.FitzGerald,L.Zettlemoyer,L.Bo,andD.Fox.2012.Ajointmodeloflanguageandper-ceptionforgroundedattributelearning.InInter-nationalConferenceonMachineLearning(ICML),pages1671–1678.A.PaulsandD.Klein.2009.K-bestA*parsing.InAs-sociationforComputationalLinguistics(ACL),pages958–966.S.Ross,G.Gordon,andA.Bagnell.2011.Areductionofimitationlearningandstructuredpredictiontono-regretonlinelearning.InArtiﬁcialIntelligenceandStatistics(AISTATS).A.RushandS.Petrov.2012.Vinepruningforefﬁcientmulti-passdependencyparsing.InHumanLanguageTechnologyandNorthAmericanAssociationforCom-putationalLinguistics(HLT/NAACL).R.Sutton,D.McAllester,S.Singh,andY.Mansour.1999.Policygradientmethodsforreinforcementlearningwithfunctionapproximation.InAdvancesinNeuralInformationProcessingSystems(NIPS).S.Tellex,T.Kollar,S.Dickerson,M.R.Walter,A.G.Banerjee,S.J.Teller,andN.Roy.2011.Understand-ingnaturallanguagecommandsforroboticnavigationandmobilemanipulation.InAssociationfortheAd-vancementofArtiﬁcialIntelligence(AAAI).Z.Wang,S.Yan,H.Wang,andX.Huang.2014.AnoverviewofMicrosoftdeepQAsystemonStan-fordWebQuestionsbenchmark.Technicalreport,Mi-crosoftResearch.Y.W.WongandR.J.Mooney.2007.Learningsyn-chronousgrammarsforsemanticparsingwithlambdacalculus.InAssociationforComputationalLinguis-tics(ACL),pages960–967.M.Yang,N.Duan,M.Zhou,andH.Rim.2014.Jointre-lationalembeddingsforknowledge-basedquestionan-swering.InEmpiricalMethodsinNaturalLanguageProcessing(EMNLP).X.YaoandB.Van-Durme.2014.Informationextractionoverstructureddata:QuestionansweringwithFree-base.InAssociationforComputationalLinguistics(ACL).W.Yih,M.Chang,X.He,andJ.Gao.2015.Semanticparsingviastagedquerygraphgeneration:Questionansweringwithknowledgebase.InAssociationforComputationalLinguistics(ACL).M.ZelleandR.J.Mooney.1996.Learningtoparsedatabasequeriesusinginductivelogicprogramming.InAssociationfortheAdvancementofArtiﬁcialIntel-ligence(AAAI),pages1050–1055.L.S.ZettlemoyerandM.Collins.2005.Learningtomapsentencestologicalform:Structuredclassiﬁca-tionwithprobabilisticcategorialgrammars.InUncer-taintyinArtiﬁcialIntelligence(UAI),pages658–666.L.S.ZettlemoyerandM.Collins.2007.Onlinelearn-ingofrelaxedCCGgrammarsforparsingtologicalform.InEmpiricalMethodsinNaturalLanguagePro-cessingandComputationalNaturalLanguageLearn-ing(EMNLP/CoNLL),pages678–687.Y.Zhang,B.Ahn,S.Clark,C.V.Wyk,J.R.Curran,andL.Rimell.2010.Chartpruningforfastlexicalised-grammarparsing.InInternationalConferenceonComputationalLinguistics(COLING).
Download pdf