Transactions of the Association for Computational Linguistics, vol. 2, pp. 547–559, 2014. Action Editors: Sharon Goldwater, Alexander Koller.
Submission batch: 3/2014; Revision batch: 8/2014; Published 12/2014.
© 2014 Association for Computational Linguistics.
A New Corpus and Imitation Learning Framework for Context-Dependent Semantic Parsing

Andreas Vlachos, Computer Science Department, University College London, a.vlachos@cs.ucl.ac.uk
Stephen Clark, Computer Laboratory, University of Cambridge, sc609@cam.ac.uk

Abstract

Semantic parsing is the task of translating natural language utterances into a machine-interpretable meaning representation. Most approaches to this task have been evaluated on a small number of existing corpora which assume that all utterances must be interpreted according to a database and typically ignore context. In this paper we present a new, publicly available corpus for context-dependent semantic parsing. The MRL used for the annotation was designed to support a portable, interactive tourist information system. We develop a semantic parser for this corpus by adapting the imitation learning algorithm DAGGER without requiring alignment information during training. DAGGER improves upon independently trained classifiers by 9.0 and 4.8 points in F-score on the development and test sets respectively.

1 Introduction

Semantic parsing is the task of translating natural language utterances into a machine-interpretable meaning representation (MR). Progress in semantic parsing has been facilitated by the existence of corpora containing utterances annotated with MRs, the most commonly used being ATIS (Dahl et al., 1994) and GeoQuery (Zelle, 1995). As these corpora cover rather narrow application domains, recent work has developed corpora to support natural language interfaces to the Freebase database (Cai and Yates, 2013), as well as the development of MT systems (Banarescu et al., 2013).

However, these existing corpora have some important limitations. The MRs accompanying the utterances are typically restricted to some form of database query. Furthermore, in most cases each utterance is interpreted in isolation; thus utterances that use coreference or whose semantics are context-dependent are typically ignored. In this paper we present a new corpus for context-dependent semantic parsing to support the development of an interactive navigation and exploration system for tourism-related activities. The new corpus was annotated with MRs that can handle dialog context such as coreference and can accommodate utterances that are not interpretable according to a database, e.g. repetition requests. The utterances were collected in experiments with human subjects, and contain phenomena such as ellipsis and disfluency. We developed guidelines and annotated 17 dialogs containing 2,374 utterances, with 82.9% exact match agreement between two annotators.

We also develop a semantic parser for this corpus. As the output MRs are rather complex, instead of adopting an approach that searches the output space exhaustively, we use the imitation learning algorithm DAGGER (Ross et al., 2011), which converts learning a structured prediction model into learning a set of classification models. We take advantage of its ability to learn with non-decomposable loss functions and extend it to handle the absence of alignment information during training by developing a randomized expert policy. Our approach improves upon independently trained classifiers by 9.0 and 4.8 F-score on the development and test sets.

2 Meaning Representation Language

Our proposed MR language (MRL) was designed in the context of the portable, interactive
navigation and exploration system of Janarthanam et al. (2013), through which users can obtain information about places and objects of interest, such as monuments and restaurants, as well as directions (see the dialog in Fig. 1). The system is aware of the position of the user (through the use of GPS technology) and is designed to be interactive; hence it can initiate the dialog by offering information on nearby points of interest and correcting the route taken by the user if needed. The MRs returned by the semantic parser must represent the user utterances adequately so that the system can generate the appropriate response. The system was developed in the context of the SPACEBOOK project.[1]

The MRL uses a flat syntax composed of elementary predications, based loosely on minimal recursion semantics (Copestake et al., 2005), but without an explicit treatment of scope. Each MR consists of a dialog act representing the overall function of the utterance, followed for some dialog acts by an unordered set of predicates. All predicates are implicitly conjoined and the names of their arguments are specified, to improve readability and to allow for some of the arguments to be optional. The argument values can be either constants from the controlled vocabulary, verbatim string extracts from the utterance (enclosed in quotes) or variables (Xno). Negation is denoted by a tilde (~) in front of predicates. The variables are used to bind together the arguments of different predicates within an utterance, as well as to denote coreference across utterances.

The goals in designing the MRL were to remain close to existing semantic formalisms, whilst at the same time producing an MRL that is particularly suited to the application at hand (Janarthanam et al., 2013). We also wanted an MRL that could be computed with efficiently and accurately, given the nature of the NL input. Hence we developed an MRL that is able to express the relevant semantics for the majority of the utterances in our data, without moving to the full expressive power of, e.g., DRT.

[1] www.spacebook-project.eu

USER: what's the nearest italian, um, for a meal?
  dialogAct(set_question)
  *isA(id:X1, type:restaurant)
  def(id:X1)
  hasProperty(id:X1, property:cuisine, value:"italian")
  distance(location:@USER, location:X1, value:X2)
  argmin(arg:X1, value:X2)
WIZARD: vapiano's.
  dialogAct(inform)
  isA(id:X4, type:restaurant)
  *isNamed(id:X4, name:"vapiano's")
  equivalent(id:X1, id:X4)
USER: take me to vapiano!
  dialogAct(set_question)
  *route(from_location:@USER, to_location:X4)
  isA(id:X4, type:restaurant)
  isNamed(id:X4, name:"vapiano")
WIZARD: certainly.
  dialogAct(acknowledge)
WIZARD: keep walking straight down clerk street.
  dialogAct(instruct)
  *walk(agent:@USER, along_location:X1, direction:forward)
  isA(id:X1, type:street)
  isNamed(id:X1, name:"clerk street")
USER: yes.
  dialogAct(acknowledge)
USER: what is this church?
  dialogAct(set_question)
  *isA(id:X2, type:church)
  index(id:X2)
WIZARD: sorry, can you say this again?
  dialogAct(repeat)
USER: i said what is this church on my left!
  dialogAct(set_question)
  *isA(id:X2, type:church)
  index(id:X2)
  position(id:X2, ref:@USER, location:left)
WIZARD: it is saint john's.
  dialogAct(inform)
  isA(id:X3, type:church)
  *isNamed(id:X3, name:"saint john's")
  equivalent(id:X2, id:X3)
USER: a sign here says it is saint mark's.
  dialogAct(inform)
  isA(id:X4, type:church)
  *isNamed(id:X4, name:"saint mark's")
  equivalent(id:X2, id:X4)

Figure 1: Sample dialog annotated with MRs
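To make the shape of these MRs concrete, the following is a minimal sketch of how the annotations in Fig. 1 could be held in memory. The Predicate and MR classes are illustrative assumptions for exposition only; they are not part of the corpus release or of the parser described later.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Predicate:
    # One elementary predication, e.g. isA(id:X1, type:restaurant).
    # Arguments are (name, value) pairs; a name may repeat (e.g. two location
    # arguments of distance), and values are constants, quoted strings or
    # variables such as "X1".
    name: str
    args: List[Tuple[str, str]]
    focus: bool = False    # marked with * in the annotation
    negated: bool = False  # marked with ~ in the annotation

@dataclass
class MR:
    # A meaning representation: a dialog act plus an unordered set of predicates
    # (empty for acts such as acknowledge or repeat).
    dialog_act: str
    predicates: List[Predicate] = field(default_factory=list)

# The MR of the first user utterance in Figure 1.
mr = MR("set_question", [
    Predicate("isA", [("id", "X1"), ("type", "restaurant")], focus=True),
    Predicate("def", [("id", "X1")]),
    Predicate("hasProperty", [("id", "X1"), ("property", "cuisine"), ("value", '"italian"')]),
    Predicate("distance", [("location", "@USER"), ("location", "X1"), ("value", "X2")]),
    Predicate("argmin", [("arg", "X1"), ("value", "X2")]),
])
```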
Dialog acts. The dialog acts are utterance-level labels which capture the overall function of the utterance in the dialog, for example whether an utterance is a question seeking a list as an answer, a statement of information, an acknowledgement, an instruction or a repetition request (set_question, inform, acknowledge, instruct and repeat in Figure 1). The focal point together with the act provides similar information to the intent annotation in ATIS (Tur et al., 2010). The acts defined in the proposed MRL follow the guidelines proposed by Allen and Core (1997), Stolcke et al. (2000) and Bunt et al. (2012).

The dialog acts are divided into two categories. The first category contains those that are accompanied by a set of predicates to represent the semantics of the sentence, such as set_question and inform. For these acts we denote their focal points, for example the piece of information requested in a set_question, with an asterisk (*) in front of the relevant predicate. The second category contains dialog acts that are not accompanied by predicates, such as acknowledge and repeat.

Predicates. The MRL contains predicates to denote entities, properties and their relations:

• Predicates introducing entities and their properties: isA, isNamed and hasProperty.

• Predicates describing user actions, such as walk and turn, with arguments such as direction and along_location.

• Predicates describing geographic relations, such as distance, route and position, using ref to denote relative positioning.

• Predicates denoting whether an entity is introduced using a definite article (def), an indefinite (indef) or an indexical (index).

• Predicates expressing numerical relations such as argmin and argmax.

Coreference. In order to model coreference we adopt the notion of discourse referents (DRs) and discourse entities (DEs) from Discourse Representation Theory (DRT) (Webber, 1978; Kamp and Reyle, 1993). DRs are referential expressions appearing in utterances which denote DEs, which are mental entities in the speaker's model of discourse. Multiple DEs can refer to the same real-world entity; for example, in Fig. 1 "vapiano's" refers to a different DE from the restaurant in the previous sentence ("the nearest italian"), even though they are likely to be the same real-world entity. We considered DEs instead of actual entities in the MRL because they allow us to capture the semantics of interactions such as the last exchange between the wizard and user. The MRL represents multiple DEs referring to the same real-world entity through the predicate equivalent.

Coreference is indicated by using identical variables across predicate arguments within an utterance or across utterances. The main principle in determining whether DRs corefer is that it must be possible to infer this from the dialog context alone, without using world knowledge.

3 Data Collection and Annotation

The NL utterances were collected using Wizard-of-Oz experiments (Kelley, 1983) with pairs of human subjects. In each experiment, one human pretended to be a tourist visiting Edinburgh (by physically walking around the city), while the other performed the role of the system responding through a suitable interface using a text-to-speech system. Each user-wizard pair was given one of two scenarios involving requests for directions to different points of interest. The first scenario involves seeking directions to the national museum of Scotland, then going to a nearby coffee shop, followed by a pub via a cash machine and finally looking for a park. The second scenario involves looking for a Japanese restaurant and the university gym, requesting information about the Flodden Wall monument, visiting the Scottish parliament and the Dynamic Earth science centre, and going to the Royal Mile and the Surgeon's Hall museum. Each experiment formed one dialog which was manually transcribed from recorded audio files. 17 dialogs were collected in total, 7 from the first scenario and 10 from the second. More details are reported in Hill et al. (2013).

Given the varied nature of the dialogs, some of the user requests were not within the scope of the system. Furthermore, the proposed MRL has its own limitations; for example it does not have predicates to express temporal relationships. Thus, it was necessary to filter the utterances collected and decide which ones to annotate with MRs.[2]

[2] A similar filtering process was used for GeoQuery (Section 7.5.1 in Zelle (1995)) and ATIS (the principles of interpretation document (/atis3/doc/pofi.doc) in the NIST CDs).
vocabulary type    number of terms
dialog acts        15
predicates         19
arguments          41
constants          9
entity types       26
properties         4

Table 1: MRL vocabulary used in the annotation

In particular, we did not annotate utterances falling into one or more of the following categories:

• Utterances that are not human-interpretable, e.g. utterances that were interrupted too early to be interpretable. In such cases, the system is likely to respond with a repetition request.

• Utterances that are human-interpretable but outside the scope of the system, e.g. questions about historical events which are not included in the database of the application considered.

• Utterances that are within the scope of the system but too complex to be represented by the proposed MRL, e.g. an utterance requiring representation of time to be interpreted.

Note that we still annotate an utterance if the core of its semantics can be captured by the MRL. For example, "take me to vapiano now!" would be annotated, even though the MRL cannot represent the meaning of "now". Broad information requests such as "tell me more about this church" are also annotated, using the predicate extraInfo(id:Xno). We argue that determining which utterances should be translated into MRs, and which should be ignored, is an important subtask for real-world applications of semantic parsing.

The annotation was performed by one of the authors and a freelance linguist with no experience in semantic parsing. As well as annotating the user utterances, we also annotated the wizard utterances with dialog acts and the entities mentioned, as they provide the necessary context to perform context-dependent interpretation. In practice, however, we expect this information to be used by a natural language generation system to produce the system's response and thus be available to the semantic parser.

The total number of user utterances annotated was 2,374, out of which 1,906 were annotated with MRs, the remainder not translated due to the reasons discussed earlier in this section. The number and types of the MRL vocabulary terms used appear in Tbl. 1. The annotated dialogs, the guidelines and the lists of the vocabulary terms are available from http://sites.google.com/site/andreasvlachos/resources.

In order to assess the quality of the guidelines and the annotation, we conducted an inter-annotator agreement study. For this purpose, the two annotators annotated one dialog consisting of 510 utterances. Exact match agreement at the utterance level, which requires that the MRs by the annotators agree on dialog act, predicates and within-utterance variable assignment, was 0.829, which is a strong result given the complexity of the annotation task, and which suggests that the proposed guidelines can be applied consistently. We also assessed the agreement on predicates using F-score, which was 0.914.

4 Comparison to Existing Corpora

The most closely related corpus to the one presented in this paper (herein SPACEBOOK) is the airline travel information system (ATIS) corpus (Dahl et al., 1994), which consists of dialogs between a user and a flight booking system collected in Wizard-of-Oz experiments. Each utterance is annotated with the SQL statement that would return the requested piece of information from the flights database. The utterance interpretation is context-dependent. For example, when the user follows up an initial flight request, e.g. "find me flights to Boston", with utterances containing additional preferences, e.g. "on Monday", the interpretation of the additional preferences extends the MR for the initial request.

Compared to ATIS, the dialogs in the SPACEBOOK corpus are substantially longer (8.8 utterances on average for ATIS vs. 139.7 for SPACEBOOK) and cover a broader domain due to the longer scenarios used in data collection. Furthermore, allowing the wizards to answer in natural language instead of restricting them to responding via database queries as in ATIS led to more varied dialogs. Finally, our approach to annotating coreference avoids repeating the MR of previous utterances, thus resulting in shorter expressions that are closer to the semantics of the NL utterances.
The datasets developed in the recent dialog state tracking challenge (Henderson et al., 2014) also consist of dialogs between a user and a tourist information system. However, the task is easier since only three entity types are considered (restaurant, coffee shop and pub), a slot-filling MRL is used, and the argument slots take values from fixed lists.

The abstract meaning representation (AMR) described by Banarescu et al. (2013) was developed to provide a semantic interpretation layer to improve machine translation (MT) systems. It has similar predicate-argument structure to the MRL proposed here, including a lack of cover for temporal relations and scoping. However, due to the different application domains (MT vs. tourism-related activities), there are some differences. Since MT systems operate at the sentence level, each sentence is interpreted in isolation in AMR, whilst our proposed MRL takes context into account. Also, AMR tries to account for all the words in a sentence, whilst our MRL only tries to capture the semantics of those words that are relevant to the application at hand.

Other popular semantic parsing corpora include GeoQuery (Zelle, 1995) and Free917 (Cai and Yates, 2013). Both consist exclusively of questions to be answered with a database query, the former considering a small American geography database and the latter the much wider Freebase database (Bollacker et al., 2008). Unlike SPACEBOOK and ATIS, there is no notion of context in either of these corpora. Furthermore, the NL utterances in these corpora are compiled to be interpreted as database queries, which is equivalent to only one of the dialog acts (set_question) in the SPACEBOOK corpus. Thus the latter allows the exploration of the application of dialog act tagging as a first step in semantic parsing. Finally, MacMahon et al. (2006) developed a corpus of natural language instructions paired with sequences of actions; however the domain is limited to simple navigation instructions and there is no notion of dialog in this corpus.

5 Semantic Parsing for the New Corpus

The MRL in Fig. 1 is readable and easy to annotate with. However, it is not ideal for experiments, as it is difficult to compare MR expressions beyond exact match. For these reasons, we converted the MR expressions into a node-argument form. In particular, all predicates introducing entities (isA) and most predicates introducing relations among entities (e.g. distance) become nodes, while all other predicates (e.g. isNamed, def) are converted into arguments. For example, the MR for the first utterance in Fig. 1 is converted into the form in Fig. 2g. Entities appearing in MR expressions without a type (e.g. X2 in the last utterance of Fig. 1) are denoted with a node of type empty. Each node has a unique id (e.g. X1) and each argument can take as value a constant (e.g. det), a node id, or a verbatim string extract from the utterance. Arguments that are absent (e.g. the name of the restaurant) are set to the constant null. This conversion results in 16 utterance-level labels (15 dialog acts plus one for the non-interpretable utterances), 35 node types and 32 arguments.

The comparison between a predicted and a gold standard node-argument form is performed in three stages. First we map the ids of the predicted nodes to those of the gold standard. While ids do not carry any semantics, they are needed to differentiate between multiple nodes of the same type; e.g. if a second restaurant had been predicted in Fig. 2h then it would have a different id and would not be matched to a gold standard node. Second, we decompose the node-argument forms into a set of atomic predictions (Fig. 2h). This decomposition allows the awarding of partial credit, e.g. when the node type is correct but some of the arguments are not. Using these atomic predictions we calculate precision, recall and F-score.

The mapping between predicted and gold standard ids is performed by evaluating all mappings (with mappings between nodes of different types not allowed), and choosing the one resulting in the lowest sum of false positives and negatives.
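The Python sketch below illustrates this evaluation on a toy example. It is a simplified paraphrase of the procedure just described, not the scorer used for the reported results: it brute-forces the id mapping and omits the qualified node-valued arguments shown in Fig. 2h; the second location argument is renamed location2 only to keep the toy dictionary keys unique.

```python
from itertools import permutations

def atoms(form):
    # form: node id -> (node type, {argument name: value}).
    # One atomic prediction per node plus one per (node, argument) pair.
    out = set()
    for nid, (ntype, args) in form.items():
        out.add((nid, ntype))
        for name, value in args.items():
            out.add((nid, ntype, name, value))
    return out

def score(pred, gold):
    # Brute-force search over type-respecting mappings of predicted ids onto gold
    # ids, keeping the one with the lowest sum of false positives and negatives;
    # utterances contain only a handful of nodes, so this is cheap.
    gold_atoms = atoms(gold)
    candidates = list(gold.keys()) + [None] * len(pred)   # None = left unmatched
    best = None
    for choice in permutations(candidates, len(pred)):
        mapping = dict(zip(pred.keys(), choice))
        if any(g is not None and pred[p][0] != gold[g][0] for p, g in mapping.items()):
            continue  # nodes of different types may not be mapped to each other
        ren = {p: (g if g is not None else "UNMATCHED_%s" % p) for p, g in mapping.items()}
        renamed = {ren[p]: (t, {a: ren.get(v, v) for a, v in args.items()})
                   for p, (t, args) in pred.items()}
        pred_atoms = atoms(renamed)
        fp, fn = len(pred_atoms - gold_atoms), len(gold_atoms - pred_atoms)
        tp = len(pred_atoms & gold_atoms)
        if best is None or fp + fn < best[0]:
            best = (fp + fn, tp, len(pred_atoms), len(gold_atoms))
    _, tp, n_pred, n_gold = best
    prec = tp / n_pred if n_pred else 0.0
    rec = tp / n_gold if n_gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# A gold form in the spirit of Fig. 2g against a prediction that missed the
# cuisine argument and used different ids for its nodes.
gold = {"X1": ("restaurant", {"number": "singular", "det": "def", "cuisine": '"italian"'}),
        "X2": ("distance", {"location": "USER", "location2": "X1"})}
pred = {"X7": ("restaurant", {"number": "singular", "det": "def"}),
        "X8": ("distance", {"location": "USER", "location2": "X7"})}
print(score(pred, gold))  # partial credit: perfect precision, one missed atom
```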
5.1 Task decomposition

Fig. 2 shows the decomposition of the semantic parsing task into stages, which are described below.

Dialog act prediction. We first assign an utterance-level label using a classifier that exploits features based on the textual content of the utterance and on the utterance preceding it.
what’sthenearestitalianforameal?SETQUESTION(A)Dialogactpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurant(乙)Nodepredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetUSERlocation(C)Constantargumentpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetcuisineUSERlocationOUTOUTOUTOUTINOUTOUTOUT(d)Stringargumentpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetcuisineUSERlocationargminlocation(e)Nodeargumentpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetcuisineUSERlocationargminlocationfocus(F)Focus/negationpredictionSETQUESTIONX1:餐厅(编号:singular,这:def,cuisine:“italian”)X2:distance(地点:USER,地点:X1,argmin:X1)focus:X1(G)Node-argumentformdialogAct:SETQUESTIONX1:restaurantX1:餐厅(编号:singular)X1:餐厅(这:def)X1:餐厅(cuisine:“italian”)X2:distanceX2:distance(地点:USER)X2:distance(地点:X1-restaurant(编号:singular,这:def))X2:distance(argmin:X1)focus:X1-restaurant(编号:singular,这:def)(H)AtomicpredictionsFigure2:Semanticparsingdecomposition.bigramsandtrigramsandthefinalpunctuationmark.Unlikeintypicaltextclassificationtasks,contentwordsarenotalwayshelpfulindialogacttagging;e.g.thetoken“meal”inFig.2aisnotindicativeofset_question,whilen-gramsofwordstypicallyconsideredasstopwords,suchas“what’sthe”,canbemorehelpful.Ifthedialogactpredictedistobeaccompaniedbyotherpredicatesaccordingtotheguidelines(Sec.2)weproceedtothefollowingstages,otherwisestop.Thefeaturesbasedontheprecedingutterancein-dicatewhetheritwasbytheuserorthewizardand,inthelattercase,itsdialogact.Suchfeaturesareusefulindeterminingtheactofshort,ambiguousutterancessuchas“yes”,whichistaggedasyeswhenfollowingaprop_questionutterance,butasacknowledgeotherwise.
Node prediction. In node prediction we use a classifier to predict whether each of the tokens in the utterance denotes a node of a particular type or empty (Fig. 2b). The features used include the target token and its lemma, which are conjoined with the PoS tag, the previous and following tokens, as well as the lemmas of the tokens with which it has syntactic dependencies. Further features represent the dialog act (e.g. route is more likely to appear in a set question utterance), and the number and types of the nodes already predicted. Since the evaluation ignores the alignment between nodes and tokens, it would have been correct to predict the correct nodes from any token; e.g. restaurant could be predicted from "italian" instead. However, alignment does affect argument prediction, since it determines its feature extraction.

Constant argument prediction. In this stage (Fig. 2c) we predict, for each argument of each node, whether its value is an MRL vocabulary term, a verbatim string extract, a node, or absent (special values STRING, NODE and null respectively). If the value predicted is STRING or NODE, it is replaced by the predictions in subsequent stages. For each argument different values are possible; thus we use separate classifiers for each, resulting in 32 classifiers. The features used include the node type, the token that predicted the node, and the syntactic dependency paths from that token to all other tokens in the utterance. We also include as features the values predicted for other arguments of the node, the dialog act, and the other node types predicted.

String argument prediction. For each argument predicted to be STRING (e.g. cuisine in Figure 2d), we predict for each token in left-to-right order whether it should be part of the value for this argument or not (IN or OUT). Since the strings that are appropriate for each argument differ (e.g. the strings for cuisine are unlikely to be appropriate for name), we use separate classifiers for each of them, resulting in five classifiers. The features used include the target token and its lemma, its conjunction with the PoS tag, the previous and following tokens, and the lemmas of the tokens with which it has syntactic dependencies. We also added the label assigned to the previous token and the syntactic dependency path to the token that predicted the node.

Node argument prediction. For each argument predicted to have NODE as its value, we predict for every other node whether it should be the value or not (e.g. argmin in Fig. 2e). As with the string argument prediction, we use separate binary classifiers for each argument, resulting in 18 classifiers. The features extracted are similar to that stage, but we now consider the tokens that predicted each candidate argument node (e.g. "meal" for restaurant) instead of the tokens in the utterance.

Focus/negation prediction. We predict whether each node should be focused or negated as two separate binary tasks. The features used include the token that predicted the target node, its lemma and PoS tag, and the syntactic dependency paths to all other tokens in the utterance. Further features include the type of the node and its arguments.
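The sketch below shows how the stages above could be composed into a single greedy pass. It is a structural outline only: the clf object is a hypothetical bundle standing in for the per-stage classifiers, and the set of predicate-free acts shown is an illustrative subset.

```python
# Dialog acts that the guidelines specify carry no predicates (illustrative subset).
ACTS_WITHOUT_PREDICATES = {"acknowledge", "repeat"}

def parse(tokens, prev_utt, clf):
    # Greedy staged prediction: each stage conditions on everything predicted so far.
    act = clf.dialog_act(tokens, prev_utt)
    if act in ACTS_WITHOUT_PREDICATES:
        return {"dialogAct": act}
    nodes = {}                                    # node id -> (node type, anchor token index)
    for i, _tok in enumerate(tokens):             # one node decision per token
        ntype = clf.node(i, tokens, act, nodes)
        if ntype != "empty":
            nodes["X%d" % (len(nodes) + 1)] = (ntype, i)
    args = {}                                     # (node id, argument name) -> value
    for nid, (ntype, i) in nodes.items():
        for arg in clf.arguments_of(ntype):
            val = clf.constant_arg(nid, arg, i, tokens, act, nodes, args)
            if val == "STRING":                   # replaced by a verbatim extract
                val = clf.string_arg(nid, arg, i, tokens)
            elif val == "NODE":                   # replaced by a pointer to another node
                val = clf.node_arg(nid, arg, nodes)
            args[(nid, arg)] = val
    focus = {nid for nid in nodes if clf.focus(nid, nodes, args, tokens)}
    negated = {nid for nid in nodes if clf.negation(nid, nodes, args, tokens)}
    return {"dialogAct": act, "nodes": nodes, "args": args,
            "focus": focus, "negated": negated}
```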
6 Imitation Learning

In order to learn the classifiers for the task decomposition described, two challenges must be addressed. The first is the complexity of the structure to be predicted. The task involves many interdependent predictions made by a variety of classifiers, and thus cannot be tackled by approaches that assume a particular type of graph structure, or restrict structure feature extraction in order to perform efficient dynamic programming. The second challenge is the lack of alignment information during training.

Imitation learning algorithms such as SEARN (Daumé III et al., 2009) and DAGGER (Ross et al., 2011) have been applied successfully to a variety of structured prediction tasks including summarization, biomedical event extraction and dynamic feature selection (Daumé III et al., 2009; Vlachos, 2012; He et al., 2013), thanks to their ability to handle complex output spaces without exhaustive search and their flexibility in incorporating features based on the structured output. In this work we focus on DAGGER and extend it to handle the missing alignments.

6.1 Structured prediction with DAGGER

The dataset aggregation (DAGGER) algorithm (Ross et al., 2011) forms the prediction of an instance s as a sequence of T actions ŷ_{1:T} predicted by a learned policy which consists of one or more classifiers.
Algorithm 1: Imitation learning with DAGGER
Input: training instances S, expert policy π*, loss function ℓ, learning rate β, CSC learner CSCL
Output: learned policy H_N
1:  CSC examples E = ∅
2:  for i = 1 to N do
3:      p = (1 − β)^(i−1)
4:      current policy π = p·π* + (1 − p)·H_{i−1}
5:      for s in S do
6:          predict π(s) = ŷ_{1:T}
7:          for ŷ_t in π(s) do
8:              extract features Φ_t = f(s, ŷ_{1:t−1})
9:              for each possible action y^j_t do
10:                 predict y′_{t+1:T} = π(s; ŷ_{1:t−1}, y^j_t)
11:                 assess c^j_t = ℓ(ŷ_{1:t−1}, y^j_t, y′_{t+1:T})
12:             add (Φ_t, c_t) to E
13:     learn H_i = CSCL(E)

These actions are taken in a greedy fashion, i.e. once an action has been taken it cannot be changed. During training, DAGGER converts the problem of learning how to predict these sequences of actions into cost-sensitive classification (CSC) learning. In CSC learning each training example has a vector of misclassification costs associated with it, thus rendering some mistakes on some examples more expensive than others (Domingos, 1999).

Algorithm 1 presents the training procedure. DAGGER requires a set of labeled training instances S and a loss function ℓ that compares complete outputs for instances in S against the gold standard. In addition, an expert policy π* must be specified, which is an oracle that returns the optimal action for the instances in S, akin to an expert demonstrating the task. π* is typically derived from the gold standard; e.g. in part-of-speech tagging π* would return the correct tag for each token. In addition, the learning rate β and a CSC learner (CSCL) must be provided. The algorithm outputs a learned policy H_N that, unlike π*, can generalize to unseen data.

Each training iteration begins by setting the probability p (line 3) of using π* in the current policy π. In the first iteration, only π* is used but, in later iterations, π becomes stochastic and, for each action, π* is used with probability p, and the learned policy from the previous iteration H_{i−1} with probability 1 − p (line 4). Then π is used to predict each training instance s (line 6). For each action ŷ_t, a CSC example is generated (lines 7-12). The features Φ_t are extracted from s and all previous actions ŷ_{1:t−1} (line 8). The cost for each possible action y^j_t is estimated by predicting the remaining actions y′_{t+1:T} for s using π (line 10) and calculating the loss incurred given y^j_t w.r.t. the gold standard for s using ℓ (line 11). As π is stochastic, it is common to use multiple samples of y′_{t+1:T} to assess the cost of each action y^j_t by repeating lines 10-11. The features, together with the costs for each possible action, form a CSC example (Φ_t, c_t) (line 12). At the end of each iteration, the CSC examples obtained from all iterations are used by the CSC learning algorithm to learn the classifier(s) for H_i (line 13).

When predicting the training instances (line 6), and when estimating the costs for each possible action (lines 10-11), the policy learned in the previous iteration H_{i−1} is used as part of π after the first iteration. Thus the CSC examples generated to learn H_i depend on the predictions of H_{i−1} and, by gradually increasing the use of H_{i−1} and ignoring π* in π, the learned policies are adjusted to their own predictions, thus learning the dependencies among the actions and how to predict them in order to minimize the loss. The learning rate β determines how fast π moves away from π*. The use of H_{i−1} in predicting the training instances (line 6) also has the effect of exploring sub-optimal actions so that the learned policies are adjusted to recover from their mistakes. Finally, note that if only one training iteration is performed, the learned policy is equivalent to a set of independently trained classifiers, since no training against the predictions of the previously learned policy takes place.
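The following Python sketch is a compact rendering of Algorithm 1, with the instance representation, expert policy, candidate actions, features, loss and CSC learner all supplied by the caller; the default parameter values mirror the settings reported in Sec. 7. It is illustrative only and is not the implementation released with this paper.

```python
import random

def dagger(instances, expert_policy, actions, finished, features, loss, csc_learn,
           n_iter=12, beta=0.3, n_samples=3):
    # Aggregated cost-sensitive classification examples: (features, {action: cost}).
    examples = []
    learned = None                               # H_{i-1}, a callable (s, prefix) -> action
    for i in range(1, n_iter + 1):
        p = (1 - beta) ** (i - 1)

        def policy(s, prefix):
            # Mixture policy pi: expert with probability p, otherwise H_{i-1}.
            if learned is None or random.random() < p:
                return expert_policy(s, prefix)
            return learned(s, prefix)

        def rollout(s, prefix):
            # Greedily complete the action sequence for s from the given prefix.
            seq = list(prefix)
            while not finished(s, seq):
                seq.append(policy(s, seq))
            return seq

        for s in instances:
            predicted = rollout(s, [])                               # line 6
            for t in range(len(predicted)):
                prefix = predicted[:t]
                feats = features(s, prefix)                          # line 8
                costs = {}
                for a in actions(s, prefix):                         # lines 9-11
                    rolls = [rollout(s, prefix + [a]) for _ in range(n_samples)]
                    costs[a] = sum(loss(s, r) for r in rolls) / n_samples
                examples.append((feats, costs))                      # line 12
        learned = csc_learn(examples)                                # line 13
    return learned
```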
6.2 Training with missing alignments

The loss function ℓ in DAGGER is only used to compare complete outputs against the gold standard. Therefore, when generating a CSC training example in DAGGER (lines 7-12), we do not need to know whether an action y^j_t is correct or not; we only evaluate what the effect of y^j_t is on the loss incurred by the complete action sequence. Thus, the loss function does not need to decompose over the actions taken in order to evaluate them.

The ability to train against non-decomposable loss functions is useful when the training data has missing labels, as is the case with semantic parsing. Following Sec. 5, ℓ is defined as the sum of the false positive and false negative atomic predictions used to calculate precision and recall and, since it ignores the alignment between tokens and nodes, it cannot assess node prediction actions. However, we can use it under DAGGER to learn a node prediction classifier together with the classifiers of the other stages.

The only component of DAGGER which assumes knowledge of the correct actions for training is the expert policy π*. Since these are not available for the node prediction stage, we replace π* with a randomized expert policy π_rand, in which actions that are not specified by the annotation are chosen randomly from a set of equally optimal ones. For example, in Fig. 2b when predicting the action for each token, π_rand chooses randomly among null, distance, and restaurant, so that by the end of the stage the correct nodes have been predicted. Randomizing this choice helps explore the actions available. In our experiments we placed a uniform distribution over the available actions, i.e. all optimal actions are equally likely to be chosen. The actions returned by π_rand will often result in alignments that do not incur any loss but are nonsensical, e.g. predicting restaurant from "what". However, since π_rand is progressively ignored, the effect of such actions is reduced.

While being able to learn a semantic parser without alignment information is useful, it would help to use some supervision, e.g. that "street" commonly predicts the node street. We incorporate such an alignment dictionary in π_rand as follows: if the target token is mapped to a node type in the dictionary, and if a node of this type needs to be predicted for the utterance, then this type is returned. Otherwise, the prediction is made with π_rand. Finally, like π_rand itself, the dictionary is progressively ignored and neither constrains the training process, nor is used during testing.
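One way such a randomized expert could be realized for the node prediction stage is sketched below. The function name, the constraint governing when the empty action may be chosen, and the dictionary hook are illustrative assumptions made for this sketch rather than a description of the exact policy used.

```python
import random

def random_expert_node_action(token, tokens_left, gold_node_types, predicted_so_far,
                              dictionary=None):
    # Expert action for one token of the node prediction stage when no alignment is
    # annotated: any gold node type not yet predicted is an equally optimal choice.
    remaining = list(gold_node_types)
    for t in predicted_so_far:            # drop types that have already been produced
        if t in remaining:
            remaining.remove(t)
    # Optional supervision from an alignment dictionary (token -> node type).
    if dictionary and dictionary.get(token) in remaining:
        return dictionary[token]
    options = list(remaining)
    if tokens_left > len(remaining):      # allow 'empty' only if the rest still fit in
        options.append("empty")
    return random.choice(options) if options else "empty"

# For Fig. 2b, with gold nodes {restaurant, distance} and nothing predicted yet:
print(random_expert_node_action("nearest", tokens_left=5,
                                gold_node_types=["restaurant", "distance"],
                                predicted_so_far=[]))
```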
7 Experiments

We split the annotated dialogs into training and test sets. The former consists of four dialogs from the first scenario and seven from the second, and the latter of three dialogs from each scenario. All development and feature engineering was conducted using cross-validation on the training set, at the dialog level rather than the utterance level (therefore resulting in as many folds as dialogs in the training set), to ensure that each fold contains utterances from all parts of the scenario from which the dialog is taken.

To perform cost-sensitive classification learning we used the adaptive regularization of weight vectors (AROW) algorithm (Crammer et al., 2009). AROW is an online algorithm for linear predictors that adjusts the per-feature learning rates so that popular features do not overshadow rare but useful ones. Given the task decomposition, each learned hypothesis consists of 59 classifiers. We restricted the prediction of nodes to content words since function words are unlikely to provide useful alignments. All preprocessing was performed using the Stanford CoreNLP toolkit (Manning et al., 2014). The implementation of the semantic parser is available from http://sites.google.com/site/andreasvlachos/resources. The DAGGER parameters were set to 12 training iterations, β = 0.3 and 3 samples for action cost assessment.

We compared our DAGGER-based imitation learning approach (henceforth Imit) against independently trained classifiers using the same classification learner and features (henceforth Indep). For both systems we incorporated an alignment dictionary (+align versions) as described in Sec. 6.2, in order to improve node prediction performance. The dictionary was extracted from the training data and contains 96 tokens that commonly predict a particular node type.

The results from the cross-validation experiments are reported in Tbl. 2. Overall performance evaluated as described in Sec. 5 was 53.6 points in F-score for Imit, 5.7 points higher than Indep, and the difference is greater for the +align versions. These results demonstrate the advantages of training classifiers using imitation learning versus independently trained classifiers. Isolating the performance for the node and argument prediction stages, we observe that the main bottleneck is the former, which in the case of Imit is 60.9 points in F-score compared to 78.8 for the latter. Accuracy for dialog acts is 78.9%.
                        Imit             Imit+align       Indep            Indep+align
exact match (accuracy)  58.4%            59.1%            56%              55.9%
dialog act (accuracy)   78.9%            79.3%            78.8%            79%
node (Rec/Prec/F)       72.3/52.6/60.9   76.1/59.8/66.9   44.4/61.6/51.6   53.3/64/58.1
arguments (Rec/Prec/F)  77.6/80/78.8     79.6/83/81.3     74.1/67.2/70.1   78.2/66.3/71.8
focus (Rec/Prec/F)      81.8/87.2/84.4   84.4/86.7/85.5   85.9/87/86.5     86.8/8.3/84.7
overall (Rec/Prec/F)    59.3/48.9/53.6   62.2/54.4/59.1   45.3/50.8/47.9   50/50.1/50.1

Table 2: Performances using 11-fold cross-validation on the training set.

As shown in Tbl. 2, the alignment dictionary improved not only node prediction performance, by 6 points in F-score, but also argument prediction, by 2.5 points, thus demonstrating the benefits of learning the alignments together with the other components of the semantic parser. The overall performance improved by 5.5 points in F-score. Finally, we ran an experiment with oracle node prediction and found that the overall performance using cross-validation on the training data improved to 88.2 and 79.9 points in F-score for the Imit+align and Indep+align systems. This is in agreement with the results presented by Flanigan et al. (2014) on developing a semantic parser for the AMR formalism, who also argue that node prediction is the main performance bottleneck.

Tbl. 3 gives results on the test set. The overall performance for Imit is 48.4 F-score and 47.9% for exact match. As in the cross-validation results on the training data, training with imitation learning improved upon independently trained classifiers. The performance was improved further using the alignment dictionary, reaching 53.5 points in F-score and 49.1% exact match accuracy.

In the experimental setup above, dialogs from the same scenarios appear in both training and testing. While this is a reasonable evaluation approach also followed in ATIS evaluations, it is likely to be relatively forgiving; in practice, semantic parsers are likely to encounter entities, activities, etc. unseen in training. Hence we conducted a second evaluation in which dialogs from one scenario are used to train a parser evaluated on the other (still respecting the train/test split from before). When testing on the dialogs from the first scenario and training on the dialogs from the second, the overall performance using Imit+align was 36.9 points in F-score, while in the reverse experiment it was 41.7. Note that direct comparisons against the performances in Tbl. 3 are not meaningful since fewer dialogs are being used for training and testing in the cross-scenario setup.

8 Comparison with Related Work

Previous work on semantic parsing handled the lack of alignments during training in a variety of ways. Zettlemoyer and Collins (2009) manually engineered a CCG lexicon for the ATIS corpus. Kwiatkowski et al. (2011) used a dedicated algorithm to infer a similar dictionary and used alignments from Giza++ (Och and Ney, 2000) to initialize the relevant features. Most recent work on GeoQuery uses an alignment dictionary that includes for each geographical entity all noun phrases referring to it (Jones et al., 2012). More recently, Flanigan et al. (2014) developed a dedicated alignment model on top of which they learned a semantic parser for the AMR formalism. In our approach, we learn the alignments together with the semantic parser without requiring a dictionary.

In terms of structured prediction frameworks, most previous work uses hidden variable linear (Zettlemoyer and Collins, 2007) or log-linear (Liang et al., 2011) models with beam search. In terms of direct comparisons with existing work, the goal of this paper is to introduce the new corpus and provide a competitive first attempt at the new semantic parsing task. However, we believe it is non-trivial to apply existing approaches to the new task, since, assuming a decomposition similar to that of Sec. 5.1, exhaustive search would be too expensive, and applying vanilla beam search would be difficult since different predictions result in beams of (sometimes radically) different lengths that are not comparable.
                        Imit             Imit+align       Indep            Indep+align
exact match (accuracy)  47.9%            49.1%            47.6%            46.1%
dialog act (accuracy)   77%              80.5%            79.8%            79.5%
node (Rec/Prec/F)       68.7/45.7/54.8   75.5/51.7/61.4   41.9/61.1/49.7   54/64.9/58.9
arguments (Rec/Prec/F)  73.9/73.7/73.8   76.8/77.3/77.1   69.5/61.3/65.1   77.3/63.6/69.8
focus (Rec/Prec/F)      87.1/80.7/83.8   86/81.2/83.6     81.6/73.4/77.3   90.6/76.8/83.1
overall (Rec/Prec/F)    56.6/42.3/48.4   63.5/46.2/53.5   41.2/47.8/44.3   50/47.4/48.7

Table 3: Performances on the test set.

We have attempted applying the MT-based semantic parsing approach proposed by Andreas et al. (2013) to our dataset, but in initial experiments the performance was poor. The main reason for this is that, unlike GeoQuery, the proposed MRL does not align well with English.

The expert policy in DAGGER is a generalization of the dynamic oracle of Goldberg and Nivre (2013) for shift-reduce dependency parsing to any structured prediction task decomposed into a sequence of actions. The randomized expert policy proposed extends DAGGER to learn not only how to avoid error propagation, but also how to infer latent variables.

The main bottleneck is training data sparsity. Some node types appear only a few times in relatively long utterances, and thus it is difficult to infer appropriate alignments for them. Unlike machine translation between natural languages, it is unrealistic to expect large quantities of utterances to be annotated with MR expressions. An appealing alternative would be to use response-based learning, i.e. use the response from the system instead of MR expressions as the training signal (Liang et al., 2011; Kwiatkowski et al., 2013; Berant and Liang, 2014). However such an approach would not be straightforward to implement in our application, since the response from the system is not always the result of a database query but, e.g., a navigation instruction that is context-dependent and whose correctness is thus difficult to assess. Furthermore, it would require the development of a user simulator (Keizer et al., 2012), a non-trivial task which is beyond the scope of this work. A different approach is to use dialogs between a system and its users as proposed by Artzi and Zettlemoyer (2011) using the DARPA Communicator corpus (Walker et al., 2002). However, in that work utterances were selected to be shorter than 6 words and to include one noun phrase present in the lexicon used during learning, while ignoring short but common phrases such as "yes" and "no"; thus it is unclear whether it would be applicable to our dataset.

Finally, dialog context is only taken into account in predicting the dialog act for each utterance. Even though our corpus contains coreference information, we did not attempt this task as it is difficult to evaluate and our performance on node prediction, on which it relies, is relatively low. We leave coreference resolution on the new corpus as an interesting and challenging task for future work.

9 Conclusions

In this paper we presented a new corpus for context-dependent semantic parsing in the context of a portable, interactive navigation and exploration system for tourism-related activities. The MRL used for the annotation can handle dialog context such as coreference and can accommodate utterances that are not interpretable according to a database. We conducted an inter-annotator agreement study and found 0.829 exact match agreement.

We also developed a semantic parser for the SPACEBOOK corpus using the imitation learning algorithm DAGGER that, unlike previous approaches, can infer the missing alignments in the training data using a randomized expert policy. In experiments using the new corpus we found that training with imitation learning substantially improves performance compared to independently trained classifiers. Finally, we showed how to improve performance further by incorporating an alignment dictionary.

Acknowledgements

The research reported was conducted while the first author was at the University of Cambridge and funded by the European Community's Seventh Framework Programme (FP7/2007-2013)
under grant agreement no. 270019 (SPACEBOOK project, www.spacebook-project.eu). The authors would like to acknowledge the work of Diane Nicholls in the annotation; the efforts of Robin Hill in collecting the dialogs from Wizard-of-Oz experiments; and Tim Vieira for helpful comments on an earlier version of this manuscript.

References

James Allen and Mark Core. 1997. Dialogue act markup in several layers. Technical report, University of Rochester.

Jacob Andreas, Andreas Vlachos, and Stephen Clark. 2013. Semantic parsing as machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (short papers).

Yoav Artzi and Luke Zettlemoyer. 2011. Bootstrapping semantic parsers from conversations. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 421–432, Edinburgh, UK.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria, August. Association for Computational Linguistics.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.

Harry Bunt, Jan Alexandersson, Jae-Woong Choe, Alex Chengyu Fang, Koiti Hasida, Volha Petukhova, Andrei Popescu-Belis, and David Traum. 2012. ISO 24617-2: A semantically-based standard for dialogue annotation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey.

Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics.

Ann Copestake, Dan Flickinger, Ivan Sag, and Carl Pollard. 2005. Minimal recursion semantics: An introduction. Research in Language and Computation, 3(2–3):281–332.

Koby Crammer, Alex Kulesza, and Mark Dredze. 2009. Adaptive regularization of weight vectors. In Advances in Neural Information Processing Systems 22, pages 414–422.

Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, and Elizabeth Shriberg. 1994. Expanding the scope of the ATIS task: the ATIS-3 corpus. In Proceedings of the Workshop on Human Language Technology, pages 43–48, Plainsboro, New Jersey.

Hal Daumé III, John Langford, and Daniel Marcu. 2009. Search-based structured prediction. Machine Learning, 75:297–325.

Pedro Domingos. 1999. MetaCost: a general method for making classifiers cost-sensitive. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, pages 155–164. Association for Computing Machinery.

Jeffrey Flanigan, Sam Thomson, Jaime Carbonell, Chris Dyer, and Noah A. Smith. 2014. A discriminative graph-based parser for the abstract meaning representation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1426–1436, Baltimore, Maryland, June. Association for Computational Linguistics.

Yoav Goldberg and Joakim Nivre. 2013. Training deterministic parsers with non-deterministic oracles. Transactions of the Association for Computational Linguistics, 3(1):403–414, October.

He He, Hal Daumé III, and Jason Eisner. 2013. Dynamic feature selection for dependency parsing. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1455–1464, Seattle, October.

Matthew Henderson, Blaise Thomson, and Jason Williams. 2014. The Third Dialog State Tracking Challenge. In Proceedings of IEEE Spoken Language Technology.

Robin Hill, Jana Götze, and Bonnie Webber. 2013. SpaceBook Project: Final Data Release, Wizard-of-Oz (WoZ) experiments. Technical report, University of Edinburgh.

Srinivasan Janarthanam, Oliver Lemon, Phil Bartie, Tiphaine Dalmas, Anna Dickinson, Xingkun Liu, William Mackaness, and Bonnie Webber. 2013. Evaluating a city exploration dialogue system with integrated question-answering and pedestrian navigation.
In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1660–1668, Sofia, Bulgaria, August. Association for Computational Linguistics.

Bevan Keeley Jones, Mark Johnson, and Sharon Goldwater. 2012. Semantic parsing with Bayesian tree transducers. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 488–496.

Hans Kamp and Uwe Reyle. 1993. From Discourse to Logic. Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht.

Simon Keizer, Stéphane Rossignol, Senthilkumar Chandramohan, and Olivier Pietquin. 2012. User simulation in the development of statistical spoken dialogue systems. In Oliver Lemon and Olivier Pietquin, editors, Data-Driven Methods for Adaptive Spoken Dialogue Systems, pages 39–73. Springer New York.

John F. Kelley. 1983. An empirical methodology for writing user-friendly natural language computer applications. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 193–196.

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical generalization in CCG grammar induction for semantic parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1512–1523, Edinburgh, UK.

Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1545–1556, Seattle, WA.

Percy Liang, Michael Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 590–599, Portland, Oregon.

Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: connecting language, knowledge, and action in route instructions. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 1475–1482. AAAI Press.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.

Franz Josef Och and Hermann Ney. 2000. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 440–447, Hong Kong, China.

Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In 14th International Conference on Artificial Intelligence and Statistics, pages 627–635.

Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. 2000. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373.

Gokhan Tur, Dilek Hakkani-Tür, and Larry Heck. 2010. What's left to be understood in ATIS? In IEEE Workshop on Spoken Language Technologies.

Andreas Vlachos. 2012. An investigation of imitation learning algorithms for structured prediction. Journal of Machine Learning Research Workshop and Conference Proceedings, Proceedings of the 10th European Workshop on Reinforcement Learning, 24:143–154.

Marilyn A. Walker, Alexander I. Rudnicky, Rashmi Prasad, John S. Aberdeen, Elizabeth Owen Bratt, John S. Garofolo, Helen Wright Hastie, Audrey N. Le, Bryan L. Pellom, Alexandros Potamianos, Rebecca J. Passonneau, Salim Roukos, Gregory A. Sanders, Stephanie Seneff, and David Stallard. 2002. DARPA Communicator: cross-system results for the 2001 evaluation. In Proceedings of the 7th International Conference on Spoken Language Processing.

Bonnie Lynn Webber. 1978. A Formal Approach to Discourse Anaphora. Ph.D. thesis, Harvard University.

John M. Zelle. 1995. Using Inductive Logic Programming to Automate the Construction of Natural Language Parsers. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin.

Luke S. Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 678–687.

Luke S. Zettlemoyer and Michael Collins. 2009. Learning context-dependent mappings from sentences to logical form. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pages 976–984, Singapore.