Transactions of the Association for Computational Linguistics, vol. 2, pp. 547–559, 2014. Action Editors: Sharon Goldwater, Alexander Koller.
Submission batch: 3/2014; Revision batch: 8/2014; Published 12/2014.
© 2014 Association for Computational Linguistics.
A New Corpus and Imitation Learning Framework for Context-Dependent Semantic Parsing

Andreas Vlachos, Computer Science Department, University College London, a.vlachos@cs.ucl.ac.uk
Stephen Clark, Computer Laboratory, University of Cambridge, sc609@cam.ac.uk

Abstract

Semantic parsing is the task of translating natural language utterances into a machine-interpretable meaning representation. Most approaches to this task have been evaluated on a small number of existing corpora which assume that all utterances must be interpreted according to a database and typically ignore context. In this paper we present a new, publicly available corpus for context-dependent semantic parsing. The MRL used for the annotation was designed to support a portable, interactive tourist information system. We develop a semantic parser for this corpus by adapting the imitation learning algorithm DAGGER without requiring alignment information during training. DAGGER improves upon independently trained classifiers by 9.0 and 4.8 points in F-score on the development and test sets respectively.

1 Introduction

Semantic parsing is the task of translating natural language utterances into a machine-interpretable meaning representation (MR). Progress in semantic parsing has been facilitated by the existence of corpora containing utterances annotated with MRs, the most commonly used being ATIS (Dahl et al., 1994) and GeoQuery (Zelle, 1995). As these corpora cover rather narrow application domains, recent work has developed corpora to support natural language interfaces to the Freebase database (Cai and Yates, 2013), as well as the development of MT systems (Banarescu et al., 2013).

However, these existing corpora have some important limitations. The MRs accompanying the utterances are typically restricted to some form of database query. Furthermore, in most cases each utterance is interpreted in isolation; thus utterances that use coreference or whose semantics are context-dependent are typically ignored. In this paper we present a new corpus for context-dependent semantic parsing to support the development of an interactive navigation and exploration system for tourism-related activities. The new corpus was annotated with MRs that can handle dialog context such as coreference and can accommodate utterances that are not interpretable according to a database, e.g. repetition requests. The utterances were collected in experiments with human subjects, and contain phenomena such as ellipsis and disfluency. We developed guidelines and annotated 17 dialogs containing 2,374 utterances, with 82.9% exact match agreement between two annotators.

We also develop a semantic parser for this corpus. As the output MRs are rather complex, instead of adopting an approach that searches the output space exhaustively, we use the imitation learning algorithm DAGGER (Ross et al., 2011), which converts learning a structured prediction model into learning a set of classification models. We take advantage of its ability to learn with non-decomposable loss functions and extend it to handle the absence of alignment information during training by developing a randomized expert policy. Our approach improves upon independently trained classifiers by 9.0 and 4.8 F-score on the development and test sets.

2 Meaning Representation Language

Our proposed MR language (MRL) was designed in the context of the portable, interactive
navigation and exploration system of Janarthanam et al. (2013), through which users can obtain information about places and objects of interest, such as monuments and restaurants, as well as directions (see the dialog in Fig. 1). The system is aware of the position of the user (through the use of GPS technology) and is designed to be interactive; hence it can initiate the dialog by offering information on nearby points of interest and correcting the route taken by the user if needed. The MRs returned by the semantic parser must represent the user utterances adequately so that the system can generate the appropriate response. The system was developed in the context of the SPACEBOOK project.[1]

The MRL uses a flat syntax composed of elementary predications, based loosely on minimal recursion semantics (Copestake et al., 2005), but without an explicit treatment of scope. Each MR consists of a dialog act representing the overall function of the utterance, followed for some dialog acts by an unordered set of predicates. All predicates are implicitly conjoined and the names of their arguments are specified, to improve readability and to allow for some of the arguments to be optional. The argument values can be either constants from the controlled vocabulary, verbatim string extracts from the utterance (enclosed in quotes) or variables (Xno). Negation is denoted by a tilde (~) in front of predicates. The variables are used to bind together the arguments of different predicates within an utterance, as well as to denote coreference across utterances.

The goals in designing the MRL were to remain close to existing semantic formalisms, whilst at the same time producing an MRL that is particularly suited to the application at hand (Janarthanam et al., 2013). We also wanted an MRL that could be computed with efficiently and accurately, given the nature of the NL input. Hence we developed an MRL that is able to express the relevant semantics for the majority of the utterances in our data, without moving to the full expressive power of, e.g., DRT.

[1] www.spacebook-project.eu

USER: what's the nearest italian, um, for a meal?
  dialogAct(set_question)
  *isA(id:X1, type:restaurant)
  def(id:X1)
  hasProperty(id:X1, property:cuisine, value:"italian")
  distance(location:@USER, location:X1, value:X2)
  argmin(arg:X1, value:X2)
WIZARD: vapiano's.
  dialogAct(inform)
  isA(id:X4, type:restaurant)
  *isNamed(id:X4, name:"vapiano's")
  equivalent(id:X1, id:X4)
USER: take me to vapiano!
  dialogAct(set_question)
  *route(from_location:@USER, to_location:X4)
  isA(id:X4, type:restaurant)
  isNamed(id:X4, name:"vapiano")
WIZARD: certainly.
  dialogAct(acknowledge)
WIZARD: keep walking straight down clerk street.
  dialogAct(instruct)
  *walk(agent:@USER, along_location:X1, direction:forward)
  isA(id:X1, type:street)
  isNamed(id:X1, name:"clerk street")
USER: yes.
  dialogAct(acknowledge)
USER: what is this church?
  dialogAct(set_question)
  *isA(id:X2, type:church)
  index(id:X2)
WIZARD: sorry, can you say this again?
  dialogAct(repeat)
USER: i said what is this church on my left!
  dialogAct(set_question)
  *isA(id:X2, type:church)
  index(id:X2)
  position(id:X2, ref:@USER, location:left)
WIZARD: it is saint john's.
  dialogAct(inform)
  isA(id:X3, type:church)
  *isNamed(id:X3, name:"saint john's")
  equivalent(id:X2, id:X3)
USER: a sign here says it is saint mark's.
  dialogAct(inform)
  isA(id:X4, type:church)
  *isNamed(id:X4, name:"saint mark's")
  equivalent(id:X2, id:X4)

Figure 1: Sample dialog annotated with MRs
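To make the shape of these MRs concrete, the following is a minimal sketch of how the annotations in Fig. 1 could be held in memory. The Predicate and MR classes are illustrative assumptions for exposition only; they are not part of the corpus release or of the parser described later.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Predicate:
    # One elementary predication, e.g. isA(id:X1, type:restaurant).
    # Arguments are (name, value) pairs; a name may repeat (e.g. two location
    # arguments of distance), and values are constants, quoted strings or
    # variables such as "X1".
    name: str
    args: List[Tuple[str, str]]
    focus: bool = False    # marked with * in the annotation
    negated: bool = False  # marked with ~ in the annotation

@dataclass
class MR:
    # A meaning representation: a dialog act plus an unordered set of predicates
    # (empty for acts such as acknowledge or repeat).
    dialog_act: str
    predicates: List[Predicate] = field(default_factory=list)

# The MR of the first user utterance in Figure 1.
mr = MR("set_question", [
    Predicate("isA", [("id", "X1"), ("type", "restaurant")], focus=True),
    Predicate("def", [("id", "X1")]),
    Predicate("hasProperty", [("id", "X1"), ("property", "cuisine"), ("value", '"italian"')]),
    Predicate("distance", [("location", "@USER"), ("location", "X1"), ("value", "X2")]),
    Predicate("argmin", [("arg", "X1"), ("value", "X2")]),
])
```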
Dialog acts. The dialog acts are utterance-level labels which capture the overall function of the utterance in the dialog, for example whether an utterance is a question seeking a list as an answer, a statement of information, an acknowledgement, an instruction or a repetition request (set_question, inform, acknowledge, instruct and repeat in Figure 1). The focal point together with the act provides similar information to the intent annotation in ATIS (Tur et al., 2010). The acts defined in the proposed MRL follow the guidelines proposed by Allen and Core (1997), Stolcke et al. (2000) and Bunt et al. (2012).

The dialog acts are divided into two categories. The first category contains those that are accompanied by a set of predicates to represent the semantics of the sentence, such as set_question and inform. For these acts we denote their focal points, for example the piece of information requested in a set_question, with an asterisk (*) in front of the relevant predicate. The second category contains dialog acts that are not accompanied by predicates, such as acknowledge and repeat.

Predicates. The MRL contains predicates to denote entities, properties and their relations:

• Predicates introducing entities and their properties: isA, isNamed and hasProperty.

• Predicates describing user actions, such as walk and turn, with arguments such as direction and along_location.

• Predicates describing geographic relations, such as distance, route and position, using ref to denote relative positioning.

• Predicates denoting whether an entity is introduced using a definite article (def), an indefinite (indef) or an indexical (index).

• Predicates expressing numerical relations such as argmin and argmax.

Coreference. In order to model coreference we adopt the notion of discourse referents (DRs) and discourse entities (DEs) from Discourse Representation Theory (DRT) (Webber, 1978; Kamp and Reyle, 1993). DRs are referential expressions appearing in utterances which denote DEs, which are mental entities in the speaker's model of discourse. Multiple DEs can refer to the same real-world entity; for example, in Fig. 1 "vapiano's" refers to a different DE from the restaurant in the previous sentence ("the nearest italian"), even though they are likely to be the same real-world entity. We considered DEs instead of actual entities in the MRL because they allow us to capture the semantics of interactions such as the last exchange between the wizard and user. The MRL represents multiple DEs referring to the same real-world entity through the predicate equivalent.

Coreference is indicated by using identical variables across predicate arguments within an utterance or across utterances. The main principle in determining whether DRs corefer is that it must be possible to infer this from the dialog context alone, without using world knowledge.

3 Data Collection and Annotation

The NL utterances were collected using Wizard-of-Oz experiments (Kelley, 1983) with pairs of human subjects. In each experiment, one human pretended to be a tourist visiting Edinburgh (by physically walking around the city), while the other performed the role of the system responding through a suitable interface using a text-to-speech system. Each user-wizard pair was given one of two scenarios involving requests for directions to different points of interest. The first scenario involves seeking directions to the national museum of Scotland, then going to a nearby coffee shop, followed by a pub via a cash machine and finally looking for a park. The second scenario involves looking for a Japanese restaurant and the university gym, requesting information about the Flodden Wall monument, visiting the Scottish parliament and the Dynamic Earth science centre, and going to the Royal Mile and the Surgeon's Hall museum. Each experiment formed one dialog which was manually transcribed from recorded audio files. 17 dialogs were collected in total, 7 from the first scenario and 10 from the second. More details are reported in Hill et al. (2013).

Given the varied nature of the dialogs, some of the user requests were not within the scope of the system. Furthermore, the proposed MRL has its own limitations; for example it does not have predicates to express temporal relationships. Thus, it was necessary to filter the utterances collected and decide which ones to annotate with MRs.[2]

[2] A similar filtering process was used for GeoQuery (Section 7.5.1 in Zelle (1995)) and ATIS (the principles of interpretation document (/atis3/doc/pofi.doc) in the NIST CDs).
vocabulary type    number of terms
dialog acts        15
predicates         19
arguments          41
constants          9
entity types       26
properties         4

Table 1: MRL vocabulary used in the annotation

In particular, we did not annotate utterances falling into one or more of the following categories:

• Utterances that are not human-interpretable, e.g. utterances that were interrupted too early to be interpretable. In such cases, the system is likely to respond with a repetition request.

• Utterances that are human-interpretable but outside the scope of the system, e.g. questions about historical events which are not included in the database of the application considered.

• Utterances that are within the scope of the system but too complex to be represented by the proposed MRL, e.g. an utterance requiring representation of time to be interpreted.

Note that we still annotate an utterance if the core of its semantics can be captured by the MRL. For example, "take me to vapiano now!" would be annotated, even though the MRL cannot represent the meaning of "now". Broad information requests such as "tell me more about this church" are also annotated, using the predicate extraInfo(id:Xno). We argue that determining which utterances should be translated into MRs, and which should be ignored, is an important subtask for real-world applications of semantic parsing.

The annotation was performed by one of the authors and a freelance linguist with no experience in semantic parsing. As well as annotating the user utterances, we also annotated the wizard utterances with dialog acts and the entities mentioned, as they provide the necessary context to perform context-dependent interpretation. In practice, however, we expect this information to be used by a natural language generation system to produce the system's response and thus be available to the semantic parser.

The total number of user utterances annotated was 2,374, out of which 1,906 were annotated with MRs, the remainder not translated due to the reasons discussed earlier in this section. The number and types of the MRL vocabulary terms used appear in Tbl. 1. The annotated dialogs, the guidelines and the lists of the vocabulary terms are available from http://sites.google.com/site/andreasvlachos/resources.

In order to assess the quality of the guidelines and the annotation, we conducted an inter-annotator agreement study. For this purpose, the two annotators annotated one dialog consisting of 510 utterances. Exact match agreement at the utterance level, which requires that the MRs by the annotators agree on dialog act, predicates and within-utterance variable assignment, was 0.829, which is a strong result given the complexity of the annotation task, and which suggests that the proposed guidelines can be applied consistently. We also assessed the agreement on predicates using F-score, which was 0.914.

4 Comparison to Existing Corpora

The most closely related corpus to the one presented in this paper (herein SPACEBOOK) is the airline travel information system (ATIS) corpus (Dahl et al., 1994), which consists of dialogs between a user and a flight booking system collected in Wizard-of-Oz experiments. Each utterance is annotated with the SQL statement that would return the requested piece of information from the flights database. The utterance interpretation is context-dependent. For example, when the user follows up an initial flight request, e.g. "find me flights to Boston", with utterances containing additional preferences, e.g. "on Monday", the interpretation of the additional preferences extends the MR for the initial request.

Compared to ATIS, the dialogs in the SPACEBOOK corpus are substantially longer (8.8 utterances on average for ATIS vs. 139.7 for SPACEBOOK) and cover a broader domain due to the longer scenarios used in data collection. Furthermore, allowing the wizards to answer in natural language instead of restricting them to responding via database queries as in ATIS led to more varied dialogs. Finally, our approach to annotating coreference avoids repeating the MR of previous utterances, thus resulting in shorter expressions that are closer to the semantics of the NL utterances.
The datasets developed in the recent dialog state tracking challenge (Henderson et al., 2014) also consist of dialogs between a user and a tourist information system. However, the task is easier since only three entity types are considered (restaurant, coffee shop and pub), a slot-filling MRL is used, and the argument slots take values from fixed lists.

The abstract meaning representation (AMR) described by Banarescu et al. (2013) was developed to provide a semantic interpretation layer to improve machine translation (MT) systems. It has similar predicate-argument structure to the MRL proposed here, including a lack of cover for temporal relations and scoping. However, due to the different application domains (MT vs. tourism-related activities), there are some differences. Since MT systems operate at the sentence level, each sentence is interpreted in isolation in AMR, whilst our proposed MRL takes context into account. Also, AMR tries to account for all the words in a sentence, whilst our MRL only tries to capture the semantics of those words that are relevant to the application at hand.

Other popular semantic parsing corpora include GeoQuery (Zelle, 1995) and Free917 (Cai and Yates, 2013). Both consist exclusively of questions to be answered with a database query, the former considering a small American geography database and the latter the much wider Freebase database (Bollacker et al., 2008). Unlike SPACEBOOK and ATIS, there is no notion of context in either of these corpora. Furthermore, the NL utterances in these corpora are compiled to be interpreted as database queries, which is equivalent to only one of the dialog acts (set_question) in the SPACEBOOK corpus. Thus the latter allows the exploration of the application of dialog act tagging as a first step in semantic parsing. Finally, MacMahon et al. (2006) developed a corpus of natural language instructions paired with sequences of actions; however the domain is limited to simple navigation instructions and there is no notion of dialog in this corpus.

5 Semantic Parsing for the New Corpus

The MRL in Fig. 1 is readable and easy to annotate with. However, it is not ideal for experiments, as it is difficult to compare MR expressions beyond exact match. For these reasons, we converted the MR expressions into a node-argument form. In particular, all predicates introducing entities (isA) and most predicates introducing relations among entities (e.g. distance) become nodes, while all other predicates (e.g. isNamed, def) are converted into arguments. For example, the MR for the first utterance in Fig. 1 is converted into the form in Fig. 2g. Entities appearing in MR expressions without a type (e.g. X2 in the last utterance of Fig. 1) are denoted with a node of type empty. Each node has a unique id (e.g. X1) and each argument can take as value a constant (e.g. det), a node id, or a verbatim string extract from the utterance. Arguments that are absent (e.g. the name of the restaurant) are set to the constant null. This conversion results in 16 utterance-level labels (15 dialog acts plus one for the non-interpretable utterances), 35 node types and 32 arguments.

The comparison between a predicted and a gold standard node-argument form is performed in three stages. First we map the ids of the predicted nodes to those of the gold standard. While ids do not carry any semantics, they are needed to differentiate between multiple nodes of the same type; e.g. if a second restaurant had been predicted in Fig. 2h then it would have a different id and would not be matched to a gold standard node. Second, we decompose the node-argument forms into a set of atomic predictions (Fig. 2h). This decomposition allows the awarding of partial credit, e.g. when the node type is correct but some of the arguments are not. Using these atomic predictions we calculate precision, recall and F-score.

The mapping between predicted and gold standard ids is performed by evaluating all mappings (with mappings between nodes of different types not allowed), and choosing the one resulting in the lowest sum of false positives and negatives.
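The Python sketch below illustrates this evaluation on a toy example. It is a simplified paraphrase of the procedure just described, not the scorer used for the reported results: it brute-forces the id mapping and omits the qualified node-valued arguments shown in Fig. 2h; the second location argument is renamed location2 only to keep the toy dictionary keys unique.

```python
from itertools import permutations

def atoms(form):
    # form: node id -> (node type, {argument name: value}).
    # One atomic prediction per node plus one per (node, argument) pair.
    out = set()
    for nid, (ntype, args) in form.items():
        out.add((nid, ntype))
        for name, value in args.items():
            out.add((nid, ntype, name, value))
    return out

def score(pred, gold):
    # Brute-force search over type-respecting mappings of predicted ids onto gold
    # ids, keeping the one with the lowest sum of false positives and negatives;
    # utterances contain only a handful of nodes, so this is cheap.
    gold_atoms = atoms(gold)
    candidates = list(gold.keys()) + [None] * len(pred)   # None = left unmatched
    best = None
    for choice in permutations(candidates, len(pred)):
        mapping = dict(zip(pred.keys(), choice))
        if any(g is not None and pred[p][0] != gold[g][0] for p, g in mapping.items()):
            continue  # nodes of different types may not be mapped to each other
        ren = {p: (g if g is not None else "UNMATCHED_%s" % p) for p, g in mapping.items()}
        renamed = {ren[p]: (t, {a: ren.get(v, v) for a, v in args.items()})
                   for p, (t, args) in pred.items()}
        pred_atoms = atoms(renamed)
        fp, fn = len(pred_atoms - gold_atoms), len(gold_atoms - pred_atoms)
        tp = len(pred_atoms & gold_atoms)
        if best is None or fp + fn < best[0]:
            best = (fp + fn, tp, len(pred_atoms), len(gold_atoms))
    _, tp, n_pred, n_gold = best
    prec = tp / n_pred if n_pred else 0.0
    rec = tp / n_gold if n_gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# A gold form in the spirit of Fig. 2g against a prediction that missed the
# cuisine argument and used different ids for its nodes.
gold = {"X1": ("restaurant", {"number": "singular", "det": "def", "cuisine": '"italian"'}),
        "X2": ("distance", {"location": "USER", "location2": "X1"})}
pred = {"X7": ("restaurant", {"number": "singular", "det": "def"}),
        "X8": ("distance", {"location": "USER", "location2": "X7"})}
print(score(pred, gold))  # partial credit: perfect precision, one missed atom
```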
5.1 Task decomposition

Fig. 2 shows the decomposition of the semantic parsing task into stages, which are described below.

Dialog act prediction. We first assign an utterance-level label using a classifier that exploits features based on the textual content of the utterance and on the utterance preceding it.
what’sthenearestitalianforameal?SETQUESTION(A)Dialogactpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurant(乙)Nodepredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetUSERlocation(C)Constantargumentpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetcuisineUSERlocationOUTOUTOUTOUTINOUTOUTOUT(d)Stringargumentpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetcuisineUSERlocationargminlocation(e)Nodeargumentpredictionwhat’sthenearestitalianforameal?SETQUESTIONdistancerestaurantdefsingularnumberdetcuisineUSERlocationargminlocationfocus(F)Focus/negationpredictionSETQUESTIONX1:餐厅(编号:singular,这:def,cuisine:“italian”)X2:distance(地点:USER,地点:X1,argmin:X1)focus:X1(G)Node-argumentformdialogAct:SETQUESTIONX1:restaurantX1:餐厅(编号:singular)X1:餐厅(这:def)X1:餐厅(cuisine:“italian”)X2:distanceX2:distance(地点:USER)X2:distance(地点:X1-restaurant(编号:singular,这:def))X2:distance(argmin:X1)focus:X1-restaurant(编号:singular,这:def)(H)AtomicpredictionsFigure2:Semanticparsingdecomposition.bigramsandtrigramsandthefinalpunctuationmark.Unlikeintypicaltextclassificationtasks,contentwordsarenotalwayshelpfulindialogacttagging;e.g.thetoken“meal”inFig.2aisnotindicativeofset_question,whilen-gramsofwordstypicallyconsideredasstopwords,suchas“what’sthe”,canbemorehelpful.Ifthedialogactpredictedistobeaccompaniedbyotherpredicatesaccordingtotheguidelines(Sec.2)weproceedtothefollowingstages,otherwisestop.Thefeaturesbasedontheprecedingutterancein-dicatewhetheritwasbytheuserorthewizardand,inthelattercase,itsdialogact.Suchfeaturesareusefulindeterminingtheactofshort,ambiguousutterancessuchas“yes”,whichistaggedasyeswhenfollowingaprop_questionutterance,butasacknowledgeotherwise.
Node prediction. In node prediction we use a classifier to predict whether each of the tokens in the utterance denotes a node of a particular type or empty (Fig. 2b). The features used include the target token and its lemma, which are conjoined with the PoS tag, the previous and following tokens, as well as the lemmas of the tokens with which it has syntactic dependencies. Further features represent the dialog act (e.g. route is more likely to appear in a set question utterance), and the number and types of the nodes already predicted. Since the evaluation ignores the alignment between nodes and tokens, it would have been correct to predict the correct nodes from any token; e.g. restaurant could be predicted from "italian" instead. However, alignment does affect argument prediction, since it determines its feature extraction.

Constant argument prediction. In this stage (Fig. 2c) we predict, for each argument of each node, whether its value is an MRL vocabulary term, a verbatim string extract, a node, or absent (special values STRING, NODE and null respectively). If the value predicted is STRING or NODE, it is replaced by the predictions in subsequent stages. For each argument different values are possible; thus we use separate classifiers for each, resulting in 32 classifiers. The features used include the node type, the token that predicted the node, and the syntactic dependency paths from that token to all other tokens in the utterance. We also include as features the values predicted for other arguments of the node, the dialog act, and the other node types predicted.

String argument prediction. For each argument predicted to be STRING (e.g. cuisine in Figure 2d), we predict for each token in left-to-right order whether it should be part of the value for this argument or not (IN or OUT). Since the strings that are appropriate for each argument differ (e.g. the strings for cuisine are unlikely to be appropriate for name), we use separate classifiers for each of them, resulting in five classifiers. The features used include the target token and its lemma, its conjunction with the PoS tag, the previous and following tokens, and the lemmas of the tokens with which it has syntactic dependencies. We also added the label assigned to the previous token and the syntactic dependency path to the token that predicted the node.

Node argument prediction. For each argument predicted to have NODE as its value, we predict for every other node whether it should be the value or not (e.g. argmin in Fig. 2e). As with the string argument prediction, we use separate binary classifiers for each argument, resulting in 18 classifiers. The features extracted are similar to that stage, but we now consider the tokens that predicted each candidate argument node (e.g. "meal" for restaurant) instead of the tokens in the utterance.

Focus/negation prediction. We predict whether each node should be focused or negated as two separate binary tasks. The features used include the token that predicted the target node, its lemma and PoS tag, and the syntactic dependency paths to all other tokens in the utterance. Further features include the type of the node and its arguments.
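The sketch below shows how the stages above could be composed into a single greedy pass. It is a structural outline only: the clf object is a hypothetical bundle standing in for the per-stage classifiers, and the set of predicate-free acts shown is an illustrative subset.

```python
# Dialog acts that the guidelines specify carry no predicates (illustrative subset).
ACTS_WITHOUT_PREDICATES = {"acknowledge", "repeat"}

def parse(tokens, prev_utt, clf):
    # Greedy staged prediction: each stage conditions on everything predicted so far.
    act = clf.dialog_act(tokens, prev_utt)
    if act in ACTS_WITHOUT_PREDICATES:
        return {"dialogAct": act}
    nodes = {}                                    # node id -> (node type, anchor token index)
    for i, _tok in enumerate(tokens):             # one node decision per token
        ntype = clf.node(i, tokens, act, nodes)
        if ntype != "empty":
            nodes["X%d" % (len(nodes) + 1)] = (ntype, i)
    args = {}                                     # (node id, argument name) -> value
    for nid, (ntype, i) in nodes.items():
        for arg in clf.arguments_of(ntype):
            val = clf.constant_arg(nid, arg, i, tokens, act, nodes, args)
            if val == "STRING":                   # replaced by a verbatim extract
                val = clf.string_arg(nid, arg, i, tokens)
            elif val == "NODE":                   # replaced by a pointer to another node
                val = clf.node_arg(nid, arg, nodes)
            args[(nid, arg)] = val
    focus = {nid for nid in nodes if clf.focus(nid, nodes, args, tokens)}
    negated = {nid for nid in nodes if clf.negation(nid, nodes, args, tokens)}
    return {"dialogAct": act, "nodes": nodes, "args": args,
            "focus": focus, "negated": negated}
```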
6 Imitation Learning

In order to learn the classifiers for the task decomposition described, two challenges must be addressed. The first is the complexity of the structure to be predicted. The task involves many interdependent predictions made by a variety of classifiers, and thus cannot be tackled by approaches that assume a particular type of graph structure, or restrict structure feature extraction in order to perform efficient dynamic programming. The second challenge is the lack of alignment information during training.

Imitation learning algorithms such as SEARN (Daumé III et al., 2009) and DAGGER (Ross et al., 2011) have been applied successfully to a variety of structured prediction tasks including summarization, biomedical event extraction and dynamic feature selection (Daumé III et al., 2009; Vlachos, 2012; He et al., 2013), thanks to their ability to handle complex output spaces without exhaustive search and their flexibility in incorporating features based on the structured output. In this work we focus on DAGGER and extend it to handle the missing alignments.

6.1 Structured prediction with DAGGER

The dataset aggregation (DAGGER) algorithm (Ross et al., 2011) forms the prediction of an instance s as a sequence of T actions ŷ_{1:T} predicted by a learned policy which consists of one or more classifiers.
Algorithm 1: Imitation learning with DAGGER
Input: training instances S, expert policy π*, loss function ℓ, learning rate β, CSC learner CSCL
Output: learned policy H_N
1:  CSC examples E = ∅
2:  for i = 1 to N do
3:      p = (1 − β)^(i−1)
4:      current policy π = p·π* + (1 − p)·H_{i−1}
5:      for s in S do
6:          predict π(s) = ŷ_{1:T}
7:          for ŷ_t in π(s) do
8:              extract features Φ_t = f(s, ŷ_{1:t−1})
9:              for each possible action y^j_t do
10:                 predict y′_{t+1:T} = π(s; ŷ_{1:t−1}, y^j_t)
11:                 assess c^j_t = ℓ(ŷ_{1:t−1}, y^j_t, y′_{t+1:T})
12:             add (Φ_t, c_t) to E
13:     learn H_i = CSCL(E)

These actions are taken in a greedy fashion, i.e. once an action has been taken it cannot be changed. During training, DAGGER converts the problem of learning how to predict these sequences of actions into cost-sensitive classification (CSC) learning. In CSC learning each training example has a vector of misclassification costs associated with it, thus rendering some mistakes on some examples more expensive than others (Domingos, 1999).

Algorithm 1 presents the training procedure. DAGGER requires a set of labeled training instances S and a loss function ℓ that compares complete outputs for instances in S against the gold standard. In addition, an expert policy π* must be specified, which is an oracle that returns the optimal action for the instances in S, akin to an expert demonstrating the task. π* is typically derived from the gold standard; e.g. in part-of-speech tagging π* would return the correct tag for each token. In addition, the learning rate β and a CSC learner (CSCL) must be provided. The algorithm outputs a learned policy H_N that, unlike π*, can generalize to unseen data.

Each training iteration begins by setting the probability p (line 3) of using π* in the current policy π. In the first iteration, only π* is used but, in later iterations, π becomes stochastic and, for each action, π* is used with probability p, and the learned policy from the previous iteration H_{i−1} with probability 1 − p (line 4). Then π is used to predict each training instance s (line 6). For each action ŷ_t, a CSC example is generated (lines 7-12). The features Φ_t are extracted from s and all previous actions ŷ_{1:t−1} (line 8). The cost for each possible action y^j_t is estimated by predicting the remaining actions y′_{t+1:T} for s using π (line 10) and calculating the loss incurred given y^j_t w.r.t. the gold standard for s using ℓ (line 11). As π is stochastic, it is common to use multiple samples of y′_{t+1:T} to assess the cost of each action y^j_t by repeating lines 10-11. The features, together with the costs for each possible action, form a CSC example (Φ_t, c_t) (line 12). At the end of each iteration, the CSC examples obtained from all iterations are used by the CSC learning algorithm to learn the classifier(s) for H_i (line 13).

When predicting the training instances (line 6), and when estimating the costs for each possible action (lines 10-11), the policy learned in the previous iteration H_{i−1} is used as part of π after the first iteration. Thus the CSC examples generated to learn H_i depend on the predictions of H_{i−1} and, by gradually increasing the use of H_{i−1} and ignoring π* in π, the learned policies are adjusted to their own predictions, thus learning the dependencies among the actions and how to predict them in order to minimize the loss. The learning rate β determines how fast π moves away from π*. The use of H_{i−1} in predicting the training instances (line 6) also has the effect of exploring sub-optimal actions so that the learned policies are adjusted to recover from their mistakes. Finally, note that if only one training iteration is performed, the learned policy is equivalent to a set of independently trained classifiers, since no training against the predictions of the previously learned policy takes place.
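The following Python sketch is a compact rendering of Algorithm 1, with the instance representation, expert policy, candidate actions, features, loss and CSC learner all supplied by the caller; the default parameter values mirror the settings reported in Sec. 7. It is illustrative only and is not the implementation released with this paper.

```python
import random

def dagger(instances, expert_policy, actions, finished, features, loss, csc_learn,
           n_iter=12, beta=0.3, n_samples=3):
    # Aggregated cost-sensitive classification examples: (features, {action: cost}).
    examples = []
    learned = None                               # H_{i-1}, a callable (s, prefix) -> action
    for i in range(1, n_iter + 1):
        p = (1 - beta) ** (i - 1)

        def policy(s, prefix):
            # Mixture policy pi: expert with probability p, otherwise H_{i-1}.
            if learned is None or random.random() < p:
                return expert_policy(s, prefix)
            return learned(s, prefix)

        def rollout(s, prefix):
            # Greedily complete the action sequence for s from the given prefix.
            seq = list(prefix)
            while not finished(s, seq):
                seq.append(policy(s, seq))
            return seq

        for s in instances:
            predicted = rollout(s, [])                               # line 6
            for t in range(len(predicted)):
                prefix = predicted[:t]
                feats = features(s, prefix)                          # line 8
                costs = {}
                for a in actions(s, prefix):                         # lines 9-11
                    rolls = [rollout(s, prefix + [a]) for _ in range(n_samples)]
                    costs[a] = sum(loss(s, r) for r in rolls) / n_samples
                examples.append((feats, costs))                      # line 12
        learned = csc_learn(examples)                                # line 13
    return learned
```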
6.2 Training with missing alignments

The loss function ℓ in DAGGER is only used to compare complete outputs against the gold standard. Therefore, when generating a CSC training example in DAGGER (lines 7-12), we do not need to know whether an action y^j_t is correct or not; we only evaluate what the effect of y^j_t is on the loss incurred by the complete action sequence. Thus, the loss function does not need to decompose over the actions taken in order to evaluate them.

The ability to train against non-decomposable loss functions is useful when the training data has missing labels, as is the case with semantic parsing. Following Sec. 5, ℓ is defined as the sum of the false positive and false negative atomic predictions used to calculate precision and recall and, since it ignores the alignment between tokens and nodes, it cannot assess node prediction actions. However, we can use it under DAGGER to learn a node prediction classifier together with the classifiers of the other stages.

The only component of DAGGER which assumes knowledge of the correct actions for training is the expert policy π*. Since these are not available for the node prediction stage, we replace π* with a randomized expert policy π_rand, in which actions that are not specified by the annotation are chosen randomly from a set of equally optimal ones. For example, in Fig. 2b when predicting the action for each token, π_rand chooses randomly among null, distance, and restaurant, so that by the end of the stage the correct nodes have been predicted. Randomizing this choice helps explore the actions available. In our experiments we placed a uniform distribution over the available actions, i.e. all optimal actions are equally likely to be chosen. The actions returned by π_rand will often result in alignments that do not incur any loss but are nonsensical, e.g. predicting restaurant from "what". However, since π_rand is progressively ignored, the effect of such actions is reduced.

While being able to learn a semantic parser without alignment information is useful, it would help to use some supervision, e.g. that "street" commonly predicts the node street. We incorporate such an alignment dictionary in π_rand as follows: if the target token is mapped to a node type in the dictionary, and if a node of this type needs to be predicted for the utterance, then this type is returned. Otherwise, the prediction is made with π_rand. Finally, like π_rand itself, the dictionary is progressively ignored and neither constrains the training process, nor is used during testing.
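One way such a randomized expert could be realized for the node prediction stage is sketched below. The function name, the constraint governing when the empty action may be chosen, and the dictionary hook are illustrative assumptions made for this sketch rather than a description of the exact policy used.

```python
import random

def random_expert_node_action(token, tokens_left, gold_node_types, predicted_so_far,
                              dictionary=None):
    # Expert action for one token of the node prediction stage when no alignment is
    # annotated: any gold node type not yet predicted is an equally optimal choice.
    remaining = list(gold_node_types)
    for t in predicted_so_far:            # drop types that have already been produced
        if t in remaining:
            remaining.remove(t)
    # Optional supervision from an alignment dictionary (token -> node type).
    if dictionary and dictionary.get(token) in remaining:
        return dictionary[token]
    options = list(remaining)
    if tokens_left > len(remaining):      # allow 'empty' only if the rest still fit in
        options.append("empty")
    return random.choice(options) if options else "empty"

# For Fig. 2b, with gold nodes {restaurant, distance} and nothing predicted yet:
print(random_expert_node_action("nearest", tokens_left=5,
                                gold_node_types=["restaurant", "distance"],
                                predicted_so_far=[]))
```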
7 Experiments

We split the annotated dialogs into training and test sets. The former consists of four dialogs from the first scenario and seven from the second, and the latter of three dialogs from each scenario. All development and feature engineering was conducted using cross-validation on the training set, at the dialog level rather than the utterance level (therefore resulting in as many folds as dialogs in the training set), to ensure that each fold contains utterances from all parts of the scenario from which the dialog is taken.

To perform cost-sensitive classification learning we used the adaptive regularization of weight vectors (AROW) algorithm (Crammer et al., 2009). AROW is an online algorithm for linear predictors that adjusts the per-feature learning rates so that popular features do not overshadow rare but useful ones. Given the task decomposition, each learned hypothesis consists of 59 classifiers. We restricted the prediction of nodes to content words since function words are unlikely to provide useful alignments. All preprocessing was performed using the Stanford CoreNLP toolkit (Manning et al., 2014). The implementation of the semantic parser is available from http://sites.google.com/site/andreasvlachos/resources. The DAGGER parameters were set to 12 training iterations, β = 0.3 and 3 samples for action cost assessment.

We compared our DAGGER-based imitation learning approach (henceforth Imit) against independently trained classifiers using the same classification learner and features (henceforth Indep). For both systems we incorporated an alignment dictionary (+align versions) as described in Sec. 6.2, in order to improve node prediction performance. The dictionary was extracted from the training data and contains 96 tokens that commonly predict a particular node type.

The results from the cross-validation experiments are reported in Tbl. 2. Overall performance evaluated as described in Sec. 5 was 53.6 points in F-score for Imit, 5.7 points higher than Indep, and the difference is greater for the +align versions. These results demonstrate the advantages of training classifiers using imitation learning versus independently trained classifiers. Isolating the performance for the node and argument prediction stages, we observe that the main bottleneck is the former, which in the case of Imit is 60.9 points in F-score compared to 78.8 for the latter. Accuracy for dialog acts is 78.9%.
                        Imit             Imit+align       Indep            Indep+align
exact match (accuracy)  58.4%            59.1%            56%              55.9%
dialog act (accuracy)   78.9%            79.3%            78.8%            79%
node (Rec/Prec/F)       72.3/52.6/60.9   76.1/59.8/66.9   44.4/61.6/51.6   53.3/64/58.1
arguments (Rec/Prec/F)  77.6/80/78.8     79.6/83/81.3     74.1/67.2/70.1   78.2/66.3/71.8
focus (Rec/Prec/F)      81.8/87.2/84.4   84.4/86.7/85.5   85.9/87/86.5     86.8/8.3/84.7
overall (Rec/Prec/F)    59.3/48.9/53.6   62.2/54.4/59.1   45.3/50.8/47.9   50/50.1/50.1

Table 2: Performances using 11-fold cross-validation on the training set.

As shown in Tbl. 2, the alignment dictionary improved not only node prediction performance, by 6 points in F-score, but also argument prediction, by 2.5 points, thus demonstrating the benefits of learning the alignments together with the other components of the semantic parser. The overall performance improved by 5.5 points in F-score. Finally, we ran an experiment with oracle node prediction and found that the overall performance using cross-validation on the training data improved to 88.2 and 79.9 points in F-score for the Imit+align and Indep+align systems. This is in agreement with the results presented by Flanigan et al. (2014) on developing a semantic parser for the AMR formalism, who also argue that node prediction is the main performance bottleneck.

Tbl. 3 gives results on the test set. The overall performance for Imit is 48.4 F-score and 47.9% for exact match. As in the cross-validation results on the training data, training with imitation learning improved upon independently trained classifiers. The performance was improved further using the alignment dictionary, reaching 53.5 points in F-score and 49.1% exact match accuracy.

In the experimental setup above, dialogs from the same scenarios appear in both training and testing. While this is a reasonable evaluation approach also followed in ATIS evaluations, it is likely to be relatively forgiving; in practice, semantic parsers are likely to encounter entities, activities, etc. unseen in training. Hence we conducted a second evaluation in which dialogs from one scenario are used to train a parser evaluated on the other (still respecting the train/test split from before). When testing on the dialogs from the first scenario and training on the dialogs from the second, the overall performance using Imit+align was 36.9 points in F-score, while in the reverse experiment it was 41.7. Note that direct comparisons against the performances in Tbl. 3 are not meaningful since fewer dialogs are being used for training and testing in the cross-scenario setup.

8 Comparison with Related Work

Previous work on semantic parsing handled the lack of alignments during training in a variety of ways. Zettlemoyer and Collins (2009) manually engineered a CCG lexicon for the ATIS corpus. Kwiatkowski et al. (2011) used a dedicated algorithm to infer a similar dictionary and used alignments from Giza++ (Och and Ney, 2000) to initialize the relevant features. Most recent work on GeoQuery uses an alignment dictionary that includes for each geographical entity all noun phrases referring to it (Jones et al., 2012). More recently, Flanigan et al. (2014) developed a dedicated alignment model on top of which they learned a semantic parser for the AMR formalism. In our approach, we learn the alignments together with the semantic parser without requiring a dictionary.

In terms of structured prediction frameworks, most previous work uses hidden variable linear (Zettlemoyer and Collins, 2007) or log-linear (Liang et al., 2011) models with beam search. In terms of direct comparisons with existing work, the goal of this paper is to introduce the new corpus and provide a competitive first attempt at the new semantic parsing task. However, we believe it is non-trivial to apply existing approaches to the new task, since, assuming a decomposition similar to that of Sec. 5.1, exhaustive search would be too expensive, and applying vanilla beam search would be difficult since different predictions result in beams of (sometimes radically) different lengths that are not comparable.
                        Imit             Imit+align       Indep            Indep+align
exact match (accuracy)  47.9%            49.1%            47.6%            46.1%
dialog act (accuracy)   77%              80.5%            79.8%            79.5%
node (Rec/Prec/F)       68.7/45.7/54.8   75.5/51.7/61.4   41.9/61.1/49.7   54/64.9/58.9
arguments (Rec/Prec/F)  73.9/73.7/73.8   76.8/77.3/77.1   69.5/61.3/65.1   77.3/63.6/69.8
focus (Rec/Prec/F)      87.1/80.7/83.8   86/81.2/83.6     81.6/73.4/77.3   90.6/76.8/83.1
overall (Rec/Prec/F)    56.6/42.3/48.4   63.5/46.2/53.5   41.2/47.8/44.3   50/47.4/48.7

Table 3: Performances on the test set.

We have attempted applying the MT-based semantic parsing approach proposed by Andreas et al. (2013) to our dataset, but in initial experiments the performance was poor. The main reason for this is that, unlike GeoQuery, the proposed MRL does not align well with English.

The expert policy in DAGGER is a generalization of the dynamic oracle of Goldberg and Nivre (2013) for shift-reduce dependency parsing to any structured prediction task decomposed into a sequence of actions. The randomized expert policy proposed extends DAGGER to learn not only how to avoid error propagation, but also how to infer latent variables.

The main bottleneck is training data sparsity. Some node types appear only a few times in relatively long utterances, and thus it is difficult to infer appropriate alignments for them. Unlike machine translation between natural languages, it is unrealistic to expect large quantities of utterances to be annotated with MR expressions. An appealing alternative would be to use response-based learning, i.e. use the response from the system instead of MR expressions as the training signal (Liang et al., 2011; Kwiatkowski et al., 2013; Berant and Liang, 2014). However such an approach would not be straightforward to implement in our application, since the response from the system is not always the result of a database query but, e.g., a navigation instruction that is context-dependent and whose correctness is thus difficult to assess. Furthermore, it would require the development of a user simulator (Keizer et al., 2012), a non-trivial task which is beyond the scope of this work. A different approach is to use dialogs between a system and its users as proposed by Artzi and Zettlemoyer (2011) using the DARPA Communicator corpus (Walker et al., 2002). However, in that work utterances were selected to be shorter than 6 words and to include one noun phrase present in the lexicon used during learning, while ignoring short but common phrases such as "yes" and "no"; thus it is unclear whether it would be applicable to our dataset.

Finally, dialog context is only taken into account in predicting the dialog act for each utterance. Even though our corpus contains coreference information, we did not attempt this task as it is difficult to evaluate and our performance on node prediction, on which it relies, is relatively low. We leave coreference resolution on the new corpus as an interesting and challenging task for future work.

9 Conclusions

In this paper we presented a new corpus for context-dependent semantic parsing in the context of a portable, interactive navigation and exploration system for tourism-related activities. The MRL used for the annotation can handle dialog context such as coreference and can accommodate utterances that are not interpretable according to a database. We conducted an inter-annotator agreement study and found 0.829 exact match agreement.

We also developed a semantic parser for the SPACEBOOK corpus using the imitation learning algorithm DAGGER that, unlike previous approaches, can infer the missing alignments in the training data using a randomized expert policy. In experiments using the new corpus we found that training with imitation learning substantially improves performance compared to independently trained classifiers. Finally, we showed how to improve performance further by incorporating an alignment dictionary.

Acknowledgements

The research reported was conducted while the first author was at the University of Cambridge and funded by the European Community's Seventh Framework Programme (FP7/2007-2013)
under grant agreement no. 270019 (SPACEBOOK project, www.spacebook-project.eu). The authors would like to acknowledge the work of Diane Nicholls in the annotation; the efforts of Robin Hill in collecting the dialogs from Wizard-of-Oz experiments; and Tim Vieira for helpful comments on an earlier version of this manuscript.

References

James Allen and Mark Core. 1997. Dialogue act markup in several layers. Technical report, University of Rochester.

Jacob Andreas, Andreas Vlachos, and Stephen Clark. 2013. Semantic parsing as machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (short papers).

Yoav Artzi and Luke Zettlemoyer. 2011. Bootstrapping semantic parsers from conversations. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 421–432, Edinburgh, UK.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria, August. Association for Computational Linguistics.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.

Harry Bunt, Jan Alexandersson, Jae-Woong Choe, Alex Chengyu Fang, Koiti Hasida, Volha Petukhova, Andrei Popescu-Belis, and David Traum. 2012. ISO 24617-2: A semantically-based standard for dialogue annotation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey.

Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics.

Ann Copestake, Dan Flickinger, Ivan Sag, and Carl Pollard. 2005. Minimal recursion semantics: An introduction. Research in Language and Computation, 3(2–3):281–332.

Koby Crammer, Alex Kulesza, and Mark Dredze. 2009. Adaptive regularization of weight vectors. In Advances in Neural Information Processing Systems 22, pages 414–422.

Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, and Elizabeth Shriberg. 1994. Expanding the scope of the ATIS task: the ATIS-3 corpus. In Proceedings of the Workshop on Human Language Technology, pages 43–48, Plainsboro, New Jersey.

Hal Daumé III, John Langford, and Daniel Marcu. 2009. Search-based structured prediction. Machine Learning, 75:297–325.

Pedro Domingos. 1999. MetaCost: a general method for making classifiers cost-sensitive. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, pages 155–164. Association for Computing Machinery.

Jeffrey Flanigan, Sam Thomson, Jaime Carbonell, Chris Dyer, and Noah A. Smith. 2014. A discriminative graph-based parser for the abstract meaning representation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1426–1436, Baltimore, Maryland, June. Association for Computational Linguistics.

Yoav Goldberg and Joakim Nivre. 2013. Training deterministic parsers with non-deterministic oracles. Transactions of the Association for Computational Linguistics, 3(1):403–414, October.

He He, Hal Daumé III, and Jason Eisner. 2013. Dynamic feature selection for dependency parsing. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1455–1464, Seattle, October.

Matthew Henderson, Blaise Thomson, and Jason Williams. 2014. The Third Dialog State Tracking Challenge. In Proceedings of IEEE Spoken Language Technology.

Robin Hill, Jana Götze, and Bonnie Webber. 2013. SpaceBook Project: Final Data Release, Wizard-of-Oz (WoZ) experiments. Technical report, University of Edinburgh.

Srinivasan Janarthanam, Oliver Lemon, Phil Bartie, Tiphaine Dalmas, Anna Dickinson, Xingkun Liu, William Mackaness, and Bonnie Webber. 2013. Evaluating a city exploration dialogue system with integrated question-answering and pedestrian navigation.
In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1660–1668, Sofia, Bulgaria, August. Association for Computational Linguistics.

Bevan Keeley Jones, Mark Johnson, and Sharon Goldwater. 2012. Semantic parsing with Bayesian tree transducers. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 488–496.

Hans Kamp and Uwe Reyle. 1993. From Discourse to Logic. Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht.

Simon Keizer, Stéphane Rossignol, Senthilkumar Chandramohan, and Olivier Pietquin. 2012. User simulation in the development of statistical spoken dialogue systems. In Oliver Lemon and Olivier Pietquin, editors, Data-Driven Methods for Adaptive Spoken Dialogue Systems, pages 39–73. Springer New York.

John F. Kelley. 1983. An empirical methodology for writing user-friendly natural language computer applications. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 193–196.

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical generalization in CCG grammar induction for semantic parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1512–1523, Edinburgh, UK.

Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1545–1556, Seattle, WA.

Percy Liang, Michael Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 590–599, Portland, Oregon.

Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: connecting language, knowledge, and action in route instructions. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 1475–1482. AAAI Press.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.

Franz Josef Och and Hermann Ney. 2000. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 440–447, Hong Kong, China.

Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In 14th International Conference on Artificial Intelligence and Statistics, pages 627–635.

Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. 2000. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373.

Gokhan Tur, Dilek Hakkani-Tür, and Larry Heck. 2010. What's left to be understood in ATIS? In IEEE Workshop on Spoken Language Technologies.

Andreas Vlachos. 2012. An investigation of imitation learning algorithms for structured prediction. Journal of Machine Learning Research Workshop and Conference Proceedings, Proceedings of the 10th European Workshop on Reinforcement Learning, 24:143–154.

Marilyn A. Walker, Alexander I. Rudnicky, Rashmi Prasad, John S. Aberdeen, Elizabeth Owen Bratt, John S. Garofolo, Helen Wright Hastie, Audrey N. Le, Bryan L. Pellom, Alexandros Potamianos, Rebecca J. Passonneau, Salim Roukos, Gregory A. Sanders, Stephanie Seneff, and David Stallard. 2002. DARPA Communicator: cross-system results for the 2001 evaluation. In Proceedings of the 7th International Conference on Spoken Language Processing.

Bonnie Lynn Webber. 1978. A Formal Approach to Discourse Anaphora. Ph.D. thesis, Harvard University.

John M. Zelle. 1995. Using Inductive Logic Programming to Automate the Construction of Natural Language Parsers. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin.

Luke S. Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 678–687.

Luke S. Zettlemoyer and Michael Collins. 2009. Learning context-dependent mappings from sentences to logical form. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pages 976–984, Singapore.