计算语言学协会会刊, 1 (2013) 315–326. 动作编辑器: Mark Steedman.
Submitted 2/2013; 修改 6/2013; 已发表 7/2013. C
(西德:13)
2013 计算语言学协会.
Parsingentirediscoursesasverylongstrings:CapturingtopiccontinuityingroundedlanguagelearningMinh-ThangLuongDepartmentofComputerScienceStanfordUniversityStanford,Californialmthang@stanford.eduMichaelC.FrankDepartmentofPsychologyStanfordUniversityStanford,Californiamcfrank@stanford.eduMarkJohnsonDepartmentofComputingMacquarieUniversitySydney,AustraliaMark.Johnson@MQ.edu.auAbstractGroundedlanguagelearning,thetaskofmap-pingfromnaturallanguagetoarepresentationofmeaning,hasattractedmoreandmorein-terestinrecentyears.Inmostworkonthistopic,然而,utterancesinaconversationaretreatedindependentlyanddiscoursestruc-tureinformationislargelyignored.Inthecontextoflanguageacquisition,thisindepen-denceassumptiondiscardscuesthatareim-portanttothelearner,e.g.,thefactthatcon-secutiveutterancesarelikelytosharethesamereferent(Franketal.,2013).Thecurrentpa-perdescribesanapproachtotheproblemofsimultaneouslymodelinggroundedlanguageatthesentenceanddiscourselevels.Wecom-bineideasfromparsingandgrammarinduc-tiontoproduceaparserthatcanhandlelonginputstringswiththousandsoftokens,creat-ingparsetreesthatrepresentfulldiscourses.Bycastinggroundedlanguagelearningasagrammaticalinferencetask,weuseourparsertoextendtheworkofJohnsonetal.(2012),investigatingtheimportanceofdiscoursecon-tinuityinchildren’slanguageacquisitionanditsinteractionwithsocialcues.Ourmodelboostsperformanceinalanguageacquisitiontaskandyieldsgooddiscoursesegmentationscomparedwithhumanannotators.1IntroductionLearningmappingsbetweennaturallanguage(NL)andmeaningrepresentations(MR)isanimportantgoalforbothcomputationallinguisticsandcognitivescience.Accuratelylearningnovelmappingsiscru-cialingroundedlanguageunderstandingtasksandsuchsystemscansuggestinsightsintothenatureofchildrenlanguagelearning.Twoinfluentialexamplesofgroundedlanguagelearningtasksarethesportscastingtask,RoboCup,wheretheNListhesetofrunningcommentaryandtheMRisthesetoflogicalformsrepresentingac-tionslikekickingorpassing(ChenandMooney,2008),andthecross-situationalword-learningtask,wheretheNListhecaregiver’sutterancesandtheMRisthesetofobjectspresentinthecontext(Siskind,1996;YuandBallard,2007).Workinthesedomainssuggeststhat,basedontheco-occurrencebetweenwordsandtheirreferentsincontext,itispossibletolearnmappingsbetweenNLandMRevenundersubstantialambiguity.Nevertheless,contextslikeRoboCup—whereev-erysingleutteranceisgrounded—areextremelyrare.Muchmorecommonarecaseswhereasin-gletopicisintroducedandthendiscussedatlengththroughoutadiscourse.Inatelevisionnewsshow,forexample,atopicmightbeintroducedbypresent-ingarelevantpictureorvideoclip.Oncethetopicisintroduced,theanchorscandiscussitbynameorevenusingapronounwithoutshowingapicture.Thediscourseisgroundedwithouthavingtogroundeveryutterance.Moreover,althoughpreviousworkhaslargelytreatedutteranceorderasindependent,theorderofutterancesiscriticalingroundeddiscoursecontexts:iftheorderisscrambled,itcanbecomeimpossibletorecoverthetopic.Supportingthisidea,Franketal.(2013)foundthattopiccontinuity—thetendencytotalkaboutthesametopicinmultipleutterancesthatarecontiguousintime—isbothprevalentandinformativeforwordlearning.Thispaperexaminestheimportanceoftopiccontinuitythroughagram-maticalinferenceproblem.WebuildonJohnsonetal.(2012)’sworkthatusedgrammaticalinferenceto
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
我
A
C
_
A
_
0
0
2
3
0
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
316
(西德:1)(西德:2)(西德:3)(西德:4)(西德:2)(西德:3)(西德:5)(西德:2)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:10)(西德:1)(西德:2)(西德:3)(西德:11)(西德:7)(西德:12)(西德:13)(西德:14)(西德:10)(西德:1)(西德:2)(西德:3)(西德:11)(西德:7)(西德:12)(西德:13)(西德:14)(西德:10)(西德:1)(西德:2)(西德:3)(西德:11)(西德:7)(西德:12)(西德:13)(西德:10)(西德:15)(西德:7)(西德:3)(西德:2)(西德:11)(西德:7)(西德:12)(西德:13)(西德:14)(西德:10)(西德:1)(西德:2)(西德:3)(西德:11)(西德:7)(西德:12)(西德:13)(西德:10)(西德:15)(西德:7)(西德:3)(西德:2)(西德:11)(西德:7)(西德:12)(西德:13)(西德:10)(西德:1)(西德:2)(西德:3)(西德:4)(西德:5)(西德:6)(西德:3)(西德:7)(西德:4)(西德:1)(西德:2)(西德:3)(西德:8)(西德:9)(西德:2)(西德:10)(西德:5)(西德:4)(西德:11)(西德:12)(西德:11)(西德:13)(西德:14)(西德:6)(西德:14)(西德:4)(西德:9)(西德:15)(西德:16)(西德:5)(西德:13)(西德:17)(西德:17)(西德:17)(西德:18)(西德:9)(西德:11)(西德:19)(西德:11)(西德:13)(西德:20)(西德:9)(西德:11)(西德:1)(西德:2)(西德:3)(西德:3)(西德:2)(西德:11)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:10)(西德:1)(西德:2)(西德:3)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:10)(西德:15)(西德:7)(西德:3)(西德:2)(西德:6)(西德:10)(西德:15)(西德:7)(西德:3)(西德:2)(西德:6)(西德:10)(西德:1)(西德:2)(西德:3)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:5)(西德:18)(西德:9)(西德:17)(西德:13)(西德:10)(西德:2)(西德:19)(西德:2)(西德:14)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:5)(西德:18)(西德:9)(西德:17)(西德:13)(西德:10)(西德:18)(西德:16)(西德:3)(西德:13)(西德:14)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:20)(西德:7)(西德:20)(西德:10)(西德:2)(西德:19)(西德:2)(西德:14)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:20)(西德:7)(西德:20)(西德:10)(西德:18)(西德:16)(西德:3)(西德:13)(西德:14)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:20)(西德:7)(西德:20)(西德:10)(西德:8)(西德:7)(西德:9)(西德:3)(西德:4)(西德:17)(西德:15)(西德:7)(西德:4)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:5)(西德:18)(西德:9)(西德:17)(西德:13)(西德:10)(西德:2)(西德:19)(西德:2)(西德:14)(西德:15)(西德:7)(西德:4)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:5)(西德:18)(西德:9)(西德:17)(西德:13)(西德:10)(西德:18)(西德:16)(西德:3)(西德:13)(西德:14)(西德:15)(西德:7)(西德:4)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:20)(西德:7)(西德:20)(西德:10)(西德:2)(西德:19)(西德:2)(西德:14)(西德:15)(西德:7)(西德:4)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:20)(西德:7)(西德:20)(西德:10)(西德:18)(西德:16)(西德:3)(西德:13)(西德:14)(西德:15)(西德:7)(西德:4)(西德:6)(西德:7)(西德:8)(西德:9)(西德:5)(西德:16)(西德:17)(西德:10)(西德:20)(西德:7)(西德:20)(西德:10)(西德:8)(西德:7)(西德:9)(西德:3)(西德:4)(西德:21)(西德:4)(西德:4)(西德:2)(西德:12)(西德:16)(西德:3)(西德:5)(西德:2)(西德:8)(西德:9)(西德:22)(西德:23)(西德:7)(西德:24)(西德:25)(西德:2)(西德:5)(西德:4)(西德:23)(西德:8)(西德:12)(西德:2)(西德:26)(西德:9)(西德:27)(西德:13)(西德:7)(西德:22)(西德:23)(西德:7)(西德:24)(西德:25)(西德:2)(西德:5)(西德:4)(西德:23)(西德:8)(西德:12)(西德:2)(西德:26)(西德:9)(西德:27)Figure1:UnigramSocialCuePCFGs(Johnsonetal.,2012)–shownisaparsetreeoftheinpututterance“wheresthepiggie”accompaniedwithsocialcueprefixes,indicatingthatthecaregiverisholdingapigtoywhilethechildislookingatit;atthesametime,adogtoyispresentinthescreen.learnword-objectmappingsandtoinvestigatetheroleofsocialinformation(cueslikeeye-gazeandpointing)inachildlanguageacquisitiontask.Ourmaincontributionliesinthenovelintegra-tionofexistingtechniquesandalgorithmsinparsingandgrammarinductiontoofferacompletesolutionforsimultaneouslymodelinggroundedlanguageatthesentenceanddiscourselevels.Specifically,我们:(1)usetheEarleyalgorithmtoexploitthespecialstructureofourgrammars,whicharedeterministicorhaveatmostboundedambiguity,toachieveap-proximatelylinearparsingtime;(2)suggestarescal-ingapproachthatenablesustobuildaPCFGparsercapableofhandlingverylongstringswiththou-sandsoftokens;和(3)employVariationalBayesforgrammaticalinferencetoobtainbettergrammarsthanthosegivenbytheEMalgorithm.Byparsingentirediscoursesatonce,weshedlightonascientificallyinterestingquestionaboutwhythechild’sowngazeisapositivecueforwordlearn-ing(Johnsonetal.,2012).Ourdataprovidesupportforthehypothesis(frompreviouswork)thatcare-givers“followin”:theynameobjectsthatthechildisalreadylookingat(TomaselloandFarrar,1986).Inaddition,ourdiscoursemodelproducesaperfor-manceimprovementinalanguageacquisitiontaskandyieldsgooddiscoursesegmentationscomparedwithhumanannotators.2RelatedWorkSupervisedsemanticparsers.Previousworkhasdevelopedsupervisedsemanticparserstomapsen-tencestomeaningrepresentationsofvariousforms,includingmeaninghierarchies(Luetal.,2008)和,mostdominantly,λ-calculusexpressions(Zettle-moyerandCollins,2005;Zettlemoyer,2007;WongandMooney,2007;Kwiatkowskietal.,2010).Theseapproachesrelyontrainingdataofannotatedsentence-meaningpairs,however.Suchdataarecostlytoobtainandarequitedifferentfromtheex-perienceoflanguagelearners.GroundedLanguageLearning.Incontrasttose-manticparsers,groundedlanguagelearningsystemsaimtolearnthemeaningsofwordsandsentencesgivenanobservedworldstate(YuandBallard,2004;GorniakandRoy,2007).Agrowingbodyofworkinthisfieldemploysdistincttechniquesfromawidevarietyofperspectivesfromtext-to-recordalign-mentusingstructuredclassification(BarzilayandLapata,2005;SnyderandBarzilay,2007),iterativeretraining(Chenetal.,2010),andgenerativemodelsofsegmentationandalignment(Liangetal.,2009)totext-to-interactionmappingusingreinforcementlearning(Branavanetal.,2009;VogelandJuraf-sky,2010),graphicalmodelsemanticsrepresenta-tion(Tellexetal.,2011a;Tellexetal.,2011b),andCombinatoryCategorialGrammar(ArtziandZettle-moyer,2013).Anumberofsystemshavealsousedalternativeformsofsupervision,includingsentencespairedwithresponses(Clarkeetal.,2010;Gold-wasserandRoth,2011;Liangetal.,2011)andnosupervision(PoonandDomingos,2009;金子-
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
我
A
C
_
A
_
0
0
2
3
0
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
317
wasseretal.,2011).Recentworkhasalsointroducedanalternativeapproachtogroundedlearningbyreducingittoagrammaticalinferenceproblem.B¨orschingeretal.(2011)castedtheproblemoflearningasemanticparserasaPCFGinductiontask,achievingstate-oftheartperformanceintheRoboCupdomain.KimandMooney(2012)extendedthetechniquetomakeittractableformorecomplexproblems.Later,KimandMooney(2013)adapteddiscriminativererank-ingtothegroundedlearningproblemusingaformofweaksupervision.Weemploythisgeneralgram-maticalinferenceapproachinthecurrentwork.ChildrenLanguageAcquisition.Inthecontextoflanguageacquisition,Franketal.(2008)proposedasystemthatlearnedwordsandjointlyinferredspeakers’intendedreferent(utterancetopic)usinggraphicalmodels.Johnsonetal.(2012)usedgram-maticalinferencetodemonstratetheimportanceofsocialcuesinchildren’searlywordlearning.Weex-tendthisbodyofworkbycapturingdiscourse-baseddependenciesamongutterancesratherthantreatingeachutteranceindependently.DiscourseParsing.Asubstantialliteraturehasex-aminedformalrepresentationsofdiscourseacrossawidevarietyoftheoreticalperspectives(MannandThompson,1988;SchaandPolanyi,1988;Hobbs,1990;LascaridesandAsher,1993;KnottandSanders,1997).Althoughmuchofthisworkwashighlyinfluential,马可(1997)’sworkondis-courseparsingbroughtthistasktospecialpromi-nence.Sincethen,moreandmoresophisticatedmodelsofdiscourseanalysishavebeendeveloped:,例如,(马可,1999;SoricutandMarcu,2003;Forbesetal.,2003;Polanyietal.,2004;BaldridgeandLas-carides,2005;SubbaandDiEugenio,2009;Her-naultetal.,2010;Linetal.,2012;FengandHirst,2012).Ourcontributiontoworkonthistaskistoexaminelatentdiscoursestructurespecificallyingroundedlanguagelearning.3AGroundedLearningTaskOurfocusinthispaperistodevelopcomputationalmodelsthathelpusbetterunderstandchildren’slan-guageacquisition.Thegoalistolearnboththelongtermlexiconofmappingsbetweenwordsandobjects(languagelearning)aswellastheintendedtopicofindividualutterances(languagecomprehen-sion).Weconsideracorpusofchild-directedspeechannotatedwithsocialcues,describedin(Franketal.,2013).Thereareatotalof4,763utterancesinthecorpus,eachofwhichisorthographically-transcribedfromvideosofcaregiversplayingwithpre-linguisticchildrenofvariousages(6,12,and18months)duringhomevisits.1Eachutterancewashand-annotatedwithobjectspresentinthe(non-linguistic)语境,e.g.dogandpig(Fig-ure1),togetherwithsetsofsocialcues,onesetperobject.Thesocialcuesdescribeobjectsthecare-giverislookingat(mom.eyes),holdingonto(mom.hands),orpointingto(mom.point);sim-ilarly,为了(child.eyes)和(child.hands).3.1Sentence-levelModelsMotivatedbytheimportanceofsocialinformationinchildren’searlylanguageacquisition(Carpenteretal.,1998),Johnsonetal.(2012)proposedajointmodelofnon-linguisticinformationincludingthephysicalcontextandsocialcues,andthelinguis-ticcontentofindividualutterances.Theyframedthejointinferenceproblemofinferringword-objectmappingsandinferringsentencetopicsasagram-marinductiontaskwhereinputstringsareutterancesprefixedwithnon-linguisticinformation.Objectspresentinthenon-linguisticcontextofanutteranceareconsidereditspotentialtopics.Thereisalsoaspecialnulltopic,没有任何,toindicatenon-topicalut-terances.Thegoalofthemodelisthentoselectthemostprobabletopicforeachutterance.Top-levelrules,Sentence→TopictWordst(unigramPCFG)orSentence→TopictCollocst(collocationAdaptorGrammar),aretai-loredtolinkthetwomodalities(trangesoverT′,thesetofallavailabletopics(时间)andNone).Theserulesenforcesharingoftopicsbetweenprefixes(Topict)andwords(WordstorCollocst).Eachwordintheutteranceisdrawnfromeitheratopic-specificdistributionWordtorageneral“null”dis-tributionWordNone.AsillustratedinFigure1,theselectedtopic,猪,ispropagateddowntotheinputstringthroughtwopaths:(A)throughtopicalnodesuntilanobjectis1Caregiversweregivenpairsoftoystoplaywith,e.g.astuffeddogandpig,orawoodencarandtruck.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
我
A
C
_
A
_
0
0
2
3
0
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
318
reached,inthiscasethe.pigobject,和(乙)throughlexicalnodestotopicalwordtokens,e.g.piggie.So-cialcuesarethengeneratedbyaseriesofbinarydecisionsasdetailedinJohnsonetal.(2012).Thekeyfeatureofthesegrammarsisthatparameterin-ferencecorrespondsbothtolearningword-topicre-lationsandlearningthesalienceofsocialcuesingroundedlearning.Inthecurrentwork,werestrictourattentiontoonlytheunigramPCFGmodeltofocusoninvesti-gatingtheroleoftopiccontinuity.Unliketheap-proachofJohnsonetal.(2012),whichusesMarkovChainMonteCarlotechniquestoperformgram-maticalinference,weexperimentwithVariationalBayesmethods,detailedinSection6.3.2ADiscourse-levelModelTopiccontinuity—thetendencytogrouputterancesintocoherentdiscoursesaboutasingletopic—maybeanimportantsourceofinformationforchildrenlearningthemeaningsofwords(Franketal.,2013).Toaddressthisissue,weconsideranewdiscourse-levelmodelofgroundedlanguagethatcapturesde-pendenciesbetweenutterances.Bylinkingmultipleutterancesinasingleparse,ourproposedgrammati-calformalismisabigramMarkovprocessthatmod-elstransitionsamongutterancetopics.OurgrammarstartswitharootsymbolDiscourse,whichthenselectsastartingtopicthroughasetofdiscourseinitialrules,Discourse→Discoursetfort∈T′.EachoftheDiscoursetnodesgeneratesanutter-anceofthesametopic,andadvancesintoothertopicsthroughtransitionrules,Discourset→SentencetDiscourset′fort′∈T′.Dis-coursesterminatebyendingrules,Discourset→Sentencet.OtherrulesintheunigramPCFGmodelbyJohnsonarereusedexceptforthetop-levelrulesinwhichwereplacethenon-terminalSentencebytopic-specificonesSentencet.3.3ParsingDiscoursesandChallengesUsingadiscourse-levelgrammar,wemustparseaconcatenationofalltheutterances(withannota-tions)ineachconversation.Thisconcatenationre-sultsinanextremelylongstring:inthesocial-cuecorpus(Franketal.,2013),theaveragelengthoftheseper-recordingconcatenationsis2152tokens(σ=972).Parsingsuchstringsposesmanychal-lengesforexistingalgorithms.ForfamiliaralgorithmssuchasCYK,runtimequicklybecomesenormous:thetimecomplexityofCYKisO(n3)foraninputoflengthn.Fortunately,wecantakeadvantageofaspecialstructuralprop-ertyofourgrammars.Theshapeoftheparsetreeiscompletelydeterminedbytheinputstring;theonlyvariationisinthetopicannotationsinthenonter-minallabels.Soeventhoughthenumberofpossi-bleparsesgrowsexponentiallywithinputlengthn,thenumberofpossibleconstituentsgrowsonlylin-earlywithinputlength,andthepossibleconstituentscanbeidentifiedfromtheleftcontext.2Thesecon-straintsensurethattheEarleyalgorithm3(Earley,1970)willparseaninputoflengthnwiththisgram-marintimeO(n).Asecondchallengeinparsingverylongstringsisthattheprobabilityofaparseistheproductoftheprobabilitiesoftherulesinvolvedinitsderivation.Asthelengthofaderivationgrowslinearlywiththelengthoftheinput,theparseprobabilitiesdecreaseexponentiallyasafunctionofsentencelength,caus-ingfloating-pointunderflowoninputsofevenmod-eratelength.Thestandardmethodforhandlingthisistocomputelogprobabilities(whichdecreaselin-earlyasafunctionofinputlength,ratherthanex-ponentially),butasweexplainlater(Section5),wecanusetheabilityoftheEarleyalgorithmtocom-puteprefixprobabilities(Stolcke,1995)torescaletheprobabilityoftheparseincrementallyandavoidfloating-pointunderflows.Inthenextsection,weprovidebackgroundin-formationontheEarleyalgorithmforPCFGs,theprefixprobabilityschemeweuse,andtheinside-outsidealgorithmintheEarleycontext.4Background4.1EarleyAlgorithmforPCFGsTheEarleyalgorithmwasdevelopedbyEarley(1970)andknowntobeefficientforcertainkindsofCFGs(AhoandUllman,1972).AnEarleyparser2Theprefixmarkers#and##andthetopicmarkerssuchas“.dog”enablealeft-to-rightparsertounambiguouslyiden-tifyitslocationintheinputstring.3Inordertoachievelineartimetheparsingchartmusthavesuitableindexing;seeAhoandUllman(1972),狮子座(1991)andAycockandHorspool(2002)fordetails.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
我
A
C
_
A
_
0
0
2
3
0
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
319
constructsleft-mostderivationsofstrings,usingdot-tedproductionstokeeptrackofpartialderivations.Specifically,eachstateinanEarleyparserisrep-resentedas[我,r]:X→α.βtoindicatethatinputsymbolsxl,…,xr−1havebeenprocessedandtheparserisexpectingtoexpandβ.Statesaregen-eratedontheflyusingthreetransitionoperations:predict(addstatestocharts),scan(shiftdotsacrossterminals),andcomplete(mergetwostates).Fig-ure2showsanexampleofacompletionstepwhichalsoillustratestheimplicitbinarizationautomati-callydoneinEarleyalgorithm.(西德:1)(西德:1)(西德:2) (西德:4) (西德:2)(西德:5)(西德:1)(西德:1)(西德:2) (西德:2)(西德:4) (西德:5)(西德:3)(西德:1)(西德:6)(西德:2)(西德:7)(西德:8)(西德:9)Figure2:Completionstep–mergingtwostates[我,米]:X→α.Yβand[米,r]:Y→ν.toproduceanewstate[我,r]:X→αY.β.InordertohandlePCFGs,Stolcke(1995)extendstheEarleyparsingalgorithmtointroducetheno-tionofanEarleypathbeingasequenceofstateslinkedbyEarleyoperations.Byestablishingaone-to-onemappingbetweenpartialderivationsandEar-leypaths,Stolckecouldthenassigneachpathaderivationprobability,thatistheproductoftheallruleprobabilitiesusedinthepredictedstatesofthatpath.Here,eachproductionX→νcorrespondstoapredictedstate[我,我]:X→.ν.Besidesparsing,beingabletocomputestringandprefixprobabilitiesbysummingderivationprobabil-itiesisalsoofgreatimportance.Tocomputethesesumsefficiently,eachEarleystateisattachedwithaforwardandaninnerprobabilitywhichareupdatedincrementallyasnewstatesarespawnedbythethreetransitionoperations.4.2ForwardandPrefixProbabilitiesIntuitively,theforwardprobabilityofastate[我,r]:X→α.βistheprobabilityofanEarleypaththroughthatstate,generatinginputuptopositionr-1.ThisprobabilitygeneralizesasimilarconceptinHMMandlendsitselftothecomputationofpre-fixprobabilities,sumsofforwardprobabilitiesoverscannedstatesyieldingaprefixx.Computingprefixprobabilitiesisimportantbe-causeitenablesprobabilisticpredictionofpos-siblefollow-wordsxi+1asP(xi+1|x0…希)=P(x0…xixi+1)磷(x0…希)(JelinekandLafferty,1991).Theseconditionalprobabilitiesallowestimationofthein-crementalcostsofastackdecoder(Bahletal.,1983).在(HuangandSagae,2010),aconceptu-allysimilarprefixcostisdefinedtoorderstatesinabeamsearchdecoder.Moreover,thenegativelog-arithmofsuchconditionalprobabilitiesaretermedassurprisalvaluesinthepsycholinguisticsliterature(e.g.,Hale,2001;征收,2008),todescribehowdif-ficultawordisinagivencontext.Interestingly,weshowthatprefixprobabilitiesleadustoconstructaparserthatcouldparseextremelylongstringsnext.4.3InsideOutsideAlgorithmToextendtheInsideOutside(IO)algorithm(贝克,1979)totheEarleycontext,Stolckeintroducedin-nerandouterprobabilitieswhichgeneralizethein-sideandoutsideprobabilitiesintheIOalgorithm.Specifically,theinnerprobabilityofastate[我,r]:X→α.βistheprobabilityofgeneratinganinputsubstringxl,…,xr−1fromanon-terminalXusingaproductionX→αβ.4(西德:1)(西德:1)(西德:2) (西德:4) (西德:2)(西德:5)(西德:1)(西德:1)(西德:2) (西德:2)(西德:4) (西德:5)(西德:3)(西德:1)(西德:6)(西德:2)(西德:7)(西德:8)(西德:9)Figure3:Innerandouterprobabilities.TheouterprobabilityofX→α.Yβisasumofallproductsofitsparentouterprobability(X→αY.β)anditssiblinginnerprobability(Y→ν.).相似地,theouterproba-bilityofY→ν.isderivedfromtheouterprobabilityofX→αY.βandtheinnerprobabilityofX→α.Yβ.Onceallinnerprobabilitieshavebeenpopulatedinaforwardpass,outerprobabilitiesarederivedbackward,startingfromtheouterprobabilityofthegoalstate[0,n]:→S.being1.Here,eachEarleystateisassociatedwithanouterprobabilitywhichcomplementstheinnerprobabilitybyreferringpre-ciselytothoseparts(notcoveredbythecorrespond-inginnerprobability)ofthecompletepathsgenerat-ingtheinputstringx.Theimplicitbinarizationin4SummingupinnerprobabilitiesofallstatesY→ν.exactlyyieldsBaker’sinsideprobabilityforY.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
我
A
C
_
A
_
0
0
2
3
0
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
320
Earleyparsingallowsouterprobabilitiestobeaccu-mulatedinasimilarwayasitscounterpartintheIOalgorithm(seeFigure3).ThesequantitiesallowforefficientgrammaticalinferenceinwhichtheexpectedcountofeachruleX→λgivenastringxiscomputedas:C(X→λ|X)=∑s:[我,r]X→.λouter(s)·inner(s)磷(S⇒∗x).(1)5ARescalingApproachforParsingOurparseroriginatedfromtheprefixprobabilityparserbyLevy(2008),buthasdivergedmarkedlysincethen.Theparser,calledEarleyx5,isca-pableofproducingViterbiparsesandperforminggrammaticalinductionbasedontheexpectation-maximizationandvariationalBayesalgorithms.Totackletheunderflowproblemposedwhenparsingdiscourses(§3.3),weborrowtherescal-ingconceptfromHMMs(Rabiner,1990)toextendtheprobabilisticEarleyalgorithm.Specifically,theprobabilityofeachEarleypathisscaledbyacon-stantcieachtimeitpassesthroughascannedstategeneratingtheinputsymbolxi.Infact,eachpathpassesthrougheachscannedstateexactlyonce,soweconsistentlyaccumulatescalingfactorsfortheforwardandinnerprobabilitiesofastate[我,r]:X→α.βasc0…cr−1andcl…cr−1respectively.Arguably,themostintuitivechoiceofthescal-ingfactorsaretheprefixprobabilities,whichessen-tiallyresetstheprobabilityofanyEarleypathstart-ingfromanypositionito1.Concretely,wesetc0=1P(x0)andci=P(x0…xi−1)磷(x0…希)fori=1,…,n-1wherenistheinputlength.Asnotedinsection§4.2,thelogarithmofcigivesusthesurprisalvaluefortheinputsymbolxi.Rescalingfactorsareonlyintroducedinthefor-wardpass,duringwhichtheouterprobabilityofastate[我,r]:X→α.βhasalreadybeenscaledbyfac-torsc0…cl−1cr…cn−1.6Moreimportantly,when5Parsercodeisavailableathttp://nlp.stanford.edu/˜lmthang/earleyx.6Theouterprobabilityofastateisessentiallytheproductofinnerprobabilitiescoveringallinputsymbolsoutsidethespanofthatstate.Forgrammarscontainingcyclicunitproductions,wealsoneedtomultiplywithtermsfromtheunit-productionrelationmatrix(Stolcke,1995).computingexpectedcounts,scalingfactorsintheouterandinnertermscanceloutwiththoseinthestringprobabilityinEq.(1),implyingthatruleprob-abilityestimationisunaffectedbyrescaling.5.1ParsingTimeonDenseGrammarsWecompareinTable1theparsingtime(ona2.4GHzXeonCPU)ofourparser(Earleyx)andLevy’s.Thetaskistocomputesurprisalvaluesfora22-wordsentenceoveradensegrammar.7Giventhatourparserisnowcapableofperformingscalingtoavoidunderflow,weavoidconvertingprobabili-tiestologarithmicform,whichyieldsaspeedupofabout4timescomparedtoLevy’sparser.ParserTime(s)(征收,2008)640Earleyx+scaling145Table1:Parsingtime(densegrammars)–tocomputesurprisalvaluesfora22-wordsentenceusingLevy’sparserandours(Earleyx).5.2ParsingTimeonSparseGrammars2004006008001000120014001600180020001234567Parsing Time on Sparse Grammars (西德:9)# Wordsseconds(西德:9)Figure4:Parsingtime(sparsegrammars)–tocomputeViterbiparsesforsentencesofincreasinglengths.Figure4showsthetimetaken(asafunctionoftheinputlength)forEarleyxtocomputeaViterbiparsesoveroursparsegrammars(§3.2).Theplotconfirmedouranalysisinthatthespecialstructureofourgrammarsyieldsapproximatelylinearparsingtimeintheinputlength(see§3.3).7MLEestimatedfromtheEnglishPennTreebank.
我
D
哦
w
n
哦
A
d
e
d
F
r
哦
米
H
t
t
p
:
/
/
d
我
r
e
C
t
.
米
我
t
.
e
d
你
/
t
A
C
我
/
我
A
r
t
我
C
e
–
p
d
F
/
d
哦
我
/
.
1
0
1
1
6
2
/
t
我
A
C
_
A
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
我
A
C
_
A
_
0
0
2
3
0
p
d
.
F
乙
y
G
你
e
s
t
t
哦
n
0
8
S
e
p
e
米
乙
e
r
2
0
2
3
321
6GrammarInductionWeemployaVariationalBayes(VB)approachtoperformgrammaticalinferenceinsteadofthestan-dardInsideOutside(IO)algorithm,orequivalentlytheExpectationMaximization(EM)algorithm,forseveralreasons:(1)ithasbeenshowntobelesslikelytocauseover-fittingforPCFGsthanEM(KuriharaandSato,2004)和(2)implementation-wise,VBisastraightforwardextensionfromEMastheybothsharethesameprocessofcomputingtheexpectedcounts(theIOpart)andonlydifferathowruleprobabilitiesarereestimated.Atthesametime,VBhasalsobeendemonstratedtodowellonlargedatasetsandiscompetitivewithGibbssamplerswhilehavingthefastestconvergencetimeamongtheseestimators(GaoandJohnson,2008).TherulereestimationinVBiscarriedasfol-lows.LetαrbethepriorhyperparameterofarulerintherulesetRandcrbeitsexpectedcountaccumulatedovertheentirecorpusafteranIOiteration.Theposteriorhyperparameterforrisα∗r=αr+cr.Letψbethedigammafunction,theruleparameterupdateformulais:θr:X→λ=exp[ψ(α∗r)−ψ(∑r′:X→λ′α∗r′)].WhereasIOminimizesthenegativelog-likelihoodoftheobserveddata(句子),-logp(X),VBminimizesaquantitycalledfreeenergy,whichwewilluselatertomonitorcon-vergence.Herexdenotestheobserveddataandθrepresentsthemodelparameters(PCFGruleprobabilities).下列的(KuriharaandSato,2006),wecomputethefreeenergyas:F(X,我)=−logp(X)+∑X∈NlogΓ(∑r:X→λα∗r)Γ(∑r:X→λαr)−∑r∈R(logΓ(α∗r)Γ(αr)+crlogθr)whereΓdenotesthegammafunction.6.1SparseDirichletPriorsInourapplication,sinceeachtopicshouldonlybeassociatedwithafewwordsratherthantheentirevocabulary,weimposesparseDirichletpriorsovertheWordtdistributionsbysettingasymmetricpriorα<1forallrulesWordt→w(∀t∈T,w∈W),whereWisthesetofallwordsinthecorpus.Thisbiasesthemodeltoselectonlyafewrulespernon-terminalWordt.8Forallotherrules,auniformhy-perparametervalueof1isused.Weinitializedruleprobabilitieswithuniformdistributionsplusrandomnoise.ItisworthwhiletomentionthatsparseDirich-letpriorswereproposedinJohnson(2010)’sworkthatlearnsLatentDirchletAllocationtopicmodelsusingBayesianinferenceforPCFGs.7ExperimentsOurexperimentsapplysentence-anddiscourse-levelmodelstotheannotatedcorpusofchild-directedspeechdescribedinSection3.Eachmodelisevaluatedon(a)topicaccuracy—howmanyutter-ancesarelabeledwithcorrecttopics(includingthenull),(b)topicmetrics(f-scores/precision/recall)—howwellthemodelpredictsnon-nulltopicalutter-ances,(c)wordmetrics—howwellthemodelpre-dictstopicalwords,9and(d)lexiconmetrics—howwellwordtypesareassignedtothetopicthattheyattachtomostfrequently.Forexample,inFigure1,themodelassignstopicpigtotheentireutterance.Atthewordlevel,itlabelspiggiewithtopicpigandassignsnulltopictowheresandthe.See(Johnsonetal.,2012)formoredetailsofthesemetrics.InSection7.1,weexaminebaselinemodelsthatdonotmakeuseofsocialcues(motherandchild’seye-gazeandhandposition)todiscoverthetopic;thesebaselinesarecontrastedwitharangeofsocialcues(§7.2and§7.3).InSection7.4,weevaluatethediscoursestructuresdiscoveredbyourmodels.7.1BaselineModels(NoSocialCues)Tocreatebaselinesforlaterexperiments,weeval-uateourmodelswithoutsocialinformation.Wecomparesentence-levelmodelsusingthreedifferentinferenceprocedures—MarkovChainMonteCarlo(MCMC)(Johnsonetal.,2012),ExpectationMax-imization(EM),andVariationalBayes(VB)10—aswellasthediscourse-levelmodeldescribedabove.8ItisimportanttonotsparsifytheWordNonedistributionsinceWordNonecouldexpandintomanynon-topicalwords.9Topicsassignedbythemodelarecomparedwiththosegivenbythegolddictionaryprovidedby(Johnsonetal.,2012).10Todeterminethebestsparsityhyperparameterαforlexicalrules(§6.1),weperformedalinesearchover{1,0.1,0.01,0.001,0.0001}.Asαdecreases,performanceim-proves,peakingat0.001,thevalueusedforallreportedresults
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
l
a
c
_
a
_
0
0
2
3
0
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
322
ModelTopicWordLexiconEnergyAcc.F1PRF1PRF1PRMCMC49.0760.6448.6780.4329.5017.6390.3114.838.1088.10VB53.1460.8950.5376.5925.6214.9489.9116.719.2585.71156719discourse51.0259.4048.6076.3523.8613.8287.3315.058.2783.33150023discourse+init55.7860.9152.1573.2229.7517.9187.6521.1111.9590.48149458Table2:Social-cuemodels.Comparisonofsentence-anddiscourse-levelmodels(init:initializedfromtheVBsentence-levelmodel)overfullmetrics.FreeenergiesareshowntocompareVB-basedmodels.ModelAcc.TopicF1WordF1LexiconF1MCMC33.9540.4420.0710.37EM32.0839.7613.316.09VB39.6439.2217.4012.27discourse40.6342.0119.3112.72Table3:Baseline(non-social)models.Comparisonofsentence-levelmodels(MCMC(Johnsonetal.,2012),EM,VB)andthediscourse-levelmodel.ResultsinTable3suggestthatincorporatingtopiccontinuitythroughthediscoursemodelboostsperformancecomparedtosentence-levelmodels.Withinsentence-levelmodels,EMisinferiortobothMCMCandVB(inaccordancewiththeconsensusthatEMislikelytooverfitforPCFGs).ComparingVBandMCMC,VBissignificantlybetterattopicaccuracybutisworseattopicF1.Thisresultsug-geststhatVBpredictsthatmoreutterancesarenon-topicalcomparedwithMCMC,perhapsexplainingwhyMCMChasthehighestwordF1.Nevertheless,unlikeVB,thediscoursemodeloutperformsMCMCinalltopicmetrics,indicatingthattopiccontinuityhelpsinpredictingbothnullandtopicalutterances.Thediscoursemodelisalsocapableofcaptur-ingtopicaltransitions.Examiningoneinstanceofalearnedgrammarrevealsthatthedistributionun-derDiscoursetisoftendominatedbyafewmajortransitions.Forexample,cartendstohavetransi-tionsintocar(0.72)andtruck(0.19);whilepigpreferstotransitintopig(0.69)anddog(0.24).Theselearnedtransitionsnicelyrecoverthestruc-tureofthetaskthatcaregiversweregiven:toplaywithtoypairslikecar/truckandpig/dog.7.2Social-cueModelsWenextexplorehowtopiccontinuityinteractswithsocialinformationviaasetofsimulationsmirroringthoseintheprevioussection.ResultsareshowninTable2.Forthesentence-levelmodelsusingsocialcues,VBnowoutperformsMCMCintopicaccuracyandF1,aswellaslexiconevaluations,suggestingthatVBisoverallquitecompetitivewithMCMC.11Turningtothediscoursemodels,socialinforma-tionandtopiccontinuitybothindependentlyboostlearningperformance(asevidencedinJohnsonetal.(2012)andinSection7.1).Nevertheless,jointinfer-enceusingbothinformationsources(discourserow)resultedinaperformancedecrement.Ratherthanreflectingissuesinthemodelitself,perhapsthein-creasedcomplexityoftheinferenceproblemmighthaveledtothisperformancedecrement.Totestthisexplanation,weinitializedourdiscourse-levelmodelwiththeVBsentence-levelmodel.Resultsareshowninthediscourse+initrow.Withasentence-levelinitialization,perfor-manceimprovedsubstantially,yieldingthebestre-sultsovermostmetrics.Inaddition,thediscoursemodelwithsentence-levelinitializationachievedlowerfreeenergythanthestandardinitializationdis-coursemodel.Bothoftheseresultssupportthehy-pothesisthatinitializationfacilitatedinferenceinthemorecomplexdiscoursemodel.Fromacognitivescienceperspective,thissortofresultmaypointtotheutilityofbeginningthetaskofdiscoursesegmen-tationwithsomeinitialsentence-levelexpectations.7.3EffectsofIndividualSocialCuesTheimportanceofparticularsocialcuesandtheirrelationshiptodiscoursecontinuityisanadditionaltopicofinterestfromthecognitivescienceper-spective(Franketal.,2013).Returningtooneofthequestionsthatmotivatedthiswork,wecanuse11Detailedbreakdownofwordf-scoresrevealsthatMCMCismuchbetteratprecision,indicatingthatVBpredictsmorewordsastopicalthanMCMC.Anexplanationforsucheffectisthatweusethesameαforalllexicalrules,whichresultsinsuboptimalsparsitylevelsforWordtdistributions.ForMCMC,Johnsonetal.(2012)usedtheadaptorgrammarsoftwaretolearnthehyperparametersautomaticallyfromdata.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
l
a
c
_
a
_
0
0
2
3
0
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
323
allno.child.eyesno.child.handsno.mom.eyesno.mom.handsno.mom.pointMCMC49.1/60.6/29.5/14.838.4/46.6/21.5/11.149.1/60.6/29.6/15.348.0/59.7/29.0/15.548.7/60.0/29.3/15.648.8/60.3/29.3/15.6VB53.1/60.9/25.62/16.7149.3/56.0/22.6/15.152.9/60.4/26.2/16.251.5/59.1/24.6/16.351.9/59.2/25.3/16.352.9/60.6/25.5/16.6discourse+init55.8/60.9/29.8/21.153.7/59.2/27.8/19.7∗+55.2/60.7/29.0/21.4+54.7/60.0/29.0/21.655.2/60.1/29.1/21.455.6/60.8/29.5/21.7Table4:Socialcueinfluence.Ablationtestresultsacrossmodelswithoutdiscourse(MCMC,VB)andwithdiscourse(discourse+init).Westartwiththefullsetofsocialcuesanddroponeatatime.Eachcellcontainsresultsformetrics:topicaccuracy/topicF1/wordF1/lexiconF1.Forrowdiscourse+init,wecomparemodelswith/withoutasocialcueusingchi-squaretestsanddenotestatisticallysignificantresults(p<.05)attheutterance(∗)andword(+)levels.nonechild.eyeschild.handsmom.eyesmom.handsmom.pointMCMC34.0/40.4/20.1/10.445.7/57.3/28.9/13.634.0/40.1/20.1/9.733.8/40.2/19.9/9.735.6/42.8/19.8/10.030.6/35.5/18.1/9.2VB39.6/39.2/17.4/12.2747.2/53.0/21.9/13.943.0/45.8/15.4/12.942.9/46.5/14.6/12.441.1/43.8/17.1/12.439.7/39.7/17.5/13.4discourse40.7/41.8/19.2/12.147.8/55.4/22.8/14.2∗+44.6/50.8/20.3/13.1∗+44.7/50.1/21.7/14.3∗+42.7/46.4/19.0/11.6+38.7/40.2/16.6/11.9∗+Table5:Socialcueinfluence.Add-onetestresultsacrossmodelswithoutdiscourse(MCMC,VB)andwithdiscourse(discourse).Westartwithnosocialinformationandaddonecueatatime.Eachcellcontainsresultsformetrics:topicaccuracy/topicF1/wordF1/lexiconF1.Forrowdiscourse,wecomparemodelswith/withoutasocialcueusingchi-squaretestsanddenotestatisticallysignificantresults(p<.05)attheutterance(∗)andword(+)levels.ourdiscoursemodeltoanswerthequestionabouttherolethatthechild.eyescueplaysinchild-directeddiscourses.Johnsonetal.(2012)raisedtwohypothesesthatcouldexplaintheimportanceofchild.eyesasasocialcue:(1)caregivers“fol-lowin”onthechild’sgaze:theytendtotalkaboutwhatthechildislookingat(Baldwin,1993),or(2)thechild.eyescueencodesthetopicofthepre-vioussentence,inadvertentlygivinganon-discoursemodelaccesstorudimentarydiscourseinformation.Toaddressthisquestion,weconducttwotests:(1)ablation–eliminatingeachsocialcueinturn(e.g.child.eyes),and(2)add-one,usingasin-glesocialcueperturn.Table4and5showcorre-spondingresultsformodelswithoutdiscourse(theMCMCandVBsentence-levelmodels)andwithdiscourse(discourse+initfortheablationtestanddiscoursefortheadd-onetest).Weobservesimi-lartrendstoJohnsonetal.(2012):thechildsgazeisthemostimportantcue.Removingitfromthefullmodelwithallsocialcuesoraddingittothebasemodelwithnocuesbothresultinthelargestperfor-mancechange;inbothcasesthischangeisstatisti-callyreliable.12Thelargeperformancedifferencesforchild.eyesareconsistentwiththehypothe-sisthatcaregiversarefollowingin,ordiscussingtheobjectthatchildrenareinterestedin–evencontrol-12Itissomewhatsurprisingwhenchild.eyehasmuchlessinfluenceonVBthanonMCMCintheablationtest.Thoughre-sultsintheadd-onetestrevealthatVBgeneralizesmuchbetterthanMCMCwhenpresentedwithasinglesocialcue,itremainsinterestingtofindoutinternallywhatcausesthedifference.lingforthecontinuityofdiscourse,aconfoundinpreviousanalyses.Inotherwords,theimportanceofchild.eyesinthediscoursemodelsuggeststhatthiscueencodesusefulinformationinadditiontotheintersententialdiscoursetopic.7.4DiscourseStructureEvaluationWhilethediscoursemodelperformswellusingmet-ricsfrompreviouswork,thesemetricsdonotfullyreflectanimportantstrengthofthemodel:itsabil-itytocaptureinter-utterancestructure.Forexam-RawDiscourseUtterancecarcarcomehereletsfindthecarcartherecarcaristhatacarcarcarthecargoesvroomvroomvroomTable6:Topicannotationexamples.raw(previousmetrics)anddiscourse(newmetrics).ple,considerthesequenceofutterancesinTable6.Ourpreviousevaluationisbasedontherawannota-tion,whichlabelsastopicalonlyutterancescontain-ingtopicalwordsorpronounsreferringtoanobject.Asaresult,classifying“there”ascarisincorrect.Fromtheperspectiveofahumanlistener,however,“there”ispartofabroaderdiscourseaboutthecar,andlabelingitwiththesametopiccapturesthefactthatitencodesusefulinformationforlearners.Todifferentiatethesecases,FrankandRohde(underre-view)addedanewsetofannotations(tothedatasetusedinSection7)basedonthediscoursestructureperceivedbyhuman,similartocolumndiscourse,.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
l
a
c
_
a
_
0
0
2
3
0
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
324
Weutilizethesenewannotationstojudgetopicspredictedbyourdiscoursemodelandadoptprevi-ousmetricsfordiscoursesegmentationevaluation:a=b,asimpleproportionequivalenceofdiscourseassignments;pk,awindowmethod(Beefermanetal.,1999)tomeasuretheprobabilityoftworandomutterancescorrectlyclassifiedasbeinginthesamediscourse;andWindowDiff(PevznerandHearst,2002),animprovedversionofpkwhichgives“par-tialcredit”toboundariesclosetothecorrectones.ResultsinTable7demonstratethatourmodelisinbetteragreementwithhumanannotation(model-human)thantherawannotation(raw-human)acrossallmetrics.Asisvisiblefromthelimitedchangeinthea=bmetric,relativelyfewtopicassignmentsarealtered;yetthesealterationscreatemuchmorecoherentdiscoursesthatallowforfarbettersegmen-tationperformanceunderpkandWindowDiff.raw-humanmodel-humana=b63.669.3pk57.083.6WindowDiff36.261.2Table7:Discourseevaluation.Singleannotatorsample,comparisonbetweentopicsassignedbytherawannota-tion,ourdiscoursemodel,andahumancoder.Toputanupperboundonpossiblediscourseseg-mentationresults,wefurtherevaluatedperformanceonasubsetof634utterancesforwhichmultiplean-notationswerecollected.ResultsinTable8demon-stratethatourmodelpredictsdiscoursetopics(m-h1,m-h1)atalevelquiteclosetothelevelofagreementbetweenhumanannotators(columnh1-h2).r-h1r-h2m-h1m-h2h1-h2a=b60.165.670.472.481.7pk50.751.885.184.989.7WindowDiff29.030.160.166.972.7Table8:Discourseevaluation.Multipleannotatorsam-ple,comparisonbetweenrawannotations(r),ourmodel(m),andtwoindependenthumancoders(h1,h2).8ConclusionandFutureWorkInthispaper,weproposedanovelintegrationofexistingtechniquesinparsingandgrammarinduc-tiontoofferacompletesolutionforsimultaneouslymodelinggroundedlanguageatthesentenceanddiscourselevels.Specifically,weusedtheEar-leyalgorithmtoexploitthespecialstructureofourgrammarstoachieveapproximatelylinearparsingtime,introducedarescalingapproachtohandleverylonginputstrings,andutilizedVariationalBayesforgrammarinductiontoobtainbettersolutionsthantheExpectationMaximizationalgorithm.Bytransformingagroundedlanguagelearningproblemintoagrammaticalinferencetask,weusedourparsertostudyhowdiscoursestructurecouldfacilitatechildren’slanguageacquisition.Inad-dition,weinvestigatetheinteractionbetweendis-coursestructureandsocialcues,bothimportantandcomplementarysourcesofinformationinlan-guagelearning(Baldwin,1993;Franketal.,2013).Wealsoexaminedwhyindividualchildren’sgazewasanimportantpredictorofreferenceinprevi-ouswork(Johnsonetal.,2012).Usingablationtests,weshowedthatinformationprovidedbythechild’sgazeisstillvaluableeveninthepresenceofdiscoursecontinuity,supportingthehypothesisthatparents“followin”ontheparticularfocusofchil-dren’sattention(TomaselloandFarrar,1986).Lastly,weshowedthatourmodelscanproduceaccuratediscoursesegmentations.Oursystem’sout-putisconsiderablybetterthantherawtopicanno-tationsprovidedintheprevioussocialcuecorpus(Franketal.,2013)andisingoodagreementwithdiscoursetopicsassignedbyhumanannotatorsinFrankandRohde(underreview).Inconclusion,althoughpreviousworkongroundedlanguagelearninghastreatedindividualutterancesasindependententities,wehaveshownthattheabilitytoincorporatediscourseinformationcanbequiteusefulforsuchproblems.Discoursecontinuityisanimportantsourceofinformationinchildrenlanguageacquisitionandmaybeavaluablepartoffuturegroundedlanguagelearningsystems.AcknowledgementsWethanktheTACLactioneditor,MarkSteed-man,andtheanonymousreviewersfortheirvalu-ablefeedback,aswellasChrisManningforhelpfuldiscussions.ThisresearchwassupportedundertheAustralianResearchCouncil’sDiscoveryProjectsfundingscheme(projectnumbersDP110102506andDP110102593).
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
l
a
c
_
a
_
0
0
2
3
0
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
325
ReferencesAlfredV.AhoandJefferyD.Ullman.1972.TheThe-oryofParsing,TranslationandCompiling;Volume1:Parsing.Prentice-Hall,EnglewoodCliffs,NewJersey.YoavArtziandLukeS.Zettlemoyer.2013.Weaklysu-pervisedlearningofsemanticparsersformappingin-structionstoactions.TransactionsoftheAssociationforComputationalLinguistics,1:49–62.JohnAycockandR.NigelHorspool.2002.Practicalear-leyparsing.TheComputerJournal,45(6):620–630.LalitR.Bahl,FrederickJelinek,andRobertL.Mercer.1983.Amaximumlikelihoodapproachtocontinu-ousspeechrecognition.IEEETransactionsonPatternAnalysisandMachineIntelligence,5(2):179–190.JamesK.Baker.1979.Trainablegrammarsforspeechrecognition.TheJournaloftheAcousticalSocietyofAmerica,65(S1):S132.JasonBaldridgeandAlexLascarides.2005.Proba-bilistichead-drivenparsingfordiscoursestructure.InCONLL.DareA.Baldwin.1993.Infants’abilitytoconsultthespeakerforcluestowordreference.JournalofChildLanguage,20:395–418.ReginaBarzilayandMirellaLapata.2005.Collectivecontentselectionforconcept-to-textgeneration.InHLT-EMNLP.DougBeeferman,AdamBerger,andJohnLafferty.1999.Statisticalmodelsfortextsegmentation.Ma-chineLearning,34(1-3):177–210.BenjaminB¨orschinger,BevanK.Jones,andMarkJohn-son.2011.Reducinggroundedlearningtaskstogram-maticalinference.InEMNLP.S.R.K.Branavan,HarrChen,LukeS.Zettlemoyer,andReginaBarzilay.2009.Reinforcementlearningformappinginstructionstoactions.InACL-IJCNLP.MalindaCarpenter,KatherineNagell,MichaelTomasello,GeorgeButterworth,andChrisMoore.1998.Socialcognition,jointattention,andcom-municativecompetencefrom9to15monthsofage.Monographsofthesocietyforresearchinchilddevelopment,63(4).DavidL.ChenandRaymondJ.Mooney.2008.Learningtosportscast:Atestofgroundedlanguageacquisition.InICML.DavidL.Chen,JoohyunKim,andRaymondJ.Mooney.2010.Trainingamultilingualsportscaster:Usingper-ceptualcontexttolearnlanguage.JournalofArtificialIntelligenceResearch,37:397–435.JamesClarke,DanGoldwasser,Ming-WeiChang,andDanRoth.2010.Drivingsemanticparsingfromtheworld’sresponse.InCoNLL.JayEarley.1970.Anefficientcontext-freeparsingalgo-rithm.CommunicationsoftheACM,13(2):94–102.VanessaWeiFengandGraemeHirst.2012.Text-leveldiscourseparsingwithrichlinguisticfeatures.InACL.KatherineForbes,EleniMiltsakaki,RashmiPrasad,AnoopSarkar,JoshiAravind,andBonnieWebber.2003.D-ltagsystem:Discourseparsingwithalexical-izedtreeadjoininggrammar.JournalofLogic,Lan-guageandInformation,12:261–279.MichaelC.FrankandHannahRohde.underreview.Markersoftopicaldiscourseinchild-directedspeech.MichaelC.Frank,NoahD.Goodman,andJoshB.Tenen-baum.2008.ABayesianframeworkforcross-situationalword-learning.AdvancesinNeuralInfor-mationProcessingSystems20.MichaelC.Frank,JoshuaB.Tenenbaum,andAnneFernald.2013.Socialanddiscoursecontributionstothedeterminationofreferenceincross-situationalwordlearning.LanguageLearningandDevelopment,9(1):1–24.JianfengGaoandMarkJohnson.2008.AcomparisonofBayesianestimatorsforunsupervisedHiddenMarkovModelPOStaggers.InEMNLP.DanGoldwasserandDanRoth.2011.Learningfromnaturalinstructions.InIJCAI.DanGoldwasser,RoiReichart,JamesClarke,andDanRoth.2011.Confidencedrivenunsupervisedsemanticparsing.InACL.PeterGorniakandDebRoy.2007.Situatedlanguageunderstandingasfilteringperceivedaffordances.Cog-nitiveScience,31(2):197–231.HugoHernault,HelmutPrendinger,DavidA.duVerle,andMitsuruIshizuk.2010.HILDA:Adiscourseparserusingsupportvectormachineclassification.Di-alogueandDiscourse,1(3):1–33.JerryR.Hobbs.1990.LiteratureandCognition.CSLILectureNotes21.LiangHuangandKenjiSagae.2010.Dynamicprogram-mingforlinear-timeincrementalparsing.InACL.FrederickJelinekandJohnD.Lafferty.1991.Compu-tationoftheprobabilityofinitialsubstringgenerationbystochasticcontext-freegrammars.ComputationalLinguistics,17(3):315–323.MarkJohnson,KatherineDemuth,andMichaelFrank.2012.Exploitingsocialinformationingroundedlan-guagelearningviagrammaticalreduction.InACL.MarkJohnson.2010.Pcfgs,topicmodels,adaptorgram-marsandlearningtopicalcollocationsandthestruc-tureofpropernames.InACL.JoohyunKimandRaymondJ.Mooney.2012.Unsu-pervisedpcfginductionforgroundedlanguagelearn-ingwithhighlyambiguoussupervision.InEMNLP-CoNLL.JoohyunKimandRaymondJ.Mooney.2013.Adaptingdiscriminativererankingtogroundedlanguagelearn-ing.InACL.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
3
0
1
5
6
6
6
7
7
/
/
t
l
a
c
_
a
_
0
0
2
3
0
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
326
AlistairKnottandTedSanders.1997.Theclassificationofcoherencerelationsandtheirlinguisticmarkers:Anexplorationoftwolanguages.JournalofPragmatics,30(2):135–175.KenichiKuriharaandTaisukeSato.2004.Anappli-cationofthevariationalbayesianapproachtoproba-bilisticcontextfreegrammars.InIJCNLPWorkshopBeyondShallowAnalyses.KenichiKuriharaandTaisukeSato.2006.Variationalbayesiangrammarinductionfornaturallanguage.InICGI.TomKwiatkowski,LukeS.Zettlemoyer,SharonGold-water,andMarkSteedman.2010.Inducingproba-bilisticccggrammarsfromlogicalformwithhigher-orderunification.InEMNLP.AlexLascaridesandNicholasAsher.1993.Temporalinterpretation,discourserelations,andcommonsenseentailment.LinguisticsandPhilosophy,16(5):437–493.JoopM.I.M.Leo.1991.Ageneralcontext-freeparsingalgorithmrunninginlineartimeoneverylr(k)gram-marwithoutusinglookahead.TheoreticalComputerScience,82(1):165–176.RogerLevy.2008.Expectation-basedsyntacticcompre-hension.Cognition,106(3):1126–1177.PercyLiang,MichaelI.Jordan,andDanKlein.2009.Learningsemanticcorrespondenceswithlesssupervi-sion.InACL-IJCNLP,pages91–99.PercyLiang,MichaelI.Jordan,andDanKlein.2011.Learningdependency-basedcompositionalsemantics.InAssociationforComputationalLinguistics(ACL).ZihengLin,HweeTouNg,andMin-YenKan.2012.Apdtb-styledend-to-enddiscourseparser.NaturalLan-guageEngineering,FirstView:1–34.WeiLu,HweeTouNg,WeeSunLee,andLukeS.Zettle-moyer.2008.Agenerativemodelforparsingnaturallanguagetomeaningrepresentations.InEMNLP.WilliamC.MannandSandraA.Thompson.1988.Rhetoricalstructuretheory:Towardafunctionalthe-oryoftextorganization.Text,8(3):243–281.DanielMarcu.1997.Therhetoricalparsingofnaturallanguagetexts.InACL.DanielMarcu.1999.Adecision-basedapproachtorhetoricalparsing.InACL.LevPevznerandMartiA.Hearst.2002.Acritiqueandimprovementofanevaluationmetricfortextsegmen-tation.ComputationalLinguistics,28(1):19–36.LiviaPolanyi,ChrisCuly,MartinVanDenBerg,GianLorenzoThione,andDavidAhn.2004.Arulebasedapproachtodiscourseparsing.InSigDIAL.HoifungPoonandPedroDomingos.2009.Unsuper-visedsemanticparsing.InEMNLP.LawrenceR.Rabiner.1990.AtutorialonhiddenMarkovmodelsandselectedapplicationsinspeechrecognition.RemkoSchaandLiviaPolanyi.1988.Anaugmentedcontextfreegrammarfordiscourse.InCOLING.JeffreyM.Siskind.1996.Acomputationalstudyofcross-situationaltechniquesforlearningword-to-meaningmappings.Cognition,61(1-2):39–91.BenjaminSnyderandReginaBarzilay.2007.Database-textalignmentviastructuredmultilabelclassification.InIJCAI.RaduSoricutandDanielMarcu.2003.Sentenceleveldiscourseparsingusingsyntacticandlexicalinforma-tion.InNAACL.AndreasStolcke.1995.Anefficientprobabilisticcontext-freeparsingalgorithmthatcomputesprefixprobabilities.ComputationalLinguistics,21(2):165–201.RajenSubbaandBarbaraDiEugenio.2009.Aneffec-tivediscourseparserthatusesrichlinguisticinforma-tion.InNAACL.StefanieTellex,ThomasKolla,StevenDickerson,MatthewR.Walter,AshisG.Banerjee,SethTeller,andNicholas.Roy.2011a.Approachingthesymbolgroundingproblemwithprobabilisticgraphicalmod-els.AIMagazine,32(4):64–76.StefanieTellex,ThomasKolla,StevenDickerson,MatthewR.Walter,AshisG.Banerjee,SethTeller,andNicholasRoy.2011b.UnderstandingNaturalLan-guageCommandsforRoboticNavigationandMobileManipulation.InAAAI.MichaelTomaselloandMichaelJeffreyFarrar.1986.Jointattentionandearlylanguage.Childdevelopment,pages1454–1463.AdamVogelandDanielJurafsky.2010.Learningtofol-lownavigationaldirections.InACL.YukWahWongandRaymondJ.Mooney.2007.Learn-ingsynchronousgrammarsforsemanticparsingwithlambdacalculus.InACL.ChenYuandDanaH.Ballard.2004.Ontheintegrationofgroundinglanguageandlearningobjects.InAAAI.ChenYuandDanaHBallard.2007.Aunifiedmodelofearlywordlearning:Integratingstatisticalandsocialcues.Neurocomputing,70(13):2149–2165.LukeS.ZettlemoyerandMichaelCollins.2005.Learn-ingtoMapSentencestoLogicalForm:StructuredClassificationwithProbabilisticCategorialGrammars.InUAI.MichaelZettlemoyer,LukeS.andCollins.2007.OnlinelearningofrelaxedCCGgrammarsforparsingtolog-icalform.InEMNLP-CoNLL.
下载pdf