Transactions of the Association for Computational Linguistics, 1 (2013) 301–314. Action Editor: Jason Eisner.
Submitted 6/2012; Revised 10/2012; Published 7/2013. © 2013 Association for Computational Linguistics.

Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing

Weiwei Sun and Xiaojun Wan
Institute of Computer Science and Technology, Peking University
The MOE Key Laboratory of Computational Linguistics, Peking University
{ws,wanxiaojun}@pku.edu.cn

Abstract

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations. Compared to linguistic grammars learned from rich phrase-structure treebanks, well-designed pseudo grammars achieve similar parsing accuracy and have equivalent contributions to parser ensembles. Moreover, pseudo grammars increase the diversity of base models and therefore, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, a significant improvement over the state of the art.

1 Introduction

Popular approaches to dependency parsing can be divided into two classes: grammar-free and grammar-based. Data-driven, grammar-free approaches make essential use of machine learning from linguistic annotations in order to parse new sentences. Such approaches, e.g. transition-based (Nivre, 2008) and graph-based (McDonald, 2006; Torres Martins et al., 2009), have attracted the most attention in recent years. In contrast, grammar-based approaches rely on linguistic grammars (in either dependency or constituency formalisms) to shape the search space for possible syntactic analyses. In particular, CFG-based dependency parsing exploits a mapping between dependency and constituency representations and reuses parsing algorithms developed for CFGs to produce dependency structures. In previous work, data-driven, discriminative approaches have been widely discussed for Chinese dependency parsing. On the other hand, various PCFG-based constituent parsing methods have been applied to obtain phrase structures as well. With rich linguistic rules, the phrase structures of Chinese sentences can be transformed into their corresponding dependency structures (Xue, 2007). Therefore, PCFG parsers with such conversion rules can be taken as another type of dependency parser; we call them PCFG-based parsers in this paper.

By explicitly defining linguistic rules that express generic grammatical regularities, a constituency grammar can be applied to arrange sentences into a hierarchy of nested phrases, which determines the constructions between larger phrases and their smaller component phrases. This type of information is different from, but highly related to, the information captured by a dependency representation. A constituency grammar thus has great potential contributions to dependency parsing. In order to pave the way for new and better methods, we study the impact of CFGs on Chinese dependency parsing. A series of empirical analyses of state-of-the-art graph-, transition- and PCFG-based parsers is presented to illuminate more precisely the properties of heterogeneous models. We show that CFGs have a great impact on dependency parsing and that PCFG-based models have complementary predictive power to data-driven models.

System ensembling is an effective and important technique to build more accurate parsers from multiple, diverse, weaker models. Exploiting differ-
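The conversion from phrase structures to dependencies that the paper attributes to rich linguistic rules (Xue, 2007) can be sketched as a head-percolation pass over a constituent tree. The head table, tree encoding, and example below are illustrative assumptions for this note, not the actual rule set used in the paper:

```python
# Minimal sketch of head-rule-based constituency-to-dependency conversion.
# The head table and the tree encoding are invented for illustration; the
# paper relies on the much richer conversion rules of Xue (2007).

HEAD_RULES = {           # phrase label -> head-child labels, in priority order
    "S":  ["VP", "NP"],
    "VP": ["VV", "VP"],
    "NP": ["NN", "NP"],
}

def find_head(label, children):
    """Pick the head child of a constituent via the priority list."""
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[-1]          # fallback: rightmost child

def to_dependencies(tree, deps=None):
    """tree = (label, [children]) for phrases, (pos_tag, word) for leaves.
    Returns (head_word, arcs), each arc a (dependent, head) pair.
    Sketch simplification: assumes distinct word forms in the sentence."""
    if deps is None:
        deps = []
    label, rest = tree
    if isinstance(rest, str):            # leaf: (pos_tag, word)
        return rest, deps
    heads = [to_dependencies(child, deps)[0] for child in rest]
    head_word = heads[rest.index(find_head(label, rest))]
    for h in heads:
        if h != head_word:
            deps.append((h, head_word))  # non-head children depend on the head
    return head_word, deps
```

Running a PCFG parser and then applying such a conversion is what turns a constituent parser into a "PCFG-based" dependency parser in the paper's terminology.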
Transactions of the Association for Computational Linguistics, 1 (2013) 291–300. Action Editor: Chris Callison-Burch.
Submitted 5/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

Large-scale Word Alignment Using Soft Dependency Cohesion Constraints

Zhiguo Wang and Chengqing Zong
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
{zgwang, cqzong}@nlpr.ia.ac.cn

Abstract

Dependency cohesion refers to the observation that phrases dominated by disjoint dependency subtrees
Transactions of the Association for Computational Linguistics, 1 (2013) 279–290. Action Editor: Lillian Lee.
Submitted 11/2012; Revised 1/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

Good, Great, Excellent: Global Inference of Semantic Intensities

Gerard de Melo, ICSI, Berkeley, demelo@icsi.berkeley.edu
Mohit Bansal, CS Division, UC Berkeley, mbansal@cs.berkeley.edu

Abstract

Adjectives like good, great, and excellent are similar in meaning, but differ in intensity. Intensity order information is very useful for language learners as well as in several NLP tasks, but is missing in most lexical resources (dictionaries, WordNet, and thesauri). In this paper, we present a primarily unsupervised approach that uses semantics from Web-scale data (e.g., phrases like good but not excellent) to rank words by assigning them positions on a continuous scale. We rely on Mixed Integer Linear Programming to jointly determine the ranks, such that individual decisions benefit from global information. When ranking English adjectives, our global algorithm achieves substantial improvements over previous work on both pairwise and rank correlation metrics (specifically, 70% pairwise accuracy as compared to only 56% by previous work). Moreover, our approach can incorporate external synonymy information (increasing its pairwise accuracy to 78%) and extends easily to new languages. We also make our code and data freely available.¹

1 Introduction

Current lexical resources such as dictionaries and thesauri do not provide information about the intensity order of words. For example, both WordNet (Miller, 1995) and Roget's 21st Century Thesaurus (thesaurus.com) present acceptable, great, and superb as synonyms of the adjective good. However, a native speaker knows that these words represent varying intensity and can in fact generally be ranked by intensity as acceptable
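The pattern-based evidence behind this approach (phrases like good but not excellent) can be sketched as follows. The toy corpus, counts, and the simple difference score below are invented stand-ins for the paper's Web-scale data and its MILP-based global ranking:

```python
import re
from collections import Counter

# Sketch of the pairwise-evidence stage: an occurrence of "X but not Y"
# suggests intensity(X) < intensity(Y). The corpus is invented; the paper
# mines Web-scale data and uses Mixed Integer Linear Programming to turn
# such pairwise evidence into a consistent global ranking.

corpus = [
    "the food was good but not great",
    "a good but not excellent result",
    "great but not excellent overall",
    "good but not great performance",
]

pattern = re.compile(r"\b(\w+) but not (\w+)\b")
weaker_than = Counter()                  # (weaker, stronger) -> count
for sentence in corpus:
    for weak, strong in pattern.findall(sentence):
        weaker_than[(weak, strong)] += 1

# Toy global ordering: score each word by how often it appears on the
# "stronger" side of the pattern minus the "weaker" side.
words = {w for pair in weaker_than for w in pair}
score = {w: sum(c for (a, b), c in weaker_than.items() if b == w)
            - sum(c for (a, b), c in weaker_than.items() if a == w)
         for w in words}
ranking = sorted(words, key=score.get)   # weakest to strongest
```

On this toy corpus the ordering comes out as good, then great, then excellent; the MILP in the paper plays the role of this last sorting step, but resolves conflicting evidence globally.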
Transactions of the Association for Computational Linguistics, 1 (2013) 267–278. Action Editor: Brian Roark.
Submitted 3/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

Efficient Parsing for Head-Split Dependency Trees

Giorgio Satta, Dept. of Information Engineering, University of Padua, Italy, satta@dei.unipd.it
Marco Kuhlmann, Dept. of Linguistics and Philology, Uppsala University, Sweden, marco.kuhlmann@lingfil.uu.se

Abstract

Head splitting techniques have been successfully exploited to improve the asymptotic runtime of parsing algorithms for projective dependency trees, under the arc-factored model. In this article we extend these techniques to a class of non-projective dependency trees, called well-nested dependency trees with block-degree at most 2, which has been previously investigated in the literature. We define a structural property that allows head splitting for these trees, and present two algorithms that improve over the runtime of existing algorithms at no significant loss in coverage.

1 Introduction

Much of the recent work on dependency parsing has been aimed at finding a good balance between accuracy and efficiency. For one end of the spectrum, Eisner (1997) showed that the highest-scoring projective dependency tree under an arc-factored model can be computed in time O(n³), where n is the length of the input string. Later work has focused on making projective parsing viable under more expressive models (Carreras, 2007; Koo and Collins, 2010). At the same time, it has been observed that for many standard data sets, the coverage of projective trees is far from complete (Kuhlmann and Nivre, 2006), which has led to an interest in parsing algorithms for non-projective trees. While non-projective parsing under an arc-factored model can be done in time O(n²) (McDonald et al., 2005), parsing with more informed models is intractable (McDonald and Satta, 2007). This has led several authors to investigate 'mildly non-projective' classes of trees, with the goal of achieving a balance between expressiveness and complexity (Kuhlmann and Nivre, 2006).

In this article we focus on a class of mildly non-projective dependency structures called well-nested dependency trees with block-degree at most 2. This class was first introduced by Bodirsky et al. (2005), who showed that it corresponds, in a natural way, to the class of derivation trees of lexicalized tree-adjoining grammars (Joshi and Schabes, 1997). While there are linguistic arguments against the restriction to this class (Maier and Lichte, 2011; Chen-Main and Joshi, 2010), Kuhlmann and Nivre (2006) found that it has excellent coverage on standard data sets. Assuming an arc-factored model, well-nested dependency trees with block-degree ≤ 2 can be parsed in time O(n⁷) using the algorithm of Gómez-Rodríguez et al. (2011). Recently, Pitler et al. (2012) have shown that if an additional restriction called 1-inherit is imposed, parsing can be done in time O(n⁶), without any additional loss in coverage on standard data sets.

Standard context-free parsing methods, when adapted to the parsing of projective trees, provide O(n⁵) time complexity. The O(n³) time result reported by Eisner (1997) has been obtained by exploiting more sophisticated dynamic programming techniques that 'split' dependency trees at the position of their heads, in order to save bookkeeping. Splitting techniques have also been exploited to speed up parsing time for other lexicalized formalisms, such as bilexical context-free grammars and head automata (Eisner and Satta, 1999). However, to our knowledge no attempt has been made in the literature to extend these techniques to non-projective dependency parsing. In this article we leverage the central idea from Eisner's algorithm and extend it to the class of well-nested dependency trees with block-degree at most 2.
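The O(n³) head-splitting result of Eisner (1997) that this article builds on can be sketched with the standard complete/incomplete-span dynamic program. The score matrix is an illustrative assumption, and backpointers for recovering the tree are omitted; the article itself concerns the harder non-projective case:

```python
# Sketch of Eisner's O(n^3) dynamic program for the highest-scoring
# projective dependency tree under an arc-factored model. score[h][m] is
# the (illustrative) score of an arc from head h to modifier m; index 0
# acts as an artificial root. Returns the best tree score only.

def eisner(score):
    n = len(score)
    # C[s][t][d]: best complete span over s..t, headed at t (d=0, left)
    # or s (d=1, right); I[s][t][d]: best incomplete span, analogously.
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    for length in range(1, n):
        for s in range(n - length):
            t = s + length
            # Incomplete: join two complete half-spans with a new arc.
            best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
            I[s][t][0] = best + score[t][s]   # arc t -> s (head on the right)
            I[s][t][1] = best + score[s][t]   # arc s -> t (head on the left)
            # Complete: extend an incomplete span with a complete one.
            C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
            C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
    return C[0][n - 1][1]   # best projective tree headed at the root
```

The 'split' in head splitting is visible in the recurrences: every span carries its head at one of its two endpoints, so no interior head position needs to be tracked, which is exactly the bookkeeping saving the introduction refers to.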
Transactions of the Association for Computational Linguistics, 1 (2013) 255–266. Action Editor: Kristina Toutanova.
Submitted 11/2012; Published 5/2013. © 2013 Association for Computational Linguistics.

Minimally-Supervised Morphological Segmentation using Adaptor Grammars

Kairit Sirts, Institute of Cybernetics, Tallinn University of Technology, sirts@phon.ioc.ee
Sharon Goldwater, School of Informatics, The University of Edinburgh, sgwater@inf.ed.ac.uk

Abstract

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.

1 Introduction

Research into unsupervised learning of morphology has a long history, starting with the work of Harris (1951). While early research was mostly motivated by linguistic interests, more recent work in NLP often aims to reduce data sparsity in morphologically rich languages for tasks such as automatic speech recognition, statistical machine translation, or automatic text generation. For these applications, however, completely unsupervised systems may not be ideal if even a small amount of segmented training data is available. In this paper, we explore the use of Adaptor Grammars (Johnson et al., 2007) for minimally supervised morphological segmentation.

Adaptor Grammars (AGs) are a nonparametric Bayesian modelling framework that can learn latent tree structures over an input corpus of strings. For example, they can be used to define a morphological grammar where each word consists of zero or more prefixes, a stem, and zero or more suffixes; the actual forms of these morphs (and the segmentation of words into morphs) are learned from the data. In this general approach AGs are similar to many other unsupervised morphological segmentation systems, such as Linguistica (Goldsmith, 2001) and the Morfessor family (Creutz and Lagus, 2007). A major difference, however, is that the morphological grammar is specified as an input to the program, rather than hard-coded, which allows different grammars to be explored easily. For the task of segmenting utterances into words, for example, Johnson and colleagues have experimented with grammars encoding different kinds of sub-word and super-word structure (e.g., syllables and collocations), showing that the best grammars far outperform other systems on the same corpora (Johnson, 2008a; Johnson and Goldwater, 2009; Johnson and Demuth, 2010).

These word segmentation papers demonstrated both the power of the AG approach and the synergistic behavior that occurs when learning multiple levels of structure simultaneously. However, the best-performing grammars were selected using the same corpus that was used for final testing, and each paper dealt with only one language. The ideal unsupervised learner would use a single grammar tuned on
Transactions of the Association for Computational Linguistics, 1 (2013) 231–242. Action Editor: Noah Smith.
Submitted 11/2012; Revised 2/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Modeling Semantic Relations Expressed by Prepositions

Vivek Srikumar and Dan Roth
University of Illinois, Urbana-Champaign, Urbana, IL 61801
{vsrikum2,danr}@illinois.edu

Abstract

This paper introduces the problem of predicting semantic relations expressed by prepositions and develops statistical learning models for predicting the relations, their arguments and the semantic types of the arguments. We define an inventory of 32 relations, building on the word sense disambiguation task for prepositions and collapsing related senses across prepositions. Given a preposition in a sentence, our computational task is to jointly model the preposition relation and its arguments along with their semantic types, as a way to support the relation prediction. The annotated data, however, only provides the relation label, and not the arguments and types. We address this by presenting two models for preposition relation labeling. Our generalization of latent structure SVM gives close to 90% accuracy on relation labeling. Further, by jointly predicting the relation, arguments, and their types along with preposition sense, we show that we can not only improve the relation accuracy, but also significantly improve sense prediction accuracy.

1 Introduction

This paper addresses the problem of predicting semantic relations conveyed by prepositions in text. Prepositions express many semantic relations between their governor and object. Predicting these can help advance text understanding tasks like question answering and textual entailment. Consider the sentence:

(1) The book of Prof. Alexander on primary school methods is a valuable teaching resource.

Here, the preposition on indicates that the book and primary school methods are connected by the relation Topic, and of indicates the Creator-Creation relation between Prof. Alexander and the book. Predicting these relations can help answer questions about the subject of the book and also recognize the entailment of sentences like Prof. Alexander has written about primary school methods.

Being highly polysemous, the same preposition can indicate different kinds of relations, depending on its governor and object. Furthermore, several prepositions can indicate the same semantic relation. For example, consider the sentence:

(2) Poor care led to her death from pneumonia.

The preposition from in this sentence expresses the relation Cause(death, pneumonia). In a different context, it can denote other relations, as in the phrases copied from the film (Source) and recognized from the start (Temporal). On the other hand, the relation Cause can be expressed by several prepositions; for example, the following phrases express a Cause relation: died of pneumonia and tired after the surgery.

We characterize semantic relations expressed by transitive prepositions and develop accurate models for predicting the relations, identifying their arguments and recognizing the semantic types of the arguments. Building on the word sense disambiguation task for prepositions, we collapse semantically related senses across prepositions to derive our relation inventory. These relations act as predicates in a predicate-argument representation, where the arguments are the governor and the object of the
Transactions of the Association for Computational Linguistics, 1 (2013) 219–230. Action Editor: Brian Roark.
Submitted 1/2013; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Joint Arc-factored Parsing of Syntactic and Semantic Dependencies

Xavier Lluís, Xavier Carreras and Lluís Màrquez
TALP Research Center, Universitat Politècnica de Catalunya
Jordi Girona 1–3, 08034 Barcelona
{xlluis,carreras,lluism}@lsi.upc.edu

Abstract

In this paper we introduce a joint arc-factored model for syntactic and semantic dependency parsing. The semantic role labeler predicts the full syntactic paths that connect predicates with their arguments. This process is framed as a linear assignment task, which allows us to control some well-formedness constraints. For the syntactic part, we define a standard arc-factored dependency model that predicts the full syntactic tree. Finally, we employ dual decomposition techniques to produce consistent syntactic and predicate-argument structures while searching over a large space of syntactic configurations. In experiments on the CoNLL-2009 English benchmark we observe very competitive results.

1 Introduction

Semantic role labeling (SRL) is the task of identifying the arguments of lexical predicates in a sentence and labeling them with semantic roles (Gildea and Jurafsky, 2002; Màrquez et al., 2008). SRL is an important shallow semantic task in NLP since predicate-argument relations directly represent semantic properties of the type "who" did "what" to "whom", "how", and "why" for events expressed by predicates (typically verbs and nouns).

Predicate-argument relations are strongly related to the syntactic structure of the sentence: the majority of predicate arguments correspond to some syntactic constituent, and the syntactic structure that connects an argument with the predicate is a strong indicator of its semantic role. Actually, semantic roles represent an abstraction of the syntactic form of a predicative event. While the syntactic functions of arguments change with the form of the event (e.g., active vs. passive forms), the semantic roles of arguments remain invariant to their syntactic realization. Consequently, since the first works, SRL systems have assumed access to the syntactic structure of the sentence (Gildea and Jurafsky, 2002; Carreras and Màrquez, 2005). A simple approach is to obtain the parse trees as a pre-process to the SRL system, which allows the use of unrestricted features of the syntax. However, as in other pipeline approaches in NLP, it has been shown that the errors of the syntactic parser severely degrade the predictions of the SRL model (Gildea and Palmer, 2002). A common approach to alleviate this problem is to work with multiple alternative syntactic trees and let the SRL system optimize over any input tree or part of it (Toutanova et al., 2008; Punyakanok et al., 2008). As a step further, more recent work has proposed parsing models that predict syntactic structure augmented with semantic predicate-argument relations (Surdeanu et al., 2008; Hajič et al., 2009; Johansson, 2009; Titov et al., 2009; Lluís et al., 2009), which is the focus of this paper. These joint models should favor the syntactic structure that is most consistent with the semantic predicate-argument structures of a sentence. In principle, these models can exploit syntactic and semantic features simultaneously, and could potentially improve the accuracy of both syntactic and semantic relations.

One difficulty in the design of joint syntactic-semantic parsing models is that there exist important structural divergences between the two layers.
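The dual decomposition coupling described in the abstract can be illustrated on a toy agreement problem. In the paper the two subproblems are a syntactic parser and a semantic role labeler that must agree on shared structure; here they are just two invented score vectors that must agree on a single label:

```python
# Toy sketch of dual decomposition: two independent scorers must agree
# on one label out of K. Scores are invented; the paper applies the same
# subgradient scheme to couple syntactic and semantic subproblems.

def agree(s1, s2, step=0.5, iters=50):
    K = len(s1)
    u = [0.0] * K                        # dual variables (Lagrange multipliers)
    for _ in range(iters):
        # Each subproblem is solved independently with dual-adjusted scores.
        y1 = max(range(K), key=lambda i: s1[i] + u[i])
        y2 = max(range(K), key=lambda i: s2[i] - u[i])
        if y1 == y2:                     # subproblems agree: a certificate
            return y1                    # of optimality for the joint problem
        u[y1] -= step                    # subgradient step: nudge the duals
        u[y2] += step                    # until the two argmaxes coincide
    return y1                            # fallback if no agreement in time
```

With unary scores the agreed label is simply the argmax of s1 + s2; the value of the technique is that the same update rule still applies when each subproblem is a full combinatorial parser.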
Transactions of the Association for Computational Linguistics, 1 (2013) 207–218. Action Editor: Ben Taskar.
Submitted 10/2012; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Dual Coordinate Descent Algorithms for Efficient Large Margin Structured Prediction

Ming-Wei Chang and Wen-tau Yih
Microsoft Research, Redmond, WA 98052, USA
{minchang,scottyih}@microsoft.com

Abstract

Due to the nature of complex NLP problems, structured prediction algorithms have been important modeling tools for a wide range of tasks. While there exists evidence showing that the linear Structural Support Vector Machine (SSVM) algorithm performs better than the structured Perceptron, the SSVM algorithm is still less frequently chosen in the NLP community because of its relatively slow training speed. In this paper, we propose a fast and easy-to-implement dual coordinate descent algorithm for SSVMs. Unlike algorithms such as Perceptron and stochastic gradient descent, our method keeps track of dual variables and updates the weight vector more aggressively. As a result, this training process is as efficient as existing online learning methods, and yet derives consistently better models, as evaluated on four benchmark NLP datasets for part-of-speech tagging, named-entity recognition and dependency parsing.

1 Introduction

Complex natural language processing tasks are inherently structured. From sequence labeling problems like part-of-speech tagging and named entity recognition to tree construction tasks like syntactic parsing, strong dependencies exist among the labels of individual components. By modeling such relations in the output space, structured output prediction algorithms have been shown to significantly outperform simple binary or multi-class classifiers (Lafferty et al., 2001; Collins, 2002; McDonald et al., 2005).

Among the existing structured output prediction algorithms, the linear Structural Support Vector Machine (SSVM) algorithm (Tsochantaridis et al., 2004; Joachims et al., 2009) has shown outstanding performance in several NLP tasks, such as bilingual word alignment (Moore et al., 2007), constituency and dependency parsing (Taskar et al., 2004b; Koo et al., 2007), sentence compression (Cohn and Lapata, 2009) and document summarization (Li et al., 2009). Nevertheless, as a learning method for NLP, the SSVM algorithm has been less popular than algorithms such as the structured Perceptron (Collins, 2002). This may be due to the fact that current SSVM implementations often suffer from several practical issues. First, state-of-the-art implementations of SSVM such as cutting plane methods (Joachims et al., 2009) are typically complicated.¹ Second, while methods like stochastic gradient descent are simple to implement, tuning learning rates can be difficult. Finally, while SSVM models can achieve superior accuracy, this often requires long training time.

In this paper, we propose a novel optimization method for efficiently training linear SSVMs. Our method not only is easy to implement, but also has excellent training speed, competitive with both structured Perceptron (Collins, 2002) and MIRA (Crammer et al., 2005). When evaluated on several NLP tasks, including POS tagging, NER and dependency parsing, this optimization method also outperforms other approaches in terms of prediction accuracy. Our final algorithm is a dual coordinate

¹ Our algorithm is easy to implement mainly because we use the square hinge loss function.
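The core idea, keeping a dual variable per example and updating the weight vector in closed form, can be sketched in the binary case with the squared hinge loss the footnote mentions. This is a simplification for illustration (the paper's algorithm works over structured outputs), and the data below is invented:

```python
# Sketch of dual coordinate descent for a *binary* linear SVM with the
# squared hinge loss. The structured algorithm in the paper extends this
# per-example closed-form update to structured outputs; the toy data is
# invented for illustration.

def dcd_train(X, y, C=1.0, epochs=50):
    n_feat = len(X[0])
    w = [0.0] * n_feat
    alpha = [0.0] * len(X)               # one dual variable per example
    for _ in range(epochs):
        for i, (x, yi) in enumerate(zip(X, y)):
            # Gradient of the dual objective in coordinate i.
            g = yi * sum(wj * xj for wj, xj in zip(w, x)) - 1 + alpha[i] / (2 * C)
            qii = sum(xj * xj for xj in x) + 1 / (2 * C)
            new_a = max(alpha[i] - g / qii, 0.0)   # closed-form coordinate step
            delta = new_a - alpha[i]
            alpha[i] = new_a
            for j in range(n_feat):                # maintain w = sum_i a_i y_i x_i
                w[j] += delta * yi * x[j]
    return w

# Tiny linearly separable example (invented):
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.5]]
y = [1, 1, -1, -1]
w = dcd_train(X, y)
```

Note how each update is exact coordinate minimization rather than a step along a tuned learning rate, which is the "more aggressive" behavior the abstract contrasts with Perceptron and stochastic gradient descent.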
Transactions of the Association for Computational Linguistics, 1 (2013) 193–206. Action Editor: Jason Eisner.
Submitted 10/2012; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World

Jayant Krishnamurthy, Computer Science Department, Carnegie Mellon University, jayantk@cs.cmu.edu
Thomas Kollar, Computer Science Department, Carnegie Mellon University, tkollar@andrew.cmu.edu

Abstract

This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that learns to map natural language statements to their referents in a physical environment. For example, given an image, LSP can map the statement "blue mug on the table" to the set of image segments showing blue mugs on tables. LSP learns physical representations for both categorical ("blue," "mug") and relational ("on") language, and also learns to compose these representations to produce the referents of entire statements. We further introduce a weakly supervised training procedure that estimates LSP's parameters using annotated referents for entire statements, without annotated referents for individual words or the parse structure of the statement. We perform experiments on two applications: scene understanding and geographical question answering. We find that LSP outperforms existing, less expressive models that cannot represent relational language. We further find that weakly supervised training is competitive with fully supervised training while requiring significantly less annotation effort.

1 Introduction

Learning the mapping from natural language to physical environments is a central problem for natural language semantics. Understanding this mapping is necessary to enable natural language interactions with robots and other embodied systems. For example, for an autonomous robot to understand the sentence "The blue mug is on the table," it must be able to identify (1) the objects in its environment corresponding to "blue mug" and "table," and (2) the objects which participate in the spatial relation denoted by "on." If the robot can successfully identify these objects, it understands the meaning of the sentence.

The problem of learning to map from natural language expressions to their referents in an environment is known as grounded language acquisition. In embodied settings, environments consist of raw sensor data; for example, an environment could be an image collected from a robot's camera. In such applications, grounded language acquisition has two subproblems: parsing, learning the compositional structure of natural language; and perception, learning the environmental referents of individual words. Acquiring both kinds of knowledge is necessary to understand novel language in novel environments.

Unfortunately, perception is often ignored in work on language acquisition. Other variants of grounded language acquisition eliminate the need for perception by assuming access to a logical representation of the environment (Zettlemoyer and Collins, 2005; Wong and Mooney, 2006; Matuszek et al., 2010; Chen and Mooney, 2011; Liang et al., 2011). The existing work which has jointly addressed both parsing and perception has significant drawbacks, including: (1) fully supervised models requiring large amounts of manual annotation and (2) limited semantic representations (Kollar et al., 2010; Tellex et al., 2011; Matuszek et al., 2012).

This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that jointly learns to semantically parse language and perceive the world. LSP models a mapping from natural language queries to sets of objects in a real-world environment. The input to LSP is an environment containing objects, such as a seg-
Transactions of the Association for Computational Linguistics, 1 (2013) 179–192. Action Editor: Johan Bos.
Submitted 1/2013; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Combined Distributional and Logical Semantics

Mike Lewis, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK, mike.lewis@ed.ac.uk
Mark Steedman, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK, steedman@inf.ed.ac.uk

Abstract

We introduce a new approach to semantics which combines the benefits of distributional and formal logical semantics. Distributional models have been successful in modelling the meanings of content words, but logical semantics is necessary to adequately represent many function words. We follow formal semantics in mapping language to logical representations, but differ in that the relational constants used are induced by offline distributional clustering at the level of predicate-argument structure. Our clustering algorithm is highly scalable, allowing us to run on corpora the size of Gigaword. Different senses of a word are disambiguated based on their induced types. We outperform a variety of existing approaches on a wide-coverage question answering task, and demonstrate the ability to make complex multi-sentence inferences involving quantifiers on the FraCaS suite.

1 Introduction

Mapping natural language to meaning representations is a central challenge of NLP. There has been much recent progress in unsupervised distributional semantics, in which the meaning of a word is induced based on its usage in large corpora. This approach is useful for a range of key applications including question answering and relation extraction (Lin and Pantel, 2001; Poon and Domingos, 2009; Yao et al., 2011). Because such a semantics can be automatically induced, it escapes the limitation of depending on relations from hand-built training data, knowledge bases or ontologies, which have proved of limited use in capturing the huge variety of meanings that can be expressed in language.

However, distributional semantics has largely developed in isolation from the formal semantics literature. Whilst distributional semantics has been effective in modelling the meanings of content words such as nouns and verbs, it is less clear that it can be applied to the meanings of function words. Semantic operators, such as determiners, negation, conjunctions, modals, tense, mood, aspect, and plurals are ubiquitous in natural language, and are crucial for high performance on many practical applications, but current distributional models struggle to capture even simple examples. Conversely, computational models of formal semantics have shown low recall on practical applications, stemming from their reliance on ontologies such as WordNet (Miller, 1995) to model the meanings of content words (Bobrow et al., 2007; Bos and Markert, 2005). For example, consider what is needed to answer a question like Did Google buy YouTube? from the following sentences:

1. Google purchased YouTube
2. Google's acquisition of YouTube
3. Google acquired every company
4. YouTube may be sold to Google
5. Google will buy YouTube or Microsoft
6. Google didn't take over YouTube

All of these require knowledge of lexical semantics (e.g. that buy and purchase are synonyms), but some also need interpretation of quantifiers, negatives, modals and disjunction. It seems unlikely that
Transactions of the Association for Computational Linguistics, 1 (2013) 165–178. Action Editor: David Chiang.
Submitted 11/2012; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Learning to translate with products of novices: a suite of open-ended challenge problems for teaching MT

Adam Lopez¹, Matt Post¹, Chris Callison-Burch¹,², Jonathan Weese, Juri Ganitkevitch, Narges Ahmidi, Olivia Buzek, Leah Hanson, Beenish Jamil, Matthias Lee, Ya-Ting Lin, Henry Pao, Fatima Rivera, Leili Shahriyari, Debu Sinha, Adam Teichert, Stephen Wampler, Michael Weinberger, Daguang Xu, Lin Yang, and Shang Zhao*

Department of Computer Science, Johns Hopkins University
¹ Human Language Technology Center of Excellence, Johns Hopkins University
² Computer and Information Science Department, University of Pennsylvania

Abstract

Machine translation (MT) draws from several different disciplines, making it a complex subject to teach. There are excellent pedagogical texts, but problems in MT and current algorithms for solving them are best learned by doing. As a centerpiece of our MT course, we devised a series of open-ended challenges for students in which the goal was to improve performance on carefully constrained instances of four key MT tasks: alignment, decoding, evaluation, and reranking. Students brought a diverse set of techniques to the problems, including some novel solutions which performed remarkably well. A surprising and exciting outcome was that student solutions or their combinations fared competitively on some tasks, demonstrating that even newcomers to the field can help improve the state-of-the-art on hard NLP problems while simultaneously learning a great deal. The problems, baseline code, and results are freely available.

1 Introduction

A decade ago, students interested in natural language processing arrived at universities having been exposed to the idea of machine translation (MT) primarily through science fiction. Today, incoming students have been exposed to services like Google Translate since they were in secondary school or earlier. For them, MT is science fact. So it makes sense to teach statistical MT, either on its own or as a unit in a class on natural language processing (NLP), machine learning (ML), or artificial intelligence (AI). A course that promises to show students how Google Translate works and teach them how to build something like it is especially appealing, and several universities and summer schools now offer such classes. There are excellent introductory texts; depending on the level of detail required, instructors can choose from a comprehensive MT textbook (Koehn, 2010), a chapter of a popular NLP textbook (Jurafsky and Martin, 2009), a tutorial survey (Lopez, 2008), or an intuitive tutorial on the IBM Models (Knight, 1999b), among many others.

But MT is not just an object of academic study. It's a real application that isn't fully perfected, and the best way to learn about it is to build an MT system. This can be done with open-source toolkits such as Moses (Koehn et al., 2007), cdec (Dyer et al., 2010), or Joshua (Ganitkevitch et al., 2012), but these systems are not designed for pedagogy. They are mature codebases featuring tens of thousands of source code lines, making it difficult to focus on their core algorithms. Most tutorials present them as black boxes. But our goal is for students to learn the key techniques in MT, and ideally to learn by doing. Black boxes are incompatible with this goal.

We solve this dilemma by presenting students with concise, fully-functioning, self-contained components of a statistical MT system: word alignment, decoding, evaluation, and reranking. Each implementation consists of a naïve baseline algorithm in less than 150 lines of Python code. We assign them to students as open-ended challenges in which the goal is to improve performance on objective evaluation metrics as much as possible. This setting mirrors evaluations conducted by the NLP research

* The first five authors were instructors and the remaining authors were students in the work described here. This research was conducted while Chris Callison-Burch was at Johns Hopkins University.
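A naïve baseline for the alignment task might look like the following Dice-coefficient aligner. This is a sketch in the spirit of such a under-150-line baseline, not the course's actual assignment code, and the toy bitext is invented:

```python
from collections import Counter

# Sketch of a Dice-coefficient word aligner, a classic naive baseline for
# the word alignment task. The toy German-English bitext is invented; the
# course's actual baseline code may differ.

bitext = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

f_count, e_count, fe_count = Counter(), Counter(), Counter()
for f_sent, e_sent in bitext:
    for f in set(f_sent):
        f_count[f] += 1
        for e in set(e_sent):
            fe_count[(f, e)] += 1      # co-occurrence in the same sentence pair
    for e in set(e_sent):
        e_count[e] += 1

def dice(f, e):
    """Dice coefficient: 2 * co-occurrences / (occurrences of each word)."""
    return 2.0 * fe_count[(f, e)] / (f_count[f] + e_count[e])

def align(f_sent, e_sent, threshold=0.5):
    """Align every word pair whose Dice score clears the threshold."""
    return [(i, j) for i, f in enumerate(f_sent)
                   for j, e in enumerate(e_sent)
                   if dice(f, e) >= threshold]
```

Improving on a baseline of roughly this quality (for instance with IBM Model 1 expectation-maximization) is exactly the kind of open-ended challenge the course poses.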
Transactions of the Association for Computational Linguistics, 1 (2013) 151–164. Action Editor: Patrick Pantel. Submitted 12/2012; Revised 2/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Dijkstra-WSA: A Graph-Based Approach to Word Sense Alignment

Michael Matuschek‡ and Iryna Gurevych†‡
†Ubiquitous Knowledge Processing Lab (UKP-DIPF), German Institute for Educational Research and Educational Information, Schloßstr. 29, 60486 Frankfurt, Germany
‡Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt, Hochschulstr. 10, 64289 Darmstadt, Germany
http://www.ukp.tu-darmstadt.de

Abstract

In this paper, we present Dijkstra-WSA, a novel graph-based algorithm for word sense alignment. We evaluate it on four different pairs of lexical-semantic resources with different characteristics (WordNet-OmegaWiki, WordNet-Wiktionary, GermaNet-Wiktionary and WordNet-Wikipedia) and show that it achieves competitive performance on 3 out of 4 datasets. Dijkstra-WSA outperforms the state of the art on every dataset if it is combined with a back-off based on gloss similarity. We also demonstrate that Dijkstra-WSA is not only flexibly applicable to different resources but also highly parameterizable to optimize for precision or recall.

1 Introduction

Lexical-semantic resources (LSRs) are a cornerstone for many Natural Language Processing (NLP) applications such as word sense disambiguation (WSD) and information extraction. However, the growing demand for large-scale resources in different languages is hard to meet. The Princeton WordNet (WN) (Fellbaum, 1998) is widely used for English, but for most languages corresponding resources are considerably smaller or missing. Collaboratively constructed resources like Wiktionary (WKT) and OmegaWiki (OW) provide a viable option for such cases and seem especially suitable for smaller languages (Matuschek et al., 2013), but there are still considerable gaps in coverage which need to be filled. A related problem is that there usually does not exist a single resource which works best for all purposes, as different LSRs cover different words, senses and information types.

These considerations have sparked increasing research efforts in the area of word sense alignment (WSA). It has been shown that aligned resources can indeed lead to better performance than using the resources individually. Examples include semantic parsing using FrameNet (FN), WN, and VerbNet (VN) (Shi and Mihalcea, 2005), word sense disambiguation using an alignment of WN and Wikipedia (WP) (Navigli and Ponzetto, 2012) and semantic role labeling using a combination of PropBank, VN and FN in the SemLink project (Palmer, 2009). Some of these approaches to WSA either rely heavily on manual labor (e.g. Shi and Mihalcea (2005)) or on information which is only present in few resources such as the most frequent sense (MFS) (Suchanek et al., 2008). This makes it difficult to apply them to a larger set of resources.

In earlier work, we presented the large-scale resource UBY (Gurevych et al., 2012). It contains nine resources in two languages which are mapped to a uniform representation using the LMF standard (Eckle-Kohler et al., 2012). They are thus structurally interoperable. UBY contains pairwise sense alignments between a subset of these resources, and this work also presented a framework for creating alignments based on the similarity of glosses (Meyer and Gurevych, 2011). However, it is not clear to what extent this approach can be applied to resources which lack this kind of information (see Section 3). In summary, aligning senses is a key requirement for semantic interoperability of LSRs to increase the
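The excerpt names the algorithm but does not detail it. As a rough, hypothetical illustration of the graph-based idea (linking a sense in one resource to the closest sense of the other resource in a joint sense graph, with a distance threshold trading precision against recall), here is a generic Dijkstra sketch; all graph, node, and parameter names are ours.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted graph given as
    a dict: node -> list of (neighbor, edge_weight) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def align_senses(graph, senses_a, senses_b, max_dist=2.0):
    """For each sense of resource A, link the closest sense of resource B
    if it lies within max_dist; raising max_dist favors recall, lowering
    it favors precision."""
    alignment = {}
    targets = set(senses_b)
    for s in senses_a:
        dist = dijkstra(graph, s)
        reachable = [(dist[t], t) for t in targets if t in dist]
        if reachable:
            d, best = min(reachable)
            if d <= max_dist:
                alignment[s] = best
    return alignment
```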
Transactions of the Association for Computational Linguistics, 1 (2013) 139–150. Action Editor: Joakim Nivre. Submitted 12/2012; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Efficient Stacked Dependency Parsing by Forest Reranking

Katsuhiko Hayashi and Shuhei Kondo and Yuji Matsumoto
Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
{katsuhiko-h,shuhei-k,matsu}@is.naist.jp

Abstract

This paper proposes a discriminative forest reranking algorithm for dependency parsing that can be seen as a form of efficient stacked parsing. A dynamic programming shift-reduce parser produces a packed derivation forest which is then scored by a discriminative reranker, using the 1-best tree output by the shift-reduce parser as guide features in addition to third-order graph-based features. To improve efficiency and accuracy, this paper also proposes a novel shift-reduce parser that eliminates the spurious ambiguity of arc-standard transition systems. Testing on the English Penn Treebank data, forest reranking gave a state-of-the-art unlabeled dependency accuracy of 93.12.

1 Introduction

There are two main approaches of data-driven dependency parsing: one is graph-based and the other is transition-based.

In the graph-based approach, global optimization algorithms find the highest-scoring tree with locally factored models (McDonald et al., 2005). While third-order graph-based models achieve state-of-the-art accuracy, it has O(n^4) time complexity for a sentence of length n. Recently, some pruning techniques have been proposed to improve the efficiency of third-order models (Rush and Petrov, 2012; Zhang and McDonald, 2012).

The transition-based approach usually employs the shift-reduce parsing algorithm with linear-time complexity (Nivre, 2008). It greedily chooses the transition with the highest score and the resulting transition sequence is not always globally optimal. The beam search algorithm improves parsing flexibility in deterministic parsing (Zhang and Clark, 2008; Zhang and Nivre, 2011), and dynamic programming makes beam search more efficient (Huang and Sagae, 2010).

There is also an alternative approach that integrates graph-based and transition-based models (Sagae and Lavie, 2006; Zhang and Clark, 2008; Nivre and McDonald, 2008; Martins et al., 2008). Martins et al. (2008) formulated their approach as stacking of parsers where the output of the first-stage parser is provided to the second as guide features. In particular, they used a transition-based parser for the first stage and a graph-based parser for the second stage. The main drawback of this approach is that the efficiency of the transition-based parser is sacrificed because the second stage employs full parsing.

This paper proposes an efficient stacked parsing method through discriminative reranking with higher-order graph-based features, which works on the forests output by the first-stage dynamic programming shift-reduce parser and integrates non-local features efficiently with cube-pruning (Huang and Chiang, 2007). The advantages of our method are as follows:
• Unlike the conventional stacking approach, the first-stage shift-reduce parser prunes the search space of the second-stage graph-based parser.
• In addition to guide features, the second-stage graph-based parser can employ the scores of the first-stage parser which cannot be incorporated
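To make the packed-forest setting concrete (without attempting the paper's cube pruning or non-local guide features), here is a minimal bottom-up Viterbi over hyperedges of a packed forest; the data layout and names are illustrative assumptions, not the paper's implementation.

```python
def viterbi_forest(nodes, hyperedges, score):
    """Best derivation score for every node of a packed forest.
    nodes: node ids in bottom-up (topological) order, leaves first.
    hyperedges: dict head -> list of tail tuples (one tuple per derivation).
    score: function(head, tails) -> local score of that hyperedge."""
    best = {}
    for v in nodes:
        edges = hyperedges.get(v, [])
        if not edges:
            best[v] = 0.0  # leaf node contributes no edge score
        else:
            best[v] = max(score(v, tails) + sum(best[t] for t in tails)
                          for tails in edges)
    return best
```

A reranker in the stacked setting would replace `score` with a richer function over higher-order features, which is what forces approximations such as cube pruning.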
Transactions of the Association for Computational Linguistics, 1 (2013) 125–138. Action Editor: Sharon Goldwater. Submitted 10/2012; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Modeling Child Divergences from Adult Grammar

Sam Sahakian, University of Wisconsin-Madison, sahakian@cs.wisc.edu
Benjamin Snyder, University of Wisconsin-Madison, bsnyder@cs.wisc.edu

Abstract

During the course of first language acquisition, children produce linguistic forms that do not conform to adult grammar. In this paper, we introduce a dataset and approach for systematically modeling this child-adult grammar divergence. Our corpus consists of child sentences with corrected adult forms. We bridge the gap between these forms with a discriminatively reranked noisy channel model that translates child sentences into equivalent adult utterances. Our method outperforms MT and ESL baselines, reducing child error by 20%. Our model allows us to chart specific aspects of grammar development in longitudinal studies of children, and investigate the hypothesis that children share a common developmental path in language acquisition.

1 Introduction

Since the publication of the Brown Study (1973), the existence of standard stages of development has been an underlying assumption in the study of first language learning. As a child moves towards language mastery, their language use grows predictably to include more complex syntactic structures, eventually converging to full adult usage. In the course of this process, children may produce linguistic forms that do not conform to the grammatical standard. From the adult point of view these are language errors, a label which implies a faulty production. Considering the work-in-progress nature of a child language learner, these divergences could also be described as expressions of the structural differences between child and adult grammar. The predictability of these divergences has been observed by psychologists, linguists and parents (Owens, 2008).1

Our work leverages the differences between child and adult language to make two contributions towards the study of language acquisition. First, we provide a corpus of errorful child sentences annotated with adult-like rephrasings. This data will allow researchers to test hypotheses and build models relating the development of child language to adult forms. Our second contribution is a probabilistic model trained on our corpus that predicts a grammatical rephrasing given an errorful child sentence.

The generative assumption of our model is that sentences begin in underlying adult forms, and are then stochastically transformed into observed child utterances. Given an observed child utterance s, we calculate the probability of the corrected adult translation t as P(t|s) ∝ P(s|t)P(t), where P(t) is an adult language model and P(s|t) is a noise model crafted to capture child grammar errors like omission of certain function words and corruptions of tense or declension. The parameters of this noise model are estimated using our corpus of child and adult-form utterances, using EM to handle unobserved word alignments. We use this generative model to produce n-best lists of candidate corrections which are then reranked using long range sentence features in a discriminative framework (Collins and Roark, 2004).

1 For the remainder of this paper we use "error" and "divergence" interchangeably.
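The noisy-channel decision rule P(t|s) ∝ P(s|t)P(t) can be sketched in a few lines; the model functions here are placeholders for the paper's language model and EM-trained noise model, which are not reproduced.

```python
def noisy_channel_best(s, candidates, log_p_t, log_p_s_given_t):
    """Noisy-channel decision rule: pick the adult form t maximizing
    log P(s|t) + log P(t), i.e. argmax_t P(t|s) with P(t|s) ∝ P(s|t)P(t).

    s: observed child utterance.
    candidates: candidate adult corrections.
    log_p_t: adult language model, t -> log P(t).
    log_p_s_given_t: noise model, (s, t) -> log P(s|t)."""
    return max(candidates,
               key=lambda t: log_p_s_given_t(s, t) + log_p_t(t))
```

In the paper's pipeline this generative score only produces an n-best list, which a discriminative reranker with long-range features then reorders.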
Transactions of the Association for Computational Linguistics, 1 (2013) 99–110. Action Editor: Chris Callison-Burch. Submitted 12/2012; Published 5/2013. © 2013 Association for Computational Linguistics.

Using Pivot-Based Paraphrasing and Sentiment Profiles to Improve a Subjectivity Lexicon for Essay Data

Beata Beigman Klebanov, Nitin Madnani, Jill Burstein
Educational Testing Service, 660 Rosedale Road, Princeton, NJ 08541, USA
{bbeigmanklebanov,nmadnani,jburstein}@ets.org

Abstract

We demonstrate a method of improving a seed sentiment lexicon developed on essay data by using a pivot-based paraphrasing system for lexical expansion coupled with sentiment profile enrichment using crowdsourcing. Profile enrichment alone yields up to 15% improvement in the accuracy of the seed lexicon on 3-way sentence-level sentiment polarity classification of essay data. Using lexical expansion in addition to sentiment profiles provides a further 7% improvement in performance. Additional experiments show that the proposed method is also effective with other subjectivity lexicons and in a different domain of application (product reviews).

1 Introduction

In almost any sub-field of computational linguistics, creation of working systems starts with an investment in manually-generated or manually-annotated data for computational exploration. In subjectivity and sentiment analysis, annotation of training and testing data and construction of subjectivity lexicons have been the loci of costly labor investment.

Many subjectivity lexicons are mentioned in the literature. The two large manually-built lexicons for English, the General Inquirer (Stone et al., 1966) and the lexicon provided with the OpinionFinder distribution (Wiebe and Riloff, 2005), are available for research and education only1 and under GNU GPL license that disallows their incorporation into proprietary materials,2 respectively. Those wishing to integrate sentiment analysis into products, along with those studying subjectivity in languages other than English, or for specific domains such as finance, or for particular genres such as MySpace comments, reported construction of lexicons (Taboada et al., 2011; Loughran and McDonald, 2011; Thelwall et al., 2010; Rao and Ravichandran, 2009; Jijkoun and Hofmann, 2009; Pitel and Grefenstette, 2008; Mihalcea et al., 2007).

In this paper, we address the step of expanding a small-scale, manually-built subjectivity lexicon (a seed lexicon, typically for a domain or language in question) into a much larger but noisier lexicon using an automatic procedure. We present a novel expansion method using a state-of-the-art paraphrasing system. The expansion yields a 4-fold increase in lexicon size; yet, the expansion alone is insufficient in order to improve performance on sentence-level sentiment polarity classification.

In this paper we test the following hypothesis. We suggest that the effectiveness of the expansion is hampered by (1) introduction of opposite-polarity items, such as introducing resolute as an expansion of forceful, or remarkable as an expansion of peculiar; (2) introduction of weakly polar, neutral, or ambiguous words as expansions of polar seed words, such as generating concern as an expansion of anxiety or future as an expansion of aftermath;3 (3) inability to distinguish between stronger or clear-cut versus weaker or ambiguous sentiment and to make a differential use of those.

We address items (1) and (2) by enriching the lexicon with sentiment profiles (section 3), and propose

1 http://www.wjh.harvard.edu/inquirer/j11/manual/
2 http://www.gnu.org/copyleft/gpl.html
3 Table 2 and Figure 1 provide support to these assessments.
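The combination the excerpt describes (paraphrase-based expansion filtered by crowdsourced sentiment profiles) can be sketched generically; the data shapes, thresholds, and example words below are our assumptions, not the paper's resources.

```python
def expand_lexicon(seed, paraphrases, profiles, min_agreement=0.7):
    """Expand a seed polarity lexicon (word -> "pos"/"neg") with paraphrase
    candidates, keeping only candidates whose sentiment profile (e.g. the
    fraction of crowd judgments agreeing with a polarity) matches the seed
    word's polarity strongly enough.  This filters out the opposite-polarity
    and neutral expansions the paper identifies as problems (1) and (2)."""
    expanded = dict(seed)
    for word, polarity in seed.items():
        for cand in paraphrases.get(word, []):
            profile = profiles.get(cand, {})  # e.g. {"pos": 0.1, "neg": 0.9}
            if profile.get(polarity, 0.0) >= min_agreement:
                expanded.setdefault(cand, polarity)
    return expanded
```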
Transactions of the Association for Computational Linguistics, 1 (2013) 75–88. Action Editor: Sharon Goldwater. Submitted 11/2012; Revised 1/2013; Published 3/2013. © 2013 Association for Computational Linguistics.

An HDP Model for Inducing Combinatory Categorial Grammars

Yonatan Bisk and Julia Hockenmaier
Department of Computer Science, The University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, IL 61801
{bisk1,juliahmr}@illinois.edu

Abstract

We introduce a novel nonparametric Bayesian model for the induction of Combinatory Categorial Grammars from POS-tagged text. It achieves state of the art performance on a number of languages, and induces linguistically plausible lexicons.

1 Introduction

What grammatical representation is appropriate for unsupervised grammar induction? Initial attempts with context-free grammars (CFGs) were not very successful (Carroll and Charniak, 1992; Charniak, 1993). One reason may be that CFGs require the specification of a finite inventory of nonterminal categories and rewrite rules, but unless one adopts linguistic principles such as X-bar theory (Jackendoff, 1977), these nonterminals are essentially arbitrary labels that can be combined in arbitrary ways. While further CFG-based approaches have been proposed (Clark, 2001; Kurihara and Sato, 2004), most recent work has followed Klein and Manning (2004) in developing models for the induction of projective dependency grammars. It has been shown that more sophisticated probability models (Headden III et al., 2009; Gillenwater et al., 2011; Cohen and Smith, 2010) and learning regimes (Spitkovsky et al., 2010), as well as the incorporation of prior linguistic knowledge (Cohen and Smith, 2009; Berg-Kirkpatrick and Klein, 2010; Naseem et al., 2010) can lead to significant improvement over Klein and Manning's baseline model.

The use of dependency grammars circumvents the question of how to obtain an appropriate inventory of categories, since dependency parses are simply defined by unlabeled edges between the lexical items in the sentence. But dependency grammars make it also difficult to capture non-local structures, and Blunsom and Cohn (2010) show that it may be advantageous to reformulate the underlying dependency grammar in terms of a tree-substitution grammar (TSG) which pairs words with treelets that specify the number of left and right dependents they have.

In this paper, we explore yet another option: instead of dependency grammars, we use Combinatory Categorial Grammar (CCG, Steedman (1996; 2000)), a linguistically expressive formalism that pairs lexical items with rich categories that capture all language-specific information. This may seem a puzzling choice, since CCG requires a significantly larger inventory of categories than is commonly assumed for CFGs. However, unlike CFG nonterminals, CCG categories are not arbitrary symbols: they encode, and are determined by, the basic word order of the language and the number of arguments each word takes. CCG is very similar to TSG in that it also pairs lexical items with rich items that capture all language-specific information. Like TSG and projective dependency grammars, we restrict ourselves to a weakly context-free fragment of CCG. But while TSG does not distinguish between argument and modifier dependencies, CCG makes an explicit distinction between the two. And while the elementary trees of Blunsom and Cohn (2010)'s TSG and their internal node labels have no obvious linguistic interpretation, the syntactic behavior of any CCG constituent can be directly inferred from its category. To see whether
Transactions of the Association for Computational Linguistics, 1 (2013) 63–74. Action Editor: Brian Roark. Submitted 9/2012; Published 3/2013. © 2013 Association for Computational Linguistics.

Unsupervised Dependency Parsing with Acoustic Cues

John K Pate†‡ j.k.pate@sms.ed.ac.uk
Sharon Goldwater† sgwater@inf.ed.ac.uk
†ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
‡Department of Computing, Macquarie University, Sydney, NSW 2109, Australia

Abstract

Unsupervised parsing is a difficult task that infants readily perform. Progress has been made on this task using text-based models, but few computational approaches have considered how infants might benefit from acoustic cues. This paper explores the hypothesis that word duration can help with learning syntax. We describe how duration information can be incorporated into an unsupervised Bayesian dependency parser whose only other source of information is the words themselves (without punctuation or parts of speech). Our results, evaluated on both adult-directed and child-directed utterances, show that using word duration can improve parse quality relative to words-only baselines. These results support the idea that acoustic cues provide useful evidence about syntactic structure for language-learning infants, and motivate the use of word duration cues in NLP tasks with speech.

1 Introduction

Unsupervised learning of syntax is difficult for NLP systems, yet infants perform this task routinely. Previous work in NLP has focused on using the implicit syntactic information available in part-of-speech (POS) tags (Klein and Manning, 2004), punctuation (Seginer, 2007; Spitkovsky et al., 2011b; Ponvert et al., 2011), and syntactic similarities between related languages (Cohen and Smith, 2009; Cohen et al., 2011). However, these approaches likely use the data in a very different way from children: neither POS tags nor punctuation are observed during language acquisition (although see Spitkovsky et al. (2011a) and Christodoulopoulos et al. (2012) for encouraging results using unsupervised POS tags), and many children learn in a broadly monolingual environment. This paper explores a possible source of information that NLP systems typically ignore: word duration, or the length of time taken to pronounce each word.

There are good reasons to think that word duration might be useful for learning syntax. First, the well-established Prosodic Bootstrapping hypothesis (Gleitman and Wanner, 1982) proposes that infants use acoustic-prosodic cues (such as word duration) to help them identify syntactic structure, because prosodic and syntactic structures sometimes coincide. More recently, we proposed (Pate and Goldwater, 2011) that infants might use word duration as a direct cue to syntactic structure (i.e., without requiring intermediate prosodic structure), because words in high-probability syntactic structures tend to be pronounced more quickly (Gahl and Garnsey, 2004; Gahl et al., 2006; Tily et al., 2009).

Like most recent work on unsupervised parsing, we focus on learning syntactic dependencies. Our work is based on Headden et al. (2009)'s Bayesian version of the Dependency Model with Valence (DMV) (Klein and Manning, 2004), using interpolated backoff techniques to incorporate multiple information sources per token. However, whereas Headden et al. used words and POS tags as input, we use words and word duration information, presenting three variants of their model that use this information in slightly different ways.1

1 By using neither gold-standard nor learned POS tags as input, our work differs from nearly all previous work on unsupervised dependency parsing. While learned tags might be plausible
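The interpolated backoff idea mentioned above (not the paper's exact parameterization; the context shapes and the hyperparameter name are illustrative) can be sketched as a two-level estimator: a fine-grained context, such as a head word plus a duration bin, backs off to a coarser context, which itself backs off to a uniform distribution.

```python
def backoff_prob(counts_fine, counts_coarse, fine_ctx, coarse_ctx,
                 outcome, vocab_size, beta=1.0):
    """P(outcome | fine context) via interpolated backoff.
    counts_fine / counts_coarse map a context to a dict of outcome counts;
    beta controls how strongly sparse contexts defer to the backoff."""
    # Coarse estimate, smoothed toward the uniform distribution 1/vocab_size
    n_c = sum(counts_coarse.get(coarse_ctx, {}).values())
    p_coarse = (counts_coarse.get(coarse_ctx, {}).get(outcome, 0)
                + beta / vocab_size) / (n_c + beta)
    # Fine estimate, smoothed toward the coarse estimate
    n_f = sum(counts_fine.get(fine_ctx, {}).values())
    return (counts_fine.get(fine_ctx, {}).get(outcome, 0)
            + beta * p_coarse) / (n_f + beta)
```

With sparse fine-grained counts the estimate stays close to the coarse distribution, which is what lets a model add a noisy cue like duration without fragmenting its statistics.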
Transactions of the Association for Computational Linguistics, 1 (2013) 49–62. Action Editor: Jason Eisner. Submitted 11/2012; Published 3/2013. © 2013 Association for Computational Linguistics.

Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions

Yoav Artzi and Luke Zettlemoyer
Computer Science & Engineering, University of Washington, Seattle, WA 98195
{yoav,lsz}@cs.washington.edu

Abstract

The context in which language is used provides a strong signal for learning to recover its meaning. In this paper, we show it can be used within a grounded CCG semantic parsing approach that learns a joint model of meaning and context for interpreting and executing natural language instructions, using various types of weak supervision. The joint nature provides crucial benefits by allowing situated cues, such as the set of visible objects, to directly influence learning. It also enables algorithms that learn while executing instructions, for example by trying to replicate human actions. Experiments on a benchmark navigational dataset demonstrate strong performance under differing forms of supervision, including correctly executing 60% more instruction sets relative to the previous state of the art.

1 Introduction

The context in which natural language is used provides a strong signal to reason about its meaning. However, using such a signal to automatically learn to understand unrestricted natural language remains a challenging, unsolved problem. For example, consider the instructions in Figure 1. Correct interpretation requires us to solve many subproblems, such as resolving all referring expressions to specific objects in the environment (including, "the corner" or "the third intersection"), disambiguating word sense based on context (e.g., "the chair" could refer to a chair or sofa), and finding executable action sequences that satisfy stated constraints (such as "twice" or "to face the blue hall").

move forward twice to the chair
  λa.move(a) ∧ dir(a, forward) ∧ len(a, 2) ∧ to(a, ιx.chair(x))
at the corner turn left to face the blue hall
  λa.pre(a, ιx.corner(x)) ∧ turn(a) ∧ dir(a, left) ∧ post(a, front(you, ιx.blue(x) ∧ hall(x)))
move to the chair in the third intersection
  λa.move(a) ∧ to(a, ιx.sofa(x)) ∧ intersect(order(λy.junction(y), frontdist, 3), x)

Figure 1: A sample navigation instruction set, paired with lambda-calculus meaning representations.

We must also understand implicit requests, for example from the phrase "at the corner," that describe goals to be achieved without specifying the specific steps. Finally, to do all of this robustly without prohibitive engineering effort, we need grounded learning approaches that jointly reason about meaning and context to learn directly from their interplay, with as little human intervention as possible.

Although many of these challenges have been studied separately, as we will review in Section 3, this paper represents, to the best of our knowledge, the first attempt at a comprehensive model that addresses them all. Our approach induces a weighted Combinatory Categorial Grammar (CCG), including both the parameters of the linear model and a CCG lexicon. To model complex instructional language, we introduce a new semantic modeling approach that can represent a number of key linguistic constructs that are common in spatial and instructional language. To learn from indirect supervision, we define the notion of a validation function, for example that tests the state of the agent after interpreting an instruction. We then show how this function can be used to drive online learning. For
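The validation-function idea described above can be sketched as a perceptron-style online step: when the current best parse fails validation (e.g. the agent ends in the wrong state), promote the highest-scoring candidate that does validate. This is a rough sketch of validation-guided learning under our own simplifications, not the paper's algorithm; all names are hypothetical.

```python
def validation_driven_update(weights, features, candidates, validate, lr=1.0):
    """One online learning step driven by a validation function.
    weights: dict feature -> weight (updated in place and returned).
    features: function mapping a candidate parse to a feature dict.
    candidates: candidate interpretations of one instruction.
    validate: function(candidate) -> True if executing it succeeds."""
    def score(c):
        return sum(weights.get(f, 0.0) * v for f, v in features(c).items())
    ranked = sorted(candidates, key=score, reverse=True)
    best = ranked[0]
    if validate(best):
        return weights  # current model already executes correctly
    good = next((c for c in ranked if validate(c)), None)
    if good is None:
        return weights  # nothing validates; skip this example
    for f, v in features(good).items():
        weights[f] = weights.get(f, 0.0) + lr * v  # promote validated parse
    for f, v in features(best).items():
        weights[f] = weights.get(f, 0.0) - lr * v  # demote failed 1-best
    return weights
```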