Analysis Methods in Neural Language Processing: A Survey
Yonatan Belinkov (1,2) and James Glass (1); (1) MIT Computer Science and Artificial Intelligence Laboratory, (2) Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA; {belinkov, glass}@mit.edu. Abstract: The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed,
Joint Transition-Based Models for Morpho-Syntactic Parsing: Parsing Strategies for MRLs and a Case Study from Modern Hebrew
Amir More (Open University, Ra’anana, Israel; habeanf@gmail.com), Victoria Basmova (Open University, Ra’anana, Israel; vicbas@openu.ac.il), Amit Seker (Open University, Ra’anana, Israel; amitse@openu.ac.il), Reut Tsarfaty (Open University, Ra’anana, Israel; reutts@openu.ac.il). Abstract: In standard NLP pipelines, morphological analysis and disambiguation (MA&D) precedes syntactic and semantic downstream tasks. However, for languages
Semantic Neural Machine Translation Using AMR
Linfeng Song (1), Daniel Gildea (1), Yue Zhang (2), Zhiguo Wang (3), and Jinsong Su (4); (1) Department of Computer Science, University of Rochester, Rochester, New York 14627, (2) School of Engineering, Westlake University, China, (3) IBM T.J. Watson Research Center, Yorktown Heights, New York 10598, (4) Xiamen University, Xiamen, China; (1) {lsong10,gildea}@cs.rochester.edu, (2) yue.zhang@wias.org.cn, (3) zgw.tomorrow@gmail.com, (4) jssu@xmu.edu.cn. Abstract: It is intuitive that semantic representations can be useful for machine translation, mainly be-
Grammar Error Correction in Morphologically Rich Languages: The Case of Russian
Alla Rozovskaya (Queens College, City University of New York; arozovskaya@qc.cuny.edu), Dan Roth (University of Pennsylvania; danroth@seas.upenn.edu). Abstract: Until now, most of the research in grammar error correction focused on English, and the problem has hardly been explored for other languages. We address the task of correcting writing mistakes in morphologically rich languages, with
Learning Typed Entailment Graphs with Global Soft Constraints
Mohammad Javad Hosseini*§, Nathanael Chambers**, Siva Reddy†, Xavier R. Holt‡, Shay B. Cohen*, Mark Johnson‡ and Mark Steedman*; *University of Edinburgh, §The Alan Turing Institute, United Kingdom, **United States Naval Academy, †Stanford University, ‡Macquarie University; javad.hosseini@ed.ac.uk, nchamber@usna.edu, sivar@stanford.edu, {xavier.ricketts-holt,mark.johnson}@mq.edu.au, {scohen,steedman}@inf.ed.ac.uk. Abstract: This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument
Surface Statistics of an Unknown Language Indicate How to Parse It
Dingquan Wang and Jason Eisner (Department of Computer Science, Johns Hopkins University; {wdd,jason}@cs.jhu.edu). Abstract: We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
Wenpeng Yin (Department of Computer and Information Science, University of Pennsylvania; wenpeng@seas.upenn.edu), Hinrich Schütze (Center for Information and Language Processing, LMU Munich, Germany; inquiries@cislmu.org). Abstract: In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly
Erratum: “Improving Topic Models with Latent Feature Word Representations”
Dat Quoc Nguyen, Richard Billingsley, Lan Du and Mark Johnson. Abstract: FROM (a part of Table 10 in the original published article): F1 scores for TMN and TMNtitle datasets. Change in clustering and classification results due to the DMM and LF-DMM bugs. Data TMN 4.3 Document clustering evaluation FROM (in the original published article): For
Transactions of the Association for Computational Linguistics, 1 (2013) 429–440. Action Editor: Philipp Koehn.
Submitted 3/2013; Revised 8/2013; Published 10/2013. © 2013 Association for Computational Linguistics.
Measuring Machine Translation Errors in New Domains
Ann Irvine (Johns Hopkins University; anni@jhu.edu), John Morgan (University of Maryland; jjm@cs.umd.edu), Marine Carpuat (National Research Council Canada; marine.carpuat@nrc.gc.ca), Hal Daumé III (University of Maryland; me@hal3.name), Dragos Munteanu (SDL Research; dmunteanu@sdl.com)
Abstract: We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.
1 Introduction
When building a statistical machine translation (SMT) system, the expected use case is often limited to a specific domain, genre and register (henceforth “domain” refers to this set, in keeping with standard, imprecise, terminology), such as a particular type of legal or medical document. Unfortunately, it is expensive to obtain enough parallel data to reliably estimate translation models in a new domain. Instead, one can hope that large amounts of data from another, “old domain,” might be close enough to stand as a proxy. This is the de facto standard: we train SMT systems on Parliament proceedings, but then use them to translate all sorts of new text. Unfortunately, this results in significantly degraded translation quality. In this paper, we present two complementary methods for quantifiably measuring the source of translation errors (§5.1 and §5.2) in a novel taxonomy (§4). We show quantitative (§7.1) and qualitative (§7.2) results obtained from our methods on four very different new domains: newswire, medical texts, scientific abstracts, and movie subtitles.
Table 1: Example inputs, references and system outputs. There are three types of errors: unseen words (blue), incorrect sense selection (red) and unknown sense (green).
Old Domain (Hansard)
Inp: monsieur le président, les pêcheurs de homard de la région de l’atlantique sont dans une situation catastrophique.
Ref: mr. speaker, lobster fishers in atlantic canada are facing a disaster.
Out: mr. speaker, the lobster fishers in atlantic canada are in a mess.
New Domain (Medical)
Inp: mode et voie(s) d’administration
Ref: method and route(s) of administration
Out: fashion and voie(s) of directors
Our basic approach is to think of translation errors in the context of a novel taxonomy of error categories, “S4.” Our taxonomy contains categories for the errors shown in Table 1, in which an SMT system trained on the Hansard parliamentary proceedings is applied to a new domain (in this case, medical texts). Our categorization focuses on the following: new French words, new French senses, and incorrectly chosen translations. The first methodology we develop for studying such errors is a micro-level study of the frequency and distribution of these error types in real translation output at the level of individual words (§5.1), without respect to how these errors affect overall translation quality. The second is a macro-level study of how these errors affect translation performance (measured by BLEU; §5.2). One important feature of our methodologies is that we focus on errors that could possibly be fixed given access to data from a new domain, rather than all errors that might arise because the particular translation model used is inadequate to capture the required
Transactions of the Association for Computational Linguistics, 1 (2013) 415–428. Action Editor: Brian Roark.
Submitted 7/2013; Revised 9/2013; Published 10/2013. © 2013 Association for Computational Linguistics.
Joint Morphological and Syntactic Analysis for Richly Inflected Languages
Bernd Bohnet (1), Joakim Nivre (2), Igor Boguslavsky (3,4), Richárd Farkas (5), Filip Ginter (6), Jan Hajič (7); (1) University of Birmingham, School of Computer Science, (2) Uppsala University, Department of Linguistics and Philology, (3) Universidad Politécnica de Madrid, Departamento de Inteligencia Artificial, (4) Russian Academy of Sciences, Institute for Information Transmission Problems, (5) University of Szeged, Institute of Informatics, (6) University of Turku, Department of Information Technology, (7) Charles University in Prague, Institute of Formal and Applied Linguistics
Abstract: Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.
1 Introduction
Syntactic parsing of natural language has witnessed a tremendous development during the last twenty years, especially through the use of statistical models for robust and accurate broad-coverage parsing. However, as statistical parsing techniques have been applied to more and more languages, it has also been observed that typological differences between languages lead to new challenges. In particular, it has been found over and over again that languages exhibiting rich morphological structure, often together with a relatively free word order, usually obtain lower parsing accuracy, especially in comparison to English. One striking demonstration of this tendency can be found in the CoNLL shared tasks on multilingual dependency parsing, organized in 2006 and 2007, where richly inflected languages clustered at the lower end of the scale with respect to parsing accuracy (Buchholz and Marsi, 2006; Nivre et al., 2007). These and similar observations have led to an increased interest in the special challenges posed by parsing morphologically rich languages, as evidenced most clearly by a new series of workshops devoted to this topic (Tsarfaty et al., 2010), as well as a special issue in Computational Linguistics (Tsarfaty et al., 2013) and a shared task on parsing morphologically rich languages [1].
One hypothesized explanation for the lower parsing accuracy observed for richly inflected languages is the strict separation of morphological and syntactic analysis assumed in many parsing frameworks (Tsarfaty et al., 2010; Tsarfaty et al., 2013). This is true in particular for data-driven dependency parsers, which tend to assume that all morphological disambiguation has been performed before syntactic analysis begins. However, as argued by Lee et al. (2011), in morphologically rich languages there is often considerable interaction between morphology and syntax, such that neither can be disambiguated without the other. Lee et al. (2011) go on to show that a discriminative model for joint morphological disambiguation and dependency parsing gives consistent improvements in morphological and syntactic accuracy, compared to a pipeline model, for Ancient Greek, Czech, Hungarian and Latin. Similarly, Bohnet and Nivre (2012) propose a model for
[1] See https://sites.google.com/site/spmrl2013/home/sharedtask.
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner.
Submitted 6/2013; Published 10/2013. © 2013 Association for Computational Linguistics.
Training Deterministic Parsers with Non-Deterministic Oracles
Yoav Goldberg (Bar-Ilan University, Department of Computer Science, Ramat-Gan, Israel; yoav.goldberg@gmail.com), Joakim Nivre (Uppsala University, Department of Linguistics and Philology, Uppsala, Sweden; joakim.nivre@lingfil.uu.se)
Abstract: Greedy transition-based parsers are very fast but tend to suffer from error propagation. This problem is aggravated by the fact that they are normally trained using oracles that are deterministic and incomplete in the sense that they assume a unique canonical path through the transition system and are only valid as long as the parser does not stray from this path. In this paper, we give a general characterization of oracles that are nondeterministic and complete, present a method for deriving such oracles for transition systems that satisfy a property we call arc decomposition, and instantiate this method for three well-known transition systems from the literature. We say that these oracles are dynamic, because they allow us to dynamically explore alternative and non-optimal paths during training, in contrast to oracles that statically assume a unique optimal path. Experimental evaluation on a wide range of datasets clearly shows that using dynamic oracles to train greedy parsers gives substantial improvements in accuracy. Moreover, this improvement comes at no cost in terms of efficiency, unlike other techniques like beam search.
1 Introduction
Greedy transition-based parsers are easy to implement and are very efficient, but they are generally not as accurate as parsers that are based on global search (McDonald et al., 2005; Koo and Collins, 2010) or as transition-based parsers that use beam search (Zhang and Clark, 2008) or dynamic programming (Huang and Sagae, 2010; Kuhlmann et al., 2011). This work is part of a line of research trying to push the boundaries of greedy parsing and narrow the accuracy gap of 2–3% between search-based and greedy parsers, while maintaining the efficiency and incremental nature of greedy parsers.
One reason for the lower accuracy of greedy parsers is error propagation: once the parser makes an error in decoding, more errors are likely to follow. This behavior is closely related to the way in which greedy parsers are normally trained. Given a treebank oracle, a gold sequence of transitions is derived, and a predictor is trained to predict transitions along this gold sequence, without considering any parser state outside this sequence. Thus, once the parser strays from the golden path at test time, it ventures into unknown territory and is forced to react to situations it has never been trained for.
In recent work (Goldberg and Nivre, 2012), we introduced the concept of a dynamic oracle, which is non-deterministic and not restricted to a single golden path, but instead provides optimal predictions for any possible state the parser might be in. Dynamic oracles are non-deterministic in the sense that they return a set of valid transitions for a given parser state and gold tree. Moreover, they are well-defined and optimal also for states from which the gold tree cannot be derived, in the sense that they return the set of transitions leading to the best tree derivable from each state. We showed experimentally that, using a dynamic oracle for the arc-eager transition system (Nivre, 2003), a greedy parser can be trained to perform well also after incurring a mistake, thus alleviating the effect of error propagation and resulting in consistently better parsing accuracy.
Transactions of the Association for Computational Linguistics, 1 (2013) 379–390. Action Editor: Lillian Lee.
Submitted 6/2013; Revised 9/2013; Published 10/2013. © 2013 Association for Computational Linguistics.
Data-Driven Metaphor Recognition and Explanation
Hongsong Li (Microsoft Research Asia; hongsli@microsoft.com), Kenny Q. Zhu (Shanghai Jiao Tong University; kzhu@cs.sjtu.edu.cn), Haixun Wang (Google Research; haixun@google.com)
Abstract: Recognizing metaphors and identifying the source-target mappings is an important task as metaphorical text poses a big challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using the knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in the text. To our knowledge, this is the first purely data-driven approach of probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.
1 Introduction
A metaphor is a way of communicating. It enables us to comprehend one thing in terms of another. For example, the metaphor, Juliet is the sun, allows us to see Juliet much more vividly than if Shakespeare had taken a more literal approach. We utter about one metaphor for every ten to twenty-five words, or about six metaphors a minute (Geary, 2011).
Specifically, a metaphor is a mapping of concepts from a source domain to a target domain (Lakoff and Johnson, 1980). The source domain is often concrete and based on sensory experience, while the target domain is usually abstract. Two concepts are connected by this mapping because they share some common or similar properties, and as a result, the meaning of one concept can be transferred to another. For example, in “Juliet is the sun,” the sun is the source concept while Juliet is the target concept. One interpretation of this metaphor is that both concepts share the property that their existence brings about warmth, life, and excitement. In a metaphorical sentence, at least one of the two concepts must be explicitly present. This leads to three types of metaphors:
1. Juliet is the sun. Here, both the source (sun) and the target (Juliet) are explicit.
2. Please wash your claws before scratching me. Here, the source (claws) is explicit, while the target (hands) is implicit, and the context of wash is in terms of the target.
3. Your words cut deep. Here, the target (words) is explicit, while the source (possibly, knife) is implicit, and the context of cut is in terms of the source.
In this paper, we focus on the recognition and explanation of metaphors. For a given sentence, we first check whether it contains a metaphoric expression (which we call metaphor recognition), and if it does, we identify the source and the target concepts of the metaphor (which we call metaphor explanation). Metaphor explanation is important for understanding metaphors. Explaining type 2 and 3 metaphors is particularly challenging, and, to the best of our knowledge, has not been attempted for nominal concepts [1] before. In our examples, knowing that life and hands are the target concepts avoids the confusion that may arise if source concepts sun and claws are used literally in understanding the sentences. This, however, does not mean that the source
[1] Nominal concepts are those represented by noun phrases.
Transactions of the Association for Computational Linguistics, 1 (2013) 391–402. Action Editor: Rada Mihalcea.
Submitted 5/2013; Published 10/2013. © 2013 Association for Computational Linguistics. Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading. Sumit Basu (sumitb@microsoft.com), Chuck Jacobs (cjacobs@microsoft.com), Lucy Vanderwende; Microsoft Research, One Microsoft Way, Redmond, WA
Transactions of the Association for Computational Linguistics, 1 (2013) 367–378. Action Editor: Kristina Toutanova.
Submitted 7/2013; Revised 8/2013; Published 10/2013. © 2013 Association for Computational Linguistics.
Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter (Machine Learning Department, Carnegie Mellon University; rittera@cs.cmu.edu), Luke Zettlemoyer and Mausam (Computer Sci. & Eng., University of Washington; {lsz,mausam}@cs.washington.edu), Oren Etzioni (Vulcan Inc., Seattle, WA; orene@vulcan.com)
Abstract: Distant supervision algorithms learn information extraction models given only large readily available databases and text collections. Most previous work has used heuristics for generating labeled data, for example assuming that facts not contained in the database are not mentioned in the text, and facts in the database must be mentioned at least once. In this paper, we propose a new latent-variable approach that models missing data. This provides a natural way to incorporate side information, for instance modeling the intuition that text will often mention rare entities which are likely to be missing in the database. Despite the added complexity introduced by reasoning about missing data, we demonstrate that a carefully designed local search approach to inference is very accurate and scales to large datasets. Experiments demonstrate improved performance for binary and unary relation extraction when compared to learning with heuristic labels, including on average a 27% increase in area under the precision recall curve in the binary case.
1 Introduction
This paper addresses the issue of missing data (Little and Rubin, 1986) in the context of distant supervision. The goal of distant supervision is to learn to process unstructured data, for instance to extract binary or unary relations from text (Bunescu and Mooney, 2007; Snyder and Barzilay, 2007; Wu and Weld, 2007; Mintz et al., 2009; Collins and Singer, 1999), using a large database of propositions as a distant source of supervision. In the case of binary relations, the intuition is that any sentence which mentions a pair of entities (e1 and e2) that participate in a relation, r, is likely to express the proposition r(e1,e2), so we can treat it as a positive training example of r. Figure 1 presents an example of this process.
Figure 1: A small hypothetical database and heuristically labeled training data for the EMPLOYER relation.
Person | EMPLOYER
Bibb Latané | UNC Chapel Hill
Tim Cook | Apple
Susan Wojcicki | Google
True Positive: “Bibb Latané, a professor at the University of North Carolina at Chapel Hill, published the theory in 1981.”
False Positive: “Tim Cook praised Apple’s record revenue…”
False Negative: “John P. McNamara, a professor at Washington State University’s Department of Animal Sciences…”
One question which has received little attention in previous work is how to handle the situation where information is missing, either from the text corpus, or the database. As an example, suppose the pair of entities (John P. McNamara, Washington State University) is absent from the EMPLOYER relation. In this case, the sentence in Figure 1 (and others which mention the entity pair) is effectively treated as a negative example of the relation. This is an issue
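The heuristic labeling that Figure 1 illustrates can be sketched in a few lines. This is a minimal illustration, not the authors' system: the names `heuristic_label` and `label_mentions` are hypothetical, the database holds only the three EMPLOYER facts from Figure 1, and entity mentions are assumed to be already linked to database entries.

```python
# Hypothetical EMPLOYER database from Figure 1: a set of (person, employer) pairs.
EMPLOYER = {
    ("Bibb Latané", "UNC Chapel Hill"),
    ("Tim Cook", "Apple"),
    ("Susan Wojcicki", "Google"),
}


def heuristic_label(pair, database):
    """Heuristic rule: a mention is positive iff its entity pair is in the database."""
    return pair in database


def label_mentions(mentions, database):
    """Attach a heuristic label to every (entity pair, sentence) mention."""
    return [(sentence, heuristic_label(pair, database)) for pair, sentence in mentions]


# The three sentences from Figure 1, with entity linking assumed to be done already.
MENTIONS = [
    (("Bibb Latané", "UNC Chapel Hill"),
     "Bibb Latané, a professor at the University of North Carolina at Chapel Hill, "
     "published the theory in 1981."),
    (("Tim Cook", "Apple"),
     "Tim Cook praised Apple's record revenue…"),
    (("John P. McNamara", "Washington State University"),
     "John P. McNamara, a professor at Washington State University's "
     "Department of Animal Sciences…"),
]

LABELS = label_mentions(MENTIONS, EMPLOYER)
for sentence, label in LABELS:
    print("positive" if label else "negative", "|", sentence)
```

Under this rule the Tim Cook sentence is labeled positive although it does not state an employment fact, and the McNamara sentence is labeled negative only because the pair is absent from the database. These are the false positive and false negative cases shown in Figure 1.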
Transactions of the Association for Computational Linguistics, 1 (2013) 353–366. Action Editor: Patrick Pantel.
Submitted 5/2013; Revised 7/2013; Published 10/2013. © 2013 Association for Computational Linguistics.
Distributional Semantics Beyond Words: Supervised Learning of Analogy and Paraphrase
Peter D. Turney (National Research Council Canada, Information and Communications Technologies, Ottawa, Ontario, Canada, K1A 0R6; peter.turney@nrc-cnrc.gc.ca)
Abstract: There have been several efforts to extend distributional semantics beyond individual words, to measure the similarity of word pairs, phrases, and sentences (briefly, tuples; ordered sets of words, contiguous or noncontiguous). One way to extend beyond words is to compare two tuples using a function that combines pairwise similarities between the component words in the tuples. A strength of this approach is that it works with both relational similarity (analogy) and compositional similarity (paraphrase). However, past work required hand-coding the combination function for different tasks. The main contribution of this paper is that combination functions are generated by supervised learning. We achieve state-of-the-art results in measuring relational similarity between word pairs (SAT analogies and SemEval 2012 Task 2) and measuring compositional similarity between noun-modifier phrases and unigrams (multiple-choice paraphrase questions).
1 Introduction
Harris (1954) and Firth (1957) hypothesized that words that appear in similar contexts tend to have similar meanings. This hypothesis is the foundation for distributional semantics, in which words are represented by context vectors. The similarity of two words is calculated by comparing the two corresponding context vectors (Lund et al., 1995; Landauer and Dumais, 1997; Turney and Pantel, 2010).
Distributional semantics is highly effective for measuring the semantic similarity between individual words. On a set of eighty multiple-choice synonym questions from the test of English as a foreign language (TOEFL), a distributional approach recently achieved 100% accuracy (Bullinaria and Levy, 2012). However, it has been difficult to extend distributional semantics beyond individual words, to word pairs, phrases, and sentences.
Moving beyond individual words, there are various types of semantic similarity to consider. Here we focus on paraphrase and analogy. Paraphrase is similarity in the meaning of two pieces of text (Androutsopoulos and Malakasiotis, 2010). Analogy is similarity in the semantic relations of two sets of words (Turney, 2008a).
It is common to study paraphrase at the sentence level (Androutsopoulos and Malakasiotis, 2010), but we prefer to concentrate on the simplest type of paraphrase, where a bigram paraphrases a unigram. For example, dog house is a paraphrase of kennel. In our experiments, we concentrate on noun-modifier bigrams and noun unigrams.
Analogies map terms in one domain to terms in another domain (Gentner, 1983). The familiar analogy between the solar system and the Rutherford-Bohr atomic model involves several terms from the domain of the solar system and the domain of the atomic model (Turney, 2008a).
The simplest type of analogy is proportional analogy, which involves two pairs of words (Turney, 2006b). For example, the pair ⟨cook, raw⟩ is analogous to the pair ⟨decorate, plain⟩. If we cook a thing, it is no longer raw; if we decorate a thing, it
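The introduction's statement that word similarity is calculated by comparing context vectors can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the paper's setup: the toy corpus, the window size of 2, raw co-occurrence counts as vector entries, cosine as the comparison, and the helper names `context_vector` and `cosine`.

```python
import math
from collections import Counter


def context_vector(word, sentences, window=2):
    """Count the tokens that occur within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (an assumption for illustration only).
CORPUS = [
    ["the", "dog", "barks", "loudly"],
    ["the", "cat", "meows", "loudly"],
    ["a", "car", "drives", "fast"],
]

DOG = context_vector("dog", CORPUS)
CAT = context_vector("cat", CORPUS)
CAR = context_vector("car", CORPUS)

print(cosine(DOG, CAT))  # shared contexts "the" and "loudly" give a positive score
print(cosine(DOG, CAR))  # no shared contexts
```

Words that share contexts ("dog" and "cat") score higher than words that do not ("dog" and "car"), which is the distributional hypothesis in miniature. Real systems typically weight the counts and reduce dimensionality, neither of which is shown here.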
Transactions of the Association for Computational Linguistics, 1 (2013) 341–352. Action Editor: Mirella Lapata.
Submitted 12/2012; Revised 3/2013, 5/2013; Published 7/2013. © 2013 Association for Computational Linguistics.
What Makes Writing Great? First Experiments on Article Quality Prediction in the Science Journalism Domain
Annie Louis (University of Pennsylvania, Philadelphia, PA 19104; lannie@seas.upenn.edu), Ani Nenkova (University of Pennsylvania, Philadelphia, PA 19104; nenkova@seas.upenn.edu)
Abstract: Great writing is rare and highly admired. Readers seek out articles that are beautifully written, informative and entertaining. Yet information-access technologies lack capabilities for predicting article quality at this level. In this paper we present first experiments on article quality prediction in the science journalism domain. We introduce a corpus of great pieces of science journalism, along with typical articles from the genre. We implement features to capture aspects of great writing, including surprising, visual and emotional content, as well as general features related to discourse organization and sentence structure. We show that the distinction between great and typical articles can be detected fairly accurately, and that the entire spectrum of our features contribute to the distinction.
1 Introduction
Measures of article quality would be hugely beneficial for information retrieval and recommendation systems. In this paper, we describe a dataset of New York Times science journalism articles which we have categorized for quality differences and present a system that can automatically make the distinction.
Science journalism conveys complex scientific ideas, entertaining and educating at the same time. Consider the following opening of a 2005 article by David Quammen from Harper’s magazine:
One morning early last winter a small item appeared in my local newspaper announcing the birth of an extraordinary animal. A team of researchers at Texas A&M University had succeeded in cloning a whitetail deer. Never done before. The fawn, known as Dewey, was developing normally and seemed to be healthy. He had no mother, just a surrogate who had carried his fetus to term. He had no father, just a “donor” of all his chromosomes. He was the genetic duplicate of a certain trophy buck out of south Texas whose skin cells had been cultured in a laboratory. One of those cells furnished a nucleus that, transplanted and rejiggered, became the DNA core of an egg cell, which became an embryo, which in time became Dewey. So he was wildlife, in a sense, and in another sense elaborately synthetic. This is the sort of news, quirky but epochal, that can cause a person with a mouthful of toast to pause and marvel. What a dumb idea, I marveled.
The writing is clear and well-organized but the text also contains creative use of language and a clever story-like explanation of the scientific contribution. Such properties make science journalism an attractive genre for studying writing quality. Science journalism is also a highly relevant domain for information retrieval in the context of educational as well as entertaining applications. Article quality measures can hugely benefit such systems.
Prior work indicates that three aspects of article quality can be successfully predicted: a) whether a text meets the acceptable standards for spelling (Brill and Moore, 2000), grammar (Tetreault and Chodorow, 2008; Rozovskaya and Roth, 2010) and discourse organization (Barzilay et al., 2002; Lapata, 2003); b) has a topic that is interesting to a particular user. For example, content-based recommendation systems standardly represent user interest using frequent words from articles in a user’s history and retrieve other articles on the same topics (Paz-
Transactions of the Association for Computational Linguistics, 1 (2013) 327–340. Action Editor: Philipp Koehn.
Submitted 1/2013; Revised 5/2013; Published 7/2013. © 2013 Association for Computational Linguistics.
Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation
Arianna Bisazza and Marcello Federico (Fondazione Bruno Kessler, Trento, Italy; {bisazza,federico}@fbk.eu)
Abstract: Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. In fact, the optimal trade-off between accuracy and complexity of decoding is nowadays reached by harshly limiting the input permutation space. We propose a method to dynamically shape such space and, thus, capture long-range word movements without hurting translation quality nor decoding time. The space defined by loose reordering constraints is dynamically pruned through a binary classifier that predicts whether a given input word should be translated right after another. The integration of this model into a phrase-based decoder improves a strong Arabic-English baseline already including state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.
1 Introduction
Word order differences are among the most important factors determining the performance of statistical machine translation (SMT) on a given language pair (Birch et al., 2009). This is particularly true in the framework of phrase-based SMT (PSMT) (Zens et al., 2002; Koehn et al., 2003; Och and Ney, 2002), an approach that remains highly competitive despite the recent advances of the tree-based approaches.
During the PSMT decoding process, the output sentence is built from left to right, while the input sentence positions can be covered in different orders. Thus, reordering in PSMT can be viewed as the problem of choosing the input permutation that leads to the highest-scoring output sentence. Due to efficiency reasons, however, the input permutation space cannot be fully explored, and is therefore limited with hard reordering constraints.
Although many solutions have been proposed to explicitly model word reordering during decoding, PSMT still largely fails to handle long-range word movements in language pairs with different syntactic structures [1]. We believe this is mostly not due to deficiencies of the existing reordering models, but rather to a very coarse definition of the reordering search space. Indeed, the existing reordering constraints are rather simple and typically based on word-to-word distances. Moreover, they are uniform throughout the input sentence and insensitive to the actual words being translated. Relaxing this kind of constraints means dramatically increasing the size of the search space and making the reordering model’s task extremely complex. As a result, even in language pairs where long reordering is regularly observed, PSMT quality degrades when long word movements are allowed to the decoder.
We address this problem by training a binary classifier to predict whether a given input position should be translated right after another, given the words at those positions and their contexts. When this model is integrated into the decoder, its predic-
[1] For empirical evidence, see for instance (Birch et al., 2009; Galley and Manning, 2008; Bisazza and Federico, 2012).
Transactions of the Association for Computational Linguistics, 1 (2013) 315–326. Action Editor: Mark Steedman.
Submitted 2/2013; Revised 6/2013; Published 7/2013. © 2013 Association for Computational Linguistics.
Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning
Minh-Thang Luong (Department of Computer Science, Stanford University, Stanford, California; lmthang@stanford.edu), Michael C. Frank (Department of Psychology, Stanford University, Stanford, California; mcfrank@stanford.edu), Mark Johnson (Department of Computing, Macquarie University, Sydney, Australia; Mark.Johnson@MQ.edu.au)
Abstract: Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted more and more interest in recent years. In most work on this topic, however, utterances in a conversation are treated independently and discourse structure information is largely ignored. In the context of language acquisition, this independence assumption discards cues that are important to the learner, e.g., the fact that consecutive utterances are likely to share the same referent (Frank et al., 2013). The current paper describes an approach to the problem of simultaneously modeling grounded language at the sentence and discourse levels. We combine ideas from parsing and grammar induction to produce a parser that can handle long input strings with thousands of tokens, creating parse trees that represent full discourses. By casting grounded language learning as a grammatical inference task, we use our parser to extend the work of Johnson et al. (2012), investigating the importance of discourse continuity in children’s language acquisition and its interaction with social cues. Our model boosts performance in a language acquisition task and yields good discourse segmentations compared with human annotators.
1 Introduction
Learning mappings between natural language (NL) and meaning representations (MR) is an important goal for both computational linguistics and cognitive science. Accurately learning novel mappings is crucial in grounded language understanding tasks and such systems can suggest insights into the nature of children language learning.
Two influential examples of grounded language learning tasks are the sportscasting task, RoboCup, where the NL is the set of running commentary and the MR is the set of logical forms representing actions like kicking or passing (Chen and Mooney, 2008), and the cross-situational word-learning task, where the NL is the caregiver’s utterances and the MR is the set of objects present in the context (Siskind, 1996; Yu and Ballard, 2007). Work in these domains suggests that, based on the co-occurrence between words and their referents in context, it is possible to learn mappings between NL and MR even under substantial ambiguity.
Nevertheless, contexts like RoboCup, where every single utterance is grounded, are extremely rare. Much more common are cases where a single topic is introduced and then discussed at length throughout a discourse. In a television news show, for example, a topic might be introduced by presenting a relevant picture or video clip. Once the topic is introduced, the anchors can discuss it by name or even using a pronoun without showing a picture. The discourse is grounded without having to ground every utterance.
Moreover, although previous work has largely treated utterance order as independent, the order of utterances is critical in grounded discourse contexts: if the order is scrambled, it can become impossible to recover the topic. Supporting this idea, Frank et al. (2013) found that topic continuity (the tendency to talk about the same topic in multiple utterances that are contiguous in time) is both prevalent and informative for word learning. This paper examines the importance of topic continuity through a grammatical inference problem. We build on Johnson et al. (2012)’s work that used grammatical inference to