Analysis Methods in Neural Language Processing: A Survey
Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov1,2 and James Glass1. 1MIT Computer Science and Artificial Intelligence Laboratory; 2Harvard School of Engineering and Applied Sciences; Cambridge, MA, USA. {belinkov, glass}@mit.edu. Abstract: The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed,
Joint Transition-Based Models for Morpho-Syntactic Parsing: Parsing Strategies for MRLs and a Case Study from Modern Hebrew
Joint Transition-Based Models for Morpho-Syntactic Parsing: Parsing Strategies for MRLs and a Case Study from Modern Hebrew. Amir More, Open University, Ra'anana, Israel, habeanf@gmail.com; Victoria Basmova, Open University, Ra'anana, Israel, vicbas@openu.ac.il; Amit Seker, Open University, Ra'anana, Israel, amitse@openu.ac.il; Reut Tsarfaty, Open University, Ra'anana, Israel, reutts@openu.ac.il. Abstract: In standard NLP pipelines, morphological analysis and disambiguation (MA&D) precedes syntactic and semantic downstream tasks. However, for languages
Semantic Neural Machine Translation Using AMR
Semantic Neural Machine Translation Using AMR Linfeng Song,1 Daniel Gildea,1 Yue Zhang,2 Zhiguo Wang,3 and Jinsong Su4 1Department of Computer Science, University of Rochester, Rochester, NY 14627 2School of Engineering, Westlake University, China 3IBM T.J. Watson Research Center, Yorktown Heights, NY 10598 4Xiamen University, Xiamen, China 1{lsong10,gildea}@cs.rochester.edu 2yue.zhang@wias.org.cn 3zgw.tomorrow@gmail.com 4jssu@xmu.edu.cn Abstract It is intuitive that semantic representations can be useful for machine translation, mainly be-
Grammar Error Correction in Morphologically Rich Languages: The Case of Russian
Grammar Error Correction in Morphologically Rich Languages: The Case of Russian. Alla Rozovskaya, Queens College, City University of New York, arozovskaya@qc.cuny.edu; Dan Roth, University of Pennsylvania, danroth@seas.upenn.edu. Abstract: Until now, most of the research in grammar error correction focused on English, and the problem has hardly been explored for other languages. We address the task of correcting writing mistakes in morphologically rich languages, con
Learning Typed Entailment Graphs with Global Soft Constraints
Learning Typed Entailment Graphs with Global Soft Constraints. Mohammad Javad Hosseini*§, Nathanael Chambers**, Siva Reddy†, Xavier R. Holt‡, Shay B. Cohen*, Mark Johnson‡ and Mark Steedman*. *University of Edinburgh; §The Alan Turing Institute, UK; **United States Naval Academy; †Stanford University; ‡Macquarie University. javad.hosseini@ed.ac.uk, nchamber@usna.edu, sivar@stanford.edu, {xavier.ricketts-holt,mark.johnson}@mq.edu.au, {scohen,steedman}@inf.ed.ac.uk. Abstract: This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument
Surface Statistics of an Unknown Language Indicate How to Parse It
Surface Statistics of an Unknown Language Indicate How to Parse It. Dingquan Wang and Jason Eisner, Department of Computer Science, Johns Hopkins University, {wdd,jason}@cs.jhu.edu. Abstract: We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms. Wenpeng Yin, Department of Computer and Information Science, University of Pennsylvania, wenpeng@seas.upenn.edu; Hinrich Schütze, Center for Information and Language Processing, LMU Munich, Germany, inquiries@cislmu.org. Abstract: In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly
Erratum: “Improving Topic Models with Latent Feature Word Representations”
Erratum: “Improving Topic Models with Latent Feature Word Representations”. Dat Quoc Nguyen, Richard Billingsley, Lan Du and Mark Johnson. Abstract: FROM (a part of Table 10 in the original published article): F1 scores for TMN and TMNtitle datasets. Change in clustering and classification results due to the DMM and LF-DMM bugs. Data TMN 4.3 Document clustering evaluation FROM (in the original published article): For
Transactions of the Association for Computational Linguistics, 1 (2013) 429–440. Action Editor: Philipp Koehn.
Submitted 3/2013; Revised 8/2013; Published 10/2013. © 2013 Association for Computational Linguistics.

Measuring Machine Translation Errors in New Domains. Ann Irvine, Johns Hopkins University, anni@jhu.edu; John Morgan, University of Maryland, jjm@cs.umd.edu; Marine Carpuat, National Research Council Canada, marine.carpuat@nrc.gc.ca; Hal Daumé III, University of Maryland, me@hal3.name; Dragos Munteanu, SDL Research, dmunteanu@sdl.com.

Abstract: We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.

1 Introduction. When building a statistical machine translation (SMT) system, the expected use case is often limited to a specific domain, genre and register (henceforth "domain" refers to this set, in keeping with standard, imprecise, terminology), such as a particular type of legal or medical document. Unfortunately, it is expensive to obtain enough parallel data to reliably estimate translation models in a new domain. Instead, one can hope that large amounts of data from another, "old domain," might be close enough to stand as a proxy. This is the de facto standard: we train SMT systems on Parliament proceedings, but then use them to translate all sorts of new text. Unfortunately, this results in significantly degraded translation quality. In this paper, we present two complementary methods for quantifiably measuring the source of translation errors (§5.1 and §5.2) in a novel taxonomy (§4). We show quantitative (§7.1) and qualitative (§7.2) results obtained from our methods on four very different new domains: newswire, medical texts, scientific abstracts, and movie subtitles.

Old Domain (Hansard)
  Inp: monsieur le président, les pêcheurs de homard de la région de l'atlantique sont dans une situation catastrophique.
  Ref: mr. speaker, lobster fishers in atlantic canada are facing a disaster.
  Out: mr. speaker, the lobster fishers in atlantic canada are in a mess.
New Domain (Medical)
  Inp: mode et voie(s) d'administration
  Ref: method and route(s) of administration
  Out: fashion and voie(s) of directors
Table 1: Example inputs, references and system outputs. There are three types of errors: unseen words (blue), incorrect sense selection (red) and unknown sense (green).

Our basic approach is to think of translation errors in the context of a novel taxonomy of error categories, "S4." Our taxonomy contains categories for the errors shown in Table 1, in which an SMT system trained on the Hansard parliamentary proceedings is applied to a new domain (in this case, medical texts). Our categorization focuses on the following: new French words, new French senses, and incorrectly chosen translations. The first methodology we develop for studying such errors is a micro-level study of the frequency and distribution of these error types in real translation output at the level of individual words (§5.1), without respect to how these errors affect overall translation quality. The second is a macro-level study of how these errors affect translation performance (measured by BLEU; §5.2). One important feature of our methodologies is that we focus on errors that could possibly be fixed given access to data from a new domain, rather than all errors that might arise because the particular translation model used is inadequate to capture the required
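The word-level error categories above can be illustrated with a minimal sketch. This is not the authors' code: the vocabulary, lexicon, and function names are invented for illustration, and the real taxonomy distinguishes more cases than this toy classifier.

```python
# Toy sketch of an S4-style word-level error check (illustrative only):
# a source word is an "unseen word" if it never occurred in old-domain
# training data, and an "unseen sense" candidate if it occurred but was
# never paired with its reference translation.

def classify_word(src_word, ref_word, old_vocab, old_lexicon):
    """Return a coarse error category for one aligned word pair."""
    if src_word not in old_vocab:
        return "unseen-word"
    if ref_word not in old_lexicon.get(src_word, set()):
        return "unseen-sense"
    return "seen"

# hypothetical old-domain (Hansard-style) statistics
old_vocab = {"mode", "administration"}
old_lexicon = {"mode": {"fashion"}, "administration": {"administration"}}

print(classify_word("voie", "route", old_vocab, old_lexicon))   # unseen-word
print(classify_word("mode", "method", old_vocab, old_lexicon))  # unseen-sense
```

"mode" shows the sense problem from Table 1: the word was seen in the old domain, but only with its "fashion" sense, so the medical sense "method" is new.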
Transactions of the Association for Computational Linguistics, 1 (2013) 415–428. Action Editor: Brian Roark.
Submitted 7/2013; Revised 9/2013; Published 10/2013. © 2013 Association for Computational Linguistics.

Joint Morphological and Syntactic Analysis for Richly Inflected Languages. Bernd Bohnet (University of Birmingham, School of Computer Science), Joakim Nivre (Uppsala University, Department of Linguistics and Philology), Igor Boguslavsky (Universidad Politécnica de Madrid, Departamento de Inteligencia Artificial; Russian Academy of Sciences, Institute for Information Transmission Problems), Richárd Farkas (University of Szeged, Institute of Informatics), Filip Ginter (University of Turku, Department of Information Technology), Jan Hajič (Charles University in Prague, Institute of Formal and Applied Linguistics).

Abstract: Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.

1 Introduction. Syntactic parsing of natural language has witnessed a tremendous development during the last twenty years, especially through the use of statistical models for robust and accurate broad-coverage parsing. However, as statistical parsing techniques have been applied to more and more languages, it has also been observed that typological differences between languages lead to new challenges. In particular, it has been found over and over again that languages exhibiting rich morphological structure, often together with a relatively free word order, usually obtain lower parsing accuracy, especially in comparison to English. One striking demonstration of this tendency can be found in the CoNLL shared tasks on multilingual dependency parsing, organized in 2006 and 2007, where richly inflected languages clustered at the lower end of the scale with respect to parsing accuracy (Buchholz and Marsi, 2006; Nivre et al., 2007). These and similar observations have led to an increased interest in the special challenges posed by parsing morphologically rich languages, as evidenced most clearly by a new series of workshops devoted to this topic (Tsarfaty et al., 2010), as well as a special issue in Computational Linguistics (Tsarfaty et al., 2013) and a shared task on parsing morphologically rich languages.[1] One hypothesized explanation for the lower parsing accuracy observed for richly inflected languages is the strict separation of morphological and syntactic analysis assumed in many parsing frameworks (Tsarfaty et al., 2010; Tsarfaty et al., 2013). This is true in particular for data-driven dependency parsers, which tend to assume that all morphological disambiguation has been performed before syntactic analysis begins. However, as argued by Lee et al. (2011), in morphologically rich languages there is often considerable interaction between morphology and syntax, such that neither can be disambiguated without the other. Lee et al. (2011) go on to show that a discriminative model for joint morphological disambiguation and dependency parsing gives consistent improvements in morphological and syntactic accuracy, compared to a pipeline model, for Ancient Greek, Czech, Hungarian and Latin. Similarly, Bohnet and Nivre (2012) propose a model for

[1] See https://sites.google.com/site/spmrl2013/home/sharedtask.
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner.
Submitted 6/2013; Published 10/2013. © 2013 Association for Computational Linguistics.

Training Deterministic Parsers with Non-Deterministic Oracles. Yoav Goldberg, Bar-Ilan University, Department of Computer Science, Ramat-Gan, Israel, yoav.goldberg@gmail.com; Joakim Nivre, Uppsala University, Department of Linguistics and Philology, Uppsala, Sweden, joakim.nivre@lingfil.uu.se.

Abstract: Greedy transition-based parsers are very fast but tend to suffer from error propagation. This problem is aggravated by the fact that they are normally trained using oracles that are deterministic and incomplete in the sense that they assume a unique canonical path through the transition system and are only valid as long as the parser does not stray from this path. In this paper, we give a general characterization of oracles that are nondeterministic and complete, present a method for deriving such oracles for transition systems that satisfy a property we call arc decomposition, and instantiate this method for three well-known transition systems from the literature. We say that these oracles are dynamic, because they allow us to dynamically explore alternative and non-optimal paths during training, in contrast to oracles that statically assume a unique optimal path. Experimental evaluation on a wide range of data sets clearly shows that using dynamic oracles to train greedy parsers gives substantial improvements in accuracy. Moreover, this improvement comes at no cost in terms of efficiency, unlike other techniques like beam search.

1 Introduction. Greedy transition-based parsers are easy to implement and are very efficient, but they are generally not as accurate as parsers that are based on global search (McDonald et al., 2005; Koo and Collins, 2010) or as transition-based parsers that use beam search (Zhang and Clark, 2008) or dynamic programming (Huang and Sagae, 2010; Kuhlmann et al., 2011). This work is part of a line of research trying to push the boundaries of greedy parsing and narrow the accuracy gap of 2–3% between search-based and greedy parsers, while maintaining the efficiency and incremental nature of greedy parsers. One reason for the lower accuracy of greedy parsers is error propagation: once the parser makes an error in decoding, more errors are likely to follow. This behavior is closely related to the way in which greedy parsers are normally trained. Given a treebank oracle, a gold sequence of transitions is derived, and a predictor is trained to predict transitions along this gold sequence, without considering any parser state outside this sequence. Thus, once the parser strays from the golden path at test time, it ventures into unknown territory and is forced to react to situations it has never been trained for. In recent work (Goldberg and Nivre, 2012), we introduced the concept of a dynamic oracle, which is non-deterministic and not restricted to a single golden path, but instead provides optimal predictions for any possible state the parser might be in. Dynamic oracles are non-deterministic in the sense that they return a set of valid transitions for a given parser state and gold tree. Moreover, they are well-defined and optimal also for states from which the gold tree cannot be derived, in the sense that they return the set of transitions leading to the best tree derivable from each state. We showed experimentally that, using a dynamic oracle for the arc-eager transition system (Nivre, 2003), a greedy parser can be trained to perform well also after incurring a mistake, thus alleviating the effect of error propagation and resulting in consistently better parsing accuracy.
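The defining property of a non-deterministic oracle, returning a set of valid transitions rather than a single canonical one, can be sketched with a toy example. The states, transitions, and oracles below are illustrative stand-ins, not the paper's arc-eager system over real parser configurations.

```python
# Toy sketch contrasting a static oracle (one canonical transition per
# state) with a dynamic oracle (a SET of equally valid transitions).

def count_training_updates(predictions, oracle):
    """A non-deterministic oracle triggers an update only when the
    model's prediction falls outside the oracle's valid set."""
    return sum(1 for state, pred in predictions.items()
               if pred not in oracle(state))

# the model's predicted transition in each of three (toy) parser states
predictions = {1: "SHIFT", 2: "LEFT-ARC", 3: "RIGHT-ARC"}

# static oracle: exactly one "correct" transition per state
static_oracle = lambda s: {1: {"SHIFT"},
                           2: {"RIGHT-ARC"},
                           3: {"RIGHT-ARC"}}[s]

# dynamic oracle: state 2 happens to have two zero-cost transitions
dynamic_oracle = lambda s: {1: {"SHIFT"},
                            2: {"RIGHT-ARC", "LEFT-ARC"},
                            3: {"RIGHT-ARC"}}[s]

print(count_training_updates(predictions, static_oracle))   # 1
print(count_training_updates(predictions, dynamic_oracle))  # 0
```

In state 2 the model's choice is penalized by the static oracle but accepted by the dynamic one; during training this is what lets the parser explore non-canonical paths without spurious updates.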
Transactions of the Association for Computational Linguistics, 1 (2013) 379–390. Action Editor: Lillian Lee.
Submitted 6/2013; Revised 9/2013; Published 10/2013. © 2013 Association for Computational Linguistics.

Data-Driven Metaphor Recognition and Explanation. Hongsong Li, Microsoft Research Asia, hongsli@microsoft.com; Kenny Q. Zhu, Shanghai Jiao Tong University, kzhu@cs.sjtu.edu.cn; Haixun Wang, Google Research, haixun@google.com.

Abstract: Recognizing metaphors and identifying the source-target mappings is an important task as metaphorical text poses a big challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using the knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in the text. To our knowledge, this is the first purely data-driven approach of probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.

1 Introduction. A metaphor is a way of communicating. It enables us to comprehend one thing in terms of another. For example, the metaphor, Juliet is the sun, allows us to see Juliet much more vividly than if Shakespeare had taken a more literal approach. We utter about one metaphor for every ten to twenty-five words, or about six metaphors a minute (Geary, 2011). Specifically, a metaphor is a mapping of concepts from a source domain to a target domain (Lakoff and Johnson, 1980). The source domain is often concrete and based on sensory experience, while the target domain is usually abstract. Two concepts are connected by this mapping because they share some common or similar properties, and as a result, the meaning of one concept can be transferred to another. For example, in "Juliet is the sun," the sun is the source concept while Juliet is the target concept. One interpretation of this metaphor is that both concepts share the property that their existence brings about warmth, life, and excitement. In a metaphorical sentence, at least one of the two concepts must be explicitly present. This leads to three types of metaphors:

1. Juliet is the sun. Here, both the source (sun) and the target (Juliet) are explicit.
2. Please wash your claws before scratching me. Here, the source (claws) is explicit, while the target (hands) is implicit, and the context of wash is in terms of the target.
3. Your words cut deep. Here, the target (words) is explicit, while the source (possibly, knife) is implicit, and the context of cut is in terms of the source.

In this paper, we focus on the recognition and explanation of metaphors. For a given sentence, we first check whether it contains a metaphoric expression (which we call metaphor recognition), and if it does, we identify the source and the target concepts of the metaphor (which we call metaphor explanation). Metaphor explanation is important for understanding metaphors. Explaining type 2 and 3 metaphors is particularly challenging, and, to the best of our knowledge, has not been attempted for nominal concepts[1] before. In our examples, knowing that life and hands are the target concepts avoids the confusion that may arise if source concepts sun and claws are used literally in understanding the sentences. This, however, does not mean that the source

[1] Nominal concepts are those represented by noun phrases.
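The recognition step for type 1 metaphors can be sketched very roughly: flag "X is the Y" as metaphoric when no literal isA link connects the two concepts. This is a heavy simplification. The paper's knowledge bases are probabilistic and mined from billions of web pages; the tiny set and function below are invented for illustration.

```python
# Minimal sketch of KB-based nominal metaphor recognition (illustrative
# only; the real system reasons probabilistically over a web-scale
# metaphor KB and isA KB, not a hand-written set).

isa_kb = {("juliet", "person"), ("sun", "star")}

def is_metaphoric(target, source):
    """Flag 'TARGET is the SOURCE' as metaphoric when the isA KB has
    no literal isA link from target to source."""
    return (target, source) not in isa_kb

print(is_metaphoric("juliet", "sun"))  # True: no literal isA link
print(is_metaphoric("sun", "star"))    # False: a literal statement
```

Explaining type 2 and 3 metaphors is harder precisely because one concept is implicit and must be inferred from context, which is where the acquired metaphor knowledge base comes in.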
Transactions of the Association for Computational Linguistics, 1 (2013) 391–402. Action Editor: Rada Mihalcea.
Submitted 5/2013; Published 10/2013. © 2013 Association for Computational Linguistics.

Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading. Sumit Basu, Chuck Jacobs, Lucy Vanderwende; Microsoft Research, One Microsoft Way, Redmond, WA. sumitb@microsoft.com cjacobs@microsoft.com
Transactions of the Association for Computational Linguistics, 1 (2013) 367–378. Action Editor: Kristina Toutanova.
Submitted 7/2013; Revised 8/2013; Published 10/2013. © 2013 Association for Computational Linguistics.

Modeling Missing Data in Distant Supervision for Information Extraction. Alan Ritter, Machine Learning Department, Carnegie Mellon University, rittera@cs.cmu.edu; Luke Zettlemoyer and Mausam, Computer Sci. & Eng., University of Washington, {lsz,mausam}@cs.washington.edu; Oren Etzioni, Vulcan Inc., Seattle, WA, orene@vulcan.com.

Abstract: Distant supervision algorithms learn information extraction models given only large readily available databases and text collections. Most previous work has used heuristics for generating labeled data, for example assuming that facts not contained in the database are not mentioned in the text, and facts in the database must be mentioned at least once. In this paper, we propose a new latent-variable approach that models missing data. This provides a natural way to incorporate side information, for instance modeling the intuition that text will often mention rare entities which are likely to be missing in the database. Despite the added complexity introduced by reasoning about missing data, we demonstrate that a carefully designed local search approach to inference is very accurate and scales to large datasets. Experiments demonstrate improved performance for binary and unary relation extraction when compared to learning with heuristic labels, including on average a 27% increase in area under the precision-recall curve in the binary case.

1 Introduction. This paper addresses the issue of missing data (Little and Rubin, 1986) in the context of distant supervision. The goal of distant supervision is to learn to process unstructured data, for instance to extract binary or unary relations from text (Bunescu and Mooney, 2007; Snyder and Barzilay, 2007; Wu and Weld, 2007; Mintz et al., 2009; Collins and Singer, 1999), using a large database of propositions as a distant source of supervision.

Person / EMPLOYER: Bibb Latané – UNC Chapel Hill; Tim Cook – Apple; Susan Wojcicki – Google.
  True Positive: "Bibb Latané, a professor at the University of North Carolina at Chapel Hill, published the theory in 1981."
  False Positive: "Tim Cook praised Apple's record revenue …"
  False Negative: "John P. McNamara, a professor at Washington State University's Department of Animal Sciences …"
Figure 1: A small hypothetical database and heuristically labeled training data for the EMPLOYER relation.

In the case of binary relations, the intuition is that any sentence which mentions a pair of entities (e1 and e2) that participate in a relation, r, is likely to express the proposition r(e1, e2), so we can treat it as a positive training example of r. Figure 1 presents an example of this process. One question which has received little attention in previous work is how to handle the situation where information is missing, either from the text corpus, or the database. As an example, suppose the pair of entities (John P. McNamara, Washington State University) is absent from the EMPLOYER relation. In this case, the sentence in Figure 1 (and others which mention the entity pair) is effectively treated as a negative example of the relation. This is an issue
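The labeling heuristic that the paper argues should not be treated as ground truth can be sketched in a few lines. The database, sentence, and function names below are illustrative, mirroring the hypothetical EMPLOYER example of Figure 1 rather than any real system.

```python
# Sketch of the standard distant-supervision labeling heuristic: any
# sentence mentioning an entity pair from the database is a positive
# example; a co-mentioned pair absent from the database becomes a
# (possibly false) negative when the database is merely incomplete.

employer_db = {("Bibb Latane", "UNC Chapel Hill"), ("Tim Cook", "Apple")}

def heuristic_label(sentence, e1, e2):
    if e1 in sentence and e2 in sentence:
        return "positive" if (e1, e2) in employer_db else "negative"
    return None  # pair not mentioned together in this sentence

s = "John P. McNamara, a professor at Washington State University ..."
print(heuristic_label(s, "John P. McNamara", "Washington State University"))
# -> "negative": a false negative if the fact is simply missing from the DB
```

The latent-variable model proposed in the paper replaces exactly this hard labeling with reasoning about whether a fact is truly absent or merely missing.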
Transactions of the Association for Computational Linguistics, 1 (2013) 353–366. Action Editor: Patrick Pantel.
Submitted 5/2013; Revised 7/2013; Published 10/2013. © 2013 Association for Computational Linguistics.

Distributional Semantics Beyond Words: Supervised Learning of Analogy and Paraphrase. Peter D. Turney, National Research Council Canada, Information and Communications Technologies, Ottawa, Ontario, Canada, K1A 0R6, peter.turney@nrc-cnrc.gc.ca.

Abstract: There have been several efforts to extend distributional semantics beyond individual words, to measure the similarity of word pairs, phrases, and sentences (briefly, tuples; ordered sets of words, contiguous or noncontiguous). One way to extend beyond words is to compare two tuples using a function that combines pairwise similarities between the component words in the tuples. A strength of this approach is that it works with both relational similarity (analogy) and compositional similarity (paraphrase). However, past work required hand-coding the combination function for different tasks. The main contribution of this paper is that combination functions are generated by supervised learning. We achieve state-of-the-art results in measuring relational similarity between word pairs (SAT analogies and SemEval 2012 Task 2) and measuring compositional similarity between noun-modifier phrases and unigrams (multiple-choice paraphrase questions).

1 Introduction. Harris (1954) and Firth (1957) hypothesized that words that appear in similar contexts tend to have similar meanings. This hypothesis is the foundation for distributional semantics, in which words are represented by context vectors. The similarity of two words is calculated by comparing the two corresponding context vectors (Lund et al., 1995; Landauer and Dumais, 1997; Turney and Pantel, 2010). Distributional semantics is highly effective for measuring the semantic similarity between individual words. On a set of eighty multiple-choice synonym questions from the test of English as a foreign language (TOEFL), a distributional approach recently achieved 100% accuracy (Bullinaria and Levy, 2012). However, it has been difficult to extend distributional semantics beyond individual words, to word pairs, phrases, and sentences. Moving beyond individual words, there are various types of semantic similarity to consider. Here we focus on paraphrase and analogy. Paraphrase is similarity in the meaning of two pieces of text (Androutsopoulos and Malakasiotis, 2010). Analogy is similarity in the semantic relations of two sets of words (Turney, 2008a). It is common to study paraphrase at the sentence level (Androutsopoulos and Malakasiotis, 2010), but we prefer to concentrate on the simplest type of paraphrase, where a bigram paraphrases a unigram. For example, dog house is a paraphrase of kennel. In our experiments, we concentrate on noun-modifier bigrams and noun unigrams. Analogies map terms in one domain to terms in another domain (Gentner, 1983). The familiar analogy between the solar system and the Rutherford-Bohr atomic model involves several terms from the domain of the solar system and the domain of the atomic model (Turney, 2008a). The simplest type of analogy is proportional analogy, which involves two pairs of words (Turney, 2006b). For example, the pair ⟨cook, raw⟩ is analogous to the pair ⟨decorate, plain⟩. If we cook a thing, it is no longer raw; if we decorate a thing, it
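The two ingredients described above, word similarity as a comparison of context vectors and tuple similarity as a combination of pairwise word similarities, can be sketched with toy vectors. The three-dimensional vectors below are invented for illustration; real context vectors are high-dimensional corpus statistics, and the paper's contribution is to learn the combination function rather than hand-code an average as done here.

```python
import math

# Word similarity: cosine over (toy) context vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

vec = {"dog": [1.0, 0.2, 0.0],
       "kennel": [0.9, 0.3, 0.1],
       "house": [0.8, 0.1, 0.2]}

# Tuple similarity: a hand-coded combination (here, a simple average of
# aligned pairwise similarities) -- the kind of function the paper
# replaces with supervised learning.
def tuple_similarity(t1, t2):
    return sum(cosine(vec[a], vec[b]) for a, b in zip(t1, t2)) / len(t1)

print(round(tuple_similarity(("dog", "house"), ("dog", "kennel")), 2))
```

Different tasks (analogy vs. paraphrase) need different combination functions over the same pairwise similarities, which is why hand-coding them per task does not scale.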
Transactions of the Association for Computational Linguistics, 1 (2013) 341–352. Action Editor: Mirella Lapata.
Submitted 12/2012; Revised 3/2013, 5/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

What Makes Writing Great? First Experiments on Article Quality Prediction in the Science Journalism Domain. Annie Louis, University of Pennsylvania, Philadelphia, PA 19104, lannie@seas.upenn.edu; Ani Nenkova, University of Pennsylvania, Philadelphia, PA 19104, nenkova@seas.upenn.edu.

Abstract: Great writing is rare and highly admired. Readers seek out articles that are beautifully written, informative and entertaining. Yet information-access technologies lack capabilities for predicting article quality at this level. In this paper we present first experiments on article quality prediction in the science journalism domain. We introduce a corpus of great pieces of science journalism, along with typical articles from the genre. We implement features to capture aspects of great writing, including surprising, visual and emotional content, as well as general features related to discourse organization and sentence structure. We show that the distinction between great and typical articles can be detected fairly accurately, and that the entire spectrum of our features contributes to the distinction.

1 Introduction. Measures of article quality would be hugely beneficial for information retrieval and recommendation systems. In this paper, we describe a dataset of New York Times science journalism articles which we have categorized for quality differences and present a system that can automatically make the distinction. Science journalism conveys complex scientific ideas, entertaining and educating at the same time. Consider the following opening of a 2005 article by David Quammen from Harper's magazine:

One morning early last winter a small item appeared in my local newspaper announcing the birth of an extraordinary animal. A team of researchers at Texas A&M University had succeeded in cloning a whitetail deer. Never done before. The fawn, known as Dewey, was developing normally and seemed to be healthy. He had no mother, just a surrogate who had carried his fetus to term. He had no father, just a "donor" of all his chromosomes. He was the genetic duplicate of a certain trophy buck out of south Texas whose skin cells had been cultured in a laboratory. One of those cells furnished a nucleus that, transplanted and rejiggered, became the DNA core of an egg cell, which became an embryo, which in time became Dewey. So he was wildlife, in a sense, and in another sense elaborately synthetic. This is the sort of news, quirky but epochal, that can cause a person with a mouthful of toast to pause and marvel. What a dumb idea, I marveled.

The writing is clear and well-organized but the text also contains creative use of language and a clever story-like explanation of the scientific contribution. Such properties make science journalism an attractive genre for studying writing quality. Science journalism is also a highly relevant domain for information retrieval in the context of educational as well as entertaining applications. Article quality measures can hugely benefit such systems. Prior work indicates that three aspects of article quality can be successfully predicted: a) whether a text meets the acceptable standards for spelling (Brill and Moore, 2000), grammar (Tetreault and Chodorow, 2008; Rozovskaya and Roth, 2010) and discourse organization (Barzilay et al., 2002; Lapata, 2003); b) has a topic that is interesting to a particular user. For example, content-based recommendation systems standardly represent user interest using frequent words from articles in a user's history and retrieve other articles on the same topics (Paz-
Transactions of the Association for Computational Linguistics, 1 (2013) 327–340. Action Editor: Philipp Koehn.
Submitted 1/2013; Revised 5/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation. Arianna Bisazza and Marcello Federico, Fondazione Bruno Kessler, Trento, Italy, {bisazza,federico}@fbk.eu.

Abstract: Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. In fact, the optimal trade-off between accuracy and complexity of decoding is nowadays reached by harshly limiting the input permutation space. We propose a method to dynamically shape such space and, thus, capture long-range word movements without hurting translation quality nor decoding time. The space defined by loose reordering constraints is dynamically pruned through a binary classifier that predicts whether a given input word should be translated right after another. The integration of this model into a phrase-based decoder improves a strong Arabic-English baseline already including state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.

1 Introduction. Word order differences are among the most important factors determining the performance of statistical machine translation (SMT) on a given language pair (Birch et al., 2009). This is particularly true in the framework of phrase-based SMT (PSMT) (Zens et al., 2002; Koehn et al., 2003; Och and Ney, 2002), an approach that remains highly competitive despite the recent advances of the tree-based approaches. During the PSMT decoding process, the output sentence is built from left to right, while the input sentence positions can be covered in different orders. Thus, reordering in PSMT can be viewed as the problem of choosing the input permutation that leads to the highest-scoring output sentence. Due to efficiency reasons, however, the input permutation space cannot be fully explored, and is therefore limited with hard reordering constraints. Although many solutions have been proposed to explicitly model word reordering during decoding, PSMT still largely fails to handle long-range word movements in language pairs with different syntactic structures.[1] We believe this is mostly not due to deficiencies of the existing reordering models, but rather to a very coarse definition of the reordering search space. Indeed, the existing reordering constraints are rather simple and typically based on word-to-word distances. Moreover, they are uniform throughout the input sentence and insensitive to the actual words being translated. Relaxing this kind of constraints means dramatically increasing the size of the search space and making the reordering model's task extremely complex. As a result, even in language pairs where long reordering is regularly observed, PSMT quality degrades when long word movements are allowed to the decoder. We address this problem by training a binary classifier to predict whether a given input position should be translated right after another, given the words at those positions and their contexts. When this model is integrated into the decoder, its predic-

[1] For empirical evidence, see for instance (Birch et al., 2009; Galley and Manning, 2008; Bisazza and Federico, 2012).
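The pruning idea above, keeping short local jumps while admitting only those long jumps that a word-level predictor accepts, can be sketched abstractly. The function, parameter names, and toy "classifier" below are invented for illustration; the paper's model is a trained binary classifier over lexical and contextual features, integrated into a real phrase-based decoder.

```python
# Sketch of dynamically pruning the reordering space: jump (i -> j)
# survives if it is local (within a small window) or if a stand-in
# "classifier" predicts that position j may be translated right after i.

def allowed_jumps(n, follows, max_local=2):
    """Return the set of surviving reordering jumps over n positions."""
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and (abs(j - i) <= max_local or follows(i, j))}

# toy predictor: admit only the long verb movement 0 -> 5
space = allowed_jumps(6, lambda i, j: (i, j) == (0, 5))

print((0, 5) in space, (5, 0) in space)  # True False
```

The point of the design is that the window stays tight (keeping decoding fast) while individual long-range movements are opened up only where the words themselves license them.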
Transactions of the Association for Computational Linguistics, 1 (2013) 315–326. Action Editor: Mark Steedman.
Submitted 2/2013; Revised 6/2013; Published 7/2013. © 2013 Association for Computational Linguistics.

Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning. Minh-Thang Luong, Department of Computer Science, Stanford University, Stanford, California, lmthang@stanford.edu; Michael C. Frank, Department of Psychology, Stanford University, Stanford, California, mcfrank@stanford.edu; Mark Johnson, Department of Computing, Macquarie University, Sydney, Australia, Mark.Johnson@MQ.edu.au.

Abstract: Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted more and more interest in recent years. In most work on this topic, however, utterances in a conversation are treated independently and discourse structure information is largely ignored. In the context of language acquisition, this independence assumption discards cues that are important to the learner, e.g., the fact that consecutive utterances are likely to share the same referent (Frank et al., 2013). The current paper describes an approach to the problem of simultaneously modeling grounded language at the sentence and discourse levels. We combine ideas from parsing and grammar induction to produce a parser that can handle long input strings with thousands of tokens, creating parse trees that represent full discourses. By casting grounded language learning as a grammatical inference task, we use our parser to extend the work of Johnson et al. (2012), investigating the importance of discourse continuity in children's language acquisition and its interaction with social cues. Our model boosts performance in a language acquisition task and yields good discourse segmentations compared with human annotators.

1 Introduction. Learning mappings between natural language (NL) and meaning representations (MR) is an important goal for both computational linguistics and cognitive science. Accurately learning novel mappings is crucial in grounded language understanding tasks and such systems can suggest insights into the nature of children's language learning. Two influential examples of grounded language learning tasks are the sportscasting task, RoboCup, where the NL is the set of running commentary and the MR is the set of logical forms representing actions like kicking or passing (Chen and Mooney, 2008), and the cross-situational word-learning task, where the NL is the caregiver's utterances and the MR is the set of objects present in the context (Siskind, 1996; Yu and Ballard, 2007). Work in these domains suggests that, based on the co-occurrence between words and their referents in context, it is possible to learn mappings between NL and MR even under substantial ambiguity. Nevertheless, contexts like RoboCup, where every single utterance is grounded, are extremely rare. Much more common are cases where a single topic is introduced and then discussed at length throughout a discourse. In a television news show, for example, a topic might be introduced by presenting a relevant picture or video clip. Once the topic is introduced, the anchors can discuss it by name or even using a pronoun without showing a picture. The discourse is grounded without having to ground every utterance. Moreover, although previous work has largely treated utterance order as independent, the order of utterances is critical in grounded discourse contexts: if the order is scrambled, it can become impossible to recover the topic. Supporting this idea, Frank et al. (2013) found that topic continuity, the tendency to talk about the same topic in multiple utterances that are contiguous in time, is both prevalent and informative for word learning. This paper examines the importance of topic continuity through a grammatical inference problem. We build on Johnson et al. (2012)'s work that used grammatical inference to