Transactions of the Association for Computational Linguistics, vol. 3, pp. 15–28, 2015. Action Editor: Hwee Tou Ng.
Submission batch: 9/2014; Revision batch: 11/2014; Published 1/2015. © 2015 Association for Computational Linguistics.

Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment

Sourav Dutta, Max Planck Institute for Informatics, Saarbrücken, Germany, sdutta@mpi-inf.mpg.de
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany, weikum@mpi-inf.mpg.de

Abstract

Identifying and linking named entities across information sources is the basis of knowledge acquisition and at the heart of Web search, recommendations, and analytics. An important problem in this context is cross-document co-reference resolution (CCR): computing equivalence classes of textual mentions denoting the same entity, within and across documents. Prior methods employ ranking, clustering, or probabilistic graphical models using syntactic features and distant features from knowledge bases. However, these methods exhibit limitations regarding run-time and robustness. This paper presents the CROCS framework for unsupervised CCR, improving the state of the art in two ways. First, we extend the way knowledge bases are harnessed, by constructing a notion of semantic summaries for intra-document co-reference chains using co-occurring entity mentions belonging to different chains. Second, we reduce the computational cost by a new algorithm that embeds sample-based bisection, using spectral clustering or graph partitioning, in a hierarchical clustering process. This allows scaling up CCR to large corpora. Experiments with three datasets show significant gains in output quality, compared to the best prior methods, and the run-time efficiency of CROCS.

1 Introduction

1.1 Motivation and Problem Statement

We are witnessing another revolution in Web search, user recommendations, and data analytics: transitioning from documents and keywords to data, knowledge, and entities. Examples of this megatrend are the Google Knowledge Graph and its applications, and the IBM Watson technology for deep question answering. To a large extent, these advances have been enabled by the construction of huge knowledge bases (KB’s) such as DBpedia, Yago, or Freebase; the latter forming the core of the Knowledge Graph. Such semantic resources provide huge collections of entities: people, places, companies, celebrities, movies, etc., along with rich knowledge about their properties and relationships.

Perhaps the most important value-adding component in this setting is the recognition and disambiguation of named entities in Web and user contents. Named Entity Disambiguation (NED) (see, e.g., (Cucerzan, 2007; Milne & Witten, 2008; Cornolti et al., 2013)) maps a mention string (e.g., a person name like “Bolt” or a noun phrase like “lightning bolt”) onto its proper entity if present in a KB (e.g., the sprinter Usain Bolt).

A related but different task of co-reference resolution (CR) (see, e.g., (Haghighi & Klein, 2009; Ng, 2010; Lee et al., 2013)) identifies all mentions in a given text that refer to the same entity, including anaphoras such as “the president’s wife”, “the first lady”, or “she”. This task when extended to process an entire corpus is then known as cross-document co-reference resolution (CCR) (Singh et al., 2011). It takes as input a set of documents with entity mentions, and computes as output a set of equivalence classes over the entity mentions. This does not involve mapping mentions to the entities of a KB. Unlike NED, CCR can deal with long-tail or emerging entities that are not captured in the KB or are merely in very sparse form.

State of the Art and its Limitations. CR methods, for co-references within a document, are generally based on rules or supervised learning using differ-
Transactions of the Association for Computational Linguistics, vol. 3, pp. 1–13, 2015. Action Editors: Johan Bos, Lillian Lee.
Submission batch: 6/2014; Revision batch: 9/2014; Published 1/2015. © 2015 Association for Computational Linguistics.

Reasoning about Quantities in Natural Language

Subhro Roy, University of Illinois, Urbana Champaign, sroy9@illinois.edu
Tim Vieira, Johns Hopkins University, tim.f.vieira@gmail.com
Dan Roth, University of Illinois, Urbana Champaign, danr@illinois.edu

Abstract

Little work from the Natural Language Processing community has targeted the role of quantities in Natural Language Understanding. This paper takes some key steps towards facilitating reasoning about quantities expressed in natural language. We investigate two different tasks of numerical reasoning. First, we consider Quantity Entailment, a new task formulated to understand the role of quantities in general textual inference tasks. Second, we consider the problem of automatically understanding and solving elementary school math word problems. In order to address these quantitative reasoning problems we first develop a computational approach which we show to successfully recognize and normalize textual expressions of quantities. We then use these capabilities to further develop algorithms to assist reasoning in the context of the aforementioned tasks.

1 Introduction

Every day, newspaper articles report statistics to present an objective assessment of the situations they describe. From election results, number of casualties in accidents, to changes in stock prices, textual representations of quantities are extremely important in communicating accurate information. However, relatively little work in Natural Language Processing has analyzed the use of quantities in text. Even in areas where we have relatively mature solutions, like search, we fail to deal with quantities; for example, one cannot search the financial media for “transactions in the 1-2 million pounds range.”

Language understanding often requires the ability to reason with respect to quantities. Consider, for example, the following textual inference, which we present as a Textual Entailment query. Recognizing Textual Entailment (RTE) (Dagan et al., 2013) has become a common way to formulate textual inference and we follow this trend. RTE is the task of determining whether the meaning of a given text passage T entails that of a hypothesis H.

Example 1
T: A bomb in a Hebrew University cafeteria killed five Americans and four Israelis.
H: A bombing at Hebrew University in Jerusalem killed nine people, including five Americans.

Here, we need to identify the quantities “five Americans” and “four Israelis”, as well as use the fact that “Americans” and “Israelis” are “people”. A different flavour of numeric reasoning is required in math word problems. For example, in

Example 2
Ryan has 72 marbles and 17 blocks. If he shares the marbles among 9 friends, how many marbles does each friend get?

one has to determine the relevant quantities in the question. Here, the number of blocks in Ryan’s possession has no bearing on the answer. The second challenge is to determine the relevant mathematical operation from the context.

In this paper, we describe some key steps necessary to facilitate reasoning about quantities in natural language text. We first describe a system developed to recognize quantities in free form text, infer units associated with them and convert them to
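The arithmetic behind Examples 1 and 2 above can be illustrated with a small sketch. This is not the system described in the paper: the `Quantity` class, the `IS_A` table, and the `total_in_unit` helper are hypothetical names introduced here, and the quantities are entered by hand rather than extracted from text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number paired with the unit it counts (hand-entered, not extracted)."""
    value: float
    unit: str


# Hypothetical background knowledge: "Americans" and "Israelis" are "people".
IS_A = {"Americans": "people", "Israelis": "people"}


def total_in_unit(quantities, unit):
    """Sum the quantities whose unit is `unit` or is known to be a kind of `unit`."""
    return sum(
        q.value
        for q in quantities
        if q.unit == unit or IS_A.get(q.unit) == unit
    )


# Example 1: do "five Americans and four Israelis" support "nine people"?
text_quantities = [Quantity(5, "Americans"), Quantity(4, "Israelis")]
killed_people = total_in_unit(text_quantities, "people")

# Example 2: only the marbles are relevant; the 17 blocks are ignored.
problem = [Quantity(72, "marbles"), Quantity(17, "blocks")]
marbles_per_friend = total_in_unit(problem, "marbles") / 9
```

The sketch makes the two difficulties named in the text concrete: selecting the relevant quantities (the unit filter) and choosing the operation (a sum in Example 1, a division in Example 2), both of which are hard-coded here.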
Transactions of the Association for Computational Linguistics, vol. 4, pp. 537–549, 2016. Action Editor: Timothy Baldwin.
Submission batch: 1/2016; Revision batch: 5/2016; Published 12/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Understanding Satirical Articles Using Common-Sense

Dan Goldwasser, Purdue University, Department of Computer Science, dgoldwas@purdue.edu
Xiao Zhang, Purdue University, Department of Computer Science, zhang923@purdue.edu

Abstract

Automatic satire detection is a subtle text classification task, for machines and at times, even for humans. In this paper we argue that satire detection should be approached using common-sense inferences, rather than traditional text classification methods. We present a highly structured latent variable model capturing the required inferences. The model abstracts over the specific entities appearing in the articles, grouping them into generalized categories, thus allowing the model to adapt to previously unseen situations.

1 Introduction

Satire is a writing technique for passing criticism using humor, irony or exaggeration. It is often used in contemporary politics to ridicule individual politicians, political parties or society as a whole. We restrict ourselves in this paper to such political satire articles, broadly defined as articles whose purpose is not to report real events, but rather to mock their subject matter. Satirical writing often builds on real facts and expectations, pushed to absurdity to express humorous insights about the situation. As a result, the difference between real and satirical articles can be subtle and often confusing to readers.

With the recent rise of social media outlets, satirical articles have become increasingly popular and have famously fooled several leading news agencies.¹ These misinterpretations can often be attributed to careless reading, as there is a clear line between unusual events finding their way to the news and satire, which intentionally places key political figures in unlikely humorous scenarios. The two can be separated by carefully reading the articles, exposing the satirical nature of the events described in such articles.

¹ https://newrepublic.com/article/118013/satire-news-websites-are-cashing-gullible-outraged-readers

Vice President Joe Biden suddenly barged in, asking if anyone could “hook [him] up with a Dixie cup” of their urine. “C’mon, you gotta help me get some clean whiz. Shinseki, Donovan, I’m looking in your direction” said Biden.

“Do you want to hit this?” a man asked President Barack Obama in a bar in Denver Tuesday night. The president laughed but didn’t indulge. It wasn’t the only time Obama was offered weed on his night out.

Figure 1: Examples of real and satirical articles. Top: satirical news excerpt. Bottom: real news excerpt.

In this paper we follow this intuition. We look into the satire detection task (Burfoot and Baldwin, 2009), predicting if a given news article is real or satirical, and suggest that this prediction task should be defined over common-sense inferences, rather than looking at it as a lexical text classification task (Pang and Lee, 2008; Burfoot and Baldwin, 2009), which bases the decision on word-level features.

To further motivate this observation, consider the two excerpts in Figure 1. Both excerpts mention top-ranking politicians (the President and Vice President) in a drug-related context, and contain informal slang utterances, inappropriate for the subjects’
Transactions of the Association for Computational Linguistics, vol. 4, pp. 507–519, 2016. Action Editor: Jason Eisner.
Submission batch: 3/2016; Revision batch: 5/2016; Published 11/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Minimally Supervised Number Normalization

Kyle Gorman and Richard Sproat, Google, Inc., 111 8th Ave., New York, NY, USA

Abstract

We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.

1 Introduction

Many speech and language applications require text tokens to be converted from one form to another. For example, in text-to-speech synthesis, one must convert digit sequences (32) into number names (thirty-two), and appropriately verbalize date and time expressions (12:47 → twelve forty-seven) and abbreviations (kg → kilograms) while handling allomorphy and morphological concord (e.g., Sproat, 1996). Quite a bit of recent work on SMS (e.g., Beaufort et al., 2010) and text from social media sites (e.g., Yang and Eisenstein, 2013) has focused on detecting and expanding novel abbreviations (e.g., cn u plz hlp). Collectively, such conversions all fall under the rubric of text normalization (Sproat et al., 2001), but this term means radically different things in different applications. For instance, it is not necessary to detect and verbalize dates and times when preparing social media text for downstream information extraction, but this is essential for speech applications.

While expanding novel abbreviations is also important for speech (Roark and Sproat, 2014), numbers, times, dates, measure phrases and the like are far more common in a wide variety of text genres. Following Taylor (2009), we refer to categories such as cardinal numbers, times, and dates (each of which is semantically well-circumscribed) as semiotic classes. Some previous work on text normalization proposes minimally-supervised machine learning techniques for normalizing specific semiotic classes, such as abbreviations (e.g., Chang et al., 2002; Pennell and Liu, 2011; Roark and Sproat, 2014). This paper continues this tradition by contributing minimally-supervised models for normalization of cardinal number expressions (e.g., ninety-seven). Previous work on this semiotic class includes formal linguistic studies by Corstius (1968) and Hurford (1975) and computational models proposed by Sproat (1996; 2010) and Kanis et al. (2005). Of all semiotic classes, numbers are by far the most important for speech, as cardinal (and ordinal) numbers are not only semiotic classes in their own right, but knowing how to verbalize numbers is important for most of the other classes: one cannot verbalize times, dates, measures, or currency expressions without knowing how to verbalize that language’s numbers as well.

One computational approach to number name verbalization (Sproat, 1996; Kanis et al., 2005) employs a cascade of two finite-state transducers (FSTs). The first FST factors the integer, expressed as a digit sequence, into sums of products of powers of ten (i.e., in the case of a base-ten number system). This is composed with a second FST that defines how the
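The factorization step described above can be illustrated in plain Python. This is only a sketch of the arithmetic, not a finite-state transducer and not the authors' implementation; the function names `factor_powers_of_ten` and `render` are hypothetical, and dropping zero digits is an assumption made here for readability.

```python
def factor_powers_of_ten(digits: str) -> list[tuple[int, int]]:
    """Factor a base-ten digit string into (coefficient, power_of_ten) terms.

    Zero digits contribute nothing to the sum and are skipped.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected a string of ASCII digits, got {digits!r}")
    n = len(digits)
    return [
        (int(d), 10 ** (n - 1 - i))
        for i, d in enumerate(digits)
        if d != "0"
    ]


def render(terms: list[tuple[int, int]]) -> str:
    """Write the factorization as a sum of products, e.g. '3 * 10 + 2 * 1'."""
    return " + ".join(f"{c} * {p}" for c, p in terms) or "0"


terms_32 = factor_powers_of_ten("32")
terms_2016 = factor_powers_of_ten("2016")
```

Summing `c * p` over the returned terms recovers the original integer, which is the property a second verbalization stage would rely on when mapping each term to a number name.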
Transactions of the Association for Computational Linguistics, vol. 4, pp. 477–490, 2016. Action Editor: Brian Roark.
Submission batch: 1/2016; Revision batch: 6/2016; Published 9/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

Ehsan Shareghi (1), Matthias Petri (2), Gholamreza Haffari (1) and Trevor Cohn (2)
(1) Faculty of Information Technology, Monash University
(2) Computing and Information Systems, The University of Melbourne
first.last@{monash.edu,unimelb.edu.au}

Abstract

Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500×, despite only incurring a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package. It imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).

1 Introduction

Language models (LMs) are fundamental to many NLP tasks, including machine translation and speech recognition. Statistical LMs are probabilistic models that assign a probability to a sequence of words $w_1^N$, indicating how likely the sequence is in the language. m-gram LMs are popular, and prove to be accurate when estimated using large corpora. In these LMs, the probabilities of m-grams are often precomputed and stored explicitly.

Although widely successful, current m-gram LM approaches are impractical for learning high-order LMs on large corpora, due to their poor scaling properties in both training and query phases. Prevailing methods (Heafield, 2011; Stolcke et al., 2011) precompute all m-gram probabilities, and consequently need to store and access as many as hundreds of billions of m-grams for a typical moderate-order LM.

Recent research has attempted to tackle scalability issues through the use of efficient data structures such as tries and hash-tables (Heafield, 2011; Stolcke et al., 2011), lossy compression (Talbot and Osborne, 2007; Levenberg and Osborne, 2009; Guthrie and Hepple, 2010; Pauls and Klein, 2011; Church et al., 2007), compact data structures (Germann et al., 2009; Watanabe et al., 2009; Sorensen and Allauzen, 2011), and distributed computation (Heafield et al., 2013; Brants et al., 2007). Fundamental to all the widely used methods is the precomputation of all probabilities, hence they do not provide an adequate trade-off between space and time for high m, both during training and querying. Exceptions are Kennington et al. (2012) and Zhang and Vogel (2006), who use a suffix-tree or suffix-array over the text for computing the sufficient statistics on-the-fly.

In our previous work (Shareghi et al., 2015), we extended this line of research using a Compressed Suffix Tree (CST) (Ohlebusch et al., 2010), which provides a considerably more compact searchable means of storing the corpus than an uncompressed suffix array or suffix tree. This approach showed favourable scaling properties with m and had only a modest memory requirement. However, the method only supported Kneser-Ney smoothing, not its modified variant (Chen and Goodman, 1999) which overall performs better and has become the de-facto standard. Additionally, querying was significantly slower than for leading LM toolkits, making the method impractical for widespread use.

In this paper we extend Shareghi et al. (2015) to support modified Kneser-Ney smoothing, and
Transactions of the Association for Computational Linguistics, vol. 4, pp. 445–461, 2016. Action Editor: Noah Smith.
Submission batch: 11/2015; Revision batch: 2/2016; Published 8/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Easy-First Dependency Parsing with Hierarchical Tree LSTMs

Eliyahu Kiperwasser, Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel, elikip@gmail.com
Yoav Goldberg, Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel, yoav.goldberg@gmail.com

Abstract

We suggest a compositional vector representation of parse trees that relies on a recursive combination of recurrent-neural network encoders. To demonstrate its effectiveness, we use the representation as the backbone of a greedy, bottom-up dependency parser, achieving very strong accuracies for English and Chinese, without relying on external word embeddings. The parser’s implementation is available for download at the first author’s webpage.

1 Introduction

Dependency-based syntactic representations of sentences are central to many language processing tasks (Kübler et al., 2009). Dependency parse-trees encode not only the syntactic structure of a sentence but also many aspects of its semantics.

A recent trend in NLP is concerned with encoding sentences as vectors (“sentence embeddings”), which can then be used for further prediction tasks. Recurrent neural networks (RNNs) (Elman, 1990), and in particular methods based on the LSTM architecture (Hochreiter and Schmidhuber, 1997), work very well for modeling sequences, and constantly obtain state-of-the-art results on both language-modeling and prediction tasks (see, e.g. (Mikolov et al., 2010)).

Several works attempt to extend recurrent neural networks to work on trees (see Section 8 for a brief overview), giving rise to the so-called recursive neural networks (Goller and Kuchler, 1996; Socher et al., 2010). However, recursive neural networks do not cope well with trees with arbitrary branching factors: most work requires the encoded trees to be binary-branching, or have a fixed maximum arity. Other attempts allow arbitrary branching factors, at the expense of ignoring the order of the modifiers.

In contrast, we propose a tree-encoding that naturally supports trees with arbitrary branching factors, making it particularly appealing for dependency trees. Our tree encoder uses recurrent neural networks as a building block: we model the left and right sequences of modifiers using RNNs, which are composed in a recursive manner to form a tree (Section 3). We use our tree representation for encoding the partially-built parse trees in a greedy, bottom-up dependency parser which is based on the easy-first transition-system of Goldberg and Elhadad (2010).

Using the Hierarchical Tree LSTM representation, and without using any external embeddings, our parser achieves parsing accuracies of 92.6 UAS and 90.2 LAS on the PTB (Stanford dependencies) and 86.1 UAS and 84.4 LAS on the Chinese treebank, while relying on greedy decoding.

To the best of our knowledge, this is the first work to demonstrate competitive parsing accuracies for full-scale parsing while relying solely on recursive, compositional tree representations, and without using a reranking framework. We discuss related work in Section 8.

While the parsing experiments demonstrate the suitability of our representation for capturing the structural elements in the parse tree that are useful for predicting parsing decisions, we are interested in exploring the use of the RNN-based compositional vector representation of parse trees also for seman-
Transactions of the Association for Computational Linguistics, vol. 4, pp. 431–444, 2016. Action Editor: David Chiang.
Submission batch: 3/2016; Revision batch: 5/2016; Published 7/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Many Languages, One Parser

Waleed Ammar♦, George Mulcaire♥, Miguel Ballesteros♠♦, Chris Dyer♦, Noah A. Smith♥
♦School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
♥Computer Science & Engineering, University of Washington, Seattle, WA, USA
♠NLP Group, Pompeu Fabra University, Barcelona, Spain
wammar@cs.cmu.edu, gmulc@uw.edu, miguel.ballesteros@upf.edu, cdyer@cs.cmu.edu, nasmith@cs.washington.edu

Abstract

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.

1 Introduction

Developing tools for processing many languages has long been an important goal in NLP (Rösner, 1988; Heid and Raab, 1989),¹ but it was only when statistical methods became standard that massively multilingual NLP became economical. The mainstream approach for multilingual NLP is to design language-specific models. For each language of interest, the resources necessary for training the model are obtained (or created), and separate parameters are fit for each language. This approach is simple and grants the flexibility of customizing the model and features to the needs of each language, but it is suboptimal for theoretical and practical reasons. Theoretically, the study of linguistic typology tells us that many languages share morphological, phonological, and syntactic phenomena (Bender, 2011); therefore, the mainstream approach misses an opportunity to exploit relevant supervision from typologically related languages. Practically, it is inconvenient to deploy or distribute NLP tools that are customized for many different languages because, for each language of interest, we need to configure, train, tune, monitor, and occasionally update the model. Furthermore, code-switching or code-mixing (mixing more than one language in the same discourse), which is pervasive in some genres, in particular social media, presents a challenge for monolingually-trained NLP models (Barman et al., 2014).²

In parsing, the availability of homogeneous syntactic dependency annotations in many languages (McDonald et al., 2013; Nivre et al., 2015b; Agić et al., 2015; Nivre et al., 2015a) has created an opportunity to develop a parser that is capable of parsing sentences in multiple languages, addressing these theoretical and practical concerns.³ A multilingual parser can potentially replace an array of language-specific monolingually-trained parsers

¹ As of 2007, the total number of native speakers of the hundred most popular languages only accounts for 85% of the world’s population (Wikipedia, 2016).
² While our parser can be used to parse input with code-switching, we have not evaluated this capability due to the lack of appropriate data.
³ Although multilingual dependency treebanks have been available for a decade via the 2006 and 2007 CoNLL shared tasks (Buchholz and Marsi, 2006; Nivre et al., 2007), the treebank of each language was annotated independently and with its own annotation conventions.
Transactions of the Association for Computational Linguistics, vol. 4, pp. 417–430, 2016. Action Editor: Hal Daume III.
Submission batch: 3/2016; Published 7/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Encoding Prior Knowledge with Eigenword Embeddings

Dominique Osborne, Department of Mathematics and Statistics, University of Strathclyde, Glasgow, G1 1XH, UK, dominique.osborne.13@uni.strath.ac.uk
Shashi Narayan and Shay B. Cohen, School of Informatics, University of Edinburgh, Edinburgh, EH8 9LE, UK, {snaraya2,scohen}@inf.ed.ac.uk

Abstract

Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.

1 Introduction

In recent years there has been an immense interest in representing words as low-dimensional continuous real-vectors, namely word embeddings. Word embeddings aim to capture lexico-semantic information such that regularities in the vocabulary are topologically represented in a Euclidean space. Such word embeddings have achieved state-of-the-art performance on many natural language processing (NLP) tasks, e.g., syntactic parsing (Socher et al., 2013), word or phrase similarity (Mikolov et al., 2013b), dependency parsing (Bansal et al., 2014), unsupervised learning (Parikh et al., 2014) and others. Since the discovery that word embeddings are useful as features for various NLP tasks, research on word embeddings has taken on a life of its own, with a vibrant community searching for better word representations in a variety of problems and datasets.

These word embeddings are often induced from large raw text capturing distributional co-occurrence information via neural networks (Bengio et al., 2003; Mikolov et al., 2013b; Mikolov et al., 2013c) or spectral methods (Deerwester et al., 1990; Dhillon et al., 2015). While these general purpose word embeddings have achieved significant improvement in various tasks in NLP, it has been discovered that further tuning of these continuous word representations for specific tasks improves their performance by a larger margin. For example, in dependency parsing, word embeddings could be tailored to capture similarity in terms of context within syntactic parses (Bansal et al., 2014) or they could be refined using semantic lexicons such as WordNet (Miller, 1995), FrameNet (Baker et al., 1998) and the Paraphrase Database (Ganitkevitch et al., 2013) to improve various similarity tasks (Yu and Dredze, 2014; Faruqui et al., 2015; Rothe and Schütze, 2015). This paper proposes a method to encode prior semantic knowledge in spectral word embeddings (Dhillon et al., 2015).

Spectral learning algorithms are of great interest for their speed, scalability, theoretical guarantees and performance in various NLP applications. These algorithms are no strangers to word embeddings either. In latent semantic analysis (LSA, (Deerwester et al., 1990; Landauer et al., 1998)), word embeddings are learned by performing SVD on the word by document matrix. Recently, Dhillon et al. (2015) have proposed to use canonical correlation analysis (CCA) as a method to learn low-dimensional real vectors, called Eigenwords. Unlike LSA based methods, CCA based methods are scale invariant and can capture multiview information such as the left and right contexts of the words. As a result, the eigenword embeddings of Dhillon et al. (2015) that were learned using the simple linear methods give accuracies comparable to or better than state of the art when compared with highly non-linear deep learning based approaches (Collobert and Weston, 2008; Mnih and Hinton, 2007; Mikolov et al., 2013b; Mikolov et al., 2013c).

The main contribution of this paper is a technique
Transactions of the Association for Computational Linguistics, vol. 4, pp. 357–370, 2016. Action Editor: Masaaki Nagata.
Submission batch: 11/2015; Revision batch: 3/2016; Published 7/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Named Entity Recognition with Bidirectional LSTM-CNNs

Jason P.C. Chiu, University of British Columbia, jsonchiu@gmail.com
Eric Nichols, Honda Research Institute Japan Co., Ltd., e.nichols@jp.honda-ri.com

Abstract

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering. We also propose a novel method of encoding partial lexicon matches in neural networks and compare it to existing approaches. Extensive evaluation shows that, given only tokenized text and publicly available word embeddings, our system is competitive on the CoNLL-2003 dataset and surpasses the previously reported state of the art performance on the OntoNotes 5.0 dataset by 2.13 F1 points. By using two lexicons constructed from publicly-available sources, we establish new state of the art performance with an F1 score of 91.62 on CoNLL-2003 and 86.28 on OntoNotes, surpassing systems that employ heavy feature engineering, proprietary lexicons, and rich entity linking information.

1 Introduction

Named entity recognition is an important task in NLP. High performance approaches have been dominated by applying CRF, SVM, or perceptron models to hand-crafted features (Ratinov and Roth, 2009; Passos et al., 2014; Luo et al., 2015). However, Collobert et al. (2011b) proposed an effective neural network model that requires little feature engineering and instead learns important features from word embeddings trained on large quantities of unlabelled text, an approach made possible by recent advancements in unsupervised learning of word embeddings on massive amounts of data (Collobert and Weston, 2008; Mikolov et al., 2013) and neural network training algorithms permitting deep architectures (Rumelhart et al., 1986).

Unfortunately there are many limitations to the model proposed by Collobert et al. (2011b). First, it uses a simple feed-forward neural network, which restricts the use of context to a fixed sized window around each word, an approach that discards useful long-distance relations between words. Second, by depending solely on word embeddings, it is unable to exploit explicit character level features such as prefix and suffix, which could be useful especially with rare words where word embeddings are poorly trained. We seek to address these issues by proposing a more powerful neural network model.

A well-studied solution for a neural network to process variable length input and have long term memory is the recurrent neural network (RNN) (Goller and Kuchler, 1996). Recently, RNNs have shown great success in diverse NLP tasks such as speech recognition (Graves et al., 2013), machine translation (Cho et al., 2014), and language modeling (Mikolov et al., 2011). The long-short term memory (LSTM) unit with the forget gate allows highly non-trivial long-distance dependencies to be easily learned (Gers et al., 2000). For sequential labelling tasks such as NER and speech recognition, a bi-directional LSTM model can take into account an effectively infinite amount of context on both sides of a word and eliminates the problem of limited context that applies to any feed-forward model (Graves et al., 2013). While LSTMs have been studied in the past for the NER task by Hammerton (2003), the lack of computational power (which led to the use
Transactions of the Association for Computational Linguistics, vol. 4, pp. 343–356, 2016. Action Editor: Joakim Nivre.
Transactions of the Association for Computational Linguistics, vol. 4, pp. 343–356, 2016. Action Editor: Joakim Nivre. Submission batch: 1/2016; Revision batch: 4/2016; Published 7/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data

Kristina Gulordava and Paola Merlo
Department of Linguistics, University of Geneva, 5 Rue de Candolle, CH-1211 Genève 4
kristina.gulordava@unige.ch, paola.merlo@unige.ch

Abstract

The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effects of word order of a language on dependency parsing performance, while controlling for confounding treebank properties. The method uses artificially-generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data on twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.

1 Introduction

Fair comparative performance evaluation across languages and their treebanks is one of the difficulties for work on multi-lingual parsing (Buchholz and Marsi, 2006; Nivre et al., 2007; Seddah et al., 2011). The differences in parsing performance can be the result of disparate properties of treebanks (such as their size or average sentence length), choices in annotation schemes, and the linguistic properties of languages. Despite recent attempts to create and apply cross-linguistic and cross-framework evaluation procedures (Tsarfaty et al., 2011; Seddah et al., 2013), there is no commonly used method of analysis of parsing performance which accounts for different linguistic and extra-linguistic factors of treebanks and teases them apart.

When investigating possible causal factors for observed phenomena, one powerful method, if available, consists in intervening on the postulated causes to observe possible changes in the observed effects. In other words, if A causes B, then changing A or properties of A should result in an observable change in B. This interventionist approach to the study of causality creates counterfactual data and a type of controlled modification that is widespread in experimental methodology, but that is not widely used in fields that rely on observational data, such as corpus-driven natural language processing.

In analyses of parsing performance, it is customary to manipulate and control word-level features, such as part-of-speech tags or morphological features. These types of features can be easily omitted or modified to assess their contribution to parsing performance. However, higher-order features, such as linear word order precedence properties, are much harder to define and to manipulate. A parsing performance analysis based on controlled modification of word order, in fact, has not been reported previously. We propose such a method based on word order permutations which allows us to manipulate word order properties analogously to familiar word-level properties and study their effect on parsing performance.

Specifically, given a dependency treebank, we obtain new synthetic data by permuting the original order of words in the sentences, keeping the unordered
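
As a rough illustration of one quantity that such permutations affect, the sketch below computes dependency lengths (the linear distance between a word and its head) for a toy sentence under two word orders while the head-dependent relations stay fixed. The function name, the toy sentence and the chosen permutation are assumptions made for illustration; the paper's own permutation procedure is not described in this excerpt.

```python
def dependency_lengths(heads, order):
    """Return the linear distance between each non-root word and its head.

    heads[i] is the index of the head of word i (None for the root);
    order lists the word indices in their left-to-right surface order.
    """
    position = {word: pos for pos, word in enumerate(order)}
    return [abs(position[w] - position[h])
            for w, h in enumerate(heads) if h is not None]


# Toy tree (assumed for illustration): "signed" is the root,
# "she" and "contract" depend on "signed", "a" depends on "contract".
words = ["she", "signed", "a", "contract"]
heads = [1, None, 3, 1]

original = [0, 1, 2, 3]   # she signed a contract
permuted = [0, 2, 3, 1]   # she a contract signed (same tree, verb-final)

total_original = sum(dependency_lengths(heads, original))
total_permuted = sum(dependency_lengths(heads, permuted))
```

In this toy case the verb-final order stretches the subject dependency, so the total dependency length rises from 4 to 5 although the tree itself is unchanged.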
Transactions of the Association for Computational Linguistics, vol. 4, pp. 313–327, 2016. Action Editor: Marco Kuhlmann. Submission batch: 2/2016; Published 7/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations

Eliyahu Kiperwasser, Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel, elikip@gmail.com
Yoav Goldberg, Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel, yoav.goldberg@gmail.com

Abstract

We present a simple and effective scheme for dependency parsing which is based on bidirectional-LSTMs (BiLSTMs). Each sentence token is associated with a BiLSTM vector representing the token in its sentential context, and feature vectors are constructed by concatenating a few BiLSTM vectors. The BiLSTM is trained jointly with the parser objective, resulting in very effective feature extractors for parsing. We demonstrate the effectiveness of the approach by applying it to a greedy transition-based parser as well as to a globally optimized graph-based parser. The resulting parsers have very simple architectures, and match or surpass the state-of-the-art accuracies on English and Chinese.

1 Introduction

The focus of this paper is on feature representation for dependency parsing, using recent techniques from the neural-networks (“deep learning”) literature. Modern approaches to dependency parsing can be broadly categorized into graph-based and transition-based parsers (Kübler et al., 2009). Graph-based parsers (McDonald, 2006) treat parsing as a search-based structured prediction problem in which the goal is learning a scoring function over dependency trees such that the correct tree is scored above all other trees. Transition-based parsers (Nivre, 2004; Nivre, 2008) treat parsing as a sequence of actions that produce a parse tree, and a classifier is trained to score the possible actions at each stage of the process and guide the parsing process. Perhaps the simplest graph-based parsers are arc-factored (first order) models (McDonald, 2006), in which the scoring function for a tree decomposes over the individual arcs of the tree. More elaborate models look at larger (overlapping) parts, requiring more sophisticated inference and training algorithms (Martins et al., 2009; Koo and Collins, 2010). The basic transition-based parsers work in a greedy manner, performing a series of locally-optimal decisions, and boast very fast parsing speeds. More advanced transition-based parsers introduce some search into the process using a beam (Zhang and Clark, 2008) or dynamic programming (Huang and Sagae, 2010).

Regardless of the details of the parsing framework being used, a crucial step in parser design is choosing the right feature function for the underlying statistical model. Recent work (see Section 2.2 for an overview) attempts to alleviate parts of the feature function design problem by moving from linear to non-linear models, enabling the modeler to focus on a small set of “core” features and leaving it up to the machine-learning machinery to come up with good feature combinations (Chen and Manning, 2014; Pei et al., 2015; Lei et al., 2014; Taub-Tabib et al., 2015). However, the need to carefully define a set of core features remains. For example, the work of Chen and Manning (2014) uses 18 different elements in its feature function, while the work of Pei et al. (2015) uses 21 different elements. Other works, notably Dyer et al. (2015) and Le and Zuidema (2014), propose more sophisticated feature representations, in which the feature engineering is replaced with architecture engineering.

In this work, we suggest an approach which is much simpler in terms of both feature engineering
Transactions of the Association for Computational Linguistics, vol. 4, pp. 259–272, 2016. Action Editor: Brian Roark. Submission batch: 12/2015; Revision batch: 3/2016; Published 6/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs

Wenpeng Yin, Hinrich Schütze, Center for Information and Language Processing, LMU Munich, Germany, wenpeng@cis.lmu.de
Bing Xiang, Bowen Zhou, IBM Watson, Yorktown Heights, NY, USA, bingxia,zhou@us.ibm.com

Abstract

How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence’s representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, task-specific linguistic features. This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences. We make three contributions. (i) The ABCNN can be applied to a wide variety of tasks that require modeling of sentence pairs. (ii) We propose three attention schemes that integrate mutual influence between sentences into CNNs; thus, the representation of each sentence takes into consideration its counterpart. These interdependent sentence pair representations are more powerful than isolated sentence representations. (iii) ABCNNs achieve state-of-the-art performance on AS, PI and TE tasks. We release code at: https://github.com/yinwenpeng/Answer_Selection.

1 Introduction

How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS) (Yu et al., 2014; Feng et al., 2015), paraphrase identification (PI) (Madnani et al., 2012; Yin and Schütze, 2015a), textual entailment (TE) (Marelli et al., 2014a; Bowman et al., 2015a) etc.

AS  s0:  how much did Waterboy gross?
    s+1: the movie earned $161.5 million
    s−1: this was Jerry Reed’s final film appearance
PI  s0:  she struck a deal with RH to pen a book today
    s+1: she signed a contract with RH to write a book
    s−1: she denied today that she struck a deal with RH
TE  s0:  an ice skating rink placed outdoors is full of people
    s+1: a lot of people are in an ice skating park
    s−1: an ice skating rink placed indoors is full of people

Figure 1: Positive (<s0, s+1>) and negative (<s0, s−1>) examples for AS, PI and TE tasks. RH = Random House

Most prior work derives each sentence’s representation separately, rarely considering the impact of the other sentence. This neglects the mutual influence of the two sentences in the context of the task. It also contradicts what humans do when comparing two sentences. We usually focus on key parts of one sentence by extracting parts from the other sentence that are related by identity, synonymy, antonymy and other relations. Thus, human beings model the two sentences together, using the content of one sentence to guide the representation of the other.

Figure 1 demonstrates that each sentence of a pair partially determines which parts of the other sentence we must focus on. For AS, correctly answering s0 requires attention on “gross”: s+1 contains a corresponding unit (“earned”) while s−1 does not. For PI, focus should be removed from “today” to correctly recognize <s0, s+1> as paraphrases and <s0, s−1> as non-paraphrases. For TE, we need to focus on “full of people” (to recognize TE for <s0, s+1>) and on “outdoors” / “indoors” (to recognize non-TE for <s0, s−1>). These examples show the need for an architecture that computes different representations of s_i for different s_{1−i} (i ∈ {0, 1}).
Transactions of the Association for Computational Linguistics, vol. 4, pp. 245–257, 2016. Action Editor: Hinrich Schütze. Submission batch: 1/2016; Revision batch: 3/2016; Published 6/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models

Karl Stratos, Michael Collins∗ and Daniel Hsu
Department of Computer Science, Columbia University
{stratos,mcollins,djhsu}@cs.columbia.edu
∗Currently on leave at Google Inc. New York.

Abstract

We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMM) that are particularly well-suited for the problem. These HMMs, which we call anchor HMMs, assume that each tag is associated with at least one word that can have no other tag, which is a relatively benign condition for POS tagging (e.g., “the” is a word that appears only under the determiner tag). We exploit this assumption and extend the non-negative matrix factorization framework of Arora et al. (2013) to design a consistent estimator for anchor HMMs. In experiments, our algorithm is competitive with strong baselines such as the clustering method of Brown et al. (1992) and the log-linear model of Berg-Kirkpatrick et al. (2010). Furthermore, it produces an interpretable model in which hidden states are automatically lexicalized by words.

1 Introduction

Part-of-speech (POS) tagging without supervision is a quintessential problem in unsupervised learning for natural language processing (NLP). A major application of this task is reducing annotation cost: for example, it can be used to produce rough syntactic annotations for a new language that has no labeled data, which can be subsequently refined by human annotators.

Hidden Markov models (HMM) are a natural choice of model and have been a workhorse for this problem. Early works estimated vanilla HMMs with standard unsupervised learning methods such as the expectation-maximization (EM) algorithm, but it quickly became clear that they performed very poorly in inducing POS tags (Merialdo, 1994). Later works improved upon vanilla HMMs by incorporating specific structures that are well-suited for the task, such as a sparse prior (Johnson, 2007) or a hard-clustering assumption (Brown et al., 1992).

In this work, we tackle unsupervised POS tagging with HMMs whose structure is deliberately suitable for POS tagging. These HMMs impose an assumption that each hidden state is associated with an observation state (“anchor word”) that can appear under no other state. For this reason, we denote this class of restricted HMMs by anchor HMMs. Such an assumption is relatively benign for POS tagging; it is reasonable to assume that each POS tag has at least one word that occurs only under that tag. For example, in English, “the” is an anchor word for the determiner tag; “laughed” is an anchor word for the verb tag.

We build on the non-negative matrix factorization (NMF) framework of Arora et al. (2013) to derive a consistent estimator for anchor HMMs. We make several new contributions in the process. First, to our knowledge, there is no previous work directly building on this framework to address unsupervised sequence labeling. Second, we generalize the NMF-based learning algorithm to obtain extensions that are important for empirical performance (Table 1). Third, we perform extensive experiments on unsupervised POS tagging and report competitive results against strong baselines such as the clustering method of Brown et al. (1992) and the log-linear
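
The anchor condition itself is easy to state in code. The following minimal sketch checks, for a given table of emission probabilities, whether every tag has at least one word that is emitted by no other tag. It only illustrates the assumption described above and is not the NMF-based estimator of the paper; the function names, the tag inventory and all probabilities are invented for the example.

```python
def find_anchor_words(emission):
    """Map each tag to the words that only this tag can emit.

    emission maps tag -> {word: probability}; a word counts as an anchor
    for a tag if its probability is positive there and zero elsewhere.
    """
    anchors = {}
    for tag, dist in emission.items():
        anchors[tag] = sorted(
            word for word, prob in dist.items()
            if prob > 0 and all(
                emission[other].get(word, 0.0) == 0
                for other in emission if other != tag
            )
        )
    return anchors


def satisfies_anchor_condition(emission):
    """True if every tag has at least one anchor word."""
    return all(find_anchor_words(emission).values())


# Toy emission table (all numbers are made up for illustration).
toy_emission = {
    "DET": {"the": 0.7, "that": 0.3},
    "VERB": {"laughed": 0.5, "run": 0.5},
    "NOUN": {"run": 0.4, "dog": 0.6},
}
toy_anchors = find_anchor_words(toy_emission)
```

In the toy table “run” is shared by the verb and noun tags and so is not an anchor, while “laughed” anchors the verb tag and “dog” anchors the noun tag.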
Transactions of the Association for Computational Linguistics, vol. 4, pp. 215–229, 2016. Action Editor: Hwee Tou Ng. Submission batch: 7/2015; Revision batch: 1/2016; 3/2016; Published 5/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features

Dat Ba Nguyen1, Martin Theobald2, Gerhard Weikum1
1 Max Planck Institute for Informatics, 2 University of Ulm
{datnb,weikum}@mpi-inf.mpg.de, martin.theobald@uni-ulm.de

Abstract

Methods for Named Entity Recognition and Disambiguation (NERD) perform NER and NED in two separate stages. Therefore, NED may be penalized with respect to precision by NER false positives, and suffers in recall from NER false negatives. Conversely, NED does not fully exploit information computed by NER such as types of mentions. This paper presents J-NERD, a new approach to perform NER and NED jointly, by means of a probabilistic graphical model that captures mention spans, mention types, and the mapping of mentions to entities in a knowledge base. We present experiments with different kinds of texts from the CoNLL’03, ACE’05, and ClueWeb’09-FACC1 corpora. J-NERD consistently outperforms state-of-the-art competitors in end-to-end NERD precision, recall, and F1.

1 Introduction

Motivation: Methods for Named Entity Recognition and Disambiguation, NERD for short, typically proceed in two stages:
• At the NER stage, text spans of entity mentions are detected and tagged with coarse-grained types like Person, Organization, Location, etc. This is typically performed by a trained Conditional Random Field (CRF) over word sequences (e.g., Finkel et al. (2005)).
• At the NED stage, mentions are mapped to entities in a knowledge base (KB) based on contextual similarity measures and the semantic coherence of the selected entities (e.g., Cucerzan (2014); Hoffart et al. (2011); Ratinov et al. (2011)).

This two-stage approach has limitations. First, NER may produce false positives that can misguide NED. Second, NER may miss out on some entity mentions, and NED has no chance to compensate for these false negatives. Third, NED is not able to help NER, for example, by disambiguating “easy” mentions (e.g., of prominent entities with more or less unique names), and then using the entities and knowledge about them as enriched features for NER.

Example: Consider the following sentences:

David played for manu, real, and la galaxy. His wife posh performed with the spice girls.

This is difficult for NER because of the absence of upper-case spelling, which is not untypical in social media, for example. Most NER methods will miss out on multi-word mentions or words that are also common nouns (“spice”) or adjectives (“posh”, “real”). Typically, NER would pass only the mentions “David”, “manu”, and “la” to the NED stage, which then is prone to many errors like mapping the first two mentions to any prominent people with first names David and Manu, and mapping the third one to the city of Los Angeles. With NER and NED performed jointly, the possible disambiguation of “la galaxy” to the soccer club can guide NER to tag the right mentions with the right types (e.g., recognizing that “manu” could be a short name for a soccer team), which in turn helps NED to map “David” to the right entity David Beckham.

Contribution: This paper presents a novel kind of probabilistic graphical model for the joint recognition and disambiguation of named-entity mentions in natural-language texts. With this integrated approach to NERD, we aim to overcome the limitations of the two-stage NER/NED methods discussed above.
Transactions of the Association for Computational Linguistics, vol. 4, pp. 113–125, 2016. Action Editor: Noah Smith. Submission batch: 10/2015; Revision batch: 2/2016; Published 4/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

A Joint Model for Answer Sentence Ranking and Answer Extraction

Md Arafat Sultan†, Vittorio Castelli‡, Radu Florian‡
† Institute of Cognitive Science and Department of Computer Science, University of Colorado, Boulder, CO
‡ IBM T.J. Watson Research Center, Yorktown Heights, NY
arafat.sultan@colorado.edu, vittorio@us.ibm.com, raduf@us.ibm.com

Abstract

Answer sentence ranking and answer extraction are two key challenges in question answering that have traditionally been treated in isolation, i.e., as independent tasks. In this article, we (1) explain how both tasks are related at their core by a common quantity, and (2) propose a simple and intuitive joint probabilistic model that addresses both via joint computation but task-specific application of that quantity. In our experiments with two TREC datasets, our joint model substantially outperforms state-of-the-art systems in both tasks.

1 Introduction

One of the original goals of AI was to build machines that can naturally interact with humans. Over time, the challenges became apparent and language processing emerged as one of AI’s most puzzling areas. Nevertheless, major breakthroughs have still been made in several important tasks; with IBM’s Watson (Ferrucci et al., 2010) significantly outperforming human champions in the quiz contest Jeopardy!, question answering (QA) is definitely one such task.

QA comes in various forms, each supporting specific kinds of user requirements. Consider a scenario where a system is given a question and a set of sentences each of which may or may not contain an answer to that question. The goal of answer extraction is to extract a precise answer in the form of a short span of text in one or more of those sentences. In this form, QA meets users’ immediate information needs. Answer sentence ranking, on the other hand, is the task of assigning a rank to each sentence so that the ones that are more likely to contain an answer are ranked higher. In this form, QA is similar to information retrieval and presents greater opportunities for further exploration and learning. In this article, we propose a novel approach to jointly solving these two well-studied yet open QA problems.

Most answer sentence ranking algorithms operate under the assumption that the degree of syntactic and/or semantic similarity between questions and answer sentences is a sufficiently strong predictor of answer sentence relevance (Wang et al., 2007; Yih et al., 2013; Yu et al., 2014; Severyn and Moschitti, 2015). On the other hand, answer extraction algorithms frequently assess candidate answer phrases based primarily on their own properties relative to the question (e.g., whether the question is a who question and the phrase refers to a person), making inadequate or no use of sentence-level evidence (Yao et al., 2013a; Severyn and Moschitti, 2013).

Both these assumptions, however, are simplistic, and fail to capture the core requirements of the two tasks. Table 1 shows a question, and three candidate answer sentences only one of which (S(1)) actually answers the question. Ranking models that rely solely on text similarity are highly likely to incorrectly assign similar ranks to S(1) and S(2). Such models would fail to utilize the key piece of evidence against S(2) that it does not contain any temporal information, necessary to answer a when question. Similarly, an extraction model that relies only on the features of a candidate phrase might extract the temporal expression “the year 1666” in S(3) as an answer despite a clear lack of sentence-level evidence.

In view of the above, we propose a joint model
Transactions of the Association for Computational Linguistics, vol. 4, pp. 99–112, 2016. Action Editor: Philipp Koehn. Submission batch: 11/2015; Revision batch: 2/2016; Published 4/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Adapting to All Domains at Once: Rewarding Domain Invariance in SMT

Hoang Cuong and Khalil Sima’an and Ivan Titov
Institute for Logic, Language and Computation, University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands
{c.hoang,k.simaan,titov}@uva.nl

Abstract

Existing work on domain adaptation for statistical machine translation has consistently assumed access to a small sample from the test distribution (target domain) at training time. In practice, however, the target domain may not be known at training time or it may change to match user needs. In such situations, it is natural to push the system to make safer choices, giving higher preference to domain-invariant translations, which work well across domains, over risky domain-specific alternatives. We encode this intuition by (1) inducing latent subdomains from the training data only; (2) introducing features which measure how specialized phrases are to individual induced sub-domains; (3) estimating feature weights on out-of-domain data (rather than on the target domain). We conduct experiments on three language pairs and a number of different domains. We observe consistent improvements over a baseline which does not explicitly reward domain invariance.

1 Introduction

Mismatch in phrase translation distributions between test data (target domain) and train data is known to harm performance of statistical translation systems (Irvine et al., 2013; Carpuat et al., 2014). Domain-adaptation methods (Foster et al., 2010; Bisazza et al., 2011; Sennrich, 2012b; Razmara et al., 2012; Sennrich et al., 2013; Haddow, 2013; Joty et al., 2015) aim to specialize a system estimated on out-of-domain training data to a target domain represented by a small data sample. In practice, however, the target domain may not be known at training time or it may change over time depending on user needs. In this work we address exactly the setting where we have a domain-agnostic system but we have no access to any samples from the target domain at training time. This is an important and challenging setting which, as far as we are aware, has not yet received attention in the literature.

When the target domain is unknown at training time, the system could be trained to make safer choices, preferring translations which are likely to work across different domains. For example, when translating from English to Russian, the most natural translation for the word ‘code’ would be highly dependent on the domain (and the corresponding word sense). The Russian words ‘xifr’, ‘zakon’ or ‘programma’ would perhaps be optimal choices if we consider cryptography, legal and software development domains, respectively. However, the translation ‘kod’ is also acceptable across all these domains and, as such, would be a safer choice when the target domain is unknown. Note that such a translation may not be the most frequent overall and, consequently, might not be proposed by a standard (i.e., domain-agnostic) phrase-based translation system.

In order to encode preference for domain-invariant translations, we introduce a measure which quantifies how likely a phrase (or a phrase-pair) is to be “domain-invariant”. We recall that most large parallel corpora are heterogeneous, consisting of diverse language use originating from a variety of unspecified subdomains. For example, news articles may cover sports, finance, politics, technology and a variety of other news topics. None of the subdomains may match the target domain particularly
Transactions of the Association for Computational Linguistics, vol. 4, pp. 87–98, 2016. Action Editor: Alexander Clark. Submission batch: 7/2015; Revision batch: 11/2015; Published 4/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Learning Tier-based Strictly 2-Local Languages

Adam Jardine and Jeffrey Heinz
University of Delaware
{ajardine,heinz}@udel.edu

Abstract

The Tier-based Strictly 2-Local (TSL2) languages are a class of formal languages which have been shown to model long-distance phonotactic generalizations in natural language (Heinz et al., 2011). This paper introduces the Tier-based Strictly 2-Local Inference Algorithm (2TSLIA), the first non-enumerative learner for the TSL2 languages. We prove the 2TSLIA is guaranteed to converge in polynomial time on a data sample whose size is bounded by a constant.

1 Introduction

This work presents the Tier-based Strictly 2-Local Inference Algorithm (2TSLIA), an efficient learning algorithm for a class of Tier-based Strictly Local (TSL) formal languages (Heinz et al., 2011). A TSL class is determined by two parameters: the tier, or subset of the alphabet, and the permissible tier k-factors, which are the legal sequences of length k allowed in the string, once all non-tier symbols have been removed. The Tier-based Strictly 2-Local (TSL2) languages are those in which k = 2.

As will be discussed below, the TSL languages are of interest to phonology because they can model a wide variety of long-distance phonotactic patterns found in natural language (Heinz et al., 2011; McMullin and Hansson, forthcoming). One example is derived from Latin liquid dissimilation, in which two ls cannot appear in a word unless there is an r intervening, regardless of distance. For example, floralis ‘floral’ is well-formed but not *militalis (cf. militaris ‘military’). As explained in sections 2 and 4, this can be modeled with permissible 2-factors over a tier consisting of the liquids {l, r}.

For long-distance phonotactics, k can be fixed to 2, but it does not appear that the tier can be fixed since languages employ a variety of different tiers. This presents an interesting learning problem: Given a fixed k, how can an algorithm induce both a tier and a set of permissible tier k-factors from positive data?

There is some related work which addresses this question. Goldsmith and Riggle (2012), building on work by Goldsmith and Xanthos (2009), present a method based on mutual information for learning tiers and subsequently learning harmony patterns. This paper differs in that its methods are rooted firmly in grammatical inference and formal language theory (de la Higuera, 2010). For instance, in contrast to the results presented there, we prove the kinds of patterns 2TSLIA succeeds on and the kind of data sufficient for it to do so.

Nonetheless, there is relevant work in computational learning theory: Gold (1967) proved that any finite class of languages is identifiable in the limit via an enumeration method. Given a fixed alphabet and a fixed k, the number of possible tiers and permissible tier k-factors is finite, and thus learnable in this way. However, such learners are grossly inefficient. No provably-correct, non-enumerative, efficient learner for both the tier and permissible tier k-factor parameters has previously been proposed. This work fills this gap with an algorithm which learns these parameters when k = 2 from positive data in time polynomial in the size of the data.

Finally, Jardine (2016) presents a simplified ver-
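
To make the two parameters of a TSL2 grammar concrete, the sketch below checks membership for the Latin liquid dissimilation example: it erases every symbol that is not on the tier {l, r} and then tests each adjacent pair on the tier against a set of permissible 2-factors. This is a membership checker for a hand-written toy grammar, not the 2TSLIA learner; the boundary markers, the function names and the choice to permit every 2-factor except “l l” are assumptions made for illustration.

```python
from itertools import product

LEFT, RIGHT = "<", ">"  # assumed word-boundary markers


def tier_projection(word, tier):
    """Erase every symbol that is not on the tier."""
    return [symbol for symbol in word if symbol in tier]


def is_well_formed(word, tier, permitted):
    """Check that all adjacent pairs on the tier are permissible 2-factors."""
    projected = [LEFT] + tier_projection(word, tier) + [RIGHT]
    return all(pair in permitted for pair in zip(projected, projected[1:]))


TIER = {"l", "r"}
# Toy grammar: permit every 2-factor over the tier (plus boundaries)
# except two adjacent l's on the tier.
PERMITTED = set(product([LEFT, "l", "r"], ["l", "r", RIGHT])) - {("l", "l")}

results = {
    word: is_well_formed(word, TIER, PERMITTED)
    for word in ["floralis", "militalis", "militaris"]
}
```

With this grammar, floralis projects to “l r l” and is accepted, militalis projects to “l l” and is rejected, and militaris projects to “l r” and is accepted, matching the pattern described above.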
Transactions of the Association for Computational Linguistics, vol. 4, pp. 61–74, 2016. Action Editor: Janyce Wiebe and Kristina Toutanova. Submission batch: 10/2015; Revision batch: 12/2015; Published 3/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

An Empirical Analysis of Formality in Online Communication

Ellie Pavlick, University of Pennsylvania∗, epavlick@seas.upenn.edu
Joel Tetreault, Yahoo Labs, tetreaul@yahoo-inc.com
∗Research performed while at Yahoo Labs.

Abstract

This paper presents an empirical study of linguistic formality. We perform an analysis of humans’ perceptions of formality in four different genres. These findings are used to develop a statistical model for predicting formality, which is evaluated under different feature settings and genres. We apply our model to an investigation of formality in online discussion forums, and present findings consistent with theories of formality and linguistic coordination.

1 Introduction

Language consists of much more than just content. Consider the following two sentences:

1. Those recommendations were unsolicited and undesirable.
2. that’s the stupidest suggestion EVER.

Both sentences communicate the same idea, but the first is substantially more formal. Such stylistic differences often have a larger impact on how the hearer understands the sentence than the literal meaning does (Azul, 1987). Full natural language understanding requires comprehending this stylistic aspect of meaning. To enable real advancements in dialog systems, information extraction, and human-computer interaction, computers need to understand the entirety of what humans say, both the literal and the non-literal. In this paper, we focus on the particular stylistic dimension illustrated above: formality.

Formality has long been of interest to linguists and sociolinguists, who have observed that it subsumes a range of dimensions of style including serious-trivial, polite-casual, and level of shared knowledge (Irvine, 1979; Brown and Fraser, 1979). The formal-informal dimension has even been called the “most important dimension of variation between styles” (Heylighen and Dewaele, 1999). A speaker’s level of formality can reveal information about their familiarity with a person, opinions of a topic, and goals for an interaction (Azul, 1987; Endrass et al., 2011). As a result, the ability to recognize formality is an integral part of dialogue systems (Mairesse, 2008; Mairesse and Walker, 2011; Battaglino and Bickmore, 2015), sociolinguistic analyses (Danescu-Niculescu-Mizil et al., 2012; Justo et al., 2014; Krishnan and Eisenstein, 2015), human-computer interaction (Johnson et al., 2005; Khosmood and Walker, 2010), summarization (Sidhaye and Cheung, 2015), and automatic writing assessment (Felice and Deane, 2012). Formality can also indicate context-independent, universal statements (Heylighen and Dewaele, 1999), making formality detection relevant for tasks such as knowledge base population (Suh et al., 2006; Reiter and Frank, 2010) and textual entailment (Dagan et al., 2006).

This paper investigates formality in online written communication. The contributions are as follows: 1) We provide an analysis of humans’ subjective perceptions of formality in four different genres. We highlight areas of high and low agreement and extract patterns that consis-