Documentación - IA de Investigación especializada en el MIT

¿Sobre qué tema necesitas documentación??

Transacciones de la Asociación de Lingüística Computacional, 2 (2014) 193–206. Editor de acciones: Joakim Nivré.

Transacciones de la Asociación de Lingüística Computacional, 2 (2014) 193–206. Editor de acciones: Joakim Nivré. Submitted 12/2013; Revised 1/2014; Publicado 4/2014. C(cid:13)2014 Asociación de Lingüística Computacional. DiscriminativeLexicalSemanticSegmentationwithGaps:RunningtheMWEGamutNathanSchneiderEmilyDanchikChrisDyerNoahA.SmithSchoolofComputerScienceCarnegieMellonUniversityPittsburgh,PA15213,USA{nschneid,emilydan,cdyer,nasmith}@cs.cmu.eduAbstractWepresentanovelrepresentation,evaluationmeasure,andsupervisedmodelsforthetaskofidentifyingthemultiwordexpressions(MWEs)inasentence,resultinginalexicalseman-ticsegmentation.OurapproachgeneralizesastandardchunkingrepresentationtoencodeMWEscontaininggaps,therebyenablingefﬁ-cientsequencetaggingalgorithmsforfeature-richdiscriminativemodels.ExperimentsonanewdatasetofEnglishwebtextoffertheﬁrstlinguistically-drivenevaluationofMWEiden-tiﬁcationwithtrulyheterogeneousexpressiontypes.Ourstatisticalsequencemodelgreatlyoutperformsalookup-basedsegmentationpro-cedure,achievingnearly60%F1forMWEidentiﬁcation.1IntroductionLanguagehasaknackfordefyingexpectationswhenputunderthemicroscope.Forexample,thereisthenotion—sometimesreferredtoascompositionality—thatwordswillbehaveinpredictableways,withindi-vidualmeaningsthatcombinetoformcomplexmean-ingsaccordingtogeneralgrammaticalprinciples.Yetlanguageisawashwithexamplestothecontrary:inparticular,idiomaticexpressionssuchasawashwithNP,haveaknackforVP-ing,tothecontrary,anddefyexpectations.Thankstoprocesseslikemetaphorandgrammaticalization,theseare(tovariousdegrees)semanticallyopaque,structurallyfossilized,and/orstatisticallyidiosyncratic.Inotherwords,idiomaticexpressionsmaybeexceptionalinform,función,ordistribution.Theyaresodiverse,sounruly,so1.MWnamedentities:PrimeMinisterTonyBlair2.MWcompounds:hotairballoon,skinnydip3.conventionallySWcompounds:somewhere4.verb-particle:pickup,dryout,takeover,cutshort5.verb-preposition:referto,dependon,lookfor6.verb-noun(-preposition):payattention(a)7.supportverb:makedecisions,takepictures8.otherphrasalverb:putupwith,getridof9.PPmodiﬁer:aboveboard,atall,fromtimetotime10.coordinatedphrase:cutanddry,moreorless11.connective:aswellas,letalone,inspiteof12.semi-ﬁxedVP:pickupwhereleftoff13.ﬁxedphrase:scaredtodeath,leaveofabsence14.phatic:You’rewelcome.Meneither!15.proverb:Beggarscan’tbechoosers.Figure1:SomeoftheclassesofidiomsinEnglish.Theexamplesincludedherecontainmultiplelexicalizedwords—withtheexceptionofthosein(3),iftheconven-tionalsingle-word(SW)spellingisused.difﬁculttocircumscribe,thatentiretheoriesofsyn-taxarepredicatedonthenotionthatconstructionswithidiosyncraticform-meaningmappings(Fillmoreetal.,1988;Goldberg,1995)orstatisticalproperties(Goldberg,2006)offercrucialevidenceaboutthegrammaticalorganizationoflanguage.Herewefocusonmultiwordexpressions(MWEs):lexicalizedcombinationsoftwoormorewordsthatareexceptionalenoughtobeconsideredassingleunitsinthelexicon.Asﬁgure1illus-trates,MWEsoccupydiversesyntacticandsemanticfunctions.WithinMWEs,wedistinguish(a)propernamesand(b)lexicalidioms.Thelatterhaveprovedthemselvesa“painintheneckforNLP”(Sagetal.,2002).AutomaticandefﬁcientdetectionofMWEs,thoughfarfromsolved,wouldhavediverseappli- l D o w n o a d e d f r o m h t t p : / / directo . m i t . e d u / t a c