What topic do you need documentation on?
Transactions of the Association for Computational Linguistics, 2 (2014) 193–206. Action Editor: Joakim Nivre.
Transactions of the Association for Computational Linguistics, 2 (2014) 193–206. Action Editor: Joakim Nivre. Submitted 12/2013; Revised 1/2014; Published 4/2014. c(cid:13)2014 Association for Computational Linguistics. DiscriminativeLexicalSemanticSegmentationwithGaps:RunningtheMWEGamutNathanSchneiderEmilyDanchikChrisDyerNoahA.SmithSchoolofComputerScienceCarnegieMellonUniversityPittsburgh,PA15213,USA{nschneid,emilydan,cdyer,nasmith}@cs.cmu.eduAbstractWepresentanovelrepresentation,evaluationmeasure,andsupervisedmodelsforthetaskofidentifyingthemultiwordexpressions(MWEs)inasentence,resultinginalexicalseman-ticsegmentation.OurapproachgeneralizesastandardchunkingrepresentationtoencodeMWEscontaininggaps,therebyenablingeffi-cientsequencetaggingalgorithmsforfeature-richdiscriminativemodels.ExperimentsonanewdatasetofEnglishwebtextofferthefirstlinguistically-drivenevaluationofMWEiden-tificationwithtrulyheterogeneousexpressiontypes.Ourstatisticalsequencemodelgreatlyoutperformsalookup-basedsegmentationpro-cedure,achievingnearly60%F1forMWEidentification.1IntroductionLanguagehasaknackfordefyingexpectationswhenputunderthemicroscope.Forexample,thereisthenotion—sometimesreferredtoascompositionality—thatwordswillbehaveinpredictableways,withindi-vidualmeaningsthatcombinetoformcomplexmean-ingsaccordingtogeneralgrammaticalprinciples.Yetlanguageisawashwithexamplestothecontrary:inparticular,idiomaticexpressionssuchasawashwithNP,haveaknackforVP-ing,tothecontrary,anddefyexpectations.Thankstoprocesseslikemetaphorandgrammaticalization,theseare(tovariousdegrees)semanticallyopaque,structurallyfossilized,and/orstatisticallyidiosyncratic.Inotherwords,idiomaticexpressionsmaybeexceptionalinform,fonction,ordistribution.Theyaresodiverse,sounruly,so1.MWnamedentities:PrimeMinisterTonyBlair2.MWcompounds:hotairballoon,skinnydip3.conventionallySWcompounds:somewhere4.verb-particle:pickup,dryout,takeover,cutshort5.verb-preposition:referto,dependon,lookfor6.verb-noun(-preposition):payattention(à)7.supportverb:makedecisions,takepictures8.otherphrasalverb:putupwith,getridof9.PPmodifier:aboveboard,atall,fromtimetotime10.coordinatedphrase:cutanddry,moreorless11.connective:aswellas,letalone,inspiteof12.semi-fixedVP:pickupwhereleftoff13.fixedphrase:scaredtodeath,leaveofabsence14.phatic:You’rewelcome.Meneither!15.proverb:Beggarscan’tbechoosers.Figure1:SomeoftheclassesofidiomsinEnglish.Theexamplesincludedherecontainmultiplelexicalizedwords—withtheexceptionofthosein(3),iftheconven-tionalsingle-word(SW)spellingisused.difficulttocircumscribe,thatentiretheoriesofsyn-taxarepredicatedonthenotionthatconstructionswithidiosyncraticform-meaningmappings(Fillmoreetal.,1988;Goldberg,1995)orstatisticalproperties(Goldberg,2006)offercrucialevidenceaboutthegrammaticalorganizationoflanguage.Herewefocusonmultiwordexpressions(MWEs):lexicalizedcombinationsoftwoormorewordsthatareexceptionalenoughtobeconsideredassingleunitsinthelexicon.Asfigure1illus-trates,MWEsoccupydiversesyntacticandsemanticfunctions.WithinMWEs,wedistinguish(un)propernamesand(b)lexicalidioms.Thelatterhaveprovedthemselvesa“painintheneckforNLP”(Sagetal.,2002).AutomaticandefficientdetectionofMWEs,thoughfarfromsolved,wouldhavediverseappli- l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t a c
Transactions of the Association for Computational Linguistics, 2 (2014) 181–192. Action Editor: Eric Fosler-Lussier.
Transactions of the Association for Computational Linguistics, 2 (2014) 181–192. Action Editor: Eric Fosler-Lussier. Submitted 10/2013; Revised 2/2014; Published 4/2014. c (cid:13) 2014 Association for Computational Linguistics. DynamicLanguageModelsforStreamingTextDaniYogatama∗ChongWang∗BryanR.Routledge†NoahA.Smith∗EricP.Xing∗∗SchoolofComputerScience†TepperSchoolofBusinessCarnegieMellonUniversityPittsburgh,PA15213,USA∗{dyogatama,chongw,nasmith,epxing}@cs.cmu.edu,†routledge@cmu.eduAbstractWepresentaprobabilisticlanguagemodelthatcapturestemporaldynamicsandconditionsonarbitrarynon-linguisticcontextfeatures.Thesecontextfeaturesserveasimportantindicatorsoflanguagechangesthatareotherwisedifficulttocaptureusingtextdatabyitself.Welearnourmodelinanefficientonlinefashionthatisscalableforlarge,streamingdata.Withfivestreamingdatasetsfromtwodifferentgenres—economicsnewsarticlesandsocialmedia—weevaluateourmodelonthetaskofsequentiallanguagemodeling.Ourmodelconsistentlyoutperformscompetingmodels.1IntroductionLanguagemodelsareakeycomponentinmanyNLPapplications,suchasmachinetranslationandex-ploratorycorpusanalysis.Languagemodelsaretypi-callyassumedtobestatic—theword-given-contextdistributionsdonotchangeovertime.Examplesincluden-grammodels(Jelinek,1997)andproba-bilistictopicmodelslikelatentDirichletallocation(Bleietal.,2003);weusetheterm“languagemodel”toreferbroadlytoprobabilisticmodelsoftext.Recently,streamingdatasets(e.g.,socialmedia)haveattractedmuchinterestinNLP.Sincesuchdataevolverapidlybasedoneventsintherealworld,as-sumingastaticlanguagemodelbecomesunrealistic.Ingeneral,moredataisseenasbetter,buttreatingallpastdataequallyrunstheriskofdistractingamodelwithirrelevantevidence.Ontheotherhand,cau-tiouslyusingonlythemostrecentdatarisksoverfit-tingtoshort-termtrendsandmissingimportanttime-insensitiveeffects(BleiandLafferty,2006;Wangetal.,2008).Donc,inthispaper,wetakestepstowardmethodsforcapturinglong-rangetemporaldynamicsinlanguageuse.Ourmodelalsoexploitsobservablecontextvari-ablestocapturetemporalvariationthatisotherwisedifficulttocaptureusingonlytext.Specificallyfortheapplicationsweconsider,weusestockmarketdataasexogenousevidenceonwhichthelanguagemodeldepends.Forexample,whenanimportantcompany’spricemovessuddenly,thelanguagemodelshouldbebasednotontheveryrecenthistory,butshouldbesimilartothelanguagemodelforadaywhenasimilarchangehappened,sincepeoplearelikelytosaysimilarthings(eitheraboutthatcom-pany,oraboutconditionsrelevanttothechange).Non-linguisticcontextssuchasstockpricechangesprovideusefulauxiliaryinformationthatmightindi-catethesimilarityoflanguagemodelsacrossdiffer-enttimesteps.Wealsoturntoafullyonlinelearningframework(Cesa-BianchiandLugosi,2006)todealwithnon-stationarityanddynamicsinthedatathatnecessitateadaptationofthemodeltodatainrealtime.Inon-linelearning,streamingexamplesareprocessedonlywhentheyarrive.Onlinelearningalsoeliminatestheneedtostorelargeamountsofdatainmemory.Strictlyspeaking,onlinelearningisdistinctfromstochasticlearning,whichforlanguagemodelsbuiltonmassivedatasetshasbeenexploredbyHoffmanetal.(2013)andWangetal.(2011).Thosetech-niquesarestillforstaticmodeling.Languagemodel-ingforstreamingdatasetsinthecontextofmachinetranslationwasconsideredbyLevenbergandOs-borne(2009)andLevenbergetal.(2010).Goyaletal.(2009)introducedastreamingalgorithmforlargescalelanguagemodelingbyapproximatingn-gramfrequencycounts.Weproposeageneralonlinelearningalgorithmforlanguagemodelingthatdrawsinspirationfromregretminimizationinsequentialpredictions(Cesa-BianchiandLugosi,2006)andon- l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 169–180. Action Editor: Eric Fosler-Lussier.
Transactions of the Association for Computational Linguistics, 2 (2014) 169–180. Action Editor: Eric Fosler-Lussier. Submitted 11/2013; Revised 2/2014; Published 4/2014. c (cid:13) 2014 Association for Computational Linguistics. SegmentationforEfficientSupervisedLanguageAnnotationwithanExplicitCost-UtilityTradeoffMatthiasSperber1,MirjamSimantzik2,GrahamNeubig3,SatoshiNakamura3,AlexWaibel11KarlsruheInstituteofTechnology,InstituteforAnthropomatics,Germany2MobileTechnologiesGmbH,Germany3NaraInstituteofScienceandTechnology,AHCLaboratory,Japanmatthias.sperber@kit.edu,mirjam.simantzik@jibbigo.com,neubig@is.naist.jps-nakamura@is.naist.jp,waibel@kit.eduAbstractInthispaper,westudytheproblemofmanu-allycorrectingautomaticannotationsofnatu-rallanguageinasefficientamanneraspos-sible.Weintroduceamethodforautomati-callysegmentingacorpusintochunkssuchthatmanyuncertainlabelsaregroupedintothesamechunk,whilehumansupervisioncanbeomittedaltogetherforothersegments.Atradeoffmustbefoundforsegmentsizes.Choosingshortsegmentsallowsustoreducethenumberofhighlyconfidentlabelsthataresupervisedbytheannotator,whichisusefulbecausetheselabelsareoftenalreadycorrectandsupervisingcorrectlabelsisawasteofeffort.Incontrast,longsegmentsreducethecognitiveeffortduetocontextswitches.Ourmethodhelpsfindthesegmentationthatopti-mizessupervisionefficiencybydefiningusermodelstopredictthecostandutilityofsu-pervisingeachsegmentandsolvingacon-strainedoptimizationproblembalancingthesecontradictoryobjectives.Auserstudydemon-stratesnoticeablegainsoverpre-segmented,confidence-orderedbaselinesontwonaturallanguageprocessingtasks:speechtranscrip-tionandwordsegmentation.1IntroductionManynaturallanguageprocessing(NLP)tasksre-quirehumansupervisiontobeusefulinpractice,beittocollectsuitabletrainingmaterialortomeetsomedesiredoutputquality.Giventhehighcostofhumanintervention,howtominimizethesupervi-sioneffortisanimportantresearchproblem.Previ-ousworksinareassuchasactivelearning,postedit-(un)Itwasabrightcold(ils)dans(apron),et(un)clockswerestrikingthirteen.(b)Itwasabrightcold(ils)dans(apron),et(un)clockswerestrikingthirteen.(c)Itwasabrightcold(ils)dans(apron),et(un)clockswerestrikingthirteen.Figure1:Threeautomatictranscriptsofthesentence“ItwasabrightcolddayinApril,andtheclockswerestrik-ingthirteen”,withrecognitionerrorsinparentheses.Theunderlinedpartsaretobecorrectedbyahumanfor(un)phrases,(b)words,ou(c)theproposedsegmentation.ing,andinteractivepatternrecognitionhaveinves-tigatedthisquestionwithnotablesuccess(Settles,2008;Specia,2011;Gonz´alez-Rubioetal.,2010).Themostcommonframeworkforefficientanno-tationintheNLPcontextconsistsoftraininganNLPsystemonasmallamountofbaselinedata,andthenrunningthesystemonunannotateddatatoestimateconfidencescoresofthesystem’spredictions(Set-tles,2008).Sentenceswiththelowestconfidencearethenusedasthedatatobeannotated(Figure1(un)).Cependant,ithasbeennotedthatwhentheNLPsysteminquestionalreadyhasrelativelyhighaccu-racy,annotatingentiresentencescanbewasteful,asmostwordswillalreadybecorrect(TomanekandHahn,2009;Neubigetal.,2011).Inthesecases,itispossibletoachievemuchhigherbenefitperanno-tatedwordbyannotatingsub-sententialunits(Fig-ure1(b)).Cependant,asSettlesetal.(2008)pointout,sim-plymaximizingthebenefitperannotatedinstanceisnotenough,astherealsupervisioneffortvaries l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 155–168. Action Editor: Janyce Wiebe.
Transactions of the Association for Computational Linguistics, 2 (2014) 155–168. Action Editor: Janyce Wiebe. Submitted 6/2013; Revised 11/2013; Published 4/2014. c (cid:13) 2014 Association for Computational Linguistics. Senti-LSSVM:Sentiment-OrientedMulti-RelationExtractionwithLatentStructuralSVMLizhenQuMaxPlanckInstituteforInformaticslqu@mpi-inf.mpg.deYiZhangNuanceCommunicationsyi.zhang@nuance.comRuiWangDFKIGmbHmars198356@hotmail.comLiliJiangMaxPlanckInstituteforInformaticsljiang@mpi-inf.mpg.deRainerGemullaMaxPlanckInstituteforInformaticsrgemulla@mpi-inf.mpg.deGerhardWeikumMaxPlanckInstituteforInformaticsweikum@mpi-inf.mpg.deAbstractExtractinginstancesofsentiment-orientedre-lationsfromuser-generatedwebdocumentsisimportantforonlinemarketinganalysis.Un-likepreviouswork,weformulatethisextrac-tiontaskasastructuredpredictionproblemanddesignthecorrespondinginferenceasanintegerlinearprogram.OurlatentstructuralSVMbasedmodelcanlearnfromtrainingcor-porathatdonotcontainexplicitannotationsofsentiment-bearingexpressions,anditcansi-multaneouslyrecognizeinstancesofbothbi-nary(polarity)andternary(comparative)re-lationswithregardtoentitymentionsofin-terest.Theempiricalevaluationshowsthatourapproachsignificantlyoutperformsstate-of-the-artsystemsacrossdomains(camerasandmovies)andacrossgenres(reviewsandforumposts).Thegoldstandardcorpusthatwebuiltwillalsobeavaluableresourceforthecommunity.1IntroductionSentiment-orientedrelationextraction(Choietal.,2006)isconcernedwithrecognizingsentimentpo-laritiesandcomparativerelationsbetweenentitiesfromnaturallanguagetext.Identifyingsuchrela-tionsoftenrequiressyntacticandsemanticanalysisatbothsentenceandphraselevel.Mostpriorworkonsentimentanalysisconsidereitheri)subjectivesentencedetection(YuandKübler,2011),ii)po-larityclassification(JohanssonandMoschitti,2011;Wilsonetal.,2005),oriii)comparativerelationidentification(JindalandLiu,2006;Ganapathib-hotlaandLiu,2008).Inpractice,cependant,differ-enttypesofsentiment-orientedrelationsfrequentlycoexistindocuments.Inparticular,wefoundthatmorethan38%ofthesentencesinourtestcorpuscontainmorethanonetypeofrelations.Theiso-latedanalysisapproachisinappropriatebecausei)itsacrificesacuracybyignoringtheintricateinterplayamongdifferenttypesofrelations;ii)itcouldleadtoconflictingpredictionssuchasestimatingarelationcandidateasbothnegativeandcomparative.There-fore,inthispaper,weidentifyinstancesofbothsen-timentpolaritiesandcomparativerelationsforenti-tiesofinterestsimultaneously.Weassumethatallthementionsofentitiesandattributesaregiven,andentitiesaredisambiguated.Itisawidelyusedas-sumptionwhenevaluatingamoduleinapipelinesystemthattheoutputsofprecedingmodulesareerror-free.Tothebestofourknowledge,theonlyexist-ingsystemcapableofextractingbothcomparisonsandsentimentpolaritiesisarule-basedsystempro-posedbyDingetal.(2009).Wearguethatitisbettertotacklethetaskbyusingaunifiedmodelwithstructuredoutputs.Itallowsustoconsiderasetofcorrelatedrelationinstancesjointlyandchar-acterizetheirinteractionthroughasetofsoftandhardconstraints.Forexample,wecanencodecon-straintstodiscourageanattributetoparticipateinapolarityrelationandacomparativerelationatthesametime.Asaresult,thesystemextractsasetofcorrelatedinstancesofsentiment-orientedrelationsfromagivensentence.Forexample,withthesen-tenceaboutthecameraCanon7D,“Thesensorisgreat,butthepriceishigherthanNikonD7000.”theexpectedoutputispositive(Canon7D,sensor) l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 143–154. Action Editor: Ellen Riloff.
Transactions of the Association for Computational Linguistics, 2 (2014) 143–154. Action Editor: Ellen Riloff. Submitted 9/2013; Revised 2/2014; Published 4/2014. c(cid:13)2014 Association for Computational Linguistics. TemporalAnnotationintheClinicalDomainWilliamF.StylerIV1,StevenBethard2,SeanFinan3,MarthaPalmer1,SameerPradhan3,PietCdeGroen4,BradErickson4,TimothyMiller3,ChenLin3,GuerganaSavova3andJamesPustejovsky51DepartmentofLinguistics,UniversityofColoradoatBoulder2DepartmentofComputerandInformationSciences,UniversityofAlabamaatBirmingham3Children’sHospitalBostonInformaticsProgramandHarvardMedicalSchool4MayoClinicCollegeofMedicine,MayoClinic,Rochester,MN5DepartmentofComputerScience,BrandeisUniversityAbstractThisarticlediscussestherequirementsofaformalspecificationfortheannotationoftemporalinformationinclinicalnarratives.WediscusstheimplementationandextensionofISO-TimeMLforannotatingacorpusofclinicalnotes,knownastheTHYMEcor-pus.Toreflecttheinformationtaskandtheheavilyinference-basedreasoningdemandsinthedomain,anewannotationguidelinehasbeendeveloped,“theTHYMEGuidelinestoISO-TimeML(THYME-TimeML)”.Toclarifywhatrelationsmeritannotation,wedistinguishbetweenlinguistically-derivedandinferentially-derivedtemporalorderingsinthetext.WealsoapplyatopperformingTemp-Eval2013systemagainstthisnewresourcetomeasurethedifficultyofadaptingsystemstotheclinicaldomain.ThecorpusisavailabletothecommunityandhasbeenproposedforuseinaSemEval2015task.1IntroductionThereisalong-standinginterestintemporalreason-ingwithinthebiomedicalcommunity(Savovaetal.,2009;Hripcsaketal.,2009;Meystreetal.,2008;Bramsenetal.,2006;Combietal.,1997;Keravnou,1997;Dolin,1995;Irvineetal.,2008;Sullivanetal.,2008).Thisinterestextendstotheautomaticex-tractionandinterpretationoftemporalinformationfrommedicaltexts,suchaselectronicdischargesum-mariesandpatientcasesummaries.Makingeffectiveuseoftemporalinformationfromsuchnarrativesisacrucialstepintheintelligentanalysisofinformat-icsformedicalresearchers,whileanawarenessoftemporalinformation(bothimplicitandexplicit)inatextisalsonecessaryformanydataminingtasks.Ithasalsobeendemonstratedthatthetemporalin-formationinclinicalnarrativescanbeusefullyminedtoprovideinformationforsomehigher-leveltempo-ralreasoning(Zhaoetal.,2005).Robusttemporalunderstandingofsuchnarratives,cependant,hasbeendifficulttoachieve,duetothecomplexityofdeter-miningtemporalrelationsamongevents,thediver-sityoftemporalexpressions,andtheinteractionwithbroadercomputationallinguisticissues.RecentworkonElectronicHealthRecords(EHRs)pointstonewwaystoexploitandminetheinforma-tioncontainedtherein(Savovaetal.,2009;Robertsetal.,2009;Zhengetal.,2011;Turchinetal.,2009).Wetargettwomainusecasesforextracteddata.First,wehopetoenableinteractivedisplaysandsummariesofthepatient’srecordstothephysicianatthetimeofvisit,makingacomprehensivereviewofthepatient’shistorybothfasterandlesspronetooversights.Sec-ond,wehopetoenabletemporally-awaresecondaryresearchacrosslargedatabasesofmedicalrecords(e.g.,“Whatpercentageofpatientswhoundergopro-cedureXdevelopside-effectYwithinZmonths?»).Bothoftheseapplicationsrequiretheextractionoftimeanddateassociationsforcriticaleventsandtherelativeorderingofeventsduringthepatient’speriodofcare,allfromthevariousrecordswhichmakeupapatient’sEHR.Althoughwehavethesetwospecificapplicationsinmind,theschemawehavedevelopedisgeneralizableandcouldpotentiallybeembeddedinawidevarietyofbiomedicalusecases.NarrativetextsinEHRsaretemporallyrichdoc-umentsthatfrequentlycontainassertionsaboutthetimingofmedicalevents,suchasvisits,laboratoryvalues,symptoms,signes,diagnoses,andprocedures(Bramsenetal.,2006;Hripcsaketal.,2009;Zhouetal.,2008).Temporalrepresentationandreason-inginthemedicalrecordaredifficultdueto:(1)thediversityoftimeexpressions;(2)thecomplexityofdeterminingtemporalrelationsamongevents(whichareoftenlefttoinference);(3)thedifficultyofhan-dlingthetemporalgranularityofanevent;et(4) l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t a c
Transactions of the Association for Computational Linguistics, 2 (2014) 131–142. Action Editor: Joakim Nivre.
Transactions of the Association for Computational Linguistics, 2 (2014) 131–142. Action Editor: Joakim Nivre. Submitted 11/2013; Revised 2/2014; Published 4/2014. c(cid:13)2014 Association for Computational Linguistics. JointIncrementalDisfluencyDetectionandDependencyParsingMatthewHonnibalDepartmentofComputingMacquarieUniversitySydney,Australiamatthew.honnibal@mq.edu.edu.auMarkJohnsonDepartmentofComputingMacquarieUniversitySydney,Australiamark.johnson@mq.edu.edu.auAbstractWepresentanincrementaldependencyparsingmodelthatjointlyperformsdisflu-encydetection.Themodelhandlesspeechrepairsusinganovelnon-monotonictran-sitionsystem,andincludesseveralnovelclassesoffeatures.Forcomparison,weevaluatedtwopipelinesystems,us-ingstate-of-the-artdisfluencydetectors.Thejointmodelperformedbetteronbothtasks,withaparseaccuracyof90.5%and84.0%accuracyatdisfluencydetection.Themodelrunsinexpectedlineartime,andprocessesover550tokensasecond.1IntroductionMostunscriptedspeechcontainsfilledpauses(umsanduhs),anderrorsthatareusuallyeditedon-the-flybythespeaker.Disfluencydetectionisthetaskofdetectingtheseinfelicitiesinspokenlanguagetranscripts.Thetaskhassomeimme-diatevalue,asdisfluencieshavebeenshowntomakespeechrecognitionoutputmuchmoredif-ficulttoread(Jonesetal.,2003),buthasalsobeenmotivatedasamoduleinanaturallanguageunderstandingpipeline,becausedisfluencieshaveprovenproblematicforPCFGparsingmodels.Insteadofapipelineapproach,webuildonre-centworkintransition-baseddependencyparsing,toperformthetwotasksjointly.Therehavebeentwosmallstudiesofdependencyparsingonun-scriptedspeech,bothusingentirelygreedypars-ingstrategies,withoutadirectcomparisonagainstapipelinearchitecture(Jorgensen,2007;RasooliandTetreault,2013).Wegosubstantiallybeyondthesepilotstudies,andpresentasystemthatcom-paresfavourablytoapipelineconsistingofstate-of-the-artcomponents.OurparserlargelyfollowsthedesignofZhangandClark(2011).Weuseastructuredaveragedperceptronmodelwithbeam-searchdecoding(Collins,2002).OurfeaturesetisbasedonZhangandClark(2011),andourtransition-systemisbasedonthearc-eagersystemofNivre(2003).Weextendthetransitionsystemwithanovelnon-monotonictransition,Edit.Itallowssen-tenceslike‘Passthepepperuhsalt’tobeparsedincrementally,withouttheneedtoguessearlythatpepperisdisfluent.Thisisachievedbyre-processingtheleftwardchildrenofthewordEditmarksasdisfluent.Forinstance,iftheparserat-tachesthetopepper,butsubsequentlymarkspep-perasdisfluent,thewillbereturnedtothestack.Wealsoexploittheeasewithwhichthemodelcanincorporatearbitraryfeatures,anddesignasetoffeaturesthatcapturethe‘roughcopy’structureofsomespeechrepairs,whichmotivatedtheJohnsonandCharniak(2004)noisychannelmodel.Ourmaincomparisonisagainsttwopipelinesystems,whichusethetwocurrentstate-of-the-artdisfluencydetectionsystemsaspre-processorstoourparser,minusthecustomdisfluencyfea-turesandtransition.Thejointmodelcomparedfavourablytothepipelineparsersatbothtasks,withanunlabelledattachmentscoreof90.5%,and84.0%accuracyatdetectingspeechrepairs.Anef-ficientimplementationisavailableunderanopen-sourcelicense.1Thefutureprospectsofthesys-temarealsoquitepromising.Becausetheparserisincremental,itshouldbewellsuitedtoun-segmentedtextsuchastheoutputofaspeech-recognitionsystem.Weconsiderourmaincon-tributionstobe:•anovelnon-monotonictransitionsystem,forspeechrepairsandrestarts,1http://github.com/syllog1sm/redshift l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t a c
Transactions of the Association for Computational Linguistics, 2 (2014) 105–118. Action Editor: Sharon Goldwater.
Transactions of the Association for Computational Linguistics, 2 (2014) 105–118. Action Editor: Sharon Goldwater. Submitted 11/2013; Revised 2/2014; Published 4/2014. c (cid:13) 2014 Association for Computational Linguistics. ParallelAlgorithmsforUnsupervisedTaggingSujithRaviGoogleMountainView,CA94043sravi@google.comSergeiVassilivitskiiGoogleMountainView,CA94043sergeiv@google.comVibhorRastogi∗TwitterSanFrancisco,CAvibhor.rastogi@gmail.comAbstractWeproposeanewmethodforunsupervisedtaggingthatfindsminimalmodelswhicharethenfurtherimprovedbyExpectationMax-imizationtraining.Incontrasttopreviousapproachesthatrelyonmanuallyspecifiedandmulti-stepheuristicsformodelminimiza-tion,ourapproachisasimplegreedyapprox-imationalgorithmDMLC(DISTRIBUTED-MINIMUM-LABEL-COVER)thatsolvesthisobjectiveinasinglestep.Weextendthemethodandshowhowtoef-ficientlyparallelizethealgorithmonmodernparallelcomputingplatformswhilepreservingapproximationguarantees.Thenewmethodeasilyscalestolargedataandgrammarsizes,overcomingthememorybottleneckinprevi-ousapproaches.Wedemonstratethepowerofthenewalgorithmbyevaluatingonvarioussequencelabelingtasks:Part-of-Speechtag-gingformultiplelanguages(includinglow-resourcelanguages),withcompleteandin-completedictionaries,andsupertagging,acomplexsequencelabelingtask,wherethegrammarsizealonecangrowtomillionsofentries.Ourresultsshowthatforallofthesesettings,ourmethodachievesstate-of-the-artscalableperformancethatyieldshighqualitytaggingoutputs.1IntroductionSupervisedsequencelabelingwithlargelabeledtrainingdatasetsisconsideredasolvedproblem.For∗∗TheresearchdescribedhereinwasconductedwhiletheauthorwasworkingatGoogle.instance,stateoftheartsystemsobtaintaggingac-curaciesover97%forpart-of-speech(POS)taggingontheEnglishPennTreebank.However,learningaccuratetaggerswithoutlabeleddataremainsachal-lenge.Theaccuraciesquicklydropwhenfacedwithdatafromadifferentdomain,langue,orwhenthereisverylittlelabeledinformationavailablefortraining(BankoandMoore,2004).Recently,therehasbeenanincreasingamountofresearchtacklingthisproblemusingunsuper-visedmethods.ApopularapproachistolearnfromPOS-tagdictionaries(Merialdo,1994),wherewearegivenarawwordsequenceandadictionaryoflegaltagsforeachwordtype.LearningfromPOS-tagdictionariesisstillchallenging.Completeword-tagdictionariesmaynotalwaysbeavailableforuseandineverysetting.Whentheyareavailable,thedictionariesareoftennoisy,resultinginhightag-gingambiguity.Furthermore,whenapplyingtag-gersinnewdomainsordifferentdatasets,wemayencounternewwordsthataremissingfromthedic-tionary.TherehavebeensomeeffortstolearnPOStaggersfromincompletedictionariesbyextendingthedictionarytoincludethesewordsusingsomeheuristics(ToutanovaandJohnson,2008)orusingothermethodssuchastype-supervision(GarretteandBaldridge,2012).Inthiswork,wetackletheproblemofunsuper-visedsequencelabelingusingtagdictionaries.ThefirstreportedworkonthisproblemwasonPOStag-gingfromMerialdo(1994).TheapproachinvolvedtrainingastandardHiddenMarkovModel(HMM)usingtheExpectationMaximization(EM)algo-rithm(Dempsteretal.,1977),thoughEMdoesnot l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 93–104. Action Editor: Stefan Riezler.
Transactions of the Association for Computational Linguistics, 2 (2014) 93–104. Action Editor: Stefan Riezler. Submitted 12/2013; Published 2/2014. c (cid:13) 2014 Association for Computational Linguistics. ExploringtheRoleofStressinBayesianWordSegmentationusingAdaptorGrammarsBenjaminB¨orschinger1,2MarkJohnson1,31DepartmentofComputing,MacquarieUniversity,Sydney,Australia2DepartmentofComputationalLinguistics,HeidelbergUniversity,Heidelberg,Germany3SantaFeInstitute,SantaFe,Etats-Unis{benjamin.borschinger|mark.johnson}@mq.edu.auAbstractStresshaslongbeenestablishedasamajorcueinwordsegmentationforEnglishinfants.Weshowthatenablingacurrentstate-of-the-artBayesianwordsegmentationmodeltotakead-vantageofstresscuesnoticeablyimprovesitsperformance.Wefindthattheimprovementsrangefrom10to4%,dependingonboththeuseofphonotacticcuesand,toalesserex-tent,theamountofevidenceavailabletothelearner.Wealsofindthatinparticularearlyon,stresscuesaremuchmoreusefulforourmodelthanphonotacticcuesbythemselves,consistentwiththefindingthatchildrendoseemtousestresscuesbeforetheyusephono-tacticcues.Finally,westudyhowthemodel’sknowledgeaboutstresspatternsevolvesovertime.Wenotonlyfindthatourmodelcor-rectlyacquiresthemostfrequentpatternsrel-ativelyquicklybutalsothattheUniqueStressConstraintthatisattheheartofapreviouslyproposedmodeldoesnotneedtobebuiltinbutcanbeacquiredjointlywithwordsegmen-tation.1IntroductionAmongthefirsttasksachildlanguagelearnerhastosolveispickingoutwordsfromthefluentspeechthatconstitutesitslinguisticinput.1ForEnglish,stresshaslongbeenclaimedtobeausefulcueininfantwordsegmentation(Jusczyketal.,1993;Jusczyketal.,1999b),followingthedemonstra-1Thedatasetsandsoftwaretoreplicateourexperimentsareavailablefromhttp://web.science.mq.edu.au/˜bborschi/tionofitseffectivenessinadultspeechprocess-ing(Cutleretal.,1986).Severalstudieshaveinvestigatedtheroleofstressinwordsegmenta-tionusingcomputationalmodels,usingbothneu-ralnetworkand“algebraic”(asopposedto“statis-tical”)approaches(Christiansenetal.,1998;Lequel,2004;LignosandYang,2010;Lignos,2011;Lig-nos,2012).Bayesianmodelsofwordsegmenta-tion(Brent,1999;Goldwater,2007),cependant,haveuntilrecentlycompletelyignoredstress.ThesoleexceptioninthisrespectisDoyleandLevy(2013)whoaddedstresscuestotheBigrammodel(Gold-wateretal.,2009),demonstratingthatthisleadstoanimprovementinsegmentationperformance.Inthispaper,weextendtheirworkandshowhowtointegratestresscuesintotheflexibleAdaptorGram-marframework(Johnsonetal.,2007).Thisallowsustobothstartfromastrongerbaselinemodelandtoinvestigatehowtheroleofstresscuesinteractswithotheraspectsofthemodel.Inparticular,wefindthatphonotacticcuestoword-boundariesinter-actwithstresscues,indicatingsynergisticeffectsforsmallinputsandpartialredundancyforlargerin-puts.Overall,wefindthatstresscuesaddroughly6%tokenf-scoretoamodelthatdoesnotaccountforphonotacticsand4%toamodelthatalreadyin-corporatesphonotactics.Relatedlyandinlinewiththefindingthatstresscuesareusedbyinfantsbe-forephonotacticcues(Jusczyketal.,1999a),weob-servethatphonotacticcuesrequiremoreinputthanstresscuestobeusedefficiently.AcloserlookattheknowledgeacquiredbyourmodelsshowsthattheUniqueStressConstraintofYang(2004)canbeacquiredjointlywithsegmentingtheinputinstead l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t a c
Transactions of the Association for Computational Linguistics, 2 (2014) 79–92. Action Editor: Mirella Lapata.
Transactions of the Association for Computational Linguistics, 2 (2014) 79–92. Action Editor: Mirella Lapata. Submitted 12/2013; Published 2/2014. c (cid:13) 2014 Association for Computational Linguistics. TheLanguageDemographicsofAmazonMechanicalTurkElliePavlick1MattPost2AnnIrvine2DmitryKachaev2ChrisCallison-Burch1,21ComputerandInformationScienceDepartment,UniversityofPennsylvania2HumanLanguageTechnologyCenterofExcellence,JohnsHopkinsUniversityAbstractWepresentalargescalestudyofthelanguagesspokenbybilingualworkersonMechanicalTurk(MTurk).Weestablishamethodologyfordeterminingthelanguageskillsofanony-mouscrowdworkersthatismorerobustthansimplesurveying.Wevalidateworkers’self-reportedlanguageskillclaimsbymeasuringtheirabilitytocorrectlytranslatewords,andbygeolocatingworkerstoseeiftheyresideincountrieswherethelanguagesarelikelytobespoken.Ratherthanpostingaone-offsurvey,wepostedpaidtasksconsistingof1,000as-signmentstotranslateatotalof10,000wordsineachof100languages.Ourstudyranforseveralmonths,andwashighlyvisibleontheMTurkcrowdsourcingplatform,increas-ingthechancesthatbilingualworkerswouldcompleteit.Ourstudywasusefulbothtocre-atebilingualdictionariesandtoactascen-susofthebilingualspeakersonMTurk.Weusethisdatatorecommendlanguageswiththelargestspeakerpopulationsasgoodcandidatesforotherresearcherswhowanttodevelopcrowdsourced,multilingualtechnologies.Tofurtherdemonstratethevalueofcreatingdataviacrowdsourcing,wehireworkerstocreatebilingualparallelcorporainsixIndianlan-guages,andusethemtotrainstatisticalma-chinetranslationsystems.1OverviewCrowdsourcingisapromisingnewmechanismforcollectingdatafornaturallanguageprocessingre-search.Accesstoafast,cheap,andflexiblework-forceallowsustocollectnewtypesofdata,poten-tiallyenablingnewlanguagetechnologies.BecausecrowdsourcingplatformslikeAmazonMechanicalTurk(MTurk)giveresearchersaccesstoaworld-wideworkforce,oneobviousapplicationofcrowd-sourcingisthecreationofmultilingualtechnologies.WithanincreasingnumberofactivecrowdworkerslocatedoutsideoftheUnitedStates,thereiseventhepotentialtoreachfluentspeakersoflowerresourcelanguages.Inthispaper,weinvestigatethefeasi-bilityofhiringlanguageinformantsonMTurkbyconductingthefirstlarge-scaledemographicstudyofthelanguagesspokenbyworkersontheplatform.Thereareseveralcomplicatingfactorswhentry-ingtotakeacensusofworkersonMTurk.Theworkers’identitiesareanonymized,andAmazonprovidesnoinformationabouttheircountriesofori-ginortheirlanguageabilities.Postingasimplesur-veytohaveworkersreportthisinformationmaybeinadequate,depuis(un)manyworkersmayneverseethesurvey,(b)manyoptnottodoone-offsurveyssincepotentialpaymentislow,et(c)validatingtheanswersofrespondentsisnotstraightforward.Ourstudyestablishesamethodologyfordeter-miningthelanguagedemographicsofanonymouscrowdworkersthatismorerobustthansimplesur-veying.Weaskworkerswhatlanguagestheyspeakandwhatcountrytheylivein,andvalidatetheirclaimsbymeasuringtheirabilitytocorrectlytrans-latewordsandbyrecordingtheirgeolocation.Toincreasethevisibilityandthedesirabilityofourtasks,wepost1,000assignmentsineachof100lan-guages.Thesetaskseachconsistoftranslating10foreignwordsintoEnglish.Twoofthe10wordshaveknowntranslations,allowingustovalidatethattheworkers’translationsareaccurate.Weconstructbilingualdictionarieswithupto10,000entries,withthemajorityofentriesbeingnew.Surveyingthousandsofworkersallowsustoana-lyzecurrentspeakerpopulationsfor100languages. l D o w n o a d e d f r o m h t t p : / / direct . m l l i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 55–66. Action Editor: Lillian Lee.
Transactions of the Association for Computational Linguistics, 2 (2014) 55–66. Action Editor: Lillian Lee. Submitted 9/2013; Revised 12/2013; Published 2/2014. c 2014 Association for Computational Linguistics. (cid:13) Cross-lingualProjectedExpectationRegularizationforWeaklySupervisedLearningMengqiuWangandChristopherD.ManningComputerScienceDepartmentStanfordUniversityStanford,CA94305USA{mengqiu,manning}@cs.stanford.eduAbstractWeconsideramultilingualweaklysupervisedlearningscenariowhereknowledgefroman-notatedcorporainaresource-richlanguageistransferredviabitexttoguidethelearninginotherlanguages.Pastapproachesprojectlabelsacrossbitextandusethemasfeaturesorgoldlabelsfortraining.Weproposeanewmethodthatprojectsmodelexpectationsratherthanlabels,whichfacilitiestransferofmodeluncertaintyacrosslanguagebound-aries.WeencodeexpectationsasconstraintsandtrainadiscriminativeCRFmodelusingGeneralizedExpectationCriteria(MannandMcCallum,2010).EvaluatedonstandardChinese-EnglishandGerman-EnglishNERdatasets,ourmethoddemonstratesF1scoresof64%and60%whennolabeleddataisused.Attainingthesameaccuracywithsu-pervisedCRFsrequires12kand1.5klabeledsentences.Furthermore,whencombinedwithlabeledexamples,ourmethodyieldssignifi-cantimprovementsoverstate-of-the-artsuper-visedmethods,achievingbestreportednum-berstodateonChineseOntoNotesandGer-manCoNLL-03datasets.1IntroductionSupervisedstatisticallearningmethodshaveen-joyedgreatpopularityinNaturalLanguageProcess-ing(NLP)overthepastdecade.Thesuccessofsu-pervisedmethodsdependsheavilyupontheavail-abilityoflargeamountsofannotatedtrainingdata.Manualcurationofannotatedcorporaisacostlyandtimeconsumingprocess.Todate,mostannotatedre-sourcesresideswithintheEnglishlanguage,whichhinderstheadoptionofsupervisedlearningmethodsinmanymultilingualenvironments.Tominimizetheneedforannotation,significantprogresshasbeenmadeindevelopingunsupervisedandsemi-supervisedapproachestoNLP(CollinsandSinger1999;Klein2005;Liang2005;Smith2006;Goldberg2010;interalia).Morerecentparadigmsforsemi-supervisedlearningallowmod-elerstodirectlyencodeknowledgeaboutthetaskandthedomainasconstraintstoguidelearning(Changetal.,2007;MannandMcCallum,2010;Ganchevetal.,2010).Cependant,inamultilingualsetting,comingupwitheffectiveconstraintsrequireextensiveknowledgeoftheforeign1language.Bilingualparalleltext(bitext)lendsitselfasamediumtotransferknowledgefromaresource-richlanguagetoaforeignlanguages.YarowskyandNgai(2001)projectlabelsproducedbyanEnglishtag-gertotheforeignsideofbitext,thenusethepro-jectedlabelstolearnaHMMmodel.Morerecentworkappliedtheprojection-basedapproachtomorelanguage-pairs,andfurtherimprovedperformancethroughtheuseoftype-levelconstraintsfromtagdictionaryandfeature-richgenerativeordiscrimina-tivemodels(DasandPetrov,2011;T¨ackstr¨ometal.,2013).Inourwork,weproposeanewprojection-basedmethodthatdiffersintwoimportantways.First,weneverexplicitlyprojectthelabels.Instead,weprojectexpectationsoverthelabels.Thisprojection1Forexperimentalpurposes,wedesignateEnglishastheresource-richlanguage,andotherlanguagesofinterestas“for-eign”.Inourexperiments,wesimulatetheresource-poorsce-nariousingChineseandGerman,eventhoughinrealitythesetwolanguagesarequiterichinresources. l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 27–40. Action Editor: Kristina Toutanova.
Transactions of the Association for Computational Linguistics, 2 (2014) 27–40. Action Editor: Kristina Toutanova. Submitted 1/2013; Revised 7/2013; Published 2/2014. c (cid:13) 2014 Association for Computational Linguistics. AutomaticDetectionandLanguageIdentificationofMultilingualDocumentsMarcoLui♥♣,JeyHanLau♠andTimothyBaldwin♥♣♥DepartmentofComputingandInformationSystemsTheUniversityofMelbourne♣NICTAVictoriaResearchLaboratory♠DepartmentofPhilosophyKing’sCollegeLondonmhlui@unimelb.edu.au,jeyhan.lau@gmail.com,tb@ldwin.netAbstractLanguageidentificationisthetaskofautomat-icallydetectingthelanguage(s)presentinadocumentbasedonthecontentofthedocu-ment.Inthiswork,weaddresstheproblemofdetectingdocumentsthatcontaintextfrommorethanonelanguage(multilingualdocu-ments).Weintroduceamethodthatisabletodetectthatadocumentismultilingual,iden-tifythelanguagespresent,andestimatetheirrelativeproportions.Wedemonstratetheef-fectivenessofourmethodoversyntheticdata,aswellasreal-worldmultilingualdocumentscollectedfromtheweb.1IntroductionLanguageidentificationisthetaskofautomaticallydetectingthelanguage(s)presentinadocumentbasedonthecontentofthedocument.Languageidentificationtechniquescommonlyassumethatev-erydocumentiswritteninoneofaclosedsetofknownlanguagesforwhichthereistrainingdata,andisthusformulatedasthetaskofselectingthemostlikelylanguagefromthesetoftraininglan-guages.Inthiswork,weremovethismonolingualassumption,andaddresstheproblemoflanguageidentificationindocumentsthatmaycontaintextfrommorethanonelanguagefromthecandidateset.Weproposeamethodthatconcurrentlydetectsthatadocumentismultilingual,andestimatesthepropor-tionofthedocumentthatiswrittenineachlanguage.Detectingmultilingualdocumentshasavarietyofapplications.Mostnaturallanguageprocessingtechniquespresupposemonolingualinputdata,soinclusionofdatainforeignlanguagesintroducesnoise,andcandegradetheperformanceofNLPsys-tems(Alexetal.,2007;CookandLui,2012).Au-tomaticdetectionofmultilingualdocumentscanbeusedasapre-filteringsteptoimprovethequalityofinputdata.Detectingmultilingualdocumentsisalsoimportantforacquiringlinguisticdatafromtheweb(Scannell,2007;AbneyandBird,2010),andhasapplicationsinminingbilingualtextsforstatisticalmachinetranslationfromonlineresources(Resnik,1999;Nieetal.,1999;Lingetal.,2013).Therehasbeenparticularinterestinextractingtextresourcesforlow-densitylanguagesfrommultilingualwebpagescontainingboththelow-densitylanguageandanotherlanguagesuchasEnglish(YamaguchiandTanaka-Ishii,2012;KingandAbney,2013).KingandAbney(2013,p1118)specificallymentiontheneedforanautomaticmethod“toexamineamul-tilingualdocument,andwithhighaccuracy,listthelanguagesthatarepresentinthedocument”.Weintroduceamethodthatisabletodetectmulti-lingualdocuments,andsimultaneouslyidentifyeachlanguagepresentaswellasestimatethepropor-tionofthedocumentwritteninthatlanguage.Weachievethiswithaprobabilisticmixturemodel,us-ingadocumentrepresentationdevelopedformono-linguallanguageidentification(LuiandBaldwin,2011).Themodelpositsthateachdocumentisgen-eratedassamplesfromanunknownmixtureoflan-guagesfromthetrainingset.WeintroduceaGibbssamplertomapsamplestolanguagesforanygivensetoflanguages,andusethistoselectthesetoflan-guagesthatmaximizestheposteriorprobabilityofthedocument. l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 15–26. Action Editor: Sharon Goldwater.
Transactions of the Association for Computational Linguistics, 2 (2014) 15–26. Action Editor: Sharon Goldwater. Submitted 9/2013; Revised 11/2013; Published 2/2014. c (cid:13) 2014 Association for Computational Linguistics. FLORS:FastandSimpleDomainAdaptationforPart-of-SpeechTaggingTobiasSchnabelDepartmentofComputerScienceCornellUniversitytbs49@cornell.eduHinrichSchützeCenterforInformation&LanguageProcessingUniversityofMunichinquiries@cislmu.orgAbstractWepresentFLORS,anewpart-of-speechtag-gerfordomainadaptation.FLORSusesro-bustrepresentationsthatworkespeciallywellforunknownwordsandforknownwordswithunseentags.FLORSissimplerandfasterthanpreviousdomainadaptationmethods,yetithassignificantlybetteraccuracythanseveralbaselines.1IntroductionInthispaperwedescribeFLORS,apart-of-speech(POS)taggerthatisFastintrainingandtagging,usesLOcalcontextonly(asopposedtofindingtheoptimaltagsequencefortheentiresentence),per-formsRobustlyontargetdomains(TDs)inunsu-perviseddomainadaptation(DA)andisSimpleinarchitectureandfeaturerepresentation.FLORSconstructsarobustrepresentationofthelocalcontextofthewordvthatistobetagged.Thisrepresentationconsistsofdistributionalfea-tures,suffixesandwordshapesofvanditslocalneighbors.Weshowthatithastwoadvantages.First,sincethemainpredictorsusedbyFLORSaredistributionalfeatures(nottheword’sidentity),FLORSpredictsunseentagsofknownwordsbet-terthanpriorworkonDAforPOS.Second,sinceFLORSusesrepresentationscomputedfromunla-beledtext,representationsofunknownwordsareinprincipleofthesametypeasrepresentationsofknownwords;thispropertyofFLORSresultsinbetterperformanceonunknownwordscomparedtopriorwork.ThesetwoadvantagesareespeciallybeneficialforTDsthatcontainhighratesofunseentagsofknownwordsandhighratesofunknownwords.WeshowthatFLORSachievesexcellentDAtaggingresultsonthefivedomainsoftheSANCL2012sharedtask(PetrovandMcDonald,2012)andoutperformsthreestate-of-the-arttaggersonBlitzeretal.’s(2006)biomedicaldata.FLORSisalsosimplerandfasterthanotherPOSDAmethods.Itissimpleinthattheinputrepre-sentationconsistsofthreesimpletypesoffeatures:distributionalcountfeaturesandtwotypesofbinaryfeatures,suffixandshapefeatures.Manyotherwordrepresentationsthatareusedforimprovinggeneral-ization(par exemple.,(Brownetal.,1992;Collobertetal.,2011))arecostlytotrainorhavedifficultyhan-dlingunknownwords.Ourrepresentationsarefasttobuildandcanbecreatedon-the-flyforunknownwordsthatoccurduringtesting.Thelearningarchitectureissimpleandfastaswell.Wetrainkbinaryone-vs-allclassifiersthatuselocalcontextonlyandnosequenceinforma-tion(wherekisthenumberoftags).Ainsi,tag-gingcomplexityisO(k).Manyotherlearningse-tupsforDAaremorecomplex;e.g.,theylearnrep-resentations(asopposedtojustcounting),theylearnseveralclassifiersfordifferentsubclassesofwords(e.g.,knownvs.unknown)ortheycombineleft-to-rightandright-to-lefttaggings.Thenexttwosectionsdescribeexperimentaldata,setupandresults.ResultsarediscussedinSection4.WecompareFLORStoalternativewordrepresenta-tionsinSection5andtorelatedworkinSection6.Section7presentsourconclusions.2ExperimentaldataandsetupData.OursourcedomainisthePennTreebank(Marcusetal.,1993)ofWallStreetJournal(WSJ) l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, 2 (2014) 1–14. Action Editor: Lillian Lee.
Transactions of the Association for Computational Linguistics, 2 (2014) 1–14. Action Editor: Lillian Lee. Submitted 3/2013; Revised 6/2013; Published 2/2014. c 2014 Association for Computational Linguistics. (cid:13) HeterogeneousNetworksandTheirApplications:Scientometrics,NameDisambiguation,andTopicModelingBenKing,RahulJhaDepartmentofEECSUniversityofMichiganAnnArbor,MI{benking,rahuljha}@umich.eduDragomirR.RadevDepartmentofEECSSchoolofInformationUniversityofMichiganAnnArbor,MIradev@umich.eduAbstractWepresentheterogeneousnetworksasawaytounifylexicalnetworkswithrelationaldata.WebuildaunifiedACLAnthologynetwork,tyingtogetherthecitation,authorcollaboration,andterm-cooccurencenetworkswithaffiliationandvenuerelations.Thisrepresentationprovestobeconvenientandallowsproblemssuchasnamedisambiguation,topicmodeling,andthemea-surementofscientificimpacttobeeasilysolvedusingonlythisnetworkandoff-the-shelfgraphalgorithms.1IntroductionGraph-basedmethodshavebeenusedtogreatef-fectinNLP,onproblemssuchaswordsensedisam-biguation(Mihalcea,2005),summarization(ErkanandRadev,2004),anddependencyparsing(McDon-aldetal.,2005).Mostpreviousstudiesofnetworksconsidernetworkswithonlyasingletypeofnode,andinsomecasesusinganetworkwithasingletypeofnodecanbeanoversimplifiedviewifitignoresothertypesofrelationships.Inthispaperwewilldemonstrateheterogeneousnetworks,networkswithmultipledifferenttypesofnodesandedges,alongwithseveralapplicationsofthem.Theapplicationsinthispaperarenotpre-sentedsomuchasrobustattemptstoout-performthecurrentstate-of-the-art,butratherattemptsatbeingcompetitiveagainsttopmethodswithlittleeffortbe-yondtheconstructionoftheheterogeneousnetwork.Throughoutthispaper,wewillusethedatafromtheACLAnthologyNetwork(AAN)(Birdetal.,2008;Radevetal.,2013),whichcontainsadditionalmetadatarelationshipsnotfoundintheACLAnthol-ogy,asatypicalheterogeneousnetwork.Theresultsinthispapershouldbegenerallyapplicabletootherheterogeneousnetworks.1.1HeterogeneousAANschemaWebuildaheterogeneousgraphG(V,E)fromAAN,whereVisthesetofverticesandEisthesetofedgesconnectingvertices.Avertexcanbeoneoffivesemantictypes:{papier,author,venue,institution,term}.Anedgecanalsobeoneoffivetypes,eachconnectingdifferenttypesofvertices:•author—[writes]—paper•paper—[cites]—paper•paper—[publishedin]—venue1•author—[affiliatedwith]—institution2•paper—[contains]—termAllofthisdata,exceptfortheterms,isavailableforallpapersinthe2009releaseofAAN.TermsareextractedfromtitlesbyrunningTextRank(Mihal-ceaandTarau,2004)onNP-chunksfromtitlesandmanuallyfilteringoutbadterms.Weshowtheusefulnessofthisrepresentationinseveralapplications:themeasurementofscien-tificimpact(Section2),namedisambiguation(Sec-tion3),andtopicmodeling(Section4).Thehetero-geneousnetworkrepresentationprovidesasimpleframeworkforcombininglexicalnetworks(likethetermco-occurencenetwork)withmetadatarelationsfromasourcelikeAANandallowsustobegintodevelopNLP-awaremethodsforproblemslikesci-entometricsandnamedisambiguation,whicharenotusuallyframedinanNLPperspective.1ForajointmeetingofvenuesAandBpublishingapaperx,twoedges(X,UN)et(X,B)arecreated.2Author-affiliationedgesareweightedaccordingtothenumberofpapersanauthorhaspublishedfromaninstitution. l D o w n o a d e d f r o m h t t p : / / direct . m i t . e d u / t
Transactions of the Association for Computational Linguistics, vol. 3, pp. 585–597, 2015. Action Editor: Regina Barzilay.
Transactions of the Association for Computational Linguistics, vol. 3, pp. 585–597, 2015. Action Editor: Regina Barzilay. Submission batch: 7/2015; Revision batch: 10/2015; Published 12/2015. 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence. c (cid:13) ParsingAlgebraicWordProblemsintoEquationsRikKoncel-Kedziorski,HannanehHajishirzi,AshishSabharwal†,OrenEtzioni†,andSienaDumasAngUniversityofWashington,†AllenInstituteforAI{kedzior,hannaneh,sienaang}@uw.edu,{ashishs,orene}@allenai.orgAbstractThispaperformalizestheproblemofsolv-ingmulti-sentencealgebraicwordproblemsasthatofgeneratingandscoringequationtrees.Weuseintegerlinearprogrammingtogener-ateequationtreesandscoretheirlikelihoodbylearninglocalandglobaldiscriminativemod-els.Thesemodelsaretrainedonasmallsetofwordproblemsandtheiranswers,withoutanymanualannotation,inordertochoosetheequationthatbestmatchestheproblemtext.WerefertotheoverallsystemasALGES.WecompareALGESwithpreviousworkandshowthatitcoversthefullgamutofarithmeticoperationswhereasHosseinietal.(2014)onlyhandleadditionandsubtraction.Inaddition,ALGESovercomesthebrittlenessoftheKush-manetal.(2014)approachonsingle-equationproblems,yieldinga15%to50%reductioninerror.1IntroductionGrade-schoolalgebrawordproblemsarebriefnar-ratives(seeFigure1).Atypicalproblemfirstde-scribesapartialworldstateconsistingofcharacters,entities,andquantities.Nextitupdatestheconditionofanentityorexplicatestherelationshipbetweenentities.Finally,itposesaquestionaboutaquantityinthenarrative.Anordinarychildhastolearntherequiredalge-bra,butwilleasilygraspthenarrativeutilizingex-tensiveworldknowledge,largevocabulary,word-sensedisambiguation,coreferenceresolution,mas-teryofsyntax,andtheabilitytocombineindividualOceansideBikeRentalShopcharges17dollarsplus7dollarsanhourforrentingabike.Tompaid80dollarstorentabike.Howmanyhoursdidhepaytohavethebikecheckedout?=+$17$∗$7$xh80$solution:917+(7∗x)=80Figure1:Exampleproblemandsolutionsentencesintoacoherentmentalmodel.Incontrast,thechallengeforanNLPsystemisto“makesense”ofthenarrative,whichmayrefertoarbitraryactiv-itieslikerentingbikes,collectingcoins,oreatingcookies.Previousworkcopedwiththeopen-domainas-pectofalgebraicwordproblemsbyrelyingondeter-ministicstatetransitionsbasedonverbcategoriza-tion(Hosseinietal.,2014)orbylearningtemplatesthatcoverequationsofparticularforms(Kushmanetal.,2014).Wehavediscovered,cependant,thatbothapproachesarebrittle,particularlyastrainingdataisscarceinthisdomain,andthespaceofequationsgrowsexponentiallywiththenumberofquantitiesmentionedinthemathproblem.WeintroduceALGES,1whichmapsanunseenmulti-sentencealgebraicwordproblemintoasetofpossibleequationtrees.Figure1showsanequationtreealongsidethewordproblemitrepresents.ALGESgeneratesthespaceoftreesviaIntegerLinearProgramming(ILP),whichallowsittocon-1Thecodeanddataispubliclyavailableathttps://gitlab.cs.washington.edu/ALGES/TACL2015. l D o w n o a d e d f r o m h t t p : / / direct
Transactions of the Association for Computational Linguistics, vol. 3, pp. 271–282, 2015. Action Editor: Hal Daum´e III.
Transactions of the Association for Computational Linguistics, vol. 3, pp. 271–282, 2015. Action Editor: Hal Daum´e III. Submission batch: 3/2015; Published 5/2015. 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 Licence. c (cid:13) DomainAdaptationforSyntacticandSemanticDependencyParsingUsingDeepBeliefNetworksHaitongYang,TaoZhuangandChengqingZongNationalLaboratoryofPatternRecognitionInstituteofAutomation,ChineseAcademyofSciences,Beijing,100190,Chine{htyang,tao.zhuang,cqzong}@nlpr.ia.ac.cnAbstractIncurrentsystemsforsyntacticandseman-ticdependencyparsing,peopleusuallyde-fineaveryhigh-dimensionalfeaturespacetoachievegoodperformance.Butthesesystemsoftensuffersevereperformancedropsonout-of-domaintestdataduetothediversityoffea-turesofdifferentdomains.Thispaperfo-cusesonhowtorelievethisdomainadapta-tionproblemwiththehelpofunlabeledtar-getdomaindata.Weproposeadeeplearningmethodtoadaptbothsyntacticandsemanticparsers.Withadditionalunlabeledtargetdo-maindata,ourmethodcanlearnalatentfea-turerepresentation(LFR)thatisbeneficialtobothdomains.ExperimentsonEnglishdataintheCoNLL2009sharedtaskshowthatourmethodlargelyreducedtheperformancedroponout-of-domaintestdata.Moreover,wegetaMacroF1scorethatis2.32pointshigherthanthebestsystemintheCoNLL2009sharedtaskinout-of-domaintests.1IntroductionBothsyntacticandsemanticdependencyparsingarethestandardtasksintheNLPcommunity.Thestate-of-the-artmodelperformswellifthetestdatacomesfromthedomainofthetrainingdata.Butifthetestdatacomesfromadifferentdomain,theperfor-mancedropsseverely.TheresultsofthesharedtasksofCoNLL2008and2009(Surdeanuetal.,2008;Hajiˇcetal.,2009)alsosubstantiatestheargument.Torelievethedomainadaptation,inthispaper,weproposeadeeplearningmethodforbothsyntacticandsemanticparsers.Wefocusonthesituationthat,besidessourcedomaintrainingdataandtargetdo-maintestdata,wealsohavesomeunlabeledtargetdomaindata.Manysyntacticandsemanticparsersaredevel-opedusingasupervisedlearningparadigm,whereeachdatasampleisrepresentedasavectoroffea-tures,usuallyahigh-dimensionalfeature.Theper-formancedegradationontargetdomaintestdataismainlycausedbythediversityoffeaturesofdiffer-entdomains,i.e.,manyfeaturesintargetdomaintestdataareneverseeninsourcedomaintrainingdata.Previousworkhaveshownthatusingwordclus-terstoreplacethesparselexicalizedfeatures(Kooetal.,2008;Turianetal.,2010),helpsrelievetheperformancedegradationonthetargetdomain.Butforsyntacticandsemanticparsing,peoplealsousealotofsyntacticfeatures,i.e.,featuresextractedfromsyntactictrees.Forexample,therelationpathbe-tweenapredicateandanargumentisasyntacticfea-tureusedinsemanticdependencyparsing(Johans-sonandNugues,2008).Figure1showsanexam-pleofthisrelationpathfeature.Obviously,syntac-ticfeatureslikethisarealsoverysparseandusu-allyspecifictoeachdomain.Themethodofclus-teringfailsingeneralizingthesekindsoffeatures.Ourmethod,cependant,isverydifferentfromclus-teringspecificfeaturesandsubstitutingthesefea-turesusingtheirclusters.Instead,weattackthedo-mainadaptionproblembylearningalatentfeaturerepresentation(LFR)fordifferentdomains,whichissimilartoTitov(2011).Officiellement,weproposeaDeepBeliefNetwork(DBN)modeltorepresentadatasampleusingavectoroflatentfeatures.ThislatentfeaturevectorisinferredbyourDBNmodel l D o w n o a d e d f r o m h t t p : / / direct . m
Transactions of the Association for Computational Linguistics, vol. 3, pp. 571–584, 2015. Action Editor: Chris Callison-Burch.
Transactions of the Association for Computational Linguistics, vol. 3, pp. 571–584, 2015. Action Editor: Chris Callison-Burch. Submission batch: 5/2015; Revision batch: 9/2015; Published 12/2015. 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence. c (cid:13) SemanticParsingofAmbiguousInputthroughParaphrasingandVerificationPhilipArthur,GrahamNeubig,SakrianiSakti,TomokiToda,SatoshiNakamuraGraduateSchoolofInformationScience,NaraInstituteofScienceandTechnology,Japan{philip.arthur.om0,neubig,ssakti,tomoki,s-nakamura}@is.naist.jpAbstractWeproposeanewmethodforsemanticpars-ingofambiguousandungrammaticalinput,suchassearchqueries.Wedosobybuild-ingonanexistingsemanticparsingframeworkthatusessynchronouscontextfreegrammars(SCFG)tojointlymodeltheinputsentenceandoutputmeaningrepresentation.Wegener-alizethisSCFGframeworktoallownotone,butmultipleoutputs.Usingthisformalism,weconstructagrammarthattakesanambigu-ousinputstringandjointlymapsitintobothameaningrepresentationandanaturallan-guageparaphrasethatislessambiguousthantheoriginalinput.Thisparaphrasecanbeusedtodisambiguatethemeaningrepresenta-tionviaverificationusingalanguagemodelthatcalculatestheprobabilityofeachpara-phrase.11IntroductionSemanticparsing(SP)istheproblemofparsingagivennaturallanguage(NL)sentenceintoameaningrepresentation(MR)conducivetofurtherprocessingbyapplications.OneofthemajorchallengesinSPstemsfromthefactthatNLisrifewithambiguities.Forexample,eventhesimplesentence“WherecanweeatasteakinKobe?”containssyntacticambi-guities(“eatinKobe”or“steakinKobe”?),quan-tifierscopeambiguities(dowealleatonesteak,oreacheatonesteak?),andwordsenseambigui-ties(isKobeacityinJapan;oranNBAbasketball1Toolstoreplicateourexperimentscanbefoundathttp://isw3.naist.jp/~philip-a/tacl2015/index.html.player?).Previousworksusingstatisticalmodelsalongwithformalismssuchascombinatorialcat-egorialgrammars,synchronouscontextfreegram-mars,anddependencybasedcompositionalseman-ticshaveshownnotablesuccessinresolvingtheseambiguities(ZettlemoyerandCollins,2005;WongandMooney,2007;Liangetal.,2011;Kwiatkowskietal.,2013).MuchpreviousworkonSPhasfocusedonthecaseofansweringnaturallanguagequeriestoadatabaseoffacts,wherethequeriesgenerallytaketheformoffullsentencessuchas“WhatistheheightofKobeBryant?”Whileansweringtheseques-tionsprovidesanexcellentfirststeptonaturallan-guageinformationaccess,inmanycasestheinputisnotafullsentence,butsomethingmoreunderspec-ifiedandungrammatical.Forexample,thisisthecaseforkeyword-basedsearchqueries(Sajjadetal.,2012)orshortdialogueutterances(ZettlemoyerandCollins,2007).Specificallytakingtheexampleofsearchqueries,userstendtoomitsomeofthefunctionwordsandgrammaticalconstructsinthelanguagetomakeamoreconcisequery.ThefirstcolumnofTable1illustratesseveralsearchqueriesofthepattern“KobeX”whereXisanotherword.FromthesequeriesandtheirMRsincolumntwo,wecanseethatthereareseveralkindsofambiguity,includingnotonlythedistinctionbetweenKobeascityorabasketballplayerasinthepreviousexample,butalsomoreperniciousproblemsuniquetothemoreambiguousinput.Focusingonthequeries“Kobehotels”and“Kobeflight”wecanseethatitisalsonecessarytoestimatethelatentrelationshipbetween l D o w n o a d e d f r o m h t t p : / / direct
Transactions of the Association for Computational Linguistics, vol. 3, pp. 559–570, 2015. Action Editor: Joakim Nivre.
Transactions of the Association for Computational Linguistics, vol. 3, pp. 559–570, 2015. Action Editor: Joakim Nivre. Submission batch: 9/2015;Published 11/2015. c(cid:13)2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence. ParsingtoNoncrossingDependencyGraphsMarcoKuhlmannandPeterJonssonDepartmentofComputerandInformationScienceLinköpingUniversity,Swedenmarco.kuhlmann@liu.seandpeter.jonsson@liu.seAbstractWestudythegeneralizationofmaximumspan-ningtreedependencyparsingtomaximumacyclicsubgraphs.Becausetheunderlyingop-timizationproblemisintractableevenunderanarc-factoredmodel,weconsidertherestric-tiontononcrossingdependencygraphs.Ourmaincontributionisacubic-timeexactinfer-encealgorithmforthisclass.Weextendthisal-gorithmintoapracticalparserandevaluateitsperformanceonfourlinguisticdatasetsusedinsemanticdependencyparsing.Wealsoexploreageneralizationofourparsingframeworktodependencygraphswithpagenumberatmostkandshowthattheresultingoptimizationprob-lemisNP-hardfork(cid:21)2.1IntroductionDependencyparsersprovidelightweightrepresen-tationsforthesyntacticandthesemanticstructureofnaturallanguage.Syntacticdependencyparsing(Kübleretal.,2009)hasbeenanextremelyactiveresearchareaforthelastdecadeorso,resultinginaccurateandefficientparsersforawiderangeoflanguages.Semanticdependencyparsinghasonlyrecentlybeenaddressedintheliterature(Oepenetal.,2014;Oepenetal.,2015;Duetal.,2015a).Syntacticdependencyparsinghasbeenformal-izedasthesearchformaximumspanningtreesinweighteddigraphs(McDonaldetal.,2005b).Forsemanticdependencyparsing,wheretargetrepresen-tationsarenotnecessarilytree-shaped,itisnaturaltogeneralizethisviewtomaximumacyclicsubgraphs,withorwithouttheadditionalrequirementofweakconnectivity(Schluter,2014).Whileamaximumspanningtreeofaweighteddigraphcanbefoundinpolynomialtime(Tarjan,1977),computingamaximumacyclicsubgraphisintractable,andevengoodapproximatesolutionsarehardtofind(Guruswamietal.,2011).Inthispa-perwethereforeaddressmaximumacyclicsubgraphparsingundertherestrictionthatthesubgraphshouldbenoncrossing,whichinformallymeansthatitsarcscanbedrawnonthehalf-planeabovethesentenceinsuchawaythatnotwoarcscross(andwithoutchang-ingtheorderofthewords).Themaincontributionofthispaperisanalgorithmthatfindsamaximumnoncrossingacyclicsubgraphofa(vertex-ordered)weighteddigraphonnverticesintimeO.n3/.Aftergivingsomebackground(Section2)wein-troducethenoncrossingcondition,compareittootherstructuralconstraintsfromtheliterature,andstudyitsempiricalcoverage(Section3).Wethenpresentourparsingalgorithm(Section4).Toturnthisalgorithmintoapracticalparser,wecombineitwithanoff-the-shelffeaturemodelandonlinetrain-ing(Section5);thesourcecodeofoursystemisre-leasedwiththispaper.1Weevaluatetheperformanceofourparseronfourlinguisticdatasets:thoseusedintherecentSemEvaltaskonsemanticdependencyparsing(Oepenetal.,2015),andthedependencygraphsextractedfromCCGbank(HockenmaierandSteedman,2007).Enfin,weexplorethelimitsofourapproachbyshowingthatfindingthemaximumacyclicsubgraphunderanaturalgeneralizationofthenoncrossingcondition,pagenumberatmostk,isNP-hardfork(cid:21)2(Section6).Weconcludethepaperbydiscussingrelatedandfuturework(Section7).1https://github.com/liu-nlp/gamma l D o w n o a d e d f r o m h t t p : / / direct . m i t . e
Transactions of the Association for Computational Linguistics, vol. 3, pp. 545–558, 2015. Action Editor: Jason Eisner.
Transactions of the Association for Computational Linguistics, vol. 3, pp. 545–558, 2015. Action Editor: Jason Eisner. Submission batch: 5/2015; Revision batch: 10/2015; Published 11/2015. 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence. c (cid:13) ImitationLearningofAgenda-basedSemanticParsersJonathanBerantStanfordUniversityyonatan@cs.stanford.eduPercyLiangStanfordUniversitypliang@cs.stanford.eduAbstractSemanticparsersconventionallyconstructlogicalformsbottom-upinafixedorder,re-sultinginthegenerationofmanyextraneouspartiallogicalforms.Inthispaper,wecom-bineideasfromimitationlearningandagenda-basedparsingtotrainasemanticparserthatsearchespartiallogicalformsinamorestrate-gicorder.Empirically,ourparserreducesthenumberofconstructedpartiallogicalformsbyanorderofmagnitude,andobtainsa6x-9xspeedupoverfixed-orderparsing,whilemain-tainingcomparableaccuracy.1IntroductionSemanticparsing,thetaskofmappingnaturallanguagetosemanticrepresentations(e.g.,log-icalforms),hasemergedinrecentyearsasapromisingparadigmfordevelopingquestionan-sweringsystems(ZelleandMooney,1996;Zettle-moyerandCollins,2005;WongandMooney,2007;Kwiatkowskietal.,2010;Liangetal.,2011)andothernaturallanguageinterfaces(ZettlemoyerandCollins,2007;Tellexetal.,2011;Matuszeketal.,2012).Recently,therehavebeentwoma-jortrends:Thefirstistoscalesemanticparsingtolargeknowledgebases(KB)suchasFreebase(CaiandYates,2013;Kwiatkowskietal.,2013;BerantandLiang,2014).Thesecondistolearnsemanticparserswithoutrelyingonannotatedlogicalforms,butinsteadontheirdenotations(answers)(Clarkeetal.,2010;Liangetal.,2011);thislessenstheanno-tationburdenandhasbeeninstrumentalinfuelingthefirsttrend(Berantetal.,2013).whatcitywasabrahamlincolnbornin3915083622020>1MType.City(cid:117)PlaceOfBirthOf.AbeLincolnType.Loc(cid:117)ContainedBy.LincolnTown…AbeLincolnLincolnTownUSSLincoln…Type.CityType.Loc…PlaceOfBirthOfPlacesLived…Figure1:Aparsingchartfortheutterance“whatcitywasabrahamlincolnbornin”.Numbersinchartcellsindicatethenumberofpossiblesemanticparsesconstructedoverthatspan,andarrowspointtosomeofthelogicalformsthatwerecon-structed.Therearemorethanonemillionpossiblesemanticparsesforthisutterance.Inthispaper,weareinterestedintrainingseman-ticparsersfromdenotationsonlargeKBs.Thechal-lengeinthissettingisthatthevocabularyofthetargetlogicallanguageoftencontainsthousandsoflogicalpredicates,andthereisamismatchbetweenthestructureofthenaturallanguageandthelogicallanguage.Asaresult,thespaceofpossibleseman-ticparsesforevenashortutterancegrowsquickly.Forexample,considertheutterance“whatcitywasabrahamlincolnbornin”.Figure1illustratesthenumberofpossiblesemanticparsesthatcanbecon-structedoversomeoftheutterancespans.Justbycombiningsemanticparsesoverthespans“city”,“lincoln”and“born”wealreadyobtain362·391·20possibleparses;attheroot,wegetoveramillionparses.1Theambiguityoflanguagethusresultsina1Evenwhentypeconstraintsareusedtopruneparses,westillproducemorethanamillionpossibleparsesattheroot. l D o w n o a d e d f r o m h t t p : / / direct