Transactions of the Association for Computational Linguistics, vol. 4, pp. 47–60, 2016. Action Editor: David Chiang.
Submission batch: 11/2015; Published 2/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Detecting Cross-Cultural Differences Using a Multilingual Topic Model

E. D. Gutiérrez¹, Ekaterina Shutova², Patricia Lichtenstein³, Gerard de Melo⁴, Luca Gilardi⁵
¹University of California, San Diego; ²Computer Laboratory, University of Cambridge; ³University of California, Merced; ⁴IIIS, Tsinghua University; ⁵ICSI, Berkeley
edg@icsi.berkeley.edu, es407@cam.ac.uk, tricia1@uchicago.edu, gdm@demelo.org, lucag@icsi.berkeley.edu

Abstract

Understanding cross-cultural differences has important implications for world affairs and many aspects of the life of society. Yet, the majority of text-mining methods to date focus on the analysis of monolingual texts. In contrast, we present a statistical model that simultaneously learns a set of common topics from multilingual, non-parallel data and automatically discovers the differences in perspectives on these topics across linguistic communities. We perform a behavioural evaluation of a subset of the differences identified by our model in English and Spanish to investigate their psychological validity.

1 Introduction

Recent years have seen a growing interest in text-mining applications aimed at uncovering public opinions and social trends (Fader et al., 2007; Monroe et al., 2008; Gerrish and Blei, 2011; Pennacchiotti and Popescu, 2011). They rest on the assumption that the language we use is indicative of our underlying worldviews. Research in cognitive and sociolinguistics suggests that linguistic variation across communities systematically reflects differences in their cultural and moral models and goes beyond lexicon and grammar (Kövecses, 2004; Lakoff and Wehling, 2012). Cross-cultural differences manifest themselves in text in a multitude of ways, most prominently through the use of explicit opinion vocabulary with respect to a certain topic (e.g. “policies that benefit the poor”), idiomatic and metaphorical language (e.g. “the company is spinning its wheels”) and other types of figurative language, such as irony or sarcasm.

The connection between language, culture and reasoning remains one of the central research questions in psychology. Thibodeau and Boroditsky (2011) investigated how metaphors affect our decision-making. They presented two groups of human subjects with two different texts about crime. In the first text, crime was metaphorically portrayed as a virus and in the second as a beast. The two groups were then asked a set of questions on how to tackle crime in the city. As a result, while the first group tended to opt for preventive measures (e.g. stronger social policies), the second group converged on punishment- or restraint-oriented measures. According to Thibodeau and Boroditsky, their results demonstrate that metaphors have a profound influence on how we conceptualize and act with respect to societal issues. This suggests that in order to gain a full understanding of social trends across populations, one needs to identify subtle but systematic linguistic differences that stem from the groups' cultural backgrounds, expressed both literally and figuratively. Performing such an analysis by hand is labor-intensive and often impractical, particularly in a multilingual setting where expertise in all of the languages of interest may be rare.

With the rise of blogging and social media, NLP techniques have been successfully used for a number of tasks in political science, including automatically estimating the influence of particular politicians in the US senate (Fader et al., 2007), identifying lexical features that differentiate political rhetoric of opposing parties (Monroe et al., 2008), predicting voting patterns of politicians based on their use of language (Gerrish and Blei, 2011), and predicting political affiliation of Twitter users (Pennacchiotti and Popescu, 2011). Fang et al. (2012) addressed
Transactions of the Association for Computational Linguistics, vol. 4, pp. 31–45, 2016. Action Editor: Tim Baldwin.
Submission batch: 12/2015; Revision batch: 2/2016; Published 2/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

A Bayesian Model of Diachronic Meaning Change

Lea Frermann and Mirella Lapata
Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB
l.frermann@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

Word meanings change over time and an automated procedure for extracting this information from text would be useful for historical exploratory studies, information retrieval or question answering. We present a dynamic Bayesian model of diachronic meaning change, which infers temporal word representations as a set of senses and their prevalence. Unlike previous work, we explicitly model language change as a smooth, gradual process. We experimentally show that this modeling decision is beneficial: our model performs competitively on meaning change detection tasks whilst inducing discernible word senses and their development over time. Application of our model to the SemEval-2015 temporal classification benchmark datasets further reveals that it performs on par with highly optimized task-specific systems.

1 Introduction

Language is a dynamic system, constantly evolving and adapting to the needs of its users and their environment (Aitchison, 2001). Words in all languages naturally exhibit a range of senses whose distribution or prevalence varies according to the genre and register of the discourse as well as its historical context. As an example, consider the word cute which according to the Oxford English Dictionary (OED, Stevenson 2010) first appeared in the early 18th century and originally meant clever or keen-witted.¹ By the late 19th century cute was used in the same sense as cunning. Today it mostly refers to objects or people perceived as attractive, pretty or sweet. Another example is the word mouse which initially was only used in the rodent sense. The OED dates the computer pointing device sense of mouse to 1965. The latter sense has become particularly dominant in recent decades due to the ever-increasing use of computer technology.

Footnote 1: Throughout this paper we denote words in true type, their senses in italics, and sense-specific context words as {lists}.

The arrival of large-scale collections of historic texts (Davies, 2010) and online libraries such as the Internet Archive and Google Books has greatly facilitated computational investigations of language change. The ability to automatically detect how the meaning of words evolves over time is potentially of significant value to lexicographic and linguistic research but also to real world applications. Time-specific knowledge would presumably render word meaning representations more accurate, and benefit several downstream tasks where semantic information is crucial. Examples include information retrieval and question answering, where time-related information could increase the precision of query disambiguation and document retrieval (e.g., by returning documents with newly created senses or filtering out documents with obsolete senses).

In this paper we present a dynamic Bayesian model of diachronic meaning change. Word meaning is modeled as a set of senses, which are tracked over a sequence of contiguous time intervals. We infer temporal meaning representations, consisting of a word's senses (as a probability distribution over words) and their relative prevalence. Our model is thus able to detect that mouse had one sense until the mid-20th century (characterized by words such as {cheese, tail, rat}) and subsequently acquired a
Transactions of the Association for Computational Linguistics, vol. 4, pp. 17–30, 2016. Action Editor: Chris Callison-Burch.
Submission batch: 9/2015; revised 12/2015; revised 1/2016; Published 2/2016. © 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Learning to Understand Phrases by Embedding the Dictionary

Felix Hill, Computer Laboratory, University of Cambridge, felix.hill@cl.cam.ac.uk
Kyunghyun Cho*, Courant Institute of Mathematical Sciences and Centre for Data Science, New York University, kyunghyun.cho@nyu.edu
Anna Korhonen, Department of Theoretical and Applied Linguistics, University of Cambridge, alk23@cam.ac.uk
Yoshua Bengio, CIFAR Senior Fellow, Université de Montréal, yoshua.bengio@umontreal.ca

Abstract

Distributional models that learn rich semantic word representations are a success story of recent NLP research. However, developing models that learn useful representations of phrases and sentences has proved far harder. We propose using the definitions found in everyday dictionaries as a means of bridging this gap between lexical and phrasal semantics. Neural language embedding models can be effectively trained to map dictionary definitions (phrases) to (lexical) representations of the words defined by those definitions. We present two applications of these architectures: reverse dictionaries that return the name of a concept given a definition or description and general-knowledge crossword question answerers. On both tasks, neural language embedding models trained on definitions from a handful of freely-available lexical resources perform as well as or better than existing commercial systems that rely on significant task-specific engineering. The results highlight the effectiveness of both neural embedding architectures and definition-based training for developing models that understand phrases and sentences.

1 Introduction

Much recent research in computational semantics has focussed on learning representations of arbitrary-length phrases and sentences. This task is challenging partly because there is no obvious gold standard of phrasal representation that could be used in training and evaluation. Consequently, it is difficult to design approaches that could learn from such a gold standard, and also hard to evaluate or compare different models.

* Work mainly done at the University of Montreal.

In this work, we use dictionary definitions to address this issue. The composed meaning of the words in a dictionary definition (a tall, long-necked, spotted ruminant of Africa) should correspond to the meaning of the word they define (giraffe). This bridge between lexical and phrasal semantics is useful because high quality vector representations of single words can be used as a target when learning to combine the words into a coherent phrasal representation.

This approach still requires a model capable of learning to map between arbitrary-length phrases and fixed-length continuous-valued word vectors. For this purpose we experiment with two broad classes of neural language models (NLMs): Recurrent Neural Networks (RNNs), which naturally encode the order of input words, and simpler (feed-forward) bag-of-words (BOW) embedding models. Prior to training these NLMs, we learn target lexical representations by training the Word2Vec software (Mikolov et al., 2013) on billions of words of raw text.

We demonstrate the usefulness of our approach by building and releasing two applications. The first is a reverse dictionary or concept finder: a system that returns words based on user descriptions or definitions (Zock and Bilac, 2004). Reverse dictionaries are used by copywriters, novelists, translators and other professional writers to find words for notions or ideas that might be on the tip of their tongue.
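The reverse-dictionary idea described above (compose a definition into one vector, then return the word whose vector is closest) can be illustrated with a minimal sketch. It assumes an untrained bag-of-words composition (a plain sum of input vectors, with no learned projection) and cosine ranking; the 3-dimensional vectors and the names `compose_bow` and `reverse_lookup` are hypothetical toy values, not the authors' trained Word2Vec targets or model.

```python
import math

# Hypothetical toy "target" word vectors; in the approach described above,
# targets would come from Word2Vec trained on raw text.
TARGET_VECTORS = {
    "giraffe": [0.9, 0.1, 0.0],
    "zebra": [0.6, 0.6, 0.0],
    "tractor": [0.0, 0.1, 0.9],
}

# Hypothetical toy input embeddings for definition words.
INPUT_VECTORS = {
    "tall": [0.8, 0.0, 0.1],
    "long-necked": [1.0, 0.0, 0.0],
    "spotted": [0.5, 0.3, 0.0],
    "ruminant": [0.6, 0.4, 0.0],
    "africa": [0.5, 0.5, 0.0],
}


def compose_bow(definition):
    """Bag-of-words composition: sum the vectors of known definition words.

    Word order is ignored and unknown words are skipped. A trained BOW model
    would also apply a learned projection, which is omitted in this sketch.
    """
    dims = len(next(iter(INPUT_VECTORS.values())))
    total = [0.0] * dims
    for token in definition.lower().split():
        vec = INPUT_VECTORS.get(token)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reverse_lookup(definition, k=2):
    """Rank candidate headwords by cosine similarity to the composed definition."""
    query = compose_bow(definition)
    ranked = sorted(
        TARGET_VECTORS,
        key=lambda word: cosine(query, TARGET_VECTORS[word]),
        reverse=True,
    )
    return ranked[:k]
```

With these toy vectors, `reverse_lookup("a tall long-necked spotted ruminant of Africa")` ranks giraffe first. In the approach described in the text, the composition would instead be trained so that a definition lands near its headword's pre-trained vector.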
Transactions of the Association for Computational Linguistics, vol. 5, pp. 529–542, 2017. Action Editor: Diana McCarthy.
Submission batch: 7/2017; Published 12/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge

Ryan J. Gallagher¹,², Kyle Reing¹, David Kale¹, and Greg Ver Steeg¹
¹Information Sciences Institute, University of Southern California; ²Vermont Complex Systems Center, Computational Story Lab, University of Vermont
ryan.gallagher@uvm.edu, {reing,kale,gregv}@isi.edu

Abstract

While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.

1 Introduction

The majority of topic modeling approaches utilize probabilistic generative models, models which specify mechanisms for how documents are written in order to infer latent topics. These mechanisms may be explicitly stated, as in Latent Dirichlet Allocation (LDA) (Blei et al., 2003), or implicitly stated, as with matrix factorization techniques (Hofmann, 1999; Ding et al., 2008; Buntine and Jakulin, 2006). The core generative mechanisms of LDA, in particular, have inspired numerous generalizations that account for additional information, such as the authorship (Rosen-Zvi et al., 2004), document labels (McAuliffe and Blei, 2008), or hierarchical structure (Griffiths et al., 2004).

However, these generalizations come at the cost of increasingly elaborate and unwieldy generative assumptions. While these assumptions allow topic inference to be tractable in the face of additional metadata, they progressively constrain topics to a narrower view of what a topic can be. Such assumptions are undesirable in contexts where one wishes to minimize model complexity and learn topics without preexisting notions of how those topics originated.

For these reasons, we propose topic modeling by way of Correlation Explanation (CorEx),¹ an information-theoretic approach to learning latent topics over documents. Unlike LDA, CorEx does not assume a particular data generating model, and instead searches for topics that are “maximally informative” about a set of documents. By learning informative topics rather than generated topics, we avoid specifying the structure and nature of topics ahead of time.

In addition, the lightweight framework underlying CorEx is versatile and naturally extends to hierarchical and semi-supervised variants with no additional modeling assumptions. More specifically, we

Footnote 1: Open source, documented code for the CorEx topic model available at https://github.com/gregversteeg/corex_topic.
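CorEx is built on total correlation (also called multi-information), TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), which measures how far a set of variables is from being independent; topics are latent factors chosen to explain as much of that dependence among word variables as possible. The sketch below only computes empirical TC, in bits, over binary document-word indicators. It does not learn topics or use anchor words, and `entropy` and `total_correlation` are hypothetical helpers, not part of the released corex_topic package.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy (bits) of a list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), estimated from rows of samples.

    Each row is one document; each column is one word indicator (0/1).
    """
    columns = list(zip(*rows))
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(row) for row in rows])
    return marginal - joint


# Two words that always co-occur are maximally dependent (TC = 1 bit here);
# two words whose occurrences are independent give TC = 0.
correlated_docs = [(1, 1), (0, 0), (1, 1), (0, 0)]
independent_docs = [(0, 0), (0, 1), (1, 0), (1, 1)]
```

A topic that "explains" the first pair of words would be one whose value makes the two indicators conditionally independent, which is the intuition behind learning maximally informative topics.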
Transactions of the Association for Computational Linguistics, vol. 5, pp. 487–500, 2017. Action Editor: Chris Quirk.
Submission batch: 3/2017; Published 11/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation

Benjamin Marie, Atsushi Fujita
National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan
{bmarie,atsushi.fujita}@nict.go.jp

Abstract

We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English–French, both translation directions, in two domains and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis that showed the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system.

1 Introduction

In phrase-based statistical machine translation (SMT), translation models are estimated over a large amount of parallel data. In general, using more data leads to a better translation model. When no specific domain is targeted, general-domain¹ parallel data from various domains may be used to train a general-purpose SMT system. However, it is well-known that, in training a system to translate texts from a specific domain, using in-domain parallel data can lead to a significantly better translation quality (Carpuat et al., 2012). Indeed, when only general-domain parallel data are used, it is unlikely that the translation model can learn expressions and their translations specific to the targeted domain. Such expressions will then remain untranslated in the in-domain texts to translate.

Footnote 1: As in Axelrod et al. (2011), in this paper, we use the term general-domain instead of the commonly used out-of-domain because we assume that the parallel data may contain some in-domain sentence pairs.

So far, in-domain parallel data have been harnessed to cover domain-specific expressions and their translations in the translation model. However, even if we can assume the availability of a large quantity of general-domain parallel data, at least for resource-rich language pairs, finding in-domain parallel data specific to a particular domain remains challenging. In-domain parallel data may not exist for the targeted language pairs or may not be available at hand to train a good translation model.

In order to circumvent the lack of in-domain parallel data, this paper presents a new method to adapt an existing SMT system to a specific domain by inducing an in-domain phrase table, i.e., a set of phrase pairs associated with features for decoding, from in-domain monolingual data. As we review in Section 2, most of the existing methods for inducing phrase tables are not designed, and may not perform as expected, to induce a phrase table for a specific domain for which only limited resources are available. Instead of relying on a large quantity of parallel data or highly comparable corpora, our method induces an in-domain phrase table from unaligned in-domain monolingual data through a three-step pro
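The candidate-generation step described in the abstract (Cartesian product of the source and target phrase sets, then a filter that keeps plausible pairs) can be sketched as follows. The scoring function is a hypothetical stand-in: a toy seed-lexicon overlap score with an assumed threshold of 0.5, used in place of the paper's supervised classifier over inexpensive features. The product contains |S| × |T| pairs, which is why each feature must be cheap to compute.

```python
from itertools import product

# Hypothetical toy seed lexicon (English -> French), for illustration only.
SEED_LEXICON = {"blood": "sang", "pressure": "pression", "high": "haute"}


def toy_score(src, tgt):
    """Fraction of source words whose lexicon translation occurs in the target.

    This stands in for a trained classifier's probability; it is not the
    paper's feature set.
    """
    src_words = src.split()
    tgt_words = set(tgt.split())
    hits = sum(1 for w in src_words if SEED_LEXICON.get(w) in tgt_words)
    return hits / len(src_words)


def induce_phrase_table(source_phrases, target_phrases, score, threshold=0.5):
    """Score every (source, target) pair in the Cartesian product and keep
    those at or above the threshold, as (source, target, score) triples."""
    table = []
    for src, tgt in product(sorted(source_phrases), sorted(target_phrases)):
        p = score(src, tgt)
        if p >= threshold:
            table.append((src, tgt, p))
    return table
```

For example, with source phrases {"blood", "high pressure"} and target phrases {"haute pression", "sang", "voiture"}, all six pairs are scored and only ("blood", "sang") and ("high pressure", "haute pression") survive.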
Transactions of the Association for Computational Linguistics, vol. 5, pp. 441–454, 2017. Action Editor: Marco Kuhlmann.
Submission batch: 4/2017; Published 11/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Parsing with Traces: An O(n^4) Algorithm and a Structural Representation

Jonathan K. Kummerfeld and Dan Klein
Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA
{jkk,klein}@cs.berkeley.edu

Abstract

General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.

1 Introduction

Many syntactic representations use graphs and/or discontinuous structures, such as traces in Government and Binding theory and f-structure in Lexical Functional Grammar (Chomsky 1981; Kaplan and Bresnan 1982). Sentences in the Penn Treebank (PTB, Marcus et al. 1993) have a core projective tree structure and trace edges that represent control structures, wh-movement and more. However, most parsers and the standard evaluation metric ignore these edges and all null elements. By leaving out parts of the structure, they fail to provide key relations to downstream tasks such as question answering. While there has been work on capturing some parts of this extra structure, it has generally either been through post-processing on trees (Johnson 2002; Jijkoun 2003; Campbell 2004; Levy and Manning 2004; Gabbard et al. 2006) or has only captured a limited set of phenomena via grammar augmentation (Collins 1997; Dienes and Dubey 2003; Schmid 2006; Cai et al. 2011).

We propose a new general-purpose parsing algorithm that can efficiently search over a wide range of syntactic phenomena. Our algorithm extends a non-projective tree parsing algorithm (Pitler et al. 2013; Pitler 2014) to graph structures, with improvements to avoid derivational ambiguity while maintaining an O(n^4) runtime. Our algorithm also includes an optional extension to ensure parses contain a directed projective tree of non-trace edges.

Our algorithm cannot apply directly to constituency parses: it requires lexicalized structures similar to dependency parses. We extend and improve previous work on lexicalized constituent representations (Shen et al. 2007; Carreras et al. 2008; Hayashi and Nagata 2016) to handle traces. In this form, traces can create problematic structures such as directed cycles, but we show how careful choice of head rules can minimize such issues.

We implement a proof-of-concept parser, scoring 88.1 on trees in section 23 and 70.6 on traces. Together, our representation and algorithm cover 97.3% of sentences, far above the coverage of projective tree parsers (43.9%).

2 Background

This work builds on two areas: non-projective tree parsing, and parsing with null elements. Non-projectivity is important in syntax for rep
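The one-endpoint-crossing property mentioned above can be checked directly, assuming the standard definition from Pitler et al. (2013): for every edge, all edges that cross it share a common endpoint. This sketch treats edges as pairs of word positions and ignores direction; it does not test acyclicity or implement the O(n^4) parser, and the function names are hypothetical.

```python
def normalize(edge):
    """Order an edge's endpoints so the smaller word position comes first."""
    a, b = edge
    return (a, b) if a < b else (b, a)


def crosses(e, f):
    """Two edges cross if exactly one endpoint of f lies strictly inside e.

    Edges that share an endpoint or nest inside one another do not cross.
    """
    (a, b), (c, d) = normalize(e), normalize(f)
    return a < c < b < d or c < a < d < b


def is_one_endpoint_crossing(edges):
    """True if, for every edge, all edges crossing it share some endpoint."""
    edges = [normalize(e) for e in edges]
    for e in edges:
        crossing = [f for f in edges if crosses(e, f)]
        if not crossing:
            continue
        shared = set(crossing[0])
        for f in crossing[1:]:
            shared &= set(f)
        if not shared:
            return False
    return True
```

For instance, edges (0, 5), (1, 6) and (2, 6) pass, because both edges crossing (0, 5) share endpoint 6, while (0, 5), (1, 6) and (2, 7) fail, because the two edges crossing (0, 5) have no endpoint in common.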
Transactions of the Association for Computational Linguistics, vol. 5, pp. 413–424, 2017. Action Editor: Brian Roark.
Submission batch: 6/2017; Published 11/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

In-Order Transition-based Constituent Parsing

Jiangming Liu and Yue Zhang
Singapore University of Technology and Design, 8 Somapah Road, Singapore, 487372
jmliunlp@gmail.com, yuezhang@sutd.edu.sg

Abstract

Both bottom-up and top-down strategies have been used for neural transition-based constituent parsing. The parsing strategies differ in terms of the order in which they recognize productions in the derivation tree, where bottom-up strategies and top-down strategies take post-order and pre-order traversal over trees, respectively. Bottom-up parsers benefit from rich features from readily built partial parses, but lack lookahead guidance in the parsing process; top-down parsers benefit from non-local guidance for local decisions, but rely on a strong encoder over the input to predict a constituent hierarchy before its construction. To mitigate both issues, we propose a novel parsing system based on in-order traversal over syntactic trees, designing a set of transition actions to find a compromise between bottom-up constituent information and top-down lookahead information. Based on stack-LSTM, our psycholinguistically motivated constituent parsing system achieves 91.8 F1 on the WSJ benchmark. Furthermore, the system achieves 93.6 F1 with supervised reranking and 94.2 F1 with semi-supervised reranking, which are the best results on the WSJ benchmark.

1 Introduction

Transition-based constituent parsing employs sequences of local transition actions to construct constituent trees over sentences. There are two popular transition-based constituent parsing systems, namely bottom-up parsing (Sagae and Lavie, 2005; Zhang and Clark, 2009; Zhu et al., 2013; Watanabe and Sumita, 2015) and top-down parsing (Dyer et al., 2016; Kuncoro et al., 2017). The parsing strategies differ in terms of the order in which they recognize productions in the derivation tree.

The process of bottom-up parsing can be regarded as post-order traversal over a constituent tree. For example, given the sentence in Figure 1, a bottom-up shift-reduce parser takes the action sequence in Table 2(a)¹ to build the output, where the word sequence “The little boy” is first read, and then an NP recognized for the word sequence. After the system reads the verb “likes” and its subsequent NP, a VP is recognized. The full order of recognition for the tree nodes is 3 → 4 → 5 → 2 → 7 → 9 → 10 → 8 → 6 → 11 → 1. When making local decisions, rich information is available from readily built partial trees (Zhu et al., 2013; Watanabe and Sumita, 2015; Cross and Huang, 2016), which contributes to local disambiguation. However, there is a lack of top-down guidance from lookahead information, which can be useful (Johnson, 1998; Roark and Johnson, 1999; Charniak, 2000; Liu and Zhang, 2017). In addition, binarization must be applied to trees, as shown in Figure 1(b), to ensure a constant number of actions (Sagae and Lavie, 2005), and to take advantage of lexical head information (Collins, 2003). However, such binarization requires a set of language-specific rules, which hampers adaptation of parsing to other languages.

On the other hand, the process of top-down parsing can be regarded as pre-order traversal over a tree. Given the sentence in Figure 1, a top-down

Footnote 1: The action sequence is taken on unbinarized trees.
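The three traversal orders contrasted above can be made concrete on a tree reconstructed to reproduce the quoted bottom-up order 3 → 4 → 5 → 2 → 7 → 9 → 10 → 8 → 6 → 11 → 1. The node layout is an assumption inferred from that order and the surrounding description, not copied from Figure 1, and the in-order variant assumes one common generalization to n-ary trees (first child, then the node, then the remaining children).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    ident: int
    children: list = field(default_factory=list)


def post_order(node):
    """Bottom-up order: all children first, then the node itself."""
    out = []
    for child in node.children:
        out.extend(post_order(child))
    out.append(node.ident)
    return out


def pre_order(node):
    """Top-down order: the node first, then its children."""
    out = [node.ident]
    for child in node.children:
        out.extend(pre_order(child))
    return out


def in_order(node):
    """Assumed n-ary in-order: first child, then the node, then the rest."""
    if not node.children:
        return [node.ident]
    out = in_order(node.children[0])
    out.append(node.ident)
    for child in node.children[1:]:
        out.extend(in_order(child))
    return out


# Hypothetical reconstruction: 1 = sentence root; 2 = NP over leaves 3, 4, 5
# ("The little boy"); 6 = VP over leaf 7 ("likes") and NP 8 over leaves 9, 10;
# 11 = the final leaf.
tree = Node(1, [
    Node(2, [Node(3), Node(4), Node(5)]),
    Node(6, [Node(7), Node(8, [Node(9), Node(10)])]),
    Node(11),
])
```

Here `post_order(tree)` reproduces the quoted bottom-up order, `pre_order(tree)` visits nodes 1 through 11 in sequence, and `in_order(tree)` yields [3, 2, 4, 5, 1, 7, 6, 9, 8, 10, 11]: each constituent is predicted right after its first child has been built, mixing bottom-up evidence with top-down lookahead.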
Transactions of the Association for Computational Linguistics, vol. 5, pp. 379–395, 2017. Action Editor: Mark Steedman.
Submission batch: 12/2016; Revision batch: 3/2017; Published 11/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Ordinal Common-sense Inference

Sheng Zhang, Johns Hopkins University, zsheng2@jhu.edu
Rachel Rudinger, Johns Hopkins University, rudinger@jhu.edu
Kevin Duh, Johns Hopkins University, kevinduh@cs.jhu.edu
Benjamin Van Durme, Johns Hopkins University, vandurme@cs.jhu.edu

Abstract

Humans have the capacity to draw common-sense inferences from natural language: various things that are likely but not certain to hold based on established discourse, and are rarely stated explicitly. We propose an evaluation of automated common-sense inference based on an extension of recognizing textual entailment: predicting ordinal human responses on the subjective likelihood of an inference holding in a given context. We describe a framework for extracting common-sense knowledge from corpora, which is then used to construct a dataset for this ordinal entailment task. We train a neural sequence-to-sequence model on this dataset, which we use to score and generate possible inferences. Further, we annotate subsets of previously established datasets via our ordinal annotation protocol in order to then analyze the distinctions between these and what we have constructed.

1 Introduction

We use words to talk about the world. Therefore, to understand what words mean, we must have a prior explication of how we view the world. – Hobbs (1987)

Researchers in Artificial Intelligence and (Computational) Linguistics have long cited the requirement of common-sense knowledge in language understanding.¹ This knowledge is viewed as a key component in filling in the gaps between the telegraphic style of natural language statements. We are able to convey considerable information in a relatively sparse channel, presumably owing to a partially shared model at the start of any discourse.² Common-sense inference – inferences based on common-sense knowledge – is possibilistic: things everyone more or less would expect to hold in a given context, but without the necessary strength of logical entailment.³ Because natural language corpora exhibit human reporting bias (Gordon and Van Durme, 2013), systems that derive knowledge exclusively from such corpora may be more accurately considered models of language, rather than of the

Sam bought a new clock; The clock runs
Dave found an axe in his garage; A car is parked in the garage
Tom was accidentally shot by his teammate in the army; The teammate dies
Two friends were in a heated game of checkers; A person shoots the checkers
My friends and I decided to go swimming in the ocean; The ocean is carbonated
Figure 1: Examples of common-sense inference ranging from very likely, likely, plausible, technically possible, to impossible.

Footnote 1: Schank (1975): It has been apparent … within … natural language understanding … that the eventual limit to our solution … would be our ability to characterize world knowledge.
Footnote 2: McCarthy (1959): a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows.
Footnote 3: Many of the bridging inferences of Clark (1975) make use of common-sense knowledge, such as the following example of “Probable part”: I walked into the room. The windows looked out to the bay. To resolve the definite reference the windows, one needs to know that rooms have windows is probable.
Transactions of the Association for Computational Linguistics, vol. 5, pp. 365–378, 2017. Action Editor: Adam Lopez.
Submission batch: 11/2016; Revision batch: 2/2017; Published 10/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Fully Character-Level Neural Machine Translation without Explicit Segmentation

Jason Lee*, ETH Zürich, jasonlee@inf.ethz.ch
Kyunghyun Cho, New York University, kyunghyun.cho@nyu.edu
Thomas Hofmann, ETH Zürich, thomas.hofmann@inf.ethz.ch

Abstract

Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses that of the models specifically trained on that language pair alone, both in terms of the BLEU score and human judgment.

1 Introduction

Nearly all previous work in machine translation has been at the level of words. Aside from our intuitive understanding of word as a basic unit of meaning (Jackendoff, 1992), one reason behind this is that sequences are significantly longer when represented in characters, compounding the problem of data sparsity and modeling long-range dependencies. This has driven NMT research to be almost exclusively word-level (Bahdanau et al., 2015; Sutskever et al., 2014).

Despite their remarkable success, word-level NMT models suffer from several major weaknesses. For one, they are unable to model rare, out-of-vocabulary words, making them limited in translating languages with rich morphology such as Czech, Finnish and Turkish. If one uses a large vocabulary to combat this (Jean et al., 2015), the complexity of training and decoding grows linearly with respect to the target vocabulary size, leading to a vicious cycle.

To address this, we present a fully character-level NMT model that maps a character sequence in a source language to a character sequence in a target language. We show that our model outperforms a baseline with a subword-level encoder on DE-EN and CS-EN, and achieves a comparable result on FI-EN and RU-EN. A purely character-level NMT model with a basic encoder was proposed as a baseline by Luong and Manning (2016), but training it was prohibitively slow. We were able to train our model at a reasonable speed by drastically reducing the length of source sentence representation using a stack of convolutional, pooling and highway layers.

One advantage of character-level models is that they are better suited for multilingual translation than their word-level counterparts which require a separate word vocabulary for each language. We

* The majority of this work was completed while the author was visiting New York University.
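The length reduction described above comes partly from max-pooling over time. The minimal sketch below shows only non-overlapping pooling with a window equal to the stride; the stride of 5 in the example is an illustrative assumption, and the convolutional and highway layers of the encoder are omitted. `max_pool_time` is a hypothetical helper name.

```python
def max_pool_time(sequence, stride):
    """Non-overlapping max-pooling over time.

    `sequence` is a list of equal-length feature vectors (one per character).
    Each output vector is the element-wise max over a window of `stride`
    consecutive positions; a shorter final window is pooled as-is.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    pooled = []
    for start in range(0, len(sequence), stride):
        window = sequence[start:start + stride]
        # zip(*window) groups the window's vectors feature by feature.
        pooled.append([max(column) for column in zip(*window)])
    return pooled


# Illustrative input: 12 character positions with 2 features each.
chars = [[i, -i] for i in range(12)]
pooled = max_pool_time(chars, stride=5)
```

Here the 12-step input becomes 3 pooled vectors, [[4, 0], [9, -5], [11, -10]], so any recurrent layers above the pooling operate on a sequence roughly five times shorter than the raw character sequence.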
Transactions of the Association for Computational Linguistics, vol. 5, pp. 353–364, 2017. Action Editor: Eric Fosler-Lussier.
Transactions of the Association for Computational Linguistics, vol. 5, pp. 353–364, 2017. Action Editor: Eric Fosler-Lussier. Submission batch: 10/2016; Revision batch: 12/2016; Published 10/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Unsupervised Learning of Morphological Forests

Jiaming Luo (CSAIL, MIT, j_luo@mit.edu), Karthik Narasimhan (CSAIL, MIT, karthikn@mit.edu), Regina Barzilay (CSAIL, MIT, regina@csail.mit.edu)

Abstract
This paper focuses on unsupervised modeling of morphological families, collectively comprising a forest over the language vocabulary. This formulation enables us to capture edge-wise properties reflecting single-step morphological derivations, along with global distributional properties of the entire forest. These global properties constrain the size of the affix set and encourage formation of tight morphological families. The resulting objective is solved using Integer Linear Programming (ILP) paired with contrastive estimation. We train the model by alternating between optimizing the local log-linear model and the global ILP objective. We evaluate our system on three tasks: root detection, clustering of morphological families, and segmentation. Our experiments demonstrate that our model yields consistent gains in all three tasks compared with the best published results.¹

1 Introduction
The morphological study of a language inherently draws upon the existence of families of related words. All words within a family can be derived from a common root via a series of transformations, whether inflectional or derivational. Figure 1 depicts one such family, originating from the word faith. This representation can benefit a range of applications, including segmentation, root detection and clustering of morphological families.

Figure 1: An illustration of a single tree in a morphological forest. pre and suf represent prefixation and suffixation. Each edge has an associated probability for the morphological change.

Using graph terminology, a full morphological assignment of the words in a language can be represented as a forest.² Valid forests of morphological families exhibit a number of well-known regularities. At the global level, the number of roots is limited, and roots constitute only a small fraction of the vocabulary. A similar constraint applies to the number of possible affixes, shared across families. At the local edge level, we prefer derivations that follow regular orthographic patterns and preserve semantic relatedness. We hypothesize that enforcing these constraints as part of the forest induction pro-

¹ Code is available at https://github.com/j-luo93/MorphForest.
² The correct mathematical term for the structure in Figure 1 is a directed 1-forest or functional graph. For simplicity, we shall use the terms forest and tree to refer to a directed 1-forest or a directed 1-tree because of the cycle at the root.
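To make the forest structure concrete: in a directed 1-forest every word has exactly one parent, and each root is its own parent (the cycle at the root mentioned in the footnote above). The following is a minimal Python sketch under that reading, not the authors' code. The `parent` map, the helper names `find_root` and `families`, and the toy words are hypothetical illustrations.

```python
from collections import defaultdict


def find_root(word, parent):
    """Follow parent pointers until reaching a word that is its own parent."""
    seen = set()
    while parent[word] != word:
        if word in seen:
            raise ValueError(f"cycle that does not pass through a root: {word!r}")
        seen.add(word)
        word = parent[word]
    return word


def families(parent):
    """Group every word under the root of its tree (its morphological family)."""
    groups = defaultdict(set)
    for word in parent:
        groups[find_root(word, parent)].add(word)
    return dict(groups)


# Hypothetical toy forest: each word points to the word it is derived from;
# roots point to themselves.
parent = {
    "faith": "faith",
    "faithful": "faith",        # suffixation (+ful)
    "faithfully": "faithful",   # suffixation (+ly)
    "unfaithful": "faithful",   # prefixation (un+)
    "play": "play",
    "player": "play",           # suffixation (+er)
}

print({root: sorted(words) for root, words in families(parent).items()})
# {'faith': ['faith', 'faithful', 'faithfully', 'unfaithful'], 'play': ['play', 'player']}
```

Storing one parent per word means every word automatically belongs to exactly one tree. The global regularities described above (few roots, a shared affix set) then become counts over this map.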
Transactions of the Association for Computational Linguistics, vol. 5, pp. 309–324, 2017. Action Editor: Sebastian Padó. Submission batch: 4/2017; Revision batch: 7/2017; Published 9/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

Nikola Mrkšić (1,2), Ivan Vulić (1), Diarmuid Ó Séaghdha (2), Ira Leviant (3), Roi Reichart (3), Milica Gašić (1), Anna Korhonen (1), Steve Young (1,2)
(1) University of Cambridge; (2) Apple Inc.; (3) Technion, IIT

Abstract
We present ATTRACT-REPEL, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. ATTRACT-REPEL facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialized cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-art results on semantic similarity datasets in six languages. We next show that ATTRACT-REPEL-specialized vectors boost performance in the downstream task of dialogue state tracking (DST) across multiple languages. Finally, we show that cross-lingual vector spaces produced by our algorithm facilitate the training of multilingual DST models, which brings further performance improvements.

1 Introduction
Word representation learning has become a research area of central importance in modern natural language processing. The common techniques for inducing distributed word representations are grounded in the distributional hypothesis, relying on co-occurrence information in large textual corpora to learn meaningful word representations (Mikolov et al., 2013b; Pennington et al., 2014; Ó Séaghdha and Korhonen, 2014; Levy and Goldberg, 2014). Recently, methods that go beyond stand-alone unsupervised learning have gained increased popularity. These models typically build on distributional ones by using human- or automatically-constructed knowledge bases to enrich the semantic content of existing word vector collections. Often this is done as a post-processing step, where the distributional word vectors are refined to satisfy constraints extracted from a lexical resource such as WordNet (Faruqui et al., 2015; Wieting et al., 2015; Mrkšić et al., 2016). We term this approach semantic specialization.

In this paper we advance the semantic specialization paradigm in a number of ways. We introduce a new algorithm, ATTRACT-REPEL, that uses synonymy and antonymy constraints drawn from lexical resources to tune word vector spaces using linguistic information that is difficult to capture with conventional distributional training. Our evaluation shows that ATTRACT-REPEL outperforms previous methods which make use of similar lexical resources, achieving state-of-the-art results on two word similarity datasets: SimLex-999 (Hill et al., 2015) and SimVerb-3500 (Gerz et al., 2016).

We then deploy the ATTRACT-REPEL algorithm in a multilingual setting, using semantic relations extracted from BabelNet (Navigli and Ponzetto, 2012; Ehrmann et al., 2014), a cross-lingual lexical resource, to inject constraints between words of different languages into the word representations. This allows us to embed vector spaces of multiple languages into a single vector space, exploiting information from high-resource languages to improve the word representations of lower-resource ones. Table 1 illustrates the effects of cross-lingual ATTRACT-REPEL specialization by showing the nearest neighbors for three English words across three cross-lingual spaces.
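The attract/repel idea can be illustrated with margin-based hinge costs on dot products of (assumed unit-length) vectors. This is a simplified sketch under our own assumptions, not the authors' implementation. The margins `delta_att` and `delta_rep`, the way the negative examples `tl` and `tr` are passed in, and the function names are illustrative. The full method's mini-batch negative sampling and its regularization toward the original vectors are omitted.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(x):
    return max(0.0, x)


def attract_cost(xl, xr, tl, tr, delta_att=0.6):
    """Hinge cost for a synonym pair (xl, xr): zero only when the pair's
    similarity beats each word's similarity to its negative example
    (tl for xl, tr for xr) by at least delta_att."""
    sim = dot(xl, xr)
    return hinge(delta_att + dot(xl, tl) - sim) + hinge(delta_att + dot(xr, tr) - sim)


def repel_cost(xl, xr, tl, tr, delta_rep=0.0):
    """Hinge cost for an antonym pair (xl, xr): zero only when each word is
    at least delta_rep more similar to its negative example than to the
    other word in the pair."""
    sim = dot(xl, xr)
    return hinge(delta_rep + sim - dot(xl, tl)) + hinge(delta_rep + sim - dot(xr, tr))


# Toy unit vectors (hypothetical).
e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(attract_cost(e1, e1, e2, e2))  # synonyms already identical: 0.0
print(repel_cost(e1, e1, e2, e2))    # "antonyms" identical: 2.0
```

Minimizing the sum of these costs by gradient descent would pull synonyms together and push antonyms apart. In the sketch, that gradient step is left to whatever optimizer is used.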
Transactions of the Association for Computational Linguistics, vol. 5, pp. 295–307, 2017. Action Editor: Christopher Potts. Submission batch: 10/2016; Revision batch: 12/2016; Published 8/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Overcoming Language Variation in Sentiment Analysis with Social Attention

Yi Yang and Jacob Eisenstein, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30308, {yiyang+jacobe}@gatech.edu

Abstract
Variation in language is ubiquitous, particularly in newer forms of writing such as social media. Fortunately, variation is not random; it is often linked to social properties of the author. In this paper, we show how to exploit social networks to make sentiment analysis more robust to social language variation. The key idea is linguistic homophily: the tendency of socially linked individuals to use language in similar ways. We formalize this idea in a novel attention-based neural network architecture, in which attention is divided among several basis models, depending on the author's position in the social network. This has the effect of smoothing the classification function across the social network, and makes it possible to induce personalized classifiers even for authors for whom there is no labeled data or demographic metadata. This model significantly improves the accuracies of sentiment analysis on Twitter and on review data.

1 Introduction
Words can mean different things to different people. Fortunately, these differences are rarely idiosyncratic, but are often linked to social factors, such as age (Rosenthal and McKeown, 2011), gender (Eckert and McConnell-Ginet, 2003), race (Green, 2002), geography (Trudgill, 1974), and more ineffable characteristics such as political and cultural attitudes (Fischer, 1958; Labov, 1963). In natural language processing (NLP), social media data has brought variation to the fore, spurring the development of new computational techniques for characterizing variation in the lexicon (Eisenstein et al., 2010), orthography (Eisenstein, 2015), and syntax (Blodgett et al., 2016). However, aside from the focused task of spelling normalization (Sproat et al., 2001; Aw et al., 2006), there have been few attempts to make NLP systems more robust to language variation across speakers or writers.

One exception is the work of Hovy (2015), who shows that the accuracies of sentiment analysis and topic classification can be improved by the inclusion of coarse-grained author demographics such as age and gender. However, such demographic information is not directly available in most datasets, and it is not yet clear whether predicted age and gender offer any improvements. On the other end of the spectrum are attempts to create personalized language technologies, as are often employed in information retrieval (Shen et al., 2005), recommender systems (Basilico and Hofmann, 2004), and language modeling (Federico, 1996). But personalization requires annotated data for each individual user, something that may be possible in interactive settings such as information retrieval, but is not typically feasible in natural language processing.

We propose a middle ground between group-level demographic characteristics and personalization, by exploiting social network structure. The sociological theory of homophily asserts that individuals are usually similar to their friends (McPherson et al., 2001). This property has been demonstrated for language (Bryden et al., 2013) as well as for the demographic properties targeted by Hovy (2015), which are more likely to be shared by friends than by random pairs of individuals (Thelwall, 2009). Social
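The following is a rough sketch of the "attention divided among several basis models" idea from the abstract. Each basis model outputs a class distribution, and an author-dependent softmax mixes them. The dot-product scoring against per-model keys, the plain-list author vector standing in for a social-network embedding, and all names are assumptions for illustration; the paper's actual architecture may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def social_attention_predict(author_vec, basis_keys, basis_probs):
    """Mix the class distributions of K basis models with author-dependent
    attention weights.

    author_vec:  embedding of the author (hypothetical stand-in for a
                 social-network embedding)
    basis_keys:  one key vector per basis model
    basis_probs: one class distribution per basis model, for the same input
    """
    scores = [sum(a * k for a, k in zip(author_vec, key)) for key in basis_keys]
    weights = softmax(scores)
    n_classes = len(basis_probs[0])
    mixed = [sum(w * probs[c] for w, probs in zip(weights, basis_probs))
             for c in range(n_classes)]
    return weights, mixed


keys = [[1.0, 0.0], [0.0, 1.0]]
probs = [[0.9, 0.1],   # basis model 0: P(positive), P(negative)
         [0.2, 0.8]]   # basis model 1
print(social_attention_predict([0.0, 0.0], keys, probs))  # equal weights
print(social_attention_predict([5.0, 0.0], keys, probs))  # leans on model 0
```

The weights depend only on the author. Authors with nearby embeddings therefore get similar mixtures, which is the smoothing across the network that the abstract describes.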
Transactions of the Association for Computational Linguistics, vol. 5, pp. 279–293, 2017. Action Editor: Yuji Matsumoto. Submission batch: 5/2016; Revision batch: 10/2016; 2/2017; Published 8/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Cross-Lingual Syntactic Transfer with Limited Resources

Mohammad Sadegh Rasooli and Michael Collins*, Department of Computer Science, Columbia University, New York, NY 10027, USA, {rasooli,mcollins}@cs.columbia.edu

Abstract
We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. This method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.

1 Introduction
Creating manually-annotated syntactic treebanks is an expensive and time-consuming task. Recently there has been a great deal of interest in cross-lingual syntactic transfer, where a parsing model is trained for some language of interest, using only treebanks in other languages. There is a clear motivation for this in building parsing models for languages for which treebank data is unavailable. Methods for syntactic transfer include annotation projection methods (Hwa et al., 2005; Ganchev et al., 2009; McDonald et al., 2011; Ma and Xia, 2014; Rasooli and Collins, 2015; Lacroix et al., 2016; Agić et al., 2016), learning of delexicalized models on universal treebanks (Zeman and Resnik, 2008; McDonald et al., 2011; Täckström et al., 2013; Rosa and Zabokrtsky, 2015), treebank translation (Tiedemann et al., 2014; Tiedemann, 2015; Tiedemann and Agić, 2016) and methods that leverage cross-lingual representations of word clusters, embeddings or dictionaries (Täckström et al., 2012; Durrett et al., 2012; Duong et al., 2015a; Zhang and Barzilay, 2015; Xiao and Guo, 2015; Guo et al., 2015; Guo et al., 2016; Ammar et al., 2016a).

This paper considers the problem of cross-lingual syntactic transfer with limited resources of monolingual and translation data. Specifically, we use the Bible corpus of Christodouloupoulos and Steedman (2014) as a source of translation data, and Wikipedia as a source of monolingual data. We deliberately limit ourselves to the use of Bible translation data because it is available for a very broad set of languages: the data from Christodouloupoulos and Steedman (2014) includes data from 100 languages. The Bible data contains a much smaller set of sentences (around 24,000) than other translation corpora, for example Europarl (Koehn, 2005), which has around 2 million sentences per language pair. This makes it a considerably more challenging corpus to work with. Similarly, our choice of Wikipedia as the source of monolingual data is motivated by the availability of Wikipedia data in a very broad set of languages.

* On leave at Google Inc. New York.
Transactions of the Association for Computational Linguistics, vol. 5, pp. 247–261, 2017. Action Editor: Hinrich Schütze. Submission batch: 12/2015; Revision batch: 5/2016; 11/2016; Published 7/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

Gábor Berend, Department of Informatics, University of Szeged, 2 Árpád tér, 6720 Szeged, Hungary, berendg@inf.u-szeged.hu

Abstract
In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties as it retains over 89.8% of its average POS tagging accuracy when trained at 1.2% of the total available training data, i.e. 150 sentences per language.

1 Introduction
Determining the linguistic structure of natural language texts based on rich hand-crafted features has a long history in natural language processing. The focus of traditional approaches has mostly been on building linguistic analyzers for a particular kind of analysis, which often leads to the incorporation of extensive linguistic and/or domain knowledge for defining the feature space. Consequently, traditional models easily become language- and/or task-specific, resulting in poor generalization properties.

A new research direction has emerged recently that aims at building more general models that require far less feature engineering or none at all. These advancements in natural language processing, pioneered by Bengio et al. (2003), followed by Collobert and Weston (2008), Collobert et al. (2011), Mikolov et al. (2013a) among others, employ a different philosophy. The objective of these works is to find representations for linguistic phenomena in an unsupervised manner by relying on large amounts of text.

Natural language phenomena are extremely sparse by their nature, whereas continuous word embeddings employ dense representations of words. In our paper we empirically verify via rigorous experiments that turning these dense representations into a much sparser (yet denser than one-hot encoding) form can keep the most salient parts of word representations that are highly suitable for sequence models.

Furthermore, our experiments reveal that our proposed model performs substantially better than traditional feature-rich models in the absence of abundant training data. Our proposed model also has the advantage of performing well on multiple sequence labeling tasks without any modification in the applied word representations, thanks to the sparse features derived from continuous word representations.

Our work aims at introducing a novel sequence labeling model solely utilizing features derived from the sparse coding of continuous word embeddings. Even though sparse coding had previously been utilized in NLP (Faruqui et al., 2015; Chen et al., 2016), to the best of our knowledge, we are the first to propose a sequence labeling framework incorporating it, with the following contributions:
• We show that the proposed sparse representation is general, as sequence labeling models trained on it achieve (near) state-of-the-art performance for both POS tagging and NER.
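As a rough illustration of turning a dense embedding into sparse indicator features, the sketch below computes an L1-regularized sparse code with ISTA (iterative soft-thresholding) against a fixed dictionary. It then emits one feature per active atom, together with the sign of its coefficient. The fixed dictionary, the choice of ISTA as the solver, and the `P`/`N` feature naming are all our assumptions. The paper learns its dictionary, and its exact feature templates may differ.

```python
def soft_threshold(v, thresh):
    """Shrink each coordinate toward zero by thresh (proximal step for L1)."""
    return [(abs(x) - thresh) * (1.0 if x > 0 else -1.0) if abs(x) > thresh else 0.0
            for x in v]


def ista(x, atoms, lam, step, n_iter=100):
    """Approximately solve  min_a 0.5*||x - sum_j a_j*atoms[j]||^2 + lam*||a||_1
    with ISTA. For convergence, step should be at most 1/L, where L is the
    largest eigenvalue of D^T D (D has the atoms as columns)."""
    k, d = len(atoms), len(x)
    a = [0.0] * k
    for _ in range(n_iter):
        recon = [sum(a[j] * atoms[j][i] for j in range(k)) for i in range(d)]
        resid = [r - xi for r, xi in zip(recon, x)]
        grad = [sum(atoms[j][i] * resid[i] for i in range(d)) for j in range(k)]
        a = soft_threshold([aj - step * gj for aj, gj in zip(a, grad)], step * lam)
    return a


def sparse_features(code):
    """One indicator feature per nonzero coefficient: sign plus atom index."""
    return [("P" if c > 0 else "N") + str(j) for j, c in enumerate(code) if c != 0.0]


# Hypothetical 3-dimensional embedding and an orthonormal (identity)
# dictionary, for which the sparse code is simply soft-thresholding of x.
embedding = [0.9, -0.05, -0.4]
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
code = ista(embedding, atoms, lam=0.1, step=1.0)
print(sparse_features(code))  # ['P0', 'N2']
```

A sequence labeler would consume these string features like any other indicator features. The L1 penalty `lam` controls how many atoms stay active per word, and therefore how many features each word fires.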
Transactions of the Association for Computational Linguistics, vol. 5, pp. 233–246, 2017. Action Editor: Patrick Pantel. Submission batch: 11/2016; Revision batch: 2/2017; Published 7/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Domain-Targeted, High Precision Knowledge Extraction

Bhavana Dalvi Mishra, Niket Tandon, Peter Clark, Allen Institute for Artificial Intelligence, 2157 N Northlake Way Suite 110, Seattle, WA 98103, {bhavanad,nikett,peterc}@allenai.org

Abstract
Our goal is to construct a domain-targeted, high precision knowledge base (KB), containing general (subject, predicate, object) statements about the world, in support of a downstream question-answering (QA) application. Despite recent advances in information extraction (IE) techniques, no suitable resource for our task already exists; existing resources are either too noisy, too named-entity centric, or too incomplete, and typically have not been constructed with a clear scope or purpose. To address these, we have created a domain-targeted, high precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high precision knowledge targeted to a particular domain, in our case elementary science. To measure the KB's coverage of the target domain's knowledge (its "comprehensiveness" with respect to science) we measure recall with respect to an independent corpus of domain text, and show that our pipeline produces output with over 80% precision and 23% recall with respect to that target, a substantially higher coverage of tuple-expressible science knowledge than other comparable resources. We have made the KB publicly available.¹

1 Introduction
While there have been substantial advances in knowledge extraction techniques, the availability of high precision, general knowledge about the world remains elusive. Specifically, our goal is a large, high precision body of (subject, predicate, object) statements relevant to elementary science, to support a downstream QA application task. Although there are several impressive, existing resources that can contribute to our endeavor, e.g., NELL (Carlson et al., 2010), ConceptNet (Speer and Havasi, 2013), WordNet (Fellbaum, 1998), WebChild (Tandon et al., 2014), Yago (Suchanek et al., 2007), FreeBase (Bollacker et al., 2008), and ReVerb-15M (Fader et al., 2011), their applicability is limited by both
• limited coverage of general knowledge (e.g., FreeBase and NELL primarily contain knowledge about Named Entities; WordNet uses only a few […] (>80%) precision over that corpus (its "comprehensiveness" with respect to science). This measure is similar to recall at the point P=80% on the PR curve, except measured against a domain-specific sample of data that reflects the distribution of the target domain knowledge. Comprehensiveness thus gives us an approximate notion of the completeness of the KB for (tuple-expressible) facts in our target domain, something that has been lacking in earlier KB construction research. We show that our KB has comprehensiveness (recall of domain facts at >80% precision) of 23% with respect to science, a substantially higher coverage of tuple-expressible science knowledge than other comparable resources. We are making the KB publicly available.²

Outline
We discuss the related work in Section 2. In Section 3, we describe the domain-targeted pipeline, including how the domain is characterized to the algorithm and the sequence of filters and predictors used. In Section 4, we describe how the relationships between predicates in the domain are identified and the more general predicates further populated. Finally in Section 5, we evaluate our approach, including evaluating its comprehensiveness (high-precision coverage of science knowledge).

2 Related Work
There has been substantial, recent progress in knowledge bases that (primarily) encode knowledge about Named Entities, including Freebase (Bollacker et al., 2008), Knowledge Vault (Dong et al., 2014), DBPedia (Auer et al., 2007), and others that hierarchically organize nouns and named entities, e.g., Yago (Suchanek et al., 2007). While these KBs are rich in facts about named entities, they are sparse in general knowledge about common nouns (e.g., that bears have fur). KBs covering general knowledge have received less attention, although there are some notable exceptions constructed using manual methods, e.g., WordNet (Fellbaum, 1998), crowdsourcing, e.g., ConceptNet (Speer and Havasi, 2013), and, more recently, using automated methods, e.g., WebChild (Tandon et al., 2014). While useful, these resources have been constructed to target only a small set of relations, providing only limited coverage for a domain of interest.

To overcome relation sparseness, the paradigm of Open IE (Banko et al., 2007; Soderland et al., 2013) extracts knowledge from text using an open set of relationships, and has been used to successfully build large-scale (arg1, relation, arg2) resources such as ReVerb-15M (containing 15 million general triples) (Fader et al., 2011). Although broad in coverage, Open IE techniques typically produce noisy output. Our extraction pipeline can be viewed as an extension of the Open IE paradigm: we start with targeted Open IE output, and then apply a sequence of filters to substantially improve the

¹ This KB, named "Aristo TupleKB", is available for download at http://data.allenai.org/tuple-kb
² Aristo TupleKB is available for download at http://allenai.org/data/aristo-tuple-kb
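The text likens comprehensiveness to recall at the P=80% point of a precision-recall curve. The following is a minimal sketch of that analogue on a ranked list of extractions. It assumes that each extraction has already been judged correct or not, that the number of target facts is known, and that each correct extraction covers one distinct target fact. The function name and inputs are hypothetical, and this is not the paper's evaluation procedure.

```python
def recall_at_precision(ranked_correct, n_target_facts, min_precision=0.8):
    """Highest recall over all top-k prefixes of a ranked extraction list
    whose precision is at least min_precision (0.0 if none qualifies).

    ranked_correct: booleans, True if the k-th ranked extraction is correct
    n_target_facts: number of facts in the target-domain sample
    """
    best, hits = 0.0, 0
    for k, correct in enumerate(ranked_correct, start=1):
        hits += int(correct)
        if hits / k >= min_precision:
            best = max(best, hits / n_target_facts)
    return best


# Hypothetical judgments for 10 ranked extractions, against 20 target facts.
judged = [True, True, True, True, False, True, False, False, True, False]
print(recall_at_precision(judged, 20))  # 0.25: 5 hits in the top 6 (precision 5/6)
```

Deeper cutoffs add correct facts but drop precision below the threshold. The reported figure is therefore the best recall among cutoffs that still meet the precision bar, not the recall of the whole list.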
Transactions of the Association for Computational Linguistics, vol. 5, pp. 205–218, 2017. Action Editor: Stefan Riezler. Submission batch: 12/2016; Revision batch: 2/2017; Published 7/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Pushing the Limits of Translation Quality Estimation

André F. T. Martins (Unbabel; Instituto de Telecomunicações, Lisbon, Portugal; andre.martins@unbabel.com), Marcin Junczys-Dowmunt (Adam Mickiewicz University in Poznań, Poznań, Poland; junczys@amu.edu.pl), Fabio N. Kepler (Unbabel; L2F/INESC-ID, Lisbon, Portugal; University of Pampa, Alegrete, Brazil; kepler@unbabel.com), Ramón Astudillo (Unbabel; L2F/INESC-ID, Lisbon, Portugal; ramon@unbabel.com), Chris Hokamp (Dublin City University, Dublin, Ireland; chokamp@computing.dcu.ie), Roman Grundkiewicz (Adam Mickiewicz University in Poznań, Poznań, Poland; romang@amu.edu.pl)

Abstract
Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways. However, this potential is currently limited by the relatively low accuracy of existing systems. In this paper, we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered, neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining striking results on WMT16: a word-level F1-MULT score of 57.47% (an absolute gain of +7.95% over the current state of the art), and a Pearson correlation score of 65.56% for sentence-level HTER prediction (an absolute gain of +13.36%).

1 Introduction
The goal of quality estimation (QE) is to evaluate a translation system's quality without access to reference translations (Blatz et al., 2004; Specia et al., 2013). This has many potential uses: informing an end user about the reliability of translated content; deciding if a translation is ready for publishing or if it requires human post-editing; highlighting the words that need to be changed. QE systems are particularly appealing for crowd-sourced and professional translation services, due to their potential to dramatically reduce post-editing times and to save labor costs (Specia, 2011). The increasing interest in this problem from an industrial angle comes as no surprise (Turchi et al., 2014; de Souza et al., 2015; Martins et al., 2016; Kozlova et al., 2016).

In this paper, we tackle word-level QE, whose goal is to assign a label of OK or BAD to each word in the translation (Figure 1). Past approaches to this problem include linear classifiers with handcrafted features (Ueffing and Ney, 2007; Biçici, 2013; Shah et al., 2013; Luong et al., 2014), often combined with feature selection (Avramidis, 2012; Beck et al., 2013), recurrent neural networks (de Souza et al., 2014; Kim and Lee, 2016), and systems that combine linear and neural models (Kreutzer et al., 2015; Martins et al., 2016). We start by proposing a "pure" QE system (§3) consisting of a new, carefully engineered neural model (NEURALQE), stacked into a linear feature-rich classifier (LINEARQE). Along the way, we provide a rigorous empirical analysis to better understand the contribution of the several groups of features and to justify the architecture of the neural system.

A second contribution of this paper is bringing in the related task of automatic post-editing (APE; Simard et al. (2007)), which aims to au-
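The word-level F1-MULT score, to our understanding of the WMT16 shared task definition, is the product of two F1 scores computed over all words: one for the OK class and one for the BAD class. The following is a small Python sketch under that assumption, with toy labels; the function names are our own.

```python
def f1_for_label(gold, pred, label):
    """F1 score of one label (e.g. "OK" or "BAD") over aligned word tags."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_mult(gold, pred):
    """Product of the OK-class and BAD-class F1 scores."""
    return f1_for_label(gold, pred, "OK") * f1_for_label(gold, pred, "BAD")


# Toy example: gold and predicted tags for five translated words.
gold = ["OK", "OK", "BAD", "OK", "BAD"]
pred = ["OK", "BAD", "BAD", "OK", "OK"]
print(f1_mult(gold, pred))  # F1(OK) = 2/3, F1(BAD) = 1/2, product ~0.333
```

Multiplying the two F1 scores means a system that predicts only OK gets a score of zero, even though OK is usually the majority label.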
Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017. Action Editor: Hinrich Schütze. Submission batch: 9/2016; Revision batch: 12/2016; Published 6/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Enriching Word Vectors with Subword Information

Piotr Bojanowski*, Edouard Grave*, Armand Joulin and Tomas Mikolov, Facebook AI Research, {bojanowski,egrave,ajoulin,tmikolov}@fb.com

Abstract
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing us to train models on large corpora quickly, and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, on both word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

1 Introduction
Learning continuous representations of words has a long history in natural language processing (Rumelhart et al., 1988). These representations are typically derived from large unlabeled corpora using co-occurrence statistics (Deerwester et al., 1990; Schütze, 1992; Lund and Burgess, 1996). A large body of work, known as distributional semantics, has studied the properties of these methods (Turney et al., 2010; Baroni and Lenci, 2010). In the neural network community, Collobert and Weston (2008) proposed to learn word embeddings using a feed-forward neural network, by predicting a word based on the two words on the left and two words on the right. More recently, Mikolov et al. (2013b) proposed simple log-bilinear models to learn continuous representations of words on very large corpora efficiently.

Most of these techniques represent each word of the vocabulary by a distinct vector, without parameter sharing. In particular, they ignore the internal structure of words, which is an important limitation for morphologically rich languages, such as Turkish or Finnish. For example, in French or Spanish, most verbs have more than forty different inflected forms, while the Finnish language has fifteen cases for nouns. These languages contain many word forms that occur rarely (or not at all) in the training corpus, making it difficult to learn good word representations. Because many word formations follow rules, it is possible to improve vector representations for morphologically rich languages by using character-level information.

In this paper, we propose to learn representations for character n-grams, and to represent words as the sum of the n-gram vectors. Our main contribution is to introduce an extension of the continuous skip-gram model (Mikolov et al., 2013b), which takes into account subword information. We evaluate this model on nine languages exhibiting different morphologies, showing the benefit of our approach.

* The two first authors contributed equally.
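The bag-of-character-n-grams representation described above can be sketched as follows. Treat these details as assumptions drawn from our reading of the method: the boundary markers `<` and `>`, the n-gram range 3 to 6, and the inclusion of the whole marked word as an extra unit. The toy `ngram_vectors` table is hypothetical. The real system learns these vectors during skip-gram training and, to our knowledge, hashes n-grams into a fixed number of buckets.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Set of character n-grams of '<word>', plus the whole marked word."""
    marked = "<" + word + ">"
    grams = {marked}
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def word_vector(word, ngram_vectors, dim, n_min=3, n_max=6):
    """Sum the vectors of the word's n-grams. N-grams absent from the table
    contribute nothing, so a word outside the training vocabulary can still
    get a nonzero vector from the n-grams it shares with known words."""
    vec = [0.0] * dim
    for gram in char_ngrams(word, n_min, n_max):
        for i, x in enumerate(ngram_vectors.get(gram, ())):
            vec[i] += x
    return vec


print(sorted(char_ngrams("where", 3, 3)))
# ['<wh', '<where>', 'ere', 'her', 're>', 'whe']

# Hypothetical 2-dimensional n-gram vectors.
table = {"<wh": [1.0, 0.0], "re>": [0.0, 2.0]}
print(word_vector("where", table, dim=2, n_min=3, n_max=3))     # [1.0, 2.0]
print(word_vector("wherever", table, dim=2, n_min=3, n_max=3))  # [1.0, 0.0]
```

The boundary markers keep the prefix n-gram `<wh` and the suffix n-gram `re>` distinct from the same letters in the middle of a word. Likewise, the n-gram `her` inside "where" is a different unit from the word "her", which would be marked as `<her>`.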
Transactions of the Association for Computational Linguistics, vol. 5, pp. 101–115, 2017. Action Editor: Mark Johnson. Submission batch: 10/2016; Revision batch: 4/2017; Published 4/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 Licence.

Cross-Sentence N-ary Relation Extraction with Graph LSTMs

Nanyun Peng (1)*, Hoifung Poon (2), Chris Quirk (2), Kristina Toutanova (3)*, Wen-tau Yih (2)
(1) Center for Language and Speech Processing, Computer Science Department, Johns Hopkins University, Baltimore, MD, USA; (2) Microsoft Research, Redmond, WA, USA; (3) Google Research, Seattle, WA, USA
npeng1@jhu.edu, kristout@google.com, {hoifung,chrisq,scottyih}@microsoft.com

Abstract
Past work in relation extraction has focused on binary relations in single sentences. Recent NLP inroads in high-value domains have sparked interest in the more general setting of extracting n-ary relations that span multiple sentences. In this paper, we explore a general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unified way of exploring different LSTM approaches and incorporating various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier. This simplifies handling of relations with arbitrary arity, and enables multi-task learning with related relations. We evaluate this framework in two important precision medicine settings, demonstrating its effectiveness with both conventional supervised learning and distant supervision. Cross-sentence extraction produced larger knowledge bases, and multi-task learning significantly improved extraction accuracy. A thorough analysis of various LSTM approaches yielded useful insight into the impact of linguistic analysis on extraction accuracy.

1 Introduction
Relation extraction has made great strides in newswire and Web domains. Recently, there has been increasing interest in applying relation extraction to high-value domains such as biomedicine. The advent of the $1000 human genome¹ heralds the dawn of precision medicine, but progress in personalized cancer treatment has been hindered by the arduous task of interpreting genomic data using prior knowledge. For example, given a tumor sequence, a molecular tumor board needs to determine which genes and mutations are important, and what drugs are available to treat them. Already the research literature has a wealth of relevant knowledge, and it is growing at an astonishing rate. PubMed², the online repository of biomedical articles, adds two new papers per minute, or one million each year. It is thus imperative to advance relation extraction for machine reading.

In the vast literature on relation extraction, past work focused primarily on binary relations in single sentences, limiting the available information. Consider the following example: "The deletion mutation on exon-19 of EGFR gene was present in 16 patients, while the L858E point mutation on exon-21 was noted in 10. All patients were treated with gefitinib and showed a partial response." Collectively, the two sentences convey the fact that there is a ternary interaction between the three entities in bold (EGFR, L858E, and gefitinib), which is not expressed in either sentence alone. Namely, tumors with L858E mutation in EGFR gene can be treated with gefitinib. Extracting such knowledge clearly requires moving beyond binary relations and single sentences.

N-ary relations and cross-sentence extraction have received relatively little attention in the past. Prior

* This research was conducted when the authors were at Microsoft Research.
¹ http://www.illumina.com/systems/hiseq-x-sequencing-system.html
² https://www.ncbi.nlm.nih.gov/pubmed
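The graph formulation can be illustrated by building a document graph whose nodes are the tokens of several sentences. This sketch covers only graph construction, not the graph LSTM itself. The following are our assumptions: the edge types (adjacent words, dependency arcs, and a link between consecutive sentence roots standing in for discourse relations), the input format, and the toy sentences with their dependency heads.

```python
from collections import deque


def build_document_graph(sentences, heads):
    """Build an undirected, edge-labeled token graph over several sentences.

    sentences: list of token lists
    heads:     per sentence, the 0-based head index of each token (-1 = root)
    """
    tokens, offsets = [], []
    for sent in sentences:
        offsets.append(len(tokens))
        tokens.extend(sent)
    adj = {i: [] for i in range(len(tokens))}

    def link(u, v, label):
        adj[u].append((v, label))
        adj[v].append((u, label))

    # Sequential edges between neighboring tokens, also across sentences.
    for i in range(len(tokens) - 1):
        link(i, i + 1, "next")
    # Syntactic edges from each sentence's dependency heads.
    roots = []
    for off, sent_heads in zip(offsets, heads):
        for j, h in enumerate(sent_heads):
            if h >= 0:
                link(off + j, off + h, "dep")
            else:
                roots.append(off + j)
    # Inter-sentential edges between consecutive sentence roots.
    for r1, r2 in zip(roots, roots[1:]):
        link(r1, r2, "root")
    return tokens, adj


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v, _label in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


# Hypothetical two-sentence input with toy dependency heads.
sentences = [["EGFR", "mutation", "observed"], ["gefitinib", "was", "given"]]
heads = [[1, 2, -1], [2, 2, -1]]
tokens, adj = build_document_graph(sentences, heads)
path = shortest_path(adj, tokens.index("EGFR"), tokens.index("gefitinib"))
print([tokens[i] for i in path])  # ['EGFR', 'mutation', 'observed', 'gefitinib']
```

Once all sentences share one graph, entities in different sentences are connected by ordinary paths. A graph LSTM could then propagate context along those paths, as the abstract describes.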