Transactions of the Association for Computational Linguistics, vol. 6, pp. 391–406, 2018. Action Editor: Katrin Erk.

Submission batch: 8/2017; Revision batch: 12/2017; Published 6/2018.

© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.


Measuring the Evolution of a Scientific Field through Citation Frames

David Jurgens, University of Michigan, jurgens@umich.edu
Srijan Kumar, Stanford University, srijan@stanford.edu
Raine Hoover, Stanford University, raine@stanford.edu
Dan McFarland, Stanford University, dmcfarla@stanford.edu
Dan Jurafsky, Stanford University, jurafsky@stanford.edu

Abstract

Citations have long been used to characterize the state of a scientific field and to identify influential works. However, writers use citations for different purposes, and this varied purpose influences uptake by future scholars. Unfortunately, our understanding of how scholars use and frame citations has been limited to small-scale manual citation analysis of individual papers. We perform the largest behavioral study of citations to date, analyzing how scientific works frame their contributions through different types of citations and how this framing affects the field as a whole. We introduce a new dataset of nearly 2,000 citations annotated for their function, and use it to develop a state-of-the-art classifier and label the papers of an entire field: Natural Language Processing. We then show how differences in framing affect scientific uptake and reveal the evolution of the publication venues and the field as a whole. We demonstrate that authors are sensitive to discourse structure and publication venue when citing, and that how a paper frames its work through citations is predictive of the citation count it will receive. Finally, we use changes in citation framing to show that the field of NLP is undergoing a significant increase in consensus.

1 Introduction

Authors use citations to frame their contributions and connect to an intellectual lineage (Latour, 1987). An author's scientific frame employs citations in multiple ways (Figure 1) so as to build a strong and multifaceted argument. These differences in citations have been examined extensively within the context of a single paper (Swales, 1986; White, 2004; Ding et al., 2014). However, we know relatively little about how these citation frames develop over time within a field and what impact they have on scientific uptake.

[Figure 1: Examples of citation functionality. In the example sentence "Unlike CITE, we use the method of CITE, which has been used previously for parsing (CITE).", the three citations are labeled Contrast, Use, and Background.]

Answering these questions has been largely hindered by the lack of a dataset showing how citations function at the field scale. Here, we perform the first field-scale study of citation framing by first developing a state-of-the-art method for automatically classifying citation function and then applying this method to an entire field's literature to quantify the effects and evolution of framing.

Analyzing large-scale changes in citation framing requires an accurate method for classifying the function a citation plays towards furthering an argument. Due to the difficulty of interpreting citation intent, many prior works performed manual analysis (Moravcsik and Murugesan, 1975; Swales, 1990; Harwood, 2009) and only recently have automated approaches been developed (Teufel et al., 2006b; Valenzuela et al., 2015). Here, we unify core aspects of several prior citation annotation schemes (White, 2004; Ding et al., 2014; Hernández-Alvarez and Gomez, 2016). Using this scheme, we create


one of the largest annotated corpora of citations and use it to train a high-accuracy method for automatically labeling a corpus. We apply our method to label the field of NLP, with 134,127 citations in over 20,000 papers from nearly forty years of work.

Our work provides four key contributions for understanding how authors frame their citations. First, we introduce a new large-scale representative corpus of citation function and state-of-the-art methodology for classifying citations by function. Second, we demonstrate that citations reflect the discourse structure of a paper but that this structure is significantly influenced by publication venue. Third, we show that differences in a paper's citation framing have a significant and meaningful impact on future scientific uptake as measured through future citations. Finally, by examining changes in the usage of citation functions, we show that the scholarly NLP community has evolved in how its authors frame their work, reflecting the maturation and growth of the field as a rapid discovery science (Collins, 1994). We publicly release our dataset and code to enable future research.

2 A Corpus for Citation Function

Citations play a key role in supporting authors' contributions throughout a scientific paper.¹ Multiple schemes have been proposed on how to classify these different roles, ranging from a handful of classes (Nanba and Okumura, 1999; Pham and Hoffmann, 2003) to twenty or more (Garfield, 1979; Garzone and Mercer, 2000). While suitable for expert manual analysis, many schemes include either fine-grained distinctions that are too rare to reliably identify or subjective classifications that require detailed knowledge of the field or author (Ziman, 1968; Swales, 1990; Harwood, 2009). Motivated by the desire to automatically examine large-scale trends in scholarly behavior, we address these issues by unifying the common aspects of multiple approaches in a single classification.

Footnote 1: For notational clarity, we use the term reference for the work that is cited and citation for the mention of it in the text.

2.1 Classification Scheme

Our classification captures the broad thematic functions a citation can serve in the discourse, e.g., providing background or serving as contrast (Oppenheim and Renn, 1978; Spiegel-Rüsing, 1977; Teufel et al., 2006a; Garfield, 1979; Garzone and Mercer, 2000; Abu-Jbara et al., 2013).² Citation function reflects the specific purpose a citation plays with respect to the current paper's contributions. We unify the functional roles common in several classifications, e.g., (Spiegel-Rüsing, 1977; Garfield, 1979; Peritz, 1983; Teufel et al., 2006a; Harwood, 2009; Dong and Schäfer, 2011), into the six classes shown in Table 1, along with their description and example.

Footnote 2: Another potential theme is citation sentiment (Athar, 2014; Kumar, 2016), but we omit this theme from our field-scale analysis because researchers have shown that negative sentiment is rare in practice (Chubin and Moitra, 1975; Vinkler, 1998; Case and Higgins, 2000) and can be quite subjective to classify due to textual mixtures of praise and criticism (Peritz, 1983; Swales, 1986; Brooks, 1986; Teufel, 2000).

Our annotation scheme is similar to the six classes of Abu-Jbara et al. (2013) and the twelve-class scheme of Teufel (2000). The former has separate classes for comparison and for contrast, whereas the latter has multiple finer-grained distinctions for different kinds of comparison and contrast. Here, we collapse these distinctions into a single class, COMPARISON OR CONTRAST, that signals the author is making some form of alignment between their work and another. In practice, we found that many citation contexts with alignments (such as this one) contain signals of both comparison and contrast; for our intended analyses, we considered this alignment signalling more important than whether the author was comparing or contrasting. Additionally, we introduce the FUTURE class to indicate that authors have forward-looking references for how their work might be applied later; these references are important for establishing a temporal lineage between works and, as we show later in §4, are the most frequent citation type in papers' Conclusion sections. Our adapted scheme enables us to conduct detailed analyses of the narrative structure of papers, venue citation patterns and evolution, and modeling the evolution of the whole field.

2.2 Annotation Process and Dataset

Annotation guidelines were created using a pilot study of 10 papers sampled from the ACL Anthology Reference Corpus (ARC) (Bird et al., 2008).


Class | Description | Example
BACKGROUND | P provides relevant information for this domain. | This is often referred to as incorporating deterministic closure (Dörre, 1993).
MOTIVATION | P illustrates need for data, goals, methods, etc. | As shown in Meurers (1994), this is a well-motivated convention […]
USES | Uses data, methods, etc., from P. | The head words can be automatically extracted […] in the manner described by Magerman (1994).
EXTENSION | Extends P's data, methods, etc. | […] we improve a two-dimensional multimodal version of LDA (Andrews et al, 2009) […]
COMPARISON OR CONTRAST | Expresses similarity/differences to P. | Other approaches use less deep linguistic resources (e.g., POS-tags Stymne (2008)) […]
FUTURE | P is a potential avenue for future work. | […] but we plan to do so in the near future using the algorithm of Littlestone and Warmuth (1992).

Table 1: Our set of six functions a citation may serve with respect to a cited paper P.

Annotators completed two rounds of pre-annotation to discuss their process and design guidelines. All citations were then doubly annotated by two trained annotators with expertise in NLP using the Brat tool (Stenetorp et al., 2012) and were then fully adjudicated to ensure quality. Following best practices for annotating citations (Athar, 2014), annotators saw an extended context before and after the citing sentence, provided from the output of ParsCit. Annotators were instructed to skip any instances whose context was corrupted or whose citance text did not match the regular citation style for ACL venues.³

Footnote 3: A small number of citation instances in our sample occurred in contexts where the surrounding text was malformed, which we attribute to OCR errors, to the citation being in the middle of a math-related context whose symbols were not converted, or to the citation occurring within a table or figure whose structure was treated as the surrounding text. In all cases, we viewed the instance as unsuitable for use as a training example since it contained little meaningful context. These cases accounted for fewer than 10 instances in our data. A second set of instances was excluded when ParsCit mislabeled the span of a citation, either shortening it or extending it to multiple citations' text. These wrong spans occurred in fewer than 10 instances in our sample. A third set of citation instances was excluded due to citation style differences, where a paper in an earlier iteration of a conference used numeric citations, e.g., "[12]." These were excluded to ensure uniformity in the data and occurred in two papers that were excluded from our initial sample. As these errors are sufficiently rare in our sample (<4%), we do not perform any further correction for these errors in the larger, un-annotated data.

The citation scheme was applied to a random sample of 52 papers drawn from the ARC. Each paper was processed using ParsCit (Councill et al., 2008) to extract citations and their references. As expected from prior studies (Teufel et al., 2006a; Dong and Schäfer, 2011), some citation functions were infrequent. We therefore attempted to oversample the infrequent classes FUTURE, EXTENSION, and MOTIVATION by using keywords biased toward extracting citing sentences of a particular class (such as the word "future" for the FUTURE class). The resulting citing sentences were then annotated and could potentially be assigned to any class. In total, 1436 citations in context were annotated for the fully-labeled 52 papers (mean 27.6 citations/paper) and 533 supplemental contexts from 133 papers were added by targeted sampling, bringing the total number of instances to 1969. Table 2 shows the class distribution in the final dataset. Consistent with prior work, the majority of citations are BACKGROUND (Moravcsik and Murugesan, 1975; Spiegel-Rüsing, 1977; Teufel et al., 2006b).

Citation Function | Count
BACKGROUND | 1021
USES | 365
COMPARES OR CONTRASTS | 344
MOTIVATION | 98
CONTINUATION | 73
FUTURE | 68

Table 2: Citation class distribution in our dataset.

3 Automatically Classifying Citations

The structure of a scientific article provides multiple cues for a citation's purpose. Our work draws on multiple approaches (Hernández-Alvarez and Gomez, 2016) to develop a classifier based on (1) structural features describing where the citation is located, (2) lexical and grammatical features for how the citation is described, (3) field features that take into account venue or other external information, and (4) usage features on how the reference is cited throughout the paper. Table 3 shows our features, which include ten novel feature types, in addition to several drawn from recent systems (Teufel, 2000; Teufel et al., 2006b; Dong and Schäfer, 2011; Wan and Liu, 2014; Valenzuela et al., 2015; Zhu et al., 2015).

Structural
  section # and remaining # of sections
  relative positions in paper, section, subsection, sentence, & clause
  # of other citations in subsection, sentence, & clause
  canonicalized section title
Lexical, Morphological, and Grammatical
  function patterns of Teufel (2000)
  topical similarity with cited paper
  the presence of each of 23 connective phrases
  verb tense
  lengths of the containing sentence and clause
  whether used inside of a parenthetical statement
  † bootstrapped function patterns
  † custom function patterns
  † citation prototypicality
  † citation context topics
  † paper topics
  † whether used in nominative or parenthetical form
  † whether preceded by a Pascal-cased word
  † whether preceded by an all-capital case word
Field
  # of years difference in publication dates
  whether the cited paper is a self-citation
  † citing paper's venue: journal/conference/workshop
  † reference's venue: journal/conference/workshop
  reference's citation count and PageRank (at time of the citation)
  † reference's Hub and Authority scores and Network Centrality (at time of the citation)
  † # of citations in common
Usage
  # of indirect citations
  # of direct citations
  # of indirect citations per section type
  # of direct citations per section type
  fraction of bibliography used by this reference

Table 3: Features for classifying citations. Novel features are marked with a †.

Function | Pattern
COMP. OR CON. | @SIMILARADJ to @REFERENTIAL @USE
COMP. OR CON. | the @RESEARCHNOUN of #N
EXTENDS | @CHANGENOUN of #N's
EXTENDS | @CHANGENOUN of citation's
MOTIVATION | @INSPIRATION by #N
USES | @1STPERSONPRONOUN(NOM) @USE the #N
USES | the #N corpus
USES | #D #N #N citation

Table 4: Examples of bootstrapped patterns learned and their associated class, where @ denotes a lexical class and # denotes a part of speech wildcard.

3.1 Features

Below, we describe in detail the three main categories of novel features.

Pattern-based Features. Patterns provide a powerful mechanism for capturing regularity in citation usage (Dong and Schäfer, 2011). Our patterns are a sequence of cue phrases, parts of speech, or lexical categories, like positive-sentiment words or specific categories that allow generalizations across phrases like "we extend" and "we build upon." We began with the largest publicly-available list of citation patterns (Teufel, 2000) and extended it with 132 new patterns and 13 new lexical categories based on a manual analysis of the corpus.

We then used bootstrapping to automatically identify new patterns as follows: each annotated context was converted into fixed-length patterns using (a) our 42 lexical categories, (b) part of speech wildcards, or (c) the tokens directly. To avoid semantic drift (Riloff and Jones, 1999), a bootstrapped pattern was only included as a feature if the majority of its occurrences were with a single citation function.⁴ Table 4 shows examples of these bootstrapped patterns.

Footnote 4: For computational efficiency, patterns were restricted to having between 3 and 8 tokens and at most two part of speech wildcards. Due to its high frequency, patterns for BACKGROUND were required to occur in at least 100 contexts.

Previous patterns primarily use cues from the same sentence as the citation (Teufel, 2000). However, authors often use multiple sentences to indicate a citation's purpose (Abu-Jbara and Radev, 2012; Ritchie et al., 2008; He et al., 2011; Kataria et al., 2011). For example, authors may first introduce a work positively, only to contrast with it in later sentences (Peritz, 1983; Brooks, 1986; Mercer et al., 2004). Indeed, the average text pertaining to a citation spans 1.6 sentences in the ARC (Small, 2011).

We therefore induce bootstrapped patterns specific to the citation sentence as well as the preceding and following sentences. Ultimately, 805 new bootstrapped patterns were added for the citing sentence, 669 for the preceding context, and 1159 for the following context, a total of over four times the number of manually curated patterns.

Topic-based Features. A context's thematic framing can point to the purpose of a citation even in the absence of explicit cues. For example, a citation in a context describing system performances and results is likely to be a COMPARISON OR CONTRAST, whereas one describing methodology is more likely to be USES. We quantify this thematic framing by using features based on topic models, computed over the sentence containing the citation and also over the paragraph containing the citing sentence. For each type of context, a topic model is trained over 321,129 respective contexts from the ARC. Table 5 shows example topics.

1) algorithm parameter model training method clustering
2) measure score metric information similarity distance
3) % result accuracy report achieve performance system
4) training weight feature och model set algorithm error
5) work related previous paper problem approach present

Table 5: The most probable words from five example topics learned from citation contexts.

Prototypical Argument Features. We also explored richer grammatical features, drawing on selectional preferences reflecting expectations for predicate arguments (Erk, 2007). We construct a prototype for each citation function by identifying the frequent arguments seen in different syntactic positions. For example, EXTENDS citations occur frequently as objects of verbs such as "follow" and "use", whereas USES citations have techniques or artifact words as dependents; Table 6 shows more examples. Each class's selectional preferences are represented using a vector for the argument at each relation type, constructed by summing the vectors of all words appearing in it. Each function is represented as a separate feature whose value is the average similarity of an instance's arguments with the class's preferences for all observed syntactic relationships (i.e., how similar are the syntactically-related words to the function's preferences). Our work differs from dependency-based features from prior work that use separate features for each unique dependency path and argument (Athar and Teufel, 2012; Abu-Jbara et al., 2013); in contrast, we use a single feature for each path with a distributed representation for its arguments, which allows our features to generalize to similar words that are unseen in the training data.

Function | Path | Arguments
MOTIVATION | nmod⁻¹ | inspire, work, show
MOTIVATION | nmod⁻¹, nmod⁻¹ | exemplify, direction, inspire
MOTIVATION | nsubj⁻¹ | show, use, suggest
USES | nmod⁻¹ | use, describe, propose
USES | dobj | use, follow, see
USES | dep⁻¹ | system, algorithm, mechanism
COMP. OR CONT. | nmod⁻¹, nmod⁻¹ | similar, related, use
COMP. OR CONT. | dep⁻¹ | system, method, approach
COMP. OR CONT. | nsubj⁻¹, dobj | approach, technique, rule
EXTENDS | amod | previous, prior, unsupervised
EXTENDS | nmod⁻¹, nmod⁻¹ | base, version, extension
EXTENDS | dobj⁻¹ | follow, extend, unfold

Table 6: Examples of citation function selectional preferences with the most-frequent arguments seen for each path. Each dependency path feature value reflects the similarity of (i) the average word vector for that path's arguments with (ii) the vector of the path's argument in a given context, if the path is present.

3.2 Experimental Setup

Models. All models were trained using a Random Forest classifier, which is robust to overfitting even with large numbers of features (Fernández-Delgado et al., 2014). After limited grid search over possible configurations,⁵ we set parameter values as follows. The number of random trees is 2500 and we required each leaf to match at least 5 instances. To overcome the class imbalance, we use SMOTE (Chawla et al., 2002) to generate synthetic examples in the training fold using the 5 nearest neighbors. The classifier is implemented using SciKit (Pedregosa et al., 2011) and syntactic processing was done using CoreNLP (Manning et al., 2014). Selectional preferences used pretrained 300-dimensional GloVe vectors from the 840B token Common Crawl (Pennington et al., 2014). The topic model features used an LDA with 100 topics.

Footnote 5: The grid search was performed using the following parameter ranges: number of trees [100, 500, 1000, 2500]; maximum depth of the decision tree as n/10 or √n, where n is the number of features; minimum leaf size in decision tree [2, ..., 10]; number of topics [50, 100, 250, 500]; and whether to use SMOTE (Chawla et al., 2002).

Data. Annotated data is crucial for developing high accuracy for rare citation classes. Therefore, we integrate portions of the dataset of Teufel (2010),⁶ which has fine-grained citation function labeled for ACL-related documents using the annotation scheme of Teufel et al. (2006b). We map their 12 function classes into our six classes (see Appendix A). When combining the two datasets, we omit the data labeled with their BACKGROUND-equivalent class to reduce the effects of a large majority class and because instances of the FUTURE class are merged into BACKGROUND according to their scheme. The resulting citation function dataset contains 3,083 instances.

Footnote 6: Their original data may be obtained at http://www.cl.cam.ac.uk/~sht25/CFC.html and we distribute a re-annotated version of this with our data.

Evaluation. Evaluation is performed using cross-validation where each fold leaves out all citations of a single paper. Stratifying by paper instead of instance is critical: since multiple citations may appear in the same sentence, instance-based stratification would leak information between training and test. We also note that when performing cross-validation, we compute the bootstrapped patterns and prototypical argument features using only contexts from the training data. We report macro-averaged F1 scores across the six function classes.

Comparison Systems. We compare against three state-of-the-art systems which all use similar citation function classifications. Abu-Jbara et al. (2013) use a combination of lexicons, structural, and syntactic features for classification. Instances are classified using a linear kernel SVM. Their described method also uses a second CRF-based classifier to include neighboring sentences in the citation context. As the dataset for this citation-span classifier is not public, we are unable to reproduce this part of their system. However, the authors note that using the citing sentence alone is the correct context in 80% of the instances, so we view our implementation as a close approximation. Dong and Schäfer (2011) classify citations using a small set of lexicons and discourse features, which includes regular expressions on sentence parts of speech for capturing syntactic cues. Their model uses a naive Bayes classifier, which was shown to work well for their data. Teufel (2000) is the most similar model to ours as it uses a subset of our lexical features and lexicons; the model uses a k-nearest neighbor classifier. We note that the original implementation used a custom syntactic tool for identifying aspects like verb tense, which we replaced with CoreNLP. We compare against the system of Teufel (2000) instead of the system of Teufel et al. (2006b) because the latter includes pattern-based features that are not fully specified or publicly available; however, the two systems are similar in their description. For all three compared systems, we use identical parameter values as reported in the papers.

Baselines. Two baselines are used for comparison: a Random baseline that selects a function at chance and a Majority-class baseline that labels all instances with the most frequent citation function, BACKGROUND.

System | Macro F1
This work | 0.530
  without topic features | 0.502
  without selectional prefs. | 0.464
  without bootstrapped pats. | 0.457
  without any novel features | 0.474
Abu-Jbara et al. (2013) | 0.410
Teufel (2000) | 0.273
Dong and Schäfer (2011) | 0.233
Majority-Class | 0.092
Random | 0.138

Table 7: Classifier performances.

3.3 Results and Discussion

Our methods substantially outperformed the closest state of the art and both baselines for both classification tasks, as shown in Table 7. All improvements over comparison systems are statistically significant (McNemar's, p ≤ 0.01). The closest-performing system was that of Abu-Jbara et al. (2013), which also had a heavily lexicon-based approach.

An ablation test suggests that each of our novel features contributed to the final performance. Notably, we observe that selectional preference and bootstrapped lexicon features had the largest impact on performance; both features capture local information, indicating that this type of information is important for recognizing function. While multiple prior works have focused on patterns to recognize function, our results suggest that machine-learned patterns and contextual regularities (topics or word vectors) provide highly accurate information. Indeed, examining the feature weighting in the random forest shows that features for structure (e.g., section number), topic, and selectional preference comprised most of the 100 highest-weighted features (76%).

The use of conjunctive features by the Random Forest was critical for high performance. All other non-conjunctive classifiers we tried resulted in substantially lower Macro F1: Naive Bayes, 0.286; k-nearest neighbor, 0.255 (k=3); and Linear-kernel SVM, 0.393 (C=1).⁷

Footnote 7: We observed mixed results when using a random forest with other approaches. Replacing the k-nearest neighbors classifier used in Teufel (2000) with a random forest improves citation function classification by 0.119 Macro F1. In contrast, replacing the SVM model used by Abu-Jbara et al. (2013) decreased performance by 0.072 Macro F1. We speculate that the larger feature space of Teufel (2000), which is more similar to our feature space, is more conducive to conjunctive features.

The resulting classifier performance is sufficient to apply it to the entire ARC dataset for the analyses in the next four sections. Nonetheless, errors remain. Our error analysis revealed that a main challenge is incorporating information external to the citing sentence. Consider the following example:

BilderNetle is our new dataset of German noun-to-ImageNet synset mappings. ImageNet is a large-scale and widely used image database, built on top of WordNet, which maps words into groups of images, called synsets (Deng et al., 2009).

Here the citing sentence appears much like a BACKGROUND citation when read in isolation; however, the preceding sentence reveals that the citing work's data is based on the citation, making its function USES though no explicit cues suggest this in the citing sentence. Thus, our error analysis supports the observation of Abu-Jbara et al. (2013) that citation context identification is an important step towards improving performance and that models with richer textual understanding are needed to understand how the citation relates to the broader context and narrative outside of the sentence.

In the next four sections, we apply our classifier trained on our combined dataset (3,083 citation instances) to the ACL Anthology to study what citation functions can tell us about scientific uptake and author behavior.

4 Narrative Structure of Citation Function

Scientific papers commonly follow a structured section narrative to frame their contributions: Introduction, Methodology, Results, and Discussion (Skelton, 1994; Nwogu, 1997). Each section in the narrative adopts argumentative moves designed to convince the reader of the work's claims (Swales, 1986; Swales, 1990). We hypothesize that this narrative is mirrored in how authors use their citations in sections, with the citation's function serving to further evoke the section's intended rhetorical frame (Goffman, 1974; Gumperz, 1982).

[Figure 2: Expected percentage of citation functions per section shows a clear narrative trajectory across sections. Sections shown: Introduction, Related Work, Motivation, Methodology, Evaluation, Results, Discussion, Conclusion. Functions shown: Uses, Motivation, Future, Extends, Compare or Contrast, Background. Vertical axis: 0.0 to 1.0.]

To test this hypothesis, the function classifier was applied to all 21,474 papers of the latest 2016 release of the ACL Anthology. This yielded a dataset of 134,127 citations between papers in the ARC. The resulting distributions of citation function (Figure 2) show that authors' citation framing indeed parallels the expected rhetorical framing seen in the writing: (1) establishing an intellectual lineage via BACKGROUND citations in the Introduction, Motivation, and Related Work sections, (2) introducing methodology with USES citations in the Methodology and Evaluation sections, (3) a large increase in COMPARISON OR CONTRAST for related literature in the Results and Discussion sections, and finally (4) closing comparisons and pointers to future directions.
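The per-section distributions described above can be illustrated with a minimal sketch. It assumes each classified citation is available as a (section, function) pair with a single hard label; the paper reports an expected percentage, which may instead weight classifier probabilities. The function name and input format are hypothetical and are not taken from the released code.

```python
from collections import Counter, defaultdict


def function_distribution_by_section(labeled_citations):
    """Return, for each section, the fraction of citations per function.

    labeled_citations: iterable of (section, function) pairs, one per
    citation (hypothetical input format, assumed for illustration).
    """
    counts = defaultdict(Counter)
    for section, function in labeled_citations:
        counts[section][function] += 1

    distributions = {}
    for section, function_counts in counts.items():
        total = sum(function_counts.values())
        # Normalize raw counts so each section's fractions sum to 1.
        distributions[section] = {
            function: n / total for function, n in function_counts.items()
        }
    return distributions


if __name__ == "__main__":
    toy = [
        ("Introduction", "Background"),
        ("Introduction", "Background"),
        ("Introduction", "Background"),
        ("Introduction", "Motivation"),
        ("Methodology", "Uses"),
    ]
    for section, dist in function_distribution_by_section(toy).items():
        print(section, dist)
```

Normalizing within each section, rather than over the whole corpus, is what makes sections of very different lengths comparable.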
These trends also mirror the thematic structure identified in full-paper textual analyses (Skelton, 1994; Nwogu, 1997). By showing that a section contains citations serving a variety of functions, our findings further point to a new direction for citation placement studies (Hu et al., 2013; Ding et al., 2013; Bertin et al., 2016), which have largely treated all citations within a section as equivalent.

5 Venues and Citation Patterns

Each publication venue has its own expectation for the types of work it accepts, e.g., the degree of polish or depth of experiments. As such, each venue has a distinct genre of writing, from the tentative results of workshop papers to journal papers with substantial synthesis. To what degree do venue genres affect the way authors cite? To answer this, we used the same experimental setup as the previous section. Figure 3 shows citation function by venue for the 134,127 citations.

[Figure 3: Venues attract different citation framing, as seen in the differences in the distribution of citation functions per venue in the journals (left two), workshops (right two) and conferences (middle). Venues shown: Comp. Ling., TACL, ACL, NAACL, EMNLP, EACL, ANLP, Workshop, SemEval. Functions shown: Uses, Motivation, Future, Extends, Compare or Contrast, Background. Vertical axis: 0.0 to 1.0.]

We find that similar venue types have similar distributions of citation framing. Journals have the highest percentage of BACKGROUND citations, suggesting that their extra space and wider temporal scope lend themselves more to positioning. Conference venues devote proportionally more space to contrast and comparison with other work, presumably because new NLP work is first presented at conferences and hence acceptance requires demonstrating that the proposed technique is better than existing ones. Workshops, by contrast, have relatively little comparison and instead use more BACKGROUND; the experimental nature of workshop papers presumably results in fewer potential prospects for comparison. Similarly, the SemEval workshops focus on rapidly developing new systems for a shared task, which is reflected in the papers' framing as primarily USES citations and relatively little COMPARISON OR CONTRAST, as the venue's shared task provides the broader framing connecting papers to related work.

6 Venue Evolution

The growth of the ACL community has been accompanied by the creation of new publication venues. How have these new venues evolved: have they become institutionalized and come to resemble established conferences, or have they become stylistically distinct, capturing different representations of knowledge? Citation framing provides an ideal lens for observing this evolution by measuring the degree to which a newer venue's papers' framing mirrors that used by papers in established venues. Here, we examine venue evolution in the ACL through its workshops.

Conferences within the ACL community frequently have co-located workshops that focus on a particular theme and have their own proceedings. The number of workshops has increased substantially with the growth of the field, from around ten workshops in the 1990s to over 100 workshops by 2010, with many workshops having multiple iterations across the years. This growth has led to the observation that ACL workshops have become like mini-conferences rather than venues for early-stage research and discussion (Daumé III, 2016).

Are workshops becoming more conference-like and, if so, is this a general trend or primarily seen in long-running workshops? We hypothesize that multiple iterations of the same workshop create institutional knowledge and community norms that lead to more conference-like papers over time. Here, we test this hypothesis by measuring whether workshop papers have become more similar in their citation framing to papers from the main conferences.

Experimental Setup. We repeat the classification setup from the previous experiment. We compare the average framing of a paper within a venue in a given year with the distribution for the two main conferences (ACL and NAACL) within that year. Distributions are compared using the Jensen-Shannon divergence, expressed as a similarity (as plotted in Figure 4), where 1 indicates that the venues are citing identically.
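The comparison described above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state: the divergence uses base-2 logarithms (so it is bounded between 0 and 1), and the similarity is taken as one minus the divergence. The function names are hypothetical.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions, in [0, 1]."""
    # The midpoint m is non-zero wherever p or q is non-zero,
    # so both KL terms below are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def framing_similarity(venue_counts, conference_counts):
    """Assumed similarity: 1 - JSD of the two citation-function distributions."""
    p = normalize(venue_counts)
    q = normalize(conference_counts)
    return 1.0 - js_divergence(p, q)


if __name__ == "__main__":
    # Toy counts over citation functions (illustrative values only).
    workshop = [8, 1, 1]
    conference = [6, 2, 2]
    print(framing_similarity(workshop, conference))
```

Under these assumptions, two venues with identical function distributions score 1.0 and two venues that share no citation functions score 0.0.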
[Figure 4: Both one-off and long-running workshops have proceedings that increasingly appear more conference-like in how their papers frame their citations, suggesting that workshops are in fact becoming more conference-like. Lines show a fitted linear regression with bootstrapped 95% confidence intervals, which stop overlapping beginning in 2007. Horizontal axis: Year of Workshop (1990 to 2010). Vertical axis: Similarity to ACL / NAACL (0.6 to 1.0). Series: One-off, Long-Running.]

Results. Workshops consistently became more conference-like in their papers' citation framing (Figure 4). Further, this trend of increasing similarity was seen much more for long-running workshops, suggesting that multiple-iteration workshops create their own conference-like norms that attract more conference-like papers each year. We speculate that the increasing similarity of workshops to conferences indicates the field has begun to congeal. Early workshops were like satellite conferences on peripheral topics, but as the field grows and the methods standardize, a norm of publication emerges such that conferences and workshops all resemble an institutionalized standard. Multiple iterations of a workshop accelerate this process by further establishing publication norms within a sub-community.

7 Predicting Future Impact

The scholarly narrative told through citations provides the reader with support for its claims and technical competence (Latour, 1987, p. 34). This framing could affect how the work is perceived and, ultimately, how it is received and cited within the community (Shi et al., 2010). Does the frame evoked by a paper through its citation functions (the way it compares to related work, or motivates, or points to the future) affect its reception?

Experimental Setup. To quantify how a paper's citation framing affects its future uptake, we constructed a negative binomial regression to predict the cumulative number of citations a paper received within the first five years after publication, which is known to be highly representative of the eventual citation count (Wang et al., 2013; Stern, 2014). In addition to variables for how the paper cites, we include variables from state-of-the-art features for predicting the citation count (Yan et al., 2011; Yogatama et al., 2011; Yan et al., 2012; Chakraborty et al., 2014; Dong et al., 2016), described below. We compare against a baseline regression model without citation framing and test whether the model's fit is improved when the framing is included as features.

All papers with at least five years of publication history in the anthology were considered, yielding a set of 10,434 papers. We used negative binomial models, which are more appropriate than linear regression as citation counts are non-negative discrete counts, and compared them using the Akaike Information Criterion (AIC). AIC measures each model's goodness of fit in proportion to the number of independent variables; when comparing models, the model with the minimal AIC is preferred (Akaike, 1974). If citation framing helps to explain future impact, we should see a lower AIC despite the penalty for including more variables in the model.

Five types of non-citation features were included. To model the amount of attention received by different research areas, each paper is associated with its distribution over 100 topics, built using LDA over the ARC. To capture diversity, we include the entropy of the topic distribution. We include the publication year since the size of the field changes over time. Multi-author papers are known to receive higher citation counts (Gazni and Didegah, 2011), partially due to the effects of self-citation (Fowler and Aksnes, 2007), and therefore we include the number of authors on the paper. To reflect how integrated the paper is, we include the number of references.⁸ Finally, we include the publication venue, using the individual conference or workshop in which the paper was published to control for variations in prestige between venues. The resulting model has a variance inflation factor of <10 for all variables.

Footnote 8: To control for collinearity between citation-related predictors, we regress out the number of citations from the citation function counts (Kutner et al., 2004; O'brien, 2007).

 | Baseline | with Framing
Intercept | −173.334∗∗∗ | −161.375∗∗∗
# of authors | 0.101∗∗∗ | 0.101∗∗∗
# of citations | 0.037∗∗∗ | 0.036∗∗∗
year | 0.088∗∗∗ | 0.082∗∗∗
topic diversity | −0.741∗∗∗ | 0.685∗∗∗
BACKGROUND | | 0.013∗∗
COMP. OR CON. | | 0.025∗∗∗
EXTENDS | | 0.021∗∗
FUTURE | | 0.016
MOTIVATION | | 0.014∗
USES | | 0.055∗∗∗
Log Likelihood | −17,485.700 | −17,416.820
Akaike Inf. Crit. | 35,693.400 | 35,567.630
∗p<0.1, ∗∗p<0.05, ∗∗∗p<0.01

Table 8: Regression models for predicting the total number of citations five years after publication show that a paper's citation framing provides a statistically significant improvement in model fit and reveal which type of framing yields more cited papers. Regression coefficients for venue and topics are omitted for space.

Results. Knowledge of how a paper frames its contributions helps improve predicting its future impact, with a statistically significant improvement in AIC when the distribution of citation functions is added (likelihood ratio test, p ≤ 0.01).

Table 8 shows which types of citations are significantly predictive of higher impact (p ≤ 0.01). Two main insights can be drawn from these results. First, papers maximize their future impact when framed as integrating many other technologies via USES citations. Second, works that frame their contributions through COMPARISON OR CONTRAST rather than BACKGROUND are more likely to have a higher impact. Latour (1987, p. 54) has suggested that authors may deflect criticism of their work (improving its perception) by claiming it as an extension, rather than comparing it with prior work. However, we did not observe this effect in how authors frame their work as COMPARISON OR CONTRAST or EXTENDS, with both having significant positive effects.

8 The Growth of Rapid Discovery Science

As scientific fields evolve, new subfields initially emerge around methods or technologies which become a focus of collective puzzle-solving and continual impro
vement(Moody,2004).NLPhaswit-nessedtheemergenceofseveralsuchsubfieldsfromtheearlygrammarbasedapproachesinthe1950s-1970s,tothestatisticalrevolutioninthe1990s,totherecentdeeplearningmodels(Sp¨arckJones,2001;Andersonetal.,2012).Collins(1994)pro-posedthatafieldcanundergoaparticularshift,re-ferringtoitasrapiddiscoveryscience,whenthefield(a)reacheshighconsensusonresearchtopicsaswellasmethodsandtechnologies,and(b)thendevelopsgenealogiesofmethodsandtechnologiesthatcontinuallyimproveononeanother.Overtime,thereisincreasedconsensusoncoreapproaches,andthefield’speripheryisextendedtonewre-searchpuzzlesratherthancontestingpriorefforts.Collinsclaimsthisshiftcharacterizesnaturalsci-ences,butnotmanysocialsciences,whicharein-steadmorelikelytoengageincontinualcontest-ingandturnoverofcoremethodsandassumptions(Evansetal.,2016).Wearguethatashifttorapiddiscoveryscienceshouldbevisibleinthewaycitationsareusedtoframeworksinthefieldasawhole.Specifically,weexpectthatasconsensusisreached(1)authorsareexpectedtohavefewercomparisonstootherworksandinsteadcansimplyacknowledgepastworkasbackgroundand(2)theremainingcomparisonscon-centrateonfewerworks,reflectingthoseworks’sta-tusasacceptedbenchmarksofperformance.Fur-ther,weexpectthatasamethodologicallineagede-velopsweshouldalsoobserveanincreasedconcen-trationofUSEScitationsonpapersdescribingmeth-odsanddata.Weproposethattheincreaseduseofsharedeval-uations,andthestatisticalmethodologyborrowedoriginallyfromelectricalengineering(Halletal.,2008;Andersonetal.,2012)hasledNLPtoundergoashifttowardsrapiddiscoveryscience.ExperimentalSetupWerepeatthesetupofprevi-ousexperimentsandmeasuretheexpectedcitation l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 2 8 1 5 6 7 6 2 6 / / t l a c _ a _ 0 0 0 2 8 p d . 
frame of a paper per year using all papers published in that year.

Figure 5 (plot not reproduced; x-axis: Year, 1980 to 2010; y-axis: % of Citations; series: CompareOrContrast, Uses, Background, Extends, Future, Motivation): Changes in the average citation frame in ACL papers reveal a continued decline in the percentage of COMPARISON OR CONTRAST citations and an increase in USES citations. The increase in BACKGROUND citations circa 2010 marks the start of the era of unlimited references in ACL conferences. Shaded regions show bootstrapped 95% confidence intervals.

Figure 6 (plot not reproduced; x-axis: Year, 1980 to 2010; y-axis: Expected Citation Count; series: Uses, CompareOrContrast): The average cited paper receives an increasing number of USES and COMPARISON OR CONTRAST citations per year, showing that the field increasingly builds upon the same set of papers, providing a methodological lineage. Shaded regions show bootstrapped 95% confidence intervals.

Results. The NLP field shows a significant increase in consensus consistent with the rise of rapid discovery science, evidenced through two main trends. First, NLP authors use a decreasing number of comparison and contrast citations (r=-0.899, p≤0.01), as seen in Figure 5. Instead of comparing to others, it seems that authors simply acknowledge prior work as BACKGROUND, which had a corresponding increase in relative frequency. Despite an increase in BACKGROUND citations, the total percentage of non-methodological citations still declines (r=-0.663, p≤0.01), with authors instead increasingly including more USES citations. Latour (1987, p. 50) argues that such non-methodological references are critical to an author's defense of an idea. We therefore interpret the observed decrease in non-methodological references as signaling a reduced need for authors to defend aspects of their work. Authors are able to compare against fewer papers due to the field's growing consensus on the validity of the problem and methodological contribution.

Note that there is a small but significant increase in the number of non-methodological references between 2009 and 2011. This transition corresponds to the date at which ACL venues began allowing unlimited references (2010 for ACL, 2011 for NAACL, etc.). Unlimited extra space for citations acted to modify authors' citation framing behavior; given unlimited space, authors chose to include proportionally more non-methodological citations (see footnote 9).

In the second trend, authors are more likely to use and compare against the same set of papers, as shown in Figure 6 by the rise in expected incoming citations to those works compared against (r=0.734, p≤0.01) and used (r=0.889, p≤0.01). For example, in 1991, authors compared with a diffuse group of parsing papers, e.g., (Shieber, 1988; Pereira and Warren, 1983; Haas, 1989), with such papers receiving at most three citations that year; whereas in 2000, most comparisons were to a core set of parsing papers, e.g., (Collins, 1999; Buchholz et al., 1999; Collins, 1997), with a much sharper (lower entropy) distribution of citations. These trends show the increased incorporation of prior work to form a lineage of method technologies, as well as increased consensus on which works are sufficient to compare against in order to establish a claim. These results also empirically confirm the observation of Spärck Jones (2001) that a major trend in NLP in the 1990s was an increase in reusable technologies and evaluations, like the BNC (Leech, 1992) and the Penn Treebank (Marcus et al., 1993).

More broadly, our work points to the future of NLP as a quickly moving field of high consensus and suggests that artifacts that facilitate consensus, such as shared tasks and open source research software, will be necessary to continue this trend.

Footnote 9: Note that this change acts against the general decrease in non-methodological citations; considering only 1980-2009, the decrease in non-methodological citations is even larger (r=-0.568, p≤0.01).

9 Conclusion

Authors cite works for different reasons (or functions), so regarding all citations as equivalent signals is potentially problematic. Many citations are perfunctory, while some less common ones are substantively relevant to the paper's argument. A careful analysis of citation reveals that authors cite works for multiple reasons: as background, motivation, extension, use, contrast,
or future. When authors utilize some forms of citation over others, they can significantly influence how their own work gets perceived and taken up by others (Latour, 1987). Simply put, citation functions help frame an article's reception. Moreover, a differentiation of citation functions affords a deeper understanding of how scholars develop arguments for different publication venues as well as how these venues may demand different forms of knowledge representation and arguments over time. In fact, these modes of citation help us understand the state of research efforts and their evolution more broadly for entire scientific fields like NLP.

In this paper, we relate all this using a new corpus annotated with citation function and by developing a state-of-the-art classifier for revealing scientific framing. In doing so, we demonstrate the importance of novel unsupervised features related to topic models and argument structure, and label all the citations for an entire field.

We then show that citation framing reveals salient behaviors of writers, readers, and the field as a whole: (1) authors are sensitive to discourse structure and venue when citing, (2) ACL workshops have evolved to become more like the mainstream conferences, with multi-iteration workshops being quicker to establish conference-like norms, (3) the way in which an author frames their work aids in predicting its future impact as the number of citations it receives, with the community favoring works that integrate many new technologies and also relate to prior work through comparison and contrast, and (4) the NLP field as a whole has seen increased consensus in what constitutes valid work, with a reduced need for positioning and excessive comparison, demonstrating its shift towards rapid discovery science. All data, materials, and code for all systems are available at https://github.com/davidjurgens/citation-function.

Acknowledgements

The authors thank Jure Leskovec, Vinod Prabhakaran, Will Hamilton, and the other members of the Stanford NLP Group for helpful discussions and comments, and thank Min-Yen Kan for hosting the ACL Anthology and for help with data. We also thank the area chair, Katrin Erk, and the reviewers for their helpful comments and suggestions. This work is also partially supported by the NSF under award IIS-1633036, the Stanford
Data Science Initiative, the Brown Institute for Media Innovation, and the Science Surveyor project.

References

Amjad Abu-Jbara and Dragomir Radev. 2012. Reference scope identification in citing sentences. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 80–90. Association for Computational Linguistics.
Amjad Abu-Jbara, Jefferson Ezra, and Dragomir R. Radev. 2013. Purpose and polarity of citation: Towards NLP-based bibliometrics. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 596–606.
Hirotugu Akaike. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723.
Ashton Anderson, Dan McFarland, and Dan Jurafsky. 2012. Towards a computational history of the ACL: 1980-2008. In Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries, pages 13–21.
Awais Athar and Simone Teufel. 2012. Context-enhanced citation sentiment detection. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 597–601. Association for Computational Linguistics.
Awais Athar. 2014. Sentiment analysis of scientific citations. Technical Report, University of Cambridge, Computer Laboratory.
Marc Bertin, Iana Atanassova, Yves Gingras, and Vincent Larivière. 2016. The invariant distribution of references in scientific articles. Journal of the Association for Information Science and Technology (JASIST), 67(1):164–177.
Steven Bird, Robert Dale, Bonnie J. Dorr, Bryan R. Gibson, Mark Joseph, Min-Yen Kan, Dongwon Lee, Brett Powley, Dragomir R. Radev, and Yee Fan Tan. 2008. The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics. In Proceedings of International Conference on Language Resources and Evaluation (LREC).
Terrence A. Brooks. 1986. Evidence of complex citer motivations. Journal of the American Society for Information Science, 37(1):34–36.
Sabine Buchholz, Jorn Veenstra, and Walter Daelemans. 1999. Cascaded grammatical relation assignment. In Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 239–246.
Donald O. Case and Georgeann M. Higgins. 2000. How can we investigate citation behavior? A study of reasons for citing literature in communication. Journal of the American Society for Information Science, 51(7):635–645.
Tanmoy Chakraborty, Suhansanu Kumar, Pawan Goyal, Niloy Ganguly, and Animesh Mukherjee. 2014. Towards a stratified learning approach to predict future citation counts. In Proceedings of the 14th Joint Conference on Digital Libraries (JCDL), pages 351–360.
Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.
Dale E. Chubin and Soumyo D. Moitra. 1975. Content analysis of references: Adjunct or alternative to citation counting? Social Studies of Science, 5:423–441.
Randall Collins. 1994. Why the social sciences won't become high-consensus, rapid-discovery science. Sociological Forum, 9(2):155–177.
Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics (EACL), pages 16–23.
Michael Collins. 1999. Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania.
Isaac G. Councill, C. Lee Giles, and Min-Yen Kan. 2008. ParsCit: An open-source CRF reference string parsing package. In Proceedings of International Conference on Language Resources and Evaluation (LREC).
Hal Daumé III. 2016. Workshops and mini-conferences, November. https://nlpers.blogspot.com/2016/11/workshops-and-mini-conferences.html.
Ying Ding, Xiaozhong Liu, Chun Guo, and Blaise Cronin. 2013. The distribution of references across texts: Some implications for citation analysis. Journal of Informetrics, 7(3):583–592.
Ying Ding, Guo Zhang, Tamy Chambers, Min Song, Xiaolong Wang, and Chengxiang Zhai. 2014. Content-based citation analysis: The next generation of citation analysis. Journal of the Association for Information Science and Technology (JASIST), 65(9):1820–1833.
Cailing Dong and Ulrich Schäfer. 2011. Ensemble-style self-training on citation classification. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), pages 623–631.
Yuxiao Dong, Reid A. Johnson, and Nitesh V. Chawla. 2016. Can scientific impact be predicted? IEEE Transactions on Big Data (TBD), 2(1):18–30.
Katrin Erk. 2007. A simple, similarity-based model for selectional preferences. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), pages 216–223.
Eliza D. Evans, Charles J. Gomez, and Daniel A. McFarland. 2016. Measuring paradigmaticness of disciplines using text. Sociological Science, 3:757–778.
Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research (JMLR), 15(1):3133–3181.
James H. Fowler and Dag W. Aksnes. 2007. Does self-citation pay? Scientometrics, 72(3):427–437.
Eugene Garfield. 1979. Citation Indexing: Its Theory and Application in Science, Technology, and Humanities. John Wiley & Sons, New York.
Mark Garzone and Robert E. Mercer. 2000. Towards an automated citation classifier. In Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence, pages 337–346.
Ali Gazni and Fereshteh Didegah. 2011. Investigating different types of research collaboration and citation impact: a case study of Harvard University's publications. Scientometrics, 87(2):251–265.
Erving Goffman. 1974. Frame analysis: An essay on the organization of experience. Harvard University Press.
John J. Gumperz. 1982. Discourse strategies, volume 1. Cambridge University Press.
Andrew Haas. 1989. A parsing algorithm for unification grammar. Computational Linguistics, 15(4):219–232.
David Hall, Daniel Jurafsky, and Christopher D. Manning. 2008. Studying the history of ideas using topic models. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 363–371.
Nigel Harwood. 2009. An interview-based study of the functions of citations in academic writing across two disciplines. Journal of Pragmatics, 41(3):497–518.
Qi He, Daniel Kifer, Jian Pei, Prasenjit Mitra, and C. Lee Giles. 2011. Citation recommendation without author supervision. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM), pages 755–764. ACM.
Myriam Hernández-Alvarez and José M. Gomez. 2016. Survey about citation context analysis: Tasks, techniques, and resources. Natural Language Engineering, 22(03):327–349.
Zhigang Hu, Chaomei Chen, and Zeyuan Liu. 2013. Where are citations located in the body of scientific articles? A study of the distributions of citation locations. Journal of Informetrics, 7(4):887–896.
Saurabh Kataria, Prasenjit Mitra, Cornelia Caragea, and C. Lee Giles. 2011. Context sensitive topic models for author influence in document networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), volume 22, page 2274. Citeseer.
Srijan Kumar. 2016. Structure and dynamics of
signed citation networks. In Proceedings of the 25th International Conference Companion on World Wide Web (WWW).
Michael H. Kutner, Chris Nachtsheim, and John Neter. 2004. Applied Linear Regression Models. McGraw-Hill/Irwin.
Bruno Latour. 1987. Science in action: How to follow scientists and engineers through society. Harvard University Press.
Geoffrey Leech. 1992. 100 million words of English: The British National Corpus (BNC). Language Research, 28(1):1–13.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the System Demonstrations at 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 55–60.
Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
Robert E. Mercer, Chrysanne Di Marco, and Frederick W. Kroon. 2004. The frequency of hedging cues in citation contexts in scientific writing. In Conference of the Canadian Society for Computational Studies of Intelligence, pages 75–88. Springer.
James Moody. 2004. The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. American Sociological Review, 69(2):213–238.
Michael J. Moravcsik and Poovanalingam Murugesan. 1975. Some results on the function and quality of citations. Social Studies of Science, 5(1):86–92.
Hidetsugu Nanba and Manabu Okumura. 1999. Towards multi-paper summarization using reference information. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 926–931.
Kevin Ngozi Nwogu. 1997. The medical research paper: Structure and functions. English for Specific Purposes, 16(2):119–138.
Robert M. O'brien. 2007. A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41(5):673–690.
Charles Oppenheim and Susan P. Renn. 1978. Highly cited old papers and the reasons why they continue to be cited. Journal of the American Society for Information Science, 29(5):227–231.
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research (JMLR), 12:2825–2830.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Fernando C. N. Pereira and David H. D. Warren. 1983. Parsing as deduction. In Proceedings of the 21st annual meeting on Association for Computational Linguistics (ACL), pages 137–144.
Bluma C. Peritz. 1983. A classification of citation roles for the social sciences and related fields. Scientometrics, 5(5):303–312.
S. B. Pham and A. Hoffmann. 2003. A new approach for scientific citation classification using cue phrases. In Proceedings of the Australian Joint Conference in Artificial Intelligence (AI), pages 759–771.
Ellen Riloff and Rosie Jones. 1999. Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of Association for the Advancement of Artificial Intelligence (AAAI), pages 474–479.
Anna Ritchie, Stephen Robertson, and Simone Teufel. 2008. Comparing citation contexts for information retrieval. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM), pages 213–222.
Xiaolin Shi, Jure Leskovec, and Daniel A. McFarland. 2010. Citing for high impact. In Proceedings of the 10th annual Joint Conference on Digital libraries (JCDL), pages 49–58. ACM/IEEE-CS.
Stuart M. Shieber. 1988. A uniform architecture for parsing and generation. In Proceedings of the 12th Conference on Computational Linguistics, pages 614–619. Association for Computational Linguistics.
John Skelton. 1994. Analysis of the structure of original research papers: An aid to writing original papers for publication. British Journal of General Practice, 44(387):455–459.
Henry Small. 2011. Interpreting maps of science using citation context sentiments: A preliminary investigation. Scientometrics, 87(2):373–388.
Karen Spärck Jones. 2001. Natural language processing: a historical review. University of Cambridge, pages 2–10.
Ina Spiegel-Rüsing. 1977. Bibliometric and content analysis. Social Studies of Science, 7:97–113.
Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. 2012. BRAT: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 102–107.
David I. Stern. 2014. High-ranked social science journal articles can be identified from early citation information. PloS One, 9(11):e112520.
John Swales. 1986. Citation analysis and discourse analysis. Applied linguistics, 7(1):39–56.
John Swales. 1990. Genre Analysis: English in Academic and Research Settings. Chapter 7: Research articles in English. Cambridge University Press, Cambridge, UK.
Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006a. An annotation scheme for citation function. In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, pages 80–87. Association for Computational Linguistics.
Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006b. Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 103–110.
Simone Teufel. 2000. Argumentative zoning: Information extraction from scientific text. Ph.D. thesis, University of Edinburgh.
Simone Teufel. 2010. The structure of scientific articles: Applications to Indexing and Summarization. CSLI Publications.
Marco Valenzuela, Vu Ha, and Oren Etzioni. 2015. Identifying meaningful citations. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 21–26.
Peter Vinkler. 1998. Comparative investigation of frequency and strength of motives toward referencing. The reference threshold model. Scientometrics, 43(1):107–127.
Xiaojun Wan and Fang Liu. 2014. Are all literature citations equally important? Automatic citation strength estimation and its applications. Journal of the Association for Information Science and Technology (JASIST), 65(9):1929–1938.
Dashun Wang, Chaoming Song, and Albert-László Barabási. 2013. Quantifying long-term scientific impact. Science, 342(6154):127–132.
Howard D. White. 2004. Citation analysis and discourse analysis revisited. Applied linguistics, 25(1):89–116.
Rui Yan, Jie Tang, Xiaobing Liu, Dongdong Shan, and Xiaoming Li. 2011. Citation count prediction: Learning to estimate future citations for literature. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM), pages 1247–1252.
Rui Yan, Congrui Huang, Jie Tang, Yan Zhang, and Xiaoming Li. 2012. To better stand on the shoulder of giants. In Proceedings of the 12th Joint Conference on Digital Libraries (JCDL), pages 51–60.
Dani Yogatama, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith. 2011. Predicting a scientific community's response to an article. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 594–604.
Xiaodan Zhu, Peter Turney, Daniel Lemire, and André Vellino. 2015. Measuring academic influence: Not all citations are equal. Journal of the Association for Information Science and Technology, 66(2):408–427.
John M. Ziman. 1968. Public Knowledge: An Essay Concerning the Social Dimensions of Science. Cambridge University Press, Cambridge, UK.

A Conversion of Teufel (2010) Data

As a part of training the classifier, instances from Teufel (2010) are used to supplement rare classes. Their data uses the scheme of Teufel et al. (2006b), which is similar to our scheme but has several fine-grained distinctions. We convert the instances from their dataset as follows:

Teufel et al. (2006b) classification    Our Label
Weak                                    Comparison or Contrast
CoCoGM                                  Comparison or Contrast
CoCo                                    Comparison or Contrast
CoCoR0                                  Comparison or Contrast
CoCoXY                                  Background
PBas                                    Extends
PUse                                    Uses
PModi                                   Extends
PMot                                    Motivation
PSim                                    Comparison or Contrast
PSup                                    Comparison or Contrast
Neut                                    Background
CoMetN                                  Comparison or Contrast
CoGoaN                                  Comparison or Contrast
CoMet                                   Comparison or Contrast
CoCoN                                   Comparison or Contrast
CoCoM                                   Comparison or Contrast
CoResN                                  Comparison or Contrast

Note that we omit instances whose converted class is BACKGROUND in order to reduce the effects of a large majority class and because instances of the FUTURE class are merged into BACKGROUND according to their scheme.
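The conversion described in this appendix can be sketched as a simple lookup followed by a filter. The snippet below is only an illustration and is not taken from the released code: the label strings are transcribed from the table above, while the names `TEUFEL_TO_OURS` and `convert_instances` and the `(text, label)` tuple layout are assumptions made for the example.

```python
# Label conversion transcribed from the Appendix A table.
# The dictionary name and the data layout are hypothetical.
TEUFEL_TO_OURS = {
    "Weak": "Comparison or Contrast",
    "CoCoGM": "Comparison or Contrast",
    "CoCo": "Comparison or Contrast",
    "CoCoR0": "Comparison or Contrast",
    "CoCoXY": "Background",
    "PBas": "Extends",
    "PUse": "Uses",
    "PModi": "Extends",
    "PMot": "Motivation",
    "PSim": "Comparison or Contrast",
    "PSup": "Comparison or Contrast",
    "Neut": "Background",
    "CoMetN": "Comparison or Contrast",
    "CoGoaN": "Comparison or Contrast",
    "CoMet": "Comparison or Contrast",
    "CoCoN": "Comparison or Contrast",
    "CoCoM": "Comparison or Contrast",
    "CoResN": "Comparison or Contrast",
}


def convert_instances(instances):
    """Map (text, teufel_label) pairs to (text, our_label) pairs.

    Instances whose converted class is Background are dropped,
    as described in the note under the table.
    """
    converted = []
    for text, teufel_label in instances:
        our_label = TEUFEL_TO_OURS[teufel_label]
        if our_label == "Background":
            continue  # omitted to limit the majority class
        converted.append((text, our_label))
    return converted


if __name__ == "__main__":
    sample = [("sentence 1", "PUse"), ("sentence 2", "Neut")]
    print(convert_instances(sample))
```

An unknown label raises a `KeyError` here rather than being silently skipped, which seems the safer default when supplementing rare classes.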
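As a supplementary worked check of the model comparison in Section 7, the log-likelihood and AIC values reported in Table 8 can be related through AIC = 2k - 2 ln L. The sketch below is not the authors' analysis code. It assumes that the framing model adds exactly six parameters (one per citation function listed in Table 8), and the helper names `implied_num_params` and `chi2_sf_even_df` are hypothetical. The chi-square tail probability uses a closed form that holds only for an even number of degrees of freedom, which avoids any dependency beyond the standard library.

```python
import math

# Values transcribed from Table 8.
LOGLIK_BASELINE = -17485.700
LOGLIK_FRAMING = -17416.820
AIC_BASELINE = 35693.400
AIC_FRAMING = 35567.630

# Assumption: one added coefficient per citation function in Table 8.
EXTRA_PARAMS = 6


def implied_num_params(aic, loglik):
    """Solve AIC = 2k - 2*loglik for the parameter count k."""
    return (aic + 2.0 * loglik) / 2.0


def chi2_sf_even_df(x, df):
    """Chi-square survival function; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


delta_aic = AIC_BASELINE - AIC_FRAMING
lr_stat = 2.0 * (LOGLIK_FRAMING - LOGLIK_BASELINE)
p_value = chi2_sf_even_df(lr_stat, EXTRA_PARAMS)

if __name__ == "__main__":
    print(f"AIC improvement: {delta_aic:.2f}")
    print(f"Likelihood ratio statistic: {lr_stat:.2f}")
    print(f"p-value (df={EXTRA_PARAMS}): {p_value:.3g}")
    print(f"Implied k, baseline: {implied_num_params(AIC_BASELINE, LOGLIK_BASELINE):.1f}")
    print(f"Implied k, framing: {implied_num_params(AIC_FRAMING, LOGLIK_FRAMING):.1f}")
```

Under these assumptions the AIC drops by 125.77, the likelihood ratio statistic is 137.76, and the implied parameter counts are 361 and roughly 367, a difference of six. That agrees with the six framing coefficients and with the reported p≤0.01.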