RESEARCH ARTICLE

A validation of coauthorship credit models with
empirical data from the contributions of
PhD candidates

an open access journal

German Centre for Higher Education Research and Science Studies, Department 2, Research System and Science Dynamics,
Schützenstraße 6a, 10117 Berlin, Germany

Paul Donner

Citation: Donner, P. (2020). A validation
of coauthorship credit models with
empirical data from the contributions of
PhD candidates. Quantitative Science
Studies, 1(2), 551–564.
https://doi.org/10.1162/qss_a_00048

DOI:
https://doi.org/10.1162/qss_a_00048

Received: 15 January 2020
Accepted: 04 April 2020

Corresponding Author:
Paul Donner
donner@dzhw.eu

Handling Editor:
Staša Milojević

Copyright: © 2020 Paul Donner.
Published under a Creative Commons
Attribution 4.0 International (CC BY 4.0)
license.

The MIT Press

Keywords: author contribution, coauthorship, coauthorship credit, fractional counting, harmonic
counting, validation study

ABSTRACT

A perennial problem in bibliometrics is the appropriate distribution of authorship credit for
coauthored publications. Several credit allocation methods and formulas have been
introduced, but there has been little empirical validation as to which method best reflects the
typical contributions of coauthors. This paper presents a validation of credit allocation
methods using a new data set of author-provided percentage contribution figures obtained
from the coauthored publications in cumulative PhD theses by authors from three countries
that contain contribution statements. The comparison of allocation schemes shows that
harmonic counting performs best and arithmetic and geometric counting also perform well,
while fractional counting and first author counting perform relatively poorly.

1. INTRODUCTION

The social creation of knowledge and its dissemination as publications is of critical importance
to the modern science system. The publications under a researcher’s name attest to their
accomplishments and form the basis of the reward system in science by having a major influence
on recognition by colleagues and career advancement. The fair sharing of authorship credit of
coauthored publications, especially in the context of research evaluation, is therefore a
significant topic of research in itself (Egghe, Rousseau, & van Hooydonk, 2000; Gauffriau, Larsen,
et al., 2008).

Coauthorship credit is a theoretical concept that refers to the idea that one may conceive of
a publication as being associated with a mathematical object with unity value (1.0), on which
mathematical operations can be performed. By this transformation the publication’s authors
can be credited with relative shares of the whole unit. These shares can be referred to as their
respective credits or partial publication equivalents. A paper’s credit is an abstraction, useful
for analytical and especially evaluative purposes, but it does not exist as such in reality. A real
paper can of course not be arbitrarily divided between coauthors. To give an example, a
two-author paper may be accompanied by a statement indicating that both authors contributed
equally to the work. Here, each author made a 50% contribution and claims 50% of the
paper’s credit. Because only a few publications state contributions explicitly and numerically, it
is crucial in bibliometric practice to use a credit allocation method that is on average in close
agreement with typical contributions.

In this connection the number of authors and the position in the author order can be
valuable clues to individual contributions, because in many fields, by convention, author ordering
is based on relative contribution. In general, the first author made the greatest contribution,
with successive authors having made successively smaller or equal contributions.

For any specific publication, the combination of author order and author count (and
corresponding authorship as a further clue) can never be an accurate indicator of authors’
contributions, as these vary between papers for the same combinations of byline position and
number of coauthors. However, author credit models that show high agreement with empirical
data can be very valuable for improving credit calculation on the level of aggregated
publication sets, where such case-to-case variation can be expected to cancel out with large enough
numbers of observations. It has been demonstrated a number of times that the choice of
authorship credit allocation method (also called counting method) has a substantial influence on the
results of bibliometric studies, not only on the level of authors, but even on the levels of
institutions and countries (Gauffriau et al., 2008; Gauffriau & Olesen Larsen, 2005; Huang, Lin,
& Chen, 2011; Lin, Huang, & Chen, 2013). Furthermore, the values of the credit allocation
method are often also constituent parts of advanced citation indicators. They are used as
weights for the citation indicators of specific publications for the units (i.e., authors,
organizations, countries) that contributed. The question of the most appropriate coauthor credit
allocation method therefore deserves comprehensive investigation.

In this paper we only consider methods which split the total publication credit in some way
across the coauthors and add up to 1.0, as only these methods avoid artificially inflating
publication counts. Furthermore, flexible credit allocation schemes modified by parameters are
not considered, as these would need tuning based on extensive empirical data appropriate
to an anticipated application (i.e., a discipline), which could lead to overfitting in the general
case. An overview of various author credit assignment methods is given in Waltman (2016).
The “whole counting” scheme, in which every coauthor gets a full publication credit
regardless of the number of coauthors, leads to distortions of paper counts, a phenomenon also
referred to as authorship inflation. Lindsey (1980) argued against the use of whole counting and
first author counting, which was then prevalent in the social studies of science literature.
While whole counting causes multiplication of authorship credit, first author counting also
leads to distortions, as it is not a viable sampling strategy to consider authors’ first-authored
papers as representative of their entire work because the order of authors in coauthored papers
is not random. He proposed to divide the unit authorship credit by the number of authors and
called this “adjusted counts.” De Solla Price (1981) also suggested, “in the absence of
evidence to the contrary,” to divide one whole unit of publication credit equally by the number
of authors to counteract authorship inflation. Their method, now mostly referred to as
“fractional counting,” is nowadays commonly used (Waltman, 2016) but apart from Lindsey (1980)
there are no studies that contain empirical data on how prevalent the different methods are in
bibliometric studies.

To briefly review the methods investigated here, the fractional counting method assumes
the contribution of each author of multiauthored papers to be equal and assigns each author 1/N
of the total credit (which is 1.0), where N is the number of authors. As other methods also
assign fractions of a whole publication to authors, it might be better to refer to this method
as equal fractional counting. In the first author counting method, the first author receives
all of the credit and all further coauthors nothing, under the assumption that the first author
is the most important contributor (cf. Cole & Cole, 1974, who called it straight count). One
can consider these two methods as the extremes of total equality and total inequality.
Geometric counting (Egghe et al., 2000), arithmetic counting (also called proportional or
positionwise counting) (Kalyane & Vidyasagar Rao, 1995; van Hooydonk, 1997) and harmonic
counting (Hagen, 2008; Hodge & Greenberg, 1981), on the other hand, assign different credit
shares based on the number of authors N and on the authors’ positions i of a publication, using
the formulas given below.

$$c_{\mathrm{geometric}} = \frac{2^{N-i}}{2^{N}-1}$$

$$c_{\mathrm{arithmetic}} = \frac{N+1-i}{1+2+\dots+N}$$

$$c_{\mathrm{harmonic}} = \frac{\frac{1}{i}}{1+\frac{1}{2}+\dots+\frac{1}{N}}$$

Details can be found in the respective cited sources. In many of these publications, the
respective methods have merely been proposed, with no attempts made to validate them.
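
As a minimal sketch of the counting schemes reviewed above, the following Python functions compute the credit share of the author at byline position i (counted from 1) of a paper with N authors. The function names and the use of exact fractions are illustrative choices and are not taken from the cited sources; the sketch assumes 1 ≤ i ≤ N and does not validate its inputs.

```python
from fractions import Fraction


def fractional(n: int, i: int) -> Fraction:
    """Equal fractional counting: every author receives 1/N."""
    return Fraction(1, n)


def first_author(n: int, i: int) -> Fraction:
    """First author counting: position 1 receives all of the credit."""
    return Fraction(1 if i == 1 else 0)


def geometric(n: int, i: int) -> Fraction:
    """Geometric counting: 2^(N-i) / (2^N - 1)."""
    return Fraction(2 ** (n - i), 2 ** n - 1)


def arithmetic(n: int, i: int) -> Fraction:
    """Arithmetic (proportional) counting: (N + 1 - i) / (1 + 2 + ... + N)."""
    return Fraction(n + 1 - i, n * (n + 1) // 2)


def harmonic(n: int, i: int) -> Fraction:
    """Harmonic counting: (1/i) / (1 + 1/2 + ... + 1/N)."""
    return Fraction(1, i) / sum(Fraction(1, k) for k in range(1, n + 1))


if __name__ == "__main__":
    n = 3  # hypothetical three-author paper
    for scheme in (fractional, first_author, geometric, arithmetic, harmonic):
        shares = [scheme(n, i) for i in range(1, n + 1)]
        # Every scheme considered here distributes exactly one unit of credit.
        assert sum(shares) == 1
        print(scheme.__name__, [str(s) for s in shares])
```

For a three-author paper, for instance, harmonic counting yields the shares 6/11, 3/11, and 2/11, whereas geometric counting yields 4/7, 2/7, and 1/7.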

We mention in passing that there are plenty of further suggestions for even more elaborate
credit allocation schemes, based more or less on the whim of their proponents rather than
any empirical data. By all logic the empirical investigation of which authors contributed
how much to publications should be conducted first. Only after that is it reasonable that
models be proposed that approximate the reality as much as possible.

Hagen (2010), being the exception to that observation, validated (perceived) coauthor credit
shares from prior studies in chemistry, psychology, and medicine. It was found that harmonic
counting fits the data better than arithmetic, geometric, or fractional counting. It should be
noted that mean credit values were used, such that there is only one value for
each possible combination of author position and author count. In contrast, for the present
study we are able to make use of observations on the level of individual credit statements
of authors of particular publications. The disadvantage of the former is that some of the
aggregate data points could be computed from more primary data points than others, thus distorting
the influence of these data points, which could be avoided by appropriate weighting.
Furthermore, information on variability within the observed values is lost.

A further conceptual difference is that in the present study we do not utilize perceptions of
typical author credit of readers of multiauthored papers but public statements by authors
themselves. In Hagen’s (2010) study, the data for psychology, obtained from Maciejovsky,
Budescu, and Ariely (2009), and medicine, obtained from Wren et al. (2007), were not based
on authors’ judgments about their own contributions, but on the perceptions of researchers
facing typical publications with specific numbers of authors and corresponding authorship
statements. In order for this type of data to be applicable for the purpose of validating author
credit assignment methods, it would first need to be corroborated by comparison with data of
judgments by authors themselves. It goes without saying that for the determination of
authorship credit, the authors’ statements of contribution are more relevant than the impressions (or
guesses) of readers.

For the chemistry data, Hagen (2010) uses figures from Vinkler (2000, Table 4, p. 608).
However, these data are not the empirical data but a simplified derived scheme devised by
Vinkler for use in his institute’s internal evaluation practice based on the research reported in
Vinkler (1993). In this latter paper, Vinkler was concerned about the unreflective use of first
author counting prevalent in scientometric research at the time. He surveyed authors from his
chemistry research institute on their activities for specific publications on seven types of
activities, such as experimental work, analysis and evaluation of experimental data, and writing
the text. For each publication and activity, participants judged how much they contributed on
a scale of five levels (100%, >50%, ~50%, <50%, ~10%). But it seems for later calculations that these were transformed to the integers 1 to 5 in descending order (Vinkler, 1993, note to Table 5). This means that the ratio across the range of values of originally 10:1 (100%:~10%) has been compressed to 5:1. For each of the activities, the author derived importance weighting factors by averaging the responses of researchers as to how important those activities are in producing chemistry papers, which were given in percentages. These weights range from 1.0 to 3.0. The data that come closest to empirical credit distribution are what Vinkler calls Percentage Total Contribution Factors and are derived from the empirical activity data and weights, which were summed and averaged across authorship positions and author counts. The exact calculation is difficult to reconstruct from the original publication. The important part is that the equivalent to credit shares are the rows labeled “TCF%” in Table 5 of Vinkler (1993). Summarizing these, we present the relevant figures in Table 1. We have also added the numbers of observations for each row as extracted from the text of Vinkler (1993, p. 217). If one compares the figures in Table 1 with those in Table 4 of Vinkler (2000), which are the same as in Hagen (2010), one can see that Hagen did not use Vinkler’s empirical data. Rather, he used a weighting scheme employed by the evaluation committee of Vinkler’s institute at the time. Kim and Kim (2015) use both the 1993 and 2000 data, assuming both to be empirical data. But even using the more correct data from Vinkler (1993), we would object that these are not contribution statements from authors regarding the amount of work they contributed to a paper but elaborate attempts at reconstructions based on very roughly quantified statements to various activities.

There is also an issue with the usage of the Wren et al. (2007) data.
In this study, perceived contributions in three prespecified categories, initial conception, work performed, and supervision, were collected, but overall contribution was not. Instead, the figures for the three categories are averaged and these are the numbers used by Hagen (2010). This simple averaging implicitly assumes that the three contribution categories are equally important, but this assumption is not substantiated in the paper at all.

Table 1. Percentage total contribution factors, extracted from Vinkler (1993, Table 5)

Author count (observations)   1st author   2nd author   3rd author   4th author   5th author
2 (6)                         71           29
3 (12)                        61           26           13
4 (13)                        54           31           9            6
5 (11)                        34           14           11           17           24

1.1. Contribution-Based Authorship Order

The ordering of coauthor names on publications by contribution is widespread. It is conventional, for example, in management, according to journal editors (von Glinow & Novelli Jr., 1982), in nursing, according to nurses expected to publish (Butler & Ginn, 1998), in library science, according to authors (Hart, 2000), medicine, according to Cochrane reviews authors (Mowatt, Shirran, et al., 2002), editorial board members (Bhandari, Einhorn, et al., 2003), and promotion committee members (Wren et al., 2007) and educational science, according to authors (Moore & Griffin, 2006).

1.2. Authorship Status Signals Not Based on Contribution

1.2.1. Alphabetical authorship

In disciplines in which alphabetical authorship order is the accepted convention, it would obviously be unreasonable to apply credit allocation methods that apply weights based on author position.
However, the problem arises of how to handle publications in these “alphabetized” disciplines that manifestly deviate from the norm of alphabetical order. In those cases it can be reasoned that the authors were either not aware of the norm or they intentionally refrained from submitting to it. In either case, the assumption of equal contributions ought to be rejected. The next difficulty is that in many cases it is simply not possible to determine whether a group of authors followed the alphabetical ordering convention or if their deliberately chosen author order just happened to be alphabetical, even though it was intended to reflect relative contribution. For example, the prevalence of alphabetical order for two authors by chance alone is 50%. But as the number of coauthors increases, the chance of coincidental alphabetical ordering quickly tends to zero. From the perspective of authors who would like to express their relative contributions by authorship order, the alphabetical ordering convention can be an impediment. In cases when the contribution order coincides with alphabetical order, the alphabetical norm would inadvertently lead to distorted perceptions of contributions.

Waltman (2012) has conducted a large-scale empirical study on the prevalence of intentionally alphabetical authorship in coauthored publications; that is, alphabetical coauthorship corrected for the probability of incidental alphabetical ordering. The share of intentionally alphabetically ordered coauthored publications has decreased from about 9% in 1981 to about 4% in 2011. On the level of disciplines, as approximated by Web of Science subject categories, there are stark differences in the use of intentional alphabetical authorship order. It is common in “Mathematics,” “Business, finance,” “Economics,” and “Physics, particles & fields.” However, also in these disciplines, alphabetical order is far from universal in the studied period (2007–2011).
The percentages of alphabetical order range from 73% to 57%. The rates seem to be declining lately in “Mathematics” and “Economics” while alphabetical authorship has increased over time in “Business, finance.” It is important to keep in mind that the rates may appear higher to researchers from these fields because of incidental alphabetical ordering.

1.2.2. Last authorship and corresponding authorship

There is as of yet no conclusive empirical evidence from prior studies about the specific signal of corresponding authorship and last authorship with respect to contribution to publications across different disciplines. While there are conventions in disciplines by which group leaders are indicated by last authorship, (a) this does not allow the conclusion that all last authors are group leaders and (b) neither last authorship nor corresponding authorship in themselves tell anything about how much these authors have, on average, contributed.

Laudel (2002, p. 11), based on interviews and publication data, reports on authorship position conventions in highly collaborative work in interdisciplinary research at the intersection of biology, physics, and chemistry:

    Nearly all coauthors were ordered in the following way. The first author is the scientist who conducted the experimental work; that is, a doctoral student or a postdoctoral fellow. […] The seven publications with the permuted order are of special interest. In these cases the group leader was first author, while another experimenter was listed last. In two cases the group leader did not change from the experimental role to the conceptual role but assumed both.
    This seems due to the group’s specific content of work (development of research techniques and instruments). […] In the case of a DOL [division of labor], usually the scientist who conducted the larger part of the experimental work is the first author, followed by the experimenter of the collaborating group. Both group leaders are last authors on the coauthorship list.

In this case, the conventions of listing research group leaders last and of authorship order by decreasing overall contribution to the work were used. The mentioned exceptions to the last position rule suggest that the contribution-based ordering convention overrides the group-leaders-last convention for the studied community of researchers.

Wren et al. (2007) report on a survey of the perceived authorship credit and influence of last author position and corresponding authorship in medicine. They elicited perceptions of North American promotion committee members on the perceived credit percentages of hypothetical three- and five-author papers in three specified contribution categories: initial conception, work performed, and supervision. The results indicate that last authors are perceived as having made important contributions to initial conception and supervision (about 50%), but far smaller contributions to work performed. The two different conditions for indicated corresponding author in five-author papers suggest that a corresponding author is perceived to have contributed substantially to initial conception and supervision, but far less so if the author is listed as the middle author rather than the last author. To a lesser degree, the perception of contribution to the category of work performed also increases if the middle author is the corresponding author. From the data no conclusions to overall contribution can be drawn, because it is unknown how the three work categories would combine to an overall contribution.
Sauermann and Haeussler (2017) study the number of listed contribution categories in the multidisciplinary journal PLOS ONE per author position. They point out that such statements are not informative about how much a given author contributed to the category and how important the category was for the given paper. Corresponding authors contributed on average to more categories than other authors, and this effect can be found across all authorship positions.

It must be concluded that extant studies give an inconclusive picture of the interaction of contribution, corresponding authorship, and last author position. More empirical studies are needed before discipline-specific adjustments to authorship credit assignment methods based on last or corresponding authorship can be considered. We therefore restrict our study to the information contained in author count and position and refrain from making adjustments to the studied credit methods, as it would be premature to do so at the current state of knowledge.

1.3. The Contribution of This Paper

In the remainder of the paper we comparatively validate the quality of several coauthor credit assignment methods using a novel empirical data set. We collected explicit numerical
METHODS AND DATA The empirical data used in this study is derived from contribution statements of PhD candi- dates for coauthored publications used in cumulative dissertation theses. Cumulative disserta- tion theses are theses that consist entirely or partially of published material, such as journal papers, book chapters, or conference papers. PhD theses of German graduates were searched on Google Scholar, which indexes the full texts of theses archived in many university repositories. Searches were made by the German and English keywords “Eigenanteil”, “eigener Anteil”, “Eigenbeitrag”, “Anteilserklärung”1, and “self-contribution” in combination with keywords for PhD theses (“dissertation”, “PhD thesis”, and “doctoral thesis”). Furthermore, we searched for universities with English language guidelines for coauthor- ship contribution statements in cumulative PhD theses on Google with the keyword “theses” in combination with “contribution statement” or “coauthorship statement”, which yielded a number of universities with such policies in Australia and New Zealand. For each identified university, we searched the university’s institutional repository by a full text search for such statements and inspected the found hits. All found dissertations were checked and statements of contribution to coauthored pub- lished articles, proceedings papers, and edited book chapters given in percentages were man- ually extracted. If equally shared first authorships were declared, the same percentage was used for the indicated authors, as typically only that of the dissertation author was specified explicitly. For two-author publications for which the contribution percentage for one author is given explicitly we also recorded the remaining proportion to 100% for the other author as this can be inferred directly in these cases. We also used percentage figures for article coauthors other than the thesis author, where these were given. 
Verbal narrative declarations of contribution were not considered. Declarations giving separate percentage contribution shares for multiple work tasks but no overall estimate were also not considered. Contributions to unpublished conference talks, news items, and working papers (unless also published in a journal) were discarded. Publications that were not published at the time of thesis publications were searched and verified and only retained if they were eventually published and the final versions had the same authors and authorship order as in the contribution statement. Percentage contribution statements to single-authored publications were not collected, as these in all cases were given as 100%, if they were mentioned at all.

While this sample is by no means representative, it does have the advantage that the contribution statements are made publicly and are checked and approved by the supervisors and coauthors, which is required by the graduation regulations, and should therefore be relatively accurate and reliable. It should be stressed that all used credit figures are documented in the final published theses. Nevertheless, the data cannot be claimed to be objective, as the figures may be shaped by team-internal negotiation processes. As the sample is restricted to a specific author group, namely PhD researchers, working and publishing under their specific conditions, these conditions inevitably influence the data. In particular, PhD researchers are commonly expected to provide major contributions to coauthored publications in order for them to count towards their cumulative theses, as a demonstration of independent research ability is a regular requirement for graduation. For this reason, the authorship positions in a sample of PhDs’ publications will not be distributed randomly but be concentrated toward the position indicating most contribution relatively more often; that is, first author position. Nevertheless, for the purpose of validating generally applicable authorship credit models, the specific nature of the sample is of little consequence, as the methods should be valid regardless of the specifics of the publications and authors. In other words, the nature of the data does not cause any inherent systematic distortions that would be biased towards any of the validated methods.

¹ The German terms mean “own share” (twice), “own contribution,” and “declaration of contribution.”

Table 2. Number of observations by author count and author position

Author    Author position
count     1     2     3     4     5     6     7     8     9     10    11    12    17    Row %
2         51    45    0     0     0     0     0     0     0     0     0     0     0     12.3
3         77    34    31    0     0     0     0     0     0     0     0     0     0     18.3
4         62    27    27    23    0     0     0     0     0     0     0     0     0     17.9
5         46    19    16    15    15    0     0     0     0     0     0     0     0     14.3
6         54    11    10    11    8     8     0     0     0     0     0     0     0     13.1
7         28    10    8     6     5     4     5     0     0     0     0     0     0     8.5
8         19    4     3     1     2     1     0     1     0     0     0     0     0     4.0
9         14    1     2     2     2     1     1     1     1     0     0     0     0     3.2
10        8     2     0     0     0     1     1     0     0     0     0     0     0     1.5
11        7     1     0     0     0     2     0     0     0     0     0     0     0     1.3
12        2     2     2     1     1     2     1     1     1     1     1     1     0     2.1
13        1     2     2     1     0     0     0     0     0     1     0     0     0     0.9
14        1     1     0     0     0     0     0     0     0     0     0     0     0     0.3
15        4     0     1     0     1     0     0     0     0     0     0     0     0     0.8
16        2     0     0     0     0     0     0     0     0     0     0     0     0     0.3
17        1     1     0     2     0     0     0     0     0     0     0     0     1     0.6
18        2     0     0     0     0     0     0     0     0     0     0     0     0     0.3
21        0     0     0     1     0     0     0     0     0     0     0     0     0     0.1
28        0     1     0     0     0     0     0     0     0     0     0     0     0     0.1
48        0     0     1     0     0     0     0     0     0     0     0     0     0     0.1
74        0     1     0     0     0     0     0     0     0     0     0     0     0     0.1
Column %  48.7  20.8  13.2  8.1   4.4   2.4   1.0   0.4   0.3   0.3   0.1   0.1   0.1
Table 3. Number of dissertations and publications by subjects

Subject                  Dissertations   Publications
Biology                  25              104
Medicine                 19              71
Chemistry                19              68
Engineering              16              62
Economics and business   9               23
Psychology               7               23
Pharmaceutics            4               27
Nursing                  4               18
Sport science            4               18
Nutrition                3               7
Geology                  3               6
Astronomy                2               5
Agricultural science     2               4
Environmental science    2               2
Mathematics              1               4
Political science        1               3
Arts                     1               1
Educational science      1               1
Sociology                1               1
Tourism studies          1               1
Veterinary medicine      1               1

As there seems to be considerable disagreement about the relative contributions of corresponding authors who are not first authors (Du & Tang, 2013) and heterogeneity in the meaning of the corresponding authorship signal between disciplines, we do not study credit allocation schemes modified to accommodate for corresponding authorship. Because statements about equal contributions or shared first authorship are not readily available in machine-readable form at the moment and hence impractical to use at scale, we do not take modifications for shared first authorships of credit allocation schemes into account either.

The data set is published at https://doi.org/10.5281/zenodo.3755227.

3. RESULTS

One hundred twenty-five PhD theses completed between 2005 and 2019 at 22 different universities, fulfilling the above stated criteria, were found. Of these dissertations, 53 originated in Germany, 52 in Australia, and 20 in New Zealand. These included 465 combinations of thesis
and article, from 382 unique articles. Considering all stated contribution figures (i.e., combinations of article and author), there were 778 data points, which were spread across author counts and author positions as shown in Table 2. In this table, the rows refer to the number of coauthors of papers while the columns refer to the specific positions in the author byline that an author can occupy. For example, the cell of author count 4 and author position 2 refers to data about the second listed author in papers with four authors, and there are 27 observations of this combination in the data. It can be seen that the observed cases are concentrated on the first author position across most author counts. Overall, almost half of all observations are of first author positions, confirming the expected skew due to the nature of the data. As mentioned above, PhD researchers are expected to make major contributions and major contributions are more likely to result in first authorship.

We classified theses by subject, based on the title and degree-conferring department. The distribution of theses and articles across subjects is shown in Table 3. Most theses and publications are from the natural sciences, engineering, psychology, and the health sciences, whereas the social sciences, mathematics, and arts and humanities are hardly represented.

Figure 1. Boxplot of distributions of claimed credit for first authors of papers with two to eight authors. Horizontal bar indicates the group median, circle the group mean. N (from two to eight authors): 51, 77, 62, 46, 54, 28, 19.

In Figure 1 we show that there is considerable variability in the claimed credit for the same position/author count combinations.
Displayed are distribution summary boxplots for the credit of first authors of papers by two to eight authors, as there are reasonable numbers of observations only for these cases. For example, while the average contribution of first authors for two-author papers is 76%, the observed values range from 30% to 100%, with a standard deviation of 20%. As can be verified in the accompanying data set, there are statements claiming 100% authorship credit for first author positions of papers with multiple authors.

Table 4. Lack-of-fit scores of authors’ credit attributions and predicted values of counting methods (N = 778 and 777)

Method         Lack-of-fit index   Lack-of-fit, one outlier removed
Harmonic       0.174               0.168
First          1.926               1.917
Fractional     0.852               0.852
Geometric      15.503              0.323
Proportional   0.381               0.365

Figure 2. Scatterplots of predictions of authorship credit allocation methods against authors’ claimed contributions. N = 778.

We follow the earlier literature by evaluating the validity of the studied credit allocation methods for our data using a lack-of-fit index. This measure is a sum of squared deviations of predictions from data, scaled by the size of the data set according to the formula

$$\text{Lack-of-fit} = \frac{1}{n-1} \sum \frac{(O - E)^2}{E}$$

where n is the total number of observations, O the observed value, and E the expected value (that is, the value predicted by the model).
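As a rough illustration of the lack-of-fit formula above, the following Python sketch computes the index for a handful of made-up contribution claims. The function names and the toy data are hypothetical, and the credit formulas are the commonly cited definitions of harmonic, arithmetic (proportional), geometric, and fractional counting; they are assumptions made for illustration and are not taken from the author's analysis code.

```python
from typing import Callable, Dict, Sequence


def harmonic_credit(i: int, n: int) -> float:
    """Harmonic credit: (1/i) divided by the sum of 1/j for j = 1..n."""
    return (1.0 / i) / sum(1.0 / j for j in range(1, n + 1))


def arithmetic_credit(i: int, n: int) -> float:
    """Arithmetic (proportional) credit: (n + 1 - i) / (n(n + 1)/2)."""
    return (n + 1 - i) / (n * (n + 1) / 2)


def geometric_credit(i: int, n: int) -> float:
    """Geometric credit: 2^(n - i) / (2^n - 1)."""
    return 2 ** (n - i) / (2 ** n - 1)


def fractional_credit(i: int, n: int) -> float:
    """Fractional credit: every author receives 1/n."""
    return 1.0 / n


def lack_of_fit(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (O - E)^2 / E over all observations, scaled by 1 / (n - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two observations")
    if any(e <= 0 for e in expected):
        raise ValueError("expected values must be positive (division by E)")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected)) / (n - 1)


# Hypothetical claims as (author position, author count, claimed share),
# with shares already divided by 100. These are NOT values from the data set.
claims = [(1, 2, 0.70), (2, 2, 0.30), (1, 3, 0.60), (2, 3, 0.25), (3, 3, 0.15)]
observed = [share for _, _, share in claims]

schemes: Dict[str, Callable[[int, int], float]] = {
    "harmonic": harmonic_credit,
    "arithmetic": arithmetic_credit,
    "geometric": geometric_credit,
    "fractional": fractional_credit,
}

for name, scheme in schemes.items():
    expected = [scheme(i, n) for i, n, _ in claims]
    print(f"{name}: {lack_of_fit(observed, expected):.4f}")
```

Because each squared deviation is divided by E, the sketch rejects expected values of zero instead of silently skipping them; a scheme that assigns zero credit to some positions needs a small positive substitute before the index can be computed.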
The values of the lack-of-fit measure, following Hagen (2010), are presented in Table 4. For this calculation, percentage values of the reported credit were divided by 100 in order to make the values comparable to the cited prior literature. For the first author count method, we changed the credit of nonfirst authors with a value of 0.0% to 0.1% to avoid division by zero. It is important to note that in contrast to the prior literature, such as Hagen (2010) and Kim and Kim (2015), we did not use the average values of contribution per combination of author count and author position. Instead, we used the full data on the level of individual observations. When using the whole sample (second column), the lack-of-fit of the geometric counting method is dominated by a single observation. In this observation, the claimed credit is 30% for the 17th author (who is not a corresponding author) of a paper with 17 authors. Geometric counting assigns this position only 0.0008% credit, while harmonic counting, for example, assigns 1.7%. For this reason, the third column in Table 4 shows the lack-of-fit values for all methods without this particular observation. The results are in line with the prior validation studies of Hagen (2010) and Kim and Kim (2015), the latter of which uses a similar but extended data set to the former and eliminates some inconsistencies. Harmonic credit most closely approximates the empirical data, followed by arithmetic and geometric credit. The data for the comparisons is visualized in Figure 2.

4. DISCUSSION

Collaborative research published in multiauthored works is pervasive in science, as is evaluation based on published outputs. The choice of coauthorship counting method is decisive for the results of bibliometric research and evaluation studies (Gauffriau & Olesen Larsen, 2005).
Therefore, in this study we have validated various proposed authorship credit allocation methods for multiauthored scientific publications by their model fit to empirical data of authorship credit statements of PhD graduates from three countries in cumulative dissertation theses. It was found that the harmonic credit method shows the highest agreement with the data, with a lack-of-fit index of 0.174. Arithmetic and geometric credit performed slightly worse (lack-of-fit: 0.381 for arithmetic; 15.503 for geometric, or 0.323 with the outlier removed), while fractional credit and first author-only credit are clearly inferior methods (0.852 and 1.926, respectively). The results are in agreement with those of Hagen (2010) and strengthen the case for replacing fractional counting with harmonic counting in scientometric research and evaluation. However, the results presented here should be interpreted with due caution, as the data set is not representative. Most of the sampled authorship statements are from the natural sciences, the health sciences, and engineering, and the data set is skewed towards the first author position.

4.1. A Research Agenda for Quantitative Author Contribution Studies

Despite being a topic of discussion since at least the 1980s, author credit attribution has made little progress. The reason is a lack of empirical studies that provide solid quantitative evidence. Future studies should complement the results presented here by surveying all authors of coauthored publications independently and asking them to assign percentage ranges of approximate authorship credit to all authors. Ranges, and not scalars, are necessary in order to capture the uncertainty inherent in such judgments.
Such studies should also take into account the different meaning of corresponding and last authorship across disciplines.

ACKNOWLEDGMENTS

This research was funded by German Federal Ministry of Education and Research grants 01PQ16004 and 01PQ17001.

AUTHOR CONTRIBUTIONS

PD: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing.

COMPETING INTERESTS

The author has no competing interests.

DATA AVAILABILITY

The complete data set is published at https://doi.org/10.5281/zenodo.3755226.

REFERENCES

Bhandari, M., Einhorn, T. A., Swiontkowski, M. F., & Heckman, J. D. (2003). Who did what?: (Mis)perceptions about authors’ contributions to scientific articles based on order of authorship. Journal of Bone and Joint Surgery, 85(8), 1605–1609.
Butler, L., & Ginn, D. (1998). Canadian nurses’ views on assignment of publication credit for scholarly and scientific work. Canadian Journal of Nursing Research Archive, 30(1), 171–183.
Cole, J. R., & Cole, S. (1974). Social stratification in science. Chicago, London: University of Chicago Press.
de Solla Price, D. (1981). Multiple authorship. Science, 212(4498), 986–986. https://doi.org/10.1126/science.212.4498.986-a
Du, J., & Tang, X. (2013). Perceptions of author order versus contribution among researchers with different professional ranks and the potential of harmonic counts for encouraging ethical co-authorship practices. Scientometrics, 96(1), 277–295. https://doi.org/10.1007/s11192-012-0905-4
Egghe, L., Rousseau, R., & van Hooydonk, G. (2000). Methods for accrediting publications to authors or countries: Consequences for evaluation studies. Journal of the American Society for Information Science, 51(2), 145–157. https://doi.org/10.1002/(SICI)1097-4571(2000)51:2%3C145::AID-ASI6%3E3.0.CO;2-9
Gauffriau, M., Larsen, P., Maye, I., Roulin-Perriard, A., & von Ins, M. (2008). Comparisons of results of publication counting using different methods. Scientometrics, 77(1), 147–176. https://doi.org/10.1007/s11192-007-1934-2
Gauffriau, M., & Olesen Larsen, P. (2005). Counting methods are decisive for rankings based on publication and citation studies. Scientometrics, 64(1), 85–93. https://doi.org/10.1007/s11192-005-0239-6
Hagen, N. T. (2008). Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis. PLOS ONE, 3(12), e4021. https://doi.org/10.1371/journal.pone.0004021
Hagen, N. T. (2010). Harmonic publication and citation counting: Sharing authorship credit equitably—not equally, geometrically or arithmetically. Scientometrics, 84(3), 785–793. https://doi.org/10.1007/s11192-009-0129-4
Hart, R. L. (2000). Co-authorship in the academic library literature: A survey of attitudes and behaviors. The Journal of Academic Librarianship, 26(5), 339–345.
Hodge, S. E., & Greenberg, D. A. (1981). Publication credit. Science, 213, 950.
Huang, M.-H., Lin, C.-S., & Chen, D.-Z. (2011). Counting methods, country rank changes, and counting inflation in the assessment of national research productivity and impact. Journal of the American Society for Information Science and Technology, 62(12), 2427–2436. https://doi.org/10.1002/asi.21625
Kalyane, V., & Vidyasagar Rao, K. (1995). Quantification of credit for authorship. ILA Bulletin, 30(3–4), 94–96.
Kim, J., & Kim, J. (2015). Rethinking the comparison of coauthorship credit allocation schemes. Journal of Informetrics, 9(3), 667–673. https://doi.org/10.1016/j.joi.2015.07.005
Laudel, G. (2002). What do we measure by co-authorships? Research Evaluation, 11(1), 3–15.
Lin, C.-S., Huang, M.-H., & Chen, D.-Z. (2013). The influences of counting methods on university rankings based on paper count and citation count. Journal of Informetrics, 7(3), 611–621. https://doi.org/10.1016/j.joi.2013.03.007
Lindsey, D. (1980). Production and citation measures in the sociology of science: The problem of multiple authorship. Social Studies of Science, 10(2), 145–162.
Maciejovsky, B., Budescu, D. V., & Ariely, D. (2009). The researcher as a consumer of scientific publications: How do name-ordering conventions affect inferences about contribution credits? Marketing Science, 28(3), 589–598. https://doi.org/10.1287/mksc.1080.0406
Moore, M. T., & Griffin, B. W. (2006). Identification of factors that influence authorship name placement and decisions to collaborate in peer-reviewed, education-related publications. Studies in Educational Evaluation, 32(2), 125–135. https://doi.org/10.1016/j.stueduc.2006.04.004
Mowatt, G., Shirran, L., Grimshaw, J. M., Rennie, D., Flanagin, A., Yank, V., … Bero, L. A. (2002). Prevalence of honorary and ghost authorship in Cochrane reviews. Journal of the American Medical Association, 287(21), 2769–2771. https://doi.org/10.1001/jama.287.21.2769
Sauermann, H., & Haeussler, C. (2017). Authorship and contribution disclosures. Science Advances, 3(11), e1700404.
van Hooydonk, G. (1997). Fractional counting of multiauthored publications: Consequences for the impact of authors. Journal of the American Society for Information Science, 48(10), 944–945. https://doi.org/10.1002/(Sici)1097-4571(199710)48:10%3C944::Aid-Asi8%3E3.0.Co;2-1
Vinkler, P. (1993). Research contribution, authorship and team cooperativeness. Scientometrics, 26(1), 213–230. https://doi.org/10.1007/BF0201680
Vinkler, P. (2000). Evaluation of the publication activity of research teams by means of scientometric indicators. Current Science, 79(5), 602–612.
von Glinow, M. A., & Novelli Jr., L. (1982). Ethical standards within organizational behavior. Academy of Management Journal, 25(2), 417–436.
Waltman, L. (2012). An empirical analysis of the use of alphabetical authorship in scientific publishing. Journal of Informetrics, 6(4), 700–711. https://doi.org/10.1016/j.joi.2012.07.008
Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391. https://doi.org/10.1016/j.joi.2016.02.007
Wren, J. D., Kozak, K. Z., Johnson, K. R., Deakyne, S. J., Schilling, L. M., & Dellavalle, R. P. (2007). The write position. EMBO Reports, 8(11), 988–991. https://doi.org/10.1038/sj.embor.7401095
