RESEARCH ARTICLE
Which factors are associated with Open Access
publishing? A Springer Nature case study
Fakhri Momeni1, Stefan Dietze1,2, Philipp Mayr1, Kristin Biesenbender3, and Isabella Peters3
1GESIS—Leibniz Institute for the Social Sciences, Cologne, Germany
2Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
3ZBW—Leibniz Information Centre for Economics, Kiel, Germany
Keywords: APC policies, bibliometrics, citation impact, machine learning, open access
ABSTRACT
Open Access (OA) facilitates access to research articles. However, authors or funders often
must pay the publishing costs, preventing authors who do not receive financial support from
participating in OA publishing and from gaining the citation advantage of OA articles. OA may
exacerbate existing inequalities in the publication system rather than overcome them. To
investigate this, we studied 522,411 articles published by Springer Nature. Employing
correlation and regression analyses, we describe the relationship between authors affiliated
with countries from different income levels, their choice of publishing model, and the citation
impact of their papers. A machine learning classification method helped us to explore the
importance of different features in predicting the publishing model. The results show that
authors eligible for article processing charge (APC) waivers publish more in gold OA journals
than others. In contrast, authors eligible for an APC discount have the lowest ratio of OA
publications, suggesting that this discount insufficiently motivates authors to
publish in gold OA journals. We found a strong correlation between the journal rank and the
publishing model in gold OA journals, whereas the OA option is mostly avoided in hybrid
journals. Also, results show that the countries' income level, seniority, and experience with OA
publications are the most predictive factors for OA publishing in hybrid journals.
1. INTRODUCTION
The unrestricted availability of Open Access (OA) publications is linked to the goal of granting
all interested parties free access to scientific knowledge and ensuring greater equality of access
(Munafo, Nosek et al., 2017). This view centers on the consumers of scholarly knowledge, who would then not have to pay for access. The authors of those articles, however, are affected by OA in two different ways: when choosing a publication model for an article and when receiving citations (and hence reputation) for articles published via a certain model (usually described as the citation advantage; see, for example, Langham-Putrow, Bakker, and Riegelman (2021)). Those two aspects of OA may
introduce significant biases and inequity into the scholarly publication and reputation system
because they may restrict participation in OA in particular ways (Bahlai, Bartlett et al., 2019).
an open access journal

Citation: Momeni, F., Dietze, S., Mayr, P., Biesenbender, K., & Peters, I. (2023). Which factors are associated with Open Access publishing? A Springer Nature case study. Quantitative Science Studies, 4(2), 353–371. https://doi.org/10.1162/qss_a_00253
DOI: https://doi.org/10.1162/qss_a_00253
Peer Review: https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00253
Received: 16 August 2022
Accepted: 6 March 2023
Corresponding Author: Fakhri Momeni, fakhri.momeni@t-online.de
Handling Editor: Ludo Waltman
Copyright: © 2023 Fakhri Momeni, Stefan Dietze, Philipp Mayr, Kristin Biesenbender, and Isabella Peters. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
The MIT Press

First, the OA publishing model generally shifts the publishing costs from readers to authors or their institutions and funders by introducing article processing charges (APCs). This can be a severe constraint for authors who cannot afford these costs or do not receive any financial support. To overcome this issue, most publishers have implemented an APC waiver/discount policy for authors from, for example, low-income countries (Lawson, 2015).
However, it remains an open question how the different options for OA publishing and waivers/discounts are considered and adopted by researchers with various characteristics, such as their countries' income level and also their seniority and gender, factors that are also often associated with the decision to publish OA (Iyandemye & Thomas, 2019; Olejniczak & Wilson, 2020; Simard, Ghiasi et al., 2021; Smith, Merz et al., 2021; Zhu, 2017). Rouhi, Beard, and Brundy (2022) discussed the waiver issues from the perspectives of the publisher, institutions, and developing countries. They pointed to the potential unfairness that APC-based models may create for authors and argued that waiver programs have yet to address this problem successfully. They suggested that meeting the equity standard requires a cross-functional approach involving publishers, funders, research institutions, individual researchers, libraries, and service providers.
To accommodate OA publishing costs, three funding options have emerged over time. First, diamond OA journals are funded by public institutions, such as libraries, which enable free reading and publishing for all researchers. Second, transformative agreements between public institutions and publishers have been introduced that include reading and publishing contracts and which are also funded by the institutions. In this case, there are no direct fees for authors, but their institutions pay the APCs as part of a consortium. Access to publishing and access to publications are limited to participating organizations only. Third, APCs can also be paid by the authors or their institutions themselves. The first option leads to gold OA at the journal level. Transformative agreements allow authors to publish in either gold OA or hybrid journals (which, for a fee, allow publishing individual articles as an OA variant). The third option is often associated with hybrid journals. All other publishing models for journals usually require funding via subscriptions, resulting in closed-access (CA) articles that can only be read after paying the article or journal fee.
The publishing model is also strongly associated with the visibility of authors and articles. For many researchers, it makes a difference in which journals they publish (e.g., given discipline-specific journal rankings). If they want to be noticed by others and/or seek promotion, it can be crucial to publish in reputable journals, especially for early-career researchers. To achieve this, researchers must overcome not only financial hurdles such as APCs; they also need, for example, English language skills and technical skills, as well as institutions that can help with legal advice or infrastructure support. Against this background, researchers have to decide which publishing model to choose and whether OA is not only an altruistic but a feasible option at all.
The second possible source of bias and inequity concerns what authors gain from paying for OA: It has already been shown that articles published as OA variants are more visible, leading to higher citation counts and altmetrics (Evans & Reimer, 2009; Fraser, Momeni et al., 2020; Lewis, 2018; McKiernan, Bourne et al., 2016; Ottaviani, 2016). Moreover, the Matthew effect shows that researchers who are already well known and widely cited receive even more citations (Farys & Wolbring, 2021), which directly affects the rewards of publishing in prestigious journals in terms of both prominence and citations. For researchers, publications play a central role in their daily practice and in the reputation system in which they operate. Publications enable researchers to build on the body of knowledge and to refer to earlier findings by citing the publications (which accumulate reputation in this way). Thus, access to publications is crucial for the progress of science and the building of reputation, both of which can be impeded by a lack of access to OA publishing options and by the risk of CA articles not being cited as frequently as OA articles.
Quantitative Science Studies
354
From that, we hypothesize that researchers with better access to financial resources have better access to publications, both in terms of access to read openly and in terms of access to publish openly. Associated with that may be an even stronger citation advantage for those researchers (usually WEIRD: Western, educated, industrialized, rich, and democratic (Henrich, Heine, & Norenzayan, 2010)) with extensive OA-publishing options. As such, OA may carry the risk of perpetuating already existing inequalities rather than resolving such marginalization in the scholarly communication system (Fuchs, Pearce et al., 2021).
2. RELATED WORK
Related work also indicates a strong association between economic factors, OA, and citation
advantages. The scientific output of countries is associated with their economic evolution because
scientific progress needs governments’ financial support. Samimi (2011) used a Granger Causality
Test to examine the causal relationship between scientific output and GDP in 176 countries and
found a two-way positive relationship between them. King (2004) compared published papers
and their citation impacts across countries and found that only 31 countries contributed to 98%
of the world’s highly cited papers and that the remaining 161 countries contributed less than 2%.
OA publishing is also highly influenced by the authors' country of affiliation, because it determines APC waiver/discount policies or the availability of transformative agreements with publishers. Some publishers offer general waivers or have a discount policy for all of their journals for eligible authors, and the country's income level mainly determines eligibility. Lawson (2015) studied the waiver policies of the 32 most prominent publishers and found that 68% of them grant APC waivers. Simard et al. (2021) found that low-income countries publish and cite OA more than upper-middle and high-income countries. The positive correlation between OA citing and publishing is 1.3 times weaker for high-income countries than for other countries. Similarly, Iyandemye and Thomas (2019) showed that biomedicine researchers from low-income countries have the highest percentage of OA publishing. Smith et al. (2021) reported proportionately fewer OA articles published in Elsevier's journals for low-income countries, despite their eligibility for APC waivers.
Olejniczak and Wilson (2020) studied the articles published by faculty members at research universities in the United States and found that male and senior authors are more likely to publish in OA form. Zhu (2017) conducted a survey with over 1,800 researchers at 12 Russell Group universities1 to find the differences in OA publishing regarding discipline, seniority, and gender. The results revealed disciplinary differences in OA publishing (medical and life scientists are most likely to publish in gold OA journals), as well as a greater tendency toward OA publishing among senior authors and, across genders, among men.
1 https://russellgroup.ac.uk/about/our-universities/.

Besides its business model, a journal's rank is a decisive factor when choosing where to submit an article. Schroter, Tite, and Smith (2005) conducted a survey study with 28 international authors who submitted to the British Medical Journal and found that for authors, the journal's ranking is more important than the availability of OA.
Many studies have investigated the OA citation outcome, and most found a citation advantage for OA articles (Evans & Reimer, 2009; Fraser et al., 2020; Lewis, 2018; McKiernan et al., 2016; Ottaviani, 2016). However, given potential biases (e.g., quality bias, self-selection, mandating, self-archiving) and differences in sampling and control data across studies, it is difficult to conclude that receiving more citations is solely an effect of OA. Momeni, Mayr et al. (2021) studied the citation impact of flipping journals from CA to OA and generally found slightly higher growth in received citations compared with journals in the same discipline and impact factor range. However, they did not observe this trend in all scientific fields. Momeni, Mayr, and Dietze (2022) examined the correlation between different factors and authors' future h-index and found a positive but weak correlation between them.
One issue that is often discussed together with OA publishing and APCs is the problem of
predatory publishing. Predatory publishers take advantage of the OA movement but work
against good scientific practice. Ross-Hellauer, Reichmann et al. (2021) conducted a systematic review to study the threat to equity in science posed by open science implementations. They concluded that less well-resourced researchers, researchers from non-English-speaking countries, and early-career researchers are particularly affected by the predatory publishing problem.
3. RESEARCH QUESTIONS
We structure our study of the association between publishing models, the economic background of researchers, and other author-specific and structural factors around three major research questions:
RQ1: What is the relationship between the income level of researchers’ affiliation countries
and their publication behavior (do they prefer OA or CA)?
RQ2: What is the relationship between the income level of researchers' affiliation countries, their publication behavior (OA or CA), and their citation impact?
To answer these questions, we categorize corresponding authors based on the income level of their affiliation country and compare the access status of the articles they have published and their citation impact. Whereas the first two RQs are rather descriptive and aim to quantify the extent to which access to publish openly and access to read openly (and, with it, the likelihood of being cited) are related to the economic background of authors, the third RQ takes into account a variety of factors that have been shown to be strongly associated with tendencies to publish OA (Iyandemye & Thomas, 2019; Olejniczak & Wilson, 2020; Simard et al., 2021; Smith et al., 2021; Zhu, 2017).
RQ3: What factors (e.g., journals, articles, authors, or their countries) are associated with
selecting the business model of publications (OA versus CA)?
Here we aim to give a detailed view of the factors associated with OA publishing using correlation, regression, and machine learning analyses. To this end, structural features, such as APC waivers, are considered besides author-specific properties, such as gender or years of publishing activity (see Table 2). We also look closely at the different access forms of publications, such as gold OA, hybrid, and CA. At the journal level, the relationships between journal rankings, APCs, and research fields (Health Sciences, Life Sciences, Physical Sciences, Social Sciences, and multiple fields) are examined. In addition, possible country-related influencing factors are investigated, such as countries' income level, the existence of transformative agreements, or opportunities for researchers to obtain APC discounts or waivers. At the journal article level, the ratio of OA to CA citations in an article and the number of authors involved are examined. Other author-specific influencing factors can be gender and age, the ratio of OA to CA publications in the past, or the proportion of international coauthors.
4. DATA AND METHODOLOGY
To conduct our study, information on the business model, author characteristics, and article impact is needed, and several approaches and databases must be linked to obtain a complete data set.
4.1. Data Selection
For the business model of journals (OA, hybrid, CA), it is possible to crawl the information from the journal's or publisher's website or to look up sources such as the Directory of Open Access Journals (DOAJ) and Unpaywall, which both include OA information. However, information about the history of the business model of journals is rarely available. In recent years, many journals have converted (flipped) from CA to OA and vice versa, but often there is not enough information about the exact date on which the new access model started. The Open Access Directory (OAD), a wiki hosted by the School of Library and Information Science at Simmons University2, is the only resource containing a list of a few flipped journals and the date of flipping. The OA start date of journals was available in the DOAJ dataset until 2020. Bautista-Puig, Lopez-Illescas et al. (2020) and Momeni et al. (2021) used the OAD and DOAJ for their studies about flipping journals. Unfortunately, the DOAJ has now stopped collecting that information: “As time progressed, open access models became more complicated … It has become harder to find the right answer to that seemingly simple question: when did open access start for this journal?”3 Matthias, Jahn, and Laakso (2019) employed different snapshots of data sets that have OA status (Scopus, DOAJ, Ulrichsweb, publishers' websites, etc.) and some other resources to find reverse flips (converting from OA back to CA) and verified them manually. For bibliometric analyses related to OA, it is necessary to know the access status of journals for the period in which we study the effect of OA. Obtaining this information coherently requires looking into different journals' business models and harmonizing them to make them comparable. In addition, every publisher has its own rules for APC exemptions to foster publishing in OA format. For example, eligibility for APC waivers for publishing in Elsevier's journals is based on the “Research4Life program”4, and for Springer Nature on the “World Bank classification.” Various transformative agreements with publishers and the periods of their contracts are other influential factors, which is why the publishing behavior of each publisher should be studied separately.
Due to these varying APC-related rules for different publishers, we focused on one major publisher. To analyze papers from various disciplines and countries, we chose Springer Nature, the largest publisher of academic journals (more than 2,900 journals5) with authors worldwide from various disciplines, which provides us with a large amount of data and data diversity for more accurate results. Also, compared to Elsevier, the second most prominent publisher of scholarly journals (over 2,700 journals6), this publisher has a higher OA uptake (Sotudeh, Ghasempour, & Yaghtin, 2015; Sullo, 2016), resulting in less data skewness.
We downloaded the list of journals and their access status from the 2019 snapshot available on the publisher's website7. Three publishing models exist for these Springer Nature (SN) journals: gold OA, hybrid (with the open access option Open Choice), and CA. Figure 1 displays the distribution of journals and their publishing models.
2 https://oad.simmons.edu/oadwiki/Main_Page.
3 https://blog.doaj.org/2021/02/05/why-did-we-stop-collecting-and-showing-the-open-access-start-date-for-journals/.
4 https://www.research4life.org/access/eligibility/.
5 https://www.springernature.com/gp/librarians/products/journals/springer-journals.
6 https://www.elsevier.com/about/this-is-elsevier.
7 https://www.springernature.com/gp/open-research/journals-books/journals.
Figure 1. Distribution of Springer Nature's journals by (A) publishing model and (B) field and publishing model.
For the bibliometric analyses, we employed Scopus8. We matched the list of SN journals with journals in Scopus via title and ISSN. Of 3,138 SN journals, we could match 2,757 journals, which we used for further analyses. Because of the problems regarding journal flipping mentioned above, we limited our data to two years, 2017 and 2018, to reduce errors in detecting the business models of journals and articles. This resulted in 522,411 articles.
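The text states only that journals were matched "via title and ISSN". A minimal sketch of such a join, assuming an ISSN match (print or electronic) takes precedence and an exact match on a normalized title serves as the fallback; the record fields `id`, `title`, `issn`, and `eissn` are hypothetical and not taken from the Scopus or Springer Nature data formats.

```python
import re
from typing import Dict, List, Optional

def normalize_issn(issn: Optional[str]) -> Optional[str]:
    """Keep digits and the X check character, e.g. '1234-567x' -> '1234567X'."""
    if not issn:
        return None
    cleaned = re.sub(r"[^0-9Xx]", "", issn).upper()
    return cleaned if len(cleaned) == 8 else None

def normalize_title(title: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())

def match_journals(sn_journals: List[dict], scopus_journals: List[dict]) -> Dict[str, str]:
    """Map SN journal ids to Scopus ids: ISSN match first, exact normalized title as fallback."""
    by_issn, by_title = {}, {}
    for s in scopus_journals:
        for key in ("issn", "eissn"):
            code = normalize_issn(s.get(key))
            if code:
                by_issn[code] = s["id"]
        by_title[normalize_title(s["title"])] = s["id"]

    matches = {}
    for j in sn_journals:
        codes = [normalize_issn(j.get(k)) for k in ("issn", "eissn")]
        hit = next((by_issn[c] for c in codes if c and c in by_issn), None)
        if hit is None:
            hit = by_title.get(normalize_title(j["title"]))
        if hit is not None:
            matches[j["id"]] = hit
    return matches
```

Unmatched journals simply do not appear in the result, which mirrors the drop from 3,138 to 2,757 journals; a real pipeline would likely also need manual checks for title variants that exact matching misses.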
To detect the publishing model of articles in hybrid journals, we employed Unpaywall9 (the 2019 snapshot), a service that finds openly available versions of articles. We obtained the publishing model of articles in hybrid journals from the metadata in this data set.
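How the Unpaywall metadata was turned into an article-level label is not spelled out. A minimal sketch, assuming the record's `oa_status` field is used and that only the value `"hybrid"` (a paid Open Choice article) counts as OA inside a hybrid journal, while `"bronze"`, `"green"`, and `"closed"` are treated as CA. This mapping is an illustrative assumption, not the authors' documented rule.

```python
from typing import Optional

def article_model(journal_model: str, unpaywall_record: Optional[dict]) -> str:
    """Assign 'OA' or 'CA' to an article given its journal's model ('gold', 'hybrid' or 'CA')."""
    if journal_model == "gold":
        return "OA"  # every article in a gold OA journal is openly available
    if journal_model == "CA":
        return "CA"  # closed-access journals offer no OA option
    if journal_model != "hybrid":
        raise ValueError(f"unknown journal model: {journal_model!r}")
    # Hybrid journal: consult the Unpaywall record. Assumption: only oa_status
    # 'hybrid' counts as OA; 'bronze', 'green', 'closed' or a missing record count as CA.
    if unpaywall_record and unpaywall_record.get("oa_status") == "hybrid":
        return "OA"
    return "CA"
```

Treating green (self-archived) copies as CA keeps the label tied to the publisher-side publishing model rather than to whether any free copy exists.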
We obtained the APC amount in U.S. dollars for 1,741 hybrid journals and 297 gold OA journals from the website of Springer Nature10. There was no fixed APC for 147 gold OA journals (only 5% of the investigated articles belong to these journals); obtaining the exact amount would have required visiting each journal's website. Therefore, we replaced the APC amount for these journals with null values (empty) and excluded them from the data for the classification task.
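The null-value handling above can be sketched as follows, assuming each article carries a hypothetical `journal` key and that journals without a fixed APC are stored as `None`. Such articles stay in the data for descriptive statistics but are filtered out before the classification step.

```python
from typing import Dict, List, Optional

def attach_apc(articles: List[dict], apc_by_journal: Dict[str, Optional[float]]) -> List[dict]:
    """Copy each journal's APC (USD) onto its articles; unknown or non-fixed APCs become None."""
    return [dict(a, journal_APC=apc_by_journal.get(a["journal"])) for a in articles]

def classification_rows(articles: List[dict]) -> List[dict]:
    """Drop articles whose journal has no fixed APC before training the classifier."""
    return [a for a in articles if a["journal_APC"] is not None]
```

Building new dictionaries instead of mutating the input keeps the full article list intact for the descriptive analyses.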
To detect the gender of authors, we utilized a combined name- and image-based approach introduced by Karimi, Wagner et al. (2016), which categorizes gender into male and female. Based on this method, we first tried detecting gender using the API at Genderize.io11. For names whose gender the API could not identify, we searched for the names on the web and detected their gender using image-based recognition algorithms, which increases recall and accuracy compared to Genderize.io alone (Karimi et al., 2016). We acknowledge that a person's gender is not a binary variable. Considering the social dimensions, further gender identities could not be identified with this approach and are left out of the analysis. Using the Scopus author ID, we found 381,074 unique corresponding authors for the investigated articles; 10,614 of them (about 3%) had only initials or no first name, and we could not detect their gender.
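The two-step detection can be sketched as a cascade: a name-based lookup first, then an image-based fallback. Both lookups are injected as plain functions, so a caller could back `name_lookup` with a Genderize.io request and `image_lookup` with an image classifier; neither is implemented here. The 0.9 probability threshold and the initials rule are assumptions for illustration, not parameters reported in the paper.

```python
from typing import Callable, Optional, Tuple

# A lookup returns (gender, probability), or None when it cannot decide.
Lookup = Callable[[str], Optional[Tuple[str, float]]]

def detect_gender(first_name: Optional[str], full_name: str,
                  name_lookup: Lookup, image_lookup: Lookup,
                  min_probability: float = 0.9) -> Optional[str]:
    """Return 'male', 'female', or None if the author's gender stays undetected."""
    # Missing first names and bare initials (e.g. 'J.') cannot be resolved.
    if not first_name or len(first_name.rstrip(".")) <= 1:
        return None
    # Step 1: name-based lookup on the first name; step 2: image-based fallback on the full name.
    for lookup, query in ((name_lookup, first_name), (image_lookup, full_name)):
        result = lookup(query)
        if result is not None:
            gender, probability = result
            if gender in ("male", "female") and probability >= min_probability:
                return gender
    return None
```

A low-confidence name result falls through to the image step rather than being accepted, which matches the stated motivation that the image-based step raises recall.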
Overall, we identified the gender status for 49% of authors. We therefore excluded the 254,044 articles (about 49%) for which we could not detect the gender of the corresponding author from the data used in the regression analysis and classification task. One possible reason for the low rate of gender identification is the large percentage of authors affiliated with Asian countries (136,591; over 35%)12, who are probably also originally from these countries. Previous studies tested gender detection tools for authors of different nationalities and found them less effective for Asian names (Karimi et al., 2016; Santamaría & Mihaljević, 2018). Table 1 shows, across scientific fields, the number and percentage of OA and CA publications whose corresponding author's gender was detected. This percentage is 4 percentage points higher for OA publications than for CA publications.

8 The in-house Scopus database maintained by the German Competence Centre for Bibliometrics (Scopus-KB), 2021 version.
9 https://unpaywall.org/.
10 https://www.springernature.com/de/open-research/journals-books/journals.
11 https://genderize.io/.

Table 1. Number and proportion of articles among scientific fields and publishing model for which we detected the gender status of their corresponding author

Publishing model   Health Sciences   Life Sciences   Physical Sciences   Social Sciences   Multiple fields   Total
CA model (%)       31,642 (53)       23,011 (54)     74,742 (48)         9,210 (40)        38,507 (52)       177,112 (50)
OA model (%)       20,534 (49)       10,032 (57)     9,927 (50)          2,020 (41)        48,742 (58)       91,255 (54)
4.2. Features and Definitions
To investigate the factors that are associated with higher rates of OA publishing, we defined the features presented in Table 2. Figure 2 presents an overview of the data collection and preparation steps. The final analyzed data is available in a Git repository13.
To compare publishing and citation behavior across countries, we classified countries by income, based on the World Bank classification14, into four groups: low, lower middle, upper middle, and high-income economies. The income level of each country is evaluated every year, and its history is available15. Of the 218 countries listed by the World Bank, we excluded 20 countries whose income level changed between 2015 and 2018. Springer Nature offers an APC waiver to articles whose corresponding author is from a low-income country and an APC discount to those whose corresponding author is from a lower middle income country (as classified by the World Bank)16.
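The country filtering and the two eligibility features can be sketched as below. The input format (country mapped to year mapped to income group) and the group labels `"low"` and `"lower middle"` are illustrative assumptions; the World Bank's historical file uses its own codes, which would need to be mapped first.

```python
from typing import Dict, Iterable

def stable_income_groups(history: Dict[str, Dict[int, str]],
                         years: Iterable[int] = range(2015, 2019)) -> Dict[str, str]:
    """Keep countries whose income group is known and identical in every year of `years`."""
    years = list(years)
    stable = {}
    for country, by_year in history.items():
        groups = {by_year.get(year) for year in years}
        if len(groups) == 1 and None not in groups:
            stable[country] = groups.pop()
    return stable

def apc_flags(income_group: str) -> Dict[str, int]:
    """Map an income group to the waiver_eligible / discount_eligible features of Table 2."""
    return {
        "waiver_eligible": int(income_group == "low"),
        "discount_eligible": int(income_group == "lower middle"),
    }
```

Countries with a missing year are dropped together with those that changed group, which is a conservative reading of "different income levels from 2015 to 2018".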
12 Authors from Armenia, Azerbaijan, Georgia, Kazakhstan, Russia, and Turkey, which belong to both Asia and Europe, are not included in this list.
13 https://github.com/momenifi/open_access_springer_nature.
14 https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups.
15 https://databank.worldbank.org/data/download/site-content/OGHIST.xlsx.
16 https://www.springernature.com/gp/open-research/policies/journal-policies/apc-waiver-countries.
17 https://esac-initiative.org/about/transformative-agreements/agreement-registry/.

Table 2. Features used to study the factors associated with OA publishing (grouped by feature type; each entry gives the feature and its description)

Journal
  journal_ranking: h-index ranking of the journal in the related discipline (for multidisciplinary journals, the average ranking among disciplines).
  journal_APC: The cost of the APC to publish OA in the journal (US dollars).
  field: Field of the journal: Health Sciences, Life Sciences, Physical Sciences, Social Sciences, or multiple fields (if the journal has more than one field, the value is 'multiple fields').
Country
  country_income: Income level (GDP per capita) of the country in which the corresponding author is affiliated.
  OA_agreement: 1 if the corresponding author's country of affiliation has an OA agreement with the publisher, otherwise 0.
  discount_eligible: 1 if the corresponding author's country of affiliation belongs to the lower-middle income group, otherwise 0.
  waiver_eligible: 1 if the corresponding author's country of affiliation belongs to the low-income group, otherwise 0.
Paper
  OA_cite: Ratio of OA to CA references cited in this paper.
  authors_count: Number of authors.
Author*
  gender: 0 for females and 1 for males.
  age: Years since first publication.
  OA_publish: Ratio of OA to CA publications in the past (number of previous OA publications divided by the number of CA publications).
  international_coauthors: Proportion of international coauthors** among all coauthors in this paper.
* Corresponding author.
** An international coauthor is a coauthor who has a different affiliation country than the corresponding author.

From the Transformative Agreement Registry website provided by ESAC17, we found three organizations with an open access agreement with this publisher during the investigated years 2017 and 2018 (KEMOE/FWF in Austria, Max Planck Society in Germany, and the Bibsam consortium in Sweden) and two organizations (VSNU-UKB in the Netherlands and the FinELib consortium in Finland) in 2018. We obtained the lists of institutions involved in the agreements by asking the KEMOE/FWF, Bibsam, and FinELib organizations. The list of institutions participating via VSNU-UKB was available on the website of SN18. We assumed that publications whose corresponding author is affiliated with an institution included in a transformative agreement are free of APC charges. To find Max Planck institutions, we used disambiguated institutional addresses for German institutions (Rimmert, Schwechheimer, & Winterhager, 2017) available in Scopus-KB. We manually looked up the participating institutions for the remaining four countries. We found 12,323 such articles and used them to set the value of the OA_agreement feature.
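The OA_agreement assignment can be sketched as a membership test: an article gets 1 if its corresponding author's institution appears in an agreement that was active in the publication year. The agreement keys, institution names, and data layout below are hypothetical, and institution names are assumed to be already disambiguated upstream (as done for German institutions via Scopus-KB).

```python
from typing import Dict, Set

def oa_agreement_flag(institution: str, year: int,
                      agreements: Dict[str, Dict[str, Set]]) -> int:
    """1 if the institution was covered by any agreement active in `year`, else 0."""
    return int(any(
        year in terms["years"] and institution in terms["institutions"]
        for terms in agreements.values()
    ))
```

For example, with a hypothetical agreement active only in 2018, an article by an author at a member institution is flagged for 2018 but not for 2017, reflecting that two of the agreements started only in 2018.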
Figure 3 shows the number of articles published by Springer Nature, grouped by the income group of the country with which the corresponding author is affiliated. Sixty-seven articles had a corresponding author with affiliations in multiple countries; we excluded them from the analyses. The distribution of publications by country and income level is available on GitHub19.

18 https://resource-cms.springernature.com/springer-cms/rest/v1/content/19371608/data/v3.

Figure 2. Flow chart of data collection and preparation process.

Figure 3. Number of papers published by Springer Nature grouped by income level of countries.
We needed to identify authors and their publications to obtain the ratio of authors' previous OA publications. The Scopus Author ID enabled us to retrieve each author's list of published articles. For the variable country_income, we used the average GDP per capita in 2017 and 2018, obtained from the World Bank Group20. We used the year of the authors' first publication indexed in Scopus to calculate their career age as a measure of seniority.
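The two author features can be sketched from a per-author publication list. The sketch assumes "in the past" means publications from years strictly before the focal article's year, and returns `None` for OA_publish when there are no prior CA papers (the paper does not say how a zero denominator was handled). The `(year, is_oa)` input format is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def prior_record_features(publications: List[Tuple[int, bool]],
                          pub_year: int) -> Dict[str, Optional[float]]:
    """
    publications: (year, is_oa) pairs for one author, e.g. retrieved via the Scopus Author ID.
    Returns OA_publish (prior OA count / prior CA count) and age (years since first publication).
    """
    prior = [is_oa for year, is_oa in publications if year < pub_year]
    n_oa = sum(prior)
    n_ca = len(prior) - n_oa
    # Assumption: with no prior CA papers the ratio is undefined, so None is returned.
    oa_publish = n_oa / n_ca if n_ca else None
    first_year = min(year for year, _ in publications)
    return {"OA_publish": oa_publish, "age": pub_year - first_year}
```

For an author whose first indexed paper appeared in 2010 and who published two OA and two CA papers before 2017, an article from 2017 gets OA_publish = 1.0 and age = 7.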
To evaluate and rank the quality of journals, we employed the journal's h-index, which Hodge and Lacasse (2011) suggested as a better measure for ranking journals in social science than the five-year impact factor, and which has been used in previous studies (Barner, Holosko, & Thyer, 2014; Xia, 2012). We calculated the h-index of all journals in Scopus classified in 27 subject categories21 between the years 2011 and 2016.
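For reference, a journal's h-index is the largest h such that h of its papers have at least h citations each. A minimal sketch, assuming the input is the list of citation counts of the papers a journal published in the 2011 to 2016 window (the citation counting window is not specified in the text):

```python
from typing import Iterable

def h_index(citations: Iterable[int]) -> int:
    """Largest h such that at least h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h
```

For instance, papers cited 10, 8, 5, 4, and 3 times give h = 4: four papers have at least four citations, but there are not five with at least five.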
4.3. Methodology
4.3.1. Normalizing the citation impact
To evaluate and compare citation impact at the article and journal level across subject areas, we normalize it, because citation patterns vary across scientific disciplines and fields. To normalize the journal's h-index across categories, we computed the Percentile Rank (PR) of each journal (inspired by Bornmann and Mutz [2014]) in its category. This method gives the journals within a category a rank between 0 (lowest h-index) and 100 (highest h-index). In this approach, journals with the same h-index have the same rank. Therefore, this normalization method is advantageous for skewed distributions. If a journal belongs to more than one category, we used the weighted PR (Bornmann & Williams, 2020).
Based on this approach, the weighted PR (wPR) is calculated using the formula:

$$\mathrm{wPR} = \frac{\mathrm{PR}_{sc_1} \cdot n_{sc_1} + \mathrm{PR}_{sc_2} \cdot n_{sc_2} + \cdots + \mathrm{PR}_{sc_i} \cdot n_{sc_i}}{n_{sc_1} + n_{sc_2} + \cdots + n_{sc_i}} \tag{1}$$

where $sc_i$ is the $i$th subject category that the journal belongs to, $n_{sc_i}$ is the number of
journals in this subject category, and $\mathrm{PR}_{sc_i}$ is the PR of the journal in it.
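The journal normalization and Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a P100-style scaling in which the distinct h-index values of a category are sorted and spread evenly over 0 to 100, so tied journals share a rank. The function names, the input dictionaries, and the convention for a category whose journals all share one value are hypothetical.

```python
def percentile_ranks(values):
    """Map each value to a rank in [0, 100]; equal values share the same rank.

    Assumption: P100-style scaling over the distinct values, so the lowest
    value gets 0 and the highest gets 100.
    """
    distinct = sorted(set(values))
    if len(distinct) == 1:
        # Arbitrary convention for a category in which all values are equal.
        return [0.0 for _ in values]
    step = 100.0 / (len(distinct) - 1)
    rank_of = {v: i * step for i, v in enumerate(distinct)}
    return [rank_of[v] for v in values]


def weighted_pr(pr_by_category, size_by_category):
    """Eq. 1: the journal's per-category PRs averaged with category sizes as weights."""
    total = sum(size_by_category[c] for c in pr_by_category)
    return sum(pr * size_by_category[c] for c, pr in pr_by_category.items()) / total


# Hypothetical h-indices of four journals in one subject category.
ranks = percentile_ranks([10, 20, 20, 40])
print(ranks)

# A hypothetical journal ranked 50 in a category of 100 journals
# and 100 in a category of 300 journals.
print(weighted_pr({"cat_a": 50.0, "cat_b": 100.0}, {"cat_a": 100, "cat_b": 300}))
```

With these inputs the two tied journals both receive a rank of 50, and the weighted PR is (50 · 100 + 100 · 300) / 400 = 87.5, so the larger category dominates the result.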
We employed a similar normalization approach to present the citation impact of articles.
Because the citation count is confounded by the time since publication, we consider the citations
received within a 2-year window after publication, as in previous studies (Jannot, Agoritsas
et al., 2013; Piwowar, Priem et al., 2018). Next, we grouped the articles by subject category and
publication year and ranked them within each group from 0 to 100 based on the citations received.
We define a PR of 50 (the median of citations) as the threshold for highly cited articles: an
article is highly cited if its PR in its group is above 50, meaning that it has received more
citations than half of the articles in the same subject category and publication year. For articles
belonging to multiple subject categories, we used the wPR defined in Eq. 1, where $sc_i$ is the
$i$th subject category of the article, $n_{sc_i}$ is the number of articles in this subject category,
and $\mathrm{PR}_{sc_i}$ is the PR of the article in it.
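The article-level step, grouping by subject category and publication year and then applying the PR > 50 threshold, might look like the following sketch. It assumes the same P100-style ranking as above, takes an already computed 2-year citation count per article, and handles only single-category articles (multi-category articles would use the wPR of Eq. 1). The tuple layout and function names are illustrative assumptions.

```python
from collections import defaultdict


def percentile_ranks(values):
    """P100-style ranks in [0, 100]; equal values share a rank (assumed scaling)."""
    distinct = sorted(set(values))
    if len(distinct) == 1:
        return [0.0 for _ in values]  # arbitrary convention for a one-value group
    step = 100.0 / (len(distinct) - 1)
    rank_of = {v: i * step for i, v in enumerate(distinct)}
    return [rank_of[v] for v in values]


def flag_highly_cited(articles, threshold=50.0):
    """Return {article_id: True if the article's PR in its group exceeds threshold}.

    `articles` holds (article_id, subject_category, pub_year, citations_2y)
    tuples, where citations_2y counts citations within the 2-year window.
    """
    groups = defaultdict(list)
    for art_id, category, year, cites in articles:
        groups[(category, year)].append((art_id, cites))
    flags = {}
    for members in groups.values():
        ranks = percentile_ranks([cites for _, cites in members])
        for (art_id, _), pr in zip(members, ranks):
            flags[art_id] = pr > threshold
    return flags


# Hypothetical articles: four in one category-year group, one alone in another.
articles = [
    ("a1", "Physics", 2017, 0),
    ("a2", "Physics", 2017, 1),
    ("a3", "Physics", 2017, 5),
    ("a4", "Physics", 2017, 10),
    ("b1", "Sociology", 2017, 3),
]
print(flag_highly_cited(articles))
```

Here a3 and a4 (ranks of about 67 and 100) are flagged as highly cited, while a1, a2, and the single Sociology article are not.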
19 https://github.com/momenifi/open_access_springer_nature/blob/main/publications_country_distribution.csv.
20 https://data.worldbank.org/indicator/NY.GDP.PCAP.CD.
21 https://service.elsevier.com/app/answers/detail/a_id/14882/supporthub/scopus/related/1/.
4.3.2. Correlation analysis
To measure the association between OA publishing and each feature defined in Table 2, we
conducted a correlation analysis. The first variable in each correlation is OA publishing, a
dichotomous variable (a special case of a categorical variable). To assess the association with
field, which is a categorical variable, we selected Cramér's V coefficient. Cramér's V is based on
the chi-squared statistic and measures the strength of association between two categorical
variables; its value ranges from 0 (no association) to 1 (complete association). The association
with the binary variables (OA_agreement, discount_eligible, waiver_eligible, gender) was examined
with the phi coefficient (Ekström, 2011), which ranges from −1 to +1 and indicates the strength and
direction of the correlation between two dichotomous variables. To measure the association with
the other numerical variables, we applied the point-biserial correlation coefficient, which is the
Pearson correlation for the case in which one variable is dichotomous (LeBlanc & Cox, 2017) and
also ranges from −1 to +1.
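The three coefficients can be computed from first principles, as in the following sketch. It relies on a standard identity: computed on 0/1-coded variables, the Pearson formula gives the phi coefficient when both variables are binary and the point-biserial coefficient when only one is. The function names and toy data are illustrative only and are not the authors' implementation.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation. With one 0/1 variable this equals the point-biserial
    coefficient; with two 0/1 variables it equals the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant


def cramers_v(x, y):
    """Cramér's V from the chi-squared statistic of the x-by-y contingency table."""
    n = len(x)
    joint = Counter(zip(x, y))
    row, col = Counter(x), Counter(y)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row), len(col)) - 1)))


# Hypothetical toy data (1 = OA, 0 = CA).
oa = [1, 1, 0, 0]
field = ["Social", "Social", "Health", "Health"]
career_age = [4, 3, 2, 1]
print(cramers_v(field, oa))      # field vs. OA: Cramér's V
print(pearson(oa, career_age))   # numeric feature vs. OA: point-biserial
print(pearson(oa, [1, 1, 0, 1])) # binary feature vs. OA: phi
```

In practice, library routines such as SciPy's point-biserial and contingency-table functions would be used, but the hand-written versions make the relationship between the three measures explicit.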
4.3.3. Regression analysis
We used multivariate logistic regression to examine the relationship between the variables
defined in Table 2 and OA publishing. This is a common method for modeling the relationship
between a dichotomous dependent variable and multiple independent variables. It allows us to
estimate the association of the dependent variable with each independent variable while
accounting for the other independent variables in the data.
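To illustrate how the fitted coefficients relate to odds ratios, here is a small from-scratch logistic regression fitted by gradient ascent on the log-likelihood. It is a sketch with hypothetical toy data and assumed learning-rate and epoch settings, not the estimation procedure used in the study, which would normally rely on a statistics package that also reports standard errors and z-values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Maximum-likelihood logistic regression by batch gradient ascent.

    X: list of feature rows (no intercept column); y: list of 0/1 targets
    (1 = OA, 0 = CA). Returns [intercept, b_1, ..., b_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            residual = target - p
            grad[0] += residual
            for j, v in enumerate(row):
                grad[j + 1] += residual * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data with one binary predictor: among x = 0, 1 of 4 papers
# is OA (odds 1/3); among x = 1, 3 of 4 are (odds 3). The odds ratio is 9.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta = fit_logistic(X, y)
odds_ratios = [math.exp(b) for b in beta[1:]]
print(odds_ratios)
```

The exponentiated slope converges to the odds ratio of 9 computed by hand. With several predictors, each exp(β) is the odds ratio for that predictor with the others held fixed, which is the quantity reported in Table 4.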
4.3.4. Classification method
We employed a machine learning method to estimate the likelihood of each publishing model being
chosen. To this end, we categorized the publishing model of articles into two groups, OA and CA.
Then, we used the values of the features defined in Table 2 to predict the publishing model, which
is a classification task in machine learning.
To estimate the publishing model of articles, we use a supervised machine learning method,
random forest (RF), a common tool for classification tasks (Behr, Giese et al., 2020; Kumar,
Mukhopadhyay et al., 2019; Roy, Chopra et al., 2020; Yamak, Saunier, & Vercouter, 2016).
We use it for binary classification (OA = 1 or CA = 0), with the features introduced in Table 2
as independent variables. We apply the algorithm to articles in hybrid journals, in which authors
can choose their paper's business model. We used k-fold cross-validation (k = 10) to train and
test the model.
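The 10-fold cross-validation loop can be sketched in plain Python as follows. A trivial majority-class classifier stands in for the random forest so that the example stays self-contained; in practice a real classifier, for example scikit-learn's RandomForestClassifier, would be plugged into the hypothetical `fit`/`predict` callables. The shuffling seed and the unstratified split are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Samples are shuffled once, then split into k folds whose sizes differ by
    at most one; each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        yield order[:start] + order[start + size:], order[start:start + size]
        start += size


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean test accuracy over k folds. `fit(X, y)` returns a model and
    `predict(model, row)` returns a 0/1 label (placeholders for e.g. an RF)."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    """Stand-in classifier: remember the majority class of the training fold."""
    return int(sum(y) * 2 > len(y))


def predict_majority(model, row):
    return model


# Hypothetical labels with the 91% CA / 9% OA split reported in the text.
y = [0] * 91 + [1] * 9
X = [[i] for i in range(100)]
print(cross_val_accuracy(fit_majority, predict_majority, X, y))
```

With a 91%/9% split, a classifier that always predicts CA already reaches an accuracy of about 0.91, which is one reason why per-class precision, recall, and F1 score are informative alongside overall accuracy.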
Because the target variable is imbalanced (91% CA and 9% OA publishing), we balanced the classes
by oversampling with SMOTE (synthetic minority oversampling technique), which has been shown to
be a suitable method for handling class imbalance (Spelmen & Porkodi, 2018).
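A simplified SMOTE-style oversampler is sketched below. Each synthetic point is placed at a random position on the segment between a randomly drawn minority sample and one of its k nearest minority-class neighbours. The original SMOTE formulation iterates over every minority sample instead of drawing one at random, and production code would typically use the `SMOTE` class from imbalanced-learn. The assumption that label 1 (OA) is the minority class, the function names, and the toy data are illustrative. It is also assumed that resampling is applied only to the training portion of each cross-validation fold, so that synthetic points derived from test samples do not leak into the evaluation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample class 1 (assumed minority, e.g. OA) until both classes are equal."""
    minority = [row for row, label in zip(X, y) if label == 1]
    n_majority = len(y) - len(minority)
    new_rows = smote(minority, n_majority - len(minority), k, seed)
    return X + new_rows, y + [1] * len(new_rows)


# Hypothetical 2-feature training fold: 91 CA rows and 9 OA rows.
X = [[float(i), 0.0] for i in range(91)] + [[float(i), 1.0] for i in range(9)]
y = [0] * 91 + [1] * 9
X_bal, y_bal = balance_with_smote(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Because every synthetic point is a convex combination of two real minority samples, the new OA rows stay inside the region already occupied by OA articles rather than duplicating existing rows exactly.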
5. RESULTS
In this section, we first present descriptive statistics on the publishing model of articles across
four country groups to address RQ1. Next, we show the differences in citation impact among the
publishing models to answer RQ2. We then turn to RQ3 and present the correlation coefficients
between the publishing model and the features defined in Table 2, together with a multivariate
logistic regression showing the relationships between the variables. Finally, we present the
performance of estimating the publishing model of articles in hybrid journals and the importance
of the defined features in this estimation task, to reveal the factors that influence the choice of
the OA model.
Figure 4. Distribution of articles published in journals with three publishing models across four
groups of countries. The access status of hybrid articles was identified from Unpaywall (cases
2 and 3). For case 4 (hybrid, no access status), the articles of hybrid journals could not be found
in Unpaywall.
5.1. Countries’ Income Level of Corresponding Authors and Their Publishing Model
Figure 4 shows the distribution of articles categorized by publishing model and the country
income level of the corresponding authors. Authors with affiliations in countries with the
lowest income level, who are eligible for the APC waiver, have the highest proportion of gold
OA publications. In contrast, authors from lower-middle-income countries, who are eligible for
the APC discount, have the lowest percentage of gold OA publications.
5.2. Countries’ Income Level of Corresponding Authors and Their Citation Impact
Figure 5 shows the ratio of highly cited articles with different publishing models across country
groups for the investigated articles. Generally, we observe a higher percentage of highly cited
papers for corresponding authors from countries with higher income levels.
For all country groups, the ratio of highly cited articles is higher for the gold OA and hybrid OA
models than for the other models. Moreover, this ratio is highest for gold OA articles, suggesting
a higher citation impact for articles published in gold OA journals. The only exception is the
low-income group, which has more highly cited papers in the hybrid OA model. Hybrid CA
articles also have a higher share of highly cited papers than articles in CA journals, except for
countries with a high income level.
Figure 5. Percentage of highly cited papers published in different models. Hybrid Open
Access/Closed Access refers to articles published as OA/CA in hybrid journals.
5.3. Influential Factors on the Publishing Model
First, we conducted a correlation analysis to find the associations between OA publishing and
the features. Table 3 shows the correlation coefficients between the publishing model (1 if open
access, otherwise 0) and the features in Table 2. We separated the data into two sets: set 1 for
articles published in OA or CA journals (nonhybrid journals) and set 2 for articles in hybrid
journals. Set 1 reveals the association of the discount and waiver policies with OA publishing,
whereas set 2, in which OA publishing is optional, better reveals the author-specific factors
related to OA publishing. The weak negative correlation with gender indicates that women are
slightly more inclined than men toward gold OA publishing, which disagrees with previous
findings (Olejniczak & Wilson, 2020; Zhu, 2017). Consistent with the lowest proportion of OA
publishing for lower-middle-income countries in Figure 4, the negative correlation for
discount_eligible in Table 3 (in contrast to the positive value for waiver_eligible) suggests that
the discount policies are insufficient to motivate authors from these countries to publish in gold
OA journals. Table 4 displays the relationship between the publishing model and the features in
Table 3 when all features are considered together in a multivariate logistic regression. The
results confirm the signs of the correlations from the correlation analysis, except for
discount_eligible, whose positive association with the publishing model in the regression is
inconsistent with its negative correlation coefficient. Among fields, Social Sciences has the
highest odds ratios in Table 4, indicating the highest odds of OA publishing relative to the
reference field (Health Sciences). This field has experienced a dramatic growth of OA journals
since 2009 (Liu & Li, 2018). The strong positive correlation between journal_ranking and the
publishing model for the first set
Table 3. Correlation coefficients between the independent variables and the target variable. A
target value of 1 (0) means that the paper was published in the OA (CA) model
Feature
journal_ranking
journal_APC
field
country_income
OA_agreement
discount_eligible
waiver_eligible
OA_cite
authors_count
gender
age
OA_publish
Correlation test
Point-biserial
Set 1 (nonhybrid)
0.70
Set 2 (hybrid)
0.07
Correlation coefficient
Point-biserial
Cramer’s V
Point-biserial
Phi
Phi
Phi
Point-biserial
Point-biserial
Phi
Point-biserial
Point-biserial
–
0.69
0.28
0.08
−0.08
0.06
0.42
0.09
−0.08
−0.08
0.46
0.17
0.10
0.09
0.16
0.30
–
–
0.13
0.07
−0.01
0.02
0.41
0.11
international_coauthors
Point-biserial
Sample size:
192,498
329,913
Table 4. The results of logistic regression. The target variable is the publishing model and is
equal to 1 for OA and 0 for CA publishing. The outputs are odds ratios, exp(β); (exp(β) − 1) × 100
gives the percentage change in the odds of the target per unit increase in an independent variable.
An odds ratio greater/less than 1 indicates a positive/negative association between variables
Set 1
Set 2
Odds ratio
0.002*** (−72.4)
95% CI
0.001 to 0.002
Odds ratio
0.00*** (−87.7)
95% CI
0.00 to 0.00
Intercept
Independent variables
journal_ranking
1.98*** (10.38)
1.74 to 2.25
110.7*** (86.5)
99.5 to 100.23
journal_APC
1.00*** (8.05)
1.0001 to 1.0002
–
–
field
Health Sciences
Life Sciences
Physical Sciences
Reference
1.01 (0.31)
0.97 (−0.91)
Social Sciences
1.90*** (13.81)
multiple fields
1.25*** (8.5)
Reference
0.94 to 1.08
0.91 to 1.07
1.73 to 2.08
1.19 to 1.32
Reference
0.67*** (−9.55)
0.20*** (−44.29)
3.49*** (12.2)
3.4*** (30.87)
Reference
0.62 to 0.73
0.18 to 0.21
2.86 to 4.27
3.17 to 3.71
country_income
OA_agreement
discount_eligible
waiver_eligible
OA_cite
authors_count
gender
age
1.00*** (33.88)
1.000 to 1.000
1.000*** (16.18)
1.00 to 1.00
14.9*** (65.07)
13.78 to 16.22
–
–
–
–
0.93 (−0.78)
1.7*** (9.17)
20.19*** (5.53)
0.55*** (−12.97)
0.500 to 0.600
1.55*** (8.4)
0.78 to 1.11
1.52 to 1.90
8.29 to 77.5
1.39 to 1.71
1.003 (0.80)
0.94** (−2.8)
0.99 to 1.01
0.90 to 0.98
1.05*** (29.63)
1.05 to 1.054
1.17*** (33.15)
1.16 to 1.18
0.93* (−2.5)
0.97*** (−15.36)
0.88 to 0.98
0.96 to 0.98
OA_publish
196.79*** (105.65)
178.46 to 217.09
23.86*** (50.58)
21.1 to 26.99
international_coauthors
1.17*** (18.21)
1.15 to 1.19
1.03 (1.34)
0.99 to 1.06
McFadden's pseudo R2
Sample size
0.25
96,674
0.60
162,773
Significance: *p < 0.05, **p < 0.01, ***p < 0.001. z-values of coefficients in parentheses. CI: confidence interval.
Table 5. Performance of predicting the publishing model of papers with the random forest method

Class    Precision    Recall    F1 score
OA       0.85         0.95      0.89
CA       0.94         0.83      0.88
Accuracy (overall): 0.92
Figure 6. Permutation importance of features employed to predict the publishing model of papers
with the Random Forest method for the articles published in hybrid journals.
suggests that the journal's rank is the dominant factor in choosing a gold OA journal for
publication. Therefore, we estimate the publishing model for articles in set 2 (hybrid journals)
to discover factors other than journal-specific ones that influence the authors' decision to choose
the OA option. Moreover, because the OA model is optional in hybrid journals, this set better
reveals the characteristics that lead authors to choose it.
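The odds ratios in Table 4 can be turned into percentage changes in the odds and Wald confidence intervals with a short helper. The sketch below assumes the usual Wald construction, exp(β ± 1.96 · SE), and recovers the standard error from the reported z-value as SE = β / z. It uses the Social Sciences row of set 1 (odds ratio 1.90, z = 13.81) as input; because the inputs are rounded, the output matches the printed interval only approximately. The function name is hypothetical.

```python
import math


def odds_ratio_summary(beta, se, z_crit=1.96):
    """Odds ratio, percentage change in the odds, and Wald CI for one coefficient."""
    odds_ratio = math.exp(beta)
    pct_change = (odds_ratio - 1.0) * 100.0
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    return odds_ratio, pct_change, ci


# Social Sciences, set 1 in Table 4: odds ratio 1.90 with z = 13.81.
beta = math.log(1.90)
se = beta / 13.81  # Wald z = beta / SE
odds_ratio, pct_change, ci = odds_ratio_summary(beta, se)
print(f"OR={odds_ratio:.2f}, change in odds={pct_change:.0f}%, CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

An odds ratio of 1.90 therefore corresponds to roughly 90% higher odds of OA publishing than in the reference field, and the reconstructed interval of about 1.73 to 2.08 agrees with the one reported in Table 4.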
Table 5 shows the performance of the RF classifier for the second set (hybrid journals).
Figure 6 displays the permutation importance of the features used to predict the publishing
model for this set. The permutation importance of a feature is the decrease in model performance
when that feature's values are randomly shuffled while the values of the other predictors remain
unchanged; a higher value indicates more predictive power in the model. The highest importance
values, for country_income and age in Figure 6, suggest that the income level of countries and
seniority are the strongest predictors of choosing an OA model. The lowest value, for the variable
gender, indicates that gender has less influence on the authors' decision for the OA model than
the other factors. OA_agreement is one of the weakest features in predicting the publishing model,
and the correlation analysis also shows a weak correlation between them. One possible reason for
this weak effect is that only 2.3% of papers were covered by transformative agreements. In
addition, country income is the most important feature and correlates positively with OA
publishing, so authors from high-income countries are more likely to publish in the OA model even
without a transformative agreement. This may also attenuate the association between the
agreements and OA publishing.
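The permutation importance described above can be illustrated with a short sketch. Each feature column is shuffled several times while the other columns stay fixed, and the mean drop in accuracy is recorded. The toy model, data, accuracy metric, and repeat count are assumptions for illustration. A library implementation, such as scikit-learn's `permutation_importance`, would normally be applied to a fitted random forest and held-out data instead.

```python
import random


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled and the
    other columns are left unchanged (one value per feature)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical model that only looks at feature 0; feature 1 is noise.
def predict(row):
    return 1 if row[0] > 0.5 else 0


X = [[i % 2, (i * 7) % 5] for i in range(20)]
y = [row[0] for row in X]
print(permutation_importance(predict, X, y))
```

Shuffling the feature that the model relies on lowers accuracy, while shuffling the unused feature leaves it unchanged (importance 0). This mirrors how a low importance for a feature such as gender in Figure 6 indicates little predictive contribution.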
6. CONCLUSION AND DISCUSSION
This work presents a detailed study of the relationship between author-specific and structural
factors (e.g., the income level of the authors' affiliation country), OA publishing, and the OA
citation advantage.
First, we investigated the relationship between the income level of countries and OA publishing
for articles published by Springer Nature in 2017 and 2018. We found that authors from
lower-middle-income countries, who are eligible for APC discounts, have a lower proportion of
gold OA publications among all their papers published by this publisher than authors from other
countries. This suggests that discounted APCs are still too expensive for these authors to publish
in the gold OA model, and it agrees with Rouhi et al. (2022), who pointed out that waiver and
discount programs have not achieved author equity in reading and publishing. In contrast, authors
from low-income countries, who are eligible for APC waivers, have a higher proportion of gold OA
publications than authors from other countries. This result conflicts with Smith et al. (2021),
who found a lower proportion of OA papers published by Elsevier for these countries than for
others. One possible reason is that Elsevier applies stricter conditions for waiver eligibility.
We examined the citation impact of these articles and compared the percentage of highly
cited papers across the publishing models and the income levels of the corresponding
authors' countries. For all country groups, OA articles, whether in gold OA or hybrid journals,
have the highest percentage of highly cited papers. The results also show a higher proportion of
highly cited articles for countries with higher income levels. Although these results show a
higher citation impact for the OA models, this may result from confounding factors such as
self-selection and quality biases (Gargouri, Hajjem et al., 2010). In addition, examining the effect
of preprints and green OA publishing (where the article is published in the CA model, but a free
version is available in a repository outside of the publisher's website) would allow more accurate
analyses (Fraser et al., 2020; Wang, Glänzel, & Chen, 2020).
We conducted correlation, regression, and machine learning analyses to identify further
characteristics (e.g., of authors, journals, and papers) related to OA publishing. The correlation
analysis showed the strength of the positive or negative correlation between the publishing model
and each feature defined in Table 2. Using regression analysis, we examined the association of
each factor while accounting for the other factors. The results reinforced the correlation outcomes.
The only conflict between the two methods concerned discount_eligible, which was negatively
correlated with OA publishing in the correlation analysis but positively associated with it in the
regression. In addition, we estimated the publishing model of articles (OA or CA) using an
RF-based machine learning approach and examined the importance of each feature in the
estimation task. The results show that the country's income and greater experience with OA
rather than CA publishing are the most influential factors in estimating the publishing model.
We found that women had a slightly higher tendency toward OA publishing, but gender was less
important than other features in estimating the OA model.
7. LIMITATIONS AND FUTURE WORK
One obvious limitation of this study is that we included articles from just one publisher,
Springer Nature. Authors’ publishing behavior may differ among articles published by other
publishers, which limits the generalizability of the results of our study.
We obtained the access status of journals in 2019 from the list published on Springer Nature's
website (and, likewise, the article-level access status from Unpaywall). Some journals may have
flipped from CA to OA (Momeni et al., 2021) or vice versa without our detecting it, which may
introduce errors into the results. Furthermore, we did not verify the correctness of the external
data (Springer Nature and Unpaywall), and the accuracy of these data affects the precision of our
results. We identified the gender of 49% of the authors and, in the regression and machine
learning analyses, removed 49% of articles because their corresponding authors had no gender
status. In addition, 2% of the data were removed because of null values in other features (e.g.,
journals' APC). Because the gender detection approach does not work well for Asian names,
especially Chinese ones, a lower proportion of these authors have a gender status in the data set,
which also introduces bias into our analyses.
For future work, other publishers could be considered to examine how their different APC
policies affect OA publishing. Future studies could also control for the articles' language in the
analyses. Springer Nature is an international publisher that publishes mostly articles in English22,
so articles in other languages are underrepresented in this study.
Considering other publishers with non-English content and the articles' language in the
22 https://support.springernature.com/en/support/solutions/articles/6000219817-are-any-of-your-titles-available-in-other-languages.
analyses may reveal the role of languages in publishing international OA articles and citation
advantages.
AUTHOR CONTRIBUTIONS
Fakhri Momeni: Conceptualization, Formal analysis, Investigation, Methodology, Resources,
Software, Validation, Visualization, Writing—original draft, Writing—review & editing. Kristin
Biesenbender: Conceptualization, Resources, Writing—review & editing. Philipp Mayr: Fund-
ing acquisition, Project administration, Writing—review & editing. Stefan Dietze: Methodol-
ogy, Supervision, Writing—review & editing. Isabella Peters: Funding acquisition, Project
administration, Supervision, Writing—review & editing.
COMPETING INTERESTS
The authors have no competing interests.
DATA AVAILABILITY
The data set analyzed during the current study and the code are available at
https://github.com/momenifi/open_access_springer_nature.git.
FUNDING INFORMATION
This work is financially supported by BMBF project OASE, grant number 01PU17005A. We
acknowledge the support of the German Competence Center for Bibliometrics (grant:
01PQ17001) for maintaining the used data set for the analyses.
REFERENCES
Bahlai, C., Bartlett, L. J., Burgio, K. R., Fournier, A. M., Keiser, C. N.,
… Whitney, K. S. (2019). Open science isn’t always open to all
scientists. American Scientist, 107(2), 78–82. https://doi.org/10
.1511/2019.107.2.78
Barner, J. R., Holosko, M. J., & Thyer, B. A. (2014). American social
work and psychology faculty members’ scholarly productivity: A
controlled comparison of citation impact using the h-index. Brit-
ish Journal of Social Work, 44(8), 2448–2458. https://doi.org/10
.1093/bjsw/bct161
Bautista-Puig, N., Lopez-Illescas, C., de Moya-Anegon, F.,
Guerrero-Bote, V., & Moed, H. F. (2020). Do journals flipping
to gold open access show an OA citation or publication advan-
tage? Scientometrics, 124(3), 2551–2575. https://doi.org/10.1007
/s11192-020-03546-x
Behr, A., Giese, M., Teguim K., H. D., & Theune, K. (2020). Early
prediction of university dropouts—A random forest approach.
Jahrbücher für Nationalökonomie und Statistik, 240(6),
743–789. https://doi.org/10.1515/jbnst-2019-0006
Bornmann, L., & Mutz, R. (2014). From P100 to P1000: A new
citation-rank approach. Journal of the Association for Information
Science and Technology, 65(9), 1939–1943. https://doi.org/10
.1002/asi.23152
Bornmann, L., & Williams, R. (2020). An evaluation of percentile
measures of citation impact, and a proposal for making them
better. Scientometrics, 124(2), 1457–1478. https://doi.org/10
.1007/s11192-020-03512-7
Ekström, J. (2011). The phi-coefficient, the tetrachoric correlation
coefficient, and the Pearson-Yule debate. Journal of the Korean
Statistical Society, 42(3), 323–328. https://doi.org/10.1016/j.jkss
.2012.10.002
Evans, J. A., & Reimer, J. (2009). Open access and global participa-
tion in science. Science, 323(5917), 1025. https://doi.org/10
.1126/science.1154562, PubMed: 19229029
Farys, R., & Wolbring, T. (2021). Matthew effects in science and the
serial diffusion of ideas: Testing old ideas with new methods.
Quantitative Science Studies, 2(2), 505–526. https://doi.org/10
.1162/qss_a_00129
Fox, J., Pearce, K. E., Massanari, A. L., Riles, J. M., Szulc, Ł. …
Gonzales, A. L. (2021). Open science, closed doors? Countering
marginalization through an agenda for ethical, inclusive research
in communication. Journal of Communication, 71(5), 764–784.
https://doi.org/10.1093/joc/jqab029
Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2020). The relation-
ship between bioRxiv preprints, citations and altmetrics. Quanti-
tative Science Studies, 1(2), 618–638. https://doi.org/10.1162/qss
_a_00043
Gargouri, Y., Hajjem, C., Larivière, V., Gingras, Y., Carr, L., …,
Harnad, S. (2010). Self-selected or mandated, open access
increases citation impact for higher quality research. PLOS
ONE, 5(10), e13636. https://doi.org/10.1371/journal.pone
.0013636, PubMed: 20976155
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people
in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. https://
doi.org/10.1017/S0140525X0999152X, PubMed: 20550733
Hodge, D. R., & Lacasse, J. R. (2011). Evaluating journal quality: Is
the h-index a better measure than impact factors? Research on
Social Work Practice, 21(2), 222–230. https://doi.org/10.1177
/1049731510369141
Iyandemye, J., & Thomas, M. P. (2019). Low income countries have
the highest percentages of open access publication: A systematic
computational analysis of the biomedical literature. PLOS ONE,
14(7), e0220229. https://doi.org/10.1371/journal.pone.0220229,
PubMed: 31356618
Jannot, A.-S., Agoritsas, T., Gayet-Ageron, A., & Perneger, T. V.
(2013). Citation bias favoring statistically significant studies was
present in medical research. Journal of Clinical Epidemiology,
66(3), 296–301. https://doi.org/10.1016/j.jclinepi.2012.09.015,
PubMed: 23347853
Karimi, F., Wagner, C., Lemmerich, F., Jadidi, M., & Strohmaier, M. (2016).
Inferring gender from names on the web: A comparative evaluation of
gender detection methods. In Proceedings of the 25th International
Conference Companion on World Wide Web (pp. 53–54).
https://doi.org/10.1145/2872518.2889385
King, D. A. (2004). The scientific impact of nations. Nature,
430(6997), 311–316. https://doi.org/10.1038/430311a,
PubMed: 15254529
Kumar, N., Mukhopadhyay, S., Gupta, M., Handa, A., & Shukla,
S. K. (2019). Malware classification using early stage behavioral
analysis. In 2019 14th Asia Joint Conference on Information
Security (AsiaJCIS) (pp. 16–23). https://doi.org/10.1109/AsiaJCIS
.2019.00-10
Langham-Putrow, A., Bakker, C., & Riegelman, A. (2021). Is the
open access citation advantage real? A systematic review of
the citation of open access and subscription-based articles. PLOS
ONE, 16(6), e0253129. https://doi.org/10.1371/journal.pone
.0253129, PubMed: 34161369
Lawson, S. (2015). Fee waivers for open access journals. Publications,
3(3), 155–167. https://doi.org/10.3390/publications3030155
LeBlanc, V., & Cox, M. (2017). Interpretation of the point-biserial
correlation coefficient in the context of a school examination.
The Quantitative Methods for Psychology, 13, 46–56. https://
doi.org/10.20982/tqmp.13.1.p046
Lewis, C. L. (2018). The open access citation advantage: Does it
exist and what does it mean for libraries? Information Technology
and Libraries, 37(3), 50–65. https://doi.org/10.6017/ital.v37i3
.10604
Liu, W., & Li, Y. (2018). Open access publications in sciences and
social sciences: A comparative analysis. Learned Publishing,
31(2), 107–119. https://doi.org/10.1002/leap.1114
Matthias, L., Jahn, N., & Laakso, M. (2019). The two-way street of
open access journal publishing: Flip it and reverse it. Publica-
tions, 7(2), 23. https://doi.org/10.3390/publications7020023
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., …
Yarkoni, T. (2016). Point of view: How open science helps
researchers succeed. eLife, 5, e16800. https://doi.org/10.7554
/eLife.16800, PubMed: 27387362
Momeni, F., Mayr, P., & Dietze, S. (2022). Investigating the contri-
bution of author- and publication-specific features to scholars’
h-index prediction. arXiv:2207.09655. https://doi.org/10.48550
/arXiv.2207.09655
Momeni, F., Mayr, P., Fraser, N., & Peters, I. (2021). What happens
when a journal converts to open access? A bibliometric analysis.
Scientometrics, 126, 9811–9827. https://doi.org/10.1007/s11192
-021-03972-5
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S.,
Chambers, C. D., … Ioannidis, J. P. A. (2017). A manifesto for
reproducible science. Nature Human Behaviour, 1, 0021.
https://doi.org/10.1038/s41562-016-0021, PubMed: 33954258
Olejniczak, A. J., & Wilson, M. J. (2020). Who’s writing open
access (OA) articles? Characteristics of OA authors at Ph.D.-
granting institutions in the United States. Quantitative Science
Studies, 1(4), 1429–1450. https://doi.org/10.1162/qss_a_00091
Ottaviani, J. (2016). The post-embargo open access citation advan-
tage: It exists (probably), it’s modest (usually), and the rich get
richer (of course). PLOS ONE, 11(8), e0159614. https://doi.org
/10.1371/journal.pone.0159614, PubMed: 27548723
Piwowar, H., Priem, J., Larivière, V., Alperin, J. P., Matthias, L., …
Haustein, S. (2018). The state of OA: A large-scale analysis of the
prevalence and impact of open access articles. PeerJ, 6, e4375.
https://doi.org/10.7717/peerj.4375, PubMed: 29456894
Rimmert, C., Schwechheimer, H., & Winterhager, M. (2017).
Disambiguation of author addresses in bibliometric databases.
Technical Report. Bielefeld: Universität Bielefeld, Institute for
Interdisciplinary Studies of Science.
Ross-Hellauer, T., Reichmann, S., Cole, N. L., Fessl, A., Klebel, T., &
Pontika, N. (2021). Dynamics of cumulative advantage and
threats to equity in open science: A scoping review. Royal
Society Open Science, 9(1), 211032. https://doi.org/10.1098
/rsos.211032, PubMed: 35116143
Rouhi, S., Beard, R., & Brundy, C. (2022). Left in the cold: The fail-
ure of APC waiver programs to provide author equity. Science
Editor, 45(1), 5–13. https://doi.org/10.36591/SE-D-4501-5
Roy, S. S., Chopra, R., Lee, K. C., Spampinato, C., & Mohammadi-
Ivatlood, B. (2020). Random forest, gradient boosted machines
and deep neural network for stock price forecasting: A compar-
ative analysis on South Korean companies. International Journal
of Ad Hoc and Ubiquitous Computing, 33(1), 62–71. https://doi
.org/10.1504/IJAHUC.2020.104715
Samimi, A. J. (2011). Scientific output and GDP: Evidence from
countries around the world. Journal of Education and Vocational
Research, 2(2), 38–41. https://doi.org/10.22610/jevr.v2i2.23
Santamaría, L., & Mihaljević, H. (2018). Comparison and bench-
mark of name-to-gender inference services. PeerJ Computer
Science, 4, e156. https://doi.org/10.7717/peerj-cs.156, PubMed:
33816809
Schroter, S., Tite, L., & Smith, R. (2005). Perceptions of open access
publishing: Interviews with journal authors. British Medical
Journal, 330(7494), 756. https://doi.org/10.1136/bmj.38359.695220.82,
PubMed: 15677363
Simard, M.-A., Ghiasi, G., Mongeon, P., & Larivière, V. (2021).
Geographic differences in the uptake of open access. In 18th
International Conference on Scientometrics and Informetrics
(pp. 1033–1038). Retrieved from https://issi2021.org/proceedings/.
Smith, A. C., Merz, L., Borden, J. B., Gulick, C. K., Kshirsagar, A. R., & Bruna, E. M. (2021). Assessing the effect of article processing charges on the geographic diversity of authors using Elsevier’s “Mirror Journal” system. Quantitative Science Studies, 2(4), 1123–1143. https://doi.org/10.1162/qss_a_00157
Sotudeh, H., Ghasempour, Z., & Yaghtin, M. (2015). The citation advantage of author-pays model: The case of Springer and Elsevier OA journals. Scientometrics, 104(2), 581–608. https://doi.org/10.1007/s11192-015-1607-5
Spelmen, V. S., & Porkodi, R. (2018). A review on handling imbalanced data. In 2018 International Conference on Current Trends Towards Converging Technologies (pp. 1–11). https://doi.org/10.1109/ICCTCT.2018.8551020
Sullo, E. (2016). Open access papers have a greater citation advantage in the author-pays model compared to toll access papers in Springer and Elsevier open access journals. Evidence Based Library and Information Practice, 11(1), 60–62. https://doi.org/10.18438/B84W67
Wang, Z., Glänzel, W., & Chen, Y. (2020). The impact of preprints in library and information science: An analysis of citations, usage and social attention indicators. Scientometrics, 125(2), 1403–1423. https://doi.org/10.1007/s11192-020-03612-4
Xia, J. (2012). Positioning open access journals in a LIS journal ranking. College & Research Libraries, 73(2), 134–145. https://doi.org/10.5860/crl-234
Yamak, Z., Saunier, J., & Vercouter, L. (2016). Detection of multiple identity manipulation in collaborative projects. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 955–960). https://doi.org/10.1145/2872518.2890586
Zhu, Y. (2017). Who support open access publishing? Gender, discipline, seniority and other factors associated with academics’ OA practice. Scientometrics, 111(2), 557–579. https://doi.org/10.1007/s11192-017-2316-z, PubMed: 28490821