RESEARCH ARTICLE

Which factors are associated with Open Access
publishing? A Springer Nature case study

Fakhri Momeni1, Stefan Dietze1,2, Philipp Mayr1, Kristin Biesenbender3, and Isabella Peters3

1GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany
2Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
3ZBW – Leibniz Information Centre for Economics, Kiel, Germany

Keywords: APC policies, bibliometrics, citation impact, machine learning, open access

ABSTRACT

Open Access (OA) facilitates access to research articles. However, authors or funders often
must pay the publishing costs, preventing authors who do not receive financial support from
participating in OA publishing and gaining the citation advantage of OA articles. OA may
exacerbate existing inequalities in the publication system rather than overcome them. To
investigate this, we studied 522,411 articles published by Springer Nature. Employing
correlation and regression analyses, we describe the relationship between authors affiliated
with countries from different income levels, their choice of publishing model, and the citation
impact of their papers. A machine learning classification method helped us to explore the
importance of different features in predicting the publishing model. The results show that
authors eligible for article processing charge (APC) waivers publish more in gold OA journals
than others. In contrast, authors eligible for an APC discount have the lowest ratio of OA
publications, leading to the assumption that this discount insufficiently motivates authors to
publish in gold OA journals. We found a strong correlation between the journal rank and the
publishing model in gold OA journals, whereas the OA option is mostly avoided in hybrid
journals. Also, results show that the countries’ income level, seniority, and experience with OA
publications are the most predictive factors for OA publishing in hybrid journals.

1. INTRODUCTION

The unrestricted availability of Open Access (OA) publications is linked to the goal of granting
all interested parties free access to scientific knowledge and ensuring greater equality of access
(Munafo, Nosek et al., 2017). This view is strongly related to the consumers of scholarly
knowledge, who then would not have to pay for access. However, when taking the authors of those
articles into account, they are affected by OA in two different ways: when choosing a
publication model for an article and when receiving citations (and hence reputation) for articles that
have been published via a certain model (usually described as citation advantage; see, for
example, Langham-Putrow, Bakker, and Riegelman (2021)). Those two aspects of OA may
introduce significant biases and inequity into the scholarly publication and reputation system
because they may restrict participation in OA in particular ways (Bahlai, Bartlett et al., 2019).

First, the OA publishing model generally shifts the publishing costs from readers to authors
or their institutions and funders by introducing article processing charges (APCs). This can be a
severe constraint for those authors who cannot afford these costs or do not receive any

An open access journal

Citation: Momeni, F., Dietze, S., Mayr, P., Biesenbender, K., & Peters, I. (2023). Which factors are associated with Open Access publishing? A Springer Nature case study. Quantitative Science Studies, 4(2), 353–371. https://doi.org/10.1162/qss_a_00253

DOI: https://doi.org/10.1162/qss_a_00253

Peer Review: https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00253

Received: 16 August 2022
Accepted: 6 March 2023

Corresponding Author:
Fakhri Momeni
fakhri.momeni@t-online.de

Handling Editor:
Ludo Waltman

Copyright: © 2023 Fakhri Momeni, Stefan Dietze, Philipp Mayr, Kristin Biesenbender, and Isabella Peters. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The MIT Press

financial support. To overcome this issue, most publishers have implemented an APC
waiver/discount policy for authors from, for example, low-income countries (Lawson, 2015).
However, it is an open question how the different options for OA publishing and
waivers/discounts are considered and adopted by researchers with various characteristics,
such as their countries’ income level, seniority, and gender, factors that are also
often associated with the decision to publish OA (Iyandemye & Thomas, 2019; Olejniczak &
Wilson, 2020; Simard, Ghiasi et al., 2021; Smith, Merz et al., 2021; Zhu, 2017). Rouhi, Beard,
and Brundy (2022) discussed the waiver issues from the perspectives of the publisher,
institutions, and developing countries. They mentioned the potential unfairness that authors
are confronted with, which may be caused by APC-based models, and argued that waiver
programs have yet to address this problem successfully. They suggested that meeting the
equity standard requires a cross-functional approach involving publishers, funders, research
institutions, individual researchers, libraries, and service providers.

To accommodate OA publishing costs, three funding options have emerged over time. First,
diamond OA journals are funded by public institutions, such as libraries, which enable free
reading and publishing for all researchers. Second, transformative agreements between public
institutions and publishers have been introduced that include reading and publishing contracts
and which are also funded by the institutions. In this case, there are no direct fees for authors,
but their institutions pay the APCs as part of a consortium. Access to publishing and access to
publications is limited to participating organizations only. Third, APCs could also be paid by
the authors or their institutions themselves. The first option leads to gold OA at the journal
level. Transformative agreements allow authors to publish in either gold OA or hybrid journals
(which, for a fee, allow publishing individual articles as an OA variant). The third option is
often associated with hybrid journals. All other publishing models for journals usually require
funding via subscriptions, resulting in closed-access (CA) articles that can only be read after
paying the article or journal fee.

The publishing model is also strongly associated with the visibility of authors and articles.
For many researchers, it makes a difference in which journals they publish (e.g., given
discipline-specific journal rankings). If they want to be noticed by others and/or seek
promotion, it can be crucial to publish in reputable journals, especially for early-career researchers.
To achieve this, not only do financial hurdles and APCs have to be overcome, but also, for
example, English language skills and technical skills are needed, as well as institutions that can
help with legal advice or infrastructure support. Against this background, researchers have to
decide which publishing model to choose and whether OA is not only an altruistic but a
feasible option at all.

The second possible source of bias and inequity is related to the paying for access case: It
has been shown already that articles published as OA variants are more visible, leading to
higher citation counts and altmetrics (Evans & Reimer, 2009; Fraser, Momeni et al., 2020;
Lewis, 2018; McKiernan, Bourne et al., 2016; Ottaviani, 2016). Moreover, the Matthew effect
shows that researchers who are already well known and widely cited receive even more
citations (Farys & Wolbring, 2021), which directly affects rewards for publication in prestigious
journals, for prominence, and citations. For researchers, publications play a central role in
their daily practice and the reputation system in which they operate. Publications enable
researchers to build on the body of knowledge and refer to those findings by citing the
publications (which accumulate reputation in this way). Thus, access to publications is crucial
for the progress of science and building of reputation, both of which can be impeded by a
lack of access to OA publishing options and the risk of CA articles not being cited as frequently
as OA articles.

Quantitative Science Studies

From that, we hypothesize that researchers with better access to financial resources have
better access to publications, both in terms of access to read openly and in terms of access to
publish openly. Associated with that may be an even stronger citation advantage for those
researchers (usually WEIRD: Western, educated, industrialized, rich, and democratic (Henrich,
Heine, & Norenzayan, 2010)) with extensive OA-publishing options. As such, OA may carry
the risk of perpetuating already existing inequalities rather than resolving such marginalization
in the scholarly communication system (Fox, Pearce et al., 2021).

2. RELATED WORK

Related work also indicates a strong association between economic factors, OA, and citation
advantages. The scientific output of countries is associated with their economic evolution because
scientific progress needs governments’ financial support. Samimi (2011) used a Granger Causality
Test to examine the causal relationship between scientific output and GDP in 176 countries and
found a two-way positive relationship between them. King (2004) compared published papers
and their citation impacts across countries and found that only 31 countries contributed to 98%
of the world’s highly cited papers and that the remaining 161 countries contributed less than 2%.

OA publishing is also highly influenced by the authors’ country of affiliation, because it
determines APC waiver/discount policies or the availability of transformative agreements with
publishers. Some publishers offer general waivers or have a discount policy for all of their
journals for eligible authors, and the country’s income level mainly determines eligibility.
Lawson (2015) studied the waiver policy of the 32 most prominent publishers and found
that 68% of them grant APC waivers. Simard et al. (2021) found that low-income countries
publish and cite OA more than upper-middle and high-income countries. The positive
correlation between OA citing and publishing is 1.3 times weaker for high-income countries than
for other countries. Similarly, Iyandemye and Thomas (2019) showed that biomedicine
researchers from low-income countries have the highest percentage in OA publishing. Smith
et al. (2021) reported proportionately fewer OA articles published in Elsevier’s journals for
low-income countries, despite their eligibility for APC waivers.

Olejniczak and Wilson (2020) studied the articles published by faculty members at research
universities in the United States and found that male and senior authors
are more likely to publish in OA form. Zhu (2017) conducted a survey with over 1,800
researchers at 12 Russell Group universities1 to find the differences in OA publishing regarding
discipline, seniority, and gender. The results revealed disciplinary differences in OA
publishing (Medical and Life Scientists are most likely to publish in gold OA journals) and a stronger
tendency toward OA publishing among senior authors and, across genders, among men.

In addition to its business model, the journal rank is a decisive factor in choosing where to
submit an article. Schroter, Tite, and Smith (2005) conducted a survey study with 28 international authors
who submitted to the British Medical Journal and found that for authors, the journal’s ranking
is more important than the availability of OA.

Many studies have investigated the OA citation outcome, and most found a citation
advantage for OA articles (Evans & Reimer, 2009; Fraser et al., 2020; Lewis, 2018; McKiernan et al.,
2016; Ottaviani, 2016). However, regarding biases (e.g., quality bias, self-selecting,
mandating, self-archiving), different sampling and controlling data make it difficult to conclude that
receiving more citations is only the effect of OA. Momeni, Mayr et al. (2021) studied the
citation impact of flipping journals from CA to OA and generally found a slightly higher growth in
1 https://russellgroup.ac.uk/about/our-universities/.

receiving citations compared to journals in the same discipline and impact factor range.
However, they did not observe this trend in all scientific fields. Momeni, Mayr, and Dietze
(2022) examined the correlation between different factors and the authors’ future h-index
and found a positive but weak correlation between them.

One issue that is often discussed together with OA publishing and APCs is the problem of
predatory publishing. Predatory publishers take advantage of the OA movement but work
against good scientific practice. Ross-Hellauer, Reichmann et al. (2021) conducted a systematic
review to study the threat to equity in science via open science implementations. They
concluded that less well-resourced researchers, researchers from non-English-speaking countries,
and early-career researchers are particularly affected by the predatory publishing problem.

3. RESEARCH QUESTIONS

We conduct our study on the association between publishing models, the economic
background of researchers, and other author-specific and structural factors along three major
research questions:

RQ1: What is the relationship between the income level of researchers’ affiliation countries and their publication behavior (do they prefer OA or CA)?

RQ2: What is the relationship between the income level of researchers’ affiliation countries and their publication behavior (OA or CA) with their citation impact?

To answer these questions, we categorize corresponding authors based on the income level
of their affiliation country and compare the access status of articles they have published and
their citation impact. Whereas the first two RQs are rather descriptive and aim at quantifying
the extent to which access to publish openly and access to read openly (and along with it to
make them easier/more likely to cite) are related to the economic background of authors, the
third RQ takes a variety of factors into account that have been shown to be strongly associated
with tendencies to publish OA (Iyandemye & Thomas, 2019; Olejniczak & Wilson, 2020;
Simard et al., 2021; Smith et al., 2021; Zhu, 2017).

RQ3: What factors (e.g., journals, articles, authors, or their countries) are associated with selecting the business model of publications (OA against CA)?

Here we aim to give a detailed view of the factors associated with OA publishing using
correlation, regression, and machine learning analyses. To this end, structural features, such as
APC waivers, are considered besides author-specific properties, such as gender or years of
publishing activity (see Table 2). We will also look closely at the different access forms to
publications such as gold OA, hybrid, and CA. Concerning the level of journals, the relationships
between journal rankings, APCs, and research fields (Health Sciences, Life Sciences, Physical
Sciences, Social Sciences, and multiple fields) will be examined. Additionally, possible
country-related influencing factors will be investigated, such as countries’ income level, the existence of
transformative agreements, or opportunities for researchers to obtain APC discounts or waivers. At
the journal article level, the ratio of OA to CA citations in an article and the number of authors
involved are examined. Other author-specific influencing factors can be gender and age, the
ratio of OA to CA publications in the past, or even the proportion of international coauthors.

4. DATA AND METHODOLOGY

To conduct our study, information on the business model, author characteristics, and article
impact is needed, and several approaches and databases must be linked to obtain a
complete data set.

4.1. Data Selection

For the business model of journals (OA, hybrid, CA) it is possible to crawl the information
from the journal’s or publisher’s website or to look up sources such as the Directory of Open
Access Journals (DOAJ) and Unpaywall, which both include OA information. But information
about the history of the business model of journals is rarely available. In recent years,
many journals have converted (flipped) from CA to OA and vice versa, but often there is not
enough information about the exact date of starting with the new access model. The Open
Access Directory (OAD), a wiki hosted by the School of Library and Information Science at
Simmons University2, is the only resource containing a list of a few flipped journals and the
date of flipping. The OA start date of journals was available in the DOAJ dataset until 2020.
Bautista-Puig, Lopez-Illescas et al. (2020) and Momeni et al. (2021) used the OAD and
DOAJ for their studies about flipping journals. Unfortunately, the DOAJ has now stopped
collecting that information: “As time progressed, open access models became more
complicated … It has become harder to find the right answer to that seemingly simple question:
when did open access start for this journal?”3 Matthias, Jahn, and Laakso (2019) employed
different snapshots of data sets that have OA status (Scopus, DOAJ, Ulrichsweb, publishers’
websites, etc.) and some other resources to find out the reverse flip (converting from OA
back to CA) and verified them manually. For bibliometric analyses related to OA, it is
necessary to know about the access status of journals for the period in which we study
the effect of OA. Obtaining information more coherently requires looking into different
journals’ business models and harmonizing them to make them comparable. Additionally,
every publisher has its own rules for APC exemptions to foster publishing in OA format.
For example, eligibility for APC waivers for publishing in Elsevier’s journals is based on
the “Research4Life program”4 and for Springer Nature based on “World Bank classification.”
Various transformative agreements with publishers and the period of their contracts are other
influential factors that should be considered in studying the publishing behavior of each
publisher separately.

Due to these varying APC-related rules for different publishers, we focused on one major
publisher. To analyze papers for various disciplines and countries, we chose Springer Nature,
the largest publisher of academic journals (more than 2,900 journals5) with worldwide authors
from various disciplines, which provides us with a large amount of data and data diversity for
more accurate results. Also, compared to Elsevier, the second most prominent publisher of
scholarly journals (over 2,700 journals6), this publisher has a higher OA uptake (Sotudeh,
Ghasempour, & Yaghtin, 2015; Sullo, 2016), resulting in less data skewness.

We downloaded the list of journals and their access status from the snapshot from the year
2019, which is available on the publisher’s website7. Three publishing models exist for these
Springer Nature (SN) journals: Gold OA, Hybrid (with the open access option: Open Choice),
and CA. Figure 1 displays the distribution of journals and their publishing models.

2 https://oad.simmons.edu/oadwiki/Main_Page.
3 https://blog.doaj.org/2021/02/05/why-did-we-stop-collecting-and-showing-the-open-access-start-date-for-journals/.

4 https://www.research4life.org/access/eligibility/.
5 https://www.springernature.com/gp/librarians/products/journals/springer-journals.
6 https://www.elsevier.com/about/this-is-elsevier.
7 https://www.springernature.com/gp/open-research/journals-books/journals.

Figure 1. Distribution of Springer Nature’s journals by (A) publishing model and (B) field and publishing model.

For the bibliometric analyses, we employed Scopus8. We matched the list of SN journals
with journals in Scopus via title and ISSN. Of 3,138 SN journals, we could match 2,757
journals, which we used for further analyses. Because of the problems regarding journals’
flipping mentioned above, we limited our data to two years, 2017 and 2018, to reduce the errors
related to detecting the journals’ and articles’ business models. This resulted in 522,411
articles.
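
The journal matching step can be illustrated with a minimal sketch. The text states only that journals were matched via title and ISSN; the normalization rules, the precedence of ISSN over title, and the dictionary keys `title` and `issn` below are assumptions made for illustration, not the procedure actually used.

```python
import re


def normalize_issn(issn):
    """Return the ISSN as 8 uppercase characters without hyphen, or None."""
    if not issn:
        return None
    cleaned = re.sub(r"[^0-9Xx]", "", issn).upper()
    return cleaned if len(cleaned) == 8 else None


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_journals(publisher_list, index_list):
    """Match publisher journals to index journals, by ISSN first, then title.

    Both arguments are lists of dicts with the hypothetical keys
    'title' and 'issn'. Returns (matched_pairs, unmatched_journals).
    """
    by_issn, by_title = {}, {}
    for journal in index_list:
        issn = normalize_issn(journal.get("issn"))
        if issn:
            by_issn[issn] = journal
        by_title[normalize_title(journal["title"])] = journal

    matched, unmatched = [], []
    for journal in publisher_list:
        issn = normalize_issn(journal.get("issn"))
        hit = by_issn.get(issn) if issn else None
        if hit is None:
            # Fall back to the normalized title when the ISSN gives no hit.
            hit = by_title.get(normalize_title(journal["title"]))
        if hit is None:
            unmatched.append(journal)
        else:
            matched.append((journal, hit))
    return matched, unmatched
```

Preferring the ISSN over the title is a design choice of this sketch: identifiers are less ambiguous than titles, which may differ in abbreviation or punctuation between sources.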

To detect the publishing model of articles in hybrid journals, we employed Unpaywall9 (the
snapshot of 2019), a service to find the available version of articles. We obtained the
publishing model of articles in hybrid journals from metadata in this data set.

We obtained the APC amount in U.S. dollars for 1,741 hybrid journals and 297 gold OA
journals from the website of Springer Nature10. There was no fixed APC for 147 gold OA
journals (only 5% of investigated articles belong to these journals), and the exact amount
could only be obtained from each journal’s own website. Therefore, we replaced the APC
amount for these journals with null values (empty) and excluded them from the data for
the classification task.

To detect the gender status of authors, we utilized a combined name and image-based
approach introduced by Karimi, Wagner et al. (2016), which categorizes gender into male
and female. Based on this method, we first tried detecting gender using the API at
Genderize.io11. For those names whose gender the API could not identify, we looked for the names on
the web and detected their gender using image-based recognition algorithms, which increases
the recall and accuracy compared to Genderize.io (Karimi et al., 2016). We acknowledge
that a person’s gender is not a binary variable. Considering the social dimensions, further
gender identities could not be identified with this approach and are left out of the analysis.
Using the Scopus author ID, we found 381,074 unique corresponding authors for the investigated
articles; 10,614 authors (about 3%) had only initials or no first name, so we could not
detect their gender.

In total, we identified the gender status for 49% of authors. Therefore, we excluded
254,044 articles (about 49%) for which we could not detect the gender status of their
8 The in-house Scopus database maintained by the German Competence Centre for Bibliometrics (Scopus-KB), 2021 version.
9 https://unpaywall.org/.
10 https://www.springernature.com/de/open-research/journals-books/journals.
11 https://genderize.io/.

Table 1. Number and proportion of articles among scientific fields and publishing model for which
we detected the gender status of their corresponding author

Publishing model   Health Sciences   Life Sciences   Physical Sciences   Social Sciences   Multiple fields   Total
CA model (%)       31,642 (53)       23,011 (54)     74,742 (48)         9,210 (40)        38,507 (52)       177,112 (50)
OA model (%)       20,534 (49)       10,032 (57)     9,927 (50)          2,020 (41)        48,742 (58)       91,255 (54)

corresponding author from data in the regression analysis and classification task. One possible
reason for the low rate of identifying gender is the large percentage of authors affiliated with
Asian countries (136,591; over 35%)12 and probably originally from these countries.
Previous studies tested gender detection tools for authors with different nationalities and found them
less effective for Asian names (Karimi et al., 2016; Santamaría & Mihaljević, 2018). Table 1
shows the number and percentage of OA and CA publications belonging to the corresponding
authors with a gender status across scientific fields. The percentage of detected gender of
authors for OA publications is 4 percentage points higher than for CA publications.

4.2. Features and Definitions

To investigate the factors that are associated with higher rates of OA publishing, we defined
some features presented in Table 2. Figure 2 presents an overview of data collection and
preparation steps. The final analyzed data is available in a Git repository13.

To compare publishing and citation behavior across countries, we classified countries by
income based on the World Bank classification14 into four groups: low, lower middle, upper
middle, and high-income economies. The income level of a country is evaluated every
year and its history is available15. Of the 218 countries listed by the World Bank, we excluded
20 countries whose income level changed between 2015 and 2018. Springer Nature offers an APC
waiver and an APC discount for articles whose corresponding author is from a low-income or
lower middle income country (as classified by the World Bank), respectively16.
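
The country filter and the two eligibility features that follow from it can be sketched as below. This is only an illustration under stated assumptions: the group labels, the function names, and the input layout (a mapping from year to income group) are hypothetical, and the rule that a country is dropped when its group is not identical in every year is one reading of the exclusion described above.

```python
# Hypothetical labels for the four World Bank income groups.
INCOME_GROUPS = ("low", "lower_middle", "upper_middle", "high")


def stable_income_group(groups_by_year):
    """Return the income group if it is identical in all years, else None.

    groups_by_year maps a year to one of INCOME_GROUPS,
    e.g. {2015: "low", 2016: "low", 2017: "low", 2018: "low"}.
    Countries whose group changed within the window yield None.
    """
    distinct = set(groups_by_year.values())
    if len(distinct) != 1:
        return None
    (group,) = distinct
    if group not in INCOME_GROUPS:
        raise ValueError(f"unknown income group: {group!r}")
    return group


def apc_features(group):
    """Derive the two binary eligibility features from an income group."""
    return {
        "waiver_eligible": 1 if group == "low" else 0,
        "discount_eligible": 1 if group == "lower_middle" else 0,
    }
```

Keeping the two features separate, rather than a single "support" flag, mirrors the distinction the text draws between a full waiver and a discount.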

From the Transformative Agreement Registry website provided by ESAC17, we found three
organizations with an open access agreement with this publisher during the investigated years
2017 and 2018 (KEMOE/FWF in Austria, Max Planck Society in Germany, and Bibsam
consortium in Sweden) and two organizations (VSNU-UKB in the Netherlands and FinELib
consortium in Finland) in 2018. We obtained the list of involved institutions in the agreement by

12 Authors from Armenia, Azerbaijan, Georgia, Kazakhstan, Russia, and Turkey, which belong to both Asia and Europe, are not included in this list.
13 https://github.com/momenifi/open_access_springer_nature.
14 https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups.
15 https://databank.worldbank.org/data/download/site-content/OGHIST.xlsx.
16 https://www.springernature.com/gp/open-research/policies/journal-policies/apc-waiver-countries.
17 https://esac-initiative.org/about/transformative-agreements/agreement-registry/.

Table 2. Features used to study the associated factors with OA publishing

Feature type   Feature                   Description
Journal        journal_ranking           h-index ranking of the journal in the related discipline (for multidisciplinary journals, the average ranking among disciplines).
               journal_APC               The cost of APC to publish OA in the journal (US dollars).
               field                     Field of journal (if the journal has more than one field, the value is ‘multiple fields’). Values: Health Sciences, Life Sciences, Physical Sciences, Social Sciences, multiple fields.
Country        country_income            Income level (GDP per capita) of the country in which the corresponding author is affiliated.
               OA_agreement              If the corresponding author’s country of affiliation has an OA agreement with the publisher, it equals 1, otherwise 0.
               discount_eligible         If the corresponding author’s country of affiliation belongs to the lower-middle income group, it equals 1, otherwise 0.
               waiver_eligible           If the corresponding author’s country of affiliation belongs to the low-income group, it equals 1, otherwise 0.
Paper          OA_cite                   Ratio of citing OA against CA in this paper.
               authors_count             Number of authors.
Author*        gender                    For females equals 0 and for males 1.
               age                       Years since first publication.
               OA_publish                Ratio of OA publications against CA in the past (number of previous OA publications divided by the number of CA publications).
               international_coauthors   Proportion of international coauthors** to all coauthors in this paper.

* Corresponding author.

** An international coauthor is a coauthor who has a different affiliation country than the corresponding author.

asking the KEMOE/FWF, Bibsam, and FinELib organizations. The list of participating
institutions via VSNU-UKB was available on the website of SN18. We assumed that publications with
the corresponding author affiliated with institutions included in the transformative agreement
are free of APC charges. To find Max Planck institutions, we used disambiguated institutional
addresses for German institutions (Rimmert, Schwechheimer, & Winterhager, 2017) available
on Scopus-KB. We manually looked up the participating institutions for the remaining four
countries. We found 12,323 articles and used them to set the value of the feature “OA agreement.”
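
How the “OA agreement” feature could be set per article is sketched below. The agreement years are taken from the text (three agreements covering 2017 and 2018, two covering 2018 only); the dictionary keys are shorthand labels, and the assumption that each corresponding author's institution has already been mapped to one agreement label (or to none) is ours, not a description of the actual implementation.

```python
# Agreement periods as stated in the text; keys are shorthand labels.
AGREEMENT_YEARS = {
    "KEMOE/FWF": {2017, 2018},
    "Max Planck Society": {2017, 2018},
    "Bibsam": {2017, 2018},
    "VSNU-UKB": {2018},
    "FinELib": {2018},
}


def oa_agreement(agreement_label, publication_year):
    """Return 1 if the agreement was in force in that year, else 0.

    agreement_label is the (hypothetical) label of the agreement in which
    the corresponding author's institution participates, or None.
    """
    years = AGREEMENT_YEARS.get(agreement_label, set())
    return 1 if publication_year in years else 0
```

Taking the publication year into account matters for the two agreements that only covered 2018; an article from a participating institution published in 2017 would otherwise be flagged incorrectly.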

Figure 3 shows the number of articles published by Springer Nature, grouped by the income
group of the country with which the corresponding author is affiliated. Sixty-seven

18 https://resource-cms.springernature.com/springer-cms/rest/v1/content/19371608/data/v3.

Figure 2. Flow chart of data collection and preparation process.

Figure 3. Number of papers published by Springer Nature grouped by income level of countries.

articles had a corresponding author with multiple affiliation countries, and we excluded them
from the analyses. The publication distribution by countries and their income level is available on
GitHub19.

We needed to identify authors and their publications to obtain the ratio of authors’ previous
OA publications. The Scopus Author ID enabled us to get each author’s published article list. For
the variable Country income, we consider the average GDP per capita in 2017 and 2018 obtained
from the World Bank Group20. We used the year of the first publication of authors indexed in
Scopus to calculate their career age as a measurement of seniority.
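
The two author-level quantities described here can be sketched as follows. The ratio follows the definition given for OA_publish in Table 2 (previous OA publications divided by previous CA publications); the function names, the "OA"/"CA" string encoding, and the choice to return None when an author has no earlier CA publication are assumptions of this sketch, since the text does not say how that case was handled.

```python
def oa_publish_ratio(previous_access_statuses):
    """Ratio of an author's earlier OA to earlier CA publications.

    previous_access_statuses is a list of "OA"/"CA" strings for articles
    published before the article under study. Returning None when there
    is no earlier CA publication is an assumption of this sketch.
    """
    n_oa = sum(1 for status in previous_access_statuses if status == "OA")
    n_ca = sum(1 for status in previous_access_statuses if status == "CA")
    if n_ca == 0:
        return None
    return n_oa / n_ca


def career_age(first_publication_year, article_year):
    """Years between the author's first indexed publication and the article."""
    if article_year < first_publication_year:
        raise ValueError("article predates the first indexed publication")
    return article_year - first_publication_year
```

For an author with two earlier OA and four earlier CA articles, `oa_publish_ratio` gives 0.5; an author whose first indexed publication appeared in 2005 has a career age of 13 for a 2018 article.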

To evaluate and rank the quality of journals, we employed the journal’s h-index, which
Hodge and Lacasse (2011) suggested as a better measurement for ranking journals than the
five-year impact factor in social science that has been used in previous studies (Barner,
Holosko, & Thyer, 2014; Xia, 2012). We calculated the h-index of all journals in Scopus
classified in 27 subject categories21 between the years 2011 and 2016.

4.3. Methodology

4.3.1. Normalizing the citation impact

To evaluate and compare the citation impact at the article and journal level among different
subject areas, we must normalize it because citation patterns vary across scientific
disciplines and fields. To normalize the journal’s h-index across categories, we computed the
Percentile Rank (PR) of each journal (inspired by Bornmann and Mutz [2014]) in its category.
This method gives the journals within a category a rank between 0 (lowest h-index) and 100
(highest h-index). In this approach, journals with the same h-index have the same rank. Therefore,
this normalization method is an advantage in case of skewed distributions. If the journal
belongs to more than one category, we used the weighted PR (Bornmann & Williams, 2020).
Based on this approach, the weighted PR (wPR) is calculated using the formula:

$$\mathrm{wPR} = \frac{PR_{sc_1} \cdot n_{sc_1} + PR_{sc_2} \cdot n_{sc_2} + \dots + PR_{sc_i} \cdot n_{sc_i}}{n_{sc_1} + n_{sc_2} + \dots + n_{sc_i}} \tag{1}$$

where $sc_i$ is the $i$th subject category that the journal belongs to, $n_{sc_i}$ is the number of journals
in this subject category, and $PR_{sc_i}$ is the PR of the journal in it.
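The normalization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy h-index values, and the handling of a category in which all journals share one h-index are hypothetical, and the ranking of distinct values is one reading of the 0-to-100 rank described in the text rather than the authors' exact procedure.

```python
from typing import Dict, List, Tuple


def percentile_ranks(h_index_by_journal: Dict[str, int]) -> Dict[str, float]:
    """Rank journals of one category from 0 (lowest h-index) to 100 (highest).

    Ranks are assigned to distinct h-index values, so tied journals share a rank.
    """
    distinct = sorted(set(h_index_by_journal.values()))
    if len(distinct) == 1:
        # Assumption: a category with a single distinct value gets rank 100.
        return {journal: 100.0 for journal in h_index_by_journal}
    step = 100.0 / (len(distinct) - 1)
    rank_of_value = {value: i * step for i, value in enumerate(distinct)}
    return {journal: rank_of_value[h] for journal, h in h_index_by_journal.items()}


def weighted_pr(pr_and_size: List[Tuple[float, int]]) -> float:
    """Eq. 1: mean of per-category PRs, weighted by the category sizes n_sc."""
    total_size = sum(size for _, size in pr_and_size)
    return sum(pr * size for pr, size in pr_and_size) / total_size


# Toy category with four journals (values are illustrative, not from the paper).
category_a = {"J1": 12, "J2": 30, "J3": 30, "J4": 55}
ranks_a = percentile_ranks(category_a)

# J2 also belongs to a second, hypothetical category of 12 journals with PR 80.
wpr_j2 = weighted_pr([(ranks_a["J2"], len(category_a)), (80.0, 12)])
```

In this toy case J2 and J3 share the middle rank, and the weighted PR of J2 is pulled toward its rank in the larger category.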

We employed a similar normalizing approach to present the citation impact of articles.
Because the citation count is confounded by time since publication, we consider the citations
received during a time window of 2 years after publication, as in previous studies (Jannot, Agoritsas
et al., 2013; Piwowar, Priem et al., 2018). Next, we categorized the articles into groups with
the same subject category and publishing year and ranked them from 0 to 100 based on
received citations. We define a PR of 50 (the citation median) as the threshold for highly cited
articles. An article is highly cited if its PR is above 50 in its group, meaning that it
has received more citations than half of the articles in the same subject category and publishing
year. For articles belonging to multiple subject categories, we used the wPR defined in
Eq. 1, where $sc_i$ is the $i$th subject category of the article, $n_{sc_i}$ is the number of articles in this
subject category, and $PR_{sc_i}$ is the PR of the article in it.
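As a rough sketch of the article-level step, the snippet below groups articles by subject category and publishing year, ranks the citation counts within each group, and flags articles whose rank exceeds 50. The record layout, the function names, and the tie handling (ranking distinct citation counts) are assumptions made for illustration; the paper's exact computation may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

HIGHLY_CITED_THRESHOLD = 50.0  # PR of 50, the threshold stated in the text

# (article_id, subject_category, publishing_year, citations_within_2_years)
Article = Tuple[str, str, int, int]


def rank_0_to_100(citations: List[int]) -> Dict[int, float]:
    """Map each distinct citation count of one group to a rank from 0 to 100."""
    distinct = sorted(set(citations))
    if len(distinct) == 1:
        return {distinct[0]: 100.0}  # assumption for a group without variation
    step = 100.0 / (len(distinct) - 1)
    return {count: i * step for i, count in enumerate(distinct)}


def flag_highly_cited(articles: List[Article]) -> Dict[str, bool]:
    """Return, per article, whether its PR within (category, year) is above 50."""
    groups = defaultdict(list)
    for _, category, year, cites in articles:
        groups[(category, year)].append(cites)
    rank_tables = {key: rank_0_to_100(values) for key, values in groups.items()}
    return {
        art_id: rank_tables[(category, year)][cites] > HIGHLY_CITED_THRESHOLD
        for art_id, category, year, cites in articles
    }


# Toy records (illustrative only).
articles = [
    ("a1", "Medicine", 2017, 0),
    ("a2", "Medicine", 2017, 3),
    ("a3", "Medicine", 2017, 10),
    ("a4", "Physics", 2018, 1),
    ("a5", "Physics", 2018, 7),
]
highly_cited = flag_highly_cited(articles)
```

Only the strictly-above-50 articles (a3 and a5 here) are flagged; an article sitting exactly at rank 50 is not.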

19 https://github.com/momenifi/open_access_springer_nature/blob/main/publications_country_distribution.csv.

20 https://data.worldbank.org/indicator/NY.GDP.PCAP.CD.
21 https://service.elsevier.com/app/answers/detail/a_id/14882/supporthub/scopus/related/1/.

4.3.2. Correlation analysis

To find the association between OA publishing and each feature defined in Table 2, we conducted
a correlation analysis. The first variable in calculating the correlation is OA publishing,
a dichotomous variable (a special case of a categorical variable). To assess the association with field,
which is a categorical variable, we selected Cramer’s V coefficient. Cramer’s V is based on the
chi-squared test and measures the strength of association between two variables. Its value
ranges from 0 (no association) to 1 (complete association). The association with binary variables
(OA_agreement, discount_eligible, waiver_eligible, gender) was examined with the phi
coefficient (Ekström, 2011). This correlation coefficient ranges from −1 to +1 and shows the
strength of the positive or negative correlation between two dichotomous variables. To
measure the association with other numerical or continuous variables, we applied the
point-biserial correlation coefficient, which is used instead of the Pearson correlation when
a variable is dichotomous (LeBlanc & Cox, 2017) and can range from −1 to +1.
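The three coefficients named above can be written out directly from their textbook definitions. The following sketch uses only the Python standard library; the function names and the toy data are hypothetical, the Cramer's V shown is the uncorrected version, and nothing here is meant to reproduce the values in the paper's tables.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def phi_coefficient(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi for two 0/1 variables, computed from the 2x2 contingency table."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom


def point_biserial(binary: Sequence[int], values: Sequence[float]) -> float:
    """Point-biserial r between a 0/1 variable and a numeric variable."""
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population SD
    mean1, mean0 = sum(group1) / len(group1), sum(group0) / len(group0)
    return (mean1 - mean0) / sd * math.sqrt(len(group1) * len(group0) / n**2)


def cramers_v(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cramer's V from the chi-squared statistic (no bias correction)."""
    n = len(a)
    joint, rows, cols = Counter(zip(a, b)), Counter(a), Counter(b)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


# Toy data (illustrative only): 1 = OA, 0 = CA.
oa = [1, 1, 1, 0, 0, 0, 0, 1]
waiver_eligible = [1, 1, 0, 0, 0, 0, 1, 1]
journal_ranking = [90, 80, 70, 20, 30, 40, 50, 60]
field = ["Health", "Health", "Social", "Life", "Life", "Physical", "Social", "Social"]

phi = phi_coefficient(oa, waiver_eligible)
r_pb = point_biserial(oa, journal_ranking)
v_field = cramers_v(oa, field)
```

For two binary variables, Cramer's V equals the absolute value of phi, which gives a quick consistency check between the first and third functions.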

4.3.3. Regression analysis

We used multivariate logistic regression to find the relationship between various variables
(defined in Table 2) and OA publishing. This is a common method for modeling the relationship
between a dichotomous dependent variable and multiple independent variables. It
allows us to understand the association of the dependent variable with an independent variable
in the presence of other independent variables in the data.
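To make the reading of a logistic regression output concrete, the sketch below converts a fitted coefficient and its standard error into an odds ratio, a Wald-type 95% confidence interval, a z-value, and the percentage change in the odds per unit increase. The helper name and the numbers are hypothetical, and the Wald interval is an assumption; the source does not state how its intervals were computed.

```python
import math
from typing import NamedTuple


class OddsRatioSummary(NamedTuple):
    odds_ratio: float
    ci_low: float
    ci_high: float
    z: float
    pct_change_in_odds: float


def summarize_coefficient(beta: float, std_err: float, z_crit: float = 1.96) -> OddsRatioSummary:
    """Summarize one logit coefficient on the odds-ratio scale."""
    return OddsRatioSummary(
        odds_ratio=math.exp(beta),
        ci_low=math.exp(beta - z_crit * std_err),   # Wald interval, exponentiated
        ci_high=math.exp(beta + z_crit * std_err),
        z=beta / std_err,
        pct_change_in_odds=(math.exp(beta) - 1.0) * 100.0,
    )


# Toy numbers, not taken from the paper: a coefficient of ln(2) doubles the odds.
result = summarize_coefficient(beta=math.log(2.0), std_err=0.1)
```

An odds ratio above 1 corresponds to a positive coefficient and one below 1 to a negative coefficient, holding the other variables fixed.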

4.3.4. Classification method

We employed a machine learning method to estimate the likelihood of choosing the publishing
model. For this purpose, we categorized the publishing model of articles into two groups, OA
and CA. Then, we utilized the values of the features defined in Table 2 to predict the publishing
model. This process is a classification task in machine learning.

To estimate the publishing model of articles, we use a supervised machine learning method,
random forest (RF), a common tool for classification tasks (Behr, Giese et al., 2020; Kumar,
Mukhopadhyay et al., 2019; Roy, Chopra et al., 2020; Yamak, Saunier, & Vercouter, 2016).
We utilize this tool for binary classification (OA = 1 or CA = 0) and use the features introduced
in Table 2 as independent variables. We implement the algorithm for hybrid journals, in which
authors can choose their paper’s business model. We used a k-fold cross-validation (k = 10)
procedure to train and test the model.
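The cross-validation procedure can be illustrated by how the sample indices are partitioned. The sketch below is a plain, unstratified k-fold split written with the standard library; the function name and the sample size are hypothetical, and whether the authors stratified the folds or which library they used is not stated in the source.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k parts of near-equal size
    for i in range(k):
        test = folds[i]
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample is used for testing exactly once and for training k - 1 times.
splits = list(k_fold_indices(n_samples=25, k=10))
```

A classifier would be fitted on each training part and scored on the matching test part, and the k scores would then be averaged.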

Due to the skewed distribution of the target variable (91% CA and 9% OA publishing), we
balance the classes by resampling the data via SMOTE (synthetic minority oversampling technique),
which has been reported to be a suitable method for handling class imbalance (Spelmen &
Porkodi, 2018).
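The core idea of SMOTE is to create new minority-class points on the line segments between a minority sample and one of its nearest minority neighbors. The following is a minimal sketch of that idea for purely numeric features, with hypothetical names and toy data; it is not the implementation used in the study, and it ignores categorical features, which need a dedicated variant.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def smote(minority: List[Point], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base` (excluding itself), Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position on the segment from base to neighbor
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbor)])
    return synthetic


# Toy minority class (e.g., OA articles) in a two-feature space.
oa_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(oa_samples, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority points, it always stays inside the region spanned by the minority class.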

5. RESULTS

In this section, we first present some descriptive statistics about the publishing model of articles
across four country groups and address RQ1. Next, we display their differences in terms of
citation impact among different models to answer RQ2. Then we focus on RQ3 and present
the correlation coefficient between the publishing model and features defined in Table 2 and
a multivariate logistic regression to show the relationship between variables. Also, we demonstrate
the performance of estimating the publishing model of articles in hybrid journals and the
importance of defined features in the estimation task to reveal the influential factors in selecting
the OA model for publishing.

Figure 4. Distribution of articles published in journals with three publishing models across four
groups of countries. The access status of hybrid articles has been identified from Unpaywall (cases
2 and 3). For case 4 (hybrid, no access status), we could not find hybrid journals’ articles in
Unpaywall.

5.1. Countries’ Income Level of Corresponding Authors and Their Publishing Model

Figure 4 shows the distribution of articles categorized by publishing model and the country
income level of the corresponding authors. Authors affiliated with countries at the
lowest income level, who are eligible for the APC waiver, have the highest proportion of gold
OA publications. In contrast, authors from lower middle income countries, who are
eligible for the APC discount, have the lowest percentage of gold OA publications.

5.2. Countries’ Income Level of Corresponding Authors and Their Citation Impact

Figure 5 shows the ratio of highly cited articles with different publishing models across country
groups for the investigated articles. Generally, we observe a higher percentage of highly cited
papers for corresponding authors from countries with higher income levels.

The ratio of highly cited articles among all countries for gold and hybrid OA models is
higher than in other models. Also, this ratio is higher for gold OA articles, which indicates the
better citation impact of articles published in gold OA journals. The only exception is
countries with low income levels, which have more highly cited papers in the hybrid OA model.
Compared to CA journals, hybrid CA has more highly cited articles, except for
countries with a high income level.

Figure 5. Percentage of highly cited papers published in different models. Hybrid Open
Access/Closed Access refers to articles published as OA/CA in hybrid journals.

5.3. Influential Factors on the Publishing Model

First, we conducted a correlation analysis to find the associations between OA publishing and
the features. Table 3 shows the correlation coefficient between the publishing model (equal to 1 for open
access, otherwise 0) and the features in Table 2. We separated the data into two sets:
set 1 for articles published in OA or CA journals (nonhybrid journals) and set 2 for articles in
hybrid journals. Set 1 reveals the association of discount and waiver policies with OA publishing,
and optional OA publishing for hybrid journals in set 2 displays more author-specific
factors related to OA publishing. The weak negative correlation with gender demonstrates that
the tendency toward gold OA publishing is slightly stronger for women than for men, which
disagrees with previous findings (Olejniczak & Wilson, 2020; Zhu, 2017). As we observed
the lowest proportion of OA publishing for countries with a lower middle income level in
Figure 4, the negative correlation for discount_eligible (alongside a positive value for waiver_eligible)
in Table 3 indicates that the discount policies are insufficient to motivate the authors
from these countries to publish in gold OA. Table 4 displays the relationship between the publishing
model and the features in Table 3 by considering all of the features in a multivariate logistic
regression. The results confirm the negative/positive correlations calculated in the correlation analysis,
except that the positive association between discount_eligible and the publishing model is
inconsistent with the sign of its correlation coefficient. The highest odds ratios for Social
Sciences among fields in Table 4 reveal the highest proportion of OA publishing in this field.
This field has experienced a dramatic growth of OA journals since 2009 (Liu & Li, 2018). The
strong positive correlation between journal_ranking and the publishing model for the first set

Table 3. Correlation coefficient between independent variables and the target variable. A target
value of 1 (0) means the paper has been published in the OA (CA) model

Feature
journal_ranking

journal_APC

field

country_income

OA_agreement

discount_eligible

waiver_eligible

OA_cite

authors_count

gender

age

OA_publish

Correlation test
Point-biserial

Set 1 (nonhybrid)
0.70

Set 2 (hybrid)
0.07

Correlation coefficient

Point-biserial

Cramer’s V

Point-biserial

Phi

Phi

Phi

Point-biserial

Point-biserial

Phi

Point-biserial

Point-biserial

0.69

0.28

0.08

−0.08

0.06

0.42

0.09

−0.08

−0.08

0.46

0.17

0.10

0.09

0.16

0.30

0.13

0.07

−0.01

0.02

0.41

0.11

international_coauthors

Point-biserial

Sample size:

192,498

329,913

Table 4. The results of logistic regression. The target variable is the publishing model and is equal to 1 for OA and 0 for CA publishing. The
outputs are odds ratios, exp(β). (exp(β) − 1) × 100 gives the percentage change in the odds of the target variable per unit increase in an independent variable. Also,
an odds ratio greater/less than 1 indicates a positive/negative association between the variables

Set 1

Set 2

Odds ratio
0.002*** (−72.4)

95% CI
0.001 to 0.002

Odds ratio
0.00*** (−87.7)

95% CI
0.00 to 0.00

Intercept

Independent variables

journal_ranking

1.98*** (10.38)

1.74 to 2.25

110.7*** (86.5)

99.5 to 100.23

journal_APC

1.00*** (8.05)

1.0001 to 1.0002

field

Health Sciences

Life Sciences

Physical Sciences

Reference

1.01 (0.31)

0.97 (−0.91)

Social Sciences

1.90*** (13.81)

multiple fields

1.25*** (8.5)

Reference

0.94 to 1.08

0.91 to 1.07

1.73 to 2.08

1.19 to 1.32

Reference

0.67*** (−9.55)

0.20*** (−44.29)

3.49*** (12.2)

3.4*** (30.87)

Reference

0.62 to 0.73

0.18 to 0.21

2.86 to 4.27

3.17 to 3.71

country_income

OA_agreement

discount_eligible

waiver_eligible

OA_cite

authors_count

gender

age

1.00*** (33.88)

1.000 to 1.000

1.000*** (16.18)

1.00 to 1.00

14.9*** (65.07)

13.78 to 16.22

0.93 (−0.78)

1.7*** (9.17)

20.19*** (5.53)

0.55*** (−12.97)

0.500 to 0.600

1.55*** (8.4)

0.78 to 1.11

1.52 to 1.90

8.29 to 77.5

1.39 to 1.71

1.003 (0.80)

0.94** (−2.8)

0.99 to 1.01

0.90 to 0.98

1.05*** (29.63)

1.05 to 1.1.054

1.17*** (33.15)

1.16 to 1.18

0.93* (−2.5)

0.97*** (−15.36)

0.88 to 0.98

0.96 to 0.98

OA_publish

196.79*** (105.65)

178.46 to 217.09

23.86*** (50.58)

21.1 to 26.99

international_coauthors

1.17*** (18.21)

1.15 to 1.19

1.03 (1.34)

0.99 to 1.06

McFadden’s pseudo R2

Sample size

0.25

96,674

0.60

162,773

Significance: *p < 0.05, **p < 0.01, ***p < 0.001. z-values of coefficients in parentheses. CI: Confidence interval.

Table 5. Performance of predicting the publishing model of papers with random forest method

Classification   Precision   Recall   F1 score   Accuracy
OA               0.85        0.95     0.89       0.92
CA               0.94        0.83     0.88

Figure 6. Permutation importance of features employed to predict the publishing model of papers with the Random Forest method for the articles published in hybrid journals.

suggests that the journal’s rank is the dominant factor in choosing a gold OA journal to publish in. Therefore, we estimate the publishing model for articles in set 2 (hybrid journals) to discover feature categories other than journal-specific factors that influence the authors’ decision for an OA option. Moreover, the optional choice of the OA model in hybrid journals better reveals characteristics leading to the OA model.

Table 5 shows the performance of the RF classifier for the second set (hybrid journals). Figure 6 displays the permutation importance of features employed to predict the publishing model implemented for this set. The permutation importance of a feature is the decrease in model performance when the feature’s values are randomly shuffled while the values of the other predictors remain unchanged. A higher value for a feature indicates more predictive power in the proposed model. The highest importance values for country_income and age in Figure 6 indicate that the most important factors in selecting an OA model are the income level of countries and seniority.
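The definition of permutation importance given above can be turned into a short, model-agnostic routine. In the sketch below, the scoring metric (accuracy), the number of repeats, the function names, and the toy classifier are all assumptions for illustration; the source does not state which metric or software was used to produce Figure 6.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Share of rows for which the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Replace only column `col`; all other features stay unchanged.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained classifier: it only looks at feature 0.
def toy_model(row: Row) -> int:
    return int(row[0] > 0.5)


X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.7], [0.9, 0.3], [0.3, 0.5], [0.7, 0.2]]
y = [0, 0, 1, 1, 0, 1]
scores = permutation_importance(toy_model, X, y)
```

Shuffling a feature the model never uses leaves the predictions untouched, so its importance is zero, whereas shuffling a feature the model relies on can only lower the score of a model that was perfect on the unshuffled data.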
The lowest value for the variable gender indicates that gender has a lower impact on the authors’ decision for the OA model compared to other factors. OA_agreement is one of the weakest features in predicting the publishing model, and the correlation analysis also shows a weak correlation between them. One possible reason for the weak effect is that only 2.3% of papers have been involved in transformative agreements. In addition, the income level of countries is the most important feature, and given the positive correlation of this feature with OA publishing, authors from high-income countries are more likely to publish in the OA model (even without a transformative agreement). This may also attenuate the association of the agreement with OA publishing.

6. CONCLUSION AND DISCUSSION

This work presents a detailed study of the relationship between author-specific and structural factors (e.g., income level of authors’ affiliation country), OA publishing, and OA citation advantage. First, we investigated the relationship between the income level of countries and OA publishing for articles published by Springer Nature in the years 2017 and 2018. We found that authors from lower middle income countries who are eligible for APC discounts have a lower proportion of gold OA publications among all papers published by this publisher compared to other countries. This indicates that the discounted APC is still too high for these authors to pay for a gold OA model and agrees with the statement of Rouhi et al. (2022), who pointed out that waivers and discounts could not bring author equity in reading and publishing. In contrast, the proportion of gold OA publications for authors from countries with a low income level, who receive APC waivers, is higher than for authors from other countries. This result conflicts with the study results by Smith et al. (2021), which found lower proportions of OA papers published by Elsevier for these countries compared to others.
The reason could be stricter conditions that this publisher considers for waiver eligibility.

We examined the citation impact of these articles and compared the percentage of highly cited papers among the publishing models and the income levels of the corresponding authors’ countries. For all countries, the OA model in gold OA or hybrid journals has the highest percentage of highly cited papers. Also, the results demonstrate a higher proportion of highly cited articles for countries with higher income levels. Although this suggests more citation impact for OA models, it can result from confounding factors such as self-selection and quality biases (Gargouri, Hajjem et al., 2010). Also, examining the effect of preprints and green OA publishing (where the article has been published in the CA model, but a free version is available in a repository outside of the publisher’s website) would result in more accurate analyses (Fraser et al., 2020; Wang, Glänzel, & Chen, 2020).

We conducted correlation, regression, and machine learning analyses to find more characteristics (e.g., author, journal, paper) related to OA publishing. The results of the correlation analysis displayed the strength of the positive/negative correlation between the publishing model and every feature defined in Table 2. Using regression analysis, we examined the association of each factor while accounting for other factors. The results reinforced the correlation outcomes. The only conflict between these two methods was the negative correlation of discount_eligible with OA publishing in the correlation analysis, whereas it was positive in the regression evaluation.
In addition, we estimated the publishing model of articles (OA or CA) using an RF-based machine learning approach and examined the impact of each feature on the estimation task. The results show that the country’s income and more experience with OA than with CA publishing are the most influential factors in estimating the publishing model. We discovered that the tendency toward OA publishing was slightly higher for women, but gender was a less important feature than the others in estimating the OA model.

7. LIMITATIONS AND FUTURE WORK

One obvious limitation of this study is that we included articles from just one publisher, Springer Nature. Authors’ publishing behavior may differ for articles published by other publishers, which limits the generalizability of the results of our study.

We obtained the access status of journals in 2019 based on the list published on Springer Nature’s website (the same for the access status at the article level from Unpaywall). Some journals may have flipped from CA to OA (Momeni et al., 2021) or vice versa, and we did not detect this, which may cause errors in the results. Furthermore, we did not verify the correctness of external data (Springer Nature and Unpaywall). The accuracy of these data affects the precision of the results. We identified the gender of 49% of authors and removed 49% of articles without gender status for the corresponding authors in the regression and machine learning analyses. In addition, 2% of the data have been removed because of null values in other features (e.g., journals’ APC). Because the gender detection approach does not work well for Asian names, especially Chinese ones, we have a lower proportion of these authors with gender status in the data set, which also creates biases in our analyses.

For future work, we can consider other publishers to examine how the different APC policies among publishers impact OA publishing.
Also, controlling for the articles’ language in the analyses is a worthwhile direction for future studies. Springer Nature is an international publisher and publishes mostly articles in English22, and articles in other languages are underrepresented in this study. Considering other publishers with non-English content and the articles’ language in the analyses may reveal the role of languages in publishing international OA articles and citation advantages.

22 https://support.springernature.com/en/support/solutions/articles/6000219817-are-any-of-your-titles-available-in-other-languages.

AUTHOR CONTRIBUTIONS

Fakhri Momeni: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Kristin Biesenbender: Conceptualization, Resources, Writing – review & editing. Philipp Mayr: Funding acquisition, Project administration, Writing – review & editing. Stefan Dietze: Methodology, Supervision, Writing – review & editing. Isabella Peters: Funding acquisition, Project administration, Supervision, Writing – review & editing.

COMPETING INTERESTS

The authors have no competing interests.

DATA AVAILABILITY

The data set analyzed during the current study and code are available at https://github.com/momenifi/open_access_springer_nature.git.

FUNDING INFORMATION

This work is financially supported by BMBF project OASE, grant number 01PU17005A. We acknowledge the support of the German Competence Center for Bibliometrics (grant: 01PQ17001) for maintaining the used data set for the analyses.

REFERENCES

Bahlai, C., Bartlett, L. J., Burgio, K. R., Fournier, A. M., Keiser, C. N., … Whitney, K. S. (2019).
Open science isn’t always open to all scientists. American Scientist, 107(2), 78–82. https://doi.org/10.1511/2019.107.2.78
Barner, J. R., Holosko, M. J., & Thyer, B. A. (2014). American social work and psychology faculty members’ scholarly productivity: A controlled comparison of citation impact using the h-index. British Journal of Social Work, 44(8), 2448–2458. https://doi.org/10.1093/bjsw/bct161
Bautista-Puig, N., Lopez-Illescas, C., de Moya-Anegon, F., Guerrero-Bote, V., & Moed, H. F. (2020). Do journals flipping to gold open access show an OA citation or publication advantage? Scientometrics, 124(3), 2551–2575. https://doi.org/10.1007/s11192-020-03546-x
Behr, A., Giese, M., Teguim K., H. D., & Theune, K. (2020). Early prediction of university dropouts – A random forest approach. Jahrbücher für Nationalökonomie und Statistik, 240(6), 743–789. https://doi.org/10.1515/jbnst-2019-0006
Bornmann, L., & Mutz, R. (2014). From P100 to P1000: A new citation-rank approach. Journal of the Association for Information Science and Technology, 65(9), 1939–1943. https://doi.org/10.1002/asi.23152
Bornmann, L., & Williams, R. (2020). An evaluation of percentile measures of citation impact, and a proposal for making them better. Scientometrics, 124(2), 1457–1478. https://doi.org/10.1007/s11192-020-03512-7
Ekström, J. (2011). The phi-coefficient, the tetrachoric correlation coefficient, and the Pearson-Yule debate. Journal of the Korean Statistical Society, 42(3), 323–328. https://doi.org/10.1016/j.jkss.2012.10.002
Evans, J. A., & Reimer, J. (2009). Open access and global participation in science. Science, 323(5917), 1025. https://doi.org/10.1126/science.1154562, PubMed: 19229029
Farys, R., & Wolbring, T. (2021). Matthew effects in science and the serial diffusion of ideas: Testing old ideas with new methods. Quantitative Science Studies, 2(2), 505–526. https://doi.org/10.1162/qss_a_00129
Fox, J., Pearce, K. E., Massanari, A. L., Riles, J. M., Szulc, Ł., … Gonzales, A.
L. (2021). Open science, closed doors? Countering marginalization through an agenda for ethical, inclusive research in communication. Journal of Communication, 71(5), 764–784. https://doi.org/10.1093/joc/jqab029
Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2020). The relationship between bioRxiv preprints, citations and altmetrics. Quantitative Science Studies, 1(2), 618–638. https://doi.org/10.1162/qss_a_00043
Gargouri, Y., Hajjem, C., Larivière, V., Gingras, Y., Carr, L., … Harnad, S. (2010). Self-selected or mandated, open access increases citation impact for higher quality research. PLOS ONE, 5(10), e13636. https://doi.org/10.1371/journal.pone.0013636, PubMed: 20976155
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. https://doi.org/10.1017/S0140525X0999152X, PubMed: 20550733
Hodge, D. R., & Lacasse, J. R. (2011). Evaluating journal quality: Is the h-index a better measure than impact factors? Research on Social Work Practice, 21(2), 222–230. https://doi.org/10.1177/1049731510369141
Iyandemye, J., & Thomas, M. P. (2019). Low income countries have the highest percentages of open access publication: A systematic computational analysis of the biomedical literature. PLOS ONE, 14(7), e0220229. https://doi.org/10.1371/journal.pone.0220229, PubMed: 31356618
Jannot, A.-S., Agoritsas, T., Gayet-Ageron, A., & Perneger, T. V. (2013). Citation bias favoring statistically significant studies was present in medical research. Journal of Clinical Epidemiology, 66(3), 296–301. https://doi.org/10.1016/j.jclinepi.2012.09.015, PubMed: 23347853
Karimi, F., Wagner, C., Lemmerich, F., Jadidi, M., & Strohmaier, M. (2016). Inferring gender from names on the web: A comparative evaluation of gender detection methods. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 53–54). https://doi.org/10.1145/2872518.2889385
King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311–316. https://doi.org/10.1038/430311a, PubMed: 15254529
Kumar, N., Mukhopadhyay, S., Gupta, M., Handa, A., & Shukla, S. K. (2019). Malware classification using early stage behavioral analysis. In 2019 14th Asia Joint Conference on Information Security (AsiaJCIS) (pp. 16–23). https://doi.org/10.1109/AsiaJCIS.2019.00-10
Langham-Putrow, A., Bakker, C., & Riegelman, A. (2021). Is the open access citation advantage real? A systematic review of the citation of open access and subscription-based articles. PLOS ONE, 16(6), e0253129. https://doi.org/10.1371/journal.pone.0253129, PubMed: 34161369
Lawson, S. (2015). Fee waivers for open access journals. Publications, 3(3), 155–167. https://doi.org/10.3390/publications3030155
LeBlanc, V., & Cox, M. (2017). Interpretation of the point-biserial correlation coefficient in the context of a school examination. The Quantitative Methods for Psychology, 13, 46–56. https://doi.org/10.20982/tqmp.13.1.p046
Lewis, C. L. (2018). The open access citation advantage: Does it exist and what does it mean for libraries? Information Technology and Libraries, 37(3), 50–65. https://doi.org/10.6017/ital.v37i3.10604
Liu, W., & Li, Y. (2018). Open access publications in sciences and social sciences: A comparative analysis. Learned Publishing, 31(2), 107–119. https://doi.org/10.1002/leap.1114
Matthias, L., Jahn, N., & Laakso, M. (2019). The two-way street of open access journal publishing: Flip it and reverse it. Publications, 7(2), 23. https://doi.org/10.3390/publications7020023
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., … Yarkoni, T. (2016).
Point of view: How open science helps researchers succeed. eLife, 5, e16800. https://doi.org/10.7554/eLife.16800, PubMed: 27387362
Momeni, F., Mayr, P., & Dietze, S. (2022). Investigating the contribution of author- and publication-specific features to scholars’ h-index prediction. arXiv:2207.09655. https://doi.org/10.48550/arXiv.2207.09655
Momeni, F., Mayr, P., Fraser, N., & Peters, I. (2021). What happens when a journal converts to open access? A bibliometric analysis. Scientometrics, 126, 9811–9827. https://doi.org/10.1007/s11192-021-03972-5
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., … Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021. https://doi.org/10.1038/s41562-016-0021, PubMed: 33954258
Olejniczak, A. J., & Wilson, M. J. (2020). Who’s writing open access (OA) articles? Characteristics of OA authors at Ph.D.-granting institutions in the United States. Quantitative Science Studies, 1(4), 1429–1450. https://doi.org/10.1162/qss_a_00091
Ottaviani, J. (2016). The post-embargo open access citation advantage: It exists (probably), it’s modest (usually), and the rich get richer (of course). PLOS ONE, 11(8), e0159614. https://doi.org/10.1371/journal.pone.0159614, PubMed: 27548723
Piwowar, H., Priem, J., Larivière, V., Alperin, J. P., Matthias, L., … Haustein, S. (2018). The state of OA: A large-scale analysis of the prevalence and impact of open access articles. PeerJ, 6, e4375. https://doi.org/10.7717/peerj.4375, PubMed: 29456894
Rimmert, C., Schwechheimer, H., & Winterhager, M. (2017). Disambiguation of author addresses in bibliometric databases. Technical Report. Bielefeld: Universität Bielefeld, Institute for Interdisciplinary Studies of Science.
Ross-Hellauer, T., Reichmann, S., Cole, N. L., Fessl, A., Klebel, T., & Pontika, N. (2021). Dynamics of cumulative advantage and threats to equity in open science: A scoping review. Royal Society Open Science, 9(1), 211032.
https://doi.org/10.1098/rsos.211032, PubMed: 35116143
Rouhi, S., Beard, R., & Brundy, C. (2022). Left in the cold: The failure of APC waiver programs to provide author equity. Science Editor, 45(1), 5–13. https://doi.org/10.36591/SE-D-4501-5
Roy, S. S., Chopra, R., Lee, K. C., Spampinato, C., & Mohammadi-Ivatlood, B. (2020). Random forest, gradient boosted machines and deep neural network for stock price forecasting: A comparative analysis on South Korean companies. International Journal of Ad Hoc and Ubiquitous Computing, 33(1), 62–71. https://doi.org/10.1504/IJAHUC.2020.104715
Samimi, A. J. (2011). Scientific output and GDP: Evidence from countries around the world. Journal of Education and Vocational Research, 2(2), 38–41. https://doi.org/10.22610/jevr.v2i2.23
Santamaría, L., & Mihaljević, H. (2018). Comparison and benchmark of name-to-gender inference services. PeerJ Computer Science, 4, e156. https://doi.org/10.7717/peerj-cs.156, PubMed: 33816809
Schroter, S., Tite, L., & Smith, R. (2005). Perceptions of open access publishing: Interviews with journal authors. British Medical Journal, 330(7494), 756. https://doi.org/10.1136/bmj.38359.695220.82, PubMed: 15677363
Simard, M.-A., Ghiasi, G., Mongeon, P., & Larivière, V. (2021). Geographic differences in the uptake of open access. In 18th International Conference on Scientometrics and Informetrics (pp. 1033–1038). Retrieved from https://issi2021.org/proceedings/.
Smith, A. C., Merz, L., Borden, J. B., Gulick, C. K., Kshirsagar, A. R., & Bruna, E. M. (2021). Assessing the effect of article processing charges on the geographic diversity of authors using Elsevier’s “Mirror Journal” system. Quantitative Science Studies, 2(4), 1123–1143. https://doi.org/10.1162/qss_a_00157
Sotudeh, H., Ghasempour, Z., & Yaghtin, M. (2015). The citation advantage of author-pays model: The case of Springer and Elsevier OA journals. Scientometrics, 104(2), 581–608. https://doi.org/10.1007/s11192-015-1607-5
Spelmen, V.
S., & Porkodi, R. (2018). A review on handling imbal- anced data. In 2018 International Conference on Current Trends Towards Converging Technologies (pp. 1–11). https://doi.org/10 .1109/ICCTCT.2018.8551020 Sullo, E. (2016). Open access papers have a greater citation advan- tage in the author-pays model compared to toll access papers in Springer and Elsevier open access journals. Evidence Based Library and Information Practice, 11(1), 60–62. https://doi.org /10.18438/B84W67 Wang, Z., Glänzel, W., & Chen, Y. (2020). The impact of preprints in library and information science: An analysis of citations, Quantitative Science Studies 370 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u q s s / a r t i c e - p d l f / / / / 4 2 3 5 3 2 1 3 6 3 8 7 q s s _ a _ 0 0 2 5 3 p d / . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Which factors are associated with Open Access publishing? usage and social attention indicators. Scientometrics, 125(2), 1403–1423. https://doi.org/10.1007/s11192-020-03612-4 Xia, J. (2012). Positioning open access journals in a LIS journal ranking. College & Research Libraries, 73(2), 134–145. https:// doi.org/10.5860/crl-234 Yamak, Z., Saunier, J., & Vercouter, L. (2016). Detection of multiple identity manipulation in collaborative projects. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 955–960). https://doi.org/10.1145/2872518 .2890586 Zhu, Y. (2017). Who support open access publishing? Gender, dis- cipline, seniority and other factors associated with academics’ OA practice. Scientometrics, 111(2), 557–579. https://doi.org/10 .1007/s11192-017-2316-z, PubMed: 28490821 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . / e d u q s s / a r t i c e - p d l f / / / / 4 2 3 5 3 2 1 3 6 3 8 7 q s s _ a _ 0 0 2 5 3 p d / . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Quantitative Science Studies 371RESEARCH ARTICLE image