RESEARCH ARTICLE

The confirmation of scientific theories using
Bayesian causal networks and citation sentiments

Henry Small

SciTech Strategies Inc., Bala Cynwyd, PA 19004

an open access journal

Keywords: Bayes’s theorem, causal networks, citation context sentiments, confirmation, history and
philosophy of science, nociception

Citation: Small, H. (2022). The
confirmation of scientific theories
using Bayesian causal networks and
citation sentiments. Quantitative
Science Studies, 3(2), 393–419.
https://doi.org/10.1162/qss_a_00189

DOI:
https://doi.org/10.1162/qss_a_00189

Peer Review:
https://publons.com/publon/10.1162/qss_a_00189

Received: 2 February 2022
Accepted: 16 March 2022

Corresponding Author:
Henry Small
hsmall@mapofscience.com

Handling Editor:
Ludo Waltman

The MIT Press

ABSTRACT

The confirmation of scientific theories is approached by combining Bayesian probabilistic
methods, in particular Bayesian causal networks, and the analysis of citing sentences for highly
cited papers. It is assumed that causes and their effects can be identified by linguistic methods
from the citing sentences and that the cause-and-effect pairs can be equated with theories and
their evidence. Further, it is proposed that citation context sentiments for “evidence” and
“uncertainty” can be used to supply the required conditional probabilities for Bayesian
analysis where data is drawn from citing sentences for highly cited papers from various fields.
Hence, the approach combines citation and linguistic methods in a probabilistic framework
and, given the small sample of papers, should be considered a feasibility study. Special
attention is given to the case of nociception in medicine, and analogies are drawn with various
episodes from the history of science, such as the Watson and Crick discovery of the structure of
DNA and other discoveries where a striking and improbable fit between theory and evidence
leads to a sense of confirmation.

1. BACKGROUND

Scientometrics and quantitative studies of science have traditionally avoided epistemological
issues such as the nature of scientific knowledge, how knowledge is discovered and con-
firmed, and the relationship of theory and evidence. This is despite the fact that the scientific
papers we count, classify, and map are filled with arguments and descriptions dealing with
theories and observations, and why we should believe one finding or theory rather than
another. Clearly the field will need new tools, or to adapt old ones, to enable us to delve into
this deeper level of scientific content.

This paper will discuss one possible approach: the identification of causal statements in
scientific texts and the evaluation of their degree of confirmation, inspired by recent develop-
ments in causal network theory (Pearl, 2000; Pearl & Mackenzie, 2018). The concept of cau-
sality is itself the subject of much debate in philosophy from the time of Aristotle to the present
(Bunge, 1963; Findler & Bickmore, 1996; Sobrino, Olivas, & Puente, 2010). Contemporary
approaches to analyzing and extracting causal content from texts are increasingly focused
on deep learning algorithms (Li, Li et al., 2021a; Trieu, Tran et al., 2020). Modern approaches
to causal networks are based on Bayes’s theorem, and we will use this framework to interpret
the causal assertions found in scientific texts.

What do we mean by the statement A causes B? Because we are dealing with science, we
will interpret theories, hypotheses, models, or laws as positing causal assertions that are linked
to empirical findings or observations, which are the effects of those causes. Thus, if a theory
asserts that A causes B, and B is found to occur, this increases the probability that the theory
is correct, which is a basic tenet of Bayesian philosophy of science. Of course, we know from
the history of science that theories have changed radically in the past, and there is no reason to
think that they will not continue to change in the future. No theory, no matter how well cor-
roborated, is invulnerable. This means that we will not be dealing with the ultimate causes,
whether A really causes B, or whether theory A is the final true explanation of B, but rather
with the perception or belief that theory A is true within a particular historical context given
the evidence B available at the time.

Familiar examples of changing causal explanations from the history of science are the tran-
sition from Aristotle’s theory of motion to Newton’s laws, Ptolemy’s Earth-centered account of
celestial motions to the Copernican Sun-centered account, the phlogiston theory of combus-
tion to Lavoisier’s oxygen-based theory, Newton’s to Einstein’s theory of gravity, and Bohr’s
atomic model to the Schrödinger/Heisenberg quantum mechanical theory of the atom. The
Watson and Crick discovery of the structure of DNA will serve as an example of theory change
in the face of confirming and disconfirming evidence.

The replacing of one theory by another is, of course, an instance of what Kuhn (1962)
called a scientific revolution, although the vast majority of instances play out on a much
smaller, microrevolutionary scale. The common thread in these examples is that theories
act as causal constructs and effects are the observable phenomena. While the causes may
change over time as one theory supersedes another, the effects are somewhat more stable,
although the latter can increase in accuracy or expand dramatically when new scientific
instruments are invented. The field of medicine is replete with causes and effects, for example,
when a bacterium or virus is postulated as the cause of a disease. Here the bacterium or virus
is the cause, or theory, and the disease is the effect or evidence. In diagnosis, the disease acts
as the cause or theory and symptoms as effects or evidence. Technologies and methods might
also be modeled in the same fashion, although here the mechanism or inner workings of the
method plays the role of the “theory,” and the end result or outcome is the “effect.” Generally,
the concept of “A causes B” can be viewed as a possible pathway in a complex, probabilistic
network of causes and effects.

From the effect side, we know from the work of Hanson (1972) that evidence can be theory
laden, and confirmation bias is always present. Of course, theories are designed to explain
specific phenomena. If a theory is later found to explain or predict some other phenomenon,
then our confidence in the theory is usually increased. Likewise, unexpected failure to explain
some phenomenon may decrease our confidence in the theory. Effects are also subject to
experimental errors, which can propagate if a chain of measurements is involved. Such seems
to be the case for cold fusion, where initial experiments indicating an excess of energy output
over input were interpreted as support for a nuclear fusion hypothesis (“Cold fusion,” 2021). In
the historical case of phlogiston, it was the neglect of weight comparisons of reactants and
products, presumably irrelevant to theory, that delayed the recognition that something was
being added during combustion (oxygen) rather than being lost (phlogiston) (Ihde, 1964,
p. 57). These examples are in accord with the Bayesian framework because confirming evidence
increases our confidence in a theory and disconfirming evidence decreases it.

In this paper we will deal with causal assertions at the microlevel rather than the paradigm-
changing level based on a close analysis of scientific texts. Of course, collecting sufficient
textual evidence for science in earlier centuries is challenging, but given current full text
resources there is no such limitation for contemporary science. Presumably, if we are seeking
opinion on the status of a current theory or empirical finding, we could perform a full text
search on the scientific literature or even on social media. This would generate a heteroge-
neous set of statements representing a broad range of opinion from experts and nonexperts.

In this paper we will constrain the process by focusing on specific highly cited papers and
their citing sentences, also called citances (Nakov, Schwartz, & Hearst, 2004), and attempt to
discern causes and effects from that more limited perspective. By restricting the data to highly
cited papers and their citing sentences, we can sharpen the focus on a specific theory and
more accurately assess its degree of confirmation within a community of peers. In addition,
we can expand the scope by including closely related papers drawn from a citation-based
cluster. Citing sentences also reveal the degree of agreement among a community of citing
authors on the core findings of the cited work (Small, 1978), and when aggregated can be
represented as a network of assertions. The resulting network, it is proposed, can be inter-
preted as a collective model of the theory and its empirical outcomes.

The background to this effort was an analysis of a single highly cited paper on the topic of
nociception (Caterina, Schumacher et al., 1997), the biological basis of the sensation of pain
(Moayedi & Davis, 2013). Using a set of 763 citing sentences for this paper, it was possible to
manually construct a network of assertions that linked theoretical causes with experimental
effects (Small, 2021). The goal of this paper is to automate the creation of such networks as
far as possible and see if they can be used to assess the degree of confirmation of the under-
lying theory. In the course of this work, quite unexpectedly, the senior author of the original
focal paper (Caterina, Leffler et al., 2000), David Julius from the University of California, San
Francisco, was awarded the 2021 Nobel Prize in Physiology or Medicine for his contributions
to the field of nociception (Julius, 2021).

2. DATA

Three different data sources were used to identify highly cited papers and collect their citing
sentences. At the time this research began in early 2021, no single source of citing sentences
was available (see Nicholson, Mordaunt et al., 2021). First, the Centre for Science and
Technology Studies (CWTS) at Leiden University provided sets of highly cited papers and their
citing sentences partitioned into five algorithmically defined fields of science drawn from
Elsevier’s ScienceDirect database. These data were in turn drawn from full-text information
of English-language scientific papers published in Elsevier journals following the procedure
described in previous papers (Boyack, Van Eck et al., 2018; Larmers, Boyack et al., 2021).
Using this resource, the 500 most cited papers were selected for each of five broad fields
(Biomedical and Health Sciences, Life and Earth Sciences, Mathematics and Computer
Science, Physical Sciences and Engineering, and Social Sciences and Humanities) in addition
to their citing sentences. The cited papers cover the years 2000 to 2015, and the citing
sentences are from papers published from 2000 to 2016.

A second data source was the open access subset of PubMed Central® (PMC) from the
National Library of Medicine, consisting of the full text of primarily biomedical articles in
XML format. The PMC includes papers that were required to be publicly available under
the National Institutes of Health public access policy and other open access sources. PMC
processing adds codes to the references cited by articles that allow the user to connect the
reference within the text to the bibliographic information at the end of each article and, like
the Leiden ScienceDirect database, enables the retrieval of all the sentences from the full text
of covered articles that cite a given reference. SciTech Strategies downloaded the open access
subset from November 2018, and imported it into a MySQL database (Small, Tseng, & Patek,
2017). The years covered are the 1990s to 2018.

The third data source used was a cluster analysis, or model, of Scopus data maintained by
SciTech Strategies. The model covers Scopus data for the years 1996 to 2018 and consists of
43 million documents assigned to 104,677 clusters or research communities (Klavans, Boyack,
& Murdick, 2020). Denoted as STS5, the model was created using a direct citation clustering
algorithm from Leiden University (Traag, Waltman, & Van Eck, 2019).

Papers were selected from different fields using these data sources. The papers served as
case studies for developing methods for extracting cause/effect (theory/evidence) relationships
from their citing sentences and testing their degree of confirmation, and should not be consid-
ered as representative of the broad fields. As an initial screening, samples of citances for each
paper were scanned for the presence of theoretical or experimental terms which suggested that
causal connections were being made. An examination of 20 or so citances for a given cited
paper reliably identified it as causal or noncausal. On this basis, roughly one-half of the papers
in a sample of 500 highly cited biomedical papers were classified as causal.

While citing sentences for causal cited papers tend to be causal as well, citing sentences
can also be descriptive, procedural, or programmatic and not make any theoretical assertions.
Citing sentences for method papers, for example, are predominantly procedural in nature and
not causal. However, review papers, because of their role in synthesizing knowledge, can be a
rich source of causal connections.

Ten papers were selected from the Elsevier/CWTS data set spread across four fields: one from
Biomedical and Health Sciences, and three each from Life and Earth Sciences, Physical Sciences
and Engineering, and Social Sciences and Humanities (see Table 1). These papers then served as
the basis of the feasibility study. The single paper from life science, the previously mentioned
paper by Caterina et al. (2000), appeared in cluster #769 from the SciTech Strategies STS5 model
(denoted STS5-769). This cluster consisted of 7,971 papers and was focused on nociception. The
20 most cited Scopus papers from this cluster were also selected for analysis (see Table 2). Citing
sentences for these 20 nociception papers were retrieved from the PubMed Central repository.
See Table 7 for general theory statements for each of the papers.

3. METHODS

3.1. Creating Causal-Effect Pairs

One of the goals of this project was to see if pairs of words, or more precisely noun phrase
pairs, could be extracted from citing sentences representing cause/effect or theory/evidence
connections. This seemed feasible because the citing sentences were often restatements of
the findings of the cited work, and multiple citing sentences were available.

Following the initial screening of highly cited papers for theoretical or experimental terms,
there was also the need to have some indicator that the citing sentence had made a causal
assertion. One way to do that is to look for general words that denote causes or effects. Exam-
ples of causal words are the verb activated and the noun stimulus. Examples of effect words are
response and result. To this end, general cause and effect words were compiled by manually
scanning citances for the 30 highly cited papers used in this study.

Table 1. Papers selected from Elsevier/CWTS data in four fields. The field from which the papers were selected precedes the bibliographic information on the paper. The column “Number of citances” is the number of citing sentences in Elsevier’s ScienceDirect database through 2016.

Paper | Number of citances

Biomedical and Health Sciences
Caterina, M. J., Leffler, A., Malmberg, A. B., Martin, W. J., Trafton, J., … Julius, D. (2000). Impaired nociception and pain sensation in mice lacking the capsaicin receptor. Science, 288(5464), 306–313. | 763

Life and Earth Sciences
Mottram, D. S., Wedzicha, B. L., & Dodson, A. T. (2002). Acrylamide is formed in the Maillard reaction. Nature, 419(6906), 448–449. | 399
Loreau, M., Naeem, S., Inchausti, P., Bengtsson, J., Grime, J. P., … Wardle, D. A. (2001). Ecology—biodiversity and ecosystem functioning: Current knowledge and future challenges. Science, 294(5543), 804–808. | 406
Alexander, M. (2000). Aging, bioavailability, and overestimation of risk from environmental pollutants. Environmental Science & Technology, 34(20), 4259–4265. | 395

Physical Sciences and Engineering
Adachi, C., Baldo, M. A., Thompson, M. E., & Forrest, S. R. (2001). Nearly 100% internal phosphorescence efficiency in an organic light-emitting device. Journal of Applied Physics, 90(10), 5048–5051. | 560
Das, S. K., Putra, N., Thiesen, P., & Roetzel, W. (2003). Temperature dependence of thermal conductivity enhancement for nanofluids. Journal of Heat Transfer-Transactions of the ASME, 125(4), 567–574. | 574
Aharony, O., Gubser, S. S., Maldacena, J., Ooguri, H., & Oz, Y. (2000). Large N field theories, string theory and gravity. Physics Reports—Review Section of Physics Letters, 323(3–4), 183–386. | 480

Social Sciences and Humanities
Berkman, L. F., Glass, T., Brissette, I., & Seeman, T. E. (2000). From social integration to health: Durkheim in the new millennium. Social Science & Medicine, 51(6), 843–857. | 349
Cardinal, R. N., Pennicott, D. R., Sugathapala, C. L., Robbins, T. W., & Everitt, B. J. (2001). Impulsive choice induced in rats by lesions of the nucleus accumbens core. Science, 292(5526), 2499–2501. | 326
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences of the United States of America, 98(20), 11818–11823. | 323

The manual selection process was augmented using machine learning in the following way,
taking nociception as an example. A random sample of 327 sentences was selected from
papers citing Caterina et al. (2000) and manually classified as causal or noncausal. The
sample was divided into training and test sets, and the Scikit-learn package was used for
machine learning (Pedregosa, Varoquaux et al., 2011). The algorithm finds an optimal surface
in multidimensional space separating the causal and noncausal sentences where each word
corresponds to an axis in the space. This is done for 10 classifiers. The median accuracy of
the 10 classifiers was 73%. Three of the classifiers had an accuracy of 74%. One of these
was the BernoulliNB classifier, which had an F1 of 75% based on its precision and recall
scores. The coefficients of individual words for that classifier were used to select additional
cause/effect words. For example, the highest coefficient words for the Bernoulli classifier
included words like induced, activation, stimuli, and responses, while low coefficient words
were action-oriented, like performed or examined, but in general were more diverse. Eight
cause/effect words appeared in the top one-half of one percent of words ranked by
coefficient.
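To make the word-ranking idea concrete, the following is a minimal pure-Python sketch of a Bernoulli naive Bayes classifier over word presence. It is a stand-in for illustration only and is not the Scikit-learn BernoulliNB run reported above. The four training sentences are invented, and the function names (`train_bernoulli_nb`, `predict`, `log_odds`) are hypothetical. The per-word log-odds play the role that the classifier coefficients play in the text.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    """Return the set of lowercase words present in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def train_bernoulli_nb(sentences, labels, alpha=1.0):
    """Fit per-class word-presence probabilities with Laplace smoothing."""
    docs = [tokenize(s) for s in sentences]
    vocab = sorted(set().union(*docs))
    model = {"vocab": vocab, "prior": {}, "p_word": {}}
    for cls in set(labels):
        cls_docs = [d for d, y in zip(docs, labels) if y == cls]
        doc_freq = Counter(w for d in cls_docs for w in d)
        model["prior"][cls] = len(cls_docs) / len(docs)
        model["p_word"][cls] = {
            w: (doc_freq[w] + alpha) / (len(cls_docs) + 2 * alpha) for w in vocab
        }
    return model

def predict(model, sentence):
    """Pick the class with the highest log posterior score."""
    words = tokenize(sentence)
    scores = {}
    for cls, prior in model["prior"].items():
        score = math.log(prior)
        for w in model["vocab"]:
            p = model["p_word"][cls][w]
            score += math.log(p if w in words else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)

def log_odds(model, positive="causal", negative="noncausal"):
    """Rank words by how strongly their presence favours the positive class."""
    return sorted(
        ((math.log(model["p_word"][positive][w] / model["p_word"][negative][w]), w)
         for w in model["vocab"]),
        reverse=True,
    )

# Invented toy sentences; the study used 327 manually labelled citances.
sentences = [
    "capsaicin activated the receptor and induced pain responses",
    "noxious heat stimuli induced activation of sensory neurons",
    "recordings were performed on cultured neurons",
    "expression was examined in dorsal root ganglia",
]
labels = ["causal", "causal", "noncausal", "noncausal"]
model = train_bernoulli_nb(sentences, labels)
ranked = log_odds(model)  # highest-ranked words are candidate cause/effect words
```

In this toy setting the highest-ranked words are those that occur mainly in the sentences labelled causal, which mirrors how high-coefficient words were used to extend the cause/effect word list.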

Table 2. Twenty most cited papers from the STS5-769 cluster on nociception. The selection is based on citation counts from Scopus through 2018.

Caterina, M. J., Schumacher, M. A., Tominaga, M., Rosen, T. A., Levine, J. D., & Julius, D. (1997). The capsaicin receptor: A heat-activated ion channel in the pain pathway. Nature, 389(6653), 816–827.

Caterina, M. J., Leffler, A., Malmberg, A. B., Martin, W. J., Trafton, J., … Julius, D. (2000). Impaired nociception and pain sensation in mice lacking the capsaicin receptor. Science, 288(5464), 306–313.

Tominaga, M., Caterina, M. J., Malmberg, A. B., Rosen, T. A., Gilbert, H., … Julius, D. (1998). The cloned capsaicin receptor integrates multiple pain-producing stimuli. Neuron, 21(3), 531–543.

Clapham, D. E. (2003). TRP channels as cellular sensors. Nature, 426(6966), 517–524.

Story, G. M., Peier, A. M., Reeve, A. J., Eid, S. R., Mosbacher, J., … Patapoutian, A. (2003). ANKTM1, a TRP-like channel expressed in nociceptive neurons, is activated by cold temperatures. Cell, 112(6), 819–829.

McKemy, D. D., Neuhausser, W. M., & Julius, D. (2002). Identification of a cold receptor reveals a general role for TRP channels in thermosensation. Nature, 416(6876), 52–58.

Julius, D., & Basbaum, A. I. (2001). Molecular mechanisms of nociception. Nature, 413(6852), 203–210.

Szallasi, A., & Blumberg, P. M. (1999). Vanilloid (Capsaicin) receptors and mechanisms. Pharmacological Reviews, 51(2), 159–211.

Peier, A. M., Moqrich, A., Hergarden, A. C., Reeve, A. J., Andersson, D. A., … Patapoutian, A. (2002). A TRP channel that senses cold stimuli and menthol. Cell, 108(5), 705–715.

Holzer, P. (1991). Capsaicin: Cellular targets, mechanisms of action, and selectivity for thin sensory neurons. Pharmacological Reviews, 43(2), 143–201.

Jordt, S.-E., Bautista, D. M., Chuang, H.-H., McKemy, D. D., Zygmunt, P. M., … Julius, D. (2004). Mustard oils and cannabinoids excite sensory nerve fibres through the TRP channel ANKTM1. Nature, 427(6971), 260–265.

Davis, J. B., Gray, J., Gunthorpe, M. J., Hatcher, J. P., Davey, P. T., … Sheardown, S. A. (2000). Vanilloid receptor-1 is essential for inflammatory thermal hyperalgesia. Nature, 405(6783), 183–187.

Bautista, D. M., Jordt, S.-E., Nikai, T., Tsuruda, P. R., Read, A. J., … Julius, D. (2006). TRPA1 Mediates the inflammatory actions of environmental irritants and proalgesic agents. Cell, 124(6), 1269–1282.

Venkatachalam, K., & Montell, C. (2007). TRP channels. Annual Review of Biochemistry, 76, 387–417.

Bandell, M., Story, G. M., Hwang, S. W., Viswanath, V., Eid, S. R., … Patapoutian, A. (2004). Noxious cold ion channel TRPA1 is activated by pungent compounds and bradykinin. Neuron, 41(6), 849–857.

Caterina, M. J., & Julius, D. (2001). The vanilloid receptor: A molecular gateway to the pain pathway. Annual Review of Neuroscience, 24, 487–517.

Caterina, M. J., Rosen, T. A., Tominaga, M., Brake, A. J., & Julius, D. (1999). A capsaicin-receptor homologue with a high threshold for noxious heat. Nature, 398(6726), 436–441.

Ramsey, I. S., Delling, M., Clapham, D. E., Bautista, D. M., Jordt, S.-E., … Julius, D. (2006). An introduction to TRP channels. Annual Review of Physiology, 68, 619–647.

Nilius, B., Owsianik, G., Voets, T., Peters, J. A., Venkatachalam, K., & Montell, C. (2007). Transient receptor potential cation channels in disease. Physiological Reviews, 87(1), 165–217.

Chuang, H.-H., Prescott, E. D., Kong, H., Shields, S., Jordt, S.-E., … Julius, D. (2001). Bradykinin and nerve growth factor release the capsaicin receptor from PtdIns(4,5)P2-mediated inhibition. Nature, 411(6840), 957–962.

Eventually a set of 230 cause/effect words was compiled by a combination of manual scanning
and machine learning runs. Only sentences containing one or more of the cause or effect
words were input to subsequent processing steps aimed at isolating the actual causal assertions.
An example of a causal citance for Caterina et al. (2000) is “The transient receptor potential
vanilloid 1 channel (TRPV1) is a nonselective cation channel expressed in primary sensory
neurons and implicated in thermal hyperalgesia.” Causal words here are expressed and implicated.
A descriptive, and hence noncausal, citance is “The cell bodies of the primary afferent
sensory nerves are located in the dorsal root ganglia and trigeminal ganglia.”
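The filtering step just described can be sketched in a few lines of Python. The word set below is a small illustrative subset taken from the examples in the text, not the full list of 230 words, and the function name `is_candidate_causal` is hypothetical.

```python
import re

# Illustrative subset of cause/effect words mentioned in the text;
# the full 230-word list is not reproduced here.
CAUSE_EFFECT_WORDS = {
    "activated", "stimulus", "response", "result",
    "induced", "activation", "stimuli", "responses",
    "expressed", "implicated",
}

def is_candidate_causal(sentence, vocabulary=CAUSE_EFFECT_WORDS):
    """Return True if the sentence contains at least one cause/effect word."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return any(tok in vocabulary for tok in tokens)

citances = [
    "The transient receptor potential vanilloid 1 channel (TRPV1) is a "
    "nonselective cation channel expressed in primary sensory neurons and "
    "implicated in thermal hyperalgesia.",
    "The cell bodies of the primary afferent sensory nerves are located in "
    "the dorsal root ganglia and trigeminal ganglia.",
]

# Only sentences that pass the filter go on to phrase-pair extraction.
kept = [s for s in citances if is_candidate_causal(s)]
```

With these two citances only the first is retained, because it contains expressed and implicated, while the descriptive sentence contains none of the listed words.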

Many of the cause/effect words were verbs, and it appeared that verbs were important
separators of causes from effects within the sentences. Initially it was also noted that causes
seemed to occur as subjects of the citing sentences while effects occurred in the predicates
(e.g., “A causes B”). This rule, however, does not hold when the passive voice is used, for
example, “B was caused by A.” In either case, however, the cause and effect usually appear
in different clauses separated by a verb. Thus, the output from a part-of-speech parser was
input to a Python script that formed pairs of noun phrases separated by verbs, as illustrated
by Figure 1.
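A minimal sketch of the pairing step is given below. It assumes the parser output is a list of (token, tag) tuples with Penn Treebank style tags, treats any run of adjectives and nouns containing at least one noun as a noun phrase, and pairs phrases from any two different verb-separated segments. These are simplifying assumptions for illustration; the original script and parser are not specified in detail, and the function names are hypothetical.

```python
from itertools import combinations

def noun_phrases_by_segment(tagged):
    """Return one list of noun phrases per verb-separated segment."""
    segments = [[]]
    current = []  # (token, tag) items of the phrase being built
    for token, tag in list(tagged) + [("", "END")]:
        if tag.startswith(("NN", "JJ")):
            current.append((token, tag))
            continue
        # Any other tag closes the running phrase; keep it only if it has a noun.
        if any(t.startswith("NN") for _, t in current):
            segments[-1].append(" ".join(tok for tok, _ in current))
        current = []
        if tag.startswith("VB"):  # a verb starts a new segment
            segments.append([])
    return segments

def phrase_pairs(tagged):
    """Pair every noun phrase with every noun phrase in a later segment."""
    pairs = []
    for left, right in combinations(noun_phrases_by_segment(tagged), 2):
        pairs.extend((a, b) for a in left for b in right)
    return pairs

# Hand-tagged example sentence: "TRPV1 is activated by noxious heat"
example = [
    ("TRPV1", "NNP"), ("is", "VBZ"), ("activated", "VBN"),
    ("by", "IN"), ("noxious", "JJ"), ("heat", "NN"),
]
pairs = phrase_pairs(example)
```

The pairs are ordered by position in the sentence, so a passive construction yields the effect first and the cause second; deciding which member is the cause is left to a later step.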

A count is then made of the total number of identical noun phrase pairs from different verb-
separated sentence segments across all citing sentences for the highly cited paper or cluster of
papers. Table 3 shows the most frequent pairs for the 20 most cited papers from cluster
STS5-769 on nociception.

“TRP” stands for “transient receptor potential channel,” of which many subtypes have been
identified that respond to a variety of agents. These subtypes are considered sensors of the
cell’s environment. For the Caterina et al. (2000) citances alone, the noun phrase “TRPV1”
pairs with “noxious heat” 13 times. A more general wildcard search for “*TRPV1*” and
“*heat*” shows that these words are paired 143 times across verb-separated segments. In
causal terms we can say that the TRPV1 receptor leads to the sensation of heat, which at this
point is a hypothesis in need of confirmation.
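The counting and wildcard search described above might be sketched as follows, using only the Python standard library. The pair lists are invented for illustration and do not reproduce the counts reported in the text; `count_pairs` and `wildcard_total` are hypothetical names.

```python
from collections import Counter
from fnmatch import fnmatchcase

def count_pairs(pairs_per_citance):
    """Total the occurrences of identical phrase pairs across citances."""
    counts = Counter()
    for pairs in pairs_per_citance:
        counts.update(pairs)
    return counts

def wildcard_total(counts, cause_pattern, effect_pattern):
    """Sum counts of pairs whose members match shell-style wildcard patterns."""
    return sum(
        n for (cause, effect), n in counts.items()
        if fnmatchcase(cause, cause_pattern) and fnmatchcase(effect, effect_pattern)
    )

# Invented phrase pairs for three citances.
pairs_per_citance = [
    [("TRPV1", "noxious heat")],
    [("TRPV1 receptor", "heat"), ("TRPV1", "capsaicin")],
    [("TRPV1", "noxious heat")],
]
counts = count_pairs(pairs_per_citance)
exact = counts[("TRPV1", "noxious heat")]
broad = wildcard_total(counts, "*TRPV1*", "*heat*")
```

The wildcard total is always at least as large as any exact-match count it subsumes, which is why the broader search in the text returns many more pairings than the exact phrase pair.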

Figure 1. The creation of causal phrase pairs from a citing sentence. TRPV1 receptor phrases,
which function as causes, are highlighted in yellow, and “heat” words in red as effects. Verbs are
highlighted in green and break the sentence into segments across which phrase pairs are formed. Two
cause/effect pairs are generated in this example involving the TRPV1 receptor.

Table 3. Most frequent noun phrase pairs for the top 20 papers in the nociception cluster from
SciTech Strategies, Inc. The 20 papers are listed in Table 2. Citing sentences are from the PMC
database through 2018. Equivalent terms and acronyms have been unified.

Cause phrase: TRPV1; TRPV1; capsaicin; TRPA1; TRPM8; TRPV1; TRPA1; TRPV1; capsaicin; TRPV1
Effect phrase: capsaicin; heat; noxious heat; protons; TRPV1; allyl isothiocyanate; menthol; inflammation; noxious cold; mustard oil; pain; receptor; pain; bradykinin
# of citances: 170

This approach is similar to the subject-predicate-object (SPO) triples method used by
databases, such as the semantic MEDLINE, to facilitate search and to identify various types
of concept dependences in the biomedical literature (Kilicoglu, Shin et al., 2012; Rindflesch,
Kilicoglu et al., 2011). The so-called “semantic predications” are available through the NLM’s
SemMed database and have been used by Chen and Song (2018) to map the subject and
object connections involving causal type links in various ways to understand how causal
connections can transform biomedical research areas. In a similar vein, Li, Peng, and Du (2021b)
have explored SPO triples as knowledge units in connection with the uncertainty sentiment as
part of a case study of lung cancer.

The field of literature-based discovery (LBD) also uses the SPO tool to identify what
Swanson (1986) called “undiscovered public knowledge,” which is new knowledge somehow
implicit in existing knowledge. The extensive LBD literature has recently been reviewed
(Thilakaratne, Falkner, & Atapattu, 2019). The goal of LBD, however, differs from that pursued
in this paper, which is not to posit new “undiscovered” knowledge but rather to identify
existing causal associations in the literature and assess their degree of confirmation. A related
approach to ours uses Bayesian networks among semantic predications to find novel biomedical
hypotheses (Atkinson & Rivas, 2008). Their approach, however, requires that conditional
probabilities be supplied by experts in the field and is not aimed at confirmation.

Another difference between the present approach and the SPO work is that phrase pairs are
focused on the citing sentences for a specific highly cited paper or cluster of closely related
papers and not the titles and abstracts of papers used by semantic MEDLINE. This means that
we can capture the community consensus on the significance of the cited work and limit the
phrase pairs to the subject matter represented by the cited paper or cluster. We can also look at
causal connections across a variety of scientific fields and not be restricted to biomedicine.

As noted above, there is no guarantee that the cause will precede the effect in the sentence,
and cases have been found where the cause appears in the predicate following a verb. Thus,
the best policy is to look for frequently appearing noun phrases either preceding or following
verbs and use other criteria to discern which is the cause and which is the effect. The rule of
thumb adopted here is to take the more abstract or general entity to be the cause/theory and
the more specific and concrete entity to be the effect/evidence. To give an example from one
of the papers selected for analysis, if an abstract entity like the “Maillard reaction” is paired
with the specific substance “acrylamide,” then the chemical reaction is the cause and acrylamide,
the effect. This is despite the fact that the phrase “Maillard reaction” comes after the
word “acrylamide” in most sentences due to the passive voice (e.g., “acrylamide is formed by
the Maillard reaction”). In this case, a total cause and effect pair frequency can be obtained by
combining the forward with the backward occurrences. Later, we will differentiate these as
“forward” and “backward” cases and show that the “forward” cases predominate.
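Combining the two orderings can be illustrated with a short sketch. It assumes phrase pairs have been counted in sentence order, so that a “forward” pair has the cause first and a “backward” pair has the effect first. The counts are invented for illustration, and `directed_frequency` is a hypothetical helper name.

```python
from collections import Counter

def directed_frequency(counts, cause, effect):
    """Split a cause/effect pair frequency into forward and backward cases."""
    forward = counts.get((cause, effect), 0)   # cause precedes effect
    backward = counts.get((effect, cause), 0)  # effect precedes cause (e.g., passive voice)
    return {"forward": forward, "backward": backward, "total": forward + backward}

# Invented counts of ordered phrase pairs for one cited paper.
counts = Counter({
    ("acrylamide", "Maillard reaction"): 5,
    ("Maillard reaction", "acrylamide"): 2,
})
freq = directed_frequency(counts, cause="Maillard reaction", effect="acrylamide")
```

Here the designation of the Maillard reaction as the cause comes from the abstract-versus-concrete rule of thumb, not from word order, so the backward count contributes to the total just as the forward count does.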

3.2. The Bayesian Theory Confirmation

In his pioneering work on a computational method for evaluating theories called the theory of
explanatory coherence (TEC), Thagard (1992) noted that the main drawback to applying
Bayes’s theorem to confirmation was the difficulty of specifying the conditional probabilities
required for the calculation. Instead, Thagard posited a network of nodes representing statements
that either cohere or conflict with one another. By passing confirming or disconfirming
signals iteratively through the network, the weights for each node eventually converge to
stable values for each statement.

By contrast, a Bayesian approach is based on causal relationships between a set of state-
ments in the form of a directed, acyclic graph (DAG), where each link has, in effect, two
weights associated with it, one denoting the probability that the theory agrees with the evi-
dence and the other the probability that some other theory does. The weights are the condi-
tional probabilities. Like the TEC process, the Bayesian network passes information back and
forth among the linked statements in a series of iterations in a process called belief propagation
until an equilibrium is reached, and new probabilities are arrived at that determine whether
confirmation is achieved (Pearl & Mackenzie, 2018, p. 112). This process has been implemented
in the bnlearn package running in R (Nagarajan, Scutari, & Lebre, 2013), and later
will be applied to a network of causal relationships in the field of nociception.

Bayesian confirmation theory was proposed by Carnap in the 1950s and was developed by
philosophers of science beginning in the 1970s. It is based on a subjective interpretation of
probability, in contrast to a frequentist one where countable events set the probabilities (Pearl,
2000). In either the subjective or frequentist interpretation, probabilities vary between 0 and
1, where 1 indicates complete certainty. For example, the probability of a theory T being true,
such as quantum mechanics or the Watson/Crick double helix for DNA, is a matter of subjec-
tive opinion, whether individual or collective, and is called the prior probability, denoted as
P(T ). The fundamental assumption of Bayesian confirmation is that T and E are logically inde-
pendent, that the prediction of the theory does not affect or influence the acquisition of the
evidence, and vice versa. Thus, the joint probability of T and E, P(T & E ), represents the
agreement of theory with evidence.

The notation P(E|T ) is the probability of observing E given that theory T is true. This has the
character of a deduction of E from T, going from the general to the specific. The inverse, P(T|E ),
is the probability of theory T being true, given that evidence E is observed, and has the character of
an induction going from the specific to the general. P(T|E ) is called the posterior probability,
the probability of the theory conditional on the evidence E, which indicates confirmation if it is
greater than the prior probability P(T ). In this case we apply Bayes’s rule and update the prior
probability for the theory P(T ) to the value of the posterior probability P(T|E ), awaiting the
arrival of further evidence either confirming or disconfirming the theory. The deductive step
T → E requires time and effort on the part of the scientist whereas the inductive step E → T
does not, which means that realizing T agrees with E is delayed even if E is old.

Bayes’s theorem can be written as:

$$P(T \mid E) = \frac{P(T)\,P(E \mid T)}{P(E)}$$

which follows from the definition of P(T|E ) as P(T & E )/P(E ), and P(E|T ) as P(E & T )/P(T ).

An extension of this formula using a theorem in probability theory called “total probability” is:

$$P(T \mid E) = \frac{P(T)\,P(E \mid T)}{P(T)\,P(E \mid T) + P({\sim}T)\,P(E \mid {\sim}T)}$$

where ∼T is “not T” or “anything other than T” and P(∼T ) + P(T ) = 1.

In the context of theory and evidence, ∼T indicates any possible theory other than T
that might explain E, such as an alternative or competing theory. “Total probability” states that
any probability, say P(E ), can be expressed as a sum over all possible mutually exclusive
theories Ti, that is, the sum of P(E|Ti) * P(Ti) over i, or equivalently the sum of all joint
probabilities P(E & Ti) (Pearl, 2000, p. 4).

The conditional probability P(E|T ) expresses how well the theory T fits the evidence E,
and P(E|∼T ) how well an alternative theory fits the evidence E. The ratio of these two quantities
is called the likelihood ratio and determines whether the hypothesis is confirmed or
disconfirmed (Howson & Urbach, 2006, p. 21; Pearl, 2000, p. 7). It follows from Bayes’s
theorem that if P(E|T ) is greater than P(E|∼T ), the posterior probability P(T|E ) must be
greater than the prior probability P(T ). This indicates that the hypothesis is confirmed.
Conversely, if P(E|T ) is less than P(E|∼T ), the theory is disconfirmed and P(T|E ) is less
than P(T ). If P(E|T ) = P(E|∼T ) then the
theory is neither confirmed nor disconfirmed, and the posterior probability P(T|E ) equals the
prior probability P(T ), which means that taking the evidence E into account does not change
the probability of the theory. These relationships can be illustrated graphically by plotting the
three probabilities P(T|E ), P(E|T ), and P(E|∼T ) as a three-dimensional surface for a given
value of P(T ) (Small, 2020). Note that P(E|∼T ) is the probability of a false positive, that is,
of observing E when T is false.
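
The relationship between the likelihood ratio and confirmation can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the probability values are illustrative assumptions chosen only to show the three cases (confirmation, no change, disconfirmation).

```python
def posterior(prior, p_e_given_t, p_e_given_not_t):
    """Return P(T|E) using Bayes's theorem with the total-probability expansion."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie between 0 and 1")
    numerator = prior * p_e_given_t
    # Total probability: P(E) = P(T) P(E|T) + P(~T) P(E|~T)
    p_e = numerator + (1.0 - prior) * p_e_given_not_t
    if p_e == 0.0:
        raise ValueError("P(E) is zero, so the posterior is undefined")
    return numerator / p_e


def likelihood_ratio(p_e_given_t, p_e_given_not_t):
    """Return P(E|T) / P(E|~T)."""
    return p_e_given_t / p_e_given_not_t


# Illustrative values only (assumed, not taken from the paper).
prior = 0.5
confirmed = posterior(prior, 0.8, 0.2)     # likelihood ratio > 1
unchanged = posterior(prior, 0.5, 0.5)     # likelihood ratio = 1
disconfirmed = posterior(prior, 0.2, 0.8)  # likelihood ratio < 1

for label, value in [("confirmed", confirmed),
                     ("unchanged", unchanged),
                     ("disconfirmed", disconfirmed)]:
    print(f"{label}: prior={prior:.2f} posterior={value:.2f}")
```

With a prior strictly between 0 and 1, the posterior rises above the prior exactly when the likelihood ratio exceeds 1, which is the criterion used in the rest of this section.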

It is obvious that most scientists do not follow such a formal mathematical procedure when
formulating or testing their theories (Glymour, 1980; Kuhn, 1977). However, it is possible that
many scientists intuitively apply two principles of the Bayesian approach in the conduct of
their research: first, when they assess the fit between a theory and the evidence, that is, the
ability of the theory to explain or predict the evidence, and second, when they assess whether
an alternative theory can explain the evidence equally well or better. Hence, the Bayesian
apparatus does suggest some simple rules of thumb for evaluating theories.

As an historical example, consider James Watson’s realization that the DNA bases fit
together in a unique way. By playing with cardboard cut-outs of the four bases (adenine,
thymine, guanine, and cytosine), he saw that the pattern of hydrogen bonding fit together
neatly for A linking to T and G linking to C (Olby, 1974; Watson, 1968). This unique pattern
also explained the Chargaff rules of base ratios, as well as the observed symmetry from X-ray
diffraction by Rosalind Franklin (Schindler, 2008). Thus, at least three increments of confirmation
(stereochemistry, X-ray symmetry, and base ratios) gave a boost to the theory, increasing its
P(E|T ). At the same time, Watson’s previous model of DNA, where bases were bonded like-to-
like (Watson, 1968, p. 185), an alternative model, could not explain these findings, thus
decreasing P(E|∼T ). Hence, the autobiographical and historical accounts of Watson and
Crick’s work are consistent with a Bayesian framework, although they do not show that
Bayesian precepts actually governed the actions of the participants.

3.3. Estimating Probabilities Using Sentiment Analysis

It is not immediately obvious how bibliometric methods can be adapted to a Bayesian model.
One approach is to use autobiographical accounts of discoveries such as Watson’s to look for
events that increment or decrement confidence in a theory or competing theory. Linus
Pauling’s competing theory of a triple helix structure for DNA was rejected by Watson because
the structure could not be acidic, which contradicted experimental evidence. This reduced
Watson’s confidence in the model. However, we have no way of knowing how much the
probability of the model was reduced. Nor does the Bayesian theory give us any guidance
on what counts as evidence. For example, “accuracy” is just one of the five criteria of theory
choice discussed by Kuhn (1977). Another very different approach is to survey the opinion of
peers on the model. This can be done in retrospective studies by analyzing a large sample of
contemporary texts, for example, by a sentiment analysis of citation contexts. Presumably, the
community would be using their own subjective criteria when citing the theory, which may or
may not match those used by the discoverers.

The quantity P(∼T ), the prior probability of “not T,” seems amenable to an analysis of
uncertainty. By counting the sentences that jointly mention the theory (or causal
entity) and uncertainty terms, we get a measure of the uncertainty of T. Dividing this quantity
by the number of sentences containing T gives a number between 0 and 1. This provides a
probability measure of uncertainty for T or certainty for ∼T. We obtain a quantity proportional to
the prior probability of the theory P(T ) by subtracting P(∼T ) from one because P(T ) = 1 − P(∼T ).
A similar approach might be taken to indirectly estimating P(E|∼T ) because we are looking
for instances of support for an alternative to T, namely ∼T, as an explanation of E, which
implies a weakening of T. We do this by searching for sentences containing both theory T
and evidence E (i.e., both cause and effect) in conjunction with uncertainty terms. In this
instance, the uncertainty terms weaken the theory and there is no need to subtract from
one. To estimate P(E|T ) we need to find sentences where support is provided for the theory-
evidence or cause-effect combination. In this case, we use a vocabulary of words indicating
that supporting evidence is provided and search for them in conjunction with the theory-
evidence pair. The number of such sentences divided by the total number of sentences with
the theory-evidence pair gives a rate of support for the theory by the evidence.
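
The three estimates described above can be sketched in a few lines. This is only an illustration under stated assumptions: the cue-word stems are a small hypothetical list loosely modeled on the wildcard searches reported later in Tables 4 and 5, the matching is plain substring search, and the toy sentences are invented for the example.

```python
# Hypothetical cue-word stems (assumed for illustration, not the full word sets).
UNCERTAINTY_STEMS = ["however", "may", "could", "although", "appear", "suggest", "fail"]
SUPPORT_STEMS = ["report", "observ", "experiment", "stud", "demonstrat",
                 "found", "show", "measur"]


def contains_any(sentence, stems):
    """True if any stem occurs in the sentence (wildcard-style substring match)."""
    text = sentence.lower()
    return any(stem in text for stem in stems)


def rate(sentences, stems):
    """Fraction of sentences containing at least one of the stems."""
    if not sentences:
        return 0.0
    return sum(contains_any(s, stems) for s in sentences) / len(sentences)


def estimate(sentences, theory, evidence):
    """Estimate P(~T), P(T), P(E|~T), and P(E|T) from citing sentences."""
    with_theory = [s for s in sentences if theory.lower() in s.lower()]
    with_pair = [s for s in with_theory if evidence.lower() in s.lower()]
    p_not_t = rate(with_theory, UNCERTAINTY_STEMS)
    return {
        "P(~T)": p_not_t,                                # uncertainty around the theory
        "P(T)": 1.0 - p_not_t,                           # prior, by subtraction from one
        "P(E|~T)": rate(with_pair, UNCERTAINTY_STEMS),   # uncertainty around the pair
        "P(E|T)": rate(with_pair, SUPPORT_STEMS),        # support for the pair
    }


# Invented toy sentences for the "Maillard reaction -> acrylamide" pair.
toy_sentences = [
    "Acrylamide is formed by the Maillard reaction, as demonstrated in model systems.",
    "The Maillard reaction may not be the only route to acrylamide.",
    "The Maillard reaction produces browning in baked foods.",
    "Acrylamide levels were measured in fried potatoes.",
]
estimates = estimate(toy_sentences, "Maillard reaction", "acrylamide")
print(estimates)
```

Substring matching is a deliberate simplification here: it mirrors the wildcard searches but will also match unrelated words that happen to contain a stem.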

It is important to recognize the approximate and indirect nature of these estimates of conditional
probabilities. In the case of P(E|T ) we are assuming that the appearance of words
denoting supporting evidence for a hypothesis boosts the probability that T leads to E. In
the case of P(E|∼T ) we are assuming that the appearance of uncertainty words in a sentence
involving the theory increases the probability that some other theory (∼T ) explains the evidence
without, however, saying what that other theory is. We will discuss the limitations of
this approach in the discussion section. No doubt the existence of viable competing or alternative
theories increases the uncertainty of the theory under consideration (Chen & Song,
2018), but there may be other reasons for this lack of confidence and by itself it does not imply
support for an alternative theory.

Another difficulty with using uncertainty and support terms to estimate probabilities is due
to the inherent differences in the rate of occurrence of these words for different topics. For
example, in most cases examined, the “supporting evidence” term occurrences exceed the
“uncertainty” occurrences. This may simply express a “confirmation bias” or tendency to use
supporting words in citation contexts, as pointed out by Greenberg (2009). Large-scale
studies, such as Nicholson et al. (2021), based on deep learning showed an even larger
imbalance between “supporting” and “contrasting” citances, although they appear not to
have taken “uncertainty” terms into account. There also may be inherent differences between
topics in the rates of sentiment words that could lead to biases in comparing topics. A simple
solution to compensate for such differences is to make the theory-evidence rates relative to
a baseline specific to the topic in question. To do this we divide the rates derived from
the cause-effect sentences by a baseline rate obtained from a broader sample of sentences
that includes the sentences under analysis. For example, if the sentences are contained as a
subset of a broader topic, we can divide by the “support” and “uncertainty” rates computed
from the broader topic. Such baseline rates have been computed using all citing sentences
for individual highly cited papers or, alternatively, for a cluster of closely related papers on
the topic.

As an example, suppose the theory-evidence or cause-effect terms occur in 615 sentences
in a data set consisting of 4,752 sentences. Of the 615 sentences, 79 (12.8%) contain uncertainty
terms, while 123 (20%) sentences contain supporting evidence terms. The corresponding
rates for the broader baseline sample of 4,752 sentences are 20.3% and 24.7%. Dividing
by the baseline rates gives 0.63 for uncertainty and 0.81 for supporting evidence. Because we
are equating uncertainty with P(E|∼T ) and supporting evidence with P(E|T ), these values give
a likelihood ratio of about 1.3, which is greater than 1, and the theory is confirmed.
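
The arithmetic of this example can be reproduced directly. The sketch below uses only the counts and baseline rates quoted in the paragraph above; the function and variable names are assumptions introduced for illustration.

```python
def normalized_rate(hits, total, baseline_rate):
    """Rate of sentiment sentences in the subset, divided by the baseline rate."""
    return (hits / total) / baseline_rate


pair_sentences = 615          # sentences containing the theory-evidence pair
uncertainty_hits = 79         # of those, sentences with uncertainty terms
support_hits = 123            # of those, sentences with supporting evidence terms
baseline_uncertainty = 0.203  # uncertainty rate in the 4,752-sentence baseline
baseline_support = 0.247      # supporting evidence rate in the baseline

norm_uncertainty = normalized_rate(uncertainty_hits, pair_sentences, baseline_uncertainty)
norm_support = normalized_rate(support_hits, pair_sentences, baseline_support)

# Supporting evidence stands in for P(E|T), uncertainty for P(E|~T).
ratio = norm_support / norm_uncertainty

print(f"normalized uncertainty: {norm_uncertainty:.2f}")
print(f"normalized support:     {norm_support:.2f}")
print(f"likelihood ratio:       {ratio:.2f}")
print("confirmed" if ratio > 1 else "not confirmed")
```

Note that the normalized values are ratios of rates and can exceed 1, so they are proxies for the conditional probabilities rather than probabilities in the strict sense; only their ratio is used for the confirmation decision.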

3.4. Compiling Sentiment Word Sets

We have relied on the presence of specific cue or signal words to classify the citing sentences.
Three types of sentiment word sets have been compiled: words denoting causes and effects,
words expressing supporting evidence, and words expressing uncertainty. For uncertainty
words, important prior work has been carried out by Chen and Song (2018) and by Chen,
Song, and Heo (2018). They use a seed set of uncertainty words from Hyland (2004) including
hedging terms and expand the set by the word2vec method (Mikolov, Sutskever et al., 2013).
In one of their studies, they use predications from semantic MEDLINE involving causal pred-
ications such as “HIV CAUSES Aids.” When they combine these data with the presence of
uncertainty words they can show the time evolution of certainty or uncertainty for the claim
over a period of years. They point out that predications are much enhanced by the inclusion
of uncertainty.

The approach taken here involves manual coding of random samples of sentences for each
of these sentiments, coding each sentence as having the sentiment or not having it. The
sentences coded as having the sentiment were tokenized and word counts generated. The
resulting ranked lists were scanned for possible cue words for the sentiment. The cue words
selected were as independent as possible of subject matter or technical meaning. Lists compiled
by other authors were also consulted to see what cue words were used in their studies.
For example, the recent paper on identifying “disagreement” citances (Larmers et al., 2021)
was used to augment the uncertainty word set as it seemed likely that disagreement contributes
to the lack of certainty of an assertion.

Machine learning was also used to aid in the compilation of cue words, as described pre-
viously for the cause/effect sentiment, by dividing the coded random samples of sentences into
training and test sets. The output from machine learning includes the accuracy of the various
classifiers and the coefficients for individual words for a given classifier that define the optimal
surface in multidimensional word space. Because these coefficients are higher for words that
occur in sentences classified as having a particular sentiment (assuming the sentence is coded 1
for presence of the sentiment, and 0 for its absence), scanning the list of words having the
highest coefficients can also reveal potential cue words for the sentiment.

The precision and recall of a given word can be computed by matching the manually
coded sentences with the sentences retrieved by the sentiment word. For example, the
cause/effect cue word “stimuli” retrieved 30 sentences that contained the word, of which
25 were coded causal and five noncausal. Thus, the precision for this word in retrieving causal
sentences is 25/30, or 83%, based on this sampling. Recall for this single word is 25/254, or
10%, although recall is expected to be low for single words.
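
The precision and recall figures for “stimuli” follow from three counts. A minimal sketch, assuming (as the 25/254 fraction implies) that 254 is the number of sentences manually coded as causal in the sample; the function name is illustrative.

```python
def precision_recall(retrieved_positive, retrieved_total, coded_positive_total):
    """Precision and recall of a single cue word against manual coding."""
    precision = retrieved_positive / retrieved_total
    recall = retrieved_positive / coded_positive_total
    return precision, recall


# Counts quoted in the text for the cue word "stimuli".
precision, recall = precision_recall(
    retrieved_positive=25,     # retrieved sentences coded causal
    retrieved_total=30,        # all sentences retrieved by the word
    coded_positive_total=254,  # all sentences coded causal (assumed meaning of 254)
)
print(f"precision: {precision:.0%}, recall: {recall:.0%}")
```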

A similar exercise was undertaken for compiling and testing “uncertainty” sentiment words.
A small set of 25 uncertainty words was compiled and tested against 300 randomly selected
sentences from the fields of life science, biological science, physical science, and social sci-
ence. These sentences were coded independently by two coders as uncertain or certain.
Matching the set of 25 prospective uncertainty words (using wildcard searches to retrieve var-
iants) and comparing the hits to the manually coded sentences gave an overall precision of
75% and a recall of 56% for the aggregate of 300 sentences from the four fields combining the
results from both coders. The relatively low recall statistic indicates that the 25 uncertainty
words were inadequate for retrieving all the sentences that had been coded as uncertain.
Using Cohen’s Kappa (Cohen, 1968), only a moderate interrater reliability of 0.43 was found
for the two coders. Nevertheless, the precision computed for individual words revealed a core
of reliable uncertainty words (Table 4).
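
For readers unfamiliar with the interrater statistic, the following sketch computes an unweighted Cohen's kappa for two coders. The coder labels are hypothetical and chosen only to make the calculation easy to follow; they are not the study's data, and for binary codes the unweighted and weighted forms coincide.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which the coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes: 1 = uncertain, 0 = certain.
coder_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
coder_2 = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the coders agree on 7 of 10 items while chance agreement is 0.5, giving a kappa of 0.4, which is in the same moderate range as the value reported above.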

The compilation of words for the “supporting evidence” sentiment followed a similar
course. This sentiment was designed to capture sentences that seek or claim evidence supporting
the cause/effect assertion. Thus, words that indicate support, such as demonstrate, show, or
measure, are included, as are words denoting actions to find evidence such as study, observe,
and experiment. Ten of these cue words were tested on the same set of 300 sentences from
four fields using the two coders as described above. In this case overall precision and recall
improved to 90% and 79%, respectively. Again, overall recall was lower than precision, indicating
that not all cases of “supporting evidence” were retrieved. The precision and recall for
eight individual words are shown in Table 5.

Table 4. Uncertainty words with the highest recall and precision, based on a random sample of
300 sentences from four fields. Sentences were coded independently by two coders

Word
however

may

could

although

appears

suggests

failed

Wildcard search
*however*

Precision (%)
91.6

Recall (%)
23.6

*may*

*could*

*although*

*appear*

*suggest*

*fail*

76.9

78.6

100.0

75.0

21.5

11.0

12.9

6.5

9.7


Table 5. Supporting evidence words with the highest recall and precision, based on the same
sample of sentences used for Table 4.

Word          Wildcard search   Precision (%)   Recall (%)
report        *report*          92.0            33.7
observe       *observ*          90.0            11.0
experiment    *experiment*      85.4            20.3
study         *stud*            89.3            29.1
demonstrate   *demonstrat*      90.0            5.2
found         *found*           83.3            5.8
show          *show*            97.0            19.2
measure       *measur*          97.2            20.3

4. RESULTS

4.1. Computing Confirmation for Individual Causal Pairs by the Likelihood Ratio

Each of the highly cited papers in Table 1 and corresponding citing sentences were repre-
sented by frequently occurring cause-and-effect phrase pairs. As described previously, these
pairs are generated by combining noun phrases separated by verbs across the citing sentences
containing causal words and ranking the phrase pairs by frequency. This results in a list with a
few frequently encountered pairs at the top of the list and a long tail of less frequently occurring
pairs. First, we will focus on the most frequent phrase pair for each paper and present a
typical citing sentence for each.

Table 6 shows the principal causal phrase pair for each highly cited paper, the number of
instances of the phrase pairs in verb-separated segments of the citing sentences, and the

Table 6. Principal causal phrase pairs for highly cited papers in Table 1. The first column gives the primary author and year of the paper. The
third column gives the cause and effect separated by an arrow →. The verb-separated count column shows the forward (F) and backward (B)
occurrences

Highly cited paper   Field            Principal causal phrase pair        Verb-separated count   Distinct sentences with both phrases
Caterina (2000)      Life sci         TRPV1 → heat                        108 F + 35 B           99
Mottram (2002)       Biological sci   Maillard reaction → acrylamide      35 F + 100 B           152
Loreau (2001)        Biological sci   biodiversity → ecosystem            69 F + 26 B            100
Alexander (2000)     Biological sci   time → bioavailability              7 F + 23 B
Adachi (2001)        Physical sci     excitons → quantum efficiency       40 F + 25 B
Das (2003)           Physical sci     nanofluid → thermal conductivity    134 F + 156 B          283
Aharony (2000)       Physical sci     ads/cft → boundary                  19 F + 2 B
Berkman (2000)       Social sci       social network → health             22 F + 10 B
Cardinal (2001)      Social sci       brain lesions → impulsivity         74 F + 24 B
Blood (2001)         Social sci       music → reward                      40 F + 34 B


Table 7. Typical citing sentences and theory statements for the principal causal pairs in Table 6. The first column gives the primary author
and year of the paper from Table 1. The second column contains a typical citing sentence in quotes, and in the following row a summary
statement of the theory

Highly cited paper   Typical citing sentence for the causal pair/statement of theory

Caterina (2000)      “Temperature gating is an important feature of TRPV1, critical for the somatosensory response to noxious heat.”
Theory               There are a variety of genetically expressed molecular receptors on neurons responsible for the sensation of heat and other environmental stimuli.

Mottram (2002)       “The major mechanistic pathway for the formation of acrylamide in foods so far established is via the Maillard reaction.”
Theory               The Maillard reaction mechanism accounts for acrylamide formation in high-starch foods during cooking at high temperatures.

Loreau (2001)        “Many studies were focused on so called biodiversity effects, i.e., the way in which diversity affects ecosystem function and services.”
Theory               Plant diversity is crucial for maintaining the function and stability of ecosystems.

Alexander (2000)     “Bioavailability and toxicity of organic chemicals in soil can change over time.”
Theory               The aging of contaminated sediment and soil reduces bioavailability of pollutants to microorganisms due to sequestration.

Adachi (2001)        “Due to the ability to harvest both singlet and triplet excitons, phosphorescent organic light emitting devices can have 100% internal quantum efficiency.”
Theory               The internal quantum efficiency of the OLED devices can be greatly enhanced approaching 100%.

Das (2003)           “From the investigations in the past decade, nanofluids were found to exhibit significantly higher thermal properties, in particular, thermal conductivity, than those of base fluids.”
Theory               In a nanofluid, thermal conductivity enhancement can be explained based on the stochastic or Brownian motion of the nanoparticles.

Aharony (2000)       “The AdS/CFT correspondence asserts there is an equivalence between a gravitational theory in the bulk and a conformal field theory in the boundary.”
Theory               The anti-de Sitter/conformal field theory conjecture postulates a duality between field theories and Type IIB string theory in various geometries.

Berkman (2000)       “Structural and functional characteristics of social networks influence health via several other pathways.”
Theory               Social support theory deals with the various sources of positive or protective influences associated with an individual’s social relationship and network.


Table 7. (continued)

Highly cited paper   Typical citing sentence for the causal pair/statement of theory

Cardinal (2001)      “In animal studies, lesions in the ventral striatum or in specific regions within the orbitofrontal cortex have been shown to increase impulsivity.”
Theory               The nucleus accumbens is involved in codifying and computing the value of future rewards and therefore acts as a driving force to perform goal-directed actions.

Blood (2001)         “Music activates brain regions involved in reward and emotion and can provoke intensely pleasurable responses in these areas.”
Theory               Chills that occur in response to preferred music are partly mediated by reward-associated brain regions, which are similarly activated by sex and addictive drugs.

distinct number of sentences containing the phrase pair. In determining these counts, the
cause-and-effect phrases were searched using wildcards so that variants could be retrieved.
For example, for Cardinal (2001) in Table 1 the search was for “*brain lesion*” and “*impulsiv*”.
The counts for verb separated phrases are divided between the cause coming before effect
(F = forward) and after the effect (B = backwards). The sum of F + B can be less or greater
than the distinct sentence counts (given in the last column) because a pair can repeat within
a sentence, which makes the count higher, or not be separated by a verb, which makes the
count lower.
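
The wildcard search and the forward/backward distinction can be sketched as follows. This is a simplified illustration, not the procedure used for Table 6: it does not check that the two phrases are separated by a verb, it counts at most one occurrence per sentence, and the helper names and toy sentences are assumptions.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a wildcard pattern such as '*impulsiv*' into a case-insensitive regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    # Each '*' is allowed to match any run of word characters.
    return re.compile(r"\w*".join(parts), re.IGNORECASE)


def direction_counts(sentences, cause_pattern, effect_pattern):
    """Return (forward, backward, distinct) counts for a cause/effect phrase pair."""
    cause_re = wildcard_to_regex(cause_pattern)
    effect_re = wildcard_to_regex(effect_pattern)
    forward = backward = distinct = 0
    for sentence in sentences:
        cause = cause_re.search(sentence)
        effect = effect_re.search(sentence)
        if cause is None or effect is None:
            continue  # the sentence does not contain both phrases
        distinct += 1
        if cause.start() < effect.start():
            forward += 1   # cause precedes effect
        else:
            backward += 1  # effect precedes cause (e.g., passive voice)
    return forward, backward, distinct


# Invented toy sentences for illustration.
toy = [
    "Acrylamide is formed by the Maillard reaction.",
    "The Maillard reaction generates acrylamide during frying.",
    "Asparagine is a precursor of acrylamide.",
]
counts = direction_counts(toy, "*maillard*", "*acrylamide*")
print(counts)
```

A fuller implementation would segment each sentence at verbs before pairing noun phrases, which is what allows the F + B sum in Table 6 to differ from the distinct sentence count.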

In 7 of 10 cases, the forward count exceeds the backward count, meaning the cause usually
precedes the effect in the sentences. In most cases, the causal direction is clear, even if the
effect precedes the cause, such as in the case of acrylamide caused by the Maillard reaction.
The main exception is the theoretical physics paper Aharony et al. (2000) on string theory,
where the causal direction is not clear. In this case both the cause and effect (“ads/cft” →
“boundary”) are theoretical constructs that are mathematically related. Whether our analysis
can apply to such cases remains to be seen.

Table 7 gives examples of citing sentences illustrating the principal causal phrases in
Table 6. Instances of effects preceding causes in the sentences are Mottram et al. (2002)
and Alexander et al. (2000). Table 7 also gives a one-sentence summary of the theory that
underlies the causal phrases in Table 6. These summaries are manually constructed by scanning
a large sample of citing sentences for each paper. The summaries enable the specific
causal connections in Table 6 to be seen in the context of a more general theory. For example,
TRPV1 is just one type of receptor for pain perception.

The aim of the analysis is to compute a likelihood ratio P(E|T )/P(E|∼T ), as defined in
Section 3.2, for each of the cause/effect relations in Table 6 that determines whether the causal
connection is confirmed by sentiment analysis. Hence, we are dealing with simple causal
patterns A → B, disregarding other factors that might impinge on either B or A or other effects
that might flow from them. The approach is to approximate the conditional probabilities P(E|T )
and P(E|∼T ) by computing the “supporting evidence” and “uncertainty” sentiments, respectively.

The data for this calculation are shown in Table 8. Each paper is represented by two rows,
the first of which is data on the subset of citing sentences containing the cause-effect or theory-
evidence phrase pair, and the second is data on all the citing sentences for the highly cited
paper which serves as the baseline for the phrase pair. We start with the number of citing
sentences containing the phrase pair shown in the column headed “Total citances.” The next

Table 8. Computing confirmation based on citing sentence sentiments for the 10 highly cited papers. Each paper is represented by two rows: The first row is data on
the subset of citing sentences containing the causal phrase pair and the second row is data on all citing sentences for the individual highly cited paper which serves as
the baseline for the phrase pair. The column labeled “Norm evid wrt paper baseline” divides the “Percent evidence” for the causal pair by the “Percent evidence” for the
paper in the following row. The “Confirm” column is “Yes” if the “Norm evid wrt paper baseline” exceeds the “Norm uncert wrt paper baseline” and “No” if it does not

Paper
Caterina (2000)

Causal pair

TRPV1 → heat

Total
citances
99

Evidence
citances
43

Percent
evidence
43.4

Uncertain
citances
36

Percent
uncertain
36.4

Norm evid. wrt
paper baseline
1.07

Norm uncert. wrt
paper baseline
1.32

Confirm
No

paper baseline

411

167

40.6

113

27.5

Mottram (2002)

Loreau (2001)

maillard →

acrylamide

paper baseline

biodiversity →
ecosystem

paper baseline

Alexander
(2000)

time →

bioavailability

152

399

100

406

30.3

125

31.3

22.0

0.97

0.71

Yes

13.2

18.5

30.0

0.82

0.81

Yes

109

26.8

151

37.2

33.6

26.1

1.29

0.69

Yes

paper baseline

395

100

25.3

150

38.0

Adachi (2001)

excitons → quantum

12.5

efficiency

paper baseline

560

137

24.5

10.7

11.1

0.51

0.97

Paper
Das (2003)

Causal pair
nanofluid → thermal

conductivity

Table 8. (continued)

Total
citances
283

Evidence
citances
196

Percent
evidence
69.3

Uncertain
citances
29

Percent
uncertain
10.2

Norm evid. wrt
paper baseline
1.11

Norm uncert. wrt
paper baseline
0.86

Confirm
Yes

paper baseline

598

373

62.4

Aharony (2000)

ads/cft → boundary

paper baseline

Berkman (2000)

social network →

health

paper baseline

Cardinal (2001)

lesions → impulsive

paper baseline

Blood (2001)

music → reward

paper baseline

480

349

326

323

164

205

11.9

3.3

16.0

1.00

0.21

Yes

30.0

0.92

0.87

Yes

16.7

20.0

21.8

120

34.4

54.9

50.3

65.8

63.5

148

52.1

45.4

25.0

26.6

1.09

1.15

1.04

0.94

Yes


column, labeled “Evidence citances,” is a count of the sentences containing the “supporting
evidence” sentiment words, followed by its percentage of the total citances.

The count of “Uncertain citances” and the “Percent uncertain” follow. The columns
labeled “Norm evid wrt paper baseline” and “Norm uncert wrt paper baseline” are the
“Evidence” and “Uncertainty” percentages for the causal pair divided by the corresponding
percentages for the paper as a whole, given in the row immediately below it labeled “Paper
baseline.” Hence, the total citances for the paper serve as a reference baseline for the specific
causal pair derived from it. This preserves the topic focus and compensates for any
over- or underuse of specific sentiment words in the topic.

The relative magnitudes of these two normalized percentages determine the likelihood ratio
under the assumptions we are using on the interpretations of the sentiments. If the normalized
supporting evidence sentiment is greater than the normalized uncertainty, the causal pair is
confirmed. This is indicated by a “Yes” or “No” in the last column labeled “Confirm.” In
Table 8 it is interesting to note that in eight of 10 cases the evidence sentiment outweighs
the uncertainty before normalization, but following normalization, five of 10 cases show a
reversal in which the dominant sentiment changes.
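The decision rule above can be illustrated with a minimal Python sketch. It assumes that the normalized evidence and uncertainty values stand in for P(E|T) and P(E|¬T) up to a common scale factor, so that only their ratio matters. The function names and the prior of 0.5 are hypothetical choices made for illustration; the two normalized values are those listed in Table 8 for the Das (2003) pair.

```python
def likelihood_ratio(norm_evidence: float, norm_uncertainty: float) -> float:
    """Ratio of normalized evidence to normalized uncertainty for one causal pair."""
    if norm_uncertainty <= 0:
        raise ValueError("normalized uncertainty must be positive")
    return norm_evidence / norm_uncertainty


def is_confirmed(norm_evidence: float, norm_uncertainty: float) -> bool:
    """A pair counts as confirmed when evidence outweighs uncertainty (ratio > 1)."""
    return likelihood_ratio(norm_evidence, norm_uncertainty) > 1.0


def posterior(prior: float, ratio: float) -> float:
    """Bayes's theorem for a single link A -> B, written with the likelihood ratio r:
    P(T|E) = r * P(T) / (r * P(T) + 1 - P(T))."""
    return ratio * prior / (ratio * prior + (1.0 - prior))


# Das (2003), nanofluid -> thermal conductivity: normalized values from Table 8.
ratio = likelihood_ratio(1.11, 0.86)
prior = 0.5  # hypothetical prior, not taken from the paper
print(is_confirmed(1.11, 0.86), round(posterior(prior, ratio), 3))
```

Because the posterior exceeds the prior exactly when the ratio exceeds 1, the “Confirm” column can be read off the two normalized columns without fixing a prior.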

We also note that three of the 10 causal relations are disconfirmed because the uncertainty
outweighs the evidence, including “TRPV1 → heat” from the Caterina et al. (2000) paper.
However, another prominent causal link for Caterina et al. (2000), not shown in Table 6,
namely “TRPV1 → capsaicin” (the sensation of capsaicin), is confirmed, so confirmation can
vary from link to link within a given paper. The explanation of why “TRPV1 → heat” is
disconfirmed is more subtle. It turns out that the response of the receptor depends on the
temperature of the stimuli, as made clear by the following citance: “Even though there is no doubt
that TRPV1 mediates thermal pain, the presence of additional heat sensors was suggested due
to the fact that TRPV1 knock-out mice still exhibited residual nociceptive behaviors to noxious
thermal stimuli.” In other words, suppressing the receptor did not eliminate the sensation of
extreme or noxious heat. We will see later on (in Table 9) that when compared to a cluster of
papers on nociception, this distinction between moderate and noxious heat is diminished and
the causal link is confirmed. Hence, confirmation can also depend on the scope of the corpus.

4.2. Computing Confirmation for a Network

Each of the cause/effect assertions in Table 6 can be considered a simple one-link network
A → B, which has an exact solution using Bayes’s theorem. However, when multiple causal links
are connected in a network, an exact solution is not possible, and an algorithm is required that
iteratively exchanges information between nodes until the network converges to a stable solution.

A network was created by merging the citances for the top 20 papers from the nociception
cluster from the SciTech Strategies model. Noun phrase pairs were created as described above
for the combined citances. Table 3 showed that TRPV1 and TRPA1 receptors were involved in
multiple prominent causal assertions, leading to the sensations of heat, cold, acidity, capsaicin,
mustard oil, and other agents. Citances also revealed that the two receptors had a common
origin in neurons, as indicated by the following citance:

“The TRPA1 channel is found in a subset of rat DRG neurons in which it is co-expressed
with the TRPV1, but not the TRPM8 channel.”

This led to a linking together of seven causal assertions to form the directed acyclic graph
(DAG) in Figure 2. The causal network involved eight nodes, starting with a “neuron” node on


Figure 2. The causal network for seven nociception links and eight nodes, starting with a “neuron”
node on the left and progressing to the sensations evoked on the right via two receptor types. Nodes
are labeled with uppercase letters. Each link is coded by two conditional probabilities, E and U,
derived from evidence and uncertainty sentiments. The joint probability distribution expression
based on the “chain rule” for the network is shown below the network, as is the final P(T|E) value
of 0.54, which is an average of 20 runs of the Bnlearn software using the “logic sampling” option.

the left, and progressing to the sensations evoked on the right via two receptor types: TRPV1
and TRPA1. In contrast to the simple A → B pattern, here an effect can act as a cause leading to
another effect, creating causal chains. In Figure 2 we also give the formula for the so-called “joint
probability distribution” for the network, which is a product of conditional probabilities for
every link in the network following the “chain rule” of probabilities. The first term in this
expression is the prior probability of the initial node, P(N), where N stands for neuron. The
following terms are conditional probabilities, each of which corresponds to an arrow in the network
and has the form P(effect | cause).
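For orientation, the chain-rule factorization just described can be written out for the eight-node graph. Apart from N, the node letters are assumptions made here for illustration (V for TRPV1, A for TRPA1, and C, H, D, O, M for capsaicin, heat, acid, cold, and mustard oil); the labels used in Figure 2 may differ.

```latex
P(N, V, A, C, H, D, O, M) =
  P(N)\, P(V \mid N)\, P(A \mid N)\,
  P(C \mid V)\, P(H \mid V)\, P(D \mid V)\,
  P(O \mid A)\, P(M \mid A)
```

The product has one prior term and seven conditional terms, one for each link of the network.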

Our aim is to compute the probability that the network is confirmed as a representation of a
theory of nociception based on the sentiments of the citing authors. Thus, we need to
compute, as before, two conditional probabilities for each link in addition to the prior probability
for the initial node in Figure 2 and input these into the Bnlearn software. Table 9 shows how
these numbers were calculated. As a baseline we use the cumulated citances for the cluster,
rather than the citances for individual papers, as in Table 8. This baseline is shown in the
second row of Table 9. Beginning in the fourth row we give data for each separate link in the
network computed in the same manner as in Table 8 except that the columns headed “Norm
evid wrt cluster” and “Norm uncert wrt cluster” show the sentiment rates divided by the cluster
baseline. The columns headed “rescale” divide each normalized value by a constant (= 2.2) so
that their values will fall between 0 and 1, as required by probabilities. The scaled values are
labeled as E for evidence and U for uncertainty on Figure 2 and are the values input into the

Table 9. Computing confirmation based on citing sentence sentiments for the network of Figure 2. The second row in the table labeled “Cluster baseline” contains
sentiment counts for the aggregate citances for the top 20 papers in the cluster listed in Table 2. Beginning in the fourth row, each link of the network of Figure 2 is listed.
The columns labeled “Norm evid wrt cluster” and “Norm uncert wrt cluster” divide the “Percent evidence citances” and the “Percent uncertain citances” by the values
of the respective cluster baselines in the second row. The two “Rescale” columns divide the normalized evidence and uncertainty percentages by a constant of 2.2 so
that the normalized values fall within the 0–1 interval required by probabilities. The last row in the table shows the computation of the prior probability for the leftmost
node in the network of Figure 2, P(N). This is based on the uncertainty of “neuron” citances, normalized and rescaled as above, and subtracted from 1 to get a certainty
value.

Causal pair | Total citances | Evidence citances | Percent evidence citances | Uncertain citances | Percent uncertain citances | Norm evid wrt cluster | Norm uncert wrt cluster | Rescale evid wrt cluster (E) | Rescale uncert wrt cluster (U) | Confirm
Cluster baseline | 4,752 | 1,173 | 24.7 | 964 | 20.3 | - | - | - | - | -
DRG neuron → TRPV1 | 428 | 106 | 24.8 | - | 21.7 | 1.00 | 1.07 | 0.46 | 0.49 | -
TRPV1 → capsaicin | 615 | 123 | 20.0 | - | 12.8 | 0.81 | 0.63 | 0.37 | 0.29 | Yes
TRPV1 → heat | 687 | 186 | 27.1 | 133 | 19.4 | 1.10 | 0.95 | 0.50 | 0.43 | Yes
TRPV1 → acid | 151 | - | 13.2 | - | 13.2 | 0.54 | 0.65 | 0.24 | 0.30 | -
DRG neuron → TRPA1 | 247 | - | 33.6 | - | 25.5 | 1.36 | 1.26 | 0.62 | 0.57 | Yes
TRPA1 → cold | 252 | 103 | 40.9 | 109 | 43.3 | 1.66 | 2.13 | 0.75 | 0.97 | -
TRPA1 → mustard oil | 172 | - | 31.4 | - | 15.7 | 1.27 | 0.77 | 0.58 | 0.35 | Yes
DRG neuron (1-prior prob) | 846 | - | - | 181 | 21.4 | - | 1.05 | - | 0.48 | -


software. It was found that confirmation was not sensitive to the value of the scaling constant
and P(T ) and P(T|E ) were both shifted up or down proportionally.

The last row in Table 9 shows how the prior probability P(N) is computed. As discussed
previously, we base this on the uncertainty sentiment, which is computed for citances
containing the terms “DRG [or trigeminal] neuron.” The prior is also subject to the same normalization
and rescaling applied to the conditional probabilities. The final number 0.48 must, however,
be subtracted from 1 to convert it to a probability of certainty rather than one of uncertainty,
hence the value of 0.52 = (1 − 0.48) in Figure 2.
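The normalization, rescaling, and prior computation described for Table 9 can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the counts are those read from Table 9 for the cluster baseline, the “TRPV1 → heat” link, and the “DRG neuron” row.

```python
SCALE = 2.2  # rescaling constant reported in the text


def rescaled(count: int, total: int, base_count: int, base_total: int,
             scale: float = SCALE) -> float:
    """Sentiment rate of a link divided by the cluster baseline rate, then rescaled."""
    normalized = (count / total) / (base_count / base_total)
    return normalized / scale


# Cluster baseline (second row of Table 9).
BASE_TOTAL, BASE_EVIDENCE, BASE_UNCERTAIN = 4752, 1173, 964

# TRPV1 -> heat: 687 citances, 186 with evidence cues, 133 with uncertainty cues.
E_heat = rescaled(186, 687, BASE_EVIDENCE, BASE_TOTAL)
U_heat = rescaled(133, 687, BASE_UNCERTAIN, BASE_TOTAL)

# Prior for the neuron node: 1 minus the rescaled uncertainty of "DRG neuron" citances.
prior_N = 1.0 - rescaled(181, 846, BASE_UNCERTAIN, BASE_TOTAL)

print(round(E_heat, 2), round(U_heat, 2), round(prior_N, 2))
```

Rounded to two decimals, these values agree with the E, U, and prior entries used for this link and for P(N) in Figure 2.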

The last column in Table 9 shows that four of the individual links were confirming based on
the likelihood ratio. Running the full network using the Bnlearn software gives a probability of
0.54 (an average of 20 separate runs using the “logic sampling” option), which thus narrowly
confirms the network with respect to the prior of 0.52. Similar to the individual links in Table 8,
in five of seven links in Table 9 the evidence outweighs the uncertainty and the links are
confirmed. Only one of the seven links changes the dominant sentiment after normalization. One
of the two disconfirmed links in Figure 2 is the “TRPA1 receptor” leading to the sensation of
“cold.” Examining the citances for this link we find statements like “noxious cold activation of
TRPA1 is somewhat controversial,” which perhaps explains why this link is not confirmed.
However, the two disconfirming links were not strong enough to disconfirm the full network.
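To make the idea of logic sampling concrete, the following Python sketch forward-samples a binary version of the network and discards samples that disagree with the observations. It is not the Bnlearn computation used in the paper. It assumes, for illustration only, that each node is binary, that E = P(effect | cause) and U = P(effect | not cause), and that the query is P(N = 1) given that all five sensations are observed. Under these assumptions the result need not reproduce the reported value of 0.54.

```python
import itertools
import random

PRIOR_N = 0.52  # P(N = 1), from Table 9 and Figure 2

# child: (parent, E = P(child=1 | parent=1), U = P(child=1 | parent=0)).
# Parents are listed before their children so forward sampling can use this order.
LINKS = {
    "TRPV1": ("N", 0.46, 0.49),
    "TRPA1": ("N", 0.62, 0.57),
    "capsaicin": ("TRPV1", 0.37, 0.29),
    "heat": ("TRPV1", 0.50, 0.43),
    "acid": ("TRPV1", 0.24, 0.30),
    "cold": ("TRPA1", 0.75, 0.97),
    "mustard_oil": ("TRPA1", 0.58, 0.35),
}
LEAVES = ["capsaicin", "heat", "acid", "cold", "mustard_oil"]


def joint(state: dict) -> float:
    """Chain-rule joint probability of one complete 0/1 assignment."""
    p = PRIOR_N if state["N"] else 1.0 - PRIOR_N
    for child, (parent, e, u) in LINKS.items():
        p_one = e if state[parent] else u
        p *= p_one if state[child] else 1.0 - p_one
    return p


def exact_posterior() -> float:
    """P(N=1 | all leaves = 1), summing the joint over the unobserved nodes."""
    num = den = 0.0
    for n, v, a in itertools.product((0, 1), repeat=3):
        state = {"N": n, "TRPV1": v, "TRPA1": a, **{leaf: 1 for leaf in LEAVES}}
        p = joint(state)
        den += p
        if n:
            num += p
    return num / den


def logic_sampling(n_samples: int, seed: int = 0) -> float:
    """Forward-sample the network and keep only samples in which every leaf is 1."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_samples):
        state = {"N": int(rng.random() < PRIOR_N)}
        for child, (parent, e, u) in LINKS.items():
            state[child] = int(rng.random() < (e if state[parent] else u))
        if all(state[leaf] for leaf in LEAVES):
            kept += 1
            hits += state["N"]
    return hits / kept if kept else float("nan")


if __name__ == "__main__":
    print("exact:  ", round(exact_posterior(), 3))
    print("sampled:", round(logic_sampling(200_000), 3))
```

Most samples are rejected because the observed combination of leaves is improbable, so sampled estimates fluctuate from run to run; this is one reason for averaging over several runs.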

5. DISCUSSION

The next step in this research is to automate the formation of as many causal networks as possible
using the cumulative citances for a cluster of papers. This involves linking up as many causal
word phrase pairs as possible given some threshold or limit on pair frequency. Two main
problems remain to be solved. First, we need a systematic criterion for differentiating which
member of the pair is the cause/theory and which is the effect/evidence. Second, when
computing sentiments, we need to normalize the different presentations of cause-and-effect phrases,
which we have done here based on wildcard searching. But the synonym problem remains to
be addressed. A possible solution to the first problem is to take the more uncertain entity of the
pair as the cause or theory and the more certain entity of the pair as the effect or evidence.

Regarding the measuring of sentiments, there is also the need to expand and sharpen the
lists of evidence and uncertainty cue words. The list of terms denoting evidence was a mix of
words indicating the effort to obtain evidence, such as study or experiment, in addition to
words indicating that supporting evidence was found, such as determined or shown. The
uncertainty words represented only a small sample of possible ways of expressing this
sentiment (Chen & Song, 2018). The normalization procedure of dividing the evidence and
uncertainty rates for cause-effect pairs by paper or cluster baselines may, to some extent,
compensate for the incompleteness of the cue word sets, but results at this stage must be
considered tentative. A related problem is misclassification. The lower precision rates for some
cue words mean that misclassifications will inevitably occur. Another issue is failure to classify,
which is indicated by low recall rates, particularly for uncertainty words. This calls for the
broadening of the uncertainty cue word set.

A question yet to be examined is whether confirmation changes over time, as Chen and
Song have shown for the uncertainty of predications. For some papers we have 18 years of
citing sentences, which could be subdivided by citing years to see if the confirmation status of
a particular cause/effect relation changed from period to period. No doubt slicing the time
periods too narrowly would lead to random fluctuations in the ratio of evidence and uncer-
tainty sentiments. Such a community-based confirmation measure should be more stable than


an individual participant’s perception, which in real time might fluctuate from day to day as
new evidence comes to light.

Another fundamental question relates to how we have used the uncertainty of the theory as
a proxy for the probability of an alternative theory explaining the evidence, P(E|¬T), assuming,
in effect, that uncertainty is due to the existence of alternative or competing theories. This
makes confirmation a balancing act of supporting evidence versus uncertainty. However, it
is important to develop a more direct way of estimating the probability of an alternative theory.

Some perspective is offered by the history of science. In most research programs, the DNA
history included, investigators move from one theory to another, sometimes over a series of
years (Small, 1971). These can be denoted as T1, T2, T3, …, and so on. In the case of DNA,
the Pauling triple helix might be T1 and Watson’s like-with-like base pairing model T2, with T3
their final published model. According to Crick, the debate about whether their model for
DNA was correct continued for nearly 25 years, with a number of alternative models suggested
and rejected (Crick, 1988, p. 73). From a Bayesian perspective, each theory must be evaluated
on its own merits based on its fit with evidence. But precursor theories can serve as alternative
or competing theories, which are needed for Bayes’s theorem to work. P(E|¬T) is, in fact, the
sum over all mutually exclusive alternative theories, published or unpublished, which can have
varying degrees of fit with the evidence. This argues for a nonzero floor or minimum P(E|¬T)
even if T1 is merely an uninformed initial hunch.
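The sense in which this term aggregates over the alternatives can be made explicit with the law of total probability. The symbols A_i are introduced here only for illustration and denote the mutually exclusive alternatives to T, whether published or not.

```latex
P(E \mid \neg T) = \sum_{i} P(E \mid A_i)\, P(A_i \mid \neg T),
\qquad \sum_{i} P(A_i \mid \neg T) = 1
```

Any alternative with nonzero weight and some fit to the evidence therefore contributes a positive amount, which is the basis for the nonzero floor mentioned above.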

In the case of nociception, David Julius in his Nobel lecture (2021) briefly alludes to a com-
peting theory that the capsaicin receptor, rather than being a specific molecular entity that
acted as an ion channel, was due to integrating capsaicin into the cell membrane to form
an ion channel that functioned nonselectively. This set off what he referred to as the “Holy
Grail” of pain research: the search for the molecular capsaicin receptor. Michael Caterina
in Julius’s lab succeeded in cloning genes from neurons and those genes stimulated fibroblast
cell cultures to express the receptor and respond to capsaicin (Caterina et al., 1997). Julius
describes this as a “Eureka moment.”

A 1995 paper describing a competing hypothesis that capsaicin had created the receptor
was found in the STS5-769 direct citation cluster. Moreover, this paper was cited in the 1997
discovery paper (Caterina et al., 1997) as a previously “proposed model,” and by examining its
citances we could perhaps assess its degree of support or uncertainty. This suggests that a good
way to find competing theories is to look at the references made by the discovery team itself,
as social norms call for citing competing theories. Obviously, this approach works only when
the competing theory corresponds to a published paper.

Many writers on science have concluded that discovery in science is spurred by chance
occurrences or serendipity. For example, Francis Crick claimed that Watson’s discovery
of base pairing in DNA was due in part to luck (Crick, 1988, p. 65). Similarly, Hall (1954,
p. 125) stated that Kepler accidentally noticed that an ellipse fitted the orbit of Mars using
Tycho’s observations, and Koestler (1964, p. 112) attributed Pasteur’s discovery of vaccination
for chicken cholera in part to chance. The discovery process may be initiated by a novel
observation (some chickens did not get cholera), an inconsistency in theory (Einstein’s theory of
relativity), or even a dream (Kekulé’s structure of benzene). Whatever inspires the hypothesis,
once it is generated a long process of critical evaluation begins. The evaluation can spur new
experiments or modifications of the theory. The discoverer may only reluctantly ask whether
there are competing theories due to his or her interests in priority. Whether we take the point of
view of the individual scientist or the collective view of a community, the evaluation needs to
look for positive and negative evidence as well as alternative explanations.


The question of time slicing raises an interesting question if we view the discovery and
confirmation process as a series of random events. This contrasts with the empiricist notion
that discovery is a systematic process of working backwards from the evidence to the theory
(Losee, 1972, p. 103; Popper, 1962; Schindler, 2008). Reading Watson’s account of the
discovery of the structure of DNA, we see almost day-to-day swings in confidence as Watson and
Crick are buffeted by incoming evidence and theoretical insights favoring one model or
another. For example, Linus Pauling’s triple helix model is rejected (Watson, 1968, p. 160).
Watson’s own like-to-like base pairing model was rejected because he had used the wrong
tautomeric form for two of the bases, and Crick also objected that it would violate the Chargaff
rules (Olby, 1974, p. 412). The final model of two right-handed helices with unique base
pairings between them satisfied all the objections and fit with the available evidence so well
that Watson proclaimed: “a structure this pretty just had to exist” (Watson, 1968, p. 205). In
Bayesian terms we could ascribe this feeling to a large jump in P(E|T) leading to a similar jump
in P(T|E) versus the prior P(T), where T is the double helix. Likewise, the ups and downs of the
other models could be interpreted as incremental changes in the probabilities P(E|T) or P(E|¬T)
depending on the evidence at hand. The day-to-day swings in confidence experienced by
Watson and Crick are analogous to the precarious balance of supporting evidence and
uncertainty proposed in this paper as expressed by the likelihood ratio.

Whether such a qualitative application of Bayes’s theorem is possible based on historical
examples is beyond the scope of this paper. If we are correct, then Eureka or “aha” moments
are indicators of shifts in the prior vis-à-vis posterior probabilities of a theory. We further assume
that these moments will continue to occur randomly during the extended process of
confirmation, including disappointing moments of disconfirmation. The personal and subjective point of
view of Watson contrasts with the method used in this paper based on citing sentences from a
community of peers. The latter is by contrast a delayed, retrospective reaction. In the long run we
might expect a convergence of opinion between the subjective view of the discoverer and the
collective perspectives of the community. But given the different interests of these parties, it
would not be surprising to see differences. A discoverer who expends considerable effort to
support the validity of a knowledge claim would be expected to take a more sanguine view
of the evidence than a peer group with competing interests in an alternative theory.

6. CONCLUSIONS

This paper proposes a network model of confirmation in science based on cause-and-effect
linkages interpreted as theory and evidence connections. The model is a hybrid citation
and language approach that draws on citing sentences for single papers or clusters of papers.
This combines the capability of citation-based clustering methods to define specialty areas
with the in-depth conceptual-level detail afforded by textual and linguistic methods to identify
cause-effect linkages. The present paper points to the possibility of using Bayes’s rule to
understand the process of confirmation.

The use of citation context sentiments for computing conditional probabilities is attempted for
the first time, but issues remain, particularly regarding the evaluation of competing theories. This
problem might be resolved if competing theories have been published and their citances
analyzed, reducing confirmation to a comparison of sentiments for competing published theories.

It is interesting that Kuhn argued against the Bayesian approach to theory choice, because
he maintained that scientists in historical contexts used a variety of subjective criteria (Kuhn,
1977; Salmon, 1990). For example, he argued that a phlogiston theorist might prefer their
theory over the oxygen theory because it explained the “similarity” of metals, all of which

contained phlogiston. At the same time, there was widespread acceptance of oxygen’s
explanation of the weight gain of calxes. On the other hand, an oxygen theorist might argue that the
similarity of metals was due to the absence of oxygen. A Bayesian might say that these
divergent criteria would have simply offset one another and at worst delayed the decision in favor of
the oxygen theory until further evidence emerged.

The “no miracles” argument, attributed to the realist philosopher Hilary Putnam (1975, p. 73),
says that the striking agreement between theory and evidence sometimes achieved in modern
science would not be possible unless the underlying theory was true (Howson & Urbach, 2006,
p. 26). The Bayesian, on the other hand, would point to the improbability of a close fit between
theory and evidence and the resulting higher probability of the theory being true given the
evidence, but no possibility of absolute truth as long as there are alternative theories. Arthur
Koestler in his classic book The Act of Creation (1964) talks about the “Eureka” moment when
two seemingly unrelated events come together, for which he coins the term “bisociate”: the
transition from thinking something is unlikely to seeing that it works. Such moments occur
when theory closely fits with evidence, for example, when James Watson lines up the molecular
models of the DNA base pairs, or when Caterina and Julius clone the capsaicin receptor.

Assuming “Eureka” moments occur randomly during the course of theory testing means
that conditional probabilities are incremented or decremented as the scientific community
critically examines and refines the fit of the theory and of its competitors with the evidence. Thus,
a theory’s confirmation status will remain in flux for an extended period. Clearly, a community
and citation-based assessment, as we have outlined here, filtered through cool scientific prose,
lacks the emotional impact of the “Eureka” or “aha” moment. A challenge for future research is
to show how the force of a sudden change in a theory’s probability, such as a discovery, is
communicated to the community and reflected in citing sentences.

ACKNOWLEDGMENTS

I would like to thank Nees Van Eck of CWTS and Kevin Boyack of SciTech Strategies, Inc. for
providing citation context and cluster data, Mike Patek of SciTech Strategies for programming,
and Harriet Noble for assistance in citation sentiment coding. Two anonymous referees
provided detailed comments which were very helpful.

COMPETING INTERESTS

The author has no competing interests.

FUNDING INFORMATION

No funding has been received for this research.

DATA AVAILABILITY

Data are available from the author.

REFERENCES

Atkinson, J., & Rivas, A. (2008). Discovering novel causal patterns
from biomedical natural-language texts using Bayesian nets. IEEE
Transactions on Information Technology in Biomedicine, 12(6),
714–722. https://doi.org/10.1109/TITB.2008.920793, PubMed:
19000950

Boyack, K. W., Van Eck, N. J., Colavizza, G., & Waltman, L. (2018).
Characterizing in-text citations in scientific articles: A large-scale
analysis. Journal of Informetrics, 12(1), 59–73.
https://doi.org/10.1016/j.joi.2017.11.005

Bunge, M. (1963). Causality: The place of the causal principle in
modern science. Cleveland: Meridian Books.

Caterina, M. J., Leffler, A., Malmberg, A. B., Martin, W. J., Trafton,
J., … Julius, D. (2000). Impaired nociception and pain sensation
in mice lacking the capsaicin receptor. Science, 288(5464),


306–313. https://doi.org/10.1126/science.288.5464.306,
PubMed: 10764638

Caterina, M. J., Schumacher, M. A., Tominaga, M., Rosen, T. A., Levine,
J. D., & Julius, D. (1997). The capsaicin receptor: A heat-activated
ion channel in the pain pathway. Nature, 389(6653), 816–827.
https://doi.org/10.1038/39807, PubMed: 9349813

Chen, C., & Song, M. (2018). Representing scientific knowledge:
The role of uncertainty. London: Springer.
https://doi.org/10.1007/978-3-319-62543-0

Chen, C., Song, M., & Heo, G. E. (2018). A scalable and adaptive
method for finding semantically equivalent cue words of
uncertainty. Journal of Informetrics, 12(1), 158–180.
https://doi.org/10.1016/j.joi.2017.12.004

Cohen, J. (1968). Weighted kappa: Nominal scale agreement with
provision for scale disagreement or partial credit. Psychological
Bulletin, 70, 213–220. https://doi.org/10.1037/h0026256,
PubMed: 19673146

Cold fusion. (2021, December 10). In Wikipedia.
https://en.wikipedia.org/wiki/Cold_fusion.

Crick, F. (1988). This mad pursuit: A personal view of scientific
discovery. New York: Basic Books.

Findler, N. V., & Bickmore, T. (1996). On the concept of causality
and a causal modeling system for scientific and engineering
domains, CAMUS. Applied Artificial Intelligence, 10(5),
455–487. https://doi.org/10.1080/088395196118506

Glymour, C. (1980). Theory and evidence. Princeton, NJ: Princeton
University Press.

Greenberg, S. A. (2009). How citation distortions create unfounded
authority: Analysis of a citation network. British Medical Journal,
339, b2680. https://doi.org/10.1136/bmj.b2680, PubMed:
19622839

Hall, A. R. (1954). The scientific revolution 1500–1800: The
formation of the modern scientific attitude (2nd edn). Boston: Beacon
Press.

Hanson, N. R. (1972). Patterns of discovery: An inquiry into the
conceptual foundations of science. Cambridge: Cambridge
University Press.

Howson, C., & Urbach, P. (2006). Scientific reasoning: The
Bayesian approach. Chicago: Open Court Publishing Co.

Hyland, K. (2004). Disciplinary discourses: Social interactions in
academic writing. Ann Arbor: The University of Michigan Press.

Ihde, A. J. (1964). The development of modern chemistry (Chapter 3).
New York: Harper & Row.

Julius, D. (2021, December 7). From peppers to peppermints:
Insights into thermosensation and pain.
https://www.nobelprize.org/prizes/medicine/2021/julius/lecture/

Kilicoglu, H., Shin, D., Fiszman, M., Rosemblat, G., & Rindflesch,
T. C. (2012). SemMedDB: A PubMed-scale repository of
biomedical semantic predications. Bioinformatics Applications Note,
28(23), 3158–3160. https://doi.org/10.1093/bioinformatics/bts591,
PubMed: 23044550

Klavans, R., Boyack, K. W., & Murdick, D. A. (2020). A novel
approach to predicting exceptional growth in research. PLOS
ONE, 15(9), e0239177. https://doi.org/10.1371/journal.pone.0239177,
PubMed: 32931500

Koestler, A. (1964). The act of creation. London: Penguin.

Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago:
University of Chicago Press.

Kuhn, T. S. (1977). Objectivity, value judgment and theory choice.
In The essential tension (pp. 320–339). Chicago: University of
Chicago Press.

Larmers, W. S., Boyack, K., Larivière, V., Sugimoto, C. R., van Eck,
N. J., … Murray, D. (2021). Investigating disagreement in the

scientific literature. eLife, 10, e72737. https://doi.org/10.7554
/eLife.72737, PubMed: 34951588

Li, Z., Li, Q., Zou, X., & Ren, J. (2021UN). Causal extraction based on
self-attentive BiLSTM-CRF with transferred embeddings. Neuro-
computing, 423, 207–219. https://doi.org/10.1016/j.neucom
.2020.08.078

Li, X., Peng, S., & Du, J. (2021B). Towards medical knowmetrics:
Representing and computing medical knowledge using semantic
predications as the knowledge unit and the uncertainty as the
knowledge context. Scientometrics, 126, 6225–6251. https://
doi.org/10.1007/s11192-021-03880-8, PubMed: 33612884

Losee, J. (1972). A historical introduction to the philosophy of

science. London: Oxford University Press.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013).
Distributed representations of words and phrases and their com-
positionality. Proceedings of the 26th International Conference
on Neural Information Processing Systems (pag. 3111–3119).
Moayedi, M., & Davis, K. D. (2013). Theories of pain: from speci-
ficity to gate control. Neurophysiology, 109(1), 5–12. https://doi
.org/10.1152/jn.00457.2012, PubMed: 23034364

Nagarajan, R., Scutari, M., & Lebre, S. (2013). Bayesian networks in
R with applications in systems biology. New York: Springer.
https://doi.org/10.1007/978-1-4614-6446-4

Nakov, P., Schwartz, A., & Hearst, M. (2004). Citances: Citation
sentences for semantic analysis of bioscience text. SIGIR Work-
shop of Search and Discovery on Bioinformatics.

Nicholson, J. M., Mordaunt, M., Lopez, P., Uppala, A., Rosati, D., …
Rife, S. C. (2021). scite: A smart citation index that displays the
context of citations and classifies their intent using deep learning.
Quantitative Science Studies, 2(3), 882–898. https://doi.org/10
.1162/qss_a_00146

Olby, R. (1974). The path to the double helix. Seattle: University of

Washington Press.

Pearl, J. (2000). Causality: Modelli, reasoning, and inference.

Cambridge: Cambridge University Press.

Pearl, J., & Mackenzie, D. (2018). The book of why. New York:

Basic Books.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., …
Duchesnay, É. (2011). Scikit-learn: Machine learning in Python.
Journal of Machine Learning Research, 12, 2825–2830.

Popper, K. R. (1962). Conjectures and refutations: The growth of

scientific knowledge (Chapter 8). New York: Basic Books.

Putnam, H. (1975). Collected papers: Mathematics, matter and

method (Vol. 1). Cambridge: Cambridge University Press.

Rindflesch, T. C., Kilicoglu, H., Fiszman, M., Rosemblat, G., &
Shin, D. (2011) Semantic MEDLINE: An advanced information
management application for biomedicine. Information Services
& Use, 31, 15–21. https://doi.org/10.3233/ISU-2011-0627

Salmon, W. C. (1990). Rationality and objectivity in science, or,
Tom Kuhn meets Tom Bayes. Minneapolis: University of Minnesota
Press. Retrieved from the University of Minnesota Digital
Conservancy: https://hdl.handle.net/11299/185726

Schindler, S. (2008). Model, theory and evidence in the discovery
of the DNA structure. British Journal for the Philosophy of
Science, 59(4), 619–658. https://doi.org/10.1093/bjps/axn030

Small, H. (1971). The helium atom in the old quantum theory (doctoral
dissertation). University of Wisconsin, ProQuest #7125217.

Small, H. (1978). Cited documents as concept symbols. Social
Studies of Science, 8, 327–340.
https://doi.org/10.1177/030631277800800305

Small, H. (2020). Past as prologue: Approaches to the study of
confirmation in science. Quantitative Science Studies, 1(3),
1025–1040. https://doi.org/10.1162/qss_a_00063

Small, H. (2021). From citing sentences to causal networks:
The causality index. In W. Glanzel, S. Heefer, P.-S. Chi, & R.
Rousseau (Eds.), Proceedings of the 18th Conference on
Scientometrics and Informetrics: ISSI2021 (pp. 1039–1044).

Small, H., Tseng, H., & Patek, M. (2017). Discovering discoveries:
Identifying biomedical discoveries using citation contexts.
Journal of Informetrics, 11, 46–62. https://doi.org/10.1016/j.joi
.2016.11.001

Sobrino, A., Olivas, J. A., & Puente, C. (2010). Causality and
imperfect causality from texts: A frame for causality in social sciences.
International Conference on Fuzzy Systems (pp. 1–8). Barcelona:
IEEE. https://doi.org/10.1109/FUZZY.2010.5584863

Swanson, D. R. (1986). Undiscovered public knowledge. Library
Quarterly, 56(2), 103–118. https://doi.org/10.1086/601720

Thagard, P. (1992). Conceptual revolutions. Princeton, NJ: Prince-
ton University Press. https://doi.org/10.1515/9780691186672

Thilakaratne, M., Falkner, K., & Atapattu, T. (2019). A systematic
review on literature-based discovery: General overview, method-
ology, & statistical analysis. ACM Computing Surveys, 52(6),
Article 129. https://doi.org/10.1145/3365756

Traag, V. A., Waltman, L., & Van Eck, N.-J. (2019). From Louvain to
Leiden: Guaranteeing well-connected communities. Scientific
Reports, 9, 5233. https://doi.org/10.1038/s41598-019-41695-z,
PubMed: 30914743

Trieu, H.-L., Tran, T. T., Duong, K. N. A., Nguyen, A., Miwa, M., &
Ananiadou, S. (2020). DeepEventMine: End-to-end neural nested
event extraction from biomedical texts. Bioinformatics, 36(19),
4910–4917. https://doi.org/10.1093/bioinformatics/btaa540,
PubMed: 33141147

Watson, J. D. (1968). The double helix: A personal account of the
discovery of the structure of DNA. New York: Atheneum. https://
doi.org/10.1063/1.3035117
