RESEARCH ARTICLE
Visualizing academic descendants using modified
Pavlo diagrams: Results based on five researchers
in biomechanics and biomedicine
an open access journal
Bharti School of Engineering & Computer Science, Laurentian University, Sudbury, Ontario, Canada
W. Brent Lievers
Keywords: academic genealogy, doctoral descendants, mentorship indices, visualization
ABSTRACT
Visualizing the academic descendants of prolific researchers is a challenging problem. To this
end, a modified Pavlo algorithm is presented and its utility is demonstrated using manually
collected academic genealogies of five researchers in biomechanics and biomedicine. The
researchers have 15–32 children each and between 93 and 384 total descendants. The graphs
generated by the modified algorithm were over 97% smaller than those generated by the original.
Mentorship metrics were also calculated; the researchers’ hm-indices are 5–7 and their gm-indices
are in the range 7–13. Of the 1,096 unique researchers across the five family trees, 153 (14%) had
graduated their own PhD students by the end of 2021. It took an average of 9.6 years after their own
graduation for an advisor to graduate their first PhD student, which suggests that an academic
generation in this field is approximately one decade. The manually collected data sets were
also compared against the crowd-sourced academic genealogy data from the
AcademicTree.org website. The latter included only 45% of the people and 34% of the
connections, so this limitation must be considered when using it for analyses where
completeness is required. The data sets and an implementation of the algorithm are available
for reuse.
1. INTRODUCTION
Mentorship is a foundational component of academia. Although it can take different forms,
many of which are unofficial and uncredited, the formal mentoring relationship between a
doctoral student and their advisor(s)¹ is arguably the most important. It has certainly been
the subject of a great deal of research, most of which can be divided into one of two categories.
One approach is to focus on the student side of the advisor–advisee relationship. For
example, various studies have examined the effects that advisors can have on a student’s mental
health (Levecque, Anseel et al., 2017; Mackie & Bates, 2019), their productivity (García-Suaza,
Otero, & Winkelmann, 2020), and their career outcomes (Gaule & Piacentini, 2018;
Malmgren, Ottino, & Nunes Amaral, 2010).
Another approach is to consider what these relationships reveal about the advisor. To this
end, various ways of quantifying the mentoring productivity—or in biological terms, the
1 The specific titles applied to this role vary by jurisdiction and institution (e.g., advisor, chair, director,
supervisor), but the term advisor will be used throughout this paper for consistency.
Citation: Lievers, W. B. (2022).
Visualizing academic descendants
using modified Pavlo diagrams:
Results based on five researchers in
biomechanics and biomedicine.
Quantitative Science Studies, 3(3),
489–511. https://doi.org/10.1162/qss_a_00205
DOI:
https://doi.org/10.1162/qss_a_00205
Peer Review:
https://publons.com/publon/10.1162/qss_a_0025
Received: 28 March 2022
Accepted: 25 July 2022
Corresponding Author:
W. Brent Lievers
blievers@laurentian.ca
Handling Editor:
Ludo Waltman
Copyright: © 2022 W. Brent Lievers.
Published under a Creative Commons
Attribution 4.0 International (CC BY 4.0)
license.
The MIT Press
Downloaded from http://direct.mit.edu/qss/article-pdf/3/3/489/2057854/qss_a_00205.pdf by guest on 07 September 2023
Visualizing academic descendants using modified Pavlo diagrams
fecundity—of a researcher have also been proposed. One obvious metric is simply to count a
researcher’s direct descendants or children; that is, those students whom a researcher has advised
or coadvised. This counting can also be extended over multiple generations to sum a
researcher’s descendants (i.e., children, grandchildren, great-grandchildren, etc.): all those
who can trace their advisors’ lineage back to the original researcher. Recently, some have
drawn inspiration from publishing metrics such as the h-index (Hirsch, 2005) and g-index
(Egghe, 2006) as alternative ways to assess fecundity. Their mentoring equivalents, the hm-index
(Rossi, Damaceno et al., 2018) and gm-index (Sanyal, Dey, & Das, 2020), attempt to quantify
mentorship by considering the first two generations of descendants.
Besides these quantitative approaches, more qualitative analyses have also been
performed. Academic genealogies have been assembled for nations (Damaceno, Rossi et al.,
2019), fields (Kelly & Sussman, 2007; Russell & Sugimoto, 2009), journals (Mitchell, 1992;
Montoye & Washburn, 1980), and individual researchers (Bennett & Lowe, 2005; Lv & Chang,
2021). These family trees highlight the mentoring relationships that exist among researchers
and can provide insight into a researcher’s influence on a field.
Despite this interest, visualizing networks of descendants remains a challenge for prolific
researchers. A common approach (Rutter, VanderPlas et al., 2019) is to use a typical family tree
representation such as the one shown in Figure 1. Each node in the graph represents an
individual and each edge represents an advisor–advisee relationship. Although intuitive, this
approach is unsuitable for large numbers of descendants because the aspect ratio of the graph
is determined by the number of generations (height in Figure 1) and the number of individuals
in each generation (width). As the number of descendants grows, the aspect ratio becomes
more extreme, making it more difficult to understand the overall topology of the network.
Various forms of radial or circular graphs have been proposed as alternatives with less extreme
aspect ratios (Arce-Orozco, Camacho-Valerio, & Madrigal-Quesada, 2017; Grivet, Auber
et al., 2006; Huang, Le et al., 2020). A related challenge of the family tree is that packing
nodes more tightly to address aspect ratio issues makes it more difficult to distinguish who
was advised by whom.
The radial layout algorithm proposed by Pavlo, Homan, and Schull (2006) shows potential
for academic genealogies (Figure 2) because the distinction between the descendants of
different children is clear. Each node is surrounded by a containment circle around which the
child nodes are placed. The root node uses the entire circle and intermediate nodes use only
an outward portion of the circle, the containment arc, which is bounded by straight lines.
Unfortunately, in the original algorithm proposed by Pavlo et al. (2006), the size of each
containment circle is determined by its parent and the number of siblings. As pointed out by
Huang et al. (2020), this approach results in the outermost descendants becoming smaller
and smaller as the number of generations increases. If a suitably large initial size is not chosen,
the outermost children can become unreadably small. A second issue is that equivalent
subtrees have different sizes and shapes depending on the generation in which they occur and the
number of siblings they have. However, some modifications to the existing algorithm could
Figure 1. Example of an academic genealogy represented in a traditional family tree format.
Quantitative Science Studies
490
Figure 2. The same academic genealogy information from Figure 1 presented using a traditional
Pavlo diagram. The grey circles are the containment circles, shown to illustrate how the nodes are
placed. The grey lines within the circles demarcate the containment arc, the portion of the circle where
nodes can be placed. The size and shape of the graph are determined by two user-selected
parameters: r, the radius of the containment circle for the root node, and ϕ, the included angle of the
containment arc.
eliminate these shortcomings and make the resulting diagrams more compact and more usable
for academic genealogies.
A challenge related to evaluating visualization methods is that the data used for assessment
are often artificially generated and simplified compared to their real-world equivalents. Real
data are preferred because they contain the characteristic features that can reveal an
algorithm’s shortcomings for the intended application. For example, academic genealogies are
highly asymmetric; successful researchers may have many doctoral students, but only a
fraction of those will go on to have PhD students themselves. The fecundity of those students will
also vary dramatically, both as a result of their individual careers and due to birth-order
effects. We think of a human generation in terms of the 20–30 years needed for a child to be
born, mature, and then reproduce. Because an equivalent (albeit shorter) time is needed for
academic reproduction, a researcher’s first few doctoral descendants will have had longer to
reproduce, and will likely have more descendants, than those who graduated near the end of
the researcher’s career. Finally, a student may also have multiple coadvisors, and these
relationships must be represented clearly. These characteristics underscore the need for
comprehensive data to test the usability of different visualization methods. Although public
sources of academic genealogy data are available, the quality of these data remains
unclear and must be assessed.
The goals of the current work are threefold. The first is to assemble data sets of academic
descendants for five biomechanical/biomedical researchers that are as comprehensive
as possible and that demonstrate a variety of shapes and sizes. These data, which will
be limited to doctoral advisor–advisee relationships, will be made available for future
visualization studies (see Data Availability). The second goal is to introduce an improved
version of the Pavlo visualization algorithm and demonstrate its suitability for displaying
academic genealogies using the collected data sets. Finally, the third goal is to analyze the
data sets to quantify the fecundity of the five researchers, calculate the time necessary for
someone to graduate their first PhD student, and assess the coverage of a particular online
repository of academic genealogy data (Academic Family Tree, n.d.). Completing these
three goals will help further the study of mentorship within academia, particularly within
the fields of biomechanics and biomedicine.
2. METHODS
2.1. Data Collection
Academic genealogy data are spread across a number of sources, including public databases
such as Academic Family Tree (n.d.) and the Mathematics Genealogy Project (n.d.),
commercial databases such as ProQuest, and university dissertation repositories, as well as the
personal websites and online CVs of individual researchers. None of these sources is necessarily
comprehensive, correct, or current. Yet assembling representative genealogies, ones that have
characteristic sizes and shapes, is critical to evaluating the efficacy and robustness of a
visualization algorithm.
Academic descendant data were collected for five researchers from the fields of
biomechanics and biomedicine: Steven A. Goldstein, Wilson C. Hayes, Van C. Mow, Lawrence E.
Thibault, and Ronald F. Zernicke. Each researcher received their doctoral degree between
1960 and 1980, which is long enough ago to have multiple generations of academic
descendants but also recent enough to ensure that most immediate descendants can still be
contacted. The five were also chosen to ensure a range of sizes and shapes in their academic
trees. Beyond these selection criteria, the individual researchers represent a convenience
sample. It should also be noted that all five obtained their degrees in the United States.
Although they or their descendants have graduated students at institutions around the
world, the vast majority of the researchers in these trees completed their degrees in North
America. Therefore, the sizes and shapes of the genealogies may not be representative of
those in other countries.
Information gathered from AcademicTree.org and ProQuest was first consolidated.
These data were expanded using public information on researchers’ websites, CVs available
online, and information in institutional dissertation repositories. When the full text of a
dissertation was available electronically, the data collected were validated against the
information presented on the title page or in the acknowledgments section. Finally, individual
researchers who had a current or past academic appointment were also contacted via email
to confirm existing information and request any missing information. Unfortunately, this was
not always possible (e.g., because of retirement, death, lack of contact information, or no response).
Descendants were limited to doctoral students for the purpose of this study. This narrow
scope was adopted because a doctoral degree is typically required to advise graduate students,
which makes holders of these degrees the most likely to reproduce. Master’s theses were excluded
because they receive less coverage in databases, which makes them more difficult to track.
Postdoctoral supervision was also excluded because it would require confirmation from one of
the parties involved; it does not generate a single dissertation-like document that allows for
independent verification. Therefore, limiting the scope to PhDs greatly simplified data
collection. Finally, any terminal research degree that included a written thesis or dissertation was
included regardless of its name (e.g., PhD, DSc, ScD, DEng, MD).
The following information was collected about each descendant: the person’s name, the
institution from which their doctoral degree was obtained, the year of completion, and the
names of all advisors or coadvisors. These data were deemed by our institutional Research
Ethics Board (REB) to be public information not requiring formal consent forms for
collection. However, when individuals were contacted via email, they were informed that the
assembled data would be made publicly available and were given the opportunity to raise
concerns. None of the respondents did so. Any students who had successfully defended by
the end of 2021 were included. Data entry errors or discrepancies were occasionally
uncovered. When conflicts arose, information obtained from sources more closely associated with
the individual (i.e., dissertation documents, lab websites, online CVs, or email) was deemed
more authoritative.
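The conflict-resolution rule can be sketched as a simple priority lookup. This is a hypothetical illustration, not the procedure actually used to build the data sets: the source labels, the `SOURCE_RANK` table, and the `resolve` helper are assumptions. The text only establishes that sources closely associated with the individual outrank the aggregated databases, so the ordering within each group is assumed.

```python
# Hypothetical source labels and ranks (lower = more authoritative).
# Per the text, sources closely associated with the individual outrank the
# aggregated databases; the ranking within each group is an assumption.
SOURCE_RANK = {
    "dissertation": 0,
    "lab_website": 1,
    "online_cv": 1,
    "email": 1,
    "proquest": 2,
    "academictree": 2,
}


def resolve(reports):
    """Return the value reported by the most authoritative source.

    `reports` is a list of (source, value) pairs for one field of one
    person, e.g. their year of completion. Ties keep the earliest report.
    """
    _, value = min(reports, key=lambda report: SOURCE_RANK[report[0]])
    return value


year = resolve([("academictree", 1998), ("proquest", 1998), ("dissertation", 1997)])
print(year)  # 1997
```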
It should also be noted that, while generally straightforward, identifying who should be
recognized as an “advisor” can at times be difficult. For example, when an advisor
moves to another institution, students still enrolled at the original institution may require a
local supervisor for administrative purposes. Although this person may be listed as the primary
advisor, they may not actually perform any of the associated duties. Conversely, others may be
actively providing mentorship and support to a doctoral student, yet not receive formal
recognition as an advisor or coadvisor. For the purposes of the data collected, we have tried to limit
“advisors” to those who both received formal recognition for that role and were not purely
administrative. However, when direct communication with the advisors and students
was not possible, or no reply was received, we had to proceed with the best information
available.
The assembled data sets are available for reuse (see Data Availability) as comma-separated
value (.csv) and extensible markup language (.xml) files. Although every effort was made to
ensure they were complete through to the end of 2021, they are acknowledged to be
imperfect. In addition, they will quickly become outdated as new descendants are added over time.
Nevertheless, they are the most comprehensive sets of data for these five researchers at the
time of writing.
2.2. Modified Pavlo Algorithm
The proposed visualization algorithm is based on the work of Pavlo et al. (2006). As shown in
Figure 2, the original algorithm starts with a root node surrounded by a containment circle
with a radius r. The value of r is a user-specified parameter and determines the subsequent
size and spacing of all other nodes. The child nodes are then equally spaced around the
perimeter of the root’s containment circle and given their own containment circles, whose radii are
determined by geometry. The next set of nodes are then equally spaced around the
containment arc, a portion of the containment circle prescribed by the user-selected angle ϕ. The
process continues recursively until all nodes are processed.
As highlighted by Huang et al. (2020), a fundamental problem with the root-outward
approach of the original algorithm is that the nodes and containment circles for each
subsequent generation become progressively smaller. A very large value for r must be selected to
ensure that there is adequate spacing between the outermost nodes, and this value is not
known a priori. A second issue is that individuals with equivalent numbers of descendants will
not be represented by equivalently sized containment circles if they occur in different
generations or have different numbers of siblings (see Figure 2). This phenomenon violates a
common aesthetic principle for graph drawing, which holds that “a sub-tree should be drawn the
same way regardless of where it occurs in the tree” (Reingold & Tilford, 1981). More
importantly, it also results in a larger graph than is necessary because excess space is used for
some nodes, particularly those without children. Therefore, the overall objectives of this
revised algorithm are to ensure that equivalent nodes and subtrees are drawn consistently
and that the entire graph uses less space than with the original algorithm.
The modified Pavlo algorithm is presented in detail in the following subsections. This
process consists of two main steps: determining the size of the containment circle for each
node, and determining the orientation of each node. A third, optional step is also
presented that assigns unique node and edge colors based on a hue-saturation-lightness (HSL)
color wheel.
Python implementations of the original and modified Pavlo algorithms have been made
available for reuse. Consult the Data Availability section for more information.
2.2.1. Determining the node containment circle sizes
A major change to the algorithm is the order in which the sizes of the containment circles are
calculated. The original algorithm relied on a root-outward approach. The user would select
the radius, r, of the containment circle for the root node, and the sizes of all the subsequent
circles were determined recursively based on r and the number of children in each generation.
Unfortunately, this method requires an interactive selection of r to ensure some minimum
spacing between nodes. The modified algorithm employs a periphery-inward approach. A
minimum size for the containment circles is specified for childless nodes and the subsequent
Figure 3. Size and layout parameters for (a) terminal nodes [red], (b) intermediate nodes [green], and (c) the root node [blue]. The
genealogical information in (c) is the same as in Figures 1 and 2.
size calculations proceed recursively inward toward the root node. These changes ensure that
a minimum spacing is maintained between nodes, ensure equivalent nodes and subtrees are
drawn consistently, and pack the nodes together more tightly to reduce the total area of the
graph.
As illustrated in Figure 3, there are three node scenarios that must be considered.
The most common are what we will refer to as terminal nodes because they have no children
(Figure 3(a)). Each node has a containment circle, which is used to place it relative to its sibling
nodes. In this case, the radius of the containment circle, ri, is given by
$$r_i = \frac{1}{2}\left(d_i + g\right) \tag{1}$$
where di is the diameter of the node, and g is a user-specified parameter that prescribes the
minimum gap between nodes. The ability to prescribe a minimum gap is an important
improvement over the original Pavlo algorithm and eliminates the need to vary the radius
of the root containment circle, r, to achieve the desired spacing.
We will assume that the diameter of the node, di, is related to the total number of
descendants for that node, Ni, by
$$d_i = \sqrt{N_i + 1} \tag{2}$$
The term descendant refers to an individual from any subsequent generation (e.g., children,
grandchildren, great-grandchildren) who traces their lineage to node i. By definition, terminal
nodes have zero descendants (Ni = 0), which means they have a diameter of di = 1.
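Eqs. 1 and 2 can be illustrated with a short Python sketch. It assumes the genealogy is stored as a hypothetical `children` dictionary in which each student appears under exactly one advisor (coadvisor links excluded), so that no descendant is counted twice. The function names are illustrative and are not taken from the published implementation.

```python
import math


def count_descendants(children, node):
    """N_i: number of descendants of `node` across all later generations.

    `children` maps each person to the students placed under them; each
    student is assumed to appear under exactly one advisor.
    """
    return sum(1 + count_descendants(children, kid) for kid in children.get(node, []))


def node_diameter(n_descendants):
    """Eq. 2: d_i = sqrt(N_i + 1), so a terminal node has d_i = 1."""
    return math.sqrt(n_descendants + 1)


def terminal_radius(d_i, g):
    """Eq. 1: containment radius of a childless node, r_i = (d_i + g) / 2."""
    return 0.5 * (d_i + g)


# Hypothetical genealogy: A advised B and C; B advised D.
tree = {"A": ["B", "C"], "B": ["D"]}
n_a = count_descendants(tree, "A")
print(n_a, node_diameter(n_a))                    # 3 2.0
print(terminal_radius(node_diameter(0), g=1.0))   # 1.0
```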
Intermediate nodes (Figure 3(b)) are those with both parents and children. Similar to Pavlo
et al. (2006), the child nodes are spaced around a containment arc of the circle prescribed by
the angle ϕ, a second user-specified parameter. The length of a continuous containment arc,
Li, is given by
$$L_i = \phi r_i \tag{3}$$
but based on a finite number of child nodes, ni, packed together along the arc, it can be
discretized into a series of line segments corresponding to the radii, rj, of the children’s
containment circles. The segmental length of the arc is then given by
$$L_i = 2\sum_{j=1}^{n_i} r_j. \tag{4}$$
Using the cosine rule, we know that the radius of the node’s containment circle, ri, is related to
the radius of a child’s containment circle, rj, via
$$r_j^2 = r_i^2 + r_i^2 - 2r_i^2\cos\alpha_j = 2r_i^2\left(1 - \cos\alpha_j\right), \tag{5}$$
where αj is half the angle occupied by child j on the parent’s containment circle. We can rearrange this as
$$\alpha_j = \arccos\left(1 - \frac{1}{2}\frac{r_j^2}{r_i^2}\right). \tag{6}$$
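A minimal sketch of Eq. 6, assuming radii are plain floats. It also checks Eq. 5 numerically and shows the equivalent half-chord form αj = 2 arcsin(rj/(2ri)), which follows from 1 − cos α = 2 sin²(α/2). The helper name `half_angle` is hypothetical. Note that Eq. 6 only has a real solution when ri ≥ rj/2.

```python
import math


def half_angle(r_j, r_i):
    """Eq. 6: alpha_j = arccos(1 - r_j**2 / (2 * r_i**2)).

    Requires r_i >= r_j / 2; otherwise no chord of length r_j fits on the
    parent's circle. Tiny round-off outside [-1, 1] is clamped.
    """
    arg = 1.0 - 0.5 * (r_j / r_i) ** 2
    if arg < -1.0 - 1e-12:
        raise ValueError("need r_i >= r_j / 2")
    return math.acos(max(-1.0, min(1.0, arg)))


r_i, r_j = 3.0, 1.0
alpha = half_angle(r_j, r_i)
# Eq. 5 read back: the chord subtending alpha_j has length r_j.
chord = math.sqrt(2.0 * r_i**2 * (1.0 - math.cos(alpha)))
print(round(chord, 9))                                           # 1.0
# Same angle via the half-chord identity sin(alpha/2) = r_j / (2 r_i).
print(math.isclose(alpha, 2.0 * math.asin(r_j / (2.0 * r_i))))   # True
```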
The minimum value of ri that ensures all children are optimally packed is determined by
solving the equation
$$\left|\phi - 2\sum_{j=1}^{n_i}\alpha_j\right| = 0. \tag{7}$$
A numerical approach must be used to solve this equation for ri, as no closed-form solution exists.
Rearranging Eq. 3, with Li taken from Eq. 4, gives an initial estimate of ri = Li/ϕ.
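Because f(ri) = ϕ − 2Σαj increases monotonically with ri (every αj shrinks as the parent circle grows), a bracketing method such as bisection is a simple, robust way to find the root of Eq. 7. The sketch below is one possible approach, not necessarily the solver used in the published implementation; the names `solve_radius` and `half_angle` and the tolerance are assumptions. The initial estimate Li/ϕ serves as the lower end of the bracket: since each chord is shorter than its arc, 2Σαj ≥ Li/ri, so the estimate never exceeds the root.

```python
import math


def half_angle(r_j, r_i):
    """Eq. 6, clamped against floating-point round-off."""
    arg = 1.0 - 0.5 * (r_j / r_i) ** 2
    return math.acos(max(-1.0, min(1.0, arg)))


def solve_radius(child_radii, arc, rel_tol=1e-12):
    """Solve arc - 2*sum(alpha_j) = 0 for r_i by bisection (Eq. 7, arc = phi).

    f(r) = arc - 2*sum(alpha_j(r)) increases with r, so any bracket with
    f(lo) <= 0 < f(hi) contains the single root. `child_radii` must be
    non-empty and arc must not exceed 2*pi.
    """
    def f(r):
        return arc - 2.0 * sum(half_angle(rj, r) for rj in child_radii)

    arc_length = 2.0 * sum(child_radii)           # Eq. 4
    # Initial estimate L_i / phi (Eq. 3) never overshoots the root;
    # r >= max(r_j) / 2 keeps Eq. 6 well defined.
    lo = max(arc_length / arc, max(child_radii) / 2.0)
    hi = 2.0 * lo
    while f(hi) <= 0.0:                           # expand until bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return hi


# Check against a closed form: n equal children filling the arc satisfy
# r_i = r_j / (2 sin(arc / (4 n))).
n, r_child, phi = 6, 1.0, math.pi
r_i = solve_radius([r_child] * n, phi)
exact = r_child / (2.0 * math.sin(phi / (4 * n)))
print(math.isclose(r_i, exact, rel_tol=1e-9))   # True
```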
There are certain scenarios where it is mathematically possible for a parent node to
have a smaller containment circle radius than its descendants, such as when an intermediate
node has only one child. This problem is compounded when a chain of nodes with a single
child occurs. To avoid this issue, we define a minimum size for the containment circle, rmin,
given by
$$r_{\min} = g + \frac{1}{2}\left(d_i + \max_j d_j\right), \tag{8}$$
which ensures the minimum gap size is maintained between the parent and the largest child.
We set the containment circle radius, ri, to be the maximum of the two values given by Eqs. 7
and 8. In such cases, the arc will be larger than necessary for the children, as for the
intermediate nodes with one and two children shown in Figure 3(b).
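The effect of Eq. 8 is easiest to see for a node with a single terminal child. A minimal sketch, assuming g = 1 and ϕ = π purely for illustration: with one child, Eq. 7 reduces to 2αj = ϕ, and combining this with Eq. 6 gives the closed form ri = rj/(2 sin(ϕ/4)). The function names are hypothetical.

```python
import math


def min_parent_radius(d_i, child_diameters, g):
    """Eq. 8: r_min = g + (d_i + max(d_j)) / 2.

    Keeps a gap of at least g between the parent node and its largest child.
    """
    return g + 0.5 * (d_i + max(child_diameters))


def single_child_eq7_radius(r_j, phi):
    """Root of Eq. 7 for exactly one child: 2*alpha_j = phi, and Eq. 6
    then gives r_i = r_j / (2 sin(phi / 4))."""
    return r_j / (2.0 * math.sin(phi / 4.0))


g, phi = 1.0, math.pi            # illustrative parameter values
d_child = 1.0                    # terminal child, N = 0 (Eq. 2)
r_child = 0.5 * (d_child + g)    # Eq. 1 -> 1.0
d_parent = math.sqrt(1 + 1)      # parent with one descendant (Eq. 2)

r_eq7 = single_child_eq7_radius(r_child, phi)       # smaller than r_child
r_min = min_parent_radius(d_parent, [d_child], g)
r_i = max(r_eq7, r_min)                              # Eq. 8 governs here
print(f"{r_eq7:.3f} {r_min:.3f} {r_i:.3f}")          # 0.707 2.207 2.207
```

Without Eq. 8, the parent's containment circle would be smaller than its own child's, which is exactly the single-child problem described above.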
The size of the root node (Figure 3(c)) is calculated in a similar manner to that of the intermediate
nodes, except that the entire circumference of the containment circle can be used. Therefore,
we determine ri by finding a solution to
$$\left|2\pi - 2\sum_{j=1}^{n_i}\alpha_j\right| = 0. \tag{9}$$
Again, ri is set to the maximum of the values given by Eqs. 8 and 9 to avoid problems with small numbers of
children.
Because a node’s containment circle depends on all its descendants, the
size calculations must be performed beginning with the terminal nodes and ending with
the root node. Reversing the order of size calculations from root-outward to
periphery-inward results in much more compact graphs and is a major improvement to the original
Pavlo algorithm.
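The periphery-inward ordering amounts to a post-order traversal: each subtree is sized before its parent. The sketch below combines Eqs. 1, 2, and 6 to 9 under stated assumptions: a hypothetical `children` dictionary with one parent per student, a bisection solver for Eqs. 7 and 9, and illustrative function names that are not taken from the published implementation.

```python
import math


def half_angle(r_j, r_i):
    """Eq. 6, clamped against floating-point round-off."""
    arg = 1.0 - 0.5 * (r_j / r_i) ** 2
    return math.acos(max(-1.0, min(1.0, arg)))


def solve_radius(child_radii, arc):
    """Root of arc - 2*sum(alpha_j) = 0 (Eq. 7 with phi, Eq. 9 with 2*pi)."""
    def f(r):
        return arc - 2.0 * sum(half_angle(rj, r) for rj in child_radii)

    lo = max(2.0 * sum(child_radii) / arc, max(child_radii) / 2.0)
    hi = 2.0 * lo
    while f(hi) <= 0.0:
        lo, hi = hi, 2.0 * hi
    for _ in range(100):                 # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return hi


def size_tree(children, root, g, phi):
    """Containment radius r and node diameter d for every node.

    Post-order traversal: every subtree is sized before its parent, i.e.
    the calculation runs from the periphery inward to the root.
    """
    r, d, n_desc = {}, {}, {}

    def visit(node, is_root):
        kids = children.get(node, [])
        for kid in kids:
            visit(kid, False)
        n_desc[node] = sum(1 + n_desc[k] for k in kids)
        d[node] = math.sqrt(n_desc[node] + 1)                    # Eq. 2
        if not kids:
            r[node] = 0.5 * (d[node] + g)                        # Eq. 1
            return
        arc = 2.0 * math.pi if is_root else phi                  # Eq. 9 / Eq. 7
        r_fit = solve_radius([r[k] for k in kids], arc)
        r_min = g + 0.5 * (d[node] + max(d[k] for k in kids))   # Eq. 8
        r[node] = max(r_fit, r_min)

    visit(root, True)
    return r, d


# Hypothetical genealogy: A advised B and C; B advised D.
r, d = size_tree({"A": ["B", "C"], "B": ["D"]}, "A", g=1.0, phi=math.pi)
print({node: round(radius, 3) for node, radius in r.items()})
# {'D': 1.0, 'B': 2.207, 'C': 1.0, 'A': 2.707}
```

In this small example Eq. 8 governs both B and A; for a root with many children (e.g. 20 terminal children), the Eq. 9 solution dominates instead.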
2.2.2. Determining the node orientation
Once the sizes of the containment circles have been calculated, we must determine the
angular position of each child node relative to its parent. Because each child is positioned relative
to its parent’s orientation, the angular assignments must begin with the root node and work
outward to the terminal nodes.
The general case is the one given by the intermediate nodes (Figure 3(b)), so we will
consider it first. We know that half the angle covered by a child is given by Eq. 6. Therefore, the
angle θj for each child is given by
$$\theta_j = \theta_i - \frac{1}{2}\phi' + \alpha_j + 2\sum_{k=1}^{j-1}\alpha_k, \tag{10}$$
where θi is the orientation of the parent node and
$$\phi' = 2\sum_{j=1}^{n_i}\alpha_j \tag{11}$$
accounts for the fact that the children may not fill the entire containment arc length.
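Eqs. 10 and 11 translate directly into a running sum. A minimal sketch, assuming the half-angles αj have already been computed with Eq. 6 and the children are listed in drawing order (oldest first; see Section 2.2.3). Angles are in radians and the function name is hypothetical. Setting θi = 0 gives the root-node case.

```python
import math


def child_angles(theta_i, alphas):
    """Eqs. 10-11: angular position of each child on the parent's circle.

    `alphas` are the children's half-angles from Eq. 6, in drawing order.
    The block of children, of total width phi' = 2*sum(alphas), is centred
    on the parent's orientation theta_i. All angles are in radians.
    """
    phi_prime = 2.0 * sum(alphas)                    # Eq. 11
    angles, preceding = [], 0.0                      # 2 * sum_{k<j} alpha_k
    for alpha_j in alphas:
        angles.append(theta_i - 0.5 * phi_prime + alpha_j + preceding)  # Eq. 10
        preceding += 2.0 * alpha_j
    return angles


# Root node (theta_i = 0) with three children, each spanning 60 degrees.
root_angles = child_angles(0.0, [math.radians(30.0)] * 3)
print([round(math.degrees(t), 6) for t in root_angles])  # approximately -60, 0 and 60
```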
For the root node, Eq. 10 simplifies to
$$\theta_j = -\frac{1}{2}\phi' + \alpha_j + 2\sum_{k=1}^{j-1}\alpha_k \tag{12}$$
by assuming that θi = 0.
2.2.3. Determining node and edge colors
Every child node has had only a single parent in the examples shown thus far; however, it is
possible for a doctoral student to have two or more advisors. For this study, only coadvisors
already within the academic tree—that is, descendants of the root
researcher—are included in the visualizations. Even so, these extra edges in the
graph can still cause some confusion.
A feature common to both the original and the modified Pavlo algorithm is that each
child has to be assigned to the containment circle of a single parent. When multiple
advisors exist in the tree, assignment was made based on the order of recognition in the doctoral
dissertation, either on the title page or in the acknowledgments. It was assumed that the
advisor mentioned first should be given priority. When the dissertation was unavailable, we
relied on information provided by those who responded to our email requests for
information.
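The placement rule can be sketched as follows, assuming advisors are recorded in their order of recognition in the dissertation. The function and the names in the example are hypothetical; the sketch simply makes the first-listed in-tree advisor the parent, keeps any other in-tree coadvisors as extra edges, and ignores advisors outside the tree.

```python
def place_students(advisor_lists, tree_members):
    """Split advisor relationships into placement and extra coadvisor edges.

    `advisor_lists` maps each student to advisors in their order of
    recognition in the dissertation; `tree_members` is the set of people
    in the root researcher's tree. The first-listed in-tree advisor becomes
    the parent; other in-tree advisors become extra edges; advisors outside
    the tree are ignored.
    """
    parent, extra_edges = {}, []
    for student, advisors in advisor_lists.items():
        in_tree = [a for a in advisors if a in tree_members]
        if not in_tree:
            continue
        parent[student] = in_tree[0]
        extra_edges += [(a, student) for a in in_tree[1:]]
    return parent, extra_edges


# Hypothetical names.
members = {"Root", "Ana", "Ben", "Cy"}
parent, extra = place_students(
    {"Ana": ["Root"], "Ben": ["Root"], "Cy": ["Outside Prof", "Ben", "Ana"]},
    members,
)
print(parent)  # {'Ana': 'Root', 'Ben': 'Root', 'Cy': 'Ben'}
print(extra)   # [('Ana', 'Cy')]
```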
A second issue is that multiple edges crossing through the graphs may make it difficult to
identify who is advising whom. To reduce confusion, each node was assigned a unique color
and all edges originating from that node were given the same color. All nodes are assumed
to lie within a circle centered at the root node (Figure 4). The radial distance
Figure 4. Node colors are assigned by determining the smallest circle, centered at the root node,
that contains all the descendants. Corresponding locations on an equivalent hue-saturation-value circle
are used to calculate individual colors. The nodal information is the same as in Figures 1–3; however, some
extra edges have been added to indicate coadvisors.
from the root to the center of the furthest node, ρmax, is treated as the radius of this circle.
Colors are then assigned to the nodes based on the angle and radius of an HSL (hue,
saturation, lightness) color wheel (HSL and HSV, n.d.). For each node i, with a radial distance ρi
and an angular position ωi, the (r, g, b) components for that node are given by
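$$
(r_i, g_i, b_i) =
\begin{cases}
(\lambda, \chi, 0), & 0^\circ \le \omega_i \le 60^\circ \\
(\chi, \lambda, 0), & 60^\circ \le \omega_i \le 120^\circ \\
(0, \lambda, \chi), & 120^\circ \le \omega_i \le 180^\circ \\
(0, \chi, \lambda), & 180^\circ \le \omega_i \le 240^\circ \\
(\chi, 0, \lambda), & 240^\circ \le \omega_i \le 300^\circ \\
(\lambda, 0, \chi), & 300^\circ \le \omega_i \le 360^\circ
\end{cases}
\tag{13}
$$
where λ = ρi/ρmax and
$$\chi = \lambda\left(1 - \left|\frac{\omega_i}{60^\circ} \bmod 2 - 1\right|\right). \tag{14}$$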
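Eqs. 13 and 14 can be checked with a short sketch. The piecewise formula is the standard hue-sector conversion with chroma λ; it coincides with Python's `colorsys.hsv_to_rgb(ω/360, 1, λ)` and with `colorsys.hls_to_rgb(ω/360, λ/2, 1)`, and the example uses the former as a cross-check. The function name `node_rgb` is hypothetical, and ωi is assumed to be given in degrees.

```python
import colorsys


def node_rgb(rho_i, rho_max, omega_deg):
    """Eqs. 13-14: (r, g, b) components in [0, 1] for one node.

    lambda = rho_i / rho_max sets the intensity (0 at the root, 1 at the
    furthest node) and the angular position omega_i selects the hue sector.
    """
    lam = rho_i / rho_max
    w = omega_deg % 360.0
    chi = lam * (1.0 - abs((w / 60.0) % 2.0 - 1.0))
    sector = min(int(w // 60.0), 5)  # guard: w can round to 360.0
    return (
        (lam, chi, 0.0),
        (chi, lam, 0.0),
        (0.0, lam, chi),
        (0.0, chi, lam),
        (chi, 0.0, lam),
        (lam, 0.0, chi),
    )[sector]


print(node_rgb(1.0, 1.0, 0.0))    # (1.0, 0.0, 0.0)
print(node_rgb(0.5, 1.0, 90.0))   # (0.25, 0.5, 0.0)
# Standard-library HSV conversion with S = 1 and V = lambda gives the same.
print(colorsys.hsv_to_rgb(90.0 / 360.0, 1.0, 0.5))   # (0.25, 0.5, 0.0)
```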
The order in which children are drawn is important because it can be used to indicate the
order in which they completed their doctoral studies. For intermediate nodes, this means the
oldest child (i.e., earliest completion date) is drawn first and subsequent children are drawn,
in order, in a clockwise direction. However, because the children of the root node are placed
around a circle with no obvious beginning or end, the edge to the oldest child is drawn
directly from the root node (Figure 4). All other edges from the root node are drawn from
around a central arc to indicate the order of completion. Edges from intermediate nodes
to their children are drawn using Bézier curves if that child is on its own ring; however, a
straight line is used if the child is on another ring, to better distinguish the two scenarios
(Figure 4).
2.3. Analyses
In addition to creating the visualizations themselves, five groups of analyses were performed
on the five data sets: determining the reductions in graph size achieved by the modified
algorithm, investigating the effects of the user-selected parameters on the generated graphs,
quantifying researcher fecundity using mentorship metrics, calculating the length of an
academic (doctoral) generation, and assessing the completeness of the data available via
AcademicTree.
2.3.1. Improved performance of the modified algorithm
One of the main objectives of the modified algorithm was to decrease the space required to
display the genealogies. Family trees for the five researchers were generated using both
algorithms (see Data Availability for Python implementations). Each graph was rotated to the
portrait orientation that used the smallest area, as calculated by a rectangular bounding box. The
reduction in area was calculated as
$$\Delta A = \left(1 - \frac{A_{MP}}{A_P}\right) \times 100\% \tag{15}$$
where AP is the rectangular area for the original Pavlo algorithm and AMP is the rectangular
area for the modified algorithm. Decreased size is reported as a positive percentage.
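A sketch of the area comparison, under assumptions not stated in the text: each drawing is represented as (x, y, radius) circles, and the minimum-area orientation is found by brute-force search over rotation angles rather than by an exact method such as rotating calipers. The names `min_area_box` and `area_reduction` are hypothetical.

```python
import math


def min_area_box(nodes, steps=3600):
    """Smallest axis-aligned bounding box over rotations of a drawing.

    `nodes` are (x, y, radius) circles; rotating the drawing moves centres
    but not radii. Angles in [0, 90 deg) suffice because a further 90-degree
    turn only swaps width and height. Returns (area, width, height) with
    width <= height, i.e. portrait orientation.
    """
    best = None
    for k in range(steps):
        t = 0.5 * math.pi * k / steps
        c, s = math.cos(t), math.sin(t)
        xs = [(c * x - s * y, rad) for x, y, rad in nodes]
        ys = [(s * x + c * y, rad) for x, y, rad in nodes]
        w = max(v + rad for v, rad in xs) - min(v - rad for v, rad in xs)
        h = max(v + rad for v, rad in ys) - min(v - rad for v, rad in ys)
        if best is None or w * h < best[0]:
            best = (w * h, min(w, h), max(w, h))
    return best


def area_reduction(a_p, a_mp):
    """Eq. 15: percentage decrease in area; positive means smaller."""
    return (1.0 - a_mp / a_p) * 100.0


# A 1 x 3 rectangle tilted by 30 degrees (corners as zero-radius circles).
tilt = math.radians(30.0)
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0), (0.0, 3.0)]
tilted = [(x * math.cos(tilt) - y * math.sin(tilt),
           x * math.sin(tilt) + y * math.cos(tilt), 0.0) for x, y in corners]
area, width, height = min_area_box(tilted)
print(round(area, 6), round(width, 6), round(height, 6))   # 3.0 1.0 3.0
```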
It should be noted that the size of the original Pavlo diagram is determined by the
initial radius r, whereas the modified algorithm prescribes a minimum gap between nodes,
g. To ensure a fair comparison, the values of r were scaled to give an equivalent gap size.
The value of ϕ was kept constant for both algorithms.
2.3.2. Effects of user-selected parameters (g and ϕ)
The two user-selected parameters (g and ϕ) control the size, shape, and quality of the
graphs generated by the modified algorithm. Therefore, these parameters were investigated
independently to understand their effects. Because a constant value of g (g0 = 1) was used
throughout, and an optimal value of ϕ (ϕ0) was determined for each genealogy, these
parameters were used to calculate a reference area (A0) for each graph. The values of g were
then varied from 0.5 to 3, and the ratio of the resulting graph area (A) to the reference
area (A/A0) was used to quantify the effect on size. Similarly, ϕ was varied between
90° and ϕ0.
2.3.3. Mentorship statistics and metrics
Various summary statistics and metrics were calculated for the five individual researchers. The
first involved counting the number of descendants a researcher had in each generation. Those
supervised directly by the researcher were the first generation (children), those supervised by
the first generation were the second generation (grandchildren), and so on. The total of all
descendants was also calculated. When an individual has two or more advisors, it is possible
for them to be considered part of multiple generations. As with the visualizations, the primary
supervisor was used to determine the generation to which they belonged.
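A minimal sketch of the per-generation count described above, assuming each person is recorded with a single primary advisor (the hypothetical `primary_advisor` dictionary). It walks the tree breadth-first from the root researcher; it does not model how cosupervised students can also appear in a second researcher's tree.

```python
from collections import Counter, deque


def descendants_by_generation(root, primary_advisor):
    """Count descendants of `root` per generation using primary-advisor links only.

    primary_advisor maps each student to their primary advisor (hypothetical
    input format). Generation 1 = children, 2 = grandchildren, and so on.
    """
    children = {}
    for student, advisor in primary_advisor.items():
        children.setdefault(advisor, []).append(student)

    counts = Counter()
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        person, gen = queue.popleft()
        for child in children.get(person, []):
            if child not in seen:  # guards against malformed (cyclic) input
                seen.add(child)
                counts[gen + 1] += 1
                queue.append((child, gen + 1))
    return dict(sorted(counts.items()))


tree = {"A": "R", "B": "R", "C": "A", "D": "A", "E": "C", "X": "Q"}  # invented
print(descendants_by_generation("R", tree))
# {1: 2, 2: 2, 3: 1}
```

The total number of descendants is then the sum of the per-generation counts.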
Two researcher fecundity metrics were also calculated and reported, both of which were
inspired by bibliometric indices (Hirsch, 2005; Egghe, 2006). The mentoring h-index (hm)
proposed by Rossi et al. (2018) is defined as the largest number n such that n direct
descendants each have at least n descendants of their own. However, Sanyal et al. (2020) noted
that this metric is insensitive to the fact that an individual child may have a large number of
descendants. They proposed the mentoring g-index (gm), which is defined as the largest number
n for which a researcher has at least n academic children and at least n² grandchildren.
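The two metrics can be sketched as follows, following the definitions as worded in this section rather than any published code. The function names are illustrative, and the per-child counts passed to `hm_index` are assumed to be each child's own total number of descendants. Fed the 1st- and 2nd-generation counts from Table 1, this literal reading of gm reproduces the reported gm values (e.g., 29 children and 176 grandchildren give gm = 13).

```python
import math


def hm_index(descendants_per_child):
    """Largest n such that n children each have at least n descendants."""
    ranked = sorted(descendants_per_child, reverse=True)
    n = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        n = rank
    return n


def gm_index(n_children, n_grandchildren):
    """Largest n with at least n children and at least n**2 grandchildren."""
    return min(n_children, math.isqrt(n_grandchildren))


print(hm_index([10, 8, 5, 4, 3]))  # 4
print(gm_index(29, 176))           # 13
```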
2.3.4. Academic generation length calculation
For each researcher, the time between their own graduation and the completion of their first
doctoral student was calculated in years. This time to reproduce within academia, an
academic generation, is the research equivalent of a human generation. Given that other forms
of progeny, such as master’s or postdoctoral students, have not been considered in this study,
it might more accurately be termed a doctoral generation. However, this distinction may
be unnecessary, because those with master’s degrees are typically ineligible to advise
graduate students, and postdoctoral students, by definition, already hold the necessary
qualifications.
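The generation-length calculation reduces to differences of graduation years. In this sketch the input format (pairs of advisor and first-student graduation years) is an assumption and the years are invented; a single long gap pulls the mean above the median, the same right-skew pattern discussed for the real data.

```python
from statistics import mean, median, multimode


def generation_lengths(pairs):
    """Years between an advisor's own PhD and their first doctoral student's PhD."""
    return [first_student - advisor for advisor, first_student in pairs]


def summarize(gaps):
    """Mean, median, and all modes (ties are kept, as in a bimodal distribution)."""
    return {"mean": mean(gaps), "median": median(gaps), "modes": multimode(gaps)}


# Invented example: (advisor PhD year, first student's PhD year)
gaps = generation_lengths([(1980, 1987), (1990, 1998), (1975, 1983), (2000, 2007), (1985, 2001)])
print(gaps, summarize(gaps))
```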
2.3.5. Assessment of AcademicTree data
Online databases are frequently used by researchers interested in understanding academic
genealogical patterns. These databases tend to be focused on researchers in specific domains
such as mathematics (Mathematics Genealogy Project, n.d.) or biological anthropology (Barr,
Nachman, & Shapiro, n.d.). Although it began as NeuroTree, and was initially focused on
researchers in neuroscience (David & Hayden, 2012), AcademicTree.org has since expanded
to other areas and has become the most generalized repository available. With almost one
million entries and connections, and because researchers routinely use this data for analysis,
it is of interest to assess the comprehensiveness of these community-provided data compared
to the manually tracked data collected for this project.
AcademicTree data consist of two types of information: a person and a connection. Snap-
shots of the entire data set (David, 2021)—the most recent of which is from January 14,
2021—are publicly available for processing and analysis (Liénard, Achakulvisut et al.,
2018). First, the researchers identified in the five data sets were checked against those in
AcademicTree to confirm whether they were present. Second, whether the connection between
advisor and student was present in the database was also verified. Only the connections
between people in the individual trees, those represented by edges in the visualizations, were
evaluated. The values for both people and connections are reported as the percentage of the
data collected in this study that are correctly contained in AcademicTree. Incorrect or
additional information in AcademicTree was not evaluated, so the reported values represent an
upper-bound estimate of the data coverage.
3. RESULTS
Academic genealogies for the five researchers were assembled from a variety of online sources
and from information provided by individuals within each tree. Data files containing this infor-
mation are available for those who wish to reuse them (see Data Availability).
Figure 5. Modified Pavlo diagram showing 93 doctoral descendants of Ron E. Zernicke (ϕ = 175°).
The modified Pavlo diagrams showing the academic descendants of the five selected
researchers are given in Figures 5–9. Siblings will never overlap due to the nature of the
algorithm; however, interactions between more distantly related individuals are possible. The
largest value of ϕ that eliminated intersection of the containment rings was determined for each
graph via trial and error; the specific value used is indicated in the caption. Note that only the
outer portions of the containment rings have been drawn, and the ϕ lines have been eliminated
altogether, to reduce visual clutter. Note also that the graphs have been rotated into the
portrait orientation that makes the most efficient use of the page. Python implementations of
the original and modified algorithms are available for reuse (see Data Availability).
The modifications to the algorithm reduced the total area needed to present the
genealogies. These reductions were quantified after adjusting the r value of the original
Pavlo algorithm to ensure equivalent node size and spacing, and after rotating both
genealogies to their optimal portrait orientation. As shown in Figure 10, the sample data used in
Figures 1–3 occupied roughly one quarter of the original area when plotted with the modified
algorithm (ΔA = 76.5%). Even larger reductions were observed for the genealogies of the five
researchers. The area of the Zernicke genealogy was reduced by 97.4% (Figure 11). The graphs
of the other four researchers had ΔA > 99.9% but are not shown due to the very sparse trees
produced by the original algorithm.
Figure 6. Modified Pavlo diagram showing 147 doctoral descendants of Steven A. Goldstein
(ϕ = 165°).
Figure 7. Modified Pavlo diagram showing 150 doctoral descendants of Lawrence E. Thibault
(ϕ = 142°).
Figure 8. Modified Pavlo diagram showing 343 doctoral descendants of Van C. Mow (ϕ = 128°).
Figure 9. Modified Pavlo diagram showing 384 doctoral descendants of Wilson C. Hayes (ϕ = 140°).
The areas of the graphs generated by the modified Pavlo algorithm are affected by the
user-selected g parameter. Three values of g applied to the Zernicke genealogy are shown in
Figure 12; the overall layout of the nodes is unchanged by g, and only the scale is affected.
The change in size might be expected to follow a trend where A/A0 ∝ (g/g0)², as doubling g
might be expected to double both the width and the height of the graph; however, Figure 13
indicates that A/A0 increases more slowly. This behavior results from the graph-specific path by
which the outermost nodes approach the bounding box.
The optimal angle (ϕ0) was selected iteratively for each graph based on the values at which
two or more containment rings began to overlap. The plot in Figure 13 indicates that the area
(A/A0) increases nonlinearly for decreasing values of ϕ. The values of ϕ0 varied from 128° to 175°
for the five genealogies. Although ϕ0 tends to decrease with the total number of nodes, the value
depends on the specific shape of the graph. An alternative to iteratively selecting an optimized
ϕ is to select a small angle unlikely to result in collisions of the containment rings, albeit with a
resulting increase in area. For example, if a conservative value of ϕ = 120° had been chosen
a priori for all graphs, their areas would have increased by a factor of 1.25 to 2.25 (Figure 13).
It should also be noted that, because there tends to be one region that determines ϕ0, the
graphs are relatively insensitive to small deviations from the optimal value. Figure 14 illustrates
the resulting changes when adjusting ϕ0 for the Goldstein genealogy by ±10°.
Figure 10. Comparison of the sample family tree shown in Figures 1–3 generated using the
original and the modified Pavlo algorithms. The bounding boxes indicate the areas used to compare
the relative size of each graph. The same node size, minimum gap size, and ϕ = 160° were used in
both cases, but the modified algorithm is smaller (ΔA = 76.5%).
Figure 11. Comparison of the Zernicke family trees generated by the original and the modified
Pavlo algorithms (inset). The bounding boxes indicate the areas used to compare the relative size
of each graph. The same node size, minimum gap size, and ϕ = 175° were used in both cases, but
the modified algorithm is much smaller (ΔA = 97.4%).
Figure 12. Modified Pavlo diagram showing 93 doctoral descendants of Ron E. Zernicke
(ϕ = 175°) for three values of g.
The numbers of descendants in each generation for the five researchers are shown in
Table 1. The researchers had between 15 and 32 direct descendants, and their total numbers
of descendants ranged from 93 to 384. Some individuals and their descendants appear
in two family trees because of cosupervision. Therefore, the 1,118 descendants calculated
by adding up the totals of the five researchers consist of only 1,091 unique descendants
when duplicates are removed. When the five original researchers are included, there are
1,096 unique researchers across the five trees. The hm-index ranged from 5 to 7 across the
researchers, despite the very different genealogical trees. The gm-index showed more
sensitivity and varied in the range 7–13. Based on the analysis of AcademicTree.org data
reported by Sanyal et al. (2020), these values place them in the top 1% of researchers with
at least one descendant. This comparison is reported for context but should be interpreted
with caution given the differences in the data sets used.
Figure 13. Change in area (A/A0) for a range of g and ϕ values for the five genealogies. The
dot-dashed line in each graph indicates the reference condition for each genealogy.
Figure 14. Effects of a ±10° deviation from the optimal ϕ value (ϕ0 = 165°) on graph size and
quality for the 147 doctoral descendants of Steven A. Goldstein.
There were 153 individuals among the 1,096 unique researchers (14%) who had graduated
at least one PhD student of their own by the end of 2021. The difference (in years) between the
graduation date of each advisor and that of their first doctoral student is shown in Figure 15.
The distribution is right skewed, with an average time of 9.6 years. The median (9 years) and
the two modes (7 and 8 years) were slightly shorter than the average.
Finally, the coverage of the crowd-sourced AcademicTree data, relative to the data collected
for this study, is shown in Table 2. The percentage of people in the five individual
genealogies varied between 23% and 70%, with 45% of the unique researchers included.
The number of connections included in AcademicTree was lower: 34% overall, with a
range of 17–57%. These numbers represent an upper-bound estimate, given that the current
data are expected to be incomplete and that any erroneous connections in the AcademicTree
data set were not evaluated.

Table 1. Number of descendants for the five researchers, broken down by generation, along with
their mentoring metrics (hm and gm)

Researcher     1st   2nd   3rd   4th   Total   hm   gm
Zernicke        25    53    11     4      93    5    7
Goldstein       32    69    35    12     148    5    8
Thibault        15    83    48     4     150    6    9
Mow             29   176   124    14     343    7   13
Hayes           21   139   201    23     384    7   11

Figure 15. Time between a researcher completing their own PhD and graduating their first doctoral
student, based on their year of graduation. The angled black line indicates the maximum possible
time for a given year. The red line indicates the average. The frequency distribution and
statistics are also shown.

Table 2. Percentage of the people and connections in the current study (CS) found in AcademicTree
(AT) for the five trees and when only unique researchers are considered

                     People                  Connections
Researcher     AT      CS      %        AT      CS      %
Zernicke       22      94   23.4        16      96   16.7
Goldstein      64     149   43.0        47     152   31.0
Thibault       61     151   40.4        48     160   30.0
Mow           241     344   70.0       202     352   57.4
Hayes         130     385   33.8        89     391   22.8
Unique        496   1,096   45.3       391   1,132   34.3
4. DISCUSSION
For large numbers of descendants, circular or radial graphing algorithms produce academic
genealogy layouts with smaller aspect ratios than a standard family tree. A modified
Pavlo layout algorithm has been presented herein that corrects some of the shortcomings of the
original. It has been shown to be useful on a range of academic trees with up to four
generations and over 380 descendants. The data sets and a reference implementation of the
algorithm are available as open data (see Data Availability).
The modified algorithm succeeded in reducing the area occupied by each genealogy. A
77% reduction was obtained for the simple example tree in Figure 10, with reductions of
97% or greater for the genealogies of the five researchers. It could be argued that including
the containment rings in the bounding boxes used to calculate area inflated these values in
some cases (e.g., Figure 11). Nevertheless, substantial reductions were obtained in this study
with the modified algorithm. Other use cases would have to be studied to confirm whether
similar performance can be expected; however, the approach appears to be robust.
The algorithm has two user-specified parameters: the minimum gap length (g) and the
included angle of the containment arc onto which children are fit (ϕ). Values of g = 1 and
ϕ = 128–175° have been used successfully herein. Because the g term controls the scale of
the resulting graph, it can be chosen to alter the spacing between nodes without altering the
overall shape (Figure 12). In contrast, the value of ϕ was manually selected to obtain the
largest value that ensured that no overlap in the containment rings occurred. Smaller values
of ϕ tended to be needed as the number of descendants grew, but the exact value depends
on the specific shape of the graph. Based on the range of values determined in the current
study, an initial estimate of ϕ = 150° is recommended for an iterative search. Alternatively, a
constant value of 120° would have yielded satisfactory graphs in all cases, albeit with up to
a 2.25× increase in area (Figure 13). Such increases in size may be undesirable in some
applications, but the resulting graphs would still be much smaller (ΔA > 90%) than those
produced by the original Pavlo algorithm. The value of ϕ in the current algorithm is both
manually selected and constant across all nodes of the graph. Future improvements could
be made to the algorithm to either recursively adjust a constant ϕ value or determine
unique ϕi values for each containment ring to eliminate overlap and minimize the area
used; however, these changes would come with increased computational costs. The current
algorithm has been shown to be applicable to the unique shapes and sizes of academic
genealogies, but it may also be applicable to a broader range of trees. Different
guidelines for ϕ values may be needed in such cases.
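The iterative search for ϕ could be automated along the following lines. This is a hedged sketch: `rings_overlap` is a hypothetical callable standing in for a layout routine that reports whether any two containment rings intersect at a given ϕ, overlap is assumed to become more likely as ϕ increases, and the 1° step and the 90–180° bounds are arbitrary choices rather than values prescribed by the reference implementation.

```python
from typing import Callable


def largest_safe_phi(
    rings_overlap: Callable[[float], bool],
    start: float = 150.0,
    step: float = 1.0,
    lo: float = 90.0,
    hi: float = 180.0,
) -> float:
    """Find the largest phi (degrees) at which no containment rings overlap.

    Assumes overlap is monotonic in phi: if rings overlap at some phi, they also
    overlap at every larger phi. If overlap persists down to `lo`, `lo` is
    returned and the caller should check it.
    """
    phi = start
    if rings_overlap(phi):
        # Start value is too wide: narrow the arc until the overlap disappears.
        while rings_overlap(phi) and phi - step >= lo:
            phi -= step
        return phi
    # Start value is safe: widen the arc while the next step stays overlap-free.
    while phi + step <= hi and not rings_overlap(phi + step):
        phi += step
    return phi


# Toy stand-in for a real layout: rings collide above 165 degrees.
print(largest_safe_phi(lambda p: p > 165))
```

Under the same monotonicity assumption, a bisection search would need fewer layout evaluations; the linear scan is kept here because it mirrors the manual trial-and-error procedure.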
The data sets assembled for five biomedical researchers relied on a variety of public and
commercial resources. Individual researchers were then contacted to confirm the collected
data and gather additional information. Ensuring that the data sets were as current, correct,
and comprehensive as possible was important to properly demonstrate the suitability of the
algorithm and justified the extra effort involved. Because the algorithm performs well on
these large, real-world data sets, it should have no difficulty with smaller, sparser graphs.
Moreover, it is important that the data used exhibit the unique characteristics of academic
genealogies. For example, the Hayes tree (Figure 9) has one child (Dennis R. Carter) who
himself has a very large number of descendants. Such a feature would not necessarily be found
in artificially generated data, or even in incomplete data. Although every effort was made to
ensure the completeness of the data, it must be acknowledged that they are imperfect. Not
everyone could be contacted, and not everyone who was contacted replied (the response rate
was roughly 45%). Nevertheless, these data are the most exhaustive academic genealogies for
these five researchers currently available.
It was interesting to compare the results of the manually traced genealogies created for this
study with the crowd-sourced data available. Only 45% of the people and 34% of the
connections identified were found in the Academic Family Tree (n.d.) data. It should also be noted
that this evaluation only considered the people and connections within the researchers’
genealogies; advisors (and connections to those advisors) outside the tree were not considered or
counted, nor were any erroneous connections within the AcademicTree data. Because of this
methodology, and because the current data are known to be incomplete, these percentages
represent an upper-bound estimate of the true coverage. Given that the five researchers were
within biomechanics and biomedicine, it is unclear how coverage might differ for researchers
in other domains. However, some incompleteness should be expected and accounted for
by researchers performing analyses using the AcademicTree data.
Less complete data are to be expected in crowd-sourced resources, as they rely on contin-
uous participation to provide the necessary information. In this context, it is noteworthy that
one researcher had much higher coverage than the other four (Table 2). The reason for this
discrepancy is that Dr Mow was awarded the 2017 Alfred R. Shands, Jr., MD Award by the
Orthopaedic Research Society (ORS) for significant contributions to the field. As part of the
awards ceremony, some of his descendants presented an academic lineage that they had com-
piled and uploaded to the AcademicTree website. This detail further underscores both the dil-
igence needed to assemble exhaustive data and the challenge of keeping it updated.
There were 1,096 unique researchers across the five data sets. As of the end of 2021, 153 of
them (14%) had gone on to graduate a PhD student of their own, and it took an average of 9.6
years to do so. The distribution of these times to graduate a first PhD student is right skewed.
This behavior likely reflects uneven sampling: the number of descendants increases with time,
so graduation dates skew towards the present, and there is a maximum length of time within
which recent graduates could have graduated their own PhD students (the angled line in
Figure 15). Given that most of the researchers studied are in biomechanics or biomedicine,
and that most degrees were earned at institutions in the United States, these durations
might not reflect other contexts. Nevertheless, it is helpful to think of an academic (or
doctoral) generation as being roughly a decade in length.
Two mentorship metrics were calculated for the five researchers. The hm-index varied from
5 to 7, whereas the gm-index ranged from 7 to 13. These results agree with the observation of
Sanyal et al. (2020) that the hm-index is a less sensitive metric. Based on the analysis of
AcademicTree by Sanyal et al. (2020), these gm values would place each of the researchers
in elite territory; however, direct comparison between the two is difficult because of
differences in the data used. For example, AcademicTree includes all graduate students and
postdoctoral researchers in its mentorship data, not just the doctoral students considered herein,
which would lead to higher metrics than those reported in the current study. In contrast, the
incompleteness of the AcademicTree data already discussed could skew their metrics
downward. The gm-index offers improved discretization, but care is needed when evaluating
different researchers to ensure that equitable comparisons are being made.
Finally, it should be recognized that the graphs and indices only capture a particular form
of “success” with regard to mentorship. Quantity and quality are orthogonal concepts. These
numbers focus on the former and, although it is tempting to view those with smaller numbers
as being less successful, it is important to recall the distinction between student- and
advisor-centric methods of assessment. A student entering a doctoral program may do so to pursue a
career in industrial research, with the goal of launching a start-up company, or to
complement future training in other professions such as law or medicine. The extent to which an
advisor equips that student to achieve these goals is a different metric of success altogether.
Other important definitions of success, such as the way in which an advisor treats their
trainees, are equally difficult to quantify. Therefore, although the work presented herein
certainly provides insight into mentoring fecundity, it should be balanced by the recognition that
“not everything that can be counted counts, and not everything that counts can be counted”
(Cameron, 1963).
In conclusion, the current work has proposed a modified Pavlo algorithm for producing
compact depictions of academic genealogies. The utility of the approach has been
demonstrated using data sets showing the doctoral descendants of five prolific researchers in
biomechanics and biomedicine. A number of different analyses have also been performed on the
data, which show that the gm-index is a more sensitive measure of fecundity, that only
45% of people and 34% of connections were covered in AcademicTree, and that the average time
to graduate one’s first PhD student was roughly a decade.
ACKNOWLEDGMENTS
The author would like to thank all the respondents for their assistance with, interest in, and
enthusiasm for this project. Interacting with you has been a wonderful reminder of the best
aspects of academia. The author would also like to acknowledge his father, the keeper of
our family tree, from whom he has inherited an interest in genealogies.
COMPETING INTERESTS
The author has no competing interests.
FUNDING INFORMATION
No funding was received for this research.
DATA AVAILABILITY
The data associated with this paper are available for reuse from Borealis (formerly Scholars
Portal Dataverse): https://doi.org/10.5683/SP3/MDGUTK. The data for the five genealogies
are available as comma-separated value (CSV) and extensible markup language (XML) files,
while the diagrams themselves are provided as scalable vector graphics (SVG). The Python
code used to generate the original and modified Pavlo diagrams is also provided as a reference
implementation. All files are available under a Creative Commons CC0 “Public Domain
Dedication” license.
REFERENCES
Academic Family Tree. (n.d.). https://academictree.org/.
Arce-Orozco, A., Camacho-Valerio, L., & Madrigal-Quesada, S. (2017). Radial tree in bunches: Optimizing the use of space in the visualization of radial trees. In 2017 International Conference on Information Systems and Computer Science (pp. 369–374). https://doi.org/10.1109/INCISCOS.2017.32
Barr, W. A., Nachman, B., & Shapiro, L. (n.d.). The academic phylogeny of biological anthropology. https://bioanthtree.org.
Bennett, A. F., & Lowe, C. (2005). The academic genealogy of George A. Bartholomew. Integrative and Comparative Biology, 45(2), 231–233. https://doi.org/10.1093/icb/45.2.231, PubMed: 21676766
Cameron, W. B. (1963). Informal sociology: A casual introduction to sociological thinking. New York: Random House.
Damaceno, R. J. P., Rossi, L., Mugnaini, R., & Mena-Chalco, J. P. (2019). The Brazilian academic genealogy: Evidence of advisor–advisee relationships through quantitative analysis. Scientometrics, 119(1), 303–333. https://doi.org/10.1007/s11192-019-03023-0
David, S. V. (2021). Academic Family Tree data export (1.0) [Data set]. https://doi.org/10.5281/zenodo.4441298
David, S. V., & Hayden, B. Y. (2012). Neurotree: A collaborative, graphical database of the academic genealogy of neuroscience. PLOS One, 7(10), e46608. https://doi.org/10.1371/journal.pone.0046608, PubMed: 23071595
Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69(1), 131–152. https://doi.org/10.1007/s11192-006-0144-7
García-Suaza, A., Otero, J., & Winkelmann, R. (2020). Predicting early career productivity of PhD economists: Does advisor-match matter? Scientometrics, 122(1), 429–449. https://doi.org/10.1007/s11192-019-03277-8
Gaule, P., & Piacentini, M. (2018). An advisor like me? Advisor gender and post-graduate careers in science. Research Policy, 47(4), 805–813. https://doi.org/10.1016/j.respol.2018.02.011
Grivet, S., Auber, D., Domenger, J. P., & Melancon, G. (2006). Bubble tree drawing algorithm. Computer Vision and Graphics, 633–641. https://doi.org/10.1007/1-4020-4179-9_91
HSL and HSV. (n.d.). https://en.wikipedia.org/wiki/HSL_and_HSV.
Hirsch, j. mi. (2005). An index to quantify an individual’s scientific
research output. Proceedings of the National Academy of Sci-
ences of the United States of America, 102(46), 16569–16572.
https://doi.org/10.1073/pnas.0507655102, PubMed: 16275915
Huang, GRAMO., li, y., Broncearse, X., Broncearse, y., & Lu, X. (2020). PLANET: A radial
layout algorithm for network visualization. Physica A, 539,
122948. https://doi.org/10.1016/j.physa.2019.122948
kelly, mi. A., & Sussman, R. W.. (2007). An academic genealogy on
the history of American field primatologists. American Journal of
Physical Anthropology, 132(3), 406–425. https://doi.org/10.1002
/ajpa.20532, PubMed: 17154360
Levecque, K., Anseel, F., De Beuckelaer, A., Van der Heyden, J., &
Gisle, l. (2017). Work organization and mental health problems
in PhD students. Política de investigación, 46(4), 868–879. https://doi.org
/10.1016/j.respol.2017.02.008
Liénard, j. F., Achakulvisut, T., Acuna, D. MI., & David, S. V. (2018).
Intellectual synthesis in mentorship determines success in aca-
demic careers. Comunicaciones de la naturaleza, 9, 4840. https://doi.org
/10.1038/s41467-018-07034-y, PubMed: 30482900
Lv, r., & Chang, h. (2021). Bibliometric-based study of scientist
academic genealogy. Journal of Data and Information Science,
6(3), 146–163. https://doi.org/10.2478/jdis-2021-0021
Mackie, S. A., & Bates, G. W. (2019). Contribution of the doctoral education environment to PhD candidates’ mental health problems: A scoping review. Higher Education Research and Development, 38(3), 565–578. https://doi.org/10.1080/07294360.2018.1556620
Malmgren, R. D., Ottino, J. M., & Nunes Amaral, L. A. (2010). The role of mentorship in protégé performance. Nature, 465(7298), 622–626. https://doi.org/10.1038/nature09040, PubMed: 20520715
Mathematics Genealogy Project. (n.d.). https://www.mathgenealogy.org/.
Mitchell, M. F. (1992). A descriptive analysis and academic genealogy of major contributors to JTPE in the 1980s. Journal of Teaching in Physical Education, 11(4), 426–442. https://doi.org/10.1123/jtpe.11.4.426
Montoye, H. J., & Washburn, R. (1980). Research Quarterly contributors: An academic genealogy. Research Quarterly for Exercise and Sport, 51(1), 261–266. https://doi.org/10.1080/02701367.1980.10609287
Pavlo, A., Homan, C., & Schull, J. (2006). A parent-centered radial layout algorithm for interactive graph visualization and animation. arXiv:cs/0606007. https://doi.org/10.48550/arXiv.cs/0606007
Reingold, E. M., & Tilford, J. S. (1981). Tidier drawings of trees. IEEE Transactions on Software Engineering, 7(2), 223–228. https://doi.org/10.1109/TSE.1981.234519
Rossi, L., Damaceno, R. J. P., Freire, I. L., Bechara, E. J. H., & Mena-Chalco, J. P. (2018). Topological metrics in academic genealogy graphs. Journal of Informetrics, 12(4), 1042–1058. https://doi.org/10.1016/j.joi.2018.08.004
Russell, T. G., & Sugimoto, C. R. (2009). MPACT family trees: Quantifying academic genealogy in library and information science. Journal of Education for Library and Information Science, 50(4), 248–262.
Rutter, L., VanderPlas, S., Cook, D., & Graham, M. A. (2019). ggenealogy: An R package for visualizing genealogical data. Journal of Statistical Software, 89(13), 1–31. https://doi.org/10.18637/jss.v089.i13
Sanyal, D. K., Dey, S., & Das, P. P. (2020). gm-index: A new mentorship index for researchers. Scientometrics, 123(1), 71–102. https://doi.org/10.1007/s11192-020-03384-x