REPORT

A Reappraisal of Dependency Length
Minimization as a Linguistic Universal

Himanshu Yadav1, Shubham Mittal2, and Samar Husain3

1Department of Linguistics, University of Potsdam, Germany
2Department of Chemical Engineering, Indian Institute of Technology Delhi, India
3Department of Humanities and Social Sciences, Indian Institute of Technology Delhi, India

an open access journal

Keywords: dependency length minimization, syntactic complexity, working-memory constraints

ABSTRACT

Dependency length minimization is widely regarded as a cross-linguistic universal reflecting
syntactic complexity in natural languages. A typical way to operationalize dependency length
in corpus-based studies has been to count the number of words between syntactically related
words. However, such a formulation ignores the syntactic nature of the linguistic material
that intervenes a dependency. In this work, we investigate whether the number of syntactic heads
(rather than the number of words) that intervene a dependency better captures the syntactic
complexity across languages. We demonstrate that the dependency length minimization
constraint in terms of the number of words could arise as a consequence of constraints on the
intervening heads and the tree properties such as node arity. The current study highlights
the importance of syntactic heads as central regions of structure building during processing.
The results show that when syntactically related words are nonadjacent, increased structure
building in the intervening region is avoided.

INTRODUCTION

Natural languages have been argued to be shaped by communicative pressures as well as cer-
tain cognitive constraints such as limited working memory (Bickerton, 2003; Hawkins, 2014;
Hockett, 1960; Jaeger & Tily, 2011; Zipf, 1949). Such accounts contend that efficiency in for-
mulating and comprehending a language dictates its formal properties (Bybee, 2006; Croft,
2001; Gibson et al., 2019; Haspelmath, 2008; Hawkins, 1994; Piantadosi et al., 2012) and
is a vital determinant of a language’s communicative utility. In the sentence processing liter-
ature, a dominant way to operationalize and test this efficiency has been in terms of the linear
arrangement of syntactically related words (e.g., a verb and its nominal arguments) (Futrell
et al., 2020). The hypothesis, termed dependency length minimization (DLM), holds that,
on average, the distance between a head (e.g., a verb) and its dependent (e.g., a noun) is
minimized in natural languages (Behagel, 1930; Gibson, 1998; Gildea & Temperley, 2007;
Hawkins, 1990, 2014; Hudson, 1995; Rijkhoff, 1986; Temperley & Gildea, 2018). Why should
dependencies be short? Theories of sentence processing maintain that syntactic dependencies
(e.g., the syntactic relation between the verb “ate” and “John”/“a mango” in John ate a mango)
need to be established in order to comprehend or produce a sentence. Dependency resolution
between a pair of words typically involves one of the words to be temporarily retained in
memory. Under the assumption of limited working memory (Baddeley & Hitch, 1974; Cowan,

Citation: Yadav, H., Mittal, S., & Husain, S. (2022). A Reappraisal of Dependency Length Minimization as a Linguistic Universal. Open Mind: Discoveries in Cognitive Science, 6, 147–168. https://doi.org/10.1162/opmi_a_00060

DOI:
https://doi.org/10.1162/opmi_a_00060

Supplemental Materials:
https://doi.org/10.1162/opmi_a_00060;
https://osf.io/j975y/

Received: 14 March 2021
Accepted: 1 July 2022

Competing Interests: The authors declare no conflict of interest.

Corresponding Author:
Samar Husain
samar@hss.iitd.ac.in

Copyright: © 2022 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

The MIT Press


2001; Miller, 1956; see Miyake & Shah, 1999, for an extensive overview), longer dependen-
cies could lead to retrieval failure due to decay or interference-driven constraints (Bartek et al.,
2011; Grodner & Gibson, 2005; Lewis & Vasishth, 2005). Indeed, longer syntactic dependen-
cies have been shown to pose more difficulty during both comprehension and generation
(Bartek et al., 2011; Grodner & Gibson, 2005; Scontras et al., 2017). Recent large-scale
cross-linguistic corpus investigations have provided a strong validation for the DLM hypothesis
(Futrell et al., 2015; Liu, 2008; Liu et al., 2017). Based on this line of research, DLM has been
claimed to be a linguistic universal showcasing the influence of communicative pressure and
cognitive constraints on language forms (Futrell et al., 2020). For example, it has been argued
to determine some critical properties of languages, such as the rarity of discontiguous phrases
(Ferrer-i Cancho, 2006). Relatedly, it has been argued that the occurrence of the two most
frequent word orders (Subject-Verb-Object, and Subject-Object-Verb) across languages can
be explained by such minimization pressures during comprehension (Hawkins, 1990).

Dependency length in large-scale corpus studies (p.ej., Futrell et al., 2015) has typically
been operationalized by counting the number of words between syntactically related words.
Sin embargo, in the larger literature, dependency length has been computed using a variety of
ways, for example, the number of discourse referents (Gibson, 1998), the number of phrasal nodes
(Ferreira, 1991), the number of words (Temperley, 2007), and so on. Previous studies comparing
the effectiveness of such metrics have argued that these metrics (e.g., counting the number of
words vs. counting the number of phrases) are largely interchangeable (Szmrecsányi, 2004;
Wasow, 1997). This would suggest that computing dependency length using any of these mea-
sures should be equally effective in capturing linguistic complexity. Sin embargo, a large-scale
corpus study that tests the possible interaction or independence of various metrics is currently
lacking.

Operationalizing dependency length in terms of the number of words ignores the syntactic
nature of the linguistic material that intervenes a dependency. Given the limited memory
resource, it is reasonable to assume that more structure building in the intervening region
should lead to more difficulty in processing the unresolved dependency. Consistent with this
idea, there is evidence that not only the number but also the complexity of the words that intervene
a syntactic dependency matters (e.g., Gibson & Thomas, 1999; Wasow & Arnold, 2003; Yadav
et al., 2020). For example, it has been shown that introducing clausal embeddings can lead to
forgetting effects during comprehension (Gibson & Thomas, 1999). Similarly, Wasow and
Arnold (2003) found an independent effect of phrasal complexity on noun phrase shifts and
dative alternations. Interestingly, while Wasow and Arnold (2003) argue for an independent
effect of both length and phrasal complexity, others have proposed that phrasal length is
not an appropriate metric to quantify syntactic complexity (Chomsky, 1975). This line of work
predicts that the complexity of the linguistic material that intervenes a syntactic dependency
will be minimized. We call this the intervener complexity minimization (ICM) hypothesis. In
this work, we operationalize complexity as the number of syntactic heads that intervene a
dependency (Yadav et al., 2017, 2020; see Figure 1).
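To make the two measures concrete, the sketch below computes, for every dependency in a tree, the number of intervening words (dependency length) and the number of intervening syntactic heads (intervener complexity). It is a minimal illustration written in R (the language of the analyses reported below), not the authors' code; the encoding of a tree as a vector of head indices (with 0 for the root) and the function name are assumptions made only for the example.

```r
# Minimal sketch (not the authors' implementation). A dependency tree over an
# n-word sentence is encoded as a vector `heads`, where heads[i] is the
# position of word i's syntactic head and 0 marks the root.
intervener_measures <- function(heads) {
  n <- length(heads)
  is_head <- seq_len(n) %in% heads        # a word is a head if something depends on it
  deps <- which(heads != 0)               # every non-root word enters one dependency
  dl <- abs(deps - heads[deps]) - 1       # number of intervening words
  ic <- sapply(deps, function(d) {
    h <- heads[d]
    between <- setdiff(seq(min(h, d), max(h, d)), c(h, d))
    sum(is_head[between])                 # number of intervening heads
  })
  data.frame(dependent = deps, head = heads[deps], dl = dl, ic = ic)
}

# Example: "John ate a mango", with "John" and "mango" depending on "ate"
# and "a" depending on "mango": heads = c(2, 0, 4, 2).
intervener_measures(c(2, 0, 4, 2))
#>   dependent head dl ic
#> 1         1    2  0  0
#> 2         3    4  0  0
#> 3         4    2  1  0
```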

The rationale behind using the number of intervening heads as a measure of complexity
comes from the proposal that both structural integrations and temporary storage of linguistic
items consume the same pool of limited resources (Gibson, 1998; Just & Carpenter, 1992). For
example, in Figure 1, the node Xd has to be actively maintained in memory until the compre-
hender resolves the dependency Xh → Xd. In Figure 1(b), compared to Figure 1(a), more
structural integrations are required in the region intervening Xh and Xd, that is, Xj → Xi and
Xk → Xj need to be resolved. Since these integrations are assumed to consume the same pool
of limited resources, the maintenance of node Xd should become more difficult in Figure 1(b)


Figure 1. Dependency structures with varying intervener complexity for Xh → Xd. While the dependency lengths (number of words that intervene Xh → Xd) in tree (a) and tree (b) are the same, the two structures differ in their intervener complexity (the number of intervening heads).

compared to Figure 1(a), and hence cause more difficulty in resolving the Xh → Xd depen-
dency in Figure 1(b) than in Figure 1(a). In sum, the number of intervening heads represents
the amount of resource demand due to structural integrations in the intervening region of a
dependency.1 The ICM hypothesis states that the intervener complexity, that is, the number
of heads intervening a dependency, is minimized in natural languages. The DLM hypothesis
based on the number of words does not make any prediction regarding the nature of words
that intervene a dependency.

While the ICM hypothesis tests if intervener complexity (IC) is minimized in natural lan-
guage, it does not test how IC and dependency length (DL) interact. Recall that previous work
(Wasow & Arnold, 2003) suggests that both have an independent influence on the complexity of
a sentence. Given that dependency length is an upper bound on intervener complexity,
there are two ways in which DL and IC could interact in capturing syntactic complexity across
idiomas. The first possibility is that a constraint on IC and a constraint on DL independently
shape the pattern of linguistic structures. One can ask whether the intervener complexity is
minimized independent of the minimization of dependency length. We term this as the ICM
as an independent constraint hypothesis. The second possibility is that an IC-based measure is
better at capturing syntactic complexity compared to a DL-based measure. Thus, we also
investigate the DLM as an independent constraint hypothesis, that is, whether dependency
length is minimized independently of the constraint on intervener complexity. In sum, we test
three related hypotheses: (a) the ICM hypothesis, (b) the ICM as an independent constraint hypothesis,
and (c) the DLM as an independent constraint hypothesis.

In order to test these hypotheses, we conduct a cross-linguistic corpus study where we com-
pare the real trees attested in dependency treebanks with random baseline trees that match the
real trees in certain properties. Such a methodology has previously been successfully
employed to demonstrate the cross-linguistic validity of DLM (e.g., Futrell et al., 2015; Liu,
2008; Liu et al., 2017). For the purpose of this study, we introduce novel random baselines
that are more restrictive than the baselines used previously. For example, to evaluate
whether intervener complexity is minimized independent of the constraint on dependency
lengths, we generate baseline trees controlled for the distribution of dependency lengths
and compare them with the real trees in terms of intervener complexity.

The article is arranged as follows: In Section 2, we discuss the baselines and statistical
methods used for testing the three hypotheses. In Section 3, we discuss the results for each

1 Indeed, such a prediction will also hold for proposals that do not make a distinction between temporary
storage and integrations (e.g., Lewis & Vasishth, 2005). Under such an account, increased structure building
due to intervening heads will lead to retrieval difficulty of the dependent Xd due to time-driven decay (or
similarity-based interference).


hypothesis. We discuss the implications of the results in Section 4. Finally, we conclude the
article in Section 5.

MATERIALS AND METHODS

Random Baselines

We employ six random baselines to test the hypotheses stated in the previous section. Each
baseline controls for a particular set of tree properties relevant to the hypothesis.

Random baseline trees are generated by sampling from a uniform distribution over either
random tree structures or random linear arrangements. We apply further constraints (such as the
dependency length constraint) on these trees using rejection sampling to achieve the required
sample for each baseline. We try to generate one baseline tree for each tree in the dependency
treebank.

In all the baselines discussed below, we control the rate of crossing dependencies. In other
words, baseline trees match the real trees in the number of crossing dependencies.2 Since
crossing dependencies are rare in natural languages (Straka et al., 2015), random trees with a
large number of crossings tend to be dramatically different from real trees. Controlling for the
rate of crossings, therefore, ensures a stricter baseline by preventing certain unrealistic
structural configurations.
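As an illustration of this matching step, the sketch below counts crossings for a tree in the head-vector encoding used in the earlier example. It uses the simple interleaving-endpoints test for a pair of arcs; whether the actual procedure counts crossing pairs or crossed dependencies (footnote 2 gives the dominance-based definition) is a detail the sketch does not settle, and the function name is, again, an assumption for illustration.

```r
# Sketch (not the authors' code): count crossings in a tree given as a head
# vector. Two dependencies cross when their endpoints interleave, i.e., their
# spans overlap but neither span contains the other.
count_crossings <- function(heads) {
  deps <- which(heads != 0)
  if (length(deps) < 2) return(0)
  left  <- pmin(deps, heads[deps])
  right <- pmax(deps, heads[deps])
  crossings <- 0
  for (i in 1:(length(deps) - 1)) {
    for (j in (i + 1):length(deps)) {
      if ((left[i] < left[j] && left[j] < right[i] && right[i] < right[j]) ||
          (left[j] < left[i] && left[i] < right[j] && right[j] < right[i])) {
        crossings <- crossings + 1
      }
    }
  }
  crossings
}
```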

The ICM hypothesis is tested using the random structures baseline and the random linear
arrangements (RLAs) baseline. In order to generate a random structures baseline tree for a given
real language tree, we first compute the number of nodes, that is, the sentence length, and the num-
ber of crossing dependencies in the real tree. Then, using Prüfer codes (Prüfer, 1918), we sample
trees from a uniform distribution over tree structures with the given number of nodes. Sampled trees
that match the number of crossings in the real trees are accepted as valid samples for the
baseline. Hence, the random trees generated for this baseline are matched with real trees for
sentence length and the number of crossing dependencies. Figure 2(b) shows a random struc-
ture tree corresponding to a tree for a real sentence attested in a treebank—Figure 2(a). The
RLA baseline trees are sampled from a uniform distribution over all random linearizations of
a given tree structure t. Compared to the random structures baseline, the RLA baseline preserves
all the topological properties such as arity,3 tree depth, hubbiness, and so on, in addition to
sentence length and number of crossings. This makes the RLA baseline more conservative than
the random structures baseline (put differently, compared to the random structure trees, RLAs are
more similar to the real trees). RLAs are generated by permuting the order of the nodes in a real
tree such that the dependency relations among the nodes are preserved. If a sampled tree
matches the number of crossings in the real tree, it is accepted as a valid sample for the baseline.
Figure 2(c) shows a sample RLA corresponding to a real tree in Figure 2(a).
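A compact sketch of the two generators follows. It reuses count_crossings() from the previous sketch and keeps resampling until the crossing count matches the real tree. The choice of root for the Prüfer-sampled tree and the exact acceptance criterion are assumptions made for illustration rather than details taken from the authors' implementation.

```r
# Decode a Prüfer sequence into the edge list of a labelled tree on n nodes;
# node labels double as word positions.
prufer_to_edges <- function(code, n) {
  degree <- rep(1L, n)
  for (x in code) degree[x] <- degree[x] + 1L
  edges <- matrix(0L, n - 1, 2)
  for (k in seq_along(code)) {
    leaf <- which(degree == 1L)[1]         # smallest currently available leaf
    edges[k, ] <- c(leaf, code[k])
    degree[leaf] <- 0L
    degree[code[k]] <- degree[code[k]] - 1L
  }
  edges[n - 1, ] <- which(degree == 1L)    # the last two remaining nodes
  edges
}

# Orient the tree away from a randomly chosen root to obtain a head vector.
edges_to_heads <- function(edges, n) {
  adj <- lapply(seq_len(n), function(i)
    c(edges[edges[, 1] == i, 2], edges[edges[, 2] == i, 1]))
  heads <- integer(n)
  root <- sample.int(n, 1)
  visited <- rep(FALSE, n); visited[root] <- TRUE; queue <- root
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    for (w in adj[[v]]) if (!visited[w]) {
      heads[w] <- v; visited[w] <- TRUE; queue <- c(queue, w)
    }
  }
  heads
}

# Random structures baseline: uniform random tree over the same number of
# nodes, accepted only if it matches the real tree's crossing count.
sample_random_structure <- function(real_heads) {
  n <- length(real_heads); target <- count_crossings(real_heads)
  repeat {
    code <- if (n > 2) sample.int(n, n - 2, replace = TRUE) else integer(0)
    cand <- edges_to_heads(prufer_to_edges(code, n), n)
    if (count_crossings(cand) == target) return(cand)
  }
}

# Random linear arrangement: permute word positions while preserving the
# head-dependent relations of the real tree, again matched on crossings.
sample_rla <- function(real_heads) {
  n <- length(real_heads); target <- count_crossings(real_heads)
  repeat {
    p <- sample.int(n)
    cand <- integer(n)
    cand[p] <- ifelse(real_heads == 0, 0L, p[pmax(real_heads, 1L)])
    if (count_crossings(cand) == target) return(cand)
  }
}
```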

In order to test the ICM as an independent constraint hypothesis, the random structures base-
line trees and RLAs discussed above are further constrained by only selecting those baseline
trees where the sequence of dependency lengths matches with the corresponding real tree.
In other words, the baseline trees are obtained by restricting the dependency length distribution
in the random structures and RLA trees. These baselines are termed respectively the DL-matched
random structures baseline and DL-matched RLAs. Figures 2(d) and 2(e) show DL-matched

2 A crossing dependency is formed when two dependencies cross each other. Formally, a dependency, h → d
with h as the head and d as its dependent, is a crossing dependency if and only if there is at least one node, say i,
that intervenes h and d such that h does not (directly or indirectly) dominate i.
3 Arity of a node in a tree is defined as the number of dependents of that node.


Figure 2. Sample trees for various random baselines corresponding to a dependency tree from an English treebank.

random structure and DL-matched RLA, respectively, corresponding to a real tree in Figure 2(a).
Note that, since these baselines control the dependency length sequence, they allow for a com-
parison of intervener complexity between the real trees and baseline trees independent of the
influence of the dependency length distribution and topological properties like arity, and so on.
In other words, any difference in intervener complexity between the real trees and DL-matched
random structures baseline or DL-matched RLAs cannot be attributed to DL.

On similar lines, the DLM as an independent constraint hypothesis can be tested using the
IC-matched random structures baseline and the IC-matched RLAs. These trees are sampled by
restricting the intervener complexity distribution in the random structure and RLA trees, respec-
tively. Figures 2(f) and 2(g) respectively show IC-matched random structure and IC-matched
RLA corresponding to a real tree in Figure 2(a). We again note that, since these baselines control
the IC sequence, they allow for a comparison of dependency length between the real trees and


Table 1. An overview of all six baselines.

                               Controlled tree property
Random baseline                Sentence length   DL-sequence   IC-sequence   Tree topology
Random structures baseline     ✓
Random linear arrangements     ✓                                             ✓
DL-matched random structures   ✓                 ✓
DL-matched RLAs                ✓                 ✓                           ✓
IC-matched random structures   ✓                               ✓
IC-matched RLAs                ✓                               ✓             ✓

Note. DL = dependency length; IC = intervener complexity; RLA = random linear arrangement. Tree topology controls for arity and depth.

baseline trees independent of the influence of the intervener complexity and topological prop-
erties like arity, and so on. A summary of all the baselines can be found in Table 1.
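The matched families can be obtained by adding one more rejection step on top of the generators sketched above: a candidate is kept only when its dependency length values (or intervener complexity values) match those of the real tree. Whether the match is on the ordered sequence or on the multiset of values is a detail of the authors' procedure that the sketch below simply assumes (it compares sorted values); the helper names are likewise illustrative.

```r
# Sketch of the matched baselines: reuse the earlier generators and the
# per-dependency measures, and additionally require an exact match of the
# DL (or IC) values. Sorting treats the sequence as a multiset (an assumption).
dl_values <- function(heads) sort(intervener_measures(heads)$dl)
ic_values <- function(heads) sort(intervener_measures(heads)$ic)

sample_matched <- function(real_heads, generator, values) {
  target <- values(real_heads)
  repeat {
    cand <- generator(real_heads)          # already matched on crossings
    if (identical(values(cand), target)) return(cand)
  }
}

# The four matched baselines combine a generator with a measure, e.g.:
# dl_matched_rla <- sample_matched(real_heads, sample_rla, dl_values)
# ic_matched_rs  <- sample_matched(real_heads, sample_random_structure, ic_values)
```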

The baselines mentioned above have the advantage of being quite constrained and there-
fore allow us to test various hypotheses rigorously. Por ejemplo, the IC-matched RLA is a very
conservative baseline as it controls both the topological properties such as arity, profundidad, num-
ber of crossings, as well as the intervener complexity distribution. This baseline will be used to
test if there is any difference in dependency length distribution between real trees and baseline
trees when the intervener complexity is the same in the real and random trees. While the
above baseline allows us to test the DLM as an independent constraint hypothesis rigorously,
its complexity makes the generation process of such baseline trees prohibitively slow. This is
because we are controlling many properties of the baseline trees using rejection sampling.
Therefore, we take sentences up to length 12 in this work. We discuss the issue of generaliz-
ability of our results in Section 4.

Data

We use Surface-Syntactic Universal Dependencies (SUD) treebanks (versión 2.4) (Gerdes
et al., 2018, 2019) to perform all the analyses. We use data from 54 languages. This set
was obtained after excluding the treebanks for languages with fewer than 500 sentences
and treebanks for ancient languages such as Latin, Ancient Greek, Sanskrit, Old Church Sla-
vonic, Old Russian, and Old French. Our choice of SUD for the reported analysis is motivated
by the widespread assumptions regarding syntactic representation in sentence processing
research. In particular, this research subscribes to sentential representations consistent with
modern linguistic theories (e.g., Bresnan, 1982; Chomsky, 1995; Hudson, 1984; Mel’čuk,
1988; Pollard & Sag, 1994) where function words are held to be syntactic heads (cf. Dillon,
2011; Gibson, 1998; Lewis & Vasishth, 2005). See Osborne and Gerdes (2019) for a detailed
exposition on the syntactic assumptions in the SUD representation.

We compare the real trees attested in SUD treebanks with the baseline trees to test different

hypotheses. As stated earlier, we take sentences up to length 12 in this work.

Statistical Method

We want to test whether the distribution of intervener complexity or dependency length is
significantly different between real trees and the baseline trees. In order to do this, we fit linear


mixed-effects models (Bates et al., 2015) with varying intercepts and random slope adjustments
for languages using the lme4 package in R (R Core Team, 2020).

Suppose ICij is the mean intervener complexity for the ith sentence of the jth language, Sij is the
length of the ith sentence of the jth language, Rij is a dummy variable that encodes whether the sen-
tence is a real tree (coded as 1) or a baseline tree (coded as 0), β0 is the intercept term, β1 and β2 are
the slope terms for the main effects of sentence length and the real/baseline variable respectively,
β3 is the interaction term, u0,j is the random intercept adjustment for the jth language, and u1,j, u2,j,
and u3,j are random slope adjustments for the jth language. The model to predict ICij is shown below:
ICij = (β0 + u0,j) + (β1 + u1,j) Sij + (β2 + u2,j) Rij + (β3 + u3,j) Sij · Rij + εij    (1)

Similarly, the model to predict the mean dependency length for the ith sentence of the jth language
is shown below:

DLij = (β0 + u0,j) + (β1 + u1,j) Sij + (β2 + u2,j) Rij + (β3 + u3,j) Sij · Rij + εij    (2)

For IC-related hypotheses, the dependent variable is the intervener complexity; for DL-
related hypotheses, the dependent variable is dependency length. We check the interaction
effect estimate ^β3 to test whether the data supports our hypotheses regarding ICM and DLM.
The interaction effect estimate ^β3 captures the extent to which the intervener complexity (or
dependency length) grows slower in real trees compared to baseline trees with respect to sen-
tence length. As an illustration, in order to test the ICM hypothesis, we check whether the
growth of intervener complexity with respect to sentence length is significantly slower in real
trees compared to random structure trees.
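In lme4 syntax, the two models amount to calls of the following form. This is a sketch rather than the authors' script: the data frame `tree_data` and its column names (IC, DL, S.length, Real, language) are assumptions, with Real being the 0/1 dummy described above.

```r
library(lme4)

# Model (1): intervener complexity as a function of sentence length, tree type
# (real = 1 vs. baseline = 0), their interaction, and by-language random
# intercepts and slopes.
m_ic <- lmer(IC ~ S.length * Real + (1 + S.length * Real | language),
             data = tree_data)

# Model (2): identical structure, with mean dependency length as the outcome.
m_dl <- lmer(DL ~ S.length * Real + (1 + S.length * Real | language),
             data = tree_data)

# The estimate of interest is the interaction coefficient (beta_3): a reliably
# negative value means the measure grows more slowly with sentence length in
# real trees than in the baseline trees.
summary(m_ic)$coefficients["S.length:Real", ]
```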

We note that the interaction parameter β3 is the effect of interest for testing our hypotheses
because an aggregate difference in dependency length or intervener complexity between real
trees and baseline trees (es decir., the main effect) could be subject to inaccuracies as the depen-
dencies are mixed from different sentence lengths (see Ferrer-i Cancho & Liu, 2013; Futrell
et al., 2015). In using the interaction effect for interpreting our results, we follow the recom-
mendation in Ferrer-i Cancho and Liu (2013) that dependency length should be considered as
a function of sentence length.

In addition to running the analysis on data for all the languages, we also tested the hypoth-
eses individually for each language. While doing so, we remove the random intercept and
slope adjustment for languages.

Prediction

Recall that the ICM hypothesis is tested with intervener complexity as the dependent variable
and uses the random structure and random linear arrangements baseline trees. The ICM as an
independent constraint hypothesis is tested with intervener complexity as the dependent var-
iable and uses the DL-matched random structure trees and DL-matched RLAs. Finalmente, el
DLM as an independent constraint hypothesis is tested with dependency length as the depen-
dent variable and uses the IC-matched random structure trees and IC-matched RLAs.

Each hypothesis predicts that the relevant dependent measure (IC or DL) grows slower in
real language trees with respect to sentence length compared to the respective baseline. En
particular, the ICM hypothesis predicts that the intervener complexity should grow slower
in real language trees with respect to sentence length compared to random structure baseline
trees and random linear arrangements. Similarly, the ICM as an independent constraint


Figure 3. Fitted models showing the growth of intervener complexity with respect to sentence length in real language trees compared to
random structure trees and random linear arrangements (RLAs).

Table 2. ICM hypothesis: Estimates from the fitted linear-mixed models for random structures baseline and random linear arrangements.

                 Random structures baseline          Random linear arrangements
                 Estimate   SE      t value          Estimate   SE      t value
Intercept        1.48       0.008   172.96*          1.52       0.013   116.89*
S.length         0.29       0.004   71.07*           0.22       0.004   47.25*
Real             −0.28      0.015   −19.06*          −0.29      0.019   −15.40*
S.length:Real    −0.17      0.007   −24.46*          −0.13      0.006   −19.73*

Note. S.length = sentence length.

predicts that the intervener complexity should grow slower in real language trees with respect
to sentence length compared to DL-matched random structure trees and DL-matched RLAs.
Finally, the DLM as an independent constraint hypothesis predicts that the dependency length
grows slower in real language trees with respect to sentence length compared to IC-matched
random structure trees and IC-matched RLAs.

Therefore, if the estimated interaction effect coefficient ^β3 is negative (see Equations 1 and 2), it
would be evidence in support of the corresponding hypothesis.

RESULTS

With regard to the ICM hypothesis, Figure 3 shows the distribution of intervener complexity
with respect to sentence length in real trees attested in treebanks and random baseline trees.
Table 2 shows the estimates from the fitted linear-mixed models.4 We find that the average

4 All the data and reproducible analysis files are available at https://osf.io/j975y/.


Figure 4. Fitted models showing the growth of intervener complexity with respect to sentence length in real language trees compared to
dependency length (DL)-matched random structures and DL-matched random linear arrangements (RLAs).

Table 3. ICM as an independent constraint: Estimates from the fitted linear-mixed models for DL-matched random structures and DL-matched RLAs.

                 DL-matched random structures        DL-matched RLAs
                 Estimate   SE      t value          Estimate   SE      t value
Intercept        1.19       0.009   132.85*          1.24       0.012   96.27*
S.length         0.15       0.007   19.54*           0.11       0.007   15.99*
Real             −0.03      0.003   −10.80*          −0.02      0.004   −4.91*
S.length:Real    −0.03      0.004   −6.40*           −0.02      0.004   −4.82*

Note. DL = dependency length; ICM = intervener complexity minimization hypothesis; RLA = random linear arrangement; S.length = sentence length.

intervener complexity grows much slower in real language trees compared to random struc-
tures baseline trees (^β3 = −0.17, t value = −24.5) and random linear arrangements (^β3 = −0.13,
t value = −19.7). The notes S7 and S8 in the Supplemental Materials show the language-
specific analyses for the hypothesis.

A similar trend is observed with regard to the ICM as an independent constraint hypothesis;
see Figure 4. Table 3 shows the estimates from the fitted linear-mixed models. The effect was
found to be significant for both DL-matched random structures (^β3 = −0.03, t value = −6.4) and
DL-matched RLAs (^β3 = −0.02, t value = −4.8).5

Finally, with regard to the DLM as an independent constraint hypothesis, the average depen-
dency length grows significantly slower in real trees compared to IC-matched random

5 The notes S9 and S10 in the Supplemental Materials show the language-specific analysis for the ICM as an
independent constraint hypothesis.


Figure 5. Fitted models showing the growth of dependency length with respect to sentence length in real language trees compared to intervener complexity (IC)-matched random structures and IC-matched random linear arrangements (RLAs).

Table 4. DLM as an independent constraint: Estimates from the fitted linear-mixed models for IC-matched random structures and IC-matched RLAs.

                 IC-matched random structures        IC-matched RLAs
                 Estimate   SE      t value          Estimate   SE      t value
Intercept        1.85       0.014   131.97*          1.81       0.022   79.50*
S.length         0.34       0.009   36.85*           0.22       0.010   22.31*
Real             −0.19      0.009   −21.71*          −0.04      0.005   −8.17*
S.length:Real    −0.07      0.005   −12.96*          0.01       0.003   3.51*

Note. DLM = dependency length minimization; IC = intervener complexity; RLA = random linear arrangement; S.length = sentence length.

structures (^β3 = −0.07, t value = −12.9). However, this pattern does not hold for IC-matched
RLAs—the dependency length with respect to sentence length does not grow slower in real
language trees compared to that in IC-matched RLAs (^β3 = 0.01, t value = 3.5). See Figure 5
and Table 4 for details.6

DISCUSSION

Our first key finding is that, cross-linguistically, the complexity of the linguistic material (mea-
sured as syntactic heads) intervening a syntactic dependency in treebank sentences is mini-
mized. Our second key finding is that this minimization of intervener complexity holds even
when the dependency length distribution is controlled in the random baseline trees. Finally,
and most surprisingly, the results show that dependency length in real trees is not minimized

6 The notes S11 and S12 in the Supplemental Materials show the language-specific analysis for the DLM as an
independent constraint hypothesis.


Table 5. Summary of evidence for each hypothesis.

                                 Evidence for hypothesis
Random baseline                  ICM hypothesis   ICM as Independent Constraint   DLM as Independent Constraint
Random structures baseline       ✓                –                               –
Random linear arrangements       ✓                –                               –
DL-matched random structures     –                ✓                               –
DL-matched RLAs                  –                ✓                               –
IC-matched random structures     –                –                               ✓
IC-matched RLAs                  –                –                               ✗

Note. ✓ means a baseline furnished evidence for the tested hypothesis, ✗ means a baseline did not furnish any evidence for the hypothesis, – signifies not relevant; ICM = intervener complexity minimization hypothesis; DL = dependency length; IC = intervener complexity; RLA = random linear arrangement.

against a baseline controlled for the IC distribution and the topological structure of the tree. Together,
the results suggest that, cross-linguistically, intervener complexity captures syntactic complex-
ity better than DL. Table 5 provides a summary of the results.

Is DLM Epiphenomenal?

Results show that an optimal linear arrangement for minimizing intervener complexity could,
in turn, minimize DL. How can we interpret this finding?

We begin by noting that a particular dependency length can result from two types of inter-
vening structures: (a) Low intervener complexity structure having more intervening dependents
and fewer intervening heads, or (b) High intervener complexity structure having more interven-
ing heads and fewer intervening dependents.7 Figure 6 shows the two structures; the observed
dependency length of Xh → Xd in structure (a) is driven entirely by intervening dependents,
while in (b), it is primarily driven by intervening heads. Notice that a low intervener complex-
ity structure requires a high arity for at least one of the nodes in the structure (p.ej., Xh in
Figure 6a).

Given these two intervener complexity configurations, results for the ICM as an indepen-
dent constraint hypothesis show that cross-linguistically a low intervener complexity structure
is preferred over a high intervener complexity structure. Recall that the hypothesis was tested
using DL-matched baselines where the distribution of dependency length is identical to the
real trees. The results for this hypothesis, therefore, are not driven by dependency length–
related constraints. We now assess the results for DLM as an independent constraint hypoth-
esis in the light of the constraint that natural languages prefer low intervener complexity
estructuras.

DL Minimization in Real Trees Against IC-Matched Random Structures Assuming the ICM con-
straint on real language trees, IC-matched random structures trees cannot posit syntactic
configurations with high intervener complexity (see Figure 6). However, there is no restriction
on the topological structure of these random trees.8 Consequently, these random trees can

7 Intervening dependents here mean the terminal dependents that intervene a dependency.
8 Recall that IC-matched random structures trees match in intervener complexity distribution, however they
do not control for topological properties (p.ej., arity) of the real trees.


Figure 6. A schematic showing that a given dependency length (e.g., length = 4 for the dependency Xh → Xd) can be obtained by two types of structures. The low intervener complexity structure (a) has higher arity and few heads. The high intervener complexity structure (b) has low arity and a larger number of intervening heads.


Figure 7. The distribution of tree arity in real trees and intervener complexity (IC)-matched random structures. IC-matched random structures use flexibility in topological structure to posit higher arity and hence longer dependency distance than real trees.


have more instances of structures with high arity compared to real trees. As a result, they could
still posit longer dependencies in spite of low intervener complexity configurations (see
Figure 6). Figure 7 shows that arity in IC-matched random structures is higher than in real trees,
especially for longer sentences. This demonstrates that arity distribution in real trees is an
important determinant of dependency length.

No Evidence for Dependency Length Minimization in Real Trees Against IC-matched RLAs Compared
to the IC-matched random structures trees, the possibility to posit longer dependencies due to
flexibility in topological structure gets severely restricted in IC-matched RLAs.9 As a conse-
quence, the two mechanisms that can drive long dependencies (see Figure 6) are less acces-
sible here. Accordingly, IC-matched RLAs do not show conclusive evidence for dependency
length minimization in real trees. This suggests that, together, the constraints on intervener
complexity and constraints on topological structures of trees, like arity, could determine the
distribution of dependency length in natural language.

9 Recall that IC-matched RLAs control for intervener complexity as well as topological properties (como
arity).


Asymmetry in Constraints on Intervener Complexity Versus Dependency Length In order to under-
stand the nature of structures preferred by real trees for positing dependencies of a given IC or
a given DL, we did an exploratory analysis. We note the following:

1. For positing dependencies of a given length, the real trees use low-IC structures more
frequently compared to the DL-matched baseline trees (see Figure 8). This implies that
real trees prefer low-IC structures regardless of dependency length. This low-IC ten-
dency in real trees becomes even stronger for longer dependencies.

2. In contrast, real trees do not show much preference for low-DL structures when com-
pared with IC-matched RLAs (see Figure 9). For positing structures with a given IC, the
real trees choose almost as many short dependencies as the baseline trees. Moreover,
the real trees and IC-matched RLAs have the same average DL for a given intervener
complexity (see Figure 10).

The above points suggest an asymmetry in constraints on IC versus DL in real trees:
Compared to a baseline controlled for DL, the real trees prefer low IC structures; but compared
to the RLAs controlled for IC, the real trees do not show much preference for shorter depen-
dencies. This asymmetry supports the ICM as an independent constraint hypothesis, but does
not support the DLM as an independent constraint hypothesis.

Notes on Methodology and Limitations of the Current Work

As stated earlier, multiple corpus-based studies (e.g., Futrell et al., 2015; Gildea & Temperley,
2010; Liu, 2008) have previously provided evidence for DLM cross-linguistically using a
method similar to the one employed in the current study. Given that the methodology involves
the comparison of real trees with random baseline trees, the nature of these baseline trees

yo

D
oh
w
norte
oh
a
d
mi
d

F
r
oh
metro
h

t
t

pag

:
/
/

d
i
r
mi
C
t
.

metro

i
t
.

/

mi
d
tu
oh
pag
metro

i
/

yo

a
r
t
i
C
mi

pag
d

F
/

d
oh

i
/

i

/

.

/

1
0
1
1
6
2
oh
pag
metro
_
a
_
0
0
0
6
0
2
0
4
3
1
4
8
oh
pag
metro
_
a
_
0
0
0
6
0
pag
d

.

/

i

F

b
y
gramo
tu
mi
s
t

t

oh
norte
0
7
S
mi
pag
mi
metro
b
mi
r
2
0
2
3

Figure 8. The number of low intervener complexity (IC) (≤ 2) structures in real trees and dependency length (DL)-matched random struc-
tures at each dependency length. Compared to the baseline trees, the real trees tend to use low IC structures for positing longer dependencies.


Figure 9. The frequency of short (dependency length—DL ≤ 3) dependencies in real trees versus intervener complexity (IC)-matched random linear arrangements (RLAs) at each intervener complexity. Compared to the baseline trees, the real trees do not show a preference for short dependencies for positing a given IC structure. The figure shows up to IC 4, because high IC (> 4) structures cannot be achieved by short (DL ≤ 3) dependencies.


Figure 10. The average dependency length at each intervener complexity (IC) in real trees versus IC-matched random linear arrangements (RLAs).


becomes critical. Most previous work (e.g., Futrell et al., 2015; Liu, 2008) uses baselines akin to
the random structures baseline and RLAs. In the current work, we wanted to directly assess the
evidence for the independence of two constraints—whether a certain constraint X on real trees
holds independent of another constraint Y. This required us to compare real trees against base-
line trees that were generated under constraint Y. Por lo tanto, compared to previously used ran-
dom structures or RLAs, the baselines employed in the current work are strongly constrained.
Por ejemplo, to test whether ICM occurs independent of DLM, we compare real trees against
baseline trees that have constraints on dependency-length distribution and tree topology. In
addition, unlike baselines in previous work, which either had only noncrossing trees or an
unreasonably large number of crossing dependencies, the baselines in the current work con-
trolled for the number of crossings. However, controlling for multiple properties makes the
generation process of these baselines very slow. Due to this reason, we have provided evi-
dence for the role of intervener complexity and arity in determining syntactic complexity in
natural languages using various baselines for sentence length < 12. So, while our baselines
allow for a rigorous evaluation of various hypotheses, they are based on relatively short sen-
tences. This could raise concerns regarding the generalizability of the current results. In order
to assuage such concerns, below, we provide some observations of IC/arity patterns in real trees
that suggest the results should hold for longer sentences as well.

1. Figure 11 shows that the rate of intervener complexity growth with sentence length is
almost the same for short and long sentences. This suggests that the constraint on inter-
vener complexity persists for longer sentences.

2. Figure 12 shows that arity in real sentences becomes severely restricted in longer sen-
tences, while intervener complexity grows at almost the same rate for short and long
sentences. This implies that IC-matched RLAs—the baseline trees that match in arity
and intervener complexity with real trees—would have much stronger restrictions on
dependency length in longer sentences. This is because, as discussed earlier, positing a
longer dependency requires either a high arity or a high IC, but high-arity configura-
tions get severely restricted in longer sentences. This would predict that dependency
length in real trees would grow at almost the same rate or faster than IC-matched RLAs.
Figure 13 shows the rate of growth of dependency length in real trees up to sentence
length 30 and in baseline trees up to sentence length 11.

Figure 11. Intervener complexity at various sentence lengths for various languages. The figure shows that intervener complexity grows with sentence length at almost the same rate for short and long sentences cross-linguistically, which indicates that the constraint on intervener complexity persists for longer sentences.

Figure 12. The figure compares the rate of growth of intervener complexity and arity with respect to sentence length. Intervener complexity grows at almost the same rate for short and long sentences, while tree arity becomes increasingly restricted for longer sentences.

The above observations (and related figures) show that compared to short sentences, the
ICM/arity effects in real trees are even stronger in longer sentences. This provides a reasonable
basis to believe that the current results will hold for long sentences. We plan to take up base-
line generation for long sentences in the near future.

Additional concerns regarding our conclusions could be that (a) shorter sentences might
belong to nonrepresentative text in the corpus such as headlines and article headings, and (b) we
do not have enough power to accept the null hypothesis regarding DLM as an independent
constraint. For (a), we did an additional analysis by extracting clauses of length up to 12 words
from longer (> 12 words) sentences and compared them with corresponding IC-matched base-
line trees.10 We were able to replicate the results for DLM as an independent constraint
hypothesis: dependency length grows significantly slower in real trees from clausal data com-
pared to IC-matched random structures but not when compared with IC-matched RLAs (see
Note S5 in the Supplemental Materials). For (b), we did a Bayes factor analysis. We find mod-
erate to strong evidence in the favor of the null hypothesis (see Note S4 in the Supplemental
Materials for detailed results). The result suggests that the confidence in accepting the null
hypothesis regarding DLM as an independent constraint should be reasonably high.

10 We thank an anonymous reviewer for suggesting this method.


Figure 13. Growth of dependency length with respect to sentence length in real trees versus baseline trees. Light gray lines represent various real languages, the thick gray line represents average growth across real trees, and thick colored lines represent random baseline trees of sentence length less than 12.

Finally, the lack of evidence for the DLM as an independent constraint in this work has
been based on the lack of a significant interaction in the right direction (see Table 4). However,
we do find a main effect of tree type (real vs. random): the average dependency length is shorter
in real trees compared to IC-matched RLAs at each sentence length (all t values < −2). As
pointed out by an anonymous reviewer, this pattern goes against our claim that DLM could be
a consequence of constraints on IC and tree topology. Our choice of using the interaction effect
to test the hypothesis is based on one of the definitive, large-scale corpus investigations of
dependency length minimization (Futrell et al., 2015), which uses the interaction effect estimate
to argue for the DLM hypothesis. Given the importance of the claims in Futrell et al. (2015), it is
imperative that a comparative study of DLM against a competing hypothesis should also use a
similar methodology. However, in the context of our last claim about the potential nonindepen-
dence of DLM, different conclusions can be drawn based on the estimates of the main effect (at
each sentence length) and the interaction effect. Considering this methodological issue, we
cannot conclusively argue that DLM might arise due to the constraint on IC and arity restrictions.
The only certain conclusion from our study is that ICM is an independent constraint on lan-
guage, while DLM may or may not be an epiphenomenon of ICM. Our additional analyses show
that ICM is indeed a stronger constraint compared to DLM in determining the distribution of
word order and syntactic choices in natural languages.

The current work, therefore, shows that, in shorter sentences, ICM is an independent con-
straint on natural languages. On the other hand, we do not find any conclusive evidence for
DLM as an independent constraint, suggesting that DLM might arise as a consequence of ICM
and arity restrictions. However, it remains a possibility that our conclusions are driven by
methodological idiosyncrasies (i.e., we interpreted the interaction effects only) and/or the nature
of the data (i.e., we used only shorter sentences). At the very least, the current work conclusively
shows ICM and arity restrictions to be an equally important determinant of syntactic complex-
ity as DLM.

Measuring Syntactic Complexity

Building syntactic structures efficiently is a key aspect of language processing. A large body of
research has highlighted that simple and easier structures are preferred during both compre-
hension (e.g., Ferreira et al., 2002; Ferreira & Patson, 2007; Fodor & Inoue, 2000; Frazier,
1985; Gibson, 1998; Lewis & Vasishth, 2005) and production (e.g., Bock & Warren, 1985;
Ferreira, 1991; Gibson et al., 2019; Hahn et al., 2020; Kurumada & Jaeger, 2015; MacDonald,
2013). Since syntactic heads can be assumed to be central regions of structural integrations
during processing, it is not surprising that these processing-intensive units should be avoided
while building a dependency.

Quantifying complexity as intervening heads is consistent with previous proposals where the
number of nonterminal nodes of a phrase structure tree has been assumed to be an important
determinant of processing difficulty11 (e.g., Ferreira, 1991; Frazier, 1985; Miller & Chomsky,
1963; Yngve, 1960). The current work also highlights the key role of arity in determining
syntactic complexity. Results show that real trees have lower arity than that found in baselines
such as the IC-matched RLAs. This is not surprising when we consider that the syntactic
requirements of heads are constrained in natural languages.
For example, in English, it will be rare to find verb lemmas where the number of arguments
would be more than three. The current work suggests that linguistic constraints related to a
head's requirements (e.g., a verb's argument structure) are important determinants of dependency
length.

Overall, considerable previous work has designated phrasal complexity and the number of
words to be two independent ways to quantify syntactic complexity in natural languages
(Ferreira, 1991; Szmrecsányi, 2004; Wasow, 1997; Wasow & Arnold, 2003). However, no
previous work, to our knowledge, has tested if one of these measures is better at capturing
complexity when the other is held constant. The current work introduces a method to evaluate
the relative performance of a complexity measure cross-linguistically using corpus data (also
see Yadav et al., 2019). Using our method, one can test whether a constraint on measure X
occurs independently of a constraint on measure Y. We can do this by comparing the distri-
bution of X in real trees with baseline trees matched in Y with the real trees. Using this method,
we tested the independence of constraints on intervening heads and constraints on intervening
words. We found that the number of intervening heads is a better measure of complexity than
the number of intervening words. Thus, our methodology provides a principled way to eval-
uate new complexity measures against existing ones.

With regard to various heads intervening a dependency, the ICM hypothesis predicts a
greater avoidance of high-processing heads (i.e., those that involve a larger number of syntac-
tic integrations) compared to low-processing heads. Given varying syntactic constraints, it is
reasonable to assume a differential processing cost at various heads. For example, verbal
heads would typically involve more integrations than adjectival heads (cf. Frazier, 1985;
Gibson, 1998; Gibson & Thomas, 1999; Miller & Chomsky, 1963; Yngve, 1960). Future work
will extend the current work by reformulating the intervener complexity measure to capture
both the number and the type of intervening heads.

11 Intervener complexity might also seem related to the storage cost metric proposed in Gibson (1998), but
they are distinct. See Note S2 in the Supplemental Materials for more details.

Figure 14. The distribution of dependency length and intervener complexity with respect to sentence length across languages. Intervener complexity shows less variability across languages and across sentence lengths compared to dependency length.

Syntactic Complexity and Linguistic Typology

The current work suggests that the number of intervening heads could be a better measure to
quantify syntactic complexity compared to the number of intervening words. Could typolog-
ically distinct languages differ in their distribution of intervening heads and words? More
importantly, could the results for ICM/DLM as an independent hypothesis differ based on lan-
guage typology?
We did an additional analysis to test these questions, specifically testing (a) whether the
distribution of intervening words/heads differs in Subject-Object-Verb (SOV) versus Subject-
Verb-Object (SVO) languages, and (b) whether the results for ICM/DLM as an independent
hypothesis on the aggregated data differ for SOV versus SVO languages. Regarding (a), results
show that the number of intervening heads, as well as the number of intervening words, is
greater in SOV languages compared to SVO languages. Interestingly, a recent cross-linguistic
corpus study by Yadav et al. (2020) shows that the number of intervening heads is highly
constrained across languages, and this constraint shows less variability compared to the
number of intervening words (see Figure 14). Regarding (b), we find that both SOV and SVO
languages show the expected dependency length and intervener complexity minimization that
was found in the aggregated data, that is, IC/DL grows significantly slower in real trees
compared to random baseline trees (except IC-matched RLAs). At the same time, the effect of
minimization is weaker in SOV languages compared to SVO languages, suggesting a degree of
linguistic adaptability in SOV languages (cf. Levy & Keller, 2013; Vasishth et al., 2010; Yadav
et al., 2020).12 Together, these additional analyses suggest that results obtained on the
aggregated data can be generalized to these typologically distinct languages.

12 Note S6 in the Supplemental Materials provides detailed results for these analyses.

CONCLUSION

This work presents a corpus investigation to show that dependency length minimization as a
cross-linguistic constraint is better operationalized as the minimization of the number of syn-
tactic heads that intervene a dependency rather than as the minimization of the number of
words. We use a novel method to demonstrate this result. In particular, we show that when
real trees are compared with random trees that control for intervening heads (and other tree
properties such as arity), there is no conclusive evidence for dependency length minimization
(in terms of the number of words) in the real trees. On the other hand, when real trees are
compared with random trees that control for dependency length and various tree properties,
we find evidence for intervener complexity minimization. These results suggest that, compared
to the number of words, intervener complexity could be a better measure to quantify cross-
linguistic syntactic complexity.

ACKNOWLEDGMENTS

We would like to thank the two anonymous reviewers for their comments. We also thank
Richard Futrell for his comments on an earlier draft of the paper.

AUTHOR CONTRIBUTIONS

HY: Conceptualization: Equal; Formal analysis: Lead; Methodology: Equal; Supervision: Equal;
Visualization: Lead; Writing - Original Draft: Supporting; Writing - Review & Editing: Equal.
SM: Formal analysis: Supporting; Visualization: Supporting; Writing - Review & Editing:
Supporting. SH: Conceptualization: Equal; Methodology: Equal; Supervision: Equal; Writing -
Original Draft: Lead; Writing - Review & Editing: Equal.

REFERENCES

Baddeley, A., & Hitch, J. (1974). Working memory. In G.
CONCLUSION

This work presents a corpus investigation showing that dependency length minimization as a cross-linguistic constraint is better operationalized as the minimization of the number of syntactic heads that intervene a dependency rather than as the minimization of the number of words. We use a novel method to demonstrate this result. In particular, we show that when real trees are compared with random trees that control for intervening heads (and other tree properties such as arity), there is no conclusive evidence for dependency length minimization (in terms of the number of words) in the real trees. On the other hand, when real trees are compared with random trees that control for dependency length and various tree properties, we find evidence for intervener complexity minimization. These results suggest that, compared to the number of words, intervener complexity could be a better measure to quantify cross-linguistic syntactic complexity.

ACKNOWLEDGMENTS

We would like to thank the two anonymous reviewers for their comments. We also thank Richard Futrell for his comments on an earlier draft of the paper.

AUTHOR CONTRIBUTIONS

HY: Conceptualization: Equal; Formal analysis: Lead; Methodology: Equal; Supervision: Equal; Visualization: Lead; Writing - Original Draft: Supporting; Writing - Review & Editing: Equal. SM: Formal analysis: Supporting; Visualization: Supporting; Writing - Review & Editing: Supporting. SH: Conceptualization: Equal; Methodology: Equal; Supervision: Equal; Writing - Original Draft: Lead; Writing - Review & Editing: Equal.

REFERENCES

Baddeley, A., & Hitch, J. (1974). Working memory. In G. Bower (Ed.), Recent advances in learning and motivation (vol. 8, pp. 47–89). Academic Press. https://doi.org/10.1016/S0079-7421(08)60452-1

Bartek, B., Lewis, R. L., Vasishth, S., & Smith, M. (2011). In search of on-line locality effects in sentence comprehension. Journal of Experimental Psychology: Learning, Memory and Cognition, 37(5), 1178–1198. https://doi.org/10.1037/a0024194, PubMed: 21707210

Bates, D., Machler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Behagel, O. (1930). Zur wortstellung des deutschen. In Curme volume of linguistic studies (Language Monograph 7) (pp. 29–33). Waverly. https://doi.org/10.2307/521983

Bickerton, D. (2003). Symbol and structure: A comprehensive framework for language evolution. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 77–93). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199244843.003.0005

Bock, J. K., & Warren, R. K. (1985). Conceptual accessibility and syntactic structure in sentence formulation. Cognition, 21(1), 47–67. https://doi.org/10.1016/0010-0277(85)90023-X, PubMed: 4075761

Bresnan, J. (1982). The mental representation of grammatical relations. MIT Press.

Bybee, J. (2006). From usage to grammar: The mind’s response to repetition. Language, 82(4), 711–733. https://doi.org/10.1353/lan.2006.0186

Chomsky, N. (1975). The logical structure of linguistic theory. University of Chicago Press.

Chomsky, N. (1995). The minimalist program (vol. 28). Cambridge University Press.

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Brain and Behavioral Sciences, 24(1), 87–114. https://doi.org/10.1017/S0140525X01003922, PubMed: 11515286

Croft, W. A. (2001). Functional approaches to grammar. In N. J. Smelser & P. B. Baltes (Eds.), International encyclopedia of the social and behavioral sciences (pp. 6323–6330). Elsevier Sciences. https://doi.org/10.1016/B0-08-043076-7/02946-6

Dillon, B. (2011). Structured access in sentence comprehension (Unpublished doctoral dissertation). University of Maryland.

Ferreira, F. (1991). Effects of length and syntactic complexity on initiation times for prepared utterances. Journal of Memory and Language, 20(2), 210–233. https://doi.org/10.1016/0749-596X(91)90004-4

Ferreira, F., Bailey, K. G., & Ferraro, V. (2002). Good-enough representations in language comprehension. Current Directions in Psychological Science, 11(1), 11–15. https://doi.org/10.1111/1467-8721.00158

Ferreira, F., & Patson, N. D. (2007). The “good enough” approach to language comprehension. Language and Linguistics Compass, 1, 71–83. https://doi.org/10.1111/j.1749-818X.2007.00007.x

Ferrer-i Cancho, R. (2006). Why do syntactic links not cross? EPL (Europhysics Letters), 76(6), Article 1228. https://doi.org/10.1209/epl/i2006-10406-0

Ferrer-i Cancho, R., & Liu, H. (2013). The risks of mixing dependency lengths from sequences of different length. ArXiv. https://arxiv.org/abs/1304.3841

Fodor, J. D., & Inoue, A. (2000). Garden path reanalysis: Attach (anyway) and revision as last resort. In M. DiVincenzi & V. Lombardo (Eds.), Cross-linguistic perspectives in language processing (pp. 21–61). Kluwer. https://doi.org/10.1007/978-94-011-3949-6_2
Frazier, L. (1985). Syntactic complexity. In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing (pp. 129–189). Cambridge University Press. https://doi.org/10.1017/CBO9780511597855.005

Futrell, R., Levy, R., & Gibson, E. (2020). Dependency locality as an explanatory principle for word order. Language, 96(2), 371–412. https://doi.org/10.1353/lan.2020.0024

Futrell, R., Mahowald, K., & Gibson, E. (2015). Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, 112(33), 10336–10341. https://doi.org/10.1073/pnas.1502134112, PubMed: 26240370

Gerdes, K., Guillaume, B., Kahane, S., & Perrier, G. (2018). SUD or surface-syntactic universal dependencies: An annotation scheme near-isomorphic to UD. In M.-C. de Marneffe, T. Lynn, & S. Schuster (Eds.), Proceedings of the Second Workshop on Universal Dependencies (UDW 2018) (pp. 66–74). Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-6008

Gerdes, K., Guillaume, B., Kahane, S., & Perrier, G. (2019). Improving surface-syntactic universal dependencies (SUD): Surface-syntactic relations and deep syntactic features. In M. Candito, K. Evang, S. Oepen, & D. Seddah (Eds.), TLT 2019-18th International Workshop on Treebanks and Linguistic Theories (pp. 126–132). Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-7814

Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1), 1–76. https://doi.org/10.1016/S0010-0277(98)00034-1, PubMed: 9775516

Gibson, E., Futrell, R., Piantadosi, S. T., Dautriche, I., Mahowald, K., Bergen, L., & Levy, R. (2019). How efficiency shapes human language. Trends in Cognitive Sciences, 23(5), 389–407. https://doi.org/10.1016/j.tics.2019.02.003, PubMed: 31006626

Gibson, E., & Thomas, J. (1999). Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes, 14(3), 225–248. https://doi.org/10.1080/016909699386293

Gildea, D., & Temperley, D. (2007). Optimizing grammars for minimum dependency length. In A. Zaenen & A. van den Bosch (Eds.), Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (pp. 184–191). Association for Computational Linguistics.

Gildea, D., & Temperley, D. (2010). Do grammars minimize dependency length? Cognitive Science, 34(2), 286–310. https://doi.org/10.1111/j.1551-6709.2009.01073.x, PubMed: 21564213

Grodner, D., & Gibson, E. (2005). Consequences of the serial nature of linguistic input. Cognitive Science, 29(2), 261–290. https://doi.org/10.1207/s15516709cog0000_7, PubMed: 21702774

Hahn, M., Jurafsky, D., & Futrell, R. (2020). Universals of word order reflect optimization of grammars for efficient communication. Proceedings of the National Academy of Sciences, 117(5), 2347–2353. https://doi.org/10.1073/pnas.1910923117, PubMed: 31964811

Haspelmath, M. (2008). Parametric versus functional explanations of syntactic universals. In T. Biberauer (Ed.), The limits of syntactic variation (pp. 75–107). Benjamins. https://doi.org/10.1075/la.132.04has

Hawkins, J. A. (1990). A parsing theory of word order universals. Linguistic Inquiry, 21(2), 223–261.

Hawkins, J. A. (1994). A performance theory of order and constituency (vol. 73). Cambridge University Press. https://doi.org/10.1017/CBO9780511554285
Hawkins, J. A. (2014). Cross-linguistic variation and efficiency. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199664993.001.0001

Hockett, C. F. (1960). The origin of speech. Scientific American, 203(3), 88–96. https://doi.org/10.1038/scientificamerican0960-88

Hudson, R. (1984). Word grammar. Blackwell.

Hudson, R. (1995). Measuring syntactic difficulty. University College London.

Jaeger, T. F., & Tily, H. (2011). On language “utility”: Processing complexity and communicative efficiency. Wiley Interdisciplinary Reviews: Cognitive Science, 2(3), 323–335. https://doi.org/10.1002/wcs.126, PubMed: 26302080

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1), 122–149. https://doi.org/10.1037/0033-295X.99.1.122, PubMed: 1546114

Kurumada, C., & Jaeger, T. F. (2015). Communicative efficiency in language production: Optional case-marking in Japanese. Journal of Memory and Language, 83, 152–178. https://doi.org/10.1016/j.jml.2015.03.003

Levy, R., & Keller, F. (2013). Expectation and locality effects in German verb-final structures. Journal of Memory and Language, 68(2), 199–222. https://doi.org/10.1016/j.jml.2012.02.005, PubMed: 24558294

Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375–419. https://doi.org/10.1207/s15516709cog0000_25, PubMed: 21702779

Liu, H. (2008). Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science, 9(2), 159–191. https://doi.org/10.17791/jcs.2008.9.2.159

Liu, H., Xu, C., & Liang, J. (2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21, 171–193. https://doi.org/10.1016/j.plrev.2017.03.002, PubMed: 28624589

MacDonald, M. C. (2013). How language production shapes language form and comprehension. Frontiers in Psychology, 4, Article 226. https://doi.org/10.3389/fpsyg.2013.00226, PubMed: 23637689

Mel’čuk, I. A. (1988). Dependency syntax: Theory and practice. SUNY Press.

Miller, G. A. (1956). The magical number seven plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63(2), 81–97. https://doi.org/10.1037/h0043158, PubMed: 13310704

Miller, G. A., & Chomsky, N. (1963). Finitary models of language users. In R. R. Bush, R. D. Luce, & E. Galanter (Eds.), Handbook of mathematical psychology (vol. 2, pp. 419–492). Wiley.

Miyake, A., & Shah, P. (1999). Models of working memory: Mechanisms of active maintenance and executive control. Cambridge University Press. https://doi.org/10.1017/CBO9781139174909

Osborne, T., & Gerdes, K. (2019). The status of function words in dependency grammar: A critique of universal dependencies (UD). Glossa: A Journal of General Linguistics, 4(1), Article 17. https://doi.org/10.5334/gjgl.537

Piantadosi, S. T., Tily, H., & Gibson, E. (2012). The communicative function of ambiguity in language. Cognition, 122(3), 280–291. https://doi.org/10.1016/j.cognition.2011.10.004, PubMed: 22192697
Pollard, C., & Sag, I. A. (1994). Head-driven phrase structure grammar. Center for the Study of Language and Information.

Prüfer, H. (1918). Neuer beweis eines satzes über permutationen [New proof of a theorem on permutations]. Archiv der Mathematik und Physik, 3(27), 142–144.

R Core Team. (2020). R: A language and environment for statistical computing [Computer software manual]. R Foundation for Statistical Computing.

Rijkhoff, J. (1986). Word order universals revisited: The principle of head proximity. Belgian Journal of Linguistics, 1, 95–125. https://doi.org/10.1075/bjl.1.05rij

Scontras, G., Badecker, W., & Fedorenko, E. (2017). Syntactic complexity effects in sentence production: A reply to Macdonald, Montag, and Gennari (2016). Cognitive Science, 41(8), 2280–2287. https://doi.org/10.1111/cogs.12495, PubMed: 28397342

Straka, M., Hajic, J., Straková, J., & Hajic, J., Jr. (2015). Parsing universal dependency treebanks using neural networks and search-based Oracle. In M. Dickinson, E. Hinrichs, A. Patejuk, & A. Przepiórkowski (Eds.), Proceedings of the International Workshop on Treebanks and Linguistic Theories (TLT14) (pp. 208–220). Institute of Computer Science of the Polish Academy of Sciences.

Szmrecsányi, B. M. (2004). On operationalizing syntactic complexity. In G. Purnelle, C. Fairon, & A. Dister (Eds.), Proceedings of the 7th International Conference on Textual Data Statistical Analysis (pp. 1032–1039). Presses Universitaires de Louvain, Louvain-la-Neuve.

Temperley, D. (2007). Minimization of dependency length in written English. Cognition, 105(2), 300–333. https://doi.org/10.1016/j.cognition.2006.09.011, PubMed: 17074312

Temperley, D., & Gildea, D. (2018). Minimizing syntactic dependency lengths: Typological/cognitive universal? Annual Review of Linguistics, 4, 67–80. https://doi.org/10.1146/annurev-linguistics-011817-045617

Vasishth, S., Suckow, K., Lewis, R. L., & Kern, S. (2010). Short-term forgetting in sentence comprehension: Crosslinguistic evidence from verb-final structures. Language and Cognitive Processes, 25(4), 533–567. https://doi.org/10.1080/01690960903310587

Wasow, T. (1997). Remarks on grammatical weight. Language Variation and Change, 9(1), 81–105. https://doi.org/10.1017/S0954394500001800

Wasow, T., & Arnold, J. (2003). Post-verbal constituent ordering in English. In G. Rohdenburg & B. Mondorf (Eds.), Determinants of grammatical variation in English (pp. 119–154). De Gruyter Mouton. https://doi.org/10.1515/9783110900019.119

Yadav, H., Husain, S., & Futrell, R. (2019). Are formal restrictions on crossing dependencies epiphenominal? In M. Candito, K. Evang, S. Oepen, & D. Seddah (Eds.), TLT 2019-18th International Workshop on Treebanks and Linguistic Theories (pp. 2–12). Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-7802

Yadav, H., Vaidya, A., & Husain, S. (2017). Understanding constraints on non-projectivity using novel measures. In S. Montemagni & J. Nivre (Eds.), Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017) (pp. 276–286). Linköping University Electronic Press.

Yadav, H., Vaidya, A., Shukla, V., & Husain, S. (2020). Word order typology interacts with linguistic complexity: A cross-linguistic corpus study. Cognitive Science, 44(4), Article e12822. https://doi.org/10.1111/cogs.12822, PubMed: 32223024

Yngve, V. H. (1960). A model and an hypothesis for language structure. Proceedings of the American Philosophical Society, 104(5), 444–466.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Addison-Wesley Press.
