How to Distinguish Languages and Dialects

Søren Wichmann
Leiden University Centre for
Linguistics, Kazan Federal University,
and Beijing Advanced Innovation Center
for Language Resources
wichmannsoeren@gmail.com

The terms “language” and “dialect” are ingrained, but linguists nevertheless tend to agree
that it is impossible to apply a non-arbitrary distinction such that two speech varieties can be
identified as either distinct languages or two dialects of one and the same language. A database
of lexical information for more than 7,500 speech varieties, however, unveils a strong tendency
for linguistic distances to be bimodally distributed. For a given language group the linguistic
distances pertaining to either cluster can be teased apart, identifying a mixture of normal
distributions within the data and then separating them by fitting curves and finding the point
where they cross. The thresholds identified are remarkably consistent across data sets, qualifying
their mean as a universal criterion for distinguishing between language and dialect pairs. The mean
of the thresholds identified translates into a temporal distance of around one to one-and-a-half
millennia (1,075–1,635 years).

1. Two Approaches

Most linguists would agree that it is difficult and often controversial to distinguish languages
from dialects. Many, however, would also agree that the notions of language and
dialect are still useful, even for the linguist who is aware of the problems of definition
that they entail (Agard 1984). The distinction is useful for many different purposes, such
as cataloguing languages, assigning ISO 639-3 codes, preparing maps of languages,
planning revitalization efforts, or for doing statistics on language distributions (e.g.,
calculating diversity or density indices) (Korjakov 2017). More importantly, perhaps: If
such a distinction is a feature of the way that language varieties are distributed rather
than just a distinction we impose in some arbitrary way, then this would be important
for the understanding of the sociology of language at large.

There are two main directions to go in order to establish a quantitative distinction.
One direction is to measure mutual intelligibility; another is to apply some consistent
and objective measure of differences between two variants with regard to phonology,
morphology, syntax, lexicon, or some combination.

Early applications of mutual intelligibility testing are detailed in Casad (1974), and
more recent work in this area includes Whaley, Grenoble, and Li (1999), Szeto (2000),
Gooskens and Schneider (2016), and Gooskens et al. (2018).

Submission received: 18 February 2019; revised version received: 11 July 2019; accepted for publication:
15 September 2019.

https://doi.org/10.1162/COLI_a_00366

© 2019 Association for Computational Linguistics
Published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
(CC BY-NC-ND 4.0) license


Computational Linguistics

Volume 45, Number 4

Glottolog (Hammarström, Forkel, and Haspelmath 2017) adopts the criterion of
mutual intelligibility, positing that a language variant that is not mutually intelligible
with any other language variant should be counted as a separate language.1 By this
criterion, Glottolog 4.0 contains 7,592 spoken L1 (mother-tongue) languages, excluding
sign languages. There are, however, two problems with this criterion. The more serious
problem is that intelligibility is often not symmetrical. Thus language variant A can
be more intelligible to speakers of language variant B than language variant B is to
speakers of language variant A. Such a situation may arise when A is the larger,
more influential language, causing speakers of B to have more exposure to A than
the other way around. However, the amount of exposure that speakers have to other
language variants is entirely determined by historical and sociological factors, and this
or other extraneous factors2 should not affect a linguistically based classification. In
some situations the factor of exposure can be circumvented, narrowing in on “inherent
intelligibility” (Gooskens and van Heuven 2019), but this is not an easy task. The more
practical problem with the criterion of mutual intelligibility is that measurements are
usually simply not available.

The second approach was referred to by Voegelin and Harris (1951) as “count
sameness.” While recognizing that “sameness” can be measured for different areas of
linguistic structure, they place emphasis on the then recent approach of Swadesh (1950),
who had presented counts of cognates for different varieties of Salishan languages—an
approach that represented the birth of glottochronology and lexicostatistics.

In this paper I will use a phonological distance derived from lexical data, and I will
not discuss measures from other types of linguistic data; the fact is that we presently
only have sufficient coverage for the lexical domain. I will also leave aside the issue of mutual
intelligibility measures, but it is worth mentioning that such measures actually have
been shown to correlate well with counts of cognates on standardized word lists (Biggs
1957; Ladefoged, Glick, and Criper 1972; Bender and Cooper 1971).

2. Using the Normalized Levenshtein Distance (LDN)

The ASJP database (Wichmann, Holman, and Brown 2018) contains word lists for a
40-item subset of the Swadesh 100-item list from 7,655 doculects (language varieties
as defined by the source in which they are documented). Stating how many languages
this corresponds to would beg the question that interests us here, but if a unique ISO
639-3 code represents a unique language then the database can be said to represent
around two-thirds of the world’s languages. Only word lists are used that are 70%
complete (i.e., having at least 28 words out of the 40) and represent languages recorded
within the last few centuries. Creoles and pidgins are excluded. This leaves a sample of
5,800 doculects. Although the word lists are short, it has been shown by Jäger (2015),
who also used the 70% completeness criterion for his selection of word lists from the
ASJP database, that reliability as measured by Cronbach’s alpha, following Heeringa
et al. (2006), is sufficient for phylogenetic purposes. The word lists are transcribed in a
simplified system called ASJPcode. The pros and cons of this system are discussed in
Brown, Wichmann, and Holman (2013).

A linguistic distance measure that can be applied to the ASJP data is a version of
the Levenshtein (or edit) distance, averaged over word pairs. The Levenshtein distance

1 See http://glottolog.org/glottolog/glottologinformation (accessed 2019-07-01).
2 Even a factor such as differences in ethnic background may affect perceived intelligibility, despite two
people speaking the same language (Rubin 1992).


is the number of substitutions, insertions, or deletions required to transform one word
into another. Given two word lists, one can measure the Levenshtein distance for each
word pair divided by the length of the longer of the two words. The mean of these
individual word distances may be called the LDN (Levenshtein distance normalized).
A further modification is to divide the LDN by the average LDNs for words in the two
lists that do not refer to the same concepts. This has been called LDND (Levenshtein
distance normalized divided) (Wichmann et al. 2010). The second modification is intended
primarily for comparisons of languages that are regarded as being unrelated,
since it controls for accidental similarities due to similar sound inventories. For the
present purposes of using ASJP for distinguishing languages and dialects, we are only
interested in comparing genealogically related language varieties. Thus, we can resort
to the simpler and faster LDN measure. However, LDND measures will also be cited
because they translate into years of separation of two related speech varieties (Holman
et al. 2011). Both the LDN and the LDND are implemented in Wichmann (2019), which
is used here for the following experiments.
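
The two measures just defined can be illustrated with a minimal Python sketch. This is not the R
implementation of Wichmann (2019); the function names and the two four-word lists are invented
for illustration, and the sketch ignores details of real ASJP data such as synonyms and the
special symbols of ASJPcode.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, or deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldn(list_a: dict, list_b: dict) -> float:
    """Mean normalized distance over the concepts attested in both word lists."""
    shared = sorted(set(list_a) & set(list_b))
    return sum(normalized(list_a[c], list_b[c]) for c in shared) / len(shared)


def ldnd(list_a: dict, list_b: dict) -> float:
    """LDN divided by the mean normalized distance between words for different concepts."""
    shared = sorted(set(list_a) & set(list_b))
    mismatched = [normalized(list_a[c], list_b[d])
                  for c, d in product(shared, repeat=2) if c != d]
    return ldn(list_a, list_b) / (sum(mismatched) / len(mismatched))


# Invented toy word lists (hypothetical forms, not taken from ASJP).
variety_a = {"I": "ik", "you": "du", "water": "vater", "stone": "sten"}
variety_b = {"I": "ix", "you": "du", "water": "vasa", "stone": "stain"}

print(round(ldn(variety_a, variety_b), 4))
print(round(ldnd(variety_a, variety_b), 4))
```

Only concepts attested in both lists enter the averages, which mirrors the fact that the word
lists used here may be incomplete (at least 28 of the 40 items).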

3. Distinguishing Languages and Dialects by LDN

Before trying to find a value of LDN that might serve as a criterion for distinguishing
languages and dialects, it is of interest to look at the distribution of LDNs for putative
language vs. dialect pairs using the ISO 639-3 codes of Ethnologue (Simons and
Fennig 2017). Figure 1 is a comparison of two boxplots, the one to the left showing
the distribution of LDNs for doculects in ASJP belonging to the same ISO 639-3 code
language and the one to the right showing the distribution of LDNs for doculects in
ASJP not belonging to the same ISO 639-3 code language but belonging to the same
genus (group of relatively closely related languages using the scheme of WALS [Dryer
and Haspelmath 2013]).

Figure 1
Boxplots of LDNs for ASJP doculects belonging to same vs. different ISO 639-3 codes (but same
genera).


As expected, Figure 1 shows that same-ISO-code LDNs tend to be smaller than
different-ISO-code LDNs. But we also see that there are many outliers where same-ISO-code
LDNs are extremely large and different-ISO-code LDNs extremely small. No
doubt, some of these outliers are due to ISO 639-3 codes that were misassigned either in
the original sources used by ASJP transcribers or by these transcribers themselves. Even
when the outliers are ignored, however, there is an overlap.

To be more consistent than Ethnologue, one could apply a certain cut-off value
of LDN to distinguish languages and dialects. The obvious question then arises: Is
there some way in which a non-arbitrary cut-off point can be found? In Wichmann
(2010) it was suggested that language distances within families may have a multimodal
distribution where distances typically belonging to dialects form a roughly normal
distribution different from the also roughly normal distribution of distances between
two different languages. In that paper the distances used were LDNDs and the example
used for illustration came from the Uto-Aztecan family. Here I follow up on the idea by
plotting LDNs for 15 language genera for which 10% or more of ASJP doculect pairs
have the same ISO 639-3 code. Figure 2 shows a histogram of LDNs for each of the 15
genera. Overlaid on each histogram is a density curve (in black) and two curves (in red
and green) fitting the data to a mixture of two normal distributions. These graphs were
produced using the R package mixtools (Benaglia et al. 2009), specifically, the plotting
method associated with the output of the normalmixEM() function. According to the documentation,
this function implements “the standard EM algorithm for normal mixtures
that maximizes the conditional expected complete-data log-likelihood at each M-step of
the algorithm.” The curves show that a bimodal distribution can either be manifested
very distinctly (e.g., curves for Eleman Proper, Huitoto, Iranian, Mayan, Sama-Bajaw,
South Sulawesi) or merely show up as a skew in the left tail of the distribution (e.g.,
Atayalic, Japanese), and some other curves are more difficult to interpret.
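
To make the fitting step concrete, the following is a minimal Python sketch of an EM algorithm for
a mixture of two univariate normal distributions. It is not the mixtools code used for the
analyses reported here: the function name, the quartile-based starting values, the fixed number of
iterations, and the synthetic "distances" are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_normals(data, iterations=200):
    """EM for a two-component normal mixture; returns (weights, means, sds) ordered by mean."""
    xs = sorted(data)
    n = len(xs)
    # Assumed starting values: lower and upper quartiles as means, overall spread as sd.
    means = [xs[n // 4], xs[3 * n // 4]]
    grand_mean = sum(xs) / n
    overall_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: posterior probability that each point belongs to component 0.
        resp0 = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp0.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean, and standard deviation of each component.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            mass = sum(resp)
            weights[k] = mass / n
            means[k] = sum(r * x for r, x in zip(resp, xs)) / mass
            variance = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / mass
            sds[k] = max(math.sqrt(variance), 1e-6)  # guard against a collapsing component

    order = sorted((0, 1), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [sds[k] for k in order])


# Synthetic stand-in for one genus: 300 "dialect-like" and 700 "language-like" distances.
rng = random.Random(1)
distances = ([rng.gauss(0.25, 0.05) for _ in range(300)]
             + [rng.gauss(0.70, 0.08) for _ in range(700)])
weights, means, sds = fit_two_normals(distances)
print([round(m, 3) for m in means], [round(w, 3) for w in weights])
```

The sketch uses deterministic starting values so that a run is repeatable; a production fit would
normally also monitor the log-likelihood for convergence and try several starting points.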

Although Figure 2 visually suggests that the vast majority of cases can be interpreted
as a mixture of two normal distributions, we would like to verify this more exactly.
Using the boot.comp() function of the mixtools package, this is achieved by producing
100 bootstrap realizations of the likelihood ratio statistic for testing the null hypothesis
of a k-component fit versus the alternative hypothesis of a (k + 1)-component fit to a
model of mixed normal distributions, applying the p < 0.05 threshold. The existence of
from 1 to 4 components was tested in this way. The column carrying the header ‘k’ in
Table 1 contains the results, which show a strong tendency for the preferred number of
components to be two (12 out of 15 cases), or, in a few cases (3 out of 15), three components.

In spite of the three cases where k = 3 yielded the best fit we can treat all 15 cases in a
uniform way toward the objective of finding the LDN that separates the members of two
distributions by looking only at the distances within the two distributions containing the
lower values. Again the normalmixEM() function of Benaglia et al. (2009) is used. This outputs
the parameters of the normal distributions, which allow one to identify the LDN value where
the two normal distributions cross. Table 1 shows these LDN cut-offs rounded off to four
decimals. It also shows the corresponding LDND value. LDND values corresponding to LDN were
found through a linear regression using all 639,727 doculect pair distances analyzed in the
present study. LDN and LDND are highly correlated (r = .985) and the formula for deriving
LDND from LDN has the slope 1.00158 and intercept 0.08459. LDND values will become useful for
interpreting the cut-off in terms of time depths (cf. next section).

The LDN cut-offs in Table 1 are relatively narrowly distributed. Calculating a 95% confidence
interval around the mean of 0.5138 produces ± 0.0707. The rounded-off value of LDN = 0.51 is
proposed here as a universal cut-off that may be used to distinguish pairs of dialects from
pairs of languages.
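
The three calculations in this passage (locating the crossing point of two fitted normal curves,
converting LDN to LDND with the regression coefficients, and placing a confidence interval around
the mean cut-off) can be sketched in Python as follows. The function names are hypothetical. Two
points are assumptions rather than statements of the procedure used here: whether the crossing
point was computed with or without the mixing weights is not specified, so the weights are left as
arguments, and the interval is computed with a normal approximation (z = 1.96), which appears to
be consistent with the reported ± 0.0707. The cut-off values are those listed in Table 1.

```python
import math
import statistics


def crossing_point(w1, m1, s1, w2, m2, s2):
    """x between m1 and m2 where w1*N(m1, s1) and w2*N(m2, s2) have equal density.

    Equating the log densities gives a*x**2 + b*x + c = 0. Returns None if no
    crossing lies between the two means.
    """
    a = 1 / (2 * s2 ** 2) - 1 / (2 * s1 ** 2)
    b = m1 / s1 ** 2 - m2 / s2 ** 2
    c = (m2 ** 2 / (2 * s2 ** 2) - m1 ** 2 / (2 * s1 ** 2)
         + math.log(w1 / s1) - math.log(w2 / s2))
    if abs(a) < 1e-12:                      # equal spreads: the equation is linear
        if b == 0:
            return None
        roots = [-c / b]
    else:
        disc = b ** 2 - 4 * a * c
        if disc < 0:
            return None
        roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
    low, high = sorted((m1, m2))
    between = [r for r in roots if low < r < high]
    return between[0] if between else None


def ldn_to_ldnd(ldn_value, slope=1.00158, intercept=0.08459):
    """Regression line reported in the text for deriving LDND from LDN."""
    return slope * ldn_value + intercept


def mean_with_ci(values, z=1.96):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# LDN cut-offs for the 15 genera as listed in Table 1.
TABLE1_LDN_CUTOFFS = [0.5770, 0.5551, 0.5195, 0.6108, 0.5405, 0.4534, 0.3882, 0.6845,
                      0.5148, 0.6658, 0.2686, 0.5627, 0.3511, 0.2870, 0.7275]

mean, half_width = mean_with_ci(TABLE1_LDN_CUTOFFS)
print(round(mean, 4), round(half_width, 4), round(ldn_to_ldnd(mean), 4))
# Illustrative parameters only (not fitted values from the paper):
print(crossing_point(0.3, 0.25, 0.05, 0.7, 0.70, 0.08))
```

Setting both weights to the same value compares the unweighted curves; passing the estimated
mixing proportions compares the curves as they would appear when scaled by those proportions.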
This distinguishing criterion is easily applied and it was arrived at by an entirely objective
procedure that can be both replicated and revised on an evolving data set.

Figure 2
Density plots of LDNs for 15 genera for which 10% or more of doculect pairs’ members pertain to
one and the same ISO 639-3 code.
[Fifteen panels, each plotting density against LDN: Atayalic, Bai, Eleman Proper, Huitoto,
Iranian, Japanese, Lampungic, Marienberg, Mayan, Munda, North Halmaheran, Northern Naga,
Sama-Bajaw, South Sulawesi, Western Fly.]

Table 1
Language groups, threshold for LDN assuming two normal distributions, corresponding LDND value,
the number of components (k) as found by a bootstrap test, percent of pairs of language varieties
belonging to same ISO 639-3 code, and the number of doculects in the data for each group (N).

Group               LDN cut-off   LDND     k   % same ISO-code     N
Atayalic            0.5770        0.6625   2   28.57              14
Bai                 0.5551        0.6406   2   31.48              28
Eleman Proper       0.5195        0.6049   2   25.74              17
Huitoto             0.6108        0.6964   2   13.33              10
Iranian             0.5405        0.6259   3   12.25              73
Japanese            0.4534        0.5387   2   12.90              31
Lampungic           0.3882        0.4734   2   40.58              24
Marienberg          0.6845        0.7702   2   24.18              14
Mayan               0.5148        0.6002   3   13.98             106
Munda               0.6658        0.7514   2   12.00              25
North Halmaheran    0.2686        0.3536   3   24.17              16
Northern Naga       0.5627        0.6482   2   12.73              11
Sama-Bajaw          0.3511        0.4362   2   45.33              25
South Sulawesi      0.2870        0.3720   2   10.80              39
Western Fly         0.7275        0.8132   2   52.94              18

The sample of genera listed in Table 1 was chosen so as to ensure that each contains a good
number of close varieties, applying the selection criterion that at least 10% of the doculect
pairs should represent the same ISO 639-3 code. As the next section shows, this does not imply
that the ISO standard comes to determine the results; it is just a way of filtering away genera
that would be unsuitable for the present investigation. The selection subtly introduces another
potential bias, however: For all of the genera some data are included that come from dialect
surveys. It may well be that the researcher carrying out the survey, consciously or not, aimed
at a certain resolution, and this resolution would have a major impact on the parameters of the
normal distribution identified here as belonging to dialects. To control for such a potential
bias a resampling experiment was carried out where distances for 100 same-ISO-code pairs and 900
different-ISO-code pairs were sampled randomly (with replacement) from the total pool of
distances pertaining to all genera. This was done 100 times. Each of these 100 “resampled genera”
was subjected to the same analysis as the original genera, finding a cut-off between the two
normal distributions having the smallest means. The result was a mean LDN threshold of
0.5686 ± 0.0072, that is, a range of 0.5614–0.5758.
This is not inconsistent with the range 0.4431–0.5845 found for the unscrambled data, but is
narrower and lies toward its upper end. I interpret this result as suggesting that a future,
more extensive sample of genera might lead to a somewhat higher and more narrowly defined
threshold, perhaps around 0.57. In the interest of transparency and replicability, I still
propose the directly measured threshold of 0.51, but with the qualification that this is a
conservative estimate.

The question remains whether the resampling experiment really does away with any sampling
bias. Do the findings perhaps reveal more about shared perceptions among linguists about where
to draw borders between languages and dialects when sampling data than about real distributions?
This point of critique is somewhat speculative and therefore hard to counter, but it may perhaps
be addressed in future research through computer simulations free of sampling biases, using a
framework such as that of Wichmann and Holman (2017).

4. Discussion

Going back to the Ethnologue classification (cf. Figure 1), we may wonder whether this
classification tends to over- or underdifferentiate, assuming that LDN = 0.51 is a sensible
cut-off point. The 5,800 doculects in our sample lead to 635,419 pairs whose members both
belong to the same WALS genus and both of which carry ISO-codes. Out of all these pairs, 0.1%
have the same ISO-code and LDN > 0.51, while there are 3.1% pairs


with different ISO-codes and LDN < 0.51. This means that if we pick a random pair of doculects
from a given genus, the chance that Ethnologue will overdifferentiate, treating a dialect pair
as a language pair, is more than 30 times greater than the chance that it will underdifferentiate,
treating a language pair as a dialect pair. In short, Ethnologue tends to overdifferentiate, so
the number of languages counted in this catalogue would be too high.

To make the results more palpable, some examples of pairs of speech varieties whose status as
dialects or languages is likely to be contested are supplied in Table 2. The examples are mere
illustrations that are meant to help the reader interpret different LDN values, including the
0.51 cut-off.

Besides a possible solution to the perennial problem of distinguishing languages and dialects,
this paper has yielded a result of potentially deeper importance for the understanding of
language dynamics. We found (cf. Figure 2) that, given a sufficiently balanced sample of data
from very closely related speech varieties and more distantly related ones, it is normally
possible to discern a mixture of different distributions yielding peaks corresponding to
characteristic means of what can be interpreted as, respectively, dialects and languages. The
valleys between these peaks are a highly interesting phenomenon: They would seem to suggest that
values around our LDN = 0.51 cut-off are atypical. This corresponds to a situation where we have
a chain of dialects of one language and then a relatively abrupt transition to a neighboring
chain of dialects of some other language. Such a situation characterizes national languages like
German and Dutch, for instance, but, as seen in Figure 2, we also find it for minority languages
around the world.

In Holman et al. (2011), it was shown that the twice-modified Levenshtein distance LDND
translates into a time separation between the language varieties compared.
The LDND values in Table 1 have a mean of 0.5992 ± 0.0709. Using the formula of Holman et al.
(2011) this is equivalent to a range of 1,075–1,635 years. Thus, it takes around one to
one-and-a-half millennia for a speech community to diverge into different language groups.

Table 2
LDN values for pairs of speech varieties prone to turn up in discussions about how to distinguish
languages and dialects. Language names are given as in ASJP. From the top down to and including
East and West Greenlandic, the pairs constitute dialects of one and the same language according
to the LDN = 0.51 cut-off proposed here. The rest of the pairs constitute different languages.

Speech variety A       Speech variety B               LDN
Malay                  Indonesian                     0.1199
Croatian               Bosnian                        0.1324
Quechua Chachapoyas    Quechua Huaylas Ancash         0.3055
Hindi                  Urdu                           0.4281
Classical Nahuatl      Pipil                          0.4336
Standard German        Bernese German (Switzerland)   0.4638
Russian                Belarusian                     0.4647
Danish                 Swedish                        0.4921
East Greenlandic       West Greenlandic               0.5036
Navajo                 Jicarilla Apache               0.5708
Cairo Arabic           Moroccan Arabic                0.5814
Dongshan Chinese       Fuzhou Chinese                 0.6013
Catalan                Spanish                        0.6589
Japanese               Miyako (Ryukuan)               0.6680

Why can we find those valleys in the distributions in Figure 2? Or, put differently, what is it
about the way that speakers interact that allows us to distinguish languages and dialects? A
possible explanation is that there is a threshold of mutual intelligibility where language
varieties will influence one another if they are below the threshold but will cease to influence
one another if they are above it.
If mutual intelligibility between variety A and B is impeded completely, speakers may take
recourse to just the more prestigious of the two, if not some third language, leaving A and B to
drift apart more rapidly than would be the case if both A and B were used for communication
between the two groups. It awaits future studies to corroborate this idea through modeling and,
ideally, through systematic sampling of both lexical data and data on intelligibility from the
cells of a large but also fine geographically defined grid or, less ideally, through an analysis
of the literature on mutual intelligibility.

5. Conclusion

In this paper the question of how to distinguish languages and dialects was addressed by
studying the distribution of lexical distances within groups of uncontroversially related
languages (the genera of Dryer and Haspelmath [2013]). Following up on an idea tentatively
suggested in Wichmann (2010), it was verified that distances among speech varieties represent
mixed distributions, including a cluster that may be said to correspond to dialects and another
cluster corresponding to languages. Applying an expectation-maximization algorithm to tease apart
the mixture of normal distributions across a sample of 15 language groups, the average cut-off
point between the two distributions was found to be LDN = 0.51, where LDN is the normalized
Levenshtein distance across word pairs in the ASJP 40-item word lists of Wichmann, Holman, and
Brown (2018). The corresponding temporal distance lies around 1,355 years, within the interval
1,075–1,635 years. Thus, we now have a principled way of distinguishing languages and dialects.
A tantalizing question for future research is why there seems to be a real distinction, not just
a theoretical or arbitrary one. Some ways to approach this question were suggested.

Acknowledgments

Thanks go to Eric Holman, Harald Hammarström, Qibin Ran, and four anonymous reviewers for
stimulating discussion.
The research was carried out under the auspices of the project “The Dictionary/Grammar Reading
Machine: Computational Tools for Accessing the World’s Linguistic Heritage” (NWO proj. no.
335-54-102) within the European JPI Cultural Heritage and Global Change programme. It was
additionally funded by a subsidy from the Russian government to support the Programme of
Competitive Development of Kazan Federal University and a grant (KYR17018) from Beijing Language
Innovation Center in support of a sub-topic directed by Qibin Ran.

References

Agard, Frederick. 1984. A Course in Romance Linguistics. Georgetown University Press.

Benaglia, Tatiana, Didier Chauveau, David R. Hunter, and Derek S. Young. 2009. mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1–29.

Bender, Marvin L. and Robert L. Cooper. 1971. Mutual intelligibility within Sidamo. Lingua, 27:32–52.

Biggs, Bruce. 1957. Testing mutual intelligibility among Yuman languages. International Journal of American Linguistics, 23(2):57–62.

Brown, Cecil H., Søren Wichmann, and Eric W. Holman. 2013. Sound correspondences in the world’s languages. Language, 89(1):4–29.

Casad, Eugene H. 1974. Dialect Intelligibility Testing. Summer Institute of Linguistics of the University of Oklahoma, Norman.

Dryer, Matthew S. and Martin Haspelmath, editors. 2013. The World Atlas of Language Structures Online. http://wals.info. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Gooskens, Charlotte and Vincent J. van Heuven. 2019. How well can intelligibility of closely related languages in Europe be predicted by linguistic and non-linguistic variables? Linguistic Approaches to Bilingualism, Early online publication, https://doi.org/10.1075/lab.17084.goo.

Gooskens, Charlotte, Vincent J. van Heuven, Jelena Golubovic, Anja Schüppert, Femke Swarte, and Stefanie Voigt. 2018. Mutual intelligibility between closely related languages in Europe. International Journal of Multilingualism, 15(2):169–193.

Gooskens, Charlotte and Cindy Schneider. 2016. Testing mutual intelligibility between closely related languages in an oral society. Language Documentation and Conservation, 10:278–305.

Hammarström, Harald, Robert Forkel, and Martin Haspelmath. 2017. Glottolog 3.0. http://glottolog.org. Max Planck Institute for the Science of Human History, Jena.

Heeringa, Wilbert, Peter Kleiweg, Charlotte Gooskens, and John Nerbonne. 2006. Evaluation of string distance algorithms for dialectology. In Proceedings of the Workshop on Linguistic Distances, pages 51–62, Sydney.

Holman, Eric W., Cecil H. Brown, Søren Wichmann, André Müller, Viveka Velupillai, Harald Hammarström, Sebastian Sauppe, Hagen Jung, Dik Bakker, Pamela Brown, Oleg Belyaev, Matthias Urban, Robert Mailhammer, Johann-Mattis List, and Dmitry Egorov. 2011. Automated dating of the world’s language families based on lexical similarity. Current Anthropology, 52(6):841–875.

Jäger, Gerhard. 2015. Support for linguistic macrofamilies from weighted sequence alignment. Proceedings of the National Academy of Sciences of the U.S.A., 112(41):12752–12757.

Korjakov, Yurij Borisovich. 2017. Problema “jazyk ili dialekt” i popytka leksikostatisticheskogo podxoda. Voprosy Jazykoznanija, 6:79–101.

Ladefoged, Peter, Ruth Glick, and Clive Criper. 1972. Language in Uganda. Oxford University Press.

Rubin, Donald L. 1992. Nonlanguage factors affecting undergraduates’ judgments of nonnative English-speaking teaching assistants. Research in Higher Education, 33(4):511–531.

Simons, Gary F. and Charles D. Fennig. 2017. Ethnologue: Languages of the World, Twentieth Edition. SIL International, Dallas, TX.

Swadesh, Morris. 1950. Salish internal relationships. International Journal of American Linguistics, 16(4):157–164.

Szeto, Cecilia. 2000. Testing intelligibility among Sinitic dialects. In Proceedings of ALS2K, the 2000 Conference of the Australian Linguistic Society, Melbourne.

Voegelin, Charles F. and Zellig S. Harris. 1951. Methods for determining intelligibility among dialects of natural languages. Proceedings of the American Philosophical Society, 95(3):322–329.

Whaley, Lindsay J., Lenore A. Grenoble, and Fengxiang Li. 1999. Revisiting Tungusic. Language, 75(2):286–321.

Wichmann, Søren. 2010. Internal language classification. In Luraghi, Silvia and Vit Bubenik, editors, The Continuum Companion to Historical Linguistics. Continuum Books, London/New York, pages 70–86.

Wichmann, Søren. 2019. Interactive R program for ASJP version 1. https://github.com/Sokiwi/InteractiveASJP01.

Wichmann, Søren and Eric W. Holman. 2017. New evidence from linguistic phylogenetics supports phyletic gradualism. Systematic Biology, 66(4):604–610.

Wichmann, Søren, Eric W. Holman, Dik Bakker, and Cecil H. Brown. 2010. Evaluating linguistic distance measures. Physica A, 389:3632–3639.

Wichmann, Søren, Eric W. Holman, and Cecil H. Brown, editors. 2018. The ASJP Database (version 18). http://asjp.clld.org/.