Linguistic Parameters of Spontaneous Speech
for Identifying Mild Cognitive Impairment
and Alzheimer Disease

Veronika Vincze
MTA-SZTE Research Group on
Artificial Intelligence
vinczev@inf.u-szeged.hu

Martina Katalin Szabó
MTA TK Computational Social Science –
Research Center for Educational and
Network Studies (CSS-RECENS)
and University of Szeged
Institute of Informatics
martina@inf.u-szeged.hu

Ildikó Hoffmann
Research Centre for Linguistics
Eötvös Lorand Research Network
and University of Szeged
Department of Hungarian
Linguistics, Szeged
hoffmannildi@gmail.com

László Tóth
University of Szeged
Institute of Informatics
tothl@inf.u-szeged.hu

Magdolna Pákáski
University of Szeged
Department of Psychiatry
babikne.pakaski.magdolna@med.u-szeged.hu

Submission received: 9 July 2020; revised version received: 14 September 2021; accepted for publication:
5 December 2021.

https://doi.org/10.1162/COLI 00428

Computational Linguistics

Volume 48, Number 1

János Kálmán
University of Szeged
Department of Psychiatry
kalman.janos@med.u-szeged.hu

Gábor Gosztolya
MTA-SZTE Research Group on
Artificial Intelligence
ggabor@inf.u-szeged.hu

In this article, we seek to automatically identify Hungarian patients suffering from mild cognitive
impairment (MCI) or mild Alzheimer disease (mAD) based on their speech transcripts,
focusing only on linguistic features. In addition to the features examined in our earlier study, we
introduce syntactic, semantic, and pragmatic features of spontaneous speech that might affect
the detection of dementia. In order to ascertain the most useful features for distinguishing
healthy controls, MCI patients, and mAD patients, we carry out a statistical analysis of the
data and investigate the significance level of the extracted features among various speaker group
pairs and for various speaking tasks. In the second part of the article, we use this rich feature
set as a basis for an effective discrimination among the three speaker groups. In our machine
learning experiments, we analyze the efficacy of each feature group separately. Our model that
uses all the features achieves competitive scores, either with or without demographic information
(3-class accuracy values: 68%–70%, 2-class accuracy values: 77.3%–80%). We also analyze how
different data recording scenarios affect linguistic features and how they can be productively used
when distinguishing MCI patients from healthy controls.

1. Introduction

Alzheimer disease (AD) is a neurodegenerative disorder that develops for years before
clinical manifestation, while mild cognitive impairment (MCI) is usually viewed as
a prodromal stage of AD (Galvin and Sadowsky 2012). Symptoms such as language
dysfunctions may occur as early as nine years before the actual diagnosis (APA 2000). Therefore,
the language use of the patient may suggest MCI well before the clinical diagnosis of
dementia. For both types of neurodegenerative disorders, an early diagnosis is crucial
in order to allow timely treatment to decelerate progression (Nelson and Tabet 2015).
However, according to Boise et al., MCI is never recognized in many MCI patients (up to 50%)
(Boise, Neal, and Kaye 2004). One reason for this might be that in the early
stages of the disease, cognitive impairment is not easy for experts to detect.

Tests that are most sensitive to cognitive and linguistic changes occurring in
early AD and other types of dementia have been intensively studied (Chapman et al.
2002). Several screening tests aim at the early detection of dementia, but they are either
too time-consuming or cannot diagnose preclinical stages. For example, diagnostic tools
such as volumetric MRI (Scheltens et al. 2002; Zimny et al. 2011; Yin et al. 2013) and
diffusion tensor imaging (Nakata et al. 2009; Stricker et al. 2009; Matsuda, Asada, and
Tokumaru 2017) may be effective, but they are time-consuming and costly techniques
for early screening. Most dementia screening tests (Mini-Mental State Examination [MMSE],
Clock Drawing Test [CDT], Alzheimer's Disease Assessment Scale-cognitive subscale


Vincze et al.

Linguistic Parameters of Spontaneous Speech for Identifying MCI and AD

[ADAS-Cog]) are not able to accurately recognize MCI (Folstein, Folstein, and McHugh
1975; Rosen, Mohs, and Davis 1984; Janka et al. 1988; Kálmán, Maglóczky, and Janka
1995; Patocskai et al. 2014). Tests of linguistic memory prove more effective in detecting
MCI, but they tend to yield a relatively high number of false-positive diagnoses (Roark
et al. 2011). Therefore, cheap but still effective methods for identifying dementia as early as
possible are urgently required.

Conversation analysis has proven to be a promising method for detecting memory
complaints (Mirheidari et al. 2017, 2016). MCI is known to affect the speech of the
patient in three main respects. First, verbal fluency declines, which results in longer
hesitations and a lower speech rate (Roark et al. 2011; Pistono et al. 2019). Second, the
lexical frequency of words and the distribution of parts of speech may
also change significantly, as the patient has difficulties finding lexical items (Croot
et al. 2000). Third, the emotional responsiveness of the patient has also been reported to
change frequently (López-de-Ipiña et al. 2015).

In connection with the above-mentioned features, researchers have recently experimented
with detecting different types of dementia using Automatic Speech Recognition
(ASR) tools in several studies. To name just a few, ASR tools were utilized to detect MCI
(Lehr et al. 2012) and AD (Baldas et al. 2010; López-de-Ipiña et al. 2013; Satt et al. 2014;
López-de-Ipiña et al. 2015; Al-Hameed et al. 2017; König et al. 2015; Weiner, Herff, and
Schultz 2016). Jarrold et al. (2014) relied on speech rate and the mean and standard deviation of
vowels and consonants in spontaneous speech samples. Al-Hameed
et al. (2017) sought to predict a common clinical examination score for dementia using
acoustic information extracted from people describing a picture. They also sought to
develop a diagnostic tool able to distinguish sufferers with AD from those
with MCI and from healthy controls. Their classification model is capable of predicting
dementia with an average cross-visit accuracy ranging from 89.2% to 92.4% when
performing pairwise classification among the AD, MCI, and healthy control classes.
Al-Hameed et al. (2019) examined 15 patients with progressive neurodegenerative
disorders and 15 with functional memory disorder and, based on 51 acoustic features
extracted from the recordings, identified the most discriminating features. These
features were then used to train five different machine learning classifiers to differentiate
between the two classes, which yielded a mean classification accuracy of 96.2%.

Types of speech production tasks have also been investigated from the viewpoint of
predicting lexical and semantic impairment. Pistono et al. (2019) compared pause
duration and frequency in AD participants and healthy controls using a picture-based
narrative and a memory-based narrative. The results indicated that participants
with AD produced more pauses only in the picture-based narrative.

As for natural language processing (NLP) methods, the lexical analysis of spontaneous
speech may also suggest different types of dementia (Holmes and Singh 1996;
Bucks et al. 2000; Lunsford and Heeman 2015), and the results of these analyses can
be exploited in the automatic detection of patients suffering from dementia (Thomas
et al. 2005; Jarrold et al. 2014; Shibata, Wakamiya, and Aramaki 2016; König et al. 2015).
Changes in people's writing style may also indicate dementia (Garrard et al. 2005;
Hirst and Wei Feng 2012; Le et al. 2011). Fraser, Fors, and Kokkinakis (2018) were able to
distinguish MCI speakers from healthy older adults with accuracy scores of up to 63% (English)
and 72% (Swedish) on the basis of information content alone. The results of these
studies are very encouraging. For example, Fraser, Fors, and
Kokkinakis (2018) established that subtle differences in language can be detected in
narrative speech, even at the very early stages of cognitive decline, when scores on
screening tools such as the MMSE are still in the "normal" range.


Besides English, there are studies that seek to identify dementia in native speakers
of, for example, German (Weiner, Herff, and Schultz 2016), Portuguese (dos Santos et al.
2017), Japanese (Shibata, Wakamiya, and Aramaki 2016), and Swedish (Kokkinakis et al.
2017; Fraser et al. 2017). Fraser, Fors, and Kokkinakis (2018) analyzed the information
content of narrative speech samples from individuals with MCI, in both English and
Swedish, using a combination of supervised and unsupervised learning techniques.
They found that the multilingual approach leads to significantly better classification
accuracy scores than training on the target language alone. As for the automatic
detection of MCI in Hungarian individuals, Vincze et al. (2016) sought to identify MCI
patients based on linguistic features extracted from the transcripts of spontaneous speech
recordings. As regards speech features, Tóth et al. (2015) and Tóth et al. (2018)
experimented with speech recognition techniques. To extend the previous studies concerning
the Hungarian language, Gosztolya et al. (2019) involved both mild AD (mAD) and
MCI patients, and used speech-based and linguistic features to distinguish the
two classes from healthy controls.

In this article, we again seek to automatically identify Hungarian patients suffering
from MCI or mAD based on their speech transcripts. In contrast with previous work
(e.g., Tóth et al. 2018), here we focus only on linguistic features and ignore those derived
from ASR. Our system applies machine learning techniques and is based on a rich
feature set that includes parameters of linguistic characteristics of spontaneous speech,
along with features that exploit morphological and syntactic parsing and features
derived from semantic and pragmatic phenomena. In addition to the features used in
our earlier studies (Vincze et al. 2016; Gosztolya et al. 2019), we have included new
morphological, syntactic, semantic, and pragmatic features that might be characteristic
of spontaneous speech. We also investigate how the different data recording
scenarios affect linguistic features. This also leads us to propose a methodology to
identify dementia on the basis of linguistic parameters of spontaneous speech. Thus,
the main contributions of the article are the following:

• We define a rich feature set of linguistic parameters for detecting different
  types of dementia and propose some novel features for the task;
• We carry out a detailed statistical analysis of (novel) linguistic parameters
  that may distinguish healthy controls (HC) from MCI and mAD patients
  in three different tasks, namely, immediate recall, delayed recall, and
  describing what happened on the previous day;
• We perform machine learning experiments with the above-mentioned
  feature set for detecting different types of dementia;
• We analyze the efficacy of the above-mentioned three different tasks based
  on the results of a data analysis of the transcripts and on the results of the
  experiments.

The article is structured as follows. In Section 2, we present the basic attributes and
statistical data of the Hungarian MCI-mAD database. Then, in Section 3, we discuss the
methodology of the research, along with the rich feature set applied in the processing
of the speech transcripts, and investigate the significance level of these values among
various speaker group pairs (HC vs. MCI, HC vs. mAD, and MCI vs. mAD) for the
different speaker tasks. In Section 4, we describe our machine learning experiments using
the same feature set. Afterwards, in Section 5, we systematically analyze the datasets
and show that these attributes also serve as a basis for an effective discrimination
among the three speaker groups. We also draw some conclusions on the usefulness
of each speaker task. Finally, we summarize the main results of our study in Section 6.

2. The Hungarian MCI-mAD Database

In our study, we used the Hungarian MCI–mAD database, recorded at the Memory
Clinic of the Department of Psychiatry of the University of Szeged, Hungary (Gosztolya
et al. 2019). The study was approved by the Ethics Committee of the University of
Szeged, and it was conducted in accordance with the Declaration of Helsinki. Written
informed consent was obtained from all the participants involved in the research
project. Unfortunately, our ethics agreement does not allow the sharing of these speech
recordings. For the sake of simplicity, we summarize the most important steps of the
data collection, based on Gosztolya et al. (2019).

We collected utterances from three groups of subjects, namely, those suffering from
MCI, those affected by early-stage AD, and HC (i.e., those with no cognitive impairment
at the time of recording). The three groups were matched for age, gender, and
education. MCI and mAD patients were selected after a medical diagnosis was
confirmed by computed tomography, magnetic resonance imaging (MRI), and cognitive
tests (MMSE [Folstein, Folstein, and McHugh 1975], CDT [Freedman et al. 1994], and
ADAS-Cog [Rosen, Mohs, and Davis 1984]). Anyone who had previously suffered from
head injuries, depression, or psychosis was excluded. Further exclusion criteria
were drug or alcohol consumption, pharmacological treatment affecting
cognitive functions, and visual or auditory deficits. This choice is justified by the fact
that head injuries may also lead to speech impairment (e.g., aphasia). Moreover,
depression, alcohol use, and drug use are clinically known to affect cognitive processes,
and hence may influence speech as well.

Here our aim was to investigate whether we can determine the state of the patients
based on linguistic features only. For this reason, we needed ground-truth labels, that is,
a clinically confirmed medical diagnosis for each patient, obtained in the most precise
manner (applying imaging processes, cognitive tests, etc.). The classification of MCI and
mAD patients was always the result of a consensus among the members of our
clinical expert panel (a psychiatrist, a neurologist, and a psychologist), who made their
decision based on the global clinical picture, neuropsychological test results, and
neuroimaging (when available). As far as we know, there is no clinical protocol for
diagnosing patients solely on the basis of their linguistic utterances, hence we could not
rely on such protocols in the diagnostic phase. However, as Petersen (2004)
also remarks, the distinction between healthy aging and MCI, and also between MCI
and very early AD, is challenging, as these conditions often overlap on a cognitive
continuum. If the expert panel could not agree on the classification of a patient, that
patient was not included in the analysis, to prevent the confounding effect of an already
controversial diagnosis.

All our previous studies (Hoffmann et al. 2010; Tóth et al. 2015; Gosztolya et al. 2016;
Tóth et al. 2018; Gosztolya et al. 2019) and studies carried out by other groups (e.g., Taler
and Phillips 2008; Roark et al. 2011; Satt et al. 2014) found that MCI and AD affect the
spontaneous speech of the patients more than their planned speech. In the case of planned
speech, speakers usually have some time in advance to think about what they would
like to say, hence difficulties in word finding (due to memory decline) cannot be reliably
detected. In the case of spontaneous speech, however, speakers are required to speak on
the spot, so they do not have time to prepare their speech, which might truly reflect their
difficulties in word finding. Therefore, our aim was to record spontaneous speech and
to use the transcripts of these utterances. Our experimental setup for recording
was as follows (for details, see Hoffmann et al. 2010). After the presentation of a
specially designed one-minute-long animated film, the subjects were asked to talk about
the events seen in the film (immediate recall or Task 1). Next, the subjects were asked
to talk about their previous day (previous day or Task 2). As the last task, the subjects
were shown a second film and, after a one-minute pause, were asked to talk
about the second film (delayed recall or Task 3). (For the instructions to the subjects, see
Table 1.) Thus, we had three recordings for each subject, each containing spontaneous
speech, but the tasks performed were different. In this article, we also investigate
whether some tasks are less effective for detecting MCI or mAD than others, which
is why we experimented with three different recordings.

Table 1
The instructions to the patients when recording the three utterances.

(1) "I am going to show you a silent movie lasting about a minute. Try to remember
    the story, the actors, the objects and the places, paying attention to the details."
(2) "Now, I would like to ask you to tell me about yesterday in detail."
(3) "Now, I am going to show you another clip. Try to remember the story, the actors,
    the objects and the places, paying attention to the details. OK, I am going to start
    it now."
    The patient watches the clip. If he starts talking about it, he is reminded
    that he is not yet allowed to talk about it. When the clip ends:
    "Now we will take a one-minute break."
    If the patient starts talking during the break, he is reminded that it is still
    break time, and he has to wait until the minute is over. After the one-minute
    break is over:
    "Right, could you please tell me what you saw in the clip?"

Our approach makes use of textual input, that is, the transcripts of utterances made
by the speaker groups. However, it must be emphasized that this method may be
complementary to using speech recordings, as we did in our previous work (Tóth et al.
2015; Tóth et al. 2018; Gosztolya et al. 2019). We think that combining these two
methodologies, namely, relying on textual information as well as on automatic speech
recognition techniques, can lead to even higher accuracy in identifying the
patients' status, which we would like to implement in the future.

Our database of MCI and AD patients is continuously growing; at the time of
writing, we had recordings from more than 150 people. For various reasons
(poor sound quality, controversial diagnosis, etc.) we had to filter out some patients;
furthermore, because we insisted on matching the three groups of speakers by age,
gender, and level of education, we could not use some of the recordings that otherwise
fulfilled our requirements of a clear diagnosis and an acceptable sound quality.
Therefore, in the end we used the recordings of 25 speakers for each speaker group,
resulting in a total of 75 speakers and 225 recordings. We applied one-way ANOVA to
check whether there were significant differences among the groups. F and p-values
can be seen in Table 2. The differences in age and years of education
are not statistically significant (p-values of 0.105 and 0.118), while the MMSE, CDT, and
ADAS-Cog tests indeed show a statistically significant difference among the speaker
groups. With t-tests, we also checked whether there are significant differences between
healthy controls and patients with MCI on the one hand, and healthy controls and patients
with dementia (i.e., grouping the MCI and mAD patients together) on the other hand. As
shown in Table 3, there are significant differences between the groups, except for age in
the case of control vs. MCI speakers.

Table 2
Demographic data and the results of the MMSE, CDT, and ADAS-Cog tests of the three subject
groups. Values are given as mean ± standard deviation (mean ± SD).

                     Subject groups                                        Statistics
                     Control (n = 25)   MCI (n = 25)      mAD (n = 25)      F (2;74)   p
Age (years)          70.72 ± 5.004      72.4 ± 3.594      73.96 ± 6.846     2.321      p = 0.105
Education (years)    12.08 ± 2.326      10.84 ± 2.304     10.76 ± 2.818     2.202      p = 0.118
MMSE                 29.24 ± 0.523      27.16 ± 0.898     23.92 ± 2.488     76.213     p < 0.001
CDT                  8.88 ± 2.007       6.44 ± 3.429      5.88 ± 3.244      7.254      p = 0.001
ADAS-Cog             8.575 ± 2.374      12.044 ± 3.205    18.675 ± 5.818    38.35      p < 0.001

Table 3
Significance of demographic data and the MMSE, CDT, and ADAS-Cog tests of healthy controls
and patients with dementia.

Patient groups         Age          Education    MMSE         CDT          ADAS-Cog
Control vs. MCI        p = 0.0912   p = 0.0115   p < 0.0001   p = 0.0037   p < 0.0001
Control vs. MCI+mAD    p = 0.0202   p = 0.0083   p < 0.0001   p = 0.0002   p < 0.0001

3. Methodology

In this section, we describe the methods used to identify MCI and mAD patients based on
their speech transcripts.

3.1 Feature Set

In our experiments, we used a rich feature set derived from the transcripts and from the
results of the automatic linguistic analyses performed with magyarlanc, a linguistic
preprocessing toolkit for Hungarian (Zsibrita, Vincze, and Farkas 2013). With this tool, the
text was first split into sentences, then tokenized, and finally the tokens were lemmatized.
A token is a semantic unit, usually separated by spaces from other character sequences in
the text (Szabó et al. 2020). A token can be a word, a number, or a punctuation mark.
Lemmatization is especially important in the case of morphologically rich languages such
as Hungarian. In these languages, words (nouns, verbs, pronouns, and adjectives) may have
numerous inflected and derived forms (Mladenović et al. 2016). This property may make
automatic analysis significantly more difficult or even ineffective. Lemmatization removes
inflectional endings and returns the base or dictionary form of a word (Balakrishnan and
Lloyd-Yemoh 2014; Kutuzov
and Kuzmenko 2019). As a last step of preprocessing, punctuation was removed. The
remaining strings are referred to here as words.

Similarly to Tóth et al. (2015), we hypothesized that the speech of MCI patients may
contain more pauses and hesitations than the speech of HC, and that these patients may also
have a restricted vocabulary due to cognitive deficit, which may affect the choice of words
and the frequency of parts of speech (Croot et al. 2000); they might even produce
neologisms. In addition to the features used in our earlier study, we added new
morphological, syntactic, semantic, and pragmatic features that might be characteristic of
spontaneous speech, and we made use of the demographic features that were available to us.
Altogether, the feature set consisted of 330 features (3 demographic features and 3 times
109 features, one set of 109 for each recording). Our feature set contained the following
features (novel features that have not been applied in our previous studies are italicized):

We extracted basic statistical features (7 features) from each transcript, namely:
• The number of sentences;
• The number and relative frequency of tokens;
• The number of words;
• The number and frequency of distinct lemmas compared to the number of words;
• The average sentence length.

We also processed the data from the viewpoint of spontaneous speech-based features
(6 features):
• The number of filled and silent pauses;
• The number and frequency of hesitations compared to the number of tokens;
• The number of pauses that follow an article and precede content words, as this might
  indicate that MCI patients have difficulties in finding suitable content words;
• The number of lengthened sounds (which we treated as a special form of hesitation,
  based on Gosztolya et al. [2016]).
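Once a transcript has been split into sentences, tokenized, and lemmatized, the basic statistical and speech-based counts above are straightforward to compute. The following Python sketch illustrates a subset of them; it is not the authors' implementation. The `Token` record, the tag names, and the `<fp>`/`<sp>` pause markers are hypothetical stand-ins for the actual transcription conventions and magyarlanc output, and articles are approximated by a `DET` tag.

```python
from dataclasses import dataclass


# Hypothetical token record for illustration; this is NOT the magyarlanc output format.
@dataclass(frozen=True)
class Token:
    form: str   # surface form as transcribed
    lemma: str  # lemma assigned by the morphological analyzer
    pos: str    # assumed tag set: NOUN, VERB, ADJ, ADV, DET, PUNCT, PAUSE, ...


FILLED, SILENT = "<fp>", "<sp>"  # hypothetical transcription markers for pauses
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def basic_and_speech_features(sentences):
    """A subset of the basic statistical and speech-based features of one
    transcript, given as a list of sentences (each a list of Token)."""
    all_tokens = [t for s in sentences for t in s]
    tokens = [t for t in all_tokens if t.pos != "PAUSE"]  # pause markers are not counted as tokens
    words = [t for t in tokens if t.pos != "PUNCT"]       # punctuation removed, as in the text
    lemmas = {t.lemma.lower() for t in words}
    n_words, n_sents = len(words), len(sentences)
    feats = {
        "n_sentences": n_sents,
        "n_tokens": len(tokens),
        "n_words": n_words,
        "n_distinct_lemmas": len(lemmas),
        "distinct_lemma_ratio": len(lemmas) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sents if n_sents else 0.0,  # in words (assumption)
        "n_filled_pauses": sum(t.form == FILLED for t in all_tokens),
        "n_silent_pauses": sum(t.form == SILENT for t in all_tokens),
    }
    # Pauses that directly follow an article (tagged DET here) and precede a content word.
    feats["n_pause_article_content"] = sum(
        1
        for s in sentences
        for prev, cur, nxt in zip(s, s[1:], s[2:])
        if cur.pos == "PAUSE" and prev.pos == "DET" and nxt.pos in CONTENT_POS
    )
    return feats
```

For a two-sentence transcript such as `a <sp> macska fut .` and `<fp> a macskák .`, this sketch yields 7 tokens, 5 words, 3 distinct lemmas, and one pause between an article and a content word. Hesitation and lengthened-sound counts would follow the same pattern once their transcription markers are known.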
Most of the morphological features employed in our analysis rely on the fact that Hungarian
is a morphologically rich language in which many grammatical relations are expressed by
suffixes, the number of which might indicate whether or not the cognitive abilities of the
speaker have been adversely affected. In this phase of the data processing, we extracted the
following features (35 features altogether):

Part-of-speech (POS) features (17 features):
• The number and frequency of nouns, verbs, adjectives, pronouns, numerals, adverbs, and
  conjunctions compared to the number of all words;
• The number of punctuation marks;
• The number and frequency of unanalyzed words, that is, those with an "unknown" POS tag,
  compared to the number of all words, which could reflect whether neologisms are being
  created by the speaker while speaking.
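The 17 POS features reduce to counts and relative frequencies over all words. A minimal sketch, assuming each word already carries a coarse tag (the tag names are hypothetical, with `UNKNOWN` standing for words the analyzer could not analyze) and that punctuation marks were counted before being removed:

```python
from collections import Counter

# Hypothetical coarse tag names for the seven word classes listed above.
POS_CLASSES = ("NOUN", "VERB", "ADJ", "PRON", "NUM", "ADV", "CONJ")


def pos_features(tagged_words, n_punct):
    """Count and relative frequency (over all words) of the seven word classes
    and of unanalyzed words, plus the number of punctuation marks: 17 values.

    tagged_words: (word, tag) pairs with punctuation already removed.
    n_punct: number of punctuation marks counted before removal.
    """
    n = len(tagged_words)
    counts = Counter(tag for _, tag in tagged_words)
    feats = {}
    for tag in POS_CLASSES + ("UNKNOWN",):
        feats[f"n_{tag}"] = counts[tag]  # Counter returns 0 for unseen tags
        feats[f"freq_{tag}"] = counts[tag] / n if n else 0.0
    feats["n_punct"] = n_punct
    return feats
```

Eight tag classes times two values plus the punctuation count gives exactly the 17 features listed above.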
Deep morphological features (18 features):
• The number of first person singular verbs, as this might tell us how often the patient
  reflects upon himself or herself;
• The number of first person plural verbs, as this might provide evidence of a strong or
  weak group identity of the patient;
• The number and frequency of past and present tense verbs compared to the number of all
  verbs, as this might reflect how well the patient can remember past events;
• The number and frequency of imperative and conditional verbs compared to the number of
  all verbs, as this might provide evidence of how the patient is able to cognitively
  perceive non-factual events;
• The number and frequency of comparative and superlative adjectives compared to the number
  of all adjectives, as this might tell us how the patient can make comparisons;
• The number and frequency of demonstrative pronouns compared to the number of all pronouns,
  as this might indicate the ability to change relative directions and viewpoints;
• The average number of morphemes of nouns.

As for syntactic features, we extracted the following characteristics (10 features):
• The number and frequency of subjects and objects, compared to the number of all words, as
  Hungarian is a pro-drop language, meaning that pronominal subjects and objects might not be
  overt in the clause;
• The number and frequency of adverbs, compared to the number of all words, as adverbs
  usually describe additional circumstances of the events, and this might indicate the way
  the speaker recalls the story (i.e., describing only the main events or adding some
  further details);
• The number and frequency of coordinations and subordinations, compared to the number of
  all words, as these features may characterize the complexity of the speaker's sentences.
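Most deep morphological features count words with a given morphological property and normalize by the size of the relevant word class (all verbs, all adjectives, or all pronouns) rather than by all words. The sketch below assumes Universal Dependencies-style feature names (`Tense`, `Mood`, `Degree`, `PronType`, `Person`, `Number`); this is an assumption about the analyzer output made purely for illustration. The average number of morphemes per noun is omitted because it needs a morpheme segmentation, and the syntactic ratios would follow the same count-and-normalize pattern over dependency labels.

```python
# Each word is a (pos, features) pair; the UD-style feature names are assumptions.


def share_features(words, pos, key, values):
    """For each value, count the words of class `pos` whose feature `key` has that
    value, and divide by the number of words of that class."""
    pool = [f for p, f in words if p == pos]
    out = {}
    for v in values:
        n = sum(1 for f in pool if f.get(key) == v)
        out[f"n_{pos}_{v}"] = n
        out[f"freq_{pos}_{v}"] = n / len(pool) if pool else 0.0
    return out


def deep_morph_features(words):
    feats = {}
    feats.update(share_features(words, "VERB", "Tense", ("Past", "Pres")))
    feats.update(share_features(words, "VERB", "Mood", ("Imp", "Cnd")))
    feats.update(share_features(words, "ADJ", "Degree", ("Cmp", "Sup")))
    feats.update(share_features(words, "PRON", "PronType", ("Dem",)))
    # First person singular and plural verbs are reported as plain counts.
    for number, name in (("Sing", "1SG"), ("Plur", "1PL")):
        feats[f"n_VERB_{name}"] = sum(
            1
            for p, f in words
            if p == "VERB" and f.get("Person") == "1" and f.get("Number") == number
        )
    return feats
```

For example, with the verbs *emlékszem*, *láttam*, and *menjünk*, one past tense verb yields a past tense share of 1/3, and two of the three verbs count as first person singular.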
We also analyzed the semantic features of the texts from the point of view of sentiments,
emotions, words or phrases denoting the speaker's uncertainty about the veracity of the
information expressed, and different kinds of memory activity, among others (47 features
altogether):

Uncertainty features (16 features):
• The number and frequency of fillers and uncertain words compared to the total number of
  tokens;
• The number and frequency of words belonging to several classes of linguistic uncertainty
  based on Vincze (2014), compared to the number of all words.

Sentiment features (10 features):
• The number and frequency of positive and negative words based on a list of sentiment
  phrases, compared to the number of all words. We applied two different Hungarian
  dictionaries for sentiment analysis: one list was a translation of Liu (2012), while the
  other contained Hungarian slang words (Szabó 2015) (labeled "positive/negative" and
  "slangPositive/slangNegative" in the tables, respectively).

Emotion features (16 features):
• The number and frequency of words belonging to the emotions described in Szabó, Vincze,
  and Morvay (2016), compared to the number of all words.

Other semantic features (5 features):
• The number and frequency of words/phrases related to memory activity (e.g., nem emlékszem
  'not remember-1SG', "I can't remember"), compared to the number of all words, as they
  directly signal problems related to memory and recall;
• The number of negation words;
• The ratio of content words to function words.

As regards pragmatic features of the transcripts, we processed speech act verbs and discourse
markers (4 features):
• The number and frequency of speech act verbs, based on a manually constructed list,
  compared to the number of all verbs;
• The number and frequency of discourse markers, compared to the number of all words.
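The semantic and pragmatic features are lexicon lookups: the number of lemmas or multiword phrases from a given list, and that number relative to the transcript length. A minimal sketch, assuming matching is done on lemma sequences; the lexicon entries in the example are hypothetical and not taken from the published lists. The speech act verb frequency is normalized by the number of verbs rather than by all words, so that feature would need a different denominator.

```python
def lexicon_features(lemmas, lexicons):
    """Count lexicon hits in one transcript and normalize by the number of words.

    lemmas:   lemmatized words of the transcript (punctuation removed), in order.
    lexicons: name -> list of entries; an entry is one lemma or a space-separated
              sequence of lemmas for a multiword phrase.
    """
    n = len(lemmas)
    feats = {}
    for name, entries in lexicons.items():
        hits = 0
        for entry in entries:
            phrase = tuple(entry.split())
            k = len(phrase)
            # Count a match at every start position where the phrase occurs.
            hits += sum(1 for i in range(n - k + 1) if tuple(lemmas[i:i + k]) == phrase)
        feats[f"n_{name}"] = hits
        feats[f"freq_{name}"] = hits / n if n else 0.0
    return feats


# Usage with hypothetical mini-lexicons:
example = lexicon_features(
    ["nem", "emlékszik", "a", "film", "szép", "van", "nem", "tud"],
    {"memory": ["nem emlékszik"], "negation": ["nem"], "positive": ["szép", "jó"]},
)
```

Matching on lemmas lets a single entry such as *nem emlékszik* cover inflected forms like *nem emlékszem*; here the memory phrase is found once, and the negation word twice (frequency 0.25 over 8 words).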
To find discourse markers in the texts, we applied a word list based on Dér and Markó (2007).
Lastly, we also took into consideration the demographic features of the speakers
(3 features):
• Gender;
• Age;
• Education.
All the lists we used in the investigation of semantic and pragmatic features are available at
https://github.com/vinczev/hungarian_lists.

3.2 Statistical Analysis of Features

In order to quantify the usefulness of each feature in distinguishing HC, MCI patients, and mAD
patients, we carried out a statistical analysis of the data (pairwise t-tests for each feature
and transcript). The significance levels for each feature among the three groups are listed in
Tables 7 and 8, and the significance levels for each feature between HC and speakers with
either MCI or mAD are listed in Tables 10 and 11, according to the following notation:
• *: 0.01 ≤ p < 0.05,
• **: 0.001 ≤ p < 0.01, and
• ***: p < 0.001.
The features that do not exhibit significant differences have been omitted from the tables for
the sake of simplicity. An analysis of the individual features in Tables 10 and 11 shows that
almost all the features display significant differences when working with only two classes:
only 9 features (out of 109) do not exhibit significant differences in any of the three tasks.
Hence, the use of linguistic features for distinguishing between HC and speakers with MCI or
mAD is well justified, and our feature set for the machine learning experiments will be based
on them (see Section 4). Figure 1 shows the results of the analysis, broken down by task type.
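The pairwise comparisons and the star notation can be sketched as follows. The text does not say which variant of the t-test was used; this sketch assumes a two-sided Student's t-test with pooled variance and computes the p-value from the regularized incomplete beta function using only the Python standard library (in practice a statistics package such as SciPy would normally be used instead).

```python
import math
from statistics import mean, variance

_FPMIN = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _FPMIN else _FPMIN)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _FPMIN else _FPMIN)
            c = 1.0 + aa / c
            c = c if abs(c) > _FPMIN else _FPMIN
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def student_t_test(x, y):
    """Two-sided two-sample t-test with pooled variance (fails if both samples are constant)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    p = _betai(df / 2.0, 0.5, df / (df + t * t))  # P(|T| >= |t|) for df degrees of freedom
    return t, p


def stars(p):
    """Map a p-value to the notation used in the tables."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

For instance, comparing the samples 1 to 5 and 2 to 6 gives t = -1 with 8 degrees of freedom and a two-sided p of about 0.347, which receives no star.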
More precisely, Figure 1 shows how many features of each feature group exhibit significant
differences with p < 0.05 for each speaker group pair. From the statistically significant
features, we can conclude that Task 2, namely, the description of the previous day, proves to
be the best indicator for differentiating between HC and speakers with MCI. On the other hand,
Task 3 is useful when patients with MCI and mAD need to be distinguished. A more detailed
analysis of feature groups and the effect of each task will be provided in Section 5, on the
basis of both statistical significance and machine learning experiments.

4. Machine Learning Experiments

So far, we have described our extracted text-based features and investigated the significance
level of their values among various speaker group pairs for the different speaking tasks. In
the next part of our study, we show that these attributes can also serve as a basis for an
effective automatic discrimination among the three speaker groups (i.e., HC, subjects with
MCI, and patients suffering from AD). That is, we now perform machine learning experiments
using the extracted features.

4.1 Classification

We performed the classification experiments using Support-Vector Machines (Schölkopf et al.
2001); we employed the libSVM implementation (Chang and Lin 2011). To avoid overfitting due to
a large number of meta-parameters, we utilized a linear kernel, with the complexity (C) value
explored in the range 10^{−5,−4,...,0,2}. We treated each subject as one independent example.
We then standardized each feature to have zero mean and unit variance.
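Standardization leaks information about the held-out subjects if the means and deviations are estimated on all 75 subjects; the text does not state which option was used. A minimal sketch of the safer variant, assuming the statistics are estimated on the training folds only and then applied unchanged to the held-out fold:

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Per-feature mean and (population) standard deviation, estimated on training rows only."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant feature would give a zero deviation; leave it unscaled instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean) / std feature-wise, using statistics from fit_standardizer."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

In each cross-validation round, `fit_standardizer` would be called on the 60 training subjects and `standardize` applied to both the training and the 15 test subjects.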
Figure 1
Ratio of attributes that differ significantly (p < 0.05) for the HC–MCI (top), MCI–mAD (middle), and HC–mAD (bottom) speaker group pairs, by speaking task and feature category.
[Figure 1 panels: x-axis: feature categories (Statistical, Speech-based, POS, Deep morph., Syntactic, Uncertainty, Sentiment, Emotion, Other semantic, Pragmatic); y-axis: significant features (%), 0–100; bars: Task 1 (Immediate Recall), Task 2 (Previous Day), Task 3 (Delayed Recall), each at p < 0.05.]

From a machine learning perspective, a dataset of only 75 examples (i.e., subjects) is extremely small. However, the number of diagnosed MCI and mAD patients is limited; moreover, collecting and transcribing their speech and obtaining a medical diagnosis is time-consuming. Other similar studies we are aware of involved fewer than 100 patients (Satt et al. 2014; Jarrold et al. 2014; Lehr et al. 2012; Roark et al. 2011; Fraser, Rudzicz, and Rochon 2013; Weiner, Herff, and Schultz 2016). With so few examples, we did not create separate training and test sets, but opted for cross-validation. To guarantee that each fold contained the same number of speakers from each speaker group, we used stratified 5-fold cross-validation: we divided the subjects into 5 groups (folds), each containing 5 MCI speakers, 5 mAD speakers, and 5 HC. In each round, we trained on the features extracted from the speech of 60 speakers (4 folds: 20 with MCI, 20 with mAD, and 20 HC) and evaluated the resulting model on the remaining fold (15 speakers), thereby guaranteeing that no speaker's data was used for both training and evaluating the same model. Repeating this process for all folds, we obtained predictions for all 75 speakers.

For comparison, we ran a baseline experiment with the same settings, using only previously proposed features (i.e., excluding our novel features).

4.2 Evaluation

The choice of evaluation metric is not a clear-cut issue for this task. First, since the class distribution of this dataset is balanced, we can simply use the traditional classification accuracy score. However, besides indicating how well the subjects were identified as members of each category, this task can also be viewed as a detection task, where we are interested in whether the speaker has any sort of cognitive disorder; that is, the MCI and mAD categories together form the positive class, while HC form the negative class. Because the class distribution then becomes imbalanced (25 control subjects and 50 subjects with some kind of cognitive disorder), besides the two-class classification accuracy we also report the standard Information Retrieval metrics of precision and recall. As there is a trade-off between these two scores, they are usually aggregated by the F-measure (or F1-score), the harmonic mean of precision and recall. In our experiments, we report 3-class accuracy and all four 2-class scores (i.e., accuracy, precision, recall, and F-measure).
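The fold construction and the 2-class scores described above can be sketched as follows. The round-robin assignment, the random seed, and the label strings "HC", "MCI", and "mAD" are illustrative assumptions, and all function names are hypothetical. Note that in the 2-class view, predicting mAD for an MCI subject (or vice versa) counts as a hit, since both belong to the positive class.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Return a fold index per subject so that each fold holds the same number per group."""
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for indices in by_group.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % n_folds  # round-robin spreads each group evenly
    return fold_of


def splits(fold_of, n_folds=5):
    """Yield (train_indices, test_indices) for each cross-validation round."""
    for k in range(n_folds):
        train = [i for i, f in enumerate(fold_of) if f != k]
        test = [i for i, f in enumerate(fold_of) if f == k]
        yield train, test


def two_class_scores(y_true, y_pred, negative="HC"):
    """Collapse MCI and mAD into one positive class; return accuracy, P, R, F1."""
    pos_true = [y != negative for y in y_true]
    pos_pred = [y != negative for y in y_pred]
    tp = sum(t and p for t, p in zip(pos_true, pos_pred))
    fp = sum((not t) and p for t, p in zip(pos_true, pos_pred))
    fn = sum(t and (not p) for t, p in zip(pos_true, pos_pred))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

As a consistency check of the F1 definition, the precision of 84.3% and recall of 86.0% reported in Table 4 for all features with demographic information give 2 · 84.3 · 86.0 / (84.3 + 86.0) ≈ 85.1, which matches the F1 value in that table.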
As the last evaluation metric, we calculated the area under the ROC curve (AUC). We report the AUC value of the HC class (reflecting how well the healthy subjects could be distinguished from either the MCI or the mAD speaker groups) as well as the unweighted mean of the AUC scores of the three speaker categories. We tuned the meta-parameters (such as the complexity of the SVM) by choosing the value that led to the highest mean AUC.

4.3 Handling the Three Tasks

Recall (see Section 2) that, in our recording setup, each subject performed three different tasks, leading to three different utterances. This means that the attributes we calculated (see Section 3.1) could be extracted from the transcripts of three different speech recordings, each differing in the memory function triggered. In the simplest approach, we concatenated the attributes calculated from the three recordings. Of course, because the three utterances differed by nature, we were also interested in the differences among these tasks. To this end, we also performed experiments using the features extracted from only one of these transcripts.

4.4 Results

In our baseline experiment, we obtained an accuracy of 56% when identifying the 3 speaker groups, with a precision of 0.556, a recall of 0.560, and an F-score of 0.557.

Table 4
Machine learning results obtained with the different linguistic attribute categories.

Features             3-class   2-class   Precision   Recall   F1     AUC HC   AUC mean
                     Acc.      Acc.
Statistical          50.7%     62.7%     72.9%       70.0%    71.4   0.727    0.725
Speech               54.7%     64.0%     78.0%       64.0%    70.3   0.713    0.700
Morph. (all)         61.3%     76.0%     78.6%       88.0%    83.0   0.818    0.780
POS                  57.3%     72.0%     78.4%       80.0%    79.2   0.743    0.734
Deep morph.          61.3%     74.7%     78.2%       86.0%    81.9   0.750    0.725
Syntactic            58.7%     72.0%     85.4%       70.0%    76.9   0.742    0.699
Semantic (all)       46.7%     65.3%     70.7%       82.0%    75.9   0.674    0.670
Uncertainty          48.0%     64.0%     71.7%       76.0%    73.8   0.690    0.671
Sentiment            42.7%     61.3%     71.4%       70.0%    70.7   0.574    0.527
Emotion              37.3%     52.0%     65.2%       60.0%    62.5   0.448    0.520
Other                34.7%     61.3%     69.8%       74.0%    71.8   0.569    0.558
Pragmatic            54.7%     72.0%     80.9%       76.0%    78.4   0.720    0.687
Demographic          41.3%     64.0%     78.0%       64.0%    70.3   0.708    0.585
All (w/o demogr.)    68.0%     77.3%     82.4%       84.0%    83.2   0.845    0.822
All (w. demogr.)     70.7%     80.0%     84.3%       86.0%    85.1   0.847    0.823

Table 4 shows the metric values we obtained for the various feature subsets. Using all the features led to quite competitive scores, either with or without the demographic information: in our opinion, the 3-class accuracy values of 68.0%–70.7% are quite high, and the 2-class accuracy values of 77.3%–80.0% and the F1-scores of 83.2–85.1 reflect good classification performance as well. These values also outperform our baseline results, which confirms the added value of our new features. In terms of AUC, the difference between using and omitting the demographic information was even smaller: we measured values of 0.845–0.847 for the HC category, while the mean AUC of the three speaker groups was 0.822–0.823. The gap between these two AUC values suggests that it was easier to make a binary decision (i.e., whether the subject has any form of cognitive disorder) than to distinguish between the MCI and mAD categories, since the AUC scores for the MCI and mAD classes were lower than that for the HC category.
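The AUC-based evaluation and tuning criterion can be sketched as follows. Per-class AUC is computed one-vs-rest through its rank interpretation (the probability that a randomly chosen member of the class receives a higher score than a randomly chosen non-member, with ties counted as one half), and the unweighted mean over the three classes gives the mean AUC. What the per-class scores are (e.g., SVM decision values or probability estimates) is not stated in the text, so the sketch simply takes one score per class per subject; all names are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_aucs(class_scores, labels, classes=("HC", "MCI", "mAD")):
    """Per-class one-vs-rest AUC from a score row per subject, plus the unweighted mean."""
    aucs = {}
    for k, c in enumerate(classes):
        scores = [row[k] for row in class_scores]
        aucs[c] = binary_auc(scores, [y == c for y in labels])
    return aucs, sum(aucs.values()) / len(aucs)
```

A tuning loop would run the cross-validation once per candidate C, collect the held-out scores, and keep the C with the highest mean AUC; the quadratic pairwise loop is adequate for 75 subjects.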
Regarding the various attribute types, Table 4 shows that statistical features are effective indicators of MCI and mAD: the relatively high scores (AUC values of 0.727 and 0.725 for the HC category and on average, respectively) indicate that even with these simple descriptive features, dementia can be identified considerably above chance level. The semantic attributes, however, generally led to low scores. Uncertainty attributes seem to be the only exception (AUC values of 0.690 and 0.671), probably because of the difficulties in recalling things and events as dementia progresses (see Section 3.2). Using just the pragmatic attributes, the classification results are moderate as well: the 2-class scores (an accuracy of 72% and an F1-score of 78.4) contrast clearly with the 3-class accuracy of 54.7%, and although we achieved a fair AUC score for the HC speaker group (0.720), the mean AUC value of 0.687 suggests that the pragmatic attributes vary only slightly between the MCI and the mAD speaker groups.

4.5 Results Using the Significant Attributes Only

Next, we sought to combine our previous experiments: we performed machine learning experiments again, but this time used only those attributes that showed a statistically significant difference. Filtering the attributes on the basis of statistical significance is a well-known feature selection method (see, e.g., Satt et al. 2013; Fraser, Rudzicz, and Rochon 2013; Kiss and Vicsi 2017; Tóth et al. 2018), which has been reported to improve classification performance when detecting a wide variety of illnesses.
The setup of these classification experiments matched that of our previous experiments. We kept only those attributes that had shown a statistically significant difference at the p < 0.05 level. We treated the three tasks independently: for example, an attribute found to be significant only in the immediate recall task was retained for that task but could be discarded for the delayed recall and previous day tasks. However, since the p-values were calculated for speaker group pairs, we selected a given attribute if it was found to be statistically significant for any of these pairs (e.g., HC and MCI).

Table 6 lists our results. In most cases, discarding the irrelevant (or at least, statistically non-significant) attributes improved classification performance. This was especially the case when we used all attribute types (either with or without the demographic information): the mean AUC values improved from 0.822–0.823 to 0.847. Perhaps more importantly, the AUC value of the HC speaker group, which reflects how well we could tell whether a subject has any sort of cognitive disorder, rose from 0.847 to 0.889 when using the demographic attributes, and from 0.845 to 0.891 when discarding them. As the three groups examined were matched for age, gender, and education, it is likely that the demographic information merely confused the classifier. Regarding the statistical attributes, the evaluation metric values did not change at all, which is reasonable, since all such attributes were found to be significant with p < 0.05. Examining the performance of the classifiers trained on the speech-based features, we can see a large improvement in the 2-class case, as classification accuracy, precision, recall, and F1-score all rose by about 10 percentage points (although the AUC values did not change notably).
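The selection rule can be sketched as below, assuming the p-values are already available in a dictionary keyed by (task, feature, group pair); this data layout and the function name are hypothetical. A feature is kept for a task as soon as any of the three pairwise comparisons is significant at the chosen alpha, and each task is filtered independently.

```python
PAIRS = (("HC", "MCI"), ("MCI", "mAD"), ("HC", "mAD"))


def select_significant(p_values, features, tasks, alpha=0.05):
    """Per task, keep every feature that is significant for at least one group pair.

    p_values maps (task, feature, pair) -> p-value (hypothetical layout).
    """
    return {
        task: [f for f in features
               if any(p_values[(task, f, pair)] < alpha for pair in PAIRS)]
        for task in tasks
    }
```

The sketch is agnostic to where the p-values come from; a stricter variant would recompute them within each training fold. In the concatenated setting of Section 4.3, the retained columns of the three tasks would then be joined into one vector per subject.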
Figure 2
ROC curves for the HC (top left), MCI (top right), and mAD (bottom) speaker groups when using all attributes (excluding demographic ones) and when using only the attributes that were found to show a statistically significant difference with p < 0.05.

Regarding morphological attributes, all values except recall improved notably: the F-measure of 85.1 and the AUC of 0.865 for the HC speaker category are, in our opinion, quite high. Among the morphological attribute subtypes, this classification performance is mostly due to the deep morphological attributes, although using only the POS features led to high evaluation metric values as well. This is perhaps due to the morphologically rich nature of Hungarian. Retaining only the significant syntactic attributes also led to notable improvements in four of the seven scores; however, the AUC values of 0.754 and 0.739 (HC speaker group and mean, respectively) are still among the lower ones obtained, indicating that these features are less useful for identifying dementia. On the other hand, relying on the semantic attributes was much more effective: when using all four feature subtypes, we achieved an F1-score of 81.6 and an AUC score of 0.846 for the HC class. This performance seems to stem mostly from the uncertainty feature subtype, as the remaining three subtypes (i.e., sentiment, emotion, and other semantic) in general led to rather poor classification scores.
Considering that MCI and mAD are both reported to cause difficulties in recalling things, this is to be expected. Lastly, using the pragmatic attributes that showed statistically significant differences increased the AUC value of the HC speaker category from 0.720 to 0.743; still, this classification performance can be considered mediocre at best.

Table 5
Results obtained for each of the tasks. IR: immediate recall, PD: previous day, DR: delayed recall, Acc.: accuracy, P: precision, R: recall.

Features         Task   3-class   2-class   P        R         F1     AUC HC   AUC mean
                        Acc.      Acc.
Statistical      IR     56.0%     70.7%     69.4%    100.0%    82.0   0.701    0.740
                 PD     46.7%     57.3%     76.5%    52.0%     61.9   0.642    0.688
                 DR     50.7%     65.3%     70.0%    84.0%     76.4   0.743    0.761
Speech-based     IR     49.3%     62.7%     75.0%    66.0%     70.2   0.685    0.659
                 PD     42.7%     60.0%     77.8%    56.0%     65.1   0.667    0.610
                 DR     46.7%     65.3%     81.6%    62.0%     70.5   0.643    0.624
Morph. (all)     IR     56.0%     68.0%     75.0%    78.0%     76.5   0.720    0.746
                 PD     46.7%     58.7%     74.4%    58.0%     65.2   0.651    0.629
                 DR     48.0%     60.0%     72.7%    64.0%     68.1   0.682    0.686
POS              IR     54.7%     72.0%     76.4%    84.0%     80.0   0.730    0.715
                 PD     46.7%     62.7%     69.6%    78.0%     73.6   0.560    0.603
                 DR     53.3%     69.3%     72.9%    86.0%     78.9   0.685    0.702
Deep m.          IR     60.0%     70.7%     74.1%    86.0%     79.6   0.641    0.700
                 PD     41.3%     53.3%     67.4%    58.0%     62.4   0.577    0.591
                 DR     46.7%     62.7%     72.9%    70.0%     71.4   0.654    0.646
Syntactic        IR     57.3%     70.7%     70.6%    96.0%     81.4   0.734    0.744
                 PD     49.3%     60.0%     77.8%    56.0%     65.1   0.622    0.645
                 DR     49.3%     60.0%     72.7%    64.0%     68.1   0.703    0.722
Pragmatic        IR     42.7%     56.0%     67.3%    66.0%     66.7   0.557    0.605
                 PD     45.3%     64.0%     73.5%    72.0%     72.7   0.634    0.585
                 DR     46.7%     61.3%     75.6%    62.0%     68.1   0.602    0.627
All (w/o dem.)   IR     57.3%     68.0%     76.0%    76.0%     76.0   0.756    0.763
                 PD     50.7%     62.7%     77.5%    62.0%     68.9   0.732    0.669
                 DR     53.3%     66.7%     73.6%    78.0%     75.7   0.685    0.703
All (w. dem.)    IR     58.7%     69.3%     76.5%    78.0%     77.2   0.760    0.782
                 PD     53.3%     65.3%     78.6%    66.0%     71.7   0.768    0.692
                 DR     52.0%     64.0%     72.5%    74.0%     73.3   0.674    0.699

Figure 2 shows the ROC curves of the three speaker categories when using all attributes (except demographic ones) and when using only the statistically significant ones. In the case of the HC speaker group, the improvement from 0.845 to 0.891 brought by feature selection is clearly visible. The MCI group was clearly the hardest to identify, which is reasonable: as MCI is considered the prodromal stage of AD, the speech produced by these subjects differs only slightly from that of either the control subjects or those who already have dementia. Still, the graph demonstrates that using only the statistically significant attributes improved the AUC value of this class from 0.726 to 0.750. Lastly, examining the ROC curves of the mAD subjects, we can note that the SVMs were able to identify these subjects with high confidence (AUC score of 0.894); however, using only the significant attributes did not improve the performance noticeably (AUC value of 0.901). In effect, discarding the non-significant features helped the classifier where it is most useful: in distinguishing subjects with MCI from HC.
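ROC curves such as those in Figure 2 can be obtained from per-class scores by sweeping a decision threshold, as sketched below. This is a generic one-vs-rest construction, not necessarily the authors' plotting procedure, and the function names are hypothetical. Tied scores are consumed together so that the curve moves diagonally through them, which makes the trapezoidal area under the points equal to the rank-based AUC.

```python
def roc_points(scores, is_positive):
    """(FPR, TPR) points obtained by lowering a threshold over the distinct scores."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    pairs = sorted(zip(scores, is_positive), key=lambda sy: -sy[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For instance, with scores (0.1, 0.4, 0.35, 0.8) where only the last two subjects belong to the class, the curve passes through (0, 0.5), (0.5, 0.5), and (0.5, 1), and its area is 0.75.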
Table 6
Machine learning results obtained with the different linguistic attribute categories when using only attributes that displayed a statistically significant difference; cases where an improvement of at least 2% (accuracy, precision, recall, and F1) or 0.02 (AUC) was observed are shown in bold.

Features             3-class   2-class   Precision   Recall   F1     AUC HC   AUC mean
                     Acc.      Acc.
Statistical          50.7%     62.7%     72.9%       70.0%    71.4   0.727    0.725
Speech-based         57.3%     74.7%     86.0%       74.0%    79.6   0.698    0.706
Morph. (all)         68.0%     80.0%     84.3%       86.0%    85.1   0.865    0.824
POS                  52.0%     73.3%     80.0%       80.0%    80.0   0.825    0.760
Deep morph.          64.0%     76.0%     79.6%       86.0%    82.7   0.847    0.802
Syntactic            61.3%     73.3%     82.6%       76.0%    79.2   0.754    0.739
Semantic (all)       62.7%     76.0%     83.3%       80.0%    81.6   0.846    0.783
Uncertainty          54.7%     68.0%     78.3%       72.0%    75.0   0.748    0.724
Sentiment            50.7%     73.3%     75.9%       88.0%    81.5   0.623    0.606
Emotion              36.0%     58.7%     66.1%       78.0%    71.6   0.522    0.575
Other                37.3%     56.0%     68.9%       62.0%    65.3   0.623    0.562
Pragmatic            56.0%     70.7%     83.3%       70.0%    76.1   0.743    0.701
Demographic          40.0%     61.3%     81.8%       54.0%    65.1   0.721    0.587
All (w/o demogr.)    69.3%     80.0%     84.3%       86.0%    85.1   0.891    0.847
All (w. demogr.)     65.3%     76.0%     79.6%       86.0%    82.7   0.889    0.847

5. Discussion

In this section, we analyze the results in more detail and draw some conclusions about the relevance of each speaking task.

5.1 Analysis of the Effect of Feature Groups

As mentioned earlier, almost all features proved to be statistically significant when working with only two classes, that is, when distinguishing only HC from patients with some kind of dementia. Hence, in the following we focus on significant differences among the three groups (i.e., Tables 7–9), as we are interested in how the various groups of linguistic features may be affected as the disease progresses. Our analysis of the statistical features shows that only Task 2 reveals differences between controls and MCI patients.
However, all the tasks and almost all the features exhibit significant differences between the MCI and mAD groups, which suggests that patients tend to speak less and less as AD progresses. Hence, the diagnostic utility of statistical features can be fully exploited for differentiating the latter two groups.

As for the speech features, it is striking that there are no significant differences between the MCI and mAD groups (which is easily identifiable in Figure 1), whereas hesitations and pauses differ significantly between controls and MCI patients. Hence, speech features mostly define the border between these two groups, which suggests that speech is already adversely affected in the early stage of dementia, making these features good candidates for detecting dementia as early as possible. However, this group of features is less useful for distinguishing MCI and mAD patients.

Morphological features, especially the rates of nouns, verbs, pronouns, and adverbs, are good indicators of dementia in an early stage of the disease in Tasks 1 and 2. In Task 3 (delayed recall), however, these features exhibit significant differences only in a later stage, that is, between MCI and mAD patients (see Figure 1). Thus, when the goal is to detect dementia as early as possible based on morphology, we should focus on Tasks 1 and 2.
Similar to the statistical features discussed above, the syntactic abilities of the speakers seem to decline over time, as there is a higher number of significant differences between MCI and mAD patients, while only a few features distinguish controls and MCI patients (e.g., the number of subjects, objects, coordinations, and subordinations). Concerning coordinations and subordinations, we hypothesized that subordinate (dependent) clauses would occur with higher frequency in the speech of the control group. However, the rates of coordinations and subordinations led us to conclude that healthy controls do not tend to use more subordinate or coordinate clauses. Again, Task 3 seems to be relevant only in distinguishing the MCI and mAD classes.

Examining semantic features, we see that uncertainty features are responsible for most of the significant differences. This is especially true for epistemic and doxastic uncertainty (related to beliefs) and weasels (related to indefiniteness). As dementia progresses, patients have difficulty recalling things and events; hence, the number of uncertain and fuzzy expressions such as 'someone', 'I think', and so forth increases. In contrast, sentiment and emotion features in general did not prove to be effective in distinguishing the classes, with only a few of them being significant for some group pairs, especially in Task 2. It should also be mentioned that whenever there is a significant difference, it is mostly related to negative sentiments and emotions such as anxiety and disgust. As for positive emotions such as joy and love, their number and rate decrease as dementia progresses. That is, it seems that patients with MCI and mAD express their thoughts in more negative ways than healthy controls do.
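Lexicon-based features such as the uncertainty cues above (and, analogously, the discourse markers) reduce to counting list entries in a transcript and normalizing the count. The sketch below rests on assumptions not stated in this section: whitespace-tokenized input, case-insensitive matching, greedy longest match for multi-word cues, and a per-token rate for the "%" variant. The function name and the tiny English lexicon in the example are hypothetical; the actual Hungarian lists are available at the repository given at the end of Section 3.1.

```python
def cue_features(tokens, lexicon, name):
    """Count lexicon cues in a token list; return '<name>#' (count) and '<name>%' (rate)."""
    lowered = [t.lower() for t in tokens]
    # Each entry becomes a tuple of lower-cased words; empty entries are skipped.
    patterns = [tuple(entry.lower().split()) for entry in lexicon if entry.split()]
    count = 0
    i = 0
    while i < len(lowered):
        # Greedy longest match, so overlapping cues are not double-counted.
        match = max((p for p in patterns if tuple(lowered[i:i + len(p)]) == p),
                    key=len, default=None)
        if match is not None:
            count += 1
            i += len(match)
        else:
            i += 1
    rate = count / len(tokens) if tokens else 0.0  # assumed denominator: tokens
    return {f"{name}#": count, f"{name}%": rate}
```

For example, `cue_features("I think someone came and I think so".split(), ["someone", "I think", "maybe"], "uncertainty")` finds three cues in eight tokens.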
Also, it should be noted that sentiment and emotion features in Tasks 1 and 3 tend to be significant mostly for the MCI–mAD distinction, which implies that these features are adversely affected in a later stage of the disease. However, other semantic features tend to be indicative of MCI, especially in Task 2: when recalling the events of the previous day, MCI patients use significantly more phrases referring to memory activity, which is a clear indication of memory problems. Also, the ratio of function words increases in their speech; that is, they may have difficulty finding content words.

Lastly, among pragmatic features, discourse markers prove to be among the most effective features. Discourse markers are special types of pragmatic markers that form part of an utterance but do not contribute to the meaning of the proposition per se (Fraser 2009). These lexical expressions are classified not syntactically, but in terms of their semantic/pragmatic functions. According to Fraser (2009), discourse markers basically signal a relation between the utterance that hosts them and the prior utterance. Examples include you know, actually, basically, I mean, or so in English, and mármint 'I mean', tudniillik 'namely', tudod 'you know', akkor 'then', or szerintem 'in my opinion' in Hungarian. Based on the results of our analysis, we may conclude that the further the disease progresses, the more discourse markers the patient's speech is likely to contain.

Table 7
Significance of statistical and morphological features in the 3-class task. #: number, %: frequency, T1: immediate recall task, T2: previous day task, T3: delayed recall task.
Statistical: token#, sentence#, lemma#, lemma%, word#, sentence length
Morphological, POS: unknown#, unknown%, verb#, verb%, noun#, noun%, adjective#, adjective%, pronoun#, pronoun%, conjunction#, conjunction%, numeral#, numeral%, adverb#, adverb%, punctuation#
Deep morphological: comparative#, comparative%, past tense#, past tense%, present tense#, present tense%, imperative verb#, imperative verb%, conditional verb#, conditional verb%, Pl1 verb#, Pl1 verb%, Sg1 verb#, demonstrative pronoun#, avg # of nominal suffixes
HC vs. MCI (T2, T1, T3): * ** ** * * * * * * ** ** * ** * ** * ** *** ** * ** ** ** * ** ** * * **
MCI vs. mAD (T3, T2, T1): ** *** ** ** ** ** ** * * * ** * *** * * *** *** *** * *** * *** ** *** *** * ** * * ** *** ** * ** *** * * * *** * * * ** ** *** ** *** *** ** * ** ** *** * ** * * ** * * * * *
HC vs. mAD (T2, T1, T3): ** *** * * ** * * ** ** * *** *** *** ** *** * *** *** *** * ** * * * ** ** * * ** ** *** ** ** *** *** *** * *** * * ** * * * * ***

In our machine learning experiments, we analyzed the efficacy of each feature group separately. We found that across all the tasks, statistical, morphological, and syntactic features proved to be the most useful (see Tables 7–12). In contrast, semantic features are less effective when used on their own, giving an accuracy score of less than 50%. They are also less effective in the scenario where MCI and mAD patients are merged (i.e., the 2-class identification task).

Table 8
Significance of syntactic and semantic features in the 3-class task. #: number, %: frequency, T1: immediate recall task, T2: previous day task, T3: delayed recall task.
Columns: HC vs. MCI (T2, T1, T3); MCI vs. mAD (T3, T2, T1); HC vs. mAD (T2, T1, T3)
Syntactic features: subject#, subject%, object#, object%, subordination#, subordination%, adverbial#, adverbial%, coordination#, coordination%
Semantic features, uncertainty: uncertain#, uncertain%, epistemic#, epistemic%, condition#, condition%, weasel#, weasel%, peacock#, peacock%, hedge#, doxastic#, doxastic%
Semantic features, sentiment: negative#, positive%
Semantic features, emotion: love#, anxiety#, anxiety%, disgust#, joy%, fear%, emotive negative#, emotive negative%
Other semantic features: memory%, memory#, negation word#, content word%, function word%
Significance marks: * ** ** ** ** ** ** * ** * ** * * * * * *** ** * * * * ** * * * ** ** *** ** ** ** * ** ** ** * * ** * * * *** * *** ** * ** ** ** ** * * ** * * * * * ** * ** ** * * ** ** * * ** * * * * * * * *** ** ** ** *** * ** * ** * * * * ** ** ** * * *** * * * ** ** ** * ** ** * ** ** ** ** * * * * *

Morphological features seem to play an important role in the machine learning experiments. Considering all the tasks, using only morphological features, we can obtain an accuracy score of 61.33%, and when relying on only one of the tasks, relatively high accuracy scores can again be attained (i.e., 56%, 46.7%, and 48% for Tasks 1, 2, and 3, respectively; see Table 5). As the disease progresses, an impoverishment of morphology can be observed in the data: for instance, the number of verbs and nouns (and basically of all parts of speech) decreases over time, and the average number of nominal suffixes decreases with the progress of dementia. This might explain why morphological features are effective in separating the groups of speakers.

Table 9
Significance of speech-based and pragmatic features in the 3-class task. #: number, %: frequency, T1: immediate recall task, T2: previous day task, T3: delayed recall task.
Columns: HC vs. MCI (T2, T3, T1); MCI vs. mAD (T3, T2, T1); HC vs. mAD (T2, T3, T1)
Speech-based: hesitation#, hesitation%, filled pause#, pause#, lengthened sound#, pause after article#
Pragmatic features: speech act#, discourse marker#, discourse marker%
Significance marks: * * * * * * * * * * ** * ** * ** ** *** * * * * ** ** * * *

Uncertainty features exhibit significant differences among the groups and are also relevant in the machine learning experiments, especially in Task 3. As mentioned before, the reason for this might lie in the fact that dementia causes difficulties in recalling what happened earlier, so speakers tend to express their uncertainty with linguistic cues too. Also, as Task 3 took place at the end of each recording session, speakers were probably tired by that time, resulting in a higher number of uncertainty cues.

5.2 Analysis of the Effect of the Tasks

Next, we highlight the strengths and weaknesses of each task in order to determine which task is the most appropriate for identifying speakers with dementia. When the tasks are considered separately (see Table 5), there are some interesting tendencies that should be examined further. For Task 1, statistical, morphological, and syntactic features are the most effective, but the role of emotion features is notable in the 2-class identification task, especially regarding recall. In Task 2, it is the sentiment features and other semantic features that have a positive effect on recall, and statistical and morphological features seem less important here.
Moreover, uncertainty features prove to be effective in Task 2, together with morphological and statistical features. Overall, we may conclude that morphological and statistical features can perform well for all three tasks, while the efﬁcacy of semantic features depends on the actual task. The results for Tasks 1 and 2 indicate that semantic features can inﬂuence the results more strongly for the 2-class identiﬁcation task than for the 3-class task. As expected, 64 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 8 1 1 1 9 2 0 0 6 6 9 9 / c o l i _ a _ 0 0 4 2 8 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Vincze et al. Linguistic Parameters of Spontaneous Speech for Identifying MCI and AD Table 10 Signiﬁcance of statistical and morphological features in the 2-class (HC vs. MCI/mAD) task. #: number. %: rate. Statistical features token# token% sentence# lemma# lemma% word# sentence length Morphological features POS features unknown% verb% noun# noun% adjective# adjective% pronoun# pronoun% conjunction# conjunction% numeral# numeral% adverb# adverb% punctuation# Deep morphological features superlative# superlative% comparative# comparative% sg1Pron# past# past% present# present% imperative# imperative% conditional verb# conditional verb% pl1Verb# pl1Verb% demonstrative pronoun# demonstrative pronoun% average # of nominal sufﬁxes Imm.rec. *** *** *** *** *** *** ** *** * *** *** *** *** *** *** * *** *** *** *** *** ** ** *** *** *** *** *** * ** *** *** *** *** *** Prev.day Del.rec. 
*** *** *** *** *** *** *** *** ** *** *** *** *** ** *** *** *** *** *** *** ** ** *** *** *** *** *** *** *** *** *** *** *** * *** *** *** *** * *** ** *** *** *** *** *** *** *** * *** *** *** *** *** * * *** *** *** *** *** ** ** *** *** *** *** ** *** binary classiﬁcation is an easier task to handle; it yields better scores for all feature groups, but it should also be added that a larger number of semantic features reveals signiﬁcant differences in the 2-class task than in the 3-class task. This may mean that semantic features are more sensitive indicators of speakers with dementia, which is in accordance with our ﬁnding that semantics seems to be affected only in a later stage of the disease. 65 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / c o l i / l a r t i c e - p d f / / / / 4 8 1 1 1 9 2 0 0 6 6 9 9 / c o l i _ a _ 0 0 4 2 8 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 Computational Linguistics Volume 48, Number 1 Table 11 Signiﬁcance of semantic features in the 2-class (HC vs. MCI/mAD) task. #: number. %: rate. Semantic features Uncertainty features uncertain# uncertain% epistemic# epistemic% investigation# investigation% condition# condition% weasel# weasel% peacock# peacock% hedge# doxastic# doxastic% hedge% Emotion features joy# joy% fear# fear% anger# anger% love# love% surprise# surprise% sorrow% Sentiment features positive# positive% negative% slangPositive# slangPositive% slangNegative# slangNegative% negative emotive% Other semantic features negation word# content% function% memory# memory% Imm.rec. Prev.day Del.rec. 
It is also worth mentioning that Tasks 1 and 3 reveal the differences in statistical features between the control group and the MCI and mAD patients more effectively. In line with our previous experience (see Sections 3.2 and 4.4), we may conclude that when the speakers have to tell a previously specified story (with given content words, verbs, and story line), as in Task 1 and Task 3, this restriction helps to highlight any mental disorder. In Task 2, however, there is no such restriction, so the topic, the content, and the order of the events are relatively free. This difference between the task types could also lead to diverging part-of-speech frequencies.

Table 12
Significance of speech-based, syntactic, and pragmatic features in the 2-class (HC vs. MCI/mAD) task. #: number. %: rate. Columns: Imm.rec., Prev.day, Del.rec.; significance levels are marked with *, **, and ***.
Speech-based features: hesitation#, hesitation%, filled pause#, pause#, lengthened sound#, pause after article#
Syntactic features: subject#, subject%, object#, object%, subordination#, subordination%, adverb#, adverb%, coordination#, coordination%
Pragmatic features: speech act#, speech act%, discourse marker#, discourse marker%

In the machine learning experiments, using only morphological features in Tasks 1 and 3 results in higher accuracy than using all features. This is probably because most semantic features perform poorly in these tasks (with the exception of uncertainty features in Task 3), which might harm performance. It is also interesting that in Task 1, statistical features yield about the same accuracy as (and an even higher F-score than) morphological features in the 2-class identification task. Thus, when the goal is to distinguish healthy controls from patients with dementia, it might be sufficient to rely on very simple statistical features in the immediate recall task. In Task 2 (previous day), it is notable that the other semantic features behave very differently in the 2-class and the 3-class identification tasks. Namely, using only the other semantic features yields the best accuracy (66.67%) and the best F-score (80%) in the 2-class task, but these features are not useful for telling apart the three classes (cf. the accuracy score of 33.33%). Sentiment features exhibit a similar trend: they achieve high accuracy scores in the 2-class task but a lower accuracy score in the 3-class task. Hence, these types of features can effectively identify people with dementia, but they do not appear sensitive enough to detect the subtle differences between the MCI and mAD groups.
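Since accuracy and F-score can tell different stories, it may help to see how both are computed. The sketch below is plain Python with hypothetical labels, not the study's evaluation code. It assumes that the 2-class F-score is the F1-score of the patient (MCI/mAD) class and that patients make up two thirds of the speakers. Under these assumptions, a classifier that labels every speaker as a patient already obtains 66.67% accuracy and an 80% F1-score, which coincides with the figures reported above for the other semantic features in Task 2; the two metrics are therefore most informative when read together with the class distribution.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1-score of the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no correctly predicted positives: precision or recall is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 2-class labels (1 = MCI/mAD patient, 0 = healthy control),
# assuming 25 controls and 50 patients, i.e. a 1:2 class ratio.
y_true = [0] * 25 + [1] * 50
# A trivial classifier that labels every speaker as a patient.
y_pred = [1] * len(y_true)

print(f"accuracy: {accuracy(y_true, y_pred):.2%}")  # accuracy: 66.67%
print(f"F1 (patient class): {f1_score(y_true, y_pred):.2%}")  # F1 (patient class): 80.00%
```

With every speaker predicted as a patient, precision is 50/75 and recall is 1, so F1 = 2 · (2/3) / (5/3) = 0.8.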
Based on the statistical significance tests and our machine learning experiments, the following conclusions can be drawn for each task type. For Task 1 (immediate recall), statistical features exhibit significant differences between the MCI and mAD groups. The same is true for syntactic features. As for semantics, every semantic feature that proves statistically significant is related to the distinction between the MCI and mAD classes. Morphological features, in contrast, exhibit statistically significant differences between controls and speakers with dementia. In the machine learning experiments, deep morphological features seem to be the most effective in both the 2-class and the 3-class identification tasks; however, statistical and syntactic features also result in high accuracy and F-scores. In summary, the main strength of the immediate recall task lies in distinguishing the MCI and mAD groups. Moreover, when the goal is to identify speakers with dementia (i.e., without distinguishing MCI and mAD speakers), it is sufficient to use only statistical features, without the need for any deep linguistic analysis, which makes this a very cost-effective procedure when a short video is available to play to the patients during the data collection sessions. In Task 2 (previous day), however, the other semantic features tend to achieve the highest F-score, and they also perform well in the 2-class identification task. As regards the significance of the features, statistical features are also strong here, as are morphological, syntactic, and other semantic features, both for distinguishing control from MCI speakers and for distinguishing MCI from mAD speakers. Hence, the other semantic features tend to be distinctive for Task 2: controls use significantly more content words and fewer function words (such as conjunctions and articles) than speakers with dementia, and they also use fewer phrases related to memory activity.
To sum up, Task 2 does not require any specific preparation because it is based on a single question ("Tell me what happened yesterday"); however, deeper linguistic analysis is needed to profit from the distinctive features of this task.¹ Lastly, Task 3 (delayed recall) shows the fewest significant differences between the control and MCI groups. This might be related to the fact that by the end of the session, speakers were tired and could not concentrate as well, which made differences in their cognitive abilities harder to detect. Nevertheless, there are significant differences between the MCI and mAD groups, for instance for the statistical, morphological, and syntactic features. Moreover, among the three tasks, Task 3 yields the lowest best-case F-score; that is, the other two tasks perform better in the machine learning experiments, although the difference is not considerable. The added value of Task 3 lies in distinguishing MCI and mAD, which justifies its inclusion in the experimental setup. To summarize, whenever a fine-grained distinction is needed (i.e., distinguishing healthy controls, MCI, and mAD speakers), the immediate recall and delayed recall tasks are strongly recommended (in addition to the previous day task).

¹ It needs to be added that the day of the visit might slightly influence the semantic content of the patient's utterances in this task. However, our feature set does not primarily focus on the semantic content; rather, the emphasis is on deeper linguistic features, which are probably independent of the semantic content or real-life activities in most cases.

Regarding the usefulness of each task, we performed one last machine learning experiment. We trained 3-class SVMs using only the attributes found to be significant (with p < 0.05) and, in each run, only the attributes corresponding to one of the speaker tasks (again excluding demographic information). Figure 3 shows the measured AUC values for the three speaker categories. As expected, the AUC scores were lower than in the previous experiment, but our aim here was to assess the usefulness of the different speaker tasks. Examining the AUC scores for the control subjects (top left panel of Figure 3), it is clear that the second task (i.e., previous day) contributed the most to the identification of these speakers (AUC score of 0.818), while the two recall tasks were noticeably less useful (AUC values of 0.748 and 0.726 for Task 1 and Task 3, respectively). For the MCI speakers (top right panel of Figure 3), Task 1 (i.e., immediate recall) was found to be the most useful, with an AUC score of 0.713, followed by Task 2 (previous day, AUC of 0.664), and, surprisingly, Task 3 (delayed recall) proved to be the worst (AUC of 0.607). Regarding the subjects with mAD (bottom panel of Figure 3), all three tasks led to a high-quality identification of these subjects (AUC scores of 0.872, 0.828, and 0.898).

Figure 3
AUC curves for the HC (top left), MCI (top right), and mAD (bottom) speaker groups when using only the attributes extracted from one speaker task.
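The AUC values above summarize, for each speaker group, how well that group is separated from the other two. As a minimal sketch (not the authors' evaluation code), one-vs-rest AUC can be computed from per-class decision scores using the rank interpretation of AUC: the probability that a randomly chosen member of the target group receives a higher score than a randomly chosen non-member. The speakers, labels, and scores below are hypothetical stand-ins for whatever per-class outputs a 3-class classifier produces.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative
    one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(class_scores: Dict[str, List[float]], labels: List[str], target: str) -> float:
    """AUC of one speaker group against the other two, using that group's scores."""
    return auc(class_scores[target], [label == target for label in labels])


# Hypothetical gold labels and per-class decision scores for six speakers.
labels = ["HC", "HC", "MCI", "MCI", "mAD", "mAD"]
class_scores = {
    "HC": [0.7, 0.6, 0.5, 0.2, 0.1, 0.3],
    "MCI": [0.2, 0.4, 0.4, 0.6, 0.3, 0.2],
    "mAD": [0.1, 0.1, 0.1, 0.2, 0.6, 0.5],
}

for group in ("HC", "MCI", "mAD"):
    print(group, one_vs_rest_auc(class_scores, labels, group))
```

This prints 1.0 for HC, 0.9375 for MCI (one tie between an MCI speaker and a control), and 1.0 for mAD. The quadratic pairwise loop is chosen for readability; a rank-based formula gives the same value more efficiently on larger data sets.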
Our hypothesis is that Task 2 is less useful in differentiating between MCI and mAD, which also contributed to its mediocre AUC value for the MCI group; however, perhaps the most important aspect is to separate subjects with MCI from healthy speakers, for which Task 2 (i.e., asking the subjects about their previous day) is the most useful.

6. Conclusions

In this article, we presented our methods for automatically identifying Hungarian patients suffering from MCI or mAD based on their speech transcripts. In our study, we utilized the Hungarian MCI–mAD database, recorded at the Memory Clinic of the Department of Psychiatry of the University of Szeged, Hungary. Here, we used 225 recordings made by the subjects in three different tasks (immediate recall, delayed recall, and talking briefly about the previous day). In our experiments, we used a rich feature set (330 features altogether) derived from the transcripts and from the results of the automatic linguistic analyses performed with magyarlanc. We described each feature category in detail and then presented the results of the statistical analysis of the data. We found notable differences in usefulness not only among the features but also among the speaker tasks as indicators for differentiating between the groups (i.e., HC, those with MCI, and those with mAD). In the next part of the study, we showed how the various attributes can serve as a basis for effective automatic discrimination among the three speaker groups. Our system applied machine learning techniques to a rich feature set including parameters of the linguistic characteristics of spontaneous speech as well as features exploiting morphological and syntactic parsing, and semantic and pragmatic features.
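The statistical analysis summarized above compares each feature between pairs of speaker groups. The test is not restated in this section, so the following is only a minimal sketch that assumes a two-sided Mann-Whitney U test (tie-corrected normal approximation, no continuity correction) and the conventional markers * for p < 0.05, ** for p < 0.01, and *** for p < 0.001; both the test and the thresholds are assumptions, and the feature values in the usage example are invented.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last element of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test using the tie-corrected normal
    approximation (no continuity correction). Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of t^3 - t over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def stars(p: float) -> str:
    """Significance markers under the assumed thresholds 0.05 / 0.01 / 0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical content% values for controls and for speakers with dementia.
controls = [0.62, 0.58, 0.65, 0.60, 0.63, 0.61, 0.64, 0.59]
patients = [0.52, 0.55, 0.50, 0.57, 0.53, 0.49, 0.56, 0.54]
u, p = mann_whitney_u(controls, patients)
print(f"U = {u}, p = {p:.4f} {stars(p)}")
```

Because every hypothetical control value exceeds every patient value, U reaches its maximum of 8 × 8 = 64 and the p-value falls below 0.001. A library routine such as SciPy's `mannwhitneyu` would normally be preferred in practice; the hand-written version only makes the ranking and tie correction explicit.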
We found that utilizing all features led to competitive scores both with and without demographic information (3-class accuracy scores: 68%–70%; 2-class accuracy scores: 77.3%–80%; F1-scores: 84–86). In terms of AUC, the difference between these two settings was even smaller (healthy control category: 0.845–0.847; all three speaker groups: 0.822–0.823). The gap between the 3-class and 2-class scores suggests that it is more straightforward to make a binary decision (i.e., whether the individual has any form of mental disorder) than to distinguish between the MCI and mAD categories. Regarding the various attribute types, the results for the statistical features indicate that even with these simple descriptive features, dementia can be identified notably above chance level. The semantic attributes, however, generally led to low scores, with uncertainty attributes being the only exception. The results obtained using only the pragmatic attributes suggest that these attributes vary only slightly between the MCI and mAD speaker groups.

We also examined how the different data recording scenarios affect the linguistic features and concluded that when a fine-grained distinction among healthy controls, MCI patients, and mAD patients is needed, the immediate recall and delayed recall tasks are strongly advisable, in addition to the previous day task.

In the future, we would like to extend our data set with new transcripts. Also, on the basis of the promising results concerning some of the deep morphological, semantic, and pragmatic features, we will investigate whether combining certain sets of features can further improve the automatic detection of MCI and mAD.
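The observation that simple descriptive features already separate the groups can be illustrated with a minimal sketch that computes a few of them (token#, sentence#, word#, sentence length) together with the content% and function% rates discussed for Task 2. It assumes a transcript that has already been tokenized and tagged with Universal POS tags (for instance by a tool such as magyarlanc, whose actual output format is not modelled here); the tag groupings and the exact feature definitions, such as counting punctuation in sentence length, are assumptions rather than the article's definitions.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumed Universal POS tag groups; the article's own category definitions may differ.
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}
FUNCTION_POS = {"DET", "ADP", "CCONJ", "SCONJ", "PRON", "AUX", "PART"}

Sentence = List[Tuple[str, str]]  # (token, UPOS tag) pairs


def transcript_features(sentences: List[Sentence]) -> Dict[str, float]:
    """A few simple statistical features plus content/function word rates."""
    tokens = [pair for sentence in sentences for pair in sentence]
    words = [pos for _, pos in tokens if pos != "PUNCT"]  # tags of non-punctuation tokens
    n_words = len(words)
    n_content = sum(pos in CONTENT_POS for pos in words)
    n_function = sum(pos in FUNCTION_POS for pos in words)
    return {
        "token#": len(tokens),
        "sentence#": len(sentences),
        "word#": n_words,
        # Mean number of tokens (punctuation included) per sentence.
        "sentence length": mean(len(s) for s in sentences) if sentences else 0.0,
        "content%": n_content / n_words if n_words else 0.0,
        "function%": n_function / n_words if n_words else 0.0,
    }


# Hypothetical tagged transcript: "Tegnap elmentem a boltba. Aztán hazajöttem."
# ("Yesterday I went to the shop. Then I came home.")
example = [
    [("Tegnap", "ADV"), ("elmentem", "VERB"), ("a", "DET"), ("boltba", "NOUN"), (".", "PUNCT")],
    [("Aztán", "ADV"), ("hazajöttem", "VERB"), (".", "PUNCT")],
]
for name, value in transcript_features(example).items():
    print(f"{name}: {value:.3g}")
```

For this two-sentence example, five of the six words are content words and one (the article "a") is a function word, giving content% ≈ 0.833 and function% ≈ 0.167.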
Acknowledgments
This study was partially funded by the National Research, Development, and Innovation Office of Hungary via contract NKFIH FK-124413, by grant NKFIH-1279-2/2020 of the Hungarian Ministry of Innovation and Technology, and by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program (MILAB). This work was supported by the Hungarian Research Fund (NKFIH / OTKA, grant number PD 132312). Gábor Gosztolya was also funded by the János Bolyai Scholarship of the Hungarian Academy of Sciences and by the Hungarian Ministry of Innovation and Technology New National Excellence Program ÚNKP-21-5-SZTE.

References
Al-Hameed, Sabah, Mohammed Benaissa, and Heidi Christensen. 2017. Detecting and predicting Alzheimer's Disease severity in longitudinal acoustic data. In Proceedings of the International Conference on Bioinformatics Research and Applications 2017, ICBRA 2017, pages 57–61. https://doi.org/10.1145/3175587.3175589
Al-Hameed, Sabah, Mohammed Benaissa, Heidi Christensen, Bahman Mirheidari, Daniel Blackburn, and Markus Reuber. 2019. A new diagnostic approach for the identification of patients with neurodegenerative cognitive complaints. PLoS ONE, 14(5):1–18. https://doi.org/10.1371/journal.pone.0217388, PubMed: 31125389
APA. 2000. DSM-IV-TR, American Psychiatric Association.
Balakrishnan, Vimala and Ethel Lloyd-Yemoh. 2014. Stemming and lemmatization: A comparison of retrieval performances. In Proceedings of SCEI Seoul Conferences, pages 174–179. https://doi.org/10.7763/LNSE.2014.V2.134
Baldas, Vassilis, Charalampos Lampiris, Christos N. Capsalis, and Dimitrios Koutsouris. 2010. Early diagnosis of Alzheimer's type dementia using continuous speech recognition. In Proceedings of MobiHealth, pages 105–110. https://doi.org/10.1007/978-3-642-20865-2_14
Boise, Linda, Margaret B. Neal, and Jeffrey Kaye. 2004. Dementia assessment in primary care: Results from a study in three managed care systems. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, 59(6):M621–M626. https://doi.org/10.1093/gerona/59.6.M621, PubMed: 15215282
Bucks, R. S., S. Singh, J. M. Cuerden, and G. K. Wilcock. 2000. Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance. Aphasiology, 14(1):71–91. https://doi.org/10.1080/026870300401603
Chang, Chih-Chung and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:1–27. https://doi.org/10.1145/1961189.1961199
Chapman, Sandra Bond, Jennifer Zientz, Myron Weiner, Roger Rosenberg, William Frawley, and Mary Hope Burns. 2002. Discourse changes in early Alzheimer disease, mild cognitive impairment, and normal aging. Alzheimer Disease & Associated Disorders, 16(3):177–186. https://doi.org/10.1097/00002093-200207000-00008, PubMed: 12218649
Croot, Karen, John R. Hodges, John Xuereb, and Karalyn Patterson. 2000. Phonological and articulatory impairment in Alzheimer's disease: A case series. Brain and Language, 75(2):277–309. https://doi.org/10.1006/brln.2000.2357, PubMed: 11049669
Dér, Csilla Ilona and Alexandra Markó. 2007. A magyar diskurzusjelölők szupraszegmentális jelöltsége, Nyelvelmélet–nyelvhasználat. Tinta, Székesfehérvár–Budapest, pages 61–67.
dos Santos, Leandro B., Edilson Anselmo Corrêa Jr., Osvaldo N. Oliveira Jr., Diego R. Amancio, Letícia L. Mansur, and Sandra M. Aluísio. 2017. Enriching complex networks with word embeddings for detecting mild cognitive impairment from speech transcripts. In Proceedings of ACL, pages 1284–1296. https://doi.org/10.18653/v1/P17-1118
Folstein, M. F., S. E. Folstein, and P. R. McHugh. 1975. Mini-mental state: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3):189–198.
Fraser, Bruce. 2009. An account of discourse markers. International Review of Pragmatics, 1(2):293–320. https://doi.org/10.1163/187730909X12538045489818
Fraser, Kathleen C., Kristina Lundholm Fors, and Dimitrios Kokkinakis. 2018. Multilingual word embeddings for the assessment of narrative speech in mild cognitive impairment. Computer, Speech & Language, 53:121–139. https://doi.org/10.1016/j.csl.2018.07.005
Fraser, Kathleen C., Kristina Lundholm Fors, Dimitrios Kokkinakis, and Arto Nordlund. 2017. An analysis of eye-movements during reading for the detection of mild cognitive impairment. In Proceedings of EMNLP, pages 1027–1037. https://doi.org/10.18653/v1/D17-1107
Fraser, Kathleen C., Frank Rudzicz, and Elizabeth Rochon. 2013. Using text and acoustic features to diagnose progressive aphasia and its subtypes. In INTERSPEECH, pages 2177–2181. https://doi.org/10.21437/Interspeech.2013-514
Freedman, M., L. Leach, E. Kaplan, G. Winocur, K. I. Shulman, and D. Delis. 1994. Clock Drawing: A Neuropsychological Analysis. New York: Oxford University Press.
Galvin, James E. and Carl H. Sadowsky. 2012. Practical guidelines for the recognition and diagnosis of dementia. The Journal of the American Board of Family Medicine, 25(3):367–382. https://doi.org/10.3122/jabfm.2012.03.100181, PubMed: 22570400
Garrard, P., L. M. Maloney, J. R. Hodges, and K. Patterson. 2005. The effects of very early Alzheimer's disease on the characteristics of writing by a renowned author. Brain, 128(2):250–260. https://doi.org/10.1093/brain/awh341, PubMed: 15574466
Gosztolya, Gábor, László Tóth, Tamás Grósz, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Magdolna Pákáski, and János Kálmán. 2016. Detecting Mild Cognitive Impairment from spontaneous speech by correlation-based phonetic feature selection. In Proceedings of Interspeech, pages 107–111. https://doi.org/10.21437/Interspeech.2016-384
Gosztolya, Gábor, Veronika Vincze, László Tóth, Magdolna Pákáski, János Kálmán, and Ildikó Hoffmann. 2019. Identifying Mild Cognitive Impairment and mild Alzheimer's disease based on spontaneous speech using ASR and linguistic features. Computer, Speech & Language, 53(Jan):181–197. https://doi.org/10.1016/j.csl.2018.07.007
Hirst, Graeme and Vanessa Wei Feng. 2012. Changes in style in authors with Alzheimer's disease. English Studies, 93(3):357–370. https://doi.org/10.1080/0013838X.2012.668789
Hoffmann, Ildikó, Dezső Németh, Cristina D. Dye, Magdolna Pákáski, Tamás Irinyi, and János Kálmán. 2010. Temporal parameters of spontaneous speech in Alzheimer's disease. International Journal of Speech-Language Pathology, 12(1):29–34. https://doi.org/10.3109/17549500903137256, PubMed: 20380247
Holmes, David I. and Sameer Singh. 1996. A stylometric analysis of conversational speech of aphasic patients. Literary and Linguistic Computing, 11(3):133–140. https://doi.org/10.1093/llc/11.3.133
Janka, Z., A. Somogyi, E. Maglóczky, Magdolna Pákáski, and János Kálmán. 1988. Dementia szűrővizsgálat cognitív gyorsteszt segítségével. Orvosi hetilap, 129:297–299.
Jarrold, William, Bart Peintner, David Wilkins, Dimitra Vergyri, Colleen Richey, Maria L. Gorno-Tempini, and Jennifer Ogar. 2014. Aided diagnosis of dementia type through computer-based analysis of spontaneous speech. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 27–37.
Kálmán, János, E. Maglóczky, and Z. Janka. 1995. Óra Rajzolási Teszt: Gyors és egyszerű dementia szűrőmódszer. Psychiatria Hungarica, 10(3):11–18.
Kiss, Gábor and Klára Vicsi. 2017. Mono- and multi-lingual depression prediction based on speech processing. International Journal of Speech Technology, 20(4):919–935. https://doi.org/10.1007/s10772-017-9455-8
Kokkinakis, Dimitrios, Kristina Lundholm Fors, Eva Björkner, and Arto Nordlund. 2017. Data collection from persons with mild forms of cognitive impairment and healthy controls – infrastructure for classification and prediction of dementia. In Proceedings of NoDaLiDa, pages 172–182.
König, Alexandra, Aharon Satt, Alexander Sorin, Ron Hoory, Orith Toledo-Ronen, Alexandre Derreumaux, Valeria Manera, Frans Verhey, Pauline Aalten, P. Robert, and Renaud David. 2015. Automatic speech analysis for the assessment of patients with predementia and Alzheimer's disease. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 1(1):112–124. https://doi.org/10.1016/j.dadm.2014.11.012, PubMed: 27239498
Kutuzov, Andrey and Elizaveta Kuzmenko. 2019. To lemmatize or not to lemmatize: How word normalisation affects ELMo performance in word sense disambiguation. arXiv preprint arXiv:1909.03135.
Le, Xuan, Ian Lancashire, Graeme Hirst, and Regina Jokel. 2011. Longitudinal detection of dementia through lexical and syntactic changes in writing: A case study of three British novelists. Literary and Linguistic Computing, 26(4):435–461. https://doi.org/10.1093/llc/fqr013
Lehr, Maider, Emily Prud'hommeaux, Izhak Shafran, and Brian Roark. 2012. Fully automated neuropsychological assessment for detecting Mild Cognitive Impairment. In Proceedings of Interspeech, pages 1039–1042. https://doi.org/10.21437/Interspeech.2012-306
Liu, Bing. 2012. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers. https://doi.org/10.2200/S00416ED1V01Y201204HLT016
López-de-Ipiña, K., J. B. Alonso, J. Solé-Casals, N. Barroso, P. Henriquez, M. Faundez-Zanuy, C. M. Travieso, M. Ecay-Torres, P. Martínez-Lage, and H. Eguiraun. 2015. On automatic diagnosis of Alzheimer's disease based on spontaneous speech analysis and emotional temperature. Cognitive Computation, 7(1):44–55. https://doi.org/10.1007/s12559-013-9229-9
López-de-Ipiña, Karmele, Jesus-Bernardino Alonso, Carlos Manuel Travieso, Jordi Solé-Casals, Harkaitz Egiraun, Marcos Faundez-Zanuy, Aitzol Ezeiza, Nora Barroso, Miriam Ecay-Torres, Pablo Martinez-Lage, and Unai Martinez de Lizardui. 2013. On the selection of non-invasive methods based on speech analysis oriented to automatic Alzheimer disease diagnosis. Sensors, 13(5):6730–6745. https://doi.org/10.3390/s130506730, PubMed: 23698268
López-de-Ipiña, Karmele, Jordi Solé i Casals, Harkaitz Eguiraun, Jesús B. Alonso, Carlos Manuel Travieso-González, Aitzol Ezeiza, Nora Barroso, Miriam Ecay, Pablo Martinez-Lage, and Blanca Beitia. 2015. Feature selection for spontaneous speech analysis to aid in Alzheimer's disease diagnosis: A fractal dimension approach. Computer, Speech & Language, 30(1):43–60. https://doi.org/10.1016/j.csl.2014.08.002
Lunsford, Rebecca and Peter A. Heeman. 2015. Using linguistic indicators of difficulty to identify mild cognitive impairment. In Proceedings of Interspeech, pages 658–662. https://doi.org/10.21437/Interspeech.2015-235
Matsuda, Hiroshi, Takashi Asada, and Aya Midori Tokumaru. 2017. Neuroimaging Diagnosis for Alzheimer's Disease and Other Dementias. Springer.
Mirheidari, Bahman, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, and Heidi Christensen. 2017. Toward the automation of diagnostic conversation analysis in patients with memory complaints. Journal of Alzheimer's Disease, 58(2):373–387. https://doi.org/10.3233/JAD-160507, PubMed: 28436388
Mirheidari, Bahman, Daniel Blackburn, Markus Reuber, Traci Walker, and Heidi Christensen. 2016. Diagnosing people with dementia using automatic conversation analysis. In Proceedings of Interspeech, pages 1220–1224. https://doi.org/10.21437/Interspeech.2016-857
Mladenović, Miljana, Jelena Mitrović, Cvetana Krstev, and Duško Vitas. 2016. Hybrid sentiment analysis framework for a morphologically rich language. Journal of Intelligent Information Systems, 46(3):599–620. https://doi.org/10.1007/s10844-015-0372-5
Nakata, Yasuhiro, Noriko Sato, Kiyotaka Nemoto, Osamu Abe, Shoko Shikakura, Kunimasa Arima, Nobuo Furuta, Masatake Uno, Shigeo Hirai, Yoshitaka Masutani, Kuni Ohtomo, A. James Barkovich, and Shigeki Aoki. 2009. Diffusion abnormality in the posterior cingulum and hippocampal volume: Correlation with disease progression in Alzheimer's disease. Magnetic Resonance Imaging, 27(3):347–354. https://doi.org/10.1016/j.mri.2008.07.013, PubMed: 18771871
Nelson, Lucy and Naji Tabet. 2015. Slowing the progression of Alzheimer's disease; what works? Ageing Research Reviews, 23(B):193–209. https://doi.org/10.1016/j.arr.2015.07.002, PubMed: 26219494
Patocskai, A. T., Magdolna Pákáski, G. Vincze, M. Fullajtár, Irma Szimjanovszki, K. Boda, Z. Janka, and János Kálmán. 2014. Is there any difference between the findings of clock drawing tests if the clocks show different times? Journal of Alzheimer's Disease, 39(4):749–757. https://doi.org/10.3233/JAD-131313, PubMed: 24270210
Petersen, R. C. C. 2004. Mild cognitive impairment as a diagnostic entity. Journal of Internal Medicine, 256(3):183–194. https://doi.org/10.1111/j.1365-2796.2004.01388.x, PubMed: 15324362
Pistono, Aurélie, Jeremie Pariente, C. Bézy, B. Lemesle, J. Le Men, and Mélanie Jucla. 2019. What happens when nothing happens? An investigation of pauses as a compensatory mechanism in early Alzheimer's disease. Neuropsychologia, 124:133–143. https://doi.org/10.1016/j.neuropsychologia.2018.12.018, PubMed: 30593773
Roark, B., M. Mitchell, J. Hosom, K. Hollingshead, and J. Kaye. 2011. Spoken language derived measures for detecting mild cognitive impairment. Transactions on Audio, Speech, and Language Processing, 19(7):2081–2090. https://doi.org/10.1109/TASL.2011.2112351, PubMed: 22199464
Rosen, W. G., R. C. Mohs, and K. L. Davis. 1984. A new rating scale for Alzheimer's disease. American Journal of Psychiatry, 141(11):1356–1364. https://doi.org/10.1176/ajp.141.11.1356, PubMed: 6496779
Satt, Aharon, Ron Hoory, Alexandra König, Pauline Aalten, and Philippe H. Robert. 2014. Speech-based automatic and robust detection of very early dementia. In 15th Annual Conference of the International Speech Communication Association, pages 2538–2542. https://doi.org/10.21437/Interspeech.2014-544
Satt, Aharon, Alexandra Sorin, Orith Toledo-Ronen, Oren Barkan, Ioannis Kompatsiaris, Athina Kokonozi, and Magda Tsolaki. 2013. Evaluation of speech-based protocol for detection of early-stage dementia. In Proceedings of Interspeech, pages 1692–1696. https://doi.org/10.21437/Interspeech.2013-32
Scheltens, Philip, Nick Fox, Frederik Barkhof, and Charles De Carli. 2002. Structural magnetic resonance imaging in the practical assessment of dementia: Beyond exclusion. Lancet Neurology, 1(1):13–21. https://doi.org/10.1016/S1474-4422(02)00002-9
Schölkopf, Bernhard, John C. Platt, John Shawe-Taylor, Alexander J. Smola, and Robert C. Williamson. 2001. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471. https://doi.org/10.1162/089976601750264965, PubMed: 11440593
Shibata, Daisaku, Shoko Wakamiya, and Eiji Aramaki. 2016. Detecting Japanese patients with Alzheimer's disease based on word category frequencies. In Proceedings of ClinicalNLP, pages 78–85.
Stricker, N. H., B. C. Schweinsburg, L. Delano-Wood, C. E. Wierenga, K. J. Bangen, K. Y. Haaland, L. R. Frank, D. P. Salmon, and M. W. Bondi. 2009. Decreased white matter integrity in late-myelinating fiber pathways in Alzheimer's disease supports retrogenesis. Neuroimage, 45(1):10–16. https://doi.org/10.1016/j.neuroimage.2008.11.027, PubMed: 19100839
Szabó, Martina Katalin. 2015. Egy magyar nyelvű szentimentlexikon létrehozásának tapasztalatai és dilemmái. In Segédkönyvek a nyelvészet tanulmányozásához 177. Tinta, Budapest, pages 278–285.
Szabó, Martina Katalin, Orsolya Ring, Balázs Nagy, László Kiss, Júlia Koltai, Gábor Berend, László Vidács, Attila Gulyás, and Zoltán Kmetty. 2020. Exploring the dynamic changes of key concepts of the Hungarian socialist era with natural language processing methods. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 54(1):1–13. https://doi.org/10.1080/01615440.2020.1823289
Szabó, Martina Katalin, Veronika Vincze, and Gergely Morvay. 2016. Magyar nyelvű szövegek emócióelemzésének elméleti nyelvészeti és nyelvtechnológiai problémái. In Távlatok a mai magyar alkalmazott nyelvészetben. Tinta, Budapest, pages 282–292.
Taler, Vanessa and N. A. Phillips. 2008. Language performance in Alzheimer's disease and mild cognitive impairment: A comparative review. Journal of Clinical and Experimental Neuropsychology, 30(5):501–556. https://doi.org/10.1080/13803390701550128, PubMed: 18569251
Thomas, Calvin, Vlado Kešelj, Nick Cercone, Kenneth Rockwood, and Elissa Asp. 2005. Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech. In Mechatronics and Automation, 2005 IEEE International Conference, volume 3, pages 1569–1574.
Tóth, László, Gábor Gosztolya, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Edit Biró, Fruzsina Zsura, Magdolna Pákáski, and János Kálmán. 2015. Automatic detection of mild cognitive impairment from spontaneous speech using ASR. In 16th Annual Conference of the International Speech Communication Association, pages 2694–2698. https://doi.org/10.21437/Interspeech.2015-568
Tóth, László, Ildikó Hoffmann, Gábor Gosztolya, Veronika Vincze, Gréta Szatlóczki, Zoltán Bánréti, Magdolna Pákáski, and János Kálmán. 2018. A speech recognition-based solution for the automatic detection of mild cognitive impairment from spontaneous speech. Current Alzheimer Research, 15(2):130–138. https://doi.org/10.2174/1567205014666171121114930, PubMed: 29165085
Vincze, Veronika. 2014. Uncertainty detection in Hungarian texts. In Proceedings of Coling, pages 1844–1853.
Vincze, Veronika, Gábor Gosztolya, László Tóth, Ildikó Hoffmann, Gréta Szatlóczki, Zoltán Bánréti, Magdolna Pákáski, and János Kálmán. 2016. Detecting Mild Cognitive Impairment by exploiting linguistic information from transcripts. In Proceedings of ACL, pages 181–187. https://doi.org/10.18653/v1/P16-2030
Weiner, Jochen, Christian Herff, and Tanja Schultz. 2016. Speech-based detection of Alzheimer's disease in conversational German. In Proceedings of Interspeech, pages 1938–1942. https://doi.org/10.21437/Interspeech.2016-100
Yin, Changhao, Siou Li, Weina Zhao, and Jiachun Feng. 2013. Brain imaging of mild cognitive impairment and Alzheimer's disease. Neural Regeneration Research, 8(5):435–444.
Zimny, A., P. Szewczyk, E. Trypka, R. Wojtynska, L. Noga, J. Leszek, and M. Sasiadek. 2011. Multimodal imaging in diagnosis of Alzheimer's disease and amnestic mild cognitive impairment: Value of magnetic resonance spectroscopy, perfusion, and diffusion tensor imaging of the posterior cingulate region. Journal of Alzheimer's Disease, 27(3):435–444. https://doi.org/10.3233/JAD-2011-110254, PubMed: 21841260
Zsibrita, János, Veronika Vincze, and Richárd Farkas. 2013. magyarlanc: A toolkit for morphological and dependency parsing of Hungarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP, pages 763–771.