
Article

Communicated by Terrence Sejnowski

Hidden Aspects of the Research ADOS Are Bound
to Affect Autism Science

Elizabeth B. Torres
ebtorres@psych.rutgers.edu
Psychology Department; Computer Science, Center for Biomedical Imaging
and Modeling; and Rutgers University Center for Cognitive Science,
Rutgers University, Piscataway, NJ 08854, U.S.A.

Richa Rai
richarai9@gmail.com
Psychology Department, Rutgers University, Piscataway, NJ 08854, U.S.A.

Sejal Mistry
sejal.mistry@hsc.utah.edu
Mathematics Department, Rutgers University, Piscataway, NJ 08854, U.S.A.

Brenda Gupta
brendap.patel@gmail.com
Montclair State University, Montclair, NJ 07043, U.S.A.

The research-grade Autism Diagnostic Observational Schedule (ADOS)
is a broadly used instrument that informs and steers much of the science
of autism. Despite its broad use, little is known about the empirical
variability inherently present in the scores of the ADOS scale or their
appropriateness to define change and its rate, to repeatedly use this test to
characterize neurodevelopmental trajectories. Here we examine the em-
pirical distributions of research-grade ADOS scores from 1324 records in
a cross-section of the population comprising participants with autism between
5 and 65 years of age. We find that these empirical distributions
violate the theoretical requirements of normality and homogeneous variance,
essential for independence between bias and sensitivity. Further,
we assess a subset of 52 typical controls versus those with autism and find
a lack of proper elements to characterize neurodevelopmental trajecto-
ries in a coping nervous system changing at nonuniform, nonlinear rates.
Repeating the assessments over four visits in a subset of the participants
with autism for whom verbal criteria retained the same appropriate
ADOS modules over the time span of the four visits reveals that switch-
ing the clinician changes the cutoff scores and consequently influences
the diagnosis, despite maintaining fidelity in the same test’s modules,
room conditions, and tasks’ fluidity per visit. Given the changes in

Neural Computation 32, 515–561 (2020)
https://doi.org/10.1162/neco_a_01263


E. Torres, R. Rai, S. Mistry, and B. Gupta

probability distribution shape and dispersion of these ADOS scores, the
lack of appropriate metric spaces to define similarity measures to charac-
terize change and the impact that these elements have on sensitivity-bias
codependencies and on longitudinal tracking of autism, we invite a
discussion on readjusting the use of this test for scientific purposes.

1 Introduction

Autism is an umbrella term that groups a highly heterogeneous set of conditions,
ranging from problems with abstract thinking within a social context
to profound somatic sensory motor differences. Any random draw of the
population with this diagnosis may have extremely different phenotypes.
More important, it may have very different genotypes that go on to receive a
similar diagnosis of autism (see Figure 1). This heterogeneity poses a problem
to science because it becomes challenging to do basic research aimed at
developing treatments that target the person’s needs while leveraging the
person’s capabilities and predispositions to learn and adapt within natu-
ral and social environments. Inherent to neurodevelopment is the ability of
the nascent human nervous systems to develop overcompensatory strategies
to cope with a disorder, yet in the current diagnoses of autism, there is
no room to extract what those coping capabilities are or how to foster them
while treating the condition.

The diagnostic criteria of the DSM-5 from the American Psychiatric
Association (APA, 2013) have broadened to include attention deficit
hyperactivity disorder and sensory issues, while one of several psychological
counterparts, the ADOS (Lord et al., 2000), can now include toddlers.
With younger children receiving the diagnosis and broader criteria to
diagnose, there are no appropriate medical interventions today that target
the coping capacity of the nervous systems and identify in a personal-
ized manner the best route to initiate treatment. Whether psychotropic
drugs recommended by psychiatrists or behavioral treatments recom-
mended by psychologists, the broad spectrum of autism today has no
treatments that capitalize on what the nervous system already does well.
There is a one-size-fits-all model to intervene from a very early age,
informed and driven by behavioral (observationally defined) but not
physical outcome measures of treatment effectiveness. Indeed, a recent
report to the U.S. Senate¹ on the progress of the Autism Collaboration,
Accountability, Research, Education, and Support (CARES) Act (2014)

¹ Report to the Committees on Armed Services of the Senate and House of
Representatives, Department of Defense Comprehensive Autism Care Demonstration June 2018,
Report on Efforts Being Conducted by the Department of Defense on Applied Behavior Analysis
Services, requested by: Senate Report 114-49, p. 157, accompanying S. 1376, National
Defense Authorization Act for Fiscal Year 2016.


Hidden Features of the ADOS Test May Skew Autism Detection Rates


Figure 1: Impossible-to-stratify autism subtypes for research purposes with the
Autism Diagnostic Observational Schedule, Module 2 (ADOS-2). Two participants
with different phenotypes and different genotypes (A-idiopathic versus
B-fragile X) received the same autism diagnosis from the ADOS-2 (score of 17
denoting autism) by the same clinician. Wearable sensors capturing 3.5 s of
acceleration, body orientation rotations, and temperature show very different raw
data waveforms from the child and clinician during the task Free Play of module
1 in the ADOS-2 test. Stochastic signatures derived from the fluctuations in
bodily acceleration are shown as a map of empirically estimated gamma moments.
These were derived from the fluctuations in the acceleration amplitude (i.e.,
the spike peaks) normalized to account for allometric effects due to anatomical
differences. These normalized peaks’ fluctuations, converted to unitless
micromovement spikes (M-spikes), are from the right and left wrists and torso of the
child and the clinician. The raw data are from synchronously registered motions
of their upper body as they socially interact during this task. The disparate
signatures for these participant-clinician dyads are shown using the empirically
estimated mean (x-axis), variance (y-axis), skewness (z-axis), and kurtosis
(proportional to the size of the marker) of the empirically estimated continuous gamma
family (PDF insets). Notice that scales are different (for visualization purposes)
due to large differences in data range.
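
As a hedged illustration of the "empirically estimated gamma moments" named in the caption, the sketch below fits a gamma distribution to a set of positive, unitless values by the method of moments and returns the four quantities that the figure plots. The function name `gamma_moments`, the sample values, and the choice of a method-of-moments fit are assumptions made for illustration; the caption does not state which estimator was used.

```python
from statistics import fmean, pvariance


def gamma_moments(samples):
    """Method-of-moments gamma fit; returns shape, scale, and four moments."""
    if len(samples) < 2 or min(samples) <= 0:
        raise ValueError("need at least two strictly positive samples")
    mean = fmean(samples)
    var = pvariance(samples, mu=mean)
    if var == 0:
        raise ValueError("samples have zero variance")
    shape = mean ** 2 / var  # gamma shape parameter k
    scale = var / mean       # gamma scale parameter theta
    return {
        "shape": shape,
        "scale": scale,
        "mean": shape * scale,           # k * theta
        "variance": shape * scale ** 2,  # k * theta^2
        "skewness": 2.0 / shape ** 0.5,  # 2 / sqrt(k)
        "excess_kurtosis": 6.0 / shape,  # 6 / k
    }


# Hypothetical normalized spike amplitudes (NOT data from the study).
m_spikes = [0.42, 0.55, 0.61, 0.48, 0.70, 0.52, 0.58, 0.45]
print(gamma_moments(m_spikes))
```

Under a method-of-moments fit, the fitted mean and variance equal the sample mean and variance by construction, so the skewness and kurtosis carry the additional shape information.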

signed into law by President Obama and extended on September 30,
2019, by the current administration, reveals that behaviorally defined
interventions to treat behaviors such as those defined by the ADOS-2
instrument do not rise to the standards of the American Medical Association.
As such, medical insurance providers will limit medical insurance
coverage unless these behaviorally based approaches, as defined by these
behavioral instruments, are medically relevant. The need to improve the
medical research and the resulting medical treatments for autism is now
more evident than ever before owing to the large aging adult autistic


population in need of medical support and corresponding medical insur-
ance coverage in the United States.

Because the statistical assumptions underlying the behaviorally defined
detection criteria that guide and inform the scientific research are not based
on empirical data from physical measurements, but rather on assumed
expectations defined by subjective observation, it is difficult to uncover and
define medical target treatments tailored to the person’s specific phenotypic
and genotypic characteristics and aimed at treating the medical issues. This
is so because this important capacity for adaptation in the autistic nervous
systems remains hidden to the naked eye of the observer trained to catch
pre-set expected aspects of social responses to specific social presses. Social
behavior is much too complex and dynamic to compartmentalize in such
ways. In so doing, one risks a gross loss of data that is relevant to these
pressing medical issues. Such information can also be of use in stratifying
the many subtypes of autism that we now see in our labs. We see and quantify
(e.g., using advanced wearable instruments) a variety of medical conditions
in children who have identical autism scores (e.g., as in Figure 1): for
example, dysautonomia (dysregulated heart rhythms, food aspiration
owing to swallowing issues, peristalsis dysfunction, sphincter dysfunction,
gut autonomy dysfunction, delayed reflexes, and seizures, among
others), excess tolerance to pain, temperature dysregulation, metabolic
dysregulation, altered microbiota, and an overall profound lack of autonomous
neuromotor control (e.g., frequent falls, vestibular dysfunction,
abnormal vestibulo-cochlear and vestibulo-ocular reflexes, balance issues,
gait abnormalities). While these were rendered “comorbid” conditions by
the subjective psychological and psychiatric instruments that behaviorally
define autism, their prevalence among the population has now alerted
medical insurance providers in the United States to the urgent need to address
these medical issues in both basic and translational research. Indeed,
the Autism CARES Act approved in 2019 allocated a budget of $3.87 billion to that end.

In recent years, the scientific community has raised the need to stratify autism into various autism subtypes to facilitate the path toward more appropriate, personalized treatments. But reliance on instruments that have no physical measurements and rely exclusively on behavioral observation and behavioral criteria impedes progress toward a personalized approach. Indeed, several debates have been published surrounding the controversial reliance on the research-grade ADOS (Lord et al., 2000) for use in scientific autism research (Constantino & Charman, 2016). The field seems to have reached a point where some autism researchers advocate the importance of tracking and quantifying the biorhythms of the nervous systems (Constantino et al., 2017; Klin, 2008; Klin, Jones, Schultz, Volkmar, & Cohen, 2002a, 2002b; Tordjman et al., 2015; Torres, Brincker, et al., 2013; Torres & Denisova, 2016; Torres, Isenhower, et al., 2016) and the use of biophysical data to adapt tenets of precision medicine (Hawgood, Hook-Barnard,

Figure 2: The precision medicine paradigm connecting different layers of the network of knowledge to develop personalized target treatments. When translating this model to psychiatry and psychology, observation of behavior can be complemented by digital data from wearable biosensors. (Panel A from Hawgood, Hook-Barnard, O’Brien, & Yamamoto, 2015. Reprinted with permission from AAAS.)
(B) Commercially available wearable biosensors affording the level of precision of research-grade sensors to, for example, separate activities of daily living that involve voluntary versus involuntary motions. Panel B shows the output of the Apple Watch distinguishing walking from pointing activities, along with the categorization of activities of daily life from accelerometer data collected with a smart phone. (Panel B from Torres, Vero, & Rai, 2018.)

O’Brien, & Yamamoto, 2015; Insel, 2014) to a nascent field of precision psychiatry (Friston, Redish, & Gordon, 2017; Torres, Isenhower, et al., 2016).

The precision medicine paradigm (see Figure 2A) aims for the development of personalized targeted drug treatments that successfully combine different layers of the knowledge network spanning from patients’ self-reports to genomic data. In the field of cancer research, the tenets of precision medicine have been implemented, and several target treatments have been successfully developed. Yet adoption of this paradigm has lagged in fields that deal with neurological and neuropsychiatric disorders. These fields rely heavily on observation of behavior and behavioral definitions of clinical phenotypes. Consequently, methods that describe behavioral phenotypes through patients’ self-reports or through scoring reports by accredited professionals remain disconnected from basic scientific methods that assess various underpinnings of behavior through objective physical means with the purpose of defining medical conditions (e.g., neurological, immunological, endocrine/metabolic). This is perhaps the case because research-grade instrumentation used to perform such measurements in laboratory settings had not been widely available to the clinical field until the very recent revolution in wearable biosensors.
The advent of wearables that nonintrusively record biorhythmic activities self-generated by the nervous systems has made such a level of precision to measure behaviors commercially available to all (e.g., see Figure 2B).

Other fields have adopted the digital technology and begun the path of integration with clinically validated inventories. For example, recent work in Parkinson’s disease (PD) has digitized the Unified Parkinson’s Disease Rating Scale (UPDRS) and used it in research settings to help stratify different subtypes of PD (Ryu, Vero, Dobkin, & Torres, 2019). Similarly, the digitized ADOS (Whyatt & Torres, 2017) is amenable to integrating clinical behavioral criteria defining and detecting autism with digital biomarkers of behavior to help stratify different subtypes of autism (see appendix Figures 12 and 13) and redefine the behavioral phenotype with far higher underlying precision.

As autism is a lifelong condition, those who were diagnosed in the 1970s through behavioral criteria are aging autistic adults today. Yet with the prevalence since then of behaviorally defined therapies devoid of neurological criteria, their nervous systems never received neuroprotective therapies to scaffold autonomy and promote agency. It has been reported that among aging autistic adults over 40 years of age, symptoms of motor disorders and Parkinsonism are far more prevalent (20%) than among those over 65 years of age without autism (0.9%) (Starkstein, Gellar, Parlier, Payne, & Piven, 2015). These findings extend to excess involuntary movements in autistic individuals 5 to 40 years of age, even for those who did not take psychotropic medications with motor side effects (Torres & Denisova, 2016). The current results point to an accelerated path toward neurodegeneration that calls for medically based research, defined according to medical and physiological standards, now absent from the (behaviorally defined) detection criteria.
The motivation behind our work has been to extend the monologue style of ADOS scoring highlighting social deficits to a dialogue dyadic style that aims at identifying and scoring inherent capacity for social exchange already present in the child’s nervous systems, that is, those escaping observation. Such hidden capabilities in autistic individuals are not immediately obvious to the naked eye of a clinician trained to expect certain aspects of the social behavior to be present during the exchange and coding only those expectations. The potentially overcompensatory biological coping strategies are capturable in the continuous digital data of the dyad. In this sense, such data could give us an entry point into the nervous systems’ self-generated biorhythms to help create support and augment sensory systems for the affected person. As a result of this more holistic approach to autism, we could help treat the child’s somatic-sensory-motor systems, scaffolding all aspects of neurodevelopmental readiness for social exchange and supporting the person with age-dependent accommodations across the life span.

In the process of testing the robustness of the biometrics of social exchange that we derived from the digitized ADOS scoring across different testing contexts, we had different clinicians test the same participant within the same room layout and same ADOS module. To our surprise, we captured dramatic differences in the bodily responses of the same child to different clinicians administering the same module and testing the child under similar room conditions (digital results reported elsewhere).
Because these changes in the digitized behavioral responses of the child are not included in the ADOS criteria and because very likely these activities are largely beneath the clinician’s awareness, away from naked eye detection capacity, we decided to examine the scores that the two clinicians conferred to the child’s responses. Here, we ask whether such subtle differences in the child’s behavioral outputs would also affect clinicians’ scoring.

As the child’s bodily and facial micromotions were different for each clinician, it may also be the case that the type of visual feedback that these subtle motions from the child offer to the clinician changes the diagnosis for the same child in a clinician-dependent manner. Did the subtle changes in the micromotions of the child rise to the level of detection such that the scoring changed? We found large differences in clinicians’ ratings of the same child, for the same module of the test administered under similar room conditions. This discrepancy motivated us to further examine these phenomena more systematically in our lab and to study the ADOS scores more generally in large, open-access data sets. What are the statistical features of these empirical scores reported by research-reliable ADOS testers? And what can these scores tell us about the form of autism that we know today in the research arena?

In this study, we first assess clinicians’ ADOS scoring of a modest cohort of autistic children, using the ADOS as an experimental protocol, where we vary the clinician and the module type for the same child. We then, for the first time, compare the performance of the ADOS in neurotypical children of different ages. And finally, we examine the statistical features of these scores taken for a large cohort, across different research sites in the United States and abroad. These scores are openly accessible in the Autism Brain Imaging Data Exchange (ABIDE) repository.
We discuss our results considering the use of this test in basic scientific research to guide and inform genetic research in autism aimed at developing target treatments within the tenets of precision medicine.

2 Methods

The Rutgers University Institutional Review Board (IRB) committee approved all protocols used in this study. Parental and legal guardian consent was obtained for all participants. All procedures were performed in compliance with the Helsinki Act under IRB approval. For the consent process, we read and explained to the parents the IRB-approved protocol and the child assent form. After the informed-consent process, all families agreed to participate and signed written parental permission and assent forms. The participants’ names and identifiers were removed to maintain confidentiality throughout the analyses. Further, deidentified data were used in all sections of this article, including the tables.

2.1 The ADOS Assessment. The ADOS assessment is a tool to aid the diagnosis of autism. It is often used in combination with other tools such as the Diagnostic and Statistical Manual (DSM-5; American Psychiatric Association, 2013) and the ADI-R screening tool. ADOS had at some point four modules (now five modules and training for toddlers’ and adults’ levels are more common). Each module is designed to provide the most appropriate test for an individual at a certain language level. There is a newly created calibrated severity score (CSS) in ADOS-2. It is based on the person’s age, as it converts an individual’s total ADOS-2 score in comparison to other individuals with ASD at the same age and language level (for each individual module).
For the actual ADOS-2 administration, age does not determine the module but may determine algorithm items within that module. We note, however, that two children of the same age will likely have very different neuromotor control ages, as assessed by objective physiological metrics. In general, age has no real meaning in neurodevelopmental disorders, where the coping nervous systems of different individuals born at similar dates evolve at very different rates (Torres, Brincker, et al., 2013).

In this experimental setup, the most appropriate module was determined by two experienced clinicians: a developmental pediatrician and a developmental clinical psychologist. Two clinically certified raters independently videotaped and discussed the sessions to ensure module administration fidelity. The clinicians administering the test were not involved in any aspect of the design of this study.

The two clinicians who administered the ADOS to the participants were research reliable: they are trained professionals who have undergone ADOS-2 clinical training. ADOS-2 clinical training is an introductory workshop with instructional methods that include lectures, videos, demonstrations of administration and scoring, and discussions. This workshop serves as a prerequisite for more thorough training required to obtain research reliability needed to use the ADOS-2 for research purposes. “Research reliable” means that prospective trainers have submitted ADOS-2 administration videos to fellow research-reliable examiners and have received at least an agreement of 80% of similar scores on the ADOS-2 administration as displayed through the submitted video.
The prospective research-reliable trainee and the already research-reliable trainer have at least 80% similar scoring on the ADOS-2 administration video submitted by the trainee. Three videos must be submitted, and a minimum of 80% similar scores must be achieved. Once they passed the requirement, they became research-reliable ADOS raters.

The information about the four modules that we used is taken from the manual and briefly summarized below. The summaries are not meant to be complete (see the supplementary table); when in doubt, consult the ADOS manual:

• Module 1: This is designed for individuals who do not have consistent verbal communication skills. The tasks use completely nonverbal scenarios for scoring.
• Module 2: This is designed for individuals who have minimal verbal communication skills, including young children at age-appropriate skill levels, whereby tasks require moving around the room and interacting with objects. All objects are standard and come in a standardized kit.
• Module 3: This is designed for individuals who are verbally fluent. These participants are also capable of playing with age-appropriate toys. The test is conducted largely at a desk or table. In our setup, the table was always the same, and it was positioned in the room in the same configuration. The room where the test took place was not changed from session to session and participant to participant. Researchers and ADOS-certified personnel in the study (four total) made sure that the conditions were identical in each session, task, child, rater, and module.
• Module 4: This is designed for individuals who are verbally fluent but no longer at an age to play with toys. This module incorporates some module 3 elements, yet it is more conversational regarding daily living experiences.

Often the examiner chooses a module and then realizes that the participant’s functional abilities anticipated by that module do not match the rater’s expectation. Then the tester chooses another module.
This is a common practice. Therefore, our experiment manipulated the module type to probe participants’ responses to the same module administered by two different raters or to determine the adequacy of the modules.

The modules involving playing with toys or objects have the tester present standardized scenarios to evoke responses and rate the child’s performance. Some elements of the game (e.g., a puzzle) are left out on purpose to evoke the child’s need to ask for other pieces. The examiner uses several strategies to evoke different responses and assess the child’s reaction. This mode of testing behaviors inherently makes certain assumptions about social expectations that do not necessarily transfer from culture to culture. Further, the ADOS states that no sensorimotor issues are present without ascertaining the intactness of the child’s peripheral nervous systems before administering the test. A child’s response may not be voluntary. When the nervous systems are damaged, there is an inevitable component of the response that is never evaluated during the test because of the test’s assumption that no significant sensory and motor issues are present. As such, it is never possible to assess causality. For all these reasons and because our lab receives a highly diverse population from many different cultures and neuromotor developmental stages, the experimental protocol manipulated the variable representing the rater while maintaining constant all other conditions (e.g., context, module, room, layout of all objects in the room) and having two other independent ADOS-certified raters videotape the sessions.
As required by the ADOS, raters flexibly adapted each session to the child’s responses to the flow of the tasks while maintaining fidelity to the tasks employed in each module of choice. The response of the child determines the score. Likewise, the way in which the rater evokes the response influences the child’s choice of actions that are consequential to the rater’s provoking actions. To probe the extent to which a change in the tester influenced the scoring, our experiment manipulated the rater as a parameter while holding all other conditions constant in two visits. Participants had four visits to the lab. For each participant, two modules were selected, and research-reliable testers were employed. One module was rendered the more adequate one, while the other was rendered the feasible one. By “more adequate,” we mean that the module was at the child’s verbal and developmental levels; “feasible” means that the child could perform the entire module, but it would not be the adequate one to perform a diagnosis or aid a clinician in performing a diagnosis of autism. We note that previous research indicates that inappropriate ADOS module use invalidates the assessment and the scores do not accurately reflect the child’s performance on the assessment. Nevertheless, since this study is not about diagnosing autism but about evaluating the use of this ADOS test in basic science, specifically assessing the variability that using different raters may add to the scores, in addition to changing the rater, we are also manipulating the use of the modules across visits.

Each module took between 40 and 60 minutes to complete. Both the rater and the participant were recorded by two video cameras from different angles and by smart sensors that they wore embedded in the clothing, as watches on the wrists and on the ankles. (The digital data will be the subject of a different publication. Additional information about the ADOS test can be found in the supplementary table.)
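
The research-reliability criterion described earlier in this section (at least 80% similar scores between trainee and trainer on submitted videos) can be sketched as an item-wise percent agreement. This is a minimal sketch under stated assumptions: the text does not specify how "similar scores" are computed, so the code assumes exact matches on item codes and assumes the threshold must be met on each of the three videos. The function names and all item codes are hypothetical.

```python
def percent_agreement(trainee_scores, trainer_scores):
    """Percentage of items on which two raters assigned the same code."""
    if len(trainee_scores) != len(trainer_scores) or not trainee_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(trainee_scores, trainer_scores))
    return 100.0 * matches / len(trainee_scores)


def is_research_reliable(videos, threshold=80.0):
    """True if every (trainee, trainer) pair reaches the agreement threshold."""
    return all(percent_agreement(t, r) >= threshold for t, r in videos)


# Hypothetical item codes for three submitted videos: (trainee, trainer).
videos = [
    ([0, 1, 2, 0, 1, 0, 2, 1, 0, 0], [0, 1, 2, 0, 1, 0, 2, 1, 0, 1]),
    ([1, 1, 0, 2, 2, 0, 0, 1, 1, 0], [1, 1, 0, 2, 1, 0, 0, 1, 0, 0]),
    ([2, 0, 0, 1, 1, 2, 0, 0, 1, 1], [2, 0, 0, 1, 1, 2, 0, 0, 1, 1]),
]
for trainee, trainer in videos:
    print(percent_agreement(trainee, trainer))
print(is_research_reliable(videos))
```

Raw percent agreement does not correct for agreement expected by chance, which is one reason a rater pair can pass this kind of criterion and still differ on the totals that determine a cutoff.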
2.2 Assessment of the ADOS Scores in the Lab. We measured the outcome of the ADOS-2 test in 52 individuals: 26 controls (ages 7 to 66 years old) and 26 suspected to have (and then diagnosed with) autism spectrum disorders (ages 4 to 20 years old). We used the baseline visit 1 to characterize the participants diagnosed with ASD and those who were neurotypical. This was done to ascertain the extent to which the range of scores from typical control participants deviates from the 0 scores denoting the absence of behavior otherwise present and contributing to the overall cutoff number. We were motivated by the critical need to create a similarity metric for autism research measuring departure from normative data. Given the adoption of the research-grade ADOS for research and the fact that this observational inventory is not a norm-referenced test (Lord et al., 2000), we ascertained the spread of scores obtained from neurotypicals. This gave us a sense of departure from typical ranges.

The absence of neurotypical assessment using the ADOS is not well known in the community that adopts the test for research. We quote from the ADOS-G paper: “Replication of psychometric data with additional samples including more homogeneous non-Autistic populations and more individuals with pervasive developmental disorders who do not meet Autism criteria, establishing concurrent validity with other instruments, evaluation of whether treatment effects can be measured adequately, and determining its usefulness for clinicians are all pieces of information that will add to our understanding of its most appropriate use.” Table 1 shows the 52 participants’ scores and ages at the baseline visit, when the most appropriate ADOS-2 module was selected for each child.
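
To make "the spread of scores obtained from neurotypicals" concrete, the sketch below summarizes the visit 1 total scores listed in Table 1 for the 26 controls and the 26 participants with ASD. The values are transcribed from the table; the `spread` helper and the choice of summary statistics (minimum, median, maximum, share of nonzero totals) are illustrative assumptions, not the authors' analysis.

```python
from statistics import median

# V1 Total scores transcribed from Table 1.
control_totals = [0, 0, 2, 2, 0, 2, 9, 0, 1, 1, 2, 2, 1,
                  0, 1, 2, 0, 0, 0, 0, 1, 3, 0, 8, 0, 2]
asd_totals = [10, 9, 17, 7, 9, 9, 12, 18, 13, 17, 8, 17, 11,
              16, 10, 8, 11, 11, 26, 18, 24, 19, 23, 6, 21, 7]


def spread(scores):
    """Distribution-free summary of a list of total scores."""
    ordered = sorted(scores)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": median(ordered),
        "max": ordered[-1],
        "share_nonzero": sum(s > 0 for s in ordered) / len(ordered),
    }


print("controls:", spread(control_totals))
print("ASD:     ", spread(asd_totals))
```

In this transcription, 15 of the 26 control totals are nonzero, and two control totals (8 and 9) are at or above the lowest totals among participants with ASD (6 and 7), so the two ranges overlap.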
2.3 Repeated Measurements of Scores (But Not as Longitudinal Assessment of Change). We reassessed 14 of the individuals with ASD (mean age 9.3 ± 3.0 years) across four visits taking place within 1.3 years on average (±6 months) to determine the extent to which switching the clinician or the ADOS module (or both) would change the outcome of the test for the same child. To that end, for each child, in the first two visits, we used the same clinician but employed two different ADOS modules. According to each assessment in each visit, the raters determined the modules that were the most appropriate and feasible. From these assessments, the 14 individuals described are those for whom the second round of visits (visits 3 and 4) retained the same appropriate and feasible modules despite the passage of time. The first module (visit 1) determined the most appropriate module at baseline for the given child. The second module (visit 2) was feasible (the child could do it) but not appropriate. For example, if the most appropriate module in visit 1 was module 3, we would choose module 2 for visit 2. Then the same clinician would give these two modules whenever the participant retained them. That is, the clinician determined that the module for visits 3 and 4 was the same as the module for visits 1 and 2. Then the set of modules from visits 1 and 2, according to the raters’ evaluation, was administered in

Table 1: Baseline ADOS-2 with 26 Participants with ASD and 26 Controls.
ID V1 ASD Age Mod V1 V1 SA RRB V1 Total ID V1 Ctrl Age Mod V1 V1 SA RRB V1 Total 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 4 8 10 13 6 6 11 5 9 6 14 10 4 11 9 7 7 11 4 8 10 13 10 4 18 20 3 3 3 3 3 3 3 1 1 3 1 1 3 1 3 3 3 3 1 3 1 1 2 2 3 4 7 6 14 6 6 6 10 15 11 14 7 11 9 13 6 6 9 10 20 13 16 11 15 2 15 6 3 3 3 1 3 3 2 3 2 3 1 6 2 3 4 2 2 1 6 5 8 8 8 4 6 1 10 9 17 7 9 9 12 18 13 17 8 17 11 16 10 8 11 11 26 18 24 19 23 6 21 7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 8 10 9 12 7 7 10 9 7 11 7 15 11 13 31 49 48 38 29 30 32 22 48 20 66 43 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 0 0 1 2 0 2 4 0 1 1 2 1 1 0 1 2 0 0 0 0 0 1 0 7 0 2 0 0 1 0 0 0 5 0 0 0 0 1 0 0 0 0 0 0 0 0 1 2 0 1 0 0 0 0 2 2 0 2 9 0 1 1 2 2 1 0 1 2 0 0 0 0 1 3 0 8 0 2 those subsequent visits (by a different rater): module 3 in visit 3 and module 2 in visit 4 in this example (见表 2). We switched clinicians and maintained module fidelity and identical room setup. 这样, each child had a chance to become familiar with the two modules by the time that we switched the clinician. Those two same modules would then be fluidly administered by the new clinician to give us the opportunity to probe the influences of the clinician on the child’s response. The flexibility in task administration according to the child’s re- sponses was respected to ensure fluid responses. We hypothesized that this switching of clinicians (despite the use of the same modules’ and tasks’ order in each administration) would have a sub- stantial effect on the ADOS subscores, thus significantly affecting the relia- bility of the total score and the cutoff for the diagnosis given to the child by the rater (clinician). 为了检验这个假设, we used nonparametric statis- tics whereby we do not assume any distribution a priori. 桌子 2 shows the 14 scores across the four visits. l 从http下载 : / / 直接的 . 米特 . 
Table 2: Repeated Assessment of 14 Participants across 4 Visits, 2 Different ADOS-2 Modules, and 2 Raters (Aut, Autism; S, Autism Spectrum).
Rows: participant IDs 1 to 14. Columns, repeated for each of visits V1 to V4: Mod, Age, SA, RRB, Tot, Dx.
Finally, since the passage of time inevitably affects any longitudinal study and since development occurs in these children under unexpected or hidden coping mechanisms of their physical development, we also normalized by age at the time of each visit. In addition to the absolute scores above (to test for absolute value effects), we use relative scores (to test for derivative effects) that assess the rate of change in score per unit of age. The latter denote dynamic changes expected to occur in childhood owing to the nonlinear, accelerated, and irregular nature of neurodevelopment (Torres, Smith, Mistry, Brincker, & Whyatt, 2016). As in the previous analyses, we employed nonparametric tests here.

2.4 Assessment of the ADOS Scores in the Open Access Autism Brain Imaging Data Exchange Repository (ABIDE). The ABIDE records contain ADOS-2 and ADOS-G scores that we extracted to plot the distributions of these tests’ subscores across 1324 participants, ranging between 5 and 65 years of age. We present the distribution of participants in Figure 3, comprising clinical records from ABIDE I and II. Preliminary results from this work involving involuntary head motions during resting state in fMRI studies were published elsewhere (Caballero, Mistry, Vero, & Torres, 2018; Torres & Denisova, 2016; Torres, Mistry, Caballero, & Whyatt, 2017). Here we focus on the nature of the distributions of the ADOS scores, assumed to be symmetric by the ADOS manual and by the reliability and validity tests commonly used in clinical psychology to validate this diagnostic tool driving basic scientific research (Havdahl et al., 2017, 2016).

Clinical tools for diagnoses use the tenets of signal detection theory (SDT; Swets, 1996; Swets & Pickett, 1982), albeit in a black box approach that has yet to verify the implicit assumptions made by the statistical packages that such papers report.
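For readers less familiar with the SDT quantities at stake, the following minimal Python sketch shows how sensitivity and response bias are conventionally derived from a hit rate and a false-alarm rate under the equal-variance Gaussian model. The function name and the example rates are hypothetical and are not taken from this study; the formulas are only meaningful when the normality and equal-variance assumptions actually hold.

```python
from math import exp
from statistics import NormalDist


def sdt_indices(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion_c, beta) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf              # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa                # sensitivity
    criterion_c = -0.5 * (z_hit + z_fa)   # location of the decision criterion
    beta = exp(d_prime * criterion_c)     # likelihood ratio at the criterion
    return d_prime, criterion_c, beta


# Hypothetical rater: 90% hits, 20% false alarms.
d_prime, criterion_c, beta = sdt_indices(0.90, 0.20)
print(round(d_prime, 3), round(criterion_c, 3), round(beta, 3))
```

Because d′ and β are both computed from the same two z-scores, they are only separable measures when the underlying score distributions are Gaussian with equal variance.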
It has been suggested that nonparametric methods to correct for nonnormality may be inadequate in a diagnostic test (Witt, Taylor, Sugovic, & Wixted, 2015). This may be even more problematic in a test that has not mapped out the noise levels inherent in the person’s behaviors. For example, in the ADOS, as in many other clinical tests, the clinician plays the dual role of being the stimulus (via the prompts to evoke social overtures or primed responses, or both) and, at the same time, the observer, scoring the participant’s response. The participant’s responses contain motor noise with specific age-dependent signatures (Torres, Brincker, et al., 2013) that are not currently considered in the ADOS scores. Nevertheless, they are a subliminal source of information (i.e., through visual feedback) that the raters may unconsciously use to determine the scores that eventually help detect autism. The detection step is hindered by the very outcome of this test, which is aimed at measuring social interactions between two people. The process of scoring is heavily one-sided. The outcomes depend on the clinician’s observation. In the absence of the required assumptions of normality and variance homogeneity, the sensitivity (d′) and response bias (β) do not represent accurate and independent measures (Swets & Pickett, 1982; Witt et al., 2015). There are codependencies between them. When we add subtle micromotions present in the bodily biorhythms of the child, noise that comes from them in autistic individuals may also have an impact on the visual feedback that the clinician receives. How this inherent motor noise affects scoring (via visual feedback and unconscious mirroring) is unknown, but it may be a contributing factor in the scoring of the child’s responses.

Figure 3: ABIDE data sample. (A) Participants (2643) according to the diagnosis and ADOS information from 1324 participants in the spectrum of autism divided by sex: 481 female and 2162 male participants. (B) Histograms are also divided by age and sex for TD, ASD, AS, and distributions of participants who performed module 3 or module 4.

Because of the broad use of this test in autism research and the possibility that the assumptions required for research validity may not be met, it becomes crucial to verify the normality assumption of the distributions of scores and subscores, along with the assumption that the rater’s bias and response are independent. To that end, we take the unique opportunity that ABIDE offers, with thousands of records providing the ADOS-G and ADOS-2 versions of the test for DSM-IV and DSM-5 criteria, and we examine the distributions of the ADOS scores. We gather the scores in frequency histograms built using various binning procedures (Freedman & Diaconis, 1981; Scott, 1979, 2015). Then we fit the normal distribution and other distributions (e.g., log normal, exponential, gamma, Weibull) using maximum likelihood estimation (MLE) methods in Matlab. In addition to the MLE tests, we use the Lilliefors test (Lilliefors, 1967) and the Kolmogorov-Smirnov test to compare theoretical distributions to the empirical one we get from the ABIDE data sets.
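The distribution-fitting step can be sketched as follows. The analyses here were run in Matlab; the Python code below is only an illustrative stand-in, restricted to the normal and gamma families, with hypothetical score values and hypothetical function names. The gamma shape is found by Newton iteration on the likelihood equation, and the digamma and trigamma helpers use standard asymptotic series. Because the gamma family needs strictly positive data, zero scores would have to be handled separately (an issue this sketch does not address).

```python
import math


def digamma(x):
    """Digamma function for x > 0 (upward recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """Trigamma function for x > 0 (upward recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def fit_normal(data):
    """MLE of the normal family; returns (mean, sd, log-likelihood)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((v - mu) ** 2 for v in data) / n      # the MLE divides by n
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return mu, math.sqrt(var), loglik


def fit_gamma(data, max_iter=50):
    """MLE of the gamma family; returns (shape, scale, log-likelihood)."""
    if min(data) <= 0:
        raise ValueError("gamma fit needs strictly positive data")
    n = len(data)
    mean = sum(data) / n
    mean_log = sum(math.log(v) for v in data) / n
    s = math.log(mean) - mean_log                   # > 0 unless all values are equal
    shape = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # starting guess
    for _ in range(max_iter):                       # Newton steps on log(k) - digamma(k) = s
        step = (math.log(shape) - digamma(shape) - s) / (1.0 / shape - trigamma(shape))
        shape -= step
        if abs(step) < 1e-12:
            break
    scale = mean / shape
    loglik = n * ((shape - 1) * mean_log - shape - math.lgamma(shape) - shape * math.log(scale))
    return shape, scale, loglik


# Hypothetical right-skewed, strictly positive scores (not study data).
scores = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 14, 22]
print("normal log-likelihood:", round(fit_normal(scores)[2], 2))
print("gamma  log-likelihood:", round(fit_gamma(scores)[2], 2))
```

Both families have two free parameters, so the log-likelihoods can be compared directly. A higher value for the gamma fit indicates that the skewed family describes the sample better than the symmetric one.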
Besides testing the distribution of ADOS-G and ADOS-2 scores across the full set of ABIDE, we also separate the clinical data by module 3 versus module 4. Further, we separate the data from the females and the males in the ABIDE repository, which contains an unusually large number of females (absent in any random draw of the population). Typical females, females with ASD, and females with Asperger’s syndrome can be physiologically stratified and separated from the males using involuntary micromovements (Torres, Isenhower, et al., 2013; Torres et al., 2017) and voluntary movements (Torres, Isenhower, et al., 2013). Here we ask if the ADOS scores of ABIDE separate them too or if, unlike physiological criteria, the clinical ADOS scores confound males and females. Note that in any regular study, this question cannot be asked owing to the near five-to-one ratio of autistic males to autistic females in the population. A random draw of the autistic population would not give us enough power to assess the ADOS scores in males versus females.

All studies in the ABIDE data repository were performed under IRB approval in accordance with the Declaration of Helsinki.

3 Results

3.1 Normative Data Spread Significantly Departs from Lowest-Bound 0-Score. The analyses of the in-person visit to assess the ADOS-2 in 52 participants (26 suspected and confirmed as having autism or ASD and 26 typical controls) yielded significantly different distributions of scores between the two age- and sex-matched groups. Figure 4A shows the frequency histograms of the ASD (top) and typical control (bottom) groups. Figure 4B shows the outcome of the nonparametric Kruskal-Wallis (one-way ANOVA) test highlighting the statistical significance of differences captured by the comparison.
In addition, the right panel of Figure 4B highlights the output of this nonparametric test for the comparison with the 0-score, which denotes the absence of behavioral symptoms. For typical controls, the scores from the empirical data spanned a distribution of nonzero values, evident at the p < .01 level.

3.2 ADOS-2 RRB Outcome Is Significantly Affected by Clinician. The data from the repeated measurements across four visits enabled us to examine the influences of the clinician in 14 children who returned to the lab to perform the ADOS-2 test. The scores from all the children were pooled, and the total score was compared across visits using the nonparametric Kruskal-Wallis test followed by the multicompare test. The total score comparison revealed no significance (chi square, 7.21; p-value, 0.06). Yet given the borderline value close to the 0.05 significance level, we examined the social affect score and the ritualistic repetitive behavior (RRB) score making up the total. We found no differences across visits in the social affect score. However, the RRB score significantly changed across visits (chi square, 21.01; p-value, 0.0001), with major differences when switching clinician in visits 3 and 4. Despite the use of the same modules, room setup, and task fluidity for each child, the ADOS-2 scores for RRB were marked as systematically different by the post hoc multicompare test. These outcomes can be appreciated in Figure 5 for total (panel A), social affect (panel B), and RRB (panel C).

We further tested all the scores for each clinician by pooling across all children and score types to examine the types of distributions best fitting their frequency histograms. Figure 6A shows this analysis for each clinician, while Figure 6B shows the output of the nonparametric Kruskal-Wallis test, which revealed statistically significant differences. Figure 6C shows the failure of normality for scores by clinician 1, and Figure 6D shows that for clinician 2.
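To make the visit comparison concrete, here is a minimal Python sketch of the Kruskal-Wallis H statistic (with midranks and the usual tie correction) and of the chi-square tail probability used to turn H into a p-value. It is an illustration, not the Matlab routine used for the analyses. The function names and the toy groups are hypothetical, and the use of three degrees of freedom for a four-visit comparison is an assumption, since the degrees of freedom are not stated in the text.

```python
import math


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of samples (one list per group)."""
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0            # average of ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(s) for r, s in zip(rank_sums, groups))
    h -= 3.0 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n))  # undefined if every value is identical


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Toy example with three hypothetical groups (not study data).
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h, chi2_sf(h, df=2))

# Tail probabilities for the reported statistics, assuming 3 degrees of freedom.
print(chi2_sf(7.21, 3), chi2_sf(21.01, 3))
```

Under the three-degrees-of-freedom assumption, the two reported chi-square values correspond to tail probabilities of roughly 0.066 and 0.0001, in line with the borderline total-score result and the clearly significant RRB result.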
The use of MLE to ascertain the fit of several probability distribution functions confirmed that the normal is not a good fit for either (see Figure 6E). Further, the gamma distribution was used as per the MLE outcome to fit the data and compare the scores of the two clinicians for the same children, same modules, same module order or visit, and same order of tasks. Figure 6F shows the fits of the normal distribution (left-hand side) and the gamma distribution (right-hand side). The gamma distribution fit was best for clinician 2 and poor for clinician 1. The normal was poor for both. The log normal and exponential were also poor fits.

Figure 4: The need for normative data to build a true similarity metric for autism research that adopts the ADOS test. (A) Frequency histograms of scores from the ADOS-2 rated scores for individuals with ASD and autism cutoffs versus those of typical controls. (B) Differences with statistical significance between the two groups’ total scores and (right panel) a distribution of scores for typical controls calling for a reassessment of the ADOS scoring system to convert it from a criterion-referenced to a norm-referenced system and build an appropriate metric space, amenable to measure change and derive developmental trajectories in basic scientific research.

Figure 5: Susceptibility of the RRB ADOS-2 subscore to the rater clinician. (A) Output of the Kruskal-Wallis nonparametric one-way ANOVA using box-and-whiskers format shows differences in total score between clinician 1 in visits 1 and 2 and clinician 2 in visits 3 and 4, but these differences do not reach statistical significance (p < 0.06). Notice the broader ranges in the score distribution of clinician 2 and the presence of higher scores overall. (B) Social affect subscores were not significantly different between clinicians. (C) RRB scores were significantly different for the same children, same modules, and same task order. Clinician 1 rated the children significantly lower, with overall ranges contributing to a bias toward autism spectrum. In contrast, the higher values of the rates by clinician 2 contribute to a bias toward autism diagnosis for the same children.

Figure 6: Nonnormality of overall scores from each rater clinician. (A) Differences in rater clinician styles can be appreciated in the frequency histograms of the scores that each one obtained from the same cohort. (B) Differences in scores for the same cohort reach statistical significance (despite the relatively small number of measurements: 14 children × 4 visits × 2 subscores, 112 scores). (C, D) Nonnormality of the score samples from each rater clinician. (E) MLE output renders the normal distribution inappropriate in each rater clinician case. (F) Fitting the normal versus the gamma family for each score set’s empirical CDF underscores the fundamentally different stochastic signatures derived from the variability of their rating of the same cohort of children with the same ADOS-2 modules.

3.3 Age-Corrected ADOS-2 Derivative Scores Confirm the RRB Score as the Most Affected by Change in Rater Clinician. The relative changes in score and age (with age measured in years and months) were obtained for each of the 14 participants we tracked over four visits. When examining these derivative data, we found significant differences in the RRB ADOS-2 scores. Consistent with the effect that the change of clinician in visit 3 had revealed for the size data given by the absolute ADOS-2 scores, the derivative data considering the age change of the participant from visit to visit also reveal significant changes in the RRB scores. The same individuals were rated significantly differently by the clinicians, thus yielding different scores for the same module and tasks. The Kruskal-Wallis test for the comparison of the ADOS-2 RRB scores across visits yielded significant differences across visits (chi square, 13.45; p-value, 0.003).

Figure 7A shows the evolution of the clinicians’ diagnostic criteria over time. Visit 1 with clinician 1 shows 10 of 14 with autism versus 4 of 14 with ASD.
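As an illustration of the derivative scores used in this section, the sketch below computes the change in score per year of age between consecutive visits for one participant. The reading of "relative score" as a simple finite difference (score change divided by age change) is an assumption made for illustration, and the ages and RRB values are hypothetical rather than taken from Table 2.

```python
def score_rates(ages, scores):
    """Change in score per year of age between consecutive visits.

    ages and scores are equal-length sequences ordered by visit.
    """
    if len(ages) != len(scores):
        raise ValueError("ages and scores must have the same length")
    rates = []
    for k in range(1, len(ages)):
        delta_age = ages[k] - ages[k - 1]
        if delta_age <= 0:
            raise ValueError("visits must be ordered by strictly increasing age")
        rates.append((scores[k] - scores[k - 1]) / delta_age)
    return rates


# Hypothetical participant: four visits, half a year apart.
ages = [9.0, 9.5, 10.0, 10.5]
rrb = [3, 2, 6, 5]
print(score_rates(ages, rrb))
```

Four visits give three rates per child. A jump in the middle rate, as in this made-up example, would coincide with the switch of clinician between visits 2 and 3.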
This pattern changes in visit 2 for this rater clinician to 6 of 14 with autism and 8 of 14 with ASD. The second clinician scores the same cohort of children, performing the same tasks under the same room setup and from the same modules, rather differently from rater clinician 1 in visits 1 and 2. In visits 3 and 4, rater clinician 2 scores these same children as 50-50 autism-ASD. The individual evolution for each child is seen in Figure 7B, whereby the different clinicians’ styles of scoring can be seen. There is no interrater reliability rendering these ADOS-2 criteria robust. For the same child and same ADOS-2 module, we see changes in the classification of autism versus ASD. These differences in perception biases for this lab cohort will be examined next in relation to the distributions of scores derived from the large cross-sectional population data from ABIDE.

3.4 ADOS-G and ADOS-2 Scores Do Not Distribute Normally. The ADOS-G and the ADOS-2 scores for each of the criteria (the total, communication, social, and stereotypical behaviors across ages in ADOS-G and the total, severity, social affect, and repetitive ritualistic behaviors in ADOS-2) were not normally distributed. Figure 8A shows the frequency histograms taken across ABIDE I and II scores for each of the above-mentioned criteria of the ADOS-2, while Figures 8B and 8C break down the scores for modules 3 and 4 (note that ABIDE does not report module 1, and module 2 is sparsely used). Figure 9 shows the corresponding frequency histograms for ADOS-G. The empirical cumulative distribution functions (CDFs) for each criterion are shown in Figure 8D for ADOS-2 and Figure 9D for ADOS-G. They were tested separately because they are different ADOS versions, as was each subscore distribution. Notice here the similarities across CDFs for modules 3 and 4 and the similarity of each of these empirically estimated CDFs with the CDF corresponding to all scores.
We note that pooling all scores is not valid, yet we do it here to show that the shape and dispersion of these histograms are quite similar despite the differences in module tasks.

Figure 7: Nonreliable diagnostics for the same cohort of children under the same ADOS-2 modules and tasks. (A) Bias of rater clinician 1 versus rater clinician 2 reveals different styles bound to affect the outcome. (B) Variability of diagnostic classification (autism versus autism spectrum) for each of the 14 participating children in the four visits spanning 1.3 years on average ±6 months.

Figure 8: Nonnormality of the ADOS-2 total and each subscore distribution in the ABIDE I and II sites. (A) Empirically obtained frequency histograms for each of the ADOS-2 total and subscores from all ABIDE scores. (B) Same as in panel A but restricted to module 3 scores. (C) Same as in panel A but restricted to module 4 scores. (D) Empirically estimated CDFs for each subscore with the normal and gamma distributions’ fit to the data. All cases failed the test of normality (see the text), but no statistically significant differences were found between the empirical CDFs generated by modules 3 and 4. No differences were found between each of the modules and the pooled data in panel A.

Figure 9: Nonnormality of the ADOS-G total and each of the subscore distributions in the ABIDE I and II sites. (A) Empirically obtained frequency histograms for each of the ADOS-G total and subscores across all ABIDE scores. (B) Same as in panel A but restricted to module 3 scores. (C) Same as in panel A but restricted to module 4 scores. (D) Empirically estimated CDFs for each subscore with the normal and gamma distributions’ fit to the data. All cases failed the test of normality (see the text), but no statistically significant differences were found between the empirical CDFs generated by modules 3 and 4. Like ADOS-2 in Figure 8, no differences were found between each of the modules and the pooled data in panel A.

The Lilliefors test rejected normality (p < 0.01) for each subscore of each ADOS test version. More important, we used maximum likelihood estimation (MLE) and fit several distributions to assess the best fit with 95% confidence. The MLE revealed that the continuous gamma family of PDFs had a better fit than the normal PDF. The results from the normal and gamma fits are shown in Figures 8D and 9D for each of the corresponding subscore CDFs of the ADOS-2 and ADOS-G, respectively.

3.5 Females and Males Are Indistinguishable According to ADOS Scores. The ADOS score data from ABIDE were divided into those for the male and female participants to ascertain whether (1) the distributions of the total and subscores were symmetric (test for normality) and (2) they had statistically different overall scores comprising social, communication, and stereotypical repetitive motions that could distinguish the two phenotypes. The motivation for this comparison emerged from the ABIDE repository involuntary head motion data accompanying these ADOS scores (Caballero et al., 2018; Torres et al., 2017) and from distinctions based on motor patterns derived from natural voluntary behaviors (Torres, Isenhower, et al., 2013; Torres, Nguyen, et al., 2016). These patterns can also automatically and blindly separate females/males with ASD from females/males with AS (from the subset of ABIDE with a DSM-IV classification) (Torres et al., 2017).
We reasoned that given that the ADOS tests are used to drive the science of autism (i.e., to correlate physiological data with it), it may be important to ascertain whether the variability in rater scoring from these versions of the ADOS matched the ability to distinguish the male from the female phenotype using involuntary motor noise. This is important because such motor noise is inherent in the autistic phenotype and provides visual feedback to the rater that could influence the rater’s criterion, which currently misses the females in ADOS-driven detection.

Figures 10A and 10B and 11A and 11B show the distributions of scores for males (10A and 10B) and females (11A and 11B). In all cases, normality was rejected by the Lilliefors test (p < 0.001). Recall that this test returns a decision for the null hypothesis that the data come from a distribution in the normal family, against the alternative that they do not come from such a distribution. In all cases, the test rejected the null hypothesis at the 5% significance level. As with the pooled data in Figures 8A and 9A and those from the breakdown into scores from modules 3 or 4 in Figures 8B and 8C and 9B and 9C, we used MLE to evaluate the fit of different distributions, which we show in the center panel of Figures 10A and 10B for the normal and the gamma family of distributions fit to the empirical CDFs of the total and subscores (the left side is ADOS-G and the right side ADOS-2).
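The logic of the Lilliefors test can be sketched as follows: the statistic is the largest gap between the empirical CDF and a normal CDF whose mean and standard deviation are estimated from the same sample, and its null distribution is obtained by repeating that computation on simulated normal samples. The Python code below is a Monte Carlo approximation written for illustration, not the tabulated procedure of a statistics package. The function names, the seed, the number of simulations, and the example scores are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def lilliefors_statistic(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    fitted = NormalDist(mu, sd)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # compare against the ECDF just after and just before each observation
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def lilliefors_p_value(sample, simulations=2000, seed=0):
    """Monte Carlo p-value: share of simulated normal samples with a gap at least as large."""
    rng = random.Random(seed)
    observed = lilliefors_statistic(sample)
    n = len(sample)
    exceed = 0
    for _ in range(simulations):
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if lilliefors_statistic(simulated) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (simulations + 1)


# Hypothetical scores piled up at zero with a long right tail (not study data).
scores = [0] * 20 + [1] * 8 + [2] * 6 + [3] * 3 + [8, 9, 12]
d, p = lilliefors_p_value(scores)
print(round(d, 3), p)
```

A sample that piles up at the lower bound of the scale, as integer-valued scores often do, produces a large gap near zero and is rejected as normal, which is the pattern described above.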
Figure 10: Nonnormality of the distributions spanned by the males with ASD and those with Asperger’s syndrome according to the DSM-IV column of ABIDE. (A) The left panel shows the frequency histograms for each ADOS-G subscore; the central panel shows the empirical CDF’s fit by the theoretical normal and gamma family CDFs for the total score; and the right panel shows similar plots for the ADOS-2. (B) Same format as in panel A for the case of Asperger’s syndrome. Notice the separation of CDFs between ADOS-G and ADOS-2 contrasting with those of the ASD cases in panel A.

Figure 11: Nonnormality spanned by the distributions of females with ASD and AS in the ABIDE repository. (A) Frequency histograms of the ADOS-G total scores and subscores (left panel) and ADOS-2 (right panel), with the center panel showing the empirical CDF fit by the normal and gamma distributions for the ADOS-G and ADOS-2 scores. (B) Same as in panel A for females with a DSM-IV diagnosis of Asperger’s syndrome.

Despite the well-established physiological and neurobiological distinctions between males and females in the spectrum of autism, we could not find any statistically significant difference between the ADOS-G (or the ADOS-2) total scores to automatically separate these two distinct phenotypes. Nor could we find statistically significant differences between the subscores of each ADOS test when comparing males and females using the Wilcoxon rank-sum test, as all p-values were above 0.5. This is important, as motor noise (whether visible to the rater or occurring beneath the rater’s awareness) would add to the observation criteria via visual feedback unless it is purposefully discarded as irrelevant to the diagnosis.

However, we did notice a significant difference in the distributions of the total scores for males when comparing those of the ADOS-G versus those of the ADOS-2 (see the center panel in Figure 10B). This difference was assessed using the Kolmogorov-Smirnov test for two empirical distributions and yielded a p-value of 0.0015. In contrast, this statistically significant difference was absent in the ADOS-G versus ADOS-2 comparison of total scores from the females’ data, with a p-value of 0.8 (see Figure 11A, center). Note here that these comparisons do not have meaningful clinical value. They are exclusively taken within a research framework to learn whether two versions of the same test always yield consistent separation or if in some cases they do not.
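The two-sample Kolmogorov-Smirnov comparison used here can be sketched in a few lines of Python. The statistic is the largest vertical distance between the two empirical CDFs, and the p-value below uses the asymptotic Kolmogorov series with a widely used small-sample adjustment. This is an illustrative sketch with hypothetical function names and made-up samples, and its p-values are approximations that can differ slightly from those of a statistics package.

```python
import math


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:     # step both ECDFs past every copy of x
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.2:                      # the series is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


# Hypothetical total scores from two test versions (not ABIDE data).
version_g = [7, 8, 9, 10, 10, 11, 12, 13, 14, 15]
version_2 = [9, 10, 11, 12, 13, 13, 14, 15, 16, 18]
print(ks_two_sample(version_g, version_2))
```

Because the test compares whole distributions rather than means or ranks, it can flag a difference in shape or spread that a rank-sum test on the same data would miss.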
Given the visible separation of ADOS-G and ADOS-2 scores for the males, we proceeded to further interrogate the cohort of participants with AS according to the DSM-IV classification. This is possible as ABIDE provides the information on DSM-IV versus DSM-5 in two separate columns of the data matrix. We then asked if the males with AS were also separable from the males with ASD, according to the ADOS scores from the two versions of the same test. Notice here the relevance of this question, as the ADOS-G and ADOS-2 are used interchangeably in autism research (as instantiated by the ABIDE repository), and no differentiation is ever made by peer-reviewed papers that use one or the other to inform and guide the results from their physiology- and neurobiology-based research.

3.6 Incongruent Results between ADOS-G and ADOS-2 When Comparing Males with ASD and AS. We expected that despite their subtle differences, the variants of the same test would provide consistent results for participants with AS and for those with ASD. In the case of ADOS-G, we found a statistically significant difference between the scores of males with ASD and those with AS using the Wilcoxon rank-sum test, which yielded a p-value of 3.7 × 10⁻⁷, significant at the 0.001 level. Further, the Kolmogorov-Smirnov test comparing two empirical distributions yielded statistical significance at the 0.001 level (p-value of 2.4 × 10⁻⁶). These results rendered the two distributions for AS and ASD statistically different at the 0.001 level (i.e., rejecting the null hypothesis that the two sets came from the same distribution).
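As a rough illustration of the Kolmogorov-Smirnov comparison reported here, the sketch below computes the two-sample statistic D (the largest vertical gap between two empirical CDFs) and an approximate p-value from the asymptotic Kolmogorov series. It is a minimal standard-library Python sketch under stated assumptions, not the code used in this study: the function name `ks_two_sample` and the toy score lists are hypothetical, and the asymptotic p-value is only approximate for small samples and for discrete, heavily tied scores such as ADOS totals.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov comparison.

    D is the largest absolute difference between the two empirical CDFs.
    p comes from the asymptotic Kolmogorov series (an approximation).
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking
    # each observed value is enough to find the largest gap.
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / n  # fraction of x that is <= v
        fy = bisect_right(ys, v) / m  # fraction of y that is <= v
        d = max(d, abs(fx - fy))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.1:
        # The series converges slowly here and the true value is ~1.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


# Hypothetical toy scores for two groups (not data from this study).
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
d, p = ks_two_sample(group_a, group_b)
print(f"D = {d:.3f}, approximate p = {p:.4f}")
```

A small p leads to rejecting the null hypothesis that both samples were drawn from the same distribution; a large p only means that the test failed to detect a difference.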
In contrast to the ADOS-G, the ADOS-2 total score comparison (i.e., within the same test) between the males with ASD and AS, using the Wilcoxon rank-sum test, yielded a p-value of 0.31, nonsignificant at the 0.05 level. The Kolmogorov-Smirnov test comparing two empirical distributions yielded a p-value of 0.5, with no significant difference and failing to reject the null hypothesis that the two data sets came from the same distribution. Figure 10 shows the frequency histograms and CDF fits to the empirical data for males using the left-hand panel for the ADOS-G and the right-hand panel for the ADOS-2. Notice that the comparisons were made within each test, as they are different versions of the ADOS.

Comparisons of the females with ASD and AS are shown in Figure 11, using the total scores from the ADOS-G and ADOS-2, respectively. These outputs were congruent for the two versions of the test in that neither ADOS version distinguished the two groups. The Wilcoxon rank-sum test yielded a p-value of 0.30 for ADOS-G and 0.54 for ADOS-2. The Kolmogorov-Smirnov test comparing two empirical distributions yielded a p-value of 0.42 for ADOS-G and 0.96 for ADOS-2.

Notice that in both males and females, breaking the data into module 3 or 4 scores made no difference. As in the case of Figures 8 and 9, the distributions retained the shape and dispersion across the modules, suggesting that blind classification of participants is not possible using these tests. In other words, given the scores of a module 3 or a module 4 version containing different tasks, it is not possible to know if they came from module 3 or 4, as they span statistically indistinguishable distributions. Any machine learning algorithm attempting to classify participants based on these scores alone would therefore fail, despite the scores coming from entirely different sets of tasks aimed at assessing individuals with disparate language or communication levels.
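The rank-sum comparisons used throughout this section can be illustrated with a short sketch. It pools the two samples, assigns midranks to tied values, and reports the rank sum W of the first sample, the corresponding Mann-Whitney statistic U = W − n₁(n₁ + 1)/2, and a two-sided p-value from the normal approximation. This is a minimal standard-library Python sketch and not the authors' code: the function names and toy scores are hypothetical, and the approximation omits tie and continuity corrections, so it will differ slightly from statistical packages on heavily tied integer scores.

```python
import math


def midranks(values):
    """Rank values from 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, U, p) for two independent samples x and y.

    W is the rank sum of x in the pooled sample, U is the Mann-Whitney
    statistic of x, and p is a two-sided normal-approximation p-value
    (no tie or continuity correction).
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, u, p


# Hypothetical toy scores (not data from this study); sizes may differ.
w, u, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(f"W = {w}, U = {u}, approximate p = {p:.4f}")
```

Because W and U differ only by the constant n₁(n₁ + 1)/2, the rank-sum and Mann-Whitney formulations lead to the same decision.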
We note that given the differences in sample size and the nonnormality in the distributions of scores, we used the Wilcoxon rank-sum test in all the above comparisons. This is a nonparametric test for two populations, used when samples are independent and may be of different sizes. It is equivalent to the nonparametric Mann-Whitney U-test for equality of population medians. The test statistic that the rank-sum test returns is the rank sum of the first sample (Hollander & Pena, 2004).

4 Discussion

In this work, we studied the statistical properties of the scores generated by the ADOS test in two different contexts. In one study, we assessed the scoring of research-reliable testers in a laboratory environment, using the ADOS as an experimental protocol. In the other study, we examined the ADOS scores reported by research-reliable raters in the open access ABIDE repository (Di Martino et al., 2014), including 1324 clinical records of participants with ASD.

We examined the extent to which the rater’s style and inherent biases could change the autism-versus-ASD diagnosis for a given child under similar laboratory conditions. We further asked in this laboratory context which of the scores would be the most affected by the change in rater. We found significant differences in diagnoses that depended on the rater. We also found that the RRB scores were the most susceptible to the nuances of the rater’s style and inherent biases. However, given the small sample set of ADOS scores for the study in the lab, we decided to examine the statistical features of the ADOS scores in a much larger data set.
To that end, we used the open-access ABIDE repository and interrogated the reported scores of ADOS-G versus ADOS-2 in several cohorts. These included scores from older studies that reported scores based on the DSM-IV criteria separating AS and ASD. We also took advantage of the rare opportunity that ABIDE offers with respect to females with AS and ASD. This repository has large numbers of females affected by these conditions. These are difficult to find in any random draw of the population, given the disparate ratio of approximately five males for every female in the spectrum. Given the comparable numbers in ABIDE, we could contrast the statistical features of their scores to those of corresponding males.

We found that the distributions of scores from the ADOS tests do not follow the a priori assumption of normality that researchers who are adopters of the test often make. Given these findings, it may be important to reconsider adopting this test to inform basic scientific research in neurodevelopment. Across a multitude of research papers, discrete scores that do not have a properly defined norm are systematically forced to be (linearly) correlated with continuous physical data. Yet the lack of normality in the distributions of the scores, along with the lack of independence between raters’ bias and sensitivity, poses a problem for research validity, according to SDT and ROC area-under-the-curve types of analyses (Somoza & Mossman, 1991). One now wonders how many false positives we may have in research studies.

The sensitivity and reliability tests that researchers adopting this clinical scoring system rely on to assess its validity are built under specific statistical assumptions of normality (Jang, Wixted, & Huber, 2009; Kroll, Yonelinas, Dobbins, & Frederick, 2002; White & Wixted, 2010; Witt et al., 2015).
Under such assumptions, the shape and dispersion of the variability from the scores’ probability distributions are critical to maintain independence between the inherent bias of the observer and the sensitivity of the test.

What does it really mean to have autism or to be on the autism spectrum? And how can that distinction be made relative to normative data from typically developing controls when no such data exist? Here we see, even in a rather modest cohort of neurotypical participants, a significant spread of scores for typical neurodevelopment, thus indicating the presence of behavioral symptoms in the neurotypical population. Given such fluctuations, how can we build a proper metric for neurodevelopmental research?

The ADOS range of scores is based on nonnegative integer values, with 0 at the lower bound, meaning that the behavior is not present as specified. Behaviors coded on the ADOS are assumed to be those that occur in nonspectrum individuals (e.g., eye contact, pointing, shared enjoyment), as well as behaviors that could occur in ASD (e.g., stereotyped or idiosyncratic language, complex mannerisms). We note that those motions present in nonautistic individuals have not been experimentally assessed under ADOS conditions, as noted in the classical paper by Lord et al. (2000) in the last sentence of the paper. However, motions inherently present in natural behaviors that scaffold social interactions, such as eye contact, pointing movements, and face-processing micromotions, are routinely studied in the neuromotor control developmental literature.
These studies use objective means that characterize several types of motions (Klin, 2000; Klin et al., 2002a, 2002b; Klin, Lin, Gorrindo, Ramsay, & Jones, 2009; Torres, Brincker, et al., 2013; Torres, Isenhower, et al., 2013; Torres, Yanovich, & Metaxas, 2013), and would be amenable to combination with validated clinical scores of subjective observation-based inventories. Combining such complementary data would bring a level of precision that we now lack in behavioral analyses that are purely observational and do not reflect the neurological aspects of this condition.

As demonstrated by our analyses, the human eye has a limited capacity to detect motor noise at various layers of motor control inherently present in social interactions, communicative language, and repetitive ritualistic motions of the kinds that the ADOS assesses. Relying on such data exclusively poses a challenge to basic research when guiding and informing our scientific quest in the search for adequate medical treatments. Clinical tools in other fields are now routinely combined with digital data and commercially available biosensors to help patients and clinicians obtain feedback along more than one data channel, beyond the limits of the naked eye. In neurodevelopment, and particularly in disorders of the nervous systems that go on to receive a diagnosis of autism today, it may be important to combine multiple sources of the various layers of the knowledge network (as in Figure 2A) to inform personalized approaches and new, more effective ways to develop targeted therapies. Along those lines, the lack of normative data seems to be an obstacle to the precision medicine paradigm, which could benefit from stratifying subtypes in heterogeneous presentations under an umbrella term like autism.
Other fields could help autism researchers establish a proper metric to quantify the type of accelerated rates of change that one sees in early neurodevelopment by characterizing normative states and departure from them at expected neurodevelopmental milestones. For example, in pediatrics, the growth charts from the Centers for Disease Control and the World Health Organization serve the purpose of establishing normative criteria to measure departure from typical development as the child physically grows day by day. In autism, there is nothing comparable to the growth charts, so there are no metrics of similarity to detect and track change and its rate.

Indeed, it has now been established that observational behavioral criteria grounded on psychological, behaviorally defined (subjective) constructs drive the scientific quest in autism and supersede physiological (objective) criteria (Torres & Whyatt, 2018; Whyatt & Torres, 2018). The time is ripe to build a new standard similarity metric that combines validated clinical scores with the type of motor noise data that we can easily harness today with wearables from the fast-growing and rapidly developing nervous systems of young infants and young children.

One challenge ahead to combine digital and clinical data, as other fields have done for personalized assessments under the tenets of precision medicine, is the absence of a proper age-dependent statistical framework to measure relative changes away from typical levels.
In this sense, although the ADOS modules were designed to account for possible disparities in cognitive and verbal capacity, there are no age-dependent physical criteria in the research-adopted version to ascertain the physical rate of change in nervous systems that develop asynchronously across the population. In autism, two children of the same physical age may have entirely different profiles of motor noise in the different levels of neuromotor control that contribute to the emergence of autonomous social interactions and communicative language.

Since behaviors are made up of motions at the observable macro- and the hidden microlevels, we need new metrics that consider the interactions between these disparate levels of inquiry, combining different time, spatial, and frequency scales. We also need methods that reveal inherent capacity for entrainment and synchronous dyadic exchange during social interactions. Many of the children with autism are capable of social exchange when properly supported. Yet the support depends on our knowledge of what the coping nervous systems of the child can already do at the voluntary, spontaneous, automatic, involuntary, reflexive, and autonomic levels of function. If we had a better sense of the inherent capacity for social exchange, we could work with that child rather than inadvertently stress the nervous systems with more uncertainty, for example, by prompting the child to perform expected (socially appropriate) behaviors.

In the absence of a test that can reveal these features through proper metric scales, research to develop such accommodations may be impeded. The lack of normality, accompanied by the lack of methodology to properly reveal change and its rate in a rapidly developing system, poses a challenge for researchers who aim at developing personalized targeted therapies for stratified autism subtypes.
It is difficult to reveal the target for medical treatment in the face of one-size-fits-all methods that assume a single statistical distribution across the population. This assumption also bears the potential for false positives and lower reliability than has been assumed up to now.

Even the so-called standardized ADOS severity scores do not address these needs; they were developed for criteria-referenced clinical use, so the scale was not built using typical controls as a norm-referenced test would be (Gotham, Pickles, & Lord, 2009; Hus, Gotham, & Lord, 2014; Hus & Lord, 2014). The score is designed to compare an individual with ASD to other individuals with ASD of the same age and language level. It also has a range of ages 6 to 10 for individuals with ASD and is not meant to represent ASD on a range of 1 to 10. Autism is not only highly heterogeneous. It also has neurodevelopmental asynchronies in a group of the same age, meaning that two individuals may be 10 years old but one may have the signatures of neuromotor control from normative 3-year-old children (Torres, Brincker, et al., 2013). See Figure 14 in the appendix. Thus, aging with autism is different from typical aging. What is the normative range of scores that reflect age-appropriate typical social interactions?

Because of the prevalent influence that the research-grade ADOS test has on basic science at all levels, it is imperative to reexamine the inherent theoretical assumptions that adopters of this test have made and verify that the outcome of this test, as administered in research settings, empirically matches the theoretical assumptions of the users.
Data-driven approaches tend to preserve empirically assessed features of phenomena. They do not throw away important variability and offer the possibility of capturing the true nature of change in a coping neurobiological system that is developing at atypical rates. Although this article uses the ADOS test as the example to illustrate the potential problems that blindly adopting such tests for scientific research may create, the same tenets apply to any other clinical test used in basic research of neurodevelopmental disorders. These disorders reflect in great part problems with the nervous systems, and since nervous systems are adaptable, we would be missing self-correcting mechanisms by imposing theoretical models without empirically informing those models.

Open access repositories now make it possible to examine (for the first time by researchers from different fields with different skillsets) the validity of the use of this instrument in research drawing on the large number of records now available. These provide enough statistical power, where the high cost of running these studies often prevents labs from deploying them. For example, recent work using the National Database for Autism Research (NDAR) and the Simons Simplex collection demonstrated that it is possible to shorten the administration time of the ADOS while preserving cutoff criteria. While this earlier work already highlighted the nonnormality of the distributions of the scores reported in those data repositories (Duda, Kosmicki, & Wall, 2014), the work presented here takes a deeper look at the assumptions that the research-grade test makes. Here, we further raise several relevant questions about the need for normative assessment across the human life span to truly formulate a standard test. Such a test would have an age-appropriate metric for research use in neurodevelopment, that is, to
enable derivation of developmental trajectories across different parameters scaffolding the emergence of social behaviors.

4.1 Positive Aspects of the ADOS Adoption in Clinical Settings. Perhaps the use of the ADOS in the clinical arena is less damning than the use of its research-grade version in basic scientific research. In the United States, the autism label ensures coverage for certain treatments that many children could not otherwise have access to, particularly in underrepresented minorities and poor rural areas. Although ADOS testing does not produce an official diagnosis, its use in research labs helps refine the criteria from the DSM-5 and lends more specificity to the symptoms than other criteria. In the United States, this research-grade version could serve as a flag to send parents to federally certified clinics that offer services upon multipronged criteria involving other tests. However, owing to copyright issues, Western Psychological Services (WPS) does not allow researchers to copy, reproduce, or share with the families the ADOS booklet with important details of the outcome. In other words, the children who come to our labs, receive the research-grade ADOS, and pass the cutoff scores are labeled autistic by this test. However, as researchers, we are not allowed to share details with their parents. This obstructs their ability to go to a proper clinic, show these research-grade results, and pursue the diagnosis that will give them access to early intervention programs or individualized education programs for children of school age.
If the WPS and the trainers of the ADOS allowed this, the test adopted by researchers would serve as a warning to parents that some aspects of the child’s neurodevelopment may be offtrack and accelerate the official diagnosis. In our own experience, in poor regions of the United States, it can take two years before an official diagnosis grants services to families upon suspicion that the infant is offtrack.

Unlike the statistical confounds that the ADOS total scores and subscores surely bring to research, in its current form at the clinic, the ADOS provides psychological comfort to adults who had never been previously diagnosed and could not understand their place in the social scene. Many adults, newly diagnosed at the clinic, express a sense of relief on learning that they are on the autism spectrum and have social interaction differences. Further, the ADOS adds important information to the coarser DSM diagnosis. Thus, the clinical value of this instrument is highly appreciated. When used in laboratory settings as an experimental protocol to study social interactions, the structured ADOS inventory can facilitate the development of new models to study the neuromotor control underpinnings of social exchange. This combination of the clinical protocol and the digital data from wearables has offered new ways to assess complex dyadic behaviors, amenable to extend to real situations. However, continued reliance on its coarse scores while plagued with false positives will prevent us from automatically (in the blind) stratifying the different forms of autism and opening
new avenues to researchers in genetics and other important emerging fields of metabolomics and the human microbiome.

4.2 Somatic Sensory Motor Issues in Neurodevelopmental Disorders on a Spectrum. There is room for transformative change to improve the use of such clinical tests in scientific research studies in general. After years without recognizing the importance of sensory motor issues in mental health (Bernard & Mittal, 2015), the Research Domain Criteria (RDoC) matrix created by the National Institute of Mental Health (Cuthbert & Insel, 2013; Insel, 2014) finally included an entry for sensorimotor issues in January 2019. This new development, paired with the admission by the DSM-5 that there are sensory issues in autism, will provide a new foundation to explore human social behaviors from a new angle that can include movements and their kinesthetic sensation (Torres & Whyatt, 2018), while offering the opportunity to uncover the inherent capabilities and predispositions that a coping nervous system develops (Brincker & Torres, 2013).

The ADOS test nevertheless still fails to recognize sensorimotor issues (Lord et al., 2000), as shown by the following caveat from the manual on choosing a module:

Note that the ADOS-2 was developed for and standardized using populations of children and adults without significant sensory and motor impairments [emphasis added]. Standardized use of any ADOS-2 module presumes that the individual can walk independently and is free of visual or hearing impairments that could potentially interfere with use of the materials or participation in specific tasks.
(Catherine Lord, Rutter, DiLavore, Risi, & Western Psychological Services Firm.)²

Interestingly, despite this caveat for the use of the ADOS test stated in the manual, to the best of our knowledge, the makers of the ADOS test have never reported scientific studies of individuals with autism where it is objectively established that individuals in the spectrum of autism have no significant sensory and motor impairments. Yet when we test the children in basic scientific research labs and use high-grade instruments, we invariably find visual, hearing, olfactory, and touch impairments that would surely interfere with the use of the materials in this test. More important yet, such motor noise, involuntary micromotions, dysautonomia, and other sensorimotor differences contribute to the disparity in scoring styles because inevitably, not all raters perceive these problems through a unifying lens. Some let them enter into their criteria for scoring, while others filter them out altogether. Figure 7 provides a clear example, and so do the inconsistencies with the RRB scores.

These highly quantifiable problems with their somatic and sensorimotor systems are the tip of the iceberg, as deeper problems are present with their enteric nervous systems (the gut) and microbiome (Fattorusso, Di Genova, Dell’Isola, Mencaroni, & Esposito, 2019; Heiss & Olofsson, 2019; Hughes, Rose, & Ashwood, 2018; Ng et al., 2019; Pulikkan, Mazumder, & Grace, 2019).

² https://www.wpspublish.com/ados-autism-diagnostic-observation-schedule
Many suffer from pain and temperature dysregulation as their overall sense of touch, vestibular issues with balance, and multisensory integration overwhelm them in ways that we can now precisely quantify in a personalized manner. All of these issues deeply affect the ability to develop social interactions. Perhaps the new NIH-RDoC sensorimotor criteria will help WPS redefine ADOS for research and encourage the use of new objective criteria grounded on biophysical metrics assessing the nervous systems’ functions.

There is mounting evidence that somatic-sensory-motor issues do exist across the many phenotypes that go on to receive this diagnosis today under the broader DSM-5 criteria (Behere, Shahani, Noggle, & Dean, 2012; Campione, Piazza, Villa, & Molteni, 2016; Donnellan & Leary, 1995; Eigsti, Rosset, Col Cozzari, da Fonseca, & Deruelle, 2015; Hannant, Tavassoli, & Cassidy, 2016; Jasmin et al., 2009; Kushki, Chau, & Anagnostou, 2011; Mandelbaum et al., 2006; Minshew, Sung, Jones, & Furman, 2004; Mosconi et al., 2013; Mosconi, Mohanty, et al., 2015; Mosconi & Sweeney, 2015; Mosconi, Wang, Schmitt, Tsai, & Sweeney, 2015; Ornitz, 1974; Perry, Minassian, Lopez, Maron, & Lincoln, 2007; Siaperas et al., 2012; Torres, Brincker, et al., 2013; Torres, Isenhower, et al., 2016; Torres, Yanovich, et al., 2013; Troyb et al., 2016; Whyatt & Craig, 2013) and many more.

The neurological differences in autism have been well documented since as far back as the 1970s and solidified as a model by the 1980s (Maurer & Damasio, 1979, 1982; Vilensky, Damasio, & Maurer, 1981). A neurological model was championed by Damasio and Maurer (1978) and developed in a model of support and accommodations (Donnellan & Leary, 1995). Behavioral criteria for detection and treatments took over the field of autism research, and this neurological model was abandoned in favor of behavioral modification methods like those developed by Skinner in the 1950s.
Those models from the behaviorist school of thought were developed to condition animals in the laboratory. But the methods to condition lab animals were transferred to human children by Lovaas, Schreibman, and Koegel (1974). The Lovaas methods, along with behaviorally based detection criteria from the same school of thought, remain the gold standard of diagnosis and treatments of autism today. These are not subject to the level of accountability that scientific research is. That is, they do not require any type of institutional review board and do not have to be compliant with the Declaration of Helsinki. This is indeed highly puzzling given their pervasive presence in science.

Today, the DSM-5 allows ADHD and sensory issues in the criteria for autism, so autism is no longer a narrowly and well-defined behavioral disorder (perhaps it never was) despite insistence on defining it by criteria that exclude the underlying physiological conditions that these individuals develop from birth onward. It is a view that strictly separates the description of behavior from the physiological underpinnings, as if one could produce behaviors without a nervous system. An emergent field aimed at uncovering multiple digital biomarkers to characterize and automatically stratify various aspects of development for research purposes may be the answer to the start of a new era in scientific research on autism aimed at a physiological characterization of the condition for medical use. In the United States, the field may not have a choice at this point, given the current demands by medical insurance providers on a higher standard for treatments to conform to the American Medical Association criteria.
Why we waited so long to adopt a medical criterion, affecting generations in the process, is puzzling when a proper neurological model existed in the 1970s. Even more puzzling, we see that support systems to accommodate the neurological issues were already being successfully adopted back then by affected families. Yet the evolution of those who are adults today and received intense behavioral modification (referred to as conversion therapy in some circles) has shown us the gross error made several decades ago. It is up to a new generation of scientists now to correct that error, but it will not be without resistance from organizations and a powerful lobby advocating a purely behaviorist model of detection and treatment recommendations that exclude physiology. Tricare, one of the major insurance providers in the United States, reported $261.9 million for 13,390 beneficiaries of applied behavioral analyses (ABA) in contrast to $38.2 million for occupational, physical, and speech therapy combined. Prescription medication expenditures were marked at $15.1 million. This Tricare report pertained to families in the military, subject to lower rates in comparison to those of other providers. The high cost of these treatments, which prevents broad access, paired with their poor and uncertain medical outcomes, has raised concerns in the autism CARES report mentioned in section 1 (publicly available online). Because of the lack of autonomous living and agency in the aging autistic adults who received such treatments intensively since early life, a new medical model is imperative.

In our study, it was evident that whether using absolute scores or derivative, age-dependent data accounting for longitudinal dynamic changes from visit to visit, the RRB scores of the ADOS, implicitly reflecting sensory motor issues, picked up best the switching of the clinician.
If we were to combine this structured social test with wearable biosensors, we could automatically stratify autism spectrum disorders, unveil different subtypes of idiopathic autism, and map them to known genetic phenotypes. Such clinically informed digital data could provide objective criteria useful to the community doing basic scientific work (e.g., geneticists, electrophysiologists, neuroimaging researchers), blind to biases and false positives that inadequate statistics introduce under existing approaches. The label of autism marginalizing the affected person within society would no longer be necessary, and treatments and services would be driven and covered by physiologically informed medical criteria. New hybrid methods could also help physicians treat the medical issues in a more personalized manner. Some collaborative work along those lines has been done between clinicians and researchers, but more research is needed to fully validate and replicate the use of digital ADOS within the smart-mobile and personalized health concepts.

5 Conclusion

We invite readers to consider that science in autism needs to retake the path of independence and reclaim its agency to be able to conduct proper scientific research away from profitable models. This can be done by building an interdisciplinary consortium of scientists from diverse disciplines with complementary skill sets to derive proper metrics and true standardized methods for personalized medicine. Combining open access data sets and interdisciplinary collaborations can lead to empirical verification of our theoretical models and alert us when they fall short of enforced expectations.
Figure Appendix

Figure 12: The use of the ADOS test as an experimental protocol to digitize the dyadic interaction that takes place between the child and the clinician (Whyatt & Torres, 2017). Using weighted, directed graphs and network connectivity analyses derived from motor output fluctuations registered by wearables, we can automatically determine who leads and lags in each task and which tasks are socially favorable to the child's physical abilities to entrain with the tester and express joint attention and other forms of synchronization and social cohesiveness spontaneously emerging during the tasks. Nodes of the network represent body parts (wrists, thorax, lumbar, and ankles of child and clinician), with the edge color denoting cross-coherence levels (taken minute by minute at 128 Hz) and represented in the color bar. The size of the nodes represents incoming links, while color gives self-emerging modules (subnetworks with maximal inner connectivity and minimal outer connectivity). Arrows denote out-degree level, and the thickness of the arrow denotes node i leading node j by a positive phase shift (from cross-coherence spectral analyses). We highlight two tasks: one in which the clinician leads and one in which the child leads.

Figure 13: Sample children (two TD and two ASD) with full ADOS-2 tasks and lead-lag profile capturing the child-clinician interactions during the test. Metrics were derived from weighted directed graphs, with the out-strength measure at each node tallied to quantify which body parts lead the interaction and who leads the social exchange overall, minute by minute. Notice that TD children lead most of the time, while ASD children tend to lag in the interactions. The weighted, directed graphs were derived from cross-coherence analyses of synchronously coregistered motions from inertial measurement units (IMUs) and gyroscopes in a grid of wireless wearable sensors across the body (Whyatt & Torres, 2017).

Figure 14: Quantifying noise levels in the self-generated kinesthetic reafferent feedback derived from motor output variability of goal-directed visuo-motor actions. (A) Pointing task to assess signatures of motor noise across neurodevelopment. The arrow indicates the proposed taxonomy of neuromotor control across different levels of function in the nervous systems. (B) Typical development (TD) signatures of voluntary (instructed) and spontaneous (uninstructed) hand-retracting movements during pointing. Speed profiles are derived from hand-pointing trajectories in panel A forward to the target (first bell-shaped profile) and away from the target toward a resting position during decision making in a cognitive match-to-sample task. The resting hand increases its speed continuously until it reaches a peak, then decelerates to touch the target on a touch monitor that registers the touch. The hand leaves the monitor and accelerates away from the target, toward the body, to reach the resting state again. Midway along the retracting hand trajectory, the hand reaches its second peak and then decelerates again to rest. Trials are recorded at the child's own pace. The target appears in trial 1 and disappears once the child touches it, then appears again, implicitly instructing the child to touch it again. There are 100 trials presented as a heat map with global peaks highlighted in yellow. They are stacked in the order of occurrence and aligned to the touch. The typical behavior of the TD children is highly periodic and well structured. The third plot at the bottom shows the small (local) speed peaks of the resting state, followed by the first ballistic phase of the motion with well-aligned peaks of the maximal speed of the hand on its way to touch the target, then the ballistic phase toward the target, followed by small speed peaks while resting briefly at the target (stratifying ASD subtypes in Wu, Jose, Nurnberger, & Torres, 2018). The ballistic phase returning the hand toward the body follows (with no peaks); then the peaks of the speed maxima automatically align again from trial to trial. Another ballistic phase retracting the hand ensues, and the hand lands in a resting state, with the presence once again of the small speed peaks at rest. (B) The participant with ASD (similar age and sex to the TD child) shows very different patterns, consisting of highly disorganized motions, with (involuntary, unintended) random noise output by his system while performing these goal-directed reaches. Notice the absence of a pattern in the forward phase of the pointing to the target and the emergence of some structure in the return motions. Further, notice the presence of random noise (also modeled and empirically characterized in Wu et al., 2018, using a Poisson process). (C) The micromovement spikes reveal states of maturation in TD from 3 years of age to college age, in contrast to its absence in ASD from 3 to 25 years old (Torres, Brincker, et al., 2013), suggesting the importance of tracking the shifts in probability space with age and treating autism as a lifelong condition. This contrasts with assuming a theoretical distribution under the one-size-fits-all model of ADOS-driven autism research today.

Author Contributions

Conceptualization, E.B.T.; methodology, E.B.T.; software, E.B.T., S.M.; validation, E.B.T., R.R., S.M., and B.G.; formal analysis, E.B.T.; investigation, E.B.T., R.R., S.M., B.G.; resources, E.B.T.; data curation, E.B.T., R.R., S.M., B.G.; writing – original draft preparation, E.B.T.; writing – review and editing, R.R., S.M., B.G.; visualization, E.B.T.; supervision, E.B.T.; project administration, E.B.T.; funding acquisition, E.B.T.

Acknowledgments

We thank the children and families who participated in this study. We thank the anonymous clinicians who helped us with administering the ADOS, scoring it, and assessing reliability. We thank the past and current members of the Rutgers Sensory Motor Integration Lab who helped with data collection and study coordination. We thank the Computational Neurobiology Lab of the Salk Institute for Biological Studies for hosting E.B.T. during her sabbatical, while writing this article. This research was funded by the New Jersey Governor's Council for Medical Research and Treatment of Autism, grant number CAUT14APL018, and by the Nancy Lurie Marks Family Foundation.

Conflicts of Interest

We declare no conflict of interest. The funders had no role in the design of the study; the collection, analyses, or interpretation of data; the writing of the manuscript; or the decision to publish the results.

References

American Psychological Association. (2013).
Diagnostic and statistical manual of mental disorders: Fifth edition. Arlington, VA: APA.
Behere, A., Shahani, L., Noggle, C. A., & Dean, R. (2012). Motor functioning in autistic spectrum disorders: A preliminary analysis. J. Neuropsychiatry Clin. Neurosci., 24(1), 87–94. doi:10.1176/appi.neuropsych.11050105
Bernard, J. A., & Mittal, V. A. (2015). Updating the research domain criteria: The utility of a motor dimension. Psychol. Med., 45(13), 2685–2689. doi:10.1017/S0033291715000872
Brincker, M., & Torres, E. B. (2013). Noise from the periphery in autism. Front. Integr. Neurosci., 7, 34. doi:10.3389/fnint.2013.00034
Caballero, C., Mistry, S., Vero, J., & Torres, E. B. (2018). Characterization of noise signatures of involuntary head motion in the Autism Brain Imaging Data Exchange Repository. Front. Integr. Neurosci., 12, 7. doi:10.3389/fnint.2018.00007
Campione, G. C., Piazza, C., Villa, L., & Molteni, M. (2016). Three-dimensional kinematic analysis of prehension movements in young children with autism spectrum disorder: New insights on motor impairment. J. Autism Dev. Disord., 46(6), 1985–1999. doi:10.1007/s10803-016-2732-6
Constantino, J. N., & Charman, T. (2016). Diagnosis of autism spectrum disorder: Reconciling the syndrome, its diverse origins, and variation in expression. Lancet Neurol., 15(3), 279–291. doi:10.1016/S1474-4422(15)00151-9
Constantino, J. N., Kennon-McGill, S., Weichselbaum, C., Marrus, N., Haider, A., Glowinski, A. L., . . . Jones, W. (2017). Infant viewing of social scenes is under genetic control and is atypical in autism. Nature, 547(7663), 340–344. doi:10.1038/nature22999
Cuthbert, B. N., & Insel, T. R. (2013). Toward the future of psychiatric diagnosis: The seven pillars of RDoC. BMC Med., 11, 126. doi:10.1186/1741-7015-11-126
Damasio, A. R., & Maurer, R. G. (1978). A neurological model for childhood autism. Arch. Neurol., 35(12), 777–786. doi:10.1001/archneur.1978.00500360001001
Di Martino, A., Yan, C. G., Li, Q., Denio, E., Castellanos, F. X., Alaerts, K., . . . Milham, M. P. (2014). The Autism Brain Imaging Data Exchange: Towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry, 19(6), 659–667. doi:10.1038/mp.2013.78
Donnellan, A. M., & Leary, M. R. (1995). Movement differences and diversity in autism/mental retardation. Madison, WI: DRI Press.
Duda, M., Kosmicki, J. A., & Wall, D. P. (2014). Testing the accuracy of an observation-based classifier for rapid detection of autism risk. Transl. Psychiatry, 4, e424. doi:10.1038/tp.2014.65
Eigsti, I. M., Rosset, D., Col Cozzari, G., da Fonseca, D., & Deruelle, C. (2015). Effects of motor action on affective preferences in autism spectrum disorders: Different influences of embodiment. Dev. Sci., 18(6), 1044–1053. doi:10.1111/desc.12278
Fattorusso, A., Di Genova, L., Dell'Isola, G. B., Mencaroni, E., & Esposito, S. (2019). Autism spectrum disorders and the gut microbiota. Nutrients, 11(3). doi:10.3390/nu11030521
Freedman, D., & Diaconis, P. (1981). On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57(4), 453–476.
Friston, K. J., Redish, A. D., & Gordon, J. A. (2017). Computational nosology and precision psychiatry. Comput. Psychiatr., 1, 2–23. doi:10.1162/CPSY_a_00001
Gotham, K., Pickles, A., & Lord, C. (2009). Standardizing ADOS scores for a measure of severity in autism spectrum disorders. J. Autism Dev. Disord., 39(5), 693–705. doi:10.1007/s10803-008-0674-3
Hannant, P., Tavassoli, T., & Cassidy, S. (2016). The role of sensorimotor difficulties in autism spectrum conditions. Front. Neurol., 7, 124. doi:10.3389/fneur.2016.00124
Havdahl, K. A., Bal, V. H., Huerta, M., Pickles, A., Oyen, A. S., Stoltenberg, C., . . . Bishop, S. L. (2017). Dr. Havdahl et al. reply. J. Am. Acad. Child Adolesc. Psychiatry, 56(7), 619–620. doi:10.1016/j.jaac.2017.05.010
Havdahl, K. A., Hus Bal, V., Huerta, M., Pickles, A., Oyen, A. S., Stoltenberg, C., . . . Bishop, S. L. (2016). Multidimensional influences on autism symptom measures: Implications for use in etiological research. J. Am. Acad. Child Adolesc. Psychiatry, 55(12), 1054–1063 e1053. doi:10.1016/j.jaac.2016.09.490
Hawgood, S., Hook-Barnard, I. G., O'Brien, T. C., & Yamamoto, K. R. (2015). Precision medicine: Beyond the inflection point. Sci. Transl. Med., 7(300), 300ps317. doi:10.1126/scitranslmed.aaa9970
Heiss, C. N., & Olofsson, L. E. (2019). The role of the gut microbiota in development, function and disorders of the central nervous system and the enteric nervous system. J. Neuroendocrinol., 31(5), e12684. doi:10.1111/jne.12684
Hollander, M., & Pena, E. A. (2004). Nonparametric methods in reliability. Stat. Sci., 19(4), 644–651. doi:10.1214/088342304000000521
Hughes, H. K., Rose, D., & Ashwood, P. (2018). The gut microbiota and dysbiosis in autism spectrum disorders. Curr. Neurol. Neurosci. Rep., 18(11), 81. doi:10.1007/s11910-018-0887-6
Hus, V., Gotham, K., & Lord, C. (2014). Standardizing ADOS domain scores: Separating severity of social affect and restricted and repetitive behaviors. J. Autism Dev. Disord., 44(10), 2400–2412. doi:10.1007/s10803-012-1719-1
Hus, V., & Lord, C. (2014). The Autism Diagnostic Observation Schedule, module 4: Revised algorithm and standardized severity scores. J. Autism Dev. Disord., 44(8), 1996–2012. doi:10.1007/s10803-014-2080-3
Insel, T. R. (2014). The NIMH Research Domain Criteria (RDoC) Project: Precision medicine for psychiatry. Am. J. Psychiatry, 171(4), 395–397. doi:10.1176/appi.ajp.2014.14020138
Jang, Y., Wixted, J. T., & Huber, D. E. (2009). Testing signal-detection models of yes/no and two-alternative forced-choice recognition memory. J. Exp. Psychol. Gen., 138(2), 291–306. doi:10.1037/a0015525
Jasmin, E., Couture, M., McKinley, P., Reid, G., Fombonne, E., & Gisel, E. (2009). Sensori-motor and daily living skills of preschool children with autism spectrum disorders. J. Autism Dev. Disord., 39(2), 231–241. doi:10.1007/s10803-008-0617-z
Klin, A. (2000). Attributing social meaning to ambiguous visual stimuli in higher-functioning autism and Asperger syndrome: The Social Attribution Task. J. Child Psychol. Psychiatry, 41(7), 831–846.
Klin, A. (2008). In the eye of the beholden: Tracking developmental psychopathology. J. Am. Acad. Child Adolesc. Psychiatry, 47(4), 362–363. doi:10.1097/CHI.0b013e3181648dd1
Klin, A., Jones, W., Schultz, R., Volkmar, F., & Cohen, D. (2002a). Defining and quantifying the social phenotype in autism. Am. J. Psychiatry, 159(6), 895–908. doi:10.1176/appi.ajp.159.6.895
Klin, A., Jones, W., Schultz, R., Volkmar, F., & Cohen, D. (2002b). Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Arch. Gen. Psychiatry, 59(9), 809–816.
Klin, A., Lin, D. J., Gorrindo, P., Ramsay, G., & Jones, W. (2009). Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature, 459(7244), 257–261. doi:10.1038/nature07868
Kroll, N. E., Yonelinas, A. P., Dobbins, I. G., & Frederick, C. M. (2002). Separating sensitivity from response bias: Implications of comparisons of yes-no and forced-choice tests for models and measures of recognition memory. J. Exp. Psychol. Gen., 131(2), 241–254.
Kushki, A., Chau, T., & Anagnostou, E. (2011). Handwriting difficulties in children with autism spectrum disorders: A scoping review. J. Autism Dev. Disord., 41(12), 1706–1716. doi:10.1007/s10803-011-1206-0
Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399–402.
Lord, C., Risi, S., Lambrecht, L., Cook, E. H. Jr., Leventhal, B. L., DiLavore, P. C., . . . Rutter, M. (2000). The Autism Diagnostic Observation Schedule–Generic: A standard measure of social and communication deficits associated with the spectrum of autism. J. Autism Dev. Disord., 30(3), 205–223.
Lovaas, O. I., Schreibman, L., & Koegel, R. L. (1974). A behavior modification approach to the treatment of autistic children. J. Autism Child Schizophr., 4(2), 111–129. doi:10.1007/bf02105365
Mandelbaum, D. E., Stevens, M., Rosenberg, E., Wiznitzer, M., Steinschneider, M., Filipek, P., & Rapin, I. (2006). Sensorimotor performance in school-age children with autism, developmental language disorder, or low IQ. Dev. Med. Child Neurol., 48(1), 33–39. doi:10.1017/S0012162206000089
Maurer, R. G., & Damasio, A. R. (1979). Vestibular dysfunction in autistic children. Dev. Med. Child Neurol., 21(5), 656–659. doi:10.1111/j.1469-8749.1979.tb01682.x
Maurer, R. G., & Damasio, A. R. (1982). Childhood autism from the point of view of behavioral neurology. J. Autism Dev. Disord., 12(2), 195–205. doi:10.1007/bf01531309
Minshew, N. J., Sung, K., Jones, B. L., & Furman, J. M. (2004). Underdevelopment of the postural control system in autism. Neurology, 63(11), 2056–2061.
Mosconi, M. W., Luna, B., Kay-Stacey, M., Nowinski, C. V., Rubin, L. H., Scudder, C., . . . Sweeney, J. A. (2013). Saccade adaptation abnormalities implicate dysfunction of cerebellar-dependent learning mechanisms in autism spectrum disorders (ASD). PLoS One, 8(5), e63709. doi:10.1371/journal.pone.0063709
Mosconi, M. W., Mohanty, S., Greene, R. K., Cook, E. H., Vaillancourt, D. E., & Sweeney, J. A. (2015). Feedforward and feedback motor control abnormalities implicate cerebellar dysfunctions in autism spectrum disorder. J. Neurosci., 35(5), 2015–2025. doi:10.1523/JNEUROSCI.2731-14.2015
Mosconi, M. W., & Sweeney, J. A. (2015). Sensorimotor dysfunctions as primary features of autism spectrum disorders. Sci. China Life Sci., 58(10), 1016–1023. doi:10.1007/s11427-015-4894-4
Mosconi, M. W., Wang, Z., Schmitt, L. M., Tsai, P., & Sweeney, J. A. (2015). The role of cerebellar circuitry alterations in the pathophysiology of autism spectrum disorders. Front. Neurosci., 9, 296. doi:10.3389/fnins.2015.00296
Ng, Q. X., Loke, W., Venkatanarayanan, N., Lim, D. Y., Soh, A. Y. S., & Yeo, W. S. (2019). A systematic review of the role of prebiotics and probiotics in autism spectrum disorders. Medicina (Kaunas), 55(5). doi:10.3390/medicina55050129
Ornitz, E. M. (1974). The modulation of sensory input and motor output in autistic children. J. Autism Child Schizophr., 4(3), 197–215.
Perry, W., Minassian, A., Lopez, B., Maron, L., & Lincoln, A. (2007). Sensorimotor gating deficits in adults with autism. Biol. Psychiatry, 61(4), 482–486. doi:10.1016/j.biopsych.2005.09.025
Pulikkan, J., Mazumder, A., & Grace, T. (2019). Role of the gut microbiome in autism spectrum disorders. Adv. Exp. Med. Biol., 1118, 253–269. doi:10.1007/978-3-030-05542-4_13
Ryu, J., Vero, J., Dobkin, R. D., & Torres, E. B. (2019). Dynamic digital biomarkers of motor and cognitive function in Parkinson's disease. J. Vis. Exp., 149. doi:10.3791/59827
Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66(3), 605–610.
Scott, D. W. (2015). Multivariate density estimation: Theory, practice, and visualization (2nd ed.). Hoboken, NJ: Wiley.
Siaperas, P., Ring, H. A., McAllister, C. J., Henderson, S., Barnett, A., Watson, P., & Holland, A. J. (2012). Atypical movement performance and sensory integration in Asperger's syndrome. J. Autism Dev. Disord., 42(5), 718–725. doi:10.1007/s10803-011-1301-2
Somoza, E., & Mossman, D. (1991). ROC curves and the binormal assumption. J. Neuropsychiatry Clin. Neurosci., 3(4), 436–439. doi:10.1176/jnp.3.4.436
Starkstein, S., Gellar, S., Parlier, M., Payne, L., & Piven, J. (2015). High rates of Parkinsonism in adults with autism. J. Neurodev. Disord., 7(1), 29. doi:10.1186/s11689-015-9125-6
Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.
Swets, J. A., & Pickett, R. M. (1982). Evaluation of diagnostic systems: Methods from signal detection theory. New York: Academic Press.
Tordjman, S., Davlantis, K. S., Georgieff, N., Geoffray, M. M., Speranza, M., Anderson, G. M., . . . Dawson, G. (2015). Autism as a disorder of biological and behavioral rhythms: Toward new therapeutic perspectives. Front. Pediatr., 3, 1. doi:10.3389/fped.2015.00001
Torres, E. B., Brincker, M., Isenhower, R. W., Yanovich, P., Stigler, K. A., Nurnberger, J. I., . . . Jose, J. V. (2013). Autism: The micro-movement perspective. Front. Integr. Neurosci., 7, 32. doi:10.3389/fnint.2013.00032
Torres, E. B., & Denisova, K. (2016). Motor noise is rich signal in autism research and pharmacological treatments. Sci. Rep., 6, 37422. doi:10.1038/srep37422
Torres, E. B., Isenhower, R. W., Nguyen, J., Whyatt, C., Nurnberger, J. I., Jose, J. V., . . . Cole, J. (2016). Toward precision psychiatry: Statistical platform for the personalized characterization of natural behaviors. Front. Neurol., 7, 8. doi:10.3389/fneur.2016.00008
Torres, E. B., Isenhower, R. W., Yanovich, P., Rehrig, G., Stigler, K., Nurnberger, J., & Jose, J. V. (2013). Strategies to develop putative biomarkers to characterize the female phenotype with autism spectrum disorders. J. Neurophysiol., 110(7), 1646–1662. doi:10.1152/jn.00059.2013
Torres, E. B., Mistry, S., Caballero, C., & Whyatt, C. P. (2017). Stochastic signatures of involuntary head micro-movements can be used to classify females of ABIDE into different subtypes of neurodevelopmental disorders. Front. Integr. Neurosci., 11, 10. doi:10.3389/fnint.2017.00010
Torres, E. B., Nguyen, J., Mistry, S., Whyatt, C., Kalampratsidou, V., & Kolevzon, A. (2016). Characterization of the statistical signatures of micro-movements underlying natural gait patterns in children with Phelan McDermid syndrome: Towards precision-phenotyping of behavior in ASD. Front. Integr. Neurosci., 10, 22. doi:10.3389/fnint.2016.00022
Torres, E. B., Smith, B., Mistry, S., Brincker, M., & Whyatt, C. (2016). Neonatal diagnostics: Toward dynamic growth charts of neuromotor control. Front. Pediatr., 4, 121. doi:10.3389/fped.2016.00121
Torres, E. B., Vero, J., & Rai, R. (2018). Statistical platform for individualized behavioral analyses using biophysical micro-movement spikes. Sensors (Basel), 18(4). doi:10.3390/s18041025
Torres, E. B., & Whyatt, C. (2018). Autism: The movement sensing perspective. Boca Raton, FL: CRC Press/Taylor & Francis.
Torres, E. B., Yanovich, P., & Metaxas, D. N. (2013). Give spontaneity and self-discovery a chance in ASD: Spontaneous peripheral limb variability as a proxy to evoke centrally driven intentional acts. Front. Integr. Neurosci., 7, 46. doi:10.3389/fnint.2013.00046
Troyb, E., Knoch, K., Herlihy, L., Stevens, M. C., Chen, C. M., Barton, M., . . . Fein, D. (2016). Restricted and repetitive behaviors as predictors of outcome in autism spectrum disorders. J. Autism Dev. Disord., 46(4), 1282–1296. doi:10.1007/s10803-015-2668-2
Vilensky, J. A., Damasio, A. R., & Maurer, R. G. (1981). Gait disturbances in patients with autistic behavior: A preliminary study. Arch. Neurol., 38(10), 646–649. doi:10.1001/archneur.1981.00510100074013
White, K. G., & Wixted, J. T. (2010). Psychophysics of remembering: To bias or not to bias. J. Exp. Anal. Behav., 94(1), 83–94. doi:10.1901/jeab.2010.94-83
Whyatt, C., & Craig, C. (2013). Sensory-motor problems in autism. Front. Integr. Neurosci., 7, 51. doi:10.3389/fnint.2013.00051
Whyatt, C., & Torres, E. B. (2017). The social-dance: Decomposing naturalistic dyadic interaction dynamics to the micro-level. Paper presented at the MOCO 2017, London, UK.
Whyatt, C. P., & Torres, E. B. (2018). Autism research: An objective quantitative review of progress and focus between 1994 and 2015. Front. Psychol., 9, 1526. doi:10.3389/fpsyg.2018.01526
Witt, J. K., Taylor, J. E., Sugovic, M., & Wixted, J. T. (2015). Signal detection measures cannot distinguish perceptual biases from response biases. Perception, 44(3), 289–300. doi:10.1068/p7908
Wu, D., Jose, J. V., Nurnberger, J. I., & Torres, E. B. (2018). A biomarker characterizing neurodevelopment with applications in autism. Sci. Rep., 8(1), 614. doi:10.1038/s41598-017-18902-w

Received July 30, 2019; accepted November 10, 2019.
