Machine Learning Predictive Models Will Not Replace Clinical Judgment Anytime Soon

Beth Krone, Ph.D.
Icahn School of Medicine at Mount Sinai

In the spirit of full disclosure, I am a technophile. My age cohort was the first to have desktop computers as children. I first learned to program in binary. After a decade as an end-user, I still have the muscle memory of a programmer. The concept behind Machine Learning predictive models in mental health diagnostics – the idea that we can train computers to be ‘smart’ enough to recognize patterns in data and ‘learn’ to classify and predict outcomes from reading the data without any prior information or rule specification – does not intimidate me. I welcome our computer overlords. So far, though, computers have not been outperforming clinicians in separating youth with ADHD from typically developing youth using brain-based biomarkers.

The ADHD 200 global competition freely gave a moderately large fMRI dataset to researchers and statisticians, who responded to the call and flexed their creativity in developing algorithms and models to distinguish between the dataset’s ADHD patients and healthy controls. Several teams have published on the data, using pieces of the dataset to test their theories and to search for the elusive, definitive, confirmatory biomarker of disease state that, so far, seems not to exist. In 2012, for example, Sato and his team created a classification model using brain region homogeneity as a measure of volume, fractional amplitude of low frequency fluctuations (fALFF) as a measure of spontaneous brain activity at rest, and network maps of the default mode (positive values) and task-positive networks (negative values). The model returned a median predictive accuracy of 54% for discriminating ADHD from controls, providing no additive clinical value to the diagnostic process at that time.

Recently, Sen and his team (2018) published a general prediction model built on the ADHD 200 data and then tested on the ABIDE dataset, which is also freely available. From MRI data, their team generated three-dimensional representations of brain volumes, or ‘texture’, that discriminated between ADHD and typical development with 63% accuracy. Adding information about 45 independent intrinsic connectivity networks (ICNs) derived from the resting state fMRI data (networks thought to underlie functions such as mind-wandering and planning) raised the predictive accuracy of their model to 67% in the ADHD 200 dataset, and the model retested with 64% accuracy in the ABIDE dataset. An accuracy rate in the mid-to-high 60s is still far below the expected performance of a well-trained human diagnostician, but not much different from the overall predictive validity of the Continuous Performance Test (CPT-II; Fazio, 2014). Given the differential between CPT and fMRI in terms of time, cost, and resources, fMRI is not likely to replace the CPT anytime soon for augmentation of clinical judgment as standard of care.

Other recently published works highlight the diversity of methods employed within machine learning and the range of quality control procedures in data acquisition and analysis, against the larger clinical backdrop of heterogeneity in ADHD and the value of clinical training in mental health care diagnostics. In 2017, for example, Lirong Tan and his team developed a Support Vector Machine (SVM) model to separate youth with ADHD from controls based on the volume of brain regions as measured by fMRI, rather than using the more traditional approach of looking at the volume of brain regions measured as physical structures via MRI. The advantage of using the functional measure here was to capture how much a task caused activation in and around a particular structure of the brain. The team also entered demographic data into their model, looking at socio-cultural contributors to the overall presentation of ADHD. They found that brain-wide functional volumes discriminated ADHD about as accurately (59.6%) as age and sex did (58.5%), with neither being of strong clinical value. Tweaking the model by entering information about 10 brain regions of interest in ADHD pathology improved accuracy to 67%. That improvement corresponded to subtle differences across the brain, rather than to a significant difference in any one particular region that could identify a group.

Xun-Heng Wang and his team (2018) also examined 10 ICNs, including an executive control network and a cerebellar network, for their predictive value. Their approach was to measure variability in the networks’ functional connectivity when participants were not performing a task. Unfortunately, they included demographics in their model without examining the independent predictive quality of network variability. These were the same demographic features that Tan’s team found to predict ADHD with 58.5% accuracy on their own. Because we cannot determine the independent value of the connectivity analysis, we cannot be certain how much of the 75% accuracy the team reports is owed to the imaging data. These are claims that science will prove with replication, or not.
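One way to make the missing comparison concrete is an ablation: fit the same classifier once on demographics alone and once on demographics plus the imaging feature, then compare cross-validated accuracy. The sketch below is only an illustration of that idea and is not the method used by Wang's or Tan's team. The data are synthetic, the effect sizes are arbitrary assumptions, and the nearest-centroid classifier and the names `make_synthetic`, `fit_centroids`, and `cv_accuracy` are hypothetical stand-ins chosen to keep the example dependency-free.

```python
import random
import statistics


def make_synthetic(n=600, seed=0):
    """Build synthetic rows of ((age, sex, connectivity), label). Not real data."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2  # 1 = ADHD, 0 = control (balanced by construction)
        # Assumed, arbitrary effect sizes: weak demographic signal,
        # stronger signal in a made-up connectivity-variability feature.
        age = rng.gauss(11.0 - 0.5 * label, 2.0)
        sex = 1.0 if rng.random() < (0.7 if label else 0.5) else 0.0
        conn = rng.gauss(1.5 * label, 1.0)
        rows.append(((age, sex, conn), label))
    rng.shuffle(rows)
    return rows


def fit_centroids(X, y):
    """Standardize features on the training data and store one centroid per class."""
    cols = list(zip(*X))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, mu, sd)]

    centroids = {}
    for cls in (0, 1):
        members = [scale(x) for x, t in zip(X, y) if t == cls]
        centroids[cls] = [statistics.fmean(c) for c in zip(*members)]
    return scale, centroids


def predict(model, x):
    """Assign the class whose centroid is closest in standardized space."""
    scale, centroids = model
    z = scale(x)
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centroids[c])),
    )


def cv_accuracy(rows, cols, k=5):
    """k-fold cross-validated accuracy using only the feature columns in `cols`."""
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        X = [[x[c] for c in cols] for x, _ in train]
        y = [t for _, t in train]
        model = fit_centroids(X, y)
        correct += sum(predict(model, [x[c] for c in cols]) == t for x, t in test)
    return correct / len(rows)


rows = make_synthetic()
demo_acc = cv_accuracy(rows, cols=(0, 1))     # age and sex only
full_acc = cv_accuracy(rows, cols=(0, 1, 2))  # demographics plus imaging feature
print(f"demographics only: {demo_acc:.3f}")
print(f"demographics + connectivity: {full_acc:.3f}")
print(f"incremental accuracy: {full_acc - demo_acc:.3f}")
```

The number of interest is the difference between the two accuracies, not the headline figure of the combined model. A report that gives only the combined accuracy leaves that difference unknown, which is the gap described above.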

In the end, scientists will keep searching, fueled by the strong desire and public need to find ‘the’ biomarker or biomarkers that definitively separate ADHD from typical development. For the foreseeable future, though, clinical judgment is in no danger of being replaced by machine intelligence. Through more than a decade’s work as a clinician for a clinical and translational research group, I have frequently had to tell patients, “No, I’m sorry, but we cannot use your fMRI/MRI to diagnose you. The science just is not there. No one can do that, yet.” Yet.

References:
Fazio, R., Dole, L., & King, J. (2014). CPT-II versus TOVA: Assessing the Diagnostic Power of Continuous Performance Tests. Archives of Clinical Neuropsychology, 29(6), 540.
Sato, J.R., Hoexter, M.Q., Fujita, A., & Rohde, L.A. (2012). Evaluation of Pattern Recognition and Feature Extraction Methods in ADHD Prediction. Frontiers in Systems Neuroscience, 6.
Sen, B., Borle, N., Greiner, R., & Brown, M.R.G. (2018). A General Prediction Model for the Detection of ADHD and Autism Using Structural and Functional Imaging. PLoS ONE, 13(4), e0194856.
Tan, L., Guo, X., Ren, S., Epstein, J., & Lu, L.J. (2017). A Computational Model for the Automatic Diagnosis of Attention Deficit Hyperactivity Disorder Based on Functional Brain Volume. Frontiers in Computational Neuroscience, 11, 75. doi:10.3389/fncom.2017.00075
Wang, X-H., Jiao, Y., & Li, L. (2018). Identifying Individuals with Attention Deficit Hyperactivity Disorder Based on Temporal Variability of Dynamic Functional Connectivity. Scientific Reports, 8, 11789.
