The Variable You Can’t Touch: Advances in Mental Health Measurements

You cannot touch mental health impairments, so how best can we measure them? Adaptive testing may be the answer.


In a session titled “Virtual and Electronic Assessment and Intervention Tools: Advances in Mental Health Measurement” at the 2022 APSARD Virtual Conference, Robert Gibbons, PhD, a professor of biostatistics at the University of Chicago, discussed the subjectivity of mental health measurement and how problems arise in emergency departments, primary care, pediatric clinics, child welfare systems, judicial systems, and more.

“Measurement is so much easier and more precise—seemingly—in the physical sciences than it is in social sciences,” said Gibbons. “With mental health measurements, we now are trying to measure a latent variable that I can't put my hands on, and I can't really know what it is. What I can learn about it is from its manifestations.”

Gibbons said the goal of his presentation was to find a way to boost the precision with which this latent variable is measured while using a small, optimally chosen set of items. His proposed solution is computerized adaptive testing (CAT), in which different individuals may receive different scale items, each targeted to their specific level of impairment.

Using multidimensional item response theory, researchers like Gibbons have developed adaptive tests for ADHD, depression, anxiety, mania, posttraumatic stress disorder, psychosis, substance abuse, suicidality, and social determinants of health. An adaptive test starts by administering a question of medium severity, estimates the individual's severity from the response, and then selects the next most informative question. Questioning ends once the desired precision of measurement has been reached.
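To make the adaptive loop concrete, here is a minimal Python sketch of the three steps described: start near medium severity, re-estimate after each response, and stop at a target precision. It is an illustration under simplifying assumptions, not the method used in Gibbons' tests: it swaps in a unidimensional two-parameter logistic model in place of multidimensional item response theory, and the item bank, parameter values, thresholds, and function names are all hypothetical.

```python
import math

# Candidate severity values on a grid from -4 to 4 (assumed scale).
GRID = [-4.0 + 0.1 * i for i in range(81)]

def prob_endorse(theta, a, b):
    """Chance of endorsing an item with discrimination a and severity b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information the item provides at severity theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_severity(responses, bank):
    """Posterior mean and SD of severity under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for item_id, endorsed in responses:
            p = prob_endorse(theta, *bank[item_id])
            w *= p if endorsed else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_adaptive_test(bank, answer, target_sd=0.4, max_items=12):
    """Ask the most informative unused item until precision is reached."""
    responses = []
    theta, sd = estimate_severity(responses, bank)  # prior only: medium severity
    while sd > target_sd and len(responses) < min(max_items, len(bank)):
        asked = {item_id for item_id, _ in responses}
        next_item = max(
            (i for i in bank if i not in asked),
            key=lambda i: item_information(theta, *bank[i]),
        )
        responses.append((next_item, answer(next_item)))
        theta, sd = estimate_severity(responses, bank)
    return theta, sd, [item_id for item_id, _ in responses]

# Hypothetical bank: 25 items, equal discrimination, severities from -3 to 3.
bank = {f"item{k}": (1.5, -3.0 + 0.25 * k) for k in range(25)}

# Hypothetical respondent who endorses every item milder than severity 1.0.
theta, sd, administered = run_adaptive_test(bank, lambda i: bank[i][1] < 1.0)
print(f"severity={theta:.2f}, sd={sd:.2f}, items asked={len(administered)}")
```

Because the first estimate comes from the prior alone, the first question is the item whose severity sits closest to the middle of the scale. Each later question is whichever unused item is most informative at the current estimate, so two respondents with different answers see different items.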

This approach shifts away from short, fixed-length tests with uncertain psychometric properties toward large item banks, from which a smaller, optimal subset of items is drawn for each individual to target their specific type and level of impairment.

“These measures provide constant precision of measurement throughout the entire severity continuum,” said Gibbons. “The items are targeted to a patient specific level of severity at that point in time. Different questions are asked upon repeated administration, eliminating response bias produced by repeatedly asking the same questions over and over.”

The results, Gibbons shared, are promising, showing remarkable increases in precision of measurement and decreases in patient burden. For example, the K-CAT, used in children, had an 80% compliance rate in a diverse population, and was reported by youth and caregivers to be easy to use.

“We spend billions of dollars on biological measurements, yet we validate them using Stone Age clinical measurements. We can and should do so much better,” Gibbons concluded.