Diagnoses From Clinical Evaluations and Standardized Diagnostic Interviews Don’t Agree
A recent meta-analysis showed that diagnoses generated from clinical evaluations often do not agree with the results of structured and semistructured interviews—together called standardized diagnostic interviews (SDIs).1 Such a study could easily be overlooked as another dry and “methodological” investigation. Yet the implications of this meta-analysis are enormous.
Whenever clinicians pick up a journal to learn the latest clinical research, it is very likely that the patients described in those studies received their diagnosis in no small part from an SDI. Indeed, it is very difficult to publish a clinical study without using one. SDIs, which can be performed by both clinicians and nonclinicians, are designed so that interviewers ask the same diagnostic questions in the same way, thereby minimizing variability within and across studies. The findings of clinical research are obviously intended to be applicable to the clinician in routine clinical practice, where SDIs are seldom used. This discrepancy in diagnostic approach would not be a problem if, regardless of method, the results of both strategies generally led to the same diagnostic conclusion.
The problem is that they do not. Instead, the diagnoses generated from SDIs tend to be different from the diagnoses derived from standard clinical evaluations. The 38 studies assessed in our meta-analysis included nearly 16,000 patients and encompassed 10 different standardized interviews as well as a wide range of diagnoses, ages, and clinical settings. One requirement for the included studies was that the interviewers using SDIs and the clinical evaluators did not know the diagnostic results of the other method.
Most of these studies calculated a κ statistic as their measure of agreement between SDIs and clinical evaluations. While there is debate on the subject, κ values of 0.6 to 0.8 are generally considered acceptable to good, depending on the context.
In our meta-analysis, the overall κ value across all diagnoses was 0.27. Some individual diagnoses clearly fared better than others. The κ for anorexia nervosa was 0.86, while the κ for generalized anxiety disorder was a meager 0.19. Many diagnoses had a κ between 0.30 and 0.40.
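For readers less familiar with the statistic, κ can be computed from a simple 2x2 agreement table for a single diagnosis. The sketch below (in Python, using purely hypothetical counts chosen for illustration, not data from the meta-analysis) expresses Cohen's κ as observed agreement corrected for the agreement expected by chance:

```python
def cohens_kappa(both_yes, eval_only, sdi_only, both_no):
    """Cohen's kappa from a 2x2 agreement table for one diagnosis.

    both_yes:  both methods assign the diagnosis
    eval_only: clinical evaluation assigns it, SDI does not
    sdi_only:  SDI assigns it, clinical evaluation does not
    both_no:   neither method assigns it
    """
    n = both_yes + eval_only + sdi_only + both_no
    # Observed proportion of agreement (cells where the methods match)
    p_o = (both_yes + both_no) / n
    # Agreement expected by chance, from each method's marginal rates
    p_eval_yes = (both_yes + eval_only) / n
    p_sdi_yes = (both_yes + sdi_only) / n
    p_e = p_eval_yes * p_sdi_yes + (1 - p_eval_yes) * (1 - p_sdi_yes)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical table: 100 patients, SDI assigns the diagnosis more often
kappa = cohens_kappa(both_yes=15, eval_only=10, sdi_only=25, both_no=50)
```

With these made-up counts, the two methods agree on 65% of patients, yet κ is only about 0.22, which is why raw percent agreement can look deceptively reassuring.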
SDIs come in 2 main types—structured and semistructured. Semistructured interviews allow for more flexibility in their delivery, whereas the questions on structured interviews are intended to be read as written. While one might expect semistructured interviews to correspond more closely to clinical evaluations than structured interviews do, no statistically significant differences were found. There was some indication that agreement was better in outpatient settings and among children. However, these improvements were minor and statistically nonsignificant when all potential moderators were entered into a meta-regression.
These findings mean that when either an SDI or a clinical evaluation produced a positive diagnosis for a particular disorder, the other method failed to produce the same diagnosis the majority of the time. Why was overall agreement so poor? The study was not equipped to address this question directly, and the authors were careful not to assign blame to either clinicians or SDIs. Interviewers using SDIs tended to generate a higher number of diagnoses than clinicians did. However, this difference by itself cannot tell us which method is correct.
Possible sources for the poor agreement included features of both clinical evaluations and SDIs. Clinicians, on the one hand, might tend to pick a single diagnosis on the basis of clinical impressions—even if a patient meets criteria for several. They may also tailor their interviews more closely around the chief complaint and weigh contextual information, such as family history, in the diagnosis.
Clinicians may also hold back on making diagnoses out of concern about stigma. SDIs, by contrast, could elicit a large number of positive responses because of the sheer number of questions that are typically asked. These interviews also tend to rely more heavily on the impression of the patient or a parent about what constitutes a clinically significant symptom.
To illustrate how these processes may work in a clinical situation, consider the hypothetical example of 15-year-old Billy who presents initially to a psychologist in the community.
Billy was neglected and abused by his biological parents, who struggled with their own mental health problems. He was removed from the home at age 3 and lived in several foster homes before he was adopted into a stable household. He has been observed in multiple settings to be impulsive, aggressive, anxious, and prone to intense behavioral outbursts with minimal provocation. The adoptive parents take Billy to a community clinician, who conducts an evaluation without an SDI and makes a diagnosis of posttraumatic stress disorder, noting the history of trauma and profound behavioral disturbance even in the absence of symptoms such as prominent flashbacks or nightmares. The clinician begins trauma-based psychotherapy and considers interventions such as eye movement desensitization and reprocessing.
After a violent episode at school, Billy is hospitalized at an academic medical center where an SDI is conducted as part of the admission process. According to this interview, Billy meets criteria for attention-deficit/hyperactivity disorder (ADHD), oppositional defiant disorder, and conduct disorder. Treatment with a psychostimulant is started, and recommendations are made for parent behavioral training within a cognitive-behavioral framework.
After discharge, Billy is evaluated by a local psychiatrist who notices the level of impairment and extreme mood swings and diagnoses pediatric bipolar disorder—even without a history of sustained manic symptoms lasting several days. The psychiatrist begins treatment with an antipsychotic agent and considers discontinuing the stimulant.
This example demonstrates the decision making that can underlie widely divergent diagnoses and treatment plans. Indeed, when these data were first presented as a poster at a meeting of the American Academy of Child and Adolescent Psychiatry, the informal comments made by viewers reflected these different perspectives. Those who tended not to use SDIs interpreted the findings as confirmation that SDIs were not useful, while academics who conducted research trials took the results as validation that continued use of SDIs was needed to produce accurate diagnoses. The source studies included in the meta-analysis also offered divergent interpretations of their results. In some individual studies, the design treated diagnoses based on clinical evaluations as the "gold standard" so as to put SDIs to the test. Other studies were designed to challenge the validity of diagnoses from clinical evaluations.
In much of clinical research, skepticism about using either clinical evaluations or SDIs as the sole source of diagnostic information is reflected in the generally accepted practice of using a “best estimate” procedure. Here, senior investigators review the diagnostic results of SDIs and assign final diagnoses after reviewing all the clinical data. This procedure is thought to add incremental validity to the final diagnoses, although its utility has rarely been tested formally against potential benchmarks that may be difficult to define.
My coauthors and I also discuss the possibility that the dichotomous nature of diagnoses may contribute to disagreement. For example, if a clinical evaluation determines that a patient meets 5 of the 9 inattentive criteria for ADHD while an SDI reports that 6 of 9 criteria are met, the result is complete diagnostic disagreement (since at least 6 items are required), even though the 2 methods may have agreed on as many as 8 of the 9 individual criteria. While disagreement in such instances may be viewed as something of an artifact, it does reflect current practice and the frequent need to make binary decisions, such as whether to prescribe a medication.
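The threshold artifact described above can be made concrete with a short sketch. The item-level ratings below are entirely hypothetical, assuming only the DSM convention that at least 6 of 9 inattentive criteria are required for the diagnosis:

```python
DSM_THRESHOLD = 6  # at least 6 of 9 inattentive criteria required

# Hypothetical item-level ratings (1 = criterion judged met)
clinical = [1, 1, 1, 1, 1, 0, 0, 0, 0]  # clinical evaluation: 5 of 9 met
sdi      = [1, 1, 1, 1, 1, 1, 0, 0, 0]  # SDI: 6 of 9 met

# Item-level agreement: the two methods match on 8 of 9 criteria
item_agreement = sum(c == s for c, s in zip(clinical, sdi)) / len(clinical)

# Diagnosis-level agreement: one method crosses the threshold, one does not
dx_clinical = sum(clinical) >= DSM_THRESHOLD  # False: no diagnosis
dx_sdi = sum(sdi) >= DSM_THRESHOLD            # True: diagnosis assigned
```

A single discordant item at the diagnostic boundary converts near-perfect item-level agreement into complete diagnostic disagreement, which a κ computed on the binary diagnoses will register as a miss.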
In summary, our meta-analysis has revealed the somewhat alarming finding that diagnoses generated from SDIs (the prevailing method used in clinical research studies) and diagnoses generated from clinical evaluations (the prevailing method used in routine practice) frequently disagree. The sources of this disagreement are probably complex, stemming from features of both diagnostic methods. While researchers in practice frequently try to combine SDI results with clinical judgment, much remains to be learned about how scientists, clinicians, and perhaps DSM-5 can synthesize often divergent information in the service of a valid and clinically meaningful assessment.
1. Rettew DC, Lynch AD, Achenbach TM, et al. Meta-analyses of agreement between diagnoses made from clinical evaluations and standardized diagnostic interviews. Int J Methods Psychiatr Res. 2009;18:169-184.