Diagnoses from Clinical Evaluations and Standardized Diagnostic Interviews Don’t Agree

April 19, 2010

A recently published meta-analysis showed that diagnoses generated from clinical evaluations often do not agree with the results of structured and semi-structured interviews, together called standardized diagnostic interviews (SDIs).1 Such a study could easily be overlooked as another dry, "methodological" investigation. Nevertheless, the implications of this meta-analysis are enormous.

Whenever clinicians pick up a journal to learn the latest clinical research, it is very likely that the patients described in those studies received their diagnosis in no small part from an SDI. Indeed, it is very difficult to publish a clinical study without using one. These interviews, which can be performed both by clinicians and by nonclinicians, are designed so that interviewers ask the same diagnostic questions in the same way, thereby minimizing variability within and across studies. The findings of clinical research are obviously intended to be applicable to the clinician in routine clinical practice, where SDIs are seldom used. This discrepancy in approach would not be a problem if, regardless of method, both strategies generally led to the same diagnostic conclusion.

The problem is that they don’t. Instead, the diagnoses generated from SDIs tend to be different from the diagnoses derived from standard clinical evaluations. The 38 studies assessed in our meta-analysis included nearly 16,000 patients and encompassed 10 different standardized interviews as well as a wide range of diagnoses, ages, and clinical settings. One requirement for included studies was that the interviewers using SDIs and the clinical evaluators did not know the diagnostic results of the other method. Most of these studies calculated a kappa statistic as their measure of agreement between SDIs and clinical evaluations. While there is debate on the subject, kappas of 0.6 to 0.8 are generally considered acceptable to good, depending on the context. In our meta-analysis, the overall kappa across all diagnoses was 0.27. Some individual diagnoses clearly fared better than others. The kappa for anorexia nervosa was 0.86 while the kappa for generalized anxiety disorder was a meager 0.19. Many diagnoses had kappas between 0.30 and 0.40.
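The kappa statistic corrects raw percent agreement for the agreement two raters would be expected to reach by chance alone, given how often each rater makes a positive call. A minimal sketch of the computation (the ratings below are made up for illustration, not data from the meta-analysis):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two binary raters (1 = diagnosis present, 0 = absent)."""
    n = len(rater_a)
    # observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # agreement expected by chance, from each rater's base rate of positive calls
    p_a, p_b = sum(rater_a) / n, sum(rater_b) / n
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (p_o - p_e) / (1 - p_e)

# hypothetical ratings for 10 patients: clinical evaluation vs SDI
clinician = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
sdi       = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
print(round(cohens_kappa(clinician, sdi), 2))  # prints 0.52
```

Note that the two methods here agree on 8 of 10 patients (80% raw agreement), yet kappa is only 0.52, because much of that agreement could arise by chance when most patients are rated negative. This is why kappas in the 0.27 range signal poor agreement even if raw concordance looks respectable.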
 

Standardized diagnostic interviews come in 2 main types: structured and semistructured. Semistructured interviews allow more flexibility in their delivery and usually need to be administered by people with more clinical training, whereas the questions on structured interviews are intended to be read as written. While one might expect semistructured interviews to correspond more closely with clinical evaluations than structured interviews do, no statistical differences were found. There was some indication that agreement was better in outpatient settings and among children; however, these improvements were minor and statistically nonsignificant when all potential modifiers were entered together in a meta-regression.

These findings mean that if an SDI or a clinical evaluation produced a positive diagnosis for a particular disorder, the majority of the time the other diagnostic method did not produce the same diagnosis. Why was overall agreement so poor? The study was not equipped to examine this question directly, and the authors were careful not to assign blame to either clinicians or SDIs. SDIs tended to generate a higher number of diagnoses than clinicians, but this difference alone says nothing about which method is correct.

Possible sources for the poor agreement include features of both clinical evaluations and SDIs. Clinicians, on the one hand, might tend to pick a single diagnosis based on clinical impression, even if a patient meets criteria for several. They may also tailor their interviews more closely around the chief complaint and weigh contextual information, such as family history, in the diagnosis. Clinicians may also hold back from making diagnoses out of concern about stigma. SDIs, by contrast, could elicit a large number of positive responses due to the sheer number of questions that are typically asked. These interviews also tend to rely more heavily on the impression of the patient or a parent as to what constitutes a clinically significant symptom.

To illustrate how these processes might play out in a clinical situation, consider the following example.

    Consider the hypothetical example of 15-year-old Billy, who presents initially to a psychologist in the community. Billy has a history of being neglected and abused by his biological parents, who struggled with their own mental health problems. He was removed from the home at age 3 and lived in several foster homes before he was adopted into a stable household. He has been observed in multiple settings to be impulsive, aggressive, anxious, and prone to intense behavioral outbursts with minimal provocation. The adoptive parents take Billy to a community clinician, who conducts an evaluation without an SDI and makes a diagnosis of posttraumatic stress disorder (PTSD), noting the history of trauma and profound behavioral disturbance even in the absence of symptoms such as flashbacks or nightmares. She begins trauma-based psychotherapy and considers interventions such as eye movement desensitization and reprocessing.
     After a violent episode at school, Billy is hospitalized at an academic medical center, where a structured diagnostic interview is given as part of the admission process. According to this interview, Billy meets criteria for ADHD, oppositional defiant disorder, and conduct disorder. Treatment with a psychostimulant is started, in addition to recommendations for parent behavioral training within a cognitive-behavioral framework. After discharge, Billy is evaluated by a local psychiatrist, who notes the level of impairment and extreme mood swings and diagnoses pediatric bipolar disorder, even without a history of sustained manic symptoms lasting several days. The psychiatrist begins treatment with an antipsychotic agent and considers discontinuing the stimulant.

The example demonstrates the decision-making that can underlie widely divergent diagnoses and treatment plans. Indeed, when these data were first presented as a poster at a meeting of the American Academy of Child and Adolescent Psychiatry, the informal comments made by viewers reflected these different perspectives. Those who tended not to use SDIs in their practice interpreted the findings as confirmation that SDIs were not useful, while academics who conducted research trials viewed the results as validation that the continued use of SDIs is needed to produce accurate diagnoses. Indeed, the source studies of the meta-analysis also diverged in the interpretation of their results. Some individual studies were framed so that diagnoses based on clinical evaluations were considered the "gold standard" against which to test SDIs. Other studies were designed to challenge the validity of diagnoses from clinical evaluations.

In much of clinical research, skepticism about using either clinical evaluations or SDIs as the sole source of diagnostic information is reflected in the generally accepted practice of using a “best estimate” procedure. Here, senior investigators review the diagnostic results of SDIs and assign final diagnoses after reviewing all the clinical data. This procedure is thought to add incremental validity to the final diagnoses although its utility has rarely been tested formally against potential benchmarks that may be difficult to define.

My coauthors and I also discuss the possibility that the dichotomous nature of diagnoses may contribute to disagreement. For example, if a clinical evaluation determines that a patient meets 5 of 9 criteria for the inattentive items of ADHD while an SDI reports that 6 of 9 criteria are met, the final diagnoses are in 100% disagreement (since at least 6 items are required), even though the 2 methods may have agreed on 8 of 9 criteria. While disagreement in such instances may be viewed as somewhat of an artifact, it does reflect current practice and the frequent need to make binary decisions, such as whether or not to prescribe a medication.
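This threshold effect can be made concrete with a small sketch. The item-level results below are hypothetical, simply echoing the 5-of-9 versus 6-of-9 example above:

```python
THRESHOLD = 6  # at least 6 of the 9 inattentive criteria are required for the diagnosis

# hypothetical item-level findings (1 = criterion judged to be met)
clinician_items = [1, 1, 1, 1, 1, 0, 0, 0, 0]  # 5 of 9 met -> below threshold
sdi_items       = [1, 1, 1, 1, 1, 1, 0, 0, 0]  # 6 of 9 met -> at threshold

# the two methods agree on 8 of the 9 individual criteria
item_agreement = sum(c == s for c, s in zip(clinician_items, sdi_items)) / 9

# yet the binary diagnoses completely disagree
clinician_dx = sum(clinician_items) >= THRESHOLD  # False
sdi_dx = sum(sdi_items) >= THRESHOLD              # True

print(round(item_agreement, 2), clinician_dx, sdi_dx)  # prints 0.89 False True
```

Near-total agreement at the symptom level can thus coexist with total disagreement at the diagnostic level whenever a patient sits near the cutoff.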

In summary, our meta-analysis has revealed the somewhat alarming finding that diagnoses generated from SDIs (the prevailing method used in clinical research studies) and diagnoses generated from clinical evaluations (the prevailing method used in routine practice) frequently disagree. The sources of this disagreement are likely complex and reflect features of both diagnostic methods. While researchers in practice often try to combine SDI results with clinical judgment, much remains to be learned about how scientists, clinicians, and perhaps DSM-5 can synthesize often divergent information in the service of a valid and clinically meaningful assessment.

References:

1. Rettew DC, Lynch AD, Achenbach TM, et al. Meta-analyses of agreement between diagnoses made from clinical evaluations and standardized diagnostic interviews. Int J Methods Psychiatr Res. 2009;18:169-184.