You Can’t Turn a Sow’s Ear Into a Silk Purse
In his recent Huffington Post piece titled Field Trial Results Guide DSM Recommendations,1 DSM-5 Task Force Chair Dr David Kupfer says, “What’s clear is just how well the field trials did their job.” This surprisingly optimistic claim has inspired these telling rejoinders from Mickey Nardo, MD, and Barney Carroll, MD, two of the best-informed critics of DSM-5.
Dr Nardo first: “The absence of biological tests in psychiatry is unique in medicine and sentences the classification of mental disorders to endless controversy. In the 1970s, Dr Robert Spitzer proposed we use inter-rater reliability as a stand-in for objective tests. His statistician colleagues developed a simple measure (called ‘kappa’) to indicate the level of diagnostic agreement corrected for chance. In 1974, Spitzer reported on 5 studies that clearly exposed the unreliability of DSM-II, the official diagnostic system at the time.
“To correct this problem and obtain the diagnostic agreement necessary for research studies, Spitzer then set about constructing sets of diagnostic criteria meant to tap overt signs and symptoms, rather than the more inferential mechanisms that informed DSM-II. He also developed structured clinical interviews that provided a uniform method of assessment. These approaches worked well to improve the poor kappas obtained using the free-form approach of DSM-II.
“In 1980, Spitzer took the next big step of introducing the criterion-based method of diagnosis into DSM-III. What had originated as a research tool now informed all clinical practice. It was an important milestone for psychiatry when DSM-III field testing showed that the system achieved good kappas. The new manual was an instant success throughout the mental health professions and brought a measure of objectivity to a field previously dominated by warring subjective opinions. Later, in 1994, DSM-IV was also able to demonstrate good kappas in its much more extensive field testing.
“The DSM-5 Task Force originally planned two sets of field trials, the second of which was meant to provide quality control to correct whatever weaknesses were exposed in the first. But along the way, the field testing fell far behind schedule and the quality control step was quietly cancelled. No explanation was ever offered, but it seemed likely that DSM-5 was being rushed to press so that APA could reap publishing profits.
“Dr David Kupfer now wants us to believe that the recently published results of the DSM-5 field testing somehow serve to justify the inclusion in DSM-5 of extremely controversial and much feared changes. This is a terribly misleading claim. Independent of all the other criticisms of DSM-5 (and there are plenty), the poor results of the field trials must have been a major disappointment to the Task Force. Dr Kupfer is now making a desperate attempt to salvage the failed project by putting an unrealistically positive spin on its results.
“Our forty-year experience in reliability testing for DSM-II, the RDC, DSM-III, and DSM-IV makes clear what are acceptable and what are unacceptable kappa levels. There is no way of avoiding or cloaking the stark and troubling fact that the DSM-5 field trials produced remarkably low kappas—harking back to the bad old days of DSM-II. [see http://1boringoldman.com/index.php/2012/10/31/humility-2/].
“Equally disturbing, three of the eight diagnoses tested at multiple centers had widely divergent kappa values at the different sites—hardly a vote for their reliability. Even worse, two major diagnostic categories [Major Depressive Disorder and Generalized Anxiety Disorder] performed terribly, in a range that is clearly unacceptable by anybody’s standard. [see http://1boringoldman.com/index.php/2012/10/31/but-this-is-ridiculous/].
“Dr Kupfer has been forced to drastically lower our expectations in an effort to somehow justify the remarkably poor and scattered DSM-5 kappa results. There is, in fact, only one possible explanation for the results—the DSM-5 field trials were poorly designed and incompetently administered. Scientific integrity requires owning up to the defects of the study, rather than asking us to deviate from historical standards of what is considered acceptable reliability. It is not cricket to lower the target kappas after the study results fail to meet reasonable expectations.
“Diagnostic agreement is the bedrock of our system—a non-negotiable bottom line. The simple truth is that by historical standards, the DSM-5 field trials did not pass muster. Dr Kupfer can’t expect to turn this sow’s ear into a silk purse.”
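Dr Nardo’s description of kappa as “diagnostic agreement corrected for chance” can be made concrete with a short sketch. This is a minimal illustration of Cohen’s kappa for two raters; the diagnostic counts below are invented for demonstration and are not field trial data.

```python
# Cohen's kappa: inter-rater agreement corrected for chance.
# The counts in the example table are invented, purely illustrative.

def cohens_kappa(table):
    """table[i][j] = number of cases rater A called category i
    and rater B called category j."""
    k = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / total
    # Expected chance agreement from each rater's marginal frequencies.
    row_marg = [sum(row) / total for row in table]
    col_marg = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_o - p_e) / (1 - p_e)

# Two raters diagnosing 100 hypothetical cases as disorder / no disorder:
table = [[40, 10],   # A says disorder: B agrees 40 times, disagrees 10
         [5, 45]]    # A says no disorder: B disagrees 5 times, agrees 45
print(round(cohens_kappa(table), 2))  # prints 0.7
```

With 85% raw agreement but 50% agreement expected by chance alone, kappa lands at 0.70, which is why raw agreement percentages overstate reliability and why the field dropped them in favor of kappa.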
Dr Carroll adds this: “The purpose of DSM-5 is to have criteria that can be used reliably across the country and around the world. The puzzling variability of results across the sites in the DSM-5 field trials is a major problem. Let’s take just one of many examples—for Bipolar I Disorder, the Mayo Clinic came in with a very good kappa value of 0.73 whereas the San Antonio site came in with a really lousy kappa of 0.27. You can’t just gloss over this gaping discrepancy by reporting a mean value. The inconsistencies across sites have nothing to do with the criteria tested—they are instead prima facie evidence of unacceptably poor execution of the study protocol. The inconsistent results prove that something clearly wasn’t right in how the study was done.
“The appropriate response is to go back to the drawing board by completing the originally planned quality control second stage of testing—rather than barreling ahead to premature publication and pretending that everything is just fine when it is not. The DSM-5 leaders have lowered the goal posts and are claiming a bogus sophistication for their field trials design as an excuse for its sloppy implementation. But a low kappa is a low kappa no matter how you try to disguise it. Dr Kupfer is putting lipstick on the pig.
“Many people experience a glazing of the eyes when the term kappa appears, but it’s really a simple idea. The kappa value tells us how far we have moved from completely random agreement (a kappa of 0) to completely perfect agreement (a kappa of 1.0). The low end of kappas that DSM-5 wants us to find acceptable is barely better than blind raters throwing random darts. If there is this much slop in the system when tested at academic centers, imagine how bad things will become in the real world of busy and less specialized clinical practice.
“Something isn’t right . . . and when something isn’t right in a matter as serious as psychiatric diagnosis the professional duty is to fix it. Having shirked this responsibility, APA deserves to fail in the business enterprise that it has made of DSM-5. If ever there was a clear conflict of interest, this is it.”
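Dr Carroll’s point about blind dart-throwers can be checked with a quick simulation. The sketch below is hypothetical: it invents a base rate and two kinds of raters (fully blind, and half-informed) to show how small a kappa those raters produce; none of the parameters come from the actual field trials.

```python
# Simulation sketch: how close a "low" kappa is to blind guessing.
# All parameters (30% base rate, 50% rater accuracy) are invented.
import random

random.seed(0)

def kappa(pairs):
    """Cohen's kappa for paired binary ratings."""
    n = len(pairs)
    p_o = sum(a == b for a, b in pairs) / n           # observed agreement
    pa = sum(a for a, _ in pairs) / n                 # rater A's "yes" rate
    pb = sum(b for _, b in pairs) / n                 # rater B's "yes" rate
    p_e = pa * pb + (1 - pa) * (1 - pb)               # chance agreement
    return (p_o - p_e) / (1 - p_e)

n = 100_000
truth = [random.random() < 0.3 for _ in range(n)]     # 30% true prevalence

def rater(accuracy):
    # Sees the true label with the given probability; otherwise guesses
    # a label at the base rate, independently of the case.
    return [t if random.random() < accuracy else random.random() < 0.3
            for t in truth]

blind = kappa(list(zip(rater(0.0), rater(0.0))))      # two dart-throwers
weak = kappa(list(zip(rater(0.5), rater(0.5))))       # half-informed raters
print(f"blind raters:  kappa ~ {blind:.2f}")
print(f"half-informed: kappa ~ {weak:.2f}")
```

Two raters guessing at random come out near kappa 0, and even raters who correctly perceive half the cases land only around 0.25, roughly the level of the worst DSM-5 field trial results Dr Nardo cites.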
Thanks are due to Drs Nardo and Carroll. There can be no doubt that the DSM-5 Field Trials were a colossal waste of money, time, and effort. First off, they didn’t ask the most obvious and important question—What are the risks that DSM-5 will create millions of misidentified new ‘patients’ who would then be subjected to unnecessary treatment? Second, the results on the question it did ask (about diagnostic reliability) are so all over the map that they are completely uninterpretable. And to top it off, DSM-5 cancelled the quality control stage that might have cleaned up the mess.
It is almost certain that DSM-5 will be a dangerous contributor to our already existing problems of diagnostic inflation and inappropriate prescription of psychotropic drugs. The DSM-5 leadership is trying to put a brave face on its badly failed first stage of field testing and has offered no excuse or explanation for canceling its second and most crucial quality control stage. This field testing fiasco erases whatever was left of the credibility of DSM-5 and APA.
1. Kupfer DJ. Field trial results guide DSM recommendations. Huffington Post. November 7, 2012. http://www.huffingtonpost.com/david-j-kupfer-md/dsm-5_b_2083092.html. Accessed November 13, 2012.