Are Studies Misguiding the Choice of First-Line Treatments?

March 1, 2002

A recently published meta-analysis questions whether efficacy data garnered from clinical trials are relevant to everyday clinical practice. The authors ask whether enough patients are being included, whether they are being followed long enough afterward and whether exclusion criteria are too broad.

Psychologist Drew Westen, Ph.D., director of the adolescent and adult personality program of the Center for Anxiety and Related Disorders at Boston University, and his colleague, Kate Morrison, Ph.D., have written an ambitious, multidimensional meta-analysis of 34 studies on empirically validated psychotherapies published between 1990 and 1998 in top peer-reviewed journals. The study has raised controversy in its reassessment of previously published data and with its suggestion that these data are not always strong enough to warrant the status of these therapies as the treatment of choice for their respective diagnoses.

Westen and Morrison (2001) looked at 17 trials of panic disorder, five trials of generalized anxiety disorder (GAD) and 12 trials of depression, and included a total of 2,414 subjects. Another 23 trials were considered but excluded because they "did not meet minimal criteria for randomized controlled trials."

In general, Westen and Morrison argued that researchers need to report a broader range of outcome indices to provide a "comprehensive, multi-dimensional portrait of treatment effects." They found a serious dearth of long-term follow-up that could attest to the lasting benefits of treatment; their search identified only nine experimental studies with follow-up at 12 to 18 months and only four studies with follow-up at 24 or more months for all three disorders combined. Exclusion rates of between 64% and 68% of subjects screened, lack of standardization of exclusion criteria and failure to report exclusion criteria call into question the studies' generalizability to real-world patients.

The meta-analysis revealed the most impressive results for the treatment of panic disorder, where a substantial proportion of patients who completed treatment (63%) were shown to improve and remain improved. For depression, 54% were deemed improved, but the existing data did not support long-term effects. For GAD, the improvement rate was less -- 52% -- and there were virtually no published long-term follow-up data.

In one of four critical commentaries published alongside the report, Peter E. Nathan, Ph.D. (2001), called the meta-analysis a "tour de force that deserves serious consideration." In an interview with Psychiatric Times (PT), David H. Barlow, Ph.D., author of several of the anxiety and panic studies Westen and Morrison examined, summed up the findings this way: "What [Westen and Morrison] basically say is that we need to do three things: 1) look at long-term outcomes, at least two years down the road; 2) look at a range of indicators of measures of outcome, not just one narrow one; and 3) broaden our inclusion criteria." But Barlow also thinks it is important to note that the suggested criteria "go way beyond the [U.S. Food and Drug Administration's] criteria for proof in drug studies. If these same criteria were applied to drug [trials], there would be no drugs considered effective."

In another commentary, Deane E. Aikins, Ph.D., and colleagues (2001) advised caution in interpreting the results of any meta-analysis, a procedure that usually collapses results from a number of trials to re-calculate efficacy (usually in the form of effect size) based on the total. By pooling a large number of subjects, a meta-analysis gains power and may provide the bigger picture of a treatment's overall efficacy. But a meta-analysis may also lump together treatments whose "mechanisms of change" are not comparable.

In the case of Westen and Morrison's GAD data, Aikins et al. found that the "analysis of effect size was based on four CBT [cognitive-behavioral therapy] conditions, two relaxation conditions, a combination of cognitive therapy and relaxation, behavior therapy, brief supportive expressive psychotherapy, analytic psychotherapy and anxiety-management training." While these all may qualify as empirically validated therapies, Aikins et al. do not consider them comparable. They performed their own mini meta-analysis of the four cognitive treatments and found a more impressive pre-to-post effect size of 1.18, compared with 0.15 for the two wait-list conditions (Aikins et al., 2001).

One of Westen and Morrison's major points is that researchers need to report more of their data if clinicians are to draw clinically meaningful conclusions from the research literature. They advocate reporting, in addition to effect size, such indices as the percent improved or recovered; the mean level of symptomatology at the end of treatment; the percent who remain improved at the end of treatment; the percent included and excluded of those initially screened and the reasons for exclusion; and the percent seeking additional treatment.

Effect size alone "does not yield information on variability of response among subjects," wrote Westen and Morrison (2001). A study could have an impressive effect size if a few patients recovered completely even if the majority were only modestly affected. Reporting the percent of patients improved provides a more comprehensive sense of the treatment's overall efficacy, but "about half to two-thirds of the time this number is not reported in the initial study," Westen explained to PT, "and in meta-analyses, it's never reported."
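The point about variability can be illustrated with a minimal sketch. The numbers below are invented for illustration and do not come from Westen and Morrison's data; the baseline standard deviation, the improvement threshold and both function names are assumptions. The sketch computes a simple standardized mean change (mean symptom reduction divided by an assumed baseline SD) alongside the percent of patients improved for two hypothetical groups.

```python
from statistics import mean

def standardized_change(changes, baseline_sd):
    """Mean symptom reduction divided by an assumed baseline SD."""
    return mean(changes) / baseline_sd

def percent_improved(changes, threshold):
    """Percent of patients whose reduction meets a hypothetical threshold."""
    return 100 * sum(c >= threshold for c in changes) / len(changes)

# Hypothetical values, chosen only to make the contrast visible.
BASELINE_SD = 5.0   # assumed spread of pre-treatment scores
THRESHOLD = 5       # assumed cutoff for counting a patient as "improved"

# Symptom-score reductions (pre minus post) for ten patients per group.
few_large = [20, 20, 20, 2, 2, 2, 2, 2, 2, 2]   # three recover, seven barely change
uniform = [7, 7, 7, 7, 7, 7, 8, 8, 8, 8]        # everyone improves moderately

for label, group in (("few large responders", few_large),
                     ("uniform responders", uniform)):
    es = standardized_change(group, BASELINE_SD)
    pct = percent_improved(group, THRESHOLD)
    print(f"{label}: effect size {es:.2f}, improved {pct:.0f}%")
```

Both hypothetical groups have the same mean reduction and therefore the same effect size, yet only 30% of the first group clears the threshold while all of the second group does. Effect size alone would not distinguish them.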

The mean level of symptomatology at the end of treatment also provides important information. "If you're going to describe a therapy as an empirically supported treatment for depression," explained Westen, "you ought to specify if the treatment alleviates depression entirely or only decreases it, a significant distinction." In their meta-analysis, depressed patients post-treatment averaged 8.68 (SD=6.49) on the Hamilton Rating Scale for Depression (HAM-D) and 10.98 (SD=8.60) on the Beck Depression Inventory (BDI), which are arguably still clinically significant levels of depression (Westen and Morrison, 2001).

If these patients were being treated in the community, Westen and Morrison wrote, their clinicians would continue to treat them. For GAD, the average patient continued to score 11.03 (SD=6.18) on the HAM-D and 47.45 (SD=9.33) on the State-Trait Anxiety Inventory-Trait Version (STAI-T). Westen and Morrison concluded, "These findings, relative to published norms [where available], suggest that the average patient receives substantial benefit but continues, even at termination, to have mild symptoms of the disorder for which he or she was treated."

By far, Westen and Morrison's most provocative proposal was their notion of calculating outcomes on the basis of an "effective efficacy" quotient. Ordinarily, when calculating the success of a treatment, the denominator, or n, in the equation "the number improved/n" represents either those who completed treatment or, more conservatively, the total number of those who started the therapy (the so-called intent-to-treat group). The effective efficacy quotient would take as its denominator all of those screened for a trial, even if they were excluded from the study. According to Westen and Morrison, this number may more accurately reflect the treatment's efficacy in the real world since "clinicians in everyday practice do not have the luxury of screening out patients who they have reason to believe will not respond."

The notion of effective efficacy generated shock and even outrage among the reviewers. Commentary co-author Robert J. DeRubeis, Ph.D., told PT, "There are some good ideas in this paper, but this is not one of them. [They are] the first ever to suggest that you should take the excluded patients and treat them as if they were not helped at all by the therapy. That's as far-fetched as the notion that all of them would have gotten better."

Stewart Agras, M.D., professor of psychiatry at Stanford University, concurred that there is a failure of logic. He explained to PT, "You don't know how many of them would get better, especially since one of the main reasons for excluding [subjects] is that they are not sick enough."

While Westen and Morrison may have been playing devil's advocate with their effective efficacy quotient, they contend that the large number of patients excluded from trials casts the generalizability of findings into doubt. Westen explained, "If you get rid of 70% of the patients who walk in the door because they have comorbid conditions or don't meet your experimental criteria in some other way, and then you get rid of the people who didn't complete the treatment even though they passed a rigorous screening procedure, then what you're really saying is that 50% were treated successfully of the 80% who completed the treatment of the 30% who were accepted into the study in the first place."
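The arithmetic behind Westen's illustration can be sketched in a few lines. The percentages (30% accepted, 80% completing, 50% of completers treated successfully) are his illustrative figures; the count of 1,000 patients screened and the function name are assumptions made only to give whole numbers. Note that the screened-patient denominator treats excluded patients and dropouts as not improved, which is the assumption the commentators dispute.

```python
def response_rates(screened, accepted, completed, improved):
    """Improvement rate under three different denominators."""
    return {
        "completers": improved / completed,          # those who finished treatment
        "intent_to_treat": improved / accepted,      # all who started treatment
        "effective_efficacy": improved / screened,   # all who were screened
    }

# Hypothetical cohort built from Westen's illustrative percentages.
screened = 1000    # assumed number of patients screened
accepted = 300     # 30% accepted into the study
completed = 240    # 80% of those accepted complete treatment
improved = 120     # 50% of completers treated successfully

rates = response_rates(screened, accepted, completed, improved)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

With these figures the same 120 improved patients yield 50% of completers, 40% of the intent-to-treat group and 12% of everyone screened (0.50 x 0.80 x 0.30 = 0.12).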

In Westen and Morrison's meta-analysis, 32% of patients screened were included in studies of depression, 36% in panic trials and 35% in GAD studies. In nearly all the studies, patients were excluded for psychosis, bipolar disorder or other organic disorders. Substance abuse and suicidality also constituted common reasons for exclusion. According to Westen, the latter two criteria mean that some of the most difficult patients to treat -- those with borderline personality disorder -- will be excluded from efficacy trials.

DeRubeis thinks that Westen and Morrison are making too much of the exclusion issue, particularly in terms of the comorbidity question. "Many of those excluded for comorbidity are excluded because their comorbid condition is their primary condition," he said. "Those patients might be eligible for a study of their primary condition from which their secondary comorbidity would not necessarily exclude them."

Along with his colleague Shannon Wiltsey Stirman, DeRubeis has been analyzing patient charts from a consortium of outpatient clinics to determine how many of these real-world patients fit the inclusion criteria of already-conducted clinical trials. "What we have found so far," Stirman told PT, after analyzing over 125 cases, "is that people are excluded because their disorder isn't severe enough or because they have a disorder like dysthymia that hasn't been studied in a randomized controlled trial." According to Stirman and colleagues' preliminary analysis (2001, as cited in DeRubeis and Stirman, 2001), comorbidity has not been the major reason why a patient would not qualify for inclusion.

Barlow also differed with Westen and Morrison on the exclusion question, claiming that they "have gone too far and missed the point." He explained to PT, "If someone with panic disorder came in and the principal problem was that the patient was acutely suicidal, the clinician is not going to treat the panic disorder, they're going to deal with the acute suicidality. This is not a question of excluding patients from research, this is a question of clinical common sense."

Agras agreed, arguing for perspective: "You've got to think about the progression of research in any therapeutic area." Small studies with restrictive inclusion criteria represent the first step in establishing that a treatment is efficacious. "You want to know first if a treatment works in a clearly defined group of patients without much comorbid psychopathology so you don't have a lot of other variables getting in your way," he added. With efficacy established in smaller trials with narrowly defined samples, trials of empirically validated therapies are now expanding to more inclusive samples, and at least some are being done on real-world clinic populations.

Aikins also sees CBT studies expanding their inclusion criteria. "It's time now to start doing scientific analysis on comorbid psychopathology," he told PT. As an example, he cited a current study by his colleague Michelle G. Craske, Ph.D., of panic disorder in patients with an additional diagnosis. "She's asking if the primary treatment affects the secondary diagnosis and what happens with additional psychotherapy directed to the second disorder."

Underlying Westen and Morrison's meta-analysis is the suggestion that researchers and policy-makers have moved too quickly to deem therapies successful, or superior to other, untested forms of treatment, when patients still have clinical or subclinical symptoms at the end of the study and when long-term prognosis is still an open question.

"When treatments produce relatively positive outcomes they are put on a list of empirically supported therapies and people go then very readily to the assumption that those therapies are better than all the other therapies, and we just don't know because nobody's ever tested those other therapies," Westen said.

Coming from the field of substance abuse where complete abstinence over a prolonged period of time is often an unrealistic goal, Nathan told PT, "I don't think it's necessarily fair that treatment should only be judged effective if all the symptoms disappear. If the treatment results in an improved quality of life with a substantial reduction in symptoms, that's success." Westen countered, "The data argue for humility."

If therapy-outcomes researchers have cast their findings in the best possible light, they may have been motivated by the contentious and politicized environment in which they have had to conduct their research. "People really wanted to dispel the myth that the most effective treatment was always medication," Westen said.

On another front, proponents of cognitive-behavioral therapies had "disdain for long-term, especially psychodynamic, treatment that had never been tested in the laboratory," according to Westen. The catch-22 of testing those therapies now is that their methodologies do not fit the short-term, manualized, controlled efficacy trials that have developed as the gold standard for assessing treatments -- whether pharmacological or psychological.

"The requisites of doing good science from a controlled clinical trial point of view -- brief, manualized treatments that are as close to identical across subjects as possible -- has made it virtually impossible for anything but a small range of treatments to be tested," explained Westen.

Medication trials are now coming under Westen and Morrison's scrutiny. How do they compare with the empirically validated therapies? "The data look about the same or worse for most classes of medications for most disorders," said Westen. "The reporting practices are equally problematic, and there is very little research that follows patients up at clinically meaningful outcome intervals of a year or more."




Aikins DE, Hazlett-Stevens H, Craske MG (2001), Issues of measurement and mechanism in meta-analyses: comment on Westen and Morrison (2001). J Consult Clin Psychol 69(6):904-907 [comment].


DeRubeis RJ, Stirman SW (2001), Determining the pertinence of psychotherapy outcome research findings for clinical practice: comment on Westen and Morrison (2001). J Consult Clin Psychol 69(6):908-909 [comment].


Nathan PE (2001), Deny nothing, doubt everything: a comment on Westen and Morrison (2001). J Consult Clin Psychol 69(6):900-903 [comment].


Westen D, Morrison K (2001), A multidimensional meta-analysis of treatments for depression, panic, and generalized anxiety disorder: an empirical examination of the status of empirically supported therapies. J Consult Clin Psychol 69(6):875-899 [see comments pp900-913].