Critics have noted that meta-analysis, when misused, resembles statistical alchemy, taking the dross of individually negative studies and producing the gold of a positive pooled result.2 This happened with an influential 2004 publication on antidepressants in acute bipolar depression, and it is now being partly rectified by a recently published update of that study.3,4 In both cases, reasonable judgments cannot be made unless we understand how meta-analysis can be used, and abused.
Meta-analysis is an observational study of studies: one combines the results of different studies into a single summary measure, weighted for sample size and data variability. This gain in precision comes at a cost: a meta-analysis can be invalid if the studies differ greatly from one another, the problem of “heterogeneity” (sometimes called the “apples and oranges” problem). Heterogeneity reflects confounding bias, the central scientific problem that our observations are often false because other factors, of which we are unaware, influence the results observed. This is why clinicians should believe only half of what they see, and none of what they hear.5 Randomization minimizes confounding bias, since such other factors are randomly distributed across treatment groups and thus cancel each other out.
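The weighting scheme described above can be sketched as fixed-effect inverse-variance pooling, the most common way a summary measure is computed. The numbers below are hypothetical, not data from any study discussed here; they simply show how one large study can dominate the pooled estimate.

```python
# Sketch of fixed-effect inverse-variance pooling: each study's effect size
# is weighted by 1/SE^2, so larger, more precise studies count for more.
# Illustrative numbers only, not from any study cited in this article.
def pool_fixed_effect(effects, standard_errors):
    """Return the pooled effect size and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Two small negative studies and one large positive study:
effects = [-0.05, -0.10, 0.30]
ses = [0.25, 0.30, 0.08]
pooled, se = pool_fixed_effect(effects, ses)
# The large study dominates: the pooled result is positive
# despite two of the three studies being negative.
```

This is the mechanical sense in which "statistical alchemy" can occur: the weighted mean is driven by whichever studies carry the most weight, regardless of how heterogeneous they are.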
Because meta-analysis is an observational study of studies, it reintroduces confounding bias, and the benefits of randomization are lost. As with any clinical opinion or observational study, then, we cannot take meta-analyses at face value. The less heterogeneity there is, the more valid the meta-analysis. If the studies involve the same drug or drug class, in the same clinical setting, with the same research design, some judgments can be made. Otherwise, most judgments will be false.
The problem of heterogeneity also relates to the number of studies included. A meta-analysis of 2 or 3 studies is practically meaningless; it is like taking symptom scales in 2 or 3 patients and calculating means and standard deviations. Five studies is still borderline. Ten or more studies would be considered more valid; most good meta-analyses involve dozens. This is not a minor matter: it has been estimated mathematically that if one begins with a low prior probability (25%) that something is the case (eg, that antidepressants are effective in bipolar depression), a meta-analysis of a few small, heterogeneous studies would raise that probability (of antidepressant efficacy) only to about 41%.6 Hardly definitive.
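The arithmetic behind that 25%-to-41% estimate can be sketched in odds form. The prior (25%) and posterior (41%) come from the text; the Bayes factor of about 2.1 is a back-calculation from those two figures, not a value reported in reference 6.

```python
# Odds-form Bayesian update: prior probability -> prior odds,
# multiply by the Bayes factor, convert back to a probability.
def bayes_update(prior, bayes_factor):
    """Posterior probability from a prior probability and a Bayes factor."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)

# A Bayes factor near 2 is conventionally "weak" evidence; starting from a
# 25% prior it yields roughly a 41% posterior, as described in the text.
posterior = bayes_update(0.25, 2.1)  # roughly 0.41
```

The point of the sketch: a few small, heterogeneous trials supply only a weak evidential multiplier, so a skeptical prior barely moves, and certainly nowhere near certainty.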
In 2004, the American Journal of Psychiatry unfortunately published just such a study: a meta-analysis of 5 randomized studies of antidepressants in acute bipolar depression. In that report, heterogeneity was rampant. For instance, the only placebo-controlled study that found no evidence of acute antidepressant response was the only study in which all patients received baseline lithium.7 That study was excluded from the efficacy analysis on technical grounds. Two studies compared antidepressant alone with placebo alone, without any mood stabilizers; sample sizes in those studies were quite small. One large study drove the whole meta-analysis (N = 433, accounting for 59% of the review sample): an Eli Lilly–conducted study of olanzapine plus fluoxetine versus olanzapine plus placebo.8 In the meta-analysis, what was called “placebo” was actually olanzapine, whereas in most of the other studies patients literally got placebo (meaning an inert pill).
“Exercising the right of occasional suppression and slight modification, it is truly absurd to see how plastic a limited number of observations become, in the hands of men with preconceived ideas.”1(p267)
– Sir Francis Galton, 1863
Apples and oranges are too weak as metaphors; Mars versus Venus would be better. Of course antidepressants will “not cause” mania when most patients defined as receiving “placebo” are in fact receiving an antimanic neuroleptic. The claim of efficacy, drawn from 4 quite different studies, involves much probable confounding bias. Decent randomized studies are so mixed together that the results represent the ullage of what they once were. Such data manipulation may serve the demotic purposes of persuasion, but it does not serve the scientific purpose of approximating truth. Unfortunately, like an infection that spreads by citation, this prominently published study has had impact.
The new update essentially adds the largest and best-designed study to date (N = 332), the STEP-BD study of bipolar depression, in which bupropion or paroxetine was equivalent to placebo when added to standard mood stabilizers.9 This confirms the earlier, similar study excluded from the prior meta-analysis.7 The finding of “no benefit,” when added to the previously tenuous meta-analysis, moves the meta-analytic summary result closer to the null value; hence the “new” conclusion that antidepressants do not work for bipolar depression. Again there was “no” mania, which is not surprising, since the STEP-BD patients were all treated with antimanic medications. Also, the new meta-analysis does not emphasize the clear and repeated finding of inefficacy in a meta-analysis of maintenance treatments, a clinically important matter separate from whether there is any short-term benefit for severe acute depression.10
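The way a single large null trial pulls a pooled estimate toward the null can be illustrated with toy numbers (hypothetical values under inverse-variance weighting, not the actual STEP-BD or meta-analysis data):

```python
# Toy illustration: adding one large null study (small SE, hence large
# weight) drags the inverse-variance weighted mean toward zero.
# All numbers are hypothetical, not data from the trials discussed here.
def pooled_mean(effects, ses):
    weights = [1.0 / s ** 2 for s in ses]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

before = pooled_mean([0.30, 0.25], [0.10, 0.12])             # two positive studies
after = pooled_mean([0.30, 0.25, 0.00], [0.10, 0.12, 0.06])  # plus one large null study
# "after" sits much closer to zero than "before".
```

This is why one well-powered negative trial can overturn a "positive" pooled result built from smaller studies: its weight dominates the average.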
What should clinicians take away? They should not have believed the results of the first meta-analysis, for the reasons given above. The second one only clarifies the flaws of the first. The best sources are good, well-designed randomized studies, such as the STEP-BD study, which speak clearly: in general, antidepressants are not effective in acute bipolar depression for the average patient. As with any average, there will be exceptions, a minority in whom some utility will be seen. But this does not justify extensive use of antidepressants. Contrary to occasional claims otherwise, antidepressants are still widely prescribed in bipolar depression, in the US and throughout the world, and are passionately believed in by many experts and clinicians.11,12 More studies are always welcome. But for now, it is hard to see how any conclusion can be reached other than that the best available scientific evidence contradicts these claims and practices.