Levels of Evidence
The term “evidence” has become about as controversial as the word “unconscious” had been in its Freudian heyday, or as the term “proletariat” was in another arena.
It means many things to many people, and for some, it elicits reverent awe—or reflexive aversion. This is because, like those other terms, it is linked to a movement—in this case, evidence-based medicine (EBM), which is currently quite influential, with both supporters and critics.
A clinical controversy
In the previous article in this series, “Why You Cannot Believe Your Eyes,” I described a study in which antidepressants were judged effective in the treatment of bipolar depression, based on observational data that indicated that stopping antidepressants would lead to depression relapse.2 I said that we should not take those results at face value because they do not pass the hurdle of bias, specifically confounding by indication.
A single article never stands alone, however; the context of the whole literature is relevant. A critic might object that other observational studies have also “replicated” that initial study.3 I put replicated in quotes, because observational studies can be expected to replicate other observational studies if they have the same sources of bias. Recall that bias is systematic error; therefore, the same mistakes will be made systematically.
My research group conducted the same study as those observational trials, except our study was randomized.4 We partially confirmed the earlier findings but with a much smaller amount of benefit (smaller effect size): there was mild, but not marked, depressive symptom benefit with antidepressant continuation; there was a small (not large) delay to depressive relapse with antidepressant continuation. And, notably, we found that there was worsening of depressive episodes in those with rapid-cycling bipolar disorder who continued antidepressant treatment. Which results should we believe—the observational data that show more benefits with antidepressants or the randomized data that show less (and some harm)? The answer is the latter, and this is not a matter of opinion but is based on an understanding of the concept of levels of evidence.
Origins of EBM
It may be worthwhile to note that the originators of the EBM movement in Canada (such as David Sackett5) toyed with different names for what they wanted to do; they initially thought to use “science-based medicine” but opted for “evidence-based medicine” instead. This is perhaps unfortunate, since “science” tends to engender respect, while “evidence” seems more vague.6 Hence, we often see proponents of EBM (mistakenly, in my view) saying things like: “that opinion is not evidence-based” or “those articles are not evidence-based.” The folly of this kind of language is evident if we substitute the term “science”; such statements then simply raise the question of what “science” means.
In my reading of EBM, the basic idea is that we need to understand what kinds of evidence we use, and we need to use the best: this is the concept of levels of evidence. EBM is not about an opposition between having evidence or not having evidence; since we always have some kind of evidence or another, it is about ranking different kinds of evidence.
With a somewhat ready assumption of cause and effect and, equally, a neglect of the laws of chance, the literature becomes filled with conflicting cries and claims, assertions and counterassertions.
—Austin Bradford Hill1(p4)
Specific levels of evidence
The EBM literature has various definitions for levels of evidence. The main EBM text uses letters (A through D).5 I prefer using 1 through 5, and I think the specific content of the levels should vary depending on the field of study. The basic idea is that randomized studies are higher levels of evidence than nonrandomized studies (because of bias, as discussed in the previous article), and that the lowest level of evidence consists of case reports, expert opinion, or the consensus of the opinion of clinicians or investigators.
Levels of evidence provide clinicians and researchers with a road map that allows consistent and justified comparison of different studies so as to adequately compare and contrast their findings. Various disciplines have applied the concept of levels of evidence in slightly different ways and, in psychiatry, no consensus definition exists. In my view, in mental health, there are 5 levels of evidence that best apply (Table), ranked from 1 (highest) to 5 (lowest).7
The key feature of levels of evidence to keep in mind is that each level has its own strengths and weaknesses; as a result, no single level is completely useful or useless. All other things being equal, however, rigor and probable scientific accuracy increase as one moves from level 5 to level 1.
Level 5 refers to a case report or a case series (a few case reports strung together), or an expert’s opinion, or the consensus of experts or clinicians’ or investigators’ opinions (such as in treatment algorithms), or the personal clinical experience of clinicians, or the words of wise men (eg, Sigmund Freud, Emil Kraepelin, Karl Marx, Adam Smith). All of this is the same lowest level of evidence. This does not mean that such evidence is wrong, nor does it mean that it is not evidence; it is a kind of evidence, just a weak kind. It could turn out that a case report is correct, and a randomized study wrong, but, in general, randomized studies are much more likely to be correct than case reports. We simply cannot know when a case report, or an expert opinion, or a saying of Freud or Marx, is right, and when it is wrong. Authority is not, as with Rome, the last word.