Judging the validity of any study involves the sequential assessment of confounding bias, followed by chance, followed by causation.2 A study needs to pass these 3 hurdles before you can consider accepting its results. Once we accept that no fact or study result should be taken at face value (because no facts can be observed purely, but rather all are interpreted), we can then turn to statistics to see what kinds of methods need to be used to analyze the facts. These 3 steps are widely accepted and form the core of statistics and epidemiology. When clinicians understand these 3 concepts, they will be better able to judge whether what they observe is valid.
By bias, we mean systematic error (as opposed to the random error of chance). Systematic error means that one makes the same mistake over and over again because of some inherent problem with how the observations are made. Confounding is a kind of bias that has to do with factors, of which we are unaware, that influence our observed results.3 This concept is best visualized in the Figure.
As seen in the Figure, the confounding factor is associated with the exposure (or what we think is the cause) and leads to the result. The real cause is the confounding factor; the apparent cause, which we observe, is just along for the ride. The example of caffeine, cigarettes, and cancer was given in the previous article on statistics, published in the March 2009 issue of Psychiatric Times and posted on www.psychiatrictimes.com. Coffee is associated with cancer, but it does not cause cancer; the association arises because coffee drinkers are more likely to smoke cigarettes, the actual cause of cancer.
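The coffee example can be worked through with simple arithmetic. The sketch below uses made-up probabilities (assumptions chosen only for illustration, not data from any study) in which smoking both raises the risk of cancer and makes coffee drinking more likely, while coffee itself has no effect at all. The function name and constants are hypothetical.

```python
# Hypothetical probabilities, chosen only to illustrate confounding.
P_SMOKER = 0.30
P_COFFEE_GIVEN_SMOKING = {True: 0.80, False: 0.40}  # smokers drink more coffee
P_CANCER_GIVEN_SMOKING = {True: 0.15, False: 0.01}  # coffee plays no role here


def cancer_risk(coffee, smoker=None):
    """Cancer risk among coffee drinkers (coffee=True) or nondrinkers.

    If smoker is None, smokers and nonsmokers are pooled (the crude view).
    Otherwise the risk is computed within one smoking stratum only.
    """
    strata = [True, False] if smoker is None else [smoker]
    group = 0.0  # share of the population in this coffee group
    cases = 0.0  # share of the population in this group with cancer
    for s in strata:
        p_s = P_SMOKER if s else 1 - P_SMOKER
        p_c = P_COFFEE_GIVEN_SMOKING[s] if coffee else 1 - P_COFFEE_GIVEN_SMOKING[s]
        group += p_s * p_c
        cases += p_s * p_c * P_CANCER_GIVEN_SMOKING[s]
    return cases / group


# Crude comparison: ignores smoking entirely.
crude_rr = cancer_risk(True) / cancer_risk(False)

# Stratified comparison: coffee drinkers vs nondrinkers within each smoking group.
rr_smokers = cancer_risk(True, smoker=True) / cancer_risk(False, smoker=True)
rr_nonsmokers = cancer_risk(True, smoker=False) / cancer_risk(False, smoker=False)

print(f"crude risk ratio:          {crude_rr:.2f}")
print(f"risk ratio in smokers:     {rr_smokers:.2f}")
print(f"risk ratio in nonsmokers:  {rr_nonsmokers:.2f}")
```

Under these assumed numbers, coffee drinkers appear to have about 2.7 times the cancer risk of nondrinkers when smoking is ignored, yet the risk ratio is 1 among smokers and 1 among nonsmokers. The whole apparent effect of coffee comes from the confounder.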
Confounding by indication. As a clinician, you are trained to be a nonrandomizing prescriber. What this means is that you are taught, through years of supervision and more years of clinical experience, to tailor your treatment decisions to the individual patient. You do not treat patients randomly. You do not say to patient A, take drug X; to patient B, take drug Y; to patient C, take drug X; and to patient D, take drug Y, without thinking further about why each patient should receive one drug and not the other. However, by practicing nonrandomly, you automatically bias all your experience. You think your patients are doing well because of your treatments, whereas they may be doing well because you are tailoring your treatments to those who would do well with them. In other words, it often is not the treatment effects that you are observing, but the treatment effects in specially chosen populations. If you then generalize from those specific patients to the wider population of patients, you will be mistaken.
A few years ago, in an article that was widely disseminated and discussed, the investigators reported that if an antidepressant improved the acute major depressive episode in bipolar disorder (along with mood stabilizers), then its continuation led to fewer depressive relapses in 1 year than its discontinuation.4 Therefore, many readers of the article surmised that if a bipolar patient responds to an antidepressant, he or she should continue with that antidepressant. This seems simple, but only if you do not understand statistics. The first hurdle to validity is bias: In any such study, one has to ask oneself the question, is it randomized or not? If not, then it is prone to confounding bias.
Believe nothing you hear, and only one half that you see. —Edgar Allan Poe1
The study was observational, not randomized, and the results were presented without any analysis of potential confounding factors, such as confounding by indication: Did the researchers stop antidepressants in those patients who were more likely to relapse (based either on the patient’s past history or on other clinical factors that clinicians appreciated)? Did they continue antidepressants in those who were more likely to do well? For the same reason that coffee does not cause cancer, we cannot conclude, without further inquiry, that the results show benefit with antidepressants.
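The worry about confounding by indication can be illustrated with a small simulation. This is only a sketch under stated assumptions: every probability below is invented, the drug is given no effect whatsoever, and the hypothetical clinician simply tends to continue the antidepressant in patients who are at low risk of relapse anyway. None of this is taken from the study discussed above.

```python
import random


def relapse_rates(assign, n=20000, seed=0):
    """Simulate n patients; return relapse rate by group (True = continued)."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # continued -> [relapses, patients]
    for _ in range(n):
        high_risk = rng.random() < 0.5          # assumed: half are high risk
        continued = assign(high_risk, rng)      # who stays on the drug
        p_relapse = 0.7 if high_risk else 0.3   # assumed: drug has NO effect
        relapsed = rng.random() < p_relapse
        counts[continued][0] += relapsed
        counts[continued][1] += 1
    return {group: r / total for group, (r, total) in counts.items()}


def clinician(high_risk, rng):
    """Nonrandom prescribing: continue mostly in patients likely to do well."""
    return rng.random() < (0.2 if high_risk else 0.8)


def coin_flip(high_risk, rng):
    """Randomized assignment: risk status plays no part in the decision."""
    return rng.random() < 0.5


observed = relapse_rates(clinician)
randomized = relapse_rates(coin_flip)

print("observational: continued %.2f, stopped %.2f" % (observed[True], observed[False]))
print("randomized:    continued %.2f, stopped %.2f" % (randomized[True], randomized[False]))
```

With clinician-driven assignment, the continuation group relapses far less often (in expectation roughly 38% vs 62% under these assumptions), even though the simulated drug does nothing. With coin-flip assignment, the two groups relapse at nearly the same rate.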
This is the lesson of confounding bias: We cannot believe our eyes. Or perhaps more accurately, we cannot be sure when our observations are right and when they are wrong. They can be either one way or the other but are frequently wrong because of the high rate of confounding factors in the world of medical care.
If confounding bias is removed by randomization, or reduced by statistical analyses (eg, regression), then we can assess whether the results happened by chance, as random error (not systematic error, as in bias). By convention, a relationship is considered unlikely to be a chance finding if, using mathematical equations designed to measure the chance occurrence of associations, a result like it would be expected to occur 5% of the time, or less frequently, due to chance alone. This is the famous P value.3
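To make the 5% convention concrete, here is a minimal sketch of one way such a calculation can be done: a one-sided Fisher exact test on a small two-arm trial. The choice of test, the function name, and the trial numbers (15 of 20 improved on drug, 6 of 20 on placebo) are all assumptions for illustration; they do not come from any study cited here.

```python
from math import comb


def fisher_one_sided(improved_a, n_a, improved_b, n_b):
    """Probability that arm A shows at least `improved_a` improvers
    if the two arms truly do not differ (one-sided Fisher exact test)."""
    total_improved = improved_a + improved_b
    total = n_a + n_b
    upper = min(n_a, total_improved)  # most improvers arm A could possibly have
    # Count the ways to place k of the improvers in arm A, for every k at
    # least as extreme as the one observed.
    tail = sum(
        comb(total_improved, k) * comb(total - total_improved, n_a - k)
        for k in range(improved_a, upper + 1)
    )
    return tail / comb(total, n_a)


# Hypothetical trial: 15/20 improved on drug, 6/20 improved on placebo.
p_value = fisher_one_sided(15, 20, 6, 20)
print(f"one-sided P = {p_value:.4f}")
print("below the conventional 0.05 threshold:", p_value < 0.05)
```

For these invented numbers the one-sided P value is about 0.005, so a difference this large would arise by chance well under 5% of the time. Note that the calculation says nothing about bias; it assumes the two arms are otherwise comparable.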
The application of those mathematical equations is a simple matter, and thus the assessment of chance is not complex at all. It is much simpler than assessing bias, but it is correspondingly less important. Yet many clinicians equate statistics with P values and assessing chance. Often, bias is insufficiently examined and chance is exaggerated—during the course of some articles, 20 or 50 P values are thrust on the reader. The P value is abused until it becomes useless or, worse, misleading. Usually, the problem with chance is that we focus too much on it, and we misinterpret our statistics. The problem with bias is that we focus too little on it, and we don’t bother with statistics to assess it.
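A short calculation shows why a pile of P values can mislead. The sketch below assumes that every test is independent and that no real effect exists in any of them (both are simplifying assumptions), and asks how often at least one test would still cross the 0.05 threshold by chance. The function name is hypothetical.

```python
ALPHA = 0.05  # conventional significance threshold


def chance_of_false_alarm(n_tests, alpha=ALPHA):
    """Probability that at least one of n independent tests is 'significant'
    when no real effect exists in any of them."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 20, 50):
    print(f"{n:2d} tests: {chance_of_false_alarm(n):.0%} chance of a spurious finding")
```

Under these assumptions, 20 tests give roughly a 64% chance of at least one spurious "significant" result, and 50 tests give roughly 92%.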
Assuming that X is associated with Y and there is no bias or chance error, we still need to show that X causes Y (not just that fluoxetine is associated with less depression, but that it causes less depression). How can we do this? A P value will not suffice.
This is a problem that has been central to the field of clinical epidemiology for decades. The classic handling of it has been ascribed to the work of the great medical epidemiologist A. Bradford Hill,5 who was central to the research on tobacco and lung cancer. A major problem with that research was that randomized studies could not be done, either practically or ethically. The research by necessity had to be observational and was liable to bias. Although Hill and others devised methods to assess bias, they could never completely eliminate it. The cigarette companies, of course, exploited this residual uncertainty to magnify doubt and delay the inevitable day when they would be forced to back away from their dangerous business.
Despite all this observational research, they argued, there was still no proof that cigarettes cause lung cancer. And they were right. So Hill set about trying to clarify how one might prove that one thing causes another in medical research with human beings, especially when randomization is infeasible. He pointed out that causation is not derived from any one source, but that it is inferred from an accumulation of evidence from multiple sources. It is not enough to say a study is valid; one also needs to know if the results have been replicated in multiple studies, if they are supported by biological studies in animals on mechanisms of effect, and if they follow certain patterns consistent with causation (like a dose-response relationship). We might especially insist on replication. No single study should stand on its own, no matter how well designed and carried out. Even after a study has crossed the barriers of bias and chance, we should ask that it be replicated and the results confirmed in other samples and other settings.