P values can be deceiving. To determine whether a clinical trial result is clinically important, we need to look at the effect size.
Leslie Citrome, MD, MPH
Clinicians often assume that if a P value is below 0.05, the result must be important. Many also mistakenly believe that a P value of less than 0.0001 denotes an even more impressive difference. Effect size, and how it can be used to appraise clinical trial results, is generally poorly understood. Simple-to-calculate measures of effect size, such as number needed to treat (NNT) and number needed to harm (NNH), can help psychiatrists place study outcomes in a more relevant clinical context.
What is an effect size?
In medicine, a treatment effect size denotes the difference in outcome between two interventions. It can be expressed as a point difference on a rating scale or as a difference in the percentage of people who meet the threshold for response. For example, a 4-point difference on the Montgomery-Asberg Depression Rating Scale (MADRS) between an experimental antidepressant and placebo at the end of a 6-week study would be the effect size. This difference can be “standardized” into standard deviation units, for example by dividing it by the pooled standard deviation of the two groups. However, such standardized effect sizes are difficult to understand and place into clinical context.
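One widely used standardized measure is Cohen's d, which divides the raw difference in means by the pooled standard deviation; the article itself does not name a specific formula. The Python sketch below is illustrative only: the function names, the mean improvements of 12 and 8 points, the standard deviation of 8, and the arm size of 100 are all hypothetical.

```python
import math


def pooled_sd(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))


def cohens_d(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Standardized mean difference: raw difference divided by the pooled SD."""
    return (mean_a - mean_b) / pooled_sd(sd_a, n_a, sd_b, n_b)


# Hypothetical trial: mean MADRS improvement of 12 points on drug vs 8 on placebo
# (a 4-point raw difference), with an assumed SD of 8 in each arm of 100 patients.
d = cohens_d(12.0, 8.0, 100, 8.0, 8.0, 100)
print(f"Cohen's d = {d:.2f}")  # Cohen's d = 0.50
```

Under these assumed numbers, the 4-point MADRS difference becomes d = 0.5, a figure that is comparable across rating scales but harder to translate into what a clinician will observe.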
NNT is an effect size that is easy to explain. It describes the number of persons who need to be treated with one intervention instead of another before we would expect to encounter one additional instance of an outcome of interest, such as “response.” If we define response as at least a 50% improvement on a rating scale, we can take the percentage of people who meet the criteria for response with each intervention and calculate the NNT. The formula is simple:
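NNT = 1/([proportion of responders receiving Intervention A] – [proportion of responders receiving Intervention B])
Each proportion is the percentage written as a decimal (for example, 50% = 0.50); if whole percentages are used instead, the numerator becomes 100 rather than 1.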
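The NNT formula translates directly into code. Below is a minimal Python sketch, assuming response rates are supplied as decimal proportions; the function name and the 50% and 30% response rates are hypothetical. It rounds up to the next whole patient, a common reporting convention that the article does not itself specify.

```python
import math


def number_needed_to_treat(p_a: float, p_b: float) -> int:
    """NNT for Intervention A vs Intervention B (hypothetical helper).

    p_a, p_b: proportion of responders in each arm, as decimals (0.50 = 50%).
    Returns the NNT rounded up to the next whole patient.
    """
    difference = p_a - p_b
    if difference <= 0:
        raise ValueError("Intervention A must have the higher response rate")
    raw = 1 / difference
    # Round to 6 decimals first so floating-point noise (e.g. 5.000000000000001)
    # is not rounded up to an extra patient.
    return math.ceil(round(raw, 6))


# Hypothetical response rates: 50% on the antidepressant, 30% on placebo.
print(number_needed_to_treat(0.50, 0.30))  # 5
```

With these rates the difference is 0.20, so the NNT is 1/0.20 = 5. A rate difference of 0.15 would give 6.67, which this sketch reports as 7.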
If the NNT is low, only a few patients need to receive Intervention A instead of Intervention B before we encounter one additional responder. If the NNT is high, many patients must be treated to gain one additional responder, meaning the difference between the interventions on that outcome is small.
How is an effect size different from a P value?
A P value tells us how likely it would be to observe a difference at least as large as the one found if there were truly no difference between the interventions. The lower the P value, the less likely it is that the result is due to chance alone, and the more confident we can be that the difference is real. However, the P value tells us nothing about effect size. A result may be statistically significant yet clinically irrelevant if the effect size is small: the “truth” may be unimportant.
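To see why a very small P value does not imply a large effect, the Python sketch below applies a standard pooled two-proportion z-test (one of several tests that could be used; the article does not specify one) to a hypothetical, very large trial. The function name, response rates, and sample sizes are illustrative assumptions, not data from any study.

```python
import math


def two_proportion_p_value(p_a: float, n_a: int, p_b: float, n_b: int) -> float:
    """Two-sided P value from a pooled two-proportion z-test (hypothetical helper)."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical mega-trial: 32% vs 30% response, 20,000 patients per arm.
p = two_proportion_p_value(0.32, 20_000, 0.30, 20_000)
# NNT from the same two response rates: 1 / (difference in proportions).
nnt = 1 / (0.32 - 0.30)
print(f"P = {p:.1e}, NNT = {nnt:.0f}")
```

The P value falls below 0.0001, yet the NNT is 50: with a large enough sample, even a small difference becomes highly statistically significant.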
What is an important NNT?
An NNT for an intervention vs. placebo that is less than 10 generally means that the intervention is worth considering.
What about adverse outcomes?
Number needed to harm (NNH) can be helpful when describing tolerability. We can calculate NNH in the same way as NNT, except here we are describing outcomes we would like to avoid. Examples of these adverse outcomes are akathisia, sedation, and weight gain. An overall indicator of tolerability is the percentage of people who stop an intervention because of an adverse event.
In medication trials, we know the percentage of subjects who had to stop the experimental drug or placebo because of an adverse event. Using the same formula, we can gauge the overall tolerability of a medication vs. placebo:
NNH = 1/([proportion of people who discontinued because of an adverse event while receiving Intervention A] – [proportion of people who discontinued because of an adverse event while receiving Intervention B])
Here the proportions are written as decimals (for example, 9% = 0.09).
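The NNH uses the same arithmetic as the NNT, applied to an outcome we want to avoid. In the minimal Python sketch below, the function name and the 9% and 5% discontinuation rates are hypothetical, and the result is left unrounded.

```python
def number_needed_to_harm(p_a: float, p_b: float) -> float:
    """NNH for Intervention A vs Intervention B (hypothetical helper).

    p_a, p_b: proportion of patients in each arm who discontinued because of
    an adverse event, as decimals (0.09 = 9%). The result is not rounded.
    """
    difference = p_a - p_b
    if difference <= 0:
        raise ValueError("Intervention A must have the higher discontinuation rate")
    return 1 / difference


# Hypothetical discontinuation rates: 9% on the drug, 5% on placebo.
nnh = number_needed_to_harm(0.09, 0.05)
print(f"NNH = {nnh:.0f}")  # NNH = 25
```

A 4-percentage-point excess in discontinuations (0.04) corresponds to an NNH of 25.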
What is an important NNH?
An NNH for an intervention vs. placebo that is greater than 10 generally means the intervention is reasonably well tolerated on the outcome being measured.
What is the likelihood to be helped or harmed?
Likelihood to be Helped or Harmed (LHH) is the ratio of NNH to NNT and answers the question “How much more often will I encounter the benefit compared with the harm?” For example, if response to a medication for depression yields an NNT vs. placebo of 5, we can say that 5 patients would need to receive this antidepressant instead of placebo before we would expect to encounter one additional responder. If discontinuation because of an adverse event yields an NNH vs. placebo of 25, we would conclude that 25 patients would need to receive this antidepressant instead of placebo before we would expect to encounter one additional person who had to stop because of an adverse event.
Because the NNH is larger than the NNT, we would expect to encounter response more often than discontinuation because of an adverse event. How much more often? 25/5 = 5 times more often. This means I can tell a patient, “Based on the clinical trials, you are 5 times more likely to be helped (to be a responder) than harmed (to stop the medication because of an adverse event).”
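The LHH arithmetic, together with the NNT and NNH rules of thumb given earlier, fits in a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and the inputs are the NNT of 5 and NNH of 25 from the example above.

```python
def likelihood_helped_or_harmed(nnt: float, nnh: float) -> float:
    """LHH = NNH / NNT: how many times more likely benefit is than harm."""
    return nnh / nnt


def describe(nnt: float, nnh: float) -> str:
    """Summarize LHH and the rule-of-thumb checks (hypothetical helper)."""
    lhh = likelihood_helped_or_harmed(nnt, nnh)
    # Rules of thumb from the article (vs placebo): NNT < 10 is worth considering;
    # NNH > 10 is reasonably well tolerated on the outcome measured.
    worth_considering = nnt < 10
    well_tolerated = nnh > 10
    return (f"LHH = {lhh:g}; NNT < 10: {worth_considering}; "
            f"NNH > 10: {well_tolerated}")


# Worked example from the text: NNT 5 for response, NNH 25 for
# discontinuation because of an adverse event.
print(describe(5, 25))  # LHH = 5; NNT < 10: True; NNH > 10: True
```

An LHH greater than 1 means the benefit is expected more often than the harm; values below 1 would mean the reverse.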
With the metrics of NNT, NNH, and LHH, we can put clinical trial results into clinical perspective by using these patient-centric concepts.
Dr Citrome is Clinical Professor of Psychiatry and Behavioral Sciences at New York Medical College, Valhalla, NY. He is a speaker at the 2019 Psych Congress in San Diego, CA, in a presentation titled “When does a difference make a difference? Everything you need to know about effect sizes, but were afraid to ask.”
The author reports that he is or has been a consultant of Acadia, Alkermes, Allergan, Avanir, BioXcel, Eisai, Impel, Indivior, Intra-Cellular Therapies, Janssen, Lundbeck, Luye, Merck, Neurocrine, Noven, Osmotica, Otsuka, Pfizer, Shire, Sunovion, Takeda, Teva, Vanda; a speaker at Acadia, Alkermes, Allergan, Janssen, Lundbeck, Merck, Neurocrine, Otsuka, Pfizer, Sage, Shire, Sunovion, Takeda, Teva; held stocks (small number of shares of common stock) with Bristol-Myers Squibb, Eli Lilly, J & J, Merck, Pfizer purchased > 10 years ago; and receives royalties: Wiley (Editor-in-Chief, International Journal of Clinical Practice), UpToDate (reviewer), Springer Healthcare (book).