News|Articles|December 15, 2025

Psychiatric Times

  • Vol 42, Issue 12

Analyzing a Randomized Controlled Trial


Key Takeaways

  • A reliable RCT should have a clear primary hypothesis, valid randomization, and appropriate blinding to ensure trustworthy results.
  • Statistical significance is less important than effect size and prediction intervals for determining clinical relevance.

Learn how to critically analyze randomized controlled trials to ensure trustworthy results and improve clinical decision-making in mental health care.

The tectonic plates of research drift, and sometimes earthquakes shake our mountains of facts. The facts change, and to stay current, clinicians need to study the gold standard of evidence: randomized controlled trials (RCTs). However, RCTs vary greatly in quality, and before we trust an RCT’s results, we need to know how to analyze the trial and determine whether it is trustworthy.

A good RCT should state a clear primary hypothesis. The study should clearly state, “Our primary hypothesis was…,” but unfortunately, many RCTs fail to do this. Such failure increases the risk of hypothesizing after the results are known (often called HARKing): researchers conduct a study, collect the data, and then frame their hypotheses around whichever findings make the study seem positive and important.

Some studies may also investigate implausible hypotheses. If a hypothesis seems implausible, you should be very skeptical of any evidence the study provides in favor of the hypothesis.

Psychiatric studies often rely on rating scales; any rating scales used should be well validated.

Good RCTs describe and contain a table showing the baseline characteristics of the patients. If the patients in the study are not similar to the patient for whom you must make a treatment decision, the study provides little guidance on treating that patient. Even a well-done RCT may have enrolled patients who are not similar to the patient with whom you are working; in that case, too, the RCT may provide little guidance.

The best RCTs are triple blinded: The patients, treating clinicians, and those who measure the outcomes are all blinded to the assigned treatment. Because of the effects of the treatment, including adverse effects, some RCTs are hard to blind (eg, RCTs of psychedelics). An RCT should randomly assign patients using a valid method; in modern studies, an appropriate computer program should generate the randomization sequence. We hope that after randomization, the patients in the placebo and active treatment groups will be largely similar. Sometimes, by chance, the groups may differ in a way that could have affected the response to treatment; for example, one group might have a stronger history of treatment resistance.
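To illustrate what valid computer-generated randomization looks like, here is a minimal sketch of permuted-block randomization, a method commonly used in trials to keep the arms balanced as enrollment proceeds. The block size, arm labels, and seed are all hypothetical choices for illustration, not drawn from any particular study:

```python
import random

def block_randomize(n_patients, block_size=4, seed=2025):
    """Permuted-block randomization: each block contains equal numbers of
    active ("A") and placebo ("P") assignments, shuffled within the block."""
    rng = random.Random(seed)  # a fixed seed keeps the allocation list reproducible and auditable
    assignments = []
    while len(assignments) < n_patients:
        block = ["A"] * (block_size // 2) + ["P"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]

allocation = block_randomize(10)
print(allocation)  # balanced within every complete block of 4
```

Blocking guarantees that after every fourth patient, the two arms have enrolled equal numbers, which simple coin-flip randomization does not.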

A good RCT follows patients for a meaningful length of time. Sometimes, for example, in studies of psychotherapy, follow-up should continue well after the active treatment ends. Unfortunately, many RCTs are unrealistically short. A positive result at the end of a short treatment period does not guarantee a continued positive outcome, even just a few months later.

In general, larger sample sizes are better than smaller ones. However, some treatments, such as psychotherapy, are very labor intensive, and we may have to accept a smaller sample size.
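To see why larger samples are better, we can approximate the statistical power of a two-group comparison, that is, the probability of detecting a true effect of a given size. This sketch uses the standard normal approximation; the effect size of 0.5 and the sample sizes are illustrative values, not taken from any particular trial:

```python
import math

def power_two_sample(d, n_per_group, z_crit=1.96):
    """Approximate power of a 2-sided, two-sample comparison (alpha = .05)
    to detect a standardized effect size d, via the normal approximation."""
    z = d * math.sqrt(n_per_group / 2) - z_crit
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

for n in (20, 50, 100, 200):
    print(n, round(power_two_sample(0.5, n), 2))
# power climbs from roughly 0.35 at n = 20 per group to roughly 0.94 at n = 100
```

A small psychotherapy trial with 20 patients per arm would thus miss a medium-sized true effect most of the time, which is why underpowered negative results should be interpreted cautiously.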

A high dropout rate also undermines our faith in a study’s results. When patients drop out, studies often “impute” a measurement—that is, assign outcomes for the patients who dropped out. There are many methods of imputation, but all methods depend on untestable assumptions. Many studies impute the last observation for each patient who dropped out; this is termed the last observation carried forward (LOCF). LOCF is a conservative method of imputation for positive results and often underestimates the effect that would have been observed if the patients who dropped out had completed the study. However, LOCF may also underestimate adverse effects that might have developed later in the course of treatment. The best RCTs employ various procedures to try to obtain measurements at the end of the study for every patient, including those who dropped out.
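As a concrete illustration of LOCF, the following sketch fills in a patient's missing visits with the last score actually observed. The visit schedule and scores are hypothetical:

```python
def locf(visits):
    """Fill missing visits (None) with the last observed value carried forward."""
    filled, last = [], None
    for score in visits:
        if score is not None:
            last = score
        filled.append(last)
    return filled

# A patient who dropped out after visit 3 of 6 (lower score = less depressed)
scores = [24, 20, 17, None, None, None]
print(locf(scores))  # [24, 20, 17, 17, 17, 17]
```

Note the untestable assumption built in: the patient is treated as if improvement simply froze at dropout, which may understate both later improvement and later adverse effects.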

An RCT should explicitly state the numerical threshold for statistical significance (usually P = .05) and state whether the test is 1 sided or 2 sided. The investigators should have specified this threshold before starting their trial.

The P-value should be applied to the primary outcome, as defined by the primary hypothesis, and should be made more stringent for secondary outcomes. There are various mathematical methods to adjust the P-value, but the most common method is the Bonferroni correction.

The formula is:

Adjusted P-value = original P-value/number of secondary outcomes

For example, if the original P-value was .05 and there were 5 secondary outcomes, the adjusted P-value would be: .05/5 = .01
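The Bonferroni calculation above can be written directly as a one-line function:

```python
def bonferroni_threshold(original_p, n_outcomes):
    """Adjusted P-value threshold = original threshold / number of outcomes."""
    return original_p / n_outcomes

print(bonferroni_threshold(0.05, 5))  # 0.01
```

A secondary outcome would then need to reach P < .01, rather than P < .05, to be declared significant.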

However, the adjusted P-value should be applied only to independent events. For example, in a study of response to treatment for depression, it is not legitimate to report statistical significance for both of 2 different rating scales for depression. That would be comparable to studying a drug for weight loss and testing for statistical significance, measured in both pounds and kilograms. Unfortunately, many RCTs report secondary P-values for outcomes that are not independent.

One should not overvalue statistical significance, which can also be reported as a confidence interval (CI). All a CI tells us is how well we have estimated the mean. That is, if P = .05, we can report a 95% CI. Strictly speaking, a 95% CI means that if the study were repeated many times, about 95% of the intervals constructed this way would contain the true mean. The CI is only an estimate determined by a mathematical formula that depends on the size of the study; a larger study is more likely to find a statistically significant result.
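The strict meaning of a CI can be checked by simulation: if we repeatedly draw samples from a population whose true mean we know and build a 95% CI each time, roughly 95% of those intervals should contain the true mean. A minimal sketch, in which the sample size, number of simulated studies, and seed are all arbitrary choices:

```python
import math
import random
import statistics

def ci_covers_true_mean(rng, n=50, true_mean=0.0, sd=1.0):
    """Simulate one study and report whether its 95% CI contains the true mean."""
    sample = [rng.gauss(true_mean, sd) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return m - 1.96 * se <= true_mean <= m + 1.96 * se

rng = random.Random(42)
coverage = sum(ci_covers_true_mean(rng) for _ in range(2000)) / 2000
print(coverage)  # typically close to 0.95
```

The simulation also shows what the CI does not tell us: nothing here describes how spread out individual patients' results are, only how precisely the mean was estimated.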

Far more important than statistical significance is the effect size, which is a measure of how much difference the treatment made. There are many ways to report the effect size.

The effect size should be clinically meaningful over a reasonable period of time. Many researchers seem to believe that if an effect is statistically significant, the effect must be clinically meaningful; this is not at all true. Remember, statistical significance refers only to how well we have estimated the mean effect. A mean effect can easily be statistically significant yet not clinically meaningful.

In addition to knowing the mean effect, we would like to know the prediction interval, which tells us how spread out the results are. A 95% prediction interval tells us that 95% of the results fall between the lower and upper limits of the prediction interval. We can use this measurement to see how likely it is that a patient will have a clinically meaningful outcome. If the prediction interval has been calculated as the difference between active treatment and placebo, and if the prediction interval is normally distributed, then we can compute the number needed to treat (NNT).

Let’s compare the CI and the prediction interval with the clinically meaningful effect size. In the Figure, the prediction interval is normally distributed; “M” designates the mean result of the treatment, and “CM” represents the clinically meaningful effect size. Note that the mean result is not clinically meaningful; this is common for many medical treatments.

The green curve represents the distribution of all results of the difference between active treatment and placebo. Let’s eliminate the bottom 2.5% and the top 2.5% of the area under the curve; I have illustrated this by marking those regions in red. Compared with the results with placebo, 95% of the subjects achieved a result between the lower and upper red regions. This interval is referred to as the 95% prediction interval.

The blue line represents the CI. A mathematical formula estimates that it is 95% likely that the mean result is between the left end and the right end of the blue line. Note that the entire blue line is below the clinically meaningful effect size. If we paid attention only to the mean result and the CI, we would conclude that the treatment is not worth providing because it would seem that no subjects achieved a clinically meaningful result. However, if we study the prediction interval, we see that a substantial proportion of patients did achieve a clinically meaningful result.

I have drawn the graph below such that 20% of the area under the curve lies at and to the right of the clinically meaningful effect size. This means 20% of patients achieved a clinically meaningful result compared with placebo. So, we would have to treat 5 patients to have 1 achieve a clinically meaningful result. Thus, the NNT would be 5.
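Under the normality assumption described above, the fraction of patients at or beyond the clinically meaningful threshold, and hence the NNT, can be computed from the normal distribution. The mean effect, spread, and threshold below are hypothetical values chosen to reproduce the 20%-of-patients, NNT-of-5 example:

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def nnt_from_prediction(mean_effect, sd_effect, clinically_meaningful):
    """Fraction of the (assumed normal) treatment-minus-placebo distribution
    at or beyond the clinically meaningful threshold, and its reciprocal (NNT)."""
    fraction = 1 - normal_cdf(clinically_meaningful, mean_effect, sd_effect)
    return fraction, 1 / fraction

# Illustrative numbers: mean effect 3 points, SD 4.75 points, threshold 7 points
fraction, nnt = nnt_from_prediction(3.0, 4.75, 7.0)
print(round(fraction, 2), round(nnt))  # 0.2 5
```

This is exactly the reasoning in the graph: the mean effect (3 points) falls short of the threshold (7 points), yet about 1 in 5 patients still clears it.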

If the prediction interval is narrow, that tells us that patients achieved largely similar results, and we can predict that our patient might have a similar result. If the prediction interval is wide, we must question why the results were so inconsistent, and we are less certain how well our patient will respond.

Sadly, very few studies report the prediction interval. Usually, the best we can hope for is the NNT, but this tells us only how likely it is that a patient will obtain a certain effect and does not tell us how spread out (inconsistent) the results are.

There is a lot more to learn about analyzing RCTs, but this article is a starting point. Repeatedly practice using the worksheet, and you will grow in your ability to determine whether an RCT is reasonably trustworthy and whether it would be appropriate to apply the RCT in deciding whether to offer a particular treatment to a particular patient.

For blank and completed worksheets on analyzing RCTs, files are available here.

Dr Moore is a clinical professor of psychiatry at the Baylor College of Medicine Temple campus.

Dr White is an assistant professor of psychiatry at Texas Tech University Health Science Center.
