New methods of conducting and evaluating research were as intriguing as the results themselves at the 38th annual meeting of the New Clinical Drug Evaluation Unit (NCDEU) Program, sponsored by the National Institute of Mental Health (NIMH), in Boca Raton, Fla., June 10-13. The meeting has grown from a forum of NIMH-funded researchers reporting on their progress into a convention of approximately 1,000 clinicians, industry and regulatory personnel, and investigators marking progress in psychopharmacology. However, the meeting continues to include assessments of research methodology.
To answer the question posed in their presentation title, "Do clinical trials reflect drug potential?" Mary Hooper, B.A., and Jay Amsterdam, M.D., of the Depression Research Unit at the University of Pennsylvania, invoked the Freedom of Information Act to obtain and examine the U.S. Food and Drug Administration reviews of New Drug Applications (NDAs) for paroxetine (Paxil), venlafaxine (Effexor), nefazodone (Serzone), mirtazapine (Remeron) and sertraline (Zoloft). Although the efficacy of these agents is established in each NDA by statistical differentiation from placebo, Hooper and Amsterdam suspect that a heightened placebo effect from such factors as a high dropout rate, poor site selection or poor protocol design can mask the potential of the active drugs.
Hooper reported on their review of 37 clinical trials, all of which had used the Hamilton Depression Scale (HAM-D), the HAM-D "depressed mood" item, the Clinical Global Impression scale (CGI) severity and CGI improvement scores to measure improvement from baseline at the end of week 6 and week 8 of the acute treatment phase. They found that efficacy was not demonstrated on any of the four measures in 13 (35%) of the trials. Paroxetine efficacy was demonstrated on all four measures in three of six trials (50%). Venlafaxine studies did not employ CGI, but in two of six trials (33%), success was found on each of the other three measures. Nefazodone demonstrated significant improvement in all four measures in one of nine trials (11%). Mirtazapine was found effective in all measures in four trials, but failed efficacy criteria in four other trials. Sertraline demonstrated efficacy in three of six trials (50%).
"These data," the investigators related, "emphasize the importance of controlling for confounding variables in the development of new antidepressant compounds."
Douglas Feltner, M.D., and colleagues at Solvay Pharmaceuticals sought to confirm that factors other than the spontaneous improvement of depressive symptoms contribute to the placebo effect in antidepressant trials. The group made retrospective calculations of between-visit changes in 17-item Hamilton Depression Scale (HAM-D-17) total scores for patients receiving placebo in several similarly designed antidepressant trials. They found that the mean HAM-D-17 scores in the single-blind phase of the trial, from screening to baseline visit, were not reduced (mean change was an increase of 0.18 ± 1.79); but a reduced mean score, indicating symptom improvement, occurred in patients with high, medium and low baseline HAM-D-17 scores from the first postrandomization week to the third.
"These data suggest," Feltner and colleagues concluded, "that in placebo-controlled depression trials, the placebo effect is more prominent during the double-blind phase than during the single-blind placebo run-in phase." The investigators emphasized that this difference is unlikely to be due to different rates of spontaneously improved depression between the two trial phases.
In addition to validating a new agent against placebo, the studies in an NDA customarily employ fixed-dosage comparisons to establish a minimal effective dose for the new agent and, perhaps, to discourage subsequent use of excessively high doses with associated heightened side effects. However, Jeffrey Mattes, M.D., of the Psychopharmacology Research Association of Princeton, N.J., suggested that fixed-dose studies might not achieve these goals and might actually obscure the response of some patients to unusually low doses.
"Thus, a finding in a fixed-dose study that only patients receiving a specified high dose improve significantly does not necessarily mean that lower doses are not beneficial for some patients," Mattes explained.
Mattes offered as an example a recent study of paroxetine efficacy in panic disorder, which indicated that a dosage of 40 mg/day was effective while 10 mg/day and 20 mg/day were not. "Results of this type might lead to some patients receiving a higher dose than they need, the very outcome that fixed-dosage studies were intended to avoid," he warned.
Mattes proposed that variable-dose rather than fixed-dose designs could often serve, and would more closely resemble clinical treatment. Variable-dose design can also identify the minimal effective dose, Mattes indicated, with slow dose titration to therapeutic effect and tapering following stabilization.
"Variable-dose studies are also more cost-effective when attempting to demonstrate efficacy," Mattes argued, "since fixed-dose studies require larger sample sizes, anticipating that some patients will receive either too low or too high a dose."
The possibility that the dosages approved by the FDA from fixed-dose studies are not the optimal dose range was raised by Robert Hamer, Ph.D., of the Robert Wood Johnson Medical School, Piscataway, N.J., and Pippa Simpson, Ph.D., of the University of Arkansas for Medical Sciences, Little Rock, Ark. Hamer and Simpson went beyond ascribing this possible disparity to the deficiencies in fixed-dose designs to suggest, "It may be in the sponsor's interest to design the clinical trials in such a way as to minimize the finding of a dose-response effect."
In addition, Hamer and Simpson questioned the adequacy of the statistical methods used to account for subjects who drop out of clinical trials. "These techniques themselves often make untenable assumptions, and frequently have difficulty with the estimation process," they noted. They warned that even if a dose-response effect exists, it may not be discerned in the face of confounding factors in the trial design, the dropouts and their causes, and the chosen methods of statistical analysis.
Michael Borenstein, Ph.D., of the Hillside Hospital, Teaneck, N.J., described two statistical analysis computer programs he developed to help ensure accurate meta-analysis of multiple studies and power analysis in survival studies. "Meta-analysis can be used," Borenstein explained, "to set policy by allowing us to develop a clear picture from existing data. It can also be used to help plan future research by pinpointing areas for which the existing information is not sufficient." His meta-analysis program aids this research by facilitating the establishment of a detailed database for individual studies.
Borenstein's computerized survival analysis performs computations similar to those that determine the power of a study with a single time-point, but can incorporate the complexity of the parameters over time. "Survival analysis," Borenstein explained, "provides a clear picture of outcome that is not available at any single time point." Such analysis, for example, distinguishes between patients who remit at different times during a trial, or identifies those who fail to remit by a particular time point and require different follow-up.
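To make the idea of an outcome picture across time concrete, the following is a minimal sketch of a Kaplan-Meier estimate, one standard survival-analysis technique. It is not Borenstein's program, and the function name `kaplan_meier` and the sample weeks are hypothetical. Each patient contributes a time (the week of remission, or the last week observed) and a flag saying whether remission was actually observed.

```python
def kaplan_meier(times, remitted):
    """Kaplan-Meier estimate of the proportion of patients NOT yet remitted.

    times    -- week of remission, or last week observed if censored
    remitted -- True if remission was observed at that week, False if the
                patient left or the trial ended without remission
    Returns a list of (week, proportion_not_yet_remitted) at each week
    where at least one remission occurred.
    """
    data = sorted(zip(times, remitted))
    at_risk = len(data)          # patients still under observation
    surviving = 1.0              # running product-limit estimate
    curve = []
    i = 0
    while i < len(data):
        week = data[i][0]
        same_week = [flag for t, flag in data if t == week]
        n_events = sum(1 for flag in same_week if flag)
        if n_events:
            surviving *= 1 - n_events / at_risk
            curve.append((week, surviving))
        # Remitted and censored patients both leave the risk set.
        at_risk -= len(same_week)
        i += len(same_week)
    return curve


if __name__ == "__main__":
    # Hypothetical data for six patients (weeks are illustrative only).
    weeks = [2, 4, 4, 6, 8, 8]
    flags = [True, True, False, True, False, False]
    for week, share in kaplan_meier(weeks, flags):
        print(f"week {week}: {share:.3f} not yet remitted")
```

A single end-of-trial remission rate would treat the patient who remits at week 2 and the one who remits at week 6 identically; the curve keeps them apart and also shows how many patients remain unremitted at any chosen week.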
"The program allows the researcher to plan for precision [i.e., the confidence interval width] as well as power," Borenstein indicated. "In some cases the study goal is not only to test the null hypothesis of no effect, but also to estimate the size of the treatment effect."
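The distinction between planning for power and planning for precision can be illustrated with a short calculation. The sketch below is not Borenstein's software; it uses a simple normal approximation for comparing two arm means at a single time point, and the function names, the 3-point effect, and the standard deviation of 8 are all hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_sample(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test on the difference in means.

    Assumes equal arm sizes and a common, known standard deviation.
    """
    se = sd * sqrt(2.0 / n_per_arm)       # SE of the difference in means
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    # Probability of landing in either rejection tail under the alternative.
    return (1 - _Z.cdf(z_crit - shift)) + _Z.cdf(-z_crit - shift)


def ci_half_width(sd, n_per_arm, alpha=0.05):
    """Half-width of the (1 - alpha) confidence interval for the difference."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    return z_crit * sd * sqrt(2.0 / n_per_arm)


if __name__ == "__main__":
    # Hypothetical planning values: 3-point drug-placebo difference, SD of 8.
    for n in (50, 100, 200):
        print(
            f"n per arm = {n:3d}  "
            f"power = {power_two_sample(3.0, 8.0, n):.2f}  "
            f"CI half-width = {ci_half_width(8.0, n):.2f}"
        )
```

Power depends on the assumed effect size, while the interval width does not, so a sample that is adequate for rejecting the null hypothesis may still leave the size of the treatment effect only loosely estimated.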