NCDEU Report Part II: Research Methods Considered at NCDEU

The methodology of clinical trials drew as much interest as the trial results among investigators gathered at the 39th annual NCDEU (New Clinical Drug Evaluation Unit) Program meeting. The meeting was conducted in June by the National Institute of Mental Health in Boca Raton, Fla.

(This is the second of two articles summarizing the research presented at the NCDEU meeting. The first article appeared in the October issue. -Ed.)

Participants in a workshop on database management in multisite clinical trials reviewed developments such as Web-based remote access systems and an array of computer and fax applications. This workshop, led by Victoria Grochocinski, Ph.D., Western Psychiatric Institute and Clinic, University of Pittsburgh, discussed quality control, confidentiality, controlled access, and the interface of database management and statistical analysis.

Another workshop entitled "Advances for Enhancing Precision in Clinical Trials" addressed the development of standardized computer-assisted techniques for collecting data on psychopathology and diagnosis and the development of objective, standardized, computerized outcome variables. Also addressed were the problems in conducting trials and the issues that arise in failed trials. This discussion was led by William Z. Potter, M.D., Ph.D., Eli Lilly and Co.; John H. Greist, M.D., Dean Foundation for Health, Research, and Education; Mark H. Rapaport, M.D., University of California, San Diego; and Nina R. Schooler, Ph.D., Hillside Hospital, Glen Oaks, N.Y.

The particular challenges of designing clinical trials with children and elderly patients were addressed in separate workshops. Additional sessions were devoted to ethical controversies in clinical treatment research and to psychiatric treatment research conducted in nonpsychiatric settings. Individual paper presentations also recommended improvements in research methodology and considered the validity of current methods.

Delbert Robinson, M.D., and colleagues at Hillside Hospital investigated whether a randomized clinical trial may yield different results depending on whether ratings are made by independent assessors or by treating clinicians blinded to the treatment condition. They noted that blinded, independent raters of patients receiving open-label treatment are often more acceptable to patients, families and clinical staff than clinician-assessors who are blinded to the treatment protocol. However, independent assessors have less contact with subjects than clinicians do, and they may not receive the collateral information from ancillary staff that clinicians receive.

"The combination of infrequent contact and restricted information sources may limit the ability of independent assessors in comparison with clinician assessors to elicit symptoms from subjects," Robinson et al. noted, adding, "These factors may be especially important for studies with psychotic subjects who may not volunteer symptoms readily due to suspiciousness or lack of awareness of their symptoms."

To compare clinician and independent assessor ratings, Robinson's group evaluated the ratings made for 313 subjects over a two-year study period in the NIMH Treatment Strategies in Schizophrenia Study. Ratings with the Brief Psychiatric Rating Scale (BPRS) and the Clinical Global Impression (CGI)-Severity of Illness Scale were completed by both types of raters at baseline and after 28, 52, 80 and 104 weeks of treatment, providing a total of 1,627 assessments.

In comparing the assessments made by blinded, independent raters to those of the blinded, treating psychiatrists, Robinson's group found that psychiatrists' ratings were generally higher at baseline, while the independent assessors' ratings tended to be higher at the end of the stabilization period. Psychiatrists' ratings were higher at both time points, however, for anergia and activation. Independent assessors rated higher levels of anxiety-depression and hostility at both time points.

Robinson and colleagues observed that, although the differences between the psychiatrist and independent assessors were small, "Psychiatrists rated more change than independent assessors, most notably on thought disorder, hostility, psychosis, CGI-Severity and BPRS Total Score."
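The comparison of rated change can be made concrete with a small sketch. It assumes each subject has a baseline and an endpoint score from both rater types; the scores and the function names below are invented for illustration and are not data or methods from the study. Rated change is taken as the baseline score minus the endpoint score, and the quantity of interest is the average within-subject difference in rated change between the two rater types.

```python
from statistics import mean

# Hypothetical paired BPRS total scores: (baseline, endpoint) per subject.
# These numbers are illustrative only; they are NOT from the study.
psychiatrist = [(48, 30), (52, 35), (45, 33)]
independent = [(46, 32), (50, 37), (44, 34)]


def rated_change(scores):
    """Improvement per subject: baseline minus endpoint (positive = improved)."""
    return [baseline - endpoint for baseline, endpoint in scores]


def mean_change_gap(rater_a, rater_b):
    """Mean within-subject difference in rated change (rater_a minus rater_b)."""
    pairs = zip(rated_change(rater_a), rated_change(rater_b))
    return mean(a - b for a, b in pairs)


gap = mean_change_gap(psychiatrist, independent)
print(f"Psychiatrists rated {gap:.1f} more points of change on average")
```

A positive gap means the first rater type recorded more improvement than the second for the same subjects. A real analysis would also need a significance test and would handle each BPRS factor separately.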

Another disparity between rater assessments was considered by David J. DeBrota, M.D., Eli Lilly and Co., and colleagues. They compared patient-rated and clinician-rated Hamilton Depression Scale (HAM-D) scores in a multicenter, placebo-controlled study of fluoxetine (Prozac) for depression. The 17-item HAM-D was administered either by clinicians conducting structured interviews or by an interactive voice response system that patients accessed by telephone. The investigators compared these ratings at multiple times in the study and found greater concordance at later visits than at the first two visits.

The investigators suggested that this pattern arose because, to continue in the study, patients being evaluated by a clinician had to have a total HAM-D score of ≥20. They observed, "When a patient's clinician-rated severity must be at or above a threshold minimum for the patient to continue in the study, the clinician-rated severity may be inflated." Once the clinicians no longer had to apply eligibility criteria, the investigators found, their ratings became concordant with those of the patients.

Several research groups considered the dilemma of large placebo response in clinical trials. Charles Wilcox, Ph.D., and colleagues at the Pharmacology Research Institute (PRI), headquartered in Los Alamitos, Calif., pointed out, "The pervasive problem of seemingly unpredictable placebo response rates in studies involving both investigational and marketed antidepressant medications is common knowledge - within the industry."

The PRI group analyzed data from several recent depression trials in an effort to determine whether certain variables predicted placebo response. Of five factors considered, four were associated with a greater likelihood of response to placebo: the number of days on placebo; the number of concomitant medications being taken at initial screening; marital status, with married patients approximately twice as likely as separated or divorced patients to respond to placebo; and gender, with almost 40% of women, in contrast to 21% of men, responding to placebo. The fifth factor, the number of medical conditions reported at baseline, was not significantly associated with subsequent placebo response.
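To put the gender figures in perspective, the two response rates can be turned into a risk ratio and an odds ratio. The sketch below is illustrative arithmetic only: the report gives the women's rate as "almost 40%," so 0.40 is a rounded assumption, and the report does not say which effect measure the PRI group used. The function names are hypothetical.

```python
def risk_ratio(p_a, p_b):
    """Ratio of two response proportions (group A relative to group B)."""
    return p_a / p_b


def odds_ratio(p_a, p_b):
    """Ratio of the odds of response in group A to the odds in group B."""
    odds_a = p_a / (1 - p_a)
    odds_b = p_b / (1 - p_b)
    return odds_a / odds_b


women = 0.40  # "almost 40%" in the report; rounded here as an assumption
men = 0.21

print(round(risk_ratio(women, men), 2))
print(round(odds_ratio(women, men), 2))
```

Under these rounded inputs, women were roughly 1.9 times as likely as men to respond to placebo, with an odds ratio of about 2.5. Because the report gives no group sizes, confidence intervals cannot be reconstructed from it.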

Separate reports from researchers considered the possibility that inadequate diagnostic screening for major depression yields an inappropriate study cohort with a large percentage of placebo responders. Kalyan Ghosh, Ph.D., and Mark S. Kramer, M.D., Ph.D., at Merck and Co. urged depression trial investigators to "consciously implement a rigorous professional standard of enrolling in such trials only those patients (i.e., adequately depressed) who in their opinions are likely to provide informative outcomes."

Robert A. Massing and colleagues at Eli Lilly and Co. investigated whether the quality of diagnostic screening could be enhanced by employing a structured interview. They assessed whether the addition of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) to standard screening processes was feasible for a multicenter antidepressant clinical trial, and whether this addition contributed to identifying a homogenous subject population with major depression.

In an eight-site U.S. trial, the SCID was completed at the initial visit of 309 patients who were determined to have a high likelihood of major depressive disorder. Massing and colleagues found that the complex adaptive sequence of SCID items was successfully administered in 97% of cases, and the SCID data elements were consistent with the diagnosis of a major depressive episode (MDE) in 99% of the administrations.

Massing and colleagues reported that the SCID was effectively administered and generated data of acceptable quality. "The application of the SCID in this trial resulted in a number of patients being entered but not subsequently randomized who might otherwise have been randomized," they observed. "The study's randomized population was thus made more homogenous, in that randomized patients could be confidently said to have MDE and to lack significant psychiatric comorbidity."

Andrew C. Leon, Ph.D., at Cornell Medical College, and colleagues at Brown University proposed that randomized clinical trials (RCTs) are less useful for demonstrating treatment effectiveness than for demonstrating treatment efficacy.

While acknowledging that RCTs can offer valid information on treatment efficacy, Leon and colleagues consider longitudinal, observational studies among a less restricted group of subjects receiving treatment in a naturalistic setting to be a better gauge of treatment effectiveness.

Taking data from the NIMH Collaborative Depression Study of 431 subjects who were followed for up to 15 years, Leon and colleagues calculated a "propensity for treatment intensity" with a mixed-effects ordinal logistic regression model. These scores were then analyzed with a mixed-effects survival model of time until recovery over multiple treatment intervals per subject. The resulting analysis provided an estimate of the overall effectiveness of treatment.
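The general idea behind a propensity adjustment can be illustrated with a much simpler, swapped-in technique: stratifying on a propensity score that has already been estimated. The sketch below is not the researchers' mixed-effects method. It assumes a binary treated/untreated indicator, a binary recovery outcome, and equal-width strata, and the function name and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def stratified_effect(records, n_strata=5):
    """Average within-stratum difference in recovery rate (treated - untreated).

    records: list of (propensity, treated, recovered) tuples, where
    propensity is in [0, 1] and the other two fields are booleans.
    Strata lacking either group are skipped; returns None if none qualify.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for propensity, treated, recovered in records:
        # Equal-width bins; a propensity of exactly 1.0 goes in the top bin.
        index = min(int(propensity * n_strata), n_strata - 1)
        strata[index][treated].append(1.0 if recovered else 0.0)

    weighted_sum, total = 0.0, 0
    for groups in strata.values():
        if groups[True] and groups[False]:
            size = len(groups[True]) + len(groups[False])
            diff = mean(groups[True]) - mean(groups[False])
            weighted_sum += size * diff  # weight each stratum by its size
            total += size
    return weighted_sum / total if total else None


# Hypothetical records: (propensity, treated, recovered). Not study data.
records = [
    (0.15, True, True), (0.10, False, False), (0.18, False, True),
    (0.85, True, True), (0.90, True, False), (0.88, False, False),
]
print(stratified_effect(records))
```

Comparing treated and untreated subjects only within strata of similar propensity is what limits the influence of baseline differences between the groups. The researchers' own analysis, with an ordinal treatment intensity and repeated treatment intervals per subject, was considerably more elaborate than this sketch.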

"This novel approach to treatment effectiveness analyses," concluded the researchers, "provides a strategy to account for multiple treatment...and baseline differences between treated and untreated subjects." Further, they noted that the propensity score approach "reduces the bias in estimates of effectiveness that is inherent in observational studies."