Better Tools Needed to Measure Treatment Outcome

Author(s): Kenneth J. Bender, PharmD, MA

The need for better tools, as well as better use of existing tools, to measure treatment response in clinical trials was a principal focus of the 46th annual NIMH-sponsored NCDEU (New Clinical Drug Evaluation Unit) meeting, held June 12-15 in Boca Raton, Fla. Improved clinical research techniques are needed to better separate treatment effect from placebo response, to distinguish between active comparators, and to facilitate development of novel treatments, according to several presenters at the conference.

In the workshop titled "Enhancing Precision in Clinical Trials," Mark Rapaport, MD, of the Cedars-Sinai Medical Center in Los Angeles, challenged clinical researchers to ensure the validity of the measures that they use to establish treatment outcomes. He asked workshop participants to discuss the selection, development, and refining of methods to evaluate acute and long-term outcomes of interventions for mood and anxiety disorders, as well as efforts to discern treatment effect while minimizing risks to participating subjects.

The Hamilton Rating Scale for Depression (HAM-D) was one of the established instruments revisited in discussions on improving the validity of clinical investigations. Although it has long been considered a standard for measuring severity of affective symptoms, its use to assess therapeutic intervention end points has been questioned. The problem, according to Ellen Frank, PhD, of the University of Pittsburgh, has less to do with the instrument than with its application.

"I knew Max Hamilton," Frank recounted. "I even had the privilege of training with him on his now iconic instrument and hearing him talk about what he had in mind when he developed it, and it had nothing to do with how we are using it today."

Frank contrasted the scale's current use--as a gauge of symptom change in inpatient and outpatient populations with mild to severe depression--with its original use to measure relative severity of symptoms in patients typically hospitalized for severe depression or melancholia. Researchers have been reluctant to stop using the instrument, according to Frank, because of its prominence and because it links new research to past studies.

Frank agreed with Rapaport on the need to have new end points for studies of depression. The new measures should include improvement in function, she asserted. "What patients and patient advocacy groups tell us they want out of treatment are a home, meaningful relationships, and satisfying work," Frank noted.

With academic and pharmaceutical industry researchers now recognizing the need for more sensitive measures of interventions for depression, Frank suggested that a choice be made to direct resources toward developing either a single broad instrument or multiple scales for different forms of depression. In addition, Frank urged increased adoption of new technology for the development of better assessment instruments and processes.

One attempt to improve upon the HAM-D--undertaken by a collaboration of researchers from several pharmaceutical manufacturers and universities--was described in another section of the NCDEU conference by Nina Engelhardt, PhD, of MedAvante, Inc, in New Jersey. Engelhardt explained that the GRID-HAM-D offers a standardized scoring system that incorporates intensity and frequency of depressive symptoms into the severity score.

Engelhardt reported on validity testing of the GRID-HAM-D total and item scores drawn from a sample of 150 outpatients with depression. Inter-rater reliability was comparable for the GRID-HAM-D, the structured interview guide for the HAM-D, and the unstructured interview Guy version of the HAM-D. Engelhardt declared that the GRID-HAM-D was as reliable as the current HAM-D, with the advantages of a standardized scoring system, integrated conventions, and an interview guide.

"These features may provide specific benefits for typical raters who have less clinical assessment experience than the highly experienced raters in this study," Engelhardt indicated.

Biomarkers in clinical research

John Kane, MD, of the Zucker Hillside Hospital, Glen Oaks, NY, and Albert Einstein College of Medicine of Yeshiva University, Bronx, NY, noted the increasing use of biomarkers as another emerging trend in clinical research but acknowledged that many have been found too inconsistent to serve as correlates of psychiatric illness. "The integration of biomarkers with valid measures of symptom severity may herald a new model for future clinical trials," Kane projected.

The renewed interest in using biomarkers in clinical studies was welcomed by Mark Opler, PhD, MPH, of Columbia University and the PANSS Institute in New York City. Opler noted that clinical studies of psychiatric disorders have been complicated "by the almost universal absence of reliable physical or biochemical pathologies."

While noting many unsuccessful efforts to establish valid biologic markers for a range of conditions--including depression, schizophrenia, and Alzheimer disease--Opler pointed to some recent successes in studies employing "hybrid" measures that combine traditional data collection approaches with tests of biochemical and systemic function.

"There is renewed interest in searching for biological markers that can be used to diagnose, subtype, or quantify disease severity," Opler indicated.

John Sweeney, PhD, of the University of Illinois at Chicago, pointed out that eye-movement studies have provided a quantitative biomarker for studying cognitive and motor systems for decades. As an example, he mentioned that eye-movement studies have tracked the ability to suppress context-inappropriate responses after antipsychotic treatment, which emerges more gradually than the reduction in psychotic symptoms.

"While studies of eye tracking in psychiatry began with interest in their use as an endophenotype for family/genetic research, more recent work has shown that studies of eye movements hold great promise as a translational biomarker in testing the neurocognitive efficacy of treatments," Sweeney observed.

Progress in adopting technologies for clinical research, as Frank and others have advocated, was the theme of a workshop on telepsychiatry and biomarkers conducted by Kane. He observed that telepsychiatry--a form of telemedicine involving videoconferencing, telephones, secure e-mail, and other modalities--has been considered by some to be too complex and impractical to employ in research settings.

According to Kane, however, this technology is enjoying growing popularity. "Telepsychiatry is emerging as a viable alternative to in-person assessments," he noted, "with particular advantage demonstrated in obtaining standardized, reliable assessments at fewer sites and with fewer raters."

Don Hilty, MD, from the University of California, Davis, agreed that telepsychiatry--which he characterized as a component of "e-health"--is developing into a useful technology for research, in addition to its wider application in clinical care and education. Hilty noted its successful use in several research populations, its particular utility for outreach to rural areas, and its application as a novel tool in subject recruitment.

"It's well received by participants, it's affordable, and it's versatile," Hilty concluded.

Engelhardt reported using telepsychiatry for conducting psychiatric assessments in clinical trials, as well as in evaluating and training the raters. Her group has used both 2-way audio conferencing and videoconferencing for those purposes, remotely assessing patients from a centralized location in real time in multicenter studies of treatments for schizophrenia and depression.

Searching for optimal scales

The putative benefit of the use of second-generation antipsychotics to treat the negative symptoms of schizophrenia has not been supported by sufficient evidence for these agents to be approved for that indication. In the introduction to a workshop on the methodologic hurdles in proving treatment effectiveness for negative symptoms, Nina R. Schooler, PhD, of the Washington DC Veterans Affairs Medical Center, and Stephen R. Marder, MD, of the University of California, Los Angeles, attributed this partly to a lack of consensus on an effective means to measure these symptoms.

"Currently available measurement tools for negative symptoms vary widely in their face validity, psychometric properties, and user friendliness to the clinical trials investigator," they wrote in the workshop overview.

Brian W. Kirkpatrick, MD, of the Medical College of Georgia, described recent work on a new rating scale appropriate for clinical trials based on recommendations from the NIMH-sponsored Consensus Development Conference on Negative Symptoms held in January 2005. The instrument contains subscales for 5 domains that were identified at the conference: alogia, blunted affect, avolition, anhedonia, and asociality. This new scale is close to being field tested, although Kirkpatrick anticipates that it could undergo modifications through that process.

Fabien Tremeau, MD, and colleagues from the Nathan S. Kline Institute for Psychiatric Research, Orangeburg, NY, described another new scale for negative symptoms--the Motor-Affective-Social Scale (MASS). This instrument is based on the determination that negative symptoms can be measured as expressiveness during an interview or by certain social behaviors. The MASS is applied during a 5-minute structured interview with ratings of coverbal hand gestures and spontaneous/voluntary smiling, as well as answers to interview questions.

Delusions are another component of schizophrenia for which there have been few quantitative measures. Barnett S. Meyers, MD, of the Weill Medical College of Cornell University in New York City, described the development of the Delusional Assessment Scale (DAS). The DAS, which measures the intensity of beliefs across multiple delusional domains, was recently applied to determine whether delusions among older patients differ from those in young adults. Meyers reported that delusions appeared to have a greater impact in older patients and that men demonstrated greater conviction than women, regardless of age.

With the amount of interest in new scales facilitating clinical research, it was probably inevitable that a scale would be developed to rate how well the evidence from research is adopted into clinical practice. Jessica L. Garno, PhD, reported on the validation of a rating scale to assess adherence to evidence-based psychopharmacotherapy practices. Garno asserted that, despite recent efforts to expand research based on effectiveness, practitioners underutilize the findings and have insufficient guidance to extrapolate from efficacy studies to the treatments they provide.

"The rapid growth of new pharmacotherapies for mood disorders, coupled with the promulgation of numerous practice guidelines, has prompted the need for more systematic approaches to choosing longitudinal treatment strategies," Garno explained.

Garno and colleagues developed a 20-item, 20-point scale to rate fidelity to pharmacotherapeutic practices that have been commonly accepted in consensus meetings and expert reviews. In validity testing with psychiatrists who referred patients to an inpatient facility for treatment of unipolar or bipolar mood disorder and comorbid substance abuse, the scale was found to have face validity, internal consistency, and item-total as well as overall inter-rater reliability.

Garno and colleagues suggested that the application of their scale "may hold value in tracking performance improvement among clinicians or in examining relationships between evidence-based practices and patient service utilization or outcome."

This is part 1 of the 2-part coverage of the 2006 NCDEU meeting; the second report, on treatment investigations, will appear in a future issue of Psychiatric Times.