Commentary|Articles|January 8, 2026

When Gold Standards Aren't Golden: A Clinical Rater's Reflection from the Cobenfy ARISE Trial

A clinical trial reveals challenges in assessing schizophrenia symptoms due to cultural and language differences, highlighting the need for improved measurement tools.

CLINICAL COMMENTARY

During my yearlong schizophrenia research fellowship, I served as a clinician-rater in ARISE, a multicenter, phase 3 randomized clinical trial. The trial evaluated adjunctive xanomeline–trospium (Cobenfy), added to an atypical antipsychotic, in adults with schizophrenia. The primary endpoint was the change in the Positive and Negative Syndrome Scale (PANSS) total score over 6 weeks. On April 22, 2025, the sponsor reported that ARISE did not meet this endpoint.1

My ratings were performed at Boston Medical Center, the largest safety-net hospital in New England. There, about 30% of patients speak a primary language other than English, and many are immigrants. Including such sites improves the clinical trial’s representativeness but also exposes limitations in some of our currently used standardized assessments.

The PANSS is the gold standard for symptom assessment in schizophrenia and is often used in clinical trials. It is administered with strict standardization to reduce measurement errors. The Negative subscale includes N5 (“Difficulty in Abstract Thinking”), which raters score based on how patients handle proverb interpretations and similarities. Substitutions or ad hoc localization are discouraged under these standardized administration procedures.

During the PANSS administration at our site, participants, especially those whose first language was not English or who used regional English, struggled with the proverbs and idioms used to elicit abstraction. Even some younger, native English speakers were unfamiliar with certain idioms or similarity items.

As an immigrant fluent in English, I recall stumbling over phrases like “everything but the kitchen sink” and other American idiomatic expressions when I first moved here. Language is constantly evolving, and with each generation, idioms and meanings change. Additionally, English is not a homogeneous language; there are myriad versions and geographical variations.

As I watched these participants struggle with some of the PANSS abstraction items, I began to wonder if their responses were a true reflection of psychopathology or merely cultural-linguistic unfamiliarity—a distinction with scoring consequences. Of course, some responses were bizarre or unequivocally clear signs of psychopathology and impaired abstraction. But others were not so clear-cut.

Some studies have observed cross-cultural variation in how individual PANSS items are scored across countries.2,3 When a primary endpoint such as the PANSS total score is partly driven by items susceptible to cultural and language effects, nondifferential measurement error can occur, that is, noise that affects all study arms alike. Such error can attenuate the observed treatment difference toward the null, reducing the power to detect a benefit. These site-level experiences may not be sufficient to explain why the ARISE trial failed to meet its endpoint, but they highlight the need to be aware of the impact of cultural and language differences in an increasingly diverse clinical trial population.
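
To make the attenuation argument concrete, here is a minimal sketch under the classical measurement-error model, in which random rating noise that is equal in all arms lowers the scale's reliability and shrinks the standardized effect size by the square root of that reliability. Every number below (a true effect size of 0.3, 200 participants per arm, reliabilities of 1.0, 0.8, and 0.6) is a hypothetical value chosen for illustration and is not taken from ARISE or any other trial. The function names are likewise illustrative, and the power calculation uses a simple normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def observed_effect_size(true_d: float, reliability: float) -> float:
    """Standardized effect size after nondifferential random error.

    reliability = true-score variance / (true-score variance + error variance).
    Under the classical model, d_observed = d_true * sqrt(reliability).
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return true_d * sqrt(reliability)


def approx_power(d: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-arm comparison of means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_arm / 2)  # expected value of the test statistic
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    TRUE_D = 0.3       # hypothetical true standardized effect
    N_PER_ARM = 200    # hypothetical sample size per arm
    for reliability in (1.0, 0.8, 0.6):  # hypothetical reliabilities
        d_obs = observed_effect_size(TRUE_D, reliability)
        power = approx_power(d_obs, N_PER_ARM)
        print(f"reliability={reliability:.1f}  d_obs={d_obs:.3f}  power={power:.2f}")
```

In this sketch the raw mean difference between arms is unchanged; what shrinks is the difference relative to the noisier score, which is why power falls as reliability falls even when the underlying treatment effect is the same.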

This experience also underscores the need to develop standardized measures that are validated across several countries and accommodate cultural differences. If the PANSS continues to be used, then sponsors and investigators should account for this cultural impact on abstraction and design modifications to the proverbs and similarities used in testing. New items should replace legacy items after pretesting for cultural and intergenerational familiarity and validation in the general population.

Expanding access to clinical trials for previously underrepresented groups is a welcome development. It improves the generalizability of findings and increases confidence among minority groups and those who work with them about treatment effectiveness. However, my experience with the ARISE study highlights the need to scrutinize even gold standard instruments for their suitability to accurately measure constructs when applied in culturally diverse groups. For the PANSS abstraction items, this would mean being able to measure abstraction, not idiom familiarity.

Dr Ogundare is a psychiatrist at New York-Presbyterian/Columbia University Irving Medical Center and a Public Psychiatry Fellow at Columbia University. Dr Ogundare is also a former APA/APAF Public Psychiatry Fellow and a current Laughlin Fellow of the American College of Psychiatrists.

References

1. Topline results from phase 3 ARISE trial evaluating Cobenfy (xanomeline and trospium chloride) as an adjunctive treatment to atypical antipsychotics in adults with schizophrenia. Press release. April 22, 2025. Accessed January 6, 2026. https://news.bms.com/news/details/2025/Bristol-Myers-Squibb-Announces-Topline-Results-from-Phase-3-ARISE-Trial-Evaluating-Cobenfy-xanomeline-and-trospium-chloride-as-an-Adjunctive-Treatment-to-Atypical-Antipsychotics-in-Adults-with-Schizophrenia/default.aspx

2. Khan A, Yavorsky C, Liechti S, et al. A Rasch model to test the cross-cultural validity in the Positive and Negative Syndrome Scale (PANSS) across six geo-cultural groups. BMC Psychol. 2013;1:5.

3. Opler MGA, Yavorsky C, Daniel DG. Positive and Negative Syndrome Scale (PANSS) training: challenges, solutions, and future directions. Innov Clin Neurosci. 2017;14(11–12):77–81.
