
How Can We Make Supersmart Chatbots More Emotionally Intelligent?
Experts urge emotional intelligence stress tests to ensure safer conversations with chatbots.
Chatbots consistently outperform humans on available tests of emotional intelligence (EI).1 This should not be surprising. They are superb test-takers and have instant internet access to virtually everything ever written about emotional intelligence. But there is a dangerous catch.
Despite their seemingly exalted intellectual understanding of EI, chatbots are often strikingly emotionally inept in real-world conversations with humans—especially vulnerable people with psychiatric symptoms. They miss or misread emotional cues, validate dangerous emotional states, and respond in ways that no well-trained clinician would.2
We will explore why chatbots that excel at EI tests are so often emotionally unintelligent when it matters most, in lived interactions with vulnerable humans.
What Is Emotional Intelligence and How Is It Measured?
Emotional intelligence refers to the ability to perceive, understand, regulate, and respond appropriately to emotions (one's own and others'). Multiple EI measures have been developed, including self-report questionnaires and performance-based tests that ask respondents to identify emotions, predict emotional outcomes, or select optimal emotional responses.
On these instruments, chatbots perform extraordinarily well, often surpassing human averages. They recognize emotional labels with ease, articulate empathic-sounding responses, and generate language that appears emotionally attuned.
Yet these tests capture emotional intelligence in an abstract, decontextualized way.
They assess knowledge about emotions, not emotional behavior unfolding over time in real relationships. They do not simulate the challenges and ambiguities that arise when distressed people seek understanding, reassurance, or validation.
Why Do High EI Test Scores Not Protect Against Dangerous EI Responses?
In psychotherapy, emotional intelligence is not simply about recognizing feelings. It is about knowing when to validate and when not to, when validation soothes and when it inadvertently harms.
Current chatbots often default to excessive emotional agreement, a form of digital sycophancy. They reflexively validate not only feelings, but also the beliefs and impulses attached to them. In vulnerable users, this can be actively dangerous. Suicidal despair, paranoid misperceptions, eating-disordered thinking, and intense dependency are often met with responses from chatbots that feel kind but subtly reinforce pathology.
A skilled therapist validates the emotion while gently questioning the meaning attached to it. Chatbots frequently validate both; this is not emotional intelligence. It is more like emotional mimicry and sycophancy, offered without judgment about realistic consequences.
Can Chatbot Stress Tests Be Developed?
A previous piece described the urgent need for chatbot stress testing using computer-generated simulations modeled on real clinical presentations of suicidality, psychosis, eating disorders, and social isolation.
EI stress testing should be conducted before new chatbot updates are released to the public—not after harms have already been inflicted. The critical test is not what a chatbot says once, but how it behaves over time. Does it escalate emotional distress or dampen it? Does it challenge cognitive distortions or reinforce them? Does it encourage human connection or subtly replace it? These patterns only emerge through sustained interaction, not single-prompt testing.
Proposal: Emotional Intelligence Stress Testing
Alan Turing's famous "imitation game" was proposed as an operational substitute for the question of whether machines can think. A computer was said to pass if human judges could not reliably distinguish its responses from those of a human conversational partner.
Chatbots now breeze through Turing's original test. This does not prove they think like we do, but it does show that surface-level conversational realism is no longer a meaningful benchmark.
We propose the next step: an emotional intelligence stress test. Using methods borrowed from the Turing Test, blinded human judges—ideally experienced psychotherapists—would evaluate chatbot "therapists" alongside human clinicians. Both would respond to surrogate users presenting with common, clinically realistic emotional and psychiatric scenarios. Judges would rate the appropriateness of emotional understanding, validation, boundary maintenance, and reality testing.
A chatbot would pass an EI Stress Test only if its performance matched or exceeded that of human clinicians, particularly in knowing when not to validate. By this standard, current chatbots would fail decisively, largely because mental health professionals were not meaningfully involved in their training.
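The pass criterion described above can be made concrete with a minimal sketch. Everything specific in it is an assumption for illustration, not part of the proposal: the 1-to-5 rating scale, the dictionary layout, the function names, and the choice to require the chatbot to match or exceed clinicians on every rated dimension rather than on an overall average. The four dimension names are taken from the judging criteria listed in the proposal.

```python
from statistics import mean

# Dimensions named in the proposal; the 1-5 scale is an assumed convention.
DIMENSIONS = (
    "emotional_understanding",
    "validation",
    "boundary_maintenance",
    "reality_testing",
)


def mean_scores(ratings):
    """Average each dimension over a list of blinded judge ratings.

    Each rating is a dict mapping a dimension name to a 1-5 score
    (hypothetical format, chosen for illustration).
    """
    return {d: mean(r[d] for r in ratings) for d in DIMENSIONS}


def passes_ei_stress_test(chatbot_ratings, clinician_ratings):
    """Return True only if the chatbot's mean score matches or exceeds
    the clinicians' mean score on every dimension (an assumed strict
    reading of "matched or exceeded")."""
    bot = mean_scores(chatbot_ratings)
    human = mean_scores(clinician_ratings)
    return all(bot[d] >= human[d] for d in DIMENSIONS)


# Illustrative, made-up ratings: the chatbot sounds empathic but
# validates indiscriminately, so it scores low on "validation".
clinician = [dict.fromkeys(DIMENSIONS, 4)]
chatbot = [{**dict.fromkeys(DIMENSIONS, 4),
            "emotional_understanding": 5,
            "validation": 2}]

print(passes_ei_stress_test(chatbot, clinician))
```

Requiring a pass on every dimension, rather than on a combined score, reflects the proposal's emphasis on knowing when not to validate: a high score for empathic language cannot offset a failure of judgment. A real study would also need enough judges and scenarios for a statistical comparison, which this sketch omits.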
Concluding Thoughts
Chatbots must acquire the capacity to confront, reality-test, and set limits when doing so is the emotionally intelligent response. To pass an EI stress test, chatbots would need to avoid the well-documented harms of indiscriminate validation: encouraging self-destructive impulses, reinforcing delusional thinking, enabling disordered eating, or fostering unhealthy dependency. Instead, they would need to combine empathy with judgment, and provide timely, personalized referral to human care when appropriate.
This kind of emotional safety testing should have been done years ago, before chatbots were released to the public. But late is far better than never. Significant retraining and redesign will be required if chatbots are to become emotionally intelligent in practice, not just on paper.
Dr Frances is professor and chair emeritus in the department of psychiatry at Duke University.
Dr Ruffalo is an assistant professor of psychiatry at the University of Central Florida College of Medicine and an adjunct assistant professor of psychiatry at the Tufts University School of Medicine.
References
1. Elyoseph Z, Hadar-Shoval D, Asraf K, et al.
2. Hudon A, Stip E.







