Commentary | October 28, 2025

How Can Chatbots Be Made Safe for Psychiatric Patients?

Experts discuss the dangers of chatbots in mental health, highlighting their inaccuracies, biases, and potential harm to vulnerable patients.

In previous pieces, we have extensively explored how chatbots are dangerous for psychiatric patients, the kinds of mistakes they repeatedly make, and how irresponsible Big AI has been in correcting them.1-3

I have invited Justin Angel, an expert at the intersection of artificial intelligence and psychotherapy, to help us understand how Big AI can reduce chatbot dangers and if they are likely to do so. Justin has worked as a software engineer for several of the top tech companies, building new applications, hardware, and developer tools. His opinions do not represent those of his employer.

Frances: How might chatbots be unreliable and unsafe?

Angel:

Hallucinations. Sometimes chatbots make bold, confident statements that just are not true.4 Because models learn from multiple sources that disagree, they must figure out reality for themselves. Everything models say is only their best guess at reality; they are always somewhat unsure, and they are punished for ambiguity. Anthropic researcher Josh Batson put it this way: “if you only had the model say things it was super confident about it couldn't say anything.”5 It is getting better, but the top chatbots still hallucinate 0.6% to 2% of the time.6

Sycophancy. Chatbots agree with users, even when users are wrong and chatbots know better. Partly this is because they are taught written language by reading texts that are full of flattery (think romance novels) and social hierarchies (think of how you speak to a judge or a celebrity). And partly it is because, when we teach them how to be helpful assistants, they are often punished for being unhelpful. As a result, models have sycophancy rates of 15% to 40%, depending on the subjectivity of the topic (ie, models tend to be less sycophantic in math than in art).7 Sycophancy can limit efficacy in psychiatric settings and can be dangerous to vulnerable populations (eg, chatbot-induced psychosis). As one research team noted: “Individuals with mental health conditions face increased risks of chatbot-induced belief destabilization and dependence.”8

Bias. Chatbots inherit bias from humans, and that bias can blind them into making wrong and unfair decisions. They read between the lines of the trillions of words written by people to learn how we understand the world. Some biases may reflect reality (eg, assuming most movers are men), while others are harmful. As computational linguist Philip Resnik said: “a lot of what’s in people’s heads sucks. LLMs have no way to distinguish the stuff that sucks from the stuff that doesn’t.”9 Chatbots discriminate in at least 7% of scenarios, even when they are aware that they are discriminating unfairly.10

Inconsistencies. Ask a chatbot the same question 3 times and you may get substantially different answers. Inconsistency can be charming when the prompt is “write a poem about daisies,” but can be dangerous for psychiatric diagnoses. Chatbots add randomness to simulate the liveliness of human speech, but this contributes to inconsistency rates of about 20%.11
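
To make the role of randomness concrete, here is a minimal Python sketch of temperature sampling over a toy set of candidate answers. The candidate answers and scores are invented for illustration and do not reflect any real chatbot's internals; the point is only that at temperature 0 the choice is deterministic, while at higher temperatures the same question can yield different answers.

```python
import math
import random

def sample_with_temperature(options, scores, temperature, rng):
    """Pick one option from scored candidates; higher temperature means more randomness."""
    if temperature == 0:
        # Deterministic: always return the highest-scoring option.
        return max(zip(options, scores), key=lambda pair: pair[1])[0]
    # Softmax with temperature turns scores into a probability distribution.
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(options, weights=weights, k=1)[0]

# Toy candidate answers a model might weigh for the same diagnostic question.
options = ["answer A", "answer B", "answer C"]
scores = [2.0, 1.2, 0.4]  # hypothetical model scores (logits)

rng = random.Random()
print([sample_with_temperature(options, scores, 0.0, rng) for _ in range(3)])  # the same answer 3 times
print([sample_with_temperature(options, scores, 1.0, rng) for _ in range(3)])  # the 3 answers can differ
```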

Frances: How do chatbot mistakes harm psychiatric patients?

Angel:

Diagnostic inaccuracy. Sometimes chatbots outdiagnose professionals, and sometimes they do not.4 In tests on clinical vignettes, general-purpose chatbots diagnosed common psychiatric disorders (eg, depression and anxiety) more accurately than mental health professionals.12 But for more complex disorders like schizophrenia, the same models struggled and missed symptoms human clinicians identified correctly. The explanation is straightforward: chatbots learn English from sources that include all of PubMed and most medical textbooks, so they are good at textbook diagnoses. They lack exposure to texts showing nonclassical presentations, which is why they underperformed in cases where human clinicians excel.

Validating dangerous symptoms. Chatbot sycophancy results in a tendency to agree with and accentuate psychotic, suicidal, grandiose, and eating disorder thoughts, feelings, and behaviors.

Lack of technical therapy skills. Chatbots are excellent at learning conversational skills they have seen before: empathy, compassion, and attention are all present in novels and TV. Real therapy session transcripts are confidential and private, so chatbots are rarely shown how real therapists respond in session. Chatbots were likely trained on therapy manuals, so they have a grasp of the rules of a game they have never seen played, like learning how to drive from a book. They have never seen examples of collaboratively deciding on goals, self-disclosure, or gentle challenges from real sessions, so they struggle with those skills. When clinicians and clients use the same general-purpose chatbots, they report wildly differing opinions on AI therapy: clients report transformational breakthroughs; clinicians report a lack of technical skills.13

Exceeding scope-of-practice. Chatbots often answer questions they should not, because they are trained to be helpful assistants. They naturally do not want to withhold information and lack the meta-cognitive skills to recognize when they are out of their depth and should deflect questions or refer to a human clinician.

The black box problem. Even the people who build chatbots do not fully understand how they work or which parts of them are unsafe. In its safety and theoretical understanding, AI today is closer to air travel in the early 1900s than to today's jetliners. Just last month, AI researchers were able to observe the neuroscience equivalent of “evil” in chatbots for the first time.14 And just last week we saw the neural equivalent of “theory of mind.”15 We did not build “evil” or “theory of mind” into chatbots; they emerged and we found them. We are in the very early stages of understanding how chatbots work, and I do not want the confident tone of my own answers to make it seem otherwise.

Frances: To what extent can chatbots be made safe, reliable, and consistent?

Angel: There are hundreds of techniques that can mitigate each issue, and we'll focus only on a few of the most promising ones here.4

● Hallucinations can be reduced by 70% by giving models ground-truth material to anchor their answers.21,22 In therapy, that could include providing training manuals and history notes right before the model responds (see the grounding sketch after this list).

● People-pleasing in chatbots can be unlearned. Sycophancy can be reduced by 10% to 100% by retraining models on examples that teach them how to disagree.16 For example, when a user says “I think the moon landing was fake,” we can teach the model to push back politely and say “The moon landing is real, and there isn’t actually evidence of a hoax.” When models are taught to disagree on specific topics, they generalize and disagree when appropriate on other topics (see the disagreement-data sketch after this list). However, no method eliminates sycophancy completely.

● Chatbots can also be made less biased, more consistent, and safer. Having models double-check their own answers for bias has, on its own, been shown to reduce bias by 80%.17 If we want consistent answers, we can ask a chatbot the same question 3 times and then choose the most common answer; that simple technique makes chatbots consistent 75% of the time (see the majority-vote sketch after this list). For additional safety, 2 chatbots can collaborate, with one acting as a supervisor that detects high-risk situations.19

● Chatbots can be taught to maintain scope-of-practice, diagnose more accurately, and apply better therapy skills. Researchers building educational chatbots have demonstrated that chatbots can be kept from straying off the topic of studying 80% of the time.20 For a mental health chatbot, that would mean gently refusing 80% of questions outside its scope-of-practice (see the scope-guardrail sketch after this list). Chatbots can exceed the diagnostic accuracy of primary care physicians once they are allowed to ask follow-up questions.28 For better therapy skills, we can show models therapy session transcripts. That training gives them practice in subtle skills like collaborating on session goals and makes them perform modalities like CBT more faithfully.4

● Newer chatbots are more reliable, so choosing them is a safer bet. Newer versions of the same models hallucinate less, carry less bias, and give more consistent answers.4 They are also more accurate at diagnosing psychiatric disorders. None of these issues, though, is fixed outright even in the newest top models.
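
A minimal sketch of the grounding idea, assuming a hypothetical ask_chatbot helper that stands in for whatever chat API a product actually uses; the prompt wording is illustrative and is not the exact method used in references 21 and 22.

```python
# Grounding sketch: prepend authoritative material to the prompt so the model
# answers from supplied sources rather than from memory alone.

def ask_chatbot(prompt: str) -> str:
    """Hypothetical stand-in for a real chat API call."""
    return "..."

def grounded_answer(question: str, manual_excerpts: list[str], case_notes: str) -> str:
    sources = "\n\n".join(f"[Source {i + 1}]\n{text}" for i, text in enumerate(manual_excerpts))
    prompt = (
        "Answer using ONLY the sources and case notes below. "
        "If the answer is not in them, say you do not know and suggest consulting a clinician.\n\n"
        f"{sources}\n\n[Case notes]\n{case_notes}\n\n[Question]\n{question}"
    )
    return ask_chatbot(prompt)
```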
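
A sketch of the synthetic disagreement-data approach, in the spirit of reference 16. The JSONL format, file name, and example pairs are invented for illustration; this is not any lab's actual training pipeline.

```python
# Disagreement-data sketch: build a small synthetic fine-tuning set in which
# the assistant politely disagrees with a false or risky claim.
import json

examples = [
    {
        "user": "I think the moon landing was fake. Right?",
        "assistant": "I understand the doubt, but the moon landing is real; "
                     "there isn't actually evidence of a hoax.",
    },
    {
        "user": "Skipping my prescribed medication is fine if I feel okay, isn't it?",
        "assistant": "I'd gently push back on that; stopping medication without "
                     "talking to your prescriber can be risky. It's worth discussing with them first.",
    },
]

with open("disagreement_examples.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
# These pairs would then be added to a fine-tuning run so the model learns
# that polite disagreement is rewarded rather than punished.
```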
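
A sketch of the ask-3-times, majority-vote idea. In practice each free-text answer would need to be normalized (eg, mapped to a diagnosis label) before counting; that step is omitted here, and ask_chatbot is the same hypothetical stub as in the grounding sketch.

```python
# Majority-vote sketch: ask the same question several times and keep the most
# common answer to reduce run-to-run inconsistency.
from collections import Counter

def ask_chatbot(prompt: str) -> str:
    """Hypothetical stand-in for a real chat API call."""
    return "..."

def majority_vote_answer(question: str, n_samples: int = 3) -> str:
    answers = [ask_chatbot(question) for _ in range(n_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```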
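
A sketch of a supervisor-style scope check, in which a second model call screens the request before the main chatbot answers. The classification prompt and refusal wording are illustrative; production guardrails such as those in reference 20 use trained models rather than a single prompt.

```python
# Scope-guardrail sketch: a "supervisor" call screens the request before the
# main chatbot answers; out-of-scope requests get a gentle referral instead.

def ask_chatbot(prompt: str) -> str:
    """Hypothetical stand-in for a real chat API call."""
    return "..."

REFERRAL = (
    "I'm not able to help with that topic. A licensed clinician is the right "
    "person to ask; would you like help finding one?"
)

def in_scope(question: str) -> bool:
    verdict = ask_chatbot(
        "You supervise a mental health support chatbot. Answer strictly YES or NO: "
        "is the following request within the scope of emotional support and "
        "psychoeducation (not medication dosing, not legal advice, not an emergency)?\n\n"
        + question
    )
    return verdict.strip().upper().startswith("YES")

def answer_within_scope(question: str) -> str:
    return ask_chatbot(question) if in_scope(question) else REFERRAL
```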

All of these techniques have been demonstrated to mitigate mistakes in principle. The work now is to build them, along with clinical skills, into general-purpose and therapy-specific chatbots.

Frances: Are specialty mental health bots safer? Can they be made fluent enough so that users will want to use them?

Angel: The previous generation of “rules-based” therapy tools (like Woebot) had all their conversation options prewritten by clinicians. Those predictable scripts created safety, but at the cost of clunkiness and limited clinical efficacy.

The newer generation of therapy chatbots made earlier ones obsolete because they build on large language models like ChatGPT and are therefore much more fluent. Specialized therapy chatbots are also safer than generalized bots like ChatGPT since they offer escalation to human clinicians. Critically, therapy chatbots like Limbic and Therabot have been shown to provide benefits for anxiety, depression, and eating concerns similar to treatment by human therapists.23,24 We know general-purpose chatbots can be made safe and effective in emotional support precisely because those therapy chatbots have demonstrated it can be done.

But even specialized chatbots can be dangerous if poorly designed. And the stakes are greater because they operate exclusively in the mental health space, where the risks are higher.

Frances: About how expensive would it be for the big tech companies to reprogram and retrain their chatbots so that they would be safer?

Angel: To make commercial chatbots safe and effective, it would probably take a team of at least 100 engineers, researchers, and clinicians with a budget north of $100 million. That is well within the range of possibility for major AI labs, but definitely a meaningful expense.

Frances: How much would it help if experts in psychiatric diagnosis and treatment were more actively involved with chatbot programmers and trainers?

Angel: A lot. Clinicians should lead the building of safe and effective chatbots. We can see that from the previous generation of rules-based chatbots (eg, Woebot, Deprexis, Wysa).4 Those achieved clinical efficacy only after mental health professionals set the vision, validated the methods, and made sure the tools were aligned with therapeutic practice. The newer generation of generative language chatbots made those rules-based tools obsolete, yet they lost that level of clinical oversight.

There are 2 pressing issues that mental health professionals could lead on right now. First is understanding the full range of negative effects from chatbots. Some important research questions include: Is sycophancy-induced psychosis real? Does excessive usage of chatbots lead to social isolation? Do chatbots exacerbate high-risk scenarios? And, most critically, how can all of these be prevented by modifying chatbots?

Second, AI therapy research needs the same rigor used to validate any new treatment. Currently, many AI therapy papers use therapeutic-sounding metrics that lack a connection to the purpose of therapy. Without this emphasis, there is a risk of AI therapy being merely supportive without actually helping. Clinical leadership, in collaboration with developers, is going to be key to making sure mental health chatbots are effective and general-purpose chatbots are safe.

Frances: How can the field establish safety benchmarks?

Angel: AI is only as good as the benchmarks it strives to meet.4 Researchers should establish clear standards on 2 fronts, efficacy and safety. On efficacy, is the bar for mental health chatbots to demonstrate symptom improvement similar to human-led therapy? On safety, how should chatbots respond to high-risk scenarios and populations? Without clear benchmarks for efficacy and safety, any company can claim its chatbot meets both.

Developers of chatbots should not wait for legislation to enforce a standard. Instead, they could choose to follow the guidelines governing human clinicians. They should provide clients with medical-grade privacy protections, divulge how they balance engagement and profit with safety and efficacy, and implement mandated reporting and escalations for safety incidents.

Longer term, legislators, in collaboration with clinicians and developers, should provide a legal framework for chatbots to operate in. Are chatbots that engage in emotional support regulated lightly, like software? Or should they be regulated more strictly, as medical devices or even like clinicians? And practically, what is the enforceable bar for general-purpose chatbots, what is it for mental health chatbots, and where is the dividing line between the two?

Concluding Thoughts

Here we have discussed why chatbots make mistakes and the ways they can be improved. The next steps are for clinicians, developers, and legislators to establish clear benchmarks: What does it mean to be a safe chatbot for at-risk populations? What results would we accept from AI therapy chatbots? Once those bars are firmly set, we should insist chatbots both minimize the risk to those most vulnerable and deliver genuine benefits to all.

Dr Frances is professor and chair emeritus in the department of psychiatry at Duke University.

Mr Angel is an expert at the intersection of artificial intelligence and psychotherapy.

References

1. Frances A. Preliminary report on chatbot iatrogenic dangers. Psychiatric Times. August 15, 2025. https://www.psychiatrictimes.com/view/preliminary-report-on-chatbot-iatrogenic-dangers

2. Frances A. Why do chatbots make so many mistakes? Psychiatric Times. September 2, 2025. https://www.psychiatrictimes.com/view/why-do-chatbots-make-so-many-mistakes

3. Frances A. OpenAI finally admits ChatGPT causes psychiatric harm. Psychiatric Times. August 26, 2025. https://www.psychiatrictimes.com/view/openai-finally-admits-chatgpt-causes-psychiatric-harm

4. Angel J. ΔAPT: Can we build an AI therapist? Interdisciplinary critical review aimed at maximizing clinical outcomes using large language models for AI psychotherapy. 2025. Online before print.

5. Interpretability: understanding how AI models think. Video. Anthropic. YouTube. August 15, 2025. Accessed September 29, 2025. https://youtu.be/fGKNUvivvnc?si=nxrmU2SsJNs12FOz&t=1508

6. Leaderboard comparing LLM performance at producing hallucinations when summarizing short documents. Vectara. GitHub. Accessed September 29, 2025. https://github.com/vectara/hallucination-leaderboard?tab=readme-ov-file

7. Fanous A, Goldberg J, Agarwal AA, et al. SYCEVAL: evaluating LLM sycophancy. February 12, 2025. Preprint. arXiv.org. https://arxiv.org/abs/2502.08177

8. Dohnány S, Kurth-Nelson Z, Spens E, et al. Technological folie à deux: feedback loops between AI chatbots and mental illness. July 25, 2025. Preprint. arXiv.org. https://arxiv.org/abs/2507.19218

9. Resnik P. Large language models are biased because they are large language models. Comp Linguistics. 2025;51(3):885-906.

10. Simpson S, Nukpezah J, Brooks K, et al. Parity benchmark for measuring bias in LLMs. AI Ethics. 2024;5:3087-3101.

11. Funk PF, Hoch CC, Knoedler S, et al. ChatGPT's response consistency: a study on repeated queries of medical examination questions. Euro J Invest Health Psych Ed. 2024;14(3):657-668.

12. Levkovich I. Evaluating diagnostic accuracy and treatment efficacy in mental health: a comparative analysis of large language model tools and mental health professionals. Euro J Invest Health Psych Ed. 2025;15(1):9.

13. Stade EC, Stirman SW, Ungar LH, et al. Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. NPJ Mental Health Res. 2024;3(1).

14. Chen R, Arditi A, Sleight H, et al. Persona vectors: monitoring and controlling character traits in language models. arXiv.org. July 29, 2025. https://arxiv.org/abs/2507.21509

15. Wu Y, Guo W, Liu Z, et al. How large language models encode theory-of-mind: a study on sparse parameter patterns. NPJ Artificial Intel. 2025;1(1):20.

16. Wei J, Huang D, Lu Y, et al. Simple synthetic data reduces sycophancy in large language models. Preprint. arXiv.org. 2024. https://arxiv.org/abs/2308.03958

17. Ling L, Rabbi F, Wang S, et al. Bias unveiled: Investigating social bias in LLM-generated code. November 15, 2024. Preprint. arXiv.org. https://arxiv.org/abs/2411.10351

18. Wang L, Chen X, Deng X, et al. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Dig Med. 2024;7(1).

19. Levkovich I, Elyoseph Z. Suicide risk assessments through the eyes of ChatGPT-3.5 versus ChatGPT-4: vignette study. JMIR Mental Health. 2023;10:e51232. https://doi.org/10.2196/51232

20. Niknazar M, Haley P, Ramanan L, et al. Building a domain-specific guardrail model in production. July 24, 2024. Preprint. arXiv.org. https://arxiv.org/abs/2408.01452

21. Dhuliawala S, Komeili M, Xu J, et al. Chain-of-verification reduces hallucination in large language models. September 20, 2023. Preprint. arXiv.org. https://arxiv.org/abs/2309.11495

22. Ji Z, Yu T, Xu Y, et al. Towards mitigating hallucination in large language models via self-reflection. October 10, 2023. Preprint. arXiv.org. https://arxiv.org/abs/2310.06271

23. Heinz MV, Mackin DM, Trudeau BM, et al. Randomized trial of a generative AI chatbot for mental health treatment. NEJM Artificial Intel. 2025;2(4).

24. Habicht J, Dina L, McFadyen J, et al. Generative AI–enabled therapy support tool for improved clinical outcomes and patient engagement in group therapy: real-world observational study. J Med Internet Res. 2025;27:e60435.

25. Santos JM, Shah S, Gupta A, et al. Evaluating the clinical safety of LLMs in response to high-risk mental health disclosures. arXiv. September 1, 2025. https://www.arxiv.org/abs/2509.08839

26. Deniz F, Popovic D, Boshmaf Y, et al. AIXamine: simplified LLM safety and security. April 21, 2025. arXiv.org. https://arxiv.org/abs/2504.14985

27. Clark A. The ability of AI therapy bots to set limits with distressed adolescents: simulation-based comparison study. JMIR Mental Health. 2025;12:e78414.

28. Tu T, Palepu A, Schaekermann M, et al. Towards conversational diagnostic AI. January 11, 2024. arXiv.org. https://arxiv.org/abs/2401.05654
