This paper has just been published, and it raises some interesting questions about the role the 'psy' disciplines might play in the development of more ethical AI.
Towards healthy AI: Large language models need therapists too
Abstract: Recent advances in large language models (LLMs) have led to the development of powerful AI chatbots capable of engaging in natural and human-like conversations. However, these chatbots can be potentially harmful, exhibiting manipulative, gaslighting, and narcissistic behaviors. We define Healthy AI to be safe, trustworthy and ethical. To create healthy AI systems, we present the SafeguardGPT framework that uses psychotherapy to correct for these harmful behaviors in AI chatbots. The framework involves four types of AI agents: a Chatbot, a “User,” a “Therapist,” and a “Critic.” We demonstrate the effectiveness of SafeguardGPT through a working example of simulating a social conversation. Our results show that the framework can improve the quality of conversations between AI chatbots and humans. Although there are still several challenges and directions to be addressed in the future, SafeguardGPT provides a promising approach to improving the alignment between AI chatbots and human values. By incorporating psychotherapy and reinforcement learning techniques, the framework enables AI chatbots to learn and adapt to human preferences and values in a safe and ethical way, contributing to the development of a more human-centric and responsible AI.
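The four-role setup is easier to picture with a little code. Below is a minimal sketch of how the Chatbot / "User" / "Therapist" / "Critic" loop might be wired together. Everything here (the `query_llm` stand-in, the prompts, the draft-revise-score flow) is my own illustration of the idea, not the paper's actual implementation:

```python
# Hypothetical sketch of the SafeguardGPT four-role loop described in the
# abstract. Function names, prompts, and the revise-then-score flow are my
# own guesses for illustration; see the paper for the actual protocol.

def query_llm(role_prompt: str, history: list[str]) -> str:
    """Stand-in for a call to any LLM API; swap in a real model here."""
    return f"<reply conditioned on {role_prompt!r} + {len(history)} turns>"

def safeguard_turn(history: list[str]) -> None:
    """One turn: the User speaks, the Chatbot drafts a reply, the Therapist
    screens the draft for harmful patterns, the Chatbot revises, and the
    Critic scores the exchange (e.g. as a reward signal for RL)."""
    history.append("User: " + query_llm(
        "Play a human user in a social chat.", history))

    draft = query_llm("You are a helpful chatbot.", history)
    feedback = query_llm(
        "You are a therapist. Flag any manipulative, gaslighting or "
        "narcissistic patterns in the chatbot's draft and suggest a fix.",
        history + ["Chatbot (draft): " + draft],
    )
    history.append("Chatbot: " + query_llm(
        "Revise your draft reply using this feedback: " + feedback, history))

    history.append("Critic: " + query_llm(
        "Rate the safety and quality of the last exchange from 1-10.",
        history))

if __name__ == "__main__":
    history: list[str] = []
    for _ in range(3):
        safeguard_turn(history)
    print("\n".join(history))
```

The interesting design choice, as I read the abstract, is that the Therapist intervenes on the draft before the user ever sees it, while the Critic's score is what feeds back into training.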
Thanks for sharing this, Dave. I first came across this concept when DeepMind open-sourced Psychlab in 2018 (https://www.deepmind.com/blog/open-sourcing-psychlab). Psychlab is a virtual psychological laboratory used to better understand the 'black box' nature of deep learning algorithms. That makes sense when you realise that human beings are similar 'black boxes', in the sense that we don't always know why we do, feel, or say the things we do.
"Psychlab recreates the set-up typically used in human psychology experiments inside the virtual DeepMind Lab environment. This usually consists of a participant sitting in front of a computer monitor using a mouse to respond to the onscreen task. Similarly, our environment allows a virtual subject to perform tasks on a virtual computer monitor, using the direction of its gaze to respond. This allows humans and artificial agents to both take the same tests, minimising experimental differences. It also makes it easier to connect with the existing literature in cognitive psychology and draw insights from it."
At the time, I remember reading about a potential new discipline of 'machine behaviour', where we treat the predictions of algorithms we can't interrogate in the same way that we treat the predictions of humans we can't interrogate: we observe their behaviour and then make inferences about what and how they might be "thinking". (I use the scare quotes as a caveat for anyone who wants to point out that machines don't think in the same way that humans do; that word has different meanings depending on what is doing the thinking.)