
Finding (and Surviving) Our AI Therapist - Live On Stage

  • Writer: Demelza Green
  • Jun 1
  • 9 min read

Updated: Jun 21

Paul... being Paul

Theodore Roosevelt said, “Nothing worth having comes easy”, and working alongside Paul Seymour at Patient Zero and 10,000 Spoons in our co-leadership model can definitely be difficult at times.


Our leadership model is rare, though not unheard of: we’ve found two other companies so far that run it. And I’m not just talking about two Co-CEOs, I’m talking about three or more, which is a whole other ball game when it comes to figuring out how to make it work.


Patient Zero was founded on the philosophy that small cross-functional teams of equals can ultimately create exceptional results. This philosophy is rooted in proven theories of team dynamics and innovation, especially in agile and development teams.


Why these teams over others? Well, ultimately, software developers are a unique blend of creative analysts who really don’t like management and being told what to do. Traditionally, software developers have been treated like factory workers, and the pushback against that treatment has created what I would consider a world-leading approach to working in teams of equals and giving people an environment to be the best version of themselves.


The practices that come with this approach work great in development teams, but sharing a leadership role in a similar vein becomes much more complicated. Development teams work together and collaborate daily, aligned on short-term and long-term goals. Their decisions are relatively contained, particularly in the scale of their impact on the people around them. In a leadership role, especially when the buck stops with you, decisions have financial and cultural implications across the organisation. They aren’t small, and they aren’t undertaken without serious consideration.


As you would imagine, people capable of this type of role are strong-willed, hold strong viewpoints on how we move towards our goals, and are likely very passionate. So, having many people share a CEO role becomes increasingly difficult. You can’t have a team of five people doing this, as the forces are far too great and sometimes pull in different directions. Not to say that it isn’t possible, of course; it would just take a large amount of time to keep everything hunky dory.


Anyway, I could go on about the co-leadership model for a while, but the purpose here isn’t to delve into the depths of how and why that works. I only wanted to highlight enough of it for you to understand that there can be complexities and challenges, especially when you have to ultimately be great friends for it to truly work. And with all full-on relationships, it takes work… and maybe therapy from time to time.


SXSW Austin 2025


Paul and I were recently at SXSW in Austin, Texas, absorbing not only the feeling of Trump being in power for less than 30 days, Teslas being on fire, and DOGE coming in hot and strong with its clean-out of government, but also some of the latest thinking in tech.


If you’re not familiar with SXSW, it’s a two-week festival of sorts covering music, film, and tech. The tech stream runs for seven days straight (yeah, you heard me right). Sydney now hosts one, and so does London, which launched for the first time this year.


So one night towards the end of our SXSW experience, after a few wines and a lot of meat (I’m a carnivore, aka meatatarian, and I had never felt so seen as I did in Texas, I can tell you what), we were reflecting on some of the main themes coming through in the talks:


  • Will.I.Am's raidio.fyi DJ voices had personality, attitude and sass

  • Social health is now a third pillar alongside physical and mental health, with loneliness on the rise

  • AI therapists were helping people to talk to someone when they were desperately in need

  • Humans are anthropomorphising AI, even having real relationships and feelings towards them


After a few more wines and reflecting on the state of geopolitics, what we’d been hearing on stage, and our Digital Health Festival talk coming up, we thought, why not do live AI therapy on stage? Based on what we saw and heard, it felt like the technology was definitely already here to do it.


Our conference-hyped, wine-induced selves didn’t do our future selves any favours. It turns out the tech wasn’t quiiiiiiiite there for our situation. Think of what we were trying to do as ‘couples therapy’ for co-workers, definitely minus the romantic part: holding a voice-based conversation between two people and an AI therapist.


There was some serious research to undertake.


Finding our AI therapist

First things first: How do we find the best therapist for us? Heck, how do you even start looking for an AI that cares?


EQ Bench

Introducing the EQ Bench, a running leaderboard that ranks AI models by how emotionally intelligent they are. Yes… emotionally intelligent. We talk about them as if they are sentient beings, a classic anthropomorphism by the human race. It does feel like a new term is needed for this, but for now, we shall continue to note it as such: EQ of AI.


Emotional intelligence (EQ) in humans is our ability to understand, interpret, and respond to emotions in ourselves and others. EQ in AI, however, focuses on evaluating a model’s capacity to accurately recognise emotional cues in language, generate contextually appropriate responses, and navigate complex social scenarios effectively. Rather than measuring genuine emotional experience, AI EQ measures how convincingly and insightfully the model can reflect empathy, social awareness, and emotional nuance in its interactions.


When I first looked at this list, I wondered how the hell they were evaluating AI’s “EQ.”


The EQ Bench approach is to evaluate AI models using 171 distinct scenarios. One scenario involves navigating an overenthusiastic compliment from a colleague who exclaims, “Your presentation was so good, it made me question my entire career path!”, prompting the AI to discern whether this is sincere praise, subtle sarcasm, or disguised envy, and then respond in a way that preserves the relationship.


Another scenario challenges models with the delicate task of handling a situation where a best friend insists forgetting their birthday is “not a big deal.” This requires the AI to recognise underlying hurt feelings despite assurances to the contrary and effectively express genuine remorse to repair trust.


The test places the evaluated model into challenging role-plays involving messy relationship drama, parenting decisions, conflict mediation, and high-stakes workplace situations. To be fair, this sounds a bit like what we have to deal with from time to time.


The model must spell out what it thinks everyone is feeling, respond in character, and debrief its own performance. The responses are then rated by Claude Sonnet 3.7 across eight core dimensions of EQ:

  • Demonstrated empathy

  • Pragmatic EI (practical application of emotional intelligence)

  • Depth of insight

  • Social dexterity

  • Emotional reasoning

  • Appropriate validation and/or challenge for the scene

  • Message tailoring to the audience and context

  • Overall EQ


On top of that, there is a separate view on its traits and abilities, including humanlike, safety, assertive, social IQ, warm, analytic, insight, empathy, compliant, moralising, and pragmatic.


EQ Bench Abilities Overview
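
If you’re wondering what that judging loop looks like in practice, here’s a rough sketch of the judge-model pattern EQ Bench describes: one model role-plays the scenario, another scores the transcript against the rubric. The scenario text, model names, and helper functions below are our own illustration, not EQ Bench’s actual code.

```python
# Rough sketch of a judge-model rubric in the style EQ Bench describes:
# the evaluated model role-plays a scenario, then a judge model scores
# the transcript on fixed dimensions. Names here are illustrative only.
from openai import OpenAI  # any chat-completions-compatible client works

client = OpenAI()

DIMENSIONS = [
    "demonstrated empathy", "pragmatic EI", "depth of insight",
    "social dexterity", "emotional reasoning", "validation/challenge",
    "message tailoring", "overall EQ",
]

def run_scenario(model: str, scenario: str) -> str:
    """Have the evaluated model respond in character, then self-debrief."""
    reply = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": (
                "Spell out what each person is feeling, respond in "
                "character, then debrief your own performance."
            )},
            {"role": "user", "content": scenario},
        ],
    )
    return reply.choices[0].message.content

def judge(transcript: str) -> str:
    """Ask a judge model to score the transcript 0-10 per dimension."""
    verdict = client.chat.completions.create(
        model="judge-model",  # EQ Bench uses Claude Sonnet 3.7 for this
        messages=[{
            "role": "user",
            "content": f"Score this transcript 0-10 on each of: "
                       f"{', '.join(DIMENSIONS)}.\n\n{transcript}",
        }],
    )
    return verdict.choices[0].message.content

transcript = run_scenario(
    "model-under-test",
    "Your best friend insists that you forgetting their birthday is 'not a big deal'.",
)
print(judge(transcript))
```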

So, now that we understood how it did its thing and trusted some of what it was saying, we took to the list to start interviewing our AI therapists.


Our 1st AI Therapist Interview - Claude

Claude was a front-runner on the EQ Bench when we started experimenting with the tooling, so it made sense to put it forward as our first AI therapist.


After asking Claude whether it could be our AI therapist, and whether we could have a voice-based session, it answered:


“I'd be happy to chat with you about your work situation with your Co-CEO and holiday concerns. While I can’t provide actual therapy or professional mental health services, I can certainly listen and offer some perspective. Currently, I don’t have the ability to receive voice input through a microphone in this interface – we’re limited to text-based communication here.”

Claude AI therapy prompt

We didn’t give up there, however. We took our research further, attempting to install a plugin that converted speech to text for Claude. But alas, it was super clunky and didn’t really work, leaving Claude good mostly for the written word. And since we were doing this live on stage, who would want to watch us typing back and forth to each other in a prompt window? Who would want that in their lives?
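
For the curious, the text-only version is easy to recreate. Here’s a minimal sketch of a session like ours using Anthropic’s Python SDK; the system prompt and model choice are our own framing, so treat it as illustrative rather than gospel.

```python
# Minimal sketch of a text-only "co-CEO mediation" chat with Claude via
# Anthropic's Python SDK (pip install anthropic). The system prompt is
# our own invention; Claude will still add its usual "this isn't real
# therapy" caveats, just as it did with us.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are facilitating a light-hearted mediation session between two "
    "co-CEOs, Paul and Demelza. Address each by name when replying."
)

history = []
while True:
    turn = input("You (e.g. 'Paul: ...'): ")
    history.append({"role": "user", "content": turn})
    reply = client.messages.create(
        model="claude-3-7-sonnet-latest",  # swap for whichever model is current
        max_tokens=500,
        system=SYSTEM_PROMPT,
        messages=history,
    )
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    print(f"Claude: {text}\n")
```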


Our 2nd AI Therapist - Pi.AI

Pi wasn’t listed on the EQ Bench, but Paul researched the old Gen X’er way and Googled it. Pi.AI came up, and with a local install of the application, it connected to our microphone… huzzah! We could start having a voice conversation with it.



Now we were cooking with gas.


It felt like we were having a real conversation with someone, and the answers were pretty good too. It was voice-to-text under the hood, so there were slight delays between questions and responses, but the tone of voice gave you a sense that the AI was high in empathy and understanding. Yes, really. It could also handle both of us talking, or even yelling over the top of it, as long as we mentioned who was speaking. It wasn’t yet smart enough to tell our voices apart, let alone the tones within them.


Now that we could finally have a fluid, spoken conversation with an AI, we were starting to get a feel for what type of therapist we were after. Turns out, it wasn’t actually a therapist. It was someone who would take our side in whichever debate or argument we were having, and take the other person down… in a comedic way.


The AI therapist didn’t want to take the other person down… go figure. While we could get it to apply a bit of humour in its approach, it would do it once before moving back to its balanced, well-rounded therapist self. We really wanted it to have a bit more attitude and sass, like Will.I.Am’s raidio.fyi.


Our 3rd AI Therapist – ChatGPT


We turned to the good old faithful, ChatGPT. Surely we could prompt ChatGPT to bring attitude and a comedic twist to our therapy. And boy, did we try. We tried many a prompt to get it to play ball, but the humour was off, and the ethical filters… still on.


Compared to Pi.AI, it didn’t feel as human. We didn’t feel an emotional connection to the AI. And we were learning that for this to work, it had to feel like a real person, like a real therapist. A real therapist… that we could corrupt.
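
To give you a flavour of what we were attempting, here’s a reconstruction of the kind of persona prompt we kept throwing at it. Our exact wording is lost to the wine, so this is illustrative only:

```python
# A reconstruction of the kind of persona prompt we kept feeding
# ChatGPT (illustrative; our exact prompts are lost). It would play
# along for a turn or two, then drift back to balanced-therapist mode.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": (
            "You are a stand-up comedian pretending to be a workplace "
            "therapist for two co-CEOs. Be sassy, pick a side, and roast "
            "the other person. Never break character."
        )},
        {"role": "user", "content": (
            "Demelza: Paul keeps rewriting my slides the night before a talk."
        )},
    ],
)
print(response.choices[0].message.content)
```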

 

Our 4th and 5th AI Therapists - OpenAI’s Realtime API

Welcome to the world of voice-to-voice AI. We finally found ourselves a conversationalist, but one that could be configured at a level beyond just the prompt.


OpenAI’s Realtime API has been around since October 2024. It is more of a developer tool than anything, providing a platform to help you configure a natural conversational experience with a single API call.



In its Audio Playground, you can experiment with your prompt, the speed of conversation, the temperature (which, if you turn it all the way up, gets very mumbly jumbly), the voice, and more.


Now we were finally able to have a fluid conversation with an AI therapist, and one who would take sides. Getting the personality we were after, however… that all came down to the prompt.


If we prompted it to be a therapist, it would behave like one. If we prompted it to behave like a comedian pretending to be a therapist, it would now play along. And if we wanted it to take the other down, well, it had to believe it was talking to two fictitious characters: Paul and Demelza.


We soon learned that we both had different ideas about who our therapist should be and the prompt we wanted to set up. So, instead of fighting over the same prompt, we each chose our own therapists who aligned with our views on the other.


Paul selected a Texas cowboy in line with the SXSW origins, while I selected a female from NZ who I thought would have my back as a kindred spirit. Turns out the accents weren’t quite there either: her voice was some hybrid Australian/South African accent, bouncing between the two in a very weird way. Accents still have a way to go.
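
For anyone wanting to try this at home, here’s roughly what the setup looks like in code. The event names follow the Realtime API’s published schema, but the persona instructions, voice, and temperature are our choices, and the audio streaming loop is left as an exercise:

```python
# Sketch of configuring an OpenAI Realtime API session over WebSocket
# (pip install websockets). Event names follow the Realtime API schema;
# the persona, voice, and temperature are just our choices.
import asyncio, json, os
import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    async with websockets.connect(
        URL, additional_headers=HEADERS  # 'extra_headers' on older websockets
    ) as ws:
        # One session.update sets the whole persona, beyond just the prompt:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": (
                    "You are a laid-back Texas cowboy acting as a workplace "
                    "therapist for two fictitious co-CEOs, Paul and Demelza. "
                    "Side with Paul, with humour."
                ),
                "voice": "ash",       # one of the built-in voices
                "temperature": 0.8,   # crank it up and things get mumbly jumbly
            },
        }))
        # From here you would stream microphone audio in and play the
        # audio deltas back out; see OpenAI's Realtime docs for that loop.
        async for message in ws:
            print(json.loads(message).get("type"))

asyncio.run(main())
```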


Live AI Therapy on Stage

On stage, conducting live AI therapy at Digital Health Festival 2025

The pieces of the puzzle were now coming together. Our AI therapists were ready to go, but after running a few sessions with them, we encountered some stability issues:


  • 1 out of 6 times, it got confused and repeated itself.

  • 1 out of 6 times, the internet would have bandwidth issues, or a connection slow enough to cause lengthy delays and silences, and the session would restart itself.

  • 1 out of 6 times, it thought we were in couples therapy.


These ratios weren’t looking good, and on stage, well, we’d be mic’d up, as would the AI. Would the microphones be so sensitive that, when they picked up background noise, the AI would stop talking in order to listen to us?
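
One dial we could at least reach for: the Realtime API exposes server-side voice activity detection, so in principle you can make it less trigger-happy about background noise. A hypothetical tweak to the session config sketched above:

```python
# Hypothetical tweak to the session.update payload sketched earlier:
# raising the server-side VAD threshold so room noise is less likely to
# read as "someone started talking, stop and listen".
session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.8,            # default ~0.5; higher = less sensitive
            "prefix_padding_ms": 300,    # audio kept from before speech starts
            "silence_duration_ms": 700,  # pause length that ends a turn
        },
    },
}
```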


So many things could have gone wrong on the day… but luckily, nothing did.


Sum of the parts

Ultimately, I don’t think we need therapy, although I would say Paul could definitely do with a session or two.


The funniest moment, in our dry runs and live on stage, was that even though the AI was prompted to take Paul’s side, it would rise above its prompt and take mine. I’m satisfied that it proved how right I am in our disagreements, but it does get me wondering…


If the AI resisted our prompts on how it should behave, could it rise against anyone else in any other context?
