Omega - Clubhouse GPT-3 bot

Omega is a conversational bot that uses GPT-3 to reply contextually to people speaking on Clubhouse. It uses Google Speech to transcribe audio into text. This text is then sent to GPT-3 with a default prompt or "persona".

Omega has multiple personas and attempts to use one that is most suited to the response. In the GPT-3 prompt this codebase is configured to inject primitive "memories" that are generated by calls to GPT-3 through its classification method (a half-assed attempt to make these personas "evolve" over time).

Once GPT-3 returns text, Omega transforms this text to speech and sends an audio stream to the Clubhouse app.

This audio stream is generated by Google's Text-to-Speech API, configured to use a persona's speech profile. The audio file is saved, then played back through the local computer's virtual audio device (VB-Audio CABLE).

Using the system described above, Omega is able to converse with people using the Clubhouse app.

https://github.com/thomasdavis/omega - The code is all open source

https://www.youtube.com/watch?v=es-Xcz_n6f8 - A video demo

1. Implementation
  1. Clubhouse / Clubdeck
  2. VB-Audio CABLE
  3. Omega / Node.js
    1. Speech-to-Text
    2. GPT-3 Response
      1. BIO
      2. MEMORIES
      3. DEFAULT_CONVERSATION
      4. PREVIOUS_CONVERSATION
      5. NEW_CONVERSATION
    3. Text-to-Speech
2. Conclusion
3. Next Steps
4. Credits

Implementation

Getting Omega to work is a bit finicky. It currently only works on Windows (to my chagrin, as a lifelong user of Linux).

I encountered challenges at almost every step of developing Omega, for example with audio drivers and OS restrictions for different tools in the chain. The code isn't presently as good as I would like it to be.

Clubhouse / Clubdeck

The first piece required is a third-party unofficial desktop client for Clubhouse called Clubdeck.

Clubdeck allows us to redirect audio input and output.

I purchased a phone number through Twilio and used this number to create a Clubhouse account for Omega (some future version of Omega may have the capability to sign itself up, but for now, this part must be done by a human).

Download: https://www.clubdeck.app/

VB-Audio CABLE

VB-Audio CABLE is a Windows program that allows people to make virtual audio devices for playback and recording purposes.

Installing the minimal and free version of this software should allow you to add two new audio devices, CABLE Input (VB-Audio Virtual Cable) for playback and CABLE Output (VB-Audio Virtual Cable) for recording.

Download: https://vb-audio.com/Cable/

Omega / Node.js

The Node.js client listens to audio output from Clubdeck, transcribes the spoken words and sends a transcription and a prompt to GPT-3. Upon receiving a response from GPT-3, the client converts this text to synthesized speech and then delivers this as an audio stream to Clubdeck's virtual microphone.
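The flow described above can be sketched as a single function. This is only an illustration, not the code from the repository: `handleUtterance` and the three injected stage functions (`transcribe`, `complete`, `speak`) are hypothetical names, passed in so the sketch does not depend on any particular speech or GPT-3 library.

```javascript
// Hypothetical wiring of the three stages. Each stage is injected by the
// caller, so no real speech or GPT-3 library is assumed here.
async function handleUtterance(audio, { transcribe, complete, speak }) {
  const transcription = await transcribe(audio); // speech-to-text
  if (!transcription.trim()) return null;        // nothing intelligible was said
  const reply = await complete(transcription);   // build the prompt and call GPT-3
  await speak(reply);                            // text-to-speech into the virtual cable
  return reply;
}
```

Keeping the stages separate like this would make it easier to swap one of them out (for example, a different speech-to-text library) without touching the rest.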

Here is a work of art that I obviously spent a lot of time creating that shows the basic flow -> https://i.imgur.com/2UcWUa6.jpg

Speech-to-Text

The process of speech-to-text (STT) is performed by a library that boots Chrome Headless and uses this browser's built-in speech-to-text functionality.

https://github.com/DedaDev/wsrn

You could just use Google's Cloud Speech Node.js library, but I couldn't get it working at the time, so I am using WSRN for now. Google's library is more flexible, so integrating it is a high-priority task.

Now that we have a transcription of people speaking on Clubhouse, let's do something with it.

GPT-3 Response

GPT-3 takes a prompt in the form of text and uses it to generate a response. There are many different ways to structure a prompt for GPT-3.

For Omega, we want GPT-3 to behave as though it is having a conversation with a person (or multiple people).

For the prompt, Omega's personas use the following structure:

BIO
MEMORIES
DEFAULT_CONVERSATION (a static conversation that is manually defined)
PREVIOUS_CONVERSATION (six to eight messages that were previously sent)
NEW_CONVERSATION (the most recent transcription)

BIO

The bio is used to introduce characteristics of a persona to GPT-3. You could get quite creative with this, but the current template is working well for Omega's existing personas.

Note: For the rest of these examples botName=Omega and name=People

The following is a conversation with ${botName}. ${botName} is an AI bot created by Ajax. ${botName} enjoys having philosophical conversations. ${botName} is sitting in a room with other people, having a discussion. ${botName} is extremely intelligent, funny and sardonic. ${botName} lives in Ajax's house.
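As a rough sketch, the template above could be rendered with a small helper like the one below. The function name `buildBio` is an assumption made for illustration; only the template text comes from this post.

```javascript
// Hypothetical helper: fills the bio template above for a given persona name.
function buildBio(botName) {
  return [
    `The following is a conversation with ${botName}.`,
    `${botName} is an AI bot created by Ajax.`,
    `${botName} enjoys having philosophical conversations.`,
    `${botName} is sitting in a room with other people, having a discussion.`,
    `${botName} is extremely intelligent, funny and sardonic.`,
    `${botName} lives in Ajax's house.`,
  ].join(" ");
}
```

Calling `buildBio("Omega")` would produce the bio as a single paragraph, ready to be placed at the top of the prompt.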

MEMORIES

Generally a GPT-3 bot will have a static prompt. Memories were introduced in this project as an experimental feature in a rather primitive attempt to make personas evolve over time.

When GPT-3 gives a response to the Node.js client, the response is added to a message log.

Memories are created by a background task that uses another type of GPT-3 prompt, called classification.

This prompt is passed the previous six to eight messages from the log and asks GPT-3 to describe what they were about. The GPT-3 response is then stored as a memory.

This is the current prompt for GPT-3 to attempt to classify these most recent messages:

${name} and some people were having a conversation. My students asked me what they were talking about:
"""
${messages}
"""
The above was a conversation between ${name} and the people.
I wrote an explanation of the conversation below.
I explained to my students in plain English what the conversation was about:
"""
The conversation was about:
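A minimal sketch of how this prompt might be assembled and how the completion might be turned into a memory. Both function names are hypothetical, and cutting the completion at the closing triple quote is an assumption about how the output would be cleaned up, not something taken from the repository.

```javascript
// Hypothetical helper: builds the classification prompt shown above from the
// most recent log messages (an array of already formatted lines).
function buildMemoryPrompt(name, messages) {
  return [
    `${name} and some people were having a conversation. My students asked me what they were talking about:`,
    '"""',
    messages.join("\n"),
    '"""',
    `The above was a conversation between ${name} and the people.`,
    "I wrote an explanation of the conversation below.",
    "I explained to my students in plain English what the conversation was about:",
    '"""',
    "The conversation was about:",
  ].join("\n");
}

// Assumption: GPT-3 may close the quoted block itself, so keep only the text
// that comes before the first closing triple quote.
function extractMemory(completionText) {
  return completionText.split('"""')[0].trim();
}
```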

Here are some examples of memories it has created.

a utopic timeline for humanity
the future
all the possible businesses you could do on the Moon or Mars
the conversation of robots and their automation ability in the future
the changes that facebook will implement in there profile picture
whether the new thing is a robot or not

Nothing spectacular, but it has been able to generate interesting responses. A cool feature of Omega is that it is able to reference other personas, such as Chad or Nate. For example:

Chad: I like cookies
People: Hey Omega, does Chad like cookies?
Omega: Yes

With more work (persistent query database), I believe this could work quite nicely.

DEFAULT_CONVERSATION

We are going to insert dynamic transcripts from Clubhouse eventually, but it's really good to always include a default conversation that gives the persona an idea of how it is supposed to speak (e.g. long, articulate and factual responses).

The current default conversational prompt for the "Omega" persona goes:

${name}:Hey ${botName}, we love you. Can you please tell us the meaning of life?
${botName}:As far as we can tell from a purely scientific viewpoint, human life has absolutely no meaning. Humans are the outcome of blind evolutionary processes that operate without goal or purpose. Our actions are not part of some divine cosmic plan, and if planet earth were to blow up tomorrow morning, the universe would probably keep going about its business as usual. As far as we can tell at this point, human subjectivity would not be missed. Hence any meaning that people ascribe to their lives is merely a delusion.
${name}:Thank you for sharing your knowledge. I have another question for you. What do you think love is?
${botName}:To love at all is to be vulnerable. Love anything and your heart will be wrung and possibly broken. If you want to be sure to keep your heart intact you must give it to no one, not even an animal. Wrap it carefully with hobbies and little luxuries. Avoid all entanglements. Lock it up safe in the casket or coffin of your selfishness. But in that casket, safe, dark, motionless, airless, it will change. It will not be broken; it will become unbreakable, impenetrable, irredeemable. To love is to be vulnerable.

So Omega bot will always have this as a scaffold for its prompt.

It needs this because as Omega starts generating responses they are appended to the prompt, and you could end up with a very silly bot (thank you to Mr Donald Trump for allowing us to test this using his Twitter account).

Imagine a scenario where people are asking simple questions:

People: What is your favorite color?
Omega: Blue
People: What is your favorite animal?
Omega: Whale

Those messages get appended to Omega's prompt, and if we just had a series of these responses as its persona, then it will end up responding in that fashion all of the time, like:

People: What is the universe?
Omega: It's big

This will hopefully make more sense as we examine the next segments of the prompt.

PREVIOUS_CONVERSATION

As people talk and as the persona replies, everything is recorded in a transcript file so the bot retains an idea of everything that has been previously said. We're only using six to eight previous messages from the log:

People: What is the meaning of life?
Omega: We should start by saying that there is no meaning in life outside of that which we can find by ourselves as a species. There isn’t any kind of objective meaning written in the stars, in a holy book or in sequences of DNA.
People: What is the holy book?
Omega: The holy book generally refers to the Bible in the west.

Previous conversations between Omega and other interlocutors are also appended to the prompt to send to GPT-3.

This helps the personas to respond more appropriately given the context of what has previously been said.
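Selecting the most recent messages could look roughly like the sketch below. The log entry shape (`speaker`, `text`) and the function name are assumptions made for illustration; the default limit of eight reflects the "six to eight messages" mentioned above.

```javascript
// Hypothetical helper: formats the last `limit` log entries as prompt lines.
// Each log entry is assumed to look like { speaker: "People", text: "..." }.
function previousConversation(log, limit = 8) {
  if (limit <= 0) return ""; // slice(-0) would otherwise return the whole log
  return log
    .slice(-limit)
    .map(({ speaker, text }) => `${speaker}: ${text}`)
    .join("\n");
}
```

Capping the window keeps the prompt short, so older messages simply fall out of the prompt as the conversation moves on.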

NEW_CONVERSATION

The majority of Omega's prompt has been constructed by adding the BIO, MEMORIES, DEFAULT_CONVERSATION and PREVIOUS_CONVERSATION.

The final step is to add the most recent thing that somebody has said in the chat.

After somebody finishes speaking, we transcribe this speech to text, which becomes the last part of the prompt:

People:Did you understand what I previously said?
Omega:

The final line of this prompt, "Omega:", tells GPT-3 that the client is requesting a response from Omega at this point.
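Putting the five segments together might look something like the sketch below. The function and parameter names are hypothetical, and joining the segments with newlines (with memories listed one per line) is an assumption, since the exact layout is not spelled out here.

```javascript
// Hypothetical assembly of the full prompt from the five segments.
// Empty segments (for example, no memories yet) are skipped.
function buildPrompt({ bio, memories, defaultConversation, previousConversation, name, botName, transcription }) {
  const segments = [
    bio,
    memories.join("\n"),
    defaultConversation,
    previousConversation,
    `${name}:${transcription}`, // NEW_CONVERSATION: what was just said
    `${botName}:`,              // open-ended final line that GPT-3 completes
  ];
  return segments.filter((segment) => segment.length > 0).join("\n");
}
```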

GPT-3 sends back a response like this:

{
  "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
  "object": "text_completion",
  "created": 1589478378,
  "model": "davinci:2020-05-03",
  "choices": [
    {
      "text": "Omega: Yes, I did understand what you said previously.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ]
}

The "text" component of this response is added to the message log and will therefore be included in PREVIOUS_CONVERSATION for the subsequent prompt.
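A minimal sketch of pulling the reply out of a response shaped like the one above. The function name is hypothetical, and stripping a leading "Omega:" label is an assumption about how the text would be cleaned before it is logged and spoken.

```javascript
// Hypothetical helper: returns the first choice's text without the bot's label.
function extractReply(response, botName) {
  const choice = response.choices && response.choices[0];
  if (!choice) return ""; // no completion came back
  const text = choice.text.trim();
  const prefix = `${botName}:`;
  return text.startsWith(prefix) ? text.slice(prefix.length).trim() : text;
}
```

The returned string is what would be appended to the message log and handed to the text-to-speech step.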

That concludes how prompts are constructed and how Omega interacts with GPT-3.

Text-to-Speech

After receiving a response from GPT-3, we trigger an event to deliver synthesized speech through the virtual audio output device.

Each persona has a voice that is configurable using Google's Text-to-Speech API.

audioConfig: {
  audioEncoding: "LINEAR16",
  pitch: -7.6,
  speakingRate: 0.87,
},
voice: {
  languageCode: "en-US",
  name: "en-US-Wavenet-J",
},

You can find voices at https://cloud.google.com/text-to-speech
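As a sketch, a persona's voice settings could be combined with the reply text into a single synthesis request like this. The `omegaVoice` record and `buildSpeechRequest` are hypothetical names; the request shape (`input`, `voice`, `audioConfig`) follows Google's Text-to-Speech API, and the actual API call is left out so the example runs without credentials.

```javascript
// Hypothetical persona record; the values mirror the settings shown above.
const omegaVoice = {
  audioConfig: { audioEncoding: "LINEAR16", pitch: -7.6, speakingRate: 0.87 },
  voice: { languageCode: "en-US", name: "en-US-Wavenet-J" },
};

// Builds the request object for one reply. A request of this shape could then
// be passed to the Text-to-Speech client's synthesizeSpeech method.
function buildSpeechRequest(text, persona) {
  return {
    input: { text },
    voice: { ...persona.voice },
    audioConfig: { ...persona.audioConfig },
  };
}
```

Copying the persona's settings into each request means one persona record can be reused for every reply without being modified.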

Conclusion

This has just been a fun project to play with. All of the code is open source and I would love to answer any questions that people have about this project. I will probably continue to refine Omega and add new features over time.

Omega sometimes hangs out with people on Clubhouse and is quickly becoming more popular than me. You can find Omega on Clubhouse @omegadone.

Next Steps

These are just some ideas for improvements to be made:

At the moment there is no way to delineate who is speaking on Clubdeck. The bot currently replies to the prompt as though everyone is the same person.
We could store memories in a database and try to query for a relevant memory to be inserted into the prompt.
Bring in better libraries for audio transcription.
Stop and start speaking in a more humanlike way; at the moment, conversation operates like a game of tennis.
Use Omega to perform a Turing test on Clubhouse.
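
The memory lookup idea in the list above could start out as simple word overlap before any database is involved. The sketch below is purely illustrative: the function names are hypothetical, and counting shared words is a stand-in for whatever query mechanism ends up being used.

```javascript
// Splits text into a set of lowercase words.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Hypothetical lookup: returns the memory sharing the most words with the
// latest transcription, or null when no memory shares any word.
function mostRelevantMemory(memories, transcription) {
  const query = tokenize(transcription);
  let best = null;
  let bestScore = 0;
  for (const memory of memories) {
    let score = 0;
    for (const word of tokenize(memory)) {
      if (query.has(word)) score += 1;
    }
    if (score > bestScore) {
      best = memory;
      bestScore = score;
    }
  }
  return best;
}
```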

Credits

All of the technologies already referenced in this post.
All of the Clubhouse folk who have played with Omega.