How AI Chatbots Work: What We Know, and What We Don’t

AI chatbots (such as Anthropic’s Claude and OpenAI’s ChatGPT) have already transformed our world. On the surface, they appear remarkably capable of friendly, natural conversation. But below the surface lie sophisticated artificial intelligence systems driving these abilities, along with a great deal of uncertainty about exactly how they work and what they are capable of.

In this deep dive, we’ll unpack the complex inner workings of chatbots built on models like GPT-4, examine how their capabilities are advancing more quickly than anticipated, and discuss the social risks we must proactively mitigate.

The Three Key Components

Chatbots are created through a combination of advanced algorithms, vast amounts of computing hardware, and massive datasets. Together, these three components have produced systems capable of conversing intelligently.

Let’s explore each of these in turn:

Algorithms: Chatbots built on architectures like GPT-4 rely on transformer models. A critical component is the attention mechanism, which weighs the importance of different words when processing natural language. This is essentially the software’s way of ‘focusing’ on pertinent information, loosely analogous to human attention.
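
To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer, written in plain NumPy. The dimensions and variable names are illustrative toys, not taken from any real model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query row is answered with a
    blend of the value rows, weighted by query-key similarity."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how relevant is each word to each other word?
    weights = softmax(scores, axis=-1)       # normalized 'focus' over the sequence
    return weights @ V

# Toy self-attention over 4 words, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8): one updated vector per word
```

In a real model, many such attention heads run in parallel across dozens of stacked layers.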

Data: The training phase uses massive datasets, comprising text from websites, books, and other media, to teach the model language patterns and contextual understanding. During this phase, the model undergoes backpropagation, adjusting its weights to minimize a loss function and make more accurate predictions.
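
As a rough illustration of what ‘backpropagation to minimize a loss’ means in practice, here is a tiny PyTorch training loop on fabricated data. Real language-model training follows the same pattern, just at a vastly larger scale.

```python
import torch
import torch.nn as nn

# Stand-in 'model': maps a 10-dim input to logits over a 5-token vocabulary.
model = nn.Linear(10, 5)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Fabricated batch: 32 random inputs paired with random 'correct next tokens'.
x = torch.randn(32, 10)
y = torch.randint(0, 5, (32,))

for step in range(100):
    logits = model(x)          # forward pass: predict a score for each token
    loss = loss_fn(logits, y)  # measure how wrong the predictions are
    optimizer.zero_grad()
    loss.backward()            # backpropagation: gradients of the loss w.r.t. the weights
    optimizer.step()           # nudge the weights to reduce the loss
```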

Hardware: These models are trained on specialized hardware setups, often involving GPUs or TPUs, to perform the enormous number of floating-point operations (FLOPs) required for training.
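
For a back-of-envelope sense of scale, a common heuristic estimates training compute at roughly 6 FLOPs per parameter per training token. Plugging in the figures reported for GPT-3 (175 billion parameters, 300 billion tokens) lands close to its published training cost:

```python
# Rule-of-thumb training compute: ~6 FLOPs per parameter per token.
n_params = 175e9   # GPT-3's reported parameter count
n_tokens = 300e9   # GPT-3's reported training tokens
total_flops = 6 * n_params * n_tokens
print(f"~{total_flops:.2e} floating-point operations")  # ~3.15e+23
```

A number on the order of 10^23 is why training runs occupy clusters of accelerators for weeks or months.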

Mysterious Inner Workings

Deployed through APIs and chatbot interfaces, these systems have been unleashed upon the internet and made accessible to millions of people. In more specialized contexts, they execute complex tasks like order processing or real-time translation.
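
For example, a basic chat request through the OpenAI Python SDK looks roughly like this (the prompt is a placeholder, and an API key is assumed to be set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Translate 'good morning' into French."},
    ],
)
print(response.choices[0].message.content)
```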

However, understanding why a specific response was generated, a problem often referred to as ‘interpretability’, remains elusive. The inner workings of these models can be likened to a black box: making sense of their high-dimensional vector spaces and billions of parameters in human terms is a significant open challenge.
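
It’s easy to see why. Even with complete access to a model, all you can directly inspect are enormous arrays of floating-point numbers. A single small transformer layer in PyTorch (with arbitrary toy dimensions) already makes the point:

```python
import torch.nn as nn

# One small transformer encoder layer; production chatbots stack
# dozens of far larger layers.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

n_params = sum(p.numel() for p in layer.parameters())
print(f"{n_params:,} parameters in this single layer")

# All a direct inspection reveals is raw numbers like these:
print(layer.self_attn.in_proj_weight[:2, :5])
```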

When you really take a step back, this is an incredibly sobering fact. Even the most seasoned experts in machine learning, the engineers building these very AI systems, don’t fully understand why they work. All they know is that they do work, and that they get better and better the bigger you make them.

It’s in part for these reasons that groups, including The Midas Project, are calling upon AI labs to increase transparency about the training of their models.

Surprising Capabilities

The conversational abilities of AI chatbots have dramatically exceeded expectations in recent years. Natural conversation on open-ended topics, contextual awareness, generalized learning, and intelligent synthesis of information were thought by many to be decades away, but the rapid progress of chatbots like ChatGPT has proved otherwise.

Language models like ChatGPT were built to generate text; at their core, all they do is predict the next token. But somehow, this process has allowed them to generalize, learning other complex abilities that previously required specialized AI systems. For example, a recent release of the GPT-3.5 model was discovered to play chess at an expert level, despite never having been explicitly trained for that purpose.
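
The core generation loop itself is simple to sketch. In the snippet below, next_token_logits is a hypothetical stand-in for a trained network; the surrounding loop is how autoregressive generation proceeds, one token at a time:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 50_000

def next_token_logits(tokens):
    # Hypothetical stand-in for a trained transformer: scores every
    # vocabulary entry given the context so far.
    return rng.normal(size=VOCAB_SIZE)

def generate(prompt_tokens, n_new=5):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        logits = next_token_logits(tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Sample the next token, append it, and repeat with the longer context.
        tokens.append(int(rng.choice(VOCAB_SIZE, p=probs)))
    return tokens

print(generate([101, 2023, 2003]))  # fabricated token IDs
```

Everything the chatbot does, from answering questions to playing chess, emerges from repeating this one prediction step.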

What could go wrong?

The rapid advancement of these models, which we still do not fully understand, has created a laundry list of dangers: some are already materializing, while others risk doing so soon.

These dangers range from comparatively minor violations of privacy and intellectual property to extreme, catastrophic scenarios that threaten the future of life as we know it.

Here are a few potential risks that have been identified:

  1. Perpetuating harmful biases: Biases around gender, race, and ethnicity in chatbot training data and algorithms directly affect behavior. Real-world usage could amplify prejudice.
  2. Enabling mass manipulation: Advanced chatbots combined with user profiling data could be exploited to influence thinking on a society-wide scale.
  3. Eroding privacy: Chatbots require vast amounts of personal data to function well, increasing surveillance. And the more chatbots know about individuals, the more sensitive data is at risk of exposure.
  4. Spreading misinformation: Chatbots can generate highly convincing but completely fabricated content while disguising it as fact. Credibility assessments are challenging.
  5. Replacing human jobs: Intelligent chatbots threaten millions of customer service and sales roles as the human skills those jobs rely on become automated.
  6. Survival and Self-Replication: Chatbots could, with sufficient access to the internet and their underlying data, create new copies of themselves that humans fail to control. Despite sounding like something out of a science-fiction movie, the Alignment Research Center has done work to evaluate this risk in models like GPT-4 and found early signs of this capacity.
  7. Existential Catastrophe: One of the key insights of the recent era of AI development is that simple models like chatbots become more generally intelligent with scale. Eventually, they could have the power to pose existential risks, such as developing novel viruses, whether in the hands of malicious actors like terrorist groups or perhaps even on their own.

The last two risks are particularly frightening, and may not be intuitive.

80,000 Hours created a video to help explain how this might happen.

These risks underscore why chatbot transparency, oversight, and regulation are needed urgently. Developers and companies have an obligation to proactively assess and address downsides and risks.


Ready to take action?