The Case for Concern

Powerful ML models present the chance to achieve great economic prosperity. However, this vision of prosperity has led to dangerous incentive structures around the development of AI capabilities. 

Machine learning research fundamentally differs from the development of other technologies in the past. Where the theory of nuclear physics was understood in the development of the atom bomb, the theory of the inner workings of powerful ML models is extremely limited. Where new medicines are thoroughly tested before being consumed by humans, new AI can currently be deployed almost immediately. Where an engineering fault in a bridge or building could endanger many thousands of lives, an engineering fault in a capable machine learning system could endanger the entirety of humanity. Where the space race was orchestrated by governments, ultimately existing to serve the people, the AI race is being run by firms that ultimately exist to generate profit.

The combination of these properties is leading humanity hurtling towards a hilltop, entirely unsure of whether there is more ground on the other side or instead a cliff edge. To understand why this is true, it helps to understand what current capabilities research looks like. 

AI capabilities research started with ideas for how to structure networks of information so that a computer could:

  1. take in data, 
  2. learn properties of that data, and then 
  3. use those properties to infer information about new data in the future.

This is the basic principle of how all models work today, and there have been some major leaps in understanding exactly how to structure those networks. However, in the last 5-10 years the most effective research has taken a rather different turn. It was discovered that, for roughly the same basic structure of a machine learning model, the best returns on time and focus came from working out how to make the model bigger. Making the model bigger means working out how to use more computer chips and more data to make the model more powerful. 

In doing this, large companies have created models that display what are known as “emergent capabilities”. These are abilities that the model learns purely as a result of using more power, rather than by designing the model more carefully. Often, these powerful capabilities have been a surprise even to the researchers that made the model. GPT-2, a precursor to chatGPT, was not expected to be able to translate between French and English, yet by learning from millions of webpages it was able to do so surprisingly well. What else could a model unexpectedly learn by looking at more and more of the internet? Some new capabilities, such as chatGPT’s proficiency with coding, are revolutionising the tech industry for the better, but others have been more concerning.

Models are already being used to spread misinformation, and GPT-3 will happily give instructions on how to commit theft or build bombs, but a powerful enough AI could learn to plan, deceive, and manipulate in order to achieve the goal its developers assigned it. An emergent capability recently discovered in GPT-4 is an ability to understand human psychology and emotions better than many humans do. Due to the nature of the research focus in AI over the last few years, nobody can point to the parts of the model that enable this. Nobody can say when this knowledge is being used – does it recall this information only when asked for it specifically, or when choosing how to answer a much broader range of questions? 

This current lack of understanding means it is currently very difficult to discern whether an AI could use more troubling skills. The only way we have of detecting new emergent capabilities is by observing the outputs a model gives us – can you trust that you would spot when someone with a superhuman knowledge of human psychology is lying to you?

AI safety research is the primary method we have to solve these problems. There is work being done to understand how these capabilities arise and how to prevent them from being misused. However, we are currently fighting an uphill battle – only around 300 people are currently working in the field, compared to the 100,000 working on capabilities. If things continue this way, it can be expected that these capabilities will develop far faster than our understanding of how those capabilities work. Whilst it’s true that these capabilities may be used benignly, or even positively, our current knowledge of ML models is so limited that we have no idea what will happen.

Leave a comment