Damian leads a team that researches AI robustness, safety and security. What does this mean? They spend their time developing breakthrough methods to stress-test and break machine learning algorithms.
This, in turn, shows us how to protect those same algorithms from intentional misuse and from natural deterioration. It also lets us understand how to strengthen their performance under diverse conditions.
That is to say, to make them 'robust'.
The Superalignment initiative aligns well with our research at Advai. Manually testing every algorithm for every facet of weakness isn't feasible, so, just as OpenAI have planned, we've developed internal tooling that runs a host of automated tests to indicate the internal strength of AI systems.
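To make this concrete, here is a minimal sketch (in Python with PyTorch) of the kind of automated probe such tooling might run: a one-step FGSM perturbation that measures how far a classifier's accuracy drops under a small adversarial nudge. The model, data loader, epsilon value and device are illustrative placeholders, not Advai's actual test suite.

```python
import torch
import torch.nn.functional as F

def fgsm_accuracy_drop(model, loader, epsilon=0.03, device="cpu"):
    """Return (clean accuracy, adversarial accuracy) under a one-step FGSM probe."""
    model.eval()
    clean_correct, adv_correct, total = 0, 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        inputs.requires_grad_(True)

        # Clean prediction and loss on the unmodified batch.
        logits = model(inputs)
        loss = F.cross_entropy(logits, labels)
        clean_correct += (logits.argmax(dim=1) == labels).sum().item()

        # One-step FGSM: nudge each input in the direction that increases
        # the loss, bounded by epsilon, then clamp back to the valid range.
        model.zero_grad()
        loss.backward()
        adv_inputs = (inputs + epsilon * inputs.grad.sign()).clamp(0, 1)

        # Accuracy on the perturbed inputs.
        with torch.no_grad():
            adv_logits = model(adv_inputs)
        adv_correct += (adv_logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)

    return clean_correct / total, adv_correct / total
```

A large gap between the two numbers is one automated signal that a model which looks accurate is not, in fact, robust.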
"It's not totally straightforward to make these tools." Damian's fond of an understatement.
The thing is, testing for when something will fail means trying to say what it can't do.
You might say 'this knife can cut vegetables'. But what if you come across more than vegetables? What can't the knife cut? Testing when a knife will fail means trying to cut an entire world of materials, separating 'things that can be cut' from 'everything else in the universe'. The list of things the knife can't cut is almost endless. Yet, to avoid breaking your knife (or butchering your item), you need to know what to avoid cutting!
To be feasible, these failure-mode tests need shortcuts. This is where automated assurance mechanisms and Superalignment come in: there are algorithmic approaches to testing what we might call the 'negative space' of AI capabilities.
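As one hedged illustration of probing that negative space, the sketch below feeds a classifier inputs it was never meant to handle (random noise standing in for out-of-distribution data) and counts how often it answers with high confidence anyway. The confidence threshold, input shape and sample count are assumptions for the example, not a description of any specific assurance product.

```python
import torch
import torch.nn.functional as F

def high_confidence_on_junk(model, n_samples=1000, input_shape=(3, 32, 32),
                            threshold=0.9, device="cpu"):
    """Fraction of out-of-distribution (noise) inputs the model labels confidently."""
    model.eval()
    # Random noise stands in for "everything else in the universe": inputs the
    # model has no business being confident about.
    junk = torch.rand((n_samples, *input_shape), device=device)
    with torch.no_grad():
        probs = F.softmax(model(junk), dim=1)
    # Ideally close to 0: a robust model should not "know" what junk is.
    return (probs.max(dim=1).values > threshold).float().mean().item()
```

The knife analogy applies directly: the probe does not enumerate everything the model cannot handle; it samples from that endless list and checks that the model at least hesitates.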
This might sound difficult, and it is: controlling what an algorithm does is hard, but controlling what it doesn't do is harder. We've been sharing our concerns about AI for a few years now: these systems have so many failure modes. These are things businesses should be worrying about, because there is pressure to keep up with innovation.
There are so many ways that a seemingly accurate algorithm can be vulnerable and can expose its users to risk. Generative AI and large language models like GPT-4 make it harder still: these models are so much more complex, and guardrail development is correspondingly more challenging.
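Purely to illustrate why guardrails get harder as models get more capable, here is a toy input/output filter wrapped around a generic generate_fn callable. The blocked terms and generate_fn are assumptions for the sketch, not anyone's actual mechanism; a keyword list like this is exactly the kind of brittle control that a sufficiently complex model (or user) slips past, which is the point.

```python
# Illustrative only: a naive keyword guardrail wrapped around any text generator.
BLOCKED_TERMS = {"make a weapon", "disable the safety"}  # hypothetical policy list

def guarded_generate(prompt: str, generate_fn) -> str:
    """Call generate_fn(prompt) only if both the prompt and the output pass checks."""
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return "Request declined by guardrail."

    response = generate_fn(prompt)
    if any(term in response.lower() for term in BLOCKED_TERMS):
        return "Response withheld by guardrail."
    return response
```

Real guardrails layer classifiers, policy models and continuous red-teaming on top of controls like this; the gap between the sketch and that reality is the 'correspondingly more challenging' part.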