Tonight's talk is about regulation. We'll look at upcoming regulations and ways to prepare for them in MLOps.
A lot of today's discussion is about supporting decision-makers in building, designing, and deploying ML systems. There's a significant amount of technical work involved. Common questions include whether we're meeting innovation targets by integrating AI into our systems and whether the projects align with our business objectives. Is the AI we're using better than existing solutions? Are we using the latest technology, such as advanced algorithms or deep learning?
An important aspect we're seeing is adherence to basic principles. There are existing principles to consider, but there's a gap in linking them to the risks that come with deployment, such as the risk of failure. That gap has become a significant blocker for AI adoption, especially for higher-risk use cases.
As for my background, I'm Chris, one of the co-founders of Advai. We've been around for three years, focusing on understanding what causes AI to fail and how to measure and mitigate those failures. We started with adversarial AI, using attack techniques to break systems, and we've learned that this is just one part of understanding how AI systems fail.
In our work, we've answered many of the technical questions using accuracy measurements and other criteria. As we move through the deployment stages, though, other risks emerge, such as regulatory risk. With AI regulation on the way, there's more focus on compliance, and questions arise about who is accountable for a system's performance in the wild. Understanding this risk is challenging.
Then there's robustness risk: how the system performs on unseen data, handles edge cases, and copes with unexpected circumstances. Understanding a system's limits and these risks is vital, because it affects adoption and public trust. There's backlash against some systems, from both well-informed and misinformed public opinion, and that raises concerns about unintended societal harms.
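To make that robustness point concrete, here is a minimal sketch in Python of the kind of probe we mean. It assumes a hypothetical classifier with a scikit-learn-style `predict` method and a labelled evaluation set, and it simply compares clean accuracy against accuracy when some feature values go missing; treat it as an illustration under those assumptions, not a description of our actual tooling.

```python
import numpy as np

def edge_case_check(model, features, labels, missing_frac=0.1, rng=None):
    """Compare accuracy on clean inputs with accuracy when a random
    fraction of feature values is zeroed out, mimicking missing or
    malformed data. `model` is any object with a scikit-learn-style
    `predict` method; model, data, and fraction are placeholders.
    """
    rng = np.random.default_rng(rng)

    # Baseline: accuracy on the untouched evaluation set.
    clean_acc = float(np.mean(model.predict(features) == labels))

    # Degraded: zero out a random subset of feature values.
    mask = rng.random(features.shape) < missing_frac
    degraded = np.where(mask, 0.0, features)
    degraded_acc = float(np.mean(model.predict(degraded) == labels))

    return clean_acc, degraded_acc
```

In practice the perturbation would be tailored to the domain (image corruptions, paraphrased text, out-of-range values), but the idea of comparing clean and degraded performance is the same.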
So, we're focusing on three themes: compliance, risk, and harm. Understanding the stakeholders in the MLOps lifecycle is key. Data scientists and engineers often feel frustrated when the great systems they develop hit approval roadblocks. Regulatory and compliance stakeholders, along with managers and directors, struggle to understand and approve AI systems because they lack the information to do so. The public, as another stakeholder, has concerns about how AI affects them and whether they can trust these systems.
As a recent case study, Italy banned ChatGPT, not because of upcoming AI regulation but because of GDPR challenges. Concerns included the accuracy of the information ChatGPT produced, how user data was used and stored, and the protection of young users. In response, OpenAI introduced measures such as age verification and clearer privacy controls to address these concerns, working to rebuild trust and a more robust system.
Legislation worldwide shares common themes in AI regulation, focusing on risk, reliability, accountability, security, and human-centricity. The EU AI Act, the UK AI white paper, and the U.S. Blueprint for an AI Bill of Rights are examples of this trend.
Looking ahead, the EU AI Act and the UK white paper are progressing, and we expect specific regulatory guidelines and governmental functions to emerge from them. The U.S. AI Bill of Rights is a bit further out. By 2025-26, AI regulation will likely be business as usual, especially in regulated sectors and for high-risk use cases.
Preparing for this landscape means focusing on compliance, risk, and harm. At Advai, we've developed methods to stress test AI systems and understand their failure limits. This understanding is crucial for robust system design.
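As a rough illustration of what stress testing to find failure limits can look like in code, here is a minimal Python sketch that sweeps increasing input noise and records where accuracy drops below an acceptable floor. The model, data, noise model, and threshold are all assumptions for the example, not our internal methods.

```python
import numpy as np

def stress_test(model, features, labels, min_acceptable_acc=0.8,
                severities=(0.0, 0.05, 0.1, 0.2, 0.4), rng=None):
    """Sweep increasing levels of Gaussian input noise and report the
    first severity at which accuracy falls below the acceptable floor,
    i.e. a crude estimate of the system's failure limit. All names and
    thresholds here are illustrative.
    """
    rng = np.random.default_rng(rng)
    accuracy_by_severity = {}
    failure_limit = None

    for severity in severities:
        # Perturb the inputs more aggressively at each step.
        noisy = features + rng.normal(scale=severity, size=features.shape)
        acc = float(np.mean(model.predict(noisy) == labels))
        accuracy_by_severity[severity] = acc

        # Record the first severity where performance is no longer acceptable.
        if failure_limit is None and acc < min_acceptable_acc:
            failure_limit = severity

    return accuracy_by_severity, failure_limit
```

The output gives both the full accuracy-versus-severity curve and the point at which the system breaks, which is the kind of evidence that feeds into robust system design and the approval conversations mentioned above.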
Key takeaways include putting an assurance process in place, understanding your use cases, tracking failure modes, involving stakeholders, and generating evidence that challenges have been mitigated.
In summary, aligning MLOps with principles of accountability, reliability, risk management, and protection against threats is crucial. Understanding the stakeholders, from data scientists to the public, and the challenges they face in AI deployment is essential. Documenting evidence that risks and failures have been addressed is vital for auditability.
Finally, applying these principles in practice involves understanding use cases, tracking appropriate metrics, accounting for failure modes, ensuring robust design, securing endpoints, and documenting all processes for auditability.
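To close with one way of putting the documentation point into practice, here is a minimal Python sketch of an evidence record that ties a use case to its metrics, known failure modes, and mitigations. The schema, field names, and values are all hypothetical; the point is simply that each deployment decision leaves behind a structured, reviewable artifact.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AssuranceRecord:
    """One auditable entry linking a use case to its metrics, known
    failure modes, and the mitigations applied. Field names and the
    example values below are illustrative, not a prescribed schema."""
    use_case: str
    metrics: dict
    failure_modes: list
    mitigations: list
    approved_by: str
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# A hypothetical record for an imagined deployment decision.
record = AssuranceRecord(
    use_case="document triage assistant",
    metrics={"clean_accuracy": 0.93, "accuracy_under_noise": 0.81},
    failure_modes=["degrades on low-quality scanned inputs"],
    mitigations=["input quality filter", "human review below a confidence threshold"],
    approved_by="model risk committee",
)

# Writing the record out as JSON gives a simple, reviewable audit trail.
print(json.dumps(asdict(record), indent=2))
```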