Asking broad, ambiguous questions is a great way to get irrelevant results.
As a user, you can use prompts to guide which of the ‘Experts’ is engaged, and to generally home in on the section of patterns where your desired knowledge is likely to be.
Just as a racehorse benefits from a jockey who knows what they're doing, so too do LLMs: they need to be expertly guided to perform at their best.
Much thought goes into how to guide them, so much so that whole jobs have now been created to do it. 'Prompt Engineers' are the jockeys who use language to steer AI in the desired direction. They craft and refine the prompts the AI responds to, helping improve the system's capabilities and making it more user-friendly.
The internet is already a trove of ChatGPT hacks and shortcuts, guides for developers who incorporate LLMs into their digital services, and tech community support on how to ‘ground’ LLM responses.
Language models are designed to serve as general reasoning and text engines, making sense of the information they've been trained on and providing meaningful responses. However, it's essential to remember that they should be treated as engines and not stores of knowledge.
To make them more reliable and grounded in fact, techniques like Retrieval Augmented Generation (RAG) are used. This method supplies the model with specific, reliable information to draw on when responding to a prompt, reducing the chances of it spouting nonsensical or inaccurate information. And as you have probably seen, Bing’s AI has guardrails in place to ensure it limits its responses to the information found in search results (and cites them accordingly).
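To make the idea concrete, here is a minimal sketch of the RAG pattern. It is not any particular product's implementation: simple keyword overlap stands in for the vector-embedding search a real system would use, and the `call_llm` step mentioned at the end is a hypothetical placeholder for your provider's API.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG).
# Real systems retrieve by semantic similarity over vector embeddings;
# keyword overlap is used here purely to keep the example self-contained.

KNOWLEDGE_BASE = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Standard delivery takes 3-5 working days within the UK.",
    "Customer support is available Monday to Friday, 9am to 5pm.",
]

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def grounded_prompt(question: str) -> str:
    """Build a prompt that instructs the model to answer only from retrieved facts."""
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(grounded_prompt("How long do deliveries take?"))
# The assembled prompt is then sent to the model,
# e.g. response = call_llm(grounded_prompt(...))  # hypothetical API call
```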
Recently, OpenAI have incorporated some of these techniques into their consumer-facing versions of ChatGPT. They’re enabling users to provide custom instructions that give them more general control over their interactions.
(‘Always write according to British Spelling conventions and in plain English, and please please please stop telling me your information is unreliable and limited to 2021, because I know, ok?’ would be this author’s first custom instruction.)
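For developers calling the model through an API rather than the chat interface, the rough equivalent of custom instructions is a system message prepended to every conversation. The sketch below uses OpenAI's Python client as it stands at the time of writing; the model name and the instructions themselves are illustrative only.

```python
# Custom-instruction-style steering via the API: a system message sent with
# every request. Check your provider's current documentation for specifics.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CUSTOM_INSTRUCTIONS = (
    "Always write according to British spelling conventions and in plain English. "
    "Do not repeat disclaimers about your knowledge cut-off."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system", "content": CUSTOM_INSTRUCTIONS},
        {"role": "user", "content": "Summarise our refund policy for a customer email."},
    ],
)
print(response.choices[0].message.content)
```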
‘Oh, great’ you might think, ‘we can prepare a well-considered document of preamble to guide ChatGPT’s responses, to ensure it is user-friendly, dependable, polite, straight to the point and on-brand’.
Right?
Nope.
As we'll discover in the next section, these techniques that ground models are far from perfect. They have arguably untameable dependability issues, which must be navigated with caution.
OpenAI and Microsoft’s beta general chat assistants might be able to begin every conversation with a disclaimer about inaccuracy. But if an intelligent banking assistant were to show your balance with a disclaimer that it might be inaccurate, it wouldn’t exactly encourage users to trust it, would it?
Clearly, to enable widespread adoption, we need to reduce these vulnerabilities to as close to zero as possible.
Before you give up hope and wonder if harnessing the wondrous might of large language models will ever be dependable enough for your corporate context… I will plant the seed for a more technically challenging solution to LLM trustworthiness.
It won’t work with closed models like ChatGPT, but:
LLMs can be tamed by fine-tuning them against a customised ‘reward model’.
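To give a flavour of what that means, the toy sketch below shows where a reward model sits in the loop. A real reward model is a neural network trained on human preference rankings, and in full RLHF its scores drive a reinforcement-learning step (such as PPO) that adjusts the LLM's weights; the keyword-based scorer here is only a stand-in to illustrate the idea.

```python
# Toy illustration of the 'reward model' idea. A real reward model is trained
# on human preference data; this heuristic scorer only shows where it fits.

def reward_model(response: str) -> float:
    """Score a candidate response against house rules (toy heuristic)."""
    score = 0.0
    if "I'm just an AI" not in response:
        score += 1.0   # penalise boilerplate disclaimers
    if len(response.split()) < 80:
        score += 0.5   # reward concision
    if "organised" in response or "colour" in response:
        score += 0.5   # reward British spelling
    return score

candidates = [
    "I'm just an AI, but here is a very long answer...",
    "Here is a concise answer, organised around your question.",
]

# In full RLHF, these scores feed a reinforcement-learning step that nudges
# the LLM's weights; the simplest use is just picking the best of n samples.
best = max(candidates, key=reward_model)
print(best)
```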