Asking broad, ambiguous questions is a great way to get irrelevant results.
As a user, you can use prompts to guide which of the ‘Experts’ is engaged, and to generally home in on the section of patterns where your desired knowledge is likely to be.
Just as a racehorse benefits from a jockey who knows what they're doing, so too do LLMs: they need to be expertly guided to perform at their best.
Much thought goes into how to guide them, so much so that whole jobs have now been created to do it. 'Prompt Engineers' are the jockeys who use language to steer AI in the desired direction. They craft and refine the prompts the AI responds to, helping improve the system's capabilities and making it more user-friendly.
The internet is already a trove of ChatGPT hacks and shortcuts, guides for developers who incorporate LLMs into their digital services, and tech community support on how to ‘ground’ LLM responses.
Language models are designed to serve as general reasoning and text engines, making sense of the information they've been trained on and providing meaningful responses. However, it's essential to remember that they should be treated as engines and not stores of knowledge.
To make them more reliable and grounded in fact, techniques like Retrieval Augmented Generation (RAG) are used. This method supplies the model with specific, reliable information to draw on when responding to a prompt, reducing the chances of it spouting nonsensical or inaccurate information. And as you have probably seen, Bing’s AI has guardrails in place to ensure it limits its responses to the information found in search results (and cites them accordingly).
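To make the idea concrete, here is a minimal sketch of the RAG pattern. It is not any particular product's implementation: simple keyword overlap stands in for the vector-embedding search a real system would use, and the `call_llm` step mentioned at the end is a hypothetical placeholder for your provider's API.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG).
# Real systems retrieve by semantic similarity over vector embeddings;
# keyword overlap is used here purely to keep the example self-contained.

KNOWLEDGE_BASE = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Standard delivery takes 3-5 working days within the UK.",
    "Customer support is available Monday to Friday, 9am to 5pm.",
]

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def grounded_prompt(question: str) -> str:
    """Build a prompt that instructs the model to answer only from retrieved facts."""
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(grounded_prompt("How long do deliveries take?"))
# The assembled prompt is then sent to the model,
# e.g. response = call_llm(grounded_prompt(...))  # hypothetical API call
```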
Recently, OpenAI have incorporated some of these techniques into their consumer-facing versions of ChatGPT. They’re enabling users to provide custom instructions that give them more general control over their interactions.
(‘Always write according to British Spelling conventions and in plain English, and please please please stop telling me your information is unreliable and limited to 2021, because I know, ok?’ would be this author’s first custom instruction.)
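For developers calling the model through an API rather than the chat interface, the rough equivalent of custom instructions is a system message prepended to every conversation. The sketch below uses OpenAI's Python client as it stands at the time of writing; the model name and the instructions themselves are illustrative only.

```python
# Custom-instruction-style steering via the API: a system message sent with
# every request. Check your provider's current documentation for specifics.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CUSTOM_INSTRUCTIONS = (
    "Always write according to British spelling conventions and in plain English. "
    "Do not repeat disclaimers about your knowledge cut-off."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system", "content": CUSTOM_INSTRUCTIONS},
        {"role": "user", "content": "Summarise our refund policy for a customer email."},
    ],
)
print(response.choices[0].message.content)
```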
‘Oh, great’ you might think, ‘we can prepare a well-considered document of preamble to guide ChatGPT’s responses, to ensure it is user-friendly, dependable, polite, straight to the point and on-brand’.
Right?
Nope.
As we'll discover in the next section, these techniques that ground models are far from perfect. They have arguably untameable dependability issues, which must be navigated with caution.
OpenAI and Microsoft’s beta general chat assistants might be able to begin every conversation with a disclaimer about inaccuracy. But if an intelligent banking assistant were to show your balance with a disclaimer that it might be inaccurate, it wouldn’t exactly encourage users to trust it, would it?
Clearly, to enable widespread adoption, we need to reduce these vulnerabilities to as close to zero as possible.
Before you give up hope and wonder if harnessing the wondrous might of large language models will ever be dependable enough for your corporate context… I will plant the seed for a more technically challenging solution to LLM trustworthiness.
It won’t work with closed models like ChatGPT, but:
LLMs can be tamed by fine-tuning them against a customised ‘reward model’.
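To give a flavour of what that means, the toy sketch below shows where a reward model sits in the loop. A real reward model is a neural network trained on human preference rankings, and in full RLHF its scores drive a reinforcement-learning step (such as PPO) that adjusts the LLM's weights; the keyword-based scorer here is only a stand-in to illustrate the idea.

```python
# Toy illustration of the 'reward model' idea. A real reward model is trained
# on human preference data; this heuristic scorer only shows where it fits.

def reward_model(response: str) -> float:
    """Score a candidate response against house rules (toy heuristic)."""
    score = 0.0
    if "I'm just an AI" not in response:
        score += 1.0   # penalise boilerplate disclaimers
    if len(response.split()) < 80:
        score += 0.5   # reward concision
    if "organised" in response or "colour" in response:
        score += 0.5   # reward British spelling
    return score

candidates = [
    "I'm just an AI, but here is a very long answer...",
    "Here is a concise answer, organised around your question.",
]

# In full RLHF, these scores feed a reinforcement-learning step that nudges
# the LLM's weights; the simplest use is just picking the best of n samples.
best = max(candidates, key=reward_model)
print(best)
```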