
Controlling Large Language Models: A Primer

Concerns over risks from generative artificial intelligence systems have increased significantly over the past year, driven in large part by the advent of increasingly capable large language models. But how do AI developers attempt to control the outputs of these models? This primer outlines four commonly used techniques and explains why this objective is so…

The document outlines the risks posed by large language models (LLMs), including the generation of inaccurate, biased, or malicious outputs.

It reviews four main control techniques—editing pre-training data, supervised fine-tuning, reinforcement learning with human feedback (RLHF) and Constitutional AI, and prompt/output controls. It emphasizes that these methods are often combined in practice, but that none is completely effective, especially given the rapid evolution and open availability of LLMs.
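To make the last of these techniques concrete, here is a minimal, hypothetical sketch of prompt/output controls: a wrapper that prepends a safety instruction to the user's prompt (input-side control) and screens the model's response against a blocklist (output-side control). All names here (`generate`, `BLOCKLIST`, the stand-in model function) are illustrative assumptions, not part of any real system described in the primer.

```python
# Hypothetical illustration of prompt/output controls around an LLM.
# The "model" is an arbitrary callable taking a prompt string and
# returning a response string; real systems would call an actual LLM.

SYSTEM_PROMPT = "You are a helpful assistant. Refuse harmful requests."
BLOCKLIST = {"build a bomb", "credit card numbers"}  # toy example

def generate(model_fn, user_prompt: str) -> str:
    """Wrap a model call with input- and output-side controls."""
    # Input-side control: prepend a safety system prompt.
    full_prompt = f"{SYSTEM_PROMPT}\n\nUser: {user_prompt}"
    output = model_fn(full_prompt)
    # Output-side control: withhold responses containing blocked phrases.
    if any(phrase in output.lower() for phrase in BLOCKLIST):
        return "[response withheld by safety filter]"
    return output

# Stand-in model that simply echoes its prompt, for demonstration.
echo_model = lambda prompt: prompt
print(generate(echo_model, "What's the weather today?"))
```

As the primer notes, such filters are easily combined with the other techniques but are far from airtight: paraphrased or obfuscated requests can slip past simple pattern matching.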