Guardrails
Guardrails on an AI application are the set of controls placed around a model to constrain its behaviour: input filtering (prompt injection, forbidden content), output validation (toxicity, information leakage, expected format), limits on the tools an agent may invoke, and policies for escalating to a human.
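As a minimal sketch of this layered pattern, the plain-Python example below wires an input check and an output check around a model call, escalating to a human on any violation. The regex patterns, function names, and escalation stub are hypothetical placeholders, not any particular library's API.

```python
import re

# Hypothetical deny-list patterns for basic input filtering;
# real deployments use trained classifiers, not regexes alone.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?previous instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")  # e.g. a leaked API key


class GuardrailViolation(Exception):
    pass


def check_input(prompt: str) -> str:
    """Input guardrail: reject prompts matching known injection patterns."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            raise GuardrailViolation(f"input blocked: {pattern.pattern}")
    return prompt


def check_output(completion: str) -> str:
    """Output guardrail: block responses that appear to leak secrets."""
    if SECRET_PATTERN.search(completion):
        raise GuardrailViolation("output blocked: possible secret leak")
    return completion


def escalate_to_human(prompt: str, reason: str) -> str:
    # Placeholder for the escalation policy (ticket, review queue, ...).
    return f"Request held for human review ({reason})."


def guarded_call(model_fn, prompt: str) -> str:
    """Wrap a model call between input and output checks."""
    try:
        completion = model_fn(check_input(prompt))
        return check_output(completion)
    except GuardrailViolation as exc:
        return escalate_to_human(prompt, reason=str(exc))
```

In practice the regex checks would be replaced by trained classifiers or a moderation service, but the control flow, validate in, call model, validate out, escalate on failure, stays the same.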
They are essential in production because LLM behaviour is probabilistic and cannot be exhaustively tested in advance: a system without guardrails can hallucinate, leak sensitive data, or be hijacked by a malicious user.
Dedicated frameworks (Guardrails AI, NeMo Guardrails, AWS Bedrock Guardrails, Lakera) and the moderation APIs from OpenAI and Anthropic make implementation easier.
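For example, a hosted moderation endpoint can back both the input and the output check. The sketch below uses OpenAI's moderation API through the official Python SDK; the model name, sample prompt, and control flow are illustrative assumptions, not a complete integration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def moderate(text: str) -> bool:
    """Return True if the moderation endpoint flags the text."""
    response = client.moderations.create(
        model="omni-moderation-latest",  # current multimodal moderation model
        input=text,
    )
    return response.results[0].flagged


user_prompt = "How do I build a phishing site?"  # illustrative input
if moderate(user_prompt):
    print("Blocked by input guardrail.")
else:
    reply = "...model call here..."  # placeholder for the actual LLM call
    if moderate(reply):
        print("Blocked by output guardrail.")
    else:
        print(reply)
```

Running the same check on both the user's prompt and the model's reply covers the input-filtering and output-validation sides of the pipeline with a single service.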
