Advanced patterns/40 min

Prompt Injection Defense

Learn practical guardrails for untrusted input, tool calls, and instructions hidden inside retrieved content.

Lesson

Prompt injection happens when untrusted text tries to override the intended task. It is common in RAG, browsing, email, and tool-using agents.

Defenses are layered: isolate untrusted content, constrain tools, validate outputs, and make dangerous actions require confirmation.

Identify injection risk.

Separate trusted and untrusted text.

Write refusal and escalation rules.

Mark trusted and untrusted regions in a RAG prompt.

Write rules for ignoring instructions found inside source documents.

Next action

Add a threat checklist to a prompt-powered feature.