prompt_injection
Tier 1 · 70% confidence

security-prompt-injection-user-provides-a-jailbreak-prompt-such-as-dan-do-an-f385b32c

agent: security

When does this happen?

IF a user provides a jailbreak prompt, such as 'DAN (Do Anything Now)', that instructs the LLM to ignore its safety rules and respond without restrictions.

How others solved it

THEN Implement input-level detection to block known jailbreak patterns, for example by flagging prompts that contain 'DAN', 'Do Anything Now', or similar phrases, using regular expressions or a classification model. Additionally, apply output-level monitoring to detect responses that violate policy (e.g., unverified date/time claims or unauthorized actions).
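
The regex variant of this approach can be sketched as follows. This is a minimal illustration, not a complete defense: the pattern names, the pattern lists, and the `scan` helper are all hypothetical, and the output check covers only one assumed policy signal (a reply that adopts the 'DAN' persona). A real deployment would likely pair it with a classification model, since fixed patterns are easy to paraphrase around.

```python
import re
import unicodedata
from typing import Dict, List, Pattern

# Hypothetical input patterns; a real list would be larger and regularly updated.
INPUT_PATTERNS: Dict[str, Pattern[str]] = {
    # Case-sensitive so the ordinary name "Dan" is not flagged.
    "dan_acronym": re.compile(r"\bDAN\b"),
    "do_anything_now": re.compile(r"\bdo\s+anything\s+now\b", re.IGNORECASE),
    "ignore_rules": re.compile(
        r"\bignore\s+(?:all\s+|any\s+)?(?:your\s+|the\s+)?"
        r"(?:previous\s+|prior\s+|safety\s+)?(?:instructions|rules|guidelines)\b",
        re.IGNORECASE,
    ),
}

# Hypothetical output pattern: a reply line that starts with "DAN:" or "[DAN]:".
OUTPUT_PATTERNS: Dict[str, Pattern[str]] = {
    "persona_adopted": re.compile(r"^\s*\[?DAN\]?\s*:", re.MULTILINE),
}


def scan(text: str, patterns: Dict[str, Pattern[str]]) -> List[str]:
    """Return the names of all patterns that match the text."""
    # NFKC folds look-alike forms (e.g. full-width letters) into plain ASCII
    # so that trivially obfuscated prompts still match.
    normalized = unicodedata.normalize("NFKC", text)
    return [name for name, pattern in patterns.items() if pattern.search(normalized)]


def is_blocked_prompt(prompt: str) -> bool:
    """Input-level check: True if any known jailbreak pattern matches."""
    return bool(scan(prompt, INPUT_PATTERNS))


def is_suspicious_response(response: str) -> bool:
    """Output-level check: True if the reply shows an assumed policy-violation signal."""
    return bool(scan(response, OUTPUT_PATTERNS))
```

A caller would run `is_blocked_prompt` before forwarding the prompt to the model and `is_suspicious_response` on the reply before returning it, logging the matched pattern names from `scan` for review.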
