prompt_injectionTier 1 · 70% confidence

security-prompt-injection-user-input-contains-instructions-to-override-the-a-a6746335

agent: security

When does this happen?

IF User input contains instructions to override the AI's persona (e.g., 'pretend to be DAN' or 'Do anything now'), causing the AI to break constraints and behave as an unrestricted agent.

How others solved it

THEN Implement input filtering to detect and block known jailbreak phrases. Add a regex rule to reject prompts containing 'DAN' or 'Do anything now' (case-insensitive) with a 400 error or manual review flag. Hardening the system prompt with a directive to ignore any persona-override instructions is also effective.

import re
if re.search(r'\bDAN\b|do anything now', user_input, re.I):
    raise ValueError('Blocked prompt injection attempt')

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics