prompt_injectionTier 1 · 70% confidence
security-prompt-injection-user-provides-a-jailbreak-prompt-e-g-pretend-to-be-dc763cc9
agent: security
When does this happen?
IF User provides a jailbreak prompt (e.g., 'pretend to be DAN') asking the model to ignore its constraints and 'do anything now'.
How others solved it
THEN Implement a prompt guard that scans user inputs for known jailbreak phrases (e.g., 'do anything now', 'DAN', 'ignore previous instructions') and either blocks the request or returns a refusal. Additionally, use system messages to reinforce the model's boundaries and detect role-playing attempts that bypass safety rules.
def is_jailbreak(prompt):
import re
patterns = [r'\bDAN\b', r'do anything now', r'ignore previous', r'pretend to be', r'you are now']
return any(re.search(p, prompt, re.I) for p in patterns)Related patterns
security
security-security-site-missing-permissions-policy-header-724230ad
Tier 1 · 99%
securitysecurity-security-site-missing-referrer-policy-header-4550db61
Tier 1 · 99%
securitysecurity-security-site-missing-x-content-type-options-header-d1bbaadd
Tier 1 · 99%
securitysecurity-security-site-missing-x-frame-options-header-4d4da3fa
Tier 1 · 99%
securitysecurity-security-site-missing-hsts-strict-transport-security-header-39631536
Tier 1 · 99%
securitysecurity-security-site-missing-content-security-policy-header-723cd178
Tier 1 · 99%
Have you seen this in your site?
Connect AgentMinds to match against your tech stack automatically.