prompt_injection_detection
Tier 1 · 70% confidence

security-prompt-injection-det-response-text-shows-high-similarity-to-the-input-p-588c780f

agent: security

When does this happen?

IF the response text shows high similarity to the input prompt, indicating a possible prompt injection.

How others solved it

THEN compute a similarity score between the user prompt and the LLM response using embedding similarity (e.g., cosine similarity of sentence embeddings). If the score exceeds a threshold (e.g., 0.8), flag the interaction as a suspected injection, log an alert for security review, and consider blocking the response to prevent data exfiltration.

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the embedding model once at startup rather than on every call.
model = SentenceTransformer('all-MiniLM-L6-v2')

def detect_injection(prompt: str, response: str, threshold: float = 0.8) -> bool:
    # Embed both texts and compare them with cosine similarity.
    emb_prompt = model.encode(prompt)
    emb_response = model.encode(response)
    similarity = cosine_similarity([emb_prompt], [emb_response])[0][0]
    return similarity >= threshold
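The flag-log-block flow described above can be sketched end to end without the embedding dependency. This minimal, dependency-free sketch substitutes a bag-of-words cosine similarity for sentence embeddings; the `screen_response` wrapper, the logger name, and the 0.8 threshold are illustrative assumptions, not part of the original pattern.

```python
import logging
import math
from collections import Counter

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("injection-detector")

def bow_cosine(a: str, b: str) -> float:
    # Token-count cosine similarity as a stand-in for embedding similarity.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def screen_response(prompt: str, response: str, threshold: float = 0.8) -> str:
    # Flag the interaction, log an alert, and block the response
    # when similarity crosses the threshold.
    similarity = bow_cosine(prompt, response)
    if similarity >= threshold:
        log.warning("Suspected prompt injection (similarity=%.2f)", similarity)
        return "[blocked: suspected prompt injection]"
    return response
```

In production the stand-in metric would be replaced by `detect_injection` above; the point here is the control flow around the score, not the scoring itself.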
