
8 Quality Patterns That Eliminate AI Hallucinations in Production

Tags: quality, ai-agents, hallucinations, rag, production
Every AI system that goes to production faces the same problem: hallucinations. The model returns something confident, well-formatted, and completely wrong. Users trust it. Damage gets done.

Our quality agent has analyzed every recommendation that successfully reduced hallucination rates across connected sites. These 8 patterns showed up consistently — they actually worked.

---

1. Verified Knowledge Layer (Layer 0)

Pattern: Critical questions need a verified knowledge layer that takes precedence over RAG.

The most reliable AI systems don't trust their RAG pipeline for high-stakes questions. They maintain a small "Layer 0" — verified, hand-curated answers for questions where being wrong is expensive.

Example from production: an architectural Q&A system added Layer 0 entries for fire stair widths (120cm minimum) and elevator floor requirements. These verified answers always win, regardless of what the LLM retrieves. Result: zero hallucinations on safety-critical questions.

What to do: Identify the top 20 questions where being wrong has real consequences. Hand-write the correct answers. Make these answers the highest-priority source. Everything else falls through to normal RAG.
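The fall-through order can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `answer_question` entry point and a `rag_answer` callback for the normal pipeline; the Layer 0 entries shown are placeholders, not the production data:

```python
# Layer 0: hand-verified answers that always win over RAG retrieval.
# Keys and answers here are illustrative, not real regulatory entries.
LAYER_0 = {
    "minimum fire stair width": "120cm minimum (hand-verified against the fire code).",
    "elevator floor requirement": "Required above the floor-count threshold (hand-verified).",
}

def answer_question(question: str, rag_answer) -> str:
    """Check the verified layer first; fall through to normal RAG otherwise."""
    key = question.strip().lower()
    if key in LAYER_0:
        return LAYER_0[key]      # verified answer, no LLM involved
    return rag_answer(question)  # everything else goes through the pipeline
```

A real system would match Layer 0 entries by intent or embedding similarity rather than exact strings, but the priority order is the point: the verified layer is consulted before retrieval ever runs.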

---

2. Source Verification in verify_answer

Pattern: Verification needs to compare against actual source documents, not just LLM confidence.

The original verification pattern asked the LLM "are you confident?" and trusted the answer. The improved pattern sends the source documents to verification: "given these sources, is this answer correct?"

This caught a category of hallucinations the confidence-based approach missed: the LLM was confidently citing sources that didn't actually support its claim.

What to do: When you call your verification step, include the source chunks in the prompt. Ask "does the answer match what's in the sources?" not "are you confident?"
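As a sketch, the verification prompt can be assembled like this (the prompt wording and the `build_verification_prompt` name are illustrative, not a fixed API):

```python
def build_verification_prompt(answer: str, source_chunks: list[str]) -> str:
    """Ground verification in the retrieved sources, not model confidence."""
    sources = "\n\n".join(
        f"[Source {i + 1}]\n{chunk}" for i, chunk in enumerate(source_chunks)
    )
    return (
        "Given ONLY the sources below, is the answer fully supported?\n"
        "Reply 'supported' or 'unsupported', and name any unsupported claim.\n\n"
        f"{sources}\n\n[Answer to verify]\n{answer}"
    )
```

The key design choice is that the verifier never sees "how confident are you?" — it only sees the sources and the claim, so a confidently-wrong citation fails the check.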

---

3. Numerical Safety Rule in Prompts

Pattern: Add an explicit prompt rule against fabricating numbers.

Production data showed a specific hallucination pattern: when the model didn't have an exact number, it would invent one that "sounded right." Adding a single line to the prompt — "if you don't have a clear numerical value from sources, say 'no clear data'" — eliminated this.

The model now responds with "net bilgi yok" (no clear data) instead of inventing percentages, measurements, or dates.

What to do: Audit your prompt for instructions about uncertainty. Most prompts say "be helpful" but don't say "if you're guessing about numbers, admit it." Add the explicit rule.
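Mechanically, the rule is just a line appended to the system prompt. A sketch, with illustrative wording:

```python
# One-line guardrail against invented numbers, appended to the system prompt.
NUMERIC_SAFETY_RULE = (
    "If the sources do not contain a clear numerical value for what is asked, "
    "answer 'no clear data' instead of estimating a number."
)

def with_numeric_rule(system_prompt: str) -> str:
    """Attach the numerical safety rule to any existing system prompt."""
    return f"{system_prompt.rstrip()}\n\n{NUMERIC_SAFETY_RULE}"
```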

---

4. Section-Aware Source Parsing

Pattern: Duplicate document sections cause context contamination and hallucinations.

One production system had 892 articles parsed from a regulatory document. Investigation revealed many were duplicate sections with different metadata. The model was getting contradictory chunks from "the same" article and synthesizing wrong answers.

After section-aware parsing that recognized document structure, the count went to 3,311 articles (8,030 chunks). Each article had unique metadata. Hallucinations from "contradictory but identical" sources stopped.

What to do: Audit your document parsing. If your chunks have collisions on (article_id, section), you have a parsing bug. Fix the parser to respect document hierarchy.
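A collision audit is a few lines with a `Counter`. This sketch assumes chunks carry `article_id` and `section` metadata fields; adapt the key to whatever your parser emits:

```python
from collections import Counter

def find_section_collisions(chunks: list[dict]) -> list[tuple]:
    """Flag (article_id, section) keys that appear more than once.
    Duplicates like these feed the model contradictory copies of
    'the same' article and invite synthesized wrong answers."""
    counts = Counter((c["article_id"], c["section"]) for c in chunks)
    return [key for key, n in counts.items() if n > 1]
```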

---

5. Fuzzy Spelling Correction (Performance + Quality Win)

Pattern: Typos cause cache misses AND irrelevant results — fixing typos before search wins twice.

Users type "erdiven" when they mean "merdiven" (stairs in Turkish). The original system did exact match: no results, no cache hit, falls through to expensive LLM call that often hallucinates because the query is unclear.

Adding fuzzy correction (erdiven → merdiven, beonarme → betonarme) before search:

  • Cache hit rate increased
  • Search results became relevant
  • Hallucinations dropped because the LLM got clear, normalized queries
  • The correction had zero API cost (purely client-side)

What to do: Add a normalize_query step before everything else. Handle typos, ASCII transliteration, accent removal. It's cheap, fast, and prevents downstream hallucinations.
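The standard library's `difflib` is enough for a client-side sketch of this. The vocabulary below is an illustrative subset; in practice it would be drawn from the indexed corpus, and `correct_token` is a hypothetical name:

```python
import difflib

# Known-good vocabulary drawn from the indexed corpus (illustrative subset).
VOCABULARY = ["merdiven", "betonarme", "asansör", "yangın"]

def correct_token(token: str, cutoff: float = 0.8) -> str:
    """Snap a possibly misspelled token to the closest vocabulary word.
    Runs entirely client-side: no API call, and corrected queries
    normalize to the same cache key as correctly spelled ones."""
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token
```

The `cutoff` threshold is the knob to watch: too low and unrelated words get "corrected," too high and real typos slip through.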

---

6. ASCII to Native Character Normalization

Pattern: Search systems break when users mix ASCII and native characters.

Turkish users sometimes type "yuksekligi" instead of "yüksekliği." Same word, different bytes. Without normalization, these are different searches with different results.

The fix: integrate the ASCII_TO_TR mapping inside the normalize_question step, not as a separate function. Single source of truth, consistent behavior across the pipeline.

What to do: Identify the character substitutions your users make. Build them into a single normalization function. Apply it everywhere search happens — including embedding generation.
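One way to build the mapping is to fold each vocabulary word to ASCII and invert the result, rather than guessing substitutions per character (a blind `u → ü` rule would corrupt words that legitimately contain `u`). A sketch, with an illustrative vocabulary; note that Turkish dotless `ı` has no Unicode decomposition, so it needs an explicit entry:

```python
import unicodedata

# NFD decomposition handles ç/ğ/ö/ş/ü, but 'ı' does not decompose,
# so it gets an explicit fold entry.
FOLD = str.maketrans({"ı": "i", "İ": "I"})

def ascii_fold(word: str) -> str:
    """Strip diacritics to produce the ASCII form users actually type."""
    decomposed = unicodedata.normalize("NFD", word.translate(FOLD))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Build the ascii -> native lookup once from the indexed vocabulary.
VOCAB = ["yüksekliği", "merdiven", "betonarme"]
ASCII_TO_TR = {ascii_fold(w): w for w in VOCAB}

def normalize_question(question: str) -> str:
    """Map ASCII-typed tokens back to their native spelling where known."""
    return " ".join(ASCII_TO_TR.get(ascii_fold(t), t) for t in question.split())
```

Because the same `normalize_question` runs before cache lookup, search, and embedding generation, "yuksekligi" and "yüksekliği" collapse to one query everywhere.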

---

7. Topic Priority Ranking

Pattern: Generic results compete with specific results — give specific topics explicit priority.

A regulatory Q&A system had a problem: queries about "Tip İmar Yönetmeliği" (a specific construction standard) were returning results from generic urban planning documents instead. The model would then hallucinate because it was trying to synthesize across mismatched contexts.

The fix: explicit topic priority. When a query matches a specific known topic, results from that topic's source rank highest. Generic results only show if no specific match.

What to do: Identify your "specific topics" — areas where users want exact, authoritative answers. Tag your sources by topic. Add a re-ranking step that boosts topic matches.

---

8. Cross-Pipeline Coordination (Performance + Quality)

Pattern: Quality fixes often need performance changes — coordinate or both fail.

The fuzzy spelling correction wasn't just a quality win; it also required coordination with the performance team, because it changed the embedding pipeline. Without coordination, you fix the search but break the cache, or fix the cache but break the embeddings.

The pattern that emerged: any change touching the search pipeline needs both Quality and Performance sign-off. Single-team changes to shared infrastructure cause regressions.

What to do: Identify your shared infrastructure (caches, embeddings, search). Require multi-team review for changes there. The 5 minutes of coordination prevents weeks of debugging.

---

The Meta-Pattern

These 8 patterns share something: they treat hallucinations as a system problem, not a model problem. None of them required a better LLM. None required more compute. They required better verification, better preprocessing, better source management.

The model is fine. The pipeline around the model is where hallucinations live.

Connect to AgentMinds — get hallucination patterns and 2,500+ more proven solutions for your AI system.

Ready to try AgentMinds?

Scan your site for free. No signup required.

Scan Your Site