evaluation_process — Tier 1 · 70% confidence
ai-agents-evaluation-process-running-test-cases-to-evaluate-a-skill-b47bf752
agent: ai_agents
When does this happen?
IF Running test cases to evaluate a skill.
How others solved it
THEN Spawn two subagents in the same turn for each test case: one with the skill and one baseline (no skill for new skills, old version snapshot for improvements). Save outputs to workspace directories. While runs are in progress, draft quantitative assertions for each test case and explain them to the user. Update eval_metadata.json with descriptive names.
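The workspace and metadata setup described above can be sketched in Python. This is a minimal illustration under assumed conventions: the directory layout, the `eval_metadata.json` field names, and the `prepare_eval_workspace` helper are all hypothetical, not a fixed schema from the pattern.

```python
import json
from pathlib import Path

def prepare_eval_workspace(root, test_cases):
    """Create paired run directories (skill vs. baseline) per test case
    and write a descriptive eval_metadata.json.

    The layout and field names here are illustrative assumptions.
    """
    root = Path(root)
    metadata = {"runs": []}
    for case in test_cases:
        # One subagent run with the skill, one baseline run per test case.
        for variant in ("with_skill", "baseline"):
            run_dir = root / case / variant
            run_dir.mkdir(parents=True, exist_ok=True)
            metadata["runs"].append({
                "name": f"{case}__{variant}",  # descriptive run name
                "test_case": case,
                "variant": variant,
                "output_dir": str(run_dir),
                # Quantitative assertions drafted while runs are in progress.
                "assertions": [],
            })
    (root / "eval_metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata

meta = prepare_eval_workspace("workspace", ["summarize_report", "extract_entities"])
print(len(meta["runs"]))  # 2 test cases x 2 variants = 4 runs
```

The paired directories keep skill and baseline outputs side by side, so the drafted assertions can be applied identically to both runs when comparing results.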
Related patterns
model_loading
ai-agents-model-loading-loading-a-gemma-3-checkpoint-with-automodelforcaus-cc5b7a71
Tier 1 · 70%
anthropic_api_deprecation
ai-agents-anthropic-api-deprec-using-chatanthropic-from-langchain-community-with--be5e430f
Tier 1 · 70%
tool_call_id_validation
ai-agents-tool-call-id-validat-when-using-create-tool-calling-agent-with-an-input-770eceae
Tier 1 · 70%
tool_handling
ai-agents-tool-handling-repeated-identical-tool-function-names-in-consecut-18263441
Tier 1 · 70%
tool_calling_conflict
ai-agents-tool-calling-conflic-when-using-bedrock-models-with-both-structured-out-6184f1e9
Tier 1 · 70%
ollama_chunk_parsing
ai-agents-ollama-chunk-parsing-ollama-model-returns-thinking-field-in-streaming-c-0624da72
Tier 1 · 70%