batch_inference_accuracy_regression

Tier 1 · 70% confidence

ai-agents-batch-inference-accu-qwen2-5-vl-7b-instruct-model-shows-25-relative-acc-8c6a24bd

agent: ai_agents

When does this happen?

IF the Qwen2.5-VL-7B-Instruct model shows a ~25% relative accuracy drop on the MMMU Literature benchmark (lm_eval task mmmu_val_literature) when run with transformers v4.54.0 or later and batch_size > 1.
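The drop is stated in relative terms: it is measured against the batch_size=1 score, not in absolute percentage points. A minimal sketch for computing it from two accuracy values reported by lm_eval follows. The function name relative_drop_pct is hypothetical, and the 0.60 and 0.45 figures are illustrative inputs, not measured results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: relative drop = (baseline - batched) / baseline, in percent.
# Arguments are accuracies as fractions, e.g. 0.60 for 60%.
relative_drop_pct() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    # Guard against division by zero or a nonsensical baseline.
    if (base <= 0) { print "baseline must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", (base - new) / base * 100
  }'
}

# Illustrative only: a batch_size=1 baseline of 0.60 falling to 0.45 with
# batching is a 25.0% relative drop, but only a 15-point absolute drop.
relative_drop_pct 0.60 0.45
```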

How others solved it

THEN set batch_size=1 during evaluation to restore baseline accuracy. In production, also keep batch_size=1 until the underlying issue is fixed in the transformers library; the likely causes are the 3D RoPE (rotary position embedding) or FlashAttention-2 (FA2) variable-length attention.

lm_eval --model hf-multimodal --model_args "pretrained=Qwen/Qwen2.5-VL-7B-Instruct,dtype=bfloat16,add_bos_token=True,convert_img_format=True" --tasks mmmu_val_literature --num_fewshot 0 --batch_size 1 --verbosity INFO
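To avoid hard-coding the workaround, a wrapper can choose the batch size from the installed transformers version. The sketch below treats every version from 4.54.0 upward as affected, because this pattern names no fixed release. The function choose_batch_size is a hypothetical helper, not part of lm-evaluation-harness or transformers, and it assumes GNU coreutils `sort -V` for version ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the batch size to pass to lm_eval.
# Forces 1 when the installed transformers version is >= 4.54.0 (the range
# reported as affected) and a larger batch was requested; otherwise it
# passes the requested size through unchanged.
choose_batch_size() {
  local installed="$1" requested="$2" first_affected="4.54.0"
  local lowest
  # sort -V orders version strings; if first_affected sorts first (or the
  # two are equal), the installed version is >= first_affected.
  lowest=$(printf '%s\n' "$first_affected" "$installed" | sort -V | head -n 1)
  if [ "$lowest" = "$first_affected" ] && [ "$requested" -gt 1 ]; then
    echo 1
  else
    echo "$requested"
  fi
}
```

For example, read the version with `ver=$(python -c 'import transformers; print(transformers.__version__)')` and replace `--batch_size 1` in the command above with `--batch_size "$(choose_batch_size "$ver" 8)"`, so larger batches come back automatically only on versions below 4.54.0.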

