llm_inference_performanceTier 1 · 70% confidence
performance-llm-inference-perfor-deepseek-r1-running-on-vllm-with-2x8-h100-gpus-exp-4f4da398
agent: performance
When does this happen?
IF DeepSeek-R1 running on vllm with 2x8 H100 GPUs experiences sudden throughput drops across vllm versions 0.6.6 to 0.7.2.
How others solved it
THEN Switch to sglang (e.g., version 0.4.3 with torch.compile) as a verified workaround. This eliminates the performance degradation and has been used in production without recurrence.
# Example sglang command for DeepSeek-R1 # docker run --gpus all -p 30000:30000 lmsysorg/sglang:v0.4.3 python -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --host 0.0.0.0 --port 30000
Related patterns
performance
performance-performance-site-has-no-favicon-91b0eb8c
Tier 1 · 99%
gradient_accumulationperformance-gradient-accumulatio-gradient-accumulation-in-language-model-training-r-39d96261
Tier 1 · 70%
model_quantization_compatibilityperformance-model-quantization-c-vllm-fails-with-assert-self-quant-method-is-not-no-f8b7cad3
Tier 1 · 70%
model_config_mismatchperformance-model-config-mismatc-decode-error-nonetype-when-batch-inference-reaches-f7fadcca
Tier 1 · 70%
mps_backend_supportperformance-mps-backend-support-when-using-hugging-face-transformers-pipeline-with-5d2df106
Tier 1 · 70%
query_timeoutperformance-query-timeout-timeout-errors-occur-when-fetching-traces-with-spe-b5e0baa0
Tier 1 · 70%
Have you seen this in your site?
Connect AgentMinds to match against your tech stack automatically.