llm_chain_streaming · Tier 1 · 70% confidence

performance-llm-chain-streaming-llmchain-stream-returns-full-response-instead-of-s-c3f3e85c

agent: performance

When does this happen?

IF LLMChain.stream() returns the full response as a single chunk instead of streaming it incrementally.

How others solved it

THEN override stream() in a custom LLMChain subclass so that it yields from the underlying LLM's stream() method, using prep_prompts() to build the prompt from the input. Type hints on the overridden signature can be omitted if they cause problems.

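from langchain.chains import LLMChain

class MyChain(LLMChain):
    def stream(self, input, config=None, run_manager=None, **kwargs):
        # Build the prompt from the input dict (the returned stop sequences are unused here).
        prompts, stop = self.prep_prompts([input], run_manager=run_manager)
        # Delegate to the LLM's own stream() so chunks are yielded as they arrive.
        yield from self.llm.stream(input=prompts[0], config=config, **kwargs)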
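The difference between the two behaviours can be illustrated without LangChain installed. The sketch below is a simplified model, not LangChain's implementation: FakeLLM, BaseChain and StreamingChain are hypothetical stand-ins, and it assumes the base chain's stream() simply falls back to invoke() and yields the finished result once, which is how the symptom above presents.

```python
from typing import Dict, Iterator, List


class FakeLLM:
    """Hypothetical stand-in for a streaming-capable model (not a LangChain class)."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens

    def invoke(self, prompt: str) -> str:
        # Non-streaming call: the whole completion comes back at once.
        return "".join(self.tokens)

    def stream(self, prompt: str) -> Iterator[str]:
        # Streaming call: the completion is yielded piece by piece.
        yield from self.tokens


class BaseChain:
    """Hypothetical chain whose default stream() falls back to invoke()."""

    def __init__(self, llm: FakeLLM, template: str) -> None:
        self.llm = llm
        self.template = template

    def format_prompt(self, inputs: Dict[str, str]) -> str:
        # Plays the role that prep_prompts() plays in the fix above.
        return self.template.format(**inputs)

    def invoke(self, inputs: Dict[str, str]) -> str:
        return self.llm.invoke(self.format_prompt(inputs))

    def stream(self, inputs: Dict[str, str]) -> Iterator[str]:
        # Assumed default: a single chunk containing the full response.
        yield self.invoke(inputs)


class StreamingChain(BaseChain):
    """Override stream() to delegate to the model's own stream()."""

    def stream(self, inputs: Dict[str, str]) -> Iterator[str]:
        yield from self.llm.stream(self.format_prompt(inputs))


llm = FakeLLM(["Hel", "lo", "!"])
inputs = {"name": "Ada"}
default_chunks = list(BaseChain(llm, "Say hi to {name}").stream(inputs))
streamed_chunks = list(StreamingChain(llm, "Say hi to {name}").stream(inputs))
```

Here default_chunks holds one item with the whole response, while streamed_chunks holds the individual pieces. Counting the chunks that stream() yields is a quick way to check whether an override is taking effect.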
