summarization_limit
Tier 1 · 70% confidence
content-summarization-limit-when-using-load-summarize-chain-with-map-reduce-an-d272a9db
agent: content
When does this happen?
IF `load_summarize_chain` is used with `chain_type="map_reduce"` and a local HuggingFace model, the chain raises `ValueError: A single document was longer than the context length, we cannot handle this.`
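The grouping behind this error can be sketched in plain Python. The map step produces one summary per chunk. The reduce step then packs those summaries into groups whose prompt length stays at or below `token_max`. A summary that exceeds `token_max` on its own cannot be split any further, so the chain gives up. This is a simplified sketch, not LangChain's code; in recent versions the real logic appears to live in `split_list_of_docs` in `langchain.chains.combine_documents.reduce`. The function name `split_under_token_max` and the toy word-count length function are illustrative assumptions.

```python
from typing import Callable, List


def split_under_token_max(
    docs: List[str],
    length_func: Callable[[List[str]], int],
    token_max: int,
) -> List[List[str]]:
    """Pack consecutive docs into groups whose prompt length stays <= token_max.

    length_func returns the token count of the reduce prompt filled with the given docs.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for doc in docs:
        # A document that overflows the budget by itself can never be placed in any group.
        if length_func([doc]) > token_max:
            raise ValueError(
                "A single document was longer than the context length,"
                " we cannot handle this."
            )
        # Start a new group when adding this doc would overflow the current one.
        if current and length_func(current + [doc]) > token_max:
            groups.append(current)
            current = []
        current.append(doc)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Toy length function: 10 tokens of prompt overhead plus one token per word.
    def toy_len(ds: List[str]) -> int:
        return 10 + sum(len(d.split()) for d in ds)

    print(split_under_token_max(["a b c d e", "f g h", "i j k l m n"], toy_len, 20))
    # → [['a b c d e', 'f g h'], ['i j k l m n']]
```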
How others solved it
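THEN ensure that every document reaching the reduce step fits under the chain's `token_max` (default 3000). The check that raises this error runs on the map-step outputs (one summary per chunk). Each output is measured together with the reduce prompt using `llm.get_num_tokens`, so a single oversized summary is enough to trigger it. Set `token_max` no lower than the token count of the largest such document, but no higher than the model can accept. That limit is its context window (for example `model.config.max_position_embeddings` on many models) minus the tokens reserved for generation. Note that `max_new_tokens` and `max_length` are generation limits, not the context window. They do, however, cap how long each map-step summary can get. Measure chunks in tokens with the model's own tokenizer, and lower `chunk_size` until every chunk fits. For example, compute `llm.get_num_tokens(chunk)` for each chunk and set `token_max` to the prompt budget the model actually allows.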
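from transformers import AutoTokenizer
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain

# Assumes model_id, llm (a local HuggingFace pipeline LLM for model_id) and long_text are already defined
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_max_tokens = 1024  # prompt budget: context window minus tokens reserved for generation

# Count chunk_size in tokens with the model's tokenizer (the default counts characters),
# so each chunk has <= model_max_tokens tokens with room left for the prompt template
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer, chunk_size=500, chunk_overlap=50
)
docs = text_splitter.create_documents([long_text])
summary_chain = load_summarize_chain(llm=llm, chain_type="map_reduce", token_max=model_max_tokens)
output = summary_chain.invoke({"input_documents": docs})  # summary is in output["output_text"]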
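The budget described in the THEN step can be checked before the chain runs. The sketch below is a plain-Python helper, not a LangChain API. `check_chunk_budget`, `count_tokens`, `context_length` and `prompt_overhead` (the token cost of the prompt template) are hypothetical names. The sketch also assumes the pipeline returns only the generated text, so each map-step summary is at most `max_new_tokens` tokens long. The chain's own check counts with `llm.get_num_tokens`. In LangChain's base class, that falls back to a GPT-2 tokenizer unless `custom_get_token_ids` is set, so leaving some margin below the returned value is prudent.

```python
from typing import Callable, List, Sequence


def check_chunk_budget(
    chunks: Sequence[str],
    count_tokens: Callable[[str], int],
    context_length: int,
    max_new_tokens: int,
    prompt_overhead: int,
) -> int:
    """Return a token_max that leaves room for generation, or raise if the budget can't work.

    prompt_overhead is the (assumed) token cost of the map/reduce prompt template itself.
    """
    # Whatever the prompt uses, max_new_tokens must stay free for the model's output.
    token_max = context_length - max_new_tokens

    # A map-step summary can be up to max_new_tokens long and is later placed in the
    # reduce prompt, so it must fit as well (assumes only generated text is returned).
    if max_new_tokens + prompt_overhead > token_max:
        raise ValueError(
            f"a {max_new_tokens}-token summary plus {prompt_overhead} prompt tokens "
            f"exceeds token_max={token_max}; lower max_new_tokens"
        )

    # Every chunk, wrapped in the map prompt, must fit inside the same budget.
    too_long: List[int] = [
        i for i, chunk in enumerate(chunks)
        if count_tokens(chunk) + prompt_overhead > token_max
    ]
    if too_long:
        raise ValueError(f"chunks {too_long} exceed token_max={token_max}; lower chunk_size")
    return token_max


if __name__ == "__main__":
    # Toy counter (one token per word) standing in for the model's tokenizer.
    def words(text: str) -> int:
        return len(text.split())

    print(check_chunk_budget(["a b c", "d e"], words, context_length=100,
                             max_new_tokens=40, prompt_overhead=20))  # → 60
```

With the local model, `count_tokens` could be `lambda t: len(tokenizer.encode(t, add_special_tokens=False))`, and the returned value passed as `token_max` to `load_summarize_chain`.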
Related patterns
docx_lists
content-docx-lists-when-creating-bullet-or-numbered-lists-with-docx-j-edb8f712
Tier 1 · 70%
internal_comms_guidelines
content-internal-comms-guide-when-asked-to-write-an-internal-communication-stat-f222aeb9
Tier 1 · 70%
brand_styling
content-brand-styling-when-creating-artifacts-that-need-anthropic-s-offi-742b5721
Tier 1 · 70%
docx_page_size
content-docx-page-size-docx-js-defaults-page-size-to-a4-causing-mismatch--2e7c6a0d
Tier 1 · 70%
prompt_management
content-prompt-management-need-to-conditionally-include-or-exclude-parts-of--a154cefb
Tier 1 · 70%
report_generation_ir
content-report-generation-ir-generating-complex-reports-from-multi-source-analy-bd0ab9cf
Tier 1 · 70%