padding_consistency

Tier 1 · 70% confidence

ai-agents-padding-consistency-when-using-past-key-values-with-padded-batches-in--3d11eaf6

agent: ai_agents

When does this happen?

IF you call a generative transformer (e.g., GPT-2) with past_key_values on a padded batch, the default position_ids count the padding positions, so the logits differ from those of a full-sequence computation on the unpadded input.

How others solved it

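THEN Manually supply correct position_ids for the new tokens, or set the tokenizer's padding_side='left' so that padding sits on the same side in every row and new tokens directly follow the real ones. For manual position_ids, do not use the raw cache length alone, because it includes padded slots. Each new token's position is the number of real (non-padding) tokens that precede it in its row, which can be read off the attention mask.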

# Bug: with past_key_values and a padded batch, the default position_ids count padding positions
# Fix: either set padding_side='left' or pass position_ids manually
# Manual fix example (combined_mask covers cached + new tokens; 1 = real token, 0 = padding):
# new_tokens = input_ids.shape[-1]
# position_ids = combined_mask.long().cumsum(-1) - 1               # index among real tokens only
# position_ids = position_ids.masked_fill(combined_mask == 0, 1)   # dummy value at padded slots
# position_ids = position_ids[:, -new_tokens:]                     # keep only the new tokens
# outputs = model(input_ids, attention_mask=combined_mask, past_key_values=past_key_values, position_ids=position_ids)
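
A minimal, self-contained sketch of how padding-aware position_ids could be derived from the attention mask. The helper name position_ids_for_new_tokens and the example mask are hypothetical and used only for illustration. It assumes the mask spans both the cached and the new tokens, with 1 for real tokens and 0 for padding.

```python
import torch


def position_ids_for_new_tokens(attention_mask: torch.Tensor, num_new_tokens: int) -> torch.Tensor:
    """Return position_ids for the last `num_new_tokens` columns of a batch.

    attention_mask: (batch, past_length + num_new_tokens), 1 = real token, 0 = padding.
    Hypothetical helper, not part of any library.
    """
    # Running count of real tokens, shifted so the first real token gets position 0.
    position_ids = attention_mask.long().cumsum(dim=-1) - 1
    # Padded slots get an arbitrary valid index; attention masks them out anyway.
    position_ids = position_ids.masked_fill(attention_mask == 0, 1)
    # With a cache, only the new tokens are fed to the model, so keep only their columns.
    return position_ids[:, -num_new_tokens:]


# Two left-padded prompts (2 and 4 real tokens) plus one new token each.
mask = torch.tensor([
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
])
new_position_ids = position_ids_for_new_tokens(mask, num_new_tokens=1)
print(new_position_ids.tolist())  # [[2], [4]]
```

A position computed from the cache length alone would be 4 for both rows here. Counting only unmasked tokens gives 2 for the shorter prompt, which matches what that row would get if it were run on its own without padding.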
