lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-02 20:01:25 +00:00

Files

Pepijn 1fb46ab300 annotate: cap embedded-frame budget to fit VLM context (fix 32k overflow)

Switching the plan module to embedded frames (use_video_url=false)
exposed a context overflow: at frames_per_second=2.0 with the old
max_video_frames=128 default, a 480x640 episode embeds ~128 frames ≈
33-39k vision tokens, over the model's 32768 context — every plan call
died with 'Input length exceeds maximum context length' (HTTP 400),
crashing the whole annotation job.

The video_url path never hit this because the server downsampled; the
embedded path sends every sampled frame, so the frame count is a hard
token budget.
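A rough sketch of that token budget, assuming a per-frame cost back-derived from the numbers above (~33-39k tokens for 128 embedded 480x640 frames, i.e. roughly 258-305 tokens per frame). The helper names (vision_tokens, fits_context, max_frames_for_budget) and the reserved_tokens parameter are hypothetical illustrations, not lerobot code, and the per-frame costs are estimates rather than values reported by the model or server.

```python
import math

# Hypothetical per-frame vision-token estimates, back-derived from the commit
# message (~33-39k tokens for 128 embedded 480x640 frames). Not measured values.
TOKENS_PER_FRAME_LOW = 33_000 / 128   # ~258 tokens/frame
TOKENS_PER_FRAME_HIGH = 39_000 / 128  # ~305 tokens/frame
CONTEXT_TOKENS = 32_768               # context length quoted in the HTTP 400 error


def vision_tokens(num_frames: int, tokens_per_frame: float) -> int:
    """Estimated vision tokens for num_frames embedded frames (rounded up)."""
    return math.ceil(num_frames * tokens_per_frame)


def fits_context(num_frames: int, tokens_per_frame: float,
                 reserved_tokens: int = 0, context: int = CONTEXT_TOKENS) -> bool:
    """True if the frames plus tokens reserved for the prompt fit in the context."""
    return vision_tokens(num_frames, tokens_per_frame) + reserved_tokens <= context


def max_frames_for_budget(tokens_per_frame: float, reserved_tokens: int,
                          context: int = CONTEXT_TOKENS) -> int:
    """Largest frame count whose estimated vision tokens fit after the reservation."""
    return max(0, math.floor((context - reserved_tokens) / tokens_per_frame))


if __name__ == "__main__":
    # Old default: even the low estimate for 128 frames exceeds the context.
    print(fits_context(128, TOKENS_PER_FRAME_LOW))
    # New default: 32 frames land in the ~8-10k token range.
    print(vision_tokens(32, TOKENS_PER_FRAME_LOW), vision_tokens(32, TOKENS_PER_FRAME_HIGH))
```

Under these estimates, 128 frames alone cost about 33,000 tokens before any prompt text, which is why the frame count has to act as a hard cap once frames are embedded rather than downsampled server-side.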

Fix:
  * config default max_video_frames 128 -> 32 (~8-10k vision tokens,
    comfortable headroom for the prompt + describe/verify passes).
    Frames are still sampled UNIFORMLY across the whole episode, so
    longer episodes are subsampled, not truncated — full temporal
    coverage preserved, just coarser density.
  * run_hf_job.py: frames_per_second 2.0 -> 1.0, explicit
    --plan.max_video_frames=32, with a comment explaining the token
    budget and the 'do not raise toward 128 with embedded frames' rule.
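The "subsampled, not truncated" behaviour can be illustrated with a minimal sketch, assuming frames are first laid out at frames_per_second and then, if there are more than max_video_frames, evenly spaced indices are chosen across the whole episode. The function sample_frame_times is hypothetical and only illustrates the idea; it is not the sampler used in lerobot's annotation module.

```python
def sample_frame_times(duration_s: float, fps: float, max_frames: int) -> list[float]:
    """Hypothetical sketch: timestamps (seconds) of frames to embed for one episode.

    Frames are laid out at `fps`; if that yields more than `max_frames`, we keep
    `max_frames` of them spread evenly from the first to the last frame
    (subsampling), instead of keeping only the first `max_frames` (truncation).
    """
    if duration_s <= 0 or fps <= 0 or max_frames <= 0:
        return []
    n_at_fps = max(1, int(duration_s * fps))
    candidates = [i / fps for i in range(n_at_fps)]
    if n_at_fps <= max_frames:
        return candidates  # short episode: every sampled frame fits
    if max_frames == 1:
        return [candidates[0]]
    # Evenly spaced indices from 0 to n_at_fps - 1 inclusive.
    step = (n_at_fps - 1) / (max_frames - 1)
    return [candidates[round(k * step)] for k in range(max_frames)]


if __name__ == "__main__":
    times = sample_frame_times(duration_s=120.0, fps=1.0, max_frames=32)
    print(len(times), times[0], times[-1])  # first and last frames of the episode
```

Spacing the kept frames across the full duration keeps both the start and the end of a long episode visible to the plan module; only the temporal density drops as episodes get longer.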

Only the plan module embeds the full episode; VQA (1 frame/tick) and
interjections (4-frame window) were never at risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-06-02 16:02:25 +02:00

annotations

annotate: cap embedded-frame budget to fit VLM context (fix 32k overflow)

2026-06-02 16:02:25 +02:00

backward_compatibility

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

dataset

refactor: support custom progress parquet overlays (#3640)

2026-05-21 14:32:10 +02:00

lekiwi

feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI (#3413)