lerobot-clone

Mirror of https://github.com/huggingface/lerobot.git, synced 2026-06-02 03:41:25 +00:00

Author	SHA1	Message	Date
Pepijn	9a298524ca	fix: pass video_metadata via process_vision_info for correct position embeddings The Qwen3.5 processor needs video_metadata (fps, frame indices) to compute temporal position embeddings. Use return_video_metadata=True which embeds metadata inside the video tensors as (tensor, metadata) tuples, and return_video_kwargs=True which returns {'do_sample_frames': False} without the problematic fps list. Made-with: Cursor	2026-03-30 17:23:44 +02:00
Pepijn	002a9dd0b9	fix: use do_sample_frames=False instead of video_kwargs fps list The Qwen3.5 processor expects fps as a scalar, not a list, so passing video_kwargs with fps=[...] fails validation. Since process_vision_info already handles frame sampling, we only need do_sample_frames=False to tell the processor to use the pre-sampled frames as-is. Made-with: Cursor	2026-03-30 16:55:46 +02:00
Pepijn	e40985b013	fix: pass video_kwargs from process_vision_info to Qwen processor The Qwen processor needs fps metadata (via return_video_kwargs=True) to compute correct temporal position embeddings. Without it, the processor defaults to fps=24 regardless of the actual video fps, causing shape mismatches between expected and actual video tokens. Made-with: Cursor	2026-03-30 16:50:34 +02:00
Pepijn	d03200bdb3	fix: force torchvision video backend instead of cv2 bypass Replace manual cv2 frame reading with FORCE_QWENVL_VIDEO_READER=torchvision env var. The torchvision backend (PyAV) properly reads video metadata and respects the fps parameter, avoiding the torchcodec fps=24 default issue. Made-with: Cursor	2026-03-30 16:42:52 +02:00
Pepijn	ac41cd6672	fix: bypass torchcodec video decoding by pre-reading frames via cv2 When torchcodec is installed, qwen-vl-utils ignores the fps parameter and defaults to 24fps if video metadata is missing, causing shape mismatches. Fix by reading video frames directly as PIL images and passing them to the processor, bypassing torchcodec entirely. Made-with: Cursor	2026-03-30 16:03:26 +02:00
Pepijn	9b211a45d6	fix: disable thinking mode in Qwen35VL single-episode fallback path The single-episode `segment_skills` method was missing `enable_thinking=False` in `apply_chat_template`, causing the model to output reasoning traces instead of JSON when the batch path fails and falls back to per-episode processing. Made-with: Cursor	2026-03-30 15:31:18 +02:00
root	a6387da464	add license	2026-03-11 23:14:22 +00:00
Jade Choghari	0328b3f4aa	Update src/lerobot/data_processing/data_annotations/vlm_annotations.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Jade Choghari <chogharijade@gmail.com>	2026-03-11 16:10:37 -07:00
root	819c1b9710	add tests/fixes	2026-03-11 22:49:06 +00:00
root	f0848c6887	add subtask	2026-03-11 19:51:48 +00:00

10 Commits