lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-01 11:21:27 +00:00

Author	SHA1	Message	Date
Pepijn	a515eadc96	refactor(profiling): consolidate into single module. Unify the profiling subsystem into one file per reviewer request. Before (4 files): src/lerobot/utils/profiling_utils.py (399 LOC); scripts/ci/run_model_profiling.py (337 LOC); profiling/model_profiling_specs.json (181 LOC); tests/scripts/test_model_profiling.py (423 LOC). After (2 files): src/lerobot/utils/model_profiling.py (758 LOC: TrainingProfiler + CI orchestrator + inline POLICY_SPECS); tests/test_model_profiling.py (315 LOC). Net: -267 LOC and 4 files → 2. All functionality preserved: per-step forward/backward/optimizer timings, torch profiler tables + chrome traces, deterministic-forward fingerprint, HF Hub result upload, and the same CLI surface. Changes: Collapse `_StepTimingCollector` into inline attributes on `TrainingProfiler` (no separate class). Drop `ProfilingSpec` dataclass; specs are plain dicts. Inline the JSON matrix as a module-level `POLICY_SPECS` dict, so there is one less file to keep in sync with the training args. CI workflow invokes `python -m lerobot.utils.model_profiling` in place of the standalone script. Tests import `lerobot.utils.model_profiling` directly instead of loading a script-by-path. Removed JSON schema tests that no longer apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 21:31:17 +02:00
Pepijn	a6dd28e8b4	fix(profiling): tolerate groot dep-install failure. groot's only policy-specific dependency is flash-attn, which has no prebuilt wheel for torch 2.10 and requires nvcc to build from source. The CI image is based on nvidia/cuda:12.4.1-base, which ships the CUDA runtime but not the compiler toolkit, so the source build fails with `/usr/local/cuda/bin/nvcc: No such file or directory`. The repo's own pyproject.toml already carries a TODO acknowledging this: gr00t needs bespoke flash-attn install steps. Treat this as an environmental limitation rather than a regression: dep-install failures for groot are logged via `::warning::` and skip the policy without failing the job. Dep-install failures for any other policy remain fatal, so real regressions still surface. Made-with: Cursor	2026-04-16 21:15:14 +02:00
Pepijn	00e9defb80	fix(profiling): build flash-attn without isolation for groot. groot depends on flash-attn, which fails to build in uv's default isolated build env because it doesn't declare torch as a build-time dependency. Torch is a core lerobot dep and is already present in the target venv when groot is synced, so we can safely disable build isolation just for flash-attn. The flag is a no-op for policies that don't pull in flash-attn. Made-with: Cursor	2026-04-16 20:21:58 +02:00
Pepijn	d483dd4c4b	feat(profiling): profile groot, xvla, diffusion, wall_x on PRs. Add groot, xvla, diffusion and wall_x (wall-oss-flow) to the smoke profiling filter and switch the runner to per-policy dependency resolution. Each policy now gets its own `uv sync --extra <policy>` pass followed by a profiling run, so heavy or conflicting extras (flash-attn, peft, diffusers, etc.) can never block another policy's profiling. A failure in one policy is logged and surfaces a non-zero exit at the end instead of aborting the matrix. Made-with: Cursor	2026-04-16 19:04:27 +02:00
Pepijn	8ece10e484	feat(ci): profile more models in pr smoke runs	2026-04-16 14:49:37 +02:00
Pepijn	ddeb216ab9	fix(ci): skip hub publish for pr profiling runs	2026-04-16 14:38:43 +02:00
Pepijn	d46d67f75d	fix(profiling): forward GIT_REF + PR_NUMBER into Docker container. The previous commit moved these expressions from inline shell expansion to job-level env: vars, but the profiling script runs inside a Docker container. Job-level env vars are only visible in the runner, not inside the container; they need explicit -e flags on the docker run command (same pattern as HOST_GIT_COMMIT, which was already forwarded). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:38:13 +02:00
Pepijn	b746cd3c61	fix(profiling): sort import + move expressions to env vars for zizmor. Pre-commit Quality gate flagged two issues: (1) ruff/isort: `from numbers import Real` must sort after `from collections.abc import Callable` (stdlib alphabetical order). (2) zizmor (high): `github.head_ref`, `github.ref_name`, `github.event.inputs.git_ref`, and `github.event.pull_request.head.sha` were expanded directly in `run:` shell blocks, which zizmor flags as attacker-controllable. Move all four into job-level `env:` vars (GIT_REF, PR_NUMBER, HOST_GIT_COMMIT) so the shell only sees env-var references, the same pattern the workflow already uses for PROFILE_MODE, POLICY_FILTER, etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:30:13 +02:00
Pepijn	b27e838376	fix(profiling): publish preview rows to existing dataset	2026-04-16 11:54:35 +02:00
Pepijn	40470648d1	feat(profiling): publish preview runs for dashboard debugging	2026-04-16 10:54:34 +02:00
Pepijn	9dc38d9993	fix(ci): isolate torch cache in profiling job	2026-04-16 09:32:16 +02:00
Pepijn	3922f81791	fix(ci): set HF_LEROBOT_HOME in profiling job	2026-04-15 23:35:27 +02:00
Pepijn	e1b22ed1c4	fix(ci): set torchinductor cache dir in profiling job	2026-04-15 22:55:31 +02:00
Pepijn	f2d0f04dd0	fix(ci): isolate profiling container home dirs	2026-04-15 22:51:22 +02:00
Pepijn	3ea722c6c0	fix(ci): run profiling container as runner user	2026-04-15 22:47:29 +02:00
Pepijn	48660e7a7c	fix(ci): avoid host shell expansion in policy error	2026-04-15 22:42:34 +02:00
Pepijn	c94fe868c9	fix(ci): install only profiling policy extras	2026-04-15 22:38:37 +02:00
Pepijn	d4f27cfb6e	fix(ci): restore docker env line continuation	2026-04-15 22:33:14 +02:00
Pepijn	1a2aec1b04	feat(profiling): add weekly model profiling	2026-04-15 22:31:44 +02:00

19 Commits
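
The failure handling described in commits d483dd4c4b and a6dd28e8b4 (one dependency install plus one profiling run per policy, a tolerated install failure for groot, and a non-zero exit only at the end of the matrix) could be sketched roughly as below. This is an illustration, not the code in `lerobot.utils.model_profiling` or the CI workflow: `run_matrix`, `TOLERATED_INSTALL_FAILURES`, and the `install`/`profile`/`log` callables are hypothetical names, and the sketch assumes that a "fatal" install failure means the policy is recorded as failed rather than the loop aborting immediately.

```python
from collections.abc import Callable, Iterable

# Hypothetical name: policies whose dependency install may fail
# without failing the whole job (groot, per commit a6dd28e8b4).
TOLERATED_INSTALL_FAILURES = frozenset({"groot"})


def run_matrix(
    policies: Iterable[str],
    install: Callable[[str], bool],
    profile: Callable[[str], bool],
    log: Callable[[str], None] = print,
) -> int:
    """Run install + profile for each policy; return a process exit code.

    `install` and `profile` return True on success. Every policy is
    attempted; the exit code is 1 if any non-tolerated step failed.
    """
    failed: list[str] = []
    for policy in policies:
        if not install(policy):
            if policy in TOLERATED_INSTALL_FAILURES:
                # Environmental limitation: warn and skip, do not fail the job.
                log(f"::warning::dependency install failed for {policy}; skipping")
                continue
            log(f"::error::dependency install failed for {policy}")
            failed.append(policy)
            continue
        if not profile(policy):
            log(f"::error::profiling failed for {policy}")
            failed.append(policy)
    # Surface failures once, after the whole matrix has been attempted.
    return 1 if failed else 0
```

In a real runner, `install` might wrap something like `subprocess.run(["uv", "sync", "--extra", policy]).returncode == 0`, and the return value would be passed to `sys.exit`. Keeping the two steps as injected callables means the loop can be exercised without installing anything.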