lerobot-clone/profiling/model_profiling_specs.json at da7da741f14fbf08da71f0e6e507d6d168ade663

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-01 03:11:29 +00:00

Pepijn da7da741f1 fix(profiling): use SGD for pi0/pi05/pi0_fast and free CUDA cache after deterministic forward

For 4B-parameter models, Adam optimizer states (exp_avg + exp_avg_sq)
require ~16GB on top of model params and gradients, which pushes total
memory past the 22GB GPU. Plain SGD (no momentum buffer) keeps no
optimizer state, and profiling only measures forward/backward timing
anyway.
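
The ~16GB figure can be reproduced with a rough back-of-the-envelope sketch. The commit does not state the dtype of the optimizer states, so the 2 bytes per element below (bf16/fp16) is an assumption; with fp32 states the same arithmetic gives twice as much. The function and constant names are hypothetical and are not taken from the lerobot code.

```python
def optimizer_state_bytes(num_params, bytes_per_element, states_per_param):
    """Rough size of optimizer state: one tensor per state, each the size of the params."""
    return num_params * bytes_per_element * states_per_param


NUM_PARAMS = 4_000_000_000  # "4B parameter models" from the commit message
BYTES_PER_ELEMENT = 2       # ASSUMPTION: states stored in bf16/fp16 (not stated in the commit)

# Adam keeps two states per parameter: exp_avg and exp_avg_sq.
adam_bytes = optimizer_state_bytes(NUM_PARAMS, BYTES_PER_ELEMENT, states_per_param=2)

# SGD without momentum keeps no per-parameter state.
sgd_bytes = optimizer_state_bytes(NUM_PARAMS, BYTES_PER_ELEMENT, states_per_param=0)

print(f"Adam state: {adam_bytes / 1e9:.1f} GB")  # Adam state: 16.0 GB
print(f"SGD state:  {sgd_bytes / 1e9:.1f} GB")   # SGD state:  0.0 GB
```

This counts optimizer state only; parameters, gradients, and activations come on top of it.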

Also adds torch.cuda.empty_cache() after deterministic forward to release
transient memory before the training loop starts.
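
A minimal sketch of the pattern described above, assuming PyTorch is installed. The helper name `deterministic_forward_then_free` and its arguments are hypothetical and do not reflect the actual profiling script. Note that `torch.cuda.empty_cache()` only returns cached blocks that are no longer in use to the driver; tensors that are still referenced stay allocated, which is why the output is copied to the CPU and dropped first.

```python
import torch


def deterministic_forward_then_free(model, batch):
    """Run one no-grad forward pass, then release unused cached CUDA memory."""
    model.eval()
    with torch.no_grad():
        out = model(batch)
    result = out.detach().cpu()  # keep only a CPU copy of the output
    del out                      # drop the device-side reference
    if torch.cuda.is_available():
        # Hand unused cached blocks back before the training loop allocates.
        torch.cuda.empty_cache()
    model.train()
    return result


# Hypothetical usage with a tiny stand-in model (runs on CPU as well).
model = torch.nn.Linear(4, 2)
batch = torch.ones(1, 4)
reference = deterministic_forward_then_free(model, batch)
```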

Made-with: Cursor

2026-04-16 16:09:56 +02:00

5.6 KiB
