mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-30 18:31:25 +00:00
* Add basic support for PEFT adapter methods

  This change adds support for training policies with far fewer trainable parameters by applying adapter methods such as LoRA to specific parts of the policies, which may in turn allow higher learning rates / batch sizes.

  To make this as accessible as possible I thought it useful to provide defaults for `target_modules` and `modules_to_save`. Currently only SmolVLA has such defaults, but once we agree that this change is useful I will set out to generate more such defaults. While the user can override these settings, they are expected to only change the `peft_method`, `rank` and `init_type` parameters.

* Implement loading of PEFT adapters

  Loading a PEFT adapter is currently done by initializing a policy with the default config and then applying the adapter to the resulting model. This has the obvious drawback that any configuration done during training is not applied to the adapted model.

  Currently the `use_peft` attribute of `PreTrainedConfig` is only set during loading, to signal to the following code that it has to deal with a PEFT adapter. However, we could imagine a scenario where this is already set at training time and stored alongside the adapter.

* Store policy config alongside PEFT checkpoint

  Before this change the PEFT-wrapped policy did not save the policy's config alongside the adapter config / weights, which prevented us from changing the policy config. Now the policy config is saved both in full training and in PEFT training. This change makes loading the PEFT policy adapter much easier as well.

* Add default config for ACT

* Support targets like `all-linear`

* Formatting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Fix failing tests

* Remove PEFT compatibility changes in config

  We'll wait for the PEFT release that fixes this for good.

* Remove `use_peft` parameter from training script

  Instead we make the PEFT config optional, which has the same effect.

* Log adapter config to WandB

* Better documentation for CLI arguments

* Don't unload & merge the PEFT model

  This can make things hard when using quantized layers (the user expects quantized base layers with unquantized adapters, for example; merging defaults to upcasting the layers, leading to higher memory use).

* Correct way of identifying when to save config

* Add CLI end-to-end tests

  Currently there doesn't seem to be any way to test the CLI commands. Since this change mostly happens in those, I thought it best to add a way to test these commands end-to-end. More integrated commands like `lerobot-record` need patching, but standalone commands like training seem to work fine.

* Update default targets

  Removed ACT since it doesn't make sense to fine-tune ACT without having it pretrained beforehand. SmolVLA and Pi0/0.5 are much more sensible targets.

* Clean up loading code

  - Centralized instantiation of the PEFT wrapper in `make_policy` for inference (e.g. in `lerobot-record`)
  - Training a PEFT policy also sets `cfg.use_peft` so that all inference code loading the policy can rely on that attribute to identify whether PEFT loading is needed
  - Modified the RTC example to also include PEFT policies, mostly because this is an example I'm currently exploring

* Make sure push_to_hub works

  Since PEFT only wraps `push_to_hub` and not `push_model_to_hub`, the reference to `self` in `policy.push_model_to_hub` is the unwrapped policy which, of course, doesn't know anything about PEFT. To make the upload process aware of PEFT, we pass the unwrapped policy down to `push_model_to_hub` as a kwarg. This is not ideal but I think it is the best way for now.

* formatting

* Warn when encountering from-scratch-training

* Revamp pretrained model loading

  Quite a few factors had convinced me that the status quo was able to load pretrained models from the PEFT adapter config, but in fact that didn't work.

  This commit fixes the following things:

  - policies wrapped in PEFT will now have a `name_or_path` attribute containing the name or path of the pretrained model we're fine-tuning
  - we further assume that SmolVLA without `pretrained_path` and with `load_vlm_weights==False` must be a user-side error
  - we assume that using PEFT on from-scratch policies must be a user-side error

* Make it possible to unset policy features

  This is necessary to train pre-trained policies on new datasets so that the features are inferred from the new dataset and not from the pretrained policy.

* Use correct loading for PEFT in RTC example

* Make it possible to use PeftModels in eval

* Add test checking that PEFT actually reduces params

* Adapt state/action projections instead of full-finetuning

  There doesn't seem to be a benefit to fully fine-tuning these layers over just adapting them, so we do that instead.

* Disallow PEFT training on non-pretrained policies

  At first I thought it would make sense to have this feature in case you want to fine-tune a pre-trained section, but in the end it causes more trouble than it's worth. It's still possible to allow this in the future when a concrete need arises.

* Add basic documentation

* Formatting

* Add peft as extra dependency, mark tests

  Fast tests currently fail because of the missing dependency.

* Fix pre-commit issues

* Add walx <> peft conflict for uv

* Exclude peft from pi install for now

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
214 lines
9.4 KiB
Python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import builtins
import datetime as dt
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

import draccus
from huggingface_hub import hf_hub_download
from huggingface_hub.errors import HfHubHTTPError

from lerobot import envs
from lerobot.configs import parser
from lerobot.configs.default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
from lerobot.configs.policies import PreTrainedConfig
from lerobot.optim import OptimizerConfig
from lerobot.optim.schedulers import LRSchedulerConfig
from lerobot.utils.hub import HubMixin

TRAIN_CONFIG_NAME = "train_config.json"


@dataclass
class TrainPipelineConfig(HubMixin):
    dataset: DatasetConfig
    env: envs.EnvConfig | None = None
    policy: PreTrainedConfig | None = None
    # Set `output_dir` to where you would like to save all of the run outputs. If the directory already
    # exists and `resume` is false, `validate` raises a FileExistsError instead of overwriting it.
    output_dir: Path | None = None
    job_name: str | None = None
    # Set `resume` to true to resume a previous run. For this to work, `config_path` must point to the
    # `train_config.json` of an existing local run with at least one checkpoint in it.
    # Note that when resuming a run, the default behavior is to use the configuration from the checkpoint,
    # regardless of what's provided with the training command at the time of resumption.
    resume: bool = False
    # `seed` is used for training (e.g. model initialization, dataset shuffling)
    # AND for the evaluation environments.
    seed: int | None = 1000
    # Number of workers for the dataloader.
    num_workers: int = 4
    batch_size: int = 8
    steps: int = 100_000
    eval_freq: int = 20_000
    log_freq: int = 200
    tolerance_s: float = 1e-4
    save_checkpoint: bool = True
    # Checkpoint is saved every `save_freq` training iterations and after the last training step.
    save_freq: int = 20_000
    use_policy_training_preset: bool = True
    optimizer: OptimizerConfig | None = None
    scheduler: LRSchedulerConfig | None = None
    eval: EvalConfig = field(default_factory=EvalConfig)
    wandb: WandBConfig = field(default_factory=WandBConfig)
    peft: PeftConfig | None = None

    # RA-BC (Reward-Aligned Behavior Cloning) parameters
    use_rabc: bool = False  # Enable reward-weighted training
    rabc_progress_path: str | None = None  # Path to precomputed SARM progress parquet file
    rabc_kappa: float = 0.01  # Hard threshold for high-quality samples
    rabc_epsilon: float = 1e-6  # Small constant for numerical stability
    rabc_head_mode: str | None = "sparse"  # For dual-head models: "sparse" or "dense"

    # Rename map for the observation to override the image and state keys
    rename_map: dict[str, str] = field(default_factory=dict)
    checkpoint_path: Path | None = field(init=False, default=None)

    def validate(self) -> None:
        # HACK: We parse the CLI args again here to get the pretrained path, if one was given.
        policy_path = parser.get_path_arg("policy")
        if policy_path:
            # Only load the policy config
            cli_overrides = parser.get_cli_overrides("policy")
            self.policy = PreTrainedConfig.from_pretrained(policy_path, cli_overrides=cli_overrides)
            self.policy.pretrained_path = Path(policy_path)
        elif self.resume:
            # The entire train config is already loaded, we just need to get the checkpoint dir
            config_path = parser.parse_arg("config_path")
            if not config_path:
                raise ValueError(
                    f"A config_path is expected when resuming a run. Please specify path to {TRAIN_CONFIG_NAME}"
                )

            if not Path(config_path).resolve().exists():
                raise NotADirectoryError(
                    f"{config_path=} is expected to be a local path. "
                    "Resuming from the hub is not supported for now."
                )

            policy_dir = Path(config_path).parent
            if self.policy is not None:
                self.policy.pretrained_path = policy_dir
            self.checkpoint_path = policy_dir.parent

        if self.policy is None:
            raise ValueError(
                "Policy is not configured. Please specify a pretrained policy with `--policy.path`."
            )

        if not self.job_name:
            if self.env is None:
                self.job_name = f"{self.policy.type}"
            else:
                self.job_name = f"{self.env.type}_{self.policy.type}"

        if not self.resume and isinstance(self.output_dir, Path) and self.output_dir.is_dir():
            raise FileExistsError(
                f"Output directory {self.output_dir} already exists and resume is {self.resume}. "
                f"Please change your output directory so that {self.output_dir} is not overwritten."
            )
        elif not self.output_dir:
            now = dt.datetime.now()
            train_dir = f"{now:%Y-%m-%d}/{now:%H-%M-%S}_{self.job_name}"
            self.output_dir = Path("outputs/train") / train_dir

        if isinstance(self.dataset.repo_id, list):
            raise NotImplementedError("LeRobotMultiDataset is not currently implemented.")

        if not self.use_policy_training_preset and (self.optimizer is None or self.scheduler is None):
            raise ValueError("Optimizer and Scheduler must be set when the policy presets are not used.")
        elif self.use_policy_training_preset and not self.resume:
            self.optimizer = self.policy.get_optimizer_preset()
            self.scheduler = self.policy.get_scheduler_preset()

        if self.policy.push_to_hub and not self.policy.repo_id:
            raise ValueError(
                "'policy.repo_id' argument missing. Please specify it to push the model to the hub."
            )

        if self.use_rabc and not self.rabc_progress_path:
            # Auto-detect from dataset path
            repo_id = self.dataset.repo_id
            if self.dataset.root:
                self.rabc_progress_path = str(Path(self.dataset.root) / "sarm_progress.parquet")
            else:
                self.rabc_progress_path = f"hf://datasets/{repo_id}/sarm_progress.parquet"

    @classmethod
    def __get_path_fields__(cls) -> list[str]:
        """This enables the parser to load config from the policy using `--policy.path=local/dir`"""
        return ["policy"]

    def to_dict(self) -> dict[str, Any]:
        return draccus.encode(self)  # type: ignore[no-any-return]  # because of the third-party library draccus uses Any as the return type

    def _save_pretrained(self, save_directory: Path) -> None:
        with open(save_directory / TRAIN_CONFIG_NAME, "w") as f, draccus.config_type("json"):
            draccus.dump(self, f, indent=4)

    @classmethod
    def from_pretrained(
        cls: builtins.type["TrainPipelineConfig"],
        pretrained_name_or_path: str | Path,
        *,
        force_download: bool = False,
        resume_download: bool | None = None,
        proxies: dict[Any, Any] | None = None,
        token: str | bool | None = None,
        cache_dir: str | Path | None = None,
        local_files_only: bool = False,
        revision: str | None = None,
        **kwargs: Any,
    ) -> "TrainPipelineConfig":
        model_id = str(pretrained_name_or_path)
        config_file: str | None = None
        if Path(model_id).is_dir():
            if TRAIN_CONFIG_NAME in os.listdir(model_id):
                config_file = os.path.join(model_id, TRAIN_CONFIG_NAME)
            else:
                print(f"{TRAIN_CONFIG_NAME} not found in {Path(model_id).resolve()}")
        elif Path(model_id).is_file():
            config_file = model_id
        else:
            try:
                config_file = hf_hub_download(
                    repo_id=model_id,
                    filename=TRAIN_CONFIG_NAME,
                    revision=revision,
                    cache_dir=cache_dir,
                    force_download=force_download,
                    proxies=proxies,
                    resume_download=resume_download,
                    token=token,
                    local_files_only=local_files_only,
                )
            except HfHubHTTPError as e:
                raise FileNotFoundError(
                    f"{TRAIN_CONFIG_NAME} not found on the HuggingFace Hub in {model_id}"
                ) from e

        cli_args = kwargs.pop("cli_args", [])
        with draccus.config_type("json"):
            return draccus.parse(cls, config_file, args=cli_args)


@dataclass(kw_only=True)
class TrainRLServerPipelineConfig(TrainPipelineConfig):
    # NOTE: In RL, we don't need an offline dataset
    # TODO: Make `TrainPipelineConfig.dataset` optional
    dataset: DatasetConfig | None = None  # type: ignore[assignment] # because the parent class has made its type non-optional