docs/source/hilserl.mdx

# HIL-SERL Real Robot Training Workflow Guide

In this tutorial you will go through the full Human-in-the-Loop Sample-Efficient Reinforcement Learning (HIL-SERL) workflow using LeRobot. You will master training a policy with RL on a real robot in just a few hours.

HIL-SERL is a sample-efficient reinforcement learning algorithm that combines human demonstrations with online learning and human interventions. The approach starts from a small set of human demonstrations, uses them to train a reward classifier, and then employs an actor-learner architecture where humans can intervene during policy execution to guide exploration and correct unsafe behaviors. In this tutorial, you'll use a gamepad to provide interventions and control the robot during the learning process.

It combines three key ingredients: 1. **Offline demonstrations & reward classifier:** a handful of human-teleop episodes plus a vision-based success detector give the policy a shaped starting point. 2. **On-robot actor / learner loop with human interventions:** a distributed Soft Actor Critic (SAC) learner updates the policy while an actor explores on the physical robot; the human can jump in at any time to correct dangerous or unproductive behaviour. 3. **Safety & efficiency tools:** joint/end-effector (EE) bounds, crop region of interest (ROI) preprocessing and WandB monitoring keep the data useful and the hardware safe.

Together these elements let HIL-SERL reach near-perfect task success and faster cycle times than imitation-only baselines.

<p align="center">
  <img
    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/hilserl-main-figure.png"
    alt="HIL-SERL workflow"
    title="HIL-SERL workflow"
    width="100%"
  ></img>
</p>

<p align="center">
  <i>HIL-SERL workflow, Luo et al. 2024</i>
</p>

This guide provides step-by-step instructions for training a robot policy using LeRobot's HilSerl implementation to train on a real robot.

## What do I need?

- A gamepad (recommended) or keyboard to control the robot
- A Nvidia GPU
- A real robot with a follower and leader arm (optional if you use the keyboard or the gamepad)
- A URDF file for the robot for the kinematics package (check `lerobot/common/model/kinematics.py`)

## What kind of tasks can I train?

One can use HIL-SERL to train on a variety of manipulation tasks. Some recommendations:

- Start with a simple task to understand how the system works.
  - Push cube to a goal region
  - Pick and lift cube with the gripper
- Avoid extremely long horizon tasks. Focus on tasks that can be completed in 5-10 seconds.
- Once you have a good idea of how the system works, you can try more complex tasks and longer horizons.
  - Pick and place cube
  - Bimanual tasks to pick objects with two arms
  - Hand-over tasks to transfer objects from one arm to another
  - Go crazy!

## Install LeRobot with HIL-SERL

To install LeRobot with HIL-SERL, you need to install the `hilserl` extra.

```bash
pip install -e ".[hilserl]"
```

## Real Robot Training Workflow

### Understanding Configuration

The training process begins with proper configuration for the HILSerl environment. The configuration class of interest is `HILSerlRobotEnvConfig` in `lerobot/envs/configs.py`. Which is defined as:

<!-- prettier-ignore-start -->
```python
class HILSerlRobotEnvConfig(EnvConfig):
    robot: RobotConfig | None = None    # Main robot agent (defined in `lerobot/robots`)
    teleop: TeleoperatorConfig | None = None    # Teleoperator agent, e.g., gamepad or leader arm, (defined in `lerobot/teleoperators`)
    wrapper: EnvTransformConfig | None = None    # Environment wrapper settings; check `lerobot/scripts/server/gym_manipulator.py`
    fps: int = 10    # Control frequency
    name: str = "real_robot"    # Environment name
    mode: str = None    # "record", "replay", or None (for training)
    repo_id: str | None = None    # LeRobot dataset repository ID
    dataset_root: str | None = None    # Local dataset root (optional)
    task: str = ""    # Task identifier
    num_episodes: int = 10    # Number of episodes for recording
    episode: int = 0    # episode index for replay
    device: str = "cuda"    # Compute device
    push_to_hub: bool = True    # Whether to push the recorded datasets to Hub
    pretrained_policy_name_or_path: str | None = None    # For policy loading
    reward_classifier_pretrained_path: str | None = None    # For reward model
    number_of_steps_after_success: int = 0    # For reward classifier, collect more positive examples after a success to train a classifier
```
<!-- prettier-ignore-end -->

### Finding Robot Workspace Bounds

Before collecting demonstrations, you need to determine the appropriate operational bounds for your robot.

This helps simplify the problem of learning on the real robot in two ways: 1) by limiting the robot's operational space to a specific region that solves the task and avoids unnecessary or unsafe exploration, and 2) by allowing training in end-effector space rather than joint space. Empirically, learning in joint space for reinforcement learning in manipulation is often a harder problem - some tasks are nearly impossible to learn in joint space but become learnable when the action space is transformed to end-effector coordinates.

**Using find_joint_limits.py**

This script helps you find the safe operational bounds for your robot's end-effector. Given that you have a follower and leader arm, you can use the script to find the bounds for the follower arm that will be applied during training.
Bounding the action space will reduce the redundant exploration of the agent and guarantees safety.

```bash
python -m lerobot.scripts.find_joint_limits \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58760431541 \
    --robot.id=black \
    --teleop.type=so100_leader \
    --teleop.port=/dev/tty.usbmodem58760431551 \
    --teleop.id=blue
```

**Workflow**

1. Run the script and move the robot through the space that solves the task
2. The script will record the minimum and maximum end-effector positions and the joint angles and prints them to the console, for example:
   ```
   Max ee position [0.2417 0.2012 0.1027]
   Min ee position [0.1663 -0.0823 0.0336]
   Max joint positions [-20.0, -20.0, -20.0, -20.0, -20.0, -20.0]
   Min joint positions [50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
   ```
3. Use these values in the configuration of your teleoperation device (TeleoperatorConfig) under the `end_effector_bounds` field

**Example Configuration**

```json
"end_effector_bounds": {
    "max": [0.24, 0.20, 0.10],
    "min": [0.16, -0.08, 0.03]
}
```

### Collecting Demonstrations

With the bounds defined, you can safely collect demonstrations for training. Training RL with off-policy algorithm allows us to use offline datasets collected in order to improve the efficiency of the learning process.

**Setting Up Record Mode**

Create a configuration file for recording demonstrations (or edit an existing one like [env_config_so100.json](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/env_config_so100.json)):

1. Set `mode` to `"record"`
2. Specify a unique `repo_id` for your dataset (e.g., "username/task_name")
3. Set `num_episodes` to the number of demonstrations you want to collect
4. Set `crop_params_dict` to `null` initially (we'll determine crops later)
5. Configure `robot`, `cameras`, and other hardware settings

Example configuration section:

```json
"mode": "record",
"repo_id": "username/pick_lift_cube",
"dataset_root": null,
"task": "pick_and_lift",
"num_episodes": 15,
"episode": 0,
"push_to_hub": true
```

### Using a Teleoperation Device

Along with your robot, you will need a teleoperation device to control it in order to collect datasets of your task and perform interventions during the online training.
We support using a gamepad or a keyboard or the leader arm of the robot.

HIL-Serl learns actions in the end-effector space of the robot. Therefore, the teleoperation will control the end-effector's x,y,z displacements.

For that we need to define a version of the robot that takes actions in the end-effector space. Check the robot class `SO100FollowerEndEffector` and its configuration `SO100FollowerEndEffectorConfig` for the default parameters related to the end-effector space.

<!-- prettier-ignore-start -->
```python
class SO100FollowerEndEffectorConfig(SO100FollowerConfig):
    """Configuration for the SO100FollowerEndEffector robot."""

    # Default bounds for the end-effector position (in meters)
    end_effector_bounds: dict[str, list[float]] = field( # bounds for the end-effector in x,y,z direction
        default_factory=lambda: {
            "min": [-1.0, -1.0, -1.0],  # min x, y, z
            "max": [1.0, 1.0, 1.0],  # max x, y, z
        }
    )

    max_gripper_pos: float = 50 # maximum gripper position that the gripper will be open at

    end_effector_step_sizes: dict[str, float] = field( # maximum step size for the end-effector in x,y,z direction
        default_factory=lambda: {
            "x": 0.02,
            "y": 0.02,
            "z": 0.02,
        }
    )
```
<!-- prettier-ignore-end -->

The `Teleoperator` defines the teleoperation device. You can check the list of available teleoperators in `lerobot/teleoperators`.

**Setting up the Gamepad**

The gamepad provides a very convenient way to control the robot and the episode state.

To setup the gamepad, you need to set the `control_mode` to `"gamepad"` and define the `teleop` section in the configuration file.

```json
    "teleop": {
        "type": "gamepad",
        "use_gripper": true
    },
```

<p align="center">
  <img
    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/gamepad_guide.jpg?raw=true"
    alt="Figure shows the control mappings on a Logitech gamepad."
    title="Gamepad Control Mapping"
    width="100%"
  ></img>
</p>
<p align="center">
  <i>Gamepad button mapping for robot control and episode management</i>
</p>

**Setting up the SO101 leader**

The SO101 leader arm has reduced gears that allows it to move and track the follower arm during exploration. Therefore, taking over is much smoother than the gearless SO100.

To setup the SO101 leader, you need to set the `control_mode` to `"leader"` and define the `teleop` section in the configuration file.

```json
    "teleop": {
        "type": "so101_leader",
        "port": "/dev/tty.usbmodem585A0077921", # check your port number
        "use_degrees": true
    },
```

In order to annotate the success/failure of the episode, **you will need** to use a keyboard to press `s` for success, `esc` for failure.
During the online training, press `space` to take over the policy and `space` again to give the control back to the policy.

<details>
<summary><strong>Video: SO101 leader teleoperation</strong></summary>

<div class="video-container">
  <video controls width="600">
    <source
      src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/so101_leader_tutorial.mp4"
      type="video/mp4"
    />
  </video>
</div>

<p align="center"><i>SO101 leader teleoperation example, the leader tracks the follower, press `space` to intervene</i></p>
</details>

**Recording Demonstrations**

Start the recording process, an example of the config file can be found [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/env_config_so100.json):

```bash
python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config_so100.json
```

During recording:

1. The robot will reset to the initial position defined in the configuration file `fixed_reset_joint_positions`
2. Complete the task successfully
3. The episode ends with a reward of 1 when you press the "success" button
4. If the time limit is reached, or the fail button is pressed, the episode ends with a reward of 0
5. You can rerecord an episode by pressing the "rerecord" button
6. The process automatically continues to the next episode
7. After recording all episodes, the dataset is pushed to the Hugging Face Hub (optional) and saved locally

### Processing the Dataset

After collecting demonstrations, process them to determine optimal camera crops.
Reinforcement learning is sensitive to background distractions, so it is important to crop the images to the relevant workspace area.

Visual RL algorithms learn directly from pixel inputs, making them vulnerable to irrelevant visual information. Background elements like changing lighting, shadows, people moving, or objects outside the workspace can confuse the learning process. Good ROI selection should:

- Include only the essential workspace where the task happens
- Capture the robot's end-effector and all objects involved in the task
- Exclude unnecessary background elements and distractions

Note: If you already know the crop parameters, you can skip this step and just set the `crop_params_dict` in the configuration file during recording.

**Determining Crop Parameters**

Use the `crop_dataset_roi.py` script to interactively select regions of interest in your camera images:

```bash
python -m lerobot.scripts.rl.crop_dataset_roi --repo-id username/pick_lift_cube
```

1. For each camera view, the script will display the first frame
2. Draw a rectangle around the relevant workspace area
3. Press 'c' to confirm the selection
4. Repeat for all camera views
5. The script outputs cropping parameters and creates a new cropped dataset

Example output:

```
Selected Rectangular Regions of Interest (top, left, height, width):
observation.images.side: [180, 207, 180, 200]
observation.images.front: [180, 250, 120, 150]
```

<p align="center">
  <img
    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/crop_dataset.gif"
    width="600"
  />
</p>

<p align="center">
  <i>Interactive cropping tool for selecting regions of interest</i>
</p>

**Updating Configuration**

Add these crop parameters to your training configuration:

```json
"crop_params_dict": {
    "observation.images.side": [180, 207, 180, 200],
    "observation.images.front": [180, 250, 120, 150]
},
"resize_size": [128, 128]
```

**Recommended image resolution**

Most vision-based policies have been validated on square inputs of either **128×128** (default) or **64×64** pixels. We therefore advise setting the resize_size parameter to [128, 128] – or [64, 64] if you need to save GPU memory and bandwidth. Other resolutions are possible but have not been extensively tested.

### Training a Reward Classifier

The reward classifier plays an important role in the HIL-SERL workflow by automating reward assignment and automatically detecting episode success. Instead of manually defining reward functions or relying on human feedback for every timestep, the reward classifier learns to predict success/failure from visual observations. This enables the RL algorithm to learn efficiently by providing consistent and automated reward signals based on the robot's camera inputs.

This guide explains how to train a reward classifier for human-in-the-loop reinforcement learning implementation of LeRobot. Reward classifiers learn to predict the reward value given a state which can be used in an RL setup to train a policy.

**Note**: Training a reward classifier is optional. You can start the first round of RL experiments by annotating the success manually with your gamepad or keyboard device.

The reward classifier implementation in `modeling_classifier.py` uses a pretrained vision model to process the images. It can output either a single value for binary rewards to predict success/fail cases or multiple values for multi-class settings.

**Collecting a Dataset for the reward classifier**

Before training, you need to collect a dataset with labeled examples. The `record_dataset` function in `gym_manipulator.py` enables the process of collecting a dataset of observations, actions, and rewards.

To collect a dataset, you need to modify some parameters in the environment configuration based on HILSerlRobotEnvConfig.

```bash
python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/reward_classifier_train_config.json
```

**Key Parameters for Data Collection**

- **mode**: set it to `"record"` to collect a dataset
- **repo_id**: `"hf_username/dataset_name"`, name of the dataset and repo on the hub
- **num_episodes**: Number of episodes to record
- **number_of_steps_after_success**: Number of additional frames to record after a success (reward=1) is detected
- **fps**: Number of frames per second to record
- **push_to_hub**: Whether to push the dataset to the hub

The `number_of_steps_after_success` parameter is crucial as it allows you to collect more positive examples. When a success is detected, the system will continue recording for the specified number of steps while maintaining the reward=1 label. Otherwise, there won't be enough states in the dataset labeled to 1 to train a good classifier.

Example configuration section for data collection:

```json
{
  "mode": "record",
  "repo_id": "hf_username/dataset_name",
  "dataset_root": "data/your_dataset",
  "num_episodes": 20,
  "push_to_hub": true,
  "fps": 10,
  "number_of_steps_after_success": 15
}
```

**Reward Classifier Configuration**

The reward classifier is configured using `configuration_classifier.py`. Here are the key parameters:

- **model_name**: Base model architecture (e.g., we mainly use `"helper2424/resnet10"`)
- **model_type**: `"cnn"` or `"transformer"`
- **num_cameras**: Number of camera inputs
- **num_classes**: Number of output classes (typically 2 for binary success/failure)
- **hidden_dim**: Size of hidden representation
- **dropout_rate**: Regularization parameter
- **learning_rate**: Learning rate for optimizer

Example configuration for training the [reward classifier](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/reward_classifier_train_config.json):

```json
{
  "policy": {
    "type": "reward_classifier",
    "model_name": "helper2424/resnet10",
    "model_type": "cnn",
    "num_cameras": 2,
    "num_classes": 2,
    "hidden_dim": 256,
    "dropout_rate": 0.1,
    "learning_rate": 1e-4,
    "device": "cuda",
    "use_amp": true,
    "input_features": {
      "observation.images.front": {
        "type": "VISUAL",
        "shape": [3, 128, 128]
      },
      "observation.images.side": {
        "type": "VISUAL",
        "shape": [3, 128, 128]
      }
    }
  }
}
```

**Training the Classifier**

To train the classifier, use the `train.py` script with your configuration:

```bash
python -m lerobot.scripts.train --config_path path/to/reward_classifier_train_config.json
```

**Deploying and Testing the Model**

To use your trained reward classifier, configure the `HILSerlRobotEnvConfig` to use your model:

<!-- prettier-ignore-start -->
```python
env_config = HILSerlRobotEnvConfig(
    reward_classifier_pretrained_path="path_to_your_pretrained_trained_model",
    # Other environment parameters
)
```
<!-- prettier-ignore-end -->

or set the argument in the json config file.

```json
{
  "reward_classifier_pretrained_path": "path_to_your_pretrained_model"
}
```

Run `gym_manipulator.py` to test the model.

```bash
python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/env_config.json
```

The reward classifier will automatically provide rewards based on the visual input from the robot's cameras.

**Example Workflow for training the reward classifier**

1. **Create the configuration files**:
   Create the necessary json configuration files for the reward classifier and the environment. Check the examples [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/tree/main).

2. **Collect a dataset**:

   ```bash
   python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
   ```

3. **Train the classifier**:

   ```bash
   python -m lerobot.scripts.train --config_path src/lerobot/configs/reward_classifier_train_config.json
   ```

4. **Test the classifier**:
   ```bash
   python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
   ```

### Training with Actor-Learner

The LeRobot system uses a distributed actor-learner architecture for training. This architecture decouples robot interactions from the learning process, allowing them to run concurrently without blocking each other. The actor server handles robot observations and actions, sending interaction data to the learner server. The learner server performs gradient descent and periodically updates the actor's policy weights. You will need to start two processes: a learner and an actor.

**Configuration Setup**

Create a training configuration file (example available [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/train_config_hilserl_so100.json)). The training config is based on the main `TrainRLServerPipelineConfig` class in `lerobot/configs/train.py`.

1. Configure the policy settings (`type="sac"`, `device`, etc.)
2. Set `dataset` to your cropped dataset
3. Configure environment settings with crop parameters
4. Check the other parameters related to SAC in [configuration_sac.py](https://github.com/huggingface/lerobot/blob/19bb621a7d0a31c20cd3cc08b1dbab68d3031454/lerobot/common/policies/sac/configuration_sac.py#L79).
5. Verify that the `policy` config is correct with the right `input_features` and `output_features` for your task.

**Starting the Learner**

First, start the learner server process:

```bash
python -m lerobot.scripts.rl.learner --config_path src/lerobot/configs/train_config_hilserl_so100.json
```

The learner:

- Initializes the policy network
- Prepares replay buffers
- Opens a `gRPC` server to communicate with actors
- Processes transitions and updates the policy

**Starting the Actor**

In a separate terminal, start the actor process with the same configuration:

```bash
python -m lerobot.scripts.rl.actor --config_path src/lerobot/configs/train_config_hilserl_so100.json
```

The actor:

- Connects to the learner via `gRPC`
- Initializes the environment
- Execute rollouts of the policy to collect experience
- Sends transitions to the learner
- Receives updated policy parameters

**Training Flow**

The training proceeds automatically:

1. The actor executes the policy in the environment
2. Transitions are collected and sent to the learner
3. The learner updates the policy based on these transitions
4. Updated policy parameters are sent back to the actor
5. The process continues until the specified step limit is reached

**Human in the Loop**

- The key to learning efficiently is to have human interventions to provide corrective feedback and completing the task to aide the policy learning and exploration.
- To perform human interventions, you can press the upper right trigger button on the gamepad (or the `space` key on the keyboard). This will pause the policy actions and allow you to take over.
- A successful experiment is one where the human has to intervene at the start but then reduces the amount of interventions as the policy improves. You can monitor the intervention rate in the `wandb` dashboard.

<p align="center">
  <img
    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/hil_effect.png?raw=true"
    alt="Figure shows the control mappings on a Logitech gamepad."
    title="Gamepad Control Mapping"
    width="100%"
  ></img>
</p>

<p align="center">
  <i>
    Example showing how human interventions help guide policy learning over time
  </i>
</p>

- The figure shows the plot of the episodic reward over interaction step. The figure shows the effect of human interventions on the policy learning.
- The orange curve is an experiment without any human interventions. While the pink and blue curves are experiments with human interventions.
- We can observe that the number of steps where the policy starts achieving the maximum reward is cut by a quarter when human interventions are present.

**Monitoring and Debugging**

If you have `wandb.enable` set to `true` in your configuration, you can monitor training progress in real-time through the [Weights & Biases](https://wandb.ai/site/) dashboard.

### Guide to Human Interventions

The learning process is very sensitive to the intervention strategy. It will takes a few runs to understand how to intervene effectively. Some tips and hints:

- Allow the policy to explore for a few episodes at the start of training.
- Avoid intervening for long periods of time. Try to intervene in situation to correct the robot's behaviour when it goes off track.
- Once the policy starts achieving the task, even if its not perfect, you can limit your interventions to simple quick actions like a simple grasping commands.

The ideal behaviour is that your intervention rate should drop gradually during training as shown in the figure below.

<p align="center">
  <img
    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/intervention_rate_tutorial_rl.png?raw=true"
    alt="Intervention rate"
    title="Intervention rate during training"
    width="100%"
  ></img>
</p>

<p align="center">
  <i>
    Plot of the intervention rate during a training run on a pick and lift cube
    task
  </i>
</p>

### Key hyperparameters to tune

Some configuration values have a disproportionate impact on training stability and speed:

- **`temperature_init`** (`policy.temperature_init`) – initial entropy temperature in SAC. Higher values encourage more exploration; lower values make the policy more deterministic early on. A good starting point is `1e-2`. We observed that setting it too high can make human interventions ineffective and slow down learning.
- **`policy_parameters_push_frequency`** (`policy.actor_learner_config.policy_parameters_push_frequency`) – interval in _seconds_ between two weight pushes from the learner to the actor. The default is `4 s`. Decrease to **1-2 s** to provide fresher weights (at the cost of more network traffic); increase only if your connection is slow, as this will reduce sample efficiency.
- **`storage_device`** (`policy.storage_device`) – device on which the learner keeps the policy parameters. If you have spare GPU memory, set this to `"cuda"` (instead of the default `"cpu"`). Keeping the weights on-GPU removes CPU→GPU transfer overhead and can significantly increase the number of learner updates per second.

Congrats 🎉, you have finished this tutorial!

> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).

Paper citation:

```
@article{luo2024precise,
  title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
  author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
  journal={arXiv preprint arXiv:2410.21845},
  year={2024}
}
```
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								# HIL-SERL Real Robot Training Workflow Guide
 								In this tutorial you will go through the full Human-in-the-Loop Sample-Efficient Reinforcement Learning (HIL-SERL) workflow using LeRobot. You will master training a policy with RL on a real robot in just a few hours.
 								HIL-SERL is a sample-efficient reinforcement learning algorithm that combines human demonstrations with online learning and human interventions. The approach starts from a small set of human demonstrations, uses them to train a reward classifier, and then employs an actor-learner architecture where humans can intervene during policy execution to guide exploration and correct unsafe behaviors. In this tutorial, you'll use a gamepad to provide interventions and control the robot during the learning process.
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								It combines three key ingredients: 1. **Offline demonstrations & reward classifier:** a handful of human-teleop episodes plus a vision-based success detector give the policy a shaped starting point. 2. **On-robot actor / learner loop with human interventions:** a distributed Soft Actor Critic (SAC) learner updates the policy while an actor explores on the physical robot; the human can jump in at any time to correct dangerous or unproductive behaviour. 3. **Safety & efficiency tools:** joint/end-effector (EE) bounds, crop region of interest (ROI) preprocessing and WandB monitoring keep the data useful and the hardware safe.
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								Together these elements let HIL-SERL reach near-perfect task success and faster cycle times than imitation-only baselines.
 								<p align="center">
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								  <img
 								    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/hilserl-main-figure.png"
 								    alt="HIL-SERL workflow"
 								    title="HIL-SERL workflow"
 								    width="100%"
 								  ></img>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								</p>
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<p align="center">
 								  <i>HIL-SERL workflow, Luo et al. 2024</i>
 								</p>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								This guide provides step-by-step instructions for training a robot policy using LeRobot's HilSerl implementation to train on a real robot.
 								## What do I need?
 								- A gamepad (recommended) or keyboard to control the robot
 								- A Nvidia GPU
 								- A real robot with a follower and leader arm (optional if you use the keyboard or the gamepad)
-												Refactor kinematics and switch to using placo (#1322)

Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: leo-berte <leonardo.bertelli96@gmail.com>
											
										
										
											2025-07-02 15:20:04 +02:00
+								- A URDF file for the robot for the kinematics package (check `lerobot/common/model/kinematics.py`)
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								## What kind of tasks can I train?
 								One can use HIL-SERL to train on a variety of manipulation tasks. Some recommendations:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								- Start with a simple task to understand how the system works.
 								  - Push cube to a goal region
 								  - Pick and lift cube with the gripper
 								- Avoid extremely long horizon tasks. Focus on tasks that can be completed in 5-10 seconds.
 								- Once you have a good idea of how the system works, you can try more complex tasks and longer horizons.
 								  - Pick and place cube
 								  - Bimanual tasks to pick objects with two arms
 								  - Hand-over tasks to transfer objects from one arm to another
 								  - Go crazy!
 								## Install LeRobot with HIL-SERL
 								To install LeRobot with HIL-SERL, you need to install the `hilserl` extra.
 								```bash
 								pip install -e ".[hilserl]"
 								```
 								## Real Robot Training Workflow
 								### Understanding Configuration
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								The training process begins with proper configuration for the HILSerl environment. The configuration class of interest is `HILSerlRobotEnvConfig` in `lerobot/envs/configs.py`. Which is defined as:
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<!-- prettier-ignore-start -->
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```python
 								class HILSerlRobotEnvConfig(EnvConfig):
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								    robot: RobotConfig | None = None    # Main robot agent (defined in `lerobot/robots`)
 								    teleop: TeleoperatorConfig | None = None    # Teleoperator agent, e.g., gamepad or leader arm, (defined in `lerobot/teleoperators`)
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								    wrapper: EnvTransformConfig | None = None    # Environment wrapper settings; check `lerobot/scripts/server/gym_manipulator.py`
 								    fps: int = 10    # Control frequency
 								    name: str = "real_robot"    # Environment name
 								    mode: str = None    # "record", "replay", or None (for training)
 								    repo_id: str | None = None    # LeRobot dataset repository ID
 								    dataset_root: str | None = None    # Local dataset root (optional)
 								    task: str = ""    # Task identifier
 								    num_episodes: int = 10    # Number of episodes for recording
 								    episode: int = 0    # episode index for replay
 								    device: str = "cuda"    # Compute device
 								    push_to_hub: bool = True    # Whether to push the recorded datasets to Hub
 								    pretrained_policy_name_or_path: str | None = None    # For policy loading
 								    reward_classifier_pretrained_path: str | None = None    # For reward model
 								    number_of_steps_after_success: int = 0    # For reward classifier, collect more positive examples after a success to train a classifier
 								```
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<!-- prettier-ignore-end -->
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								### Finding Robot Workspace Bounds
 								Before collecting demonstrations, you need to determine the appropriate operational bounds for your robot.
 								This helps simplify the problem of learning on the real robot in two ways: 1) by limiting the robot's operational space to a specific region that solves the task and avoids unnecessary or unsafe exploration, and 2) by allowing training in end-effector space rather than joint space. Empirically, learning in joint space for reinforcement learning in manipulation is often a harder problem - some tasks are nearly impossible to learn in joint space but become learnable when the action space is transformed to end-effector coordinates.
 								**Using find_joint_limits.py**
 								This script helps you find the safe operational bounds for your robot's end-effector. Given that you have a follower and leader arm, you can use the script to find the bounds for the follower arm that will be applied during training.
 								Bounding the action space will reduce the redundant exploration of the agent and guarantees safety.
 								```bash
 								python -m lerobot.scripts.find_joint_limits \
 								    --robot.type=so100_follower \
 								    --robot.port=/dev/tty.usbmodem58760431541 \
 								    --robot.id=black \
 								    --teleop.type=so100_leader \
 								    --teleop.port=/dev/tty.usbmodem58760431551 \
 								    --teleop.id=blue
 								```
 								**Workflow**
 . Run the script and move the robot through the space that solves the task
 . The script will record the minimum and maximum end-effector positions and the joint angles and prints them to the console, for example:
 								   ```
 								   Max ee position [0.2417 0.2012 0.1027]
 								   Min ee position [0.1663 -0.0823 0.0336]
 								   Max joint positions [-20.0, -20.0, -20.0, -20.0, -20.0, -20.0]
 								   Min joint positions [50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
 								   ```
 . Use these values in the configuration of your teleoperation device (TeleoperatorConfig) under the `end_effector_bounds` field
 								**Example Configuration**
 								```json
 								"end_effector_bounds": {
 								    "max": [0.24, 0.20, 0.10],
 								    "min": [0.16, -0.08, 0.03]
 								}
 								```
 								### Collecting Demonstrations
 								With the bounds defined, you can safely collect demonstrations for training. Training RL with off-policy algorithm allows us to use offline datasets collected in order to improve the efficiency of the learning process.
 								**Setting Up Record Mode**
 								Create a configuration file for recording demonstrations (or edit an existing one like [env_config_so100.json](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/env_config_so100.json)):
 . Set `mode` to `"record"`
 . Specify a unique `repo_id` for your dataset (e.g., "username/task_name")
 . Set `num_episodes` to the number of demonstrations you want to collect
 . Set `crop_params_dict` to `null` initially (we'll determine crops later)
 . Configure `robot`, `cameras`, and other hardware settings
 								Example configuration section:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```json
 								"mode": "record",
 								"repo_id": "username/pick_lift_cube",
 								"dataset_root": null,
 								"task": "pick_and_lift",
 								"num_episodes": 15,
 								"episode": 0,
 								"push_to_hub": true
 								```
 								### Using a Teleoperation Device
 								Along with your robot, you will need a teleoperation device to control it in order to collect datasets of your task and perform interventions during the online training.
 								We support using a gamepad or a keyboard or the leader arm of the robot.
 								HIL-Serl learns actions in the end-effector space of the robot. Therefore, the teleoperation will control the end-effector's x,y,z displacements.
 								For that we need to define a version of the robot that takes actions in the end-effector space. Check the robot class `SO100FollowerEndEffector` and its configuration `SO100FollowerEndEffectorConfig` for the default parameters related to the end-effector space.
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<!-- prettier-ignore-start -->
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```python
 								class SO100FollowerEndEffectorConfig(SO100FollowerConfig):
 								    """Configuration for the SO100FollowerEndEffector robot."""
 								    # Default bounds for the end-effector position (in meters)
 								    end_effector_bounds: dict[str, list[float]] = field( # bounds for the end-effector in x,y,z direction
 								        default_factory=lambda: {
 								            "min": [-1.0, -1.0, -1.0],  # min x, y, z
 								            "max": [1.0, 1.0, 1.0],  # max x, y, z
 								        }
 								    )
 								    max_gripper_pos: float = 50 # maximum gripper position that the gripper will be open at
 								    end_effector_step_sizes: dict[str, float] = field( # maximum step size for the end-effector in x,y,z direction
 								        default_factory=lambda: {
 								            "x": 0.02,
 								            "y": 0.02,
 								            "z": 0.02,
 								        }
 								    )
 								```
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<!-- prettier-ignore-end -->
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								The `Teleoperator` defines the teleoperation device. You can check the list of available teleoperators in `lerobot/teleoperators`.
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								**Setting up the Gamepad**
 								The gamepad provides a very convenient way to control the robot and the episode state.
 								To setup the gamepad, you need to set the `control_mode` to `"gamepad"` and define the `teleop` section in the configuration file.
 								```json
 								    "teleop": {
 								        "type": "gamepad",
 								        "use_gripper": true
 								    },
 								```
 								<p align="center">
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								  <img
 								    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/gamepad_guide.jpg?raw=true"
 								    alt="Figure shows the control mappings on a Logitech gamepad."
 								    title="Gamepad Control Mapping"
 								    width="100%"
 								  ></img>
 								</p>
 								<p align="center">
 								  <i>Gamepad button mapping for robot control and episode management</i>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								</p>
 								**Setting up the SO101 leader**
 								The SO101 leader arm has reduced gears that allows it to move and track the follower arm during exploration. Therefore, taking over is much smoother than the gearless SO100.
 								To setup the SO101 leader, you need to set the `control_mode` to `"leader"` and define the `teleop` section in the configuration file.
 								```json
 								    "teleop": {
 								        "type": "so101_leader",
 								        "port": "/dev/tty.usbmodem585A0077921", # check your port number
 								        "use_degrees": true
 								    },
 								```
 								In order to annotate the success/failure of the episode, **you will need** to use a keyboard to press `s` for success, `esc` for failure.
 								During the online training, press `space` to take over the policy and `space` again to give the control back to the policy.
 								<details>
 								<summary><strong>Video: SO101 leader teleoperation</strong></summary>
 								<div class="video-container">
 								  <video controls width="600">
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								    <source
 								      src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/so101_leader_tutorial.mp4"
 								      type="video/mp4"
 								    />
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								  </video>
 								</div>
 								<p align="center"><i>SO101 leader teleoperation example, the leader tracks the follower, press `space` to intervene</i></p>
 								</details>
 								**Recording Demonstrations**
 								Start the recording process, an example of the config file can be found [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/env_config_so100.json):
 								```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config_so100.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								During recording:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+. The robot will reset to the initial position defined in the configuration file `fixed_reset_joint_positions`
 . Complete the task successfully
 . The episode ends with a reward of 1 when you press the "success" button
 . If the time limit is reached, or the fail button is pressed, the episode ends with a reward of 0
 . You can rerecord an episode by pressing the "rerecord" button
 . The process automatically continues to the next episode
 . After recording all episodes, the dataset is pushed to the Hugging Face Hub (optional) and saved locally
 								### Processing the Dataset
 								After collecting demonstrations, process them to determine optimal camera crops.
 								Reinforcement learning is sensitive to background distractions, so it is important to crop the images to the relevant workspace area.
 								Visual RL algorithms learn directly from pixel inputs, making them vulnerable to irrelevant visual information. Background elements like changing lighting, shadows, people moving, or objects outside the workspace can confuse the learning process. Good ROI selection should:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								- Include only the essential workspace where the task happens
 								- Capture the robot's end-effector and all objects involved in the task
 								- Exclude unnecessary background elements and distractions
 								Note: If you already know the crop parameters, you can skip this step and just set the `crop_params_dict` in the configuration file during recording.
 								**Determining Crop Parameters**
 								Use the `crop_dataset_roi.py` script to interactively select regions of interest in your camera images:
 								```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								python -m lerobot.scripts.rl.crop_dataset_roi --repo-id username/pick_lift_cube
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 . For each camera view, the script will display the first frame
 . Draw a rectangle around the relevant workspace area
 . Press 'c' to confirm the selection
 . Repeat for all camera views
 . The script outputs cropping parameters and creates a new cropped dataset
 								Example output:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								Selected Rectangular Regions of Interest (top, left, height, width):
 								observation.images.side: [180, 207, 180, 200]
 								observation.images.front: [180, 250, 120, 150]
 								```
 								<p align="center">
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								  <img
 								    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/crop_dataset.gif"
 								    width="600"
 								  />
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								</p>
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<p align="center">
 								  <i>Interactive cropping tool for selecting regions of interest</i>
 								</p>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								**Updating Configuration**
 								Add these crop parameters to your training configuration:
 								```json
 								"crop_params_dict": {
 								    "observation.images.side": [180, 207, 180, 200],
 								    "observation.images.front": [180, 250, 120, 150]
 								},
 								"resize_size": [128, 128]
 								```
 								**Recommended image resolution**
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								Most vision-based policies have been validated on square inputs of either **128×128** (default) or **64×64** pixels. We therefore advise setting the resize_size parameter to [128, 128] – or [64, 64] if you need to save GPU memory and bandwidth. Other resolutions are possible but have not been extensively tested.
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								### Training a Reward Classifier
 								The reward classifier plays an important role in the HIL-SERL workflow by automating reward assignment and automatically detecting episode success. Instead of manually defining reward functions or relying on human feedback for every timestep, the reward classifier learns to predict success/failure from visual observations. This enables the RL algorithm to learn efficiently by providing consistent and automated reward signals based on the robot's camera inputs.
 								This guide explains how to train a reward classifier for human-in-the-loop reinforcement learning implementation of LeRobot. Reward classifiers learn to predict the reward value given a state which can be used in an RL setup to train a policy.
 								**Note**: Training a reward classifier is optional. You can start the first round of RL experiments by annotating the success manually with your gamepad or keyboard device.
 								The reward classifier implementation in `modeling_classifier.py` uses a pretrained vision model to process the images. It can output either a single value for binary rewards to predict success/fail cases or multiple values for multi-class settings.
 								**Collecting a Dataset for the reward classifier**
 								Before training, you need to collect a dataset with labeled examples. The `record_dataset` function in `gym_manipulator.py` enables the process of collecting a dataset of observations, actions, and rewards.
 								To collect a dataset, you need to modify some parameters in the environment configuration based on HILSerlRobotEnvConfig.
 								```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/reward_classifier_train_config.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								**Key Parameters for Data Collection**
 								- **mode**: set it to `"record"` to collect a dataset
 								- **repo_id**: `"hf_username/dataset_name"`, name of the dataset and repo on the hub
 								- **num_episodes**: Number of episodes to record
 								- **number_of_steps_after_success**: Number of additional frames to record after a success (reward=1) is detected
 								- **fps**: Number of frames per second to record
 								- **push_to_hub**: Whether to push the dataset to the hub
 								The `number_of_steps_after_success` parameter is crucial as it allows you to collect more positive examples. When a success is detected, the system will continue recording for the specified number of steps while maintaining the reward=1 label. Otherwise, there won't be enough states in the dataset labeled to 1 to train a good classifier.
 								Example configuration section for data collection:
 								```json
 								{
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								  "mode": "record",
 								  "repo_id": "hf_username/dataset_name",
 								  "dataset_root": "data/your_dataset",
 								  "num_episodes": 20,
 								  "push_to_hub": true,
 								  "fps": 10,
 								  "number_of_steps_after_success": 15
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								}
 								```
 								**Reward Classifier Configuration**
 								The reward classifier is configured using `configuration_classifier.py`. Here are the key parameters:
 								- **model_name**: Base model architecture (e.g., we mainly use `"helper2424/resnet10"`)
 								- **model_type**: `"cnn"` or `"transformer"`
 								- **num_cameras**: Number of camera inputs
 								- **num_classes**: Number of output classes (typically 2 for binary success/failure)
 								- **hidden_dim**: Size of hidden representation
 								- **dropout_rate**: Regularization parameter
 								- **learning_rate**: Learning rate for optimizer
 								Example configuration for training the [reward classifier](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/reward_classifier_train_config.json):
 								```json
 								{
 								  "policy": {
 								    "type": "reward_classifier",
 								    "model_name": "helper2424/resnet10",
 								    "model_type": "cnn",
 								    "num_cameras": 2,
 								    "num_classes": 2,
 								    "hidden_dim": 256,
 								    "dropout_rate": 0.1,
 								    "learning_rate": 1e-4,
 								    "device": "cuda",
 								    "use_amp": true,
 								    "input_features": {
 								      "observation.images.front": {
 								        "type": "VISUAL",
 								        "shape": [3, 128, 128]
 								      },
 								      "observation.images.side": {
 								        "type": "VISUAL",
 								        "shape": [3, 128, 128]
 								      }
 								    }
 								  }
 								}
 								```
 								**Training the Classifier**
 								To train the classifier, use the `train.py` script with your configuration:
 								```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								python -m lerobot.scripts.train --config_path path/to/reward_classifier_train_config.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								**Deploying and Testing the Model**
 								To use your trained reward classifier, configure the `HILSerlRobotEnvConfig` to use your model:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<!-- prettier-ignore-start -->
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```python
 								env_config = HILSerlRobotEnvConfig(
 								    reward_classifier_pretrained_path="path_to_your_pretrained_trained_model",
 								    # Other environment parameters
 								)
 								```
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<!-- prettier-ignore-end -->
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								or set the argument in the json config file.
 								```json
 								{
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								  "reward_classifier_pretrained_path": "path_to_your_pretrained_model"
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								}
 								```
 								Run `gym_manipulator.py` to test the model.
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/env_config.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								The reward classifier will automatically provide rewards based on the visual input from the robot's cameras.
 								**Example Workflow for training the reward classifier**
 . **Create the configuration files**:
 								   Create the necessary json configuration files for the reward classifier and the environment. Check the examples [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/tree/main).
 . **Collect a dataset**:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								   ```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								   python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								   ```
 . **Train the classifier**:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								   ```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								   python -m lerobot.scripts.train --config_path src/lerobot/configs/reward_classifier_train_config.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								   ```
 . **Test the classifier**:
 								   ```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								   python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								   ```
 								### Training with Actor-Learner
 								The LeRobot system uses a distributed actor-learner architecture for training. This architecture decouples robot interactions from the learning process, allowing them to run concurrently without blocking each other. The actor server handles robot observations and actions, sending interaction data to the learner server. The learner server performs gradient descent and periodically updates the actor's policy weights. You will need to start two processes: a learner and an actor.
 								**Configuration Setup**
 								Create a training configuration file (example available [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/train_config_hilserl_so100.json)). The training config is based on the main `TrainRLServerPipelineConfig` class in `lerobot/configs/train.py`.
 . Configure the policy settings (`type="sac"`, `device`, etc.)
 . Set `dataset` to your cropped dataset
 . Configure environment settings with crop parameters
-												Fixing all broken links in integrate_hardware document (#1445)

Signed-off-by: arulloomba1 <145633197+arulloomba1@users.noreply.github.com>
											
										
										
											2025-07-25 07:44:43 -07:00
+. Check the other parameters related to SAC in [configuration_sac.py](https://github.com/huggingface/lerobot/blob/19bb621a7d0a31c20cd3cc08b1dbab68d3031454/lerobot/common/policies/sac/configuration_sac.py#L79).
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+. Verify that the `policy` config is correct with the right `input_features` and `output_features` for your task.
 								**Starting the Learner**
 								First, start the learner server process:
 								```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								python -m lerobot.scripts.rl.learner --config_path src/lerobot/configs/train_config_hilserl_so100.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								The learner:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								- Initializes the policy network
 								- Prepares replay buffers
 								- Opens a `gRPC` server to communicate with actors
 								- Processes transitions and updates the policy
 								**Starting the Actor**
 								In a separate terminal, start the actor process with the same configuration:
 								```bash
-												Package folder structure (#1417)

* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
											
										
										
											2025-07-01 16:34:46 +02:00
+								python -m lerobot.scripts.rl.actor --config_path src/lerobot/configs/train_config_hilserl_so100.json
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								The actor:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								- Connects to the learner via `gRPC`
 								- Initializes the environment
 								- Execute rollouts of the policy to collect experience
 								- Sends transitions to the learner
 								- Receives updated policy parameters
 								**Training Flow**
 								The training proceeds automatically:
 . The actor executes the policy in the environment
 . Transitions are collected and sent to the learner
 . The learner updates the policy based on these transitions
 . Updated policy parameters are sent back to the actor
 . The process continues until the specified step limit is reached
 								**Human in the Loop**
 								- The key to learning efficiently is to have human interventions to provide corrective feedback and completing the task to aide the policy learning and exploration.
 								- To perform human interventions, you can press the upper right trigger button on the gamepad (or the `space` key on the keyboard). This will pause the policy actions and allow you to take over.
 								- A successful experiment is one where the human has to intervene at the start but then reduces the amount of interventions as the policy improves. You can monitor the intervention rate in the `wandb` dashboard.
 								<p align="center">
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								  <img
 								    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/hil_effect.png?raw=true"
 								    alt="Figure shows the control mappings on a Logitech gamepad."
 								    title="Gamepad Control Mapping"
 								    width="100%"
 								  ></img>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								</p>
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<p align="center">
 								  <i>
 								    Example showing how human interventions help guide policy learning over time
 								  </i>
 								</p>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								- The figure shows the plot of the episodic reward over interaction step. The figure shows the effect of human interventions on the policy learning.
 								- The orange curve is an experiment without any human interventions. While the pink and blue curves are experiments with human interventions.
 								- We can observe that the number of steps where the policy starts achieving the maximum reward is cut by a quarter when human interventions are present.
 								**Monitoring and Debugging**
 								If you have `wandb.enable` set to `true` in your configuration, you can monitor training progress in real-time through the [Weights & Biases](https://wandb.ai/site/) dashboard.
 								### Guide to Human Interventions
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								The learning process is very sensitive to the intervention strategy. It will takes a few runs to understand how to intervene effectively. Some tips and hints:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								- Allow the policy to explore for a few episodes at the start of training.
 								- Avoid intervening for long periods of time. Try to intervene in situation to correct the robot's behaviour when it goes off track.
 								- Once the policy starts achieving the task, even if its not perfect, you can limit your interventions to simple quick actions like a simple grasping commands.
 								The ideal behaviour is that your intervention rate should drop gradually during training as shown in the figure below.
 								<p align="center">
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								  <img
 								    src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/intervention_rate_tutorial_rl.png?raw=true"
 								    alt="Intervention rate"
 								    title="Intervention rate during training"
 								    width="100%"
 								  ></img>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								</p>
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								<p align="center">
 								  <i>
 								    Plot of the intervention rate during a training run on a pick and lift cube
 								    task
 								  </i>
 								</p>
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								### Key hyperparameters to tune
 								Some configuration values have a disproportionate impact on training stability and speed:
 								- **`temperature_init`** (`policy.temperature_init`) – initial entropy temperature in SAC. Higher values encourage more exploration; lower values make the policy more deterministic early on. A good starting point is `1e-2`. We observed that setting it too high can make human interventions ineffective and slow down learning.
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								- **`policy_parameters_push_frequency`** (`policy.actor_learner_config.policy_parameters_push_frequency`) – interval in _seconds_ between two weight pushes from the learner to the actor. The default is `4 s`. Decrease to **1-2 s** to provide fresher weights (at the cost of more network traffic); increase only if your connection is slow, as this will reduce sample efficiency.
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								- **`storage_device`** (`policy.storage_device`) – device on which the learner keeps the policy parameters. If you have spare GPU memory, set this to `"cuda"` (instead of the default `"cpu"`). Keeping the weights on-GPU removes CPU→GPU transfer overhead and can significantly increase the number of learner updates per second.
 								Congrats 🎉, you have finished this tutorial!
 								> [!TIP]
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
+								> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
 								Paper citation:
-												Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
											
										
										
											2025-07-17 14:30:20 +02:00
-												Port HIL SERL (#644)

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Eugene Mironov <helper2424@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
											
										
										
											2025-06-13 13:15:47 +02:00
+								```
 								@article{luo2024precise,
 								  title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
 								  author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
 								  journal={arXiv preprint arXiv:2410.21845},
 								  year={2024}
 								}
 								```