
# Meta-World

Meta-World is an open-source simulation benchmark for **multi-task and meta reinforcement learning** in continuous-control robotic manipulation. It bundles 50 diverse manipulation tasks using everyday objects and a common tabletop Sawyer arm, providing a standardized playground to test whether algorithms can learn many different tasks and generalize quickly to new ones.

- Paper: [Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning](https://arxiv.org/abs/1910.10897)
- GitHub: [Farama-Foundation/Metaworld](https://github.com/Farama-Foundation/Metaworld)
- Project website: [metaworld.farama.org](https://metaworld.farama.org)

![Meta-World ML45 demo](https://meta-world.github.io/figures/ml45.gif)

## Available tasks

Meta-World provides 50 tasks organized into difficulty groups. In LeRobot, you can evaluate on individual tasks, difficulty groups, or the full MT50 suite:

| Group      | CLI name             | Tasks | Description                                            |
| ---------- | -------------------- | ----- | ------------------------------------------------------ |
| Easy       | `easy`               | 28    | Tasks with simple dynamics and single-step goals       |
| Medium     | `medium`             | 11    | Tasks requiring multi-step reasoning                   |
| Hard       | `hard`               | 6     | Tasks with complex contacts and precise manipulation   |
| Very Hard  | `very_hard`          | 5     | The most challenging tasks in the suite                |
| MT50 (all) | Comma-separated list | 50    | All 50 tasks — the most challenging multi-task setting |

You can also pass individual task names directly (e.g., `assembly-v3`, `dial-turn-v3`).

We provide a LeRobot-ready dataset for Meta-World MT50 on the HF Hub: [lerobot/metaworld_mt50](https://huggingface.co/datasets/lerobot/metaworld_mt50). It follows the MT50 evaluation setup: all 50 tasks, fixed object and goal positions, and one-hot task vectors for consistency.

## Installation

After following the LeRobot installation instructions, install the Meta-World extra from the root of your LeRobot checkout:

```bash
pip install -e ".[metaworld]"
```

<Tip warning={true}>
If you encounter an `AssertionError: ['human', 'rgb_array', 'depth_array']` when running Meta-World environments, it is caused by a version mismatch between Meta-World and your installed Gymnasium. Fix it by pinning Gymnasium:

```bash
pip install "gymnasium==1.1.0"
```

</Tip>
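
If you prefer to check before reinstalling, the following is a minimal sketch that compares the installed Gymnasium version with the `1.1.0` pin from the tip above. The helper names `installed_gymnasium_version` and `needs_gymnasium_pin` are hypothetical, and the sketch assumes `python` points at the environment where LeRobot is installed.

```bash
# Version pin taken from the tip above.
REQUIRED_GYMNASIUM=1.1.0

# Hypothetical helper: print the installed Gymnasium version,
# or "none" if the package cannot be imported.
installed_gymnasium_version() {
  python -c "import gymnasium; print(gymnasium.__version__)" 2>/dev/null || echo none
}

# Hypothetical helper: exit with status 0 (true) when the given
# version differs from the required pin.
needs_gymnasium_pin() {
  [ "$1" != "$REQUIRED_GYMNASIUM" ]
}
```

For example, `needs_gymnasium_pin "$(installed_gymnasium_version)" && pip install "gymnasium==$REQUIRED_GYMNASIUM"` only reinstalls when the versions differ.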

## Evaluation

### Default evaluation (recommended)

Evaluate on the medium difficulty split (a good balance of coverage and compute):

```bash
lerobot-eval \
  --policy.path="your-policy-id" \
  --env.type=metaworld \
  --env.task=medium \
  --eval.batch_size=1 \
  --eval.n_episodes=10
```

### Single-task evaluation

Evaluate on a specific task:

```bash
lerobot-eval \
  --policy.path="your-policy-id" \
  --env.type=metaworld \
  --env.task=assembly-v3 \
  --eval.batch_size=1 \
  --eval.n_episodes=10
```
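
To evaluate several tasks one at a time, with each task in its own `lerobot-eval` run, a plain shell loop is enough. This is a sketch under assumptions: `POLICY_PATH`, `DRY_RUN`, and `run_cmd` are hypothetical names, and by default the script only prints each command; set `DRY_RUN=0` to execute them.

```bash
# Hypothetical settings; override them from the environment.
POLICY_PATH="${POLICY_PATH:-your-policy-id}"
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print commands, 0 = run them

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# One lerobot-eval run per task, using the flags from the command above.
for task in assembly-v3 dial-turn-v3 handle-press-side-v3; do
  run_cmd lerobot-eval \
    --policy.path="$POLICY_PATH" \
    --env.type=metaworld \
    --env.task="$task" \
    --eval.batch_size=1 \
    --eval.n_episodes=10
done
```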

### Multi-task evaluation

Evaluate across multiple tasks or difficulty groups:

```bash
lerobot-eval \
  --policy.path="your-policy-id" \
  --env.type=metaworld \
  --env.task=assembly-v3,dial-turn-v3,handle-press-side-v3 \
  --eval.batch_size=1 \
  --eval.n_episodes=10
```

- `--env.task` accepts a single task name, a comma-separated list of tasks, or a difficulty group (`easy`, `medium`, `hard`, `very_hard`).
- `--eval.batch_size` controls how many environments run in parallel.
- `--eval.n_episodes` sets how many episodes to run per task.
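
When the task list comes from a script, a small helper can build the comma-separated `--env.task` value. `join_tasks` is a hypothetical helper; it relies only on the comma-separated format described above.

```bash
# Hypothetical helper: join all arguments with commas.
# The subshell keeps the IFS change local to this function.
join_tasks() {
  (IFS=,; printf '%s\n' "$*")
}

ENV_TASK=$(join_tasks assembly-v3 dial-turn-v3 handle-press-side-v3)
printf '%s\n' "--env.task=$ENV_TASK"
# prints --env.task=assembly-v3,dial-turn-v3,handle-press-side-v3
```

Pass the result as `--env.task="$ENV_TASK"`. Difficulty groups such as `medium` need no joining and can be passed directly.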

### Policy inputs and outputs

**Observations:**

- `observation.image` — single camera view (`corner2`), 480x480 HWC uint8
- `observation.state` — 4-dim proprioceptive state (end-effector position + gripper)

**Actions:**

- Continuous control in `Box(-1, 1, shape=(4,))` — 3D end-effector delta + 1D gripper
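
As a rough sizing check, the arithmetic below computes the raw size of the image observation. It assumes 3 RGB channels (the spec above gives only HWC uint8) and ignores any resizing or normalization a policy may apply.

```bash
# Image size from the observation spec above; 3 channels is an assumption (RGB).
HEIGHT=480
WIDTH=480
CHANNELS=3
BATCH_SIZE=${BATCH_SIZE:-1}   # same meaning as --eval.batch_size

# uint8 means one byte per value.
frame_bytes=$((HEIGHT * WIDTH * CHANNELS))
batch_bytes=$((frame_bytes * BATCH_SIZE))

echo "one frame: $frame_bytes bytes"   # prints "one frame: 691200 bytes"
echo "one batch: $batch_bytes bytes"
```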

### Recommended evaluation episodes

For reproducible benchmarking, use **10 episodes per task**, which gives 500 episodes in total for the full MT50 suite. To assess generalization, evaluate on the full MT50: it is intentionally challenging and exposes strengths and weaknesses that a few narrow tasks can hide.
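
The per-group budget follows directly from the task counts in the table. This sketch multiplies each group's task count by the recommended 10 episodes; the group names match the CLI names above.

```bash
# Task counts per difficulty group, from the table above.
EPISODES_PER_TASK=10

total=0
for entry in easy:28 medium:11 hard:6 very_hard:5; do
  group=${entry%%:*}   # text before the colon
  tasks=${entry#*:}    # text after the colon
  episodes=$((tasks * EPISODES_PER_TASK))
  echo "$group: $tasks tasks x $EPISODES_PER_TASK episodes = $episodes"
  total=$((total + episodes))
done
echo "MT50: $total episodes"   # prints "MT50: 500 episodes"
```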

## Training

### Example training command

Train a SmolVLA policy on the MT50 dataset, evaluating periodically on a subset of Meta-World tasks:

```bash
lerobot-train \
  --policy.type=smolvla \
  --policy.repo_id=${HF_USER}/metaworld-test \
  --policy.load_vlm_weights=true \
  --dataset.repo_id=lerobot/metaworld_mt50 \
  --env.type=metaworld \
  --env.task=assembly-v3,dial-turn-v3,handle-press-side-v3 \
  --output_dir=./outputs/ \
  --steps=100000 \
  --batch_size=4 \
  --eval.batch_size=1 \
  --eval.n_episodes=1 \
  --eval_freq=1000
```
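
To estimate how much evaluation the example command adds during training, divide `--steps` by `--eval_freq` and multiply by the number of evaluation tasks and episodes. This assumes one evaluation every `eval_freq` steps with `--eval.n_episodes` episodes per task; the exact schedule (for example, whether a final evaluation is added) may differ.

```bash
# Values from the training command above.
STEPS=100000
EVAL_FREQ=1000
N_TASKS=3      # assembly-v3, dial-turn-v3, handle-press-side-v3
N_EPISODES=1   # --eval.n_episodes

n_evals=$((STEPS / EVAL_FREQ))
eval_episodes=$((n_evals * N_TASKS * N_EPISODES))

echo "evaluations during training: $n_evals"          # prints 100
echo "evaluation episodes in total: $eval_episodes"   # prints 300
```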

## Practical tips

- Use one-hot task conditioning for multi-task training (following the MT10/MT50 conventions) so that policies receive explicit task context.
- Inspect the dataset task descriptions and the `info["is_success"]` flag when writing post-processing or logging code, so that your success metrics match the benchmark's.
- Adjust `batch_size`, `steps`, and `eval_freq` to match your compute budget.
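
To adapt the budget without editing the command, you can read `steps`, `batch_size`, and `eval_freq` from environment variables. This is a hypothetical wrapper (for example saved as `train_metaworld.sh`); `STEPS`, `BATCH_SIZE`, `EVAL_FREQ`, and `DRY_RUN` are names chosen for this sketch, and it only prints the command unless `DRY_RUN=0`.

```bash
# Defaults mirror the example training command; override any of them, e.g.
#   STEPS=20000 BATCH_SIZE=8 EVAL_FREQ=5000 DRY_RUN=0 bash train_metaworld.sh
STEPS="${STEPS:-100000}"
BATCH_SIZE="${BATCH_SIZE:-4}"
EVAL_FREQ="${EVAL_FREQ:-1000}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

# Build the command in the positional parameters (POSIX-compatible).
set -- lerobot-train \
  --policy.type=smolvla \
  --policy.repo_id="${HF_USER:-your-hf-user}/metaworld-test" \
  --policy.load_vlm_weights=true \
  --dataset.repo_id=lerobot/metaworld_mt50 \
  --env.type=metaworld \
  --env.task=assembly-v3,dial-turn-v3,handle-press-side-v3 \
  --output_dir=./outputs/ \
  --steps="$STEPS" \
  --batch_size="$BATCH_SIZE" \
  --eval.batch_size=1 \
  --eval.n_episodes=1 \
  --eval_freq="$EVAL_FREQ"

if [ "$DRY_RUN" = 1 ]; then
  echo "$@"
else
  "$@"
fi
```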