docs/source/env_processor.mdx

# Environment Processors

Environment processors are a critical layer in LeRobot's data processing architecture that handle **environment-specific** transformations, separate from policy-specific processing. This separation of concerns enables cleaner code, better modularity, and easier experimentation with different environments and policies.

## Why Environment Processors?

When working with different robot environments (LIBERO, MetaWorld, Aloha, etc.), each environment often has unique data formats, coordinate systems, and conventions that need standardization **before** policy processing. Without environment processors, these transformations would be:

1. **Hardcoded in environment code** - Making it difficult to experiment with different state representations
2. **Duplicated across policies** - Each policy would need to handle environment-specific quirks
3. **Mixed with policy logic** - Violating separation of concerns and making debugging harder

Environment processors solve this by providing a **dedicated processing layer** between raw environment observations and policy inputs.

## The Processing Pipeline

Here's how data flows through the complete processing pipeline during evaluation:

```python
# In lerobot_eval.py rollout() function:

# 1. Raw environment observation (numpy arrays, various formats)
raw_observation = env.step(action)

# 2. Convert numpy to torch, normalize images [0,1]
observation = preprocess_observation(raw_observation)

# 3. Add task metadata (for multi-task environments)
observation = add_envs_task(env, observation)

# 4. ENVIRONMENT-SPECIFIC preprocessing (NEW!)
#    - Flatten robot states
#    - Rotate images to match dataset conventions
#    - Handle environment-specific coordinate systems
observation = env_preprocessor(observation)

# 5. POLICY-SPECIFIC preprocessing
#    - Normalize with dataset statistics
#    - Add batch dimensions
#    - Move to GPU
#    - Tokenize language instructions
observation = preprocessor(observation)

# 6. Policy inference
action = policy.select_action(observation)

# 7. POLICY-SPECIFIC postprocessing
#    - Unnormalize actions
#    - Remove batch dimensions
action = postprocessor(action)

# 8. ENVIRONMENT-SPECIFIC postprocessing (NEW!)
#    - Convert action formats if needed
#    - Apply environment-specific constraints
action_transition = {"action": action}
action_transition = env_postprocessor(action_transition)
action = action_transition["action"]

# 9. Execute in environment
env.step(action)
```

## The Benefits

### 1. **Separation of Concerns**

Environment processors handle transformations specific to the **environment's data format**, while policy processors handle transformations specific to the **model's requirements**.

```python
# ❌ Before: Mixed concerns
class LiberoVLAPolicy:
    def preprocess(self, obs):
        # Environment-specific: Flatten robot state (shouldn't be in policy!)
        state = self._flatten_robot_state(obs["robot_state"])
        # Policy-specific: Normalize with dataset stats
        state = self.normalizer(state)
        return state

# ✅ After: Clear separation
# Environment processor: Handles LIBERO's nested robot state
env_preprocessor = LiberoProcessorStep()  # Flattens robot_state

# Policy processor: Handles model requirements
policy_preprocessor = NormalizerProcessorStep(stats=dataset_stats)
```

### 2. **Flexibility and Reusability**

The same policy can work with different environment processors, and the same environment processor can work with different policies:

```python
# Use SmolVLA policy with LIBERO environment
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(libero_cfg)
smolvla_preprocessor, smolvla_postprocessor = make_pre_post_processors(smolvla_cfg)

# Or use ACT policy with the same LIBERO environment
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(libero_cfg)
act_preprocessor, act_postprocessor = make_pre_post_processors(act_cfg)
```

### 3. **Easier Experimentation**

Want to try different state representations for LIBERO? Just create a new processor:

```python
# Original: 8D state (pos + quat→axisangle + gripper)
@ProcessorStepRegistry.register("libero_processor")
class LiberoProcessorStep(ObservationProcessorStep):
    def _process_observation(self, obs):
        eef_pos = robot_state["eef"]["pos"]          # 3D
        eef_axisangle = quat2axisangle(quat)         # 3D
        gripper = robot_state["gripper"]["qpos"]     # 2D
        state = torch.cat([eef_pos, eef_axisangle, gripper], dim=-1)  # 8D
        return state

# Experiment: Add velocity for better control
@ProcessorStepRegistry.register("libero_velocity_processor")
class LiberoVelocityProcessorStep(ObservationProcessorStep):
    def _process_observation(self, obs):
        # Include velocities for 14D state
        eef_pos = robot_state["eef"]["pos"]          # 3D
        eef_axisangle = quat2axisangle(quat)         # 3D
        eef_vel = robot_state["eef"]["vel"]          # 3D  (NEW)
        gripper_pos = robot_state["gripper"]["qpos"] # 2D
        gripper_vel = robot_state["gripper"]["qvel"] # 3D  (NEW)
        state = torch.cat([eef_pos, eef_axisangle, eef_vel,
                          gripper_pos, gripper_vel], dim=-1)  # 14D
        return state
```

### 4. **Cleaner Environment Code**

Environments expose **all available data** without needing to know what downstream models will use:

```python
# LIBERO environment exposes full robot state
observation = {
    "pixels": {"image": img, "image2": img2},
    "robot_state": {
        "eef": {"pos": ..., "quat": ..., "vel": ..., "mat": ..., "axisangle": ...},
        "gripper": {"qpos": ..., "qvel": ...},
        "joints": {"pos": ..., "vel": ...}
    }
}

# Environment processor decides what to use
# Policy processor handles model-specific transformations
```

## Using Environment Processors

### Factory Function

The `make_env_pre_post_processors` function delegates to `env_cfg.get_env_processors()`:

```python
from lerobot.envs.factory import make_env_pre_post_processors
from lerobot.envs.configs import LiberoEnv, PushtEnv

# For LIBERO: Returns LiberoProcessorStep in preprocessor
libero_cfg = LiberoEnv(task="libero_spatial", camera_name=["agentview"])
env_preprocessor, env_postprocessor = make_env_pre_post_processors(libero_cfg, policy_cfg)

# For other environments: Returns identity processors (no-op)
pusht_cfg = PushtEnv()
env_preprocessor, env_postprocessor = make_env_pre_post_processors(pusht_cfg, policy_cfg)
```

### How It Works

Each `EnvConfig` subclass can override `get_env_processors()` to return benchmark-specific
processor pipelines. The base class returns identity (no-op) processors by default.

```python
# In your EnvConfig subclass:
def get_env_processors(self):
    from lerobot.processor.pipeline import PolicyProcessorPipeline
    return (
        PolicyProcessorPipeline(steps=[MyProcessorStep()]),
        PolicyProcessorPipeline(steps=[]),
    )
```

The factory function `make_env_pre_post_processors` simply delegates to this method,
with a special case for `XVLAConfig` policies which override the env processors entirely.

### Integration in Evaluation

In `lerobot_eval.py`, the environment processors are created once and used throughout:

```python
def eval_main(cfg: EvalPipelineConfig):
    # Create environment
    envs = make_env(cfg.env, n_envs=cfg.eval.batch_size)

    # Create policy
    policy = make_policy(cfg=cfg.policy, env_cfg=cfg.env)

    # Create policy processors
    preprocessor, postprocessor = make_pre_post_processors(
        policy_cfg=cfg.policy,
        pretrained_path=cfg.policy.pretrained_path,
    )

    # Create environment processors (NEW!)
    env_preprocessor, env_postprocessor = make_env_pre_post_processors(env_cfg=cfg.env)

    # Run evaluation with both processor types
    eval_policy_all(
        envs=envs,
        policy=policy,
        env_preprocessor=env_preprocessor,      # Environment-specific
        env_postprocessor=env_postprocessor,    # Environment-specific
        preprocessor=preprocessor,              # Policy-specific
        postprocessor=postprocessor,            # Policy-specific
        n_episodes=cfg.eval.n_episodes,
    )
```

## Example: LIBERO Environment Processor

The `LiberoProcessorStep` demonstrates a real-world environment processor:

```python
from lerobot.processor.pipeline import ObservationProcessorStep

@dataclass
@ProcessorStepRegistry.register(name="libero_processor")
class LiberoProcessorStep(ObservationProcessorStep):
    """
    Processes LIBERO observations into the LeRobot format.

    **State Processing:**
    - Extracts end-effector position (3D)
    - Converts quaternion to axis-angle representation (3D)
    - Extracts gripper joint positions (2D)
    - Concatenates into 8D state vector

    **Image Processing:**
    - Rotates images 180° to match HuggingFaceVLA/libero convention
    """

    def _process_observation(self, observation):
        processed_obs = observation.copy()

        # Process images: Flip 180° for camera convention
        for key in list(processed_obs.keys()):
            if key.startswith("observation.images."):
                img = processed_obs[key]
                img = torch.flip(img, dims=[2, 3])  # Flip H and W
                processed_obs[key] = img

        # Process robot_state: Flatten to 8D vector
        if "observation.robot_state" in processed_obs:
            robot_state = processed_obs.pop("observation.robot_state")

            eef_pos = robot_state["eef"]["pos"]           # (B, 3)
            eef_quat = robot_state["eef"]["quat"]         # (B, 4)
            gripper_qpos = robot_state["gripper"]["qpos"] # (B, 2)

            # Convert quaternion to axis-angle
            eef_axisangle = self._quat2axisangle(eef_quat)  # (B, 3)

            # Concatenate into single state vector
            state = torch.cat((eef_pos, eef_axisangle, gripper_qpos), dim=-1)
            state = state.float()

            processed_obs["observation.state"] = state

        return processed_obs
```

### Why These Transformations?

1. **Image Rotation**: The HuggingFaceVLA/libero dataset has images rotated 180° from the raw LIBERO simulator. The processor handles this convention mismatch so policies trained on the dataset work seamlessly.

2. **State Flattening**: The raw LIBERO environment exposes nested dictionaries with all available state information (position, quaternion, velocity, matrix representation, etc.). The processor:
   - Selects the relevant components (pos, quat, gripper)
   - Converts quaternion to axis-angle (more suitable for learning)
   - Flattens to a single 8D vector that policies expect

3. **Flexibility**: The environment still exposes **all** raw data. If you want to try different state representations (e.g., including velocities, using matrix representation instead of axis-angle), you can create a new processor without modifying the environment code.

## Adding Environment Processors for New Environments

To add environment processors for a new environment:

### 1. Create the Processor Step

```python
# In src/lerobot/processor/env_processor.py

@dataclass
@ProcessorStepRegistry.register(name="myenv_processor")
class MyEnvProcessorStep(ObservationProcessorStep):
    """Process observations from MyEnv."""

    def _process_observation(self, observation):
        processed = observation.copy()

        # Your environment-specific transformations
        if "myenv.specific.state" in processed:
            state = processed.pop("myenv.specific.state")
            # Transform to standard format
            processed["observation.state"] = self._transform_state(state)

        return processed
```

### 2. Update the Factory

```python
# In src/lerobot/envs/factory.py

def make_env_pre_post_processors(env_cfg: EnvConfig):
    if isinstance(env_cfg, LiberoEnv) or "libero" in env_cfg.type:
        preprocessor = PolicyProcessorPipeline(steps=[LiberoProcessorStep()])
    elif isinstance(env_cfg, MyEnvConfig) or "myenv" in env_cfg.type:
        preprocessor = PolicyProcessorPipeline(steps=[MyEnvProcessorStep()])
    else:
        preprocessor = PolicyProcessorPipeline(steps=[])

    postprocessor = PolicyProcessorPipeline(steps=[])
    return preprocessor, postprocessor
```

### 3. Use in Evaluation

No changes needed! The evaluation script automatically uses the appropriate processor:

```bash
lerobot-eval \
    --policy.path=lerobot/my_policy \
    --env.type=myenv \  # Automatically uses MyEnvProcessorStep
    --eval.n_episodes=10
```

## Future: Environment Postprocessors

Currently, postprocessors are identity (no-op) for all environments. Future use cases include:

### Action Space Transformations

```python
@dataclass
class MyEnvActionPostprocessor(ProcessorStep):
    """Convert policy actions to environment-specific format."""

    def __call__(self, transition: EnvTransition) -> EnvTransition:
        action = transition["action"]

        # Example: Convert from Cartesian to joint space
        if self.action_space == "joint":
            action = self.ik_solver(action)

        # Example: Apply environment-specific safety limits
        action = torch.clamp(action, self.min_action, self.max_action)

        transition["action"] = action
        return transition
```

### Coordinate System Conversions

```python
@dataclass
class CoordinateTransformPostprocessor(ProcessorStep):
    """Transform actions between coordinate systems."""

    def __call__(self, transition: EnvTransition) -> EnvTransition:
        action = transition["action"]

        # Example: Policy outputs in world frame, env expects base frame
        action = self.world_to_base_transform(action)

        transition["action"] = action
        return transition
```

## Best Practices

1. **Keep environment processors simple**: They should only handle environment-specific data format issues, not complex learning-related transformations.

2. **Use policy processors for model requirements**: Normalization, batching, device placement, and tokenization belong in policy processors.

3. **Expose all data from environments**: Let processors decide what to use rather than hardcoding choices in the environment.

4. **Document conventions**: Clearly document any coordinate system conventions, camera orientations, or data formats that your processor handles.

5. **Test independently**: Environment processors should be testable without loading full policies or environments.

## Summary

Environment processors provide a **clean separation** between environment-specific data transformations and policy-specific model requirements. This architecture:

- ✅ Enables easy experimentation with different state representations
- ✅ Allows policies to work seamlessly across different environments
- ✅ Keeps environment code focused on simulation/hardware interface
- ✅ Makes processor pipelines more maintainable and debuggable
- ✅ Follows the single responsibility principle

The key insight: **Environments define data formats, processors standardize them, policies consume standardized data.** Each layer has a clear, focused responsibility.