diff --git a/docs/source/umi_pi0_relative_ee.mdx b/docs/source/umi_pi0_relative_ee.mdx index efa2daebb..e8699b6fa 100644 --- a/docs/source/umi_pi0_relative_ee.mdx +++ b/docs/source/umi_pi0_relative_ee.mdx @@ -1,16 +1,28 @@ # UMI Data with pi0 Relative EE Actions -This guide explains how to prepare a UMI-collected dataset for training a pi0 policy with relative end-effector (EE) actions, and how to run inference with the trained model. +This guide explains how to train a pi0 policy with UMI-style relative end-effector (EE) actions and deploy it on a real OpenArm robot. **What we will do:** -1. Recompute dataset statistics for relative actions and state. -2. Train pi0 with `derive_state_from_action=true` (full UMI pipeline). -3. Evaluate the trained policy on a real robot. +1. Prepare the dataset (EE pose + gripper in the action column). +2. Recompute statistics for relative actions. +3. Train pi0 with `derive_state_from_action=true`. +4. Evaluate the trained policy on a real robot. ## Background -[UMI (Universal Manipulation Interface)](https://umi-gripper.github.io) collects manipulation data with hand-held grippers, recovering 6-DoF EE poses via SLAM. UMI datasets stored in LeRobot format already contain `action` (absolute EE pose) and wrist-camera images. To train pi0 with relative actions, we need **relative action statistics** — so the normalizer sees `(action − state)` distributions. +[UMI (Universal Manipulation Interface)](https://umi-gripper.github.io) collects manipulation data with hand-held grippers, recovering 6-DoF EE poses via SLAM. The key insight from UMI (Chi et al., 2024) is that the action space must include **both EE trajectory and gripper width**, and actions should be expressed as **relative trajectories** (offsets from the current pose). + +### Dataset layout + +The dataset should have this structure: + +| Feature | Shape | Content | +| ------------------------- | --------- | -------------------------------------------------------- | +| `observation.images.cam0` | `[3,H,W]` | Wrist camera image | +| `action` | `[8]` | `[x, y, z, ax, ay, az, proximal, distal]` (EE + gripper) | + +No separate `observation.pose` or `observation.joints` columns are needed — the model derives its proprioception state directly from the action column (`derive_state_from_action=true`). ### Why relative actions? @@ -20,99 +32,128 @@ With relative actions, each action in a chunk is an **offset from the current st relative_action[i] = absolute_action[t + i] − state[t] ``` -This is the representation advocated by UMI (Chi et al., 2024). Compared to absolute actions it removes the need for a consistent global coordinate frame, and compared to delta actions (each step relative to the previous) it avoids error accumulation across the chunk. See the [Action Representations](action_representations) guide for a full comparison. +UMI ablations show this is critical: absolute actions achieve only 25% success vs 100% for relative trajectory on the cup arrangement task. Compared to delta actions (each step relative to the previous), relative trajectory avoids error accumulation. See the [Action Representations](action_representations) guide for details. -### Full UMI mode: `derive_state_from_action` +### `derive_state_from_action` -When `derive_state_from_action=true`, pi0 automatically derives `observation.state` from the action column on the fly — no separate state column or dataset conversion step needed. Under the hood: +When `derive_state_from_action=true`, pi0 derives `observation.state` from the action column during training — no separate state column needed. Under the hood: -- `action_delta_indices` extends to `[-1, 0, 1, ..., 49]` (one extra leading timestep). +- `action_delta_indices` extends to `[-1, 0, 1, ..., chunk_size-1]` (one extra leading timestep). - `DeriveStateFromActionStep` extracts `[action[t-1], action[t]]` as a 2-step state and strips the extra timestep from the action chunk. - `RelativeActionsProcessorStep` converts actions to offsets from `state[t]`. - `RelativeStateProcessorStep` converts the 2-step state to relative proprioception (velocity + zeros) and flattens. -This single flag implies `use_relative_state=true` and `state_obs_steps=2`. +This implies `use_relative_state=true` and `state_obs_steps=2`. -During **inference**, state comes from the robot (via FK), so `DeriveStateFromActionStep` is a no-op. `RelativeStateProcessorStep` buffers the previous state and applies the same conversion automatically. +During **inference**, `DeriveStateFromActionStep` is a no-op — state comes from the robot via forward kinematics. `RelativeStateProcessorStep` buffers the previous state and applies the same conversion automatically. ## Step 1: Recompute Stats -Use the built-in CLI to recompute dataset statistics for relative actions and derive-state-from-action: +After preparing the dataset with EE pose in the action column, recompute statistics with `derive_state_from_action=true`. This computes relative action and state stats so the normalizer sees offset distributions: ```bash lerobot-edit-dataset \ - --repo_id \ - --operation.type recompute_stats \ - --operation.relative_action true \ - --operation.derive_state_from_action true \ - --operation.chunk_size 50 \ - --operation.relative_exclude_joints "['gripper']" \ - --push_to_hub true + --repo-id=glannuzel/grabette-dataset \ + --operation=recompute_stats \ + --operation.relative_action=true \ + --operation.relative_exclude_joints='["proximal", "distal"]' \ + --operation.derive_state_from_action=true \ + --operation.chunk_size=30 \ + --push_to_hub=true ``` -The `derive_state_from_action` flag tells `recompute_stats` to read from the action column (instead of `observation.state`) when computing relative state stats. It automatically enables `relative_state=true` and `state_obs_steps=2`. - -The `relative_exclude_joints` parameter specifies joints that stay absolute. Gripper commands are typically binary or continuous open/close and don't benefit from relative encoding. Leave it as `"[]"` to convert all dimensions to relative. +| Flag | Purpose | +| ------------------------------- | ------------------------------------------------------------------------------- | +| `relative_action=true` | Compute stats on `action − state` (relative actions) | +| `relative_exclude_joints` | Keep gripper dims absolute (they don't benefit from relative encoding) | +| `derive_state_from_action=true` | Derive state from action column (implies `relative_state`, `state_obs_steps=2`) | +| `chunk_size=30` | Must match training chunk size | ## Step 2: Train -No custom training script is needed — standard `lerobot-train` handles everything: - ```bash -lerobot-train \ - --dataset.repo_id=/ \ +#!/bin/bash +set -euo pipefail + +export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:${LD_LIBRARY_PATH:-} + +DATASET="glannuzel/grabette-dataset" +NUM_PROCESSES=8 + +echo "=== Training pi0 on $DATASET (UMI relative EE, ${NUM_PROCESSES} GPUs) ===" +accelerate launch --multi_gpu --num_processes=$NUM_PROCESSES \ + -m lerobot.scripts.lerobot_train \ + --dataset.repo_id="$DATASET" \ + --dataset.video_backend=pyav \ --policy.type=pi0 \ --policy.pretrained_path=lerobot/pi0_base \ + --policy.repo_id=pepijn/grabette-umi-pi0 \ + --policy.chunk_size=30 \ + --policy.n_action_steps=30 \ --policy.derive_state_from_action=true \ - --policy.use_relative_actions=true \ - --policy.relative_exclude_joints='["gripper"]' + --use_relative_actions=true \ + --policy.relative_exclude_joints='["proximal", "distal"]' \ + --batch_size=32 \ + --steps=5000 \ + --policy.scheduler_decay_steps=5000 \ + --policy.dtype=bfloat16 \ + --policy.compile_model=false \ + --policy.gradient_checkpointing=true \ + --policy.device=cuda \ + --output_dir=/fsx/pepijn/outputs/grabette-umi \ + --job_name=grabette-umi-v2 \ + --wandb.enable=true \ + --wandb.disable_artifact=true \ + --wandb.project=grabette-umi \ + --log_freq=100 \ + --save_freq=5000 ``` -`derive_state_from_action=true` auto-enables `use_relative_state=true` and `state_obs_steps=2`. +Key flags: -Under the hood, the training pipeline: +| Flag | Purpose | +| ------------------------------- | ---------------------------------------------------------------------- | +| `derive_state_from_action=true` | Derive proprioception from action column (full UMI mode) | +| `use_relative_actions=true` | Actions are offsets from current state | +| `relative_exclude_joints` | `["proximal", "distal"]` — gripper stays absolute, EE pose is relative | +| `chunk_size=30` | Action horizon: 30 steps (~0.65s at 46 FPS) | +| `n_action_steps=30` | Execute full chunk before replanning | -- Loads relative action stats and relative state stats from the dataset's `meta/stats.json`. -- Extends `action_delta_indices` to `[-1, 0, 1, ..., 49]` to load one extra leading timestep. -- `DeriveStateFromActionStep` extracts the 2-step state from the action chunk and strips the extra timestep. -- `RelativeActionsProcessorStep` converts actions to offsets from `state[t]`. -- `RelativeStateProcessorStep` converts the 2-step state to relative offsets from the current timestep, then flattens. -- `NormalizerProcessorStep` normalizes everything. -- The model trains on normalized relative values. - -See the [pi0 documentation](pi0) for all available training options. +Note: `derive_state_from_action=true` automatically implies `use_relative_state=true` and `state_obs_steps=2`. No `rename_map` is needed since there are no separate observation columns to rename. ## Step 3: Evaluate -The evaluation script in `examples/umi_pi0_relative_ee/evaluate.py` runs inference on a real robot (SO-100 with EE space): +The evaluation script in `examples/umi_pi0_relative_ee/evaluate.py` runs inference on a real OpenArm robot: ```bash python examples/umi_pi0_relative_ee/evaluate.py ``` -Edit `HF_MODEL_ID`, `HF_DATASET_ID`, and robot configuration at the top of the file. +Edit `HF_MODEL_ID`, camera index, and robot configuration at the top of the file. + +### How inference works + +At inference, the training dataset has no `observation.state` — it was derived from actions. The evaluate script provides `observation.state` from the robot via forward kinematics: + +1. **Robot → FK** — Arm joint positions → EE pose `[x,y,z,ax,ay,az]`, gripper → `[proximal, distal]`. Combined into `observation.state` (8D). +2. **Preprocessor** (loaded from checkpoint) — `DeriveStateFromActionStep` is a no-op. `RelativeStateProcessorStep` buffers previous state, stacks `[prev, current]`, subtracts current → velocity info. `RelativeActionsProcessorStep` caches state. `NormalizerProcessorStep` normalizes. +3. **pi0 inference** — Predicts normalized relative action chunk (30 steps). +4. **Postprocessor** — `UnnormalizerProcessorStep` unnormalizes, `AbsoluteActionsProcessorStep` adds cached state → absolute EE targets. +5. **IK → Robot** — Absolute `[x,y,z,ax,ay,az]` → arm joint targets with full 6-DOF IK (orientation weight = 1.0). `[proximal, distal]` → direct gripper position commands. ### Latency compensation -For real robot deployment, you may want to skip the first few steps of each predicted action chunk to compensate for system latency. Set `LATENCY_SKIP_STEPS` in the evaluate script: +Set `LATENCY_SKIP_STEPS` to skip the first few predicted action steps, compensating for system latency: ```python -LATENCY_SKIP_STEPS = 0 # ceil(total_latency_ms / (1000 / FPS)) +LATENCY_SKIP_STEPS = 7 # ceil(total_latency_ms / (1000 / FPS)) ``` -For example, at 10Hz with ~200ms total latency, set `LATENCY_SKIP_STEPS = 2`. - -The inference flow uses pi0's built-in processor pipeline — no custom wrappers needed: - -1. **Robot → FK** — Joint positions are converted to EE pose via `ForwardKinematicsJointsToEE`, producing `observation.state`. -2. **Preprocessor** — `DeriveStateFromActionStep` is a no-op (state comes from robot). `RelativeStateProcessorStep` buffers previous state, stacks, and converts to relative. `RelativeActionsProcessorStep` caches state. `NormalizerProcessorStep` normalizes. -3. **pi0 inference** — The model predicts a normalized relative action chunk. -4. **Postprocessor** — `UnnormalizerProcessorStep` unnormalizes, then `AbsoluteActionsProcessorStep` adds the cached state back to get absolute EE targets. -5. **IK → Robot** — `InverseKinematicsEEToJoints` converts absolute EE targets to joint commands. +At 46 FPS (~22ms/step) with ~150ms total latency: `ceil(150/22) ≈ 7`. Start with 0 for a safe first test. ## Replay Viewer -Before running on hardware, you can visualize any dataset episode in a browser-based 3D viewer. The viewer shows the EE trajectory overlaid on the OpenArm URDF model, making it easy to sanity-check recorded data or debug unexpected behavior. +Visualize any dataset episode in a browser-based 3D viewer before running on hardware. The viewer shows the EE trajectory overlaid on the OpenArm URDF model. ### Quick start @@ -120,8 +161,6 @@ Before running on hardware, you can visualize any dataset episode in a browser-b python examples/umi_pi0_relative_ee/replay.py ``` -This extracts the trajectory from episode 0 of the default dataset, starts a local HTTP server, and opens the viewer at [http://localhost:8765/replay_viewer.html](http://localhost:8765/replay_viewer.html). - ### Options | Flag | Default | Description | @@ -131,64 +170,53 @@ This extracts the trajectory from episode 0 of the default dataset, starts a loc | `--port` | `8765` | HTTP server port | | `--force` | off | Re-extract trajectory even if cached | -Example with a different dataset and episode: - -```bash -python examples/umi_pi0_relative_ee/replay.py \ - --repo-id myuser/my-dataset \ - --episode 3 \ - --port 8766 -``` - ### Viewer controls -The panel in the top-left corner shows live EE coordinates (x, y, z, ax, ay, az) and gripper state for the current frame. Below that are transport controls: +The panel in the top-left corner shows live EE coordinates and gripper state. Transport controls: - **Play / Pause** — toggle automatic playback. -- **Step buttons** (◀ ▶) — advance or rewind one frame at a time. -- **Reset** (⟳) — jump back to frame 0. -- **Scrubber** — drag to seek to any frame. -- **Speed selector** — 0.25×, 0.5×, 1×, 2×, or 4× playback speed. - -The 3D scene uses orbit controls — click and drag to rotate, scroll to zoom, right-click drag to pan. +- **Step buttons** (◀ ▶) — advance or rewind one frame. +- **Reset** (⟳) — jump to frame 0. +- **Scrubber** — drag to seek. +- **Speed selector** — 0.25× to 4× playback speed. ### Color legend | Color | Meaning | | ------------------ | --------------------------------------------- | | Red sphere | Current EE position | -| Yellow trail | Past trajectory (frames already visited) | -| Dark trail | Future trajectory (frames ahead) | +| Yellow trail | Past trajectory | +| Dark trail | Future trajectory | | Orange ring + axes | URDF `ee_target` frame (zero-joint reference) | -The trajectory is automatically re-centered so that frame 0 aligns with the robot's `openarm_right_ee_target` link in the zero-joint pose. - ## How the Pieces Fit Together ``` -Training (full UMI mode: derive_state_from_action=true): - DataLoader (action: B,51,D) - → DeriveStateFromActionStep (state = action[:,:2,:], action = action[:,1:,:]) - → RelativeActionsProcessorStep (action -= state[:,-1,:]) - → RelativeStateProcessorStep (state offsets from current, flatten → B,2*D) - → NormalizerProcessorStep → pi0 model +Training (derive_state_from_action=true): + DataLoader loads action: [B, 31, 8] (chunk_size=30 + 1 leading) + → DeriveStateFromActionStep + state = action[:, :2, :] → [B, 2, 8] + action = action[:, 1:, :] → [B, 30, 8] + → RelativeActionsProcessorStep (action -= state[:, -1, :]) + → RelativeStateProcessorStep (state offsets from current, flatten → [B, 16]) + → NormalizerProcessorStep → pi0 model Inference: - robot joints → FK → observation.state (absolute EE) - ↓ - DeriveStateFromActionStep (no-op) - ↓ - RelativeActionsProcessorStep (caches state) - ↓ - RelativeStateProcessorStep (buffers prev, stacks, subtracts, flattens) - ↓ - NormalizerProcessorStep → pi0 model → relative action chunk - ↓ - UnnormalizerProcessorStep - ↓ - AbsoluteActionsProcessorStep (+ cached state → absolute EE) - ↓ - IK → joint targets → robot + arm joints → FK → observation.state [8D: x,y,z,ax,ay,az,prox,dist] + ↓ + DeriveStateFromActionStep (no-op) + ↓ + RelativeActionsProcessorStep (caches state) + ↓ + RelativeStateProcessorStep (buffers prev, stacks, subtracts, flattens) + ↓ + NormalizerProcessorStep → pi0 model → relative action chunk [30, 8] + ↓ + UnnormalizerProcessorStep + ↓ + AbsoluteActionsProcessorStep (+ cached state → absolute EE) + ↓ + IK → joint targets → robot ``` ## References diff --git a/examples/umi_pi0_relative_ee/evaluate.py b/examples/umi_pi0_relative_ee/evaluate.py index 687a77241..9e41ee6bb 100644 --- a/examples/umi_pi0_relative_ee/evaluate.py +++ b/examples/umi_pi0_relative_ee/evaluate.py @@ -15,29 +15,32 @@ # limitations under the License. """ -Inference script for a pi0 model trained with **relative EE actions** on an OpenArm robot. +Inference script for a pi0 model trained with UMI-style relative EE actions +on an OpenArm robot (single right arm, one wrist camera). -Single right OpenArm follower with one wrist camera. +Training dataset layout: + observation.images.cam0 [3, 720, 960] + action [x, y, z, ax, ay, az, proximal, distal] (shape 8) -This uses the built-in ``DeriveStateFromActionStep`` (no-op at inference), -``RelativeActionsProcessorStep``, ``AbsoluteActionsProcessorStep``, and -``RelativeStateProcessorStep`` that are already wired into pi0's processor -pipeline. +The model uses ``derive_state_from_action=true``, so observation.state is +derived from the action column during training. At inference the state must +be provided by the robot — this script uses FK to compute the current EE +pose and gripper position, which it exposes as ``observation.state``. -The inference loop: - 1. Reads joint positions from the robot (7-DOF arm + gripper). - 2. Converts to EE pose via forward kinematics (FK). - This produces ``observation.state`` with the current EE pose. - 3. The pi0 preprocessor: - a) ``DeriveStateFromActionStep`` — no-op (state comes from robot). - b) ``RelativeActionsProcessorStep`` caches the raw state. - c) ``RelativeStateProcessorStep`` buffers prev state, stacks, subtracts. - d) ``NormalizerProcessorStep`` normalizes state and actions. - 4. pi0 predicts relative action chunk. - 5. The pi0 postprocessor: - a) ``UnnormalizerProcessorStep`` unnormalizes. - b) ``AbsoluteActionsProcessorStep`` adds cached state → absolute EE. - 6. IK converts absolute EE → joint targets → robot. +Pipeline: + 1. Read arm joints from robot → FK → observation.state [x,y,z,ax,ay,az,prox,dist] + 2. Read camera image → observation.images.cam0 + 3. pi0 preprocessor (loaded from checkpoint): + - DeriveStateFromActionStep: no-op at inference (state from robot) + - RelativeActionsProcessorStep: caches current state + - RelativeStateProcessorStep: buffers prev state, stacks [prev,cur], + subtracts current → velocity info, flattens + - NormalizerProcessorStep: normalizes + 4. pi0 predicts relative action chunk (30 steps) + 5. pi0 postprocessor: unnormalize, add cached state → absolute EE + 6. IK: absolute EE [x,y,z,ax,ay,az] → arm joint targets + 7. Gripper [proximal, distal] → gripper motor targets + 8. Send to robot Usage: python evaluate.py @@ -45,57 +48,177 @@ Usage: from __future__ import annotations +import numpy as np +from scipy.spatial.transform import Rotation + from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig -from lerobot.configs.types import FeatureType, PolicyFeature -from lerobot.datasets.feature_utils import combine_feature_dicts from lerobot.datasets.lerobot_dataset import LeRobotDataset -from lerobot.datasets.pipeline_features import aggregate_pipeline_dataset_features, create_initial_features from lerobot.model.kinematics import RobotKinematics from lerobot.policies.factory import make_pre_post_processors from lerobot.policies.pi0.modeling_pi0 import PI0Policy -from lerobot.processor import ( - RelativeStateProcessorStep, - RobotProcessorPipeline, - make_default_teleop_action_processor, -) -from lerobot.processor.converters import ( - observation_to_transition, - robot_action_observation_to_transition, - transition_to_observation, - transition_to_robot_action, -) +from lerobot.processor import RelativeStateProcessorStep from lerobot.robots.openarm_follower import OpenArmFollower, OpenArmFollowerConfig -from lerobot.robots.so_follower.robot_kinematic_processor import ( - ForwardKinematicsJointsToEE, - InverseKinematicsEEToJoints, -) from lerobot.scripts.lerobot_record import record_loop from lerobot.types import RobotAction, RobotObservation from lerobot.utils.control_utils import init_keyboard_listener from lerobot.utils.utils import log_say from lerobot.utils.visualization_utils import init_rerun -FPS = 30 +# --------------------------------------------------------------------------- +# Configuration — adapt these to your setup +# --------------------------------------------------------------------------- + +FPS = 46 EPISODE_TIME_SEC = 60 TASK_DESCRIPTION = "red cube" HF_MODEL_ID = "pepijn223/grabette-umi-pi0" -# Latency compensation: skip this many steps from the start of each predicted -# action chunk. Formula: ceil(total_latency_ms / (1000 / FPS)). -# E.g. at 10Hz with ~200ms total system latency: ceil(200 / 100) = 2. +# Latency compensation: skip this many predicted action steps to account for +# camera + inference + execution latency. Formula: ceil(total_ms / (1000/FPS)). +# At 46 FPS (~22ms/step) with ~150ms total latency: ceil(150/22) ≈ 7. +# Start with 0 for a safe first test, then increase to match measured latency. LATENCY_SKIP_STEPS = 0 -# EE feature keys produced by ForwardKinematicsJointsToEE (arm pose only). -# Gripper joints use absolute position control, not EE-relative. -EE_KEYS = ["x", "y", "z", "wx", "wy", "wz"] - URDF_PATH = "src/lerobot/robots/openarm_follower/urdf/openarm_bimanual_pybullet.urdf" -URDF_EE_FRAME = "openarm_right_link7" +URDF_EE_FRAME = "openarm_right_ee_target" + +IK_POSITION_WEIGHT = 1.0 +IK_ORIENTATION_WEIGHT = 1.0 + +# --------------------------------------------------------------------------- +# Dataset features for inference +# +# The training dataset has only observation.images.cam0 and action. +# observation.state is derived from action during training +# (derive_state_from_action=true) but must be supplied by the robot at +# inference. We define it here so build_dataset_frame can map FK output +# to the right feature. +# --------------------------------------------------------------------------- + +DATASET_FEATURES: dict = { + "observation.state": { + "dtype": "float32", + "shape": [8], + "names": ["x", "y", "z", "ax", "ay", "az", "proximal", "distal"], + }, + "observation.images.cam0": { + "dtype": "video", + "shape": [3, 720, 960], + "names": ["channels", "height", "width"], + "info": { + "video.height": 720, + "video.width": 960, + "video.codec": "h264", + "video.pix_fmt": "yuv420p", + "video.is_depth_map": False, + "video.fps": FPS, + "video.channels": 3, + "has_audio": False, + }, + }, + "action": { + "dtype": "float32", + "shape": [8], + "names": ["x", "y", "z", "ax", "ay", "az", "proximal", "distal"], + }, + "timestamp": {"dtype": "float32", "shape": [1], "names": None}, + "frame_index": {"dtype": "int64", "shape": [1], "names": None}, + "episode_index": {"dtype": "int64", "shape": [1], "names": None}, + "index": {"dtype": "int64", "shape": [1], "names": None}, + "task_index": {"dtype": "int64", "shape": [1], "names": None}, +} + + +# --------------------------------------------------------------------------- +# FK / IK callables +# --------------------------------------------------------------------------- + + +class JointsToEE: + """FK: raw robot observation → flat dict matching observation.state names. + + Arm joint positions → EE pose [x,y,z,ax,ay,az] via forward kinematics. + Gripper motor positions → [proximal, distal]. + Camera images pass through unchanged. + """ + + def __init__(self, kinematics: RobotKinematics, arm_motor_names: list[str]): + self.kin = kinematics + self.arm = arm_motor_names + + def __call__(self, obs: RobotObservation) -> RobotObservation: + q = np.array([float(obs[f"{m}.pos"]) for m in self.arm]) + t = self.kin.forward_kinematics(q) + rot = Rotation.from_matrix(t[:3, :3]).as_rotvec() + + out: dict = { + "x": float(t[0, 3]), + "y": float(t[1, 3]), + "z": float(t[2, 3]), + "ax": float(rot[0]), + "ay": float(rot[1]), + "az": float(rot[2]), + "proximal": float(obs["proximal.pos"]), + "distal": float(obs["distal.pos"]), + } + for k, v in obs.items(): + if not k.endswith((".pos", ".vel", ".torque")): + out[k] = v + return out + + +class EEToJoints: + """IK: policy action dict → motor position dict for the robot. + + Reads [x,y,z,ax,ay,az] from the action, runs IK for arm joint targets. + Passes [proximal, distal] as direct gripper position commands. + """ + + def __init__( + self, + kinematics: RobotKinematics, + arm_motor_names: list[str], + position_weight: float = 1.0, + orientation_weight: float = 1.0, + ): + self.kin = kinematics + self.arm = arm_motor_names + self.pw = position_weight + self.ow = orientation_weight + self.q_curr: np.ndarray | None = None + + def __call__(self, args: tuple[RobotAction, RobotObservation]) -> RobotAction: + action, obs = args + + q_raw = np.array([float(obs[f"{m}.pos"]) for m in self.arm]) + if self.q_curr is None: + self.q_curr = q_raw + + t_des = np.eye(4) + t_des[:3, :3] = Rotation.from_rotvec([action["ax"], action["ay"], action["az"]]).as_matrix() + t_des[:3, 3] = [action["x"], action["y"], action["z"]] + + q_target = self.kin.inverse_kinematics( + self.q_curr, t_des, position_weight=self.pw, orientation_weight=self.ow + ) + self.q_curr = q_target + + out: dict = {f"{m}.pos": float(q_target[i]) for i, m in enumerate(self.arm)} + out["proximal.pos"] = float(action["proximal"]) + out["distal.pos"] = float(action["distal"]) + return out + + +# --------------------------------------------------------------------------- +# Main +# --------------------------------------------------------------------------- def main(): - camera_config = {"cam0": OpenCVCameraConfig(index_or_path=0, width=960, height=720, fps=FPS)} + camera_config = { + "cam0": OpenCVCameraConfig(index_or_path=0, width=960, height=720, fps=FPS), + } robot_config = OpenArmFollowerConfig( port="can0", id="right_openarm", @@ -110,79 +233,20 @@ def main(): policy.config.latency_skip_steps = LATENCY_SKIP_STEPS arm_motor_names = list(robot.bus.motors.keys()) - gripper_motor_names = list(robot.gripper_bus.motors.keys()) - kinematics_solver = RobotKinematics( + kinematics = RobotKinematics( urdf_path=URDF_PATH, target_frame_name=URDF_EE_FRAME, joint_names=arm_motor_names, ) - # The policy starts from the robot's current EE pose (via FK below). - # Relative actions are predicted as deltas from that pose, so no manual - # re-centering is needed — the starting point is always the live EE tip. + fk = JointsToEE(kinematics, arm_motor_names) + ik = EEToJoints(kinematics, arm_motor_names, IK_POSITION_WEIGHT, IK_ORIENTATION_WEIGHT) - # FK: joint observation → EE observation (produces observation.state). - # gripper_names=[] means proximal/distal pass through as absolute positions. - robot_joints_to_ee_processor = RobotProcessorPipeline[RobotObservation, RobotObservation]( - steps=[ - ForwardKinematicsJointsToEE( - kinematics=kinematics_solver, - motor_names=arm_motor_names, - gripper_names=[], - ) - ], - to_transition=observation_to_transition, - to_output=transition_to_observation, - ) - - # IK: EE action → joint targets. Gripper actions are absolute and pass through. - robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction]( - steps=[ - InverseKinematicsEEToJoints( - kinematics=kinematics_solver, - motor_names=arm_motor_names, - gripper_names=[], - initial_guess_current_joints=True, - ), - ], - to_transition=robot_action_observation_to_transition, - to_output=transition_to_robot_action, - ) - - # OpenArm observations include .vel and .torque per motor; the EE policy - # pipeline only needs .pos (converted to EE by FK) and camera features. - obs_features = { - k: v - for k, v in robot.observation_features.items() - if not (k.endswith(".vel") or k.endswith(".torque")) - } - - # A dataset object is needed for its .features and .meta.stats even when - # not recording — record_loop uses them for building observation/action frames. dataset = LeRobotDataset.create( repo_id="tmp/openarm_eval_scratch", fps=FPS, - features=combine_feature_dicts( - aggregate_pipeline_dataset_features( - pipeline=robot_joints_to_ee_processor, - initial_features=create_initial_features(observation=obs_features), - use_videos=True, - ), - aggregate_pipeline_dataset_features( - pipeline=make_default_teleop_action_processor(), - initial_features=create_initial_features( - action={ - **{f"ee.{k}": PolicyFeature(type=FeatureType.ACTION, shape=(1,)) for k in EE_KEYS}, - **{ - f"{g}.pos": PolicyFeature(type=FeatureType.ACTION, shape=(1,)) - for g in gripper_motor_names - }, - } - ), - use_videos=True, - ), - ), + features=DATASET_FEATURES, robot_type=robot.name, use_videos=True, image_writer_threads=4, @@ -195,7 +259,7 @@ def main(): preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}}, ) - _relative_state_steps = [s for s in preprocessor.steps if isinstance(s, RelativeStateProcessorStep)] + relative_state_steps = [s for s in preprocessor.steps if isinstance(s, RelativeStateProcessorStep)] robot.connect() @@ -207,7 +271,7 @@ def main(): raise ValueError("Robot is not connected!") log_say("Starting policy execution") - for step in _relative_state_steps: + for step in relative_state_steps: step.reset() record_loop( @@ -221,9 +285,8 @@ def main(): control_time_s=EPISODE_TIME_SEC, single_task=TASK_DESCRIPTION, display_data=True, - teleop_action_processor=make_default_teleop_action_processor(), - robot_action_processor=robot_ee_to_joints_processor, - robot_observation_processor=robot_joints_to_ee_processor, + robot_action_processor=ik, + robot_observation_processor=fk, ) finally: robot.disconnect()