Commit Graph

115 Commits

Author SHA1 Message Date
Michel Aractingi
0747afdba7 Optimize dataset updates by incrementally concatenating new data instead of reloading from disk, reducing memory usage and improving performance. 2025-09-05 18:37:48 +02:00
Michel Aractingi
992fb177c3 further memory optimizations needed due to calling pd.concat 2025-09-03 18:49:30 +02:00
Michel Aractingi
1db3401159 remove unused Iterable Namespace 2025-09-03 16:23:36 +02:00
pre-commit-ci[bot]
7868df27dc [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-09-03 14:18:02 +00:00
Michel Aractingi
0e04f5fbbe remove html templates and flask dependency 2025-09-03 16:17:10 +02:00
Michel Aractingi
fdccf7774b fix(memory explosion) added delete to episodes and hf_dataset every time we reload while collecting a dataset to avoid memory explosion 2025-09-03 15:31:28 +02:00
Michel Aractingi
2a3d62259e visualize_dataset_html deprecated 2025-09-02 15:51:11 +02:00
Michel Aractingi
2df4e25558 added the file and video max size as arguments 2025-09-02 15:41:42 +02:00
pre-commit-ci[bot]
4062d0564a [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-09-01 19:32:19 +00:00
CarolinePascal
0a30636fc6 chore(dataset v2.0): drop support for dataset v2.0 format 2025-09-01 21:31:46 +02:00
CarolinePascal
adad3698e1 chore(dataset v1): drop support for dataset v1 format 2025-09-01 19:37:20 +02:00
Michel Aractingi
84ffc28854 moved get_video_duration_in_s to video_utils and replaced subprocess and ffmpeg with pyAV 2025-08-29 01:31:53 +02:00
Michel Aractingi
47aee1fdbe revert back video_utils.py to using pyav while keeping concat_video_files function 2025-08-29 01:06:46 +02:00
Michel Aractingi
bbd64b9ce5 fixes in datasets/utils.py 2025-08-29 00:03:13 +02:00
Michel Aractingi
35f36e8fba removed outdated todos 2025-08-28 10:10:17 +02:00
Francesco Capuano
2ca6edc19e Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
2025-08-25 16:34:44 +02:00
mgiac-hexagon
577cd10974 Removed duplicate lines of code (#1709) 2025-08-25 12:39:32 +02:00
lxk
b0923ab74b fix(dataset): Use provided episode_data in save_episode (#1740)
The 'episode_data' parameter was previously ignored, causing an error if provided. This change ensures it is correctly used, which allows for asynchronous episode saving by passing a copy of the episode buffer, preventing conflicts with the main data collection loop.
2025-08-22 15:24:02 +02:00
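The asynchronous-saving pattern described in this commit message can be sketched roughly as follows. This is a minimal stand-in, not LeRobot's code: `EpisodeRecorder`, `add_frame`, and the buffer layout are hypothetical names chosen for illustration. Only the idea of passing a copy of the episode buffer through `episode_data` comes from the message.

```python
import copy
import threading


class EpisodeRecorder:
    """Hypothetical stand-in for a dataset that buffers frames per episode."""

    def __init__(self):
        self.episode_buffer = {"frames": []}
        self.saved_episodes = []
        self._lock = threading.Lock()

    def add_frame(self, frame):
        self.episode_buffer["frames"].append(frame)

    def save_episode(self, episode_data=None):
        # Use the caller-provided data when given; otherwise fall back to the live buffer.
        data = episode_data if episode_data is not None else self.episode_buffer
        with self._lock:
            self.saved_episodes.append(list(data["frames"]))


recorder = EpisodeRecorder()
for i in range(3):
    recorder.add_frame(i)

# Snapshot the buffer so the background save cannot observe later mutations.
snapshot = copy.deepcopy(recorder.episode_buffer)
saver = threading.Thread(target=recorder.save_episode, kwargs={"episode_data": snapshot})
saver.start()

# The collection loop keeps going while the save runs.
recorder.episode_buffer = {"frames": []}
recorder.add_frame(99)

saver.join()
```

Because the saver thread works on its own deep copy, resetting or appending to the live buffer in the main loop does not affect what gets written.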
Jack Vial
7f70b78f32 Add missing encoding table entries for Koch arm (#1534) 2025-08-20 17:24:05 +02:00
Michel Aractingi
db36f01e8b add update_chunk_settings method for LeRobotDatasetMetadata. Introduce tests for chunk settings updates and validation of parameters. 2025-08-18 00:00:23 +02:00
Michel Aractingi
c7a3b02625 fixed tensor indices in _check_cached_episode_sufficient in lerobot_dataset.py, added test 2025-08-13 16:16:32 +02:00
Michel Aractingi
267a753eda Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3 2025-08-13 01:39:32 +02:00
Caroline Pascal
11e6bd762a fix(busy_wait): fix busy_wait implementation for Windows platforms and remove erroneous TODO (#1695) 2025-08-08 10:46:14 +02:00
Steven Palma
ce3b9f627e chore(docs): prioritize use of entry points in docs + fix nightly badge (#1692)
* chore(docs): fix typo in nightly badge

* chore(docs): prioritize the use of entrypoints for consistency
2025-08-07 14:25:44 +02:00
Adil Zouitine
88f7bf01c1 feat(pipeline): universal processor for LeRobot (#1431)
* Refactor observation preprocessing to use a modular pipeline system

- Introduced `RobotPipeline` and `ObservationProcessor` for handling observation transformations.
- Updated `preprocess_observation` to maintain backward compatibility while leveraging the new pipeline.
- Added tests for the new processing components and ensured they match the original functionality.
- Removed hardcoded logic in favor of a more flexible, composable architecture.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor observation processing and improve modularity

- Updated `ObservationProcessor` to enhance the modular design for processing observations.
- Cleaned up imports and improved code readability by removing unnecessary lines and comments.
- Ensured backward compatibility while integrating new processing components.
- Added tests to validate the functionality of the updated processing architecture.

* Remove redundant tests for None observation and serialization methods in `test_observation_processor.py` to streamline the test suite and improve maintainability.

* Refactor processing architecture to use RobotProcessor

- Replaced instances of RobotPipeline with RobotProcessor across the codebase for improved modularity and clarity.
- Introduced ProcessorStepRegistry for better management of processing steps.
- Updated relevant documentation and tests to reflect the new processing structure.
- Enhanced the save/load functionality to support the new processor design.
- Added a model card template for RobotProcessor to facilitate sharing and documentation.

* Add RobotProcessor tutorial to documentation

- Introduced a new tutorial on using RobotProcessor for preprocessing robot data.
- Added a section in the table of contents for easy navigation to the new tutorial.
- The tutorial covers key concepts, real-world scenarios, and practical examples for effective use of the RobotProcessor pipeline.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add normalization processor and related components

- Introduced `NormalizationProcessor` to handle both observation normalization and action unnormalization.
- Added `ObservationNormalizer` and `ActionUnnormalizer` classes for specific normalization tasks.
- Updated `__init__.py` to include the new `NormalizationProcessor` in the module exports.
- Enhanced `ObservationProcessor` with registration in the `ProcessorStepRegistry` for better modularity.
- Created `RenameProcessor` for renaming keys in observations, improving flexibility in data processing.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Enhance processing architecture with new components

- Added `RenameProcessor` to facilitate key renaming in observations, improving data handling flexibility.
- Updated `__init__.py` to include `RenameProcessor` in module exports.
- Refactored `NormalizationProcessor` and `ObservationNormalizer` to use `rsplit` for better key handling.
- Introduced comprehensive tests for `NormalizationProcessor` and `RenameProcessor` to ensure functionality and robustness.

* chore (docs): add docstring for processor

* fix (test): test factory

* fix(test): policies

* Update tests/processor/test_observation_processor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adil Zouitine <adilzouitinegm@gmail.com>

* chore(test): add suggestion made by copilot regarding numpy test

* fix(test): import issue

* Refactor normalization components and update tests

- Renamed `ObservationNormalizer` to `NormalizerProcessor` and `ActionUnnormalizer` to `UnnormalizerProcessor` for clarity.
- Consolidated normalization logic for both observations and actions into `NormalizerProcessor` and `UnnormalizerProcessor`.
- Updated tests to reflect the new class names and ensure proper functionality of normalization and unnormalization processes.
- Enhanced handling of missing statistics in normalization processes.

* chore (docstring): Improve docstring for NormalizerProcessor

* feat (device processor): Implement device processor

* chore (batch handling): Enhance processing components with batch conversion utilities

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(test): linting issue

* chore (output format): improves output format

* chore (type): add typing for multiprocess envs

* feat (overrides): Implement support for loading processors with parameter overrides

- Added the ability to provide non-serializable objects when loading processors from saved configurations using the `overrides` parameter.
- Enhanced error handling for invalid override keys and instantiation errors.
- Updated documentation and examples to illustrate the usage of overrides for both registered and unregistered steps.
- Added comprehensive tests to validate the new functionality and ensure backward compatibility.

* chore(normalization): addressing comments from copilot

* chore(learner): nit comment from copilot

* feat(pipeline): Enhance step_through method to support both tuple and dict inputs

* refactor(pipeline): Simplify observation and padding data handling in batch transitions

* Apply suggestions from code review

Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Signed-off-by: Adil Zouitine <adilzouitinegm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(pipeline): Introduce ComplementaryDataProcessor for handling complementary data in transitions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(pipeline): Transition from tuple to dictionary format for EnvTransition

- Updated the EnvTransition structure to use a dictionary format instead of a tuple, enhancing readability and maintainability.
- Replaced instances of TransitionIndex with TransitionKey for accessing transition components.
- Adjusted related processing functions and tests to accommodate the new dictionary format, ensuring consistent handling of transitions across the codebase.

* refactor(observation_processor): Improve observation processing by using constants and simplifying pixel handling

- Introduced constants for observation keys to enhance readability.
- Streamlined the handling of the "pixels" key by copying observations first and processing images more clearly.
- Updated the environment state and agent position assignments to use the new constants, improving maintainability.

* feat(pipeline): Add hook unregistration functionality and enhance documentation

- Implemented methods to unregister before, after, and reset hooks in the RobotProcessor class, allowing for more flexible hook management.
- Enhanced documentation to clarify hook execution semantics and the implications of modifying transitions within hooks.
- Added comprehensive tests to verify the correct behavior of hook registration and unregistration, including error handling for non-existent hooks.

* refactor(pipeline): Clarify hook behavior and improve documentation

- Updated the RobotProcessor class to ensure hooks are strictly for observation and do not modify transitions, enhancing clarity and maintainability.
- Refactored hook registration methods to reflect the new behavior, ensuring they accept only functions that do not return modified transitions.
- Enhanced documentation to clearly outline the purpose of hooks and their execution semantics.
- Added tests to verify that hooks are not executed during the step_through method while ensuring they function correctly during the __call__ method.

* feat(pipeline): Add __repr__ method to RobotProcessor for improved readability

- Implemented a __repr__ method in the RobotProcessor class to provide a clear string representation of the processor, including step names and optional parameters like name and seed.
- Added comprehensive tests to validate the __repr__ output for various scenarios, including empty processors, single and multiple steps, custom names, and seed values.
- Ensured that the representation handles long lists of steps with truncation for better readability.

* chore(pipeline): Move _CFG_NAME along other class member

* refactor(pipeline): Utilize get_safe_torch_device for device assignment

- Replaced direct torch.device instantiation with get_safe_torch_device to ensure safe device handling.
- This change enhances code readability and maintains consistency in device management across the RobotProcessor class.

* refactor(pipeline): Enhance state filename generation and profiling method

- Updated state filename generation to use the registry name when available, improving clarity in saved files.
- Modified the profile_steps method to include a warmup_runs parameter, allowing for more controlled performance profiling.
- Ensured consistent conditions during profiling by deep copying transitions for each run, enhancing accuracy in timing results.

* chore(doc): address `pip install lerobot` command that does not exist yet

* feat(pipeline): Enhance configuration filename handling and state file naming

- Introduced support for custom configuration filenames in the `save_pretrained` method, allowing users to specify a filename instead of the default.
- Improved state file naming to include step indices, preventing conflicts when multiple processors of the same type are saved.
- Added automatic detection for configuration files when loading from a directory, with error handling for multiple files.
- Updated tests to validate new features, including custom filenames and automatic config detection.

* refactor(pipeline): Improve state file naming conventions for clarity and uniqueness

- Enhanced state file naming to include the processor's sanitized name, ensuring uniqueness when multiple processors are saved in the same directory.
- Updated tests to reflect changes in state file naming, verifying that filenames now include the processor name and step indices to prevent conflicts.
- Added a new test to validate state file naming when using multiple processors, ensuring distinct filenames for each processor's state files.

* docs(pipeline): Add clarification for repo name sanitization process

* Feat/pipeline add feature contract (#1637)

* Add feature contract to pipelinestep and pipeline

* Add tests

* Add processor tests

* PR feedback

* incorporate PR feedback

* typo in doc

* oops

* docs(pipeline): Clarify transition handling and hook behavior

- Updated documentation to specify that hooks always receive transitions in EnvTransition format, ensuring consistent behavior across input formats.
- Refactored the step_through method to yield only EnvTransition objects, regardless of the input format, and updated related tests to reflect this change.
- Enhanced test assertions to verify the structure of results and the correctness of processing steps.

* refactor(pipeline): Remove to() method for device management

- Eliminated the to() method from RobotProcessor, which was responsible for moving tensor states to specified devices.
- Removed associated unit tests that validated the functionality of the to() method across various scenarios.
- Streamlined the pipeline code by focusing on other device management strategies.

* refactor(pipeline): Remove model card generation and streamline processor methods

- Eliminated the _generate_model_card method from RobotProcessor, which was responsible for generating README.md files from a template.
- Updated save_pretrained method to remove model card generation, focusing on serialization of processor definitions and parameters.
- Added default implementations for get_config, state_dict, load_state_dict, reset, and feature_contract methods in various processor classes to enhance consistency and usability.

* refactor(observation): Streamline observation preprocessing and remove unused processor methods

- Updated the `preprocess_observation` function to enhance image handling and ensure proper tensor formatting.
- Removed the `RobotProcessor` and associated transition handling from the `rollout` function, simplifying the observation processing flow.
- Integrated direct calls to `preprocess_observation` for improved clarity and efficiency in the evaluation script.

* refactor(pipeline): Rename parameters for clarity and enhance save/load functionality

- Updated parameter names in the save_pretrained and from_pretrained methods for improved readability, changing destination_path to save_directory and source to pretrained_model_name_or_path.
- Enhanced the save_pretrained method to ensure directory creation and file handling is consistent with the new parameter names.
- Streamlined the loading process in from_pretrained to utilize loaded_config for better clarity and maintainability.

* refactor(pipeline): minor improvements (#1684)

* chore(pipeline): remove unused features + device torch + envtransition keys

* refactor(pipeline): ImageProcessor & StateProcessor are both implemented directly in VanillaObservationProcessor

* refactor(pipeline): RenameProcessor now inherits from ObservationProcessor + remove unused code

* test(pipeline): fix broken test after refactors

* docs(pipeline): update docstrings VanillaObservationProcessor

* chore(pipeline): move None check to base pipeline classes

---------

Signed-off-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2025-08-06 16:11:04 +02:00
Francesco Capuano
90d3a99aa1 Fix policy construction (#1665)
* add: test to check proper construction with multiple features with STATE/ACTION type

* fix: robot and action state should match policy's expectations

* fix minor

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>

---------

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
2025-08-04 21:49:51 +02:00
Simon Alibert
2f8d98b05e Update readme (#1570)
* Cleanup badges

* Remove comment

* Remove profiling section

* Move acknowledgment

* Move citations

* Fix badge display

* Move build your robot section

* Fix nightly badge

* Revert be13b3f

* Update README.md

Co-authored-by: HUANG TZU-CHUN <tzu.chun.huang.tw@gmail.com>
Signed-off-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>

* chore(docs): optimize readme for PyPI rendering

* chore(docs): move policy readme to docs folder + symlink in policy dirs

* fix(docs): max width of lerobot logo + url in citation block

---------

Signed-off-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: HUANG TZU-CHUN <tzu.chun.huang.tw@gmail.com>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
2025-08-01 17:39:39 +02:00
Steven Palma
91ed6097bc fix(ci): declare entrypoints + fix testing release (#1642) 2025-08-01 12:04:34 +02:00
Michel Aractingi
4048b02d4a improved typing in datasets/utils.py 2025-07-31 14:32:29 +02:00
Yushun Xiang
71eff183ff Fix pi0 checkpoint state map (#1415)
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-07-30 17:38:32 +02:00
Francesco Capuano
527ae8e557 Add variable-size test datasets (#1610)
* fix: dummy datasets can be written to multiple files in multiple folders based on arbitrary data size

* fix: writing atomic episodes to multiple files (maybe)

* fix: moving unused write dataset function to test code
2025-07-30 11:26:28 +02:00
Michel Aractingi
890b1e473d Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3 2025-07-30 00:43:53 +02:00
Michel Aractingi
6447352439 added a check for comparing cached episodes in order to trigger a new download if the requested episodes don't match the cached ones 2025-07-30 00:32:28 +02:00
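As a rough illustration of the check this commit describes, the sketch below decides whether a cached copy covers a request. The function names and arguments are hypothetical, and treating "every requested episode is already cached" as a match is an assumption; the repository's own check may compare episodes differently.

```python
def cached_episodes_sufficient(requested, cached):
    """Return True when every requested episode index is already cached.

    Assumption: a cache that holds extra episodes still counts as sufficient.
    """
    return set(requested).issubset(cached)


def needs_download(requested, cached):
    # A new download is triggered only when the cache falls short of the request.
    return not cached_episodes_sufficient(requested, cached)
```

For example, requesting episodes `[0, 1, 5]` against a cache holding `{0, 1, 2}` would trigger a download under this sketch, while requesting `[0, 2]` would not.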
Michel Aractingi
788544d936 update lerobot_dataset docstring 2025-07-30 00:12:23 +02:00
Michel Aractingi
59d108a807 fix(convert_v2_v3) reverted concat data files from previous commit
fixed bug in metadata related to chunk_index and file_index when concatenating video files, added a clearer condition so that an episode doesn't span multiple videos
2025-07-29 22:58:24 +02:00
Rayen Ghali
67196c9d53 fix(180-degree rotation): Add cv2.ROTATE_180 to rotation checks in both OpenCV and RealSense camera implementations 2025-07-29 13:54:43 +02:00
Abhay Deshpande
5695432142 fix(DiffusionPolicy): Fix bug where training without image features would crash with exception, fix environment state docs (#1617)
* Fix bug in diffusion config validation when not using image features

* Fix DiffusionPolicy docstring about shape of env state
2025-07-29 13:40:16 +02:00
Michel Aractingi
c7c3b477d6 Fix sample beta for smolvla as done for pi0, remove sample_beta func (#1611) 2025-07-28 17:28:55 +02:00
Lumen Yang
7fe6adaf61 fix(config): typing correction on config.py (#1320)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-07-28 15:22:37 +02:00
Kleist Bond
4b88842d20 fix bug about sampling time from beta distribution (#1605)
* fix bug about sampling t from beta distribution

* fix: address review comments

---------
2025-07-28 15:17:30 +02:00
Adil Zouitine
c3d5e494c0 fix(policies): remove action from batch for offline evaluation (#1609)
* fix(policies): remove action from batch for offline evaluation in diffusion, tdmpc, and vqbet policies

* style(diffusion): correct comment capitalization for clarity in modeling_diffusion.py
2025-07-28 13:10:34 +02:00
Caroline Pascal
664e069c3f docs/style: updating docs and deprecated links (#1584) 2025-07-28 12:55:47 +02:00
Adil Zouitine
b61a4ded9a chore(pi0fast): TODO comment to warn the need for removal ignore_index (#1593) 2025-07-28 11:49:05 +02:00
Adil Zouitine
615adfc48d smolfix(vla): typing and fix offline inference when action in the batch (#1597) 2025-07-28 11:44:22 +02:00
HUANG TZU-CHUN
b2a71c6fe4 fix: Rename sync_cache_first to force_cache_sync in LeRobotDataset docstring (#1310) 2025-07-25 15:08:00 +02:00
Adil Zouitine
4c8f002055 fix(act): disable VAE during offline inference (#1588)
Prevent VAE inference when running in offline mode. In the lerobot dataset, the presence of the 'action' field incorrectly triggers the VAE inference block. This leads to a RuntimeError due to mismatched tensor dimensions (3 vs 2) when concatenating cls_embed, robot_state_embed, and action_embed—since action_embed lacks the chunk_size dimension. Additionally, this aligns with the original paper, where variational inference is skipped during inference.
2025-07-24 17:09:12 +02:00
Eugene Mironov
989f3d05ba [Async Inference] Merge Protos & refactoring (#1480)
* Merge together proto files and refactor Async inference

* Fixup for Async inference

* Drop not required changes

* Fix tests

* Drop old async files

* Drop chunk_size param

* Fix versions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix wrong fix

Co-authored-by: Ben Zhang <ben.zhang@uwaterloo.ca>

* Fixup

---------

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Ben Zhang <ben.zhang@uwaterloo.ca>
Co-authored-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
2025-07-23 11:30:01 +02:00
Michel Aractingi
218ebed3ef feat(convert_dataset_v21_to_v3) added the use of more efficient Dataset.from_parquet and concatenate_datasets 2025-07-22 17:27:41 +02:00
Michel Aractingi
835f0eddfa bug(gamepad_utils) inverted axis between x and y (#1572) 2025-07-22 14:31:30 +02:00
Caroline Pascal
9b9f4757fb style(deprecated method): remove no longer used get_features_from_robot function (replaced by hw_to_dataset_features) (#1560) 2025-07-21 19:12:03 +02:00