Commit Graph

43 Commits

Author SHA1 Message Date
Michel Aractingi
35f36e8fba removed outdated todos 2025-08-28 10:10:17 +02:00
Michel Aractingi
db36f01e8b add update_chunk_settings method for LeRobotDatasetMetadata. Introduce tests for chunk settings updates and validation of parameters. 2025-08-18 00:00:23 +02:00
Michel Aractingi
c7a3b02625 fixed tensor indices in _check_cached_episode_sufficient in lerobot_dataset.py, added test 2025-08-13 16:16:32 +02:00
Michel Aractingi
4048b02d4a improved typing in datasets/utils.py 2025-07-31 14:32:29 +02:00
Francesco Capuano
527ae8e557 Add variable-size test datasets (#1610)
* fix: dummy datasets can be written to multiple files in multiple folders based on arbitrary data size

* fix: writing atomic episodes to multiple files (maybe)

* fix: moving unused write dataset function to test code
2025-07-30 11:26:28 +02:00
Michel Aractingi
890b1e473d Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3 2025-07-30 00:43:53 +02:00
Michel Aractingi
6447352439 added a check for comparing cached episodes in order to trigger a new download if the requested episodes don't match the cached ones 2025-07-30 00:32:28 +02:00
Michel Aractingi
788544d936 update lerobot_dataset docstring 2025-07-30 00:12:23 +02:00
Michel Aractingi
59d108a807 fix(convert_v2_v3) reverted concat data files from previous commit
fixed bug in metadata related to chunk_index and file_index when concatenating video files, added a clearer condition so that an episode doesn't span multiple videos
2025-07-29 22:58:24 +02:00
HUANG TZU-CHUN
b2a71c6fe4 fix: Rename sync_cache_first to force_cache_sync in LeRobotDataset docstring (#1310) 2025-07-25 15:08:00 +02:00
Michel Aractingi
218ebed3ef feat(convert_dataset_v21_to_v3) added the use of more efficient Dataset.from_parquet and concatenate_datasets 2025-07-22 17:27:41 +02:00
Caroline Pascal
9b9f4757fb style(deprecated method): remove no longer used get_features_from_robot function (replaced by hw_to_dataset_features) (#1560) 2025-07-21 19:12:03 +02:00
Michel Aractingi
066b81aec2 moved concat_video function to video_utils, cleaned some code 2025-07-21 14:47:16 +02:00
Michel Aractingi
dcb02a951d fix(convert_v1) use correct legacy path, remove comments from scripts, revert lekiwi/record.py to main 2025-07-21 11:49:15 +02:00
Michel Aractingi
23375cce3a fix(tests) bug in clear_episode_buffer 2025-07-20 01:39:19 +02:00
Michel Aractingi
5ec70f704e removed check_timestamps_sync that is no longer used in the code,
removed tests in datasets related to check_timestamps_sync
added the use of `clear_episode_buffer` that was not used in `save_episode`
added the creation of the codebase_version tag that was missing in `slurm_upload`
2025-07-18 16:33:20 +02:00
Michel Aractingi
4c0ac93eb6 nit 2025-07-18 16:33:20 +02:00
pre-commit-ci[bot]
788dde3a34 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-18 16:33:20 +02:00
Michel Aractingi
e05d22cb7b Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-07-18 16:33:18 +02:00
Xingdong Zuo
e6e1f085d4 Feat: Add Batched Video Encoding for Faster Dataset Recording (#1390)
* LeRobotDataset video encoding: updated `save_episode` method and added `batch_encode_videos` method to handle video encoding based on `batch_encoding_size`, allowing for both immediate and batched encoding.

* LeRobotDataset video cleanup: Enabled individual episode cleanup and check for remaining PNG files before removing the `images` directory.

* LeRobotDataset - VideoEncodingManager: added proper handling of pending episodes (encoding, cleaning) on exit or recording failures.

* LeRobotDatasetMetadata: removed `update_video_info` to only update video info at episode index 0 encoding.

* Adjusted the `record` function to utilize the new encoding management logic.

* Removed `encode_videos` method from `LeRobotDataset` and `encode_episode_videos` outputs as they are not used anywhere.

---------

Signed-off-by: Xingdong Zuo <zuoxingdong@users.noreply.github.com>
Co-authored-by: Xingdong Zuo <xingdong.zuo@navercorp.com>
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
2025-07-18 12:18:52 +02:00
Steven Palma
378e1f0338 Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)
* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
2025-07-17 14:30:20 +02:00
Michel Aractingi
a4d3a414ca Added Francesco's PRs for fixing aggregate.py 2025-07-08 14:17:01 +02:00
fracapuano
a49760e2ba fix: tests depending on various sizes, and duration is updated 2025-07-08 13:43:19 +02:00
Michel Aractingi
4a466d94b6 moved legacy functions to convert_stats.py 2025-07-06 22:32:51 +02:00
Michel Aractingi
9287c36f37 - Added missing license in the new scripts
- Added back legacy functions in conversion script of v2 to v21
- Updated README description for dataset_v3
2025-07-06 22:29:05 +02:00
Michel Aractingi
83bf24cc9a fix(tests) add features argument to load_nested_dataset 2025-07-05 10:16:29 +02:00
Michel Aractingi
3dbc3e60fb Added docstrings to aggregate, fix test_policies.py 2025-07-04 11:27:00 +02:00
Michel Aractingi
66454a0fbf Remove more references to lerobot.common 2025-07-02 18:18:19 +02:00
Michel Aractingi
1c17419224 Reverted back files that were changed during the rebase 2025-07-02 17:26:34 +02:00
Michel Aractingi
9dde8829e6 style nit 2025-07-02 17:10:56 +02:00
Michel Aractingi
0f66bbe2f9 Migrate PR to new folder structure introduced in #1417 2025-07-02 17:10:26 +02:00
fracapuano
378c147be6 fix: debug aggregation code 2025-07-02 11:45:27 +02:00
Remi Cadene
8c1503dafa WIP after Francesco discussion 2025-07-02 11:45:11 +02:00
Remi Cadene
13a1f68b8e WIP aggregate 2025-07-02 11:44:29 +02:00
Remi Cadene
a231930044 Fix aggregate (num_frames, dataset_from_index, index) 2025-07-02 11:43:46 +02:00
Remi Cadene
6f0fc7f386 Aggregate: Add concatenation 2025-07-02 11:43:36 +02:00
Remi Cadene
fde67dbae7 Fix convert v30 with image datasets 2025-07-02 11:43:35 +02:00
Remi Cadene
01bc89b6f4 Merge remote-tracking branch 'origin/user/rcadene/2025_04_11_dataset_v3' into user/rcadene/2025_04_11_dataset_v3 2025-07-02 11:43:24 +02:00
Remi Cadene
8c43b3d05e Faster self.meta.episodes[...]
switch back to set_transform instead of set_format

Add video_files_size_in_mb

pre-commit run --all-files
2025-07-02 11:43:22 +02:00
Remi Cadene
d4af22418b Fix unit tests 2025-07-02 11:42:52 +02:00
Remi Cadene
eaec52a7b7 Merge remote-tracking branch 'origin/user/rcadene/2025_04_11_dataset_v3' into user/rcadene/2025_04_11_dataset_v3 2025-07-02 11:42:49 +02:00
Remi Cadene
0a390de361 Merge remote-tracking branch 'origin/main' into user/rcadene/2025_04_11_dataset_v3 2025-07-02 11:41:53 +02:00
Simon Alibert
d4ee470b00 Package folder structure (#1417)
* Move files

* Replace imports & paths

* Update relative paths

* Update doc symlinks

* Update instructions paths

* Fix imports

* Update grpc files

* Update more instructions

* Downgrade grpc-tools

* Update manifest

* Update more paths

* Update config paths

* Update CI paths

* Update bandit exclusions

* Remove walkthrough section
2025-07-01 16:34:46 +02:00