lerobot-clone

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-01 19:31:25 +00:00

Author	SHA1	Message	Date
CarolinePascal	5f114c1d74	feat(init audio buffers): adding correct audio buffer initialization with actually recorded background noise instead of pure silence	2026-01-20 12:20:26 +01:00
CarolinePascal	8e29c530ed	fix(pytorch audio format): switching to pytorch's default channel first format for audio	2026-01-20 12:12:59 +01:00
CarolinePascal	b573b7a052	fix(audio decoding): fixing edge cases where the requested audio chunk starts before the beginning of the recording	2026-01-20 12:12:59 +01:00
CarolinePascal	926184110b	feat(audio in policies): adding audio as an input feature in policies	2026-01-20 12:12:59 +01:00
CarolinePascal	f73db4394b	fix(audio chunks): querying audio chunks in the past rather than in the future	2026-01-20 12:12:59 +01:00
CarolinePascal	bff91f9927	feat(torchcodec): setting torchcodec as default as the new official release supports audio decoding	2026-01-20 12:12:59 +01:00
CarolinePascal	6d726266fd	fix(audio load file): adding missing dimension when loading mono audio data	2026-01-20 12:12:59 +01:00
CarolinePascal	067993bb11	fix(typos): fixing typos	2026-01-20 12:12:58 +01:00
CarolinePascal	e4dd00c8f5	fix(audio feature shape): fixing audio feature shape ordering (frames first, channels second)	2026-01-20 12:12:58 +01:00
CarolinePascal	3bbd161cfd	[skip ci] feat(audio recording): adding new async start_recording, stop_recording and read functions to avoid for loop delays	2026-01-20 12:12:58 +01:00
CarolinePascal	dce483060f	[skip ci] feat(audio recording): handle folder creation in start_recording directly	2026-01-20 12:12:58 +01:00
CarolinePascal	c32b9182d9	[skip ci] feat(torchcodec): adding support for torchcodec audio decoding	2026-01-20 12:12:58 +01:00
CarolinePascal	688195fc46	docs: add methods descriptions and comments on tricky parts	2026-01-20 12:12:58 +01:00
CarolinePascal	99eb0bbafc	Adding last missing audio features in LeRobotDataset	2026-01-20 12:12:58 +01:00
CarolinePascal	16de8b3f19	Adding support for audio data recording and broadcasting for LeKiwi	2026-01-20 12:12:55 +01:00
CarolinePascal	52c424c5eb	Adding multiprocessing support for audio recording	2026-01-20 12:12:07 +01:00
CarolinePascal	836195e59c	Renaming sampling rate to sample rate for consistency	2026-01-20 12:12:07 +01:00
CarolinePascal	00536c6c5b	Adding missing features for audio frames verification and stats	2026-01-20 12:10:45 +01:00
CarolinePascal	cdd3a859ef	Adding pytorch compatible conversion for audio	2026-01-20 12:10:45 +01:00
CarolinePascal	8874547353	Adding microphone recording in control loop	2026-01-20 12:10:12 +01:00
CarolinePascal	2864caad80	Adding audio modality in LeRobotDatasets	2026-01-20 12:10:12 +01:00
Jade Choghari	79688a09f2	improve(dataset-tools): image2video editing tools : Multiple episodes per video file (#2811 ) * improve image2video * add episodes video encoding * fix mypy failing * iterate on review * nit * remove max, and let it be optional * iterate more * update docs * fix test --------- Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>	2026-01-20 11:04:22 +01:00
Francesco Capuano	b2ff219624	Fixes aggregation of image datasets (#2717 ) * fix: use features when aggregating image based datasets * add: test asserting for data type * add: features param to writing dataset --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-01-19 23:36:41 +01:00
Alex Tyshka	77dc49b3a3	Fix delta timestamps with episodes filter and add tests (#2612 )	2026-01-16 18:14:54 +01:00
Alex Tyshka	33910673ec	Bugfix: Add tests for image deletion and fix mixed image-video deletion (#2592 ) * Add tests for image deletion and fix mixed-image-video deletion * Fix docstring whitespace * Remove debug print Signed-off-by: Alex Tyshka <atyshka15@gmail.com> * Remove inaccurate comment * Remove batched video test --------- Signed-off-by: Alex Tyshka <atyshka15@gmail.com>	2026-01-16 18:14:15 +01:00
Steven Palma	15724826dd	chore: use alias & constants (#2785 ) * chore: use alias and constants * fix(rl): solve circular dependency * chore: nit right constant * chore: pre-commit * chore(script): conflict tokenizer train --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>	2026-01-13 09:49:46 +01:00
Leo Tronchon	8b6fc0ae05	feat(datasets): expose video codec option for dataset recording (#2771 ) * expose codec options + add tests * pre-commit run -a	2026-01-08 18:06:39 +01:00
Steven Palma	2b304eeb84	feat(dataset): expose tolerance_s argument to training config (#2653 )	2025-12-16 00:53:19 +01:00
Michel Aractingi	0217e1e3ad	Fix dataset aggregation for multi video datasets (#2550 )	2025-12-05 16:09:25 +01:00
Caroline Pascal	581dd45eae	feat(parallel encoding): making parallel encoding the default choice over all platforms (#2525 )	2025-11-26 14:57:34 +01:00
Steven Palma	87bee86640	feat(dataset): dynamic compress_level depending on the type of dataset (video or image) (#2517 )	2025-11-25 19:11:12 +01:00
Steven Palma	18b32dced9	feat(dataset): speed-up encoding time (#2514 ) * feat(dataset): speed-up encoding time * feat(dataset): add parallel encoding option * feat(datasets): parallel encoding only if num_cams > 2 * feat(datasets): implement feedback	2025-11-25 16:46:12 +01:00
Michel Aractingi	0f551df8f4	add `absolute_to_reative_idx` for remapping indices when a subset of data is loaded (#2490 )	2025-11-20 14:05:31 +01:00
Michel Aractingi	b464d9f8bc	Fix episode filtering bug when requesting a subset of the episodes in a dataset (#2456 ) * filter episodes in load_nested_dataset * nit * remove test filtering * move import to module level * added missing episode indices to the EpisodeAwareSampler in lerobot_train.py;	2025-11-18 17:26:41 +01:00
Steven Palma	a4aa316470	fix(dataset): fix data access bottleneck for faster training (#2408 )	2025-11-07 21:54:44 +01:00
Michel Aractingi	f6b16f6d97	fix(dataset_tools) Critical bug in modify features (#2342 ) * fix bug in `_copy_data_with_feature_changes` * Update src/lerobot/datasets/dataset_tools.py Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com> Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co> * add missing import --------- Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co> Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>	2025-11-04 15:56:41 +01:00
Caroline Pascal	3f8c5d9809	fix(video_key typo): fixing video_key typo in update_video_info (#2323 )	2025-10-28 09:41:33 +01:00
Michel Aractingi	76a425c600	Fix: `check_cached_episodes` doesn't check if the requested episode video were downloaded (#2296 ) * In `check_cached_episodes_sufficient` check whether all the requested video files are downloaded * optimize loop over the video paths * revert example num_workers * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co> * set num_workers to zero in example * style nit * reintroduce copilot optim --------- Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-23 17:34:03 +02:00
Michel Aractingi	12f2f35760	- Introduce _current_file_start_frame for better tracking of the number of frames in each parquet file (#2280 ) - Added testing for that section in `test_datasets.py`	2025-10-21 16:17:12 +02:00
Antoine	502fdc0630	fix dataset revision (#2260 ) Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2025-10-20 18:45:09 +02:00
Bryson Jones	88100943ef	add affine transforms and test (#2145 ) Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2025-10-19 21:39:30 +02:00
Lycoris	f0aeded142	Fixes failed to delete images because the timing of gc is uncertain (#1710 ) * Prevents resource leak in video_utils when getting width and height Added the with statement when opening the image to ensure that the file handle is properly closed after its contents are read. Otherwise, shutil.rmtree(img_dir) will fail when called after the encode_video_frames function completes. Signed-off-by: Lycoris <32864669+lycoris1129@users.noreply.github.com> --------- Signed-off-by: Lycoris <32864669+lycoris1129@users.noreply.github.com>	2025-10-18 06:47:07 +02:00
Edgar Riba	0050d7c61c	docs: change video file path format in conversion script (#2113 ) * Change video file path format in conversion script Updated video file path in the dataset conversion script. Signed-off-by: Edgar Riba <edgar.riba@gmail.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Edgar Riba <edgar.riba@gmail.com> --------- Signed-off-by: Edgar Riba <edgar.riba@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>	2025-10-17 16:32:24 +02:00
Antoine	a51682b266	Optimized episode cache verification (#2166 ) Signed-off-by: Antoine <antoine.dandigne@gmail.com> Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>	2025-10-17 15:18:21 +02:00
Michel Aractingi	8e940bf361	Feat/expand add features (#2202 ) * make add_feature take multiple features at a time and rename to add_features * - New function: modify_features that was a combination of remove features and add features. - This function is important for when we want to add a feature and remove another so we can do it in one time to avoid copying and creating the dataset multiple times	2025-10-14 16:19:50 +02:00
Steven Palma	bf6ac5e110	fix(datasets): conversion script function naming (#2199 ) Co-authored-by: gagalo123 <bamianweifen@gmail.com>	2025-10-14 14:36:32 +02:00
Michel Aractingi	f2ff370459	Incremental parquet writing (#1903 ) * incremental parquet writing * add .finalise() and a backup __del__ for stopping writers * fix missing import * precommit fixes added back the use of embed images * added lazy loading for hf_Dataset to avoid frequently reloading the dataset during recording * fix bug in video timestamps * Added proper closing of parquet file before reading * Added rigorous testing to validate the consistency of the meta data after creation of a new dataset * fix bug in episode index during clear_episode_buffer * fix(empty concat): check for empty paths list before data files concatenation * fix(v3.0 message): updating v3.0 backward compatibility message. * added fixes for the resume logic * answering co-pilot review * reverting some changes and style nits * removed unused functions * fix chunk_id and file_id when resuming * - fix parquet loading when resuming - add test to verify the parquet file integrity when resuming so that data files are now overwritten * added general function get_file_size_in_mb and removed the one for video * fix table size value when resuming * Remove unnecessary reloading of the parquet file when resuming record. Write to a new parquet file when resuming record * added back reading parquet file for image datasets only * - respond to Qlhoest comments - Use pyarrows `from_pydict` function - Add buffer for episode metadata to write to the parquet file in batches to improve efficiency - Remove the use of `to_parquet_with_hf_images` * fix(dataset_tools) with the new logic using proper finalize bug in finding the latest path of the metdata that was pointing to the data files added check for the metadata size in the case the metadatabuffer was not written yet * nit in flush_metadata_buffer * fix(lerobot_dataset) return the right dataset len when a subset of the dataset is requested --------- Co-authored-by: Harsimrat Sandhawalia <hs.sandhawalia@gmail.com>	2025-10-11 11:01:30 +02:00
Michel Aractingi	b8f7e401d4	Dataset tools (#2100 ) * feat(dataset-tools): add dataset utilities and example script - Introduced dataset tools for LeRobotDataset, including functions for deleting episodes, splitting datasets, adding/removing features, and merging datasets. - Added an example script demonstrating the usage of these utilities. - Implemented comprehensive tests for all new functionalities to ensure reliability and correctness. * style fixes * move example to dataset dir * missing lisence * fixes mostly path * clean comments * move tests to functions instead of class based * - fix video editting, decode, delete frames and rencode video - copy unchanged video and parquet files to avoid recreating the entire dataset * Fortify tooling tests * Fix type issue resulting from saving numpy arrays with shape 3,1,1 * added lerobot_edit_dataset * - revert changes in examples - remove hardcoded split names * update comment * fix comment add lerobot-edit-dataset shortcut * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co> * style nit after copilot review * fix: bug in dataset root when editing the dataset in place (without setting new_repo_id * Fix bug in aggregate.py when accumelating video timestamps; add tests to fortify aggregate videos * Added missing output repo id * migrate delete episode to using pyav instead of decoding, writing frames to disk and encoding again. 
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com> * added modified suffix in case repo_id is not set in delete_episode * adding docs for dataset tools * bump av version and add back time_base assignment * linter * modified push_to_hub logic in lerobot_edit_dataset * fix(progress bar): fixing the progress bar issue in dataset tools * chore(concatenate): removing no longer needed concatenate_datasets usage * fix(file sizes forwarding): forwarding files and chunk sizes in metadata info when splitting and aggregating datasets * style fix * refactor(aggregate): Fix video indexing and timestamp bugs in dataset merging There were three critical bugs in aggregate.py that prevented correct dataset merging: 1. Video file indices: Changed from += to = assignment to correctly reference merged video files 2. Video timestamps: Implemented per-source-file offset tracking to maintain continuous timestamps when merging split datasets (was causing non-monotonic timestamp warnings) 3. File rotation offsets: Store timestamp offsets after rotation decision to prevent out-of-bounds frame access (was causing "Invalid frame index" errors with small file size limits) Changes: - Updated update_meta_data() to apply per-source-file timestamp offsets - Updated aggregate_videos() to track offsets correctly during file rotation - Added get_video_duration_in_s import for duration calculation * Improved docs for split dataset and added a check for the possible case that the split size results in zero episodes * chore(docs): update merge documentation details Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> --------- Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com> Co-authored-by: Jack Vial <vialjack@gmail.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2025-10-10 12:32:07 +02:00
Steven Palma	9a49e57c72	refactor(datasets): add compress_level parameter to write_image() and set it to 1 (#2135 ) * refactor(datasets): add compress_level parameter to write_image() and set it to 1 * docs(dataset): add docs to write_image()	2025-10-08 20:06:56 +02:00
Michel Aractingi	fcaa0ea5f9	remove extra time base set. (#2133 ) Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>	2025-10-07 14:09:36 +02:00

75 Commits