fix(curate): emit valid LeRobot v3 dataset layout#246

Open
rylinjames wants to merge 1 commit into main from fix/lerobot-v3-converter-format

Conversation

@rylinjames

Collaborator

Summary

  • Change the LeRobot v3 converter from v2-style per-episode parquet files plus JSONL metadata to v3 file-based shards: data/chunk-000/file-000.parquet and meta/episodes/chunk-000/file-000.parquet.
  • Write meta/tasks.parquet with pandas-compatible task index metadata and meta/stats.json with action/state statistics.
  • Populate info.json totals and v3 path templates, and enforce float32 action/state parquet schemas to match declared features.
  • Add regression coverage that rejects the old episode_*.parquet/tasks.jsonl/episodes.jsonl layout and verifies converter output passes the dataset validator.
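The layout change described above can be sketched as a small stdlib-only check. This is an illustrative sketch, not the converter's or the validator's actual code: the helper names `find_legacy_files` and `expected_v3_files` are hypothetical, the `{chunk_index:03d}`/`{file_index:03d}` template strings are inferred from the example paths in the summary, and placing `info.json` under `meta/` is an assumption.

```python
import fnmatch
from pathlib import Path

# Assumed templates, inferred from the example paths in the summary; the
# strings the converter actually writes into info.json may differ.
DATA_PATH = "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"
EPISODES_PATH = "meta/episodes/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"

# File names belonging to the old v2-style layout that the regression
# coverage is described as rejecting.
LEGACY_PATTERNS = ("episode_*.parquet", "tasks.jsonl", "episodes.jsonl")


def find_legacy_files(root):
    """Return sorted relative paths under root that match the old layout."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and any(
            fnmatch.fnmatch(path.name, pattern) for pattern in LEGACY_PATTERNS
        ):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


def expected_v3_files(chunk_index=0, file_index=0):
    """Return the relative paths the summary names for a single-shard dataset."""
    return [
        DATA_PATH.format(chunk_index=chunk_index, file_index=file_index),
        EPISODES_PATH.format(chunk_index=chunk_index, file_index=file_index),
        "meta/tasks.parquet",
        "meta/stats.json",
        "meta/info.json",  # assumption: the summary does not give info.json's directory
    ]
```

A regression test along these lines could assert that `find_legacy_files(output_dir)` is empty and that every path from `expected_v3_files()` exists under the converter's output directory, before handing the directory to the dataset validator.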

Validation

  • PYTHONPATH=$PWD/src /Users/romirjain/Desktop/building\ projects/fastcrest/tether/.venv/bin/python -m pytest tests/test_curate_format_converters.py tests/test_dataset_validator.py -p no:cacheprovider
  • PYTHONPATH=$PWD/src /Users/romirjain/Desktop/building\ projects/fastcrest/tether/.venv/bin/ruff check src/tether/curate/format_converters/lerobot_v3.py tests/test_curate_format_converters.py
  • PYTHONPATH=$PWD/src /Users/romirjain/Desktop/building\ projects/fastcrest/tether/.venv/bin/python -m py_compile src/tether/curate/format_converters/lerobot_v3.py tests/test_curate_format_converters.py

🤖 Generated with Claude Code

Write consolidated v3 parquet shards, task parquet metadata, stats.json, and matching info.json path templates for the LeRobot v3 converter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>