Description
Overview
This tracking issue coordinates the migration of 5 legacy Python constant modules from the old JSON-as-data approach to our modern spec + code generation system.
Background
Current (legacy) approach:
- Constants defined in JSON files under `le_utils/resources/`
- Python modules load JSON at runtime using `pkgutil.get_data()`
- Manual Python constants must be kept in sync with JSON files
- JavaScript code cannot use these constants (no JS export)
- Tests verify Python/JSON sync (which is the pain point)
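To make the pain point concrete, here is a minimal sketch of the legacy pattern. The JSON layout, the constant names, and the `find_sync_problems` helper are assumptions for illustration, and an inline byte string stands in for the `pkgutil.get_data()` call so the snippet runs without le_utils installed.

```python
import json

# Stand-in for the bytes a legacy module would receive from
# pkgutil.get_data("le_utils", "resources/<name>.json").
# The JSON layout shown here is an assumption for illustration.
RAW_JSON = b'{"mp4": {"mimetype": "video/mp4"}, "pdf": {"mimetype": "application/pdf"}}'

# In the legacy approach this parsing happens at import time.
FORMAT_DATA = json.loads(RAW_JSON.decode("utf-8"))

# Hand-written constants that a contributor must keep in sync with the JSON.
MP4 = "mp4"
PDF = "pdf"
HAND_WRITTEN_IDS = {MP4, PDF}


def find_sync_problems(json_ids, python_ids):
    """Return ids that appear on only one side (hypothetical sync check)."""
    return sorted(set(json_ids) ^ set(python_ids))


sync_problems = find_sync_problems(FORMAT_DATA.keys(), HAND_WRITTEN_IDS)
```

Adding an entry to the JSON without touching the Python constants (or the reverse) makes a check like this fail, which is the manual step the migration is meant to remove.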
Target (modern) approach:
- Constants defined once in JSON spec files under `spec/`
- Code generation script (`generate_from_specs.py`) creates both Python and JavaScript files
- Single source of truth eliminates sync issues
- Automatic JavaScript export enables frontend use
- Already used successfully for 8 modules (modalities, labels, schemas, etc.)
Modules to Migrate
- file_formats.py (FOUNDATION - must be done first)
- licenses.py (blocked by file_formats)
- content_kinds.py (blocked by file_formats, enhances generation for metadata/mappings)
- format_presets.py (blocked by file_formats and content_kinds)
- languages.py (blocked by file_formats)
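The blocking relationships in this list form a small dependency graph. As a rough sketch (the `BLOCKED_BY` mapping and `migration_order` helper are illustrative names, not part of the repository), the standard library's `graphlib` (Python 3.9+) can derive a valid migration order from it:

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules that block it, per the list above.
BLOCKED_BY = {
    "file_formats": set(),
    "licenses": {"file_formats"},
    "content_kinds": {"file_formats"},
    "format_presets": {"file_formats", "content_kinds"},
    "languages": {"file_formats"},
}


def migration_order(blocked_by):
    """Return one valid order in which blockers always come first."""
    return list(TopologicalSorter(blocked_by).static_order())


order = migration_order(BLOCKED_BY)
```

Any order this produces starts with file_formats and places content_kinds before format_presets; licenses and languages have no ordering constraint relative to each other.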
Migration Strategy
file_formats is the FOUNDATION issue that:
- Enhances `generate_from_specs.py` to support namedtuple-based constants
- Establishes the spec format pattern for all other issues
- Includes helper function generation (`getformat()`)
- Must be completed before the rest can proceed
content_kinds further enhances generation:
- Adds support for metadata-driven code generation (MAPPING dict)
- Must be completed before format_presets
licenses and languages (can be done in parallel after file_formats completes):
- Follow the pattern established in file_formats
- Create spec file using the namedtuple format
- Run generation to create Python/JS files
- Update tests to verify against spec
- Delete old JSON resource file
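For the "update tests to verify against spec" step, one possible shape for such a check is sketched below. It assumes the spec key maps to the `id` field, and it uses a `SimpleNamespace` as a stand-in for the generated module; `spec_mismatches` is a hypothetical helper, not an existing test utility.

```python
from collections import namedtuple
from types import SimpleNamespace


def spec_mismatches(spec, module, list_name):
    """Return readable differences between a spec and a generated module."""
    fields = spec["namedtuple"]["fields"]
    problems = []
    rows = {row.id: row for row in getattr(module, list_name)}
    for const_id, values in spec["constants"].items():
        # The upper-cased id is assumed to be the constant's name.
        if getattr(module, const_id.upper(), None) != const_id:
            problems.append("missing or wrong constant for " + const_id)
        row = rows.pop(const_id, None)
        if row is None:
            problems.append("missing list entry for " + const_id)
            continue
        expected = dict(values, id=const_id)
        for field in fields:
            if getattr(row, field) != expected[field]:
                problems.append("{0}.{1} differs from spec".format(const_id, field))
    # Anything left over exists in the module but not in the spec.
    problems.extend("entry not in spec: " + extra for extra in rows)
    return problems


# Stand-ins for a parsed spec file and a generated module.
Format = namedtuple("Format", ["id", "mimetype"])
SPEC = {
    "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
    "constants": {
        "mp4": {"mimetype": "video/mp4"},
        "pdf": {"mimetype": "application/pdf"},
    },
}
fake_module = SimpleNamespace(
    MP4="mp4",
    PDF="pdf",
    FORMATLIST=[Format("mp4", "video/mp4"), Format("pdf", "application/pdf")],
)

mismatches = spec_mismatches(SPEC, fake_module, "FORMATLIST")
```

A real test would load the spec from `spec/` and import the generated module instead of using the stand-ins.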
format_presets must wait for both file_formats and content_kinds to complete.
All 5 modules share a common structure (namedtuples, {MODULE}LIST, choices), with progressive enhancement of the generation script.
Spec File Format
All migrated modules will use this consistent JSON structure in their spec files:
    {
      "namedtuple": {
        "name": "Format",
        "fields": ["id", "mimetype"]
      },
      "constants": {
        "mp4": {"mimetype": "video/mp4"},
        "webm": {"mimetype": "video/webm"},
        "pdf": {"mimetype": "application/pdf"}
      }
    }

The generation script will use this to create:
- Python namedtuple class: `class Format(namedtuple("Format", ["id", "mimetype"])): pass`
- Python LIST variable: `FORMATLIST = [Format(id="mp4", mimetype="video/mp4"), ...]`
- Python constants: `MP4 = "mp4"`, `WEBM = "webm"`, etc.
- Python choices tuple: `choices = ((MP4, "Mp4"), (WEBM, "Webm"), ...)`
- JavaScript exports: `export default { MP4: "mp4", WEBM: "webm", ... }`
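As an illustration of how a spec could turn into the outputs listed above, here is a minimal sketch. It is not the actual `generate_from_specs.py`: `render_python` and `render_javascript` are hypothetical helpers, and the sketch assumes the spec key fills the `id` field, that constant names are the upper-cased ids, and that choice labels are the capitalized ids.

```python
import json

SPEC_JSON = """
{
  "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
  "constants": {
    "mp4": {"mimetype": "video/mp4"},
    "webm": {"mimetype": "video/webm"},
    "pdf": {"mimetype": "application/pdf"}
  }
}
"""


def render_python(spec, list_name):
    """Render Python source for one spec (hypothetical helper)."""
    name = spec["namedtuple"]["name"]
    fields = spec["namedtuple"]["fields"]
    constant_lines, list_lines, choice_lines = [], [], []
    for const_id, values in spec["constants"].items():
        const_name = const_id.upper()
        row = dict(values, id=const_id)  # assumption: the key is the id field
        kwargs = ", ".join("{0}={1!r}".format(f, row[f]) for f in fields)
        constant_lines.append("{0} = {1!r}".format(const_name, const_id))
        list_lines.append("    {0}({1}),".format(name, kwargs))
        choice_lines.append(
            "    ({0}, {1!r}),".format(const_name, const_id.capitalize())
        )
    lines = [
        "from collections import namedtuple",
        "",
        "",
        "class {0}(namedtuple({0!r}, {1!r})):".format(name, fields),
        "    pass",
        "",
        "",
    ]
    lines += constant_lines
    lines += ["", list_name + " = ["] + list_lines + ["]"]
    lines += ["", "choices = ("] + choice_lines + [")", ""]
    return "\n".join(lines)


def render_javascript(spec):
    """Render a default-export object of id constants (hypothetical helper)."""
    pairs = "".join(
        "  {0}: {1},\n".format(const_id.upper(), json.dumps(const_id))
        for const_id in spec["constants"]
    )
    return "export default {\n" + pairs + "};\n"


spec = json.loads(SPEC_JSON)
python_source = render_python(spec, "FORMATLIST")
javascript_source = render_javascript(spec)

# Execute the generated Python in a scratch namespace to confirm it is valid.
generated = {"__name__": "generated_formats"}
exec(python_source, generated)
```

Ids that are not valid Python identifiers once upper-cased (for example ones containing hyphens) would need extra sanitizing that this sketch leaves out.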
Each module will have different namedtuple fields appropriate to its data:
- file_formats: `["id", "mimetype"]`
- licenses: `["id", "name", "exists", "url", "description", "custom", "copyright_holder_required"]`
- content_kinds: `["id", "name"]` (plus metadata for MAPPING generation)
- format_presets: `["id", "kind_id", "allowed_formats", ...]` (10+ fields)
- languages: `["lang_code", "lang_subcode", "readable_name", ...]` (complex structure)
Post-Migration Cleanup
After all 5 modules are migrated:
- Remove `package_data={"le_utils": ["resources/*.json"]}` from setup.py
- Delete `le_utils/resources/` directory
- Update README.md to remove manual sync warnings
- Update CHANGELOG.md with migration notes
Benefits
✅ Single source of truth (spec files)
✅ JavaScript export for all constants
✅ Eliminates manual sync requirement
✅ Consistent with modern modules
✅ Better developer experience for contributors
Disclosure
🤖 This issue was written by Claude Code, under supervision, review and final edits by @rtibbles 🤖