Migrate legacy constants from JSON-as-data to spec + code generation #181

@rtibbles

Overview

This tracking issue coordinates the migration of 5 legacy Python constant modules from the old JSON-as-data approach to our modern spec + code generation system.

Background

Current (legacy) approach:

  • Constants defined in JSON files under le_utils/resources/
  • Python modules load JSON at runtime using pkgutil.get_data()
  • Manual Python constants must be kept in sync with JSON files
  • JavaScript code cannot use these constants (no JS export)
  • Tests must verify that the Python constants and the JSON files stay in sync, which is the main pain point
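For readers unfamiliar with the legacy modules, the pattern described above can be sketched roughly as follows. This is an illustration only: the JSON content is a made-up stand-in for what pkgutil.get_data() would return, and load_formats and find_sync_problems are hypothetical names, not functions from le_utils.

```python
import json
from collections import namedtuple

# Stand-in for the bytes that pkgutil.get_data("le_utils", "resources/<name>.json")
# would return at import time. The content is illustrative, not the real resource.
RAW_JSON = b'{"mp4": {"mimetype": "video/mp4"}, "pdf": {"mimetype": "application/pdf"}}'

# Hand-written constants that a contributor has to keep in step with the JSON.
MP4 = "mp4"
PDF = "pdf"
choices = ((MP4, "Mp4"), (PDF, "Pdf"))

Format = namedtuple("Format", ["id", "mimetype"])


def load_formats(raw):
    """Build the runtime list from the JSON payload (hypothetical helper)."""
    data = json.loads(raw.decode("utf-8"))
    return [Format(id=key, **fields) for key, fields in data.items()]


FORMATLIST = load_formats(RAW_JSON)


def find_sync_problems(format_list, declared_choices):
    """Return ids that appear in the JSON or in the Python constants, but not both."""
    json_ids = {entry.id for entry in format_list}
    declared_ids = {value for value, _label in declared_choices}
    return sorted(json_ids ^ declared_ids)
```

A sync test in this style passes only while find_sync_problems(FORMATLIST, choices) is empty, which is the kind of check the migration is meant to make unnecessary.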

Target (modern) approach:

  • Constants defined once in JSON spec files under spec/
  • Code generation script (generate_from_specs.py) creates both Python and JavaScript files
  • Single source of truth eliminates sync issues
  • Automatic JavaScript export enables frontend use
  • Already used successfully for 8 modules (modalities, labels, schemas, etc.)

Modules to Migrate

  1. file_formats.py (FOUNDATION - must be done first)
  2. licenses.py (blocked by file_formats)
  3. content_kinds.py (blocked by file_formats; further enhances generation to support metadata/mappings)
  4. format_presets.py (blocked by file_formats and content_kinds)
  5. languages.py (blocked by file_formats)
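The blocking relationships in this list can be written down as a small dependency graph to see which modules can be worked on in parallel. This is a sketch using the standard library graphlib module (Python 3.9+); BLOCKED_BY and migration_waves are names made up for this example.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules that block it, as listed above.
BLOCKED_BY = {
    "file_formats": set(),
    "licenses": {"file_formats"},
    "content_kinds": {"file_formats"},
    "format_presets": {"file_formats", "content_kinds"},
    "languages": {"file_formats"},
}


def migration_waves(blocked_by):
    """Group modules into waves; everything in one wave can proceed in parallel."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(migration_waves(BLOCKED_BY), start=1):
    print(number, ", ".join(wave))
```

With the graph above this yields three waves: file_formats alone, then content_kinds, languages and licenses together, then format_presets.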

Migration Strategy

file_formats is the FOUNDATION issue that:

  • Enhances generate_from_specs.py to support namedtuple-based constants
  • Establishes the spec format pattern for all other issues
  • Includes helper function generation (getformat())
  • Must be completed before the rest can proceed

content_kinds further enhances generation:

  • Adds support for metadata-driven code generation (MAPPING dict)
  • Must be completed before format_presets

licenses and languages (can be done in parallel after file_formats completes):

  • Follow the pattern established in file_formats
  • Create spec file using the namedtuple format
  • Run generation to create Python/JS files
  • Update tests to verify against spec
  • Delete old JSON resource file
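For the "update tests to verify against spec" step, one possible shape for the check is sketched below. It assumes the spec format described in this issue and a namedtuple with an id field (which would need adjusting for languages); spec_mismatches and the stand-in module object are hypothetical and only for illustration.

```python
from collections import namedtuple
from types import SimpleNamespace


def spec_mismatches(module, spec):
    """Return human-readable differences between a generated module and its spec."""
    name = spec["namedtuple"]["name"]
    constants = spec["constants"]
    problems = []

    # Every key should exist as an upper-case string constant, e.g. MP4 = "mp4".
    for key in constants:
        if getattr(module, key.upper(), None) != key:
            problems.append("constant {} missing or wrong".format(key.upper()))

    # Every key should have a matching entry in <NAME>LIST with the same field values.
    listed = {entry.id: entry for entry in getattr(module, name.upper() + "LIST", [])}
    for key, values in constants.items():
        entry = listed.get(key)
        if entry is None:
            problems.append("{} missing from {}LIST".format(key, name.upper()))
            continue
        for field, expected in values.items():
            if getattr(entry, field, None) != expected:
                problems.append("{}.{} differs from spec".format(key, field))

    # Entries in the module that the spec does not know about are also reported.
    for extra in sorted(set(listed) - set(constants)):
        problems.append("{} not present in spec".format(extra))
    return problems


# Stand-ins for a loaded spec file and a generated module.
Format = namedtuple("Format", ["id", "mimetype"])
SPEC = {
    "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
    "constants": {"mp4": {"mimetype": "video/mp4"}},
}
fake_module = SimpleNamespace(MP4="mp4", FORMATLIST=[Format("mp4", "video/mp4")])
```

A test can then assert that spec_mismatches(fake_module, SPEC) returns an empty list, with the real generated module and the parsed spec file in place of the stand-ins.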

format_presets must wait for both file_formats and content_kinds to complete.

All 5 modules share a common structure (namedtuples, {MODULE}LIST, choices), with progressive enhancement of the generation script.

Spec File Format

All migrated modules will use this consistent JSON structure in their spec files:

{
  "namedtuple": {
    "name": "Format",
    "fields": ["id", "mimetype"]
  },
  "constants": {
    "mp4": {"mimetype": "video/mp4"},
    "webm": {"mimetype": "video/webm"},
    "pdf": {"mimetype": "application/pdf"}
  }
}

Each key under "constants" supplies the id field of its entry. The generation script will use this to create:

  • Python namedtuple class: class Format(namedtuple("Format", ["id", "mimetype"])): pass
  • Python LIST variable: FORMATLIST = [Format(id="mp4", mimetype="video/mp4"), ...]
  • Python constants: MP4 = "mp4", WEBM = "webm", etc.
  • Python choices tuple: choices = ((MP4, "Mp4"), (WEBM, "Webm"), ...)
  • JavaScript exports: export default { MP4: "mp4", WEBM: "webm", ... }
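To make the outputs listed above concrete, here is a minimal sketch of how a spec of this shape could be turned into Python and JavaScript source. It is not the actual generate_from_specs.py implementation: render_python and render_javascript are hypothetical names, and the sketch assumes every key is already a valid identifier once upper-cased.

```python
import json

SPEC = json.loads("""
{
  "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
  "constants": {
    "mp4": {"mimetype": "video/mp4"},
    "webm": {"mimetype": "video/webm"},
    "pdf": {"mimetype": "application/pdf"}
  }
}
""")


def render_python(spec):
    """Return Python source text for the namedtuple, constants, choices and LIST."""
    name = spec["namedtuple"]["name"]
    fields = spec["namedtuple"]["fields"]
    constants = spec["constants"]

    lines = [
        "from collections import namedtuple",
        "",
        "class {0}(namedtuple({0!r}, {1!r})):".format(name, fields),
        "    pass",
        "",
    ]
    # One upper-case string constant per key, e.g. MP4 = 'mp4'.
    for key in constants:
        lines.append("{} = {!r}".format(key.upper(), key))
    lines.append("")
    # choices pairs each constant with a capitalised label, e.g. (MP4, 'Mp4').
    lines.append("choices = (")
    for key in constants:
        lines.append("    ({}, {!r}),".format(key.upper(), key.capitalize()))
    lines.append(")")
    lines.append("")
    # <NAME>LIST holds one namedtuple per entry; the key supplies the id field.
    lines.append("{}LIST = [".format(name.upper()))
    for key, values in constants.items():
        kwargs = dict(values, id=key)
        args = ", ".join("{}={!r}".format(f, kwargs[f]) for f in fields)
        lines.append("    {}({}),".format(name, args))
    lines.append("]")
    return "\n".join(lines) + "\n"


def render_javascript(spec):
    """Return JavaScript source text exporting the same string constants."""
    body = "\n".join(
        "  {}: {},".format(key.upper(), json.dumps(key)) for key in spec["constants"]
    )
    return "export default {\n" + body + "\n};\n"
```

Calling print(render_python(SPEC)) and print(render_javascript(SPEC)) shows the generated text. Real specs with keys that are not valid identifiers would need a naming rule that this sketch does not attempt.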

Each module will have different namedtuple fields appropriate to its data:

  • file_formats: ["id", "mimetype"]
  • licenses: ["id", "name", "exists", "url", "description", "custom", "copyright_holder_required"]
  • content_kinds: ["id", "name"] (plus metadata for MAPPING generation)
  • format_presets: ["id", "kind_id", "allowed_formats", ...] (10+ fields)
  • languages: ["lang_code", "lang_subcode", "readable_name", ...] (complex structure)

Post-Migration Cleanup

After all 5 modules are migrated:

  • Remove package_data={"le_utils": ["resources/*.json"]} from setup.py
  • Delete le_utils/resources/ directory
  • Update README.md to remove manual sync warnings
  • Update CHANGELOG.md with migration notes

Benefits

✅ Single source of truth (spec files)
✅ JavaScript export for all constants
✅ Eliminates manual sync requirement
✅ Consistent with modern modules
✅ Better developer experience for contributors

Disclosure

🤖 This issue was written by Claude Code, under supervision, review and final edits by @rtibbles 🤖
