Generating MuJoCo MJCF XSD schema from the parser #3237

Open
julien-blanchon wants to merge 5 commits into google-deepmind:main from julien-blanchon:schema-autogen

@julien-blanchon julien-blanchon commented Apr 20, 2026

Adds an auto-generated MJCF XSD, built from the parser tables in src/xml/ and enriched with defaults and docs from doc/XMLreference.rst.

This is intended for IDE/LLM editing support (elements, attributes, enums), not as a replacement for compiler validation.

What’s included

  • XSD generator (mj_printSchemaXSD) based on MJCF[] + attribute-type table
  • CLI tool to dump the raw schema (sample/xmlschema.cc)
  • Enrichment script to pull defaults/docs from RST and detect drift (--strict)
  • Docs + contributing updates (regen + editor integration)

Closes #6. Refs julien-blanchon/mujoco-schema#1.

Regeneration

# Build the xmlschema CLI
cmake --build build --target xmlschema
# Build the base schema from the parser data
./build/bin/xmlschema /tmp/raw.xsd
# Optionally enrich the schema with documentation from `XMLreference.rst`
python doc/mjcf_schema_enrich.py --in /tmp/raw.xsd \
    --rst doc/XMLreference.rst --out doc/mjcf.xsd --strict --report

Test plan

  • Builds on Linux / macOS / Windows
  • The generated schema (doc/mjcf.xsd) validates against the XML Schema meta-schema (i.e. it is itself a valid schema)
  • ./model/**/*.xml validate against the schema
  • Output is deterministic
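
The determinism item could be checked with something like the sketch below. It is only an illustration: `is_deterministic` and the `generate` callback are hypothetical names, and the callback is assumed to write the schema to the path it receives (for example by running ./build/bin/xmlschema through `subprocess.run`).

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_deterministic(
    generate: Callable[[Path], None], workdir: Path, runs: int = 3
) -> bool:
    """Run `generate` several times and compare the outputs byte for byte.

    `generate` is a hypothetical callback that writes the schema to the
    path it is given; it is not part of the PR.
    """
    digests = set()
    for i in range(runs):
        out = workdir / f"run{i}.xsd"
        generate(out)
        digests.add(sha256_of(out))
    # Exactly one distinct digest means every run produced identical bytes.
    return len(digests) == 1
```

Comparing digests rather than parsed XML is deliberate here: it also catches differences in attribute order or whitespace, which matter when the file is diffed in CI.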

Decisions

Where to serve mjcf.xsd?
Should we keep it in the repo at doc/mjcf.xsd, publish it as a GitHub release artifact, or something else?

Adding CI?
Should we add a lightweight CI job to:

  • build xmlschema
  • run enricher with --strict
  • diff against committed XSD
  • validate schema + sample models (and possibly the google-deepmind/mujoco_menagerie models too)
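
The "diff against committed XSD" step might look roughly like this. It is a minimal sketch using only the standard library; `schema_drift` and `check` are hypothetical helper names, and reading the two files from disk is left to the caller.

```python
import difflib


def schema_drift(committed: str, regenerated: str) -> list[str]:
    """Return unified-diff lines between the committed and regenerated XSD."""
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            regenerated.splitlines(),
            fromfile="doc/mjcf.xsd (committed)",
            tofile="doc/mjcf.xsd (regenerated)",
            lineterm="",
        )
    )


def check(committed: str, regenerated: str) -> int:
    """Print any drift and return a process exit code (0 = in sync)."""
    diff = schema_drift(committed, regenerated)
    for line in diff:
        print(line)
    return 1 if diff else 0
```

A CI job could pass the return value of `check` to `sys.exit`, so the job fails whenever the committed schema is stale.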

Five local refactors to the MJCF XSD generator; all preserve the schema
semantics (same validating behaviour, same set of declared types).

1. Move the ~600-line kMjXAttrTable (and the mjXAttr/mjXAttrKind type
   declarations) into xml_native_reader.cc / xml_native_reader.h, next
   to MJCF[] and the enum maps. Reverts the earlier `extern const mjMap`
   sprinkling: the maps go back to internal linkage, and the schema
   generator just consumes kMjXAttrTable through its declarations.
   xml_native_schema.cc shrinks from 1653 to 532 lines.

2. Drop the manual kGeomTypesSz / kNDYN / etc. shadow constants; reference
   the authoritative mjNGEOMTYPES / mjNDYN / ... enum values directly from
   mjmodel.h / user_composite.h / user_flexcomp.h. No more silent drift.

3. Replace the O(N) linear Lookup() over kMjXAttrTable with a file-static
   unordered_map<(string_view, string_view), const mjXAttr*>, populated
   on first call. Schema emission drops from O(N^2) to O(N).

4. Replace the ListTypeName / EmitListType string round-tripping with a
   ListType struct (kind + size). EmitListType takes the struct directly,
   no stoi/find_last_of parsing of type names.

5. Deduplicate <xs:simpleType> enum declarations by mjMap* identity: the
   first (element, attr) pair that references a given map sets the
   canonical enum type name; every subsequent attribute using the same
   map reuses it. The generated XSD shrinks from 183 enum simpleTypes
   (many byte-identical copies of bool / interp / enable enums) down to
   39 distinct ones.
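
The identity-based deduplication in item 5 can be sketched as follows. The PR does this in C++ on mjMap pointers; this is only a Python analogue that uses `id()` as a stand-in for pointer identity, and the function name and the `<element>_<attr>_enum` naming scheme are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def dedupe_enum_types(
    uses: Iterable[tuple[str, str, Sequence[str]]],
) -> tuple[dict[tuple[str, str], str], list[tuple[str, Sequence[str]]]]:
    """Assign one canonical enum type name per distinct map object.

    `uses` yields (element, attr, enum_map) in emission order. Two uses
    share a type only when they reference the very same map object, not
    when two separate maps happen to hold equal keywords.
    """
    canonical: dict[int, str] = {}  # id(enum_map) -> type name
    type_of: dict[tuple[str, str], str] = {}  # (element, attr) -> type name
    declarations: list[tuple[str, Sequence[str]]] = []  # emitted once each
    for element, attr, enum_map in uses:
        key = id(enum_map)
        if key not in canonical:
            # The first (element, attr) pair to reference the map names it.
            canonical[key] = f"{element}_{attr}_enum"  # hypothetical naming
            declarations.append((canonical[key], enum_map))
        type_of[(element, attr)] = canonical[key]
    return type_of, declarations
```

Keying on identity rather than on content keeps the result tied to how the parser tables are actually wired: maps that are equal only by coincidence stay separate types.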

Drop the plain python3 shebang in favor of `uv run --script`, add the
inline script metadata block pinning requires-python >=3.10, mark the
file executable, and update the pipeline snippets in CONTRIBUTING.md
and xml_native_schema.cc to invoke it directly.

With this change the only prerequisite for regenerating doc/mjcf.xsd
is uv on PATH — uv provisions a Python interpreter that actually has
expat, which the Homebrew python3 builds lack.
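
For reference, a script header of the kind described might look like the sketch below. The shebang and the `# /// script` block follow the documented uv / inline script metadata conventions; the empty `dependencies` list and the `expat_version` probe are assumptions for illustration, not the actual contents of doc/mjcf_schema_enrich.py.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Illustrative header only; the real enrichment script may differ."""
from xml.parsers import expat


def expat_version() -> str:
    """Parse a trivial document to confirm expat works, then report its version."""
    parser = expat.ParserCreate()
    parser.Parse("<mujoco/>", True)  # raises expat.ExpatError on malformed input
    return expat.EXPAT_VERSION


if __name__ == "__main__":
    print(expat_version())
```

Once the file is marked executable, it can be run directly and uv selects an interpreter that satisfies `requires-python`.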
@julien-blanchon
Author

You can find the resulting schema here if you want to test it in your IDE directly: https://raw.githubusercontent.com/julien-blanchon/mujoco/refs/heads/schema-autogen-xsd/doc/mjcf.xsd

For VS Code you need the https://github.com/redhat-developer/vscode-xml extension, plus the following root element at the top of your MJCF file:

<mujoco xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://raw.githubusercontent.com/julien-blanchon/mujoco/refs/heads/schema-autogen-xsd/doc/mjcf.xsd">
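
If you have many models to try, the attribute can be added programmatically. This is a rough sketch with the standard library; `add_schema_location` is a hypothetical helper, not something shipped in the PR.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def add_schema_location(mjcf_text: str, schema_url: str) -> str:
    """Return the MJCF text with xsi:noNamespaceSchemaLocation set on the root."""
    # Register the prefix so the output uses "xsi:" instead of "ns0:".
    ET.register_namespace("xsi", XSI_NS)
    root = ET.fromstring(mjcf_text)
    root.set(f"{{{XSI_NS}}}noNamespaceSchemaLocation", schema_url)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops XML comments and normalizes formatting when it round-trips a file, so this is better suited to throwaway test copies than to hand-maintained models.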

@devshahofficial
Contributor

devshahofficial commented Apr 29, 2026

This is a useful direction. I can help test the generated schema against a broader set of MJCFs, especially model/ plus a subset of Menagerie models. My instinct would be to keep doc/mjcf.xsd committed if deterministic generation is enforced in CI; otherwise users would not have a stable URL for editor integration. A lightweight CI check that regenerates + diffs the schema feels like the right guardrail. Happy to run a validation pass and report concrete schema misses if that would help review.

@julien-blanchon
Author

Broader testing would be greatly appreciated!
But maybe let's wait for input from the GDM folks on this first.

@julien-blanchon
Author

Any follow-up on this?

@devshahofficial do not hesitate to test, especially on edge cases and less commonly used MuJoCo features. I've already done some testing on all the ./model MuJoCo models from this repo, and on some of the Menagerie models.

About committing doc/mjcf.xsd: I was also thinking of releasing mjcf.xsd alongside the MuJoCo build in each release (see https://github.com/google-deepmind/mujoco/releases). That way the schema CI doesn't have to run on every commit (only on release), and we get both a versioned URL with the version tag in it (https://github.com/google-deepmind/mujoco/releases/download/<version_tag>/mjcf.xsd) and an always up-to-date latest URL (https://github.com/google-deepmind/mujoco/releases/latest/download/mjcf.xsd).

Development

Successfully merging this pull request may close these issues.

Publish XML schema for MJCF