
schema: domain profile library (curated + local, multi-type augments)#5

Merged
hebbianloop merged 3 commits into schema/fix-duplicate-concept from schema/profile-library on Jun 8, 2026

@hebbianloop

Owner

Answers "how are we handling profiles?" — researchers can add their own, and we keep a curated library that grows over time. Builds on the dataset domain-profile from #3 and generalizes it.

Stacked on #4 (schema/fix-duplicate-concept) — this PR's base is that branch so it isn't reviewed against the pre-fix concept bug. Merge #4 first; this rebases onto main cleanly.

What changes

Profiles are domain bundles, not a dataset-only feature. Every field whose vocabulary is fixed by a domain standard moved out of the core node types into a profile:

| core type | moved to neuroimaging profile |
| --- | --- |
| dataset | subject-id (req), session, modality, scanner, bids-version |
| experiment | design, regressors, timing-source, stimulus-set |
| method | nipype-node-type |
| derivative | output-kind (req) |

The core schema is now domain-neutral (identity + generic provenance + DataLad coords only).

Two-tier library:

  • glimmer/schema/profiles/<domain>.yaml — curated, in-repo, versioned. neuroimaging.yaml (BIDS) ships now; behavioral/genomics/… get added here over time.
  • <rokb>/_glimmer-profiles/<domain>.yaml — a researcher's own profile, local to one KB. Add a new domain with zero edits to the core schema or shared library. A local profile shadows a curated one of the same name (validator warns).

Each profile carries standard / version / status: curated|community|local → self-describing, and maps onto the schema registry in roadmap v0.6.
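
The two-tier lookup can be sketched roughly as below. This is an illustration under stated assumptions, not the code in validate.py: the name `merge_profile_tiers` and the dictionary shapes are hypothetical, and the profiles are passed in as already-parsed dictionaries rather than read from the YAML files.

```python
def merge_profile_tiers(curated, local):
    """Combine curated and local profiles; a local profile shadows a curated one.

    Both arguments map a domain name to its parsed profile (assumed shape).
    Returns (profiles, warnings); each profile records the tier it came from.
    """
    profiles = {}
    warnings = []
    for name, body in curated.items():
        profiles[name] = {**body, "tier": "curated"}
    for name, body in local.items():
        if name in profiles:
            warnings.append(
                f"local profile '{name}' shadows the curated profile of the same name"
            )
        profiles[name] = {**body, "tier": "local"}
    return profiles, warnings


# Hypothetical inputs: one curated profile, two local ones (one shadowing).
curated = {"neuroimaging": {"standard": "BIDS", "status": "curated"}}
local = {
    "geology": {"status": "local"},
    "neuroimaging": {"standard": "BIDS", "status": "local"},
}
profiles, warnings = merge_profile_tiers(curated, local)
print(sorted(profiles))                  # ['geology', 'neuroimaging']
print(profiles["neuroimaging"]["tier"])  # local
print(len(warnings))                     # 1
```

Loading the curated tier first and the local tier second keeps the shadowing rule in one place: whichever tier is applied last wins, and the warning is emitted at the moment of the overwrite.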

Domain resolution (most specific wins): node domain field → KB default-domain in _glimmer-index.json → core schema default-domain (neuroimaging). The validator merges augments.<node-type>.required onto the core requirements; an unknown domain is a non-fatal hint.
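
As a rough sketch of the resolution order and the augments merge described above: the function names, the `id` core requirement, and the placement of `sample-id` under `dataset` are assumptions made for illustration; only the resolution order, the `augments.<node-type>.required` path, and the non-fatal handling of an unknown domain come from the description.

```python
CORE_DEFAULT_DOMAIN = "neuroimaging"  # core schema default-domain, per the text above


def resolve_domain(node, kb_index, core_default=CORE_DEFAULT_DOMAIN):
    """Most specific wins: node field, then KB index, then core default."""
    return node.get("domain") or kb_index.get("default-domain") or core_default


def required_fields(node_type, core_required, profiles, domain):
    """Core requirements plus augments.<node-type>.required from the profile.

    Returns (fields, hints); an unknown domain adds a hint instead of failing.
    """
    fields = list(core_required.get(node_type, []))
    hints = []
    profile = profiles.get(domain)
    if profile is None:
        hints.append(f"unknown domain '{domain}': only core requirements applied")
        return fields, hints
    extra = profile.get("augments", {}).get(node_type, {}).get("required", [])
    fields.extend(f for f in extra if f not in fields)
    return fields, hints


# Hypothetical data: "id" stands in for whatever the core schema requires.
core_required = {"dataset": ["id"], "derivative": ["id"]}
profiles = {
    "neuroimaging": {
        "augments": {
            "dataset": {"required": ["subject-id"]},
            "derivative": {"required": ["output-kind"]},
        }
    },
    "geology": {"augments": {"dataset": {"required": ["sample-id"]}}},
}

node = {"type": "dataset", "domain": "geology"}
domain = resolve_domain(node, {"default-domain": "neuroimaging"})
print(domain)                                                        # geology
print(required_fields("dataset", core_required, profiles, domain))  # (['id', 'sample-id'], [])
```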

New / changed

  • profiles/_profile.schema.yaml — profile file format (meta-schema)
  • profiles/neuroimaging.yaml — the BIDS profile (worked example)
  • frontmatter.yaml — core types stripped of domain specifics; domain is a _common optional; top-level default-domain; default-domain added to index schema
  • validate.py: load_profiles() (curated + local), KB default-domain resolution, augments merge, prints loaded profiles + tier
  • schema.md — new Domain profiles section (tiers, resolution order, how to add one)
  • ds000114 emitter — sets default-domain: neuroimaging (+ glimmer/v0.3.1)

Verification

  • Core parses domain-neutral; profile library loads (neuroimaging (curated)).
  • Neuroimaging KB — via default-domain or core fallback — still enforces subject-id + output-kind.
  • A researcher-added local geology profile enforced sample-id with no core edits.
  • Unknown domain → warning, not error. Example RO-KBs validate at 0 errors.

🤖 Generated with Claude Code

Generalizes the dataset domain-profile idea (PR #3) into a first-class profile
system, per design review:

- Profiles are now DOMAIN BUNDLES that augment ANY node type, not just dataset.
  Fields whose vocabulary is fixed by a domain standard moved out of the core
  node types: BIDS subject-id/session/modality/scanner/bids-version (dataset),
  fMRI design/regressors/timing-source/stimulus-set (experiment), nipype-node-type
  (method), and output-kind (derivative). The core schema is now domain-neutral.

- Profiles live in two tiers:
    glimmer/schema/profiles/<domain>.yaml   — curated, in-repo library (grows over time)
    <rokb>/_glimmer-profiles/<domain>.yaml  — researcher's own, local to one KB
  Each profile carries metadata (standard, version, status: curated|community|local),
  mapping onto the roadmap v0.6 schema registry. profiles/_profile.schema.yaml
  documents the format; profiles/neuroimaging.yaml is the worked BIDS profile.

- Domain resolution (most specific wins): node `domain` field → KB `default-domain`
  in _glimmer-index.json → core schema default-domain (neuroimaging). The validator
  loads curated + local profiles and merges augments.<type>.required onto the core
  requirements. Unknown domain = non-fatal hint, never a hard error.

- `domain` is now a _common optional (any node); `default-domain` added to the
  index schema; validator prints the loaded profiles + tier.

Verified: core parses domain-neutral; neuroimaging KB (via default-domain or core
fallback) still enforces subject-id + output-kind; a researcher-added LOCAL geology
profile enforces sample-id with zero core edits; examples validate at 0 errors.

Stacked on schema/fix-duplicate-concept (#4); rebases cleanly once that merges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hebbianloop

Owner Author

do agents know how to read this and get onboarded? to decide how to use the repository and whether a new profile is needed or whether an existing suffices? or how to write a new one?

Addresses PR #5 review: "do agents know how to read this and get onboarded —
to decide whether a new profile is needed or an existing suffices, or how to
write one?" The profile system worked mechanically but was invisible to an agent
following the repo's own onboarding, which also discouraged touching glimmer/.

Frames profiles as an ENCOURAGED third extension tier and documents the intended
lifecycle: get started → discover you need a profile → write it locally in the
safe zone → use it → PR it upstream → reviewed and added to the library.

- AGENTS.md: new "Domain profiles" section (3-tier decision table + step-by-step
  decision flow + minimal template); new task-table row; relaxed the "never touch
  glimmer/ / never modify schema.md" rules to "never add node/edge TYPES without
  an RFC" — adding fields is a profile and is encouraged. Local profiles need no RFC.
- docs/extending-the-schema.md: decision tree up top (local profile / curated
  profile / core RFC) so the heavyweight RFC is correctly scoped to new node/edge
  types only; "Authoring a profile" lifecycle; welcomes curated-profile PRs.
- glimmer/schema/profiles/README.md: library index (curated table, resolution
  order, how to use/add a profile).
- validate.py: unknown-domain warnings now give an actionable next step
  (use a listed profile, or author <rokb>/_glimmer-profiles/<domain>.yaml) instead
  of a dead end. No fictional CLI referenced — discovery is `glimmer validate`'s
  Profiles: line.
- roadmap.md: link profiles to v0.4 multi-standard + v0.6 registry (profiles are
  the unit published); new v0.7 item for CLI <-> glimmer.science (auth, request
  storage/compute, query/modify the RO), pending the service architecture rewrite.

No core schema mechanics changed; onboarding/discoverability only. Examples still
validate at 0 errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hebbianloop

Owner Author

Good catch — the profile system worked mechanically but was invisible to an agent following the repo's own onboarding (which also discouraged touching glimmer/, hiding the local-profile escape hatch). Folded the fix into this PR (00ba2fa).

Profiles are now framed as an encouraged third extension tier, with the lifecycle you described: get started → discover you need a profile → write it locally in the safe zone → use it → PR it upstream → we review and add it to the library.

  • AGENTS.md (the declared first read for agents) now has a Domain profiles section: a 3-tier decision table (local profile / curated profile / core RFC), a step-by-step decision flow, and a minimal template. Relaxed "never modify glimmer/" to "never add node/edge types without an RFC": adding fields is a profile and is encouraged; local profiles under <rokb>/_glimmer-profiles/ need no RFC.
  • docs/extending-the-schema.md: decision tree up top so the heavyweight RFC is scoped to new node/edge types only, plus an "Authoring a profile" lifecycle that welcomes curated-profile PRs.
  • glimmer/schema/profiles/README.md: library index (curated table, resolution order, how to use/add).
  • validate.py: an unresolved domain now prints an actionable next step (use a listed profile, or author <rokb>/_glimmer-profiles/<domain>.yaml — with template + guide pointers) instead of a dead end. Discovery is the existing Profiles: line in glimmer validate output — no fictional CLI.
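
An actionable unknown-domain message of the kind described in the validate.py bullet might be assembled along these lines. The function name and the exact wording are hypothetical; only the two suggested next steps (use a listed profile, or author a local one under `<rokb>/_glimmer-profiles/`) come from the description.

```python
def unknown_domain_hint(domain, loaded_profiles, rokb="<rokb>"):
    """Build a non-fatal warning that tells the reader what to do next."""
    listed = ", ".join(sorted(loaded_profiles)) or "(none loaded)"
    return (
        f"warning: no profile found for domain '{domain}'. "
        f"Use one of the loaded profiles ({listed}), "
        f"or author {rokb}/_glimmer-profiles/{domain}.yaml."
    )


message = unknown_domain_hint("geology", ["neuroimaging"])
print(message)
```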

On the CLI: I did not build a glimmer profiles command. Per your steer, the CLI's growth is captured as a roadmap item (v0.7) — auth to glimmer.science, request storage/compute, and query/modify the hosted research object — marked pending the service architecture rewrite. Also linked profiles to the existing v0.4 (multi-standard) and v0.6 (registry — profiles are the unit of publication) items.

No core schema mechanics changed; onboarding/discoverability only. Examples still validate at 0 errors.

Pre-existing drift: AGENTS.md and cli.py pointed at tools and an example that
don't exist in the tree (`build_rokb.py`, `agent.py`, `score.py`, the
`training-fsqc` example). An agent following onboarding would hit cryptic
"tool not found" crashes or look for missing files.

- cli.py: `build` / `agent` / `score` no longer exec missing scripts — they
  print an honest "not yet packaged" message with the real path (per-example
  emit_graph.py for build) or the roadmap pointer (agent/score → v0.5).
  import-bids / export-rocrate use the same consistent stub helper.
- AGENTS.md: "Building the worked example" now uses the real ds000114-nipype
  flow (install.sh → workflow.py → emit_graph.py → validate); core-zone tooling
  list and task table corrected to what ships today vs what's planned;
  training-fsqc → ds000114-nipype.
- examples/ds000114-nipype/README.md: the optional `agent.py` step is marked
  planned (roadmap v0.5), not listed as a present file.
- roadmap.md: v0.5 line no longer asserts agent.py already exists; fixed a
  v0.4/v0.5 typo.

Verified: `glimmer build`/`agent` print clear stubs (exit 2); `glimmer validate`
still works; examples validate at 0 errors; no `training-fsqc`/`build_rokb`
references remain and all agent.py/score.py mentions read as planned.
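
A stub helper of the kind described for cli.py could look roughly like this. It is a sketch, not the shipped code: `run_stub`, the `PLANNED` table, and the message wording are hypothetical, while the exit code 2, the emit_graph.py pointer, and the v0.5 roadmap pointer come from the notes above.

```python
import io
import sys

# Hypothetical message table; wording is illustrative only.
PLANNED = {
    "build": "not yet packaged; run the per-example emit_graph.py instead",
    "agent": "not yet packaged; planned for roadmap v0.5",
    "score": "not yet packaged; planned for roadmap v0.5",
}


def run_stub(command, stream=None):
    """Print an honest 'not yet packaged' message and return exit code 2."""
    stream = sys.stderr if stream is None else stream
    print(f"glimmer {command}: {PLANNED[command]}", file=stream)
    return 2


# Capture the output instead of writing to stderr, to show what a caller sees.
captured = io.StringIO()
exit_code = run_stub("agent", captured)
print(exit_code)            # 2
print(captured.getvalue())
```

In a real entry point the return value would typically be handed to `sys.exit(...)`, so the shell sees a non-zero status rather than a crash from a missing script.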

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
hebbianloop merged commit b06a086 into schema/fix-duplicate-concept on Jun 8, 2026
@hebbianloop

Owner Author

Filed #7 to track the glimmer.science service architecture rewrite that the new roadmap v0.7 item is blocked on. #7 is the umbrella; #6 (storage & archival + multi-tenant self-provisioning of bares) is its storage layer, and the full design record lives in ads-glimmer/docs/data/INFORMATION-ARCHITECTURE.md (status: design, implementation deferred to a dedicated glimmer-platform session).

hebbianloop added a commit that referenced this pull request Jun 8, 2026
* schema: fix duplicate concept key, dedup persona/organization, guard against recurrence

Audit of the merged v0.3.1 schema found `concept`, `persona`, and `organization`
each defined twice. yaml.safe_load silently keeps the LAST duplicate, so:

- `concept` resolved to the minimal meta-graph block, so it had LOST its required
  `statement` field and 8 edges (decomposes-into, extends-concept, subsumed-by,
  competes-with, superseded-by, supports, contradicts, cited-in). Every concept
  node the agentic loop / ads-glimmer meta-graph emits with a concept→concept
  edge would have failed validation. Latent only because the one example concept
  used `edges: []`.
- `persona` / `organization`: last-wins kept the richer blocks so content
  survived, but the dead first blocks misled and persona.leads silently changed
  target semantics.

Fixes:
- Merge each type into a single canonical definition. `concept` regains
  `statement` + all edges, and folds in the meta-graph optionals (outcome-data,
  outcome-access, sensitivity) and the `tested-by-experiment` edge.
- validate.py: load the schema AND sidecar front-matter through a strict YAML
  loader that errors on duplicate keys, so a shadowed definition can never again
  pass silently. Verified: concept→concept edges now validate; a concept missing
  `statement` now errors; examples still validate at 0 errors.
- Drop a duplicate `tests-hypothesis` in experiment.edges-allowed.
- Fix version drift: header and _glimmer-index schema example said v0.2.0 → v0.3.1.
- schema.md: sync concept/persona docs; collapse the redundant bottom section
  that re-documented (and contradicted) the canonical concept.
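
A strict loader of the kind described above can be sketched with PyYAML as follows. This is an assumption about the approach, not the code in validate.py: the class name `UniqueKeyLoader` is hypothetical, and the sample document is made up to reproduce the duplicate-`concept` situation.

```python
import yaml
from yaml.constructor import ConstructorError


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects duplicate mapping keys."""

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            seen = set()
            for key_node, _value_node in node.value:
                key = self.construct_object(key_node, deep=deep)
                if key in seen:
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        f"found duplicate key {key!r}", key_node.start_mark,
                    )
                seen.add(key)
        return super().construct_mapping(node, deep=deep)


# Made-up document with `concept` defined twice.
DOC = """
concept:
  required: [statement]
concept:
  required: []
"""

# Plain safe_load keeps the last definition without complaint.
lenient = yaml.safe_load(DOC)
print(lenient)  # {'concept': {'required': []}}

try:
    yaml.load(DOC, Loader=UniqueKeyLoader)
    rejected = False
except ConstructorError as exc:
    rejected = True
    print("rejected:", exc.problem)
```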

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* schema: extract domain profiles into a curated + local library

* docs: make domain profiles discoverable + actionable for agents

* fix: remove phantom tool/example references; make CLI stubs honest

* roadmap: add v0.7 storage/platform, reconcile with as-built schema

- Add v0.7 (storage, durability & multi-tenant platform — the service
  architecture rewrite, tracked in #7); renumber the hosted CLI to v0.8
  so it reads as building on the platform it depends on.
- Reconcile the meta-graph edge inventory with what actually ships:
  fix the `leads` signature; add `tested-by-experiment`, the evidence-
  relation layer (`supports`/`contradicts`/`challenged-by`), `aggregates`,
  `depends-on-method`, `co-acquired-with`; note core structural edges
  live in schema.md.
- Assess skipped planned edges: keep `meta-analyzes` (distinct from
  `aggregates`), demote `requires-experiment`, drop `produces-data-for`
  (redundant with shipped `realized-by`).
- Reframe stale open questions against shipped edges (relationship-typed
  citation; retraction-only anti-claims).
- Fix stale "Beyond v0.5" heading -> "Beyond v0.8"; deepen v0.2 and v0.5
  framing; flag the v0.3 worked-example / example-instance gap as pending
  a follow-up PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Shady El Damaty <shady@holonym.id>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>