schema: domain profile library (curated + local, multi-type augments) #5
Generalizes the dataset domain-profile idea (PR #3) into a first-class profile system, per design review:

- Profiles are now DOMAIN BUNDLES that augment ANY node type, not just dataset. Fields whose vocabulary is fixed by a domain standard moved out of the core node types: BIDS subject-id/session/modality/scanner/bids-version (dataset), fMRI design/regressors/timing-source/stimulus-set (experiment), nipype-node-type (method), and output-kind (derivative). The core schema is now domain-neutral.
- Profiles live in two tiers:
    glimmer/schema/profiles/<domain>.yaml: curated, in-repo library (grows over time)
    <rokb>/_glimmer-profiles/<domain>.yaml: researcher's own, local to one KB
  Each profile carries metadata (standard, version, status: curated|community|local), mapping onto the roadmap v0.6 schema registry. profiles/_profile.schema.yaml documents the format; profiles/neuroimaging.yaml is the worked BIDS profile.
- Domain resolution (most specific wins): node `domain` field → KB `default-domain` in _glimmer-index.json → core schema default-domain (neuroimaging). The validator loads curated + local profiles and merges augments.<type>.required onto the core requirements. Unknown domain = non-fatal hint, never a hard error.
- `domain` is now a _common optional (any node); `default-domain` added to the index schema; validator prints the loaded profiles + tier.

Verified: core parses domain-neutral; neuroimaging KB (via default-domain or core fallback) still enforces subject-id + output-kind; a researcher-added LOCAL geology profile enforces sample-id with zero core edits; examples validate at 0 errors.

Stacked on schema/fix-duplicate-concept (#4); rebases cleanly once that merges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
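The resolution order in the commit message above can be pictured with a small sketch. This is not the code in validate.py: the function name `resolve_domain` and the plain-dict shapes for the node and the index are assumptions made for illustration. Only the three-step order and the `neuroimaging` fallback come from the commit message.

```python
from typing import Optional

# Core schema default-domain, as stated in the commit message.
CORE_DEFAULT_DOMAIN = "neuroimaging"


def resolve_domain(node: dict, kb_index: Optional[dict] = None,
                   core_default: str = CORE_DEFAULT_DOMAIN) -> str:
    """Return the effective domain for a node; the most specific source wins.

    Hypothetical helper: `node` stands in for parsed front-matter and
    `kb_index` for the parsed _glimmer-index.json.
    """
    # 1. An explicit `domain` field on the node itself.
    if node.get("domain"):
        return node["domain"]
    # 2. The KB-wide `default-domain` from the index.
    if kb_index and kb_index.get("default-domain"):
        return kb_index["default-domain"]
    # 3. The core schema's default-domain.
    return core_default


# Node-level domain beats the KB default.
print(resolve_domain({"type": "dataset", "domain": "geology"},
                     {"default-domain": "neuroimaging"}))
# No node-level domain: the KB default applies.
print(resolve_domain({"type": "dataset"}, {"default-domain": "geology"}))
# Neither is set: fall back to the core default.
print(resolve_domain({"type": "dataset"}))
```

Each step only runs when the more specific source is absent, which is why a single KB can mix nodes from several domains while most nodes inherit the KB default.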
Do agents know how to read this and get onboarded? Can they decide how to use the repository, and whether a new profile is needed or an existing one suffices? Do they know how to write a new one?
Addresses PR #5 review: "do agents know how to read this and get onboarded, to decide whether a new profile is needed or an existing suffices, or how to write one?" The profile system worked mechanically but was invisible to an agent following the repo's own onboarding, which also discouraged touching glimmer/.

Frames profiles as an ENCOURAGED third extension tier and documents the intended lifecycle: get started → discover you need a profile → write it locally in the safe zone → use it → PR it upstream → reviewed and added to the library.

- AGENTS.md: new "Domain profiles" section (3-tier decision table + step-by-step decision flow + minimal template); new task-table row; relaxed the "never touch glimmer/ / never modify schema.md" rules to "never add node/edge TYPES without an RFC"; adding fields is a profile and is encouraged. Local profiles need no RFC.
- docs/extending-the-schema.md: decision tree up top (local profile / curated profile / core RFC) so the heavyweight RFC is correctly scoped to new node/edge types only; "Authoring a profile" lifecycle; welcomes curated-profile PRs.
- glimmer/schema/profiles/README.md: library index (curated table, resolution order, how to use/add a profile).
- validate.py: unknown-domain warnings now give an actionable next step (use a listed profile, or author <rokb>/_glimmer-profiles/<domain>.yaml) instead of a dead end. No fictional CLI referenced; discovery is `glimmer validate`'s Profiles: line.
- roadmap.md: link profiles to v0.4 multi-standard + v0.6 registry (profiles are the unit published); new v0.7 item for CLI <-> glimmer.science (auth, request storage/compute, query/modify the RO), pending the service architecture rewrite.

No core schema mechanics changed; onboarding/discoverability only. Examples still validate at 0 errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
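To make the "actionable next step" idea concrete, here is a rough sketch of how an unknown-domain hint could be assembled. The function name `unknown_domain_hint`, the `loaded` mapping of domain name to tier, and the exact message wording are all assumptions; the real text printed by validate.py is not shown in this PR. Only the two suggested actions (use a listed profile, or author a local profile file) come from the commit message.

```python
from typing import Dict, Optional


def unknown_domain_hint(domain: str, loaded: Dict[str, str]) -> Optional[str]:
    """Return a non-fatal hint for an unknown domain, or None if it is known.

    `loaded` maps each loaded profile name to its tier ("curated" or "local").
    The wording is illustrative, not the validator's actual output.
    """
    if domain in loaded:
        return None
    listed = ", ".join(f"{name} ({tier})"
                       for name, tier in sorted(loaded.items())) or "none"
    return (f"hint: unknown domain '{domain}'. "
            f"Use a loaded profile [{listed}] "
            f"or author <rokb>/_glimmer-profiles/{domain}.yaml")


loaded_profiles = {"neuroimaging": "curated"}
print(unknown_domain_hint("neuroimaging", loaded_profiles))
print(unknown_domain_hint("geology", loaded_profiles))
```

Returning a string instead of raising keeps the unknown-domain case a hint, in line with the rule that it is never a hard error.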
Good catch: the profile system worked mechanically but was invisible to an agent following the repo's own onboarding (which also discouraged touching `glimmer/`).

Profiles are now framed as an encouraged third extension tier, with the lifecycle you described: get started → discover you need a profile → write it locally in the safe zone → use it → PR it upstream → we review and add it to the library.
On the CLI: I did not build a […]

No core schema mechanics changed; onboarding/discoverability only. Examples still validate at 0 errors.
Pre-existing drift: AGENTS.md and cli.py pointed at tools and an example that don't exist in the tree (`build_rokb.py`, `agent.py`, `score.py`, the `training-fsqc` example). An agent following onboarding would hit cryptic "tool not found" crashes or look for missing files.

- cli.py: `build` / `agent` / `score` no longer exec missing scripts; they print an honest "not yet packaged" message with the real path (per-example emit_graph.py for build) or the roadmap pointer (agent/score → v0.5). import-bids / export-rocrate use the same consistent stub helper.
- AGENTS.md: "Building the worked example" now uses the real ds000114-nipype flow (install.sh → workflow.py → emit_graph.py → validate); core-zone tooling list and task table corrected to what ships today vs what's planned; training-fsqc → ds000114-nipype.
- examples/ds000114-nipype/README.md: the optional `agent.py` step is marked planned (roadmap v0.5), not listed as a present file.
- roadmap.md: v0.5 line no longer asserts agent.py already exists; fixed a v0.4/v0.5 typo.

Verified: `glimmer build`/`agent` print clear stubs (exit 2); `glimmer validate` still works; examples validate at 0 errors; no `training-fsqc`/`build_rokb` references remain and all agent.py/score.py mentions read as planned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
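The "honest stub" pattern described above might look roughly like the following. This is a sketch and not the contents of cli.py: the helper name `stub`, the `POINTERS` table, and the message text are hypothetical. The exit code 2 is what the commit message reports for `build` and `agent`; using the same code for `score` is an assumption.

```python
import argparse
import sys

# Exit code reported for `glimmer build` / `glimmer agent` in the commit message.
STUB_EXIT_CODE = 2

# Hypothetical pointer text per unpackaged command.
POINTERS = {
    "build": "Run the per-example emit_graph.py instead.",
    "agent": "Planned for roadmap v0.5.",
    "score": "Planned for roadmap v0.5.",
}


def stub(command: str, pointer: str) -> int:
    """Print a clear 'not yet packaged' message instead of exec'ing a missing script."""
    print(f"glimmer {command}: not yet packaged. {pointer}", file=sys.stderr)
    return STUB_EXIT_CODE


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="glimmer")
    parser.add_argument("command", choices=sorted(POINTERS))
    args = parser.parse_args(argv)
    return stub(args.command, POINTERS[args.command])


# Demonstration: the stub reports its status as a return code.
print(main(["build"]))
```

Routing every unpackaged command through one helper keeps the messages and exit codes consistent, which is the property the commit calls out for import-bids / export-rocrate as well.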
Filed #7 to track the glimmer.science service architecture rewrite that the new roadmap v0.7 item is blocked on. #7 is the umbrella; #6 (storage & archival + multi-tenant self-provisioning of bares) is its storage layer, and the full design record lives in
* schema: fix duplicate concept key, dedup persona/organization, guard against recurrence

  Audit of the merged v0.3.1 schema found `concept`, `persona`, and `organization` each defined twice. yaml.safe_load silently keeps the LAST duplicate, so:

  - `concept` resolved to the minimal meta-graph block: it had LOST its required `statement` field and 7 edges (decomposes-into, extends-concept, subsumed-by, competes-with, superseded-by, supports, contradicts, cited-in). Every concept node the agentic loop / ads-glimmer meta-graph emits with a concept→concept edge would have failed validation. Latent only because the one example concept used `edges: []`.
  - `persona` / `organization`: last-wins kept the richer blocks so content survived, but the dead first blocks misled and persona.leads silently changed target semantics.

  Fixes:

  - Merge each type into a single canonical definition. `concept` regains `statement` + all edges, and folds in the meta-graph optionals (outcome-data, outcome-access, sensitivity) and the `tested-by-experiment` edge.
  - validate.py: load the schema AND sidecar front-matter through a strict YAML loader that errors on duplicate keys, so a shadowed definition can never again pass silently. Verified: concept→concept edges now validate; a concept missing `statement` now errors; examples still validate at 0 errors.
  - Drop a duplicate `tests-hypothesis` in experiment.edges-allowed.
  - Fix version drift: header and _glimmer-index schema example said v0.2.0 → v0.3.1.
  - schema.md: sync concept/persona docs; collapse the redundant bottom section that re-documented (and contradicted) the canonical concept.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* schema: extract domain profiles into a curated + local library

* docs: make domain profiles discoverable + actionable for agents

* fix: remove phantom tool/example references; make CLI stubs honest

* roadmap: add v0.7 storage/platform, reconcile with as-built schema

  - Add v0.7 (storage, durability & multi-tenant platform: the service architecture rewrite, tracked in #7); renumber the hosted CLI to v0.8 so it reads as building on the platform it depends on.
  - Reconcile the meta-graph edge inventory with what actually ships: fix the `leads` signature; add `tested-by-experiment`, the evidence-relation layer (`supports`/`contradicts`/`challenged-by`), `aggregates`, `depends-on-method`, `co-acquired-with`; note core structural edges live in schema.md.
  - Assess skipped planned edges: keep `meta-analyzes` (distinct from `aggregates`), demote `requires-experiment`, drop `produces-data-for` (redundant with shipped `realized-by`).
  - Reframe stale open questions against shipped edges (relationship-typed citation; retraction-only anti-claims).
  - Fix stale "Beyond v0.5" heading -> "Beyond v0.8"; deepen v0.2 and v0.5 framing; flag the v0.3 worked-example / example-instance gap as pending a follow-up PR.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Shady El Damaty <shady@holonym.id>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
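The last-wins hazard behind the duplicate `concept` bug, and the guard against it, can be shown with the standard library alone. Note the swap: this sketch uses Python's `json` module as a stand-in, because it has the same silent last-wins behavior for duplicate keys and needs no third-party install. The actual fix in validate.py is a strict YAML loader, which is not reproduced here. The hook name `reject_duplicate_keys` and the sample document are made up for illustration.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails loudly instead of letting the last key win."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Two definitions of the same type; the second one drops the required field.
doc = '{"concept": {"required": ["statement"]}, "concept": {"required": []}}'

# Default parsing keeps the LAST duplicate, so the requirement vanishes silently.
lenient = json.loads(doc)
print(lenient["concept"])

# Strict parsing turns the shadowed definition into an error.
try:
    json.loads(doc, object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print(exc)
```

The design point is the same in either format: the parser is the only place that still sees both definitions, so the check has to live there and cannot run on the already-merged dictionary.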
Answers "how are we handling profiles?": researchers can add their own, and we keep a curated library that grows over time. Builds on the `dataset` domain-profile from #3 and generalizes it.

## What changes
Profiles are domain bundles, not a dataset-only feature. Every field whose vocabulary is fixed by a domain standard moved out of the core node types into a profile:

| Node type | Fields moved into the profile |
|---|---|
| `dataset` | `subject-id` (req), `session`, `modality`, `scanner`, `bids-version` |
| `experiment` | `design`, `regressors`, `timing-source`, `stimulus-set` |
| `method` | `nipype-node-type` |
| `derivative` | `output-kind` (req) |

The core schema is now domain-neutral (identity + generic provenance + DataLad coords only).
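One way to picture the effect of moving these fields into a profile is a merge of the profile's required fields onto the core requirements, per node type. The sketch below is only an illustration under stated assumptions: `NEUROIMAGING_PROFILE` is a hand-written stand-in for the parsed `neuroimaging.yaml`, the core field names in `CORE_REQUIRED` are invented, and the helper names are hypothetical. Only the `augments.<node-type>.required` shape and the two required fields (`subject-id`, `output-kind`) come from this PR.

```python
# Stand-in for the parsed profile; only the required fields are from the PR.
NEUROIMAGING_PROFILE = {
    "augments": {
        "dataset": {"required": ["subject-id"]},
        "derivative": {"required": ["output-kind"]},
    }
}

# Invented core requirements; the real core field names are not listed here.
CORE_REQUIRED = {"dataset": ["id", "title"], "derivative": ["id"]}


def required_fields(node_type, core_required, profile=None):
    """Core requirements plus the profile's augments.<node_type>.required."""
    merged = list(core_required.get(node_type, []))
    if profile:
        extra = profile.get("augments", {}).get(node_type, {}).get("required", [])
        merged += [field for field in extra if field not in merged]
    return merged


def missing_fields(node, core_required, profile=None):
    """List required fields that the node does not define."""
    needed = required_fields(node["type"], core_required, profile)
    return [field for field in needed if field not in node]


node = {"type": "dataset", "id": "ds000114", "title": "example dataset"}
# Domain-neutral core: nothing is missing.
print(missing_fields(node, CORE_REQUIRED))
# With the neuroimaging profile applied, subject-id becomes required.
print(missing_fields(node, CORE_REQUIRED, NEUROIMAGING_PROFILE))
```

Because the profile only adds to the required list, a KB with no profile applied is checked against the core alone, which matches the description of the core as domain-neutral.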
Two-tier library:

- `glimmer/schema/profiles/<domain>.yaml`: curated, in-repo, versioned. `neuroimaging.yaml` (BIDS) ships now; behavioral/genomics/… get added here over time.
- `<rokb>/_glimmer-profiles/<domain>.yaml`: a researcher's own profile, local to one KB. Add a new domain with zero edits to the core schema or shared library. A local profile shadows a curated one of the same name (validator warns).

Each profile carries `standard` / `version` / `status: curated|community|local`, so it is self-describing and maps onto the schema registry in roadmap v0.6.

Domain resolution (most specific wins): node `domain` field → KB `default-domain` in `_glimmer-index.json` → core schema `default-domain` (`neuroimaging`). The validator merges `augments.<node-type>.required` onto the core requirements; an unknown domain is a non-fatal hint.

## New / changed
- `profiles/_profile.schema.yaml`: profile file format (meta-schema)
- `profiles/neuroimaging.yaml`: the BIDS profile (worked example)
- `frontmatter.yaml`: core types stripped of domain specifics; `domain` is a `_common` optional; top-level `default-domain`; `default-domain` added to index schema
- `validate.py`: `load_profiles()` (curated + local), KB default-domain resolution, `augments` merge, prints loaded profiles + tier
- `schema.md`: new Domain profiles section (tiers, resolution order, how to add one)
- `ds000114` emitter: sets `default-domain: neuroimaging` (+ `glimmer/v0.3.1`)

## Verification
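- Core parses domain-neutral (validator reports `neuroimaging (curated)`).
- Neuroimaging KB, via `default-domain` or core fallback, still enforces `subject-id` + `output-kind`.
- A researcher-added local `geology` profile enforced `sample-id` with no core edits.

🤖 Generated with Claude Code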