[feat] Add `{{mustache}}` rendering (prompt unification WP-B3) by junaway · Pull Request #4393 · Agenta-AI/agenta

junaway · 2026-05-21T10:44:20Z

Summary

Implements WP-B3 of the prompt runtime unification RFC: adds mustache as the fourth prompt template_format and makes it the default for newly created apps, prompt configs, and LLM-as-a-judge evaluators. It builds on the low-level renderer from WP-B1 and the structured renderer from WP-B2.

mustache is real Mustache (via the mystace engine) plus the one Agenta extension every format already carries: tags that start with {{$ are resolved as JSONPath against the render context. Existing curly, fstring, and jinja2 prompts are untouched — old apps keep their declared format, and only new creation paths write mustache.

This is primarily a backend/SDK package. The frontend changes are the minimal type-and-picker surface needed to load, preserve, and select mustache; the larger playground/native-JSON work stays in the frontend follow-up packages (WP-F2/F3).

What's in it

SDK rendering (`sdks/python/agenta/sdk/utils/`)

templating.py — new _render_mustache(...); TemplateMode widened to include "mustache". Rendering follows the same shield-and-substitute model the other formats use: {{$...}} JSONPath tags are shielded from the engine, the rest is rendered by mystace, and the resolved JSONPath values are substituted into the output last, as inert text — never re-parsed. Partials ({{>...}}), empty placeholders, JSON-Pointer tags, NUL bytes, and engine parse errors fail clearly.
types.py — PromptTemplate accepts mustache and keeps its public TemplateFormatError surface for chat/completion callers.
rendering.py — type-widening only; render_messages(...) / render_json_like(...) work unchanged once the mode is accepted.
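
The shield-and-substitute order described above can be illustrated with a minimal, self-contained sketch. It is not the SDK's `_render_mustache`. Here `toy_engine` is a stand-in for mystace that handles only plain `{{name}}` variables, `resolve_path` is a hypothetical resolver that understands only `$.a.b` dotted paths rather than full JSONPath, and the sentinel format is an assumption made for illustration.

```python
import json
import re
import uuid
from typing import Any, Callable

# Tags that start with `{{$` are treated as JSONPath (the Agenta extension).
JSONPATH_TAG = re.compile(r"\{\{\s*(\$[^{}]*?)\s*\}\}")
# Plain variables, handled by the stand-in engine below.
PLAIN_TAG = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def to_text(value: Any) -> str:
    """Strings pass through; other values are serialized as JSON."""
    return value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)


def resolve_path(expr: str, context: Any) -> Any:
    """Hypothetical resolver: supports `name`, `a.b`, `$` and `$.a.b` only."""
    current = context
    for key in (part for part in expr.lstrip("$").split(".") if part):
        current = current[key]
    return current


def toy_engine(template: str, context: dict) -> str:
    """Stand-in for the real Mustache engine (assumption, variables only)."""
    return PLAIN_TAG.sub(lambda m: to_text(resolve_path(m.group(1), context)), template)


def render_with_jsonpath(
    template: str,
    context: dict,
    engine: Callable[[str, dict], str] = toy_engine,
) -> str:
    nonce = uuid.uuid4().hex
    resolved: list = []

    def shield(match: "re.Match") -> str:
        # 1. Shield: resolve the JSONPath tag now, hide it behind a sentinel.
        resolved.append(to_text(resolve_path(match.group(1), context)))
        return f"@@{nonce}:{len(resolved) - 1}@@"

    shielded = JSONPATH_TAG.sub(shield, template)
    # 2. Render: the engine never sees the `{{$...}}` tags or their values.
    rendered = engine(shielded, context)
    # 3. Substitute last: resolved values go in as inert text, in one pass.
    sentinel = re.compile(rf"@@{nonce}:(\d+)@@")
    return sentinel.sub(lambda m: resolved[int(m.group(1))], rendered)


context = {"name": "Ada", "payload": {"note": "{{name}}"}}
print(render_with_jsonpath("Hi {{name}}: {{$.payload.note}}", context))
```

Because the resolved value is inserted after the engine has run, the `{{name}}` inside `payload.note` stays literal in the output instead of being rendered a second time.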

Effect on the other renderers (`curly` / `jinja2` / `fstring`)

Adding mustache was done by extracting one shared {{$...}} JSONPath helper (_render_with_jsonpath) rather than a mustache-only path, so the other formats are touched to varying degrees:

curly — functionally equivalent. Its output is unchanged: it already resolved {{$...}} as inert data, and resolvers.py has zero diff. It is now the reference behavior the other two {{ }} formats match, rather than a special case.
jinja2 — refactored onto the shared helper, behavior preserved. _render_jinja2 no longer renders directly; it routes through _render_with_jsonpath, so {{$...}} is shielded from Jinja, the engine runs, and resolved values are substituted last as inert data ({% raw %} / {# #} spans are skipped and left to Jinja). Same rendered output, now sharing curly's JSONPath contract.
fstring — untouched. Still template.format(**context); no JSONPath, no change.
One error-contract change that spans all formats (but is only newly observable for mustache/jinja2). The TemplateFormatError message for unresolved variables now interpolates the actual template_format instead of the hardcoded literal "curly" (types.py, both the chat/completion and structured paths). For curly the wording is identical to before ("…in curly template…"). What changed is that, after the JSONPath unification, an unresolved {{$...}} tag can now raise UnresolvedVariablesError from mustache and jinja2 too — so the interpolation is what keeps their error message correctly labeled (previously they would have been mislabeled "curly"). fstring never raises this error (it uses str.format, surfacing KeyError), so for fstring the branch is dormant — the change applies to it in principle but is not currently triggerable.
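
As a rough illustration of the labeling change, the sketch below builds the public message from the actual format name. The class names come from this PR, but the constructor shapes, the `to_public_error` helper, and everything in the message after "template" are assumptions, not the code in types.py.

```python
class UnresolvedVariablesError(Exception):
    """Stand-in for the SDK error; the `names` attribute is an assumption."""

    def __init__(self, names):
        super().__init__(f"unresolved: {sorted(names)}")
        self.names = set(names)


class TemplateFormatError(Exception):
    """Stand-in for the public PromptTemplate error surface."""


def to_public_error(template_format: str, exc: UnresolvedVariablesError) -> TemplateFormatError:
    # Before: the label was the hardcoded literal "curly".
    # After: the label is whatever format the prompt actually declares.
    return TemplateFormatError(
        f"Unreplaced variables in {template_format} template: {sorted(exc.names)}"
    )


for fmt in ("curly", "mustache", "jinja2"):
    print(to_public_error(fmt, UnresolvedVariablesError({"$.inputs.topic"})))
```

For curly the label is the same as the old literal, which is why its wording does not change.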

Engine config (`sdks/python/agenta/sdk/engines/running/`)

interfaces.py — the mustache default lands here for all three workflow types:
- llm_v0_interface: the template_format schema scalar widens its enum to ["mustache", "curly", "fstring", "jinja2"] and flips default from curly to mustache (this is what new LLM/completion apps inherit, and the dropdown default).
- chat_v0_interface and completion_v0_interface: built-in default config flips "template_format" from curly to mustache.
handlers.py — auto_ai_critique_v0 learns a v5 default of mustache (v2 → fstring, v3/v4 → curly unchanged). An explicit template_format always wins over the version default; old judge revisions keep their original behavior.
builtin.py — the built-in auto_ai_critique template bumps to version 5 / template_format="mustache".
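
The version-to-default rule for the judge can be sketched as a small lookup. This is an illustration only: `resolve_judge_template_format` is a hypothetical name, and raising on versions outside 2 to 5 is an assumption, since the description does not say how handlers.py treats them.

```python
from typing import Optional

# Per-version defaults as stated in the PR description.
VERSION_DEFAULTS = {
    "2": "fstring",
    "3": "curly",
    "4": "curly",
    "5": "mustache",
}


def resolve_judge_template_format(version: str, template_format: Optional[str] = None) -> str:
    """Return the format a judge revision renders with."""
    if template_format is not None:
        # An explicit template_format always wins over the version default.
        return template_format
    try:
        return VERSION_DEFAULTS[str(version)]
    except KeyError:
        # Assumption: the real fallback for other versions is not described.
        raise ValueError(f"no default template_format for version {version!r}") from None


print(resolve_judge_template_format("5"))            # mustache
print(resolve_judge_template_format("4"))            # curly
print(resolve_judge_template_format("5", "jinja2"))  # jinja2
```

Old revisions keep their behavior under this rule because their stored version still maps to the format they were created with.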

Backend resource (`api/oss/src/resources/evaluators/evaluators.py`)

LLM-as-a-judge evaluator definitions bump to version 5 and carry an explicit hidden template_format: "mustache" field, so newly created judges render with mustache.

Error contract

MustacheTemplateError — unsupported partial, empty placeholder, JSON-Pointer tag, NUL byte, or mystace parse error.
UnresolvedVariablesError — an unresolved curly placeholder or a failed {{$...}} JSONPath tag, in any of mustache / jinja2 / curly (cross-format parity).
TemplateFormatError — the public PromptTemplate surface, preserved.
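
A few of the rejections listed for MustacheTemplateError can be sketched as a pre-check that runs before the engine. `check_mustache_template` and its regexes are hypothetical, not the SDK's implementation. JSON-Pointer tags are left out because the description does not show their syntax, and engine parse errors are not modeled.

```python
import re


class MustacheTemplateError(Exception):
    """Stand-in for the SDK error of the same name."""


PARTIAL_TAG = re.compile(r"\{\{\s*>")    # {{> partial}}
EMPTY_TAG = re.compile(r"\{\{\s*\}\}")   # {{}} or {{   }}


def check_mustache_template(template: str) -> None:
    """Raise MustacheTemplateError for constructs the format rejects."""
    if "\x00" in template:
        raise MustacheTemplateError("NUL bytes are not allowed in a template")
    if PARTIAL_TAG.search(template):
        raise MustacheTemplateError("partials ({{>...}}) are not supported")
    if EMPTY_TAG.search(template):
        raise MustacheTemplateError("empty placeholder {{}}")


for candidate in ("Hello {{name}}", "{{> header}}", "Hello {{ }}"):
    try:
        check_mustache_template(candidate)
        print("ok:", candidate)
    except MustacheTemplateError as exc:
        print("rejected:", candidate, "->", exc)
```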

Frontend (type + picker surface only)

template_format unions widened to include "mustache" across the editor token plugin, chat-message components, prompt schema control, and the shared chat-prompt extractor. mustache shares curly's {{name}} extraction/highlighting path.
New templateFormatOptions.ts: the picker now offers only mustache and jinja2 to new prompts. curly / fstring are legacy — hidden from the picker, but a prompt that already stores one keeps it visible and selectable (no silent coercion). Restores hiding that had regressed; pinned by a unit test.
Shared resolveTemplateFormat(...) is reused in the workflow molecule so mustache is preserved instead of coerced.
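
The picker rule is implemented in TypeScript (templateFormatOptions.ts). The Python sketch below only restates the described behavior for illustration. The function name and the ordering of the returned options are assumptions.

```python
from typing import List, Optional

OFFERED = ("mustache", "jinja2")  # shown to every prompt
LEGACY = ("curly", "fstring")     # hidden unless already stored


def template_format_options(stored: Optional[str] = None) -> List[str]:
    """Options the picker shows for a prompt whose stored format is `stored`."""
    options = list(OFFERED)
    if stored in LEGACY:
        # No silent coercion: a stored legacy format stays visible and selectable.
        options.append(stored)
    return options


print(template_format_options())           # ['mustache', 'jinja2']
print(template_format_options("curly"))    # ['mustache', 'jinja2', 'curly']
print(template_format_options("jinja2"))   # ['mustache', 'jinja2']
```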

Docs

rfc.md — dependency choice (mystace vs chevron, with langchain_core considered and rejected), the three intentional Mustache deviations, the JSONPath compatibility requirement, and the security note (narrow context, never-re-parse).
_mustache-templates.mdx — draft how-to (variables, sections, {{$...}}, value coercion, what's unsupported, and escaping literal {{ }}).
escape-analysis.md — standalone analysis of the escape question raised in review: no backslash escape exists in mystace or langchain_core/chevron; the canonical literal-brace mechanism is the Mustache delimiter swap (and {% raw %} for jinja2). Decision: document now, defer a backslash escape unless real demand appears for literal {{ in curly.
findings.md, research.md, plan.md, qa.md, status.md, README.md — design workspace and review-findings record.

Compatibility

Existing apps remain on their declared format. curly / fstring / jinja2 behavior is unchanged.
Only new creation paths write mustache. Old judge revisions keep their per-version default.
Frontend never coerces a stored legacy format; it stays selectable for prompts that already use it.

Validation

cd sdks/python && uv run ruff format + uv run ruff check — clean.
cd sdks/python && uv run pytest oss/tests/pytest/unit -q — green (mustache coverage across JSONPath resolution, sections, value coercion, partial/empty/JSON-Pointer/NUL rejection, cross-format {{$...}} parity, PromptTemplate, and LLM-as-a-judge).
pnpm --filter @agenta/entity-ui test — picker and mustache-extraction regression pins pass.
pnpm lint-fix + types:check on the touched web packages — clean.

vercel · 2026-05-21T10:44:28Z

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	May 29, 2026 2:07pm

coderabbitai · 2026-05-21T10:44:44Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Introduced mustache as the new default template format for newly created apps and prompt configurations.
- Added JSONPath extension support via {{$...}} tags for advanced variable resolution in templates.
Documentation
- Added comprehensive design documentation, guides, and QA plans for mustache template usage and rendering behavior.
- Updated evaluator configuration defaults to support the new template format.

Walkthrough

This PR introduces comprehensive RFC and implementation documentation for the prompt-runtime unification effort, including the overarching problem space (RFC), the WP-B3 mustache rendering workstream specification, research foundation, multi-phase implementation plan, QA strategy, and status tracking, plus updates to the main documentation to cross-link and clarify the rollout behavior.

Changes

Prompt Runtime Unification and Mustache Rendering

Layer / File(s)	Summary
Prompt Runtime Unification RFC - Problem and Solution Requirements `docs/design/prompt-runtime-unification/rfc.md`	Defines the overarching RFC for unifying prompt variables, JSON handling, template formats, and runtime behavior across completion, chat, and judge services, including variable availability matrix, solution requirements, work packages, test plan, and future directions.
Top-level README updates `docs/design/prompt-runtime-unification/README.md`	Updates runtime-gaps, examples, mustache semantics (`{{$...}}` shield-and-substitute), WP‑B3 alignment, WP‑F2 frontend notes, WP‑D1 examples, rollout/test-plan language, and adds implementation-tracking for the WP‑B3 subfolder.
WP-B3 Mustache Rendering - Workstream README & Spec `docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/README.md`, `docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md`	Adds WP‑B3 overview and technical RFC specifying `mustache` semantics (JSONPath `{{$...}}` pre-resolve and inert substitution, partials rejected, coercion rules), defaulting behavior for new apps, and frontend selector visibility constraints.
WP-B3 Implementation plan, QA, research, and findings `docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md`, `.../qa.md`, `.../research.md`, `.../findings.md`, `.../status.md`, `.../summary.md`, `.../web-handoff.md`, `.../escape-analysis.md`	Adds multi-phase plan, QA test matrix and commands, Mustache engine research and chosen engine notes, verification findings, status log and decisions, rollout summary, web handoff guidance, and escape-analysis decision (no backslash escape).
Mustache how‑to and reference docs `docs/docs/prompt-engineering/integrating-prompts/_mustache-templates.mdx`	New MDX how-to covering Mustache variable usage, `{{$...}}` JSONPath extension, coercion rules, unsupported features, escaping mechanisms, and format selection guidance.
Evaluator presets update `api/oss/src/resources/evaluators/evaluators.py`	Bumps `auto_ai_critique` presets from version "4" to "5", adds `template_format: "mustache"` in presets, and adds hidden `template_format` default `"mustache"` plus hidden `version` `"5"` in evaluator settings_template.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the primary change: adding mustache template rendering as WP-B3 of the prompt unification effort.
Description check	✅ Passed	The description comprehensively relates to the changeset, detailing SDK rendering changes, engine config updates, backend resources, error contracts, frontend modifications, documentation, and validation across all affected areas.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.

Copilot

Pull request overview

Adds the WP-B3 design workspace and updates the prompt-runtime-unification documentation set to describe introducing a mustache template mode (nested-only dotted lookup + brace escaping) as the default for newly created apps/prompt configs, while preserving legacy curly behavior for existing configs.

Changes:

Added a new WP-B3 documentation workspace (RFC, research notes, plan, QA plan, status tracking).
Added/updated the parent prompt-runtime-unification RFC to include mustache semantics and rollout sequencing.
Updated the prompt-runtime-unification README index to reference the new WP-B3 workspace and clarify new-app default behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

File	Description
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/README.md	Entry point for the WP-B3 workspace and its scope/files.
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/status.md	Tracks current decisions, open questions, next steps, and validation commands.
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md	Defines proposed `mustache` subset semantics, escaping, compatibility, and dependency evaluation plan.
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md	Maps current runtime touchpoints and evaluates Mustache library options.
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md	Phased implementation plan for adding `mustache` support and defaults.
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/qa.md	Test strategy covering renderer behavior, compatibility, and call-site adoption.
docs/design/prompt-runtime-unification/rfc.md	Parent RFC describing the broader runtime/frontend unification effort and template-format semantics.
docs/design/prompt-runtime-unification/README.md	Index updates to reference WP-B3 and refine wording around defaults/curly visibility.

Comments suppressed due to low confidence (6)

docs/design/prompt-runtime-unification/rfc.md:46

The referenced implementation paths here look outdated for this repo checkout: PromptTemplate lives in sdks/python/agenta/sdk/utils/types.py, and the judge helper _format_with_template is in sdks/python/agenta/sdk/engines/running/handlers.py (there is no api/sdk/agenta/... path). Updating these links would keep the RFC actionable for readers.

* Config lives under `parameters.prompt`: `messages`, `template_format`, `input_keys`, and `llm_config`.
* Rendering goes through `PromptTemplate.format(**inputs)` in `api/sdk/agenta/sdk/types.py`, which supports `curly`, `fstring`, and `jinja2`.
* Completion exposes top-level `inputs` keys as variables. Chat exposes the same keys except `messages`, which is appended as typed messages after rendering (not exposed as a template variable).

**LLM-as-a-judge** is close in behavior but uses a separate runtime path.

* Config is a flat evaluator shape: `prompt_template`, `model`, `response_type`, `json_schema`, `correct_answer_key`, `threshold`, `version`, optional `template_format`.
* Renders messages through `_format_with_template` in `api/sdk/agenta/sdk/workflows/handlers.py`. It supports the same three formats as `PromptTemplate.format`; the default depends on evaluator `version` — `fstring` for v2, `curly` for v3+.

docs/design/prompt-runtime-unification/rfc.md:58

This “Current State” bullet about rendering behavior is no longer accurate in the current code: the judge path renders json_schema via render_json_like(...) and _format_with_template doesn’t ‘return original content with a warning’ for supported formats. Please update these bullets to match the current runtime behavior so the RFC doesn’t contradict the implementation.

* **Provider/model resolution.** Chat and completion use workflow provider settings; the judge manually extracts a fixed provider-key set and therefore cannot reliably use custom or self-hosted models configured in the UI.
* **Rendering.** Each service has different rendering behavior:
  * `PromptTemplate.format` raises on Jinja errors; `_format_with_template` returns the original content with a warning.
  * Chat and completion recursively render `llm_config.response_format`. The judge builds `response_format` from `response_type` / `json_schema` and does not render variables inside `json_schema`.

docs/design/prompt-runtime-unification/rfc.md:93

Markdown formatting issue: there are extra ** characters at the end of this bold sentence, which will render incorrectly. It should likely be a single bold span followed by a period.

The basic rule should be: **native JSON stays native until template rendering****.**

docs/design/prompt-runtime-unification/rfc.md:162

Several typos/grammar issues in this section reduce clarity: “All services (chat, completion, chat)” repeats chat; “providers settings” should be “provider settings”; and “The should all support …” is missing a subject (likely “They”).

* All services (chat, completion, chat) should resolve providers settings using the same path. As such:
  * The should all support custom/self-hosted models configured in the UI

docs/design/prompt-runtime-unification/rfc.md:163

This bullet has mismatched parentheses and is missing a closing parenthesis after “explicitly set”, which makes it hard to read. Consider rephrasing/splitting the sentence to avoid nested parentheses.

* LLM-as-a-judge must not inject unsupported optional parameters such as `temperature` (the default should be None unless explicitly set (just like we currently do in chat/completion).

docs/design/prompt-runtime-unification/rfc.md:176

Minor punctuation/spacing issues: there’s an extra space before the semicolon in “acceptable ;” and the sentence ends with “welcome..”. Cleaning this up will improve readability.

* The variables panel (right side of the playground) shows:
  * variables discovered from the prompt
  * variables available from the current testcase or trace context, labeled with source and type
* The prompt editor provides autocomplete for available variables. A degraded solution with only top-level autocomplete is definitely acceptable ; a full solution with full nested autocompletion is surely welcome..

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (6)

docs/design/prompt-runtime-unification/rfc.md (2)

199-290: ⚡ Quick win

Clarify status of "JP's notes" sections.

The document contains four "JP's notes" sections (at lines 199-215, 225-232, 245-253, 276-290) that appear to be implementation details or developer notes. These inline notes may not belong in the formal RFC, or should be clearly marked as non-normative implementation guidance to distinguish them from requirements.

Consider either:

Moving these notes to a separate implementation guide or the WP-B3 planning documents

Clearly marking them as "Non-normative implementation notes" if they should remain

Removing them if they've served their purpose during draft review

144-145: ⚡ Quick win

Strengthen caveat about partial Mustache implementation.

The RFC mentions that mustache "do not implement the full mustache spec (no sections or partials)" but this critical limitation is embedded in a long sentence. Given that the WP-B3 RFC (lines 31-48) explicitly lists all unsupported features, users might be surprised if they expect standard Mustache behavior.

Consider adding a prominent callout or note immediately after introducing the mustache format to highlight that this is "Mustache-compatible variable substitution" rather than full Mustache. This aligns with the WP-B3 RFC's "Agenta's Mustache-compatible variable substitution mode, not full Mustache" language.

docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md (1)

57-58: 💤 Low value

Clarify why chevron is "too old" for SDK dependency.

Line 57 states "Do not use noahmorrison/chevron directly. It is too old for a new SDK runtime dependency." While the directive is clear, explaining why it's too old (unmaintained? incompatible? security issues?) would help future maintainers understand the decision.

Consider adding a brief explanation, such as:
Do not use `noahmorrison/chevron` directly. The package is unmaintained (last release 2016) and not suitable for a new SDK runtime dependency.

docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md (1)

79-114: ⚡ Quick win

Consider flagging the dependency decision more prominently.

The evaluation recommends langchain_core.utils.mustache (line 98), but the dependency note (line 114) is somewhat buried. Since adding langchain-core to the SDK is a significant architectural decision, consider:

Adding a decision box or callout at the top of the "Library Evaluation" section.

Specifying concrete acceptance criteria for the Phase 2 spike (package size threshold, import time threshold, transitive dependency count).

Defining a clear fallback plan if langchain-core is rejected (e.g., "implement local tokenizer using the reference patterns from langchain-core source").

This ensures reviewers and implementers understand the dependency is conditional and has an escape hatch.

docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md (1)

16-27: ⚡ Quick win

Consider whether Phase 2 timing is optimal.

Phase 2 (library spike) is scheduled after Phase 1 (resolver implementation). This ordering could lead to rework if the langchain_core evaluation reveals that its tokenizer or renderer has incompatible behavior that affects resolver design.

Two options:

Move Phase 2 before Phase 1 - Evaluate the library first, then design resolvers around the chosen tokenizer.

Keep current order - Design resolvers independently, then adapt the library integration to match Agenta's resolver contract (this is what the research doc recommends at line 98-106).

The current order is defensible if the resolver semantics are non-negotiable product requirements (which they appear to be). If so, consider adding a note in Phase 2 explicitly stating: "Resolver semantics from Phase 1 are fixed requirements; library integration must adapt to them, not vice versa."

docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/status.md (1)

50-57: ⚡ Quick win

Consider adding tentative recommendations for open questions.

The open questions are well-identified, but some could benefit from tentative recommendations to guide Phase 2 evaluation:

Line 53 (langchain-core acceptability): Add tentative threshold, e.g., "Proceed if package adds <5MB and <10 transitive deps; otherwise implement local tokenizer."

Line 54 (unsupported constructs): Add tentative direction, e.g., "Preferred: raise explicit UnsupportedConstructError with helpful message pointing to jinja2 for advanced logic."

Line 55 ({{.}} vs {{$}}): The note already recommends {{$}} and keeping {{.}} invalid - consider promoting this to the Decisions section if it's settled.

This would make Phase 2 (library spike) more deterministic without requiring another design review cycle.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e1e69858-cde0-4168-a9c4-8f86f6b690c5

📥 Commits

Reviewing files that changed from the base of the PR and between 5eef689 and 78ac5a4.

📒 Files selected for processing (8)

docs/design/prompt-runtime-unification/README.md
docs/design/prompt-runtime-unification/rfc.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/README.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/qa.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/status.md

Copilot

Pull request overview

Copilot reviewed 38 out of 41 changed files in this pull request and generated 3 comments.

Files not reviewed (1)

web/pnpm-lock.yaml: Language not supported

Addresses three PR #4393 review findings (WPB3-018/019/020):

- chatPrompts.ts: extractVariablesFromText missed mustache/curly/jinja2 tags with inner whitespace ({{ name }}). Mustache treats {{ name }} and {{name}} as equivalent, so the {{ }} patterns now allow optional spaces.
- TokenPlugin.tsx: the default-branch comment overstated coverage by claiming an "fstring fallback"; the {{ }} regexes do not match fstring's {...} placeholders. Comment corrected to state reality.
- types.py: PromptTemplate.template_format defaults to `curly`, but the field description called mustache the default. Reworded so the model default (curly, legacy compat) is distinct from the mustache default that app-creation flows/interfaces set explicitly.

Tests: whitespace token-extraction cases added to chatPromptsMustache.test.ts (via the public extractPromptTemplateContext). entity-ui vitest 13 passed; @agenta/shared + @agenta/ui types:check clean; entity-ui lint clean; ruff format + check clean on types.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
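
The whitespace fix lives in TypeScript (chatPrompts.ts). As an illustration only, the Python sketch below contrasts a strict `{{name}}` pattern with one that allows optional inner spaces. Both patterns and the `extract_variables` helper are assumptions, not a port of extractVariablesFromText.

```python
import re
from typing import List

# Strict: no whitespace allowed between the braces and the name.
STRICT_TAG = re.compile(r"\{\{([A-Za-z_][\w.]*)\}\}")
# Tolerant: optional whitespace on either side, so {{ name }} == {{name}}.
TOLERANT_TAG = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def extract_variables(text: str, pattern: "re.Pattern" = TOLERANT_TAG) -> List[str]:
    """Return variable names in first-seen order, without duplicates."""
    seen: List[str] = []
    for name in pattern.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


prompt = "Hi {{ name }}, welcome back {{name}} from {{city}}"
print(extract_variables(prompt, STRICT_TAG))  # ['name', 'city']
print(extract_variables("Hi {{ name }}", STRICT_TAG))  # []
print(extract_variables("Hi {{ name }}"))  # ['name']
```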

Copilot

Pull request overview

Copilot reviewed 38 out of 44 changed files in this pull request and generated 2 comments.

Files not reviewed (1)

web/pnpm-lock.yaml: Language not supported

github-actions · 2026-05-22T10:44:09Z

Railway Preview Environment


Preview URL	https://gateway-production-fa33.up.railway.app/w
Project	`agenta-oss-pr-4393`
Image tag	`pr-4393-048fb5d`
Status	Deployed
Railway logs	Open logs
Workflow logs	View workflow run
Updated at 2026-05-26T23:49:13.717Z

Addresses two PR #4393 review findings (WPB3-021/022):

- _mustache-templates.mdx: the value-coercion table claimed dict/list render as compact JSON with "no extra whitespace" (e.g. {"x": 1}). The renderer uses json.dumps(ensure_ascii=False) with default separators, so the real output is {"x": 1, "y": 2} (spaces after : and ,). Reworded the row to match; renderer unchanged (curly-matching behavior is intended).
- Parent RFC + README: the {{$...}} description still used the superseded "pre-rendered as JSONPath ... then the resulting template is rendered" framing, implying JSONPath results are fed back through the engine. WPB3-010 fixed only the wp-b3 doc set; this extends the same correction to the parent docs. Reworded every occurrence (rfc.md / README.md) to shield -> render -> substitute-last (inert data, never re-parsed); also tightened wp-b3 rfc.md.

Verification: render-helper + structured-rendering suites 185 passed (covers all four modes incl. jinja2's shared-JSONPath path); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
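
The separator behavior mentioned in the first finding is standard `json.dumps` behavior and can be checked directly. The `compact` variant is shown only for contrast; per the commit message, the renderer uses the default separators.

```python
import json

value = {"x": 1, "y": 2}

# Default separators are (", ", ": "), so spaces follow ',' and ':'.
default_text = json.dumps(value, ensure_ascii=False)
# Compact output requires passing separators explicitly.
compact_text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))

print(default_text)  # {"x": 1, "y": 2}
print(compact_text)  # {"x":1,"y":2}

# ensure_ascii=False keeps non-ASCII characters readable instead of \u escapes.
print(json.dumps({"city": "Zürich"}, ensure_ascii=False))  # {"city": "Zürich"}
```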

Make explicit in summary.md and web-handoff.md that adding mustache touched the other renderers via a shared {{$...}} JSONPath helper:

- curly: functionally equivalent (output unchanged; now the reference behavior).
- jinja2: refactored onto the shared helper, behavior preserved.
- fstring: untouched.
- error-contract change spans all formats but is only newly observable for mustache/jinja2: the "Unreplaced variables in <format> template" message now interpolates the real format instead of the hardcoded "curly". curly wording is identical to before; fstring never raises this error so the branch is dormant for it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 38 out of 43 changed files in this pull request and generated 2 comments.

Files not reviewed (1)

web/pnpm-lock.yaml: Language not supported

+    // Default: {{ }} variable tokens only. Covers "curly" and "mustache" —
+    // mustache shares curly's {{name}} delimiters for plain variables, so it
+    // tokenizes through this path. (fstring also falls through to here, but its
+    // {...} single-brace placeholders are NOT matched by these {{ }} regexes.)
    const full = /\{\{[^{}]*\}\}/
    const input = /\{\{[^{}]*$/
    const exact = /^\{\{[^{}]*\}\}$/
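
To make the corrected comment concrete, the three patterns above can be transcribed into Python's `re` syntax and tried against an fstring-style placeholder. This is an illustrative transcription, not code from the PR. Note that Python's `$` also matches before a trailing newline, which differs slightly from JavaScript.

```python
import re

FULL = re.compile(r"\{\{[^{}]*\}\}")     # a complete {{...}} token anywhere
INPUT = re.compile(r"\{\{[^{}]*$")       # an unfinished {{... at the end of the text
EXACT = re.compile(r"^\{\{[^{}]*\}\}$")  # the whole text is one {{...}} token

print(bool(FULL.search("Hello {{name}}")))  # True
print(bool(INPUT.search("Hello {{na")))     # True
print(bool(EXACT.search("{{name}}")))       # True

# fstring placeholders use single braces, so none of the patterns match them.
print(bool(FULL.search("Hello {name}")))    # False
print(bool(EXACT.search("{name}")))         # False
```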


+        except Exception as exc:
+            raise MustacheTemplateError(
+                f"Mustache template error in content: '{template}'. Error: {exc}"
+            ) from exc


coderabbitai

Actionable comments posted: 3

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7dad6fd7-de96-4b14-b946-c884619d0c92

📥 Commits

Reviewing files that changed from the base of the PR and between 78ac5a4 and f6de9b5.

⛔ Files ignored due to path filters (1)

api/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (14)

api/oss/src/resources/evaluators/evaluators.py
docs/design/prompt-runtime-unification/README.md
docs/design/prompt-runtime-unification/rfc.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/README.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/escape-analysis.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/findings.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/plan.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/qa.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/rfc.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/status.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/summary.md
docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/web-handoff.md
docs/docs/prompt-engineering/integrating-prompts/_mustache-templates.mdx

✅ Files skipped from review due to trivial changes (2)

docs/design/prompt-runtime-unification/wp-b3-mustache-rendering/research.md
docs/docs/prompt-engineering/integrating-prompts/_mustache-templates.mdx

coderabbitai · 2026-05-29T14:10:37Z

+Update (2026-05-22): all findings (WPB3-001..017) are fixed and Closed; there are no open findings, and all PR #4393 review threads are resolved. WPB3-014 (escape behavior) was closed via Option 3 — document the per-format reality now (delimiter swap / `{% raw %}` / none for curly; no backslash escape), defer a `\{{` escape pending real demand; full evidence and the tested `\{{` vs `\{\{` result are in `escape-analysis.md`, and the user-facing how-to gained an "Escaping" section. WPB3-017 (frontend `extractTemplateVariables` JSDoc omitted mustache) was fixed. 270 across the four focused suites pass; ruff clean. GitHub: 14 solved-by-content threads resolved (the WPB3-015 RFC cluster `3280747520`/`3280759652`/`3280767036`/`3280770190`/`3280772919`/`3280776786`/`3280781711`/`3280782719`/`3280579226`, the WPB3-016 how-to `3280800719`, and the scope/PR-title threads `3280751193`/`3280761180`/`3280794168`/`3281567723`); the only 3 left unresolved are the escape threads (`3280753760`, `3280788530`, `3280579221`) mapped to the open WPB3-014.
+
+Finding lineage: WPB3-001..004 from the first scan; WPB3-005..007 from the 2026-05-22 re-scan; WPB3-008..011 (doc-only prose fixes — docstring/qa/pre-render-framing/`+++` markers); WPB3-012..013 (P2 cross-format error-contract bugs, fixed with tests); WPB3-014 (escape, OPEN); WPB3-015 (RFC library/deviation/requirement/security consolidation, `langchain_core` recorded as considered-and-rejected); WPB3-016 (draft `_mustache-templates.mdx` how-to). WPB3-008..016 are all from the PR #4393 sync.
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve contradictory status statements in the findings summary.

This section says all findings are closed and all review threads are resolved, but also says three threads remain unresolved and mapped to WPB3-014. Please align this status block with the final WPB3-014 state in the document.

coderabbitai · 2026-05-29T14:10:38Z

+- `cd sdks/python && uv run ruff format` + `uv run ruff check` — clean.
+- `cd sdks/python && uv run pytest oss/tests/pytest/unit -q` — green (mustache coverage across JSONPath resolution, sections, value coercion, partial/empty/JSON-Pointer/NUL rejection, cross-format `{{$...}}` parity, `PromptTemplate`, and LLM-as-a-judge).


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the documented lint command with --fix for SDK/API change flows.

The validation section says uv run ruff check but the repo guideline requires uv run ruff check --fix after formatting. Please align the summary command to avoid drift between docs and expected workflow.

Suggested doc fix

-- `cd sdks/python && uv run ruff format` + `uv run ruff check` — clean.
+- `cd sdks/python && uv run ruff format` + `uv run ruff check --fix` — clean.

As per coding guidelines: "Run ruff format and ruff check --fix within the SDK or API folder after making changes to the API or SDK (or from repo root: ruff format then ruff check)".

coderabbitai · 2026-05-29T14:10:38Z

+| Default for new chat / completion / LLM apps | `curly` | `mustache` (all three `*_v0` interfaces) |
+| Picker options offered | curly, fstring, jinja2 | **mustache, jinja2** (legacy formats hidden unless already selected) |
+| LLM-as-a-judge evaluator | version `4`, no explicit format | version `5`, `template_format: "mustache"` |
+| Engine (Python) | `mystace>=1,<2` added |  |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the Engine (Python) before/after table row.

The row currently places mystace>=1,<2 under Before and leaves After blank, which reads as the opposite of the intended change.

initial analysis

78ac5a4

Copilot AI review requested due to automatic review settings May 21, 2026 10:44

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. feature labels May 21, 2026

vercel Bot deployed to Preview May 21, 2026 10:44 View deployment

Copilot started reviewing on behalf of junaway May 21, 2026 10:44 View session

Copilot AI reviewed May 21, 2026

Comment thread docs/design/prompt-runtime-unification/rfc.md Outdated

coderabbitai Bot reviewed May 21, 2026
