chore: plan 427, PR 2 of agent-first development plan by nabinchha · Pull Request #478 · NVIDIA-NeMo/DataDesigner

nabinchha · 2026-03-30T16:32:14Z

📋 Summary

Implements PR 2 of the agent-first development plan (#427), covering Phase 3 (GitHub machinery) and architecture doc population. PR 1 (#454) restructured the top-level documentation and created the architecture stubs; this PR fills them with content and adds the GitHub templates that make the agent-assisted contribution workflow concrete.

🔗 Related Issue

PR-2 for #427

🔄 Changes

Architecture Documentation (10 files populated)

All architecture/ stubs from PR 1 are now populated with content covering overview, key components, data flow, design decisions, and cross-references:

overview.md — System architecture: three-package layout, PEP 420 namespace packages, end-to-end data flow, dual execution engines
config.md — Config layer: builder API, column configs, discriminated unions, model configs, plugin injection, lazy imports
engine.md — Engine layer: compilation pipeline, registry system, column generator hierarchy, ResourceProvider
models.md — Model subsystem: facade pattern, AIMD throttling, retry transport, usage tracking, MCP tool loops
mcp.md — MCP subsystem: MCPIOService, session pooling, tool schema coalescing, turn limits
dataset-builders.md — Dataset builders: sequential/async execution, ExecutionGraph, CompletionTracker, DAG, DatasetBatchManager
sampling.md — Sampling: DatasetGenerator, constraint system, person/entity generation, managed datasets
cli.md — CLI: lazy command loading, controller/service/repo pattern, generation commands
agent-introspection.md — Agent introspection: FamilySpec, type discovery, state commands, error handling
plugins.md — Plugin system: entry-point discovery, PluginRegistry, union injection, custom columns comparison

GitHub Templates (Phase 3)

.github/PULL_REQUEST_TEMPLATE.md — New PR template: Summary, Related Issue, Changes, Testing checklist, Checklist
.github/ISSUE_TEMPLATE/bug-report.yml — Added Agent Diagnostic field and investigation checklist
.github/ISSUE_TEMPLATE/feature-request.yml — Added Agent Investigation field and review checklist
.github/ISSUE_TEMPLATE/development-task.yml — Added Investigation/Context and Agent Plan fields
.github/ISSUE_TEMPLATE/config.yml — Updated Discussions link copy to mention agent-assisted workflow

Skill Template Conformance

.agents/skills/create-pr/SKILL.md — Updated to produce PR descriptions matching the new PR template structure (Summary, Related Issue, Changes, Testing, Checklist)

Not Yet Addressed (from plan step list)

CODEOWNERS update (step 9) — Plan calls for keeping the existing single-group ownership, so no change needed
Label creation (step 10) — Already created via gh label create after merge, or in a follow-up

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

Architecture docs accuracy — Each doc was written from source code analysis. Please verify the descriptions match current behavior, especially for dataset-builders.md (async engine) and models.md (AIMD throttling).
Issue template fields — The new agent diagnostic/investigation fields are optional (not required). Verify this matches the intended contributor experience.

🧪 Testing

N/A — documentation and template changes only, no testable logic

✅ Checklist

Follows commit message conventions
Commits are signed off (DCO)
Architecture docs updated (this is the PR that populates them)

greptile-apps · 2026-03-30T16:36:45Z

Greptile Summary

This PR populates 10 previously-stub architecture documentation files and adds GitHub issue/PR templates plus a skill update to support the agent-assisted contribution workflow (Phase 3 of plan #427). The documentation was written from source code analysis and is generally accurate — cross-references between docs are internally consistent and the descriptions of the plugin system, MCP subsystem, CLI layering, dataset builders, and agent introspection subsystem all match the source.

Key changes:

Architecture docs (10 files): All stubs from PR docs: restructure agent and contributor documentation (plan 427, PR 1) #454 are now fully populated with overview, key components, data flow, design decisions, and cross-references.
GitHub templates: New PULL_REQUEST_TEMPLATE.md; updated issue templates to add optional agent diagnostic/investigation fields and lightweight checklists.
create-pr skill: Updated to align generated PR descriptions with the new template structure.
One factual inaccuracy found in architecture/models.md: the Design Decisions section states "429s excluded from transport retries" without qualification, but this only applies to the async engine path. The sync engine uses strip_rate_limit_codes=False, meaning 429s are retried transparently at the transport layer.

Confidence Score: 4/5

Safe to merge after correcting the sync-vs-async 429 retry claim in models.md — all other content is accurate and well-structured.

One P1 factual inaccuracy in architecture/models.md: the Design Decisions section claims 429s are globally excluded from transport retries, but the sync engine path (strip_rate_limit_codes=False in http_model_client.py line 107) retries 429s at the transport layer. This misrepresents how the sync engine handles rate limits and could mislead contributors. All other docs, templates, and skill changes are accurate and well-aligned with source code.

architecture/models.md — Design Decisions section needs the 429 retry claim qualified for sync vs. async engine paths.

Important Files Changed

Filename	Overview
architecture/models.md	Comprehensive model subsystem doc; one factual inaccuracy in the Design Decisions section — the '429s excluded from transport retries' claim is only true for the async engine path, not the sync path.
architecture/dataset-builders.md	Accurate and detailed description of sequential and async dataset builder execution, DAG, CompletionTracker, and DatasetBatchManager, consistent with source code.
architecture/overview.md	Clear high-level architecture overview with accurate package layout, dependency direction, and end-to-end data flow description.
.github/PULL_REQUEST_TEMPLATE.md	New PR template with Summary, Related Issue, Changes, Testing checklist, and Checklist sections — well-structured and aligned with the updated SKILL.md.
.github/ISSUE_TEMPLATE/bug-report.yml	Adds optional Agent Diagnostic textarea and a checklist (two required, one optional) for structured bug reports.
.agents/skills/create-pr/SKILL.md	Updated to produce PR descriptions conforming to the new PR template structure (Summary, Related Issue, Changes, Testing, Checklist), with clearer section guidelines.

Sequence Diagram

sequenceDiagram
    participant G as ColumnGenerator
    participant MF as ModelFacade
    participant TM as ThrottledModelClient
    participant RT as RetryTransport
    participant API as LLM API

    G->>MF: completion(messages)
    MF->>TM: chat(messages)
    TM->>TM: acquire_sync/async (AIMD)
    TM->>RT: POST /chat/completions
    alt Transient error (502/503/504)
        RT-->>RT: retry (transport layer)
    end
    alt Rate limit 429 — async engine (strip_rate_limit_codes=True)
        RT-->>TM: 429 surfaces immediately
        TM->>TM: release_rate_limited (AIMD decrease)
        TM->>TM: cooldown + retry
    end
    alt Rate limit 429 — sync engine (strip_rate_limit_codes=False)
        RT-->>RT: 429 retried at transport layer
    end
    API-->>RT: 200 OK
    RT-->>TM: response
    TM->>TM: release_success (AIMD increase)
    TM-->>MF: ChatCompletion
    opt MCP tool calls in response
        MF->>MF: MCPFacade.process_completion_response
        MF->>API: tool results → next completion
    end
    MF-->>G: parsed result

Comments Outside Diff (1)

architecture/models.md, line 73 (link)

Inaccurate claim: 429s are not excluded from transport retries on the sync path

The Design Decisions section says "429s excluded from transport retries" without qualification, but the actual behavior is engine-mode dependent:
- Async engine (strip_rate_limit_codes=True): 429s are stripped from the retryable status set and surface to the ThrottledModelClient AIMD feedback loop. ✓
- Sync engine (strip_rate_limit_codes=False): 429s are retried at the transport layer transparently; there is no AIMD salvage path for the sync engine.
This split is confirmed in packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/http_model_client.py (sync path at line 107, async path at line 125) and documented in the RetryConfig docstring in retry.py.

The Key Components section above already gets this right by saying "excluded ... when strip_rate_limit_codes=True". The Design Decisions entry should match:

Prompt To Fix All With AI

This is a comment left during a code review.
Path: architecture/models.md
Line: 73

Comment:
**Inaccurate claim: 429s are not excluded from transport retries on the sync path**

The Design Decisions section says "429s excluded from transport retries" without qualification, but the actual behavior is engine-mode dependent:

- **Async engine** (`strip_rate_limit_codes=True`): 429s are stripped from the retryable status set and surface to the `ThrottledModelClient` AIMD feedback loop. ✓
- **Sync engine** (`strip_rate_limit_codes=False`): 429s **are** retried at the transport layer transparently; there is no AIMD salvage path for the sync engine.

This split is confirmed in `packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/http_model_client.py` (sync path at line 107, async path at line 125) and documented in the `RetryConfig` docstring in `retry.py`.

The Key Components section above already gets this right by saying "excluded ... when `strip_rate_limit_codes=True`". The Design Decisions entry should match:

```suggestion
- **429s excluded from transport retries in async mode** so rate-limit signals reach the throttle manager for AIMD backoff. In the sync engine, 429s are handled directly by the transport retry layer (no salvage queue).
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (2): Last reviewed commit: "Merge branch 'main' into nmulepati/docs/..." | Re-trigger Greptile}

nabinchha added 4 commits March 25, 2026 13:17

save progress

002750f

undo review-code skill change

afdf713

delete status file

c516b40

Merge branch 'main' into nmulepati/docs/427-agent-first-dev-pr-2

945f1a7

nabinchha requested a review from a team as a code owner March 30, 2026 16:32

nabinchha changed the title ~~docs: populate architecture docs, add GitHub templates, update skills (plan 427, PR 2)~~ docs: plan 427, PR 2 of agent-first development plan Mar 30, 2026

nabinchha changed the title ~~docs: plan 427, PR 2 of agent-first development plan~~ chore: plan 427, PR 2 of agent-first development plan Mar 31, 2026

Merge branch 'main' into nmulepati/docs/427-agent-first-dev-pr-2

931022e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: plan 427, PR 2 of agent-first development plan#478

chore: plan 427, PR 2 of agent-first development plan#478
nabinchha wants to merge 5 commits intomainfrom
nmulepati/docs/427-agent-first-dev-pr-2

nabinchha commented Mar 30, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 30, 2026 •

edited

Loading

Confidence Score: 4/5

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nabinchha commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📋 Summary

🔗 Related Issue

🔄 Changes

Architecture Documentation (10 files populated)

GitHub Templates (Phase 3)

Skill Template Conformance

Not Yet Addressed (from plan step list)

🔍 Attention Areas

🧪 Testing

✅ Checklist

Uh oh!

greptile-apps bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nabinchha commented Mar 30, 2026 •

edited

Loading

greptile-apps bot commented Mar 30, 2026 •

edited

Loading