Skip to content

chore: plan 427, PR 2 of agent-first development plan#478

Open
nabinchha wants to merge 5 commits intomainfrom
nmulepati/docs/427-agent-first-dev-pr-2
Open

chore: plan 427, PR 2 of agent-first development plan#478
nabinchha wants to merge 5 commits intomainfrom
nmulepati/docs/427-agent-first-dev-pr-2

Conversation

@nabinchha
Copy link
Copy Markdown
Contributor

@nabinchha nabinchha commented Mar 30, 2026

📋 Summary

Implements PR 2 of the agent-first development plan (#427), covering Phase 3 (GitHub machinery) and architecture doc population. PR 1 (#454) restructured the top-level documentation and created the architecture stubs; this PR fills them with content and adds the GitHub templates that make the agent-assisted contribution workflow concrete.

🔗 Related Issue

PR-2 for #427

🔄 Changes

Architecture Documentation (10 files populated)

All architecture/ stubs from PR 1 are now populated with content covering overview, key components, data flow, design decisions, and cross-references:

  • overview.md — System architecture: three-package layout, PEP 420 namespace packages, end-to-end data flow, dual execution engines
  • config.md — Config layer: builder API, column configs, discriminated unions, model configs, plugin injection, lazy imports
  • engine.md — Engine layer: compilation pipeline, registry system, column generator hierarchy, ResourceProvider
  • models.md — Model subsystem: facade pattern, AIMD throttling, retry transport, usage tracking, MCP tool loops
  • mcp.md — MCP subsystem: MCPIOService, session pooling, tool schema coalescing, turn limits
  • dataset-builders.md — Dataset builders: sequential/async execution, ExecutionGraph, CompletionTracker, DAG, DatasetBatchManager
  • sampling.md — Sampling: DatasetGenerator, constraint system, person/entity generation, managed datasets
  • cli.md — CLI: lazy command loading, controller/service/repo pattern, generation commands
  • agent-introspection.md — Agent introspection: FamilySpec, type discovery, state commands, error handling
  • plugins.md — Plugin system: entry-point discovery, PluginRegistry, union injection, custom columns comparison

GitHub Templates (Phase 3)

Skill Template Conformance

  • .agents/skills/create-pr/SKILL.md — Updated to produce PR descriptions matching the new PR template structure (Summary, Related Issue, Changes, Testing, Checklist)

Not Yet Addressed (from plan step list)

  • CODEOWNERS update (step 9) — Plan calls for keeping the existing single-group ownership, so no change needed
  • Label creation (step 10) — Already created via gh label create after merge, or in a follow-up

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • Architecture docs accuracy — Each doc was written from source code analysis. Please verify the descriptions match current behavior, especially for dataset-builders.md (async engine) and models.md (AIMD throttling).
  • Issue template fields — The new agent diagnostic/investigation fields are optional (not required). Verify this matches the intended contributor experience.

🧪 Testing

  • N/A — documentation and template changes only, no testable logic

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (this is the PR that populates them)

@nabinchha nabinchha requested a review from a team as a code owner March 30, 2026 16:32
@nabinchha nabinchha changed the title docs: populate architecture docs, add GitHub templates, update skills (plan 427, PR 2) docs: plan 427, PR 2 of agent-first development plan Mar 30, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR populates 10 previously-stub architecture documentation files and adds GitHub issue/PR templates plus a skill update to support the agent-assisted contribution workflow (Phase 3 of plan #427). The documentation was written from source code analysis and is generally accurate — cross-references between docs are internally consistent and the descriptions of the plugin system, MCP subsystem, CLI layering, dataset builders, and agent introspection subsystem all match the source.

Key changes:

  • Architecture docs (10 files): All stubs from PR docs: restructure agent and contributor documentation (plan 427, PR 1) #454 are now fully populated with overview, key components, data flow, design decisions, and cross-references.
  • GitHub templates: New PULL_REQUEST_TEMPLATE.md; updated issue templates to add optional agent diagnostic/investigation fields and lightweight checklists.
  • create-pr skill: Updated to align generated PR descriptions with the new template structure.
  • One factual inaccuracy found in architecture/models.md: the Design Decisions section states "429s excluded from transport retries" without qualification, but this only applies to the async engine path. The sync engine uses strip_rate_limit_codes=False, meaning 429s are retried transparently at the transport layer.

Confidence Score: 4/5

Safe to merge after correcting the sync-vs-async 429 retry claim in models.md — all other content is accurate and well-structured.

One P1 factual inaccuracy in architecture/models.md: the Design Decisions section claims 429s are globally excluded from transport retries, but the sync engine path (strip_rate_limit_codes=False in http_model_client.py line 107) retries 429s at the transport layer. This misrepresents how the sync engine handles rate limits and could mislead contributors. All other docs, templates, and skill changes are accurate and well-aligned with source code.

architecture/models.md — Design Decisions section needs the 429 retry claim qualified for sync vs. async engine paths.

Important Files Changed

Filename Overview
architecture/models.md Comprehensive model subsystem doc; one factual inaccuracy in the Design Decisions section — the '429s excluded from transport retries' claim is only true for the async engine path, not the sync path.
architecture/dataset-builders.md Accurate and detailed description of sequential and async dataset builder execution, DAG, CompletionTracker, and DatasetBatchManager, consistent with source code.
architecture/overview.md Clear high-level architecture overview with accurate package layout, dependency direction, and end-to-end data flow description.
.github/PULL_REQUEST_TEMPLATE.md New PR template with Summary, Related Issue, Changes, Testing checklist, and Checklist sections — well-structured and aligned with the updated SKILL.md.
.github/ISSUE_TEMPLATE/bug-report.yml Adds optional Agent Diagnostic textarea and a checklist (two required, one optional) for structured bug reports.
.agents/skills/create-pr/SKILL.md Updated to produce PR descriptions conforming to the new PR template structure (Summary, Related Issue, Changes, Testing, Checklist), with clearer section guidelines.

Sequence Diagram

sequenceDiagram
    participant G as ColumnGenerator
    participant MF as ModelFacade
    participant TM as ThrottledModelClient
    participant RT as RetryTransport
    participant API as LLM API

    G->>MF: completion(messages)
    MF->>TM: chat(messages)
    TM->>TM: acquire_sync/async (AIMD)
    TM->>RT: POST /chat/completions
    alt Transient error (502/503/504)
        RT-->>RT: retry (transport layer)
    end
    alt Rate limit 429 — async engine (strip_rate_limit_codes=True)
        RT-->>TM: 429 surfaces immediately
        TM->>TM: release_rate_limited (AIMD decrease)
        TM->>TM: cooldown + retry
    end
    alt Rate limit 429 — sync engine (strip_rate_limit_codes=False)
        RT-->>RT: 429 retried at transport layer
    end
    API-->>RT: 200 OK
    RT-->>TM: response
    TM->>TM: release_success (AIMD increase)
    TM-->>MF: ChatCompletion
    opt MCP tool calls in response
        MF->>MF: MCPFacade.process_completion_response
        MF->>API: tool results → next completion
    end
    MF-->>G: parsed result
Loading

Comments Outside Diff (1)

  1. architecture/models.md, line 73 (link)

    Inaccurate claim: 429s are not excluded from transport retries on the sync path

    The Design Decisions section says "429s excluded from transport retries" without qualification, but the actual behavior is engine-mode dependent:

    • Async engine (strip_rate_limit_codes=True): 429s are stripped from the retryable status set and surface to the ThrottledModelClient AIMD feedback loop. ✓
    • Sync engine (strip_rate_limit_codes=False): 429s are retried at the transport layer transparently; there is no AIMD salvage path for the sync engine.

    This split is confirmed in packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/http_model_client.py (sync path at line 107, async path at line 125) and documented in the RetryConfig docstring in retry.py.

    The Key Components section above already gets this right by saying "excluded ... when strip_rate_limit_codes=True". The Design Decisions entry should match:

Prompt To Fix All With AI
This is a comment left during a code review.
Path: architecture/models.md
Line: 73

Comment:
**Inaccurate claim: 429s are not excluded from transport retries on the sync path**

The Design Decisions section says "429s excluded from transport retries" without qualification, but the actual behavior is engine-mode dependent:

- **Async engine** (`strip_rate_limit_codes=True`): 429s are stripped from the retryable status set and surface to the `ThrottledModelClient` AIMD feedback loop. ✓
- **Sync engine** (`strip_rate_limit_codes=False`): 429s **are** retried at the transport layer transparently; there is no AIMD salvage path for the sync engine.

This split is confirmed in `packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/http_model_client.py` (sync path at line 107, async path at line 125) and documented in the `RetryConfig` docstring in `retry.py`.

The Key Components section above already gets this right by saying "excluded ... when `strip_rate_limit_codes=True`". The Design Decisions entry should match:

```suggestion
- **429s excluded from transport retries in async mode** so rate-limit signals reach the throttle manager for AIMD backoff. In the sync engine, 429s are handled directly by the transport retry layer (no salvage queue).
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "Merge branch 'main' into nmulepati/docs/..." | Re-trigger Greptile

@nabinchha nabinchha changed the title docs: plan 427, PR 2 of agent-first development plan chore: plan 427, PR 2 of agent-first development plan Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant