diff --git a/docs/copilot-cli-integration-research.md b/docs/copilot-cli-integration-research.md
new file mode 100644
index 0000000..55f5909
--- /dev/null
+++ b/docs/copilot-cli-integration-research.md
@@ -0,0 +1,910 @@
+# PromptKit × GitHub Copilot CLI - Native Integration Research
+
+## Problem Statement
+
+PromptKit's current entry point requires users to manually type **"Read and execute bootstrap.md"**
+into an LLM session. The existing CLI (`npx promptkit`) automates this slightly by spawning
+`copilot -i "Read and execute bootstrap.md"`, but the experience still feels like a bolt-on
+addon rather than a native feature of the AI tooling.
+
+**Goal**: Make PromptKit's prompt composition feel like a built-in capability of
+GitHub Copilot CLI - discoverable, invokable with natural commands, and seamless.
+
+> **Implementation status (May 2026 update).** Several strategies in this
+> document have shipped since the original research:
+>
+> | Strategy | Status | Evidence |
+> |---|---|---|
+> | A. Skills (`/promptkit`, `/boot`, `/bootstrap`) | ✅ Shipped | PRs #175, #176, #229, #245; `.github/skills/` |
+> | B. Custom Agents (as PromptKit *output target*) | 🟡 Partial | PR #174 - `agent-instructions` format can generate `.github/agents/*.agent.md`, but PromptKit itself is not yet distributed as agents |
+> | E. Custom Instructions | ✅ Pre-existing | `.github/copilot-instructions.md` + `agent-instructions` format |
+> | C. MCP Server | ❌ Not started | - |
+> | D. Plugin (composite distribution) | ❌ Not started | - |
+> | F. Hooks | ❌ Not started | - |
+> | G. LSP Configs | ❌ Not started | - |
+>
+> The skill work resolved part of the **discoverability + invocation** pain
+> (users can now type `/promptkit` instead of `Read and execute bootstrap.md`),
+> but the **assembly-fidelity** problem (LLM honor-system reading 500KB of
+> Markdown) and the **context-discontinuity across artifacts** problem
+> remain.
+> Strategies C / D / F / G are the forward work this document scopes.
+
+---
+
+## Current Architecture (Baseline)
+
+```
+User → promptkit CLI → spawns copilot -i "Read and execute bootstrap.md"
+                              ↓
+                       Copilot reads bootstrap.md in temp dir
+                              ↓
+                       LLM reads manifest.yaml, asks user, selects components
+                              ↓
+                       LLM reads .md files, assembles prompt verbatim
+                              ↓
+                       Writes assembled prompt to user's project
+```
+
+**Pain points**:
+- Requires a separate `promptkit` CLI installation
+- Spawns a *child* Copilot process (loses existing session context)
+- No integration with Copilot's own discoverability (`/skills`, `/agent`, etc.)
+- User can't naturally say "use PromptKit to investigate this bug" in an existing session
+- No auto-invocation when PromptKit would be relevant
+- **Context discontinuity across PromptKit-generated artifacts**: A user invokes PromptKit
+  to generate an output - say, a custom agent for reviewing their firmware code. They begin
+  working with that agent and, mid-session, discover a memory corruption bug they want to
+  investigate systematically. PromptKit has an `investigate-bug` template with root-cause
+  analysis and memory-safety protocols that would be ideal - but invoking it means leaving
+  the current agent's session, losing its accumulated context and domain expertise, and
+  starting fresh in a separate PromptKit workflow. The generated agent doesn't know how to
+  call back into PromptKit, and PromptKit doesn't know about the agent it previously
+  generated. This creates a disjointed experience where PromptKit's value is siloed into
+  one-shot generation rather than being a continuously available capability within the
+  user's workflow. The ideal experience would let the user say "use PromptKit's bug
+  investigation methodology here" without breaking out of their current session or
+  losing the specialized context they've built up.
+
+---
+
+## Copilot CLI Extension Points (Comprehensive Inventory)
+
+### 1. Custom Instructions
+**Purpose**: Persistent, always-loaded guidance for how Copilot should behave.
+
+**Locations** (all auto-loaded):
+- `.github/copilot-instructions.md` - repository-wide
+- `.github/instructions/**/*.instructions.md` - path-specific (with `applyTo` glob frontmatter)
+- `AGENTS.md`, `CLAUDE.md`, `GEMINI.md` - in repo root and/or CWD
+- `~/.copilot/copilot-instructions.md` - user-level (all projects)
+- `COPILOT_CUSTOM_INSTRUCTIONS_DIRS` env var - additional locations
+
+**Key characteristics**:
+- Always injected into context (no opt-in/opt-out per session)
+- Path-specific instructions only load when Copilot works on matching files
+- Support `excludeAgent` frontmatter to limit to coding-agent or code-review only
+- No slash command invocation - purely implicit
+- Best for: coding standards, repo conventions, communication preferences
+
+### 2. Skills
+**Purpose**: Task-specific, on-demand instructions with optional scripts and resources.
+
+**Locations** (first-found-wins dedup by `name` field):
+1. `/.github/skills/*/SKILL.md` (project)
+2. `/.agents/skills/*/SKILL.md` (project)
+3. `/.claude/skills/*/SKILL.md` (project)
+4. `/.github/skills/` etc. (inherited, monorepo)
+5. `~/.copilot/skills/*/SKILL.md` (personal-copilot)
+6. `~/.agents/skills/*/SKILL.md` (personal-agents)
+7. `~/.claude/skills/*/SKILL.md` (personal-claude)
+8. Plugin `skills/` dirs (plugin)
+9. `COPILOT_SKILLS_DIRS` env + config (custom)
+
+**Key characteristics**:
+- Loaded **on demand** - only when Copilot detects relevance or user invokes explicitly
+- `SKILL.md` has YAML frontmatter: `name` (required), `description` (required), `license`, `allowed-tools`
+- Entire skill directory is discoverable by Copilot (scripts, examples, data files)
+- `allowed-tools` in frontmatter can pre-approve tools (e.g., `shell`) - security implications
+- Invocation: `/skill-name`, natural language, or auto-detected from description
+- `/skills list`, `/skills info`, `/skills reload`, `/skills add`, `/skills remove`
+- Best for: repeatable workflows, specific task instructions, script-backed automation
+
+### 3. Custom Agents
+**Purpose**: Specialized AI personas with constrained toolsets, running in their own context window.
+
+**Locations** (first-found-wins dedup by ID derived from filename):
+1. `~/.copilot/agents/*.agent.md` (user)
+2. `/.github/agents/*.agent.md` (project)
+3. `/.github/agents/*.agent.md` (inherited, monorepo)
+4. `~/.claude/agents/*.agent.md` (user, .claude convention)
+5. `/.claude/agents/*.agent.md` (project)
+6. `/.claude/agents/*.agent.md` (inherited)
+7. Plugin `agents/` dirs (plugin, by install order)
+8. Remote org/enterprise agents (remote, via API)
+
+**Key characteristics**:
+- YAML frontmatter: `name`, `description` (required), `tools` (list or omit for all), `infer` (bool), `model`
+- `infer: true` enables auto-delegation - Copilot spawns a subagent when task matches
+- Own context window - doesn't pollute main session, but also can't see main session context
+- Max 30,000 characters for prompt body
+- Invocation: `/agent`, `--agent=name`, natural language reference, or auto-inferred
+- Best for: specialized personas, constrained tool access, tasks needing isolation
+
+### 4. MCP Servers
+**Purpose**: External tool providers that extend Copilot's capabilities.
+
+**Configuration locations** (last-wins dedup by server name):
+1. `~/.copilot/mcp-config.json` (lowest priority)
+2. `.vscode/mcp.json` (workspace)
+3. Plugin MCP configs (plugins)
+4. `--additional-mcp-config` flag (highest priority)
+
+**Key characteristics**:
+- Adds callable tools to Copilot's toolset
+- stdio-based transport (Copilot manages process lifecycle)
+- Tools available in all sessions automatically
+- Cross-client: MCP is an open protocol supported by Claude, VS Code, etc.
+- Best for: programmatic capabilities, external system integration, deterministic operations
+
+### 5. Hooks
+**Purpose**: Lifecycle event handlers that execute shell commands at specific points during agent execution.
+
+**Configuration locations**:
+- `.github/hooks/hooks.json` (repository, must be on default branch for cloud agent)
+- `hooks.json` in CWD (for CLI)
+- Plugin `hooks.json` (via plugin)
+
+**Available hook triggers**:
+
+| Hook | When | Input | Can modify behavior? |
+|------|------|-------|---------------------|
+| `sessionStart` | New/resumed session | `{timestamp, cwd, source, initialPrompt}` | No (output ignored) |
+| `sessionEnd` | Session completes | `{timestamp, cwd, reason}` | No |
+| `userPromptSubmitted` | User submits prompt | `{timestamp, cwd, prompt}` | No (prompt modification not supported) |
+| `preToolUse` | Before any tool runs | `{timestamp, cwd, toolName, toolArgs}` | **Yes** - can `deny` tool use |
+| `postToolUse` | After tool completes | `{timestamp, cwd, toolName, toolArgs, toolResult}` | No |
+| `errorOccurred` | Error during execution | `{timestamp, cwd, error}` | No |
+| `agentStop` | Main agent stops | N/A | No |
+| `subagentStop` | Subagent completes | N/A | No |
+
+**Key characteristics**:
+- Execute shell commands (bash/powershell), not LLM logic
+- `preToolUse` is the most powerful - can approve/deny tool execution
+- Multiple hooks per event, executed in order
+- Configurable timeout (`timeoutSec`, default 30s)
+- Can set environment variables and working directory per hook
+- Best for:
+  guardrails, audit logging, policy enforcement, external notifications
+
+**PromptKit relevance**:
+- `sessionStart` hook could auto-load PromptKit context or validate setup
+- `userPromptSubmitted` hook could detect PromptKit-relevant prompts and log/tag them
+- `preToolUse` hook could enforce PromptKit assembly rules (e.g., prevent writing to PromptKit source files)
+- `postToolUse` hook could validate assembled prompt output (e.g., check for `{{param}}` placeholders)
+- `subagentStop` hook could capture output from a PromptKit agent for post-processing
+
+### 6. LSP Server Configurations
+**Purpose**: Language Server Protocol integration for code intelligence (go-to-definition, diagnostics, hover).
+
+**Configuration locations**:
+- `~/.copilot/lsp-config.json` (user-level, all projects)
+- `.github/lsp.json` (repository-level)
+- Plugin `lsp.json` (via plugin)
+
+**Configuration format**:
+```json
+{
+  "lspServers": {
+    "typescript": {
+      "command": "typescript-language-server",
+      "args": ["--stdio"],
+      "fileExtensions": {
+        ".ts": "typescript",
+        ".tsx": "typescript"
+      }
+    }
+  }
+}
+```
+
+**Key characteristics**:
+- Provides code intelligence to Copilot (diagnostics, symbols, references)
+- Copilot CLI does NOT bundle LSP servers - they must be installed separately
+- Configured per-language with file extension mappings
+- Status viewable via `/lsp` command
+- Not for adding LLM tools - purely for code understanding
+
+**PromptKit relevance**:
+- LSP servers enhance Copilot's ability to understand code that PromptKit prompts operate on
+- A PromptKit plugin could bundle LSP configs for languages commonly used with PromptKit templates
+- For example: when a user invokes `review-cpp-code`, the plugin's LSP config could ensure
+  `clangd` is configured for `.c`/`.h` files, giving Copilot richer code understanding
+- LSP is **complementary** to PromptKit, not a direct integration path for PromptKit itself
+- Could also help validate PromptKit YAML files if a
+  YAML language server is configured
+
+### 7. Plugins
+**Purpose**: Distributable packages that bundle agents, skills, hooks, MCP servers, and LSP configs.
+
+**Plugin manifest** (`plugin.json`):
+```json
+{
+  "name": "my-plugin",
+  "description": "Plugin description",
+  "version": "1.0.0",
+  "author": { "name": "Author" },
+  "license": "MIT",
+  "agents": "agents/",
+  "skills": ["skills/"],
+  "hooks": "hooks.json",
+  "mcpServers": ".mcp.json",
+  "lspServers": "lsp.json"
+}
+```
+
+**Installation sources**:
+
+| Format | Example | Description |
+|--------|---------|-------------|
+| Marketplace | `plugin@marketplace` | Plugin from a registered marketplace |
+| GitHub | `OWNER/REPO` | Root of a GitHub repository |
+| GitHub subdir | `OWNER/REPO:PATH/TO/PLUGIN` | Subdirectory in a repository |
+| Git URL | `https://github.com/o/r.git` | Any Git URL |
+| Local path | `./my-plugin` or `/abs/path` | Local directory |
+
+**Default marketplaces** (pre-registered):
+- `github/copilot-plugins`
+- `github/awesome-copilot`
+
+**Key characteristics**:
+- One-command install: `copilot plugin install SPEC`
+- Cached at `~/.copilot/installed-plugins/`
+- Components follow same precedence rules as manual installs (skills first-found-wins, MCP last-wins)
+- Project-level components override plugin components (plugins cannot override project config)
+- Marketplace support for discovery, versioning, and team distribution
+- Plugin marketplace format (`marketplace.json`) enables curated plugin catalogs
+- CLI commands: `install`, `uninstall`, `list`, `update`, `marketplace add/list/browse/remove`
+
+**PromptKit relevance**: This is the **ideal distribution mechanism** for PromptKit integration.
+A PromptKit plugin can bundle all integration components in a single installable unit.
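The two precedence rules named in this inventory (skills are first-found-wins by `name`; MCP servers are last-wins by server name) can be sketched as follows. This is an illustration only, not Copilot CLI's actual loader: the function names and the `{ name, origin }` entry shape are assumptions made for the example.

```javascript
// Hypothetical sketch of the two dedup rules described in the inventory.
// `sources` is an array of arrays, ordered from first-searched to last-searched.

// Skills: the first entry found for a given name wins.
function resolveFirstFoundWins(sources) {
  const resolved = new Map();
  for (const entries of sources) {
    for (const entry of entries) {
      if (!resolved.has(entry.name)) resolved.set(entry.name, entry);
    }
  }
  return resolved;
}

// MCP servers: a later config overrides an earlier one with the same name.
function resolveLastWins(sources) {
  const resolved = new Map();
  for (const entries of sources) {
    for (const entry of entries) resolved.set(entry.name, entry);
  }
  return resolved;
}

// Project skills are searched before plugin skills, so the project copy wins.
const projectSkills = [{ name: "promptkit", origin: "project" }];
const pluginSkills = [{ name: "promptkit", origin: "plugin" }];

// The --additional-mcp-config flag is applied last, so it wins.
const userMcp = [{ name: "promptkit", origin: "~/.copilot/mcp-config.json" }];
const flagMcp = [{ name: "promptkit", origin: "--additional-mcp-config" }];

const skillWinner = resolveFirstFoundWins([projectSkills, pluginSkills]).get("promptkit").origin;
const mcpWinner = resolveLastWins([userMcp, flagMcp]).get("promptkit").origin;
console.log(skillWinner, mcpWinner);
```

The practical consequence for a PromptKit plugin is that a repository shipping its own `promptkit` skill would shadow the plugin's copy, while an MCP server registered later in the chain would replace the plugin's registration.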
+
+---
+
+## Integration Strategies
+
+### Strategy A: PromptKit as a Copilot CLI Skill
+
+**Concept**: Create a `SKILL.md` that teaches Copilot how to use PromptKit's manifest
+and components to assemble prompts on demand.
+
+**Structure** (personal skill - works across all projects):
+```
+~/.copilot/skills/promptkit/
+├── SKILL.md              # Instructions for Copilot
+├── manifest.yaml         # PromptKit component index
+├── bootstrap-core.md     # Stripped-down assembly instructions
+├── personas/             # All persona .md files
+├── protocols/            # All protocol .md files
+├── formats/              # All format .md files
+├── templates/            # All template .md files
+└── taxonomies/           # All taxonomy .md files
+```
+
+**SKILL.md** would contain:
+```yaml
+---
+name: promptkit
+description: >
+  Composable prompt assembly for engineering tasks. Use when asked to
+  investigate bugs, review code, write requirements, design systems,
+  audit specifications, plan implementations, or author CI/CD pipelines.
+  Also use when the user mentions "promptkit" or asks for a structured
+  engineering prompt.
+---
+```
+
+Followed by condensed bootstrap instructions (read manifest, select components,
+assemble verbatim, etc.).
+
+**User experience**:
+```
+> /promptkit investigate the memory leak in packet_handler.c
+> Use the promptkit skill to review this C++ code for thread safety
+> What promptkit templates are available?
+```
+
+Or auto-invoked when Copilot detects relevance:
+```
+> I need to write a requirements doc for our new auth system
+  [Copilot auto-selects promptkit skill based on description match]
+```
+
+**Pros**:
+- Native Copilot UX - discoverable via `/skills list`, invokable via `/promptkit`
+- Auto-detection when task matches skill description
+- No separate CLI installation needed (content lives in skill directory)
+- Works in existing session (no child process spawn)
+- Personal skills (`~/.copilot/skills/`) work across all projects
+- Project skills (`.github/skills/`) can be version-controlled per-repo
+
+**Cons**:
+- Skill directory would be large (~500KB+ of .md files)
+- All content loaded into context when skill activates (context pressure)
+- No programmatic assembly - still relies on LLM reading files
+- Updating requires re-copying files (no `npm update`)
+
+**Mitigation for context pressure**: The SKILL.md can instruct Copilot to read
+manifest.yaml first for discovery, then only read the specific component files
+needed for the selected template - not all files at once.
+
+**Feasibility**: ✅ High - uses existing, stable Copilot CLI features
+
+---
+
+### Strategy B: PromptKit as Custom Agent(s)
+
+**Concept**: Define PromptKit as one or more custom agents that Copilot can
+delegate to for engineering prompt assembly tasks.
+
+**Option B1: Single meta-agent**
+```
+~/.copilot/agents/promptkit.agent.md
+```
+
+```yaml
+---
+name: promptkit
+description: >
+  PromptKit prompt composition engine. Assembles task-specific prompts
+  from composable components (personas, protocols, formats, templates).
+  Use for structured engineering tasks: bug investigation, code review,
+  requirements authoring, specification auditing, and more.
+tools: ["read", "edit", "search", "create", "glob", "grep"]
+infer: true
+---
+```
+
+Body contains bootstrap instructions.
+Agent reads PromptKit files from a
+known location (e.g., `~/.copilot/promptkit/` or a cloned repo).
+
+**Option B2: Per-template agents**
+```
+~/.copilot/agents/
+├── promptkit-investigate-bug.agent.md
+├── promptkit-review-code.agent.md
+├── promptkit-author-requirements.agent.md
+├── promptkit-review-cpp.agent.md
+└── ...
+```
+
+Each agent has the relevant persona + protocols pre-selected, with the
+template instructions baked in. No manifest lookup needed.
+
+**User experience**:
+```
+> /agent promptkit
+> --agent=promptkit investigate the crash in main.c
+> [auto-delegated when Copilot infers promptkit is relevant]
+```
+
+**Pros**:
+- Runs in its own context window (doesn't pollute main session)
+- `infer: true` enables auto-delegation
+- Tool restrictions for safety (read-only for audits, etc.)
+- Clean separation of concerns
+- Per-template agents (B2) are pre-composed - faster, no manifest lookup
+
+**Cons**:
+- Custom agent loses the composability that makes PromptKit powerful (B2)
+- Meta-agent (B1) still needs access to all component files
+- Separate context window means it can't see what the main agent already knows
+- Per-template agents (B2) create maintenance burden (N agents × updates)
+
+**Feasibility**: ✅ High - custom agents are well-supported
+
+---
+
+### Strategy C: PromptKit MCP Server
+
+**Concept**: Build an MCP server that exposes PromptKit's assembly engine as
+tools that Copilot (or any MCP client) can call programmatically.
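To make "assembly engine" concrete, here is a minimal sketch of deterministic, verbatim assembly with `{{param}}` substitution. Everything in it is an assumption made for illustration: the in-memory `manifest` and `components` objects stand in for `manifest.yaml` and the component files, and `assemblePrompt` is a hypothetical name. PromptKit's real manifest schema and assembly order may differ.

```javascript
// Hypothetical in-memory stand-ins for manifest.yaml and the component files.
const manifest = {
  templates: {
    "investigate-bug": {
      persona: "systems-engineer",
      protocols: ["root-cause-analysis"],
      params: ["context"],
    },
  },
};
const components = {
  "personas/systems-engineer": "# Persona\nYou are a systems engineer.",
  "protocols/root-cause-analysis": "# Protocol\nTrace the failure to its root cause.",
  "templates/investigate-bug": "# Task\nInvestigate: {{context}}",
};

function assemblePrompt(templateName, params) {
  const template = manifest.templates[templateName];
  if (!template) throw new Error(`Unknown template: ${templateName}`);

  // Concatenate component bodies verbatim, in a fixed order.
  const keys = [
    `personas/${template.persona}`,
    ...template.protocols.map((p) => `protocols/${p}`),
    `templates/${templateName}`,
  ];
  let prompt = keys
    .map((key) => {
      if (!(key in components)) throw new Error(`Missing component: ${key}`);
      return components[key];
    })
    .join("\n\n");

  // Substitute declared params, then refuse to return unresolved placeholders.
  for (const name of template.params) {
    if (!(name in params)) throw new Error(`Missing param: ${name}`);
    prompt = prompt.split(`{{${name}}}`).join(params[name]);
  }
  const leftover = prompt.match(/\{\{\s*[\w-]+\s*\}\}/);
  if (leftover) throw new Error(`Unresolved placeholder: ${leftover[0]}`);
  return prompt;
}

const assembled = assemblePrompt("investigate-bug", {
  context: "segfault in packet_handler.c",
});
console.log(assembled);
```

Because the function is pure string manipulation over fixed inputs, the same template and params always yield the same prompt, and a leftover placeholder fails loudly instead of reaching the LLM.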
+
+**Exposed tools**:
+```
+promptkit_list_templates     → Returns available templates with descriptions
+promptkit_list_components    → Returns personas, protocols, formats, taxonomies
+promptkit_get_template_info  → Returns template details (params, persona, protocols)
+promptkit_assemble           → Assembles a complete prompt from template + params
+promptkit_get_component      → Returns a single component's body text
+```
+
+**MCP server implementation** (Node.js, reuses existing CLI infrastructure):
+```javascript
+// Illustrative pseudocode for a promptkit_assemble call, not a real SDK
+// registration: the tool reads manifest.yaml, resolves components, and
+// assembles them verbatim.
+server.tool("promptkit_assemble", {
+  template: "investigate-bug",
+  params: { context: "segfault in packet_handler.c", audience: "senior engineers" }
+}); // returns the assembled prompt as a string
+```
+
+**Registration** (`~/.copilot/mcp-config.json`):
+```json
+{
+  "mcpServers": {
+    "promptkit": {
+      "command": "npx",
+      "args": ["@promptkit/mcp-server"],
+      "type": "stdio"
+    }
+  }
+}
+```
+
+**User experience**:
+```
+> I need to investigate a memory leak in our C code
+  [Copilot calls promptkit_list_templates, finds investigate-bug]
+  [Copilot calls promptkit_assemble with selected template]
+  [Copilot uses assembled prompt as its working instructions]
+
+> What PromptKit templates do I have?
+  [Copilot calls promptkit_list_templates, displays results]
+```
+
+**Pros**:
+- Programmatic, precise assembly (no LLM interpretation of bootstrap.md)
+- Deterministic - same inputs always produce same output
+- Works across ALL MCP-supporting clients (Copilot, Claude, VS Code, etc.)
+- Clean separation - PromptKit runs as a service, Copilot consumes tools
+- `npm update` to get latest PromptKit content
+- No context window pressure (assembled prompt returned as tool output)
+- Assembly can enforce the Verbatim Inclusion Rule in code (not LLM honor system)
+
+**Cons**:
+- Most engineering effort to build (new MCP server package)
+- Requires Node.js runtime on user's machine
+- MCP server process runs alongside Copilot
+- Loses interactive template mode (MCP tools are request/response, not conversational)
+- User must configure MCP server (though plugin could automate this)
+
+**Feasibility**: ✅ Medium-High - requires new package but uses existing ecosystem
+
+---
+
+### Strategy D: PromptKit as a Plugin (Composite)
+
+**Concept**: Package PromptKit as a Copilot CLI plugin that bundles skills,
+custom agents, MCP server, hooks, and LSP configs. One-command installation.
+
+**Plugin structure**:
+```
+promptkit-plugin/
+├── plugin.json                          # Plugin manifest
+├── skills/
+│   └── promptkit/
+│       ├── SKILL.md                     # Main composition skill
+│       └── [PromptKit content files]
+├── agents/
+│   ├── promptkit.agent.md               # Meta-agent for full composition
+│   ├── promptkit-investigator.agent.md
+│   └── promptkit-reviewer.agent.md
+├── hooks.json                           # Lifecycle hooks
+├── .mcp.json                            # MCP server config
+└── lsp.json                             # LSP server configs for common languages
+```
+
+**plugin.json**:
+```json
+{
+  "name": "promptkit",
+  "description": "Composable prompt library for structured engineering tasks",
+  "version": "0.5.0",
+  "author": { "name": "PromptKit Contributors" },
+  "license": "MIT",
+  "keywords": ["prompts", "engineering", "code-review", "requirements", "investigation"],
+  "repository": "https://github.com/microsoft/PromptKit",
+  "agents": "agents/",
+  "skills": ["skills/"],
+  "hooks": "hooks.json",
+  "mcpServers": ".mcp.json",
+  "lspServers": "lsp.json"
+}
+```
+
+**Installation**:
+```
+copilot plugin install microsoft/PromptKit
+```
+or from a marketplace:
+```
+/plugin install promptkit@copilot-plugins
+```
+
+**User experience**:
+```
+> /skills list
+  promptkit - Composable prompt assembly for engineering tasks
+
+> /agent
+  promptkit - PromptKit composition engine
+  promptkit-investigator - Bug investigation specialist
+  promptkit-reviewer - Code review specialist
+
+> /promptkit author a requirements doc for our payment system
+```
+
+**Pros**:
+- One-command install, easy updates (`copilot plugin update promptkit`)
+- Bundles ALL integration strategies in a single unit
+- Distributable to teams via marketplace
+- Clean install/uninstall lifecycle
+- Most "native" feeling - appears in all Copilot discovery surfaces
+- Can include hooks for guardrails and LSP for code intelligence
+
+**Cons**:
+- Plugin system is the newest Copilot CLI feature
+- Combines complexity of multiple strategies
+- Plugin packaging and marketplace publishing need investigation
+
+**Feasibility**: ⚠️ Medium - plugin system is new but well-documented; this is the ideal end state
+
+---
+
+### Strategy E: Custom Instructions as Bootstrap
+
+**Concept**: Use `~/.copilot/copilot-instructions.md` to teach Copilot about
+PromptKit's existence, without any additional infrastructure.
+
+**~/.copilot/copilot-instructions.md** (appended):
+```markdown
+## PromptKit
+
+You have access to the PromptKit prompt library at ~/.promptkit/.
+When the user asks you to investigate bugs, review code, write requirements,
+design systems, audit specifications, or perform other structured engineering
+tasks, read ~/.promptkit/manifest.yaml to discover available templates and
+assemble prompts following the process in ~/.promptkit/bootstrap.md.
+```
+
+**Pros**:
+- Zero infrastructure - just a markdown file
+- Works immediately, no install
+
+**Cons**:
+- Instructions always loaded (wastes context even when not needed)
+- No auto-discovery via `/skills` or `/agent`
+- No `/promptkit` slash command
+- Relies entirely on LLM following instructions correctly
+
+**Feasibility**: ✅ Very High - trivial to implement
+
+---
+
+### Strategy F: Hooks for Guardrails and Automation
+
+**Concept**: Use Copilot CLI hooks to add PromptKit-aware automation at
+session and tool lifecycle points - complementary to other strategies.
+
+**hooks.json**:
+```json
+{
+  "version": 1,
+  "hooks": {
+    "sessionStart": [
+      {
+        "type": "command",
+        "bash": "./scripts/promptkit-session-init.sh",
+        "powershell": "./scripts/promptkit-session-init.ps1",
+        "comment": "Validate PromptKit content is available and log session start",
+        "timeoutSec": 10
+      }
+    ],
+    "postToolUse": [
+      {
+        "type": "command",
+        "bash": "./scripts/promptkit-validate-output.sh",
+        "powershell": "./scripts/promptkit-validate-output.ps1",
+        "comment": "After file writes, check for unresolved {{param}} placeholders",
+        "timeoutSec": 15
+      }
+    ]
+  }
+}
+```
+
+**Use cases for hooks in PromptKit context**:
+
+| Hook | PromptKit Use Case |
+|------|-------------------|
+| `sessionStart` | Validate PromptKit content exists; log which templates are available; set up temp workspace |
+| `userPromptSubmitted` | Detect PromptKit-relevant prompts for analytics/telemetry (what templates are popular?) |
+| `preToolUse` | Prevent writes to PromptKit source files (protect library integrity) |
+| `postToolUse` | Validate assembled prompts: check for `{{param}}` residuals, verify section headers present, confirm Verbatim Inclusion Rule compliance |
+| `subagentStop` | Capture PromptKit agent output for quality metrics or post-processing |
+| `sessionEnd` | Clean up temp PromptKit workspace; log assembly statistics |
+
+**Pros**:
+- Adds quality guardrails that other strategies lack
+- Works alongside skills, agents, or MCP server
+- Can enforce invariants (e.g., no unresolved placeholders) programmatically
+- Provides observability (what templates are being used, how often)
+- Shell-based - can run any validation logic
+
+**Cons**:
+- Hooks alone don't provide discovery or invocation - purely supplementary
+- Adds latency to tool execution (hooks run synchronously)
+- Shell scripts add maintenance surface
+- `userPromptSubmitted` and `postToolUse` outputs are currently ignored (can only log, not modify)
+
+**Feasibility**: ✅ High - well-documented and straightforward
+
+---
+
+### Strategy G: LSP Configuration for Enhanced Code Intelligence
+
+**Concept**: Bundle LSP server configurations alongside PromptKit to improve
+Copilot's code understanding when executing PromptKit templates that analyze code.
+
+**lsp.json** (bundled in plugin):
+```json
+{
+  "lspServers": {
+    "clangd": {
+      "command": "clangd",
+      "args": ["--background-index"],
+      "fileExtensions": {
+        ".c": "c",
+        ".h": "c",
+        ".cpp": "cpp",
+        ".cc": "cpp",
+        ".cxx": "cpp",
+        ".hpp": "cpp"
+      }
+    },
+    "rust-analyzer": {
+      "command": "rust-analyzer",
+      "fileExtensions": {
+        ".rs": "rust"
+      }
+    },
+    "yaml-language-server": {
+      "command": "yaml-language-server",
+      "args": ["--stdio"],
+      "fileExtensions": {
+        ".yaml": "yaml",
+        ".yml": "yaml"
+      }
+    }
+  }
+}
+```
+
+**How it enhances PromptKit**:
+- When `review-cpp-code` template is used → `clangd` provides diagnostics, symbol resolution
+- When `memory-safety-rust` protocol is active → `rust-analyzer` provides type info, borrow checker insights
+- When editing `manifest.yaml` → YAML language server provides validation
+- Copilot gets richer context about code structure, types, and errors
+- Templates that analyze code (review-code, exhaustive-bug-hunt, find-and-fix-bugs) benefit most
+
+**Important caveat**: LSP servers must be **installed separately** by the user - Copilot CLI
+only manages configuration, not installation. The plugin should document prerequisites
+and ideally the `sessionStart` hook can check for LSP server availability.
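The availability check mentioned in the caveat could look roughly like the sketch below. It covers only the decision logic: `findMissingLspServers` is a hypothetical helper, and the `isAvailable` callback is injected so the example stays runnable without inspecting the real `PATH`. A real hook script would substitute an actual lookup for the stub.

```javascript
// Returns the configured LSP servers whose command is not available.
// `isAvailable` is injected so the check can be exercised without touching PATH.
function findMissingLspServers(lspConfig, isAvailable) {
  return Object.entries(lspConfig.lspServers)
    .filter(([, server]) => !isAvailable(server.command))
    .map(([name, server]) => ({ name, command: server.command }));
}

// Trimmed-down copy of the lsp.json shape shown above.
const lspConfig = {
  lspServers: {
    clangd: { command: "clangd", args: ["--background-index"] },
    "rust-analyzer": { command: "rust-analyzer" },
  },
};

// Stub for illustration: pretend only clangd is installed.
const installed = new Set(["clangd"]);
const missing = findMissingLspServers(lspConfig, (cmd) => installed.has(cmd));

for (const { name, command } of missing) {
  console.log(`PromptKit: LSP server "${name}" not found (expected command: ${command})`);
}
```

Since the hooks table lists `sessionStart` output as ignored, a check like this can only log a warning; it cannot block the session or change Copilot's behavior.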
+
+**Pros**:
+- Improves quality of code analysis templates significantly
+- Copilot sees real compiler diagnostics, not just text patterns
+- YAML LSP helps maintain PromptKit's own manifest
+- Zero-config for users who already have language servers installed
+
+**Cons**:
+- LSP servers are heavy external dependencies
+- User must install them separately (can't be bundled)
+- Only helps code-analysis templates, not document authoring ones
+- Not a direct PromptKit integration - more of a quality enhancement
+
+**Feasibility**: ✅ High - simple JSON config; value depends on user's existing tooling
+
+---
+
+## Comparison Matrix
+
+| Dimension | Current CLI | Skill (A) | Agent (B) | MCP (C) | Plugin (D) | Instructions (E) | Hooks (F) | LSP (G) |
+|-----------|-------------|-----------|-----------|---------|------------|-------------------|-----------|---------|
+| **Discoverability** | None | `/skills` | `/agent` | Tools | All | None | N/A | `/lsp` |
+| **Invocation** | `npx promptkit` | `/promptkit` | `/agent` | Auto | All | In prompt | Auto | Auto |
+| **Session integration** | New session | Same | Own ctx | Same | Varies | Same | Same | Same |
+| **Assembly fidelity** | LLM | LLM | LLM | **Deterministic** | Varies | LLM | Validates | N/A |
+| **Context pressure** | N/A | High | Isolated | **Low** | Varies | Always | None | None |
+| **Install effort** | npm | Copy | Copy | npm+config | **One cmd** | Edit file | Copy | Config |
+| **Update effort** | npm | Re-copy | Re-copy | npm | **One cmd** | Edit | Re-copy | Manual |
+| **Cross-client** | No | No | No | **Yes** | No | No | No | Partial |
+| **Code intelligence** | No | No | No | No | Optional | No | No | **Yes** |
+| **Guardrails** | No | No | No | Code-level | Optional | No | **Yes** | No |
+| **Engineering effort** | Exists | Low | Low | Medium | Medium | Trivial | Low | Low |
+
+---
+
+## Recommended Approach: Plugin-First with Layered Components
+
+Rather than a sequential phase approach, the
+strategies should be viewed as
+**complementary components of a single plugin**. The plugin is the distribution
+vehicle; skills, agents, MCP, hooks, and LSP are its contents.
+
+### Core Plugin Components
+
+| Component | Role in Plugin | Priority |
+|-----------|---------------|----------|
+| **Skill** (A) | Primary invocation path - `/promptkit` | P0 (must-have) |
+| **MCP Server** (C) | Deterministic assembly engine | P0 (must-have) |
+| **Meta-Agent** (B1) | Interactive templates + full composition | P1 (high-value) |
+| **Per-Template Agents** (B2) | Pre-composed workflows (investigator, reviewer, requirements) | P1 (high-value) |
+| **Hooks** (F) | Output validation, telemetry, guardrails | P1 (high-value) |
+| **LSP Config** (G) | Enhanced code intelligence | P2 (nice-to-have) |
+| **Custom Instructions** (E) | Fallback / lightweight alternative | Standalone option |
+
+> **Why per-template agents are P1, not P2.** The
+> [sonde case study](./case-studies/sonde-protocol-evolution.md) shows that
+> pre-composed coder/reviewer/validator workflows - personas and protocols
+> baked in - are where PromptKit delivers the most value in practice. A
+> `promptkit-investigator.agent.md` that hard-wires `systems-engineer` +
+> `root-cause-analysis` + `memory-safety-c` is immediately useful with
+> **no manifest lookup and no assembly round-trip**, which is where most
+> of the per-invocation latency and friction live today.
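A pre-composed agent file of this kind could be generated at build time along the lines of the sketch below. It is a hypothetical illustration: `buildAgentFile` is not an existing PromptKit function, and the section strings are placeholders for the real component bodies. The only facts taken from the inventory are the frontmatter fields and the 30,000-character limit on the prompt body.

```javascript
// Limit taken from the Custom Agents inventory (max 30,000 characters for the body).
const MAX_AGENT_BODY_CHARS = 30000;

// Hypothetical helper: joins component bodies verbatim under YAML frontmatter.
function buildAgentFile({ name, description, tools, sections }) {
  const body = sections.join("\n\n");
  if (body.length > MAX_AGENT_BODY_CHARS) {
    throw new Error(`Agent body is ${body.length} chars; limit is ${MAX_AGENT_BODY_CHARS}`);
  }
  const frontmatter = [
    "---",
    `name: ${name}`,
    // JSON strings and arrays are also valid YAML flow scalars and sequences.
    `description: ${JSON.stringify(description)}`,
    `tools: ${JSON.stringify(tools)}`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n${body}\n`;
}

// Placeholder bodies; a real build would read the component .md files.
const agentFile = buildAgentFile({
  name: "promptkit-investigator",
  description: "Bug investigation specialist",
  tools: ["read", "search", "grep"],
  sections: [
    "<body of the systems-engineer persona>",
    "<body of the root-cause-analysis protocol>",
    "<body of the memory-safety-c protocol>",
  ],
});
console.log(agentFile);
```

Checking the size limit at generation time matters because a persona plus several protocols can plausibly exceed it, and a build-time failure is easier to diagnose than an agent whose prompt is rejected or cut short at load time.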
+
+### Recommended Plugin Architecture
+
+```
+promptkit/
+├── plugin.json
+│
+├── skills/
+│   └── promptkit/
+│       └── SKILL.md                         # Invocation entry point - calls MCP tools
+│
+├── agents/
+│   ├── promptkit.agent.md                   # Full composition meta-agent
+│   ├── promptkit-investigator.agent.md      # Pre-composed: investigate-bug
+│   ├── promptkit-reviewer.agent.md          # Pre-composed: review-code
+│   └── promptkit-requirements.agent.md      # Pre-composed: author-requirements-doc
+│
+├── hooks.json                               # Validation + telemetry hooks
+├── .mcp.json                                # PromptKit MCP server config
+└── lsp.json                                 # Recommended LSP configs for common languages
+```
+
+### How the Pieces Work Together
+
+```
+User: "Investigate the crash in packet_handler.c"
+    │
+    ▼
+Copilot auto-detects relevance → invokes promptkit skill
+    │
+    ▼
+Skill instructs Copilot to call MCP tools:
+    1. promptkit_list_templates() → finds investigate-bug
+    2. promptkit_get_template_info("investigate-bug") → learns params needed
+    3. Copilot asks user for missing params (or infers from context)
+    4. promptkit_assemble({template, params}) → returns complete prompt
+    │
+    ▼
+Copilot adopts assembled prompt as working instructions
+    │
+    ▼
+postToolUse hook validates: no {{param}} residuals, sections present
+    │
+    ▼
+LSP (clangd) provides code intelligence while Copilot analyzes C code
+```
+
+For **interactive templates** (`mode: interactive`), the skill delegates to
+the `promptkit.agent.md` custom agent, which runs in its own context window
+and executes the multi-turn reasoning workflow directly.
+
+### Key UX Gap to Prototype
+
+Step 4 of the flow above - *"Copilot adopts assembled prompt as working
+instructions"* - is the load-bearing step everything else flows from, and
+also the least understood. Copilot CLI does not currently expose a
+documented "set working instructions" primitive that a tool result can
+populate.
Three candidate mechanisms, in order of how cheap each is to
+prototype:
+
+1. **Temp file + `read`.** The skill instructs Copilot, "call
+   `promptkit_assemble`, write the returned prompt to a session-scoped
+   temp file (e.g., `~/.copilot/sessions//promptkit-active.md`),
+   then `read` that file before continuing." Leans entirely on existing
+   primitives. This is the recommended **first prototype**: if it works,
+   the rest of the architecture is unblocked.
+2. **Tool result as system-style context.** MCP returns the prompt in a
+   structured tool response that Copilot is instructed to treat as
+   system-level context for the remainder of the session. Cleaner UX, but
+   depends on whether tool results can carry that semantic weight in
+   Copilot CLI today.
+3. **Behavioral-override directive.** The skill emits a directive
+   ("From this point forward, follow the protocol below verbatim…")
+   that re-frames the rest of the session. Most fragile: relies on the
+   LLM honoring the directive without a runtime enforcement mechanism.
+
+**Implementation prerequisite.** Mechanism (1) must be validated end-to-end
+on Copilot CLI before any of the per-template agents (B2), MCP server (C),
+or plugin (D) work begins, because they all depend on a working "adopt as
+instructions" path. If (1) is insufficient, the architecture needs to
+fall back to per-template agents for everything, since agents *do* have
+a stable system-prompt slot.
+
+### Design Recommendations
+
+Two design choices have been promoted out of Open Questions based on
+review feedback:
+
+- **MCP is mandatory; no direct-file-read fallback.** If the MCP server
+  is not running, the skill stops and tells the user to run
+  `copilot plugin update promptkit` (or the equivalent start command);
+  it does **not** silently degrade to LLM-reads-500KB-of-Markdown. A
+  fallback path would reintroduce the assembly-fidelity problem this
+  whole effort is trying to eliminate.
+- **`promptkit_get_interactive_context` is the seam between MCP and the
+  meta-agent.** MCP owns deterministic assembly (request/response);
+  the meta-agent owns multi-turn conversational execution. For
+  interactive templates (`mode: interactive`), the agent calls
+  `promptkit_get_interactive_context(template, params)` once to load
+  the persona + protocols + format + template body, then runs the
+  reasoning loop in its own context window. This cleanly resolves the
+  request/response-vs-interactive tension without forcing MCP to model
+  long-running sessions.
+
+---
+
+## Open Questions
+
+> Items that were originally listed here as questions and have since been
+> resolved into design decisions: **MCP-vs-fallback** (MCP mandatory; see
+> *Design Recommendations*) and **interactive templates in MCP** (resolved
+> via `promptkit_get_interactive_context`; see *Design Recommendations*).
+
+1. **Skill + MCP interaction**: Can a skill's instructions direct Copilot to
+   call specific MCP tools? This is the key integration pattern: skill provides
+   discovery/invocation, MCP provides deterministic assembly.
+
+2. **Adopt-as-instructions mechanism**: Of the three candidate mechanisms in
+   *Key UX Gap to Prototype* above, which actually works on current Copilot CLI?
+   Mechanism (1), temp-file + `read`, needs to be validated end-to-end before
+   any of the C/D/F/G work proceeds.
+
+3. **Plugin MCP lifecycle**: When a plugin includes `.mcp.json`, does Copilot
+   auto-start the MCP server process? If so, `npx @promptkit/mcp-server` could
+   be the command, with no global install needed.
+
+4. **Hook output limitations**: `userPromptSubmitted` and `postToolUse` hook
+   outputs are currently ignored. Can `postToolUse` at least log validation
+   failures that the user sees, even if it can't modify tool output?
+
+5. **Context window budget**: When the MCP server returns a 50KB assembled prompt
+   as a tool result, how does this affect available context?
Tool results may be
+   more efficient than file reads since they're structured.
+
+6. **Plugin marketplace publishing**: Can PromptKit be listed on `github/copilot-plugins`
+   or `github/awesome-copilot`? What's the submission process?
+
+7. **LSP server detection**: Can hooks or skills detect whether recommended LSP
+   servers are installed and warn if not? This would improve the setup experience.
+
+---
+
+## Next Steps
+
+1. **Prototype the plugin**: Create a minimal `plugin.json` + skill + one agent,
+   install locally via `copilot plugin install ./promptkit-plugin`, and test.
+
+2. **Build MCP server**: Implement `list_templates`, `get_template_info`, and
+   `assemble` tools. Test with Copilot CLI via `.mcp.json`.
+
+3. **Write hooks**: Create a `postToolUse` validation hook that checks assembled
+   prompt output for common issues.
+
+4. **Test skill ↔ MCP interaction**: Verify that a skill can instruct Copilot
+   to call MCP tools effectively.
+
+5. **Measure context impact**: Compare context consumption between direct file
+   reading (skill-only) vs. MCP tool results.
+
+6. **Investigate marketplace listing**: Research how to publish to
+   `github/copilot-plugins` or `github/awesome-copilot`.
diff --git a/docs/roadmap.md b/docs/roadmap.md
index c51be81..c39ed94 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -57,6 +57,34 @@ team-wide deployment via org-level extension install.
 selection and parameter gathering happen through Copilot Chat's
 conversational interface.
 
+## Copilot CLI Native Integration
+
+> **Status: Research complete.** See
+> [`copilot-cli-integration-research.md`](./copilot-cli-integration-research.md)
+> for the full analysis.
+
+A complementary integration path: embed PromptKit directly into
+**GitHub Copilot CLI** using its plugin, skill, agent, and MCP server
+extension points, making PromptKit available as a native capability
+within terminal-based workflows.
+
+```sh
+/promptkit investigate this bug: segfault in packet_handler.c
+```
+
+The research evaluates seven integration strategies (skills, custom
+agents, MCP server, plugins, hooks, LSP configs, and custom
+instructions) and recommends a **plugin-first approach** bundling a
+skill for invocation, an MCP server for deterministic assembly, and
+agents for interactive templates.
+
+**Compared to a Copilot Extension**, CLI integration works within
+existing local sessions (no context switching), leverages local code
+intelligence (LSP), supports lifecycle hooks for guardrails, and
+distributes via `copilot plugin install`. However, it only targets
+terminal users, whereas a Copilot Extension reaches Copilot Chat
+across web, IDE, and CLI surfaces.
+
 ## VS Code Extension
 
 > **Status: Not yet started.** Exploratory idea.