diff --git a/docs/guides/setup/backend-services.md b/docs/guides/setup/backend-services.md
new file mode 100644
index 00000000..7581cff3
--- /dev/null
+++ b/docs/guides/setup/backend-services.md
@@ -0,0 +1,430 @@
+# Backend Services Setup
+
+Run the Copilot SDK in server-side applications — APIs, web backends, microservices, and background workers. The CLI runs as a headless server that your backend code connects to over the network.
+
+**Best for:** Web app backends, API services, internal tools, CI/CD integrations, any server-side workload.
+
+## How It Works
+
+Instead of the SDK spawning a CLI child process, you run the CLI independently in **headless server mode**. Your backend connects to it over TCP using the `cliUrl` option.
+
+```mermaid
+flowchart TB
+    subgraph Backend["Your Backend"]
+        API["API Server"]
+        SDK["SDK Client"]
+    end
+
+    subgraph CLIServer["Copilot CLI (Headless)"]
+        RPC["JSON-RPC Server<br/>TCP :4321"]
+        Sessions["Session Manager"]
+    end
+
+    Users["👥 Users"] --> API
+    API --> SDK
+    SDK -- "cliUrl: localhost:4321" --> RPC
+    RPC --> Sessions
+    RPC --> Copilot["☁️ GitHub Copilot<br/>or Model Provider"]
+
+    style Backend fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style CLIServer fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+**Key characteristics:**
+- CLI runs as a persistent server process (not spawned per request)
+- SDK connects over TCP — CLI and app can run in different containers
+- Multiple SDK clients can share one CLI server
+- Works with any auth method (GitHub tokens, env vars, BYOK)
+
+## Architecture: Auto-Managed vs. External CLI
+
+```mermaid
+flowchart LR
+    subgraph Auto["Auto-Managed (Default)"]
+        A1["SDK"] -->|"spawns"| A2["CLI Process"]
+        A2 -.->|"dies with app"| A1
+    end
+
+    subgraph External["External Server (Backend)"]
+        B1["SDK"] -->|"cliUrl"| B2["CLI Server"]
+        B2 -.->|"independent<br/>lifecycle"| B1
+    end
+
+    style Auto fill:#161b22,stroke:#8b949e,color:#c9d1d9
+    style External fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+## Step 1: Start the CLI in Headless Mode
+
+Run the CLI as a background server:
+
+```bash
+# Start with a specific port
+copilot --headless --port 4321
+
+# Or let it pick a random port (prints the URL)
+copilot --headless
+# Output: Listening on http://localhost:52431
+```
+
+For production, run it as a system service or in a container:
+
+```bash
+# Docker
+docker run -d --name copilot-cli \
+    -p 4321:4321 \
+    -e COPILOT_GITHUB_TOKEN="$TOKEN" \
+    ghcr.io/github/copilot-cli:latest \
+    --headless --port 4321
+
+# systemd unit file excerpt (goes in a .service file; not a shell command)
+[Service]
+ExecStart=/usr/local/bin/copilot --headless --port 4321
+Environment=COPILOT_GITHUB_TOKEN=your-token
+Restart=always
+```
+
+## Step 2: Connect the SDK
+
+<details open>
+<summary><strong>Node.js / TypeScript</strong></summary>
+
+```typescript
+import { CopilotClient } from "@github/copilot-sdk";
+
+const client = new CopilotClient({
+    cliUrl: "localhost:4321",
+});
+
+const session = await client.createSession({
+    sessionId: `user-${userId}-${Date.now()}`,
+    model: "gpt-4.1",
+});
+
+const response = await session.sendAndWait({ prompt: req.body.message });
+res.json({ content: response?.data.content });
+```
+
+</details>
+
+<details>
+<summary><strong>Python</strong></summary>
+
+```python
+import time
+
+from copilot import CopilotClient
+
+client = CopilotClient({
+    "cli_url": "localhost:4321",
+})
+await client.start()
+
+session = await client.create_session({
+    "session_id": f"user-{user_id}-{int(time.time())}",
+    "model": "gpt-4.1",
+})
+
+response = await session.send_and_wait({"prompt": message})
+```
+
+</details>
+
+<details>
+<summary><strong>Go</strong></summary>
+
+```go
+client := copilot.NewClient(&copilot.ClientOptions{
+    CLIUrl: "localhost:4321",
+})
+if err := client.Start(ctx); err != nil {
+    log.Fatal(err)
+}
+defer client.Stop()
+
+session, _ := client.CreateSession(ctx, &copilot.SessionConfig{
+    SessionID: fmt.Sprintf("user-%s-%d", userID, time.Now().Unix()),
+    Model:     "gpt-4.1",
+})
+
+response, _ := session.SendAndWait(ctx, copilot.MessageOptions{Prompt: message})
+```
+
+</details>
+
+<details>
+<summary><strong>.NET</strong></summary>
+
+```csharp
+var client = new CopilotClient(new CopilotClientOptions
+{
+    CliUrl = "localhost:4321",
+    UseStdio = false,
+});
+
+await using var session = await client.CreateSessionAsync(new SessionConfig
+{
+    SessionId = $"user-{userId}-{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}",
+    Model = "gpt-4.1",
+});
+
+var response = await session.SendAndWaitAsync(
+    new MessageOptions { Prompt = message });
+```
+
+</details>
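+
+The examples above build each session ID inline from a user ID and a timestamp. A minimal sketch of centralizing that in one helper follows. `buildSessionId` is a hypothetical name, not an SDK function, and the conservative character set is an assumption made because session state is stored on disk under the session ID; the SDK's actual constraints on `sessionId` are not stated in this guide.
+
+```typescript
+// Hypothetical helper, not part of the SDK.
+function buildSessionId(userId: string, now: number = Date.now()): string {
+    // Assumption: keep IDs to letters, digits, "_" and "-" so they are
+    // safe to use as directory names and in URLs.
+    const safeUser = userId.replace(/[^A-Za-z0-9_-]/g, "_");
+    return `user-${safeUser}-${now}`;
+}
+```
+
+Passing `now` explicitly keeps the function deterministic in tests; callers can omit it in production code.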
+
+## Authentication for Backend Services
+
+### Environment Variable Tokens
+
+The simplest approach — set a token on the CLI server:
+
+```mermaid
+flowchart LR
+    subgraph Server
+        EnvVar["COPILOT_GITHUB_TOKEN"]
+        CLI["Copilot CLI"]
+    end
+
+    EnvVar --> CLI
+    CLI --> Copilot["☁️ Copilot API"]
+
+    style Server fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+```bash
+# All requests use this token
+export COPILOT_GITHUB_TOKEN="gho_service_account_token"
+copilot --headless --port 4321
+```
+
+### Per-User Tokens (OAuth)
+
+Pass each user's token when constructing the client for that request. See [GitHub OAuth](./github-oauth.md) for the full flow.
+
+```typescript
+// Your API receives user tokens from your auth layer
+app.post("/chat", authMiddleware, async (req, res) => {
+    const client = new CopilotClient({
+        cliUrl: "localhost:4321",
+        githubToken: req.user.githubToken,
+        useLoggedInUser: false,
+    });
+
+    const session = await client.createSession({
+        sessionId: `user-${req.user.id}-chat`,
+        model: "gpt-4.1",
+    });
+
+    const response = await session.sendAndWait({
+        prompt: req.body.message,
+    });
+
+    res.json({ content: response?.data.content });
+});
+```
+
+### BYOK (No GitHub Auth)
+
+Use your own API keys for the model provider. See [BYOK](./byok.md) for details.
+
+```typescript
+const client = new CopilotClient({
+    cliUrl: "localhost:4321",
+});
+
+const session = await client.createSession({
+    model: "gpt-4.1",
+    provider: {
+        type: "openai",
+        baseUrl: "https://api.openai.com/v1",
+        apiKey: process.env.OPENAI_API_KEY,
+    },
+});
+```
+
+## Common Backend Patterns
+
+### Web API with Express
+
+```mermaid
+flowchart TB
+    Users["👥 Users"] --> LB["Load Balancer"]
+    LB --> API1["API Instance 1"]
+    LB --> API2["API Instance 2"]
+
+    API1 --> CLI["Copilot CLI<br/>(headless :4321)"]
+    API2 --> CLI
+
+    CLI --> Cloud["☁️ Model Provider"]
+
+    style API1 fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style API2 fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style CLI fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+```typescript
+import express from "express";
+import { CopilotClient } from "@github/copilot-sdk";
+
+const app = express();
+app.use(express.json());
+
+// Single shared CLI connection
+const client = new CopilotClient({
+    cliUrl: process.env.CLI_URL || "localhost:4321",
+});
+
+app.post("/api/chat", async (req, res) => {
+    const { sessionId, message } = req.body;
+
+    // Create or resume session
+    let session;
+    try {
+        session = await client.resumeSession(sessionId);
+    } catch {
+        session = await client.createSession({
+            sessionId,
+            model: "gpt-4.1",
+        });
+    }
+
+    const response = await session.sendAndWait({ prompt: message });
+    res.json({
+        sessionId,
+        content: response?.data.content,
+    });
+});
+
+app.listen(3000);
+```
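+
+The Express example falls back to `localhost:4321` when `CLI_URL` is unset, so a malformed value only surfaces on the first request. A sketch of validating it at startup is shown below. `parseCliUrl` is a hypothetical helper, and it assumes the `host:port` form used throughout this guide; any other forms the SDK may accept are not covered.
+
+```typescript
+// Hypothetical helper, not part of the SDK.
+// Assumes the "host:port" form used in this guide.
+function parseCliUrl(value: string): { host: string; port: number } {
+    const match = /^([A-Za-z0-9.-]+):(\d{1,5})$/.exec(value.trim());
+    if (!match) {
+        throw new Error(`Expected host:port, got "${value}"`);
+    }
+    const port = Number(match[2]);
+    if (port < 1 || port > 65535) {
+        throw new Error(`Port out of range: ${port}`);
+    }
+    return { host: match[1], port };
+}
+```
+
+Calling this once before constructing the client makes a bad deployment fail fast instead of failing on the first chat request.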
+
+### Background Worker
+
+```typescript
+import { CopilotClient } from "@github/copilot-sdk";
+
+const client = new CopilotClient({
+    cliUrl: process.env.CLI_URL || "localhost:4321",
+});
+
+// Process jobs from a queue.
+// `Job` and `saveResult` are defined by your application.
+async function processJob(job: Job) {
+    const session = await client.createSession({
+        sessionId: `job-${job.id}`,
+        model: "gpt-4.1",
+    });
+
+    try {
+        const response = await session.sendAndWait({
+            prompt: job.prompt,
+        });
+
+        await saveResult(job.id, response?.data.content);
+    } finally {
+        await session.destroy();  // Clean up even if the job fails
+    }
+}
+```
+
+### Docker Compose Deployment
+
+```yaml
+version: "3.8"
+
+services:
+  copilot-cli:
+    image: ghcr.io/github/copilot-cli:latest
+    command: ["--headless", "--port", "4321"]
+    environment:
+      - COPILOT_GITHUB_TOKEN=${COPILOT_GITHUB_TOKEN}
+    ports:
+      - "4321:4321"
+    restart: always
+    volumes:
+      - session-data:/root/.copilot/session-state
+
+  api:
+    build: .
+    environment:
+      - CLI_URL=copilot-cli:4321
+    depends_on:
+      - copilot-cli
+    ports:
+      - "3000:3000"
+
+volumes:
+  session-data:
+```
+
+```mermaid
+flowchart TB
+    subgraph Docker["Docker Compose"]
+        API["api:3000"]
+        CLI["copilot-cli:4321"]
+        Vol["📁 session-data<br/>(persistent volume)"]
+    end
+
+    Users["👥 Users"] --> API
+    API --> CLI
+    CLI --> Vol
+
+    CLI --> Cloud["☁️ Copilot / Provider"]
+
+    style Docker fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+## Health Checks
+
+Monitor the CLI server's health:
+
+```typescript
+// Periodic health check
+async function checkCLIHealth(): Promise<boolean> {
+    try {
+        const status = await client.getStatus();
+        return status !== undefined;
+    } catch {
+        return false;
+    }
+}
+```
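+
+A single failed probe can be a transient network blip. One way to avoid flapping is to report the server as unhealthy only after several consecutive failures. The sketch below is an illustration under that assumption; `HealthTracker` and its default threshold of 3 are hypothetical and not part of the SDK.
+
+```typescript
+// Hypothetical helper, not part of the SDK.
+class HealthTracker {
+    private failures = 0;
+    private readonly threshold: number;
+
+    constructor(threshold: number = 3) {
+        this.threshold = threshold;
+    }
+
+    // Record one probe result and return whether the server still counts as healthy.
+    record(ok: boolean): boolean {
+        this.failures = ok ? 0 : this.failures + 1;
+        return this.isHealthy();
+    }
+
+    isHealthy(): boolean {
+        return this.failures < this.threshold;
+    }
+}
+```
+
+Feed it the boolean returned by `checkCLIHealth()` on each interval tick and expose `isHealthy()` from your readiness endpoint.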
+
+## Session Cleanup
+
+Backend services should actively clean up sessions to avoid resource leaks:
+
+```typescript
+// Clean up expired sessions periodically
+async function cleanupSessions(maxAgeMs: number) {
+    const sessions = await client.listSessions();
+    const now = Date.now();
+
+    for (const session of sessions) {
+        const age = now - new Date(session.createdAt).getTime();
+        if (age > maxAgeMs) {
+            await client.deleteSession(session.sessionId);
+        }
+    }
+}
+
+// Run every hour
+setInterval(() => cleanupSessions(24 * 60 * 60 * 1000), 60 * 60 * 1000);
+```
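+
+Separating the expiry decision from the delete calls makes the cleanup logic easy to unit test without a running CLI server. The sketch below assumes each listed session exposes `sessionId` and `createdAt`, as in the example above; `SessionInfo` and `selectExpired` are hypothetical names introduced here for illustration.
+
+```typescript
+// Local type mirroring the fields used in the cleanup example above (assumed shape).
+interface SessionInfo {
+    sessionId: string;
+    createdAt: string | Date;
+}
+
+// Hypothetical helper: returns the IDs of sessions older than maxAgeMs.
+function selectExpired(sessions: SessionInfo[], now: number, maxAgeMs: number): string[] {
+    const expired: string[] = [];
+    for (const session of sessions) {
+        const created = new Date(session.createdAt).getTime();
+        // Skip entries whose timestamp cannot be parsed rather than deleting them.
+        if (Number.isNaN(created)) continue;
+        if (now - created > maxAgeMs) expired.push(session.sessionId);
+    }
+    return expired;
+}
+```
+
+`cleanupSessions` can then loop over the returned IDs and call `client.deleteSession` for each one.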
+
+## Limitations
+
+| Limitation | Details |
+|------------|---------|
+| **Single CLI server = single point of failure** | See [Scaling guide](./scaling.md) for HA patterns |
+| **No built-in auth between SDK and CLI** | Secure the network path (same host, private network, VPC) and avoid publishing the CLI port publicly |
+| **Session state on local disk** | Mount persistent storage for container restarts |
+| **30-minute idle timeout** | Sessions without activity are auto-cleaned |
+
+## When to Move On
+
+| Need | Next Guide |
+|------|-----------|
+| Multiple CLI servers / high availability | [Scaling & Multi-Tenancy](./scaling.md) |
+| GitHub account auth for users | [GitHub OAuth](./github-oauth.md) |
+| Your own model keys | [BYOK](./byok.md) |
+
+## Next Steps
+
+- **[Scaling & Multi-Tenancy](./scaling.md)** — Handle more users, add redundancy
+- **[Session Persistence](../session-persistence.md)** — Resume sessions across restarts
+- **[GitHub OAuth](./github-oauth.md)** — Add user authentication
diff --git a/docs/guides/setup/bundled-cli.md b/docs/guides/setup/bundled-cli.md
new file mode 100644
index 00000000..9fc88f09
--- /dev/null
+++ b/docs/guides/setup/bundled-cli.md
@@ -0,0 +1,326 @@
+# Bundled CLI Setup
+
+Package the Copilot CLI alongside your application so users don't need to install or configure anything separately. Your app ships with everything it needs.
+
+**Best for:** Desktop apps, standalone tools, Electron apps, distributable CLI utilities.
+
+## How It Works
+
+Instead of relying on a globally installed CLI, you include the CLI binary in your application bundle. The SDK points to your bundled copy via the `cliPath` option.
+
+```mermaid
+flowchart TB
+    subgraph Bundle["Your Distributed App"]
+        App["Application Code"]
+        SDK["SDK Client"]
+        CLIBin["Copilot CLI Binary<br/>(bundled)"]
+    end
+
+    App --> SDK
+    SDK -- "cliPath" --> CLIBin
+    CLIBin -- "API calls" --> Copilot["☁️ GitHub Copilot"]
+
+    style Bundle fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+**Key characteristics:**
+- CLI binary ships with your app — no separate install needed
+- You control the exact CLI version your app uses
+- Users authenticate through your app (or use env vars / BYOK)
+- Sessions are managed per-user on their machine
+
+## Architecture: Bundled vs. Installed
+
+```mermaid
+flowchart LR
+    subgraph Installed["Standard Setup"]
+        A1["Your App"] --> SDK1["SDK"]
+        SDK1 --> CLI1["Global CLI<br/>(/usr/local/bin/copilot)"]
+    end
+
+    subgraph Bundled["Bundled Setup"]
+        A2["Your App"] --> SDK2["SDK"]
+        SDK2 --> CLI2["Bundled CLI<br/>(./vendor/copilot)"]
+    end
+
+    style Installed fill:#161b22,stroke:#8b949e,color:#c9d1d9
+    style Bundled fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+## Setup
+
+### 1. Include the CLI in Your Project
+
+The CLI is distributed as part of the `@github/copilot` npm package. You can also obtain platform-specific binaries for your distribution pipeline.
+
+```bash
+# The CLI is available from the @github/copilot package
+npm install @github/copilot
+```
+
+### 2. Point the SDK to Your Bundled CLI
+
+<details open>
+<summary><strong>Node.js / TypeScript</strong></summary>
+
+```typescript
+import { CopilotClient } from "@github/copilot-sdk";
+import path from "path";
+
+const client = new CopilotClient({
+    // Point to the CLI binary in your app bundle
+    cliPath: path.join(__dirname, "vendor", "copilot"),
+});
+
+const session = await client.createSession({ model: "gpt-4.1" });
+const response = await session.sendAndWait({ prompt: "Hello!" });
+console.log(response?.data.content);
+
+await client.stop();
+```
+
+</details>
+
+<details>
+<summary><strong>Python</strong></summary>
+
+```python
+from copilot import CopilotClient
+from pathlib import Path
+
+client = CopilotClient({
+    "cli_path": str(Path(__file__).parent / "vendor" / "copilot"),
+})
+await client.start()
+
+session = await client.create_session({"model": "gpt-4.1"})
+response = await session.send_and_wait({"prompt": "Hello!"})
+print(response.data.content)
+
+await client.stop()
+```
+
+</details>
+
+<details>
+<summary><strong>Go</strong></summary>
+
+```go
+client := copilot.NewClient(&copilot.ClientOptions{
+    CLIPath: "./vendor/copilot",
+})
+if err := client.Start(ctx); err != nil {
+    log.Fatal(err)
+}
+defer client.Stop()
+
+session, _ := client.CreateSession(ctx, &copilot.SessionConfig{Model: "gpt-4.1"})
+response, _ := session.SendAndWait(ctx, copilot.MessageOptions{Prompt: "Hello!"})
+fmt.Println(*response.Data.Content)
+```
+
+</details>
+
+<details>
+<summary><strong>.NET</strong></summary>
+
+```csharp
+var client = new CopilotClient(new CopilotClientOptions
+{
+    CliPath = Path.Combine(AppContext.BaseDirectory, "vendor", "copilot"),
+});
+
+await using var session = await client.CreateSessionAsync(
+    new SessionConfig { Model = "gpt-4.1" });
+
+var response = await session.SendAndWaitAsync(
+    new MessageOptions { Prompt = "Hello!" });
+Console.WriteLine(response?.Data.Content);
+```
+
+</details>
+
+## Authentication Strategies
+
+When bundling, you need to decide how your users will authenticate. Here are the common patterns:
+
+```mermaid
+flowchart TB
+    App["Bundled App"]
+
+    App --> A["User signs in to CLI<br/>(keychain credentials)"]
+    App --> B["App provides token<br/>(OAuth / env var)"]
+    App --> C["BYOK<br/>(your own API keys)"]
+
+    A --> Note1["User runs 'copilot' once<br/>to authenticate"]
+    B --> Note2["Your app handles login<br/>and passes token"]
+    C --> Note3["No GitHub auth needed<br/>Uses your model provider"]
+
+    style App fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+### Option A: User's Signed-In Credentials (Simplest)
+
+The user signs in to the CLI once, and your bundled app uses those credentials. No extra code needed — this is the default behavior.
+
+```typescript
+const client = new CopilotClient({
+    cliPath: path.join(__dirname, "vendor", "copilot"),
+    // Default: uses signed-in user credentials
+});
+```
+
+### Option B: Token via Environment Variable
+
+Ship your app with instructions to set a token, or set it programmatically:
+
+```typescript
+const client = new CopilotClient({
+    cliPath: path.join(__dirname, "vendor", "copilot"),
+    env: {
+        COPILOT_GITHUB_TOKEN: getUserToken(),  // Your app provides the token
+    },
+});
+```
+
+### Option C: BYOK (No GitHub Auth Needed)
+
+If you manage your own model provider keys, users don't need GitHub accounts at all:
+
+```typescript
+const client = new CopilotClient({
+    cliPath: path.join(__dirname, "vendor", "copilot"),
+});
+
+const session = await client.createSession({
+    model: "gpt-4.1",
+    provider: {
+        type: "openai",
+        baseUrl: "https://api.openai.com/v1",
+        apiKey: process.env.OPENAI_API_KEY,
+    },
+});
+```
+
+See the **[BYOK guide](./byok.md)** for full details.
+
+## Session Management
+
+Bundled apps typically want named sessions so users can resume conversations:
+
+```typescript
+const client = new CopilotClient({
+    cliPath: path.join(__dirname, "vendor", "copilot"),
+});
+
+// Create a session tied to the user's project
+const sessionId = `project-${projectName}`;
+const session = await client.createSession({
+    sessionId,
+    model: "gpt-4.1",
+});
+
+// User closes app...
+// Later, resume where they left off
+const resumed = await client.resumeSession(sessionId);
+```
+
+Session state persists at `~/.copilot/session-state/{sessionId}/`.
+
+## Distribution Patterns
+
+### Desktop App (Electron, Tauri)
+
+```mermaid
+flowchart TB
+    subgraph Electron["Desktop App Package"]
+        UI["App UI"] --> Main["Main Process"]
+        Main --> SDK["SDK Client"]
+        SDK --> CLI["Copilot CLI<br/>(in app resources)"]
+    end
+    CLI --> Cloud["☁️ GitHub Copilot"]
+
+    style Electron fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+Include the CLI binary in your app's resources directory:
+
+```typescript
+import { app } from "electron";
+import path from "path";
+
+const cliPath = path.join(
+    app.isPackaged ? process.resourcesPath : __dirname,
+    "copilot"
+);
+
+const client = new CopilotClient({ cliPath });
+```
+
+### CLI Tool
+
+For distributable CLI tools, resolve the path relative to your binary:
+
+```typescript
+import { fileURLToPath } from "url";
+import path from "path";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+const cliPath = path.join(__dirname, "..", "vendor", "copilot");
+
+const client = new CopilotClient({ cliPath });
+```
+
+## Platform-Specific Binaries
+
+When distributing for multiple platforms, include the correct binary for each:
+
+```
+my-app/
+├── vendor/
+│   ├── copilot-darwin-arm64    # macOS Apple Silicon
+│   ├── copilot-darwin-x64      # macOS Intel
+│   ├── copilot-linux-x64       # Linux x64
+│   └── copilot-win-x64.exe     # Windows x64
+└── src/
+    └── index.ts
+```
+
+```typescript
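+import os from "os";
+import path from "path";
+
+function getCLIPath(): string {
+    // process.platform reports Windows as "win32"; the vendored file uses "win"
+    const platform = process.platform === "win32" ? "win" : process.platform;
+    const arch = os.arch();              // "arm64", "x64"
+    const ext = process.platform === "win32" ? ".exe" : "";
+    const name = `copilot-${platform}-${arch}${ext}`;
+    // vendor/ sits next to src/ in the tree above
+    return path.join(__dirname, "..", "vendor", name);
+}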
+
+const client = new CopilotClient({
+    cliPath: getCLIPath(),
+});
+```
+
+## Limitations
+
+| Limitation | Details |
+|------------|---------|
+| **Bundle size** | CLI binary adds to your app's distribution size |
+| **Updates** | You manage CLI version updates in your release cycle |
+| **Platform builds** | Need separate binaries for each OS/architecture |
+| **Single user** | Each bundled CLI instance serves one user |
+
+## When to Move On
+
+| Need | Next Guide |
+|------|-----------|
+| Users signing in with GitHub accounts | [GitHub OAuth](./github-oauth.md) |
+| Run on a server instead of user machines | [Backend Services](./backend-services.md) |
+| Use your own model keys | [BYOK](./byok.md) |
+
+## Next Steps
+
+- **[BYOK guide](./byok.md)** — Use your own model provider keys
+- **[Session Persistence](../session-persistence.md)** — Advanced session management
+- **[Getting Started tutorial](../../getting-started.md)** — Build a complete app
diff --git a/docs/guides/setup/byok.md b/docs/guides/setup/byok.md
new file mode 100644
index 00000000..3a6ce596
--- /dev/null
+++ b/docs/guides/setup/byok.md
@@ -0,0 +1,359 @@
+# BYOK (Bring Your Own Key) Setup
+
+Use your own model provider API keys instead of GitHub Copilot authentication. You control the identity layer, the model provider, and the billing — the SDK provides the agent runtime.
+
+**Best for:** Apps where users don't have GitHub accounts, enterprise deployments with existing model provider contracts, apps needing full control over identity and billing.
+
+## How It Works
+
+With BYOK, the SDK uses the Copilot CLI as an agent runtime only — it doesn't call GitHub's Copilot API. Instead, model requests go directly to your configured provider (OpenAI, Azure AI Foundry, Anthropic, etc.).
+
+```mermaid
+flowchart LR
+    subgraph App["Your Application"]
+        SDK["SDK Client"]
+        IdP["Your Identity<br/>Provider"]
+    end
+
+    subgraph CLI["Copilot CLI"]
+        Runtime["Agent Runtime"]
+    end
+
+    subgraph Provider["Your Model Provider"]
+        API["OpenAI / Azure /<br/>Anthropic / Ollama"]
+    end
+
+    IdP -.->|"authenticates<br/>users"| SDK
+    SDK --> Runtime
+    Runtime -- "API key" --> API
+
+    style App fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style CLI fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+    style Provider fill:#161b22,stroke:#f0883e,color:#c9d1d9
+```
+
+**Key characteristics:**
+- No GitHub Copilot subscription needed
+- No GitHub account needed for end users
+- You manage authentication and identity yourself
+- Model requests go to your provider, billed to your account
+- Full agent runtime capabilities (tools, sessions, streaming) still work
+
+## Architecture: GitHub Auth vs. BYOK
+
+```mermaid
+flowchart TB
+    subgraph GitHub["GitHub Auth Path"]
+        direction LR
+        G1["User"] --> G2["GitHub OAuth"]
+        G2 --> G3["SDK + CLI"]
+        G3 --> G4["☁️ Copilot API"]
+    end
+
+    subgraph BYOK["BYOK Path"]
+        direction LR
+        B1["User"] --> B2["Your Auth"]
+        B2 --> B3["SDK + CLI"]
+        B3 --> B4["☁️ Your Provider"]
+    end
+
+    style GitHub fill:#161b22,stroke:#8b949e,color:#c9d1d9
+    style BYOK fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+## Quick Start
+
+<details open>
+<summary><strong>Node.js / TypeScript</strong></summary>
+
+```typescript
+import { CopilotClient } from "@github/copilot-sdk";
+
+const client = new CopilotClient();
+
+const session = await client.createSession({
+    model: "gpt-4.1",
+    provider: {
+        type: "openai",
+        baseUrl: "https://api.openai.com/v1",
+        apiKey: process.env.OPENAI_API_KEY,
+    },
+});
+
+const response = await session.sendAndWait({ prompt: "Hello!" });
+console.log(response?.data.content);
+
+await client.stop();
+```
+
+</details>
+
+<details>
+<summary><strong>Python</strong></summary>
+
+```python
+import os
+from copilot import CopilotClient
+
+client = CopilotClient()
+await client.start()
+
+session = await client.create_session({
+    "model": "gpt-4.1",
+    "provider": {
+        "type": "openai",
+        "base_url": "https://api.openai.com/v1",
+        "api_key": os.environ["OPENAI_API_KEY"],
+    },
+})
+
+response = await session.send_and_wait({"prompt": "Hello!"})
+print(response.data.content)
+
+await client.stop()
+```
+
+</details>
+
+<details>
+<summary><strong>Go</strong></summary>
+
+```go
+client := copilot.NewClient(nil)
+if err := client.Start(ctx); err != nil {
+    log.Fatal(err)
+}
+defer client.Stop()
+
+session, _ := client.CreateSession(ctx, &copilot.SessionConfig{
+    Model: "gpt-4.1",
+    Provider: &copilot.ProviderConfig{
+        Type:    "openai",
+        BaseURL: "https://api.openai.com/v1",
+        APIKey:  os.Getenv("OPENAI_API_KEY"),
+    },
+})
+
+response, _ := session.SendAndWait(ctx, copilot.MessageOptions{Prompt: "Hello!"})
+fmt.Println(*response.Data.Content)
+```
+
+</details>
+
+<details>
+<summary><strong>.NET</strong></summary>
+
+```csharp
+await using var client = new CopilotClient();
+await using var session = await client.CreateSessionAsync(new SessionConfig
+{
+    Model = "gpt-4.1",
+    Provider = new ProviderConfig
+    {
+        Type = "openai",
+        BaseUrl = "https://api.openai.com/v1",
+        ApiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY"),
+    },
+});
+
+var response = await session.SendAndWaitAsync(
+    new MessageOptions { Prompt = "Hello!" });
+Console.WriteLine(response?.Data.Content);
+```
+
+</details>
+
+## Provider Configurations
+
+### OpenAI
+
+```typescript
+provider: {
+    type: "openai",
+    baseUrl: "https://api.openai.com/v1",
+    apiKey: process.env.OPENAI_API_KEY,
+}
+```
+
+### Azure AI Foundry
+
+```typescript
+provider: {
+    type: "openai",
+    baseUrl: "https://your-resource.openai.azure.com/openai/v1/",
+    apiKey: process.env.FOUNDRY_API_KEY,
+    wireApi: "responses",  // For GPT-5 series models
+}
+```
+
+### Azure OpenAI (Native)
+
+```typescript
+provider: {
+    type: "azure",
+    baseUrl: "https://your-resource.openai.azure.com",
+    apiKey: process.env.AZURE_OPENAI_KEY,
+    azure: { apiVersion: "2024-10-21" },
+}
+```
+
+### Anthropic
+
+```typescript
+provider: {
+    type: "anthropic",
+    baseUrl: "https://api.anthropic.com",
+    apiKey: process.env.ANTHROPIC_API_KEY,
+}
+```
+
+### Ollama (Local)
+
+```typescript
+provider: {
+    type: "openai",
+    baseUrl: "http://localhost:11434/v1",
+    // No API key needed for local Ollama
+}
+```
+
+## Managing Identity Yourself
+
+With BYOK, you're responsible for authentication. Here are common patterns:
+
+### Pattern 1: Your Own Identity Provider
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant App as Your App
+    participant IdP as Your Identity Provider
+    participant SDK as SDK + CLI
+    participant LLM as Model Provider
+
+    User->>App: Login
+    App->>IdP: Authenticate user
+    IdP-->>App: User identity + permissions
+
+    App->>App: Look up API key for user's tier
+    App->>SDK: Create session (with provider config)
+    SDK->>LLM: Model request (your API key)
+    LLM-->>SDK: Response
+    SDK-->>App: Result
+    App-->>User: Display
+```
+
+```typescript
+// Your app handles auth, then creates sessions with your API key
+app.post("/chat", authMiddleware, async (req, res) => {
+    const user = req.user;  // From your auth middleware
+
+    // Use your API key — not the user's
+    const session = await getOrCreateSession(user.id, {
+        model: getModelForTier(user.tier),  // "gpt-4.1" for pro, etc.
+        provider: {
+            type: "openai",
+            baseUrl: "https://api.openai.com/v1",
+            apiKey: process.env.OPENAI_API_KEY,  // Your key, your billing
+        },
+    });
+
+    const response = await session.sendAndWait({ prompt: req.body.message });
+    res.json({ content: response?.data.content });
+});
+```
+
+### Pattern 2: Per-Customer API Keys
+
+For B2B apps where each customer brings their own model provider keys:
+
+```mermaid
+flowchart TB
+    subgraph Customers
+        C1["Customer A<br/>(OpenAI key)"]
+        C2["Customer B<br/>(Azure key)"]
+        C3["Customer C<br/>(Anthropic key)"]
+    end
+
+    subgraph App["Your App"]
+        Router["Request Router"]
+        KS["Key Store<br/>(encrypted)"]
+    end
+
+    C1 --> Router
+    C2 --> Router
+    C3 --> Router
+
+    Router --> KS
+    KS --> SDK1["SDK → OpenAI"]
+    KS --> SDK2["SDK → Azure"]
+    KS --> SDK3["SDK → Anthropic"]
+
+    style App fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+```typescript
+async function createSessionForCustomer(customerId: string) {
+    const config = await keyStore.getProviderConfig(customerId);
+
+    return client.createSession({
+        sessionId: `customer-${customerId}-${Date.now()}`,
+        model: config.model,
+        provider: {
+            type: config.providerType,
+            baseUrl: config.baseUrl,
+            apiKey: config.apiKey,
+        },
+    });
+}
+```
+
+## Session Persistence with BYOK
+
+When resuming BYOK sessions, you **must** re-provide the provider configuration. For security, API keys are never persisted to disk.
+
+```typescript
+// Create session
+const session = await client.createSession({
+    sessionId: "task-123",
+    model: "gpt-4.1",
+    provider: {
+        type: "openai",
+        baseUrl: "https://api.openai.com/v1",
+        apiKey: process.env.OPENAI_API_KEY,
+    },
+});
+
+// Resume later — must re-provide provider config
+const resumed = await client.resumeSession("task-123", {
+    provider: {
+        type: "openai",
+        baseUrl: "https://api.openai.com/v1",
+        apiKey: process.env.OPENAI_API_KEY,  // Required again
+    },
+});
+```
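+
+Because the provider block has to be supplied on both create and resume, building it in one place keeps the two calls from drifting apart. A minimal sketch follows. `ProviderConfig` here is a local type mirroring the fields shown in this guide, not the SDK's exported type, and `openAIProviderFromEnv` is a hypothetical helper.
+
+```typescript
+// Local type mirroring the provider fields used in this guide (assumed shape).
+interface ProviderConfig {
+    type: string;
+    baseUrl: string;
+    apiKey?: string;
+}
+
+// Hypothetical helper: builds the OpenAI provider block from environment variables.
+function openAIProviderFromEnv(env: Record<string, string | undefined>): ProviderConfig {
+    const apiKey = env.OPENAI_API_KEY;
+    if (!apiKey) {
+        // Fail early instead of sending requests without a key.
+        throw new Error("OPENAI_API_KEY is not set");
+    }
+    return {
+        type: "openai",
+        baseUrl: "https://api.openai.com/v1",
+        apiKey,
+    };
+}
+```
+
+Pass the result of `openAIProviderFromEnv(process.env)` as `provider` to both `createSession` and `resumeSession`.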
+
+## Limitations
+
+| Limitation | Details |
+|------------|---------|
+| **Static credentials only** | API keys or bearer tokens — no Entra ID, OIDC, or managed identities |
+| **No auto-refresh** | If a bearer token expires, you must create a new session |
+| **Your billing** | All model usage is billed to your provider account |
+| **Model availability** | Limited to what your provider offers |
+| **Keys not persisted** | Must re-provide on session resume |
+
+For the full BYOK reference, see the **[BYOK documentation](../../auth/byok.md)**.
+
+## When to Move On
+
+| Need | Next Guide |
+|------|-----------|
+| Run the SDK on a server | [Backend Services](./backend-services.md) |
+| Multiple users with GitHub accounts | [GitHub OAuth](./github-oauth.md) |
+| Handle many concurrent users | [Scaling & Multi-Tenancy](./scaling.md) |
+
+## Next Steps
+
+- **[BYOK reference](../../auth/byok.md)** — Full provider config details and troubleshooting
+- **[Backend Services](./backend-services.md)** — Deploy the SDK server-side
+- **[Scaling & Multi-Tenancy](./scaling.md)** — Serve many customers at scale
diff --git a/docs/guides/setup/github-oauth.md b/docs/guides/setup/github-oauth.md
new file mode 100644
index 00000000..a7aac473
--- /dev/null
+++ b/docs/guides/setup/github-oauth.md
@@ -0,0 +1,383 @@
+# GitHub OAuth Setup
+
+Let users authenticate with their GitHub accounts to use Copilot through your application. This supports individual accounts, organization memberships, and enterprise identities.
+
+**Best for:** Multi-user apps, internal tools with org access control, SaaS products, apps where users have GitHub accounts.
+
+## How It Works
+
+You create a GitHub OAuth App (or GitHub App), users authorize it, and you pass their access token to the SDK. Copilot requests are made on behalf of each authenticated user, using their Copilot subscription.
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant App as Your App
+    participant GH as GitHub
+    participant SDK as SDK Client
+    participant CLI as Copilot CLI
+    participant API as Copilot API
+
+    User->>App: Click "Sign in with GitHub"
+    App->>GH: Redirect to OAuth authorize
+    GH->>User: "Authorize this app?"
+    User->>GH: Approve
+    GH->>App: Authorization code
+    App->>GH: Exchange code for token
+    GH-->>App: Access token (gho_xxx)
+
+    App->>SDK: Create client with token
+    SDK->>CLI: Start with githubToken
+    CLI->>API: Request (as user)
+    API-->>CLI: Response
+    CLI-->>SDK: Result
+    SDK-->>App: Display to user
+```
+
+**Key characteristics:**
+- Each user authenticates with their own GitHub account
+- Copilot usage is billed to each user's subscription
+- Supports GitHub organizations and enterprise accounts
+- Your app never handles model API keys — GitHub manages everything
+
+## Architecture
+
+```mermaid
+flowchart TB
+    subgraph Users["Users"]
+        U1["👤 User A<br/>(Org Member)"]
+        U2["👤 User B<br/>(Enterprise)"]
+        U3["👤 User C<br/>(Personal)"]
+    end
+
+    subgraph App["Your Application"]
+        OAuth["OAuth Flow"]
+        TokenStore["Token Store"]
+        SDK["SDK Client(s)"]
+    end
+
+    subgraph CLI["Copilot CLI"]
+        RPC["JSON-RPC"]
+    end
+
+    U1 --> OAuth
+    U2 --> OAuth
+    U3 --> OAuth
+    OAuth --> TokenStore
+    TokenStore --> SDK
+    SDK --> RPC
+    RPC --> Copilot["☁️ GitHub Copilot"]
+
+    style Users fill:#161b22,stroke:#8b949e,color:#c9d1d9
+    style App fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style CLI fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+## Step 1: Create a GitHub OAuth App
+
+1. Go to **GitHub Settings → Developer Settings → OAuth Apps → New OAuth App**
+   (or for organizations: **Organization Settings → Developer Settings**)
+
+2. Fill in:
+   - **Application name**: Your app's name
+   - **Homepage URL**: Your app's URL
+   - **Authorization callback URL**: Your OAuth callback endpoint (e.g., `https://yourapp.com/auth/callback`)
+
+3. Note your **Client ID** and generate a **Client Secret**
+
+> **GitHub App vs OAuth App:** Both work. GitHub Apps offer finer-grained permissions and are recommended for new projects. OAuth Apps are simpler to set up. The token flow is the same from the SDK's perspective.
+
+## Step 2: Implement the OAuth Flow
+
+Your application handles the standard GitHub OAuth flow. Here's the server-side token exchange:
+
+```typescript
+// Server-side: Exchange authorization code for user token
+async function handleOAuthCallback(code: string): Promise<string> {
+    const response = await fetch("https://github.com/login/oauth/access_token", {
+        method: "POST",
+        headers: {
+            "Content-Type": "application/json",
+            Accept: "application/json",
+        },
+        body: JSON.stringify({
+            client_id: process.env.GITHUB_CLIENT_ID,
+            client_secret: process.env.GITHUB_CLIENT_SECRET,
+            code,
+        }),
+    });
+
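+    // GitHub can report failures (e.g. an expired code) in the JSON body
+    // as an `error` field, so check the body as well as the HTTP status.
+    const data = await response.json();
+    if (!response.ok || data.error || !data.access_token) {
+        throw new Error(`OAuth token exchange failed: ${data.error ?? response.status}`);
+    }
+    return data.access_token; // gho_xxxx or ghu_xxxx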
+}
+```
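+
+The callback above is the second half of the flow. The first half, redirecting the user to GitHub's authorize endpoint, could be sketched as follows. The helper names (`buildAuthorizeUrl`, `isValidState`) and the `read:org` scope in the usage note are illustrative assumptions, not part of the SDK. The `state` value is a random string you store in the user's session before redirecting and compare on the callback to guard against CSRF.
+
+```typescript
+import { randomBytes, timingSafeEqual } from "node:crypto";
+
+// Hypothetical helper: builds the URL the user is redirected to for sign-in.
+function buildAuthorizeUrl(
+    clientId: string,
+    redirectUri: string,
+    scopes: string[]
+): { url: string; state: string } {
+    // Random, unguessable value tied to this sign-in attempt
+    const state = randomBytes(16).toString("hex");
+
+    const url = new URL("https://github.com/login/oauth/authorize");
+    url.searchParams.set("client_id", clientId);
+    url.searchParams.set("redirect_uri", redirectUri);
+    url.searchParams.set("scope", scopes.join(" "));
+    url.searchParams.set("state", state);
+
+    return { url: url.toString(), state };
+}
+
+// Hypothetical helper: compares the stored state with the one GitHub sent back.
+function isValidState(expected: string, received: string): boolean {
+    const a = Buffer.from(expected);
+    const b = Buffer.from(received);
+    // timingSafeEqual requires equal lengths, so check that first
+    return a.length === b.length && timingSafeEqual(a, b);
+}
+```
+
+A route handler would call `buildAuthorizeUrl(process.env.GITHUB_CLIENT_ID!, "https://yourapp.com/auth/callback", ["read:org"])`, save `state` in the session, redirect to `url`, and reject any callback where `isValidState` returns `false` before calling `handleOAuthCallback`.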
+
+## Step 3: Pass the Token to the SDK
+
+Create an SDK client for each authenticated user, passing their token:
+
+<details open>
+<summary><strong>Node.js / TypeScript</strong></summary>
+
+```typescript
+import { CopilotClient } from "@github/copilot-sdk";
+
+// Create a client for an authenticated user
+function createClientForUser(userToken: string): CopilotClient {
+    return new CopilotClient({
+        githubToken: userToken,
+        useLoggedInUser: false,  // Don't fall back to CLI login
+    });
+}
+
+// Usage
+const client = createClientForUser("gho_user_access_token");
+const session = await client.createSession({
+    sessionId: `user-${userId}-session`,
+    model: "gpt-4.1",
+});
+
+const response = await session.sendAndWait({ prompt: "Hello!" });
+```
+
+</details>
+
+<details>
+<summary><strong>Python</strong></summary>
+
+```python
+from copilot import CopilotClient
+
+def create_client_for_user(user_token: str) -> CopilotClient:
+    return CopilotClient({
+        "github_token": user_token,
+        "use_logged_in_user": False,
+    })
+
+# Usage
+client = create_client_for_user("gho_user_access_token")
+await client.start()
+
+session = await client.create_session({
+    "session_id": f"user-{user_id}-session",
+    "model": "gpt-4.1",
+})
+
+response = await session.send_and_wait({"prompt": "Hello!"})
+```
+
+</details>
+
+<details>
+<summary><strong>Go</strong></summary>
+
+```go
+func createClientForUser(userToken string) *copilot.Client {
+    return copilot.NewClient(&copilot.ClientOptions{
+        GithubToken:     userToken,
+        UseLoggedInUser: copilot.Bool(false),
+    })
+}
+
+// Usage
+client := createClientForUser("gho_user_access_token")
+if err := client.Start(ctx); err != nil {
+    log.Fatal(err)
+}
+defer client.Stop()
+
+session, _ := client.CreateSession(ctx, &copilot.SessionConfig{
+    SessionID: fmt.Sprintf("user-%s-session", userID),
+    Model:     "gpt-4.1",
+})
+response, _ := session.SendAndWait(ctx, copilot.MessageOptions{Prompt: "Hello!"})
+```
+
+</details>
+
+<details>
+<summary><strong>.NET</strong></summary>
+
+```csharp
+CopilotClient CreateClientForUser(string userToken) =>
+    new CopilotClient(new CopilotClientOptions
+    {
+        GithubToken = userToken,
+        UseLoggedInUser = false,
+    });
+
+// Usage
+await using var client = CreateClientForUser("gho_user_access_token");
+await using var session = await client.CreateSessionAsync(new SessionConfig
+{
+    SessionId = $"user-{userId}-session",
+    Model = "gpt-4.1",
+});
+
+var response = await session.SendAndWaitAsync(
+    new MessageOptions { Prompt = "Hello!" });
+```
+
+</details>
+
+## Enterprise & Organization Access
+
+GitHub OAuth fits enterprise scenarios well. Once a user authenticates with GitHub, your app can use their token to look up their organization memberships and enterprise associations.
+
+```mermaid
+flowchart TB
+    subgraph Enterprise["GitHub Enterprise"]
+        Org1["Org: Engineering"]
+        Org2["Org: Data Science"]
+    end
+
+    subgraph Users
+        U1["👤 Alice<br/>(Engineering)"]
+        U2["👤 Bob<br/>(Data Science)"]
+    end
+
+    U1 -.->|member| Org1
+    U2 -.->|member| Org2
+
+    subgraph App["Your Internal App"]
+        OAuth["OAuth + Org Check"]
+        SDK["SDK Client"]
+    end
+
+    U1 --> OAuth
+    U2 --> OAuth
+    OAuth -->|"Verify org membership"| GH["GitHub API"]
+    OAuth --> SDK
+
+    style Enterprise fill:#161b22,stroke:#f0883e,color:#c9d1d9
+    style App fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+### Verify Organization Membership
+
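+After OAuth, check that the user belongs to your organization. The `/user/orgs` endpoint only lists organizations the token is authorized to see, so an OAuth App typically needs to request the `read:org` scope during authorization: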
+
+```typescript
+async function verifyOrgMembership(
+    token: string,
+    requiredOrg: string
+): Promise<boolean> {
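+    // per_page=100 is the API maximum; users in more than 100 orgs need pagination
+    const response = await fetch("https://api.github.com/user/orgs?per_page=100", {
+        headers: { Authorization: `Bearer ${token}` },
+    });
+    if (!response.ok) {
+        throw new Error(`GitHub API request failed: ${response.status}`);
+    }
+    const orgs: Array<{ login: string }> = await response.json();
+    // Organization logins are case-insensitive
+    return orgs.some((org) => org.login.toLowerCase() === requiredOrg.toLowerCase());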
+}
+
+// In your auth flow
+const token = await handleOAuthCallback(code);
+if (!await verifyOrgMembership(token, "my-company")) {
+    throw new Error("User is not a member of the required organization");
+}
+const client = createClientForUser(token);
+```
+
+### Enterprise Managed Users (EMU)
+
+For GitHub Enterprise Managed Users, the flow is identical — EMU users authenticate through GitHub OAuth like any other user. Their enterprise policies (IP restrictions, SAML SSO) are enforced by GitHub automatically.
+
+```typescript
+// No special SDK configuration needed for EMU
+// Enterprise policies are enforced server-side by GitHub
+const client = new CopilotClient({
+    githubToken: emuUserToken,  // Works the same as regular tokens
+    useLoggedInUser: false,
+});
+```
+
+## Supported Token Types
+
+| Token Prefix | Source | Works? |
+|-------------|--------|--------|
+| `gho_` | OAuth user access token | ✅ |
+| `ghu_` | GitHub App user access token | ✅ |
+| `github_pat_` | Fine-grained personal access token | ✅ |
+| `ghp_` | Classic personal access token | ❌ (deprecated) |
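+
+If you accept tokens from users directly (for example, a pasted personal access token), a prefix check can reject unsupported tokens before they reach the SDK. The sketch below mirrors the table above; `classifyToken` and the `TokenKind` names are hypothetical, and treating unknown prefixes as unsupported is an assumption of this sketch rather than SDK behavior.
+
+```typescript
+type TokenKind = "oauth" | "github-app-user" | "fine-grained-pat" | "classic-pat" | "unknown";
+
+// Prefixes and support status mirror the table above.
+const TOKEN_PREFIXES: Array<{ prefix: string; kind: TokenKind; supported: boolean }> = [
+    { prefix: "gho_", kind: "oauth", supported: true },
+    { prefix: "ghu_", kind: "github-app-user", supported: true },
+    { prefix: "github_pat_", kind: "fine-grained-pat", supported: true },
+    { prefix: "ghp_", kind: "classic-pat", supported: false },
+];
+
+// Hypothetical helper: identifies a token by prefix without contacting GitHub.
+function classifyToken(token: string): { kind: TokenKind; supported: boolean } {
+    const match = TOKEN_PREFIXES.find((entry) => token.startsWith(entry.prefix));
+    // Unknown prefixes are rejected here; that is a choice of this sketch
+    return match
+        ? { kind: match.kind, supported: match.supported }
+        : { kind: "unknown", supported: false };
+}
+```
+
+A prefix check says nothing about whether the token is valid or unexpired; it only catches the wrong kind of token early.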
+
+## Token Lifecycle
+
+```mermaid
+flowchart LR
+    A["User authorizes"] --> B["Token issued<br/>(gho_xxx)"]
+    B --> C{"Token valid?"}
+    C -->|Yes| D["SDK uses token"]
+    C -->|No| E["Refresh or<br/>re-authorize"]
+    E --> B
+    D --> F{"User revokes<br/>or token expires?"}
+    F -->|Yes| E
+    F -->|No| D
+
+    style A fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+    style E fill:#0d1117,stroke:#f0883e,color:#c9d1d9
+```
+
+**Important:** Your application is responsible for token storage, refresh, and expiration handling. The SDK uses whatever token you provide — it doesn't manage the OAuth lifecycle.
+
+### Token Refresh Pattern
+
+```typescript
+async function getOrRefreshToken(userId: string): Promise<string> {
+    const stored = await tokenStore.get(userId);
+
+    if (stored && !isExpired(stored)) {
+        return stored.accessToken;
+    }
+
+    if (stored?.refreshToken) {
+        const refreshed = await refreshGitHubToken(stored.refreshToken);
+        await tokenStore.set(userId, refreshed);
+        return refreshed.accessToken;
+    }
+
+    throw new Error("User must re-authenticate");
+}
+```
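+
+The pattern above leaves `isExpired` and the stored token shape undefined. One way to fill them in is sketched below. The `StoredToken` shape, the `toStoredToken` helper, and the 60-second safety margin are assumptions for illustration. The snake_case fields follow GitHub's token response, where `expires_in` and `refresh_token` are only present when token expiration is enabled for the app.
+
+```typescript
+// Hypothetical shape for what the token store keeps per user.
+interface StoredToken {
+    accessToken: string;
+    refreshToken?: string;
+    expiresAt?: number; // epoch milliseconds; undefined means non-expiring
+}
+
+// Subset of GitHub's token response used by this sketch.
+interface TokenResponse {
+    access_token: string;
+    expires_in?: number; // seconds until the access token expires
+    refresh_token?: string;
+}
+
+// Convert the relative expiry into an absolute timestamp at issue time.
+function toStoredToken(res: TokenResponse, now: number = Date.now()): StoredToken {
+    return {
+        accessToken: res.access_token,
+        refreshToken: res.refresh_token,
+        expiresAt: res.expires_in !== undefined ? now + res.expires_in * 1000 : undefined,
+    };
+}
+
+// Treat tokens as expired slightly early so a request never starts
+// with a token that lapses mid-flight.
+function isExpired(token: StoredToken, now: number = Date.now(), skewMs = 60_000): boolean {
+    return token.expiresAt !== undefined && now >= token.expiresAt - skewMs;
+}
+```
+
+Storing an absolute `expiresAt` rather than the raw `expires_in` means the check still works after the token has sat in the store for a while.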
+
+## Multi-User Patterns
+
+### One Client Per User (Recommended)
+
+Each user gets their own SDK client with their own token. This provides the strongest isolation.
+
+```typescript
+const clients = new Map<string, CopilotClient>();
+
+function getClientForUser(userId: string, token: string): CopilotClient {
+    if (!clients.has(userId)) {
+        clients.set(userId, new CopilotClient({
+            githubToken: token,
+            useLoggedInUser: false,
+        }));
+    }
+    return clients.get(userId)!;
+}
+```
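+
+One caveat with the cache above: once a client exists for a user, a later call with a refreshed token still returns the client built with the old one. A possible variation, sketched below, remembers which token each client was created with and replaces the client when the token changes. `UserClientCache` is a hypothetical name, and the class is generic so the sketch runs without the SDK installed.
+
+```typescript
+// Generic over the client type; in practice T would be CopilotClient.
+class UserClientCache<T> {
+    private entries = new Map<string, { token: string; client: T }>();
+    private create: (token: string) => T;
+    private dispose: (client: T) => Promise<unknown> | void;
+
+    constructor(
+        create: (token: string) => T,
+        dispose: (client: T) => Promise<unknown> | void
+    ) {
+        this.create = create;
+        this.dispose = dispose;
+    }
+
+    async get(userId: string, token: string): Promise<T> {
+        const entry = this.entries.get(userId);
+        if (entry && entry.token === token) {
+            return entry.client;
+        }
+        if (entry) {
+            // Token was refreshed: shut down the client holding the old one
+            await this.dispose(entry.client);
+        }
+        const client = this.create(token);
+        this.entries.set(userId, { token, client });
+        return client;
+    }
+
+    // Call on logout or after a period of inactivity
+    async remove(userId: string): Promise<void> {
+        const entry = this.entries.get(userId);
+        if (entry) {
+            this.entries.delete(userId);
+            await this.dispose(entry.client);
+        }
+    }
+}
+```
+
+With the SDK, the factory would be `(token) => new CopilotClient({ githubToken: token, useLoggedInUser: false })` and the disposer `(client) => client.stop()`.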
+
+### Shared CLI with Per-Request Tokens
+
+For a lighter resource footprint, you can run a single external CLI server and pass tokens per session. See [Backend Services](./backend-services.md) for this pattern.
+
+## Limitations
+
+| Limitation | Details |
+|------------|---------|
+| **Copilot subscription required** | Each user needs an active Copilot subscription |
+| **Token management is your responsibility** | Store, refresh, and handle expiration |
+| **GitHub account required** | Users must have GitHub accounts |
+| **Rate limits per user** | Subject to each user's Copilot rate limits |
+
+## When to Move On
+
+| Need | Next Guide |
+|------|-----------|
+| Users without GitHub accounts | [BYOK](./byok.md) |
+| Run the SDK on servers | [Backend Services](./backend-services.md) |
+| Handle many concurrent users | [Scaling & Multi-Tenancy](./scaling.md) |
+
+## Next Steps
+
+- **[Authentication docs](../../auth/index.md)** — Full auth method reference
+- **[Backend Services](./backend-services.md)** — Run the SDK server-side
+- **[Scaling & Multi-Tenancy](./scaling.md)** — Handle many users at scale
diff --git a/docs/guides/setup/index.md b/docs/guides/setup/index.md
new file mode 100644
index 00000000..54e4a2db
--- /dev/null
+++ b/docs/guides/setup/index.md
@@ -0,0 +1,142 @@
+# Setup Guides
+
+These guides walk you through configuring the Copilot SDK for your specific use case — from personal side projects to production platforms serving thousands of users.
+
+## Architecture at a Glance
+
+Every Copilot SDK integration follows the same core pattern: your application talks to the SDK, which communicates with the Copilot CLI over JSON-RPC. What changes across setups is **where the CLI runs**, **how users authenticate**, and **how sessions are managed**.
+
+```mermaid
+flowchart TB
+    subgraph YourApp["Your Application"]
+        SDK["SDK Client"]
+    end
+
+    subgraph CLI["Copilot CLI"]
+        direction TB
+        RPC["JSON-RPC Server"]
+        Auth["Authentication"]
+        Sessions["Session Manager"]
+        Models["Model Provider"]
+    end
+
+    SDK -- "JSON-RPC<br/>(stdio or TCP)" --> RPC
+    RPC --> Auth
+    RPC --> Sessions
+    Auth --> Models
+
+    style YourApp fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style CLI fill:#161b22,stroke:#3fb950,color:#c9d1d9
+```
+
+The setup guides below help you configure each layer for your scenario.
+
+## Who Are You?
+
+### 🧑‍💻 Hobbyist
+
+You're building a personal assistant, side project, or experimental app. You want the simplest path to getting Copilot in your code.
+
+**Start with:**
+1. **[Local CLI](./local-cli.md)** — Use the CLI already signed in on your machine
+2. **[Bundled CLI](./bundled-cli.md)** — Package everything into a standalone app
+
+### 🏢 Internal App Developer
+
+You're building tools for your team or company. Users are employees who need to authenticate with their enterprise GitHub accounts or org memberships.
+
+**Start with:**
+1. **[GitHub OAuth](./github-oauth.md)** — Let employees sign in with their GitHub accounts
+2. **[Backend Services](./backend-services.md)** — Run the SDK in your internal services
+
+**If scaling beyond a single server:**
+3. **[Scaling & Multi-Tenancy](./scaling.md)** — Handle multiple users and services
+
+### 🚀 App Developer (ISV)
+
+You're building a product for customers. You need to handle authentication for your users — either through GitHub or by managing identity yourself.
+
+**Start with:**
+1. **[GitHub OAuth](./github-oauth.md)** — Let customers sign in with GitHub
+2. **[BYOK](./byok.md)** — Manage identity yourself with your own model keys
+3. **[Backend Services](./backend-services.md)** — Power your product from server-side code
+
+**For production:**
+4. **[Scaling & Multi-Tenancy](./scaling.md)** — Serve many customers reliably
+
+### 🏗️ Platform Developer
+
+You're embedding Copilot into a platform — APIs, developer tools, or infrastructure that other developers build on. You need fine-grained control over sessions, scaling, and multi-tenancy.
+
+**Start with:**
+1. **[Backend Services](./backend-services.md)** — Core server-side integration
+2. **[Scaling & Multi-Tenancy](./scaling.md)** — Session isolation, horizontal scaling, persistence
+
+**Depending on your auth model:**
+3. **[GitHub OAuth](./github-oauth.md)** — For GitHub-authenticated users
+4. **[BYOK](./byok.md)** — For self-managed identity and model access
+
+## Decision Matrix
+
+Use this table to find the right guides based on what you need to do:
+
+| What you need | Guide |
+|---------------|-------|
+| Simplest possible setup | [Local CLI](./local-cli.md) |
+| Ship a standalone app with Copilot | [Bundled CLI](./bundled-cli.md) |
+| Users sign in with GitHub | [GitHub OAuth](./github-oauth.md) |
+| Use your own model keys (OpenAI, Azure, etc.) | [BYOK](./byok.md) |
+| Run the SDK on a server | [Backend Services](./backend-services.md) |
+| Serve multiple users / scale horizontally | [Scaling & Multi-Tenancy](./scaling.md) |
+
+## Configuration Comparison
+
+```mermaid
+flowchart LR
+    subgraph Auth["Authentication"]
+        A1["Signed-in CLI<br/>(local)"]
+        A2["GitHub OAuth<br/>(multi-user)"]
+        A3["Env Vars / Tokens<br/>(server)"]
+        A4["BYOK<br/>(your keys)"]
+    end
+
+    subgraph Deploy["Deployment"]
+        D1["Local Process<br/>(auto-managed)"]
+        D2["Bundled Binary<br/>(shipped with app)"]
+        D3["External Server<br/>(headless CLI)"]
+    end
+
+    subgraph Scale["Scaling"]
+        S1["Single User<br/>(one CLI)"]
+        S2["Multi-User<br/>(shared CLI)"]
+        S3["Isolated<br/>(CLI per user)"]
+    end
+
+    A1 --> D1 --> S1
+    A2 --> D3 --> S2
+    A3 --> D3 --> S2
+    A4 --> D2 --> S1
+    A2 --> D3 --> S3
+    A3 --> D3 --> S3
+
+    style Auth fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style Deploy fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+    style Scale fill:#0d1117,stroke:#f0883e,color:#c9d1d9
+```
+
+## Prerequisites
+
+All guides assume you have:
+
+- **Copilot CLI** installed ([Installation guide](https://docs.github.com/en/copilot/how-tos/set-up/install-copilot-cli))
+- **One of the SDKs** installed:
+  - Node.js: `npm install @github/copilot-sdk`
+  - Python: `pip install github-copilot-sdk`
+  - Go: `go get github.com/github/copilot-sdk/go`
+  - .NET: `dotnet add package GitHub.Copilot.SDK`
+
+If you're brand new, start with the **[Getting Started tutorial](../../getting-started.md)** first, then come back here for production configuration.
+
+## Next Steps
+
+Pick the guide that matches your situation from the [decision matrix](#decision-matrix) above, or start with the persona description closest to your role.
diff --git a/docs/guides/setup/local-cli.md b/docs/guides/setup/local-cli.md
new file mode 100644
index 00000000..8d9573eb
--- /dev/null
+++ b/docs/guides/setup/local-cli.md
@@ -0,0 +1,207 @@
+# Local CLI Setup
+
+Use the Copilot SDK with the CLI already signed in on your machine. This is the simplest configuration — zero auth code, zero infrastructure.
+
+**Best for:** Personal projects, prototyping, local development, learning the SDK.
+
+## How It Works
+
+When you install the Copilot CLI and sign in, your credentials are stored in the system keychain. The SDK automatically starts the CLI as a child process and uses those stored credentials.
+
+```mermaid
+flowchart LR
+    subgraph YourMachine["Your Machine"]
+        App["Your App"] --> SDK["SDK Client"]
+        SDK -- "stdio" --> CLI["Copilot CLI<br/>(auto-started)"]
+        CLI --> Keychain["🔐 System Keychain<br/>(stored credentials)"]
+    end
+    CLI -- "API calls" --> Copilot["☁️ GitHub Copilot"]
+
+    style YourMachine fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+**Key characteristics:**
+- CLI is spawned automatically by the SDK (no setup needed)
+- Authentication uses the signed-in user's credentials from the system keychain
+- Communication happens over stdio (stdin/stdout) — no network ports
+- Sessions are local to your machine
+
+## Quick Start
+
+The default configuration requires no options at all:
+
+<details open>
+<summary><strong>Node.js / TypeScript</strong></summary>
+
+```typescript
+import { CopilotClient } from "@github/copilot-sdk";
+
+const client = new CopilotClient();
+const session = await client.createSession({ model: "gpt-4.1" });
+
+const response = await session.sendAndWait({ prompt: "Hello!" });
+console.log(response?.data.content);
+
+await client.stop();
+```
+
+</details>
+
+<details>
+<summary><strong>Python</strong></summary>
+
+```python
+from copilot import CopilotClient
+
+client = CopilotClient()
+await client.start()
+
+session = await client.create_session({"model": "gpt-4.1"})
+response = await session.send_and_wait({"prompt": "Hello!"})
+print(response.data.content)
+
+await client.stop()
+```
+
+</details>
+
+<details>
+<summary><strong>Go</strong></summary>
+
+```go
+client := copilot.NewClient(nil)
+if err := client.Start(ctx); err != nil {
+    log.Fatal(err)
+}
+defer client.Stop()
+
+session, _ := client.CreateSession(ctx, &copilot.SessionConfig{Model: "gpt-4.1"})
+response, _ := session.SendAndWait(ctx, copilot.MessageOptions{Prompt: "Hello!"})
+fmt.Println(*response.Data.Content)
+```
+
+</details>
+
+<details>
+<summary><strong>.NET</strong></summary>
+
+```csharp
+await using var client = new CopilotClient();
+await using var session = await client.CreateSessionAsync(
+    new SessionConfig { Model = "gpt-4.1" });
+
+var response = await session.SendAndWaitAsync(
+    new MessageOptions { Prompt = "Hello!" });
+Console.WriteLine(response?.Data.Content);
+```
+
+</details>
+
+That's it. The SDK handles everything: starting the CLI, authenticating, and managing the session.
+
+## What's Happening Under the Hood
+
+```mermaid
+sequenceDiagram
+    participant App as Your App
+    participant SDK as SDK Client
+    participant CLI as Copilot CLI
+    participant GH as GitHub API
+
+    App->>SDK: new CopilotClient()
+    Note over SDK: Locates CLI binary
+
+    App->>SDK: createSession()
+    SDK->>CLI: Spawn process (stdio)
+    CLI->>CLI: Load credentials from keychain
+    CLI->>GH: Authenticate
+    GH-->>CLI: ✅ Valid session
+    CLI-->>SDK: Session created
+    SDK-->>App: Session ready
+
+    App->>SDK: sendAndWait("Hello!")
+    SDK->>CLI: JSON-RPC request
+    CLI->>GH: Model API call
+    GH-->>CLI: Response
+    CLI-->>SDK: JSON-RPC response
+    SDK-->>App: Response data
+```
+
+## Configuration Options
+
+While defaults work great, you can customize the local setup:
+
+```typescript
+const client = new CopilotClient({
+    // Override CLI location (default: bundled with @github/copilot)
+    cliPath: "/usr/local/bin/copilot",
+
+    // Set log level for debugging
+    logLevel: "debug",
+
+    // Pass extra CLI arguments
+    cliArgs: ["--disable-telemetry"],
+
+    // Set working directory
+    cwd: "/path/to/project",
+
+    // Auto-restart CLI if it crashes (default: true)
+    autoRestart: true,
+});
+```
+
+## Using Environment Variables
+
+Instead of the keychain, you can authenticate via environment variables. This is useful for CI or when you don't want interactive login.
+
+```bash
+# Set one of these (in priority order):
+export COPILOT_GITHUB_TOKEN="gho_xxxx"   # Recommended
+export GH_TOKEN="gho_xxxx"               # GitHub CLI compatible
+export GITHUB_TOKEN="gho_xxxx"           # GitHub Actions compatible
+```
+
+The SDK picks these up automatically — no code changes needed.
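+
+To make the priority order concrete, the lookup behaves roughly like the sketch below. `resolveGitHubToken` is a hypothetical helper written for illustration, not the SDK's actual implementation, and skipping empty values is an assumption. It can still be handy in your own code for logging which variable is in effect.
+
+```typescript
+// Variable names in the priority order listed above.
+const TOKEN_ENV_VARS = ["COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"];
+
+// Hypothetical helper: returns the first variable that holds a non-empty value.
+function resolveGitHubToken(
+    env: Record<string, string | undefined>
+): { name: string; token: string } | undefined {
+    for (const name of TOKEN_ENV_VARS) {
+        const token = env[name]?.trim();
+        if (token) {
+            return { name, token };
+        }
+    }
+    return undefined;
+}
+
+const found = resolveGitHubToken(process.env);
+// Log only the variable name, never the token itself
+console.log(found ? `Using token from ${found.name}` : "No token variable set");
+```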
+
+## Managing Sessions
+
+With the local CLI, sessions default to ephemeral. To create resumable sessions, provide your own session ID:
+
+```typescript
+// Create a named session
+const session = await client.createSession({
+    sessionId: "my-project-analysis",
+    model: "gpt-4.1",
+});
+
+// Later, resume it
+const resumed = await client.resumeSession("my-project-analysis");
+```
+
+Session state is stored locally at `~/.copilot/session-state/{sessionId}/`.
+
+## Limitations
+
+| Limitation | Details |
+|------------|---------|
+| **Single user** | Credentials are tied to whoever signed in to the CLI |
+| **Local only** | The CLI runs on the same machine as your app |
+| **No multi-tenant** | Can't serve multiple users from one CLI instance |
+| **Requires CLI login** | User must run `copilot` and authenticate first |
+
+## When to Move On
+
+If you need any of these, it's time to pick a more advanced setup:
+
+| Need | Next Guide |
+|------|-----------|
+| Ship your app to others | [Bundled CLI](./bundled-cli.md) |
+| Multiple users signing in | [GitHub OAuth](./github-oauth.md) |
+| Run on a server | [Backend Services](./backend-services.md) |
+| Use your own model keys | [BYOK](./byok.md) |
+
+## Next Steps
+
+- **[Getting Started tutorial](../../getting-started.md)** — Build a complete interactive app
+- **[Authentication docs](../../auth/index.md)** — All auth methods in detail
+- **[Session Persistence](../session-persistence.md)** — Advanced session management
diff --git a/docs/guides/setup/scaling.md b/docs/guides/setup/scaling.md
new file mode 100644
index 00000000..fcdb716d
--- /dev/null
+++ b/docs/guides/setup/scaling.md
@@ -0,0 +1,635 @@
+# Scaling & Multi-Tenancy
+
+Design your Copilot SDK deployment to serve multiple users, handle concurrent sessions, and scale horizontally across infrastructure. This guide covers session isolation patterns, scaling topologies, and production best practices.
+
+**Best for:** Platform developers, SaaS builders, any deployment serving more than a handful of concurrent users.
+
+## Core Concepts
+
+Before choosing a pattern, understand three dimensions of scaling:
+
+```mermaid
+flowchart TB
+    subgraph Dimensions["Scaling Dimensions"]
+        direction LR
+        I["🔒 Isolation<br/>Who sees what?"]
+        C["⚡ Concurrency<br/>How many at once?"]
+        P["💾 Persistence<br/>How long do sessions live?"]
+    end
+
+    I --> I1["Shared CLI<br/>vs. CLI per user"]
+    C --> C1["Session pooling<br/>vs. on-demand"]
+    P --> P1["Ephemeral<br/>vs. persistent"]
+
+    style Dimensions fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+## Session Isolation Patterns
+
+### Pattern 1: Isolated CLI Per User
+
+Each user gets their own CLI server instance. Strongest isolation — a user's sessions, memory, and processes are completely separated.
+
+```mermaid
+flowchart TB
+    LB["Load Balancer"]
+
+    subgraph User_A["User A"]
+        SDK_A["SDK Client"] --> CLI_A["CLI Server A<br/>:4321"]
+        CLI_A --> SA["📁 Sessions A"]
+    end
+
+    subgraph User_B["User B"]
+        SDK_B["SDK Client"] --> CLI_B["CLI Server B<br/>:4322"]
+        CLI_B --> SB["📁 Sessions B"]
+    end
+
+    subgraph User_C["User C"]
+        SDK_C["SDK Client"] --> CLI_C["CLI Server C<br/>:4323"]
+        CLI_C --> SC["📁 Sessions C"]
+    end
+
+    LB --> SDK_A
+    LB --> SDK_B
+    LB --> SDK_C
+
+    style User_A fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+    style User_B fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+    style User_C fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+**When to use:**
+- Multi-tenant SaaS where data isolation is critical
+- Users with different auth credentials
+- Compliance requirements (SOC 2, HIPAA)
+
+```typescript
+// CLI pool manager — one CLI per user
+class CLIPool {
+    private instances = new Map<string, { client: CopilotClient; port: number }>();
+    private nextPort = 5000;
+
+    async getClientForUser(userId: string, token?: string): Promise<CopilotClient> {
+        if (this.instances.has(userId)) {
+            return this.instances.get(userId)!.client;
+        }
+
+        const port = this.nextPort++;
+
+        // Spawn a dedicated CLI for this user
+        await spawnCLI(port, token);
+
+        const client = new CopilotClient({
+            cliUrl: `localhost:${port}`,
+        });
+
+        this.instances.set(userId, { client, port });
+        return client;
+    }
+
+    async releaseUser(userId: string): Promise<void> {
+        const instance = this.instances.get(userId);
+        if (instance) {
+            await instance.client.stop();
+            this.instances.delete(userId);
+        }
+    }
+}
+```
+
+### Pattern 2: Shared CLI with Session Isolation
+
+Multiple users share one CLI server but have isolated sessions via unique session IDs. Lighter on resources, but weaker isolation.
+
+```mermaid
+flowchart TB
+    U1["👤 User A"]
+    U2["👤 User B"]
+    U3["👤 User C"]
+
+    subgraph App["Your App"]
+        Router["Session Router"]
+    end
+
+    subgraph CLI["Shared CLI Server :4321"]
+        SA["Session: user-a-chat"]
+        SB["Session: user-b-chat"]
+        SC["Session: user-c-chat"]
+    end
+
+    U1 --> Router
+    U2 --> Router
+    U3 --> Router
+
+    Router --> SA
+    Router --> SB
+    Router --> SC
+
+    style App fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style CLI fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+**When to use:**
+- Internal tools with trusted users
+- Resource-constrained environments
+- Lower isolation requirements
+
+```typescript
+const sharedClient = new CopilotClient({
+    cliUrl: "localhost:4321",
+});
+
+// Enforce session isolation through naming conventions
+function getSessionId(userId: string, purpose: string): string {
+    return `${userId}-${purpose}-${Date.now()}`;
+}
+
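+// Access control: ensure users can only access their own sessions.
+// Caveat: parsing the owner from the ID assumes user IDs never contain "-";
+// if yours can (e.g. UUIDs), record session ownership in your own store instead.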
+async function resumeSessionWithAuth(
+    sessionId: string,
+    currentUserId: string
+): Promise<Session> {
+    const [sessionUserId] = sessionId.split("-");
+    if (sessionUserId !== currentUserId) {
+        throw new Error("Access denied: session belongs to another user");
+    }
+    return sharedClient.resumeSession(sessionId);
+}
+```
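+
+Parsing the owner out of the session ID breaks when user IDs themselves contain hyphens. An alternative is to record ownership explicitly and check it before resuming. The sketch below uses an in-memory `Map` purely for illustration; `SessionRegistry` is a hypothetical name, and a real deployment would presumably back it with a database or cache shared by all app instances.
+
+```typescript
+import { randomUUID } from "node:crypto";
+
+// Hypothetical registry: maps each session ID to the user who created it.
+class SessionRegistry {
+    private owners = new Map<string, string>();
+
+    // Generate an unguessable session ID and remember its owner
+    create(userId: string, purpose: string): string {
+        const sessionId = `${purpose}-${randomUUID()}`;
+        this.owners.set(sessionId, userId);
+        return sessionId;
+    }
+
+    // Throws unless the session exists and belongs to this user
+    assertOwner(sessionId: string, userId: string): void {
+        if (this.owners.get(sessionId) !== userId) {
+            throw new Error("Access denied: session belongs to another user");
+        }
+    }
+}
+```
+
+Call `registry.assertOwner(sessionId, currentUserId)` before `sharedClient.resumeSession(sessionId)`. Because the ID no longer encodes the user, IDs such as `team-a-alice` work without special handling.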
+
+### Pattern 3: Shared Sessions (Collaborative)
+
+Multiple users interact with the same session — like a shared chat room with Copilot.
+
+```mermaid
+flowchart TB
+    U1["👤 Alice"]
+    U2["👤 Bob"]
+    U3["👤 Carol"]
+
+    subgraph App["Collaboration Layer"]
+        Queue["Message Queue<br/>(serialize access)"]
+        Lock["Session Lock"]
+    end
+
+    subgraph CLI["CLI Server"]
+        Session["Shared Session:<br/>team-project-review"]
+    end
+
+    U1 --> Queue
+    U2 --> Queue
+    U3 --> Queue
+
+    Queue --> Lock
+    Lock --> Session
+
+    style App fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style CLI fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+**When to use:**
+- Team collaboration tools
+- Shared code review sessions
+- Pair programming assistants
+
+> ⚠️ **Important:** The SDK doesn't provide built-in session locking. You **must** serialize access to prevent concurrent writes to the same session.
+
+```typescript
+import { randomUUID } from "node:crypto";
+import Redis from "ioredis";
+
+const redis = new Redis();
+
+// Delete the lock only if it still holds our ID (atomic compare-and-delete)
+const RELEASE_LOCK_SCRIPT = `
+    if redis.call("get", KEYS[1]) == ARGV[1] then
+        return redis.call("del", KEYS[1])
+    end
+    return 0
+`;
+
+async function withSessionLock<T>(
+    sessionId: string,
+    fn: () => Promise<T>,
+    timeoutSec = 300
+): Promise<T> {
+    const lockKey = `session-lock:${sessionId}`;
+    const lockId = randomUUID();
+
+    // Acquire lock; the expiry guards against a holder that crashes
+    const acquired = await redis.set(lockKey, lockId, "EX", timeoutSec, "NX");
+    if (!acquired) {
+        throw new Error("Session is in use by another user");
+    }
+
+    try {
+        return await fn();
+    } finally {
+        // Release lock (only if we still own it)
+        await redis.eval(RELEASE_LOCK_SCRIPT, 1, lockKey, lockId);
+    }
+}
+
+// Usage: serialize access to shared session
+app.post("/team-chat", authMiddleware, async (req, res) => {
+    const result = await withSessionLock("team-project-review", async () => {
+        const session = await client.resumeSession("team-project-review");
+        return session.sendAndWait({ prompt: req.body.message });
+    });
+
+    res.json({ content: result?.data.content });
+});
+```
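+
+Note that `withSessionLock` rejects a request outright when the session is busy. The "Message Queue" box in the diagram suggests queuing instead. Within a single app process, that can be sketched with a per-session promise chain, as below. `enqueue` is a hypothetical helper, and this only serializes requests handled by the same process, so a multi-instance deployment would still need a distributed lock or an external queue.
+
+```typescript
+// Tail of the pending work for each session ID
+const queues = new Map<string, Promise<unknown>>();
+
+// Hypothetical helper: runs tasks for the same session strictly one at a time.
+function enqueue<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
+    const previous = queues.get(sessionId) ?? Promise.resolve();
+
+    // Start this task only after the previous one has settled
+    const run = previous.then(() => task());
+
+    // The stored tail never rejects, so one failed task cannot block the queue
+    const tail = run.catch(() => undefined);
+    queues.set(sessionId, tail);
+
+    // Drop the entry once the queue drains to avoid unbounded growth
+    tail.then(() => {
+        if (queues.get(sessionId) === tail) {
+            queues.delete(sessionId);
+        }
+    });
+
+    return run;
+}
+```
+
+In the route handler, `enqueue("team-project-review", () => session.sendAndWait({ prompt }))` would make a second user's message wait for the first to finish rather than fail.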
+
+## Comparison of Isolation Patterns
+
+| | Isolated CLI Per User | Shared CLI + Session Isolation | Shared Sessions |
+|---|---|---|---|
+| **Isolation** | ✅ Complete | ⚠️ Logical | ❌ Shared |
+| **Resource usage** | High (CLI per user) | Low (one CLI) | Low (one CLI + session) |
+| **Complexity** | Medium | Low | High (locking) |
+| **Auth flexibility** | ✅ Per-user tokens | ⚠️ Service token | ⚠️ Service token |
+| **Best for** | Multi-tenant SaaS | Internal tools | Collaboration |
+
+## Horizontal Scaling
+
+### Multiple CLI Servers Behind a Load Balancer
+
+```mermaid
+flowchart TB
+    Users["👥 Users"] --> LB["Load Balancer"]
+
+    subgraph Pool["CLI Server Pool"]
+        CLI1["CLI Server 1<br/>:4321"]
+        CLI2["CLI Server 2<br/>:4322"]
+        CLI3["CLI Server 3<br/>:4323"]
+    end
+
+    subgraph Storage["Shared Storage"]
+        NFS["📁 Network File System<br/>or Cloud Storage"]
+    end
+
+    LB --> CLI1
+    LB --> CLI2
+    LB --> CLI3
+
+    CLI1 --> NFS
+    CLI2 --> NFS
+    CLI3 --> NFS
+
+    style Pool fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+    style Storage fill:#161b22,stroke:#f0883e,color:#c9d1d9
+```
+
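+**Key requirement:** If a session's requests can reach any server in the pool, session state must be on **shared storage** so that any CLI server can resume any session. With sticky routing, as in the example below, each server can keep its session state locally.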
+
+```typescript
+// Route sessions to CLI servers
+class CLILoadBalancer {
+    private servers: string[];
+    private currentIndex = 0;
+
+    constructor(servers: string[]) {
+        this.servers = servers;
+    }
+
+    // Round-robin selection
+    getNextServer(): string {
+        const server = this.servers[this.currentIndex];
+        this.currentIndex = (this.currentIndex + 1) % this.servers.length;
+        return server;
+    }
+
+    // Sticky sessions: same user always hits same server
+    getServerForUser(userId: string): string {
+        const hash = this.hashCode(userId);
+        return this.servers[hash % this.servers.length];
+    }
+
+    private hashCode(str: string): number {
+        let hash = 0;
+        for (let i = 0; i < str.length; i++) {
+            hash = (hash << 5) - hash + str.charCodeAt(i);
+            hash |= 0;
+        }
+        return Math.abs(hash);
+    }
+}
+
+const lb = new CLILoadBalancer([
+    "cli-1:4321",
+    "cli-2:4321",
+    "cli-3:4321",
+]);
+
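+// Reuse one SDK client per CLI server instead of connecting on every request
+const clients = new Map<string, CopilotClient>();
+
+function getClient(server: string): CopilotClient {
+    let client = clients.get(server);
+    if (!client) {
+        client = new CopilotClient({ cliUrl: server });
+        clients.set(server, client);
+    }
+    return client;
+}
+
+app.post("/chat", async (req, res) => {
+    const client = getClient(lb.getServerForUser(req.user.id));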
+
+    const session = await client.createSession({
+        sessionId: `user-${req.user.id}-chat`,
+        model: "gpt-4.1",
+    });
+
+    const response = await session.sendAndWait({ prompt: req.body.message });
+    res.json({ content: response?.data.content });
+});
+```
+
+### Sticky Sessions vs. Shared Storage
+
+```mermaid
+flowchart LR
+    subgraph Sticky["Sticky Sessions"]
+        direction TB
+        S1["User A → always CLI 1"]
+        S2["User B → always CLI 2"]
+        S3["✅ No shared storage needed"]
+        S4["❌ Uneven load if users vary"]
+    end
+
+    subgraph Shared["Shared Storage"]
+        direction TB
+        SH1["User A → any CLI"]
+        SH2["User B → any CLI"]
+        SH3["✅ Even load distribution"]
+        SH4["❌ Requires NFS / cloud storage"]
+    end
+
+    style Sticky fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style Shared fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
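+**Sticky sessions** are the simpler option: pin each user to a specific CLI server. No shared storage is needed, but load can be uneven when some users are far more active than others. With modulo hashing, adding or removing a server also remaps most users to a server that holds none of their session state.
+
+**Shared storage** lets any CLI server handle any session. Load spreads more evenly, but `~/.copilot/session-state/` must live on networked storage such as NFS or a cloud file share.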
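+
+One caveat with the modulo-based `getServerForUser` shown earlier: when the server list changes, `hash % servers.length` sends most users to a different server. A minimal sketch of rendezvous (highest-random-weight) hashing, which only moves the users of the server that was removed, might look like the following. `fnv1a` and `pickServer` are illustrative names, not part of the SDK.
+
+```typescript
+// 32-bit FNV-1a string hash (illustrative helper, not an SDK function)
+function fnv1a(input: string): number {
+    let hash = 0x811c9dc5;
+    for (let i = 0; i < input.length; i++) {
+        hash ^= input.charCodeAt(i);
+        hash = Math.imul(hash, 0x01000193);
+    }
+    return hash >>> 0; // force an unsigned 32-bit result
+}
+
+// Rendezvous hashing: score every (server, user) pair and pick the highest score
+function pickServer(userId: string, servers: string[]): string {
+    if (servers.length === 0) {
+        throw new Error("No CLI servers configured");
+    }
+    let best = servers[0];
+    let bestScore = -1;
+    for (const server of servers) {
+        const score = fnv1a(`${server}|${userId}`);
+        if (score > bestScore) {
+            best = server;
+            bestScore = score;
+        }
+    }
+    return best;
+}
+```
+
+Because each user's choice depends only on the scores of the servers that are present, removing `cli-2:4321` leaves every user who was on `cli-1:4321` or `cli-3:4321` where they were.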
+
+## Vertical Scaling
+
+### Tuning a Single CLI Server
+
+A single CLI server can handle many concurrent sessions. Key considerations:
+
+```mermaid
+flowchart TB
+    subgraph Resources["Resource Dimensions"]
+        CPU["🔧 CPU<br/>Model request processing"]
+        MEM["💾 Memory<br/>Active session state"]
+        DISK["💿 Disk I/O<br/>Session persistence"]
+        NET["🌐 Network<br/>API calls to provider"]
+    end
+
+    style Resources fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+**Session lifecycle management** is key to vertical scaling:
+
+```typescript
+// Limit concurrent active sessions
+class SessionManager {
+    private activeSessions = new Map<string, Session>();
+    private maxConcurrent: number;
+
+    constructor(maxConcurrent = 50) {
+        this.maxConcurrent = maxConcurrent;
+    }
+
+    async getSession(sessionId: string): Promise<Session> {
+        // Return existing active session
+        if (this.activeSessions.has(sessionId)) {
+            return this.activeSessions.get(sessionId)!;
+        }
+
+        // Enforce concurrency limit
+        if (this.activeSessions.size >= this.maxConcurrent) {
+            await this.evictOldestSession();
+        }
+
+        // Create or resume
+        const session = await client.createSession({
+            sessionId,
+            model: "gpt-4.1",
+        });
+
+        this.activeSessions.set(sessionId, session);
+        return session;
+    }
+
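+    private async evictOldestSession(): Promise<void> {
+        // Map iterates in insertion order, so the first key is the session
+        // that was created earliest (not the least recently used one)
+        const [oldestId] = this.activeSessions.keys();
+        const session = this.activeSessions.get(oldestId)!;
+        // Remove it from the map first so no caller receives a destroyed session
+        this.activeSessions.delete(oldestId);
+        // Session state is persisted automatically, so it is safe to destroy
+        await session.destroy();
+    }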
+}
+```
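+
+The `SessionManager` above evicts in creation order, so a busy long-lived session can be evicted before an idle newer one. If least-recently-used eviction fits better, one way to track it is to re-insert a key on every access, since a `Map` iterates in insertion order. The `LruMap` class below is a hypothetical helper: it only does the bookkeeping and leaves calling `destroy()` on the evicted session to the caller.
+
+```typescript
+// Hypothetical helper: a capacity-bounded map that evicts the least recently used entry
+class LruMap<V> {
+    private entries = new Map<string, V>();
+    private capacity: number;
+
+    constructor(capacity: number) {
+        this.capacity = capacity;
+    }
+
+    // Look up a value and mark it as most recently used
+    get(key: string): V | undefined {
+        if (!this.entries.has(key)) {
+            return undefined;
+        }
+        const value = this.entries.get(key) as V;
+        this.entries.delete(key);
+        this.entries.set(key, value); // re-insert at the end of the iteration order
+        return value;
+    }
+
+    // Store a value; returns the evicted [key, value] pair, if any
+    set(key: string, value: V): [string, V] | undefined {
+        let evicted: [string, V] | undefined;
+        if (this.entries.has(key)) {
+            this.entries.delete(key);
+        } else if (this.entries.size >= this.capacity) {
+            const oldest = this.entries.entries().next();
+            if (!oldest.done) {
+                evicted = oldest.value;
+                this.entries.delete(oldest.value[0]);
+            }
+        }
+        this.entries.set(key, value);
+        return evicted;
+    }
+
+    get size(): number {
+        return this.entries.size;
+    }
+}
+```
+
+In `getSession`, the value of the pair returned by `set` would be the session to destroy.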
+
+## Ephemeral vs. Persistent Sessions
+
+```mermaid
+flowchart LR
+    subgraph Ephemeral["Ephemeral Sessions"]
+        E1["Created per request"]
+        E2["Destroyed after use"]
+        E3["No state to manage"]
+        E4["Good for: one-shot tasks,<br/>stateless APIs"]
+    end
+
+    subgraph Persistent["Persistent Sessions"]
+        P1["Named session ID"]
+        P2["Survives restarts"]
+        P3["Resumable"]
+        P4["Good for: multi-turn chat,<br/>long workflows"]
+    end
+
+    style Ephemeral fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+    style Persistent fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+### Ephemeral Sessions
+
+For stateless API endpoints where each request is independent:
+
+```typescript
+app.post("/api/analyze", async (req, res) => {
+    const session = await client.createSession({
+        model: "gpt-4.1",
+    });
+
+    try {
+        const response = await session.sendAndWait({
+            prompt: req.body.prompt,
+        });
+        res.json({ result: response?.data.content });
+    } finally {
+        await session.destroy();  // Clean up immediately
+    }
+});
+```
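+
+If several endpoints follow this create, use, destroy pattern, the `try`/`finally` can be factored into a small wrapper. The sketch below is a hypothetical helper, not an SDK function; it assumes only that the session object exposes `destroy()`, as in the example above.
+
+```typescript
+// Minimal shape this helper relies on
+interface Destroyable {
+    destroy(): Promise<void>;
+}
+
+// Hypothetical helper: create a session, run `work`, and always destroy the session afterwards
+async function withEphemeralSession<S extends Destroyable, R>(
+    create: () => Promise<S>,
+    work: (session: S) => Promise<R>,
+): Promise<R> {
+    const session = await create();
+    try {
+        return await work(session);
+    } finally {
+        // Runs whether `work` resolved or threw
+        await session.destroy();
+    }
+}
+```
+
+A route handler could then call `withEphemeralSession(() => client.createSession({ model: "gpt-4.1" }), (session) => session.sendAndWait({ prompt: req.body.prompt }))`.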
+
+### Persistent Sessions
+
+For conversational interfaces or long-running workflows:
+
+```typescript
+// Create a resumable session
+app.post("/api/chat/start", async (req, res) => {
+    const sessionId = `user-${req.user.id}-${Date.now()}`;
+
+    const session = await client.createSession({
+        sessionId,
+        model: "gpt-4.1",
+        infiniteSessions: {
+            enabled: true,
+            backgroundCompactionThreshold: 0.80,
+        },
+    });
+
+    res.json({ sessionId });
+});
+
+// Continue the conversation
+app.post("/api/chat/message", async (req, res) => {
+    const session = await client.resumeSession(req.body.sessionId);
+    const response = await session.sendAndWait({ prompt: req.body.message });
+
+    res.json({ content: response?.data.content });
+});
+
+// Clean up when done
+app.post("/api/chat/end", async (req, res) => {
+    await client.deleteSession(req.body.sessionId);
+    res.json({ success: true });
+});
+```
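+
+The routes above accept any `sessionId` from the request body, so one user could resume or delete another user's session by guessing its ID. Since the IDs created in `/api/chat/start` begin with `user-<id>-`, one possible guard is a prefix check before calling `resumeSession` or `deleteSession`. `ownsSession` is a hypothetical helper, and it assumes the user ID comes from your authentication layer (`req.user.id`).
+
+```typescript
+// Hypothetical guard: true only if sessionId was issued for this user
+// (matches the `user-${userId}-${timestamp}` format used when the session is created)
+function ownsSession(userId: string | number, sessionId: unknown): boolean {
+    if (typeof sessionId !== "string") {
+        return false;
+    }
+    const prefix = `user-${userId}-`;
+    // Require a digits-only suffix so an ID issued for user "1-2" does not match user "1"
+    return sessionId.startsWith(prefix) && /^\d+$/.test(sessionId.slice(prefix.length));
+}
+```
+
+In a handler this could be used as `if (!ownsSession(req.user.id, req.body.sessionId)) return res.status(403).json({ error: "forbidden" });`.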
+
+## Container Deployments
+
+### Kubernetes with Persistent Storage
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: copilot-cli
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: copilot-cli
+  template:
+    metadata:
+      labels:
+        app: copilot-cli
+    spec:
+      containers:
+        - name: copilot-cli
+          image: ghcr.io/github/copilot-cli:latest
+          args: ["--headless", "--port", "4321"]
+          env:
+            - name: COPILOT_GITHUB_TOKEN
+              valueFrom:
+                secretKeyRef:
+                  name: copilot-secrets
+                  key: github-token
+          ports:
+            - containerPort: 4321
+          volumeMounts:
+            - name: session-state
+              mountPath: /root/.copilot/session-state
+      volumes:
+        - name: session-state
+          persistentVolumeClaim:
+            claimName: copilot-sessions-pvc
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: copilot-cli
+spec:
+  selector:
+    app: copilot-cli
+  ports:
+    - port: 4321
+      targetPort: 4321
+```
+
+Because all three replicas mount the same claim, `copilot-sessions-pvc` must be backed by storage that supports the `ReadWriteMany` access mode (for example NFS or Azure Files). A `ReadWriteOnce` volume can only be mounted by pods running on a single node.
+
+```mermaid
+flowchart TB
+    subgraph K8s["Kubernetes Cluster"]
+        Svc["Service: copilot-cli:4321"]
+        Pod1["Pod 1: CLI"]
+        Pod2["Pod 2: CLI"]
+        Pod3["Pod 3: CLI"]
+        PVC["PersistentVolumeClaim<br/>(shared session state)"]
+    end
+
+    App["Your App Pods"] --> Svc
+    Svc --> Pod1
+    Svc --> Pod2
+    Svc --> Pod3
+
+    Pod1 --> PVC
+    Pod2 --> PVC
+    Pod3 --> PVC
+
+    style K8s fill:#0d1117,stroke:#58a6ff,color:#c9d1d9
+```
+
+### Azure Container Instances
+
+```yaml
+containers:
+  - name: copilot-cli
+    image: ghcr.io/github/copilot-cli:latest
+    command: ["copilot", "--headless", "--port", "4321"]
+    volumeMounts:
+      - name: session-storage
+        mountPath: /root/.copilot/session-state
+
+volumes:
+  - name: session-storage
+    azureFile:
+      shareName: copilot-sessions
+      storageAccountName: myaccount
+```
+
+## Production Checklist
+
+```mermaid
+flowchart TB
+    subgraph Checklist["Production Readiness"]
+        direction TB
+        A["✅ Session cleanup<br/>cron / TTL"]
+        B["✅ Health checks<br/>ping endpoint"]
+        C["✅ Persistent storage<br/>for session state"]
+        D["✅ Secret management<br/>for tokens/keys"]
+        E["✅ Monitoring<br/>active sessions, latency"]
+        F["✅ Session locking<br/>if shared sessions"]
+        G["✅ Graceful shutdown<br/>drain active sessions"]
+    end
+
+    style Checklist fill:#0d1117,stroke:#3fb950,color:#c9d1d9
+```
+
+| Concern | Recommendation |
+|---------|---------------|
+| **Session cleanup** | Run periodic cleanup to delete sessions older than your TTL |
+| **Health checks** | Ping the CLI server periodically; restart if unresponsive |
+| **Storage** | Mount persistent volumes for `~/.copilot/session-state/` |
+| **Secrets** | Use your platform's secret manager (Vault, K8s Secrets, etc.) |
+| **Monitoring** | Track active session count, response latency, error rates |
+| **Locking** | Use a distributed lock (Redis or similar) when multiple clients may access the same session |
+| **Shutdown** | Drain active sessions before stopping CLI servers |
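+
+The session cleanup row can be sketched as a periodic job that selects sessions whose last activity is older than the TTL and passes each ID to `client.deleteSession`, as in the persistent-session example. The sketch assumes your application records a last-activity timestamp per session; the `SessionRecord` shape and `findExpiredSessions` are hypothetical names.
+
+```typescript
+// Hypothetical record your app keeps for each session it creates
+interface SessionRecord {
+    sessionId: string;
+    lastActiveAt: number; // epoch milliseconds
+}
+
+// Return the IDs of sessions that have been idle for longer than ttlMs
+function findExpiredSessions(
+    records: SessionRecord[],
+    ttlMs: number,
+    now: number = Date.now(),
+): string[] {
+    return records
+        .filter((record) => now - record.lastActiveAt > ttlMs)
+        .map((record) => record.sessionId);
+}
+```
+
+Passing `now` explicitly keeps the selection logic easy to test without waiting for real time to pass.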
+
+## Limitations
+
+| Limitation | Details |
+|------------|---------|
+| **No built-in session locking** | Implement application-level locking for concurrent access |
+| **No built-in load balancing** | Use external LB or service mesh |
+| **Session state is file-based** | Requires shared filesystem for multi-server setups |
+| **30-minute idle timeout** | Sessions without activity are auto-cleaned by the CLI |
+| **CLI is single-process** | Scale by adding more CLI server instances, not threads |
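+
+For the locking limitation, a minimal in-process sketch is a per-session promise chain that runs tasks for the same session ID one at a time. `SessionLock` is a hypothetical helper, and it only serializes work inside a single backend process; with several backend instances you would still need a distributed lock.
+
+```typescript
+// Hypothetical in-process lock: runs tasks for the same session ID one at a time
+class SessionLock {
+    // Tail of the task queue for each session; these promises never reject
+    private tails = new Map<string, Promise<void>>();
+
+    async run<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
+        const previous = this.tails.get(sessionId) ?? Promise.resolve();
+        // Start only after the previous task for this session has settled
+        const result = previous.then(() => task());
+        // Swallow the outcome so a failed task does not block later ones
+        const tail = result.then(
+            () => undefined,
+            () => undefined,
+        );
+        this.tails.set(sessionId, tail);
+        // Drop the entry once the queue for this session is empty
+        void tail.then(() => {
+            if (this.tails.get(sessionId) === tail) {
+                this.tails.delete(sessionId);
+            }
+        });
+        return result;
+    }
+}
+```
+
+A handler could wrap its call as `lock.run(sessionId, () => session.sendAndWait({ prompt }))` so two requests for the same session never overlap.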
+
+## Next Steps
+
+- **[Session Persistence](../session-persistence.md)** — Deep dive on resumable sessions
+- **[Backend Services](./backend-services.md)** — Core server-side setup
+- **[GitHub OAuth](./github-oauth.md)** — Multi-user authentication
+- **[BYOK](./byok.md)** — Use your own model provider