
Commit 3d822d0

docs: split agents.md documentation for shorter contexts (#52)
1 parent 12cf408 commit 3d822d0

File tree

5 files changed: +115 -185 lines changed


AGENTS.md

Lines changed: 17 additions & 185 deletions
@@ -1,6 +1,6 @@
 # Sysdig MCP Server – Agent Developer Handbook
 
-This document is a comprehensive guide for an AI agent tasked with developing and maintaining the Sysdig MCP Server. It covers everything from project setup and architecture to daily workflows and troubleshooting.
+This document is a comprehensive guide for an AI agent tasked with developing and maintaining the Sysdig MCP Server.
 
 ## 1. Project Overview
 
@@ -44,6 +44,7 @@ For a full list of optional variables (e.g., for transport configuration), see t
 ### 3.1. Repository Layout
 
 ```
+.github/workflows  - CI Workflows
 cmd/server/        - CLI entry point, tool registration
 internal/
   config/          - Environment variable loading and validation
@@ -55,6 +56,7 @@ internal/
 docs/              - Documentation assets
 justfile           - Canonical development tasks (format, lint, test, generate, bump)
 flake.nix          - Defines the Nix development environment and its dependencies
+package.nix        - Defines how the package is going to be built with Nix
 ```
 
 ### 3.2. Key Components & Flow
@@ -75,10 +77,10 @@ flake.nix - Defines the Nix development environment and its depen
    - HTTP middleware extracts `Authorization` and `X-Sysdig-Host` headers for remote transports (line 108-138)
 
 4. **Sysdig Client (`internal/infra/sysdig/`):**
-   - `client.gen.go`: Generated OpenAPI client (**DO NOT EDIT**, regenerated via oapi-codegen)
+   - `client.gen.go`: Generated OpenAPI client (**DO NOT EDIT**, manually regenerated via oapi-codegen, not with `go generate`)
    - `client.go`: Authentication strategies with fallback support
    - Context-based auth: `WrapContextWithToken()` and `WrapContextWithHost()` for remote transports
-   - Fixed auth: `WithFixedHostAndToken()` for stdio mode
+   - Fixed auth: `WithFixedHostAndToken()` for stdio mode and remote transports
    - Custom extensions in `client_extension.go` and `client_*.go` files
 
 5. **Tools (`internal/infra/mcp/tools/`):**
@@ -87,26 +89,12 @@ flake.nix - Defines the Nix development environment and its depen
    - Use `WithRequiredPermissions()` from `utils.go` to declare Sysdig API permissions
    - Permission filtering happens automatically in handler
 
-### 3.3. Authentication Flow
-
-1. **stdio transport**: Fixed host/token from env vars (`SYSDIG_MCP_API_HOST`, `SYSDIG_MCP_API_TOKEN`)
-2. **Remote transports**: Extract from HTTP headers (`Authorization: Bearer <token>`, `X-Sysdig-Host`)
-3. Fallback chain: Try context auth first, then fall back to env var auth
-4. Each request includes Bearer token in Authorization header to Sysdig APIs
-
-### 3.4. Tool Permission System
-
-- Each tool declares its required Sysdig API permissions using `WithRequiredPermissions("permission1", "permission2")`.
-- Before exposing tools to the LLM, the handler calls the Sysdig `GetMyPermissions` API.
-- The agent will only see tools for which the provided API token has **all** required permissions.
-- Common permissions: `policy-events.read`, `sage.exec`, `risks.read`, `promql.exec`
-
 ## 4. Day-to-Day Workflow
 
-1. **Enter the Dev Shell:** Always work inside the Nix shell (`nix develop` or `direnv allow`) to ensure all tools are available. You can assume the developer is already in a Nix shell.
+1. **Enter the Dev Shell:** Always work inside the Nix shell (`nix develop` or `direnv allow`). You can assume the developer already did that.
 2. **Make Focused Changes:** Implement a new tool, fix a bug, or improve documentation.
 3. **Run Quality Gates:** Use `just` to run formatters, linters, and tests.
-4. **Commit:** Follow the Conventional Commits specification. Keep the commit messages short, just title, no description. Pre-commit hooks will run quality gates automatically.
+4. **Commit:** Follow the Conventional Commits specification.
 
 ### 4.1. Testing & Quality Gates
 
@@ -121,174 +109,18 @@ just check # A convenient alias for fmt + lint + test.
 
 ### 4.2. Pre-commit Hooks
 
-This repository uses **pre-commit** to automate quality checks before each commit. The hooks are configured in `.pre-commit-config.yaml` to run `just fmt`, `just lint`, and `just test`.
-
-This means that every time you run `git commit`, your changes are automatically formatted, linted, and tested. If any of these checks fail, the commit is aborted, allowing you to fix the issues.
-
-If the hooks do not run automatically, you may need to install them first:
-```bash
-# Install the git hooks defined in the configuration
-pre-commit install
-
-# After installation, you can run all checks on all files
-pre-commit run -a
-```
+This repository uses **pre-commit** to automate quality checks before each commit.
+The hooks are configured in `.pre-commit-config.yaml` to run `just fmt`, `just lint`, and `just test`.
+If any of the hooks fail, the commit will not be created.
 
 ### 4.3 Updating all dependencies
 
-You need to keep the project dependencies fresh from time to time. The way to do so is automated with `just bump`. Keep in mind that for that command to work, you need to have `nix` installed and in the $PATH.
-
-## 5. MCP Tools & Permissions
-
-The handler filters tools dynamically based on the Sysdig user's permissions. Each tool declares mandatory permissions via `WithRequiredPermissions`.
-
-| Tool | File | Capability | Required Permissions | Useful Prompts |
-| --- | --- | --- | --- | --- |
-| `list_runtime_events` | `tool_list_runtime_events.go` | Query runtime events with filters, cursor, scope. | `policy-events.read` | “Show high severity runtime events from last 2h.” |
-| `get_event_info` | `tool_get_event_info.go` | Pull full payload for a single policy event. | `policy-events.read` | “Fetch event `abc123` details.” |
-| `get_event_process_tree` | `tool_get_event_process_tree.go` | Retrieve the process tree for an event when available. | `policy-events.read` | “Show the process tree behind event `abc123`.” |
-| `run_sysql` | `tool_run_sysql.go` | Execute caller-supplied Sysdig SysQL queries safely. | `sage.exec`, `risks.read` | “Run the following SysQL…”. |
-| `generate_sysql` | `tool_generate_sysql.go` | Convert natural language to SysQL via Sysdig Sage. | `sage.exec` (does not work with Service Accounts) | “Create a SysQL to list S3 buckets.” |
-| `kubernetes_list_clusters` | `tool_kubernetes_list_clusters.go` | Lists Kubernetes cluster information. | `promql.exec` | "List all Kubernetes clusters" |
-| `kubernetes_list_nodes` | `tool_kubernetes_list_nodes.go` | Lists Kubernetes node information. | `promql.exec` | "List all Kubernetes nodes in the cluster 'production-gke'" |
-| `kubernetes_list_workloads` | `tool_kubernetes_list_workloads.go` | Lists Kubernetes workload information. | `promql.exec` | "List all desired workloads in the cluster 'production-gke' and namespace 'default'" |
-| `kubernetes_list_pod_containers` | `tool_kubernetes_list_pod_containers.go` | Retrieves information from a particular pod and container. | `promql.exec` | "Show me info for pod 'my-pod' in cluster 'production-gke'" |
-| `kubernetes_list_cronjobs` | `tool_kubernetes_list_cronjobs.go` | Retrieves information from the cronjobs in the cluster. | `promql.exec` | "List all cronjobs in cluster 'prod' and namespace 'default'" |
-| `troubleshoot_kubernetes_list_top_unavailable_pods` | `tool_troubleshoot_kubernetes_list_top_unavailable_pods.go` | Shows the top N pods with the highest number of unavailable or unready replicas. | `promql.exec` | "Show the top 20 unavailable pods in cluster 'production'" |
-| `troubleshoot_kubernetes_list_top_restarted_pods` | `tool_troubleshoot_kubernetes_list_top_restarted_pods.go` | Lists the pods with the highest number of container restarts. | `promql.exec` | "Show the top 10 pods with the most container restarts in cluster 'production'" |
-| `troubleshoot_kubernetes_list_top_400_500_http_errors_in_pods` | `tool_troubleshoot_kubernetes_list_top_400_500_http_errors_in_pods.go` | Lists the pods with the highest rate of HTTP 4xx and 5xx errors over a specified time interval. | `promql.exec` | "Show the top 20 pods with the most HTTP errors in cluster 'production'" |
-| `troubleshoot_kubernetes_list_top_network_errors_in_pods` | `tool_troubleshoot_kubernetes_list_top_network_errors_in_pods.go` | Shows the top network errors by pod over a given interval. | `promql.exec` | "Show the top 10 pods with the most network errors in cluster 'production'" |
-| `troubleshoot_kubernetes_list_count_pods_per_cluster` | `tool_troubleshoot_kubernetes_list_count_pods_per_cluster.go` | List the count of running Kubernetes Pods grouped by cluster and namespace. | `promql.exec` | "List the count of running Kubernetes Pods in cluster 'production'" |
-| `troubleshoot_kubernetes_list_underutilized_pods_by_cpu_quota` | `tool_troubleshoot_kubernetes_list_underutilized_pods_by_cpu_quota.go` | List Kubernetes pods with CPU usage below 25% of the quota limit. | `promql.exec` | "Show the top 10 underutilized pods by CPU quota in cluster 'production'" |
-| `troubleshoot_kubernetes_list_underutilized_pods_by_memory_quota` | `tool_troubleshoot_kubernetes_list_underutilized_pods_by_memory_quota.go` | List Kubernetes pods with memory usage below 25% of the limit. | `promql.exec` | "Show the top 10 underutilized pods by memory quota in cluster 'production'" |
-| `troubleshoot_kubernetes_list_top_cpu_consumed_by_workload` | `tool_troubleshoot_kubernetes_list_top_cpu_consumed_by_workload.go` | Identifies the Kubernetes workloads (all containers) consuming the most CPU (in cores). | `promql.exec` | "Show the top 10 workloads consuming the most CPU in cluster 'production'" |
-| `troubleshoot_kubernetes_list_top_cpu_consumed_by_container` | `tool_troubleshoot_kubernetes_list_top_cpu_consumed_by_container.go` | Identifies the Kubernetes containers consuming the most CPU (in cores). | `promql.exec` | "Show the top 10 containers consuming the most CPU in cluster 'production'" |
-| `troubleshoot_kubernetes_list_top_memory_consumed_by_workload` | `tool_troubleshoot_kubernetes_list_top_memory_consumed_by_workload.go` | Lists memory-intensive workloads (all containers). | `promql.exec` | "Show the top 10 workloads consuming the most memory in cluster 'production'" |
-| `troubleshoot_kubernetes_list_top_memory_consumed_by_container` | `tool_troubleshoot_kubernetes_list_top_memory_consumed_by_container.go` | Lists memory-intensive containers. | `promql.exec` | "Show the top 10 containers consuming the most memory in cluster 'production'" |
-
-## 6. Adding a New Tool
-
-1. **Create Files:** Add `tool_<name>.go` and `tool_<name>_test.go` in `internal/infra/mcp/tools/`.
-
-2. **Implement the Tool:**
-   * Define a struct that holds the Sysdig client.
-   * Implement the `handle` method, which contains the tool's core logic.
-   * Implement the `RegisterInServer` method to define the tool's MCP schema, including its name, description, parameters, and required permissions. Use helpers from `utils.go`.
-
-3. **Write Tests:** Use Ginkgo/Gomega to write BDD-style tests. Mock the Sysdig client to cover:
-   - Parameter validation
-   - Permission metadata
-   - Sysdig API client interactions (mocked)
-   - Error handling
-
-4. **Register the Tool:** Add the new tool to `setupHandler()` in `cmd/server/main.go` (line 88-114).
-
-5. **Document:** Add the new tool to the README.md and the table in section 5 (MCP Tools & Permissions).
-
-### 6.1. Example Tool Structure
-
-```go
-type ToolMyFeature struct {
-    sysdigClient sysdig.ExtendedClientWithResponsesInterface
-}
-
-func (h *ToolMyFeature) handle(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
-    param := request.GetString("param_name", "")
-    response, err := h.sysdigClient.SomeAPICall(ctx, param)
-    // Handle response...
-    return mcp.NewToolResultJSON(response.JSON200)
-}
-
-func (h *ToolMyFeature) RegisterInServer(s *server.MCPServer) {
-    tool := mcp.NewTool("my_feature",
-        mcp.WithDescription("What this tool does"),
-        mcp.WithString("param_name",
-            mcp.Required(),
-            mcp.Description("Parameter description"),
-        ),
-        mcp.WithReadOnlyHintAnnotation(true),
-        mcp.WithDestructiveHintAnnotation(false),
-        WithRequiredPermissions("permission.name"),
-    )
-    s.AddTool(tool, h.handle)
-}
-```
-
-### 6.2. Testing Philosophy
-
-- Use BDD-style tests with Ginkgo/Gomega
-- Each tool requires comprehensive test coverage for:
-  - Parameter validation
-  - Permission metadata
-  - Sysdig API client interactions (mocked using go-mock)
-  - Error handling
-- Integration tests marked with `_integration_test.go` suffix
-- No focused specs (`FDescribe`, `FIt`) should be committed
-
-## 7. Conventional Commits
-
-All commit messages must follow the [Conventional Commits](https://www.conventionalcommits.org/) specification. This is essential for automated versioning and changelog generation.
-
-- **Types**: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`, `build`, `ci`.
-- **Format**: `<type>(<optional scope>): <imperative description>`
-
-## 8. Code Generation
-
-- `internal/infra/sysdig/client.gen.go` is auto-generated from OpenAPI spec via oapi-codegen.
-- Run `go generate ./...` (or `just generate`) to regenerate after spec changes.
-- Generated code includes all Sysdig Secure API types and client methods.
-- **DO NOT** manually edit `client.gen.go`. Extend functionality in separate files (e.g., `client_extension.go`).
-
-## 9. Important Constraints
-
-1. **Generated Code**: Never manually edit `client.gen.go`. Extend functionality in separate files like `client_extension.go`.
-
-2. **Service Account Limitation**: The `generate_sysql` tool does NOT work with Service Account tokens (returns 500). Use regular user API tokens for this tool.
-
-3. **Permission Filtering**: Tools are hidden if the API token lacks required permissions. Check user's Sysdig role if a tool is unexpectedly missing.
-
-4. **stdio Mode Requirements**: When using stdio transport, `SYSDIG_MCP_API_HOST` and `SYSDIG_MCP_API_TOKEN` MUST be set. Remote transports can receive these via HTTP headers instead.
-
-## 10. Troubleshooting
-
-**Problem**: Tool not appearing in MCP client
-- **Solution**: Check API token permissions match tool's `WithRequiredPermissions()`. Use Sysdig UI: **Settings > Users & Teams > Roles**. The token must have **all** permissions listed.
-
-**Problem**: "unable to authenticate with any method"
-- **Solution**: For `stdio`, verify `SYSDIG_MCP_API_HOST` and `SYSDIG_MCP_API_TOKEN` env vars are set correctly. For remote transports, check `Authorization: Bearer <token>` header format.
-
-**Problem**: Tests failing with "command not found"
-- **Solution**: Enter Nix shell with `nix develop` or `direnv allow`. All dev tools are provided by the flake.
-
-**Problem**: `generate_sysql` returning 500 error
-- **Solution**: This tool requires a regular user API token, not a Service Account token. Switch to a user-based token.
-
-**Problem**: Pre-commit hooks not running
-- **Solution**: Run `pre-commit install` to install git hooks, then `pre-commit run -a` to test all files.
-
-## 11. Releasing
-
-The workflow in .github/workflows/publish.yaml will create a new release automatically when the version of the crate changes in package.nix in the default git branch.
-So, if you attempt to release a new version, you need to update this version. You should try releasing a new version when you do any meaningful change that the user can benefit from.
-The guidelines to follow would be:
-
-* New feature is implemented -> Release new version.
-* Bug fixes -> Release new version.
-* CI/Refactorings/Internal changes -> No need to release new version.
-* Documentation changes -> No need to release new version.
-
-The current version of the project is not stable yet, so you need to follow the [Semver spec](https://semver.org/spec/v2.0.0.html), with the following guidelines:
-
-* Unless specified, do not attempt to stabilize the version. That is, do not try to update the version to >=1.0.0. Versions for now should be <1.0.0.
-* For minor changes, update only the Y in 0.X.Y. For example: 0.5.2 -> 0.5.3
-* For major/feature changes, update the X in 0.X.Y and set the Y to 0. For example: 0.5.2 -> 0.6.0
-* Before choosing if the changes are minor or major, check all the commits since the last tag.
-
-After the commit is merged into the default branch the workflow will cross-compile the project and create a GitHub release of that version.
-Check the workflow file in case of doubt.
+Automated with `just bump`. Requires `nix` installed.
 
-## 12. Reference Links
+## 5. Guides & Reference
 
-- `README.md` – Comprehensive product docs, quickstart, and client configuration samples.
-- `CLAUDE.md` – Complementary guide with additional examples and command reference.
-- [Model Context Protocol](https://modelcontextprotocol.io/) – Protocol reference for tool/transport behavior.
+* **Tools & New Tool Creation:** See `internal/infra/mcp/tools/README.md`
+* **Releasing:** See `docs/RELEASING.md`
+* **Troubleshooting:** See `docs/TROUBLESHOOTING.md`
+* **Conventional Commits:** [Specification](https://www.conventionalcommits.org/)
+* **Protocol:** [Model Context Protocol](https://modelcontextprotocol.io/)
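
The surviving AGENTS.md context above notes that each tool declares its Sysdig API permissions via `WithRequiredPermissions()` and that the handler filters tools automatically, exposing a tool only when the token holds **all** of its declared permissions. A minimal, self-contained Go sketch of that filtering rule (illustrative only; `toolSpec`, `granted`, and `visibleTools` are hypothetical names, not the repository's actual types):

```go
package main

import "fmt"

// toolSpec is a hypothetical stand-in for a registered tool and the permissions
// it declared via WithRequiredPermissions(...).
type toolSpec struct {
	name     string
	required []string
}

// visibleTools keeps only the tools whose required permissions are all present
// in the set of permissions granted to the token (as reported by the real
// GetMyPermissions API call described in the handbook).
func visibleTools(tools []toolSpec, granted map[string]bool) []toolSpec {
	var out []toolSpec
	for _, t := range tools {
		allGranted := true
		for _, p := range t.required {
			if !granted[p] {
				allGranted = false
				break
			}
		}
		if allGranted {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	granted := map[string]bool{"policy-events.read": true, "promql.exec": true}
	tools := []toolSpec{
		{name: "list_runtime_events", required: []string{"policy-events.read"}},
		{name: "run_sysql", required: []string{"sage.exec", "risks.read"}},
	}
	// Only list_runtime_events survives: run_sysql also needs sage.exec and risks.read.
	fmt.Println(visibleTools(tools, granted))
}
```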

docs/RELEASING.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Releasing
+
+The workflow in .github/workflows/publish.yaml creates a new release automatically when the version in package.nix changes on the default git branch.
+So, to release a new version, you need to bump that version. Release a new version whenever you make a meaningful change that users can benefit from.
+The guidelines to follow are:
+
+* New feature is implemented -> Release new version.
+* Bug fixes -> Release new version.
+* CI/Refactorings/Internal changes -> No need to release new version.
+* Documentation changes -> No need to release new version.
+
+The current version of the project is not stable yet, so follow the [Semver spec](https://semver.org/spec/v2.0.0.html) with these guidelines:
+
+* Unless specified, do not attempt to stabilize the version. That is, do not update the version to >=1.0.0. Versions should stay <1.0.0 for now.
+* For minor changes, update only the Y in 0.X.Y. For example: 0.5.2 -> 0.5.3
+* For major/feature changes, update the X in 0.X.Y and set Y to 0. For example: 0.5.2 -> 0.6.0
+* Before deciding whether the changes are minor or major, check all the commits since the last tag.
+
+After the commit is merged into the default branch, the workflow cross-compiles the project and creates a GitHub release for that version.
+Check the workflow file in case of doubt.
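
The bump rules in the new docs/RELEASING.md are mechanical enough to state as code. A small illustrative Go sketch (the `nextVersion` helper is hypothetical; the real release flow is driven by editing the version in package.nix, not by any such function):

```go
package main

import "fmt"

// nextVersion applies the pre-1.0 rules above: a major/feature change bumps X and
// resets Y (0.5.2 -> 0.6.0); a minor change bumps only Y (0.5.2 -> 0.5.3).
func nextVersion(x, y int, feature bool) string {
	if feature {
		return fmt.Sprintf("0.%d.0", x+1)
	}
	return fmt.Sprintf("0.%d.%d", x, y+1)
}

func main() {
	fmt.Println(nextVersion(5, 2, false)) // 0.5.3
	fmt.Println(nextVersion(5, 2, true))  // 0.6.0
}
```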

docs/TROUBLESHOOTING.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# Troubleshooting
+
+**Problem**: Tool not appearing in MCP client
+- **Solution**: Check API token permissions match tool's `WithRequiredPermissions()`. The token must have **all** permissions listed.
+
+**Problem**: "unable to authenticate with any method"
+- **Solution**: For `stdio`, verify `SYSDIG_MCP_API_HOST` and `SYSDIG_MCP_API_TOKEN` env vars are set correctly. For remote transports, check `Authorization: Bearer <token>` header format.
+
+**Problem**: Tests failing with "command not found"
+- **Solution**: Enter Nix shell with `nix develop` or `direnv allow`. All dev tools are provided by the flake.
+
+**Problem**: `generate_sysql` returning 500 error
+- **Solution**: This tool requires a regular user API token, not a Service Account token. Switch to a user-based token.
+
+**Problem**: Pre-commit hooks not running
+- **Solution**: Run `pre-commit install` to install git hooks, then `pre-commit run -a` to test all files.
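
For the "unable to authenticate with any method" entry above, a minimal sketch of the stdio-mode environment check it implies (assumption: `checkStdioEnv` is a hypothetical helper, not the project's `internal/config` code):

```go
package main

import (
	"fmt"
	"os"
)

// checkStdioEnv mirrors the documented requirement that stdio transport needs both
// SYSDIG_MCP_API_HOST and SYSDIG_MCP_API_TOKEN to be set before it can authenticate.
func checkStdioEnv() error {
	for _, key := range []string{"SYSDIG_MCP_API_HOST", "SYSDIG_MCP_API_TOKEN"} {
		if os.Getenv(key) == "" {
			// Missing values surface as the "unable to authenticate" symptom above.
			return fmt.Errorf("%s is not set; stdio transport cannot authenticate", key)
		}
	}
	return nil
}

func main() {
	if err := checkStdioEnv(); err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("stdio auth env vars look OK")
}
```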

0 commit comments
