diff --git a/AGENTS.md b/AGENTS.md index 518c497..9e73504 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,6 +1,6 @@ # Sysdig MCP Server – Agent Developer Handbook -This document is a comprehensive guide for an AI agent tasked with developing and maintaining the Sysdig MCP Server. It covers everything from project setup and architecture to daily workflows and troubleshooting. +This document is a comprehensive guide for an AI agent tasked with developing and maintaining the Sysdig MCP Server. ## 1. Project Overview @@ -44,6 +44,7 @@ For a full list of optional variables (e.g., for transport configuration), see t ### 3.1. Repository Layout ``` +.github/workflows - CI Workflows cmd/server/ - CLI entry point, tool registration internal/ config/ - Environment variable loading and validation @@ -55,6 +56,7 @@ internal/ docs/ - Documentation assets justfile - Canonical development tasks (format, lint, test, generate, bump) flake.nix - Defines the Nix development environment and its dependencies +package.nix - Defines how the package is going to be built with Nix ``` ### 3.2. Key Components & Flow @@ -75,10 +77,10 @@ flake.nix - Defines the Nix development environment and its depen - HTTP middleware extracts `Authorization` and `X-Sysdig-Host` headers for remote transports (line 108-138) 4. **Sysdig Client (`internal/infra/sysdig/`):** - - `client.gen.go`: Generated OpenAPI client (**DO NOT EDIT**, regenerated via oapi-codegen) + - `client.gen.go`: Generated OpenAPI client (**DO NOT EDIT**, manually regenerated via oapi-codegen, not with `go generate`) - `client.go`: Authentication strategies with fallback support - Context-based auth: `WrapContextWithToken()` and `WrapContextWithHost()` for remote transports - - Fixed auth: `WithFixedHostAndToken()` for stdio mode + - Fixed auth: `WithFixedHostAndToken()` for stdio mode and remote transports - Custom extensions in `client_extension.go` and `client_*.go` files 5. 
**Tools (`internal/infra/mcp/tools/`):** @@ -87,26 +89,12 @@ flake.nix - Defines the Nix development environment and its depen - Use `WithRequiredPermissions()` from `utils.go` to declare Sysdig API permissions - Permission filtering happens automatically in handler -### 3.3. Authentication Flow - -1. **stdio transport**: Fixed host/token from env vars (`SYSDIG_MCP_API_HOST`, `SYSDIG_MCP_API_TOKEN`) -2. **Remote transports**: Extract from HTTP headers (`Authorization: Bearer `, `X-Sysdig-Host`) -3. Fallback chain: Try context auth first, then fall back to env var auth -4. Each request includes Bearer token in Authorization header to Sysdig APIs - -### 3.4. Tool Permission System - -- Each tool declares its required Sysdig API permissions using `WithRequiredPermissions("permission1", "permission2")`. -- Before exposing tools to the LLM, the handler calls the Sysdig `GetMyPermissions` API. -- The agent will only see tools for which the provided API token has **all** required permissions. -- Common permissions: `policy-events.read`, `sage.exec`, `risks.read`, `promql.exec` - ## 4. Day-to-Day Workflow -1. **Enter the Dev Shell:** Always work inside the Nix shell (`nix develop` or `direnv allow`) to ensure all tools are available. You can assume the developer is already in a Nix shell. +1. **Enter the Dev Shell:** Always work inside the Nix shell (`nix develop` or `direnv allow`). You can assume the developer already did that. 2. **Make Focused Changes:** Implement a new tool, fix a bug, or improve documentation. 3. **Run Quality Gates:** Use `just` to run formatters, linters, and tests. -4. **Commit:** Follow the Conventional Commits specification. Keep the commit messages short, just title, no description. Pre-commit hooks will run quality gates automatically. +4. **Commit:** Follow the Conventional Commits specification. ### 4.1. Testing & Quality Gates @@ -121,174 +109,18 @@ just check # A convenient alias for fmt + lint + test. ### 4.2. 
Pre-commit Hooks -This repository uses **pre-commit** to automate quality checks before each commit. The hooks are configured in `.pre-commit-config.yaml` to run `just fmt`, `just lint`, and `just test`. - -This means that every time you run `git commit`, your changes are automatically formatted, linted, and tested. If any of these checks fail, the commit is aborted, allowing you to fix the issues. - -If the hooks do not run automatically, you may need to install them first: -```bash -# Install the git hooks defined in the configuration -pre-commit install - -# After installation, you can run all checks on all files -pre-commit run -a -``` +This repository uses **pre-commit** to automate quality checks before each commit. +The hooks are configured in `.pre-commit-config.yaml` to run `just fmt`, `just lint`, and `just test`. +If any of the hooks fail, the commit will not be created. ### 4.3 Updating all dependencies -You need to keep the project dependencies fresh from time to time. The way to do so is automated with `just bump`. Keep in mind that for that command to work, you need to have `nix` installed and in the $PATH. - -## 5. MCP Tools & Permissions - -The handler filters tools dynamically based on the Sysdig user's permissions. Each tool declares mandatory permissions via `WithRequiredPermissions`. - -| Tool | File | Capability | Required Permissions | Useful Prompts | -| --- | --- | --- | --- | --- | -| `list_runtime_events` | `tool_list_runtime_events.go` | Query runtime events with filters, cursor, scope. | `policy-events.read` | “Show high severity runtime events from last 2h.” | -| `get_event_info` | `tool_get_event_info.go` | Pull full payload for a single policy event. | `policy-events.read` | “Fetch event `abc123` details.” | -| `get_event_process_tree` | `tool_get_event_process_tree.go` | Retrieve the process tree for an event when available. 
| `policy-events.read` | “Show the process tree behind event `abc123`.” | -| `run_sysql` | `tool_run_sysql.go` | Execute caller-supplied Sysdig SysQL queries safely. | `sage.exec`, `risks.read` | “Run the following SysQL…”. | -| `generate_sysql` | `tool_generate_sysql.go` | Convert natural language to SysQL via Sysdig Sage. | `sage.exec` (does not work with Service Accounts) | “Create a SysQL to list S3 buckets.” | -| `kubernetes_list_clusters` | `tool_kubernetes_list_clusters.go` | Lists Kubernetes cluster information. | `promql.exec` | "List all Kubernetes clusters" | -| `kubernetes_list_nodes` | `tool_kubernetes_list_nodes.go` | Lists Kubernetes node information. | `promql.exec` | "List all Kubernetes nodes in the cluster 'production-gke'" | -| `kubernetes_list_workloads` | `tool_kubernetes_list_workloads.go` | Lists Kubernetes workload information. | `promql.exec` | "List all desired workloads in the cluster 'production-gke' and namespace 'default'" | -| `kubernetes_list_pod_containers` | `tool_kubernetes_list_pod_containers.go` | Retrieves information from a particular pod and container. | `promql.exec` | "Show me info for pod 'my-pod' in cluster 'production-gke'" | -| `kubernetes_list_cronjobs` | `tool_kubernetes_list_cronjobs.go` | Retrieves information from the cronjobs in the cluster. | `promql.exec` | "List all cronjobs in cluster 'prod' and namespace 'default'" | -| `troubleshoot_kubernetes_list_top_unavailable_pods` | `tool_troubleshoot_kubernetes_list_top_unavailable_pods.go` | Shows the top N pods with the highest number of unavailable or unready replicas. | `promql.exec` | "Show the top 20 unavailable pods in cluster 'production'" | -| `troubleshoot_kubernetes_list_top_restarted_pods` | `tool_troubleshoot_kubernetes_list_top_restarted_pods.go` | Lists the pods with the highest number of container restarts. 
| `promql.exec` | "Show the top 10 pods with the most container restarts in cluster 'production'" | -| `troubleshoot_kubernetes_list_top_400_500_http_errors_in_pods` | `tool_troubleshoot_kubernetes_list_top_400_500_http_errors_in_pods.go` | Lists the pods with the highest rate of HTTP 4xx and 5xx errors over a specified time interval. | `promql.exec` | "Show the top 20 pods with the most HTTP errors in cluster 'production'" | -| `troubleshoot_kubernetes_list_top_network_errors_in_pods` | `tool_troubleshoot_kubernetes_list_top_network_errors_in_pods.go` | Shows the top network errors by pod over a given interval. | `promql.exec` | "Show the top 10 pods with the most network errors in cluster 'production'" | -| `troubleshoot_kubernetes_list_count_pods_per_cluster` | `tool_troubleshoot_kubernetes_list_count_pods_per_cluster.go` | List the count of running Kubernetes Pods grouped by cluster and namespace. | `promql.exec` | "List the count of running Kubernetes Pods in cluster 'production'" | -| `troubleshoot_kubernetes_list_underutilized_pods_by_cpu_quota` | `tool_troubleshoot_kubernetes_list_underutilized_pods_by_cpu_quota.go` | List Kubernetes pods with CPU usage below 25% of the quota limit. | `promql.exec` | "Show the top 10 underutilized pods by CPU quota in cluster 'production'" | -| `troubleshoot_kubernetes_list_underutilized_pods_by_memory_quota` | `tool_troubleshoot_kubernetes_list_underutilized_pods_by_memory_quota.go` | List Kubernetes pods with memory usage below 25% of the limit. | `promql.exec` | "Show the top 10 underutilized pods by memory quota in cluster 'production'" | -| `troubleshoot_kubernetes_list_top_cpu_consumed_by_workload` | `tool_troubleshoot_kubernetes_list_top_cpu_consumed_by_workload.go` | Identifies the Kubernetes workloads (all containers) consuming the most CPU (in cores). 
| `promql.exec` | "Show the top 10 workloads consuming the most CPU in cluster 'production'" | -| `troubleshoot_kubernetes_list_top_cpu_consumed_by_container` | `tool_troubleshoot_kubernetes_list_top_cpu_consumed_by_container.go` | Identifies the Kubernetes containers consuming the most CPU (in cores). | `promql.exec` | "Show the top 10 containers consuming the most CPU in cluster 'production'" | -| `troubleshoot_kubernetes_list_top_memory_consumed_by_workload` | `tool_troubleshoot_kubernetes_list_top_memory_consumed_by_workload.go` | Lists memory-intensive workloads (all containers). | `promql.exec` | "Show the top 10 workloads consuming the most memory in cluster 'production'" | -| `troubleshoot_kubernetes_list_top_memory_consumed_by_container` | `tool_troubleshoot_kubernetes_list_top_memory_consumed_by_container.go` | Lists memory-intensive containers. | `promql.exec` | "Show the top 10 containers consuming the most memory in cluster 'production'" | - -## 6. Adding a New Tool - -1. **Create Files:** Add `tool_.go` and `tool__test.go` in `internal/infra/mcp/tools/`. - -2. **Implement the Tool:** - * Define a struct that holds the Sysdig client. - * Implement the `handle` method, which contains the tool's core logic. - * Implement the `RegisterInServer` method to define the tool's MCP schema, including its name, description, parameters, and required permissions. Use helpers from `utils.go`. - -3. **Write Tests:** Use Ginkgo/Gomega to write BDD-style tests. Mock the Sysdig client to cover: - - Parameter validation - - Permission metadata - - Sysdig API client interactions (mocked) - - Error handling - -4. **Register the Tool:** Add the new tool to `setupHandler()` in `cmd/server/main.go` (line 88-114). - -5. **Document:** Add the new tool to the README.md and the table in section 5 (MCP Tools & Permissions). - -### 6.1. 
Example Tool Structure - -```go -type ToolMyFeature struct { - sysdigClient sysdig.ExtendedClientWithResponsesInterface -} - -func (h *ToolMyFeature) handle(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) { - param := request.GetString("param_name", "") - response, err := h.sysdigClient.SomeAPICall(ctx, param) - // Handle response... - return mcp.NewToolResultJSON(response.JSON200) -} - -func (h *ToolMyFeature) RegisterInServer(s *server.MCPServer) { - tool := mcp.NewTool("my_feature", - mcp.WithDescription("What this tool does"), - mcp.WithString("param_name", - mcp.Required(), - mcp.Description("Parameter description"), - ), - mcp.WithReadOnlyHintAnnotation(true), - mcp.WithDestructiveHintAnnotation(false), - WithRequiredPermissions("permission.name"), - ) - s.AddTool(tool, h.handle) -} -``` - -### 6.2. Testing Philosophy - -- Use BDD-style tests with Ginkgo/Gomega -- Each tool requires comprehensive test coverage for: - - Parameter validation - - Permission metadata - - Sysdig API client interactions (mocked using go-mock) - - Error handling -- Integration tests marked with `_integration_test.go` suffix -- No focused specs (`FDescribe`, `FIt`) should be committed - -## 7. Conventional Commits - -All commit messages must follow the [Conventional Commits](https://www.conventionalcommits.org/) specification. This is essential for automated versioning and changelog generation. - -- **Types**: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`, `build`, `ci`. -- **Format**: `(): ` - -## 8. Code Generation - -- `internal/infra/sysdig/client.gen.go` is auto-generated from OpenAPI spec via oapi-codegen. -- Run `go generate ./...` (or `just generate`) to regenerate after spec changes. -- Generated code includes all Sysdig Secure API types and client methods. -- **DO NOT** manually edit `client.gen.go`. Extend functionality in separate files (e.g., `client_extension.go`). - -## 9. Important Constraints - -1. 
**Generated Code**: Never manually edit `client.gen.go`. Extend functionality in separate files like `client_extension.go`. - -2. **Service Account Limitation**: The `generate_sysql` tool does NOT work with Service Account tokens (returns 500). Use regular user API tokens for this tool. - -3. **Permission Filtering**: Tools are hidden if the API token lacks required permissions. Check user's Sysdig role if a tool is unexpectedly missing. - -4. **stdio Mode Requirements**: When using stdio transport, `SYSDIG_MCP_API_HOST` and `SYSDIG_MCP_API_TOKEN` MUST be set. Remote transports can receive these via HTTP headers instead. - -## 10. Troubleshooting - -**Problem**: Tool not appearing in MCP client -- **Solution**: Check API token permissions match tool's `WithRequiredPermissions()`. Use Sysdig UI: **Settings > Users & Teams > Roles**. The token must have **all** permissions listed. - -**Problem**: "unable to authenticate with any method" -- **Solution**: For `stdio`, verify `SYSDIG_MCP_API_HOST` and `SYSDIG_MCP_API_TOKEN` env vars are set correctly. For remote transports, check `Authorization: Bearer ` header format. - -**Problem**: Tests failing with "command not found" -- **Solution**: Enter Nix shell with `nix develop` or `direnv allow`. All dev tools are provided by the flake. - -**Problem**: `generate_sysql` returning 500 error -- **Solution**: This tool requires a regular user API token, not a Service Account token. Switch to a user-based token. - -**Problem**: Pre-commit hooks not running -- **Solution**: Run `pre-commit install` to install git hooks, then `pre-commit run -a` to test all files. - -## 11. Releasing - -The workflow in .github/workflows/publish.yaml will create a new release automatically when the version of the crate changes in package.nix in the default git branch. -So, if you attempt to release a new version, you need to update this version. You should try releasing a new version when you do any meaningful change that the user can benefit from. 
-The guidelines to follow would be: - -* New feature is implemented -> Release new version. -* Bug fixes -> Release new version. -* CI/Refactorings/Internal changes -> No need to release new version. -* Documentation changes -> No need to release new version. - -The current version of the project is not stable yet, so you need to follow the [Semver spec](https://semver.org/spec/v2.0.0.html), with the following guidelines: - -* Unless specified, do not attempt to stabilize the version. That is, do not try to update the version to >=1.0.0. Versions for now should be <1.0.0. -* For minor changes, update only the Y in 0.X.Y. For example: 0.5.2 -> 0.5.3 -* For major/feature changes, update the X in 0.X.Y and set the Y to 0. For example: 0.5.2 -> 0.6.0 -* Before choosing if the changes are minor or major, check all the commits since the last tag. - -After the commit is merged into the default branch the workflow will cross-compile the project and create a GitHub release of that version. -Check the workflow file in case of doubt. +Automated with `just bump`. Requires `nix` installed. -## 12. Reference Links +## 5. Guides & Reference -- `README.md` – Comprehensive product docs, quickstart, and client configuration samples. -- `CLAUDE.md` – Complementary guide with additional examples and command reference. -- [Model Context Protocol](https://modelcontextprotocol.io/) – Protocol reference for tool/transport behavior. 
+* **Tools & New Tool Creation:** See `internal/infra/mcp/tools/README.md` +* **Releasing:** See `docs/RELEASING.md` +* **Troubleshooting:** See `docs/TROUBLESHOOTING.md` +* **Conventional Commits:** [Specification](https://www.conventionalcommits.org/) +* **Protocol:** [Model Context Protocol](https://modelcontextprotocol.io/) diff --git a/docs/RELEASING.md b/docs/RELEASING.md new file mode 100644 index 0000000..9d7b78b --- /dev/null +++ b/docs/RELEASING.md @@ -0,0 +1,20 @@ +# Releasing + +The workflow in .github/workflows/publish.yaml will create a new release automatically when the version of the package declared in package.nix changes on the default git branch. +So, to release a new version, you need to update that version field. Release a new version whenever you make a meaningful change that users can benefit from. +The guidelines to follow are: + +* New feature is implemented -> Release a new version. +* Bug fixes -> Release a new version. +* CI/refactorings/internal changes -> No need to release a new version. +* Documentation changes -> No need to release a new version. + +The current version of the project is not stable yet, so you need to follow the [Semver spec](https://semver.org/spec/v2.0.0.html), with the following guidelines: + +* Unless specified, do not attempt to stabilize the version. That is, do not try to update the version to >=1.0.0. Versions for now should be <1.0.0. +* For minor changes, update only the Y in 0.X.Y. For example: 0.5.2 -> 0.5.3 +* For major/feature changes, update the X in 0.X.Y and set the Y to 0. For example: 0.5.2 -> 0.6.0 +* Before deciding whether the changes are minor or major, check all the commits since the last tag. + +After the commit is merged into the default branch, the workflow will cross-compile the project and create a GitHub release of that version. +Check the workflow file in case of doubt.
diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md new file mode 100644 index 0000000..2da345a --- /dev/null +++ b/docs/TROUBLESHOOTING.md @@ -0,0 +1,16 @@ +# Troubleshooting + +**Problem**: Tool not appearing in MCP client +- **Solution**: Check API token permissions match tool's `WithRequiredPermissions()`. The token must have **all** permissions listed. + +**Problem**: "unable to authenticate with any method" +- **Solution**: For `stdio`, verify `SYSDIG_MCP_API_HOST` and `SYSDIG_MCP_API_TOKEN` env vars are set correctly. For remote transports, check `Authorization: Bearer <token>` header format. + +**Problem**: Tests failing with "command not found" +- **Solution**: Enter Nix shell with `nix develop` or `direnv allow`. All dev tools are provided by the flake. + +**Problem**: `generate_sysql` returning 500 error +- **Solution**: This tool requires a regular user API token, not a Service Account token. Switch to a user-based token. + +**Problem**: Pre-commit hooks not running +- **Solution**: Run `pre-commit install` to install git hooks, then `pre-commit run -a` to test all files. diff --git a/internal/infra/mcp/tools/README.md b/internal/infra/mcp/tools/README.md new file mode 100644 index 0000000..ec77ca8 --- /dev/null +++ b/internal/infra/mcp/tools/README.md @@ -0,0 +1,61 @@ +# MCP Tools & Permissions + +The handler filters tools dynamically based on the Sysdig user's permissions. Each tool declares mandatory permissions via `WithRequiredPermissions`. + +| Tool | File | Capability | Required Permissions | Useful Prompts | +| --- | --- | --- | --- | --- | +| `list_runtime_events` | `tool_list_runtime_events.go` | Query runtime events with filters, cursor, scope. | `policy-events.read` | “Show high severity runtime events from last 2h.” | +| `get_event_info` | `tool_get_event_info.go` | Pull full payload for a single policy event.
| `policy-events.read` | “Fetch event `abc123` details.” | +| `get_event_process_tree` | `tool_get_event_process_tree.go` | Retrieve the process tree for an event when available. | `policy-events.read` | “Show the process tree behind event `abc123`.” | +| `run_sysql` | `tool_run_sysql.go` | Execute caller-supplied Sysdig SysQL queries safely. | `sage.exec`, `risks.read` | “Run the following SysQL…”. | +| `generate_sysql` | `tool_generate_sysql.go` | Convert natural language to SysQL via Sysdig Sage. | `sage.exec` (does not work with Service Accounts) | “Create a SysQL to list S3 buckets.” | +| `kubernetes_list_clusters` | `tool_kubernetes_list_clusters.go` | Lists Kubernetes cluster information. | `promql.exec` | "List all Kubernetes clusters" | +| `kubernetes_list_nodes` | `tool_kubernetes_list_nodes.go` | Lists Kubernetes node information. | `promql.exec` | "List all Kubernetes nodes in the cluster 'production-gke'" | +| `kubernetes_list_workloads` | `tool_kubernetes_list_workloads.go` | Lists Kubernetes workload information. | `promql.exec` | "List all desired workloads in the cluster 'production-gke' and namespace 'default'" | +| `kubernetes_list_pod_containers` | `tool_kubernetes_list_pod_containers.go` | Retrieves information from a particular pod and container. | `promql.exec` | "Show me info for pod 'my-pod' in cluster 'production-gke'" | +| `kubernetes_list_cronjobs` | `tool_kubernetes_list_cronjobs.go` | Retrieves information from the cronjobs in the cluster. | `promql.exec` | "List all cronjobs in cluster 'prod' and namespace 'default'" | +| `troubleshoot_kubernetes_list_top_unavailable_pods` | `tool_troubleshoot_kubernetes_list_top_unavailable_pods.go` | Shows the top N pods with the highest number of unavailable or unready replicas. 
| `promql.exec` | "Show the top 20 unavailable pods in cluster 'production'" | +| `troubleshoot_kubernetes_list_top_restarted_pods` | `tool_troubleshoot_kubernetes_list_top_restarted_pods.go` | Lists the pods with the highest number of container restarts. | `promql.exec` | "Show the top 10 pods with the most container restarts in cluster 'production'" | +| `troubleshoot_kubernetes_list_top_400_500_http_errors_in_pods` | `tool_troubleshoot_kubernetes_list_top_400_500_http_errors_in_pods.go` | Lists the pods with the highest rate of HTTP 4xx and 5xx errors over a specified time interval. | `promql.exec` | "Show the top 20 pods with the most HTTP errors in cluster 'production'" | +| `troubleshoot_kubernetes_list_top_network_errors_in_pods` | `tool_troubleshoot_kubernetes_list_top_network_errors_in_pods.go` | Shows the top network errors by pod over a given interval. | `promql.exec` | "Show the top 10 pods with the most network errors in cluster 'production'" | +| `troubleshoot_kubernetes_list_count_pods_per_cluster` | `tool_troubleshoot_kubernetes_list_count_pods_per_cluster.go` | List the count of running Kubernetes Pods grouped by cluster and namespace. | `promql.exec` | "List the count of running Kubernetes Pods in cluster 'production'" | +| `troubleshoot_kubernetes_list_underutilized_pods_by_cpu_quota` | `tool_troubleshoot_kubernetes_list_underutilized_pods_by_cpu_quota.go` | List Kubernetes pods with CPU usage below 25% of the quota limit. | `promql.exec` | "Show the top 10 underutilized pods by CPU quota in cluster 'production'" | +| `troubleshoot_kubernetes_list_underutilized_pods_by_memory_quota` | `tool_troubleshoot_kubernetes_list_underutilized_pods_by_memory_quota.go` | List Kubernetes pods with memory usage below 25% of the limit. 
| `promql.exec` | "Show the top 10 underutilized pods by memory quota in cluster 'production'" | +| `troubleshoot_kubernetes_list_top_cpu_consumed_by_workload` | `tool_troubleshoot_kubernetes_list_top_cpu_consumed_by_workload.go` | Identifies the Kubernetes workloads (all containers) consuming the most CPU (in cores). | `promql.exec` | "Show the top 10 workloads consuming the most CPU in cluster 'production'" | +| `troubleshoot_kubernetes_list_top_cpu_consumed_by_container` | `tool_troubleshoot_kubernetes_list_top_cpu_consumed_by_container.go` | Identifies the Kubernetes containers consuming the most CPU (in cores). | `promql.exec` | "Show the top 10 containers consuming the most CPU in cluster 'production'" | +| `troubleshoot_kubernetes_list_top_memory_consumed_by_workload` | `tool_troubleshoot_kubernetes_list_top_memory_consumed_by_workload.go` | Lists memory-intensive workloads (all containers). | `promql.exec` | "Show the top 10 workloads consuming the most memory in cluster 'production'" | +| `troubleshoot_kubernetes_list_top_memory_consumed_by_container` | `tool_troubleshoot_kubernetes_list_top_memory_consumed_by_container.go` | Lists memory-intensive containers. | `promql.exec` | "Show the top 10 containers consuming the most memory in cluster 'production'" | + +# Adding a New Tool + +1. **See other tools:** Check how other tools are implemented so you have context on how they should look. + +2. **Create Files:** Add `tool_<name>.go` and `tool_<name>_test.go` in `internal/infra/mcp/tools/`. + +3. **Implement the Tool:** + * Define a struct that holds the Sysdig client, or any required collaborator. + * Implement the `handle` method, which contains the tool's core logic. + * Implement the `RegisterInServer` method to define the tool's MCP schema, including its name, description, parameters, and required permissions. Use helpers from `utils.go`. + * If a tool does not have any required permission, just specify `WithRequiredPermissions()`.
If the tool requires one or multiple permissions, specify them like `WithRequiredPermissions("a.permission", "another.permission")`. + +4. **Write Tests:** Use Ginkgo/Gomega to write BDD-style tests. Mock the Sysdig client to cover: + - Parameter validation + - Permission metadata + - Sysdig API client interactions (mocked) + - Error handling + +5. **Register the Tool:** Add the new tool to `setupHandler()` in `cmd/server/main.go`. + +6. **Document:** Add the new tool to the README.md and the table in this document. + + +## Testing Philosophy + +- Use BDD-style tests with Ginkgo/Gomega +- Each tool requires comprehensive test coverage for: + - Parameter validation (all possible combinations need to be tested) + - Permission metadata + - Sysdig API client interactions (mocked using go-mock) + - Error handling +- No focused specs (`FDescribe`, `FIt`) should be committed diff --git a/package.nix b/package.nix index b0efd42..8c93355 100644 --- a/package.nix +++ b/package.nix @@ -3,6 +3,7 @@ buildGoModule (finalAttrs: { pname = "sysdig-mcp-server"; version = "0.5.0"; src = ./.; + # This hash is automatically recalculated with `just rehash-package-nix`, which is also run by `just bump`. vendorHash = "sha256-jf/px0p88XbfuSPMry/qZcfR0QPTF9IrPegg2CwAd6M="; subPackages = [