Project Name: DevOps Network Monitoring Agent
Framework Foundation: AutoAgents (Rust multi-agent framework)
Primary Goal: Automate DevOps network monitoring tasks by analyzing Traefik logs in Elasticsearch/Kibana, detecting threats (primarily high-rate secrets scanning), and making deterministic recommendations or taking deterministic actions via the Kubernetes API.
Key Capabilities:
- Local SQLite database for state management (allowlist, incidents, actions)
- Configurable operating modes (auto, review, disabled)
- Multi-turn LLM-driven Elasticsearch query flow for dynamic threat detection
- Traefik-first Kubernetes pattern inspection and blocking
- Simple SPA for review mode approval workflow (no auth)
Goals:
- Detect secrets scanning activity via high-rate 401/403 responses to sensitive paths with suspicious user-agents
- Maintain a SQLite-backed allowlist to prevent blocking expected IPs
- Inspect existing Kubernetes/Traefik blocking patterns and match them deterministically
- Support a configurable LLM service (URL-based or API key-based)
- Provide `auto`, `review`, and `disabled` operating modes
- Deliver a simple vanilla HTML/CSS/JS SPA for review-mode incident approval

Non-Goals:
- Complex SPA framework setup (no React/Vue/Svelte build steps)
- Authentication/authorization for the initial SPA version
- Cloud-only LLM defaults (the LLM service always uses the configured URL or the CLI-specified URL)
- Generic log analysis (the focus is primarily on secrets scanning indicators)
Module Overview:

| Module | Responsibility |
|---|---|
| `config/` | Configuration loading and validation (YAML) |
| `db/` | SQLite database operations (allowlist, incidents, actions) |
| `llm/` | Unified LLM provider abstraction (URL or API key + provider) |
| `elasticsearch/` | ES client with LLM-driven DSL query execution |
| `k8s/` | Kubernetes client for Traefik CRD discovery and pattern matching |
| `agent/` | ReAct executor with tool registry and guardrails hooks |
| `tools/` | Agent tools (`query_logs`, `check_allowlist`, `inspect_patterns`, `apply_block`) |
| `models/` | Data models (`Incident`, `Recommendation`, `Action`) |
| `web/` | axum HTTP server serving SPA static files and API endpoints |
Configuration (`config.yaml`):

```yaml
# LLM Service Configuration
llm:
  service_type: "url"  # "url" or "api_key"
  url: "http://localhost:11434/api/generate"  # For Ollama/local
  # OR
  # provider: "openai"
  # api_key: "${OPENAI_API_KEY}"
  model: "llama3"  # or "gpt-4o"
  temperature: 0.2

# Operating Mode
mode: "review"  # "auto", "review", or "disabled"

# Elasticsearch/Kibana Configuration
elasticsearch:
  url: "https://es-cluster:9200"
  username: "${ES_USER}"
  password: "${ES_PASSWORD}"
  index_pattern: "traefik-logs-*"

# Kubernetes Configuration
kubernetes:
  context: "${KUBE_CONTEXT}"

# Database Configuration
database:
  path: "./data/devops-agent.db"

# Guardrails Configuration
guardrails:
  enabled: true
  input_policy: "sanitize"
  output_policy: "audit"
```

LLM service selection:
- Always uses the configured URL or the URL given by the `--llm-url` flag (no built-in cloud default)
- Supports both `url` (e.g., Ollama at `http://localhost:11434/api/generate`) and `api_key` + `provider` (OpenAI, Anthropic, etc.)
- CLI override: `--llm-url <url>` or `--llm-provider <provider> --llm-api-key <key>`
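A minimal sketch of how the LLM settings might be resolved, assuming this precedence: an explicit `--llm-url` wins, then a CLI provider/key pair, then the `llm:` section of the config file. It also assumes `${VAR}` placeholders such as `${OPENAI_API_KEY}` were already expanded by the config loader. `LlmBackend`, `LlmFileConfig`, `LlmCliOverrides`, and `resolve_llm` are hypothetical names, not AutoAgents APIs.

```rust
/// Resolved LLM backend (hypothetical type; mirrors `service_type`).
#[derive(Debug, Clone, PartialEq)]
pub enum LlmBackend {
    Url { url: String },
    ApiKey { provider: String, api_key: String },
}

/// The `llm:` section of config.yaml, assumed already env-expanded.
#[derive(Debug, Default)]
pub struct LlmFileConfig {
    pub service_type: String,
    pub url: Option<String>,
    pub provider: Option<String>,
    pub api_key: Option<String>,
}

/// Values of `--llm-url`, `--llm-provider`, `--llm-api-key` (None if absent).
#[derive(Debug, Default)]
pub struct LlmCliOverrides {
    pub llm_url: Option<String>,
    pub llm_provider: Option<String>,
    pub llm_api_key: Option<String>,
}

pub fn resolve_llm(file: &LlmFileConfig, cli: &LlmCliOverrides) -> Result<LlmBackend, String> {
    // 1. An explicit --llm-url always takes precedence.
    if let Some(url) = &cli.llm_url {
        return Ok(LlmBackend::Url { url: url.clone() });
    }
    // 2. A CLI provider/key pair comes next; half a pair is a usage error.
    match (&cli.llm_provider, &cli.llm_api_key) {
        (Some(provider), Some(api_key)) => {
            return Ok(LlmBackend::ApiKey { provider: provider.clone(), api_key: api_key.clone() })
        }
        (None, None) => {}
        _ => return Err("--llm-provider and --llm-api-key must be given together".to_string()),
    }
    // 3. Otherwise use the config file; there is deliberately no built-in default.
    match file.service_type.as_str() {
        "url" => file
            .url
            .clone()
            .map(|url| LlmBackend::Url { url })
            .ok_or_else(|| "llm.url is required when service_type is \"url\"".to_string()),
        "api_key" => match (&file.provider, &file.api_key) {
            (Some(provider), Some(api_key)) => Ok(LlmBackend::ApiKey {
                provider: provider.clone(),
                api_key: api_key.clone(),
            }),
            _ => Err("llm.provider and llm.api_key are required when service_type is \"api_key\"".to_string()),
        },
        other => Err(format!("unknown llm.service_type: {other:?}")),
    }
}
```

Returning an error instead of silently falling back to a hosted provider keeps the "no cloud-only defaults" decision explicit.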
Secrets scanning indicators include:
- High volume of `401 Unauthorized` or `403 Forbidden` responses
- Specific paths: `/.env`, `/wp-config.php`, `/config.xml`, `/.git/config`, `/api/v1/secrets`
- User-Agent patterns: `gitleaks`, `trufflehog`, `secretsniffer`, `nuclei`
- Time-windowed rate analysis (e.g., >50 requests/minute to sensitive paths)
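These indicators can be combined into a deterministic pre-filter before the LLM is consulted. The sketch assumes a simplified `LogEntry` record (its field names are assumptions, not Traefik's exact access-log schema). It counts 401/403 hits on the sensitive paths per client IP and flags an IP when any 60-second sliding window exceeds the threshold. Scanner user-agents are checked separately as a supporting signal, since the User-Agent header is client-controlled and easy to spoof. `is_suspicious`, `has_scanner_agent`, and `flag_scanners` are hypothetical helpers.

```rust
use std::collections::HashMap;

// Indicator lists from the PRD.
const SENSITIVE_PATHS: &[&str] = &["/.env", "/wp-config.php", "/config.xml", "/.git/config", "/api/v1/secrets"];
const SCANNER_AGENTS: &[&str] = &["gitleaks", "trufflehog", "secretsniffer", "nuclei"];

/// Simplified access-log record (hypothetical shape).
pub struct LogEntry<'a> {
    pub client_ip: &'a str,
    pub path: &'a str,
    pub status: u16,
    pub user_agent: &'a str,
    pub ts_secs: u64, // Unix timestamp, seconds
}

/// A denied (401/403) request whose path, minus any query string, ends in a sensitive path.
pub fn is_suspicious(e: &LogEntry) -> bool {
    let path = e.path.split('?').next().unwrap_or(e.path);
    matches!(e.status, 401 | 403) && SENSITIVE_PATHS.iter().any(|p| path.ends_with(*p))
}

/// Case-insensitive match against known secrets-scanner user-agents.
pub fn has_scanner_agent(user_agent: &str) -> bool {
    let ua = user_agent.to_ascii_lowercase();
    SCANNER_AGENTS.iter().any(|s| ua.contains(*s))
}

/// IPs whose suspicious requests exceed `max_per_minute` within any 60-second window.
pub fn flag_scanners(entries: &[LogEntry], max_per_minute: usize) -> Vec<String> {
    let mut hits: HashMap<&str, Vec<u64>> = HashMap::new();
    for e in entries.iter().filter(|e| is_suspicious(e)) {
        hits.entry(e.client_ip).or_default().push(e.ts_secs);
    }
    let mut flagged: Vec<String> = hits
        .into_iter()
        .filter_map(|(ip, mut ts)| {
            ts.sort_unstable();
            let (mut start, mut peak) = (0, 0);
            for end in 0..ts.len() {
                // Move the window start forward until the window spans < 60 seconds.
                while ts[end] - ts[start] >= 60 {
                    start += 1;
                }
                peak = peak.max(end - start + 1);
            }
            (peak > max_per_minute).then(|| ip.to_string())
        })
        .collect();
    flagged.sort(); // deterministic output order
    flagged
}
```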
Multi-turn Elasticsearch query flow:
- Initial query: Fetch Traefik logs for the time window with 401/403 status codes
- Analyze results: Identify path patterns, user-agents, and source IPs
- Refine query: The LLM dynamically generates follow-up Elasticsearch DSL queries
- Rate analysis: Calculate request volumes per IP and time window
- Pattern confirmation: The LLM confirms whether the behavior matches secrets scanning indicators
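A fixed template for the first turn keeps the starting point deterministic before the LLM begins refining. The sketch assumes ECS-style field names (`http.response.status_code`, `source.ip`, `url.path`, `user_agent.original`, `@timestamp`). Real names and mappings depend on how the Traefik logs are ingested. It requests only aggregations (`"size": 0`), bucketing 401/403 responses by client IP and then by path and user-agent, so the LLM receives compact summaries rather than raw documents. `initial_query` and `search_url` are hypothetical helpers.

```rust
/// First-turn query: 401/403 responses in the last `window_minutes`, aggregated
/// per client IP (top `top_ips`), with the top paths and user-agents per IP.
/// Field names assume ECS-style mappings.
pub fn initial_query(window_minutes: u32, top_ips: u32) -> String {
    format!(
        r#"{{
  "size": 0,
  "query": {{
    "bool": {{
      "filter": [
        {{ "terms": {{ "http.response.status_code": [401, 403] }} }},
        {{ "range": {{ "@timestamp": {{ "gte": "now-{window_minutes}m" }} }} }}
      ]
    }}
  }},
  "aggs": {{
    "by_ip": {{
      "terms": {{ "field": "source.ip", "size": {top_ips} }},
      "aggs": {{
        "paths": {{ "terms": {{ "field": "url.path", "size": 10 }} }},
        "agents": {{ "terms": {{ "field": "user_agent.original", "size": 5 }} }}
      }}
    }}
  }}
}}"#
    )
}

/// Read-only search endpoint for the configured index pattern.
pub fn search_url(base_url: &str, index_pattern: &str) -> String {
    format!("{}/{}/_search", base_url.trim_end_matches('/'), index_pattern)
}
```

LLM-generated follow-up queries could be sent to the same `_search` endpoint. This restricts the agent to reads against the cluster.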
Kubernetes patterns inspected (Traefik first):
- Traefik Middleware resources (`Middleware`, `MiddlewareTCP`) with IP allowlist/denylist rules (e.g., Traefik's `ipWhiteList` middleware, renamed `ipAllowList` in Traefik v3)
- Traefik CRDs: `IngressRoute`, `IngressRouteTCP`, `TraefikService` with IP-based routing
- ConfigMaps/Secrets consumed by Traefik pods (e.g., custom IP block lists)
- NetworkPolicy resources (fallback native K8s blocking)

The agent uses the `kube` crate to discover Traefik CRDs. This is equivalent to:

```bash
kubectl get middleware,ingressroute,traefikservice -A
```
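One detail matters for the NetworkPolicy fallback. NetworkPolicies are additive allow-lists, so an "allow everything except X" policy is cancelled out by any other policy on the same pods that admits X, including a second per-IP policy of the same shape. The sketch therefore renders a single policy whose `except` list holds every blocked IPv4 address. The policy name `devops-agent-blocklist`, the namespace argument, and the pod label `app.kubernetes.io/name: traefik` (the label used by the Traefik Helm chart) are assumptions. The policy only works if client source IPs reach the Traefik pods unmodified, for example with `externalTrafficPolicy: Local` on the Service. `render_block_policy` is a hypothetical helper, and the sketch is IPv4-only.

```rust
use std::net::Ipv4Addr;

/// Renders one NetworkPolicy that admits all IPv4 sources except the blocked ones.
/// Returns None for an empty list, since an empty `except` would block nothing.
pub fn render_block_policy(namespace: &str, blocked: &[Ipv4Addr]) -> Option<String> {
    let mut ips = blocked.to_vec();
    ips.sort();
    ips.dedup(); // stable, duplicate-free output keeps diffs deterministic
    if ips.is_empty() {
        return None;
    }
    let mut yaml = String::new();
    yaml.push_str("apiVersion: networking.k8s.io/v1\n");
    yaml.push_str("kind: NetworkPolicy\n");
    yaml.push_str("metadata:\n");
    yaml.push_str("  name: devops-agent-blocklist\n");
    yaml.push_str(&format!("  namespace: {namespace}\n"));
    yaml.push_str("spec:\n");
    yaml.push_str("  podSelector:\n");
    yaml.push_str("    matchLabels:\n");
    yaml.push_str("      app.kubernetes.io/name: traefik\n");
    yaml.push_str("  policyTypes:\n");
    yaml.push_str("    - Ingress\n");
    yaml.push_str("  ingress:\n");
    yaml.push_str("    - from:\n");
    yaml.push_str("        - ipBlock:\n");
    yaml.push_str("            cidr: 0.0.0.0/0\n");
    yaml.push_str("            except:\n");
    for ip in &ips {
        yaml.push_str(&format!("              - {ip}/32\n"));
    }
    Some(yaml)
}
```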
Operating modes:

| Mode | Behavior |
|---|---|
| `auto` | Agent executes `apply_network_block` deterministically |
| `review` | Agent creates a recommendation record; requires human approval via the SPA before `apply_network_block` |
| `disabled` | Read-only: only queries logs, checks the allowlist, and creates incident records |
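The mode table maps naturally onto a small enum. This sketch uses hypothetical names (`Mode`, `Disposition`, `effective_mode`). It parses the `mode` value, lets a CLI `--mode` override the config file, and returns what the agent should do with a confirmed, non-allowlisted threat.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Auto,
    Review,
    Disabled,
}

impl FromStr for Mode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "auto" => Ok(Mode::Auto),
            "review" => Ok(Mode::Review),
            "disabled" => Ok(Mode::Disabled),
            other => Err(format!("invalid mode {other:?} (expected auto, review or disabled)")),
        }
    }
}

/// What happens to a confirmed, non-allowlisted threat.
#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    ApplyBlock,    // auto: call apply_network_block immediately
    AwaitApproval, // review: store a recommendation for the SPA
    RecordOnly,    // disabled: incident record only
}

pub fn disposition(mode: Mode) -> Disposition {
    match mode {
        Mode::Auto => Disposition::ApplyBlock,
        Mode::Review => Disposition::AwaitApproval,
        Mode::Disabled => Disposition::RecordOnly,
    }
}

/// A CLI `--mode` value, when present, overrides the config file value.
pub fn effective_mode(config_value: &str, cli_value: Option<&str>) -> Result<Mode, String> {
    cli_value.unwrap_or(config_value).parse()
}
```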
Tables:
- `allowlist_ips`: `id`, `ip_or_cidr`, `description`, `created_at`, `expires_at`
- `incidents`: `id`, `ip_address`, `threat_type`, `severity`, `detected_at`, `status`
- `actions`: `id`, `incident_id`, `action_type`, `mode`, `applied_at`, `approved_by`
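One possible SQLite DDL for these tables, written as a Rust constant of the kind `src/db/schema.rs` might hold. Column types, defaults, the `UNIQUE` on `ip_or_cidr`, and the `CHECK` on `mode` are assumptions. Timestamps are stored as text via SQLite's `datetime('now')`. Note that SQLite only enforces `REFERENCES` when foreign keys are enabled on the connection.

```rust
/// SQLite schema for allowlist, incidents, and actions (types and defaults assumed).
pub const SCHEMA: &str = r#"
CREATE TABLE IF NOT EXISTS allowlist_ips (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    ip_or_cidr  TEXT NOT NULL UNIQUE,       -- single address or CIDR range
    description TEXT,
    created_at  TEXT NOT NULL DEFAULT (datetime('now')),
    expires_at  TEXT                        -- NULL means the entry never expires
);

CREATE TABLE IF NOT EXISTS incidents (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    ip_address  TEXT NOT NULL,
    threat_type TEXT NOT NULL,              -- e.g. 'secrets_scanning'
    severity    TEXT NOT NULL,
    detected_at TEXT NOT NULL DEFAULT (datetime('now')),
    status      TEXT NOT NULL DEFAULT 'open'
);

CREATE TABLE IF NOT EXISTS actions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    incident_id INTEGER NOT NULL REFERENCES incidents(id),
    action_type TEXT NOT NULL,              -- e.g. 'block_ip'
    mode        TEXT NOT NULL CHECK (mode IN ('auto', 'review', 'disabled')),
    applied_at  TEXT,
    approved_by TEXT                        -- NULL for auto-mode actions
);
"#;
```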
Allowlist check flow:
- When a threat is detected, check the source IP against the SQLite allowlist
- If the IP matches an allowlist entry, skip blocking and log it as allowed
- If not, create an incident record and proceed to the recommendation step
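Because `allowlist_ips` stores `ip_or_cidr`, the lookup must handle single addresses, CIDR ranges, and expiry. This std-only sketch uses hypothetical names (`AllowEntry`, `cidr_contains`, `is_allowlisted`). It treats a bare address as a /32 or /128, never matches across address families, and treats malformed entries as non-matching, so such entries are better rejected at insert time. For simplicity, `expires_at` is modeled as Unix seconds. IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`) are not normalized.

```rust
use std::net::IpAddr;

/// One allowlist row; `expires_at` as Unix seconds is a simplification.
pub struct AllowEntry {
    pub ip_or_cidr: String,
    pub expires_at: Option<u64>,
}

/// True if `ip` lies inside `ip_or_cidr` (e.g. "10.0.0.0/8", "2001:db8::/32", "192.0.2.1").
pub fn cidr_contains(ip_or_cidr: &str, ip: IpAddr) -> bool {
    let (addr_part, prefix_part) = match ip_or_cidr.split_once('/') {
        Some((a, p)) => (a, Some(p)),
        None => (ip_or_cidr, None),
    };
    let Ok(net) = addr_part.trim().parse::<IpAddr>() else { return false };
    // Widen both addresses to u128 and remember the family's bit width.
    let (net_bits, ip_bits, width) = match (net, ip) {
        (IpAddr::V4(n), IpAddr::V4(i)) => (u32::from(n) as u128, u32::from(i) as u128, 32u32),
        (IpAddr::V6(n), IpAddr::V6(i)) => (u128::from(n), u128::from(i), 128u32),
        _ => return false, // address families differ
    };
    let prefix = match prefix_part {
        None => width,
        Some(p) => match p.trim().parse::<u32>() {
            Ok(p) if p <= width => p,
            _ => return false,
        },
    };
    if prefix == 0 {
        return true;
    }
    // Keep the top `prefix` bits of a `width`-bit address.
    let family_mask = if width == 128 { !0u128 } else { (1u128 << width) - 1 };
    let mask = (!0u128 << (width - prefix)) & family_mask;
    (net_bits & mask) == (ip_bits & mask)
}

/// True if any unexpired entry covers `ip` at time `now` (Unix seconds).
pub fn is_allowlisted(entries: &[AllowEntry], ip: IpAddr, now: u64) -> bool {
    entries
        .iter()
        .any(|e| e.expires_at.map_or(true, |exp| now < exp) && cidr_contains(&e.ip_or_cidr, ip))
}
```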
Review SPA:
- Vanilla HTML/CSS/JS served as static files by the Rust `axum` server
- No build steps, no Node.js dependencies
- Python-friendly structure: clean separation of HTML (structure), CSS (styling), and JS (API calls)
- No authentication for the initial version
API endpoints:
- `GET /api/incidents`: List detected incidents
- `GET /api/incidents/{id}/recommendation`: View the LLM recommendation
- `POST /api/incidents/{id}/approve`: Approve the blocking action
- `POST /api/incidents/{id}/reject`: Reject the blocking action
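Approve and reject can race, for example with two open browser tabs or a double click. The handlers behind these endpoints therefore benefit from a state transition that accepts a decision only while the incident is still pending. The sketch uses hypothetical names (`IncidentStatus`, `Decision`, `review`, `http_status`) and suggests `409 Conflict` for a repeated decision. It does not cover persistence or the subsequent `apply_network_block` call.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IncidentStatus {
    PendingReview,
    Approved,
    Rejected,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Approve,
    Reject,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReviewError {
    AlreadyDecided(IncidentStatus),
}

/// Applies a reviewer decision; only pending incidents can change state.
pub fn review(current: IncidentStatus, decision: Decision) -> Result<IncidentStatus, ReviewError> {
    match current {
        IncidentStatus::PendingReview => Ok(match decision {
            Decision::Approve => IncidentStatus::Approved,
            Decision::Reject => IncidentStatus::Rejected,
        }),
        decided => Err(ReviewError::AlreadyDecided(decided)),
    }
}

/// Suggested HTTP status for the POST /approve and /reject handlers.
pub fn http_status(result: &Result<IncidentStatus, ReviewError>) -> u16 {
    match result {
        Ok(_) => 200,
        Err(ReviewError::AlreadyDecided(_)) => 409,
    }
}
```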
Where guardrails make sense:
- Input validation: Sanitize ES query results before passing to LLM (remove PII, limit token count)
- Output validation: Ensure recommendations/commands are well-formed and safe
- Audit policy: Log all LLM interactions for compliance
Uses AutoAgents' `autoagents-guardrails` crate with:
- `Block` policy for malicious input patterns
- `Sanitize` policy for log data redaction
- `Audit` policy for action recommendations
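The input side of these policies can be illustrated without the guardrails crate. The sketch below is not the `autoagents-guardrails` API. It is a hypothetical std-only pre-processing step that does three things: it blocks log fields containing obvious prompt-injection phrases (paths and user-agents are attacker-controlled), redacts the values of sensitive query parameters, and truncates by character count as a rough stand-in for a token limit. The key and phrase lists are illustrative assumptions.

```rust
// Illustrative lists; tune for the actual log content.
const SENSITIVE_KEYS: &[&str] = &["token", "access_token", "api_key", "apikey", "password", "secret"];
const INJECTION_MARKERS: &[&str] = &["ignore previous instructions", "ignore all previous", "system prompt"];

/// Replaces the values of sensitive query parameters, keeping the key names.
pub fn redact_query(path: &str) -> String {
    let Some((base, query)) = path.split_once('?') else {
        return path.to_string();
    };
    let pairs: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some((key, _)) if SENSITIVE_KEYS.iter().any(|k| k.eq_ignore_ascii_case(key)) => {
                format!("{key}=[REDACTED]")
            }
            _ => pair.to_string(),
        })
        .collect();
    format!("{}?{}", base, pairs.join("&"))
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass(String), // sanitized value that may be placed in the prompt
    Block,        // drop the field (and record the event for audit)
}

/// Checks one log field before it is placed in an LLM prompt.
pub fn guard_field(value: &str, max_chars: usize) -> Verdict {
    let lower = value.to_ascii_lowercase();
    if INJECTION_MARKERS.iter().any(|m| lower.contains(*m)) {
        return Verdict::Block;
    }
    // Character truncation is a crude stand-in for a real token budget.
    Verdict::Pass(redact_query(value).chars().take(max_chars).collect())
}
```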
CLI usage:

```bash
devops-agent --config config.yaml \
  --mode review \
  --serve-spa \
  --llm-url http://localhost:11434/api/generate
# OR: --llm-provider openai --llm-api-key <key>
```

Flags:
- `--config <file>`: Path to configuration file
- `--mode <auto|review|disabled>`: Operating mode override
- `--serve-spa`: Enable SPA web server
- `--llm-url <url>`: LLM service URL override
- `--llm-provider <provider>`: LLM provider (openai, anthropic, etc.)
- `--llm-api-key <key>`: LLM API key
Project structure:

```
devops-agent/
├── Cargo.toml
├── config/
│   └── default.yaml
├── src/
│   ├── main.rs                # CLI entry point (clap)
│   ├── config/
│   │   ├── mod.rs
│   │   └── models.rs
│   ├── db/
│   │   ├── mod.rs
│   │   ├── schema.rs
│   │   └── queries.rs
│   ├── llm/
│   │   ├── mod.rs             # URL/API key factory
│   │   └── provider.rs
│   ├── elasticsearch/
│   │   ├── mod.rs
│   │   └── queries.rs         # LLM-generated DSL execution
│   ├── k8s/
│   │   ├── mod.rs
│   │   ├── inspector.rs       # Traefik CRD discovery via kube crate
│   │   └── blocker.rs
│   ├── agent/
│   │   ├── mod.rs             # ReAct executor + tools
│   │   └── hooks.rs           # Guardrails
│   ├── tools/
│   │   ├── mod.rs
│   │   ├── query_logs.rs
│   │   ├── check_allowlist.rs
│   │   ├── inspect_patterns.rs
│   │   └── apply_block.rs
│   ├── models/
│   │   ├── mod.rs
│   │   ├── incident.rs
│   │   ├── recommendation.rs
│   │   └── action.rs
│   └── web/
│       ├── mod.rs             # axum server setup
│       ├── api.rs             # REST endpoints
│       └── static/
│           ├── index.html
│           ├── app.js
│           └── styles.css
└── config.yaml.example
```
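Dependencies (`Cargo.toml`):

```toml
[dependencies]
autoagents = { version = "0.5", features = ["full"] }
autoagents-guardrails = "0.5"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_yaml = "0.9"
serde_json = "1"
sqlx = { version = "0.7", features = ["sqlite", "runtime-tokio", "macros"] }
kube = { version = "0.88", features = ["client", "derive", "rustls-tls"] }
# kube requires k8s-openapi with a Kubernetes version feature selected
k8s-openapi = { version = "0.21", features = ["latest"] }
reqwest = { version = "0.11", features = ["json", "rustls-tls"] }
axum = "0.7"
# ServeDir for the SPA static files (tower-http 0.5 matches axum 0.7)
tower-http = { version = "0.5", features = ["fs"] }
tracing = "0.1"
tracing-subscriber = "0.3"
clap = { version = "4", features = ["derive"] }
```

Implementation Phases:

| Phase | Task |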
|---|---|
| 1 | Project init, Cargo.toml, CLI parsing (clap), config loading |
| 2 | SQLite DB setup (allowlist, incidents, actions tables) |
| 3 | LLM provider abstraction (URL + API key factory) |
| 4 | Elasticsearch client (LLM-generated DSL execution) |
| 5 | Kubernetes client (Traefik CRD discovery via kube crate) |
| 6 | Agent tools + ReAct executor + guardrails integration |
| 7 | Web server + SPA (vanilla JS via axum) |
| 8 | Testing and documentation |
Upon PRD approval, implementation will begin with:
- Project initialization and Cargo.toml setup
- CLI parsing and configuration loading
- SQLite database schema and operations
- LLM provider abstraction