Enterprise-grade AI agents for Linux/RHEL system administration, powered by Google ADK and linux-mcp-server.
Sysadmin Agents is a collection of AI-powered assistants that help you troubleshoot and manage Linux servers. Instead of manually running diagnostic commands, you describe your problem in plain English and the agents investigate for you.
- System Administrators managing RHEL, Fedora, or other Linux servers
- DevOps/SRE Engineers who need quick diagnostics across multiple hosts
- Developers who want AI assistance for server troubleshooting
| Term | What It Means |
|---|---|
| AI Agent | Software that uses an LLM (like Gemini) to understand your request, decide what actions to take, and execute them autonomously |
| Google ADK | Agent Development Kit - Google's framework for building AI agents with tools, planning, and multi-agent orchestration |
| MCP | Model Context Protocol - A standard way for AI agents to use external tools; here we use linux-mcp-server which provides 19 Linux diagnostic commands |
| Orchestrator | The main "sysadmin" agent that understands your problem and delegates to specialist agents |
| Specialist Agent | Focused agents for specific tasks (performance, security, capacity, etc.) |
| Traditional Approach | With Sysadmin Agents |
|---|---|
SSH into server, run top, df -h, journalctl... |
Ask: "Why is the server slow?" |
| Remember which commands to use | Agent knows the right diagnostic tools |
| Manually correlate logs and metrics | Agent synthesizes findings into recommendations |
| Repeat for each server | Query multiple hosts in one conversation |
| Context lost between sessions | Session state preserved |
┌──────────────────────────────────────────────────────────────────────────────┐
│ ADK Web Interface │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Sysadmin Agent │ │
│ │ (Orchestrator/Router) │ │
│ │ "Describe your problem, I'll handle everything" │ │
│ └───────────────────────────────┬────────────────────────────────────────┘ │
│ │ transfer_to_agent │
│ ┌─────────────┬───────────────┼───────────────┬─────────────┐ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ RCA │ │ Perf │ │ Capacity │ │ Upgrade │ │ Security │ │
│ │Specialist│ │Specialist│ │Specialist│ │Specialist│ │Specialist│ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │Root Cause│ │Bottleneck│ │Disk Usage│ │ Readiness│ │ Audit │ │
│ │ Analysis │ │ Finder │ │ Analysis │ │ Check │ │ & Harden │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┘ │
│ │ PlanReActPlanner │
│ ┌────────────▼────────────┐ │
│ │ linux-mcp-server │ │
│ │ (19 tools) │ │
│ └────────────┬────────────┘ │
└───────────────────────────┼──────────────────────────────────────────────────┘
│ SSH
┌───────────┼───────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ RHEL 1 │ │ RHEL 2 │ │ RHEL n │
└────────┘ └────────┘ └────────┘
View as Mermaid Diagram (click to expand)
flowchart TB
subgraph UI [ADK Web Interface]
WebUI[Browser UI at localhost:8000]
end
subgraph Orchestrator [Sysadmin Agent]
Router[Orchestrator - Routes via transfer_to_agent]
end
subgraph Specialists [Specialist Sub-Agents]
RCA[RCA Agent]
Perf[Performance Agent]
Cap[Capacity Agent]
Upg[Upgrade Agent]
Sec[Security Agent]
end
subgraph MCP [MCP Layer]
MCPServer[linux-mcp-server with 19 tools]
end
subgraph Targets [Target Systems]
RHEL1[RHEL Server 1]
RHEL2[RHEL Server 2]
RHELn[RHEL Server n]
end
WebUI --> Router
Router -->|transfer_to_agent| RCA
Router -->|transfer_to_agent| Perf
Router -->|transfer_to_agent| Cap
Router -->|transfer_to_agent| Upg
Router -->|transfer_to_agent| Sec
RCA --> MCPServer
Perf --> MCPServer
Cap --> MCPServer
Upg --> MCPServer
Sec --> MCPServer
MCPServer -->|SSH| RHEL1
MCPServer -->|SSH| RHEL2
MCPServer -->|SSH| RHELn
The Sysadmin Agent is the single entry point. Users describe their problem naturally, and the agent:
- Routes to the appropriate specialist(s) automatically via
transfer_to_agent - Specialists execute investigations using linux-mcp-server tools with PlanReActPlanner
- Synthesizes findings into actionable recommendations
No need to think about which agent to use - just describe the problem.
The linux-mcp-server runs as a subprocess inside the same container (following the official ADK pattern):
┌─────────────────────────────────────────────────────────────┐
│ Container │
│ ┌─────────────────┐ stdio ┌───────────────────────┐ │
│ │ FastAPI │◄───────────►│ linux-mcp-server │ │
│ │ (ADK API + │ │ (19 read-only tools)│ │
│ │ Web UI) │ │ │ │
│ └─────────────────┘ └───────────┬───────────┘ │
└──────────────────────────────────────────────┼──────────────┘
│ SSH
┌────────────────┼────────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ RHEL 1 │ │ RHEL 2 │ │ RHEL n │
└────────┘ └────────┘ └────────┘
View as Mermaid Diagram (click to expand)
flowchart LR
subgraph Container [Container]
FastAPI[FastAPI - ADK API + Web UI]
MCPProc[linux-mcp-server subprocess]
FastAPI <-->|stdio| MCPProc
end
subgraph External [Target Systems]
RHEL1[RHEL 1]
RHEL2[RHEL 2]
RHELn[RHEL n]
end
MCPProc -->|SSH| RHEL1
MCPProc -->|SSH| RHEL2
MCPProc -->|SSH| RHELn
The MCP server uses SSH with key-based authentication to execute read-only diagnostic commands on remote RHEL servers.
The orchestrator uses transfer_to_agent for routing (no planner - it's a router, not a worker).
Sub-agents use PlanReActPlanner for structured reasoning:
/*PLANNING*/
1. First, check system logs for authentication failures
2. Then, analyze listening ports for unexpected services
3. Finally, review audit logs for security events
/*ACTION*/
read_log_file(path='/var/log/secure', lines=100, host='server1')
/*REASONING*/
The user asked about security issues. Checking /var/log/secure first
will reveal any failed login attempts or SSH brute-force attacks.
/*FINAL_ANSWER*/
Based on my investigation...
This provides transparent, step-by-step reasoning for complex troubleshooting.
| Agent | Purpose | Example Queries |
|---|---|---|
| sysadmin | Main entry point (orchestrator) | "Check my server", "What's wrong with my system?" |
| rca | Root Cause Analysis | "Why did the server crash?", "Investigate the outage" |
| performance | Performance Analysis | "System is slow", "High CPU usage" |
| capacity | Capacity Planning | "Disk is full", "Where's my space going?" |
| upgrade | Upgrade Readiness | "Ready to upgrade?", "Pre-flight check for Fedora 43" |
| security | Security Audit | "Check for security issues", "Are there failed logins?" |
Get a free API key from Google AI Studio:
- Sign in with your Google account
- Click "Create API Key"
- Copy the key (starts with
AIza...)
Note: The free tier includes 60 requests/minute which is sufficient for testing. See Gemini pricing for production usage.
The agents connect to your Linux servers via SSH. You need:
-
SSH key pair - If you don't have one:
# Generate a new SSH key (press Enter to accept defaults) ssh-keygen -t ed25519 -C "sysadmin-agents" # This creates ~/.ssh/id_ed25519 (private) and ~/.ssh/id_ed25519.pub (public)
-
Copy public key to target hosts:
# Replace 'user' and 'hostname' with your values ssh-copy-id -i ~/.ssh/id_ed25519.pub user@hostname # Test the connection (should not prompt for password) ssh -i ~/.ssh/id_ed25519 user@hostname "echo 'SSH works!'"
-
Required permissions on target hosts:
- User must be able to run diagnostic commands (
ps,df,journalctl, etc.) - For full functionality, user should have
sudoaccess or be in thewheel/admgroup
- User must be able to run diagnostic commands (
- Python 3.10+ - For local development
- Podman or Docker - For container deployment
- OpenShift/Kubernetes - For production
git clone https://github.com/your-org/sysadmin-agents.git
cd sysadmin-agentsUsing uv (recommended - faster):
# Install uv if you don't have it (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install
uv venv
source .venv/bin/activate
uv pip install -e ".[web]"Using standard pip:
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e ".[web]"Configure environment:
# Copy the example config and edit it
cp deploy/config.env.example .env
# Or set variables directly:
export GOOGLE_API_KEY=your-gemini-api-key
export LINUX_MCP_USER=your-ssh-username
export LINUX_MCP_SSH_KEY_PATH=~/.ssh/id_ed25519Start the ADK web interface:
adk web --port 8000 agentsOpen http://localhost:8000 and select the sysadmin agent.
Before connecting to real servers, verify the setup works:
# Check that agents load correctly
curl -s http://localhost:8000/list-apps | python3 -m json.tool
# Should return: ["agents", "core"]
# Check the web UI is accessible
# Open http://localhost:8000/dev-ui/ in your browser
# You should see a dropdown with "sysadmin" and other agentsIf you see the agents list, the installation is working. Now you can try a real query with your target host.
# Pull or build the image
podman build -t sysadmin-agents:latest -f Containerfile .
# Run with your API key and SSH key mounted
podman run -d \
-p 8000:8000 \
-e GOOGLE_API_KEY="your-api-key" \
-e LINUX_MCP_USER="your-ssh-user" \
-v ~/.ssh/id_ed25519:/opt/app-root/src/.ssh/id_ed25519:ro \
sysadmin-agents:latestOpen http://localhost:8000/dev-ui/ to access the ADK Web UI.
See Container Deployment section below.
Just describe your problem to the sysadmin agent:
My server webserver.example.com has been running slow. Can you check it out?
We had an outage on dbserver.example.com yesterday. Help me understand what happened.
The disk is almost full on storage.example.com. Find what's using the space.
Is my Fedora 42 system ready to upgrade to Fedora 43? Check everything.
Check for security issues on prod-server.example.com - any failed logins or suspicious activity?
The ADK Web UI provides real-time visibility into agent operations including routing, tool calls, and responses.
The orchestrator routes to the performance agent, which gathers CPU, memory, disk, and network metrics to identify bottlenecks:
Shows: Orchestrator routing → Performance agent → Tool calls (get_cpu_information, get_memory_information, get_disk_usage, etc.) → Final analysis with bottleneck identification and recommendations
For disk space issues, the capacity agent recursively analyzes directories and provides cleanup recommendations with safety ratings:
Shows: Directory analysis → Largest space consumers → Cleanup recommendations with safety levels (SAFE/MODERATE/CAUTION) → Recoverable space summary
The security agent checks authentication logs, open ports, and audit events to identify vulnerabilities:
Shows: SSH hardening recommendations → SELinux analysis → Network security → Multi-factor authentication suggestions
These agents are designed to address real-world sysadmin scenarios, inspired by the linux-mcp-server Fedora Magazine article.
Note: All examples below were tested against a real RHEL 9.3 system via the ADK API.
Agent: performance_agent
When your system lags during important moments (video calls, demos), the performance agent:
- Gathers resource metrics - CPU, memory, load averages, swap usage
- Identifies top consumers - Processes sorted by resource usage
- Checks for bottlenecks - CPU-bound, memory pressure, I/O wait, network saturation
- Reviews logs - OOM killer activity, service errors
- Provides recommendations - Immediate fixes and long-term improvements
Example prompt:
Check system performance on myserver.example.com - analyze CPU, memory,
and identify any performance bottlenecks.
Tools used:
get_system_information- OS version, kernel, uptimeget_cpu_information- Load averages, CPU model, utilizationget_memory_information- RAM/swap usageget_disk_usage- Filesystem utilizationlist_processes- Top CPU/memory consumers
Example output:
## Resource Usage
- CPU: 2.8% user, 8.3% system, 88.9% idle. Load average: 0.00, 0.02, 0.00
- Memory: 20.7% used, no swap usage
- Disk: Root partition at 10% utilization
## Bottleneck Identified
No significant performance bottlenecks. CPU, memory, and disk I/O are within
acceptable ranges.
## Recommendations
1. Monitor regularly for early issue detection
2. Optimize rhcd process if CPU usage increases
3. Review SSH configuration for security
Agent: capacity_agent
When your disk fills up mysteriously, the capacity agent:
- Analyzes all filesystems - Identifies which are running low
- Finds largest directories - Recursive deep-dive, not just top-level
- Categorizes space usage - Containers, caches, logs, trash
- Provides cleanup commands - With safety ratings (SAFE/MODERATE/CAUTION/DANGEROUS)
- Calculates recoverable space - Shows what you can safely delete
Example prompt:
Analyze disk usage on myserver.example.com. Find the largest directories
and identify cleanup opportunities with safety assessments.
Tools used:
get_disk_usage- All mounted filesystemslist_directories- Recursive directory size analysislist_block_devices- Physical storage layoutget_journal_logs- Disk-related warnings
Example output:
## Disk Usage Overview
| Filesystem | Size | Used | Available | Use% |
|------------|------|------|-----------|------|
| /dev/nvme0n1p4 | 30G | 2.8G | 27G | 10% |
## Largest Space Consumers
| Path | Size | Category |
|------|------|----------|
| /var/cache/dnf | 346 MB | Package Cache |
| /var/log/messages | 5.4 MB | Logs |
## Cleanup Recommendations
| Action | Space Saved | Safety | Command |
|--------|-------------|--------|---------|
| Clean DNF Cache | ~346 MB | SAFE | `dnf clean packages` |
## Total Recoverable Space: ~346 MB
Agent: rca_agent
When investigating incidents and crashes, the RCA agent:
- Gathers system context - Uptime, recent reboots, load
- Collects evidence - Logs, service status, resource metrics
- Builds timeline - Chronological sequence of events
- Correlates events - Connects related issues across log sources
- Identifies root cause - Distinguishes primary cause from symptoms
- Provides remediation - Immediate fixes and prevention measures
Example prompt:
Investigate any recent issues or errors on myserver.example.com.
Check system logs and services for problems.
Tools used:
get_system_information- Uptime, OS versionget_journal_logs- System warnings/errors (filterable by priority, unit, time)get_service_status- Problem service statusget_service_logs- Service-specific logsget_audit_logs- SELinux denials
Example output:
## Root Cause
SELinux policy preventing /usr/sbin/rhcd from establishing outbound connections
on TCP port 443.
## Evidence
- Kernel: Speculative Return Stack Overflow warnings
- SELinux: Repeated denials for rhcd → port 443
- SSH: kex_exchange_identification errors (possible intrusion attempts)
- rhcd service: "connection lost unexpectedly: pingresp not received"
## Recommendations
1. Use `sealert -l <alert_id>` to analyze SELinux denials
2. Create custom SELinux policy module for rhcd
3. Review SSH logs in /var/log/secure
4. Consider fail2ban for brute-force protection
Agent: upgrade_agent
Before major OS upgrades (Fedora 42→43, RHEL 9.4→9.5), the upgrade agent:
- Identifies current system - OS version, kernel, architecture
- Checks disk space - Minimum requirements for root, /boot, /var
- Reviews system health - Memory, CPU, running processes
- Checks critical services - NetworkManager, sshd, databases
- Scans logs for issues - Pre-existing problems to fix first
- Provides manual checks - Commands for package health, third-party repos
Example prompt:
Check if myserver.example.com is ready for an OS upgrade.
Assess disk space, system health, and identify any blockers.
Tools used:
get_system_information- Current OS versionget_disk_usage- Space for upgradeget_memory_information- RAM availabilityget_cpu_information- CPU resourceslist_processes- Running processesget_service_status- Critical services (NetworkManager, sshd)get_journal_logs- Pre-existing errorsget_network_interfaces- Network configurationlist_block_devices- Storage layout
Example output:
# Upgrade Readiness Report
## System Identification
- Distribution: Red Hat Enterprise Linux 9.3 (Plow)
- Kernel: 5.14.0-362.18.1.el9_3.x86_64
- Target Upgrade: RHEL 9.y (latest)
## Pre-flight Checks
| Filesystem | Size | Used | Available | Status |
|------------|------|------|-----------|--------|
| / | 30G | 2.8G | 27G | ✅ Ready |
| /boot | 536M | 162M | 375M | ⚠️ Warning |
| /boot/efi | 200M | 7.0M | 193M | ❌ Blocking |
## Potential Issues Found
| Issue | Severity | Recommendation |
|-------|----------|----------------|
| /boot/efi low space | HIGH | Free up space before upgrade |
| setroubleshoot high CPU | MEDIUM | Resolve SELinux issues first |
## Overall Readiness: NOT READY ❌
Manual verification commands provided:
sudo dnf check # Package dependency issues
dnf repolist # Third-party repos to verify
sestatus # SELinux status
cat /etc/os-release # Exact OS versionAgent: security_agent
For security auditing and hardening assessment, the security agent:
- Analyzes authentication logs - SSH login attempts from
/var/log/secure - Reviews audit logs - SELinux denials, privilege escalation, file access
- Checks network exposure - Open ports, listening services, active connections
- Identifies suspicious activity - Brute force attempts, unusual connections
- Provides hardening recommendations - With severity ratings (CRITICAL/HIGH/MEDIUM/LOW)
Example prompt:
Check for security issues on prod-server.example.com.
Look for failed login attempts, suspicious activity, and open ports.
Tools used:
read_log_file-/var/log/securefor SSH login attemptsget_audit_logs- Linux audit subsystem eventsget_listening_ports- Open ports assessmentget_network_connections- Active connection analysislist_services- Running services inventoryget_journal_logs- Security-related log entries
Example output:
# Security Audit Report
## Executive Summary
Overall security posture: MEDIUM
Key findings: 0 critical, 2 high, 3 medium
## Authentication Security
### SSH Login Analysis
- Total login attempts: 847
- Failed attempts: 312 (from 47 unique IPs)
- Successful logins: 535
- Suspicious activity: Yes - multiple failed attempts from unknown IPs
## Network Exposure
| Port | Service | Binding | Risk Assessment |
|------|---------|---------|-----------------|
| 22 | sshd | 0.0.0.0 | Expected - SSH access |
| 80 | nginx | 0.0.0.0 | Expected - Web server |
| 3306 | mysqld | 0.0.0.0 | ⚠️ HIGH - Database exposed to all interfaces |
## Findings Summary
| # | Finding | Severity | Recommendation |
|---|---------|----------|----------------|
| 1 | MySQL exposed on all interfaces | HIGH | Bind to localhost only |
| 2 | Multiple failed SSH attempts | HIGH | Configure fail2ban |
| 3 | SELinux denials detected | MEDIUM | Review with sealert |
## Immediate Actions Required
1. `mysql> SET GLOBAL bind_address = '127.0.0.1';`
2. `dnf install fail2ban && systemctl enable --now fail2ban`
Agent: sysadmin (orchestrator)
For comprehensive system assessment, the orchestrator uses multiple specialists:
Example prompt:
Run a complete health check on myserver.example.com - check performance,
disk space, any issues in logs, security, and whether it's ready for the next OS upgrade.
Routing behavior:
- sysadmin → Routes to
performance_agentfor resource analysis - performance_agent → Completes, returns to orchestrator
- sysadmin → Routes to
capacity_agentfor disk analysis - capacity_agent → Completes, returns to orchestrator
- sysadmin → Routes to
security_agentfor security audit - security_agent → Completes, returns to orchestrator
- sysadmin → Synthesizes all findings into comprehensive report
The orchestrator correctly chains specialist agents based on the multi-faceted request.
You can test agents directly via the ADK REST API (works for local, container, or deployed instances):
# Set your endpoint (local, container, or OpenShift)
export API_URL="http://localhost:8000"
# export API_URL="https://sysadmin-agents.apps.your-cluster.example.com"curl -s "$API_URL/list-apps" | jq
# Returns: ["agents", "core"]curl -s -X POST "$API_URL/apps/agents/users/user/sessions/my-session" \
-H "Content-Type: application/json" \
-d '{}' | jqcurl -s -X POST "$API_URL/run" \
-H "Content-Type: application/json" \
-d '{
"appName": "agents",
"userId": "user",
"sessionId": "my-session",
"newMessage": {
"role": "user",
"parts": [{"text": "Check system performance on myserver.example.com"}]
}
}' | jqcurl -s -X POST "$API_URL/run_sse" \
-H "Content-Type: application/json" \
-d '{
"appName": "agents",
"userId": "user",
"sessionId": "my-session",
"newMessage": {
"role": "user",
"parts": [{"text": "Check disk usage on myserver.example.com"}]
},
"streaming": false
}' --no-bufferThe response is a Server-Sent Events stream with JSON payloads containing:
functionCall- MCP tool invocationsfunctionResponse- Tool resultstext- Agent analysis and recommendations
sysadmin-agents/
├── agents/
│ ├── sysadmin/ # Main orchestrator agent
│ │ ├── agent.py # Routes to specialists
│ │ └── root_agent.yaml # ADK Agent Config
│ ├── rca/ # Root Cause Analysis
│ │ ├── agent.py # Agent loader
│ │ └── root_agent.yaml # Instructions & prompts
│ ├── performance/ # Performance Analysis
│ ├── capacity/ # Capacity Planning
│ ├── upgrade/ # Upgrade Readiness
│ └── security/ # Security Audit
├── core/ # Shared infrastructure
│ ├── config.py # Pydantic settings (env vars)
│ ├── mcp.py # MCP connection utilities
│ ├── callbacks.py # ADK callbacks
│ ├── safety.py # Safety screening
│ ├── events.py # Event processing
│ ├── state.py # Session state management
│ └── agent_loader.py # YAML config loader with MCP
├── deploy/ # Kubernetes/OpenShift manifests
│ ├── deployment.yaml # Container deployment
│ ├── service.yaml # ClusterIP service
│ ├── route.yaml # OpenShift route
│ ├── configmap.yaml # Application config
│ ├── secrets.yaml.example # Secret templates
│ ├── podman-kube.yaml # Podman Desktop deployment
│ └── kustomization.yaml # Kustomize config
├── Containerfile # Container build (UBI9 Python)
├── main.py # FastAPI entry point (ADK pattern)
├── .github/workflows/
│ ├── ci.yaml # Lint and test
│ └── build-image.yaml # Container build & push
├── scripts/
│ └── deploy.sh # OpenShift deployment script
├── docs/ # Documentation
└── tests/ # Tests
Each agent has access to these linux-mcp-server tools:
| Category | Tools | Description |
|---|---|---|
| System | get_system_information |
OS, kernel, hostname, uptime |
get_cpu_information |
CPU model, cores, load averages | |
get_memory_information |
RAM/swap usage | |
get_hardware_information |
Hardware details (DMI info) | |
| Storage | get_disk_usage |
Filesystem utilization (df) |
list_block_devices |
Physical disks, partitions (lsblk) | |
list_directories |
Directory sizes with sorting options | |
| Processes | list_processes |
Running processes (ps) |
get_process_info |
Details for specific PID | |
| Services | list_services |
All systemd services |
get_service_status |
Status of specific service | |
get_service_logs |
Logs for specific service | |
| Logs | get_journal_logs |
Systemd journal (filterable by unit, priority, time) |
get_audit_logs |
Linux audit subsystem logs | |
read_log_file |
Read specific log file (e.g., /var/log/secure) | |
read_file |
Read any file (cat) | |
| Network | get_network_interfaces |
Interface configuration (ip addr) |
get_network_connections |
Active connections (ss) | |
get_listening_ports |
Open listening ports |
All tools accept an optional host parameter for remote execution via SSH.
This project uses ADK Agent Config patterns with a hybrid approach for MCP tools (see Known Limitations).
-
Create directory:
agents/<name>/ -
Add
__init__.py:
"""My Specialist Agent."""
from .agent import my_agent, root_agent
__all__ = ["my_agent", "root_agent"]- Add
root_agent.yamlfor instructions and configuration:
# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: my_agent
model: gemini-2.0-flash
description: |
What this agent does (used for routing decisions).
output_key: last_my_report # Persists output to session state
generate_content_config:
temperature: 0.1
instruction: |
Detailed instructions for the LLM.
## Your Approach
...
# NOTE: MCP tools are added programmatically in agent.py
# This avoids serialization issues with McpToolset in ADK Agent Config- Add
agent.pyusingcreate_agent_with_mcp:
import logging
from pathlib import Path
from core.agent_loader import create_agent_with_mcp
logger = logging.getLogger(__name__)
CONFIG_PATH = Path(__file__).parent / "root_agent.yaml"
# Creates agent from YAML config with MCP tools added programmatically
root_agent = create_agent_with_mcp(CONFIG_PATH)
# Alias for convenience
my_agent = root_agent
logger.info(f"My agent created: {my_agent.name}")- To add as a sub-agent to the orchestrator, update
agents/sysadmin/agent.py:
# Add your sub-agent creation
my_sub = create_agent_with_mcp(
agents_dir / "my_agent" / "root_agent.yaml",
use_planner=True, # Sub-agents use PlanReActPlanner
disallow_transfer_to_peers=True, # Only orchestrator routes
)
# Add to sub_agents list
sub_agents=[rca_sub, performance_sub, capacity_sub, upgrade_sub, security_sub, my_sub],-
Update the orchestrator's
root_agent.yamlto include routing instructions for your new agent. -
Restart ADK web - your agent appears automatically!
McpToolset is listed but "not fully supported"
in ADK Agent Config YAML, causing serialization errors in the web UI. This project uses
create_agent_with_mcp() as a workaround that:
- Uses YAML for instructions/config (Agent Config pattern)
- Creates agents programmatically with MCP tools (avoids serialization issues)
# Install with uv (recommended)
uv pip install -e .
# Run linting
uv run ruff check .
uv run ruff format .
# Run tests
uv run pytest tests/ -vThe agents are packaged as a container image that includes:
- ADK Web UI - Browser-based interface for testing agents
- ADK API Server - RESTful API for programmatic access
- linux-mcp-server - MCP tools for RHEL/Linux system administration
The image is built using UBI9 Python 3.11 and follows Google ADK deployment patterns.
# Build locally
podman build -t sysadmin-agents:latest -f Containerfile .
# Or pull from GitHub Container Registry (after CI runs)
podman pull ghcr.io/your-org/sysadmin-agents:latestAll configuration is via environment variables (no hardcoded values):
| Variable | Description | Default |
|---|---|---|
GOOGLE_API_KEY |
Gemini API key (required) | - |
LINUX_MCP_USER |
SSH username for RHEL servers | - |
LINUX_MCP_SSH_KEY_PATH |
Path to SSH key in container | /opt/app-root/src/.ssh/id_ed25519 |
LINUX_MCP_ALLOWED_LOG_PATHS |
Allowed log paths | /var/log/messages,/var/log/secure |
PORT |
Server port | 8000 |
ALLOWED_ORIGINS |
CORS origins | * |
SERVE_WEB_UI |
Enable ADK Web UI | true |
podman run -d \
--name sysadmin-agents \
-p 8000:8000 \
-e GOOGLE_API_KEY="your-api-key" \
-e LINUX_MCP_USER="admin" \
-v ~/.ssh/id_ed25519:/opt/app-root/src/.ssh/id_ed25519:ro \
sysadmin-agents:latestFor more complex deployments with ConfigMaps and Secrets:
# Edit deploy/podman-kube.yaml with your settings
# Then deploy:
podman kube play deploy/podman-kube.yaml
# Access at http://localhost:30800/dev-ui/
# Stop:
podman kube down deploy/podman-kube.yaml# Login to your cluster
oc login https://api.your-cluster.example.com:6443
# Create namespace and secrets
oc create namespace sysadmin-agents
oc create secret generic gemini-api-key \
--from-literal=GOOGLE_API_KEY='your-key' \
-n sysadmin-agents
oc create secret generic ssh-private-key \
--from-file=id_ed25519=~/.ssh/id_ed25519 \
-n sysadmin-agents
# Deploy using the script
./scripts/deploy.sh sysadmin-agents ghcr.io/your-org/sysadmin-agents:latest
# Or apply manifests manually
oc apply -k deploy/ -n sysadmin-agentsThe route will be created automatically. Access at:
https://sysadmin-agents-sysadmin-agents.apps.your-cluster.example.com/dev-ui/
GitHub Actions workflow (.github/workflows/build-image.yaml) automatically:
- Runs tests
- Builds container image with Podman
- Pushes to GitHub Container Registry on
mainbranch or tags
See docs/DEPLOYMENT.md for detailed deployment instructions.
The ADK package isn't installed. Make sure you installed with the web extras:
pip install -e ".[web]"
# or
uv pip install -e ".[web]"Your Gemini API key is invalid or not set:
# Check if the variable is set
echo $GOOGLE_API_KEY
# Set it if missing
export GOOGLE_API_KEY=your-key-hereGet a new key at Google AI Studio.
The MCP server can't connect to your target hosts:
-
Test SSH manually first:
ssh -i ~/.ssh/id_ed25519 user@hostname "echo works"
-
Check environment variables:
echo $LINUX_MCP_USER echo $LINUX_MCP_SSH_KEY_PATH
-
Verify key permissions:
chmod 600 ~/.ssh/id_ed25519
The agent module has an error. Check the console for import errors:
# Run with verbose logging
adk web --port 8000 agents 2>&1 | grep -i errorCommon causes:
- Syntax error in
root_agent.yaml - Missing
__init__.py - Import error in
agent.py
You've hit the Gemini API rate limit (60 requests/minute on free tier). Wait a moment and try again, or upgrade your API plan.
Run with debug logging to see what's happening:
export LINUX_MCP_LOG_LEVEL=DEBUG
adk web --port 8000 agentsIf you're still stuck:
- Check the open issues
- Open a new issue with:
- Python version (
python --version) - OS and version
- Full error message
- Steps to reproduce
- Python version (
| Term | Definition |
|---|---|
| ADK | Agent Development Kit - Google's Python framework for building AI agents |
| Agent | AI-powered software that can understand requests, make decisions, and execute actions autonomously |
| CORS | Cross-Origin Resource Sharing - security feature for web APIs |
| LLM | Large Language Model - the AI model (like Gemini) that powers the agents |
| MCP | Model Context Protocol - standardized way for AI agents to use external tools |
| Orchestrator | The main agent that understands requests and routes to specialists |
| PlanReActPlanner | ADK's planning strategy that creates a plan, then executes it step-by-step |
| Specialist | A focused agent for a specific domain (performance, security, etc.) |
| SSE | Server-Sent Events - streaming protocol for real-time responses |
| transfer_to_agent | ADK function that routes a conversation from one agent to another |
We welcome contributions! See docs/CONTRIBUTING.md.
Check docs/BACKLOG.md for planned features.
Apache-2.0


