diff --git a/docs/decisions/prometheus-integration-pattern.md b/docs/decisions/prometheus-integration-pattern.md new file mode 100644 index 00000000..c2a52333 --- /dev/null +++ b/docs/decisions/prometheus-integration-pattern.md @@ -0,0 +1,216 @@ +# Decision: Prometheus Integration Pattern - Enabled by Default with Opt-Out + +## Status + +Accepted + +## Date + +2025-01-22 + +## Context + +The tracker deployment system needed to add Prometheus as a metrics collection service. Several design decisions were required: + +1. **Enablement Strategy**: Should Prometheus be mandatory, opt-in, or enabled-by-default? +2. **Template Rendering**: How should Prometheus templates be rendered in the release workflow? +3. **Service Validation**: How should E2E tests validate optional services like Prometheus? + +The decision impacts: + +- User experience (ease of getting started with monitoring) +- System architecture (template rendering patterns) +- Testing patterns (extensibility for future optional services) + +## Decision + +### 1. Enabled-by-Default with Opt-Out + +Prometheus is **included by default** in generated environment templates but can be disabled by removing the configuration section. + +**Implementation**: + +```rust +pub struct UserInputs { + pub prometheus: Option<PrometheusConfig>, // Some by default, None to disable +} +``` + +**Configuration**: + +```json +{ + "prometheus": { + "scrape_interval": 15 + } +} +``` + +**Disabling**: Remove the entire `prometheus` section from the environment config. + +**Rationale**: + +- Monitoring is a best practice - users should get it by default +- Opt-out is simple - just remove the config section +- No complex feature flags or enablement parameters needed +- Follows principle of least surprise (monitoring expected for production deployments) + +### 2. Independent Template Rendering Pattern + +Each service renders its templates **independently** in the release handler, not from within other services' template rendering.
**Architecture**: + +```text +ReleaseCommandHandler::execute() +├─ Step 1: Create tracker storage +├─ Step 2: Render tracker templates (tracker/*.toml) +├─ Step 3: Deploy tracker configs +├─ Step 4: Create Prometheus storage (if enabled) +├─ Step 5: Render Prometheus templates (prometheus.yml) - INDEPENDENT STEP +├─ Step 6: Deploy Prometheus configs +├─ Step 7: Render Docker Compose templates (docker-compose.yml) +└─ Step 8: Deploy compose files +``` + +**Rationale**: + +- Each service is responsible for its own template rendering +- Docker Compose templates only define service orchestration, not content generation +- Environment configuration is the source of truth for which services are enabled +- Follows Single Responsibility Principle (each step does one thing) +- Makes it easy to add future services (Grafana, Alertmanager, etc.) + +**Anti-Pattern Avoided**: Rendering Prometheus templates from within the Docker Compose template rendering step. + +### 3. ServiceValidation Struct for Extensible Testing + +E2E validation uses a `ServiceValidation` struct with boolean flags instead of function parameters. + +**Implementation**: + +```rust +pub struct ServiceValidation { + pub prometheus: bool, + // Future: pub grafana: bool, + // Future: pub alertmanager: bool, +} + +pub fn run_release_validation( + socket_addr: SocketAddr, + ssh_credentials: &SshCredentials, + services: Option<ServiceValidation>, +) -> Result<(), String> +``` + +**Rationale**: + +- Extensible for future services without API changes +- More semantic than boolean parameters +- Clear intent: `ServiceValidation { prometheus: true }` +- Follows Open-Closed Principle (open for extension, closed for modification) + +**Anti-Pattern Avoided**: `run_release_validation_with_prometheus_check(addr, creds, true)` - too specific and not extensible. + +## Consequences + +### Positive + +1.
**Better User Experience**: + + - Users get monitoring by default without manual setup + - Simple opt-out (remove config section) + - Production-ready deployments out of the box + +2. **Cleaner Architecture**: + + - Each service manages its own templates independently + - Clear separation of concerns in release handler + - Easy to add future services (Grafana, Alertmanager, Loki, etc.) + +3. **Extensible Testing**: + + - ServiceValidation struct easily extended for new services + - Consistent pattern for optional service validation + - Type-safe validation configuration + +4. **Maintenance Benefits**: + - Independent template rendering simplifies debugging + - Each service's templates can be modified independently + - Clear workflow steps make issues easier to trace + +### Negative + +1. **Default Overhead**: + + - Users who don't want monitoring must manually remove the section + - Prometheus container always included in default deployments + - Slightly more disk/memory usage for minimal deployments + +2. **Configuration Discovery**: + - Users must learn that removing the section disables the service + - Not immediately obvious from JSON schema alone + - Requires documentation of the opt-out pattern + +### Risks + +1. **Breaking Changes**: Future Prometheus config schema changes require careful migration planning +2. **Service Dependencies**: Adding services that depend on Prometheus requires proper ordering logic +3. **Template Complexity**: As services grow, we must ensure that independent rendering doesn't duplicate logic + +## Alternatives Considered + +### Alternative 1: Mandatory Prometheus + +**Approach**: Always deploy Prometheus, no opt-out.
+ +**Rejected Because**: + +- Forces monitoring on users who don't want it +- Increases minimum resource requirements +- Violates principle of least astonishment for minimal deployments + +### Alternative 2: Opt-In with Feature Flag + +**Approach**: Prometheus disabled by default, enabled with `"prometheus": { "enabled": true }`. + +**Rejected Because**: + +- Requires users to discover and enable monitoring manually +- Most production deployments should have monitoring - opt-in makes it less likely +- Adds complexity with enabled/disabled flags + +### Alternative 3: Render Prometheus Templates from Docker Compose Step + +**Approach**: Docker Compose template rendering step also renders Prometheus templates. + +**Rejected Because**: + +- Violates Single Responsibility Principle +- Makes Docker Compose step dependent on Prometheus internals +- Harder to add future services independently +- Couples service orchestration with service configuration + +### Alternative 4: Boolean Parameters for Service Validation + +**Approach**: `run_release_validation(addr, creds, check_prometheus: bool)`. + +**Rejected Because**: + +- Not extensible - adding Grafana requires API change +- Less semantic - what does `true` mean? 
+- Becomes unwieldy with multiple services +- Violates Open-Closed Principle + +## Related Decisions + +- [Template System Architecture](../technical/template-system-architecture.md) - Project Generator pattern +- [Environment Variable Injection](environment-variable-injection-in-docker-compose.md) - Configuration passing +- [DDD Layer Placement](../contributing/ddd-layer-placement.md) - Module organization + +## References + +- Issue: [#238 - Prometheus Slice - Release and Run Commands](../issues/238-prometheus-slice-release-run-commands.md) +- Manual Testing Guide: [Prometheus Verification](../e2e-testing/manual/prometheus-verification.md) +- Prometheus Documentation: https://prometheus.io/docs/ +- torrust-demo Reference: Existing Prometheus integration patterns diff --git a/docs/e2e-testing/README.md b/docs/e2e-testing/README.md index a8385e73..f3852041 100644 --- a/docs/e2e-testing/README.md +++ b/docs/e2e-testing/README.md @@ -7,7 +7,10 @@ This guide explains how to run and understand the End-to-End (E2E) tests for the - **[README.md](README.md)** - This overview and quick start guide - **[architecture.md](architecture.md)** - E2E testing architecture, design decisions, and Docker strategy - **[running-tests.md](running-tests.md)** - How to run automated tests, command-line options, and prerequisites -- **[manual-testing.md](manual-testing.md)** - Complete guide for running manual E2E tests with CLI commands +- **[manual/](manual/)** - Manual E2E testing guides: + - **[README.md](manual/README.md)** - Complete manual test workflow (generic deployment guide) + - **[mysql-verification.md](manual/mysql-verification.md)** - MySQL service verification and troubleshooting + - **[prometheus-verification.md](manual/prometheus-verification.md)** - Prometheus metrics verification and troubleshooting - **[test-suites.md](test-suites.md)** - Detailed description of each test suite and what they validate - **[troubleshooting.md](troubleshooting.md)** - Common issues, 
debugging techniques, and cleanup procedures - **[contributing.md](contributing.md)** - Guidelines for extending E2E tests @@ -67,7 +70,10 @@ For detailed prerequisites and manual setup, see [running-tests.md](running-test ## 📚 Learn More - **New to E2E testing?** Start with [test-suites.md](test-suites.md) to understand what each test does -- **Want to run manual tests?** Follow [manual-testing.md](manual-testing.md) for step-by-step CLI workflow +- **Want to run manual tests?** Follow [manual/README.md](manual/README.md) for step-by-step CLI workflow +- **Testing specific services?** See service-specific guides: + - [manual/mysql-verification.md](manual/mysql-verification.md) - MySQL verification + - [manual/prometheus-verification.md](manual/prometheus-verification.md) - Prometheus verification - **Running into issues?** Check [troubleshooting.md](troubleshooting.md) - **Want to understand the architecture?** Read [architecture.md](architecture.md) - **Adding new tests?** See [contributing.md](contributing.md) diff --git a/docs/e2e-testing/manual-testing.md b/docs/e2e-testing/manual/README.md similarity index 83% rename from docs/e2e-testing/manual-testing.md rename to docs/e2e-testing/manual/README.md index 5ec0493d..5495bf72 100644 --- a/docs/e2e-testing/manual-testing.md +++ b/docs/e2e-testing/manual/README.md @@ -6,6 +6,7 @@ This guide explains how to manually run a complete end-to-end test of the Torrus - [Prerequisites](#prerequisites) - [Complete Manual Test Workflow](#complete-manual-test-workflow) +- [Service-Specific Verification](#service-specific-verification) - [Handling Interrupted Commands](#handling-interrupted-commands) - [State Recovery](#state-recovery) - [Troubleshooting Manual Tests](#troubleshooting-manual-tests) @@ -399,6 +400,50 @@ lxc profile list | grep manual-test ls data/manual-test 2>/dev/null || echo "Cleaned up successfully" ``` +## Service-Specific Verification + +After deploying your environment, you may want to verify that 
specific services are working correctly. The following guides provide detailed verification steps for each supported service: + +### MySQL Database + +If your deployment includes MySQL as the database backend, see the [MySQL Verification Guide](mysql-verification.md) for detailed steps to: + +- Verify MySQL container health and connectivity +- Check database tables and schema +- Validate tracker-to-MySQL connectivity +- Troubleshoot MySQL-specific issues +- Compare MySQL behavior vs SQLite + +### Prometheus Metrics Collection + +If your deployment includes Prometheus for metrics collection (enabled by default), see the [Prometheus Verification Guide](prometheus-verification.md) for detailed steps to: + +- Verify Prometheus container is running +- Check configuration file deployment +- Validate target scraping (both `/api/v1/stats` and `/api/v1/metrics`) +- Access Prometheus web UI +- Query collected metrics +- Troubleshoot Prometheus-specific issues + +### Basic Tracker Verification + +For basic tracker functionality without service-specific checks: + +```bash +# Get the VM IP +export INSTANCE_IP=$(cat data/manual-test/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') + +# Test HTTP tracker health endpoint +curl http://$INSTANCE_IP:7070/health_check + +# Test HTTP API health endpoint +curl http://$INSTANCE_IP:1212/api/health_check + +# Check tracker container logs +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust@$INSTANCE_IP \ + "docker logs tracker" +``` + ## Handling Interrupted Commands Commands can be interrupted (Ctrl+C) during execution, leaving the environment in an intermediate state. 
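The intermediate state is recorded as the top-level key of the environment's state file (`data/<env>/environment.json`), using the capitalized Rust enum variant name for the current lifecycle state (Created → Provisioned → Configured → Released → Running, plus Destroyed after cleanup). The lifecycle can be sketched as a plain Rust enum — an illustrative model only; the type and function names here are hypothetical, not the deployer's actual types:

```rust
// Illustrative sketch (NOT the deployer's real types): the lifecycle states
// that appear as the top-level key in data/<env>/environment.json.
#[derive(Debug, PartialEq)]
enum EnvironmentState {
    Created,
    Provisioned,
    Configured,
    Released,
    Running,
    Destroyed,
}

impl EnvironmentState {
    /// Parse the capitalized variant name found as the state file's top-level key.
    fn from_key(key: &str) -> Option<Self> {
        match key {
            "Created" => Some(Self::Created),
            "Provisioned" => Some(Self::Provisioned),
            "Configured" => Some(Self::Configured),
            "Released" => Some(Self::Released),
            "Running" => Some(Self::Running),
            "Destroyed" => Some(Self::Destroyed),
            _ => None, // unknown key: state file may be from a newer version
        }
    }
}

fn main() {
    // The key you would get from: cat data/manual-test/environment.json | jq 'keys | .[]'
    let state = EnvironmentState::from_key("Provisioned");
    assert_eq!(state, Some(EnvironmentState::Provisioned));
    println!("{state:?}"); // prints: Some(Provisioned)
}
```

In practice, `cat data/manual-test/environment.json | jq 'keys | .[]'` prints exactly one of these variant names, which tells you where in the workflow the interruption happened and which command to re-run.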
@@ -761,6 +806,117 @@ exit cargo run -- run manual-test ``` +## Debugging with Application Logs + +If you encounter any issues during the workflow, the application maintains detailed logs that can help diagnose problems: + +### Log Location + +All application execution logs are stored in: + +```bash +data/logs/log.txt +``` + +This file contains **all** operations performed by the deployer, including: + +- Command execution traces with timestamps +- State transitions (Created → Provisioned → Configured → Released → Running) +- Ansible playbook executions with full command details +- Template rendering operations +- Error messages with context +- Step-by-step progress through each command + +### Viewing Logs + +**View recent operations**: + +```bash +# Last 100 lines +tail -100 data/logs/log.txt + +# Follow logs in real-time +tail -f data/logs/log.txt +``` + +**Search for specific command**: + +```bash +# View all release command operations +grep -A5 -B5 'release' data/logs/log.txt + +# View provision operations +grep -A5 -B5 'provision' data/logs/log.txt + +# View Ansible playbook executions +grep 'ansible-playbook' data/logs/log.txt +``` + +**Check state transitions**: + +```bash +# View all state transitions for your environment +grep 'Environment state transition' data/logs/log.txt | grep manual-test +``` + +**Find errors**: + +```bash +# Search for ERROR level logs +grep 'ERROR' data/logs/log.txt + +# Search for WARN level logs +grep 'WARN' data/logs/log.txt +``` + +### Example: Debugging Release Command + +If the release command completes but files aren't on the VM: + +```bash +# Check what actually happened during release +grep -A10 'release_command' data/logs/log.txt | tail -50 + +# Verify Ansible playbooks were executed +grep 'deploy-tracker-config\|deploy-compose-files' data/logs/log.txt + +# Check for any Ansible errors +grep -A5 'Ansible playbook.*failed' data/logs/log.txt +``` + +### Log Format + +Logs are structured with: + +- **Timestamp**: ISO 8601 
format (e.g., `2025-12-14T11:52:16.232160Z`) +- **Level**: INFO, WARN, ERROR +- **Span**: Command and step context (e.g., `release_command:deploy_tracker_config`) +- **Module**: Rust module path +- **Message**: Human-readable description +- **Fields**: Structured data (environment name, step name, status, etc.) + +**Example log entry**: + +```text +2025-12-14T11:52:21.495109Z INFO release_command: torrust_tracker_deployer_lib::application::command_handlers::release::handler: +Tracker configuration deployed successfully command="release" step=Deploy Tracker Config to Remote command_type="release" +environment=manual-test +``` + +### Common Issues in Logs + +1. **"Ansible playbook failed"**: Check the Ansible command that was executed and verify SSH connectivity +2. **"Template rendering failed"**: Check template syntax and context data +3. **"State persistence failed"**: Check file permissions in `data/` directory +4. **"Instance IP not found"**: Environment wasn't provisioned correctly + +### Tips + +- The log file grows with each command execution +- Consider searching for your environment name to filter relevant logs +- Timestamps help correlate logs with command execution times +- All Ansible playbook commands are logged with full paths and arguments + ## Cleanup Procedures ### Application-Level Cleanup (Recommended) diff --git a/docs/e2e-testing/manual-testing-mysql.md b/docs/e2e-testing/manual/mysql-verification.md similarity index 54% rename from docs/e2e-testing/manual-testing-mysql.md rename to docs/e2e-testing/manual/mysql-verification.md index a2da125b..3e18cf37 100644 --- a/docs/e2e-testing/manual-testing-mysql.md +++ b/docs/e2e-testing/manual/mysql-verification.md @@ -1,6 +1,50 @@ -# Manual E2E Testing Guide for MySQL Support +# Manual MySQL Service Verification -This guide provides step-by-step instructions for manually testing the complete MySQL deployment workflow. +This guide provides MySQL-specific verification steps for manual E2E testing. 
For the complete deployment workflow, see the [Manual E2E Testing Guide](README.md). + +## Overview + +This guide covers: + +- MySQL container health and connectivity +- Database schema verification +- Tracker-to-MySQL connection validation +- MySQL-specific troubleshooting +- Performance comparison with SQLite + +## Prerequisites + +Complete the standard deployment workflow first (see [Manual E2E Testing Guide](README.md)): + +1. ✅ Environment created with MySQL configuration +2. ✅ Infrastructure provisioned +3. ✅ Services configured +4. ✅ Software released +5. ✅ Services running + +**Your environment configuration must include MySQL**: + +```json +{ + "tracker": { + "core": { + "database": { + "driver": "mysql", + "database_name": "torrust_tracker" + } + } + }, + "database": { + "driver": "mysql", + "host": "mysql", + "port": 3306, + "database_name": "torrust_tracker", + "username": "tracker_user", + "password": "tracker_password", + "root_password": "root_password" + } +} +``` ## ⚠️ CRITICAL: Understanding File Locations @@ -25,79 +69,212 @@ This guide provides step-by-step instructions for manually testing the complete **NEVER confuse these two files!** The user creates configurations in `envs/`, and the application manages state in `data/`. -## Test Environment +## MySQL-Specific Verification -- **Environment Name**: `manual-test-mysql` -- **Database**: MySQL 8.0 -- **Provider**: LXD -- **User Configuration File**: `envs/manual-test-mysql.json` -- **Internal State Directory**: `data/manual-test-mysql/` +This section provides detailed MySQL verification steps that should be performed after completing the standard deployment workflow. -## Prerequisites +### 1. Get the VM IP Address -Before starting the test, ensure: +Extract the instance IP from the environment state: -1. LXD is installed and configured -2. The `torrust-profile-manual-test-mysql` LXD profile exists (will be created automatically) -3. 
All dependencies are installed: `cargo run --bin dependency-installer install` -4. Pre-commit checks pass: `./scripts/pre-commit.sh` -5. The environment configuration exists: `envs/manual-test-mysql.json` - -## Complete MySQL Deployment Workflow +```bash +export INSTANCE_IP=$(cat data/your-env/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') +echo "VM IP: $INSTANCE_IP" +``` -### Step 1: Create Environment +### 2. Verify MySQL Container Health -Create the deployment environment from the MySQL configuration: +Check that the MySQL container is running and healthy: ```bash -cargo run -- create environment --env-file envs/manual-test-mysql.json +# Check both containers are running +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker ps --format 'table {{.Names}}\t{{.Status}}'" ``` -**Expected Output**: +**Expected output:** ```text -✓ Environment 'manual-test-mysql' created successfully +NAMES STATUS +tracker Up X seconds (healthy) +mysql Up X seconds (healthy) ``` -**Verification**: +**Key verification points:** + +- ✅ MySQL container status shows `(healthy)` +- ✅ Tracker container also shows `(healthy)` indicating it connected to MySQL successfully +- ✅ Both containers have been up for some time (not restarting) + +### 3. 
Verify MySQL Database and Schema + +Check that the database was created and tables exist: ```bash -# Check internal state file was created by the application -ls -la data/manual-test-mysql/environment.json +# List all databases +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker exec mysql mysql -u tracker_user -p'tracker_password' -e 'SHOW DATABASES;'" +``` + +**Expected databases:** -# Inspect internal state (Rust struct serialization) - shows current deployment state -cat data/manual-test-mysql/environment.json | jq '.state.type' -# Expected: "Created" (note: capitalized, this is the Rust enum variant name) +```text +Database +information_schema +performance_schema +torrust_tracker ← Your tracker database ``` -**Note**: The state file in `data/` is the application's internal representation. Do NOT edit it manually. +**Check tracker tables:** + +```bash +# List tables in tracker database +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker exec mysql mysql -u tracker_user -p'tracker_password' torrust_tracker -e 'SHOW TABLES;'" +``` -### Step 2: Provision Infrastructure +**Expected tables:** -Provision the LXD VM instance: +```text +Tables_in_torrust_tracker +keys +torrent_aggregate_metrics +torrents +whitelist +``` + +### 4. Verify Docker-to-MySQL Network Connectivity + +Test that the tracker container can reach MySQL over the Docker network: ```bash -cargo run -- provision manual-test-mysql +# Ping MySQL from tracker container +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker exec tracker ping -c 2 mysql" ``` -**Expected Output**: +**Expected output:** ```text -⏳ [1/3] Validating environment... - ✓ Environment name validated: manual-test-mysql (took 0ms) -⏳ [2/3] Creating command handler... - ✓ Done (took 0ms) -⏳ [3/3] Provisioning infrastructure... 
- ✓ Infrastructure provisioned (took 28.4s) -✅ Environment 'manual-test-mysql' provisioned successfully +PING mysql (172.18.0.2): 56 data bytes +64 bytes from 172.18.0.2: seq=0 ttl=64 time=0.052 ms +64 bytes from 172.18.0.2: seq=1 ttl=64 time=0.081 ms +2 packets transmitted, 2 packets received, 0% packet loss ``` -**Verification**: +### 5. Verify Tracker Configuration + +Check that the tracker is configured to use MySQL: ```bash -# Check instance is running -lxc list | grep torrust-tracker-manual-test-mysql +# Check tracker configuration +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker exec tracker cat /etc/torrust/tracker/tracker.toml | grep -A 5 '\[core.database\]'" +``` + +**Expected output:** + +```toml +[core.database] +driver = "mysql" +path = "mysql://tracker_user:tracker_password@mysql:3306/torrust_tracker" +``` + +**Key verification points:** + +- ✅ `driver = "mysql"` (not "sqlite3") +- ✅ Connection string uses MySQL format +- ✅ Hostname is `mysql` (Docker service name) +- ✅ Port is `3306` (MySQL default) +- ✅ Database name matches configuration + +### 6. Verify Tracker Startup (No Connection Errors) + +**IMPORTANT**: The docker-compose template includes `depends_on` with `condition: service_healthy` for the tracker service. This ensures the tracker waits for MySQL to be healthy before starting. + +Check tracker logs for clean startup: + +```bash +# Check for database connection errors +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker logs tracker 2>&1 | grep -i 'database\|mysql\|error' | head -20" +``` +**What to look for:** + +- ✅ **GOOD**: Clean startup with no "Connection refused" errors +- ✅ **GOOD**: Configuration shows `"driver": "mysql"` +- ❌ **BAD**: "Could not connect to address `mysql:3306': Connection refused" + +**Note**: With proper `depends_on` configuration, you should NOT see connection refused errors. 
The tracker waits for MySQL's healthcheck to pass before starting. + +### 7. Verify Environment Variables + +Check that MySQL credentials are properly configured in the environment file: + +```bash +# Check .env file contains MySQL variables +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "cat /opt/torrust/.env | grep MYSQL" +``` + +**Expected variables:** + +```env +MYSQL_ROOT_PASSWORD=root_password +MYSQL_DATABASE=torrust_tracker +MYSQL_USER=tracker_user +MYSQL_PASSWORD=tracker_password +``` + +**Verify docker-compose.yml references:** + +```bash +# Check docker-compose.yml uses environment variables +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "cat /opt/torrust/docker-compose.yml | grep -A 10 'mysql:'" +``` + +Should show environment variable references like `${MYSQL_ROOT_PASSWORD}`, not hardcoded values. + +### 8. Test Tracker Functionality with MySQL + +Make an announce request and verify stats are collected: + +```bash +# Get initial stats +INITIAL_STATS=$(curl -s -H "Authorization: Bearer MyAccessToken" \ + http://$INSTANCE_IP:1212/api/v1/stats) +echo "Initial stats: $INITIAL_STATS" + +# Make an announce request (from outside VM - more realistic) +curl -H "X-Forwarded-For: 203.0.113.45" \ + "http://$INSTANCE_IP:7070/announce?info_hash=%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C%3C&peer_id=-qB00000000000000001&port=17548&uploaded=0&downloaded=0&left=0&event=started" + +# Get updated stats +UPDATED_STATS=$(curl -s -H "Authorization: Bearer MyAccessToken" \ + http://$INSTANCE_IP:1212/api/v1/stats) +echo "Updated stats: $UPDATED_STATS" +``` + +**Expected behavior:** + +- ✅ Announce request returns HTTP 200 with tracker response +- ✅ Stats counters increment (e.g., `tcp4_announces_handled`, `tcp4_connections_handled`) +- ✅ MySQL connection remains stable (no errors in tracker logs) + +**Note**: The tracker uses in-memory storage for active torrents by default. 
Torrents are only persisted to MySQL in specific cases: + +- In private mode: all torrents are persisted +- In public mode: only whitelisted torrents are persisted + +The key verification is that MySQL is accessible and the tracker functions correctly. + +## MySQL-Specific Troubleshooting + +### Common Verification Commands + +```bash # Check internal state transitioned to Provisioned cat data/manual-test-mysql/environment.json | jq 'keys | .[]' # Expected: "Provisioned" (this is the top-level key - Rust enum variant) @@ -143,7 +320,7 @@ ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "docker ps" # Expected: Empty list (no containers running yet) ``` -### Step 4: Release Application +## Step 4: Release Application Deploy tracker configuration and Docker Compose files: @@ -394,238 +571,185 @@ cat data/manual-test-mysql/environment.json | jq 'keys | .[]' # Expected: "Destroyed" (top-level key) ``` -## Key Verification Points - -### MySQL Configuration in Templates - -**1. Tracker Config (`tracker.toml`)**: - -- Driver should be `"mysql"` -- Path should be MySQL connection string format: `mysql://user:pass@host:port/database` - -**2. Docker Compose `.env` file**: - -- Should contain `MYSQL_ROOT_PASSWORD`, `MYSQL_DATABASE`, `MYSQL_USER`, `MYSQL_PASSWORD` -- Values should match environment configuration - -**3. Docker Compose `docker-compose.yml`**: - -- MySQL service should use `${MYSQL_*}` environment variable references -- NOT hardcoded values +## MySQL-Specific Troubleshooting -### Runtime Verification +This section covers common MySQL-specific issues. For general troubleshooting, see the [Manual E2E Testing Guide](README.md#troubleshooting-manual-tests). -**1. Containers**: +### MySQL Container Not Healthy -- Both `torrust-tracker` and `mysql` containers should be running -- MySQL container should show `(healthy)` status - -**2. Database**: - -- MySQL database tables should be created -- Tracker should be able to read/write to database - -**3. 
API**: - -- Tracker API should respond on port 1212 -- Stats endpoint should return valid JSON - -## Troubleshooting - -### MySQL container not healthy +If the MySQL container fails to start or shows unhealthy status: ```bash -# Check MySQL container logs -INSTANCE_IP=$(cat data/manual-test-mysql/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') -ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP "docker logs mysql" +# Check MySQL container logs for errors +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker logs mysql 2>&1 | tail -50" -# Verify MySQL service status +# Check container status ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ "docker ps --filter 'name=mysql' --format '{{.Names}}\t{{.Status}}'" ``` -### Tracker shows database connection errors +**Common issues:** -**Note**: You may see errors like "unable to open database file: mysql://..." in the tracker logs. This is a known issue being investigated. The tracker may still function correctly despite these errors. 
+- **Port 3306 already in use**: Another MySQL instance running on host +- **Permission denied**: Volume mount permissions incorrect +- **Initialization failed**: Database name or credentials invalid + +### Tracker Connection Refused Errors + +If you see "Connection refused" errors when tracker tries to connect to MySQL: ```bash -# Check tracker logs -INSTANCE_IP=$(cat data/manual-test-mysql/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') +# Check if MySQL healthcheck is properly configured ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ - "docker logs tracker 2>&1 | tail -50" + "cat /opt/torrust/docker-compose.yml | grep -A 10 'mysql:' | grep healthcheck -A 5" +``` -# Verify tracker configuration inside container -ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ - "docker exec tracker cat /etc/torrust/tracker/tracker.toml | grep -A 3 'database'" +**Expected healthcheck configuration:** -# Check if MySQL database is accessible -ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ - "docker exec mysql mysql -u tracker_user -p'tracker_password' torrust_tracker -e 'SELECT 1;'" +```yaml +healthcheck: + test: + [ + "CMD", + "mysqladmin", + "ping", + "-h", + "localhost", + "-u", + "root", + "-p$$MYSQL_ROOT_PASSWORD", + ] + interval: 10s + timeout: 5s + retries: 5 ``` -### Environment variables not applied +**Verify tracker depends_on configuration:** ```bash -# Verify .env file exists and has MySQL variables -INSTANCE_IP=$(cat data/manual-test-mysql/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') -ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP "cat /opt/torrust/.env" - -# Check docker-compose.yml references variables correctly ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ - "cat /opt/torrust/docker-compose.yml | grep -A 15 'mysql:'" + "cat /opt/torrust/docker-compose.yml | 
grep -A 5 'tracker:' | grep depends_on -A 3" ``` -## Debugging with Application Logs - -If you encounter any issues during the workflow, the application maintains detailed logs that can help diagnose problems: - -### Log Location +**Expected tracker depends_on:** -All application execution logs are stored in: - -```bash -data/logs/log.txt +```yaml +depends_on: + mysql: + condition: service_healthy ``` -This file contains **all** operations performed by the deployer, including: - -- Command execution traces with timestamps -- State transitions (Created → Provisioned → Configured → Released → Running) -- Ansible playbook executions with full command details -- Template rendering operations -- Error messages with context -- Step-by-step progress through each command +### Database Connection Errors in Tracker Logs -### Viewing Logs +**Note**: You may see errors like "unable to open database file: mysql://..." in the tracker logs. This is a known issue being investigated. The tracker may still function correctly despite these errors. 
-**View recent operations**: +Check tracker logs for MySQL connection issues: ```bash -# Last 100 lines -tail -100 data/logs/log.txt - -# Follow logs in real-time -tail -f data/logs/log.txt +# Filter tracker logs for database/MySQL errors +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker logs tracker 2>&1 | grep -i 'database\|mysql\|r2d2\|connection' | tail -30" ``` -**Search for specific command**: +**Verify tracker can connect to MySQL:** ```bash -# View all release command operations -grep -A5 -B5 'release' data/logs/log.txt - -# View provision operations -grep -A5 -B5 'provision' data/logs/log.txt +# Test MySQL connection from inside tracker container +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker exec tracker sh -c 'nc -zv mysql 3306'" -# View Ansible playbook executions -grep 'ansible-playbook' data/logs/log.txt +# Expected output: +# mysql (172.18.0.2:3306) open ``` -**Check state transitions**: +### Environment Variables Not Applied -```bash -# View all state transitions for your environment -grep 'Environment state transition' data/logs/log.txt | grep manual-test-mysql -``` - -**Find errors**: +If MySQL credentials don't match configuration: ```bash -# Search for ERROR level logs -grep 'ERROR' data/logs/log.txt +# Check .env file contains correct MySQL variables +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "cat /opt/torrust/.env | grep MYSQL" -# Search for WARN level logs -grep 'WARN' data/logs/log.txt +# Verify docker-compose.yml references variables (not hardcoded values) +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "cat /opt/torrust/docker-compose.yml | grep -A 15 'mysql:' | grep environment -A 5" ``` -### Example: Debugging Release Command - -If the release command completes but files aren't on the VM: - -```bash -# Check what actually happened during release -grep -A10 'release_command' 
data/logs/log.txt | tail -50 - -# Verify Ansible playbooks were executed -grep 'deploy-tracker-config\|deploy-compose-files' data/logs/log.txt +**Expected in docker-compose.yml:** -# Check for any Ansible errors -grep -A5 'Ansible playbook.*failed' data/logs/log.txt +```yaml +environment: + - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} + - MYSQL_DATABASE=${MYSQL_DATABASE} + - MYSQL_USER=${MYSQL_USER} + - MYSQL_PASSWORD=${MYSQL_PASSWORD} ``` -### Log Format +**NOT like this (hardcoded):** -Logs are structured with: +```yaml +environment: + - MYSQL_ROOT_PASSWORD=hardcoded_password # ❌ WRONG +``` -- **Timestamp**: ISO 8601 format (e.g., `2025-12-14T11:52:16.232160Z`) -- **Level**: INFO, WARN, ERROR -- **Span**: Command and step context (e.g., `release_command:deploy_tracker_config`) -- **Module**: Rust module path -- **Message**: Human-readable description -- **Fields**: Structured data (environment name, step name, status, etc.) +### Tables Not Created -**Example log entry**: +If tracker tables don't exist in MySQL: -```text -2025-12-14T11:52:21.495109Z INFO release_command: torrust_tracker_deployer_lib::application::command_handlers::release::handler: -Tracker configuration deployed successfully command="release" step=Deploy Tracker Config to Remote command_type="release" -environment=manual-test-mysql +```bash +# Check if tracker has created tables +ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@$INSTANCE_IP \ + "docker exec mysql mysql -u tracker_user -p'tracker_password' torrust_tracker -e 'SHOW TABLES;'" ``` -### Common Issues in Logs +**If tables are missing:** -1. **"Ansible playbook failed"**: Check the Ansible command that was executed and verify SSH connectivity -2. **"Template rendering failed"**: Check template syntax and context data -3. **"State persistence failed"**: Check file permissions in `data/` directory -4. **"Instance IP not found"**: Environment wasn't provisioned correctly +1. Check tracker logs for migration errors +2. 
Verify tracker has correct database permissions +3. Ensure tracker started successfully after MySQL was healthy -### Tips +## Comparison: MySQL vs SQLite -- The log file grows with each command execution -- Consider searching for your environment name to filter relevant logs -- Timestamps help correlate logs with command execution times -- All Ansible playbook commands are logged with full paths and arguments +### Performance Characteristics -## Success Criteria +**SQLite:** -The MySQL implementation is successful if: +- ✅ Simpler setup (no separate container) +- ✅ Faster for small-scale deployments +- ✅ Lower memory footprint +- ❌ Limited concurrency +- ❌ Single file - no network access -1. ✅ All commands complete without errors -2. ✅ Tracker config contains MySQL connection string (not SQLite path) -3. ✅ `.env` file contains all MySQL credentials AND standardized `TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__DRIVER` variable -4. ✅ `docker-compose.yml` uses `${MYSQL_*}` and `${TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__DRIVER}` references (not hardcoded) -5. ✅ Both containers (tracker + MySQL) start and run healthy -6. ✅ Tracker API responds with valid JSON -7. ✅ MySQL database tables are created -8. ✅ No connection errors in tracker logs -9. 
✅ Application logs in `data/logs/log.txt` show successful state transitions +**MySQL:** -## Comparison with SQLite +- ✅ Better concurrency for multiple clients +- ✅ Network accessible (can query from other services) +- ✅ Better for high-traffic deployments +- ✅ Advanced features (replication, clustering) +- ❌ More complex setup (requires container/service) +- ❌ Higher memory usage -For comparison, test the same workflow with SQLite configuration: +### When to Use MySQL -```bash -# Use existing SQLite config -cargo run -- create environment --env-file data/e2e-deployment/environment.json -cargo run -- provision e2e-deployment -cargo run -- configure e2e-deployment -cargo run -- release e2e-deployment -cargo run -- run e2e-deployment -cargo run -- test e2e-deployment -cargo run -- destroy e2e-deployment -``` +Choose MySQL when: + +- **High concurrency**: Multiple clients accessing tracker simultaneously +- **Network access**: Need to query database from external tools/services +- **Production deployments**: Long-term stable deployments with scaling needs +- **Replication needs**: Want database backup/replication features -**Key Differences**: +Choose SQLite when: -- SQLite: `driver = "sqlite3"`, `path = "/var/lib/torrust/tracker/database/sqlite3.db"` -- MySQL: `driver = "mysql"`, `path = "mysql://user:pass@host:port/database"` -- SQLite: No MySQL service in docker-compose -- MySQL: MySQL service with healthcheck in docker-compose +- **Development/testing**: Quick local testing +- **Low traffic**: Personal or small-scale deployments +- **Simplicity**: Prefer simpler setup without database container +- **Single-instance**: No need for network database access ## Related Documentation -- [Environment Configuration](../user-guide/configuration/environment.md) -- [Release Command](../user-guide/commands/release.md) -- [Run Command](../user-guide/commands/run.md) -- [ADR: Database Configuration Structure in 
Templates](../decisions/database-configuration-structure-in-templates.md) -- [ADR: Environment Variable Injection in Docker Compose](../decisions/environment-variable-injection-in-docker-compose.md) +- [Manual E2E Testing Guide](README.md) - Complete deployment workflow +- [Prometheus Verification Guide](prometheus-verification.md) - Metrics collection verification +- [MySQL Configuration Schema](../../user-guide/README.md) - Configuration file format +- [Troubleshooting Guide](../README.md) - General troubleshooting tips diff --git a/docs/e2e-testing/manual/prometheus-verification.md b/docs/e2e-testing/manual/prometheus-verification.md new file mode 100644 index 00000000..7a6a722b --- /dev/null +++ b/docs/e2e-testing/manual/prometheus-verification.md @@ -0,0 +1,517 @@ +# Manual Prometheus Service Verification + +This guide provides step-by-step instructions for manually verifying that the Prometheus metrics collection service is correctly deployed, configured, and collecting metrics from the Torrust Tracker. + +## Prerequisites + +- A deployed environment with Prometheus enabled in the configuration +- SSH access to the target instance +- The tracker service must be running +- Basic knowledge of Docker and Prometheus + +## Environment Setup + +This guide assumes you have completed the full deployment workflow: + +```bash +# 1. Create environment with Prometheus enabled +cargo run -- create environment --env-file envs/your-config.json + +# 2. Provision infrastructure +cargo run -- provision your-env + +# 3. Configure services +cargo run -- configure your-env + +# 4. Release software +cargo run -- release your-env + +# 5. Run services +cargo run -- run your-env +``` + +Your environment configuration should include the `prometheus` section: + +```json +{ + "environment": { "name": "your-env" }, + "tracker": { ... 
},
+  "prometheus": {
+    "scrape_interval": 15
+  }
+}
+```
+
+## Getting the VM IP Address
+
+First, get the IP address of your deployed VM:
+
+### For LXD VMs
+
+```bash
+# List all LXD instances
+lxc list
+
+# Find your instance (e.g., torrust-tracker-vm-your-env)
+# Look for the IP address in the enp5s0 interface column
+```
+
+Example output:
+
+```text
+| torrust-tracker-vm-your-env | RUNNING | 10.140.190.249 (enp5s0) | ... | VIRTUAL-MACHINE |
+```
+
+The VM IP in this example is `10.140.190.249`.
+
+## Verification Steps
+
+### 1. Verify Prometheus Container is Running
+
+SSH into the VM and check that the Prometheus container is running:
+
+```bash
+# SSH into the VM
+ssh -i fixtures/testing_rsa -o StrictHostKeyChecking=no torrust@<vm-ip>
+
+# Check running containers
+docker ps
+```
+
+**Expected output:**
+
+You should see two containers running:
+
+```text
+CONTAINER ID   IMAGE                     COMMAND                  STATUS
+b2d988505fae   prom/prometheus:v3.0.1    "/bin/prometheus --c…"   Up 2 minutes
+f0e3124878de   torrust/tracker:develop   "/usr/local/bin/entr…"   Up 2 minutes (healthy)
+```
+
+**Key verification points:**
+
+- ✅ `prom/prometheus:v3.0.1` container is present
+- ✅ Container status shows "Up" (not "Restarting" or "Exited")
+- ✅ Port 9090 is exposed (`0.0.0.0:9090->9090/tcp`)
+
+### 2.
Verify Prometheus Configuration File
+
+Check that the Prometheus configuration file was deployed correctly:
+
+```bash
+# Check file exists and has correct permissions
+ls -la /opt/torrust/storage/prometheus/etc/prometheus.yml
+
+# View the configuration
+cat /opt/torrust/storage/prometheus/etc/prometheus.yml
+```
+
+**Expected output:**
+
+```yaml
+# Prometheus Configuration for Torrust Tracker Metrics Collection
+
+global:
+  scrape_interval: 15s # How often to scrape metrics from targets
+
+scrape_configs:
+  # Tracker Statistics - Aggregate metrics about tracker state
+  - job_name: "tracker_stats"
+    metrics_path: "/api/v1/stats"
+    params:
+      token: ["<admin-token>"]
+      format: ["prometheus"]
+    static_configs:
+      - targets: ["tracker:1212"]
+
+  # Tracker Metrics - Detailed operational metrics
+  - job_name: "tracker_metrics"
+    metrics_path: "/api/v1/metrics"
+    params:
+      token: ["<admin-token>"]
+      format: ["prometheus"]
+    static_configs:
+      - targets: ["tracker:1212"]
+```
+
+**Key verification points:**
+
+- ✅ File exists at the correct path
+- ✅ File is readable (permissions: `0644`)
+- ✅ `scrape_interval` matches your configuration (e.g., `15s`)
+- ✅ Admin token matches your tracker configuration
+- ✅ Port matches your tracker HTTP API port (default: `1212`)
+- ✅ Both `tracker_stats` and `tracker_metrics` jobs are configured
+
+### 3.
Verify Prometheus Targets are Up
+
+Check that Prometheus is successfully scraping both tracker endpoints:
+
+```bash
+# From your local machine (not inside the VM)
+curl -s http://<vm-ip>:9090/api/v1/targets | python3 -m json.tool
+```
+
+**Expected output:**
+
+Look for the `activeTargets` array containing both jobs with `"health": "up"`:
+
+```json
+{
+  "status": "success",
+  "data": {
+    "activeTargets": [
+      {
+        "labels": {
+          "instance": "tracker:1212",
+          "job": "tracker_metrics"
+        },
+        "scrapeUrl": "http://tracker:1212/api/v1/metrics?format=prometheus&token=...",
+        "lastError": "",
+        "health": "up",
+        "scrapeInterval": "15s"
+      },
+      {
+        "labels": {
+          "instance": "tracker:1212",
+          "job": "tracker_stats"
+        },
+        "scrapeUrl": "http://tracker:1212/api/v1/stats?format=prometheus&token=...",
+        "lastError": "",
+        "health": "up",
+        "scrapeInterval": "15s"
+      }
+    ]
+  }
+}
+```
+
+**Key verification points:**
+
+- ✅ Both `tracker_metrics` and `tracker_stats` jobs are present
+- ✅ `health` field shows `"up"` for both targets
+- ✅ `lastError` field is empty (`""`)
+- ✅ `scrapeInterval` matches your configuration
+- ✅ `lastScrape` timestamp is recent (within the last minute)
+
+**If targets are down:**
+
+Check the `lastError` field for error messages:
+
+- **Connection refused**: Tracker container might not be running or healthy
+- **Authentication error**: Admin token mismatch between config files
+- **Timeout**: Network connectivity issues or tracker overloaded
+
+### 4.
Verify Tracker Endpoints Directly
+
+Test the tracker metrics endpoints directly to ensure they're accessible:
+
+```bash
+# Test the stats endpoint
+curl -s "http://<vm-ip>:1212/api/v1/stats?token=<admin-token>&format=prometheus"
+
+# Test the metrics endpoint
+curl -s "http://<vm-ip>:1212/api/v1/metrics?token=<admin-token>&format=prometheus"
+```
+
+**Expected output (stats endpoint):**
+
+```text
+torrents 0
+seeders 0
+completed 0
+leechers 0
+tcp4_connections_handled 0
+tcp4_announces_handled 0
+tcp4_scrapes_handled 0
+udp_requests_aborted 0
+udp4_requests 18
+udp4_connections_handled 18
+...
+```
+
+**Expected output (metrics endpoint):**
+
+```text
+# HELP torrust_tracker_announce_requests_total Total number of announce requests
+# TYPE torrust_tracker_announce_requests_total counter
+torrust_tracker_announce_requests_total 0
+
+# HELP torrust_tracker_torrents_total Total number of torrents tracked
+# TYPE torrust_tracker_torrents_total gauge
+torrust_tracker_torrents_total 0
+...
+```
+
+**Key verification points:**
+
+- ✅ Both endpoints return metrics data (not authentication errors)
+- ✅ Response is in Prometheus text format
+- ✅ Metrics contain tracker-specific data (torrents, peers, etc.)
+
+### 5. Verify Prometheus UI is Accessible
+
+Access the Prometheus web UI to verify it's working:
+
+```bash
+# Test that Prometheus UI is accessible
+curl -s http://<vm-ip>:9090 | head -5
+```
+
+**Expected output:**
+
+```html
+Found.
+```
+
+**Alternative verification:**
+
+Open a web browser and navigate to:
+
+```text
+http://<vm-ip>:9090
+```
+
+You should see the Prometheus UI with:
+
+- ✅ Query interface at the top
+- ✅ Navigation menu (Alerts, Graph, Status, Help)
+- ✅ No error messages
+
+**Try a sample query:**
+
+1. Navigate to `http://<vm-ip>:9090/graph`
+2. In the query box, enter: `torrust_tracker_torrents_total`
+3. Click "Execute"
+4. Switch to "Graph" tab
+
+You should see a graph (even if it's flatlined at 0 if no torrents are tracked yet).
+
+### 6.
Verify Metrics are Being Collected
+
+Query Prometheus to ensure it's storing metrics:
+
+```bash
+# Query for a specific metric
+curl -s "http://<vm-ip>:9090/api/v1/query?query=up" | python3 -m json.tool
+```
+
+**Expected output:**
+
+```json
+{
+  "status": "success",
+  "data": {
+    "resultType": "vector",
+    "result": [
+      {
+        "metric": {
+          "job": "tracker_metrics",
+          "instance": "tracker:1212"
+        },
+        "value": [1734285600, "1"]
+      },
+      {
+        "metric": {
+          "job": "tracker_stats",
+          "instance": "tracker:1212"
+        },
+        "value": [1734285600, "1"]
+      }
+    ]
+  }
+}
+```
+
+**Key verification points:**
+
+- ✅ Query returns successfully (`"status": "success"`)
+- ✅ Both targets show `"value": [..., "1"]` (indicating they're up)
+- ✅ Timestamp is recent
+
+### 7. Verify Data Over Time
+
+Wait a few minutes (at least 2-3 scrape intervals) and check that data is accumulating:
+
+```bash
+# Query for metrics over the last 5 minutes
+curl -s "http://<vm-ip>:9090/api/v1/query_range?query=up&start=$(date -u -d '5 minutes ago' +%s)&end=$(date -u +%s)&step=15s" | python3 -m json.tool
+```
+
+**Key verification points:**
+
+- ✅ Multiple data points are returned (not just one)
+- ✅ Data points are spaced according to `scrape_interval` (e.g., 15s apart)
+- ✅ No gaps in the time series
+
+## Common Issues and Troubleshooting
+
+### Issue: Prometheus Container Not Running
+
+**Symptoms:**
+
+- `docker ps` doesn't show prometheus container
+- Or shows container with "Restarting" or "Exited" status
+
+**Diagnosis:**
+
+```bash
+# Check container logs
+docker logs prometheus
+
+# Check if container exists but stopped
+docker ps -a | grep prometheus
+```
+
+**Common causes:**
+
+1. **Configuration file syntax error**
+
+   - Fix: Check prometheus.yml for YAML syntax errors
+   - Validate: `docker run --rm --entrypoint promtool -v /opt/torrust/storage/prometheus/etc:/etc/prometheus prom/prometheus:v3.0.1 check config /etc/prometheus/prometheus.yml` (the image's default entrypoint is `/bin/prometheus`, so `promtool` must be set via `--entrypoint`)
+
+2.
**Port 9090 already in use**
+
+   - Check: `ss -tulpn | grep 9090`
+   - Fix: Stop conflicting service or change Prometheus port
+
+3. **Volume mount issues**
+   - Fix: Verify `/opt/torrust/storage/prometheus/etc` exists and contains `prometheus.yml`
+
+### Issue: Targets Showing as "Down"
+
+**Symptoms:**
+
+- Prometheus UI shows targets with red "DOWN" status
+- `/api/v1/targets` shows `"health": "down"`
+
+**Diagnosis:**
+
+```bash
+# Check tracker container is running and healthy
+docker ps
+
+# Test tracker endpoints manually (quote the URL so the shell does not treat & as a job separator)
+curl "http://tracker:1212/api/v1/stats?token=<admin-token>&format=prometheus"
+
+# Check Prometheus logs for scrape errors
+docker logs prometheus | grep -i error
+```
+
+**Common causes:**
+
+1. **Tracker container not running**
+
+   - Fix: Check tracker container status with `docker ps`
+   - Check logs: `docker logs tracker`
+
+2. **Authentication token mismatch**
+
+   - Verify: Token in `prometheus.yml` matches tracker's `admin_token`
+   - Fix: Correct the token and restart Prometheus
+
+3. **Network issues**
+   - Verify: Containers are on the same Docker network
+   - Check: `docker network inspect <network-name>`
+
+### Issue: No Metrics Data
+
+**Symptoms:**
+
+- Prometheus UI shows empty graphs
+- Queries return no data
+
+**Possible causes:**
+
+1. **Prometheus just started**
+
+   - Wait for at least 1-2 scrape intervals
+   - Check: `/api/v1/targets` shows `lastScrape` timestamp
+
+2. **Query syntax error**
+
+   - Verify metric names exist: `curl http://<vm-ip>:9090/api/v1/label/__name__/values`
+   - Use Prometheus UI's autocomplete feature
+
+3.
**Time range issue**
+   - Ensure you're querying the correct time range
+   - Try: "Last 5 minutes" in the UI
+
+## Advanced Verification
+
+### Check Prometheus Configuration
+
+Verify Prometheus is using the correct configuration:
+
+```bash
+# Inside the VM
+docker exec prometheus cat /etc/prometheus/prometheus.yml
+```
+
+### Check Prometheus Storage
+
+Verify Prometheus is persisting data:
+
+```bash
+# Inside the VM
+docker exec prometheus ls -la /prometheus
+```
+
+### Monitor Scrape Duration
+
+Check how long scrapes are taking:
+
+```bash
+curl -s "http://<vm-ip>:9090/api/v1/query?query=scrape_duration_seconds" | python3 -m json.tool
+```
+
+Scrape duration should be well under the scrape interval (e.g., < 1s for a 15s interval).
+
+### Verify Prometheus Version
+
+Confirm the correct Prometheus version is running:
+
+```bash
+curl -s http://<vm-ip>:9090/api/v1/status/buildinfo | python3 -m json.tool
+```
+
+Expected output includes:
+
+```json
+{
+  "data": {
+    "version": "3.0.1",
+    ...
+  }
+}
+```
+
+## Success Criteria
+
+Your Prometheus deployment is successful if:
+
+- ✅ Prometheus container is running and stable
+- ✅ Configuration file is correctly deployed
+- ✅ Both tracker endpoints (stats and metrics) show `"health": "up"`
+- ✅ Metrics are being collected and stored
+- ✅ Prometheus UI is accessible
+- ✅ Queries return expected data
+- ✅ No errors in Prometheus or tracker logs
+- ✅ Data accumulates over time
+
+## Next Steps
+
+Once Prometheus is verified:
+
+1. **Add more scrape targets** - Configure additional services to monitor
+2. **Set up alerts** - Define alerting rules for important metrics
+3. **Connect Grafana** - Visualize metrics with dashboards
+4. **Tune scrape intervals** - Adjust based on your monitoring needs
+5.
**Review retention** - Configure how long to keep metrics data + +## Related Documentation + +- [Prometheus Official Documentation](https://prometheus.io/docs/) +- [Torrust Tracker Metrics Documentation](https://github.com/torrust/torrust-tracker) +- [Main E2E Testing Guide](../manual-testing.md) +- [Prometheus Configuration Template](../../../templates/prometheus/prometheus.yml.tera) diff --git a/docs/issues/238-prometheus-slice-release-run-commands.md b/docs/issues/238-prometheus-slice-release-run-commands.md index 2ea946da..8a95638f 100644 --- a/docs/issues/238-prometheus-slice-release-run-commands.md +++ b/docs/issues/238-prometheus-slice-release-run-commands.md @@ -9,45 +9,211 @@ This task adds Prometheus as a metrics collection service for the Torrust Tracke ## Goals -- [ ] Add Prometheus service conditionally to docker-compose stack (only when present in environment config) -- [ ] Create Prometheus configuration template with tracker metrics endpoints -- [ ] Extend environment configuration schema to include Prometheus monitoring section -- [ ] Configure service dependency - Prometheus depends on tracker service -- [ ] Include Prometheus in generated environment templates by default (enabled by default) -- [ ] Allow users to disable Prometheus by removing its configuration section -- [ ] Deploy and verify Prometheus collects metrics from tracker +- ✅ Add Prometheus service conditionally to docker-compose stack (only when present in environment config) +- ✅ Create Prometheus configuration template with tracker metrics endpoints +- ✅ Extend environment configuration schema to include Prometheus monitoring section +- ✅ Configure service dependency - Prometheus depends on tracker service +- ✅ Include Prometheus in generated environment templates by default (enabled by default) +- ✅ Allow users to disable Prometheus by removing its configuration section +- ✅ Deploy and verify Prometheus collects metrics from tracker + +## Progress + +- ✅ **Phase 1**: Template 
Structure & Data Flow Design (commit: 2ca0fa9) + + - Created `PrometheusContext` struct with `scrape_interval`, `api_token`, `api_port` fields + - Implemented module structure following existing patterns + - Added comprehensive unit tests (5 tests) + - Created `templates/prometheus/prometheus.yml.tera` template + +- ✅ **Phase 2**: Environment Configuration (commit: 92aab59) + + - Created `PrometheusConfig` domain struct in `src/domain/prometheus/` + - Added optional `prometheus` field to `UserInputs` (enabled by default) + - Implemented comprehensive unit tests (5 tests) + - Updated all constructors and test fixtures + +- ✅ **Phase 3**: Prometheus Template Renderer (commit: 731eaf4) + + - Created `PrometheusConfigRenderer` to load and render `prometheus.yml.tera` + - Implemented `PrometheusTemplate` wrapper for Tera integration + - Created `PrometheusProjectGenerator` to orchestrate rendering workflow + - Implemented context extraction from `PrometheusConfig` and `TrackerConfig` + - Added 12 comprehensive unit tests with full coverage + - All linters passing + +- ✅ **Phase 4**: Docker Compose Integration (commit: 22790de) + + - Added `prometheus_config: Option` field to `DockerComposeContext` + - Implemented `with_prometheus()` method for context builder pattern + - Added conditional Prometheus service to `docker-compose.yml.tera` template + - Prometheus service uses bind mount: `./storage/prometheus/etc:/etc/prometheus:Z` + - Added 4 comprehensive unit tests for Prometheus service rendering + - All linters passing + +- ✅ **Phase 5**: Release Command Integration (commit: f20d45c) + + - **FIXED**: Moved Prometheus template rendering from docker-compose step to independent step in release handler + - Created `RenderPrometheusTemplatesStep` to render Prometheus templates + - Added `render_prometheus_templates()` method to `ReleaseCommandHandler` + - Prometheus templates now rendered independently at Step 5 (after tracker templates, before docker-compose) + - Docker 
Compose step only adds Prometheus config to context (no template rendering)
+  - Added `RenderPrometheusTemplates` variant to `ReleaseStep` enum
+  - Extended `EnvironmentTestBuilder` with `with_prometheus_config()` method
+  - All linters passing, all tests passing (1507 tests)
+  - **Architectural Principle**: Each service renders its templates independently in the release handler
+
+- ✅ **Phase 6**: Ansible Deployment (commit: 9c1b91a)
+
+  - Created Ansible playbooks:
+    - `templates/ansible/create-prometheus-storage.yml` - Creates `/opt/torrust/storage/prometheus/etc` directory
+    - `templates/ansible/deploy-prometheus-config.yml` - Deploys `prometheus.yml` configuration file with verification
+  - Created Rust application steps:
+    - `CreatePrometheusStorageStep` - Executes create-prometheus-storage playbook
+    - `DeployPrometheusConfigStep` - Executes deploy-prometheus-config playbook
+  - Registered playbooks in `AnsibleProjectGenerator` (16 total playbooks)
+  - Registered steps in `application/steps/application/mod.rs`
+  - Updated release handler with new methods:
+    - `create_prometheus_storage()` - Creates Prometheus storage directories (Step 5)
+    - `deploy_prometheus_config_to_remote()` - Deploys Prometheus config (Step 7)
+  - Added new `ReleaseStep` enum variants:
+    - `CreatePrometheusStorage`
+    - `DeployPrometheusConfigToRemote`
+  - Added error handling:
+    - `PrometheusStorageCreation` error variant with help text
+    - Proper trace formatting and error classification
+  - Updated workflow to 9 steps total:
+    - Step 5: Create Prometheus storage (if enabled)
+    - Step 6: Render Prometheus templates (if enabled)
+    - Step 7: Deploy Prometheus config (if enabled)
+    - Step 8: Render Docker Compose templates
+    - Step 9: Deploy compose files
+  - All linters passing, all tests passing (1507 tests)
+  - **Pattern**: Independent Prometheus deployment following tracker pattern
+
+- ✅ **Phase 7**: Testing & Verification
(commit: a257fcf) + + - Refactored validation with `ServiceValidation` struct for extensibility + - Replaces boolean parameter with flags struct for future services (Grafana, etc.) + - Supports selective validation based on enabled services + - Created `PrometheusConfigValidator` to verify prometheus.yml deployment + - Validates file exists at `/opt/torrust/storage/prometheus/etc/prometheus.yml` + - Checks file permissions and ownership via SSH + - Updated e2e-deployment-workflow-tests to use ServiceValidation pattern + - Created test environment configs: + - `envs/e2e-deployment.json` - With Prometheus enabled (scrape_interval: 15) + - `envs/e2e-deployment-no-prometheus.json` - Without Prometheus (disabled scenario) + - E2E tests validate: + - Prometheus configuration file exists at correct path + - Docker Compose files are deployed correctly + - File permissions and ownership are correct + - Manual E2E testing completed (environment: manual-test-prometheus): + - ✅ Prometheus container running (`docker ps` shows prom/prometheus:v3.0.1) + - ✅ Prometheus scraping both tracker endpoints successfully + - `/api/v1/stats` endpoint: health="up", scraping every 15s + - `/api/v1/metrics` endpoint: health="up", scraping every 15s + - ✅ Prometheus UI accessible at `http://:9090` + - ✅ Tracker metrics available and being collected + - ✅ Configuration file correctly deployed with admin token and port + - Created comprehensive manual testing documentation: + - `docs/e2e-testing/manual/prometheus-verification.md` (450+ lines) + - Documents 7 verification steps with exact commands and expected outputs + - Includes troubleshooting guide for common issues + - Provides success criteria checklist + - All linters passing, all E2E tests passing (1507+ tests) + - **Architecture validated**: Independent service rendering pattern working correctly + +- ✅ **Phase 8**: Documentation (commit: 2a820e2) + + - Created ADR: `docs/decisions/prometheus-integration-pattern.md` + - Documents 
enabled-by-default with opt-out approach + - Explains independent template rendering pattern + - Documents ServiceValidation struct for extensible testing + - Lists alternatives considered and consequences + - Updated user guide: `docs/user-guide/README.md` + - Added Prometheus configuration section + - Documents prometheus.scrape_interval parameter + - Explains enabled-by-default behavior and opt-out pattern + - Instructions for accessing Prometheus UI (port 9090) + - Links to manual verification guide + - Added technical terms to project dictionary (Alertmanager, entr, flatlined, promtool, tulpn) + - All linters passing, all tests passing (1507+ tests) + +## Summary + +Issue [#238](https://github.com/torrust/torrust-tracker-deployer/issues/238) is **complete**. All 8 phases implemented: + +1. ✅ Template Structure & Data Flow Design +2. ✅ Environment Configuration +3. ✅ Prometheus Template Renderer +4. ✅ Docker Compose Integration +5. ✅ Release Command Integration +6. ✅ Ansible Deployment +7. ✅ Testing & Verification +8. 
✅ Documentation + +**Total Commits**: 8 (2ca0fa9, 92aab59, 731eaf4, 22790de, f20d45c, 9c1b91a, a257fcf, 2a820e2) + +Prometheus is now fully integrated with: + +- Metrics collection from both `/api/v1/stats` and `/api/v1/metrics` endpoints +- Enabled by default with simple opt-out (remove config section) +- Independent template rendering following DDD principles +- Comprehensive E2E validation (automated + manual) +- Complete documentation (ADR + user guide + manual verification) ## 🏗️ Architecture Requirements **DDD Layers**: Infrastructure + Domain **Module Paths**: -- `src/infrastructure/templating/docker_compose/` - Docker Compose template rendering with Prometheus service - `src/infrastructure/templating/prometheus/` - Prometheus configuration template system (NEW) -- `src/domain/config/environment/` - Environment configuration schema extensions +- `src/infrastructure/templating/docker_compose/` - Docker Compose template rendering with Prometheus service +- `src/domain/prometheus/` - Prometheus configuration domain types (NEW) +- `src/application/command_handlers/create/config/prometheus/` - Prometheus config creation handlers (NEW) **Pattern**: Template System with Project Generator pattern + Configuration-driven service selection ### Module Structure Requirements -- [ ] Follow template system architecture (see [docs/technical/template-system-architecture.md](../technical/template-system-architecture.md)) -- [ ] Create new Prometheus template module following existing patterns (tracker, docker-compose) -- [ ] Use Project Generator pattern for Prometheus templates -- [ ] Register Prometheus configuration template in renderer -- [ ] Use `.tera` extension for dynamic templates -- [ ] Environment config drives Prometheus enablement +- ✅ Follow template system architecture (see [docs/technical/template-system-architecture.md](../technical/template-system-architecture.md)) +- ✅ Create new Prometheus template module following existing patterns (tracker, 
docker-compose) +- ✅ Use Project Generator pattern for Prometheus templates +- ✅ Register Prometheus configuration template in renderer +- ✅ Use `.tera` extension for dynamic templates +- ✅ Environment config drives Prometheus enablement ### Architectural Constraints -- [ ] Prometheus service is included by default in generated environment templates -- [ ] Only included in docker-compose when Prometheus section present in environment config -- [ ] Service can be disabled by removing the monitoring.prometheus section from config -- [ ] Prometheus depends on tracker service (starts after tracker container starts, no health check) -- [ ] Metrics API token and port read from tracker HTTP API configuration (`tracker.http_api.admin_token` and `tracker.http_api.bind_address`) -- [ ] Prometheus configuration is dynamic (uses Tera templating) +- ✅ Prometheus service is included by default in generated environment templates +- ✅ Only included in docker-compose when Prometheus section present in environment config +- ✅ Service can be disabled by removing the monitoring.prometheus section from config +- ✅ Prometheus depends on tracker service (starts after tracker container starts, no health check) +- ✅ Metrics API token and port read from tracker HTTP API configuration (`tracker.http_api.admin_token` and `tracker.http_api.bind_address`) +- ✅ Prometheus configuration is dynamic (uses Tera templating) +- ✅ **Independent Template Rendering**: Each service renders its templates independently in the release handler + - Prometheus templates rendered by dedicated `RenderPrometheusTemplatesStep` in release handler + - Tracker templates rendered by dedicated `RenderTrackerTemplatesStep` in release handler + - Docker Compose templates rendered by dedicated `RenderDockerComposeTemplatesStep` in release handler + - **Rationale**: Docker Compose templates are NOT the "master" templates - they only define service orchestration + - **Source of Truth**: The environment configuration 
determines which services are enabled
+  - **Example**: MySQL service has docker-compose configuration but no separate config files (service-specific)
 
 ### Anti-Patterns to Avoid
 
+- ❌ **Rendering service templates from within docker-compose template rendering** (CRITICAL)
+
+  - Docker Compose step should ONLY render docker-compose files
+  - Each service's templates should be rendered independently in the release handler
+  - The handler orchestrates all template rendering steps based on environment config
+
 - ❌ Making Prometheus mandatory for all deployments
 - ❌ Hardcoding API tokens in templates
 - ❌ Starting Prometheus before tracker is ready
@@ -72,32 +238,30 @@ The implementation follows an **enabled-by-default, opt-out approach** where Pro
 
 ### Prometheus Service Enablement
 
-**Environment Configuration Addition**:
+**Environment Configuration Addition** (top-level, alongside `tracker`):
 
 ```json
 {
-  "deployment": {
-    "tracker": {
-      "monitoring": {
-        "prometheus": {
-          "scrape_interval": 15
-        }
-      }
-    }
+  "environment": { "name": "my-deployment" },
+  "provider": { ... },
+  "ssh_credentials": { ... },
+  "tracker": { ...
}, + "prometheus": { + "scrape_interval": 15 } } ``` **Default Behavior in Generated Templates**: -- The `monitoring.prometheus` section is **included by default** when generating environment templates +- The `prometheus` section is **included by default** when generating environment templates - If the section is **present** in the environment config → Prometheus service is included in docker-compose - If the section is **removed/absent** from the environment config → Prometheus service is NOT included **Service Detection**: -- Presence of `monitoring.prometheus` section (regardless of content) → Service enabled -- Absence of `monitoring.prometheus` section → Service disabled +- Presence of `prometheus` section (regardless of content) → Service enabled +- Absence of `prometheus` section → Service disabled **Configuration Model**: Uses `Option<PrometheusConfig>` in the Rust domain model: @@ -202,14 +366,18 @@ scrape_configs: ### Environment Configuration Schema Extensions -**Add to Domain Layer** (`src/domain/config/environment/`): +**Add to Domain Layer** (`src/domain/prometheus/`): ```rust -// In tracker configuration -pub struct TrackerMonitoring { - pub prometheus: Option<PrometheusConfig>, -} - +// New file: src/domain/prometheus/mod.rs +pub mod config; +pub use config::PrometheusConfig; + +// New file: src/domain/prometheus/config.rs +/// Prometheus metrics collection configuration +/// +/// Configures how Prometheus scrapes metrics from the tracker. 
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] pub struct PrometheusConfig { /// Scrape interval in seconds pub scrape_interval: u32, @@ -222,33 +390,42 @@ impl Default for PrometheusConfig { } ``` +**Add to Environment User Inputs** (`src/domain/environment/user_inputs.rs` or similar): + +The environment's user inputs struct should have a top-level optional `prometheus` field: + +```rust +pub struct UserInputs { + pub provider: ProviderConfig, + pub ssh_credentials: SshCredentials, + pub tracker: TrackerConfig, + /// Prometheus metrics collection (optional third-party service) + #[serde(skip_serializing_if = "Option::is_none")] + pub prometheus: Option<PrometheusConfig>, +} +``` + **JSON Schema Addition** (`schemas/environment-config.json`): ```json { - "deployment": { - "tracker": { - "monitoring": { - "prometheus": { - "type": "object", - "description": "Prometheus metrics collection service configuration. Remove this section to disable Prometheus.", - "properties": { - "scrape_interval": { - "type": "integer", - "description": "How often to scrape metrics from tracker (in seconds). Minimum 5s to avoid overwhelming the tracker.", - "default": 15, - "minimum": 5, - "maximum": 300 - } - } - } + "prometheus": { + "type": "object", + "description": "Prometheus metrics collection service configuration. Remove this section to disable Prometheus.", + "properties": { + "scrape_interval": { + "type": "integer", + "description": "How often to scrape metrics from tracker (in seconds). Minimum 5s to avoid overwhelming the tracker.", + "default": 15, + "minimum": 5, + "maximum": 300 } } } } ``` -**Template Generation**: When generating environment templates with `create environment --template`, include the `monitoring.prometheus` section by default. +**Template Generation**: When generating environment templates with `create environment --template`, include the `prometheus` section by default at the top level (alongside `tracker`). 
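The presence-based enablement that `Option<PrometheusConfig>` encodes can be shown in a small, self-contained sketch. The struct mirrors the domain type above, but the helper function is an illustrative stand-in, and serde attributes are omitted:

```rust
// Minimal sketch of presence-based service enablement.
// `is_prometheus_enabled` is illustrative, not the real API.
#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusConfig {
    /// Scrape interval in seconds
    pub scrape_interval: u32,
}

impl Default for PrometheusConfig {
    fn default() -> Self {
        Self { scrape_interval: 15 }
    }
}

/// A service is enabled exactly when its config section is present.
fn is_prometheus_enabled(prometheus: &Option<PrometheusConfig>) -> bool {
    prometheus.is_some()
}

fn main() {
    let enabled = Some(PrometheusConfig::default());
    let disabled: Option<PrometheusConfig> = None;
    assert!(is_prometheus_enabled(&enabled));
    assert!(!is_prometheus_enabled(&disabled));
    println!("default scrape_interval = {}", enabled.unwrap().scrape_interval);
}
```

Deserializing an environment file without a `prometheus` key yields `None`, which is exactly the opt-out path described above.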
### Template File Organization @@ -327,7 +504,7 @@ volumes: - name: Copy Prometheus configuration ansible.builtin.copy: - src: "{{ build_dir }}/storage/prometheus/etc/prometheus.yml" + src: "{{ build_dir }}/prometheus/prometheus.yml" dest: /opt/torrust/storage/prometheus/etc/prometheus.yml mode: "0644" when: prometheus_config is defined @@ -335,7 +512,13 @@ volumes: ## Implementation Plan -> **Important**: After completing each phase, run `./scripts/pre-commit.sh` to verify all checks pass, then commit your changes with a descriptive message following the [commit conventions](../contributing/commit-process.md). This ensures incremental progress is saved and issues are caught early. +> **Important Workflow**: After completing each phase: +> +> 1. Run `./scripts/pre-commit.sh` to verify all checks pass +> 2. Commit your changes with a descriptive message following the [commit conventions](../contributing/commit-process.md) +> 3. **STOP and wait for feedback/approval before proceeding to the next phase** +> +> This ensures incremental progress is saved, issues are caught early, and each phase is reviewed before moving forward. ### Phase 1: Template Structure & Data Flow Design (1 hour) @@ -345,14 +528,18 @@ volumes: - [ ] Create initial `prometheus.yml.tera` template with placeholder variables - [ ] Verify template variables match context struct fields +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit changes, and **WAIT FOR APPROVAL** before Phase 2. 
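For orientation, the `prometheus.yml.tera` template drafted in Phase 1 might look roughly like the sketch below. The variable names (`scrape_interval`, `tracker_api_token`, `tracker_api_port`) and the `params` shape are assumptions for illustration; the real context struct fields are what Phase 1 defines:

```yaml
# prometheus.yml.tera — illustrative sketch, not the real template
global:
  scrape_interval: {{ scrape_interval }}s

scrape_configs:
  - job_name: "tracker_stats"
    metrics_path: "/api/v1/stats"
    params:
      token: ["{{ tracker_api_token }}"]
      format: ["prometheus"]
    static_configs:
      - targets: ["tracker:{{ tracker_api_port }}"]
```

The token and port come from the tracker HTTP API configuration (`tracker.http_api.admin_token` and `tracker.http_api.bind_address`), per the architectural constraints above.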
+ ### Phase 2: Environment Configuration (1-2 hours) -- [ ] Extend domain `TrackerConfig` with `monitoring.prometheus: Option<PrometheusConfig>` field -- [ ] Update JSON schema with Prometheus configuration (scrape_interval field) -- [ ] Add Prometheus config to application layer conversion methods -- [ ] Ensure generated templates include Prometheus section by default +- [ ] Create new domain module `src/domain/prometheus/` with `PrometheusConfig` struct +- [ ] Add `prometheus: Option<PrometheusConfig>` to environment's `UserInputs` struct (top-level, alongside tracker) +- [ ] Update JSON schema with Prometheus configuration (top-level) +- [ ] Ensure generated templates include Prometheus section by default (at top level) - [ ] Add unit tests for Prometheus configuration serialization/deserialization (with and without section) +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit changes, and **WAIT FOR APPROVAL** before Phase 3. + ### Phase 3: Prometheus Template Renderer (2 hours) **Why Phase 3**: Must create the Prometheus renderer BEFORE docker-compose integration can use it. @@ -363,6 +550,8 @@ volumes: - [ ] Register Prometheus templates in project generator - [ ] Add comprehensive unit tests for renderer (with different scrape intervals, tokens, ports) +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit changes, and **WAIT FOR APPROVAL** before Phase 4. + ### Phase 4: Docker Compose Integration (2-3 hours) **Why Phase 4**: Now we can integrate with the existing Prometheus renderer from Phase 3. @@ -373,15 +562,19 @@ volumes: - [ ] Update docker-compose template renderer to handle Prometheus context - [ ] Add unit tests for Prometheus service rendering (with and without Prometheus section) +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit changes, and **WAIT FOR APPROVAL** before Phase 5. + ### Phase 5: Release Command Integration (1 hour) **Why Phase 5**: Orchestrates both renderers (docker-compose + prometheus) created in previous phases. 
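The step ordering these phases build toward (tracker templates first, Prometheus only when configured, docker-compose last) can be sketched as plain sequential orchestration. All names here are illustrative stand-ins, not the real handler or step types:

```rust
// Sketch of the release handler orchestrating independent
// template-rendering steps; config presence drives the Prometheus step.
struct Environment {
    prometheus_enabled: bool,
}

fn render_tracker_templates(log: &mut Vec<String>) {
    log.push("tracker templates rendered".into());
}

fn render_prometheus_templates(log: &mut Vec<String>) {
    log.push("prometheus templates rendered".into());
}

fn render_docker_compose_templates(log: &mut Vec<String>) {
    log.push("docker-compose templates rendered".into());
}

fn execute_release(env: &Environment) -> Vec<String> {
    let mut log = Vec::new();
    render_tracker_templates(&mut log);
    // Independent step: skipped entirely when the config section is absent.
    if env.prometheus_enabled {
        render_prometheus_templates(&mut log);
    }
    render_docker_compose_templates(&mut log);
    log
}

fn main() {
    assert_eq!(execute_release(&Environment { prometheus_enabled: true }).len(), 3);
    assert_eq!(execute_release(&Environment { prometheus_enabled: false }).len(), 2);
    println!("release steps ok");
}
```

Each rendering function owns exactly one concern, which is the property the "independent template rendering" constraint is protecting.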
- [ ] Update `RenderTemplatesStep` to call Prometheus renderer when config present -- [ ] Ensure Prometheus templates rendered to `build/{env}/storage/prometheus/etc/` directory +- ✅ Ensure Prometheus templates rendered to `build/{env}/prometheus/` directory - [ ] Verify build directory structure includes Prometheus configuration - [ ] Test release command with Prometheus enabled and disabled +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit changes, and **WAIT FOR APPROVAL** before Phase 6. + ### Phase 6: Ansible Deployment (1 hour) - [ ] Extend release playbook with Prometheus configuration tasks @@ -389,6 +582,8 @@ volumes: - [ ] Add conditional file copy for prometheus.yml - [ ] Test Ansible playbook with Prometheus enabled/disabled +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit changes, and **WAIT FOR APPROVAL** before Phase 7. + ### Phase 7: Testing & Verification (2-3 hours) - [ ] Add E2E test for deployment with Prometheus enabled (default behavior) @@ -399,6 +594,8 @@ volumes: - [ ] Verify Prometheus scrapes metrics from tracker endpoints - [ ] Update manual testing documentation +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit changes, and **WAIT FOR APPROVAL** before Phase 8. + ### Phase 8: Documentation (1 hour) - [ ] Create ADR for Prometheus integration pattern @@ -407,6 +604,8 @@ volumes: - [ ] Add Prometheus to architecture diagrams - [ ] Update AGENTS.md if needed +**Checkpoint**: Run `./scripts/pre-commit.sh`, commit final changes, and mark issue as complete. + **Total Estimated Time**: 13-17 hours ## Acceptance Criteria diff --git a/docs/user-guide/README.md b/docs/user-guide/README.md index 43a7ad9c..3fa578d9 100644 --- a/docs/user-guide/README.md +++ b/docs/user-guide/README.md @@ -10,6 +10,8 @@ Welcome to the Torrust Tracker Deployer user guide! 
This guide will help you get - [Available Commands](#available-commands) - [Basic Workflows](#basic-workflows) - [Configuration](#configuration) +- [Services](#services) +- [Security](#security) - [Troubleshooting](#troubleshooting) - [Additional Resources](#additional-resources) @@ -271,6 +273,60 @@ The environment configuration file is in JSON format: - SSH port number - Default: `22` +For service-specific configuration (Prometheus, MySQL, etc.), see the [Services](#services) section below. + +## Services + +The Torrust Tracker Deployer supports optional services that can be enabled in your deployment: + +### Available Services + +- **[Prometheus Monitoring](services/prometheus.md)** - Metrics collection and monitoring (enabled by default) + - Automatic metrics scraping from tracker API + - Web UI on port 9090 + - Configurable scrape intervals + - Can be disabled by removing from configuration + +### Adding or Removing Services + +Services are configured in your environment JSON file. To enable a service, include its configuration section. To disable it, remove the section. + +**Example with Prometheus**: + +```json +{ + "environment": { "name": "my-env" }, + "ssh_credentials": { ... }, + "prometheus": { + "scrape_interval": 15 + } +} +``` + +**Example without Prometheus**: + +```json +{ + "environment": { "name": "my-env" }, + "ssh_credentials": { ... } +} +``` + +See individual service guides for detailed configuration options and verification steps. + +## Security + +**🔒 CRITICAL**: The deployer automatically configures firewall protection during the `configure` command to secure internal services (Prometheus, MySQL) while keeping tracker services publicly accessible. 
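Because enablement is purely section-presence, you can check an environment file the same way the deployer does. A minimal sketch using `jq` (the file path and its contents are illustrative):

```shell
# Write a sample environment config (illustrative path and contents)
cat > /tmp/example-env.json <<'EOF'
{
  "environment": { "name": "my-env" },
  "prometheus": { "scrape_interval": 15 }
}
EOF

# `jq -e` exits 0 only if the expression yields a non-null, non-false value
if jq -e '.prometheus' /tmp/example-env.json > /dev/null; then
  echo "prometheus enabled"
else
  echo "prometheus disabled"
fi
```

Deleting the `prometheus` key from the file flips the output to "prometheus disabled".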
+ +**For complete security information**, see the **[Security Guide](security.md)** which covers: + +- Automatic firewall configuration (UFW) +- Why firewall protection is critical for production +- SSH security best practices +- Docker and network security +- Production security checklist +- Security incident response + ### Logging Configuration Control logging output with command-line options: diff --git a/docs/user-guide/security.md b/docs/user-guide/security.md new file mode 100644 index 00000000..7af30b1a --- /dev/null +++ b/docs/user-guide/security.md @@ -0,0 +1,333 @@ +# Security Guide + +This guide covers security considerations and best practices when deploying Torrust Tracker using the deployer. + +## Overview + +Security is a critical aspect of production deployments. The Torrust Tracker Deployer implements several security measures automatically during the deployment process, with additional considerations for production environments. + +## Firewall Configuration + +### Automatic Firewall Setup + +**CRITICAL**: The `configure` command automatically configures UFW (Uncomplicated Firewall) on virtual machines to protect internal services from unauthorized external access. + +During the `configure` step, the deployer: + +1. **Installs UFW** - Ensures the firewall is available +2. **Sets restrictive policies** - Denies all incoming traffic by default +3. **Allows SSH access** - Preserves SSH connectivity (configured port) +4. **Allows tracker services** - Opens only necessary tracker ports: + - UDP tracker ports (configured in environment) + - HTTP tracker ports (configured in environment) + - HTTP API port (configured in environment) +5. 
**Enables the firewall** - Activates rules to protect the system + +### Why Firewall Configuration Matters + +The Docker Compose configuration (`templates/docker-compose/docker-compose.yml.tera`) exposes several service ports that should **NOT** be publicly accessible: + +**Exposed Ports in Docker Compose**: + +```yaml +services: + # Tracker - Public ports (UDP/HTTP tracker, HTTP API) + tracker: + ports: + - "6969:6969/udp" # ✅ Public - UDP tracker + - "7070:7070" # ✅ Public - HTTP tracker + - "1212:1212" # ✅ Public - HTTP API + + # Prometheus - INTERNAL ONLY + prometheus: + ports: + - "9090:9090" # ⚠️ INTERNAL - Metrics UI + + # MySQL - INTERNAL ONLY + mysql: + ports: + - "3306:3306" # ⚠️ INTERNAL - Database +``` + +**Without firewall protection**, services like Prometheus (port 9090) and MySQL (port 3306) would be accessible from the internet, potentially exposing: + +- **Prometheus** - Internal metrics, performance data, system topology +- **MySQL** - Database access (even with authentication, this is a security risk) + +**With firewall protection** (UFW configured by `configure` command): + +- ✅ **Tracker ports** - Accessible externally (UDP tracker, HTTP tracker, HTTP API) +- 🔒 **Prometheus port** - Blocked from external access +- 🔒 **MySQL port** - Blocked from external access +- ✅ **SSH access** - Preserved for administration + +### E2E Testing vs Production + +**E2E Testing (Docker Containers)**: + +- Uses Docker containers instead of VMs for faster test execution +- Firewall **NOT** configured inside containers (containers provide isolation) +- Services exposed for testing purposes +- ⚠️ **NOT suitable for production use** + +**Production Deployments (Virtual Machines)**: + +- Uses real VMs (LXD, cloud providers) +- Firewall **automatically configured** by `configure` command +- Only tracker services exposed externally +- ✅ **Production-ready security posture** + +### Firewall Rules Applied + +The deployer configures these firewall rules during the 
`configure` step: + +```bash +# SSH Access (required for management) +ufw allow <ssh-port>/tcp + +# UDP Tracker Ports (configured in environment) +ufw allow <udp-tracker-port>/udp + +# HTTP Tracker Ports (configured in environment) +ufw allow <http-tracker-port>/tcp + +# HTTP API Port (configured in environment) +ufw allow <http-api-port>/tcp + +# Default policies +ufw default deny incoming # Block everything else +ufw default allow outgoing # Allow outbound connections +``` + +### Verifying Firewall Configuration + +After running the `configure` command, verify firewall rules: + +```bash +# SSH into your VM +INSTANCE_IP=$(cat data/<env-name>/environment.json | jq -r '.Configured.context.runtime_outputs.instance_ip') +ssh -i <ssh-key-path> <username>@$INSTANCE_IP + +# Check UFW status +sudo ufw status numbered + +# Expected output shows: +# - SSH port allowed +# - Tracker ports allowed (UDP/HTTP/API) +# - Default deny incoming policy +# - All other ports blocked +``` + +**Example output**: + +```text +Status: active + + To Action From + -- ------ ---- +[ 1] 22/tcp ALLOW IN Anywhere +[ 2] 6969/udp ALLOW IN Anywhere +[ 3] 7070/tcp ALLOW IN Anywhere +[ 4] 1212/tcp ALLOW IN Anywhere +``` + +Note that ports 9090 (Prometheus) and 3306 (MySQL) are **not** in this list, meaning they are blocked from external access. + +## SSH Security + +### SSH Key Authentication + +The deployer requires SSH key-based authentication for VM access: + +**Best Practices**: + +1. **Use strong SSH keys** - Generate RSA keys with at least 4096 bits: + + ```bash + ssh-keygen -t rsa -b 4096 -f ~/.ssh/torrust_deploy + ``` + +2. **Protect private keys** - Set restrictive permissions: + + ```bash + chmod 600 ~/.ssh/torrust_deploy + ``` + +3. **Use dedicated keys** - Don't reuse personal SSH keys for deployments + +4. **Rotate keys regularly** - Update SSH keys periodically + +### SSH Port Configuration + +The default SSH port (22) is commonly targeted by automated attacks. 
Consider using a custom port: + +```json +{ + "ssh_credentials": { + "port": 2222 // Custom SSH port + } +} +``` + +**Trade-offs**: + +- ✅ Reduces automated attack attempts +- ✅ Adds minimal security through obscurity +- ⚠️ Must remember custom port for manual access +- ⚠️ Not a substitute for strong authentication + +## Docker Security Considerations + +### Container Isolation + +Services run in isolated Docker containers with: + +- **Network isolation** - Backend network for inter-container communication +- **Volume mounts** - Limited filesystem access with `:Z` SELinux labels +- **Resource limits** - Logging limits prevent disk exhaustion +- **Restart policies** - Automatic recovery from failures + +### Image Security + +**Current Images**: + +- `torrust/tracker:develop` - Torrust Tracker (development tag) +- `prom/prometheus:v3.0.1` - Prometheus (pinned version) +- `mysql:8.0` - MySQL (major version pinned) + +**Recommendations**: + +1. **Pin specific versions** - Use exact version tags in production +2. **Scan images regularly** - Check for known vulnerabilities +3. **Update periodically** - Apply security patches +4. **Use official images** - Prefer official/verified images + +### Environment Variables + +Sensitive configuration is managed via `.env` files on the VM: + +**Best Practices**: + +1. **Strong passwords** - Use complex, randomly generated passwords +2. **Unique credentials** - Different passwords per environment +3. **Secure storage** - Never commit `.env` files to version control +4. 
**Rotation policy** - Update passwords periodically + +**Example** (DO NOT use these values): + +```bash +# Bad - Weak passwords +MYSQL_ROOT_PASSWORD=password123 +MYSQL_PASSWORD=tracker + +# Good - Strong, unique passwords +MYSQL_ROOT_PASSWORD=7k#mP9$vL2@qX5nR8jW +MYSQL_PASSWORD=xF4!hT6@dN9$sK2mQ7wE +``` + +## Network Security + +### Service Exposure + +The deployer follows the principle of least exposure: + +**Public Services** (accessible externally): + +- UDP Tracker - Required for BitTorrent protocol +- HTTP Tracker - Required for HTTP-based tracker operations +- HTTP API - Required for tracker management and metrics + +**Internal Services** (blocked by firewall): + +- Prometheus UI - Metrics collection (internal monitoring only) +- MySQL Database - Data storage (internal access only) + +### Internal Communication + +Services communicate via Docker's `backend_network`: + +- Container-to-container communication allowed +- Isolated from host network by default +- DNS resolution via container names (e.g., `tracker`, `mysql`, `prometheus`) + +## Production Security Checklist + +Before deploying to production, verify: + +### Infrastructure Security + +- [ ] **Virtual machines used** (not Docker containers for testing) +- [ ] **Firewall configured** (`configure` command completed successfully) +- [ ] **SSH key authentication** (password authentication disabled) +- [ ] **Custom SSH port** (optional but recommended) +- [ ] **Firewall rules verified** (`ufw status` shows expected rules) + +### Credential Security + +- [ ] **Strong SSH keys** (4096-bit RSA minimum) +- [ ] **Strong database passwords** (randomly generated, complex) +- [ ] **Unique API tokens** (per environment, rotated regularly) +- [ ] **No credentials in git** (`.env` files gitignored) +- [ ] **Secure key storage** (restricted permissions on private keys) + +### Application Security + +- [ ] **Pinned image versions** (not using `latest` or `develop` tags) +- [ ] **Image scanning enabled** (vulnerability 
checks in CI/CD) +- [ ] **Logging configured** (audit trail and debugging) +- [ ] **Resource limits set** (prevent resource exhaustion) +- [ ] **Regular updates scheduled** (security patches applied) + +### Monitoring Security + +- [ ] **Prometheus UI not exposed** (firewall blocks port 9090) +- [ ] **Database not exposed** (firewall blocks port 3306) +- [ ] **Access logs reviewed** (regular security audits) +- [ ] **Metrics monitored** (unusual patterns detected) + +## Security Incident Response + +If you suspect a security breach: + +1. **Isolate the system** - Disable network access if necessary +2. **Check logs** - Review `data/logs/log.txt` and container logs +3. **Review firewall rules** - Verify UFW configuration hasn't changed +4. **Rotate credentials** - Update all passwords and keys immediately +5. **Update software** - Apply latest security patches +6. **Report vulnerabilities** - Contact maintainers for Torrust Tracker issues + +## Future Security Enhancements + +Planned improvements for future releases: + +- **TLS/SSL support** - HTTPS for HTTP tracker and API +- **Certificate management** - Automated Let's Encrypt integration +- **Rate limiting** - Protection against abuse +- **Fail2ban integration** - Automated IP blocking for failed attempts +- **Security scanning** - Automated vulnerability detection in CI/CD +- **Audit logging** - Detailed access logs for compliance + +## Additional Resources + +### Related Documentation + +- **[User Guide](README.md)** - Main deployment guide +- **[Configuration Guide](configuration/)** - Environment configuration details +- **[Services Guide](services/)** - Service-specific security considerations + +### External Resources + +- **[UFW Documentation](https://help.ubuntu.com/community/UFW)** - Firewall configuration +- **[Docker Security Best Practices](https://docs.docker.com/engine/security/)** - Container security +- **[SSH Hardening Guide](https://www.ssh.com/academy/ssh/security)** - SSH security best 
practices +- **[OWASP Top 10](https://owasp.org/www-project-top-ten/)** - Web application security risks + +## Questions or Concerns? + +Security is an ongoing process. If you have questions or discover security issues: + +- **Security Issues** - Report privately to maintainers (do not open public issues) +- **General Questions** - [GitHub Discussions](https://github.com/torrust/torrust-tracker-deployer/discussions) +- **Feature Requests** - [GitHub Issues](https://github.com/torrust/torrust-tracker-deployer/issues) + +Stay secure! 🔒 diff --git a/docs/user-guide/services/README.md b/docs/user-guide/services/README.md new file mode 100644 index 00000000..c16d568e --- /dev/null +++ b/docs/user-guide/services/README.md @@ -0,0 +1,111 @@ +# Services Documentation + +This directory contains detailed documentation for optional services that can be included in your Torrust Tracker deployments. + +## Purpose + +The services documentation provides comprehensive guides for each optional service, including: + +- Configuration options and examples +- Enabling/disabling instructions +- Verification and testing procedures +- Troubleshooting common issues +- Architecture and deployment details + +## Available Services + +- **[Prometheus Monitoring](prometheus.md)** - Metrics collection and monitoring service + - Automatic metrics scraping from tracker API endpoints + - Web UI for querying and visualizing metrics + - Configurable scrape intervals + - Enabled by default, can be disabled + +## Service Organization + +Each service guide follows a consistent structure: + +1. **Overview** - Purpose and capabilities +2. **Default Behavior** - Out-of-the-box configuration +3. **Configuration** - How to configure the service +4. **Disabling** - How to remove the service from deployment +5. **Accessing** - How to interact with the service after deployment +6. **Verification** - How to verify the service is working correctly +7. **Troubleshooting** - Common issues and solutions +8. 
**Architecture** - Technical details about deployment structure + +## How Services Work + +Services in the deployer are: + +- **Optional** - Include only what you need +- **Configuration-based** - Enable by adding a section to your environment JSON +- **Containerized** - Each service runs in its own Docker container +- **Integrated** - Automatically configured to work with the tracker + +### Adding a Service + +To include a service in your deployment, add its configuration section to your environment JSON file: + +```json +{ + "environment": { + "name": "my-env" + }, + "ssh_credentials": { + "private_key_path": "~/.ssh/id_rsa", + "public_key_path": "~/.ssh/id_rsa.pub", + "username": "torrust" + }, + "prometheus": { + "scrape_interval": 15 + } +} +``` + +### Removing a Service + +To exclude a service from your deployment, simply remove its configuration section: + +```json +{ + "environment": { + "name": "my-env" + }, + "ssh_credentials": { + "private_key_path": "~/.ssh/id_rsa", + "public_key_path": "~/.ssh/id_rsa.pub", + "username": "torrust" + } + // No prometheus section = service not deployed +} +``` + +## Future Services + +As the deployer evolves, additional optional services may be added to this directory: + +- Database services (MySQL, PostgreSQL) +- Reverse proxy services (Nginx, Traefik) +- Logging aggregation (Loki, Elasticsearch) +- Alerting services (Alertmanager) +- Visualization services (Grafana) + +## Related Documentation + +- **[User Guide](../README.md)** - Main user guide with general configuration +- **[Quick Start Guide](../quick-start.md)** - Getting started with deployments +- **[Configuration Reference](../configuration/)** - Environment configuration details +- **[Manual Testing Guides](../../e2e-testing/manual/)** - Service verification procedures + +## Contributing + +When adding new service documentation: + +1. Follow the established structure outlined above +2. Include practical examples and commands +3. Provide verification steps +4. 
Document common troubleshooting scenarios +5. Update this README to list the new service +6. Add cross-references to related documentation + +See [Contributing Guidelines](../../contributing/README.md) for more details. diff --git a/docs/user-guide/services/prometheus.md b/docs/user-guide/services/prometheus.md new file mode 100644 index 00000000..fe18f603 --- /dev/null +++ b/docs/user-guide/services/prometheus.md @@ -0,0 +1,291 @@ +# Prometheus Monitoring Service + +This guide covers the Prometheus monitoring service integration in the Torrust Tracker Deployer. + +## Overview + +The deployer includes Prometheus for metrics collection by default. Prometheus automatically scrapes metrics from the tracker's HTTP API endpoints, providing real-time monitoring and historical data analysis. + +## Default Behavior + +- **Enabled by default** in generated environment templates +- Metrics collected from both `/api/v1/stats` and `/api/v1/metrics` endpoints +- Accessible via web UI on port `9090` +- Scrape interval: 15 seconds (configurable) + +## Configuration + +### Basic Configuration + +Add the `prometheus` section to your environment configuration file: + +```json +{ + "environment": { + "name": "my-env" + }, + "ssh_credentials": { + "private_key_path": "~/.ssh/id_rsa", + "public_key_path": "~/.ssh/id_rsa.pub", + "username": "torrust", + "port": 22 + }, + "prometheus": { + "scrape_interval": 15 + } +} +``` + +### Configuration Fields + +**prometheus.scrape_interval** (optional): + +- Metrics collection interval in seconds +- Default: `15` seconds +- Minimum recommended: `5` seconds +- Typical values: `10-60` seconds + +**Examples**: + +```json +// High-frequency monitoring (5 seconds) +{ + "prometheus": { + "scrape_interval": 5 + } +} + +// Standard monitoring (15 seconds) +{ + "prometheus": { + "scrape_interval": 15 + } +} + +// Low-frequency monitoring (60 seconds) +{ + "prometheus": { + "scrape_interval": 60 + } +} +``` + +## Disabling Prometheus + +To deploy without 
Prometheus monitoring, simply remove the entire `prometheus` section from your environment config: + +```json +{ + "environment": { + "name": "my-env" + }, + "ssh_credentials": { + "private_key_path": "~/.ssh/id_rsa", + "public_key_path": "~/.ssh/id_rsa.pub", + "username": "torrust", + "port": 22 + } + // No prometheus section = monitoring disabled +} +``` + +## Accessing Prometheus + +After deployment, the Prometheus web UI is available at: + +```text +http://<vm-ip>:9090 +``` + +Where `<vm-ip>` is the IP address of your deployed VM instance. + +### Finding Your VM IP + +```bash +# Extract IP from environment state +INSTANCE_IP=$(cat data/<env-name>/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') +echo "Prometheus UI: http://$INSTANCE_IP:9090" +``` + +## Using the Prometheus UI + +The Prometheus web interface provides several capabilities: + +### 1. View Current Metrics + +Navigate to **Status → Targets** to see: + +- Tracker endpoint health (up/down status) +- Last scrape time +- Scrape duration +- Error messages (if any) + +### 2. Query Metrics + +Use the **Graph** tab to query metrics: + +**Example Queries**: + +```promql +# Total announced peers +torrust_tracker_announced_peers_total + +# Target up/down status (1 = up) +up{job="tracker"} + +# Rate of announcements per second +rate(torrust_tracker_announced_peers_total[5m]) +``` + +### 3. Explore Available Metrics + +Navigate to **Graph → Insert metric at cursor** to see all available metrics from the tracker. + +### 4. Check Target Health + +Navigate to **Status → Targets** to verify: + +- Both tracker endpoints are being scraped +- No error messages +- Recent successful scrapes + +## Verification + +For complete Prometheus verification steps, see the [Prometheus Verification Guide](../../e2e-testing/manual/prometheus-verification.md). 
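The target-health check can also be exercised offline against a saved copy of Prometheus's `/api/v1/targets` response. The sample payload below is abbreviated and illustrative (job names depend on your scrape configuration):

```shell
# Abbreviated, illustrative /api/v1/targets payload
cat > /tmp/targets.json <<'EOF'
{"data":{"activeTargets":[
  {"labels":{"job":"tracker_stats"},"health":"up"},
  {"labels":{"job":"tracker_metrics"},"health":"up"}
]}}
EOF

# Same jq filter pattern used for live verification:
# one {job, health} object per scraped target
jq '.data.activeTargets[] | {job: .labels.job, health: .health}' /tmp/targets.json
```

A healthy deployment shows every target with `"health": "up"`; anything else points to the troubleshooting steps below.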
+ +### Quick Verification + +```bash +# Get VM IP +INSTANCE_IP=$(cat data/<env-name>/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') + +# Check Prometheus container is running +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "docker ps | grep prometheus" + +# Check Prometheus is accessible +curl -s http://$INSTANCE_IP:9090/-/healthy +# Expected: Prometheus is Healthy. + +# Check tracker targets +curl -s http://$INSTANCE_IP:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}' +``` + +## Troubleshooting + +### Prometheus Container Not Running + +**Check container status**: + +```bash +INSTANCE_IP=$(cat data/<env-name>/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "docker ps -a | grep prometheus" +``` + +**Check container logs**: + +```bash +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "docker logs prometheus" +``` + +### Targets Showing as Down + +**Check tracker is running**: + +```bash +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "docker ps | grep tracker" +``` + +**Check tracker HTTP API is accessible** (port 1212 is the HTTP API): + +```bash +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "curl -s http://localhost:1212/api/v1/stats" +``` + +**Check Prometheus configuration**: + +```bash +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "cat /opt/torrust/storage/prometheus/etc/prometheus.yml" +``` + +### Metrics Not Being Scraped + +**Verify scrape interval**: + +```bash +# Check your environment config +cat envs/<env-name>.json | jq '.prometheus.scrape_interval' +``` + +**Check Prometheus config on VM**: + +```bash +INSTANCE_IP=$(cat data/<env-name>/environment.json | jq -r '.Running.context.runtime_outputs.instance_ip') +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "cat /opt/torrust/storage/prometheus/etc/prometheus.yml | grep scrape_interval" +``` + +### Port 9090 Not Accessible + +**Check port is exposed in docker-compose**: + +```bash +ssh -i fixtures/testing_rsa 
torrust@$INSTANCE_IP "cat /opt/torrust/docker-compose.yml | grep -A 5 'prometheus:'" +``` + +**Check firewall rules** (if applicable): + +```bash +ssh -i fixtures/testing_rsa torrust@$INSTANCE_IP "sudo ufw status" +``` + +## Architecture + +### Deployment Structure + +Prometheus is deployed as a Docker container alongside the tracker: + +```text +VM Instance +├── /opt/torrust/ +│ ├── docker-compose.yml # Defines prometheus service +│ ├── storage/ +│ │ └── prometheus/ +│ │ └── etc/ +│ │ └── prometheus.yml # Prometheus configuration +│ └── .env # Environment variables +``` + +### Configuration Generation + +The deployer generates the Prometheus configuration file from templates: + +1. **Template**: `templates/tracker/prometheus.yml.tera` +2. **Build Directory**: `build/<env-name>/prometheus/prometheus.yml` +3. **Deployment**: Ansible copies to `/opt/torrust/storage/prometheus/etc/prometheus.yml` + +### Docker Compose Integration + +When Prometheus is enabled, the deployer adds the service to `docker-compose.yml`: + +```yaml +services: + prometheus: + image: prom/prometheus:v3.0.1 + container_name: prometheus + ports: + - "9090:9090" + volumes: + - ./storage/prometheus/etc/prometheus.yml:/etc/prometheus/prometheus.yml:ro + command: + - "--config.file=/etc/prometheus/prometheus.yml" + networks: + - tracker-network + depends_on: + - tracker +``` + +## Related Documentation + +- **[Prometheus Verification Guide](../../e2e-testing/manual/prometheus-verification.md)** - Detailed verification steps +- **[User Guide](../README.md)** - Main user guide +- **[Configuration Guide](../configuration/)** - Environment configuration details +- **[Quick Start Guide](../quick-start.md)** - Getting started with deployments diff --git a/project-words.txt b/project-words.txt index f1ba57f3..90582cdf 100644 --- a/project-words.txt +++ b/project-words.txt @@ -2,6 +2,7 @@ AAAAB AAAAC AAAAI AGENTS +Alertmanager Ashburn Avalonia CIFS @@ -40,6 +41,7 @@ Testcontain Testcontainers Testinfra Torrust 
+Traefik VARCHAR addgroup adduser @@ -86,6 +88,7 @@ ehthumbs elif endfor endraw +entr epel eprint eprintln @@ -93,6 +96,7 @@ equalto executability exfiltration exitcode +flatlined frontends getent getopt @@ -169,6 +173,7 @@ preconfigured preinstalls prereq println +promtool publickey pytest readlink @@ -235,6 +240,7 @@ tmpfs tmptu torrust tulnp +tulpn turbofish tést undertested diff --git a/src/application/command_handlers/release/errors.rs b/src/application/command_handlers/release/errors.rs index 5ea22f29..4025d9c6 100644 --- a/src/application/command_handlers/release/errors.rs +++ b/src/application/command_handlers/release/errors.rs @@ -48,6 +48,10 @@ pub enum ReleaseCommandHandlerError { #[error("Tracker database initialization failed: {0}")] TrackerDatabaseInit(String), + /// Prometheus storage directory creation failed + #[error("Prometheus storage creation failed: {0}")] + PrometheusStorageCreation(String), + /// General deployment operation failed #[error("Deployment failed: {message}")] Deployment { @@ -102,6 +106,11 @@ impl Traceable for ReleaseCommandHandlerError { Self::TrackerDatabaseInit(message) => { format!("ReleaseCommandHandlerError: Tracker database initialization failed - {message}") } + Self::PrometheusStorageCreation(message) => { + format!( + "ReleaseCommandHandlerError: Prometheus storage creation failed - {message}" + ) + } Self::Deployment { message, .. } | Self::DeploymentFailed { message, .. } => { format!("ReleaseCommandHandlerError: Deployment failed - {message}") } @@ -125,6 +134,7 @@ impl Traceable for ReleaseCommandHandlerError { | Self::TemplateRendering(_) | Self::TrackerStorageCreation(_) | Self::TrackerDatabaseInit(_) + | Self::PrometheusStorageCreation(_) | Self::ReleaseOperationFailed { .. 
} => None, } } @@ -137,7 +147,8 @@ impl Traceable for ReleaseCommandHandlerError { Self::StatePersistence(_) => ErrorKind::StatePersistence, Self::TemplateRendering(_) | Self::TrackerStorageCreation(_) - | Self::TrackerDatabaseInit(_) => ErrorKind::TemplateRendering, + | Self::TrackerDatabaseInit(_) + | Self::PrometheusStorageCreation(_) => ErrorKind::TemplateRendering, Self::Deployment { .. } | Self::ReleaseOperationFailed { .. } => { ErrorKind::InfrastructureOperation } @@ -308,6 +319,30 @@ Common causes: - Ansible playbook not found - Network connectivity issues +For more information, see docs/user-guide/commands.md" + } + Self::PrometheusStorageCreation(_) => { + "Prometheus Storage Creation Failed - Troubleshooting: + +1. Verify the target instance is reachable: + ssh @ + +2. Check that the instance has sufficient disk space: + df -h + +3. Verify the Ansible playbook exists: + ls templates/ansible/create-prometheus-storage.yml + +4. Check Ansible execution permissions + +5. Review the error message above for specific details + +Common causes: +- Insufficient disk space on target instance +- Permission denied on target directories +- Ansible playbook not found +- Network connectivity issues + For more information, see docs/user-guide/commands.md" } Self::Deployment { .. 
} => { diff --git a/src/application/command_handlers/release/handler.rs b/src/application/command_handlers/release/handler.rs index d48e4c85..e3efcd95 100644 --- a/src/application/command_handlers/release/handler.rs +++ b/src/application/command_handlers/release/handler.rs @@ -10,8 +10,11 @@ use super::errors::ReleaseCommandHandlerError; use crate::adapters::ansible::AnsibleClient; use crate::application::command_handlers::common::StepResult; use crate::application::steps::{ - application::{CreateTrackerStorageStep, DeployTrackerConfigStep, InitTrackerDatabaseStep}, - rendering::RenderTrackerTemplatesStep, + application::{ + CreatePrometheusStorageStep, CreateTrackerStorageStep, DeployPrometheusConfigStep, + DeployTrackerConfigStep, InitTrackerDatabaseStep, + }, + rendering::{RenderPrometheusTemplatesStep, RenderTrackerTemplatesStep}, DeployComposeFilesStep, RenderDockerComposeTemplatesStep, }; use crate::domain::environment::repository::{EnvironmentRepository, TypedEnvironmentRepository}; @@ -199,10 +202,19 @@ impl ReleaseCommandHandler { // Step 4: Deploy tracker configuration to remote self.deploy_tracker_config_to_remote(environment, &tracker_build_dir, instance_ip)?; - // Step 5: Render Docker Compose templates + // Step 5: Create Prometheus storage directories (if enabled) + Self::create_prometheus_storage(environment, instance_ip)?; + + // Step 6: Render Prometheus configuration templates (if enabled) + Self::render_prometheus_templates(environment)?; + + // Step 7: Deploy Prometheus configuration to remote (if enabled) + self.deploy_prometheus_config_to_remote(environment, instance_ip)?; + + // Step 8: Render Docker Compose templates let compose_build_dir = self.render_docker_compose_templates(environment).await?; - // Step 6: Deploy compose files to remote + // Step 9: Deploy compose files to remote self.deploy_compose_files_to_remote(environment, &compose_build_dir, instance_ip)?; let released = environment.clone().released(); @@ -308,6 +320,152 @@ impl 
ReleaseCommandHandler { Ok(tracker_build_dir) } + /// Render Prometheus configuration templates to the build directory (if enabled) + /// + /// This step is optional and only executes if Prometheus is configured in the environment. + /// If Prometheus is not configured, the step is skipped without error. + /// + /// # Errors + /// + /// Returns a tuple of (error, `ReleaseStep::RenderPrometheusTemplates`) if rendering fails + #[allow(clippy::result_large_err)] + fn render_prometheus_templates( + environment: &Environment, + ) -> StepResult<(), ReleaseCommandHandlerError, ReleaseStep> { + let current_step = ReleaseStep::RenderPrometheusTemplates; + + // Check if Prometheus is configured + if environment.context().user_inputs.prometheus.is_none() { + info!( + command = "release", + step = %current_step, + status = "skipped", + "Prometheus not configured - skipping template rendering" + ); + return Ok(()); + } + + let template_manager = Arc::new(TemplateManager::new(environment.templates_dir())); + let step = RenderPrometheusTemplatesStep::new( + Arc::new(environment.clone()), + template_manager, + environment.build_dir().clone(), + ); + + step.execute().map_err(|e| { + ( + ReleaseCommandHandlerError::TemplateRendering(e.to_string()), + current_step, + ) + })?; + + info!( + command = "release", + step = %current_step, + "Prometheus configuration templates rendered successfully" + ); + + Ok(()) + } + + /// Create Prometheus storage directories on the remote host (if enabled) + /// + /// This step is optional and only executes if Prometheus is configured in the environment. + /// If Prometheus is not configured, the step is skipped without error. 
+ /// + /// # Errors + /// + /// Returns a tuple of (error, `ReleaseStep::CreatePrometheusStorage`) if creation fails + #[allow(clippy::result_large_err)] + fn create_prometheus_storage( + environment: &Environment, + _instance_ip: IpAddr, + ) -> StepResult<(), ReleaseCommandHandlerError, ReleaseStep> { + let current_step = ReleaseStep::CreatePrometheusStorage; + + // Check if Prometheus is configured + if environment.context().user_inputs.prometheus.is_none() { + info!( + command = "release", + step = %current_step, + status = "skipped", + "Prometheus not configured - skipping storage creation" + ); + return Ok(()); + } + + let ansible_client = Arc::new(AnsibleClient::new(environment.build_dir().join("ansible"))); + + CreatePrometheusStorageStep::new(ansible_client) + .execute() + .map_err(|e| { + ( + ReleaseCommandHandlerError::PrometheusStorageCreation(e.to_string()), + current_step, + ) + })?; + + info!( + command = "release", + step = %current_step, + "Prometheus storage directories created successfully" + ); + + Ok(()) + } + + /// Deploy Prometheus configuration to the remote host via Ansible (if enabled) + /// + /// This step is optional and only executes if Prometheus is configured in the environment. + /// If Prometheus is not configured, the step is skipped without error. 
+ /// + /// # Arguments + /// + /// * `environment` - The environment in Releasing state + /// * `instance_ip` - The target instance IP address + /// + /// # Errors + /// + /// Returns a tuple of (error, `ReleaseStep::DeployPrometheusConfigToRemote`) if deployment fails + #[allow(clippy::result_large_err, clippy::unused_self)] + fn deploy_prometheus_config_to_remote( + &self, + environment: &Environment, + _instance_ip: IpAddr, + ) -> StepResult<(), ReleaseCommandHandlerError, ReleaseStep> { + let current_step = ReleaseStep::DeployPrometheusConfigToRemote; + + // Check if Prometheus is configured + if environment.context().user_inputs.prometheus.is_none() { + info!( + command = "release", + step = %current_step, + status = "skipped", + "Prometheus not configured - skipping config deployment" + ); + return Ok(()); + } + + let ansible_client = Arc::new(AnsibleClient::new(environment.build_dir().join("ansible"))); + + DeployPrometheusConfigStep::new(ansible_client) + .execute() + .map_err(|e| { + ( + ReleaseCommandHandlerError::TemplateRendering(e.to_string()), + current_step, + ) + })?; + + info!( + command = "release", + step = %current_step, + "Prometheus configuration deployed successfully" + ); + + Ok(()) + } + /// Deploy tracker configuration to the remote host via Ansible /// /// # Arguments diff --git a/src/application/steps/application/create_prometheus_storage.rs b/src/application/steps/application/create_prometheus_storage.rs new file mode 100644 index 00000000..8accc61b --- /dev/null +++ b/src/application/steps/application/create_prometheus_storage.rs @@ -0,0 +1,110 @@ +//! Prometheus storage directory creation step +//! +//! This module provides the `CreatePrometheusStorageStep` which handles creation +//! of the required directory structure for Prometheus on remote hosts +//! via Ansible playbooks. This step ensures Prometheus has the necessary +//! directories for configuration files. +//! +//! ## Key Features +//! +//! 
+//! - Creates standardized directory structure for Prometheus storage
+//! - Sets appropriate ownership and permissions
+//! - Idempotent operation (safe to run multiple times)
+//! - Only executes when Prometheus is enabled in environment configuration
+//!
+//! ## Directory Structure
+//!
+//! The step creates the following directory hierarchy:
+//! ```text
+//! /opt/torrust/storage/prometheus/
+//! └── etc/    # Configuration files (prometheus.yml)
+//! ```
+
+use std::sync::Arc;
+use tracing::{info, instrument};
+
+use crate::adapters::ansible::AnsibleClient;
+use crate::shared::command::CommandError;
+
+/// Step that creates Prometheus storage directories on a remote host via Ansible
+///
+/// This step creates the necessary directory structure for Prometheus,
+/// ensuring all directories have correct ownership and permissions.
+pub struct CreatePrometheusStorageStep {
+    ansible_client: Arc<AnsibleClient>,
+}
+
+impl CreatePrometheusStorageStep {
+    /// Create a new Prometheus storage directory creation step
+    ///
+    /// # Arguments
+    ///
+    /// * `ansible_client` - Ansible client for running playbooks
+    #[must_use]
+    pub fn new(ansible_client: Arc<AnsibleClient>) -> Self {
+        Self { ansible_client }
+    }
+
+    /// Execute the storage directory creation
+    ///
+    /// Runs the Ansible playbook that creates the Prometheus storage directory structure.
+    ///
+    /// # Errors
+    ///
+    /// Returns `CommandError` if:
+    /// - Ansible playbook execution fails
+    /// - Directory creation fails on remote host
+    /// - Permission setting fails
+    #[instrument(
+        name = "create_prometheus_storage",
+        skip_all,
+        fields(step_type = "system", component = "prometheus", method = "ansible")
+    )]
+    pub fn execute(&self) -> Result<(), CommandError> {
+        info!(
+            step = "create_prometheus_storage",
+            action = "create_directories",
+            "Creating Prometheus storage directory structure"
+        );
+
+        match self
+            .ansible_client
+            .run_playbook("create-prometheus-storage", &[])
+        {
+            Ok(_) => {
+                info!(
+                    step = "create_prometheus_storage",
+                    status = "success",
+                    "Prometheus storage directories created successfully"
+                );
+                Ok(())
+            }
+            Err(e) => {
+                tracing::error!(
+                    step = "create_prometheus_storage",
+                    error = %e,
+                    "Failed to create Prometheus storage directories"
+                );
+                Err(e)
+            }
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use tempfile::TempDir;
+
+    use super::*;
+
+    #[test]
+    fn it_should_create_prometheus_storage_step() {
+        let temp_dir = TempDir::new().expect("Failed to create temp dir");
+        let ansible_client = Arc::new(AnsibleClient::new(temp_dir.path().to_path_buf()));
+
+        // Step construction should succeed without panicking
+        let _step = CreatePrometheusStorageStep::new(ansible_client);
+    }
+}
diff --git a/src/application/steps/application/deploy_prometheus_config.rs b/src/application/steps/application/deploy_prometheus_config.rs
new file mode 100644
index 00000000..2418c843
--- /dev/null
+++ b/src/application/steps/application/deploy_prometheus_config.rs
@@ -0,0 +1,114 @@
+//! Prometheus configuration deployment step
+//!
+//! This module provides the `DeployPrometheusConfigStep` which handles deployment
+//! of the Prometheus configuration file (`prometheus.yml`) to remote hosts
+//! via Ansible playbooks.
+//!
+//! ## Key Features
+//!
+//! - Deploys prometheus.yml from build directory to remote host
+//! - Sets appropriate ownership and permissions
+//! - Verifies successful deployment with assertions
+//! - Only executes when Prometheus is enabled in environment configuration
+//!
+//! ## Deployment Flow
+//!
+//! 1. Copy prometheus.yml from build directory to remote host
+//! 2. Set file permissions (0644) and ownership
+//! 3. Verify file exists and has correct properties
+//!
+//! ## File Locations
+//!
+//! - **Source**: `{build_dir}/storage/prometheus/etc/prometheus.yml`
+//! - **Destination**: `/opt/torrust/storage/prometheus/etc/prometheus.yml`
+//! - **Container Mount**: Mounted as `/etc/prometheus/prometheus.yml`
+
+use std::sync::Arc;
+use tracing::{info, instrument};
+
+use crate::adapters::ansible::AnsibleClient;
+use crate::shared::command::CommandError;
+
+/// Step that deploys Prometheus configuration to a remote host via Ansible
+///
+/// This step copies the rendered prometheus.yml configuration file from the
+/// build directory to the remote host's Prometheus configuration directory.
+pub struct DeployPrometheusConfigStep {
+    ansible_client: Arc<AnsibleClient>,
+}
+
+impl DeployPrometheusConfigStep {
+    /// Create a new Prometheus configuration deployment step
+    ///
+    /// # Arguments
+    ///
+    /// * `ansible_client` - Ansible client for running playbooks
+    #[must_use]
+    pub fn new(ansible_client: Arc<AnsibleClient>) -> Self {
+        Self { ansible_client }
+    }
+
+    /// Execute the configuration deployment
+    ///
+    /// Runs the Ansible playbook that deploys the Prometheus configuration file.
+    ///
+    /// # Errors
+    ///
+    /// Returns `CommandError` if:
+    /// - Ansible playbook execution fails
+    /// - File copying fails
+    /// - Permission setting fails
+    /// - Verification assertions fail
+    #[instrument(
+        name = "deploy_prometheus_config",
+        skip_all,
+        fields(step_type = "deployment", component = "prometheus", method = "ansible")
+    )]
+    pub fn execute(&self) -> Result<(), CommandError> {
+        info!(
+            step = "deploy_prometheus_config",
+            action = "deploy_file",
+            "Deploying Prometheus configuration to remote host"
+        );
+
+        match self
+            .ansible_client
+            .run_playbook("deploy-prometheus-config", &[])
+        {
+            Ok(_) => {
+                info!(
+                    step = "deploy_prometheus_config",
+                    status = "success",
+                    "Prometheus configuration deployed successfully"
+                );
+                Ok(())
+            }
+            Err(e) => {
+                tracing::error!(
+                    step = "deploy_prometheus_config",
+                    error = %e,
+                    "Failed to deploy Prometheus configuration"
+                );
+                Err(e)
+            }
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use tempfile::TempDir;
+
+    use super::*;
+
+    #[test]
+    fn it_should_create_deploy_prometheus_config_step() {
+        let temp_dir = TempDir::new().expect("Failed to create temp dir");
+        let ansible_client = Arc::new(AnsibleClient::new(temp_dir.path().to_path_buf()));
+
+        // Step construction should succeed without panicking
+        let _step = DeployPrometheusConfigStep::new(ansible_client);
+    }
+}
diff --git a/src/application/steps/application/mod.rs b/src/application/steps/application/mod.rs
index 85a90f7a..62dee423 100644
--- a/src/application/steps/application/mod.rs
+++ b/src/application/steps/application/mod.rs
@@ -9,6 +9,8 @@
 //! - `create_tracker_storage` - Creates tracker storage directory structure on remote host
 //! - `init_tracker_database` - Initializes `SQLite` database file for the tracker
 //! - `deploy_tracker_config` - Deploys tracker.toml configuration file to remote host
+//!
- `create_prometheus_storage` - Creates Prometheus storage directory structure on remote host +//! - `deploy_prometheus_config` - Deploys prometheus.yml configuration file to remote host //! - `deploy_compose_files` - Deploys Docker Compose files to remote host via Ansible //! - `start_services` - Starts Docker Compose services via Ansible //! - `run` - Legacy run step (placeholder) @@ -26,15 +28,19 @@ //! software installation steps to provide complete deployment workflows //! from infrastructure provisioning to application operation. +pub mod create_prometheus_storage; pub mod create_tracker_storage; pub mod deploy_compose_files; +pub mod deploy_prometheus_config; pub mod deploy_tracker_config; pub mod init_tracker_database; pub mod run; pub mod start_services; +pub use create_prometheus_storage::CreatePrometheusStorageStep; pub use create_tracker_storage::CreateTrackerStorageStep; pub use deploy_compose_files::{DeployComposeFilesStep, DeployComposeFilesStepError}; +pub use deploy_prometheus_config::DeployPrometheusConfigStep; pub use deploy_tracker_config::{DeployTrackerConfigStep, DeployTrackerConfigStepError}; pub use init_tracker_database::InitTrackerDatabaseStep; pub use run::{RunStep, RunStepError}; diff --git a/src/application/steps/rendering/docker_compose_templates.rs b/src/application/steps/rendering/docker_compose_templates.rs index d2298315..9e63da69 100644 --- a/src/application/steps/rendering/docker_compose_templates.rs +++ b/src/application/steps/rendering/docker_compose_templates.rs @@ -33,7 +33,7 @@ use crate::domain::environment::Environment; use crate::domain::template::TemplateManager; use crate::domain::tracker::{DatabaseConfig, TrackerConfig}; use crate::infrastructure::templating::docker_compose::template::wrappers::docker_compose::{ - DockerComposeContext, TrackerPorts, + DockerComposeContext, DockerComposeContextBuilder, TrackerPorts, }; use crate::infrastructure::templating::docker_compose::template::wrappers::env::EnvContext; use 
crate::infrastructure::templating::docker_compose::{
@@ -72,30 +72,6 @@ impl RenderDockerComposeTemplatesStep {
         }
     }
 
-    /// Extract port numbers from tracker configuration
-    ///
-    /// Returns a tuple of (`udp_ports`, `http_ports`, `api_port`)
-    fn extract_tracker_ports(tracker_config: &TrackerConfig) -> (Vec<u16>, Vec<u16>, u16) {
-        // Extract UDP tracker ports
-        let udp_ports: Vec<u16> = tracker_config
-            .udp_trackers
-            .iter()
-            .map(|tracker| tracker.bind_address.port())
-            .collect();
-
-        // Extract HTTP tracker ports
-        let http_ports: Vec<u16> = tracker_config
-            .http_trackers
-            .iter()
-            .map(|tracker| tracker.bind_address.port())
-            .collect();
-
-        // Extract HTTP API port
-        let api_port = tracker_config.http_api.bind_address.port();
-
-        (udp_ports, http_ports, api_port)
-    }
-
     /// Execute the template rendering step
     ///
     /// This will render Docker Compose templates to the build directory.
@@ -129,66 +105,33 @@ impl RenderDockerComposeTemplatesStep {
         let generator = DockerComposeProjectGenerator::new(&self.build_dir, &self.template_manager);
 
-        // Extract admin token from environment config
-        let admin_token = self
-            .environment
-            .context()
-            .user_inputs
-            .tracker
-            .http_api
-            .admin_token
-            .clone();
-
-        // Extract tracker ports from configuration
-        let tracker_config = &self.environment.context().user_inputs.tracker;
-        let (udp_tracker_ports, http_tracker_ports, http_api_port) =
-            Self::extract_tracker_ports(tracker_config);
-
-        let ports = TrackerPorts {
-            udp_tracker_ports,
-            http_tracker_ports,
-            http_api_port,
-        };
+        let admin_token = self.extract_admin_token();
+        let ports = self.build_tracker_ports();
 
         // Create contexts based on database configuration
-        let database_config = &self.environment.context().user_inputs.tracker.core.database;
-        let (env_context, docker_compose_context) = match database_config {
-            DatabaseConfig::Sqlite { .. } => {
-                let env_context = EnvContext::new(admin_token);
-                let docker_compose_context = DockerComposeContext::new_sqlite(ports);
-                (env_context, docker_compose_context)
-            }
+        let database_config = self.environment.database_config();
+        let (env_context, builder) = match database_config {
+            DatabaseConfig::Sqlite { .. } => Self::create_sqlite_contexts(admin_token, ports),
             DatabaseConfig::Mysql {
                 port,
                 database_name,
                 username,
                 password,
                 ..
-            } => {
-                // For MySQL, generate a secure root password (in production, this should be managed securely)
-                let root_password = format!("{password}_root");
-
-                let env_context = EnvContext::new_with_mysql(
-                    admin_token,
-                    root_password.clone(),
-                    database_name.clone(),
-                    username.clone(),
-                    password.clone(),
-                );
-
-                let docker_compose_context = DockerComposeContext::new_mysql(
-                    root_password,
-                    database_name.clone(),
-                    username.clone(),
-                    password.clone(),
-                    *port,
-                    ports,
-                );
-
-                (env_context, docker_compose_context)
-            }
+            } => Self::create_mysql_contexts(
+                admin_token,
+                ports,
+                *port,
+                database_name.clone(),
+                username.clone(),
+                password.clone(),
+            ),
         };
 
+        // Apply Prometheus configuration (independent of database choice)
+        let builder = self.apply_prometheus_config(builder);
+        let docker_compose_context = builder.build();
+
         let compose_build_dir = generator
             .render(&env_context, &docker_compose_context)
             .await?;
@@ -202,6 +145,94 @@ impl RenderDockerComposeTemplatesStep {
 
         Ok(compose_build_dir)
     }
+
+    fn extract_admin_token(&self) -> String {
+        self.environment.admin_token().to_string()
+    }
+
+    fn build_tracker_ports(&self) -> TrackerPorts {
+        let tracker_config = self.environment.tracker_config();
+        let (udp_tracker_ports, http_tracker_ports, http_api_port) =
+            Self::extract_tracker_ports(tracker_config);
+
+        TrackerPorts {
+            udp_tracker_ports,
+            http_tracker_ports,
+            http_api_port,
+        }
+    }
+
+    fn create_sqlite_contexts(
+        admin_token: String,
+        ports: TrackerPorts,
+    ) -> (EnvContext, DockerComposeContextBuilder) {
+        let env_context = EnvContext::new(admin_token);
+        let builder = DockerComposeContext::builder(ports);
+
+        (env_context, builder)
+    }
+
+    fn create_mysql_contexts(
+        admin_token: String,
+        ports: TrackerPorts,
+        port: u16,
+        database_name: String,
+        username: String,
+        password: String,
+    ) -> (EnvContext, DockerComposeContextBuilder) {
+        // For MySQL, generate a secure root password (in production, this should be managed securely)
+        let root_password = format!("{password}_root");
+
+        let env_context = EnvContext::new_with_mysql(
+            admin_token,
+            root_password.clone(),
+            database_name.clone(),
+            username.clone(),
+            password.clone(),
+        );
+
+        let builder = DockerComposeContext::builder(ports).with_mysql(
+            root_password,
+            database_name,
+            username,
+            password,
+            port,
+        );
+
+        (env_context, builder)
+    }
+
+    fn apply_prometheus_config(
+        &self,
+        builder: DockerComposeContextBuilder,
+    ) -> DockerComposeContextBuilder {
+        if let Some(prometheus_config) = self.environment.prometheus_config() {
+            builder.with_prometheus(prometheus_config.clone())
+        } else {
+            builder
+        }
+    }
+
+    fn extract_tracker_ports(tracker_config: &TrackerConfig) -> (Vec<u16>, Vec<u16>, u16) {
+        // Extract UDP tracker ports
+        let udp_ports: Vec<u16> = tracker_config
+            .udp_trackers
+            .iter()
+            .map(|tracker| tracker.bind_address.port())
+            .collect();
+
+        // Extract HTTP tracker ports
+        let http_ports: Vec<u16> = tracker_config
+            .http_trackers
+            .iter()
+            .map(|tracker| tracker.bind_address.port())
+            .collect();
+
+        // Extract HTTP API port
+        let api_port = tracker_config.http_api.bind_address.port();
+
+        (udp_ports, http_ports, api_port)
+    }
 }
 
 #[cfg(test)]
diff --git a/src/application/steps/rendering/mod.rs b/src/application/steps/rendering/mod.rs
index 3c0fd7e9..f9910b81 100644
--- a/src/application/steps/rendering/mod.rs
+++ b/src/application/steps/rendering/mod.rs
@@ -10,6 +10,7 @@
 //! - `opentofu_templates` - `OpenTofu` template rendering for infrastructure
 //!
- `docker_compose_templates` - Docker Compose template rendering for deployment //! - `tracker_templates` - Tracker configuration template rendering +//! - `prometheus_templates` - Prometheus configuration template rendering //! //! ## Key Features //! @@ -24,9 +25,11 @@ pub mod ansible_templates; pub mod docker_compose_templates; pub mod opentofu_templates; +pub mod prometheus_templates; pub mod tracker_templates; pub use ansible_templates::RenderAnsibleTemplatesStep; pub use docker_compose_templates::RenderDockerComposeTemplatesStep; pub use opentofu_templates::RenderOpenTofuTemplatesStep; +pub use prometheus_templates::RenderPrometheusTemplatesStep; pub use tracker_templates::RenderTrackerTemplatesStep; diff --git a/src/application/steps/rendering/prometheus_templates.rs b/src/application/steps/rendering/prometheus_templates.rs new file mode 100644 index 00000000..ba74ddc6 --- /dev/null +++ b/src/application/steps/rendering/prometheus_templates.rs @@ -0,0 +1,226 @@ +//! Prometheus template rendering step +//! +//! This module provides the `RenderPrometheusTemplatesStep` which handles rendering +//! of Prometheus configuration templates to the build directory. This step prepares +//! Prometheus configuration files for deployment to the remote host. +//! +//! ## Key Features +//! +//! - Template rendering for Prometheus configurations +//! - Integration with the `PrometheusProjectGenerator` for file generation +//! - Build directory preparation for deployment operations +//! - Comprehensive error handling for template processing +//! +//! ## Usage Context +//! +//! This step is typically executed during the release workflow, after +//! infrastructure provisioning and software installation, to prepare +//! the Prometheus configuration files for deployment. +//! +//! ## Architecture +//! +//! This step follows the three-level architecture: +//! - **Command** (Level 1): `ReleaseCommandHandler` orchestrates the release workflow +//! 
+//! - **Step** (Level 2): This `RenderPrometheusTemplatesStep` handles template rendering
+//! - The templates are rendered locally, no remote action is needed
+
+use std::path::PathBuf;
+use std::sync::Arc;
+
+use tracing::{info, instrument};
+
+use crate::domain::environment::{Environment, Releasing};
+use crate::domain::template::TemplateManager;
+use crate::infrastructure::templating::prometheus::{
+    PrometheusProjectGenerator, PrometheusProjectGeneratorError,
+};
+
+/// Step that renders Prometheus templates to the build directory
+///
+/// This step handles the preparation of Prometheus configuration files
+/// by rendering templates to the build directory. The rendered files are
+/// then ready to be deployed to the remote host.
+pub struct RenderPrometheusTemplatesStep {
+    environment: Arc<Environment<Releasing>>,
+    template_manager: Arc<TemplateManager>,
+    build_dir: PathBuf,
+}
+
+impl RenderPrometheusTemplatesStep {
+    /// Creates a new `RenderPrometheusTemplatesStep`
+    ///
+    /// # Arguments
+    ///
+    /// * `environment` - The deployment environment
+    /// * `template_manager` - The template manager for accessing templates
+    /// * `build_dir` - The build directory where templates will be rendered
+    #[must_use]
+    pub fn new(
+        environment: Arc<Environment<Releasing>>,
+        template_manager: Arc<TemplateManager>,
+        build_dir: PathBuf,
+    ) -> Self {
+        Self {
+            environment,
+            template_manager,
+            build_dir,
+        }
+    }
+
+    /// Execute the template rendering step
+    ///
+    /// This will render Prometheus templates to the build directory if Prometheus
+    /// configuration is present in the environment.
+    ///
+    /// # Returns
+    ///
+    /// Returns the path to the Prometheus build directory on success, or `None`
+    /// if Prometheus is not configured.
+    ///
+    /// # Errors
+    ///
+    /// Returns an error if:
+    /// * Template rendering fails
+    /// * Directory creation fails
+    /// * File writing fails
+    #[instrument(
+        name = "render_prometheus_templates",
+        skip_all,
+        fields(
+            step_type = "rendering",
+            template_type = "prometheus",
+            build_dir = %self.build_dir.display()
+        )
+    )]
+    pub fn execute(&self) -> Result<Option<PathBuf>, PrometheusProjectGeneratorError> {
+        // Check if Prometheus is configured
+        let Some(prometheus_config) = &self.environment.context().user_inputs.prometheus else {
+            info!(
+                step = "render_prometheus_templates",
+                status = "skipped",
+                reason = "prometheus_not_configured",
+                "Skipping Prometheus template rendering - not configured"
+            );
+            return Ok(None);
+        };
+
+        info!(
+            step = "render_prometheus_templates",
+            templates_dir = %self.template_manager.templates_dir().display(),
+            build_dir = %self.build_dir.display(),
+            "Rendering Prometheus configuration templates"
+        );
+
+        let generator =
+            PrometheusProjectGenerator::new(&self.build_dir, self.template_manager.clone());
+
+        // Extract tracker config for API token and port
+        let tracker_config = &self.environment.context().user_inputs.tracker;
+        generator.render(prometheus_config, tracker_config)?;
+
+        let prometheus_build_dir = self.build_dir.join("storage/prometheus/etc");
+
+        info!(
+            step = "render_prometheus_templates",
+            prometheus_build_dir = %prometheus_build_dir.display(),
+            status = "success",
+            "Prometheus templates rendered successfully"
+        );
+
+        Ok(Some(prometheus_build_dir))
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use tempfile::TempDir;
+
+    use super::*;
+    use crate::domain::environment::testing::EnvironmentTestBuilder;
+    use crate::domain::prometheus::PrometheusConfig;
+
+    #[test]
+    fn it_should_create_render_prometheus_templates_step() {
+        let templates_dir = TempDir::new().expect("Failed to create templates dir");
+        let build_dir = TempDir::new().expect("Failed to create build dir");
+
+        let (environment, _, _, _temp_dir) =
+            EnvironmentTestBuilder::new().build_with_custom_paths();
+        let environment = Arc::new(environment);
+
+        let template_manager = Arc::new(TemplateManager::new(templates_dir.path().to_path_buf()));
+        let step = RenderPrometheusTemplatesStep::new(
+            environment.clone(),
+            template_manager.clone(),
+            build_dir.path().to_path_buf(),
+        );
+
+        assert_eq!(step.build_dir, build_dir.path());
+        assert_eq!(step.template_manager.templates_dir(), templates_dir.path());
+    }
+
+    #[test]
+    fn it_should_skip_rendering_when_prometheus_not_configured() {
+        let templates_dir = TempDir::new().expect("Failed to create templates dir");
+        let build_dir = TempDir::new().expect("Failed to create build dir");
+
+        // Build environment without Prometheus config
+        let (environment, _, _, _temp_dir) = EnvironmentTestBuilder::new()
+            .with_prometheus_config(None)
+            .build_with_custom_paths();
+        let environment = Arc::new(environment);
+
+        let template_manager = Arc::new(TemplateManager::new(templates_dir.path().to_path_buf()));
+        let step = RenderPrometheusTemplatesStep::new(
+            environment,
+            template_manager,
+            build_dir.path().to_path_buf(),
+        );
+
+        let result = step.execute();
+        assert!(
+            result.is_ok(),
+            "Should succeed when Prometheus not configured"
+        );
+        assert!(
+            result.unwrap().is_none(),
+            "Should return None when Prometheus not configured"
+        );
+    }
+
+    #[test]
+    fn it_should_render_templates_when_prometheus_configured() {
+        let templates_dir = TempDir::new().expect("Failed to create templates dir");
+        let build_dir = TempDir::new().expect("Failed to create build dir");
+
+        // Build environment with Prometheus config
+        let (environment, _, _, _temp_dir) = EnvironmentTestBuilder::new()
+            .with_prometheus_config(Some(PrometheusConfig {
+                scrape_interval: 30,
+            }))
+            .build_with_custom_paths();
+        let environment = Arc::new(environment);
+
+        let template_manager = Arc::new(TemplateManager::new(templates_dir.path().to_path_buf()));
+        let step = RenderPrometheusTemplatesStep::new(
+            environment,
+            template_manager,
+            build_dir.path().to_path_buf(),
+        );
+
+        let result = step.execute();
+        assert!(result.is_ok(), "Should render Prometheus templates");
+
+        let prometheus_build_dir = result.unwrap();
+        assert!(
+            prometheus_build_dir.is_some(),
+            "Should return build directory path"
+        );
+
+        let build_dir_path = prometheus_build_dir.unwrap();
+        assert!(
+            build_dir_path.to_string_lossy().contains("prometheus"),
+            "Build directory should contain 'prometheus' in path"
+        );
+    }
+}
diff --git a/src/bin/e2e_deployment_workflow_tests.rs b/src/bin/e2e_deployment_workflow_tests.rs
index f907cee9..ea8c7192 100644
--- a/src/bin/e2e_deployment_workflow_tests.rs
+++ b/src/bin/e2e_deployment_workflow_tests.rs
@@ -78,8 +78,12 @@ use torrust_tracker_deployer_lib::testing::e2e::tasks::black_box::{
 };
 use torrust_tracker_deployer_lib::testing::e2e::tasks::container::cleanup_infrastructure::stop_test_infrastructure;
 use torrust_tracker_deployer_lib::testing::e2e::tasks::run_configuration_validation::run_configuration_validation;
-use torrust_tracker_deployer_lib::testing::e2e::tasks::run_release_validation::run_release_validation;
-use torrust_tracker_deployer_lib::testing::e2e::tasks::run_run_validation::run_run_validation;
+use torrust_tracker_deployer_lib::testing::e2e::tasks::run_release_validation::{
+    run_release_validation, ServiceValidation,
+};
+use torrust_tracker_deployer_lib::testing::e2e::tasks::run_run_validation::{
+    run_run_validation, ServiceValidation as RunServiceValidation,
+};
 
 /// Environment name for this E2E test
 const ENVIRONMENT_NAME: &str = "e2e-deployment";
@@ -284,7 +288,9 @@ async fn run_deployer_workflow(
     test_runner.release_software()?;
 
     // Validate the release (Docker Compose files deployed correctly)
-    run_release_validation(socket_addr, ssh_credentials)
+    // Note: E2E deployment environment has Prometheus enabled, so we validate it
+    let services = ServiceValidation { prometheus: true };
+    run_release_validation(socket_addr,
ssh_credentials, Some(services)) .await .map_err(|e| anyhow::anyhow!("{e}"))?; @@ -293,11 +299,14 @@ async fn run_deployer_workflow( test_runner.run_services()?; // Validate services are running using actual mapped ports from runtime environment + // Note: E2E deployment environment has Prometheus enabled, so we validate it + let run_services = RunServiceValidation { prometheus: true }; run_run_validation( socket_addr, ssh_credentials, runtime_env.container_ports.http_api_port, vec![runtime_env.container_ports.http_tracker_port], + Some(run_services), ) .await .map_err(|e| anyhow::anyhow!("{e}"))?; diff --git a/src/domain/environment/context.rs b/src/domain/environment/context.rs index d1cf6954..feb66914 100644 --- a/src/domain/environment/context.rs +++ b/src/domain/environment/context.rs @@ -282,4 +282,82 @@ impl EnvironmentContext { pub fn tofu_templates_dir(&self) -> PathBuf { self.internal_config.tofu_templates_dir() } + + /// Returns the environment name + #[must_use] + pub fn name(&self) -> &EnvironmentName { + &self.user_inputs.name + } + + /// Returns the instance name + #[must_use] + pub fn instance_name(&self) -> &crate::domain::InstanceName { + &self.user_inputs.instance_name + } + + /// Returns the provider configuration + #[must_use] + pub fn provider_config(&self) -> &ProviderConfig { + &self.user_inputs.provider_config + } + + /// Returns the SSH credentials + #[must_use] + pub fn ssh_credentials(&self) -> &SshCredentials { + &self.user_inputs.ssh_credentials + } + + /// Returns the SSH port + #[must_use] + pub fn ssh_port(&self) -> u16 { + self.user_inputs.ssh_port + } + + /// Returns the database configuration + #[must_use] + pub fn database_config(&self) -> &crate::domain::tracker::DatabaseConfig { + &self.user_inputs.tracker.core.database + } + + /// Returns the tracker configuration + #[must_use] + pub fn tracker_config(&self) -> &crate::domain::tracker::TrackerConfig { + &self.user_inputs.tracker + } + + /// Returns the admin token + 
+    #[must_use]
+    pub fn admin_token(&self) -> &str {
+        &self.user_inputs.tracker.http_api.admin_token
+    }
+
+    /// Returns the Prometheus configuration if enabled
+    #[must_use]
+    pub fn prometheus_config(&self) -> Option<&crate::domain::prometheus::PrometheusConfig> {
+        self.user_inputs.prometheus.as_ref()
+    }
+
+    /// Returns the build directory
+    #[must_use]
+    pub fn build_dir(&self) -> &PathBuf {
+        &self.internal_config.build_dir
+    }
+
+    /// Returns the data directory
+    #[must_use]
+    pub fn data_dir(&self) -> &PathBuf {
+        &self.internal_config.data_dir
+    }
+
+    /// Returns the instance IP address if available
+    #[must_use]
+    pub fn instance_ip(&self) -> Option<IpAddr> {
+        self.runtime_outputs.instance_ip
+    }
+
+    /// Returns the provision method
+    #[must_use]
+    pub fn provision_method(&self) -> Option<ProvisionMethod> {
+        self.runtime_outputs.provision_method
+    }
 }
diff --git a/src/domain/environment/mod.rs b/src/domain/environment/mod.rs
index 1b8bcffd..15717a6b 100644
--- a/src/domain/environment/mod.rs
+++ b/src/domain/environment/mod.rs
@@ -131,6 +131,9 @@ pub use crate::domain::tracker::{
     UdpTrackerConfig,
 };
 
+// Re-export Prometheus types for convenience
+pub use crate::domain::prometheus::PrometheusConfig;
+
 use crate::adapters::ssh::SshCredentials;
 use crate::domain::provider::ProviderConfig;
 use crate::domain::{InstanceName, ProfileName};
@@ -400,25 +403,49 @@ impl Environment {
     /// Returns the instance name for this environment
     #[must_use]
     pub fn instance_name(&self) -> &InstanceName {
-        &self.context.user_inputs.instance_name
+        self.context.instance_name()
     }
 
     /// Returns the provider configuration for this environment
     #[must_use]
     pub fn provider_config(&self) -> &ProviderConfig {
-        &self.context.user_inputs.provider_config
+        self.context.provider_config()
     }
 
     /// Returns the SSH credentials for this environment
     #[must_use]
     pub fn ssh_credentials(&self) -> &SshCredentials {
-        &self.context.user_inputs.ssh_credentials
+        self.context.ssh_credentials()
     }
 
     /// Returns the SSH port for this environment
     #[must_use]
     pub fn ssh_port(&self) -> u16 {
-        self.context.user_inputs.ssh_port
+        self.context.ssh_port()
+    }
+
+    /// Returns the database configuration for this environment
+    #[must_use]
+    pub fn database_config(&self) -> &DatabaseConfig {
+        self.context.database_config()
+    }
+
+    /// Returns the tracker configuration for this environment
+    #[must_use]
+    pub fn tracker_config(&self) -> &TrackerConfig {
+        self.context.tracker_config()
+    }
+
+    /// Returns the admin token for the HTTP API
+    #[must_use]
+    pub fn admin_token(&self) -> &str {
+        self.context.admin_token()
+    }
+
+    /// Returns the Prometheus configuration if enabled
+    #[must_use]
+    pub fn prometheus_config(&self) -> Option<&PrometheusConfig> {
+        self.context.prometheus_config()
     }
 
     /// Returns the SSH username for this environment
@@ -442,13 +469,13 @@ impl Environment {
     /// Returns the build directory for this environment
     #[must_use]
     pub fn build_dir(&self) -> &PathBuf {
-        &self.context.internal_config.build_dir
+        self.context.build_dir()
     }
 
     /// Returns the data directory for this environment
     #[must_use]
     pub fn data_dir(&self) -> &PathBuf {
-        &self.context.internal_config.data_dir
+        self.context.data_dir()
     }
 
     /// Returns the instance IP address if available
@@ -496,7 +523,7 @@ impl Environment {
     /// ```
     #[must_use]
     pub fn instance_ip(&self) -> Option<IpAddr> {
-        self.context.runtime_outputs.instance_ip
+        self.context.instance_ip()
     }
 
     /// Returns the provision method for this environment
@@ -511,7 +538,7 @@ impl Environment {
     /// The provision method, if set.
     #[must_use]
     pub fn provision_method(&self) -> Option<ProvisionMethod> {
-        self.context.runtime_outputs.provision_method
+        self.context.provision_method()
     }
 
     /// Returns whether this environment's infrastructure is managed by this tool
@@ -1016,6 +1043,7 @@ mod tests {
             ssh_credentials,
             ssh_port: 22,
             tracker: crate::domain::tracker::TrackerConfig::default(),
+            prometheus: Some(crate::domain::prometheus::PrometheusConfig::default()),
         },
         internal_config: InternalConfig {
             data_dir: data_dir.clone(),
diff --git a/src/domain/environment/state/release_failed.rs b/src/domain/environment/state/release_failed.rs
index 634a9c75..05ff8ff7 100644
--- a/src/domain/environment/state/release_failed.rs
+++ b/src/domain/environment/state/release_failed.rs
@@ -38,6 +38,12 @@ pub enum ReleaseStep {
     RenderTrackerTemplates,
     /// Deploying tracker configuration to the remote host via Ansible
     DeployTrackerConfigToRemote,
+    /// Creating Prometheus storage directories on remote host
+    CreatePrometheusStorage,
+    /// Rendering Prometheus configuration templates to the build directory
+    RenderPrometheusTemplates,
+    /// Deploying Prometheus configuration to the remote host via Ansible
+    DeployPrometheusConfigToRemote,
     /// Rendering Docker Compose templates to the build directory
     RenderDockerComposeTemplates,
     /// Deploying compose files to the remote host via Ansible
@@ -51,6 +57,9 @@ impl fmt::Display for ReleaseStep {
             Self::InitTrackerDatabase => "Initialize Tracker Database",
             Self::RenderTrackerTemplates => "Render Tracker Templates",
             Self::DeployTrackerConfigToRemote => "Deploy Tracker Config to Remote",
+            Self::CreatePrometheusStorage => "Create Prometheus Storage",
+            Self::RenderPrometheusTemplates => "Render Prometheus Templates",
+            Self::DeployPrometheusConfigToRemote => "Deploy Prometheus Config to Remote",
             Self::RenderDockerComposeTemplates => "Render Docker Compose Templates",
             Self::DeployComposeFilesToRemote => "Deploy Compose Files to Remote",
         };
diff --git a/src/domain/environment/testing.rs b/src/domain/environment/testing.rs
index f8403525..40aca6a9 100644
--- a/src/domain/environment/testing.rs
+++ b/src/domain/environment/testing.rs
@@ -37,6 +37,7 @@ pub struct EnvironmentTestBuilder {
     ssh_key_name: String,
     ssh_username: String,
     temp_dir: TempDir,
+    prometheus_config: Option<crate::domain::prometheus::PrometheusConfig>,
 }
 
 impl EnvironmentTestBuilder {
@@ -52,6 +53,7 @@ impl EnvironmentTestBuilder {
             ssh_key_name: "test_key".to_string(),
             ssh_username: "torrust".to_string(),
             temp_dir: TempDir::new().expect("Failed to create temp directory"),
+            prometheus_config: Some(crate::domain::prometheus::PrometheusConfig::default()),
         }
     }
 
@@ -76,6 +78,16 @@ impl EnvironmentTestBuilder {
         self
     }
 
+    /// Sets the Prometheus configuration
+    #[must_use]
+    pub fn with_prometheus_config(
+        mut self,
+        config: Option<crate::domain::prometheus::PrometheusConfig>,
+    ) -> Self {
+        self.prometheus_config = config;
+        self
+    }
+
     /// Builds an Environment with custom paths inside a temporary directory
     ///
     /// This is the recommended way to create test environments as it ensures
@@ -139,6 +151,7 @@ impl EnvironmentTestBuilder {
             ssh_credentials,
             ssh_port: 22,
             tracker: crate::domain::tracker::TrackerConfig::default(),
+            prometheus: self.prometheus_config,
         },
         internal_config: crate::domain::environment::InternalConfig {
             data_dir: data_dir.clone(),
diff --git a/src/domain/environment/user_inputs.rs b/src/domain/environment/user_inputs.rs
index 2ca2981f..1d2e4cb5 100644
--- a/src/domain/environment/user_inputs.rs
+++ b/src/domain/environment/user_inputs.rs
@@ -20,6 +20,7 @@
 use crate::adapters::ssh::SshCredentials;
 use crate::domain::environment::EnvironmentName;
+use crate::domain::prometheus::PrometheusConfig;
 use crate::domain::provider::{Provider, ProviderConfig};
 use crate::domain::tracker::TrackerConfig;
 use crate::domain::InstanceName;
@@ -38,6 +39,7 @@ use serde::{Deserialize, Serialize};
 /// use torrust_tracker_deployer_lib::domain::provider::{ProviderConfig, LxdConfig};
 /// use torrust_tracker_deployer_lib::domain::environment::user_inputs::UserInputs;
 /// use torrust_tracker_deployer_lib::domain::tracker::TrackerConfig;
+/// use torrust_tracker_deployer_lib::domain::prometheus::PrometheusConfig;
 /// use torrust_tracker_deployer_lib::shared::Username;
 /// use torrust_tracker_deployer_lib::adapters::ssh::SshCredentials;
 /// use std::path::PathBuf;
@@ -57,6 +59,7 @@ use serde::{Deserialize, Serialize};
 ///     ),
 ///     ssh_port: 22,
 ///     tracker: TrackerConfig::default(),
+///     prometheus: Some(PrometheusConfig::default()),
 /// };
 /// # Ok::<(), Box<dyn std::error::Error>>(())
 /// ```
@@ -79,6 +82,13 @@ pub struct UserInputs {
 
     /// Tracker deployment configuration
     pub tracker: TrackerConfig,
+
+    /// Prometheus metrics collection configuration (optional)
+    ///
+    /// When present, Prometheus service is enabled in the deployment.
+    /// When absent (`None`), Prometheus service is disabled.
+    /// Default: `Some(PrometheusConfig::default())` in generated templates.
+    pub prometheus: Option<PrometheusConfig>,
 }
 
 impl UserInputs {
@@ -145,6 +155,7 @@ impl UserInputs {
             ssh_credentials,
             ssh_port,
             tracker: TrackerConfig::default(),
+            prometheus: Some(PrometheusConfig::default()),
         }
     }
 
@@ -169,6 +180,7 @@ impl UserInputs {
             ssh_credentials,
             ssh_port,
             tracker,
+            prometheus: Some(PrometheusConfig::default()),
         }
     }
diff --git a/src/domain/mod.rs b/src/domain/mod.rs
index 42b67ee2..b3f3feda 100644
--- a/src/domain/mod.rs
+++ b/src/domain/mod.rs
@@ -16,6 +16,7 @@
 pub mod environment;
 pub mod instance_name;
 pub mod profile_name;
+pub mod prometheus;
 pub mod provider;
 pub mod template;
 pub mod tracker;
diff --git a/src/domain/prometheus/config.rs b/src/domain/prometheus/config.rs
new file mode 100644
index 00000000..9a20a32c
--- /dev/null
+++ b/src/domain/prometheus/config.rs
@@ -0,0 +1,97 @@
+//! Prometheus configuration domain model
+//!
+//! Defines the configuration for Prometheus metrics scraping.
+
+use serde::{Deserialize, Serialize};
+
+/// Prometheus metrics collection configuration
+///
+/// Configures how Prometheus scrapes metrics from the tracker.
+/// When present in environment configuration, Prometheus service is enabled.
+/// When absent, Prometheus service is disabled.
+///
+/// # Example
+///
+/// ```rust
+/// use torrust_tracker_deployer_lib::domain::prometheus::PrometheusConfig;
+///
+/// let config = PrometheusConfig {
+///     scrape_interval: 15,
+/// };
+/// ```
+///
+/// # Default Behavior
+///
+/// - Default scrape interval: 15 seconds
+/// - Minimum recommended: 5 seconds (to avoid overwhelming the tracker)
+/// - Maximum recommended: 300 seconds (5 minutes)
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
+pub struct PrometheusConfig {
+    /// Scrape interval in seconds
+    ///
+    /// How often Prometheus should scrape metrics from the tracker's HTTP API endpoints.
+    /// Default: 15 seconds
+    pub scrape_interval: u32,
+}
+
+impl Default for PrometheusConfig {
+    fn default() -> Self {
+        Self {
+            scrape_interval: 15,
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn it_should_create_prometheus_config_with_default_values() {
+        let config = PrometheusConfig::default();
+        assert_eq!(config.scrape_interval, 15);
+    }
+
+    #[test]
+    fn it_should_create_prometheus_config_with_custom_interval() {
+        let config = PrometheusConfig {
+            scrape_interval: 30,
+        };
+        assert_eq!(config.scrape_interval, 30);
+    }
+
+    #[test]
+    fn it_should_serialize_to_json() {
+        let config = PrometheusConfig {
+            scrape_interval: 20,
+        };
+
+        let json = serde_json::to_value(&config).unwrap();
+        assert_eq!(json["scrape_interval"], 20);
+    }
+
+    #[test]
+    fn it_should_deserialize_from_json() {
+        let json = serde_json::json!({
+            "scrape_interval": 25
+        });
+
+        let config: PrometheusConfig = serde_json::from_value(json).unwrap();
+        assert_eq!(config.scrape_interval, 25);
+    }
+
+    #[test]
+    fn it_should_support_different_scrape_intervals() {
+        let fast = PrometheusConfig { scrape_interval: 5 };
+        let medium = PrometheusConfig {
+            scrape_interval: 15,
+        };
+        let slow = PrometheusConfig {
+            scrape_interval: 300,
+        };
+
+        assert_eq!(fast.scrape_interval, 5);
+        assert_eq!(medium.scrape_interval, 15);
+        assert_eq!(slow.scrape_interval, 300);
+    }
+}
diff --git a/src/domain/prometheus/mod.rs b/src/domain/prometheus/mod.rs
new file mode 100644
index 00000000..771a862f
--- /dev/null
+++ b/src/domain/prometheus/mod.rs
@@ -0,0 +1,7 @@
+//! Prometheus metrics collection domain
+//!
+//! This module contains the domain configuration for Prometheus metrics collection.
+
+pub mod config;
+
+pub use config::PrometheusConfig;
diff --git a/src/infrastructure/remote_actions/validators/mod.rs b/src/infrastructure/remote_actions/validators/mod.rs
index c200f990..91f84950 100644
--- a/src/infrastructure/remote_actions/validators/mod.rs
+++ b/src/infrastructure/remote_actions/validators/mod.rs
@@ -1,7 +1,9 @@
 pub mod cloud_init;
 pub mod docker;
 pub mod docker_compose;
+pub mod prometheus;
 
 pub use cloud_init::CloudInitValidator;
 pub use docker::DockerValidator;
 pub use docker_compose::DockerComposeValidator;
+pub use prometheus::PrometheusValidator;
diff --git a/src/infrastructure/remote_actions/validators/prometheus.rs b/src/infrastructure/remote_actions/validators/prometheus.rs
new file mode 100644
index 00000000..b949b60f
--- /dev/null
+++ b/src/infrastructure/remote_actions/validators/prometheus.rs
@@ -0,0 +1,143 @@
+//! Prometheus smoke test validator for remote instances
+//!
+//! This module provides the `PrometheusValidator` which performs a smoke test
+//! on a running Prometheus instance to verify it's operational and accessible.
+//!
+//! ## Key Features
+//!
+//! - Validates Prometheus web UI is accessible via HTTP
+//! - Checks Prometheus returns a successful HTTP response
+//! - Performs validation from inside the VM (not exposed externally)
+//!
+//! ## Validation Approach
+//!
+//! Since Prometheus is not exposed outside the VM (protected by firewall),
+//! validation must be performed from inside the VM via SSH:
+//!
+//! 1. Connect to VM via SSH
+//! 2. Execute `curl` command to fetch Prometheus homepage
+//! 3. Verify successful HTTP response (200 OK)
+//!
+//! This smoke test confirms Prometheus is:
+//! - Running and bound to the expected port
+//! - Responding to HTTP requests
+//! - Web UI is functional
+//!
+//! ## Future Enhancements
+//!
+//! For more comprehensive validation, consider:
+//!
+//! 1. **Configuration Validation**:
+//!    - Parse Prometheus config file to verify scrape targets
+//!    - Check that tracker endpoints are configured correctly
+//!    - Validate scrape interval matches environment config
+//!
+//! 2. **Data Collection Validation**:
+//!    - Query Prometheus API for active targets
+//!    - Verify tracker metrics are being collected
+//!    - Check that scrape jobs are succeeding (not in "down" state)
+//!    - Example: `curl http://localhost:9090/api/v1/targets | jq`
+//!
+//! 3. **Metric Availability**:
+//!    - Query specific tracker metrics (e.g., `torrust_tracker_info`)
+//!    - Verify metrics have recent timestamps
+//!    - Example: `curl http://localhost:9090/api/v1/query?query=up`
+//!
+//! These enhancements require:
+//! - JSON parsing of Prometheus API responses
+//! - Async coordination (waiting for first scrape to complete)
+//! - More complex error handling
+//!
+//! The current smoke test provides a good baseline validation that can be
+//! extended as needed.
+
+use std::net::IpAddr;
+use tracing::{info, instrument};
+
+use crate::adapters::ssh::SshClient;
+use crate::adapters::ssh::SshConfig;
+use crate::infrastructure::remote_actions::{RemoteAction, RemoteActionError};
+
+/// Default Prometheus port (not exposed outside VM)
+const DEFAULT_PROMETHEUS_PORT: u16 = 9090;
+
+/// Action that validates Prometheus is running and accessible
+pub struct PrometheusValidator {
+    ssh_client: SshClient,
+    prometheus_port: u16,
+}
+
+impl PrometheusValidator {
+    /// Create a new `PrometheusValidator` with the specified SSH configuration
+    ///
+    /// # Arguments
+    /// * `ssh_config` - SSH connection configuration containing credentials and host IP
+    /// * `prometheus_port` - Port where Prometheus is running (defaults to 9090 if None)
+    #[must_use]
+    pub fn new(ssh_config: SshConfig, prometheus_port: Option<u16>) -> Self {
+        let ssh_client = SshClient::new(ssh_config);
+        Self {
+            ssh_client,
+            prometheus_port: prometheus_port.unwrap_or(DEFAULT_PROMETHEUS_PORT),
+        }
+    }
+}
+
+impl RemoteAction for PrometheusValidator {
+    fn name(&self) -> &'static str {
+        "prometheus-smoke-test"
+    }
+
+    #[instrument(
+        name = "prometheus_smoke_test",
+        skip(self),
+        fields(
+            action_type = "validation",
+            component = "prometheus",
+            server_ip = %server_ip,
+            prometheus_port = self.prometheus_port
+        )
+    )]
+    async fn execute(&self, server_ip: &IpAddr) -> Result<(), RemoteActionError> {
+        info!(
+            action = "prometheus_smoke_test",
+            prometheus_port = self.prometheus_port,
+            "Running Prometheus smoke test"
+        );
+
+        // Perform smoke test: curl Prometheus homepage and check for success
+        // Using -f flag to make curl fail on HTTP errors (4xx, 5xx)
+        // Using -s flag for silent mode (no progress bar)
+        // Using -o /dev/null to discard response body (we only care about status code)
+        let command = format!(
+            "curl -f -s -o /dev/null http://localhost:{} && echo 'success'",
+            self.prometheus_port
+        );
+
+        let output = self.ssh_client.execute(&command).map_err(|source| {
+            RemoteActionError::SshCommandFailed {
+                action_name: self.name().to_string(),
+                source,
+            }
+        })?;
+
+        if !output.trim().contains("success") {
+            return Err(RemoteActionError::ValidationFailed {
+                action_name: self.name().to_string(),
+                message: format!(
+                    "Prometheus smoke test failed. Prometheus may not be running or accessible on port {}. \
+                     Check that 'docker compose ps' shows Prometheus container as running.",
+                    self.prometheus_port
+                ),
+            });
+        }
+
+        info!(
+            action = "prometheus_smoke_test",
+            status = "success",
+            "Prometheus is running and responding to HTTP requests"
+        );
+
+        Ok(())
+    }
+}
diff --git a/src/infrastructure/templating/ansible/template/renderer/inventory.rs b/src/infrastructure/templating/ansible/template/renderer/inventory.rs
index 1c760b30..912f7b20 100644
--- a/src/infrastructure/templating/ansible/template/renderer/inventory.rs
+++ b/src/infrastructure/templating/ansible/template/renderer/inventory.rs
@@ -98,6 +98,9 @@ impl InventoryRenderer {
     /// Output filename for the rendered inventory file
     const INVENTORY_OUTPUT_FILE: &'static str = "inventory.yml";
 
+    /// Directory path for Ansible templates
+    const ANSIBLE_TEMPLATE_DIR: &'static str = "ansible";
+
     /// Creates a new inventory template renderer
     ///
     /// # Arguments
@@ -191,7 +194,11 @@ impl InventoryRenderer {
     ///
     /// * `String` - The complete template path for inventory.yml.tera
     fn build_template_path() -> String {
-        format!("ansible/{}", Self::INVENTORY_TEMPLATE_FILE)
+        format!(
+            "{}/{}",
+            Self::ANSIBLE_TEMPLATE_DIR,
+            Self::INVENTORY_TEMPLATE_FILE
+        )
     }
 }
diff --git a/src/infrastructure/templating/ansible/template/renderer/project_generator.rs b/src/infrastructure/templating/ansible/template/renderer/project_generator.rs
index 6588bc29..e1d3d66c 100644
--- a/src/infrastructure/templating/ansible/template/renderer/project_generator.rs
+++ b/src/infrastructure/templating/ansible/template/renderer/project_generator.rs
@@ -307,6 +307,8 @@ impl AnsibleProjectGenerator {
             "create-tracker-storage.yml",
             "init-tracker-database.yml",
             "deploy-tracker-config.yml",
+            "create-prometheus-storage.yml",
+            "deploy-prometheus-config.yml",
             "deploy-compose-files.yml",
             "run-compose-services.yml",
         ] {
@@ -316,7 +318,7 @@ impl AnsibleProjectGenerator {
 
         tracing::debug!(
             "Successfully copied {} static template files",
-            14 // ansible.cfg + 13 playbooks
+            16 // ansible.cfg + 15 playbooks
         );
 
         Ok(())
diff --git a/src/infrastructure/templating/ansible/template/renderer/variables.rs b/src/infrastructure/templating/ansible/template/renderer/variables.rs
index 70980f8f..4f49cb30 100644
--- a/src/infrastructure/templating/ansible/template/renderer/variables.rs
+++ b/src/infrastructure/templating/ansible/template/renderer/variables.rs
@@ -99,6 +99,9 @@ impl VariablesRenderer {
     /// Output filename for the rendered variables file
     const VARIABLES_OUTPUT_FILE: &'static str = "variables.yml";
 
+    /// Directory path for Ansible templates
+    const ANSIBLE_TEMPLATE_DIR: &'static str = "ansible";
+
     /// Creates a new variables template renderer
     ///
     /// # Arguments
@@ -192,7 +195,11 @@ impl VariablesRenderer {
     ///
     /// * `String` - The complete template path for variables.yml.tera
     fn build_template_path() -> String {
-        format!("ansible/{}", Self::VARIABLES_TEMPLATE_FILE)
+        format!(
+            "{}/{}",
+            Self::ANSIBLE_TEMPLATE_DIR,
+            Self::VARIABLES_TEMPLATE_FILE
+        )
     }
 }
diff --git a/src/infrastructure/templating/docker_compose/template/renderer/docker_compose.rs b/src/infrastructure/templating/docker_compose/template/renderer/docker_compose.rs
index 075a1c2c..52580adb 100644
--- a/src/infrastructure/templating/docker_compose/template/renderer/docker_compose.rs
+++ b/src/infrastructure/templating/docker_compose/template/renderer/docker_compose.rs
@@ -219,14 +219,15 @@ mod tests {
             http_tracker_ports: vec![7070],
             http_api_port: 1212,
         };
-        let mysql_context = DockerComposeContext::new_mysql(
-            "rootpass123".to_string(),
-            "tracker_db".to_string(),
-            "tracker_user".to_string(),
-            "userpass123".to_string(),
-            3306,
-            ports,
-        );
+        let mysql_context = DockerComposeContext::builder(ports)
+            .with_mysql(
+                "rootpass123".to_string(),
+                "tracker_db".to_string(),
+                "tracker_user".to_string(),
+                "userpass123".to_string(),
+                3306,
+            )
+            .build();
 
         let renderer = DockerComposeRenderer::new(template_manager);
         let output_dir = TempDir::new().unwrap();
@@ -307,7 +308,7 @@ mod tests {
             http_tracker_ports: vec![7070],
             http_api_port: 1212,
         };
-        let sqlite_context = DockerComposeContext::new_sqlite(ports);
+        let sqlite_context = DockerComposeContext::builder(ports).build();
 
         let renderer = DockerComposeRenderer::new(template_manager);
         let output_dir = TempDir::new().unwrap();
@@ -337,4 +338,119 @@ mod tests {
             "Should not contain mysql_data volume"
         );
     }
+
+    #[test]
+    fn it_should_render_prometheus_service_when_config_is_present() {
+        use crate::domain::prometheus::PrometheusConfig;
+
+        let temp_dir = TempDir::new().unwrap();
+        let template_manager = Arc::new(TemplateManager::new(temp_dir.path()));
+
+        let ports = TrackerPorts {
+            udp_tracker_ports: vec![6868, 6969],
+            http_tracker_ports: vec![7070],
+            http_api_port: 1212,
+        };
+        let prometheus_config = PrometheusConfig {
+            scrape_interval: 15,
+        };
+        let context = DockerComposeContext::builder(ports)
+            .with_prometheus(prometheus_config)
+            .build();
+
+        let renderer = DockerComposeRenderer::new(template_manager);
+        let output_dir = TempDir::new().unwrap();
+
+        let result = renderer.render(&context, output_dir.path());
+        assert!(
+            result.is_ok(),
+            "Rendering with Prometheus context should succeed"
+        );
+
+        let output_path = output_dir.path().join("docker-compose.yml");
+        let rendered_content = std::fs::read_to_string(&output_path)
+            .expect("Should be able to read rendered docker-compose.yml");
+
+        // Verify Prometheus service is present
+        assert!(
+            rendered_content.contains("prometheus:"),
+            "Rendered output should contain prometheus service"
+        );
+        assert!(
+            rendered_content.contains("image: prom/prometheus:v3.0.1"),
+            "Should use Prometheus v3.0.1 image"
+        );
+        assert!(
+            rendered_content.contains("container_name: prometheus"),
+            "Should set container name"
+        );
+
+        // Verify port mapping
+        assert!(
+            rendered_content.contains("9090:9090"),
+            "Should expose Prometheus port 9090"
+        );
+
+        // Verify volume mount
+        assert!(
+            rendered_content.contains("./storage/prometheus/etc:/etc/prometheus:Z"),
+            "Should mount Prometheus config directory"
+        );
+
+        // Verify service dependency
+        assert!(
+            rendered_content.contains("depends_on:"),
+            "Should have depends_on section"
+        );
+        assert!(
+            rendered_content.contains("- tracker"),
+            "Should depend on tracker"
+        );
+
+        // Verify network
+        assert!(
+            rendered_content.contains("- backend_network"),
+            "Should be on backend_network"
+        );
+    }
+
+    #[test]
+    fn it_should_not_render_prometheus_service_when_config_is_absent() {
+        let temp_dir = TempDir::new().unwrap();
+        let template_manager = Arc::new(TemplateManager::new(temp_dir.path()));
+
+        let ports = TrackerPorts {
+            udp_tracker_ports: vec![6868, 6969],
+            http_tracker_ports: vec![7070],
+            http_api_port: 1212,
+        };
+        let context = DockerComposeContext::builder(ports).build();
+
+        let renderer = DockerComposeRenderer::new(template_manager);
+        let output_dir = TempDir::new().unwrap();
+
+        let result = renderer.render(&context, output_dir.path());
+        assert!(
+            result.is_ok(),
+            "Rendering without Prometheus context should succeed"
+        );
+
+        let output_path = output_dir.path().join("docker-compose.yml");
+        let rendered_content = std::fs::read_to_string(&output_path)
+            .expect("Should be able to read rendered docker-compose.yml");
+
+        // Verify Prometheus service is NOT present
+        assert!(
+            !rendered_content.contains("image: prom/prometheus:v3.0.1"),
+            "Should not contain Prometheus service when config absent"
+        );
+        assert!(
+            !rendered_content.contains("container_name: prometheus"),
+            "Should not have prometheus container"
+        );
+        assert!(
+            !rendered_content.contains("./storage/prometheus/etc:/etc/prometheus:Z"),
+            "Should not have prometheus volume mount"
+        );
+    }
 }
diff --git a/src/infrastructure/templating/docker_compose/template/renderer/env.rs b/src/infrastructure/templating/docker_compose/template/renderer/env.rs
index 79b31b49..e691276b 100644
--- a/src/infrastructure/templating/docker_compose/template/renderer/env.rs
+++ b/src/infrastructure/templating/docker_compose/template/renderer/env.rs
@@ -98,6 +98,9 @@ impl EnvRenderer {
     /// Output filename for the rendered .env file
     const ENV_OUTPUT_FILE: &'static str = ".env";
 
+    /// Directory path for Docker Compose templates
+    const DOCKER_COMPOSE_TEMPLATE_DIR: &'static str = "docker-compose";
+
     /// Creates a new .env template renderer
     ///
     /// # Arguments
@@ -191,7 +194,11 @@ impl EnvRenderer {
     ///
     /// * `String` - The complete template path for env.tera
     fn build_template_path() -> String {
-        format!("docker-compose/{}", Self::ENV_TEMPLATE_FILE)
+        format!(
+            "{}/{}",
+            Self::DOCKER_COMPOSE_TEMPLATE_DIR,
+            Self::ENV_TEMPLATE_FILE
+        )
     }
 }
diff --git a/src/infrastructure/templating/docker_compose/template/renderer/project_generator.rs b/src/infrastructure/templating/docker_compose/template/renderer/project_generator.rs
index 2de65f85..595c0802 100644
--- a/src/infrastructure/templating/docker_compose/template/renderer/project_generator.rs
+++ b/src/infrastructure/templating/docker_compose/template/renderer/project_generator.rs
@@ -212,7 +212,7 @@ mod tests {
             http_tracker_ports: vec![7070],
             http_api_port: 1212,
         };
-        DockerComposeContext::new_sqlite(ports)
+        DockerComposeContext::builder(ports).build()
     }
 
     #[tokio::test]
diff --git a/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/context.rs b/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/context.rs
index 88a1b330..7dfbad44 100644
--- a/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/context.rs
+++ b/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/context.rs
@@ -3,10 +3,83 @@
 //! This module defines the structure and validation for Docker Compose services
 //! that will be rendered into the docker-compose.yml file.
 
+// External crates
 use serde::Serialize;
 
+// Internal crate
+use crate::domain::prometheus::PrometheusConfig;
+
+/// Context for rendering the docker-compose.yml template
+///
+/// Contains all variables needed for the Docker Compose service configuration.
+#[derive(Serialize, Debug, Clone)]
+pub struct DockerComposeContext {
+    /// Database configuration
+    pub database: DatabaseConfig,
+    /// Tracker port configuration
+    pub ports: TrackerPorts,
+    /// Prometheus configuration (optional)
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub prometheus_config: Option<PrometheusConfig>,
+}
+
+impl DockerComposeContext {
+    /// Creates a new builder for `DockerComposeContext`
+    ///
+    /// The builder starts with `SQLite` as the default database configuration.
+    /// Use `with_mysql()` to switch to `MySQL` configuration.
+    ///
+    /// # Arguments
+    ///
+    /// * `ports` - Tracker port configuration
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// use torrust_tracker_deployer_lib::infrastructure::templating::docker_compose::template::wrappers::docker_compose::{DockerComposeContext, TrackerPorts};
+    ///
+    /// let ports = TrackerPorts {
+    ///     udp_tracker_ports: vec![6868, 6969],
+    ///     http_tracker_ports: vec![7070],
+    ///     http_api_port: 1212,
+    /// };
+    ///
+    /// // SQLite (default)
+    /// let context = DockerComposeContext::builder(ports.clone()).build();
+    /// assert_eq!(context.database().driver(), "sqlite3");
+    ///
+    /// // MySQL
+    /// let context = DockerComposeContext::builder(ports)
+    ///     .with_mysql("root_pass".to_string(), "db".to_string(), "user".to_string(), "pass".to_string(), 3306)
+    ///     .build();
+    /// assert_eq!(context.database().driver(), "mysql");
+    /// ```
+    #[must_use]
+    pub fn builder(ports: TrackerPorts) -> DockerComposeContextBuilder {
+        DockerComposeContextBuilder::new(ports)
+    }
+
+    /// Get the database configuration
+    #[must_use]
+    pub fn database(&self) -> &DatabaseConfig {
+        &self.database
+    }
+
+    /// Get the tracker ports configuration
+    #[must_use]
+    pub fn ports(&self) -> &TrackerPorts {
+        &self.ports
+    }
+
+    /// Get the Prometheus configuration if present
+    #[must_use]
+    pub fn prometheus_config(&self) -> Option<&PrometheusConfig> {
+        self.prometheus_config.as_ref()
+    }
+}
+
 /// Tracker port configuration
-#[derive(Debug, Clone)]
+#[derive(Serialize, Debug, Clone)]
 pub struct TrackerPorts {
     /// UDP tracker ports
     pub udp_tracker_ports: Vec<u16>,
@@ -26,6 +99,20 @@ pub struct DatabaseConfig {
     pub mysql: Option<MysqlConfig>,
 }
 
+impl DatabaseConfig {
+    /// Get the database driver name
+    #[must_use]
+    pub fn driver(&self) -> &str {
+        &self.driver
+    }
+
+    /// Get the `MySQL` configuration if present
+    #[must_use]
+    pub fn mysql(&self) -> Option<&MysqlConfig> {
+        self.mysql.as_ref()
+    }
+}
+
 /// `MySQL`-specific configuration
 #[derive(Serialize, Debug, Clone)]
 pub struct MysqlConfig {
@@ -41,55 +128,30 @@ pub struct MysqlConfig {
     pub port: u16,
 }
 
-/// Context for rendering the docker-compose.yml template
+/// Builder for `DockerComposeContext`
 ///
-/// Contains all variables needed for the Docker Compose service configuration.
-#[derive(Serialize, Debug, Clone)]
-pub struct DockerComposeContext {
-    /// Database configuration
-    pub database: DatabaseConfig,
-    /// UDP tracker ports
-    pub udp_tracker_ports: Vec<u16>,
-    /// HTTP tracker ports
-    pub http_tracker_ports: Vec<u16>,
-    /// HTTP API port
-    pub http_api_port: u16,
+/// Provides a fluent API for constructing Docker Compose contexts with optional features.
+/// Defaults to `SQLite` database configuration.
+pub struct DockerComposeContextBuilder {
+    ports: TrackerPorts,
+    database: DatabaseConfig,
+    prometheus_config: Option<PrometheusConfig>,
 }
 
-impl DockerComposeContext {
-    /// Creates a new `DockerComposeContext` with `SQLite` configuration (default)
-    ///
-    /// # Arguments
-    ///
-    /// * `ports` - Tracker port configuration
-    ///
-    /// # Examples
-    ///
-    /// ```rust
-    /// use torrust_tracker_deployer_lib::infrastructure::templating::docker_compose::template::wrappers::docker_compose::{DockerComposeContext, TrackerPorts};
-    ///
-    /// let ports = TrackerPorts {
-    ///     udp_tracker_ports: vec![6868, 6969],
-    ///     http_tracker_ports: vec![7070],
-    ///     http_api_port: 1212,
-    /// };
-    /// let context = DockerComposeContext::new_sqlite(ports);
-    /// assert_eq!(context.database().driver(), "sqlite3");
-    /// ```
-    #[must_use]
-    pub fn new_sqlite(ports: TrackerPorts) -> Self {
+impl DockerComposeContextBuilder {
+    /// Creates a new builder with default `SQLite` configuration
+    fn new(ports: TrackerPorts) -> Self {
         Self {
+            ports,
             database: DatabaseConfig {
                 driver: "sqlite3".to_string(),
                 mysql: None,
             },
-            udp_tracker_ports: ports.udp_tracker_ports,
-            http_tracker_ports: ports.http_tracker_ports,
-            http_api_port: ports.http_api_port,
+            prometheus_config: None,
         }
     }
 
-    /// Creates a new `DockerComposeContext` with `MySQL` configuration
+    /// Switches to `MySQL` database configuration
     ///
     /// # Arguments
     ///
@@ -98,90 +160,47 @@ impl DockerComposeContext {
     /// * `user` - `MySQL` user
     /// * `password` - `MySQL` password
     /// * `port` - `MySQL` port
-    /// * `ports` - Tracker port configuration
-    ///
-    /// # Examples
-    ///
-    /// ```rust
-    /// use torrust_tracker_deployer_lib::infrastructure::templating::docker_compose::template::wrappers::docker_compose::{DockerComposeContext, TrackerPorts};
-    ///
-    /// let ports = TrackerPorts {
-    ///     udp_tracker_ports: vec![6868, 6969],
-    ///     http_tracker_ports: vec![7070],
-    ///     http_api_port: 1212,
-    /// };
-    /// let context = DockerComposeContext::new_mysql(
-    ///     "root_pass".to_string(),
-    ///     "tracker_db".to_string(),
-    ///     "tracker_user".to_string(),
-    ///     "user_pass".to_string(),
-    ///     3306,
-    ///     ports,
-    /// );
-    /// assert_eq!(context.database().driver(), "mysql");
-    /// ```
     #[must_use]
-    pub fn new_mysql(
+    pub fn with_mysql(
+        mut self,
         root_password: String,
         database: String,
         user: String,
         password: String,
         port: u16,
-        ports: TrackerPorts,
     ) -> Self {
-        Self {
-            database: DatabaseConfig {
-                driver: "mysql".to_string(),
-                mysql: Some(MysqlConfig {
-                    root_password,
-                    database,
-                    user,
-                    password,
-                    port,
-                }),
-            },
-            udp_tracker_ports: ports.udp_tracker_ports,
-            http_tracker_ports: ports.http_tracker_ports,
-            http_api_port: ports.http_api_port,
-        }
-    }
-
-    /// Get the database configuration
-    #[must_use]
-    pub fn database(&self) -> &DatabaseConfig {
-        &self.database
-    }
-
-    /// Get the UDP tracker ports
-    #[must_use]
-    pub fn udp_tracker_ports(&self) -> &[u16] {
-        &self.udp_tracker_ports
-    }
-
-    /// Get the HTTP tracker ports
-    #[must_use]
-    pub fn http_tracker_ports(&self) -> &[u16] {
-        &self.http_tracker_ports
-    }
-
-    /// Get the HTTP API port
-    #[must_use]
-    pub fn http_api_port(&self) -> u16 {
-        self.http_api_port
+        self.database = DatabaseConfig {
+            driver: "mysql".to_string(),
+            mysql: Some(MysqlConfig {
+                root_password,
+                database,
user, + password, + port, + }), + }; + self } -} -impl DatabaseConfig { - /// Get the database driver name + /// Adds Prometheus configuration + /// + /// # Arguments + /// + /// * `prometheus_config` - Prometheus configuration #[must_use] - pub fn driver(&self) -> &str { - &self.driver + pub fn with_prometheus(mut self, prometheus_config: PrometheusConfig) -> Self { + self.prometheus_config = Some(prometheus_config); + self } - /// Get the `MySQL` configuration if present + /// Builds the `DockerComposeContext` #[must_use] - pub fn mysql(&self) -> Option<&MysqlConfig> { - self.mysql.as_ref() + pub fn build(self) -> DockerComposeContext { + DockerComposeContext { + database: self.database, + ports: self.ports, + prometheus_config: self.prometheus_config, + } } } @@ -196,13 +215,13 @@ mod tests { http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_sqlite(ports); + let context = DockerComposeContext::builder(ports).build(); assert_eq!(context.database().driver(), "sqlite3"); assert!(context.database().mysql().is_none()); - assert_eq!(context.udp_tracker_ports(), &[6868, 6969]); - assert_eq!(context.http_tracker_ports(), &[7070]); - assert_eq!(context.http_api_port(), 1212); + assert_eq!(context.ports().udp_tracker_ports, vec![6868, 6969]); + assert_eq!(context.ports().http_tracker_ports, vec![7070]); + assert_eq!(context.ports().http_api_port, 1212); } #[test] @@ -212,14 +231,15 @@ mod tests { http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_mysql( - "root123".to_string(), - "tracker".to_string(), - "tracker_user".to_string(), - "pass456".to_string(), - 3306, - ports, - ); + let context = DockerComposeContext::builder(ports) + .with_mysql( + "root123".to_string(), + "tracker".to_string(), + "tracker_user".to_string(), + "pass456".to_string(), + 3306, + ) + .build(); assert_eq!(context.database().driver(), "mysql"); assert!(context.database().mysql().is_some()); @@ -231,9 
+251,9 @@ mod tests { assert_eq!(mysql.password, "pass456"); assert_eq!(mysql.port, 3306); - assert_eq!(context.udp_tracker_ports(), &[6868, 6969]); - assert_eq!(context.http_tracker_ports(), &[7070]); - assert_eq!(context.http_api_port(), 1212); + assert_eq!(context.ports().udp_tracker_ports, vec![6868, 6969]); + assert_eq!(context.ports().http_tracker_ports, vec![7070]); + assert_eq!(context.ports().http_api_port, 1212); } #[test] @@ -243,7 +263,7 @@ mod tests { http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_sqlite(ports); + let context = DockerComposeContext::builder(ports).build(); let serialized = serde_json::to_string(&context).unwrap(); assert!(serialized.contains("sqlite3")); @@ -257,14 +277,15 @@ mod tests { http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_mysql( - "root".to_string(), - "db".to_string(), - "user".to_string(), - "pass".to_string(), - 3306, - ports, - ); + let context = DockerComposeContext::builder(ports) + .with_mysql( + "root".to_string(), + "db".to_string(), + "user".to_string(), + "pass".to_string(), + 3306, + ) + .build(); let serialized = serde_json::to_string(&context).unwrap(); assert!(serialized.contains("mysql")); @@ -274,7 +295,6 @@ mod tests { assert!(serialized.contains("pass")); assert!(serialized.contains("3306")); } - #[test] fn it_should_be_cloneable() { let ports = TrackerPorts { @@ -282,16 +302,78 @@ mod tests { http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_mysql( - "root".to_string(), - "db".to_string(), - "user".to_string(), - "pass".to_string(), - 3306, - ports, - ); + let context = DockerComposeContext::builder(ports) + .with_mysql( + "root".to_string(), + "db".to_string(), + "user".to_string(), + "pass".to_string(), + 3306, + ) + .build(); let cloned = context.clone(); assert_eq!(cloned.database().driver(), "mysql"); } + #[test] + fn 
it_should_not_include_prometheus_config_by_default() { + let ports = TrackerPorts { + udp_tracker_ports: vec![6868, 6969], + http_tracker_ports: vec![7070], + http_api_port: 1212, + }; + let context = DockerComposeContext::builder(ports).build(); + + assert!(context.prometheus_config().is_none()); + } + + #[test] + fn it_should_include_prometheus_config_when_added() { + let ports = TrackerPorts { + udp_tracker_ports: vec![6868, 6969], + http_tracker_ports: vec![7070], + http_api_port: 1212, + }; + let prometheus_config = PrometheusConfig { + scrape_interval: 30, + }; + let context = DockerComposeContext::builder(ports) + .with_prometheus(prometheus_config) + .build(); + + assert!(context.prometheus_config().is_some()); + assert_eq!(context.prometheus_config().unwrap().scrape_interval, 30); + } + + #[test] + fn it_should_not_serialize_prometheus_config_when_absent() { + let ports = TrackerPorts { + udp_tracker_ports: vec![6868, 6969], + http_tracker_ports: vec![7070], + http_api_port: 1212, + }; + let context = DockerComposeContext::builder(ports).build(); + + let serialized = serde_json::to_string(&context).unwrap(); + assert!(!serialized.contains("prometheus_config")); + } + + #[test] + fn it_should_serialize_prometheus_config_when_present() { + let ports = TrackerPorts { + udp_tracker_ports: vec![6868, 6969], + http_tracker_ports: vec![7070], + http_api_port: 1212, + }; + let prometheus_config = PrometheusConfig { + scrape_interval: 20, + }; + let context = DockerComposeContext::builder(ports) + .with_prometheus(prometheus_config) + .build(); + + let serialized = serde_json::to_string(&context).unwrap(); + assert!(serialized.contains("prometheus_config")); + assert!(serialized.contains("\"scrape_interval\":20")); + } } diff --git a/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/mod.rs b/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/mod.rs index 0afa0e15..474185d2 100644 --- 
a/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/mod.rs +++ b/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/mod.rs @@ -1,5 +1,5 @@ pub mod context; pub mod template; -pub use context::{DockerComposeContext, TrackerPorts}; +pub use context::{DockerComposeContext, DockerComposeContextBuilder, TrackerPorts}; pub use template::DockerComposeTemplate; diff --git a/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/template.rs b/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/template.rs index 517f9ebf..ebc37fe1 100644 --- a/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/template.rs +++ b/src/infrastructure/templating/docker_compose/template/wrappers/docker_compose/template.rs @@ -104,7 +104,7 @@ services: http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_sqlite(ports); + let context = DockerComposeContext::builder(ports).build(); let template = DockerComposeTemplate::new(&template_file, context).unwrap(); assert_eq!(template.database().driver(), "sqlite3"); @@ -134,14 +134,15 @@ services: http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_mysql( - "root123".to_string(), - "tracker".to_string(), - "user".to_string(), - "pass".to_string(), - 3306, - ports, - ); + let context = DockerComposeContext::builder(ports) + .with_mysql( + "root123".to_string(), + "tracker".to_string(), + "user".to_string(), + "pass".to_string(), + 3306, + ) + .build(); let template = DockerComposeTemplate::new(&template_file, context).unwrap(); assert_eq!(template.database().driver(), "mysql"); @@ -161,15 +162,15 @@ services: "; let template_file = File::new("docker-compose.yml.tera", template_content.to_string()).unwrap(); - let ports = TrackerPorts { udp_tracker_ports: vec![6868, 6969], http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context 
= DockerComposeContext::new_sqlite(ports); + let context = DockerComposeContext::builder(ports).build(); let template = DockerComposeTemplate::new(&template_file, context).unwrap(); + // Create temp directory for output // Create temp directory for output let temp_dir = TempDir::new().unwrap(); let output_path = temp_dir.path().join("docker-compose.yml"); @@ -195,7 +196,7 @@ services: http_tracker_ports: vec![7070], http_api_port: 1212, }; - let context = DockerComposeContext::new_sqlite(ports); + let context = DockerComposeContext::builder(ports).build(); let result = DockerComposeTemplate::new(&template_file, context); assert!(result.is_err()); diff --git a/src/infrastructure/templating/mod.rs b/src/infrastructure/templating/mod.rs index 7880f24b..b423a335 100644 --- a/src/infrastructure/templating/mod.rs +++ b/src/infrastructure/templating/mod.rs @@ -22,6 +22,8 @@ //! - `template` - Template renderers for `OpenTofu` configuration files //! - `tracker` - Torrust Tracker configuration management //! - `template` - Template renderers for Tracker configuration files +//! - `prometheus` - Prometheus metrics collection configuration +//! - `template` - Template renderers for Prometheus configuration files //! //! ## Template Rendering //! @@ -32,5 +34,6 @@ pub mod ansible; pub mod docker_compose; +pub mod prometheus; pub mod tofu; pub mod tracker; diff --git a/src/infrastructure/templating/prometheus/mod.rs b/src/infrastructure/templating/prometheus/mod.rs new file mode 100644 index 00000000..690162b8 --- /dev/null +++ b/src/infrastructure/templating/prometheus/mod.rs @@ -0,0 +1,19 @@ +//! Prometheus integration for metrics collection +//! +//! This module provides Prometheus-specific functionality for the deployment system, +//! including template rendering for Prometheus configuration files. +//! +//! ## Components +//! +//! 
- `template` - Template rendering functionality for Prometheus configuration + +pub mod template; + +pub use template::{ + PrometheusContext, PrometheusProjectGenerator, PrometheusProjectGeneratorError, +}; + +/// Subdirectory name for Prometheus-related files within the build directory. +/// +/// Prometheus configuration files will be rendered to `build_dir/prometheus/`. +pub const PROMETHEUS_SUBFOLDER: &str = "prometheus"; diff --git a/src/infrastructure/templating/prometheus/template/mod.rs b/src/infrastructure/templating/prometheus/template/mod.rs new file mode 100644 index 00000000..8a2d1560 --- /dev/null +++ b/src/infrastructure/templating/prometheus/template/mod.rs @@ -0,0 +1,12 @@ +//! Prometheus template functionality +//! +//! This module provides template-related functionality for Prometheus configuration, +//! including wrappers for dynamic templates. + +pub mod renderer; +pub mod wrapper; + +pub use renderer::{ + PrometheusConfigRenderer, PrometheusProjectGenerator, PrometheusProjectGeneratorError, +}; +pub use wrapper::PrometheusContext; diff --git a/src/infrastructure/templating/prometheus/template/renderer/mod.rs b/src/infrastructure/templating/prometheus/template/renderer/mod.rs new file mode 100644 index 00000000..714ed24d --- /dev/null +++ b/src/infrastructure/templating/prometheus/template/renderer/mod.rs @@ -0,0 +1,7 @@ +//! Template rendering for Prometheus configuration + +pub mod project_generator; +pub mod prometheus_config; + +pub use project_generator::{PrometheusProjectGenerator, PrometheusProjectGeneratorError}; +pub use prometheus_config::{PrometheusConfigRenderer, PrometheusConfigRendererError}; diff --git a/src/infrastructure/templating/prometheus/template/renderer/project_generator.rs b/src/infrastructure/templating/prometheus/template/renderer/project_generator.rs new file mode 100644 index 00000000..19a244cd --- /dev/null +++ b/src/infrastructure/templating/prometheus/template/renderer/project_generator.rs @@ -0,0 +1,335 @@ +//! 
Prometheus Project Generator +//! +//! Orchestrates the rendering of all Prometheus configuration templates following +//! the Project Generator pattern. +//! +//! ## Architecture +//! +//! This follows the three-layer Project Generator pattern: +//! - **Context** (`PrometheusContext`) - Defines variables needed by templates +//! - **Template** (`PrometheusTemplate`) - Wraps template file with context +//! - **Renderer** (`PrometheusConfigRenderer`) - Renders specific .tera templates +//! - **`ProjectGenerator`** (this file) - Orchestrates all renderers +//! +//! ## Data Flow +//! +//! Environment Config → `PrometheusConfig` → `PrometheusContext` → Template Rendering + +use std::path::{Path, PathBuf}; +use std::sync::Arc; + +use thiserror::Error; +use tracing::instrument; + +use crate::domain::prometheus::PrometheusConfig; +use crate::domain::template::TemplateManager; +use crate::domain::tracker::TrackerConfig; +use crate::infrastructure::templating::prometheus::template::{ + renderer::{PrometheusConfigRenderer, PrometheusConfigRendererError}, + PrometheusContext, +}; + +/// Errors that can occur during Prometheus project generation +#[derive(Error, Debug)] +pub enum PrometheusProjectGeneratorError { + /// Failed to create the build directory + #[error("Failed to create build directory '{directory}': {source}")] + DirectoryCreationFailed { + directory: String, + #[source] + source: std::io::Error, + }, + + /// Failed to render Prometheus configuration + #[error("Failed to render Prometheus configuration: {0}")] + RendererFailed(#[from] PrometheusConfigRendererError), + + /// Missing required tracker configuration + #[error("Tracker configuration is required to extract API token and port for Prometheus")] + MissingTrackerConfig, +} + +/// Orchestrates Prometheus configuration template rendering +/// +/// This is the Project Generator that coordinates all Prometheus template rendering. +/// It follows the standard pattern: +/// 1. 
Create build directory structure +/// 2. Extract data from tracker and Prometheus configs +/// 3. Build `PrometheusContext` +/// 4. Call `PrometheusConfigRenderer` to render prometheus.yml.tera +pub struct PrometheusProjectGenerator { + build_dir: PathBuf, + prometheus_renderer: PrometheusConfigRenderer, +} + +impl PrometheusProjectGenerator { + /// Default relative path for Prometheus configuration files + const PROMETHEUS_BUILD_PATH: &'static str = "prometheus"; + + /// Creates a new Prometheus project generator + /// + /// # Arguments + /// + /// * `build_dir` - The destination directory where templates will be rendered + /// * `template_manager` - The template manager to source templates from + #[must_use] + pub fn new>(build_dir: P, template_manager: Arc) -> Self { + let prometheus_renderer = PrometheusConfigRenderer::new(template_manager); + + Self { + build_dir: build_dir.as_ref().to_path_buf(), + prometheus_renderer, + } + } + + /// Renders Prometheus configuration templates to the build directory + /// + /// This method: + /// 1. Creates the build directory structure for Prometheus config + /// 2. Extracts API token and port from tracker configuration + /// 3. Builds `PrometheusContext` with `scrape_interval`, `api_token`, `api_port` + /// 4. Renders prometheus.yml.tera template + /// 5. 
Writes the rendered content to prometheus.yml + /// + /// # Arguments + /// + /// * `prometheus_config` - Prometheus configuration (`scrape_interval`) + /// * `tracker_config` - Tracker configuration (needed for API token and port) + /// + /// # Errors + /// + /// Returns an error if: + /// - Tracker configuration is not provided + /// - Build directory creation fails + /// - Template loading fails + /// - Template rendering fails + /// - Writing output file fails + #[instrument( + name = "prometheus_project_generator_render", + skip(self, prometheus_config, tracker_config), + fields( + build_dir = %self.build_dir.display() + ) + )] + pub fn render( + &self, + prometheus_config: &PrometheusConfig, + tracker_config: &TrackerConfig, + ) -> Result<(), PrometheusProjectGeneratorError> { + // Create build directory for Prometheus templates + let prometheus_build_dir = self.build_dir.join(Self::PROMETHEUS_BUILD_PATH); + std::fs::create_dir_all(&prometheus_build_dir).map_err(|source| { + PrometheusProjectGeneratorError::DirectoryCreationFailed { + directory: prometheus_build_dir.display().to_string(), + source, + } + })?; + + // Build PrometheusContext from configurations + let context = Self::build_context(prometheus_config, tracker_config); + + // Render prometheus.yml using PrometheusConfigRenderer + self.prometheus_renderer + .render(&context, &prometheus_build_dir)?; + + Ok(()) + } + + /// Builds `PrometheusContext` from Prometheus and Tracker configurations + /// + /// # Arguments + /// + /// * `prometheus_config` - Contains `scrape_interval` + /// * `tracker_config` - Contains HTTP API `admin_token` and `bind_address` + /// + /// # Returns + /// + /// A `PrometheusContext` with: + /// - `scrape_interval`: From `prometheus_config.scrape_interval` + /// - `api_token`: From `tracker_config.http_api.admin_token` + /// - `api_port`: Parsed from `tracker_config.http_api.bind_address` + fn build_context( + prometheus_config: &PrometheusConfig, + tracker_config: 
&TrackerConfig, + ) -> PrometheusContext { + let scrape_interval = prometheus_config.scrape_interval; + let api_token = tracker_config.http_api.admin_token.clone(); + + // Extract port from SocketAddr + let api_port = tracker_config.http_api.bind_address.port(); + + PrometheusContext::new(scrape_interval, api_token, api_port) + } +} + +#[cfg(test)] +mod tests { + use std::fs; + + use super::*; + use crate::domain::tracker::HttpApiConfig; + + fn create_test_template_manager() -> Arc { + use tempfile::TempDir; + + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let templates_dir = temp_dir.path().join("templates"); + let prometheus_dir = templates_dir.join("prometheus"); + + fs::create_dir_all(&prometheus_dir).expect("Failed to create prometheus dir"); + + let template_content = r#"global: + scrape_interval: {{ scrape_interval }}s + +scrape_configs: + - job_name: "tracker_stats" + metrics_path: "/api/v1/stats" + params: + token: ["{{ api_token }}"] + format: ["prometheus"] + static_configs: + - targets: ["tracker:{{ api_port }}"] +"#; + + fs::write(prometheus_dir.join("prometheus.yml.tera"), template_content) + .expect("Failed to write template"); + + // Prevent temp_dir from being dropped + std::mem::forget(temp_dir); + + Arc::new(TemplateManager::new(templates_dir)) + } + + fn create_test_tracker_config() -> TrackerConfig { + TrackerConfig { + http_api: HttpApiConfig { + bind_address: "0.0.0.0:1212".parse().expect("valid address"), + admin_token: "test_admin_token".to_string(), + }, + ..Default::default() + } + } + + #[test] + fn it_should_create_prometheus_build_directory() { + let temp_dir = tempfile::tempdir().expect("Failed to create temp dir"); + let build_dir = temp_dir.path().join("build"); + + let template_manager = create_test_template_manager(); + let generator = PrometheusProjectGenerator::new(&build_dir, template_manager); + + let prometheus_config = PrometheusConfig::default(); + let tracker_config = create_test_tracker_config(); + 
+
+        generator
+            .render(&prometheus_config, &tracker_config)
+            .expect("Failed to render templates");
+
+        let prometheus_dir = build_dir.join("prometheus");
+        assert!(
+            prometheus_dir.exists(),
+            "Prometheus directory should be created"
+        );
+        assert!(
+            prometheus_dir.is_dir(),
+            "Prometheus build path should be a directory"
+        );
+    }
+
+    #[test]
+    fn it_should_render_prometheus_yml_with_default_config() {
+        let temp_dir = tempfile::tempdir().expect("Failed to create temp dir");
+        let build_dir = temp_dir.path().join("build");
+
+        let template_manager = create_test_template_manager();
+        let generator = PrometheusProjectGenerator::new(&build_dir, template_manager);
+
+        let prometheus_config = PrometheusConfig::default(); // scrape_interval: 15
+        let tracker_config = create_test_tracker_config();
+
+        generator
+            .render(&prometheus_config, &tracker_config)
+            .expect("Failed to render templates");
+
+        let prometheus_yml_path = build_dir.join("prometheus/prometheus.yml");
+        assert!(
+            prometheus_yml_path.exists(),
+            "prometheus.yml should be created"
+        );
+
+        let content =
+            fs::read_to_string(&prometheus_yml_path).expect("Failed to read prometheus.yml");
+
+        // Verify default values
+        assert!(content.contains("scrape_interval: 15s"));
+        assert!(content.contains(r#"token: ["test_admin_token"]"#));
+        assert!(content.contains("targets: [\"tracker:1212\"]"));
+    }
+
+    #[test]
+    fn it_should_render_prometheus_yml_with_custom_scrape_interval() {
+        let temp_dir = tempfile::tempdir().expect("Failed to create temp dir");
+        let build_dir = temp_dir.path().join("build");
+
+        let template_manager = create_test_template_manager();
+        let generator = PrometheusProjectGenerator::new(&build_dir, template_manager);
+
+        let prometheus_config = PrometheusConfig {
+            scrape_interval: 30,
+        };
+        let tracker_config = create_test_tracker_config();
+
+        generator
+            .render(&prometheus_config, &tracker_config)
+            .expect("Failed to render templates");
+
+        let content = fs::read_to_string(build_dir.join("prometheus/prometheus.yml"))
+            .expect("Failed to read file");
+
+        assert!(content.contains("scrape_interval: 30s"));
+    }
+
+    #[test]
+    fn it_should_extract_api_port_from_tracker_config() {
+        let temp_dir = tempfile::tempdir().expect("Failed to create temp dir");
+        let build_dir = temp_dir.path().join("build");
+
+        let template_manager = create_test_template_manager();
+        let generator = PrometheusProjectGenerator::new(&build_dir, template_manager);
+
+        let prometheus_config = PrometheusConfig::default();
+        let mut tracker_config = create_test_tracker_config();
+        tracker_config.http_api.bind_address = "0.0.0.0:8080".parse().expect("valid address");
+
+        generator
+            .render(&prometheus_config, &tracker_config)
+            .expect("Failed to render templates");
+
+        let content = fs::read_to_string(build_dir.join("prometheus/prometheus.yml"))
+            .expect("Failed to read file");
+
+        assert!(content.contains("targets: [\"tracker:8080\"]"));
+    }
+
+    #[test]
+    fn it_should_use_tracker_api_token() {
+        let temp_dir = tempfile::tempdir().expect("Failed to create temp dir");
+        let build_dir = temp_dir.path().join("build");
+
+        let template_manager = create_test_template_manager();
+        let generator = PrometheusProjectGenerator::new(&build_dir, template_manager);
+
+        let prometheus_config = PrometheusConfig::default();
+        let mut tracker_config = create_test_tracker_config();
+        tracker_config.http_api.admin_token = "custom_admin_token_123".to_string();
+
+        generator
+            .render(&prometheus_config, &tracker_config)
+            .expect("Failed to render templates");
+
+        let content = fs::read_to_string(build_dir.join("prometheus/prometheus.yml"))
+            .expect("Failed to read file");
+
+        assert!(content.contains(r#"token: ["custom_admin_token_123"]"#));
+    }
+}
diff --git a/src/infrastructure/templating/prometheus/template/renderer/prometheus_config.rs b/src/infrastructure/templating/prometheus/template/renderer/prometheus_config.rs
new file mode 100644
index 00000000..34e6df76
--- /dev/null
+++ b/src/infrastructure/templating/prometheus/template/renderer/prometheus_config.rs
@@ -0,0 +1,220 @@
+//! Prometheus configuration renderer
+//!
+//! Renders prometheus.yml.tera template using `PrometheusContext` and `PrometheusTemplate` wrappers.
+
+use std::path::Path;
+use std::sync::Arc;
+
+use thiserror::Error;
+use tracing::instrument;
+
+use crate::domain::template::{TemplateManager, TemplateManagerError};
+use crate::infrastructure::templating::prometheus::template::wrapper::prometheus_config::{
+    template::PrometheusTemplateError, PrometheusContext, PrometheusTemplate,
+};
+
+/// Errors that can occur during Prometheus configuration rendering
+#[derive(Error, Debug)]
+pub enum PrometheusConfigRendererError {
+    /// Failed to get template path from template manager
+    #[error("Failed to get template path for 'prometheus.yml.tera': {0}")]
+    TemplatePathFailed(#[from] TemplateManagerError),
+
+    /// Failed to read template file
+    #[error("Failed to read template file at '{path}': {source}")]
+    TemplateReadFailed {
+        path: String,
+        #[source]
+        source: std::io::Error,
+    },
+
+    /// Failed to create or render template
+    #[error("Failed to process Prometheus template: {0}")]
+    TemplateProcessingFailed(#[from] PrometheusTemplateError),
+}
+
+/// Renders prometheus.yml.tera template to prometheus.yml configuration file
+///
+/// This renderer follows the Project Generator pattern:
+/// 1. Loads prometheus.yml.tera from the template manager
+/// 2. Creates a `PrometheusTemplate` with `PrometheusContext`
+/// 3. Renders the template to an output file
+///
+/// The `PrometheusContext` contains:
+/// - `scrape_interval`: How often to scrape metrics (from prometheus config)
+/// - `api_token`: Tracker HTTP API admin token (for authentication)
+/// - `api_port`: Tracker HTTP API port (where metrics are exposed)
+pub struct PrometheusConfigRenderer {
+    template_manager: Arc<TemplateManager>,
+}
+
+impl PrometheusConfigRenderer {
+    /// Template filename for the Prometheus Tera template
+    const PROMETHEUS_TEMPLATE_FILE: &'static str = "prometheus.yml.tera";
+
+    /// Output filename for the rendered Prometheus config file
+    const PROMETHEUS_OUTPUT_FILE: &'static str = "prometheus.yml";
+
+    /// Directory path for Prometheus templates
+    const PROMETHEUS_TEMPLATE_DIR: &'static str = "prometheus";
+
+    /// Creates a new Prometheus config renderer
+    ///
+    /// # Arguments
+    ///
+    /// * `template_manager` - The template manager to load templates from
+    #[must_use]
+    pub fn new(template_manager: Arc<TemplateManager>) -> Self {
+        Self { template_manager }
+    }
+
+    /// Renders the Prometheus configuration to a file
+    ///
+    /// # Arguments
+    ///
+    /// * `context` - The rendering context with `scrape_interval`, `api_token`, `api_port`
+    /// * `output_dir` - Directory where prometheus.yml will be written
+    ///
+    /// # Errors
+    ///
+    /// Returns an error if:
+    /// - Template file cannot be loaded
+    /// - Template file cannot be read
+    /// - Template rendering fails
+    /// - Output file cannot be written
+    #[instrument(skip(self, context), fields(output_dir = %output_dir.display()))]
+    pub fn render(
+        &self,
+        context: &PrometheusContext,
+        output_dir: &Path,
+    ) -> Result<(), PrometheusConfigRendererError> {
+        // 1. Load template from template manager
+        let template_path = self.template_manager.get_template_path(&format!(
+            "{}/{}",
+            Self::PROMETHEUS_TEMPLATE_DIR,
+            Self::PROMETHEUS_TEMPLATE_FILE
+        ))?;
+
+        // 2. Read template content
+        let template_content = std::fs::read_to_string(&template_path).map_err(|source| {
+            PrometheusConfigRendererError::TemplateReadFailed {
+                path: template_path.display().to_string(),
+                source,
+            }
+        })?;
+
+        // 3. Create PrometheusTemplate with context
+        let template = PrometheusTemplate::new(template_content, context.clone())?;
+
+        // 4. Render to output file
+        let output_path = output_dir.join(Self::PROMETHEUS_OUTPUT_FILE);
+        template.render_to_file(&output_path)?;
+
+        Ok(())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::fs;
+    use tempfile::TempDir;
+
+    fn create_test_template_manager() -> Arc<TemplateManager> {
+        let temp_dir = TempDir::new().expect("Failed to create temp dir");
+        let templates_dir = temp_dir.path().join("templates");
+        let prometheus_dir = templates_dir.join("prometheus");
+
+        fs::create_dir_all(&prometheus_dir).expect("Failed to create prometheus dir");
+
+        let template_content = r#"global:
+  scrape_interval: {{ scrape_interval }}s
+
+scrape_configs:
+  - job_name: "tracker_stats"
+    metrics_path: "/api/v1/stats"
+    params:
+      token: ["{{ api_token }}"]
+      format: ["prometheus"]
+    static_configs:
+      - targets: ["tracker:{{ api_port }}"]
+"#;
+
+        fs::write(prometheus_dir.join("prometheus.yml.tera"), template_content)
+            .expect("Failed to write template");
+
+        // Prevent temp_dir from being dropped
+        std::mem::forget(temp_dir);
+
+        Arc::new(TemplateManager::new(templates_dir))
+    }
+
+    #[test]
+    fn it_should_render_prometheus_template_successfully() {
+        let template_manager = create_test_template_manager();
+        let renderer = PrometheusConfigRenderer::new(template_manager);
+
+        let context = PrometheusContext::new(15, "test_token".to_string(), 1212);
+
+        let temp_dir = TempDir::new().expect("Failed to create temp output dir");
+        let output_dir = temp_dir.path();
+
+        renderer
+            .render(&context, output_dir)
+            .expect("Failed to render Prometheus template");
+
+        let output_file = output_dir.join("prometheus.yml");
+        assert!(output_file.exists(), "prometheus.yml should be created");
+
+        let file_content = fs::read_to_string(&output_file).expect("Failed to read prometheus.yml");
+        assert!(file_content.contains("scrape_interval: 15s"));
+        assert!(file_content.contains(r#"token: ["test_token"]"#));
+        assert!(file_content.contains(r#"targets: ["tracker:1212"]"#));
+    }
+
+    #[test]
+    fn it_should_substitute_all_template_variables() {
+        let template_manager = create_test_template_manager();
+        let renderer = PrometheusConfigRenderer::new(template_manager);
+
+        let context = PrometheusContext::new(30, "admin_token_123".to_string(), 8080);
+
+        let temp_dir = TempDir::new().expect("Failed to create temp output dir");
+        let output_dir = temp_dir.path();
+
+        renderer
+            .render(&context, output_dir)
+            .expect("Failed to render Prometheus template");
+
+        let file_content =
+            fs::read_to_string(output_dir.join("prometheus.yml")).expect("Failed to read file");
+
+        // Verify all variables were substituted
+        assert!(file_content.contains("scrape_interval: 30s"));
+        assert!(file_content.contains(r#"token: ["admin_token_123"]"#));
+        assert!(file_content.contains(r#"targets: ["tracker:8080"]"#));
+
+        // Verify no unrendered template tags remain
+        assert!(!file_content.contains("{{"));
+        assert!(!file_content.contains("}}"));
+    }
+
+    #[test]
+    fn it_should_use_embedded_template_when_external_not_found() {
+        let temp_dir = TempDir::new().expect("Failed to create temp dir");
+        let templates_dir = temp_dir.path().join("templates");
+        fs::create_dir_all(&templates_dir).expect("Failed to create templates dir");
+
+        let template_manager = Arc::new(TemplateManager::new(&templates_dir));
+        let renderer = PrometheusConfigRenderer::new(template_manager);
+
+        let context = PrometheusContext::new(15, "token".to_string(), 1212);
+        let output_dir = temp_dir.path();
+
+        let result = renderer.render(&context, output_dir);
+        assert!(
+            result.is_ok(),
+            "Should use embedded template when external template not found"
+ ); + } +} diff --git a/src/infrastructure/templating/prometheus/template/wrapper/mod.rs b/src/infrastructure/templating/prometheus/template/wrapper/mod.rs new file mode 100644 index 00000000..b6810cf2 --- /dev/null +++ b/src/infrastructure/templating/prometheus/template/wrapper/mod.rs @@ -0,0 +1,7 @@ +//! Template wrappers for prometheus.yml.tera +//! +//! This module provides context and template wrappers for Prometheus configuration. + +pub mod prometheus_config; + +pub use prometheus_config::PrometheusContext; diff --git a/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/context.rs b/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/context.rs new file mode 100644 index 00000000..fbcdce61 --- /dev/null +++ b/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/context.rs @@ -0,0 +1,141 @@ +//! Prometheus template context +//! +//! Defines the variables needed for prometheus.yml.tera template rendering. + +use serde::Serialize; + +/// Context for rendering prometheus.yml.tera template +/// +/// Contains all variables needed for Prometheus scrape configuration. +/// The context extracts metrics endpoint details from the tracker configuration. 
+/// +/// # Example +/// +/// ```rust +/// use torrust_tracker_deployer_lib::infrastructure::templating::prometheus::PrometheusContext; +/// +/// let context = PrometheusContext { +/// scrape_interval: 15, +/// api_token: "MyAccessToken".to_string(), +/// api_port: 1212, +/// }; +/// ``` +/// +/// # Data Flow +/// +/// Environment Config (`tracker.http_api`) → Application Layer → `PrometheusContext` +/// +/// - `scrape_interval`: From `prometheus.scrape_interval` (default: 15 seconds) +/// - `api_token`: From `tracker.http_api.admin_token` +/// - `api_port`: Parsed from `tracker.http_api.bind_address` (e.g., 1212 from "0.0.0.0:1212") +#[derive(Debug, Clone, Serialize, PartialEq)] +pub struct PrometheusContext { + /// How often to scrape metrics from tracker (in seconds) + /// + /// Default: 15 seconds + /// Minimum: 5 seconds (to avoid overwhelming the tracker) + /// Maximum: 300 seconds (5 minutes) + pub scrape_interval: u32, + + /// Tracker HTTP API admin token for authentication + /// + /// This token is required to access the tracker's metrics endpoints: + /// - `/api/v1/stats` - Aggregate statistics + /// - `/api/v1/metrics` - Detailed operational metrics + pub api_token: String, + + /// Tracker HTTP API port + /// + /// The port where the tracker's HTTP API is listening. + /// Prometheus scrapes metrics from this API. + /// Extracted from the tracker's HTTP API bind address. 
+ /// Example: 1212 from "0.0.0.0:1212" + pub api_port: u16, +} + +impl PrometheusContext { + /// Creates a new `PrometheusContext` + /// + /// # Arguments + /// + /// * `scrape_interval` - How often to scrape metrics (in seconds) + /// * `api_token` - Tracker HTTP API admin token + /// * `api_port` - Tracker HTTP API port + /// + /// # Example + /// + /// ```rust + /// use torrust_tracker_deployer_lib::infrastructure::templating::prometheus::PrometheusContext; + /// + /// let context = PrometheusContext::new(15, "MyToken".to_string(), 1212); + /// ``` + #[must_use] + pub fn new(scrape_interval: u32, api_token: String, api_port: u16) -> Self { + Self { + scrape_interval, + api_token, + api_port, + } + } +} + +impl Default for PrometheusContext { + fn default() -> Self { + Self { + scrape_interval: 15, + api_token: String::new(), + api_port: 1212, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn it_should_create_prometheus_context() { + let context = PrometheusContext::new(15, "test_token".to_string(), 1212); + + assert_eq!(context.scrape_interval, 15); + assert_eq!(context.api_token, "test_token"); + assert_eq!(context.api_port, 1212); + } + + #[test] + fn it_should_create_default_context() { + let context = PrometheusContext::default(); + + assert_eq!(context.scrape_interval, 15); + assert_eq!(context.api_token, ""); + assert_eq!(context.api_port, 1212); + } + + #[test] + fn it_should_serialize_to_json() { + let context = PrometheusContext::new(30, "admin_token".to_string(), 8080); + + let json = serde_json::to_value(&context).unwrap(); + assert_eq!(json["scrape_interval"], 30); + assert_eq!(json["api_token"], "admin_token"); + assert_eq!(json["api_port"], 8080); + } + + #[test] + fn it_should_support_different_scrape_intervals() { + let fast_scrape = PrometheusContext::new(5, "token".to_string(), 1212); + let slow_scrape = PrometheusContext::new(300, "token".to_string(), 1212); + + assert_eq!(fast_scrape.scrape_interval, 5); + 
assert_eq!(slow_scrape.scrape_interval, 300); + } + + #[test] + fn it_should_support_different_ports() { + let default_port = PrometheusContext::new(15, "token".to_string(), 1212); + let custom_port = PrometheusContext::new(15, "token".to_string(), 8080); + + assert_eq!(default_port.api_port, 1212); + assert_eq!(custom_port.api_port, 8080); + } +} diff --git a/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/mod.rs b/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/mod.rs new file mode 100644 index 00000000..e4310833 --- /dev/null +++ b/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/mod.rs @@ -0,0 +1,9 @@ +//! Prometheus configuration template wrapper +//! +//! This module provides the context for rendering the prometheus.yml.tera template. + +pub mod context; +pub mod template; + +pub use context::PrometheusContext; +pub use template::PrometheusTemplate; diff --git a/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/template.rs b/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/template.rs new file mode 100644 index 00000000..0708366a --- /dev/null +++ b/src/infrastructure/templating/prometheus/template/wrapper/prometheus_config/template.rs @@ -0,0 +1,244 @@ +//! Prometheus template wrapper +//! +//! Wraps the prometheus.yml.tera template file with its context for rendering. 
+ +use std::path::Path; + +use tera::Tera; +use thiserror::Error; + +use super::context::PrometheusContext; + +/// Errors that can occur during Prometheus template operations +#[derive(Error, Debug)] +pub enum PrometheusTemplateError { + /// Failed to create Tera instance + #[error("Failed to create Tera template engine: {0}")] + TeraCreationFailed(#[from] tera::Error), + + /// Failed to render template + #[error("Failed to render Prometheus template: {0}")] + RenderingFailed(String), + + /// Failed to write rendered content to file + #[error("Failed to write Prometheus configuration to '{path}': {source}")] + WriteFileFailed { + path: String, + #[source] + source: std::io::Error, + }, +} + +/// Wrapper for prometheus.yml template with rendering context +/// +/// This type encapsulates the Prometheus configuration template and provides +/// methods to render it with the given context. +/// +/// The context contains: +/// - `scrape_interval`: How often Prometheus scrapes metrics +/// - `api_token`: Tracker HTTP API admin token +/// - `api_port`: Tracker HTTP API port +pub struct PrometheusTemplate { + /// The template content + content: String, + /// The rendering context + context: PrometheusContext, +} + +impl PrometheusTemplate { + /// Creates a new Prometheus template with the given content and context + /// + /// # Arguments + /// + /// * `content` - The raw template content (prometheus.yml.tera) + /// * `context` - The rendering context with `scrape_interval`, `api_token`, `api_port` + /// + /// # Errors + /// + /// Returns an error if the template content is invalid Tera syntax + pub fn new( + template_content: String, + context: PrometheusContext, + ) -> Result { + // Validate template syntax by attempting to create a Tera instance + let mut tera = Tera::default(); + tera.add_raw_template("prometheus.yml", &template_content)?; + + Ok(Self { + content: template_content, + context, + }) + } + + /// Renders the template with the context + /// + /// # Returns + 
/// + /// The rendered template content as a String + /// + /// # Errors + /// + /// Returns an error if template rendering fails + pub fn render(&self) -> Result { + let mut tera = Tera::default(); + tera.add_raw_template("prometheus.yml", &self.content) + .map_err(|e| PrometheusTemplateError::RenderingFailed(e.to_string()))?; + + let context = tera::Context::from_serialize(&self.context) + .map_err(|e| PrometheusTemplateError::RenderingFailed(e.to_string()))?; + + tera.render("prometheus.yml", &context) + .map_err(|e| PrometheusTemplateError::RenderingFailed(e.to_string())) + } + + /// Renders the template and writes it to a file + /// + /// # Arguments + /// + /// * `output_path` - Path where the rendered prometheus.yml should be written + /// + /// # Errors + /// + /// Returns an error if rendering fails or if writing to the file fails + pub fn render_to_file(&self, output_path: &Path) -> Result<(), PrometheusTemplateError> { + let rendered = self.render()?; + + std::fs::write(output_path, rendered).map_err(|source| { + PrometheusTemplateError::WriteFileFailed { + path: output_path.display().to_string(), + source, + } + })?; + + Ok(()) + } + + /// Returns the raw template content + #[must_use] + pub fn content(&self) -> &str { + &self.content + } + + /// Returns a reference to the rendering context + #[must_use] + pub fn context(&self) -> &PrometheusContext { + &self.context + } +} + +#[cfg(test)] +mod tests { + use super::*; + use tempfile::TempDir; + + fn sample_template_content() -> String { + r#"global: + scrape_interval: {{ scrape_interval }}s + +scrape_configs: + - job_name: "tracker_stats" + metrics_path: "/api/v1/stats" + params: + token: ["{{ api_token }}"] + format: ["prometheus"] + static_configs: + - targets: ["tracker:{{ api_port }}"] +"# + .to_string() + } + + #[test] + fn it_should_create_prometheus_template_successfully() { + let template_content = sample_template_content(); + let ctx = PrometheusContext::new(15, "test_token".to_string(), 1212); 
+ + let template = PrometheusTemplate::new(template_content, ctx); + assert!(template.is_ok()); + } + + #[test] + fn it_should_fail_with_invalid_template_syntax() { + let invalid_content = "{{ unclosed".to_string(); + let context = PrometheusContext::new(15, "token".to_string(), 1212); + + let result = PrometheusTemplate::new(invalid_content, context); + assert!(result.is_err()); + } + + #[test] + fn it_should_render_template_with_context() { + let template_content = sample_template_content(); + let ctx = PrometheusContext::new(30, "admin_token".to_string(), 8080); + + let template = + PrometheusTemplate::new(template_content, ctx).expect("Failed to create template"); + + let rendered = template.render().expect("Failed to render template"); + + assert!(rendered.contains("scrape_interval: 30s")); + assert!(rendered.contains(r#"token: ["admin_token"]"#)); + assert!(rendered.contains(r#"targets: ["tracker:8080"]"#)); + } + + #[test] + fn it_should_not_contain_template_syntax_after_rendering() { + let template_content = sample_template_content(); + let ctx = PrometheusContext::new(15, "token".to_string(), 1212); + + let template = + PrometheusTemplate::new(template_content, ctx).expect("Failed to create template"); + + let rendered = template.render().expect("Failed to render template"); + + // Verify no unrendered template tags remain + assert!(!rendered.contains("{{")); + assert!(!rendered.contains("}}")); + } + + #[test] + fn it_should_render_to_file_successfully() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let output_path = temp_dir.path().join("prometheus.yml"); + + let template_content = sample_template_content(); + let ctx = PrometheusContext::new(20, "file_token".to_string(), 9090); + + let template = + PrometheusTemplate::new(template_content, ctx).expect("Failed to create template"); + + template + .render_to_file(&output_path) + .expect("Failed to render to file"); + + assert!(output_path.exists()); + + let file_content = + 
std::fs::read_to_string(&output_path).expect("Failed to read output file"); + + assert!(file_content.contains("scrape_interval: 20s")); + assert!(file_content.contains(r#"token: ["file_token"]"#)); + assert!(file_content.contains(r#"targets: ["tracker:9090"]"#)); + } + + #[test] + fn it_should_provide_access_to_content() { + let template_content = sample_template_content(); + let ctx = PrometheusContext::new(15, "token".to_string(), 1212); + + let template = PrometheusTemplate::new(template_content.clone(), ctx) + .expect("Failed to create template"); + + assert_eq!(template.content(), template_content); + } + + #[test] + fn it_should_provide_access_to_context() { + let template_content = sample_template_content(); + let ctx = PrometheusContext::new(25, "context_token".to_string(), 7070); + + let template = PrometheusTemplate::new(template_content, ctx.clone()) + .expect("Failed to create template"); + + assert_eq!(template.context(), &ctx); + } +} diff --git a/src/infrastructure/templating/tracker/template/renderer/tracker_config.rs b/src/infrastructure/templating/tracker/template/renderer/tracker_config.rs index a060c1f0..6a1075df 100644 --- a/src/infrastructure/templating/tracker/template/renderer/tracker_config.rs +++ b/src/infrastructure/templating/tracker/template/renderer/tracker_config.rs @@ -54,7 +54,14 @@ pub struct TrackerConfigRenderer { } impl TrackerConfigRenderer { - const TRACKER_TEMPLATE_PATH: &'static str = "tracker/tracker.toml.tera"; + /// Template filename for the Tracker Tera template + const TRACKER_TEMPLATE_FILE: &'static str = "tracker.toml.tera"; + + /// Output filename for the rendered Tracker config file + const TRACKER_OUTPUT_FILE: &'static str = "tracker.toml"; + + /// Directory path for Tracker templates + const TRACKER_TEMPLATE_DIR: &'static str = "tracker"; /// Creates a new tracker config renderer /// @@ -91,9 +98,11 @@ impl TrackerConfigRenderer { output_dir: &Path, ) -> Result<(), TrackerConfigRendererError> { // 1. 
Load template from template manager - let template_path = self - .template_manager - .get_template_path(Self::TRACKER_TEMPLATE_PATH)?; + let template_path = self.template_manager.get_template_path(&format!( + "{}/{}", + Self::TRACKER_TEMPLATE_DIR, + Self::TRACKER_TEMPLATE_FILE + ))?; // 2. Read template content let template_content = std::fs::read_to_string(&template_path).map_err(|source| { @@ -107,7 +116,7 @@ impl TrackerConfigRenderer { let template = TrackerTemplate::new(template_content, context.clone())?; // 4. Render to output file - let output_path = output_dir.join("tracker.toml"); + let output_path = output_dir.join(Self::TRACKER_OUTPUT_FILE); template.render_to_file(&output_path)?; Ok(()) diff --git a/src/testing/e2e/tasks/run_release_validation.rs b/src/testing/e2e/tasks/run_release_validation.rs index ec25b1fa..e56800d3 100644 --- a/src/testing/e2e/tasks/run_release_validation.rs +++ b/src/testing/e2e/tasks/run_release_validation.rs @@ -23,9 +23,23 @@ use crate::adapters::ssh::SshConfig; use crate::adapters::ssh::SshCredentials; use crate::infrastructure::remote_actions::{RemoteAction, RemoteActionError}; +/// Service validation configuration +/// +/// Controls which optional service validations should be performed +/// during release validation. This allows for flexible validation +/// based on which services are enabled in the environment configuration. 
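+///
+/// # Example
+///
+/// Hypothetical usage sketch (argument values are illustrative, not taken
+/// from a real environment):
+///
+/// ```rust,ignore
+/// let services = ServiceValidation { prometheus: true };
+/// run_release_validation(socket_addr, &ssh_credentials, Some(services)).await?;
+/// ```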
+#[derive(Debug, Clone, Copy, Default)] +pub struct ServiceValidation { + /// Whether to validate Prometheus configuration files + pub prometheus: bool, +} + /// Default deployment directory for Docker Compose files const DEFAULT_DEPLOY_DIR: &str = "/opt/torrust"; +/// Default directory for Prometheus configuration files +const DEFAULT_PROMETHEUS_CONFIG_DIR: &str = "/opt/torrust/storage/prometheus/etc"; + /// Errors that can occur during release validation #[derive(Debug, Error)] pub enum ReleaseValidationError { @@ -38,6 +52,16 @@ Tip: Ensure the release command completed successfully and files were deployed" #[source] source: RemoteActionError, }, + + /// Prometheus configuration files validation failed + #[error( + "Prometheus configuration files validation failed: {source} +Tip: Ensure Prometheus is configured in environment config and release command completed successfully" + )] + PrometheusConfigValidationFailed { + #[source] + source: RemoteActionError, + }, } impl ReleaseValidationError { @@ -82,6 +106,36 @@ impl ReleaseValidationError { - Re-run release command: cargo run -- release - Or manually copy files to /opt/torrust/ +For more information, see docs/e2e-testing/." + } + Self::PrometheusConfigValidationFailed { .. } => { + "Prometheus Configuration Files Validation Failed - Detailed Troubleshooting: + +1. Check if Prometheus is enabled in environment config: + - Verify prometheus section exists in envs/.json + - Ensure prometheus.scrape_interval is set (e.g., 15) + +2. Check if release command completed: + - SSH to instance: ssh user@instance-ip + - Check Prometheus directory: ls -la /opt/torrust/storage/prometheus/etc/ + - Verify prometheus.yml exists + +3. Verify file deployment: + - Check Ansible deployment logs for errors + - Verify the release command ran without errors + - Ensure source template files exist in templates/prometheus/ + +4. 
Common issues: + - Prometheus section missing from environment config (intentional if disabled) + - Storage directory not created: mkdir -p /opt/torrust/storage/prometheus/etc + - Insufficient permissions to write files + - Ansible playbook failed silently + - Template rendering errors + +5. Re-deploy if needed: + - Re-run release command: cargo run -- release + - Or manually copy files to /opt/torrust/storage/prometheus/etc/ + For more information, see docs/e2e-testing/." } } @@ -150,6 +204,68 @@ impl RemoteAction for ComposeFilesValidator { } } +/// Validates Prometheus configuration files are deployed +struct PrometheusConfigValidator { + ssh_client: crate::adapters::ssh::SshClient, + config_dir: std::path::PathBuf, +} + +impl PrometheusConfigValidator { + /// Create a new `PrometheusConfigValidator` with the specified SSH configuration + #[must_use] + fn new(ssh_config: SshConfig) -> Self { + let ssh_client = crate::adapters::ssh::SshClient::new(ssh_config); + Self { + ssh_client, + config_dir: std::path::PathBuf::from(DEFAULT_PROMETHEUS_CONFIG_DIR), + } + } +} + +impl RemoteAction for PrometheusConfigValidator { + fn name(&self) -> &'static str { + "prometheus-config-validation" + } + + async fn execute(&self, server_ip: &std::net::IpAddr) -> Result<(), RemoteActionError> { + info!( + action = "prometheus_config_validation", + config_dir = %self.config_dir.display(), + server_ip = %server_ip, + "Validating Prometheus configuration files are deployed" + ); + + // Check if prometheus.yml exists + let config_dir = self.config_dir.display(); + let command = format!("test -f {config_dir}/prometheus.yml && echo 'exists'"); + + let output = self.ssh_client.execute(&command).map_err(|source| { + RemoteActionError::SshCommandFailed { + action_name: self.name().to_string(), + source, + } + })?; + + if !output.trim().contains("exists") { + return Err(RemoteActionError::ValidationFailed { + action_name: self.name().to_string(), + message: format!( + "prometheus.yml not 
found in {config_dir}. \ + Ensure Prometheus is configured and release command completed successfully." + ), + }); + } + + info!( + action = "prometheus_config_validation", + status = "success", + "Prometheus configuration files are deployed correctly" + ); + + Ok(()) + } +} + /// Run release validation tests on a configured instance /// /// This function validates that the `release` command executed correctly @@ -159,6 +275,7 @@ impl RemoteAction for ComposeFilesValidator { /// /// * `socket_addr` - Socket address where the target instance can be reached /// * `ssh_credentials` - SSH credentials for connecting to the instance +/// * `services` - Optional service validation configuration (defaults to no optional services) /// /// # Returns /// @@ -170,13 +287,18 @@ impl RemoteAction for ComposeFilesValidator { /// - SSH connection cannot be established /// - Docker Compose files are not found /// - File validation fails +/// - Optional service validation fails (when enabled) pub async fn run_release_validation( socket_addr: SocketAddr, ssh_credentials: &SshCredentials, + services: Option, ) -> Result<(), ReleaseValidationError> { + let services = services.unwrap_or_default(); + info!( socket_addr = %socket_addr, ssh_user = %ssh_credentials.ssh_username, + validate_prometheus = services.prometheus, "Running release validation tests" ); @@ -185,6 +307,11 @@ pub async fn run_release_validation( // Validate Docker Compose files are deployed validate_compose_files(ip_addr, ssh_credentials, socket_addr.port()).await?; + // Optionally validate Prometheus configuration files + if services.prometheus { + validate_prometheus_config(ip_addr, ssh_credentials, socket_addr.port()).await?; + } + info!( socket_addr = %socket_addr, status = "success", @@ -212,3 +339,22 @@ async fn validate_compose_files( Ok(()) } + +/// Validate Prometheus configuration files are deployed +async fn validate_prometheus_config( + ip_addr: std::net::IpAddr, + ssh_credentials: &SshCredentials, + port: 
u16, +) -> Result<(), ReleaseValidationError> { + info!("Validating Prometheus configuration files deployment"); + + let ssh_config = SshConfig::new(ssh_credentials.clone(), SocketAddr::new(ip_addr, port)); + + let validator = PrometheusConfigValidator::new(ssh_config); + validator + .execute(&ip_addr) + .await + .map_err(|source| ReleaseValidationError::PrometheusConfigValidationFailed { source })?; + + Ok(()) +} diff --git a/src/testing/e2e/tasks/run_run_validation.rs b/src/testing/e2e/tasks/run_run_validation.rs index d6355ab1..53e887ff 100644 --- a/src/testing/e2e/tasks/run_run_validation.rs +++ b/src/testing/e2e/tasks/run_run_validation.rs @@ -59,8 +59,20 @@ use tracing::info; use crate::adapters::ssh::SshConfig; use crate::adapters::ssh::SshCredentials; use crate::infrastructure::external_validators::RunningServicesValidator; +use crate::infrastructure::remote_actions::validators::PrometheusValidator; use crate::infrastructure::remote_actions::{RemoteAction, RemoteActionError}; +/// Service validation configuration +/// +/// Controls which optional service validations should be performed +/// during run validation. This allows for flexible validation +/// based on which services are enabled in the environment configuration. 
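+///
+/// # Example
+///
+/// Hypothetical usage sketch (port values are illustrative, not taken from
+/// a real environment):
+///
+/// ```rust,ignore
+/// let services = ServiceValidation { prometheus: true };
+/// run_run_validation(socket_addr, &ssh_credentials, 1212, vec![7070], Some(services)).await?;
+/// ```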
+#[derive(Debug, Clone, Copy, Default)] +pub struct ServiceValidation { + /// Whether to validate Prometheus is running and accessible + pub prometheus: bool, +} + /// Errors that can occur during run validation #[derive(Debug, Error)] pub enum RunValidationError { @@ -73,6 +85,16 @@ Tip: Ensure Docker Compose services are started and healthy" #[source] source: RemoteActionError, }, + + /// Prometheus smoke test failed + #[error( + "Prometheus smoke test failed: {source} +Tip: Ensure Prometheus container is running and accessible on port 9090" + )] + PrometheusValidationFailed { + #[source] + source: RemoteActionError, + }, } impl RunValidationError { @@ -118,6 +140,35 @@ impl RunValidationError { - Re-run the 'run' command: cargo run -- run - Or manually: cd /opt/torrust && docker compose up -d +For more information, see docs/e2e-testing/." + } + Self::PrometheusValidationFailed { .. } => { + "Prometheus Smoke Test Failed - Detailed Troubleshooting: + +1. Check Prometheus container status: + - SSH to instance: ssh user@instance-ip + - Check container: cd /opt/torrust && docker compose ps + - View Prometheus logs: docker compose logs prometheus + +2. Verify Prometheus is accessible: + - Test from inside VM: curl http://localhost:9090 + - Check if port 9090 is listening: ss -tlnp | grep 9090 + +3. Common issues: + - Prometheus container failed to start (check logs) + - Port 9090 already in use by another process + - Prometheus configuration file has errors + - Insufficient memory for Prometheus + +4. Debug steps: + - Check Prometheus config: docker compose exec prometheus cat /etc/prometheus/prometheus.yml + - Restart Prometheus: docker compose restart prometheus + - Check scrape targets: curl http://localhost:9090/api/v1/targets | jq + +5. Re-deploy if needed: + - Re-run 'run' command: cargo run -- run + - Or manually: cd /opt/torrust && docker compose up -d prometheus + For more information, see docs/e2e-testing/." 
} } @@ -135,6 +186,7 @@ For more information, see docs/e2e-testing/." /// * `ssh_credentials` - SSH credentials for connecting to the instance /// * `tracker_api_port` - Port for the tracker API health endpoint /// * `http_tracker_ports` - Ports for HTTP tracker health endpoints (can be empty) +/// * `services` - Optional service validation configuration (defaults to no optional services) /// /// # Returns /// @@ -146,24 +198,29 @@ For more information, see docs/e2e-testing/." /// - SSH connection cannot be established /// - Services are not running /// - Services are unhealthy +/// - Optional service validation fails (when enabled) pub async fn run_run_validation( socket_addr: SocketAddr, ssh_credentials: &SshCredentials, tracker_api_port: u16, http_tracker_ports: Vec, + services: Option, ) -> Result<(), RunValidationError> { + let services = services.unwrap_or_default(); + info!( socket_addr = %socket_addr, ssh_user = %ssh_credentials.ssh_username, tracker_api_port = tracker_api_port, http_tracker_ports = ?http_tracker_ports, + validate_prometheus = services.prometheus, "Running 'run' command validation tests" ); let ip_addr = socket_addr.ip(); - // Validate running services - validate_running_services( + // Validate externally accessible services (tracker API, HTTP tracker) + validate_external_services( ip_addr, ssh_credentials, socket_addr.port(), @@ -172,6 +229,11 @@ pub async fn run_run_validation( ) .await?; + // Optionally validate Prometheus is running and accessible + if services.prometheus { + validate_prometheus(ip_addr, ssh_credentials, socket_addr.port()).await?; + } + info!( socket_addr = %socket_addr, status = "success", @@ -181,19 +243,25 @@ pub async fn run_run_validation( Ok(()) } -/// Validate running services on a configured instance +/// Validate externally accessible services on a configured instance +/// +/// This function validates services that are exposed outside the VM and accessible +/// without SSH (e.g., tracker API, HTTP tracker). 
These services have firewall rules +/// allowing external access. It checks the status of services started by the `run` +/// command and verifies they are operational by connecting from outside the VM. +/// +/// # Note /// -/// This function validates that Docker Compose services are running and healthy -/// on the target instance. It checks the status of services started by the `run` -/// command and verifies they are operational. -async fn validate_running_services( +/// Internal services like Prometheus (not exposed externally) are validated separately +/// via SSH in `validate_prometheus()`. +async fn validate_external_services( ip_addr: IpAddr, ssh_credentials: &SshCredentials, port: u16, tracker_api_port: u16, http_tracker_ports: Vec, ) -> Result<(), RunValidationError> { - info!("Validating running services"); + info!("Validating externally accessible services (tracker API, HTTP tracker)"); let ssh_config = SshConfig::new(ssh_credentials.clone(), SocketAddr::new(ip_addr, port)); @@ -206,3 +274,30 @@ async fn validate_running_services( Ok(()) } + +/// Validate Prometheus is running and accessible via smoke test +/// +/// This function performs a smoke test on Prometheus by connecting via SSH +/// and executing a curl command to verify the web UI is accessible. +/// +/// # Note +/// +/// Prometheus runs on port 9090 inside the VM but is NOT exposed externally +/// (blocked by firewall). Validation must be performed from inside the VM. 
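+///
+/// A manual equivalent of this smoke test, run from inside the VM
+/// (illustrative; the exact check is implemented by `PrometheusValidator`):
+///
+/// ```text
+/// curl http://localhost:9090
+/// ```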
+async fn validate_prometheus( + ip_addr: IpAddr, + ssh_credentials: &SshCredentials, + port: u16, +) -> Result<(), RunValidationError> { + info!("Validating Prometheus is running and accessible"); + + let ssh_config = SshConfig::new(ssh_credentials.clone(), SocketAddr::new(ip_addr, port)); + + let prometheus_validator = PrometheusValidator::new(ssh_config, None); + prometheus_validator + .execute(&ip_addr) + .await + .map_err(|source| RunValidationError::PrometheusValidationFailed { source })?; + + Ok(()) +} diff --git a/templates/ansible/create-prometheus-storage.yml b/templates/ansible/create-prometheus-storage.yml new file mode 100644 index 00000000..9dc57e42 --- /dev/null +++ b/templates/ansible/create-prometheus-storage.yml @@ -0,0 +1,15 @@ +--- +- name: Create Prometheus storage directories + hosts: all + become: true + + tasks: + - name: Create Prometheus directory structure + ansible.builtin.file: + path: "{{ item }}" + state: directory + mode: "0755" + owner: "{{ ansible_user }}" + group: "{{ ansible_user }}" + loop: + - /opt/torrust/storage/prometheus/etc diff --git a/templates/ansible/deploy-prometheus-config.yml b/templates/ansible/deploy-prometheus-config.yml new file mode 100644 index 00000000..6247063d --- /dev/null +++ b/templates/ansible/deploy-prometheus-config.yml @@ -0,0 +1,41 @@ +--- +# Deploy Prometheus Configuration +# +# This playbook deploys the prometheus.yml configuration file to the remote host. +# The configuration file is copied from the local build directory to the Prometheus +# configuration directory on the remote instance. 
+# +# Requirements: +# - Prometheus storage directories must exist (created by create-prometheus-storage.yml) +# - Build directory must contain rendered prometheus.yml +# +# Variables: +# - ansible_user: The SSH user for the remote host (set automatically) + +- name: Deploy Prometheus configuration + hosts: all + become: true + + tasks: + - name: Copy prometheus.yml to VM + ansible.builtin.copy: + src: "{{ playbook_dir }}/../prometheus/prometheus.yml" + # Note: This is the host path. Inside the container, it's mounted to /etc/prometheus/ + dest: /opt/torrust/storage/prometheus/etc/prometheus.yml + mode: "0644" + owner: "{{ ansible_user }}" + group: "{{ ansible_user }}" + + - name: Verify Prometheus configuration file exists + ansible.builtin.stat: + path: /opt/torrust/storage/prometheus/etc/prometheus.yml + register: prometheus_config + + - name: Assert Prometheus configuration was deployed + ansible.builtin.assert: + that: + - prometheus_config.stat.exists + - prometheus_config.stat.isreg + - prometheus_config.stat.pw_name == ansible_user + fail_msg: "Prometheus configuration file was not deployed properly" + success_msg: "Prometheus configuration deployed successfully" diff --git a/templates/docker-compose/docker-compose.yml.tera b/templates/docker-compose/docker-compose.yml.tera index 6c7f75c3..ced74f58 100644 --- a/templates/docker-compose/docker-compose.yml.tera +++ b/templates/docker-compose/docker-compose.yml.tera @@ -39,15 +39,15 @@ services: - backend_network ports: # UDP Tracker Ports (dynamically configured) -{%- for port in udp_tracker_ports %} +{%- for port in ports.udp_tracker_ports %} - {{ port }}:{{ port }}/udp {%- endfor %} # HTTP Tracker Ports (dynamically configured) -{%- for port in http_tracker_ports %} +{%- for port in ports.http_tracker_ports %} - {{ port }}:{{ port }} {%- endfor %} # HTTP API Port (dynamically configured) - - {{ http_api_port }}:{{ http_api_port }} + - {{ ports.http_api_port }}:{{ ports.http_api_port }} volumes: - 
./storage/tracker/lib:/var/lib/torrust/tracker:Z - ./storage/tracker/log:/var/log/torrust/tracker:Z @@ -57,6 +57,26 @@ services: max-size: "10m" max-file: "10" +{% if prometheus_config %} + prometheus: + image: prom/prometheus:v3.0.1 + container_name: prometheus + tty: true + restart: unless-stopped + networks: + - backend_network + ports: + - "9090:9090" + volumes: + - ./storage/prometheus/etc:/etc/prometheus:Z + logging: + options: + max-size: "10m" + max-file: "10" + depends_on: + - tracker +{% endif %} + {% if database.driver == "mysql" %} mysql: image: mysql:8.0 diff --git a/templates/prometheus/prometheus.yml.tera b/templates/prometheus/prometheus.yml.tera new file mode 100644 index 00000000..b0cf874c --- /dev/null +++ b/templates/prometheus/prometheus.yml.tera @@ -0,0 +1,26 @@ +# Prometheus Configuration for Torrust Tracker Metrics Collection +# +# This configuration defines how Prometheus scrapes metrics from the Torrust Tracker. +# It collects both aggregate statistics and detailed operational metrics. + +global: + scrape_interval: {{ scrape_interval }}s # How often to scrape metrics from targets + +scrape_configs: + # Tracker Statistics - Aggregate metrics about tracker state + - job_name: "tracker_stats" + metrics_path: "/api/v1/stats" + params: + token: ["{{ api_token }}"] + format: ["prometheus"] + static_configs: + - targets: ["tracker:{{ api_port }}"] + + # Tracker Metrics - Detailed operational metrics + - job_name: "tracker_metrics" + metrics_path: "/api/v1/metrics" + params: + token: ["{{ api_token }}"] + format: ["prometheus"] + static_configs: + - targets: ["tracker:{{ api_port }}"]