feat: add Prometheus integration for metrics collection (#238) #239
Merged
- Prometheus as top-level service (not nested in tracker config)
- Add phase checkpoints requiring approval between phases
- Define new domain module structure for prometheus
- Clarify separation of concerns between services

- Create prometheus.yml.tera template with Tera variables
- Define scrape_interval, api_token, metrics_port variables
- Configure two scrape jobs: tracker_stats and tracker_metrics
- Template ready for rendering with PrometheusContext

Phase 1: Template Structure & Data Flow Design

- Create PrometheusContext struct with scrape_interval, api_token, api_port fields
- Add complete module structure following existing patterns
- Implement comprehensive unit tests (5 tests covering creation, defaults, serialization)
- Update Tera template to use api_port variable (renamed from metrics_port for clarity)
- Context extracts data from tracker HTTP API configuration for Prometheus scraping

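As a rough illustration of the context described in this commit, the sketch below models a struct with the three named fields and substitutes them into a template string. The field types, the `new` constructor, and the hand-written `render` helper are assumptions for illustration only; the PR renders the real template with Tera, which is not reproduced here.

```rust
/// Hypothetical sketch of the template context. Field names come from the
/// commit message; the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusContext {
    /// Scrape interval in seconds (assumed unit).
    pub scrape_interval: u32,
    /// Token used to authenticate against the tracker HTTP API.
    pub api_token: String,
    /// Port of the tracker HTTP API that Prometheus scrapes.
    pub api_port: u16,
}

impl PrometheusContext {
    pub fn new(scrape_interval: u32, api_token: impl Into<String>, api_port: u16) -> Self {
        Self {
            scrape_interval,
            api_token: api_token.into(),
            api_port,
        }
    }

    /// Stand-in for Tera rendering: replaces the three `{{ name }}`
    /// placeholders by plain string substitution.
    pub fn render(&self, template: &str) -> String {
        template
            .replace("{{ scrape_interval }}", &self.scrape_interval.to_string())
            .replace("{{ api_token }}", &self.api_token)
            .replace("{{ api_port }}", &self.api_port.to_string())
    }
}
```

Calling `PrometheusContext::new(15, "MyToken", 1212).render(template)` fills in every occurrence of the three placeholders and leaves the rest of the template untouched.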
- Create PrometheusConfig struct in domain layer (src/domain/prometheus/)
- Add optional prometheus field to UserInputs (enabled by default)
- Implement comprehensive unit tests (5 tests covering defaults, serialization)
- Update all UserInputs constructors and test fixtures
- Configuration structure: Option<PrometheusConfig> for opt-in/opt-out behavior
- Default scrape interval: 15 seconds

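The enabled-by-default, opt-out behaviour from this commit can be sketched with an `Option<PrometheusConfig>` field. This is a minimal illustration, not the project's code: the constant name, the integer type, and the helpers `without_prometheus` and `prometheus_enabled` are hypothetical, and the real `UserInputs` has more fields.

```rust
/// Default scrape interval from the commit message (15 seconds).
pub const DEFAULT_SCRAPE_INTERVAL_SECS: u32 = 15;

#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusConfig {
    pub scrape_interval: u32,
}

impl Default for PrometheusConfig {
    fn default() -> Self {
        Self {
            scrape_interval: DEFAULT_SCRAPE_INTERVAL_SECS,
        }
    }
}

/// Reduced stand-in for the real `UserInputs`; only the Prometheus field is shown.
#[derive(Debug, Clone, PartialEq)]
pub struct UserInputs {
    /// `Some` means the service is enabled, `None` means the user opted out.
    pub prometheus: Option<PrometheusConfig>,
}

impl Default for UserInputs {
    /// Enabled by default: a fresh configuration includes Prometheus.
    fn default() -> Self {
        Self {
            prometheus: Some(PrometheusConfig::default()),
        }
    }
}

impl UserInputs {
    /// Hypothetical helper modelling a config file with the section removed.
    pub fn without_prometheus() -> Self {
        Self { prometheus: None }
    }

    pub fn prometheus_enabled(&self) -> bool {
        self.prometheus.is_some()
    }
}
```

Encoding presence as `Option` means there is no separate `enabled` flag to keep in sync with the configuration values.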
- Create PrometheusConfigRenderer to load and render prometheus.yml.tera
- Add PrometheusTemplate wrapper for Tera integration
- Implement PrometheusProjectGenerator to orchestrate rendering
- Extract context from PrometheusConfig and TrackerConfig
- Add 12 unit tests with comprehensive coverage
- All linters passing (markdown, yaml, toml, cspell, clippy, rustfmt, shellcheck)

- Add prometheus_config field to DockerComposeContext
- Implement with_prometheus() builder method
- Add conditional Prometheus service to docker-compose.yml.tera
- Use bind mount for Prometheus config: ./storage/prometheus/etc:/etc/prometheus:Z
- Add 4 unit tests for Prometheus service rendering (with/without config)
- All linters passing

… workflow

- Create RenderPrometheusTemplatesStep for Prometheus config rendering
- Add render_prometheus_templates() method to ReleaseCommandHandler
- Prometheus templates rendered independently at Step 5 (after tracker, before docker-compose)
- Add RenderPrometheusTemplates variant to ReleaseStep enum
- Extend EnvironmentTestBuilder with with_prometheus_config() method
- Export PrometheusProjectGeneratorError from prometheus module
- Fix architectural issue: Each service now renders its templates independently
- Docker Compose step only adds Prometheus config to context (no template rendering)

This follows the architectural principle that each service should render its templates independently in the release handler. Docker Compose templates are NOT the master templates; they only define service orchestration. The environment configuration is the source of truth for which services are enabled.

All tests passing (1507 tests), all linters passing.

- Create Ansible playbooks for Prometheus storage and config deployment
  - create-prometheus-storage.yml: Creates /opt/torrust/storage/prometheus/etc
  - deploy-prometheus-config.yml: Deploys prometheus.yml with verification
- Create application steps following tracker pattern
  - CreatePrometheusStorageStep: Executes create-prometheus-storage playbook
  - DeployPrometheusConfigStep: Executes deploy-prometheus-config playbook
- Register playbooks in AnsibleProjectGenerator (16 total)
- Register steps in application/steps/application module
- Add release handler methods with conditional execution
  - create_prometheus_storage(): Step 5 in workflow
  - deploy_prometheus_config_to_remote(): Step 7 in workflow
- Add ReleaseStep enum variants for Prometheus operations
- Add PrometheusStorageCreation error variant with help text
- Update workflow to 9 steps total
- All linters passing, all tests passing (1507 tests)

Each service now independently handles storage creation and config deployment.

…us (Phase 7)

- Add Prometheus configuration file validation to release tests
  - Create PrometheusConfigValidator for remote file verification
  - Validates prometheus.yml exists at /opt/torrust/storage/prometheus/etc/
  - Checks file permissions and ownership
- Refactor validation with ServiceValidation struct for extensibility
  - Replace boolean parameter with flags struct for future services (Grafana, etc.)
  - Supports selective validation based on enabled services
- Update e2e-deployment environment to include Prometheus
  - Add prometheus config with 15s scrape_interval
  - Create e2e-deployment-no-prometheus.json for disabled scenario testing
- Manual E2E testing completed and verified:
  - ✅ Prometheus container running (prom/prometheus:v3.0.1)
  - ✅ Both tracker endpoints (stats & metrics) scraped successfully
  - ✅ Prometheus UI accessible and functional
  - ✅ Metrics collection verified over time
- Add comprehensive manual testing documentation
  - Created docs/e2e-testing/manual/prometheus-verification.md
  - Documents verification steps for container, config, targets, UI, and metrics
  - Includes troubleshooting guide for common issues
  - Provides success criteria checklist
- All linters passing, all E2E tests passing (1507+ tests)

Architecture validated: Independent service rendering pattern working correctly with Prometheus fully integrated into deployment workflow.

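The move from a boolean parameter to a flags struct, as described in this commit, might look roughly like the sketch below. Only the `ServiceValidation` name comes from the PR; the `prometheus` field name, the `Check` enum, and `selected_checks` are hypothetical and exist only to show how selective validation could be driven by the flags.

```rust
/// One flag per optional service. Adding a future service (for example
/// Grafana) means adding a field here instead of another boolean parameter.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ServiceValidation {
    /// Assumed field name: validate Prometheus only when it is enabled.
    pub prometheus: bool,
}

/// Hypothetical list of checks a release validation run could perform.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Check {
    PrometheusConfigFile,
}

/// Returns the checks to run for the enabled services.
pub fn selected_checks(services: ServiceValidation) -> Vec<Check> {
    let mut checks = Vec::new();
    if services.prometheus {
        checks.push(Check::PrometheusConfigFile);
    }
    checks
}
```

A call site such as `selected_checks(ServiceValidation { prometheus: true })` names the service explicitly, which a bare `true` argument would not.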
- Create ADR documenting Prometheus integration architectural decisions:
  - Enabled-by-default with opt-out approach (monitoring best practice)
  - Independent template rendering pattern (each service renders own templates)
  - ServiceValidation struct for extensible E2E testing (supports future services)
  - Document alternatives considered and consequences
- Update user guide with Prometheus configuration section:
  - Document prometheus.scrape_interval configuration
  - Explain enabled-by-default behavior and opt-out pattern
  - Add Prometheus UI access instructions (port 9090)
  - Link to manual verification guide for detailed testing
- Add technical terms to project dictionary:
  - Alertmanager, entr, flatlined, promtool, tulpn
- All linters passing, all tests passing (1507+ tests)

Documentation completes Phase 8 of issue #238 implementation.

All goals achieved:
- ✅ Prometheus service conditionally added to docker-compose
- ✅ Configuration template with tracker metrics endpoints
- ✅ Environment schema extended for Prometheus monitoring
- ✅ Service dependencies configured properly
- ✅ Included in templates by default (enabled-by-default)
- ✅ Users can disable by removing config section (opt-out)
- ✅ Deployed and verified collecting metrics successfully

Summary: 8 phases complete across 8 commits (2ca0fa9 through 2a820e2). Prometheus fully integrated with metrics collection, E2E validation, and comprehensive documentation.

Change Prometheus template path from:
build/{env}/storage/prometheus/etc/prometheus.yml
To:
build/{env}/prometheus/prometheus.yml
This aligns with the tracker pattern (build/{env}/tracker/tracker.toml)
making it easier to identify all services in the build directory.
Changes:
- Update PROMETHEUS_SUBFOLDER constant from 'storage/prometheus/etc' to 'prometheus'
- Update PrometheusProjectGenerator to use simplified path
- Update Ansible deploy playbook to copy from new build location
- Update all unit tests to use new path structure
- Update documentation references
The VM deployment path remains unchanged at:
/opt/torrust/storage/prometheus/etc/prometheus.yml
This is the actual location inside the VM where the file is deployed
and mounted into the Prometheus container.
All tests passing (1507+ tests, E2E tests verified).
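The split between the local build path and the unchanged VM path can be sketched as below. `PROMETHEUS_SUBFOLDER` and both paths are taken from the commit message; the other constant names and the `prometheus_build_path` function are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// New value of the constant (previously "storage/prometheus/etc").
pub const PROMETHEUS_SUBFOLDER: &str = "prometheus";

/// Assumed name for the rendered file constant.
pub const PROMETHEUS_OUTPUT_FILE: &str = "prometheus.yml";

/// Deployment location inside the VM, unchanged by this commit.
pub const REMOTE_PROMETHEUS_CONFIG: &str = "/opt/torrust/storage/prometheus/etc/prometheus.yml";

/// Builds `<build_dir>/<env>/prometheus/prometheus.yml`.
pub fn prometheus_build_path(build_dir: &Path, env_name: &str) -> PathBuf {
    build_dir
        .join(env_name)
        .join(PROMETHEUS_SUBFOLDER)
        .join(PROMETHEUS_OUTPUT_FILE)
}
```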
…uplication

- Moved manual-testing.md -> manual/README.md (generic workflow)
- Moved manual-testing-mysql.md -> manual/mysql-verification.md (MySQL-specific)
- Restructured mysql-verification.md to focus only on verification and troubleshooting
- Removed duplicated deployment workflow from MySQL guide (now only in README)
- Added 'Debugging with Application Logs' section to generic README
- Added service-specific verification section to README with links
- Updated docs/e2e-testing/README.md to reference new structure
- Fixed markdown linting issues (heading levels, code block formatting)

This follows DRY principle: common workflow in README, service-specific verification in dedicated files (MySQL, Prometheus).

…ervice-specific guide

- Created docs/user-guide/services/ directory for service documentation
- Moved Prometheus configuration and usage details to services/prometheus.md
- Added services/README.md explaining the directory structure and purpose
- Updated main user guide to remove Prometheus-specific content
- Added Services section to user guide with links to service guides
- Simplified configuration examples to show only core fields

This improves maintainability by:
- Keeping generic deployment workflow in main guide
- Isolating service-specific details in dedicated guides
- Making it easier to add/remove services from the stack
- Following DRY principle established in E2E testing docs

- Created docs/user-guide/security.md with detailed security information
- Explains automatic firewall configuration by configure command
- Clarifies why firewall is critical (protects Prometheus/MySQL ports)
- Documents E2E testing vs production security differences
- Includes SSH security, Docker security, network security sections
- Provides production security checklist
- Added brief security section to main user guide with link

Key security feature documented: The configure command automatically sets up UFW firewall on VMs to protect internal services (Prometheus port 9090, MySQL port 3306) while keeping tracker services publicly accessible. This is critical because Docker Compose exposes these ports, which would be publicly accessible without firewall protection.

Note: E2E tests use containers (no firewall needed), but production deployments use VMs with automatic firewall configuration.

…tern

All template renderers now consistently use three constants:
- TEMPLATE_FILE: Source template filename (e.g., 'inventory.yml.tera')
- OUTPUT_FILE: Rendered output filename (e.g., 'inventory.yml')
- TEMPLATE_DIR: Template directory path (e.g., 'ansible')

Updated renderers:
- InventoryRenderer: Added ANSIBLE_TEMPLATE_DIR constant
- VariablesRenderer: Added ANSIBLE_TEMPLATE_DIR constant
- DockerComposeRenderer: Already had all three constants
- EnvRenderer: Added DOCKER_COMPOSE_TEMPLATE_DIR constant
- PrometheusConfigRenderer: Split PROMETHEUS_TEMPLATE_PATH into three constants
- TrackerConfigRenderer: Split TRACKER_TEMPLATE_PATH into three constants
- CloudInitRenderer: Already had all three constants

Benefits:
- Consistent pattern across all renderers (same responsibility, same structure)
- Easier to understand template source and destination paths
- More maintainable when adding new renderers
- Clear separation of concerns (filename vs directory path)

All tests passing (1509 tests).

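A renderer following the three-constant pattern could be laid out as in this sketch. The generic constant names and the template location `templates/prometheus/prometheus.yml.tera` appear in the PR; placing the constants as associated constants and the two path helpers are assumptions, since the real renderers may declare them at module level under prefixed names.

```rust
use std::path::{Path, PathBuf};

/// Minimal stand-in for a renderer; the real type also loads and renders
/// the template with Tera.
pub struct PrometheusConfigRenderer;

impl PrometheusConfigRenderer {
    /// Source template filename.
    pub const TEMPLATE_FILE: &'static str = "prometheus.yml.tera";
    /// Rendered output filename.
    pub const OUTPUT_FILE: &'static str = "prometheus.yml";
    /// Directory of the template, relative to the templates root.
    pub const TEMPLATE_DIR: &'static str = "prometheus";

    /// Hypothetical helper: where the template is read from.
    pub fn template_path(templates_root: &Path) -> PathBuf {
        templates_root
            .join(Self::TEMPLATE_DIR)
            .join(Self::TEMPLATE_FILE)
    }

    /// Hypothetical helper: where the rendered file is written.
    pub fn output_path(output_dir: &Path) -> PathBuf {
        output_dir.join(Self::OUTPUT_FILE)
    }
}
```

Keeping filename and directory separate lets the source and destination paths be composed independently, which is the separation the commit lists as a benefit.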
- Added PrometheusValidator for SSH-based smoke testing via curl to localhost:9090
- Added ServiceValidation struct for conditional validation (matches release validation pattern)
- Added PrometheusValidationFailed error with comprehensive troubleshooting help
- Updated run_run_validation to conditionally validate Prometheus when enabled
- Renamed validate_running_services to validate_external_services for clarity
  * External services: tracker API, HTTP tracker (exposed, no SSH)
  * Internal services: Prometheus (port 9090, firewall-blocked, SSH required)
- Updated E2E tests to validate Prometheus smoke test functionality
- All E2E tests passing (deployment workflow validated Prometheus successfully)

- Grouped imports properly (external → internal)
- Main type (DockerComposeContext) with implementation comes first
- Helper types (TrackerPorts, DatabaseConfig, MysqlConfig) follow with their implementations
- Tests remain at the end

Follows docs/contributing/module-organization.md conventions:
- Imports always first
- Public before private
- High-level before low-level
- Important before secondary

- Format multi-line string format expressions consistently
- Break long format! calls into multiple lines for readability

Automatic formatting applied by rustfmt with new Rust version

- Changed DockerComposeContext from flattened port fields to composed TrackerPorts field
- Added Serialize derive to TrackerPorts for proper JSON serialization
- Simplified constructors to use direct field move instead of destructuring
- Replaced 3 individual accessor methods with single ports() getter
- Updated Tera template to use nested access pattern (ports.field_name)
- Maintains all existing test contracts (9/9 tests passing)

Benefits:
- Single source of truth for port structure (DRY principle)
- Better type safety and semantic grouping
- Easier maintenance: change port structure in one place
- More idiomatic Rust composition pattern

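The composition change can be pictured with the sketch below, where the context owns one `TrackerPorts` value and exposes it through a single `ports()` getter. The three field names inside `TrackerPorts` are hypothetical (the commit does not list them), the `new` constructor is illustrative, and the `Serialize` derive mentioned above is omitted to keep the example free of external crates.

```rust
/// Groups all tracker ports in one place. Field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TrackerPorts {
    pub udp_tracker_ports: Vec<u16>,
    pub http_tracker_ports: Vec<u16>,
    pub http_api_port: u16,
}

/// Reduced context: the port fields are composed, not flattened.
#[derive(Debug, Clone, PartialEq)]
pub struct DockerComposeContext {
    ports: TrackerPorts,
}

impl DockerComposeContext {
    /// Moves the ports value in directly, with no destructuring.
    pub fn new(ports: TrackerPorts) -> Self {
        Self { ports }
    }

    /// Single getter replacing the three per-port accessors.
    pub fn ports(&self) -> &TrackerPorts {
        &self.ports
    }
}
```

On the template side this corresponds to the nested access pattern mentioned in the commit, for example `ports.http_api_port` under the field names assumed here.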
- Introduce DockerComposeContextBuilder for flexible, scalable configuration
- Replace specialized constructors (new_sqlite, new_mysql) with unified builder() entry point
- Default to SQLite database configuration (zero-config common case)
- Explicit database override via with_mysql() method
- Consistent optional feature pattern: all use with_* methods
- Move Prometheus configuration to builder chain
- Update all test code and production code to use new builder API

BREAKING CHANGES:
- Removed: DockerComposeContext::new_sqlite(ports)
- Removed: DockerComposeContext::new_mysql(root_password, database, user, password, port, ports)
- Removed: context.with_prometheus(config) post-construction method
- New API: DockerComposeContext::builder(ports).build()
- MySQL: builder(ports).with_mysql(...).build()
- Prometheus: builder(ports).with_prometheus(config).build()

Rationale: Previous constructor-based API was "chaotic" with:
- Two specialized constructors forcing early database choice
- One mutable method (with_prometheus) inconsistent with immutable constructors
- Poor scalability: future features (Grafana, Redis) would explode constructor count
- Confusing API: when to use constructors vs methods?

Builder pattern provides:
- Single entry point scales infinitely
- SQLite default handles common case with zero config
- Explicit database choice via with_mysql()
- All optional features via consistent with_* pattern
- Clean immutable result after build()
- Self-documenting API (builder chain reads like English)
- Future-proof: easy to add with_grafana(), with_redis(), etc.

This is a complete refactoring with no backward compatibility, justified by:
- Project in early development phase
- Not used in production by end users yet
- Clean migration without technical debt from deprecated APIs
- Better developer experience for future contributors

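A minimal sketch of the builder API named in this commit follows. The method names `builder`, `with_mysql`, `with_prometheus`, and `build` come from the commit message; everything else is an assumption: the reduced `TrackerPorts`, the `DatabaseConfig` enum layout, the getters, and the `with_mysql` parameter list, which simply mirrors the removed `new_mysql` constructor.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct TrackerPorts {
    pub http_api_port: u16, // hypothetical field
}

#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusConfig {
    pub scrape_interval: u32,
}

/// Assumed shape of the database choice.
#[derive(Debug, Clone, PartialEq)]
pub enum DatabaseConfig {
    Sqlite,
    Mysql { root_password: String, database: String, user: String, password: String, port: u16 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct DockerComposeContext {
    ports: TrackerPorts,
    database: DatabaseConfig,
    prometheus_config: Option<PrometheusConfig>,
}

impl DockerComposeContext {
    /// Single entry point: SQLite by default, no optional services.
    pub fn builder(ports: TrackerPorts) -> DockerComposeContextBuilder {
        DockerComposeContextBuilder { ports, database: DatabaseConfig::Sqlite, prometheus_config: None }
    }
    pub fn ports(&self) -> &TrackerPorts { &self.ports }
    pub fn database(&self) -> &DatabaseConfig { &self.database }
    pub fn prometheus_config(&self) -> Option<&PrometheusConfig> { self.prometheus_config.as_ref() }
}

pub struct DockerComposeContextBuilder {
    ports: TrackerPorts,
    database: DatabaseConfig,
    prometheus_config: Option<PrometheusConfig>,
}

impl DockerComposeContextBuilder {
    /// Explicit database override.
    pub fn with_mysql(mut self, root_password: &str, database: &str, user: &str, password: &str, port: u16) -> Self {
        self.database = DatabaseConfig::Mysql {
            root_password: root_password.to_string(),
            database: database.to_string(),
            user: user.to_string(),
            password: password.to_string(),
            port,
        };
        self
    }

    /// Optional feature, same `with_*` shape as the database override.
    pub fn with_prometheus(mut self, config: PrometheusConfig) -> Self {
        self.prometheus_config = Some(config);
        self
    }

    /// Consumes the builder and returns the finished context.
    pub fn build(self) -> DockerComposeContext {
        DockerComposeContext { ports: self.ports, database: self.database, prometheus_config: self.prometheus_config }
    }
}
```

Because each `with_*` method takes and returns the builder by value, optional features can be chained in any order and the final context exposes no setters.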
- Extract helper methods to reduce complexity of execute() method
- Separate database context creation from Prometheus configuration
- Follow top-down, public-first module organization principles
- Convert database context creation methods to associated functions (use Self::)
- Database methods now return builder instead of final context for composition

Key improvements:
- execute() now clearly shows workflow: extract data → create contexts → apply prometheus → build
- create_sqlite_contexts() and create_mysql_contexts() are pure functions focused on database config
- apply_prometheus_config() applied once after database context creation (not inside each method)
- Prometheus configuration independent of database choice (better separation of concerns)

Method organization (no section headers):
1. new() - public constructor
2. execute() - main public method
3. Helper methods in logical order of usage
4. Associated functions at the end (extract_tracker_ports)

This refactoring addresses the "chaotic" execute() method identified earlier by:
- Reducing method from ~118 lines to ~40 lines
- Extracting focused helper methods with single responsibilities
- Making the control flow clear and readable
- Improving testability through smaller, isolated functions

…nt delegation

This refactoring addresses two related architectural concerns:

Phase 1 - Application Layer (Law of Demeter):
- Added 4 convenience methods to Environment: database_config(), tracker_config(), admin_token(), prometheus_config()
- Updated docker_compose_templates.rs to use new methods instead of chaining through context
- Eliminates method chaining violations like self.environment.context().user_inputs.tracker.core.database

Phase 2 - Domain Layer (Consistent Delegation):
- Added 13 accessor methods to EnvironmentContext for all directly-accessed fields
- Updated 11 Environment methods to consistently delegate to context methods
- Changes pattern from direct access (self.context.user_inputs.field) to delegation (self.context.field())

Benefits:
- Single source of truth: EnvironmentContext controls field access
- Better encapsulation: Environment doesn't know context's internal structure
- Easier maintenance: Structural changes only affect EnvironmentContext
- Consistent API: All Environment methods follow same delegation pattern
- Improved testability: Can mock EnvironmentContext methods independently

All 1509 tests passing, pre-commit checks passed.

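The delegation pattern from both phases can be reduced to the sketch below, shown for `prometheus_config()` only. The type and method names come from the commit message; the struct layouts, return types, and the `Environment::new` constructor are simplified assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusConfig {
    pub scrape_interval: u32,
}

/// Reduced stand-in; the real struct holds all user-provided settings.
pub struct UserInputs {
    prometheus: Option<PrometheusConfig>,
}

pub struct EnvironmentContext {
    user_inputs: UserInputs,
}

impl EnvironmentContext {
    /// Phase 2: the context is the only place that knows its field layout.
    pub fn prometheus_config(&self) -> Option<&PrometheusConfig> {
        self.user_inputs.prometheus.as_ref()
    }
}

pub struct Environment {
    context: EnvironmentContext,
}

impl Environment {
    /// Hypothetical constructor used only to build the example.
    pub fn new(prometheus: Option<PrometheusConfig>) -> Self {
        Self {
            context: EnvironmentContext {
                user_inputs: UserInputs { prometheus },
            },
        }
    }

    /// Phase 1: callers ask the environment directly and never chain
    /// through `context().user_inputs...`.
    pub fn prometheus_config(&self) -> Option<&PrometheusConfig> {
        self.context.prometheus_config()
    }
}
```

If `UserInputs` is later restructured, only `EnvironmentContext::prometheus_config` needs to change; callers of `Environment::prometheus_config` are unaffected.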
ACK c8962e1
Overview
Implements issue #238 - Adds Prometheus as a metrics collection service for the Torrust Tracker deployment.
Changes
This PR adds complete Prometheus integration across 8 phases:
Phase 1: Template Structure & Data Flow Design (2ca0fa9)
- PrometheusContext struct for template rendering
- templates/prometheus/prometheus.yml.tera template

Phase 2: Environment Configuration (92aab59)
- PrometheusConfig domain struct in src/domain/prometheus/
- prometheus field to UserInputs (enabled by default)

Phase 3: Prometheus Template Renderer (731eaf4)
- PrometheusConfigRenderer to load and render templates
- PrometheusTemplate wrapper for Tera integration
- PrometheusProjectGenerator for rendering workflow

Phase 4: Docker Compose Integration (22790de)
- prometheus_config: Option<PrometheusConfig> to DockerComposeContext
- with_prometheus() method for builder pattern
- Conditional Prometheus service in docker-compose.yml.tera
- Bind mount for Prometheus config: ./storage/prometheus/etc:/etc/prometheus:Z

Phase 5: Release Command Integration (f20d45c)
- RenderPrometheusTemplatesStep in release handler
- EnvironmentTestBuilder with with_prometheus_config()

Phase 6: Ansible Deployment (9c1b91a)
- create-prometheus-storage.yml - Creates directory structure
- deploy-prometheus-config.yml - Deploys configuration with verification
- Playbooks registered in AnsibleProjectGenerator (16 total)

Phase 7: Testing & Verification (a257fcf)
- ServiceValidation struct for extensibility
- PrometheusConfigValidator for file verification

Phase 8: Documentation (2a820e2)
- docs/decisions/prometheus-integration-pattern.md

Key Features
✅ Enabled by Default: Prometheus included in generated environment templates
✅ Opt-Out Available: Users can disable by removing the prometheus section
✅ Configuration-Driven: Service presence controlled by config section existence
✅ Independent Rendering: Each service renders its templates independently
✅ Extensible Testing: ServiceValidation pattern supports future services (Grafana, etc.)
Configuration
{
  "prometheus": {
    "scrape_interval": 15
  }
}

To disable: Remove the entire prometheus section from environment config.

Architecture
- prometheus section → Service enabled

Testing
Documentation
- docs/decisions/prometheus-integration-pattern.md
- docs/user-guide/README.md
- docs/e2e-testing/manual/prometheus-verification.md
- docs/issues/238-prometheus-slice-release-run-commands.md

Related Issues
Closes #238
Parent Epic: #216
Checklist