
Commit 47d708a

Merge #239: feat: add Prometheus integration for metrics collection (#238)
c8962e1 refactor: [#238] fix Law of Demeter violations and implement consistent delegation (Jose Celano)
a624d80 refactor: simplify execute() method in RenderDockerComposeTemplatesStep (Jose Celano)
f2c3cb0 refactor: implement Builder Pattern for DockerComposeContext (Jose Celano)
010e6fc refactor: use composition for TrackerPorts in DockerComposeContext (Jose Celano)
b056f2a style: apply rustfmt formatting to template renderers (Jose Celano)
22ef8ae refactor: reorganize docker_compose context module following conventions (Jose Celano)
cbeac26 feat: [#238] add Prometheus smoke test validation after run command (Jose Celano)
b79b436 chore: [#238] add Traefik to project dictionary and fix formatting (Jose Celano)
4292c9a refactor: [#238] standardize renderer constants to use three-part pattern (Jose Celano)
8638430 docs: [#238] add comprehensive security guide for production deployments (Jose Celano)
3eb96f9 docs: [#238] refactor user guide to extract Prometheus content into service-specific guide (Jose Celano)
254a9a8 docs: [#238] refactor manual E2E testing documentation to eliminate duplication (Jose Celano)
c0e3192 refactor: simplify Prometheus build directory structure (Jose Celano)
1bdd612 docs: [#238] mark issue complete - all 8 phases implemented (Jose Celano)
2a820e2 feat: [#238] add Prometheus integration documentation (Phase 8) (Jose Celano)
a257fcf feat: [#238] add E2E validation and manual testing guide for Prometheus (Phase 7) (Jose Celano)
2f33fe0 docs: [#238] mark Phase 6 complete with commit hash (Jose Celano)
9c1b91a feat: [#238] add Ansible deployment for Prometheus (Phase 6) (Jose Celano)
f20d45c feat: [#238] add independent Prometheus template rendering in release workflow (Jose Celano)
22790de feat: [#238] integrate Prometheus with Docker Compose (Jose Celano)
731eaf4 feat: [#238] implement Prometheus template renderer (Jose Celano)
92aab59 feat: [#238] add Prometheus domain configuration (Jose Celano)
2ca0fa9 feat: [#238] add PrometheusContext for template rendering (Jose Celano)
5ae2a69 feat: [#238] add Prometheus configuration template (Jose Celano)
64b1ae4 docs: [#238] update Prometheus slice spec with correct architecture (Jose Celano)

Pull request description:

## Overview

Implements issue #238 - Adds Prometheus as a metrics collection service for the Torrust Tracker deployment.

## Changes

This PR adds complete Prometheus integration across 8 phases:

### Phase 1: Template Structure & Data Flow Design (2ca0fa9)

- Created `PrometheusContext` struct for template rendering
- Implemented module structure following existing patterns
- Added `templates/prometheus/prometheus.yml.tera` template
- 5 comprehensive unit tests

### Phase 2: Environment Configuration (92aab59)

- Created `PrometheusConfig` domain struct in `src/domain/prometheus/`
- Added optional `prometheus` field to `UserInputs` (enabled by default)
- Updated constructors and test fixtures
- 5 comprehensive unit tests

### Phase 3: Prometheus Template Renderer (731eaf4)

- Created `PrometheusConfigRenderer` to load and render templates
- Implemented `PrometheusTemplate` wrapper for Tera integration
- Created `PrometheusProjectGenerator` for rendering workflow
- 12 comprehensive unit tests with full coverage

### Phase 4: Docker Compose Integration (22790de)

- Added `prometheus_config: Option<PrometheusConfig>` to `DockerComposeContext`
- Implemented `with_prometheus()` method for builder pattern
- Added conditional Prometheus service to `docker-compose.yml.tera`
- Bind mount: `./storage/prometheus/etc:/etc/prometheus:Z`
- 4 comprehensive unit tests

### Phase 5: Release Command Integration (f20d45c)

- **Fixed**: Moved Prometheus template rendering to independent step
- Created `RenderPrometheusTemplatesStep` in release handler
- Prometheus templates rendered independently (Step 5 of 9)
- Extended `EnvironmentTestBuilder` with `with_prometheus_config()`
- **Architectural Principle**: Each service renders templates independently

### Phase 6: Ansible Deployment (9c1b91a)

- Created Ansible playbooks for Prometheus storage and config deployment
  - `create-prometheus-storage.yml` - Creates directory structure
  - `deploy-prometheus-config.yml` - Deploys configuration with verification
- Registered playbooks in `AnsibleProjectGenerator` (16 total)
- Updated release handler with Prometheus deployment steps

### Phase 7: Testing & Verification (a257fcf)

- Refactored validation with `ServiceValidation` struct for extensibility
- Created `PrometheusConfigValidator` for file verification
- Updated E2E tests to use ServiceValidation pattern
- Created test environments (with/without Prometheus)
- Manual E2E testing completed successfully:
  - ✅ Prometheus container running
  - ✅ Both tracker endpoints scraped successfully
  - ✅ Prometheus UI accessible
  - ✅ Metrics being collected
- Created comprehensive manual testing documentation (450+ lines)

### Phase 8: Documentation (2a820e2)

- Created ADR: `docs/decisions/prometheus-integration-pattern.md`
  - Documents enabled-by-default with opt-out approach
  - Explains independent template rendering pattern
  - Documents ServiceValidation struct for extensibility
- Updated user guide with Prometheus configuration section
- Added technical terms to project dictionary

## Key Features

✅ **Enabled by Default**: Prometheus included in generated environment templates
✅ **Opt-Out Available**: Users can disable by removing the `prometheus` section
✅ **Configuration-Driven**: Service presence controlled by config section existence
✅ **Independent Rendering**: Each service renders its templates independently
✅ **Extensible Testing**: ServiceValidation pattern supports future services (Grafana, etc.)

## Configuration

```json
{
  "prometheus": {
    "scrape_interval": 15
  }
}
```

**To disable**: Remove the entire `prometheus` section from environment config.

## Architecture

- **DDD Layers**: Domain + Infrastructure + Application
- **Pattern**: Template System with Project Generator pattern
- **Service Detection**: Presence of `prometheus` section → Service enabled
- **Dependencies**: Prometheus depends on tracker service

## Testing

- ✅ All linters passing
- ✅ 1507+ tests passing
- ✅ E2E validation (automated + manual)
- ✅ Manual verification with real LXD VM deployment

## Documentation

- ADR: `docs/decisions/prometheus-integration-pattern.md`
- User Guide: `docs/user-guide/README.md`
- Manual Testing: `docs/e2e-testing/manual/prometheus-verification.md`
- Issue Tracking: `docs/issues/238-prometheus-slice-release-run-commands.md`

## Related Issues

Closes #238
Parent Epic: #216

## Checklist

- [x] All goals achieved
- [x] Code follows project conventions
- [x] All tests passing (unit + E2E)
- [x] All linters passing
- [x] Documentation complete (ADR + user guide + manual verification)
- [x] Manual testing completed successfully

ACKs for top commit:
  josecelano:
    ACK c8962e1

Tree-SHA512: 26e0e19a9a76d8457aedba7eb7f8cf41ae1a9d3f2a79c9106e35caa51c552efae1680b159bb865a84884e8e64c4071c940f043a017cb6b9588c3bad45857bbb7
2 parents f0969fa + c8962e1 commit 47d708a

56 files changed: +5223 -555 lines changed

docs/decisions/prometheus-integration-pattern.md

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
# Decision: Prometheus Integration Pattern - Enabled by Default with Opt-Out

## Status

Accepted

## Date

2025-01-22

## Context

The tracker deployment system needed to add Prometheus as a metrics collection service. Several design decisions were required:

1. **Enablement Strategy**: Should Prometheus be mandatory, opt-in, or enabled-by-default?
2. **Template Rendering**: How should Prometheus templates be rendered in the release workflow?
3. **Service Validation**: How should E2E tests validate optional services like Prometheus?

The decision impacts:

- User experience (ease of getting started with monitoring)
- System architecture (template rendering patterns)
- Testing patterns (extensibility for future optional services)

## Decision

### 1. Enabled-by-Default with Opt-Out

Prometheus is **included by default** in generated environment templates but can be disabled by removing the configuration section.

**Implementation**:

```rust
pub struct UserInputs {
    pub prometheus: Option<PrometheusConfig>, // Some by default, None to disable
}
```

**Configuration**:

```json
{
  "prometheus": {
    "scrape_interval": 15
  }
}
```

**Disabling**: Remove the entire `prometheus` section from the environment config.

**Rationale**:

- Monitoring is a best practice - users should get it by default
- Opt-out is simple - just remove the config section
- No complex feature flags or enablement parameters needed
- Follows principle of least surprise (monitoring expected for production deployments)

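
The opt-out rule can be sketched in a few lines of Rust. This is a minimal illustration, not the project's code: the `PrometheusConfig` fields, the `Default` implementation, and the helpers `prometheus_enabled` and `without_prometheus` are assumptions made for the example, as is treating `scrape_interval` as seconds.

```rust
/// Hypothetical, simplified stand-in for the domain config named in this ADR.
#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusConfig {
    pub scrape_interval: u32, // unit assumed to be seconds in this sketch
}

/// Simplified `UserInputs`; the real struct has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct UserInputs {
    pub prometheus: Option<PrometheusConfig>,
}

impl Default for UserInputs {
    /// Generated templates include the section, so the default is `Some`.
    fn default() -> Self {
        Self {
            prometheus: Some(PrometheusConfig { scrape_interval: 15 }),
        }
    }
}

impl UserInputs {
    /// Presence of the section is the only switch; there is no `enabled` flag.
    pub fn prometheus_enabled(&self) -> bool {
        self.prometheus.is_some()
    }

    /// Hypothetical helper that models deleting the `prometheus` section.
    pub fn without_prometheus(mut self) -> Self {
        self.prometheus = None;
        self
    }
}
```

With these definitions, `UserInputs::default().prometheus_enabled()` is `true`, and calling `without_prometheus()` first makes it `false`. The service has exactly two states, present or absent, which is what makes a separate enablement flag unnecessary.
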
### 2. Independent Template Rendering Pattern

Each service renders its templates **independently** in the release handler, not from within another service's template rendering step.

**Architecture**:

```text
ReleaseCommandHandler::execute()
  ├─ Step 1: Create tracker storage
  ├─ Step 2: Render tracker templates (tracker/*.toml)
  ├─ Step 3: Deploy tracker configs
  ├─ Step 4: Create Prometheus storage (if enabled)
  ├─ Step 5: Render Prometheus templates (prometheus.yml) - INDEPENDENT STEP
  ├─ Step 6: Deploy Prometheus configs
  ├─ Step 7: Render Docker Compose templates (docker-compose.yml)
  └─ Step 8: Deploy compose files
```

**Rationale**:

- Each service is responsible for its own template rendering
- Docker Compose templates only define service orchestration, not content generation
- Environment configuration is the source of truth for which services are enabled
- Follows Single Responsibility Principle (each step does one thing)
- Makes it easy to add future services (Grafana, Alertmanager, etc.)

**Anti-Pattern Avoided**: Rendering Prometheus templates from within the Docker Compose template rendering step.

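
The step ordering in the diagram can be expressed as a flat plan in which the Prometheus steps sit next to the Docker Compose steps and not inside them. The sketch below is illustrative only: `plan_release_steps` is a hypothetical function, and it assumes that all three Prometheus steps (not just step 4) are skipped when the `prometheus` section is absent.

```rust
/// Hypothetical stand-in for the Prometheus section of the environment config.
pub struct PrometheusConfig {
    pub scrape_interval: u32,
}

/// Builds the ordered list of release steps shown in the diagram.
/// Assumption: every Prometheus step is skipped when the section is absent.
pub fn plan_release_steps(prometheus: Option<&PrometheusConfig>) -> Vec<&'static str> {
    let mut steps = vec![
        "Create tracker storage",
        "Render tracker templates",
        "Deploy tracker configs",
    ];

    // Prometheus owns its own steps; none of them is nested inside
    // the Docker Compose rendering step.
    if prometheus.is_some() {
        steps.extend([
            "Create Prometheus storage",
            "Render Prometheus templates",
            "Deploy Prometheus configs",
        ]);
    }

    // Docker Compose only orchestrates services; it renders no service config.
    steps.extend(["Render Docker Compose templates", "Deploy compose files"]);
    steps
}
```

Under these assumptions the plan has eight steps when Prometheus is configured and five when it is not. A future service such as Grafana would add its own conditional group in the same way, leaving the Docker Compose steps untouched.
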
### 3. ServiceValidation Struct for Extensible Testing

E2E validation uses a `ServiceValidation` struct with one boolean flag per optional service instead of a separate boolean function parameter for each service.

**Implementation**:

```rust
pub struct ServiceValidation {
    pub prometheus: bool,
    // Future: pub grafana: bool,
    // Future: pub alertmanager: bool,
}

// Signature only (body omitted)
pub fn run_release_validation(
    socket_addr: SocketAddr,
    ssh_credentials: &SshCredentials,
    services: Option<ServiceValidation>,
) -> Result<(), String>
```

**Rationale**:

- Extensible for future services without changing the function signature
- More semantic than boolean parameters
- Clear intent: `ServiceValidation { prometheus: true }`
- Follows Open-Closed Principle (open for extension, closed for modification)

**Anti-Pattern Avoided**: `run_release_validation_with_prometheus_check(addr, creds, true)` - too specific and not extensible.

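
A minimal sketch of how the struct keeps the validation entry point stable as services are added. The `grafana` field, the `Default` derive, the `planned_checks` helper, and the check names are all hypothetical; the real `run_release_validation` also takes a socket address and SSH credentials, which are left out here.

```rust
/// Mirrors the struct in this ADR, plus a hypothetical `grafana` field
/// to show how a second service would be added.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct ServiceValidation {
    pub prometheus: bool,
    pub grafana: bool, // hypothetical future field
}

/// Hypothetical helper: lists the checks a validation run would perform.
/// The check names are illustrative, not taken from the project.
pub fn planned_checks(services: Option<ServiceValidation>) -> Vec<&'static str> {
    // `None` is treated as "validate only the core deployment".
    let services = services.unwrap_or_default();

    let mut checks = vec!["compose files deployed"];
    if services.prometheus {
        checks.push("prometheus.yml deployed");
    }
    if services.grafana {
        checks.push("grafana config deployed");
    }
    checks
}
```

A caller that only cares about Prometheus can write `ServiceValidation { prometheus: true, ..Default::default() }`. With the struct-update syntax, adding another field later does not break that call site, whereas a literal that lists every field would need updating.
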
## Consequences

### Positive

1. **Better User Experience**:

   - Users get monitoring by default without manual setup
   - Simple opt-out (remove config section)
   - Production-ready deployments out of the box

2. **Cleaner Architecture**:

   - Each service manages its own templates independently
   - Clear separation of concerns in release handler
   - Easy to add future services (Grafana, Alertmanager, Loki, etc.)

3. **Extensible Testing**:

   - ServiceValidation struct easily extended for new services
   - Consistent pattern for optional service validation
   - Type-safe validation configuration

4. **Maintenance Benefits**:
   - Independent template rendering simplifies debugging
   - Each service's templates can be modified independently
   - Clear workflow steps make issues easier to trace

### Negative

1. **Default Overhead**:

   - Users who don't want monitoring must manually remove the section
   - Prometheus container always included in default deployments
   - Slightly more disk/memory usage for minimal deployments

2. **Configuration Discovery**:
   - Users must learn that removing the section disables the service
   - Not immediately obvious from JSON schema alone
   - Requires documentation of the opt-out pattern

### Risks

1. **Breaking Changes**: Future Prometheus config schema changes require careful migration planning
2. **Service Dependencies**: Adding services that depend on Prometheus requires proper ordering logic
3. **Template Complexity**: As the number of services grows, independent rendering steps must avoid duplicating logic

## Alternatives Considered

### Alternative 1: Mandatory Prometheus

**Approach**: Always deploy Prometheus, no opt-out.

**Rejected Because**:

- Forces monitoring on users who don't want it
- Increases minimum resource requirements
- Violates principle of least astonishment for minimal deployments

### Alternative 2: Opt-In with Feature Flag

**Approach**: Prometheus disabled by default, enabled with `"prometheus": { "enabled": true }`.

**Rejected Because**:

- Requires users to discover and enable monitoring manually
- Most production deployments should have monitoring - opt-in makes it less likely
- Adds complexity with enabled/disabled flags

### Alternative 3: Render Prometheus Templates from Docker Compose Step

**Approach**: Docker Compose template rendering step also renders Prometheus templates.

**Rejected Because**:

- Violates Single Responsibility Principle
- Makes Docker Compose step dependent on Prometheus internals
- Harder to add future services independently
- Couples service orchestration with service configuration

### Alternative 4: Boolean Parameters for Service Validation

**Approach**: `run_release_validation(addr, creds, check_prometheus: bool)`.

**Rejected Because**:

- Not extensible - adding Grafana requires API change
- Less semantic - what does `true` mean?
- Becomes unwieldy with multiple services
- Violates Open-Closed Principle

## Related Decisions

- [Template System Architecture](../technical/template-system-architecture.md) - Project Generator pattern
- [Environment Variable Injection](environment-variable-injection-in-docker-compose.md) - Configuration passing
- [DDD Layer Placement](../contributing/ddd-layer-placement.md) - Module organization

## References

- Issue: [#238 - Prometheus Slice - Release and Run Commands](../issues/238-prometheus-slice-release-run-commands.md)
- Manual Testing Guide: [Prometheus Verification](../e2e-testing/manual/prometheus-verification.md)
- Prometheus Documentation: https://prometheus.io/docs/
- torrust-demo Reference: Existing Prometheus integration patterns

docs/e2e-testing/README.md

Lines changed: 8 additions & 2 deletions
@@ -7,7 +7,10 @@ This guide explains how to run and understand the End-to-End (E2E) tests for the
 - **[README.md](README.md)** - This overview and quick start guide
 - **[architecture.md](architecture.md)** - E2E testing architecture, design decisions, and Docker strategy
 - **[running-tests.md](running-tests.md)** - How to run automated tests, command-line options, and prerequisites
-- **[manual-testing.md](manual-testing.md)** - Complete guide for running manual E2E tests with CLI commands
+- **[manual/](manual/)** - Manual E2E testing guides:
+  - **[README.md](manual/README.md)** - Complete manual test workflow (generic deployment guide)
+  - **[mysql-verification.md](manual/mysql-verification.md)** - MySQL service verification and troubleshooting
+  - **[prometheus-verification.md](manual/prometheus-verification.md)** - Prometheus metrics verification and troubleshooting
 - **[test-suites.md](test-suites.md)** - Detailed description of each test suite and what they validate
 - **[troubleshooting.md](troubleshooting.md)** - Common issues, debugging techniques, and cleanup procedures
 - **[contributing.md](contributing.md)** - Guidelines for extending E2E tests
@@ -67,7 +70,10 @@ For detailed prerequisites and manual setup, see [running-tests.md](running-test
 ## 📚 Learn More

 - **New to E2E testing?** Start with [test-suites.md](test-suites.md) to understand what each test does
-- **Want to run manual tests?** Follow [manual-testing.md](manual-testing.md) for step-by-step CLI workflow
+- **Want to run manual tests?** Follow [manual/README.md](manual/README.md) for step-by-step CLI workflow
+- **Testing specific services?** See service-specific guides:
+  - [manual/mysql-verification.md](manual/mysql-verification.md) - MySQL verification
+  - [manual/prometheus-verification.md](manual/prometheus-verification.md) - Prometheus verification
 - **Running into issues?** Check [troubleshooting.md](troubleshooting.md)
 - **Want to understand the architecture?** Read [architecture.md](architecture.md)
 - **Adding new tests?** See [contributing.md](contributing.md)
