Merged
Commits
25 commits
64b1ae4
docs: [#238] update Prometheus slice spec with correct architecture
josecelano Dec 15, 2025
5ae2a69
feat: [#238] add Prometheus configuration template
josecelano Dec 15, 2025
2ca0fa9
feat: [#238] add PrometheusContext for template rendering
josecelano Dec 15, 2025
92aab59
feat: [#238] add Prometheus domain configuration
josecelano Dec 15, 2025
731eaf4
feat: [#238] implement Prometheus template renderer
josecelano Dec 15, 2025
22790de
feat: [#238] integrate Prometheus with Docker Compose
josecelano Dec 15, 2025
f20d45c
feat: [#238] add independent Prometheus template rendering in release…
josecelano Dec 15, 2025
9c1b91a
feat: [#238] add Ansible deployment for Prometheus (Phase 6)
josecelano Dec 15, 2025
2f33fe0
docs: [#238] mark Phase 6 complete with commit hash
josecelano Dec 15, 2025
a257fcf
feat: [#238] add E2E validation and manual testing guide for Promethe…
josecelano Dec 15, 2025
2a820e2
feat: [#238] add Prometheus integration documentation (Phase 8)
josecelano Dec 15, 2025
1bdd612
docs: [#238] mark issue complete - all 8 phases implemented
josecelano Dec 15, 2025
c0e3192
refactor: simplify Prometheus build directory structure
josecelano Dec 15, 2025
254a9a8
docs: [#238] refactor manual E2E testing documentation to eliminate d…
josecelano Dec 15, 2025
3eb96f9
docs: [#238] refactor user guide to extract Prometheus content into s…
josecelano Dec 15, 2025
8638430
docs: [#238] add comprehensive security guide for production deployments
josecelano Dec 15, 2025
4292c9a
refactor: [#238] standardize renderer constants to use three-part pat…
josecelano Dec 15, 2025
b79b436
chore: [#238] add Traefik to project dictionary and fix formatting
josecelano Dec 15, 2025
cbeac26
feat: [#238] add Prometheus smoke test validation after run command
josecelano Dec 15, 2025
22ef8ae
refactor: reorganize docker_compose context module following conventions
josecelano Dec 15, 2025
b056f2a
style: apply rustfmt formatting to template renderers
josecelano Dec 15, 2025
010e6fc
refactor: use composition for TrackerPorts in DockerComposeContext
josecelano Dec 15, 2025
f2c3cb0
refactor: implement Builder Pattern for DockerComposeContext
josecelano Dec 15, 2025
a624d80
refactor: simplify execute() method in RenderDockerComposeTemplatesStep
josecelano Dec 16, 2025
c8962e1
refactor: [#238] fix Law of Demeter violations and implement consiste…
josecelano Dec 16, 2025
216 changes: 216 additions & 0 deletions docs/decisions/prometheus-integration-pattern.md
@@ -0,0 +1,216 @@
# Decision: Prometheus Integration Pattern - Enabled by Default with Opt-Out

## Status

Accepted

## Date

2025-01-22

## Context

The tracker deployment system needed to add Prometheus as a metrics collection service. Several design decisions were required:

1. **Enablement Strategy**: Should Prometheus be mandatory, opt-in, or enabled-by-default?
2. **Template Rendering**: How should Prometheus templates be rendered in the release workflow?
3. **Service Validation**: How should E2E tests validate optional services like Prometheus?

The decision impacts:

- User experience (ease of getting started with monitoring)
- System architecture (template rendering patterns)
- Testing patterns (extensibility for future optional services)

## Decision

### 1. Enabled-by-Default with Opt-Out

Prometheus is **included by default** in generated environment templates but can be disabled by removing the configuration section.

**Implementation**:

```rust
pub struct UserInputs {
    pub prometheus: Option<PrometheusConfig>, // Some by default, None to disable
}
```

**Configuration**:

```json
{
  "prometheus": {
    "scrape_interval": 15
  }
}
```

**Disabling**: Remove the entire `prometheus` section from the environment config.

**Rationale**:

- Monitoring is a best practice - users should get it by default
- Opt-out is simple - just remove the config section
- No complex feature flags or enablement parameters needed
- Follows principle of least surprise (monitoring expected for production deployments)
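
The opt-out rule above can be sketched in plain Rust. This is a minimal illustration, not the project's actual code: the `scrape_interval` field type (an integer, presumably seconds), the `Default` implementation, and the `prometheus_enabled` helper are assumptions added for the example.

```rust
/// Prometheus settings. The field type is an assumption (interval in seconds).
#[derive(Debug, Clone, PartialEq)]
pub struct PrometheusConfig {
    pub scrape_interval: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct UserInputs {
    /// `Some` by default, `None` when the section is removed from the config.
    pub prometheus: Option<PrometheusConfig>,
}

impl Default for UserInputs {
    /// Generated templates include the section, so the default is `Some`.
    fn default() -> Self {
        Self {
            prometheus: Some(PrometheusConfig { scrape_interval: 15 }),
        }
    }
}

impl UserInputs {
    /// Hypothetical helper: the service is enabled exactly when the section is present.
    pub fn prometheus_enabled(&self) -> bool {
        self.prometheus.is_some()
    }
}
```

With this shape, no separate `enabled` flag is needed: presence of the section is the only switch.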

### 2. Independent Template Rendering Pattern

Each service renders its templates **independently** in the release handler, not from within another service's template rendering step.

**Architecture**:

```text
ReleaseCommandHandler::execute()
├─ Step 1: Create tracker storage
├─ Step 2: Render tracker templates (tracker/*.toml)
├─ Step 3: Deploy tracker configs
├─ Step 4: Create Prometheus storage (if enabled)
├─ Step 5: Render Prometheus templates (prometheus.yml) - INDEPENDENT STEP
├─ Step 6: Deploy Prometheus configs
├─ Step 7: Render Docker Compose templates (docker-compose.yml)
└─ Step 8: Deploy compose files
```
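
The step sequence above could be modelled as data, which makes the conditional Prometheus steps explicit. The sketch below is illustrative only: the `ReleaseStep` enum and `plan_release_steps` function are hypothetical names, and it assumes that steps 4 to 6 are all skipped when Prometheus is disabled (the diagram marks only step 4 as conditional).

```rust
/// Hypothetical enumeration of the release workflow steps shown in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReleaseStep {
    CreateTrackerStorage,
    RenderTrackerTemplates,
    DeployTrackerConfigs,
    CreatePrometheusStorage,
    RenderPrometheusTemplates,
    DeployPrometheusConfigs,
    RenderDockerComposeTemplates,
    DeployComposeFiles,
}

/// Builds the ordered step list. Prometheus steps are independent entries
/// that are included only when the service is enabled (assumption: all three).
pub fn plan_release_steps(prometheus_enabled: bool) -> Vec<ReleaseStep> {
    use ReleaseStep::*;

    let mut steps = vec![
        CreateTrackerStorage,
        RenderTrackerTemplates,
        DeployTrackerConfigs,
    ];
    if prometheus_enabled {
        steps.extend([
            CreatePrometheusStorage,
            RenderPrometheusTemplates,
            DeployPrometheusConfigs,
        ]);
    }
    // Docker Compose rendering always comes last and renders nothing but compose files.
    steps.extend([RenderDockerComposeTemplates, DeployComposeFiles]);
    steps
}
```

Adding a future service in this sketch would mean appending one more conditional group before the Docker Compose steps, without touching the existing ones.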

**Rationale**:

- Each service is responsible for its own template rendering
- Docker Compose templates only define service orchestration, not content generation
- Environment configuration is the source of truth for which services are enabled
- Follows Single Responsibility Principle (each step does one thing)
- Makes it easy to add future services (Grafana, Alertmanager, etc.)

**Anti-Pattern Avoided**: Rendering Prometheus templates from within Docker Compose template rendering step.
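
As a rough illustration of what an independent renderer might look like, the sketch below produces `prometheus.yml` content from the Prometheus settings alone, with no knowledge of Docker Compose. Everything here is hypothetical: the function name, the `tracker_target` parameter, the `tracker` job name, and the assumption that `scrape_interval` is expressed in seconds. The real project renders from a template file rather than a format string.

```rust
/// Assumed shape of the Prometheus settings (interval in seconds).
pub struct PrometheusConfig {
    pub scrape_interval: u32,
}

/// Hypothetical renderer: builds the prometheus.yml content from the config alone.
/// `tracker_target` (host:port of the metrics endpoint) is an illustrative parameter.
pub fn render_prometheus_yml(config: &PrometheusConfig, tracker_target: &str) -> String {
    format!(
        "global:\n  scrape_interval: {}s\n\nscrape_configs:\n  - job_name: \"tracker\"\n    static_configs:\n      - targets: [\"{}\"]\n",
        config.scrape_interval, tracker_target
    )
}
```

Because the function takes only Prometheus-related inputs, the Docker Compose step never needs to know how this file is produced.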

### 3. ServiceValidation Struct for Extensible Testing

E2E validation uses a `ServiceValidation` struct with boolean flags instead of function parameters.

**Implementation**:

```rust
pub struct ServiceValidation {
    pub prometheus: bool,
    // Future: pub grafana: bool,
    // Future: pub alertmanager: bool,
}

pub fn run_release_validation(
    socket_addr: SocketAddr,
    ssh_credentials: &SshCredentials,
    services: Option<ServiceValidation>,
) -> Result<(), String>
```

**Rationale**:

- Extensible for future services without API changes
- More semantic than boolean parameters
- Clear intent: `ServiceValidation { prometheus: true }`
- Follows Open-Closed Principle (open for extension, closed for modification)

**Anti-Pattern Avoided**: `run_release_validation_with_prometheus_check(addr, creds, true)` - too specific and not extensible.
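
A minimal sketch of how a caller might use this API follows. The function body is a stand-in, not the real validation logic: `SshCredentials` is reduced to a placeholder struct, `planned_checks` and the `"base"` check name are hypothetical, and it assumes that passing `None` means no optional service is validated.

```rust
use std::net::SocketAddr;

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ServiceValidation {
    pub prometheus: bool,
}

/// Placeholder for the project's real credentials type.
pub struct SshCredentials {
    pub username: String,
}

/// Hypothetical helper: lists the checks that would run for a given selection.
/// `None` is treated as "no optional services" (assumption).
pub fn planned_checks(services: Option<ServiceValidation>) -> Vec<&'static str> {
    let services = services.unwrap_or_default();
    let mut checks = vec!["base"];
    if services.prometheus {
        checks.push("prometheus");
    }
    checks
}

/// Stand-in body: a real implementation would connect over SSH and inspect
/// the deployed files. Here each planned check is only reported.
pub fn run_release_validation(
    socket_addr: SocketAddr,
    ssh_credentials: &SshCredentials,
    services: Option<ServiceValidation>,
) -> Result<(), String> {
    for check in planned_checks(services) {
        println!(
            "would validate {} on {} as {}",
            check, socket_addr, ssh_credentials.username
        );
    }
    Ok(())
}
```

Adding a `grafana` field later would extend `ServiceValidation` and `planned_checks` while leaving the signature of `run_release_validation` unchanged, which is the extensibility argument made above.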

## Consequences

### Positive

1. **Better User Experience**:

   - Users get monitoring by default without manual setup
   - Simple opt-out (remove config section)
   - Production-ready deployments out of the box

2. **Cleaner Architecture**:

   - Each service manages its own templates independently
   - Clear separation of concerns in release handler
   - Easy to add future services (Grafana, Alertmanager, Loki, etc.)

3. **Extensible Testing**:

   - ServiceValidation struct easily extended for new services
   - Consistent pattern for optional service validation
   - Type-safe validation configuration

4. **Maintenance Benefits**:
   - Independent template rendering simplifies debugging
   - Each service's templates can be modified independently
   - Clear workflow steps make issues easier to trace

### Negative

1. **Default Overhead**:

   - Users who don't want monitoring must manually remove the section
   - Prometheus container always included in default deployments
   - Slightly more disk/memory usage for minimal deployments

2. **Configuration Discovery**:
   - Users must learn that removing the section disables the service
   - Not immediately obvious from JSON schema alone
   - Requires documentation of the opt-out pattern

### Risks

1. **Breaking Changes**: Future Prometheus config schema changes require careful migration planning
2. **Service Dependencies**: Adding services that depend on Prometheus requires proper ordering logic
3. **Template Complexity**: As the number of services grows, independent rendering must not lead to duplicated logic across renderers

## Alternatives Considered

### Alternative 1: Mandatory Prometheus

**Approach**: Always deploy Prometheus, no opt-out.

**Rejected Because**:

- Forces monitoring on users who don't want it
- Increases minimum resource requirements
- Violates principle of least astonishment for minimal deployments

### Alternative 2: Opt-In with Feature Flag

**Approach**: Prometheus disabled by default, enabled with `"prometheus": { "enabled": true }`.

**Rejected Because**:

- Requires users to discover and enable monitoring manually
- Most production deployments should have monitoring - opt-in makes it less likely
- Adds complexity with enabled/disabled flags

### Alternative 3: Render Prometheus Templates from Docker Compose Step

**Approach**: Docker Compose template rendering step also renders Prometheus templates.

**Rejected Because**:

- Violates Single Responsibility Principle
- Makes Docker Compose step dependent on Prometheus internals
- Harder to add future services independently
- Couples service orchestration with service configuration

### Alternative 4: Boolean Parameters for Service Validation

**Approach**: `run_release_validation(addr, creds, check_prometheus: bool)`.

**Rejected Because**:

- Not extensible - adding Grafana requires API change
- Less semantic - what does `true` mean?
- Becomes unwieldy with multiple services
- Violates Open-Closed Principle

## Related Decisions

- [Template System Architecture](../technical/template-system-architecture.md) - Project Generator pattern
- [Environment Variable Injection](environment-variable-injection-in-docker-compose.md) - Configuration passing
- [DDD Layer Placement](../contributing/ddd-layer-placement.md) - Module organization

## References

- Issue: [#238 - Prometheus Slice - Release and Run Commands](../issues/238-prometheus-slice-release-run-commands.md)
- Manual Testing Guide: [Prometheus Verification](../e2e-testing/manual/prometheus-verification.md)
- Prometheus Documentation: https://prometheus.io/docs/
- torrust-demo Reference: Existing Prometheus integration patterns
10 changes: 8 additions & 2 deletions docs/e2e-testing/README.md
@@ -7,7 +7,10 @@ This guide explains how to run and understand the End-to-End (E2E) tests for the
- **[README.md](README.md)** - This overview and quick start guide
- **[architecture.md](architecture.md)** - E2E testing architecture, design decisions, and Docker strategy
- **[running-tests.md](running-tests.md)** - How to run automated tests, command-line options, and prerequisites
-- **[manual-testing.md](manual-testing.md)** - Complete guide for running manual E2E tests with CLI commands
+- **[manual/](manual/)** - Manual E2E testing guides:
+  - **[README.md](manual/README.md)** - Complete manual test workflow (generic deployment guide)
+  - **[mysql-verification.md](manual/mysql-verification.md)** - MySQL service verification and troubleshooting
+  - **[prometheus-verification.md](manual/prometheus-verification.md)** - Prometheus metrics verification and troubleshooting
- **[test-suites.md](test-suites.md)** - Detailed description of each test suite and what they validate
- **[troubleshooting.md](troubleshooting.md)** - Common issues, debugging techniques, and cleanup procedures
- **[contributing.md](contributing.md)** - Guidelines for extending E2E tests
@@ -67,7 +70,10 @@ For detailed prerequisites and manual setup, see [running-tests.md](running-test
## 📚 Learn More

- **New to E2E testing?** Start with [test-suites.md](test-suites.md) to understand what each test does
-- **Want to run manual tests?** Follow [manual-testing.md](manual-testing.md) for step-by-step CLI workflow
+- **Want to run manual tests?** Follow [manual/README.md](manual/README.md) for step-by-step CLI workflow
+- **Testing specific services?** See service-specific guides:
+  - [manual/mysql-verification.md](manual/mysql-verification.md) - MySQL verification
+  - [manual/prometheus-verification.md](manual/prometheus-verification.md) - Prometheus verification
- **Running into issues?** Check [troubleshooting.md](troubleshooting.md)
- **Want to understand the architecture?** Read [architecture.md](architecture.md)
- **Adding new tests?** See [contributing.md](contributing.md)