Skip to content

Commit a416f40

Browse files
committed
processor_tda: Address comments
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
1 parent 155e3f9 commit a416f40

File tree

2 files changed

+22
-19
lines changed

2 files changed

+22
-19
lines changed

pipeline/processors.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ Fluent Bit offers the following processors:
2121
- [Sampling](./processors/sampling.md): Apply head or tail sampling to incoming traces.
2222
- [SQL](./processors/sql.md): Use SQL queries to extract log content.
2323
- [Filters as processors](./processors/filters.md): Use filters as processors.
24-
- [TDA](./processors/tda.md): Do Topological Data Analysis calculations.
24+
- [TDA](./processors/tda.md): Do Topological Data Analysis (TDA) calculations.
2525

2626
## Features
2727

pipeline/processors/tda.md

Lines changed: 21 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
# TDA
22

3-
The `tda` processor applies **Topological Data Analysis (TDA)** – specifically, **persistent homology** – to Fluent Bit’s metrics stream and exports **Betti numbers** that summarize the shape of recent behavior in metric space.
4-
5-
This processor is intended for detecting **phase transitions**, **regime changes**, and **intermittent instabilities** that are hard to see from individual counters, gauges, or standard statistical aggregates. It can, for example, differentiate between a single, one-off failure and an extended period of intermittent failures where the system never settles into a stable regime.
3+
The `tda` processor applies **Topological Data Analysis (TDA)**—specifically, **persistent homology**—to Fluent Bit metrics stream and exports **Betti numbers** that summarize the shape of recent behavior in metric space.
64

5+
This processor is intended for detecting **phase transitions**, **regime changes**, and **intermittent instabilities** that are difficult to detect from individual counters, gauges, or standard statistical aggregates.
6+
It can, for example, differentiate between a single, one-off failure and an extended period of intermittent failures where the system never settles into a stable regime.
77
Currently, `tda` works only in the **metrics pipeline** (`processors.metrics`).
88

99
---
@@ -48,7 +48,8 @@ On each metrics flush, `tda`:
4848
To stabilize very different magnitudes and bursty traffic, each rate is mapped to
4949
`norm = log1p(|rate|)`, and the sign of `rate` is reattached. This yields a vector that is roughly scale-invariant but still sensitive to relative changes in rates across groups.
5050

51-
The resulting normalized vector is written into a **ring buffer window** (`tda_window`), implemented via a lightweight circular buffer (`lwrb`) that stores timestamped samples. The window maintains at most `window_size` samples; older samples are dropped when the buffer is full.
51+
The resulting normalized vector is written into a **ring buffer window** (`tda_window`), implemented through a lightweight circular buffer (`lwrb`) that stores timestamped samples.
52+
The window maintains at most `window_size` samples; older samples are dropped when the buffer is full.
5253

5354
### 2. Sliding window and delay embedding
5455

@@ -65,7 +66,7 @@ $$
6566

6667
where each `x_·` is the **D-dimensional normalized metrics vector** at that time. This yields embedded points in (\mathbb{R}^{mD}).
6768

68-
Because we need all lags to be inside the window, the number of embedded points is:
69+
Because all lags must be inside the window, the number of embedded points is:
6970

7071
$$
7172
n_{\text{embed}} = n_{\text{raw}} - (m - 1)\tau
@@ -77,8 +78,8 @@ This embedding follows the idea of **Takens’ theorem**, which states that, und
7778

7879
Intuitively:
7980

80-
* `embed_dim = 1`: you see only the current snapshot geometry.
81-
* `embed_dim > 1`: you expose **loops and recurrent trajectories** in the joint evolution of metrics, which later show up as **H₁ (Betti₁) features**.
81+
* `embed_dim = 1`: only the current "snapshot" geometry is visible.
82+
* `embed_dim > 1`: **loops and recurrent trajectories** in the joint evolution of metrics become visible, which later show up as **H₁ (Betti₁) features**.
8283

8384
### 3. Distance matrix construction
8485

@@ -96,24 +97,24 @@ Persistent homology requires a **scale parameter** (Rips radius / distance thres
9697

9798
1. **Automatic multi-quantile scan** (`threshold = 0`, default)
9899

99-
* The off-diagonal distances are collected, sorted, and several quantiles are evaluated, e.g. `q ∈ {0.10, 0.20, …, 0.90}`.
100+
* The off-diagonal distances are collected, sorted, and several quantiles are evaluated, for example `q ∈ {0.10, 0.20, …, 0.90}`.
100101
* For each candidate quantile `q`, a threshold `r_q` is chosen and Betti numbers are computed using Ripser.
101102
* The plugin prefers the scale where **Betti₁** (loops) is maximized; if all Betti₁ are zero, it falls back to Betti₀ as a secondary indicator.
102103

103104
2. **Fixed quantile mode** (`0 < threshold < 1`)
104105

105106
* `threshold` is interpreted as a single quantile `q`. The Rips radius is set at this quantile of all pairwise distances.
106-
* The multi-quantile scan still runs internally for robustness, but reported diagnostics (e.g., debug logs) will reflect the user-selected quantile.
107+
* The multi-quantile scan still runs internally for robustness, but reported diagnostics (For example, debug logs) will reflect the user-selected quantile.
107108

108109
Internally, quantile selection is handled by `tda_choose_threshold_from_dist`, which gathers all `i > j` entries of the distance matrix, sorts them, and picks the specified quantile index.
109110

110-
### 5. Persistent homology via Ripser
111+
### 5. Persistent Homology through Ripser
111112

112113
Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **Ripser**, a well-known implementation of Vietoris–Rips persistent homology:
113114

114115
1. **Compression and C API**
115116

116-
* The dense `n_embed × n_embed` matrix is converted into Ripsers `compressed_lower_distance_matrix`.
117+
* The dense `n_embed × n_embed` matrix is converted into Ripser's `compressed_lower_distance_matrix`.
117118
* The wrapper function `flb_ripser_compute_betti_from_dense_distance` runs Ripser up to `max_dim = 2` (H₀, H₁, H₂), using coefficients in (\mathbb{Z}/2\mathbb{Z}), and accumulates persistence intervals into Betti numbers with a small persistence cutoff to ignore very short-lived noise features.
118119

119120
2. **Interval aggregation**
@@ -124,7 +125,7 @@ Once the compressed lower-triangular distance matrix is built, it is passed to a
124125
3. **Multi-scale selection**
125126

126127
* For each candidate threshold, Betti numbers are computed.
127-
* The best scale is chosen as the one with the largest Betti₁ (loops); if Betti₁ is zero across scales, the plugin picks the scale where Betti₀ is largest.
128+
* The "best" scale is chosen as the one with the largest Betti₁ (loops); if Betti₁ is zero across scales, the plugin picks the scale where Betti₀ is largest.
128129
* The corresponding Betti₀, Betti₁, and Betti₂ values are then exported as Fluent Bit gauges.
129130

130131
### 6. Exported metrics
@@ -133,9 +134,9 @@ Once the compressed lower-triangular distance matrix is built, it is passed to a
133134

134135
| Metric name | Type | Description |
135136
| ---------------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
136-
| `fluentbit_tda_betti0` | gauge | Approximate Betti₀ number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many micro-regimes. |
137-
| `fluentbit_tda_betti1` | gauge | Approximate Betti₁ number of 1-dimensional loops / cycles in the Rips complex. Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. |
138-
| `fluentbit_tda_betti2` | gauge | Approximate Betti₂ number of 2-dimensional voids (higher-order structures). These can appear when the system explores different “surfaces” in state space, e.g., transitioning between distinct operating modes. |
137+
| `fluentbit_tda_betti0` | gauge | Approximate Betti₀ - number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many "micro-regimes". |
138+
| `fluentbit_tda_betti1` | gauge | Approximate Betti₁ - number of 1-dimensional loops / cycles in the Rips complex. Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. |
139+
| `fluentbit_tda_betti2` | gauge | Approximate Betti₂ - number of 2-dimensional voids (higher-order structures). These can appear when the system explores different “surfaces” in state space, e.g., transitioning between distinct operating modes. |
139140

140141
Each metric is timestamped with the current time at the moment of TDA computation and is exported via the same metrics context it received, so downstream metric outputs can scrape or forward them like any other Fluent Bit metric.
141142

@@ -170,11 +171,13 @@ Some practical patterns:
170171

171172
3. **Intermittent failure / unstable regime**
172173

173-
* The system repeatedly bounces between healthy and unhealthy states (e.g., repeated `Connection refused` / `broken connection` errors interspersed with 200 responses).
174+
* The system repeatedly bounces between "healthy" and "unhealthy" states (e.g., repeated `Connection refused` / `broken connection` errors interspersed with 200 responses).
174175
* The trajectory in phase space forms **loops**: metrics move away from the healthy region and then return, many times.
175176
* Betti₁ (and occasionally Betti₂) increases noticeably while this behavior persists, reflecting the emergence of non-trivial cycles in the metric dynamics.
176177

177-
In the sample output, as the HTTP output oscillates between success and various `Connection refused` / `broken connection` errors, `fluentbit_tda_betti1` and `fluentbit_tda_betti2` grow from small values to larger plateaus (e.g., Betti₁ around 10–13, Betti₂ around 1–2) while Betti₀ also increases. This is a direct signature of a **phase transition** from a stable regime to one with persistent, intermittent instability.
178+
In the sample output, the HTTP output oscillates between success and various "Connection refused" and "broken connection" errors.
179+
As this occurs, `fluentbit_tda_betti1` and `fluentbit_tda_betti2` grow from small values to larger plateaus (for example, Betti₁ around 10—13, Betti₂ around 1—2) while Betti₀ also increases.
180+
This is a direct signature of a **phase transition** from a stable regime to one with persistent, intermittent instability.
178181

179182
These interpretations are consistent with results from condensed matter physics and dynamical systems, where persistent homology has been used to detect phase transitions and changes in underlying order purely from data (References 1 and 2).
180183

@@ -184,7 +187,7 @@ These interpretations are consistent with results from condensed matter physics
184187

185188
### Basic setup with `fluentbit_metrics`
186189

187-
The following example computes TDA on Fluent Bits own internal metrics, using `metrics_selector` to remove a few high-cardinality or uninteresting metrics before feeding them into `tda`:
190+
The following example computes TDA on Fluent Bit's own internal metrics, using `metrics_selector` to remove a few high-cardinality or uninteresting metrics before feeding them into `tda`:
188191

189192
```yaml
190193
service:

0 commit comments

Comments
 (0)