Skip to content

Cherry-pick #44769: Allow disabling chart datasets: backend#44924

Merged
sgress454 merged 1 commit intorc-minor-fleet-v4.85.0from
sgress454/disable-chart-datasets-backend-cp
May 7, 2026
Merged

Cherry-pick #44769: Allow disabling chart datasets: backend#44924
sgress454 merged 1 commit intorc-minor-fleet-v4.85.0from
sgress454/disable-chart-datasets-backend-cp

Conversation

@sgress454
Copy link
Copy Markdown
Contributor

Cherry-pick of #44769 into the RC branch.

<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** For #44077 

# Details

This PR implements enforcement of the "disable dataset" feature.  

When a dataset is disabled globally, we:

* Stop collecting all data for that dataset (the `Collect` method for
that dataset is not called in the cron job)
* Remove all previously-collected data for the dataset via an
asynchronous job

When a dataset is disabled for one or more fleets, we:

* Provide the list of disabled fleets as an argument to each dataset's
`Collect` method. Each dataset is responsible for filtering out hosts in
the most efficient way possible
* Scrub the data for the relevant datasets using a bitmask, so that all
hosts from the disabled fleets are removed from the data. This is done
via an asynchronous job.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
n/a, unreleased

- [X] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements), JS
inline code is prevented especially for url redirects, and untrusted
data interpolated into shell scripts/commands is validated against shell
metacharacters.
- [X] Timeouts are implemented and retries are limited to avoid infinite
loops

## Testing

- [X] Added/updated automated tests
- [X] Where appropriate, [automated tests simulate multiple hosts and
test for host
isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing)
(updates to one hosts's records do not affect another)

- [ ] QA'd all new/changed functionality manually

  ### Prerequisites / Test Setup

- [ ] Fleet running with at least 3 teams (call them T1, T2, T3) and ≥3
hosts in each, plus ≥2 hosts with no team
- [ ] At least one host on each team has reported recent uptime (within
the bucket window)
- [ ] At least one host in each team is affected by a tracked CVE (so
`host_scd_data` for `dataset='cve'` will have non-empty bitmaps)
- [ ] AppConfig: both `features.historical_data.uptime` and
`features.historical_data.vulnerabilities` start as `true`; same for
every team
- [ ] Let the collection cron run at least one full tick to populate
baseline rows in `host_scd_data` for both `uptime` and `cve`
- [ ] Note the current row count per dataset: `SELECT dataset, COUNT(*)
FROM host_scd_data GROUP BY dataset;`

  ---

  ### 1. Cron Skips Globally-Disabled Datasets

  #### 1.1 Global disable of `uptime`

- [x] Disable globally: `PATCH /api/v1/fleet/config` with
`features.historical_data.uptime = false`
- [x] Verify activity feed shows `disabled_historical_dataset` for
`uptime` (existing behavior)
- [x] Wait for next collection tick (or trigger it via fleetctl debug if
available)
  - [x] Confirm **no new rows** appear for `dataset='uptime'`:
`SELECT MAX(valid_from) FROM host_scd_data WHERE dataset='uptime';`
        should not advance after the disable
- [x] Confirm cron still writes `cve` rows on the same tick (per-dataset
isolation)
  - [x] Re-enable: PATCH `historical_data.uptime = true`
  - [x] Verify next tick resumes writing `uptime` rows

  #### 1.2 Global disable of `vulnerabilities`

  - [x] Repeat 1.1 with `features.historical_data.vulnerabilities`
  - [x] Confirm `cve` writes stop, `uptime` continues

  #### 1.3 Both disabled globally

  - [x] Disable both globally
  - [x] Confirm cron tick produces zero new rows for either dataset
  - [x] Confirm cron does not error or get stuck
  - [x] Re-enable both

  ---

  ### 2. Per-Fleet Disable — Cron Filters at SQL

  #### 2.1 Single team disabled for one dataset

  - [x] Disable uptime for T1 only: PATCH team T1 with
        `features.historical_data.uptime = false`
- [x] Verify scoped `disabled_historical_dataset` activity emitted for
T1
  - [x] Wait for next cron tick / trigger cron
- [x] Pick a host known to be in T1 (call it `H_T1`); confirm its bit is
NOT set in any `uptime` row written *after* the disable by filtering the
chart to that host
- [x] Pick a host in T2 (`H_T2`); confirm its bit IS still set in the
same rows (T2 is not disabled)
- [x] Pick a no-team host (`H_none`); confirm its bit IS still set
(no-team hosts follow the global value)

  #### 2.2 Same fleet, different dataset

- [x] With T1's uptime disabled, confirm T1's hosts ARE still written
into `cve` rows on subsequent ticks (per-dataset isolation)

  #### 2.3 All teams disabled, global on, no-team hosts

  - [x] Disable uptime on every team (T1, T2, T3)
- [x] Confirm next tick still writes a row containing only no-team
hosts' bits (global is on, no-team hosts always count)
  - [x] Re-enable uptime on all teams

  ---

  ### 3. Global Scrub — DELETE

  #### 3.1 Successful global scrub

  - [x] Note baseline:
        `SELECT COUNT(*) FROM host_scd_data WHERE dataset='uptime';`
        (should be > 5000 to exercise the loop; if not, manually
        insert filler rows or run multiple cron ticks)
  - [x] Disable uptime globally via the API
  - [x] Wait for the worker to pick up the scrub / trigger the job
  - [x] Confirm the count drops to 0:
        `SELECT COUNT(*) FROM host_scd_data WHERE dataset='uptime';`
  - [x] Confirm rows for **other datasets** are untouched
  - [ ] Test again but disable via GitOps

  ---

  ### 4. Per-Fleet Scrub — ANDNOT

  #### 4.1 Single-fleet scrub clears bits

  - [x] Identify hosts in T1 and record their IDs (call this set `S`)
- [x] Pre-disable, confirm at least one `host_scd_data` row for
`dataset='uptime'` has bits set at positions in `S` by filtering the
chart to those hosts
  - [x] Disable uptime on T1 only, via the API
  - [x] Wait for the scrub to run / trigger it
- [x] Confirm: every existing row for `dataset='uptime'` now has NO bits
set at any position in `S`. Spot-check by filtering the chart to those
hosts
- [x] Confirm rows for `dataset='cve'` (different dataset) are untouched
  - [x] Confirm bits for hosts in T2/T3 (not disabled) are still set
  - [x] Run test again but disable via GitOps

  #### 4.2 Multi-fleet scrub via GitOps batch

- [x] Apply a GitOps spec that flips cve to false on T1 and T3 in a
single apply
  - [x] Wait for scrub(s) to complete
- [x] Confirm bits for the union of T1∪T2 hosts are cleared from every
row of `dataset='cve'`
  - [x] Confirm T2 hosts' bits remain set

  ---

  ### 5. Activity Feed Cross-Check

  - [x] Each global flip emits exactly one `disabled_historical_dataset`
        activity (existing behavior, unchanged)
  - [x] Each per-team flip emits one scoped activity with the team's
        ID and name
  - [x] PATCH submitting unchanged values emits **no** activity and
        causes **no** scrub (no `host_scd_data` data change observed
        after the cron tick)
  - [x] No new "scrub completed" or "scrub started" activity is
        emitted (out of scope for v1)
  - [x] Re-enable flips emit `enabled_historical_dataset` activities
        and do NOT emit any scrub-related activity

  ---

  ### 6. Regression Spot Checks

  - [x] With everything enabled (default), the chart UI renders the
        same data as before this change (no behavior change in the
        "all on" case)
  - [x] AppConfig YAML round-trip (`fleetctl apply`) is benign:
        applying the unchanged config produces no scrub jobs and no
        activities
  - [x] GitOps apply with `historical_data` omitted from team specs
        defaults to `true` (per the gitops-api change) and does not
        trigger spurious scrubs
  - [x] After a full disable+scrub of cve, the `host_scd_data` table
        has no `dataset='cve'` rows; the chart UI for "vulnerable
        hosts over time" shows an empty/zero state without errors

  ---


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Chart collection now supports per-dataset scoping and honors
team-level disables; new scrub jobs are registered and worker handlers
added.
* New dataset scrub operations: global and fleet-scoped scrubs; scrubs
can be enqueued and are deduplicated to avoid duplicate pending jobs.
Historical-data changes enqueue scrubs after save (errors logged,
non-blocking).
* **Tests**
* Added unit tests for scope resolution, scrub enqueue/dedup behavior,
scrub workers, scrub application, and low-level blob scrub logic.
* **Documentation**
  * Added OpenSpec metadata for the chart scrub change.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@sgress454 sgress454 requested a review from a team as a code owner May 7, 2026 13:53
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@lukeheath
Copy link
Copy Markdown
Member

@sgress454 Konstantin is out today and tomorrow so I did a quick scan and re-approved this based on his PR approval.

@sgress454
Copy link
Copy Markdown
Contributor Author

@sgress454 Konstantin is out today and tomorrow so I did a quick scan and re-approved this based on his PR approval.

Thanks! I also sprung for the GitHub Claude review AND a last-minute "max effort" local review on this one (well, the main one, not the cherry pick).

@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 65.24390% with 114 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.79%. Comparing base (9de8af6) to head (e709183).
⚠️ Report is 26 commits behind head on rc-minor-fleet-v4.85.0.

Files with missing lines Patch % Lines
server/chart/internal/mysql/data.go 68.42% 28 Missing and 2 partials ⚠️
server/chart/internal/mysql/charts.go 21.62% 27 Missing and 2 partials ⚠️
cmd/fleet/cron.go 38.23% 20 Missing and 1 partial ⚠️
server/worker/chart_scrub.go 56.25% 8 Missing and 6 partials ⚠️
server/chart/internal/service/service.go 70.83% 3 Missing and 4 partials ⚠️
ee/server/service/teams.go 25.00% 6 Missing ⚠️
server/service/appconfig.go 25.00% 3 Missing ⚠️
server/fleet/historical_data.go 95.12% 1 Missing and 1 partial ⚠️
cmd/fleet/serve.go 0.00% 1 Missing ⚠️
server/datastore/mysql/jobs.go 92.30% 1 Missing ⚠️
Additional details and impacted files
@@                    Coverage Diff                     @@
##           rc-minor-fleet-v4.85.0   #44924      +/-   ##
==========================================================
+ Coverage                   66.72%   66.79%   +0.07%     
==========================================================
  Files                        2626     2630       +4     
  Lines                      211198   211797     +599     
  Branches                     9428     9423       -5     
==========================================================
+ Hits                       140924   141477     +553     
+ Misses                      57485    57477       -8     
- Partials                    12789    12843      +54     
Flag Coverage Δ
backend 68.57% <65.24%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sgress454 sgress454 merged commit a7afd51 into rc-minor-fleet-v4.85.0 May 7, 2026
51 checks passed
@sgress454 sgress454 deleted the sgress454/disable-chart-datasets-backend-cp branch May 7, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants