Conversation
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your review settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
Walkthrough
This PR changes OSV vulnerability scanning to operate per OS version instead of per host. It adds two Datastore APIs—ListSoftwareForVulnDetectionByOSVersion and ListSoftwareVulnerabilitiesBySoftwareIDs—with MySQL implementations that batch software and vulnerability queries (configurable batch size). The OSV analyzer was refactored to use a new analyzeOSV helper which lists distinct software by OS version, processes software in chunks with a matcher, computes vulnerability deltas, performs batched deletes/inserts, deduplicates inserts, and optionally returns inserted vulnerabilities. Mock methods and unit tests were added.

Possibly related PRs
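The "computes vulnerability deltas" step above can be sketched as a map-based set difference. This is an illustrative stand-in, not Fleet's actual `utils.VulnsDelta`; the `Vuln` type and its `Key` format here are assumptions for the example:

```go
package main

import "fmt"

// Vuln identifies one (software, CVE) pairing; Key is assumed to be a
// stable identity string, mirroring how a delta computation might key rows.
type Vuln struct {
	SoftwareID uint
	CVE        string
}

func (v Vuln) Key() string { return fmt.Sprintf("%d:%s", v.SoftwareID, v.CVE) }

// vulnsDelta returns the vulnerabilities to insert (found but not existing)
// and to delete (existing but no longer found), using map-based set difference.
func vulnsDelta(found, existing []Vuln) (toInsert, toDelete []Vuln) {
	foundSet := make(map[string]struct{}, len(found))
	for _, v := range found {
		foundSet[v.Key()] = struct{}{}
	}
	existingSet := make(map[string]struct{}, len(existing))
	for _, v := range existing {
		existingSet[v.Key()] = struct{}{}
	}
	for _, v := range found {
		if _, ok := existingSet[v.Key()]; !ok {
			toInsert = append(toInsert, v)
		}
	}
	for _, v := range existing {
		if _, ok := foundSet[v.Key()]; !ok {
			toDelete = append(toDelete, v)
		}
	}
	return toInsert, toDelete
}

func main() {
	found := []Vuln{{1, "CVE-2024-0001"}, {2, "CVE-2024-0002"}}
	existing := []Vuln{{2, "CVE-2024-0002"}, {3, "CVE-2023-9999"}}
	ins, del := vulnsDelta(found, existing)
	fmt.Println(len(ins), len(del)) // 1 1
}
```

A delta of this shape is what lets the analyzer issue only the writes that actually change state, rather than rewriting every row on every run.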
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/vulnerabilities/osv/analyzer.go (1)
Lines 128-137: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Return the matched vulnerabilities, not just the insert delta.
When `collectVulns` is true, this now returns only `newVulns`, so a no-op re-scan yields an empty slice even though `found` is non-empty. That changes `Analyze`/`AnalyzeRHEL` from "collect detected vulns" to "collect newly inserted vulns".

Suggested fix:
```diff
-	newVulns, err := ds.InsertSoftwareVulnerabilities(ctx, toInsert, source)
-	if err != nil {
-		return nil, fmt.Errorf("inserting software vulnerabilities: %w", err)
-	}
+	if _, err := ds.InsertSoftwareVulnerabilities(ctx, toInsert, source); err != nil {
+		return nil, fmt.Errorf("inserting software vulnerabilities: %w", err)
+	}

 	if !collectVulns {
 		return nil, nil
 	}

-	return newVulns, nil
+	return found, nil
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/vulnerabilities/osv/analyzer.go` around lines 128 - 137, The current code returns only the insertion delta (newVulns) when collectVulns is true, but callers expect the matched/detected vulnerabilities (the variable found); change the logic so that after calling ds.InsertSoftwareVulnerabilities you still return the detected slice (found) when collectVulns is true instead of newVulns. In practice, in analyzer.go replace the branch that returns newVulns with a return of found (or the appropriate merged slice of found items if needed), keeping the error handling around ds.InsertSoftwareVulnerabilities unchanged and ensuring the function's return types still match.
🧹 Nitpick comments (4)
server/fleet/datastore.go (1)
Lines 683-685: 💤 Low value
Document nil/empty-slice behavior for `softwareIDs` in the interface contract comment.
The current comment is silent on what callers should expect when `softwareIDs` is nil or empty. Given that the OSV analyzer calls this method after a potentially empty `ListSoftwareForVulnDetectionByOSVersion` result, a well-defined contract (e.g., "returns nil, nil when softwareIDs is empty") prevents defensive coding duplication in every caller.

📝 Suggested doc addition:
```diff
 	// ListSoftwareVulnerabilitiesBySoftwareIDs returns vulnerabilities for the given software IDs
 	// filtered by source. Queries software_cve directly without joining through host_software.
+	// Returns nil, nil when softwareIDs is nil or empty.
 	ListSoftwareVulnerabilitiesBySoftwareIDs(ctx context.Context, softwareIDs []uint, source VulnerabilitySource) ([]SoftwareVulnerability, error)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/fleet/datastore.go` around lines 683 - 685, Update the interface comment for ListSoftwareVulnerabilitiesBySoftwareIDs to explicitly document nil/empty-slice behavior for the softwareIDs parameter (e.g., state that passing a nil or zero-length slice returns (nil, nil) or other defined result), so callers like the OSV analyzer and ListSoftwareForVulnDetectionByOSVersion don’t need to defensively handle empty inputs; mention the types involved (softwareIDs, VulnerabilitySource, SoftwareVulnerability) and the chosen contract (exact return values) so implementers and callers have a clear expectation.

server/datastore/mysql/software_test.go (1)
Lines 12064-12069: ⚡ Quick win
Strengthen OS-version result assertions beyond name-only checks.
Right now the test only asserts software names, so a regression that returns incorrect `Version`/`Source` for the same names could pass undetected. Prefer asserting full identity tuples.

Suggested assertion tightening:
```diff
-	names := make([]string, len(result))
-	for i, sw := range result {
-		names[i] = sw.Name
-	}
-	sort.Strings(names)
-	require.Equal(t, []string{"libbar", "libbaz", "libfoo"}, names)
+	type swKey struct {
+		Name    string
+		Version string
+		Source  string
+	}
+	got := make([]swKey, 0, len(result))
+	for _, sw := range result {
+		got = append(got, swKey{Name: sw.Name, Version: sw.Version, Source: sw.Source})
+	}
+	require.ElementsMatch(t, []swKey{
+		{Name: "libbar", Version: "4.5.6", Source: "deb_packages"},
+		{Name: "libbaz", Version: "7.8.9", Source: "deb_packages"},
+		{Name: "libfoo", Version: "1.2.3", Source: "deb_packages"},
+	}, got)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/datastore/mysql/software_test.go` around lines 12064 - 12069, Current test only collects sw.Name from result and asserts names; modify it to assert full identity tuples (Name, Version, Source) to catch regressions in version/source. Replace the names slice with a slice built from result that combines sw.Name, sw.Version and sw.Source (e.g., formatted tuple or small struct), sort that slice, and assert equality against an expected sorted list of full tuples like {"libbar|1.2.3|apt", ...}. Update the require.Equal call to compare these full tuples instead of just names, referencing the existing result and sw variables.

server/datastore/mysql/software.go (1)
Lines 3314-3392: Add stage-level timing metrics for these new batch paths.
Given the runtime-regression objective, consider recording per-call metrics (ID fetch duration, batch count, rows returned, vulnerability fetch duration) so first-run vs steady-state behavior can be compared in low-sampling environments.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/datastore/mysql/software.go` around lines 3314 - 3392, Instrument the two new batch paths to record stage-level timing and counts: in ListSoftwareForVulnDetectionByOSVersion, measure and emit the duration and row count for the initial SELECT DISTINCT (the sqlx.SelectContext that produces softwareIDs), then for each batch inside the common_mysql.BatchProcessSimple lambda measure batch duration, record batch size and rows returned from the batchResult query, and emit a cumulative batch count; in ListSoftwareVulnerabilitiesBySoftwareIDs, likewise instrument each BatchProcessSimple invocation to time the per-batch SELECT (the sqlx.SelectContext that fetches from software_cve), record batch size, rows returned and cumulative batches processed, and include the source (fleet.VulnerabilitySource) as a tag/label. Use the project’s existing metrics/telemetry helper on ds (or the common metrics facility used elsewhere in Datastore) to record durations (histogram) and counts (counter/gauge) with labels for function name, platform/os_version or source, and batch index so first-run vs steady-state comparisons are possible; ensure metrics are emitted on both success and error paths inside the lambdas.server/vulnerabilities/osv/analyzer.go (1)
Lines 101-130: ⚡ Quick win
Add write-side timings to the new bulk-flow log.
The new debug log only measures the read/match phases. If delete/insert is still the dominant cost on large runs, this output will not explain the remaining runtime.
Suggested instrumentation
```diff
+	var deleteTime time.Duration
 	// Delete stale vulnerabilities.
 	if len(toDelete) > 0 {
+		deleteStart := time.Now().UTC()
 		toDeleteMap := make(map[string]fleet.SoftwareVulnerability, len(toDelete))
 		for _, v := range toDelete {
 			toDeleteMap[v.Key()] = v
 		}
 		if err := utils.BatchProcess(toDeleteMap, func(v []fleet.SoftwareVulnerability) error {
 			return ds.DeleteSoftwareVulnerabilities(ctx, v)
 		}, vulnBatchSize); err != nil {
 			return nil, fmt.Errorf("deleting stale vulnerabilities: %w", err)
 		}
+		deleteTime = time.Since(deleteStart)
 	}

 	// Insert new vulnerabilities.
-	newVulns, err := ds.InsertSoftwareVulnerabilities(ctx, toInsert, source)
+	insertStart := time.Now().UTC()
+	newVulns, err := ds.InsertSoftwareVulnerabilities(ctx, toInsert, source)
 	if err != nil {
 		return nil, fmt.Errorf("inserting software vulnerabilities: %w", err)
 	}
+	insertTime := time.Since(insertStart)

 	logger.DebugContext(ctx, "osv analysis completed",
 		"platform", ver.Platform,
 		"version", ver.Version,
 		"distinct_software", len(software),
 		"software_query_time", softwareTime,
 		"match_time", matchTime,
 		"existing_query_time", existingTime,
+		"delete_time", deleteTime,
+		"insert_time", insertTime,
 		"found_vulns", len(found),
 		"existing_vulns", len(existing),
 		"to_insert", len(toInsert),
 		"to_delete", len(toDelete),
 	)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/vulnerabilities/osv/analyzer.go` around lines 101 - 130, The debug log at logger.DebugContext("osv analysis completed", ...) only includes read/match timings; measure and log the write-side timings by timing the delete and insert operations around the existing delete and insert blocks and add those durations and counts to the same DebugContext output (e.g., capture deleteStart, measure deleteTime around utils.BatchProcess/ ds.DeleteSoftwareVulnerabilities and capture insertStart, measure insertTime around ds.InsertSoftwareVulnerabilities), and include fields like "delete_time", "delete_count", "insert_time", and "insert_count" (or "to_delete", "to_insert" equivalents) so the final debug record reflects both read and write costs. Ensure timings are recorded even on errors so the error paths can surface the measured durations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@server/vulnerabilities/osv/analyzer.go`:
- Around line 128-137: The current code returns only the insertion delta
(newVulns) when collectVulns is true, but callers expect the matched/detected
vulnerabilities (the variable found); change the logic so that after calling
ds.InsertSoftwareVulnerabilities you still return the detected slice (found)
when collectVulns is true instead of newVulns. In practice, in analyzer.go
replace the branch that returns newVulns with a return of found (or the
appropriate merged slice of found items if needed), keeping the error handling
around ds.InsertSoftwareVulnerabilities unchanged and ensuring the function's
return types still match.
---
Nitpick comments:
In `@server/datastore/mysql/software_test.go`:
- Around line 12064-12069: Current test only collects sw.Name from result and
asserts names; modify it to assert full identity tuples (Name, Version, Source)
to catch regressions in version/source. Replace the names slice with a slice
built from result that combines sw.Name, sw.Version and sw.Source (e.g.,
formatted tuple or small struct), sort that slice, and assert equality against
an expected sorted list of full tuples like {"libbar|1.2.3|apt", ...}. Update
the require.Equal call to compare these full tuples instead of just names,
referencing the existing result and sw variables.
In `@server/datastore/mysql/software.go`:
- Around line 3314-3392: Instrument the two new batch paths to record
stage-level timing and counts: in ListSoftwareForVulnDetectionByOSVersion,
measure and emit the duration and row count for the initial SELECT DISTINCT (the
sqlx.SelectContext that produces softwareIDs), then for each batch inside the
common_mysql.BatchProcessSimple lambda measure batch duration, record batch size
and rows returned from the batchResult query, and emit a cumulative batch count;
in ListSoftwareVulnerabilitiesBySoftwareIDs, likewise instrument each
BatchProcessSimple invocation to time the per-batch SELECT (the
sqlx.SelectContext that fetches from software_cve), record batch size, rows
returned and cumulative batches processed, and include the source
(fleet.VulnerabilitySource) as a tag/label. Use the project’s existing
metrics/telemetry helper on ds (or the common metrics facility used elsewhere in
Datastore) to record durations (histogram) and counts (counter/gauge) with
labels for function name, platform/os_version or source, and batch index so
first-run vs steady-state comparisons are possible; ensure metrics are emitted
on both success and error paths inside the lambdas.
In `@server/fleet/datastore.go`:
- Around line 683-685: Update the interface comment for
ListSoftwareVulnerabilitiesBySoftwareIDs to explicitly document nil/empty-slice
behavior for the softwareIDs parameter (e.g., state that passing a nil or
zero-length slice returns (nil, nil) or other defined result), so callers like
the OSV analyzer and ListSoftwareForVulnDetectionByOSVersion don’t need to
defensively handle empty inputs; mention the types involved (softwareIDs,
VulnerabilitySource, SoftwareVulnerability) and the chosen contract (exact
return values) so implementers and callers have a clear expectation.
In `@server/vulnerabilities/osv/analyzer.go`:
- Around line 101-130: The debug log at logger.DebugContext("osv analysis
completed", ...) only includes read/match timings; measure and log the
write-side timings by timing the delete and insert operations around the
existing delete and insert blocks and add those durations and counts to the same
DebugContext output (e.g., capture deleteStart, measure deleteTime around
utils.BatchProcess/ ds.DeleteSoftwareVulnerabilities and capture insertStart,
measure insertTime around ds.InsertSoftwareVulnerabilities), and include fields
like "delete_time", "delete_count", "insert_time", and "insert_count" (or
"to_delete", "to_insert" equivalents) so the final debug record reflects both
read and write costs. Ensure timings are recorded even on errors so the error
paths can surface the measured durations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1d4e34a5-500c-4c4c-adbb-6dae6014183d
📒 Files selected for processing (6)
changes/44391-osv-vuln-optimizations
server/datastore/mysql/software.go
server/datastore/mysql/software_test.go
server/fleet/datastore.go
server/mock/datastore_mock.go
server/vulnerabilities/osv/analyzer.go
Pull request overview
This PR targets the vulnerabilities cron runtime regression by reducing redundant OSV scanning work at scale. It refactors OSV analyzers to operate on distinct software per OS version (instead of per-host) and adds supporting datastore APIs to query software/vulns more directly.
Changes:
- Refactored Ubuntu/RHEL OSV analysis to query distinct software per OS version and compute a single vuln delta per OS version.
- Added datastore methods to (1) list distinct software for vuln detection by OS version and (2) list existing vulns by software IDs + source.
- Added MySQL unit tests for the new datastore methods and a user-visible changes entry.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| server/vulnerabilities/osv/analyzer.go | Consolidates OSV analysis to a per-OS-version distinct-software pass and scopes existing vuln lookup by software IDs. |
| server/mock/datastore_mock.go | Extends the datastore mock with the new software/vuln listing APIs. |
| server/fleet/datastore.go | Extends the Datastore interface with the new OS-version and software-ID based methods. |
| server/datastore/mysql/software.go | Implements the new MySQL queries for distinct software-by-OS-version and vulns-by-software-IDs. |
| server/datastore/mysql/software_test.go | Adds coverage for the new MySQL datastore methods. |
| changes/44391-osv-vuln-optimizations | Documents the performance optimization for release notes. |
Codecov Report
❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main   #44684   +/-   ##
=======================================
  Coverage   66.69%   66.69%
=======================================
  Files        2651     2651
  Lines      213559   213605      +46
  Branches     9610     9610
=======================================
+ Hits       142424   142472      +48
+ Misses      58170    58167       -3
- Partials    12965    12966       +1
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
🧹 Nitpick comments (1)
server/vulnerabilities/osv/analyzer.go (1)
Lines 101-112: Log delete/insert timings too.
The new debug event times the read/match phases, but not the delete/insert phases. Since the linked investigation explicitly called out DB writes as a suspected bottleneck, counts alone will still leave the slowest stage opaque in customer runs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/vulnerabilities/osv/analyzer.go` around lines 101 - 112, Add timing for the DB write phases and include them in the existing debug event: capture the duration of the insert and delete operations that act on toInsert and toDelete (e.g., record start := time.Now() before performing the writes and compute insertTime := time.Since(start) / deleteTime similarly around the delete path), then pass "insert_time" and "delete_time" (and keep existing counts len(toInsert)/len(toDelete)) into the logger.DebugContext call so the debug line that uses logger.DebugContext reports both write durations and counts alongside the existing read/match timings.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/vulnerabilities/osv/analyzer.go`:
- Around line 67-140: The helper currently materializes the entire OS-version
dataset (software, softwareIDs, found, existing, delta, dedupe maps) which can
blow memory for large OS buckets; update the flow in the function that calls
ds.ListSoftwareForVulnDetectionByOSVersion, matcher, utils.VulnsDelta,
ds.ListSoftwareVulnerabilitiesBySoftwareIDs, ds.DeleteSoftwareVulnerabilities
and ds.InsertSoftwareVulnerabilities to process the software slice in
configurable chunks (e.g., softwareChunkSize or reusing vulnBatchSize), for each
chunk: 1) build chunkSoftware and chunkSoftwareIDs, 2) call
matcher(chunkSoftware) and then ListSoftwareVulnerabilitiesBySoftwareIDs for
that chunk, 3) compute toInsert/toDelete via utils.VulnsDelta for the chunk, 4)
perform deletes with utils.BatchProcess and ds.DeleteSoftwareVulnerabilities and
perform inserts immediately via ds.InsertSoftwareVulnerabilities (deduplicating
inserts using a global seen map keyed by v.Key() to avoid duplicate insert
across chunks), and 5) aggregate counters/timings for logging; this keeps query
optimization but bounds peak memory to the chunk size while preserving existing
function names and behavior.
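The chunking-with-global-dedupe pattern that prompt describes can be sketched as follows; `processInChunks`, the ID slice, and the inline handler are hypothetical stand-ins for the real matcher and datastore calls, shown only to illustrate how a `seen` map keyed across chunks prevents duplicate inserts while peak memory stays bounded by the chunk size:

```go
package main

import "fmt"

// processInChunks walks items in fixed-size chunks, calling handle on each,
// so per-chunk derived data (matches, deltas) is bounded by chunkSize.
func processInChunks[T any](items []T, chunkSize int, handle func([]T) error) error {
	for start := 0; start < len(items); start += chunkSize {
		end := start + chunkSize
		if end > len(items) {
			end = len(items)
		}
		if err := handle(items[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	software := []uint{1, 2, 3, 2, 4, 1, 5} // duplicate IDs may span chunks
	seen := make(map[uint]struct{})         // global dedupe across chunks
	var inserted []uint

	_ = processInChunks(software, 3, func(chunk []uint) error {
		for _, id := range chunk {
			if _, ok := seen[id]; ok {
				continue // already inserted in an earlier chunk
			}
			seen[id] = struct{}{}
			inserted = append(inserted, id)
		}
		return nil
	})
	fmt.Println(inserted) // [1 2 3 4 5]
}
```

The `seen` map is the one piece of state that must outlive a chunk: without it, a vulnerability matched in two chunks would be queued for insert twice.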
---
Nitpick comments:
In `@server/vulnerabilities/osv/analyzer.go`:
- Around line 101-112: Add timing for the DB write phases and include them in
the existing debug event: capture the duration of the insert and delete
operations that act on toInsert and toDelete (e.g., record start := time.Now()
before performing the writes and compute insertTime := time.Since(start) /
deleteTime similarly around the delete path), then pass "insert_time" and
"delete_time" (and keep existing counts len(toInsert)/len(toDelete)) into the
logger.DebugContext call so the debug line that uses logger.DebugContext reports
both write durations and counts alongside the existing read/match timings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7afab038-b37a-4d58-a8ad-dd90b663c799
📒 Files selected for processing (3)
server/datastore/mysql/software.go
server/datastore/mysql/software_test.go
server/vulnerabilities/osv/analyzer.go
🚧 Files skipped from review as they are similar to previous changes (2)
- server/datastore/mysql/software.go
- server/datastore/mysql/software_test.go
Pull request overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
```go
func (ds *Datastore) ListSoftwareForVulnDetectionByOSVersion(
	ctx context.Context,
	osVer fleet.OSVersion,
) ([]fleet.Software, error) {
	var softwareIDs []uint
	err := sqlx.SelectContext(ctx, ds.reader(ctx), &softwareIDs, `
		SELECT DISTINCT hs.software_id
		FROM host_software hs
		JOIN hosts h ON hs.host_id = h.id
		WHERE h.platform = ? AND h.os_version = ?
	`, osVer.Platform, osVer.Name)
	if err != nil {
		return nil, ctxerr.Wrap(ctx, err, "listing distinct software IDs for OS version")
	}
```
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
mostlikelee
left a comment
Nice! I'm guessing iterating over OS version software will also perform much better than the OVAL scanner as well. Just have one question.
```go
func (ds *Datastore) ListSoftwareForVulnDetectionByOSVersion(
	ctx context.Context,
	osVer fleet.OSVersion,
) ([]fleet.Software, error) {
```
Curious if we'll run into memory issues here. I believe the old pattern paginated.
Related issue: Resolves #44391
Checklist for submitter
If some of the following don't apply, delete the relevant line.
changes/, orbit/changes/, or ee/fleetd-chrome/changes. See Changes files for more information.
`SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements), JS inline code is prevented especially for url redirects, and untrusted data interpolated into shell scripts/commands is validated against shell metacharacters.

Testing
I tested this locally
number of ubuntu hosts: 6,252
average software per host: 2,302
distinct software items: 61,213
host_software rows: 14.4M
generates software_cve rows 305,826
OS sub-versions: 25
Runtime went from 10m53s before the optimization down to 4m26s after it.
Summary by CodeRabbit
Chores
Refactor
Tests