feat(M1/T1): integrity hashing (ssdeep + mismatch + degraded-coverage alert) and crossbeam removal#190

Open
unclesp1d3r wants to merge 20 commits into main from feat/m1-integrity-hashing-crossbeam-removal

@unclesp1d3r unclesp1d3r commented Jun 10, 2026

Member

Summary

Finishes the two in-flight M1 foundation workstreams (ticket T1): executable integrity hashing and crossbeam removal.

Integrity hashing (R2):

  • Adds an ssdeep/CTPH fuzzy hash alongside the SHA-256 identity hash, computed on a dedicated non-cryptographic path (daemoneye-lib/src/integrity/fuzzy.rs) — deliberately kept off MultiAlgorithmHasher so the HashResult.hashes cryptographic-only invariant is preserved (ssdeep is attacker-malleable and never used as an identity guarantee).
  • Computes ssdeep in procmond's hash pass from a clone of the already-authorized fd (no second open, no new TOCTOU window) on the blocking pool, and flags degraded coverage when ssdeep fails but SHA-256 succeeds.
  • Detects on-disk-vs-running mismatch on Linux via the /proc/<pid>/exe " (deleted)" suffix.
  • Decouples the hash pass from the R1 enumeration deadline — hashing runs on its own budget after enumeration completes, reusing the shared hash cache.
  • Threads three new signals (ssdeep_hash, on_disk_mismatch, ssdeep_degraded) to the agent over a typed protobuf contract (ProcessRecord fields 15/16/17). On the procmond side they ride ProcessEvent.platform_metadata via typed helpers, avoiding a breaking change to the 180+ ProcessEvent construction sites.
  • A new agent-side alert bridge (daemoneye-agent/src/integrity_alerts.rs) raises integrity.coverage.degraded (Medium), integrity.disk_mismatch (High), and integrity.binary_change (Medium, below a validated configurable similarity threshold) — procmond can't emit alerts, so the orchestrator does.

Crossbeam removal (R14):

  • Removes the export-only, zero-runtime-consumer crossbeam HighPerformanceEventBus and drops the crossbeam dependency entirely (absent from Cargo.lock). The in-process delivery path the agent actually runs is the daemoneye-eventbus broker, now covered by a new end-to-end in-process broker latency benchmark for the R14 AC4 record. No dual-bus end state remains.

Operator requirement (2026-06-10): when ssdeep fails but SHA-256 succeeds, this is surfaced as an alert (the integrity.coverage.degraded signal), not a silent None.

Implementation notes

Derived from docs/plans/2026-06-10-001-feat-integrity-hashing-crossbeam-removal-plan.md (11 implementation units, U1–U11). The ticket itself (spec/full/tickets/T1...) was refined via a documentation review before planning.

Test plan

  • cargo fmt --all --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean (zero warnings)
  • cargo test --workspace — full suite passing (incl. new unit tests for the fuzzy module, integrity-signal helpers, hash-pass ssdeep stamping, classify_exe_target, the agent alert bridge + BinaryChangeTracker, and a ProcessEvent → proto round-trip for the three new fields)
  • cargo deny check — advisories / bans / licenses / sources ok (new fuzzyhash dep is pure-Rust, MIT, exact-pinned, default-features = false)
  • Criterion benches compile and run (integrity_operations ssdeep impact; broker_inprocess_latency)
  • CI verifies the Linux-only linux_collector mismatch code — it cannot be compiled on the macOS dev host (cross-target C build deps), so the (deleted)-suffix wiring is validated by CI's Linux matrix. The pure classify_exe_target helper is unit-tested.

Residual review findings (from ce-code-review, not blocking)

Applied in this PR: per-target alert dedup (distinct affected executables no longer collapse to one alert), bounded BinaryChangeTracker (evicts baselines for exited executables), proto→native asymmetry doc, format inline, added tests.

Surfaced but intentionally deferred:

  • Doubled read I/O per executable — ssdeep re-streams the file after SHA-256 reads it to EOF (shared fd offset, seek(0)). Inherent to computing two independent digests from a single authorized fd without buffering the whole file; bounded by the per-file size cap and concurrency semaphore.
  • classify_exe_target ambiguity — a real file literally named ... (deleted) is byte-identical to a kernel-flagged deleted exe (the kernel does not escape its suffix). Inherent to the /proc convention; would require an extra stat/inode cross-check.
  • macOS/Windows on_disk_mismatch always false — Linux is primary for this signal; non-Linux probes are future work.
  • Agent-side fuzzy::similarity runs synchronously (no catch_unwind); digests originate from trusted procmond over CRC32-validated same-host IPC.
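The doubled read I/O noted above follows from how a cloned fd behaves: the clone shares the file offset with the original, so the second digest has to rewind before it can stream the file again. A minimal sketch of that two-pass pattern using only the standard library follows. `drain` is a hypothetical stand-in for a streaming digest (it is not SHA-256 or ssdeep), and `two_passes` is an illustrative name, not a function from this PR.

```rust
use std::fs::File;
use std::io::{self, ErrorKind, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a streaming digest: returns (byte count, byte sum).
fn drain<R: Read>(reader: &mut R) -> io::Result<(u64, u64)> {
    let mut buf = [0u8; 8192];
    let (mut len, mut sum) = (0u64, 0u64);
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok((len, sum)),
            Ok(n) => {
                len += n as u64;
                sum += buf[..n].iter().map(|&b| u64::from(b)).sum::<u64>();
            }
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Runs two independent passes over one already-opened file without reopening it.
pub fn two_passes(file: &mut File) -> io::Result<((u64, u64), (u64, u64))> {
    // The clone refers to the same open file, so both handles share one offset.
    let mut clone = file.try_clone()?;
    // Pass 1 reads to EOF and leaves the shared offset at the end of the file.
    let first = drain(file)?;
    // Without this rewind, pass 2 would see zero bytes.
    clone.seek(SeekFrom::Start(0))?;
    let second = drain(&mut clone)?;
    Ok((first, second))
}
```

The cost is one extra sequential read per executable, which matches the deferral rationale above: the alternative would be buffering the whole file or opening the path a second time.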

AI usage disclosure

Per AI_POLICY.md: implemented with Claude Code (Claude Opus 4.8 (1M Context)) via the compound-engineering autonomous pipeline (plan → work → review → fix). All code was reviewed by an 8-persona automated review pass; findings were triaged and the actionable ones applied. All changes build clean, pass clippy -D warnings, and pass the full test suite locally. Human maintainer review required before merge.

CI status

All checks green after fixing the one previously-failing job:

  • eventbus-latency-slo — the failure was a pre-existing hang in the bench harness, now fixed in this PR: latency_benchmark in daemoneye-eventbus/benches/ipc_performance.rs (added by END-297 / feat(eventbus): close END-297 — finalize message broker closure pass #178) awaited the infinite start_echo_handler() accept-loop directly, so the bench process never exited (criterion runs every bench fn's setup body regardless of the --bench filter). The SLO measurement itself always passed (p99 ≈ 20µs vs 1ms). Fixed by spawning the handler on a background task, matching latency_p99_slo; verified locally that the bench now exits cleanly.

Post-review changes (from /review-pr and ce-code-review)

Applied:

  • Per-target alert dedup + bounded BinaryChangeTracker (eviction of exited executables) — both flagged by 2+ reviewers.
  • Stale ssdeep reset on event reuse: populate_hashes Phase 1 now clears the ssdeep metadata too (previously it cleared only the SHA-256 fields), so a reused event that fails auth can't carry a stale digest or degraded flag onto the wire. Test extended.
  • Observable fd-clone failure in the ssdeep path (was a silent drop).
  • Per-target/shared dedup-key tests, tracker-eviction test, ProcessEvent → proto round-trip test, proto→native asymmetry doc, format inline.

Deferred (sound under current usage; tracked for follow-up, not reachable bugs):

  • FuzzyConfig could move to parse-on-construction (new()/TryFrom) per the AGENTS.md newtype rule — today only Default constructs it, so the unvalidated state isn't reachable until operator config wiring lands.
  • HashOutcome::Hashed { ssdeep: Option, ssdeep_degraded: bool } could collapse to a 3-state enum to make (Some, true) unrepresentable.
  • Surfacing fuzzy::similarity / read_link error kinds (vs the benign no-op) for richer forensic telemetry.
  • A Linux-only end-to-end on_disk_mismatch integration test (unlinked backing file) beyond the pure-helper unit tests.
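The deferred `HashOutcome` item above could look roughly like the following. This is a sketch of the idea only: `SsdeepOutcome`, its variant names, and `from_parts` are hypothetical and are not taken from the PR. The point is that a three-state enum has no way to express "digest present and degraded" at all.

```rust
/// Hypothetical three-state replacement for the (Option<String>, bool) pair.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SsdeepOutcome {
    /// ssdeep succeeded and produced a digest.
    Computed(String),
    /// SHA-256 succeeded but ssdeep failed: coverage is degraded.
    Degraded,
    /// ssdeep was never attempted (for example, the feature is disabled).
    NotAttempted,
}

impl SsdeepOutcome {
    /// Returns the digest when one was computed.
    pub fn digest(&self) -> Option<&str> {
        match self {
            Self::Computed(digest) => Some(digest.as_str()),
            Self::Degraded | Self::NotAttempted => None,
        }
    }

    /// True only for the degraded-coverage state.
    pub fn is_degraded(&self) -> bool {
        matches!(self, Self::Degraded)
    }

    /// Converts the current two-field shape, rejecting the contradictory
    /// (Some, true) combination instead of guessing what it means.
    pub fn from_parts(ssdeep: Option<String>, degraded: bool) -> Option<Self> {
        match (ssdeep, degraded) {
            (Some(digest), false) => Some(Self::Computed(digest)),
            (None, true) => Some(Self::Degraded),
            (None, false) => Some(Self::NotAttempted),
            (Some(_), true) => None,
        }
    }
}
```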

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Add the pure-Rust, MIT-licensed fuzzyhash 0.2.2 crate (ssdeep/CTPH) as an
exact-pinned workspace dependency, exposed through a new daemoneye-lib
fuzzy-hashes feature (on by default). ssdeep is a non-cryptographic
similarity hash and is deliberately kept off the MultiAlgorithmHasher
cryptographic path. Passes cargo deny (licenses/bans/advisories/sources ok).

Advances R2 AC7 (ssdeep fuzzy hash recording).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
New integrity::fuzzy module computes ssdeep/CTPH digests on a dedicated
non-cryptographic path (bytes + streaming-reader entry points) and compares
two digests for 0-100 similarity. FuzzyConfig carries a named default
threshold (DEFAULT_SSDEEP_SIMILARITY_THRESHOLD = 80) with a validate() bound
rejecting [0] and [100] so a misconfiguration cannot silently disable or
saturate the binary-change observation. Feature-gated on fuzzy-hashes with
compiling stubs when disabled. Kept entirely off MultiAlgorithmHasher so the
HashResult cryptographic-only invariant is untouched.

Advances R2 AC7 (ssdeep fuzzy hash recording).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
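The threshold bound in the commit above can be sketched as follows. The constant name and default value come from the commit message; the field name, the `InvalidThreshold` error type, and the rejection of values above 100 are assumptions made for illustration. The reasoning for the bound: an alert fires when similarity falls below the threshold, so a threshold of 0 can never fire and a threshold of 100 fires on anything short of an identical digest.

```rust
/// Default from the commit message above.
pub const DEFAULT_SSDEEP_SIMILARITY_THRESHOLD: u8 = 80;

/// Hypothetical field layout; the real FuzzyConfig may carry more settings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FuzzyConfig {
    pub similarity_threshold: u8,
}

impl Default for FuzzyConfig {
    fn default() -> Self {
        Self { similarity_threshold: DEFAULT_SSDEEP_SIMILARITY_THRESHOLD }
    }
}

/// Hypothetical error type carrying the rejected value.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidThreshold(pub u8);

impl FuzzyConfig {
    /// Accepts 1..=99. A threshold of 0 would disable the binary-change
    /// observation and 100 would saturate it; values above 100 are outside
    /// the similarity scale (rejecting them is an assumption of this sketch).
    pub fn validate(&self) -> Result<(), InvalidThreshold> {
        if (1..=99).contains(&self.similarity_threshold) {
            Ok(())
        } else {
            Err(InvalidThreshold(self.similarity_threshold))
        }
    }
}
```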
…contract

Add three typed fields to the protobuf ProcessRecord (ssdeep_hash=15,
on_disk_mismatch=16, ssdeep_degraded=17) so the agent reads them directly.
On the procmond side the signals ride ProcessEvent.platform_metadata (via
typed set_integrity_signals / ssdeep_hash / on_disk_mismatch / ssdeep_degraded
helpers) rather than new struct fields, keeping the 180+ ProcessEvent literal
construction sites stable; the IPC conversion lifts them onto the proto record.
ssdeep_hash is decoupled from the (executable_hash, hash_algorithm) paired
invariant. Unrelated proto-record literals default the new fields.

Advances R2 AC6/AC7 and the degraded-coverage alert requirement.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
populate_hashes now computes an ssdeep fuzzy hash alongside SHA-256 for each
authorized executable, streaming a clone of the same authorized fd on the
blocking pool (no second open, no new TOCTOU window). The digest is stamped via
ProcessEvent::set_ssdeep_signal. When SHA-256 succeeds but ssdeep fails the
event is flagged ssdeep_degraded and a new HashPassStats.ssdeep_failures counter
increments; a disabled fuzzy-hashes feature is NOT counted as degraded. ssdeep
failure never fails the pass or enumeration.

Replaces the monolithic set_integrity_signals with composable set_ssdeep_signal
and set_on_disk_mismatch setters so the hash pass and the collector (U6) write
independently without clobbering each other.

Advances R2 AC7 and the degraded-coverage alert requirement.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
The post-enumeration hash pass previously ran on whatever budget enumeration
left over (collection_timeout minus elapsed, with a skip when exhausted), so a
slow enumeration starved hashing and hashing competed with the R1 enumeration
deadline. Enumeration already completes and produces its events before the hash
pass, so the pass now runs on its own independent budget (collection_timeout in
ProcessEventSource, CYCLE_BUDGET in the actor collector). Hashing latency no
longer shortens or extends the enumeration deadline; cache reuse via the shared
engine keeps steady-state cost low; inaccessible executables remain non-fatal.
Adds ssdeep_failures to the completion telemetry.

Advances R2 AC4 (async hashing outside the enumeration deadline).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
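The budget change in the commit above comes down to which duration the hash pass is given. A small sketch of the two policies, with hypothetical function names and no claim about how ProcessEventSource or the actor collector actually compute them:

```rust
use std::time::Duration;

/// Previous policy (as described above): hashing gets whatever enumeration
/// left over. `None` means the budget is exhausted and the pass is skipped.
pub fn leftover_budget(
    collection_timeout: Duration,
    enumeration_elapsed: Duration,
) -> Option<Duration> {
    collection_timeout
        .checked_sub(enumeration_elapsed)
        .filter(|remaining| !remaining.is_zero())
}

/// New policy: the hash pass always gets its own full budget, regardless of
/// how long enumeration took.
pub fn independent_budget(hash_budget: Duration, _enumeration_elapsed: Duration) -> Duration {
    hash_budget
}
```

Under the old policy a slow enumeration could drive the leftover to `None` and starve hashing entirely; under the new one the enumeration time no longer enters the calculation.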
The Linux collector now classifies the /proc/<pid>/exe symlink target: when the
kernel appends the trailing " (deleted)" suffix (the backing executable was
unlinked or replaced while the process runs), it strips the suffix from the
stored path and records the on-disk-vs-running mismatch via
ProcessEvent::set_on_disk_mismatch (R2 AC6). The match is anchored to the
trailing token so a path legitimately containing the substring mid-string is
not flagged. macOS/Windows default the flag to false (Linux is primary for this
signal). Helper classify_exe_target is unit-tested.

Advances R2 AC6 (on-disk-vs-running mismatch recorded as distinct metadata).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
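The suffix classification described in the commit above can be sketched as below. The helper name `classify_exe_target` comes from the PR, but its real signature is not shown here, so the `&str` input, the `ExeTarget` return type, and the empty-path guard are assumptions (the real collector most likely works on path types rather than strings).

```rust
/// Suffix the Linux kernel appends to the /proc/<pid>/exe link target when
/// the backing file has been unlinked or replaced.
const DELETED_SUFFIX: &str = " (deleted)";

/// Hypothetical return shape: the cleaned path plus the mismatch flag.
#[derive(Debug, PartialEq, Eq)]
pub struct ExeTarget {
    pub path: String,
    pub on_disk_mismatch: bool,
}

/// Classifies a raw /proc/<pid>/exe link target.
/// Only a trailing suffix counts; the same text in the middle of a path does not.
pub fn classify_exe_target(raw: &str) -> ExeTarget {
    match raw.strip_suffix(DELETED_SUFFIX) {
        Some(stripped) if !stripped.is_empty() => ExeTarget {
            path: stripped.to_owned(),
            on_disk_mismatch: true,
        },
        _ => ExeTarget {
            path: raw.to_owned(),
            on_disk_mismatch: false,
        },
    }
}
```

As the deferred findings note, a file whose real name ends in " (deleted)" is indistinguishable from a kernel-flagged target at this level; this sketch shares that limitation.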
procmond sets per-process integrity flags on the wire but cannot emit alerts
(no AlertManager, no network). The new integrity_alerts module in daemoneye-agent
reads the proto ProcessRecord flags before native conversion and raises alerts:
ssdeep_degraded -> integrity.coverage.degraded (Medium), on_disk_mismatch ->
integrity.disk_mismatch (High). The alerts are folded into the existing
execute_rules alert stream so they share dedup, rate-limiting, and delivery.
A record may raise both (distinct dedup keys); clean records raise none.

Implements the operator degraded-coverage alert requirement and surfaces R2 AC6
mismatch as an alert.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
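The flag-to-alert mapping in the commit above can be sketched as a pure function. The rule IDs and severities are taken from the commit message; `RecordFlags`, `IntegrityAlert`, and the function signature are simplified, hypothetical stand-ins for the proto ProcessRecord and the agent's real alert type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Medium,
    High,
}

/// Hypothetical minimal alert; the agent's real Alert type carries more fields.
#[derive(Debug, PartialEq, Eq)]
pub struct IntegrityAlert {
    pub rule_id: &'static str,
    pub severity: Severity,
    pub pid: u32,
}

/// Hypothetical view of the integrity flags on a proto ProcessRecord.
pub struct RecordFlags {
    pub pid: u32,
    pub ssdeep_degraded: bool,
    pub on_disk_mismatch: bool,
}

/// Maps wire flags to alerts. A record may raise both; a clean record raises none.
pub fn detect_integrity_alerts(record: &RecordFlags) -> Vec<IntegrityAlert> {
    let mut alerts = Vec::new();
    if record.ssdeep_degraded {
        alerts.push(IntegrityAlert {
            rule_id: "integrity.coverage.degraded",
            severity: Severity::Medium,
            pid: record.pid,
        });
    }
    if record.on_disk_mismatch {
        alerts.push(IntegrityAlert {
            rule_id: "integrity.disk_mismatch",
            severity: Severity::High,
            pid: record.pid,
        });
    }
    alerts
}
```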
Adds a session-scoped BinaryChangeTracker to the agent integrity bridge. It
holds the last ssdeep digest per executable path; when a process's current
ssdeep similarity to its previously recorded value falls below the configured
threshold it raises an integrity.binary_change alert (Medium). The first
observation for a path seeds the baseline only; a comparison failure or a
missing ssdeep is skipped without a false alert. The "previously recorded value"
is the agent's last in-memory value, reading no storage and adding no storage
schema (within the ticket's no-new-storage-logic boundary). The threshold comes
from the validated FuzzyConfig so a misconfiguration cannot silently disable it.

Advances R2 AC7 (binary-change observation below a configurable threshold).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
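A minimal sketch of the tracker behaviour described in the commit above, assuming an injected similarity function in place of the real `fuzzy::similarity`. The method signatures are hypothetical, and `retain_running` is split out as a separate method here for clarity, whereas a later commit in this PR describes the eviction as happening inside `observe()`.

```rust
use std::collections::{HashMap, HashSet};

/// Session-scoped tracker: last ssdeep digest per executable path.
/// `F` is a stand-in similarity function returning a 0-100 score,
/// or None when the comparison fails.
pub struct BinaryChangeTracker<F> {
    threshold: u8,
    baselines: HashMap<String, String>,
    similarity: F,
}

impl<F: Fn(&str, &str) -> Option<u8>> BinaryChangeTracker<F> {
    pub fn new(threshold: u8, similarity: F) -> Self {
        Self { threshold, baselines: HashMap::new(), similarity }
    }

    /// Returns Some(score) when a binary-change alert should be raised.
    pub fn observe(&mut self, path: &str, ssdeep: Option<&str>) -> Option<u8> {
        // A missing ssdeep digest is skipped without touching the baseline.
        let current = ssdeep?;
        // Store the new digest; the first observation only seeds the baseline.
        let previous = self.baselines.insert(path.to_owned(), current.to_owned())?;
        // A failed comparison is skipped rather than raised as a false alert.
        let score = (self.similarity)(&previous, current)?;
        (score < self.threshold).then_some(score)
    }

    /// Drops baselines for executables that are no longer running, which
    /// bounds the map to the running-process set.
    pub fn retain_running(&mut self, running: &HashSet<String>) {
        self.baselines.retain(|path, _| running.contains(path));
    }
}
```

Keeping the baseline in memory only, as the commit states, means a restart re-seeds every path and produces no alert on the first observation after startup.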
Extend the integrity criterion suite with ssdeep-only benchmarks across the
representative sizes (1 KiB / 256 KiB / 4 MiB) and a combined SHA-256 + ssdeep
benchmark matching what procmond's hash pass now computes per executable.
Comparing these against the existing SHA-256-only baseline quantifies the
ssdeep overhead for the R2 AC4 sustained-CPU budget. Baselines are recorded via
the criterion CLI (cargo bench --baseline previous).

Advances R2 AC4 (hashing impact baselines).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
The R14 AC4 no-regression gate compares the daemoneye-eventbus in-process broker
delivery path against a pre-migration baseline, but no benchmark measured that
path: throughput.rs is publish-only, ipc_performance.rs is socket-based, and
procmond's eventbus_benchmarks.rs measures the WAL connector. This new bench
constructs a broker with no transport server bound and measures publish ->
in-process subscriber receive (tokio mpsc fan-out) — the path the agent runs.
Record the baseline with: cargo bench -p daemoneye-eventbus --bench
broker_inprocess_latency -- --save-baseline pre-migration.

Advances R14 AC4 (broker end-to-end no-regression baseline).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ndency

Gate decision (R14 AC4/AC7): the crossbeam HighPerformanceEventBus was
export-only dead code with zero runtime consumers (instantiated only in its own
unit tests), so its removal is pure dead-code elimination with no possible
performance regression — the in-process delivery path the agent actually runs is
the daemoneye-eventbus broker, now covered by the broker_inprocess_latency
benchmark (U10). Deletes high_performance_event_bus.rs, drops its re-exports from
collector-core, and removes the crossbeam dependency from both manifests
(absent from Cargo.lock). Fixes a stale doc comment claiming LocalEventBus uses
crossbeam (it uses tokio channels) and updates the AGENTS.md tech-stack entry.
No dual-bus end state remains.

Implements R14 AC7 (legacy crossbeam path removed).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
The EventSubscription example in the crate-level doctest predated the
include_control field and failed to compile under cargo test --workspace.

Signed-off-by line added by -s.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Address ce-code-review findings on the integrity feature:

- Alert over-suppression (adversarial + reliability): the AlertManager dedup key
  is severity:rule_id:title, identical across processes for a shared integrity
  rule, so distinct affected executables collapsed to one delivered alert per
  window. build_alert now discriminates the dedup key by executable identity, so
  distinct executables alert separately while the same executable still dedups.
- Unbounded BinaryChangeTracker growth (adversarial + reliability +
  maintainability + correctness): observe() now evicts baselines for executables
  no longer running, bounding the map to the running-process set.
- Inline {score} format arg; document the intentional proto->native integrity
  drop. Add tests: per-target vs shared dedup keys, tracker eviction, and a
  ProcessEvent->proto round-trip for the three integrity fields.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
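The over-suppression fix above is easiest to see by comparing the two key shapes. The `severity:rule_id:title` format is quoted from the commit message; how `build_alert` actually folds the executable identity into the key is not shown in this PR page, so appending the path is an assumption, and all function names below are hypothetical.

```rust
use std::collections::HashSet;

/// Key shape described in the commit message: identical for every process
/// that trips the same integrity rule.
pub fn shared_dedup_key(severity: &str, rule_id: &str, title: &str) -> String {
    format!("{severity}:{rule_id}:{title}")
}

/// Assumed per-target shape: the executable path is appended as a discriminator.
pub fn per_target_dedup_key(
    severity: &str,
    rule_id: &str,
    title: &str,
    executable: &str,
) -> String {
    format!("{severity}:{rule_id}:{title}:{executable}")
}

/// Models one dedup window: the number of alerts delivered equals the
/// number of distinct keys seen.
pub fn delivered_in_window<I: IntoIterator<Item = String>>(keys: I) -> usize {
    keys.into_iter().collect::<HashSet<_>>().len()
}
```

With the shared key, two different executables hitting integrity.disk_mismatch in the same window collapse to one delivered alert; with the per-target key they are delivered separately while repeats for the same executable still dedup.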
Copilot AI review requested due to automatic review settings June 10, 2026 06:18
@dosubot dosubot Bot added the size:XXL, alerting, architecture, core-feature, cryptography, dependencies, enhancement, protobuf, and testing labels Jun 10, 2026
@mergify

mergify Bot commented Jun 10, 2026

Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for dependabot and dosubot.

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?!?:

🟢 Full CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Activates for non-bot authors, or dependabot when files exist outside .github/workflows/.

  • check-success = DCO
  • check-success = coverage
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-15, macOS)
  • check-success = test-cross-platform (ubuntu-22.04, Linux)
  • check-success = test-cross-platform (windows-2022, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 3 commits of the base branch before merging

  • #commits-behind <= 3

@coderabbitai

coderabbitai Bot commented Jun 10, 2026


Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4678e7e6-601e-4026-b349-3352ceed5e7a

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff3d73 and 3fd3bba.

📒 Files selected for processing (1)
  • daemoneye-lib/src/proto.rs

Summary by CodeRabbit

  • New Features

    • Integrity monitoring: fuzzy-hash (ssdeep) binary-change detection with configurable similarity threshold.
    • Detection of on-disk vs running executable mismatches and degraded coverage signaling.
    • New integrity alerts: degraded coverage, on-disk mismatch, and binary-change; alerts deduplicate per-executable.
    • IPC/protocol extended to include fuzzy-hash, degraded, and mismatch signals in process records.
  • Documentation

    • Architecture docs updated to reflect expanded executable integrity verification.

Walkthrough

Adds feature-gated ssdeep fuzzy hashing across collection and hash pass, extends ProcessRecord proto with ssdeep/on_disk_mismatch/ssdeep_degraded, detects deleted /proc/.../exe targets, lifts integrity signals into wire records, emits agent-side integrity alerts, and removes the crossbeam high-performance event bus.

Changes

Integrity Signal Detection and Alerting

| Layer | File(s) | Summary |
| --- | --- | --- |
| Fuzzy hashing library and configuration | daemoneye-lib/src/integrity/fuzzy.rs, daemoneye-lib/Cargo.toml, daemoneye-lib/benches/integrity_operations.rs | Feature-gated FuzzyConfig, FuzzyHashError, compute_ssdeep_from_bytes/from_reader, and similarity() APIs; unit tests and benches; optional fuzzyhash dependency wired to fuzzy-hashes. |
| Protobuf and in-process metadata contracts | daemoneye-lib/proto/common.proto, collector-core/src/event.rs, daemoneye-lib/src/proto.rs, daemoneye-lib/src/integrity/mod.rs, procmond/src/lib.rs | ProcessRecord adds ssdeep_hash, on_disk_mismatch, ssdeep_degraded. ProcessEvent adds reserved metadata keys and getters/setters; proto↔native conversions updated and documented; tests added to verify lifting. |
| On-disk executable mismatch detection | procmond/src/linux_collector.rs | Adds classify_exe_target to strip " (deleted)" suffix and set on_disk_mismatch on ProcessEvent; includes unit tests. |
| Ssdeep computation in hash pass | procmond/src/hash_pass.rs | populate_hashes computes best-effort ssdeep alongside SHA-256 using cloned fds and blocking tasks; stamps ProcessEvent via set_ssdeep_signal; introduces ssdeep_failures telemetry and tests. |
| Hash population budget & event-source changes | procmond/src/event_source.rs, procmond/src/monitor_collector.rs | Hash population runs on an independent fixed collection timeout; timeout logging/messages updated; shutdown-aware cancellation preserved. |
| Agent integrity alert generation | daemoneye-agent/src/integrity_alerts.rs, daemoneye-agent/src/main.rs | New module defines synthetic rule IDs and BinaryChangeTracker that maintains per-executable ssdeep baselines, validates similarity thresholds, emits binary-change alerts on threshold drops, evicts stale baselines, and converts boolean integrity flags into Alerts; main loop initializes tracker and merges integrity alerts with rule-based alerts. |

Test, Benchmark, and Dependency Updates

| Layer | File(s) | Summary |
| --- | --- | --- |
| Test fixture updates for schema extension | collector-core/tests/..., daemoneye-lib/tests/..., daemoneye-lib/benches/ipc_*.rs | Multiple test and benchmark ProcessRecord/ProtoProcessRecord literals updated to use ..Default::default() so newly added proto fields get default values in fixtures. |
| Ssdeep performance benchmarks | daemoneye-lib/benches/integrity_operations.rs | Adds deterministic make_bytes() helper and two Criterion benches (bench_ssdeep_only, bench_sha256_plus_ssdeep_medium) gated by fuzzy-hashes. |
| Crossbeam removal and event bus module elimination | Cargo.toml, collector-core/Cargo.toml, collector-core/src/lib.rs, collector-core/src/collector.rs, AGENTS.md | Removes crossbeam from workspace and collector-core manifests; deletes collector-core/src/high_performance_event_bus.rs and its public re-exports; updates docs to reflect tokio-based local event distribution and tightens AGENTS commit policy wording. |

Sequence Diagram(s)

sequenceDiagram
  participant Collection as Linux Process Collection
  participant HashPass as Hash Pass (populate_hashes)
  participant Conversion as Proto Conversion
  participant Agent as Agent Detection Loop
  participant Alerts as Alert Pipeline

  Collection->>Collection: read /proc/pid/exe\nclassify_exe_target()
  Collection->>Collection: Create ProcessEvent\nset_on_disk_mismatch(bool)

  HashPass->>HashPass: Clone authorized fd
  HashPass->>HashPass: SHA-256 hash
  HashPass->>HashPass: Blocking: ssdeep compute_ssdeep_best_effort()
  HashPass->>Collection: Stamp event\nset_ssdeep_signal(hash, degraded)

  Conversion->>Conversion: Extract ssdeep/mismatch\nfrom platform_metadata
  Conversion->>Conversion: Populate ProtoRecord\nfields 15-17

  Agent->>Agent: detect_integrity_alerts()\n(degraded, mismatch flags)
  Agent->>Agent: BinaryChangeTracker.observe()\n(compare baseline)

  Alerts->>Alerts: build_alert()\n(custom dedup key)
  Alerts->>Alerts: Emit alerts: degraded, mismatch, binary_change

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • EvilBit-Labs/DaemonEye#168: Touches the collector-core event bus and related bus payload types; overlaps with the removed crossbeam-based event-bus work.
  • EvilBit-Labs/DaemonEye#170: Prior hashing pipeline changes that this PR extends with ssdeep integration.

Suggested labels

rust, core-feature, process-monitoring, data-models, protobuf, daemoneye-agent, type:feature, ipc, async

Poem

🧩 ssdeep hums where SHA once stood,
Deleted links trimmed, paths understood,
Baselines learn, alerts arise,
Dedup keys keep distinct surprise,
Watch integrity — steady eyes.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | Significant disconnect: PR delivers M1/T1 integrity hashing and crossbeam removal, but END-297 requires a CrossProcessEventBus broker with IPC topic-routing, load-balancing, and health monitoring, none of which are present in this changeset. | Verify that END-297 is the intended linked issue. If END-297 is correct, the PR scope does not meet its requirements. If M1/T1 is the primary work, link to the correct issue instead. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title follows Conventional Commits with type (feat) and scope (M1/T1), clearly summarizing the main changes: integrity hashing implementation and crossbeam removal. |
| Description check | ✅ Passed | Description comprehensively explains the integrity hashing (ssdeep, mismatch detection, degraded coverage) and crossbeam removal work, with implementation notes, test plan, and known trade-offs clearly documented. |
| Out of Scope Changes check | ✅ Passed | Changes are tightly scoped to M1/T1 objectives: integrity hashing pathway, crossbeam removal, and supporting protobuf/proto conversion updates. No unrelated refactoring or feature creep detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


Warning

Review ran into problems

🔥 Problems

These MCP integrations need to be re-authenticated in the Integrations settings: Linear, Notion


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the rust, data-models, and ipc labels Jun 10, 2026
@coderabbitai coderabbitai Bot added the process-monitoring, core-feature, type:feature, and daemoneye-agent labels Jun 10, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
procmond/src/hash_pass.rs (1)

429-449: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat fd-clone failures as degraded coverage, not "not attempted".

If try_clone() hits EMFILE/ENFILE, this path logs at debug and later stamps ssdeep_degraded = false. That keeps stats.ssdeep_failures at zero and suppresses the degraded-coverage alert even though ssdeep coverage was lost under resource pressure.

🐛 Suggested change
-    let fuzzy_file = match file.try_clone() {
-        Ok(clone) => Some(clone),
-        Err(ref err) => {
-            debug!(path = ?exe.as_path(), error = %err, "ssdeep fd clone failed; skipping fuzzy hash");
-            None
-        }
-    };
+    let (fuzzy_file, clone_failed) = match file.try_clone() {
+        Ok(clone) => (Some(clone), false),
+        Err(ref err) => {
+            warn!(path = ?exe.as_path(), error = %err, "ssdeep fd clone failed");
+            (None, true)
+        }
+    };
...
-                let (ssdeep, ssdeep_degraded) =
-                    compute_ssdeep_best_effort(fuzzy_file, exe.as_path()).await;
+                let (ssdeep, ssdeep_degraded) = if clone_failed {
+                    (None, true)
+                } else {
+                    compute_ssdeep_best_effort(fuzzy_file, exe.as_path()).await
+                };

As per coding guidelines, procmond/** is a privileged collector, and the PR’s degraded-coverage signaling only works if coverage drops are surfaced to operators.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@procmond/src/hash_pass.rs` around lines 429 - 449, The fd-clone failure
(try_clone) is currently treated as "not attempted" (fuzzy_file = None) which
suppresses degraded-coverage signals; instead record that cloning failed and
surface that as degraded coverage. Change the fuzzy_file handling so the clone
error is distinguishable (e.g. keep a bool/enum like fuzzy_clone_failed or make
fuzzy_file a Result<File, io::Error>), log the error at warn/debug as you
already do, and when calling or after compute_ssdeep_best_effort (function
compute_ssdeep_best_effort and call site using fuzzy_file and ssdeep_degraded),
ensure that a prior clone failure forces ssdeep_degraded = true (or pass the
failure flag into compute_ssdeep_best_effort so it returns degraded=true).
Update references to fuzzy_file, compute_ssdeep_best_effort, and ssdeep_degraded
accordingly so resource-exhaustion clone failures count as degraded coverage.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@procmond/src/hash_pass.rs`:
- Around line 258-261: The loop is currently calling
event.set_ssdeep_signal(None, false) for every event which unnecessarily
allocates; instead stop materializing ssdeep metadata on reset by either
removing the set_ssdeep_signal(None, false) call from the events.iter_mut() loop
(leave event.executable_hash = None and event.hash_algorithm = None only) or
change set_ssdeep_signal to early-return/no-alloc when the first argument is
None and the boolean is false; update the code that relies on ssdeep_degraded()
accordingly so missing metadata continues to default to false (reference: the
events.iter_mut() loop and the set_ssdeep_signal method on the event type).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 96836fa5-76da-4ae8-b345-e97b5d81ed45

📥 Commits

Reviewing files that changed from the base of the PR and between 96ffe6a and 7823f45.

⛔ Files ignored due to path filters (1)
  • daemoneye-eventbus/benches/ipc_performance.rs is excluded by none and included by none
📒 Files selected for processing (1)
  • procmond/src/hash_pass.rs

Comment thread procmond/src/hash_pass.rs
- procmond hash_pass: an fd-clone failure now marks ssdeep_degraded (ssdeep
  coverage absent while SHA-256 succeeded = degraded, per the operator
  requirement) instead of returning a silent non-degraded None. (Copilot)
- collector-core event.rs: integrity setters no longer materialize
  platform_metadata for default values — set_on_disk_mismatch(false) /
  set_ssdeep_signal(None, false) now clear the keys and drop an emptied object
  rather than allocating per process on the 10k+ scan path. (Copilot, CodeRabbit)
- daemoneye-lib integrity bench: gate the ssdeep benches behind `fuzzy-hashes`
  (they panic via the FeatureDisabled stub when the feature is off) and stream
  via compute_ssdeep_from_reader to match procmond's actual path, not
  compute_ssdeep_from_bytes. (CodeRabbit)
- daemoneye-eventbus broker bench: correct the doc — from_broker starts the
  broker, which binds a transport server (the measured path is still in-process
  mpsc); use to_string_lossy to avoid a panic on a non-UTF-8 temp path. (Copilot)
- architecture doc: document the ssdeep_hash / on_disk_mismatch / ssdeep_degraded
  field semantics in the ProcessRecord schema block. (CodeRabbit)
- AGENTS.md: rename rule 02 heading to match its main-branch-scoped body. (Copilot)

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
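
The setter behaviour described in the collector-core bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `event.rs`: the `Event` struct, the string-valued `BTreeMap` standing in for `platform_metadata`, the key names, and the private `put`/`clear` helpers are all hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical key names for the metadata object.
const KEY_SSDEEP_HASH: &str = "ssdeep_hash";
const KEY_SSDEEP_DEGRADED: &str = "ssdeep_degraded";
const KEY_ON_DISK_MISMATCH: &str = "on_disk_mismatch";

/// Hypothetical stand-in for the event type; only the metadata slot is modelled.
#[derive(Debug, Default)]
struct Event {
    platform_metadata: Option<BTreeMap<String, String>>,
}

impl Event {
    /// Store a non-default value, allocating the map only when needed.
    fn put(&mut self, key: &str, value: &str) {
        self.platform_metadata
            .get_or_insert_with(BTreeMap::new)
            .insert(key.to_owned(), value.to_owned());
    }

    /// Remove a key and drop the map entirely once it is empty.
    fn clear(&mut self, key: &str) {
        let now_empty = match self.platform_metadata.as_mut() {
            Some(map) => {
                map.remove(key);
                map.is_empty()
            }
            None => false, // nothing stored, nothing to allocate
        };
        if now_empty {
            self.platform_metadata = None;
        }
    }

    fn set_on_disk_mismatch(&mut self, mismatch: bool) {
        if mismatch {
            self.put(KEY_ON_DISK_MISMATCH, "true");
        } else {
            self.clear(KEY_ON_DISK_MISMATCH);
        }
    }

    fn set_ssdeep_signal(&mut self, hash: Option<&str>, degraded: bool) {
        match hash {
            Some(h) => self.put(KEY_SSDEEP_HASH, h),
            None => self.clear(KEY_SSDEEP_HASH),
        }
        if degraded {
            self.put(KEY_SSDEEP_DEGRADED, "true");
        } else {
            self.clear(KEY_SSDEEP_DEGRADED);
        }
    }

    /// Missing metadata reads as the default (`false` / `None`).
    fn ssdeep_degraded(&self) -> bool {
        self.platform_metadata
            .as_ref()
            .map_or(false, |m| m.contains_key(KEY_SSDEEP_DEGRADED))
    }

    fn ssdeep_hash(&self) -> Option<&str> {
        self.platform_metadata
            .as_ref()?
            .get(KEY_SSDEEP_HASH)
            .map(String::as_str)
    }
}
```

Because absent keys read as defaults, resetting an event with `set_ssdeep_signal(None, false)` leaves `platform_metadata` as `None` and allocates nothing on the per-process scan path.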
Copilot AI review requested due to automatic review settings June 10, 2026 23:47

Copilot AI left a comment

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 2 comments.

Comment thread procmond/src/hash_pass.rs
Comment on lines +432 to +435
// by path, no new TOCTOU window. A clone failure (e.g. fd exhaustion) is a
// resource condition, not a hash failure: log it so the dropped ssdeep
// coverage is observable, but keep the "not attempted" (non-degraded)
// semantics rather than flooding degraded-coverage alerts.
Comment on lines +93 to +102
rt.block_on(async {
    event_bus
        .publish(
            create_test_process_event(black_box(1234)),
            "bench-e2e".to_string(),
        )
        .await
        .unwrap();
    black_box(receiver.recv().await)
})
@coderabbitai coderabbitai Bot added dependencies Pull requests that update a dependency file cross-platform Multi-platform compatibility features testing Related to test development and test infrastructure integration Related to integration testing and component integration crypto Cryptographic functionality and hashing performance breaking_change and removed rust Pull requests that update rust code data-models Data structure and model related protobuf Protocol Buffer related changes ipc Inter-Process Communication process-monitoring Process monitoring and enumeration features core-feature Core system functionality type:feature daemoneye-agent labels Jun 10, 2026

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
daemoneye-lib/benches/integrity_operations.rs (1)

158-177: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Benchmark claim still overstates production-path parity.

bench_sha256_plus_ssdeep_medium computes SHA-256 from tmp.path() but computes ssdeep from an in-memory Cursor<&[u8]>. That still omits the file/FD streaming path procmond uses for ssdeep, so the “end-to-end per executable” overhead comparison is biased. Either stream ssdeep from the temp file reader in-loop or rename/describe this as mixed-path benchmarking.

Proposed minimal adjustment
 fn bench_sha256_plus_ssdeep_medium(c: &mut Criterion) {
     let rt = Runtime::new().expect("tokio runtime");
     let size = 256 * 1024;
     let tmp = make_file(size);
-    let bytes = make_bytes(size);
     let hasher = build_hasher(vec![HashAlgorithm::Sha256]);
     c.bench_function("integrity_sha256_plus_ssdeep_256kib", |b| {
         b.iter(|| {
             rt.block_on(async {
                 black_box(hasher.compute(tmp.path()).await.expect("sha256 hash"));
             });
+            let mut file = std::fs::File::open(tmp.path()).expect("open file for ssdeep");
             black_box(
-                fuzzy::compute_ssdeep_from_reader(&mut Cursor::new(&bytes)).expect("ssdeep digest"),
+                fuzzy::compute_ssdeep_from_reader(&mut file).expect("ssdeep digest"),
             );
         });
     });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@daemoneye-lib/benches/integrity_operations.rs` around lines 158 - 177, The
benchmark bench_sha256_plus_ssdeep_medium currently computes SHA-256 from the
temp file path (hasher.compute(tmp.path())) but computes ssdeep from an
in-memory Cursor (&bytes), which misrepresents the production streaming path;
change the ssdeep call to read from the same temp file reader instead (use
tmp.path() and open a File/BufReader and pass that reader into
fuzzy::compute_ssdeep_from_reader) inside the benchmark loop, reopening the file
(or seeking back to start) each iteration so the streaming path is measured
end-to-end and not the in-memory shortcut.
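
The "reopen or seek back to start" requirement matters because a file handle keeps its read position between iterations. The sketch below illustrates this with a byte-counting stand-in for a streaming digest; `stream_len` and `stream_rewound` are hypothetical helpers and do not call the real `fuzzy::compute_ssdeep_from_reader`.

```rust
use std::fs::File;
use std::io::{self, ErrorKind, Read, Seek, SeekFrom};

/// Stand-in for a streaming digest: consumes the reader in fixed-size chunks
/// and returns how many bytes it saw.
fn stream_len<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(total), // end of stream
            Ok(n) => total += n as u64,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Rewind a reused handle before streaming, so each benchmark iteration
/// measures a full pass over the file instead of an immediate EOF.
fn stream_rewound(file: &mut File) -> io::Result<u64> {
    file.seek(SeekFrom::Start(0))?;
    stream_len(file)
}
```

Reusing one `File` across iterations without the rewind would make every pass after the first read zero bytes, which would understate the streaming cost even more than the in-memory `Cursor` does. Reopening the file each iteration, as in the proposed diff, avoids the problem at the price of including the `open` call in the measurement.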

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: d0b169a1-0369-4d29-8600-bc43d34e9fe6

📥 Commits

Reviewing files that changed from the base of the PR and between 7823f45 and 7ff3d73.

⛔ Files ignored due to path filters (1)
  • daemoneye-eventbus/benches/broker_inprocess_latency.rs is excluded by none and included by none
📒 Files selected for processing (5)
  • AGENTS.md
  • collector-core/src/event.rs
  • daemoneye-lib/benches/integrity_operations.rs
  • docs/src/architecture/system-architecture.md
  • procmond/src/hash_pass.rs

…pwire

The native ProcessRecord has no integrity-signal fields; ssdeep_hash /
on_disk_mismatch / ssdeep_degraded live only on the protobuf record and are
produced on the procmond ProcessEvent -> proto path. Both From conversions
therefore drop them, and the agent integrity-alert bridge depends on reading
the signals off the proto record *before* conversion.

That correspondence is hand-maintained and the type system can't express it, so
add a tripwire test asserting native -> proto defaults the three fields and a
signal-carrying proto loses them after a native round-trip. A future edit that
adds the fields to the native model (or wires them through) now fails this test
and forces a deliberate decision instead of silently changing the lossy
boundary. Surfaced by the type-design and test-coverage review passes on #190.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
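
The lossy boundary that the tripwire test pins down might look roughly like this. It is a sketch under assumptions: `NativeRecord` and `ProtoRecord` are simplified, hypothetical stand-ins for the native and protobuf `ProcessRecord` types, and `round_trip` is an illustrative helper rather than code from the PR.

```rust
/// Hypothetical native model: no integrity-signal fields.
#[derive(Debug, Default, Clone, PartialEq)]
struct NativeRecord {
    pid: u32,
    name: String,
}

/// Hypothetical protobuf-side record carrying the three signals.
#[derive(Debug, Default, Clone, PartialEq)]
struct ProtoRecord {
    pid: u32,
    name: String,
    ssdeep_hash: Option<String>,
    on_disk_mismatch: bool,
    ssdeep_degraded: bool,
}

impl From<NativeRecord> for ProtoRecord {
    fn from(native: NativeRecord) -> Self {
        // The native model has nothing to contribute, so the signals default.
        ProtoRecord {
            pid: native.pid,
            name: native.name,
            ..ProtoRecord::default()
        }
    }
}

impl From<ProtoRecord> for NativeRecord {
    fn from(proto: ProtoRecord) -> Self {
        // Exhaustive destructuring: a new proto field forces an edit here.
        let ProtoRecord {
            pid,
            name,
            ssdeep_hash: _,
            on_disk_mismatch: _,
            ssdeep_degraded: _,
        } = proto;
        NativeRecord { pid, name }
    }
}

/// Proto -> native -> proto, the path on which the signals are lost.
fn round_trip(proto: ProtoRecord) -> ProtoRecord {
    ProtoRecord::from(NativeRecord::from(proto))
}
```

A tripwire test then asserts both directions: native to proto yields default signals, and a signal-carrying proto comes back from `round_trip` without them. If someone later threads the fields through the native model, those assertions fail and the change has to be made deliberately.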
@coderabbitai coderabbitai Bot removed dependencies Pull requests that update a dependency file cross-platform Multi-platform compatibility features testing Related to test development and test infrastructure labels Jun 11, 2026
Labels

alerting Alert delivery and notification systems architecture System architecture and design decisions async Related to asynchronous programming and async/await patterns core-feature Core system functionality cryptography Cryptographic implementations and security daemoneye-agent data-models Data structure and model related enhancement New feature or request ipc Inter-Process Communication process-monitoring Process monitoring and enumeration features protobuf Protocol Buffer related changes rust Pull requests that update rust code size:XXL This PR changes 1000+ lines, ignoring generated files. type:feature

2 participants