
docs: add admin UI and key visualizer design#545

Open
bootjp wants to merge 30 commits into main from feat/admin_ui_key_visualizer

Conversation

@bootjp bootjp commented Apr 19, 2026

Proposes a standalone cmd/elastickv-admin binary and a TiKV-style key visualizer heatmap. Avoids the Prometheus client dependency in the initial phases by adding an in-process LiveSummary alongside the existing observers, and keeps sampler hot-path overhead below the benchmark noise floor via adaptive 1-in-N sampling with a ≥95% capture SLO.
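For readers unfamiliar with the technique, adaptive 1-in-N sampling can be sketched roughly as follows. This is an illustrative Go sketch, not the PR's actual code; the `sampler` type, `Observe`, and `Retune` names are invented here, and a real implementation would feed per-route counters:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sampler is an illustrative sketch of adaptive 1-in-N sampling: the hot
// path is one atomic increment plus an occasional record call, and an
// off-path retuner adjusts N toward a capture target.
type sampler struct {
	n     atomic.Uint64 // current 1-in-N rate (1 = capture everything)
	seen  atomic.Uint64 // total observations
	taken atomic.Uint64 // sampled observations
}

// Observe is the hot path: a single atomic add decides whether this
// observation is the Nth one and therefore gets recorded.
func (s *sampler) Observe(bytes uint64) {
	n := s.n.Load()
	if s.seen.Add(1)%n == 0 {
		s.taken.Add(1)
		s.record(bytes * n) // weight each kept sample by N (Horvitz-Thompson style)
	}
}

func (s *sampler) record(weightedBytes uint64) {
	// A real sampler would push this into per-route counters; elided here.
	_ = weightedBytes
}

// Retune runs off the hot path and halves N whenever the observed capture
// fraction falls below the target (e.g. 0.95 for a >=95% capture SLO).
func (s *sampler) Retune(target float64) {
	seen := s.seen.Load()
	if seen == 0 {
		return
	}
	if float64(s.taken.Load())/float64(seen) < target {
		if n := s.n.Load(); n > 1 {
			s.n.Store(n / 2)
		}
	}
}

func main() {
	s := &sampler{}
	s.n.Store(4)
	for i := 0; i < 100; i++ {
		s.Observe(64)
	}
	fmt.Println(s.seen.Load(), s.taken.Load()) // 100 25
}
```

Later review rounds refine this picture (the raw capture-rate SLO is eventually replaced), but the hot-path shape stays the same: one atomic add per request.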

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive design spec for the admin Web UI and Key Visualizer (architecture, sampling, persistence, UI interactions, phased plan).
  • New Features

    • Introduced a standalone admin HTTP server with cluster-overview API, membership discovery, and graceful shutdown/CLI controls.
    • Added a token-protected Admin gRPC surface with cluster, raft, adapter, key‑visualizer, route detail, and live event/stream endpoints.
    • Key Visualizer: route-level sampling, coarsening, live column updates, drill-down, and uncertainty rendering.
  • Tests

    • New unit/integration tests for admin HTTP, gRPC, discovery/fanout, auth, client caching, and admin startup behavior.
  • Chores

    • Protobuf generation updated to include the new admin service.


coderabbitai Bot commented Apr 19, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 28 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 25 minutes and 28 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c58daf3a-1af2-4304-a65a-918c8eda1142

📥 Commits

Reviewing files that changed from the base of the PR and between a126e71 and 5efc655.

⛔ Files ignored due to path filters (1)
  • proto/admin.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (10)
  • adapter/admin_grpc.go
  • adapter/admin_grpc_test.go
  • cmd/elastickv-admin/main.go
  • cmd/elastickv-admin/main_test.go
  • docs/admin_ui_key_visualizer_design.md
  • internal/tokenfile.go
  • internal/tokenfile_test.go
  • main.go
  • main_admin_test.go
  • proto/admin.proto
📝 Walkthrough

Adds a new admin subsystem: a standalone elastickv-admin HTTP frontend that fans out to per-node, token-protected, read-only Admin gRPC endpoints; a node-side Admin gRPC server and interceptors; new proto/admin.proto RPCs/messages; admin CLI/transport logic; and a Key Visualizer design document describing sampling, storage, and UI behavior.

Changes

  • Design doc (docs/admin_ui_key_visualizer_design.md): New comprehensive design for the admin Web UI and Key Visualizer covering sampler placement, adaptive sampling, COW route table, split/merge continuity, phased persistence, namespace isolation, adapter summaries, frontend UX/WS behaviors, and phased delivery/testcases.
  • Admin protobufs & generation (proto/admin.proto, proto/Makefile): Adds the Admin gRPC service (GetClusterOverview, GetRaftGroups, GetAdapterSummary, GetKeyVizMatrix, GetRouteDetail, StreamEvents) plus messages/enums; proto/Makefile updated to include admin.proto in Go/gRPC generation.
  • Node-side gRPC server (adapter/admin_grpc.go, adapter/admin_grpc_test.go): New AdminServer, AdminGroup interface, and NodeIdentity; implements GetClusterOverview and GetRaftGroups, deterministic sorted replies, a group registration API, and AdminTokenAuth unary/stream interceptors, with tests for auth, ordering, and response fields.
  • Admin CLI / frontend server (cmd/elastickv-admin/main.go, cmd/elastickv-admin/main_test.go): New elastickv-admin binary: CLI flags (bind, seeds, token file, TLS/insecure), membership cache with singleflight discovery, concurrent fan-out to node Admin gRPCs, /api/cluster/overview and /healthz handlers, graceful shutdown, proto→JSON marshaling, safety limits, and extensive tests for parsing, token loading, transport creds, discovery/fanout caching, and HTTP behavior.
  • Node integration & flags (main.go, main_admin_test.go): Nodes optionally register a per-raft Admin service; startup loads the admin token or enables insecure dev mode (mutually exclusive), wires auth interceptors into raft gRPC servers, and exposes a bootstrap-member-derived admin peer list; tests validate config, token loading, and member extraction.
  • Misc small change (internal/raftengine/etcd/fsm_snapshot_file.go): Replaced the hardcoded ".snap" string with a shared snapFileExt constant for snapshot filename parsing and enumeration.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant UI as Admin UI (browser)
    participant Fanout as elastickv-admin (HTTP)
    participant Node as elastickv node (Admin gRPC)
    participant Engine as Raft Group Engine

    UI->>Fanout: GET /api/cluster/overview
    Fanout->>Node: GetClusterOverview (gRPC, Bearer token)
    Node->>Engine: Status() per registered group
    Engine-->>Node: group status (leader, term, indexes)
    Node-->>Fanout: GetClusterOverviewResponse
    Fanout-->>UI: aggregated JSON { nodes: [...] }
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through bytes and counted each route,
I sampled the warm and noted the stout,
Heatmaps bloom, tokens hum in the night,
Admin fans out and clusters shine bright,
A rabbit applauds — charts tidy and light!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.82%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'docs: add admin UI and key visualizer design' accurately summarizes the main change, a comprehensive design document for the admin UI and key visualizer. It is specific, concise, and clearly reflects the primary contribution of this changeset.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.




@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request proposes a design for a standalone Admin UI and Key Visualizer for Elastickv, utilizing a low-overhead sampler and a Svelte-based frontend to provide cluster insights. The review feedback identifies several critical areas for refinement, including potential concurrency bugs in the sampler's map access, scalability concerns regarding memory usage for high route counts, and the operational risk of centralizing data persistence in the default Raft group. Furthermore, the feedback suggests re-evaluating the mathematical definition of the capture rate SLO, the performance overhead of T-Digests on the hot path, and the necessity of a cluster-wide aggregation strategy to ensure a complete view of traffic.

…aling

- Replace lock-free claim with concrete shard-striped map + atomic
  pointer-swap flush; child counters are installed before RouteID is
  published so Observe never races with splits/merges (H1).
- Distribute KeyViz persistence across the user Raft groups that own
  each route, with per-group hourly batches, instead of centralising
  writes on the default group (H2).
- Cap tracked routes via --keyvizMaxTrackedRoutes (default 10000) with
  activity-weighted coarsening of adjacent routes; memory footprint is
  now bounded independent of total route count (H3).
- Redefine the accuracy SLO as plus-or-minus 5 percent relative error
  at 95 percent CI on bucket totals, since raw capture rate is
  incompatible with Horvitz-Thompson estimation under stepwise
  sub-sampling (M1).
- Replace the hot-path t-digest in LiveSummary with a fixed-bucket
  log-linear histogram so every observation is a single atomic add (M2).
- Make cluster-wide fan-out the default mode of cmd/elastickv-admin;
  GetKeyVizMatrix now serves only this node's leader-owned routes, and
  the admin binary merges results and reports per-node partial status (M3).
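The M2 change above, swapping the hot-path t-digest for a fixed-bucket log-linear histogram, can be illustrated with a sketch like the following. The bucket layout and the `logLinearHist` name are assumptions of this example, not code from the PR:

```go
package main

import (
	"fmt"
	"math/bits"
	"sync/atomic"
)

// logLinearHist sketches the M2 idea: a fixed array of buckets indexed by
// the value's bit length, so the hot path is a single atomic add with no
// locks and no allocation.
type logLinearHist struct {
	buckets [65]atomic.Uint64 // bucket i holds values in [2^(i-1), 2^i)
}

// Observe is one atomic add; bits.Len64 is a couple of instructions.
func (h *logLinearHist) Observe(v uint64) {
	h.buckets[bits.Len64(v)].Add(1)
}

// Quantile walks the fixed buckets off the hot path and returns the upper
// bound of the bucket containing the q-th ranked observation. Resolution
// is the bucket width, the trade-off accepted versus a t-digest.
func (h *logLinearHist) Quantile(q float64) uint64 {
	var total uint64
	for i := range h.buckets {
		total += h.buckets[i].Load()
	}
	rank := uint64(q * float64(total))
	var seen uint64
	for i := range h.buckets {
		seen += h.buckets[i].Load()
		if seen > rank {
			return uint64(1) << uint(i)
		}
	}
	return 0
}

func main() {
	h := &logLinearHist{}
	for _, v := range []uint64{3, 5, 100, 200, 4000} {
		h.Observe(v)
	}
	fmt.Println(h.Quantile(0.5)) // 128: the p50 falls in the [64,128) bucket
}
```

A real implementation would use log-linear (sub-bucketed) bands for tighter error, but the hot-path property is the same: every observation is exactly one atomic add.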

bootjp commented Apr 19, 2026

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


bootjp commented Apr 19, 2026

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


bootjp commented Apr 20, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive design document for a new Admin UI and Key Visualizer for Elastickv, featuring a standalone admin binary and a low-overhead sampling mechanism for traffic heatmaps. The review feedback highlights several critical improvements: ensuring historical data continuity by using stable identifiers for persistence instead of volatile route IDs, optimizing the hot-path sampling logic with a lock-free map to reduce cache line contention, and correcting the mathematical formula for the estimator's relative error to ensure accurate sampling statistics.


bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive design for a built-in Admin UI and a Key Visualizer for Elastickv, featuring a standalone admin binary and a high-performance traffic sampler. The design includes strategies for adaptive sub-sampling, memory management, and a phased implementation. The review feedback identifies several critical design gaps and operational risks, such as potential unbounded data growth due to storage layer limitations, the risk of exposing internal metadata to user operations, and potential data loss during leadership transitions. It also emphasizes the need for unique ID generation and state recovery mechanisms to ensure historical continuity.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local reserved key `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the global user keyspace or default group. `lineageID` is a stable KeyViz identifier stored with `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` metadata; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>`. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- Writes are batched hourly per group (not per flush) and dispatched as a single low-priority proposal per group, keeping the write amplification proportional to the group's own traffic.
- A TTL of 7 days is applied via the existing HLC-based expiry (`store/lsm_store.go:24`).
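As a rough illustration of the key layout and the reader's range-overlap fallback described in this excerpt (all helper names here are hypothetical, not code from the PR):

```go
package main

import (
	"bytes"
	"fmt"
)

// rangeColumnKey composes the group-local reserved key described in the
// design: !admin|keyviz|range|<lineageID>|<unix-hour>. Illustrative only.
func rangeColumnKey(lineageID string, unixHour int64) []byte {
	return []byte(fmt.Sprintf("!admin|keyviz|range|%s|%d", lineageID, unixHour))
}

// lineageRecord mirrors the metadata fields listed in the design text.
type lineageRecord struct {
	Start, End               []byte // [start, end) of the key range
	RouteID                  uint64 // current routing hint only, never the history key
	ValidFromHLC, ValidToHLC uint64
	ParentLineageIDs         []string
}

// overlaps is the reader fallback: when a historical row has no lineage
// record, match it against candidates by [start, end) overlap before
// falling back to RouteID.
func overlaps(aStart, aEnd, bStart, bEnd []byte) bool {
	return bytes.Compare(aStart, bEnd) < 0 && bytes.Compare(bStart, aEnd) < 0
}

func main() {
	fmt.Printf("%s\n", rangeColumnKey("lin-01", 491000))
	fmt.Println(overlaps([]byte("a"), []byte("c"), []byte("b"), []byte("d"))) // true: [a,c) and [b,d) intersect
	fmt.Println(overlaps([]byte("a"), []byte("b"), []byte("b"), []byte("c"))) // false: half-open ranges only touch
}
```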

medium

The design relies on HLC-based expiry for a 7-day TTL, but the pebbleStore.Compact implementation in store/lsm_store.go (lines 1115-1138) only removes old MVCC versions and does not check ExpireAt for the latest version. This poses an operational risk of unbounded data growth; the design should detail mitigation strategies, such as updating the store's GC logic, to ensure data is correctly evicted.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local reserved key `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the global user keyspace or default group. `lineageID` is a stable KeyViz identifier stored with `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` metadata; `RouteID` is recorded only as the current routing hint, never as the primary history key.

medium

The !admin| prefix for internal KeyViz data is not currently filtered by ShardedCoordinator.maxLatestCommitTS or pebbleStore.ScanAt. This could expose internal metadata to user-plane operations or cause user transactions to receive unnecessarily advanced StartTS values.


bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request outlines the design for a standalone Admin UI and Key Visualizer for Elastickv, introducing a heatmap-based traffic monitor and cluster overview. The design focuses on high-performance sampling and distributed persistence. Feedback highlights the need for earlier authentication to secure node-level metadata, the inclusion of follower-local read sampling to avoid incomplete traffic views, and API enhancements to clarify when route data has been coarsened for the UI.


bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a design document for an Admin UI and Key Visualizer for Elastickv, proposing a standalone admin binary and a hot-path-safe sampler for per-route traffic. The review feedback identifies several operational risks that require mitigation strategies, such as the I/O overhead of the garbage collection pass, potential collisions in UUIDv7 generation, and the risk of data loss due to hourly batching. Additionally, the reviewer suggests improving data isolation by rejecting user-plane writes to the internal admin namespace and updating store-level metadata checks to support prefix-based filtering.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched hourly per group (not per flush) and dispatched as a single low-priority proposal per group, keeping the write amplification proportional to the group's own traffic.
- Retention is enforced by a KeyViz-specific GC pass, not by assuming ordinary HLC expiry will delete the latest MVCC version. Phase 3 includes either updating `pebbleStore.Compact` to collect latest versions whose `ExpireAt` is past the retention horizon or adding a KeyViz maintenance delete that tombstones expired column and lineage records before compaction. Persistence refuses to enable if this GC capability is absent, avoiding unbounded growth.

high

The design proposes a KeyViz-specific GC pass that either updates pebbleStore.Compact or adds a maintenance delete. Since KeyViz data can be high-volume, performing a full scan-and-delete GC pass might introduce significant I/O and CPU overhead. As this represents a significant operational risk, the design must detail mitigation strategies per repository guidelines. Leveraging a CompactionFilter to drop expired versions during background compactions is a recommended mitigation to avoid manual scan overhead.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. This prevents internal metadata from leaking through scans or advancing user transaction timestamps.
- `lineageID` is generated as a UUIDv7 using the route transition HLC plus crypto-random entropy, making it cluster-wide unique without coordinating through the default group. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.

high

The design mentions generating lineageID as a UUIDv7 using the route transition HLC. Truncating the 64-bit HLC to the 48-bit UUIDv7 timestamp field might lead to collisions or loss of ordering precision, which is a significant operational risk. Per repository guidelines, the design must detail mitigation strategies. Please specify how the 64-bit HLC is mapped into the UUIDv7 structure (e.g., by using the physical part for the timestamp and the logical part for the sequence/random bits) to mitigate this risk.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- `lineageID` is generated as a UUIDv7 using the route transition HLC plus crypto-random entropy, making it cluster-wide unique without coordinating through the default group. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched hourly per group (not per flush) and dispatched as a single low-priority proposal per group, keeping the write amplification proportional to the group's own traffic.

high

Batching KeyViz writes hourly per group minimizes Raft overhead but introduces a risk of losing up to one hour of historical heatmap data if a node crashes before the batch is persisted. This is a significant operational risk. According to repository rules, the design must detail potential mitigation strategies. For Phase 3, or even as a configurable option now, the design should specify a more frequent persistence interval or a recovery strategy to ensure operational continuity.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. This prevents internal metadata from leaking through scans or advancing user transaction timestamps.

security-medium medium

The design specifies filtering !admin|* records from user-plane scans and timestamp selection. To ensure complete isolation and prevent accidental or malicious interference with internal metadata, the data-plane adapters (gRPC, Redis, etc.) should also be updated to reject user-plane write requests (PUT, DELETE) that target keys starting with the !admin| prefix.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. This prevents internal metadata from leaking through scans or advancing user transaction timestamps.

medium

In the current implementation of store/lsm_store.go, the isPebbleMetaKey function (line 299) only checks for exact matches with specific metadata keys. To support the !admin|* namespace filtering mentioned in the design, this function (or its equivalent in Phase 3) should be updated to perform prefix checks. This will ensure that nextScannableUserKey and prevScannableUserKey correctly skip all internal KeyViz records during user-plane scans.

- Detail UUIDv7 HLC mapping (physical→ts, logical→rand_a, crypto rand_b).
- Use Pebble CompactionFilter as preferred GC path; bounded sweep as fallback.
- Replace hourly batch with 5 min default + per-node WAL for crash recovery.
- Reject user-plane writes to !admin|* in coordinator; widen isPebbleMetaKey to prefix match.

bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a design for a built-in Admin UI and Key Visualizer for Elastickv, featuring a standalone admin binary and a sampling-based heatmap for traffic analysis. Feedback highlights several technical concerns: the potential for node startup delays due to WAL replay, a security gap where point reads could bypass system namespace isolation, and a risk of ID collisions when mapping 16-bit HLC counters to 12-bit UUIDv7 fields. Additionally, it is recommended to use dynamic membership refresh with caching for the admin binary's fan-out logic to ensure heatmap completeness during cluster scaling.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, the low 12 bits of the HLC **logical** counter populate the `rand_a` sequence field (preserving per-millisecond ordering for transitions in the same physical tick), and the remaining 62 bits of `rand_b` come from `crypto/rand`. This avoids the collision and lost-ordering risk of naïvely truncating the HLC into the timestamp field. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched per group on a configurable interval (`--keyvizPersistInterval`, **default 5 min**, max 1 h) and dispatched as a single low-priority Raft proposal per group, keeping the write amplification proportional to the group's own traffic. Hourly was rejected as the default because a node crash between flushes would lose up to one hour of heatmap; 5 min bounds worst-case loss while still amortising Raft cost. As a defence-in-depth against single-point loss, each node also keeps the most recent unflushed window in a small **append-only WAL file** (`<dataDir>/keyviz/wal-<hour>.log`) under the same retention contract; on restart the sampler replays the WAL into the in-memory ring buffer, then truncates entries that have since been included in a persisted batch. Operators that want stricter durability set `--keyvizPersistInterval=30s`.
Contributor

high

Replaying a 5-minute WAL into the in-memory ring buffer on every restart could significantly delay node startup under high QPS. This is a serious operational risk, particularly for rolling upgrades, where nodes must return to service quickly. Per repository guidelines, the design must detail mitigation strategies, such as more frequent persistence (checkpointing) or a mechanism to bound WAL size and replay time, to ensure the node remains responsive during recovery.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
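A minimal Go sketch of the two guards described above, assuming hypothetical helper names (`isSystemKey`, `rejectAdminWrite`); the real checks live in `store/lsm_store.go` and `kv.ShardedCoordinator`, and the real write path surfaces the error as gRPC `InvalidArgument` plus an audit metric rather than a plain error value:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// adminPrefix guards the system namespace; the check is a prefix match,
// widening the previous exact-match isPebbleMetaKey test.
var adminPrefix = []byte("!admin|")

func isSystemKey(key []byte) bool {
	return bytes.HasPrefix(key, adminPrefix)
}

// errAdminNamespace stands in for the gRPC InvalidArgument status the
// real coordinator returns.
var errAdminNamespace = errors.New("reserved !admin| namespace")

// rejectAdminWrite centralises the inverse-leak guard so individual
// adapters (gRPC, Redis, DynamoDB, S3) cannot forget it.
func rejectAdminWrite(key []byte) error {
	if isSystemKey(key) {
		return fmt.Errorf("%w: key %q", errAdminNamespace, key)
	}
	return nil
}
```

Centralising the check in one chokepoint is the design's point: every mutation path funnels through the coordinator, so a new adapter gets the guard for free.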
Contributor

security-medium medium

The namespace isolation logic for !admin|* keys should also be applied to GetAt and ExistsAt operations in the ShardedCoordinator. Currently, the design specifies filtering for ScanAt, ReverseScanAt, and maxLatestCommitTS, and rejecting user-plane writes. However, if point reads are not also restricted, a user could potentially read internal KeyViz metadata or lineage records by discovering the keys, which undermines the intended isolation of the system namespace.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, the low 12 bits of the HLC **logical** counter populate the `rand_a` sequence field (preserving per-millisecond ordering for transitions in the same physical tick), and the remaining 62 bits of `rand_b` come from `crypto/rand`. This avoids the collision and lost-ordering risk of naïvely truncating the HLC into the timestamp field. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.
Contributor

medium

Mapping the HLC logical counter to the 12-bit rand_a field in UUIDv7 is problematic for high-throughput clusters. Standard HLC implementations often use a 16-bit logical counter to handle high-concurrency bursts within a single millisecond. Truncating this to 12 bits will cause collisions and break the monotonic ordering guarantee for events occurring within the same millisecond if the logical counter exceeds 4095. Since lineageID is used to track continuity across split/merge events, losing ordering could lead to incorrect history reconstruction in the UI.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
## 13. Open Questions

1. Default value of `--keyvizMaxTrackedRoutes`. 10 000 is conservative; operators with very large clusters may prefer a higher default paired with shorter retention. Settle during Phase 2 benchmarking.
2. In fan-out (§9.1), should the admin binary **pin** to the seed list or dynamically refresh membership from `GetClusterOverview` on every request? Dynamic is more correct during scale events; pinned is simpler and avoids stampedes on the seed.
Contributor

medium

Regarding the open question on fan-out membership: dynamic refresh is strongly recommended over pinning to a seed list. Relying on a pinned list will lead to incomplete heatmaps as the cluster scales out or as nodes are replaced. To avoid stampedes on the seed nodes, the admin binary can cache the membership for a short duration (e.g., 10-30 seconds) rather than refreshing on every single request.

- Bound WAL replay: size cap + per-interval checkpoint + shed-on-overflow so rolling restarts stay responsive.
- Extend namespace isolation to GetAt/ExistsAt so point reads cannot probe !admin|*.
- Use full 16-bit HLC logical when packing UUIDv7 (rand_a + top nibble of rand_b) to keep monotonic ordering.
- Resolve fan-out membership open question: dynamic refresh with short-lived cache + lazy invalidation.
@bootjp
Owner Author

bootjp commented Apr 21, 2026

/gemini review


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
docs/admin_ui_key_visualizer_design.md (2)

252-262: Fix table formatting in the Integration Points section.

Static analysis flags table column count mismatches at lines 257 and 260. Reviewing the table structure at lines 252-262, the two-column layout (File | Change) has rows where the Change cell contains inline code spans with spaces (e.g., `Put`/`Delete`), which the Markdown linter interprets as additional columns.

To fix the linting warnings and improve readability:

  1. Remove stray spaces inside inline code spans: `Put` / `Delete` should become `Put`/`Delete` (no spaces around the slash).
  2. Ensure each table row has exactly two pipe-separated cells.

Also, the code block at line 167 is missing a language identifier (flagged by MD040). Tag it as text or leave it untagged if it's intentionally pseudo-code:

-```
+```text
 Sampler
  ├─ routes atomic.Pointer[routeTable]
  └─ history *ringBuffer[matrixColumn]

These are minor formatting issues that don't affect technical correctness, but fixing them improves documentation quality.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/admin_ui_key_visualizer_design.md` around lines 252 - 262, The
Integration Points table has rows where the "Change" column contains inline code
spans that include spaces around slashes which the Markdown linter treats as
extra columns (breaks the two-column `File | Change` table); edit the table so
each row has exactly two pipe-separated cells and normalize inline code spans
like `Put`/`Delete` (remove surrounding spaces or use `Put / Delete`) and any
other backticked phrases to avoid introducing additional pipes, and add a
language tag (e.g., text) to the standalone code block containing the Sampler
snippet (the block starting with "Sampler ├─ routes atomic.Pointer[routeTable]")
so MD040 is resolved.

85-86: Consider documenting token lifecycle and rotation strategy.

The design specifies a shared read-only admin token loaded from --adminTokenFile, but doesn't address token rotation, minimum entropy requirements, or compromise recovery. For production deployment, operators need guidance on:

  • Token generation (e.g., require minimum 256-bit entropy)
  • Safe distribution mechanism
  • Rotation procedure without downtime (e.g., dual-token grace period)
  • Monitoring for unauthorized access attempts

Add a "Security Considerations" subsection under §4 that specifies:

### 4.2 Token Security

- Generate admin tokens with at least 256 bits of cryptographic entropy (e.g., `openssl rand -base64 32`).
- Distribute via secure channels (sealed secrets, key management systems).
- Support dual-token mode: `--adminTokenFile` may contain multiple newline-separated tokens; nodes accept any valid token, enabling zero-downtime rotation.
- Log failed authentication attempts to the audit stream for monitoring.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/admin_ui_key_visualizer_design.md` around lines 85 - 86, Add a "Security
Considerations" subsection under §4 that documents admin token lifecycle and
rotation for the Admin service (registered alongside RawKV), specifying that
admin tokens loaded via --adminTokenFile (and sent from admin binary using
--nodeTokenFile) must be generated with >=256-bit entropy (example: openssl rand
-base64 32), should be distributed via secure channels (sealed secrets / KMS),
support dual-token mode by allowing --adminTokenFile to contain multiple
newline-separated tokens so nodes accept any valid token for zero-downtime
rotation, require logging of failed authentication attempts to the audit stream,
and include a brief compromise-recovery procedure and guidance on using
--adminInsecureNoAuth only for local development (with startup warning).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/admin_ui_key_visualizer_design.md`:
- Line 161: Clarify how CPU attributed to Observe is measured: update the text
around "sampleRate" and the adaptive controller (and add a brief note in §5.2)
to state the chosen measurement approach explicitly—preferably Option 1
(synthetic cost model), e.g., compute Observe cost as total flush-window
duration or routeCount × benchmarkedCostPerObserve to avoid hot-path profiling;
if Option 2 is chosen instead, state using runtime.SetCPUProfileRate with a low
sampling rate (e.g., ~10Hz) and that attribution is approximate and
low-overhead. Mention both options briefly and mark which one the design uses.
Ensure references to "sampleRate" and "Observe" remain consistent.
- Around line 206-207: The text around lineageID and "route transition HLC" must
explicitly state the timestamp source to ensure leader-issued timestamps are
used: clarify that the "route transition HLC" used to derive lineageID (and
populate validFromHLC/validToHLC) is the Raft-leader-issued HLC/timestamp
attached to the SplitRange/MergeRange Raft proposal (or otherwise derived
deterministically from the Raft log index), not a node-local HLC snapshot;
update the wording for lineageID, validFromHLC, validToHLC and any references to
split/merge events to require leader-issued timestamps or a deterministic
Raft-index-based derivation to avoid violating timestamp safety.
- Around line 206-207: The lineageID UUIDv7 mapping drops the top 4 bits of the
16-bit HLC logical counter (rand_a only takes 12 bits) and uses crypto/rand for
rand_b making IDs non-deterministic across nodes; fix the lineageID generation
by encoding the full 16-bit HLC logical into the UUIDv7 fields (for example,
keep unix_ts_ms from the 48-bit HLC physical, place all 16 logical bits by using
the remaining bits of rand_a plus the high bits of rand_b as a deterministic
prefix) and derive the rest of rand_b deterministically (e.g., from a secure
hash of the HLC or node-specific stable ID + HLC) instead of crypto/rand so
identical HLCs produce identical lineageID values; update the mapping logic used
where lineageID is produced and anywhere rand_a/rand_b are referenced to ensure
deterministic, full-precision encoding.
- Line 143: Add a note in §5.1 or §5.6 describing behavior on leadership loss:
when a node's lease-loss callback for a Raft group fires, the sampler should
mark all leaderWrite samples for that group in the current and previous step
window with staleLeader=true (or optionally clear them immediately), and
document that the admin binary's fan-out/merge logic (§9.1) must exclude
staleLeader samples so they aren't double-counted; reference the sampler,
leaderWrite, staleLeader flag, lease-loss callback, and the admin fan-out merge
in the text.

---

Nitpick comments:
In `@docs/admin_ui_key_visualizer_design.md`:
- Around line 252-262: The Integration Points table has rows where the "Change"
column contains inline code spans that include spaces around slashes which the
Markdown linter treats as extra columns (breaks the two-column `File | Change`
table); edit the table so each row has exactly two pipe-separated cells and
normalize inline code spans like `Put`/`Delete` (remove surrounding spaces or
use `Put / Delete`) and any other backticked phrases to avoid introducing
additional pipes, and add a language tag (e.g., text) to the standalone code
block containing the Sampler snippet (the block starting with "Sampler ├─ routes
atomic.Pointer[routeTable]") so MD040 is resolved.
- Around line 85-86: Add a "Security Considerations" subsection under §4 that
documents admin token lifecycle and rotation for the Admin service (registered
alongside RawKV), specifying that admin tokens loaded via --adminTokenFile (and
sent from admin binary using --nodeTokenFile) must be generated with >=256-bit
entropy (example: openssl rand -base64 32), should be distributed via secure
channels (sealed secrets / KMS), support dual-token mode by allowing
--adminTokenFile to contain multiple newline-separated tokens so nodes accept
any valid token for zero-downtime rotation, require logging of failed
authentication attempts to the audit stream, and include a brief
compromise-recovery procedure and guidance on using --adminInsecureNoAuth only
for local development (with startup warning).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 012e9477-bf60-4387-b012-93c49d94c8eb

📥 Commits

Reviewing files that changed from the base of the PR and between f1dcc14 and 6a88d26.

📒 Files selected for processing (1)
  • docs/admin_ui_key_visualizer_design.md

Comment thread docs/admin_ui_key_visualizer_design.md
Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive design for an Admin UI and Key Visualizer for Elastickv, featuring a standalone binary, a Svelte-based frontend, and a sophisticated sampling and persistence mechanism for traffic heatmaps. The review feedback highlights several critical areas for refinement in the Phase 3 persistence and recovery logic, including the need for a WAL corruption strategy, clarification on how follower-local reads are persisted to avoid data loss during rebalancing, and the necessity of restricting lineage record creation to Raft leaders to prevent conflicting IDs. Additionally, the feedback suggests jittering persistence intervals to prevent cluster-wide load spikes and ensuring that namespace isolation is consistently applied to point reads as well as scans.

- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, and the full HLC **logical** counter (Elastickv's HLC uses a 16-bit logical) populates `rand_a` (12 bits) **concatenated** with the top 4 bits of `rand_b`'s per-UUID random payload — i.e. logical bits `[15:4]` → `rand_a`, logical bits `[3:0]` → the top nibble of `rand_b`, so the full 16-bit logical is preserved inside the UUID body. The remaining 58 bits of `rand_b` come from `crypto/rand`. This preserves the HLC monotonic ordering guarantee within a single millisecond (up to 65 535 transitions per ms per node, well beyond the observed split/merge rate) and still gives ~2^58 random bits to keep collision probability negligible. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` with `validFromHLC` carrying the full HLC so the reader can re-sort authoritatively; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched per group on a configurable interval (`--keyvizPersistInterval`, **default 5 min**, max 1 h) and dispatched as a single low-priority Raft proposal per group, keeping the write amplification proportional to the group's own traffic. Hourly was rejected as the default because a node crash between flushes would lose up to one hour of heatmap; 5 min bounds worst-case loss while still amortising Raft cost. As a defence-in-depth against single-point loss, each node also keeps the most recent unflushed window in a small **append-only WAL file** (`<dataDir>/keyviz/wal-<hour>.log`) under the same retention contract, with two hard bounds to keep restart fast: the WAL is **size-capped at `--keyvizWALMaxBytes` (default 64 MiB)** and **checkpointed every `--keyvizPersistInterval`** — when a batch is persisted to Raft, the corresponding WAL prefix is truncated. This caps worst-case replay at one interval's worth of data (at the default, tens of MiB at most), and a target recovery budget of **≤1 s replay time at 1 M ops/s**. If the WAL exceeds its size cap before the next flush — indicating the node is behind on persistence — the sampler drops the oldest records and records a `keyviz_wal_shed_total` metric instead of blocking the hot path. On startup the sampler fast-loads the WAL without running the adaptive controller, then resumes normal operation; readiness is gated on WAL replay completion so rolling upgrades do not route traffic to a node that is still rebuilding state. Operators that want stricter durability set `--keyvizPersistInterval=30s`; those that want faster restart at the cost of more write amplification set a smaller `--keyvizWALMaxBytes`.
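The shed-on-overflow and checkpoint bounds described above can be sketched as an in-memory stand-in (the real WAL is an append-only file; `boundedWAL` and its fields are hypothetical names):

```go
package main

// boundedWAL keeps at most maxBytes of unflushed records: appends that
// would exceed the cap shed the oldest records instead of blocking the
// hot path, and a successful Raft flush truncates the covered prefix.
type boundedWAL struct {
	maxBytes  int
	usedBytes int
	records   [][]byte
	shedTotal int // exported as keyviz_wal_shed_total in the design
}

func (w *boundedWAL) Append(rec []byte) {
	w.records = append(w.records, rec)
	w.usedBytes += len(rec)
	// Shed oldest-first when the node is behind on persistence.
	for w.usedBytes > w.maxBytes && len(w.records) > 0 {
		w.usedBytes -= len(w.records[0])
		w.records = w.records[1:]
		w.shedTotal++
	}
}

// Checkpoint drops the first n records once a Raft batch covering them
// has been persisted, bounding replay to one persist interval.
func (w *boundedWAL) Checkpoint(n int) {
	if n > len(w.records) {
		n = len(w.records)
	}
	for _, rec := range w.records[:n] {
		w.usedBytes -= len(rec)
	}
	w.records = w.records[n:]
}
```

The size cap bounds restart replay time and the checkpoint bounds data loss, which is exactly the pair of guarantees the readiness gate relies on.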
Contributor

high

The design gates node readiness on WAL replay completion. While this ensures data continuity, the design should specify how to handle a corrupted WAL file. Without a defined "ignore and continue" or "move aside" strategy, a single corrupted KeyViz WAL segment could prevent a node from joining the cluster or passing health checks, turning a monitoring feature into an availability risk.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.

Comment on lines +203 to +205
Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so every user-plane read and timestamp-selection path — `pebbleStore.ScanAt`, `ReverseScanAt`, `GetAt`, `ExistsAt`, and `ShardedCoordinator.maxLatestCommitTS` — ignores `!admin|*` records; point reads that target an `!admin|*` key return `NotFound` as if the key did not exist, so an attacker cannot distinguish "hidden" from "missing". The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
Contributor

medium

The design for Phase 3 persistence should clarify whether follower-local read samples are proposed to the Raft leader or stored only in the node's local Pebble instance. Section 5.1 states that pointing the admin binary at a single node produces a "partial view" because reads are spread across followers, which suggests the data is not aggregated in the Raft log. However, Section 5.6 describes persistence via "Raft proposal per group" into a namespace that is part of the FSM. If follower reads are not replicated via Raft, they will be lost if a group is rebalanced to a new node, contradicting the goal of preserving 7 days of history across cluster changes.


Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so every user-plane read and timestamp-selection path — `pebbleStore.ScanAt`, `ReverseScanAt`, `GetAt`, `ExistsAt`, and `ShardedCoordinator.maxLatestCommitTS` — ignores `!admin|*` records; point reads that target an `!admin|*` key return `NotFound` as if the key did not exist, so an attacker cannot distinguish "hidden" from "missing". The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
Contributor

medium

The design states that point reads targeting !admin|* keys should return NotFound. However, the integration points in Section 8 only mention widening the isPebbleMetaKey check for scans. To ensure full isolation, the pebbleStore.getAt and latestCommitTS methods (and their public counterparts) must also be updated to explicitly reject or skip keys with the !admin| prefix, as they do not use the scannable key helpers.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so every user-plane read and timestamp-selection path — `pebbleStore.ScanAt`, `ReverseScanAt`, `GetAt`, `ExistsAt`, and `ShardedCoordinator.maxLatestCommitTS` — ignores `!admin|*` records; point reads that target an `!admin|*` key return `NotFound` as if the key did not exist, so an attacker cannot distinguish "hidden" from "missing". The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, and the full HLC **logical** counter (Elastickv's HLC uses a 16-bit logical) populates `rand_a` (12 bits) **concatenated** with the top 4 bits of `rand_b`'s per-UUID random payload — i.e. logical bits `[15:4]` → `rand_a`, logical bits `[3:0]` → the top nibble of `rand_b`, so the full 16-bit logical is preserved inside the UUID body. The remaining 58 bits of `rand_b` come from `crypto/rand`. This preserves the HLC monotonic ordering guarantee within a single millisecond (up to 65 535 transitions per ms per node, well beyond the observed split/merge rate) and still gives ~2^58 random bits to keep collision probability negligible. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` with `validFromHLC` carrying the full HLC so the reader can re-sort authoritatively; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
Contributor

medium

The logic for creating new lineage records on startup should be restricted to the Raft leader of the group. If multiple nodes in a group detect a missing lineage record simultaneously and use the proposed UUIDv7 generation (which includes 58 random bits), they will propose conflicting lineage IDs for the same key range. While Raft will linearize these, it results in redundant records and potential confusion in the lineage tree.

- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, and the full HLC **logical** counter (Elastickv's HLC uses a 16-bit logical) populates `rand_a` (12 bits) **concatenated** with the top 4 bits of `rand_b`'s per-UUID random payload — i.e. logical bits `[15:4]` → `rand_a`, logical bits `[3:0]` → the top nibble of `rand_b`, so the full 16-bit logical is preserved inside the UUID body. The remaining 58 bits of `rand_b` come from `crypto/rand`. This preserves the HLC monotonic ordering guarantee within a single millisecond (up to 65 535 transitions per ms per node, well beyond the observed split/merge rate) and still gives ~2^58 random bits to keep collision probability negligible. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` with `validFromHLC` carrying the full HLC so the reader can re-sort authoritatively; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- Writes are batched per group on a configurable interval (`--keyvizPersistInterval`, **default 5 min**, max 1 h) and dispatched as a single low-priority Raft proposal per group, keeping the write amplification proportional to the group's own traffic. Hourly was rejected as the default because a node crash between flushes would lose up to one hour of heatmap; 5 min bounds worst-case loss while still amortising Raft cost. As a defence-in-depth against single-point loss, each node also keeps the most recent unflushed window in a small **append-only WAL file** (`<dataDir>/keyviz/wal-<hour>.log`) under the same retention contract, with two hard bounds to keep restart fast: the WAL is **size-capped at `--keyvizWALMaxBytes` (default 64 MiB)** and **checkpointed every `--keyvizPersistInterval`** — when a batch is persisted to Raft, the corresponding WAL prefix is truncated. This caps worst-case replay at one interval's worth of data (at the default, tens of MiB at most), and a target recovery budget of **≤1 s replay time at 1 M ops/s**. If the WAL exceeds its size cap before the next flush — indicating the node is behind on persistence — the sampler drops the oldest records and records a `keyviz_wal_shed_total` metric instead of blocking the hot path. On startup the sampler fast-loads the WAL without running the adaptive controller, then resumes normal operation; readiness is gated on WAL replay completion so rolling upgrades do not route traffic to a node that is still rebuilding state. Operators that want stricter durability set `--keyvizPersistInterval=30s`; those that want faster restart at the cost of more write amplification set a smaller `--keyvizWALMaxBytes`.
medium

To avoid synchronized load spikes across the cluster, the 5-minute persistence interval (--keyvizPersistInterval) should be jittered or staggered per Raft group. If all groups on all nodes attempt to propose their compacted KeyViz columns simultaneously, it could lead to transient Raft election timeouts or increased latency on the data plane hot path.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.

…p loss

- Leadership loss: mark in-window leaderWrite samples staleLeader=true; the fan-out path keys writes by (group, term, window) so stale and fresh samples never double-count.
- CPU attribution: use synthetic costPerObserveNs × observeCount rather than runtime profiling to keep hot path clean.
- lineageID is generated once by the Raft leader as part of the split/merge proposal (leader-issued HLC), deterministically derived from HLC + log index + BLAKE2b so every replica computes the same UUIDv7.
@bootjp
bootjp commented Apr 21, 2026

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment
Code Review

This pull request introduces a comprehensive design document for a new Admin UI and Key Visualizer for Elastickv, featuring a standalone admin binary, a high-performance traffic sampler, and a distributed persistence model. The review feedback highlights several critical areas for improvement: addressing potential performance bottlenecks in the cluster-wide fan-out mechanism as the cluster scales, extending write protections to all internal metadata prefixes (such as _meta_), and mitigating a race condition in the lock-free counter flush logic to ensure data accuracy.


Because writes are recorded by Raft leaders and follower-local reads are recorded by the followers that serve them (§5.1), pointing the admin binary at a single node produces a **partial heatmap**. To give operators a complete view by default, the admin binary runs in **fan-out mode**:

- `--nodes` accepts a comma-separated list of seed addresses. The admin binary calls `GetClusterOverview` on any reachable seed to discover the current full membership (node → gRPC endpoint, plus per-group leader identity). Membership is cached for `--nodesRefreshInterval` (**default 15 s**) so a stampede of concurrent browser requests hits at most one `GetClusterOverview` per interval per seed, while scale-out events are still reflected within seconds. The cache is refreshed lazily on the first request after expiry and invalidated immediately on any per-node `Unavailable` error, so removed or replaced nodes are dropped on the next request instead of waiting for the next tick.
high

The cluster-wide fan-out mode for the admin binary (querying every node) may become a performance bottleneck as the cluster scales to hundreds or thousands of nodes. This represents a significant operational risk for large-scale deployments. While acceptable for an MVP, the design must detail potential mitigation strategies for Phase 3, such as querying only the leaders of the relevant Raft groups or implementing a hierarchical aggregation layer to reduce the fan-out pressure.

References
  1. When a design document identifies a significant operational risk, such as performance bottlenecks at scale, it must also detail potential mitigation strategies.


Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so every user-plane read and timestamp-selection path — `pebbleStore.ScanAt`, `ReverseScanAt`, `GetAt`, `ExistsAt`, and `ShardedCoordinator.maxLatestCommitTS` — ignores `!admin|*` records; point reads that target an `!admin|*` key return `NotFound` as if the key did not exist, so an attacker cannot distinguish "hidden" from "missing". The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
security-medium medium

The design mentions widening isPebbleMetaKey to a prefix check on !admin| to protect internal KeyViz records from user-plane scans. However, the existing meta keys (e.g., _meta_last_commit_ts) are also sensitive and should be protected from user-plane writes. To mitigate the operational risk of metadata corruption, the design should explicitly state that all internal prefixes (including _meta_) must be rejected by the ShardedCoordinator for user-plane writes.

References
  1. When a design document identifies a significant operational risk, such as metadata corruption, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Comment on lines +149 to +150
- **Current-window counters**: `routes` is an immutable `routeTable` published through `atomic.Pointer[routeTable]`. `routeTable` owns `map[RouteID]*routeSlot`; each `routeSlot` owns an `atomic.Pointer[routeCounters]`. `Observe` loads the current table, performs a plain map lookup against that immutable snapshot, loads the slot's counter pointer, and uses `atomic.AddUint64` on the counter fields. Adding a new `RouteID` or replacing split/merge mappings performs a copy-on-write table update under a non-hot-path `routesMu`, then publishes the new table with one atomic store. No `Observe` call ever runs against a Go map that can be mutated concurrently.
- **Flush**: instead of holding a long write lock, the flush goroutine **atomically swaps** the `*routeCounters` pointer for each key using `atomic.Pointer[routeCounters]`, then reads the old pointer's frozen counters to build the new matrix column. `Observe` that loaded the old pointer before the swap completes its increments against the (now-retired) old counters, which the next flush will harvest. No counts are lost; at most one step-boundary's worth of counts land in the next column instead of the current one.
medium

The lock-free pointer swap scheme for routeCounters is efficient, but there is a potential race condition during the flush. If the flush goroutine reads the old counters immediately after the swap, it might miss increments from Observe calls that loaded the old pointer just before the swap but haven't executed atomic.AddUint64 yet. To ensure no counts are lost as claimed, the flush mechanism should include a small grace period or a double-buffering strategy to allow pending increments to settle before harvesting the frozen values.

Merged main (PR #590 dropped hashicorp/raft); migrated admin wiring to raftengine.Server. Addressed Codex P2 and CodeRabbit findings on commit ac8d6c3:

- Refcount cached gRPC clients (cmd/elastickv-admin): clientFor now returns (client, release, err); evictOneLocked and invalidateClient retire entries but keep the *grpc.ClientConn alive until the last borrower releases. Prevents spurious cancellation of healthy in-flight GetClusterOverview calls when the cache hits maxCachedClients under fan-out load.
- New --allowRemoteBind flag; by default --bindAddr is rejected unless the host is loopback. The admin HTTP layer has no browser-facing auth while holding the node admin token, so remote bind is an explicit opt-in for operators behind an auth proxy.
- AdminServer.GetRaftGroups now populates LastContactUnixMs from raftengine.Status.LastContact (absolute ms = nowFunc() - LastContact). Now test-seamed via nowFunc so tests are deterministic.
- Design doc §5.1 flush path rewritten to use atomic.SwapUint64 instead of pointer-swap, so no late Observe can race past the flush snapshot — no retired counters to harvest.
- Design doc §5.2 calibration: replace "benchmark at startup" with a checked-in costPerObserveNs constant with CI drift check.

Regression tests added under -race for: eviction-while-in-flight, post-close invalidateClient no-op, --bindAddr loopback validation, GetRaftGroups LastContactUnixMs.
@bootjp

bootjp commented Apr 23, 2026

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@bootjp

bootjp commented Apr 23, 2026

@codex review

@bootjp

bootjp commented Apr 23, 2026

/gemini review @codex review

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (2)
docs/admin_ui_key_visualizer_design.md (1)

268-271: ⚠️ Potential issue | 🟡 Minor

Escape | characters inside the integration table.

markdownlint still parses these rows as extra table columns because raw | appears inside code spans. Escape the reserved-key examples so the table renders and lint passes.

📝 Proposed fix
-| `kv/sharded_coordinator.go` | One-line `sampler.Observe(...)` at dispatch entry; `sampler` is `keyviz.Sampler` injected via constructor, nil-safe. Phase 3 also filters `!admin|*` from `maxLatestCommitTS` and rejects user-plane writes (`Put`/`Delete`/transactional mutations) targeting `!admin|*` with `InvalidArgument`, so adapters (gRPC, Redis, DynamoDB, S3) cannot bypass the isolation. |
+| `kv/sharded_coordinator.go` | One-line `sampler.Observe(...)` at dispatch entry; `sampler` is `keyviz.Sampler` injected via constructor, nil-safe. Phase 3 also filters `!admin\|\*` from `maxLatestCommitTS` and rejects user-plane writes (`Put`/`Delete`/transactional mutations) targeting `!admin\|\*` with `InvalidArgument`, so adapters (gRPC, Redis, DynamoDB, S3) cannot bypass the isolation. |
@@
-| `store/lsm_store.go` | Phase 3 widens `isPebbleMetaKey` from exact-match to a prefix check on `!admin|` so `nextScannableUserKey` / `prevScannableUserKey` skip all internal KeyViz records during user-plane scans; adds retention GC (Pebble `CompactionFilter` preferred, time-budgeted maintenance sweep fallback) for expired `!admin|keyviz|*` columns and lineage records. |
+| `store/lsm_store.go` | Phase 3 widens `isPebbleMetaKey` from exact-match to a prefix check on `!admin\|` so `nextScannableUserKey` / `prevScannableUserKey` skip all internal KeyViz records during user-plane scans; adds retention GC (Pebble `CompactionFilter` preferred, time-budgeted maintenance sweep fallback) for expired `!admin\|keyviz\|\*` columns and lineage records. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/admin_ui_key_visualizer_design.md` around lines 268 - 271, The table
rows contain raw pipe characters inside code spans (e.g.
`kv/sharded_coordinator.go`, `isPebbleMetaKey`, and reserved keys like `!admin|`
and `!admin|keyviz|*`) which Markdown parses as column separators; update the
table entries in docs/admin_ui_key_visualizer_design.md to escape the pipe
characters inside those inline code spans (replace `|` with `\|` inside the code
text) so entries like `!admin\|` and `!admin\|keyviz\|*` render as single cells
and satisfy markdownlint; ensure all occurrences in the listed entries
(`kv/sharded_coordinator.go`, `keyviz/`, `monitoring/live_summary.go`,
`store/lsm_store.go`, and references to `isPebbleMetaKey`) are updated
consistently.
adapter/admin_grpc.go (1)

111-127: ⚠️ Potential issue | 🟠 Major

Preserve “unknown” for zero LastContact.

Line 118 says zero can mean “no contact recorded yet”, but Line 126 serializes it as the current time, making an unknown/stale peer look freshly contacted. Keep 0 for unknown and only compute an absolute timestamp when LastContact > 0.

🐛 Proposed fix
 	now := nowFunc()
 	for _, id := range ids {
 		st := s.groups[id].Status()
+		lastContactUnixMs := int64(0)
+		if st.LastContact > 0 {
+			lastContactUnixMs = now.Add(-st.LastContact).UnixMilli()
+		}
 		// Translate LastContact (duration since the last contact with the
@@
 			LeaderTerm:        st.Term,
 			CommitIndex:       st.CommitIndex,
 			AppliedIndex:      st.AppliedIndex,
-			LastContactUnixMs: now.Add(-st.LastContact).UnixMilli(),
+			LastContactUnixMs: lastContactUnixMs,
 		})
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/admin_grpc.go` around lines 111 - 127, The current serialization in
adapter/admin_grpc.go turns a zero st.LastContact into "now" which hides
unknown/no-contact state; in the loop over ids using nowFunc(), change the logic
that sets pb.RaftGroupState.LastContactUnixMs so that if st.LastContact == 0 you
set LastContactUnixMs to 0 (preserve unknown) and only compute
now.Add(-st.LastContact).UnixMilli() when st.LastContact > 0; update the code
around the Status() call and the pb.RaftGroupState construction to branch on
st.LastContact accordingly.
🧹 Nitpick comments (2)
cmd/elastickv-admin/main.go (1)

167-167: Use structured slog in the new admin command.

This new command logs operational events with log.Printf; please switch the new log sites to slog with structured keys such as bind_addr, seeds, node, and error.

As per coding guidelines, Use slog for logging and maintain structured logging keys (e.g., key, commit_ts).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/elastickv-admin/main.go` at line 167, Replace the plain log.Printf call
with structured slog usage: initialize or use the existing slog logger and emit
an Info event for the admin start with structured keys like bind_addr and seeds
(reference bindAddr and cfg.seeds), e.g., call slog.Info or slog.Log with
attributes "bind_addr", *bindAddr and "seeds", cfg.seeds; also include
additional keys such as "node" if available and ensure the slog package is
imported and logger initialized before use (update main or init where main sets
up logging).
main.go (1)

520-522: Use slog for the new admin warning.

This new warning should use structured logging rather than log.Printf, e.g. slog.Warn("admin grpc service exposed without authentication", "flag", "adminInsecureNoAuth").

As per coding guidelines, Use slog for logging and maintain structured logging keys (e.g., key, commit_ts).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.go` around lines 520 - 522, Replace the plain log.Printf call that warns
when the adminInsecureNoAuth flag is set with structured slog logging: call
slog.Warn (not log.Printf) and include at least the structured key/value pairs
such as "flag": "adminInsecureNoAuth" (and any other context keys your project
requires, e.g., "key" or "commit_ts"); update the branch that checks
*adminInsecureNoAuth to invoke slog.Warn with a descriptive message like "admin
grpc service exposed without authentication" and the structured fields so
logging follows the project's slog conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/admin_grpc_test.go`:
- Around line 56-65: The test mutates the package-global nowFunc (used by
GetRaftGroups) while running t.Parallel(), causing races; fix by making the test
serial or by moving the clock seam into AdminServer: either remove t.Parallel()
from TestGetRaftGroupsExposesCommitApplied so it doesn't mutate nowFunc
concurrently, or add a clock field/option to NewAdminServer and AdminServer
(e.g., pass a nowFunc/Clock to NewAdminServer and have GetRaftGroups consult
srv.now) and update the test to supply a fixed clock instead of changing the
package-global nowFunc.

In `@cmd/elastickv-admin/main.go`:
- Around line 300-313: The release() logic for nodeClient must be made truly
idempotent by tracking whether the connection was already closed under the same
lock that protects refcount/evicted (fanout.mu); add a boolean field (e.g.
closed or connClosed) to nodeClient, check it inside the critical section before
calling conn.Close(), and set it true once you close so subsequent release()
calls do nothing; apply the same change to the other release implementations
that manipulate refcount/evicted and call conn.Close() to ensure Close() is
invoked at most once.
- Around line 241-250: The current size check using os.Stat followed by
os.ReadFile can be racy; open the file with os.Open(abs) and perform a bounded
read (e.g. use io.LimitReader or io.ReadAll with io.LimitReader) to read at most
maxTokenFileBytes+1 bytes, then reject when the returned byte slice length is >
maxTokenFileBytes; replace the os.ReadFile(abs) usage and the variables info,
abs, maxTokenFileBytes accordingly so the cap is enforced during the actual
read.

In `@docs/admin_ui_key_visualizer_design.md`:
- Line 292: The docs mention a `--no-fanout` flag that doesn’t exist; either
implement the flag in the admin command parsing (add a boolean flag
`--no-fanout` to the admin command handler that toggles single-node partial mode
alongside the existing `--nodes` option) or update the text to remove or mark
the `--no-fanout` example as deferred; locate the admin command
registration/flag parsing code that defines `--nodes` (the admin command
handler) to add the `--no-fanout` boolean, wire it into the fanout logic, and
return/update the doc sentence accordingly.
- Line 176: The docs currently describe the flush goroutine as swapping each
route's counter pointer and dropping a new column into the ring buffer, but §5.1
and the implementation use in-place atomic drainage via SwapUint64; update all
occurrences (references to "swap pointer", "retired-pointer protocol", and
similar) to describe in-place atomic reset using SwapUint64 (or
atomic.SwapUint64), explain that the flush goroutine (triggered every
stepSeconds) atomically swaps the counter value to 0 and records the drained
value into the ring buffer column, and adjust the representation, performance,
and test text to remove retired-pointer semantics and instead describe atomic
in-place drainage, the memory model/atomicity guarantees of SwapUint64, and
expected test behavior for zeroed counters.

In `@main.go`:
- Around line 592-604: The current os.Stat then os.ReadFile can miss a
concurrent grow/replacement; replace the os.ReadFile call with opening the file
(os.Open(abs)) and reading via a limited reader (e.g.
io.ReadAll(io.LimitReader(f, adminTokenMaxBytes+1))) so you only allocate up to
adminTokenMaxBytes+1 bytes, then reject if len(b) > adminTokenMaxBytes with a
clear error (same context as "read admin token file"); keep or keep using the
initial os.Stat check for an early fast-fail but enforce the strict cap at read
time using adminTokenMaxBytes, abs, os.Open and io.LimitReader.

---

Duplicate comments:
In `@adapter/admin_grpc.go`:
- Around line 111-127: The current serialization in adapter/admin_grpc.go turns
a zero st.LastContact into "now" which hides unknown/no-contact state; in the
loop over ids using nowFunc(), change the logic that sets
pb.RaftGroupState.LastContactUnixMs so that if st.LastContact == 0 you set
LastContactUnixMs to 0 (preserve unknown) and only compute
now.Add(-st.LastContact).UnixMilli() when st.LastContact > 0; update the code
around the Status() call and the pb.RaftGroupState construction to branch on
st.LastContact accordingly.

In `@docs/admin_ui_key_visualizer_design.md`:
- Around line 268-271: The table rows contain raw pipe characters inside code
spans (e.g. `kv/sharded_coordinator.go`, `isPebbleMetaKey`, and reserved keys
like `!admin|` and `!admin|keyviz|*`) which Markdown parses as column
separators; update the table entries in docs/admin_ui_key_visualizer_design.md
to escape the pipe characters inside those inline code spans (replace `|` with
`\|` inside the code text) so entries like `!admin\|` and `!admin\|keyviz\|*`
render as single cells and satisfy markdownlint; ensure all occurrences in the
listed entries (`kv/sharded_coordinator.go`, `keyviz/`,
`monitoring/live_summary.go`, `store/lsm_store.go`, and references to
`isPebbleMetaKey`) are updated consistently.

---

Nitpick comments:
In `@cmd/elastickv-admin/main.go`:
- Line 167: Replace the plain log.Printf call with structured slog usage:
initialize or use the existing slog logger and emit an Info event for the admin
start with structured keys like bind_addr and seeds (reference bindAddr and
cfg.seeds), e.g., call slog.Info or slog.Log with attributes "bind_addr",
*bindAddr and "seeds", cfg.seeds; also include additional keys such as "node" if
available and ensure the slog package is imported and logger initialized before
use (update main or init where main sets up logging).

In `@main.go`:
- Around line 520-522: Replace the plain log.Printf call that warns when the
adminInsecureNoAuth flag is set with structured slog logging: call slog.Warn
(not log.Printf) and include at least the structured key/value pairs such as
"flag": "adminInsecureNoAuth" (and any other context keys your project requires,
e.g., "key" or "commit_ts"); update the branch that checks *adminInsecureNoAuth
to invoke slog.Warn with a descriptive message like "admin grpc service exposed
without authentication" and the structured fields so logging follows the
project's slog conventions.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 850901d9-e54b-4d1d-a263-503a1c84bacc

📥 Commits

Reviewing files that changed from the base of the PR and between ac8d6c3 and a126e71.

📒 Files selected for processing (7)
  • adapter/admin_grpc.go
  • adapter/admin_grpc_test.go
  • cmd/elastickv-admin/main.go
  • cmd/elastickv-admin/main_test.go
  • docs/admin_ui_key_visualizer_design.md
  • main.go
  • main_admin_test.go

Comment thread adapter/admin_grpc_test.go Outdated
Comment thread cmd/elastickv-admin/main.go Outdated
Comment thread cmd/elastickv-admin/main.go
Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Comment thread main.go Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a126e71af7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/admin_grpc.go Outdated
LeaderTerm: st.Term,
CommitIndex: st.CommitIndex,
AppliedIndex: st.AppliedIndex,
LastContactUnixMs: now.Add(-st.LastContact).UnixMilli(),
P2 Badge Clamp unknown LastContact before computing Unix timestamp

raftengine can return a sentinel negative duration for unknown contact (unknownLastContact = -1 when follower/candidate contact is unavailable), but this code always does now.Add(-st.LastContact). For negative values that produces a future lastContactUnixMs, so the admin UI can show negative/incorrect staleness exactly when leader contact is unknown, masking real connectivity issues. Please guard negative LastContact values (or map them to an explicit sentinel) before converting to Unix milliseconds.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the Phase 0 infrastructure for the Elastickv Admin UI and Key Visualizer, adding a node-side Admin gRPC service, a standalone elastickv-admin fan-out proxy binary, and a detailed design specification. The review feedback suggests hardening type assertions in the membership refresh logic to avoid panics, deduplicating token validation code across the codebase, and improving the idempotency and error logging of the gRPC client connection release process.

Comment on lines +523 to +524
select {
case r := <-ch:
medium

The type assertion r.Val.([]string) should be checked for success. While refreshMembership currently always returns a slice, if the singleflight function were to return an error or a different type in the future, this would cause a panic.

case r := <-ch:
	addrs, _ := r.Val.([]string)
	return addrs

Comment thread main.go
Comment on lines +587 to +610
func loadAdminTokenFile(path string) (string, error) {
abs, err := filepath.Abs(path)
if err != nil {
return "", errors.Wrap(err, "resolve admin token path")
}
info, err := os.Stat(abs)
if err != nil {
return "", errors.Wrap(err, "stat admin token file")
}
if info.Size() > adminTokenMaxBytes {
return "", fmt.Errorf(
"admin token file %s is %d bytes; maximum is %d",
abs, info.Size(), adminTokenMaxBytes)
}
b, err := os.ReadFile(abs)
if err != nil {
return "", errors.Wrap(err, "read admin token file")
}
token := strings.TrimSpace(string(b))
if token == "" {
return "", errors.New("admin token file is empty")
}
return token, nil
}
medium

The logic for loading and validating the admin token file is duplicated between cmd/elastickv-admin/main.go and main.go. Centralize this common logic into a shared helper function to avoid code duplication and improve maintainability.

References
  1. Centralize common complex logic, such as retry/backoff mechanisms, into shared helper functions to avoid code duplication and improve maintainability.

Comment thread cmd/elastickv-admin/main.go Outdated
Comment on lines +423 to +428
if c.refcount == 0 && c.evicted {
if err := c.conn.Close(); err != nil {
log.Printf("elastickv-admin: deferred close for %s: %v", c.addr, err)
}
}
}
medium

If release() is called multiple times by a buggy caller, and the client is already evicted, c.conn.Close() will be called repeatedly. While grpc.ClientConn.Close() is idempotent, it is better to ensure the deferred close only happens once. Additionally, do not silently ignore errors from Close() methods on resources like network connections; log them to ensure visibility.

if c.refcount == 0 && c.evicted {
	c.evicted = false // Ensure double-release doesn't re-close
	if err := c.conn.Close(); err != nil {
		log.Printf("elastickv-admin: deferred close for %s: %v", c.addr, err)
	}
}
References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

CodeRabbit review 4164075693:

- adapter/admin_grpc.go: move the test-seam clock from a package-global nowFunc to a per-AdminServer field with SetClock(); parallel tests on other AdminServer instances can no longer race through the seam. Updated the LastContact test to use SetClock.
- cmd/elastickv-admin/main.go + main.go: replace the Stat/ReadFile TOCTOU token-load with io.LimitReader(f, max+1); a file that grows or is swapped between the stat() and read() now still cannot force an oversized allocation. Tests updated for the new error wording.
- cmd/elastickv-admin/main.go: make the deferred close of retired gRPC clients idempotent by tracking nodeClient.closed under fanout.mu. A second release() on an already-closed client is now a true no-op; Close() also marks clients closed to avoid double-close with late releases.
- docs §5.1/§5.3/§5.4/§10/§11: align all remaining references with the atomic.SwapUint64 flush protocol (drop "counter pointer" / "pointer swap" wording). No retired pointers, no late-writer harvest needed.
- docs §9.1: replace the non-existent `--no-fanout` flag example with a note that passing one address to --nodes yields the partial single-node view; a dedicated flag is deferred.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

/gemini review @codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Admin gRPC service for node management and a standalone elastickv-admin binary that serves as a proxy and UI backend. The Admin service provides endpoints for cluster overview, Raft group status, and performance summaries, while the admin binary implements a fan-out mechanism to aggregate data across the cluster. Feedback was provided regarding the potential for gRPC interceptors to be overwritten in main.go when combining server options, which could lead to authentication or logging being bypassed.

Comment thread main.go
Comment on lines +688 to +690
opts := append([]grpc.ServerOption(nil), internalutil.GRPCServerOptions()...)
opts = append(opts, adminGRPCOpts...)
gs := grpc.NewServer(opts...)
Contributor

medium

Appending adminGRPCOpts (which contains grpc.ChainUnaryInterceptor) to the options from internalutil.GRPCServerOptions() may cause existing interceptors to be ignored if they were provided via grpc.UnaryInterceptor. In gRPC-Go, if multiple interceptor options of the same type are provided to NewServer, only the last one is used. To ensure all interceptors are preserved, it is safer to chain them together or ensure that internalutil.GRPCServerOptions() is compatible with chaining.

Codex P2 + Gemini medium comments on 09a38ed/a126e71a:

- adapter/admin_grpc.go: clamp sentinel-negative st.LastContact to zero before computing LastContactUnixMs. raftengine's etcd engine returns -1 when contact is unknown (follower with no leader yet); without the clamp the admin UI would show a future timestamp exactly when contact is unknown, reading as "freshly contacted". Added regression test that asserts the clamp.
- cmd/elastickv-admin + main.go: extract the shared bearer-token loader to internal.LoadBearerTokenFile so the admin binary and node process share the same LimitReader-bounded read + trim + size-cap logic. Drops duplication flagged by Gemini.
- cmd/elastickv-admin/main.go: make the singleflight DoChan result type-assertion safe (unexpected type logs and falls back to seeds instead of panicking, per Gemini).
- adapter/admin_grpc.go: gci-format the imports (bootjp-prefix group separated from third-party) to satisfy the reviewdog/golangci pre-check.

All tests (incl. -race) pass for adapter, cmd/elastickv-admin, internal, and main packages.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

/gemini review @codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces an Admin gRPC service and a corresponding admin binary to provide operational visibility into the Elastickv cluster. The implementation includes token-based authentication, cluster membership discovery, and raft group status reporting. My feedback highlights an issue with how unknown last-contact durations are handled, suggesting a sentinel value instead of clamping to zero to avoid misleading health reporting, and points out that the static membership list in the AdminServer will not reflect cluster scale-out events.

Comment thread adapter/admin_grpc.go Outdated
Comment on lines +137 to +148
lastContact := st.LastContact
if lastContact < 0 {
lastContact = 0
}
out = append(out, &pb.RaftGroupState{
RaftGroupId: id,
LeaderNodeId: st.Leader.ID,
LeaderTerm: st.Term,
CommitIndex: st.CommitIndex,
AppliedIndex: st.AppliedIndex,
LastContactUnixMs: now.Add(-lastContact).UnixMilli(),
})
Contributor

medium

Clamping negative LastContact durations to zero results in the RPC reporting the current time as the last contact when the actual status is unknown (e.g., a follower that has never heard from a leader). This is misleading as it indicates the node is perfectly in sync when it might be isolated or failing to sync. It is better to return a sentinel value (like 0) so the UI can correctly render an "unknown" or "never contacted" state instead of showing a false positive for health.

		lastContact := st.LastContact
		var lastContactUnixMs int64
		if lastContact >= 0 {
			lastContactUnixMs = now.Add(-lastContact).UnixMilli()
		}
		out = append(out, &pb.RaftGroupState{
			RaftGroupId:       id,
			LeaderNodeId:      st.Leader.ID,
			LeaderTerm:        st.Term,
			CommitIndex:       st.CommitIndex,
			AppliedIndex:      st.AppliedIndex,
			LastContactUnixMs: lastContactUnixMs,
		})

Comment thread main.go
Comment on lines +498 to +509
func setupAdminService(
nodeID, grpcAddress string,
runtimes []*raftGroupRuntime,
bootstrapServers []raftengine.Server,
) (*adapter.AdminServer, []grpc.ServerOption, error) {
members := adminMembersFromBootstrap(nodeID, bootstrapServers)
srv, opts, err := configureAdminService(
*adminTokenFile,
*adminInsecureNoAuth,
adapter.NodeIdentity{NodeID: nodeID, GRPCAddress: grpcAddress},
members,
)
Contributor

medium

The AdminServer is initialized with a static list of members derived from the Raft bootstrap configuration. This list is never updated, which means that any nodes added to the cluster after startup (scale-out) will not be discoverable. To maintain consistency between the application's peer list and the Raft ConfState, the AdminServer should ideally query the Raft engines for the current cluster configuration to provide a dynamic and accurate membership list, ensuring the peer list remains complete and consistent for all operations.

References
  1. To maintain consistency between the application's peer list and the Raft ConfState, ensure the peer list remains complete and accurate, especially when handling new nodes or cluster changes.

Gemini medium on 09a38ed: returning grpc.ChainUnaryInterceptor(unary) as a
grpc.ServerOption from configureAdminService risked silent loss of the admin
auth gate if anyone ever added a grpc.UnaryInterceptor (single-interceptor)
option to internalutil.GRPCServerOptions() — gRPC-Go keeps only the last
option of the same type.

Change configureAdminService and setupAdminService to return raw interceptor
slices in an adminGRPCInterceptors struct, and let startRaftServers collapse
them with any other admin interceptors into a single
grpc.ChainUnaryInterceptor / ChainStreamInterceptor call at registration time.
No existing option is overwritten, and future additions fall into the same
chain without special handling.

Tests updated to assert the slice shape (unary:1, stream:1 for token gate;
empty for disabled / insecure).
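The "collapse all interceptors into a single chain" idea can be illustrated with plain stdlib stand-ins. The `interceptor` and `handler` types below are deliberate simplifications of gRPC's real signatures, not the project's code; the point is only that chaining nests the interceptors so none can be silently dropped by a later option of the same type.

```go
package main

// handler is a stand-in for a gRPC method handler.
type handler func(req string) string

// interceptor is a stand-in for grpc.UnaryServerInterceptor: it wraps a handler.
type interceptor func(req string, next handler) string

// chain collapses a slice of interceptors into one nested call chain,
// mirroring what a single grpc.ChainUnaryInterceptor option does:
// the first interceptor in the slice is the outermost.
func chain(ics []interceptor) interceptor {
	return func(req string, next handler) string {
		h := next
		// Build the chain from the inside out so ics[0] runs first.
		for i := len(ics) - 1; i >= 0; i-- {
			ic := ics[i]
			inner := h
			h = func(req string) string { return ic(req, inner) }
		}
		return h(req)
	}
}
```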
@bootjp
Owner Author

bootjp commented Apr 23, 2026

/gemini review @codex review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Gemini medium items on a3c88b4:

- adapter/admin_grpc.go (GetRaftGroups): report LastContactUnixMs=0 (epoch
  sentinel) when raftengine returns a negative LastContact (unknown — e.g.,
  follower that has never heard from a leader). Previously the clamp collapsed
  to `now`, which the UI could not distinguish from "freshly contacted".
  proto/admin.proto documents the sentinel; regenerated admin.pb.go.

- adapter/admin_grpc.go (GetClusterOverview): the member list is now the union
  of the bootstrap seed and each registered Raft group's live Configuration().
  Scale-out nodes added after admin-server startup are now discoverable
  without restarting the admin binary. A per-group Configuration error does
  NOT fail the RPC — the seed list still produces useful output.

- adapter.AdminGroup interface extended with Configuration(ctx); in-tree
  raftengine.Engine already satisfies it (ConfigReader). Test fakes updated.

Tests added:
- unknown LastContact → 0 instead of now.
- scale-out node discovered via Configuration().
- Configuration error on one group does not fail the overview.
@bootjp
Owner Author

bootjp commented Apr 23, 2026

/gemini review @codex review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1f0e532f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/admin_grpc.go Outdated
Comment on lines +130 to +134
if _, dup := seen[id]; dup {
return
}
seen[id] = struct{}{}
out = append(out, &pb.NodeIdentity{NodeId: id, GrpcAddress: addr})

P2 Badge Prefer live member address over seeded duplicate NodeID

snapshotMembers seeds out first, then ignores later entries with the same NodeID, so a stale bootstrap address permanently wins over the current Raft configuration address for that node. If a node is replaced or readdressed with the same ID, GetClusterOverview will keep advertising the old endpoint and the admin fanout will repeatedly dial the wrong target instead of the live one.


Codex P2 on e1f0e53: snapshotMembers seeded out[] first, so a stale
bootstrap entry for a readdressed node (same NodeID, new GRPCAddress)
silently outvoted the current Raft Configuration. GetClusterOverview
kept advertising the old endpoint and the admin fanout dialed the wrong
target.

Collect live Configuration entries first, then add seeds only for
NodeIDs no group reported. A readdressed n2 now shows its current
10.0.0.22:50051 instead of the stale 10.0.0.12:50051 bootstrap value.
Regression test pins the behavior.
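The live-first merge described above can be sketched as follows; `member` and `mergeMembers` are hypothetical names for this illustration, not the repo's identifiers.

```go
package main

// member mirrors a NodeID/gRPC-address pair (illustrative type).
type member struct {
	id   string
	addr string
}

// mergeMembers collects live Raft-configuration entries first, then adds
// bootstrap seeds only for NodeIDs no group reported — so a readdressed
// node keeps advertising its current address, not the stale seed.
func mergeMembers(live, seeds []member) []member {
	seen := map[string]struct{}{}
	out := []member{}
	for _, m := range live {
		if _, dup := seen[m.id]; dup {
			continue
		}
		seen[m.id] = struct{}{}
		out = append(out, m)
	}
	for _, m := range seeds {
		if _, dup := seen[m.id]; dup {
			continue // a live entry already reported this NodeID
		}
		seen[m.id] = struct{}{}
		out = append(out, m)
	}
	return out
}
```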
@bootjp
Owner Author

bootjp commented Apr 24, 2026

/gemini review @codex review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37a043529e


Comment thread main.go
return nil
})

adminServer, adminGRPCOpts, err := setupAdminService(*raftId, *myAddr, runtimes, bootstrapServers)

P2 Badge Derive Admin self address from active listener

setupAdminService is initialized with *myAddr, but when --raftGroups is set the process actually listens on each group's rt.spec.address in startRaftServers, and --address can be unrelated/default. In that configuration GetClusterOverview.Self.grpcAddress advertises a non-listening endpoint, and the admin UI fan-out path (membersFrom) will include this bad address on every refresh, causing persistent failed RPCs and partial node visibility.


Comment thread adapter/admin_grpc.go
Comment on lines +170 to +171
if _, dup := addrByID[srv.ID]; dup {
continue

P2 Badge Reconcile duplicate live member IDs deterministically

When two raft groups report the same srv.ID with different addresses (common during readdress/rolling config convergence), this code keeps whichever address is seen first and drops the rest. Because group iteration originates from a map snapshot, first-seen order is not guaranteed, so the selected address can flap between calls or stick to a stale endpoint, making admin fan-out intermittently target dead nodes.


Codex P2 ×2 on 37a0435:

1. Self-address: setupAdminService was advertising *myAddr as
   Self.GRPCAddress, but in multi-group mode the process actually
   listens on each group's rt.spec.address (*myAddr can be unrelated or
   default). Fan-out would then include a non-listening endpoint and
   keep failing against it. canonicalSelfAddress now picks the
   lowest-group-ID listener deterministically (with *myAddr fallback
   for single-node dev runs).

2. Duplicate live NodeIDs: when two groups report the same server ID
   with different addresses during readdress convergence,
   collectLiveMembers kept first-seen — but Go map iteration is
   non-deterministic, so fan-out could flap between addresses. Sort
   groups by ID (ascending) before iterating so the lowest-ID group's
   address is the stable winner. Runs through GetClusterOverview 5 times
   in the regression test to pin stability.

adapter/admin_grpc.go: cloneGroups → cloneGroupsSorted returning
[]groupEntry{id, group}; collectLiveMembers now takes sorted groups.
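A minimal sketch of the deterministic sorted iteration, with `groupEntry` simplified to carry only an address in place of the real group handle:

```go
package main

import "sort"

// groupEntry pairs a raft-group ID with a stand-in address value
// (the real struct described above holds the group itself).
type groupEntry struct {
	id   uint64
	addr string
}

// sortedGroups returns the map's entries in ascending group-ID order, so
// when two groups report the same NodeID the lowest-ID group's address is
// the stable winner — no flapping from Go's randomized map iteration.
func sortedGroups(groups map[uint64]string) []groupEntry {
	ids := make([]uint64, 0, len(groups))
	for id := range groups {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	out := make([]groupEntry, 0, len(ids))
	for _, id := range ids {
		out = append(out, groupEntry{id: id, addr: groups[id]})
	}
	return out
}
```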
@bootjp
Owner Author

bootjp commented Apr 24, 2026

/gemini review @codex review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5efc655557


Comment thread adapter/admin_grpc.go
if _, dup := addrByID[srv.ID]; dup {
continue
}
addrByID[srv.ID] = srv.Address

P2 Badge Ignore live members with empty gRPC addresses

collectLiveMembers currently treats the first seen NodeID as authoritative even if srv.Address is empty (addrByID[srv.ID] = srv.Address), and then mergeSeedMembers refuses to backfill that ID from seeds. This can drop reachable peers from discovery because the etcd engine can emit configuration entries with Address: "" when peer metadata is missing (internal/raftengine/etcd/engine.go), so GetClusterOverview returns blank grpc_address and cmd/elastickv-admin will trim that entry and stop fanning out to that node. Skipping empty live addresses (or letting seeds overwrite blank live values) avoids this regression.

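The suggested fix — skip live entries with an empty address so seeds can backfill — could look like this sketch; `liveServer` and `collectLiveAddrs` are illustrative names, not the repo's code.

```go
package main

// liveServer mirrors a raft Configuration entry (illustrative type).
type liveServer struct {
	ID      string
	Address string
}

// collectLiveAddrs gathers live member addresses but skips entries whose
// address is blank, leaving that NodeID absent so a seed merge can still
// supply a usable endpoint instead of propagating grpc_address = "".
func collectLiveAddrs(servers []liveServer) map[string]string {
	addrByID := map[string]string{}
	for _, srv := range servers {
		if srv.Address == "" {
			continue // let the bootstrap seed supply this node's address
		}
		if _, dup := addrByID[srv.ID]; dup {
			continue // first non-empty address wins (sorted-group order upstream)
		}
		addrByID[srv.ID] = srv.Address
	}
	return addrByID
}
```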
