Vault: VaultForceEmptyOCRRounds and plugin limit resolution #22345

Open
prashantkumar1982 wants to merge 5 commits into develop from feat/vault-force-empty-ocr-rounds

@prashantkumar1982 prashantkumar1982 commented May 7, 2026

Summary

This change wires VaultForceEmptyOCRRounds (chainlink-common CRE setting) into the vault OCR2 plugin, refines how reporting plugin limits are resolved, and moves small helpers into plugin_utils.go. It also adds request lifecycle instrumentation for Vault capability + OCR (OTLP via beholder), with a non-optional RequestLifecycleTracker for the reporting plugin factory and transmitter.

  • When VaultForceEmptyOCRRounds is enabled (the gate allows), GetPendingQueue is skipped in Observation and ValidateObservation, and the pending queue is treated as empty. Observation reuses a single in-memory pending queue list (renamed to currentPendingQueueItems) and drops the duplicate store read.
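
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the gate and pendingStore interfaces, staticGate, memStore, and the string item type are all hypothetical stand-ins. Only the AllowErr(ctx) call and the names GetPendingQueue and currentPendingQueueItems come from the PR text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// gate is a hypothetical stand-in for the gate limiter used by the plugin;
// only an AllowErr(ctx) method is modeled here.
type gate interface {
	AllowErr(ctx context.Context) error
}

// staticGate is an illustrative gate: when open, AllowErr returns nil.
type staticGate struct{ open bool }

var errGateClosed = errors.New("gate closed")

func (g staticGate) AllowErr(context.Context) error {
	if g.open {
		return nil
	}
	return errGateClosed
}

// pendingStore is a hypothetical read-only view of the pending queue.
type pendingStore interface {
	GetPendingQueue(ctx context.Context) ([]string, error)
}

// memStore counts reads so the skipped store access is observable.
type memStore struct {
	items []string
	reads int
}

func (s *memStore) GetPendingQueue(context.Context) ([]string, error) {
	s.reads++
	return s.items, nil
}

// currentPendingQueueItems returns an empty list without touching the store
// when the gate allows (force-empty rounds enabled); otherwise it reads once.
func currentPendingQueueItems(ctx context.Context, g gate, s pendingStore) ([]string, error) {
	if g.AllowErr(ctx) == nil {
		return nil, nil
	}
	items, err := s.GetPendingQueue(ctx)
	if err != nil {
		return nil, fmt.Errorf("could not fetch pending queue: %w", err)
	}
	return items, nil
}

// runGateExample returns item counts with the gate open and closed,
// plus the number of store reads performed.
func runGateExample() (int, int, int) {
	ctx := context.Background()
	s := &memStore{items: []string{"req-1", "req-2"}}
	forced, _ := currentPendingQueueItems(ctx, staticGate{open: true}, s)
	normal, _ := currentPendingQueueItems(ctx, staticGate{open: false}, s)
	return len(forced), len(normal), s.reads
}

func main() {
	forced, normal, reads := runGateExample()
	fmt.Println(forced, normal, reads) // prints "0 2 1"
}
```

Resolving the list once and passing it along is what lets the same slice be reused within a round instead of reading the store a second time.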

https://smartcontract-it.atlassian.net/browse/CRE-4071
https://smartcontract-it.atlassian.net/browse/CRE-3478


Request lifecycle stages

These are the stage string values used in traces, logs (furthest_stage on timeout), and metric attributes where applicable. Order follows the happy path from capability ingress through OCR.

| Stage value | Meaning |
|---|---|
| received | Request recorded at handleRequest (start of trace; used for furthest_stage when nothing else ran). |
| blob_broadcasting | Request chosen for blob broadcast during Observation. |
| blob_broadcasted | Blob successfully broadcast for the request. |
| written_to_pending_queue | Request included in DON pending queue after state transition consensus. |
| observed_outcome | Request present in the observation batch built from the local KV queue. |
| state_transition_outcome | Outcome for the request included in the state transition result. |
| transmitted | OCR Transmit invoked for this request id. |
| capability_response_received | Capability delivered a response to the caller (success or error path that closes the trace). |
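
One way to derive furthest_stage from the ordering above is sketched below. The stage strings are taken from the table; stageOrder and furthestStage are hypothetical names, and ranking by happy-path position (rather than by timestamp) is an assumption about how the tracker might work.

```go
package main

import "fmt"

// stageOrder lists the stage values in happy-path order, as in the table.
var stageOrder = []string{
	"received",
	"blob_broadcasting",
	"blob_broadcasted",
	"written_to_pending_queue",
	"observed_outcome",
	"state_transition_outcome",
	"transmitted",
	"capability_response_received",
}

// furthestStage returns the latest stage (by happy-path position) that was
// reached. It falls back to "received" when nothing else ran.
func furthestStage(reached map[string]bool) string {
	furthest := "received"
	for _, stage := range stageOrder {
		if reached[stage] {
			furthest = stage
		}
	}
	return furthest
}

func main() {
	reached := map[string]bool{
		"received":          true,
		"blob_broadcasting": true,
		"blob_broadcasted":  true,
	}
	fmt.Println(furthestStage(reached)) // prints "blob_broadcasted"
	fmt.Println(furthestStage(nil))     // prints "received"
}
```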

Latency histograms are emitted from received as time zero for stages starting at blob_broadcasting through capability_response_received (only for stages that actually occurred). The rounds histogram measures OCR seqNr delta from blob_broadcasting for subsequent stages (blob_broadcasted through transmitted).
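
The two measurements can be illustrated with a small sketch. The types and method names here (stageEvent, requestTrace, latencyMs, roundsDelta) are hypothetical and are not the RequestLifecycleTracker API. The sketch assumes each stage records a wall-clock time and the OCR seqNr at which it occurred.

```go
package main

import (
	"fmt"
	"time"
)

// stageEvent is a hypothetical record of when a stage was reached.
type stageEvent struct {
	at    time.Time
	seqNr uint64
}

// requestTrace maps stage value to the event recorded for one request.
type requestTrace struct {
	stages map[string]stageEvent
}

// latencyMs is the wall time from "received" to the given stage.
// ok is false when either stage never occurred.
func (t requestTrace) latencyMs(stage string) (ms int64, ok bool) {
	start, okStart := t.stages["received"]
	ev, okStage := t.stages[stage]
	if !okStart || !okStage {
		return 0, false
	}
	return ev.at.Sub(start.at).Milliseconds(), true
}

// roundsDelta is the seqNr delta from "blob_broadcasting" to the given stage.
// ok is false when blob_broadcasting never ran, which corresponds to the
// case counted by the round_delta_skipped counter.
func (t requestTrace) roundsDelta(stage string) (delta uint64, ok bool) {
	base, okBase := t.stages["blob_broadcasting"]
	ev, okStage := t.stages[stage]
	if !okBase || !okStage || ev.seqNr < base.seqNr {
		return 0, false
	}
	return ev.seqNr - base.seqNr, true
}

// runTraceExample returns the latency and rounds delta for "transmitted",
// and whether a delta could be computed for a trace without blob_broadcasting.
func runTraceExample() (int64, uint64, bool) {
	t0 := time.Unix(0, 0)
	tr := requestTrace{stages: map[string]stageEvent{
		"received":          {at: t0},
		"blob_broadcasting": {at: t0.Add(40 * time.Millisecond), seqNr: 10},
		"transmitted":       {at: t0.Add(250 * time.Millisecond), seqNr: 13},
	}}
	lat, _ := tr.latencyMs("transmitted")
	delta, _ := tr.roundsDelta("transmitted")

	noBroadcast := requestTrace{stages: map[string]stageEvent{
		"received":    {at: t0},
		"transmitted": {at: t0, seqNr: 13},
	}}
	_, ok := noBroadcast.roundsDelta("transmitted")
	return lat, delta, ok
}

func main() {
	lat, delta, ok := runTraceExample()
	fmt.Println(lat, delta, ok) // prints "250 3 false"
}
```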


Metrics (OTLP)

Unless noted, lifecycle/capability metrics include attribute config_digest (OCR config digest string).

Capability + request lifecycle

| Name | Type | Description / attributes |
|---|---|---|
| platform_vault_capability_requests_received_total | Counter | One increment per request received at the capability (handleRequest). |
| platform_vault_request_lifecycle_stage_latency_ms | Histogram (ms) | stage: blob_broadcasting, blob_broadcasted, written_to_pending_queue, observed_outcome, state_transition_outcome, transmitted, capability_response_received. Wall time from received to that stage. |
| platform_vault_request_lifecycle_stage_rounds_delta | Histogram | stage: blob_broadcasted, written_to_pending_queue, observed_outcome, state_transition_outcome, transmitted. SeqNr delta from blob_broadcasting. |
| platform_vault_request_lifecycle_round_delta_skipped_total | Counter | stage: same as rounds histogram when blob_broadcasting never ran (cannot compute delta). |
| platform_vault_request_lifecycle_timeout_total | Counter | Request timed out in capability before a response (handleRequest deadline). |
| platform_vault_request_lifecycle_pending_queue_not_in_local_queue_total | Counter | Pending-queue write seen for an id not present in this node’s local queue (DON path without local receipt first). |
| platform_vault_request_lifecycle_transmit_not_in_local_queue_total | Counter | Transmit for an id not present in this node’s local queue. |
| platform_vault_capability_request_outcome_total | Counter | outcome: success, timeout, response_error. For timeout, also furthest_stage: one of the lifecycle stage values above (furthest reached when the trace ended). |
| platform_vault_capability_request_response_error_total | Counter | Capability closed with an OCR error response (non-timeout). |

Vault OCR reporting plugin (separate meter instruments)

These use attribute configDigest (string) on recordings, plus method for KV duration as listed in code.

| Name | Type | Notes |
|---|---|---|
| platform_vault_plugin_queue_overflow | Counter | queueSize, batchSize when local observation queue is truncated. |
| platform_vault_plugin_kv_operation_duration_ms | Histogram (ms) | method per wrapped KV call. |
| platform_vault_plugin_local_queue_size | Histogram ({request}) | Size of local store at Observation time. |
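
As a rough illustration of when the overflow counter would fire, the sketch below truncates a local queue to a batch size and reports whether anything was dropped. truncateQueue is a hypothetical helper, and taking the first batchSize items is an assumption; the actual selection policy in the plugin is not described here.

```go
package main

import "fmt"

// truncateQueue caps the local queue at batchSize items and reports whether
// truncation happened. A caller would increment the overflow counter with
// queueSize=len(queue) and batchSize attributes when overflowed is true.
func truncateQueue(queue []string, batchSize int) (batch []string, overflowed bool) {
	if batchSize < 0 {
		batchSize = 0
	}
	if len(queue) <= batchSize {
		return queue, false
	}
	return queue[:batchSize], true
}

func main() {
	batch, overflowed := truncateQueue([]string{"a", "b", "c"}, 2)
	fmt.Println(len(batch), overflowed) // prints "2 true"
}
```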

Disk usage (if enabled in plugin)

| Name | Type |
|---|---|
| platform_vault_disk_usage_bytes | Gauge |

- Gate pending-queue reads in Observation and ValidateObservation when
  VaultForceEmptyOCRRounds is enabled; rename batch to currentPendingQueueItems
  and remove redundant GetPendingQueue call
- Add VaultForceEmptyOCRRounds GateLimiter to reporting plugin config;
  Close() and test helper wiring
- Resolve OCR plugin limits via BoundLimiter.Limit() in initializePluginLimits;
  extract helpers to plugin_utils.go (resolveVaultOCRBoundLimitInt, logging
  for forceEmptyOCRRounds)

Co-authored-by: Cursor <cursoragent@cursor.com>
@prashantkumar1982 prashantkumar1982 requested review from a team as code owners May 7, 2026 19:32

github-actions Bot commented May 7, 2026

👋 prashantkumar1982, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions Bot commented May 7, 2026

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

github-actions Bot commented May 7, 2026

✅ No conflicts with other open PRs targeting develop

@prashantkumar1982 changed the title from "feat(vault): VaultForceEmptyOCRRounds and plugin limit resolution" to "Vault: VaultForceEmptyOCRRounds and plugin limit resolution" May 7, 2026
russell-stern previously approved these changes May 7, 2026
func forceEmptyOCRRounds(ctx context.Context, lggr logger.Logger, vaultForceEmptyOCRRounds limits.GateLimiter) bool {
	err := vaultForceEmptyOCRRounds.AllowErr(ctx)
	if err == nil {
		lggr.Warnw("VaultForceEmptyOCRRounds is enabled; pending queue is not read this OCR round — store-backed pending observation items are skipped")

Any chance we could add this logging closer to where we apply the side effect? Otherwise I'm afraid later code changes could render this logging ineffective, which would be confusing.

Contributor Author

done

if err != nil {
	return nil, fmt.Errorf("could not fetch batch of requests: %w", err)
var currentPendingQueueItems []*vaultcommon.StoredPendingQueueItem
if !forceEmptyOCRRounds(ctx, r.lggr, r.cfg.VaultForceEmptyOCRRounds) {

@cedric-cordenier cedric-cordenier May 8, 2026

Can we refactor this so that we handle it as an early return instead? I.e., check this condition once, and if true, immediately return an empty observation.

By the way -- did you intend to just set the current round's pending queue to empty? Or did you also intend to make the local queue empty in the returned observation?

@prashantkumar1982 prashantkumar1982 May 8, 2026

> check this condition once, and if true immediately return an empty observation

We still want to update the pending queue with newer items, right? So we should not return an empty observation.

> By the way -- did you intend to just set the current round's pending queue to empty? or did you also intend to make the local queue empty in the returned observation?

Just set the current round's pending queue to empty, and start preparing the new pending queue from the local queue.
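
The distinction in this thread can be sketched as follows. This is a simplified illustration under stated assumptions: the observation struct and buildObservation are hypothetical and do not reflect the plugin's real observation type. The point is only that forcing an empty round clears the pending-queue portion while the local queue is still carried forward, which is why an early return of an empty observation would behave differently.

```go
package main

import "fmt"

// observation is a hypothetical, simplified shape with two parts:
// items read from the stored pending queue, and items from the local queue
// that feed the next pending queue.
type observation struct {
	PendingQueueItems []string
	LocalQueueItems   []string
}

// buildObservation empties only the pending-queue part when forceEmpty is
// set; the local queue is still included.
func buildObservation(forceEmpty bool, storedPending, localQueue []string) observation {
	var currentPendingQueueItems []string
	if !forceEmpty {
		currentPendingQueueItems = storedPending
	}
	return observation{
		PendingQueueItems: currentPendingQueueItems,
		LocalQueueItems:   localQueue,
	}
}

func main() {
	obs := buildObservation(true, []string{"req-1"}, []string{"req-2", "req-3"})
	fmt.Println(len(obs.PendingQueueItems), len(obs.LocalQueueItems)) // prints "0 2"
}
```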
