feat(dips): dedicated fast loop with offer-existence gate #1200

Open
MoonBoi9001 wants to merge 15 commits into main-dips from mb9/dedicated-fast-loop-for-dips-acceptance

MoonBoi9001 commented Apr 22, 2026

Motivation

When an indexer accepts an offer to index a subgraph, the indexer-agent confirms that acceptance on-chain by calling the SubgraphService contract. Until now, that acceptance was tied to the agent's 120-second reconciliation cycle, so the agent needed two or more cycles (240+ seconds) before it actually attempted the on-chain call. Recurring collection agreements have a 300-second deadline, which left at most about 60 seconds of slack; on a fresh deploy with any backlog at all, agreements expired before the agent got around to accepting them.

A second, related failure mode shows up under load. dipper submits an offer() transaction on-chain before asking the indexer to accept. If dipper's transaction is evicted from the mempool (for example, because another service holding the same wallet sends a higher-fee transaction in the same window), the indexer-agent calls into a contract that has no record of the offer. The call reverts, the agent treats the revert as a permanent rejection, and the indexing slot is lost to reassessment even though the only real fix was waiting one more tick for dipper to resubmit.

The combined effect is that DIPs agreements expire or get reassessed when they should land cleanly. Operators see opaque revert errors and can't tell whether the system is broken or just unlucky.

Summary

  • Adds a dedicated 5-second acceptance loop (startProposalAcceptanceLoop) that polls pending_rca_proposals independently of the main reconciliation cycle. Acceptance attempts now happen well inside the 300-second deadline.
  • Adds an OfferMonitor that queries the indexing-payments-subgraph for the offer before the agent attempts to accept. If the offer isn't yet on-chain, the agent stays pending and the 5s loop re-pops the row on the next tick. If the agreement deadline is within 30 seconds, the agent gives up cleanly with offer_never_landed so reassessment can pick a replacement.
  • Carries the supporting fixes the new fast loop needs to land usefully: unpacking the agreement and signature into separate arguments to acceptIndexingAgreement, creating the local indexing rule before the accept transaction fires (so the new loop doesn't race the reconciliation cycle into orphan-allocation territory), ensuring the subgraph deployment exists on graph-node before the multicall, and logging the full revert context on accept failure.
  • Wraps collectAgreementPayments and matchAgreementAllocations in try/catch so a transient subgraph or RPC failure skips that tick instead of aborting the entire reconcile cycle for every configured network.
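The gate's three branches, plus the no-subgraph bypass, can be summarised as a pure decision function. This is a minimal sketch: `decideOfferGate`, `OfferLookup` and the `GateDecision` labels are hypothetical names, treating "within 30 seconds" as inclusive is an assumption, and lookup errors are deliberately not modelled here.

```typescript
// Hypothetical outcome labels; the real processProposal expresses these differently.
type GateDecision = "proceed" | "wait" | "reject_offer_never_landed";

// Named after the PR's constant; the 30s value comes from the PR description.
const OFFER_GATE_DEADLINE_SAFETY_MARGIN_SECONDS = 30;

interface OfferLookup {
  // Resolves true if the offer is visible in the indexing-payments-subgraph.
  offerExists(agreementId: string): Promise<boolean>;
}

async function decideOfferGate(
  offerMonitor: OfferLookup | null,
  agreementId: string,
  deadlineSeconds: number, // RCA deadline as a unix timestamp
  nowSeconds: number,
): Promise<GateDecision> {
  // No indexingPaymentsSubgraph configured: skip the gate (prior behaviour).
  if (offerMonitor === null) return "proceed";
  if (await offerMonitor.offerExists(agreementId)) return "proceed";
  // Offer not on-chain yet: wait for the next tick unless the deadline is close.
  const secondsLeft = deadlineSeconds - nowSeconds;
  return secondsLeft <= OFFER_GATE_DEADLINE_SAFETY_MARGIN_SECONDS
    ? "reject_offer_never_landed"
    : "wait";
}
```

Keeping the decision separate from the database and contract calls makes each branch easy to cover in isolation.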

Changes

  • New packages/indexer-common/src/indexing-fees/offer-monitor.ts — converts the UUID-format agreementId from pending_rca_proposals to the bytes16 hex form the subgraph keys offers by; treats subgraph errors as transient.
  • New optional indexingPaymentsSubgraph field on the network specification, plumbed through the agent's CLI flags. When absent, the gate is bypassed and the prior behaviour is preserved.
  • New startProposalAcceptanceLoop in dips.ts plus its registration from agent.ts.
  • Test coverage: 4 cases for OfferMonitor, three branches of the gate in processProposal (offer absent + deadline far → wait; offer absent + deadline near → reject; offer present → proceed), and supporting test scaffolding for graphNode.ensure.
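The UUID-to-bytes16 conversion done by offer-monitor.ts can be sketched as follows. The helper name is hypothetical, and emitting lowercase `0x`-prefixed hex is an assumption about how the subgraph keys offers.

```typescript
// Hypothetical helper; the PR's offer-monitor.ts may name and validate this differently.
function uuidToBytes16Hex(uuid: string): string {
  // Drop the dashes and normalise case; a UUID carries exactly 16 bytes,
  // i.e. 32 hex characters.
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error(`not a UUID: ${uuid}`);
  }
  return `0x${hex}`;
}
```

For example, `123e4567-e89b-12d3-a456-426614174000` maps to `0x123e4567e89b12d3a456426614174000`.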

Maikol force-pushed the feat/dips-on-chain-cancel branch from c4d0d52 to 861cddf on April 22, 2026 14:29
MoonBoi9001 added a commit to edgeandnode/local-network that referenced this pull request Apr 25, 2026
The indexer-agent's DIPs accept gate (graphprotocol/indexer#1200)
queries the indexing-payments-subgraph for OfferStored entities
before calling acceptIndexingAgreement. Without a configured
endpoint the gate is a no-op and a dropped offer() tx loses the
agreement to a permanent deterministic rejection.

Set INDEXER_AGENT_INDEXING_PAYMENTS_SUBGRAPH_ENDPOINT alongside the
existing INDEXER_AGENT_OFFCHAIN_SUBGRAPHS so the agent picks up the
gate as soon as the subgraph deployment is detected.

Also add inline shellcheck directives to silence pre-existing
SC1091 / SC2153 notes the post-edit-lint hook surfaced.
MoonBoi9001 force-pushed the mb9/dedicated-fast-loop-for-dips-acceptance branch from 78f4c6c to 790355a on April 28, 2026 06:17
MoonBoi9001 changed the base branch from feat/dips-on-chain-cancel to feat/dips-new-subgraph on April 28, 2026 06:17
MoonBoi9001 force-pushed the mb9/dedicated-fast-loop-for-dips-acceptance branch 2 times, most recently from 2d3a942 to 33c5e22 on April 28, 2026 07:03
MoonBoi9001 and others added 3 commits April 28, 2026 15:32
Decouples DIPs proposal acceptance from the 120s reconciliation cycle
into a dedicated 5s polling loop (startProposalAcceptanceLoop). The
300s RCA deadline left insufficient slack with the old 240s+ latency.
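A fixed-interval loop that never overlaps itself can be sketched like this. `startPollingLoop` is a hypothetical helper, and the PR does not say whether startProposalAcceptanceLoop chains timeouts this way or uses another mechanism; the sketch schedules the next tick only after the current one settles.

```typescript
// Hypothetical helper: runs `tick` every `intervalMs`, scheduling the next run
// only after the current one settles, so a slow tick never overlaps the next.
function startPollingLoop(
  tick: () => Promise<void>,
  intervalMs: number,
  onError: (err: unknown) => void,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const run = async (): Promise<void> => {
    try {
      await tick();
    } catch (err) {
      // A failed tick is reported; the loop itself keeps running.
      onError(err);
    }
    if (!stopped) {
      timer = setTimeout(() => void run(), intervalMs);
    }
  };

  timer = setTimeout(() => void run(), 0);
  // Returned stop function: cancels the pending timer and prevents rescheduling.
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

With a 5000 ms interval, the worst-case wait before the first accept attempt drops from 240+ seconds to roughly one tick plus the time spent draining the queue.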

Bundles the supporting fixes the new loop needs to land usefully:

- Unpack the agreement and signature as separate arguments to
  acceptIndexingAgreement. The on-chain contract split the previously
  packed SignedRCA argument; without unpacking, the call reverts with
  FailedCall().

- Create the local indexing rule for the deployment inside
  processProposal, before the accept tx fires. With the fast loop in
  place, accept now races reconciliation; the rule must exist first
  or reconciliation sees the new allocation as an orphan and tries to
  unallocate it (IE067).

- Ensure the subgraph deployment exists on graph-node before the
  multicall. The contract's state-validation step looks the deployment
  up; if it isn't local the multicall reverts.

- Log full revert context (reason, data, message, contract target) on
  accept failure. Previously this showed `error: null` for any custom
  error the parser didn't recognise.
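Collecting the revert context named above (reason, data, message, contract target) might look like the following sketch. `revertContext` is a hypothetical helper, and because the thrown error's exact shape depends on the contract library, every field is read defensively.

```typescript
// What gets logged on accept failure. Field names follow the commit message;
// the error's own shape is unknown here, so every field is read defensively.
interface RevertContext {
  reason: string | null;
  data: string | null;
  message: string;
  target: string;
}

function revertContext(err: unknown, target: string): RevertContext {
  const e: Record<string, unknown> =
    typeof err === "object" && err !== null ? (err as Record<string, unknown>) : {};
  const asString = (v: unknown): string | null => (typeof v === "string" ? v : null);
  return {
    reason: asString(e["reason"]),
    // Raw revert data is kept even when no parser recognises the custom error.
    data: asString(e["data"]),
    message: asString(e["message"]) ?? String(err),
    target,
  };
}
```

Logging the raw `data` is what turns an opaque `error: null` into something that can be decoded offline against the contract ABI.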

Before: processProposal read a pending RCA and immediately called
acceptIndexingAgreement. If dipper's offer() tx was evicted from the
mempool (or hadn't confirmed yet), the contract reverted with
RecurringCollectorInvalidSigner. handleAcceptError treated the
CALL_EXCEPTION as a permanent deterministic failure and rejected the
proposal, losing the slot to reassessment.

After: before calling acceptIndexingAgreement, query the
indexing-payments-subgraph for Offer(id: agreementId). A missing offer
means dipper's submission hasn't reached the subgraph yet, so stay
pending and let the 5s acceptance loop re-pop the row on the next
tick. If the RCA deadline is within 30s, give up and mark
offer_never_landed so reassessment can pick a replacement.

Wired via a new OfferMonitor helper and an optional
indexingPaymentsSubgraph field on the network specification. When the
field isn't configured (operator didn't wire it up) the gate is
bypassed and the prior "try and see" behaviour is preserved.
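The existence check itself, with subgraph errors treated as transient, can be sketched with an injected query function. The `QueryFn` transport, the `offer(id:)` query shape with an `ID!` variable, and the tri-state result are assumptions for illustration; a caller would presumably keep the row pending on `"unknown"`.

```typescript
// Tri-state result: "unknown" covers transport and GraphQL errors (transient).
type OfferStatus = "present" | "absent" | "unknown";

// Hypothetical transport; any GraphQL client that returns { data, errors } fits.
type QueryFn = (
  query: string,
  variables: Record<string, unknown>,
) => Promise<{ data?: { offer: { id: string } | null }; errors?: unknown[] }>;

// Assumed query shape, based on "Offer(id: agreementId)" in the commit message.
const OFFER_QUERY = `query Offer($id: ID!) { offer(id: $id) { id } }`;

async function checkOffer(query: QueryFn, bytes16Id: string): Promise<OfferStatus> {
  try {
    const res = await query(OFFER_QUERY, { id: bytes16Id });
    // GraphQL-level errors are treated like network failures: transient.
    if (res.errors && res.errors.length > 0) return "unknown";
    if (!res.data) return "unknown";
    return res.data.offer ? "present" : "absent";
  } catch {
    return "unknown";
  }
}
```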

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wrap collectAgreementPayments and matchAgreementAllocations bodies so a
transient subgraph or RPC failure logs and skips this tick instead of
throwing through mapNetworkMapped and aborting the entire reconcile
cycle for every configured network.
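The isolation pattern amounts to a guarded wrapper per phase and network. This is a sketch with hypothetical names (`runGuarded`, `TickLogger`); the agent's real logger and mapNetworkMapped plumbing differ.

```typescript
// Hypothetical logger shape; the agent's real logger API may differ.
interface TickLogger {
  warn(msg: string, ctx: Record<string, unknown>): void;
}

// Run one reconcile phase for one network; on failure, log and skip this tick
// instead of rethrowing through the per-network map.
async function runGuarded<T>(
  logger: TickLogger,
  phase: string,
  network: string,
  fn: () => Promise<T>,
): Promise<T | undefined> {
  try {
    return await fn();
  } catch (err) {
    logger.warn(`${phase} failed; skipping this tick`, { network, error: String(err) });
    return undefined;
  }
}
```

Because the wrapper never rejects, a `Promise.all` over networks still resolves when one network's subgraph is down.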

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MoonBoi9001 force-pushed the mb9/dedicated-fast-loop-for-dips-acceptance branch from 33c5e22 to 76346c1 on April 28, 2026 07:33
MoonBoi9001 changed the title feat: dedicated fast-path loop for DIPs proposal acceptance to feat(dips): dedicated fast loop with offer-existence gate on April 28, 2026
Adds a fourth case to the offer-existence-gate suite asserting that when
indexingPaymentsSubgraph isn't configured, offerMonitor is null and the
gate is skipped (acceptIndexingAgreement still runs). Locks in the
"prior behaviour preserved" claim from the PR body.

Promotes the inline safetyMarginSeconds to a file-level constant
(OFFER_GATE_DEADLINE_SAFETY_MARGIN_SECONDS) alongside DIPS_ACCEPTANCE_INTERVAL
so the timing budget is discoverable from the top of the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base automatically changed from feat/dips-new-subgraph to feat/dips-on-chain-cancel April 28, 2026 19:37
Base automatically changed from feat/dips-on-chain-cancel to feat/dips-on-chain-collect April 28, 2026 20:09
Maikol force-pushed the feat/dips-on-chain-collect branch from f792bc2 to 078630f on April 28, 2026 20:23
Base automatically changed from feat/dips-on-chain-collect to main-dips April 28, 2026 20:28
MoonBoi9001 and others added 5 commits April 29, 2026 12:28
Wraps each phase of processProposal with process.hrtime.bigint timers
and emits a structured INFO summary per proposal (rule check, offer
existence check, graphNode.ensure, accept). Outcome label disambiguates
the early-return paths (deadline, blocklist, offer-gate) from the accept
phase.

Pure observability; no logic changes. Existing log lines for "Proposal
accepted on-chain", "Rejecting proposal: deterministic contract error",
and "Transient error accepting proposal, will retry" continue to
disambiguate the accept outcome.
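Per-phase timing of this kind can be sketched with an injected millisecond clock, so the helper runs anywhere. The commit uses process.hrtime.bigint; in Node one could pass `() => Number(process.hrtime.bigint()) / 1e6`. `PhaseTimer` and the phase keys are hypothetical, apart from ruleMs and totalMs, which the commits mention.

```typescript
// Millisecond clock, injected so tests can use a deterministic fake.
type Clock = () => number;

class PhaseTimer {
  private readonly now: Clock;
  private readonly start: number;
  private readonly phases: Record<string, number> = {};

  constructor(now: Clock) {
    this.now = now;
    this.start = now();
  }

  // Times one phase; the duration is recorded even if the phase throws.
  async time<T>(phase: string, fn: () => Promise<T>): Promise<T> {
    const t0 = this.now();
    try {
      return await fn();
    } finally {
      this.phases[phase] = this.now() - t0;
    }
  }

  // One structured summary per proposal, labelled with its outcome.
  summary(outcome: string): Record<string, number | string> {
    return { outcome, ...this.phases, totalMs: this.now() - this.start };
  }
}
```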

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a classifyAbiMismatch helper that pattern-matches ethers errors
arising from ABI shape mismatches at the encode/dispatch layer:
  - UNSUPPORTED_OPERATION with operation: "fragment" (wrong selector)
  - INVALID_ARGUMENT (wrong argument tuple)

These errors mean the installed contract types disagree with what the
deployed contract expects. Retrying does not help; the call will fail
identically every tick. Marking the proposal rejected immediately and
cleaning up the dips rule lets dipper reassess in seconds rather than
burning the full RCA deadline.

Without this, a single class of failure (e.g. installed
@graphprotocol/interfaces ABI lagging the deployed contract) cascades
into mass agreement expirations and, via dipper's 30-day decline
lookback, can blocklist every (indexer, deployment) pair after just one
failed batch.

Existing CALL_EXCEPTION handling (revert data + tryParseCustomError) is
unchanged; this only adds the abi-mismatch fast-path before it.
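The two patterns listed above can be sketched as a small classifier. The `code` and `operation` values come from the commit message; the return labels and the exact signature of the real classifyAbiMismatch are assumptions.

```typescript
// Hypothetical return labels; null means "not an ABI mismatch".
type AbiMismatch = "wrong_selector" | "wrong_arguments" | null;

function classifyAbiMismatch(err: unknown): AbiMismatch {
  if (typeof err !== "object" || err === null) return null;
  const e = err as { code?: unknown; operation?: unknown };
  if (e.code === "UNSUPPORTED_OPERATION" && e.operation === "fragment") {
    return "wrong_selector";
  }
  if (e.code === "INVALID_ARGUMENT") return "wrong_arguments";
  // Anything else (e.g. CALL_EXCEPTION) falls through to the existing handling.
  return null;
}
```

A non-null result would mark the proposal rejected immediately, since retrying against the same ABI fails identically.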

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Moves graphNode.ensure from processProposal into
ensureDipsRuleForProposal. The deploy is idempotent on graph-node, so
moving it to rule-creation time means each indexer pays the cold-deploy
cost once per (deployment, proposal) instead of once per accept-loop
tick per agreement.

At 50-request scale (3 candidates per request, ~25 proposals queueing
on each agent's accept loop) the previous placement could re-invoke
ensure ~25 times on each tick across the same handful of deployments.
With this change the cold-deploy work happens at the moment dipper's
proposal first lands, before the offer-existence gate, and is amortised
across any reassessment cycles for the same deployment.

If graphNode.ensure throws, the proposal stays pending (returns false
from ensureDipsRuleForProposal) so the next acceptance-loop tick
retries, matching the existing retry semantics of other transient
failures.
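The relocated step can be sketched with injected dependencies. `RuleDeps` and `ensureRuleAndDeployment` are hypothetical, and writing the rule before calling ensure is an assumption about ordering inside ensureDipsRuleForProposal; the `false` return mirrors the "stay pending" semantics described above.

```typescript
// Hypothetical dependency interface; the agent's real rule store and
// graph-node client have different names and signatures.
interface RuleDeps {
  upsertDipsRule(deploymentId: string): Promise<void>;
  ensureDeployment(deploymentId: string): Promise<void>; // idempotent on graph-node
  warn(msg: string): void;
}

// true: rule exists and the deployment is on graph-node.
// false: leave the proposal pending and retry on the next tick.
async function ensureRuleAndDeployment(deps: RuleDeps, deploymentId: string): Promise<boolean> {
  await deps.upsertDipsRule(deploymentId);
  try {
    await deps.ensureDeployment(deploymentId);
  } catch (err) {
    deps.warn(`graph-node ensure failed for ${deploymentId}; retrying next tick: ${String(err)}`);
    return false;
  }
  return true;
}
```

Since ensure is idempotent, calling it on every tick would not be wrong, only wasteful; hoisting it is purely a cost change.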

The phase-timing log added in the previous commit attributes ensure
duration to ruleMs going forward (one fewer phase reported, but the
total wall time captured by totalMs is unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the serial for-loop in startProposalAcceptanceLoop with a pMap
over proposals at DIPS_ACCEPT_CONCURRENCY=4. Each processProposal call
operates on a distinct agreementId with no shared mutable state, so the
parallel work is independent.

The wallet's nonce queue inside transactionManager.executeTransaction
already handles ordering and retries on collision, so concurrent submits
are safe. Concurrency stays low, yet with four workers one slow in-flight
call no longer head-of-line blocks everything else, which removes the
serial-tick bottleneck observed at 50-request scale (where ~25 proposals
per agent per tick had to drain one at a time).

Combined with the previous commit hoisting graphNode.ensure out of
processProposal, the accept critical path is now tx-bound rather than
deploy-bound, and parallelising that path yields a meaningful
throughput multiplier.

Per-proposal failures stay isolated: handleAcceptError already absorbs
errors without rethrowing, and the explicit try/catch wrapper around
processProposal here keeps the same isolation under pMap with
stopOnError: false.
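The bounded-concurrency pattern can be sketched without dependencies. `mapBounded` is an illustrative stand-in, not p-map's implementation: at most `concurrency` calls run at once, and each item's error is captured in its own result, the way the explicit try/catch wrapper isolates processProposal failures.

```typescript
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

async function mapBounded<T, R>(
  items: readonly T[],
  concurrency: number,
  fn: (item: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array<Outcome<R>>(items.length);
  let next = 0;
  // Each worker claims the next unclaimed index; JS runs this synchronously
  // between awaits, so two workers never receive the same index.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await fn(items[i] as T) };
      } catch (error) {
        // Per-item isolation: one failure does not stop the other workers.
        results[i] = { ok: false, error };
      }
    }
  };
  const n = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: n }, () => worker()));
  return results;
}
```

Usage in the spirit of the commit: `mapBounded(proposals, 4, processProposal)`, where `4` plays the role of DIPS_ACCEPT_CONCURRENCY.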

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a periodic sweep on the indexer-agent that diffs each dips-basis
indexing rule against the indexing-payments-subgraph's view of the
indexer's accepted agreements. If a rule has no matching IndexingAgreement
in state Accepted, the rule is deleted; the agent's normal reconciliation
loop then closes the corresponding allocation through its existing path.

Defends against the "indexer indexing without payment" failure mode:
when something kills the originally-paired agreement (dipper expires it,
agent restarts mid-flow, DB cleanup loses in-flight context), the rule
survives and the agent keeps the allocation alive. Without an oracle the
agent has no signal that the allocation is unpaid. The
indexing-payments-subgraph is the right oracle because it tracks
exactly the on-chain accept events that bind agreements to indexers.

Single batched query (indexingAgreements where indexer = SELF, state =
Accepted) plus _meta block timestamp; the timestamp gates the sweep so
we never delete rules based on stale subgraph data. If the subgraph is
more than DIPS_SWEEP_STALENESS_THRESHOLD_SECONDS (300s) behind
wall-clock, or the query errors, the sweep skips this tick and logs.
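The sweep's decision can be sketched as a pure function over already-fetched data. Matching rules to agreements by deployment ID is an assumption, `planSweep` is a hypothetical name, and the query-error skip is left to the caller, which would simply not call it.

```typescript
// Named after the PR's constant; the 300s value comes from the commit message.
const DIPS_SWEEP_STALENESS_THRESHOLD_SECONDS = 300;

type SweepResult =
  | { kind: "skip"; reason: string }
  | { kind: "delete"; deployments: string[] };

function planSweep(
  dipsRuleDeployments: readonly string[], // deployments with a dips-basis rule
  acceptedAgreementDeployments: readonly string[], // Accepted agreements for this indexer
  subgraphBlockTimestamp: number, // _meta block timestamp, unix seconds
  nowSeconds: number,
): SweepResult {
  const lag = nowSeconds - subgraphBlockTimestamp;
  if (lag > DIPS_SWEEP_STALENESS_THRESHOLD_SECONDS) {
    // Never delete rules based on stale subgraph data.
    return { kind: "skip", reason: `subgraph ${lag}s behind wall-clock` };
  }
  const backed = new Set(acceptedAgreementDeployments);
  return {
    kind: "delete",
    deployments: dipsRuleDeployments.filter((d) => !backed.has(d)),
  };
}
```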

Loop registered alongside startProposalAcceptanceLoop in
AllocationManager. The sweep is independently scheduled: it runs
every DIPS_SWEEP_INTERVAL (60s) on a separate timer from the 5s accept
loop, so a slow sweep tick can't backpressure acceptance.

Tests cover: stale rule removal, healthy rule preservation, mixed
backed/unbacked, subgraph staleness skip, query error skip, and the
no-subgraph-configured no-op path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>