[Fix] Accept proof submission even it has been timeout #1764
base: develop
Walkthrough
A timeout proof validation path was modified to remove early rejection. Instead of immediately returning an error upon detecting a timeout, the validator now increments a counter, logs a warning, and allows validation to proceed.
Sequence Diagram(s)
sequenceDiagram
participant V as Validator
participant C as Counter
participant L as Logger
rect rgb(230, 240, 255)
Note over V: Old behavior
V->>V: Detect timeout?
alt Timeout found
V->>L: Log error
V-->>V: Return ErrValidatorFailureProofTimeout
end
end
rect rgb(240, 250, 230)
Note over V: New behavior
V->>V: Detect timeout?
alt Timeout found
V->>C: Increment validateFailureProverTaskTimeout
V->>L: Log warning
V->>V: Continue validation
end
end
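The difference between the two paths in the diagram can be sketched in a few lines of Go. This is a minimal illustration rather than the coordinator's actual code: the `validator` type, the `failureType` enum, `errProofTimeout`, and the integer counter are hypothetical stand-ins for the real validator, `ProverTaskFailureType`, `ErrValidatorFailureProofTimeout`, and the `validateFailureProverTaskTimeout` metric.

```go
package main

import (
	"errors"
	"fmt"
)

// failureType is a hypothetical stand-in for types.ProverTaskFailureType.
type failureType int

const (
	failureNone failureType = iota
	failureTimeout
)

// errProofTimeout stands in for ErrValidatorFailureProofTimeout.
var errProofTimeout = errors.New("validator failure: proof timeout")

// validator holds a plain int in place of the real metrics counter.
type validator struct {
	timeoutCount int
}

// validateOld follows the "old behavior" box: a timed-out task is rejected.
func (v *validator) validateOld(ft failureType) error {
	if ft == failureTimeout {
		fmt.Println("error: proof submitted after timeout, rejecting")
		return errProofTimeout
	}
	return nil
}

// validateNew follows the "new behavior" box: count, warn, keep validating.
func (v *validator) validateNew(ft failureType) error {
	if ft == failureTimeout {
		v.timeoutCount++
		fmt.Println("warn: proof submitted after timeout, continuing validation")
	}
	// The remaining validation steps would run here.
	return nil
}

func main() {
	v := &validator{}
	fmt.Println("old result:", v.validateOld(failureTimeout))
	fmt.Println("new result:", v.validateNew(failureTimeout))
	fmt.Println("timeouts counted:", v.timeoutCount)
}
```

In this sketch the only observable effect of a timeout on the new path is the counter and the warning; whether the submission is ultimately accepted depends on the checks that follow.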
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
coordinator/internal/logic/submitproof/proof_receiver.go (1)
common/types/db.go (2)
ProverTaskFailureType (99-99), ProverTaskFailureTypeTimeout (105-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check
- GitHub Check: tests
- GitHub Check: tests
	// if prover task FailureType is SessionInfoFailureTimeout, the submit proof is timeout, but we still accept it
	if types.ProverTaskFailureType(proverTask.FailureType) == types.ProverTaskFailureTypeTimeout {
		m.validateFailureProverTaskTimeout.Inc()
		log.Warn("proof submit proof have timeout", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
			"proverName", proverTask.ProverName, "proverPublicKey", pk, "proofTime", proofTimeSec)
	}
🧩 Analysis chain
Timeout now only logs/metrics and proceeds — double‑check interaction with ProvingStatus guard and clean up comment/log text
The new behavior (incrementing validateFailureProverTaskTimeout and logging, but not returning) matches the PR goal of accepting late proofs after a timeout. A couple of points to verify/tidy up:
- Interaction with “submit twice” guard (lines 315‑329)
  If timeouts elsewhere in the system are recorded by setting the task’s ProvingStatus to types.ProverProofInvalid (or Valid), then the early guard:

  if types.ProverProveStatus(proverTask.ProvingStatus) == types.ProverProofValid ||
  	types.ProverProveStatus(proverTask.ProvingStatus) == types.ProverProofInvalid {
  	...
  	return ErrValidatorFailureProverTaskCannotSubmitTwice
  }

  will fire before this timeout block, and late proofs for such tasks will still be rejected. For the new behavior to be effective, timed‑out tasks you still want to accept later proofs for must not be marked ProverProofValid/ProverProofInvalid before the submission arrives. Please confirm that your timeout handling only sets FailureType to ProverTaskFailureTypeTimeout (see common/types/db.go:98-104) and leaves ProvingStatus in a state that passes this guard.
- Metrics semantics
  validateFailureProverTaskTimeout is documented as “validate failure timeout”, but it now increments even for proofs you accept and verify successfully. That’s probably fine if you interpret it as “number of submissions whose task had previously timed out”, but it will no longer be a strict subset of validateFailureTotal (which only increments on non‑nil err). Just keep this in mind for dashboards/alerts.
- Minor comment & log clarity (optional polish)
  - The comment mentions SessionInfoFailureTimeout, but the actual enum is ProverTaskFailureTypeTimeout.
  - The log message "proof submit proof have timeout" is hard to read.
  You could make these clearer with something like:

-	// if prover task FailureType is SessionInfoFailureTimeout, the submit proof is timeout, but we still accept it
+	// If prover task FailureType is ProverTaskFailureTypeTimeout, the submission has timed out, but we still accept it.
 	if types.ProverTaskFailureType(proverTask.FailureType) == types.ProverTaskFailureTypeTimeout {
 		m.validateFailureProverTaskTimeout.Inc()
-		log.Warn("proof submit proof have timeout", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
-			"proverName", proverTask.ProverName, "proverPublicKey", pk, "proofTime", proofTimeSec)
+		log.Warn("proof submission timed out but will be verified", "hash", proofParameter.TaskID, "taskType", proverTask.TaskType,
+			"proverName", proverTask.ProverName, "proverPublicKey", pk, "proofTime", proofTimeSec)
 	}

To help verify how timeouts are recorded across the codebase, you can run:
🏁 Script executed:
#!/bin/bash
# Inspect usages of ProverTaskFailureTypeTimeout and related status updates
rg -n "ProverTaskFailureTypeTimeout" -C3
rg -n "UpdateProverTaskProvingStatusAndFailureType" -C3
Length of output: 6120
Critical: Timeout-marked tasks are rejected before reaching the new acceptance logic
The script output reveals a blocking issue with the intended behavior. When a timeout is detected, collect_proof.go:237 calls:
UpdateProverTaskProvingStatusAndFailureType(c.ctx, assignedProverTask.UUID, types.ProverProofInvalid, types.ProverTaskFailureTypeTimeout, tx)
This sets ProvingStatus to ProverProofInvalid. However, the guard at lines 315–329 rejects submissions before the new timeout handler at lines 364–369 can execute:
if types.ProverProveStatus(proverTask.ProvingStatus) == types.ProverProofValid ||
	types.ProverProveStatus(proverTask.ProvingStatus) == types.ProverProofInvalid {
	...
	return ErrValidatorFailureProverTaskCannotSubmitTwice
}
Result: Late proofs for timed-out tasks will always be rejected by the "cannot submit twice" guard, never reaching the new acceptance logic. The PR's goal cannot be achieved with this implementation.
Fix: Either (1) skip the "submit twice" guard when FailureType == ProverTaskFailureTypeTimeout, or (2) leave timed-out tasks with ProvingStatus unset (or set to a non-rejected state) so they pass the guard. The latter likely requires changes to collect_proof.go:237 as well.
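Option (1) could look roughly like the sketch below. It is only an illustration under stated assumptions: `proverTask`, `provingStatus`, `failureType`, and `errCannotSubmitTwice` are simplified, hypothetical stand-ins for the coordinator's real types and for `ErrValidatorFailureProverTaskCannotSubmitTwice`, and the choice to exempt only tasks that are both invalid and timeout-marked is one possible reading of the suggestion.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the coordinator's status enums.
type provingStatus int

const (
	statusAssigned provingStatus = iota
	statusValid
	statusInvalid
)

type failureType int

const (
	failureNone failureType = iota
	failureTimeout
)

type proverTask struct {
	ProvingStatus provingStatus
	FailureType   failureType
}

var errCannotSubmitTwice = errors.New("prover task cannot submit twice")

// guardOriginal models the existing guard: any task already marked
// valid or invalid is rejected, including tasks marked invalid by a timeout.
func guardOriginal(t proverTask) error {
	if t.ProvingStatus == statusValid || t.ProvingStatus == statusInvalid {
		return errCannotSubmitTwice
	}
	return nil
}

// guardWithTimeoutException models option (1): a task that is invalid
// only because it timed out is allowed through to later validation.
func guardWithTimeoutException(t proverTask) error {
	timedOut := t.ProvingStatus == statusInvalid && t.FailureType == failureTimeout
	if timedOut {
		return nil
	}
	return guardOriginal(t)
}

func main() {
	late := proverTask{ProvingStatus: statusInvalid, FailureType: failureTimeout}
	fmt.Println("original guard:", guardOriginal(late))
	fmt.Println("guard with timeout exception:", guardWithTimeoutException(late))
}
```

A real change along these lines would presumably also need to make sure that, once a late proof has been processed, the task no longer qualifies for the exception; otherwise the same prover could submit repeatedly.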
🤖 Prompt for AI Agents
In coordinator/internal/logic/submitproof/proof_receiver.go around lines
315–369, the current "cannot submit twice" guard rejects tasks already marked
ProverProofInvalid before the timeout-acceptance logic at 364–369 runs; to fix,
either update the guard (lines ~315–329) to allow submissions when
proverTask.FailureType == types.ProverTaskFailureTypeTimeout by skipping the
ProverProofInvalid/ProverProofValid check for timeout-marked tasks, or instead
change the code that marks timeouts (collect_proof.go:237) so it does not set
ProvingStatus to ProverProofInvalid for timeout cases (leave it unset or set to
a non-rejecting state) so the existing guard won’t block late proofs—pick one
approach and implement the corresponding change consistently across both files.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #1764 +/- ##
===========================================
- Coverage 36.54% 36.49% -0.05%
===========================================
Files 247 247
Lines 21186 21185 -1
===========================================
- Hits 7742 7732 -10
- Misses 12614 12631 +17
+ Partials 830 822 -8
We faced a dilemma in the collection_time setting (i.e. the time limit after which an assigned prove task is marked as timed out): a value that is too small could prevent a time-consuming task from ever being completed, since every submission would be rejected as timed out; on the other hand, a value that is too large would make it take too long to re-assign a task when the connection with the assigned prover is lost.

This PR proposes accepting the proof submission even if it has timed out: there is no good reason to reject a result that can be verified. With this fix we can reduce the reassignment interval without worrying about the permanent failure of an occasional long-running task. Timeout failures would still be counted.