Refactor e2e RustFS fault plans #143
dbe6653 to 05b6dcf
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 05b6dcfead
        "checker-pre-recommit-report.json",
        &serde_json::to_string_pretty(&pre_recommit_report)?,
    )?;
    if let Err(error) = pre_recommit_report.require_success() {
Defer strict checks until unknown overwrites are recommitted
When an overwrite in the mixed workload times out or returns an unknown result but actually lands, check_s3_history still models the previous committed value as live and reports a hash mismatch. Requiring success here fails the run before recommit_unconfirmed_objects (below) can reconcile that accepted S3 timeout/unknown outcome. As a result, fault scenarios fail on an ordinary ambiguous write rather than on data loss or corruption.
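The ordering suggested here can be illustrated with a minimal sketch. Everything in it is hypothetical and simplified for illustration: `Model`, `Report`, `check`, `recommit_unconfirmed`, and `run_checks` are stand-ins, not the PR's actual types, and objects are reduced to a `u64` content hash. The pre-recommit report is kept as an advisory artifact, and only the post-recommit report is required to succeed.

```rust
use std::collections::BTreeMap;

/// Hypothetical checker report: keys whose stored hash differs from the model.
#[derive(Debug, Default)]
struct Report {
    hash_mismatches: Vec<String>,
}

impl Report {
    fn require_success(&self) -> Result<(), String> {
        if self.hash_mismatches.is_empty() {
            Ok(())
        } else {
            Err(format!("hash mismatch for keys: {:?}", self.hash_mismatches))
        }
    }
}

/// Hypothetical model: acknowledged values, plus overwrites whose outcome
/// (timeout or unknown result) was never confirmed.
struct Model {
    committed: BTreeMap<String, u64>,
    unconfirmed: BTreeMap<String, u64>,
}

/// Compare every committed key against what the store currently holds.
fn check(model: &Model, store: &BTreeMap<String, u64>) -> Report {
    let mut report = Report::default();
    for (key, expected) in &model.committed {
        if store.get(key) != Some(expected) {
            report.hash_mismatches.push(key.clone());
        }
    }
    report
}

/// Rewrite each unconfirmed object and promote it to committed in the model.
fn recommit_unconfirmed(model: &mut Model, store: &mut BTreeMap<String, u64>) {
    let pending = std::mem::take(&mut model.unconfirmed);
    for (key, hash) in pending {
        store.insert(key.clone(), hash);
        model.committed.insert(key, hash);
    }
}

/// Advisory check first, then recommit, then the strict check.
fn run_checks(
    model: &mut Model,
    store: &mut BTreeMap<String, u64>,
) -> Result<Report, String> {
    let pre_recommit = check(model, store); // recorded, but not required to pass
    recommit_unconfirmed(model, store);
    check(model, store).require_success()?; // strict only after reconciliation
    Ok(pre_recommit)
}
```

In this sketch an overwrite that landed despite a timeout shows up in the advisory report but no longer fails the run, while a mismatch on a key with no unconfirmed write still fails the strict check.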
| let s3 = s3.clone(); | ||
| let recorder = recorder.clone(); | ||
| async move { | ||
| let get = s3.get_object_result(&key, &recorder).await?; |
Avoid recording unknown-write probes in shared history
For a timed-out or unknown PUT that later materializes, this pre-recommit probe records a successful GET in history.jsonl while the model still has no committed live value for that key. The final checker then replays that probe and reports it under unexpected_visible_deleted_objects, so the scenario fails even after the recommit step succeeds. Use a non-recording probe, or tag these records so that anomaly detection ignores them.
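One way to picture the tagging option is the sketch below. It is an assumption-laden illustration, not the repository's code: `Purpose`, `Op`, `HistoryRecord`, and `unexpected_visible_objects` are hypothetical names, and the real history.jsonl schema is not shown in this PR excerpt. Each record carries a purpose tag, and the anomaly scan skips records written by recommit probes.

```rust
use std::collections::BTreeSet;

/// Hypothetical tag describing why a history record was written.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Purpose {
    Workload,
    RecommitProbe,
}

/// Hypothetical operation kinds recorded in the history.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Get,
    Put,
    Delete,
}

/// Hypothetical history record, reduced to the fields the scan needs.
#[derive(Debug, Clone)]
struct HistoryRecord {
    key: String,
    op: Op,
    ok: bool,
    purpose: Purpose,
}

/// Report keys that a workload GET saw successfully although the model has
/// no live value for them. Records tagged as recommit probes are ignored.
fn unexpected_visible_objects(
    history: &[HistoryRecord],
    live_keys: &BTreeSet<String>,
) -> Vec<String> {
    history
        .iter()
        .filter(|r| r.purpose == Purpose::Workload)
        .filter(|r| r.op == Op::Get && r.ok)
        .filter(|r| !live_keys.contains(&r.key))
        .map(|r| r.key.clone())
        .collect()
}
```

The alternative named in the comment, a probe that never writes to the shared history, avoids the tag entirely; tagging keeps the probe visible in the artifact for debugging.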
Type of Change
Related Issues
N/A
Summary of Changes
Adds a resolved fault-run contract and lifecycle event stream for RustFS e2e fault tests.
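As a rough illustration of what one line of a JSONL lifecycle event stream could look like, here is a dependency-free sketch. The event shape is an assumption: `Phase`, `RunEvent`, the field names `seq`, `phase`, and `detail`, and the helper functions are all hypothetical and do not reflect the schema this PR actually emits.

```rust
use std::fmt::Write as _;

/// Hypothetical lifecycle phases of a fault run.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Phase {
    RunStarted,
    FaultInjected,
    FaultRecovered,
    RunFinished,
}

impl Phase {
    fn as_str(self) -> &'static str {
        match self {
            Phase::RunStarted => "run_started",
            Phase::FaultInjected => "fault_injected",
            Phase::FaultRecovered => "fault_recovered",
            Phase::RunFinished => "run_finished",
        }
    }
}

/// Hypothetical event: a sequence number, a phase, and free-form detail.
struct RunEvent {
    seq: u64,
    phase: Phase,
    detail: String,
}

/// Escape a string for embedding inside a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Render one event as a single JSON object with no trailing newline.
fn to_jsonl_line(event: &RunEvent) -> String {
    format!(
        "{{\"seq\":{},\"phase\":\"{}\",\"detail\":\"{}\"}}",
        event.seq,
        event.phase.as_str(),
        escape_json(&event.detail)
    )
}
```

Keeping each event on exactly one line is what lets a later visualization tool read the stream incrementally, one record per line.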
This PR introduces stable run-spec.yaml/run-spec.json artifacts, JSONL run events for future visualization, configurable RustFS pod and volume assumptions, stronger fault plan validation, and stricter artifact validation in the shell runner. It also keeps composite multi-fault execution gated until an explicit composition policy exists.
Checklist
- make pre-commit (fmt-check + clippy + test + console-lint + console-fmt-check)
- [Unreleased] (if user-visible change)
Impact
Verification
Additional Notes
The destructive live fault scenario itself still requires a dedicated real Kubernetes or K3s cluster with the required fault-test environment variables and Chaos Mesh/device-mapper prerequisites.