Add chaos testing framework for ldk-server #108
Draft
Introduce ldk-server-chaos, a testing harness that stress-tests ldk-server by running multiple nodes, opening channels, and continuously sending payments while randomly killing and restarting nodes.
This tool readily reproduces the long-standing channel monitor/manager desync issue that can occur when a node is forcefully terminated at an inopportune moment during a channel state update. By sending SIGKILL at random intervals across 3 nodes while 20 concurrent payment loops are running, the harness creates the conditions that trigger such desyncs without making assumptions about specific timing or failure scenarios.
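The kill/restart cycle described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `XorShift64`, `Node`, and `next_kill` are hypothetical names, the 500 to 5000 ms interval in the usage note is an assumed range, and a small hand-rolled PRNG stands in for whatever randomness source the harness actually uses. It relies on `std::process::Child::kill`, which sends SIGKILL on Unix.

```rust
use std::io;
use std::process::{Child, Command};
use std::time::Duration;

/// Tiny xorshift PRNG so the sketch needs no external crates (hypothetical helper).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // A zero state would stay zero forever, so force a non-zero seed.
        XorShift64(seed.max(1))
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `lo..=hi`. Assumes `lo <= hi` and `hi - lo < u64::MAX`.
    fn in_range(&mut self, lo: u64, hi: u64) -> u64 {
        assert!(lo <= hi);
        lo + self.next_u64() % (hi - lo + 1)
    }
}

/// One managed node process (hypothetical wrapper, not the harness's real type).
struct Node {
    binary: String,
    args: Vec<String>,
    child: Child,
}

impl Node {
    fn spawn(binary: &str, args: &[String]) -> io::Result<Self> {
        let child = Command::new(binary).args(args).spawn()?;
        Ok(Node { binary: binary.to_string(), args: args.to_vec(), child })
    }

    /// Forcefully terminates the process and starts it again with the same arguments.
    fn kill_and_restart(&mut self) -> io::Result<()> {
        // On Unix this sends SIGKILL; the result is ignored in case the process already exited.
        let _ = self.child.kill();
        // Reap the old process so it does not linger as a zombie.
        self.child.wait()?;
        self.child = Command::new(&self.binary).args(&self.args).spawn()?;
        Ok(())
    }
}

/// Picks which node to kill next and how long to wait before doing so.
fn next_kill(rng: &mut XorShift64, node_count: usize, min_ms: u64, max_ms: u64) -> (usize, Duration) {
    let victim = (rng.next_u64() % node_count as u64) as usize;
    let delay = Duration::from_millis(rng.in_range(min_ms, max_ms));
    (victim, delay)
}
```

A driver loop under these assumptions would call `next_kill(&mut rng, 3, 500, 5000)`, sleep for the returned delay, and then call `kill_and_restart` on the chosen node while the payment loops keep running in other threads or tasks.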
The test can verify potential fixes in a robust way: if payments continue flowing successfully across thousands of kill/restart cycles, there's strong evidence the fix is working. Failure is detected when any payment direction times out (no success for 60 seconds), typically indicating a desync has rendered channels unusable.
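The failure condition above (no successful payment in a given direction for 60 seconds) could be tracked with something like the following minimal sketch. `StallDetector` and its methods are hypothetical names chosen for illustration, and identifying a direction by a `(from, to)` pair of node indices is an assumption; the harness itself may track this differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks the time of the last successful payment for each direction (hypothetical type).
struct StallDetector {
    timeout: Duration,
    last_success: HashMap<(usize, usize), Instant>,
}

impl StallDetector {
    fn new(timeout: Duration) -> Self {
        StallDetector { timeout, last_success: HashMap::new() }
    }

    /// Starts tracking a direction; the clock for it begins at `now`.
    fn register(&mut self, from: usize, to: usize, now: Instant) {
        self.last_success.insert((from, to), now);
    }

    /// Records a successful payment in the given direction.
    fn record_success(&mut self, from: usize, to: usize, now: Instant) {
        self.last_success.insert((from, to), now);
    }

    /// Returns every direction with no success for at least `timeout`, in sorted order.
    fn stalled(&self, now: Instant) -> Vec<(usize, usize)> {
        let mut stalled: Vec<(usize, usize)> = self
            .last_success
            .iter()
            .filter(|(_, last)| now.duration_since(**last) >= self.timeout)
            .map(|(direction, _)| *direction)
            .collect();
        stalled.sort();
        stalled
    }
}
```

Passing the current time in explicitly keeps the check easy to test; a monitor loop would call `stalled(Instant::now())` periodically and abort the run as soon as the result is non-empty.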