@joostjager commented on Jan 21, 2026

Introduce ldk-server-chaos, a testing harness that stress-tests ldk-server by running multiple nodes, opening channels, and continuously sending payments while randomly killing and restarting nodes.

This tool readily reproduces the long-standing channel monitor/manager desync issue, which can occur when a node is forcefully terminated at an inopportune moment during a channel state update. By sending SIGKILL at random intervals to 3 nodes while 20 concurrent payment loops are running, the harness creates the conditions that trigger such desyncs without assuming any specific timing or failure scenario.
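The kill/restart step described above can be pictured roughly as follows. This is a minimal sketch, not the harness's actual code: `XorShift64`, `next_kill`, `kill_and_restart`, and the `max_delay_ms` parameter are hypothetical names chosen for illustration, and the small PRNG stands in for whatever randomness source the harness really uses. The only library behavior relied on is that `std::process::Child::kill` delivers SIGKILL on Unix.

```rust
use std::process::{Child, Command};
use std::time::Duration;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be non-zero. (Hypothetical helper, not from the PR.)
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..n` (slightly biased, which is fine for a sketch).
    fn below(&mut self, n: u64) -> u64 {
        self.next() % n
    }
}

/// Picks which node to kill next and how long to wait before doing so.
/// `max_delay_ms` is an assumed tuning knob, not a value from the PR.
fn next_kill(rng: &mut XorShift64, node_count: usize, max_delay_ms: u64) -> (usize, Duration) {
    let victim = rng.below(node_count as u64) as usize;
    let delay = Duration::from_millis(rng.below(max_delay_ms + 1));
    (victim, delay)
}

/// Forcefully terminates a node process and starts a replacement.
fn kill_and_restart(child: &mut Child, restart: &mut Command) -> std::io::Result<()> {
    child.kill()?; // SIGKILL on Unix: the process gets no chance to flush state
    child.wait()?; // reap the dead process before restarting
    *child = restart.spawn()?;
    Ok(())
}
```

Because the victim and the delay are both drawn at random, the kill can land at any point in a channel state update, which is the property the description relies on.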

The test can verify potential fixes in a robust way: if payments continue flowing successfully across thousands of kill/restart cycles, there's strong evidence the fix is working. Failure is detected when any payment direction times out (no success for 60 seconds), typically indicating a desync has rendered channels unusable.
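The failure criterion can be sketched as a per-direction "last success" timestamp checked against a timeout. The type and method names below (`SuccessTracker`, `record_success`, `stalled`) are assumptions made for illustration; only the 60-second, per-direction rule comes from the description.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// A payment direction, e.g. (0, 1) for node 0 paying node 1.
type Direction = (usize, usize);

/// Remembers when each direction last had a successful payment.
/// (Hypothetical helper, not the harness's actual type.)
struct SuccessTracker {
    timeout: Duration,
    last_success: HashMap<Direction, Instant>,
}

impl SuccessTracker {
    /// Every direction starts its clock at `now`.
    fn new(directions: &[Direction], timeout: Duration, now: Instant) -> Self {
        let last_success: HashMap<Direction, Instant> =
            directions.iter().map(|d| (*d, now)).collect();
        Self { timeout, last_success }
    }

    /// Resets the clock for one direction after a successful payment.
    fn record_success(&mut self, dir: Direction, now: Instant) {
        self.last_success.insert(dir, now);
    }

    /// Returns, in sorted order, the directions with no success for at least `timeout`.
    fn stalled(&self, now: Instant) -> Vec<Direction> {
        let mut out: Vec<Direction> = self
            .last_success
            .iter()
            .filter(|(_, last)| now.saturating_duration_since(**last) >= self.timeout)
            .map(|(dir, _)| *dir)
            .collect();
        out.sort();
        out
    }
}
```

A run would then be declared failed as soon as `stalled` returns a non-empty list; passing `now` explicitly keeps the check easy to unit-test without sleeping.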

@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

@joostjager changed the title from "Chaos" to "Add chaos testing framework for ldk-server" on Jan 21, 2026

Adds a new endpoint to connect to a peer on the Lightning Network
without opening a channel. This is useful for establishing connections
before channel operations or for maintaining peer connectivity.

The endpoint accepts node_pubkey, address, and an optional persist flag
that defaults to true for automatic reconnection on restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
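
To make the defaulting behavior of the connect-peer endpoint concrete, here is a minimal sketch of a request shape. The field names `node_pubkey`, `address`, and `persist` come from the commit message above; the struct name `ConnectPeerRequest`, the field types, and the `should_persist` helper are assumptions and may not match the actual API definition in the PR.

```rust
/// Hypothetical request shape; only the three field names are taken
/// from the commit message, everything else is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct ConnectPeerRequest {
    /// Public key of the peer to connect to.
    node_pubkey: String,
    /// Network address of the peer.
    address: String,
    /// Whether to reconnect automatically after a restart; `None` means "not specified".
    persist: Option<bool>,
}

impl ConnectPeerRequest {
    /// Applies the documented default: an omitted flag is treated as `true`.
    fn should_persist(&self) -> bool {
        self.persist.unwrap_or(true)
    }
}
```

Modeling the flag as an `Option<bool>` keeps "not specified" distinct from an explicit `false`, so the default can be applied in one place.
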
Introduce ldk-server-chaos, a testing harness that stress-tests ldk-server
by running multiple nodes, opening channels, and continuously sending
payments while randomly killing and restarting nodes.

Features:
- Spawns 3 ldk-server nodes with auto-generated configs
- Creates a fully connected channel topology
- Runs concurrent payment loops between all node pairs
- Randomly kills and restarts nodes to test resilience
- Tracks payment success rates and detects timeout failures
- Uses bitcoind RPC for on-chain operations and block generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
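
The "fully connected channel topology" and "payment loops between all node pairs" items can be illustrated by enumerating pairs. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes one channel per unordered node pair and one payment direction per ordered pair, which the commit message does not spell out.

```rust
/// Unordered node pairs, one per channel in a fully connected topology.
/// (Hypothetical helper; assumes a single channel per pair.)
fn channel_pairs(node_count: usize) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for a in 0..node_count {
        for b in (a + 1)..node_count {
            pairs.push((a, b));
        }
    }
    pairs
}

/// Ordered (sender, receiver) pairs, one per payment direction.
fn payment_directions(node_count: usize) -> Vec<(usize, usize)> {
    let mut directions = Vec::new();
    for from in 0..node_count {
        for to in 0..node_count {
            if from != to {
                directions.push((from, to));
            }
        }
    }
    directions
}
```

Under these assumptions, 3 nodes give 3 channels and 6 payment directions.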