Fix fragment-induced BAD,Incorrect result rejects: buffer TCP reads per connection #4
Open
gionag wants to merge 1 commit into duino-coin:main
Conversation
…d share rejects
receiveData attached a one-shot 'data' listener and resolved with
whatever single chunk Node.js delivered. TCP does not preserve
application-level message boundaries — a client message can span
multiple 'data' events, or two messages can coalesce into one —
and this pattern had no way to recover. answer[0] ended up parsing
fragments ("JOB", mining_key tails), parseInt returned NaN,
miner_res !== random, and valid shares got rejected as
BAD,Incorrect result.
Observed rate under 47 concurrent mining sockets over ~80 ms RTT:
roughly 0.34 % of shares rejected. Reproduced locally at ~0.17 %;
first reject consistently appeared after ~14k clean shares.
Fix: install a single persistent 'data' listener per connection
that accumulates into a per-conn buffer and hands out one message
at a time. Framing is dual-mode for backward compatibility:
- If the buffer contains '\n', cut there (robust path).
- Otherwise wait LEGACY_IDLE_FLUSH_MS (10 ms) after the last
chunk; if nothing more arrives, treat the buffer as a single
message. Fragments of the same logical message arrive sub-ms
apart on the receive side, so the idle gap cleanly separates
"still arriving" from "done".
The same pattern is applied to connectionHandler's mainListener so
the first JOB_REQ on a new connection cannot pass a truncated
mining_key to miningHandler.
Legacy miners that do not append '\n' are unaffected. Clients that
opt in to '\n' termination get the zero-latency fast path.
Validation: 47 virtual workers against the patched pool for 35
minutes, 33,947 shares processed, 0 BAD,Incorrect result. Baseline
unpatched: first BAD at ~14,192 shares, persistent thereafter.
Author
Note for follow-up
This PR is fully backward-compatible: clients that don't terminate with '\n' keep working through the legacy idle-flush path. Worth noting for future work: if the reference miner code is updated to append '\n', every client gets the zero-latency fast path. Not a prerequisite for merging this PR — it stands on its own — just something to keep in mind.
Problem
Under sustained load a fraction of mining shares were rejected as
`BAD,Incorrect result` even when the nonce was mathematically
correct. Miners see a rejection rate of roughly 0.3 % for no
apparent reason. Reproduced against this repo verbatim at ~0.17 %.
Root cause
`receiveData` in `src/mining.js` attached a one-shot `'data'` listener and resolved with whatever single chunk Node.js delivered.
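A minimal sketch of that one-shot shape (illustrative, not the verbatim source):

```js
// Pre-fix pattern, as described above: one 'data' event is assumed
// to carry exactly one whole message (sketch, not the verbatim code).
function receiveData(conn) {
    return new Promise((resolve) => {
        conn.once("data", (chunk) => {
            // Resolves with whatever single chunk the kernel handed
            // Node.js: possibly half a message, possibly two at once.
            resolve(chunk.toString());
        });
    });
}
```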
TCP does not preserve application-level message boundaries. Under
event-loop pressure a single client message can span two `'data'`
events, or two messages can arrive in one event. The one-shot
pattern has no way to recover: `answer[0]` ends up parsing a
fragment of a different message (`"JOB"`, a `mining_key` suffix,
etc.), `parseInt(...)` returns `NaN`, and `miner_res !== random`
triggers `BAD,Incorrect result` on a perfectly valid share.

The same fragmentation risk exists in `src/connectionHandler.js`'s
`mainListener` for the very first `JOB_REQ` on a new connection — a
truncated `mining_key` can be passed to `miningHandler` and
propagated into `minersStats.pw` for the rest of the session.

Fix
Install a single persistent `'data'` listener per connection that
appends incoming chunks to a per-connection buffer, then hands out
one message per `receiveData` call. Framing is dual-mode for
backward compatibility (a sketch follows the list):

- If `\n` is in the buffer, cut there. Clients that terminate
  their messages get robust boundary detection.
- If no `\n` arrives within `LEGACY_IDLE_FLUSH_MS` (10 ms) after
  the last chunk, treat the buffer as a single message. TCP
  fragments of the same logical message arrive sub-ms apart on the
  receive side, so this idle gap cleanly separates "still arriving"
  from "message complete".
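In code, the change has roughly this shape (a sketch: the `_waiters`/`_msgQueue` plumbing and the `attachReceiver` name are illustrative, while `conn._recvBuf` and `LEGACY_IDLE_FLUSH_MS` are names from the PR):

```js
const LEGACY_IDLE_FLUSH_MS = 10; // legacy idle-flush window, per the PR

// One persistent 'data' listener per connection, installed once.
function attachReceiver(conn) {
    conn._recvBuf = "";   // shared per-connection byte buffer
    conn._msgQueue = [];  // framed messages not yet consumed
    conn._waiters = [];   // pending receiveData() resolvers
    let idleTimer = null;

    const deliver = (msg) => {
        const resolve = conn._waiters.shift();
        if (resolve) resolve(msg);
        else conn._msgQueue.push(msg);
    };

    conn.on("data", (chunk) => {
        conn._recvBuf += chunk.toString();
        if (idleTimer) clearTimeout(idleTimer);

        // Fast path: '\n'-terminated clients get exact boundaries.
        let nl;
        while ((nl = conn._recvBuf.indexOf("\n")) !== -1) {
            deliver(conn._recvBuf.slice(0, nl));
            conn._recvBuf = conn._recvBuf.slice(nl + 1);
        }

        // Legacy path: no '\n' seen; if the socket then stays idle
        // for LEGACY_IDLE_FLUSH_MS, flush the buffer as one message.
        if (conn._recvBuf.length > 0) {
            idleTimer = setTimeout(() => {
                const msg = conn._recvBuf;
                conn._recvBuf = "";
                deliver(msg);
            }, LEGACY_IDLE_FLUSH_MS);
        }
    });
}

// Hands out exactly one message per call, in arrival order.
function receiveData(conn) {
    if (conn._msgQueue.length > 0) {
        return Promise.resolve(conn._msgQueue.shift());
    }
    return new Promise((resolve) => conn._waiters.push(resolve));
}
```

The `_msgQueue` keeps messages that arrive between `receiveData` calls instead of dropping them, so coalesced messages are consumed one at a time.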
Existing miners that do not append `\n` (AVR_Miner.py,
PC_Miner.py, and most community clients) are unaffected — they hit
the legacy path with an imperceptible 10 ms tail. New clients that
opt in to `\n` termination get the zero-latency fast path.

Validation
Test harness: 47 concurrent virtual-worker TCP sockets against the
patched pool, diff = 10, ~0.5 shares/sec/worker.
Result: 35 minutes, 33,947 shares processed, zero
`BAD,Incorrect result` rejections. Unpatched baseline: first BAD
at ~14,192 shares, persistent thereafter.

Evidence on the unpatched side included 30 distinct reject cycles
where the client's submitted nonce was the correct answer to the
pool's own `(lastBlockhash, newHash)` challenge (verified by
client-side SHA-1 comparison), but the pool's `answer[0]` parsed
as `"JOB"` or `"...destroyer"` (the `mining_key` tail) — proof that
the one-event-per-message assumption was wrong.
Files changed

- `src/mining.js` — `receiveData` rewritten with a persistent
  per-connection buffer + dual-mode framing.
- `src/connectionHandler.js` — `mainListener` refactored to the
  same pattern so the first `JOB_REQ` gets the same protection;
  post-`\n` leftover is handed off to `mining.js` via the shared
  `conn._recvBuf`, so no bytes are lost at the handler boundary.

No client-side changes required. No API changes. No configuration
changes.
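To make the handler-boundary handoff concrete, a sketch under the same assumptions as above (the `miningHandler` call signature here is illustrative):

```js
// connectionHandler side: consume only the first framed message; any
// bytes after the first '\n' remain in conn._recvBuf for mining.js.
async function mainListener(conn) {
    attachReceiver(conn); // install the persistent listener once
    const first = await receiveData(conn); // e.g. "JOB,user,...,mining_key"
    if (first.startsWith("JOB")) {
        // mining_key is guaranteed complete here, never a fragment.
        miningHandler(conn, first.split(","));
    }
    // Later receiveData(conn) calls drain the same buffer in order,
    // so nothing is lost crossing into mining.js.
}
```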