-
Notifications
You must be signed in to change notification settings - Fork 747
perf(state): Cache validator workload in BuildLastCommitInfo #5421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
state/execution.go
Outdated
| } | ||
|
|
||
| // ExecCommitBlockWithCache is an optimized version that uses caching for better performance | ||
| func (blockExec *BlockExecutor) ExecCommitBlockWithCache( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unused. looks like the previous ExecCommitBlock is only used when replaying blocks. do we want to use that there?
state/execution.go
Outdated
| } | ||
|
|
||
| // ClearValidatorCache clears the validator caches to free memory | ||
| func (blockExec *BlockExecutor) ClearValidatorCache() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unused except in tests, so the part of your test that is testing cache clearing isn't really testing anything
| validatorCache: make(map[int64]*types.ValidatorSet), | ||
| abciValidatorCache: make(map[string]abci.Validator), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in real execution, no values from these caches are ever removed, so these will grow forever. We will need someway to periodically remove elements from them.
state/execution.go
Outdated
| commitSig := block.LastCommit.Signatures[i] | ||
|
|
||
| // Create cache key for this validator | ||
| cacheKey := fmt.Sprintf("%s_%d", val.PubKey.Address(), val.VotingPower) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from my understanding the point of the abciValidatorCache is to not have to do the expensive val.PubKey.Address() call. however you are calling val.PubKey.Address() when generating the cacheKey. doesn't this eliminate most of the point of the cache?
|
Had some race conditions in the validator caching that were causing CI failures, so I added proper double-checked locking and moved the cache cleanup outside the mutex locks to prevent deadlocks. Also toned down the cache cleanup frequency since it was interfering with mempool rechecking and updated the test accordingly. |
state/execution.go
Outdated
| // Simple cleanup: clear half of the cache | ||
| // Since Go maps don't guarantee iteration order, we'll clear the entire cache | ||
| // and let it rebuild naturally. This is simpler and avoids the FIFO issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cursor pointed this out as well but the first line of the comment here is misleading, it clears the whole cache, not half
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unable to authenticate your request. Please make sure to connect your GitHub account to Cursor. Go to Cursor
state/execution.go
Outdated
| func (blockExec *BlockExecutor) cleanupOldCacheEntries() { | ||
| // Only cleanup if cache is significantly over the limit to avoid frequent cleanup | ||
| blockExec.validatorCacheMutex.Lock() | ||
| if len(blockExec.validatorCache) > blockExec.maxCacheSize*2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why *2 here? that feels misleading against what the maxCacheSize variable implies (that we will not grow past it). Since you already initialize the maxCacheSize to a default of 1000, why not just set it to 2000 by default?
state/execution.go
Outdated
| // Double-checked locking: acquire write lock and check again | ||
| blockExec.validatorCacheMutex.Lock() | ||
| // Check again in case another goroutine added it | ||
| if existingValSet, exists := blockExec.validatorCache[height]; exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not following why you would need to grab from the cache again here? wouldn't this always be the same as what you just loaded from the store on line 549? I get that another goroutine may have just cached it and its now available in the cache, but why do we need to fetch it from the cache when we already did the work of loading it from the store above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question for the BuildLastCommitInfoWithCache
state/execution.go
Outdated
| // cleanupOldCacheEntries removes old entries from caches to prevent memory leaks | ||
| func (blockExec *BlockExecutor) cleanupOldCacheEntries() { | ||
| // Only cleanup if cache is significantly over the limit to avoid frequent cleanup | ||
| blockExec.validatorCacheMutex.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be called quite frequently and you dont actually need a write lock on this except for the single time that the cache needs to be cleared. it may make more sense to acquire a read only lock when checking the condition, and change to a write lock only when the cache actually needs to be written to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also use your helper GetCacheSize to get the sizes with a read only lock
Refactor ABCI validator conversion to use canonical helper for consistent field population.
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days-before-close if no further activity occurs. |
Summary
Implements caching for
state.BuildLastCommitInfoto eliminate performance bottlenecks in consensus operations.Changes
BlockExecutorBuildLastCommitInfoFromStoreWithCacheandBuildLastCommitInfoWithCachemethodsapplyBlock,ExtendVote,ProcessProposalClearValidatorCache()for memory managementPerformance Impact
Fixes #3152
Note
Introduce cached validator/ABCI validator lookups in
BuildLastCommitInfoand update call sites for faster block processing.state/execution.go):validatorCacheandabciValidatorCachewith RW locks toBlockExecutor.BuildLastCommitInfoFromStoreWithCache,BuildLastCommitInfoWithCache, andClearValidatorCache.buildLastCommitInfoFromStoreinProcessProposal,FinalizeBlockpath (applyBlock), andExtendVotewith cached variants.NewBlockExecutor.ExecCommitBlockWithCachehelper.state/execution_test.go):Written by Cursor Bugbot for commit 5663a3f. This will update automatically on new commits. Configure here.