feat(backup_request): add rate-limited backup request policy (#3228)#3229
feat(backup_request): add rate-limited backup request policy (#3228)#3229feng-y wants to merge 9 commits intoapache:masterfrom
Conversation
|
@copilot review this PR |
There was a problem hiding this comment.
Pull request overview
Adds a rate-limiting mechanism for backup requests, aiming to prevent “backup request storms” by suppressing backups when the backup/total ratio exceeds a configurable threshold over a sliding time window.
Changes:
- Introduces
ChannelOptions.backup_request_max_ratioand creates an internal rate-limitedBackupRequestPolicywhen enabled. - Adds global gflags for ratio threshold, window size, and ratio cache update interval, backed by
bvar::Window+ cumulative counters for cold start. - Adds a unit test covering policy creation/precedence plus a timing-based functional check.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/brpc_channel_unittest.cpp | Adds coverage for rate-limited backup request behavior and configuration precedence. |
| src/brpc/channel.h | Adds per-channel configuration and internal ownership for an auto-created backup policy. |
| src/brpc/channel.cpp | Creates and wires an internal rate-limited backup policy during channel initialization. |
| src/brpc/backup_request_policy.h | Refactors BackupRequestPolicy into a concrete base with optional rate limiting support. |
| src/brpc/backup_request_policy.cpp | Implements rate limiter, gflags, and default policy behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/brpc/backup_request_policy.cpp:120
- When compare_exchange_strong fails on line 105, the code continues to use the potentially stale
ratiovalue loaded on line 102, even though another thread may have updated_cached_ratioon line 116. While the comment acknowledges this is "best-effort rate limiting," consider reloading_cached_ratioafter a failed compare_exchange to reduce the window of staleness, especially in high-concurrency scenarios where the stale value could lead to temporarily incorrect rate limiting decisions.
bool DoBackup(const Controller* /*controller*/) const override {
const int64_t now_us = butil::cpuwide_time_us();
int64_t last_us = _last_update_us.load(butil::memory_order_relaxed);
double ratio = _cached_ratio.load(butil::memory_order_relaxed);
if (now_us - last_us >= _update_interval_us) {
if (_last_update_us.compare_exchange_strong(
last_us, now_us, butil::memory_order_relaxed)) {
int64_t total = _total_window.get_value();
int64_t backup = _backup_window.get_value();
// Fall back to cumulative counts when the window has no
// sampled data yet (cold-start within the first few seconds).
if (total <= 0) {
total = _total_count.get_value();
backup = _backup_count.get_value();
}
ratio = (total > 0) ? static_cast<double>(backup) / total : 0.0;
_cached_ratio.store(ratio, butil::memory_order_relaxed);
}
}
return ratio < _max_backup_ratio;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7b50fdc to
e6db035
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
test/brpc_channel_unittest.cpp:2856
- The comment says “ratio refreshes on every DoBackup call” but with
FLAGS_backup_request_ratio_update_interval_s = 1the ratio cache can only refresh at most once per second. This makes the test rationale misleading when debugging failures; please adjust the comment to match the actual behavior.
// Test 5: Functional — rate limiting reduces backup requests.
// Use short update interval so ratio refreshes on every DoBackup call.
// Save/restore gflags outside the inner scope so cleanup always runs.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4ed6666 to
e614e33
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (4)
src/brpc/channel.cpp:284
- Consider checking if CreateRateLimitedBackupPolicy returns NULL before setting _owns_backup_policy to true. While the current guards (line 278 checks max_ratio > 0, validators ensure gflags are valid) should prevent NULL returns in practice, defensive programming suggests verifying the return value and only setting _owns_backup_policy if the policy was successfully created. This would make the code more robust against future changes to CreateRateLimitedBackupPolicy or the validators.
_options.backup_request_policy = CreateRateLimitedBackupPolicy(
_options.backup_request_ms, max_ratio,
FLAGS_backup_request_ratio_window_size_s,
FLAGS_backup_request_ratio_update_interval_s);
_owns_backup_policy = true;
test/brpc_channel_unittest.cpp:2868
- The gflag FLAGS_backup_request_max_ratio is saved and restored but never actually modified in this test. The test uses opt.backup_request_max_ratio = 0.3 (per-channel config) instead of the global gflag. The save/restore of saved_ratio at lines 2862 and 2868 is unnecessary and could be removed for clarity.
const double saved_ratio = brpc::FLAGS_backup_request_max_ratio;
brpc::FLAGS_backup_request_ratio_update_interval_s = 1;
brpc::FLAGS_backup_request_ratio_window_size_s = 2;
BRPC_SCOPE_EXIT {
brpc::FLAGS_backup_request_ratio_update_interval_s = saved_interval;
brpc::FLAGS_backup_request_ratio_window_size_s = saved_window;
brpc::FLAGS_backup_request_max_ratio = saved_ratio;
src/brpc/backup_request_policy.h:48
- The documentation states "The caller owns the returned pointer" but this is inconsistent with actual usage in channel.cpp where the Channel owns the policy when created via backup_request_max_ratio. Consider clarifying: "The caller owns the returned pointer and must delete it when done. However, when Channel creates this policy internally via backup_request_max_ratio, Channel manages the lifetime automatically."
// The caller owns the returned pointer.
test/brpc_channel_unittest.cpp:2853
- Missing test coverage for backup_request_max_ratio = 0 (explicitly disabled). While -1 means "use gflag default", 0 should explicitly disable rate limiting even if the gflag is set to enable it. Consider adding a test case that sets the gflag to a positive value, then creates a channel with opt.backup_request_max_ratio = 0, and verifies that no policy is created.
// Test 4: No policy when backup_request_ms < 0 (backup disabled)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = -1;
opt.backup_request_max_ratio = 0.3;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e614e33 to
27fcaee
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (3)
test/brpc_channel_unittest.cpp:2831
- Missing test coverage: The test doesn't cover the edge case where
backup_request_max_ratio = 0. According to the PR description, 0 should explicitly disable rate limiting. A test case similar to Test 2 but withmax_ratio = 0instead of-1would verify this behavior and ensure both forms of disabling work correctly.
// Test 2: No policy when ratio is disabled (-1)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = 10;
opt.backup_request_max_ratio = -1;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
test/brpc_channel_unittest.cpp:2831
- Missing test coverage: The test doesn't cover the special case where
backup_request_max_ratio = 1.0. According to the implementation (backup_request_policy.cpp:106-107), values >= 1.0 are treated as "no limit" since the ratio cannot exceed 1.0. A test should verify that setting max_ratio to 1.0 allows all backup requests, similar to having no rate limiting.
// Test 2: No policy when ratio is disabled (-1)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = 10;
opt.backup_request_max_ratio = -1;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
test/brpc_channel_unittest.cpp:2831
- Missing test coverage: The test doesn't cover the case where
backup_request_max_ratio > 1.0. According to channel.cpp:273-276, values > 1.0 should be clamped to 1.0 with a warning. A test should verify this clamping behavior works correctly and that the policy is still created with the clamped value.
// Test 2: No policy when ratio is disabled (-1)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = 10;
opt.backup_request_max_ratio = -1;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
27fcaee to
abe44a3
Compare
d9a2e0d to
efd2f73
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
efd2f73 to
95af5c8
Compare
|
@wwbmmm Pipeline failed: some packages failed to download. |
Please merge master branch which fixed CI failure. |
e840999 to
3243190
Compare
…e guidance - Promote built-in factory function to its own subsection (before custom interface) - Add unique_ptr usage example for policy lifetime management - Add RateLimitedBackupPolicyOptions parameter table with defaults/constraints - Document NULL return on invalid params - Keep cn/en docs in sync
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…nts, tests
- controller.cpp: When policy returns -1 (inherit sentinel), fall back
to _backup_request_ms set from ChannelOptions, so backup timer is
actually armed when using a policy with backup_request_ms=-1.
- backup_request_policy.cpp: Clarify OnRPCEnd comment to say 'RPC legs'
(both original and backup completions counted as denominator).
- backup_request_policy.cpp: Warn when update_interval_seconds exceeds
window_size_seconds (window would rarely refresh within its period).
- backup_request_policy.h: Fix comment typo ('Called when an RPC ends').
- brpc_channel_unittest.cpp: Replace nullptr with NULL to match codebase
convention; use ASSERT_TRUE(p != NULL) for unique_ptr null checks.
- brpc_channel_unittest.cpp: Add ValidMaxRatioAtBoundary behavioral assert
and AfterColdStartBackupSuppressedUntilRpcCompletes test.
…test - docs: fix backup_request_ms default (0→-1) and constraint (>=0→>=-1); add note that -1 inherit only works via ChannelOptions injection path, not Controller::set_backup_request_policy(). - test: replace no-op AfterColdStart test with a real behavioral assertion: after cold-start backup fires, wait 1.2s for ratio refresh, verify DoBackup() returns false (conservative ratio=1.0 path triggers).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
test/brpc_channel_unittest.cpp:3151
- The explanation for the cold-start behavior here doesn’t match the current limiter logic: on true cold start (total==0 and backup==0) the computed ratio is 0.0 (allow), and only after backups are counted and the update interval elapses can the conservative ratio=1.0 path kick in (total==0, backup>0). Updating these comments to match the actual transitions would make the test easier to understand/maintain.
// With max_backup_ratio=1.0, backup is allowed until backup_count/total
// reaches 1.0. At cold start (total=0) the conservative path applies:
// ratio is treated as 1.0, so DoBackup() returns false only after the
// first backup is counted.
ASSERT_TRUE(p->DoBackup(NULL)); // cold start: no total yet, allow first
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
test/brpc_channel_unittest.cpp:3152
- The explanatory comment about cold start here doesn’t match the implementation in BackupRateLimiter::ShouldAllow(): when total==0 and backup==0, the code treats this as “true cold start” and sets ratio=0.0 (allow freely). The conservative ratio=1.0 path only applies after backups have been counted but totals are still 0. Consider updating the comment to reflect the actual behavior to avoid confusion when maintaining the test.
// With max_backup_ratio=1.0, backup is allowed until backup_count/total
// reaches 1.0. At cold start (total=0) the conservative path applies:
// ratio is treated as 1.0, so DoBackup() returns false only after the
// first backup is counted.
ASSERT_TRUE(p->DoBackup(NULL)); // cold start: no total yet, allow first
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…CEnd comment, re-allow test, docs - controller.cpp: treat -1 specifically (not all negatives) as the inherit sentinel; other negatives still disable backup, preserving old behavior for custom policies that return negative values to disable backup - backup_request_policy.h: document the -1 sentinel contract on GetBackupRequestMs() so custom implementors know the new interface - backup_request_policy.cpp: fix OnRPCEnd comment — called once per user-level RPC, not once per leg (total_count tracks user RPCs) - test: add OnRPCEndDrivesRatioDownAndReAllows — fires 20 backups to suppress, then completes 50 RPCs via OnRPCEnd, verifies DoBackup re-allows once ratio refreshes below max_backup_ratio - docs (EN+CN): rephrase backup_request_ms=-1 note to clarify the channel-level fallback only applies when set via ChannelOptions
Plain new follows brpc's project-wide OOM convention (abort rather than return NULL). The factory's NULL return already exclusively signals invalid parameters, not allocation failure — adding std::nothrow would conflate the two. Comment added to suppress future linter/AI suggestions.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…oyed before policy The unique_ptr comment was ambiguous: 'released when goes out of scope, as long as it outlives the channel' can be read as contradictory. Reword to make the ordering explicit: destroy channel first, then policy.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
test/brpc_channel_unittest.cpp:3150
- The comment describing the cold-start behavior is inaccurate: when
total==0andbackup==0,BackupRateLimiter::ShouldAllow()treats the ratio as0.0(true cold start), not1.0(the conservative path only applies whenbackup>0 && total==0). Please update the comment so it matches the implementation and doesn’t mislead future readers.
// With max_backup_ratio=1.0, backup is allowed until backup_count/total
// reaches 1.0. At cold start (total=0) the conservative path applies:
// ratio is treated as 1.0, so DoBackup() returns false only after the
// first backup is counted.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…tioAtBoundary ratio=1.0 conservative path only applies when backup>0 && total==0. True cold start (both zero) sets ratio=0.0 and allows freely.
Summary
Add ratio-based rate limiting for backup requests to prevent backup request storms under high QPS or downstream latency spikes.
Design
BackupRequestPolicyinterface unchanged (ABI stable)BackupRateLimiter: standalone statistics module tracking backup/total ratio within a sliding time window usingbvar::Window<bvar::Adder>+ relaxed atomics for lock-free best-effort throttlingRateLimitedBackupPolicy: internal implementation composingBackupRateLimiter, hidden in.cppRateLimitedBackupPolicyOptions: public config struct with sensible defaultsCreateRateLimitedBackupPolicy(opts): public factory function (validates inputs, returns NULL on error)unique_ptror set onChannelOptions.backup_request_policyConfiguration (
RateLimitedBackupPolicyOptions)backup_request_msChannelOptions.backup_request_ms(only when set viaChannelOptions). Must be >= -1.max_backup_ratiowindow_size_secondsupdate_interval_secondsEdge cases handled
max_backup_ratio > 1.0: rejected by factory (returns NULL with LOG(ERROR))Rate-limiting semantics
The denominator counts all completed RPC legs (original + backup), not user-visible RPCs. For
max_backup_ratio=R, the effective fraction of user RPCs receiving a backup is approximatelyR/(1-R)(e.g. ~11% for R=0.1). The cached ratio refreshes at most once perupdate_interval_seconds, so short bursts within that window may transiently exceed the limit.Closes #3228
Usage
Test plan