
Conversation

@MasterPtato
Contributor

No description provided.

@vercel

vercel bot commented Nov 14, 2025

The latest updates on your projects.

Project              Deployment  Preview/Comments  Updated (UTC)
rivetkit-serverless  Ready       Preview, Comment  Nov 20, 2025 8:53pm

3 Skipped Deployments

Project          Deployment  Preview  Updated (UTC)
rivet-cloud      Ignored              Nov 20, 2025 8:53pm
rivet-inspector  Ignored     Preview  Nov 20, 2025 8:53pm
rivet-site       Ignored     Preview  Nov 20, 2025 8:53pm

@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless November 14, 2025 01:40 Inactive
Contributor Author

MasterPtato commented Nov 14, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 14, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR updates the transaction retry backoff algorithm to more closely align with FoundationDB's behavior. Overall, the changes look good and represent a significant improvement.

Code Quality: ✅ Good

Strengths:

  • Excellent documentation with concrete examples showing the backoff progression
  • References to FoundationDB source code provide good context
  • Clean, readable implementation

Suggestions:

  1. Documentation clarity: The comment at line 45 says "Jitter is 0-10% of backoff ms" but this is slightly misleading:

    • At attempt 0: jitter is 0% (base=1, so jitter is always 0)
    • At attempt 1+: jitter is 0-10% as stated

    Consider updating the comment to: // Jitter is 0-10% of backoff ms (proportional to base)

Correctness: ✅ Verified

Algorithm Analysis:

Old algorithm:

  • Max attempt capped at 10
  • Max backoff: 2^10 * 10 = 10,240ms
  • Fixed jitter: 0-99ms (not proportional)

New algorithm:

  • Max attempt capped at 7 ✅
  • Max backoff: 2^7 * 10 = 1,280ms ✅ (aligns with FDB's 1s max mentioned in docs)
  • Proportional jitter: 0-10% ✅

The math checks out correctly:

  • base = 2^min(attempt, 7) produces: 1, 2, 4, 8, 16, 32, 64, 128
  • base_backoff_ms = base * 10 produces: 10, 20, 40, 80, 160, 320, 640, 1280ms
  • jitter_ms = rand() % base produces proportional jitter
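The progression above can be reproduced with a self-contained sketch. The function name follows the PR, but the RNG state is threaded as a parameter here, which the real function does not take: a tiny xorshift generator stands in for the `rand` crate so the example runs without external dependencies.

```rust
// Minimal xorshift64 PRNG standing in for rand::random::<u64>() so the
// sketch has no external dependencies (the PR itself uses `rand`).
fn xorshift64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

// Sketch of the new backoff described above: exponent capped at 7,
// base backoff of base * 10 ms, jitter of 0..base ms.
fn calculate_tx_retry_backoff(attempt: usize, rng: &mut u64) -> u64 {
    let base = 2u64.pow((attempt as u32).min(7)); // 1, 2, 4, ..., 128
    let base_backoff_ms = base * 10;              // 10, 20, 40, ..., 1280
    let jitter_ms = xorshift64(rng) % base;       // proportional jitter
    base_backoff_ms + jitter_ms
}

fn main() {
    let mut rng = 0x9E3779B97F4A7C15; // arbitrary nonzero seed
    for attempt in [0, 1, 2, 7, 100] {
        println!("attempt {attempt}: {} ms", calculate_tx_retry_backoff(attempt, &mut rng));
    }
}
```

Note that attempt 100 lands in the same 1280-1407ms range as attempt 7 because of the `.min(7)` cap.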

Potential Issues: ⚠️ Minor

Edge case - Attempt 0:
At attempt 0, base = 1, so jitter_ms = rand::random::<u64>() % 1 will always be 0. This is fine functionally, but means the first retry has no jitter. This could lead to thundering herd issues if many transactions fail simultaneously on their first attempt.

Recommendation: Consider starting jitter from attempt 0. For example:

// Option 1: Always add at least 1ms to the modulo
let jitter_ms = rand::random::<u64>() % (base + 1);

// Option 2: Add a minimum jitter amount
let jitter_ms = rand::random::<u64>() % base.max(10);

However, this may be intentional to match FDB's behavior (which appears to have no jitter based on the referenced code), so this is more of a consideration than a required fix.

Performance: ✅ Excellent

The new algorithm is actually better for performance:

  • Faster max backoff (1.28s vs 10.24s) means transactions retry more aggressively
  • Proportional jitter is more appropriate than fixed jitter for preventing thundering herds
  • Better alignment with FoundationDB's proven retry strategy

Security: ✅ No concerns

The use of rand::random() for jitter is appropriate for this use case (not cryptographic).

Test Coverage: ✅ Adequate

The function is used in:

  1. postgres/database.rs - production usage
  2. rocksdb/database.rs - production usage
  3. tests/rocksdb.rs - test usage (though this test has a duplicate implementation of the old algorithm at lines 140-146)

Note: The test file at tests/rocksdb.rs:140-146 has a duplicate implementation of the old algorithm. This should probably be removed since the test should use the shared implementation from utils::calculate_tx_retry_backoff.

Overall Assessment: ✅ Approve with minor suggestions

This is a solid improvement that brings the retry logic closer to FoundationDB's proven approach. The main suggestions are:

  1. Consider adding jitter for attempt 0 to prevent thundering herd (optional, depends on FDB alignment goals)
  2. Clean up the duplicate implementation in tests/rocksdb.rs
  3. Minor documentation clarification about jitter calculation

Great work on the detailed documentation and FDB source references!

@MasterPtato MasterPtato force-pushed the 11-13-fix_udb_update_backoff_algo branch from 9b19169 to a0dfaaf Compare November 14, 2025 02:40
@MasterPtato MasterPtato force-pushed the 11-12-chore_write_update_docs branch from 52bea63 to ce30dce Compare November 14, 2025 02:40
@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless November 14, 2025 02:40 Inactive
@claude

claude bot commented Nov 14, 2025

Code Review: fix(udb): update backoff algo

Summary

This PR updates the transaction retry backoff algorithm in UniversalDB to better align with FoundationDB's behavior. Overall, this is a good improvement that makes the backoff more aggressive and consistent with FDB's design.

✅ Positive Changes

  1. Better alignment with FDB behavior: The new algorithm caps at attempt 7 (max ~1.28s) instead of attempt 10 (max ~10.24s), which is closer to FDB's 1-second max backoff.

  2. Proportional jitter: The jitter is now proportional to the backoff (0-10% of base backoff) rather than a fixed 0-100ms. This is more sensible for larger backoffs.

  3. Excellent documentation: The inline documentation comparing the implementation to FDB with specific links is very helpful.

🐛 Potential Issues

Critical: Modulo bias in jitter calculation

Location: engine/packages/universaldb/src/utils/mod.rs:46

let jitter_ms = rand::random::<u64>() % base;

Issue: Using modulo with rand::random::<u64>() introduces modulo bias. When u64::MAX + 1 is not evenly divisible by base, some values will appear more frequently than others. While the bias is small for small values of base (max 128 in this case), it's still technically incorrect.

Recommendation: Use a proper bounded random range:

use rand::Rng;
let jitter_ms = rand::thread_rng().gen_range(0..base);

This requires adding rand to dependencies if not already present with the rng feature.

Minor: Inconsistency in test file

Location: engine/packages/universaldb/tests/rocksdb.rs:140

The test file has its own copy of calculate_tx_retry_backoff that still uses the old algorithm (max attempt 10). This should be updated to use the exported function from the main library or at least be kept in sync:

// Instead of duplicating, import from the library
use universaldb::calculate_tx_retry_backoff;

📊 Behavioral Analysis

Old algorithm:

  • Attempt 0: 10ms + 0-100ms jitter = 10-110ms
  • Attempt 5: 320ms + 0-100ms jitter = 320-420ms
  • Attempt 10 (max): 10,240ms + 0-100ms jitter = 10.24-10.34s

New algorithm:

  • Attempt 0: 10ms + 0ms jitter = 10ms (base=1, so rand % 1 is always 0)
  • Attempt 5: 320ms + 0-31ms jitter = 320-351ms
  • Attempt 7 (max): 1,280ms + 0-127ms jitter = 1.28-1.41s

The new algorithm is:

  • More aggressive (lower max backoff)
  • More consistent (proportional jitter)
  • Better aligned with FDB's 1-second max
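The old-vs-new base schedules above can be tabulated with a quick sketch (jitter is omitted so the output is deterministic):

```rust
fn main() {
    // Base backoff only, no jitter. Old cap: min(attempt, 10);
    // new cap: min(attempt, 7).
    for attempt in 0u32..=10 {
        let old_ms = 2u64.pow(attempt.min(10)) * 10;
        let new_ms = 2u64.pow(attempt.min(7)) * 10;
        println!("attempt {attempt:>2}: old {old_ms:>5} ms, new {new_ms:>4} ms");
    }
}
```

At attempt 10 this prints 10240 ms for the old cap versus 1280 ms for the new one.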

🔒 Security Considerations

No security concerns identified.

🧪 Test Coverage

No new tests were added. Consider adding a unit test to verify:

  1. The backoff values for known attempt numbers
  2. That jitter is within expected bounds
  3. That the max cap (attempt 7) works correctly

Example test:

#[test]
fn test_backoff_algorithm() {
    // A sample includes random jitter, so assert the value falls in
    // the expected range rather than comparing against an exact value
    assert!((10..=11).contains(&calculate_tx_retry_backoff(0)));
    assert!((20..=22).contains(&calculate_tx_retry_backoff(1)));

    // Test max cap
    let backoff_7 = calculate_tx_retry_backoff(7);
    let backoff_100 = calculate_tx_retry_backoff(100);
    // Both should have same base (capped at 7)
    assert!(backoff_7 >= 1280 && backoff_7 < 1408);
    assert!(backoff_100 >= 1280 && backoff_100 < 1408);
}

📝 Recommendations

Required:

  1. Fix the modulo bias in jitter calculation (use rand::thread_rng().gen_range(0..base))
  2. Update or remove the duplicate function in tests/rocksdb.rs:140

Optional:

  3. Add unit tests for the backoff algorithm
  4. Consider verifying the FDB source links are correct (I couldn't fetch them to verify)

Overall Assessment

Approved with suggested fixes

The core change is sound and improves the backoff algorithm. The modulo bias issue should be addressed before merging, and the test file inconsistency should be resolved.

@claude

claude bot commented Nov 14, 2025

Code Review

Summary

This PR updates the exponential backoff algorithm for transaction retries in UniversalDB to better align with FoundationDB's implementation. The changes improve the retry behavior by capping the max backoff and adjusting the jitter calculation.

Positive Changes ✅

  1. Better max backoff cap: Changed from min(10) (max 10.24s) to min(7) (max 1.28s), which is closer to FDB's 1s max
  2. Proportional jitter: The jitter is now 0-10% of the backoff (was fixed 0-100ms), making it scale appropriately with the backoff duration
  3. Excellent documentation: The comments clearly explain the algorithm with examples and reference FDB source code

Issues Found 🔍

1. CRITICAL: Outdated duplicate function in tests

Location: engine/packages/universaldb/tests/rocksdb.rs:140-146

There's a duplicate calculate_tx_retry_backoff function in the test file that still uses the OLD algorithm:

pub fn calculate_tx_retry_backoff(attempt: usize) -> u64 {
    let base_backoff_ms = 2_u64.pow((attempt as u32).min(10)) * 10;
    let jitter_ms = rand::random::<u64>() % 100;
    base_backoff_ms + jitter_ms
}

Action required: This duplicate should be removed and the test should import from universaldb::calculate_tx_retry_backoff instead. Having two different implementations will cause confusion and inconsistent behavior.

2. Potential arithmetic issue in jitter calculation

Location: engine/packages/universaldb/src/utils/mod.rs:46

The jitter calculation rand::random::<u64>() % base has an issue:

  • For attempt 0: base = 1, so jitter is random % 1 = 0 (always 0, not 0-1ms as documented)
  • For attempt 1: base = 2, so jitter is random % 2 (0-1ms, not 0-2ms as documented)
  • For attempt 2: base = 4, so jitter is random % 4 (0-3ms, not 0-4ms as documented)

The comment says "0-10% of backoff ms" but the actual range is 0 to (base-1)ms, which is slightly less than 10% of base_backoff_ms.

Suggested fix:

// Jitter is 0-10% of backoff ms
let jitter_ms = rand::random::<u64>() % (base_backoff_ms / 10 + 1);

This would give:

  • Attempt 0: 0-1ms jitter (10ms backoff)
  • Attempt 1: 0-2ms jitter (20ms backoff)
  • Attempt 2: 0-4ms jitter (40ms backoff)
  • Attempt 7: 0-128ms jitter (1280ms backoff)

Which matches the documentation exactly.
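Those ranges can be checked with a short sketch; the `base_backoff_ms / 10 + 1` modulus here is this review's suggested fix, not the PR's merged code:

```rust
fn main() {
    for attempt in 0u32..=7 {
        let base_backoff_ms = 2u64.pow(attempt.min(7)) * 10;
        // With the suggested `% (base_backoff_ms / 10 + 1)` modulus,
        // jitter ranges over 0..=base_backoff_ms / 10 inclusive.
        let max_jitter = base_backoff_ms / 10;
        println!("attempt {attempt}: {base_backoff_ms} ms backoff, 0-{max_jitter} ms jitter");
    }
}
```

For attempt 7 this prints a 1280 ms backoff with 0-128 ms jitter, exactly the documented range.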

3. Missing unit tests

There don't appear to be any unit tests specifically for calculate_tx_retry_backoff. Given that this is a critical function for retry behavior, it should have tests verifying:

  • The exponential growth curve
  • The max cap at attempt 7+
  • The jitter is within expected bounds
  • The values roughly match FDB's behavior

Suggested test:

#[test]
fn test_tx_retry_backoff() {
    // Test base values without jitter (by checking range)
    for attempt in 0..10 {
        let backoff = calculate_tx_retry_backoff(attempt);
        let expected_base = 2_u64.pow((attempt as u32).min(7)) * 10;
        let max_jitter = 2_u64.pow((attempt as u32).min(7));
        
        assert!(backoff >= expected_base);
        assert!(backoff < expected_base + max_jitter);
    }
    
    // Verify max cap
    assert!(calculate_tx_retry_backoff(7) < 1500); // Should be ~1280ms + jitter
    assert!(calculate_tx_retry_backoff(100) < 1500); // Should still be capped
}

Minor Suggestions 💡

  1. Consider using rand::thread_rng().gen_range() instead of rand::random::<u64>() % n to avoid modulo bias (though the bias is negligible here, since n is at most 128 and tiny relative to u64's range)

  2. Document the dependency on FDB knobs: The comment references specific FDB source files. Consider noting that this should be kept in sync with FDB updates.

Security Considerations 🔒

No security concerns identified. The backoff algorithm is appropriate for preventing retry storms.

Performance Considerations ⚡

The changes improve performance by reducing max backoff from ~10s to ~1.28s, allowing faster recovery from transient conflicts.

Verdict

Good improvement to align with FDB behavior, but needs the test duplicate removed and the jitter calculation fixed to match the documentation. Consider adding unit tests for this critical function.


🤖 Generated with Claude Code

@claude

claude bot commented Nov 17, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR updates the exponential backoff algorithm for transaction retries in UniversalDB to more closely align with FoundationDB's implementation. The changes improve the retry behavior with better jitter and a more appropriate max backoff.

Positive Changes ✅

  1. Better max backoff cap: Changed from 2^10 = 1024 (attempt 10) to 2^7 = 128 (attempt 7), resulting in a max base backoff of 1280ms vs 10240ms. This is more reasonable and closer to FDB's 1s max.

  2. Proportional jitter: The new jitter calculation (rand::random::<u64>() % base) provides 0-10% jitter relative to the backoff time, which is more appropriate than the fixed 0-100ms range in the old implementation.

  3. Excellent documentation: The added docstring clearly explains the behavior with examples and references to the FDB source code.

Issues & Concerns 🚨

1. Critical Bug: Jitter Calculation is Incorrect (engine/packages/universaldb/src/utils/mod.rs:46)

let jitter_ms = rand::random::<u64>() % base;

The comment says "Jitter is 0-10% of backoff ms", but % base yields 0 to base-1 ms, which falls one millisecond short of the documented upper bound at every attempt.

For attempt 0:

  • base = 2^0 = 1
  • jitter_ms = rand() % 1 → 0ms only
  • Result: 10ms + 0ms = 10ms

For attempt 1:

  • base = 2^1 = 2
  • jitter_ms = rand() % 2 → 0-1ms
  • Result: 20ms + 0-1ms = 20-21ms ❌ (should be 20-22ms per docstring)

For attempt 7:

  • base = 2^7 = 128
  • jitter_ms = rand() % 128 → 0-127ms
  • Result: 1280ms + 0-127ms = 1280-1407ms ❌ (should be 1280-1408ms per docstring)

Fix: To match the docstring (0-10% of backoff), it should be:

let jitter_ms = rand::random::<u64>() % (base + 1);

Or update the docstring to reflect actual behavior (0-100% of base, which is 0-10% of base_backoff_ms).

2. Stale Test Function (engine/packages/universaldb/tests/rocksdb.rs:140-146)

There's a duplicate implementation of calculate_tx_retry_backoff in the test file using the OLD algorithm. This should be removed and the function should be imported from universaldb::utils::calculate_tx_retry_backoff instead to ensure tests use the actual implementation.

3. Random Number Generation Concerns (engine/packages/universaldb/src/utils/mod.rs:46)

  • rand::random::<u64>() generates a full 64-bit random number, then takes modulo. For small values of base, this is inefficient.
  • Consider using rand::thread_rng().gen_range(0..base) for cleaner range generation, though this may require adding rand as a dependency with the std feature if not already available.

Minor Observations

  1. Performance: The new algorithm with lower max backoff (1280ms vs 10240ms) will result in faster retry cycles in high-contention scenarios. This is generally good, but ensure this aligns with your system's behavior under load.

  2. Consistency: Good that the docstring includes both your implementation and FDB's for comparison. This makes the differences clear.

Recommendations

Must Fix:

  1. Fix the jitter calculation to match the docstring or update the docstring
  2. Remove the duplicate test implementation and import from the actual module

Nice to Have:

  1. Consider using gen_range for cleaner random number generation
  2. Add a unit test for calculate_tx_retry_backoff to verify the backoff values match expectations

Test Coverage

The function is tested indirectly through the rocksdb tests, but would benefit from a dedicated unit test to verify the backoff progression matches the documented behavior.


Overall, this is a good improvement to the backoff algorithm, but the jitter calculation bug needs to be fixed before merging.

@claude

claude bot commented Nov 17, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR updates the transaction retry backoff algorithm in UniversalDB to better align with FoundationDB's behavior. The changes improve the retry strategy by capping attempts at 7 (resulting in max ~1.4s) and using proportional jitter.

Code Quality & Best Practices

Positive

  • Excellent documentation with clear examples showing backoff progression
  • Good references to FDB source code for context
  • Clean implementation with clear variable names

Issues Found

1. Critical: Inconsistent Implementation in Test File

engine/packages/universaldb/tests/rocksdb.rs:140-146 contains a duplicate copy of this function with the old implementation:

pub fn calculate_tx_retry_backoff(attempt: usize) -> u64 {
    let base_backoff_ms = 2_u64.pow((attempt as u32).min(10)) * 10;
    let jitter_ms = rand::random::<u64>() % 100;
    base_backoff_ms + jitter_ms
}

Impact: Tests are using the old backoff logic, making them inconsistent with production code.

Fix: Remove the duplicate function from the test file and import it from the main module instead:

use universaldb::calculate_tx_retry_backoff;

2. Bug: Incorrect Jitter Calculation

Line 46: let jitter_ms = rand::random::<u64>() % base;

Issue: The comment says "Jitter is 0-10% of backoff ms" but the code implements 0-base where base is 2^attempt.

Examples:

  • Attempt 0: base=1, so jitter is always 0ms (rand % 1 == 0)
  • Attempt 1: base=2, so jitter is 0-1ms
  • Attempt 7: base=128, so jitter is 0-127ms

The code is in fact proportional as intended: since base_backoff_ms = base * 10, jitter tops out just under 10% of the backoff (for attempt 7: 0-127ms on 1280ms). The comment is accurate in spirit but one millisecond short of each documented range at the top end.

Clarity Fix: Consider updating the comment to be more explicit:

// Jitter is 0-10% of backoff ms (ranges from 0 to base milliseconds)
let jitter_ms = rand::random::<u64>() % base;

Performance Considerations

Improved Performance

  • Max backoff reduced from 10,240ms (old: 2^10 * 10) to 1,280ms (new: 2^7 * 10)
  • Jitter reduced from fixed 0-100ms to proportional 0-10%, which is better for early retries
  • These changes align better with FDB's 1s max backoff mentioned in the docs

Potential Concern: The reduced max backoff might lead to more aggressive retries under high contention. However, this aligns with FDB's design, so it should be acceptable.

Security Concerns

No security issues identified. The use of rand::random() for jitter is appropriate for this use case (not cryptographic).

Test Coverage

Missing

  • No unit tests for the backoff calculation function
  • No tests verifying the backoff progression matches documented values
  • No tests for edge cases (attempt 0, attempt greater than 7, etc.)

Recommendation: Add tests like:

#[test]
fn test_backoff_progression() {
    // Test that backoff values are within expected ranges
    for attempt in 0..10 {
        let backoff = calculate_tx_retry_backoff(attempt);
        let expected_base = 2_u64.pow((attempt as u32).min(7)) * 10;
        let expected_max_jitter = 2_u64.pow((attempt as u32).min(7));
        
        assert!(backoff >= expected_base);
        assert!(backoff < expected_base + expected_max_jitter);
    }
}

#[test]
fn test_backoff_max_cap() {
    // A single sample includes random jitter, so the base cannot be
    // recovered exactly (dividing by 10 and multiplying back does not
    // strip jitter, which can reach 127ms). Assert on the full
    // base + jitter range instead.
    for attempt in [7usize, 8, 100] {
        let backoff = calculate_tx_retry_backoff(attempt);
        // Base is capped at 1280ms; jitter adds at most 127ms
        assert!(backoff >= 1280 && backoff < 1408);
    }
}

Action Items

  1. Remove duplicate function from tests/rocksdb.rs and import from module
  2. Add unit tests for the backoff calculation
  3. Consider clarifying the jitter comment (optional)

Verdict

The core algorithm change is correct and improves alignment with FDB, but the duplicate function in tests needs to be addressed before merging.

@MasterPtato MasterPtato marked this pull request as ready for review November 18, 2025 23:43
@MasterPtato MasterPtato force-pushed the 11-12-chore_write_update_docs branch from 99d08ca to 0bb8c75 Compare November 20, 2025 22:38
@claude

claude bot commented Nov 20, 2025

Pull Request Review

This PR updates the exponential backoff algorithm for transaction retries in UniversalDB to better align with FoundationDB's implementation. Overall the changes look good, but I have some feedback:

✅ Positives

  1. Better alignment with FoundationDB: The updated algorithm now caps at 7 attempts (1280ms + jitter) instead of 10 (10240ms + jitter), which is closer to FDB's 1-second max backoff
  2. Improved jitter calculation: The new jitter calculation (0-10% instead of fixed 0-100ms) scales proportionally with the backoff, which is a more standard approach
  3. Good documentation: The comment clearly explains the algorithm with examples and references to FDB's implementation
  4. Proper code organization: Moving the function from tests to the main utils module makes it properly accessible

🔍 Issues & Concerns

1. Potential Bug in Jitter Calculation (engine/packages/universaldb/src/utils/mod.rs:46)

let jitter_ms = rand::random::<u64>() % base;

The comment says "Jitter is 0-10% of backoff ms" but the code calculates jitter_ms = rand::random::<u64>() % base, where base = 2^attempt. This means:

  • Attempt 0: base=1, jitter=0ms (rand % 1 is always 0) ✅
  • Attempt 1: base=2, jitter=0-1ms (just under 10% of 20ms) ✅
  • Attempt 2: base=4, jitter=0-3ms (just under 10% of 40ms) ✅

Actually, this is correct! The jitter is 10% because base_backoff_ms = base * 10, so base is indeed 10% of base_backoff_ms. The comment accurately describes the behavior.

2. Breaking Change in Backoff Times (engine/packages/universaldb/src/utils/mod.rs:41-48)

The old algorithm had:

  • Max attempt: 10 → 10240ms + 0-100ms jitter
  • Backoff grew much more aggressively

The new algorithm has:

  • Max attempt: 7 → 1280ms + 0-128ms jitter
  • Capped much earlier

Impact: This is a behavioral change that will affect retry behavior:

  • Positive: Transactions will fail faster if they're going to fail (2.5 seconds total backoff vs 20+ seconds)
  • Potential issue: Higher contention scenarios might see more transaction failures before hitting max retries
  • Question: What is the max retry count? If it's still high enough, this should be fine
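The total-backoff comparison above can be sanity-checked by summing base backoffs (jitter omitted) over the first eleven attempts:

```rust
fn main() {
    // Cumulative base backoff through attempt 10, old cap vs new cap
    let old_total: u64 = (0u32..=10).map(|a| 2u64.pow(a.min(10)) * 10).sum();
    let new_total: u64 = (0u32..=10).map(|a| 2u64.pow(a.min(7)) * 10).sum();
    println!("old total: {old_total} ms, new total: {new_total} ms");
    // Cumulative wait up to the point the new algorithm hits its cap:
    let to_cap: u64 = (0u32..=7).map(|a| 2u64.pow(a) * 10).sum();
    println!("total through attempt 7: {to_cap} ms");
}
```

This yields 20470 ms for the old schedule versus 6390 ms for the new one through attempt 10, and 2550 ms through attempt 7, matching the "2.5 seconds vs 20+ seconds" figures above.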

3. Import Style Consistency (engine/packages/api-peer/src/actors/kv_get.rs:2-3)

The import reordering from:

use base64::prelude::BASE64_STANDARD;
use base64::Engine;

to:

use base64::Engine;
use base64::prelude::BASE64_STANDARD;

This appears to be an auto-formatter change. According to CLAUDE.md, formatting is handled by pre-commit hooks, so this is fine.

4. TypeScript Changes (engine/sdks/typescript/runner-protocol/src/index.ts)

Good:

  • Moving assert to a proper Node.js import is the right approach
  • The TODO comment for renaming createTs to update_ts is helpful

Minor: The formatting change in kv_get.rs (engine/packages/api-public/src/actors/kv_get.rs:65-71) is just cosmetic but follows proper formatting conventions.

📋 Recommendations

  1. Consider documenting the max retry count: It would be helpful to document what the maximum number of retries is and what the total time to failure would be with the new backoff algorithm
  2. Performance testing: Given this is a core retry mechanism, it would be good to verify the new backoff times work well under high contention (though the existing tests at rocksdb.rs should help validate this)
  3. Test coverage: The test at rocksdb.rs:68 uses the function correctly, which is good

🎯 Verdict

Approve with minor suggestions. The changes improve alignment with FoundationDB's retry strategy and the code quality is good. The jitter calculation is mathematically correct despite being initially confusing. The main consideration is ensuring the max retry count is appropriate for the faster failure times.

The refactoring from test-only code to a proper public utility is a good architectural improvement.

@claude

claude bot commented Nov 20, 2025

Code Review

Summary

This PR improves the transaction retry backoff algorithm in UniversalDB to better match FoundationDB's behavior. The changes are well-documented and improve code organization.

✅ Positives

  1. Improved backoff algorithm: The new implementation better aligns with FoundationDB's exponential backoff:

    • Max attempt cap reduced from 10 to 7 (1280ms vs 10240ms max)
    • Jitter calculation improved: now 0-10% of backoff (0-base ms) instead of fixed 0-100ms
    • Good documentation with examples showing the progression
  2. Better code organization: Moving calculate_tx_retry_backoff from test file to utils/mod.rs and making it public is the right approach for reusability

  3. Import cleanup: Organizing imports alphabetically (base64::Engine before base64::prelude::*) follows Rust conventions

  4. Formatting improvements: Multi-line formatting of the URL string in api-public/src/actors/kv_get.rs improves readability

  5. TypeScript improvements: Moving assert to a proper import from node:assert is cleaner than a custom implementation

🔍 Issues & Suggestions

🔴 Critical: Potential modulo-by-zero bug

let jitter_ms = rand::random::<u64>() % base;

When attempt = 0, base = 2^0 = 1, so this becomes rand::random::<u64>() % 1, which always equals 0. This works but is inefficient. More critically, if the calculation ever resulted in base = 0, this would panic with division by zero.

Recommendation: Add explicit handling or use a safer approach:

let jitter_ms = if base > 0 {
    rand::random::<u64>() % base
} else {
    0
};

Or document that base is guaranteed to be >= 1.
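An alternative hedge (a sketch, not the PR's code; the helper name is illustrative) uses `checked_rem`, which returns `None` for a zero divisor, even though `2u64.pow(n)` is always at least 1 here:

```rust
// checked_rem returns None when the divisor is 0, so this never panics,
// unlike the plain `%` operator, which panics on division by zero.
fn jitter_ms(random: u64, base: u64) -> u64 {
    random.checked_rem(base).unwrap_or(0)
}

fn main() {
    println!("{}", jitter_ms(12345, 128)); // normal case
    println!("{}", jitter_ms(12345, 0));   // degenerate case: yields 0
}
```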

⚠️ Medium: Jitter calculation doesn't match documentation

The comment states:

Jitter is 0-10% of backoff ms

But the code does: rand::random::<u64>() % base

  • At attempt 0: base=1, so jitter is always 0ms on a 10ms backoff
  • At attempt 7: base=128, so jitter is 0-127ms on a 1280ms backoff = just under 0-10%

The implementation is actually correct! However, the examples in the doc comment could be clearer:

///   0 -> 10ms + 0-1ms jitter (0-10%)
///   1 -> 20ms + 0-2ms jitter (0-10%)
///   2 -> 40ms + 0-4ms jitter (0-10%)

ℹ️ Minor: Missing dependency check

The code uses rand::random but I don't see where rand is imported. Please verify the rand crate is in the workspace dependencies and properly imported in this file.

ℹ️ Minor: TypeScript comment location

The TODO comment in runner-protocol/src/index.ts:

/**
 * TODO: Rename to update_ts
 */
readonly createTs: i64

This is good documentation, but consider tracking this TODO in an issue rather than just a code comment for better visibility.

📊 Additional Validation Needed

  1. Tests: The existing test in rocksdb.rs uses this function, but there are no unit tests validating the backoff calculation itself. Consider adding tests like:

    #[test]
    fn test_backoff_progression() {
        let backoff = calculate_tx_retry_backoff(0);
        assert!(backoff >= 10);
        assert!(backoff < 12);
        // ... test other attempts
    }
  2. Performance: The reduced max backoff (1280ms vs 10240ms) might affect retry behavior under high contention. Ensure this has been validated in production-like scenarios.

Conclusion

The PR is solid overall with clear improvements to the backoff algorithm. The main concern is the potential modulo-by-zero edge case that should be addressed before merging. The rest are documentation and testing improvements.

Recommendation: Request changes to address the modulo-by-zero safety concern, then approve.

@claude

claude bot commented Nov 20, 2025

Code Review

I've reviewed the changes in this PR. Overall, this is a solid improvement to the backoff algorithm that better aligns with FoundationDB's behavior. Here's my detailed feedback:

✅ Strengths

  1. Improved Backoff Algorithm: The new implementation is much better aligned with FoundationDB's retry logic:

    • Reduced max exponent from 10 to 7 (capping at 1280ms vs 10240ms)
    • Jitter is now proportional to the backoff (0-10%) instead of fixed 0-100ms
    • Excellent documentation with comparative examples
  2. Code Organization: Moving calculate_tx_retry_backoff from the test file to utils/mod.rs is the correct architectural decision - it's now properly exported and reusable.

  3. Documentation: The inline comments comparing your implementation with FDB's are very helpful for future maintainers.

🔍 Issues & Suggestions

1. Jitter Calculation is Correct

The jitter calculation on line 46 (rand::random::<u64>() % base) correctly implements 0-10% jitter since base is 1/10th of base_backoff_ms. The comment is accurate!

2. Import Organization (Minor Style Issue)

In api-peer/src/actors/kv_get.rs:1-2:

use base64::Engine;
use base64::prelude::BASE64_STANDARD;

These could be combined as:

use base64::{Engine, prelude::BASE64_STANDARD};

However, this might be a clippy/fmt preference, so it's acceptable as-is.

3. TypeScript Changes (Unrelated?)

The changes in engine/sdks/typescript/runner-protocol/src/index.ts appear unrelated to the backoff algorithm fix:

  • Moving assert import to top (good cleanup)
  • Adding TODO comment about renaming createTs to update_ts

These are fine changes but seem tangential to the PR's main purpose. Consider whether these should be in a separate commit or PR for cleaner history.

4. Formatting Change in api-public (Minor)

The reformatting of the format! call in api-public/src/actors/kv_get.rs:68-72 is purely stylistic. This is fine but also seems unrelated to the backoff algorithm changes.

📊 Performance Analysis

The new backoff schedule is more aggressive in early attempts and caps out earlier:

Attempt  Old backoff      New backoff       Improvement
0        10-110ms         10ms (no jitter)  ~10x faster
1        20-120ms         20-21ms           ~5x faster
2        40-140ms         40-43ms           ~3x faster
7        1280-1380ms      1280-1407ms       Similar
10       10240-10340ms    1280-1407ms       ~7.5x faster

This should reduce retry latency for transient conflicts while still providing adequate backoff for persistent contention.

🧪 Test Coverage

The existing tests in rocksdb.rs properly exercise the backoff function. Consider adding a unit test specifically for the backoff calculation to verify:

  • The exponential progression
  • The jitter range
  • The max cap at attempt 7+

Example:

#[test]
fn test_backoff_calculation() {
    // Attempt 0: 10-11ms
    for _ in 0..100 {
        let backoff = calculate_tx_retry_backoff(0);
        assert!(backoff >= 10 && backoff <= 11);
    }
    
    // Attempt 7: 1280-1408ms
    for _ in 0..100 {
        let backoff = calculate_tx_retry_backoff(7);
        assert!(backoff >= 1280 && backoff <= 1408);
    }
    
    // Attempt 100 should cap same as attempt 7
    for _ in 0..100 {
        let backoff_100 = calculate_tx_retry_backoff(100);
        let backoff_7 = calculate_tx_retry_backoff(7);
        // Both should be in same range due to .min(7) cap
        assert!(backoff_100 >= 1280 && backoff_100 <= 1408);
        assert!(backoff_7 >= 1280 && backoff_7 <= 1408);
    }
}

🔒 Security Considerations

No security concerns identified. The use of rand::random for jitter is appropriate for this use case.

✅ Verdict

Approve with minor suggestions. The core change (backoff algorithm) is excellent and significantly improves retry performance. The ancillary changes (import ordering, formatting, TypeScript) are fine but could be separated for cleaner commit history in the future.

The only actionable item would be adding a unit test for the backoff calculation, but this is optional.

@claude

claude bot commented Nov 20, 2025

PR Review: fix(udb): update backoff algo

Overview

This PR updates the exponential backoff algorithm used in UniversalDB transaction retries and makes some minor code cleanup improvements. The changes look good overall with some minor considerations.

Code Quality

Positive Changes:

  1. Improved backoff algorithm - The new algorithm is better aligned with FoundationDB retry strategy:

    • Reduced max backoff from ~10s (attempt 10) to ~1.4s (attempt 7), which is more reasonable
    • Jitter calculation is now proportional to the base backoff (0-10%) instead of a fixed 0-100ms
    • Better documented with clear examples comparing to FDB implementation
  2. Better code organization - Moving calculate_tx_retry_backoff from test file to the main utils module makes it properly reusable

    • Now properly exported from universaldb crate
    • Test file correctly imports it instead of duplicating the function
  3. Code cleanup:

    • Fixed import ordering in api-peer/src/actors/kv_get.rs (base64 imports)
    • Improved formatting in api-public/src/actors/kv_get.rs
    • Moved assert to use Node.js built-in instead of custom implementation in TypeScript SDK

Potential Issues and Considerations

1. Breaking Change in Retry Behavior

The new algorithm has significantly shorter max backoff times:

  • Old: Max ~10.24s at attempt 10 (2^10 * 10ms = 10,240ms + jitter)
  • New: Max ~1.28s at attempt 7 (2^7 * 10ms = 1,280ms + jitter)

Impact: This could affect retry behavior under high contention. Monitor retry metrics after deployment to ensure this does not increase retry exhaustion rates.

2. Jitter Calculation Edge Case

For attempt 0, base = 1, so jitter_ms = rand() % 1 = 0 (always zero). The comment says "0-1ms jitter" but the code produces 0ms for attempt 0. Either update the comment to "0 -> 10ms (no jitter)" or adjust the calculation.
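This edge case can be checked directly, assuming the jitter formula `rand % base` with `base = 2^min(attempt, 7)` described above:

```rust
// Any u64 modulo 1 is 0, so under this formula the first retry (attempt 0,
// base = 2^0 = 1) never gets jitter.
fn jitter_ms(attempt: usize, random: u64) -> u64 {
    let base = 2_u64.pow((attempt as u32).min(7));
    random % base
}

fn main() {
    for r in [0u64, 1, 42, u64::MAX] {
        assert_eq!(jitter_ms(0, r), 0); // always zero on attempt 0
    }
    assert!(jitter_ms(1, 3) <= 1); // attempt 1: jitter is 0 or 1 ms
}
```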

3. Documentation

The comment comparing to FDB is excellent. However, FDB comparison shows "X -> max 1s" which suggests FDB caps at 1 second. Your implementation caps at ~1.4s (1280ms + up to 128ms jitter). Is this intentional?

Security Concerns

None identified. The changes are internal retry logic improvements.

Performance Considerations

Generally positive:

  • Faster retries should reduce tail latency for transient conflicts
  • Lower max backoff reduces worst-case retry time
  • Proportional jitter is more appropriate than fixed jitter

Monitor: Watch for increased retry exhaustion if contention is high.
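For context on where retry exhaustion would surface, here is a hedged sketch of a generic retry loop that consumes a backoff like this. The names are hypothetical; the actual UniversalDB retry loop differs, and jitter is omitted for brevity.

```rust
use std::thread;
use std::time::Duration;

// Base backoff only (no jitter), matching the capped schedule above.
fn backoff_ms(attempt: usize) -> u64 {
    2_u64.pow((attempt as u32).min(7)) * 10
}

// Retries `tx` until it succeeds or `max_attempts` is exhausted, sleeping
// for the backoff between attempts. Exhaustion returns the last error.
fn run_with_retries<T, E>(
    max_attempts: usize,
    mut tx: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match tx() {
            Ok(v) => return Ok(v),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(Duration::from_millis(backoff_ms(attempt)));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    let result: Result<u32, &str> = run_with_retries(5, || {
        calls += 1;
        if calls < 3 { Err("conflict") } else { Ok(42) }
    });
    assert_eq!(result, Ok(42));
    assert_eq!(calls, 3); // two conflicts, then success
}
```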

Test Coverage

The existing test in rocksdb.rs already uses this function and exercises retry logic with 64 concurrent tasks creating conflicts. Consider adding a unit test specifically for calculate_tx_retry_backoff to verify exponential growth pattern, jitter bounds, and max attempt capping at 7.

Recommendation

Approve with monitoring. The changes are well-implemented and improve the codebase. The main consideration is monitoring retry behavior post-deployment to ensure the shorter backoff times do not negatively impact reliability under high contention.

Action items for follow-up:

  1. Monitor retry exhaustion metrics after deployment
  2. Consider adding unit tests for the backoff function
  3. Clarify if the 1.4s max (vs FDB 1s) is intentional

@claude

claude bot commented Nov 20, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR improves the transaction retry backoff algorithm in UniversalDB to more closely align with FoundationDB's implementation. Overall, the changes are good and address a real issue with the previous implementation.


Code Quality & Best Practices ✅

Positive:

  • Good code organization: moved calculate_tx_retry_backoff from test file to proper utility module
  • Excellent documentation with comparison to FDB implementation
  • Clean import organization improvements (e.g., api-peer/src/actors/kv_get.rs)
  • Proper use of workspace conventions

Minor formatting improvement in api-public/src/actors/kv_get.rs:68-71:
The string formatting change is purely cosmetic but follows good multi-line formatting practices.


Potential Issues 🔍

1. Critical: Incorrect Jitter Calculation 🚨

Location: engine/packages/universaldb/src/utils/mod.rs:46

The jitter calculation has a logical error:

let jitter_ms = rand::random::<u64>() % base;

Problem: The documentation states "Jitter is 0-10% of backoff ms" but the code implements jitter as 0 to (base - 1) where base = 2^attempt.

Expected behavior (0-10% jitter):

  • Attempt 0: base=1, backoff=10ms → jitter should be 0-1ms ✅ (matches, though coincidentally)
  • Attempt 1: base=2, backoff=20ms → jitter should be 0-2ms ✅ (matches)
  • Attempt 7: base=128, backoff=1280ms → jitter should be 0-128ms ✅ (matches)

Actually, the comment is correct! The jitter IS 0 to base (which equals 10% of backoff). The inline comment claiming "0-10% of backoff ms" is accurate. My apologies - this is correct.

However, there's still a subtle issue: using modulo with rand::random::<u64>() introduces modulo bias for non-power-of-2 values of base. While base is always a power of 2 in this case (2^n), this is still technically suboptimal.

Recommendation:

let jitter_ms = rand::thread_rng().gen_range(0..=base);

This uses the proper uniform distribution without modulo bias. You'll need to add use rand::Rng; at the top.

2. TypeScript Import Change

Location: engine/sdks/typescript/runner-protocol/src/index.ts:1

Changed from local assert function definition to importing from node:assert. This is fine, but:

  • Question: Was the custom assert function being used elsewhere in the file?
  • The removed function (lines 1926-1929 in old version) appears unused, so this cleanup is good
  • Using Node's built-in is better practice ✅

Performance Considerations ⚡

Improved:

  • Max backoff reduced from ~10 seconds to ~1.4 seconds (old: 2^10 * 10ms = 10,240ms, new: 2^7 * 10ms = 1,280ms + jitter)
  • This aligns better with FDB's 1-second max and should reduce unnecessary waiting under contention
  • Jitter reduced from 0-100ms to proportional (0-10% of backoff), which is more appropriate for lower attempts

Concern:

  • The change from min(10) to min(7) means the algorithm will cap out faster
  • Under sustained high contention, this could lead to more aggressive retry patterns
  • Recommendation: Monitor transaction retry rates in production after deployment

Security Concerns 🔒

No security issues identified. The changes are purely algorithmic improvements to retry logic.


Test Coverage 📊

Positive:

  • The test file engine/packages/universaldb/tests/rocksdb.rs properly imports and uses the function
  • Test scenarios with 64 concurrent workers creating conflicts are good stress tests

Missing:

  • No unit tests for calculate_tx_retry_backoff itself
  • Recommendation: Add a test to verify the backoff values match expectations:
#[test]
fn test_calculate_tx_retry_backoff() {
    // Attempt 0: 10ms + 0-1ms jitter = 10-11ms
    for _ in 0..100 {
        let backoff = calculate_tx_retry_backoff(0);
        assert!(backoff >= 10 && backoff <= 11);
    }
    
    // Attempt 7: 1280ms + 0-128ms jitter = 1280-1408ms  
    for _ in 0..100 {
        let backoff = calculate_tx_retry_backoff(7);
        assert!(backoff >= 1280 && backoff <= 1408);
    }
    
    // Beyond max should cap at 7 (jitter is random, so compare ranges, not equality)
    let backoff = calculate_tx_retry_backoff(100);
    assert!(backoff >= 1280 && backoff <= 1408);
}

Additional Notes

  1. Documentation comment accuracy: The FDB links in the doc comment are helpful and accurate ✅

  2. TODO comment in TypeScript: Good catch noting that createTs should be renamed to update_ts - consider creating a follow-up issue for this

  3. Import consolidation: Nice cleanup moving the function from test-only to public API via lib.rs


Recommendation

Approve with minor suggestions:

  1. Consider using rand::thread_rng().gen_range() instead of modulo for better distribution
  2. Add unit tests for the backoff function
  3. Monitor retry metrics after deployment due to reduced max backoff

The core algorithm improvement is solid and the code quality is good. The main concern is the lack of unit tests for a critical retry mechanism.

@claude

claude bot commented Nov 20, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR updates the transaction retry backoff algorithm in UniversalDB to more closely match FoundationDB's behavior. The changes reduce maximum backoff time and adjust jitter calculation for better retry performance.

✅ Positive Aspects

  1. Better alignment with FoundationDB: The new algorithm caps retries at attempt 7 (max 1280ms + jitter) instead of attempt 10 (max 10240ms + jitter), matching FDB's documented behavior more closely.

  2. Improved jitter calculation: Changed from fixed 0-100ms jitter to proportional 0-10% jitter (0 to base ms), which scales appropriately with the backoff duration.

  3. Code organization: Good refactoring by moving calculate_tx_retry_backoff from test file to the public utils module where it belongs, making it reusable.

  4. Documentation: Excellent inline documentation comparing the new algorithm to FDB's implementation with links to source code.

  5. Minor improvements: Import ordering fixes and formatting improvements in unrelated files.

🔍 Issues & Concerns

Critical Issue: Potential Overflow Risk

Location: engine/packages/universaldb/src/utils/mod.rs:46

let jitter_ms = rand::random::<u64>() % base;

Problem: Using modulo on a potentially small base value when rand::random::<u64>() can return very large values may not provide uniform distribution. More importantly, when attempt = 0, base = 1, so jitter is always 0, defeating the purpose of jitter on the first retry.

Recommendation: Consider using rand::thread_rng().gen_range(0..base) for better distribution, or ensure the jitter calculation matches the comment (0-10% means base/10, not base):

use rand::Rng;
let jitter_ms = rand::thread_rng().gen_range(0..=base); // If you want 0-base range
// OR based on your comment "0-10% of backoff ms":
let jitter_ms = rand::thread_rng().gen_range(0..=(base_backoff_ms / 10)); // For true 0-10%

Documentation Mismatch

Location: engine/packages/universaldb/src/utils/mod.rs:30-34

The comment says:

///   0 -> 10ms + 0-1ms jitter
///   1 -> 20ms + 0-2ms jitter

But the code implements:

  • attempt 0: base=1, jitter 0-1ms
  • attempt 1: base=2, jitter 0-2ms
  • attempt 2: base=4, jitter 0-4ms

However, the comment states "Jitter is 0-10% of backoff ms" (line 45), but the actual implementation does jitter 0-base which is:

  • attempt 0: 0-1ms jitter out of 10ms base = 0-10% ✓
  • attempt 1: 0-2ms jitter out of 20ms base = 0-10% ✓
  • attempt 2: 0-4ms jitter out of 40ms base = 0-10% ✓

The code is correct! The comment is accurate. Good work here.

Minor: Missing Dependency Check

The code uses rand::random::<u64>() but I should verify that rand is a workspace dependency. Let me check if there are any import statements needed at the top of the file.

Location: engine/packages/universaldb/src/utils/mod.rs:46

No explicit use rand::... statement is visible in the file. This might work if rand is used elsewhere, but it's better to be explicit. Consider adding:

use rand::Rng; // if using gen_range

TypeScript Changes

Location: engine/sdks/typescript/runner-protocol/src/index.ts

  1. Good: Adding the proper import for assert from node:assert
  2. Good: Removing the custom assert implementation that was less robust
  3. Good: Adding TODO comment about renaming createTs to update_ts

These changes are clean and improve code quality.

🎯 Testing Considerations

The PR includes test file changes in rocksdb.rs:

  • The test now properly imports calculate_tx_retry_backoff from the public API
  • The duplicate test implementation was removed ✓

Recommendation: Consider adding a unit test specifically for the backoff calculation to verify:

  1. Backoff values match expected ranges for attempts 0-10
  2. Jitter is within expected bounds
  3. Maximum backoff caps at attempt 7

Example:

#[test]
fn test_calculate_tx_retry_backoff() {
    for attempt in 0..10 {
        let backoff = calculate_tx_retry_backoff(attempt);
        let base = 2_u64.pow((attempt as u32).min(7));
        let expected_base = base * 10;
        assert!(backoff >= expected_base);
        assert!(backoff <= expected_base + base);
    }
}

🔒 Security & Performance

  • Security: No concerns. The changes don't introduce vulnerabilities.
  • Performance:
    • ✅ Reduced maximum backoff time will improve retry performance under contention
    • ✅ Proportional jitter prevents thundering herd while allowing faster retries
    • ⚠️ First retry (attempt 0) now has minimal jitter which could cause thundering herd in high-concurrency scenarios

📋 Recommendations

  1. High Priority: Fix the jitter calculation to use proper random range generation
  2. Medium Priority: Add unit tests for the backoff function
  3. Low Priority: Consider whether attempt 0 should have more jitter to prevent thundering herd

Overall Assessment

This is a solid improvement that brings the retry behavior closer to FoundationDB's proven approach. The main concern is the jitter calculation implementation which should use proper random range generation. Once that's addressed, this PR should be good to merge.

Rating: ⚠️ Approve with changes requested (fix jitter calculation)


Review generated by Claude Code

@claude

claude bot commented Nov 21, 2025

Code Review

I have reviewed PR #3469 "fix(udb): update backoff algo" and have the following feedback:

Positive Changes

  1. Improved backoff algorithm: The updated exponential backoff algorithm is now closer to FoundationDB implementation with better-tuned parameters - max exponent changed from 10 to 7, capping maximum backoff at approximately 1.28s instead of approximately 10.24s

  2. Code organization: Moving calculate_tx_retry_backoff from the test file to utils/mod.rs is the right approach - it is now properly shared across the codebase and publicly exported

  3. Good documentation: The function has clear comments comparing the implementation with FDB approach, including useful references to the FDB source code

  4. Minor formatting fixes: Import ordering improvements follow Rust conventions

Critical Issue: Jitter calculation bug

The jitter calculation at line 46 in utils/mod.rs appears incorrect. The code uses rand::random::<u64>() % base but the comment says "Jitter is 0-10% of backoff ms".

Problem: For attempt 0, base=1 and base_backoff_ms=10. Current jitter is always 0, but should be 0-9ms according to the comment. The doc comment examples ("0 -> 10ms + 0-1ms jitter") suggest the intent is proportional to base, matching current implementation. However, the inline comment says "0-10% of backoff ms" which would require % base_backoff_ms.

Fix needed: Either change to rand::random::<u64>() % base_backoff_ms for true 10% jitter, or update the comment to accurately describe the current behavior.

Minor Observations

  1. TypeScript changes using standard import assert is good practice
  2. TODO comment for renaming update_ts is helpful
  3. Multi-line format macro improves readability

Testing Recommendations

  1. Add unit tests for calculate_tx_retry_backoff to verify backoff values and jitter ranges
  2. Consider adding timing assertions to integration tests

Action Items

Must fix before merge: Resolve jitter calculation inconsistency between code and documentation

Nice to have: Add unit tests for backoff function, consider using rand::thread_rng()

Summary

The PR improves the backoff algorithm significantly, but the jitter calculation needs clarification or fixing before merge.


Review generated with Claude Code

@claude

claude bot commented Nov 21, 2025

Code Review for PR #3469: fix(udb): update backoff algo

Overview

This PR updates the backoff algorithm for transaction retries in UniversalDB to better match FoundationDB's behavior. The changes are well-documented and improve the exponential backoff strategy.

✅ Strengths

  1. Excellent Documentation: The inline comments clearly explain the new backoff behavior with concrete examples and references to FDB source code.

  2. Proper Code Organization: Moving calculate_tx_retry_backoff from the test file to the utils module and making it public is the right architectural decision.

  3. Import Cleanup: The import reordering in api-peer/src/actors/kv_get.rs and api-public/src/actors/kv_get.rs follows Rust conventions (standard library first).

  4. TypeScript Cleanup: Moving assert import to the top and removing the duplicate implementation is a good cleanup.

🔍 Issues & Concerns

1. Critical Bug: Incorrect Jitter Calculation

Location: engine/packages/universaldb/src/utils/mod.rs:46

The jitter calculation has a critical issue:

let base = 2_u64.pow((attempt as u32).min(7));
let base_backoff_ms = base * 10;
let jitter_ms = rand::random::<u64>() % base;  // ❌ WRONG

The comment says "Jitter is 0-10% of backoff ms", but the code calculates rand() % base where base = 2^attempt.

Expected:

  • For attempt 0: base=1, backoff=10ms, jitter should be 0-1ms (10%)
  • For attempt 1: base=2, backoff=20ms, jitter should be 0-2ms (10%)
  • For attempt 7: base=128, backoff=1280ms, jitter should be 0-128ms (10%)

Actual behavior:

  • The jitter rand() % base does produce the ranges mentioned in the documentation (0-1ms, 0-2ms, etc.)
  • However, this is NOT 10% of the backoff - it's 10% of the BASE, which happens to work out to 10% of the backoff only because backoff = base * 10

Issue: While the current implementation produces the correct numeric ranges described in the comments, the inline comment "Jitter is 0-10% of backoff ms" is misleading. If someone later changes the backoff formula (e.g., changing the multiplier from 10), the jitter percentage would no longer be 10%.

Recommendations:

// Option 1: Make the intent explicit
let jitter_ms = rand::random::<u64>() % (base_backoff_ms / 10);

// Option 2: Update the comment to be more accurate
// Jitter is 0-base ms (which equals 10% of backoff)
let jitter_ms = rand::random::<u64>() % base;

2. Potential Integer Overflow (Low Risk)

Location: engine/packages/universaldb/src/utils/mod.rs:48

base_backoff_ms + jitter_ms

While extremely unlikely in practice (max would be 1280ms + 128ms = 1408ms), there's no explicit overflow protection. Consider using saturating_add() for defensive programming:

base_backoff_ms.saturating_add(jitter_ms)

3. Modulo Bias in Random Number Generation

Location: engine/packages/universaldb/src/utils/mod.rs:46

Using rand::random::<u64>() % base introduces modulo bias when base doesn't evenly divide the range of u64. For small values of base (1-128), this bias is negligible but technically imperfect.

Recommendation: Use rand::thread_rng().gen_range(0..base) for unbiased random generation:

use rand::Rng;

let jitter_ms = rand::thread_rng().gen_range(0..base);

4. Missing Test Coverage

The function calculate_tx_retry_backoff is now a public utility but lacks unit tests. Consider adding tests to verify:

  • Exponential growth (attempt 0→1→2→...→7)
  • Capping at attempt 7 (2^7 = 128)
  • Jitter range is within expected bounds
  • Behavior for edge cases (attempt > 7)

Example test:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_backoff_growth() {
        // Test base backoff without considering jitter
        for attempt in 0..=7 {
            let backoff = calculate_tx_retry_backoff(attempt);
            let expected_base = 2_u64.pow(attempt as u32) * 10;
            let expected_max_jitter = 2_u64.pow(attempt as u32);
            
            assert!(backoff >= expected_base);
            assert!(backoff < expected_base + expected_max_jitter);
        }
    }

    #[test]
    fn test_backoff_capping() {
        // Verify attempts beyond 7 are capped at 2^7
        let backoff_7 = calculate_tx_retry_backoff(7);
        let backoff_8 = calculate_tx_retry_backoff(8);
        let backoff_100 = calculate_tx_retry_backoff(100);
        
        // All should use base = 2^7 = 128
        // Exact values will vary due to jitter, but ranges should be same
        let expected_min = 1280;
        let expected_max = 1280 + 128;
        
        assert!(backoff_7 >= expected_min && backoff_7 < expected_max);
        assert!(backoff_8 >= expected_min && backoff_8 < expected_max);
        assert!(backoff_100 >= expected_min && backoff_100 < expected_max);
    }
}

📝 Minor Issues

5. Formatting Consistency

Location: engine/packages/api-public/src/actors/kv_get.rs:68-72

The formatting change to split the format! call is good for readability but inconsistent with project style. However, this is likely auto-formatted, so no action needed if the project uses rustfmt.

6. TODO Comment in TypeScript

Location: engine/sdks/typescript/runner-protocol/src/index.ts:56-58

The TODO comment "TODO: Rename to update_ts" suggests a pending refactor. While this is fine, ensure there's a tracking issue for this rename, as it appears to be a breaking API change.

🔒 Security Considerations

No security concerns identified. The changes to backoff timing don't introduce security vulnerabilities.

⚡ Performance Considerations

Improved: The new backoff algorithm caps at 2^7 (128ms base) instead of 2^10 (1024ms base), which means:

  • Faster retry cycles for high contention scenarios
  • More aggressive retry strategy
  • Better alignment with FoundationDB's proven approach

Note: With max attempt 7 capped at ~1.4 seconds backoff (vs previous ~100+ seconds at attempt 10), high-contention scenarios will retry more frequently. This should be monitored in production.
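The cap change can be made concrete with a small sketch, assuming the 10ms multiplier used throughout this review:

```rust
// Maximum base backoff (ignoring jitter) for a given exponent cap.
fn max_base_backoff_ms(max_exponent: u32) -> u64 {
    2_u64.pow(max_exponent) * 10
}

fn main() {
    assert_eq!(max_base_backoff_ms(10), 10_240); // old cap: ~10.2s
    assert_eq!(max_base_backoff_ms(7), 1_280);   // new cap: ~1.3s
}
```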

📊 Test Coverage

The test file engine/packages/universaldb/tests/rocksdb.rs now correctly imports the function from the utils module, which is good. However, the test doesn't validate the backoff behavior itself—it only uses the function. Consider adding dedicated unit tests as mentioned above.

✅ Recommendations Summary

Must Fix:

  1. ✅ Clarify the jitter calculation comment or refactor to use base_backoff_ms / 10 for clarity

Should Fix:
2. ⚠️ Add unit tests for calculate_tx_retry_backoff
3. ⚠️ Use saturating_add() for overflow safety

Consider:
4. 💭 Use rand::thread_rng().gen_range() for unbiased random generation
5. 💭 Create tracking issue for the update_ts rename if not already tracked

Conclusion

Overall, this is a solid improvement to the backoff algorithm with excellent documentation. The main concern is the jitter calculation clarity and lack of test coverage. Once those are addressed, this PR will be in great shape.

Approval Status: Request Changes - pending jitter calculation clarification and test coverage

@claude

claude bot commented Nov 21, 2025

PR Review: fix(udb): update backoff algo

Thanks for this PR! Overall, this is a good improvement that brings the backoff algorithm closer to FoundationDB behavior.

Strengths:

  1. Improved backoff algorithm - proper max cap at attempt 7 (1280ms base)
  2. Better jitter calculation - now proportional to base backoff (0-10 percent)
  3. Good code organization - moving calculate_tx_retry_backoff to utils/mod.rs makes it reusable
  4. Excellent documentation with GitHub links to FDB implementation

Issues and Suggestions:

  1. Jitter calculation clarity (packages/universaldb/src/utils/mod.rs:46)
    The comment says jitter is 0-10 percent of backoff ms but code does jitter mod base. Math is correct but not obvious. Consider clarifying.

  2. Modulo bias in random number generation
    Using rand::random mod base introduces modulo bias. Better to use rand::thread_rng().gen_range(0..base).

  3. Missing test coverage
    No test validates the backoff calculation itself. Consider adding unit test to verify values at different attempts and max cap.

  4. TypeScript assert import - Good refactor to use node:assert

Performance: New algorithm reduces latency under light contention and prevents excessive backoff times. This is an improvement.

Security: No issues identified.

Recommended Actions:

  1. Approve as-is if time-sensitive - code is functionally correct
  2. Nice to have: Fix modulo bias with gen_range
  3. Nice to have: Clarify jitter calculation
  4. Nice to have: Add unit tests

Great work on improving the retry logic!

@claude

claude bot commented Nov 21, 2025

Code Review

This PR updates the backoff algorithm in UniversalDB to better align with FoundationDB's implementation. Overall, the changes look good with well-documented improvements.

Strengths

  1. Better Alignment with FDB: The new backoff algorithm is closer to FoundationDB's implementation with a capped exponential backoff (max at attempt 7 = 1280ms) instead of the previous attempt 10.

  2. Improved Jitter: The jitter calculation has been improved from a fixed 0-100ms range to a proportional 0-10% of the backoff time.

  3. Excellent Documentation: The detailed comments comparing the implementation to FDB's approach with links to source code are very helpful.

  4. Good Code Organization: Moving calculate_tx_retry_backoff from the test file to utils/mod.rs makes it properly accessible and reusable.

Issues and Recommendations

1. Jitter Calculation Clarity (High Priority)

The comment says "Jitter is 0-10% of backoff ms" but the calculation uses modulo base instead of base_backoff_ms. This is actually correct since base equals base_backoff_ms divided by 10, but it's confusing.

Recommendation: Either update the comment to clarify this relationship, or make it more explicit in code.

2. Missing Dependency Verification

Please verify that the rand crate is properly declared in the workspace dependencies for the universaldb package.

3. Mathematical Precision (Low Priority)

The modulo operation with rand::random doesn't provide perfectly uniform distribution. For the small values of base used here, the bias is negligible, but consider using thread_rng().gen_range for better distribution.

4. Test Coverage

Consider adding a unit test for the backoff calculation to verify the backoff values, jitter bounds, and max cap behavior.

Minor Observations

  • TypeScript TODO comment about renaming createTs to update_ts is good to track but appears unrelated to this PR
  • Import reordering in Rust files is likely from auto-formatting
  • Multi-line formatting of the format! macro improves readability

Security and Performance

No security issues identified. The changes improve performance by capping retries earlier and using proportional jitter.

Summary

Approve with minor suggestions. The core algorithm improvement is solid and well-documented. Main recommendation is to clarify the jitter calculation comment. Great work on the improved documentation and better FDB alignment!

@graphite-app
Copy link
Contributor

graphite-app bot commented Nov 21, 2025

Merge activity

  • Nov 21, 2:40 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 21, 2:42 AM UTC: CI is running for this pull request on a draft pull request (#3515) due to your merge queue CI optimization settings.
  • Nov 21, 2:43 AM UTC: Merged by the Graphite merge queue via draft PR: #3515.

graphite-app bot pushed a commit that referenced this pull request Nov 21, 2025
@graphite-app graphite-app bot closed this Nov 21, 2025
@graphite-app graphite-app bot deleted the 11-13-fix_udb_update_backoff_algo branch November 21, 2025 02:43