feat(exception-capture): add client-side token bucket rate limiting #662

Open

hpouillot wants to merge 4 commits into main from feat/exception-bucketed-rate-limiter

hpouillot commented Jun 12, 2026

💡 Motivation and Context

Addresses the long-standing TODO in posthog/exception_capture.py: exception autocapture has no client-side rate limiting, so a crash loop can flood the ingestion queue.

This ports the BucketedRateLimiter from posthog-js (packages/core/src/utils/bucketed-rate-limiter.ts) and applies it to exception autocapture as an opt-in feature:

  • disabled by default — without the flag, behavior is identical to released versions (the limiter is not even constructed)
  • one token bucket per exception type (the Python equivalent of $exception_list[0].type, falling back to "Exception")
  • rate-limited exceptions are skipped before reaching the ingestion queue, and the SDK logs "Skipping exception capture because of client rate limiting." as the other SDKs do
  • defaults are more generous than the browser/Node SDKs' 10 / 1 / 10s because one server process aggregates exceptions across many users' requests

New Client options (also available as module-level settings):

| Option | Default | Description |
| --- | --- | --- |
| enable_exception_autocapture_rate_limiting | False | Opt into client-side rate limiting |
| exception_autocapture_bucket_size | 50 | Max burst of captures per exception type (clamped to 0–100) |
| exception_autocapture_refill_rate | 10 | Tokens restored per refill interval |
| exception_autocapture_refill_interval_seconds | 10 | Seconds between refills |
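
For a rough sense of how these defaults compare with the browser/Node values of 10 / 1 / 10s, the nominal burst and refill throughput can be worked out directly. This is back-of-the-envelope arithmetic rather than code from the PR: `nominal_limits` is a hypothetical helper, and the `bucket_size - 1` burst assumes the drain quirk described in this PR.

```python
from typing import Tuple


def nominal_limits(
    bucket_size: int, refill_rate: int, refill_interval_seconds: float
) -> Tuple[int, float]:
    """Return (burst, tokens_per_second) for one exception type.

    Hypothetical helper for illustration only.
    """
    # The call that empties the bucket is itself reported as limited,
    # so a fresh bucket lets bucket_size - 1 events through.
    burst = max(bucket_size - 1, 0)
    # Tokens restored per second, averaged over a refill interval.
    tokens_per_second = refill_rate / refill_interval_seconds
    return burst, tokens_per_second


python_defaults = nominal_limits(50, 10, 10)
js_defaults = nominal_limits(10, 1, 10)
print(python_defaults)  # (49, 1.0)
print(js_defaults)      # (9, 0.1)
```

The tokens-per-second figure is an upper bound on sustained throughput; under constant pressure the drain quirk may let fewer events through per interval.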

Deviations from the JS source, since this SDK runs in threaded server processes:

  • guarded by a threading.Lock (the on_bucket_rate_limited callback fires outside the lock)
  • injectable monotonic clock for tests

One JS quirk is preserved for cross-SDK parity: the call that drains the bucket is itself reported as limited, so a burst lets bucket_size - 1 events through.
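
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the class added in this PR: the name `TokenBucketSketch`, its constructor parameters, and the `consume` method are assumptions made for the example. It shows a per-key bucket guarded by a `threading.Lock`, an injectable monotonic clock, the callback firing outside the lock, and the preserved quirk where the draining call is reported as limited.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class TokenBucketSketch:
    """Illustrative per-key token bucket (hypothetical, not the PR's class)."""

    def __init__(
        self,
        bucket_size: int = 50,
        refill_rate: int = 10,
        refill_interval_seconds: float = 10.0,
        on_limited: Optional[Callable[[str], None]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Bucket size is clamped to 0-100, as stated in the options table.
        self._bucket_size = max(0, min(bucket_size, 100))
        self._refill_rate = refill_rate
        self._refill_interval = refill_interval_seconds
        self._on_limited = on_limited
        self._clock = clock
        self._lock = threading.Lock()
        # key -> [tokens, time of last applied refill]
        self._buckets: Dict[str, List[float]] = {}

    def consume(self, key: str) -> bool:
        """Take one token for `key`; return True if the call is rate limited."""
        with self._lock:
            now = self._clock()
            bucket = self._buckets.get(key)
            if bucket is None:
                bucket = [self._bucket_size, now]
                self._buckets[key] = bucket
            elif self._refill_interval > 0:
                # Only whole intervals refill; the remainder carries over.
                intervals = int((now - bucket[1]) // self._refill_interval)
                if intervals > 0:
                    refilled = bucket[0] + intervals * self._refill_rate
                    bucket[0] = min(refilled, self._bucket_size)
                    bucket[1] += intervals * self._refill_interval
            if bucket[0] == 0:
                return True
            bucket[0] -= 1
            # Quirk kept for parity: the draining call counts as limited.
            drained_now = bucket[0] == 0
        # Callback runs outside the lock so it cannot block other threads.
        if drained_now and self._on_limited is not None:
            self._on_limited(key)
        return drained_now


# Usage with a fake clock so the outcome is deterministic.
fake_now = [0.0]
drained: List[str] = []
limiter = TokenBucketSketch(
    bucket_size=3,
    refill_rate=2,
    refill_interval_seconds=10.0,
    on_limited=drained.append,
    clock=lambda: fake_now[0],
)
results = [limiter.consume("ValueError") for _ in range(4)]
print(results)  # [False, False, True, True]

fake_now[0] = 10.0  # one full interval later: 2 tokens restored
after_refill = [limiter.consume("ValueError") for _ in range(2)]
print(after_refill)  # [False, True]
```

In this sketch a size-3 bucket lets two events through before the third call drains it, which matches the `bucket_size - 1` burst noted above.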

💚 How did you test it?

  • posthog/test/test_bucketed_rate_limiter.py: 25 tests porting the posthog-js spec (bucketed-rate-limiter.spec.ts) — consumption, refill math, partial intervals, bucket isolation, callback semantics, stop(), timestamp carry-over — plus Python-specific tests: parameter clamping, a 10-thread concurrency test asserting exactly bucket_size - 1 of 200 contended consumes pass, ExceptionCapture integration (15 same-type exceptions on a size-10 bucket → 9 captured; a different type still passes), disabled-by-default (100 captures pass untouched, no limiter constructed), enabled defaults, and config pass-through from Client kwargs to the limiter.
  • Existing posthog/test/test_exception_capture.py (15 tests) still passes.
  • ruff format, ruff check, and mypy are clean.
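
As an illustration of the concurrency check mentioned in the list above, a contended-consume test might look roughly like this. It is a sketch under assumptions, not the test from the PR: `SingleBucket` is a hypothetical stand-in with no refill, used only so the example is self-contained.

```python
import threading
from typing import List


class SingleBucket:
    """Hypothetical lock-guarded bucket with no refill (stand-in for the limiter)."""

    def __init__(self, size: int) -> None:
        self._tokens = size
        self._lock = threading.Lock()

    def consume(self) -> bool:
        """Return True if the call is rate limited."""
        with self._lock:
            if self._tokens == 0:
                return True
            self._tokens -= 1
            # The draining call is itself reported as limited.
            return self._tokens == 0


def run_contended(
    bucket_size: int = 10, threads: int = 10, consumes_per_thread: int = 20
) -> int:
    """Run threads * consumes_per_thread consumes and count those that pass."""
    bucket = SingleBucket(bucket_size)
    barrier = threading.Barrier(threads)  # start all workers together
    passed: List[int] = []
    passed_lock = threading.Lock()

    def worker() -> None:
        barrier.wait()
        local = sum(1 for _ in range(consumes_per_thread) if not bucket.consume())
        with passed_lock:
            passed.append(local)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return sum(passed)


# 10 threads x 20 consumes = 200 contended calls on a size-10 bucket.
assert run_contended() == 9  # exactly bucket_size - 1 pass
```

Without the lock, the decrement and the zero check could interleave across threads and the count of passed calls would no longer be exact.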

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • I updated the docs if needed.
  • No breaking change or entry added to the changelog.

If releasing new changes

  • Ran sampo add to generate a changeset file

🤖 Generated with Claude Code

Port the posthog-js BucketedRateLimiter
(packages/core/src/utils/bucketed-rate-limiter.ts) and apply it to
exception autocapture with the same settings as the browser and Node
SDKs: one bucket per exception type, bucket size 10, refilling 1 token
per 10 seconds.

Resolves the long-standing TODO in exception_capture.py.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions Bot commented Jun 12, 2026

posthog-python Compliance Report

Date: 2026-06-12 11:18:34 UTC
Duration: 176153ms

✅ All Tests Passed!

45/45 tests passed


Capture Tests

29/29 tests passed

| Test | Status | Duration |
| --- | --- | --- |
| Format Validation.Event Has Required Fields | ✅ | 518ms |
| Format Validation.Event Has Uuid | ✅ | 1508ms |
| Format Validation.Event Has Lib Properties | ✅ | 1510ms |
| Format Validation.Distinct Id Is String | ✅ | 1507ms |
| Format Validation.Token Is Present | ✅ | 1508ms |
| Format Validation.Custom Properties Preserved | ✅ | 1507ms |
| Format Validation.Event Has Timestamp | ✅ | 1507ms |
| Retry Behavior.Retries On 503 | ✅ | 9519ms |
| Retry Behavior.Does Not Retry On 400 | ✅ | 3507ms |
| Retry Behavior.Does Not Retry On 401 | ✅ | 3508ms |
| Retry Behavior.Respects Retry After Header | ✅ | 9515ms |
| Retry Behavior.Implements Backoff | ✅ | 23520ms |
| Retry Behavior.Retries On 500 | ✅ | 7513ms |
| Retry Behavior.Retries On 502 | ✅ | 7516ms |
| Retry Behavior.Retries On 504 | ✅ | 7513ms |
| Retry Behavior.Max Retries Respected | ✅ | 23530ms |
| Deduplication.Generates Unique Uuids | ✅ | 1499ms |
| Deduplication.Preserves Uuid On Retry | ✅ | 7516ms |
| Deduplication.Preserves Uuid And Timestamp On Retry | ✅ | 14514ms |
| Deduplication.Preserves Uuid And Timestamp On Batch Retry | ✅ | 7516ms |
| Deduplication.No Duplicate Events In Batch | ✅ | 1505ms |
| Deduplication.Different Events Have Different Uuids | ✅ | 1507ms |
| Compression.Sends Gzip When Enabled | ✅ | 1508ms |
| Batch Format.Uses Proper Batch Structure | ✅ | 1508ms |
| Batch Format.Flush With No Events Sends Nothing | ✅ | 1005ms |
| Batch Format.Multiple Events Batched Together | ✅ | 1506ms |
| Error Handling.Does Not Retry On 403 | ✅ | 3510ms |
| Error Handling.Does Not Retry On 413 | ✅ | 3508ms |
| Error Handling.Retries On 408 | ✅ | 7513ms |

Feature_Flags Tests

16/16 tests passed

| Test | Status | Duration |
| --- | --- | --- |
| Request Payload.Request With Person Properties Device Id | ✅ | 1005ms |
| Request Payload.Flags Request Uses V2 Query Param | ✅ | 1007ms |
| Request Payload.Flags Request Hits Flags Path Not Decide | ✅ | 1007ms |
| Request Payload.Flags Request Omits Authorization Header | ✅ | 1007ms |
| Request Payload.Token In Flags Body Matches Init | ✅ | 1007ms |
| Request Payload.Groups Round Trip | ✅ | 1007ms |
| Request Payload.Groups Default To Empty Object | ✅ | 1007ms |
| Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It | ✅ | 1008ms |
| Request Payload.Disable Geoip False Propagates As Geoip Disable False | ✅ | 1007ms |
| Request Payload.Disable Geoip Omitted Defaults To False | ✅ | 1008ms |
| Request Payload.Flag Keys To Evaluate Contains Only Requested Key | ✅ | 1007ms |
| Request Lifecycle.No Flags Request On Init Alone | ✅ | 503ms |
| Request Lifecycle.No Flags Request On Normal Capture | ✅ | 1508ms |
| Request Lifecycle.Two Flag Calls Produce Two Remote Requests | ✅ | 1011ms |
| Request Lifecycle.Mock Response Value Is Returned To Caller | ✅ | 1003ms |
| Side Effect Events.Get Feature Flag Captures Feature Flag Called Event | ✅ | 1510ms |

greptile-apps Bot commented Jun 12, 2026

Reviews (1): Last reviewed commit: "feat(exception-capture): add client-side..." | Re-trigger Greptile

Comment on lines +223 to +248 of posthog/test/test_bucketed_rate_limiter.py
def test_exception_capture_rate_limits_per_exception_type():
    from posthog.exception_capture import ExceptionCapture

    client = MagicMock()
    capture = ExceptionCapture(client)
    try:

        def exc_info(error):
            try:
                raise error
            except type(error):
                import sys

                return sys.exc_info()

        for _ in range(15):
            capture.capture_exception(exc_info(ValueError("boom")))

        # bucket size 10 -> 9 captured, the rest rate limited
        assert client.capture_exception.call_count == 9

        # a different exception type has its own bucket
        capture.capture_exception(exc_info(ZeroDivisionError("zero")))
        assert client.capture_exception.call_count == 10
    finally:
        capture.close()

P2 Integration test placed in wrong test module

test_exception_capture_rate_limits_per_exception_type exercises ExceptionCapture end-to-end (including its sys.excepthook side-effect and close() teardown) — it is an integration test for ExceptionCapture, not for BucketedRateLimiter. Having it here means anyone reading test_exception_capture.py gets an incomplete picture of that class's tested behaviour. It belongs in test_exception_capture.py alongside the other ExceptionCapture tests.

hpouillot and others added 3 commits June 12, 2026 13:04
Expose exception_autocapture_bucket_size, exception_autocapture_refill_rate
and exception_autocapture_refill_interval_seconds on Client and the
module-level API, passed through to ExceptionCapture's rate limiter.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Bucket size 50 refilling 10 tokens per 10 seconds (was 10/1/10, the
browser SDK defaults) since one server process aggregates exceptions
across many users' requests. Defaults now live on ExceptionCapture and
are referenced by Client and the module-level API.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add enable_exception_autocapture_rate_limiting (default False) on
Client, the module-level API and ExceptionCapture. The limiter is only
constructed when enabled, so default behavior is unchanged from
released versions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
hpouillot marked this pull request as ready for review June 12, 2026 12:13
hpouillot requested a review from a team as a code owner June 12, 2026 12:13

greptile-apps Bot commented Jun 12, 2026

Reviews (2): Last reviewed commit: "feat(exception-capture): make rate limit..." | Re-trigger Greptile

hpouillot requested review from a team, ablaszkiewicz and cat-ph June 12, 2026 12:47
    def capture_exception(self, exception, metadata=None):
        try:
            if self._rate_limiter is not None:
                exception_type = self._exception_type(exception)

Is this the correct order with chained exceptions? I think this might consume a token from the top-level exception's bucket instead, i.e. a RuntimeError raised from a ZeroDivisionError consumes from the RuntimeError bucket rather than the ZeroDivisionError one.

client = MagicMock()
capture = ExceptionCapture(client, rate_limiting_enabled=True, bucket_size=2)

capture.capture_exception(exc_info_for(RuntimeError("wrapped", ZeroDivisionError())))
capture.capture_exception(exc_info_for(RuntimeError("wrapped", KeyError())))

assert client.capture_exception.call_count == 2

I think this fails, but we'd want it to pass?
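
To make the two possible bucket keys concrete, here is a small sketch of how the top-level type and the root-cause type of a chained exception differ. It is illustrative only: `exc_info_for`, `top_level_type`, and `root_cause_type` are hypothetical helpers, the chain is built with `raise ... from ...`, and which key `_exception_type` actually uses is not shown in this thread.

```python
import sys


def exc_info_for(error, cause=None):
    """Hypothetical helper: raise `error` (chained from `cause`) and return exc_info."""
    try:
        raise error from cause
    except BaseException:
        return sys.exc_info()


def top_level_type(exc_info):
    """Key on the outermost exception, falling back to "Exception"."""
    exc_type = exc_info[0]
    return exc_type.__name__ if exc_type is not None else "Exception"


def root_cause_type(exc_info):
    """Key on the innermost exception by following __cause__ / __context__."""
    exc = exc_info[1]
    if exc is None:
        return "Exception"
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        # Prefer an explicit cause; use the implicit context unless suppressed.
        inner = exc.__cause__
        if inner is None and not exc.__suppress_context__:
            inner = exc.__context__
        if inner is None:
            break
        exc = inner
    return type(exc).__name__


first = exc_info_for(RuntimeError("wrapped"), ZeroDivisionError())
second = exc_info_for(RuntimeError("wrapped"), KeyError())

print(top_level_type(first), top_level_type(second))    # RuntimeError RuntimeError
print(root_cause_type(first), root_cause_type(second))  # ZeroDivisionError KeyError
```

Keyed on the top-level type, both captures would share one RuntimeError bucket, so with `bucket_size=2` the second call would drain it and be reported as limited. Keyed on the root cause, they would land in separate buckets and both would pass.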

3 participants