-
Notifications
You must be signed in to change notification settings - Fork 20
Fix async concurrency issues in split PDF hook #331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix async concurrency issues in split PDF hook #331
Conversation
- Add AsyncSDKInitHook, AsyncBeforeRequestHook, AsyncAfterSuccessHook, and AsyncAfterErrorHook interfaces - Update SDKHooks to store and execute both sync and async hooks - Add async executor methods that run async hooks first, then sync hooks for backward compatibility - Update basesdk.py do_request_async to call async hook methods - Enables proper async/await patterns in hooks without blocking the event loop This is the first step toward fixing ENG-792 where blocking calls in the split PDF hook prevent concurrent async requests. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Implement AsyncBeforeRequestHook, AsyncAfterSuccessHook, and AsyncAfterErrorHook interfaces - Convert _await_elements to async method that directly awaits coroutines - Remove blocking executor.submit() and .result() calls that were blocking the event loop - Remove ThreadPoolExecutor and _run_coroutines_in_separate_thread workaround - before_request_async contains full implementation (CPU-bound work noted in docstring) - Register async hook implementations in init_hooks - Sync hooks still work for backward compatibility with partition() - Async hooks enable true concurrent requests with partition_async() Fixes ENG-792: partition_async now supports concurrent SDK requests without blocking. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Critical fixes: - Fix missing await in sync after_success (would have caused bugs) - Move instance-level settings (allow_failed, cache_tmp_data_feature, cache_tmp_data_dir) to per-operation dicts - This prevents race conditions where concurrent requests would overwrite each other's settings - Add safety check for tempdir existence before accessing in call_api_partial Per-operation settings storage: - Each operation_id now has its own isolated settings - Settings are properly cleaned up in _clear_operation - Default values are used when operation_id not found Tests added: - test_per_request_settings_isolation: Validates settings don't interfere between operations - test_per_request_settings_cleanup: Validates proper cleanup - test_concurrent_async_operations_isolation: Simulates real concurrent async operations - test_await_elements_uses_operation_settings: Validates _await_elements uses correct settings - test_default_values_used_when_operation_not_found: Validates fallback to defaults This fixes the race conditions that caused inconsistent behavior and "event loop is closed" warnings when making concurrent partition_async requests. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Convert load_elements_from_response to async function using aiofiles - Update caller in _await_elements to await the async file read - Prevents blocking the event loop during file I/O operations - Each cached response file is now read asynchronously This completes the async transformation of the split PDF hook, ensuring no blocking I/O operations remain in async contexts. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| """Code generated by Speakeasy (https://speakeasy.com). DO NOT EDIT.""" | ||
|
|
||
| import httpx | ||
| import inspect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.


Summary
Fixes multiple concurrency issues in the split PDF hook that prevented concurrent
partition_asyncrequests from working correctly. The hook was using blocking operations that defeated the purpose of async/await, causing concurrent requests to interfere with each other.Issues Fixed
Critical: Event Loop Blocking
.result()call that waited for ThreadPoolExecutor in_await_elementsasyncio.run()in separate thread workaroundload_elements_from_responseasync to prevent blocking I/OCritical: Race Conditions
allow_failed,cache_tmp_data_feature, andcache_tmp_data_dirto per-operation storageoperation_idBug Fix
after_successthat would have caused incorrect behaviorTechnical Changes
1. Async Hook Infrastructure
AsyncBeforeRequestHook,AsyncAfterSuccessHook,AsyncAfterErrorHookSDKHooksto execute async hooks in async contextsbasesdk.pyto call async hook methods indo_request_async2. SplitPdfHook Async Conversion
_await_elementsto async method that directly awaits coroutines3. Per-Operation Settings Isolation
_clear_operationTesting
Added comprehensive unit tests:
test_per_request_settings_isolation- Validates settings don't interferetest_per_request_settings_cleanup- Validates proper cleanuptest_concurrent_async_operations_isolation- Simulates real concurrent scenariostest_await_elements_uses_operation_settings- Validates correct settings usagetest_default_values_used_when_operation_not_found- Validates fallback behaviorImpact
Before:
partition_asyncblocked when making concurrent requestsAfter:
partition_asyncrequests work correctlyCommits
🤖 Generated with Claude Code
Note
Medium Risk
Touches core request/response hook execution for all async SDK calls and changes split-PDF request fan-out behavior; regressions could impact request handling and error propagation under concurrency.
Overview
Fixes
partition_asyncconcurrency by introducing async hook infrastructure and refactoring the split-PDF flow to avoid event-loop blocking and shared mutable state.SDKHooks/basesdk.pynow executebefore_request/after_success/after_errorvia new async hook interfaces (while still running sync hooks for compatibility), and hook registration wires up the async variants.SplitPdfHookis updated to use per-operation_idsettings (e.g.,allow_failed, cache flags/dir, concurrency) with cleanup, removes the prior thread/executor workaround, and switches cached-element loading to async file I/O; unit tests add coverage for settings isolation/cleanup and concurrent async behavior. Version/release metadata is bumped (plus lockfile regen).Written by Cursor Bugbot for commit 1306dd8. This will update automatically on new commits. Configure here.