@awalker4
Collaborator

@awalker4 awalker4 commented Feb 9, 2026

Summary

Fixes multiple concurrency issues in the split PDF hook that prevented concurrent partition_async requests from working correctly. The hook was using blocking operations that defeated the purpose of async/await, causing concurrent requests to interfere with each other.

Issues Fixed

Critical: Event Loop Blocking

  • Removed blocking .result() call that waited for ThreadPoolExecutor in _await_elements
  • Removed asyncio.run() in separate thread workaround
  • Made load_elements_from_response async to prevent blocking I/O
  • Hooks now properly use async/await patterns throughout
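The effect of a blocking wait inside a coroutine can be illustrated with a minimal sketch. The hook functions below are hypothetical stand-ins, not the SDK's code: `time.sleep` plays the role of the blocking `.result()` call, and `fetch_page` is an assumed placeholder for one split-page request.

```python
import asyncio
import time


async def fetch_page(delay: float) -> str:
    # Hypothetical stand-in for one split-page request.
    await asyncio.sleep(delay)
    return "elements"


async def blocking_hook(delay: float) -> str:
    # Anti-pattern: a synchronous wait inside a coroutine stalls the whole loop,
    # so concurrent requests end up running one after another.
    time.sleep(delay)
    return "elements"


async def awaiting_hook(delay: float) -> str:
    # Awaiting yields control, so other requests progress in the meantime.
    return await fetch_page(delay)


async def timed(hook, n: int = 3, delay: float = 0.1) -> float:
    # Run n hook calls concurrently and return the wall-clock time taken.
    start = time.perf_counter()
    await asyncio.gather(*(hook(delay) for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_hook))
awaiting_elapsed = asyncio.run(timed(awaiting_hook))
print(f"blocking: {blocking_elapsed:.2f}s, awaiting: {awaiting_elapsed:.2f}s")
```

With the blocking variant the three calls serialize; with the awaiting variant they overlap, which is the behavior this PR aims for in partition_async.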

Critical: Race Conditions

  • Fixed instance-level settings sharing between concurrent requests
  • Moved allow_failed, cache_tmp_data_feature, and cache_tmp_data_dir to per-operation storage
  • Each operation now has isolated settings keyed by operation_id
  • Prevents one request from overwriting another's configuration
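The race described above can be reproduced deterministically in a small sketch. Both classes are hypothetical simplifications rather than the actual SplitPdfHook; only the `allow_failed` and `operation_id` names come from this PR.

```python
import asyncio
from typing import Dict, List


class SharedSettingsHook:
    """Anti-pattern: one attribute shared by every in-flight request."""

    def __init__(self) -> None:
        self.allow_failed = False

    async def handle(self, allow_failed: bool) -> bool:
        self.allow_failed = allow_failed
        await asyncio.sleep(0)  # another request may run here and overwrite it
        return self.allow_failed


class PerOperationHook:
    """Settings keyed by operation_id, so requests cannot clobber each other."""

    def __init__(self) -> None:
        self.allow_failed: Dict[str, bool] = {}

    async def handle(self, operation_id: str, allow_failed: bool) -> bool:
        self.allow_failed[operation_id] = allow_failed
        await asyncio.sleep(0)
        return self.allow_failed[operation_id]


async def run_shared() -> List[bool]:
    hook = SharedSettingsHook()
    return await asyncio.gather(hook.handle(True), hook.handle(False))


async def run_isolated() -> List[bool]:
    hook = PerOperationHook()
    return await asyncio.gather(
        hook.handle("op-1", True), hook.handle("op-2", False)
    )


shared_result = asyncio.run(run_shared())      # first request sees the second's value
isolated_result = asyncio.run(run_isolated())  # each request sees its own value
```

In the shared version the first request asked for `allow_failed=True` but reads back `False`, because the second request wrote the attribute while the first was suspended.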

Bug Fix

  • Fixed missing await in sync after_success that would have caused incorrect behavior
  • Sync version now properly documents it doesn't support split PDF operations

Technical Changes

1. Async Hook Infrastructure

  • Added async hook interfaces: AsyncBeforeRequestHook, AsyncAfterSuccessHook, AsyncAfterErrorHook
  • Updated SDKHooks to execute async hooks in async contexts
  • Modified basesdk.py to call async hook methods in do_request_async
  • Maintains backward compatibility with sync hooks
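As a rough sketch of how an async path can coexist with existing sync hooks, assuming a much simpler shape than the generated SDK code: `HookRegistry`, the dict-based `Request`, and both example hooks are hypothetical and do not mirror the real SDKHooks signatures. The ordering (async hooks first, then sync hooks) follows the commit description.

```python
import asyncio
from typing import Awaitable, Callable, List

Request = dict  # simplified stand-in for an HTTP request object

SyncHook = Callable[[Request], Request]
AsyncHook = Callable[[Request], Awaitable[Request]]


class HookRegistry:
    """Hypothetical, simplified registry; not the SDK's actual SDKHooks class."""

    def __init__(self) -> None:
        self.sync_hooks: List[SyncHook] = []
        self.async_hooks: List[AsyncHook] = []

    def before_request(self, request: Request) -> Request:
        # Sync path: only sync hooks run.
        for hook in self.sync_hooks:
            request = hook(request)
        return request

    async def before_request_async(self, request: Request) -> Request:
        # Async path: awaited hooks first, then the legacy sync hooks.
        for async_hook in self.async_hooks:
            request = await async_hook(request)
        for hook in self.sync_hooks:
            request = hook(request)
        return request


def add_header(request: Request) -> Request:
    return {**request, "header": "set"}


async def split_pdf(request: Request) -> Request:
    await asyncio.sleep(0)  # placeholder for non-blocking work
    return {**request, "split": True}


registry = HookRegistry()
registry.sync_hooks.append(add_header)
registry.async_hooks.append(split_pdf)

sync_result = registry.before_request({"url": "/partition"})
async_result = asyncio.run(registry.before_request_async({"url": "/partition"}))
```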

2. SplitPdfHook Async Conversion

  • Converted _await_elements to async method that directly awaits coroutines
  • Implemented async versions of hook methods
  • Removed ThreadPoolExecutor workaround
  • Converted file I/O operations to async using aiofiles
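The idea of reading cached responses without blocking the loop can be sketched with the standard library alone. Note the substitution: this PR uses aiofiles, whereas the sketch below uses `asyncio.to_thread` (Python 3.9+) so it runs without third-party packages. The function names and the cache file layout are assumptions for illustration.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import List


def _read_json(path: Path) -> list:
    # Ordinary blocking read; only ever called from a worker thread below.
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


async def load_elements(path: Path) -> list:
    # Stand-in for an aiofiles read: the event loop stays free while the
    # worker thread performs the file I/O.
    return await asyncio.to_thread(_read_json, path)


async def load_all(paths: List[Path]) -> list:
    # Await every page directly; gather preserves the input order.
    pages = await asyncio.gather(*(load_elements(p) for p in paths))
    return [element for page in pages for element in page]


def write_cache(directory: Path) -> List[Path]:
    # Create three small cached "responses", one element per page.
    paths = []
    for i in range(3):
        path = directory / f"page_{i}.json"
        path.write_text(json.dumps([{"page": i}]), encoding="utf-8")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    elements = asyncio.run(load_all(write_cache(Path(tmp))))
```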

3. Per-Operation Settings Isolation

  • Changed settings from instance variables to per-operation dicts
  • Added operation-specific cleanup in _clear_operation
  • Added safety checks for tempdir existence
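A minimal sketch of per-operation storage with cleanup and default fallback might look like the following. The `OperationSettings` class, its method names, and the default value are assumptions made for illustration; only `allow_failed`, `cache_tmp_data_dir`, and the per-`operation_id` keying come from this PR.

```python
from typing import Dict

DEFAULT_ALLOW_FAILED = False  # assumed default, for illustration only


class OperationSettings:
    """Hypothetical container for settings keyed by operation_id."""

    def __init__(self) -> None:
        self.allow_failed: Dict[str, bool] = {}
        self.cache_tmp_data_dir: Dict[str, str] = {}

    def register(self, operation_id: str, allow_failed: bool, cache_dir: str) -> None:
        self.allow_failed[operation_id] = allow_failed
        self.cache_tmp_data_dir[operation_id] = cache_dir

    def get_allow_failed(self, operation_id: str) -> bool:
        # Fall back to the default when the operation is unknown or was cleared.
        return self.allow_failed.get(operation_id, DEFAULT_ALLOW_FAILED)

    def clear_operation(self, operation_id: str) -> None:
        # pop with a default keeps cleanup safe to call more than once.
        self.allow_failed.pop(operation_id, None)
        self.cache_tmp_data_dir.pop(operation_id, None)


settings = OperationSettings()
settings.register("op-1", allow_failed=True, cache_dir="/tmp/op-1")
settings.register("op-2", allow_failed=True, cache_dir="/tmp/op-2")
settings.clear_operation("op-1")
settings.clear_operation("op-1")  # second call is a no-op
```

After cleanup, a lookup for "op-1" returns the default while "op-2" keeps its own values, so finishing one request does not disturb another that is still in flight.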

Testing

Added comprehensive unit tests:

  • test_per_request_settings_isolation - Validates settings don't interfere
  • test_per_request_settings_cleanup - Validates proper cleanup
  • test_concurrent_async_operations_isolation - Simulates real concurrent scenarios
  • test_await_elements_uses_operation_settings - Validates correct settings usage
  • test_default_values_used_when_operation_not_found - Validates fallback behavior

Impact

Before:

  • ❌ partition_async blocked when making concurrent requests
  • ❌ Concurrent requests could overwrite each other's settings
  • ❌ Event loop blocked on file I/O and executor waits
  • ❌ Inconsistent behavior with concurrent requests

After:

  • ✅ True concurrent partition_async requests work correctly
  • ✅ Each request has isolated settings
  • ✅ No event loop blocking
  • ✅ Clean async/await patterns throughout
  • ✅ Comprehensive test coverage for concurrency scenarios

Commits

  1. Add async hook support to hook infrastructure - Foundation for async hooks
  2. Convert SplitPdfHook to use async hooks - Remove blocking operations
  3. Fix concurrency issues in SplitPdfHook - Per-operation settings + tests
  4. Make load_elements_from_response async - Complete async transformation

🤖 Generated with Claude Code


Note

Medium Risk
Touches core request/response hook execution for all async SDK calls and changes split-PDF request fan-out behavior; regressions could impact request handling and error propagation under concurrency.

Overview
Fixes partition_async concurrency by introducing async hook infrastructure and refactoring the split-PDF flow to avoid event-loop blocking and shared mutable state.

SDKHooks/basesdk.py now execute before_request/after_success/after_error via new async hook interfaces (while still running sync hooks for compatibility), and hook registration wires up the async variants.

SplitPdfHook is updated to use per-operation_id settings (e.g., allow_failed, cache flags/dir, concurrency) with cleanup, removes the prior thread/executor workaround, and switches cached-element loading to async file I/O; unit tests add coverage for settings isolation/cleanup and concurrent async behavior. Version/release metadata is bumped (plus lockfile regen).

Written by Cursor Bugbot for commit 1306dd8. This will update automatically on new commits.

awalker4 and others added 6 commits February 3, 2026 12:10
- Add AsyncSDKInitHook, AsyncBeforeRequestHook, AsyncAfterSuccessHook, and AsyncAfterErrorHook interfaces
- Update SDKHooks to store and execute both sync and async hooks
- Add async executor methods that run async hooks first, then sync hooks for backward compatibility
- Update basesdk.py do_request_async to call async hook methods
- Enables proper async/await patterns in hooks without blocking the event loop

This is the first step toward fixing ENG-792 where blocking calls in the split PDF hook prevent concurrent async requests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Implement AsyncBeforeRequestHook, AsyncAfterSuccessHook, and AsyncAfterErrorHook interfaces
- Convert _await_elements to async method that directly awaits coroutines
- Remove blocking executor.submit() and .result() calls that were blocking the event loop
- Remove ThreadPoolExecutor and _run_coroutines_in_separate_thread workaround
- before_request_async contains full implementation (CPU-bound work noted in docstring)
- Register async hook implementations in init_hooks
- Sync hooks still work for backward compatibility with partition()
- Async hooks enable true concurrent requests with partition_async()

Fixes ENG-792: partition_async now supports concurrent SDK requests without blocking.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Critical fixes:
- Fix missing await in sync after_success (would have caused bugs)
- Move instance-level settings (allow_failed, cache_tmp_data_feature, cache_tmp_data_dir) to per-operation dicts
- This prevents race conditions where concurrent requests would overwrite each other's settings
- Add safety check for tempdir existence before accessing in call_api_partial

Per-operation settings storage:
- Each operation_id now has its own isolated settings
- Settings are properly cleaned up in _clear_operation
- Default values are used when operation_id not found

Tests added:
- test_per_request_settings_isolation: Validates settings don't interfere between operations
- test_per_request_settings_cleanup: Validates proper cleanup
- test_concurrent_async_operations_isolation: Simulates real concurrent async operations
- test_await_elements_uses_operation_settings: Validates _await_elements uses correct settings
- test_default_values_used_when_operation_not_found: Validates fallback to defaults

This fixes the race conditions that caused inconsistent behavior and "event loop is closed" warnings when making concurrent partition_async requests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Convert load_elements_from_response to async function using aiofiles
- Update caller in _await_elements to await the async file read
- Prevents blocking the event loop during file I/O operations
- Each cached response file is now read asynchronously

This completes the async transformation of the split PDF hook,
ensuring no blocking I/O operations remain in async contexts.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

"""Code generated by Speakeasy (https://speakeasy.com). DO NOT EDIT."""

import httpx
import inspect

Unused inspect import in sdkhooks module

Low Severity

import inspect was added to sdkhooks.py but is never referenced anywhere in the file. This is dead code that was likely left over from development.

@awalker4 awalker4 marked this pull request as draft February 10, 2026 01:29