
fix(runtime): offload async inference work#250

Open
rylinjames wants to merge 1 commit into main from fix/runtime-async-predict-offload
rylinjames commented Jun 11, 2026

Collaborator

Summary

  • run non-batched predict_async() calls in the default executor instead of blocking the event loop
  • run _predict_batch_sync() from the batch worker through the default executor as well
  • add a generic ORT I/O Binding path for the denoise loop so constant inputs are bound once per chunk and per-step dynamic inputs use run_with_iobinding() when enabled
  • add regression tests proving both async paths yield while slow sync inference is running, plus a fake-session I/O Binding test that avoids session.run()
  • clean up stale server/test imports and add the missing os import used by the curate uploader path
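
A minimal sketch of the offload pattern described in the first two bullets, assuming the blocking work lives in a plain synchronous method. `SketchServer`, `_predict_sync`, and `demo` are hypothetical names for illustration and are not taken from `src/tether/runtime/server.py`. The heartbeat task is there only to show that the event loop keeps running while the slow call is in flight.

```python
import asyncio
import time


class SketchServer:
    """Hypothetical stand-in for the runtime server; not the real class."""

    def _predict_sync(self, payload):
        # Stand-in for slow, blocking inference work.
        time.sleep(0.2)
        return {"echo": payload}

    async def predict_async(self, payload):
        loop = asyncio.get_running_loop()
        # Passing None selects the loop's default ThreadPoolExecutor, so the
        # blocking call runs on a worker thread instead of the event loop.
        return await loop.run_in_executor(None, self._predict_sync, payload)


async def demo():
    ticks = 0

    async def heartbeat():
        # Counts how often the event loop gets a chance to run other tasks.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await SketchServer().predict_async("frame-0")
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If `predict_async()` called `_predict_sync()` directly, the heartbeat would not tick until inference finished. With the executor hop it should tick several times during the 0.2 s sleep.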

Tests

  • /Users/romirjain/Desktop/building\ projects/fastcrest/tether/.venv/bin/ruff check src/tether/runtime/server.py tests/test_server.py
  • PYTHONPATH=$PWD/src /Users/romirjain/Desktop/building\ projects/fastcrest/tether/.venv/bin/python -m py_compile src/tether/runtime/server.py tests/test_server.py
  • PYTHONPATH=$PWD/src /Users/romirjain/Desktop/building\ projects/fastcrest/tether/.venv/bin/python -m pytest tests/test_server.py::TestTetherServerWithMockORT::test_predict_async_offloads_non_batched_predict tests/test_server.py::TestTetherServerWithMockORT::test_batch_worker_offloads_sync_batch_predict tests/test_server.py::TestTetherServerWithMockORT::test_denoise_uses_iobinding_when_enabled tests/test_serve_e2e.py -p no:cacheprovider
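
For readers unfamiliar with the I/O Binding test named in the last command, here is a rough sketch of the idea under stated assumptions. `io_binding()`, `bind_cpu_input()`, `bind_output()`, `run_with_iobinding()`, and `copy_outputs_to_cpu()` mirror method names from the ONNX Runtime Python API, but `FakeSession`, `FakeBinding`, `denoise_chunk`, and the input names `cond`, `latent`, `timestep` are hypothetical and do not come from this PR. The fake uses plain floats where a real session would need NumPy arrays.

```python
class FakeBinding:
    """Hypothetical fake that records binds instead of touching ORT."""

    def __init__(self):
        self.inputs = {}
        self.bind_calls = []
        self.outputs = []

    def bind_cpu_input(self, name, value):
        self.inputs[name] = value
        self.bind_calls.append(name)

    def bind_output(self, name):
        self.outputs.append(name)

    def copy_outputs_to_cpu(self):
        # Toy "model": one denoise step adds the constant input to the latent.
        return [self.inputs["latent"] + self.inputs["cond"]]


class FakeSession:
    """Hypothetical fake session that fails if session.run() is used."""

    def __init__(self):
        self.binding = FakeBinding()
        self.iobinding_runs = 0

    def io_binding(self):
        return self.binding

    def run_with_iobinding(self, binding):
        self.iobinding_runs += 1

    def run(self, *args, **kwargs):
        raise AssertionError("session.run() should not be called")


def denoise_chunk(session, cond, latent, num_steps):
    binding = session.io_binding()
    # Constant input: bound once per chunk.
    binding.bind_cpu_input("cond", cond)
    binding.bind_output("latent_out")
    for step in range(num_steps):
        # Dynamic inputs: rebound on every step.
        binding.bind_cpu_input("latent", latent)
        binding.bind_cpu_input("timestep", step)
        session.run_with_iobinding(binding)
        latent = binding.copy_outputs_to_cpu()[0]
    return latent


if __name__ == "__main__":
    session = FakeSession()
    print(denoise_chunk(session, cond=1.0, latent=0.0, num_steps=3))
```

A test built this way can assert that the constant input is bound once, that the dynamic inputs are bound once per step, and that `run()` is never reached, all without onnxruntime installed.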

Note: tests/test_chunk_budget_integration.py could not run in this local venv because onnxruntime is not installed.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rylinjames force-pushed the fix/runtime-async-predict-offload branch from 9513555 to 8650bc5 on June 11, 2026 09:30