Skip to content

Add uvloop-inspired async task API#25

Open
benoitc wants to merge 21 commits intomainfrom
feature/async-task-api
Open

Add uvloop-inspired async task API#25
benoitc wants to merge 21 commits intomainfrom
feature/async-task-api

Conversation

@benoitc
Copy link
Owner

@benoitc benoitc commented Mar 11, 2026

Summary

  • Add thread-safe async task API to py_event_loop module
  • py_event_loop:run/3,4 - Blocking run of async Python functions
  • py_event_loop:create_task/3,4 - Non-blocking task submission
  • py_event_loop:await/1,2 - Wait for task result with timeout
  • py_event_loop:spawn_task/3,4 - Fire-and-forget task execution

Uses enif_send for thread-safe wakeup from dirty schedulers. Task queue protected by mutex. Results delivered via Erlang message passing.

Also adds erlang.sleep() and erlang.call() for sync context, plus explicit scheduling API (erlang.schedule(), erlang.schedule_py()).

benoitc added 21 commits March 11, 2026 08:39
Add erlang.sleep() function that works in both async and sync contexts:
- Async: returns asyncio.sleep() which uses Erlang timer system
- Sync: uses erlang.call('_py_sleep') callback with receive/after,
  truly releasing the dirty scheduler for cooperative yielding

Remove unused _erlang_sleep NIF which only released the GIL but blocked
the pthread. The callback approach properly suspends the Erlang process.

Changes:
- Add sleep() to _erlang_impl and export to erlang module
- Add _py_sleep callback in py_event_loop.erl
- Remove py_erlang_sleep NIF and dispatch_sleep_complete
- Remove sync_sleep fields from event loop struct
- Remove sleep handlers from py_event_worker
- Update tests to use erlang.sleep()
Update docstring and asyncio.md to clarify:
- Both sync and async modes release the dirty NIF scheduler
- Async: yields to event loop via asyncio.sleep()/call_later()
- Sync: suspends Erlang process via receive/after callback

Also fix outdated architecture diagram that referenced removed
sleep_wait/dispatch_sleep_complete NIF.
- Make erlang.call() blocking (no replay)
- Add erlang.schedule(), schedule_py(), consume_time_slice()
- ScheduleMarker type for explicit dirty scheduler release
Add documentation for:
- erlang.call() now blocks (no replay)
- erlang.schedule() for Erlang callback continuation
- erlang.schedule_py() for Python function continuation
- erlang.consume_time_slice() for cooperative scheduling
Implement uvloop-inspired thread-safe task submission for running
async Python coroutines from any Erlang dirty scheduler thread.

Core changes:
- Add task queue (ErlNifIOQueue) to event loop for atomic operations
- nif_call_soon_threadsafe: serialize and enqueue task, send wakeup
- nif_process_ready_tasks: dequeue and schedule tasks on event loop
- py_event_worker handles task_ready message to process queue

High-level Erlang API:
- py_event_loop:run/3,4 - blocking run, wait for result
- py_event_loop:create_task/3,4 - non-blocking, returns ref
- py_event_loop:spawn/3,4 - fire-and-forget with optional notify
- py_event_loop:await/1,2 - wait for task result

Uses enif_send() for thread-safe wakeup from any dirty scheduler,
avoiding the thread-local event loop issues with asyncio.
Implements a thread-safe async task queue that works from dirty schedulers:

- Add task_queue (ErlNifIOQueue) and py_loop fields to erlang_event_loop_t
- nif_submit_task: Thread-safe task submission via enif_ioq and enif_send
- nif_process_ready_tasks: Dequeue tasks, create coroutines, schedule on loop
- py_event_worker handles task_ready wakeup message
- High-level Erlang API: run/3,4, create_task/3,4, await/1,2, spawn_task/3,4
- Python ErlangEventLoop registers with global loop via _set_global_loop_ref
- Register callbacks early in supervisor to ensure availability
- Add uvloop-style lazy Python loop creation in process_ready_tasks
- Only call _run_once when coroutines are scheduled (not for sync functions)
- Use enif_send directly for sync function results (faster path)
- Fix queue size tracking in task processing loop

Before: 1003 ms/task (1 task/sec)
After: 0.009 ms/task (117K tasks/sec)
Pass timeout_hint=0 to _run_once() when coroutines are scheduled,
preventing the event loop from blocking for up to 1 second when
work is already pending. This matches uvloop's approach of
computing exact sleep times.

Changes:
- Add timeout_hint parameter to ErlangEventLoop._run_once()
- Update C code to pass timeout=0 after scheduling coroutines
- Add bench_channel_async.erl for sync vs async comparison
Reduces async task API overhead by:
- Early exit before GIL acquisition when task queue is empty
- Caching asyncio module and _run_and_send function across calls
- Only calling _run_once when coroutines are actually scheduled

Performance improvements:
- create_task + await: ~40% faster (157K vs 113K tasks/sec)
- Concurrent tasks: ~30% faster (360K vs 275K tasks/sec)
uvloop-style optimizations for the Python event loop:
- Handle pooling: reuse Handle objects in call_soon() instead of allocating
- Time caching: cache time.monotonic() at start of each _run_once iteration
- Clear context references when returning handles to pool

These reduce allocations and syscalls in the hot path.
Restructure nif_process_ready_tasks into two phases:
- Phase 1: Dequeue all tasks WITHOUT GIL (NIF operations only)
- Phase 2: Acquire GIL once, process entire batch, release

Benefits:
- GIL held only during Python operations, not NIF operations
- Batch up to 64 tasks per GIL acquisition
- Task queue mutex released before GIL acquired (no lock overlap)
SuspensionRequiredException inherits from BaseException, not Exception,
so the except Exception block didn't catch it. This caused the suspension
mechanism to replay the entire function, making time measurements show
~0 elapsed time instead of the actual sleep duration.

The fix catches BaseException and falls back to time.sleep() for correct
timing behavior in py:call contexts. For dirty scheduler release in sync
contexts, py:exec/py:eval should be used instead.
Add 15 new tests covering:
- Stdlib operations (math.sqrt, pow, floor, ceil)
- Operator module functions (add, mul)
- Error handling (invalid module, function, timeout)
- Concurrency (multiple processes, batch tasks)
- Edge cases (empty args, large results, nested data)

Tests use stdlib modules to avoid context issues with __main__.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant