
Embed python stdlib modules #6814

Draft
hoodmane wants to merge 442 commits into main from hoodmane/embed-python-stdlib-modules

@hoodmane

Contributor

No description provided.

mikea and others added 30 commits May 28, 2026 17:05
update-deps: always obtain github token

See merge request cloudflare/ew/workerd!175
Convert WorkerEntrypoint::request from a .then/.catch_/.attach promise
chain to a C++20 coroutine with KJ_TRY/KJ_CATCH and KJ_DEFER. The new
form is organized into four labeled stages:

  Stage 1: Set up per-request state (incomingRequest, tracer, metrics).
  Stage 2: Run the JS request handler under context.run().
  Stage 3: Wait for the deferred-proxy task.
  Stage 4: Handle any exception escaped from the above (in the outer
           KJ_CATCH): actor tunnel / fail-open / worker-to-worker
           tunnel / synthesize 5xx.

Other readability changes folded in:

  - Extract buildFetchEventInfo() as a free helper in the anonymous
    namespace; this replaces a multi-step inline header canonicalization
    block inside Stage 1.
  - Inline the previously-separate sendErrorResponse() error-handling
    logic into Stage 4's KJ_CATCH body, with a sendSyntheticStatus
    lambda for the shared synth-5xx code path.
  - Drop redundant comments where the code is self-explanatory.

Two subtleties preserved from the original chain require explicit
handling in coroutine form:

  - proxyTask cleanup: The original chain attached
    .attach(kj::defer(proxyTask = kj::none)) at the top level so that
    the defer ran on cancellation of the overall returned promise,
    including when the fail-open fallback took over. To match, the
    coroutine puts KJ_DEFER({ proxyTask = kj::none; }) at the outermost
    KJ_TRY scope rather than scoped to Stage 3 (per Harris's review on
    MR !162; the narrower scope was effectively a no-op since 'co_await
    p' consumes the promise before the defer would fire).
  - Cancellation ordering: In the original chain, the boundary between
    a stage's .catch_ and the next .then was an implicit yield point.
    Without it, downstream observers (e.g. rpc-handler's
    addStartupException) interpret a canceled request as one that
    threw, which changes the order in which stage events are reported
    in the request log. The coroutine restores this by inserting
    'co_await kj::yield()' before each rethrow in the Stage 2 / Stage 3
    / Stage 4-isActor catches.
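
The proxyTask cleanup point above can be illustrated without KJ. The sketch below is a stand-in built on assumptions: `ScopeGuard` plays the role of KJ_DEFER, `SketchEntrypoint` and its integer statuses are hypothetical, and a plain exception replaces coroutine cancellation. It only shows why a guard placed at the outermost try scope clears `proxyTask` on every exit path, including the one that falls through to the Stage 4 handler.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>

// Minimal RAII guard standing in for KJ_DEFER (assumption: not KJ's implementation).
template <typename F>
class ScopeGuard {
public:
  explicit ScopeGuard(F f) : fn(std::move(f)) {}
  ~ScopeGuard() { fn(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
  F fn;
};

// Hypothetical stand-in for WorkerEntrypoint; the int models the deferred-proxy promise.
struct SketchEntrypoint {
  std::optional<int> proxyTask;

  int request(bool handlerThrows) {
    // Nothing should be left over from an earlier request.
    assert(!proxyTask.has_value());
    try {
      // Guard at the outermost try scope: runs on normal return and on unwinding.
      ScopeGuard clear([this] { proxyTask.reset(); });

      proxyTask = 1;  // Stage 2 produced a deferred-proxy task.
      if (handlerThrows) {
        throw std::runtime_error("handler failed");
      }
      return 200;  // Stage 3 completed normally.
    } catch (const std::exception&) {
      // Stage 4: the guard already ran while the stack unwound,
      // so the synthesized error path never sees a stale task.
      return proxyTask.has_value() ? -1 : 500;
    }
  }
};
```

Calling `request(true)` on a `SketchEntrypoint` yields the synthesized 500 status with `proxyTask` already empty, which is the property the outermost placement is meant to guarantee.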
Check for banned name on SQLite rename

See merge request cloudflare/ew/workerd!173
Refactor WorkerEntrypoint::request to coroutine

See merge request cloudflare/ew/workerd!162
containers: Add SpanContext to relevant container.capnp calls

See merge request cloudflare/ew/workerd!176
update deps

See merge request cloudflare/ew/workerd!179
A more thorough set of fixes for crypto TOCTOU issues

See merge request cloudflare/ew/workerd!138
Frankenvalue::fromCapnpImpl() accumulated per-node UInt32 capTableSize
wire fields into a 32-bit `uint capCount` with no overflow check. The
only validation was a final-sum equality check against capTable.size(),
which an attacker could satisfy by crafting capTableSize values that wrap
around 2^32 (e.g. root=0x80000000, child=0x80000001 → wrapped sum=1).
The resulting Property::capTableOffset/capTableSize values became
arbitrary 32-bit garbage used as ArrayPtr::slice() bounds in toJsImpl(),
leading to out-of-bounds heap reads in the multi-tenant parent process.

The fix widens capCount to size_t (preventing 32-bit wrap) and adds a
per-node KJ_REQUIRE that each node's declared capTableSize does not
exceed the remaining unconsumed slots before accumulation. The existing
final-sum equality check is retained as a belt-and-suspenders guard.
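
The wraparound and the per-node check can be sketched in standard C++. This is an illustration only, not the code from Frankenvalue::fromCapnpImpl(): the function names `sumWrapping` and `sumChecked`, the flat vector of node sizes, and the exception type are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Pre-fix shape: a 32-bit accumulator wraps modulo 2^32, so only the
// final sum is ever compared against the capability table size.
inline uint32_t sumWrapping(const std::vector<uint32_t>& nodeSizes) {
  uint32_t total = 0;
  for (uint32_t n : nodeSizes) {
    total += n;  // unsigned overflow wraps silently
  }
  return total;
}

// Post-fix shape: a size_t accumulator plus a per-node bound check
// against the slots that have not been consumed yet.
inline size_t sumChecked(const std::vector<uint32_t>& nodeSizes, size_t capTableLen) {
  size_t consumed = 0;
  for (uint32_t n : nodeSizes) {
    assert(consumed <= capTableLen);  // invariant kept by the check below
    if (n > capTableLen - consumed) {
      throw std::length_error("capTableSize exceeds remaining capabilities");
    }
    consumed += n;
  }
  // Final-sum equality check kept as a second guard.
  if (consumed != capTableLen) {
    throw std::length_error("capTableSize sum does not match capTable size");
  }
  return consumed;
}
```

With the sizes from the finding, `sumWrapping({0x80000000u, 0x80000001u})` returns 1 and would pass an equality check against a one-entry table, whereas `sumChecked` with the same input and a table length of 1 throws at the first node.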

Coverage: This commit ships two regression tests in frankenvalue-test.c++
that exercise the patched code path. The first test constructs the exact
uint32 overflow construction from the finding (root capTableSize=0x80000000,
property capTableSize=0x80000001, capTable.size()=1) and asserts that
fromCapnp throws "capTableSize exceeds". The second test verifies the
simpler case of a single node claiming more caps than exist.

Test validation: VALIDATED LOCALLY
Pre-patch run: FAIL (bazel test //src/workerd/io:frankenvalue-test@ --test_filter="capTableSize uint32 overflow")
Post-patch run: PASS (bazel test //src/workerd/io:frankenvalue-test@ --test_filter="capTableSize")

Refs: AUTOVULN-EW-EDGEWORKER-15
The user-tracing API (ctx.tracing.enterSpan, ctx.tracing.Span, and
`import { tracing } from 'cloudflare:workers'`) was introduced behind
the workerdExperimental compatibility flag in #6608. The surface is
purely additive — making it generally available cannot break existing
workers — so remove the experimental gate from ExecutionContext::getTracing()
and let the property be accessible on every compatibility date.

Because the value is no longer optional, change the return type from
jsg::Optional<jsg::Ref<Tracing>> to jsg::Ref<Tracing>; the generated
typings move from `tracing?: Tracing` to `tracing: Tracing`, so
user code no longer needs to null-check.

Drop the "experimental" compatibility flag and the --experimental
CLI argument from the three tracing tests; they exercise the API on
its own merits now.
When passing an ArrayBuffer through the utility asBytes,
we recently updated it to copy if the ArrayBuffer is
resizable. Let's also prepare for the arrival of immutable
ArrayBuffers by copying if IsImmutable is
true. Since the backing store would otherwise be shared
and expected to be mutable, this is the safest option.
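
The copy-or-share decision described above might look roughly like the following. This is a sketch under assumptions, not the asBytes utility itself: `SketchArrayBuffer`, `BackingStore`, and `asBytesSketch` are hypothetical names, and a shared vector stands in for a V8 backing store.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

using BackingStore = std::vector<uint8_t>;

// Hypothetical model of an ArrayBuffer: a shared store plus two flags.
struct SketchArrayBuffer {
  std::shared_ptr<BackingStore> store;
  bool resizable = false;
  bool immutable = false;
};

// Share the store only when it is safe to treat it as fixed-size and
// mutable; otherwise hand back a private copy.
inline std::shared_ptr<BackingStore> asBytesSketch(const SketchArrayBuffer& buf) {
  assert(buf.store != nullptr);
  if (buf.resizable || buf.immutable) {
    return std::make_shared<BackingStore>(*buf.store);
  }
  return buf.store;
}
```

Copying in both cases trades some memory for the guarantee that the caller never writes into a store whose owner expects it to stay unchanged or to change size underneath it.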
Immutable ArrayBuffer is arriving soon, prepare for it

See merge request cloudflare/ew/workerd!174
Remove jsg::BufferSource from vfs

See merge request cloudflare/ew/workerd!170
Since we are copying the data into the write queue anyway,
we can allow users to write strings and SABs directly for
an ergonomic improvement. With strings in particular, this
avoids the users having to create a TextEncoder just to write
text to the stream, which is an exceedingly common use case.
VULN-136635: fix(io): reject Frankenvalue capTableSize overflow in fromCapnp

See merge request cloudflare/ew/workerd!103
Copy write buffers defensively

See merge request cloudflare/ew/workerd!172
Promote ctx.tracing user spans API out of experimental

See merge request cloudflare/ew/workerd!181
This was always used to wrap either drain() or finishScheduled(), so it's cleaner to actually do it in those two places.

In addition to looking nicer, this is needed for subsequent changes. It also conveniently resolves an ugly TODO under WorkerEntrypoint::customEvent().
`drain()` itself is now responsible for attaching the `IncomingRequest` to the promise and adding that to `waitUntilTasks`.

Since `drain()` is called from many call sites, they all need to be updated. However, going forward this makes it easier to adjust the semantics of `drain()`, as will happen in the next commit.
This is sort of like what the previous commit did with drain(), except in the case of finishScheduled(), the caller actually waits for the results rather than adding it to `waitUntilTasks`.

Every call site of `finishScheduled()` was also doing some additional calls on the `IoContext` after `finishScheduled()` completed, but some of this logic was literally the same for all call sites. Since the `IoContext` will now be gone by the time `finishScheduled()` completes, we need to move this logic into `finishScheduled()`, but this is a nice cleanup anyway.

The specific call site in WorkerEntrypoint::test() was confusing. The weird logic around onAbort() there relied on the assumption that `finishScheduled()` only returned `EventOutcome::EXCEPTION` in the case of an abort -- a non-abort exception was reported via `waitUntilStatus()` instead. The purpose here is just to make sure these exceptions get logged in tests, so I simplified to a `KJ_LOG(INFO)`.
The timeout for "small" is *extremely* short, and this test produces a stack trace which takes some time to decode.
This is particularly important with facets, where we want the SQLite database to be closed IMMEDIATELY so that we can then potentially delete it or replace it. If we let the SQLite database hang around for a bit, it might try to delete its WAL file later after that file has already been replaced by a new one.
1. Deleting a name that never existed failed spuriously.
2. WAL and SHM files were not deleted. Typically SQLite cleans these up automatically, but not always.
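
The two deletion fixes listed above can be sketched with std::filesystem. This is a minimal illustration, not the workerd implementation: `removeDatabaseFiles` and `kSidecarSuffixes` are hypothetical names. The `-wal` and `-shm` suffixes follow SQLite's naming for write-ahead-log and shared-memory sidecar files.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// The main database file plus SQLite's WAL and shared-memory sidecars.
static constexpr const char* kSidecarSuffixes[] = {"", "-wal", "-shm"};

// Removes the database and its sidecar files. A name that never existed
// counts as success; only real filesystem errors make this return false.
inline bool removeDatabaseFiles(const fs::path& dbPath) {
  assert(!dbPath.empty());
  bool ok = true;
  for (const char* suffix : kSidecarSuffixes) {
    fs::path target = dbPath;
    target += suffix;  // appends to the filename, no separator added
    std::error_code ec;
    fs::remove(target, ec);  // a missing file returns false and leaves ec clear
    if (ec) {
      ok = false;
    }
  }
  return ok;
}
```

Using the `std::error_code` overload is what keeps a missing file from being reported as a failure, since absence is not an error for `fs::remove`.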
hoodmane and others added 23 commits June 11, 2026 20:04
Remove more unneeded package loading machinery

See merge request cloudflare/ew/workerd!272
Simplify snapshot.ts after removal of builtin packages

See merge request cloudflare/ew/workerd!282
Revert "Merge branch 'jasnell/fixup-internal-safety-moar-2' into 'gitlab'"

See merge request cloudflare/ew/workerd!285
V8 can use MPKs to implement what they call ThreadIsolation.

Among other things, ThreadIsolation prevents write access to
executable pages outside of JIT compiler scopes.

This is separate from the MPK-based isolate separation that
we already have.  One MPK key was already reserved by V8 for
ThreadIsolation, but it was unused.  This means this change
does not reduce the number of keys available for isolate
separation.

This change fixes things both for workerd and for an embedder
(the embedder also has to override the function to thread
through the change).
Enable V8's MPK-based ThreadIsolation.

See merge request cloudflare/ew/workerd!277
Cherry-pick !278 to fix V8 15.0 fatals/segfaults

See merge request cloudflare/ew/workerd!289
Add auto_grpc_convert compat flag and cf.grpcWeb type

See merge request cloudflare/ew/workerd!240
Revert V8 15.0

See merge request cloudflare/ew/workerd!288
…lity

Add a no-op virtual RequestObserver::setNextSubrequestBodyRewindable(
SubrequestBodyRewindable) and call it from fetchImplNoOutputLock with
jsRequest->canRewindBody() just before getClientWithTracing, while the JS-level
request is still in hand.

The signal is a property of the request payload (a buffered/null body can be
safely replayed; a stream body cannot), not of the callee, so it is set
unconditionally on every fetch. It is only consumed when the target is an
actor, where edgeworker uses it to classify retry eligibility for disconnected
outgoing actor calls. The set->getClientWithTracing->wrapSubrequestClient
sequence is synchronous, so the stashed value always corresponds to the next
outgoing call with no stale-attribution risk.

No behavior change: the base observer implementation is a no-op.

Add a test (fetch-body-rewindable-test) covering that fetch forwards the correct
value: a buffered body reports rewindable, a stream body reports not. A single
RequestObserver is shared across every outgoing subrequest in an IoContext, so
the test issues two consecutive fetches in one invocation to verify the per-body
mapping, the per-call sequencing, and the absence of stale attribution. To
support it, TestFixture gains an optional requestObserverFactory so tests can
inject a RequestObserver that records the hook's arguments.

Release note: None.
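
The shape of the hook described above can be sketched in plain C++. Only the names RequestObserver, setNextSubrequestBodyRewindable, SubrequestBodyRewindable, and canRewindBody come from the commit message. The enum values, `RecordingObserver`, `SketchRequest`, and `fetchSketch` are assumptions for illustration and do not reflect the actual workerd types.

```cpp
#include <cassert>
#include <vector>

// Assumed enum values; the commit message does not spell them out.
enum class SubrequestBodyRewindable { No, Yes };

class RequestObserver {
public:
  virtual ~RequestObserver() = default;
  // Base implementation is a no-op, so existing observers are unaffected.
  virtual void setNextSubrequestBodyRewindable(SubrequestBodyRewindable) {}
};

// Test-style observer that records every value it is handed.
class RecordingObserver final : public RequestObserver {
public:
  void setNextSubrequestBodyRewindable(SubrequestBodyRewindable value) override {
    seen.push_back(value);
  }
  std::vector<SubrequestBodyRewindable> seen;
};

// Hypothetical request: a buffered or null body can be replayed, a stream cannot.
struct SketchRequest {
  bool bodyIsStream;
  bool canRewindBody() const { return !bodyIsStream; }
};

// The hook is set unconditionally, just before the client would be obtained.
inline void fetchSketch(RequestObserver& observer, const SketchRequest& request) {
  observer.setNextSubrequestBodyRewindable(
      request.canRewindBody() ? SubrequestBodyRewindable::Yes : SubrequestBodyRewindable::No);
  // Real code would obtain the subrequest client here, synchronously.
}

// Two consecutive fetches against one shared observer, as in the described test.
inline void exampleTwoFetches() {
  RecordingObserver observer;
  fetchSketch(observer, {false});
  fetchSketch(observer, {true});
  assert(observer.seen.size() == 2);
}
```

Because one observer is shared across subrequests, recording the values in call order is enough to check both the per-body mapping and the absence of stale attribution.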
* Make Data assert on handle fire in release.

See merge request cloudflare/ew/workerd!292
Make Data assert on handle fire in release.

See merge request cloudflare/ew/workerd!292
STOR-5277: Add RequestObserver hook for outgoing subrequest replayability

See merge request cloudflare/ew/workerd!280
Patch 0013 added support for cross-context promise settlement to
JSPromise::Fulfill/Reject by only deferring reaction triggering — the promise
status was updated immediately.

Move the cross-context check before the status update so the entire settlement
(status update and reactions) is deferred to the owning IoContext.

Also add dcheck to Torque RejectPromise matching FulfillPromise.

Update cross-context-promise-test to account for the changed settlement
timing: the unhandledrejection event now fires in the correct IoContext,
and synchronous inspect() after cross-context resolve may show pending.
Fix cross-context promise settlement assertion failure

See merge request cloudflare/ew/workerd!261
Original commit message:
  This adds a new TracedReference mode to V8 where slots
  don't get reused until the C++ destructor of the holding
  C++ object has been run.

This appears to cause V8 fatals of the form:
  Check failed: node->IsInUse()
Sentry issue 39273904
@github-actions

github-actions Bot commented Jun 15, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


14 out of 15 committers have signed the CLA.
✅ [erikcorry](https://github.com/erikcorry)
✅ [mikea](https://github.com/mikea)
✅ [danlapid](https://github.com/danlapid)
✅ [mar-cf](https://github.com/mar-cf)
✅ [gabivlj](https://github.com/gabivlj)
✅ [jasnell](https://github.com/jasnell)
✅ [npaun](https://github.com/npaun)
✅ [dcarney-cf](https://github.com/dcarney-cf)
✅ [ketanhwr](https://github.com/ketanhwr)
✅ [dom96](https://github.com/dom96)
✅ [kentonv](https://github.com/kentonv)
✅ [git-bruh](https://github.com/git-bruh)
✅ [ObsidianMinor](https://github.com/ObsidianMinor)
✅ [fhanau](https://github.com/fhanau)
@matt Simpson
Matt Simpson does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@hoodmane hoodmane force-pushed the hoodmane/embed-python-stdlib-modules branch 3 times, most recently from b78218c to 46fcff5 Compare June 15, 2026 19:57
and remove package loading code paths. More cleanup after removing builtin
packages.
@hoodmane hoodmane force-pushed the hoodmane/embed-python-stdlib-modules branch from 46fcff5 to c6ee075 Compare June 15, 2026 20:35
Update python_metadata.bzl with new bundle info

This commit updates the backport and integrity values in python_metadata.bzl
based on the latest Pyodide bundle upload.

🤖 Generated automatically by release-python-runtime workflow