Skip to content

fix: optimize e2e actor path#4551

Draft
MasterPtato wants to merge 1 commit into04-02-feat_pb_actor_v1_to_v2_migrationfrom
04-03-fix_optimize_e2e_actor_path
Draft

fix: optimize e2e actor path#4551
MasterPtato wants to merge 1 commit into04-02-feat_pb_actor_v1_to_v2_migrationfrom
04-03-fix_optimize_e2e_actor_path

Conversation

@MasterPtato
Copy link
Copy Markdown
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link
Copy Markdown

railway-app bot commented Apr 4, 2026

🚅 Deployed to the rivet-pr-4551 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-cloud ❌ Build Failed (View Logs) Web Apr 4, 2026 at 4:00 am
frontend-inspector ❌ Build Failed (View Logs) Web Apr 4, 2026 at 4:00 am
mcp-hub ✅ Success (View Logs) Web Apr 4, 2026 at 3:59 am
website ❌ Build Failed (View Logs) Web Apr 4, 2026 at 3:59 am
kitchen-sink ❌ Build Failed (View Logs) Web Apr 4, 2026 at 3:59 am
ladle ❌ Build Failed (View Logs) Web Apr 4, 2026 at 3:59 am

@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Code Review: fix: optimize e2e actor path

Overall this is a solid set of targeted improvements: adding caching to the namespace ops, better tracing instrumentation across the e2e path, and two new workflow signals (Sleep/Reallocate). A few issues worth addressing before merge.


Issues

test-envoy/src/index.ts/foo endpoint left in

The file contains a dead test endpoint (app.post("/foo", ...) at line 203) that duplicates the /api/rivet/start handler but uses setTimeout to defer startup. This looks like a leftover from debugging the timing fix. It should be removed before this lands.

test-envoy/src/index.ts/api/rivet/start SSE ping before envoy is ready

// /api/rivet/start (lines 184-200)
const stopped = Promise.withResolvers<void>();
const envoy = startEnvoySync({ ... });
// no initial ping here - correct

// /foo (lines 210-232) 
setTimeout(() => {
    const envoy = startEnvoySync({ ... });
}, 0);
await stream.writeSSE({ event: "ping", data: "" }); // ping before envoy attaches
await stopped.promise;

The /foo endpoint sends the initial SSE ping before the envoy actually connects, which is what the setTimeout is trying to work around. If this pattern is the right fix for the race, it should replace /api/rivet/start rather than exist alongside it. If /api/rivet/start is already correct, drop /foo.

pegboard/src/workflows/actor2/mod.rsMain::Sleep silently ignores Allocating/Starting/GoingAway (line 828)

Transition::Allocating { .. }
| Transition::Starting { .. }
| Transition::GoingAway { .. } => {
    // TODO: Set to sleep after allocation is complete
}

The TODO is fine as an acknowledged deferral, but GoingAway is grouped with the not-yet-started transitions. An actor in GoingAway has already received a CommandStopActor(GoingAway) and is on its way out — it seems closer to Destroying in behavior. Is silently ignoring the sleep intent here correct, or should it set rewake_after_stop: false analogously to how Reallocate handles SleepIntent/StopIntent by transitioning those to GoingAway? Worth a quick sanity check on the desired state machine behavior.

scripts/tests/actor_e2e.ts — debug timestamps left in

Lines 1024 and 1032 add raw console.log(new Date().toISOString()) calls flanking the timed request. These appear to be leftover instrumentation from profiling the e2e path. They should be removed or converted to a proper console.time/timeEnd pair.

engine/sdks/typescript/test-envoy/src/index.ts — log message uses template literal instead of structured field

getLogger().info({
    msg: `Received SSE request`,  // line 178
});

The backtick here is harmless but inconsistent with the adjacent structured log at line 205 which also passes payload as a structured field. Minor nit.


Observations (non-blocking)

pegboard-outbound/src/lib.rsendpoint_url.clone() on line 265

let req = client
    .post(endpoint_url.clone())
    .body(payload)
    .headers(headers);

The clone is needed because endpoint_url is used again in the subsequent debug log, which is fine. Just noting it is intentional and correct.

pools/src/pools.rs — reqwest client eagerly initialized

The comment "Initialize here to avoid cold starts elsewhere" is accurate and helpful. The behavior change is intentional and correct per the CLAUDE.md guidance to reuse the shared client.

namespace/src/ops/get_local.rscache.resolve(&&namespace_id, ns) (line 46)

The double-reference matches the existing pattern in get_global.rs, so this is consistent, not a bug.

envoy-client/src/tasks/envoy/index.tsonShutdown call moved to after cleanup

Moving ctx.shared.config.onShutdown() to after ctx.actors.clear() is the right fix — callers should not observe a resolved stopped promise while actor handles are still open. The log message change from "stopping envoy" to "envoy stopped" is accurate.

logfmt.tsLOGGER_CONFIG.enableInspect is always true

The LOGGER_CONFIG object is exported and its enableInspect field is checked in castToLogValue, but nothing in the added code sets it to false. If this is meant to be configurable at runtime (e.g., via an env var), that wiring is missing. If it is always true, the branch can be simplified. Not a blocker for a test helper.


Summary

Two things to fix before merge:

  1. Remove the /foo debug endpoint from test-envoy/src/index.ts.
  2. Remove the stray console.log(new Date().toISOString()) lines from scripts/tests/actor_e2e.ts.

The rest of the changes (caching namespace ops, tracing spans, signal handling, logfmt extraction, Bun server support) look correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant