Skip to content

chore(kitchen-sink): add rivet cloud deploy workflow#5100

Open
NathanFlurry wants to merge 39 commits into
mainfrom
nathan/kitchen-sink-rivet-deploy
Open

chore(kitchen-sink): add rivet cloud deploy workflow#5100
NathanFlurry wants to merge 39 commits into
mainfrom
nathan/kitchen-sink-rivet-deploy

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Adds a kitchen-sink-scoped GitHub Action that builds examples/kitchen-sink/Dockerfile and deploys it to a Rivet Cloud managed pool via rivet-dev/deploy-action@v1.1.0.

  • Path-filtered to examples/kitchen-sink/** and the rivetkit / engine SDK / shared TS packages the Dockerfile copies in.
  • Managed-pool env sets PORT=8080 to match the Dockerfile's EXPOSE 8080.
  • Uses the kitchen-sink-scoped KITCHEN_SINK_RIVET_CLOUD_TOKEN repo secret.
  • Documented in examples/kitchen-sink/CLAUDE.md.

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 24, 2026

🚅 Deployed to the rivet-pr-5100 environment in rivet-frontend

Service Status Web Updated (UTC)
website 😴 Sleeping (View Logs) Web May 24, 2026 at 6:42 pm
frontend-cloud 😴 Sleeping (View Logs) Web May 24, 2026 at 6:41 pm
ladle ✅ Success (View Logs) Web May 24, 2026 at 6:33 pm
frontend-inspector ❌ Build Failed (View Logs) Web May 24, 2026 at 6:33 pm
mcp-hub ✅ Success (View Logs) Web May 24, 2026 at 6:31 pm
kitchen-sink ❌ Build Failed (View Logs) Web May 24, 2026 at 6:30 pm

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 24, 2026

Code Review: PR #5100 — chore(kitchen-sink): add rivet cloud deploy workflow

This PR adds significant observability infrastructure (Prometheus metrics in envoy-client and rivetkit-core), a kitchen-sink GitHub Actions deploy workflow, and several new kitchen-sink actors/scripts. The metrics work is thorough, but a few correctness and security issues are worth addressing.


🚨 Security

/metrics endpoint is unauthenticated (rivetkit-rust/packages/rivetkit-core/src/registry/http.rs)

handle_metrics_fetch calls render_prometheus_metrics directly without invoking authorize_metrics_request. The authorize_metrics_request function in metrics_endpoint.rs checks RIVETKIT_METRICS_ENABLED=1 and validates a bearer token from RIVETKIT_METRICS_TOKEN, but that code path is never reached from the HTTP handler. Any untrusted actor-level HTTP client can GET /metrics on any actor and receive the full Prometheus output without a token.

// http.rs — no auth check before calling render
fn handle_metrics_fetch(request: &Request, envoy_status: Option<&CoreEnvoyStatus>) -> Result<HttpResponse> {
    let metrics = crate::metrics_endpoint::render_prometheus_metrics(envoy_status)?;
    bytes_response(StatusCode::OK, &metrics.content_type, metrics.body)
}

The fix is to extract the Authorization header from request, call authorize_metrics_request(bearer_token), and return an appropriate error response on Err.


🐛 Correctness

is_ping_healthy() returns false on a fresh envoy (startup regression) (engine/sdks/rust/envoy-client/src/handle.rs)

The doc comment says "Fresh envoys start healthy until the threshold elapses without a ping", but the implementation contradicts this. last_ping_ts initializes to 0; last_ping_at_ms() returns None when the value is 0; and is_some_and(...) on None is always false. So a newly-started envoy reports itself as unhealthy before the first ping arrives.

pub fn is_ping_healthy(&self) -> bool {
    self.last_ping_age_ms()
        .is_some_and(|age_ms| age_ms < Self::PING_HEALTHY_THRESHOLD_MS)  // None → false
}

Fix: return true when last_ping_at_ms() is None (never received a ping means the envoy is newly connected, not failing).


outbound_queue_depth gauge leaks on Stop/eviction exits (engine/sdks/rust/envoy-client/src/envoy.rs)

outbound_queue_depth is incremented for every BufferTunnelMsg pushed (line 439), but is only decremented inside resend_buffered_tunnel_messages (called on reconnect). When the envoy loop exits via ToEnvoyMessage::Stop (line 488) or a ConnClose { evict: true } break (line 422), ctx.buffered_messages is dropped without a matching sub(). Over multiple eviction/stop cycles the gauge permanently overstates queue depth.

Fix: drain and sub ctx.buffered_messages in the cleanup block after the loop, or inside the Stop/eviction arms before breaking.


ws_tx_depth gauge leaks when write task is aborted (engine/sdks/rust/envoy-client/src/connection/native.rs)

After the read loop exits, write_handle.abort() (line 330) is called without draining the write task's channel. Any WsTxMessage::Send items queued in ws_rx at that moment had their ws_tx_depth incremented by ws_send, but the matching fetch_sub in the write task's recv loop will never fire because the task was aborted. Across reconnect cycles under connection instability, ws_tx_depth accumulates indefinitely.


JsRegistryRouteResponse type used but never declared (rivetkit-typescript/packages/rivetkit-napi/index.d.ts)

Lines 320–322 declare health(), metadata(), and metrics() as returning JsRegistryRouteResponse, but no such interface is defined anywhere in index.d.ts. TypeScript consumers will get Cannot find name 'JsRegistryRouteResponse' and be unable to use these methods with type checking.


🔧 Style / CLAUDE.md violations

Mutex<HashMap> and std::sync::Mutex in async context (engine/sdks/rust/envoy-client/src/context.rs)

pub actors: Arc<StdMutex<HashMap<String, HashMap<u32, SharedActorEntry>>>>,
pub live_tunnel_requests: Arc<StdMutex<HashMap<[u8; 8], String>>>,
pub pending_hibernation_restores: Arc<StdMutex<HashMap<...>>>,

CLAUDE.md: "Never use Mutex<HashMap<...>>. Use scc::HashMap (preferred)." and "Async Rust code defaults to tokio::sync::Mutex. Do not use std::sync::Mutex." These maps are accessed from async code paths (wait_actor_registered_then_stopped, http_request_counter). A future change that holds a guard across .await will silently deadlock the Tokio thread pool. Prefer scc::HashMap.


RIVETKIT_METRICS_ENABLED=true silently disables metrics (rivetkit-rust/packages/rivetkit-core/src/metrics_endpoint.rs)

configured_metrics_token() only treats "1" as enabled. Setting RIVETKIT_METRICS_ENABLED=true or RIVETKIT_METRICS_ENABLED=yes (common operator conventions) silently returns None and disables the endpoint with no diagnostic. Consider accepting any non-empty string, or documenting the exact expected value prominently.


Positive notes

  • Metrics naming is clean with bounded label sets throughout — the explicit comment discouraging actor_id/request_id labels is a good guard.
  • ws_tx_lock_wait_duration_seconds / ws_tx_lock_hold_duration_seconds split is a useful debugging aid for isolating lock contention vs. encode latency.
  • subtle::ConstantTimeEq usage for token comparison in authorize_metrics_request is the right call.
  • SQLite corruption fuzz tests (sqlite_corruption_fuzz.rs) are a welcome addition.

@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 4971f55 to 674bdef Compare May 24, 2026 00:10
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 674bdef to a9f91b3 Compare May 24, 2026 00:14
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from a9f91b3 to 5a5c57d Compare May 24, 2026 02:15
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 5a5c57d to 105d3ba Compare May 24, 2026 02:29
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 105d3ba to f1500d7 Compare May 24, 2026 02:38
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from f1500d7 to 7a03edc Compare May 24, 2026 02:49
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 7a03edc to 142a4ca Compare May 24, 2026 03:16
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 142a4ca to c4d3c0e Compare May 24, 2026 03:26
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from c4d3c0e to 2a6829a Compare May 24, 2026 03:39
@rivet-dev rivet-dev deleted a comment from github-actions Bot May 24, 2026
@rivet-dev rivet-dev deleted a comment from github-actions Bot May 24, 2026
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 2a6829a to 030662e Compare May 24, 2026 04:00
@rivet-dev rivet-dev deleted a comment from github-actions Bot May 24, 2026
@NathanFlurry NathanFlurry force-pushed the nathan/kitchen-sink-rivet-deploy branch from 030662e to 628a849 Compare May 24, 2026 04:11
NathanFlurry and others added 28 commits May 24, 2026 11:26
… for ws transport and sqlite request lifecycle
chore(envoy-client): init new backpressure tracking fields
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants