[Bug]: RIE crash for steps exceeding 60 seconds due to Lambda client in local runner missing read_timeout configuration #203

@willmizzi

Description

Expected Behavior

When running durable functions locally via SAM CLI (sam local start-lambda) and using the js-sdk, steps that take longer than 60 seconds should complete successfully, up to the configured Lambda function Timeout.

Actual Behavior

Any durable function step that takes longer than ~55-60 seconds causes the emulator to re-invoke the Lambda, which crashes the Lambda Runtime Interface Emulator (RIE) with a nil pointer dereference:

START RequestId: 6ec8cb31-81ab-46a7-a5fa-0eaee703fb45 Version: $LATEST
Starting 60s sleep...
Sleep tick: 1s / 60s
Sleep tick: 2s / 60s
...
Sleep tick: 55s / 60s
Sleep tick: 56s / 60s
START RequestId: dc9f1334-a200-439b-8e92-9a64ba5dba75 Version: $LATEST
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x785041]

goroutine 64 [running]:
github.com/aws/aws-lambda-runtime-interface-emulator/internal/lambda/rapidcore.(*Server).Invoke.func2()
  /LambdaRuntimeLocal/internal/lambda/rapidcore/server.go:666 +0xe1
created by github.com/aws/aws-lambda-runtime-interface-emulator/internal/lambda/rapidcore.(*Server).Invoke in goroutine 61
  /LambdaRuntimeLocal/internal/lambda/rapidcore/server.go:649 +0x1f1

The failure consistently occurs around the 55-56 second mark. Executions of 50 seconds or less consistently complete without error.

I attempted to set both the Globals-level and function-level Timeout to 300 seconds in my template.yml, but the issue persisted.
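For reference, a minimal sketch of that template.yml configuration (the resource name, handler path, and code URI are illustrative, not my exact template):

```yaml
Globals:
  Function:
    Timeout: 300   # global timeout, well above the 60 s step duration

Resources:
  DurableFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: nodejs24.x
      CodeUri: ./src
      Timeout: 300   # function-level timeout, same value
```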

Steps to Reproduce

  1. Create a durable function with a single step that takes longer than 60 seconds:
import {
  withDurableExecution,
  DurableContext,
} from '@aws/durable-execution-sdk-js';

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export const handler = withDurableExecution(
  async (event: unknown, context: DurableContext) => {
    await context.step('long-running-step', async () => {
      console.log('Starting 60s sleep...');
      for (let i = 1; i <= 60; i++) {
        await sleep(1_000);
        console.log(`Sleep tick: ${i}s / 60s`);
      }
      console.log('60s sleep complete');
    });
  },
);
  2. Configure the function in template.yml with a Timeout exceeding 60 seconds (e.g. 300)
  3. Run sam local start-lambda
  4. Invoke the function

Result: After ~55-60 seconds, the RIE crashes with the nil pointer panic shown above.

Control test: Changing the sleep to 40 seconds succeeds consistently.

SDK Version

unknown — using the version bundled with SAM CLI 1.158.0

Python Version

Other (specify in additional context)

Is this a regression?

No

Last Working Version

No response

Additional Context

SDK Version

SAM CLI 1.158.0, using the bundled samcli/durable-execution-emulator:aws-durable-execution-emulator-arm64 image (built 2026-03-26).

@aws/durable-execution-sdk-js v1.1.1 / @aws/durable-execution-sdk-js-testing v1.1.1.

Environment

SAM CLI is running inside a Docker container via docker compose, using sam local start-lambda mode:

  - Image: public.ecr.aws/sam/build-nodejs24.x:latest
  - Runtime: nodejs24.x
  - Platform: macOS (Darwin), Docker Desktop, ARM64
  - SAM command: sam local start-lambda --host 0.0.0.0 --port 8107 --docker-network my-network --container-host host.docker.internal --warm-containers EAGER
  - The Docker socket (/var/run/docker.sock) is mounted into the SAM container so SAM can create the Lambda runtime containers and the durable execution emulator container on the host Docker daemon.

I also tested different --warm-containers configurations and --shutdown, with the same results.

Metadata

Labels: bug (Something isn't working)