
Add container agent support#334

Open
tejaskash wants to merge 6 commits into main from
feature/container-support

Conversation

@tejaskash
Contributor

@tejaskash tejaskash commented Feb 18, 2026

Summary

  • Adds Container as a build type for agents across the full lifecycle: create, add, dev, deploy, invoke, package
  • Container dev server with Docker build, volume mount, and uvicorn hot-reload
  • Dockerfile and dockerignore templates for Python container agents
  • Docker prerequisite detection and validation
  • Wires userId (default: default-user) through invoke for container runtime auth (runtimeUserId header)
  • Upgrades vended @aws/agentcore-cdk to ^0.1.0-alpha.2
  • Fixes eslint require-await errors in codezip-dev-server and container packaging

Documentation

PR: #340

Test plan

  • agentcore create with Container build type generates correct project structure
  • agentcore add agent --build Container adds container agent config
  • agentcore dev builds and runs container with hot-reload
  • agentcore deploy builds container image via CodeBuild, pushes to ECR, creates runtime
  • agentcore invoke works for container agents (userId sent as default-user)
  • agentcore package packages container source correctly
  • CodeZip agents continue to work unchanged
  • All unit tests pass (1235 passed, 21 skipped)

@notgitika
Contributor

/strands review this PR and leave comments, do not make code changes

@github-actions
Contributor

Review Summary

I've completed a comprehensive review of this PR. Overall, this is a well-structured implementation that adds container support across the entire CLI lifecycle with good patterns and error handling. However, there are several areas that need attention before merging.

Critical Issues ⚠️

  1. CDK Version Discrepancy: The PR description claims upgrade to ^0.1.0-alpha.2 but src/assets/cdk/package.json still shows ^0.1.0-alpha.1. Please clarify what version is actually being used.

  2. Missing Test Coverage: No unit tests found for:

    • ContainerDevServer class
    • ContainerPackager class
    • Docker detection logic (detectContainerRuntime())
    • Container-specific scenarios

    Given the complexity of Docker operations, these should have test coverage.

  3. Default userId Security: The hardcoded DEFAULT_RUNTIME_USER_ID = 'default-user' has no validation or warnings. For production container agents, should there be enforcement or at least documentation about security implications?

High Priority Issues 🔴

See individual comments on specific files below for details on:

  • Hardcoded bedrock_agentcore user assumptions
  • Resource cleanup (built images left on system)
  • Memory concerns with unbounded build output
  • Silent error handling in cleanup operations

Positive Highlights ✅

  • Clean inheritance from DevServer base class
  • Comprehensive runtime detection (Docker, Podman, Finch)
  • Good AWS credential forwarding with multiple auth methods
  • Hot-reload support with volume mounts
  • Clear error messages with actionable guidance
  • Proper preflight validation structure

I'll leave detailed comments on specific files next.

@github-actions
Contributor

src/cli/operations/dev/container-dev-server.ts

Security & Configuration Concerns

Line 148-149 - Hardcoded container user path:

const awsMountArgs = existsSync(awsDir) ? ['-v', `${awsDir}:/home/bedrock_agentcore/.aws:ro`] : [];

The path /home/bedrock_agentcore/.aws assumes a specific username in the container. If a user's Dockerfile uses a different username, the mount still succeeds but lands outside that user's home directory, so the credentials are silently not picked up. Consider:

  • Making the container username configurable
  • Documenting this as a requirement in the Dockerfile template
  • Validating the Dockerfile contains the expected USER directive

Line 84 - Hardcoded user in dev layer:

'USER bedrock_agentcore',

Same issue - this should be configurable or clearly documented as a hard requirement.

Resource Management

Line 69-70 & 92 - Unbounded build output:

this.logBuildOutput(buildResult.stdout, buildResult.stderr, onLog);

Build logs are piped to memory without size limits. Large Docker builds could cause memory issues. Consider:

  • Streaming logs instead of buffering
  • Adding a size limit with truncation
  • Writing large outputs to temp files
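
One way to apply the size-limit suggestion is a small buffer that keeps only the most recent output. This is a minimal sketch, not code from the PR: `BoundedLogBuffer` and its limit are hypothetical, and the limit counts UTF-16 characters rather than bytes.

```typescript
/** Hypothetical helper: retains at most `maxChars` of the most recent output. */
class BoundedLogBuffer {
  private chunks: string[] = [];
  private size = 0;
  private truncated = false;
  private readonly maxChars: number;

  constructor(maxChars: number) {
    this.maxChars = maxChars;
  }

  append(chunk: string): void {
    this.chunks.push(chunk);
    this.size += chunk.length;
    // Drop the oldest chunks while the total exceeds the limit.
    while (this.size > this.maxChars && this.chunks.length > 1) {
      const dropped = this.chunks.shift() as string;
      this.size -= dropped.length;
      this.truncated = true;
    }
    // A single chunk larger than the limit: keep only its tail.
    if (this.size > this.maxChars) {
      const only = this.chunks[0];
      this.chunks[0] = only.slice(only.length - this.maxChars);
      this.size = this.maxChars;
      this.truncated = true;
    }
  }

  toString(): string {
    const body = this.chunks.join('');
    return this.truncated ? `[...earlier output truncated...]\n${body}` : body;
  }
}
```

Each stdout or stderr chunk from the build process would be passed to `append()` as it arrives, so memory use stays bounded regardless of how verbose the build is.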

Line 58 - Silent container cleanup:

spawnSync(this.runtimeBinary, ['rm', '-f', this.containerName], { stdio: 'ignore' });

Using stdio: 'ignore' could hide legitimate errors (e.g., permission issues). Consider at least logging warnings for non-zero exit codes.
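
A minimal sketch of what surfacing cleanup failures could look like. `cleanupWarning` and `SpawnResultLike` are hypothetical names, and the "no such container" wording is an assumption about the runtime's error text that may differ between Docker, Podman, and Finch.

```typescript
// Structural subset of the fields a spawnSync result exposes (assumed shape).
interface SpawnResultLike {
  status: number | null;
  stderr?: { toString(): string } | null;
  error?: Error;
}

/** Returns a warning message for a failed cleanup, or null when nothing needs reporting. */
function cleanupWarning(containerName: string, result: SpawnResultLike): string | null {
  if (result.error) {
    return `Could not run cleanup for ${containerName}: ${result.error.message}`;
  }
  if (result.status === 0) return null;
  const stderr = result.stderr ? result.stderr.toString().trim() : '';
  // A missing container means there was nothing to remove, so stay quiet.
  if (/no such container/i.test(stderr)) return null;
  const detail = stderr ? `: ${stderr}` : '';
  return `Cleanup for ${containerName} exited with code ${result.status}${detail}`;
}
```

To use this, stderr would need to be captured instead of ignored, for example with `stdio: ['ignore', 'ignore', 'pipe']`, and any non-null message logged at warning level.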

Dev/Prod Drift

Lines 77-97 - uvicorn installation in dev layer:
Installing uvicorn only in dev creates drift between dev and production images. This is pragmatic but should be documented in user guidance. Users need to understand that prod deployments should include uvicorn in their pyproject.toml.

@github-actions
Contributor

src/cli/external-requirements/detect.ts

Cross-Platform & Error Handling

Line 39 - Platform detection edge cases:

const exists = isWindows ? await checkSubprocess('where', [runtime]) : await checkSubprocess('which', [runtime]);

This doesn't handle edge cases like:

  • WSL (Windows Subsystem for Linux) - reports as Linux but may have different binary paths
  • Cygwin/Git Bash on Windows
  • Non-standard PATH configurations

Consider adding more robust detection or documenting supported environments.

Lines 43-47 - No timeout on subprocess calls:

const result = await runSubprocessCapture(runtime, ['--version']);
// ... later ...
const infoResult = await runSubprocessCapture(runtime, ['info']);

If a Docker daemon hangs or is unresponsive, these calls could block indefinitely. Consider:

  • Adding timeouts to subprocess calls
  • Providing user feedback if detection is taking too long
  • Documenting expected detection time
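
The timeout suggestion could be sketched as a generic wrapper around any promise-returning call. `withTimeout` is a hypothetical helper, not part of the PR, and it only stops waiting: it does not terminate the underlying subprocess.

```typescript
/** Rejects if `work` has not settled within `ms` milliseconds (hypothetical helper). */
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A call such as `withTimeout(runSubprocessCapture(runtime, ['info']), 5000, 'docker info')` would then fail fast on a hung daemon; the 5000 ms value is an arbitrary placeholder. If `runSubprocessCapture` wraps Node's `child_process` (not confirmed by the PR), passing the built-in `timeout` option there may be simpler because it also kills the child.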

User Experience

Lines 38-51 - Silent continue on detection failure:
The loop continues silently when runtimes fail checks. For better UX, consider logging info-level messages about which runtimes were found but not usable, especially during agentcore dev startup.

@github-actions
Contributor

src/lib/packaging/container.ts

Silent Fallback Behavior

Lines 49-56 - Silent skip when no runtime:

if (!runtime) {
  // No runtime available — skip local build validation (deploy will use CodeBuild)
  return Promise.resolve({
    artifactPath: '',
    sizeBytes: 0,
    stagingPath: codeLocation,
  });
}

This silent fallback could be confusing for users. They run agentcore package expecting validation, but get an empty artifact with no clear indication. Consider:

  • Logging a warning that local validation was skipped
  • Documenting this behavior clearly
  • Adding a flag to require local validation

Resource Cleanup

Line 59 - No cleanup of built images:

const imageName = `agentcore-package-${agentName}`;

Built images are left on the system indefinitely. Over time, this could consume significant disk space, especially in CI/CD environments. Consider:

  • Adding cleanup after successful size validation
  • Documenting the need for manual cleanup
  • Providing a --cleanup flag option

Error Handling

Line 65 - Potentially large error output:

return Promise.reject(new PackagingError(`Container build failed:\n${buildResult.stderr?.toString()}`));

Docker build stderr can be very large (especially for verbose builds). Consider:

  • Truncating to last N lines
  • Writing full output to a log file and referencing it in the error
  • Limiting error message size
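
The "last N lines" option could be sketched as follows. `tailLines` is a hypothetical helper and the line count is left to the caller.

```typescript
/** Returns the last `maxLines` lines of `text`, with a marker noting how many were omitted. */
function tailLines(text: string, maxLines: number): string {
  // Ignore a single trailing newline so it does not count as an empty last line.
  const lines = text.replace(/\r?\n$/, '').split(/\r?\n/);
  if (lines.length <= maxLines) return lines.join('\n');
  const omitted = lines.length - maxLines;
  const kept = lines.slice(lines.length - maxLines);
  return [`... (${omitted} earlier lines omitted)`, ...kept].join('\n');
}
```

The error would then embed `tailLines(stderr, 40)` or similar instead of the full stderr, with the complete output written elsewhere if it needs to be kept.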

Configuration

Line 15 - Hardcoded runtime priority:

for (const runtime of ['docker', 'finch', 'podman']) {

The order is hardcoded as docker > finch > podman. Should this be:

  • Configurable via environment variable or config?
  • Consistent with the order in detect.ts (which uses docker, podman, finch)?
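
Both questions could be addressed with one shared list plus an optional preference. This is only a sketch: the contents of `CONTAINER_RUNTIMES` are assumed to follow the detect.ts order quoted above, and `runtimePriority` and the idea of a user-supplied preference are hypothetical.

```typescript
// Assumed contents; mirrors the docker, podman, finch order attributed to detect.ts.
const CONTAINER_RUNTIMES = ['docker', 'podman', 'finch'] as const;
type ContainerRuntime = (typeof CONTAINER_RUNTIMES)[number];

/** Moves a recognized preferred runtime to the front; unknown values leave the default order. */
function runtimePriority(preferred?: string): ContainerRuntime[] {
  const match = CONTAINER_RUNTIMES.find(runtime => runtime === preferred);
  if (match === undefined) return [...CONTAINER_RUNTIMES];
  return [match, ...CONTAINER_RUNTIMES.filter(runtime => runtime !== match)];
}
```

Using the same function in both packaging and detection would keep the two code paths from drifting apart.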

@github-actions
Contributor

src/assets/container/python/Dockerfile

Security & Best Practices

Line 14 - Hardcoded UID:

RUN useradd -m -u 1000 bedrock_agentcore

UID 1000 is common but could conflict with host user permissions when using volume mounts on some systems. Consider:

  • Documenting this UID choice
  • Explaining how to handle permission issues
  • Allowing flexibility through the --user flag

Missing HEALTHCHECK:
The Dockerfile doesn't include a HEALTHCHECK directive. While not critical for dev, this would help with:

  • Container orchestration in production
  • Detecting when the agent is actually ready
  • Debugging startup issues

Consider adding something like:

HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:8080/health || exit 1

Documentation Needs

Line 21 - OpenTelemetry default:

CMD ["opentelemetry-instrument", "python", "-m", "{{entrypoint}}"]

The default CMD includes OpenTelemetry instrumentation, which adds overhead. Consider documenting:

  • How users can override this if they don't want OTel
  • Performance implications
  • How to disable OTel in dev if desired

@github-actions
Contributor

src/cli/aws/agentcore.ts & src/cli/commands/invoke/action.ts

userId/runtimeUserId Implementation

Line 14 (agentcore.ts) - Default userId:

export const DEFAULT_RUNTIME_USER_ID = 'default-user';

Line 120 (agentcore.ts) & Line 99 (action.ts) - Usage:

runtimeUserId: options.userId ?? DEFAULT_RUNTIME_USER_ID,

Security Consideration

The PR description states: "Container agents require this to obtain workload access tokens." However:

  1. No validation for container agents specifically: If container agents REQUIRE a real userId for auth/authorization, should the code enforce this requirement? Currently, any agent type can use the default 'default-user'.

  2. No security warnings: For production use, 'default-user' might not be appropriate. Consider:

    • Adding a warning when default userId is used with container agents
    • Documenting the security implications
    • Validating userId format/requirements
  3. Documentation needed: The relationship between userId and container auth should be clearly documented:

    • What identity provider integration requires this?
    • What happens if wrong/missing userId is provided?
    • How does this relate to AWS IAM roles?
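
The warning idea from point 2 could be sketched like this. `resolveRuntimeUserId` is a hypothetical function; only `DEFAULT_RUNTIME_USER_ID`, the `??` fallback, and the `CodeZip`/`Container` build types come from the PR.

```typescript
const DEFAULT_RUNTIME_USER_ID = 'default-user';

type BuildType = 'CodeZip' | 'Container';

/** Applies the same `??` fallback as the PR, but warns when a Container agent ends up on the default. */
function resolveRuntimeUserId(
  userId: string | undefined,
  build: BuildType,
  warn: (message: string) => void
): string {
  const resolved = userId ?? DEFAULT_RUNTIME_USER_ID;
  if (build === 'Container' && resolved === DEFAULT_RUNTIME_USER_ID) {
    warn(`Invoking a Container agent as "${DEFAULT_RUNTIME_USER_ID}"; pass an explicit user ID outside local testing.`);
  }
  return resolved;
}
```

Taking `warn` as a parameter keeps the function free of logging dependencies and easy to unit test.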

Positive Note

The consistent wiring through both streaming and non-streaming paths is well done, and the default value prevents breaking existing code.

@github-actions
Contributor

src/cli/operations/deploy/preflight.ts

Dockerfile Validation

Lines 140-148 - Surface-level validation only:

if (agent.build === 'Container') {
  const codeLocation = resolveCodeLocation(agent.codeLocation, configRoot);
  const dockerfilePath = path.join(codeLocation, DOCKERFILE_NAME);
  
  if (!existsSync(dockerfilePath)) {
    errors.push(
      `Agent "${agent.name}": Dockerfile not found at ${dockerfilePath}...`
    );
  }
}

This only checks if the Dockerfile exists, not if it's valid. While full Docker validation might be out of scope, consider:

  1. Basic syntax check: Can we at least verify it's a valid text file?

  2. Required directive check: Verify the Dockerfile contains:

    • FROM directive
    • USER bedrock_agentcore (since this is hardcoded elsewhere)
    • WORKDIR /app (if required)
  3. Warning-level checks: Even if not enforced, warn about:

    • Missing HEALTHCHECK
    • Images that might be too large
    • Base images from untrusted sources
  4. Documentation: At minimum, document Dockerfile requirements:

    • Must create bedrock_agentcore user
    • Must set WORKDIR to /app
    • Should include HEALTHCHECK
    • Must expose appropriate ports
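
A directive check along these lines could be a simple line scan. The sketch below is illustrative only: `checkDockerfile` is a hypothetical name, the split between errors and warnings is an assumption, and the scan does not understand line continuations or multi-stage builds.

```typescript
interface DockerfileReport {
  errors: string[];
  warnings: string[];
}

/** Scans Dockerfile text for the instructions discussed in this review. */
function checkDockerfile(content: string): DockerfileReport {
  const instructions = content
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line !== '' && !line.startsWith('#'))
    .map(line => {
      const space = line.search(/\s/);
      return space === -1
        ? { keyword: line.toUpperCase(), args: '' }
        : { keyword: line.slice(0, space).toUpperCase(), args: line.slice(space).trim() };
    });

  const has = (keyword: string, args?: string): boolean =>
    instructions.some(i => i.keyword === keyword && (args === undefined || i.args === args));

  const errors: string[] = [];
  const warnings: string[] = [];
  if (!has('FROM')) errors.push('Dockerfile has no FROM instruction');
  if (!has('USER', 'bedrock_agentcore')) warnings.push('Expected "USER bedrock_agentcore"');
  if (!has('WORKDIR', '/app')) warnings.push('Expected "WORKDIR /app"');
  if (!has('HEALTHCHECK')) warnings.push('No HEALTHCHECK instruction found');
  return { errors, warnings };
}
```

Errors could feed the existing error accumulation in preflight, while warnings would only be logged.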

Positive Note

The error accumulation pattern (collect all errors before throwing) is excellent UX - users see all problems at once rather than one-at-a-time.

@github-actions
Contributor

src/cli/operations/dev/utils.ts

convertEntrypointToModule Function

Lines 39-43:

export function convertEntrypointToModule(entrypoint: string): string {
  if (entrypoint.includes(':')) return entrypoint;
  const path = entrypoint.replace(/\.py$/, '').replace(/\//g, '.');
  return `${path}:app`;
}

Concerns

  1. Hardcoded handler assumption: The function assumes :app as the default handler name. What if:

    • User's entrypoint uses main, application, server, or custom handler?
    • Consider documenting this convention clearly
    • Or make the default handler configurable
  2. Python-only: The function only handles .py files. Looking forward:

    • Will container agents support Node.js/TypeScript?
    • If so, this function needs to handle those runtimes
    • Consider renaming to convertPythonEntrypointToModule if staying Python-specific
  3. Edge cases: What happens if:

    • Entrypoint is malformed?
    • Path has special characters?
    • Consider adding validation/error handling

Suggestion

export function convertEntrypointToModule(entrypoint: string, defaultHandler = 'app'): string {
  if (entrypoint.includes(':')) return entrypoint;
  if (!entrypoint.endsWith('.py')) {
    throw new Error(`Unsupported entrypoint format: ${entrypoint}`);
  }
  const path = entrypoint.replace(/\.py$/, '').replace(/\//g, '.');
  return `${path}:${defaultHandler}`;
}

Positive Note

The port availability checking logic (lines 1-37) is solid and properly handles both localhost and all-interfaces binding patterns used by containers.

@github-actions
Contributor

Testing & Documentation

Missing Test Coverage

I couldn't find unit tests for the new container functionality:

Files without test coverage:

  • src/cli/operations/dev/container-dev-server.ts - No tests found
  • src/lib/packaging/container.ts - No tests found
  • src/cli/external-requirements/detect.ts - Docker detection logic untested

Recommended test cases:

  1. ContainerDevServer:

    • Test prepare() error paths (no runtime, no Dockerfile)
    • Test getSpawnConfig() AWS credential forwarding
    • Test cleanup on kill()
    • Mock Docker operations
  2. ContainerPackager:

    • Test size validation (under/over 1GB limit)
    • Test Dockerfile existence check
    • Test graceful handling when no runtime available
    • Mock Docker build and inspect
  3. Runtime Detection:

    • Test detection order (docker, podman, finch)
    • Test handling of installed but not-ready runtimes
    • Test fallback when no runtime available
    • Mock subprocess calls
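
The runtime detection cases become straightforward to test if the subprocess call is injected. This is a sketch under that assumption: `detectRuntime` and `Probe` are hypothetical and stand in for the real `detectContainerRuntime()`, whose signature is not shown in this conversation. The `--version` and `info` checks mirror the calls quoted from detect.ts.

```typescript
// Returns true when `binary args...` succeeds; the real code would run a subprocess.
type Probe = (binary: string, args: string[]) => Promise<boolean>;

interface DetectionResult {
  runtime: string | null;
  notReady: string[]; // installed, but `info` failed (e.g. daemon not running)
}

async function detectRuntime(candidates: readonly string[], probe: Probe): Promise<DetectionResult> {
  const notReady: string[] = [];
  for (const candidate of candidates) {
    if (!(await probe(candidate, ['--version']))) continue; // not installed
    if (await probe(candidate, ['info'])) return { runtime: candidate, notReady };
    notReady.push(candidate);
  }
  return { runtime: null, notReady };
}

// Fake probe for tests: docker is installed but not ready, finch is ready, podman is absent.
const fakeProbe: Probe = async (binary, args) => {
  if (binary === 'docker') return args[0] === '--version';
  return binary === 'finch';
};
```

With `fakeProbe`, `detectRuntime(['docker', 'podman', 'finch'], fakeProbe)` resolves to finch and reports docker as installed but not ready, which covers the detection order, not-ready, and fallback cases without a real daemon.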

Documentation Gaps

The following should be documented (in README, CONTRIBUTING, or inline):

  1. Dockerfile Requirements:

    • Must create bedrock_agentcore user (UID 1000)
    • Must set WORKDIR to /app
    • Recommended: Add HEALTHCHECK
    • OpenTelemetry is included by default
  2. Dev/Prod Differences:

    • Dev mode auto-installs uvicorn
    • Prod needs uvicorn in pyproject.toml
    • Volume mounts in dev vs. image layers in prod
  3. userId/runtimeUserId:

    • When is this required?
    • Security implications of 'default-user'
    • How it relates to identity providers
  4. Resource Management:

    • Built images are not cleaned up automatically
    • Commands for manual cleanup
    • Disk space considerations in CI/CD

Test Plan Verification

The PR description mentions all tests pass (1235 passed, 21 skipped), which is great! However, adding specific container tests would increase confidence in this complex feature.

@github-actions
Contributor

Recommendations & Next Steps

Before Merging - Must Address ⚠️

  1. Resolve CDK version discrepancy - Clarify if upgrade to alpha.2 is included or not
  2. Add test coverage - At minimum, add unit tests for:
    • ContainerDevServer error paths
    • Container size validation
    • Runtime detection logic
  3. Document Dockerfile requirements - Users need clear guidance on required structure

Before Merging - Should Address 🔴

  1. Make bedrock_agentcore username configurable or document it as a hard requirement
  2. Add resource cleanup for built images (at least document the need)
  3. Add size limits to build output logging to prevent memory issues
  4. Add timeouts to subprocess calls in runtime detection
  5. Improve silent fallback handling - log warnings when skipping validation

Post-Merge - Consider for Follow-up 🟡

  1. Enhanced Dockerfile validation - Check for required directives
  2. HEALTHCHECK support in default template
  3. Configurable runtime priority (docker vs podman vs finch)
  4. Better cross-platform support (WSL, Cygwin detection)
  5. userId validation for container agents
  6. Documentation expansion - dev/prod differences, security implications

Overall Assessment

This is high-quality work that implements a complex feature with good architectural patterns. The code is readable, error handling is generally solid, and the integration is comprehensive. With the critical issues addressed (particularly testing and the CDK version), this will be a valuable addition to the CLI.

Great job on:

  • Clean abstraction patterns
  • Comprehensive runtime support
  • Good error messages
  • Hot-reload functionality
  • AWS credential forwarding

The concerns raised are primarily about robustness, resource management, and documentation rather than fundamental design issues.

@github-actions
Contributor

github-actions bot commented Feb 18, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 43% 2835 / 6593
🔵 Statements 42.59% 2984 / 7005
🔵 Functions 45.49% 606 / 1332
🔵 Branches 48.02% 1861 / 3875
Generated in workflow #422 for commit 2c2b079 by the Vitest Coverage Report Action

notgitika
notgitika previously approved these changes Feb 18, 2026
Contributor

@notgitika notgitika left a comment


thanks for addressing comments. lgtm

- Add Container as a build type for agents (create, add, dev, deploy, invoke, package)
- Add Dockerfile and dockerignore templates for Python container agents
- Add container dev server with Docker build, run, volume mount, and hot-reload
- Add container packaging with Docker runtime detection and validation
- Add Docker prerequisite check and runtime detection utility
- Wire userId (default: "default-user") through invoke flow for container auth
- Log userId in invoke request logs
- Upgrade vended @aws/agentcore-cdk to ^0.1.0-alpha.2
- Fix eslint require-await errors in codezip-dev-server and container packaging
- Simplify BaseRenderer to use copyAndRenderDir for container templates
- Add buildType: 'CodeZip' to schema-mapper test baseConfig (merge artifact)
- Revert @aws/agentcore-cdk to ^0.1.0-alpha.1 (alpha.2 not yet published;
  semver ^0.1.0-alpha.1 already covers alpha.2 once available)
notgitika and others added 4 commits February 18, 2026 15:44
- Add port comments to Dockerfile EXPOSE directive
- Fix container runtime null check (info.runtime !== null)
- Use path.join() in getDockerfilePath for cross-platform support
- Use CONTAINER_RUNTIMES constant instead of hardcoded array
- Change dynamic import to static import in container-dev-server
- Check ports sequentially to avoid bind race conditions
…tibility

In Container builds, the Python module loads at container startup before
any request context exists. API-key-based providers use @requires_api_key
which needs a workload access token only available within request context,
causing ValueError at import time.

Defer load_model() using lazy initialization in all affected templates:
- Strands: get_or_create_agent() singleton
- LangChain/LangGraph: get_or_create_model() singleton
- Google ADK / OpenAI Agents: ensure_credentials_loaded() guard
Contributor

@aidandaly24 aidandaly24 left a comment


Approved. There are probably some things we should make sure the e2e tests cover, like container cleanup after dev.


// Validate build type if provided
if (options.build) {
const buildResult = BuildTypeSchema.safeParse(options.build);
Contributor


Since some of this code is duplicated, I wonder if we should eventually consolidate it somewhere. Just a thought, not necessary for this PR.

}

// Build locally
const imageName = `agentcore-package-${agentName}`;
Contributor


Are we ever cleaning this up?

3 participants
