feat(docker): separate iptables setup into init container #1281
Add awf-iptables-init service that shares the agent's network namespace via network_mode: "service:agent" and runs setup-iptables.sh before signaling readiness. The agent container never receives NET_ADMIN capability, eliminating the startup window where privileged capabilities were held.

Key changes:
- Add iptables-init service to docker-compose with NET_ADMIN + cap_drop ALL
- Remove NET_ADMIN from agent container's cap_add
- Agent entrypoint waits for /tmp/awf-init/ready signal (30s timeout)
- Init container uses same image as agent, exits after iptables setup
- Update cleanup scripts to handle awf-iptables-init container

Fixes #375

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
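The privilege split described in this commit can be expressed as a small invariant check. The following is a minimal sketch, not code from the PR: `ServiceCaps` and `checkPrivilegeSplit` are hypothetical names, and the real service objects in src/docker-manager.ts carry many more fields.

```typescript
// Hypothetical, trimmed-down view of a Compose service definition.
interface ServiceCaps {
  cap_add?: string[];
  cap_drop?: string[];
  network_mode?: string;
}

// Returns a list of violations of the "agent never holds NET_ADMIN" design.
function checkPrivilegeSplit(agent: ServiceCaps, init: ServiceCaps): string[] {
  const problems: string[] = [];
  if ((agent.cap_add ?? []).includes('NET_ADMIN')) {
    problems.push('agent must not hold NET_ADMIN');
  }
  if (!(init.cap_add ?? []).includes('NET_ADMIN')) {
    problems.push('init container needs NET_ADMIN to run iptables');
  }
  if (init.network_mode !== 'service:agent') {
    problems.push("init container must share the agent's network namespace");
  }
  return problems;
}
```

A check of this shape could sit next to the existing expectations in src/docker-manager.test.ts, so that a later change that re-adds `NET_ADMIN` to the agent fails loudly.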
✅ Coverage Check Passed
Overall Coverage
📁 Per-file Coverage Changes (3 files)
✨ New Files (2 files)
The iptables-init container uses network_mode: service:agent to share the agent's network namespace. With depends_on: service_started, Docker may try to look up the agent's PID in /proc before it's fully visible, causing "lstat /proc/PID/ns/net: no such file or directory". Adding a healthcheck to the agent and using service_healthy ensures the PID is stable before the init container starts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
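The ordering fix in this commit amounts to two Compose changes: a healthcheck on the agent and a `service_healthy` condition on the init container. Below is a minimal sketch under stated assumptions: the `ComposeService` type and `withHealthGate` helper are hypothetical, and the healthcheck command and timings are placeholders, since the commit message does not say which probe the PR uses.

```typescript
// Hypothetical, trimmed-down Compose service shape.
interface ComposeService {
  healthcheck?: { test: string[]; interval: string; timeout: string; retries: number };
  depends_on?: Record<string, { condition: 'service_started' | 'service_healthy' }>;
  network_mode?: string;
}

// Gate the init container on the agent being healthy rather than merely started.
function withHealthGate(
  agent: ComposeService,
  init: ComposeService,
): { agent: ComposeService; init: ComposeService } {
  return {
    agent: {
      ...agent,
      // ASSUMPTION: a trivial probe; the real check in the PR may differ.
      healthcheck: { test: ['CMD-SHELL', 'true'], interval: '1s', timeout: '1s', retries: 5 },
    },
    init: {
      ...init,
      depends_on: { ...init.depends_on, agent: { condition: 'service_healthy' } },
    },
  };
}
```

Even a trivial probe is enough for this purpose: the first successful healthcheck proves the agent process exists and can execute commands, which is the precondition for joining its network namespace.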
Pull request overview
This PR refactors firewall startup so iptables rules are applied by a dedicated init container (awf-iptables-init) that shares the agent’s network namespace, allowing the agent container to run without NET_ADMIN at any point.
Changes:
- Add an `iptables-init` Compose service intended to run `setup-iptables.sh`, then signal readiness via a shared volume file.
- Remove `NET_ADMIN` from the agent service's `cap_add` and update container cleanup to include the new init container.
- Update the agent entrypoint to wait (up to 30s) for the init container's readiness signal before proceeding.
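The wait-for-readiness step is a bounded polling loop. The real implementation lives in containers/agent/entrypoint.sh (a shell script); the sketch below restates the same logic in TypeScript for illustration only. `waitForReady` and its parameters are hypothetical, and the polling interval is an assumption, since the PR only states the 30s timeout.

```typescript
// Poll until `exists()` reports the ready file, or give up after `timeoutMs`.
// `exists` and `sleep` are injected so the loop can be exercised without a
// real filesystem or real delays.
function waitForReady(
  exists: () => boolean,
  sleep: (ms: number) => void,
  timeoutMs: number = 30_000,
  intervalMs: number = 500, // ASSUMPTION: interval is not given in the PR
): boolean {
  let waited = 0;
  while (waited < timeoutMs) {
    if (exists()) {
      return true;
    }
    sleep(intervalMs);
    waited += intervalMs;
  }
  // One last check so a signal that arrives exactly at the deadline counts.
  return exists();
}
```

In the agent, `exists` would correspond to testing for `/tmp/awf-init/ready`, and a `false` result to exiting with an error instead of starting user code behind an unconfigured firewall.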
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/docker-manager.ts | Adds init-signal directory/volume, introduces iptables-init Compose service, removes NET_ADMIN from agent, updates container removal list. |
| src/docker-manager.test.ts | Updates expectations for agent capabilities and adds assertions for the new iptables-init service. |
| scripts/ci/cleanup.sh | Ensures CI cleanup removes the new awf-iptables-init container. |
| containers/agent/entrypoint.sh | Replaces in-container iptables setup with a wait-for-init readiness signal and adjusts capability-drop logic. |
Comments suppressed due to low confidence (1)
src/docker-manager.ts:1078
- The `iptables-init` service uses the same agent image, but that image's Dockerfile sets `ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]`. In Compose, `command` overrides CMD but not ENTRYPOINT, so this init container will run `entrypoint.sh` (which now waits on `/tmp/awf-init/ready`) and deadlock, preventing iptables setup and causing the agent to time out. Fix by overriding `entrypoint` for `iptables-init` (or add an env flag to skip the wait logic in entrypoint.sh) so it runs `setup-iptables.sh` directly and then writes the ready file.
```typescript
const iptablesInitService: any = {
  container_name: 'awf-iptables-init',
  // Share agent's network namespace so iptables rules apply to agent's traffic
  network_mode: 'service:agent',
  // Only mount the init signal volume and the iptables setup script
  volumes: [
    `${initSignalDir}:/tmp/awf-init:rw`,
  ],
  environment: {
    // Pass through environment variables needed by setup-iptables.sh
    AWF_SQUID_HOST: environment.AWF_SQUID_HOST || `${networkConfig.squidIp}`,
    AWF_SQUID_PORT: String(SQUID_PORT),
    AWF_DNS_SERVERS: environment.AWF_DNS_SERVERS || '',
    AWF_BLOCKED_PORTS: environment.AWF_BLOCKED_PORTS || '',
    AWF_ENABLE_HOST_ACCESS: environment.AWF_ENABLE_HOST_ACCESS || '',
    AWF_API_PROXY_IP: environment.AWF_API_PROXY_IP || '',
    AWF_DOH_PROXY_IP: environment.AWF_DOH_PROXY_IP || '',
    AWF_SSL_BUMP_ENABLED: environment.AWF_SSL_BUMP_ENABLED || '',
    AWF_SSL_BUMP_INTERCEPT_PORT: environment.AWF_SSL_BUMP_INTERCEPT_PORT || '',
  },
  depends_on: {
    'agent': {
      condition: 'service_started',
    },
  },
  // Only NET_ADMIN is needed for iptables setup
  cap_add: ['NET_ADMIN'],
  cap_drop: ['ALL'],
  security_opt: ['no-new-privileges:true'],
  // Run setup-iptables.sh then signal readiness
  command: ['/bin/bash', '-c', '/usr/local/bin/setup-iptables.sh && touch /tmp/awf-init/ready'],
  // Resource limits (init container exits quickly)
```
```typescript
// SECURITY: iptables init container - sets up NAT rules in a separate container
// that shares the agent's network namespace but NEVER gives NET_ADMIN to the agent.
// This eliminates the window where the agent holds NET_ADMIN during startup.
const iptablesInitService: any = {
  container_name: 'awf-iptables-init',
  // Share agent's network namespace so iptables rules apply to agent's traffic
  network_mode: 'service:agent',
  // Only mount the init signal volume and the iptables setup script
  volumes: [
    `${initSignalDir}:/tmp/awf-init:rw`,
  ],
  environment: {
```

```typescript
  expect(initService.restart).toBe('no');
});
```
The init container architecture requires the agent image to have the updated entrypoint that waits for the init container's ready signal. Without pre-building, examples use GHCR images with the old entrypoint, causing the agent to exit because it tries to run setup-iptables.sh without NET_ADMIN capability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
setup-iptables.sh reads SQUID_PROXY_HOST (not AWF_SQUID_HOST), but the init container only passed AWF_SQUID_HOST. Since the init container uses network_mode: service:agent, it may not have DNS resolution for compose service names, causing getent hosts to fail and the script to exit before writing the ready signal. Use the direct IP address to avoid DNS issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
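A mismatch like this (the script reads one variable name, the container passes another) can be caught with a small pre-flight check. This is a sketch, not part of the PR: `missingEnv` is a hypothetical helper, and the list of required names contains only `SQUID_PROXY_HOST`, the one variable the commit message confirms the script reads.

```typescript
// Return the names from `required` that are absent or empty in `env`.
function missingEnv(
  required: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => !env[name]);
}

// Environment as originally passed to the init container.
const envBeforeFix = { AWF_SQUID_HOST: '172.30.0.10' };

// After the fix: the name the script actually reads, set to the direct IP.
const envAfterFix = { ...envBeforeFix, SQUID_PROXY_HOST: '172.30.0.10' };

// ASSUMPTION: only this name is known from the commit message.
const requiredByScript = ['SQUID_PROXY_HOST'];
```

Here `missingEnv(requiredByScript, envBeforeFix)` reports `SQUID_PROXY_HOST` as missing, while the same call on `envAfterFix` returns an empty list.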
The init container passes SQUID_PROXY_HOST as a direct IP (172.30.0.10) to bypass DNS resolution. But setup-iptables.sh runs getent hosts on it, which does a reverse DNS lookup that fails in Docker containers, causing the init container to exit before writing the ready signal. The agent then times out after 30s waiting for /tmp/awf-init/ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
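The commit message describes the failure but not the exact change made to setup-iptables.sh. One plausible guard, sketched here in TypeScript rather than shell, is to skip the lookup entirely when the value is already an IPv4 literal. `isIpv4Literal` and `resolveSquidIp` are hypothetical names, the injected `lookup` stands in for `getent hosts`, and the hostname `squid-proxy` used in the note below is illustrative.

```typescript
// True when `host` is a dotted-quad IPv4 address such as 172.30.0.10.
function isIpv4Literal(host: string): boolean {
  const parts = host.split('.');
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

// Resolve the proxy address, consulting the resolver only for hostnames.
function resolveSquidIp(
  host: string,
  lookup: (name: string) => string | undefined,
): string | undefined {
  return isIpv4Literal(host) ? host : lookup(host);
}
```

With this shape, `resolveSquidIp('172.30.0.10', lookup)` never calls `lookup`, so a resolver that cannot answer for bare IP addresses no longer aborts the script; a hostname such as `squid-proxy` still goes through the resolver.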
The iptables init container was hanging because cap_drop: ALL removed NET_RAW which iptables needs for netfilter socket operations. Also removed no-new-privileges which can block iptables binary execution.

Added diagnostic output logging: setup-iptables.sh output is written to /tmp/awf-init/output.log (shared volume), and on timeout the entrypoint displays the log for easier CI debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
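The two changes in this commit could look roughly like the following. This is a sketch under loud assumptions: the commit does not say whether `NET_RAW` was restored via `cap_add` or by relaxing `cap_drop`, nor how the redirect is written, so `initCapabilities`, `initCommand`, and `timeoutReport` (including its message text) are illustrative only.

```typescript
// ASSUMPTION: NET_RAW is added back explicitly while keeping cap_drop: ALL.
const initCapabilities = {
  cap_add: ['NET_ADMIN', 'NET_RAW'],
  cap_drop: ['ALL'],
};

// Build the shell command: capture stdout and stderr of the setup script in
// the shared volume, and create the ready file only if the script succeeds.
function initCommand(dir: string): string {
  return `/usr/local/bin/setup-iptables.sh > ${dir}/output.log 2>&1 && touch ${dir}/ready`;
}

// What the agent side might print on timeout (hypothetical wording).
function timeoutReport(log: string | undefined): string {
  return log !== undefined && log.trim() !== ''
    ? `init container output:\n${log}`
    : 'init container produced no output';
}
```

Because the log lives on the same shared volume as the ready file, the agent can read it without any extra mount, which is what makes the timeout path self-diagnosing in CI.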
The init container uses the same Docker image as the agent, which has ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]. The entrypoint.sh contains an "init container wait" loop that waits for /tmp/awf-init/ready to appear. When the init container runs through this same entrypoint, it deadlocks waiting for itself to signal readiness.

Fix: Set entrypoint: ['/bin/bash'] on the init container to bypass entrypoint.sh and run setup-iptables.sh directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
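The deadlock and its fix both follow from how a container's final argv is formed: the entrypoint followed by the command. The sketch below models that concatenation; `effectiveArgv` is a hypothetical helper, and the `command` shown for the fixed service is an assumption about how the PR adapts it once `/bin/bash` becomes the entrypoint.

```typescript
// A container runs ENTRYPOINT followed by CMD/command as a single argv.
function effectiveArgv(entrypoint: string[], command: string[]): string[] {
  return [...entrypoint, ...command];
}

const script = '/usr/local/bin/setup-iptables.sh && touch /tmp/awf-init/ready';

// Before: the image's ENTRYPOINT still applies, so entrypoint.sh runs first
// and blocks in its own wait loop.
const beforeFix = effectiveArgv(
  ['/usr/local/bin/entrypoint.sh'],
  ['/bin/bash', '-c', script],
);

// After: the entrypoint is overridden. ASSUMPTION: command drops its leading
// '/bin/bash', otherwise bash would be asked to run /bin/bash as a script.
const afterFix = effectiveArgv(['/bin/bash'], ['-c', script]);
```

After the override, the argv is simply `/bin/bash -c '<script>'`, so nothing in the init container waits on the ready file it is responsible for creating.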
The init container's environment object captures values at definition time (JavaScript object literal evaluation). AWF_API_PROXY_IP was set on line 1196 (inside the enableApiProxy block) but read on line 1076 (init container definition), so the init container always got an empty string. This caused setup-iptables.sh to skip adding ACCEPT rules for the API proxy IP (172.30.0.30), blocking agent→api-proxy connectivity and failing the API proxy health check.

Move the assignment before the init container definition so the value is available when the object literal is evaluated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
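The ordering bug can be reproduced in a few lines. This is a standalone illustration, not the code from src/docker-manager.ts: `sharedEnv`, `definedTooEarly`, and `definedAfterAssignment` are hypothetical names, and only the variable name and IP come from the commit message.

```typescript
const sharedEnv: Record<string, string> = {};

// Object literal evaluated BEFORE the assignment: the fallback '' is copied
// in, and later writes to sharedEnv do not affect this object.
const definedTooEarly = {
  AWF_API_PROXY_IP: sharedEnv['AWF_API_PROXY_IP'] || '',
};

// Stands in for the assignment inside the enableApiProxy block.
sharedEnv['AWF_API_PROXY_IP'] = '172.30.0.30';

// Object literal evaluated AFTER the assignment sees the real value.
const definedAfterAssignment = {
  AWF_API_PROXY_IP: sharedEnv['AWF_API_PROXY_IP'] || '',
};
```

`definedTooEarly.AWF_API_PROXY_IP` stays an empty string while `definedAfterAssignment.AWF_API_PROXY_IP` is `172.30.0.30`, which is why moving the assignment above the init container definition fixes the missing ACCEPT rule.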
Smoke Test Results — PASS
Smoke test results:
Smoke Test Results: Copilot Engine ✅ PASS
Last 2 merged PRs:
Overall: PASS — PR by
Chroot Version Comparison Results
Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
Summary
- Add `awf-iptables-init` service that shares the agent's network namespace and performs iptables setup with `NET_ADMIN` capability
- Remove `NET_ADMIN` from the agent container entirely: the agent never holds this capability, even briefly during startup
- Init container runs `setup-iptables.sh`, writes signal file, and exits

Motivation
Previously the agent container was granted `NET_ADMIN` at startup and dropped it via `capsh` before running user code. This created a brief window where a bug or misconfiguration in the entrypoint could skip the privilege drop. The init container pattern makes the security boundary visible at the Docker layer level: the agent container literally cannot modify iptables rules at any point in its lifecycle.

Test plan
Fixes #375
🤖 Generated with Claude Code