Server hardening improvements for binary/streaming endpoints (camera use-case) #5

Description

@harmon25

Context

Observed on AtomVM 0.7.0-beta.0+git.fd8127f1 (ESP32-S3 XIAO Sense), Elixir 1.19.5 / OTP 28, atomvm_httpd at aae97d1 (improvements branch).

Use-case: a camera webserver that serves live JPEG frames at ~100–160 kB each from PSRAM framebuffers over WiFi. This hammers the send path and surfaced several issues. Filing as one tracking issue with sub-bullets so they can be individually scheduled.


1. Demote "Connection closed mid-transfer" log to ?TRACE

try_send_binary/2 logs every client-initiated mid-stream close through an unconditional io:format call:

{error, closed} ->
    case byte_size(Rest) of
        0 -> ok;
        _ -> io:format("Connection closed mid-transfer (~p/~p bytes sent)~n", [ChunkSize, TotalSize])
    end,

A browser canceling a request (image swap, tab close, navigation, or a parallel request racing ahead) is normal client behavior, not a server fault. On a page that chains JPEG requests back-to-back, a single slow WiFi connection produces dozens of these per minute and completely obscures real errors.

Suggestion: demote to ?TRACE, matching the surrounding tracing pattern in the file.
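
A minimal sketch of what the demoted branch could look like. The ?TRACE definition below is a stand-in: the ENABLE_TRACE flag and the macro body are assumptions, not copied from gen_tcp_server.erl, and on_closed/3 is a hypothetical helper wrapping the branch quoted above.

```erlang
-module(closed_log_sketch).
-export([on_closed/3]).

%% Stand-in for the file's own ?TRACE macro (assumed shape; the real
%% definition may differ). With tracing disabled it evaluates to ok and
%% prints nothing.
-ifdef(ENABLE_TRACE).
-define(TRACE(Fmt, Args), io:format(Fmt, Args)).
-else.
-define(TRACE(Fmt, Args), begin _ = Fmt, _ = Args, ok end).
-endif.

%% Hypothetical helper: Rest is the unsent remainder, Sent and Total are
%% byte counts used only for the trace line.
on_closed(Rest, Sent, Total) ->
    case byte_size(Rest) of
        %% Everything was already sent: nothing worth reporting.
        0 -> ok;
        %% Client went away mid-stream: trace only, no console output
        %% unless tracing is compiled in.
        _ -> ?TRACE("Connection closed mid-transfer (~p/~p bytes sent)~n", [Sent, Total])
    end.
```

Either way the branch returns ok, so the caller's control flow should not change; only the console noise does.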


2. Set TCP_NODELAY on accepted connections

accept/2 does not set {tcp, nodelay} on accepted sockets. With Nagle's algorithm enabled, small writes are held in lwIP's send buffer while earlier data is still unacknowledged, and a peer that delays its ACKs can stretch that wait to roughly 40 ms or more. That is exactly the wrong default for an HTTP server that sends a response head followed immediately by a binary body: a small write issued behind unacknowledged data stays queued until the ACK arrives, adding latency to every response.

Suggestion: add socket:setopt(Connection, {tcp, nodelay}, true) immediately after socket:accept/1 (~line 323). This could be unconditional (HTTP is virtually always nodelay-friendly) or gated behind a socket_options map entry so callers can opt out.
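
A rough sketch of the unconditional variant. accept_nodelay/1 is a hypothetical wrapper, not a function in gen_tcp_server.erl, and it assumes the platform's socket:setopt/3 accepts {tcp, nodelay} (it does on Erlang/OTP; I have not confirmed every AtomVM port).

```erlang
-module(nodelay_sketch).
-export([accept_nodelay/1]).

%% Hypothetical wrapper around socket:accept/1 that disables Nagle on the
%% accepted connection before handing it back.
accept_nodelay(ListenSocket) ->
    case socket:accept(ListenSocket) of
        {ok, Connection} ->
            %% Best effort: if the option is not supported on this
            %% platform, keep serving with the default behaviour.
            case socket:setopt(Connection, {tcp, nodelay}, true) of
                ok -> ok;
                {error, _Reason} -> ok
            end,
            {ok, Connection};
        {error, _Reason} = Error ->
            Error
    end.
```

The opt-out variant would read a flag from the socket_options map and skip the setopt call when it is false.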


3. Make MAX_SEND_CHUNK configurable; bump default from 1460 to 4096

gen_tcp_server.erl hardcodes ?MAX_SEND_CHUNK = 1460 (Ethernet TCP MSS). For a 100 kB JPEG this means ~70 successive socket:send calls each followed by a receive after 0 yield. That:

  • multiplies NIF-crossing and lwIP queue overhead per response,
  • widens the window for a client cancellation to land between chunks (contributing to issue 1),
  • increases mailbox depth on the controlling gen_server during a long send.

WiFi handles its own fragmentation above the MTU; lwIP also segments internally. Larger per-send payloads are the norm in similar embedded HTTP stacks.

Suggestion:

  • accept a chunk_size key in the socket_options map passed to start/4 / start_link/4,
  • default to 4096 (appropriate for ESP32 lwIP send-buffer headroom; callers on platforms with larger buffers can raise it).
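
To make the arithmetic concrete, here is a small sketch of a configurable chunk size. All three functions are hypothetical names; SendFun stands in for a call to socket:send/2 on a connected socket, and the receive after 0 yield between chunks is left out for brevity.

```erlang
-module(chunk_sketch).
-export([chunk_size/1, send_count/2, send_chunked/3]).

-define(DEFAULT_CHUNK_SIZE, 4096).

%% Read the proposed chunk_size key from the socket_options map,
%% falling back to the proposed default.
chunk_size(SocketOptions) when is_map(SocketOptions) ->
    maps:get(chunk_size, SocketOptions, ?DEFAULT_CHUNK_SIZE).

%% Number of send calls needed for TotalBytes (ceiling division).
send_count(TotalBytes, ChunkSize) when ChunkSize > 0 ->
    (TotalBytes + ChunkSize - 1) div ChunkSize.

%% Send Data in slices of at most ChunkSize bytes.
%% SendFun must return ok | {error, Reason}.
send_chunked(_SendFun, <<>>, _ChunkSize) ->
    ok;
send_chunked(SendFun, Data, ChunkSize) when byte_size(Data) =< ChunkSize ->
    SendFun(Data);
send_chunked(SendFun, Data, ChunkSize) ->
    <<Chunk:ChunkSize/binary, Rest/binary>> = Data,
    case SendFun(Chunk) of
        ok -> send_chunked(SendFun, Rest, ChunkSize);
        %% Stop at the first failure (for example {error, closed}).
        {error, _Reason} = Error -> Error
    end.
```

For a 102400-byte frame, send_count/2 gives 71 sends at 1460 bytes per chunk and 25 at 4096.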

4. Move request handling into per-connection worker processes

Currently the single controlling gen_server:

  1. owns the listen socket and connection registry,
  2. handles all requests (handle_tcp_data → Handler:handle_http_req/2 → try_send).

While one client downloads a 100 kB response, every other client's request sits in the gen_server's mailbox. For a camera server with a browser issuing 4–6 parallel image loads, or any long-lived response (chunked transfer, SSE), this is a hard serialization bottleneck.

Suggestion: spawn a worker process at accept time that owns both the recv loop and request dispatch/send for its socket. The gen_server retains only listen-socket ownership and max_connections enforcement. Rough shape:

gen_server (listener)
  ├── accept/2 (spawned per listener)
  └── worker (spawned per connection): recv → dispatch → send → loop

This is the most involved refactor in the list; it is listed separately so it can be planned on its own schedule.
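
A very rough sketch of the per-connection idea, not a drop-in design. It uses a variant of the shape above: the process that accepts a connection becomes that connection's worker and spawns a fresh acceptor in its place, which sidesteps handing socket ownership between processes. The module, function names, and the Handler fun (request binary in, response binary out) are all hypothetical, and max_connections enforcement is not shown.

```erlang
-module(worker_sketch).
-export([start_acceptor/2, accept_then_serve/2]).

%% Spawn one acceptor on an already-listening socket.
start_acceptor(ListenSocket, Handler) ->
    spawn(?MODULE, accept_then_serve, [ListenSocket, Handler]).

%% Block in accept; once a client connects, start a replacement acceptor
%% and turn this process into the worker for that connection.
accept_then_serve(ListenSocket, Handler) ->
    case socket:accept(ListenSocket) of
        {ok, Connection} ->
            _ = start_acceptor(ListenSocket, Handler),
            serve(Connection, Handler);
        {error, _Reason} ->
            %% Listen socket closed or failed: stop accepting.
            ok
    end.

%% recv -> dispatch -> send -> loop, all inside the worker, so a slow
%% download only blocks its own connection.
serve(Connection, Handler) ->
    case socket:recv(Connection, 0) of
        {ok, Request} ->
            case socket:send(Connection, Handler(Request)) of
                ok -> serve(Connection, Handler);
                _Error -> socket:close(Connection)
            end;
        {error, _Reason} ->
            socket:close(Connection)
    end.
```

With the separate acceptor from the diagram, the accepted socket would have to be handed to the spawned worker, and how that handoff behaves on AtomVM is something I have not verified.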


All four are independent; they can be addressed in any order. Happy to submit PRs for any of them.

Labels: enhancement
