feat(tuned): migrate sysctl tunings to tuned by hunleyd · Pull Request #2082 · supabase/postgres

hunleyd · 2026-03-11T18:07:10Z

Summary of Changes

Ansible Task Refactoring (ansible/tasks/):
- setup-system.yml: Removed direct sysctl calls (e.g., net.ipv4.tcp_keepalive_time, vm.panic_on_oom) to prevent conflicts with the new tuned profile.
- setup-postgres.yml: Explicitly defined Group IDs (GIDs) for ssl-cert (1001) and postgres (1002). This ensures deterministic GIDs, which is critical for the new HugePages configuration in setup-tuned.yml that references GID 1002.
- setup-tuned.yml:
  - Adopted throughput-performance as the base profile.
  - Added a comprehensive list of Supabase-specific sysctl parameters to /etc/tuned/profiles/postgresql/tuned.conf, based on examining the postgres, platform, and salt repos for existing sysctl calls.
  - Added logic to dynamically calculate and set vm.nr_hugepages based on shared_buffers and configure vm.hugetlb_shm_group.
Service Ordering (ansible/files/gotrue.service.j2):
- Added After=tuned.service to the gotrue service unit. This ensures network and kernel optimizations are fully applied before the authentication service starts.

Detailed Analysis of Sysctl Parameters

The following sysctl parameters are now being applied via tuned. These changes generally aim to optimize for a high-throughput database workload, improve network resilience, and prevent memory exhaustion issues.
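As a rough illustration of what the resulting profile file looks like, the sketch below renders a tuned.conf that inherits a base profile and sets values in a `[sysctl]` section. The helper name `render_tuned_profile` is hypothetical and is not part of this PR; the real file is produced by setup-tuned.yml and contains many more parameters.

```python
def render_tuned_profile(base_profile: str, sysctls: dict) -> str:
    """Render tuned.conf text: inherit a base profile, then set sysctls."""
    lines = ["[main]", f"include={base_profile}", "", "[sysctl]"]
    lines.extend(f"{name}={value}" for name, value in sysctls.items())
    return "\n".join(lines) + "\n"


# Two values taken from the parameter list in this description.
profile_text = render_tuned_profile(
    "throughput-performance",
    {"vm.swappiness": 10, "net.core.somaxconn": 16384},
)
print(profile_text)
```

Keeping the sysctls in one profile means a single `tuned-adm profile postgresql` call applies them together, instead of relying on scattered sysctl tasks.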

Memory Management & HugePages

vm.swappiness = 10: Reduces the kernel's tendency to swap out application memory. For PostgreSQL, swapping is detrimental to performance; we prefer the kernel to reclaim page cache instead.
vm.overcommit_memory = 2: Tells the kernel to never overcommit memory. This is a safer mode for dedicated database servers, ensuring the OOM killer is less likely to trigger unpredictably, though it requires careful sizing of Swap + RAM.
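To make the "careful sizing" point concrete, here is a minimal sketch of how the kernel derives CommitLimit in strict mode, following the formula documented for /proc/meminfo. It assumes the default vm.overcommit_ratio of 50, which this PR does not mention; the function name and the 16 GiB host are illustrative only.

```python
GIB = 2**30


def commit_limit_bytes(ram_bytes: int, swap_bytes: int,
                       overcommit_ratio: int = 50,
                       hugetlb_bytes: int = 0) -> int:
    """CommitLimit = (RAM - HugePages) * overcommit_ratio / 100 + swap."""
    return (ram_bytes - hugetlb_bytes) * overcommit_ratio // 100 + swap_bytes


# Assumed example host: 16 GiB RAM, no swap, default ratio of 50.
limit = commit_limit_bytes(16 * GIB, 0)
print(limit // GIB, "GiB may be committed")
```

On a host without swap, the default ratio leaves only half of RAM committable, so allocations can start failing long before memory is actually exhausted.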
vm.dirty_ratio = 40 / vm.dirty_background_ratio = 10:
- dirty_background_ratio: Start writing dirty pages to disk when they reach 10% of memory.
- dirty_ratio: Force processes to write dirty pages synchronously when they reach 40% of memory.
- Impact: These settings buffer more write operations in memory (up to 40%), which can smooth out I/O spikes but may increase checkpoint times.
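The two ratios translate into absolute thresholds roughly as sketched below. This is a simplification: the kernel applies the percentages to the memory it considers dirtyable (free plus reclaimable pages), not to total RAM, so the `dirtyable_bytes` input here is an assumed figure.

```python
def dirty_thresholds(dirtyable_bytes: int,
                     background_ratio: int = 10,
                     dirty_ratio: int = 40) -> tuple:
    """Return (background_threshold, blocking_threshold) in bytes."""
    background = dirtyable_bytes * background_ratio // 100
    blocking = dirtyable_bytes * dirty_ratio // 100
    return background, blocking


# Assumed example: about 8 GiB of dirtyable memory.
background, blocking = dirty_thresholds(8 * 2**30)
print(f"background flush starts near {background / 2**30:.1f} GiB")
print(f"writers block near {blocking / 2**30:.1f} GiB")
```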
vm.dirty_expire_centisecs = 3000 (30s) / vm.dirty_writeback_centisecs = 500 (5s): Control how old dirty data can be before it becomes eligible for flushing (30s) and how often the kernel flusher threads wake up (5s).
kernel.numa_balancing = 0: Disables automatic page migration between NUMA nodes. For PostgreSQL, automatic balancing can sometimes cause unpredictable latency spikes; pinning or manual management is often preferred on large multi-socket systems.
vm.nr_hugepages (Calculated): Allocates explicit HugePages for PostgreSQL shared_buffers.
- Impact: Reduces TLB (Translation Lookaside Buffer) misses and the overhead of managing large amounts of memory, slightly improving CPU efficiency for memory-intensive queries. The computed value assumes our default of 128MB for shared_buffers. Salt will need to be modified to adjust this based on the current shared_buffers value.
vm.hugetlb_shm_group = 1002: Grants the postgres group (GID 1002) permission to use HugePages.
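The HugePages count can be approximated as sketched below. This is not the calculation used in setup-tuned.yml, which is not shown in this description; it assumes 2 MiB huge pages (the common x86_64 default), and the `headroom_percent` parameter is a hypothetical allowance for PostgreSQL's shared memory beyond shared_buffers.

```python
MIB = 2**20


def hugepages_needed(shared_buffers_bytes: int,
                     hugepage_bytes: int = 2 * MIB,
                     headroom_percent: int = 0) -> int:
    """Round up shared_buffers (plus optional headroom) to whole huge pages."""
    scaled = shared_buffers_bytes * (100 + headroom_percent)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-scaled // (hugepage_bytes * 100))


# Default shared_buffers from this PR: 128MB.
print(hugepages_needed(128 * MIB))                       # no headroom
print(hugepages_needed(128 * MIB, headroom_percent=10))  # assumed 10% extra
```

On PostgreSQL 15 and later, `postgres -C shared_memory_size_in_huge_pages` reports the exact requirement for a given configuration, which may be a more reliable input than an estimate like this one.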

Network & TCP Stack

net.core.somaxconn = 16384: Increases the maximum backlog of pending connections. Critical for handling bursts of new connections (connection storms).
net.ipv4.ip_local_port_range = 1025 65499: Widens the range of available ephemeral ports, allowing more outgoing connections (e.g., to external APIs or replicas).
net.ipv4.tcp_tw_reuse = 1: Allows reusing sockets in TIME_WAIT state for new connections. This helps prevent port exhaustion in high-turnover scenarios.
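A back-of-the-envelope sketch of why the port range matters: without reuse, each outgoing connection to the same destination address and port holds an ephemeral port for the TIME_WAIT period after closing. The 60-second TIME_WAIT and the Linux default range of 32768 to 60999 are facts about the kernel rather than values from this PR; the function names are illustrative.

```python
TIME_WAIT_SECONDS = 60  # Linux keeps closed sockets in TIME_WAIT for 60s


def ephemeral_port_count(low: int, high: int) -> int:
    """Number of ports in an inclusive ip_local_port_range."""
    return high - low + 1


def max_connection_rate(low: int, high: int,
                        time_wait: int = TIME_WAIT_SECONDS) -> float:
    """Sustained new connections/sec to one destination without port reuse."""
    return ephemeral_port_count(low, high) / time_wait


tuned_ports = ephemeral_port_count(1025, 65499)     # range set by this PR
default_ports = ephemeral_port_count(32768, 60999)  # Linux default range
print(tuned_ports, default_ports)
print(round(max_connection_rate(1025, 65499)), "conn/s without reuse")
```

With tcp_tw_reuse enabled, this ceiling is relaxed further because sockets in TIME_WAIT can be reclaimed for new outgoing connections.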
net.ipv4.tcp_keepalive_time = 1800 (30m) / net.ipv4.tcp_keepalive_intvl = 60: Reduces the time dead connections hang around.
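The effect on dead-connection detection can be estimated as follows. The sketch assumes net.ipv4.tcp_keepalive_probes is left at its Linux default of 9, since this description does not list it; the comparison values 7200 and 75 are the kernel defaults for the two settings above.

```python
def dead_peer_detection_seconds(keepalive_time: int,
                                keepalive_intvl: int,
                                keepalive_probes: int = 9) -> int:
    """Idle time before the first probe plus all unanswered probe intervals."""
    return keepalive_time + keepalive_intvl * keepalive_probes


tuned = dead_peer_detection_seconds(1800, 60)    # values from this PR
default = dead_peer_detection_seconds(7200, 75)  # Linux defaults
print(f"tuned: {tuned / 60:.0f} min, default: {default / 60:.0f} min")
```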
net.core.rmem_max / net.core.wmem_max (~100MB): Drastically increases the maximum TCP read/write buffer sizes.
- Impact: Allows TCP windows to scale larger, significantly improving throughput over high-latency links (e.g., cross-region replication).
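The link between buffer size and throughput is the bandwidth-delay product (BDP): a single TCP connection can keep at most one window of data in flight per round trip. The sketch below uses an assumed 10 Gbit/s link with an 80 ms round-trip time, figures chosen for illustration and not taken from this PR.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to fill the link (bandwidth * RTT)."""
    return bandwidth_bits_per_sec * rtt_ms // 8000


def max_throughput_bits_per_sec(window_bytes: int, rtt_ms: int) -> int:
    """Throughput ceiling for one connection with a given window."""
    return window_bytes * 8000 // rtt_ms


# Assumed cross-region link: 10 Gbit/s with 80 ms round-trip time.
needed = bdp_bytes(10_000_000_000, 80)
print(needed, "bytes of buffer needed to fill the link")

# For comparison, a 4 MiB window on the same 80 ms path.
print(max_throughput_bits_per_sec(4 * 2**20, 80), "bit/s ceiling")
```

Under these assumptions the required buffer is on the order of 100 MB, which matches the scale of the limits chosen here.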
net.ipv4.tcp_window_scaling = 1: Enables large TCP windows (required for the buffers above to be effective).
net.core.netdev_max_backlog = 10000: Increases the queue for incoming packets on the network interface, preventing packet drops during high traffic bursts.

Kernel Stability & Shared Memory

kernel.panic = 10 / kernel.panic_on_oom = 1 / vm.panic_on_oom = 1: Configures the kernel to panic on an Out-Of-Memory (OOM) event instead of invoking the OOM killer, and to reboot 10 seconds after any kernel panic.
- Impact: In a cloud/HA environment, it is often better to fail fast and reboot (letting HA failover handle traffic) than to hang in a degraded state.
kernel.shmmax / kernel.shmall: Set to effectively infinite values to ensure the kernel allows PostgreSQL to allocate as much shared memory as it requests.
fs.file-max / fs.aio-max-nr: Increases limits on open files and asynchronous I/O events, essential for databases handling many connections and disk operations.

Conclusion

These changes represent a move towards a "production-ready," high-performance configuration. The system is explicitly tuned for high throughput (via buffer/window increases), stability (via OOM panic policies), and reduced CPU overhead (via HugePages and NUMA settings). These settings were based on existing Supabase settings throughout the code, and the recommended tuning practices from Red Hat: PostgreSQL Load Tuning on RHEL (https://www.redhat.com/en/blog/postgresql-load-tuning-red-hat-enterprise-linux). This ensures that the OS is not just a general-purpose host, but is specifically optimized for the high-concurrency, high-I/O profile of a production PostgreSQL instance.

… GIDs
- Move various sysctl parameters from setup-system.yml into the postgresql tuned profile.
- Explicitly define GIDs for ssl-cert (1001) and postgres (1002) to ensure stable HugePages access.
- Add HugePages calculation and hugetlb_shm_group configuration to the tuned profile.
- Ensure gotrue.service waits for tuned.service before starting.

hunleyd added 4 commits March 11, 2026 13:43

chore: adjust ammi version vars

4e604d6

fix(tuned): allow overcommit

8bffbbe

Merge branch 'develop' into INDATA-378

acf08e4

hunleyd wants to merge 4 commits into develop from INDATA-378
