feat: render and validate client.toml (currently missing chain-id panics seid start) #17

@bdchatham

Problem

sei-config renders app.toml and config.toml for nodes it manages, but does not render client.toml. When client.toml is absent, cosmos-sdk auto-creates it on first start with an empty chain-id field, which then causes seid start to panic with a misleading error message that points at config.toml rather than the actual file at fault.

This bites every fresh BYOV (bring-your-own-volume) node we provision in Kubernetes via the seictl sidecar, and the failure mode is opaque enough that operators have to dig into cosmos-sdk source to figure out what's wrong.

Example failure

On a freshly-bound SeiNode whose data volume was prepared by sei-config (no client.toml), seid start panics with:

panic: genesis file chain-id=pacific-1 does not equal config.toml chain-id=

The error message is misleading: the empty side of the comparison is clientCtx.ChainID, which cosmos-sdk loads from client.toml, not config.toml. The panic is raised in cosmos-sdk/server/start.go while validating that the chain-id from the loaded client context matches the one in genesis.json.

The current workaround is to hand-write client.toml onto the volume out-of-band:

chain-id = "pacific-1"
keyring-backend = "test"
output = "text"
node = "tcp://localhost:26657"
broadcast-mode = "sync"

This is captured in our archive-node BYOV runbook and has had to be applied to multiple nodes manually:

  • pacific-1/archive-0 (and forthcoming archive-1, archive-2 for multi-AZ HA)
  • canonical-v4 EC2 bootstrap (different code path but same root cause)

Proposed scope

Two pieces, ideally both:

1. Render client.toml

Have sei-config emit client.toml alongside app.toml / config.toml, populated with at least:

  • chain-id — sourced from the same canonical chain identity sei-config already knows for app.toml / config.toml / genesis.json
  • node = "tcp://localhost:26657" — sane default for in-pod RPC reach
  • keyring-backend = "test" for service nodes (no operator interaction expected); leave configurable for validator setups
  • broadcast-mode, output — standard defaults

2. Validation gate

Add a config-correctness check (run by seictl, or by sei-config at template-apply time) that surfaces the missing/empty chain-id before seid start panics. Suggested form:

  • If client.toml is missing → fail loud with a pointer to the rendered version sei-config would write.
  • If present but chain-id is empty or mismatched against genesis.json → fail loud with the actual file path and the expected value.

This validation belongs upstream of seid start so that the operator (or in our case, the controller's plan task) gets a precise error rather than a cosmos-sdk panic.

Why now

We're rolling out multi-AZ HA for the pacific-1 archive (archive-1 in eu-central-1a, archive-2 in 1c), which means provisioning two more BYOV nodes, each of which would hit this same panic without manual intervention. Closing this would make the rollout — and any future archive bootstrap — fully self-serve.

References

  • cosmos-sdk source: sei-cosmos/server/start.go (the panic site, line numbers shift across versions)
  • Our BYOV runbook: sei-protocol/sei-k8s-controller/.agent/runbooks/operating-archive-node-byov.md
