📝 Walkthrough

Adds a ports subsystem with bitmap-backed allocators, persistence, and DI wiring; introduces per-instance InstanceSpec persistence and reconciliation that allocates/releases ports; adds RandomPorts config and updates workflows to reconcile specs before resource generation.
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 3
🧹 Nitpick comments (1)
server/internal/ports/service.go (1)
219-235: Consider bounded retries in `allocateAvailablePort`.

The loop continues until `AllocateNext` returns `ErrFull`, but if many ports are bound by external processes, this could result in many iterations before exhaustion. The current design marks externally-bound ports as allocated (line 228 doesn't release them), which is intentional per the comment, but could exhaust the range faster than expected. This is likely acceptable given typical deployment scenarios, but consider logging a warning when a significant portion of attempts fail the OS check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/ports/service.go` around lines 219 - 235, allocateAvailablePort can loop many times if many ports are externally bound; add a bounded-retry/attempt counter inside allocateAvailablePort that increments on each failed s.portChecker(port) check and, when it crosses a configurable threshold, emits a s.logger.Warn message (including port, attempts, and range identifier) so operators see many OS-check failures; continue to call r.AllocateNext() and stop only on ErrFull as before, but expose the threshold as a Service field or constant so it’s adjustable and use the PortRange.AllocateNext, s.portChecker, and s.logger methods to implement this warning behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/config/config.go`:
- Around line 202-214: The RandomPorts.validate method currently rejects ranges
where Max == Min but NewPortRange (and ports_test) allow single-port ranges;
update RandomPorts.validate to accept Min==Max by changing the check from "if
r.Max <= r.Min" to only reject when r.Max < r.Min (i.e., ensure the validation
in RandomPorts.validate permits min==max), keeping the other bounds checks
intact so Min >=1 and Max <=65535 remain enforced.
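The suggested fix can be sketched as follows; the struct and method shapes mirror the review comment, not necessarily the real `config.go`:

```go
package main

import "fmt"

// RandomPorts mirrors the config struct under review (field names assumed).
type RandomPorts struct {
	Min int
	Max int
}

// validate permits single-port ranges (Min == Max), matching NewPortRange,
// while still enforcing the 1..65535 bounds.
func (r RandomPorts) validate() error {
	if r.Min < 1 || r.Max > 65535 {
		return fmt.Errorf("random ports must be within 1-65535, got %d-%d", r.Min, r.Max)
	}
	// Reject only inverted ranges; Min == Max is a valid one-port range.
	if r.Max < r.Min {
		return fmt.Errorf("random ports max (%d) must be >= min (%d)", r.Max, r.Min)
	}
	return nil
}

func main() {
	fmt.Println(RandomPorts{Min: 5432, Max: 5432}.validate()) // single-port range: accepted
	fmt.Println(RandomPorts{Min: 6000, Max: 5000}.validate()) // inverted range: rejected
}
```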
In `@server/internal/database/instance_spec_store.go`:
- Around line 53-55: InstanceSpecStore.Put currently writes StoredInstanceSpec
unconditionally (uses storage.NewPutOp), ignoring
StoredInstanceSpec.Spec.Version, which allows races during reconcile; change Put
to perform conditional writes: when creating new specs use create-if-absent
semantics and when updating use a version-checked compare-and-swap using the
StoredInstanceSpec.Spec.Version (or store a separate version/ETag) via the
storage client conditional/compare API, and add retry logic on
conditional-failure to re-read the latest StoredInstanceSpec, recompute any port
allocation idempotently, and retry the conditional Put; update references to
InstanceSpecStore.Key and storage.NewPutOp accordingly to locate where to wire
the conditional create/update and retry loop.
In `@server/internal/database/service.go`:
- Around line 536-556: The code currently allocates ports via
s.portsSvc.AllocatePort for spec.Port and spec.PatroniPort before calling
s.store.InstanceSpec.Put(...).Exec(ctx), which can leak ports if Put() fails;
modify the logic to track which ports were newly allocated (e.g., record values
returned from s.portsSvc.AllocatePort in a local slice or flags) and if
s.store.InstanceSpec.Put(...).Exec(ctx) returns an error, call the corresponding
release method (e.g., s.portsSvc.ReleasePort(ctx, port)) for each newly
allocated port before returning the error; alternatively (preferred) move
allocation until after a successful conditional write/transaction/CAS or wrap
the Put() in a transaction so ports are only allocated when persistence is
guaranteed — update the code around spec.Port, spec.PatroniPort,
s.portsSvc.AllocatePort, s.portsSvc.ReleasePort, and the StoredInstanceSpec
Put().Exec call to implement this cleanup or transactional allocation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f726b821-6a7e-4135-bdfe-96961c2bb9d8
📒 Files selected for processing (15)

- docs/installation/configuration.md
- server/cmd/root.go
- server/internal/config/config.go
- server/internal/database/instance_spec_store.go
- server/internal/database/provide.go
- server/internal/database/service.go
- server/internal/database/spec.go
- server/internal/database/store.go
- server/internal/ports/ports.go
- server/internal/ports/ports_test.go
- server/internal/ports/provide.go
- server/internal/ports/service.go
- server/internal/ports/store.go
- server/internal/workflows/activities/get_instance_resources.go
- server/internal/workflows/activities/get_restore_resources.go
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/internal/database/service.go (1)
134-175: ⚠️ Potential issue | 🟠 Major

Don't free ports before the delete is durable.

Both delete paths release ports before the corresponding `InstanceSpec` removal commits. If the later etcd delete/transaction fails — or `DeleteDatabase` reaches `ErrDatabaseNotFound` after only finding orphaned instance specs — those ports become reusable while the persisted specs still reference them. Move the allocator update into the same transaction as the spec delete, or release only after the delete has committed successfully.

Also applies to: 579-599
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/service.go` around lines 134 - 175, The code currently calls releaseInstancePorts(ctx, spec.Spec) before persisting the spec/db deletes in DeleteDatabase; move port releases so they occur only after the delete transaction commits (or make allocator updates part of the same storage transaction if the allocator supports transactional ops). Concretely: in DeleteDatabase, do not call releaseInstancePorts inside the initial loop over specs; instead collect the ports/instance IDs from each spec (from the spec.Spec) and proceed to build ops (s.store.Spec.Delete(...), s.store.Instance.DeleteByDatabaseID(...), s.store.InstanceSpec.DeleteByDatabaseID(...), s.store.InstanceStatus.DeleteByDatabaseID(...)) and call s.store.Txn(...).Commit(ctx) first, then on successful commit iterate the collected ports and call releaseInstancePorts for each; also ensure the early-return ErrDatabaseNotFound path does not release ports. This preserves durability and prevents freeing ports before deletes are persisted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/database/service.go`:
- Around line 564-574: The current flow persists the new spec before releasing
ports and blindly calls portsSvc.Release for any non-zero Port/PatroniPort, and
releasePreviousSpecPorts uses new.HostID which leaks allocations when moving
hosts; change ReconcileInstanceSpec and releasePreviousSpecPorts to persist
allocation provenance when allocating (mark ports as allocator-owned), only call
portsSvc.Release for ports that were allocated by our allocator (check the saved
provenance on the previous Instance.Spec), and call Release against
previous.HostID (not new.HostID) so the old host's allocations are freed; apply
the same allocator-owned check and previous-host release logic to the other
similar blocks mentioned (around 604-629 and 632-645).
In `@server/internal/ports/service.go`:
- Around line 96-105: The allocator is currently keyed by s.cfg.ClientAddress()
and NewPortRange(min,max) then calling restoreAllocator(ctx, r, name) which
drops the snapshot on Restore failure; this makes allocator identity change when
addresses or ranges change and can reissue in-use ports. Modify the logic to
key/memoize allocators by a stable host identifier (e.g., hostID) instead of
ClientAddress(), and change restoreAllocator (and any code paths around
NewPortRange/restoreAllocator) to migrate or seed existing persisted state into
the new range when ranges/addresses change rather than discarding the snapshot
on Restore failure; ensure functions/variables referenced include
s.cfg.ClientAddress(), s.cfg.RandomPorts.Min/Max, NewPortRange, restoreAllocator
and the hostID source so state is loaded/migrated under the stable host key and
only reset when explicitly requested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 01852eb3-ac93-4c5c-be21-c662b3c81604
🚧 Files skipped from review as they are similar to previous changes (9)
- docs/installation/configuration.md
- server/internal/ports/ports_test.go
- server/internal/ports/store.go
- server/internal/workflows/activities/get_instance_resources.go
- server/internal/ports/ports.go
- server/internal/database/spec.go
- server/internal/database/store.go
- server/internal/ports/provide.go
- server/internal/workflows/activities/get_restore_resources.go
```go
if err := s.releaseInstancePorts(ctx, spec.Spec); err != nil {
	return err
}
```
Can we move this logic after DeleteByKey?
The same logic that I stated above applies here as well: if the release happens after the delete, it cannot be retried. This operation is more likely to fail during the release, so I think it makes more sense to attempt it first.
```go
for _, spec := range specs {
	if err := s.releaseInstancePorts(ctx, spec.Spec); err != nil {
		return err
	}
}
```
Should this also be moved after completing the transaction and before returning? What do you think?
I think it makes more sense at the beginning because of the retry case. We're using the simplest form of deletion (just by key with no constraints) for all database entities, and I would only expect them to fail if Etcd were unhealthy. The release will also fail if Etcd is unhealthy, but it has an additional failure mode when many instances are being created and deleted simultaneously. So, as a whole, this method is more likely to fail during the release than during database entity deletes.
With the release at the top here, the spec still exists in Etcd even if the release fails. So, on retry, we're still able to read the spec from storage and retry the release operation.
If we did this the other way around and the entity deletes happened first, then on retry, the spec would no longer exist, so we wouldn't be able to retry the port release. That would have to happen thousands of times before you would run out of random ports in the default range, but still, since we can prevent it, I think it makes more sense to try the more error-prone operation at the beginning.
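This retry argument depends on port release being idempotent: re-releasing an already-free port must be a harmless no-op so the whole delete path can be re-run from the top. A minimal sketch against an assumed bitmap layout (the real allocator's internals may differ):

```go
package main

import "fmt"

// bitmapRange is an illustrative stand-in for the bitmap-backed allocator
// described in the PR; field names and layout are assumptions.
type bitmapRange struct {
	min  int
	bits []bool // bits[i] == true means port min+i is allocated
}

// Release is idempotent: clearing an already-clear bit is a no-op, so a
// retried DeleteDatabase can safely re-release the same ports.
func (b *bitmapRange) Release(port int) {
	if port < b.min || port >= b.min+len(b.bits) {
		return // out of range: nothing to do
	}
	b.bits[port-b.min] = false
}

func (b *bitmapRange) allocated(port int) bool {
	return b.bits[port-b.min]
}

func main() {
	r := &bitmapRange{min: 15000, bits: make([]bool, 10)}
	r.bits[3] = true // pretend port 15003 was allocated
	r.Release(15003)
	r.Release(15003) // second release: no-op, no error
	fmt.Println(r.allocated(15003))
}
```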
server/internal/database/service.go
```go
	return nil
}
if portShouldBeReleased(previous.Port, new.Port) {
	err := s.portsSvc.ReleasePortIfDefined(ctx, new.HostID, previous.Port)
```
Is this previous.HostID or new.HostID?
I can see how this looks confusing, so I'll change it.
But just for background about why this works as-is: this is the previous and new spec for one instance, so the host IDs are identical. Host ID is a component of the instance identity, meaning that it's impossible for an instance to change hosts. We made that decision because database instances are stateful - they depend on data that lives on a disk that's attached to a specific host.
server/internal/database/service.go
```go
	}
}
if portShouldBeReleased(previous.PatroniPort, new.PatroniPort) {
	err := s.portsSvc.ReleasePortIfDefined(ctx, new.HostID, previous.PatroniPort)
```
Here also: previous.HostID or new.HostID?
tsivaprasad
left a comment
Verification
- Created a cluster using the cluster initialization script.
- Created a database with port: 0 and verified that a random port was automatically assigned.
- Verified the assigned port using the get-database command.
restish control-plane-local-1 get-database storefront
HTTP/1.1 200 OK
Content-Length: 737
Content-Type: application/json
Date: Thu, 12 Mar 2026 12:11:35 GMT
{
created_at: "2026-03-12T12:04:28Z"
id: "storefront"
instances: [
{
connection_info: {
addresses: ["127.0.0.1"]
port: 15279
}
created_at: "2026-03-12T12:04:36Z"
host_id: "host-1"
id: "storefront-n1-689qacsi"
node_name: "n1"
postgres: {
patroni_state: "running"
role: "primary"
version: "18.2"
}
spock: {
read_only: "off"
version: "5.0.5"
}
state: "available"
status_updated_at: "2026-03-12T12:11:35Z"
updated_at: "2026-03-12T12:10:45Z"
}
]
spec: {
database_name: "storefront"
database_users: [
{
attributes: ["SUPERUSER", "LOGIN"]
db_owner: true
username: "admin"
}
]
nodes: [
{
host_ids: ["host-1"]
name: "n1"
}
]
port: 0
postgres_version: "18.2"
spock_version: "5"
}
state: "available"
updated_at: "2026-03-12T12:04:28Z"
}
- Stopped the database instance using stop-instance.
- Started the instance again using start-instance.
- Verified through get-database / list-databases that the same port (15279) was retained after restart.
restish control-plane-local-1 list-databases
HTTP/1.1 200 OK
Content-Length: 523
Content-Type: application/json
Date: Thu, 12 Mar 2026 12:19:19 GMT
{
databases: [
{
created_at: "2026-03-12T12:04:28Z"
id: "storefront"
instances: [
{
connection_info: {
addresses: ["127.0.0.1"]
port: 15279
}
created_at: "2026-03-12T12:04:36Z"
host_id: "host-1"
id: "storefront-n1-689qacsi"
node_name: "n1"
postgres: {
patroni_state: "running"
role: "primary"
version: "18.2"
}
spock: {
read_only: "off"
version: "5.0.5"
}
state: "available"
status_updated_at: "2026-03-12T12:19:15Z"
updated_at: "2026-03-12T12:19:05Z"
}
]
state: "available"
updated_at: "2026-03-12T12:04:28Z"
}
]
}
Adds a service to allocate random ports for instances from a configurable range. These ports are then persisted with the instance spec. The next time the instance spec is evaluated, we copy the ports from the persisted copy of the spec. The end result is that when we assign a random port to an instance, it will retain that port assignment until it is deleted or the user sets the port to `null` or a non-zero value in the database/node spec. Since this mechanism does not rely on Docker or the OS to assign ports, it will also work for the SystemD orchestrator. PLAT-417 PLAT-236
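The retention behavior described above reduces to a small carry-over step during spec reconciliation; field and function names below are assumed for illustration, not taken from the actual code:

```go
package main

import "fmt"

// InstanceSpec carries the ports persisted with an instance
// (field names assumed from the review comments).
type InstanceSpec struct {
	Port        int
	PatroniPort int
}

// carryOverPorts copies previously persisted random-port assignments into
// the new spec, so an instance keeps its port until it is deleted or the
// user pins an explicit value. A zero desired port means "assign randomly",
// which is satisfied from the persisted spec when one exists.
func carryOverPorts(desired, persisted InstanceSpec) InstanceSpec {
	if desired.Port == 0 && persisted.Port != 0 {
		desired.Port = persisted.Port
	}
	if desired.PatroniPort == 0 && persisted.PatroniPort != 0 {
		desired.PatroniPort = persisted.PatroniPort
	}
	return desired
}

func main() {
	persisted := InstanceSpec{Port: 15279, PatroniPort: 18008}
	next := carryOverPorts(InstanceSpec{Port: 0}, persisted)
	fmt.Println(next.Port) // retained across reconciles
}
```

Because the carry-over never overrides a non-zero desired port, a user who later pins an explicit value (or sets it to `null`) takes precedence, matching the behavior described in the summary.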
🧹 Nitpick comments (1)
server/internal/ports/service.go (1)
219-235: Consider a safety limit on OS-unavailable port retries.

`allocateAvailablePort` loops indefinitely until it finds an OS-available port. If many ports within `[min, max]` are bound by external processes, this could exhaust the entire range and return `ErrFull`. However, ports marked as unavailable remain allocated in the bitmap, which could slowly fill the range. This is likely acceptable given the range size (default 10,000 ports), but worth noting for environments with heavy port usage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/ports/service.go` around lines 219 - 235, The allocateAvailablePort loop in Service.allocateAvailablePort can spin indefinitely if many ports are OS-unavailable; add a safety retry limit and return an error once exceeded. Specifically, track attempts inside allocateAvailablePort while calling PortRange.AllocateNext and s.portChecker, and stop after a bounded number (e.g. the range size or a configurable maxRetries constant), returning a clear error (or ErrFull) instead of looping forever; keep the existing Debug log when skipping ports and ensure the new limit uses AllocateNext and s.portChecker as the loop’s break conditions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3da02593-82a4-4bbd-8c86-c104585ac17e
🚧 Files skipped from review as they are similar to previous changes (5)
- server/internal/ports/store.go
- server/internal/database/spec.go
- server/internal/ports/provide.go
- server/internal/database/store.go
- docs/installation/configuration.md
Summary
Adds a service to allocate random ports for instances from a configurable range. These ports are then persisted with the instance spec. The next time the instance spec is evaluated, we copy the ports from the persisted copy of the spec.
The end result is that when we assign a random port to an instance, it will retain that port assignment until it is deleted or the user sets the port to `null` or a non-zero value in the database/node spec. Since this mechanism does not rely on Docker or the OS to assign ports, it will also work for the SystemD orchestrator.

Testing
You will now be able to run the E2Es against the compose-dev environment.
You can also create read-replicas in the compose-dev environment by setting port to 0.
You will also notice that randomly assigned ports will now persist across restarts:
PLAT-417
PLAT-236