
feat: instance record management improvements #289

Merged
jason-lynch merged 1 commit into main from feat/PLAT-398/instance-record-management
Mar 11, 2026

@jason-lynch

Summary

Previously, the instance resource was responsible for creating, updating, and deleting the instance record in Etcd. This led to a corner case in our disaster recovery process when we need to remove an instance resource without executing its lifecycle methods.

This commit shifts the responsibility for creating and deleting the instance records to the workflows. Now, we will create or update the instance record to indicate the operation we're about to perform. We only completely delete the instance records if the database operation was successful.

The instance resource still updates the instance record to indicate an available or failed status with an error.
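
To make the new division of responsibility concrete, here is a minimal sketch of the flow described above. It is not the code from this PR: `recordStore`, `runOperation`, and the in-memory map standing in for Etcd are all hypothetical, and the `op` callback plays the role of the instance resource, which still sets the available status itself.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceState mirrors the state names mentioned in this PR.
// The Go identifiers here are illustrative, not the project's.
type InstanceState string

const (
	StateCreating  InstanceState = "creating"
	StateModifying InstanceState = "modifying"
	StateDeleting  InstanceState = "deleting"
	StateAvailable InstanceState = "available"
	StateFailed    InstanceState = "failed"
)

// errOperationFailed is a placeholder error for a failed database operation.
var errOperationFailed = errors.New("database operation failed")

// recordStore is a hypothetical in-memory stand-in for the instance
// records kept in Etcd, keyed by instance ID.
type recordStore map[string]InstanceState

func inProgress(s InstanceState) bool {
	return s == StateCreating || s == StateModifying || s == StateDeleting
}

// runOperation sketches the workflow-side record handling:
//  1. write the planned state for each instance before doing any work,
//  2. run the database operation,
//  3. on failure, mark instances still in progress as failed and keep them,
//  4. on success, remove the records of instances that were being deleted.
func runOperation(store recordStore, planned map[string]InstanceState, op func(recordStore) error) error {
	for id, state := range planned {
		store[id] = state
	}
	if err := op(store); err != nil {
		for id := range planned {
			if inProgress(store[id]) {
				store[id] = StateFailed
			}
		}
		return err
	}
	for id, state := range planned {
		if state == StateDeleting {
			delete(store, id)
		}
	}
	return nil
}

func main() {
	store := recordStore{"old": StateAvailable}
	planned := map[string]InstanceState{"old": StateDeleting, "new": StateCreating}
	err := runOperation(store, planned, func(s recordStore) error {
		s["new"] = StateAvailable // the resource reports success for the new instance
		return nil
	})
	fmt.Println(store, err) // map[new:available] <nil>
}
```

The point of the sketch is that a record is only removed after the whole operation succeeds, so a failed or interrupted run leaves a record behind that describes what was being attempted.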

Testing

This is mostly a refactor to pave the way for other features, so the user-facing changes in this PR are very subtle. There's a new deleting instance state, and instances will enter a creating, modifying, or deleting state slightly earlier in the process than they used to.

PLAT-398

@coderabbitai

coderabbitai bot commented Mar 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a325c8b9-1309-4707-81a7-7211780282e8

📥 Commits

Reviewing files that changed from the base of the PR and between de78f91 and f3db136.

⛔ Files ignored due to path filters (4)
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
📒 Files selected for processing (13)
  • api/apiv1/design/instance.go
  • client/enums.go
  • server/internal/database/instance.go
  • server/internal/database/instance_resource.go
  • server/internal/database/instance_status_store.go
  • server/internal/database/instance_store.go
  • server/internal/database/service.go
  • server/internal/workflows/activities/activities.go
  • server/internal/workflows/activities/cleanup_instance.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/workflows/common.go
  • server/internal/workflows/restart_instance.go
💤 Files with no reviewable changes (2)
  • server/internal/workflows/activities/cleanup_instance.go
  • server/internal/workflows/restart_instance.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • api/apiv1/design/instance.go
  • server/internal/workflows/activities/update_db_state.go
  • client/enums.go
  • server/internal/workflows/activities/activities.go
  • server/internal/database/instance_status_store.go

📝 Walkthrough

Walkthrough

The PR adds a "deleting" instance state throughout the API, client, and database layers, and refactors the instance cleanup workflow. It introduces UpdatePlannedInstanceStates activity to apply planned state changes, removes the CleanupInstance activity, and updates the database service to cascade-delete instances when databases are deleted.
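
As a rough illustration of the cascade-delete behaviour mentioned in the walkthrough, the sketch below deletes every instance and status entry that belongs to one database. The `store` type, the `databaseID/instanceID` key layout, and `deleteDatabase` are assumptions made for the example; only the `DeleteByDatabaseID` name comes from this PR, and its real signature may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a hypothetical in-memory stand-in for an Etcd-backed store.
// Keys are assumed to look like "<databaseID>/<instanceID>".
type store struct {
	items map[string]string
}

// DeleteByDatabaseID removes every entry under the given database ID and
// returns how many entries were removed. The trailing slash in the prefix
// keeps "db1" from matching keys that belong to "db10".
func (s *store) DeleteByDatabaseID(databaseID string) int {
	prefix := databaseID + "/"
	deleted := 0
	for key := range s.items {
		if strings.HasPrefix(key, prefix) {
			delete(s.items, key) // deleting during range is safe in Go
			deleted++
		}
	}
	return deleted
}

// deleteDatabase sketches the cascade: both instance records and instance
// statuses are removed along with the database they belong to.
func deleteDatabase(instances, statuses *store, databaseID string) (int, int) {
	return instances.DeleteByDatabaseID(databaseID), statuses.DeleteByDatabaseID(databaseID)
}

func main() {
	instances := &store{items: map[string]string{
		"db1/a":  "available",
		"db1/b":  "deleting",
		"db10/c": "available",
	}}
	statuses := &store{items: map[string]string{
		"db1/a":  "ok",
		"db10/c": "ok",
	}}
	i, s := deleteDatabase(instances, statuses, "db1")
	fmt.Println(i, s, len(instances.items), len(statuses.items)) // 2 1 1 1
}
```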

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| API and Client Enumerations | api/apiv1/design/instance.go, client/enums.go | Added "deleting" state to Instance enum values, expanding the set of allowed instance states. |
| Database State Model | server/internal/database/instance.go | Added InstanceStateDeleting constant, inProgressStates set including the new deleting state, and IsInProgress() method to check if an instance is in a transitional state. |
| Instance Resource Lifecycle | server/internal/database/instance_resource.go | Streamlined resource creation, modification, and deletion by removing intermediate state updates, inlining initialization calls, and replacing service-based deletion with a no-op path comment indicating instance shutdown and filesystem cleanup. |
| Database Storage Operations | server/internal/database/instance_store.go, server/internal/database/instance_status_store.go | Added DeleteByDatabaseID() methods for bulk deletion of instances and statuses by database ID; added Now field to InstanceUpdateOptions to support testable timestamp handling. |
| Service Layer Cleanup | server/internal/database/service.go | Updated DeleteDatabase() to cascade-delete related instances and instance statuses by database ID; modified GetInstance() to handle missing statuses gracefully. |
| Workflow Activity Replacement | server/internal/workflows/activities/activities.go, server/internal/workflows/activities/cleanup_instance.go, server/internal/workflows/activities/update_planned_instance_states.go | Removed CleanupInstance activity registration and implementation; added new UpdatePlannedInstanceStates activity to apply planned instance state changes from database plans during workflow execution. |
| Workflow State Handling | server/internal/workflows/activities/update_db_state.go | Extended state change handling to mark in-progress instances as failed when database fails and delete orphaned instances when database succeeds. |
| Workflow Orchestration | server/internal/workflows/common.go, server/internal/workflows/restart_instance.go | Removed host-removal cleanup workaround; integrated updatePlannedInstanceStates() helper into plan application flow; removed cancellation cleanup logic from restart instance workflow. |
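
For readers who want a feel for the state-model changes listed above, here is a small sketch of a set-backed `IsInProgress()` check and an injectable `Now` timestamp. The names `InstanceStateDeleting`, `inProgressStates`, `IsInProgress`, `InstanceUpdateOptions`, and `Now` appear in the summary; everything else (the other constants, the `Instance` fields, the `Update` method, and the assumption that a zero `Now` falls back to the current time) is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

type InstanceState string

const (
	InstanceStateCreating  InstanceState = "creating"
	InstanceStateModifying InstanceState = "modifying"
	InstanceStateDeleting  InstanceState = "deleting"
	InstanceStateAvailable InstanceState = "available"
	InstanceStateFailed    InstanceState = "failed"
)

// inProgressStates is assumed to hold the transitional states named in
// the PR description: creating, modifying, and deleting.
var inProgressStates = map[InstanceState]struct{}{
	InstanceStateCreating:  {},
	InstanceStateModifying: {},
	InstanceStateDeleting:  {},
}

// Instance is a pared-down, hypothetical record.
type Instance struct {
	State     InstanceState
	UpdatedAt time.Time
}

// IsInProgress reports whether the instance is in a transitional state.
func (i *Instance) IsInProgress() bool {
	_, ok := inProgressStates[i.State]
	return ok
}

// InstanceUpdateOptions carries an optional Now so tests can pin the
// timestamp instead of depending on the wall clock.
type InstanceUpdateOptions struct {
	State InstanceState
	Now   time.Time
}

// Update applies the options, using time.Now() when Now is unset.
func (i *Instance) Update(opts InstanceUpdateOptions) {
	now := opts.Now
	if now.IsZero() {
		now = time.Now()
	}
	i.State = opts.State
	i.UpdatedAt = now
}

func main() {
	fixed := time.Date(2026, time.March, 11, 0, 0, 0, 0, time.UTC)
	inst := &Instance{}
	inst.Update(InstanceUpdateOptions{State: InstanceStateDeleting, Now: fixed})
	fmt.Println(inst.IsInProgress(), inst.UpdatedAt.Equal(fixed)) // true true
}
```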

Poem

🐰 wiggles nose at instances now deleting
A cleanup dance no more—the plan's now healing,
States transition smooth, no orphans left behind,
Delete with grace, cascade-aligned,
The workflow hops to a better way of being! 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title concisely describes the main change: shifting instance record lifecycle responsibility from resources to workflows. |
| Description check | ✅ Passed | Description includes all required sections with clear explanation of the motivation, changes made, and testing notes; all checklist items are addressed. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@jason-lynch jason-lynch force-pushed the feat/PLAT-417/addresses branch from 2ba6cc0 to 1aa1b8b Compare March 9, 2026 22:36
@jason-lynch jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from db5226b to 674a998 Compare March 9, 2026 22:37
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/addresses branch from 1aa1b8b to 06352a5 Compare March 10, 2026 13:09
@jason-lynch jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch 2 times, most recently from cf89c8a to 4cfbac6 Compare March 10, 2026 14:27
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/addresses branch from 06352a5 to 21bd381 Compare March 10, 2026 14:33
@jason-lynch jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from 4cfbac6 to de78f91 Compare March 10, 2026 14:33
@jason-lynch

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
server/internal/database/instance_resource.go (1)

113-115: Add clarifying comment to Delete method explaining intentional no-op design.

The Delete method's nil return is intentional—instance record cleanup is handled by DeleteDbEntities activity in the deletion workflow, not by the resource's Delete method. Both deletion paths (normal workflow via applyPlans and host removal error handling) correctly converge on database.Service.DeleteDatabase, which explicitly cleans up instance records.

However, InstanceResource.Delete lacks an explanatory comment. Other resource types with similar no-op Delete methods (MCPConfigResource, ServiceInstanceSpecResource) include comments explaining why deletion is a no-op. Add a similar comment here for consistency and clarity.
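
A comment along these lines might satisfy the nitpick. This is only a sketch: the `InstanceResource` type below is a stripped-down stand-in, the real `Delete` signature in the repository is likely different, and the wording of the comment is a suggestion based on the explanation above.

```go
package main

import (
	"context"
	"fmt"
)

// InstanceResource is a hypothetical, pared-down stand-in for the real type.
type InstanceResource struct {
	InstanceID string
}

// Delete is intentionally a no-op with respect to the instance record.
// Record cleanup is handled by the deletion workflow rather than by the
// resource, so that a resource can be removed without running its
// lifecycle methods and without leaving its record in an unknown state.
func (r *InstanceResource) Delete(ctx context.Context) error {
	return nil
}

func main() {
	r := &InstanceResource{InstanceID: "example-instance"}
	fmt.Println(r.Delete(context.Background())) // <nil>
}
```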

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/instance_resource.go` around lines 113 - 115, Add a
clarifying comment above the InstanceResource.Delete method stating that the nil
return is intentional because instance record cleanup is handled by the
DeleteDbEntities activity in the deletion workflow (both normal applyPlans
workflow and host-removal error handling converge on
database.Service.DeleteDatabase), mirroring the explanatory comments present on
MCPConfigResource and ServiceInstanceSpecResource to make the no-op explicit and
avoid confusion.
server/internal/workflows/common.go (1)

94-125: Early exit optimization doesn't break outer loop.

The inner loop break on line 104 only exits the inner for _, event := range phase loop, not the outer for _, phase := range plan loop. This means if an instance-modifying event is found in the first phase, the code still continues iterating through remaining phases unnecessarily.

Consider adding a labeled break or restructuring for efficiency:

♻️ Optional: Use labeled break for efficiency
 func (w *Workflows) updatePlannedInstanceStates(
 	ctx workflow.Context,
 	databaseID string,
 	plan resource.Plan,
 ) error {
 	var modifiesInstances bool
+outer:
 	for _, phase := range plan {
 		for _, event := range phase {
 			if event.Resource.Identifier.Type == database.ResourceTypeInstance {
 				modifiesInstances = true
-				break
+				break outer
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/workflows/common.go` around lines 94 - 125, The loop in
updatePlannedInstanceStates uses a break that only exits the inner "for _, event
:= range phase" loop, so modifiesInstances remains set but the outer "for _,
phase := range plan" continues iterating; change the logic to stop scanning
remaining phases as soon as an instance-modifying event is found — e.g., use a
labeled break from the outer loop or immediately set modifiesInstances = true
and break out of the outer loop (or return early) so you avoid unnecessary work
before constructing UpdatePlannedInstanceStatesInput and calling
ExecuteUpdatePlannedInstanceStates.
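
The difference between a plain `break` and a labeled `break` is easy to check in isolation. The standalone sketch below uses a hypothetical `event` type and a plain string for the resource type (the real code compares against `database.ResourceTypeInstance`), and counts how many events are inspected before the scan stops.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a plan event.
type event struct {
	resourceType string
}

// resourceTypeInstance is an assumed value used only for this example.
const resourceTypeInstance = "instance"

// modifiesInstances reports whether any event in any phase targets an
// instance, along with the number of events inspected before stopping.
func modifiesInstances(plan [][]event) (found bool, inspected int) {
outer:
	for _, phase := range plan {
		for _, ev := range phase {
			inspected++
			if ev.resourceType == resourceTypeInstance {
				found = true
				break outer // exits both loops, not just the inner one
			}
		}
	}
	return found, inspected
}

func main() {
	plan := [][]event{
		{{resourceType: "network"}, {resourceType: "instance"}, {resourceType: "volume"}},
		{{resourceType: "instance"}, {resourceType: "service"}},
	}
	found, inspected := modifiesInstances(plan)
	fmt.Println(found, inspected) // true 2
}
```

With an unlabeled `break`, the same plan would inspect three events, because the second phase would still be scanned up to its first instance event. The result is identical either way, so this is purely an efficiency nit.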

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c52a6ff-dac2-42db-b57b-60d005a5deec

📥 Commits

Reviewing files that changed from the base of the PR and between 21bd381 and de78f91.

⛔ Files ignored due to path filters (4)
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
📒 Files selected for processing (13)
  • api/apiv1/design/instance.go
  • client/enums.go
  • server/internal/database/instance.go
  • server/internal/database/instance_resource.go
  • server/internal/database/instance_status_store.go
  • server/internal/database/instance_store.go
  • server/internal/database/service.go
  • server/internal/workflows/activities/activities.go
  • server/internal/workflows/activities/cleanup_instance.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/workflows/common.go
  • server/internal/workflows/restart_instance.go
💤 Files with no reviewable changes (2)
  • server/internal/workflows/restart_instance.go
  • server/internal/workflows/activities/cleanup_instance.go

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/addresses branch from 21bd381 to 2d6598c Compare March 10, 2026 14:57
@jason-lynch jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch 2 times, most recently from 6ac6ac9 to 584efda Compare March 10, 2026 16:39
@mmols mmols requested a review from rshoemaker March 10, 2026 18:01

@rshoemaker rshoemaker left a comment


Looks good - one question in-line.

Base automatically changed from feat/PLAT-417/addresses to main March 11, 2026 12:54
@jason-lynch jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from 584efda to a145735 Compare March 11, 2026 12:57
Previously, the instance resource was responsible for creating, updating,
and deleting the instance record in Etcd. This led to a corner case in
our disaster recovery process when we need to remove an instance
resource without executing its lifecycle methods.

This commit shifts the responsibility for creating and deleting the
instance records to the workflows. Now, we will create or update the
instance record to indicate the operation we're about to perform.
We only completely delete the instance records if the database operation
was successful.

The instance resource still updates the instance record to indicate an
available or failed status with an error.

PLAT-398
@jason-lynch jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from a145735 to f3db136 Compare March 11, 2026 13:06
@jason-lynch jason-lynch merged commit c2e8e67 into main Mar 11, 2026
3 checks passed
@jason-lynch jason-lynch deleted the feat/PLAT-398/instance-record-management branch March 11, 2026 13:42