@kimpenhaus
Collaborator

@kimpenhaus kimpenhaus commented Nov 4, 2025

Hey Christoph @buehler

This is a comprehensive PR containing changes from integrating the operator into our cluster environment.

Summary

This PR introduces breaking changes to the KubeOps SDK, implementing a result pattern inspired by the Go operator implementation. Controllers and finalizers now return ReconciliationResult<TEntity> enabling explicit success/failure states, centralized requeuing via RequeueAfter, and automatic finalizer lifecycle management. Additional improvements include extensible requeue mechanisms, const value support in source generators, and configurable leader election types.

Breaking Changes ⚠️

1. Result Pattern

Controller and finalizer interfaces now return Task<ReconciliationResult<TEntity>> instead of Task:

Before:

public interface IEntityController<TEntity>
{
    Task ReconcileAsync(TEntity entity, CancellationToken cancellationToken);
    Task DeletedAsync(TEntity entity, CancellationToken cancellationToken);
}

After:

public interface IEntityController<TEntity>
{
    Task<ReconciliationResult<TEntity>> ReconcileAsync(TEntity entity, CancellationToken cancellationToken);
    Task<ReconciliationResult<TEntity>> DeletedAsync(TEntity entity, CancellationToken cancellationToken);
}

The ReconciliationResult<TEntity> provides:

  • Success/failure status with error information
  • Optional RequeueAfter timespan for delayed reprocessing
  • Access to the updated entity after reconciliation (which allows, for example, changing the entity's state before finalizer detachment, which was not possible before as the entity would have been in a modified state)

Migration Example:

// Old implementation
public async Task ReconcileAsync(V1TestEntity entity, CancellationToken token)
{
    // ... reconciliation logic
}

// New implementation
public async Task<ReconciliationResult<V1TestEntity>> ReconcileAsync(V1TestEntity entity, CancellationToken token)
{
    // ... reconciliation logic

    // Success - requeue after 5 minutes
    return ReconciliationResult<V1TestEntity>.Success(entity, TimeSpan.FromMinutes(5));

    // Or, on failure, return an error message instead:
    // return ReconciliationResult<V1TestEntity>.Failure(entity, "Failed to process entity");
}
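
To make the result shape listed above concrete, here is a minimal sketch of how a caller might act on such a result. `SketchResult<TEntity>` and `SketchResultHandling` are hypothetical stand-ins written for this illustration; they are not the KubeOps `ReconciliationResult<TEntity>` type, whose exact members may differ.

```csharp
#nullable enable
using System;

// Stand-in for illustration only; NOT the KubeOps ReconciliationResult<TEntity> type.
public sealed record SketchResult<TEntity>(
    TEntity Entity,
    bool IsSuccess,
    string? ErrorMessage = null,
    TimeSpan? RequeueAfter = null)
{
    // Successful outcome, optionally asking to be reconciled again after a delay.
    public static SketchResult<TEntity> Success(TEntity entity, TimeSpan? requeueAfter = null)
        => new(entity, true, null, requeueAfter);

    // Failed outcome carrying an error message.
    public static SketchResult<TEntity> Failure(TEntity entity, string errorMessage)
        => new(entity, false, errorMessage, null);
}

public static class SketchResultHandling
{
    // Summarizes what a caller could do with a result: report the outcome
    // and schedule a requeue only when a delay was requested.
    public static string Describe<TEntity>(SketchResult<TEntity> result)
    {
        var outcome = result.IsSuccess ? "success" : $"failure: {result.ErrorMessage}";
        return result.RequeueAfter is { } delay
            ? $"{outcome}; requeue in {delay.TotalSeconds:0}s"
            : $"{outcome}; no requeue";
    }
}
```

For example, `SketchResultHandling.Describe(SketchResult<string>.Success("entity", TimeSpan.FromMinutes(5)))` yields `success; requeue in 300s`. The point of the pattern is that the requeue decision travels with the return value instead of being a side effect inside the controller.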

2. Namespace Reorganization

Types moved to new namespaces:

  • IEntityController<TEntity>: KubeOps.Abstractions.Controller → KubeOps.Abstractions.Reconciliation.Controller
  • IEntityFinalizer<TEntity>: KubeOps.Abstractions.Finalizer → KubeOps.Abstractions.Reconciliation.Finalizer
  • EntityRequeue: KubeOps.Abstractions.Queue → KubeOps.Abstractions.Reconciliation.Queue
  • IEntityRequeueFactory: KubeOps.Abstractions.Queue → KubeOps.Abstractions.Reconciliation.Queue

Migration: Update using statements in your controllers and finalizers.

3. Queue Interface Changes

The internal queue interface is now public and extensible:

public interface ITimedEntityQueue<TEntity>
{
    Task Enqueue(TEntity entity, RequeueType type, TimeSpan requeueIn, CancellationToken cancellationToken);
    Task Remove(TEntity entity, CancellationToken cancellationToken);
}

This enables implementing durable requeue mechanisms (e.g., backed by Redis, Service Bus, database) by overriding the default in-memory implementation.

New Features

1. Automatic Finalizer Management

Two new settings provide automatic finalizer lifecycle management:

builder.Services
    .AddKubernetesOperator(settings =>
    {
        // Automatically attach finalizers during reconciliation (default: true)
        settings.AutoAttachFinalizers = true;

        // Automatically detach finalizers after successful finalization (default: true)
        settings.AutoDetachFinalizers = true;
    });

Benefits:

  • No manual finalizer management required
  • Consistent finalizer handling across operators
  • Reduces boilerplate code
  • Can be disabled for custom finalization workflows
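
As a rough illustration of what attach/detach amounts to, the sketch below operates on a plain list of finalizer identifiers (standing in for an entity's `metadata.finalizers`). `FinalizerSketch` and its method names are hypothetical and only mirror the behavior described above; the actual logic lives in the operator's reconciler.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of KubeOps.
public static class FinalizerSketch
{
    // Adds the identifier if it is missing; returns true when the list changed.
    public static bool Attach(IList<string> finalizers, string identifier)
    {
        if (finalizers.Contains(identifier))
        {
            return false;
        }

        finalizers.Add(identifier);
        return true;
    }

    // Removes the identifier only after a successful finalization;
    // returns true when the list changed.
    public static bool DetachIfSucceeded(IList<string> finalizers, string identifier, bool finalizationSucceeded)
        => finalizationSucceeded && finalizers.Remove(identifier);
}
```

Keeping the identifier in place after a failed finalization is what prevents Kubernetes from deleting the entity before cleanup has completed.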

2. Const Value Support in Source Generator

The syntax receiver now supports constant values in Kubernetes entity attributes:

public static class Constants
{
    public const string ApiGroup = "mycompany.com";
    public const string ApiVersion = "v1";
}

[KubernetesEntity(
    Group = Constants.ApiGroup,  // Const values now supported
    ApiVersion = Constants.ApiVersion,
    Kind = "MyResource")]
public class V1MyResource : CustomKubernetesEntity<V1MyResourceSpec>
{
}

Benefits:

  • Centralized API group/version management
  • Compile-time constant validation
  • Better code organization for multi-resource operators

3. Leader Election Type Configuration

Introduction of LeaderElectionType enum for explicit leader election configuration:

public enum LeaderElectionType
{
    None = 0,    // No leader election (default)
    Single = 1,  // Single leader election using Kubernetes leases
    Custom = 2   // Custom user-defined leader election mechanism
}

Configuration:

builder.Services
    .AddKubernetesOperator(settings =>
    {
        settings.LeaderElectionType = LeaderElectionType.Single;
        settings.LeaderElectionLeaseDuration = TimeSpan.FromSeconds(15);
        settings.LeaderElectionRenewDeadline = TimeSpan.FromSeconds(10);
        settings.LeaderElectionRetryPeriod = TimeSpan.FromSeconds(2);
    });

Benefits:

  • Explicit configuration of leader election behavior
  • Support for custom leader election implementations
  • Clear distinction between single-instance and multi-instance deployments

4. Extensible Requeue Mechanism

Introduction of RequeueType enum and ITimedEntityQueue<TEntity> interface:

public enum RequeueType
{
    Added,
    Modified,
    Deleted
}

Use Cases:

  • Implement durable requeue using external storage (Redis, Service Bus, database)
  • Survive operator restarts
  • Implement custom requeue strategies
  • Add monitoring and metrics for requeue operations

Example Implementation:

public class DurableEntityQueue<TEntity> : ITimedEntityQueue<TEntity>
{
    public async Task Enqueue(TEntity entity, RequeueType type, TimeSpan requeueIn, CancellationToken cancellationToken)
    {
        // Store in Redis/Database with execution time
        await _storage.SaveAsync(entity, type, DateTime.UtcNow.Add(requeueIn));
    }

    public async Task Remove(TEntity entity, CancellationToken cancellationToken)
    {
        // Remove from external storage
        await _storage.DeleteAsync(entity);
    }
}
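
The example above omits the storage field and key handling. A self-contained variant might look like the following sketch. The enum and interface are local copies of the shapes shown in this description so that the snippet compiles on its own; `KeyedEntityQueue`, the `keyOf` selector, and the in-memory dictionary (standing in for Redis or a database) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Local copies of the shapes shown in this description, so the sketch compiles on its own.
public enum RequeueType { Added, Modified, Deleted }

public interface ITimedEntityQueue<TEntity>
{
    Task Enqueue(TEntity entity, RequeueType type, TimeSpan requeueIn, CancellationToken cancellationToken);
    Task Remove(TEntity entity, CancellationToken cancellationToken);
}

// Hypothetical queue: keeps one pending entry per key, stored with an absolute due time.
// The dictionary stands in for an external store.
public sealed class KeyedEntityQueue<TEntity> : ITimedEntityQueue<TEntity>
{
    private readonly ConcurrentDictionary<string, (TEntity Entity, RequeueType Type, DateTimeOffset DueAt)> _entries = new();
    private readonly Func<TEntity, string> _keyOf;

    public KeyedEntityQueue(Func<TEntity, string> keyOf) => _keyOf = keyOf;

    // Number of entities currently waiting to be requeued.
    public int Count => _entries.Count;

    public Task Enqueue(TEntity entity, RequeueType type, TimeSpan requeueIn, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        // A later enqueue for the same key replaces the earlier one.
        _entries[_keyOf(entity)] = (entity, type, DateTimeOffset.UtcNow.Add(requeueIn));
        return Task.CompletedTask;
    }

    public Task Remove(TEntity entity, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();
        _entries.TryRemove(_keyOf(entity), out _);
        return Task.CompletedTask;
    }
}
```

Storing an absolute due time rather than the relative delay is the design choice that matters for durability: after a restart, persisted entries can still be compared against the current clock.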

5. ReconciliationContext

New context object providing metadata about reconciliation triggers:

public sealed record ReconciliationContext<TEntity>
{
    public TEntity Entity { get; }
    public WatchEventType EventType { get; }
    public ReconciliationTriggerSource ReconciliationTriggerSource { get; }
}

Helps distinguish between API server events and operator-initiated requeues.

Implementation Details

Core Components

  1. ReconciliationResult (src/KubeOps.Abstractions/Reconciliation/ReconciliationResult{TEntity}.cs)

    • Immutable record type with success/failure semantics
    • Optional requeue after duration
    • Error message and exception support
  2. Reconciler (src/KubeOps.Operator/Reconciliation/Reconciler.cs)

    • Centralized reconciliation orchestration
    • Handles controller and finalizer invocation
    • Manages generation-based caching
    • Automatic finalizer attachment/detachment
    • Better testability
  3. ITimedEntityQueue (src/KubeOps.Operator/Queue/ITimedEntityQueue{TEntity}.cs)

    • Public interface for queue implementations
    • Async methods with cancellation token support
    • Extensibility point for custom implementations

Alignment with Go Implementation

This implementation draws inspiration from controller-runtime (Go):

  • Result pattern for reconciliation outcomes
  • RequeueAfter concept for delayed reprocessing
  • Clear separation of success/error states
  • Flexible error handling strategies

Testing

  • ✅ Comprehensive unit tests for ReconciliationResult<TEntity>
  • ✅ Unit tests for ReconciliationContext<TEntity>
  • ✅ Integration tests for finalizer auto-attach/detach
  • ✅ Tests for const value support in syntax receiver
  • ✅ Queue functionality tests with new RequeueType
  • ✅ All existing integration tests updated and passing

Documentation

  • Updated controller examples with new result pattern
  • Added advanced configuration guide
  • Updated finalizer documentation with auto-attach/detach settings
  • Added caching documentation
  • Migration guide included in this PR description

Additional Notes

Migration Checklist

For operators upgrading to this version:

  • Update controller methods to return ReconciliationResult<TEntity>
  • Update finalizer methods to return ReconciliationResult<TEntity>
  • Update namespace imports for reconciliation types
  • Review automatic finalizer settings (defaults are enabled)
  • Review leader election configuration (default: None)
  • Consider using const values for entity attributes (optional)
  • Test requeue behavior with new result pattern
  • Review error handling using result pattern instead of exceptions

kimpenhaus added 30 commits June 4, 2025 07:35
# Conflicts:
#	src/KubeOps.Abstractions/KubeOps.Abstractions.csproj
…g (hybrid cache)

- Integrated FusionCache for robust caching in resource watchers.
- Enhanced default configuration with extensible settings in `OperatorSettings`.
- Improved concurrency handling using `SemaphoreSlim` for entity events.
- Updated tests and dependencies to reflect caching changes.
…nt entity locks

- Renamed `DefaultCacheConfiguration` to `DefaultResourceWatcherCacheConfiguration` for clarity.
- Introduced cache key prefix to improve cache segmentation.
- Removed `ConcurrentDictionary` for entity locks to simplify concurrency management.
- Refactored event handling logic for "added" and "modified" events to streamline codebase.
- Updated `ConfigureResourceWatcherEntityCache` to use `IFusionCacheBuilder` for extensibility.
- Moved resource watcher cache setup logic to `WithResourceWatcherCaching` extension.
- Added detailed XML comments for `EntityLoggingScope` to improve documentation.
- Removed redundant `DefaultResourceWatcherCacheConfiguration`.
- Renamed `WithResourceWatcherCaching` to `WithResourceWatcherEntityCaching` for clarity.
- Updated `CacheExtensions` to be `internal` to limit scope.
- Removed unused dependency on `ZiggyCreatures.Caching.Fusion`.
- Added a new `Caching` documentation page explaining resource watcher caching with FusionCache and configuration options (in-memory and distributed).
- Updated sidebar positions for `Deployment`, `Utilities`, and `Testing` to accommodate the new `Caching` page.
…usionCache details

- Improved explanations for in-memory and distributed caching setups.
- Added example code for customizing resource watcher cache with FusionCache.
- Included references to FusionCache and Redis documentation for further guidance.
# Conflicts:
#	src/KubeOps.Operator/Watcher/ResourceWatcher{TEntity}.cs
# Conflicts:
#	examples/Operator/Finalizer/FinalizerOne.cs
#	src/KubeOps.Abstractions/KubeOps.Abstractions.csproj
#	src/KubeOps.Operator/Builder/CacheExtensions.cs
#	src/KubeOps.Operator/Constants/CacheConstants.cs
#	src/KubeOps.Operator/KubeOps.Operator.csproj
#	src/KubeOps.Operator/Watcher/ResourceWatcher{TEntity}.cs
…ependency

- Removed redundant requeue logic and optimized entity cache operations during deletion in `ResourceWatcher`.
- Upgraded `ZiggyCreatures.FusionCache` to version `2.4.0`.
- Introduced `RequeueType` enumeration to specify requeue operation types (`Added`, `Modified`, `Deleted`).
- Implemented `RequeueTypeExtensions` for mapping `WatchEventType` to `RequeueType`.
- Updated requeue mechanism to include `RequeueType` in `EntityRequeue` and related methods.
- Refactored `TimedEntityQueue` and related classes to support `RequeueEntry` containing both the entity and its requeue type.
- Adjusted tests to incorporate `RequeueType` into entity requeue logic.
… reconciliation logic

- Created `IReconciler<TEntity>` interface and its implementation to handle entity creation, modification, and deletion.
- Updated `ResourceWatcher` and `EntityRequeueBackgroundService` to use `Reconciler` for reconciliation operations.
- Removed redundant FusionCache dependency from `ResourceWatcher` and related classes.
- Streamlined requeue mechanics and replaced entity finalization logic with `Reconciler` integration.
- Registered `IReconciler<TEntity>` and its implementation `Reconciler<TEntity>` in the service container.
- Ensured proper integration with existing requeue and entity processing workflows.
…-attach/detach options

- Added `AutoAttachFinalizers` and `AutoDetachFinalizers` settings in `OperatorSettings`, enabling automatic management of entity finalizers during reconciliation.
- Extended `Reconciler` to respect these settings for adding and removing finalizers.
- Introduced `EntityFinalizerExtensions` for streamlined finalizer handling and identifier generation.
- Updated relevant interfaces and documentation for improved clarity and usability.
…ant values

- Update `KubernetesEntitySyntaxReceiver` to utilize `SemanticModel` for attribute argument resolution, ensuring accurate value retrieval.
- Updated `EntityFinalizerExtensions` to correctly append "finalizer" when missing from the name.
- Added unit tests to validate finalizer identifier generation, including cases for length limits and naming consistency.
- Renamed test cases and entities for improved clarity and consistency.
- Added new tests for entities with no group values and entities with varying group definitions.
- Adjusted expected
…interface for improved flexibility

- Extracted `ITimedEntityQueue` interface from `TimedEntityQueue` implementation.
- Updated all usages, including services and tests, to rely on the interface.
- Added extension methods for requeue key management.
- Improved code consistency and maintainability across the queue system.
…r election

- Replaced `EnableLeaderElection` with `LeaderElectionType` in `OperatorSettings` for enhanced configurability.
- Added `LeaderElectionType` enum with options: None, Single, and Custom.
- Updated `OperatorBuilder` to handle leader election logic based on `LeaderElectionType`.
- Modified `EntityRequeueBackgroundService` to public visibility and implemented proper `Dispose` logic.
- Adjusted tests to reflect new leader election mechanism.
- Improved code maintainability and alignment with distributed system requirements.
@buehler
Collaborator

buehler commented Nov 11, 2025

Cool! Thanks for the insights. Definitely looking forward to the code :)

@buehler buehler self-requested a review November 12, 2025 10:17
Collaborator

@buehler buehler left a comment

Thank you for this big contribution! I really like the changes and I'm looking forward to seeing how people integrate them into their operators. Getting more aligned with the Go implementation makes sense.

I do have some minor comments/questions. Feel free to comment on them and let us have a discussion :-)

Thanks again!

…ate reconciliation method signatures

- Marked entity-related classes as sealed for improved clarity and security.
- Adjusted reconciliation and finalizer method return types to use `ReconciliationResult` in dotnet templates.
- Simplified condition checks by replacing `IsFailure` with `!IsSuccess`.
- Updated related tests and logic to reflect the removal of `IsFailure` property.
…ialization and simplify object creation

- Replaced factory method with init-only properties in `RequeueEntry`.
- Enhanced instantiation of `TimedQueueEntry` with object initializer syntax.
- Added XML documentation for improved code readability.
Collaborator

@buehler buehler left a comment

Thanks! :)
One or two suggestions and discussions are still open, but then I guess we're good to go!

… for reconciliations

- Introduced `RequeueStrategy` to configure reconciliation queuing behavior, so that custom requeue strategies can be configured properly.
- Enhanced `EntityLoggingScope` with additional metadata and public visibility, so that it can be used in custom leadership or requeue implementations.
- Updated background services to support activity-based tracing and scoped logging - addressing missing log information
- Adjusted `RequeueEntry` to use `struct` for performance benefits.
@kimpenhaus
Collaborator Author

kimpenhaus commented Nov 14, 2025

hey Christoph @buehler - sorry for pushing new changes 🙈 I just saw that there were some logs missing (and an activity) - so I tried to align that for consistency. also it didn't feel good how to "configure" a custom requeue mechanism, so I made it configurable the same way as finalizer handling.

this should be it for now (except changes to the docs to reflect all the changes here - but that will be one last commit). I have two open points I'd like to discuss but will maybe move them to the discussion first.

@buehler
Collaborator

buehler commented Nov 14, 2025

No worries. which parts do you want to discuss?

@kimpenhaus
Collaborator Author

No worries. which parts do you want to discuss?

  1. I think there is an issue between deserialization in the resource watcher (which is done through the Kubernetes.Client --> KubernetesJson) and the deserialization in the admission webhooks (validate/mutate) (which is done by the default System.Text.Json). Why do I think that? We have - not sure if it's uncommon - an entity model containing some TimeSpan properties. In KubernetesJson there is a special converter enforcing the ISO 8601 duration format (e.g. PT1H for 1 hour). This is properly deserialized/serialized by the KubernetesClient, but with the default System.Text.Json it fails in the validation webhook, which expects the standard TimeSpan format (e.g. 01:00:00). My idea was to give the admission webhooks a special model binder using KubernetesJson for the deserialization - but from what I saw this is currently not possible. I asked on the KubernetesClient side: client deserialization <-> validation webhook deserialization kubernetes-client/csharp#1683

  2. when deleting an entity we wanted to set the state to terminating when the finalizer is triggered. this leads to a crd change which fires a modified event (with deletion timestamp - a new resource version - same generation). we had an error in our code which made the finalizer fail - what happens now is an infinite loop :) what we figured out is that whenever the finalizer can't be detached but the crd gets modified, this leads to an infinite loop. in the case where finalizing a crd leads to deletion, the modified event is skipped (because the entity doesn't exist anymore). no idea/experience how to properly solve this, but I saw other crds having a state that reflects finalization.

  3. not sure about this: but I think that the requeue service and the resource watcher can lead to concurrent/parallel reconciliation, which might lead to conflicts. not sure if that is intentional or a common use case and therefore accepted. but it feels like kind of an overhead to reconcile one on top of the other in parallel

I know these are 3 points 🤣

@ralf-cestusio

I wanted to chime in on 3. the potential parallel execution of watcher and requeue.
There was a bug mentioning this a few weeks ago: #977
I feel it can lead to some rather hard to debug 409 errors. But i have not experienced this myself so i find it hard to judge how disruptive this behavior is.
But this PR is already very large so i am not sure we should address this one in the same PR.

@kimpenhaus
Collaborator Author

I wanted to chime in on 3. the potential parallel execution of watcher and requeue. There was a bug mentioning this a few weeks ago: #977 I feel it can lead to some rather hard to debug 409 errors. But i have not experienced this myself so i find it hard to judge how disruptive this behavior is. But this PR is already very large so i am not sure we should address this one in the same PR.

thanks @ralf-cestusio - honestly I hadn't checked for existing issues. these 3 points weren't planned to go into this PR :) they were just some points I'd like to discuss.

@buehler buehler merged commit 06c65ba into dotnet:main Nov 21, 2025
4 of 5 checks passed
3 participants