Obsolete resource handling for read-cache-after-write#3207
Obsolete resource handling for read-cache-after-write#3207csviri wants to merge 33 commits intooperator-framework:nextfrom
Conversation
|
edit: started my initial review with seeing this had been associated with #3208. Based upon that description this is for when a resource is added to the temporary cache, then deleted by another actor, and a relist happens before the either the new or the delete events are received. Those stale entries won't actually get used because of the logic in ManagedInformerEventSource.get, but if this happens a sufficient number of times for resources that never again receive an event it can be a memory leak. In addition to something like this, ManagedInformerEventSource.get could be refined.
|
I don't see why those would not be used: If there is a resource in TemporaryResourceCache (TRC), it will return that value. This PR ensures that as you mentioned that we miss the delete event because of re-list, we eventually remove those resources. The related PR in fabric8 client will allow us to ensure that we always get an up-to-date resource version (vi bookmark), even if there are no further resources changes related to this informer. So we can clear old these old resources from TRC.
Stale entries are removed as a result of an event. This PR ensures that also those for which we miss the DELETE event are removed too. Not sure why we would not proactively remote those? |
edit: I was initially reading that effectively as isPresent, not isEmpty. So if the entry doesn't exist, we'll still use the temporary version. Yes, that logic should change to not even look for the cache entry, but to test the lastest resource version known to the informer.
That goes back to the original form of the comment, if you were trying to keep the temporary cache as small as possible - if so, then anytime we determine an entry is stale, there's no reason not to remove it. |
you mean the lastSyncVersion or latestResourceVersion from TRC ? hmm but I guess that does not matter that much in this case |
I mean the cache lastSyncResourceVersion - that would allow any relist (regardless of whether list watches are used, and any other client changes) to make the TRC resource seem obsolete. |
|
Yep, will change it also in this pr. Thank you. We should still clear Also if we can get the notification on the list from the client will cover that last corner case if there are no further events coming in. But frankly I think we can release without that after this change since the probability that all these circumstances happens is really close to zero. |
No it shouldn't hold up the release. You could even consider changing what you have here to be based upon a timer, and you'll be covered regardless. |
...io/javaoperatorsdk/operator/processing/event/source/informer/ManagedInformerEventSource.java
Outdated
Show resolved
Hide resolved
...ava/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCache.java
Outdated
Show resolved
Hide resolved
c074fcd to
fb12deb
Compare
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
fb12deb to
5421153
Compare
Mentioning consistency or obsolete seems to be good. You could refer to the pruning as dealing with "ghost" entries. |
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
|
Yes, maybe ghost resource is better name. |
There was a problem hiding this comment.
Pull request overview
This PR enhances the framework’s read-cache-after-write behavior by introducing periodic cleanup of “obsolete” entries in the temporary resource cache, and wiring the cleanup interval into informer configuration.
Changes:
- Add scheduled obsolete-entry detection/removal to
TemporaryResourceCache, driven by informerlastSyncResourceVersion(). - Extend informer configuration/annotation to support an
obsoleteResourceCacheCheckIntervalwith a default. - Update tests and sample logging configs; remove an unused sample dependency.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
operator-framework-core/src/main/java/.../TemporaryResourceCache.java |
Adds scheduled obsolete-resource checking and uses informer sync RV for consistency decisions. |
operator-framework-core/src/main/java/.../ManagedInformerEventSource.java |
Constructs the updated temporary cache and uses sync RV in get(...) logic. |
operator-framework-core/src/main/java/.../InformerManager.java |
Exposes lastSyncResourceVersion(namespace) to support new cache logic. |
operator-framework-core/src/main/java/.../ExecutorServiceManager.java |
Adds a scheduled executor accessor/initialization for periodic tasks. |
operator-framework-core/src/main/java/.../InformerConfiguration.java |
Adds config field + defaults for obsolete check interval. |
operator-framework-core/src/main/java/.../InformerEventSourceConfiguration.java |
Moves comparable-RV configuration into InformerConfiguration and adds new interval builder method. |
operator-framework-core/src/main/java/.../Informer.java |
Adds annotation attribute for obsolete check interval. |
operator-framework-core/src/test/java/.../TemporaryPrimaryResourceCacheTest.java |
Adds tests for obsolete-resource removal behavior. |
operator-framework-core/src/test/java/.../InformerEventSourceTest.java |
Updates mocks to include new configuration. |
operator-framework-core/src/test/java/.../ControllerEventSourceTest.java |
Updates config setup for new interval plumbing. |
operator-framework-core/src/main/java/.../InformerWrapper.java |
Adds getter exposing the underlying informer (used for sync RV retrieval). |
sample-operators/.../log4j2.xml |
Adjusts logging levels/layouts in samples/tests. |
sample-operators/leader-election/pom.xml |
Removes org.takes:takes dependency. |
test.sh |
Adds a helper script to build/load a Docker image into kind. |
You can also share your feedback on Copilot code review. Take the survey.
...ava/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCache.java
Show resolved
Hide resolved
...io/javaoperatorsdk/operator/processing/event/source/informer/ManagedInformerEventSource.java
Show resolved
Hide resolved
| var iterator = cache.entrySet().iterator(); | ||
| while (iterator.hasNext()) { | ||
| var e = iterator.next(); | ||
| if (ReconcilerUtilsInternal.compareResourceVersions( | ||
| e.getValue().getMetadata().getResourceVersion(), | ||
| getLatestResourceVersion(e.getValue().getMetadata().getNamespace())) | ||
| < 0 | ||
| // making sure we have the situation where resource is missing from the cache | ||
| && managedInformerEventSource | ||
| .manager() | ||
| .get(ResourceID.fromResource(e.getValue())) | ||
| .isEmpty()) { | ||
| iterator.remove(); | ||
| managedInformerEventSource.handleEvent(ResourceAction.DELETED, e.getValue(), null, true); | ||
| log.debug("Removing obsolete resource with ID: {}", e.getKey()); |
There was a problem hiding this comment.
checkObsoleteResources() iterates over a ConcurrentHashMap and calls iterator.remove(), but ConcurrentHashMap iterators don’t support element removal and will throw UnsupportedOperationException at runtime. Remove entries via cache.remove(key) / cache.remove(key, value) (or collect keys to remove) instead of using iterator.remove().
.../main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerManager.java
Show resolved
Hide resolved
| public void start(ConfigurationService configurationService) { | ||
| if (!started) { | ||
| this.configurationService = configurationService; // used to lazy init workflow executor | ||
| this.cachingExecutorService = Executors.newCachedThreadPool(); | ||
| this.scheduledExecutorService = Executors.newScheduledThreadPool(0); | ||
| this.executor = new InstrumentedExecutorService(configurationService.getExecutorService()); | ||
| started = true; |
There was a problem hiding this comment.
scheduledExecutorService is initialized with Executors.newScheduledThreadPool(0), which creates a scheduler with 0 core threads; scheduled tasks won’t run. This also isn’t included in stop() shutdown, so it would leak threads once fixed. Use a scheduler with at least 1 thread (or a shared scheduler) and ensure it’s shut down in stop() alongside the other executors.
...operatorsdk/operator/processing/event/source/informer/TemporaryPrimaryResourceCacheTest.java
Outdated
Show resolved
Hide resolved
...operatorsdk/operator/processing/event/source/informer/TemporaryPrimaryResourceCacheTest.java
Show resolved
Hide resolved
| * For read-cache-after-write consistency there are some corner cases where we need to check the | ||
| * caches see {@link TemporaryResourceCache#checkObsoleteResources()} periodically. This is the | ||
| * period in milliseconds. Applicable only if {@link #comparableResourceVersions()}} is true. | ||
| * |
There was a problem hiding this comment.
Javadoc has a mismatched brace in the link ({@link #comparableResourceVersions()}}) which will render incorrectly and can produce Javadoc warnings. Also, linking to TemporaryResourceCache#checkObsoleteResources() references a private method, which may not be linkable in generated docs; consider linking to the class or describing the behavior without referencing the private method.
.../main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
You can also share your feedback on Copilot code review. Take the survey.
| private void checkGhostResources() { | ||
| log.debug("Checking for ghost resources."); | ||
| var iterator = cache.entrySet().iterator(); | ||
| while (iterator.hasNext()) { | ||
| var e = iterator.next(); | ||
| if (ReconcilerUtilsInternal.compareResourceVersions( | ||
| e.getValue().getMetadata().getResourceVersion(), | ||
| getLatestResourceVersion(e.getValue().getMetadata().getNamespace())) | ||
| < 0 | ||
| // making sure we have the situation where resource is missing from the cache |
There was a problem hiding this comment.
checkGhostResources() calls compareResourceVersions(..., getLatestResourceVersion(...)) without handling the case where lastSyncResourceVersion() is null (e.g., informer not synced yet) or throws. This can cause a NullPointerException/NoSuchElementException and will cancel the periodic task (ScheduledExecutorService suppresses subsequent executions after an exception). Guard against null/absent latest RV and catch/log exceptions so the periodic cleanup keeps running.
...n/java/io/javaoperatorsdk/operator/api/config/informer/InformerEventSourceConfiguration.java
Show resolved
Hide resolved
.../main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerManager.java
Show resolved
Hide resolved
...ava/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCache.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
You can also share your feedback on Copilot code review. Take the survey.
Signed-off-by: Attila Mészáros a_meszaros@apple.com