007 - Add topic name lookup facility #77
Conversation
Here's a rough prototype kroxylicious/kroxylicious#2583
The 'edge' aspect is saying that we want to cache what the upstream said the topic names are, at the edge of the Proxy -> Upstream, but responses must then traverse the Filter chain to maintain composability.
We can implement a very long-lived cache invalidation strategy as Kafka topics cannot be renamed. Inconsistencies will arise if the upstream topics are deleted, but we should be able to rely on the upstream failing if we forward any interactions with deleted topics.
Aside: KIP-516 imagines a world where topics can be renamed. Obviously, we don't need to think about that right now.
I, too, don't want to think about this. But we should.
Kafka already has a mechanism for deferred deletion. I don't recall whether that just soft-deletes the data, with the topic removed from the metadata immediately (such that you can recreate the topic as soon as the deletion response succeeds), or whether it imposes a delay before you can recreate the topic. If the latter, then knowing that timeout would at least give us an upper bound on how long to leave it before revalidating caches.
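If that delay turned out to exist and be discoverable, each cache entry could carry its own revalidation deadline. A minimal sketch of the idea, assuming a hypothetical `deleteDelay` value obtained from upstream configuration (nothing in the proposal defines this yet):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch only: a cached topic name that remembers when it should be re-checked upstream.
// deleteDelay is the hypothetical "delay before a deleted topic can be recreated".
record CachedTopicName(String name, Instant revalidateAfter) {

    static CachedTopicName of(String name, Duration deleteDelay) {
        // The delete delay gives an upper bound on how long an entry can be trusted.
        return new CachedTopicName(name, Instant.now().plus(deleteDelay));
    }

    boolean needsRevalidation(Instant now) {
        return now.isAfter(revalidateAfter);
    }
}
```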
* Asynchronously resolves a set of topic UUIDs to their corresponding topic names.
*
* @param topicUuids A Set of topic UUIDs to resolve.
* @return A CompletionStage that will complete with an unmodifiable Map<Uuid, String>,
I question whether failing in the case where a Uuid is not found is the right behavior. The caller has no way to know which Uuid has been deleted, so I think you are really forcing them to query with singletons. I think in the case where a topic id no longer exists, returning a map with that key absent is better.
I'm also a bit cautious about this. "cannot be resolved" covers things like the user not being authorized to see the topic, as well as races wrt topic creation and deletion. We need to be sure that what the client observes is consistent with Kafka behaviour.
I have reworked it with a Result class that will hold either the name or an exception, so we can have specific exception types for non-existence vs authorization or other error codes from Kafka.
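To make that shape concrete, here is a minimal sketch of what such a Result-carrying API could look like. The `TopicNameResult` name and the exact signature are illustrative only, not the final API:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletionStage;

import org.apache.kafka.common.Uuid;

public interface TopicNameLookup {

    /** Holds either the resolved topic name or the specific failure for that UUID. */
    final class TopicNameResult {
        private final String name;       // non-null on success
        private final Throwable failure; // non-null on failure (unknown id, not authorized, ...)

        private TopicNameResult(String name, Throwable failure) {
            this.name = name;
            this.failure = failure;
        }

        public static TopicNameResult success(String name) {
            return new TopicNameResult(name, null);
        }

        public static TopicNameResult failure(Throwable failure) {
            return new TopicNameResult(null, failure);
        }

        public boolean isSuccess() {
            return name != null;
        }

        public String name() {
            if (name == null) {
                throw new IllegalStateException("lookup failed", failure);
            }
            return name;
        }

        public Throwable failure() {
            return failure;
        }
    }

    /**
     * Resolves each UUID independently. The returned map contains an entry for every
     * requested UUID, so a batched caller can see which ids failed and why, rather than
     * the whole CompletionStage failing.
     */
    CompletionStage<Map<Uuid, TopicNameResult>> getTopicNames(Set<Uuid> topicUuids);
}
```

A per-entry result also addresses the earlier point about batched callers: they are never forced to query with singletons just to find out which id was the problem.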
CompletionStage<Map<Uuid, String>> getTopicNames(Set<Uuid> topicUuids);
This method will internally send a `Metadata` request to the upstream Kafka cluster to fetch the required topic information.
So this must be sending a request for all topics rather than just the subset queried by the user? I am guessing too that includeClusterAuthorizedOperations etc. will be false, to minimize the amount of work the server has to do.
It is a pity one can't query for topics without pulling the partition information (which might be huge).
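Purely for illustration, a hedged sketch of how a request scoped to just the ids being resolved might be built with kafka-clients' generated `MetadataRequestData`; whether the facility narrows the request like this, or asks for all topics, is exactly the open question above:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.common.Uuid;
import org.apache.kafka.common.message.MetadataRequestData;

class MetadataRequests {

    // Builds a Metadata request body that names only the topic ids we need to resolve.
    static MetadataRequestData forTopicIds(Set<Uuid> topicUuids) {
        List<MetadataRequestData.MetadataRequestTopic> topics = topicUuids.stream()
                .map(id -> new MetadataRequestData.MetadataRequestTopic().setTopicId(id))
                .collect(Collectors.toList());
        return new MetadataRequestData()
                .setTopics(topics)                           // only the ids in question, not all topics
                .setAllowAutoTopicCreation(false)            // a name lookup must never create topics
                .setIncludeTopicAuthorizedOperations(false); // keep the broker's work to a minimum
    }
}
```

Even scoped this way, the response still carries partition information for each topic, which is the cost the comment above points out.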
I had imagined only querying the topic ids that we are immediately interested in. I guess doing them all at once would save incrementally building up the cache.
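A rough sketch of that incremental approach, assuming a long-lived in-memory cache (safe while topics cannot be renamed) and a placeholder `fetchFromUpstream` standing in for whatever Metadata-request plumbing the proxy ends up using:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

import org.apache.kafka.common.Uuid;

class CachingTopicNameResolver {

    private final ConcurrentHashMap<Uuid, String> cache = new ConcurrentHashMap<>();

    CompletionStage<Map<Uuid, String>> getTopicNames(Set<Uuid> topicUuids) {
        Set<Uuid> misses = new HashSet<>();
        for (Uuid id : topicUuids) {
            if (!cache.containsKey(id)) {
                misses.add(id);
            }
        }
        if (misses.isEmpty()) {
            return CompletableFuture.completedFuture(snapshot(topicUuids));
        }
        // Only the cache misses go upstream, rather than a full-cluster metadata fetch.
        return fetchFromUpstream(misses).thenApply(resolved -> {
            cache.putAll(resolved);
            return snapshot(topicUuids);
        });
    }

    private Map<Uuid, String> snapshot(Set<Uuid> topicUuids) {
        return topicUuids.stream()
                .filter(cache::containsKey)
                .collect(Collectors.toUnmodifiableMap(id -> id, cache::get));
    }

    // Placeholder: the real implementation would send a scoped Metadata request upstream.
    private CompletionStage<Map<Uuid, String>> fetchFromUpstream(Set<Uuid> topicUuids) {
        return CompletableFuture.completedFuture(Map.of());
    }
}
```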
Thanks for the write-up @robobario. I left some comments, but I think the shape is broadly good.
tombentley left a comment
Thanks @robobario
tombentley left a comment
Thanks @robobario. I had a few questions, but overall this looks good.
@robobario assuming #75 doesn't get merged first, I suppose you will need to renumber the filename.
This proposal outlines a new feature for the Kroxylicious filter framework: a mechanism to resolve topic names from their corresponding topic IDs. This will enable filters to implement business logic based on human-readable topic names, rather than the underlying Kafka implementation detail of topic IDs.