Collaborator

@pajakd pajakd commented Nov 13, 2025

Description

The Slice CR has changed from namespaced to cluster-scoped. This PR aligns the code with that change.

Issue

Testing

  ObjectMeta: metav1.ObjectMeta{
-     Name: SliceName(wl.Name, podSetName),
-     Namespace: wl.Namespace,
+     Name: SliceName(wl.Name, podSetName),

Collaborator

What if we have workloads with the same names and PodSet names in different namespaces? Should we include the namespace in the name?

Collaborator

Please also add a test case for this scenario.

Collaborator Author

I added a test, "should handle two JobSets with the same name in different namespaces", which passes without any additional changes. The reason is that each workload gets a different name, because the name is built using a hash of the JobSet's UID (https://github.com/kubernetes-sigs/kueue/blob/36e330ef5aa07ec20157cd523805cd173dac488b/pkg/controller/jobframework/workload_names.go#L52).

Do you think it's possible to construct a test in which two workloads have the same name?
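
To illustrate why same-named JobSets do not collide, here is a minimal sketch of a name derived from a hash that includes the owner's UID. The function `workloadName`, its arguments, the hash input, and the 5-character suffix are assumptions made for illustration; this is not the exact algorithm in the linked `workload_names.go`.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// workloadName is a hypothetical stand-in for the linked helper. It appends a
// short hash that covers the owner's UID, so two owners with the same name
// but different UIDs get different workload names.
func workloadName(kind, ownerName, ownerUID string) string {
	sum := sha1.Sum([]byte(kind + "/" + ownerName + "/" + ownerUID))
	suffix := hex.EncodeToString(sum[:])[:5]
	return fmt.Sprintf("%s-%s-%s", kind, ownerName, suffix)
}

func main() {
	// Same JobSet name in two namespaces; each object has its own UID.
	a := workloadName("jobset", "train", "uid-1111")
	b := workloadName("jobset", "train", "uid-2222")
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println("same name:", a == b)
}
```

A workload whose name is not derived this way (one with a user-chosen name, for instance) would not get this protection.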

Collaborator

Yes, what if the user creates a prebuilt workload?

Collaborator Author

Oh, I see. I think you are right. I prepended the namespace to the slice name to fix the potential name collision, and added a unit test, "should create a slice for another workload with the same name but in a different namespace", that should cover this scenario. Please take a look.
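
For readers skimming the thread, a minimal sketch of the collision and the fix. The helper names `oldSliceName` and `newSliceName` and the `%s-%s` format are assumptions for illustration; the real `SliceName` in this PR may build the string differently.

```go
package main

import "fmt"

// oldSliceName models a name that ignores the namespace (assumed format).
// That is safe for a namespaced object, but not for a cluster-scoped one.
func oldSliceName(workloadName, podSetName string) string {
	return fmt.Sprintf("%s-%s", workloadName, podSetName)
}

// newSliceName is a hypothetical variant that prepends the namespace so that
// same-named workloads in different namespaces map to different Slices.
func newSliceName(namespace, workloadName, podSetName string) string {
	return fmt.Sprintf("%s-%s-%s", namespace, workloadName, podSetName)
}

func main() {
	// Two prebuilt workloads that share a name but live in different namespaces.
	fmt.Println("old:", oldSliceName("wl", "main"), "vs", oldSliceName("wl", "main"))
	fmt.Println("new:", newSliceName("team-a", "wl", "main"), "vs", newSliceName("team-b", "wl", "main"))
}
```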

Collaborator

@mbobrovskyi mbobrovskyi left a comment

Previously, we didn’t handle removing slices when a workload was deleted due to ownerReference. I think we should add this logic to the reconciler to handle workload deletion properly.

Collaborator Author

pajakd commented Nov 17, 2025

> Previously, we didn’t handle removing slices when a workload was deleted due to ownerReference. I think we should add this logic to the reconciler to handle workload deletion properly.

If the workload is deleted, then shouldFinalize(wl) is true and this logic removes the slices:

if finalize, reason := shouldFinalize(wl); finalize {
	if controllerutil.ContainsFinalizer(wl, SliceControllerName) {
		log.V(3).Info(fmt.Sprintf("Cleaning up the Slices and finalizing the Workload because %s", reason))
		cleanedUp, err := r.cleanupSlices(ctx, wl)

I can't think of a scenario that requires additional cleanup (also, none of our e2e tests is failing with this change). What am I missing?

Collaborator

mbobrovskyi commented Nov 17, 2025

> I can't think of a scenario that requires additional cleanup (also, none of our e2e tests is failing with this change). What am I missing?

Before this, we try to get the workload, and if it doesn’t exist, we skip the reconciliation. But probably it shouldn't be a problem because we have a finalizer, right?

Collaborator Author

pajakd commented Nov 18, 2025

> > I can't think of a scenario that requires additional cleanup (also, none of our e2e tests is failing with this change). What am I missing?
>
> Before this, we try to get the workload, and if it doesn’t exist, we skip the reconciliation. But probably it shouldn't be a problem because we have a finalizer, right?

I think so too. We delete the finalizer only after performing the cleanup so the workload cannot be deleted if the slices still exist.
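
The ordering argument can be sketched with a toy model. Everything here (the `workload` and `cluster` types, the `sliceFinalizer` name, and the helpers) is hypothetical and uses only the standard library; it is not the controller-runtime API or the actual reconciler, just an illustration of "clean up first, then remove the finalizer".

```go
package main

import "fmt"

// sliceFinalizer is a hypothetical finalizer name used only in this sketch.
const sliceFinalizer = "example.com/slice-controller"

// workload is a toy stand-in for the Workload object.
type workload struct {
	name       string
	deleting   bool // models a set deletionTimestamp
	finalizers []string
}

// cluster is a toy stand-in for API server state: workload name -> slice names.
type cluster struct {
	slices map[string][]string
}

func containsFinalizer(wl *workload, f string) bool {
	for _, have := range wl.finalizers {
		if have == f {
			return true
		}
	}
	return false
}

func removeFinalizer(wl *workload, f string) {
	kept := wl.finalizers[:0]
	for _, have := range wl.finalizers {
		if have != f {
			kept = append(kept, have)
		}
	}
	wl.finalizers = kept
}

// reconcile deletes the slices first and only then drops the finalizer.
func reconcile(c *cluster, wl *workload) {
	if !wl.deleting || !containsFinalizer(wl, sliceFinalizer) {
		return
	}
	delete(c.slices, wl.name)
	removeFinalizer(wl, sliceFinalizer)
}

// canBeRemoved models the rule that an object marked for deletion only
// disappears once no finalizers remain.
func canBeRemoved(wl *workload) bool {
	return wl.deleting && len(wl.finalizers) == 0
}

func main() {
	c := &cluster{slices: map[string][]string{"wl": {"slice-0", "slice-1"}}}
	wl := &workload{name: "wl", finalizers: []string{sliceFinalizer}}

	wl.deleting = true // the user requests deletion
	fmt.Println("removable before cleanup:", canBeRemoved(wl))
	reconcile(c, wl)
	fmt.Println("slices left:", len(c.slices["wl"]))
	fmt.Println("removable after cleanup:", canBeRemoved(wl))
}
```

In this model the workload still exists whenever its slices do, so the "workload not found, skip reconciliation" path is never reached with slices left behind.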

Comment on lines 295 to 296
s.Annotations["slice.accelerator.gke.io/owner-workload-name"] = name
s.Annotations["slice.accelerator.gke.io/owner-workload-namespace"] = ns

Collaborator

Let's use const vals here

Collaborator Author

Done.
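
A minimal sketch of what the requested change could look like. The constant names and the `setOwnerAnnotations` helper are hypothetical; only the two annotation keys come from the reviewed lines.

```go
package main

import "fmt"

// Hypothetical constant names; the keys are copied from the reviewed lines.
const (
	OwnerWorkloadNameAnnotation      = "slice.accelerator.gke.io/owner-workload-name"
	OwnerWorkloadNamespaceAnnotation = "slice.accelerator.gke.io/owner-workload-namespace"
)

// setOwnerAnnotations records the owning Workload on an annotations map,
// allocating the map if it is nil.
func setOwnerAnnotations(annotations map[string]string, ns, name string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[OwnerWorkloadNameAnnotation] = name
	annotations[OwnerWorkloadNamespaceAnnotation] = ns
	return annotations
}

func main() {
	a := setOwnerAnnotations(nil, "team-a", "wl")
	fmt.Println(a[OwnerWorkloadNamespaceAnnotation], a[OwnerWorkloadNameAnnotation])
}
```

With the keys in one place, code that reads the annotations back can reference the same constants instead of repeating the string literals.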

@PBundyra
Collaborator

Overall LGTM

@pajakd pajakd merged commit e58527f into AI-Hypercomputer:slice-main Nov 24, 2025
14 of 17 checks passed
3 participants