add snapshot create delete capability #111

Draft
disperate wants to merge 42 commits into cloudscale-ch:master from disperate:julian/add-snapshot-create-delete-capability

Conversation

@disperate (Contributor)

Adds support for ControllerServiceCapability_RPC_CREATE_DELETE_SNAPSHOT.

@mweibel (Collaborator) left a comment


looks good so far. I have a few questions and mostly nits.

For reviewers' sake: it would be great to have a ready-to-use example in the examples folder for testing this. I haven't tested it on a cluster yet, although I did install this version to check that it starts and that there are no immediate error logs (there are none).

@mweibel (Collaborator) left a comment


overall this looks very good. Most of the things I commented on are not super critical. Good work 👏

Comment on lines -19 to -24
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshots"]
verbs: [ "get", "list", "watch", "update" ]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshotcontents"]
verbs: ["get", "list"]
Collaborator

maybe we misunderstood each other in the first review.
I'd keep these in, because external-provisioner has them as well.
In the end it doesn't really matter, because the snapshotter-role also maps those to the same SA, but I think keeping them makes it easier to later update to newer provisioner versions with potentially updated role definitions.

Collaborator

what's the reason for this to not be in the chart?

AccessibleTopology: []*csi.Topology{
{
Segments: map[string]string{
topologyZonePrefix: d.zone,
Collaborator

Suggested change:
- topologyZonePrefix: d.zone,
+ topologyZonePrefix: snapshot.Zone.Slug,

shouldn't we use snapshot.Zone.Slug instead?

@disperate (Contributor, Author)

Using d.zone is not entirely incorrect: since the driver does not currently work across zones, it gets the zone from the metadata of the node it runs on. The same zone logic is used during normal volume creation.

I think we will move to multi-zone support at some point, which will require proper topology support. So I decided to refactor this whole section a bit: we now use the actual data from the volume to create the csiVolume.
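
A minimal sketch of what that refactor could look like (the helper name and parameters are made up for illustration; the csi types are the real github.com/container-storage-interface/spec Go bindings, and topologyZonePrefix is the constant from the diff above):

import csi "github.com/container-storage-interface/spec/lib/go/csi"

// makeCSIVolume derives the topology segment from the zone slug the API
// returned for the volume, instead of the driver's own zone (d.zone).
func makeCSIVolume(uuid, zoneSlug string, sizeGB int64) *csi.Volume {
        return &csi.Volume{
                VolumeId:      uuid,
                CapacityBytes: sizeGB << 30, // GiB to bytes
                AccessibleTopology: []*csi.Topology{{
                        Segments: map[string]string{topologyZonePrefix: zoneSlug},
                }},
        }
}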

assert.NoError(t, err)

// Wait a bit for the PVC to be processed
time.Sleep(10 * time.Second)
Collaborator

I'd prefer fetching the PVC more often, in a loop, while allowing a longer overall wait. 10s is quite a long time in tests and even then it may fail, so a loop should result in faster tests and potentially fewer flakes.
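
For illustration, the loop could look something like this (the helper name, namespace handling, and intervals are made up; wait is k8s.io/apimachinery/pkg/util/wait, and client is the kubernetes.Interface the test already has):

func waitForPVCBound(ctx context.Context, t *testing.T, client kubernetes.Interface, namespace, name string) {
        err := wait.PollUntilContextTimeout(ctx, 1*time.Second, 1*time.Minute, true,
                func(ctx context.Context) (bool, error) {
                        pvc, err := client.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, name, metav1.GetOptions{})
                        if err != nil {
                                return false, nil // treat transient errors as "not bound yet"
                        }
                        return pvc.Status.Phase == corev1.ClaimBound, nil
                })
        if err != nil {
                t.Fatalf("PVC %q did not become Bound: %v", name, err)
        }
}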

assert.NoError(t, err)

// Wait a bit for the PVC to be processed
time.Sleep(10 * time.Second)
Collaborator

see polling comment above


if time.Now().UnixNano()-start.UnixNano() > (5 * time.Minute).Nanoseconds() {
t.Fatalf("timeout exceeded while waiting for volume snapshot %v to be ready", name)
return
Collaborator

nit: the return is unnecessary, but it can be left in if that's preferred for clarity.

t.Fatalf already exits right on the spot.

⚠️ This also means, however, that the cleanup won't run! This is something we might want to avoid?
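
One way around that, sketched under the assumption that the test creates the snapshot itself (createVolumeSnapshot / deleteVolumeSnapshot are placeholders for whatever helpers the suite uses): register the teardown with t.Cleanup, which still runs after t.Fatalf.

func TestSnapshot(t *testing.T) {
        const name = "csi-test-snapshot" // placeholder

        createVolumeSnapshot(t, name)
        t.Cleanup(func() {
                // t.Cleanup callbacks run even after t.Fatalf (FailNow exits
                // via runtime.Goexit, and cleanups run when the test finishes).
                deleteVolumeSnapshot(t, name)
        })

        waitForVolumeSnapshot(t, name) // may t.Fatalf on timeout; cleanup still runs
}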

t.Logf("Volume snapshot %q not ready yet; waiting...", name)
time.Sleep(5 * time.Second)
}
}
Collaborator

this code is a bit complex; it could be simplified and made more idiomatic Go.

Several suggestions:

  1. use time.Since():
  func waitForVolumeSnapshot(t *testing.T, client kubernetes.Interface, name string) {                                                           
        const timeout = 5 * time.Minute                                                                                                          
        const pollInterval = 5 * time.Second                                                                                                     
        start := time.Now()                                                                                                                      
                                                                                                                                                 
        t.Logf("Waiting for volume snapshot %q to be ready...", name)                                                                            
                                                                                                                                                 
        for {                                                                                                                                    
                snapshot := getVolumeSnapshot(t, client, name)                                                                                   
                                                                                                                                                 
                if snapshot.Status != nil && snapshot.Status.ReadyToUse != nil && *snapshot.Status.ReadyToUse {                                  
                        t.Logf("Volume snapshot %q is ready", name)                                                                              
                        return                                                                                                                   
                }                                                                                                                                
                                                                                                                                                 
                if time.Since(start) > timeout {                                                                                                 
                        t.Fatalf("timeout exceeded while waiting for volume snapshot %q to be ready", name)                                      
                }                                                                                                                                
                                                                                                                                                 
                t.Logf("Volume snapshot %q not ready yet; waiting...", name)                                                                     
                time.Sleep(pollInterval)                                                                                                         
        }                                                                                                                                        
  } 
  2. use a ticker and select:
  func waitForVolumeSnapshot(ctx context.Context, t *testing.T, name string) {                                                                   
        const pollInterval = 5 * time.Second                                                                                                     
                                                                                                                                                 
        t.Logf("Waiting for volume snapshot %q to be ready...", name)                                                                            
                                                                                                                                                 
        ticker := time.NewTicker(pollInterval)                                                                                                   
        defer ticker.Stop()                                                                                                                      
                                                                                                                                                 
        for {                                                                                                                                    
                snapshot := getVolumeSnapshot(t, ctx, name)                                                                                      
                                                                                                                                                 
                if snapshot.Status != nil && snapshot.Status.ReadyToUse != nil && *snapshot.Status.ReadyToUse {                                  
                        t.Logf("Volume snapshot %q is ready", name)                                                                              
                        return                                                                                                                   
                }                                                                                                                                
                                                                                                                                                 
                select {                                                                                                                         
                case <-ctx.Done():                                                                                                               
                        t.Fatalf("timeout waiting for volume snapshot %q: %v", name, ctx.Err())                                                  
                case <-ticker.C:                                                                                                                 
                        t.Logf("Volume snapshot %q not ready yet; waiting...", name)                                                             
                }                                                                                                                                
        }                                                                                                                                        
  }
  3. use wait.PollUntilContextTimeout:
  func waitForVolumeSnapshot(ctx context.Context, t *testing.T, name string) {                                                                   
        t.Logf("Waiting for volume snapshot %q to be ready...", name)                                                                            
                                                                                                                                                 
        err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 5*time.Minute, true, func(ctx context.Context) (done bool, err error) {          
                snapshot := getVolumeSnapshot(t, name)                                                                                           
                                                                                                                                                 
                if snapshot.Status != nil && snapshot.Status.ReadyToUse != nil && *snapshot.Status.ReadyToUse {                                  
                        t.Logf("Volume snapshot %q is ready", name)                                                                              
                        return true, nil                                                                                                         
                }                                                                                                                                
                                                                                                                                                 
                t.Logf("Volume snapshot %q not ready yet; waiting...", name)                                                                     
                return false, nil                                                                                                                
        })                                                                                                                                       
                                                                                                                                                 
        if err != nil {                                                                                                                          
                t.Fatalf("failed waiting for volume snapshot %q: %v", name, err)                                                                 
        }                                                                                                                                        
  }

Personally, I'd use option 3 in this case because we're already within a package that imports Kubernetes code.

You get the ctx in the caller by using t.Context().
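
For example (assuming Go 1.24+, where testing.T gained a Context method):

func TestSnapshotReady(t *testing.T) {
        // t.Context() is canceled when the test finishes; wrap it in a
        // timeout so option-2-style helpers also get a deadline.
        ctx, cancel := context.WithTimeout(t.Context(), 5*time.Minute)
        defer cancel()

        waitForVolumeSnapshot(ctx, t, "csi-test-snapshot")
}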
