add snapshot create delete capability #111

Draft
disperate wants to merge 42 commits into cloudscale-ch:master from disperate:julian/add-snapshot-create-delete-capability

Conversation

@disperate (Contributor)

Adds support for ControllerServiceCapability_RPC_CREATE_DELETE_SNAPSHOT.

@mweibel (Collaborator) left a comment


looks good so far. I have a few questions and mostly nits.

For reviewers' sake: it would be great to have a ready-to-use example in the examples folder for testing this. I haven't tested it on a cluster yet, although I did install this version to check that it starts and that there are no immediate error logs (there are none).

@mweibel (Collaborator) left a comment


overall this looks very good. Most of the things I commented on are not super critical. Good work 👏

Comment on lines -19 to -24
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshots"]
verbs: [ "get", "list", "watch", "update" ]
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshotcontents"]
verbs: ["get", "list"]
Collaborator

maybe we misunderstood each other in the first review.
I'd keep these in, because external-provisioner has them as well.
In the end it doesn't really matter, because the snapshotter-role also maps those to the same SA, but I think keeping them makes it easier to later update to newer provisioner versions with potentially updated role definitions.

Collaborator

what's the reason for this to not be in the chart?

AccessibleTopology: []*csi.Topology{
{
Segments: map[string]string{
topologyZonePrefix: d.zone,
Collaborator

Suggested change:
- topologyZonePrefix: d.zone,
+ topologyZonePrefix: snapshot.Zone.Slug,

shouldn't we use snapshot.Zone.Slug instead?

@disperate (Contributor, Author)

Using d.zone is not entirely incorrect: since the driver does not currently work across zones, it gets the zone from the metadata of the node it runs on. The same zone logic is used during normal volume creation.

I think we will move to multi-zone support at some point, which will require proper topology support. So I decided to refactor this whole section a bit: we now use the actual data from the volume to create the csiVolume.
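
A minimal sketch of what that refactor could look like (the helper name and parameters are made up for illustration; the csi types are the real github.com/container-storage-interface/spec Go bindings, and topologyZonePrefix is the constant from the diff above):

import csi "github.com/container-storage-interface/spec/lib/go/csi"

// makeCSIVolume derives the topology segment from the zone slug the API
// returned for the volume, instead of the driver's own zone (d.zone).
func makeCSIVolume(uuid, zoneSlug string, sizeGB int64) *csi.Volume {
        return &csi.Volume{
                VolumeId:      uuid,
                CapacityBytes: sizeGB << 30, // GiB to bytes
                AccessibleTopology: []*csi.Topology{{
                        Segments: map[string]string{topologyZonePrefix: zoneSlug},
                }},
        }
}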

assert.NoError(t, err)

// Wait a bit for the PVC to be processed
time.Sleep(10 * time.Second)
Collaborator

I'd prefer fetching the PVC more often, in a loop, while allowing a longer overall wait. 10s is quite a long time in tests and even then it may fail, so a loop should result in faster tests and potentially fewer flakes.
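
For illustration, the loop could look something like this (the helper name, namespace handling, and intervals are made up; wait is k8s.io/apimachinery/pkg/util/wait, and client is the kubernetes.Interface the test already has):

func waitForPVCBound(ctx context.Context, t *testing.T, client kubernetes.Interface, namespace, name string) {
        err := wait.PollUntilContextTimeout(ctx, 1*time.Second, 1*time.Minute, true,
                func(ctx context.Context) (bool, error) {
                        pvc, err := client.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, name, metav1.GetOptions{})
                        if err != nil {
                                return false, nil // treat transient errors as "not bound yet"
                        }
                        return pvc.Status.Phase == corev1.ClaimBound, nil
                })
        if err != nil {
                t.Fatalf("PVC %q did not become Bound: %v", name, err)
        }
}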

assert.NoError(t, err)

// Wait a bit for the PVC to be processed
time.Sleep(10 * time.Second)
Collaborator

see polling comment above


if time.Now().UnixNano()-start.UnixNano() > (5 * time.Minute).Nanoseconds() {
t.Fatalf("timeout exceeded while waiting for volume snapshot %v to be ready", name)
return
Collaborator

nit: the return is unnecessary, but it can be left in if that's preferred for clarity.

t.Fatalf already exits right on the spot.

⚠️ This also means, however, that the cleanup won't run! This is something we might want to avoid?
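
One way around that, sketched under the assumption that the test creates the snapshot itself (createVolumeSnapshot / deleteVolumeSnapshot are placeholders for whatever helpers the suite uses): register the teardown with t.Cleanup, which still runs after t.Fatalf.

func TestSnapshot(t *testing.T) {
        const name = "csi-test-snapshot" // placeholder

        createVolumeSnapshot(t, name)
        t.Cleanup(func() {
                // t.Cleanup callbacks run even after t.Fatalf (FailNow exits
                // via runtime.Goexit, and cleanups run when the test finishes).
                deleteVolumeSnapshot(t, name)
        })

        waitForVolumeSnapshot(t, name) // may t.Fatalf on timeout; cleanup still runs
}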

t.Logf("Volume snapshot %q not ready yet; waiting...", name)
time.Sleep(5 * time.Second)
}
}
Collaborator

this code is a bit complex; it could be simplified and made more idiomatic Go.

Several suggestions:

  1. use time.Since():
  func waitForVolumeSnapshot(t *testing.T, client kubernetes.Interface, name string) {                                                           
        const timeout = 5 * time.Minute                                                                                                          
        const pollInterval = 5 * time.Second                                                                                                     
        start := time.Now()                                                                                                                      
                                                                                                                                                 
        t.Logf("Waiting for volume snapshot %q to be ready...", name)                                                                            
                                                                                                                                                 
        for {                                                                                                                                    
                snapshot := getVolumeSnapshot(t, client, name)                                                                                   
                                                                                                                                                 
                if snapshot.Status != nil && snapshot.Status.ReadyToUse != nil && *snapshot.Status.ReadyToUse {                                  
                        t.Logf("Volume snapshot %q is ready", name)                                                                              
                        return                                                                                                                   
                }                                                                                                                                
                                                                                                                                                 
                if time.Since(start) > timeout {                                                                                                 
                        t.Fatalf("timeout exceeded while waiting for volume snapshot %q to be ready", name)                                      
                }                                                                                                                                
                                                                                                                                                 
                t.Logf("Volume snapshot %q not ready yet; waiting...", name)                                                                     
                time.Sleep(pollInterval)                                                                                                         
        }                                                                                                                                        
  } 
  2. use a ticker and select:
  func waitForVolumeSnapshot(ctx context.Context, t *testing.T, name string) {                                                                   
        const pollInterval = 5 * time.Second                                                                                                     
                                                                                                                                                 
        t.Logf("Waiting for volume snapshot %q to be ready...", name)                                                                            
                                                                                                                                                 
        ticker := time.NewTicker(pollInterval)                                                                                                   
        defer ticker.Stop()                                                                                                                      
                                                                                                                                                 
        for {                                                                                                                                    
                snapshot := getVolumeSnapshot(t, ctx, name)                                                                                      
                                                                                                                                                 
                if snapshot.Status != nil && snapshot.Status.ReadyToUse != nil && *snapshot.Status.ReadyToUse {                                  
                        t.Logf("Volume snapshot %q is ready", name)                                                                              
                        return                                                                                                                   
                }                                                                                                                                
                                                                                                                                                 
                select {                                                                                                                         
                case <-ctx.Done():                                                                                                               
                        t.Fatalf("timeout waiting for volume snapshot %q: %v", name, ctx.Err())                                                  
                case <-ticker.C:                                                                                                                 
                        t.Logf("Volume snapshot %q not ready yet; waiting...", name)                                                             
                }                                                                                                                                
        }                                                                                                                                        
  }
  3. use wait.PollUntilContextTimeout:
  func waitForVolumeSnapshot(ctx context.Context, t *testing.T, name string) {                                                                   
        t.Logf("Waiting for volume snapshot %q to be ready...", name)                                                                            
                                                                                                                                                 
        err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 5*time.Minute, true, func(ctx context.Context) (done bool, err error) {          
                snapshot := getVolumeSnapshot(t, name)                                                                                           
                                                                                                                                                 
                if snapshot.Status != nil && snapshot.Status.ReadyToUse != nil && *snapshot.Status.ReadyToUse {                                  
                        t.Logf("Volume snapshot %q is ready", name)                                                                              
                        return true, nil                                                                                                         
                }                                                                                                                                
                                                                                                                                                 
                t.Logf("Volume snapshot %q not ready yet; waiting...", name)                                                                     
                return false, nil                                                                                                                
        })                                                                                                                                       
                                                                                                                                                 
        if err != nil {                                                                                                                          
                t.Fatalf("failed waiting for volume snapshot %q: %v", name, err)                                                                 
        }                                                                                                                                        
  }

Personally, I'd use option 3 in this case because we're already within a package that imports Kubernetes code.

You get the ctx in the caller by using t.Context().
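
For example (assuming Go 1.24+, where testing.T gained a Context method):

func TestSnapshotReady(t *testing.T) {
        // t.Context() is canceled when the test finishes; wrap it in a
        // timeout so option-2-style helpers also get a deadline.
        ctx, cancel := context.WithTimeout(t.Context(), 5*time.Minute)
        defer cancel()

        waitForVolumeSnapshot(ctx, t, "csi-test-snapshot")
}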
