Implement Runtime NVMe Instance Storage Discovery Using AWS EBS Symlinks#396
Implement Runtime NVMe Instance Storage Discovery Using AWS EBS Symlinks#396
Conversation
rkoster
left a comment
There was a problem hiding this comment.
In general I would have expected this logic to go into the https://github.com/cloudfoundry/bosh-agent/tree/main/infrastructure/devicepathresolver package.
| p.logger.Debug(logTag, "Found NVMe devices: %v", allNvmeDevices) | ||
|
|
||
| // Identify EBS volumes via symlinks | ||
| ebsSymlinks, err := p.fs.Glob("/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*") |
There was a problem hiding this comment.
This should somehow be passed in via the agent config in the stemcell builder, because it is IaaS specific.
There was a problem hiding this comment.
Created a new component instanceStorageResolver for the linux platform. It has 3 variants:
autoDetectingInstanceStorageResolver- the linux platform uses this one, it detects which storage resolver to useawsNVMeInstanceStorageResolver- detects what are the instance storage devices; contains the new logicidentityInstanceStorageResolver- simple resolver for the traditional naming used before this change
Thank you for the review! That's was a big oversight on my end, I'll look into it. |
|
No worries 🙂 |
|
We discussed this during the FI WG meeting and this have to relay on the stemcell agent settings and agent strategy for disc handling. |
|
As discussed during the working group meeting, focus is now on validating: cloudfoundry/bosh-aws-cpi-release#196 (comment) |
|
As per: cloudfoundry/bosh-aws-cpi-release#196 (comment) this change is still needed. Please continue reviewing. |
|
|
||
| type awsNVMeInstanceStorageResolver struct { | ||
| fs boshsys.FileSystem | ||
| devicePathResolver DevicePathResolver |
There was a problem hiding this comment.
The devicePathResolver field is stored in the struct but never used anywhere in the awsNVMeInstanceStorageResolver implementation. The DiscoverInstanceStorage method only uses fs for globbing and symlink resolution.
| nvmeDevicePattern string, | ||
| ) InstanceStorageResolver { | ||
| if ebsSymlinkPattern == "" { | ||
| ebsSymlinkPattern = "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*" |
There was a problem hiding this comment.
I would still like this to be passed in as configuration, so that the strategy can be iaas agnostic. This pattern might some day be re-usable.
There was a problem hiding this comment.
Reworked in bb98869
PR in stemcell builder to add patterns to config: cloudfoundry/bosh-linux-stemcell-builder#592
* Refactor instance storage discovery into configurable component Implement auto-detection for instance storage disk type * Fix windows tests * Fix windows tests (but for real this time)
e7d00b4 to
dcd857a
Compare
|
@neddp could you take a look at these failing unit tests? |
|
Hi @rkoster, We still haven't had the time to test the changes on an actual deployment. I will move the PR to draft until we can confirm everything is working fine. We'll address the tests as well. |
* Make implementation iaas-agnostic * Rename storage resolver files * Fix tests * Remove instance storage resolver * Don't use the aws pattern as default * Refactor NVMe instance storage discovery and remove unused symlink patterns * Enhance NVMe instance storage discovery with managed volume pattern support * Fix unit tests * Don't run windows unit tests when not supported * Simplify FakeDevicePathResolver by removing unused fields and methods * Wait for udev to settle before resolving EBS symlinks * Add debug logs * Import udev and add comment about why it's needed
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughAdds a new SymlinkDeviceResolver for globbing and resolving device symlinks and coordinating udev trigger/settle. Modifies FakeDevicePathResolver to record all diskSettings calls in a slice; tests updated from exact equality to containment/collection assertions. Refactors LinuxPlatform.SetupRawEphemeralDisks to optionally discover NVMe instance-storage devices via the symlink resolver (with configurable patterns) and updates its constructor signature and options. Provider wiring injects the new resolver into Linux platform construction; tests are extended accordingly. 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 55 minutes and 49 seconds.Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@infrastructure/devicepathresolver/symlink_device_resolver_test.go`:
- Around line 72-92: The test currently expects ResolveSymlinksToDevices to
silently skip unresolved symlinks; instead modify the resolver
(ResolveSymlinksToDevices in the symlink device resolver) to treat any existing
symlink that cannot be resolved as an error (fail-closed) and return that error
to the caller; update the symlink_device_resolver_test.go case that created an
unresolved "/dev/disk/by-id/..." entry so it now asserts that
ResolveSymlinksToDevices returns a non-nil error (and no successful mapping)
rather than skipping the broken symlink. Ensure the change surfaces the
underlying readlink/resolve error from the resolver.
In `@platform/linux_platform.go`:
- Around line 811-825: The current logic gates NVMe runtime discovery on
CPI-reported path prefixes via hasNVMeDevices(devices), which can skip NVMe
resolution when CPI reports aliases; change the condition so that if a managed
volume pattern (p.options.InstanceStorageManagedVolumePattern != "") and the
symlink resolver (p.symlinkDeviceResolver != nil) are present, always call
p.discoverNVMeInstanceStorage(devices) instead of falling back to
p.discoverIdentityInstanceStorage(devices), removing the hasNVMeDevices(devices)
check; keep references to discoverNVMeInstanceStorage,
discoverIdentityInstanceStorage, hasNVMeDevices,
p.options.InstanceStorageManagedVolumePattern, and p.symlinkDeviceResolver to
locate and update the conditional.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ed13b328-8e26-46ab-81e0-b8bed7f74223
📒 Files selected for processing (10)
agent/bootstrap_test.goinfrastructure/devicepathresolver/fakes/fake_device_path_resolver.goinfrastructure/devicepathresolver/fallback_device_path_resolver_test.goinfrastructure/devicepathresolver/scsi_device_path_resolver_test.goinfrastructure/devicepathresolver/symlink_device_resolver.goinfrastructure/devicepathresolver/symlink_device_resolver_test.goinfrastructure/devicepathresolver/virtio_device_path_resolver_test.goplatform/linux_platform.goplatform/linux_platform_test.goplatform/provider.go
Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com>
Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@infrastructure/devicepathresolver/symlink_device_resolver_test.go`:
- Around line 88-90: The test file symlink_device_resolver_test.go has a
formatting/indentation issue around the assertion block that calls
resolver.ResolveSymlinksToDevices and checks the error; run goimports (or gofmt)
on the file to normalize imports and indentation so the block with
ResolveSymlinksToDevices, Expect(err).To(HaveOccurred()), and
Expect(err.Error()).To(ContainSubstring("nvme-invalid")) is properly formatted.
Ensure the file compiles and lints cleanly after running the formatter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9d923cdb-2fa3-4c19-bc3c-66c23b259996
📒 Files selected for processing (4)
infrastructure/devicepathresolver/symlink_device_resolver.goinfrastructure/devicepathresolver/symlink_device_resolver_test.goplatform/linux_platform.goplatform/linux_platform_test.go
Problem
On AWS Nitro-based instances with NVMe devices, the kernel's PCIe enumeration order is non-deterministic. This means:
/dev/nvme0n1could be the root EBS volume OR instance storage/dev/nvme1n1could be instance storage OR the root EBS volumeSolution
Implemented runtime discovery to reliably identify instance storage by excluding EBS volumes.
Discovery Algorithm
Why EBS Symlinks Are Reliable
AWS automatically creates persistent symlinks for all EBS volumes via udev rules:
/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol{volume_id}Backwards Compatibility
Non-NVMe instances: No changes to behavior
/dev/xvdb,/dev/sdb) use CPI paths directlyThis must be merged together with the CPI changes - cloudfoundry/bosh-aws-cpi-release#196
Pair @Ivaylogi98