virtio/fs/macos: preload directory entries to avoid calling telldir()#548
virtio/fs/macos: preload directory entries to avoid calling telldir()#548slp merged 1 commit intocontainers:mainfrom
Conversation
telldir() is a very heavy call on APFS, but we have to invoke it on each entry in the directory to handle offsets correctly. The alternative is to preload all dir entries at the start and use normal indexes as offsets. The memory impact: In the worst-case scenario each entry would be around 1K in size. A plain directory with 100k files would require 100M in host memory. However each call to telldir() also allocates memory internally via malloc() and we call it on each dir entry, so the actual difference should be smaller than that in practice. Signed-off-by: Vadzim Dambrouski <pftbest@gmail.com>
f3f2d49 to
9c59116
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request aims to improve filesystem performance on macOS by pre-loading directory entries to avoid expensive telldir() calls. The implementation replaces the previous directory stream handling with a caching mechanism, which is a sound approach to address the identified performance bottleneck. The code is generally well-written, but I have identified a couple of areas for improvement related to lock contention and code clarity in an unsafe block. My review comments provide specific suggestions to address these points.
| name.push(*c as u8); | ||
| if !ds.ready { | ||
| // Fill the cache on first call | ||
| if let Err(e) = ds.fill_from_fd(data.file.write().unwrap().as_raw_fd()) { |
There was a problem hiding this comment.
The fill_from_fd function only needs a file descriptor to read directory entries; it doesn't modify the file itself. Using a read lock (read()) instead of a write lock (write()) is sufficient and avoids unnecessary lock contention, which can improve performance.
| if let Err(e) = ds.fill_from_fd(data.file.write().unwrap().as_raw_fd()) { | |
| if let Err(e) = ds.fill_from_fd(data.file.read().unwrap().as_raw_fd()) { |
| let name = unsafe { | ||
| let name_len = usize::from((*dentry).d_namlen); | ||
| let name_ptr = (*dentry).d_name.as_ptr().cast(); | ||
| let name = std::slice::from_raw_parts(name_ptr, name_len); | ||
|
|
||
| if name == b"." || name == b".." { | ||
| continue; | ||
| } | ||
| Box::<[u8]>::from(name) | ||
| }; |
There was a problem hiding this comment.
For better maintainability and to minimize the scope of unsafe code, consider refactoring this block. The unsafe block is only needed to create a slice from the raw pointer. The subsequent logic for filtering and boxing the name can be performed in safe code. This refactoring reduces the scope of unsafe code, which is a good practice for safety and maintainability.
| let name = unsafe { | |
| let name_len = usize::from((*dentry).d_namlen); | |
| let name_ptr = (*dentry).d_name.as_ptr().cast(); | |
| let name = std::slice::from_raw_parts(name_ptr, name_len); | |
| if name == b"." || name == b".." { | |
| continue; | |
| } | |
| Box::<[u8]>::from(name) | |
| }; | |
| let name = { | |
| let name_slice = unsafe { | |
| let name_len = usize::from((*dentry).d_namlen); | |
| let name_ptr = (*dentry).d_name.as_ptr().cast(); | |
| std::slice::from_raw_parts(name_ptr, name_len) | |
| }; | |
| if name_slice == b"." || name_slice == b".." { | |
| continue; | |
| } | |
| Box::<[u8]>::from(name_slice) | |
| }; |
|
I think gemini's both suggestions are quite reasonable. |
|
@slp Sorry, but I don't agree with its conclusion for the first one. I specifically rely on the write lock to be locked for the full duration of the I'm not quite sure if libkrun uses single thread or multiple threads to handle fs requests, but even if it's single threaded then Second suggestion I can apply, but will need to move the safety comment too. |
Makes sense
Looking at it again, the way you have written it is maybe safer (it is very clear what the lifetime of the slice is). Feel free to ignore Gemini. |
This PR depends on #544 and includes its commits. I can rebase when 544 is merged.
I was doing some performance measurements on filesystem operations and noticed that
telldir()call is very heavy on macOS.In some of my synthetic tests it could take up to 40% of total CPU cycles on the host:

The proposed fix is to avoid calling
telldir()at all and preload all dir entries ahead of time on the first call todo_readdir().The memory impact: In the worst-case scenario each entry would be max 1K in size. A plain directory with 100k files would require 100M in host memory.
However each call to telldir() also allocates memory internally via malloc() and stores it inside DIR stream. We call it on each dir entry so the actual difference should be smaller than that in practice.
Unfortunately after I made this change I did not see a significant time savings on the guest side. Yes the host cycles were reduced but the most of the time is spent waiting on communication between host and guest, so I only see 5-10% improvement on the guest side.
For example running
time make cleanon the linux kernel source code went from around 12s to 11s.Trace for make clean without this patch. do_readdir is at 24% cycles
Trace for make clean with this patch do_readdir is at 9% cycles