
virtio/fs/macos: preload directory entries to avoid calling telldir() #548

Merged

slp merged 1 commit into containers:main from pftbest:preload-dir-entries on Feb 19, 2026

Conversation

Contributor

@pftbest pftbest commented Feb 13, 2026

This PR depends on #544 and includes its commits. I can rebase when 544 is merged.

I was doing some performance measurements on filesystem operations and noticed that the telldir() call is very heavy on macOS.

In some of my synthetic tests it could take up to 40% of total CPU cycles on the host:
[Screenshot 2026-02-12 at 22:02:57]

The proposed fix is to avoid calling telldir() at all and preload all dir entries ahead of time on the first call to do_readdir().

The memory impact: in the worst case each entry would be at most 1K in size, so a plain directory with 100k files would require 100M of host memory.

However, each call to telldir() also allocates memory internally via malloc() and stores it inside the DIR stream. We call it for each dir entry, so the actual difference should be smaller in practice.
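The indexing idea can be sketched in safe Rust. This is a minimal sketch, not the PR's code: the actual patch walks the raw DIR* via libc and also caches inode and type information; the names DirStream, CachedEntry, entry_at, and the demo paths are all illustrative.

```rust
use std::fs;
use std::io;

/// Cached directory entry; the real code also stores inode and
/// type info (struct and field names here are illustrative).
struct CachedEntry {
    name: Vec<u8>,
}

/// All entries are read once up front; a readdir "offset" is then
/// just an index into this vector, so telldir() is never needed.
struct DirStream {
    entries: Vec<CachedEntry>,
}

impl DirStream {
    fn load(path: &std::path::Path) -> io::Result<Self> {
        let mut entries = Vec::new();
        for entry in fs::read_dir(path)? {
            entries.push(CachedEntry {
                name: entry?.file_name().into_encoded_bytes(),
            });
        }
        Ok(DirStream { entries })
    }

    /// Serve the entry at `offset`; None marks end of directory.
    fn entry_at(&self, offset: usize) -> Option<&CachedEntry> {
        self.entries.get(offset)
    }
}

/// Build a tiny directory, preload it, and return the entry count.
fn demo() -> io::Result<usize> {
    let dir = std::env::temp_dir().join("preload_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("a.txt"), b"")?;
    fs::write(dir.join("b.txt"), b"")?;

    let stream = DirStream::load(&dir)?;
    // Offsets are plain indexes: revisiting an entry costs no syscall.
    assert!(stream.entry_at(0).is_some());
    assert!(stream.entry_at(stream.entries.len()).is_none());
    Ok(stream.entries.len())
}

fn main() -> io::Result<()> {
    println!("cached {} entries", demo()?);
    Ok(())
}
```

Unlike a live DIR stream, the vector can be indexed repeatedly and out of order without any telldir()/seekdir() bookkeeping, which is the property the patch exploits.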

Unfortunately, after I made this change I did not see significant time savings on the guest side. Yes, the host cycles were reduced, but most of the time is spent waiting on communication between host and guest, so I only see a 5-10% improvement on the guest side.

For example, running `time make clean` on the Linux kernel source code went from around 12s to 11s.

Trace for `make clean` without this patch; do_readdir is at 24% of cycles:

[Screenshot 2026-02-13 at 12:44:08]

Trace for `make clean` with this patch; do_readdir is at 9% of cycles:

[Screenshot 2026-02-13 at 12:39:36]

telldir() is a very heavy call on APFS, but we have to invoke it
on each entry in the directory to handle offsets correctly.

The alternative is to preload all dir entries at the start and use
normal indexes as offsets.

The memory impact: In the worst-case scenario each entry would be
around 1K in size. A plain directory with 100k files would require
100M in host memory.

However each call to telldir() also allocates memory internally via
malloc() and we call it on each dir entry, so the actual difference
should be smaller than that in practice.

Signed-off-by: Vadzim Dambrouski <pftbest@gmail.com>
@pftbest pftbest force-pushed the preload-dir-entries branch from f3f2d49 to 9c59116 on February 17, 2026 at 14:20
Collaborator

@dorindabassey dorindabassey left a comment
LGTM, Thanks!

Collaborator

@mtjhrc mtjhrc left a comment

LGTM too, thanks!

Collaborator

mtjhrc commented Feb 18, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to improve filesystem performance on macOS by pre-loading directory entries to avoid expensive telldir() calls. The implementation replaces the previous directory stream handling with a caching mechanism, which is a sound approach to address the identified performance bottleneck. The code is generally well-written, but I have identified a couple of areas for improvement related to lock contention and code clarity in an unsafe block. My review comments provide specific suggestions to address these points.

```rust
name.push(*c as u8);
if !ds.ready {
    // Fill the cache on first call
    if let Err(e) = ds.fill_from_fd(data.file.write().unwrap().as_raw_fd()) {
```

Severity: high

The fill_from_fd function only needs a file descriptor to read directory entries; it doesn't modify the file itself. Using a read lock (read()) instead of a write lock (write()) is sufficient and avoids unnecessary lock contention, which can improve performance.

Suggested change:

```diff
-if let Err(e) = ds.fill_from_fd(data.file.write().unwrap().as_raw_fd()) {
+if let Err(e) = ds.fill_from_fd(data.file.read().unwrap().as_raw_fd()) {
```

Comment on lines +126 to +135:

```rust
let name = unsafe {
    let name_len = usize::from((*dentry).d_namlen);
    let name_ptr = (*dentry).d_name.as_ptr().cast();
    let name = std::slice::from_raw_parts(name_ptr, name_len);

    if name == b"." || name == b".." {
        continue;
    }
    Box::<[u8]>::from(name)
};
```

Severity: medium

For better maintainability, consider refactoring this block to minimize the scope of unsafe code. The unsafe block is only needed to create a slice from the raw pointer; the subsequent logic for filtering and boxing the name can be performed in safe code.

Suggested change:

```diff
-let name = unsafe {
-    let name_len = usize::from((*dentry).d_namlen);
-    let name_ptr = (*dentry).d_name.as_ptr().cast();
-    let name = std::slice::from_raw_parts(name_ptr, name_len);
-    if name == b"." || name == b".." {
-        continue;
-    }
-    Box::<[u8]>::from(name)
-};
+let name = {
+    let name_slice = unsafe {
+        let name_len = usize::from((*dentry).d_namlen);
+        let name_ptr = (*dentry).d_name.as_ptr().cast();
+        std::slice::from_raw_parts(name_ptr, name_len)
+    };
+    if name_slice == b"." || name_slice == b".." {
+        continue;
+    }
+    Box::<[u8]>::from(name_slice)
+};
```

Collaborator

slp commented Feb 18, 2026

I think both of Gemini's suggestions are quite reasonable.

Contributor Author

pftbest commented Feb 18, 2026

@slp Sorry, but I don't agree with its conclusion on the first one. I specifically rely on the write lock being held for the full duration of the fill_from_fd function, because it updates the internal position of the original fd. All duped fds share the same position internally, so running 2 instances of fill_from_fd on the same fd would lead to corrupted data.

I'm not quite sure whether libkrun uses a single thread or multiple threads to handle fs requests, but if it's single-threaded then read() vs write() doesn't matter, and if it's multithreaded then write() is correct here.

The second suggestion I can apply, but I will need to move the safety comment too.
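The shared-position behavior behind this argument can be checked with a small standalone program. This is a hedged sketch using std::fs on a regular file rather than the PR's directory fds; the file name is illustrative:

```rust
use std::fs::File;
use std::io::{self, Read};

/// Read two bytes through the original handle, then two more
/// through a dup()ed handle, and return both chunks.
fn demo() -> io::Result<([u8; 2], [u8; 2])> {
    let path = std::env::temp_dir().join("shared_offset_demo.txt");
    std::fs::write(&path, b"abcdef")?;

    let mut orig = File::open(&path)?;
    // try_clone() calls dup(2): both handles share one open file
    // description in the kernel, including the current offset.
    let mut dup = orig.try_clone()?;

    let mut first = [0u8; 2];
    orig.read_exact(&mut first)?; // advances the shared offset to 2

    let mut second = [0u8; 2];
    dup.read_exact(&mut second)?; // resumes at offset 2, not at 0
    Ok((first, second))
}

fn main() -> io::Result<()> {
    let (first, second) = demo()?;
    assert_eq!(&first, b"ab");
    // If the dup'd fd had an independent offset, this would be "ab" again.
    assert_eq!(&second, b"cd");
    println!("shared offset confirmed");
    Ok(())
}
```

Because dup'd descriptors share one offset like this, two concurrent fill_from_fd runs on the same directory fd would interleave their position updates, which is why a read lock is not sufficient here.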

Collaborator

mtjhrc commented Feb 18, 2026

> @slp I specifically rely on the write lock to be locked for the full duration of the fill_from_fd function because it updates internal position of the original fd. All duped fd share the same position internally so running 2 instances of fill_from_fd on the same fd will lead to corrupted data.

Makes sense

> Second suggestion I can apply, but will need to move the safety comment too.

Looking at it again, the way you have written it is maybe safer (it is very clear what the lifetime of the slice is).

Feel free to ignore Gemini.

@slp slp merged commit 54da12a into containers:main on Feb 19, 2026
11 checks passed
