
Domain schedules #445

Open
Kswin01 wants to merge 21 commits into seL4:main from au-ts:run_time_domains

Conversation

@Kswin01 (Contributor) commented Mar 24, 2026

This PR adds support for defining domain schedules in Microkit, and relies on the run-time domain scheduler changes due to be merged into seL4.

It also depends on the following branch of rust-sel4: https://github.com/au-ts/rust-sel4/tree/domain_set?branch=domain_set

Changes for users

In build_sdk.py we define two new configurations, debug_domains and release_domains. These configurations build the kernel with a maximum of 256 domains and 256 entries in the domain schedule. The user can change these values as they see fit. I have separated these from the regular configs as users may not want the extra memory overhead (although it is mostly negligible). I can merge these configs into one if that is more desirable.

A domain schedule is defined in the sdf as:

  <domain_schedule>
      <domain name="domain_1" length="1000" />
      <domain name="domain_2" length="1000" />
      <domain name="domain_3" length="2000" />
  </domain_schedule>

The length is defined in milliseconds, and the tool reads TIMER_FREQUENCY from the kernel config to convert it to the timer ticks that the kernel requires. This works for aarch64 and riscv64. However, on x86 we don't have a static definition of the timer frequency; it is instead set at runtime by the kernel. I'm not sure of the best way to handle this, or whether we should do the conversion in the capDL initialiser instead.
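The conversion described above can be sketched as follows. This is an illustrative helper, not the actual Microkit tool code; the function name and the way the timer frequency is obtained are assumptions.

```python
def ms_to_ticks(length_ms: int, timer_freq_hz: int) -> int:
    """Convert a domain length in milliseconds to kernel timer ticks.

    timer_freq_hz would come from the kernel config's TIMER_FREQUENCY
    on aarch64/riscv64; on x86 it is only known at runtime.
    """
    return (length_ms * timer_freq_hz) // 1000

# e.g. a 1000 ms domain slot on a 62.5 MHz timer is 62,500,000 ticks
print(ms_to_ticks(1000, 62_500_000))  # -> 62500000
```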

The above will insert the schedule using the new DomainSet invocations, beginning at index 0 by default. We also provide capDL with a start index of 0 by default.

Users can optionally set these values like so:

    <domain_schedule>
        <domain name="domain_1" length="1000" />
        <domain name="domain_2" length="1000" />
        <domain name="domain_3" length="2000" />
        <domain_start index="2" />
        <domain_idx_shift shift="10" />
    </domain_schedule>

The start index is relative to the schedule list that the user provides, not an absolute index, and it must be within the bounds of the schedule that the user defines. We can also have a domain shift, which means the tool will start inserting the schedule at that shifted index (in the above case 10). The shift plus the length of the schedule must be less than the kernel's configured maximum number of domain schedule entries.
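The bounds checks described above could look like the following sketch. The names are illustrative, not the actual Microkit tool code, and the strict "less than" check on the shifted schedule follows the wording of the PR description.

```python
def validate_schedule(schedule_len: int, start_index: int, shift: int,
                      max_entries: int) -> None:
    """Validate domain_start / domain_idx_shift against the user schedule.

    max_entries is the kernel's configured maximum number of domain
    schedule entries (an assumption about how the tool would obtain it).
    """
    # The start index is relative to the user-provided schedule list.
    if not (0 <= start_index < schedule_len):
        raise ValueError("domain_start index must be within the user schedule")
    # Per the PR description, shift + schedule length must be less than
    # the kernel's configured maximum.
    if shift + schedule_len >= max_entries:
        raise ValueError("shift + schedule length exceeds the kernel maximum")

# The example above: 3 entries, start index 2, shift 10, max 128 -> OK
validate_schedule(3, 2, 10, 128)
```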

We also now build one monitor per domain. The monitor is responsible for handling faults, and also for making threads passive if requested. The decision to create a monitor per domain was made so that these faults/requests can be serviced within that domain's timeslice, rather than having to wait for a single monitor's domain to be scheduled again.

TODOs before merging

@Ivan-Velickovic (Collaborator)

In build_sdk.py we define two new configurations, debug_domains and release_domains. These configurations build the kernel with a maximum of 256 domains and 256 entries in the domain schedule. The user can change these values as they see fit. I have separated these from the regular configs as users may not want the extra memory overhead (although it is mostly negligible). I can merge these configs into one if that is more desirable.

Having the extra configs doesn't scale well; it is already a bit out of hand with the SMP ones. I think the domain functionality should be part of the existing configurations.

@lsf37 (Member) commented Mar 24, 2026

Having the extra configs doesn't scale well; it is already a bit out of hand with the SMP ones. I think the domain functionality should be part of the existing configurations.

If you're always running with NUM_DOMAINS > 1 and don't set up a schedule, you will get the default schedule that wraps around once 2^56-1 ticks have passed. On a 1 GHz timer tick that's after about 833 days. Impact is negligible, just wanted to point out that if you are using this for a very long-term deployment with super precise time requirements you would see a blip every few years.

@Kswin01 Kswin01 force-pushed the run_time_domains branch 2 times, most recently from 8c458dd to a3b6e88 (March 24, 2026 05:32)
@Indanz commented Mar 24, 2026

Some comments:

  • Default of 256 domains is way too many. 4 domains would be much more sensible and would reduce the performance and memory overhead enough that you can always enable it.
  • 256 entries is over the top too, but that only costs a little bit of extra memory, not performance.
  • Milliseconds for domain durations is too coarse, I think you want microseconds.
  • Domains have no SMP support, so it's either one or the other, but never both.
  • Using domain_start index="1" as an example is misleading because it gives the impression that index starts from 1 instead of 0.

When we move to a clock based MCS API instead of time based, we'll run into the same issues as for domains.

For x86, the timing info is passed on via bootinfo. The problem with that is that it's hard to use from stand-alone or library code, as how to get that information depends on the system. That's why I proposed a new syscall to get the same info (it could take either a domain cap, or a SchedContext/SchedControl cap). Even on Arm and RISC-V, whether those kernel configs are passed on depends on the system used.

@lsf37 (Member) commented Mar 25, 2026

  • Default of 256 domains is way too many. 4 domains would be much more sensible and would reduce the performance and memory overhead enough that you can always enable it.

The max number of domains has no performance impact at all as long as it fits in 8 bits. You can have NUM_DOMAINS=256 and only use 4 of them. I do agree that you don't want to actually use 256 domains, that would indeed have a performance impact, but there is no reason to pick a particularly low number for the maximum.

  • 256 entries is over the top too, but that only costs a little bit of extra memory, not performance.

That is the only one that actually does cost a little. The kernel default is 100, but I don't think 256 is a problem.

  • Milliseconds for domain durations is too coarse, I think you want microseconds.

I agree, people have been asking for a shorter minimum duration for non-MCS as well.

@midnightveil (Contributor) commented Mar 25, 2026

The max number of domains has no performance impact at all as long as it fits in 8 bits. You can have NUM_DOMAINS=256 and only use 4 of them. I do agree that you don't want to actually use 256 domains, that would indeed have a performance impact, but there is no reason to pick a particularly low number for the maximum.

That's not true, all the scheduling queues are duplicated per-domain, and they're often some of the largest data in the kernel image aside from kernel page table structures. Though it's probably not a performance impact but more just a memory usage one. (But yes, I broadly agree with you).

@lsf37 (Member) commented Mar 25, 2026

The max number of domains has no performance impact at all as long as it fits in 8 bits. You can have NUM_DOMAINS=256 and only use 4 of them. I do agree that you don't want to actually use 256 domains, that would indeed have a performance impact, but there is no reason to pick a particularly low number for the maximum.

That's not true, all the scheduling queues are duplicated per-domain, and they're often some of the largest data in the kernel image aside from kernel page table structures. Though it's probably not a performance impact but more just a memory usage one. (But yes, I broadly agree with you).

You're right, there is indeed a memory impact.

@Kswin01 (Contributor, Author) commented Mar 25, 2026

Using domain_start index="1" as an example is misleading because it gives the impression that index starts from 1 instead of 0.

I do explicitly mention just below the xml snippet:
"The above will insert the schedule using the new DomainSet invocations, beginning at index 0 by default"

And I also mention something similar in the manual.md changes in this PR. Hopefully this is enough to indicate that index starts from 0, but I'll change the example here to be clearer.

@Kswin01 (Contributor, Author) commented Mar 25, 2026

I've switched the default maximums to 64 domains, and 128 domain schedule entries. Would this be sufficient?

@Kswin01 Kswin01 force-pushed the run_time_domains branch 3 times, most recently from a791ef9 to ed650e1 (March 25, 2026 01:17)
Kswin01 added 15 commits March 25, 2026 14:16
Signed-off-by: Krishnan Winter <krishnan.winter@unsw.edu.au>
Kswin01 added 5 commits March 25, 2026 14:16
Signed-off-by: Krishnan Winter <krishnan.winter@unsw.edu.au>
@Indanz commented Mar 25, 2026

I've switched the default maximums to 64 domains, and 128 domain schedule entries. Would this be sufficient?

64 is still way too many, especially considering Microkit practically has a limit of about 64 PDs and already uses MCS, which reduces the need for domains.

I also strongly recommend reducing the number of priorities to something sane. E.g.

#define NUM_READY_QUEUES (CONFIG_NUM_DOMAINS * CONFIG_NUM_PRIORITIES)

and the same is true for ksReadyQueuesL2Bitmap.

If you reduce the number of priorities to 64 and have a default number of domains of 4, then you don't use much more memory than now without domains enabled.
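The memory trade-off Indanz and midnightveil discuss can be estimated with a rough back-of-envelope calculation. The 16-bytes-per-queue figure below is an assumption (a head/tail pointer pair on a 64-bit platform), not a value from the thread or from the seL4 source.

```python
# Assumption: each ready queue holds a head and a tail pointer,
# i.e. 16 bytes on a 64-bit platform.
QUEUE_BYTES = 16

def ready_queue_bytes(num_domains: int, num_priorities: int) -> int:
    # NUM_READY_QUEUES = CONFIG_NUM_DOMAINS * CONFIG_NUM_PRIORITIES,
    # as in the #define quoted above.
    return num_domains * num_priorities * QUEUE_BYTES

print(ready_queue_bytes(256, 256))  # 256 domains x 256 prios -> 1048576 (1 MiB)
print(ready_queue_bytes(4, 64))     # 4 domains x 64 prios   -> 4096 (4 KiB)
```

Under this assumption, 4 domains with 64 priorities costs roughly the same as the no-domain case with 256 priorities (4 KiB), which is the point of the suggestion above.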
