Skip to content

dax: use atomic flags to avoid race conditions#10580

Open
checkupup wants to merge 1 commit intothesofproject:mainfrom
checkupup:main_dax_race_condition
Open

dax: use atomic flags to avoid race conditions#10580
checkupup wants to merge 1 commit intothesofproject:mainfrom
checkupup:main_dax_race_condition

Conversation

@checkupup
Copy link
Contributor

In DP domain, sof_dax_process runs in a DP thread, other sof_dax_* functions run in a different LL thread. Using atomic flags instead of original dax_ctx->update_flags to avoid potential race conditions when updating flags.

Since dax_inf.h should not have any dependencies on SOF, we define the new atomic flags in dax.c

In DP domain, sof_dax_process runs in a DP thread,
other sof_dax_* functions run in a different LL
thread. Using atomic flags instead of original
dax_ctx->update_flags to avoid potential race
conditions when updating flags.

Since dax_inf.h should not have any dependencies
on SOF, we define the new atomic flags in dax.c

Signed-off-by: Jun Lai <jun.lai@dolby.com>
@sofci
Copy link
Collaborator

sofci commented Feb 26, 2026

Can one of the admins verify this patch?

reply test this please to run this test once

@checkupup
Copy link
Contributor Author

Hello @johnylin76 @lgirdwood , let us continue our discussion on race condition issue here.

DAX_FLAG_READ_AND_CLEAR,
};

static int32_t flag_process(struct dax_adapter_data *adapter_data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think zephyr already has an implementation of atomic mask (e.g. atomic_test_and_set_bit). Did you consider it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Zephyr provides a wealth of compelling bitwise operations functions, but for detailed reasons, please refer to #10580 (comment)

{
switch (opt_mode) {
case DAX_FLAG_READ:
return atomic_read(&adapter_data->proc_flags[ffs(flag) - 1]);
Copy link
Contributor

@wjablon1 wjablon1 Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just define indexes of individual bits instead of masks:
define DAX_DEVICE_MASK 0x4 -> define DAX_DEVICE_IDX 2
This would eliminate ffs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to preserve the bitwise operation characteristics of masks, which allows me to:

  1. Retain the original code logic as much as possible
  2. Conveniently pass multiple masks within the same variable when needed, e.g., masks = (DAX_DEVICE_MASK | DAX_PROFILE_MASK)
  3. Easily modify the code to use atomic_or and atomic_and operations if we fully transition to the Zephyr environment in the future


enum dax_flag_opt_mode {
DAX_FLAG_READ = 0,
DAX_FLAG_ADD,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DAX_FLAG_ADD -> DAX_FLAG_SET?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reasonable


struct dax_adapter_data {
struct sof_dax dax_ctx;
atomic_t proc_flags[32];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@checkupup are you able to provide a little more context, as I can now see an array of data instead of a global flag. My understanding was that we would only have 2 instances and had to protect a single shared resource between both DP instances ?
i.e. we can use https://docs.zephyrproject.org/latest/doxygen/html/group__atomic__apis.html#gaef275bd78e093e5b5177ef725d0f570a

bool atomic_cas(atomic_t *target, atomic_val_t old_value, atomic_val_t new_value);

/* is resource mine ? */
owner_id = atomic_get(atomic_var);
if (owner_id == MY_INSTANCE_ID_VALUE) {
    // I can freely use the resource
} else {
    /* i dont own it, try and get resource */
    is_resource_mine = atomic_cas(atomic_var, FREE_VALUE, MY_INSTANCE_ID_VALUE);
    if (!is_resource_mine)
        yield(); // try again when we are next scheduled
}

// I can use resource at this point

...

/* put resource */
is_resource_mine = atomic_cas(atomic_var, MY_INSTANCE_ID_VALUE, FREE_VALUE);
if (!is_resource_mine)
    // should never fail - handle.

where each instance would have a unique id for the atomic variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does not involve modifications to shared instances; it aims to prevent race condition on dax_ctx->update_flags.

For example, when executing dax_ctx->update_flags |= DAX_PROCESSING_MASK; in sof_dax_process, the DP thread may be preempted by the LL thread. At this point, the LL thread executes the sof_dax_reset operation and compares dax_ctx->update_flags & DAX_PROCESSING_MASK. At this point, the previous |= operation may not have completed its assignment. The result of dax_ctx->update_flags & DAX_PROCESSING_MASK in LL thread could be false, causing the instance resource to be released. When the DP thread resumes execution, it completes the assignment of DAX_PROCESSING_MASK and continues calling dax_process to handle data. Since the instance resource has already been released, this call will result in undefined behavior. The same issue may also occur in check_and_update_settings and check_and_update_state.

As mentioned earlier, once sof_dax_prepare, sof_dax_set_configuration, and sof_dax_reset are moved to userspace, this race protection is no longer needed since they run on the same thread as sof_dax_process. The main branch has already merged the userspace changes, but to align with the ptl-*-drop-stable branch, this change still needs to be merged first.

@checkupup
Copy link
Contributor Author

@lgirdwood @wjablon1 Zephyr supports numerous compelling operations such as atomic_or and atomic_and, theoretically requiring only a single atomic_t variable. Previous discussions indicated that subsequent SOF updates would be built upon Zephyr. However, legacy platforms based on XTOS and the x86 GCC version within the testbench suite involve non-Zephyr runtime environments. Therefore, to ensure broad compatibility, I avoid adopting Zephyr operations entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants