-
Notifications
You must be signed in to change notification settings - Fork 284
drivers: firmware: imx: fix dependency for IMX_SEC_ENCLAVE and use-after-free #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: lf-6.12.y
Are you sure you want to change the base?
Conversation
IMX_SEC_ENCLAVE access the 'nvmem', thus on i.MX 8DXL/QXP/QM the following is required: NVMEM_IMX_OCOTP_SCU. Add it to the depends. With a missing NVMEM_IMX_OCOTP_SCU or with IMX_SEC_ENCLAVE=y and NVMEM_IMX_OCOTP_SCU=m, the probe can keep getting deferred. Fixes: 4faa6ae ("LF-13910-8: drivers: firmware: imx: add support for i.MX8DXL/QXP/QM") Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
At the end of the probe function `se_if_probe`, `se_sysfs_log` is called which creates and adds the kernel object `se_kobj` and the sysfs files. This kobject is then at reference count 1. Since multiple instances of this driver can be present, with the same kobject, the desired behavior is to clean up this kobject when the last instance is removed. Therefore, when the probe is successful and `se_kobj` already existed, the reference count should be increased to track all instances using it. This is not the case now, this patch adds the missing reference count. Secondly, the function `se_if_probe_cleanup` decreases the reference count of `se_kobj`. However, this function is called not only by `se_if_remove` but also in case of probe failure. When the probe fails, the reference count was never increased (because the sysfs entries are not in use) and should thus not be decreased. This leads to a use-after-free situation and a warning in the log: [ 4.282934] fsl-se secure-envlave-1: Fail to read FIPS fuse [ 4.288628] fsl-se secure-envlave-1: Failed to fetch SoC Info. [ 4.294643] fsl-se secure-envlave-1: failed[-EPROBE_DEFER] to fetch SoC Info [ 4.302043] ------------[ cut here ]------------ [ 4.302049] refcount_t: underflow; use-after-free. [ 4.302091] WARNING: CPU: 2 PID: 50 at /lib/refcount.c:28 refcount_warn_saturate+0xf4/0x144 [ 4.302117] Modules linked in: snd_soc_imx_hdmi snd_soc_simple_card_utils sec_enclave rng_core led_class gpio_fan libata phy_fsl_imx8q_pcie fsl_imx8_ddr_perf ci_hdrc_imx cdns3_imx ci_hdrc ulpi ehci_hcd phy_cadence_salvo roles phy_mxs_usb usbmisc_imx usb3503 imx8qxp_adc flexcan can_dev snd_soc_sgtl5000 snd_soc_fsl_audmix snd_soc_fsl_spdif snd_soc_fsl_sai snd_soc_fsl_asrc pwm_imx27 snd_soc_fsl_utils imx_pcm_dma imx8_media_dev(C) spi_fsl_lpspi cdns_mhdp_imx cdns_mhdp_drmcore imx8qxp_pixel_combiner caam error irq_imx_irqsteer amphion_vpu fuse ipv6 autofs4 [ 4.302284] CPU: 2 PID: 50 Comm: kworker/u11:2 Tainted: G C 6.6.101-7.4.0-devel nxp-imx#1 [ 4.302294] Hardware name: Toradex Apalis iMX8QP V1.1 on Apalis Ixora V1.2 Carrier Board (DT) [ 4.302303] Workqueue: events_unbound deferred_probe_work_func [ 4.302319] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4.302329] pc : refcount_warn_saturate+0xf4/0x144 [ 4.302341] lr : refcount_warn_saturate+0xf4/0x144 [ 4.302352] sp : ffffffc080393ab0 [ 4.302356] x29: ffffffc080393ab0 x28: 0000000000000000 x27: 0000000000000000 [ 4.302372] x26: ffffff8000411428 x25: ffffff8000666580 x24: ffffff8000628605 [ 4.302388] x23: 0000000000000008 x22: ffffffc080393b38 x21: ffffff80004e0810 [ 4.302404] x20: ffffff80004e0810 x19: ffffff800340b140 x18: fffffffffffe7488 [ 4.302419] x17: 0000000000000000 x16: ffffffd4c0967194 x15: 0000000000000030 [ 4.302435] x14: fffffffffffe74b8 x13: ffffffd4c100ad10 x12: 00000000000004f8 [ 4.302450] x11: 00000000000001a8 x10: ffffffd4c1062d10 x9 : ffffffd4c100ad10 [ 4.302466] x8 : 00000000ffffefff x7 : ffffffd4c1062d10 x6 : 80000000fffff000 [ 4.302481] x5 : ffffff807fb5ebc8 x4 : 0000000000000000 x3 : 0000000000000027 [ 4.302497] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80006e6200 [ 4.302512] Call trace: [ 4.302519] refcount_warn_saturate+0xf4/0x144 [ 4.302531] kobject_put+0x164/0x218 [ 4.302544] se_if_probe_cleanup+0xf8/0x134 [sec_enclave] [ 4.302581] devm_action_release+0x14/0x20 [ 4.302594] devres_release_all+0xa8/0x110 [ 4.302604] device_unbind_cleanup+0x18/0x68 [ 4.302614] really_probe+0x120/0x3c4 [ 4.302623] __driver_probe_device+0x7c/0x16c [ 4.302632] driver_probe_device+0x3c/0x110 [ 4.302642] __device_attach_driver+0xbc/0x158 [ 4.302651] bus_for_each_drv+0x88/0xe8 [ 4.302661] __device_attach+0xa0/0x1b4 [ 4.302670] device_initial_probe+0x14/0x20 [ 4.302680] bus_probe_device+0xa8/0xac [ 4.302689] deferred_probe_work_func+0x94/0xe4 [ 4.302698] process_one_work+0x144/0x29c [ 4.302712] worker_thread+0x31c/0x434 [ 4.302723] kthread+0x110/0x114 [ 4.302735] ret_from_fork+0x10/0x20 [ 4.302746] ---[ end trace 0000000000000000 ]--- Fix both of these issues by incrementing the reference count when the kobject is relevant in subsequent probes and moving the kobject_put from the cleanup function to the remove function. This way, all successful probes increment the reference count, and only full removals decrement it, preventing use-after-free issues. Failed probes never alter the reference count so the cleanup shouldn't either. Fixes: 4faa6ae ("LF-13910-8: drivers: firmware: imx: add support for i.MX8DXL/QXP/QM") Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
|
@ernestvh Thanks for the PR. We are currently working to define a process for accepting patches from outside into our internal linux-imx tree. In order to speed this up, please first create a Github Issue that describes your problem, setup and steps to reproduce. Then link it to this PR. I have already sent this patches internally for review and will get back to you with the feedback. |
| config IMX_SEC_ENCLAVE | ||
| tristate "i.MX Embedded Secure Enclave - EdgeLock Enclave Firmware driver." | ||
| depends on IMX_MBOX && ARCH_MXC && ARM64 | ||
| depends on NVMEM_IMX_OCOTP_SCU && IMX_MBOX && ARCH_MXC && ARM64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - depends on ensures the enclave driver can’t be y when NVMEM_IMX_OCOTP_SCU is m, avoiding probe defers.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Credit to @MaxKrummenacher!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Credit to @MaxKrummenacher!
@ernestvh add @MaxKrummenacher as Suggested-by or Reported-by to give credit.
Also, NVMEM_IMX_OCOTP_SCU is enabled only for SCU (System controller unit) based platforms right (like 8qxp/8qm)?
So it means that IMX_SEC_ENCLAVE is only available on this boards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it means that IMX_SEC_ENCLAVE is only available on this boards.
The current imx_v8_defconfig enables NVMEM_IMX_OCOTP_SCU, so IMX_SEC_ENCLAVE would be there for boards without an SCU.
But you are right the this seems not the proper or complete solution to forcing IMX_SEC_ENCLAVE=m when the OCOTP driver is m.
Assuming it to handle the =m vs. =y issue correctly probably the following would be better:
depends on (NVMEM_IMX_OCOTP_SCU || NVMEM_IMX_OCOTP_ELE ) && IMX_MBOX && ARCH_MXC && ARM64
| if (ret) | ||
| pr_warn("Warn: Creating sysfs entry for se_log and se_rcv_msg_timeout: %d\n", ret); | ||
| } else { | ||
| kobject_get(se_kobj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ernestvh Just a heads-up: this isn’t really my area of expertise, so feel free to correct me if I’m off.
From what I can tell se_sysfs_log() always tries to create a new /sys/kernel/se kobject and its files. If it succeeds, it returns 0 (meaning it was just created). If it fails, it cleans up and returns a negative value.
There’s no success path for "already existed".
So if another probe calls it again, kobject_create_and_add("se", kernel_kobj) will likely fail due to the name already being used, and you’ll get an error.
Because of that, in this patch, calling kobject_get(se_kobj) when ret == 0 (success) will increase the refcount unnecessarily for the first device, which could cause a leak (unless the last remove does two put()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @iuliana-prodan, thanks for taking the time to review this.
Only the first probe should ever call se_sysfs_log and thus kobject_create_and_add, subsequent probes will always have se_kobj set to some value and thus if (!se_kobj) { ret = se_sysfs_log(); ... will not be executed, with one exception noted below. Instead, the later probes will do the kobject_get and afterwards a kobject_put in their remove methods.
Indeed, se_sysfs_log() tries to create this kobject and its files every time. If kobject_create_and_add fails, se_kobj is still a null pointer, leading the next probe to try se_sysfs_log again. When the call to kobject_create_and_add fails, kobject_put(se_kobj) will be executed, that does not seem right, since at that moment se_kobj is still null (EDIT: the kernel code always checks for null first inside kobject_put, so this is not an issue) . This is the only way that I see that se_sysfs_log could be executed more than once, since all other paths would lead to se_kobj not being null. So I do not see the potential for a leak here, but you did put your finger on a bug.
If either of the sysfs_create_file calls fail, the reference count is decremented, and se_kobj will probably be cleaned up since kobject_create_and_add will have lead to a reference count of 1, but se_kobj will not be reset to null. So the next probe will not execute se_sysfs_log and the creation of the files is never tried again. Possibly this is an issue. Instead, the next probe will do kobject_get, and it will later do a kobject_put in its remove function. I do not see an issue with the reference count there.
In short, hitting "already existed" seems impossible to me, but it does seem like we have some issues with the failure logic here, but that does not seem to be introduced by this patch. In any case, we would only hit such issues if the system is out of memory, so maybe it is not a big concern.
This logic is certainly confusing and I am also not an expert on this, so please feel free to correct me as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm correct about the possible issue, a solution could be to call se_sysfs_log from the probe every time instead of conditionally, and then do something like this in there:
/* Exposing the variable se_log via sysfs to enable/disable logging */
static int se_sysfs_log(void)
{
int ret = 0;
if (!se_kobj) {
/* Create kobject "se" located under /sys/kernel */
se_kobj = kobject_create_and_add("se", kernel_kobj);
if (!se_kobj) {
pr_warn("kobject creation failed\n");
ret = -ENOMEM;
/* N.B.: kobject_put will check for null and nothing will be done */
goto out;
}
} else {
kobject_get(se_kobj);
}
/* TODO: check if this file already exists first */
/* Create file for the se_log attribute */
if (sysfs_create_file(se_kobj, &se_log_attr.attr)) {
pr_err("Failed to create se file\n");
ret = -ENOMEM;
goto out;
}
/* TODO: check if this file already exists first */
/* Create file for the se_rcv_msg_timeout attribute */
if (sysfs_create_file(se_kobj, &se_rcv_msg_timeout_attr.attr)) {
pr_err("Failed to create se_rcv_msg_timeout file\n");
ret = -ENOMEM;
goto out;
}
out:
if (ret)
kobject_put(se_kobj);
return ret;
}
This way the usage is more in one place and easier to follow, and with some additional logic the creation of the files could be tried again if a probe is deferred because of out-of-memory issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points @ernestvh!
Agreed that only the first probe actually calls se_sysfs_log(). The main bug I still see is that if a sysfs_create_file() fails, we put() but never reset se_kobj, so later probes will skip creation and end up with a stale pointer and missing files.
I like the idea of moving the create/get logic fully into se_sysfs_log(), it would make the flow clearer and fix the stale pointer case.
Probably best to hear from the driver owner on which pattern they want.
Thanks!
Hi @dbaluta, thanks for the info, I will keep it in mind for future patches. It seems like I am not permitted to create issues on this repo currently. |
…p-imx#32) * dts: remove opp-supported-hw property The supported-hw check is failing at boot for all opp-supported-hw settings. My theory is that nvmem has not been initialized, or our silicon version has not been fused. Relevant driver files for that configuration are: drivers/opp/of.c drivers/opp/core.c drivers/cpufreq/imx-cpufreq-dt.c A new task will be generated for fusing and re-adding the opp-supported-hw property * dts: mt-connect: increase max voltage of buck1/2 for 1.8g config The 1.8ghz configuration requires a higher max voltage for the a53 core rails/SoC logic and interconnect rails.
Hi, previously @MaxKrummenacher opened #31
We since noticed that there are still issues with this driver, and now have a second commit to fix these. Doing this I noticed that we still need Max's earlier fix, but for functional reasons. The attached stack trace in the original commit message from Max is seemingly not caused by the issue it fixes but by another one.
I have therefore attached both commits but edited Max's commit message.
Please let me know if additional info aside from the commit messages is needed, and thanks for taking the time to investigate this.
@dbaluta