Add tests for src/snmalloc/override/new.cc#839
Conversation
|
@microsoft-github-policy-service agree |
|
Hi @pauhdk, thanks for taking this.
We run TSan in CI. So any concurrent test will get checked with TSan.
This should also be done, but a separate PR. We still support building with C++17, so that has caused some failures in CI. Also, there should be a clangformat pass. It will build a target with |
mjp41
left a comment
There was a problem hiding this comment.
Thanks. I have made a few suggestions that will hopefully get it through CI.
The TSan build is failing due to duplicate symbols. We might have to disable this test on TSan sadly.
Thank you! You beat me to it. Probably shouldn't have pushed before addressing all of your previous comments.
So, no additional data race tests needed in that case? |
|
Just pushed another fix, @mjp41. I believe the Ubuntu builds are failing because they run with Not sure about the Windows builds. |
|
Did some more testing, and I'm able to reproduce the CI failures locally. I stil believe that An easy fix to pass the tests would be a plain Do you consider this a bug, @mjp41? What do you suggest we do? |
|
Self-vendored STL is for compiled with freestanding so I don't expect it will have full exception handling facilities. |
So should we disable this test if the self-vendored setting is on? |
|
I think so. |
|
PTAL |
mjp41
left a comment
There was a problem hiding this comment.
LGTM, thanks for addressing this long standing issue
|
Looks like the netbsd tests are failing. Maybe an integer overflow somewhere? |
|
Could you add -v (either to ninja or CMAKE_CXX_FLAGS) to dump the compilation command? |
|
I'm trying to run the netbsd CI job locally with act to get the compile command, but no luck so far. |
Hi @paulhdk, I don't think it's an integer overflow. The request is (size_t)-1 (~16 EiB), so snmalloc rejects the size internally long before any arithmetic on it Looking at the log again: we land on the EXPECT with caught_bad_alloc == false. If bad_alloc had been thrown and unmatched, the process would have terminated before Two plausible causes:
Quickest way to tell them apart: drop a write(2, "throw\n", 6) right before the throw std::bad_alloc() in override/new.cc and rerun the netbsd job. If "throw" shows Hope it helps. |
as suggested by @devnexen in microsoft#839 (comment)
I just pushed a commit with the suggested print statement. @mjp41, could you re-run the CI please?
Thank you, @devnexen - very helpful! |
|
I'm not seeing the "throw" message in the netbsd job. FWICT, stderr is not explicitly redirected for the jobs running the tests. Can anyone think of a reason why stderr/the "throw" message" is not showing in the logs? In the meantime, I'll continue trying to set up the CI on my machine with act. |
I would have to add that to the netbsd job's
Is there a particular reason we're using the builtin |
Yes. It should put quite a lot of information about what is happening.
I read the docs and write doesn't need flush if it is to stdout or stderr. We can't use std::cout as it can allocate. Being an allocator is tricky as most libraries assume allocation works. So an allocator can't call them. We have snmalloc::message which is what we normally use internally for logging. That adds a few bits on top of write for mild formatting without allocation. It is also cross platform. |
Done. Could you give the CI another go, please?
Makes sense - thanks for explaining! |
|
I'm hoping that #841 will fix the NetBSD implementation. Claude felt that would ensure the override to occur. |
as suggested by @devnexen in microsoft#839 (comment)
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
On NetBSD with GCC 10, taking the address of operator new/delete as function pointer template arguments can resolve to the libstdc++ versions rather than the snmalloc overrides defined in the same translation unit. This happens because the function pointer is resolved via PLT indirection on some ELF platforms. Replace the function-pointer template parameters with lambda arguments that make direct calls to operator new/delete. Direct calls are always resolved to the definitions in the translation unit, avoiding the platform-dependent symbol resolution issue.
3f7022d to
dfde462
Compare
GCC recognises operator new/delete as C++ replaceable allocation functions and can transform them into malloc/free or elide paired new/delete calls at higher optimisation levels. This bypasses the snmalloc override and causes the test to fail (e.g. operator new returns NULL from malloc instead of throwing std::bad_alloc). -fno-builtin does not prevent this because GCC handles C++ replaceable allocation functions separately from C built-in functions. Compiling the test at -O0 disables all such transformations.
GCC 10 on NetBSD 9.2 has a bug where it incorrectly optimises calls to replaceable C++ allocation functions (operator new/delete) inside templates and lambdas, replacing them with built-in malloc/free even when a user-provided replacement is visible in the same translation unit. This causes the new/delete override test to fail. Upgrade to NetBSD 10.1 (the vmactions/netbsd-vm default) and GCC 14 from pkgsrc, which does not have this issue.
The GCC 10 compiler bug that required this workaround is addressed by upgrading the NetBSD CI to GCC 14.
|
@paulhdk sorry for taking over here, but with the microsoft organisation policy that new contributors cannot trigger CI to run, I think this would have taken a very long time. Thank you for the work, and hopefully this will now pass CI. If you could take a look it would be great to check I haven't broken something. |
Makes sense. I'll just have to find another issue to work on that I can debug myself then 🙂 Maybe the Rust tests we discussed earlier in this thread?
Thank you, @mjp41 (+ @devnexen, + @SchrodingerZhu) for helping getting it across the finish line! So, the fix ended up being a combination of #841, upgrading GCC in the netbsd CI, and using lambdas instead of function pointers?
Is the hardcoded pragma to suppress warnings on GCC at the beginning of the tests (still) necessary? Otherwise, LGTM! |
That would be great, if you want to take that on. You got this working perfectly except for NetBSD, and I think that was mostly due to it being pinned to an ancient version. Hopefully, once this PR lands you will get automatically running CI, but it might require more PRs.
Thank you.
I am not sure anymore. I think GCC on some platforms was getting confused and emitting that warning. But I did feel a bit like a dog chasing its tail on this. I also am not sure the Lambda rewrite was required. We could go back to your address taking version as that didn't have the need for the warning? |
Avoid having to hard-code a suppression of certain GCC warnings
I just reverted those changes so we can give CI another go. I think it would definitely improve readability, but let's see if CI passes first. |
|
@paulhdk for other issues. There is the Rust testing you mentioned, which would be awesome. If you want something fairly self-contained but pretty low-level there is: #594. There is a proposed solution in there, but no one has tried it. That would get you familiar with a bit of snmalloc platform abstraction. A really meaty issue would be to take on some of #740. The first bullet would be pretty tough, but both myself and @SchrodingerZhu would be interested and able to provide you information and guidance. I'd suggest making a sub-issue to discuss how it would be done, if you want to try that. |
mjp41
left a comment
There was a problem hiding this comment.
Awesome. Thanks for your first contribution to snmalloc.
This one also sounds like a great excuse to invest some more time in figuring out how to run the CI locally with act and then document it within SNMalloc for others to use. I believe the current documentation for running the CI locally is outdated.
Awesome! All of these sound super interesting! Let's see how far I can get 🙂 |
I didn't know we had documentation on that, so it is definitely going to be out of date ;-)
Awesome. Please raise issues if you start work, so we don't duplicate effort. |
Thank you, will do! |


First-time contributor taking a stab at #85.
Questions:
newanddeletedon't introduce data races. Should we explicitly test that, e.g., with ThreadSanitizer?should that be tracked in a separate issue?
closes: #85