Server thread safety #275
…ety, serializing access to shared global resources like NVM and global keycache
billphipps
left a comment
Truly excellent! You solved this just the way I had hoped for!
My requested changes are very limited and not really functional. More just fleshing out the exact requirements for a real implementation and a few minor typos and renaming opportunities.
The stress testing framework is outstanding!
#include "wolfhsm/wh_lock.h"
#include "wolfhsm/wh_error.h"

#ifdef WOLFHSM_CFG_THREADSAFE
Is this the best name? Consider the more mundane WOLFHSM_CFG_LOCKS. Threadsafe may imply more than just locks, like cancelability.
yeah was kind of wishy washy on this. good point. Let me think on it.
Consider adding posix to the name of this file, since it relies heavily on POSIX to provide any real functionality.
Yeah it might be nice to organize our posix tests in one spot. maybe test/posix or port/posix/test/ so we can leave our wh_test_*.c stuff generic for all platforms
I really like that solution. +1
That is a good idea. Unfortunately, a lot of our generic test modules (e.g. wh_test_clientserver.c) contain both generic drivers and a POSIX harness (e.g. one that spins up the client + server threads). I think it might be best to push this out of the scope of this PR and refactor the tests to better split generic test drivers (e.g. whTest_XXXClientCfg(whClientConfig*) and whTest_XXXClientCtx(whClientCtx*)) from the actual underlying test harness. I'd wager we could remove a lot of code that way, with one or two unified harnesses that the drivers just run on top of.
rizlik
left a comment
I didn't look into tests yet.
Great work.
Is this lock enough to properly synchronize client requests?
For example, in _HandleNvmRead:
rc = wh_Nvm_GetMetadata(server->nvm, id, &meta);
if (rc != WH_ERROR_OK) {
    return rc;
}
if (offset >= meta.len)
    return WH_ERROR_BADARGS;
/* Clamp length to object size */
if ((offset + len) > meta.len) {
    len = meta.len - offset;
}
rc = wh_Nvm_ReadChecked(server->nvm, id, offset, len, out_data);
if (rc != WH_ERROR_OK)
The metadata can change between GetMetadata and ReadChecked.
Also, when handling a key request:
/* get a new id if one wasn't provided */
if (WH_KEYID_ISERASED(meta->id)) {
    ret = wh_Server_KeystoreGetUniqueId(server, &meta->id);
    resp.rc = ret;
}
/* write the key */
if (ret == WH_ERROR_OK) {
    ret = wh_Server_KeystoreCacheKeyChecked(server, meta, in);
    resp.rc = ret;
}
The id might no longer be unique by the time wh_Server_KeystoreCacheKeyChecked runs.
Would coarser-grained locking at the request level simplify the design?
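To make the question concrete, here is a minimal sketch of request-level locking applied to the check-then-read pattern above. It is not wolfHSM code: `ToyObject`, `toy_read`, `toy_handle_read`, `toy_resizer`, and `toy_race_demo` are hypothetical names, and a plain POSIX mutex stands in for the NVM lock. The length check, the clamp, and the read share one critical section while a second thread keeps resizing the object.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 64

/* Hypothetical stand-in for one NVM object: a length plus its payload. */
typedef struct {
    pthread_mutex_t lock;
    size_t          len;
    unsigned char   data[TOY_MAX];
} ToyObject;

/* Copies len bytes at offset; the caller must already hold obj->lock. */
static int toy_read(const ToyObject* obj, size_t offset, size_t len,
                    unsigned char* out)
{
    if (offset > obj->len || len > obj->len - offset) {
        return -1; /* the length the caller checked has gone stale */
    }
    memcpy(out, obj->data + offset, len);
    return 0;
}

/* Request-level locking: check, clamp, and read happen inside one
 * critical section, so a resize cannot slip in between them. */
static int toy_handle_read(ToyObject* obj, size_t offset, size_t len,
                           unsigned char* out)
{
    int    rc = -1;
    size_t cur;

    assert(obj != NULL && out != NULL);
    pthread_mutex_lock(&obj->lock);
    cur = obj->len;
    if (offset < cur) {
        if (len > cur - offset) {
            len = cur - offset; /* clamp length to object size */
        }
        rc = toy_read(obj, offset, len, out);
    }
    pthread_mutex_unlock(&obj->lock);
    return rc;
}

/* Concurrent writer that keeps changing the object's length. */
static void* toy_resizer(void* arg)
{
    ToyObject* obj = (ToyObject*)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&obj->lock);
        obj->len = (i & 1) ? TOY_MAX : 8;
        pthread_mutex_unlock(&obj->lock);
    }
    return NULL;
}

/* Returns the number of failed reads, or -1 if the thread cannot start. */
int toy_race_demo(void)
{
    ToyObject     obj = {.len = TOY_MAX};
    unsigned char buf[TOY_MAX];
    pthread_t     thread;
    int           failures = 0;

    pthread_mutex_init(&obj.lock, NULL);
    if (pthread_create(&thread, NULL, toy_resizer, &obj) != 0) {
        return -1;
    }
    for (int i = 0; i < 100000; i++) {
        if (toy_handle_read(&obj, 4, 32, buf) != 0) {
            failures++;
        }
    }
    pthread_join(thread, NULL);
    pthread_mutex_destroy(&obj.lock);
    return failures;
}
```

Because the offset used here is always below the smaller size, every locked read should succeed. If the lock were instead released between the length check and `toy_read`, the stale-length branch could fire.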
API/Error handling:
- Add initialized flag to whLock structure to distinguish init states
- Enhance error handling: acquire/release check initialized flag
- Make wh_Lock_Cleanup zero structure for clear post-cleanup state
- Document init/cleanup must be single-threaded (no atomics)
- Document cleanup preconditions (no active contention required)
- Update all API docs with precise return codes and error conditions
- Change blocking acquire failure from ERROR_LOCKED to ERROR_ABORTED
- Add comment explaining why non-blocking acquire is not provided

POSIX port improvements:
- Enhanced errno mapping in posix_lock.c (EINVAL→BADARGS, etc)
- Trap PTHREAD_MUTEX_ERRORCHECK errors (EDEADLK, EPERM)

Test coverage:
- Add testUninitializedLock to validate error handling
- Enhance testLockLifecycle with post-cleanup validation tests

Misc:
- Apply consistent critical section style pattern in wh_nvm.c
- Update copyright years to 2026
- Rename stress test files to wh_test_posix_threadsafe_stress.*
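The lock hardening described in this commit can be illustrated with a small sketch. It is an approximation under stated assumptions, not the actual wh_lock or posix_lock.c code: `DemoLock`, the `demo_lock_*` functions, and the `DEMO_*` return codes are hypothetical, and the errno mapping is simplified to two buckets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical return codes; the real wolfHSM values are not assumed. */
enum { DEMO_OK = 0, DEMO_BADARGS = -1, DEMO_ABORTED = -2, DEMO_NOTREADY = -3 };

typedef struct {
    pthread_mutex_t mutex;
    int             initialized; /* distinguishes init states */
} DemoLock;

/* Simplified errno mapping: EINVAL is a bad argument, anything else
 * (including EDEADLK and EPERM from an error-checking mutex) aborts. */
static int demo_map_errno(int err)
{
    switch (err) {
        case 0:      return DEMO_OK;
        case EINVAL: return DEMO_BADARGS;
        default:     return DEMO_ABORTED;
    }
}

/* Init and cleanup are assumed to be called from a single thread. */
int demo_lock_init(DemoLock* lock)
{
    pthread_mutexattr_t attr;
    int                 err;

    if (lock == NULL) {
        return DEMO_BADARGS;
    }
    memset(lock, 0, sizeof(*lock));
    err = pthread_mutexattr_init(&attr);
    if (err != 0) {
        return demo_map_errno(err);
    }
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (err == 0) {
        err = pthread_mutex_init(&lock->mutex, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    if (err == 0) {
        lock->initialized = 1;
    }
    return demo_map_errno(err);
}

int demo_lock_acquire(DemoLock* lock)
{
    if (lock == NULL) return DEMO_BADARGS;
    if (!lock->initialized) return DEMO_NOTREADY;
    return demo_map_errno(pthread_mutex_lock(&lock->mutex));
}

int demo_lock_release(DemoLock* lock)
{
    if (lock == NULL) return DEMO_BADARGS;
    if (!lock->initialized) return DEMO_NOTREADY;
    return demo_map_errno(pthread_mutex_unlock(&lock->mutex));
}

/* Zeroes the structure so later calls see a clear "not initialized" state. */
int demo_lock_cleanup(DemoLock* lock)
{
    if (lock == NULL) return DEMO_BADARGS;
    if (lock->initialized) {
        (void)pthread_mutex_destroy(&lock->mutex);
    }
    memset(lock, 0, sizeof(*lock));
    return DEMO_OK;
}

/* Walks the lifecycle, including the misuse cases the mutex traps. */
int demo_lock_walkthrough(void)
{
    DemoLock lock;

    memset(&lock, 0, sizeof(lock));
    assert(demo_lock_acquire(&lock) == DEMO_NOTREADY); /* before init */
    assert(demo_lock_init(&lock) == DEMO_OK);
    assert(demo_lock_acquire(&lock) == DEMO_OK);
    assert(demo_lock_acquire(&lock) == DEMO_ABORTED);  /* EDEADLK */
    assert(demo_lock_release(&lock) == DEMO_OK);
    assert(demo_lock_release(&lock) == DEMO_ABORTED);  /* EPERM */
    assert(demo_lock_cleanup(&lock) == DEMO_OK);
    assert(demo_lock_acquire(&lock) == DEMO_NOTREADY); /* after cleanup */
    return 0;
}
```

The error-checking mutex type is what turns a self-deadlock or a stray release into a return code rather than undefined behavior, which is presumably why the commit traps EDEADLK and EPERM explicitly.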
@rizlik great catch, thanks. I thought I had fixed all of those, but clearly there are still some non-atomic compound operations lurking. I will make another pass to make them all atomic.
I wonder, if we are going to use a single lock, can't we just acquire the lock at It's probably a tradeoff: we'll gain simplicity since we won't need locked vs. unlocked APIs, but there is a risk that other parts of the code misuse the NVM API and introduce races in the future.
@rizlik yep, that is what I was worried about and why I didn't initially try it that way. I'm not 100% sold on which is better.
…nter, img_mgr, and nvm modules

Adds proper thread-safety locking discipline to additional server modules that perform compound NVM operations. This prevents TOCTOU (Time-Of-Check-Time-Of-Use) issues where metadata could become stale between check and use/writeback.

Changes:
- wh_server_cert.c: Add NVM locking for atomic GetMetadata + Read operations in certificate read and export paths
- wh_server_counter.c: Add NVM locking for atomic read-modify-write counter increment operations
- wh_server_img_mgr.c: Add NVM locking for atomic signature load operations
- wh_server_keystore.c: Refactor to use unlocked internal variants for compound operations (GetUniqueId + CacheKey, policy check + erase, freshen + export). Add locking discipline documentation.
- wh_server_nvm.c: Add NVM locking for DMA read operations to ensure metadata remains valid throughout transfer. Add locking discipline documentation.
- wh_test_posix_threadsafe_stress.c: Add new stress test phases for counter concurrent increment, counter increment vs read, NVM read vs resize, NVM concurrent resize, and NVM read DMA vs resize. Add counter atomicity validation.

All compound operations now follow the pattern:
1. Acquire server->nvm->lock
2. Use only *Unlocked() variants internally
3. Keep lock held for entire operation including DMA
4. Release lock after all metadata-dependent operations complete
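As a rough illustration of the compound-operation pattern described in this commit, the sketch below applies it to a read-modify-write counter increment and checks atomicity under contention. All names (`DemoNvm`, `demo_counter_*`, `demo_worker`) are hypothetical, a POSIX mutex stands in for the NVM lock, and the DMA step is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define DEMO_THREADS 4
#define DEMO_ITERS   10000

/* Hypothetical shared NVM holding a single counter. */
typedef struct {
    pthread_mutex_t lock;
    uint32_t        counter;
} DemoNvm;

/* Unlocked variants: the caller must already hold nvm->lock. */
static int demo_counter_read_unlocked(const DemoNvm* nvm, uint32_t* out)
{
    *out = nvm->counter;
    return 0;
}

static int demo_counter_write_unlocked(DemoNvm* nvm, uint32_t value)
{
    nvm->counter = value;
    return 0;
}

/* Compound operation: read, modify, and write back under one lock hold. */
int demo_counter_increment(DemoNvm* nvm, uint32_t* out)
{
    uint32_t value = 0;
    int      rc;

    assert(nvm != NULL);
    pthread_mutex_lock(&nvm->lock);               /* acquire the lock */
    rc = demo_counter_read_unlocked(nvm, &value); /* unlocked variants only */
    if (rc == 0) {
        value++;
        rc = demo_counter_write_unlocked(nvm, value);
    }
    pthread_mutex_unlock(&nvm->lock);             /* release after writeback */
    if (rc == 0 && out != NULL) {
        *out = value;
    }
    return rc;
}

static void* demo_worker(void* arg)
{
    DemoNvm* nvm = (DemoNvm*)arg;
    for (int i = 0; i < DEMO_ITERS; i++) {
        (void)demo_counter_increment(nvm, NULL);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread failed to start. */
long demo_counter_stress(void)
{
    DemoNvm   nvm;
    pthread_t threads[DEMO_THREADS];
    int       started = 0;

    nvm.counter = 0;
    pthread_mutex_init(&nvm.lock, NULL);
    for (int i = 0; i < DEMO_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, demo_worker, &nvm) != 0) {
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_mutex_destroy(&nvm.lock);
    return (started == DEMO_THREADS) ? (long)nvm.counter : -1;
}
```

If every increment is atomic, the final value equals `DEMO_THREADS * DEMO_ITERS`; a lost update would show up as a smaller number, which is the kind of check a counter atomicity validation phase can make.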
AlexLanzano
left a comment
Looks really good so far!
My main concern is the addition of the *Unlocked functions. I feel like there has to be a way to remove those and still use the top-level API functions, either by checking whether the current thread has already acquired the NVM lock or by creating a lock for both the keystore and the NVM.
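One possible way to realize the first alternative on POSIX is a recursive mutex, which lets a thread that already holds the lock call the top-level functions again. The sketch below is only an illustration of that idea, not something the PR does: `DemoKeystore` and the `demo_ks_*` functions are hypothetical, and a platform without recursive locks would need its own owner tracking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define DEMO_SLOTS 16

/* Hypothetical keystore: one flag per key id, guarded by one lock. */
typedef struct {
    pthread_mutex_t lock;
    unsigned char   used[DEMO_SLOTS];
} DemoKeystore;

/* Creates the lock as PTHREAD_MUTEX_RECURSIVE so the owner may re-lock. */
int demo_ks_init(DemoKeystore* ks)
{
    pthread_mutexattr_t attr;
    int                 err;

    assert(ks != NULL);
    memset(ks->used, 0, sizeof(ks->used));
    err = pthread_mutexattr_init(&attr);
    if (err != 0) {
        return -1;
    }
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (err == 0) {
        err = pthread_mutex_init(&ks->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return (err == 0) ? 0 : -1;
}

/* Top-level API: takes the lock itself and finds the first free id. */
int demo_ks_get_unique_id(DemoKeystore* ks, int* id)
{
    int rc = -1;

    pthread_mutex_lock(&ks->lock);
    for (int i = 0; i < DEMO_SLOTS; i++) {
        if (!ks->used[i]) {
            *id = i;
            rc  = 0;
            break;
        }
    }
    pthread_mutex_unlock(&ks->lock);
    return rc;
}

/* Top-level API: takes the lock itself and fails if the id is taken. */
int demo_ks_cache_key(DemoKeystore* ks, int id)
{
    int rc = -1;

    pthread_mutex_lock(&ks->lock);
    if (id >= 0 && id < DEMO_SLOTS && !ks->used[id]) {
        ks->used[id] = 1;
        rc           = 0;
    }
    pthread_mutex_unlock(&ks->lock);
    return rc;
}

/* Compound operation built from the two top-level calls. The outer lock
 * keeps the id unique until it is cached; the inner calls re-lock safely. */
int demo_ks_cache_new_key(DemoKeystore* ks, int* id)
{
    int rc;

    pthread_mutex_lock(&ks->lock);
    rc = demo_ks_get_unique_id(ks, id);
    if (rc == 0) {
        rc = demo_ks_cache_key(ks, *id);
    }
    pthread_mutex_unlock(&ks->lock);
    return rc;
}
```

The tradeoff is that recursive locks can hide ownership mistakes and are not available on every platform, which may matter for a callback-based lock abstraction.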
…vel server module APIs (keystore, NVM, counter, etc.) and acquire the lock in request handling functions (e.g. wh_Server_HandleXXXRequest())
Server thread safety
TL;DR: Makes wolfHSM server safe to use in multithreaded scenarios.
Overview
This pull request implements thread-safe access to shared server resources in wolfHSM, specifically targeting the NVM (non-volatile memory) subsystem which also protects the global key cache. Crypto is left to a subsequent PR but is the likely next candidate.
Note that a server context itself still cannot be shared across threads without proper serialization by the caller. This PR adds the mechanisms such that, when multiple server contexts share an NVM instance (which includes the global keystore), access to those shared resources is properly serialized, allowing requests from multiple clients to be processed concurrently in separate threads.
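A minimal sketch of the scenario described above, under stated assumptions: two threads each own a separate server context, both contexts point at one shared NVM, and each request handler serializes access with lock macros that compile away when a build flag is absent. `DemoNvm`, `DemoServerCtx`, `DEMO_CFG_THREADSAFE`, and the `DEMO_NVM_*` macros are hypothetical stand-ins, not the wolfHSM definitions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical build flag; remove it to compile the locking away. Without
 * it, this two-thread demo is no longer safe and the count may be wrong. */
#define DEMO_CFG_THREADSAFE

typedef struct {
#ifdef DEMO_CFG_THREADSAFE
    pthread_mutex_t lock;
#endif
    long requests_served;
} DemoNvm;

/* Each thread owns its context; only the NVM behind it is shared. */
typedef struct {
    DemoNvm* nvm;
    int      requests;
} DemoServerCtx;

#ifdef DEMO_CFG_THREADSAFE
#define DEMO_NVM_LOCK(ctx)   pthread_mutex_lock(&(ctx)->nvm->lock)
#define DEMO_NVM_UNLOCK(ctx) pthread_mutex_unlock(&(ctx)->nvm->lock)
#else
#define DEMO_NVM_LOCK(ctx)   (0)
#define DEMO_NVM_UNLOCK(ctx) (0)
#endif

/* Request handler: lock on entry, touch shared state, unlock on exit. */
static int demo_handle_request(DemoServerCtx* ctx)
{
    assert(ctx != NULL && ctx->nvm != NULL);
    if (DEMO_NVM_LOCK(ctx) != 0) {
        return -1;
    }
    ctx->nvm->requests_served++;
    (void)DEMO_NVM_UNLOCK(ctx);
    return 0;
}

static void* demo_server_thread(void* arg)
{
    DemoServerCtx* ctx = (DemoServerCtx*)arg;
    for (int i = 0; i < ctx->requests; i++) {
        (void)demo_handle_request(ctx);
    }
    return NULL;
}

/* Returns the total requests served, or -1 if a thread failed to start. */
long demo_two_servers(void)
{
    DemoNvm       nvm;
    DemoServerCtx a = {&nvm, 50000};
    DemoServerCtx b = {&nvm, 50000};
    pthread_t     ta;
    pthread_t     tb;

    nvm.requests_served = 0;
#ifdef DEMO_CFG_THREADSAFE
    pthread_mutex_init(&nvm.lock, NULL);
#endif
    if (pthread_create(&ta, NULL, demo_server_thread, &a) != 0) {
        return -1;
    }
    if (pthread_create(&tb, NULL, demo_server_thread, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
#ifdef DEMO_CFG_THREADSAFE
    pthread_mutex_destroy(&nvm.lock);
#endif
    return nvm.requests_served;
}
```

Each context is touched by only one thread, matching the note that a context itself is not shared; only the NVM update is inside the critical section.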
Changes
- wh_lock.{c,h}) with callback-based design for platform independence
- wh_Server_NvmLock()/wh_Server_NvmUnlock()) with convenience macros WH_SERVER_NVM_LOCK()/WH_SERVER_NVM_UNLOCK()
- WOLFHSM_CFG_THREADSAFE build option. When this option is NOT defined, all lock macros compile to no-ops with zero overhead
Design Rationale
The locking strategy is intentionally simple: acquire the NVM lock at the start of a request handler, perform all operations (including any compound operations involving multiple NVM/cache accesses), then release the lock. This approach:
Gaps/Future Work