Fix thread pool starvation under multi-concurrency #2387
Parse AWS_LAMBDA_MAX_CONCURRENCY as an integer and use it as the default polling task count instead of Math.Max(2, processorCount). This ensures enough polling tasks exist to fill all available concurrency slots.

Add AdjustThreadPoolSettings() to pre-size the ThreadPool to mcCount + processorCount at startup, preventing blocking handlers from starving polling task continuations of the threads they need to cycle back to RAPID's /next endpoint.

Without both changes, blocking handlers (Thread.Sleep, .Result, .Wait()) exhaust the ThreadPool under multi-concurrency, causing Runtime.Unavailable errors because polling tasks cannot resume their await continuations.

Changes:
- Utils.cs: Add GetMaxConcurrency() method, use the parsed MC value in DetermineProcessingTaskCount (fall back to Math.Max(2, processorCount) for non-numeric values)
- LambdaBootstrap.cs: Add AdjustThreadPoolSettings(), called after AdjustMemorySettings() in the RunAsync startup sequence
- TestMultiConcurrencyRuntimeApiClient.cs: Make thread-safe for concurrent polling task access (ConcurrentDictionary, lock on Queue)
- Add tests demonstrating that the fix works with MC=10/20 blocking handlers
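To make the two changes described above concrete, here is a minimal sketch of how they could fit together. The method names follow the description, but the bodies, the signatures, the class name ConcurrencySketch, and the handling of unset or non-positive values are assumptions for illustration, not the code in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: method names mirror the PR description, bodies are assumed.
public static class ConcurrencySketch
{
    public const string MaxConcurrencyVariable = "AWS_LAMBDA_MAX_CONCURRENCY";

    // Returns the parsed value, or null when the variable is unset,
    // non-numeric, or not positive (the last case is an assumption).
    public static int? GetMaxConcurrency(IDictionary<string, string> env)
    {
        if (env.TryGetValue(MaxConcurrencyVariable, out var raw)
            && int.TryParse(raw, out var parsed)
            && parsed > 0)
        {
            return parsed;
        }
        return null;
    }

    // One polling task per concurrency slot; otherwise the previous default.
    public static int DetermineProcessingTaskCount(
        IDictionary<string, string> env, int processorCount)
    {
        return GetMaxConcurrency(env) ?? Math.Max(2, processorCount);
    }

    // Raises the worker-thread minimum to mcCount + processorCount.
    // Never lowers an existing minimum and leaves the I/O completion
    // port minimum untouched. Returns what SetMinThreads reports.
    public static bool AdjustThreadPoolSettings(int mcCount, int processorCount)
    {
        ThreadPool.GetMinThreads(out var worker, out var completionPort);
        var target = Math.Max(worker, mcCount + processorCount);
        return ThreadPool.SetMinThreads(target, completionPort);
    }
}
```

With AWS_LAMBDA_MAX_CONCURRENCY=10 on one vCPU, this sketch yields 10 polling tasks and a worker-thread minimum of at least 11, so a continuation returning to /next does not have to wait for the pool's gradual thread injection while handlers block.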
{
    try
    {
        var maxConcurrency = Utils.GetMaxConcurrency(_environmentVariables);
I'm thinking we should check whether Constants.ENVIRONMENT_VARIABLE_AWS_LAMBDA_DOTNET_PROCESSING_TASKS is set, or create a separate environment variable to check. If that environment variable is set, use its value over AWS_LAMBDA_MAX_CONCURRENCY. That way, if a customer finds that the number of threads we create hurts performance for their workload, they still have a knob they can adjust. For the most flexibility in this advanced use case, I prefer a separate environment variable, maybe AWS_LAMBDA_DOTNET_MIN_THREADS.
Thanks for thinking about this in some detail, but I'm worried this might be unneeded complexity. If customers want fewer threads, they can set a lower MC, and if they want fewer processing tasks, they can set that. Is there a use case you can think of where threads could hurt performance?
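For reference while weighing this thread, the override being discussed could be sketched roughly as below. This is an illustration only: AWS_LAMBDA_DOTNET_MIN_THREADS is just the name floated in the review comment, and the class MinThreadsOverrideSketch, the method ResolveMinThreads, and the rule that only positive integers count as an override are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the override proposed in review; not part of this PR.
public static class MinThreadsOverrideSketch
{
    // Variable name taken from the review suggestion; it does not exist today.
    public const string MinThreadsVariable = "AWS_LAMBDA_DOTNET_MIN_THREADS";

    // An explicit, positive integer override wins; anything else falls
    // back to the PR's default of maxConcurrency + processorCount.
    public static int ResolveMinThreads(
        IDictionary<string, string> env, int maxConcurrency, int processorCount)
    {
        if (env.TryGetValue(MinThreadsVariable, out var raw)
            && int.TryParse(raw, out var explicitMin)
            && explicitMin > 0)
        {
            return explicitMin;
        }
        return maxConcurrency + processorCount;
    }
}
```

The trade-off raised in the reply still applies: this adds a second knob next to MC and the processing task count, so it only pays off if there is a workload where the pre-sized pool measurably hurts.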
Summary
- Default the polling task count to AWS_LAMBDA_MAX_CONCURRENCY instead of Math.Max(2, processorCount), which ensures enough /next calls are pending to avoid Runtime.Unavailable race conditions
- Pre-size the thread pool (SetMinThreads) to MC + ProcessorCount so polling loop continuations aren't starved by handler work

Problem
With the current defaults (2 polling tasks on 1 vCPU), the runtime doesn't call RAPID's /next fast enough under load. When handlers do blocking work, the .NET thread pool (which defaults MinThreads to ProcessorCount) can't resume the polling loop's await continuation, causing RAPID to time out with Runtime.Unavailable.

Benchmark Results
Tested with RIE across configurations from MC=8/1vCPU to MC=128/4vCPU with both Thread.Sleep and CPU-bound workloads:

Test plan
- LambdaBootstrapMultiConcurrencyTests pass
- UtilsTest.DetermineProcessingTaskCount cases reflect new defaults
- UtilsTest.GetMaxConcurrency tests cover parsing edge cases