reduce_sum: Task Isolation Missing in TBB #3388

@andrepfeuffer

Description

Deadlock or Wrong Results

Files Affected

  • stan/lib/stan_math/stan/math/prim/functor/reduce_sum.hpp
    • tbb::parallel_reduce call site

Root Cause Analysis

When reduce_sum calls are nested (one reduce_sum inside another's ReduceFunction), TBB task stealing can corrupt partial sums:

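struct inner_fn {
  double operator()(const std::vector<double>& slice, size_t, size_t,
                    std::ostream*) const {
    double s = 0;
    for (auto x : slice) s += x;  // plain serial sum of the slice
    return s;
  }
};

struct outer_fn {
  double operator()(const std::vector<double>& slice, size_t, size_t,
                    std::ostream* msgs) const {
    // Inner reduce_sum inside outer reduce_sum (grainsize 1)
    return stan::math::reduce_sum<inner_fn>(slice, 1, msgs);
  }
};

std::vector<double> outer_data(100, 1.0);
double result = stan::math::reduce_sum<outer_fn>(outer_data, 5, nullptr);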

The Problem:

While a thread waits for a nested parallel algorithm to finish, TBB lets it pick up any other ready task, including unrelated tasks from the outer reduction:

Outer reduce_sum:
  Chunk A: running inner_reduce_sum
  Chunk B: ready to run
  Chunk C: ready to run

Inner reduce_sum (from Chunk A):
  Task 1: running
  Task 2: ready

TBB's scheduler:
  "The thread running Chunk A is waiting on the inner reduce_sum → let it start Chunk B in the meantime"

Result:
  Work from different reduction trees interleaves on the same thread
  Partial sums from Inner can get added to Outer incorrectly
  Deadlock is possible if all threads end up waiting on the wrong tasks

The Fix

// BEFORE (a thread waiting here may pick up unrelated outer tasks):
tbb::parallel_reduce(range, worker);
return_type result = worker.sum_;

// AFTER (runs the reduction in an isolated region;
// tbb::this_task_arena::isolate is declared in <tbb/task_arena.h>):
return_type result(0);
tbb::this_task_arena::isolate([&] {
  tbb::parallel_reduce(range, worker);
  result = worker.sum_;
});
return result;

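tbb::this_task_arena::isolate() runs the lambda in an isolated region. While the calling thread waits inside that region, it only executes tasks that were spawned within it, so it cannot pick up unrelated tasks from the outer reduce_sum. Other threads can still help with the tasks inside the region, so parallelism is preserved.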

Test Coverage

TEST(StanMathPrim_reduce_sum, nested_reduce_sum_isolation) {
  struct inner_fn {
    double operator()(const std::vector<double>& slice, size_t, size_t,
                      std::ostream*) const {
      double s = 0;
      for (auto x : slice) s += x;
      return s;
    }
  };

  struct outer_fn {
    double operator()(const std::vector<double>& slice, size_t, size_t,
                      std::ostream* msgs) const {
      // Inner reduce_sum
      return stan::math::reduce_sum<inner_fn>(slice, 1, msgs);
    }
  };

  std::vector<double> outer_data(100, 1.0);
  // Without isolation, task stealing could corrupt partial sums
  // With isolation, each reduce_sum respects its boundary
  double result = stan::math::reduce_sum<outer_fn>(outer_data, 5, nullptr);
  EXPECT_DOUBLE_EQ(result, 100.0);
}
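
For reference, here is a minimal serial sketch of why the test expects 100.0. It uses only the standard library and does not touch TBB or Stan Math: `serial_reduce_sum` is a hypothetical stand-in that cuts the input into grainsize-sized slices and sums the functor's return values in order. Passing the last index as inclusive is an assumption about the reduce_sum calling convention, and neither functor reads the indices anyway.

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <vector>

// Hypothetical serial stand-in for reduce_sum (NOT the Stan Math API):
// slices the data by grainsize and adds up the functor's partial results.
template <typename ReduceFunction>
double serial_reduce_sum(const std::vector<double>& data,
                         std::size_t grainsize, std::ostream* msgs) {
  if (grainsize == 0) grainsize = 1;  // guard against an endless loop
  double total = 0.0;
  for (std::size_t start = 0; start < data.size(); start += grainsize) {
    const std::size_t stop = std::min(start + grainsize, data.size());
    const std::vector<double> slice(data.begin() + start, data.begin() + stop);
    // Last index passed as inclusive (assumed convention, unused below).
    total += ReduceFunction()(slice, start, stop - 1, msgs);
  }
  return total;
}

// Same shape as the inner functor in the test: serial sum of one slice.
struct inner_fn {
  double operator()(const std::vector<double>& slice, std::size_t,
                    std::size_t, std::ostream*) const {
    double s = 0;
    for (double x : slice) s += x;
    return s;
  }
};

// Same shape as the outer functor: a nested reduction with grainsize 1.
struct outer_fn {
  double operator()(const std::vector<double>& slice, std::size_t,
                    std::size_t, std::ostream* msgs) const {
    return serial_reduce_sum<inner_fn>(slice, 1, msgs);
  }
};
```

With 100 elements of 1.0 and an outer grainsize of 5, the outer loop makes 20 calls that each return 5.0, so `serial_reduce_sum<outer_fn>(std::vector<double>(100, 1.0), 5, nullptr)` gives 100.0. Any correct parallel schedule has to produce the same value, which is what the EXPECT_DOUBLE_EQ above checks.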

Impact

Before Fix:

  • Nested reduce_sum calls can silently produce wrong results
  • Symptoms: unpredictable values, race conditions, possible deadlock
  • Only manifests under specific threading/load conditions
  • Very difficult to debug

After Fix:

  • Task boundaries respected
  • Nested reduce_sum works correctly
  • Deterministic results
