fix(pyamber): use monotonic_ns to prevent negative elapsed times in statistics_manager by lalalastella · Pull Request #5791 · apache/texera

lalalastella · 2026-06-19T05:57:43Z

What changes were proposed in this PR?

Root cause: time.time_ns() is a wall-clock call that can go backward during NTP adjustments or inside VMs/containers. In MainLoop._switch_context and _process_dcm, elapsed time is computed as end_time - start_time where both values come from time.time_ns(). When the clock steps backward between the two calls, the difference is negative. Passing a negative value to StatisticsManager.increase_data_processing_time raised ValueError: Time must be non-negative, which propagated into the DataProcessor thread and left the worker in a deadlocked wait.

Before → After

Before:
  start_time = time.time_ns()   # wall clock — can go backward
  ...
  end_time   = time.time_ns()   # can be < start_time after NTP step
  statistics_manager.increase_data_processing_time(end_time - start_time)
  #  → ValueError: Time must be non-negative
  #  → worker thread hangs (unhandled exception)

After:
  start_time = time.monotonic_ns()   # guaranteed non-decreasing
  ...
  end_time   = time.monotonic_ns()   # always ≥ start_time
  statistics_manager.increase_data_processing_time(end_time - start_time)
  #  → always ≥ 0, no exception

Changes (3 files):

File	Change
`core/runnables/main_loop.py`	Replace all 6 `time.time_ns()` calls with `time.monotonic_ns()`
`core/architecture/managers/statistics_manager.py`	Defense-in-depth: log a warning and return early for negative elapsed times instead of raising `ValueError`
`test/python/core/architecture/managers/test_statistics_manager.py`	Update `test_negative_time_raises` → `test_negative_time_is_clamped_to_zero` to match new behavior

Any related issues, documentation, discussions?

Closes #3768

How was this PR tested?

cd amber && pytest src/test/python/core/architecture/managers/test_statistics_manager.py -v

(Requires bin/python-proto-gen.sh to have been run first, as in CI.)

All existing TestStatisticsManager* tests pass. The updated test test_negative_time_is_clamped_to_zero replaces test_negative_time_raises and covers the new clamp-and-warn behavior.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

time.time_ns() (wall clock) can go backward on NTP adjustments or in VMs, making end_time - start_time negative in _switch_context and _process_dcm. Passing that negative delta to StatisticsManager raised ValueError, which was unhandled by the DataProcessor thread and left the worker in a deadlocked wait. Fix: replace all time.time_ns() in main_loop.py with time.monotonic_ns(), which is guaranteed non-decreasing. Defense-in-depth: update increase_data_processing_time and increase_control_processing_time to log a warning and return early instead of raising ValueError for negative values. This keeps the worker alive in the unlikely event of any remaining clock anomaly. Closes apache#3768

Copilot

Pull request overview

Fixes intermittent worker hangs in the PyAmber runtime by switching elapsed-time measurements to a monotonic clock and making statistics collection tolerant to negative durations.

Changes:

Replace time.time_ns() with time.monotonic_ns() in MainLoop when measuring elapsed processing time and total execution time.
Make StatisticsManager.increase_*_processing_time ignore (and warn on) negative durations instead of raising.
Update the statistics manager unit test to match the new “ignore negative durations” behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`amber/src/main/python/core/runnables/main_loop.py`	Switches processing/execution timing from wall-clock to monotonic time to avoid negative elapsed durations.
`amber/src/main/python/core/architecture/managers/statistics_manager.py`	Adds defense-in-depth by warning and returning early on negative processing durations.
`amber/src/test/python/core/architecture/managers/test_statistics_manager.py`	Updates the negative-time test to validate the new behavior (no exception, accumulator unchanged).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

    @pytest.mark.parametrize(
-        "method",
-        ["increase_data_processing_time", "increase_control_processing_time"],
+        "method, attr",
+        [
+            ("increase_data_processing_time", "_data_processing_time"),
+            ("increase_control_processing_time", "_control_processing_time"),
+        ],
    )


github-actions · 2026-06-19T06:54:12Z

⚠️ Benchmark changes need a look

🟢 0 better · 🔴 4 worse · ⚪ 11 noise (<±5%) · 0 without baseline

Compared against main 731d671 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

	config	throughput	MB/s	latency	max Δ latest / 7d
🔴	bs=10 sw=10 sl=64	402	0.245	22,699/37,616/37,616 us	🔴 +7.2% / 🔴 +7.5%
🔴	bs=100 sw=10 sl=64	942	0.575	104,597/134,381/134,381 us	🔴 +19.0% / 🟢 -6.8%
⚪	bs=1000 sw=10 sl=64	1,103	0.673	906,025/962,805/962,805 us	⚪ within ±5% / 🟢 -6.9%

Baseline details

Latest main 731d671 from same runner

config	metric	PR	latest main	7d avg	Δ latest	Δ 7d
bs=10 sw=10 sl=64	throughput	402 tuples/sec	421 tuples/sec	410.82 tuples/sec	-4.5%	-2.1%
bs=10 sw=10 sl=64	MB/s	0.245 MB/s	0.257 MB/s	0.251 MB/s	-4.7%	-2.3%
bs=10 sw=10 sl=64	p50	22,699 us	22,952 us	23,785 us	-1.1%	-4.6%
bs=10 sw=10 sl=64	p95	37,616 us	35,089 us	34,980 us	+7.2%	+7.5%
bs=10 sw=10 sl=64	p99	37,616 us	35,089 us	34,980 us	+7.2%	+7.5%
bs=100 sw=10 sl=64	throughput	942 tuples/sec	970 tuples/sec	891.94 tuples/sec	-2.9%	+5.6%
bs=100 sw=10 sl=64	MB/s	0.575 MB/s	0.592 MB/s	0.544 MB/s	-2.9%	+5.6%
bs=100 sw=10 sl=64	p50	104,597 us	104,735 us	112,277 us	-0.1%	-6.8%
bs=100 sw=10 sl=64	p95	134,381 us	112,921 us	139,802 us	+19.0%	-3.9%
bs=100 sw=10 sl=64	p99	134,381 us	112,921 us	139,802 us	+19.0%	-3.9%
bs=1000 sw=10 sl=64	throughput	1,103 tuples/sec	1,110 tuples/sec	1,041 tuples/sec	-0.6%	+6.0%
bs=1000 sw=10 sl=64	MB/s	0.673 MB/s	0.677 MB/s	0.635 MB/s	-0.6%	+5.9%
bs=1000 sw=10 sl=64	p50	906,025 us	897,776 us	972,714 us	+0.9%	-6.9%
bs=1000 sw=10 sl=64	p95	962,805 us	945,364 us	1,023,057 us	+1.8%	-5.9%
bs=1000 sw=10 sl=64	p99	962,805 us	945,364 us	1,023,057 us	+1.8%	-5.9%

Raw CSV

config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,497.78,200,128000,402,0.245,22699.28,37616.06,37616.06
1,100,10,64,20,2124.27,2000,1280000,942,0.575,104596.76,134380.67,134380.67
2,1000,10,64,20,18139.02,20000,12800000,1103,0.673,906024.97,962805.12,962805.12

mengw15 · 2026-06-20T01:16:23Z

Please make sure CI tests pass first

github-actions Bot assigned lalalastella Jun 19, 2026

github-actions Bot added fix pyamber common labels Jun 19, 2026

lalalastella force-pushed the fix/monotonic-clock-negative-elapsed-time branch from cf0da76 to e81241f Compare June 19, 2026 06:24

github-actions Bot removed the common label Jun 19, 2026

aglinxinyuan requested review from Copilot and mengw15 June 19, 2026 06:42

Copilot started reviewing on behalf of aglinxinyuan June 19, 2026 06:43 View session

Copilot AI reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(pyamber): use monotonic_ns to prevent negative elapsed times in statistics_manager#5791

fix(pyamber): use monotonic_ns to prevent negative elapsed times in statistics_manager#5791
lalalastella wants to merge 1 commit into
apache:mainfrom
lalalastella:fix/monotonic-clock-negative-elapsed-time

lalalastella commented Jun 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented Jun 19, 2026

Uh oh!

mengw15 commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

lalalastella commented Jun 19, 2026

What changes were proposed in this PR?

Any related issues, documentation, discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

github-actions Bot commented Jun 19, 2026

⚠️ Benchmark changes need a look

Uh oh!

mengw15 commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants