fix(engine): Base64-encode Python worker startup config to survive Windows argv#5917
fix(engine): Base64-encode Python worker startup config to survive Windows argv#5917yangzhang75 wants to merge 1 commit into
Conversation
Automated Reviewer SuggestionsBased on the
|
|
It seems this branch has diverged from the current main branch. Could you clean up the branch. |
bb77e2c to
28c8df1
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5917 +/- ##
============================================
+ Coverage 54.50% 54.69% +0.18%
- Complexity 2915 2952 +37
============================================
Files 1108 1110 +2
Lines 42807 42855 +48
Branches 4604 4608 +4
============================================
+ Hits 23332 23438 +106
+ Misses 18119 18050 -69
- Partials 1356 1367 +11
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 411 | 0.251 | 23,767/31,766/31,766 us | 🔴 +22.9% / 🔴 +113.4% |
| ⚪ | bs=100 sw=10 sl=64 | 817 | 0.499 | 120,635/156,208/156,208 us | ⚪ within ±5% / 🔴 +47.0% |
| ⚪ | bs=1000 sw=10 sl=64 | 912 | 0.557 | 1,100,507/1,127,471/1,127,471 us | ⚪ within ±5% / 🔴 +13.4% |
Baseline details
Latest main 1c580e5 from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 411 tuples/sec | 459 tuples/sec | 781.13 tuples/sec | -10.5% | -47.4% |
| bs=10 sw=10 sl=64 | MB/s | 0.251 MB/s | 0.28 MB/s | 0.477 MB/s | -10.4% | -47.4% |
| bs=10 sw=10 sl=64 | p50 | 23,767 us | 22,514 us | 12,542 us | +5.6% | +89.5% |
| bs=10 sw=10 sl=64 | p95 | 31,766 us | 25,853 us | 14,886 us | +22.9% | +113.4% |
| bs=10 sw=10 sl=64 | p99 | 31,766 us | 25,853 us | 17,580 us | +22.9% | +80.7% |
| bs=100 sw=10 sl=64 | throughput | 817 tuples/sec | 812 tuples/sec | 999.37 tuples/sec | +0.6% | -18.2% |
| bs=100 sw=10 sl=64 | MB/s | 0.499 MB/s | 0.496 MB/s | 0.61 MB/s | +0.6% | -18.2% |
| bs=100 sw=10 sl=64 | p50 | 120,635 us | 120,386 us | 99,687 us | +0.2% | +21.0% |
| bs=100 sw=10 sl=64 | p95 | 156,208 us | 149,966 us | 106,271 us | +4.2% | +47.0% |
| bs=100 sw=10 sl=64 | p99 | 156,208 us | 149,966 us | 115,445 us | +4.2% | +35.3% |
| bs=1000 sw=10 sl=64 | throughput | 912 tuples/sec | 924 tuples/sec | 1,036 tuples/sec | -1.3% | -12.0% |
| bs=1000 sw=10 sl=64 | MB/s | 0.557 MB/s | 0.564 MB/s | 0.632 MB/s | -1.2% | -11.9% |
| bs=1000 sw=10 sl=64 | p50 | 1,100,507 us | 1,073,848 us | 970,675 us | +2.5% | +13.4% |
| bs=1000 sw=10 sl=64 | p95 | 1,127,471 us | 1,137,266 us | 1,011,928 us | -0.9% | +11.4% |
| bs=1000 sw=10 sl=64 | p99 | 1,127,471 us | 1,137,266 us | 1,045,045 us | -0.9% | +7.9% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,486.52,200,128000,411,0.251,23766.55,31766.35,31766.35
1,100,10,64,20,2448.25,2000,1280000,817,0.499,120635.30,156207.52,156207.52
2,1000,10,64,20,21935.23,20000,12800000,912,0.557,1100507.11,1127470.78,1127470.78|
@yangzhang75 please rebase your commit history on this branch. right now it contains 4461 commits. I marked the PR into a draft for now. please reopen it when ready. |
967020a to
28c8df1
Compare
|
Sure, feel free to ping me when is ready for a review pass. thanks. |
|
By default, PR is ready for review. If not, author should mark it a draft |
|
Ok I am doing testing now |
|
It worked, thanks. |
Yicong-Huang
left a comment
There was a problem hiding this comment.
ok, approved for the change. however base64 causes the payload not readable by human. could potentially make future debugging harder. low priority now.
Thanks!
What changes were proposed in this PR?
Follow-up fix for #5597, which started passing the Python worker startup config as a
single JSON-string command-line argument. On Windows the JVM assembles argv into one
command line and the inner double quotes are stripped before Python receives them, so
json.loadsfails withJSONDecodeError: Expecting property name enclosed in double quotes.Linux/macOS are unaffected (the JVM passes argv directly, quotes survive).
This Base64-encodes the JSON on the JVM side (
encodeStartupConfig) and Base64-decodes itin
texera_run_python_worker.pybefore parsing. Base64 uses only[A-Za-z0-9+/=], so theargument carries no quotes or spaces and survives argv quoting on every platform. The 19-key
contract and all validation are unchanged.
Any related issues, documentation, discussions?
Closes #5916
How was this PR tested?
PythonWorkflowWorkerStartupConfigSpecupdated to decode Base64; added a regressiontest asserting the encoded output contains no quotes/whitespace. 5/5 pass; scalafmt clean.
test_run_python_worker.pyupdated to feed Base64 input; added a round-trip test.ruff check + format clean.
Base64.getEncoderandPython
base64.b64decode.Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)