feat(pyamber): support Python UDF UI parameters#5603
Conversation
|
@Xiao-zhen-Liu could you review it? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5603 +/- ##
============================================
+ Coverage 54.42% 54.49% +0.07%
+ Complexity 2896 2895 -1
============================================
Files 1107 1103 -4
Lines 42768 42768
Branches 4599 4586 -13
============================================
+ Hits 23277 23307 +30
+ Misses 18134 18104 -30
Partials 1357 1357
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Xiao-zhen-Liu
left a comment
There was a problem hiding this comment.
Thanks for splitting this up — it's easy to follow and the approach is good: the way the generated code plugs in is clean, the parsing is a standalone function that's easy to test, and the INTEGER/BOOLEAN aliases behave correctly. My one real ask before merging is the timestamp fix (inline); the rest are smaller readability and test notes.
One note on the series: the earlier PR (#5141) that writes the generated hook merged before this one that defines it — only safe because nothing uses the feature yet, so worth doing in the other order from here on.
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 387 | 0.236 | 23,794/42,237/42,237 us | 🔴 +30.6% / 🔴 +20.7% |
| 🟢 | bs=100 sw=10 sl=64 | 794 | 0.484 | 125,294/148,648/148,648 us | 🟢 -17.3% / 🔴 +11.6% |
| ⚪ | bs=1000 sw=10 sl=64 | 917 | 0.56 | 1,087,918/1,140,176/1,140,176 us | ⚪ within ±5% / 🔴 -11.9% |
Baseline details
Latest main a0154d5 from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 387 tuples/sec | 421 tuples/sec | 410.82 tuples/sec | -8.1% | -5.8% |
| bs=10 sw=10 sl=64 | MB/s | 0.236 MB/s | 0.257 MB/s | 0.251 MB/s | -8.2% | -5.9% |
| bs=10 sw=10 sl=64 | p50 | 23,794 us | 23,976 us | 23,785 us | -0.8% | +0.0% |
| bs=10 sw=10 sl=64 | p95 | 42,237 us | 32,333 us | 34,980 us | +30.6% | +20.7% |
| bs=10 sw=10 sl=64 | p99 | 42,237 us | 32,333 us | 34,980 us | +30.6% | +20.7% |
| bs=100 sw=10 sl=64 | throughput | 794 tuples/sec | 803 tuples/sec | 891.94 tuples/sec | -1.1% | -11.0% |
| bs=100 sw=10 sl=64 | MB/s | 0.484 MB/s | 0.49 MB/s | 0.544 MB/s | -1.2% | -11.1% |
| bs=100 sw=10 sl=64 | p50 | 125,294 us | 120,334 us | 112,277 us | +4.1% | +11.6% |
| bs=100 sw=10 sl=64 | p95 | 148,648 us | 179,735 us | 139,802 us | -17.3% | +6.3% |
| bs=100 sw=10 sl=64 | p99 | 148,648 us | 179,735 us | 139,802 us | -17.3% | +6.3% |
| bs=1000 sw=10 sl=64 | throughput | 917 tuples/sec | 911 tuples/sec | 1,041 tuples/sec | +0.7% | -11.9% |
| bs=1000 sw=10 sl=64 | MB/s | 0.56 MB/s | 0.556 MB/s | 0.635 MB/s | +0.7% | -11.9% |
| bs=1000 sw=10 sl=64 | p50 | 1,087,918 us | 1,097,734 us | 972,714 us | -0.9% | +11.8% |
| bs=1000 sw=10 sl=64 | p95 | 1,140,176 us | 1,132,612 us | 1,023,057 us | +0.7% | +11.4% |
| bs=1000 sw=10 sl=64 | p99 | 1,140,176 us | 1,132,612 us | 1,023,057 us | +0.7% | +11.4% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,516.65,200,128000,387,0.236,23794.23,42236.60,42236.60
1,100,10,64,20,2519.99,2000,1280000,794,0.484,125293.62,148648.49,148648.49
2,1000,10,64,20,21812.52,20000,12800000,917,0.560,1087917.60,1140176.15,1140176.15|
@Xiao-zhen-Liu It is ready for your next pass, thanks. |
|
@Xiao-zhen-Liu Please continue the review when you get a chance to unblock this PR. |
Xiao-zhen-Liu
left a comment
There was a problem hiding this comment.
LGTM — all the round-1 points are addressed, and I checked the timestamp fix and the apply-once open() logic on Python 3.10. Test coverage is solid.
One optional note (not blocking): a timestamp without Z parses with no timezone, while a Z value and the empty default are timezone-aware, so the two can't be compared directly. The frontend likely sends one consistent format, so feel free to leave it.
Automated Reviewer SuggestionsBased on the
|
|
Addressed. Timestamp parsing now normalizes offset-less values to UTC-aware datetimes, so plain ISO strings, If you agree, let's merge and I will raise the #4. |
What changes were proposed in this PR?
This PR adds Python runtime support for Python UDF UI parameters.
It introduces:
self.UiParameter(...)support on Python UDF operator base classes._texera_injected_ui_parametersbase hook that the Scala injector overrides.AttributeType.Dict,Any, andAttributeTypethroughfrom pytexera import *, so generated code from the Scala injector loads correctly.AttributeType.INTEGERandAttributeType.BOOLEAN, matching the frontend parser’s accepted tokens.This PR is stacked after the merged frontend foundation PR #5043 and Scala backend injection PR #5141. It does not wire UI parameters into operator execution end to end; that integration is handled by the next PR in the stack.
Any related issues, documentation, discussions?
Part of the Python UDF UI parameter feature split from
feat/ui-parameter.Related tracking issue / stack: #5044
Stack order:
How was this PR tested?
Commands run:
cd amber ruff check src/main/python/core/models/schema/attribute_type.py src/main/python/pytexera/udf/udf_operator.py src/test/python/pytexera/udf/test_udf_operator.py pytest src/test/python/pytexera/udf/test_udf_operator.py -q pytest src/test/python/pytexera/udf -qWas this PR authored or co-authored using generative AI tooling?
Generated-by: Me