
feat(pyamber): support Python UDF UI parameters #5603

Open
carloea2 wants to merge 11 commits into apache:main from carloea2:ui-parameter-backend-python

Conversation

@carloea2

@carloea2 carloea2 commented Jun 10, 2026

Contributor

What changes were proposed in this PR?

This PR adds Python runtime support for Python UDF UI parameters.

It introduces:

| Area | Change |
| --- | --- |
| PyTexera runtime API | Adds `self.UiParameter(...)` support on Python UDF operator base classes. |
| Runtime injection bridge | Adds the `_texera_injected_ui_parameters` base hook that the Scala injector overrides. |
| Typed parameter parsing | Converts injected UI parameter values into Python values using `AttributeType`. |
| PyTexera exports | Exports `Dict`, `Any`, and `AttributeType` through `from pytexera import *`, so generated code from the Scala injector loads correctly. |
| Attribute type compatibility | Adds Python enum aliases for `AttributeType.INTEGER` and `AttributeType.BOOLEAN`, matching the frontend parser’s accepted tokens. |
| Follow-up cleanup from PR #5141 | Removes the temporary generated-code comment that described this runtime PR as a future dependency. |
| Test coverage | Adds PyAmber tests for injected values, parsing, duplicate declarations, unsupported types, enum aliases, and instance-local state. |
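As an illustration of the "Typed parameter parsing" and "Attribute type compatibility" rows, here is a minimal sketch of how enum aliases and type-driven parsing can fit together. It is not the code in this PR: the enum below is a stand-in, the member names `INT` and `BOOL`, the numeric values, and the helper `parse_ui_value` are all assumptions made for the example.

```python
from enum import Enum
from typing import Any


class AttributeType(Enum):
    # Hypothetical stand-in for the real enum; names and values are illustrative.
    STRING = 1
    INT = 2
    LONG = 3
    BOOL = 4
    DOUBLE = 5
    BINARY = 6
    # Reusing an existing value makes the new name an alias of that member.
    INTEGER = 2
    BOOLEAN = 4


def parse_ui_value(raw: str, attr_type: AttributeType) -> Any:
    """Convert an injected string into a Python value (illustrative rules only)."""
    if attr_type is AttributeType.STRING:
        return raw
    if attr_type in (AttributeType.INT, AttributeType.LONG):
        return int(raw)
    if attr_type is AttributeType.DOUBLE:
        return float(raw)
    if attr_type is AttributeType.BOOL:
        return raw.strip().lower() == "true"
    # Anything else is rejected explicitly instead of being passed through.
    raise ValueError(f"unsupported UI parameter type: {attr_type.name}")


# Aliases resolve to the same member, so one parsing branch covers both names.
print(AttributeType.INTEGER is AttributeType.INT)
print(parse_ui_value("42", AttributeType.INTEGER))
print(parse_ui_value("True", AttributeType.BOOLEAN))
```

Because an `Enum` alias is the same object as the member it aliases, the parser needs no extra branches for the frontend's `INTEGER` and `BOOLEAN` tokens.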

This PR is stacked after the merged frontend foundation PR #5043 and Scala backend injection PR #5141. It does not wire UI parameters into operator execution end to end; that integration is handled by the next PR in the stack.
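To make the relationship between the generated code from #5141 and this runtime PR more concrete, the sketch below shows one way a base hook and an overriding subclass could interact. Only the names `UiParameter` and `_texera_injected_ui_parameters` come from the description above; the class names, the `UiParameter(name, default)` signature, and the duplicate-declaration check are assumptions for illustration, not the actual PyTexera API.

```python
from typing import Any, Dict


class UDFOperatorSketch:
    """Hypothetical stand-in for a PyTexera UDF operator base class."""

    def __init__(self) -> None:
        # Instance-local registry, so two operator instances never share state.
        self._declared: Dict[str, Any] = {}

    def _texera_injected_ui_parameters(self) -> Dict[str, Any]:
        # Base hook: returns nothing unless generated code overrides it.
        return {}

    def UiParameter(self, name: str, default: Any = None) -> Any:
        # Assumed signature; the real method may also take a type argument.
        if name in self._declared:
            raise ValueError(f"UI parameter {name!r} declared twice")
        value = self._texera_injected_ui_parameters().get(name, default)
        self._declared[name] = value
        return value


class GeneratedOperator(UDFOperatorSketch):
    """Stands in for user code after the injector has added its override."""

    def _texera_injected_ui_parameters(self) -> Dict[str, Any]:
        return {"threshold": 0.5}

    def open(self) -> None:
        self.threshold = self.UiParameter("threshold", default=0.1)
        self.label = self.UiParameter("label", default="n/a")


op = GeneratedOperator()
op.open()
print(op.threshold, op.label)
```

Keeping the hook as an ordinary overridable method means user code runs unchanged when nothing is injected, since the base implementation simply returns an empty mapping.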

Any related issues, documentation, discussions?

Part of the Python UDF UI parameter feature split from feat/ui-parameter.

Related tracking issue / stack: #5044

Stack order:

  1. Frontend UI parameter building blocks: feat(frontend): add Python UDF UI parameter form support #5043
  2. Scala backend injection model: feat(workflow-operator): add Python UDF UI parameter injection model #5141
  3. Python runtime support: this PR
  4. End-to-end integration

How was this PR tested?

Commands run:

cd amber
ruff check src/main/python/core/models/schema/attribute_type.py src/main/python/pytexera/udf/udf_operator.py src/test/python/pytexera/udf/test_udf_operator.py
pytest src/test/python/pytexera/udf/test_udf_operator.py -q
pytest src/test/python/pytexera/udf -q

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Me

@carloea2

Contributor Author

@Xiao-zhen-Liu could you review it?

@codecov-commenter

codecov-commenter commented Jun 10, 2026


Codecov Report

❌ Patch coverage is 97.52066% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.49%. Comparing base (a0154d5) to head (d54fb6c).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| amber/src/main/python/pytexera/udf/udf_operator.py | 96.80% | 3 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5603      +/-   ##
============================================
+ Coverage     54.42%   54.49%   +0.07%     
+ Complexity     2896     2895       -1     
============================================
  Files          1107     1103       -4     
  Lines         42768    42768              
  Branches       4599     4586      -13     
============================================
+ Hits          23277    23307      +30     
+ Misses        18134    18104      -30     
  Partials       1357     1357              
| Flag | Coverage Δ | *Carryforward flag |
| --- | --- | --- |
| access-control-service | 70.44% <ø> (ø) | |
| agent-service | 34.36% <ø> (ø) | Carriedforward from 5882434 |
| amber | 56.34% <100.00%> (ø) | |
| computing-unit-managing-service | 1.65% <ø> (ø) | |
| config-service | 57.35% <ø> (ø) | |
| file-service | 58.59% <ø> (ø) | |
| frontend | 48.08% <ø> (-0.17%) ⬇️ | Carriedforward from 5882434 |
| pyamber | 90.34% <97.50%> (+0.14%) ⬆️ | |
| python | 90.75% <ø> (-0.02%) ⬇️ | Carriedforward from 5882434 |
| workflow-compiling-service | 58.69% <ø> (ø) | |

*This pull request uses carry forward flags.


@Xiao-zhen-Liu Xiao-zhen-Liu self-requested a review June 10, 2026 19:11

@Xiao-zhen-Liu Xiao-zhen-Liu left a comment

Contributor


Thanks for splitting this up — it's easy to follow and the approach is good: the way the generated code plugs in is clean, the parsing is a standalone function that's easy to test, and the INTEGER/BOOLEAN aliases behave correctly. My one real ask before merging is the timestamp fix (inline); the rest are smaller readability and test notes.

One note on the series: the earlier PR (#5141) that writes the generated hook merged before this one that defines it — only safe because nothing uses the feature yet, so worth doing in the other order from here on.

Comment thread amber/src/main/python/core/models/schema/attribute_type.py Outdated
Comment thread amber/src/main/python/core/models/schema/attribute_type.py Outdated
Comment thread amber/src/main/python/core/models/schema/attribute_type.py Outdated
Comment thread amber/src/main/python/core/models/schema/attribute_type.py Outdated
Comment thread amber/src/main/python/core/models/schema/attribute_type.py
Comment thread amber/src/main/python/pytexera/udf/udf_operator.py Outdated
Comment thread amber/src/main/python/pytexera/udf/udf_operator.py
Comment thread amber/src/main/python/pytexera/udf/udf_operator.py
Comment thread amber/src/main/python/pytexera/__init__.py Outdated
Comment thread amber/src/test/python/pytexera/udf/test_udf_operator.py
@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 4 worse · ⚪ 9 noise (<±5%) · 0 without baseline

Compared against main a0154d5 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

| config | throughput (tuples/sec) | MB/s | latency p50/p95/p99 | max Δ latest / 7d |
| --- | --- | --- | --- | --- |
| 🔴 bs=10 sw=10 sl=64 | 387 | 0.236 | 23,794/42,237/42,237 us | 🔴 +30.6% / 🔴 +20.7% |
| 🟢 bs=100 sw=10 sl=64 | 794 | 0.484 | 125,294/148,648/148,648 us | 🟢 -17.3% / 🔴 +11.6% |
| bs=1000 sw=10 sl=64 | 917 | 0.56 | 1,087,918/1,140,176/1,140,176 us | ⚪ within ±5% / 🔴 -11.9% |
Baseline details

Latest main a0154d5 from same runner

| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
| --- | --- | --- | --- | --- | --- | --- |
| bs=10 sw=10 sl=64 | throughput | 387 tuples/sec | 421 tuples/sec | 410.82 tuples/sec | -8.1% | -5.8% |
| bs=10 sw=10 sl=64 | MB/s | 0.236 MB/s | 0.257 MB/s | 0.251 MB/s | -8.2% | -5.9% |
| bs=10 sw=10 sl=64 | p50 | 23,794 us | 23,976 us | 23,785 us | -0.8% | +0.0% |
| bs=10 sw=10 sl=64 | p95 | 42,237 us | 32,333 us | 34,980 us | +30.6% | +20.7% |
| bs=10 sw=10 sl=64 | p99 | 42,237 us | 32,333 us | 34,980 us | +30.6% | +20.7% |
| bs=100 sw=10 sl=64 | throughput | 794 tuples/sec | 803 tuples/sec | 891.94 tuples/sec | -1.1% | -11.0% |
| bs=100 sw=10 sl=64 | MB/s | 0.484 MB/s | 0.49 MB/s | 0.544 MB/s | -1.2% | -11.1% |
| bs=100 sw=10 sl=64 | p50 | 125,294 us | 120,334 us | 112,277 us | +4.1% | +11.6% |
| bs=100 sw=10 sl=64 | p95 | 148,648 us | 179,735 us | 139,802 us | -17.3% | +6.3% |
| bs=100 sw=10 sl=64 | p99 | 148,648 us | 179,735 us | 139,802 us | -17.3% | +6.3% |
| bs=1000 sw=10 sl=64 | throughput | 917 tuples/sec | 911 tuples/sec | 1,041 tuples/sec | +0.7% | -11.9% |
| bs=1000 sw=10 sl=64 | MB/s | 0.56 MB/s | 0.556 MB/s | 0.635 MB/s | +0.7% | -11.9% |
| bs=1000 sw=10 sl=64 | p50 | 1,087,918 us | 1,097,734 us | 972,714 us | -0.9% | +11.8% |
| bs=1000 sw=10 sl=64 | p95 | 1,140,176 us | 1,132,612 us | 1,023,057 us | +0.7% | +11.4% |
| bs=1000 sw=10 sl=64 | p99 | 1,140,176 us | 1,132,612 us | 1,023,057 us | +0.7% | +11.4% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,516.65,200,128000,387,0.236,23794.23,42236.60,42236.60
1,100,10,64,20,2519.99,2000,1280000,794,0.484,125293.62,148648.49,148648.49
2,1000,10,64,20,21812.52,20000,12800000,917,0.560,1087917.60,1140176.15,1140176.15
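For readers who want to check the benchmark figures, the sketch below recomputes throughput, MB/s, and the percentage deltas from the raw CSV. The formulas are inferred from the reported numbers rather than taken from the benchmark script: in particular, dividing bytes by 1024 * 1024 is an assumption that happens to reproduce the published MB/s values, and the helper names are hypothetical.

```python
import csv
import io

RAW_CSV = """
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,516.65,200,128000,387,0.236,23794.23,42236.60,42236.60
1,100,10,64,20,2519.99,2000,1280000,794,0.484,125293.62,148648.49,148648.49
2,1000,10,64,20,21812.52,20000,12800000,917,0.560,1087917.60,1140176.15,1140176.15
""".strip()

ROWS = list(csv.DictReader(io.StringIO(RAW_CSV)))


def derive_metrics(row):
    """Return (tuples per second, MiB per second) from the raw totals."""
    seconds = float(row["total_ms"]) / 1000.0
    tuples_per_sec = int(row["total_tuples"]) / seconds
    # Assumption: "MB" is 1024 * 1024 bytes; this matches the reported column.
    mib_per_sec = int(row["total_bytes"]) / seconds / (1024 * 1024)
    return tuples_per_sec, mib_per_sec


def pct_delta(pr_value, baseline):
    """Relative change of the PR value against a baseline, in percent."""
    return (pr_value - baseline) / baseline * 100.0


for row in ROWS:
    tps, mibs = derive_metrics(row)
    print(f"bs={row['batch_size']}: {tps:.0f} tuples/sec, {mibs:.3f} MB/s")

# p95 for bs=10: PR value from the CSV, baseline from the "latest main" column.
print(f"{pct_delta(42236.60, 32333):+.1f}%")
```

Each configuration runs only 20 batches, so p95 and p99 land on the same sample, which is why those two columns are identical in every row.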

@carloea2

Contributor Author

@Xiao-zhen-Liu It is ready for your next pass, thanks.

@chenlica

Contributor

@Xiao-zhen-Liu Please continue the review when you get a chance to unblock this PR.

@Xiao-zhen-Liu Xiao-zhen-Liu left a comment

Contributor


LGTM — all the round-1 points are addressed, and I checked the timestamp fix and the apply-once open() logic on Python 3.10. Test coverage is solid.

One optional note (not blocking): a timestamp without Z parses with no timezone, while a Z value and the empty default are timezone-aware, so the two can't be compared directly. The frontend likely sends one consistent format, so feel free to leave it.
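The mismatch described in this note is standard `datetime` behavior and can be reproduced without any Texera code. The timestamp strings below are made-up sample values, and `can_order` is a helper written only for this example.

```python
from datetime import datetime, timezone

# A value without an offset parses as naive; one with an offset parses as aware.
naive = datetime.fromisoformat("2026-06-10T12:00:00")
aware = datetime.fromisoformat("2026-06-10T12:00:00+00:00")


def can_order(a: datetime, b: datetime) -> bool:
    """Report whether `a < b` can be evaluated without raising TypeError."""
    try:
        a < b
        return True
    except TypeError:
        return False


print(naive.tzinfo, aware.tzinfo)
print(can_order(naive, aware))  # ordering naive vs. aware raises TypeError
print(naive == aware)           # equality does not raise, it is simply False
print(can_order(naive.replace(tzinfo=timezone.utc), aware))
```

Ordering comparisons fail loudly, but equality silently evaluates to `False`, so a mixed pair can also hide a logic bug rather than raise.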

@github-actions

Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @kunwp1, @aglinxinyuan
    You can notify them by mentioning @kunwp1, @aglinxinyuan in a comment.

@carloea2

Contributor Author

Addressed. Timestamp parsing now normalizes offset-less values to UTC-aware datetimes, so plain ISO strings, Z strings, and the empty default are comparable. Explicit offsets are preserved.

If you agree, let's merge, and I will then open PR 4 in the stack (end-to-end integration).
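The normalization described in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not the code in the PR: the function name `parse_timestamp_utc` is hypothetical, and the handling of the empty default is left out because the thread does not say what it maps to.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp_utc(raw: str) -> datetime:
    """Parse an ISO 8601 string into a timezone-aware datetime."""
    text = raw.strip()
    if text.endswith(("Z", "z")):
        # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
        # so rewrite it as an explicit UTC offset for older interpreters.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Offset-less input is interpreted as UTC rather than left naive.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


plain = parse_timestamp_utc("2026-06-10T12:00:00")
zulu = parse_timestamp_utc("2026-06-10T12:00:00Z")
offset = parse_timestamp_utc("2026-06-10T12:00:00+02:00")

print(plain == zulu)                                  # both are UTC-aware now
print(offset.utcoffset() == timedelta(hours=2))       # explicit offset is kept
print(offset < plain)                                 # aware values can be ordered
```

Treating offset-less input as UTC is a policy choice; it makes every parsed value comparable, at the cost of assuming the sender meant UTC.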


4 participants