Commit 08c0783
[SPARK-54617][PYTHON][SQL] Enable Arrow Grouped Iter Aggregate UDF registration for SQL
### What changes were proposed in this pull request?
This PR enables Arrow grouped iter aggregate UDFs to be registered and used in SQL queries. Previously, Arrow iter aggregate UDFs could be used only via the DataFrame API, not in SQL.
The main change adds `SQL_GROUPED_AGG_ARROW_ITER_UDF` to the allowed eval types in the `UDFRegistration.register()` method, along with test cases that exercise the new path.
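
As a rough illustration of what "adding an eval type to the allowed set" means, here is a minimal, self-contained sketch of an allow-list check. It is not Spark's actual code: the `EvalType` enum, `REGISTERABLE_EVAL_TYPES`, and `check_registerable` are hypothetical names invented for this example, and only the member name `SQL_GROUPED_AGG_ARROW_ITER_UDF` comes from the PR description.

```python
from enum import Enum, auto


class EvalType(Enum):
    # Hypothetical stand-ins for eval type constants; values are arbitrary.
    SQL_BATCHED_UDF = auto()
    SQL_GROUPED_AGG_ARROW_UDF = auto()
    SQL_GROUPED_AGG_ARROW_ITER_UDF = auto()
    SQL_GROUPED_MAP_PANDAS_UDF = auto()


# Eval types this sketch accepts for SQL registration (assumed, not Spark's list).
REGISTERABLE_EVAL_TYPES = {
    EvalType.SQL_BATCHED_UDF,
    EvalType.SQL_GROUPED_AGG_ARROW_UDF,
    EvalType.SQL_GROUPED_AGG_ARROW_ITER_UDF,  # the kind of entry this PR adds
}


def check_registerable(name: str, eval_type: EvalType) -> str:
    """Return the name if the eval type is on the allow-list, else raise."""
    if eval_type not in REGISTERABLE_EVAL_TYPES:
        raise ValueError(f"UDF {name!r} has unsupported eval type {eval_type.name}")
    return name
```

With a check of this shape, supporting a new UDF kind in SQL amounts to a one-line addition to the allow-list, which matches the small size of the non-test part of the diff.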
### Why are the changes needed?
Arrow iter aggregate UDFs provide a memory-efficient way to perform grouped aggregations: they consume each group's data as an iterator of batches rather than as one fully materialized array. Until now they could be used only via the DataFrame API, which prevented users from calling them in SQL-based workflows.
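
The memory argument can be sketched without Spark or Arrow. The example below uses plain Python lists as stand-ins for Arrow batches (an assumption made so the snippet runs anywhere); `streaming_mean` and `batch_source` are hypothetical names. The point is that only a running sum and count are kept, so no more than one batch needs to be in memory at a time.

```python
from typing import Iterable, Iterator, List, Sequence


def streaming_mean(batches: Iterable[Sequence[float]]) -> float:
    """Compute a mean over batches while keeping only a running sum and count."""
    total = 0.0
    count = 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    return total / count if count > 0 else 0.0


def batch_source() -> Iterator[List[float]]:
    # A generator yields one batch at a time; earlier batches can be discarded.
    yield [1.0, 2.0]
    yield [3.0]
    yield [4.0, 5.0, 6.0]


result = streaming_mean(batch_source())
print(result)  # 3.5
```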
### Does this PR introduce _any_ user-facing change?
Yes. Users can now register Arrow grouped iter aggregate UDFs and use them in SQL queries.
Example:
```python
from typing import Iterator

import pyarrow as pa
import pyarrow.compute as pc
from pyspark.sql.functions import arrow_udf


@arrow_udf("double")
def arrow_mean_iter(it: Iterator[pa.Array]) -> float:
    # Keep a running sum and count so only one batch is needed at a time.
    sum_val = 0.0
    cnt = 0
    for v in it:
        sum_val += pc.sum(v).as_py()
        cnt += len(v)
    return sum_val / cnt if cnt > 0 else 0.0


# Now this works:
spark.udf.register("arrow_mean_iter", arrow_mean_iter)
spark.sql("SELECT id, arrow_mean_iter(v) as mean FROM test_table GROUP BY id").show()
```
### How was this patch tested?
Added test cases covering:
- Single-column Arrow iter aggregate UDF in SQL
- Multiple-column Arrow iter aggregate UDF in SQL
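
To make the multi-column case concrete, here is a hedged sketch of the aggregation logic such a test might exercise. It assumes a multi-column UDF receives an iterator of tuples with one batch per column; that shape is an assumption, not something the PR description states. Plain lists stand in for Arrow arrays, and `streaming_weighted_mean` is a hypothetical name.

```python
from typing import Iterable, Sequence, Tuple

ColumnBatch = Sequence[float]


def streaming_weighted_mean(
    batches: Iterable[Tuple[ColumnBatch, ColumnBatch]]
) -> float:
    """Weighted mean over (values, weights) batch pairs, one pair at a time."""
    weighted_sum = 0.0
    weight_total = 0.0
    for values, weights in batches:
        weighted_sum += sum(v * w for v, w in zip(values, weights))
        weight_total += sum(weights)
    return weighted_sum / weight_total if weight_total else 0.0


# Two batches for one group: (values, weights) per batch.
group_batches = iter([([1.0, 2.0], [1.0, 1.0]), ([4.0], [2.0])])
result = streaming_weighted_mean(group_batches)
print(result)  # 2.75
```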
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53357 from Yicong-Huang/SPARK-54617/feat/arrow-iter-agg-udf-sql.
Authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
1 parent 3dffd12, commit 08c0783
File tree
4 files changed, +80 -3 lines changed
- python/pyspark/sql
- connect
- tests
- arrow
- pandas
Diff contents were not captured; only the per-file line ranges and change counts remain. File names are not recoverable from this extract.

| File | Original lines | New lines | Added | Removed |
|---|---|---|---|---|
| 1 | 295-308 | 295-310 | 3 | 1 |
| 2 | 1212-1217 | 1212-1289 | 72 | 0 |
| 3 | 212-218 | 212-219 | 2 | 1 |
| 4 | 681-694 | 681-696 | 3 | 1 |