
Add changes for compatibility with WASM components and collocated UDF servers#121

Open
kesmit13 wants to merge 30 commits into main from wasm-compat

Conversation


kesmit13 commented Apr 1, 2026

This PR makes several changes to allow singlestoredb to work in WASM environments. Many of these changes benefit standard installations as well, such as lazy loading of numpy, pandas, polars, and pyarrow. Others defer imports that are only needed in certain environments and are unavailable in WASM.

A new collocated UDF server implementation is also included; it uses a high-performance loop in the C extension to parse rows and call Python functions on each one. This loop is used both by standard collocated servers and by WASM-based UDF handlers.


Note

High Risk
High risk because it introduces a new UDF server runtime (sockets, threading/process forking, dynamic registration) and makes large changes to the C extension’s serialization/execution hot path and type handling (DECIMAL/date/time/datetime).

Overview
Adds a new plugin-mode external-function server (functions/ext/plugin) that serves UDFs over a Unix socket via a CLI (python-udf-server), including a control protocol (@@health, @@functions, @@register) and thread or pre-fork process pools with live registry refresh.

Extends the C accelerator (accel.c) and rowdat_1/JSON/ASGI UDF plumbing to support DECIMAL and date/time/datetime types end-to-end, hardens buffer/length handling, and adds new fast-path helpers (call_function_accel, mmap_read, mmap_write, recv_exact) used by the plugin server.

Improves WASM/optional dependency compatibility by lazy-importing heavy libs (numpy/pandas/polars/pyarrow), moving jwt imports behind call sites, broadening MySQL default-user fallback for WASM, updating _iquery to normalize more result types (pandas/polars/arrow/numpy), and documenting the new plugin server and env vars.

Reviewed by Cursor Bugbot for commit 3043c59.

kesmit13 and others added 12 commits March 19, 2026 15:38
Defer top-level `import jwt` to function scope in auth.py,
management/manager.py, and management/utils.py (jwt unavailable in WASM).
Catch OSError in mysql/connection.py getpass handling (pwd module
unavailable in WASM). Broaden except clause for IPython import in
utils/events.py.

Add singlestoredb/functions/ext/wasm/ package with udf_handler.py and
numpy_stub.py so componentize-py components can `pip install` this
branch and import directly from singlestoredb.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required by componentize-py to build function-handler components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Build complete @udf-decorated Python functions from signature metadata
and raw function body instead of requiring full source code. This adds
dtype-to-Python type mapping and constructs properly annotated functions
at registration time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Heavy optional dependencies (numpy, pandas, polars, pyarrow) were
imported at module load time, causing failures in WASM environments
where these packages may not be available. This adds a lazy import
utility module and converts all eager try/except import patterns to
use cached lazy accessors. Type maps in dtypes.py are also converted
from module-level dicts to lru_cached factory functions. The pandas
DataFrame isinstance check in connection.py is replaced with a
duck-type hasattr check to avoid importing pandas at module scope.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace `str | None` with `Optional[str]` to maintain compatibility
with Python 3.9 and earlier.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the call_function_accel function directly to accel.c, implementing
a combined load/call/dump operation for UDF function calls. This function
handles rowdat_1 deserialization, Python UDF invocation, and result
serialization in a single optimized C implementation.

Previously this function was injected at build time via a patch script
in the wasm-udf-server repository. Moving it into the source tree is a
prerequisite for cleaning up the custom componentize-py builder and
simplifying the WASM component build process.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add resources/build_wasm.sh that cross-compiles the package as a WASM
wheel targeting wasm32-wasip2. The script sets up a host venv, configures
the WASI SDK toolchain (clang, ar, linker flags), and uses `python -m
build` to produce the wheel, then unpacks it into build/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
numpy is lazy-loaded throughout the codebase via the _lazy_import
helpers, so the WASM numpy_stub that patched sys.modules['numpy']
is no longer needed. Delete the stub module and remove its
references from udf_handler.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a standalone collocated UDF server package that can run as a
drop-in replacement for the Rust wasm-udf-server. Uses pre-fork
worker processes (default) for true CPU parallelism, avoiding GIL
contention in the C-accelerated call path. Thread pool mode is
available via --process-mode thread.

Collapse the wasm subpackage into a single wasm.py module since it
only contained one class re-exported through __init__.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each forked worker previously created its own independent SharedRegistry
and FunctionRegistry. When @@register arrived at a worker, only that
worker's local registry was updated — the main process and sibling
workers never learned about the new function.

Add Unix pipe-based IPC (matching the R UDF server fix): each worker
gets a pipe back to the main process. When a worker handles @@register,
it writes the registration payload to its pipe. The main process reads
it via select.poll(), applies the registration to its own SharedRegistry,
then kills and re-forks all workers so they inherit the updated state.

Thread mode is unaffected — pipe_write_fd is None and the pipe write
is a no-op.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
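The pipe-based registration IPC described above can be sketched like this (function names are illustrative; the real server also re-forks workers after applying the payload):

```python
import os
import select


def worker_send_registration(write_fd, payload: bytes) -> None:
    """Worker side: forward an @@register payload to the main process."""
    if write_fd is not None:       # mirrors thread mode, where the fd is None
        os.write(write_fd, payload)


def main_poll_registrations(read_fds, timeout_ms: int = 100):
    """Main side: poll all worker pipes and collect pending payloads."""
    poller = select.poll()
    for fd in read_fds:
        poller.register(fd, select.POLLIN)
    payloads = []
    for fd, event in poller.poll(timeout_ms):
        if event & select.POLLIN:
            payloads.append(os.read(fd, 65536))
    return payloads
```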
Add poll()-based timeout to C recv_exact to avoid the interaction between
Python's settimeout() (which sets O_NONBLOCK on the fd) and direct
fd-level recv() in the C code. When the fd was non-blocking, recv()
returned EAGAIN immediately when no data was available, which the C code
treated as an error, closing the connection and causing EPIPE on the
client side.

- accel.c: Add optional timeout_ms parameter to recv_exact that uses
  poll(POLLIN) before each recv() call, raising TimeoutError on timeout.
  Also add mmap_read and mmap_write C helpers for fd-level I/O.
- connection.py: Only call settimeout() for the Python fallback path;
  keep fd blocking for C accel path. Pass 100ms timeout to C recv_exact.
  Catch TimeoutError instead of socket.timeout. Replace select() loop
  with timeout-based recv. Add C accel paths for mmap read/write.
  Add optional per-request profiling via SINGLESTOREDB_UDF_PROFILE=1.
- registry.py: Consolidate accel imports (mmap_read, mmap_write,
  recv_exact) under single _has_accel flag.
- wasm.py: Update to use renamed _has_accel flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
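A pure-Python analogue of the patched `recv_exact` looks roughly like this (the C version polls the raw fd with `poll(POLLIN)`; `select` stands in for it here):

```python
import select
import socket


def recv_exact(sock: socket.socket, n: int, timeout_ms: int = 100) -> bytes:
    """Read exactly n bytes, waiting for readability before each recv."""
    buf = bytearray()
    while len(buf) < n:
        ready, _, _ = select.select([sock], [], [], timeout_ms / 1000.0)
        if not ready:
            raise TimeoutError(f'no data within {timeout_ms} ms')
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError('connection closed before full message')
        buf.extend(chunk)
    return bytes(buf)
```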

Copilot AI left a comment


Pull request overview

This PR introduces WASM-compatibility improvements (primarily by lazy-loading heavyweight optional dependencies and moving environment-specific imports into call sites) and adds a new collocated Python UDF server implementation, including a new C-extension hot path to accelerate rowdat_1 decode → Python call → rowdat_1 encode.

Changes:

  • Added a WIT interface definition and a WASM build helper script for external UDF component workflows.
  • Refactored optional dependency handling (numpy/pandas/polars/pyarrow, IPython, JWT) to be more robust in constrained/WASM-like environments.
  • Added a new collocated UDF server (socket + mmap protocol, thread/process modes, dynamic registration) and a C-extension accelerator entry point (call_function_accel).

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.

Summary per file:

wit/udf.wit: Defines the external UDF WIT interface and exported world.
singlestoredb/utils/_lazy_import.py: Adds cached lazy imports for heavy optional deps.
singlestoredb/utils/dtypes.py: Converts dtype maps to lazily-evaluated, cached getters.
singlestoredb/utils/results.py: Switches result formatting to lazy imports + cached type maps.
singlestoredb/utils/events.py: Broadens IPython import failure handling.
singlestoredb/converters.py: Uses lazy numpy import in vector converters.
singlestoredb/connection.py: Adjusts internal result-to-dict conversion to avoid importing pandas.
singlestoredb/mysql/connection.py: Adds WASM-friendly DEFAULT_USER detection (handles OSError).
singlestoredb/auth.py: Moves jwt import into call site.
singlestoredb/management/utils.py: Moves jwt imports into call sites for WASM-friendliness.
singlestoredb/management/manager.py: Moves jwt import into is_jwt call site.
singlestoredb/functions/dtypes.py: Updates exports to use dtype-map getter functions.
singlestoredb/functions/ext/rowdat_1.py: Replaces eager dtype maps with lazy getter functions.
singlestoredb/functions/ext/json.py: Replaces eager dtype maps with lazy getter functions.
singlestoredb/functions/ext/collocated/*: Adds collocated server, protocol handling, registry, control signals, and WASM adapter.
singlestoredb/tests/test_connection.py: Makes pandas string dtype assertions version-tolerant.
resources/build_wasm.sh: Adds a build helper for wasm32-wasip2 wheels.
pyproject.toml: Adds python-udf-server CLI entry point.
accel.c: Adds call_function_accel C hot path and exports it from the extension module.


When numpy is not available (e.g., WASM), the `np` name is undefined.
The has_numpy flag was already used elsewhere but this check was missed
when the numpy_stub was removed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The mmap_read, mmap_write, and recv_exact functions use poll.h,
sys/mman.h, and sys/socket.h which are unavailable in WASI. Wrap
these includes, function bodies, and PyMethodDef entries with
#ifndef __wasi__ guards so the C extension compiles for wasm32-wasip2.
The core call_function_accel optimization remains available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without this, the accel status log messages ("Using accelerated C
call_function_accel loop" / "Using pure Python call_function loop")
are silently dropped because no logging handler is configured in the
WASM handler path. setup_logging() was only called from __main__.py
(collocated server CLI).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _singlestoredb_accel C extension ifdef'd out the mmap and socket
functions for __wasi__ builds, but registry.py imports all four symbols
(call_function_accel, mmap_read, mmap_write, recv_exact) in a single
try block. The missing exports caused the entire import to fail,
silently falling back to the pure Python call_function loop.

Add #else stubs that raise NotImplementedError if called, so the
symbols are importable and call_function_accel works in WASM. Also
capture the accel import error and log it in initialize() for future
diagnostics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 11 comments.



accel.c:
- Replace empty TODO type stubs with NotImplementedError raises
- Add CHECK_REMAINING macro for bounds checking on buffer reads
- Replace unaligned pointer-cast reads with memcpy for WASM/ARM safety
- Fix double-decref in output error paths (set to NULL before goto)
- Fix Py_None reference leak by removing pre-switch INCREF
- Fix MYSQL_TYPE_NULL consuming an extra byte from next column
- Add PyErr_Format in default switch cases
- Add PyErr_Occurred() checks after PyLong/PyFloat conversions

Python:
- Align list/tuple multi-return handling in registry.py with C path
- Add _write_all_fd helper for partial os.write() handling
- Harden handshake recvmsg: name length bound, ancdata validation,
  MSG_CTRUNC check, FD cleanup on error
- Wrap get_context('fork') with platform safety error
- Narrow events.py exception catch to (ImportError, OSError)
- Fix _iquery DataFrame check ordering (check before list())
- Expand setblocking(False) warning comment
- Update WIT and wasm.py docstrings for code parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The MYSQL_TYPE_NULL case in call_function_rowdat_1 was missing the
`data += 1` advancement after reading the null flag byte. This caused
data stream desynchronization for all subsequent columns and rows,
leading to data corruption or crashes. The fix aligns with the existing
behavior in load_rowdat_1 and load_rowdat_1_numpy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add PyErr_Occurred() checks after PyLong_AsLong calls for colspec type
and return type parsing in call_function_accel. Without these checks,
a non-integer value would cause PyLong_AsLong to return -1 and set the
error indicator, but the code would silently continue with -1 as the
type code. This matches the existing pattern used in the other accel
functions (call_function_rowdat_1, call_function_rowdat_1_to_pylist).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement full serialization/deserialization for DECIMAL, DATETIME, DATE,
and TIME types in the C accelerator (accel.c), pure-Python rowdat_1 codec,
ASGI handler, and collocated registry. Previously these types raised
"unsupported" errors or were stubbed with TODO comments.

Key changes:
- accel.c: Implement load/dump for DECIMAL (length-prefixed UTF-8 string),
  DATE (packed YYYYMMDD int64), TIME (signed HHMMSSuuuuuu int64), and
  DATETIME/TIMESTAMP (bit-packed int64) in all three code paths
  (load_rowdat_1_numpy, load_rowdat_1, call_function_accel). Add interned
  PyStr attribute strings for datetime component access.
- rowdat_1.py: Add pack/unpack helpers and wire them into _load, _load_vectors,
  _dump, and _dump_vectors for the four new type families.
- asgi.py, collocated/registry.py: Add datetime/date/time/decimal entries to
  rowdat_1_type_map so the ASGI and collocated paths can negotiate these types.
- registry.py: Add _dtype_to_python mappings, emit datetime/decimal imports in
  generated code, convert silent skip-on-bad-dtype to TypeError for visibility.
- signature.py: Add 'decimal' to sql_type_map, DECIMAL/NUMERIC to
  sql_to_type_map, and normalize_dtype() mappings for decimal.Decimal and
  datetime.time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ft.LONG, -ft.LONG, ft.LONGLONG, -ft.LONGLONG,
])
- string_types = set([15, 245, 247, 248, 249, 250, 251, 252, 253, 254])
+ string_types = set([15, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254])


NEWDECIMAL added to string_types shadows decimal handler

High Severity

Value 246 (ft.NEWDECIMAL) was added to string_types, but it's also in the new decimal_types set. Since string_types is checked before decimal_types in the if-elif chains of _load, _load_vectors, _dump, and _dump_vectors, the decimal_types handler is unreachable for NEWDECIMAL. This causes NEWDECIMAL values to be returned as plain strings instead of decimal.Decimal on load, and on dump it will call .encode('utf-8') on a Decimal object, raising an AttributeError.


Reviewed by Cursor Bugbot for commit a7fb74d.
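The ordering bug described above comes down to checking the more specific set first; once 246 appears in both sets, a classifier must test `decimal_types` before `string_types` (or drop 246 from `string_types`). A sketch with illustrative sets and function name, not the PR's exact code:

```python
# ft.NEWDECIMAL is 246; its negation marks the binary variant
decimal_types = {246, -246}
string_types = {15, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254}


def classify(type_code: int) -> str:
    """Test decimal_types first so NEWDECIMAL never falls into the string branch."""
    if type_code in decimal_types:
        return 'decimal'
    if type_code in string_types:
        return 'string'
    return 'other'
```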

if (!py_dec) goto error;
u64 = (uint64_t)py_dec;
memcpy(out_cols[i] + j * 8, &u64, 8);
CHECKRC(PyDict_SetItem(py_objs, PyLong_FromUnsignedLongLong(u64), py_dec));


PyLong key leaked in PyDict_SetItem calls

Medium Severity

PyLong_FromUnsignedLongLong(u64) returns a new reference, but PyDict_SetItem does not steal references — it increments the refcount of both key and value. The temporary PyLong key is never Py_DECREF'd, leaking memory on every non-null DECIMAL, DATE, TIME, and DATETIME/TIMESTAMP value processed through the numpy loading path (load_rowdat_1_numpy).


Reviewed by Cursor Bugbot for commit a7fb74d.

Defer IPython import and handler registration from module load time to
first subscribe() call, preventing IPython from being eagerly imported
on every `import singlestoredb`. Also defer _setup_authentication_info_handler()
in management/utils.py since it was calling subscribe() at module level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
int64_t hh = total_secs / 3600;
int64_t mm = (total_secs % 3600) / 60;
int64_t ss = total_secs % 60;
i64 = sign_v * (hh * 10000 + mm * 100 + ss) * 1000000 + sign_v * total_us;


C TIME encoding disagrees with Python pack algorithm

High Severity

The C TIME encoding computes total_secs from days*86400 + seconds and keeps microseconds separate, then negates them independently. The Python _pack_time instead computes total microseconds via total_seconds() * 1_000_000 first, then decomposes. For negative timedeltas with non-zero microseconds (e.g., -22:59:59.5), the C approach produces a corrupted packed value because sign_v * total_us has the opposite sign from the hhmmss component, causing the packed digits to bleed. The round-trip gives a wrong timedelta.


Reviewed by Cursor Bugbot for commit f3d069a.

memcpy(out_cols[i] + j * 8, &u64, 8);
CHECKRC(PyDict_SetItem(py_objs, PyLong_FromUnsignedLongLong(u64), py_dec));
Py_CLEAR(py_dec);
}


Null DECIMAL doesn't advance data past string bytes

Low Severity

In the DECIMAL handling for both load_rowdat_1_numpy (second pass) and load_rowdat_1, when is_null is true, data advances past the 8-byte length prefix but not past i64 bytes of string data. The first sizing pass always advances by both 8 and i64. If a producer ever emits a non-zero length for a null DECIMAL value, the data pointer would desync with subsequent columns and rows. Currently safe only because null encoding always uses length zero.


Reviewed by Cursor Bugbot for commit f3d069a.

- Fix Py_None refcount leak in load_rowdat_1: remove pre-switch INCREF,
  move INCREF before PyTuple_SetItem in each null branch, and remove the
  post-SetItem INCREF that was leaking a reference on every null value
- Replace unaligned pointer dereferences (*(type*)data) with memcpy in
  load_rowdat_1 and load_rowdat_1_numpy to avoid undefined behavior on
  alignment-sensitive platforms (ARM, WASM)
- Fix YEAR type signed/unsigned mismatch in dump_rowdat_1 and
  call_function_accel output paths: use uint16_t/PyLong_AsUnsignedLong
  to match the unsigned input path in load_rowdat_1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend JSONEncoder.default() to handle datetime, date, timedelta, and
Decimal types that were missing from the JSON format path. These types
were added to the ROWDAT_1 binary format but the JSON encoder only
handled bytes→base64. Without this, json.dumps() raises TypeError when
a UDF returns any of these types via FORMAT JSON.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
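An encoder with the behavior described above might look like this; the class name and the chosen JSON representations are illustrative, not necessarily the PR's exact choices:

```python
import base64
import datetime
import decimal
import json


class UDFJSONEncoder(json.JSONEncoder):
    """Serialize UDF result types the stdlib encoder rejects."""

    def default(self, obj):
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode('ascii')
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()
        if isinstance(obj, datetime.timedelta):
            return obj.total_seconds()
        if isinstance(obj, decimal.Decimal):
            return str(obj)
        return super().default(obj)
```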
slen = struct.unpack('<q', data_io.read(8))[0]
val = decimal.Decimal(
data_io.read(slen).decode('utf-8'),
)


NULL datetime/date/decimal crashes Python fallback deserializer

High Severity

In _load and _load_vectors, the new datetime/date/decimal deserialization runs unconditionally before the is_null check at row.append(None if is_null else val). For NULL values, the packed wire value is 0, so _unpack_datetime(0) calls datetime(0, 0, 0) which raises ValueError (year/month/day must be >= 1), _unpack_date(0) similarly crashes, and decimal.Decimal('') raises InvalidOperation. The C accelerator correctly checks is_null before unpacking, but the Python fallback does not — which is the path used in WASM, the primary target of this PR.


Reviewed by Cursor Bugbot for commit 4c7fb60.

ss = total_secs % 60
mm = (total_secs // 60) % 60
hh = total_secs // 3600
return sign * (hh * 10000 + mm * 100 + ss) * 1_000_000 + (sign * us)


Floating-point precision loss in _pack_time for timedeltas

Low Severity

_pack_time computes int(td.total_seconds() * 1_000_000) which involves float64 arithmetic. total_seconds() returns a float, and the multiplication can lose microsecond precision for large timedelta values. Using integer-only arithmetic like td.days * 86400_000_000 + td.seconds * 1_000_000 + td.microseconds would avoid this entirely. The C code reads days, seconds, microseconds as separate integers and avoids this issue.


Reviewed by Cursor Bugbot for commit 4c7fb60.

When a TimeoutError occurs mid-message, the function removes the socket
timeout to avoid protocol desync. Previously the timeout was never
restored, causing the caller's shutdown-polling loop to block
indefinitely on subsequent reads. Now the original timeout is saved
and restored on both EOF and successful completion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
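The save/restore pattern described above, sketched on a plain socket (the real code wraps the plugin connection's message reader; `drain_exact` is an illustrative name):

```python
import socket


def drain_exact(sock: socket.socket, n: int) -> bytes:
    """Read n bytes with the timeout disabled, restoring the caller's timeout after."""
    saved = sock.gettimeout()
    sock.settimeout(None)           # block until the full message arrives
    try:
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                break               # EOF: finally still restores the timeout
            buf += chunk
        return buf
    finally:
        sock.settimeout(saved)      # restored on EOF and on success alike
```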
Rename the `singlestoredb.functions.ext.collocated` package to
`singlestoredb.functions.ext.plugin` and update all related naming:

- CLI args: --extension -> --plugin-name, --extension-path -> --search-path
- Env vars: EXTERNAL_UDF_* -> PLUGIN_*
- Class: FunctionHandler -> Plugin
- WIT: singlestore:udf/function-handler -> singlestore:plugin/plugin
- WIT world: external-udf -> plugin-server
- pyproject.toml entry point updated to new module path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update ARCHITECTURE.md with plugin/ subpackage in the package tree,
three-mode execution diagram, Plugin Server CLI section with all 7
arguments and env variables, and Appendix A/D entries. Update
CONTRIBUTING.md with plugin server CLI example for manual testing.
Update README.md UDF bullet to mention plugin-mode deployment.

Replace remaining "collocated" references in plugin/ source files:
logger names (collocated.* → plugin.*), docstrings, and argparse
description. Files outside plugin/ (config.py, asgi.py, mmap.py)
correctly use "collocated" for their own functionality and are
left unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 8 total unresolved issues (including 6 from previous reviews).


Reviewed by Cursor Bugbot for commit 14d5e83.

f'mmap_write={t_mmap_write / n_requests * 1e6:.1f}us '
f'send={t_send / n_requests * 1e6:.1f}us '
f'total={t_total:.1f}us',
)


Profiling variables may be referenced before assignment on error

Low Severity

In _handle_udf_loop, when _PROFILE is True, the profiling accumulators (n_requests, t_recv, etc.) are initialized inside an if profile: block. If an exception occurs in the try block before the first successful request, the finally block still references these names; they are defined (and zero) at that point, and the t_total division is guarded by n_requests > 0, so the code is safe as written. The remaining edge case is profile being reassigned or shadowed unexpectedly, which would leave the accumulators undefined.


Reviewed by Cursor Bugbot for commit 14d5e83.


Copilot AI left a comment


Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 10 comments.



Comment on lines +285 to +287
val = decimal.Decimal(
data_io.read(slen).decode('utf-8'),
)

Copilot AI Apr 15, 2026


Same issue as _load(): _load_vectors() attempts decimal.Decimal(...) even when is_null is true and the encoded length is 0, which will raise decimal.InvalidOperation. Guard the decimal parsing on not is_null (or slen > 0) so NULL decimal columns can be loaded.

Suggested change
- val = decimal.Decimal(
-     data_io.read(slen).decode('utf-8'),
- )
+ sval = data_io.read(slen).decode('utf-8')
+ val = decimal.Decimal(sval) if not is_null else sval

Copilot uses AI. Check for mistakes.
if issubclass(dtype, datetime.date):
return 'date'
if issubclass(dtype, datetime.time):
return 'time'

Copilot AI Apr 15, 2026


datetime.time is normalized to 'time' here, but ROWDAT_1 TIME serialization/deserialization uses datetime.timedelta (see rowdat_1._pack_time/_unpack_time). A UDF annotated with datetime.time (or returning it) will be serialized with _pack_time() and crash because datetime.time has no total_seconds(). Either convert datetime.time to a timedelta during dump/call paths, or keep datetime.time unsupported and raise a clear error.

Suggested change
- return 'time'
+ raise TypeError(
+     'unsupported type annotation: datetime.time; '
+     'use datetime.timedelta for TIME values, or use `args`/`returns` '
+     'on the @udf/@tvf decorator to specify the data type',
+ )

SQLString

"""
if element_type.upper() not in (F16, F32, F64, I8, I16, I32, I64):

Copilot AI Apr 15, 2026


VECTOR() validates element_type.upper() but then emits the SQL using the original element_type string. If callers pass a lowercase value (e.g. "f32"), validation passes but the SQL becomes VECTOR(n, f32) which is likely invalid. Normalize element_type = element_type.upper() before formatting the SQL string.

Suggested change
- if element_type.upper() not in (F16, F32, F64, I8, I16, I32, I64):
+ element_type = element_type.upper()
+ if element_type not in (F16, F32, F64, I8, I16, I32, I64):

Comment on lines +202 to +205
sock_path = self.config['socket']
if os.path.exists(sock_path):
os.unlink(sock_path)


Copilot AI Apr 15, 2026


_bind_socket() unconditionally unlinks any existing path. If sock_path points to a regular file, this will delete it. Consider verifying the existing path is actually a Unix domain socket (e.g., via stat.S_ISSOCK(os.stat(...).st_mode)) before unlinking, and raise a clear error otherwise.

Comment on lines +2599 to +2603
if (!py_dec) goto error;
u64 = (uint64_t)py_dec;
memcpy(out_cols[i] + j * 8, &u64, 8);
CHECKRC(PyDict_SetItem(py_objs, PyLong_FromUnsignedLongLong(u64), py_dec));
Py_CLEAR(py_dec);

Copilot AI Apr 15, 2026


PyDict_SetItem(py_objs, PyLong_FromUnsignedLongLong(u64), py_dec) leaks the temporary key object because PyDict_SetItem does not steal references. Create the key PyObject* separately, pass it to PyDict_SetItem, then Py_DECREF it. This matters on the hot path because it will leak one Python int per non-null DECIMAL/DATE/TIME/DATETIME cell.

ft.LONG, -ft.LONG, ft.LONGLONG, -ft.LONGLONG,
])
- string_types = set([15, 245, 247, 248, 249, 250, 251, 252, 253, 254])
+ string_types = set([15, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254])

Copilot AI Apr 15, 2026


string_types includes 246 (MySQL NEWDECIMAL), which will cause decimal columns to be treated as strings (and binary_types will include -246). This will mis-parse/mis-dump DECIMAL/NEWDECIMAL data; remove 246 from string_types and rely on decimal_types for DECIMAL handling.

Suggested change
- string_types = set([15, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254])
+ string_types = set([15, 245, 247, 248, 249, 250, 251, 252, 253, 254])

Comment on lines +220 to +222
val = decimal.Decimal(
data_io.read(slen).decode('utf-8'),
)

Copilot AI Apr 15, 2026


Decimal decoding happens even when the value is NULL. For NULL decimals _dump() writes length=0, but _load() always calls decimal.Decimal(''), which raises decimal.InvalidOperation. Skip parsing when is_null (or when slen == 0), and set val to a safe placeholder so NULLs round-trip correctly.

Suggested change
- val = decimal.Decimal(
-     data_io.read(slen).decode('utf-8'),
- )
+ dec_bytes = data_io.read(slen)
+ if is_null or slen == 0:
+     val = None
+ else:
+     val = decimal.Decimal(dec_bytes.decode('utf-8'))

Comment on lines +426 to +429
# Decimal
if issubclass(dtype, decimal.Decimal):
return 'decimal'


Copilot AI Apr 15, 2026


New/expanded dtype support for decimal.Decimal (and now datetime.time) isn’t covered by existing tests. Add unit tests (e.g., in singlestoredb/tests/test_udf.py::test_datetimes / a new decimal test) to assert the generated SQL/type mapping for decimal.Decimal and to define/validate expected behavior for datetime.time.

namelen,
socket.CMSG_LEN(2 * fd_model.itemsize),
)


Copilot AI Apr 15, 2026


Handshake reads the function name using a single recvmsg(namelen, ...) call. On a stream socket, recvmsg is allowed to return fewer than namelen bytes, which would truncate function_name and desync the protocol. Validate len(msg) == namelen (and/or handle MSG_TRUNC) and if short, read the remaining bytes (without expecting additional FDs) before decoding.

Suggested change
if flags & getattr(socket, 'MSG_TRUNC', 0):
logger.warning('Function name data was truncated (MSG_TRUNC)')
return
if len(msg) < namelen:
remaining = _recv_exact_py(conn, namelen - len(msg))
if remaining is None:
logger.warning(
'Connection closed while receiving function name',
)
return
msg += remaining

Comment on lines +383 to +387
name = '__main__'
compiled = compile(full_code, f'<{name}>', 'exec')

if name in sys.modules:
module = sys.modules[name]

Copilot AI Apr 15, 2026


Dynamic function registration executes generated code inside the __main__ module. This can clobber the server’s own __main__ namespace and cause collisions between registrations/reloads. Use a dedicated/private module name for generated functions (e.g. singlestoredb.functions.ext.plugin._dynamic_udfs_<counter>), or create a fresh module per registration and keep it referenced as needed.
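One way to implement the suggestion, creating a fresh private module per registration; the module naming scheme and helper are illustrative, not the PR's code:

```python
import sys
import types


def register_generated(code: str, counter: int) -> types.ModuleType:
    """Exec generated UDF code in a dedicated module instead of __main__."""
    name = f'_dynamic_udfs_{counter}'        # hypothetical private module name
    module = types.ModuleType(name)
    sys.modules[name] = module               # keep a reference so functions survive
    exec(compile(code, f'<{name}>', 'exec'), module.__dict__)
    return module
```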

The negated binary blob case list in load_rowdat_1_numpy's sizing pass
omitted -MYSQL_TYPE_BLOB while including the other three blob types.
The data output pass already handled all four. This mismatch caused
columns with negated MYSQL_TYPE_BLOB to fall through the sizing switch
unhandled, leading to incorrect data pointer advancement and corrupted
parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2 participants