Commit 03f8ee9

FEAT: Optimize executemany() performance (#138)
### ADO Work Item Reference

> [AB#37935](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/37935)

-------------------------------------------------------------------

### Summary

This pull request introduces significant updates to the `executemany` method in the `mssql_python` library to enable column-wise parameter binding for batched SQL execution. It also includes enhancements to the underlying C++ bindings to support this functionality. The most important changes involve implementing column-wise binding logic, adding helper functions for parameter buffer allocation, and exposing a new `SQLExecuteMany` method for executing batched statements.

### Enhancements to Python `executemany` method:

* [`mssql_python/cursor.py`](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280R623-R736): Refactored the `executemany` method to use column-wise parameter binding for improved performance and scalability. Added `_transpose_rowwise_to_columnwise` helper function to convert row-wise parameters into column-wise format.

### Updates to C++ bindings:

* [`mssql_python/pybind/ddbc_bindings.cpp`](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R945-R1108): Added `BindParameterArray` function to handle column-wise parameter binding, supporting multiple data types such as `SQL_C_LONG`, `SQL_C_DOUBLE`, and `SQL_C_WCHAR`.
* [`mssql_python/pybind/ddbc_bindings.cpp`](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R945-R1108): Introduced `SQLExecuteMany_wrap` function to enable batched execution of SQL statements with multiple parameter sets.
* [`mssql_python/pybind/ddbc_bindings.cpp`](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2288): Exposed the new `SQLExecuteMany` method to Python via `PYBIND11_MODULE`.

### Helper functions for buffer allocation:

* [`mssql_python/pybind/ddbc_bindings.cpp`](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R188-R196): Added `AllocateParamBufferArray` template function to allocate buffers for column-wise parameter binding.
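
The row-wise to column-wise conversion described in the summary can be pictured with a small standalone sketch. It is not the library's code: `transpose_rows` is a hypothetical name, and it assumes every parameter row is a plain sequence of the same length.

```python
from typing import Any, List, Sequence


def transpose_rows(rows: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Turn a list of parameter rows into a list of parameter columns.

    Hypothetical helper for illustration only; not part of mssql_python.
    """
    if not rows:
        return []
    width = len(rows[0])
    for row in rows:
        if len(row) != width:
            raise ValueError("Inconsistent parameter row size")
    # zip(*rows) groups the i-th value of every row, which is one column each.
    return [list(col) for col in zip(*rows)]


rows = [(1, "alice", 3.5), (2, "bob", None), (3, "carol", 1.25)]
columns = transpose_rows(rows)
print(columns)
# [[1, 2, 3], ['alice', 'bob', 'carol'], [3.5, None, 1.25]]
```

Each inner list holds every value for one parameter marker, which is the shape an array-binding call needs when it binds one buffer per parameter instead of one call per row.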
1 parent 0374aa2 commit 03f8ee9

2 files changed: +443 -19 lines changed

mssql_python/cursor.py

Lines changed: 84 additions & 19 deletions
@@ -21,7 +21,6 @@

 logger = get_logger()

-
 class Cursor:
     """
     Represents a database cursor, which is used to manage the context of a fetch operation.
@@ -631,37 +630,103 @@ def execute(
         # Initialize description after execution
         self._initialize_description()

+    @staticmethod
+    def _select_best_sample_value(column):
+        """
+        Selects the most representative non-null value from a column for type inference.
+
+        This is used during executemany() to infer SQL/C types based on actual data,
+        preferring a non-null value that is not the first row to avoid bias from placeholder defaults.
+
+        Args:
+            column: List of values in the column.
+        """
+        non_nulls = [v for v in column if v is not None]
+        if not non_nulls:
+            return None
+        if all(isinstance(v, int) for v in non_nulls):
+            # Pick the value with the widest range (min/max)
+            return max(non_nulls, key=lambda v: abs(v))
+        if all(isinstance(v, float) for v in non_nulls):
+            return 0.0
+        if all(isinstance(v, decimal.Decimal) for v in non_nulls):
+            return max(non_nulls, key=lambda d: len(d.as_tuple().digits))
+        if all(isinstance(v, str) for v in non_nulls):
+            return max(non_nulls, key=lambda s: len(str(s)))
+        if all(isinstance(v, datetime.datetime) for v in non_nulls):
+            return datetime.datetime.now()
+        if all(isinstance(v, datetime.date) for v in non_nulls):
+            return datetime.date.today()
+        return non_nulls[0]  # fallback
+
+    def _transpose_rowwise_to_columnwise(self, seq_of_parameters: list) -> list:
+        """
+        Convert list of rows (row-wise) into list of columns (column-wise),
+        for array binding via ODBC.
+        Args:
+            seq_of_parameters: Sequence of sequences or mappings of parameters.
+        """
+        if not seq_of_parameters:
+            return []
+
+        num_params = len(seq_of_parameters[0])
+        columnwise = [[] for _ in range(num_params)]
+        for row in seq_of_parameters:
+            if len(row) != num_params:
+                raise ValueError("Inconsistent parameter row size in executemany()")
+            for i, val in enumerate(row):
+                columnwise[i].append(val)
+        return columnwise
+
     def executemany(self, operation: str, seq_of_parameters: list) -> None:
         """
         Prepare a database operation and execute it against all parameter sequences.
-
+        This version uses column-wise parameter binding and a single batched SQLExecute().
         Args:
             operation: SQL query or command.
             seq_of_parameters: Sequence of sequences or mappings of parameters.

         Raises:
             Error: If the operation fails.
         """
-        self._check_closed()  # Check if the cursor is closed
-
+        self._check_closed()
         self._reset_cursor()

-        first_execution = True
-        total_rowcount = 0
-        for parameters in seq_of_parameters:
-            parameters = list(parameters)
-            if ENABLE_LOGGING:
-                logger.info("Executing query with parameters: %s", parameters)
-            prepare_stmt = first_execution
-            first_execution = False
-            self.execute(
-                operation, parameters, use_prepare=prepare_stmt, reset_cursor=False
+        if not seq_of_parameters:
+            self.rowcount = 0
+            return
+
+        param_info = ddbc_bindings.ParamInfo
+        param_count = len(seq_of_parameters[0])
+        parameters_type = []
+
+        for col_index in range(param_count):
+            column = [row[col_index] for row in seq_of_parameters]
+            sample_value = self._select_best_sample_value(column)
+            dummy_row = list(seq_of_parameters[0])
+            parameters_type.append(
+                self._create_parameter_types_list(sample_value, param_info, dummy_row, col_index)
             )
-            if self.rowcount != -1:
-                total_rowcount += self.rowcount
-            else:
-                total_rowcount = -1
-        self.rowcount = total_rowcount
+
+        columnwise_params = self._transpose_rowwise_to_columnwise(seq_of_parameters)
+        if ENABLE_LOGGING:
+            logger.info("Executing batch query with %d parameter sets:\n%s",
+                len(seq_of_parameters),"\n".join(f" {i+1}: {tuple(p) if isinstance(p, (list, tuple)) else p}" for i, p in enumerate(seq_of_parameters))
+            )
+
+        # Execute batched statement
+        ret = ddbc_bindings.SQLExecuteMany(
+            self.hstmt,
+            operation,
+            columnwise_params,
+            parameters_type,
+            len(seq_of_parameters)
+        )
+        check_error(ddbc_sql_const.SQL_HANDLE_STMT.value, self.hstmt, ret)
+
+        self.rowcount = ddbc_bindings.DDBCSQLRowCount(self.hstmt)
+        self.last_executed_stmt = operation
+        self._initialize_description()

     def fetchone(self) -> Union[None, Row]:
         """
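
The new `executemany` infers one parameter type per column from a sample value rather than from whatever happens to be in the first row. The sketch below illustrates that idea outside the library. `pick_sample` is a hypothetical, simplified stand-in that covers only the integer and string branches of `_select_best_sample_value`, and the comments about why each rule helps are an interpretation, not something the commit states.

```python
from typing import Any, List, Optional


def pick_sample(column: List[Any]) -> Optional[Any]:
    """Hypothetical, simplified stand-in for the sample selection in the diff."""
    non_nulls = [v for v in column if v is not None]
    if not non_nulls:
        return None
    if all(isinstance(v, int) for v in non_nulls):
        # Largest magnitude, presumably so the inferred integer type is wide enough.
        return max(non_nulls, key=abs)
    if all(isinstance(v, str) for v in non_nulls):
        # Longest string, presumably so the inferred size fits every row.
        return max(non_nulls, key=len)
    # Mixed or other types: fall back to the first non-null value.
    return non_nulls[0]


rows = [(1, None), (70000, "ab"), (-5, "abcdef")]
columns = [list(col) for col in zip(*rows)]
samples = [pick_sample(col) for col in columns]
print(samples)  # [70000, 'abcdef']
```

Sampling only the first row here would yield `1` and `None`, which says little about the values that follow. Note that `bool` is a subclass of `int` in Python, so a column of booleans takes the integer branch in this sketch.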
