Commit 9f17c10

FIX: Binary data padding + tests (#218)
### Work Item / Issue Reference

> [AB#38480](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38480)

### Summary

This pull request improves how binary data (Python `bytes` and `bytearray`) is handled in the MSSQL Python driver, especially for edge cases like empty values, mixed types, and large binaries. It also significantly expands the test suite to cover these scenarios and documents current driver limitations regarding parameter and fetch buffer sizes.

**Binary Data Handling Improvements**

* Updated `_map_sql_type` in `cursor.py` to always use `VARBINARY` for Python `bytes`/`bytearray`, avoiding storage waste and ensuring correct handling of variable-length data. This removes previous logic that sometimes used fixed-length `BINARY` and simplifies type mapping.
* Improved sample value selection in `_select_best_sample_value` to correctly handle columns that contain only binary types (`bytes` or `bytearray`).

**Test Suite Enhancements for Binary Data**

* Fixed and clarified the `test_longvarbinary` test to expect the correct number of rows and removed assumptions about zero-padding in returned binary data.
* Added new tests covering edge cases:
  - Executemany with binary data and empty byte arrays, ensuring correct insertion and retrieval of empty, regular, and null binary values.
  - Binary data over 8000 bytes, documenting driver limitations (8192 bytes for parameters, 4096 bytes for fetch buffer) and verifying correct error handling and data retrieval for supported sizes.
  - All empty binaries: verifies that multiple empty binary rows are handled and returned as zero-length `bytes`.
  - Mixing `bytes` and `bytearray` types in the same column, confirming consistent storage and retrieval as `bytes`.
  - Binary columns with mostly small/empty values and one large value, ensuring the large value is handled within driver limits.
  - Table with only NULL and empty binary values, verifying clear distinction between NULL and empty values and correct query results.
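To illustrate why binding as `VARBINARY` matters, here is a minimal pure-Python sketch of the padding difference. It does not call the driver or a database; `store_as_binary` and `store_as_varbinary` are hypothetical helpers that only model the usual SQL Server behavior, where a fixed-length `BINARY(n)` value is right-padded with `0x00` up to `n` bytes and a `VARBINARY(n)` value keeps its own length.

```python
def store_as_binary(value, width):
    """Model a fixed-length BINARY(width) column: right-pad with 0x00.

    Hypothetical helper for illustration only; not part of mssql-python.
    """
    if len(value) > width:
        raise ValueError("value is longer than the column width")
    return bytes(value).ljust(width, b"\x00")


def store_as_varbinary(value, max_width):
    """Model a VARBINARY(max_width) column: the value keeps its own length.

    Hypothetical helper for illustration only; not part of mssql-python.
    """
    if len(value) > max_width:
        raise ValueError("value is longer than the column maximum")
    return bytes(value)


# An empty value, a short value, and a bytearray, as in the new tests.
samples = [b"", b"\x01\x02", bytearray(b"\xff")]

padded = [store_as_binary(v, 4) for v in samples]
exact = [store_as_varbinary(v, 4) for v in samples]

for original, p, e in zip(samples, padded, exact):
    print(f"{bytes(original)!r:>12}  BINARY(4) -> {p!r:<22} VARBINARY(4) -> {e!r}")
```

Under this model an empty `bytes` value stored as `BINARY(4)` is indistinguishable from four zero bytes, while the `VARBINARY` path returns it as zero-length `bytes`, which is what the new tests expect.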
1 parent 95a1f18 commit 9f17c10

2 files changed: +338 -21 lines changed

mssql_python/cursor.py

Lines changed: 8 additions & 18 deletions
@@ -380,33 +380,21 @@ def _map_sql_type(self, param, parameters_list, i):
             )
 
         if isinstance(param, bytes):
-            if len(param) > 8000:  # Assuming VARBINARY(MAX) for long byte arrays
-                return (
-                    ddbc_sql_const.SQL_VARBINARY.value,
-                    ddbc_sql_const.SQL_C_BINARY.value,
-                    len(param),
-                    0,
-                    False,
-                )
+            # Use VARBINARY for Python bytes/bytearray since they are variable-length by nature.
+            # This avoids storage waste from BINARY's zero-padding and matches Python's semantics.
             return (
-                ddbc_sql_const.SQL_BINARY.value,
+                ddbc_sql_const.SQL_VARBINARY.value,
                 ddbc_sql_const.SQL_C_BINARY.value,
                 len(param),
                 0,
                 False,
             )
 
         if isinstance(param, bytearray):
-            if len(param) > 8000:  # Assuming VARBINARY(MAX) for long byte arrays
-                return (
-                    ddbc_sql_const.SQL_VARBINARY.value,
-                    ddbc_sql_const.SQL_C_BINARY.value,
-                    len(param),
-                    0,
-                    True,
-                )
+            # Use VARBINARY for Python bytes/bytearray since they are variable-length by nature.
+            # This avoids storage waste from BINARY's zero-padding and matches Python's semantics.
             return (
-                ddbc_sql_const.SQL_BINARY.value,
+                ddbc_sql_const.SQL_VARBINARY.value,
                 ddbc_sql_const.SQL_C_BINARY.value,
                 len(param),
                 0,
@@ -848,6 +836,8 @@ def _select_best_sample_value(column):
             return max(non_nulls, key=lambda s: len(str(s)))
         if all(isinstance(v, datetime.datetime) for v in non_nulls):
             return datetime.datetime.now()
+        if all(isinstance(v, (bytes, bytearray)) for v in non_nulls):
+            return max(non_nulls, key=lambda b: len(b))
         if all(isinstance(v, datetime.date) for v in non_nulls):
             return datetime.date.today()
         return non_nulls[0]  # fallback
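The new branch in `_select_best_sample_value` can be tried in isolation with a simplified stand-in. The sketch below reproduces only the branches visible in the hunk above. The `None` filtering, the empty-column result, and the `str` check guarding the first `max(...)` are assumptions, since that part of the real function lies outside the diff.

```python
import datetime


def select_best_sample_value(column):
    """Simplified stand-in for the driver's helper (illustration only)."""
    # Assumption: NULLs are filtered out before any type checks.
    non_nulls = [v for v in column if v is not None]
    if not non_nulls:
        return None  # Assumption: the real handling of all-NULL columns is not shown.
    # Assumption: the first max(...) in the hunk is guarded by a str check.
    if all(isinstance(v, str) for v in non_nulls):
        return max(non_nulls, key=lambda s: len(str(s)))
    # datetime is checked before date because datetime is a subclass of date.
    if all(isinstance(v, datetime.datetime) for v in non_nulls):
        return datetime.datetime.now()
    # New in this commit: for an all-binary column, pick the longest value.
    if all(isinstance(v, (bytes, bytearray)) for v in non_nulls):
        return max(non_nulls, key=len)
    if all(isinstance(v, datetime.date) for v in non_nulls):
        return datetime.date.today()
    return non_nulls[0]  # fallback


# A column mixing NULL, empty bytes, a bytearray, and short bytes.
column = [None, b"", bytearray(b"\x01\x02\x03"), b"\xff"]
sample = select_best_sample_value(column)
print(type(sample).__name__, len(sample))
```

Without the binary branch, this column would reach the fallback and yield `b""`, the first non-NULL value. Picking the longest value instead presumably lets type inference size the parameter for the largest row, which matches the "mostly small/empty values and one large value" test case described in the summary.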
