-
Notifications
You must be signed in to change notification settings - Fork 35
Description
Hi - SQLAlchemy testing is coming down to only a few failures remaining. The latest is this particular quirk involving CP1252 characters inside of VARCHAR. usually for unicode data we use NVARCHAR, however we apparently have some tests that roundtrip accent characters into VARCHAR, and this seems to work on SQL Server databases due to the CP1252 encoding set up (pyodbc, pymssql etc. all round trip without issue). all the other backends we have can also roundtrip this data with VARCHAR without having to use NVARCHAR. Below is a claude-generated report that includes a test matrix.
VARCHAR fetch fails when data length equals column size with non-ASCII CP1252 characters
Summary
The mssql-python driver fails to fetch VARCHAR columns correctly when:
- The data length exactly equals the column size (e.g., 10 characters in VARCHAR(10))
- The data contains non-ASCII characters from Windows-1252/CP1252 (e.g., é, à, ñ)
This affects all VARCHAR sizes from small (VARCHAR(5)) to large (VARCHAR(50+)).
Behavior:
fetchone(): Returnsbytesobject (e.g.,b'\xa9') instead ofstrfetchall(): Crashes withRuntimeError: Error fetching LOB for column 1, cType=1, loop=1, SQLGetData return=-1
Works correctly when:
- Data is pure ASCII (any length)
- Data contains CP1252 characters but does NOT fill the entire column
Reproduction Code
"""
Comprehensive reproducer for mssql-python VARCHAR fetch bugs
Issue: Driver fails to fetch VARCHAR columns correctly when data length equals
the column size AND contains non-ASCII characters (Windows-1252/CP1252).
Behavior:
- fetchall(): Crashes with "Error fetching LOB" RuntimeError
- fetchone(): Returns bytes object (e.g., b'\xa9') instead of string
Database: SQL Server with collation SQL_Latin1_General_CP1_CI_AS (code page 1252)
"""
from mssql_python import connect
# Adjust connection string for your environment
conn_str = "UID=scott;PWD=tiger^5HHH;Server=mssql2022;Database=test;Encrypt=No"
conn = connect(conn_str)
cursor = conn.cursor()
# Test grid: (column_size, test_data, description)
test_cases = [
# Pure ASCII - should always work
(9, "abcdefghi", "ASCII, full length"),
(10, "abcdefghij", "ASCII, full length"),
# CP1252 chars, NOT full length - should work
(10, "café", "CP1252, 4 of 10 chars"),
(20, "réveillé", "CP1252, 8 of 20 chars"),
# CP1252 chars, FULL length - these FAIL
(5, "aaaaé", "CP1252, 5 of 5 chars (FULL)"),
(9, "aaaaaaaaé", "CP1252, 9 of 9 chars (FULL)"),
(10, "aaaaaaaaaé", "CP1252, 10 of 10 chars (FULL)"),
(20, "aaaaaaaaaaaaaaaaaaaé", "CP1252, 20 of 20 chars (FULL)"),
]
print("=" * 90)
print("VARCHAR Fetch Test Results")
print("=" * 90)
print(f"{'Column':<8} {'Data Length':<12} {'Description':<30} {'fetchone()':<15} {'fetchall()':<15}")
print("-" * 90)
for col_size, test_data, description in test_cases:
try:
cursor.execute("DROP TABLE test_table")
conn.commit()
except:
pass
cursor.execute(f"CREATE TABLE test_table (data VARCHAR({col_size}) NULL)")
conn.commit()
cursor.execute("INSERT INTO test_table (data) VALUES (?)", (test_data,))
conn.commit()
# Test fetchone()
cursor.execute("SELECT data FROM test_table")
try:
row = cursor.fetchone()
value = row[0]
if value == test_data:
fetchone_result = "✓ PASS"
else:
if isinstance(value, bytes):
fetchone_result = f"✗ bytes: {repr(value)[:8]}"
else:
fetchone_result = f"✗ wrong: {repr(value)[:8]}"
except Exception as e:
fetchone_result = f"✗ ERROR: {str(e)[:8]}"
# Test fetchall()
cursor.execute("SELECT data FROM test_table")
try:
rows = cursor.fetchall()
value = rows[0][0]
if value == test_data:
fetchall_result = "✓ PASS"
else:
if isinstance(value, bytes):
fetchall_result = f"✗ bytes: {repr(value)[:8]}"
else:
fetchall_result = f"✗ wrong: {repr(value)[:8]}"
except Exception as e:
error_msg = str(e)
if "LOB" in error_msg:
fetchall_result = "✗ LOB error"
else:
fetchall_result = f"✗ ERROR: {error_msg[:8]}"
print(f"VARCHAR({col_size:<2}) {len(test_data):>2}/{col_size:<2} chars {description:<30} {fetchone_result:<15} {fetchall_result:<15}")
print("-" * 90)
print("\nSummary:")
print(" ✓ PASS: Data fetched correctly")
print(" ✗ bytes: Returned bytes object instead of string")
print(" ✗ LOB error: Crashed with 'Error fetching LOB' RuntimeError")
print()
print("Pattern: Bug occurs when data length EQUALS column size AND contains")
print(" non-ASCII CP1252 characters (e.g., é, à, ñ)")
print("=" * 90)
# Clean up
cursor.close()
conn.close()Expected Behavior
All VARCHAR columns should return Python str objects containing the stored data, regardless of:
- Column size
- Data length
- Whether data contains ASCII or CP1252 characters
Actual Behavior
Output:
==========================================================================================
VARCHAR Fetch Test Results
==========================================================================================
Column Data Length Description fetchone() fetchall()
------------------------------------------------------------------------------------------
VARCHAR(9 ) 9/9 chars ASCII, full length ✓ PASS ✓ PASS
VARCHAR(10) 10/10 chars ASCII, full length ✓ PASS ✓ PASS
VARCHAR(10) 4/10 chars CP1252, 4 of 10 chars ✓ PASS ✓ PASS
VARCHAR(20) 8/20 chars CP1252, 8 of 20 chars ✓ PASS ✓ PASS
VARCHAR(5 ) 5/5 chars CP1252, 5 of 5 chars (FULL) ✗ bytes: b'\xa9' ✗ LOB error
VARCHAR(9 ) 9/9 chars CP1252, 9 of 9 chars (FULL) ✗ bytes: b'\xa9' ✗ LOB error
VARCHAR(10) 10/10 chars CP1252, 10 of 10 chars (FULL) ✗ bytes: b'\xa9' ✗ LOB error
VARCHAR(20) 20/20 chars CP1252, 20 of 20 chars (FULL) ✗ bytes: b'\xa9' ✗ LOB error
------------------------------------------------------------------------------------------
Environment
- Python: 3.14
- SQL Server: 2022
- OS: Fedora 40
- Database Collation:
SQL_Latin1_General_CP1_CI_AS(Code Page 1252) - mssql-python version: [current version]
Additional Context
The bug pattern is very specific:
- ✅ VARCHAR with ASCII data (any length): Works
- ✅ VARCHAR with CP1252 data that doesn't fill the column: Works
- ❌ VARCHAR with CP1252 data that exactly fills the column: Fails
This is a critical issue for applications using SQLAlchemy's Enum type, which uses VARCHAR columns and can store values that exactly match the column size.
Note: Windows-1252 (CP1252) includes common European characters like:
- French: é, è, à, ç
- German: ö, ü, ä, ß
- Spanish: ñ, á, í
- And many others in the U+0080 to U+00FF range
These characters are valid in VARCHAR columns with CP1252 collation and should be retrievable as strings, not bytes or LOB data.