Skip to content

VARCHAR fetch fails when data length equals column size with non-ASCII CP1252 characters #435

@zzzeek

Description

@zzzeek

Hi - SQLAlchemy testing is coming down to only a few failures remaining. The latest is this particular quirk involving CP1252 characters inside of VARCHAR. usually for unicode data we use NVARCHAR, however we apparently have some tests that roundtrip accent characters into VARCHAR, and this seems to work on SQL Server databases due to the CP1252 encoding set up (pyodbc, pymssql etc. all round trip without issue). all the other backends we have can also roundtrip this data with VARCHAR without having to use NVARCHAR. Below is a claude-generated report that includes a test matrix.

VARCHAR fetch fails when data length equals column size with non-ASCII CP1252 characters

Summary

The mssql-python driver fails to fetch VARCHAR columns correctly when:

  1. The data length exactly equals the column size (e.g., 10 characters in VARCHAR(10))
  2. The data contains non-ASCII characters from Windows-1252/CP1252 (e.g., é, à, ñ)

This affects all VARCHAR sizes from small (VARCHAR(5)) to large (VARCHAR(50+)).

Behavior:

  • fetchone(): Returns bytes object (e.g., b'\xa9') instead of str
  • fetchall(): Crashes with RuntimeError: Error fetching LOB for column 1, cType=1, loop=1, SQLGetData return=-1

Works correctly when:

  • Data is pure ASCII (any length)
  • Data contains CP1252 characters but does NOT fill the entire column

Reproduction Code

"""
Comprehensive reproducer for mssql-python VARCHAR fetch bugs

Issue: Driver fails to fetch VARCHAR columns correctly when data length equals
       the column size AND contains non-ASCII characters (Windows-1252/CP1252).

Behavior:
- fetchall(): Crashes with "Error fetching LOB" RuntimeError
- fetchone(): Returns bytes object (e.g., b'\xa9') instead of string

Database: SQL Server with collation SQL_Latin1_General_CP1_CI_AS (code page 1252)
"""

from mssql_python import connect

# Adjust connection string for your environment
conn_str = "UID=scott;PWD=tiger^5HHH;Server=mssql2022;Database=test;Encrypt=No"
conn = connect(conn_str)
cursor = conn.cursor()

# Test grid: (column_size, test_data, description)
test_cases = [
    # Pure ASCII - should always work
    (9, "abcdefghi", "ASCII, full length"),
    (10, "abcdefghij", "ASCII, full length"),

    # CP1252 chars, NOT full length - should work
    (10, "café", "CP1252, 4 of 10 chars"),
    (20, "réveillé", "CP1252, 8 of 20 chars"),

    # CP1252 chars, FULL length - these FAIL
    (5, "aaaaé", "CP1252, 5 of 5 chars (FULL)"),
    (9, "aaaaaaaaé", "CP1252, 9 of 9 chars (FULL)"),
    (10, "aaaaaaaaaé", "CP1252, 10 of 10 chars (FULL)"),
    (20, "aaaaaaaaaaaaaaaaaaaé", "CP1252, 20 of 20 chars (FULL)"),
]

print("=" * 90)
print("VARCHAR Fetch Test Results")
print("=" * 90)
print(f"{'Column':<8} {'Data Length':<12} {'Description':<30} {'fetchone()':<15} {'fetchall()':<15}")
print("-" * 90)

for col_size, test_data, description in test_cases:
    try:
        cursor.execute("DROP TABLE test_table")
        conn.commit()
    except:
        pass

    cursor.execute(f"CREATE TABLE test_table (data VARCHAR({col_size}) NULL)")
    conn.commit()

    cursor.execute("INSERT INTO test_table (data) VALUES (?)", (test_data,))
    conn.commit()

    # Test fetchone()
    cursor.execute("SELECT data FROM test_table")
    try:
        row = cursor.fetchone()
        value = row[0]
        if value == test_data:
            fetchone_result = "✓ PASS"
        else:
            if isinstance(value, bytes):
                fetchone_result = f"✗ bytes: {repr(value)[:8]}"
            else:
                fetchone_result = f"✗ wrong: {repr(value)[:8]}"
    except Exception as e:
        fetchone_result = f"✗ ERROR: {str(e)[:8]}"

    # Test fetchall()
    cursor.execute("SELECT data FROM test_table")
    try:
        rows = cursor.fetchall()
        value = rows[0][0]
        if value == test_data:
            fetchall_result = "✓ PASS"
        else:
            if isinstance(value, bytes):
                fetchall_result = f"✗ bytes: {repr(value)[:8]}"
            else:
                fetchall_result = f"✗ wrong: {repr(value)[:8]}"
    except Exception as e:
        error_msg = str(e)
        if "LOB" in error_msg:
            fetchall_result = "✗ LOB error"
        else:
            fetchall_result = f"✗ ERROR: {error_msg[:8]}"

    print(f"VARCHAR({col_size:<2}) {len(test_data):>2}/{col_size:<2} chars   {description:<30} {fetchone_result:<15} {fetchall_result:<15}")

print("-" * 90)
print("\nSummary:")
print("  ✓ PASS: Data fetched correctly")
print("  ✗ bytes: Returned bytes object instead of string")
print("  ✗ LOB error: Crashed with 'Error fetching LOB' RuntimeError")
print()
print("Pattern: Bug occurs when data length EQUALS column size AND contains")
print("         non-ASCII CP1252 characters (e.g., é, à, ñ)")
print("=" * 90)

# Clean up
cursor.close()
conn.close()

Expected Behavior

All VARCHAR columns should return Python str objects containing the stored data, regardless of:

  • Column size
  • Data length
  • Whether data contains ASCII or CP1252 characters

Actual Behavior

Output:

==========================================================================================
VARCHAR Fetch Test Results
==========================================================================================
Column   Data Length  Description                    fetchone()      fetchall()
------------------------------------------------------------------------------------------
VARCHAR(9 )  9/9  chars   ASCII, full length             ✓ PASS          ✓ PASS
VARCHAR(10) 10/10 chars   ASCII, full length             ✓ PASS          ✓ PASS
VARCHAR(10)  4/10 chars   CP1252, 4 of 10 chars          ✓ PASS          ✓ PASS
VARCHAR(20)  8/20 chars   CP1252, 8 of 20 chars          ✓ PASS          ✓ PASS
VARCHAR(5 )  5/5  chars   CP1252, 5 of 5 chars (FULL)    ✗ bytes: b'\xa9' ✗ LOB error
VARCHAR(9 )  9/9  chars   CP1252, 9 of 9 chars (FULL)    ✗ bytes: b'\xa9' ✗ LOB error
VARCHAR(10) 10/10 chars   CP1252, 10 of 10 chars (FULL)  ✗ bytes: b'\xa9' ✗ LOB error
VARCHAR(20) 20/20 chars   CP1252, 20 of 20 chars (FULL)  ✗ bytes: b'\xa9' ✗ LOB error
------------------------------------------------------------------------------------------

Environment

  • Python: 3.14
  • SQL Server: 2022
  • OS: Fedora 40
  • Database Collation: SQL_Latin1_General_CP1_CI_AS (Code Page 1252)
  • mssql-python version: [current version]

Additional Context

The bug pattern is very specific:

  • ✅ VARCHAR with ASCII data (any length): Works
  • ✅ VARCHAR with CP1252 data that doesn't fill the column: Works
  • ❌ VARCHAR with CP1252 data that exactly fills the column: Fails

This is a critical issue for applications using SQLAlchemy's Enum type, which uses VARCHAR columns and can store values that exactly match the column size.

Note: Windows-1252 (CP1252) includes common European characters like:

  • French: é, è, à, ç
  • German: ö, ü, ä, ß
  • Spanish: ñ, á, í
  • And many others in the U+0080 to U+00FF range

These characters are valid in VARCHAR columns with CP1252 collation and should be retrievable as strings, not bytes or LOB data.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions