FEAT: Add ActiveDirectoryMSI support for bulk copy#573
Merged
Conversation
Adds Authentication=ActiveDirectoryMSI to the auth pipeline: - Zero-arg ManagedIdentityCredential() for system-assigned MSI. - ManagedIdentityCredential(client_id=UID) for user-assigned MSI, matching ODBC's convention where UID carries the identity's client_id under MSI. - Threads optional credential_kwargs through get_auth_token / get_raw_token / _acquire_token so future auth methods that need constructor args (e.g. ClientSecretCredential) can plug in via the same channel. - Cache key remains a plain string for zero-arg auth types and becomes a tuple when kwargs are present, so different client_ids get separate cached credentials. Partial fix for #534. ServicePrincipal and Password to follow as separate PRs.
Contributor
There was a problem hiding this comment.
Pull request overview
Adds support for Authentication=ActiveDirectoryMSI (managed identity) token acquisition for the bulk copy (mssql-py-core) path, including user-assigned MSI via UID=<client_id>.
Changes:
- Add
AuthType.MSI(activedirectorymsi) and map it toManagedIdentityCredential. - Thread optional
credential_kwargsthrough token acquisition and update the credential-instance cache key to incorporate kwargs. - Attempt to extract MSI
client_idfrom the connection string and pass it into bulk copy token acquisition; add tests and a changelog entry.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
mssql_python/constants.py |
Adds AuthType.MSI enum value. |
mssql_python/auth.py |
Adds MSI credential support, credential kwargs plumbing, and cache key changes; adds extract_credential_kwargs. |
mssql_python/cursor.py |
Bulk copy now tries to extract MSI credential kwargs and pass them to get_raw_token. |
tests/test_008_auth.py |
Adds tests for MSI auth type parsing, credential construction, cache isolation, and connection-string processing. |
CHANGELOG.md |
Documents MSI support for bulk copy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Connection.__init__ overwrites self.connection_str with the sanitized (UID-stripped) string returned by process_connection_string. The original implementation re-parsed self.connection_str at bulkcopy time via extract_credential_kwargs, which silently dropped the user-assigned MSI client_id and degraded to system-assigned a wrong-identity bug.MSI Changes: - process_connection_string now returns a 4-tuple including the captured credential_kwargs so callers can persist them. - Connection.__init__ stores _credential_kwargs alongside _auth_type. - cursor.bulkcopy() reads self.connection._credential_kwargs instead of re-parsing self.connection_str. - The public extract_credential_kwargs helper is removed (it only existed to support the broken re-parse path; nothing else needs it). - black --line-length=100 reformats (CI was red). Tests: - test_bulkcopy_path_preserves_user_assigned_msi_client_id: invokes cursor.bulkcopy() with a mocked mssql_py_core, patches AADAuth.get_raw_token to capture the args it receives, and asserts the captured credential_kwargs match Connection._credential_kwargs. Fails if cursor reverts to re-parsing self.connection.connection_str. - test_credential_kwargs_persisted_for_user_assigned_msi: asserts Connection.__init__ stores _credential_kwargs from the 4-tuple. - test_credential_kwargs_none_for_system_assigned_msi. - test_credential_kwargs_none_for_non_msi_auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bbda2a2 to
3ca2165
Compare
bewithgaurav
added a commit
that referenced
this pull request
May 14, 2026
saurabh500
reviewed
May 14, 2026
Per @saurabh500's review: braced ODBC values like UID={hello=world} need the canonical parser, not naive partition('='). Without this, the helper returns '{hello=world}' verbatim and ManagedIdentityCredential rejects it. Worse, a UID containing a literal ';' would be truncated. _extract_msi_client_id now delegates to _ConnectionStringParser, which handles braces, escaped '}}' inside braces, and '=' inside braced values correctly. validate_keywords=False so the helper never raises on keys the auth flow doesn't care about. Tests: - test_msi_braced_uid_value_is_unwrapped: UID={hello=world} -> 'hello=world' - test_msi_braced_uid_with_semicolon_is_preserved: UID={abc;def;ghi} Note: process_connection_string and extract_auth_type still use naive split(';') for Authentication= detection across all Entra ID auth types. That's pre-existing and tracked separately for a parser-wide refactor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…microsoft/mssql-python into bewithgaurav/gh534-bulkcopy-msi
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
saurabh500
reviewed
May 14, 2026
- Debug log distinguishing user-assigned vs system-assigned MSI when the user passes Authentication=ActiveDirectoryMSI. Helps diagnose which branch was taken when token acquisition fails. Logs client_id length, not value (still identity material). - Comment above _credential_cache explains the cache key shape so the unbounded growth is understood as a deliberate choice rather than an oversight. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.build._deps.simdutf-src.src.haswell.implementation.cpp: 0.4%
mssql_python.pybind.build._deps.simdutf-src.src.implementation.cpp: 6.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.implementation.h: 10.4%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.scalar.utf16_to_utf8.utf16_to_utf8.h: 25.3%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.internal.isadetection.h: 65.3%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.2%🔗 Quick Links
|
Connection.__init__ already parses the same connection string through _ConnectionStringParser via _construct_connection_string (connection.py line 253) before process_connection_string is ever called. By the time _extract_msi_client_id runs, the input is guaranteed parseable. The try/except was dead code. A real parse failure here would indicate an upstream bug and should propagate, not silently degrade user-assigned MSI to system-assigned (which is the wrong-identity failure mode this PR exists to prevent). Brings diff coverage to 100%. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
saurabh500
previously approved these changes
May 14, 2026
bewithgaurav
added a commit
that referenced
this pull request
May 15, 2026
…tructure Brings in the ActiveDirectoryMSI feature (PR #573, mostly approved) so SP can use the shared module-level credential cache that PR #573 introduces, instead of the closure-scoped cache the SP commit shipped with. Conflicts resolved (kept both contributions): - mssql_python/constants.py: AuthType.MSI + AuthType.SERVICE_PRINCIPAL - mssql_python/auth.py: _parse_tenant_id + ServicePrincipalAuth + _extract_msi_client_id; both elif branches in process_auth_parameters; both keys in extract_auth_type's auth_map - mssql_python/cursor.py: SP factory branch on _auth_type=='serviceprincipal', Model A branch threads credential_kwargs through AADAuth.get_raw_token (MSI's contract change) - tests/test_008_auth.py: MockClientSecretCredential + MockManagedIdentityCredential in setup_azure_identity, both sets of TestProcessAuthParameters / TestExtractAuthType cases ServicePrincipalAuth.make_token_factory now uses the shared _credential_cache + _credential_cache_key (introduced for MSI), keyed by ('serviceprincipal', tenant_id, client_id). Closure-scoped cache and the TODO(#573-followup) note removed. client_secret is intentionally NOT in the cache key because credentials are looked up by identity, not secret; a rotated secret will surface as ClientAuthenticationError, not a silent hit on the wrong credential. Tests: 86 passed in tests/test_008_auth.py (was 75 SP-only). Black clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jahnvi480
reviewed
May 15, 2026
jahnvi480
requested changes
May 15, 2026
Tracking refactor (parse-once, thread the parsed map through the auth path) is a separate follow-up; this docstring helps anyone reaching into the function before that lands. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
saurabh500
approved these changes
May 15, 2026
jahnvi480
approved these changes
May 15, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Work Item / Issue Reference
Summary
Adds
Authentication=ActiveDirectoryMSIsupport to bulk copy. Partial fix for #534.ManagedIdentityCredential()for system-assigned MSI.ManagedIdentityCredential(client_id=UID)for user-assigned MSI. Matches ODBC convention whereUIDcarries the identity'sclient_idunder MSI.credential_kwargsthroughget_auth_token/get_raw_token/_acquire_tokenso future auth methods that need constructor args plug in via the same channel.(auth_type, sorted_kwargs)when kwargs are present, so differentclient_ids get separate cached credentials.mssql-py-core(the Rust path used by bulk copy) doesn't itself acquire Entra tokens. Python pre-acquires a JWT and passes it asaccess_token. Today this works forDefault,DeviceCode, andInteractive. MSI was missing, the most common Azure-hosted-service auth method.ServicePrincipal and Password are explicitly out of scope for this PR. They need different design work and ship separately.
How user-assigned MSI survives the bulkcopy fresh-token path
Connection.__init__callsprocess_connection_string, which sanitizesself.connection_str(stripsUID=,Authentication=,PWD=). Bulkcopy needs a fresh token at copy time. Re-parsingself.connection_strat that point would lose UID and silently degrade user-assigned MSI to system-assigned (wrong-identity bug).Fix shape:
process_connection_stringreturns a 4-tuple including the capturedcredential_kwargs.Connection.__init__persists_credential_kwargsalongside_auth_type.cursor.bulkcopy()readsself.connection._credential_kwargsdirectly. No re-parse, no chance for the sanitized string to drop theclient_id.Connection string examples