Skip to content

FEAT: Add ActiveDirectoryMSI support for bulk copy#573

Merged
bewithgaurav merged 12 commits into
mainfrom
bewithgaurav/gh534-bulkcopy-msi
May 21, 2026
Merged

FEAT: Add ActiveDirectoryMSI support for bulk copy#573
bewithgaurav merged 12 commits into
mainfrom
bewithgaurav/gh534-bulkcopy-msi

Conversation

@bewithgaurav
Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav commented May 12, 2026

Work Item / Issue Reference

AB#45137

GitHub Issue: #534


Summary

Adds Authentication=ActiveDirectoryMSI support to bulk copy. Partial fix for #534.

  • Zero-arg ManagedIdentityCredential() for system-assigned MSI.
  • ManagedIdentityCredential(client_id=UID) for user-assigned MSI. Matches ODBC convention where UID carries the identity's client_id under MSI.
  • Threads optional credential_kwargs through get_auth_token / get_raw_token / _acquire_token so future auth methods that need constructor args plug in via the same channel.
  • Credential cache key stays a plain string for zero-arg auth types and becomes a tuple (auth_type, sorted_kwargs) when kwargs are present, so different client_ids get separate cached credentials.

mssql-py-core (the Rust path used by bulk copy) doesn't itself acquire Entra tokens. Python pre-acquires a JWT and passes it as access_token. Today this works for Default, DeviceCode, and Interactive. MSI was missing, the most common Azure-hosted-service auth method.

ServicePrincipal and Password are explicitly out of scope for this PR. They need different design work and ship separately.

How user-assigned MSI survives the bulkcopy fresh-token path

Connection.__init__ calls process_connection_string, which sanitizes self.connection_str (strips UID=, Authentication=, PWD=). Bulkcopy needs a fresh token at copy time. Re-parsing self.connection_str at that point would lose UID and silently degrade user-assigned MSI to system-assigned (wrong-identity bug).

Fix shape: process_connection_string returns a 4-tuple including the captured credential_kwargs. Connection.__init__ persists _credential_kwargs alongside _auth_type. cursor.bulkcopy() reads self.connection._credential_kwargs directly. No re-parse, no chance for the sanitized string to drop the client_id.

Connection string examples

# System-assigned MSI
"Server=tcp:foo.database.windows.net;Authentication=ActiveDirectoryMSI;Database=mydb;"

# User-assigned MSI
"Server=tcp:foo.database.windows.net;Authentication=ActiveDirectoryMSI;UID=<client_id_guid>;Database=mydb;"

Adds Authentication=ActiveDirectoryMSI to the auth pipeline:

- Zero-arg ManagedIdentityCredential() for system-assigned MSI.
- ManagedIdentityCredential(client_id=UID) for user-assigned MSI,
  matching ODBC's convention where UID carries the identity's
  client_id under MSI.
- Threads optional credential_kwargs through get_auth_token /
  get_raw_token / _acquire_token so future auth methods that need
  constructor args (e.g. ClientSecretCredential) can plug in via
  the same channel.
- Cache key remains a plain string for zero-arg auth types and
  becomes a tuple when kwargs are present, so different client_ids
  get separate cached credentials.

Partial fix for #534. ServicePrincipal and
Password to follow as separate PRs.
Copilot AI review requested due to automatic review settings May 12, 2026 12:09
@bewithgaurav bewithgaurav changed the title Add ActiveDirectoryMSI support for bulk copy FIX: Add ActiveDirectoryMSI support for bulk copy May 12, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for Authentication=ActiveDirectoryMSI (managed identity) token acquisition for the bulk copy (mssql-py-core) path, including user-assigned MSI via UID=<client_id>.

Changes:

  • Add AuthType.MSI (activedirectorymsi) and map it to ManagedIdentityCredential.
  • Thread optional credential_kwargs through token acquisition and update the credential-instance cache key to incorporate kwargs.
  • Attempt to extract MSI client_id from the connection string and pass it into bulk copy token acquisition; add tests and a changelog entry.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
mssql_python/constants.py Adds AuthType.MSI enum value.
mssql_python/auth.py Adds MSI credential support, credential kwargs plumbing, and cache key changes; adds extract_credential_kwargs.
mssql_python/cursor.py Bulk copy now tries to extract MSI credential kwargs and pass them to get_raw_token.
tests/test_008_auth.py Adds tests for MSI auth type parsing, credential construction, cache isolation, and connection-string processing.
CHANGELOG.md Documents MSI support for bulk copy.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mssql_python/cursor.py Outdated
Comment thread mssql_python/auth.py Outdated
Comment thread tests/test_008_auth.py
@bewithgaurav bewithgaurav changed the title FIX: Add ActiveDirectoryMSI support for bulk copy FEAT: Add ActiveDirectoryMSI support for bulk copy May 13, 2026
@bewithgaurav bewithgaurav marked this pull request as draft May 13, 2026 12:15
Connection.__init__ overwrites self.connection_str with the sanitized
(UID-stripped) string returned by process_connection_string. The original
implementation re-parsed self.connection_str at bulkcopy time via
extract_credential_kwargs, which silently dropped the user-assigned MSI
client_id and degraded to system-assigned  a wrong-identity bug.MSI

Changes:

- process_connection_string now returns a 4-tuple including the captured
  credential_kwargs so callers can persist them.
- Connection.__init__ stores _credential_kwargs alongside _auth_type.
- cursor.bulkcopy() reads self.connection._credential_kwargs instead of
  re-parsing self.connection_str.
- The public extract_credential_kwargs helper is removed (it only existed
  to support the broken re-parse path; nothing else needs it).
- black --line-length=100 reformats (CI was red).

Tests:

- test_bulkcopy_path_preserves_user_assigned_msi_client_id: invokes
  cursor.bulkcopy() with a mocked mssql_py_core, patches
  AADAuth.get_raw_token to capture the args it receives, and asserts
  the captured credential_kwargs match Connection._credential_kwargs.
  Fails if cursor reverts to re-parsing self.connection.connection_str.
- test_credential_kwargs_persisted_for_user_assigned_msi: asserts
  Connection.__init__ stores _credential_kwargs from the 4-tuple.
- test_credential_kwargs_none_for_system_assigned_msi.
- test_credential_kwargs_none_for_non_msi_auth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bewithgaurav bewithgaurav marked this pull request as ready for review May 14, 2026 12:31
@github-actions github-actions Bot added the pr-size: medium Moderate update size label May 14, 2026
Comment thread mssql_python/auth.py Outdated
bewithgaurav and others added 3 commits May 14, 2026 18:40
Per @saurabh500's review: braced ODBC values like UID={hello=world} need
the canonical parser, not naive partition('='). Without this, the helper
returns '{hello=world}' verbatim and ManagedIdentityCredential rejects it.
Worse, a UID containing a literal ';' would be truncated.

_extract_msi_client_id now delegates to _ConnectionStringParser, which
handles braces, escaped '}}' inside braces, and '=' inside braced values
correctly. validate_keywords=False so the helper never raises on keys
the auth flow doesn't care about.

Tests:
- test_msi_braced_uid_value_is_unwrapped: UID={hello=world} -> 'hello=world'
- test_msi_braced_uid_with_semicolon_is_preserved: UID={abc;def;ghi}

Note: process_connection_string and extract_auth_type still use naive
split(';') for Authentication= detection across all Entra ID auth types.
That's pre-existing and tracked separately for a parser-wide refactor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread mssql_python/auth.py
Comment thread mssql_python/auth.py
- Debug log distinguishing user-assigned vs system-assigned MSI when the
  user passes Authentication=ActiveDirectoryMSI. Helps diagnose which
  branch was taken when token acquisition fails. Logs client_id length,
  not value (still identity material).
- Comment above _credential_cache explains the cache key shape so the
  unbounded growth is understood as a deliberate choice rather than an
  oversight.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 14, 2026

📊 Code Coverage Report

🔥 Diff Coverage

100%


🎯 Overall Coverage

25%


📈 Total Lines Covered: 7039 out of 27155
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/auth.py (100%)
  • mssql_python/connection.py (100%)
  • mssql_python/constants.py (100%)

Summary

  • Total: 34 lines
  • Missing: 0 lines
  • Coverage: 100%

📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.build._deps.simdutf-src.src.haswell.implementation.cpp: 0.4%
mssql_python.pybind.build._deps.simdutf-src.src.implementation.cpp: 6.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.implementation.h: 10.4%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.scalar.utf16_to_utf8.utf16_to_utf8.h: 25.3%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.internal.isadetection.h: 65.3%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.2%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

Connection.__init__ already parses the same connection string through
_ConnectionStringParser via _construct_connection_string (connection.py
line 253) before process_connection_string is ever called. By the time
_extract_msi_client_id runs, the input is guaranteed parseable.

The try/except was dead code. A real parse failure here would indicate
an upstream bug and should propagate, not silently degrade user-assigned
MSI to system-assigned (which is the wrong-identity failure mode this PR
exists to prevent).

Brings diff coverage to 100%.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
saurabh500
saurabh500 previously approved these changes May 14, 2026
Comment thread mssql_python/auth.py
bewithgaurav added a commit that referenced this pull request May 15, 2026
…tructure

Brings in the ActiveDirectoryMSI feature (PR #573, mostly approved) so SP
can use the shared module-level credential cache that PR #573 introduces,
instead of the closure-scoped cache the SP commit shipped with.

Conflicts resolved (kept both contributions):
- mssql_python/constants.py: AuthType.MSI + AuthType.SERVICE_PRINCIPAL
- mssql_python/auth.py: _parse_tenant_id + ServicePrincipalAuth +
  _extract_msi_client_id; both elif branches in process_auth_parameters;
  both keys in extract_auth_type's auth_map
- mssql_python/cursor.py: SP factory branch on _auth_type=='serviceprincipal',
  Model A branch threads credential_kwargs through AADAuth.get_raw_token
  (MSI's contract change)
- tests/test_008_auth.py: MockClientSecretCredential + MockManagedIdentityCredential
  in setup_azure_identity, both sets of TestProcessAuthParameters /
  TestExtractAuthType cases

ServicePrincipalAuth.make_token_factory now uses the shared
_credential_cache + _credential_cache_key (introduced for MSI), keyed by
('serviceprincipal', tenant_id, client_id). Closure-scoped cache and the
TODO(#573-followup) note removed. client_secret is intentionally NOT in
the cache key because credentials are looked up by identity, not secret;
a rotated secret will surface as ClientAuthenticationError, not a silent
hit on the wrong credential.

Tests: 86 passed in tests/test_008_auth.py (was 75 SP-only). Black clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread mssql_python/connection.py
Tracking refactor (parse-once, thread the parsed map through the auth
path) is a separate follow-up; this docstring helps anyone reaching
into the function before that lands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bewithgaurav bewithgaurav merged commit 46ae4ba into main May 21, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: medium Moderate update size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants