feat: add MSSQL/Fabric support to data-parity skill#705
feat: add MSSQL/Fabric support to data-parity skill#705suryaiyer95 wants to merge 11 commits intofeat/data-parity-skill-improvementsfrom
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
5 issues found across 30 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>
<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>
<file name="packages/opencode/src/altimate/native/connections/registry.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>
<file name="packages/opencode/src/altimate/tools/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Bun's runtime never fires native addon async callbacks, so the async `new duckdb.Database(path, opts, callback)` form would hit the 2-second timeout fallback on every connection attempt. Switch to the synchronous constructor form `new duckdb.Database(path)` / `new duckdb.Database(path, opts)` which throws on error and completes immediately in both Node and bun runtimes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The async callback form with 2s fallback was already working correctly at e3df5a4. The timeout was caused by a missing duckdb .node binary, not a bun incompatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql, fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes critical serde mismatch where Rust engine rejects raw warehouse type names. - Update both `resolveDialect()` functions to use the mapping - Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col) - Add locale-safe date literal casting via CONVERT(DATE, ..., 23) - Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver) - Add `fabric` normalize aliases in normalize.ts - Add 15 SQL Server driver unit tests (TOP injection, truncation, schema introspection, connection lifecycle, result format) - Add 9 dialect mapping unit tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`: `azure-active-directory-password`, `access-token`, `service-principal-secret`, `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential` - Force TLS encryption for all Azure AD connections - Dynamic import of `@azure/identity` for `DefaultAzureCredential` - Add normalize aliases for Azure AD config fields (`authentication`, `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`) - Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES - Add 10 Azure AD unit tests covering all auth flows, encryption, and `DefaultAzureCredential` with managed identity Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LL.md - Add SQL Server / Fabric schema inspection query in Step 2 - Add "SQL Server and Microsoft Fabric" section with: - Supported configurations table (sqlserver, mssql, fabric) - Fabric connection guide with Azure AD auth types - Algorithm behavior notes (joindiff vs hashdiff selection) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rscore column filter
- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
`@azure/identity` internally and creates credentials — avoids bun CJS/ESM
`isTokenCredential` boundary issue that caused "not an instance of the token
credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
`msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
columns — it stripped legitimate aliases like `_p` used by partition discovery,
causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
real MSSQL server (profile, joindiff, auto, where_clause, partitioned).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lattening - Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19) - Use explicit `ConnectionPool` instead of global `mssql.connect()` to isolate multiple simultaneous connections - Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g. `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string key; restore positional column values - Proper column name resolution: compare `namedKeys.length` against flattened row length, fall back to synthetic `col_0`, `col_1`, etc. - Update test mock to export `ConnectionPool` class and `createMockPool` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions
Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
403477e to
811c2be
Compare
|
@suryaiyer95 can you address code review comments? |
- P1: restrict `flattenRow` to only spread the empty-string key (`""`) where mssql merges unnamed columns, preserving legitimate array values - P2: escape single quotes in `partitionValue` for date-mode branches in `buildPartitionWhereClause` (categorical mode already escaped) - P2: add `fabric` to `PASSWORD_DRIVERS` set in registry for consistent password validation alongside `sqlserver`/`mssql` - P2: fallback to `"(no values)"` when `d.values` is nullish to prevent template literal coercing `undefined` to the string `"undefined"` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- sqlserver-unit: 3 tests for unnamed column flattening — verifies only the empty-string key is spread, legitimate named arrays are preserved - driver-normalize: fabric type uses SQLSERVER_ALIASES (server → host, trustServerCertificate → trust_server_certificate) - connections: fabric type is recognized in DRIVER_MAP and listed correctly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Minimum Version Requirements" table to SKILL.md covering SQL Server 2022+, mssql v12, and @azure/identity v4 with rationale for each - Document auth shorthands (CLI, default, password, service-principal, msi) - Move @azure/identity from dependencies to optional peerDependencies so it is NOT installed by default — only required for Azure AD auth - Add runtime check in sqlserver driver: if Azure AD auth type is requested but @azure/identity is missing, throw a clear install instruction error Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
connect/execute/listSchemas/listTables/describeTable/closewith T-SQLTOPinjection,sys.*catalog queries, and row flattening for unnamed columnsdefault,password,access-token,service-principal-secret,msi-vm,msi-app-service) with shorthand aliases (cli,default,password,service-principal,msi)sqlserver/mssql→tsql,fabric→fabricwithDATETRUNC()andCONVERT(DATE, ..., 23)for date partitioningConnectionPoolisolation for concurrent connections, unnamed-column array flattening, synthetic column name fallbackKey files
packages/drivers/src/sqlserver.ts,packages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/src/altimate/tools/data-diff.ts.opencode/skills/data-parity/SKILL.mdTest plan
bun test packages/drivers/test/sqlserver-unit.test.ts— 28 tests passbun test packages/opencode/test/altimate/data-diff-dialect.test.ts— 9 tests pass🤖 Generated with Claude Code
Summary by cubic
Adds end-to-end MSSQL and Microsoft Fabric support to the data-parity skill, with Azure AD auth, Rust dialect mapping (
tsql/fabric), and an upgradedmssqlv12 driver. Documents minimum versions and makes@azure/identityan optional peer dependency required only for Azure AD.New Features
data_diff(auto, joindiff, hashdiff, profile, cascade) with date partitioning viaDATETRUNC()andCONVERT(DATE, ..., 23); mapssqlserver/mssql→tsql,fabric→fabric; improved query detection (keywords + whitespace).default,password,service-principal,msi,CLI), TLS enforced for AAD, runtime check for optional@azure/identity,ConnectionPoolisolation, T‑SQLTOPinjection withExecuteOptions.noLimit, robust result shaping for unnamed columns, and synthetic column names when needed.fabricto reuse the SQL Server driver; add normalize aliases forauthenticationand Azure fields; SKILL docs include minimum versions and a Fabric connection guide;mssqlupgraded to v12.Bug Fixes
noLimitbypasses injectedTOP; stable SQL‑vs‑table detection.CONVERT(DATE, ..., 23); show "(no values)" when diff rows have no values; addedfabricto password‑driver validation.Written for commit 333a45c. Summary will update on new commits.