From 860cee21fa2429fa9ff48b9a89de36ab01c98c64 Mon Sep 17 00:00:00 2001 From: Josh Crittenden Date: Tue, 5 May 2026 14:29:46 -0400 Subject: [PATCH] Add Power BI skills: Best Practices Analyzer and Reverse Engineer Two new community skills for Power BI + Snowflake workflows: - powerbi-best-practices-analyzer: Four-domain audit (data modeling, DAX, Power Query, performance/security) with prioritized findings and Snowflake SQL fixes - powerbi-reverse-engineer: Converts Power BI semantic models (.pbit/.pbix or live via MCP) into Snowflake Semantic View DDL .... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code) Co-Authored-By: Cortex Code --- .gitignore | 1 + .../powerbi-best-practices-analyzer/LICENSE | 189 +++++++++++ .../powerbi-best-practices-analyzer/README.md | 44 +++ .../powerbi-best-practices-analyzer/SKILL.md | 194 ++++++++++++ .../data-modeling/SKILL.md | 259 +++++++++++++++ .../dax-measures/SKILL.md | 174 ++++++++++ .../performance-security/SKILL.md | 176 +++++++++++ .../power-query/SKILL.md | 182 +++++++++++ skills/powerbi-reverse-engineer/LICENSE | 189 +++++++++++ skills/powerbi-reverse-engineer/README.md | 43 +++ skills/powerbi-reverse-engineer/SKILL.md | 296 ++++++++++++++++++ 11 files changed, 1747 insertions(+) create mode 100644 .gitignore create mode 100644 skills/powerbi-best-practices-analyzer/LICENSE create mode 100644 skills/powerbi-best-practices-analyzer/README.md create mode 100644 skills/powerbi-best-practices-analyzer/SKILL.md create mode 100644 skills/powerbi-best-practices-analyzer/data-modeling/SKILL.md create mode 100644 skills/powerbi-best-practices-analyzer/dax-measures/SKILL.md create mode 100644 skills/powerbi-best-practices-analyzer/performance-security/SKILL.md create mode 100644 skills/powerbi-best-practices-analyzer/power-query/SKILL.md create mode 100644 skills/powerbi-reverse-engineer/LICENSE create mode 100644 skills/powerbi-reverse-engineer/README.md create mode 100644 
skills/powerbi-reverse-engineer/SKILL.md diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..e43b0f98 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.DS_Store diff --git a/skills/powerbi-best-practices-analyzer/LICENSE b/skills/powerbi-best-practices-analyzer/LICENSE new file mode 100644 index 00000000..76ff4c06 --- /dev/null +++ b/skills/powerbi-best-practices-analyzer/LICENSE @@ -0,0 +1,189 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work. + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to the Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by the Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding any notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + Copyright 2026 Josh Crittenden + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/skills/powerbi-best-practices-analyzer/README.md b/skills/powerbi-best-practices-analyzer/README.md new file mode 100644 index 00000000..11bdd5a2 --- /dev/null +++ b/skills/powerbi-best-practices-analyzer/README.md @@ -0,0 +1,44 @@ +# Power BI Best Practices Analyzer + +Audit your Power BI semantic models against Snowflake and Power BI best practices, all from within Cortex Code. This skill performs a four-domain analysis (Data Modeling, DAX Measures, Power Query/Connector, Performance/Security) and produces a prioritized findings report with severity ratings and actionable Snowflake SQL/DDL fixes. + +## How It Works + +Provide a `.pbit` or `.pbix` file, or connect to a live model via the [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp), and the skill will: + +1. **Extract** the semantic model metadata (tables, columns, relationships, DAX measures, M expressions) +2. **Analyze** across four domains using specialized sub-rules: + - **Data Modeling & Relationships** — star schema adherence, calculated columns that should be pushed to Snowflake, cardinality issues + - **DAX Measures** — CALCULATE overuse, IF.EAGER patterns, base measure definitions, organization + - **Power Query & Connector** — transforms that belong in Snowflake, ODBC vs native connector, custom SQL usage + - **Performance & Security** — RLS implementation, bidirectional cross-filtering, high-cardinality slicers, DirectQuery anti-patterns +3. 
**Generate** a prioritized report with CRITICAL/HIGH/MEDIUM/LOW findings and complete Snowflake SQL fixes + +## Usage + +``` +$powerbi-bpa analyze my Power BI file at ~/reports/sales-model.pbit +``` + +``` +$powerbi-bpa audit the model open in Power BI Desktop called "Sales Analytics" +``` + +``` +$powerbi-bpa analyze semantic model "Finance Model" in workspace "Finance Team" +``` + +## Output + +- `_bpa_report.md` — Prioritized findings report with a summary scorecard and Snowflake DDL fixes for every applicable finding + +## Prerequisites + +- A `.pbit` or `.pbix` file, **or** the [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp) running with access to your model +- Cortex Code with the skill installed + +## License + +Apache License 2.0 — see [LICENSE](LICENSE) + +**Author**: Josh Crittenden diff --git a/skills/powerbi-best-practices-analyzer/SKILL.md b/skills/powerbi-best-practices-analyzer/SKILL.md new file mode 100644 index 00000000..25f4ccf8 --- /dev/null +++ b/skills/powerbi-best-practices-analyzer/SKILL.md @@ -0,0 +1,194 @@ +--- +id: powerbi-best-practices-analyzer +name: powerbi-best-practices-analyzer +skill-name: $powerbi-bpa +description: "Analyze a Power BI semantic model against Snowflake and Power BI best practices. Produces a prioritized findings report with actionable SQL fixes." 
+prompt: "$powerbi-bpa analyze my Power BI file at /path/to/report.pbit" +language: en +status: Published +author: Josh Crittenden +type: community +--- + +# Power BI Best Practices Analyzer + +# When to Use +- User provides a `.pbit` or `.pbix` file and wants to audit it against best practices +- User wants to improve Power BI + Snowflake performance +- User wants to identify DAX anti-patterns, connector issues, or modeling problems +- User has the Power BI Modeling MCP Server running and wants to audit an open model +- Do NOT use for reverse engineering into a Snowflake Semantic View (use `$powerbi-reverse-engineer` instead) + +# What This Skill Provides +Performs a four-domain audit of a Power BI semantic model (Data Modeling, DAX Measures, Power Query/Connector, Performance/Security) and generates a prioritized findings report with severity ratings and Snowflake SQL/DDL fixes. + +# References +- [Snowflake + Power BI Best Practices](https://medium.com/snowflake/snowflake-and-power-bi-best-practices-and-recent-improvements-183e2d970c0c) +- [Microsoft Power BI Optimization Guide](https://learn.microsoft.com/en-us/power-bi/guidance/power-bi-optimization) +- [DirectQuery Model Guidance](https://learn.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance) +- [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp) + +# Instructions + +## Step 1: Determine Input Method + +**Ask** the user how they want to provide their Power BI model. Two methods are supported: + +### Option A: File-based (.pbit or .pbix) + +If the user provides a file path: + +1. **Extract** the DataModelSchema from the archive: + ```bash + mkdir -p /tmp/pbi_bpa + unzip -o "" DataModelSchema -d /tmp/pbi_bpa + iconv -f UTF-16LE -t UTF-8 /tmp/pbi_bpa/DataModelSchema > /tmp/pbi_bpa/model.json + ``` + +2. 
**Parse** into structured context:
+   ```python
+   import json
+   with open('/tmp/pbi_bpa/model.json', encoding='utf-8-sig') as f:  # utf-8-sig tolerates a BOM carried over by iconv
+       data = json.load(f)
+   model = data.get('model', {})
+   tables = model.get('tables', [])
+   relationships = model.get('relationships', [])
+   roles = model.get('roles', [])
+   ```
+
+### Option B: Power BI Modeling MCP Server
+
+If the user has the [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp) running (available via `npx @microsoft/powerbi-modeling-mcp@latest --start`):
+
+1. **Connect** to the model using MCP tools:
+   - For Power BI Desktop: `Connect to '[File Name]' in Power BI Desktop`
+   - For Fabric workspace: `Connect to semantic model '[Model Name]' in Fabric Workspace '[Workspace Name]'`
+
+2. **Retrieve** the model metadata using MCP tool calls:
+   - Use `model_operations` to get overall model structure
+   - Use `table_operations` (list) to get all tables
+   - Use `column_operations` (list) for each table to get columns with data types
+   - Use `measure_operations` (list) for each table to get DAX measures
+   - Use `relationship_operations` (find) to get all relationships
+   - Use `security_role_operations` to get RLS definitions
+   - Use `named_expression_operations` (list) to get Power Query M expressions
+
+**Detect storage mode:** M expressions containing `Snowflake.Databases(` indicate a table sourced from Snowflake (DirectQuery or Import). Tables with `partitions[].source.type == "calculated"` are DAX-calculated.
+
+**Output:** Parsed model context with tables, columns, measures, relationships, and M expressions.
+
+## Step 2: Run Analysis Across Four Domains
+
+**Execute** analysis across four specialist domains in sequence, each appending to a shared `findings[]` list.
+
+**2a.
Data Modeling & Relationships** +→ **Load** `data-modeling/SKILL.md` +→ Checks: star schema adherence, calculated tables/columns that should be pushed to Snowflake, relationship cardinality and cross-filtering patterns, numeric precision issues, missing column metadata +→ Returns: findings for categories A + B + +**2b. DAX Measures** +→ **Load** `dax-measures/SKILL.md` +→ Checks: base measure definitions, CALCULATE overuse, IF.EAGER patterns, format string issues, measure table organization, time intelligence placement +→ Returns: findings for category C + +**2c. Power Query & Connector** +→ **Load** `power-query/SKILL.md` +→ Checks: M expression transforms that should be pushed to Snowflake, custom SQL usage, ODBC vs. native Snowflake connector, relative date filtering, non-Snowflake sources +→ Returns: findings for category D + +**2d. Performance & Security** +→ **Load** `performance-security/SKILL.md` +→ Checks: RLS implementation, MEDIAN on large tables (use PERCENTILE_CONT in Snowflake instead), bidirectional cross-filtering, high-cardinality slicer patterns, DirectQuery-specific anti-patterns +→ Returns: findings for category E + +**Output:** Combined findings list with severity, category, affected object, description, and fix. + +## Step 3: Generate Report + +**Format** findings into a prioritized report: + +``` +## Power BI Best Practices Analysis Report +File: +Date: +Total Findings: N (X critical, X high, X medium, X low) + +--- +### CRITICAL Findings +[rule ID] [table/measure] Description | Fix + +### HIGH Findings +... + +### MEDIUM Findings +... + +### LOW Findings +... + +--- +### Summary Scorecard +| Category | Critical | High | Medium | Low | +|----------------------|----------|------|--------|-----| +| A. Data Modeling | | | | | +| B. Relationships | | | | | +| C. DAX Measures | | | | | +| D. Power Query | | | | | +| E. Performance/RLS | | | | | +| Total | | | | | + +### Top 5 Recommended Actions +1. ... 
+``` + +Include complete Snowflake SQL/DDL for every applicable finding. + +**⚠️ STOPPING POINT:** Present findings to user and wait for confirmation before saving. + +## Step 4: Save & Present + +**Save** the report as `_bpa_report.md` in the same directory as the input file (or the current working directory if using MCP). + +**Offer** to: +1. Deep-dive any specific finding +2. Generate complete Snowflake DDL for all suggested views/tables +3. Re-run analysis on a single domain after changes +4. Apply fixes directly via MCP (if connected via MCP server) + +**If error occurs:** +- File not a valid ZIP: Check file extension, try different encoding +- DataModelSchema not found: File may be corrupted or incompatible PBI version +- UTF-16 decode fails: Try UTF-16BE instead of UTF-16LE +- MCP connection fails: Verify Power BI Desktop is open or Fabric workspace is accessible +- Unknown error: Ask user for guidance + +# Severity Guide + +| Level | Meaning | +|-------|---------| +| CRITICAL | Incorrect results or severe performance degradation | +| HIGH | Significant perf impact or major architectural anti-pattern | +| MEDIUM | Best practice violation with moderate impact | +| LOW | Minor improvement, documentation, or maintenance issue | + +# Stopping Points +- ✋ After Step 3 — Review findings before saving the report + +**Resume rule:** Upon user approval, proceed directly to Step 4. + +# Output +- `_bpa_report.md` — Prioritized findings report with Snowflake SQL fixes + +# Examples + +## Example 1: File-based analysis +User: $powerbi-bpa analyze my Power BI file at ~/reports/sales-model.pbit +Assistant: Extracts DataModelSchema, runs four-domain analysis, presents findings report with severity ratings and Snowflake DDL fixes. 
+ +## Example 2: MCP-based analysis +User: $powerbi-bpa audit the model open in Power BI Desktop called "Sales Analytics" +Assistant: Connects via Power BI Modeling MCP Server, retrieves model metadata via MCP tools, runs analysis, presents findings. Offers to apply fixes directly via MCP. + +## Example 3: Fabric workspace model +User: $powerbi-bpa analyze semantic model "Finance Model" in workspace "Finance Team" +Assistant: Connects to Fabric workspace via MCP, retrieves model metadata, runs analysis, presents findings report. diff --git a/skills/powerbi-best-practices-analyzer/data-modeling/SKILL.md b/skills/powerbi-best-practices-analyzer/data-modeling/SKILL.md new file mode 100644 index 00000000..7f7f5971 --- /dev/null +++ b/skills/powerbi-best-practices-analyzer/data-modeling/SKILL.md @@ -0,0 +1,259 @@ +--- +name: pbi-bpa-data-modeling +description: Data modeling and relationships sub-skill for the Power BI Best Practices Analyzer. +parent_skill: powerbi-best-practices-analyzer +--- + +# Data Modeling & Relationships Checks + +Run all rules below. For each violation, append to findings[]: +`{id, severity, affected: [table/column names], description, fix_summary, sql_ddl}` + +--- + +## Category A: Data Modeling (Snowflake-First) + +**A1 — CRITICAL: Logic in Power Query Instead of Snowflake** + +*Check:* M expression contains any of: `Table.AddColumn`, `Table.SelectRows`, `Table.Group`, `Table.Join`, `Table.NestedJoin`, `Table.TransformColumns`, or complex string/date functions. + +*Violation:* Data transformation logic that should live in Snowflake is embedded in the Power BI model. 
+
+*Fix:* Create a Snowflake VIEW or DYNAMIC TABLE implementing the same logic:
+```sql
+CREATE OR REPLACE VIEW DB.SCHEMA.V_TABLE AS
+SELECT *, COL_A || COL_B AS FK_KEY -- replaces Table.AddColumn
+FROM DB.SCHEMA.SOURCE_TABLE
+WHERE STATUS = 'Active'; -- replaces Table.SelectRows
+```
+
+---
+
+**A2 — HIGH: Calculated Tables (DAX) Should Be Physical Snowflake Tables**
+
+*Check:* `partitions[].source.type == "calculated"` on any table.
+
+*Violation:* DAX-computed tables consume memory and slow model refresh.
+
+*Fix for date/calendar tables:*
+```sql
+CREATE OR REPLACE TABLE DB.SCHEMA.DIM_DATE AS
+SELECT
+    DATEADD(DAY, SEQ4(), '2015-01-01'::DATE) AS DATE,
+    YEAR(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) AS YEAR,
+    MONTH(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) AS MONTH_NUM,
+    TO_CHAR(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE), 'MMMM') AS MONTH_NAME,
+    QUARTER(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) AS QUARTER_NUM,
+    YEAR(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) * 100 + QUARTER(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) AS YEAR_QUARTER,
+    CASE WHEN DAYOFWEEKISO(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) IN (6,7) THEN TRUE ELSE FALSE END AS WEEKEND_FLAG, -- ISO numbering: 6 = Saturday, 7 = Sunday, independent of WEEK_START
+    YEAR(DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) - YEAR(CURRENT_DATE()) AS RELATIVE_YEAR,
+    DATEDIFF(MONTH, CURRENT_DATE(), DATEADD(DAY, SEQ4(), '2015-01-01'::DATE)) AS RELATIVE_MONTH -- whole months from the current month, across year boundaries
+FROM TABLE(GENERATOR(ROWCOUNT => 7305))
+WHERE DATEADD(DAY, SEQ4(), '2015-01-01'::DATE) <= DATE_FROM_PARTS(YEAR(CURRENT_DATE())+10,12,31);
+```
+Note: `RELATIVE_YEAR` / `RELATIVE_MONTH` columns replace the need for Power Query relative date filtering. Because this is a static table, `CURRENT_DATE()` is evaluated when the table is built, so rebuild it on a schedule (or expose the relative columns through a view) to keep them current.
+
+---
+
+**A3 — HIGH: Calculated Columns (DAX) Should Live in Snowflake**
+
+*Check:* Any column with a non-empty `expression` field in DataModelSchema.
+
+*Violation:* DAX calculated columns are rebuilt on every refresh and compress less efficiently than source columns.
+
+*Fix:* Translate the DAX expression to SQL in a Snowflake VIEW.
Common translations: +- `YEAR([Date])` → `YEAR(date_col)` +- `FORMAT([Date], "YYYY-MM")` → `TO_CHAR(date_col, 'YYYY-MM')` +- `IF([Qty]>0,"Y","N")` → `CASE WHEN qty > 0 THEN 'Y' ELSE 'N' END` +- `RELATED(Dim[Name])` → JOIN in view +- `DATEDIFF("day",[Start],[End])` → `DATEDIFF('day', start_col, end_col)` +- `CONCATENATE([A],[B])` → `A || B` +- `LEFT([Col],3)` → `LEFT(col, 3)` + +--- + +**A4 — MEDIUM: Wide Flat Tables (Not Star Schema)** + +*Check:* Fact tables with >30 columns, or tables mixing dimensional attributes with numeric measures. + +*Violation:* Wide tables miss Snowflake's columnar compression advantages and slow Power BI's VertiPaq engine. + +*Fix:* Decompose into star schema in Snowflake. Create dimension views: +```sql +CREATE OR REPLACE VIEW DB.SCHEMA.DIM_PRODUCT AS +SELECT DISTINCT PRODUCT_ID, PRODUCT_NAME, CATEGORY, SUBCATEGORY FROM DB.SCHEMA.FLAT_FACT; + +CREATE OR REPLACE VIEW DB.SCHEMA.FACT_SALES AS +SELECT ORDER_ID, PRODUCT_ID, CUSTOMER_ID, DATE_ID, AMOUNT, QUANTITY FROM DB.SCHEMA.FLAT_FACT; +``` + +--- + +**A5 — MEDIUM: Unnecessary Columns Loaded** + +*Check:* Columns not referenced in any relationship, measure expression, or visible hierarchy. + +*Violation:* Every extra column occupies memory in the VertiPaq engine; unused columns provide no analytical value. + +*Fix:* Either remove from the Power BI model, or create a Snowflake view that excludes them: +```sql +CREATE OR REPLACE VIEW DB.SCHEMA.V_SLIM_TABLE AS +SELECT NEEDED_COL1, NEEDED_COL2, NEEDED_COL3 -- only include necessary columns +FROM DB.SCHEMA.FULL_TABLE; +``` + +--- + +**A6 — LOW: No Primary Keys Defined** + +*Check:* Tables missing surrogate key columns (`*_ID`, `*_KEY`, `*_SK`). + +*Violation:* Relationships without PK constraints allow duplicates and produce incorrect aggregations. 
+ +*Fix:* +```sql +ALTER TABLE DB.SCHEMA.DIM_CUSTOMER ADD PRIMARY KEY (CUSTOMER_ID); +-- Or ensure uniqueness with a UNIQUE constraint +ALTER TABLE DB.SCHEMA.DIM_CUSTOMER ADD CONSTRAINT uq_customer UNIQUE (CUSTOMER_ID); +``` + +--- + +**A7 — MEDIUM: Columns with Excessive Numeric Precision** + +*Check:* Columns with `dataType == "double"` — inspect name for type hints: +- Currency (`amount`,`price`,`cost`,`revenue`,`total`,`value`,`balance`) → flag, recommend `NUMBER(18,2)` +- Integer-like (`id`,`count`,`qty`,`num`,`flag`,`year`,`month`,`day`,`age`,`rank`) → flag, recommend `INTEGER` +- Percentage (`pct`,`percent`,`ratio`,`rate`,`share`) → flag, recommend `NUMBER(10,4)` + +*Violation:* `double` (64-bit float) wastes memory and causes floating-point precision issues in aggregations. + +*Fix:* +```sql +CREATE OR REPLACE VIEW DB.SCHEMA.V_ORDERS AS +SELECT + ORDER_ID::INTEGER AS ORDER_ID, + CUSTOMER_ID::INTEGER AS CUSTOMER_ID, + ROUND(UNIT_PRICE,2)::NUMBER(18,2) AS UNIT_PRICE, + QUANTITY::INTEGER AS QUANTITY +FROM DB.SCHEMA.ORDERS; +-- Also consider: ODBC_TREAT_DECIMAL_AS_INT = TRUE for integer-valued DECIMAL columns +``` + +--- + +**A8 — HIGH: Auto Date/Time Tables Hidden in Model** + +*Check:* Calculated tables whose name matches pattern `DateTableTemplate_*` or `LocalDateTable_*`, OR any calculated table expression containing `CALENDAR(` or `CALENDARAUTO(` and no corresponding physical date table exists. + +*Violation:* Power BI auto date/time creates a hidden calculated table for every date column — can double model size. + +*Fix:* Disable Auto date/time in Power BI Desktop (File → Options → Data Load → uncheck "Auto date/time"). Create a single physical date table in Snowflake (see A2 DDL above). + +--- + +**A9 — MEDIUM: High-Cardinality Text Column Used as Relationship Key** + +*Check:* Relationship `fromColumn` or `toColumn` where the column `dataType == "string"` and column name suggests a code or identifier (`code`, `key`, `num`, `id`, `sku`, `ref`). 
+ +*Violation:* String-type join keys prevent hash encoding optimization and generate larger, slower queries. + +*Fix:* Add an integer surrogate key in Snowflake: +```sql +ALTER TABLE DB.SCHEMA.DIM_PRODUCT ADD COLUMN PRODUCT_SK INTEGER AUTOINCREMENT; +-- Or use HASH to create a stable integer key +ALTER TABLE DB.SCHEMA.DIM_PRODUCT ADD COLUMN PRODUCT_SK INTEGER + DEFAULT HASH(PRODUCT_CODE)::INTEGER; +``` + +--- + +**A10 — MEDIUM: GUID/UUID Columns in Relationships** + +*Check:* Relationship columns where name contains `guid`, `uuid`, or column name is exactly one of the relationship keys and `dataType == "string"` with 36-char format patterns. + +*Violation:* Power BI generates `CAST` operations on GUID joins — causes poor DirectQuery performance. + +*Fix:* Materialize an integer surrogate key in Snowflake and use that as the join key instead. + +--- + +**A11 — LOW: Primary Key Columns Not Hidden** + +*Check:* The one-side (PK) column of each relationship is visible (no `isHidden: true` flag). + +*Violation:* ID/key columns surfaced to report authors cause confusion and misuse (e.g., summing IDs). + +*Fix:* In Power BI, hide PK columns on dimension tables. Set `summarizeBy = "none"` on all ID columns. + +--- + +**A12 — LOW: Missing Table and Column Descriptions** + +*Check:* Tables or columns with empty/missing `description` field. + +*Violation:* Undescribed fields reduce discoverability and lead to misuse. + +*Fix:* Add descriptions in Power BI. Mirror documentation in Snowflake: +```sql +COMMENT ON COLUMN DB.SCHEMA.TABLE.COLUMN IS 'Description of what this column represents'; +COMMENT ON TABLE DB.SCHEMA.TABLE IS 'Description of this table'; +``` + +--- + +## Category B: Relationships + +**B1 — HIGH: Bi-Directional Relationships** + +*Check:* `crossFilteringBehavior == "bothDirections"`. + +*Violation:* Each bi-directional filter doubles the number of SQL queries generated per visual interaction. + +*Fix:* Switch to single-direction. 
If many-to-many is the root cause, resolve with a Snowflake bridge table: +```sql +CREATE OR REPLACE TABLE DB.SCHEMA.BRIDGE_PRODUCT_CATEGORY AS +SELECT DISTINCT PRODUCT_ID, CATEGORY_ID FROM DB.SCHEMA.FACT_SALES; +``` + +--- + +**B2 — HIGH: Many-to-Many Relationships** + +*Check:* Relationship cardinality is not one-to-many (infer: FK column has duplicates relative to PK). + +*Violation:* Many-to-many causes row duplication (fan-out) and incorrect aggregations. + +*Fix:* Resolve in Snowflake with a bridge/junction table or by deduplicating the dimension. + +--- + +**B3 — MEDIUM: Inactive Relationships** + +*Check:* `isActive == false`. + +*Violation:* Every inactive relationship requires `USERELATIONSHIP()` in each relevant measure — complexity debt and extra SQL queries. + +*Fix:* Consider materializing a separate Snowflake view per relationship path, or redesign to eliminate ambiguity. + +--- + +**B4 — LOW: Referential Integrity Not Set** + +*Check:* DirectQuery model detected (all M sources are Snowflake). Note as general recommendation. + +*Fix:* Enable "Assume Referential Integrity" on all DirectQuery relationships where Snowflake FK integrity is enforced. Forces `INNER JOIN` instead of `LEFT OUTER JOIN` — faster queries. + +--- + +**B5 — MEDIUM: Relationships on Calculated/Derived Columns** + +*Check:* Relationship `fromColumn` or `toColumn` matches a column that has an `expression` field (calculated column). + +*Violation:* Per Microsoft guidance: relationships on calculated columns embed expressions into every source query and prevent index usage. + +*Fix:* Materialize the concatenated/derived key in Snowflake as a physical column, then join on that. + +## Output + +Return `findings[]` array to the router (SKILL.md Step 3). 
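A minimal sketch of how two of the relationship rules above (B1 and B3) might be evaluated against the parsed DataModelSchema, emitting findings in the `{id, severity, affected, description, fix_summary, sql_ddl}` shape described at the top of this sub-skill. The function name `check_relationships`, the sample model, and the `fromTable`/`toTable` field names are assumptions made for illustration; they are not part of the patch.

```python
from typing import Any


def check_relationships(model: dict[str, Any]) -> list[dict[str, Any]]:
    """Apply rules B1 and B3 to the relationships of a parsed model (illustrative only)."""
    findings: list[dict[str, Any]] = []
    for rel in model.get("relationships", []):
        # Assumed field names for the two ends of a relationship.
        affected = [
            f"{rel.get('fromTable')}[{rel.get('fromColumn')}]",
            f"{rel.get('toTable')}[{rel.get('toColumn')}]",
        ]
        # B1: bi-directional cross-filtering.
        if rel.get("crossFilteringBehavior") == "bothDirections":
            findings.append({
                "id": "B1",
                "severity": "HIGH",
                "affected": affected,
                "description": "Bi-directional relationship",
                "fix_summary": "Switch to single-direction filtering",
                "sql_ddl": None,
            })
        # B3: inactive relationship; a missing isActive key is treated as active.
        if rel.get("isActive", True) is False:
            findings.append({
                "id": "B3",
                "severity": "MEDIUM",
                "affected": affected,
                "description": "Inactive relationship",
                "fix_summary": "Materialize a view per relationship path or redesign",
                "sql_ddl": None,
            })
    return findings


# Hypothetical model fragment used only to exercise the checks.
sample_model = {
    "relationships": [
        {"fromTable": "Sales", "fromColumn": "ProductID",
         "toTable": "Product", "toColumn": "ProductID",
         "crossFilteringBehavior": "bothDirections"},
        {"fromTable": "Sales", "fromColumn": "ShipDate",
         "toTable": "Date", "toColumn": "Date",
         "isActive": False},
        {"fromTable": "Sales", "fromColumn": "OrderDate",
         "toTable": "Date", "toColumn": "Date"},
    ]
}

for finding in check_relationships(sample_model):
    print(finding["id"], finding["severity"], finding["affected"])
```

Keeping each rule as a small predicate over one relationship object makes it straightforward to add the remaining B-rules without touching the loop.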
diff --git a/skills/powerbi-best-practices-analyzer/dax-measures/SKILL.md b/skills/powerbi-best-practices-analyzer/dax-measures/SKILL.md new file mode 100644 index 00000000..13786220 --- /dev/null +++ b/skills/powerbi-best-practices-analyzer/dax-measures/SKILL.md @@ -0,0 +1,174 @@ +--- +name: pbi-bpa-dax-measures +description: DAX measures sub-skill for the Power BI Best Practices Analyzer. +parent_skill: powerbi-best-practices-analyzer +--- + +# DAX Measures Checks + +Run all rules below against all measures across all tables. Append violations to `findings[]`. + +--- + +## Category C: DAX Measures + +**C1 — HIGH: Repeated Aggregation Logic Without Base Measures** + +*Check:* Extract all `SUM(`, `COUNT(`, `AVERAGE(`, `MIN(`, `MAX(`, `DISTINCTCOUNT(` calls from all measure expressions. Find any raw aggregation sub-expression appearing in 3+ separate measures without a dedicated base measure that encapsulates it. + +*Algorithm:* +```python +import re +agg_pattern = re.compile(r"(?:SUM|COUNT|AVERAGE|MIN|MAX|DISTINCTCOUNT)\s*\([^\)]+\)")  # non-capturing group: findall() returns the full call, not just the function name +from collections import Counter +all_aggs = [] +for t in tables: + for m in t.get('measures', []): + expr = ''.join(m.get('expression', []) if isinstance(m.get('expression'), list) else [m.get('expression', '')]) + all_aggs.extend(set(agg_pattern.findall(expr)))  # set(): count each aggregation once per measure +duplicates = [agg for agg, count in Counter(all_aggs).items() if count >= 3] +``` + +*Violation:* Repeated raw aggregations across measures with no base measure. Every change requires updating N measures.
+ +*Fix:* Create dedicated base measures: +``` +-- Base measure (create first): +Total Sales = SUM(Sales[Amount]) + +-- Then reference it everywhere: +Total Sales YTD = TOTALYTD([Total Sales], Date[Date]) +Total Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date])) +Sales Growth % = DIVIDE([Total Sales] - [Total Sales LY], [Total Sales LY]) +-- NOT: Sales Growth % = DIVIDE(SUM(Sales[Amount]) - CALCULATE(SUM(Sales[Amount]),...), CALCULATE(SUM(Sales[Amount]),...)) +``` + +--- + +**C2 — MEDIUM: IF/SWITCH Without .EAGER Variant** + +*Check:* Measure expressions containing `IF(` or `SWITCH(` (case-insensitive, not already using `IF.EAGER` or `SWITCH.EAGER`). + +*Violation:* Standard `IF`/`SWITCH` in DirectQuery mode generates one SQL query per branch. `IF.EAGER`/`SWITCH.EAGER` evaluates all branches in a single query. + +*Fix:* Replace `IF(condition, a, b)` → `IF.EAGER(condition, a, b)` and `SWITCH(expr, v1,r1, v2,r2, else)` → `SWITCH.EAGER(expr, v1,r1, v2,r2, else)`. + +Note: `.EAGER` variants may slightly increase query cost in Import mode but significantly reduce query count in DirectQuery mode. + +--- + +**C3 — HIGH: Complex DAX Computed in Memory (Pre-aggregate in Snowflake)** + +*Check:* Measures containing `CALCULATETABLE(`, `SUMMARIZE(`, `ADDCOLUMNS(`, `CROSSJOIN(`, `TOPN(`, or `GENERATE(`. + +*Violation:* These functions force Power BI to pull large intermediate result sets into memory and process them locally — especially costly in DirectQuery mode. 
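The C3 check above can be sketched as a regular-expression scan over each measure expression. This is only an illustration: the helper name `find_table_functions` is hypothetical, and the scan does not skip DAX comments or string literals, so it may over-report.

```python
import re

# Functions listed in the C3 check; the trailing "\s*\(" keeps SUMMARIZECOLUMNS(
# and GENERATESERIES( from matching as SUMMARIZE( / GENERATE(.
TABLE_FUNCS = re.compile(
    r"\b(CALCULATETABLE|SUMMARIZE|ADDCOLUMNS|CROSSJOIN|TOPN|GENERATE)\s*\(",
    re.IGNORECASE,
)


def find_table_functions(expression):
    """Return the sorted, upper-cased C3 function names found in one DAX expression."""
    return sorted({m.group(1).upper() for m in TABLE_FUNCS.finditer(expression)})


print(find_table_functions("SUMX(SUMMARIZE(Sales, Customer[Region]), [Total Sales])"))
```

A measure would be flagged for C3 whenever the returned list is non-empty.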
+ +*Fix:* Pre-aggregate in Snowflake using a DYNAMIC TABLE or MATERIALIZED VIEW: +```sql +-- Instead of SUMMARIZE(Sales, Customer[Region], "Total", SUM(Sales[Amount])) +CREATE OR REPLACE DYNAMIC TABLE DB.SCHEMA.AGG_SALES_BY_REGION + TARGET_LAG = '1 hour' + WAREHOUSE = MY_WH +AS +SELECT REGION, SUM(AMOUNT) AS TOTAL_SALES, COUNT(*) AS ORDER_COUNT +FROM DB.SCHEMA.FACT_SALES +GROUP BY REGION; +``` + +--- + +**C4 — MEDIUM: CALCULATE Used Excessively** + +*Check:* Count the number of measures containing `CALCULATE(`. If >60% of all measures use `CALCULATE(`, flag. + +*Violation:* Per Microsoft DirectQuery guidance, heavy use of `CALCULATE` generates expensive native queries that don't perform well. Signals that filter context manipulation could instead be handled in Snowflake views. + +*Fix:* For common filter patterns, create pre-filtered Snowflake views: +```sql +-- Instead of CALCULATE([Sales], Region = "West") +CREATE OR REPLACE VIEW DB.SCHEMA.V_SALES_WEST AS +SELECT * FROM DB.SCHEMA.FACT_SALES WHERE REGION = 'West'; +-- Power BI measure becomes simply: SUM(V_SALES_WEST[AMOUNT]) +``` + +--- + +**C5 — MEDIUM: Measures Without Format Strings** + +*Check:* Measures with empty or missing `formatString` field. + +*Violation:* Unformatted measures display raw numbers — inconsistent UX, prone to misinterpretation. + +*Fix:* Assign format strings based on measure name patterns: +- `amount`, `sales`, `revenue`, `cost`, `price` → `"$#,##0.00"` +- `pct`, `percent`, `rate`, `ratio` → `"0.00%"` +- `count`, `qty`, `quantity`, `num` → `"#,##0"` +- Generic numeric → `"#,##0.00"` + +--- + +**C6 — MEDIUM: No Dedicated Measures Table** + +*Check:* Measures are scattered across multiple dimension and fact tables (each with >1 table having measures). + +*Violation:* Scattered measures reduce discoverability and make the model harder to maintain. + +*Fix:* Create a hidden `_Measures` table in Power BI and move all measures there. 
In Snowflake, document this convention with a semantic view. + +--- + +**C7 — MEDIUM: Time Intelligence in Power Query Instead of DAX** + +*Check:* M expressions containing `Date.IsInCurrentYear(`, `Date.IsInPreviousYear(`, `Date.IsInCurrentMonth(`, `DateTime.LocalNow()`, or `#duration(`. + +*Violation:* Per Microsoft DirectQuery guidance, Power Query relative date filters generate inefficient `CONVERT(datetime2, ...)` native queries instead of using the date table's relative columns. + +*Fix:* Add `RELATIVE_YEAR` and `RELATIVE_MONTH` columns to the Snowflake date table (included in A2 DDL above). Then use DAX time intelligence: +``` +-- Use DAX time intelligence against the date table, not M filters: +Sales YTD = TOTALYTD([Total Sales], Date[Date]) +Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date])) +``` + +--- + +**C8 — LOW: Implicit Measures on Numeric Columns** + +*Check:* Numeric columns in fact tables that don't have a corresponding explicit measure. Infer by checking if `summarizeBy` is not `"none"` for numeric columns (ID columns, flag columns). + +*Violation:* Auto-aggregation of numeric columns (e.g., summing an `ORDER_ID`) produces meaningless results and confuses report authors. + +*Fix:* For fact columns that should be aggregated, create explicit measures. For ID/flag/code columns, set `summarizeBy = "none"` in Power BI (or recommend adding to the model with this property). Alternatively, in Snowflake ensure ID columns use `INTEGER` type (non-summarizable by convention). + +--- + +**C9 — LOW: Measures Without Descriptions** + +*Check:* Measures with empty or missing `description` field. + +*Violation:* Undocumented measures lead to misuse and duplication. + +*Fix:* Add descriptions to all measures. Mirror business logic documentation in Snowflake: +```sql +COMMENT ON COLUMN DB.SCHEMA.SEMANTIC_VIEW.TOTAL_SALES IS 'Sum of all invoice amounts. 
Excludes cancelled orders.'; +``` + +--- + +**C10 — LOW: MEDIAN on Potentially High-Cardinality Data** + +*Check:* Measures containing `MEDIAN(` or `MEDIANX(`. + +*Violation:* Power BI does not push MEDIAN to Snowflake — it retrieves all detail rows and computes locally. For large tables this hits the 1M row limit and causes failures. + +*Fix:* Pre-compute median in Snowflake: +```sql +-- Replace MEDIAN(Sales[Price]) with a Snowflake pre-computed value +CREATE OR REPLACE VIEW DB.SCHEMA.V_MEDIAN_PRICE AS +SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY PRICE) AS MEDIAN_PRICE +FROM DB.SCHEMA.FACT_SALES; +``` + +## Output + +Return `findings[]` to the router (SKILL.md Step 3). diff --git a/skills/powerbi-best-practices-analyzer/performance-security/SKILL.md b/skills/powerbi-best-practices-analyzer/performance-security/SKILL.md new file mode 100644 index 00000000..b16290ea --- /dev/null +++ b/skills/powerbi-best-practices-analyzer/performance-security/SKILL.md @@ -0,0 +1,176 @@ +--- +name: pbi-bpa-performance-security +description: DirectQuery performance, RLS, and report design sub-skill for the Power BI Best Practices Analyzer. +parent_skill: powerbi-best-practices-analyzer +--- + +# Performance, Security & Report Design Checks + +Run all rules below. Append violations to `findings[]`. + +--- + +## Category E: Performance, Security & Report Design + +**E1 — MEDIUM: Row-Level Security Defined in Power BI** + +*Check:* `model.roles` array is non-empty. + +*Violation:* RLS defined in Power BI duplicates security logic that belongs in Snowflake. With Power BI RLS, data is fetched first and then filtered — Snowflake row access policies filter at the database level, which is more efficient and centralized. 
+ +*Fix:* Move RLS to Snowflake: +```sql +-- Row Access Policy (filter by current user's region) +CREATE OR REPLACE ROW ACCESS POLICY DB.SCHEMA.RAP_BY_REGION +AS (REGION_ARG VARCHAR) RETURNS BOOLEAN -> + CURRENT_USER() IN ( + SELECT m.USERNAME FROM DB.SCHEMA.USER_REGION_MAP m WHERE m.REGION = REGION_ARG -- argument renamed so it is not confused with the mapping table's REGION column + ); + +ALTER TABLE DB.SCHEMA.FACT_SALES ADD ROW ACCESS POLICY DB.SCHEMA.RAP_BY_REGION ON (REGION); + +-- Column Masking Policy (mask PII for non-privileged roles) +CREATE OR REPLACE MASKING POLICY DB.SCHEMA.MASK_EMAIL AS (VAL VARCHAR) RETURNS VARCHAR -> + CASE WHEN CURRENT_ROLE() = 'ANALYST_ROLE' THEN VAL + ELSE '***MASKED***' END; + +ALTER TABLE DB.SCHEMA.DIM_CUSTOMER MODIFY COLUMN EMAIL SET MASKING POLICY DB.SCHEMA.MASK_EMAIL; +``` +Enable SSO between Power BI and Snowflake so Snowflake policies auto-enforce per user. + +--- + +**E2 — HIGH: MEDIAN on Potentially Large Tables** + +*Check:* Any measure containing `MEDIAN(` or `MEDIANX(` applied to a fact table (a table with many rows, indicated by being the source of multiple relationship many-sides). + +*Violation:* MEDIAN is not pushed to Snowflake — Power BI retrieves ALL detail rows and computes locally. For large tables this causes 1M-row limit failures or extreme memory pressure. + +*Fix:* Pre-compute in Snowflake using `PERCENTILE_CONT`: +```sql +CREATE OR REPLACE VIEW DB.SCHEMA.V_MEDIAN_METRICS AS +SELECT + PRODUCT_ID, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY UNIT_PRICE) AS MEDIAN_UNIT_PRICE, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY QUANTITY) AS MEDIAN_QUANTITY +FROM DB.SCHEMA.FACT_SALES +GROUP BY PRODUCT_ID; +``` +Reference this view from Power BI instead of computing MEDIAN in DAX. + +--- + +**E3 — HIGH: DISTINCTCOUNT on High-Cardinality Columns in DirectQuery** + +*Check:* Measures containing `DISTINCTCOUNT(` on columns likely to have high cardinality (column name contains: `id`, `order`, `transaction`, `session`, `user`).
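One way to approximate the E3 check is to pull the column out of each `DISTINCTCOUNT(Table[Column])` call and test its name against the tokens listed above. The sketch below is a heuristic under stated assumptions: the helper name `high_cardinality_distinctcounts` is hypothetical, only direct `Table[Column]` arguments are recognized, and the substring test will also hit unrelated names such as `Paid` or `Width`.

```python
import re

# Matches DISTINCTCOUNT(Table[Column]) or DISTINCTCOUNT('Table Name'[Column]);
# group 1 is the column name without brackets.
DISTINCTCOUNT_ARG = re.compile(
    r"\bDISTINCTCOUNT\s*\(\s*(?:'[^']+'|[\w ]+)?\[([^\]]+)\]\s*\)",
    re.IGNORECASE,
)
HIGH_CARD_TOKENS = ("id", "order", "transaction", "session", "user")


def high_cardinality_distinctcounts(expression):
    """Return column names used in DISTINCTCOUNT whose name suggests high cardinality."""
    hits = []
    for match in DISTINCTCOUNT_ARG.finditer(expression):
        column = match.group(1)
        if any(token in column.lower() for token in HIGH_CARD_TOKENS):
            hits.append(column)
    return hits


print(high_cardinality_distinctcounts("DISTINCTCOUNT(Sales[CUSTOMER_ID])"))
```

Because the name test is only a hint, a result from this scan is better treated as a candidate for review than as a confirmed violation.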
+ +*Violation:* DISTINCTCOUNT is computed in Power BI locally in DirectQuery mode — causes extra query round-trips and high memory usage for large cardinality sets. Visual totals using DISTINCTCOUNT also require additional queries. + +*Fix:* Pre-aggregate distinct counts in Snowflake: +```sql +CREATE OR REPLACE DYNAMIC TABLE DB.SCHEMA.AGG_DISTINCT_CUSTOMERS + TARGET_LAG = '1 hour' WAREHOUSE = MY_WH AS +SELECT DATE_TRUNC('month', ORDER_DATE) AS MONTH, + REGION, + COUNT(DISTINCT CUSTOMER_ID) AS DISTINCT_CUSTOMERS +FROM DB.SCHEMA.FACT_SALES +GROUP BY 1, 2; +``` + +--- + +**E4 — MEDIUM: Excessive Visuals Pattern (Design Guidance)** + +*Check:* Count unique visual containers in the report layer. If the PBIT has report pages, note as general guidance. + +*Violation:* Per Microsoft guidance, each visual on a page generates its own DAX/SQL query. Pages with many visuals slow page load significantly in DirectQuery mode. + +*Fix recommendations:* +- Limit to 5-8 visuals per report page +- Use drillthrough pages instead of cramming detail visuals on summary pages +- Use bookmarks to toggle between visual states (reduce active visuals) +- Replace multiple card visuals with a single multi-row card +- Use Query Reduction options (File → Options → Query Reduction): add Apply button to slicers, disable cross-highlighting by default + +--- + +**E5 — MEDIUM: Cross-Filtering Not Constrained** + +*Check:* Bi-directional relationships detected (from data-modeling sub-skill) combined with DirectQuery source → note here as a report-design amplifier. + +*Violation:* Each cross-filter interaction in a DirectQuery report sends additional queries per visual. With many visuals and bi-directional filters, a single slicer click can trigger 10-20 Snowflake queries. 
+ +*Fix:* +- Switch off cross-highlighting/filtering for non-essential visual pairs (Edit Interactions in Power BI) +- Enable "Add an Apply button to each slicer" in Query Reduction settings +- Enable "Add a single Apply button to the page to apply all filter changes at once" + +--- + +**E6 — MEDIUM: Visual Totals Enabled with DISTINCTCOUNT or MEDIAN** + +*Check:* Measures using `DISTINCTCOUNT(` or `MEDIAN(` / `MEDIANX(` (already flagged in E2/E3). If present, note that visual totals require additional queries. + +*Violation:* Tables/matrices display totals by default. For DISTINCTCOUNT/MEDIAN measures, Power BI sends additional queries to compute these totals — can double query count per visual. + +*Fix:* Disable visual totals for DISTINCTCOUNT/MEDIAN measures in the Format pane of each visual, or pre-compute totals in Snowflake. + +--- + +**E7 — LOW: No Dedicated Snowflake Warehouse for Power BI** + +*Check:* All M expressions share the same `#"Snowflake Warehouse"` parameter. Note as a recommendation. + +*Violation:* Using a shared warehouse for Power BI reports creates resource contention with ETL, notebook, and other workloads. + +*Fix recommendations:* +- Create a dedicated reporting warehouse for Power BI: `CREATE WAREHOUSE PBI_WH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 600 AUTO_RESUME = TRUE;` +- Set auto-suspend to 10+ minutes (preserve warehouse cache for BI workloads) +- For high-concurrency: use multi-cluster warehouse with `MIN_CLUSTER_COUNT = 1, MAX_CLUSTER_COUNT = 3` +- For large Import refreshes: use a separate, larger warehouse that auto-suspends immediately after refresh + +--- + +**E8 — LOW: Snowflake Role Not Scoped to Minimum Privilege** + +*Check:* M expressions using a wildcard or admin-level Snowflake role (name contains `ADMIN`, `SYSADMIN`, `ACCOUNTADMIN`). + +*Violation:* Per Snowflake best practices, Power BI connections should use a role with access only to the required objects. 
Broad roles trigger excessive metadata queries (SHOW TABLES, SHOW SCHEMAS) that slow initial connection. + +*Fix:* +```sql +CREATE ROLE POWER_BI_ROLE; +GRANT USAGE ON DATABASE MY_DB TO ROLE POWER_BI_ROLE; +GRANT USAGE ON SCHEMA MY_DB.ANALYTICS TO ROLE POWER_BI_ROLE; +GRANT SELECT ON ALL TABLES IN SCHEMA MY_DB.ANALYTICS TO ROLE POWER_BI_ROLE; +GRANT SELECT ON ALL VIEWS IN SCHEMA MY_DB.ANALYTICS TO ROLE POWER_BI_ROLE; +GRANT SELECT ON FUTURE TABLES IN SCHEMA MY_DB.ANALYTICS TO ROLE POWER_BI_ROLE; +-- Then connect Power BI with Role = "POWER_BI_ROLE" +``` + +--- + +**E9 — MEDIUM: Composite Model Not Used for Large DirectQuery Models** + +*Check:* All tables in the model use DirectQuery (all M sources are Snowflake, no Import-mode tables). More than 5 tables detected. + +*Violation:* Per Microsoft guidance, Composite Models (mixing Import dimension tables with DirectQuery fact tables) significantly improve performance. Dimension tables in Import mode enable fast filtering without hitting Snowflake for every slicer interaction. + +*Fix:* Switch dimension tables (Customer, Product, Date, etc.) to Import storage mode in Power BI. Keep large fact tables in DirectQuery. This reduces Snowflake queries for filter/slice operations to near-zero. + +--- + +**E10 — LOW: Authentication Using Username/Password** + +*Check:* M expressions not containing `[Role=...]` or where the connection string pattern suggests basic auth. + +*Violation:* Snowflake is deprecating single-factor password authentication. Power BI should use OAuth/SSO or key-pair authentication. + +*Fix recommendations:* +- Enable Azure AD OAuth for Power BI → Snowflake SSO (Snowflake External OAuth) +- This ensures Power BI queries respect Snowflake row access policies per user identity +- See: [Snowflake OAuth for Power BI](https://docs.snowflake.com/en/user-guide/oauth-powerbi) + +## Output + +Return `findings[]` to the router (SKILL.md Step 3).
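For rule E8 above, the role name can be read from the `Role="..."` option of a Snowflake M source. The following is a rough sketch, assuming the role appears as a literal `Role="..."` option in the M expression; the helper name `broad_roles` and the sample strings are hypothetical.

```python
import re

# Captures the value of a Role="..." option inside an M expression
ROLE_OPTION = re.compile(r'\bRole\s*=\s*"([^"]+)"')


def broad_roles(m_expression):
    """Return role names whose name contains ADMIN (covers SYSADMIN, ACCOUNTADMIN)."""
    return [role for role in ROLE_OPTION.findall(m_expression) if "ADMIN" in role.upper()]


# Hypothetical M source line for illustration only
sample_m = 'Snowflake.Databases("account.snowflakecomputing.com", "MY_WH", [Role="SYSADMIN"])'
print(broad_roles(sample_m))
```

Roles supplied through a Power Query parameter rather than a literal would not be seen by this scan and would need the parameter value resolved first.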
diff --git a/skills/powerbi-best-practices-analyzer/power-query/SKILL.md b/skills/powerbi-best-practices-analyzer/power-query/SKILL.md new file mode 100644 index 00000000..8091a22a --- /dev/null +++ b/skills/powerbi-best-practices-analyzer/power-query/SKILL.md @@ -0,0 +1,182 @@ +--- +name: pbi-bpa-power-query +description: Power Query (M), connector, and storage mode sub-skill for the Power BI Best Practices Analyzer. +parent_skill: powerbi-best-practices-analyzer +--- + +# Power Query, Connector & Storage Mode Checks + +Run all rules below against M expressions and connection settings. Append violations to `findings[]`. + +--- + +## Category D: Power Query, Connector & Storage + +**D1 — CRITICAL: Data Transformation Logic in Power Query** + +*Check:* M expressions containing any transformation beyond a simple table reference: +- `Table.AddColumn(` — computed columns +- `Table.SelectRows(` or `Table.SelectColumns(` — row/column filtering +- `Table.Group(` — aggregations +- `Table.Join(` or `Table.NestedJoin(` — joins +- `Table.RenameColumns(`, `Table.TransformColumnTypes(` — structural changes +- `Text.`, `Date.`, `Number.` function calls — value transformations + +*Violation:* Per Snowflake and Microsoft guidance, all data preparation should happen upstream in Snowflake. Logic in Power Query is brittle, unmaintainable, and prevents query folding in DirectQuery mode. + +*Fix:* For each transform detected, generate the Snowflake equivalent: + +| M Pattern | Snowflake Fix | +|-----------|---------------| +| `Table.AddColumn(..., each [A] & [B])` | `SELECT *, A \|\| B AS NEW_COL FROM ...` | +| `Table.SelectRows(..., each [Status] = "Active")` | `CREATE VIEW ... AS SELECT * WHERE STATUS = 'Active'` | +| `Table.Group(..., {"Col"}, {{"Total", each List.Sum([Amt])}})` | `CREATE DYNAMIC TABLE ... AS SELECT COL, SUM(AMT) FROM ... 
GROUP BY COL` | +| `Table.Join(A, "ID", B, "ID")` | Implement join in a Snowflake VIEW | + +```sql +-- Example: replace complex M transforms with a clean Snowflake view +CREATE OR REPLACE VIEW DB.SCHEMA.V_TRANSFORMED AS +SELECT + ORDER_ID::INTEGER AS ORDER_ID, + CUSTOMER_ID || '-' || SOURCE AS FK_KEY, -- replaces AddColumn concat + UPPER(TRIM(PRODUCT_NAME)) AS PRODUCT_NAME, -- replaces Text transforms + ROUND(AMOUNT, 2) AS AMOUNT +FROM DB.SCHEMA.RAW_TABLE +WHERE STATUS = 'Active'; -- replaces SelectRows +``` + +--- + +**D2 — HIGH: Relative Date Filtering in Power Query** + +*Check:* M expressions containing `Date.IsInCurrentYear(`, `Date.IsInPreviousYear(`, `Date.IsInCurrentMonth(`, `Date.IsInCurrentQuarter(`, `DateTime.LocalNow()`, `#duration(`, or `Date.From(DateTime.LocalNow())`. + +*Violation:* Per Microsoft DirectQuery guidance, Power Query relative date filters translate to inefficient native queries using `CONVERT(datetime2, ...)` with hard-coded date literals that are not index-friendly. + +*Fix:* Add relative time columns to the Snowflake date table (replace hard-coded date filtering): +```sql +-- Add to DIM_DATE table or view: +ALTER TABLE DB.SCHEMA.DIM_DATE ADD COLUMN RELATIVE_YEAR INTEGER; +ALTER TABLE DB.SCHEMA.DIM_DATE ADD COLUMN RELATIVE_MONTH INTEGER; +ALTER TABLE DB.SCHEMA.DIM_DATE ADD COLUMN IS_CURRENT_YEAR BOOLEAN; +ALTER TABLE DB.SCHEMA.DIM_DATE ADD COLUMN IS_CURRENT_MONTH BOOLEAN; +ALTER TABLE DB.SCHEMA.DIM_DATE ADD COLUMN IS_CURRENT_QTR BOOLEAN; + +UPDATE DB.SCHEMA.DIM_DATE SET + RELATIVE_YEAR = YEAR(DATE) - YEAR(CURRENT_DATE()), + RELATIVE_MONTH = DATEDIFF('month', CURRENT_DATE(), DATE), + IS_CURRENT_YEAR = (YEAR(DATE) = YEAR(CURRENT_DATE())), + IS_CURRENT_MONTH = (DATE_TRUNC('month', DATE) = DATE_TRUNC('month', CURRENT_DATE())), + IS_CURRENT_QTR = (DATE_TRUNC('quarter', DATE) = DATE_TRUNC('quarter', CURRENT_DATE())); +``` +Then filter in Power BI using these columns with DAX: `CALCULATE([Sales], DimDate[IS_CURRENT_YEAR] = TRUE)`. 
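The D2 check above amounts to a substring scan of each table's M expression. A minimal sketch follows, assuming the M source text is available as a plain string; the helper name `relative_date_markers` is hypothetical.

```python
# Markers taken from the D2 check; Date.From(DateTime.LocalNow()) is covered
# by the DateTime.LocalNow() entry.
RELATIVE_DATE_MARKERS = (
    "Date.IsInCurrentYear(",
    "Date.IsInPreviousYear(",
    "Date.IsInCurrentMonth(",
    "Date.IsInCurrentQuarter(",
    "DateTime.LocalNow()",
    "#duration(",
)


def relative_date_markers(m_expression):
    """Return the D2 markers present in one M expression, in the order listed above."""
    return [marker for marker in RELATIVE_DATE_MARKERS if marker in m_expression]


print(relative_date_markers(
    "Table.SelectRows(Source, each Date.IsInCurrentYear([ORDER_DATE]))"
))
```

The scan is case-sensitive because M itself is case-sensitive, so a differently cased spelling would not be a valid call to the same function.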
+ +--- + +**D3 — HIGH: Custom SQL Hard-Coded in Power Query** + +*Check:* M expressions containing `Value.NativeQuery(`, `Query = "SELECT`, or `Sql.Database(` with a `Query` parameter. + +*Violation:* Custom SQL embeds logic inside the dataset definition — brittle, duplicates logic, hard to maintain, prevents query folding. + +*Fix:* Move SQL into a Snowflake VIEW or DYNAMIC TABLE: +```sql +-- Move this out of Power BI: +-- Query = "SELECT o.*, c.NAME FROM ORDERS o JOIN CUSTOMERS c ON o.CUST_ID = c.ID WHERE o.STATUS = 'Shipped'" +-- Into Snowflake: +CREATE OR REPLACE VIEW DB.SCHEMA.V_SHIPPED_ORDERS AS +SELECT o.*, c.NAME AS CUSTOMER_NAME +FROM DB.SCHEMA.ORDERS o +JOIN DB.SCHEMA.CUSTOMERS c ON o.CUST_ID = c.ID +WHERE o.STATUS = 'Shipped'; +-- Power BI connects directly to V_SHIPPED_ORDERS +``` + +--- + +**D4 — MEDIUM: Aggregation (Group By) in Power Query** + +*Check:* M expressions containing `Table.Group(`. + +*Violation:* Power Query aggregations are computed in Power BI memory on every refresh. For large tables this is slow and wasteful — Snowflake can do this more efficiently at source. + +*Fix:* Use a Snowflake DYNAMIC TABLE for automated refresh, or a MATERIALIZED VIEW for static aggregation: +```sql +CREATE OR REPLACE DYNAMIC TABLE DB.SCHEMA.AGG_DAILY_SALES + TARGET_LAG = '1 hour' + WAREHOUSE = REPORTING_WH +AS +SELECT + DATE_TRUNC('day', ORDER_DATE) AS ORDER_DAY, + PRODUCT_ID, + REGION, + SUM(AMOUNT) AS TOTAL_SALES, + COUNT(*) AS ORDER_COUNT, + SUM(QUANTITY) AS TOTAL_QTY +FROM DB.SCHEMA.FACT_ORDERS +GROUP BY 1, 2, 3; +``` + +--- + +**D5 — MEDIUM: Generic ODBC Connector Instead of Native Snowflake** + +*Check:* M expressions containing `Odbc.DataSource(` or `Odbc.Query(`. + +*Violation:* Generic ODBC bypasses Snowflake-specific connector optimizations: Arrow/ADBC fast transfers, query folding, Snowflake query tags, and the new ADBC connector improvements. 
+ +*Fix:* Replace with native Snowflake connector: +``` +// Replace: +Odbc.DataSource("dsn=MySnowflakeDSN", ...) +// With: +Snowflake.Databases("account.snowflakecomputing.com", "MY_WH", [Role="MY_ROLE"]) +``` + +--- + +**D6 — LOW: Non-Snowflake Data Sources Mixed In** + +*Check:* M expressions NOT containing `Snowflake.Databases(` (Excel, SharePoint, SQL Server, OData, etc.). + +*Violation:* Mixed sources prevent full DirectQuery optimization, complicate governance, and fragment data lineage. + +*Fix:* Land non-Snowflake data into Snowflake first using: +- Snowflake connectors (e.g., Fivetran, dbt, Snowpipe) +- External stages for file-based sources +- Snowflake's native Excel/CSV ingestion +Then connect Power BI to a single unified Snowflake source. + +--- + +**D7 — LOW: Include Relationship Columns Enabled** + +*Check:* Detect Snowflake connections and note as a general recommendation (not directly in DataModelSchema). + +*Violation:* "Include relationship columns" in the Snowflake connector advanced options causes excessive metadata queries, slowing initial model load and refresh. + +*Fix:* Uncheck "Include relationship columns" in Power Query → Snowflake connector → Advanced Options. This setting is especially impactful for accounts with many objects. + +--- + +**D8 — MEDIUM: Large Tables Fully Loaded in Import Mode** + +*Check:* Fact tables (identified by having >5 numeric columns and being referenced by multiple relationships) present in Import-mode model without visible row filtering or aggregation in M. + +*Violation:* Per Microsoft guidance, large fact tables should use DirectQuery storage mode (or be pre-aggregated). Full Import of large fact tables causes slow refresh and high memory usage. + +*Fix:* Switch large fact tables to DirectQuery, keep dimension tables in Import mode (Composite Model pattern). 
Or pre-summarize in Snowflake: +```sql +CREATE OR REPLACE DYNAMIC TABLE DB.SCHEMA.AGG_FACT_MONTHLY + TARGET_LAG = 'downstream' + WAREHOUSE = MY_WH +AS +SELECT DATE_TRUNC('month', ORDER_DATE) AS MONTH, PRODUCT_ID, REGION, + SUM(AMOUNT) AS SALES, COUNT(*) AS ORDERS +FROM DB.SCHEMA.FACT_SALES +GROUP BY 1, 2, 3; +``` + +## Output + +Return `findings[]` to the router (SKILL.md Step 3). diff --git a/skills/powerbi-reverse-engineer/LICENSE b/skills/powerbi-reverse-engineer/LICENSE new file mode 100644 index 00000000..76ff4c06 --- /dev/null +++ b/skills/powerbi-reverse-engineer/LICENSE @@ -0,0 +1,189 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. 
+ + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work. + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to the Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." 
+ + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by the Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding any notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + Copyright 2026 Josh Crittenden + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/skills/powerbi-reverse-engineer/README.md b/skills/powerbi-reverse-engineer/README.md new file mode 100644 index 00000000..85198ce1 --- /dev/null +++ b/skills/powerbi-reverse-engineer/README.md @@ -0,0 +1,43 @@ +# Power BI Reverse Engineer + +Convert your Power BI semantic models into Snowflake Semantic Views with Cortex Code. This skill extracts the complete model from a `.pbit`/`.pbix` file or a live model via the [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp), analyzes source tables, relationships, and DAX measures, then generates a single `.sql` file containing all supporting DDL and the `CREATE OR REPLACE SEMANTIC VIEW` statement. + +## How It Works + +1. **Extract** the semantic model (tables, columns, relationships, DAX measures, Power Query M expressions) +2. **Analyze** Power Query sources to map logical Power BI table names to physical Snowflake tables +3. **Convert** DAX measures to SQL metrics (simple aggregations, ratios, YoY, rolling calculations) +4. 
**Generate** a complete `.sql` file with: + - Supporting objects (date dimension tables, helper views for materialized FK columns) + - The full `CREATE OR REPLACE SEMANTIC VIEW` DDL with tables, relationships, facts, dimensions, and metrics + - Documentation of unconverted DAX measures requiring manual review + +## Usage + +``` +$powerbi-reverse-engineer convert ~/reports/sales.pbit to a semantic view in ANALYTICS_DB.REPORTING +``` + +``` +$powerbi-reverse-engineer reverse engineer the model open in Power BI Desktop called "Customer 360" into a semantic view +``` + +``` +$powerbi-reverse-engineer convert semantic model "Finance Reporting" from workspace "Corp Finance" to FINANCE_DB.SEMANTIC_LAYER.FINANCE_SV +``` + +## Output + +- `_semantic_view.sql` — Single SQL file containing supporting database objects, the semantic view DDL, and documentation for any items needing manual review + +## Prerequisites + +- A `.pbit` or `.pbix` file, **or** the [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp) running with access to your model +- Cortex Code with the skill installed +- Target Snowflake database and schema where the semantic view will be created + +## License + +Apache License 2.0 — see [LICENSE](LICENSE) + +**Author**: Josh Crittenden diff --git a/skills/powerbi-reverse-engineer/SKILL.md b/skills/powerbi-reverse-engineer/SKILL.md new file mode 100644 index 00000000..3f2d2633 --- /dev/null +++ b/skills/powerbi-reverse-engineer/SKILL.md @@ -0,0 +1,296 @@ +--- +id: powerbi-reverse-engineer +name: powerbi-reverse-engineer +skill-name: $powerbi-reverse-engineer +description: "Reverse engineer a Power BI semantic model into a Snowflake Semantic View with complete DDL output. Supports .pbit/.pbix files or live models via the Power BI Modeling MCP Server." 
+prompt: "$powerbi-reverse-engineer convert my Power BI file at /path/to/report.pbit into a Snowflake Semantic View" +language: en +status: Published +author: Josh Crittenden +type: community +--- + +# Power BI Reverse Engineer + +# When to Use +- User provides a `.pbit` or `.pbix` file and wants to convert it to a Snowflake Semantic View +- User wants to migrate Power BI business logic (DAX measures) to Snowflake SQL metrics +- User wants to enable Cortex Agents or Snowflake Intelligence for data already modeled in Power BI +- User has the Power BI Modeling MCP Server running and wants to reverse engineer a live model +- Do NOT use for auditing best practices only (use `$powerbi-bpa` instead) + +# What This Skill Provides +Extracts the complete semantic model from a Power BI file or live model, analyzes source tables, relationships, and DAX measures, then generates a `.sql` file containing all recommended database objects (views, tables) and the `CREATE OR REPLACE SEMANTIC VIEW` DDL. + +# References +- [Snowflake Semantic Views](https://docs.snowflake.com/en/user-guide/views-semantic/overview) +- [Cortex Analyst](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/cortex-analyst) +- [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp) + +# Instructions + +## Step 1: Determine Input Method and Extract Model + +**Ask** the user how they want to provide their Power BI model. Two methods are supported: + +### Option A: File-based (.pbit or .pbix) + +1. **Extract** the DataModelSchema from the archive: + ```bash + mkdir -p /tmp/pbit_extract + unzip -o "" DataModelSchema -d /tmp/pbit_extract + iconv -f UTF-16LE -t UTF-8 /tmp/pbit_extract/DataModelSchema > /tmp/pbit_extract/DataModelSchema.json + ``` + +2. 
**Parse** the JSON: + ```python + import json + with open('/tmp/pbit_extract/DataModelSchema.json', 'r', encoding='utf-8-sig') as f: + data = json.load(f) + model = data.get('model', {}) + tables = model.get('tables', []) + relationships = model.get('relationships', []) + ``` + +### Option B: Power BI Modeling MCP Server + +If the user has the [Power BI Modeling MCP Server](https://github.com/microsoft/powerbi-modeling-mcp) running: + +1. **Connect** to the model: + - For Power BI Desktop: `Connect to '[File Name]' in Power BI Desktop` + - For Fabric workspace: `Connect to semantic model '[Model Name]' in Fabric Workspace '[Workspace Name]'` + +2. **Retrieve** all model metadata via MCP tools: + - `model_operations` — overall model properties + - `table_operations` (list) — all tables + - `column_operations` (list) — columns per table with data types and expressions + - `measure_operations` (list) — all DAX measures with expressions + - `relationship_operations` (find) — all relationships with cardinality and cross-filter behavior + - `security_role_operations` — RLS definitions + - `named_expression_operations` (list) — Power Query M expressions and parameters + +**Output:** Parsed model object with tables, relationships, measures, and M expressions. + +**If an error occurs:** +- File not a valid ZIP: Check file extension, try different encoding +- DataModelSchema not found: File may be corrupted or incompatible PBI version +- UTF-16 decode fails: Try UTF-16BE instead of UTF-16LE +- MCP connection fails: Verify Power BI Desktop is open or Fabric workspace is accessible + +## Step 2: Analyze Power Query (M Expressions) + +**Goal:** Identify actual source table names and data sources for every table. + +**Extract** M expressions from `partitions[].source.expression` (file-based) or via `named_expression_operations` (MCP). 
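For the file-based path, the extraction of `partitions[].source.expression` could look roughly like the sketch below. It assumes the parsed `model` dict from Step 1, and it assumes that a multi-line expression may be stored either as a single string or as a list of lines. The helper name `collect_partition_expressions` and the sample model are hypothetical and only illustrate the shape of the data.

```python
def collect_partition_expressions(model):
    """Map each table name to a list of (source_type, expression_text) pairs."""
    result = {}
    for table in model.get("tables", []):
        entries = []
        for partition in table.get("partitions", []):
            source = partition.get("source", {})
            expression = source.get("expression", "")
            # Assumption: a multi-line expression may arrive as a list of lines.
            if isinstance(expression, list):
                expression = "\n".join(expression)
            entries.append((source.get("type", "unknown"), expression))
        result[table.get("name", "<unnamed>")] = entries
    return result


# Hypothetical sample shaped like the parsed `model` dict from Step 1.
sample_model = {
    "tables": [
        {
            "name": "Customers",
            "partitions": [
                {
                    "source": {
                        "type": "m",
                        "expression": [
                            "let",
                            '    Source = Snowflake.Databases("example-host", "EXAMPLE_WH")',
                            "in",
                            "    Source",
                        ],
                    }
                }
            ],
        },
        {
            "name": "Date",
            "partitions": [
                {
                    "source": {
                        "type": "calculated",
                        "expression": "CALENDAR(DATE(2020,1,1), DATE(2030,12,31))",
                    }
                }
            ],
        },
    ]
}

expressions = collect_partition_expressions(sample_model)
for table_name, entries in expressions.items():
    for source_type, text in entries:
        # Show only the first line of each expression as a quick overview.
        first_line = text.splitlines()[0] if text else ""
        print(f"{table_name} [{source_type}]: {first_line}")
```

Keeping the source type next to each expression lets later steps separate Power Query sources from DAX calculated tables without re-reading the partitions.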
+ +**Analyze** each M expression to identify source: + +| Pattern | Source Type | What to Extract | +|---------|------------|-----------------| +| `Snowflake.Databases(...)` | Snowflake | Server, warehouse, role, database, schema, table name | +| `Sql.Database(...)` | SQL Server | Server, database, query/table | +| `PowerPlatform.Dataflows(...)` | Power Platform Dataflow | Workspace ID, dataflow ID, entity name | +| `Table.FromRows(Json.Document(...))` | Static/Embedded | Inline data (decode Base64) | +| DAX expression (type = "calculated") | DAX Calculated Table | Parse DAX CALENDAR/ADDCOLUMNS | +| `OData.Feed(...)` | OData | Service URL, entity | +| `Web.Contents(...)` | Web/REST API | URL | +| `Excel.Workbook(...)` | Excel | File path | + +**Also identify:** +- Computed columns from Power Query (`Table.AddColumn` calls): FK/PK columns, derived columns, type transformations +- Power BI parameters (e.g., `SnowflakeDatabase1`) in M expressions + +**⚠️ IMPORTANT:** Capture actual physical table names from M expressions (e.g., `Schema{[Name = "CUSTOMER_SCORECARD"]}`) as these differ from Power BI logical names. + +**Output:** Source mapping table (PBI table → source type → physical name). + +## Step 3: Analyze the Semantic Model + +**Goal:** Document all tables, columns, relationships, and measures. + +**3a. Tables and Columns:** +Categorize each column as: +- **Physical columns** (have `sourceColumn`) → Dimensions or Facts +- **Computed columns** (have DAX `expression`) → Note the expression +- **Power Query computed** (from M `Table.AddColumn`) → Note derivation + +**3b. Relationships:** +Extract all relationships noting: fromTable, fromColumn, toTable, toColumn, isActive, crossFilteringBehavior. Flag inactive relationships (used via USERELATIONSHIP in DAX). + +**3c. DAX Measures:** +Categorize every measure: +1. **Simple aggregations** (SUM, COUNT, AVG, MIN, MAX) → Convert directly to SQL metrics +2. **Ratio/division** (DIVIDE) → Convert to DIV0(...) metrics +3. 
**Period wrappers** (SELECTEDVALUE + SWITCH for MTD/QTD/YTD) → Document in AI_SQL_GENERATION +4. **Year-over-year** (SAMEPERIODLASTYEAR, PARALLELPERIOD) → Convert with date filter logic +5. **Rolling calculations** (DATESINPERIOD) → Document in AI_SQL_GENERATION +6. **Display/formatting** (FORMAT, UNICHAR, text concat) → SKIP (UI-only) +7. **Complex DAX** (CALCULATETABLE, SUMMARIZE) → Convert to closest SQL equivalent or note complexity + +**3d. Row-Level Security (RLS):** +Document any RLS roles and table permissions. + +**3e. Calculated Tables:** +Tables with `partitions[].source.type == "calculated"` are DAX-generated. Extract the DAX and plan SQL DDL conversion (e.g., CALENDAR → GENERATOR-based date table). + +**⚠️ STOPPING POINT:** Present the full analysis to the user: +- Table mapping (PBI name → source → physical table) +- Column counts per table +- Measure categorization summary (N convertible, N skip, N manual review) +- Non-Snowflake sources identified +- RLS if present + +## Step 4: Ask User for Target Configuration + +**Ask** the user: +1. **Target database.schema** for the semantic view +2. **Semantic view name** (suggest based on PBI file name) +3. **Confirm** which non-Snowflake sources to comment out vs. include with placeholder table names +4. **Output file path** for the `.sql` file (suggest `_semantic_view.sql` in working directory) + +**⚠️ STOPPING POINT:** Wait for user confirmation before generating DDL. + +## Step 5: Generate Complete .sql File + +**Goal:** Produce a single `.sql` file containing ALL database objects and the semantic view DDL. + +**Structure the .sql file as follows:** + +```sql +/* +========================================================================== + Power BI to Snowflake Semantic View + Source: + Generated: + Target: .. 
+ + Table Mapping: + PBI Logical Name → Snowflake Physical Table + ──────────────────────────────────────────────────── + + + + Non-Snowflake Sources (commented out): + → Power Platform Dataflow (not in Snowflake) + + Measure Conversion Summary: + Converted: N measures + Skipped (UI): N measures + Manual Review: N measures +========================================================================== +*/ + +-- ============================================================ +-- SECTION 1: SUPPORTING DATABASE OBJECTS +-- ============================================================ + +-- Date Dimension Table (converted from DAX CALENDAR) +CREATE TABLE IF NOT EXISTS ..DIM_DATE AS +SELECT ... +FROM (SELECT ... FROM TABLE(GENERATOR(ROWCOUNT => ...))); + +-- Helper Views (materialized FK columns, etc.) +CREATE OR REPLACE VIEW .. AS +SELECT ... +FROM ...; + +-- ============================================================ +-- SECTION 2: SEMANTIC VIEW +-- ============================================================ + +CREATE OR REPLACE SEMANTIC VIEW .. + + TABLES (...) + RELATIONSHIPS (...) + FACTS (...) + DIMENSIONS (...) + METRICS (...) + + COMMENT = '...' + AI_SQL_GENERATION '...' + AI_QUESTION_CATEGORIZATION '...' +; + +-- ============================================================ +-- SECTION 3: UNCONVERTED / MANUAL REVIEW ITEMS +-- ============================================================ +-- The following DAX measures could not be automatically converted. +-- Manual SQL equivalents should be authored and added as metrics. 
+-- +-- Measure: +-- DAX: +-- Reason: +``` + +**Rules for DDL generation:** + +| Element | Rule | +|---------|------| +| Tables from Snowflake | Include with actual physical table name | +| Tables from non-Snowflake | Comment out with note: `-- sourced from , not in Snowflake` | +| FK/PK computed columns | Include as dimensions (assume materialized in Snowflake) | +| Relationships using Snowflake tables | Include | +| Relationships involving non-SF tables | Comment out | +| Simple DAX aggregations | Convert to SQL metrics | +| Complex DAX measures | Convert to closest SQL equivalent | +| Display-only measures (FORMAT/UNICHAR) | Skip, document in Section 3 | +| Period wrappers (MTD/QTD/YTD) | Skip, document pattern in AI_SQL_GENERATION | +| Pipeline/non-SF dependent measures | Comment out | +| Inactive relationships | Comment out with note about USERELATIONSHIP usage | + +**SYNONYMS:** Add synonyms for key business columns (customer name, product line, dates, status fields). + +**AI_SQL_GENERATION:** Include instructions covering: +- Key metric definitions and business logic +- Period filtering approach (MTD/QTD/YTD via date filters) +- Any special calculation rules +- FK derivation logic for reference + +**AI_QUESTION_CATEGORIZATION:** Describe the business domains this model covers. + +**⚠️ STOPPING POINT:** Present the generated `.sql` file to the user for review before finalizing. + +## Step 6: Save and Present Final Deliverables + +**Save** the `.sql` file to the path confirmed in Step 4. + +**Present** to user: +1. **File location** of the generated `.sql` file +2. **Summary** — table count, relationship count, metrics converted, items needing manual review +3. 
**Next steps** — deploy to Snowflake, create Cortex Agent, add verified queries + +**If error occurs:** +- M expression unrecognizable: Flag as unknown source, ask user to identify +- DAX too complex to convert: Document in Section 3 of the .sql file +- Unknown error: Ask user for guidance + +# Best Practices +- Always capture physical table names from M expressions, not Power BI logical names +- Column names with spaces require quoting in Snowflake DDL +- Semantic view RELATIONSHIPS only support column references (no expressions/CONCAT), so FK columns must be materialized +- Inactive PBI relationships correspond to DAX USERELATIONSHIP() and represent alternate join paths +- Power BI parameters appear in M expressions as variable references; resolve them to actual values when possible + +# Stopping Points +- ✋ After Step 3 — Analysis review before DDL generation +- ✋ After Step 4 — Target configuration confirmation +- ✋ After Step 5 — Generated .sql file review before saving + +**Resume rule:** Upon user approval, proceed directly to the next step without re-asking. + +# Output +- `_semantic_view.sql` — Single .sql file containing supporting objects + semantic view DDL + unconverted items documentation + +# Examples + +## Example 1: File-based reverse engineering +User: $powerbi-reverse-engineer convert ~/reports/sales.pbit to a semantic view in ANALYTICS_DB.REPORTING +Assistant: Extracts DataModelSchema, analyzes M expressions and DAX measures, presents analysis for review, generates complete .sql file with supporting DDL and semantic view. + +## Example 2: MCP-based reverse engineering +User: $powerbi-reverse-engineer reverse engineer the model open in Power BI Desktop called "Customer 360" into a semantic view +Assistant: Connects via MCP, retrieves full model metadata, analyzes sources and measures, presents analysis, asks for target database.schema, generates .sql file. 
+ +## Example 3: Fabric workspace model +User: $powerbi-reverse-engineer convert semantic model "Finance Reporting" from workspace "Corp Finance" to FINANCE_DB.SEMANTIC_LAYER.FINANCE_SV +Assistant: Connects to Fabric via MCP, retrieves model, maps tables to Snowflake sources, converts DAX to SQL metrics, generates .sql file with all DDL.