Merged
22 changes: 22 additions & 0 deletions skills/mlops/LICENSE
@@ -0,0 +1,22 @@
Snowflake Skills License

© 2026 Snowflake Inc. All rights reserved.

LICENSE: Use of these materials (including all code, prompts, assets, files, and other components of these skills (collectively, “Skills”)) is governed by your agreement with Snowflake for the Service. If no separate agreement exists, use is governed by Snowflake’s Terms of Service (available at: https://www.snowflake.com/en/legal/terms-of-service/).

Your applicable agreement is referred to as the "Agreement." "Service" is as defined in the Agreement.

ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the contrary, you may not:

* Extract from the Service or retain copies of the Skills outside use with the Service;
* Reproduce or copy the Skills, except for temporary copies created automatically during authorized use of the Service;
* Create derivative works based on the Skills;
* Distribute, sublicense, or transfer the Skills to any third party;
* Make, offer to sell, sell, or import any inventions embodied in the Skills; nor,
* Reverse engineer, decompile, or disassemble the Skills.

The receipt, viewing, or possession of the Skills does not convey or imply any license or right beyond those expressly granted above.

Snowflake retains all rights, title, and interest in the Skills, including all copyrights, trademarks, patents, and all other applicable intellectual property rights.

THE SKILLS ARE PROVIDED “AS IS,” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SKILLS OR THE USE OR OTHER DEALINGS IN THE SKILLS.
104 changes: 104 additions & 0 deletions skills/mlops/SKILL.md
@@ -0,0 +1,104 @@
---
name: mlops
title: Plan and run MLOps
summary: Router for MLOps work on Snowflake — assess maturity, pick promotion patterns, and implement CI/CD, monitoring, and governance.
description: "Use when a developer or data engineer wants to assess MLOps maturity, design a promotion strategy (Code/Model/Hybrid), or implement MLOps capabilities (CI/CD, monitoring, retraining, governance) on Snowflake for traditional ML or LLM/GenAI workloads. Triggers: mlops, mlops maturity, mlops assessment, mlops strategy, mlops pattern, mlops framework, model promotion, ml ci/cd, ml monitoring, llmops, rag pipeline ops, fine-tuning ops."
prompt: Help me set up MLOps on Snowflake for my ML project.
language: en
status: Published
author: Snowflake Solutions Team
type: snowflake
tools:
- snowflake_sql_execute
- Bash
- Read
- Write
- Edit
- Glob
- Grep
---

# Plan and run MLOps

## Overview

Router skill for operationalizing ML and LLM/GenAI workloads on Snowflake. It covers the *process and governance layer* — when to promote, what gates to enforce, what to monitor, how to roll back. It does **not** cover SDK-level code (model registration, feature store APIs, training loops) — that belongs to the `machine-learning` skill.

This skill applies to traditional ML *and* GenAI (prompt management, RAG, fine-tuning, agentic apps). There is no separate "LLMOps" — LLM operationalization is part of MLOps with workload-specific adaptations.

**Scope split**

| Question | Owner |
|---|---|
| When should I promote a model? What gates must it pass? | mlops |
| How do I register a model or deploy an endpoint? (code) | machine-learning |
| What should I monitor after deployment? When to roll back? | mlops |
| How do I set up Feature Store / Cortex Search? (code) | machine-learning |
| How should I govern Feature Store / Registry across environments? | mlops |
| How do I train / fine-tune / build RAG? (code) | machine-learning |
| How should I operationalize training across environments? | mlops |

**Platform constraint:** All recommendations assume Snowflake as the platform (Model Registry, Feature Store, Cortex AI, Snowpark, Tasks/Streams). Do not propose third-party platforms unless the user explicitly asks.

**Explain before asking:** Always introduce concepts (maturity levels L0–L3, promotion patterns, capability dimensions) before asking the user to make decisions about them. Do not assume prior knowledge.

## Sub-flows

- `implement-patterns/INSTRUCTIONS.md` — implementation playbooks for promotion, CI/CD, monitoring, governance (includes maturity assessment as part of the pattern selection workflow)

## Workflow

### Step 1: Detect intent

Ask the user which path they need:

1. **Assessment & strategy** — evaluate current maturity, pick patterns, build a roadmap
2. **Implementation patterns** — guidance for a specific capability (CI/CD, monitoring, etc.)
3. **Full setup** — end-to-end MLOps design from scratch

### Step 2: Route

| Intent | Route |
|---|---|
| ASSESS — "assess maturity", "gap analysis", "roadmap", "where are we" | Load `implement-patterns/INSTRUCTIONS.md` — start with promotion pattern determination |
| PATTERNS — "promotion pattern", "ci/cd", "monitoring", "retraining", "feature store governance", "RAG pipeline ops", "LLM monitoring" | Load `implement-patterns/INSTRUCTIONS.md` |
| FULL SETUP — "setup mlops from scratch", "end to end" | Load `implement-patterns/INSTRUCTIONS.md` — start with promotion pattern determination, then work through capabilities per priority |

⚠️ STOPPING POINT: Before loading `implement-patterns/INSTRUCTIONS.md`, the user MUST have an explicit promotion pattern (Code / Model / Hybrid). If unknown, run the decision tree (ask about team structure, artifact type, deployment frequency). Do not generate implementation guidance without it.
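
The routing table and stopping point above can be sketched in code. This is a minimal illustration, not part of the skill: the keyword lists are copied from the table, while substring matching and the names `detect_intent` and `next_action` are assumptions made for the example.

```python
from typing import Optional

# Keywords copied from the routing table; plain substring matching is an assumption.
INTENT_KEYWORDS = {
    "ASSESS": ["assess maturity", "gap analysis", "roadmap", "where are we"],
    "PATTERNS": ["promotion pattern", "ci/cd", "monitoring", "retraining",
                 "feature store governance", "rag pipeline ops", "llm monitoring"],
    "FULL SETUP": ["setup mlops from scratch", "end to end"],
}
VALID_PATTERNS = {"Code", "Model", "Hybrid"}
SUB_FLOW = "implement-patterns/INSTRUCTIONS.md"


def detect_intent(message: str) -> Optional[str]:
    """Return the first intent whose keyword appears in the message, else None."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None


def next_action(message: str, promotion_pattern: Optional[str]) -> str:
    """Pick the next step, enforcing the promotion-pattern stopping point."""
    if detect_intent(message) is None:
        return "ask the user which path they need"
    if promotion_pattern not in VALID_PATTERNS:
        # Hard gate: no implementation guidance without an explicit pattern.
        return "run the decision tree to determine the promotion pattern"
    return f"load {SUB_FLOW}"
```

All three routes load the same sub-flow, so the only real branch is whether the promotion pattern is already explicit.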

### Step 3: Per-message intent re-evaluation

On every user message — not just the first — re-check intent. If the user shifts to implementation ("start with X", "let's build", "show me the code", "what SQL do I need"):

1. STOP generating from general knowledge.
2. Load `implement-patterns/INSTRUCTIONS.md` immediately, passing known context (pattern, maturity, environments).
3. If promotion pattern is unknown, determine it briefly before loading.
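
As a rough sketch of the per-message check, the three steps above could be planned as follows. The cue phrases come from the examples in this step; the function names and the shape of the `context` dictionary are hypothetical.

```python
from typing import Dict, List, Optional

# Cue phrases taken from the examples above; substring matching is an assumption.
IMPLEMENTATION_CUES = ("start with", "let's build", "show me the code", "what sql do i need")
SUB_FLOW = "implement-patterns/INSTRUCTIONS.md"


def shifts_to_implementation(message: str) -> bool:
    """True when the message contains one of the implementation cue phrases."""
    text = message.lower()
    return any(cue in text for cue in IMPLEMENTATION_CUES)


def plan_turn(message: str, context: Dict[str, Optional[str]]) -> List[str]:
    """Return the ordered actions for this turn; empty if intent has not shifted."""
    if not shifts_to_implementation(message):
        return []
    steps = ["stop generating from general knowledge"]
    if context.get("pattern") is None:
        # The pattern must be determined before the sub-flow is loaded.
        steps.append("determine the promotion pattern")
    # Context keys known when the turn is planned (pattern, maturity, environments).
    known = sorted(key for key, value in context.items() if value is not None)
    steps.append(f"load {SUB_FLOW} with known context: {', '.join(known) or 'none'}")
    return steps
```

Calling `plan_turn` on every message, rather than only the first, is what keeps the check from being skipped once a conversation is under way.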

## Common Mistakes

- Generating implementation code from general knowledge instead of loading `implement-patterns/INSTRUCTIONS.md`.
- Skipping promotion-pattern selection and producing pattern-agnostic recommendations (they will be wrong).
- Treating LLM/GenAI as a separate "LLMOps" track instead of a workload variant.
- Recommending non-Snowflake tools (SageMaker, Vertex, Databricks, MLflow) when the user did not ask.
- Answering "how do I register a model" inside this skill — that's `machine-learning`.
- Asking the user to choose between L1 and L2 without first explaining what the levels mean.

## Red Flags

Refuse these rationalizations:

- "The user seems to know what they want, I'll skip the promotion-pattern question." — No. Pattern is a hard prerequisite.
- "I'll generate the CI/CD pipeline from memory, faster than loading the sub-flow." — No. Sub-flow content is curated and tested; general-knowledge output drifts.
- "They asked about MLflow, I'll just answer." — Only if they explicitly asked. Default is Snowflake-native.
- "The roadmap is obvious, I'll skip the assessment." — No. Maturity baseline drives sequencing.
- "They want to start implementing, I don't need to re-check intent each turn." — Re-evaluate every message.

## Stopping Points

- Step 2 — wait for explicit promotion pattern (Code / Model / Hybrid) before loading `implement-patterns/INSTRUCTIONS.md`. If unknown, run decision tree or full assessment first.

## Output

- Assessment route: maturity scorecard + prioritized roadmap.
- Patterns route: implementation playbook for the selected capability.
- Full setup: complete architecture with sequenced implementation plan.
118 changes: 118 additions & 0 deletions skills/mlops/implement-patterns/INSTRUCTIONS.md
@@ -0,0 +1,118 @@

# Implementation Patterns

> **Platform constraint (inherited from parent):** All recommendations must assume Snowflake as the platform. Do NOT propose non-Snowflake tools or platforms unless the user explicitly asks.

## When to Load

Loaded from `mlops/SKILL.md` Step 2, when the user needs implementation guidance for a specific MLOps capability.

## Setup

**Run** the decision tree (ask about team structure, artifact type, and deployment frequency) if the promotion pattern or maturity context is still unknown.

## Workflow

### Step 1: Identify Topic

**Ask** user which area they need guidance on:

1. **Promotion Patterns** - Code / Model / Hybrid workflows, environment structure, LLM artifact promotion
2. **CI/CD & Testing** - Test strategy, deployment automation, pipeline architecture, LLM-specific tests
3. **Continuous Training** - Retraining triggers, scheduling, automation, LLM iteration cycles
4. **Monitoring & Rollback** - Drift detection, alerting, rollback, recovery, LLM evaluation
5. **Model Lifecycle** - Registry, versioning, Champion/Challenger, promotion gates, LLM versioning
6. **Data & Features** - Data validation, feature store, skew prevention, vector DB / search index
7. **Governance & Metadata** - Lineage, compliance, audit, metadata management, LLM access control

### Step 2: Gather Context

**If routed from the parent skill with a roadmap**, use the known promotion pattern, maturity levels, and environment names. Skip to Step 3.

**Otherwise, ask** user:
1. **Promotion pattern**: Before asking, briefly introduce the three promotion patterns so the user has context:
   - **Code Promotion**: Training code moves through environments (DEV → STAGING → PROD). The model is retrained in each environment using that environment's data. Best when production data is accessible from the production environment.
   - **Model Promotion**: The model is trained in one environment (typically DEV) and the trained artifact is promoted to other environments. Only the artifact moves, not the training code. Best when training is expensive or environments cannot access production data.
   - **Hybrid Promotion**: Code moves to a middle environment (e.g., STAGING) that has production data access, the model is trained there, and the artifact is promoted to production. Combines aspects of both patterns.

   Then ask: Which pattern fits your situation? Code / Model / Hybrid (or undecided)
2. **Current maturity level**: Before asking, briefly introduce the maturity levels so the user has context:
   - **L0 (Ad-hoc / Experimental)**: No formal process. Notebooks, manual everything. No production deployment.
   - **L1 (Manual)**: All core AI/ML features are available, but every step is executed and approved by humans. No CI/CD, no automated monitoring, no automated governance.
   - **L2 (Semi-automated)**: CI/CD runs tests automatically, but model validation and promotion require human approval gates.
   - **L3 (Fully Automated)**: End-to-end automation including monitoring-triggered retraining, auto-validation, and auto-promotion with rollback.

   Then ask: Where does your current setup fall? L0 / L1 / L2 / L3 (or unknown)
3. **Target maturity level**: L1 / L2 / L3
4. **Environment setup**: How many environments, what names, fully isolated or shared components?

Use the user's chosen names, environment count, and isolation model in **all outputs** (checklists, diagrams, recommendations). For full environment guidance (2-env vs 3-env trade-offs, isolation models, canonical name table), see the parent mlops skill Step 2.
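
To make the difference between the three promotion patterns concrete, the sketch below records where training runs and what gets promoted for each one. The environment names DEV / STAGING / PROD follow the examples in this step; the class and field names are hypothetical, and the table paraphrases the descriptions above rather than adding detail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class PromotionPattern(Enum):
    CODE = "Code"
    MODEL = "Model"
    HYBRID = "Hybrid"


@dataclass(frozen=True)
class PatternProfile:
    trains_in: Tuple[str, ...]  # environments where training runs
    promotes: str               # what moves between environments


# Summary of the three pattern descriptions; environment names are examples only.
PROFILES = {
    PromotionPattern.CODE: PatternProfile(("DEV", "STAGING", "PROD"), "training code"),
    PromotionPattern.MODEL: PatternProfile(("DEV",), "trained model artifact"),
    PromotionPattern.HYBRID: PatternProfile(
        ("STAGING",), "training code to STAGING, then trained model artifact to PROD"
    ),
}


def parse_pattern(answer: str) -> Optional[PromotionPattern]:
    """Map a user answer to a pattern; 'undecided' or anything else yields None."""
    normalized = answer.strip().lower()
    for pattern in PromotionPattern:
        if pattern.value.lower() == normalized:
            return pattern
    return None
```

A `None` result from `parse_pattern` corresponds to the "undecided" case, which must be resolved before moving to Step 3.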

> **Promotion pattern is a hard prerequisite**: All implementation recommendations in this skill are pattern-specific — CI/CD pipelines, environment structure, promotion gates, and governance all vary fundamentally between Code, Model, and Hybrid promotion. **Do not proceed to Step 3** until the user has explicitly confirmed a promotion pattern.
>
> If the user says "undecided" or doesn't know:
> 1. **Quick path**: Walk them through the decision tree (ask about team structure, artifact type, and deployment frequency). This takes ~5 minutes and yields a clear pattern choice.
> 2. **Full path**: Recommend a full assessment via the parent mlops skill for comprehensive maturity + pattern evaluation (~15 minutes).
> 3. **Do not skip**: Generating implementation guidance without a promotion pattern leads to rework (e.g., building CI/CD for Code Promotion when the team actually needs Model Promotion).
>
> If maturity level or environments are unknown, these can be estimated — but promotion pattern **must** be explicit.

If maturity level is unknown, estimate based on their answers or suggest running the parent mlops skill first. **Never ask the user to self-assess their maturity level without first explaining what each level means.**
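
One way to estimate an unknown maturity level from the user's answers is sketched below. It reduces the L0 to L3 definitions from this step to three yes/no questions, which is a simplifying assumption; the function and parameter names are hypothetical, and a real assessment would weigh more signals.

```python
def estimate_maturity(
    has_production_deployment: bool,
    ci_runs_tests_automatically: bool,
    promotion_is_automated: bool,
) -> str:
    """Rough estimate of the current maturity level from three yes/no answers."""
    if not has_production_deployment:
        return "L0"  # ad-hoc / experimental: nothing deployed to production
    if not ci_runs_tests_automatically:
        return "L1"  # manual: every step executed and approved by humans
    if not promotion_is_automated:
        return "L2"  # semi-automated: CI/CD tests run, humans approve promotion
    return "L3"      # fully automated, including auto-promotion with rollback
```

For example, a team with a production model and automated tests, but human approval gates on promotion, would be estimated at L2. The estimate is a starting point to confirm with the user, not a substitute for explaining the levels.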

### Step 3: Load and Present Pattern

Based on topic selection, **Load** the corresponding reference:

| Topic | Reference |
|-------|-----------|
| Promotion Patterns | `references/promotion-patterns.md` |
| CI/CD & Testing | `references/ci-cd-testing.md` |
| Continuous Training | `references/continuous-training.md` |
| Monitoring & Rollback | `references/monitoring-rollback.md` |
| Model Lifecycle | `references/model-lifecycle.md` |
| Data & Features | `references/data-features.md` |
| Governance & Metadata | `references/governance-metadata.md` |
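
The table above amounts to a lookup from topic to a single reference file. A minimal sketch, assuming topics arrive with the exact names used in Step 1 (the function name `reference_for` is hypothetical):

```python
# Topic-to-reference mapping copied from the table above.
REFERENCES = {
    "Promotion Patterns": "references/promotion-patterns.md",
    "CI/CD & Testing": "references/ci-cd-testing.md",
    "Continuous Training": "references/continuous-training.md",
    "Monitoring & Rollback": "references/monitoring-rollback.md",
    "Model Lifecycle": "references/model-lifecycle.md",
    "Data & Features": "references/data-features.md",
    "Governance & Metadata": "references/governance-metadata.md",
}


def reference_for(topic: str) -> str:
    """Return the one reference to load for a topic; reject unknown topics."""
    try:
        return REFERENCES[topic]
    except KeyError:
        raise ValueError(f"unknown topic: {topic!r}") from None
```

Returning exactly one path per call mirrors the rule that only the corresponding reference is loaded, never all of them up front.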

Present the relevant maturity level section (L1/L2/L3) for the user's promotion pattern. Include:
- What to implement
- How it works
- Key decisions
- Risk callouts (if applicable)

**When the topic is Promotion Patterns**: Always present the "Promotion Mechanisms and Snowflake Features" section from `references/promotion-patterns.md`. This gives the user a concrete view of how each artifact type moves between environments via CI/CD, which Snowflake commands are used, and which features enable the workflow. Present it alongside the pattern-specific guidance — do not wait for the user to ask.

### Step 4: Actionable Checklist

Produce an implementation checklist tailored to the user's context:

```
Implementation Checklist: [Topic] at L[X] [Pattern] Promotion
==============================================================
[ ] Step 1: [specific action]
[ ] Step 2: [specific action]
[ ] Step 3: [specific action]
...
Prerequisites: [list]
Depends on: [other capabilities that must be in place]
```
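
The template above can be filled in mechanically. The following sketch renders it from structured inputs; the function name and parameters are hypothetical, and the step text in the usage note is a placeholder rather than recommended content.

```python
from typing import List


def render_checklist(
    topic: str,
    level: str,
    pattern: str,
    steps: List[str],
    prerequisites: List[str],
    depends_on: List[str],
) -> str:
    """Render the checklist template as plain text."""
    title = f"Implementation Checklist: {topic} at {level} {pattern} Promotion"
    lines = [title, "=" * len(title)]
    # Number the steps from 1, matching the template layout.
    lines += [f"[ ] Step {number}: {step}" for number, step in enumerate(steps, start=1)]
    lines.append(f"Prerequisites: {', '.join(prerequisites) or 'none'}")
    lines.append(f"Depends on: {', '.join(depends_on) or 'none'}")
    return "\n".join(lines)
```

For instance, `render_checklist("CI/CD & Testing", "L2", "Model", ["placeholder action"], [], ["Promotion Patterns"])` yields a title line, an underline of matching length, one step line, and the two trailing fields.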

## Stopping Points

- ✋ After Step 1: Confirm topic selection
- ✋ After Step 2: **Hard gate** — promotion pattern must be explicitly confirmed before proceeding. Confirm context (promotion pattern, maturity levels, environment setup) before loading reference. **Load only** the corresponding reference (do not preload all references)
- ✋ After Step 3: **Present** the key decisions and risk callouts from the pattern. **Ask** the user to confirm the approach before generating the checklist
- ✋ After Step 4: Review checklist for feasibility

## Output

- Pattern guidance for selected topic at specified maturity level
- Implementation checklist with prerequisites and dependencies

## Troubleshooting

**User doesn't know their maturity level:**
- Suggest running the parent mlops skill first for a full assessment, or estimate based on their answers.

**Pattern doesn't cover the user's use case:**
- Check if a combination of patterns applies. Present the closest match and note gaps explicitly.

**User wants guidance across multiple topics at once:**
- Prioritize by dependency order: promotion patterns -> CI/CD -> data/features -> continuous training (CT) -> monitoring -> governance. Work through one at a time.
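
The dependency order above can be applied with a simple sort. In this sketch the topic names are the ones from Step 1. Model Lifecycle does not appear in the stated order, so placing it (and any other unlisted topic) last is an assumption.

```python
from typing import List

# Dependency order from the troubleshooting note, using the Step 1 topic names.
DEPENDENCY_ORDER = [
    "Promotion Patterns",
    "CI/CD & Testing",
    "Data & Features",
    "Continuous Training",
    "Monitoring & Rollback",
    "Governance & Metadata",
]


def order_topics(requested: List[str]) -> List[str]:
    """Sort requested topics by dependency order; unlisted topics go last."""
    rank = {topic: index for index, topic in enumerate(DEPENDENCY_ORDER)}
    # sorted() is stable, so unlisted topics keep their relative order.
    return sorted(requested, key=lambda topic: rank.get(topic, len(DEPENDENCY_ORDER)))
```

The sorted list gives the sequence to work through one topic at a time.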