
Fix lakebase-autoscale task 006 ground truth: invalid --project-id flag #514

Closed
cankoklu-db wants to merge 62 commits into main from fix/lakebase-autoscale-task006-ground-truth

Conversation

@cankoklu-db
Collaborator

Summary

  • Task 006 (cli_reference) was scoring 0.000 on all three judge dimensions (correctness, completeness, guideline adherence)
  • Root cause: the reference response used --project-id my-app which is invalid CLI syntax — the project ID is a positional argument to create-project, not a flag
  • Also expanded the reference to cover endpoint and credential commands, which are core CLI operations missing from the original
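A minimal before/after sketch, assuming the Lakebase commands live under the `databricks postgres` group referenced elsewhere in this PR (`my-app` is illustrative):

```sh
# Before (invalid): project ID passed as a flag
databricks postgres create-project --project-id my-app

# After: project ID is a positional argument
databricks postgres create-project my-app
```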

Changes

  • Fix create-project syntax: positional argument, not --project-id flag
  • Add update-endpoint example with correct positional field-mask syntax
  • Add generate-database-credential example
  • Replace no_expiry: true branch example with ttl: 604800s for consistency
  • Expand expected_facts to explicitly assert positional arg and field-mask patterns
  • Add expected_patterns for endpoint and credential commands
  • Guideline updated: require 5 subcommands (was 4), explicitly forbid --project-id flag

Context

Follow-on to PR #497 (CLI-first rewrite of databricks-lakebase-autoscale). The other 8 tasks recovered to 0.608–0.825 after ground truth alignment; task 006 was left at 0.200 because the reference response bug was a separate issue. This PR completes that work.

This pull request and its description were written by Isaac.

Quentin Ambard and others added 30 commits April 15, 2026 10:48
Adds a release channel selection during installation allowing users to
choose between stable (default) and experimental branches.

When experimental is selected:
- Displays feedback request with links to issues/discussions
- Re-downloads install.sh from the experimental branch
- Re-executes with --experimental flag (preserving other args)

Features:
- New --experimental flag and DEVKIT_CHANNEL env var
- Interactive radio selector for channel choice
- Channel shown in summary and completion messages
- Feedback reminder at end of experimental installs
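Non-interactive invocation sketch using the flag and env var named above:

```sh
# Select the experimental channel without the interactive prompt
./install.sh --experimental
# or equivalently via the environment variable
DEVKIT_CHANNEL=experimental ./install.sh
```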

Closes #468

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Automates releases while ensuring the experimental branch stays in sync:

- Triggers on VERSION file changes on main
- Checks if experimental is behind main
- Creates sync PR (main → experimental) if needed
- Auto-merges if no conflicts, blocks release if conflicts exist
- Clear error messages with PR links when blocked
- Creates git tag and GitHub Release when sync is complete

Part of #468

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
When release is blocked due to conflicts between main and experimental,
the error message now includes:
- Step-by-step instructions for resolution
- A ready-to-use Claude Code prompt that:
  - First analyzes commits in experimental to understand intent
  - Reviews conflicted files from both sides
  - Resolves by keeping both changes when possible
  - Asks for human confirmation when resolution isn't obvious

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…apps skills

- databricks-agent-bricks: Use CLI for KA/Genie, add manager.py for MAS operations
- databricks-aibi-dashboards: Use databricks lakeview CLI commands
- databricks-app-python: Update to use CLI-based deployment

This is part of the effort to simplify skills by removing MCP tool dependencies
and using Databricks CLI directly where possible.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add conversation.py script for Genie Conversation API (ask_genie)
- Update SKILL.md to use databricks genie CLI commands
- Update spaces.md with CLI-based export/import/migration workflows
- Update conversation.md to use conversation.py script

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- databricks-lakebase-autoscale: Remove MCP section, expand CLI commands
- databricks-lakebase-provisioned: Remove MCP section, expand CLI commands

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…and dbsql skills

- databricks-model-serving: Use databricks CLI for endpoints and workspace ops
- databricks-unity-catalog: Use databricks fs CLI for volume operations
- databricks-dbsql: Update guideline to use CLI instead of MCP

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove MCP Tools section from SKILL.md (manage_vs_endpoint, manage_vs_index, query_vs_index, manage_vs_data)
- Update Common Issues to remove MCP-specific truncation issue
- Update Notes section to reference CLI/SDK instead of MCP
- Update end-to-end-rag.md: replace MCP tools table with CLI commands
- Update troubleshooting-and-operations.md: replace MCP tool references with CLI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…mands

- Rename Option C from "MCP Tools" to "CLI" approach
- Replace references/2-mcp-approach.md with 2-cli-approach.md (full rewrite)
- Update Post-Run Validation section to use `databricks pipelines` CLI
- Update all workflow references from MCP to CLI/SDK
- Update 1-project-initialization.md reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- databricks-config: Rewrite to use `databricks auth` CLI commands
- databricks-docs: Update references from MCP to CLI/SDK
- databricks-metric-views: Replace MCP tools with SQL CREATE/DESCRIBE commands
- databricks-execution-compute: Replace MCP tools with CLI job commands
- databricks-unity-catalog/6-volumes: Replace MCP tools with `databricks fs` CLI
- databricks-unity-catalog/7-data-profiling: Replace MCP tools with SQL QUALITY MONITOR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- 5-development-testing.md: Update workflow from MCP to CLI
- 8-querying-endpoints.md: Replace MCP tools section with CLI commands
- SKILL.md: Update reference table descriptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- README.md: Update description and diagram to reference CLI/SDK
- install_skills.sh: Update comment describing skills
- databricks-app-python: Rename 6-mcp-approach.md to 6-cli-approach.md
- databricks-jobs/task-types.md: Remove MCP tool note
- databricks-model-serving: Replace MCP tools with CLI commands
  - 1-classical-ml.md: CLI for querying endpoints
  - 3-genai-agents.md: CLI for testing and querying
  - 6-logging-registration.md: CLI for running scripts
  - 7-deployment.md: CLI for job creation and management
  - 9-package-requirements.md: Notebook commands instead of MCP
- databricks-unstructured-pdf-generation: Python script pattern
- databricks-zerobus-ingest: CLI workflow instead of MCP execute_code

Note: MCP references in databricks-agent-bricks (External MCP Server
feature) and databricks-mlflow-evaluation (MLflow MCP server) are
legitimate product features and remain unchanged.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added test infrastructure for Python scripts in databricks-skills:

- .tests/conftest.py: Pytest fixtures for Databricks connection
  - workspace_client: Session-scoped WorkspaceClient
  - warehouse_id: Finds running SQL warehouse
  - Custom markers for integration tests

- .tests/test_agent_bricks_manager.py: Tests for supervisor agent CLI
  - Unit tests for _build_agent_list helper (all agent types)
  - Integration tests for MAS lifecycle (list, find, get)

- .tests/test_genie_conversation.py: Tests for Genie conversation CLI
  - Unit tests with mocks for ask_genie function
  - Tests for timeout, failure handling, conversation tracking
  - Integration tests for live Genie Space queries

- .tests/run_tests.py: Test runner script
  - Supports --unit and --integration flags
  - HTML and JUnit XML report generation
  - Colored terminal output with summary

Tests cover the remaining Python scripts in skills:
- databricks-agent-bricks/manager.py
- databricks-genie/conversation.py

All 11 unit tests pass. Integration tests require Databricks connection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…xample queue support

Changes:
- Renamed manager.py → mas_manager.py for clearer naming
- Added example question management functions:
  - add_examples(): Add examples to ONLINE MAS
  - add_examples_queued(): Queue examples for when MAS becomes ONLINE
  - list_examples(): List all examples for a MAS
- Integrated with TileExampleQueue from databricks-tools-core
- Updated all documentation references to use mas_manager.py
- Updated test imports to use mas_manager module

This allows users to add example questions immediately after creating a MAS,
even before it finishes provisioning. Examples are automatically added when
the endpoint becomes ONLINE.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add installation section with uv (preferred) and pip fallback
for installing databricks-tools-core library.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- mas_manager.py: Inline all agent_bricks functionality, use raw HTTP
  with WorkspaceClient for auth only (no core imports)
- pdf_generator.py: New self-contained script using CLI for uploads
  (databricks fs cp) instead of SDK-based volume operations
- Update SKILL.md files to reflect self-contained scripts
- Update tests to work with new modules

Skills now only require:
- databricks-sdk (for auth in mas_manager)
- requests (for HTTP in mas_manager)
- plutoprint (for PDF generation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Move mas_manager.py to databricks-agent-bricks/scripts/
- Move conversation.py to databricks-genie/scripts/
- Move pdf_generator.py to databricks-unstructured-pdf-generation/scripts/
- Update all markdown references to use scripts/ path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Use --json syntax for creating UC objects (catalogs, schemas, volumes)
- Document correct JSON format for each create operation
- Add SQL execution alternative for creating objects
- Fix incorrect positional args syntax in multiple skill files

The --json syntax is the most reliable across CLI versions.
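Sketch of the --json create forms, with field names per the Unity Catalog APIs (values illustrative):

```sh
databricks catalogs create --json '{"name": "dev"}'
databricks schemas create --json '{"name": "bronze", "catalog_name": "dev"}'
databricks volumes create --json '{"catalog_name": "dev", "schema_name": "bronze",
  "name": "raw", "volume_type": "MANAGED"}'
```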

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Use --json syntax for catalogs, schemas, volumes create commands
- Remove incorrect positional argument examples
- Simplify volume example (remove external variant)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
DatabricksEnv does not exist in current databricks-connect versions.
Updated all skills to use:
- DatabricksSession.builder.serverless(True).getOrCreate()
- Local dependency installation via uv/pip

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add Post-Generation Validation section with CLI SQL examples
- Update troubleshooting.md with CLI-based validation queries
- Remove in-script .show() calls from generate_synthetic_data.py
- Validate data using `databricks sql execute` instead of DataFrame API

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Python import patterns (not usable by agent)
- Focus on CLI: write HTML to temp file, run script
- Remove redundant sections and patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add field format requirements: all items need unique 32-char hex UUID id
- Document that question/sql/content fields must be arrays of strings
- Add example showing correct format
- Add trash-space command for deleting spaces

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Document correct serialized_space format with ID requirements
- All items require 32-char hex UUID id field (uuid.uuid4().hex)
- Text fields (question, sql, content) must be arrays, not strings
- Fix CLI syntax: use title (not display_name), serialized_space (not table_identifiers)
- Add trash-space command documentation
- Remove redundant spaces.md file
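Sketch of a conforming item per the requirements above (question/SQL text illustrative):

```python
import uuid

example_item = {
    "id": uuid.uuid4().hex,                       # unique 32-char hex UUID
    "question": ["Top 10 customers by revenue"],  # arrays of strings, not bare strings
    "sql": ["SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY 2 DESC LIMIT 10"],
}
```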

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Create standalone compute.py with all logic inlined (no external deps)
- Filter clusters to UI/API sources only (interactive, human-created)
- Add page_size=100 for faster cluster listing
- Use proper SDK types (JobEnvironment, Environment, timedelta)
- Add integration tests for compute.py CLI
- Merge Genie conversation.md into SKILL.md
- Fix CLI commands in SKILL.md (databricks warehouses)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add CRITICAL widget version requirements table
- Document mandatory validation workflow (test queries before deploy)
- Fix CLI commands: discover-schema requires CATALOG.SCHEMA.TABLE format
- Fix lakeview create: use --display-name, --warehouse-id, --serialized-dashboard
- Add Genie space linking via uiSettings.genieSpace
- Add design best practices section
- Remove duplicate 3-examples.md (content in 4-examples.md)
- Update file references to match correct numbering
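Create-call sketch with the corrected flags (values illustrative):

```sh
databricks lakeview create \
  --display-name "Fleet KPIs" \
  --warehouse-id <WAREHOUSE_ID> \
  --serialized-dashboard "$(cat dashboard.lvdash.json)"
```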

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add Quick Reference table at top for common CLI commands
- Add Step 4 design phase with filter-to-dataset mapping
- Add filter scope rule to checklist (filters only affect datasets with field)
- Clarify percentage format (0-1 vs 0-100) with fix options
- Add data variance guidance for trend charts
- Condense expression examples using [option|option] notation
- Remove redundant ASCII workflow diagram (steps below are clearer)
- Link dataset parameters to filter widget documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Documents how to run unit and integration tests for skill scripts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix create-knowledge-source: use --json with source_type "files"
  and files.path (old --volume-config flag doesn't exist)
- Add Quick Reference section with correct commands
- Add volume discovery step: databricks volumes list CATALOG SCHEMA
- Fix state name: CREATING (not PROVISIONING)
- Streamline content, remove duplicates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Quentin Ambard and others added 27 commits April 15, 2026 14:40
Comprehensive CLI audit fixes:
- Fix positional arguments vs flags (postgres, database, system-schemas,
  knowledge-assistants, storage-credentials)
- Add cluster/warehouse create examples with tags to execution-compute
- Add --cluster-sources UI,API filter to exclude job clusters (faster)
- Fix genie export/import commands (use get-space --include-serialized-space)
- Standardize tag instructions: "include" for inline JSON, "after creation"
  for workspace-entity-tag-assignments

Resources with tags:
- Jobs, Pipelines: inline "tags" in create JSON
- Clusters: inline "custom_tags" in create JSON
- Warehouses: inline "tags.custom_tags" array in create JSON
- Dashboards, Apps, Genie: workspace-entity-tag-assignments API
- Serving Endpoints: patch API with add_tags
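Inline-tag sketch for the create-time shapes above (values illustrative):

```sh
# Jobs: top-level "tags" map in the create JSON
databricks jobs create --json '{"name": "nightly-etl", "tags": {"team": "data"}}'
# Warehouses: "tags.custom_tags" array of key/value objects
databricks warehouses create --json '{"name": "bi-wh", "cluster_size": "Small",
  "tags": {"custom_tags": [{"key": "team", "value": "data"}]}}'
```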

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Restore databricks-mcp-server/ and databricks-tools-core/ directories
- Make MCP server installation optional in install.sh (default: skip)
- Add --mcp and --mcp-path CLI options for non-interactive install
- Add DEVKIT_INSTALL_MCP and DEVKIT_MCP_PATH env vars
- Skills-only install is faster (no venv setup required)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Save installation config (tools, profile, scope, skills, MCP) to .install-config
- On reinstall, show recap of previous settings with option to reuse or reconfigure
- Use hash-based schema validation: auto-detects when new config fields are added
- Silent/non-interactive modes auto-apply previous config when available
- Config file stored in scope-appropriate location (project or global)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Rename .install-config to .ai-dev-kit-install-config
- Remove old .skills-profile mechanism (now unified in config file)
- Add HAS_PREVIOUS_CONFIG flag for pre-selection mode
- Pre-select all prompts from saved values when reconfiguring:
  - Tools: shows "previous" hint on saved selections
  - Databricks profile: pre-selects saved profile
  - Scope: pre-selects project/global
  - Skills: pre-selects skill profiles
  - MCP: pre-selects install option
- Simplify "Keep this configuration? (Y/n)" prompt
- Make header/prerequisites more compact (single line)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove version check (always reinstall)
- Remove extra blank line after experimental download message
- Add "previous" hint to all pre-selected options from saved config:
  - Scope, MCP install, skill profiles

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Quote all values in save_config to handle spaces correctly
- Replace source with grep-based parsing (no code execution risk)
- Any config error silently falls back to fresh install
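A grep-based read might look like this (sketch; the config key name is hypothetical):

```sh
# Parse one quoted value without executing the file
selected_profile=$(grep -E '^PROFILE=' "$CONFIG_FILE" | head -n1 | cut -d= -f2- | tr -d '"')
```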

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
… 3.12

- Remove duplicate --mcp-path case in argument parser
- Remove dead INSTALL_MCP=true assignment (was immediately overwritten)
- Remove duplicate MCP server line in summary
- Remove redundant install_mcp_server call (setup_mcp already handles it)
- Use Python 3.12 instead of 3.11 for venv creation
- Add --allow-existing to install_mcp_server venv creation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…pport

Cleanup (~200 lines removed):
- Remove dead functions: install_mcp_server(), check_sdk_version(), prompt_mcp_path()
- Refactor prompt_scope() and prompt_mcp_install() to use radio_select()

New feature - Claude profile env:
- Add write_claude_env() to set DATABRICKS_CONFIG_PROFILE in .claude/settings.json
- Only prompt for profile when Claude + project scope (not global)
- Reorder flow: tools → scope → profile (so we know scope before asking profile)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The --dataset-catalog and --dataset-schema CLI flags only fill in
missing parts of a query — they do NOT override catalog/schema
hardcoded in the FROM clause. Dashboard queries must use bare
table names only (e.g., "FROM trips", not "FROM nyctaxi.trips").

- SKILL.md: rewrite note with ✅/❌ examples and a "why" explanation
- 4-examples.md: update example queries to use bare table names
- 3-filters.md: update example query to use bare table name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
SKILL.md becomes a dense hub with one worked CLI example per concept
(projects, branches, endpoints, credentials, reverse ETL). Deep-dive
subfiles cover internals, limits, and advanced CLI, with an "SDK
equivalents" section at the bottom of each. connection-patterns.md
stays SDK-based since in-process OAuth token refresh is the one
legitimate runtime SDK use case.

Also fixes CLI bugs found during live testing: create-project and
generate-database-credential take positional args (not flags); default
endpoint is named primary (not ep-primary); duration fields use
suspend_timeout_duration / history_retention_duration (not _seconds).

Co-authored-by: Isaac
SKILL.md and references/2-serverless-job.md now lead with
databricks jobs submit (the one-shot create+run CLI primitive) instead
of the defunct MCP execute_code wrapper that the reference file used
to point at. Full flow documented: upload → submit → poll get-run →
fetch get-run-output, including the non-obvious gotcha that
get-run-output takes the task run_id (.tasks[0].run_id), not the
parent run_id from submit.
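Flow sketch (payload path and jq shapes assumed from the description above):

```sh
RUN_ID=$(databricks jobs submit --json @job.json --no-wait | jq -r '.run_id')
databricks jobs get-run "$RUN_ID"        # poll until the run terminates
# Gotcha: get-run-output wants the task run_id, not the parent run_id from submit
TASK_RUN_ID=$(databricks jobs get-run "$RUN_ID" | jq -r '.tasks[0].run_id')
databricks jobs get-run-output "$TASK_RUN_ID"
```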

scripts/compute.py gains --environments flag with dict-or-typed
normalization so the standalone script can install pip dependencies
(previously impossible from CLI — "client": "4" deps had no path).

Interactive cluster section reduced to an "avoid by default" callout
in SKILL.md; the raw-CLI cluster list and create patterns move into
references/3-interactive-cluster.md alongside the existing script
wrappers.

SQL Warehouses section in SKILL.md expanded from create-only to the
full CRUD surface (create, list, find, get, start, stop, edit,
delete) with live-verified min_num_clusters/max_num_clusters and
--no-wait gotchas.

Co-authored-by: Isaac
job_extra_params={"environments": [...]} was broken both ways: passing
dicts (the documented shape) crashed in the SDK's jobs.submit because
it serializes each list element via .as_dict(); passing typed
JobEnvironment crashed earlier trying to read environment_key with
.get(). Neither path worked.

Normalize extra["environments"] to List[JobEnvironment] once at the
top of the submit path: dicts get wrapped (nested spec dict → typed
Environment), typed objects pass through, anything else raises
TypeError before hitting the SDK. env_key for the task binding is
read off the canonical typed object.
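Normalization sketch, assuming the SDK types named here:

```python
from databricks.sdk.service.compute import Environment
from databricks.sdk.service.jobs import JobEnvironment

def normalize_environments(envs):
    """Coerce a mixed list of dicts / JobEnvironment into a list of JobEnvironment."""
    out = []
    for e in envs or []:
        if isinstance(e, JobEnvironment):
            out.append(e)                          # typed objects pass through
        elif isinstance(e, dict):
            spec = e.get("spec")
            out.append(JobEnvironment(
                environment_key=e.get("environment_key"),
                spec=Environment(**spec) if isinstance(spec, dict) else spec,
            ))
        else:
            raise TypeError(f"unsupported environments entry: {type(e).__name__}")
    return out
```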

Adds TestServerlessJobExtraParams integration-test class covering the
four cases: dict input, typed input, no environments (default path),
malformed entry. Previously there was zero coverage of job_extra_params,
which is how the bug landed. All four pass live (≈110 s for the class).

Co-authored-by: Isaac
The --dataset-catalog / --dataset-schema guidance tells you what to do
but not why. Clarify that bare table names exist so the serialized
dashboard can be re-installed on a different catalog.schema without
rewriting queries.

Co-authored-by: Isaac
Three skills previously documented the dead get_table_stats_and_schema
MCP function or had related gaps:

- databricks-metric-views: swap the MCP call for
  `databricks experimental aitools tools discover-schema` and note that
  deeper distribution probes go through the `query` subcommand.
- databricks-genie: same replacement in Step 1 "Understand the Data",
  plus delete the bogus `databricks sql exec` calls (no such
  subcommand exists) in favor of `query`.
- databricks-aibi-dashboards: expand Step 2 exploration guidance so
  the design decisions (widget vs. table, KPI vs. trend chart, trend
  granularity, filter options) are explicitly tied to what to probe
  (cardinality, top values, numeric distribution, trend viability).
  Keeps the skill conceptual rather than prescribing SQL the agent
  can already write.

Co-authored-by: Isaac
databricks fs on CLI v0.296 requires the dbfs: scheme prefix for UC
Volume paths. Without it the CLI treats the path as local filesystem
and errors with `no such directory`. Fix every fs example pointed at
/Volumes/... in the PDF, UC, and SDP skills; also tighten the UC
examples to use -r and --overwrite consistently, and clarify that -r
copies the source directory's contents (not the directory itself).
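Corrected form, as a sketch (paths illustrative):

```sh
# dbfs: prefix required for UC Volume paths; -r copies the directory's contents
databricks fs cp -r --overwrite ./reports dbfs:/Volumes/<catalog>/<schema>/<volume>/reports
```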

databricks workspace import-dir on redeploys silently skips files that
already exist, so updates never reach the workspace and the app keeps
running the old version. Add --overwrite to every import-dir example
in the app skill's 4-deployment.md and 6-cli-approach.md. Also flag
the first-ever-deploy gotcha on the redeployment recipe (the workspace
delete line errors when the target dir doesn't exist yet).
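Redeploy sketch (paths illustrative):

```sh
databricks workspace import-dir ./app /Workspace/Users/<me>/app --overwrite
```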

Fix the PDF skill's volumes-create troubleshooting row — it passed a
single dotted arg (`catalog.schema.volume`) where the CLI wants four
positional args (`CATALOG SCHEMA NAME MANAGED`).

All corrected forms live-verified against the workspace once.

Co-authored-by: Isaac
scripts/conversation.py was a 171-line Python glue wrapper around
client.genie.start_conversation_and_wait and client.genie.get_message
with manual polling. The CLI now exposes all the primitives directly
(start-conversation, create-message, get-message,
get-message-attachment-query-result), and start-conversation has a
built-in --no-wait / --timeout LRO flag. Document the three-command
flow end-to-end and delete the script. No external Python callers
(only SKILL.md pointed at it).
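Three-command sketch (positional argument order assumed):

```sh
databricks genie start-conversation <SPACE_ID> "Top 10 customers by revenue?" --timeout 10m
databricks genie get-message <SPACE_ID> <CONVERSATION_ID> <MESSAGE_ID>   # poll until COMPLETED
databricks genie get-message-attachment-query-result <SPACE_ID> <CONVERSATION_ID> <MESSAGE_ID> <ATTACHMENT_ID>
databricks genie create-message <SPACE_ID> <CONVERSATION_ID> "And by region?"  # follow-up
```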

Also in this commit:

- Fix the Export/Import quoting inconsistency: genie_space.json on
  disk is now a parsed object (not a JSON-string blob). Export unwraps
  with `jq '.serialized_space | fromjson'`; import and update both
  stringify consistently with `jq -c '.' | jq -Rs '.'`.
- Add two troubleshooting rows: slow answers / query timeouts
  (warehouse sizing) and wrong/empty answers (example_question_sqls +
  text_instructions).
- Drop the redundant serialized_space "Structure" skeleton — its
  information is a strict subset of the Complete Example, now renamed
  "Example" with the top-level keys called out in the lead-in.

All three primitives live-verified against a real Genie Space on the
workspace (NordWind Fleet Analytics): start-conversation → poll
get-message → get-message-attachment-query-result (columns + rows) →
create-message for follow-up.

Co-authored-by: Isaac
SDP skill had three concrete bugs that bit an agent running a real
pipeline update end-to-end:

1. references/2-cli-approach.md claimed "file" libraries could point
   to a directory. They can't — the API errors with "Paths must end
   with .py or .sql". The correct shape for a folder is
   {"glob": {"include": "<dir>/**"}}. Fixed the example and added a
   troubleshooting row for the exact error string. Live-verified with
   a pipelines update round-trip.

2. No documented flow for editing an existing pipeline. Added a
   dense "Updating a Pipeline" section covering re-upload +
   start-update. Key gotcha: pipelines consume raw FILE entries, so
   re-imports need --format RAW --overwrite. --format SOURCE
   --language SQL|PYTHON creates a workspace NOTEBOOK (deprecated for
   pipelines) and fails on an existing FILE path with
   "type mismatch (asked: NOTEBOOK, actual: FILE)". Live-verified
   both failure and success modes. Added troubleshooting row.

3. Contradictory streaming read guidance — SKILL.md said
   FROM stream(table), 4-dlt-migration.md showed FROM STREAM table.
   Both parse, but the function form is the canonical one. Reworked
   the troubleshooting row to spell out when each form applies and
   flag FROM STREAM table as legacy DLT compatibility.

Bonus: pipelines list-pipeline-events returns a bare array, not
{"events": [...]} — skill previously showed the raw command with no
output shape hint. Replaced with a ready jq pattern that surfaces
just ERROR/WARN entries; agent had written two failing Python
one-liners trying to guess the shape.
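Ready-pattern sketch over the bare array (event field names assumed):

```sh
databricks pipelines list-pipeline-events <PIPELINE_ID> -o json \
  | jq '[.[] | select(.level == "ERROR" or .level == "WARN") | {timestamp, level, message}]'
```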

Also simplified databricks-unity-catalog SKILL.md to show the
positional form for schemas create and volumes create (what the help
text documents as canonical) instead of the --json form that was
redundant with the positional CLI.

Co-authored-by: Isaac
Script invocations in SKILL.md and references (python scripts/X)
previously assumed the reader was running from the skill's install
folder. Agents running from an arbitrary project cwd hit
"No such file or directory" errors — the agent-bricks, execution-compute,
and pdf-generation skills all trip the same way.

Switch to the <SKILL_ROOT> literal token for every script invocation
and add a one-line convention note at the top of each affected
SKILL.md and reference file:

  > <SKILL_ROOT> = the directory containing this SKILL.md; resolve to
  > the absolute install path (e.g. ~/.claude/skills/<skill-name>).

Rewrote:
- python scripts/compute.py ...         → python <SKILL_ROOT>/scripts/compute.py ...
- python scripts/pdf_generator.py ...   → python <SKILL_ROOT>/scripts/pdf_generator.py ...

Also fixed a stale markdown link in the SDP skill whose display text
said "examples/exploration_notebook.py" but whose path was "scripts/...".

databricks-agent-bricks script references come in a separate commit.

Co-authored-by: Isaac
The skill told readers to call create-knowledge-source with four
positional args (PARENT DISPLAY_NAME DESCRIPTION SOURCE_TYPE) alongside
--json. The CLI rejects that combination:

  Error: when --json flag is specified, provide only PARENT as
  positional arguments. Provide 'display_name', 'description',
  'source_type' in your JSON input.

Only two forms actually work (verified live on the workspace):
  1. PARENT + --json '{display_name, description, source_type, files|index|...}'
  2. positional-only (no --json) — but then there's nowhere to pass
     files.path / index.index_name, so this form only works for source
     types that need no extra body, which today is none.

Updated SKILL.md and 1-knowledge-assistants.md to show the single
working shape: PARENT positional + everything else in --json. Added
the display_name / description fields inside each example body.
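The working shape, sketched (command group assumed from this repo's skill naming; paths illustrative):

```sh
databricks knowledge-assistants create-knowledge-source <PARENT_ID> --json '{
  "display_name": "Product docs",
  "description": "PDF documentation volume",
  "source_type": "files",
  "files": {"path": "/Volumes/<catalog>/<schema>/<volume>/docs"}
}'
```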

Co-authored-by: Isaac
Three concrete bugs in scripts/mas_manager.py triggered by a real
agent session:

1. get_mas (L481) and update_mas (L531) read instructions from
   mas_data.get("instructions") — wrong nested level, always empty.
   The GET response nests it on tile: mas_data.tile.instructions.
   Consequence: update_mas(tile_id, name="...") without an explicit
   instructions= arg wiped the existing instructions on every call.
   Verified the correct path live: "instructions_len: 232" vs 0 before.

2. add_examples_queued spun up an in-process daemon thread that
   polled get_endpoint_status every 30s. When the CLI process exited,
   the thread died and examples were never added — silent data loss.
   Removed add_examples_queued, TileExampleQueue, get_tile_example_queue,
   the _tile_example_queue singleton, and the now-unused threading /
   Tuple imports.

3. Replaced the broken queue with a wait_for_online flag on
   add_examples (CLI: --wait). Blocks and polls every 30s for up to
   15 min (covers the ~10 min NOT_READY -> ONLINE wait after create_mas
   or a big update_mas, with headroom). No background queue — the
   caller process must stay alive for the wait.

Also live-verified that the MAS PATCH endpoint is NOT partial:
missing `name` returns 400 Missing required field, missing `agents`
returns 400 "At least one BaseAgent must be provided". update_mas
already handles this internally (fetches existing + merges), so the
full-replace reality stays an internal detail of the HTTP layer —
callers see a partial-update-shaped API.

Skill doc updates:
- SKILL.md: reorder list_mas to the top of the check/manage block
  with a one-liner describing the return shape.
- SKILL.md: flag the ~10min NOT_READY wait on add_examples with --wait.
- SKILL.md: fix status legend from "(2-5 min)" to "up to ~10 min".
- 2-supervisor-agents.md: replace the dual add_examples / add_examples_wait
  block with a single add_examples [--wait] example.
- SKILL.md also includes the KA create-knowledge-source fix from the
  previous commit's companion page (PARENT + everything-in-json).

Co-authored-by: Isaac
Lakebase Autoscaling is the canonical path for all new Lakebase work
(autoscaling, branching, scale-to-zero, point-in-time restore). The
Provisioned skill covers the predecessor fixed-capacity tier; keeping
both causes agents to spend time deciding between them or picking the
older one. Delete the Provisioned skill and point everything at
autoscale.

Files deleted:
- databricks-skills/databricks-lakebase-provisioned/SKILL.md
- databricks-skills/databricks-lakebase-provisioned/connection-patterns.md
- databricks-skills/databricks-lakebase-provisioned/reverse-etl.md

Cross-references updated:
- install_skills.sh: drop from DATABRICKS_SKILLS list, description map,
  and reference-files map.
- README.md: replace the Provisioned bullet with a Lakebase Autoscale
  bullet under the same Development & Deployment section.
- databricks-python-sdk/SKILL.md, databricks-app-python/SKILL.md:
  redirect the "Related Skills" link to databricks-lakebase-autoscale.
- databricks-lakebase-autoscale/SKILL.md: drop the now-meaningless
  "Provisioned vs Autoscaling" comparison table and the predecessor
  link. Keep the one prose mention in computes.md explaining CU RAM
  sizing context — that's justification, not a link.

Co-authored-by: Isaac
…tures

- Fix autoscaling spread constraint: 8 CU → 16 CU across SKILL.md and computes.md
- Fix scale-to-zero wake-up latency: few hundred ms → ~100ms
- Update token refresh guidance: 50 min → 30-40 min
- Move synced-table CLI from `databricks database` to `databricks postgres` group (v0.294.0+)
- Update SDK module from `databricks.sdk.service.database` to `databricks.sdk.service.postgres`
- Correct reverse-ETL throughput figures: snapshot 2k rows/s/CU, incremental 150 rows/s/CU
- Add High Availability section (secondaries vs read replicas, HA constraints)
- Add Data API section (PostgREST-compatible HTTP CRUD, Autoscaling-only)
- Add Lakehouse Sync Beta section (Postgres → UC Delta, AWS only)
- Add `databricks apps init --features lakebase` command and `list-endpoints` command

Co-authored-by: Isaac
Reorganizes branches.md, computes.md, connection-patterns.md, projects.md,
and reverse-etl.md into a references/ subfolder. Updates all links in
SKILL.md (references/foo.md) and back-links in each reference file
(../SKILL.md). Also corrects token refresh guidance to 45 min per official
Databricks docs (docs.databricks.com/aws/en/oltp/projects/external-apps-connect).

Co-authored-by: Isaac
Ports three hard-difficulty interactive test cases from ai-dev-kit-lakebase_updates:
- 007: Full project setup (create project, autoscaling, branch protection, dev branch, connectivity, database)
- 008: Schema DDL (4-table support schema with FKs, CHECK constraints, indexes)
- 009: Extended DDL (support_cases, case_products, case_notes with uv/pip install)

Fixes token refresh guidance in 007 response from ~50 min to ~45 min.

Co-authored-by: Isaac
…p-primary

Follow-up to the CLI-first rewrite in this PR. Three fixes that were blocking
accurate eval scoring:

1. ground_truth.yaml — replaced all SDK expected_facts/patterns with CLI
   equivalents for tasks 001, 002, 004, 005, 007 (management-plane tasks).
   Connection and DDL facts in tasks 003, 008, 009 made approach-agnostic.
   Proxy eval confirmed: 0.216 → 0.609 (+0.393). Tasks 002/003/005 all hit
   0.825 after the fix; task 006 (cli_reference) is residual work for a
   follow-on PR.

2. SKILL.md — added psycopg3 connection snippet inline to the Credentials
   section. Moving it to references/connection-patterns.md caused a regression
   where the surrogate LLM defaulted to import psycopg2. Reference files are
   not loaded at eval time (evaluator reads only SKILL.md); guidance must be
   inline to be effective.

3. references/computes.md line 7 + ground_truth.yaml tasks 007/008/009 —
   ep-primary → primary. The wrong endpoint name was the direct cause of the
   task 009 floor (score 0.000 during agent-eval; agent used the wrong path
   and the connection failed).

Also adds psycopg[binary] to .test/pyproject.toml so pre-validation passes
for tasks 003/007/008/009 (which import psycopg in their reference responses),
and fixes agent executor to skip empty env var values so Claude Code falls
back to keychain auth correctly.

Co-authored-by: Isaac
…ence

Task 006 (cli_reference) was scoring 0.000 on all three judge dimensions
(correctness, completeness, guideline adherence). Root cause: the reference
response used '--project-id my-app' which is invalid syntax — the project ID
is a positional argument to create-project, not a flag.

Changes:
- Fix create-project syntax: positional argument, not --project-id flag
- Add update-endpoint example with correct positional field-mask syntax
- Add generate-database-credential example (credentials are a core CLI op)
- Replace 'no_expiry: true' example with 'ttl: 604800s' for consistency
  with the rest of the ground truth
- Expand expected_facts to assert positional arg and field-mask patterns
- Add expected_patterns for endpoint and credential commands
- Update guideline: 5 subcommands (was 4), explicitly forbid --project-id flag

Co-authored-by: Isaac
@dustinvannoy-db
Collaborator

@cankoklu-db can you make these changes for lakebase directly to the PR you reference instead? I think that was your intention.

@cankoklu-db
Collaborator Author

Changes applied directly to PR #497 (experimental_lakebase_updates) per reviewer feedback.

cankoklu-db closed this on May 6, 2026
