
Fix lakebase-autoscale task 006 ground truth: invalid --project-id flag #514

Closed
cankoklu-db wants to merge 62 commits into main from fix/lakebase-autoscale-task006-ground-truth

Conversation

@cankoklu-db
Collaborator

Summary

  • Task 006 (cli_reference) was scoring 0.000 on all three judge dimensions (correctness, completeness, guideline adherence)
  • Root cause: the reference response used --project-id my-app which is invalid CLI syntax — the project ID is a positional argument to create-project, not a flag
  • Also expanded the reference to cover endpoint and credential commands, which are core CLI operations missing from the original
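A minimal before/after sketch, assuming the Lakebase commands live under the `databricks postgres` group referenced elsewhere in this PR (`my-app` is illustrative):

```sh
# Before (invalid): project ID passed as a flag
databricks postgres create-project --project-id my-app

# After: project ID is a positional argument
databricks postgres create-project my-app
```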

Changes

  • Fix create-project syntax: positional argument, not --project-id flag
  • Add update-endpoint example with correct positional field-mask syntax
  • Add generate-database-credential example
  • Replace no_expiry: true branch example with ttl: 604800s for consistency
  • Expand expected_facts to explicitly assert positional arg and field-mask patterns
  • Add expected_patterns for endpoint and credential commands
  • Guideline updated: require 5 subcommands (was 4), explicitly forbid --project-id flag

Context

Follow-on to PR #497 (CLI-first rewrite of databricks-lakebase-autoscale). The other 8 tasks recovered to 0.608–0.825 after ground truth alignment; task 006 was left at 0.200 because the reference response bug was a separate issue. This PR completes that work.

This pull request and its description were written by Isaac.

Quentin Ambard and others added 30 commits April 15, 2026 10:48
Adds a release channel selection during installation allowing users to
choose between stable (default) and experimental branches.

When experimental is selected:
- Displays feedback request with links to issues/discussions
- Re-downloads install.sh from the experimental branch
- Re-executes with --experimental flag (preserving other args)

Features:
- New --experimental flag and DEVKIT_CHANNEL env var
- Interactive radio selector for channel choice
- Channel shown in summary and completion messages
- Feedback reminder at end of experimental installs
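Non-interactive invocation sketch using the flag and env var named above:

```sh
# Select the experimental channel without the interactive prompt
./install.sh --experimental
# or equivalently via the environment variable
DEVKIT_CHANNEL=experimental ./install.sh
```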

Closes #468

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Automates releases while ensuring the experimental branch stays in sync:

- Triggers on VERSION file changes on main
- Checks if experimental is behind main
- Creates sync PR (main → experimental) if needed
- Auto-merges if no conflicts, blocks release if conflicts exist
- Clear error messages with PR links when blocked
- Creates git tag and GitHub Release when sync is complete

Part of #468

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
When release is blocked due to conflicts between main and experimental,
the error message now includes:
- Step-by-step instructions for resolution
- A ready-to-use Claude Code prompt that:
  - First analyzes commits in experimental to understand intent
  - Reviews conflicted files from both sides
  - Resolves by keeping both changes when possible
  - Asks for human confirmation when resolution isn't obvious

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…apps skills

- databricks-agent-bricks: Use CLI for KA/Genie, add manager.py for MAS operations
- databricks-aibi-dashboards: Use databricks lakeview CLI commands
- databricks-app-python: Update to use CLI-based deployment

This is part of the effort to simplify skills by removing MCP tool dependencies
and using Databricks CLI directly where possible.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add conversation.py script for Genie Conversation API (ask_genie)
- Update SKILL.md to use databricks genie CLI commands
- Update spaces.md with CLI-based export/import/migration workflows
- Update conversation.md to use conversation.py script

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- databricks-lakebase-autoscale: Remove MCP section, expand CLI commands
- databricks-lakebase-provisioned: Remove MCP section, expand CLI commands

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…and dbsql skills

- databricks-model-serving: Use databricks CLI for endpoints and workspace ops
- databricks-unity-catalog: Use databricks fs CLI for volume operations
- databricks-dbsql: Update guideline to use CLI instead of MCP

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove MCP Tools section from SKILL.md (manage_vs_endpoint, manage_vs_index, query_vs_index, manage_vs_data)
- Update Common Issues to remove MCP-specific truncation issue
- Update Notes section to reference CLI/SDK instead of MCP
- Update end-to-end-rag.md: replace MCP tools table with CLI commands
- Update troubleshooting-and-operations.md: replace MCP tool references with CLI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…mands

- Rename Option C from "MCP Tools" to "CLI" approach
- Replace references/2-mcp-approach.md with 2-cli-approach.md (full rewrite)
- Update Post-Run Validation section to use `databricks pipelines` CLI
- Update all workflow references from MCP to CLI/SDK
- Update 1-project-initialization.md reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- databricks-config: Rewrite to use `databricks auth` CLI commands
- databricks-docs: Update references from MCP to CLI/SDK
- databricks-metric-views: Replace MCP tools with SQL CREATE/DESCRIBE commands
- databricks-execution-compute: Replace MCP tools with CLI job commands
- databricks-unity-catalog/6-volumes: Replace MCP tools with `databricks fs` CLI
- databricks-unity-catalog/7-data-profiling: Replace MCP tools with SQL QUALITY MONITOR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- 5-development-testing.md: Update workflow from MCP to CLI
- 8-querying-endpoints.md: Replace MCP tools section with CLI commands
- SKILL.md: Update reference table descriptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- README.md: Update description and diagram to reference CLI/SDK
- install_skills.sh: Update comment describing skills
- databricks-app-python: Rename 6-mcp-approach.md to 6-cli-approach.md
- databricks-jobs/task-types.md: Remove MCP tool note
- databricks-model-serving: Replace MCP tools with CLI commands
  - 1-classical-ml.md: CLI for querying endpoints
  - 3-genai-agents.md: CLI for testing and querying
  - 6-logging-registration.md: CLI for running scripts
  - 7-deployment.md: CLI for job creation and management
  - 9-package-requirements.md: Notebook commands instead of MCP
- databricks-unstructured-pdf-generation: Python script pattern
- databricks-zerobus-ingest: CLI workflow instead of MCP execute_code

Note: MCP references in databricks-agent-bricks (External MCP Server
feature) and databricks-mlflow-evaluation (MLflow MCP server) are
legitimate product features and remain unchanged.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added test infrastructure for Python scripts in databricks-skills:

- .tests/conftest.py: Pytest fixtures for Databricks connection
  - workspace_client: Session-scoped WorkspaceClient
  - warehouse_id: Finds running SQL warehouse
  - Custom markers for integration tests

- .tests/test_agent_bricks_manager.py: Tests for supervisor agent CLI
  - Unit tests for _build_agent_list helper (all agent types)
  - Integration tests for MAS lifecycle (list, find, get)

- .tests/test_genie_conversation.py: Tests for Genie conversation CLI
  - Unit tests with mocks for ask_genie function
  - Tests for timeout, failure handling, conversation tracking
  - Integration tests for live Genie Space queries

- .tests/run_tests.py: Test runner script
  - Supports --unit and --integration flags
  - HTML and JUnit XML report generation
  - Colored terminal output with summary

Tests cover the remaining Python scripts in skills:
- databricks-agent-bricks/manager.py
- databricks-genie/conversation.py

All 11 unit tests pass. Integration tests require Databricks connection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…xample queue support

Changes:
- Renamed manager.py → mas_manager.py for clearer naming
- Added example question management functions:
  - add_examples(): Add examples to ONLINE MAS
  - add_examples_queued(): Queue examples for when MAS becomes ONLINE
  - list_examples(): List all examples for a MAS
- Integrated with TileExampleQueue from databricks-tools-core
- Updated all documentation references to use mas_manager.py
- Updated test imports to use mas_manager module

This allows users to add example questions immediately after creating a MAS,
even before it finishes provisioning. Examples are automatically added when
the endpoint becomes ONLINE.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add installation section with uv (preferred) and pip fallback
for installing databricks-tools-core library.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- mas_manager.py: Inline all agent_bricks functionality, use raw HTTP
  with WorkspaceClient for auth only (no core imports)
- pdf_generator.py: New self-contained script using CLI for uploads
  (databricks fs cp) instead of SDK-based volume operations
- Update SKILL.md files to reflect self-contained scripts
- Update tests to work with new modules

Skills now only require:
- databricks-sdk (for auth in mas_manager)
- requests (for HTTP in mas_manager)
- plutoprint (for PDF generation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Move mas_manager.py to databricks-agent-bricks/scripts/
- Move conversation.py to databricks-genie/scripts/
- Move pdf_generator.py to databricks-unstructured-pdf-generation/scripts/
- Update all markdown references to use scripts/ path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Use --json syntax for creating UC objects (catalogs, schemas, volumes)
- Document correct JSON format for each create operation
- Add SQL execution alternative for creating objects
- Fix incorrect positional args syntax in multiple skill files

The --json syntax is the most reliable across CLI versions.
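Sketch of the --json create forms, with field names per the Unity Catalog APIs (values illustrative):

```sh
databricks catalogs create --json '{"name": "dev"}'
databricks schemas create --json '{"name": "bronze", "catalog_name": "dev"}'
databricks volumes create --json '{"catalog_name": "dev", "schema_name": "bronze",
  "name": "raw", "volume_type": "MANAGED"}'
```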

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Use --json syntax for catalogs, schemas, volumes create commands
- Remove incorrect positional argument examples
- Simplify volume example (remove external variant)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
DatabricksEnv does not exist in current databricks-connect versions.
Updated all skills to use:
- DatabricksSession.builder.serverless(True).getOrCreate()
- Local dependency installation via uv/pip

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add Post-Generation Validation section with CLI SQL examples
- Update troubleshooting.md with CLI-based validation queries
- Remove in-script .show() calls from generate_synthetic_data.py
- Validate data using `databricks sql execute` instead of DataFrame API

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Python import patterns (not usable by agent)
- Focus on CLI: write HTML to temp file, run script
- Remove redundant sections and patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add field format requirements: all items need unique 32-char hex UUID id
- Document that question/sql/content fields must be arrays of strings
- Add example showing correct format
- Add trash-space command for deleting spaces

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Document correct serialized_space format with ID requirements
- All items require 32-char hex UUID id field (uuid.uuid4().hex)
- Text fields (question, sql, content) must be arrays, not strings
- Fix CLI syntax: use title (not display_name), serialized_space (not table_identifiers)
- Add trash-space command documentation
- Remove redundant spaces.md file
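Sketch of a conforming item per the requirements above (question/SQL text illustrative):

```python
import uuid

example_item = {
    "id": uuid.uuid4().hex,                       # unique 32-char hex UUID
    "question": ["Top 10 customers by revenue"],  # arrays of strings, not bare strings
    "sql": ["SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY 2 DESC LIMIT 10"],
}
```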

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Create standalone compute.py with all logic inlined (no external deps)
- Filter clusters to UI/API sources only (interactive, human-created)
- Add page_size=100 for faster cluster listing
- Use proper SDK types (JobEnvironment, Environment, timedelta)
- Add integration tests for compute.py CLI
- Merge Genie conversation.md into SKILL.md
- Fix CLI commands in SKILL.md (databricks warehouses)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add CRITICAL widget version requirements table
- Document mandatory validation workflow (test queries before deploy)
- Fix CLI commands: discover-schema requires CATALOG.SCHEMA.TABLE format
- Fix lakeview create: use --display-name, --warehouse-id, --serialized-dashboard
- Add Genie space linking via uiSettings.genieSpace
- Add design best practices section
- Remove duplicate 3-examples.md (content in 4-examples.md)
- Update file references to match correct numbering
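Create-call sketch with the corrected flags (values illustrative):

```sh
databricks lakeview create \
  --display-name "Fleet KPIs" \
  --warehouse-id <WAREHOUSE_ID> \
  --serialized-dashboard "$(cat dashboard.lvdash.json)"
```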

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add Quick Reference table at top for common CLI commands
- Add Step 4 design phase with filter-to-dataset mapping
- Add filter scope rule to checklist (filters only affect datasets with field)
- Clarify percentage format (0-1 vs 0-100) with fix options
- Add data variance guidance for trend charts
- Condense expression examples using [option|option] notation
- Remove redundant ASCII workflow diagram (steps below are clearer)
- Link dataset parameters to filter widget documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Documents how to run unit and integration tests for skill scripts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix create-knowledge-source: use --json with source_type "files"
  and files.path (old --volume-config flag doesn't exist)
- Add Quick Reference section with correct commands
- Add volume discovery step: databricks volumes list CATALOG SCHEMA
- Fix state name: CREATING (not PROVISIONING)
- Streamline content, remove duplicates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Quentin Ambard and others added 27 commits April 15, 2026 14:40
Comprehensive CLI audit fixes:
- Fix positional arguments vs flags (postgres, database, system-schemas,
  knowledge-assistants, storage-credentials)
- Add cluster/warehouse create examples with tags to execution-compute
- Add --cluster-sources UI,API filter to exclude job clusters (faster)
- Fix genie export/import commands (use get-space --include-serialized-space)
- Standardize tag instructions: "include" for inline JSON, "after creation"
  for workspace-entity-tag-assignments

Resources with tags:
- Jobs, Pipelines: inline "tags" in create JSON
- Clusters: inline "custom_tags" in create JSON
- Warehouses: inline "tags.custom_tags" array in create JSON
- Dashboards, Apps, Genie: workspace-entity-tag-assignments API
- Serving Endpoints: patch API with add_tags
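Inline-tag sketch for the create-time shapes above (values illustrative):

```sh
# Jobs: top-level "tags" map in the create JSON
databricks jobs create --json '{"name": "nightly-etl", "tags": {"team": "data"}}'
# Warehouses: "tags.custom_tags" array of key/value objects
databricks warehouses create --json '{"name": "bi-wh", "cluster_size": "Small",
  "tags": {"custom_tags": [{"key": "team", "value": "data"}]}}'
```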

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Restore databricks-mcp-server/ and databricks-tools-core/ directories
- Make MCP server installation optional in install.sh (default: skip)
- Add --mcp and --mcp-path CLI options for non-interactive install
- Add DEVKIT_INSTALL_MCP and DEVKIT_MCP_PATH env vars
- Skills-only install is faster (no venv setup required)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Save installation config (tools, profile, scope, skills, MCP) to .install-config
- On reinstall, show recap of previous settings with option to reuse or reconfigure
- Use hash-based schema validation: auto-detects when new config fields are added
- Silent/non-interactive modes auto-apply previous config when available
- Config file stored in scope-appropriate location (project or global)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Rename .install-config to .ai-dev-kit-install-config
- Remove old .skills-profile mechanism (now unified in config file)
- Add HAS_PREVIOUS_CONFIG flag for pre-selection mode
- Pre-select all prompts from saved values when reconfiguring:
  - Tools: shows "previous" hint on saved selections
  - Databricks profile: pre-selects saved profile
  - Scope: pre-selects project/global
  - Skills: pre-selects skill profiles
  - MCP: pre-selects install option
- Simplify "Keep this configuration? (Y/n)" prompt
- Make header/prerequisites more compact (single line)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove version check (always reinstall)
- Remove extra blank line after experimental download message
- Add "previous" hint to all pre-selected options from saved config:
  - Scope, MCP install, skill profiles

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Quote all values in save_config to handle spaces correctly
- Replace source with grep-based parsing (no code execution risk)
- Any config error silently falls back to fresh install
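A grep-based read might look like this (sketch; the config key name is hypothetical):

```sh
# Parse one quoted value without executing the file
selected_profile=$(grep -E '^PROFILE=' "$CONFIG_FILE" | head -n1 | cut -d= -f2- | tr -d '"')
```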

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
… 3.12

- Remove duplicate --mcp-path case in argument parser
- Remove dead INSTALL_MCP=true assignment (was immediately overwritten)
- Remove duplicate MCP server line in summary
- Remove redundant install_mcp_server call (setup_mcp already handles it)
- Use Python 3.12 instead of 3.11 for venv creation
- Add --allow-existing to install_mcp_server venv creation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…pport

Cleanup (~200 lines removed):
- Remove dead functions: install_mcp_server(), check_sdk_version(), prompt_mcp_path()
- Refactor prompt_scope() and prompt_mcp_install() to use radio_select()

New feature - Claude profile env:
- Add write_claude_env() to set DATABRICKS_CONFIG_PROFILE in .claude/settings.json
- Only prompt for profile when Claude + project scope (not global)
- Reorder flow: tools → scope → profile (so we know scope before asking profile)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The --dataset-catalog and --dataset-schema CLI flags only fill in
missing parts of a query — they do NOT override catalog/schema
hardcoded in the FROM clause. Dashboard queries must use bare
table names only (e.g., "FROM trips", not "FROM nyctaxi.trips").

- SKILL.md: rewrite note with ✅/❌ examples and a "why" explanation
- 4-examples.md: update example queries to use bare table names
- 3-filters.md: update example query to use bare table name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
SKILL.md becomes a dense hub with one worked CLI example per concept
(projects, branches, endpoints, credentials, reverse ETL). Deep-dive
subfiles cover internals, limits, and advanced CLI, with an "SDK
equivalents" section at the bottom of each. connection-patterns.md
stays SDK-based since in-process OAuth token refresh is the one
legitimate runtime SDK use case.

Also fixes CLI bugs found during live testing: create-project and
generate-database-credential take positional args (not flags); default
endpoint is named primary (not ep-primary); duration fields use
suspend_timeout_duration / history_retention_duration (not _seconds).

Co-authored-by: Isaac
SKILL.md and references/2-serverless-job.md now lead with
databricks jobs submit (the one-shot create+run CLI primitive) instead
of the defunct MCP execute_code wrapper that the reference file used
to point at. Full flow documented: upload → submit → poll get-run →
fetch get-run-output, including the non-obvious gotcha that
get-run-output takes the task run_id (.tasks[0].run_id), not the
parent run_id from submit.
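Flow sketch (payload path and jq shapes assumed from the description above):

```sh
RUN_ID=$(databricks jobs submit --json @job.json --no-wait | jq -r '.run_id')
databricks jobs get-run "$RUN_ID"        # poll until the run terminates
# Gotcha: get-run-output wants the task run_id, not the parent run_id from submit
TASK_RUN_ID=$(databricks jobs get-run "$RUN_ID" | jq -r '.tasks[0].run_id')
databricks jobs get-run-output "$TASK_RUN_ID"
```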

scripts/compute.py gains --environments flag with dict-or-typed
normalization so the standalone script can install pip dependencies
(previously impossible from CLI — "client": "4" deps had no path).

Interactive cluster section reduced to an "avoid by default" callout
in SKILL.md; the raw-CLI cluster list and create patterns move into
references/3-interactive-cluster.md alongside the existing script
wrappers.

SQL Warehouses section in SKILL.md expanded from create-only to the
full CRUD surface (create, list, find, get, start, stop, edit,
delete) with live-verified min_num_clusters/max_num_clusters and
--no-wait gotchas.

Co-authored-by: Isaac
job_extra_params={"environments": [...]} was broken both ways: passing
dicts (the documented shape) crashed in the SDK's jobs.submit because
it serializes each list element via .as_dict(); passing typed
JobEnvironment crashed earlier trying to read environment_key with
.get(). Neither path worked.

Normalize extra["environments"] to List[JobEnvironment] once at the
top of the submit path: dicts get wrapped (nested spec dict → typed
Environment), typed objects pass through, anything else raises
TypeError before hitting the SDK. env_key for the task binding is
read off the canonical typed object.
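Normalization sketch, assuming the SDK types named here:

```python
from databricks.sdk.service.compute import Environment
from databricks.sdk.service.jobs import JobEnvironment

def normalize_environments(envs):
    """Coerce a mixed list of dicts / JobEnvironment into a list of JobEnvironment."""
    out = []
    for e in envs or []:
        if isinstance(e, JobEnvironment):
            out.append(e)                          # typed objects pass through
        elif isinstance(e, dict):
            spec = e.get("spec")
            out.append(JobEnvironment(
                environment_key=e.get("environment_key"),
                spec=Environment(**spec) if isinstance(spec, dict) else spec,
            ))
        else:
            raise TypeError(f"unsupported environments entry: {type(e).__name__}")
    return out
```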

Adds TestServerlessJobExtraParams integration-test class covering the
four cases: dict input, typed input, no environments (default path),
malformed entry. Previously there was zero coverage of job_extra_params,
which is how the bug landed. All four pass live (≈110 s for the class).

Co-authored-by: Isaac
The --dataset-catalog / --dataset-schema guidance tells you what to do
but not why. Clarify that bare table names exist so the serialized
dashboard can be re-installed on a different catalog.schema without
rewriting queries.

Co-authored-by: Isaac
Three skills previously documented the dead get_table_stats_and_schema
MCP function or had related gaps:

- databricks-metric-views: swap the MCP call for
  `databricks experimental aitools tools discover-schema` and note that
  deeper distribution probes go through the `query` subcommand.
- databricks-genie: same replacement in Step 1 "Understand the Data",
  plus delete the bogus `databricks sql exec` calls (no such
  subcommand exists) in favor of `query`.
- databricks-aibi-dashboards: expand Step 2 exploration guidance so
  the design decisions (widget vs. table, KPI vs. trend chart, trend
  granularity, filter options) are explicitly tied to what to probe
  (cardinality, top values, numeric distribution, trend viability).
  Keeps the skill conceptual rather than prescribing SQL the agent
  can already write.

Co-authored-by: Isaac
databricks fs on CLI v0.296 requires the dbfs: scheme prefix for UC
Volume paths. Without it the CLI treats the path as local filesystem
and errors with `no such directory`. Fix every fs example pointed at
/Volumes/... in the PDF, UC, and SDP skills; also tighten the UC
examples to use -r and --overwrite consistently, and clarify that -r
copies the source directory's contents (not the directory itself).
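Corrected form, as a sketch (paths illustrative):

```sh
# dbfs: prefix required for UC Volume paths; -r copies the directory's contents
databricks fs cp -r --overwrite ./reports dbfs:/Volumes/<catalog>/<schema>/<volume>/reports
```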

databricks workspace import-dir on redeploys silently skips files that
already exist, so updates never reach the workspace and the app keeps
running the old version. Add --overwrite to every import-dir example
in the app skill's 4-deployment.md and 6-cli-approach.md. Also flag
the first-ever-deploy gotcha on the redeployment recipe (the workspace
delete line errors when the target dir doesn't exist yet).
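Redeploy sketch (paths illustrative):

```sh
databricks workspace import-dir ./app /Workspace/Users/<me>/app --overwrite
```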

Fix the PDF skill's volumes-create troubleshooting row — it passed a
single dotted arg (`catalog.schema.volume`) where the CLI wants four
positional args (`CATALOG SCHEMA NAME MANAGED`).

All corrected forms live-verified against the workspace once.

Co-authored-by: Isaac
scripts/conversation.py was a 171-line Python glue wrapper around
client.genie.start_conversation_and_wait and client.genie.get_message
with manual polling. The CLI now exposes all the primitives directly
(start-conversation, create-message, get-message,
get-message-attachment-query-result), and start-conversation has a
built-in --no-wait / --timeout LRO flag. Document the three-command
flow end-to-end and delete the script. No external Python callers
(only SKILL.md pointed at it).
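Three-command sketch (positional argument order assumed):

```sh
databricks genie start-conversation <SPACE_ID> "Top 10 customers by revenue?" --timeout 10m
databricks genie get-message <SPACE_ID> <CONVERSATION_ID> <MESSAGE_ID>   # poll until COMPLETED
databricks genie get-message-attachment-query-result <SPACE_ID> <CONVERSATION_ID> <MESSAGE_ID> <ATTACHMENT_ID>
databricks genie create-message <SPACE_ID> <CONVERSATION_ID> "And by region?"  # follow-up
```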

Also in this commit:

- Fix the Export/Import quoting inconsistency: genie_space.json on
  disk is now a parsed object (not a JSON-string blob). Export unwraps
  with `jq '.serialized_space | fromjson'`; import and update both
  stringify consistently with `jq -c '.' | jq -Rs '.'`.
- Add two troubleshooting rows: slow answers / query timeouts
  (warehouse sizing) and wrong/empty answers (example_question_sqls +
  text_instructions).
- Drop the redundant serialized_space "Structure" skeleton — its
  information is a strict subset of the Complete Example, now renamed
  "Example" with the top-level keys called out in the lead-in.

All three primitives live-verified against a real Genie Space on the
workspace (NordWind Fleet Analytics): start-conversation → poll
get-message → get-message-attachment-query-result (columns + rows) →
create-message for follow-up.

Co-authored-by: Isaac
SDP skill had three concrete bugs that bit an agent running a real
pipeline update end-to-end:

1. references/2-cli-approach.md claimed "file" libraries could point
   to a directory. They can't — the API errors with "Paths must end
   with .py or .sql". The correct shape for a folder is
   {"glob": {"include": "<dir>/**"}}. Fixed the example and added a
   troubleshooting row for the exact error string. Live-verified with
   a pipelines update round-trip.

2. No documented flow for editing an existing pipeline. Added a
   dense "Updating a Pipeline" section covering re-upload +
   start-update. Key gotcha: pipelines consume raw FILE entries, so
   re-imports need --format RAW --overwrite. --format SOURCE
   --language SQL|PYTHON creates a workspace NOTEBOOK (deprecated for
   pipelines) and fails on an existing FILE path with
   "type mismatch (asked: NOTEBOOK, actual: FILE)". Live-verified
   both failure and success modes. Added troubleshooting row.

3. Contradictory streaming read guidance — SKILL.md said
   FROM stream(table), 4-dlt-migration.md showed FROM STREAM table.
   Both parse, but the function form is the canonical one. Reworked
   the troubleshooting row to spell out when each form applies and
   flag FROM STREAM table as legacy DLT compatibility.

Bonus: pipelines list-pipeline-events returns a bare array, not
{"events": [...]} — skill previously showed the raw command with no
output shape hint. Replaced with a ready jq pattern that surfaces
just ERROR/WARN entries; agent had written two failing Python
one-liners trying to guess the shape.
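Ready-pattern sketch over the bare array (event field names assumed):

```sh
databricks pipelines list-pipeline-events <PIPELINE_ID> -o json \
  | jq '[.[] | select(.level == "ERROR" or .level == "WARN") | {timestamp, level, message}]'
```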

Also simplified databricks-unity-catalog SKILL.md to show the
positional form for schemas create and volumes create (what the help
text documents as canonical) instead of the --json form that was
redundant with the positional CLI.

Co-authored-by: Isaac
Script invocations in SKILL.md and references (python scripts/X)
previously assumed the reader was running from the skill's install
folder. Agents running from an arbitrary project cwd hit
"No such file or directory" errors — the agent-bricks, execution-compute,
and pdf-generation skills all trip the same way.

Switch to the <SKILL_ROOT> literal token for every script invocation
and add a one-line convention note at the top of each affected
SKILL.md and reference file:

  > <SKILL_ROOT> = the directory containing this SKILL.md; resolve to
  > the absolute install path (e.g. ~/.claude/skills/<skill-name>).

Rewrote:
- python scripts/compute.py ...         → python <SKILL_ROOT>/scripts/compute.py ...
- python scripts/pdf_generator.py ...   → python <SKILL_ROOT>/scripts/pdf_generator.py ...

Also fixed a stale markdown link in the SDP skill whose display text
said "examples/exploration_notebook.py" but whose path was "scripts/...".

databricks-agent-bricks script references come in a separate commit.

Co-authored-by: Isaac
The skill told readers to call create-knowledge-source with four
positional args (PARENT DISPLAY_NAME DESCRIPTION SOURCE_TYPE) alongside
--json. The CLI rejects that combination:

  Error: when --json flag is specified, provide only PARENT as
  positional arguments. Provide 'display_name', 'description',
  'source_type' in your JSON input.

Only two forms actually work (verified live on the workspace):
  1. PARENT + --json '{display_name, description, source_type, files|index|...}'
  2. positional-only (no --json) — but then there's nowhere to pass
     files.path / index.index_name, so this form only works for source
     types that need no extra body, which today is none.

Updated SKILL.md and 1-knowledge-assistants.md to show the single
working shape: PARENT positional + everything else in --json. Added
the display_name / description fields inside each example body.
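The working shape, sketched (command group assumed from this repo's skill naming; paths illustrative):

```sh
databricks knowledge-assistants create-knowledge-source <PARENT_ID> --json '{
  "display_name": "Product docs",
  "description": "PDF documentation volume",
  "source_type": "files",
  "files": {"path": "/Volumes/<catalog>/<schema>/<volume>/docs"}
}'
```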

Co-authored-by: Isaac
Three concrete bugs in scripts/mas_manager.py triggered by a real
agent session:

1. get_mas (L481) and update_mas (L531) read instructions from
   mas_data.get("instructions") — wrong nested level, always empty.
   The GET response nests it on tile: mas_data.tile.instructions.
   Consequence: update_mas(tile_id, name="...") without an explicit
   instructions= arg wiped the existing instructions on every call.
   Verified the correct path live: "instructions_len: 232" vs 0 before.

2. add_examples_queued spun up an in-process daemon thread that
   polled get_endpoint_status every 30s. When the CLI process exited,
   the thread died and examples were never added — silent data loss.
   Removed add_examples_queued, TileExampleQueue, get_tile_example_queue,
   the _tile_example_queue singleton, and the now-unused threading /
   Tuple imports.

3. Replaced the broken queue with a wait_for_online flag on
   add_examples (CLI: --wait). Blocks and polls every 30s for up to
   15 min (covers the ~10 min NOT_READY -> ONLINE wait after create_mas
   or a big update_mas, with headroom). No background queue — the
   caller process must stay alive for the wait.

Also live-verified that the MAS PATCH endpoint is NOT partial:
missing `name` returns 400 Missing required field, missing `agents`
returns 400 "At least one BaseAgent must be provided". update_mas
already handles this internally (fetches existing + merges), so the
full-replace reality stays an internal detail of the HTTP layer —
callers see a partial-update-shaped API.

Skill doc updates:
- SKILL.md: reorder list_mas to the top of the check/manage block
  with a one-liner describing the return shape.
- SKILL.md: flag the ~10min NOT_READY wait on add_examples with --wait.
- SKILL.md: fix status legend from "(2-5 min)" to "up to ~10 min".
- 2-supervisor-agents.md: replace the dual add_examples / add_examples_wait
  block with a single add_examples [--wait] example.
- SKILL.md also includes the KA create-knowledge-source fix from the
  previous commit's companion page (PARENT + everything-in-json).

Co-authored-by: Isaac
Lakebase Autoscaling is the canonical path for all new Lakebase work
(autoscaling, branching, scale-to-zero, point-in-time restore). The
Provisioned skill covers the predecessor fixed-capacity tier; keeping
both causes agents to spend time deciding between them or picking the
older one. Delete the Provisioned skill and point everything at
autoscale.

Files deleted:
- databricks-skills/databricks-lakebase-provisioned/SKILL.md
- databricks-skills/databricks-lakebase-provisioned/connection-patterns.md
- databricks-skills/databricks-lakebase-provisioned/reverse-etl.md

Cross-references updated:
- install_skills.sh: drop from DATABRICKS_SKILLS list, description map,
  and reference-files map.
- README.md: replace the Provisioned bullet with a Lakebase Autoscale
  bullet under the same Development & Deployment section.
- databricks-python-sdk/SKILL.md, databricks-app-python/SKILL.md:
  redirect the "Related Skills" link to databricks-lakebase-autoscale.
- databricks-lakebase-autoscale/SKILL.md: drop the now-meaningless
  "Provisioned vs Autoscaling" comparison table and the predecessor
  link. Keep the one prose mention in computes.md explaining CU RAM
  sizing context — that's justification, not a link.

Co-authored-by: Isaac
…tures

- Fix autoscaling spread constraint: 8 CU → 16 CU across SKILL.md and computes.md
- Fix scale-to-zero wake-up latency: few hundred ms → ~100ms
- Update token refresh guidance: 50 min → 30-40 min
- Move synced-table CLI from `databricks database` to `databricks postgres` group (v0.294.0+)
- Update SDK module from `databricks.sdk.service.database` to `databricks.sdk.service.postgres`
- Correct reverse-ETL throughput figures: snapshot 2k rows/s/CU, incremental 150 rows/s/CU
- Add High Availability section (secondaries vs read replicas, HA constraints)
- Add Data API section (PostgREST-compatible HTTP CRUD, Autoscaling-only)
- Add Lakehouse Sync Beta section (Postgres → UC Delta, AWS only)
- Add `databricks apps init --features lakebase` command and `list-endpoints` command

Co-authored-by: Isaac
Reorganizes branches.md, computes.md, connection-patterns.md, projects.md,
and reverse-etl.md into a references/ subfolder. Updates all links in
SKILL.md (references/foo.md) and back-links in each reference file
(../SKILL.md). Also corrects token refresh guidance to 45 min per official
Databricks docs (docs.databricks.com/aws/en/oltp/projects/external-apps-connect).

Co-authored-by: Isaac
Ports three hard-difficulty interactive test cases from ai-dev-kit-lakebase_updates:
- 007: Full project setup (create project, autoscaling, branch protection, dev branch, connectivity, database)
- 008: Schema DDL (4-table support schema with FKs, CHECK constraints, indexes)
- 009: Extended DDL (support_cases, case_products, case_notes with uv/pip install)

Fixes token refresh guidance in 007 response from ~50 min to ~45 min.

Co-authored-by: Isaac
…p-primary

Follow-up to the CLI-first rewrite in this PR. Three fixes that were blocking
accurate eval scoring:

1. ground_truth.yaml — replaced all SDK expected_facts/patterns with CLI
   equivalents for tasks 001, 002, 004, 005, 007 (management-plane tasks).
   Connection and DDL facts in tasks 003, 008, 009 made approach-agnostic.
   Proxy eval confirmed: 0.216 → 0.609 (+0.393). Tasks 002/003/005 all hit
   0.825 after the fix; task 006 (cli_reference) is residual work for a
   follow-on PR.

2. SKILL.md — added psycopg3 connection snippet inline to the Credentials
   section. Moving it to references/connection-patterns.md caused a regression
   where the surrogate LLM defaulted to import psycopg2. Reference files are
   not loaded at eval time (evaluator reads only SKILL.md); guidance must be
   inline to be effective.

3. references/computes.md line 7 + ground_truth.yaml tasks 007/008/009 —
   ep-primary → primary. The wrong endpoint name was the direct cause of the
   task 009 floor (score 0.000 during agent-eval; agent used the wrong path
   and the connection failed).

Also adds psycopg[binary] to .test/pyproject.toml so pre-validation passes
for tasks 003/007/008/009 (which import psycopg in their reference responses),
and fixes agent executor to skip empty env var values so Claude Code falls
back to keychain auth correctly.

Co-authored-by: Isaac
…ence

Task 006 (cli_reference) was scoring 0.000 on all three judge dimensions
(correctness, completeness, guideline adherence). Root cause: the reference
response used '--project-id my-app' which is invalid syntax — the project ID
is a positional argument to create-project, not a flag.

Changes:
- Fix create-project syntax: positional argument, not --project-id flag
- Add update-endpoint example with correct positional field-mask syntax
- Add generate-database-credential example (credentials are a core CLI op)
- Replace 'no_expiry: true' example with 'ttl: 604800s' for consistency
  with the rest of the ground truth
- Expand expected_facts to assert positional arg and field-mask patterns
- Add expected_patterns for endpoint and credential commands
- Update guideline: 5 subcommands (was 4), explicitly forbid --project-id flag

Co-authored-by: Isaac
@dustinvannoy-db
Collaborator

@cankoklu-db can you make these changes for lakebase directly to the PR you reference instead? I think that was your intention.

@cankoklu-db
Collaborator Author

Changes applied directly to PR #497 (experimental_lakebase_updates) per reviewer feedback.

cankoklu-db closed this on May 6, 2026
