Update lakebase-autoscale skill with revised CLI commands and new fea… by dustinvannoy-db · Pull Request #497 · databricks-solutions/ai-dev-kit

dustinvannoy-db · 2026-04-26T03:53:52Z

…tures

Fix autoscaling spread constraint: 8 CU → 16 CU across SKILL.md and computes.md
Fix scale-to-zero wake-up latency: few hundred ms → ~100ms
Update token refresh guidance: 50 min → 45 min (Claude suggested this because of this link, either seems ok)
Move synced-table CLI from databricks database to databricks postgres group (v0.294.0+)
Update SDK module from databricks.sdk.service.database to databricks.sdk.service.postgres
Correct reverse-ETL throughput figures: snapshot 2k rows/s/CU, incremental 150 rows/s/CU (based on docs)
Add High Availability section (secondaries vs read replicas, HA constraints)
Add Data API section (PostgREST-compatible HTTP CRUD, Autoscaling-only)
Add Lakehouse Sync Beta section (Postgres → UC Delta, AWS only)
Add databricks apps init --features lakebase command and list-endpoints command
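
The corrected spread constraint in the first bullet can be checked client-side before a resize request is sent. The following is a minimal sketch, assuming the rule is simply "max minus min may not exceed 16 CU" as the PR describes; the function name and error messages are hypothetical and not part of the skill or the CLI.

```python
MAX_SPREAD_CU = 16  # value from this PR; the skill previously said 8


def validate_autoscaling_range(min_cu: float, max_cu: float) -> None:
    """Raise ValueError if the (min, max) pair violates the spread constraint.

    Hypothetical helper: it only encodes the rule stated in the PR and does
    not know about any other limits the service may enforce.
    """
    if max_cu < min_cu:
        raise ValueError("max_cu must be >= min_cu")
    spread = max_cu - min_cu
    if spread > MAX_SPREAD_CU:
        raise ValueError(
            f"max - min must be <= {MAX_SPREAD_CU} CU (got {spread})"
        )


# A 1-17 CU range has a spread of exactly 16 and passes the check.
validate_autoscaling_range(1, 17)
```

A pre-flight check like this only saves a round trip; the API remains the source of truth for what it accepts.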

…tures

- Fix autoscaling spread constraint: 8 CU → 16 CU across SKILL.md and computes.md
- Fix scale-to-zero wake-up latency: few hundred ms → ~100ms
- Update token refresh guidance: 50 min → 30-40 min
- Move synced-table CLI from `databricks database` to `databricks postgres` group (v0.294.0+)
- Update SDK module from `databricks.sdk.service.database` to `databricks.sdk.service.postgres`
- Correct reverse-ETL throughput figures: snapshot 2k rows/s/CU, incremental 150 rows/s/CU
- Add High Availability section (secondaries vs read replicas, HA constraints)
- Add Data API section (PostgREST-compatible HTTP CRUD, Autoscaling-only)
- Add Lakehouse Sync Beta section (Postgres → UC Delta, AWS only)
- Add `databricks apps init --features lakebase` command and `list-endpoints` command

Co-authored-by: Isaac

Reorganizes branches.md, computes.md, connection-patterns.md, projects.md, and reverse-etl.md into a references/ subfolder. Updates all links in SKILL.md (references/foo.md) and back-links in each reference file (../SKILL.md). Also corrects token refresh guidance to 45 min per official Databricks docs (docs.databricks.com/aws/en/oltp/projects/external-apps-connect).

Co-authored-by: Isaac

Ports three hard-difficulty interactive test cases from ai-dev-kit-lakebase_updates:

- 007: Full project setup (create project, autoscaling, branch protection, dev branch, connectivity, database)
- 008: Schema DDL (4-table support schema with FKs, CHECK constraints, indexes)
- 009: Extended DDL (support_cases, case_products, case_notes with uv/pip install)

Fixes token refresh guidance in 007 response from ~50 min to ~45 min.

Co-authored-by: Isaac

dustinvannoy-db · 2026-04-29T16:33:27Z

@cankoklu-db can you please review this ground truth and try evaluating these changes?

cankoklu-db · 2026-05-04T16:24:07Z

Review: Approve ✅ — Follow-ups Applied

The CLI-first direction is correct and technically validated. Three follow-ups identified in review are now applied.

Technical Facts Confirmed

Fact	main (old)	PR #497 (new)	Verified
Default endpoint name	`ep-primary` (wrong)	`primary`	✅ Live `list-endpoints` after `create-project`
Autoscaling spread	≤ 8 CU (wrong)	≤ 16 CU	✅ API rejects: `"max - min must be <= 16 CU"`
Token TTL	implied ~1h	exactly 3600s	✅ Live JWT decode; `expire_time` field present
Refresh cadence	implicit	45 min explicit	✅ Correct conservative buffer
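
The last two rows of the table (a 3600-second token lifetime and a 45-minute refresh cadence) can be illustrated with a small cache that fetches a new credential once the cached one is 45 minutes old. This is a sketch under stated assumptions: `CachedToken` and its `fetch` callback are hypothetical names, and how the credential is actually obtained (CLI or SDK) is deliberately left to the caller.

```python
import time
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 3600          # token lifetime reported in the table above
REFRESH_AFTER_SECONDS = 45 * 60   # refresh cadence recommended by the PR


class CachedToken:
    """Hypothetical helper: caches a credential and refreshes it early.

    `fetch` is any zero-argument callable that returns a fresh token string.
    `clock` is injectable so the behaviour can be tested without sleeping.
    """

    def __init__(self, fetch: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        expired = now - self._fetched_at >= REFRESH_AFTER_SECONDS
        if self._token is None or expired:
            # Refresh 15 minutes before the 3600 s expiry.
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

Refreshing at 45 minutes leaves a 15-minute buffer before the token expires, which is the "conservative buffer" the review refers to.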

Why the Eval Scores Looked Worse (They Weren't)

Proxy eval: main 0.690 → PR #497 0.216. The entire drop had a single cause: ground_truth.yaml was written for the SDK approach and never updated, so judges were grading CLI answers against SDK expected facts.

Task 002 actual response WITH the PR #497 skill:

databricks postgres create-branch projects/my-app development \
    --json '{"spec": {"source_branch": "projects/my-app/branches/production", "ttl": "604800s"}}'

The command is correct, but the judge marked all expected facts as missing because it expected w.postgres.create_branch(), BranchSpec, and Duration(seconds=604800). The skill was working. The rubric was wrong.
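
The `604800s` value in the CLI payload and the `Duration(seconds=604800)` the judge expected describe the same seven-day TTL. The sketch below shows the arithmetic and rebuilds the command shown in the task 002 response; `branch_spec_json` is a hypothetical helper, and the command shape is copied from that response rather than from a verified CLI reference.

```python
import json
import shlex
from datetime import timedelta


def branch_spec_json(source_branch: str, ttl: timedelta) -> str:
    """Build the --json payload, encoding the TTL as '<seconds>s'."""
    seconds = int(ttl.total_seconds())
    return json.dumps(
        {"spec": {"source_branch": source_branch, "ttl": f"{seconds}s"}}
    )


# Seven days expressed as a timedelta becomes "604800s" in the payload.
payload = branch_spec_json(
    "projects/my-app/branches/production", timedelta(days=7)
)

# Argument list mirroring the task 002 response; shlex.join quotes the JSON.
command = [
    "databricks", "postgres", "create-branch",
    "projects/my-app", "development",
    "--json", payload,
]
print(shlex.join(command))
```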

Three Follow-ups Applied

1. ground_truth.yaml: CLI facts for tasks 001–009. SDK expected_facts/expected_patterns were replaced with CLI equivalents for management-plane tasks, and connection and DDL facts were made approach-agnostic. Confirmed that references/*.md are not loaded at eval time; the evaluator reads only SKILL.md.
2. SKILL.md: psycopg3 snippet added inline to the Credentials section. Moving it to references/connection-patterns.md had caused a regression where task 003 responses defaulted to import psycopg2. Fixed by adding the snippet directly inline.
3. ep-primary → primary in ground_truth.yaml (tasks 007, 008, 009) and in references/computes.md line 7. ep-primary was the direct cause of the task 009 floor (score 0.000: the agent used the wrong endpoint path and the connection failed).

Post-Fix Eval Results

Overall proxy score: 0.216 → 0.609 (+0.393)

Task	Pre-fix	Post-fix	Correctness	Completeness	Guideline adherence	Skill Δ
001 create_project	~0.19	0.608	no	yes	yes	+0.50
002 create_branch	~0.19	0.825	yes	yes	yes	+1.00
003 connect_notebook	~0.21	0.825	yes	yes	yes	+1.00
005 resize_compute	~0.19	0.825	yes	yes	yes	+1.00
006 cli_reference	~0.20	0.200	no	no	no	+0.00
007 full_project_setup	~0.16	0.442	no	yes	no	+0.50
008 schema_ddl	~0.22	0.505	yes	yes	yes	+0.00
009 support_cases_ddl	~0.21	0.640	yes	yes	no	+1.00

Skill effectiveness: 0.62 (all tasks NEEDS_SKILL). Tasks 002, 003, and 005 all hit 0.825. Task 006 (cli_reference) is a residual ground-truth issue for a follow-on pass. The task 008 delta of +0.00 is expected: pure DDL doesn't benefit from skill context.
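
The headline numbers can be cross-checked against the per-task rows. The sketch below assumes, without confirmation from the eval harness, that the overall proxy score is the unweighted mean of the eight listed post-fix scores and that skill effectiveness is the unweighted mean of the Δ column. Under those assumptions the table reproduces both reported figures.

```python
from statistics import mean

# Post-fix scores and Δ values copied from the table above (task 004 is not listed).
post_fix = {
    "001": 0.608, "002": 0.825, "003": 0.825, "005": 0.825,
    "006": 0.200, "007": 0.442, "008": 0.505, "009": 0.640,
}
delta = {
    "001": 0.50, "002": 1.00, "003": 1.00, "005": 1.00,
    "006": 0.00, "007": 0.50, "008": 0.00, "009": 1.00,
}

overall = mean(list(post_fix.values()))      # close to the reported 0.609
effectiveness = mean(list(delta.values()))   # 0.625, reported as 0.62

print(f"overall proxy score: {overall:.4f}")
print(f"skill effectiveness: {effectiveness:.3f}")
```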

LGTM.

…p-primary

Follow-up to the CLI-first rewrite in this PR. Three fixes that were blocking accurate eval scoring:

1. ground_truth.yaml: replaced all SDK expected_facts/patterns with CLI equivalents for tasks 001, 002, 004, 005, 007 (management-plane tasks). Connection and DDL facts in tasks 003, 008, 009 made approach-agnostic. Proxy eval confirmed: 0.216 → 0.609 (+0.393). Tasks 002/003/005 all hit 0.825 after the fix; task 006 (cli_reference) is residual work for a follow-on PR.
2. SKILL.md: added psycopg3 connection snippet inline to the Credentials section. Moving it to references/connection-patterns.md caused a regression where the surrogate LLM defaulted to import psycopg2. Reference files are not loaded at eval time (evaluator reads only SKILL.md); guidance must be inline to be effective.
3. references/computes.md line 7 + ground_truth.yaml tasks 007/008/009: ep-primary → primary. The wrong endpoint name was the direct cause of the task 009 floor (score 0.000 during agent-eval; agent used the wrong path and the connection failed).

Also adds psycopg[binary] to .test/pyproject.toml so pre-validation passes for tasks 003/007/008/009 (which import psycopg in their reference responses), and fixes the agent executor to skip empty env var values so Claude Code falls back to keychain auth correctly.

Co-authored-by: Isaac

…ence

Task 006 (cli_reference) was scoring 0.000 on all three judge dimensions (correctness, completeness, guideline adherence). Root cause: the reference response used '--project-id my-app', which is invalid syntax. The project ID is a positional argument to create-project, not a flag.

Changes:

- Fix create-project syntax: positional argument, not --project-id flag
- Add update-endpoint example with correct positional field-mask syntax
- Add generate-database-credential example (credentials are a core CLI op)
- Replace 'no_expiry: true' example with 'ttl: 604800s' for consistency with the rest of the ground truth
- Expand expected_facts to assert positional arg and field-mask patterns
- Add expected_patterns for endpoint and credential commands
- Update guideline: 5 subcommands (was 4), explicitly forbid --project-id flag

Co-authored-by: Isaac
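
The rule this commit encodes (project ID is positional, `--project-id` is forbidden) is the kind of thing a rubric can check mechanically. The following is a minimal sketch of such a check; `check_create_project` is a hypothetical function, not the evaluator's actual code, and the sample command strings are illustrative only.

```python
import shlex
from typing import List


def check_create_project(command: str) -> List[str]:
    """Return a list of problems found in a create-project invocation."""
    tokens = shlex.split(command)
    if "create-project" not in tokens:
        return ["not a create-project command"]

    problems: List[str] = []
    # The flag form is what the old reference response used; reject it.
    if any(t == "--project-id" or t.startswith("--project-id=") for t in tokens):
        problems.append("--project-id is not a valid flag")

    # The first token after the subcommand should be the positional project ID.
    following = tokens[tokens.index("create-project") + 1:]
    if not following or following[0].startswith("-"):
        problems.append("project ID should be the first positional argument")
    return problems


# Illustrative inputs: the positional form passes, the flag form does not.
print(check_create_project("databricks postgres create-project my-app"))
print(check_create_project("databricks postgres create-project --project-id my-app"))
```

Tokenizing with `shlex.split` rather than matching a raw regex keeps the check robust to quoting and extra whitespace in the response under test.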

dustinvannoy-db added 3 commits April 25, 2026 10:05

dustinvannoy-db marked this pull request as ready for review April 29, 2026 16:32

dustinvannoy-db requested a review from cankoklu-db April 29, 2026 16:32

dustinvannoy-db requested a review from QuentinAmbard May 2, 2026 05:09

cankoklu-db added 2 commits May 4, 2026 18:42

cankoklu-db mentioned this pull request May 4, 2026

Fix lakebase-autoscale task 006 ground truth: invalid --project-id flag #514

Closed

cankoklu-db merged commit eb48a4c into experimental May 6, 2026

cankoklu-db deleted the experimental_lakebase_updates branch May 6, 2026 07:41

cankoklu-db merged 5 commits into experimental from experimental_lakebase_updates


2 participants
