
Update lakebase-autoscale skill with revised CLI commands and new fea… #497

Merged
cankoklu-db merged 5 commits into experimental from experimental_lakebase_updates
May 6, 2026

@dustinvannoy-db
Collaborator

@dustinvannoy-db dustinvannoy-db commented Apr 26, 2026

…tures

  • Fix autoscaling spread constraint: 8 CU → 16 CU across SKILL.md and computes.md
  • Fix scale-to-zero wake-up latency: few hundred ms → ~100ms
  • Update token refresh guidance: 50 min → 45 min (Claude suggested 45 min based on the linked Databricks docs; either value seems OK)
  • Move synced-table CLI from databricks database to databricks postgres group (v0.294.0+)
  • Update SDK module from databricks.sdk.service.database to databricks.sdk.service.postgres
  • Correct reverse-ETL throughput figures: snapshot 2k rows/s/CU, incremental 150 rows/s/CU (based on docs)
  • Add High Availability section (secondaries vs read replicas, HA constraints)
  • Add Data API section (PostgREST-compatible HTTP CRUD, Autoscaling-only)
  • Add Lakehouse Sync Beta section (Postgres → UC Delta, AWS only)
  • Add databricks apps init --features lakebase command and list-endpoints command

…tures

- Fix autoscaling spread constraint: 8 CU → 16 CU across SKILL.md and computes.md
- Fix scale-to-zero wake-up latency: few hundred ms → ~100ms
- Update token refresh guidance: 50 min → 30-40 min
- Move synced-table CLI from `databricks database` to `databricks postgres` group (v0.294.0+)
- Update SDK module from `databricks.sdk.service.database` to `databricks.sdk.service.postgres`
- Correct reverse-ETL throughput figures: snapshot 2k rows/s/CU, incremental 150 rows/s/CU
- Add High Availability section (secondaries vs read replicas, HA constraints)
- Add Data API section (PostgREST-compatible HTTP CRUD, Autoscaling-only)
- Add Lakehouse Sync Beta section (Postgres → UC Delta, AWS only)
- Add `databricks apps init --features lakebase` command and `list-endpoints` command

Co-authored-by: Isaac
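To make the corrected reverse-ETL figures in the commit above concrete, here is a minimal sketch that converts the quoted per-CU rates into a rough sync duration. It assumes throughput scales linearly with CU count, which real syncs may not do; `estimate_sync_seconds` is a hypothetical helper, not part of the skill or the Databricks SDK.

```python
# Per-CU reverse-ETL throughput figures quoted in this PR.
SNAPSHOT_ROWS_PER_SEC_PER_CU = 2_000
INCREMENTAL_ROWS_PER_SEC_PER_CU = 150

_RATES = {
    "snapshot": SNAPSHOT_ROWS_PER_SEC_PER_CU,
    "incremental": INCREMENTAL_ROWS_PER_SEC_PER_CU,
}


def estimate_sync_seconds(rows: int, cu: float, mode: str = "snapshot") -> float:
    """Rough sync duration in seconds (hypothetical helper).

    Assumes throughput scales linearly with CU, which is a simplification.
    """
    if cu <= 0:
        raise ValueError("cu must be positive")
    if mode not in _RATES:
        raise ValueError(f"unknown mode: {mode!r}")
    return rows / (_RATES[mode] * cu)


if __name__ == "__main__":
    # 10M rows on 4 CU: snapshot 1,250 s, incremental ~16,667 s.
    print(estimate_sync_seconds(10_000_000, 4, "snapshot"))
    print(estimate_sync_seconds(10_000_000, 4, "incremental"))
```

The roughly 13x gap between the two modes is the practical takeaway: incremental syncs of large tables take far longer per row than the initial snapshot.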
Reorganizes branches.md, computes.md, connection-patterns.md, projects.md,
and reverse-etl.md into a references/ subfolder. Updates all links in
SKILL.md (references/foo.md) and back-links in each reference file
(../SKILL.md). Also corrects token refresh guidance to 45 min per official
Databricks docs (docs.databricks.com/aws/en/oltp/projects/external-apps-connect).

Co-authored-by: Isaac
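The 45-minute refresh cadence pairs with the 3600 s token TTL verified later in the review, leaving a 15-minute buffer. A minimal sketch of that policy, assuming a hypothetical `fetch` callable that returns a fresh database credential (for example a wrapper around `generate-database-credential`; that wrapper and its shape are assumptions and are not shown):

```python
import time
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 3600         # verified token lifetime (1 hour)
REFRESH_AFTER_SECONDS = 45 * 60  # refresh at 45 min, leaving a 15 min buffer


class CachedToken:
    """Cache a credential and re-fetch it once it is 45 minutes old.

    `fetch` is a hypothetical callable returning a fresh token string.
    `clock` is injectable so the refresh logic can be tested.
    """

    def __init__(self, fetch: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        self._token: Optional[str] = None
        self._issued_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch on first use, or when the cached token reaches the refresh age.
        if self._token is None or now - self._issued_at >= REFRESH_AFTER_SECONDS:
            self._token = self._fetch()
            self._issued_at = now
        return self._token
```

Calling `get()` before each new connection keeps the password passed to Postgres well inside its lifetime without fetching a token on every call.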
Ports three hard-difficulty interactive test cases from ai-dev-kit-lakebase_updates:
- 007: Full project setup (create project, autoscaling, branch protection, dev branch, connectivity, database)
- 008: Schema DDL (4-table support schema with FKs, CHECK constraints, indexes)
- 009: Extended DDL (support_cases, case_products, case_notes with uv/pip install)

Fixes token refresh guidance in 007 response from ~50 min to ~45 min.

Co-authored-by: Isaac
@dustinvannoy-db dustinvannoy-db marked this pull request as ready for review April 29, 2026 16:32
@dustinvannoy-db
Collaborator Author

@cankoklu-db can you please review this ground truth and try evaluating these changes?

@cankoklu-db
Collaborator

Review: Approve ✅ — Follow-ups Applied

The CLI-first direction is correct and technically validated. Three follow-ups identified in review are now applied.


Technical Facts Confirmed

| Fact | main (old) | PR #497 (new) | Verified |
|---|---|---|---|
| Default endpoint name | `ep-primary` (wrong) | `primary` | ✅ Live `list-endpoints` after `create-project` |
| Autoscaling spread | ≤ 8 CU (wrong) | ≤ 16 CU | ✅ API rejects: "max - min must be <= 16 CU" |
| Token TTL | implied ~1h | exactly 3600s | ✅ Live JWT decode; `expire_time` field present |
| Refresh cadence | implicit | 45 min explicit | ✅ Correct conservative buffer |
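The verified spread rule can be pre-checked on the client before calling the API. This is a sketch that assumes only the `max - min <= 16 CU` rule matters here; `check_autoscaling_range` is hypothetical, it does not check which CU sizes are allowed, and the service remains the source of truth.

```python
MAX_AUTOSCALING_SPREAD_CU = 16  # API error text: "max - min must be <= 16 CU"


def check_autoscaling_range(min_cu: float, max_cu: float) -> None:
    """Raise ValueError if a min/max CU pair would violate the spread rule.

    Hypothetical client-side pre-check; only the spread is validated.
    """
    if min_cu > max_cu:
        raise ValueError(f"min ({min_cu} CU) exceeds max ({max_cu} CU)")
    spread = max_cu - min_cu
    if spread > MAX_AUTOSCALING_SPREAD_CU:
        raise ValueError(
            f"max - min must be <= {MAX_AUTOSCALING_SPREAD_CU} CU (got {spread} CU)"
        )
```

For example, `check_autoscaling_range(4, 16)` passes (spread 12 CU) even though the old 8 CU guidance would have rejected it, while `check_autoscaling_range(2, 20)` raises.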

Why the Eval Scores Looked Worse (They Weren't)

Proxy eval: main 0.690 → PR #497 0.216. The entire drop was one thing: ground_truth.yaml was written for the SDK approach and never updated. Judges were grading CLI answers against SDK expected facts.

Task 002 actual response WITH the PR #497 skill:

```bash
databricks postgres create-branch projects/my-app development \
    --json '{"spec": {"source_branch": "projects/my-app/branches/production", "ttl": "604800s"}}'
```

This is correct, but the judge marked every expected fact as missing because it expected `w.postgres.create_branch()`, `BranchSpec`, and `Duration(seconds=604800)`. The skill was working; the rubric was wrong.
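The same command can be assembled programmatically, which also shows how the SDK-shaped expectation `Duration(seconds=604800)` corresponds to the CLI's `"ttl": "604800s"` string. Only the positional arguments and `--json` payload from the transcript are used; `create_branch_argv` is a hypothetical helper, and the code only builds the argument list without running the CLI.

```python
import json
import shlex
from datetime import timedelta


def create_branch_argv(project: str, branch_id: str, source_branch: str,
                       ttl: timedelta) -> list[str]:
    """Build argv for `databricks postgres create-branch` (hypothetical helper)."""
    spec = {
        "spec": {
            "source_branch": f"{project}/branches/{source_branch}",
            # Duration encoded as a seconds string, e.g. 7 days -> "604800s".
            "ttl": f"{int(ttl.total_seconds())}s",
        }
    }
    return ["databricks", "postgres", "create-branch",
            project, branch_id, "--json", json.dumps(spec)]


if __name__ == "__main__":
    argv = create_branch_argv("projects/my-app", "development", "production",
                              timedelta(days=7))
    print(shlex.join(argv))  # shell-quoted form of the task 002 command
```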


Three Follow-ups Applied

  1. ground_truth.yaml — CLI facts for tasks 001–009. SDK expected_facts/expected_patterns replaced with CLI equivalents for management-plane tasks. Connection and DDL facts made approach-agnostic. Confirmed references/*.md are not loaded at eval time — evaluator reads only SKILL.md.

  2. SKILL.md — psycopg3 snippet added inline to the Credentials section. Moving it to references/connection-patterns.md had caused a regression where task 003 responses defaulted to `import psycopg2`; restoring the snippet inline fixed that.

  3. ep-primary → primary in ground_truth.yaml (tasks 007, 008, 009) + references/computes.md line 7. ep-primary was the direct cause of the task 009 floor (score 0.000 — agent used wrong endpoint path, connection failed).
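For reference, a psycopg3 connection in the spirit of the inline Credentials snippet might look like the sketch below. The actual SKILL.md snippet is not reproduced in this thread, so the default database name `databricks_postgres`, port 5432, and `sslmode="require"` are assumptions; the OAuth token is passed as the Postgres password, and both helper names are hypothetical.

```python
def lakebase_conn_kwargs(host: str, user: str, token: str,
                         dbname: str = "databricks_postgres") -> dict:
    """Keyword arguments for psycopg.connect() (hypothetical helper).

    The default dbname, port, and sslmode are assumptions for illustration.
    """
    return {
        "host": host,
        "port": 5432,
        "dbname": dbname,
        "user": user,
        "password": token,  # short-lived OAuth token used as the password
        "sslmode": "require",
    }


def connect(host: str, user: str, token: str):
    # psycopg 3 is imported as `psycopg`; `psycopg2` is the older driver
    # that responses regressed to when the snippet lived only in references/.
    import psycopg

    return psycopg.connect(**lakebase_conn_kwargs(host, user, token))


# Usage (placeholders, not real values):
# conn = connect("<endpoint-host>", "<user>", "<token>")
```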


Post-Fix Eval Results

Overall proxy score: 0.216 → 0.609 (+0.393)

| Task | Pre-fix | Post-fix | Correctness | Completeness | Guidelines | Δ |
|---|---|---|---|---|---|---|
| 001 create_project | ~0.19 | 0.608 | no | yes | yes | +0.50 |
| 002 create_branch | ~0.19 | 0.825 | yes | yes | yes | +1.00 |
| 003 connect_notebook | ~0.21 | 0.825 | yes | yes | yes | +1.00 |
| 005 resize_compute | ~0.19 | 0.825 | yes | yes | yes | +1.00 |
| 006 cli_reference | ~0.20 | 0.200 | no | no | no | +0.00 |
| 007 full_project_setup | ~0.16 | 0.442 | no | yes | no | +0.50 |
| 008 schema_ddl | ~0.22 | 0.505 | yes | yes | yes | +0.00 |
| 009 support_cases_ddl | ~0.21 | 0.640 | yes | yes | no | +1.00 |
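The overall figure is consistent with an unweighted mean of the eight post-fix task scores (4.870 / 8 ≈ 0.609). The check below assumes a simple average, which the comment does not state explicitly.

```python
# Post-fix per-task proxy scores from the results table.
post_fix = {
    "001": 0.608, "002": 0.825, "003": 0.825, "005": 0.825,
    "006": 0.200, "007": 0.442, "008": 0.505, "009": 0.640,
}

# Unweighted mean across tasks (assumed aggregation).
overall = sum(post_fix.values()) / len(post_fix)
print(overall)  # close to the reported 0.609
```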

Skill effectiveness: 0.62 (all tasks NEEDS_SKILL). Tasks 002, 003, 005 all hit 0.825. Task 006 (cli_reference) is a residual ground truth issue for a follow-on pass. Task 008 delta=+0.00 is expected — pure DDL doesn't benefit from skill context.

LGTM.

…p-primary

Follow-up to the CLI-first rewrite in this PR. Three fixes that were blocking
accurate eval scoring:

1. ground_truth.yaml — replaced all SDK expected_facts/patterns with CLI
   equivalents for tasks 001, 002, 004, 005, 007 (management-plane tasks).
   Connection and DDL facts in tasks 003, 008, 009 made approach-agnostic.
   Proxy eval confirmed: 0.216 → 0.609 (+0.393). Tasks 002/003/005 all hit
   0.825 after the fix; task 006 (cli_reference) is residual work for a
   follow-on PR.

2. SKILL.md — added psycopg3 connection snippet inline to the Credentials
   section. Moving it to references/connection-patterns.md caused a regression
   where the surrogate LLM defaulted to import psycopg2. Reference files are
   not loaded at eval time (evaluator reads only SKILL.md); guidance must be
   inline to be effective.

3. references/computes.md line 7 + ground_truth.yaml tasks 007/008/009 —
   ep-primary → primary. The wrong endpoint name was the direct cause of the
   task 009 floor (score 0.000 during agent-eval; agent used the wrong path
   and the connection failed).

Also adds psycopg[binary] to .test/pyproject.toml so pre-validation passes
for tasks 003/007/008/009 (which import psycopg in their reference responses),
and fixes agent executor to skip empty env var values so Claude Code falls
back to keychain auth correctly.

Co-authored-by: Isaac
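The executor fix described above can be pictured as building the child environment while dropping empty values, so a variable such as `ANTHROPIC_API_KEY=""` is left unset and Claude Code can fall back to keychain auth. This is a sketch of the idea only; `build_child_env` and the example variable name are hypothetical, not the executor's actual code.

```python
import os
from typing import Mapping, Optional


def build_child_env(overrides: Mapping[str, Optional[str]],
                    base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Merge overrides into a copy of the environment, skipping empty values.

    None and "" are not exported, so the child process sees the variable as
    unset rather than set to an empty string (hypothetical helper).
    """
    env = dict(os.environ if base is None else base)
    for key, value in overrides.items():
        if value:  # skip None and ""
            env[key] = value
    return env
```

The result would be passed as `env=` to `subprocess.run` when launching the agent.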
…ence

Task 006 (cli_reference) was scoring 0.000 on all three judge dimensions
(correctness, completeness, guideline adherence). Root cause: the reference
response used '--project-id my-app', which is invalid syntax: the project ID
is a positional argument to create-project, not a flag.

Changes:
- Fix create-project syntax: positional argument, not --project-id flag
- Add update-endpoint example with correct positional field-mask syntax
- Add generate-database-credential example (credentials are a core CLI op)
- Replace 'no_expiry: true' example with 'ttl: 604800s' for consistency
  with the rest of the ground truth
- Expand expected_facts to assert positional arg and field-mask patterns
- Add expected_patterns for endpoint and credential commands
- Update guideline: 5 subcommands (was 4), explicitly forbid --project-id flag

Co-authored-by: Isaac
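Checks like the new `expected_patterns` and the `--project-id` prohibition can be approximated with regular expressions. A minimal sketch, assuming patterns are plain regexes (the real ground_truth.yaml format and the LLM judges are not reproduced here); `check_cli_response` and the sample strings are hypothetical.

```python
import re
from typing import Iterable

# The project ID is positional for create-project, so the flag form is forbidden.
FORBIDDEN_PATTERNS = (r"--project-id\b",)


def check_cli_response(text: str, expected_patterns: Iterable[str],
                       forbidden_patterns: Iterable[str] = FORBIDDEN_PATTERNS
                       ) -> dict[str, list[str]]:
    """Return expected patterns that are missing and forbidden ones that match."""
    missing = [p for p in expected_patterns if not re.search(p, text)]
    violations = [p for p in forbidden_patterns if re.search(p, text)]
    return {"missing": missing, "violations": violations}


if __name__ == "__main__":
    patterns = [r"create-project\s+my-app"]
    print(check_cli_response("databricks postgres create-project my-app", patterns))
    print(check_cli_response(
        "databricks postgres create-project --project-id my-app", patterns))
```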
@cankoklu-db cankoklu-db merged commit eb48a4c into experimental May 6, 2026
@cankoklu-db cankoklu-db deleted the experimental_lakebase_updates branch May 6, 2026 07:41