Update lakebase-autoscale skill with revised CLI commands and new fea…#497
cankoklu-db merged 5 commits into experimental
…tures

- Fix autoscaling spread constraint: 8 CU → 16 CU across SKILL.md and computes.md
- Fix scale-to-zero wake-up latency: few hundred ms → ~100ms
- Update token refresh guidance: 50 min → 30-40 min
- Move synced-table CLI from `databricks database` to `databricks postgres` group (v0.294.0+)
- Update SDK module from `databricks.sdk.service.database` to `databricks.sdk.service.postgres`
- Correct reverse-ETL throughput figures: snapshot 2k rows/s/CU, incremental 150 rows/s/CU
- Add High Availability section (secondaries vs read replicas, HA constraints)
- Add Data API section (PostgREST-compatible HTTP CRUD, Autoscaling-only)
- Add Lakehouse Sync Beta section (Postgres → UC Delta, AWS only)
- Add `databricks apps init --features lakebase` command and `list-endpoints` command

Co-authored-by: Isaac
Reorganizes branches.md, computes.md, connection-patterns.md, projects.md, and reverse-etl.md into a references/ subfolder. Updates all links in SKILL.md (references/foo.md) and back-links in each reference file (../SKILL.md). Also corrects token refresh guidance to 45 min per official Databricks docs (docs.databricks.com/aws/en/oltp/projects/external-apps-connect). Co-authored-by: Isaac
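The 45-minute refresh guidance can be pictured as a small cache that re-fetches the credential once it is older than the refresh interval. This is a minimal sketch under stated assumptions, not code from the skill: `TokenCache` is an illustrative name, and `fetch_token` is a hypothetical callable standing in for whatever actually issues the database credential.

```python
import time
from typing import Callable, Optional

# Refresh interval taken from the guidance in this commit: 45 minutes.
REFRESH_INTERVAL_S = 45 * 60


class TokenCache:
    """Caches a credential and re-fetches it once it is older than the interval."""

    def __init__(
        self,
        fetch_token: Callable[[], str],  # hypothetical: issues a fresh credential
        clock: Callable[[], float] = time.monotonic,
        refresh_interval_s: float = REFRESH_INTERVAL_S,
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._refresh_interval_s = refresh_interval_s
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing it first if it is missing or stale."""
        now = self._clock()
        stale = now - self._fetched_at >= self._refresh_interval_s
        if self._token is None or stale:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token
```

Injecting the clock keeps the refresh logic testable without waiting; a caller would typically call `get()` right before opening each new database connection.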
Ports three hard-difficulty interactive test cases from ai-dev-kit-lakebase_updates:

- 007: Full project setup (create project, autoscaling, branch protection, dev branch, connectivity, database)
- 008: Schema DDL (4-table support schema with FKs, CHECK constraints, indexes)
- 009: Extended DDL (support_cases, case_products, case_notes with uv/pip install)

Fixes token refresh guidance in 007 response from ~50 min to ~45 min.

Co-authored-by: Isaac
@cankoklu-db can you please review this ground truth and try evaluating these changes?
Review: Approve ✅ (Follow-ups Applied)

The CLI-first direction is correct and technically validated. Three follow-ups identified in review are now applied.

Technical Facts Confirmed
Why the Eval Scores Looked Worse (They Weren't)

Proxy eval: main 0.690 → PR #497 0.216. The entire drop was one thing:

Task 002 actual response WITH the PR #497 skill:

    databricks postgres create-branch projects/my-app development \
      --json '{"spec": {"source_branch": "projects/my-app/branches/production", "ttl": "604800s"}}'

Correct, but the judge marked all expected facts missing because it expected

Three Follow-ups Applied
Post-Fix Eval Results

Overall proxy score: 0.216 → 0.609 (+0.393)

Skill effectiveness: 0.62 (all tasks NEEDS_SKILL). Tasks 002, 003, 005 all hit 0.825. Task 006 (cli_reference) is a residual ground truth issue for a follow-on pass. Task 008 delta=+0.00 is expected: pure DDL doesn't benefit from skill context.

LGTM.
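The `ttl` of `604800s` in the create-branch command quoted in the review is seven days expressed in seconds. As a rough sketch, the `--json` payload could be assembled like this; `branch_spec` is a hypothetical helper, and only the field names and values shown in the review are taken from the source.

```python
import json
from datetime import timedelta


def branch_spec(project: str, source_branch: str, ttl: timedelta) -> str:
    """Build a JSON payload shaped like the one in the review's create-branch call."""
    payload = {
        "spec": {
            # Fully qualified source branch path, as written in the review.
            "source_branch": f"projects/{project}/branches/{source_branch}",
            # TTL is a whole number of seconds followed by "s".
            "ttl": f"{int(ttl.total_seconds())}s",
        }
    }
    return json.dumps(payload)


spec_json = branch_spec("my-app", "production", timedelta(days=7))
print(spec_json)
```

Deriving the TTL from a `timedelta` avoids hand-computing the seconds value when a different lifetime is needed.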
…p-primary

Follow-up to the CLI-first rewrite in this PR. Three fixes that were blocking accurate eval scoring:

1. ground_truth.yaml: replaced all SDK expected_facts/patterns with CLI equivalents for tasks 001, 002, 004, 005, 007 (management-plane tasks). Connection and DDL facts in tasks 003, 008, 009 made approach-agnostic. Proxy eval confirmed: 0.216 → 0.609 (+0.393). Tasks 002/003/005 all hit 0.825 after the fix; task 006 (cli_reference) is residual work for a follow-on PR.
2. SKILL.md: added psycopg3 connection snippet inline to the Credentials section. Moving it to references/connection-patterns.md caused a regression where the surrogate LLM defaulted to `import psycopg2`. Reference files are not loaded at eval time (evaluator reads only SKILL.md); guidance must be inline to be effective.
3. references/computes.md line 7 + ground_truth.yaml tasks 007/008/009: ep-primary → primary. The wrong endpoint name was the direct cause of the task 009 floor (score 0.000 during agent-eval; agent used the wrong path and the connection failed).

Also adds psycopg[binary] to .test/pyproject.toml so pre-validation passes for tasks 003/007/008/009 (which import psycopg in their reference responses), and fixes agent executor to skip empty env var values so Claude Code falls back to keychain auth correctly.

Co-authored-by: Isaac
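The executor fix amounts to dropping environment variables whose value is the empty string before launching the agent, so that an unset credential is not passed along as a blank one. The following is a guess at the shape of that filter, not the executor's actual code; `non_empty_env` and the variable names in the usage example are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def non_empty_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment without variables set to an empty string."""
    source = os.environ if env is None else env
    # An empty value would otherwise be forwarded as a (blank) credential,
    # which can prevent the fallback to another auth mechanism.
    return {name: value for name, value in source.items() if value}


# Hypothetical variable names, used only for illustration.
filtered = non_empty_env({"EXAMPLE_API_KEY": "", "EXAMPLE_PROFILE": "dev"})
print(filtered)
```

The filtered mapping could then be passed as the `env` argument of `subprocess.run` when starting the agent process.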
…ence

Task 006 (cli_reference) was scoring 0.000 on all three judge dimensions (correctness, completeness, guideline adherence). Root cause: the reference response used '--project-id my-app', which is invalid syntax. The project ID is a positional argument to create-project, not a flag.

Changes:

- Fix create-project syntax: positional argument, not --project-id flag
- Add update-endpoint example with correct positional field-mask syntax
- Add generate-database-credential example (credentials are a core CLI op)
- Replace 'no_expiry: true' example with 'ttl: 604800s' for consistency with the rest of the ground truth
- Expand expected_facts to assert positional arg and field-mask patterns
- Add expected_patterns for endpoint and credential commands
- Update guideline: 5 subcommands (was 4), explicitly forbid --project-id flag

Co-authored-by: Isaac
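The guideline forbidding the `--project-id` flag could be checked mechanically against a candidate response. Here is one possible sketch, not the evaluator's actual pattern matching; `uses_project_id_flag` is a hypothetical helper, and the sample strings are illustrative inputs rather than complete, verified CLI invocations.

```python
import shlex


def uses_project_id_flag(command: str) -> bool:
    """Return True if the command passes the project ID via a --project-id flag."""
    tokens = shlex.split(command)
    return any(
        token == "--project-id" or token.startswith("--project-id=")
        for token in tokens
    )


# Illustrative inputs only: the first mirrors the invalid form named in the
# commit message, the second the positional form it asks for.
invalid_form = "databricks postgres create-project --project-id my-app"
positional_form = "databricks postgres create-project my-app"

print(uses_project_id_flag(invalid_form))
print(uses_project_id_flag(positional_form))
```

Tokenizing with `shlex` rather than searching the raw string avoids false positives when the flag name appears inside a quoted JSON argument.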