feat: add support for postgres schema selection #475
Add support for selecting a PostgreSQL schema instead of always using 'public'. The schema is extracted from the connection URL's options parameter (search_path), following PostgreSQL's native libpq format.

Changes:
- Add _parse_schema_from_url() to extract schema from connection URL
- Thread schema parameter through all extraction methods with 'public' default
- Add pg_namespace JOINs for correct cross-schema disambiguation
- Add schema input field in DatabaseModal (PostgreSQL only)
- Add comprehensive unit tests for URL schema parsing
- Update documentation with custom schema configuration guide

Based on PR #373 by sirudog with the following fixes:
- Fix pg_namespace JOIN order in extract_columns_info to prevent duplicate rows when same-named tables exist across schemas
- Fix regex to require '=' separator (prevents mis-capture edge cases)
- Improve $user handling to loop through all schemas instead of only checking first two positions
- Fix pylint line-too-long in test file

Co-authored-by: sirudog <1550561+sirudog@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
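To make the parsing step concrete, here is a minimal sketch of how a schema could be pulled out of the `options` query parameter. It is not the code from `postgres_loader.py`: the function body, the regex, and the `default` parameter are assumptions for illustration, and this version does not handle spaces after commas or quoted names containing commas.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed pattern: "-c search_path=<value>" with "=" required as the separator.
_SEARCH_PATH_RE = re.compile(r"-c\s*search_path\s*=\s*(\S+)")


def parse_schema_from_url(url: str, default: str = "public") -> str:
    """Return the first usable schema named in the URL's search_path."""
    # parse_qs percent-decodes values, so "%3D" and "=" both work.
    query = parse_qs(urlparse(url).query)
    options = " ".join(query.get("options", []))
    match = _SEARCH_PATH_RE.search(options)
    if match is None:
        return default
    for token in match.group(1).split(","):
        schema = token.strip().strip('"')
        # Skip empty tokens and "$user" at any position, not just the first two.
        if schema and schema != "$user":
            return schema
    return default
```

For example, `parse_schema_from_url("postgresql://u:p@host/db?options=-csearch_path%3D%24user%2Csales")` skips the `$user` entry and returns `sales`, while a URL without `options` falls back to `public`.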
🚅 Deployed to the QueryWeaver-pr-475 environment in queryweaver
Thanks for the schema-selection enhancement—there are 6 MAJOR findings in total (no BLOCKER/CRITICAL/MINOR/SUGGESTION/PRAISE items).
Summary of findings
- Importance counts: 6 MAJOR
- Affected files: 4 (`api/loaders/postgres_loader.py`, `app/src/components/modals/DatabaseModal.tsx`, `tests/test_postgres_loader.py`, `docs/postgres_loader.md`)

Key themes observed
- Schema parsing robustness gaps in `search_path` handling (quoted commas and edge-token handling) that can select the wrong schema.
- Schema-scoping correctness issues in metadata joins where non-unique constraint names across schemas can lead to incorrect PK/FK attribution.
- Coverage and usability gaps: missing regression tests for documented edge cases and a docs snippet using an incorrect class name.

Actionable next steps
- Harden `search_path` parsing/tokenization (or delegate to server-side schema resolution) and safely encode/validate schema input from UI.
- Fix constraint joins to include schema-qualified keys (`constraint_schema`/`table_schema`) to prevent cross-schema collisions.
- Add regression tests for repeated `$user` and empty-token-only `search_path` values, then correct the documentation import/example to match implementation.
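The constraint-join recommendation can be sketched as follows. The query text and the `primary_key_columns` helper are hypothetical, not taken from the PR; the `%s` placeholders assume a psycopg2-style DB-API cursor. The relevant part is the extra `constraint_schema` condition, which stops a constraint name that exists in two schemas from matching rows in the wrong one.

```python
# Hypothetical query; the real one in postgres_loader.py may select more columns.
PRIMARY_KEY_COLUMNS_SQL = """
SELECT kcu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.constraint_schema = tc.constraint_schema
 AND kcu.table_name = tc.table_name
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_schema = %s
  AND tc.table_name = %s
ORDER BY kcu.ordinal_position
"""


def primary_key_columns(cursor, schema: str, table: str) -> list:
    """Return primary-key column names for one table in one schema."""
    # Schema and table are bound parameters, never interpolated into the SQL.
    cursor.execute(PRIMARY_KEY_COLUMNS_SQL, (schema, table))
    return [row[0] for row in cursor.fetchall()]
```

Without the `constraint_schema` condition, two tables named `orders` in different schemas that both use a constraint called `orders_pkey` would each pick up the other's key columns.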
Rename _parse_schema_from_url to parse_schema_from_url since the method is already documented for external use and tested directly. This eliminates W0212 (protected-access) warnings that cause CI pylint to fail with exit code 4. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add constraint_schema qualifier to key_column_usage JOINs in extract_columns_info to prevent cross-schema constraint name collisions
- Sanitize schema input in DatabaseModal to strip non-identifier characters before building the URL options
- Add edge case tests: empty tokens, blank quoted tokens, repeated $user entries

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds support for targeting a specific PostgreSQL schema (instead of always public) by parsing search_path from the connection URL’s options parameter, threading the schema through metadata queries, and exposing an optional schema field in the UI for PostgreSQL manual connections.
Changes:
- Backend: added `PostgresLoader.parse_schema_from_url()` and parameterized `information_schema` queries by `schema` (default `public`).
- UI: added an optional “Schema” input (PostgreSQL-only) that encodes `options=-csearch_path=<schema>` into the built connection URL.
- Tests/Docs: added unit tests for URL schema parsing and documentation/examples for custom schema configuration.
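As a rough illustration of the URL shape the UI change produces, the sketch below builds a connection URL with the `options` parameter percent-encoded. The actual code lives in `DatabaseModal.tsx` and is TypeScript; `build_postgres_url` and its parameters are hypothetical names used only to show the encoding.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host, port, database, schema=None):
    """Build a PostgreSQL URL, adding search_path only when a schema is given."""
    base = (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
    if not schema:
        # No schema given: the server-side default (normally "public") applies.
        return base
    # Encode the whole option so "=" becomes "%3D" inside the query value.
    options = quote(f"-csearch_path={schema}", safe="")
    return f"{base}?options={options}"
```

Calling `build_postgres_url("u", "p", "localhost", 5432, "db", "my_schema")` gives `postgresql://u:p@localhost:5432/db?options=-csearch_path%3Dmy_schema`.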
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| api/loaders/postgres_loader.py | Parses schema from connection URL and uses it to scope table/column/FK extraction queries. |
| app/src/components/modals/DatabaseModal.tsx | Adds PostgreSQL-only schema input and injects search_path into connection URL options for manual mode. |
| tests/test_postgres_loader.py | Adds unit tests covering search_path parsing edge cases. |
| docs/postgres_loader.md | Documents custom schema configuration via options=-csearch_path=... and troubleshooting guidance. |
| README.md | Fixes markdown code block formatting around the streaming Python example. |
| .github/wordlist.txt | Adds PostgresLoader to the spellcheck allowlist. |
- Fix regex to capture search_path values with spaces after commas (e.g. $user, public) by matching up to next -c option or EOL
- Set session search_path explicitly after connecting so sample queries resolve to the correct schema
- Use versionless PostgreSQL docs link (/docs/current/)
- Clarify case-sensitivity note for schema names in troubleshooting

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
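One way to set the session `search_path` after connecting is PostgreSQL's `set_config()` function, which accepts the value as a bound parameter. This is a sketch under assumptions: the PR does not say which statement it uses, `apply_search_path` and `quote_identifier` are hypothetical helpers, and the `%s` placeholder assumes a psycopg2-style cursor.

```python
def quote_identifier(name: str) -> str:
    """Double-quote an identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def apply_search_path(cursor, schema: str) -> None:
    """Point the current session at `schema` using a bound parameter."""
    # set_config(setting, value, is_local): is_local=false keeps the value
    # for the rest of the session rather than only the current transaction.
    cursor.execute(
        "SELECT set_config('search_path', %s, false)",
        (quote_identifier(schema),),
    )
```

Quoting the name preserves its exact case, which matters because PostgreSQL folds unquoted identifiers to lowercase.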
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace (.+?)(?=\s+-c|\s*$) with [^\s,]+(?:\s*,\s*[^\s,]+)* to eliminate polynomial backtracking flagged by CodeQL. The new pattern uses unambiguous character classes with no overlapping quantifiers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
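The replacement pattern quoted in this commit can be exercised on its own. The snippet applies it directly to the text that follows `search_path=`; the `match_value` helper is a hypothetical wrapper, and how the pattern is embedded in the loader's full regex is not shown in the PR text.

```python
import re

# Pattern from the commit message: comma-separated tokens, each free of
# whitespace and commas, with optional whitespace around the commas.
SEARCH_PATH_VALUE = re.compile(r"[^\s,]+(?:\s*,\s*[^\s,]+)*")


def match_value(text: str):
    """Return the search_path value at the start of `text`, or None."""
    found = SEARCH_PATH_VALUE.match(text)
    return found.group(0) if found else None
```

Given `"$user, public -c statement_timeout=5000"`, the match stops at `"$user, public"` because the next token is not preceded by a comma. Each position can only be consumed one way, which is why the pattern avoids the backtracking that the lazy `(.+?)` with a lookahead allowed.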
…L encoding
- DatabaseModal: Show validation error for invalid schema characters instead of silently stripping them. Throw error on submit if invalid chars present.
- docs: URL-encode the example URL to prevent copy/paste connection failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
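The reject-instead-of-strip behaviour might look like the sketch below. The real check is in `DatabaseModal.tsx` and its exact character set is not given here, so the pattern (PostgreSQL's rules for unquoted identifiers) and the `validate_schema` name are assumptions.

```python
import re

# Assumed rule: a letter or underscore, then letters, digits, "_" or "$".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def validate_schema(schema: str) -> str:
    """Return the trimmed schema name, or raise if it has invalid characters."""
    name = schema.strip()
    if not name:
        # The field is optional; an empty value means "use the default schema".
        return ""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Invalid schema name: {schema!r}")
    return name
```

Raising on bad input surfaces a typo such as `my-schema` to the user, whereas silent stripping would connect to `myschema` without warning.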
The URL-encoded form (-csearch_path%3Dmy_schema) inside the Liquid
capture block triggers spellcheck failures ('csearch', 'Dmy'). Reverted
to readable form since Python's urlparse handles both formats fine.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds support for selecting a PostgreSQL schema instead of always using `public`. Based on PR #373 by @sirudog, rebased onto current staging with bug fixes.

What changed
- `_parse_schema_from_url()` extracts the schema from the connection URL's `options` parameter (`search_path`), following PostgreSQL's native libpq format
- `information_schema` queries now accept a parameterized `schema` argument (default: `public`), so there is no SQL injection risk

Fixes over original PR #373
- `pg_namespace` JOIN order in `extract_columns_info`: the original used `LEFT JOIN pg_class` then `LEFT JOIN pg_namespace`, which doesn't filter duplicate `pg_class` rows when same-named tables exist across schemas. Fixed by joining `pg_namespace` first and folding the namespace filter into the `pg_class` join condition
- Regex separator: changed `[=\s]+` to `\s*=\s*` to require `=` as separator, preventing mis-capture when `search_path=` is followed by a space and another option
- `$user` handling: loops through all schemas instead of only checking the first two positions, covering values such as `\$user,\$user,public`

Backward compatibility
All methods default to the `public` schema, so existing connections without `search_path` work identically.

Closes #373 (supersedes)
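The join-order fix described under "Fixes over original PR #373" can be pictured with a query like the one below. It is a hypothetical reconstruction, not the query from `extract_columns_info`: the selected columns and the use of `col_description()` are assumptions, and the `%s` placeholders assume a psycopg2-style cursor.

```python
# pg_namespace is joined first so the pg_class join can match on both the
# relation name and its namespace OID, yielding at most one pg_class row.
COLUMNS_SQL = """
SELECT c.column_name,
       c.data_type,
       col_description(cls.oid, c.ordinal_position) AS column_comment
FROM information_schema.columns AS c
LEFT JOIN pg_namespace AS ns
       ON ns.nspname = c.table_schema
LEFT JOIN pg_class AS cls
       ON cls.relname = c.table_name
      AND cls.relnamespace = ns.oid
WHERE c.table_schema = %s
  AND c.table_name = %s
ORDER BY c.ordinal_position
"""


def fetch_columns(cursor, table: str, schema: str = "public") -> list:
    """Return (name, type, comment) rows for one table in one schema."""
    cursor.execute(COLUMNS_SQL, (schema, table))
    return cursor.fetchall()
```

Joining `pg_class` on `relname` alone would return one row per schema that contains a table with that name, which is the duplicate-row problem the fix addresses.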