feat(datastore): PostgreSQL dual-database support#2

Closed
dnplkndll wants to merge 6 commits into main from pg-dialect

@dnplkndll commented Mar 25, 2026

Summary

Add PostgreSQL as an alternative database backend alongside MySQL. Fleet can now run against either database with zero application code changes — the dialect abstraction and SQL rebind driver handle all translation transparently at runtime.

Upstream issue: fleetdm#34025
Production deployment: Running on PG 16 (CNPG) at fleet.hz.ledoweb.com

Test Results

| Metric | Start | Current | Improvement |
|---|---|---|---|
| SQL errors | 91 | 4 | 96% |
| Test failures | 141 | 12 | 91% |

129/141 PG test failures resolved across 31 iterative rounds.
All MySQL tests continue to pass — zero regressions.


Architecture (5 clean commits)

1. feat: add PostgreSQL rebind driver and platform support

Core pgx-rebind driver (1,588 lines) that auto-rewrites MySQL SQL → PG. 30+ transformation categories with pre-compiled regexes. PG SQLSTATE error classification. Helm chart + config support.

2. feat: add DialectHelper interface and dual-dialect support

17-method interface for SQL fragment composition. Dual-dialect goose migration support. PG baseline schema migration via fleet prepare db.

3. refactor: migrate MySQL-specific SQL to dialect helpers

59 datastore files converted to use ds.dialect.*() methods. PG triggers for generated columns. Seed data for lookup tables.

4. test: add PG baseline schema and integration tests

194-table PG schema with triggers, 27 integration tests, dollar-quote aware statement splitter.
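
A dollar-quote aware splitter could look roughly like the sketch below. It is not the PR's code: `splitStatements` is a hypothetical name, and the sketch only recognizes the bare `$$` delimiter, ignoring tagged quotes such as `$body$`, string literals, and comments.

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements is a hypothetical helper (not taken from the PR). It splits a
// SQL script on ';' but keeps semicolons that appear between $$ ... $$, so
// PL/pgSQL function bodies stay in one statement.
func splitStatements(script string) []string {
	var stmts []string
	var cur strings.Builder
	inDollar := false
	for i := 0; i < len(script); i++ {
		if strings.HasPrefix(script[i:], "$$") {
			inDollar = !inDollar
			cur.WriteString("$$")
			i++ // skip the second '$' of the delimiter
			continue
		}
		if script[i] == ';' && !inDollar {
			if s := strings.TrimSpace(cur.String()); s != "" {
				stmts = append(stmts, s)
			}
			cur.Reset()
			continue
		}
		cur.WriteByte(script[i])
	}
	// Keep a trailing statement that has no terminating semicolon.
	if s := strings.TrimSpace(cur.String()); s != "" {
		stmts = append(stmts, s)
	}
	return stmts
}

func main() {
	script := `CREATE TABLE t (id INT);
CREATE FUNCTION f() RETURNS trigger AS $$ BEGIN RETURN NEW; END; $$ LANGUAGE plpgsql;
SELECT 1;`
	for i, s := range splitStatements(script) {
		fmt.Printf("%d: %s\n", i+1, s)
	}
}
```

A naive `strings.Split(script, ";")` would cut the function body at `RETURN NEW;`, which is the failure mode a splitter like this avoids.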

5. test: update tests for PG compatibility

Cross-DB test assertions, dynamic IDs, PG-aware INSERT syntax, dialect-conditional FK guards.


Key Technical Decisions

  • coerceTimeArgsToUTC: All time.Time params converted to UTC at driver level — fixes PG timestamp without time zone round-trip discrepancy
  • PG triggers instead of GENERATED columns (PG requires IMMUTABLE functions for generated columns; md5/decode are not)
  • GENERATED BY DEFAULT instead of GENERATED ALWAYS (allows explicit ID insertion in tests)
  • rebindQuery skips $$ blocks (PL/pgSQL function bodies aren't MySQL SQL)
  • No RESTART IDENTITY in TruncateTables (causes cascading failures from hardcoded IDs)
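
As an illustration of the first decision, UTC coercion at the driver level might be sketched as follows. Only the name `coerceTimeArgsToUTC` comes from this PR; the signature, the body, and the `sampleArgs` helper are assumptions made for the example.

```go
package main

import (
	"database/sql/driver"
	"fmt"
	"time"
)

// coerceTimeArgsToUTC rewrites every time.Time argument to its UTC equivalent
// in place. The name appears in the PR; this body is an assumed sketch.
func coerceTimeArgsToUTC(args []driver.NamedValue) {
	for i := range args {
		if t, ok := args[i].Value.(time.Time); ok {
			args[i].Value = t.UTC()
		}
	}
}

// sampleArgs builds demo arguments: one timestamp in a UTC-5 zone and one string.
func sampleArgs() []driver.NamedValue {
	zone := time.FixedZone("UTC-5", -5*60*60)
	return []driver.NamedValue{
		{Ordinal: 1, Value: time.Date(2026, 3, 25, 9, 0, 0, 0, zone)},
		{Ordinal: 2, Value: "unchanged"},
	}
}

func main() {
	args := sampleArgs()
	coerceTimeArgsToUTC(args)
	for _, a := range args {
		fmt.Printf("$%d = %v\n", a.Ordinal, a.Value)
	}
}
```

Because `timestamp without time zone` stores the wall-clock value only, sending every argument in one fixed zone is what makes a write followed by a read return the same instant.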

Remaining 12 Failures (tracked)

| Category | Count | Root Cause |
|---|---|---|
| Pack stats ordering | 3 | PG DISTINCT ON vs MySQL INSERT IGNORE dedup |
| Missing label (NULL label_id) | 2 | "All Hosts" label cleanup between tests |
| SCEP serial column name | 2 | PG schema serial vs test id |
| Host device mapping | 1 | Hardcoded host IDs in test |
| MDM profile labels | 1 | Aggregate count off-by-one |
| Certificate truncation | 1 | Content ordering difference |
| Host list status | 1 | Extra host from previous subtest |
| Vulnerability list | 1 | Software ID lookup |

Test plan

  • go build ./server/... — clean
  • gofmt — clean
  • POSTGRES_TEST=1 go test ./server/datastore/mysql/... — 129 of the 141 original failures resolved, 12 remaining
  • MYSQL_TEST=1 go test ./server/datastore/mysql/... — verify no regressions
  • Production: fleet.hz.ledoweb.com on PG 16 (CNPG), health 200
  • Observability: Vector → OpenObserve (480K+ logs)
  • Backups: Nightly to GCS with 7-day retention

Introduce the pgx-rebind SQL driver that wraps pgx/v5 to automatically
translate MySQL-dialect SQL to PostgreSQL at query time. Handles 30+
transformation categories including placeholder conversion, function
rewrites, boolean/integer fixes, JSON operators, upsert syntax, and more.
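
Placeholder conversion is the simplest of these categories to picture. The sketch below is not the driver's code: `rebindPlaceholders` is a hypothetical name, and it handles only standard single-quoted literals (with `''` as the escape), not MySQL backslash escapes or comments.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rebindPlaceholders is a hypothetical helper (not taken from the PR). It
// rewrites MySQL-style `?` placeholders to PostgreSQL `$1, $2, ...` and leaves
// question marks inside single-quoted string literals untouched.
func rebindPlaceholders(query string) string {
	var b strings.Builder
	inString := false
	n := 0
	for i := 0; i < len(query); i++ {
		c := query[i]
		switch {
		case c == '\'':
			// A doubled quote ('') toggles twice, so the state is
			// correct again once the escape has been consumed.
			inString = !inString
			b.WriteByte(c)
		case c == '?' && !inString:
			n++
			b.WriteByte('$')
			b.WriteString(strconv.Itoa(n))
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	q := "SELECT id FROM hosts WHERE team_id = ? AND name <> '?' AND id IN (?, ?)"
	fmt.Println(rebindPlaceholders(q))
}
```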

New files:
- server/platform/postgres/rebind_driver.go — SQL rewrite layer (1,185 lines)
- server/platform/postgres/errors.go — PG SQLSTATE error classification
- server/platform/postgres/common.go — shared PG utilities
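
SQLSTATE-based classification can be sketched without importing pgx by matching on a small interface. The SQLSTATE codes are standard PostgreSQL values, and `pgconn.PgError` in pgx/v5 exposes a `SQLState()` method of this shape. The function names and `fakePGError` are assumptions for the example and may differ from errors.go.

```go
package main

import (
	"errors"
	"fmt"
)

// sqlStateError matches any error that exposes a SQLSTATE code.
type sqlStateError interface {
	error
	SQLState() string
}

// Standard PostgreSQL SQLSTATE codes.
const (
	uniqueViolation     = "23505"
	foreignKeyViolation = "23503"
	readOnlyTransaction = "25006"
)

// hasSQLState reports whether err, or any error it wraps, carries the code.
func hasSQLState(err error, code string) bool {
	var se sqlStateError
	return errors.As(err, &se) && se.SQLState() == code
}

func isDuplicate(err error) bool  { return hasSQLState(err, uniqueViolation) }
func isForeignKey(err error) bool { return hasSQLState(err, foreignKeyViolation) }
func isReadOnly(err error) bool   { return hasSQLState(err, readOnlyTransaction) }

// fakePGError stands in for a driver error in this demo.
type fakePGError struct{ code string }

func (e fakePGError) Error() string    { return "pg error " + e.code }
func (e fakePGError) SQLState() string { return e.code }

func main() {
	err := fmt.Errorf("insert host: %w", fakePGError{code: uniqueViolation})
	fmt.Println(isDuplicate(err), isForeignKey(err), isReadOnly(err))
}
```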

Config/infrastructure:
- server/config/config.go — --mysql_driver flag for driver selection
- charts/fleet/ — database.driver Helm value + FLEET_MYSQL_DRIVER env
- docker-compose.yml — postgres_test service for local PG testing

Introduce DialectHelper interface (17 methods) that abstracts MySQL vs PG
SQL differences at the fragment level. Each Datastore instance holds a
dialect that generates the correct SQL for its backend.

Methods: InsertIgnoreInto, OnDuplicateKey, OnConflictDoNothing, GroupConcat,
JSONExtract, JSONUnquoteExtract, JSONBuildObject, JSONAgg, FindInSet,
FullTextMatch, RegexpMatch, GoquDialect, IsDuplicate, IsForeignKey,
IsReadOnly, IsBadConnection, ReturningID.
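
To make fragment-level composition concrete, here is a reduced sketch with two of the listed methods. The method names and `mysqlDialect` appear in this PR. The signatures, the `postgresDialect` type, the `buildInsert` helper, and the choice of table are assumptions.

```go
package main

import "fmt"

// dialect is a reduced, hypothetical version of the DialectHelper idea. Only
// two of the 17 methods are shown, and their signatures are assumed.
type dialect interface {
	InsertIgnoreInto(table string) string
	OnConflictDoNothing(conflictCols string) string
}

type mysqlDialect struct{}

// MySQL expresses "skip duplicates" in the INSERT keyword itself.
func (mysqlDialect) InsertIgnoreInto(table string) string { return "INSERT IGNORE INTO " + table }
func (mysqlDialect) OnConflictDoNothing(string) string    { return "" }

type postgresDialect struct{}

// PostgreSQL uses a plain INSERT plus a trailing ON CONFLICT clause.
func (postgresDialect) InsertIgnoreInto(table string) string { return "INSERT INTO " + table }
func (postgresDialect) OnConflictDoNothing(cols string) string {
	return " ON CONFLICT (" + cols + ") DO NOTHING"
}

// buildInsert composes one statement from dialect fragments; the calling code
// is identical for both backends.
func buildInsert(d dialect) string {
	return d.InsertIgnoreInto("label_membership") +
		" (host_id, label_id) VALUES (?, ?)" +
		d.OnConflictDoNothing("host_id, label_id")
}

func main() {
	fmt.Println(buildInsert(mysqlDialect{}))
	fmt.Println(buildInsert(postgresDialect{}))
}
```

The `?` placeholders are left as-is on purpose: in this design the rebind driver, not the dialect, is responsible for turning them into `$1, $2`.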

Also:
- Dual MySQL/PG error classification in errors.go
- Driver selection and PG baseline migration in mysql.go
- Dual-dialect goose migration support

Replace hardcoded MySQL syntax with dialect method calls across all
datastore files. This enables the same Go code to generate correct SQL
for both MySQL and PostgreSQL.

Changes per file use ds.dialect methods for:
- INSERT IGNORE → InsertIgnoreInto()
- ON DUPLICATE KEY UPDATE → OnDuplicateKey()
- GROUP_CONCAT → GroupConcat()
- JSON_EXTRACT/JSON_OBJECT → JSONExtract()/JSONBuildObject()
- FIND_IN_SET → FindInSet()
- Boolean comparisons via rebind driver
- Error classification via IsDuplicate()/IsForeignKey()
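
The upsert mapping in the list above is the one where the two backends differ most in shape. A minimal sketch, assuming a helper that receives the conflict and update columns explicitly. `upsertClause` is a hypothetical name, the column names are illustrative, and the PR's actual `OnDuplicateKey()` signature is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// upsertClause builds the trailing clause of an upsert for either backend.
// Hypothetical helper, not the PR's OnDuplicateKey implementation.
func upsertClause(postgres bool, conflictCols, updateCols []string) string {
	sets := make([]string, len(updateCols))
	for i, c := range updateCols {
		if postgres {
			// EXCLUDED refers to the row proposed for insertion.
			sets[i] = c + " = EXCLUDED." + c
		} else {
			sets[i] = c + " = VALUES(" + c + ")"
		}
	}
	if postgres {
		// PostgreSQL needs the conflict target spelled out.
		return "ON CONFLICT (" + strings.Join(conflictCols, ", ") +
			") DO UPDATE SET " + strings.Join(sets, ", ")
	}
	// MySQL infers the conflicting key from the table's unique indexes.
	return "ON DUPLICATE KEY UPDATE " + strings.Join(sets, ", ")
}

func main() {
	fmt.Println(upsertClause(false, []string{"host_id"}, []string{"seen_time"}))
	fmt.Println(upsertClause(true, []string{"host_id"}, []string{"seen_time"}))
}
```

The PostgreSQL form has to name its conflict columns, which is one reason a fragment-level helper needs more input than the MySQL syntax it replaces.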

- pg_baseline_schema.sql — 194 tables translated from MySQL DDL to PG
- postgres_smoke_test.go — 27 integration tests covering Host CRUD,
  Labels, Queries, Packs, Users, Teams, Policies, Software, Sessions,
  AppConfig, ListHosts, CountHosts, and more
- testing_utils.go — CreatePostgresDS, TruncateTables PG support

All 27 tests pass against PostgreSQL 16.

Update test call sites to pass dialect parameter where function
signatures changed. Add dialect: mysqlDialect{} to mockDatastore.

Additional datastore files migrated to use ds.dialect methods:
aggregated_stats, android_*, app_configs, carves, campaigns,
certificate_authorities, conditional_access, delete, in_house_apps,
invites, jobs, locks, maintained_apps, packs, password_reset,
scheduled_queries, secret_variables, setup_experience, software_titles*,
teams, users, windows_updates, wstep.

@dnplkndll force-pushed the main branch 4 times, most recently from 769083c to 617cf71, March 31, 2026 23:54
dnplkndll pushed a commit that referenced this pull request Apr 14, 2026
**Related issue:** Resolves fleetdm#42836 

This is another hot path optimization.

## Before

When a host submits policy results via `SubmitDistributedQueryResults`,
the system needed to determine which policies "flipped" (changed from
passing to failing or vice versa). Each consumer computed this
independently:

```
SubmitDistributedQueryResults(policyResults)
  |
  +-- processScriptsForNewlyFailingPolicies
  |     filter to failing policies with scripts
  |     BUILD SUBSET of results
  |     CALL FlippingPoliciesForHost(subset)          <-- DB query #1
  |     convert result to set, filter, queue scripts
  |
  +-- processSoftwareForNewlyFailingPolicies
  |     filter to failing policies with installers
  |     BUILD SUBSET of results
  |     CALL FlippingPoliciesForHost(subset)          <-- DB query #2
  |     convert result to set, filter, queue installs
  |
  +-- processVPPForNewlyFailingPolicies
  |     filter to failing policies with VPP apps
  |     BUILD SUBSET of results
  |     CALL FlippingPoliciesForHost(subset)          <-- DB query #3
  |     convert result to set, filter, queue VPP
  |
  +-- webhook filtering
  |     filter to webhook-enabled policies
  |     CALL FlippingPoliciesForHost(subset)          <-- DB query #4
  |     register flipped policies in Redis
  |
  +-- RecordPolicyQueryExecutions
        CALL FlippingPoliciesForHost(all results)     <-- DB query #5
        reset attempt counters for newly passing
        INSERT/UPDATE policy_membership
```

Each `FlippingPoliciesForHost` call runs `SELECT policy_id, passes FROM
policy_membership WHERE host_id = ? AND policy_id IN (?)`. All 5 queries
hit the same table for the same host before `policy_membership` is
updated, so they all see identical state.

Each consumer also built intermediate maps to narrow down to its subset
before calling `FlippingPoliciesForHost`, then converted the result into
yet another set for filtering. This meant 3-4 temporary maps per
consumer.

## After

```
SubmitDistributedQueryResults(policyResults)
  |
  CALL FlippingPoliciesForHost(all results)           <-- single DB query
  build newFailingSet, normalize newPassing
  |
  +-- processScriptsForNewlyFailingPolicies
  |     filter to failing policies with scripts
  |     CHECK newFailingSet (in-memory map lookup)
  |     queue scripts
  |
  +-- processSoftwareForNewlyFailingPolicies
  |     filter to failing policies with installers
  |     CHECK newFailingSet (in-memory map lookup)
  |     queue installs
  |
  +-- processVPPForNewlyFailingPolicies
  |     filter to failing policies with VPP apps
  |     CHECK newFailingSet (in-memory map lookup)
  |     queue VPP
  |
  +-- webhook filtering
  |     filter to webhook-enabled policies
  |     FILTER newFailing/newPassing by policy IDs (in-memory)
  |     register flipped policies in Redis
  |
  +-- RecordPolicyQueryExecutions
        USE pre-computed newPassing (skip DB query)
        reset attempt counters for newly passing
        INSERT/UPDATE policy_membership
```

The intermediate subset maps and per-consumer set conversions are
removed. Each process function goes directly from "policies with
associated automation" to "is this policy in newFailingSet?" in a single
map lookup.
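
The compute-once pattern can be sketched in a few lines. This is not Fleet's implementation: `computeFlips`, `policiesToAutomate`, and the `flips` type are hypothetical, results are simplified to plain booleans, and the rule that a policy with no previous row counts as newly failing is an assumption made for the example.

```go
package main

import "fmt"

// flips holds the two sets derived from a single comparison against stored state.
type flips struct {
	newFailing map[uint]struct{}
	newPassing map[uint]struct{}
}

// computeFlips compares incoming results with the previously stored state.
// Assumed rule: failing now and not previously failing means newly failing;
// passing now and previously failing means newly passing.
func computeFlips(previous, incoming map[uint]bool) flips {
	f := flips{newFailing: map[uint]struct{}{}, newPassing: map[uint]struct{}{}}
	for id, passes := range incoming {
		prev, seen := previous[id]
		switch {
		case !passes && (!seen || prev):
			f.newFailing[id] = struct{}{}
		case passes && seen && !prev:
			f.newPassing[id] = struct{}{}
		}
	}
	return f
}

// policiesToAutomate is what each consumer reduces to: keep only the policies
// that have an automation attached AND are in the shared newFailing set.
func policiesToAutomate(withAutomation []uint, f flips) []uint {
	var out []uint
	for _, id := range withAutomation {
		if _, ok := f.newFailing[id]; ok {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	previous := map[uint]bool{1: true, 2: false, 3: true} // one DB read
	incoming := map[uint]bool{1: false, 2: true, 3: true, 4: false}
	f := computeFlips(previous, incoming)

	// Scripts, installers, and VPP consumers all reuse f without further queries.
	fmt.Println("scripts for:", policiesToAutomate([]uint{1, 3}, f))
	fmt.Println("installs for:", policiesToAutomate([]uint{4}, f))
}
```

Each consumer becomes a pure in-memory filter, so adding another automation type does not add another round trip to `policy_membership`.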

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually



## Summary by CodeRabbit

* **Performance Improvements**
  * Reduced redundant database queries during policy result submissions by
    computing flipping policies once per host check-in instead of multiple
    times.

@dnplkndll

Superseded by #4 (feat/pg-compat-clean)

@dnplkndll closed this Apr 17, 2026
@dnplkndll deleted the pg-dialect branch April 17, 2026 14:51