diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index 1899b73..bd1e422 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -11,12 +11,13 @@ jobs: steps: - name: Checkout code uses: actions/checkout@v4 - - name: Checkout another public repo + - name: Checkout pgdog-enterprise uses: actions/checkout@v4 with: - repository: pgdogdev/pgdog - ref: main + repository: pgdogdev/pgdog-enterprise + ref: main-ent path: pgdog-source + token: ${{ secrets.PGDOG_ENTERPRISE_TOKEN }} - uses: actions-rs/toolchain@v1 with: toolchain: stable diff --git a/docs/enterprise_edition/active_queries.md b/docs/enterprise_edition/active_queries.md deleted file mode 100644 index 65bd964..0000000 --- a/docs/enterprise_edition/active_queries.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -icon: material/play-circle ---- -# Running queries - -PgDog EE provides a real-time view into queries currently executing on PostgreSQL connections. It is accessible in two places: - -1. [`SHOW ACTIVE_QUERIES`](#admin-database) admin command -2. [Activity](#dashboard) view in the dashboard - -## How it works - -When a client sends a query to PgDog, it will first attempt to acquire a connection from the connection pool. Once acquired, it will register the query with the live query view. After the query finishes running, it's removed from the view. - -Only queries that are currently executing through PgDog are visible in this view. If your application doesn't connect to PgDog, its queries won't appear here. 
- -### Admin database - -You can see which queries are actually running on each instance by connecting to the admin database and running the `SHOW ACTIVE_QUERIES` command: - -=== "Command" - ``` - SHOW ACTIVE_QUERIES; - ``` - -=== "Output" - ``` - query | protocol | database | user | running_time | - -------------------------------------------------------+----------+----------+-------+--------------+ - SELECT * FROM users WHERE id = $1 | extended | pgdog | pgdog | 15 | - SELECT pg_sleep(50) | simple | pgdog | pgdog | 5 | - INSERT INTO users (id, email) VALUES ($1, $2) | extended | pgdog | pgdog | 1 | - ``` - -The following information is available in the running queries view: - -| Column | Description | -|-|-| -| `query` | The SQL statement currently executing on a PostgreSQL connection. | -| `protocol` | What version of the query protocol is used. `simple` protocol injects parameters into text, while `extended` is used by prepared statements. | -| `database` | The name of the connection pool database. | -| `user` | The name of the user executing the query. | -| `running_time` | For how long (in ms) has the query been running. | - -### Dashboard - -If you're running multiple instances of PgDog, active queries from all instances are aggregated and sent to the Dashboard application. They are then made available in the Activity tab, in real-time, with query plans automatically attached for slow queries. - -
- How PgDog works - Real-time view into running queries. -
diff --git a/docs/enterprise_edition/control_plane.md b/docs/enterprise_edition/control_plane.md new file mode 100644 index 0000000..5548336 --- /dev/null +++ b/docs/enterprise_edition/control_plane.md @@ -0,0 +1,68 @@ +--- +icon: material/console +--- + +# Control plane + +Multi-node PgDog deployments require synchronization to perform certain tasks, like atomic configuration changes, toggling [maintenance mode](../administration/maintenance_mode.md), [resharding](../features/sharding/resharding/index.md), and more. To make this work, PgDog Enterprise comes with a control plane, an application deployed alongside PgDog, to provide coordination and collect and present system telemetry. + +## How it works + +The control plane and PgDog processes communicate via the network using HTTP. They exchange messages to send metrics, commands, and other metadata that allows PgDog to transmit real-time information to the control plane, and for the control plane to control the behavior of each PgDog process. + +
+ Control plane +
+ +### Configuration + +In order for PgDog to connect to the control plane, it needs to be configured with its endpoint address and an authentication token, both of which are specified in [`pgdog.toml`](../configuration/pgdog.toml/general.md): + +```toml +[control] +endpoint = "https://control-plane-endpoint.cloud.pgdog.dev" +token = "cff57e5c-7c4f-4ca0-b81c-c8ed22cf873d" +``` + +The authentication token is generated by the control plane and identifies each PgDog deployment. PgDog nodes which are part of the same deployment should use the same token. + +For example, if you're using our [Helm chart](../installation.md#kubernetes), you can configure the endpoint and token in `values.yaml` as follows: + +```yaml +control: + endpoint: https://control-plane-endpoint.cloud.pgdog.dev + token: cff57e5c-7c4f-4ca0-b81c-c8ed22cf873d +``` + +### Connection flow + +The connection to the control plane is initiated by PgDog on startup and happens in the background. Upon connecting, PgDog will send its node identifier (randomly generated, or set in the `NODE_ID` environment variable) to register with the control plane, and start uploading telemetry and polling for commands. + +!!! note "Error handling" + Since most PgDog functions (including sharding) are configuration-driven, the control plane connection is **not required** + for PgDog to start and serve queries. + + If any error is encountered while communicating with the control plane, + PgDog will continue operating normally, while attempting to reconnect periodically. + + +This architecture makes the communication link more resilient to unreliable network conditions. + +### Telemetry + +PgDog transmits the following information to the control plane: + +| Telemetry | Description | +|-|-| +| [Metrics](metrics.md) | The same [metrics](../features/metrics.md) as exposed by the Prometheus endpoint (and the admin database) are transmitted at a much higher frequency, to allow for real-time monitoring. 
| +| [Active queries](insights/active_queries.md) | Queries that are currently executing through each PgDog node. | +| [Query statistics](insights/statistics.md) | Real-time statistics on each query executed through PgDog, like duration, idle-in-transaction time, and more. | +| [Errors](insights/errors.md) | Recent errors encountered by clients, e.g. query syntax issues. | +| [Query plans](insights/query_plans.md) | Output of `EXPLAIN` for slow and sampled queries, collected by PgDog in the background. | +| [Configuration](configuration.md) | Current PgDog settings and database schema. | + +#### High availability + +The control plane itself is backed by a PostgreSQL database, used for storing historical metrics, query statistics, configuration, and other metadata. + +This allows multiple instances of the control plane to be deployed in a high-availability setup, since all actions are synchronized by PostgreSQL transactions and locks. diff --git a/docs/enterprise_edition/index.md b/docs/enterprise_edition/index.md index 94d778d..be50315 100644 --- a/docs/enterprise_edition/index.md +++ b/docs/enterprise_edition/index.md @@ -2,20 +2,31 @@ icon: material/office-building --- -# PgDog EE +# PgDog Enterprise -PgDog **E**nterprise **E**dition is a version of PgDog that contains additional features for at scale monitoring and deployment of sharded (and unsharded) PostgreSQL databases. +PgDog Enterprise is a version of PgDog that contains additional features for at scale monitoring and deployment of sharded (and unsharded) PostgreSQL databases. -Unlike PgDog itself, PgDog EE is closed source and available upon the purchase of a license. It comes with a hosted management dashboard which provides real-time visibility into PgDog's operations. +Unlike PgDog itself, PgDog Enterprise is closed source and available upon the purchase of a license. It comes with a control plane which provides real-time visibility into PgDog's operations and enterprise features. 
## Features | Feature | Description | |-|-| -| [Running queries](active_queries.md) | Instant view into queries running through PgDog. | -| [Query plans](query_plans.md) | Root cause slow queries in seconds with automatic PostgreSQL query plans. | -| [Real-time metrics](metrics.md) | All PgDog metrics, delivered with second-precision through a dedicated link. | -| Query blocker | Terminate all instances of a slow query with a button click and prevent them from running again. | +| [Control plane](control_plane.md) | Synchronize and monitor multiple PgDog processes. | +| [Active queries](insights/active_queries.md) | Real-time view into queries running through PgDog. | +| [Query plans](insights/query_plans.md) | Root cause slow queries and execution anomalies with real-time Postgres query plans, collected in the background. | +| [Real-time metrics](metrics.md) | All PgDog metrics, delivered with second-precision through a dedicated connection. | +| [Query statistics](insights/statistics.md) | Query execution statistics, like duration, idle-in-transaction time, errors, and more. | + +## Roadmap + +PgDog Enterprise is new and in active development. A lot of the features we want aren't built yet: + +| Feature | Description | +|-|-| +| QoS | Quality of service guarantees, incl. throttling on a per-user/database/query level. | +| AWS RDS integration | Deploy PgDog on top of AWS RDS, without the hassle of Kubernetes or manual configuration. | +| Automatic resharding | Detect hot shards and re-shard data without operator intervention. | ## Get a demo diff --git a/docs/enterprise_edition/insights/active_queries.md b/docs/enterprise_edition/insights/active_queries.md new file mode 100644 index 0000000..52be965 --- /dev/null +++ b/docs/enterprise_edition/insights/active_queries.md @@ -0,0 +1,59 @@ +--- +icon: material/play-circle +--- + +# Active queries + +PgDog Enterprise provides a real-time view into queries currently executing on its PostgreSQL connections. 
This is accessible in two places: + +1. [`SHOW ACTIVE_QUERIES`](#admin-database) admin command +2. [Activity](#dashboard) view in the dashboard + +## How it works + +When a client sends a query to PgDog, it will first attempt to acquire a connection from the connection pool. Once acquired, it will register the query with the live query view. After the query finishes running, it's removed from the view. + +Only queries that are currently executing through PgDog are visible. If your application doesn't connect to PgDog, its queries won't appear here. + +### Admin database + +You can see which queries are actually running on each instance by connecting to the [admin database](../../administration/index.md) and running the `SHOW ACTIVE_QUERIES` command: + +=== "Command" + ``` + SHOW ACTIVE_QUERIES; + ``` + +=== "Output" + ``` + query | protocol | database | user | running_time | plan + ---------------------------------------------------+----------+----------+-------+--------------+--------------------------------------------------------------- + SELECT * FROM users WHERE id = $1 | extended | pgdog | pgdog | 15 | Index Scan on users (cost=0.15..8.17 rows=1 width=64) + SELECT pg_sleep(50) | simple | pgdog | pgdog | 1662 | Result (cost=0.00..0.01 rows=1 width=4) + INSERT INTO users (id, email) VALUES ($1, $2) | extended | pgdog | pgdog | 1 | Insert on users (cost=0.00..0.01 rows=0 width=0) + ``` + +The following information is available in the running queries view: + +| Column | Description | +|-|-| +| `query` | The SQL statement currently executing on a PostgreSQL connection. | +| `protocol` | What version of the query protocol is used. `simple` protocol injects parameters into text, while `extended` is used by prepared statements. | +| `database` | The name of the connection pool database. | +| `user` | The name of the user executing the query. | +| `running_time` | For how long (in ms) has the query been running. 
| +| `plan` | The query execution plan obtained from PostgreSQL using `EXPLAIN`. | + +### Web UI + +If you're running multiple instances of PgDog, active queries from all instances are aggregated and sent to the [control plane](../control_plane.md). They are then made available in the Activity tab, in real-time, with query plans automatically attached for slow queries. + +
+ How PgDog works +
+ +### Parameters + +If your application is using prepared statements (or just placeholders in queries), the parameters for these queries are not shown and will not be sent to the control plane. + +If your application is using simple statements (parameters in query text), PgDog will normalize the queries, removing values and replacing them with parameter symbols (e.g., `$1`). This is to make sure no sensitive data leaves the database network. diff --git a/docs/enterprise_edition/insights/errors.md b/docs/enterprise_edition/insights/errors.md new file mode 100644 index 0000000..baedb1c --- /dev/null +++ b/docs/enterprise_edition/insights/errors.md @@ -0,0 +1,47 @@ +--- +icon: material/alert-circle +--- + +# Errors + +PgDog Enterprise tracks query errors returned by PostgreSQL, providing a real-time view into recently encountered issues like syntax errors, missing columns, or lock timeouts. + +## Admin database + +You can see recent errors by connecting to the [admin database](../../administration/index.md) and running the `SHOW ERRORS` command: + +=== "Command" + ``` + SHOW ERRORS; + ``` + +=== "Output" + ``` + error | count | age | query + --------------------------------+-------+------+------------------------ + column "sdfsdf" does not exist | 1 | 1444 | SELECT sdfsdf; + syntax error at end of input | 3 | 500 | SELECT FROM users; + relation "foo" does not exist | 2 | 120 | SELECT * FROM foo; + ``` + +The following information is available in the errors view: + +| Column | Description | +|-|-| +| `error` | The error message returned by PostgreSQL. | +| `count` | The number of times this error has been encountered. | +| `age` | How long ago (in ms) was this error last seen. | +| `query` | The last SQL statement that caused the error. | + +## Configuration + +Errors are collected automatically if query statistics are enabled. 
The in-memory view is periodically purged of old errors, configurable in [`pgdog.toml`](../configuration/pgdog.toml/general.md): + +```toml +[query_stats] +enabled = true +max_errors = 100 +max_error_age = 300_000 # 5 minutes +``` + +By default, PgDog will keep up to 100 distinct errors for a maximum of 5 minutes. This data is periodically sent to the [control plane](../control_plane.md), so the history of seen errors is available in the web UI. diff --git a/docs/enterprise_edition/insights/index.md b/docs/enterprise_edition/insights/index.md new file mode 100644 index 0000000..8210c6d --- /dev/null +++ b/docs/enterprise_edition/insights/index.md @@ -0,0 +1,20 @@ +--- +icon: material/lightbulb-on +--- + +# Query insights + +PgDog Enterprise provides visibility into all queries that it serves, which allows it to analyze and report how those queries perform, in real-time. + +## Telemetry + +PgDog Enterprise collects the following telemetry: + +| Telemetry | Frequency | Description | +|-|-|-| +| [Active queries](active_queries.md) | real time | Queries actively executing through the proxy. | +| [Query plans](query_plans.md) | sample / threshold | Query plans (`EXPLAIN` output) are collected for slow queries and sampled queries automatically. | +| [Query statistics](statistics.md) | real time | Query duration, number of rows returned, idle-in-transaction time, errors, and more. | +| [Errors](errors.md) | real time | View into recently encountered query errors, like syntax errors or lock timeouts. | + +This data is transmitted to the [control plane](../control_plane.md) in real-time, which makes it available via its web dashboard and HTTP API. 
diff --git a/docs/enterprise_edition/query_plans.md b/docs/enterprise_edition/insights/query_plans.md similarity index 75% rename from docs/enterprise_edition/query_plans.md rename to docs/enterprise_edition/insights/query_plans.md index 8273dfc..b40bd56 100644 --- a/docs/enterprise_edition/query_plans.md +++ b/docs/enterprise_edition/insights/query_plans.md @@ -41,6 +41,26 @@ The following information is available in this view: | `user` | The name of the user running the query. | | `age` | How long ago the plan was fetched from Postgres (in ms). | +### Configuration + +Which queries are planned and how frequently is configurable in [`pgdog.toml`](../configuration/pgdog.toml/general.md): + +```toml +[query_stats] +enabled = true +query_plan_threshold = 250 # 250 ms +query_plans_cache = 100 +query_plans_sample_rate = 0.0 +query_plan_max_age = 15_000 +``` + +| Setting | Description | +|-|-| +| `query_plan_threshold` | Minimum query execution duration (in ms), as recorded by PgDog in [query statistics](statistics.md) which will trigger a plan collection. | +| `query_plans_cache` | How many plans to keep in the cache to avoid planning the same queries multiple times. | +| `query_plans_sample_rate` | Percentage of queries (0.0 - 1.0) to collect plans for irrespective of their execution duration. | +| `query_plan_max_age` | For how long (in ms) to keep plans in the cache before they are considered stale and require a new plan. | + ### Dashboard The query plans are automatically attached to running queries and sent to the Dashboard via a dedicated connection. They can be viewed in real-time in the [Activity](active_queries.md#dashboard) tab. 
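The `[query_stats]` plan settings described above interact in a way that is easier to see in code. The following is a minimal sketch, not PgDog's actual implementation: the function name `should_collect_plan`, its arguments, and the rule that a fresh cached plan suppresses both triggers are assumptions made for illustration. Only the setting names and their default values come from the documentation in this diff.

```python
import random
from typing import Callable, Optional


def should_collect_plan(
    duration_ms: float,
    cached_plan_age_ms: Optional[float],
    *,
    threshold_ms: float = 250,       # query_plan_threshold
    sample_rate: float = 0.0,        # query_plans_sample_rate
    max_age_ms: float = 15_000,      # query_plan_max_age
    rng: Callable[[], float] = random.random,
) -> bool:
    """Hypothetical helper: decide whether to run EXPLAIN for a query."""
    # Assumption: a cached plan that is not yet stale is reused,
    # so no new plan is requested from PostgreSQL.
    if cached_plan_age_ms is not None and cached_plan_age_ms <= max_age_ms:
        return False
    # Queries at or above the duration threshold always qualify.
    if duration_ms >= threshold_ms:
        return True
    # Faster queries are planned only when they fall into the sample.
    return rng() < sample_rate


if __name__ == "__main__":
    # Slow query with no cached plan.
    print(should_collect_plan(300, None))
    # Fast query, sampling disabled (the default).
    print(should_collect_plan(10, None))
    # Slow query whose cached plan is still fresh.
    print(should_collect_plan(300, 1_000))
```

Passing `rng` as a parameter keeps the sampling branch deterministic in tests. With the default `query_plans_sample_rate = 0.0`, only queries that reach the threshold are planned.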
diff --git a/docs/enterprise_edition/insights/statistics.md b/docs/enterprise_edition/insights/statistics.md new file mode 100644 index 0000000..5e283be --- /dev/null +++ b/docs/enterprise_edition/insights/statistics.md @@ -0,0 +1,100 @@ +--- +icon: material/chart-bar +--- + +# Query statistics + +PgDog Enterprise collects detailed per-query statistics, similar to PostgreSQL's `pg_stat_statements`, with extra information useful for debugging application performance. + +## How it works + +All queries are normalized (parameters replaced with `$1`, `$2`, etc.) and grouped, so you can see aggregate performance data for each unique query pattern. Each query execution is recorded, along with the number of rows returned, the time it took to process the request, and how much of it was spent idling inside a transaction. + +This data is accessible via two mediums: + +1. [Admin database](#admin-database) +2. The Insights page in the web UI of the [control plane](../control_plane.md) + +### Admin database + +You can view query statistics by connecting to the [admin database](../../administration/index.md) and running the `SHOW QUERY_STATS` command: + +=== "Command" + ``` + SHOW QUERY_STATS; + ``` + +=== "Output" + ``` + -[ RECORD 1 ]------------+------------------------------- + query | SELECT now(); + calls | 1 + active | 0 + total_exec_time | 2.045 + min_exec_time | 2.045 + max_exec_time | 2.045 + avg_exec_time | 2.045 + total_rows | 1 + min_rows | 1 + max_rows | 1 + avg_rows | 1.000 + errors | 0 + last_exec | 2026-03-06 13:06:23.255 -08:00 + last_exec_in_transaction | 0 + idle_in_transaction_time | 0.000 + -[ RECORD 2 ]------------+------------------------------- + query | SELECT $1; + calls | 2 + active | 0 + total_exec_time | 5.718 + min_exec_time | 2.322 + max_exec_time | 3.397 + avg_exec_time | 2.859 + total_rows | 2 + min_rows | 1 + max_rows | 1 + avg_rows | 1.000 + errors | 0 + last_exec | 2026-03-06 13:06:15.990 -08:00 + last_exec_in_transaction | 0 + 
idle_in_transaction_time | 0.000 + ``` + +The following information is available in the query statistics view: + +| Column | Description | +|-|-| +| `query` | The normalized SQL statement. | +| `calls` | Total number of times this query has been executed. | +| `active` | Number of instances of this query currently executing. | +| `total_exec_time` | Total execution time (in ms) across all calls. | +| `min_exec_time` | Minimum execution time (in ms) of a single call. | +| `max_exec_time` | Maximum execution time (in ms) of a single call. | +| `avg_exec_time` | Average execution time (in ms) per call. | +| `total_rows` | Total number of rows returned across all calls. | +| `min_rows` | Minimum number of rows returned by a single call. | +| `max_rows` | Maximum number of rows returned by a single call. | +| `avg_rows` | Average number of rows returned per call. | +| `errors` | Total number of errors encountered by this query. | +| `last_exec` | Timestamp of the last time this query was executed. | +| `last_exec_in_transaction` | Number of times the last execution was inside a transaction. | +| `idle_in_transaction_time` | Total time (in ms) spent idle inside a transaction after this query completed. | + +### Configuration + +Query statistics collection can be enabled/disabled and tweaked via configuration in [`pgdog.toml`](../configuration/pgdog.toml/general.md): + +```toml +[query_stats] +enabled = true +max_entries = 10_000 +``` + +By default, if enabled, query statistics will store 10,000 distinct query entries. When a new query would exceed this limit, PgDog will remove the least frequently seen query from the view, using an exponential decay algorithm similar to the one used by `pg_stat_statements` in PostgreSQL. + + +### Comparison to `pg_stat_statements` + +PgDog's query statistics are an improvement on `pg_stat_statements` because they record information it doesn't, like `errors` and idle-in-transaction timing. These are important for debugging production performance issues. 
+ +Additionally, PgDog can have multiple instances of the proxy in front of the same database. This allows the query statistics implementation to have a lower impact on overall database performance, by taking advantage of multiple CPUs and reduced locking overhead. diff --git a/docs/enterprise_edition/metrics.md b/docs/enterprise_edition/metrics.md index 4e84981..253a61e 100644 --- a/docs/enterprise_edition/metrics.md +++ b/docs/enterprise_edition/metrics.md @@ -3,19 +3,31 @@ icon: material/speedometer --- # Real-time metrics -PgDog EE collects and sends its own metrics to the Dashboard. This provides a real-time view into PgDog internals, without a delay that's typically present in other monitoring solutions. +PgDog Enterprise collects and transmits its own metrics to the [control plane](control_plane.md), at a configurable interval (1s, by default). This provides a real-time view into PgDog internals, without a delay that's typically present in other monitoring solutions. ## How it works Real-time metrics are available in both Open Source and Enterprise versions of PgDog. The [open source metrics](../features/metrics.md) are accessible via an OpenMetrics endpoint or via the admin database. -In PgDog EE, the same metrics are collected and sent via a dedicated uplink to the Dashboard. This provides an out-of-the-box experience for monitoring deployments, without delays typically introduced by other solutions. +In PgDog EE, the same metrics are collected and sent via a dedicated connection to the control plane. Since metrics are just numbers, they can be serialized and sent quickly. To deliver second-precision metrics, PgDog EE requires less than 1KB/second of bandwidth and little to no additional CPU or memory. 
+ +### Configuration + +The intervals at which metrics are uploaded to the control plane are configurable in [`pgdog.toml`](../configuration/pgdog.toml/general.md): + +```toml +[control] +metrics_interval = 1_000 # 1s +endpoint = "https://control-plane-endpoint.cloud.pgdog.dev" +token = "cff57e5c-7c4f-4ca0-b81c-c8ed22cf873d" +``` + +The default value is **1 second**, which should be sufficient to debug most production issues. + +### Web UI + +Once the metrics reach the control plane, they are pushed down to the web dashboard via a real-time connection. Per-minute aggregates are computed in the background and stored in a separate PostgreSQL database, which provides a historical view into overall database performance.
How PgDog works - Real-time metrics.
- -Since metrics are just integers, they can be serialized and sent efficiently. To deliver second-precision metrics, PgDog EE requires less than 1KB/second of bandwidth and basically no CPU or additional memory. - -Once the metrics reach the Dashboard, they are pushed down to the web UI via a WebSocket connection. At the same time, per-minute aggregates are computed in the background and stored in a separate Postgres database. This provides a historical view into database performance. diff --git a/docs/images/control_plane.png b/docs/images/control_plane.png new file mode 100644 index 0000000..cbe919a Binary files /dev/null and b/docs/images/control_plane.png differ diff --git a/docs/images/ee/metrics.png b/docs/images/ee/metrics.png index 2542bb8..bd0a161 100644 Binary files a/docs/images/ee/metrics.png and b/docs/images/ee/metrics.png differ