diff --git a/docs/getting-started/schema-optimization.md b/docs/getting-started/schema-optimization.md
index 91ef689..29bc6f8 100644
--- a/docs/getting-started/schema-optimization.md
+++ b/docs/getting-started/schema-optimization.md
@@ -26,6 +26,8 @@ Schema optimization helps you detect and fix:
 4. Test changes in a development environment first
 5. Execute the SQL on your production database during low-traffic periods
 
+If an automatic schema change fails in Releem, use the [Schema Change Troubleshooting](/releem-agent/schema-change-troubleshooting) guide to match the error to the next action.
+
 For detailed information about each type of schema check and comprehensive best practices, see the [MySQL Database Schema Checks](https://releem.com/blog/mysql-database-schema-checks) article.
 
 Schema optimization is essential for maintaining long-term database health and performance as your application grows.
diff --git a/docs/query-optimization/automatic-schema-changes.md b/docs/query-optimization/automatic-schema-changes.md
new file mode 100644
index 0000000..a40a330
--- /dev/null
+++ b/docs/query-optimization/automatic-schema-changes.md
@@ -0,0 +1,108 @@
+---
+id: automatic-schema-changes
+title: Automatic Schema Changes
+---
+
+# Automatic schema changes in the Releem Agent
+
+If the **Releem Agent** is already installed and running, you can allow it to execute approved schema changes on the server. Automatic schema changes also include the option of running a pre-change backup, in case a rollback is required.
+
+Both automatic schema changes and backups were implemented with availability in mind, so they will only run if:
+* There is enough disk space to perform both the backup and the schema change
+* The backup won't block the affected tables
+* Point-in-time restore is possible on the server
+* The schema change won't block the affected tables
+
+The following steps explain how to configure the agent and the database user to handle this new functionality.
+
+---
+
+## 1. Locate the configuration file
+
+To enable automatic schema changes, we need to include a few new parameters in the agent configuration file. Below is the default location for Linux servers. Open the file with your favorite editor to add the new parameters.
+
+| Platform | Default path |
+|----------|----------------|
+| Linux | `/opt/releem/releem.conf` |
+
+---
+## 2. Enable automatic schema (DDL) execution
+
+By default, the agent **does not** run schema changes from Releem, even when you approve them in the product. For schema changes to be executed on your database server, activate this feature explicitly by setting `enable_exec_ddl` to `true`.
+
+Before running the schema change against the real table, the agent will perform a dry-run of the change against an empty table with the same structure. This is to guarantee that the operation can run successfully with the intended strategy.
+
+Some schema changes can't be executed by the database server on its own without blocking the table. An alternative is to use an external tool called [pt-online-schema-change](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html). This tool creates a copy of the table with the intended changes, copies all data to this new table, and swaps it with the existing one, with minimum impact.
+
+[pt-online-schema-change](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html) needs to be available on the server, and the location of the tool can be specified in the configuration.
+
+| Setting | Values | What it does |
+|---------|--------|----------------|
+| `enable_exec_ddl` | `false` (default) or `true` | When `true`, the agent may execute **schema changes** that Releem sends after analysis. When `false`, those changes are not run; the agent reports that execution is disabled. |
+| `ptosc_path` | `pt-online-schema-change` | Path to the `pt-online-schema-change` binary. Set it when Percona Toolkit is not on `PATH` or you use a non-standard binary location. |
+| `online_ddl_test_schema` | `releem_online_ddl_test` (default) or any valid database/schema name | **Optional:** Database/schema name where the agent will test the schema change before executing it against the real table. |
+
+
+---
+## 3. Configure your backup settings
+
+When a pre-change backup is requested, the agent needs tools and extra disk space available on the **same host that runs the agent**. As mentioned before, the Releem agent will look for the best alternative to back up the affected tables before the schema change is executed.
+
+* If the server and the table support it, the agent will create a physical backup of the table using `xtrabackup` or `mariabackup`
+* If online physical backup is not an option, the agent will use `mysqldump` to create a logical backup of the data (a `.sql` file with the statements needed to re-create the table and the data)
+
+Releem only proceeds with the backup when it detects that **point-in-time recovery** is available for the instance. If not, the change that required the backup will not run.
+
+
+| Setting | Values | What it does |
+|---------|--------|----------------|
+| `backup_dir` | `/tmp/backups` (default) | Directory for backup output. Must exist or be creatable and have enough free space. |
+| `mysqldump_path` | `mysqldump` (default) | Full path or name on `PATH` for `mysqldump` (logical backup). |
+| `xtrabackup_path` | `xtrabackup` (default) | Full path or name on `PATH` for `xtrabackup` (physical backup when Releem selects that method). |
+| `backup_space_buffer` | `20.0` (default) | Extra free space (as a percentage) the agent requires above its estimated backup size before starting a backup. |
+
+
+---
+## 4. Extend database user permissions
+
+The same **MySQL user** the agent already uses for monitoring must have permission to run the approved ALTER statements. Connect to the target database server and run the GRANT statements below:
+
+```sql
+-- To allow table ALTERs and new indexes on **any** database
+GRANT CREATE, REFERENCES, INDEX, ALTER ON *.* TO `releem`@`127.0.0.1`;
+```
+
+```sql
+-- Alternative: grant ALTER permissions *only* on a specific database
+GRANT CREATE, REFERENCES, INDEX, ALTER ON `airportdb`.* TO `releem`@`127.0.0.1`;
+```
+
+```sql
+-- Needed for schema change dry-runs (note this only affects the test database)
+GRANT CREATE, DROP, INDEX, ALTER ON `releem_online_ddl_test`.* TO `releem`@`127.0.0.1`;
+```
+
+#### Optional - To use pt-online-schema-change as an alternative method when the operation can't be executed online by the server
+```sql
+GRANT SELECT, INSERT, DROP, RELOAD, SUPER, SHOW VIEW, TRIGGER ON *.* TO `releem`@`127.0.0.1`;
+```
+
+---
+
+
+
+## 5. Restart the agent
+
+
+After editing, **restart the Releem Agent** so changes take effect.
+
+---
+
+## External tools
+
+Install **mysqldump**, **XtraBackup**, **mariabackup** and **pt-online-schema-change** as appropriate for your database server and OS flavor. For more information about how to install these tools, please refer to:
+
+* [pt-online-schema-change](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html)
+* [xtrabackup](https://docs.percona.com/percona-xtrabackup/2.4/index.html)
+* [mariabackup](https://mariadb.com/docs/server/server-usage/backup-and-restore/mariadb-backup/mariadb-backup-overview#installing-mariadb-backup)
+* [mysqldump](https://dev.mysql.com/doc/refman/9.7/en/mysqldump.html)
diff --git a/docs/query-optimization/schema-change-troubleshooting.md b/docs/query-optimization/schema-change-troubleshooting.md
new file mode 100644
index 0000000..e3f9015
--- /dev/null
+++ b/docs/query-optimization/schema-change-troubleshooting.md
@@ -0,0 +1,94 @@
+---
+id: schema-change-troubleshooting
+title: Schema Change Troubleshooting
+---
+
+# Schema Change Troubleshooting
+
+This guide helps you troubleshoot failed **automatic schema changes** executed by the Releem Agent. Use it when Releem cannot apply an index or table change automatically and the Releem Dashboard shows a failed task.
+
+When a change fails, open the failed task in the Releem Dashboard and check:
+
+- **Apply Index Error** - the detailed message, usually including `Statement N failed: ...`.
+- **Agent logs** - useful when the dashboard message is not enough. See [How to Check Releem Agent Logs](/releem-agent/how-to-check-logs).
+
+## Before you retry
+
+1. Read the exact output in the Releem Dashboard.
+2. Match the message to the table below.
+3. Fix the server-side issue first. Retrying without changing anything usually fails again.
+4. If the error says the payload is invalid or empty, contact Releem support with the task id.
+
+Automatic schema changes are intended for environments where the Releem Agent is allowed to make DDL changes. The Agent must have enough MySQL privileges, access to the configured backup tools, and `enable_exec_ddl = true` in `/opt/releem/releem.conf` when automatic DDL execution is enabled.
+For configuration prerequisites, see [Automatic Schema Changes](query-optimization/automatic-schema-changes).
+
+---
+
+## Errors before execution starts
+
+| Scenario | Troubleshooting steps |
+| --- | --- |
+| DDL failed syntax validation | Fix the SQL in Releem (or cancel and recreate the change). The task output includes `syntax validation failed` and any `syntax_error` detail from analysis. Do not retry the same statement until the DDL is corrected. |
+| Schema change execution disabled | Set `enable_exec_ddl = true` in `/opt/releem/releem.conf` (or your config path), restart the agent, and retry the change from Releem. |
+| Invalid or malformed task payload | This is not fixable on the server alone: the task JSON from Releem is invalid or missing required fields (`schema_name`, `ddl_statement`, `analysis_results.schema_name`, `analysis_results.table_name`). Contact Releem support with the task id; retry after the platform resends a valid payload. |
+| Empty schema change list | The task contained no statements to run. Retry from Releem or contact support if the change should have been scheduled. |
+
+---
+
+## Errors during validation (per statement)
+
+These stop the task before any DDL or backup runs on the server.
+
+| Scenario | Troubleshooting steps |
+| --- | --- |
+| No safe execution method | Releem analysis marked the change as safe for neither online DDL nor `pt-online-schema-change`. The change cannot run without temporarily blocking the affected tables, so it will not be executed automatically. A maintenance window for manual execution is required. Contact Releem for more details about this scenario. |
+| Pre-change backup required but PITR unavailable | A pre-change table backup was requested, but point-in-time recovery is not available on this instance (the binary log is not enabled or the retention window is too small). Enable the binary log on the server by configuring `log_bin` and make sure `expire_logs_days` is equal to or greater than 2 (on MySQL 8.0 and later, retention is normally set through `binlog_expire_logs_seconds`). Alternatively, disable the pre-change backup requirement for this change. |
+
+
+---
+
+## Errors on Backup or Execution
+
+### Disk space and filesystem capacity
+
+| Scenario | Error message | Troubleshooting steps |
+| --- | --- | --- |
+| Insufficient space on MySQL datadir | `insufficient datadir free space: ... required >10%` or `insufficient datadir capacity for table change: projected usage ... exceeds 90% limit` | Free space must stay **above 10%** and projected use after the change must stay **at or below 90%**. Free up space on the datadir filesystem, archive or drop unused data, or shrink large tables before retrying. This check can be disabled by setting `disable_space_checks = true` in `releem.conf`, although **it is not recommended**. Do so only as a last resort and only temporarily. |
+| Insufficient space in backup directory | `backup failed: insufficient disk space: required ... available ...` | Free up space on the volume that holds `backup_dir` (default `/tmp/backups`), point `backup_dir` to a larger filesystem, or lower `backup_space_buffer` only if you accept less safety margin. |
+| Cannot read datadir or table size | `failed to resolve datadir`, `datadir is empty`, `failed to get table size`, `failed to check datadir filesystem capacity`, or `invalid datadir filesystem size` | Verify that the agent database user has the necessary permissions on the target table. Check [Automatic Schema Changes](query-optimization/automatic-schema-changes) for more details. |
+| Cannot check backup directory | `failed to check disk space` or `failed to create backup directory` | Ensure `backup_dir` exists and is accessible by the agent process. |
+
+
+### Pre-change backup
+
+| Scenario | Error message | Troubleshooting steps |
+| --- | --- | --- |
+| mysqldump backup failed | `backup failed: mysqldump failed: ...` | Make sure `mysqldump` is installed on the server and available at `mysqldump_path`. Confirm the agent database user has access to the target table. |
+| XtraBackup `backup` or `prepare` failed | `backup failed: xtrabackup backup failed: ...` or `backup failed: xtrabackup prepare failed: ...` | Install a compatible version of **xtrabackup** (or **mariabackup** if the target host is running MariaDB) and confirm the tool is available at `xtrabackup_path`. Verify the agent database user has all necessary privileges. |
+| Backup size estimate failed | `failed to estimate backup size: ...` | Check that the target table still exists. It is possible that the table was renamed or dropped after the recommended change was generated. |
+
+
+### Online DDL (including dry-run on test table)
+
+| Scenario | Error message | Troubleshooting steps |
+| --- | --- | --- |
+| Online DDL preflight (dry-run) failed | `schema change execution failed: online DDL preflight failed on test table ...` | The agent clones the table into `online_ddl_test_schema` (default `releem_online_ddl_test`) and runs the DDL there first. Make sure the agent database user has the necessary permissions. Check [Automatic Schema Changes](query-optimization/automatic-schema-changes) for more details. |
+| Online DDL failed on production table | `schema change execution failed: ...` (after preflight succeeded) | An unexpected error caused the schema change to fail on the production table. Check the agent log for additional errors and contact Releem support. |
+| Test schema cannot be created | `schema change execution failed: test schema is required for online DDL preflight`, `... failed to create test schema ...`, or `... failed to create test table ...` | Make sure the agent database user has the necessary permissions. Check [Automatic Schema Changes](query-optimization/automatic-schema-changes) for more details. |
+| Lock wait timeout | `failed to set session lock_wait_timeout: ...`, or `schema change execution failed: ...` mentioning lock wait / metadata locks | Online DDL sets `lock_wait_timeout = 20`. If errors mention lock wait or metadata locks, clear blocking transactions and retry, or retry execution during a maintenance window. |
+
+
+### pt-online-schema-change
+
+| Scenario | Error message | Troubleshooting steps |
+| --- | --- | --- |
+| `pt-online-schema-change` dry-run failed | `pt-online-schema-change execution failed: pt-online-schema-change dry-run failed: ...` | Install [Percona Toolkit](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html), set `ptosc_path`, and grant the agent database user the required permissions for the target table. Depending on your MySQL version and topology, pt-online-schema-change may require privileges such as `SELECT`, `INSERT`, `DROP`, `RELOAD`, `SUPER`, `SHOW VIEW`, `TRIGGER`. |
+| `pt-online-schema-change` execute failed | `pt-online-schema-change execution failed: pt-online-schema-change failed: ...` | Dry-run passed but the execute step failed. Check the pt-online-schema-change output in the logs (triggers, replicas, disk, permissions, etc.) and contact Releem support. |
+
+
+## No statements executed
+
+
+| Scenario | Troubleshooting steps |
+| --- | --- |
+| No schema changes executed | Task output includes `No schema changes were executed.` This is returned when the loop finishes without applying any statement (unusual if earlier validation passed). Review the full task output and agent logs; retry from Releem or contact support with the task id. |
diff --git a/sidebars.js b/sidebars.js
index 62d0c72..d77b0c5 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -93,6 +93,8 @@ const sidebars = {
         'query-optimization/enable-sql-query-optimization',
         'query-optimization/disable-sql-query-optimization',
         'query-optimization/prepared-statements-issue',
+        'query-optimization/automatic-schema-changes',
+        'query-optimization/schema-change-troubleshooting',
       ],
     },
     {