diff --git a/src/current/molt/migration-strategy.md b/src/current/molt/migration-strategy.md
index e5e3e532a5c..e5439b0253c 100644
--- a/src/current/molt/migration-strategy.md
+++ b/src/current/molt/migration-strategy.md
@@ -47,6 +47,8 @@ It's important to fully [prepare the migration](#prepare-for-migration) in order
 
 ## Capacity planning
 
+### Cluster sizing
+
 To size the target CockroachDB cluster, consider your data volume and workload characteristics:
 
 - What is the total size of the data you will migrate?
@@ -64,6 +66,10 @@ If you are migrating to a CockroachDB {{ site.data.products.core }} cluster:
 - Refer to our [storage recommendations]({% link {{ site.current_cloud_version }}/recommended-production-settings.md %}#storage) to determine the amount of storage to provision on each node.
 - For guidance on sizing for connection pools, see the CockroachDB {{ site.data.products.core }} [Production Checklist]({% link {{ site.current_cloud_version }}/recommended-production-settings.md %}#connection-pooling).
 
+### Memory allocation
+
+MOLT Fetch buffers data in memory regardless of the [data path]({% link molt/molt-fetch.md %}#data-path) used. For memory sizing requirements, refer to [Memory requirements]({% link molt/molt-fetch.md %}#memory-requirements).
+
 ## Application changes
 
 As you develop your migration plan, consider the application changes that you will need to make. These may include the following changes:
diff --git a/src/current/molt/molt-fetch.md b/src/current/molt/molt-fetch.md
index 70cdfb60742..e7b6063193c 100644
--- a/src/current/molt/molt-fetch.md
+++ b/src/current/molt/molt-fetch.md
@@ -372,12 +372,6 @@ For example, if you are migrating to CockroachDB {{ site.data.products.cloud }},
 
 `--direct-copy` specifies that MOLT Fetch should use `COPY FROM` to move the source data directly to CockroachDB without an intermediate store:
 
-- Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight:
-
-  ~~~
-  average size of each row * --row-batch-size * --export-concurrency * --table-concurrency
-  ~~~
-
 - Direct copy does not support compression or [continuation](#fetch-continuation).
 
 - The [`--use-copy`](#data-load-mode) flag is redundant with `--direct-copy`.
@@ -1095,23 +1089,31 @@ To verify that your connections and configuration work properly, run MOLT Fetch
 
 - If a PostgreSQL database is set as a [source](#source-and-target-databases), ensure that [`idle_in_transaction_session_timeout`](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) on PostgreSQL is either disabled or set to a value longer than the duration of the [data export phase](#data-export-phase). Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of [`molt_fetch_table_export_duration_ms`](#monitoring) for all exported tables.
 
-### Optimize performance
+### Memory requirements
 
-- {% include molt/molt-drop-constraints-indexes.md %}
+- MOLT Fetch buffers data in memory before flushing to files or the target database, regardless of which [data path](#data-path) you use. To prevent out-of-memory failures during [data export](#data-export-phase), estimate the amount of memory required to export tables:
+
+  ~~~
+  average size of each row * --row-batch-size * --export-concurrency * --table-concurrency
+  ~~~
 
-- For PostgreSQL sources using [`--use-stats-based-sharding`](#global-flags), run [`ANALYZE`]({% link {{ site.current_cloud_version }}/create-statistics.md %}) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance.
+  If you are exporting multiple tables concurrently (i.e., [`--table-concurrency`](#global-flags) is set higher than `1`), estimate the memory requirement using the table with the largest average row size. For details on how concurrency and sharding interact, refer to [Table sharding](#table-sharding).
 
-- To prevent memory outages during `READ COMMITTED` [data export](#data-export-phase) of tables with large rows, estimate the amount of memory used to export a table:
+  Insufficient memory can cause degraded throughput, process crashes, or migration failures. To prevent memory issues:
 
-  ~~~
-  --row-batch-size * --export-concurrency * average size of the table rows
-  ~~~
+  - Allocate enough memory for MOLT Fetch, based on the preceding formula.
+  - Adjust [`--row-batch-size`](#global-flags), [`--export-concurrency`](#global-flags), or [`--table-concurrency`](#global-flags) if memory is constrained.
+  - Monitor memory utilization during the migration.
+
+### Optimize performance
+
+- {% include molt/molt-drop-constraints-indexes.md %}
 
-- If you are exporting more than one table at a time (i.e., [`--table-concurrency`](#global-flags) is set higher than `1`), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run `molt fetch`, and adjust `--row-batch-size` accordingly. For details on how concurrency and sharding interact, refer to [Table sharding](#table-sharding).
+- For PostgreSQL sources using [`--use-stats-based-sharding`](#global-flags), run [`ANALYZE`](https://www.postgresql.org/docs/current/sql-analyze.html) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance.
 
 - If a table in the source database is much larger than the other tables, [filter and export the largest table](#schema-and-table-selection) in its own `molt fetch` task. Repeat this for each of the largest tables. Then export the remaining tables in another task.
 
-- Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric](#monitoring) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding](#table-sharding).
+- Ensure that the machine running MOLT Fetch is equipped to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always make progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric](#monitoring) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding](#table-sharding).
 
 ### Import and continuation handling
 
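The memory formula this patch moves into the new "Memory requirements" section lends itself to a quick back-of-the-envelope check. The following is a minimal sketch in Python; the flag values shown are hypothetical examples, not MOLT Fetch defaults:

```python
def estimate_export_memory_bytes(avg_row_bytes, row_batch_size,
                                 export_concurrency, table_concurrency):
    """Estimate of bytes buffered at once during data export, per the
    formula in the docs:
      average size of each row * --row-batch-size
        * --export-concurrency * --table-concurrency
    """
    return avg_row_bytes * row_batch_size * export_concurrency * table_concurrency


# Hypothetical example: 512-byte rows, 100,000-row batches,
# 4 export shards, 2 tables exported concurrently.
estimate = estimate_export_memory_bytes(512, 100_000, 4, 2)
print(f"{estimate / 1024**3:.1f} GiB")  # ~0.4 GiB
```

As the docs note, when `--table-concurrency` is greater than `1`, use the largest average row size among the concurrently exported tables as the `avg_row_bytes` input.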