Cherry-pick missing pg-dump commits from gp7 by Smyatkin-Maxim · Pull Request #1821 · apache/cloudberry

Smyatkin-Maxim · 2026-06-15T11:06:09Z

Here are quite a few commints from gp7 to support gpupgrade from 6 to 7. Most of these commits are itself borrowed from PG. We also need them to support gpupgrade from GP6 to Cloudberry.

At the moment I'm targeting Cloudberry 2, completely igniring main. My goal is to upgrade production clusters from GP6 to Cloudberry 2. As soon as this goal is reached, we might start thinking about either GP6->CB3 support or at least CB2->CB3 support.

These commits should't break anything but aren't enough for an upgrade. For gpupgrade to work we need this PR to land + a bunch of additional patches to core(15-20 smallish commits) and a bunch of patches to gpupgrade itself.

I already have a prototype which completely restores regression dataset (the one created during installcheck), it consists of 18 patches:

ls -lt ./gpupgrade-patches/
total 108
-rw-rw-r-- 1 smiatkin smiatkin  1788 Jun 14 21:22 0018-pg_dump-drop-unused-tgispartition-alias-in-getTrigge.patch
-rw-rw-r-- 1 smiatkin smiatkin  1496 Jun 14 21:22 0017-pg_dump-restrict-default-type-storage-options-to-bas.patch
-rw-rw-r-- 1 smiatkin smiatkin  2134 Jun 14 21:22 0016-pg_dump-simplify-attribute-encoding-attrelid-query.patch
-rw-rw-r-- 1 smiatkin smiatkin  1943 Jun 14 21:22 0015-pg_dump-fix-getPartitionDefs-GPDB6-version-guard-and.patch
-rw-rw-r-- 1 smiatkin smiatkin  6675 Jun 14 21:22 0014-pg_dump-address-external-partition-EXCHANGE-without-.patch
-rw-rw-r-- 1 smiatkin smiatkin  1801 Jun 14 21:22 0013-pg_dump-skip-subpartition-TEMPLATE-during-binary-upg.patch
-rw-rw-r-- 1 smiatkin smiatkin  3630 Jun 14 21:22 0012-pg_dump-only-emit-column-ENCODING-for-AOCO-tables.patch
-rw-rw-r-- 1 smiatkin smiatkin  4228 Jun 14 21:22 0011-pg_dump-synthesize-table-AM-from-relstorage-and-stri.patch
-rw-rw-r-- 1 smiatkin smiatkin  6360 Jun 14 21:22 0010-pg_dump-translate-gp_default_storage_options-to-defa.patch
-rw-rw-r-- 1 smiatkin smiatkin  1867 Jun 14 21:22 0009-pg_dump-add-toast-table-OID-to-binary-upgrade-preass.patch
-rw-rw-r-- 1 smiatkin smiatkin  3740 Jun 14 21:22 0008-pg_dump-force-array-type-OID-preassignment-for-AO-AO.patch
-rw-rw-r-- 1 smiatkin smiatkin  8490 Jun 14 21:22 0007-pg_dump-skip-toast-AO-aux-OID-preassignment-for-lega.patch
-rw-rw-r-- 1 smiatkin smiatkin  1157 Jun 14 21:22 0006-pg_dump-fix-GPDB7-CBDB2-major-version-constants.patch
-rw-rw-r-- 1 smiatkin smiatkin  4153 Jun 14 21:22 0005-pg_dumpall-dump-role-PASSWORD-and-gate-GPDB-profile-.patch
-rw-rw-r-- 1 smiatkin smiatkin 13568 Jun 14 21:22 0004-pg_upgrade-map-GP5-6-partition-parents-and-external-.patch
-rw-rw-r-- 1 smiatkin smiatkin  1189 Jun 14 21:22 0003-pg_upgrade-default-file_encryption_method-for-cluste.patch
-rw-rw-r-- 1 smiatkin smiatkin  1317 Jun 14 21:22 0002-Map-AO-compresslevel-5-and-6-to-zstd-in-datumstream.patch
-rw-rw-r-- 1 smiatkin smiatkin  2776 Jun 14 21:22 0001-Allow-RLE-compresslevel-up-to-6-when-zstd-is-availab.patch

Those are partially my own, partially LLM-generated. So I'll have to invest some time to sanity-check it. But it indeed allows a very complex gpupgrade for regression databases to pass. So, more PRs are coming, but probably very small and simple ones.

Smyatkin-Maxim · 2026-06-15T12:29:12Z

create_view test is failing. 1) I touched it, 2) there was a conflict. Will take a look shortly.
UPD: done

When pg_dump retrieves the list of database objects and performs the data dump, there was possibility that objects are replaced with others of the same name, such as views, and access them. This vulnerability could result in code execution with superuser privileges during the pg_dump process. This issue can arise when dumping data of sequences, foreign tables (only 13 or later), or tables registered with a WHERE clause in the extension configuration table. To address this, pg_dump now utilizes the newly introduced restrict_nonsystem_relation_kind GUC parameter to restrict the accesses to non-system views and foreign tables during the dump process. This new GUC parameter is added to back branches too, but these changes do not require cluster recreation. Back-patch to all supported branches. Reviewed-by: Noah Misch Security: CVE-2024-7348 Backpatch-through: 12

During a batch rebranding from Greenplum to Cloudberry we've lost ability to work with Greenplum from pg_dump and psql

Backported from PG15 d782d59 Original commit message follows: When looping over the resultset from a SQL query, extracting the field number before the loop body to avoid repeated calls to PQfnumber is an established pattern. On very wide tables this can have a performance impact, but it wont be noticeable in the common case. This fixes a few queries which were extracting the field number in the loop body. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/OS0PR01MB57164C392783F29F6D0ECA0B94139@OS0PR01MB5716.jpnprd01.prod.outlook.com

Backported from PG15 4438eb4 Conflicts resolved by CBDB: * get foreignserver back * add islvm, isdynamic and amoid Conflicts resolved by GP7: * Include GPDB specific fields: relstorage, parrelid, parlevel * Excluded BM_BITMAPINDEX_NAMESPACE * Excluded upstream foreignserver field * Removed checks for version <= 80200 Original commit message follows: Along the same lines as 0473296, ed2c7f6 and daa9fe8, reduce code duplication by having just one copy of the parts of the query that are the same across all server versions; and make the conditionals control the smallest possible amount of code. This also gets rid of the confusing assortment of different ways to accomplish the same result that we had here before. While at it, make sure all three relevant parts of the function list the fields in the same order. This is just neatnik-ism, of course. Discussion: https://postgr.es/m/1240992.1634419055@sss.pgh.pa.us

Backported from PG15 5209c0b Conflicts Resolved: * Rejected changes to binary_upgrade functions. There is a large diff with GPDB specific changes that will be refactored in a future commit. * Retain GPDB_96_MERGE_FIXME for parrelid check. This will be revisited in a future commit. * GPDB specific dumpFunctionName caused changes to dumpTSParser to be misaligned. Manually resolved. Original commit message follows: Split the DumpableObject.dump bitmask field into separate bitmasks tracking which components are requested to be dumped (in the existing "dump" field) and which components exist for the particular object (in the new "components" field). This gets rid of some klugy and easily-broken logic that involved setting bits and later clearing them. More importantly, it restores the originally intended behavior that pg_dump's secondary data-gathering queries should not be executed for objects we have no interest in dumping. That optimization got broken when the dump flag was turned into a bitmask, because irrelevant bits tended to remain set in many cases. Since the "components" field starts from a minimal set of bits and is added onto as needed, ANDing it with "dump" provides a reliable indicator of what we actually have to dump, without having to complicate the logic that manages the request bits. This makes a significant difference in the number of queries needed when, for example, there are many functions in extensions. Discussion: https://postgr.es/m/2273648.1634764485@sss.pgh.pa.us Discussion: https://postgr.es/m/7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc

Backported from PG15 0c9d844 Conflicts resolved: * Rejected changes to special handling of public schema from upstream a7a7be1 and b073c3c which have not been backported. These changes cause pg_dump to believe that schema public has a specific initprivs value even when it doesn't. In GPDB pg_upgrade we drop and recreate the public schema and with the above changes the public schema ACL would be different in the destination cluster. Ref: https://www.postgresql.org/message-id/20201031163518.GB4039133@rfd.leadboat.com * Refactored getExtProtocols to use new ACL logic * Removed code handling pre 8.3 servers Original commit message follows: Throw away most of the existing logic for this, as it was very inefficient thanks to expensive sub-selects executed to collect ACL data that we very possibly would have no interest in dumping. Reduce the ACL handling in the initial per-object-type queries to be just collection of the catalog ACL fields, as it was originally. Fetch pg_init_privs data separately in a single scan of that catalog, and do the merging calculations on the client side. Remove the separate code path used for pre-9.6 source servers; there is no good reason to treat them differently from newer servers that happen to have empty pg_init_privs. Discussion: https://postgr.es/m/2273648.1634764485@sss.pgh.pa.us Discussion: https://postgr.es/m/7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc

Backported from PG15 9895961 Conflicts resolved: * Handle GPDB specific attencoding field in getTableAttrs. * Rejected changes to getTriggers introduced by f0e21f2 which has not been backported because triggers are not supported in GPDB. * Rejected attcompression field in getTableAttrs which was introduced in PG14 and has not been backported because it is not supported in GPDB. Original commit message follows: Instead of issuing a secondary data-collection query against each table to be dumped, issue just one query, with a WHERE clause restricting it to be applied to only the tables we intend to dump. Likewise for indexes, constraints, and triggers. This greatly reduces the number of queries needed to dump a database containing many tables. It might seem that WHERE clauses listing many target OIDs could be inefficient, but at least on recent server versions this provides a very substantial speedup. (In principle the same thing could be done with other object types such as functions; but that would require significant refactoring of pg_dump, so those will be tackled in a different way in a following patch.) The new WHERE clauses depend on the unnest() function, which is only present in 8.4 and above. We could implement them differently for older servers, but there is an ongoing discussion that will probably result in dropping pg_dump support for servers before 9.2, so that seems like it'd be wasted work. For now, just bump the server version check to require >= 8.4, without stopping to remove any of the code that's thereby rendered dead. We'll mop that situation up soon. Patch by me, based on an idea from Andres Freund. Discussion: https://postgr.es/m/7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc

Backported from PG15 be85727 Conflicts resolved: * Reject changes to dumpTableAttach, which added a query for partbound details in e3fcbbd and wasn't backported as it was considered too invasive. * Handle GPDB specific callbackfunc and prodataaccess fields in dumpFunc Original commit message follows: For objects such as functions, pg_dump issues the same secondary data-collection query against each object to be dumped. This can't readily be refactored to avoid the repetitive queries, but we can PREPARE these queries to reduce planning costs. This patch applies the idea to functions, aggregates, operators, and data types. While it could be carried further, the remaining sorts of objects aren't likely to appear in typical databases enough times to be worth worrying over. Moreover, doing the PREPARE is likely to be a net loss if there aren't at least some dozens of objects to apply the prepared query to. Discussion: https://postgr.es/m/7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc

Rearrange the version-dependent pieces in the new more modular style. Discussion: https://www.postgresql.org/message-id/flat/67a28a3f-7b79-a5a9-fcc7-947b170e66f0%40enterprisedb.com (cherry-picked from e2c52be)

Backported from PG15 d5e8930 Conflicts resolved: * Updated getExtProtocols to use new method of role name lookups. Original commit message follows: Get rid of the "username_subquery" mechanism in favor of doing local lookups of role names from role OIDs. The PG backend isn't terribly smart about scalar SubLinks in SELECT output lists, so this offers a small performance improvement, at least in installations with more than a couple of users. In any case the old method didn't make for particularly readable SQL code. While at it, I removed the various custom warning messages about failing to find an object's owner, in favor of just fatal'ing in the local lookup function. AFAIK there is no reason any longer to treat that as anything but a catalog-corruption case, and certainly no reason to make translators deal with a dozen different messages where one would do. (If it turns out that fatal() is indeed a bad idea, we can back off to issuing pg_log_warning() and returning an empty string, resulting in the same behavior as before, except more consistent.) Also drop an entirely unnecessary sub-SELECT to check on the pg_depend status of a sequence relation: we already have a LEFT JOIN to fetch the row of interest in the FROM clause. Discussion: https://postgr.es/m/2460369.1640903318@sss.pgh.pa.us

Remove remaining checks for versions pre-8.3 Authored-by: Brent Doil <bdoil@vmware.com>

Make sure tblinfo[i].parrelid is set. This was lost in the upstream backports, and fixes a bug when dumping from GPDB5/GPDB6. Authored-by: Brent Doil <bdoil@vmware.com>

Authored-by: Brent Doil <bdoil@vmware.com>

Remove remaining references to pre 8.3 servers. Authored-by: Brent Doil <bdoil@vmware.com>

Partitioning has existed in GPDB for a long time (4.2 and before). So, no reason to check for existence of these features since we do not support dumps from servers <= GPDB5. Removing the check because we expect these to exist. Fixes https://github.com/greenplum-db/gpdb/issues/5170.

To gather binary upgrade information for GPDB{5,6} partition children, they need to exist in the tblinfo array for findTableByOid. This requires the getTable query to also collect all GPDB partition children, then filter them out of the dump so we only dump the partition root DDL. Previously, partition children were being excluded from the getTables query and gathered separately in dumpTableSchema. This allows dumpTableSchema to perform lookups of the partition children to dump their OIDs for binary upgrade. Authored-by: Brent Doil <bdoil@vmware.com>

GPDB5/6 do not support RLS. Without the version gate a dump and restore on these server versions will fail. Authored-by: Brent Doil <bdoil@vmware.com>

Move pg_get_partition_def and pg_get_partition_template_def queries from dumpTableSchema to getTables. The functions are only run for parent tables. This change reduces the number of repetitive queries and simplifies dumpTableSchema. Authored-by: Brent Doil <bdoil@vmware.com>

Two conditionals in getTableSchema were testing remote version <= 90400 for GPDB5/6 specific code, which is a bug because it omits GPDB6 (90424 or 90426) altogether. This commit fixes the issue by defining and using a macro for the major Postgres versions for GPDB5-7. These are then used in GPDB specific version checks. Also cleans up some now unused functions. Also check tbinfo->partclause before trying to dereference the pointer. Authored-by: Brent Doil <bdoil@vmware.com>

Authored-by: Brent Doil <bdoil@vmware.com>

As in gp6 there is no objtype abbreviation E in 'acldefault' fuction.

I met this error when I tried to use pg_dump in main branch to dump ao table from gpdb6. Co-authored-by: gpadmin <gpadmin@localhost.localdomain>

Backport of upstream commit: a563c24 Resolved merge conflicts in: * 002_pg_dump.pl due to multiple tests present in upstream that are not yet present in gpdb codebase. * minor merge conflicts in pg_dump.sgml, unclear their source, likely formatting-derived backported by: Andrew Repp Original commit message: This patch adds new pg_dump switches --table-and-children=pattern --exclude-table-and-children=pattern --exclude-table-data-and-children=pattern which act the same as the existing --table, --exclude-table, and --exclude-table-data switches, except that any partitions or inheritance child tables of the table(s) matching the pattern are also included or excluded. Gilles Darold, reviewed by Stéphane Tachoires Discussion: https://postgr.es/m/5aa393b5-5f67-8447-b83e-544516990ee2@migops.com

Currently, when backing up a GP5 or GP6 database, getTables will hold a lock on all partition tables in the database by calling `pg_get_partition_def` and `pg_get_partition_template_def` on each table, resulting in tables being locked that are not part of the backup set. This commit extracts the `pg_get_partition_def` and `pg_get_partition_template_def` function calls out of getTables and into getPartitionDefs. Now, only partition tables in the backup set will be included in calls to these functions. Authored-by: Brent Doil <bdoil@vmware.com>

Backported from PG15 e3fcbbd Conflicts resolved: * OidOptions enum value zeroIsError does not exist, instead used the existing zeroAsOpaque. * Minor conflicts in dumpTableSchema resulting from foreign server related code introduced in PG13 commit 4e62091 Original commit message follows: Avoid calling pg_get_partkeydef(), pg_get_expr(relpartbound), and regtypeout until we have lock on the relevant tables. The existing coding is at serious risk of failure if there are any concurrent DROP TABLE commands going on --- including drops of other sessions' temp tables. Arguably this is a bug fix that should be back-patched, but it's moderately invasive and we've not had all that many complaints about such failures. Let's just put it in HEAD for now. Discussion: https://postgr.es/m/2273648.1634764485@sss.pgh.pa.us Discussion: https://postgr.es/m/7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc

In order to support 6 > 7 upgrade using pg_upgrade. 7x pg_dumpall needs to be able to dump 6x resource groups correctly. Supporting the dumping of 6x resource group resolves the following error when trying to upgrade from 6 > 7. ``` ALTER RESOURCE GROUP "default_group" SET cpu_weight 0; psql:pg_upgrade_dump_globals.sql:27: ERROR: cpu_weight range is [1, 500] ``` This happens because some of the attribute ordering in pg_resgroupcapability changed from 6x to 7x. To resolve this issue, the following mapping was done based on 6x and 7x resource group docs. New attribute cpu_weight is set to 7x default value of 100. 6x | 7x ---------------+----------------- concurrency | concurrency cpu_rate_limit | cpu_max_percent 100 | cpu_weight cpuset | cputset reference resource group docs: https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/admin_guide-workload_mgmt_resgroups.html

pg_dump does not correctly dump functions when both greenplum specific internal flags `--function-oid` and `--relation-oid` are used at the same time. This is because functions are being marked dumpable or not twice. The first time is by selectDumpableFunction and then again by selectDumpableObject. When `--relation-oid` is used, all namespaces are marked not dumpable which is why selectDumpableObject will mark the function to not be dumped. This must be a regression. The original commit that introduces greenplum specific selectDumpableFunction removes the usage of the more generic selectDumpableObject. selectDumpableFunction reference commit: https://github.com/greenplum-db/gpdb/commit/318b86afc3eb5745f07ddfb390970b1769c63a1b

Commit 992b4dd fixes a pg_dump regression when using --function-oid where functions were incorrectly being marked to not dump. However, this fix revealed a second pg_dump regression with selectDumpableFunction which was being masked by the first regression. selectDumpableFunction now needed updating to work with the new method of dumping that uses bit masking that was backported in github.com/greenplum-db/gpdb/commits/07a3ffef6422275f76818d9c445c31495460889a. It also needed updating to work with function dumping when extensions are involved. selectDumpableAggregate was created so aggregate dumping works as intended when using --function-oid on an aggregate function. This is necessary because pg_dump treats aggregates separately from functions. See the comment in getFuncs for details.

As of GPDB7, gp_toolkit is an extension. During gpinitsystem gp_toolkit extension is automatically created in template1. During 6 > 7 upgrade, we should assume that this extension will always be there. In future upgrades, gp_toolkit will be treated like any other extension.

selectSourceSchema() was removed, remove the fixme but keep the comment since it's still valid.

reshke · 2026-06-15T18:57:03Z

BTW we already have CVE-2024-7348 fix (first commit in this PR) as 8d01027 in PG14ARCHIVE branch
so this does a portion of #1819
anyway LGTM

leborchuk

LGTM

Smyatkin-Maxim force-pushed the pg-dump-rebase-REL-2 branch from ec6f1c8 to d736dfe Compare June 15, 2026 11:35

MasahikoSawada and others added 28 commits June 15, 2026 17:28

pg_dump/psql: properly recognize GP

888dc26

During a batch rebranding from Greenplum to Cloudberry we've lost ability to work with Greenplum from pg_dump and psql

pg_dump: Refactor getIndexes()

3f311d3

Rearrange the version-dependent pieces in the new more modular style. Discussion: https://www.postgresql.org/message-id/flat/67a28a3f-7b79-a5a9-fcc7-947b170e66f0%40enterprisedb.com (cherry-picked from e2c52be)

pg_dump: Remove dead code.

119b2b0

Remove remaining checks for versions pre-8.3 Authored-by: Brent Doil <bdoil@vmware.com>

pg_dump: Fix getTables parrelid

2057075

Make sure tblinfo[i].parrelid is set. This was lost in the upstream backports, and fixes a bug when dumping from GPDB5/GPDB6. Authored-by: Brent Doil <bdoil@vmware.com>

pg_dump: Put GPDB global vars in DumpOptions

e40da99

Authored-by: Brent Doil <bdoil@vmware.com>

pg_dump: Formatting fixes and code cleanup

5dddb95

Remove remaining references to pre 8.3 servers. Authored-by: Brent Doil <bdoil@vmware.com>

pg_dump: Remove DumpOptions argument from testGPbackend

f38a5b8

pg_dump: Omit BYPASSRLS clause for pre GPDB7 server

21feac1

GPDB5/6 do not support RLS. Without the version gate a dump and restore on these server versions will fail. Authored-by: Brent Doil <bdoil@vmware.com>

pg_dump: Remove unneeded version check for quote_all_identifiers

2830f8e

Authored-by: Brent Doil <bdoil@vmware.com>

Fix external protocol dump from gp6 using gp7 pg_dump.

e61e494

As in gp6 there is no objtype abbreviation E in 'acldefault' fuction.

Fix pg_dump small bug (#14991)

c6c1d3e

I met this error when I tried to use pg_dump in main branch to dump ao table from gpdb6. Co-authored-by: gpadmin <gpadmin@localhost.localdomain>

bmdoil and others added 4 commits June 15, 2026 17:28

pg_dumpall: Fix syntax error when dumping RGs

23470d5

Remove a fixme referring selectSourceSchema()

5d34e12

selectSourceSchema() was removed, remove the fixme but keep the comment since it's still valid.

Smyatkin-Maxim force-pushed the pg-dump-rebase-REL-2 branch from d736dfe to 5d34e12 Compare June 15, 2026 14:29

reshke approved these changes Jun 15, 2026

View reviewed changes

leborchuk approved these changes Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cherry-pick missing pg-dump commits from gp7#1821

Cherry-pick missing pg-dump commits from gp7#1821
Smyatkin-Maxim wants to merge 32 commits into
REL_2_STABLEfrom
pg-dump-rebase-REL-2

Smyatkin-Maxim commented Jun 15, 2026

Uh oh!

Smyatkin-Maxim commented Jun 15, 2026 •

edited

Loading

Uh oh!

reshke commented Jun 15, 2026

Uh oh!

leborchuk left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

Conversation

Smyatkin-Maxim commented Jun 15, 2026

Uh oh!

Smyatkin-Maxim commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

reshke commented Jun 15, 2026

Uh oh!

leborchuk left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

Smyatkin-Maxim commented Jun 15, 2026 •

edited

Loading