
Backport PG14_ARCHIVE in REL_2_STABLE stable #1819

Open
reshke wants to merge 1108 commits into main from bp_old_main


reshke commented Jun 13, 2026


Commits that change the ABI are excluded:

  1. 615f6ec
  2. 7ec9bdf
  3. fed0458
  4. e6ea65a
  5. 3ce9317

reshke and others added 30 commits April 23, 2026 14:04
A duct-tape fix for this was already added in fc8aab8, but the redo
path was not patched there. Copy the same guard into the body of
redoDistributedCommitRecord().
The number of NFA states, number of NFA arcs, and number of colors
are all bounded to reasonably small values.  However, there are
places where we try to allocate arrays sized by products of those
quantities, and those calculations could overflow, enabling
buffer-overrun attacks.  In practice there's no problem on 64-bit
machines, but there are some live scenarios on 32-bit machines.

A related problem is that citerdissect() and creviterdissect()
allocate arrays based on the length of the input string, which
potentially could overflow.

To fix, invent MALLOC_ARRAY and REALLOC_ARRAY macros that rely on
palloc_array_extended and repalloc_array_extended with the NO_OOM
option, similarly to the existing MALLOC and REALLOC macros.
(Like those, they'll throw an error rather than return a NULL result for
oversize requests.  This doesn't really fit into the regex code's
view of error handling, but it'll do for now.  We can consider
whether to change that behavior in a non-security follow-up patch.)

I installed similar defenses in the colormap construction code.
It's not entirely clear whether integer overflow is possible
there, but analyzing the behavior in detail seems not worth
the trouble, as the risky spots are not in hot code paths.

I left a bunch of calls as-is after verifying that they can't
overflow given reasonable limits on nstates and narcs.  Those
limits were enforced already via REG_MAX_COMPILE_SPACE, but
add commentary to document the interactions.

In passing, also fix a related edge case, which is that the
special color numbers used in LACON carcs could overflow the
"color" data type, if ncolors is close to MAX_COLOR.

In v14 and v15, the regex engine calls malloc() directly instead
of using palloc(), so MALLOC_ARRAY and REALLOC_ARRAY do likewise.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
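The overflow hazard in the regex commit above comes from computing `count * elem_size` without checking it. A minimal sketch of an overflow-checked array allocation follows. The names `checked_malloc_array` and `CHECKED_MALLOC_ARRAY` are hypothetical, and unlike the real MALLOC_ARRAY macros (which throw an error), this standalone version returns NULL on an oversize request.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate count elements of elem_size bytes, but refuse
 * requests whose byte size would overflow size_t (the 32-bit hazard above).
 * Returns NULL on overflow or allocation failure; the real MALLOC_ARRAY
 * macros throw an error instead.
 */
static void *
checked_malloc_array(size_t elem_size, size_t count)
{
	/* count * elem_size overflows exactly when count > SIZE_MAX / elem_size */
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return NULL;
	return malloc(elem_size * count);
}

/* Typed convenience wrapper, analogous in spirit to MALLOC_ARRAY. */
#define CHECKED_MALLOC_ARRAY(type, count) \
	((type *) checked_malloc_array(sizeof(type), (count)))
```

The division-based test is used because it detects overflow before the multiplication happens, which is the only reliable point to catch it in unsigned arithmetic.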
Some UTF8 characters decompose to more than a dozen codepoints.
It is possible for an input string that fits into well under
1GB to produce more than 4G decomposed codepoints, causing
unicode_normalize()'s decomp_size variable to wrap around to a
small positive value.  This results in a small output buffer
allocation and subsequent buffer overrun.

To fix, test after each addition to see if we've overrun MaxAllocSize,
and break out of the loop early if so.  In frontend code we want to
just return NULL for this failure (treating it like OOM).  In the
backend, we can rely on the following palloc() call to throw error.

I also tightened things up in the calling functions in varlena.c,
using size_t rather than int and allocating the input workspace
with palloc_array().  These changes are probably unnecessary
given the knowledge that the original input and the normalized
output_chars array must fit into 1GB, but it's a lot easier to
believe the code is safe with these changes.

Reported-by: Xint Code
Reported-by: Bruce Dang <bruce@calif.io>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 14
Security: CVE-2026-6473
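The unicode_normalize() fix above tests the running decomposition size after each addition. A sketch of that loop shape follows, assuming a hypothetical per-character length array `decomp_len` in place of the real decomposition-table lookup; `SKETCH_MAX_ALLOC_SIZE` is assumed to mirror PostgreSQL's 1 GB - 1 MaxAllocSize and is not taken from its headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror PostgreSQL's MaxAllocSize (1 GB - 1); not taken from its headers. */
#define SKETCH_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical sketch of the guard described above: sum per-character
 * decomposition lengths, testing the running total after each addition.
 * Each increment is at most 255, so as long as max_codepoints is far below
 * SIZE_MAX the total cannot wrap before the test catches it.
 * Returns false (to be treated like OOM) if the total exceeds max_codepoints.
 */
static bool
decomposed_size(const uint8_t *decomp_len, size_t n, size_t max_codepoints,
				size_t *result)
{
	size_t		total = 0;

	for (size_t i = 0; i < n; i++)
	{
		total += decomp_len[i];
		if (total > max_codepoints)
			return false;		/* break out early instead of wrapping */
	}
	*result = total;
	return true;
}
```

A caller would pass `SKETCH_MAX_ALLOC_SIZE / sizeof(uint32_t)` as `max_codepoints`, since each decomposed code point occupies four bytes in the output buffer.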
multirange_recv and BlockRefTableReaderNextRelation were incautious
about multiplying a possibly-large integer by a factor more than 1
and then using it as an allocation size.  This is harmless on 64-bit
systems where we'd compute a size exceeding MaxAllocSize and then
fail, but on 32-bit systems we could overflow size_t leading to an
undersized allocation and buffer overrun.

Fix these places by using palloc_array() instead of a handwritten
multiplication.  (In HEAD, some of them were fixed already, but
none of that work got back-patched at the time.)

In addition, BlockRefTableReaderNextRelation passes the same value
to BlockRefTableRead's "int length" parameter.  If built for
64-bit frontend code, palloc_array() allows a larger array size
than it otherwise would, potentially allowing that parameter to
overflow.  Add an explicit check to forestall that and keep the
behavior the same cross-platform.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
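The BlockRefTableReaderNextRelation part of the commit above adds a check because a size that is valid for an allocation may still not fit in an `int length` parameter. A sketch of such a narrowing guard follows; `length_fits_int` is a hypothetical name, not the function added by the commit.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical guard: before passing nelems * elem_size bytes to a routine
 * that takes an "int length" argument, verify that the product neither
 * overflows nor exceeds INT_MAX.  Since INT_MAX <= SIZE_MAX, passing the
 * INT_MAX test also rules out size_t overflow.
 */
static bool
length_fits_int(size_t nelems, size_t elem_size, int *length)
{
	if (elem_size != 0 && nelems > (size_t) INT_MAX / elem_size)
		return false;			/* caller should report an error */
	*length = (int) (nelems * elem_size);
	return true;
}
```

Doing the check explicitly keeps the behavior identical on 32-bit and 64-bit builds, which is the point the commit makes about frontend code.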
These two routines will be used in a test of an upcoming fix.  This
commit affects the v14~v17 range.  v18 and newer versions already
include them, thanks to 85ec945b7880.

Security: CVE-2026-6479
Backpatch-through: 14
The handling of SSL and GSS negotiation messages in
ProcessStartupPacket() could cause a recursion of the backend,
ultimately crashing the server as the negotiation attempts were not
tracked across multiple calls processing startup packets.

A malicious client could therefore alternate rejected SSL and GSS
requests indefinitely, each adding a stack frame, until the backend
crashed with a stack overflow, taking down a server.

This commit addresses this issue by modifying ProcessStartupPacket() so
that processed negotiation attempts are tracked, preventing infinite
recursive attempts.  A TAP test is added to check this problem, where
multiple SSL and GSS negotiation attempts are stacked.

Reported-by: Calif.io in collaboration with Claude and Anthropic
Research
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Security: CVE-2026-6479
Backpatch-through: 14
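The ProcessStartupPacket() fix above hinges on remembering which negotiations have already happened. The sketch below models that idea iteratively over a hypothetical list of request kinds; the enum values and `process_startup_sequence` are illustrative assumptions, not the protocol constants or the backend's actual control flow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds for this sketch; not the real protocol codes. */
typedef enum { REQ_SSL, REQ_GSS, REQ_STARTUP } request_kind;

typedef enum { NEG_OK, NEG_REJECT_REPEAT, NEG_NO_STARTUP } neg_result;

/*
 * Each negotiation type may be attempted at most once, whether it was
 * accepted or rejected.  A repeated SSL or GSS request is treated as a
 * protocol violation instead of starting another round, so an alternating
 * stream of requests cannot grow the stack.
 */
static neg_result
process_startup_sequence(const request_kind *reqs, size_t n)
{
	bool		ssl_done = false;
	bool		gss_done = false;

	for (size_t i = 0; i < n; i++)
	{
		switch (reqs[i])
		{
			case REQ_SSL:
				if (ssl_done)
					return NEG_REJECT_REPEAT;
				ssl_done = true;
				break;
			case REQ_GSS:
				if (gss_done)
					return NEG_REJECT_REPEAT;
				gss_done = true;
				break;
			case REQ_STARTUP:
				return NEG_OK;
		}
	}
	return NEG_NO_STARTUP;
}
```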
Define MaxAllocSize in src/include/common/fe_memutils.h rather
than having several copies of it in different src/common/*.c files.
This also provides an opportunity to document it better.

Back-patch of commit 11b7de4a7, needed now because assorted security
fixes are adding additional references to MaxAllocSize in frontend
code.

Backpatch-through: 14-17
Security: CVE-2026-6473
contrib/intarray's query_int type uses an int16 field to hold the
offset from a binary operator node to its left operand.  However, it
allows the number of nodes to be as much as will fit in MaxAllocSize,
so there is a risk of overflowing int16 depending on the precise shape
of the tree.  Simple right-associative cases like "a | b | c | ..."
work fine, so we should not solve this by restricting the overall
number of nodes.  Instead add a direct test of whether each individual
offset is too large.

contrib/ltree's ltxtquery type uses essentially the same logic and
has the same 16-bit restriction.

(The core backend's tsquery.c has a variant of this logic too, but
in that case the target field is 32 bits, so it is okay so long
as varlena datums are restricted to 1GB.)

In v16 and up, these types support soft error reporting, so we have
to complicate the recursive findoprnd function's API a bit to allow
the complaint to be reported softly.  v14/v15 don't need that.

Undocumented and overcomplicated code like this makes my head hurt,
so add some comments and simplify while at it.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473
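The intarray/ltree fix above replaces a silent int16 wraparound with an explicit per-offset check. A sketch of that check follows; `store_left_offset` and its position arguments are hypothetical, and the real findoprnd() code walks the query tree differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: an operator node stores the distance to its left
 * operand in an int16 field.  Compute the distance in a wider type and
 * refuse to store it if it does not fit, instead of letting the narrowing
 * conversion wrap around.
 */
static bool
store_left_offset(int32_t op_pos, int32_t left_pos, int16_t *field)
{
	/* Compute in 64 bits so the subtraction itself cannot overflow. */
	int64_t		offset = (int64_t) op_pos - (int64_t) left_pos;

	if (offset < INT16_MIN || offset > INT16_MAX)
		return false;			/* caller reports an error */
	*field = (int16_t) offset;
	return true;
}
```

Checking each offset, rather than capping the total node count, lets long right-associative chains such as `a | b | c | ...` keep working, as the commit message notes.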
This commit applies timingsafe_bcmp() to authentication paths that
handle attributes or data previously compared with memcmp() or strcmp(),
which are sensitive to timing attacks.

The following data is affected by this change, some of it in the
backend and some in the frontend:
- For a SCRAM or MD5 password, the computed key or the MD5 hash compared
with a password during a plain authentication.
- For a SCRAM exchange, the stored key, the client's final nonce and the
server nonce.
- For RADIUS (up to v18), the encrypted password.
- For MD5 authentication, the MD5(MD5()) hash.

Reported-by: Joe Conway <mail@joeconway.com>
Security: CVE-2026-6478
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
timeofday() assumed that the output of pg_strftime() could not contain
% signs, other than the one it explicitly asks for with %%.  However,
we don't have that guarantee with respect to the time zone name (%Z).
A crafted time zone setting could abuse the subsequent snprintf()
call, resulting in crashes or disclosure of server memory.

To fix, split the pg_strftime() call into two and then treat the
outputs as literal strings, not a snprintf format string.  The
extra pg_strftime() call doesn't really cost anything, since the
bulk of the conversion work was done by pg_localtime().

Also, adjust buffer widths so that we're not risking string truncation
during the snprintf() step, as that would create a hazard of producing
mis-encoded output.

This also fixes a latent portability issue: the format string expects
an int, but tp.tv_usec is long int on many platforms.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
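The timeofday() fix above rests on one rule: text that may contain `%` is passed to snprintf() as a `%s` argument, never spliced into the format string. A sketch follows, assuming `head` and `tail` were produced by two separate strftime-style calls; `format_with_usec` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: head holds the date/time up to the seconds, tail holds
 * the year and time zone name.  Both are treated as literal data, so a '%'
 * inside a crafted zone name is copied verbatim.  usec is a long, matching
 * the platforms where tv_usec is long int.  Returns false if the output was
 * truncated (or snprintf failed).
 */
static bool
format_with_usec(char *out, size_t outlen, const char *head, long usec,
				 const char *tail)
{
	int			n = snprintf(out, outlen, "%s.%06ld %s", head, usec, tail);

	return n >= 0 && (size_t) n < outlen;
}
```

Reporting truncation lets the caller size its buffer so the output is never cut mid-character, the mis-encoding hazard the commit mentions.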
Although pg_strftime() has defined error conditions, no callers bother
to check for errors.  This is problematic because the output string is
very likely not null-terminated if an error occurs, so that blindly
using it is unsafe.  Rather than trusting that we can find and fix all
the callers, let's alter the function's API spec slightly: make it
guarantee a null-terminated result so long as maxsize > 0.

Furthermore, if we do get an error, let's make that null-terminated
result be an empty string.  We could instead truncate at the buffer
length, but that risks producing mis-encoded output if the tz_name
string contains multibyte characters.  It doesn't seem reasonable for
src/timezone/ to make use of our encoding-aware truncation logic.
Also, the only really likely source of a failure is a user-supplied
timezone name that is intentionally trying to overrun our buffers.
I don't feel a need to be particularly friendly about that case.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
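The API change above guarantees a null-terminated, empty result on error whenever maxsize > 0. The sketch below applies the same contract to the C library's strftime(), whose buffer contents are indeterminate when it returns 0; `strftime_safe` is a hypothetical wrapper, not pg_strftime() itself.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical wrapper: when maxsize > 0 the buffer is always
 * null-terminated, and on failure (result did not fit) it holds an empty
 * string instead of whatever partial output strftime() left behind.
 */
static size_t
strftime_safe(char *s, size_t maxsize, const char *format, const struct tm *tm)
{
	size_t		n;

	if (maxsize == 0)
		return 0;
	n = strftime(s, maxsize, format, tm);
	if (n == 0)
		s[0] = '\0';
	return n;
}
```

Returning an empty string rather than truncating avoids cutting a multibyte zone name in half, which matches the reasoning in the commit message.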
This omission allowed roles to create multirange types in any
schema, potentially leading to privilege escalations.  Note that
when a multirange type name is not specified in CREATE TYPE, it is
automatically placed in the range type's schema, which is checked
at the beginning of DefineRange().

Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Security: CVE-2026-6472
Backpatch-through: 14
A few functions in this file were incautious about multiplying a
possibly large integer by a factor more than 1 and then using it as
an allocation size.  This is harmless on 64-bit systems where we'd
compute a size exceeding MaxAllocSize and then fail, but on 32-bit
systems we could overflow size_t, leading to an undersized
allocation and buffer overrun.  To fix, use palloc_array() or
mul_size() instead of handwritten multiplication.

Reported-by: Sven Klemm <sven@tigerdata.com>
Reported-by: Xint Code
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Security: CVE-2026-6473
Backpatch-through: 14
pg_rewind and pg_basebackup could be fed paths from rogue endpoints that
could overwrite files on the client when received, achieving path
traversal.

There were two areas in the tree that were sensitive to this problem:
- pg_basebackup, through the astreamer code, where no validation was
performed before building an output path when streaming tar data.  This
is an issue in v15 and newer versions.
- pg_rewind file operations for paths received through libpq, for all
the stable branches supported.

In order to address this problem, this commit adds a helper function in
path.c that reuses path_is_relative_and_below_cwd() after applying
canonicalize_path().  This can be used to validate the paths received
from a remote endpoint.  A path is considered invalid if either of the
two following conditions is satisfied:
- The path is absolute.
- The path includes a direct parent-directory reference.

Reported-by: XlabAI Team of Tencent Xuanwu Lab
Reported-by: Valery Gubanov <valerygubanov95@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6475
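The two rejection rules above can be illustrated with a simplified, POSIX-only check. This sketch is hypothetical: `received_path_is_safe` does no canonicalization and ignores Windows drive letters, unlike the actual path.c helper built on canonicalize_path() and path_is_relative_and_below_cwd().

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical, simplified check: accept a received path only if it is
 * relative and contains no ".." component anywhere.
 */
static bool
received_path_is_safe(const char *path)
{
	const char *p = path;

	if (path[0] == '/')
		return false;			/* absolute path */

	while (*p != '\0')
	{
		size_t		len = strcspn(p, "/");	/* length of current component */

		if (len == 2 && p[0] == '.' && p[1] == '.')
			return false;		/* parent-directory reference */
		p += len;
		while (*p == '/')
			p++;				/* skip separator(s) */
	}
	return true;
}
```

Checking whole components means a name like `..foo` is still accepted, while `base/../../x` is rejected.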
pg_locale_icu.c was full of places where a very long input string
could cause integer overflow while calculating a buffer size,
leading to buffer overruns.

It also was cavalier about using char-type local arrays as buffers
holding arrays of UChar.  The alignment of a char[] variable isn't
guaranteed, so that this risked failure on alignment-picky platforms.
The lack of complaints suggests that such platforms are very rare
nowadays; but it's likely that we are paying a performance price on
rather more platforms.  Declare those arrays as UChar[] instead,
keeping their physical size the same.

pg_locale_libc.c's strncoll_libc_win32_utf8() also had the
disease of assuming it could double or quadruple the input
string length without concern for overflow.

Reported-by: Xint Code
Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
If you accumulate many arrays full of NULLs, you could overflow
'nitems', before reaching the MaxAllocSize limit on the allocations.
Add an explicit check that the number of items doesn't grow too large.
With more than MaxArraySize items, getting the final result with
makeArrayResultArr() would fail anyway, so better to error out early.

Reported-by: Xint Code
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
When result_is_int is set to 0, PQfn() cannot validate that the
result fits in result_buf, so it will write data beyond the end of
the buffer when the server returns more data than requested.  Since
this function is insecurable and obsolete, add a warning to the top
of the pertinent documentation advising against its use.

The only in-tree caller of PQfn() is the frontend large object
interface.  To fix that, add a buf_size parameter to
pqFunctionCall3() that is used to protect against overruns, and use
it in a private version of PQfn() that also accepts a buf_size
parameter.

Reported-by: Yu Kunpeng <yu443940816@live.com>
Reported-by: Martin Heistermann <martin.heistermann@unibe.ch>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Security: CVE-2026-6477
Backpatch-through: 14
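The buf_size parameter described above exists so that an oversized result is rejected instead of being written past the end of the caller's buffer. A sketch of that bounded copy follows; `copy_function_result` is hypothetical and is not the signature of pqFunctionCall3() or the private PQfn() variant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: the caller states how large result_buf is, and a
 * result longer than that is refused rather than overrunning the buffer.
 * *actual_len is only updated on success.
 */
static bool
copy_function_result(void *result_buf, size_t buf_size,
					 const void *data, size_t data_len, size_t *actual_len)
{
	if (data_len > buf_size)
		return false;			/* would overrun; report an error instead */
	memcpy(result_buf, data, data_len);
	*actual_len = data_len;
	return true;
}
```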
Maliciously crafted key value updates could achieve SQL injection
within check_foreign_key().  To fix, ensure new key values are
properly quoted and escaped in the internally generated SQL
statements.  While at it, avoid potential buffer overruns by
replacing the stack buffers for internally generated SQL statements
with StringInfo.

Reported-by: Nikolay Samokhvalov <nik@postgres.ai>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Security: CVE-2026-6637
Backpatch-through: 14
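The check_foreign_key() fix above quotes new key values before they are placed into generated SQL. The sketch below shows one way to do that escaping step, modeled on the behavior of PostgreSQL's SQL-level quote_literal(): wrap in single quotes, double every `'` and `\`, and add an `E` prefix when a backslash is present. `quote_sql_literal` is hypothetical; the exact helper used by the commit is not shown here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical escaping helper.  The E prefix plus doubled backslashes makes
 * the literal parse the same way regardless of standard_conforming_strings.
 * Returns a malloc'd string, or NULL on overflow or allocation failure.
 */
static char *
quote_sql_literal(const char *value)
{
	size_t		len = strlen(value);
	int			has_backslash = strchr(value, '\\') != NULL;
	char	   *out;
	char	   *p;

	/* worst case: every char doubled, plus E, two quotes and the terminator */
	if (len > (SIZE_MAX - 4) / 2)
		return NULL;
	out = malloc(2 * len + 4);
	if (out == NULL)
		return NULL;

	p = out;
	if (has_backslash)
		*p++ = 'E';
	*p++ = '\'';
	for (const char *s = value; *s != '\0'; s++)
	{
		if (*s == '\'' || *s == '\\')
			*p++ = *s;			/* emit the special character twice */
		*p++ = *s;
	}
	*p++ = '\'';
	*p = '\0';
	return out;
}
```

With this escaping, a crafted value such as `x'); DROP TABLE t; --` stays inside a single string literal.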
timingsafe_bcmp() should be used instead of memcmp() or a naive
for-loop, when comparing passwords or secret tokens, to avoid leaking
information about the secret token by timing. This commit just
introduces the function but does not change any existing code to use
it yet.

This has been initially applied as of 09be39112654 in v18 and newer
versions, and will be used in all the stable branches for an upcoming
fix.

Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/7b86da3b-9356-4e50-aa1b-56570825e234@iki.fi
Security: CVE-2026-6478
Backpatch-through: 14
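The idea behind timingsafe_bcmp(), as introduced above, is that the comparison examines every byte no matter where the first difference is. A minimal sketch follows; `ct_bcmp` is a hypothetical name, and production code should prefer the platform's or PostgreSQL's timingsafe_bcmp(), since an optimizing compiler is in principle free to transform a loop like this.

```c
#include <stddef.h>

/*
 * Hypothetical constant-time comparison: differences are OR-accumulated and
 * the loop never exits early, so running time depends only on n, not on the
 * contents.  Returns 0 if equal, nonzero otherwise (bcmp-style).
 */
static int
ct_bcmp(const void *b1, const void *b2, size_t n)
{
	const unsigned char *p1 = b1;
	const unsigned char *p2 = b2;
	unsigned char diff = 0;

	for (size_t i = 0; i < n; i++)
		diff |= p1[i] ^ p2[i];
	return diff != 0;
}
```

By contrast, memcmp() and strcmp() may return at the first mismatching byte, which leaks how long the matching prefix of a secret was.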
Sufficiently large "count" arguments could result in undetected
overflow, causing the allocated memory chunk to be much smaller
than what the caller will subsequently write into it.  This is
unlikely to be a hazard with 64-bit size_t but can sometimes
happen on 32-bit builds, primarily where a function allocates
workspace that's significantly larger than its input data.
Rather than trying to patch the at-risk callers piecemeal,
let's just redefine these macros so that they always check.

To do that, move the longstanding add_size() and mul_size() functions
into palloc.h and mcxt.c, and adjust them to not be specific to
shared-memory allocation.  Then invent palloc_mul(), palloc0_mul(),
palloc_mul_extended() to use these functions.  Actually, the latter
use inlined copies to save one function call.  repalloc_array() gets
similar treatment.  I didn't bother trying to inline the calls for
repalloc0_array() though.

In v14 and v15, this also adds repalloc_extended(), which previously
was only available in v16 and up.

We need copies of all this in fe_memutils.[hc] as well, since that
module also provides palloc_array() etc.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6473
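The commit above routes palloc_array() and friends through add_size() and mul_size(). The sketch below shows that style of checked size arithmetic; the function names are hypothetical, and where the real functions raise an error on overflow, these return false so they can run outside the backend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checked addition in the spirit of add_size(). */
static bool
add_size_checked(size_t a, size_t b, size_t *result)
{
	if (a > SIZE_MAX - b)
		return false;
	*result = a + b;
	return true;
}

/* Hypothetical checked multiplication in the spirit of mul_size(). */
static bool
mul_size_checked(size_t a, size_t b, size_t *result)
{
	if (a != 0 && b > SIZE_MAX / a)
		return false;
	*result = a * b;
	return true;
}

/* Example: bytes needed for a header followed by count elements. */
static bool
header_plus_array_size(size_t header, size_t elem_size, size_t count,
					   size_t *result)
{
	size_t		body;

	return mul_size_checked(elem_size, count, &body) &&
		add_size_checked(header, body, result);
}
```

Composing the two checks covers the common "header plus array" allocation, where either the multiplication or the final addition could wrap on a 32-bit build.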
Many process environment variables (e.g., PATH) bypass the containment
expected of a trusted PL.  Hence, trusted PLs must not offer features
that achieve setenv().  Otherwise, an attacker having USAGE privilege on
the language often can achieve arbitrary code execution, even if the
attacker lacks a database server operating system user.

To fix PL/Perl, replace trusted PL/Perl %ENV with a tied hash that just
replaces each modification attempt with a warning.  Sites that reach
these warnings should evaluate the application-specific implications of
proceeding without the environment modification:

  Can the application reasonably proceed without the modification?

    If no, switch to plperlu or another approach.

    If yes, the application should change the code to stop attempting
    environment modifications.  If that's too difficult, add "untie
    %main::ENV" in any code executed before the warning.  For example,
    one might add it to the start of the affected function or even to
    the plperl.on_plperl_init setting.

In passing, link to Perl's guidance about the Perl features behind the
security posture of PL/Perl.

Back-patch to v12 (all supported versions).

Andrew Dunstan and Noah Misch

Security: CVE-2024-10979
If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it.  This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.

Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Backpatch-through: 12
v14 and earlier use generated test files, which require being
.gitignore'd to avoid git complaints when testing in-tree.

Security: CVE-2024-10979
v16 commit 8fe3e69 used REGRESS_OPTS in
a way needing this.  That broke "vcregress plcheck".  Back-patch
v16..v12; newer versions don't have this build system.
TestUpgradeXversion knows how to make the main regression database's
references to pg_regress.so be version-independent.  But it doesn't
do that for plperl's database, so that the C function added by
commit b7e3a52a8 is causing cross-version upgrade test failures.
Path of least resistance is to just drop the function at the end
of the new test.

In <= v14, also take the opportunity to clean up the generated
test files.

Security: CVE-2024-10979
tuhaihe and others added 29 commits June 13, 2026 08:11
Replace version tags with commit hashes for Docker GitHub Actions
to comply with Apache organization security requirements.

Changes:
- docker/setup-qemu-action@v3 → @c7c53464625b32c7a7e944ae62b3e17d2b600130 (v3.7.0)
- docker/login-action@v3 → @c94ce9fb468520275223c153574b00df6fe4bcc9 (v3.7.0)
- docker/setup-buildx-action@v3 → @8d2750c68a42422c14e847fe6c8ac0403b4cbd6f (v3.12.0)
- docker/build-push-action@v6 → @10e90e3645eae34f1e60eeb005ba3a3d33f178e8 (v6.19.2)

Affected workflows:
- .github/workflows/docker-cbdb-build-containers.yml
- .github/workflows/docker-cbdb-test-containers.yml

Fixes #1687
This commit introduces a new GitHub Actions workflow for building and
testing Apache Cloudberry on Ubuntu 24.04, enabling automated builds,
DEB packaging, and regression testing.

Triggers:
- Push to main branch
- Pull requests modifying this workflow file
- Scheduled: Every Monday at 02:00 UTC
- Manual workflow dispatch with optional test selection
Use raw string literal (r""") for SQL query in orphaned_toast_tables_check.py
to avoid SyntaxWarning on Python 3.12.

The query contains `\d` for a PostgreSQL regex, which Python 3.12 treats as an
invalid escape sequence and warns about, causing test failures on Ubuntu 24.04.

See: #1686
* Fix colNDVBySeg index mismatch in do_analyze_rel

When ANALYZE is run on specific columns (e.g., ANALYZE t (col)) or when
a table has dropped columns, the vacattrstats loop index `i` diverges
from the attribute's actual attnum-1 index used by colNDVBySeg.

Two fixes:
1. QD side (line 887): read colNDVBySeg[attnum-1] instead of
   colNDVBySeg[i] when storing stadistinctbyseg.
2. Segment side (line 1011): write ctx->stadistincts[attnum-1] instead
   of ctx->stadistincts[i] when collecting per-segment NDV.

* Add regression test for colNDVBySeg index mismatch in do_analyze_rel

ANALYZE t(b) puts column b at loop index i=0 on the QD, but b has
attnum=2, so attnum-1=1 != i=0. The fix in do_analyze_rel (using
attnum-1 instead of i to index colNDVBySeg) ensures stadistinctbyseg
is read from the correct per-segment NDV slot.

Test verifies stadistinctbyseg for column b equals 100 (all distinct)
rather than ~5 (NDV of column a at index 0).
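The index mismatch described above can be reduced to a few lines. In this hypothetical sketch, `stats[]` holds only the analyzed columns (so `ANALYZE t(b)` yields one entry), while `ndv_by_attno[]` is indexed by `attnum - 1` across all table columns; the struct and function names are illustrative, not the actual do_analyze_rel() data structures.

```c
#include <stddef.h>

/* Hypothetical per-column stats entry. */
typedef struct
{
	int			attnum;			/* 1-based column number */
	double		stadistinctbyseg;
} col_stats;

static void
fill_ndv(col_stats *stats, size_t nstats, const double *ndv_by_attno)
{
	for (size_t i = 0; i < nstats; i++)
	{
		/* Buggy form: stats[i].stadistinctbyseg = ndv_by_attno[i]; */
		stats[i].stadistinctbyseg = ndv_by_attno[stats[i].attnum - 1];
	}
}
```

With the buggy form, analyzing only column b (attnum 2, loop index 0) would pick up column a's NDV, which is the ~5 versus 100 discrepancy the regression test checks for.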
Update Go installation in all Docker build containers to use the latest
Go 1.24.13 release instead of 1.23.4, with corresponding SHA256 checksums
for both amd64 and arm64 architectures.

Affected files:
- devops/deploy/docker/build/rocky8/Dockerfile
- devops/deploy/docker/build/rocky9/Dockerfile
- devops/deploy/docker/build/ubuntu22.04/Dockerfile
- devops/deploy/docker/build/ubuntu24.04/Dockerfile

Updated SHA256 checksums:
- linux-amd64: 1fc94b57134d51669c72173ad5d49fd62afb0f1db9bf3f798fd98ee423f8d730
- linux-arm64: 74d97be1cc3a474129590c67ebf748a96e72d9f3a2b6fef3ed3275de591d49b3
Main changes:
- Update default CODEBASE_VERSION from 2.0.0 to 2.1.0 in .env
- Update documentation examples to use version 2.1.0 in README.md
- Update help message example version in run.sh
- Switch to Apache mirror system for downloading release tarball
  using closer.lua for better download reliability and speed
- Replace wget with curl for source download in Dockerfile

This change ensures the sandbox environment defaults to the latest
Apache Cloudberry 2.1.0 release and uses the recommended Apache
mirror download method.
ORCA does not initialize those fields, which pg_stat_statements needs
to identify the query. Without them, pg_stat_statements won't track
those queries.
This is a copy of the Rocky 9 containers.

Changes compared to Rocky 9:

- Drop packages that are not currently available in the Rocky 10 repositories (for example, rocky-release-hpc)
- Move to Java 21, the default in Rocky Linux 10

Co-authored-by: Leonid Borchuk <xifos@qavm-f9b691f5.qemu>
libxml2 changed the required signature of error handler callbacks
to make the passed xmlError struct "const".  This is causing build
failures on buildfarm member caiman, and no doubt will start showing
up in the field quite soon.  Add a version check to adjust the
declaration of xml_errorHandler() according to LIBXML_VERSION.

2.12.x also produces deprecation warnings for contrib/xml2/xpath.c's
assignment to xmlLoadExtDtdDefaultValue.  I see no good reason for
that to still be there, seeing that we disabled external DTDs (at a
lower level) years ago for security reasons.  Let's just remove it.

Back-patch to all supported branches, since they might all get built
with newer libxml2 once it gets a bit more popular.  (The back
branches produce another deprecation warning about xpath.c's use of
xmlSubstituteEntitiesDefault().  We ought to consider whether to
back-patch all or part of commit 65c5864d7 to silence that.  It's
less urgent though, since it won't break the buildfarm.)

Discussion: https://postgr.es/m/1389505.1706382262@sss.pgh.pa.us
If we are building with openssl but USE_SSL_ENGINE didn't get set,
initialize_SSL's variable "pkey" is declared but used nowhere.
Apparently this combination hasn't been exercised in the buildfarm
before now, because I've not seen this warning before, even though
the code has been like this a long time.  Move the declaration
to silence the warning (and remove its useless initialization).

Per buildfarm member sawshark.  Back-patch to all supported branches.
We have two sections in a Makefile: one for CPP_OBJS and one for OBJS.
CPP_OBJS uses wildcards, so the src/protos objects end up in both CPP_OBJS
and OBJS. As a result, the generated gcc command line lists the proto *.o
files more than once, which leads to multiple-definition errors at link
time. Exclude the proto files from CPP_OBJS and list them only in OBJS.
Some functions are used in the tree and are currently marked as
deprecated by upstream.  This commit refreshes the code to use the
recommended functions, leading to the following changes:
- xmlSubstituteEntitiesDefault() is gone, and needs to be replaced with
XML_PARSE_NOENT for the paths doing the parsing.
- xmlParseMemory() -> xmlReadMemory().

These functions, as well as more functions setting global states, have
been officially marked as deprecated by upstream in August 2022.  Their
replacements have existed since around 2001, as far as I have checked,
so that should be safe.

Author: Dmitry Koval
Discussion: https://postgr.es/m/18274-98d16bc03520665f@postgresql.org
ORCA's window frame translation always emits a BETWEEN frame
(start + end bound), so include FRAMEOPTION_BETWEEN alongside
FRAMEOPTION_NONDEFAULT to match the executor's expectations.
…ated host (#1702)

* Fix null dereference on dedicated hot standby coordinator

getCdbComponentInfo() populates hostPrimaryCountHash with primary hosts only.
When IS_HOT_STANDBY_QD() is true, mirror and standby hosts are also looked up
in the hash but return NULL on dedicated standby nodes that host no primary
segments. Replace Assert(found) with a null-safe check to prevent SIGSEGV.
…nsumer (#1719)

* orca: fallback to Postgres optimizer on cross-slice replicated CTE Consumer.
Inspired by greengage 51fe92e: before Expr->DXL translation,
walk the physical tree and track which slice each CTE Producer
and Consumer lives on. If a Consumer is on a different slice
than its Producer and the Producer's distribution is replicated,
force a fallback to the Postgres optimizer.

The replicated filter is essential: ordinary cross-slice CTE plans
(non-replicated Producer with Gather/Redistribute Consumer) are a
normal ORCA pattern and must not trigger fallback.

51fe92e doesn't trigger when a CTE over a replicated table is
referenced from a scalar subquery, so the query hangs. This commit
replaces the single-point check with a whole-tree walker that
catches both cases.

Tests: shared_scan adds a scalar-subquery reproducer guarded by
statement_timeout. qp_orca_fallback adds two cases over a replicated
CTE: a scalar-subquery form that triggers the walker (the hang case
51fe92e missed -- fallback to Postgres), and the original 51fe92e JOIN
form where ORCA emits a safe plan with a One-Time Filter
(gp_execution_segment() = N) and the walker correctly stays silent
(guards against false positives).

(cherry picked from commit
open-gpdb/gpdb@3a9aebf)
Update Go version to 1.25.10 across all development Docker images
for Rocky Linux 8/9/10 and Ubuntu 22.04/24.04.

Changes:
- Go version: go1.24.13 -> go1.25.10
- Updated SHA256 checksums for linux-amd64 and linux-arm64 archives

See: apache/cloudberry-go-libs#19 (comment)
A scalar (plain) aggregate with no grouping columns always emits
exactly one row regardless of input cardinality. Predicates above it
(from a HAVING clause) filter that output row, so they cannot be
moved onto the aggregate's input without changing semantics:

  SELECT count(*) FROM t HAVING false      -- 0 rows
  SELECT count(*) FROM t WHERE  false      -- 1 row (count=0)

CNormalizer::FPushable previously only blocked pushing volatile
predicates below a GbAgg. Any other predicate -- including a constant
false -- was considered pushable because its used-column set was
trivially contained in the aggregate's output columns. The normalizer
then routed the Select's predicate through the GbAgg and down into
its logical child, dropping HAVING semantics for scalar aggregates.
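The pushability rule described above can be restated as a small decision function. This is a hypothetical sketch, not ORCA's CNormalizer::FPushable code: `cols_contained` stands for the existing used-column containment check, and the new condition is that a scalar aggregate (no grouping columns) never accepts a pushed-down predicate.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the rule: a predicate above a GbAgg may move
 * below it only if it is non-volatile, the aggregate has grouping columns,
 * and the existing containment check passes.  A scalar aggregate always
 * emits exactly one row, so filtering its output is not equivalent to
 * filtering its input.
 */
static bool
predicate_pushable_below_agg(bool predicate_is_volatile, int n_grouping_cols,
							 bool cols_contained)
{
	if (predicate_is_volatile)
		return false;			/* pre-existing rule */
	if (n_grouping_cols == 0)
		return false;			/* scalar aggregate: keep HAVING above it */
	return cols_contained;
}
```

Under this rule, `SELECT count(*) FROM t HAVING false` keeps its filter above the aggregate and returns zero rows.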
Add comprehensive parallel table scan capability to GPORCA optimizer,
enabling worker-level parallelism within segments for improved query
performance on large table scans.

Key components:
- New CPhysicalParallelTableScan operator and CDistributionSpecWorkerRandom
distribution specification for worker-level data distribution
- CXformGet2ParallelTableScan transformation with parallel safety checks
(excludes CTEs, dynamic scans, foreign tables, replicated tables, etc.)
- Cost model integration with parallel_setup_cost and efficiency degradation
scaling (logarithmic based on worker count)
- DXL serialization/deserialization for CDXLPhysicalParallelTableScan
- Plan translation to PostgreSQL SeqScan nodes with parallel_aware=true
- Rewindability constraints (parallel scans are non-rewindable)
- GUC integration: max_parallel_workers_per_gather controls worker count
In Cloudberry's MPP architecture, segment stats are delivered
asynchronously to the coordinator. The seq_scan counter can be
registered before seq_tup_read arrives from segments, causing
wait_for_stats() to exit prematurely and the subsequent assertion
to fail intermittently in the pax-ic-good-opt-off CI job.

Add an explicit wait condition (updated6) for seq_tup_read reaching
the expected value, and update the comment to reflect Cloudberry's
segment-level async stats delivery rather than parallel workers.
Add libicu-devel package to Rocky Linux 8, 9, and 10 Dockerfiles
to provide ICU (International Components for Unicode) library
support required for PostgreSQL 16 kernel compilation.

This dependency is already present in Ubuntu 22.04 and Ubuntu 24.04
development images, ensuring consistency across all supported build
platforms for PostgreSQL 16 compilation requirements.
The command `gppkg --clean` fails with the following error: "'SyncPackages' object has no attribute 'ret'".

This occurs because `operations` was being passed positionally during the OperationWorkerPool initialization, which incorrectly bound it to the `should_stop` argument instead of `items` in the base WorkerPool class.

The solution is to pass `operations` as a keyword argument.
* Fix: FDW OPTIONS encoding accepts symbolic names (issue #1726)

Both the FDW catalog reader (src/backend/access/external/external.c)
and the gp_exttable_fdw option validator
(gpcontrib/gp_exttable_fdw/option.c) parsed the "encoding" OPTIONS value
with atoi(). atoi("UTF8") returns 0 (PG_SQL_ASCII) and PG_VALID_ENCODING(0)
is true, so symbolic names like 'UTF8', 'utf-8', 'GBK' silently fell through
validation and were stored as SQL_ASCII at read time. By contrast, the
legacy CREATE EXTERNAL TABLE ... ENCODING ... path resolves names via
pg_char_to_encoding() and persists a numeric form into OPTIONS — only the
FDW OPTIONS entry point bypassed that translation.

Add a small shared helper parse_fdw_encoding_option(const char *) in
src/backend/access/external/external.c (declared in
src/include/access/external.h):

  - first try pg_char_to_encoding(name) — same logic as the legacy path;
  - otherwise try a strict numeric form via strtol() with end-of-string
    and PG_VALID_ENCODING() checks (atoi is intentionally avoided, since
    atoi("UTF8")==0 is the bug being fixed);
  - otherwise ereport(ERROR).
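The three-step lookup above can be sketched as standalone C rather than backend code. Everything prefixed `demo_`/`DEMO_` is hypothetical. The name table holds only the two ids this commit message mentions (SQL_ASCII = 0, UTF8 = 6). `DEMO_LAST_ENCODING` is an assumed bound standing in for `PG_VALID_ENCODING()`, and returning -1 stands in for `ereport(ERROR)`. Name normalization (drop non-alphanumerics, lowercase) is meant to approximate how `pg_char_to_encoding()` matches names, not to reproduce it.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound standing in for PG_VALID_ENCODING(). */
#define DEMO_LAST_ENCODING 42
#define DEMO_VALID_ENCODING(e) ((e) >= 0 && (e) < DEMO_LAST_ENCODING)

/* Illustrative subset of an encoding-name table, stored pre-normalized. */
static const struct
{
    const char *name;
    int         id;
}           demo_names[] = {
    {"sqlascii", 0},            /* SQL_ASCII */
    {"utf8", 6},                /* UTF8 */
};

/* Normalize ("utf-8" -> "utf8"), then look the name up; -1 if unknown. */
static int
demo_name_to_encoding(const char *name)
{
    char        clean[64];
    size_t      n = 0;

    for (const char *p = name; *p != '\0' && n < sizeof(clean) - 1; p++)
        if (isalnum((unsigned char) *p))
            clean[n++] = (char) tolower((unsigned char) *p);
    clean[n] = '\0';

    for (size_t i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++)
        if (strcmp(clean, demo_names[i].name) == 0)
            return demo_names[i].id;
    return -1;
}

/* Hypothetical analogue of parse_fdw_encoding_option(); -1 means "would ereport". */
static int
demo_parse_fdw_encoding_option(const char *value)
{
    int         enc;
    long        num;
    char       *end;

    /* 1. Symbolic name, as the legacy ENCODING path resolves it. */
    enc = demo_name_to_encoding(value);
    if (enc >= 0)
        return enc;

    /* 2. Strict numeric form: the digits must consume the whole string. */
    errno = 0;
    num = strtol(value, &end, 10);
    if (end != value && *end == '\0' && errno == 0 && DEMO_VALID_ENCODING(num))
        return (int) num;

    /* 3. Neither form matched: the real helper raises an error here. */
    return -1;
}
```

With this ordering, `'6abc'` fails both the name lookup and the strict numeric check, whereas `atoi()` would have returned 6; `'-1'` parses as a number but fails the validity check.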

Both the validator and GetExtFromForeignTableOptions() call this helper.
On-disk values in pg_foreign_table.ftoptions are stored verbatim as the
user wrote them; correctness is established at read time. This avoids a
ProcessUtility_hook approach, which is unworkable here because the
extension's _PG_init runs lazily on the first dlopen, after the current
statement's hook check has already passed.

Affected scope: gp_exttable_fdw (used by gp_exttable_server). The
standalone pxf_fdw is unaffected — its validator already routes encoding
through ProcessCopyOptions, which is name-aware.

Behavior change on upgrade: existing rows whose ftoptions literally contain
encoding=<name> have, until now, been silently interpreted as SQL_ASCII.
After this fix they are interpreted as the named encoding. This will be
called out in the release notes; a detection query is provided in the PR
description for operators who wish to pin specific tables to numeric form
before upgrade.

Tests added in gpcontrib/gp_exttable_fdw/{input,output}/gp_exttable_fdw.source
cover encoding '6' / 'UTF8' / 'utf-8' / 'GBK' / 'bogus' and an
ALTER FOREIGN TABLE ... OPTIONS (SET encoding 'UTF8') path. The pre-existing
encoding '-1' error case has its expected error message updated to match
the new helper's wording.

* test: pad expected output headers to match psql separator widths

The new tests added in the previous commit had column header lines
without the trailing-space padding that psql's aligned output emits
to match the separator. The pre-existing ext_special_uri header
(' a | b') was also unintentionally stripped of its trailing space
during the same edit.

Pure whitespace fix. No behavior change.

* test: drop trailing blank line in gp_exttable_fdw expected output

pg_regress diffs the expected and actual .out files strictly, including
the final newline count. The new encoding test block ended with a
stray empty line (";\n\n") while psql produces ";\n", causing a 1-line
diff at end-of-file. Pure whitespace fix.

* test: reject mixed numeric+letters in FDW encoding option

Add a regression case for `encoding '6abc'`. atoi("6abc") would have
silently returned 6 (= UTF8), which is the class of bug that motivated
moving the FDW encoding option parser off atoi() and onto a strict
strtol() form in parse_fdw_encoding_option(). Without this test, the
strictness of the numeric path was not directly exercised — only the
"unknown name" path ('bogus') was.

Pure test addition; no code change. This implements the third of the
reviewer's suggestions on issue #1726. The first two (strict strtol()
parsing and a single helper shared between the validator and the read
path) were already in place in the original fix commit.

* ci: retrigger to clear flaky alter_distribution_policy

---------

Co-authored-by: chenqiang <chenqiang@hashdata.cn>
ClearAOCSFileSegInfo/ClearFileSegInfo (called from
ao_vacuum_rel_recycle_dead_segments) updates pg_aoseg rows via
simple_heap_update, which assigns the current CommandId to the new tuple.
AppendOptimizedTruncateToEOF then opens a catalog snapshot via
GetCatalogSnapshot, which also uses GetCurrentCommandId.  Because both
operations share the same CommandId, the just-zeroed rows are invisible
to the snapshot (cid >= snapshot->curcid), while the old rows with their
original non-zero EOF values remain visible.  TruncateAOSegmentFile then
sees a 0-byte physical file but a non-zero logical EOF and raises:

  "file size smaller than logical eof"

Advancing the command counter before AppendOptimizedTruncateToEOF
ensures the zeroed rows are visible to its catalog snapshot (their cid
is now strictly less than the new curcid).
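The command-id rule at work here can be modeled in a few lines of C. This is a simplified, hypothetical model of the own-transaction check: a row version our transaction inserted is invisible while its cid >= curcid, and a version our transaction replaced stays visible until then. It is not PostgreSQL's `HeapTupleSatisfiesMVCC()`, and all `Demo` names are invented. In PostgreSQL, the command counter is advanced with `CommandCounterIncrement()`.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoCommandId;

/* Simplified row version, as seen from inside the writing transaction. */
typedef struct DemoRowVersion
{
    bool        inserted_here;  /* inserted by our own transaction? */
    DemoCommandId cmin;         /* inserting command, valid if inserted_here */
    bool        deleted_here;   /* replaced by our own transaction? */
    DemoCommandId cmax;         /* replacing command, valid if deleted_here */
    long        eof;            /* logical EOF recorded in this version */
} DemoRowVersion;

static bool
demo_visible(const DemoRowVersion *v, DemoCommandId curcid)
{
    /* Our own insert is visible only to snapshots of a later command. */
    if (v->inserted_here && v->cmin >= curcid)
        return false;
    /* Our own update hides the old version only from later commands. */
    if (v->deleted_here && v->cmax < curcid)
        return false;
    return true;
}

/* Return the EOF of the first visible version, or -1 if none is visible. */
static long
demo_visible_eof(const DemoRowVersion *versions, int n, DemoCommandId curcid)
{
    for (int i = 0; i < n; i++)
        if (demo_visible(&versions[i], curcid))
            return versions[i].eof;
    return -1;
}
```

If command 5 both replaces the old row and writes the zeroed row, a snapshot at curcid 5 still returns the stale EOF. Advancing to curcid 6 flips which version is visible.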

Fixes: #1746
ReleaseSysCache(htup) was called before NameStr(staForm->stxname) was
read, so the code used a pointer into an already-released tuple buffer.
Copy the name with pstrdup() first, then release the cache entry.
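The same ordering rule applies to any borrowed pointer into a cache entry. Here is a hedged, standalone C sketch: `free()` plays the role of `ReleaseSysCache()`, and `malloc()` plus `memcpy()` play the role of `pstrdup()`. The `demo_` cache is hypothetical. Using `free()` makes the hazard deterministic. A released syscache tuple, by contrast, may stay readable until it is evicted or invalidated, which is how this kind of bug can go unnoticed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cache entry that owns the name bytes. */
typedef struct DemoCacheEntry
{
    char        name[64];
} DemoCacheEntry;

static DemoCacheEntry *
demo_lookup(const char *key)
{
    DemoCacheEntry *entry = malloc(sizeof(*entry));

    if (entry != NULL)
    {
        strncpy(entry->name, key, sizeof(entry->name) - 1);
        entry->name[sizeof(entry->name) - 1] = '\0';
    }
    return entry;
}

/* Plays the role of ReleaseSysCache(): the entry must not be touched afterwards. */
static void
demo_release(DemoCacheEntry *entry)
{
    free(entry);
}

/* Fixed order: copy the name out first, then release the entry. */
static char *
demo_get_name_copy(const char *key)
{
    DemoCacheEntry *entry = demo_lookup(key);
    char       *copy;
    size_t      len;

    if (entry == NULL)
        return NULL;

    len = strlen(entry->name);
    copy = malloc(len + 1);     /* plays the role of pstrdup() */
    if (copy != NULL)
        memcpy(copy, entry->name, len + 1);

    demo_release(entry);        /* safe: copy does not point into entry */
    return copy;                /* caller frees */
}
```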
This PR fixes the recovery flow when the internal WAL replication slot does not already exist on the source segment.

Before this change, both gpsegrecovery and gpconfigurenewsegment would start pg_basebackup first and only retry with slot creation after the backup failed. In practice, that meant a full base backup could run for a long time and then fail at the end because the slot was missing.

This change fixes that at the root:

- adds a shared helper to check whether the replication slot already exists
- creates the slot up front when needed, before pg_basebackup starts
- removes the fallback second pg_basebackup attempt from both recovery paths
- updates unit tests to cover the new behavior and the new failure mode
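The new ordering can be summarized in a small, language-neutral C sketch. The callbacks are hypothetical stand-ins for the real slot check, slot creation, and pg_basebackup invocation. The sketch captures only the control flow: ensure the slot first, fail fast if that fails, and run the backup exactly once.

```c
#include <stdbool.h>

/* Hypothetical operations; the real recovery code uses its own helpers. */
typedef struct DemoRecoveryOps
{
    bool        (*slot_exists) (void *ctx);
    bool        (*create_slot) (void *ctx);
    bool        (*run_basebackup) (void *ctx);
    void       *ctx;
} DemoRecoveryOps;

typedef enum
{
    DEMO_OK,
    DEMO_SLOT_CREATE_FAILED,
    DEMO_BACKUP_FAILED
} DemoResult;

static DemoResult
demo_recover(const DemoRecoveryOps *ops)
{
    /* Ensure the slot exists before the long-running copy starts. */
    if (!ops->slot_exists(ops->ctx) && !ops->create_slot(ops->ctx))
        return DEMO_SLOT_CREATE_FAILED;     /* fail fast: no backup attempted */

    /* Exactly one backup attempt; no slot-creation retry afterwards. */
    return ops->run_basebackup(ops->ctx) ? DEMO_OK : DEMO_BACKUP_FAILED;
}
```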

---------

Co-authored-by: Leonid <63977577+leborchuk@users.noreply.github.com>
reshke commented Jun 13, 2026

Contributor Author

Looks like this is just too much.
