
Conversation

@zhangwenchao-123

  • Add the module name, JIRA# to PR/commit and description.
  • Add tests for the change.

The following two operator delete functions are not looked up in the madlib library, because they were not added to the library script file.

void operator delete  (void *ptr, std::size_t sz) noexcept;
void operator delete[](void *ptr, std::size_t sz) noexcept;

These two functions were missing previously.
set(_PG_CONFIG_VERSION_MACRO "GP_VERSION")
set(_SEARCH_PATH_HINTS
"/usr/local/cloudberry-db-devel/bin"
"/usr/local/cloudberry-db/bin"
Member

Is there a need to add /usr/local/cbdb/bin?

Author

Have we used this path?

Author

Have added path /usr/local/cloudberry/bin

Contributor

I believe the line just after this, "$ENV{GPHOME}/bin", will help catch most scenarios. Users will be sourcing cloudberry-env.sh (Cloudberry 3+) or greenplum_path.sh (Cloudberry 2).

Member

Cool idea!

Comment on lines 27 to 29
if(_PG_CONFIG_HEADER_CONTENTS MATCHES "#define SERVERLESS 1")
message("-- Detected Hashdata Cloud (Cloudberry Serverless)")
set(CLOUDBERRY_SERVERLESS TRUE PARENT_SCOPE)
Member

Remove these lines?

Author

OK

Member

There is still no Cloudberry 3.0 release yet, so can we remove this file?

Contributor

Apache MADlib should be able to build against both the REL_2_STABLE and main (3.0.0) branches. I believe it is better to keep support for Cloudberry 3.0. Since main (3.0) has not been released yet, support for 3.0 could be labelled as experimental.

Member

That makes sense. Thanks!

Author

Agree with ed.

Member

We also need to add the standard Apache license header to the new files, including FindCloudberry.cmake, FindCloudberry_1.cmake, and the other new files.

Author

fixed

@edespino edespino self-requested a review October 22, 2025 01:23
Contributor

@edespino edespino left a comment

❌ What's Missing (Critical Issues)

1. No main CMakeLists.txt for Cloudberry:
   - src/ports/cloudberry/CMakeLists.txt doesn't exist
   - This file should mirror the structure of
     src/ports/greenplum/CMakeLists.txt (13KB, ~300 lines)
   - Should define: port configuration, source files, SQL handling,
     build functions, and version management

2. Not integrated into the build system:
   - src/ports/CMakeLists.txt only contains:
       add_subdirectory(postgres)
       add_subdirectory(greenplum)
   - Missing: add_subdirectory(cloudberry)
3. No CloudberryUtils.cmake:
   - Greenplum has GreenplumUtils.cmake with utility functions
   - May need similar utilities for Cloudberry-specific features

🔍 Current State

CMake configuration completed but:
- Cloudberry was NOT detected (the FindCloudberry code was never executed)
- Only PostgreSQL and Greenplum detection ran
- Build directory shows only postgres/ and greenplum/ subdirectories

However, there IS a Cloudberry installation:
- Location: /usr/local/cloudberry/
- Version: Based on PostgreSQL 14.4 with GP_VERSION_NUM 30000 (Cloudberry v3.0.0)
- This matches the src/ports/cloudberry/3/ directory structure

📊 Summary

The Cloudberry port is partially implemented. The detection logic and
version-specific configs exist, but they're not wired into the build
system.

To complete the implementation, you would need:

1. Create src/ports/cloudberry/CMakeLists.txt (modeled after Greenplum's)
2. Add add_subdirectory(cloudberry) to src/ports/CMakeLists.txt
3. Potentially create CloudberryUtils.cmake for Cloudberry-specific features
4. Test the full build process with Cloudberry detection

@edespino
Contributor

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) and other source documentation files? We will need to review these as well.

@edespino
Contributor

As @tuhaihe mentioned regarding ASF headers, when I ran the Apache Release Audit Tool (RAT), the following was seen (run this in the root of the MADlib source: mvn apache-rat:check):

❯ head -30 target/rat.txt

*****************************************************
Summary
-------
Generated at: 2025-10-21T18:46:08-07:00
Notes: 4
Binaries: 5
Archives: 0
Standards: 311

Apache Licensed: 307
Generated Documents: 0

JavaDocs are generated and so license header is optional
Generated files do not required license headers

4 Unknown Licenses

*******************************

Unapproved licenses:

  src/ports/cloudberry/cmake/FindCloudberry.cmake
  src/ports/cloudberry/cmake/FindCloudberry_1.cmake
  src/ports/cloudberry/cmake/FindCloudberry_2.cmake
  src/ports/cloudberry/cmake/FindCloudberry_3.cmake

*******************************
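For context, RAT's "Unapproved licenses" check essentially boils down to whether a file carries the ASF license grant text. A minimal stand-in for that check (this is not RAT itself, and the file names here are made up for illustration):

```python
import tempfile
import pathlib

ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"

def has_asf_header(path):
    """Crude stand-in for RAT's check: does the file mention the ASF grant?"""
    text = pathlib.Path(path).read_text(encoding="utf-8", errors="ignore")
    return ASF_MARKER in text

with tempfile.TemporaryDirectory() as d:
    # A file carrying the standard header passes the check.
    licensed = pathlib.Path(d) / "FindExample.cmake"
    licensed.write_text(
        "# Licensed to the Apache Software Foundation (ASF) under one\n"
        "# or more contributor license agreements.\n"
    )
    # A bare file like the flagged FindCloudberry*.cmake files does not.
    unlicensed = pathlib.Path(d) / "FindBare.cmake"
    unlicensed.write_text('set(_SEARCH_PATH_HINTS "/usr/local/cloudberry/bin")\n')

    print(has_asf_header(licensed))    # → True
    print(has_asf_header(unlicensed))  # → False
```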

@zhangwenchao-123
Author

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well.

No, have not. Should we update this website?

@tuhaihe
Member

tuhaihe commented Oct 22, 2025

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well.

No, have not. Should we update this website?

Yes, we should update the related description on the website. I’d like to help with this.

@zhangwenchao-123
Author

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well.

No, have not. Should we update this website?

Yes, we should update the related description on the website. I’d like to help with this.

Nice!

@zhangwenchao-123
Author

❌ What's Missing (Critical Issues)

1. No main CMakeLists.txt for Cloudberry:
   - src/ports/cloudberry/CMakeLists.txt doesn't exist
   - This file should mirror the structure of
     src/ports/greenplum/CMakeLists.txt (13KB, ~300 lines)
   - Should define: port configuration, source files, SQL handling,
     build functions, and version management

2. Not integrated into the build system:
   - src/ports/CMakeLists.txt only contains:
       add_subdirectory(postgres)
       add_subdirectory(greenplum)
   - Missing: add_subdirectory(cloudberry)
3. No CloudberryUtils.cmake:
   - Greenplum has GreenplumUtils.cmake with utility functions
   - May need similar utilities for Cloudberry-specific features

🔍 Current State

CMake configuration completed but:
- Cloudberry was NOT detected (the FindCloudberry code was never executed)
- Only PostgreSQL and Greenplum detection ran
- Build directory shows only postgres/ and greenplum/ subdirectories

However, there IS a Cloudberry installation:
- Location: /usr/local/cloudberry/
- Version: Based on PostgreSQL 14.4 with GP_VERSION_NUM 30000 (Cloudberry v3.0.0)
- This matches the src/ports/cloudberry/3/ directory structure

📊 Summary

The Cloudberry port is partially implemented. The detection logic and
version-specific configs exist, but they're not wired into the build
system.

To complete the implementation, you would need:

1. Create src/ports/cloudberry/CMakeLists.txt (modeled after Greenplum's)
2. Add add_subdirectory(cloudberry) to src/ports/CMakeLists.txt
3. Potentially create CloudberryUtils.cmake for Cloudberry-specific features
4. Test the full build process with Cloudberry detection

Have fixed all the mentioned problems and the missing license headers.

@edespino
Contributor

PR Review: Cloudberry MADlib Build Issues

CMake Configuration Command

cmake \
    -DCLOUDBERRY_3_PG_CONFIG=/usr/local/cloudberry/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_3_EXECUTABLE=/usr/local/cloudberry/bin/postgres \
    ..

CMake Configuration Error

Error:
CMake Error at src/CMakeLists.txt:202 (add_library):
  Cannot find source file:

    /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/Compatibility.hpp

Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
.hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc

CMake Error at src/CMakeLists.txt:202 (add_library):
No SOURCES given to target: madlib_cloudberry_3

CMake Generate step failed. Build files cannot be regenerated correctly.

Location: Referenced in src/ports/cloudberry/CMakeLists.txt:61

Observation: The directory /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/ does not exist, while the equivalent Greenplum
directory does exist at /home/cbadmin/bom-parts/madlib/src/ports/greenplum/dbconnector/ containing:

  • Compatibility.hpp
  • dbconnector.hpp

Additional Build Errors (After Manual Directory Creation)

After manually creating the missing directory and copying files from Greenplum, cmake succeeded but compilation fails with multiple errors in
Compatibility.hpp:

  1. AggState API change: aggcontext member doesn't exist (suggests aggcontexts)
  2. WindowState renamed: T_WindowState not declared (suggests T_WindowAggState)
  3. Missing function: format_procedure not declared
  4. Function conflict: Ambiguous AggCheckCallContext - both the compatibility shim and PostgreSQL's native version exist

These errors indicate API differences between Greenplum's PostgreSQL base and Cloudberry's PostgreSQL base.

@edespino
Contributor

@zhangwenchao-123 - Unless absolutely necessary, there is no need to force push additional PR commits. This will allow us to view the PR history easily.

@zhangwenchao-123 zhangwenchao-123 force-pushed the support_cloudberry branch 2 times, most recently from 00de02c to 1aad3dd Compare October 22, 2025 06:57
Fix SEGFAULT memory bugs

There are weird SEGFAULT bugs caused by custom allocations being erroneously paired with std::free (they should use the matching custom free), and we're unable to solve them. This is a workaround.
@zhangwenchao-123
Author

zhangwenchao-123 commented Oct 23, 2025

PR Review: Cloudberry MADlib Build Issues

CMake Configuration Command

cmake \
    -DCLOUDBERRY_3_PG_CONFIG=/usr/local/cloudberry/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_3_EXECUTABLE=/usr/local/cloudberry/bin/postgres \
    ..

CMake Configuration Error

Error:
CMake Error at src/CMakeLists.txt:202 (add_library):
  Cannot find source file:

    /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/Compatibility.hpp

Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc

CMake Error at src/CMakeLists.txt:202 (add_library): No SOURCES given to target: madlib_cloudberry_3

CMake Generate step failed. Build files cannot be regenerated correctly.

Location: Referenced in src/ports/cloudberry/CMakeLists.txt:61

Observation: The directory /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/ does not exist, while the equivalent Greenplum directory does exist at /home/cbadmin/bom-parts/madlib/src/ports/greenplum/dbconnector/ containing:

  • Compatibility.hpp
  • dbconnector.hpp

Additional Build Errors (After Manual Directory Creation)

After manually creating the missing directory and copying files from Greenplum, cmake succeeded but compilation fails with multiple errors in Compatibility.hpp:

  1. AggState API change: aggcontext member doesn't exist (suggests aggcontexts)
  2. WindowState renamed: T_WindowState not declared (suggests T_WindowAggState)
  3. Missing function: format_procedure not declared
  4. Function conflict: Ambiguous AggCheckCallContext - both the compatibility shim and PostgreSQL's native version exist

These errors indicate API differences between Greenplum's PostgreSQL base and Cloudberry's PostgreSQL base.

Yeah, there are some other commits that were not picked up; I will continue to complete this PR and test it.

Contributor

@edespino edespino left a comment

I have a few changes to consider. I have more testing of this PR to perform.

Please do not force push changes to this PR. I want to be able to follow the history of this work. Force pushing is not helping.

name: Greenplum DB

cloudberry:
name: Cloudberry DB
Contributor

Cloudberry DB should be Apache Cloudberry

Author

fixed

)
set(${PKG_NAME}_ADDITIONAL_INCLUDE_DIRS
"${${PKG_NAME}_ADDITIONAL_INCLUDE_DIRS}/internal")
message("-- Detected Cloudberry")
Contributor

message("-- Detected Cloudberry") should be message("-- Detected Apache Cloudberry")

Author

OK

# only need the first two digits for <= 4.3.4
dbver = '.'.join(map(str, dbver_split[:2]))
elif portid == 'cloudberry':
# Assume Cloudberry will stick to semantic versioning
Contributor

Assume Cloudberry will stick to semantic versioning should be Assume Apache Cloudberry will stick to semantic versioning

Author

fixed

Contributor

Is this symlink needed? I believe we should only be providing support for the Apache Cloudberry 2 & 3 (future) releases.

Member

+1.

Author

fixed

# 4.3.5+ from versions < 4.3.5
match = re.search("Greenplum[a-zA-Z\s]*(\d+\.\d+\.\d+)", versionStr)
elif portid == 'cloudberry':
match = re.search("Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)", versionStr)
Contributor

"Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" should be "Apache Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" ?

I am not entirely sure about this.

Author

Cloudberry is enough to achieve our goal, while Apache Cloudberry is more accurate, so it may be better.
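This can be checked quickly: because re.search scans the whole string, the plain "Cloudberry" pattern also matches inside "Apache Cloudberry". A small sketch (the version strings below are hypothetical, not actual server output):

```python
import re

# The pattern under discussion in the diff above.
PATTERN = r"Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)"

def extract_version(version_str):
    """Return the x.y.z version if the Cloudberry pattern matches, else None."""
    match = re.search(PATTERN, version_str)
    return match.group(1) if match else None

# Hypothetical version strings for both branding styles.
print(extract_version("PostgreSQL 14.4 (Apache Cloudberry 3.0.0 build dev)"))
# → 3.0.0
print(extract_version("PostgreSQL 14.4 (Cloudberry Database 2.0.0 build dev)"))
# → 2.0.0

# By contrast, anchoring on "Apache Cloudberry" would miss the second style:
strict = r"Apache Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)"
print(re.search(strict, "PostgreSQL 14.4 (Cloudberry Database 2.0.0 build dev)"))
# → None
```

So the looser pattern covers both spellings, which is why plain "Cloudberry" is sufficient here.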

requirements.txt Outdated
Contributor

Why is this empty file needed?

I noticed this when I ran the Apache Release Audit tool (mvn apache-rat:check).

Author

In Apache Cloudberry, it's not needed. I will remove it.

# implying we only need 1 folder for same major versions
set(VERSION ${${PORT_UC}_VERSION_MAJOR})
elseif(${PORT_UC} STREQUAL "CLOUDBERRY")
# Assumes CBDB always follows semantic versioning
Member

Suggested change
# Assumes CBDB always follows semantic versioning
# Assumes Apache Cloudberry always follows semantic versioning

Author

fixed

libdir = libdir.decode()

libdir = libdir.strip()+'/postgresql'
libdir = str(libdir.strip(), encoding='utf-8')+'/postgresql'
Contributor

Testing Note: Encountering TypeError: decoding str is not supported when installing on PostgreSQL 14.19.

Root Cause:
For Postgres 13+ (line 1347), libdir is already decoded to a string via .decode(), but line 1349 attempts to decode it again with
str(libdir.strip(), encoding='utf-8'), which fails because you cannot decode a string that's already been decoded.

Recommended Solution:
Ensure libdir is always decoded to a string before line 1349, then simply strip and append the path:

libdir = subprocess.check_output(['pg_config','--libdir'])
if ((portid == 'greenplum' and is_rev_gte(dbver_split, get_rev_num('7.0'))) or
    (portid == 'postgres' and is_rev_gte(dbver_split, get_rev_num('13.0')))):
    libdir = libdir.decode()
else:
    libdir = libdir.decode('utf-8')

libdir = libdir.strip() + '/postgresql'

This ensures libdir is consistently a string for all code paths (older and newer versions), eliminating the type inconsistency that causes the
error.

Request for Review: Please validate this fix works correctly for both:
- Older versions (Postgres <13, Greenplum <7) where subprocess.check_output() returns bytes
- Newer versions (Postgres 13+, Greenplum 7+) where explicit decoding is needed
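The inconsistency can be reproduced without a database. Here is a small sketch of the recommended fix (normalize_libdir is a hypothetical helper, not a function in madpack): decode exactly once, then strip and append.

```python
def normalize_libdir(raw):
    """Return a str libdir whether `raw` arrived as bytes or str.

    Mirrors the recommended fix: decode exactly once, then strip and
    append the '/postgresql' suffix.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return raw.strip() + "/postgresql"

# bytes input (what subprocess.check_output actually returns)
print(normalize_libdir(b"/usr/local/cloudberry/lib\n"))
# → /usr/local/cloudberry/lib/postgresql

# str input (already decoded on the Postgres 13+ / Greenplum 7+ path)
print(normalize_libdir("/usr/local/cloudberry/lib\n"))
# → /usr/local/cloudberry/lib/postgresql

# The buggy pattern: decoding a value that is already a str
try:
    str("already a str", encoding="utf-8")
except TypeError as exc:
    print("TypeError:", exc)  # → TypeError: decoding str is not supported
```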

Author

cool!

@zhangwenchao-123
Author

All the mentioned comments have been addressed, and I have tested it on Cloudberry 3.0.

@tuhaihe
Member

tuhaihe commented Oct 29, 2025

Hi @zhangwenchao-123 could you rebase your commits on the latest madlib2-master? Let's see if the CI can pass successfully. Thanks!

@zhangwenchao-123
Author

Hi @zhangwenchao-123 could you rebase your commits on the latest madlib2-master? Let's see if the CI can pass successfully. Thanks!

It was the NOTICE file check that failed; I have fixed it and will test whether CI can pass.

Member

I noticed that these two files, FindCloudberry_2.cmake & FindCloudberry_3.cmake, are both symbolic links to FindCloudberry.cmake. Should we create them as ASCII text files, like GP/PG? FYI.


fix in e665a9f

@tuhaihe
Member

tuhaihe commented Oct 31, 2025

Based on the new codebase, I can build and deploy MADlib into the Cloudberry 2.0 and 3.0 (main) gpdemo databases:

  1. Build the Cloudberry gpdemo env following the docs

  2. Build and deploy the MADlib

## Download this PR change
git clone https://github.com/apache/madlib.git
cd madlib
git fetch origin pull/627/head:zhangwenchao-123/support_cloudberry
git switch zhangwenchao-123/support_cloudberry


## Set Python env
sudo alternatives --install /usr/bin/python python /usr/bin/python3 1

## Install required dependencies to the Cloudberry Dev container
sudo dnf install boost-devel -y
sudo dnf install -y graphviz # for docs
sudo dnf install --enablerepo=crb doxygen -y # for docs
pip install mock pandas numpy xgboost scikit-learn pyyaml pyxb-x pypmml

## 
cd ~/madlib
mkdir build ; cd build

## for Cloudberry 3.0
cmake \
    -DCLOUDBERRY_3_PG_CONFIG=$GPHOME/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_3_EXECUTABLE=$GPHOME/bin/postgres \
    ..

## for Cloudberry 2.0
cmake \
    -DCLOUDBERRY_2_PG_CONFIG=$GPHOME/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_2_EXECUTABLE=$GPHOME/bin/postgres \
    ..

## Make, deploy, and run test
make -j$(nproc)
./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install
./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install-check

If something is wrong, please help correct me. Thanks!

Comment on lines 310 to 317

import collections
import collections.abc

if not hasattr(collections, 'MutableSequence'):
    collections.MutableSequence = collections.abc.MutableSequence
    collections.MutableMapping = collections.abc.MutableMapping
    collections.MutableSet = collections.abc.MutableSet

Member

Maybe we can move the Python3 compatibility code into src/ports/postgres/modules/pmml/__init__.py_in to avoid the SQL-side code interfering with M4 macro expansion?
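For reference, the compatibility shim quoted above can be exercised standalone. On Python 3.10+ the collections.MutableSequence alias was removed, so the hasattr guard restores it; on older interpreters it is a no-op. A minimal sketch (IntList is a made-up class standing in for the generated PMML bindings):

```python
import collections
import collections.abc

# Restore aliases removed in Python 3.10 (no-op on older interpreters,
# where collections.MutableSequence already is collections.abc.MutableSequence).
if not hasattr(collections, 'MutableSequence'):
    collections.MutableSequence = collections.abc.MutableSequence
    collections.MutableMapping = collections.abc.MutableMapping
    collections.MutableSet = collections.abc.MutableSet

# Code written against the old alias keeps working after the shim:
class IntList(collections.MutableSequence):
    def __init__(self):
        self._items = []
    def __getitem__(self, i):
        return self._items[i]
    def __setitem__(self, i, v):
        self._items[i] = v
    def __delitem__(self, i):
        del self._items[i]
    def __len__(self):
        return len(self._items)
    def insert(self, i, v):
        self._items.insert(i, v)

lst = IntList()
lst.append(3)       # append comes from the MutableSequence mixin
print(list(lst))    # → [3]
```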

Member

This change has been successfully tested in the Cloudberry environment and MADlib Jenkins CI.


fix in e665a9f

Member

We also need to add the ASF license header to this file:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at

#   http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.


fix in d018805

@tuhaihe
Member

tuhaihe commented Feb 11, 2026

Hi @zhangyue1818, thanks for your contribution. But I tested this PR on Cloudberry 2.0 and the upcoming Cloudberry 2.1 release, and one test case failed:

[gpadmin@cdw build]$ ./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install-check
madpack.py: INFO : Detected Apache Cloudberry version 2.0.0.
TEST CASE RESULT|Module: array_ops|array_ops.ic.sql_in|PASS|Time: 74 milliseconds
TEST CASE RESULT|Module: bayes|bayes.ic.sql_in|PASS|Time: 320 milliseconds
TEST CASE RESULT|Module: crf|crf_test_small.ic.sql_in|PASS|Time: 285 milliseconds
TEST CASE RESULT|Module: crf|crf_train_small.ic.sql_in|PASS|Time: 285 milliseconds
TEST CASE RESULT|Module: elastic_net|elastic_net.ic.sql_in|PASS|Time: 190 milliseconds
TEST CASE RESULT|Module: linalg|svd.ic.sql_in|PASS|Time: 572 milliseconds
TEST CASE RESULT|Module: linalg|matrix_ops.ic.sql_in|PASS|Time: 822 milliseconds
TEST CASE RESULT|Module: linalg|linalg.ic.sql_in|PASS|Time: 76 milliseconds
TEST CASE RESULT|Module: pmml|pmml.ic.sql_in|PASS|Time: 452 milliseconds
TEST CASE RESULT|Module: prob|prob.ic.sql_in|PASS|Time: 28 milliseconds
TEST CASE RESULT|Module: svm|svm.ic.sql_in|PASS|Time: 315 milliseconds
TEST CASE RESULT|Module: tsa|arima.ic.sql_in|PASS|Time: 1074 milliseconds
TEST CASE RESULT|Module: stemmer|porter_stemmer.ic.sql_in|PASS|Time: 34 milliseconds
TEST CASE RESULT|Module: conjugate_gradient|conj_grad.ic.sql_in|PASS|Time: 142 milliseconds
TEST CASE RESULT|Module: knn|knn.ic.sql_in|PASS|Time: 175 milliseconds
TEST CASE RESULT|Module: lda|lda.ic.sql_in|PASS|Time: 246 milliseconds
TEST CASE RESULT|Module: stats|correlation.ic.sql_in|PASS|Time: 182 milliseconds
TEST CASE RESULT|Module: stats|mw_test.ic.sql_in|PASS|Time: 42 milliseconds
TEST CASE RESULT|Module: stats|pred_metrics.ic.sql_in|PASS|Time: 255 milliseconds
TEST CASE RESULT|Module: stats|chi2_test.ic.sql_in|PASS|Time: 37 milliseconds
TEST CASE RESULT|Module: stats|anova_test.ic.sql_in|PASS|Time: 47 milliseconds
TEST CASE RESULT|Module: stats|t_test.ic.sql_in|PASS|Time: 42 milliseconds
TEST CASE RESULT|Module: stats|cox_prop_hazards.ic.sql_in|PASS|Time: 211 milliseconds
TEST CASE RESULT|Module: stats|ks_test.ic.sql_in|PASS|Time: 84 milliseconds
TEST CASE RESULT|Module: stats|robust_and_clustered_variance_coxph.ic.sql_in|PASS|Time: 355 milliseconds
TEST CASE RESULT|Module: stats|wsr_test.ic.sql_in|PASS|Time: 46 milliseconds
TEST CASE RESULT|Module: stats|f_test.ic.sql_in|PASS|Time: 38 milliseconds
TEST CASE RESULT|Module: utilities|utilities.ic.sql_in|PASS|Time: 115 milliseconds
TEST CASE RESULT|Module: utilities|pivot.ic.sql_in|PASS|Time: 119 milliseconds
TEST CASE RESULT|Module: utilities|path.ic.sql_in|PASS|Time: 159 milliseconds
TEST CASE RESULT|Module: utilities|transform_vec_cols.ic.sql_in|PASS|Time: 156 milliseconds
TEST CASE RESULT|Module: utilities|text_utilities.ic.sql_in|PASS|Time: 126 milliseconds
TEST CASE RESULT|Module: utilities|sessionize.ic.sql_in|PASS|Time: 105 milliseconds
TEST CASE RESULT|Module: utilities|encode_categorical.ic.sql_in|PASS|Time: 186 milliseconds
TEST CASE RESULT|Module: utilities|minibatch_preprocessing.ic.sql_in|PASS|Time: 186 milliseconds
TEST CASE RESULT|Module: assoc_rules|assoc_rules.ic.sql_in|FAIL|Time: 568 milliseconds
madpack.py: ERROR : Failed executing /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp
madpack.py: ERROR : Check the log at /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.log
TEST CASE RESULT|Module: convex|lmf.ic.sql_in|PASS|Time: 297 milliseconds
TEST CASE RESULT|Module: convex|mlp.ic.sql_in|PASS|Time: 507 milliseconds
TEST CASE RESULT|Module: deep_learning|keras_model_arch_table.ic.sql_in|PASS|Time: 149 milliseconds
TEST CASE RESULT|Module: glm|glm.ic.sql_in|PASS|Time: 906 milliseconds
TEST CASE RESULT|Module: graph|graph.ic.sql_in|PASS|Time: 1343 milliseconds
TEST CASE RESULT|Module: linear_systems|sparse_linear_sytems.ic.sql_in|PASS|Time: 132 milliseconds
TEST CASE RESULT|Module: linear_systems|dense_linear_sytems.ic.sql_in|PASS|Time: 125 milliseconds
TEST CASE RESULT|Module: recursive_partitioning|decision_tree.ic.sql_in|PASS|Time: 252 milliseconds
TEST CASE RESULT|Module: recursive_partitioning|random_forest.ic.sql_in|PASS|Time: 322 milliseconds
TEST CASE RESULT|Module: regress|robust.ic.sql_in|PASS|Time: 193 milliseconds
TEST CASE RESULT|Module: regress|logistic.ic.sql_in|PASS|Time: 249 milliseconds
TEST CASE RESULT|Module: regress|linear.ic.sql_in|PASS|Time: 31 milliseconds
TEST CASE RESULT|Module: regress|clustered.ic.sql_in|PASS|Time: 189 milliseconds
TEST CASE RESULT|Module: regress|multilogistic.ic.sql_in|PASS|Time: 323 milliseconds
TEST CASE RESULT|Module: regress|marginal.ic.sql_in|PASS|Time: 457 milliseconds
TEST CASE RESULT|Module: sample|balance_sample.ic.sql_in|PASS|Time: 139 milliseconds
TEST CASE RESULT|Module: sample|train_test_split.ic.sql_in|PASS|Time: 166 milliseconds
TEST CASE RESULT|Module: sample|sample.ic.sql_in|PASS|Time: 20 milliseconds
TEST CASE RESULT|Module: sample|stratified_sample.ic.sql_in|PASS|Time: 112 milliseconds
TEST CASE RESULT|Module: summary|summary.ic.sql_in|PASS|Time: 148 milliseconds
TEST CASE RESULT|Module: kmeans|kmeans.ic.sql_in|PASS|Time: 661 milliseconds
TEST CASE RESULT|Module: pca|pca.ic.sql_in|PASS|Time: 1475 milliseconds
TEST CASE RESULT|Module: pca|pca_project.ic.sql_in|PASS|Time: 528 milliseconds
TEST CASE RESULT|Module: validation|cross_validation.ic.sql_in|PASS|Time: 332 milliseconds
INFO: Log files saved in /tmp/madlib.7qnxdkya
[gpadmin@cdw build]$ cat /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.log
-- Switch to test user:
SET ROLE "madlib_210_installcheck_postgres";
SET
-- Set SEARCH_PATH for install-check:
SET search_path=madlib_installcheck_assoc_rules,madlib;
SET
/* ----------------------------------------------------------------------- *//**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 *
 *//* ----------------------------------------------------------------------- */
---------------------------------------------------------------------------
-- Rules:
-- ------
-- 1) Any DB objects should be created w/o schema prefix,
--    since this file is executed in a separate schema context.
-- 2) There should be no DROP statements in this script, since
--    all objects created in the default schema will be cleaned-up outside.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
-- Setup:
---------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION assoc_array_eq
    (
    arr1 TEXT[],
    arr2 TEXT[]
    )
RETURNS BOOL AS $$
    SELECT COUNT(*) = array_upper($1, 1) AND array_upper($1, 1) = array_upper($2, 1)
    FROM (SELECT unnest($1) id) t1, (SELECT unnest($2) id) t2
    WHERE t1.id = t2.id;

$$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION
CREATE OR REPLACE FUNCTION install_test() RETURNS VOID AS $$
declare
    result1        TEXT;
    result2        TEXT;
    result3        TEXT;
    result_maxiter TEXT;
    res            madlib.assoc_rules_results;
    output_schema  TEXT;
    output_table   TEXT;
    total_rules    INT;
    total_time     INTERVAL;
begin
    DROP TABLE IF EXISTS test_data1;
    CREATE TABLE test_data1 (
        trans_id INT
        , product INT
    );

    DROP TABLE IF EXISTS test_data2;
    CREATE TABLE test_data2 (
        trans_id INT
        , product VARCHAR
    );


    INSERT INTO test_data1 VALUES (1,1);
    INSERT INTO test_data1 VALUES (1,2);
    INSERT INTO test_data1 VALUES (3,3);
    INSERT INTO test_data1 VALUES (8,4);
    INSERT INTO test_data1 VALUES (10,1);
    INSERT INTO test_data1 VALUES (10,2);
    INSERT INTO test_data1 VALUES (10,3);
    INSERT INTO test_data1 VALUES (19,2);

    INSERT INTO test_data2 VALUES (1, 'beer');
    INSERT INTO test_data2 VALUES (1, 'diapers');
    INSERT INTO test_data2 VALUES (1, 'chips');
    INSERT INTO test_data2 VALUES (2, 'beer');
    INSERT INTO test_data2 VALUES (2, 'diapers');
    INSERT INTO test_data2 VALUES (3, 'beer');
    INSERT INTO test_data2 VALUES (3, 'diapers');
    INSERT INTO test_data2 VALUES (4, 'beer');
    INSERT INTO test_data2 VALUES (4, 'chips');
    INSERT INTO test_data2 VALUES (5, 'beer');
    INSERT INTO test_data2 VALUES (6, 'beer');
    INSERT INTO test_data2 VALUES (6, 'diapers');
    INSERT INTO test_data2 VALUES (6, 'chips');
    INSERT INTO test_data2 VALUES (7, 'beer');
    INSERT INTO test_data2 VALUES (7, 'diapers');

    DROP TABLE IF EXISTS test1_exp_result;
    CREATE TABLE test1_exp_result (
        ruleid integer,
        pre text[],
        post text[],
        support double precision,
        confidence double precision,
        lift double precision,
        conviction double precision
    ) ;

    DROP TABLE IF EXISTS test2_exp_result;
    CREATE TABLE test2_exp_result (
        ruleid integer,
        pre text[],
        post text[],
        support double precision,
        confidence double precision,
        lift double precision,
        conviction double precision
    ) ;


    INSERT INTO test1_exp_result VALUES (7, '{3}', '{1}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
    INSERT INTO test1_exp_result VALUES (4, '{2}', '{1}', 0.40000000000000002, 0.66666666666666674, 1.6666666666666667, 1.8000000000000003);
    INSERT INTO test1_exp_result VALUES (1, '{1}', '{2,3}', 0.20000000000000001, 0.5, 2.4999999999999996, 1.6000000000000001);
    INSERT INTO test1_exp_result VALUES (9, '{2,3}', '{1}', 0.20000000000000001, 1, 2.4999999999999996, 0);
    INSERT INTO test1_exp_result VALUES (6, '{1,2}', '{3}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
    INSERT INTO test1_exp_result VALUES (8, '{3}', '{2}', 0.20000000000000001, 0.5, 0.83333333333333337, 0.80000000000000004);
    INSERT INTO test1_exp_result VALUES (5, '{1}', '{2}', 0.40000000000000002, 1, 1.6666666666666667, 0);
    INSERT INTO test1_exp_result VALUES (2, '{3}', '{2,1}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
    INSERT INTO test1_exp_result VALUES (10, '{3,1}', '{2}', 0.20000000000000001, 1, 1.6666666666666667, 0);
    INSERT INTO test1_exp_result VALUES (3, '{1}', '{3}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);

    INSERT INTO test2_exp_result VALUES (7, '{chips,diapers}', '{beer}', 0.2857142857142857, 1, 1, 0);
    INSERT INTO test2_exp_result VALUES (2, '{chips}', '{diapers}', 0.2857142857142857, 0.66666666666666663, 0.93333333333333324, 0.85714285714285698);
    INSERT INTO test2_exp_result VALUES (1, '{chips}', '{diapers,beer}', 0.2857142857142857, 0.66666666666666663, 0.93333333333333324, 0.85714285714285698);
    INSERT INTO test2_exp_result VALUES (6, '{diapers}', '{beer}', 0.7142857142857143, 1, 1, 0);
    INSERT INTO test2_exp_result VALUES (4, '{beer}', '{diapers}', 0.7142857142857143, 0.7142857142857143, 1, 1);
    INSERT INTO test2_exp_result VALUES (3, '{chips,beer}', '{diapers}', 0.2857142857142857, 0.66666666666666663, 0.93333333333333324, 0.85714285714285698);
    INSERT INTO test2_exp_result VALUES (5, '{chips}', '{beer}', 0.42857142857142855, 1, 1, 0);

    res = madlib.assoc_rules (.1, .5, 'trans_id', 'product', 'test_data1','madlib_installcheck_assoc_rules', false);

    RETURN;

end $$ language plpgsql;
CREATE FUNCTION
---------------------------------------------------------------------------
-- Test
---------------------------------------------------------------------------
SELECT install_test();
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  table "test_data1" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'trans_id' as the Apache Cloudberry data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  table "test_data2" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'trans_id' as the Apache Cloudberry data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  table "test1_exp_result" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'ruleid' as the Apache Cloudberry data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  table "test2_exp_result" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'ruleid' as the Apache Cloudberry data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING:  terminating connection because of crash of another server process  (seg0 slice3 172.17.0.6:7002 pid=45213)
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING:  terminating connection because of crash of another server process  (seg0 slice1 172.17.0.6:7002 pid=45202)
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING:  terminating connection because of crash of another server process  (seg0 172.17.0.6:7002 pid=45137)
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING:  writer gang of current global transaction is lost
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING:  Any temporary tables for this session have been dropped because the gang was disconnected (session id = 596)
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: ERROR:  DTX RollbackAndReleaseCurrentSubTransaction dispatch failed
CONTEXT:  PL/Python function "assoc_rules"
PL/pgSQL function install_test() line 93 at assignment

@tuhaihe (Member) commented on Feb 11, 2026:

Hi @zhangyue1818 thanks for your contribution. But I tested this PR in Cloudberry 2.0 and the coming Cloudberry 2.1 release, one test case failed:

The error above occurred in a Docker container environment. I retested MADlib installation and install-check on Cloudberry 2.0 and 2.1 running in a virtual machine, and all tests (including assoc_rules) passed without errors.

Thanks again.

Add bounds checking before accessing unique value arrays to prevent
out-of-bounds reads in the SparseData operation loop.

Problem:
In op_sdata_by_sdata(), the loop increments indices i and j to
traverse the unique values in left and right SparseData structures.
After incrementing, the code immediately accesses vals->data[i] and
vals->data[j] in the next iteration without verifying that i and j
are within bounds (i.e., < unique_value_count). This could lead to
reading beyond the allocated array boundaries.

Solution:
Add explicit bounds checking after index increments and before
accessing the arrays. The check breaks the loop if either index
reaches or exceeds the respective unique_value_count, preventing
invalid memory access.

The fix is placed after the index increment logic (lines 1088-1101)
and before reading run_length values and accessing the vals arrays,
ensuring all subsequent array operations are safe.
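The shape of this fix can be sketched with a simplified stand-in for the SparseData merge loop. Everything below is illustrative: the `rle_stream` type, the `rle_add` function, and the element-wise addition are hypothetical substitutes for MADlib's actual SparseData structures and `op_sdata_by_sdata`; only the pattern — break after the index increments once either index reaches its `count` — mirrors the described change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a SparseData unique-value stream:
 * runs[] holds run lengths, vals[] the value carried by each run. */
typedef struct {
    const int    *runs;
    const double *vals;
    size_t        count;   /* analogous to unique_value_count */
} rle_stream;

/* Merge two run-length-encoded streams by pairwise addition,
 * mirroring the loop shape described in the commit message. */
static size_t rle_add(const rle_stream *l, const rle_stream *r,
                      double *out, size_t out_cap)
{
    size_t i = 0, j = 0, n = 0;
    int left_left  = l->count ? l->runs[0] : 0;
    int right_left = r->count ? r->runs[0] : 0;

    while (i < l->count && j < r->count && n < out_cap) {
        out[n++] = l->vals[i] + r->vals[j];

        /* Consume the shorter remaining run from both sides. */
        int step = left_left < right_left ? left_left : right_left;
        left_left  -= step;
        right_left -= step;

        /* Advance to the next run when one is exhausted. */
        if (left_left == 0 && ++i < l->count)
            left_left = l->runs[i];
        if (right_left == 0 && ++j < r->count)
            right_left = r->runs[j];

        /* The fix: bounds check after the increments, before the next
         * iteration dereferences vals[i]/vals[j] again. */
        if (i >= l->count || j >= r->count)
            break;
    }
    return n;
}
```

Without the final check, the loop condition alone would still guard the next iteration here, but in the real code the values are read before the condition is re-evaluated, which is exactly the out-of-bounds window the commit closes.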
@zhangyue1818 commented:

> Hi @zhangyue1818 thanks for your contribution. But I tested this PR in Cloudberry 2.0 and the coming Cloudberry 2.1 release, one test case failed:
>
> The error above occurred in a Docker container environment. I retested MADlib installation and install-check on Cloudberry 2.0 and 2.1 running in a virtual machine, and all tests (including assoc_rules) passed without errors.
>
> Thanks again.

fix in b57e5a9

@tuhaihe (Member) commented on Feb 11, 2026:

> Hi @zhangyue1818 thanks for your contribution. But I tested this PR in Cloudberry 2.0 and the coming Cloudberry 2.1 release, one test case failed:
>
> The error above occurred in a Docker container environment. I retested MADlib installation and install-check on Cloudberry 2.0 and 2.1 running in a virtual machine, and all tests (including assoc_rules) passed without errors.
>
> Thanks again.
>
> fix in b57e5a9

Thanks! Now tested, and it runs well in both the Docker and virtual machine environments.
