[Bug](function) count_substrings returns wrong result for overlapping pattern boundary #62768

@hakanuzum

Search before asking

  • I searched in the issues and found no similar issues.

Version

master (after PR #62262)

What happened

count_substrings('ccc', 'cc') returns 2 instead of the expected 1 (non-overlapping count).

The regression test test_string_all.groovy expects 1, but the BE returns 2, which causes cloud_p0 CI failures.

Root Cause

In PR #62262, function_string.h was split into function_string_search.cpp. The find_pos() function checks pos < str_size but does not verify pos + pattern_ref.size <= str_size. As a result, memcmp_small_allow_overflow15 can compare bytes past the end of the string, and find_pos() reports a match for a pattern that extends beyond the string boundary.

// function_string_search.cpp, find_pos()
while (pos < str_size &&    // ← should be: pos + pattern_ref.size <= str_size
       memcmp_small_allow_overflow15(...)) {
    pos++;
}
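
A minimal sketch of the bounded search, for illustration only. It assumes plain std::memcmp as a stand-in for memcmp_small_allow_overflow15, and find_pos_bounded with its parameter list is a hypothetical name and signature, since the real find_pos() signature is not shown in this issue:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for find_pos(): returns the first index >= pos at which
// `pattern` lies entirely inside [0, str_size), or str_size if there is none.
// Empty patterns are not treated specially in this sketch.
inline std::size_t find_pos_bounded(const char* str, std::size_t str_size,
                                    const char* pattern, std::size_t pattern_size,
                                    std::size_t pos) {
    // Guard: the whole pattern must fit in the string, not just its first byte.
    while (pos + pattern_size <= str_size &&
           std::memcmp(str + pos, pattern, pattern_size) != 0) {
        pos++;
    }
    // If the loop stopped because the pattern no longer fits, report "not found".
    return (pos + pattern_size <= str_size) ? pos : str_size;
}
```

With this guard, a search in 'ccc' for 'cc' starting at pos=2 stops immediately, because 2 + 2 > 3, and no byte past the end of the string is compared.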

For count_substrings('ccc', 'cc'):

  • pos=0: matches cc at [0,1] → count=1, advance to pos=2
  • pos=2: pos < str_size (2 < 3) is true, memcmp reads bytes [2,3], but byte 3 lies past the end of the 3-byte string → false match → count=2
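
The trace above can be replayed in a small sketch. It assumes the byte right after the string happens to be 'c' (for example, the start of a neighbouring value in a contiguous buffer; this layout is an assumption made for illustration), and it uses std::memcmp in place of memcmp_small_allow_overflow15. count_non_overlapping and its require_fit flag are hypothetical names, not Doris APIs:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts non-overlapping occurrences of `pattern` in the
// first `str_size` bytes of `buf`. With require_fit == false it mimics the loose
// `pos < str_size` check, so the caller must guarantee that at least
// pattern_size - 1 readable bytes follow the string.
inline int count_non_overlapping(const char* buf, std::size_t str_size,
                                 const char* pattern, std::size_t pattern_size,
                                 bool require_fit) {
    if (pattern_size == 0) return 0;  // sketch-only choice: avoids an endless loop
    // Loose check looks only at the first byte; strict check needs the whole pattern to fit.
    auto in_range = [&](std::size_t p) {
        return require_fit ? p + pattern_size <= str_size : p < str_size;
    };
    int count = 0;
    std::size_t pos = 0;
    while (true) {
        // Advance until a match is found or the position leaves the allowed range.
        while (in_range(pos) && std::memcmp(buf + pos, pattern, pattern_size) != 0) {
            pos++;
        }
        if (!in_range(pos)) break;
        count++;
        pos += pattern_size;  // skip the matched bytes so matches never overlap
    }
    return count;
}
```

For a 4-byte buffer "cccc" whose logical string length is 3, count_non_overlapping(buf, 3, "cc", 2, false) yields 2, matching the trace, while passing true for require_fit yields 1.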

Expected behavior

count_substrings('ccc', 'cc') should return 1 (non-overlapping).

How to reproduce

SELECT count_substrings('ccc', 'cc');
-- Returns: 2
-- Expected: 1

Anything else

This causes qt_count_substrings_53 in test_string_all.groovy to fail in CI.
Introduced by refactoring PR #62262.
