Feat/support streaming uncompress dict #78

kjdev · 2025-04-20T22:13:21Z

Summary by CodeRabbit

New Features
- Added support for streaming decompression with dictionaries, improving compatibility with data where frame content size is unknown.
Bug Fixes
- Enhanced error reporting with more detailed messages during compression and decompression operations.
Documentation
- Updated documentation to specify the minimum required version of the system libzstd library.
Tests
- Introduced new tests for streaming compression and dictionary-based decompression.
- Removed version checks from several tests, allowing them to run unconditionally.
- Removed an obsolete compression level test.

coderabbitai · 2025-04-20T22:13:27Z

Walkthrough

This update introduces several internal improvements and test adjustments for the zstd PHP extension. The build configuration now includes the zstd_preSplit.c source file when compiling from bundled sources. The documentation is updated to clarify the minimum required libzstd version. Error reporting in compression and decompression functions is enhanced with more descriptive messages, and streaming decompression support is added when the frame content size is unknown. Multiple test files are updated to remove version-based skip conditions, a new dictionary-based streaming test is added, and an outdated test is removed. The zstd subproject pointer is also updated to a newer commit.

Changes

File(s)	Change Summary
README.md	Added note specifying minimum required system libzstd version (1.4.0) after build instructions.
config.m4, config.w32	Added `zstd_preSplit.c` to the list of compression source files for bundled builds.
zstd	Updated subproject pointer to a newer commit.
zstd.c	Improved error reporting using `ZSTD_getErrorName()`, standardized warnings, and added streaming decompression support in `zstd_uncompress_dict`. Minor formatting changes.
tests/008.phpt	Removed test for compression level validation and error handling.
tests/009.phpt, tests/dictionary_01.phpt, tests/streams_5.phpt	Removed version-based skip conditions so tests always run regardless of libzstd version.
tests/dictionary_02.phpt	Added new test for streaming compression and dictionary-based decompression.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PHP_Extension
    participant ZSTD_Library

    User->>PHP_Extension: Call zstd_uncompress_dict(data, dict)
    PHP_Extension->>ZSTD_Library: Query frame content size
    alt Known content size
        PHP_Extension->>ZSTD_Library: Decompress in one call
    else Unknown content size
        loop Streaming decompression
            PHP_Extension->>ZSTD_Library: Decompress chunk
            ZSTD_Library-->>PHP_Extension: Return chunk
            PHP_Extension->>PHP_Extension: Extend buffer if needed
        end
    end
    PHP_Extension-->>User: Return decompressed data or error message
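
The unknown-size branch of the diagram can be sketched without libzstd by replacing the decoder with a callback. This is a minimal illustration only: `chunk_fn`, `drain_to_buffer`, `counter_source` and `counter_next` are hypothetical names that do not appear in zstd.c, and the callback merely stands in for a call such as ZSTD_decompressStream().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a streaming decoder: writes up to `cap` bytes
 * into `dst` and returns how many were produced (0 means "finished"). */
typedef size_t (*chunk_fn)(void *state, unsigned char *dst, size_t cap);

/* Collect everything the producer emits, extending the buffer by one
 * `chunk` whenever it is full. Returns NULL on allocation failure. */
static unsigned char *drain_to_buffer(chunk_fn next, void *state,
                                      size_t chunk, size_t *out_len)
{
    size_t cap = chunk, len = 0;
    unsigned char *buf;

    if (chunk == 0) return NULL;
    buf = malloc(cap);
    if (!buf) return NULL;

    for (;;) {
        size_t n;
        if (len == cap) {                       /* "Extend buffer if needed" */
            unsigned char *tmp = realloc(buf, cap + chunk);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
            cap += chunk;
        }
        n = next(state, buf + len, cap - len);  /* "Decompress chunk" */
        if (n == 0) break;                      /* producer is done */
        len += n;
    }
    *out_len = len;
    return buf;
}

/* Toy producer used only for demonstration: emits 0, 1, 2, ... */
typedef struct { size_t remaining; unsigned char next_byte; } counter_source;

static size_t counter_next(void *state, unsigned char *dst, size_t cap)
{
    counter_source *src = state;
    size_t n = src->remaining < cap ? src->remaining : cap;
    for (size_t i = 0; i < n; i++) dst[i] = src->next_byte++;
    src->remaining -= n;
    return n;
}
```

The caller owns the returned buffer and must free() it. In the real extension the output is a zend_string rather than a malloc'd buffer, so this only mirrors the control flow.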

Poem

In the warren, code hops anew,
With zstd now streaming through!
Errors named, no longer vague,
Tests unskipped—no more plague.
A dictionary test joins the crew,
And README tells what’s due.
🐇✨ Compression’s future, bright and true!

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d72bd96 and 7230778.

📒 Files selected for processing (1)

zstd.c (12 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

zstd.c

⏰ Context from checks skipped due to timeout of 90000ms (19)

GitHub Check: ci (8.2, x64, nts, vs16, true, apcu)
GitHub Check: ci (8.2, x64, nts, vs16, false, apcu)
GitHub Check: ci (8.4, x86, ts, vs17, true)
GitHub Check: ci (8.3, x64, ts, vs16, false, apcu)
GitHub Check: ci (8.3, x64, nts, vs16, false)
GitHub Check: ci (8.3, x64, ts, vs16, true)
GitHub Check: ci (8.3, x64, ts, vs16, false)
GitHub Check: ci (8.4, x86, nts, vs17, true)
GitHub Check: ci (8.4, x86, nts, vs17, false)
GitHub Check: ci (8.4, x64, ts, vs17, false)
GitHub Check: ci (8.4, x64, nts, vs17, true)
GitHub Check: ci (8.4, x64, ts, vs17, true)
GitHub Check: ci (8.4, x64, nts, vs17, false)
GitHub Check: ci (8.2-zts, apcu, php)
GitHub Check: ci (7.3, ghcr.io/kjdev/php)
GitHub Check: ci (8.2-zts, system, apcu, php)
GitHub Check: ci (8.4, php)
GitHub Check: ci (8.4-zts, apcu, php)
GitHub Check: ci (8.1, php)

coderabbitai

Actionable comments posted: 3

🔭 Outside diff range comments (2)

zstd.c (2)
949-957: ⚠️ Potential issue

Typo causes compilation failure (zend_string(contents);)

zend_string() is not a function. The intended call is zend_string_release() (or zend_string_release_ex()).
-        zend_string(contents);
+        zend_string_release(contents);
Without this fix the file will not compile.

735-757: 🛠️ Refactor suggestion

Dictionary objects leak on stream close

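cdict/ddict are created here but never freed in either the success path or the corresponding php_zstd_comp_close / php_zstd_decomp_close functions, leading to a per-stream memory leak.

Option 1 – free them once the context no longer needs them:
ZSTD_CCtx_refCDict(self->cctx, cdict);
ZSTD_freeCDict(cdict);   /* only safe after the last use of cctx: the CDict is referenced, not copied */
Do the symmetric change for ddict.
Caveat: zstd.h documents that a CDict/DDict passed to ZSTD_CCtx_refCDict() / ZSTD_DCtx_refDDict() is only referenced and must outlive its use within the context, so freeing it immediately after the ref call, while the stream is still open, would be a use-after-free.
Option 2 – store them in php_zstd_stream_data and release inside the close routines.

Option 2 is the safer choice for a long-lived stream; either way, please ensure deterministic cleanup.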

🧹 Nitpick comments (2)

tests/dictionary_02.phpt (2)
21-21: Consider adding error checking for file operations

The file operation on this line assumes success but doesn't handle potential failures. While the test is in a controlled environment, it would be more robust to check for errors when writing to the file.
-var_dump(file_put_contents('compress.zstd://' . $file, $data, 0, $ctx) == strlen($data));
+$result = file_put_contents('compress.zstd://' . $file, $data, 0, $ctx);
+if ($result === false) {
+    echo "Error: Failed to write compressed data\n";
+}
+var_dump($result == strlen($data));
25-25: Consider handling potential read failures

The file_get_contents() call could fail if the file is not readable, which would make this test fail in an unexpected way. Adding error handling would make the test more robust.
-var_dump(zstd_uncompress_dict(file_get_contents($file), $dictionary) === $data);
+$compressed = file_get_contents($file);
+if ($compressed === false) {
+    echo "Error: Failed to read compressed data\n";
+    var_dump(false);
+} else {
+    var_dump(zstd_uncompress_dict($compressed, $dictionary) === $data);
+}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdc42be and 5968729.

📒 Files selected for processing (10)

README.md (1 hunks)
config.m4 (1 hunks)
config.w32 (1 hunks)
tests/008.phpt (0 hunks)
tests/009.phpt (0 hunks)
tests/dictionary_01.phpt (0 hunks)
tests/dictionary_02.phpt (1 hunks)
tests/streams_5.phpt (0 hunks)
zstd (1 hunks)
zstd.c (12 hunks)

💤 Files with no reviewable changes (4)

tests/dictionary_01.phpt
tests/009.phpt
tests/streams_5.phpt
tests/008.phpt

⏰ Context from checks skipped due to timeout of 90000ms (17)

GitHub Check: ci (8.3, x86, nts, vs16, true, apcu)
GitHub Check: ci (8.4, x64, ts, vs17, false)
GitHub Check: ci (8.4, x64, ts, vs17, true)
GitHub Check: ci (8.1-zts, php)
GitHub Check: ci (8.4, x86, nts, vs17, true)
GitHub Check: ci (8.4, x86, ts, vs17, false)
GitHub Check: ci (8.4, x86, ts, vs17, true)
GitHub Check: ci (8.4, x86, nts, vs17, false)
GitHub Check: ci (8.2-zts, php)
GitHub Check: ci (8.1, apcu, php)
GitHub Check: ci (8.4, x64, nts, vs17, true)
GitHub Check: ci (8.4, x64, nts, vs17, false)
GitHub Check: ci (8.2, system, apcu, php)
GitHub Check: ci (8.3-zts, apcu, php)
GitHub Check: ci (8.1, php)
GitHub Check: ci (8.3, apcu, php)
GitHub Check: ci (8.2, php)

🔇 Additional comments (6)

zstd (1)

1-1: Approved: Submodule pointer update
The zstd subproject pointer has been correctly advanced to commit f8745da6ff1ad1e7bab384bd1f9d742439278e99, aligning with the new bundled source additions and test improvements.

config.w32 (1)

23-23: Addition of zstd_preSplit.c looks good

The inclusion of zstd_preSplit.c in the Windows build configuration aligns with the corresponding update in config.m4, ensuring consistent source file availability across all build platforms.

README.md (1)

28-29: Good documentation update for minimum version requirement

Clear documentation about the minimum required libzstd version helps users avoid compatibility issues when using the system library option. This aligns with the removal of version-based skip conditions in the test files.

config.m4 (1)

76-76: Addition of zstd_preSplit.c looks good

The inclusion of zstd_preSplit.c in the Unix/Linux build configuration ensures consistent availability of this compression source file across all build platforms.

tests/dictionary_02.phpt (1)

1-36: Well-structured test for streaming dictionary decompression

This new test properly validates the streaming compression with dictionary and the complementary dictionary-based decompression functionality, which is the core feature described in the PR title. The test structure follows best practices with clear setup, execution, verification, and cleanup phases.

zstd.c (1)

717-719: ⚠️ Potential issue

ZSTD_WARNING format specifier mismatch

The format string expects two %d placeholders, but only one argument is supplied (level).
At runtime this invokes undefined behaviour.
-    ZSTD_WARNING("zstd: compression level (%d) must be less than %d",
-                 level, ZSTD_maxCLevel());
+    ZSTD_WARNING("zstd: compression level (%d) must be less than %d",
+                 level, ZSTD_maxCLevel());
(You can also reuse maxLevel from ZSTD_maxCLevel() to avoid a second call.)

Likely an incorrect or invalid review comment.
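
A placeholder/argument mismatch of the kind discussed in this comment can be caught at compile time when the warning function carries a printf format attribute. The sketch below assumes GCC or Clang; `format_warning` and `PRINTF_LIKE` are hypothetical stand-ins, not the extension's ZSTD_WARNING macro, and the values 30 and 22 in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* On GCC/Clang, ask the compiler to check format strings against arguments. */
#if defined(__GNUC__) || defined(__clang__)
#  define PRINTF_LIKE(fmt_idx, arg_idx) \
       __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#  define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical warning helper: formats the message into a caller buffer. */
static int format_warning(char *dst, size_t cap, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

static int format_warning(char *dst, size_t cap, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(dst, cap, fmt, ap);  /* never writes more than cap bytes */
    va_end(ap);
    return n;
}
```

Calling `format_warning(msg, sizeof msg, "zstd: compression level (%d) must be less than %d", 30, 22)` supplies one argument per placeholder. Dropping the last argument would trigger a -Wformat diagnostic on compilers that honour the attribute.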

coderabbitai · 2025-04-20T22:16:15Z

zstd.c

+            result = ZSTD_decompressStream(dctx, &out, &in);
+            if (ZSTD_IS_ERROR(result)) {
+                zend_string_efree(output);
+                ZSTD_freeDCtx(dctx);
+                ZSTD_freeDDict(ddict);
+                ZSTD_WARNING("%s", ZSTD_getErrorName(result));
+                RETURN_FALSE;
+            }
+


🛠️ Refactor suggestion

Unbounded buffer growth may exhaust memory

Inside the streaming loop you grow the output buffer with

out.size += size;
output = zend_string_extend(output, out.size, 0);

size is constant (ZSTD_DStreamOutSize()), so every refill grows the buffer by the same fixed chunk.
For large inputs with unknown size this means O(n) reallocations and, if each one has to move the buffer, O(n²) bytes copied in the worst case.

Consider doubling the capacity each time instead:

-    out.size += size;
+    out.size *= 2;

This reduces reallocations to O(log n) and avoids pathological copies.
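
The trade-off between the two growth policies can be estimated with a small simulation. This is an illustrative sketch, not code from the PR: `grow_cost` and `simulate_growth` are hypothetical, the buffer is assumed to start at one chunk, and every grow step is pessimistically assumed to copy the whole buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of what it costs to grow a buffer to `total` bytes. */
typedef struct {
    size_t reallocs;      /* number of grow steps */
    size_t bytes_copied;  /* worst case: each grow moves the old contents */
} grow_cost;

/* Simulate growing from one `chunk` until the capacity reaches `total`.
 * doubling == 0: capacity += chunk (linear policy)
 * doubling != 0: capacity *= 2     (geometric policy)
 * `chunk` must be non-zero, otherwise the loop cannot make progress. */
static grow_cost simulate_growth(size_t total, size_t chunk, int doubling)
{
    grow_cost cost = {0, 0};
    size_t capacity = chunk;

    while (capacity < total) {
        cost.bytes_copied += capacity;
        capacity = doubling ? capacity * 2 : capacity + chunk;
        cost.reallocs++;
    }
    return cost;
}
```

For a 1 MiB output and a 128 KiB chunk (both arbitrary figures), the linear policy takes 7 grow steps and the doubling policy takes 3. The price of doubling is that up to half of the final capacity may be unused until the buffer is truncated.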

kjdev · Apr 20, 2025

Buffer size slightly changed.

🧩 Analysis chain

@kjdev Thanks for making changes to the buffer size logic. Let me check what was updated in the PR to provide appropriate feedback.

🏁 Script executed:

#!/bin/bash
# Get the latest version of the PR diff to see the buffer size changes
PR_NUMBER=78
gh pr diff $PR_NUMBER -- zstd.c | grep -A10 -B2 "out\.size +="

Length of output: 108

@kjdev I see. Looking at the code to check your buffer sizing change. Let me examine the streaming decompression implementation.

🏁 Script executed:

#!/bin/bash
# Let's try another approach to get the specific change
cat zstd.c | grep -A20 -B20 "streaming.*out\.size.*="

Length of output: 55

@kjdev Let me look more specifically at the buffer resizing code in the zstd_uncompress_dict function.

🏁 Script executed:

#!/bin/bash
# Let's try to find the specific code section in the diff file
cat zstd.c | grep -n "zstd_uncompress_dict" -A 100 | grep -n "out\.size"

Length of output: 269

@kjdev I see the change now. Let me check how the chunk variable is defined to understand the buffer growth strategy better.

🏁 Script executed:

#!/bin/bash
# Check the definition of chunk and the context of the buffer resizing code
cat zstd.c | grep -n -A5 -B20 "streaming.*chunk.*="

Length of output: 53

@kjdev Thanks for making the buffer size change. I'm checking the specific change to provide more detailed feedback.

🏁 Script executed:

#!/bin/bash
# Let's try a direct grep on line numbers around the resize logic
sed -n '380,430p' zstd.c | grep -n "chunk"

Length of output: 127

@kjdev I see the change you made - you introduced a dedicated variable chunk = ZSTD_DStreamOutSize() for clarity instead of reusing the size variable. This does improve code readability by making the buffer growth more explicit.

Just a note that the buffer growth is still linear (out.size += chunk) rather than exponential. For most use cases this is perfectly fine, but if you encounter performance issues with very large files in the future, switching to an exponential growth strategy (out.size *= 2) would reduce the number of reallocations from O(n) to O(log n).

Thanks for the improvement to the code clarity!

🧬 Code Graph Analysis Results

Relevant snippets from zstd.stub.php (lines 17-23)

function compress(string $data, int $level = 3): string|false {}
function uncompress(string $data): string|false {}
function compress_dict(string $data, string $dict, int $level = 3): string|false {}
function uncompress_dict(string $data, string $dict): string|false {}

These stub declarations correspond directly to the main compression and decompression functions implemented in zstd.c. They define the expected parameter types and return types, which are relevant for understanding the interface and usage of the functions where buffer sizes might be involved.

Summary of relevant parts from zstd.c

The functions zstd_compress, zstd_uncompress, zstd_compress_dict, and zstd_uncompress_dict implement compression and decompression with optional dictionary support.

Buffer sizes are managed using ZSTD_compressBound(input_len) for compression output buffer allocation, and ZSTD_getFrameContentSize or ZSTD_DStreamOutSize() for decompression buffer sizing.

The function zstd_string_output_truncate is used to truncate the output buffer to the actual compressed or decompressed size, reallocating only if the free space is significantly large (more than 1/8th of capacity or over 1MB).

Streaming decompression uses buffers sized by ZSTD_DStreamInSize() and ZSTD_DStreamOutSize().

In the stream wrapper implementation (php_stream_zstd_opener), buffer sizes for input and output are set using ZSTD_DStreamInSize() and ZSTD_DStreamOutSize().

The output handler for compression uses ZSTD_CStreamOutSize() to allocate output buffers.

The buffer size changes mentioned in the user comment likely relate to these buffer size allocations and truncations in the compression/decompression functions or stream handling.

These details are essential for understanding how buffer sizes are determined, allocated, and adjusted in the code.
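
The truncation rule summarised above (reallocate only when the free space exceeds 1/8 of the capacity or 1 MB) can be written as a small predicate. This is a sketch of the rule as described in this comment, not the body of zstd_string_output_truncate: `should_shrink` is a hypothetical name, and "1 MB" is assumed here to mean 1024 * 1024 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: is the unused tail of the buffer large enough
 * to justify a shrinking reallocation? */
static bool should_shrink(size_t capacity, size_t used)
{
    size_t unused;

    if (used >= capacity) return false;   /* nothing to give back */
    unused = capacity - used;

    /* Assumed thresholds: more than 1/8 of capacity, or more than 1 MiB. */
    return unused > capacity / 8 || unused > (size_t)1024 * 1024;
}
```

With a 1000-byte buffer, 900 used bytes leave too little slack to bother (100 is not more than 125), while 800 used bytes would trigger a shrink. The absolute threshold matters for large buffers, where even a small fraction of the capacity is a lot of memory.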

zstd.c

kjdev added 8 commits April 18, 2025 09:01

refactor: zstd_uncompress_dict()

3d92a8b

refactor: zstd_compress_dict()

0e6f92e

feat: support stream with zstd_uncompress_dict()

a2903db

test: add stream for zstd_uncompress_dict()

2c079cd

test: change to fit the minimum system libzstd library version

adbd406

docs: add minimu system libzstd library version to README.md

c23d886

chore: update built-in zstd library to 1.5.7

147b93d

change: error message

5968729

coderabbitai bot reviewed Apr 20, 2025

kjdev added 3 commits April 21, 2025 07:21

fixup! refactor: zstd_compress_dict()

e8bde67

fixup! refactor: zstd_uncompress_dict()

d72bd96

fixup! refactor: zstd_uncompress_dict()

7230778

kjdev closed this Apr 20, 2025
