Feat/support streaming uncompress dict #78
Walkthrough
This update introduces several internal improvements and test adjustments for the zstd PHP extension. The build configuration now includes the …
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant PHP_Extension
    participant ZSTD_Library
    User->>PHP_Extension: Call zstd_uncompress_dict(data, dict)
    PHP_Extension->>ZSTD_Library: Query frame content size
    alt Known content size
        PHP_Extension->>ZSTD_Library: Decompress in one call
    else Unknown content size
        loop Streaming decompression
            PHP_Extension->>ZSTD_Library: Decompress chunk
            ZSTD_Library-->>PHP_Extension: Return chunk
            PHP_Extension->>PHP_Extension: Extend buffer if needed
        end
    end
    PHP_Extension-->>User: Return decompressed data or error message
```
Actionable comments posted: 3
🔭 Outside diff range comments (2)
zstd.c (2)
949-957: ⚠️ Potential issue
Typo causes compilation failure (`zend_string(contents);`)
`zend_string()` is not a function. The intended call is `zend_string_release()` (or `zend_string_release_ex()`).
```diff
- zend_string(contents);
+ zend_string_release(contents);
```
Without this fix the file will not compile.
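735-757: 🛠️ Refactor suggestion
Dictionary objects leak on stream close
`cdict`/`ddict` are created here but never freed, neither in the success path nor in the corresponding `php_zstd_comp_close`/`php_zstd_decomp_close` functions, leading to a per-stream memory leak.
Freeing them immediately after `ZSTD_CCtx_refCDict()`/`ZSTD_DCtx_refDDict()` is not safe: the context only references the dictionary, so according to the zstd API documentation the dictionary must outlive its use by the context.
Option 1 – simplest: let the context keep its own copy of the dictionary by loading it with `ZSTD_CCtx_loadDictionary()` instead of creating a `CDict`, and do the symmetric change for decompression with `ZSTD_DCtx_loadDictionary()`.
Option 2 – store `cdict`/`ddict` in `php_zstd_stream_data` and release them inside the close routines.
Either approach is fine, but please ensure deterministic cleanup.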
🧹 Nitpick comments (2)
tests/dictionary_02.phpt (2)
21-21: Consider adding error checking for file operations
The file operation on this line assumes success but doesn't handle potential failures. While the test runs in a controlled environment, it would be more robust to check for errors when writing to the file.
```diff
-var_dump(file_put_contents('compress.zstd://' . $file, $data, 0, $ctx) == strlen($data));
+$result = file_put_contents('compress.zstd://' . $file, $data, 0, $ctx);
+if ($result === false) {
+    echo "Error: Failed to write compressed data\n";
+}
+var_dump($result == strlen($data));
```
25-25: Consider handling potential read failures
The `file_get_contents()` call could fail if the file is not readable, which would make this test fail in an unexpected way. Adding error handling would make the test more robust.
```diff
-var_dump(zstd_uncompress_dict(file_get_contents($file), $dictionary) === $data);
+$compressed = file_get_contents($file);
+if ($compressed === false) {
+    echo "Error: Failed to read compressed data\n";
+    var_dump(false);
+} else {
+    var_dump(zstd_uncompress_dict($compressed, $dictionary) === $data);
+}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- README.md (1 hunk)
- config.m4 (1 hunk)
- config.w32 (1 hunk)
- tests/008.phpt (0 hunks)
- tests/009.phpt (0 hunks)
- tests/dictionary_01.phpt (0 hunks)
- tests/dictionary_02.phpt (1 hunk)
- tests/streams_5.phpt (0 hunks)
- zstd (1 hunk)
- zstd.c (12 hunks)
💤 Files with no reviewable changes (4)
- tests/dictionary_01.phpt
- tests/009.phpt
- tests/streams_5.phpt
- tests/008.phpt
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: ci (8.3, x86, nts, vs16, true, apcu)
- GitHub Check: ci (8.4, x64, ts, vs17, false)
- GitHub Check: ci (8.4, x64, ts, vs17, true)
- GitHub Check: ci (8.1-zts, php)
- GitHub Check: ci (8.4, x86, nts, vs17, true)
- GitHub Check: ci (8.4, x86, ts, vs17, false)
- GitHub Check: ci (8.4, x86, ts, vs17, true)
- GitHub Check: ci (8.4, x86, nts, vs17, false)
- GitHub Check: ci (8.2-zts, php)
- GitHub Check: ci (8.1, apcu, php)
- GitHub Check: ci (8.4, x64, nts, vs17, true)
- GitHub Check: ci (8.4, x64, nts, vs17, false)
- GitHub Check: ci (8.2, system, apcu, php)
- GitHub Check: ci (8.3-zts, apcu, php)
- GitHub Check: ci (8.1, php)
- GitHub Check: ci (8.3, apcu, php)
- GitHub Check: ci (8.2, php)
🔇 Additional comments (6)
zstd (1)
1-1: Approved: Submodule pointer update
The `zstd` subproject pointer has been correctly advanced to commit `f8745da6ff1ad1e7bab384bd1f9d742439278e99`, aligning with the new bundled source additions and test improvements.
config.w32 (1)
23-23: Addition of `zstd_preSplit.c` looks good
The inclusion of `zstd_preSplit.c` in the Windows build configuration aligns with the corresponding update in config.m4, ensuring consistent source file availability across all build platforms.
README.md (1)
28-29: Good documentation update for minimum version requirement
Clear documentation about the minimum required libzstd version helps users avoid compatibility issues when using the system library option. This aligns with the removal of version-based skip conditions in the test files.
config.m4 (1)
76-76: Addition of `zstd_preSplit.c` looks good
The inclusion of `zstd_preSplit.c` in the Unix/Linux build configuration ensures consistent availability of this compression source file across all build platforms.
tests/dictionary_02.phpt (1)
1-36: Well-structured test for streaming dictionary decompression
This new test properly validates streaming compression with a dictionary and the complementary dictionary-based decompression functionality, which is the core feature described in the PR title. The test structure follows best practices with clear setup, execution, verification, and cleanup phases.
zstd.c (1)
717-719: ⚠️ Potential issue
`ZSTD_WARNING` format specifier mismatch
The format string expects two `%d` placeholders, but only one argument is supplied (`level`).
At runtime this invokes undefined behaviour.
```diff
- ZSTD_WARNING("zstd: compression level (%d) must be less than %d",
-              level, ZSTD_maxCLevel());
+ ZSTD_WARNING("zstd: compression level (%d) must be less than %d",
+              level, ZSTD_maxCLevel());
```
(You can also reuse `maxLevel` from `ZSTD_maxCLevel()` to avoid a second call.)
Likely an incorrect or invalid review comment: the existing call already passes both `level` and `ZSTD_maxCLevel()`, so the suggested change is identical to the original.
```c
result = ZSTD_decompressStream(dctx, &out, &in);
if (ZSTD_IS_ERROR(result)) {
    zend_string_efree(output);
    ZSTD_freeDCtx(dctx);
    ZSTD_freeDDict(ddict);
    ZSTD_WARNING("%s", ZSTD_getErrorName(result));
    RETURN_FALSE;
}
```
🛠️ Refactor suggestion
Unbounded buffer growth may exhaust memory
Inside the streaming loop you grow the output buffer with
```c
out.size += size;
output = zend_string_extend(output, out.size, 0);
```
`size` is constant (`ZSTD_DStreamOutSize()`), so every refill grows the buffer linearly by the same chunk.
For large inputs with unknown size this means O(n) reallocations, each of which may copy the whole buffer, i.e. up to O(n²) bytes copied in total.
Consider doubling the capacity each time instead:
```diff
- out.size += size;
+ out.size *= 2;
```
This reduces reallocations to O(log n) and avoids pathological copies.
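The difference between the two growth strategies can be made concrete with a small cost model. This is an assumption-laden simulation, not a benchmark: it counts every grow step as a reallocation that copies the whole current buffer (the worst case, since `zend_string_extend` may sometimes grow in place), and `simulate_growth` and `grow_cost` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case cost model (an assumption, not a measurement): every grow step
 * is a reallocation that copies the bytes currently held in the buffer. */
typedef struct {
    size_t reallocs;
    size_t bytes_copied;
} grow_cost;

/* Start with one `chunk` and grow until the capacity reaches `target`.
 * doubling == 0: cap += chunk (linear); otherwise cap *= 2 (geometric). */
static grow_cost simulate_growth(size_t target, size_t chunk, int doubling)
{
    grow_cost c = {0, 0};
    size_t cap = chunk;
    while (cap < target) {
        c.bytes_copied += cap;   /* realloc may move the whole block */
        cap = doubling ? cap * 2 : cap + chunk;
        c.reallocs++;
    }
    return c;
}
```

For a 64 KiB result with 1 KiB chunks, linear growth needs 63 reallocations copying 2016 KiB in the worst case, while doubling needs 6 reallocations copying 63 KiB. The trade-off is that doubling can leave up to half of the final buffer unused until the output is truncated.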
buffer size slightly changed.
🧩 Analysis chain
@kjdev Thanks for making changes to the buffer size logic. Let me check what was updated in the PR to provide appropriate feedback.
🏁 Script executed:
```bash
#!/bin/bash
# Get the latest version of the PR diff to see the buffer size changes
PR_NUMBER=78
gh pr diff $PR_NUMBER -- zstd.c | grep -A10 -B2 "out\.size +="
```
Length of output: 108
@kjdev I see. Looking at the code to check your buffer sizing change. Let me examine the streaming decompression implementation.
🏁 Script executed:
```bash
#!/bin/bash
# Let's try another approach to get the specific change
cat zstd.c | grep -A20 -B20 "streaming.*out\.size.*="
```
Length of output: 55
@kjdev Let me look more specifically at the buffer resizing code in the zstd_uncompress_dict function.
🏁 Script executed:
```bash
#!/bin/bash
# Let's try to find the specific code section in the diff file
cat zstd.c | grep -n "zstd_uncompress_dict" -A 100 | grep -n "out\.size"
```
Length of output: 269
@kjdev I see the change now. Let me check how the chunk variable is defined to understand the buffer growth strategy better.
🏁 Script executed:
```bash
#!/bin/bash
# Check the definition of chunk and the context of the buffer resizing code
cat zstd.c | grep -n -A5 -B20 "streaming.*chunk.*="
```
Length of output: 53
@kjdev Thanks for making the buffer size change. I'm checking the specific change to provide more detailed feedback.
🏁 Script executed:
```bash
#!/bin/bash
# Let's try a direct grep on line numbers around the resize logic
sed -n '380,430p' zstd.c | grep -n "chunk"
```
Length of output: 127
@kjdev I see the change you made - you introduced a dedicated variable chunk = ZSTD_DStreamOutSize() for clarity instead of reusing the size variable. This does improve code readability by making the buffer growth more explicit.
Just a note that the buffer growth is still linear (out.size += chunk) rather than exponential. For most use cases this is perfectly fine, but if you encounter performance issues with very large files in the future, considering an exponential growth strategy (out.size *= 2) could reduce the number of reallocations from O(n) to O(log n).
Thanks for the improvement to the code clarity!
🧬 Code Graph Analysis Results
Relevant snippets from zstd.stub.php (lines 17-23)
```php
function compress(string $data, int $level = 3): string|false {}
function uncompress(string $data): string|false {}
function compress_dict(string $data, string $dict, int $level = 3): string|false {}
function uncompress_dict(string $data, string $dict): string|false {}
```
These stub declarations correspond directly to the main compression and decompression functions implemented in zstd.c. They define the expected parameter types and return types, which are relevant for understanding the interface and usage of the functions where buffer sizes might be involved.
Summary of relevant parts from zstd.c
- The functions `zstd_compress`, `zstd_uncompress`, `zstd_compress_dict`, and `zstd_uncompress_dict` implement compression and decompression with optional dictionary support.
- Buffer sizes are managed using `ZSTD_compressBound(input_len)` for compression output buffer allocation, and `ZSTD_getFrameContentSize` or `ZSTD_DStreamOutSize()` for decompression buffer sizing.
- The function `zstd_string_output_truncate` is used to truncate the output buffer to the actual compressed or decompressed size, reallocating only if the free space is significantly large (more than 1/8th of capacity or over 1MB).
- Streaming decompression uses buffers sized by `ZSTD_DStreamInSize()` and `ZSTD_DStreamOutSize()`.
- In the stream wrapper implementation (`php_stream_zstd_opener`), buffer sizes for input and output are set using `ZSTD_DStreamInSize()` and `ZSTD_DStreamOutSize()`.
- The output handler for compression uses `ZSTD_CStreamOutSize()` to allocate output buffers.
- The buffer size changes mentioned in the user comment likely relate to these buffer size allocations and truncations in the compression/decompression functions or stream handling.
These details are essential for understanding how buffer sizes are determined, allocated, and adjusted in the code.
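The truncation rule summarized above for `zstd_string_output_truncate` can be written as a small predicate. This is a hypothetical restatement based only on the summary ("more than 1/8th of capacity or over 1MB"); the name `sketch_should_shrink`, the reading of "1MB" as 1 MiB (1024 * 1024 bytes), and the strict comparisons are assumptions, and the extension's actual function may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical restatement of the described heuristic: shrink (realloc) the
 * output only when the unused tail is large relative to the capacity or
 * large in absolute terms. Assumes used <= capacity. */
static bool sketch_should_shrink(size_t capacity, size_t used)
{
    size_t free_space = capacity - used;
    return free_space > capacity / 8 || free_space > (size_t)1024 * 1024;
}
```

Keeping a small slack avoids a realloc (and a possible copy) that would save little memory, while slack that is large in either relative or absolute terms is worth returning to the allocator.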