fix: Enhancements Needed for Secure Tar Extraction (5560)#5726
fix: Enhancements Needed for Secure Tar Extraction (5560)#5726aviruthen wants to merge 4 commits intoaws:masterfrom
Conversation
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes three security-related bugs in tar extraction safety logic in common_utils.py: (1) uses os.path.commonpath instead of startswith to prevent prefix-collision bypasses, (2) passes the resolved extract_path as base instead of using CWD, and (3) calls tar.getmembers() to pass a proper list to _get_safe_members. The fixes are sound and align with the already-corrected implementations in _repack_model.py. However, there are a few issues worth addressing.
🤖 Iteration #1 — Review Comments AddressedDescriptionThis PR adds robustness improvements to the tar extraction safety logic in Changes
Related IssueRelated issue: 5560 Changes Made
Comments reviewed: 7
|
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes tar extraction safety logic in sagemaker-core's common_utils.py by improving path validation (using os.path.commonpath instead of startswith), passing the correct base directory to _get_safe_members, and calling tar.getmembers() explicitly. The changes are security-relevant and well-tested, though there are a few issues to address.
🤖 Iteration #2 — Review Comments AddressedDescriptionThis PR adds tar extraction safety functions and their tests to BackgroundThe
Changes Made
Related IssueRelated issue: Enhancements Needed for Secure Tar Extraction Merge Checklist
Comments reviewed: 15
|
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes tar extraction safety logic in sagemaker-core's common_utils.py by improving path validation with os.path.commonpath (fixing a prefix-matching vulnerability), passing the correct base directory to _get_safe_members, and calling tar.getmembers() explicitly. The changes are well-motivated security fixes with good test coverage. A few minor issues need attention.
|
|
||
|
|
||
| def _get_safe_members(members): | ||
| def _get_safe_members(members, base): |
There was a problem hiding this comment.
The signature change from _get_safe_members(members) to _get_safe_members(members, base) is a breaking change for any code that calls this function with the old signature. Since this is a private function (underscore-prefixed), the risk is lower, but if any downstream code or other modules in sagemaker-core call _get_safe_members(tar) (passing the tar object directly as was done before), this will break. The old code also used _get_safe_members(tar) where tar was iterable — now it expects tar.getmembers() (a list). Please verify there are no other callers of _get_safe_members in the codebase that would break.
There was a problem hiding this comment.
Agreed, this should be fixed. However, perhaps this can be made an optional argument by providing a default value which can be handled (rather than removing it from the signature or verifying other function calls don't use this).
🤖 Iteration #3 — Review Comments AddressedDescriptionThis PR addresses the reported issues with tar extraction safety logic in Changes Made
Related IssueEnhancements Needed for Secure Tar Extraction Merge Checklist
Comments reviewed: 23
|
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes tar extraction safety logic in sagemaker-core's common_utils.py by improving path comparison (using os.path.commonpath instead of startswith), passing the correct base directory to _get_safe_members, and calling tar.getmembers() explicitly. The changes are security-relevant and well-tested, though there are a few issues to address.
| """ | ||
| # joinpath will ignore base if path is absolute | ||
| return not _get_resolved_path(joinpath(base, path)).startswith(base) | ||
| resolved = _get_resolved_path(joinpath(base, path)) |
There was a problem hiding this comment.
The os.path.commonpath fix is a good security improvement over startswith, which can be tricked by paths like /base/x2 matching /base/x. However, there's a subtle issue: os.path.commonpath normalizes paths, but base may not be normalized in the same way. You should ensure both resolved and base are consistently normalized before comparison. Consider:
resolved = _get_resolved_path(joinpath(base, path))
base = _get_resolved_path(base) # ensure consistent normalization
try:
return os.path.commonpath([resolved, base]) != base
except ValueError:
return TrueOtherwise, if base has a trailing slash or different normalization than what commonpath returns, the comparison could give incorrect results.
Description
The issue reports three bugs in tar extraction safety logic in _repack_model.py. After reading the actual code, _repack_model.py has ALREADY been fixed: (1) _get_safe_members takes a 'members' list param and is called with tar.getmembers(), (2) custom_extractall_tarfile passes _get_resolved_path(extract_path) as base, (3) _is_bad_path uses os.path.commonpath. However, sagemaker-core/src/sagemaker/core/common_utils.py references _get_resolved_path() and custom_extractall_tarfile() in its _create_or_update_code_dir and _extract_model functions, but these tar safety functions (_get_resolved_path, _is_bad_path, _is_bad_link, _get_safe_members, custom_extractall_tarfile) are NOT defined in common_utils.py. They need to be added to common_utils.py with the same corrected implementations from _repack_model.py.
Related Issue
Related issue: 5560
Changes Made
sagemaker-core/src/sagemaker/core/common_utils.pysagemaker-core/src/sagemaker/core/utils/__init__.pysagemaker-core/tests/unit/test_common_utils_tar_safety.pyAI-Generated PR
This PR was automatically generated by the PySDK Issue Agent.
Merge Checklist
prefix: descriptionformat