Populate conda-lock md5 and sha256 in scan manifests#1802
davidBConda wants to merge 1 commit into microsoft:main
- Add optional Sha256 on CondaComponent (JSON sha256) and extend ctor with md5/sha256 from lockfile hash map
- Add optional Md5 and Sha256 on PipComponent for pip-managed conda-lock entries; include digests in component id when present
- CondaDependencyResolver: TryGetHash reads per-package hash.md5 / hash.sha256 from conda-lock.yml
- Extend CondaLock detector tests to assert digest values from fixture

Made-with: Cursor
Pull request overview
Updates the CondaLock detector pipeline to populate per-package digest fields (md5/sha256) into scan manifests, and adjusts typed component identity behavior to incorporate digests.
Changes:
- Extend `CondaDependencyResolver` to extract `md5`/`sha256` from conda-lock package `hash` maps and populate them onto created components.
- Add `Md5`/`Sha256` fields to `PipComponent` and `Sha256` to `CondaComponent`, and update `ComputeBaseId()` logic to include digests.
- Expand `CondaLockComponentDetectorTests` to assert digests are populated for both Conda and Pip components.
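
The per-package hash lookup described in the first bullet can be sketched roughly as follows. This is a minimal illustration that assumes the package's `hash` map has already been deserialized into a dictionary; `CondaLockHashes` and this `TryGetHash` signature are hypothetical, since the PR's actual method body is not shown in this conversation.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical helper; the real TryGetHash in CondaDependencyResolver may differ.
public static class CondaLockHashes
{
    // Returns true and the trimmed digest when `hash` holds a non-empty value
    // for `algorithm` ("md5" or "sha256"); otherwise returns false and null.
    public static bool TryGetHash(
        IReadOnlyDictionary<string, string>? hash,
        string algorithm,
        out string? value)
    {
        value = null;
        if (hash is null
            || !hash.TryGetValue(algorithm, out var raw)
            || string.IsNullOrWhiteSpace(raw))
        {
            return false;
        }

        value = raw.Trim();
        return true;
    }
}
```

A missing map or a missing key yields `false`, so callers can pass `null` digests through to the component instead of failing the scan.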
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Detectors.Tests/CondaLockComponentDetectorTests.cs | Adds assertions validating md5/sha256 propagation from conda-lock YAML into detected components. |
| src/Microsoft.ComponentDetection.Detectors/conda/CondaDependencyResolver.cs | Extracts digests from conda-lock hash maps and sets them on created PipComponent/CondaComponent instances. |
| src/Microsoft.ComponentDetection.Contracts/TypedComponent/PipComponent.cs | Introduces Md5/Sha256 serialization fields and changes component BaseId/Id to include digest suffix. |
| src/Microsoft.ComponentDetection.Contracts/TypedComponent/CondaComponent.cs | Adds Sha256 field and includes it in BaseId/Id computation. |

    var digestSuffix = string.Empty;
    if (!string.IsNullOrEmpty(this.Sha256))
    {
        digestSuffix = $" {this.Sha256}";
    }
    else if (!string.IsNullOrEmpty(this.Md5))
    {
        digestSuffix = $" {this.Md5}";
    }

    return $"{this.Name} {this.Version}{digestSuffix} - {this.Type}".ToLowerInvariant();

ComputeBaseId() now incorporates Sha256/Md5 into the Pip component identity. Since Component.Id is used as the stable key in manifests and dependency graphs, this is a breaking change: the same package/version can now produce different IDs depending on whether a digest is present, and verification tests that diff manifests across versions will fail. If the goal is to surface digests in output without changing identity semantics, consider keeping BaseId as name version - type and emitting digests as separate fields (or via extended ID properties if the ID format must include them).

Suggested change:

    - var digestSuffix = string.Empty;
    - if (!string.IsNullOrEmpty(this.Sha256))
    - {
    -     digestSuffix = $" {this.Sha256}";
    - }
    - else if (!string.IsNullOrEmpty(this.Md5))
    - {
    -     digestSuffix = $" {this.Md5}";
    - }
    - return $"{this.Name} {this.Version}{digestSuffix} - {this.Type}".ToLowerInvariant();
    + return $"{this.Name} {this.Version} - {this.Type}".ToLowerInvariant();

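
To make the identity concern concrete, here is a minimal sketch of a component that carries digests as plain metadata while its ID depends only on name and version. `PipComponentSketch` is a simplified, hypothetical stand-in (the real `PipComponent` derives from `TypedComponent`), and the literal `Pip` stands in for `{this.Type}`.

```csharp
#nullable enable

// Hypothetical, simplified stand-in for PipComponent.
public sealed class PipComponentSketch
{
    public PipComponentSketch(string name, string version, string? md5 = null, string? sha256 = null)
    {
        this.Name = name;
        this.Version = version;
        this.Md5 = md5;
        this.Sha256 = sha256;
    }

    public string Name { get; }

    public string Version { get; }

    // Digests are surfaced as separate fields only.
    public string? Md5 { get; }

    public string? Sha256 { get; }

    // Identity ignores the digests, so it is the same with or without them.
    public string Id => $"{this.Name} {this.Version} - Pip".ToLowerInvariant();
}
```

With this shape, `new PipComponentSketch("Requests", "2.31.0")` and the same package constructed with a `sha256` value produce an identical `Id`, so manifests from before and after the change still diff cleanly.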

    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    [JsonPropertyName("mD5")]
    public string? Md5 { get; set; }

    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    [JsonPropertyName("sha256")]
    public string? Sha256 { get; set; }

Md5/Sha256 are mutable (set;) but are also used to compute (and then cache) BaseId/Id. If these properties are ever set/changed after the component has been registered (or after Id has been accessed once), the cached ID won’t reflect the updated digest, causing inconsistent graph keys. To prevent this class of bugs, consider making these init-only or constructor-set so identity inputs can’t change after creation.
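
A rough sketch of the init-only option, assuming C# 9 or later: the digest can be set in an object initializer but not afterwards, so a cached ID cannot go stale. `CachedIdSketch` and its ID format are hypothetical and only illustrate the caching interaction.

```csharp
#nullable enable

// Hypothetical sketch of an ID that is computed once and cached.
public sealed class CachedIdSketch
{
    private string? cachedId;

    // init-only: assignable in an object initializer, read-only afterwards.
    public string Name { get; init; } = string.Empty;

    public string? Sha256 { get; init; }

    // Computed on first access and then reused. Because the inputs cannot
    // change after construction, the cached value always matches them.
    public string Id => this.cachedId ??= $"{this.Name} {this.Sha256}".Trim().ToLowerInvariant();
}
```

An assignment such as `component.Sha256 = "other"` after construction is rejected by the compiler, which rules out the stale-cache case at build time rather than at runtime.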

      public override ComponentType Type => ComponentType.Conda;

    - protected override string ComputeBaseId() => $"{this.Name} {this.Version} {this.Build} {this.Channel} {this.Subdir} {this.Namespace} {this.Url} {this.MD5} - {this.Type}";
    + protected override string ComputeBaseId() => $"{this.Name} {this.Version} {this.Build} {this.Channel} {this.Subdir} {this.Namespace} {this.Url} {this.MD5} {this.Sha256} - {this.Type}";

ComputeBaseId() now always interpolates this.Sha256. When Sha256 is null/empty (e.g., callers like the Linux conda artifact factory don’t pass it), this changes the resulting component ID string (extra whitespace) compared to previous releases, which can break manifest diffing and component de-duplication. Consider building the ID from non-empty parts (or conditionally appending sha256 only when present) so IDs are stable when sha256 isn’t available.

Suggested change:

    - protected override string ComputeBaseId() => $"{this.Name} {this.Version} {this.Build} {this.Channel} {this.Subdir} {this.Namespace} {this.Url} {this.MD5} {this.Sha256} - {this.Type}";
    + protected override string ComputeBaseId()
    + {
    +     var baseIdWithoutSha256 = $"{this.Name} {this.Version} {this.Build} {this.Channel} {this.Subdir} {this.Namespace} {this.Url} {this.MD5}";
    +     if (string.IsNullOrEmpty(this.Sha256))
    +     {
    +         return $"{baseIdWithoutSha256} - {this.Type}";
    +     }
    +     return $"{baseIdWithoutSha256} {this.Sha256} - {this.Type}";
    + }


    public string MD5 { get; set; }

    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    [JsonPropertyName("sha256")]
    public string Sha256 { get; set; }

Sha256 is ignored when null, but MD5 is still always serialized. If MD5 is missing for a package, this will emit "mD5": null in scan manifests, which is inconsistent with the new sha256 behavior and may create noisy diffs. Consider adding JsonIgnore(Condition = WhenWritingNull) to MD5 as well (or otherwise aligning null-handling for both digest fields).
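
A small sketch of the asymmetry using `System.Text.Json` with default options. `DigestFieldsSketch` is a hypothetical type that copies only the two attribute setups under discussion; the explicit `JsonPropertyName("mD5")` on `MD5` is an assumption made so the output matches the key named in the comment.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type reproducing only the two digest properties.
public sealed class DigestFieldsSketch
{
    // No JsonIgnore condition: a null value is still written.
    [JsonPropertyName("mD5")]
    public string? MD5 { get; set; }

    // Omitted from the output entirely when null.
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    [JsonPropertyName("sha256")]
    public string? Sha256 { get; set; }
}

public static class DigestJson
{
    // Serializes with default options (nulls written unless a property opts out).
    public static string Serialize(DigestFieldsSketch value) => JsonSerializer.Serialize(value);
}
```

Serializing an instance with both digests unset produces `{"mD5":null}`: the sha256 key disappears while the md5 key stays with a null value, which is the inconsistency described above.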

        };
    }

    return new CondaComponent(package.Name, package.Version, null, package.Category, null, null, null, md5, sha256);

Passing md5/sha256 into CondaComponent will make the component ID platform/artifact-specific (hashes differ per platform/build), reintroducing the duplicate-component scenario the comment above is trying to avoid by omitting URL. If the intent is still to dedupe the same name/version across platforms, consider storing hashes as metadata fields but keeping them out of the identity used for Component.Id (or otherwise explicitly scoping identity changes to the desired cases).

Suggested change:

    - return new CondaComponent(package.Name, package.Version, null, package.Category, null, null, null, md5, sha256);
    + return new CondaComponent(package.Name, package.Version, null, package.Category, null, null, null, null, null);

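
The duplicate-component scenario can be illustrated with a short sketch. It assumes a lockfile that lists the same name and version once per platform, each with its own digest; `CondaDedupSketch`, the tuple shape, and the shortened ID format are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts distinct component IDs under two identity schemes.
public static class CondaDedupSketch
{
    // Digest is part of the identity, so per-platform artifacts stay distinct.
    public static int CountWithDigestInId(IEnumerable<(string Name, string Version, string Sha256)> packages) =>
        packages.Select(p => $"{p.Name} {p.Version} {p.Sha256} - Conda").Distinct().Count();

    // Digest is metadata only, so the same name/version collapses to one component.
    public static int CountWithoutDigestInId(IEnumerable<(string Name, string Version, string Sha256)> packages) =>
        packages.Select(p => $"{p.Name} {p.Version} - Conda").Distinct().Count();
}
```

For two entries of the same package version with different digests, the first method reports two components and the second reports one, which is the de-duplication behavior the review comment wants to preserve.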
Example PR that adds sha256 to conda-lock file processing and fixes md5 processing.
Current Example of CondaLock Processed Data
CondaLock Data with SHA256 added and MD5 included