add options for validating licenses that limits what is considered valid#144
add options for validating licenses that limits what is considered valid#144
Conversation
ValidateLicenses is unchanged for backward compatibility
ValidateLicensesWithOptions was added and includes an options parameter with values
ValidateLicensesWithOptions
- FailComplexExpressions - anything with a conjunction
- FailDeprecatedLicenses
- FailAllLicenseRefs
- FailAllDocumentRefs
Example call:
```
valid, _ := ValidateLicensesWithOptions("MIT AND Apache-2.0", ValidateLicensesOptions{FailComplexExpressions: true})
// returns valid=false
```
These provide a way to fail fast for any license that is in one of these categories.
also adds a simple test that makes sure benchmark use cases always pass to ensure improvements are not caused by use case failing early
There was a problem hiding this comment.
Pull request overview
Adds configurable license validation behavior and improves performance of license/exception/deprecated lookups by switching from slice scans to map lookups.
Changes:
- Introduce
ValidateLicensesWithOptionsandValidateLicensesOptionsto optionally reject complex expressions, deprecated IDs, and SPDX refs. - Add generated uppercase-keyed maps for active/deprecated licenses and exceptions; update lookup helpers to use maps.
- Expand tests and benchmarks to cover new scenarios and validate benchmark assumptions.
Show a summary per file
| File | Description |
|---|---|
| spdxexp/satisfies.go | Adds ValidateLicensesWithOptions, options type, and new fast paths in Satisfies; introduces helper predicates. |
| spdxexp/license.go | Switches license/exception/deprecated checks to map-based lookups. |
| spdxexp/spdxlicenses/get_licenses.go | Adds generated licensesMap and GetLicensesMap() accessor. |
| spdxexp/spdxlicenses/get_exceptions.go | Adds generated exceptionsMap and GetExceptionsMap() accessor. |
| spdxexp/spdxlicenses/get_deprecated.go | Adds generated deprecatedMap and GetDeprecatedMap() accessor. |
| spdxexp/satisfies_test.go | Adds tests for new validation options and Satisfies fast-path behavior; adds benchmark scenario safety test. |
| spdxexp/benchmark_validate_licenses_test.go | Updates benchmark scenarios to include more representative cases (spaces, deprecated, refs, etc.). |
| spdxexp/benchmark_satisfies_test.go | Updates benchmark scenarios similarly to validate Satisfies performance changes. |
| cmd/license.go | Updates code generation to emit license/deprecated maps and map accessors. |
| cmd/exceptions.go | Updates code generation to emit exceptions map and accessor. |
| cmd/doc.go | Updates extraction instructions. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (2)
spdxexp/satisfies_test.go:2683
- Grammar in comment: "to changes behavior" should be "to changes in behavior".
// to changes behavior of ValidateLicenses function.
spdxexp/satisfies.go:112
- The option flags (FailDeprecatedLicenses/FailAllLicenseRefs/FailAllDocumentRefs) are only checked for atomic strings. If the input is a valid SPDX expression (e.g. "MIT AND eCos-2.0" or "MIT AND LicenseRef-X"), parse() can succeed and the expression will be treated as valid even when those flags are enabled. Consider enforcing these options by inspecting the parsed node tree (or scanning tokens) and rejecting deprecated IDs / refs anywhere in the expression.
// whether the license expression is valid
if _, err := parse(license); err != nil {
valid = false
invalidLicenses = append(invalidLicenses, license)
}
- Files reviewed: 11/11 changed files
- Comments generated: 10
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ons configured to reject add test that would have caught this
also corrects formatting issues and puts public functions at top of generated files so they are easier to find
|
My opinion is that it's not worth doing a /v2 just for this function. Just doesn't seem worth it, given that the addition of a second validate function is perhaps a little verbose, but easy to understand. |
dangoor
left a comment
There was a problem hiding this comment.
This is a great change, and the benchmarks are awesome.
This makes sense to me. It is the exact purpose of having semantic versioning. |
ljones140
left a comment
There was a problem hiding this comment.
I was going to suggest you use a variadic argument for the options. So we don't have to change the signature for existing callers.
ValidateLicenses(licenses []string, options ...ValidateLicensesOptions)
But apparently this is still considered a backward incompatible change see https://stackoverflow.com/questions/55163005/could-adding-variadic-parameters-to-a-function-break-existing-code
Description
This PR includes two changes:
Control over license validation
In it's current form
ValidateLicensesreports a complex expression (e.g. "MIT AND Apache-2.0") as valid. In most scenarios, based on the original intent, this should have returned invalid. Additionally, it may be desirable to treat other categories of licenses (e.g. deprecated licenses) as invalid as well.In order to maintain the backward compatibility, a new function
ValidateLicensesWithOptionswas created that uses options to allow the caller to specify what to consider invalid. All of these are valid by default which is consistent the with current behavior ofValidateLicenses.Performance improvement
Lookup for licenses was using a loop to walk through the
sliceof licenses which gives good performance for "Apache-2.0" and poor performance for "Zed". This has been updated to amaplookup with significant performance improvements forValidateLicensesandSatisfiesfunctions.Tradeoffs -- Extending the public API vs. breaking change and major release
OPTION 1: Adding the
ValidateLicensesWithOptionsmethodPros
Cons
OPTION 2: Adding a parameter to
ValidateLicensesPros
Cons
ValidateLicensesOPTION 2-a: Alternative for updates to licenses
Instead of requiring downstream users to update to the new major release, both release tracks could be supported for license updates only. This would be a temporary approach that would end in approximately 2-3 months.
Pros
Cons
Summary
Since the required adjustment to downstream code is trivial in order to upgrade to the major release, it doesn't seem worth it to add support for two release tracks.
I am inclined to update
ValidateLicensesto include the additional parameter and cut a major release.Feedback on the tradeoffs and proposal to cut a major release is welcome.