add options for validating licenses that limits what is considered valid #144

Merged
elrayle merged 14 commits into main from elr/expressions-invalid
Apr 16, 2026


@elrayle
Collaborator

@elrayle elrayle commented Apr 16, 2026

Description

This PR includes two changes:

  • more control over how licenses are validated
  • performance improvement

Control over license validation

In its current form, ValidateLicenses reports a complex expression (e.g. "MIT AND Apache-2.0") as valid. Based on the function's original intent, this should be reported as invalid in most scenarios. Additionally, it may be desirable to treat other categories of licenses (e.g. deprecated licenses) as invalid as well.

To maintain backward compatibility, a new function, ValidateLicensesWithOptions, was created that lets the caller use options to specify what to consider invalid. Every category is treated as valid by default, which is consistent with the current behavior of ValidateLicenses.

  • FailComplexExpressions - rejects any license expression that includes a conjunction (e.g. "MIT AND Apache-2.0").
  • FailDeprecatedLicenses - rejects deprecated SPDX license identifiers (e.g. "eCos-2.0").
  • FailAllLicenseRefs - rejects all SPDX license references (e.g. "LicenseRef-MyLicense").
  • FailAllDocumentRefs - rejects all SPDX document references (e.g. "DocumentRef-MyDocument").
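A minimal sketch of how the four options could classify a single license string. Only the option names come from this PR; `rejectedByOptions` and `deprecatedIDs` are hypothetical, and the sketch assumes case-insensitive matching, a two-entry stand-in for the deprecated list, and that only AND/OR make an expression complex. The real implementation may differ.

```go
package main

import "strings"

// ValidateLicensesOptions uses the option names from this PR; the field
// semantics below are a simplified, hypothetical reading of them.
type ValidateLicensesOptions struct {
	FailComplexExpressions bool
	FailDeprecatedLicenses bool
	FailAllLicenseRefs     bool
	FailAllDocumentRefs    bool
}

// deprecatedIDs is a tiny hypothetical stand-in for the SPDX deprecated list,
// keyed by uppercase ID so lookups are case-insensitive.
var deprecatedIDs = map[string]bool{"ECOS-2.0": true, "BSD-2-CLAUSE-FREEBSD": true}

// rejectedByOptions reports whether license falls into a category the caller
// asked to treat as invalid. It does not check SPDX syntax, and it classifies
// a complex expression only as complex (it does not inspect its parts).
func rejectedByOptions(license string, opts ValidateLicensesOptions) bool {
	upper := strings.ToUpper(strings.TrimSpace(license))
	for _, tok := range strings.Fields(upper) {
		if tok == "AND" || tok == "OR" {
			return opts.FailComplexExpressions
		}
	}
	switch {
	case strings.HasPrefix(upper, "DOCUMENTREF-"):
		// Assumption: a DocumentRef-qualified reference counts as a document reference.
		return opts.FailAllDocumentRefs
	case strings.HasPrefix(upper, "LICENSEREF-"):
		return opts.FailAllLicenseRefs
	case deprecatedIDs[upper]:
		return opts.FailDeprecatedLicenses
	}
	return false
}
```

Because every field defaults to false, `ValidateLicensesOptions{}` rejects nothing extra, which is what keeps the default behavior backward compatible.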

Performance improvement

License lookup previously used a loop to walk through the slice of licenses, which performed well for IDs near the start of the slice (e.g. "Apache-2.0") and poorly for IDs near the end (e.g. "Zed"). It now uses a map lookup, which significantly improves the performance of the ValidateLicenses and Satisfies functions.
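The difference between the old and new lookups can be sketched as follows. `activeLicenses`, `findInSlice`, `licensesByUpper`, and `findInMap` are hypothetical names; the sketch assumes the map is keyed by uppercase ID (as the generated maps in this PR are described) and, as a further assumption, stores the canonical spelling as the value.

```go
package main

import "strings"

// activeLicenses is a short, hypothetical stand-in for the generated slice of
// SPDX license IDs (the real list has hundreds of entries).
var activeLicenses = []string{"0BSD", "Apache-2.0", "BSD-2-Clause", "MIT", "Zed"}

// findInSlice mirrors the old approach: a case-insensitive linear scan, so
// the cost grows with how far into the slice the ID appears.
func findInSlice(id string) (string, bool) {
	id = strings.TrimSpace(id)
	for _, l := range activeLicenses {
		if strings.EqualFold(l, id) {
			return l, true
		}
	}
	return "", false
}

// licensesByUpper is built once, keyed by the uppercase ID and holding the
// canonical spelling, so every lookup is a single hash probe.
var licensesByUpper = func() map[string]string {
	m := make(map[string]string, len(activeLicenses))
	for _, l := range activeLicenses {
		m[strings.ToUpper(l)] = l
	}
	return m
}()

// findInMap normalizes the input once, then does one map lookup.
func findInMap(id string) (string, bool) {
	canonical, ok := licensesByUpper[strings.ToUpper(strings.TrimSpace(id))]
	return canonical, ok
}
```

The map costs some memory and a one-time build, in exchange for lookups whose cost does not depend on where the ID sits in the list, which is consistent with the large drop for "Zed--active-end" below.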

+--------------------------------------------------+-----------------+-----------------+
| Benchmark ValidateLicenses                       |     BEFORE      |      AFTER      |
+--------------------------------------------------+-----------------+-----------------+
|                                                  | ns/op average   | ns/op average   |
+--------------------------------------------------+-----------------+-----------------+
| MIT--exact                                       |         5 ns/op |         9 ns/op |
| mit--caseinsensitive                             |         8 ns/op |         9 ns/op |
| mit--extra-space                                 |         8 ns/op |        28 ns/op |
| Apache-2.0--active-early                         |     1,527 ns/op |        74 ns/op |
| Zed--active-end                                  |     3,244 ns/op |        60 ns/op |
| MIT AND Apache-2.0--complex                      |     7,941 ns/op |        80 ns/op |
| MIT AND Apache-2.0 OR Zed--complex               |    12,965 ns/op |       144 ns/op |
| BSD-2-Clause-FreeBSD--deprecated                 |     9,733 ns/op |       191 ns/op |
| GPL-2.0-or-later--range                          |     2,311 ns/op |        91 ns/op |
| Apache-1.0+--plus-range                          |     7,508 ns/op |     2,581 ns/op |
| LicenseRef-scancode-adobe-postscript             |     5,345 ns/op |     2,087 ns/op |
| DocumentRef-spdx-tool-1.2:LicenseRef-MIT-Style-2 |     7,191 ns/op |     4,117 ns/op |
+--------------------------------------------------+-----------------+-----------------+

+--------------------------------------------------+-----------------+-----------------+
| Benchmark Satisfies                              |     BEFORE      |      AFTER      |
+--------------------------------------------------+-----------------+-----------------+
|                                                  | ns/op average   | ns/op average   |
+--------------------------------------------------+-----------------+-----------------+
| MIT--exact                                       |     1,484 ns/op |     1,469 ns/op |
| mit--caseinsensitive                             |     1,538 ns/op |     1,460 ns/op |
| mit--extra-space                                 |     1,538 ns/op |     1,526 ns/op |
| Apache-2.0--active-early                         |     2,498 ns/op |     1,025 ns/op |
| Zed--active-end                                  |     5,388 ns/op |     1,973 ns/op |
| MIT AND Apache-2.0--complex                      | 4,017,136 ns/op | 3,276,933 ns/op |
| MIT AND Apache-2.0 OR Zed--complex               | 3,968,873 ns/op | 2,888,830 ns/op |
| BSD-2-Clause-FreeBSD--deprecated                 | 5,443,734 ns/op | 4,245,148 ns/op |
| GPL-2.0-or-later--range                          | 9,345,267 ns/op | 7,934,505 ns/op |
| Apache-1.0+--plus-range                          | 2,585,905 ns/op | 1,363,857 ns/op |
| LicenseRef-scancode-adobe-postscript             | 2,122,109 ns/op |   983,471 ns/op |
| DocumentRef-spdx-tool-1.2:LicenseRef-MIT-Style-2 | 2,133,928 ns/op |   997,369 ns/op |
+--------------------------------------------------+-----------------+-----------------+
NOTE: For perspective, 4,000,000 ns is 4 ms
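The ns/op figures above are per-call averages of the kind Go's testing package reports. A minimal sketch of measuring one scenario outside `go test` with `testing.Benchmark`; `isKnown`, `knownIDs`, and `measure` are hypothetical stand-ins, not the benchmarked library functions, and the PR's own benchmark_*_test.go files presumably use ordinary BenchmarkXxx functions run with `go test -bench`.

```go
package main

import (
	"strings"
	"testing"
)

// knownIDs is a hypothetical uppercase-keyed set standing in for the SPDX list.
var knownIDs = map[string]bool{"MIT": true, "APACHE-2.0": true, "ZED": true}

// isKnown is a stand-in for the function being benchmarked.
func isKnown(id string) bool {
	return knownIDs[strings.ToUpper(strings.TrimSpace(id))]
}

// sink stores each result so the compiler cannot discard the call.
var sink bool

// measure benchmarks fn on one input; the returned result's NsPerOp method
// reports the same average ns/op unit used in the tables.
func measure(fn func(string) bool, input string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = fn(input)
		}
	})
}
```

`measure(isKnown, "Zed").NsPerOp()` returns the average nanoseconds per call. Absolute numbers vary by machine, so BEFORE/AFTER comparisons are only meaningful on the same hardware.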

Tradeoffs -- Extending the public API vs. breaking change and major release

OPTION 1: Adding the ValidateLicensesWithOptions function

Pros

  • no breaking changes
  • minor release
  • updates to the list of licenses can be adopted with minor release updates

Cons

  • permanently expands the API without adding a new feature, only a modification to an existing one
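One way Option 1 keeps existing callers working is to have the old function delegate to the new one with zero-value options. The sketch below assumes that structure; the PR does not say whether ValidateLicenses is implemented this way, and `isComplex` plus the placeholder validation body are hypothetical (real SPDX parsing and the other three flags are omitted).

```go
package main

import "strings"

// ValidateLicensesOptions: field names from the PR; all default to false.
type ValidateLicensesOptions struct {
	FailComplexExpressions bool
	FailDeprecatedLicenses bool
	FailAllLicenseRefs     bool
	FailAllDocumentRefs    bool
}

// isComplex is a hypothetical check: an expression with an AND or OR operator.
func isComplex(license string) bool {
	for _, tok := range strings.Fields(license) {
		if tok == "AND" || tok == "OR" {
			return true
		}
	}
	return false
}

// ValidateLicensesWithOptions has a placeholder body: it applies only
// FailComplexExpressions and skips real SPDX parsing and the other flags.
func ValidateLicensesWithOptions(licenses []string, opts ValidateLicensesOptions) (bool, []string) {
	valid := true
	var invalidLicenses []string
	for _, license := range licenses {
		if opts.FailComplexExpressions && isComplex(license) {
			valid = false
			invalidLicenses = append(invalidLicenses, license)
		}
	}
	return valid, invalidLicenses
}

// ValidateLicenses keeps its existing signature and delegates with zero-value
// options, so existing callers see unchanged behavior.
func ValidateLicenses(licenses []string) (bool, []string) {
	return ValidateLicensesWithOptions(licenses, ValidateLicensesOptions{})
}
```

With this shape, "what it means to validate" still lives in one place, which softens one of the Option 2 pros listed below.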

OPTION 2: Adding a parameter to ValidateLicenses

Pros

  • single point of entry for validating licenses reduces confusion
  • maintains a single definition of what it means to validate licenses
  • avoids a function naming pattern (the WithOptions suffix) that is not used elsewhere in the public API

Cons

  • breaking change
  • requires major release to get the new functionality
  • updates to the license list require downstream users to upgrade to the major release created by this PR's changes
  • downstream code would need to update its calls to ValidateLicenses:

```go
// FROM
licenses := []string{"MIT", "Apache-2.0", "GPL-2.0"}
valid, invalidLicenses := ValidateLicenses(licenses)

// TO
valid, invalidLicenses := ValidateLicenses(licenses, ValidateLicensesOptions{})
```

OPTION 2-a: Alternative for updates to licenses

Instead of requiring downstream users to update to the new major release, both release tracks could be supported for license updates only. This would be a temporary approach that would end in approximately 2-3 months.

Pros

  • removes the requirement to update to the major release if the new functionality is not needed

Cons

  • would have to add infrastructure to support both release tracks

Summary

Since the adjustment downstream code needs in order to upgrade to the major release is trivial, supporting two release tracks doesn't seem worthwhile.

I am inclined to update ValidateLicenses to include the additional parameter and cut a major release.

Feedback on the tradeoffs and proposal to cut a major release is welcome.

elrayle added 7 commits April 15, 2026 15:50
ValidateLicenses is unchanged for backward compatibility

ValidateLicensesWithOptions was added and includes an options parameter of type ValidateLicensesOptions with the following fields:
- FailComplexExpressions - anything with a conjunction
- FailDeprecatedLicenses
- FailAllLicenseRefs
- FailAllDocumentRefs

Example call:
```go
valid, _ := ValidateLicensesWithOptions([]string{"MIT AND Apache-2.0"}, ValidateLicensesOptions{FailComplexExpressions: true})
// valid == false
```

These provide a way to fail fast for any license that is in one of these categories.
also adds a simple test that ensures the benchmark use cases always pass, so measured improvements are not caused by a use case failing early
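The guard test described in this commit can be sketched as a table of scenarios checked against their expected outcome before any timing is trusted. `scenario` and `mismatches` are hypothetical names, and the validator passed in is whatever function is being benchmarked.

```go
package main

import "fmt"

// scenario is a hypothetical benchmark case: a name, an input, and whether the
// validator is expected to accept it.
type scenario struct {
	name      string
	input     string
	wantValid bool
}

// mismatches runs each scenario through validate and describes every case
// whose result differs from wantValid. An empty result suggests the timings
// reflect the intended code path rather than an early failure.
func mismatches(cases []scenario, validate func(string) bool) []string {
	var out []string
	for _, c := range cases {
		if got := validate(c.input); got != c.wantValid {
			out = append(out, fmt.Sprintf("%s: got valid=%v, want %v", c.name, got, c.wantValid))
		}
	}
	return out
}
```

Keeping the scenario table shared between this check and the benchmarks means a scenario cannot silently start failing fast without the test noticing.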
Copilot AI review requested due to automatic review settings April 16, 2026 15:12
Contributor

Copilot AI left a comment


Pull request overview

Adds configurable license validation behavior and improves performance of license/exception/deprecated lookups by switching from slice scans to map lookups.

Changes:

  • Introduce ValidateLicensesWithOptions and ValidateLicensesOptions to optionally reject complex expressions, deprecated IDs, and SPDX refs.
  • Add generated uppercase-keyed maps for active/deprecated licenses and exceptions; update lookup helpers to use maps.
  • Expand tests and benchmarks to cover new scenarios and validate benchmark assumptions.
Summary per file:

  • spdxexp/satisfies.go: Adds ValidateLicensesWithOptions, options type, and new fast paths in Satisfies; introduces helper predicates.
  • spdxexp/license.go: Switches license/exception/deprecated checks to map-based lookups.
  • spdxexp/spdxlicenses/get_licenses.go: Adds generated licensesMap and GetLicensesMap() accessor.
  • spdxexp/spdxlicenses/get_exceptions.go: Adds generated exceptionsMap and GetExceptionsMap() accessor.
  • spdxexp/spdxlicenses/get_deprecated.go: Adds generated deprecatedMap and GetDeprecatedMap() accessor.
  • spdxexp/satisfies_test.go: Adds tests for new validation options and Satisfies fast-path behavior; adds benchmark scenario safety test.
  • spdxexp/benchmark_validate_licenses_test.go: Updates benchmark scenarios to include more representative cases (spaces, deprecated, refs, etc.).
  • spdxexp/benchmark_satisfies_test.go: Updates benchmark scenarios similarly to validate Satisfies performance changes.
  • cmd/license.go: Updates code generation to emit license/deprecated maps and map accessors.
  • cmd/exceptions.go: Updates code generation to emit exceptions map and accessor.
  • cmd/doc.go: Updates extraction instructions.

Copilot's findings


Comments suppressed due to low confidence (2)

spdxexp/satisfies_test.go:2683

  • Grammar in comment: "to changes behavior" should be "to changes in behavior".
```go
// to changes behavior of ValidateLicenses function.
```

spdxexp/satisfies.go:112

  • The option flags (FailDeprecatedLicenses/FailAllLicenseRefs/FailAllDocumentRefs) are only checked for atomic strings. If the input is a valid SPDX expression (e.g. "MIT AND eCos-2.0" or "MIT AND LicenseRef-X"), parse() can succeed and the expression will be treated as valid even when those flags are enabled. Consider enforcing these options by inspecting the parsed node tree (or scanning tokens) and rejecting deprecated IDs / refs anywhere in the expression.
```go
		// whether the license expression is valid
		if _, err := parse(license); err != nil {
			valid = false
			invalidLicenses = append(invalidLicenses, license)
		}
```
  • Files reviewed: 11/11 changed files
  • Comments generated: 10

Comment thread spdxexp/spdxlicenses/get_deprecated.go Outdated
Comment thread cmd/exceptions.go
Comment thread cmd/doc.go Outdated
Comment thread spdxexp/satisfies_test.go Outdated
Comment thread spdxexp/satisfies.go Outdated
Comment thread spdxexp/satisfies.go
Comment thread spdxexp/spdxlicenses/get_licenses.go Outdated
Comment thread spdxexp/spdxlicenses/get_exceptions.go Outdated
Comment thread cmd/license.go
Comment thread spdxexp/satisfies.go Outdated
elrayle and others added 6 commits April 16, 2026 13:04
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ons configured to reject

add test that would have caught this
also corrects formatting issues and puts public functions at top of generated files so they are easier to find
@dangoor
Contributor

dangoor commented Apr 16, 2026

My opinion is that it's not worth doing a /v2 just for this function. Adding a second validate function is perhaps a little verbose, but it's easy to understand.

Contributor

@dangoor dangoor left a comment


This is a great change, and the benchmarks are awesome.

@elrayle elrayle merged commit 6f11b7c into main Apr 16, 2026
5 checks passed
@elrayle elrayle deleted the elr/expressions-invalid branch April 16, 2026 20:24
@ahpook
Contributor

ahpook commented Apr 17, 2026

> I am inclined to update ValidateLicenses to include the additional parameter and cut a major release.

This makes sense to me. It is the exact purpose of having semantic versioning.


@ljones140 ljones140 left a comment


I was going to suggest using a variadic argument for the options, so existing callers wouldn't have to change their calls:

```go
ValidateLicenses(licenses []string, options ...ValidateLicensesOptions)
```

But apparently this is still considered a backward incompatible change see https://stackoverflow.com/questions/55163005/could-adding-variadic-parameters-to-a-function-break-existing-code
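The reason is that a variadic parameter still changes the function's type. Direct calls like `ValidateLicenses(licenses)` keep compiling, but code that uses the function as a value, such as assigning it to a variable of type `func([]string) (bool, []string)`, no longer does. The sketch below makes the type change visible with reflect; `validateV1`, `validateV2`, and `validator` are hypothetical stand-ins with placeholder bodies.

```go
package main

import "reflect"

// ValidateLicensesOptions is reduced to one field for this illustration.
type ValidateLicensesOptions struct{ FailComplexExpressions bool }

// validateV1 has the existing shape; its body is a placeholder.
func validateV1(licenses []string) (bool, []string) { return len(licenses) > 0, nil }

// validateV2 adds a variadic options parameter; its body is a placeholder.
func validateV2(licenses []string, options ...ValidateLicensesOptions) (bool, []string) {
	return len(licenses) > 0, nil
}

// validator is the function type existing code might depend on, e.g. in a
// struct field or callback parameter. `var v validator = validateV2` would
// not compile, while `var v validator = validateV1` does.
type validator func([]string) (bool, []string)

// sameType reports whether two values have identical dynamic types.
func sameType(a, b interface{}) bool { return reflect.TypeOf(a) == reflect.TypeOf(b) }
```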


5 participants