perf: compile regex patterns once instead of per-call in 3 hot paths #826

@dangrondahl

Parent issue

Part of #824 (performance audit item #5).

Problem

Three separate code paths compile regex patterns on every invocation instead of compiling once and reusing:

5a. ShouldInclude() — resource name filtering

File: internal/filters/resourceFilter.go:29,45

func (filter *ResourceFilterOptions) ShouldInclude(name string) (bool, error) {
    for _, pattern := range filter.ExcludeNamesRegex {
        re, err := regexp.Compile(pattern)  // ← compiled per name, per pattern
        // ...
    }
    for _, pattern := range filter.IncludeNamesRegex {
        re, err := regexp.Compile(pattern)  // ← compiled per name, per pattern
        // ...
    }
}

ShouldInclude is called once per resource (namespace, pod, function, cluster, service). The same regex patterns are recompiled on every call.

Affected commands:

  • kosli snapshot k8s — called once per namespace (line kube.go:274) during filtering
  • kosli snapshot lambda — called once per Lambda function (aws.go:190)
  • kosli snapshot ecs — called once per ECS cluster (aws.go:501) and once per ECS service (aws.go:607)

Impact: For a Kubernetes cluster with 100 namespaces and 2 regex patterns, that's 200 regex compilations instead of 2. For large ECS deployments with dozens of clusters and hundreds of services, the count grows proportionally. Each regexp.Compile call parses the pattern and builds a new matching program, which typically costs far more than running an already-compiled pattern against a short resource name.

5b. MatchPatternInCommitMessageORBranchName() — Jira issue key extraction

File: internal/gitview/gitView.go:288

func (gv *GitView) MatchPatternInCommitMessageORBranchName(...) ([]string, *CommitInfo, error) {
    re := regexp.MustCompile(pattern)  // ← compiled on every call
    commitMatches := re.FindAllString(commitInfo.Message, -1)
    branchMatches := re.FindAllString(commitInfo.Branch, -1)
    // ...
}

Affected commands:

  • kosli attest jira — calls this to find Jira issue keys (e.g. [A-Z][A-Z0-9]{1,9}-[0-9]+) in commit messages and branch names (attestJira.go:281)

Impact: Currently called once per attestation, so the cost today is a single compilation per run. The pattern is fixed for a given invocation, though, so compiling it once is trivial and avoids repeated work if the function is ever called in a loop.

5c. ValidateDigest() — SHA256 fingerprint validation

File: internal/digest/digest.go:359–369

func ValidateDigest(sha256ToCheck string) error {
    validSha256regex := "^([a-f0-9]{64})$"
    r, err := regexp.Compile(validSha256regex)  // ← recompiled on every call, pattern is a constant
    // ...
}

Affected commands:

  • Any command accepting --fingerprint or artifact arguments passes through GetFingerprintFromArtifactArg() in cli_utils.go:407,424, which calls ValidateDigest
  • Also called from internal/digest/digest.go:280 during Docker image digest resolution

Impact: The regex pattern is a compile-time constant — there is zero reason to recompile it. Every command that fingerprints an artifact pays this cost.

Fix

5a — Pre-compile in filter struct

Add a method or constructor that compiles patterns once:

type ResourceFilterOptions struct {
    // existing fields...
    compiledExclude []*regexp.Regexp  // compiled once
    compiledInclude []*regexp.Regexp  // compiled once
}

func (filter *ResourceFilterOptions) CompilePatterns() error {
    // compile once, store in struct
}
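
A fuller sketch of 5a, to make the proposal concrete. It is not the actual Kosli code: the struct is trimmed to the two pattern fields shown above, and the `compiled` flag, the `compileAll` helper, and the filtering semantics (exclude wins, an empty include list includes everything) are assumptions made for illustration. ShouldInclude keeps its `(bool, error)` signature and compiles lazily on first use, so existing callers would not need to change.

```go
package main

import (
	"fmt"
	"regexp"
)

// ResourceFilterOptions is a trimmed stand-in for the real struct.
type ResourceFilterOptions struct {
	IncludeNamesRegex []string
	ExcludeNamesRegex []string

	compiled        bool             // hypothetical flag: patterns already compiled
	compiledExclude []*regexp.Regexp // compiled once
	compiledInclude []*regexp.Regexp // compiled once
}

// compileAll (hypothetical helper) compiles each pattern once and
// fails on the first invalid one.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// CompilePatterns compiles both pattern lists and stores the results.
func (filter *ResourceFilterOptions) CompilePatterns() error {
	exclude, err := compileAll(filter.ExcludeNamesRegex)
	if err != nil {
		return err
	}
	include, err := compileAll(filter.IncludeNamesRegex)
	if err != nil {
		return err
	}
	filter.compiledExclude, filter.compiledInclude = exclude, include
	filter.compiled = true
	return nil
}

// ShouldInclude reuses the compiled patterns. ASSUMED semantics:
// any exclude match rejects; an empty include list accepts everything.
func (filter *ResourceFilterOptions) ShouldInclude(name string) (bool, error) {
	if !filter.compiled {
		if err := filter.CompilePatterns(); err != nil {
			return false, err
		}
	}
	for _, re := range filter.compiledExclude {
		if re.MatchString(name) {
			return false, nil
		}
	}
	if len(filter.compiledInclude) == 0 {
		return true, nil
	}
	for _, re := range filter.compiledInclude {
		if re.MatchString(name) {
			return true, nil
		}
	}
	return false, nil
}
```

The lazy path is not goroutine-safe as written. If filters are shared across goroutines, calling CompilePatterns explicitly before fanning out (or guarding it with sync.Once) would be the safer choice.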

5b — Accept pre-compiled regex or compile at call site

// Option A: caller compiles and passes in
func (gv *GitView) MatchPatternInCommitMessageORBranchName(re *regexp.Regexp, ...) 

// Option B: compile once at the top of the calling function
re := regexp.MustCompile(jiraIssueKeyPattern)
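
A minimal sketch of Option A. The names `matchIssueKeys` and `collectIssueKeys` are hypothetical, and `CommitInfo` is reduced to the two fields used in the excerpt above; the real method has more parameters and return values. The pattern is the example Jira key pattern quoted earlier in this issue.

```go
package main

import "regexp"

// jiraIssueKeyPattern is the example pattern quoted in this issue.
const jiraIssueKeyPattern = `[A-Z][A-Z0-9]{1,9}-[0-9]+`

// CommitInfo is a stand-in with only the fields used here.
type CommitInfo struct {
	Message string
	Branch  string
}

// matchIssueKeys (hypothetical) takes an already-compiled regex, so it
// never compiles anything itself.
func matchIssueKeys(re *regexp.Regexp, commit CommitInfo) []string {
	matches := re.FindAllString(commit.Message, -1)
	matches = append(matches, re.FindAllString(commit.Branch, -1)...)
	return matches
}

// collectIssueKeys (hypothetical caller) compiles the pattern once and
// reuses it for every commit.
func collectIssueKeys(commits []CommitInfo) []string {
	re := regexp.MustCompile(jiraIssueKeyPattern)
	var keys []string
	for _, c := range commits {
		keys = append(keys, matchIssueKeys(re, c)...)
	}
	return keys
}
```

With this shape the compilation cost stays constant no matter how many commits are scanned. Duplicate keys are not removed in this sketch.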

5c — Package-level compiled regex

var validSha256Regex = regexp.MustCompile(`^[a-f0-9]{64}$`)

func ValidateDigest(sha256ToCheck string) error {
    if !validSha256Regex.MatchString(sha256ToCheck) {
        return fmt.Errorf(...)
    }
    return nil
}
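
For reference, a self-contained version of the 5c sketch that can be compiled and tried on its own. The error message is hypothetical, since the real text is elided above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package initialisation. MustCompile panics on an
// invalid pattern, which is acceptable here because the pattern is a
// literal and any mistake would surface on the first run.
var validSha256Regex = regexp.MustCompile(`^[a-f0-9]{64}$`)

// ValidateDigest reports an error unless the input is exactly
// 64 lowercase hexadecimal characters.
func ValidateDigest(sha256ToCheck string) error {
	if !validSha256Regex.MatchString(sha256ToCheck) {
		// Hypothetical message, not the one used in the real code.
		return fmt.Errorf("%q is not a valid SHA256 digest: expected 64 lowercase hex characters", sha256ToCheck)
	}
	return nil
}
```

A *regexp.Regexp is safe for concurrent use by multiple goroutines, so sharing one package-level value does not need extra locking.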

Verification

  • All existing tests for these functions should continue to pass with no changes
  • Could add a benchmark test to demonstrate the improvement (optional)
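
One possible shape for the optional benchmark, as a sketch only. It mirrors the 100 names and 2 patterns example from 5a with made-up names and patterns; none of the identifiers below exist in the codebase. In a real change the two Benchmark functions would live in a `_test.go` file next to the filter code and run with `go test -bench=.`.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// Hypothetical inputs: 2 patterns, 100 resource names.
var benchPatterns = []string{`^ns-1`, `0$`}

func benchNames() []string {
	names := make([]string, 100)
	for i := range names {
		names[i] = fmt.Sprintf("ns-%d", i)
	}
	return names
}

// countMatchesPerCall recompiles every pattern for every name,
// which is the behaviour described in this issue.
func countMatchesPerCall(names, patterns []string) int {
	n := 0
	for _, name := range names {
		for _, p := range patterns {
			if regexp.MustCompile(p).MatchString(name) {
				n++
			}
		}
	}
	return n
}

// countMatchesPrecompiled compiles each pattern once up front,
// which is the proposed behaviour.
func countMatchesPrecompiled(names, patterns []string) int {
	compiled := make([]*regexp.Regexp, len(patterns))
	for i, p := range patterns {
		compiled[i] = regexp.MustCompile(p)
	}
	n := 0
	for _, name := range names {
		for _, re := range compiled {
			if re.MatchString(name) {
				n++
			}
		}
	}
	return n
}

func BenchmarkPerCall(b *testing.B) {
	names := benchNames()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		countMatchesPerCall(names, benchPatterns)
	}
}

func BenchmarkPrecompiled(b *testing.B) {
	names := benchNames()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		countMatchesPrecompiled(names, benchPatterns)
	}
}
```

Both helpers return the same match count, so the benchmark compares equal work; only the number of compilations differs (200 versus 2 per iteration).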
