workflowcheck OOM issues on large codebases

`workflowcheck` scans the entirety of the `runtimeClasspath` and stores a cache of all visited classes in a hashmap. For large codebases, this is wasteful and can OOM CI jobs.

I suspect the current behavior was chosen because (1) it's simple, and (2) to potentially capture workflows that are brought in by dependencies. For the second point, it feels like the burden of determinism checks for workflows provided by a library should be on the library provider, rather than the consumer.

I suspect that suggestion 2 below may be the best fix and most-correct, but it might be behaviorally backwards-incompatible from what exists today

# Current Behavior

`temporal-workflowcheck` uses a single classpath concept. When you call `findWorkflowClasses(String... classPaths)`:

1. ClassPath constructor enumerates every .class file across all JARs and directories (filtering only `java/`, `javax/`, `jdk/`, `com/sun/`). For a larges app, this could be tens of thousands of classes.
2. The main loop calls `loader.loadClass()` on every one of them, using ASM ClassReader to parse bytecode. Each class gets cached in `Loader.classes` (a `HashMap<String, ClassInfo>`).
3. For each class, it checks whether any public non-static non-abstract method is a workflow implementation (by walking the superclass/interface chain via `findWorkflowImplInfo`). This triggers more `loadClass` calls.
4. For workflow methods found, `processMethodValidity` traces the call chain through dependencies to detect non-deterministic operations, triggering yet more `loadClass` calls.

The result for a project with a few workflow classes buried among ~20,000 dependency classes, the tool eagerly loads all 20,000 just to find those few classes. The `Loader.classes` cache retains every `ClassInfo` for the lifetime of the scan.

# Suggestions

## (1) Separate scan targets from resolution classpath

Add a new overload:

```java
public List<ClassInfo> findWorkflowClasses(
  String[] scanPaths,         // enumerate classes from here
  String[] resolutionPaths    // available for call-chain resolution only
) throws IOException
```

`ClassPath` would enumerate class names only from scanPaths, but create a `URLClassLoader` covering both. The main loop iterates a fraction of the classes; `Loader.loadClass()` still works for call-chain tracing because the full classpath is on the classloader.

Instead of eagerly loading 20,000 classes in the discovery loop, only module classes are enumerated. Dependencies are loaded on-demand during `processMethodValidity` tracing, and only the onesactually in the call chain. The `Loader.classes` cache might end up with ~1k entries instead of ~20k.

We'd need to add a `--scan-classpath <path>` flag (repeatable). When present, the tool would only enumerate from those path(s). When absent, current behavior would be preserved (scan everything).

A much less durable alternative would be to support configurable classpath exclusions. While flexible, it'd put a toil-level burden on developers to whack-a-mole classpaths as a module's dependency graph changes.

## (2) Lazy load with annotation filtering

Instead of loading every class through ASM's full `ClassReader`, do a two-pass approach:

- Pass 1: Quick scan of workflow-related annotations (`WorkflowInterface`, `WorkflowMethod`, etc), which can be done by reading the class constant pools instead of the full class. Classes without these annotations (or that don't implement interfaces containing them) would be skipped entirely.
- Pass 2: Full ASM analysis only on classes identified by pass 1.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

workflowcheck OOM issues on large codebases #2846

Current Behavior

Suggestions

(1) Separate scan targets from resolution classpath

(2) Lazy load with annotation filtering

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

workflowcheck OOM issues on large codebases #2846

Description

Current Behavior

Suggestions

(1) Separate scan targets from resolution classpath

(2) Lazy load with annotation filtering

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions