Skip to content

workflowcheck OOM issues on large codebases #2846

@robzienert

Description

@robzienert

workflowcheck scans the entirety of the runtimeClasspath and stores a cache of all visited classes in a hashmap. For large codebases, this is wasteful and can OOM CI jobs.

I suspect the current behavior was chosen because (1) it's simple, and (2) to potentially capture workflows that are brought in by dependencies. For the second point, it feels like the burden of determinism checks for workflows provided by a library should be on the library provider, rather than the consumer.

I suspect that suggestion 2 below may be the best fix and most-correct, but it might be behaviorally backwards-incompatible from what exists today

Current Behavior

temporal-workflowcheck uses a single classpath concept. When you call findWorkflowClasses(String... classPaths):

  1. ClassPath constructor enumerates every .class file across all JARs and directories (filtering only java/, javax/, jdk/, com/sun/). For a larges app, this could be tens of thousands of classes.
  2. The main loop calls loader.loadClass() on every one of them, using ASM ClassReader to parse bytecode. Each class gets cached in Loader.classes (a HashMap<String, ClassInfo>).
  3. For each class, it checks whether any public non-static non-abstract method is a workflow implementation (by walking the superclass/interface chain via findWorkflowImplInfo). This triggers more loadClass calls.
  4. For workflow methods found, processMethodValidity traces the call chain through dependencies to detect non-deterministic operations, triggering yet more loadClass calls.

The result for a project with a few workflow classes buried among ~20,000 dependency classes, the tool eagerly loads all 20,000 just to find those few classes. The Loader.classes cache retains every ClassInfo for the lifetime of the scan.

Suggestions

(1) Separate scan targets from resolution classpath

Add a new overload:

public List<ClassInfo> findWorkflowClasses(
  String[] scanPaths,         // enumerate classes from here
  String[] resolutionPaths    // available for call-chain resolution only
) throws IOException

ClassPath would enumerate class names only from scanPaths, but create a URLClassLoader covering both. The main loop iterates a fraction of the classes; Loader.loadClass() still works for call-chain tracing because the full classpath is on the classloader.

Instead of eagerly loading 20,000 classes in the discovery loop, only module classes are enumerated. Dependencies are loaded on-demand during processMethodValidity tracing, and only the onesactually in the call chain. The Loader.classes cache might end up with ~1k entries instead of ~20k.

We'd need to add a --scan-classpath <path> flag (repeatable). When present, the tool would only enumerate from those path(s). When absent, current behavior would be preserved (scan everything).

A much less durable alternative would be to support configurable classpath exclusions. While flexible, it'd put a toil-level burden on developers to whack-a-mole classpaths as a module's dependency graph changes.

(2) Lazy load with annotation filtering

Instead of loading every class through ASM's full ClassReader, do a two-pass approach:

  • Pass 1: Quick scan of workflow-related annotations (WorkflowInterface, WorkflowMethod, etc), which can be done by reading the class constant pools instead of the full class. Classes without these annotations (or that don't implement interfaces containing them) would be skipped entirely.
  • Pass 2: Full ASM analysis only on classes identified by pass 1.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions