[GOBBLIN-2262] Bump default thread counts for spec catalog listener and DAG processing engine #4189

Open
DaisyModi wants to merge 1 commit into apache:master from DaisyModi:dmodi/bump-thread-pool-defaults

DaisyModi commented Apr 28, 2026

Dear Gobblin maintainers,

I am submitting the following PR for your kind review.

JIRA

GOBBLIN-2262

Description

Increases two thread-pool defaults from 3 to 150 in ServiceConfigKeys:

  • DEFAULT_NUM_SPEC_CATALOG_LISTENER_THREADS — controls the executor used by FlowCatalog's SpecCatalogListenersList for parallel flow compilation on the submission path (POST /flowconfigs).
  • DEFAULT_NUM_DAG_PROC_THREADS — controls the executor used by DagProcessingEngine on the execution path.

Motivation

Both pools ultimately invoke MultiHopFlowCompiler.compileFlow, which calls DataMovementAuthorizer.isMovementAuthorized (MultiHopFlowCompiler.java:242). Under high authorization-service latency, the default pool size of 3 becomes the binding throughput limit on both paths and queues flow submissions and executions even when downstream resources have plenty of capacity.
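A rough way to see why a pool of 3 becomes the limit: if every compileFlow call blocks a thread for the full authorization round trip, a fixed pool can finish at most poolSize / latency calls per second. The sketch below works through that arithmetic. The 2-second latency is a hypothetical figure chosen for illustration, not a measurement from this PR, and PoolThroughputBound is not a Gobblin class.

```java
/** Hypothetical helper, not part of Gobblin: upper bound on the throughput of a blocking fixed-size pool. */
final class PoolThroughputBound {
  private PoolThroughputBound() {}

  /**
   * Assumes each thread stays blocked for the whole call, so one thread
   * completes at most 1 / latencySeconds calls per second.
   */
  static double maxCallsPerSecond(int poolSize, double latencySeconds) {
    if (poolSize <= 0 || latencySeconds <= 0) {
      throw new IllegalArgumentException("poolSize and latencySeconds must be positive");
    }
    return poolSize / latencySeconds;
  }

  public static void main(String[] args) {
    double assumedLatencySeconds = 2.0; // illustrative assumption only
    System.out.println(maxCallsPerSecond(3, assumedLatencySeconds));   // prints 1.5
    System.out.println(maxCallsPerSecond(150, assumedLatencySeconds)); // prints 75.0
  }
}
```

The bound scales linearly with pool size, which is why raising the default helps only while the threads are waiting on the authorization service rather than competing for CPU.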

The pool sizes remain configurable via:

  • gobblin.service.specCatalogListener.numThreads
  • gobblin.service.dagProcessingEngine.numThreads

Deployments preferring the previous behavior can override either back to 3.
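The override behavior can be sketched as follows. To stay self-contained, the example uses java.util.Properties as a stand-in for Gobblin's configuration object; ThreadCountResolver and its resolve method are hypothetical names. Only the two key names and the values 150 and 3 come from this PR.

```java
import java.util.Properties;

/** Hypothetical sketch, not Gobblin code: resolve a pool size from config, falling back to a default. */
final class ThreadCountResolver {
  // Key names and the new default as stated in this PR.
  static final String SPEC_CATALOG_LISTENER_KEY = "gobblin.service.specCatalogListener.numThreads";
  static final String DAG_PROC_KEY = "gobblin.service.dagProcessingEngine.numThreads";
  static final int NEW_DEFAULT = 150;

  private ThreadCountResolver() {}

  /** Returns the configured value for key, or defaultValue when the key is absent. */
  static int resolve(Properties props, String key, int defaultValue) {
    String raw = props.getProperty(key);
    return raw == null ? defaultValue : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // A deployment that prefers the previous behavior for one pool only.
    props.setProperty(DAG_PROC_KEY, "3");

    System.out.println(resolve(props, SPEC_CATALOG_LISTENER_KEY, NEW_DEFAULT)); // prints 150 (default applies)
    System.out.println(resolve(props, DAG_PROC_KEY, NEW_DEFAULT));              // prints 3 (override applies)
  }
}
```

The two keys are independent, so a deployment can revert one pool to 3 while keeping the new default for the other.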

Tests

Existing tests reference these constants symbolically (e.g. DagProcessingEngineTest.java uses ServiceConfigKeys.DEFAULT_NUM_DAG_PROC_THREADS directly), so they auto-adjust to the new default and require no changes.

…nd DAG processing engine

Increase DEFAULT_NUM_SPEC_CATALOG_LISTENER_THREADS and
DEFAULT_NUM_DAG_PROC_THREADS from 3 to 150. Both pools serve compileFlow
invocations (submission and execution paths respectively); compileFlow
calls DataMovementAuthorizer.isMovementAuthorized, which can be slow
under load and starves the small default pools, queueing flow
submissions and executions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DaisyModi changed the title from "[GOBBLIN-XXXX] Bump default thread counts for spec catalog listener and DAG processing engine" to "[GOBBLIN-2262] Bump default thread counts for spec catalog listener and DAG processing engine" on Apr 28, 2026

  public static final String NUM_DAG_PROC_THREADS_KEY = GOBBLIN_SERVICE_DAG_PROCESSING_ENGINE_PREFIX + "numThreads";
  public static final String DAG_PROC_ENGINE_NON_RETRYABLE_EXCEPTIONS_KEY = GOBBLIN_SERVICE_DAG_PROCESSING_ENGINE_PREFIX + "nonRetryableExceptions";
- public static final Integer DEFAULT_NUM_DAG_PROC_THREADS = 3;
+ public static final Integer DEFAULT_NUM_DAG_PROC_THREADS = 150;

can we make this configurable?
