
CHASM retry all tasks on standby until active side has invalidated them #10552

Open
awln-temporal wants to merge 2 commits into main from chasm-standby-retry-invalid-tasks



awln-temporal commented Jun 5, 2026

Contributor

What

Retry CHASM tasks on the standby cluster indefinitely, until replication from the active cluster invalidates the logical tasks.

Why

Currently, when the standby cluster processes a physical CHASM task and none of its logical tasks are valid, the physical task is discarded. To prevent executions from getting stuck when a code deployment introduces stricter task validation (which causes physical tasks to be discarded), we need to keep the physical task and only invalidate it once the active cluster has completed or dropped it.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

awln-temporal requested review from a team as code owners June 5, 2026 17:19
awln-temporal force-pushed the chasm-standby-retry-invalid-tasks branch 2 times, most recently from 075e4db to 483511f June 5, 2026 17:31
awln-temporal changed the title from "[CHASM] Retry locally-invalid side effect tasks on standby until active cluster is consulted" to "CHASM retry all tasks on standby until active side has invalidated them" Jun 5, 2026
awln-temporal force-pushed the chasm-standby-retry-invalid-tasks branch from 645c257 to 67c4356 June 5, 2026 18:45

Member


Hmm, the changes in this file seem unrelated. Can we split them into a separate PR?


valid, err := validateChasmSideEffectTask(ctx, ms, task)
if err != nil || !valid {
_, err = validateChasmSideEffectTask(ctx, ms, task)

Member


I don't recall us discussing changing the side effect task logic on the standby side as well, but yeah, I agree we can make the same change.

However, I don't think it's as simple as just ignoring the valid flag. We only want to ignore tasks that are "invalid" because of a task validator logic change, not tasks that are "invalid" because, say, the component or the corresponding logical task was not found. The "valid" flag returned today covers both cases.
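One way to separate the two cases, sketched under the assumption that validation could return a reason instead of a bare bool. The `validity` type, its constants, and `standbyShouldRetry` are hypothetical and not part of the CHASM API.

```go
package main

import "fmt"

// validity is a hypothetical three-way result replacing a single "valid"
// bool, so callers can tell why a logical task is not runnable.
type validity int

const (
	validityValid    validity = iota // task is runnable
	validityRejected                 // task exists, but the local validator rejects it
	validityNotFound                 // component or logical task no longer exists
)

// standbyShouldRetry decides whether the standby keeps the physical task.
// A validator rejection may only reflect a stricter validator deployed on
// this cluster, so it must not cause a drop. A missing component or logical
// task means replication already removed it, so dropping is safe.
func standbyShouldRetry(v validity) bool {
	switch v {
	case validityValid, validityRejected:
		return true
	case validityNotFound:
		return false
	default:
		return true // unknown result: retry rather than risk dropping
	}
}

func main() {
	for _, v := range []validity{validityValid, validityRejected, validityNotFound} {
		fmt.Println(v, standbyShouldRetry(v))
	}
}
```

Defaulting unknown results to retry keeps the failure mode at "stuck retrying" rather than "silently dropped", which matches the goal stated in the PR description.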


return false, nil
func(_ chasm.NodePureTask, _ chasm.TaskAttributes, _ any) (bool, error) {
// Any task present means replication has not yet removed it — retry.

Member


👍

mutableState,
task,
func(node chasm.NodePureTask, taskAttributes chasm.TaskAttributes, task any) (bool, error) {
ok, err := node.ValidatePureTask(ctx, taskAttributes, task)

Member


Looks like we can remove the ValidatePureTask method.


Labels: None yet

Projects: None yet


2 participants