Downgrade Hangfire to 1.8.22 and Hangfire.Mongo to 1.12.2 #876

Merged
pmachapman merged 1 commit into main from downgrade_mongo on Feb 18, 2026

Collaborator

@pmachapman pmachapman commented Feb 18, 2026

In my testing, it appears that the OperationCanceledExceptions that are cancelling jobs are coming from Hangfire directly, when it reads the JobState.

It looks like a bug in the implementation of the new StateHistory collection in Hangfire.Mongo, which is reading the incorrect state (perhaps because it is reading from a shard in the MongoDB Atlas cluster that does not yet have the latest StateHistory document for the jobId?). I have therefore downgraded to the versions of Hangfire and Hangfire.Mongo from before this change was made.

In my testing, this stopped the jobs being cancelled.
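
As an illustration only, the suspected failure mode can be sketched as a toy model in Python. It is not Hangfire or Hangfire.Mongo code: the `Node` class, the `looks_aborted` check, and the state names are assumptions made up for this sketch. It only shows how a read from a node that has not yet received the latest state-history document could make a running job look cancelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical database node holding state-history entries per job."""
    history: dict = field(default_factory=dict)  # job_id -> list of state names

    def append_state(self, job_id, state):
        self.history.setdefault(job_id, []).append(state)

    def latest_state(self, job_id):
        states = self.history.get(job_id)
        return states[-1] if states else None


def looks_aborted(node, job_id):
    """Assumed rule: a job counts as aborted unless its latest state is Processing."""
    return node.latest_state(job_id) != "Processing"


primary, lagging = Node(), Node()

# Both nodes have seen the job being enqueued.
for node in (primary, lagging):
    node.append_state("job-1", "Enqueued")

# The Processing transition has reached the primary only.
primary.append_state("job-1", "Processing")

print(looks_aborted(primary, "job-1"))  # False: the up-to-date read is fine
print(looks_aborted(lagging, "job-1"))  # True: the stale read looks like a cancellation
```

If this model matches reality, the downgrade would help simply because the older versions do not read state from a separate StateHistory collection; that link is the hypothesis above, not something the sketch proves.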

The main downside of this PR is that the serval_jobs and machine_jobs collections will need to be dropped on internal QA and external QA when this PR is deployed.
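
A minimal sketch of that cleanup step, assuming (as the description says) that `serval_jobs` and `machine_jobs` are collection names inside one database. The helper name `drop_job_collections` is hypothetical, and `db` is any object exposing PyMongo's `Database.list_collection_names()` and `Database.drop_collection()` methods.

```python
def drop_job_collections(db, names=("serval_jobs", "machine_jobs")):
    """Drop each named collection that exists in db and return the names dropped."""
    existing = set(db.list_collection_names())
    dropped = []
    for name in names:
        if name in existing:
            db.drop_collection(name)
            dropped.append(name)
    return dropped


# Example use with PyMongo (connection string and database name are placeholders):
#   from pymongo import MongoClient
#   db = MongoClient("<connection string>")["<database name>"]
#   print(drop_job_collections(db))
```

If these names turn out to be separate databases rather than collections, PyMongo's `MongoClient.drop_database` would be the call to use instead.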


This change is Reviewable

Collaborator

@Enkidu93 Enkidu93 left a comment

Thanks for figuring this out, Peter! If this is a bug in the library, have you noticed other folks hitting this same problem online? Is there an issue or ticket we can create with the maintainers?

@Enkidu93 reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on ddaspit).

Contributor

@ddaspit ddaspit left a comment

If you have enough information, it would be good to submit an issue to the repo. I'm guessing you tried version 1.13.1. It looks like it had a fix for some issues related to state history.

@ddaspit reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on pmachapman).

Collaborator Author

@pmachapman pmachapman left a comment

> I'm guessing you tried version 1.13.1. It looks like it had a fix for some issues related to state history.

Yes, sadly this did not help.

> If this is a bug in the library, have you noticed other folks hitting this same problem online? Is there an issue or ticket we can create with the maintainers?

I couldn't find anyone else with the issue. To proceed (assuming this PR fixes the bug in prod as it did for me locally), I would need to spend a reasonable chunk of time actually isolating the bug. I think it could be one of three broad areas:

  1. A logic issue in Hangfire.Mongo.
  2. A bug with how we are using MongoDB Atlas, where the StateHistory collection is not kept in sync across shards for the latest state history documents.
  3. A bug with how we are using Hangfire, where we have two pods accessing the same Hangfire database at the same time.

I think I will need to create a test harness and see if I can replicate the bug outside of Serval and go from there.
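
One scenario such a harness might try to reproduce is the third area above. The toy model below is a sketch under a stated assumption: that a worker treats its job as aborted when the stored state no longer names that worker's server as the processor. The `JobStore` class, `is_aborted` function, and pod names are hypothetical and are not Hangfire's API. Whether two pods can actually overwrite each other's state this way is exactly what the investigation would need to establish.

```python
class JobStore:
    """Hypothetical shared store mapping job_id -> (state name, owning server id)."""

    def __init__(self):
        self.states = {}

    def set_processing(self, job_id, server_id):
        self.states[job_id] = ("Processing", server_id)

    def get(self, job_id):
        return self.states.get(job_id)


def is_aborted(store, job_id, server_id):
    """Assumed rule: aborted unless the store says this server is processing the job."""
    return store.get(job_id) != ("Processing", server_id)


store = JobStore()

# pod-a picks up the job and records itself as the processor.
store.set_processing("job-1", "pod-a")
print(is_aborted(store, "job-1", "pod-a"))  # False

# Hypothetical: pod-b fetches the same job and overwrites the processing state.
store.set_processing("job-1", "pod-b")
print(is_aborted(store, "job-1", "pod-a"))  # True: pod-a now sees its job as cancelled
```

Running the real harness once with a single pod and once with two, while keeping everything else fixed, would help separate this area from the other two.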

@pmachapman made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on pmachapman).

Contributor

@ddaspit ddaspit left a comment

It would probably be good to spend at least some time isolating the issue. I don't want to get stuck on an old version of Hangfire indefinitely. You should timebox your investigation.

@ddaspit made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on pmachapman).

Collaborator Author

@pmachapman pmachapman left a comment

> It would probably be good to spend at least some time isolating the issue. I don't want to get stuck on an old version of Hangfire indefinitely. You should timebox your investigation.

Sounds good - will do.

@pmachapman made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on pmachapman).

@pmachapman pmachapman merged commit 40290bc into main Feb 18, 2026
3 checks passed
@pmachapman pmachapman deleted the downgrade_mongo branch February 18, 2026 20:11