WIP: Apply backpressure to add_item operations#282

Open
benoit74 wants to merge 1 commit into main from backpressure

Conversation

Collaborator

benoit74 commented Mar 3, 2026

This is an attempt to add backpressure to add_item operations through the callback mechanism, to ensure we do not ask libzim to process more items than it can handle.

Collaborator Author

benoit74 commented Mar 3, 2026

@rgaudin: I probably need your help on this. The simple test I've implemented deadlocks "forever", and since it is the "base" of what I intend to use to apply backpressure to add_item operations, I'm a bit stuck.

I tried various approaches, but it seems to be mostly impossible to ensure the item is really finalized while also waiting for the total count of items to be below a given threshold.

Member

rgaudin commented Mar 4, 2026

That's because the data you're adding is too small (under the size of a cluster), so no cluster is ever closed. The worker therefore never gets a chance to consider the added data and fire the callback.
I've changed your code to add a 1MB file instead and it behaves as expected.
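
To make the explanation above concrete, here is a minimal toy model in plain Python. It does not use libzim: ToyCreator, its cluster_size parameter and count_callbacks are hypothetical names, and the only behavior it encodes is the assumption stated in this thread, namely that item callbacks fire when the cluster holding them is closed.

```python
from typing import Callable, List


class ToyCreator:
    """Toy stand-in for a ZIM creator; NOT the libzim API.

    Assumption: the callbacks of the items in a cluster fire only when
    that cluster is closed, i.e. once it holds at least `cluster_size`
    bytes, or when finish() is called.
    """

    def __init__(self, cluster_size: int) -> None:
        self.cluster_size = cluster_size
        self._pending: List[Callable[[], None]] = []
        self._cluster_bytes = 0

    def add_item(self, content: bytes, callback: Callable[[], None]) -> None:
        self._pending.append(callback)
        self._cluster_bytes += len(content)
        if self._cluster_bytes >= self.cluster_size:
            self._close_cluster()

    def finish(self) -> None:
        # The last, partially filled cluster is only closed here.
        if self._pending:
            self._close_cluster()

    def _close_cluster(self) -> None:
        callbacks, self._pending = self._pending, []
        self._cluster_bytes = 0
        for callback in callbacks:
            callback()


def count_callbacks(item_size: int, nb_items: int, cluster_size: int) -> int:
    """Return how many callbacks fired BEFORE finish() is called."""
    fired = 0

    def on_done() -> None:
        nonlocal fired
        fired += 1

    creator = ToyCreator(cluster_size)
    for _ in range(nb_items):
        creator.add_item(b"x" * item_size, on_done)
    return fired
```

In this model, adding many tiny items never fires a callback before finish(), so a producer that waits for a callback before adding more would wait forever, which matches the deadlock seen in the test.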

Collaborator Author

benoit74 commented Mar 5, 2026

OK, thank you for the insight.

Do I get it correctly that it means we need to track the size of added items to apply backpressure, rather than the number of items? It makes quite a lot of sense actually, and would be superior to the original "idea" I had.

Does libzim have a default cluster size, or is it decided dynamically? If so, how?

Member

rgaudin commented Mar 5, 2026

> Do I get it correctly that it means we need to track the size of added items to apply backpressure, rather than the number of items? It makes quite a lot of sense actually, and would be superior to the original "idea" I had.

Not necessarily, as long as you work around the first and last cluster behavior.

> Does libzim have a default cluster size, or is it decided dynamically? If so, how?

The default is 2MiB, but it can be changed with creator.config_clustersize().

Collaborator Author

benoit74 commented Mar 5, 2026

> Not necessarily, as long as you work around the first and last cluster behavior.

What do you mean by first and last cluster? I don't think I'm aware of anything special about these clusters.

Member

rgaudin commented Mar 5, 2026

Exactly what I indicated above: the callback is fired only once a cluster is closed so, as highlighted by your test, until you have filled the first cluster, you cannot expect any callback.
The last cluster is only closed in finish(), so its callback happens post-finish, which can be unexpected.
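
One possible way to work around the first and last cluster behavior is sketched below. It is only an illustration under stated assumptions, not the approach taken in this PR: BackpressureGate and all its methods are hypothetical names, and the sketch assumes that once at least one cluster's worth of bytes is pending, some cluster has been closed and a callback will eventually arrive. That assumption may not hold if several clusters are open at the same time.

```python
import threading
from typing import Optional


class BackpressureGate:
    """Hypothetical helper; not part of libzim or any existing library."""

    def __init__(self, max_items: int, cluster_size: int) -> None:
        self.max_items = max_items
        self.cluster_size = cluster_size
        self._items = 0  # items added whose callback has not fired yet
        self._bytes = 0  # total size of those items
        self._cond = threading.Condition()

    def _must_wait(self) -> bool:
        # Assumption: with at least cluster_size bytes pending, some cluster
        # has been closed, so a callback will eventually arrive. Below that,
        # blocking could last forever (first cluster not filled yet).
        return self._items >= self.max_items and self._bytes >= self.cluster_size

    def acquire(self, size: int, timeout: Optional[float] = None) -> bool:
        """Call before adding an item; returns False if the timeout expired."""
        with self._cond:
            if not self._cond.wait_for(lambda: not self._must_wait(), timeout):
                return False
            self._items += 1
            self._bytes += size
            return True

    def release(self, size: int) -> None:
        """Call from the item's callback."""
        with self._cond:
            self._items -= 1
            self._bytes -= size
            self._cond.notify_all()

    def drain(self, timeout: Optional[float] = None) -> bool:
        """Call AFTER finish(): callbacks of the last cluster only fire there."""
        with self._cond:
            return self._cond.wait_for(lambda: self._items == 0, timeout)
```

The intended usage would be to call acquire(len(content)) before each add_item, call release from the item's callback, and call drain only once finish() has returned. The item limit is deliberately ignored while fewer than cluster_size bytes are pending, which is the price paid for never blocking on a cluster that cannot close yet.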

