Bug: delete_dir in stateful test uses raw startswith for bookkeeping, causing flaky KeyError in delete_group_using_del #3977

@d-v-b

This issue was written by Claude. It was discovered while working on #3961.

The hypothesis state machine in src/zarr/testing/stateful.py tracks created nodes in two sets, self.all_arrays and self.all_groups. When delete_dir(path) runs, it prunes those sets using a raw string-prefix match:

# src/zarr/testing/stateful.py:307-312
matches = set()
for node in self.all_groups | self.all_arrays:
    if node.startswith(path):
        matches.add(node)
self.all_groups = self.all_groups - matches
self.all_arrays = self.all_arrays - matches

node.startswith(path) matches any node whose path string begins with path, not just nodes that are descendants of the directory path. So delete_dir('6/f') matches a sibling node at 6/faNT7p7jvJsO3_C._HYi and incorrectly removes it from all_arrays.
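
The over-match can be reproduced outside the state machine. This is a minimal sketch, not the test code itself: the two sets stand in for self.all_groups and self.all_arrays, and the node 6/f/child is an illustrative path that does not appear in the CI trace.

```python
# Illustrative stand-ins for self.all_groups and self.all_arrays.
all_groups = {"6", "6/f"}
all_arrays = {"6/f/child", "6/faNT7p7jvJsO3_C._HYi"}
path = "6/f"

# Same raw prefix match as the delete_dir cleanup loop.
matches = {node for node in all_groups | all_arrays if node.startswith(path)}

# The sibling "6/faNT7p7jvJsO3_C._HYi" is selected even though it is not under "6/f/".
print(sorted(matches))
```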

The real store-level delete_dir('6/f') only removes objects under 6/f/, so 6/faNT... survives in the store. The bookkeeping sets and the store now disagree. When delete_group_using_del later walks members(...) of an ancestor group and tries self.all_arrays.remove(obj.path), the entry has already been pruned by the broken delete_dir, and the call raises KeyError.
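
The drift and the resulting KeyError can be simulated with plain sets. This is a rough sketch under stated assumptions: store_nodes is a hypothetical stand-in for the paths actually present in the store, and the final loop only imitates what delete_group_using_del does when it walks the members of an ancestor group.

```python
# Hypothetical stand-in for the array paths present in the store.
store_nodes = {"6/f/x", "6/faNT7p7jvJsO3_C._HYi"}
# Bookkeeping starts out in sync with the store.
all_arrays = set(store_nodes)

path = "6/f"

# Store-level delete_dir: only objects under "6/f/" are removed.
store_nodes = {n for n in store_nodes if not n.startswith(path + "/")}

# Bookkeeping prune with the raw prefix match (the buggy behaviour).
all_arrays = {n for n in all_arrays if not n.startswith(path)}

# Paths still in the store that are no longer tracked.
untracked = store_nodes - all_arrays

# Imitate delete_group_using_del: remove every surviving member from the bookkeeping.
caught = None
for node in sorted(store_nodes):
    try:
        all_arrays.remove(node)
    except KeyError as exc:
        caught = exc

print(untracked, repr(caught))
```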

Reproduction

Slow Hypothesis CI run https://github.com/zarr-developers/zarr-python/actions/runs/25939320276 found this in two distinct falsifying examples in the same job:

File "src/zarr/testing/stateful.py", line 372, in delete_group_using_del
    self.all_arrays.remove(obj.path)
KeyError: '6/j3pnC'

File "src/zarr/testing/stateful.py", line 372, in delete_group_using_del
    self.all_arrays.remove(obj.path)
KeyError: '6/faNT7p7jvJsO3_C._HYi'

The shrunk trace shows the pattern clearly: an array is created at 6/faNT7p7jvJsO3_C._HYi, delete_dir('6/f') is invoked, and the next delete_group_using_del targeting '6' blows up because the bookkeeping for 6/faNT... is gone but the store still has it.

The failure is non-deterministic in CI because .github/workflows/hypothesis.yaml does not pin a Hypothesis seed. Most runs pass; the example only surfaces when node_names generates a name that is a raw string prefix of a sibling's name and the action ordering exposes the bookkeeping drift.

Root cause

delete_dir strips entries by string prefix instead of by path-segment prefix. The check needs to require that any match is either equal to path or has path followed by the / segment separator.

Suggested fix

Replace the body of the delete_dir cleanup loop with a segment-aware check:

matches = {
    node for node in self.all_groups | self.all_arrays
    if node == path or node.startswith(path + "/")
}
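
A quick way to sanity-check the segment-aware condition against the raw one is sketched below. The helper name is_at_or_under and the sample paths are hypothetical, chosen only for illustration; the predicate itself is the one from the suggested fix.

```python
def is_at_or_under(node: str, path: str) -> bool:
    """Hypothetical helper: True if node is path itself or a descendant of it."""
    return node == path or node.startswith(path + "/")


# Illustrative node paths; "6/f/child" is made up for this sketch.
nodes = {"6", "6/f", "6/f/child", "6/faNT7p7jvJsO3_C._HYi", "6/j3pnC"}
path = "6/f"

raw_matches = {n for n in nodes if n.startswith(path)}
segment_matches = {n for n in nodes if is_at_or_under(n, path)}

# Entries the raw check would prune but the segment-aware check keeps.
wrongly_pruned = raw_matches - segment_matches
print(sorted(segment_matches), sorted(wrongly_pruned))
```

One edge worth noting: if path were ever the empty root path, this predicate would match only the root itself, since no node path starts with "/". Whether delete_dir can receive such a path has not been checked here.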

Origin

Introduced in #3130 (commit c972f7f) when the additional stateful actions were ported from icechunk. Unrelated to the current zarr-metadata refactor; surfaced there only because Hypothesis randomization happened to find it.
