Skip to content

Conversation

@chrishalcrow
Copy link
Member

@chrishalcrow chrishalcrow commented Dec 31, 2025

Would close #4261

Docs: https://spikeinterface--4287.org.readthedocs.build/en/4287/modules/curation.html

(Note: PR looks big but mostly tests+docs - main changes are updating the id strategy in sorting_tools.py (~100 lines) and updating the apply_merges function (~50 lines))

This PR allows you to remove spikes from a unit during curation by specifying the following in your curation json file:

"discard_spikes": [
    {
        "unit_id": "u10",
        "indices": [
            56,
            57,
            59,
            60
        ]
    },
    {
        "unit_id": "u14",
        "indices": [
            123,
            321
        ]
    }
]

This new feature means that the curation format gets bumped to v3!

You can discard at the same time as merging and splitting.

Tricky bit 1

How to apply it to an analyzer. Decided to discard spikes at the same time as splitting. During the splitting step, we re-wrangle discard spikes into another split unit (call them "discard units") and keep track of the discard unit id. Then remove the full discard units after the splitting. This allows us to use the existing splitting machinery (including splitting extension etc) for discards - nice!

Tricky bit 2

Much of the complexity is now related to the id strategy (ughh - why do we give the user THREE new id strategies!!!). It's difficult because we want a cleaned unit to retains its id. This is true even if there are other units which are just split, and the user is using the "append" or "split" new_unit_id_strategy.

So suppose the user does the following:

Unit 1: clean
Unit 2: split
Unit 3: clean and split

With strategy append, we want

Unit 1 -> Unit 1 + Unit 1 dirty
Unit 2 -> Unit 4 + Unit 5
Unit 3 -> Unit 6 + Unit 7 + Unit 3 dirty

We do this by slotting in the dirty units into places we know to split later. For "append" strategy, I chose to put "Unit 1 dirty" to the last possible unit id, and unit 3 dirty to "unit 3".

Other stuff

We have to do merges after splitting+discarding. This is because the spike indices change after merging. To avoid wrangling spike indices (gross!) we just do discarding first.

Tests to do:

  • test curation format
  • test v2 -> v3 update
  • test actual discarding when also splitting / merging

@chrishalcrow chrishalcrow added the curation Related to curation module label Dec 31, 2025
@alejoe91
Copy link
Member

alejoe91 commented Jan 6, 2026

Awesome Chris! Quick comment before diving into the PR:

So suppose the user does the following:

Unit 1: clean
Unit 2: split
Unit 3: clean and split

I don't think that clean and split should be allowed. We have rules so that a unit can either be removed, merged, split, and I would add discard. Does it simplify the new unit id logic?

Comment on lines +291 to +297
if len(discard_spikes_unit_ids) > 0:
ids_to_remove = []
for new_id_set in new_ids:
if new_id_set[0] in discard_spikes_unit_ids or new_id_set[1] in discard_spikes_unit_ids:
ids_to_remove.append(new_id_set[0])

curated_sorting_or_analyzer = curated_sorting_or_analyzer.remove_units(ids_to_remove)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)


# decide if unit is a simple discard, a simple split or a discard and split
just_discard = False
discard_and_split = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think this should not be allowed. A unit can be either split or cleaned, not both! This should simplify the logic a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

curation Related to curation module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add discard_spikes/clean_spikes step in a curation v3

2 participants