Skip to content

Add a "hide duplicate images" option to import dialogs#21319

Open
jeffc wants to merge 5 commits into
darktable-org:masterfrom
jeffc:skip-dupes
Open

Add a "hide duplicate images" option to import dialogs#21319
jeffc wants to merge 5 commits into
darktable-org:masterfrom
jeffc:skip-dupes

Conversation

@jeffc

@jeffc jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

I have a camera with two SD card slots (Canon R7), and when I have "save to both cards" enabled, darktable shows both copies of every file in the "import from camera" window. This is frustrating, because I have to then sort by name and manually deselect all of the duplicates, or import them and delete them manually.

This PR adds a hide duplicate images option to the import dialogs, which looks at the timestamp (from the photo metadata) and filesize of every image. If two (or more) match, it considers the one with the alphabetically-first path the "canonical" version and marks the others with a "possible duplicate" flag. Checking the "hide duplicate images" box filters the possible duplicates out of the list. It only compares against other images in the same import job, not against anything already in the darktable library/database.

This doesn't take into account the filename(s) of the images, which is how the "only new images" box works (in addition to timestamp), but relying on a size check felt more reliable to me. I'm happy to update that to match if you'd prefer.

image

This PR also fixes two small pre-existing memory leaks in the import dialog process.

note: I used AI tools to write most of this code, since C++ is not my strongest language. I reviewed the changes as carefully as I could and they all look reasonable

@jeffc jeffc changed the title Add a "skip duplicates" option to import dialogs Add a "hide duplicate images" option to import dialogs Jun 15, 2026

@victoryforce victoryforce left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit changes the 'swap' argument of dt_gui_preferences_bool function (which controls the position in the grid and has nothing to do with memory allocation). I've failed to see how this can fix "pre-existing memory leak". Could you please explain why the commit comment mentions a memory leak?

EDIT: Unexpected behavior from GitHub UI, I wrote this comment under a specific commit, and it is published in the general thread. It is about commit 71d8f2a

@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@victoryforce

That's my bad; I made two back-to-back commits (in order to put the leak fix in its own), but added the wrong files to each. I'll add a comment on the offending commit to explain for any other reviewers

The first leak fix is in camera_jobs.c here, just switching a g_list_free() to a g_list_free_full() in order to free the list contents in addition to the list object itself.

The second leak fix is properly in its own commit, here (making a scope-local copy of the list so that we don't inhibit the caller's ability to free the images list)

@jenshannoschwalm

Copy link
Copy Markdown
Collaborator

You should do a proper PR with the proposed fix for 5.6 first i think.

@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@jenshannoschwalm by that, do you mean separate out the memory leak fixes into their own PR? (I'm happy to do that, just making sure I understand you properly)

@jenshannoschwalm

Copy link
Copy Markdown
Collaborator

by that, do you mean separate out the memory leak fixes into their own PR?

Exactly.

  1. If there is a bug you found it should be fixed for 5.6 if still possible
  2. Rebasing on that would make your pr more easy to discuss.

@wpferguson

Copy link
Copy Markdown
Member

I think this is an enhancement and not a bug. It's also somewhat of an edge cause because you have to be saving the same image to both cards and then importing from the camera

The question I have is can one image be corrupt and the other not and still have the same file size. Would check summing be a better test of duplicate (though painfully slow).

@jeffc jeffc requested a review from LebedevRI as a code owner June 15, 2026 18:00
@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

I separated out the memory leaks into their own PR (#21326), but seem to have goofed up this PR with my rebase. Fixing that now

jeffc added 4 commits June 15, 2026 14:10
 - skip duplicate checking when size or timestamp is not set
 - resolve "chains" (all dupes point to the same canonical image)
 - match UI defaults and config defaults ("hide duplicates" starts
   unchecked)
@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Okay, rebase is untangled and should be good for further review

The question I have is can one image be corrupt and the other not and still have the same file size. Would check summing be a better test of duplicate (though painfully slow).

I actually did implement this at first -- the timestamp+size test marked "possible duplicate" images, then they were still copied from the camera and checksummed. It only discarded images if their "canonical" version's checksum matched perfectly (I had originally wanted to read the "thumbnail preview" for every image and compare the checksums of those, but I realized this would indeed be painfully slow). I later changed to the current behavior because I realized that doing it that way effectively doubled the import time. I suspect (without evidence) that a corrupted image would most likely be due to an incomplete write (and would thus have a different filesize), but even if that's not true I think it's fair to let the user handle that case on their own. The current system for detecting "already imported" doesn't take into account file checksums, only the file name and timestamp, so doing this the performant way (with metadata only) rather than the "catches any edge case" way (with checksums) felt like the right course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants