
SMF - The fastest duplicate files finder... [Updated 2024-Oct-20]



Posted (edited)

Already found a small bug which needs to be fixed. If you manually increase the LIMIT in the report, the report is not updated.
 
About the settings... well, export a database from v4 and import that into v5, that should transfer most settings. Though I don't think this will work for report styles, as I've rewritten that part to accept custom names by request of... *check, check*... michaelslamet :lol:...
 
Edit: Fixed the LIMIT bug

Edited by KaFu
  • 3 weeks later...
Posted

2013-Jul-26, Changelog 4.0 > 5.0

  • Fixed - Unwanted auto-reopen of report issue
  • Fixed - Occasional program crash in search for "Executable Infos"
  • Fixed - Crash on "Export" if no column was selected
  • Fixed - "Progress" tab, fixed crash on click of "Pre-Filter Active" icon
  • Added "Throttle Search Speed" option
  • Report - Added "IN Reference Filelist"
  • Report - Added "LIKE Reference Filelist"
  • Report - Added "IN Reference Extensionlist" option
  • Report - Added "Between Reference Dates" option
  • Report - "Load & Save Styles" => made save names selectable
  • Report - Added "LIKE Expressions" and "NOT LIKE Expressions" multiple value selection
  • Report - Added Custom SQLite query builder
  • Added optional duplicate search for "Duplicate_Filename"
  • Updated SQLite Dll to 3.7.17
  • Updated MediaInfo Dll to 0.7.64
  • Updated TrID Definitions to version 2013 Jul 23

Source and Executable are available at
http://www.funk.eu

Best Regards
 

Updated first Post... Enjoy :)...

Posted
  • 2 months later...
Posted

2013-Sep-29, Changelog 5.0 > 6.0

  • Fixed - Different small bugs in main program and report
  • Report - Added special fullscreen mode (F11)
  • Report - Added default styles
  • Updated - Added option to keep current folder selection on purge of DB
  • Updated - Added lots of hotkeys for main program and report (see "About" for details)
  • Updated SQLite Dll to 3.8.0.2
  • Updated MediaInfo Dll to 0.7.64
  • Updated TrID Definitions to version 2013 Sep 20

Source and Executable are available at
http://www.funk.eu

Best Regards
 

Updated first Post... Enjoy :)...

  • 2 months later...
Posted

2013-Dec-22, Changelog 6.0 > 7.0

  • Fixed - Crash on "Computer Info Report"
  • Updated - Added option to auto-update Folder Treeview
  • Updated - Folder Treeview > added option to cancel refresh when (slow) network folders are included
  • Report - Added several hotkeys (e.g. style change) to standard and fullscreen report
  • Report - Made all relevant functions work with fullscreen GUI too
  • Lots of small bug fixes and style changes
  • Updated SQLite Dll to 3.8.2
  • Updated MediaInfo Dll to 0.7.65
  • Updated TrID Definitions to version 2013 Dec 19

Source and Executable are available at
http://www.funk.eu

Best Regards
 

Updated first Post... Enjoy :)...

  • 3 months later...
Posted (edited)

2014-Apr-06, Changelog v7 > v8

  • Fixed   -   "Image Infos" return could produce SQL errors (when trying to extract infos from damaged pictures)
  • Fixed   -   "Extended File Attributes" column return could produce SQL errors > replaced "#" character in property (column) name
  • Fixed   -   "MediaInfo" column return could produce SQL errors > replaced "*" character in property (column) name
  • Fixed   -   While running, report was not auto-updated correctly after finish
  • Report  -   Implemented _LV_Resort_Thumbnail_Creation_Queue(), visible thumbnails are now loaded first
  • Report  -   Implemented new "CopyTo" multi-selection dialog
  • Report  -   Added browse and info controls to fullscreen report
  • Added   -   Option to auto-update Folder Treeview
  • Added   -   "Fast Search Loop" feature for faster default search > searches up to 13.000* files per second!

[Screenshot: SMF search speed]

* Peak on my system, measured on a second search run that benefits from the Windows file cache; of course the speed also depends on your system's properties.

  • Added   -   "File MIME Type" analysis
  • Report  -   Added some more default thumbnail sizes: 96px / 160px / 192px
  • Report  -   Tweaked CopyMove CallbackDlg function, now shows thumbnails instead of icons, plus links are clickable
  • Updated -   Lots of small bug fixes and style changes
  • Updated -   SQLite Dll to 3.8.4.3
  • Updated -   MediaInfo Dll to 0.7.68
  • Updated -   True ID - TrID Definitions to version 2014 Mar 22

Source and Executable are available at

http://www.funk.eu

Best Regards
 

Updated first Post... Enjoy :)...

Edited by KaFu
  • 4 months later...
Posted (edited)

Awesome. Thanks! :)

Of the two programs of this kind I've tried before, the only one I can remember is FolderMatch, which I tried not too long ago.

It crashed on one of my folders of > 80.000 pictures ... (I don't *really* have that many, I have *many* duplicates ;) )

So I was hoping for better with this one, because ideally I need even more capacity.

Good points:

- hardly had to touch the defaults

- speed seems not to be lacking

- after a short while I decided I could trust its detection blindly and started doing so

- all I wanted from it I could find; the stuff I didn't understand, luckily I didn't need  :)

and I found a way of working that I was ok with pretty quickly

(by opting for "Show no icons" and hovering over the files with Tooltip preview on for a quick "confidence check")

Work to be done:

- had many "Error allocating memory" dialogs as long as I was using "Show icons." (even at their smallest and with report limit to 100)

So wondering if you could allow for more memory? (or have it as a user pref)

All seemed ok with whatever amount of pics when I disabled icon previews.

- 2 or 3 times I had "Line 37890 - Variable used without being declared."  I believe at least one of those was at the end of a session in the 80.000+ pics folder and I only had some thumbnails left to delete.

Suggestions

- Nothing really important  :)

- This may already be incorporated, but I wonder if some intelligent logic is in place for selecting the hopefully correct picture to delete?

My logic would for example be this:

* In the early days of the internet, everything was new, and when I started collecting pics on some subjects, I was more meticulous about organizing them well.

This means that I would both give priority to older dates *and* look at folders that are less common.

For example:

If I collected pics of European countries, the pics I want to keep are probably in folders named after countries and *not* among a big bunch that are not yet organized and in the folder "European countries" (the more common folder, IF you have more pics in that folder than in the largest specific country folder.) They would probably also often have the older dates.

I might also do a check for "(2)" (indicating a dupe in the same folder, or now a different folder) and the like at the end of a filename to add to my logic.
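The selection logic described above could be expressed as a scoring function that picks one keeper per duplicate group. This is a hypothetical sketch, not something SMF is known to implement: the `Candidate` record, the `folder_sizes` and `pick_keeper` helpers, the `(n)` suffix pattern and the order of the criteria (copy suffix first, then folder size, then age) are all assumptions.

```python
import re
from collections import Counter
from dataclasses import dataclass
from pathlib import PureWindowsPath

# Hypothetical record for one file in a duplicate group (not an SMF structure).
@dataclass
class Candidate:
    path: str
    mtime: float  # modification time in seconds since the epoch

# Matches copy markers such as "eiffel (2)" at the end of a file name stem.
COPY_SUFFIX = re.compile(r"\(\d+\)$")

def folder_sizes(all_paths):
    """Count files per folder; a smaller count means a more specific folder."""
    return Counter(str(PureWindowsPath(p).parent) for p in all_paths)

def pick_keeper(group, sizes):
    """Return the candidate to keep from one group of identical files.

    Assumed priority: no "(n)" copy suffix first, then the least common
    (smallest) folder, then the oldest modification time.
    """
    def score(c):
        p = PureWindowsPath(c.path)
        has_copy_suffix = COPY_SUFFIX.search(p.stem) is not None
        return (has_copy_suffix, sizes.get(str(p.parent), 0), c.mtime)
    return min(group, key=score)
```

Because Python compares tuples element by element, reordering the tuple changes which criterion wins; putting `c.mtime` first, for example, would keep the oldest copy regardless of folder.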

- What I may enjoy if I have more time, or when the dupes are not terribly numerous, is a mode that makes it fun and easy to select the pic I want to keep, while enjoying a good look at them again.

I'm thinking of an image viewer style that, per image, shows me all the dupes in one larger view, with the drive & (at least the end of the) file paths as easy to read as possible, because in the end that's what it's all about (for me): selecting the pic in the preferred folder to keep. Since that is usually just one pic, it would be great to be able to click anywhere on the pic to select it as the keeper, after which the program moves on to the next duplicates (you'd need some way to stay in place for when it's more than one pic, though).

So there  ;)  I'm keeping an eye on this and I will already recommend it!

(I'm on Win 7 64b)

Edited by Guy_
Posted (edited)

> Awesome. Thanks! :)

 

Glad you like it :), and thanks for this thorough feedback.

> Work to be done:

> - had many "Error allocating memory" dialogs as long as I was using "Show icons." (even at their smallest and with report limit to 100)

> So wondering if you could allow for more memory? (or have it as a user pref)

> All seemed ok with whatever amount of pics when I disabled icon previews.

> - 2 or 3 times I had "Line 37890 - Variable used without being declared."  I believe at least one of those was at the end of a session in the 80.000+ pics folder and I only had some thumbnails left to delete.

 

I intentionally tried to code SMF with as few limitations as possible, to get the most out of the reporting functionality.

  • As a consequence of your feedback I've implemented a thumbnail creation break. The report now checks the overall memory usage and stops creating new thumbnails if memory usage is > 90%, to prevent the report from crashing (thumbnails are the bottleneck, as they are stored in an imagelist which uses uncompressed bitmaps). What really makes me wonder is the low number of thumbnails you were able to display. I myself never had problems with a few thousand, though the memory usage did climb significantly. How much RAM do you have? Could you re-test it and monitor the memory usage in the "Task Manager"?
  • Thanks for pointing out the "Line 37890" issue, typical copy & paste error on my side :), should be fixed now.

I intend to release a v9 quite soon, here's the current Beta for public testing (happy about any feedback or improvement suggestions :) )...

> - This may already be incorporated, but I wonder if some intelligent logic is in place for selecting the hopefully correct picture to delete?

> My logic would for example be this:

> * In the early days of the internet, everything was new, and when I started collecting pics on some subjects, I was more meticulous about organizing them well.

> This means that I would both give priority to older dates *and* look at folders that are less common.

 

SMF searches in alphabetical order. What I do in such a case is put all files in two different sub-folders, "a" for the files most likely to be kept and "b" for the files most likely to be deleted. That way the files to be kept will be listed above the other ones in the duplicate enumeration.
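The "a"/"b" sub-folder trick relies only on path ordering. A minimal sketch of that idea, assuming each duplicate record is a `(hash, path)` pair and that paths compare case-insensitively (both assumptions; `plan_deletions` is a hypothetical helper, not part of SMF):

```python
from itertools import groupby

def plan_deletions(records):
    """Return the paths that would be deleted, keeping one file per group.

    records: iterable of (hash, path) tuples for duplicate files.
    Within each hash group, paths are sorted alphabetically, so a file
    under an "a" sub-folder sorts before its twin under "b" and is kept.
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1].lower()))
    to_delete = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        paths = [path for _, path in group]
        to_delete.extend(paths[1:])  # paths[0] is the keeper
    return to_delete
```

Only the sub-folder name has to differ for this to work, because everything before it in the path is identical for both copies.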

> - What I may enjoy if I have more time, or when the dupes are not terribly numerous, is a mode that makes it fun and easy to select the pic I want to keep, while enjoying a good look at them again.

> I'm thinking of an image viewer style that, per image, shows me all the dupes in one larger view, with the drive & (at least the end of the) file paths as easy to read as possible, because in the end that's what it's all about (for me): selecting the pic in the preferred folder to keep. Since that is usually just one pic, it would be great to be able to click anywhere on the pic to select it as the keeper, after which the program moves on to the next duplicates (you'd need some way to stay in place for when it's more than one pic, though).

 

Give the pre-defined report styles a try :), hit F1 to F5; in this scenario you'll probably get the best results with F3 or F4. Select the files you want to keep with a mouse click (hold CTRL for multiple selection), then "Toggle" the selection (top right of the report or in the right-click context menu) and delete all superfluous files.

Best Regards

Edited by KaFu
  • 2 weeks later...
Posted

2014-Sep-06, Changelog v8 > v9

  • Added   -   Duplicates Search "Hash-Cache" functionality (optional), calculated hashes are cached and re-used in next search run to improve duplicate search speed
  • Fixed     -   Option to Auto-update Folder Treeview sometimes crashed (e.g. new drives added)
  • Report   -   Added memory check function, if more than 90% of memory is in use, do not create any more icons / thumbnail previews
  • Report   -   Added option to change report icon size with CTRL+MouseWheel
  • Updated -   Lots of other bug fixes and style changes
  • Updated -   SQLite Dll to 3.8.6
  • Updated -   MediaInfo Dll to 0.7.70
  • Updated -   TrID Definitions to version 2014 Aug 23
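The "Hash-Cache" entry above can be illustrated with a small cache that skips re-hashing unchanged files. Assumptions, not taken from SMF's source: a cached hash is reused only while the file's size and modification time are unchanged, MD5 is the hash, and the cache lives in memory (to survive between search runs it would have to be persisted, e.g. in a database). `HashCache` is a hypothetical name.

```python
import hashlib
import os

class HashCache:
    """Caches file hashes keyed by path; an entry is reused only if the
    file's size and modification time are unchanged (assumed rule)."""

    def __init__(self):
        self._entries = {}  # path -> (size, mtime_ns, hexdigest)
        self.hits = 0
        self.misses = 0

    def get_hash(self, path):
        st = os.stat(path)
        cached = self._entries.get(path)
        if cached and cached[0] == st.st_size and cached[1] == st.st_mtime_ns:
            self.hits += 1
            return cached[2]
        self.misses += 1
        digest = self._hash_file(path)
        self._entries[path] = (st.st_size, st.st_mtime_ns, digest)
        return digest

    @staticmethod
    def _hash_file(path, chunk_size=1 << 20):
        # Read in chunks so large files don't have to fit in memory.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()
```

The speed-up comes from the `os.stat` call being far cheaper than reading the whole file; the trade-off is that a file modified without any change to its size or timestamp would keep a stale hash.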

Source and Executable are available at

http://www.funk.eu

Best Regards
 

Updated first Post... Enjoy :)...

Posted (edited)

"Sorry I'm late." I thought I had lost my password for the forum for a while and the forum software does not want to send it back to me... (then I found it again).

I tried the beta and the latest version with a freshly restarted Win 7 OS and I still get the memory errors if I turn on thumbs (probably without them too, which I discovered a while ago), even though I only need 50 ('limit') shown at a time.

Once these errors start they can get into a loop of error reporting and I have to quit SMF.

If I don't use thumbs or very small ones and a low limit (50) it *does* sometimes happen that I can work for a while too.

I have reloaded my old 55 MB database though and not restarted everything fresh, so I don't know if that could be a factor.

I assume SMF is 32 bit and does not address memory above 4GB?

I have 16GB installed and when I check the process list during the error dialogs, the memory of that thumbnail process is usually at 1.697.xxx K (1,6GB) and my memory usage then looks a little over 4GB.

ATM, I'm guessing that without a fresh OS restart, I'll always run into problems pretty quickly. The situation does not seem better than before; if anything, it may be worse. (I have >24GB left on drive C)

My pics are just normal size pics from earlier internet days, so even if SMF was putting original pic previews and thumbs of 50 to 250 pics all directly in memory, it shouldn't give errors or reach 1.6 GB, I would think...? Is it possible it is trying to put *way* more into memory than specified in 'limit'?

Some more suggestions that you may have solutions built-in for already...

- For my "important" pics I have to go slowly and "manually" down the list, for which I rearrange the column order (and like the preview on the left). So I was disappointed that column order/size is not saved in "Save report styles", because I have to revisit the results often before work is done. I'm not sure it is even possible to do that though...

- Because I have to revisit often, I also wish 'offset' was saved with the database (or somewhere).

It would also be nice if you could select the pics you already looked at and just "drop them" from the database, because they can get confusing to look at still being there. I think you could also recalculate the 'offset' after every deletion of dupes?

- While paging (with a limit of 50 for example) the report should never split a group of duplicates. For me, it would be better if you then disregarded the exact limit and split at fewer or more items, but showed all dupes of the last pic.

Thanks for developing :)

Edited by Guy_
Posted

Is it possible to have the right-click context menu functionality do a "Search Duplicates" straight away?

I got a bit confused there that it didn't, but I realize there is more than one use for your program of course ;)

Anyway, a preference dropdown or something would be cool.

  • 2 weeks later...
Posted

Sorry for being late this time :), I've been quite busy at home and at work, and the memory limit in particular was a really hard nut to crack... hopefully solved now :).

> For my "important" pics I have to go slowly and "manually" down the list, for which I rearrange the column order (and like the preview on the left).
> So I was disappointed that column order/size is not saved in "Save report styles", because I have to revisit the results often before work is done. I'm not sure it is even possible to do that though...
DONE

> Because I have to revisit often, I also wish 'offset' was saved with the database (or somewhere).
DONE

> It would also be nice if you could select the pics you already looked at and just "drop them" from the database, because they can get confusing to look at still being there.
Already possible > right-click context menu on Listview > "Delete" > "Records from Database only"

> I think you could also recalculate the 'offset' after every deletion of dupes?
? Deletion of records should not change the "Offset"

> Is it possible to have the right-click context menu functionality do a "Search Duplicates" straight away?
DONE

> I assume SMF is 32 bit and does not address memory above 4GB?
Absolutely right, a beginner's mistake on my part :); restricting to 90% of memory simply doesn't work when you've got more than 4GB installed...
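Using the numbers reported above (16GB installed, a little over 4GB in use, the thumbnail process at roughly 1,697,000 K), a quick calculation shows why a system-wide 90% check cannot fire before a 32-bit process runs out of address space. It assumes the process gets the 2 GiB user address space that Windows gives 32-bit processes by default (a large-address-aware 32-bit program on 64-bit Windows gets 4 GiB); whether SMF is built large-address-aware is not stated in this thread.

```python
GIB = 1024 ** 3
KIB = 1024

def system_check_triggers(system_used, system_total, threshold=0.9):
    """The v9 rule: stop creating thumbnails above 90% system-wide usage."""
    return system_used / system_total > threshold

def process_headroom(process_used, address_space=2 * GIB):
    """Bytes a 32-bit process can still allocate before its user address
    space is exhausted (2 GiB default assumed; fragmentation ignored)."""
    return address_space - process_used

installed = 16 * GIB
system_used = 4 * GIB            # "a little over 4GB", rounded down
process_used = 1_697_000 * KIB   # "1.697.xxx K" from the process list

print("90% check fires:", system_check_triggers(system_used, installed))
print("Headroom left (MiB):", process_headroom(process_used) // (1024 ** 2))
print("System usage needed to trip the check (GiB):", 0.9 * installed / GIB)
```

On a 16GB machine the check would only trip at about 14.4 GiB of system-wide usage, while the process itself is already within a few hundred MiB of its own ceiling.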

> it shouldn't give errors or reach 1.6 GB, I would think...? Is it possible it is trying to put *way* more into memory than specified in 'limit'?
I decided on a whole different approach. SMF now imposes a hard limit of 1GB RAM for the report (or less if less RAM is freely available). The report loads images until the limit is reached.

After the limit is reached, the imagelist will be purged on next reload of the report (e.g. change offset, sort order).

For the currently visible results, the report will try to ensure that the thumbnails are shown by purging non-visible thumbnails and freeing up memory that way.
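The approach described above (a budget of 1GB or the free memory, whichever is smaller, plus purging thumbnails that are not visible) might look roughly like this. It is a sketch under stated assumptions, not SMF's code: `ThumbnailStore` and `report_budget` are hypothetical, only byte counts are tracked instead of real bitmaps, and 4 bytes per pixel (32-bit uncompressed bitmaps) is assumed.

```python
from collections import OrderedDict

GIB = 1024 ** 3

def report_budget(free_bytes, hard_limit=GIB):
    """Hard limit of 1 GiB, or less if less memory is freely available."""
    return min(hard_limit, free_bytes)

class ThumbnailStore:
    """Tracks thumbnail memory against a fixed budget (pixel data omitted)."""

    def __init__(self, budget):
        self.budget = budget
        self.used = 0
        self._thumbs = OrderedDict()  # row index -> bytes, oldest first

    def __contains__(self, index):
        return index in self._thumbs

    @staticmethod
    def cost(width, height, bytes_per_pixel=4):
        # Uncompressed bitmap: every pixel is stored in full.
        return width * height * bytes_per_pixel

    def add(self, index, width, height, visible):
        """Store a thumbnail for `index`; if over budget, evict the oldest
        thumbnails whose rows are not in `visible`. Returns False if it
        still does not fit (the caller would show a generic icon)."""
        if index in self._thumbs:
            return True
        size = self.cost(width, height)
        for old in [i for i in self._thumbs if i not in visible]:
            if self.used + size <= self.budget:
                break
            self.used -= self._thumbs.pop(old)
        if self.used + size > self.budget:
            return False
        self._thumbs[index] = size
        self.used += size
        return True

    def purge(self):
        """Drop everything, e.g. on the next report reload (offset or sort change)."""
        self._thumbs.clear()
        self.used = 0
```

Visible rows are never evicted here, so when every stored thumbnail is on screen the new one is simply refused rather than pushing memory past the budget.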
    
Give the v10 Beta a try and let me know if it works for you ;)...

Posted (edited)

Wow, thanks again! It's getting there, but the bug list seems to grow longer too... ;)

- In the Settings as well as on installation, the dropdown for Explorer Contextmenu is now greyed out and inaccessible here. The Context menu options themselves *are* visible and seem to be working though, so I'm not missing anything by that.

- I did have one or two new "Error allocating memory" dialogs with the old database in the beginning, but I think I've discovered a new relation or possible cause. In previous versions, when a pic was deleted, the pic remained in the list, but as it was deleted SMF still showed a generic jpg icon in place of it. It seems to me SMF can sometimes trip up on these non-existing pics maybe. If I'm among untouched pics down in the list where I've done no deletions yet, it seems to work great now. I also like that the related pics disappear, so no more generic jpg icons even need to appear.

- One time I clicked on "Show report" and got this error:  

[Screenshot: error dialog]

Clicked again and then it worked.

 

> I think you could also recalculate the 'offset' after every deletion of dupes?
? Deletion of records should not change the "Offset"

 

I'll explain again what is a big irritation in my workflow...

Let's say my Limit is 50 and I delete only 2 pics. SMF will hide the 2 related ones, which leaves 46 pics, to which SMF then adds 4 new ones at the bottom again to get back to 50 (this is the problem). This means I cannot use the page button, or I will skip these 4 added pics. I have to keep scrolling down the list and deal with the few new pics that are added, ad nauseam. Because if I then delete just one of these new pics at the bottom, SMF will add 2 new ones at the bottom and I have to scroll down again. The more pics I initially delete, the worse the problem is.
That's why I first thought a solution could be to have the offset update. But probably a better solution would be that you don't let SMF add the pics again on the bottom to get to the Limit of 50. If you don't, the paging button would work more as I would expect it to :)

Then there were some confusing moments; I'm still not sure how often they happen or whether this will be a standing issue.

- The first time I saved a custom style, it didn't ask for thumb dimensions and another time it did. Maybe that was related to settings I had at the time, but I'm pretty sure I had thumbs displaying the first time.

- Just been playing with the different default styles and after that my saved style suddenly didn't show thumbs anymore.

I'm also confused how some of these styles seem to start from different pics (but I have not yet looked deep into that).

It would also be great if the column widths were saved  ;)

> While paging (with a limit of 50 for example) the report should never split a group of duplicates. For me, it would be better if you then disregarded the exact limit and split at fewer or more items, but showed all dupes of the last pic.

No pressure, but did you overlook this or do you simply disagree?
I'll explain that one again too. Let's say I have 3 duplicate pics near the bottom of the list, but only two are shown and one is just beyond the Limit and thus not showing... If I then delete one of the visible ones, I get to see that there was still another one, and maybe my decision about which one to delete would have been different... So I would slightly disregard the Limit here and always group all duplicates occurring at the bottom of the list, including (possibly) a few pics beyond it.

Edited by Guy_
Posted (edited)

Hi Guy,

> Wow, thanks again! It's getting there, but the bug list seems to grow longer too... ;)

 

No problem, keep em coming ;)...

> In the Settings as well as on installation, the dropdown for Explorer Contextmenu is now greyed out and inaccessible here. The Context menu options themselves *are* visible and seem to be working though, so I'm not missing anything by that.

 

Not sure why this happened, I can't reproduce it. Did you maybe accidentally select a "Portable" install, so that the context menu entries are left-overs from the former install (you can check that on the "Settings" tab by hovering over the "I" info icon)? I've reworked the "uninstall" routine that runs if an old version is detected, so that the old context menu entries are now removed as well (in case someone switches the install type).

> I did have one or two new "Error allocating memory" dialogs with the old database in the beginning, but I think I've discovered a new relation or possible cause. In previous versions, when a pic was deleted, the pic remained in the list, but as it was deleted SMF still showed a generic jpg icon in place of it. It seems to me SMF can sometimes trip up on these non-existing pics maybe. If I'm among untouched pics down in the list where I've done no deletions yet, it seems to work great now. I also like that the related pics disappear, so no more generic jpg icons even need to appear.

 

That might have been a good hint (hopefully :) ). I tweaked the thumbnail provider process to check for the existence of files first (there's a feature to rotate images based on metadata that might have crashed when the picture did not actually exist). When you delete files with SMF, both the files and the records in the database should be gone (this was implemented a long time ago). The report only displays generic icons if the file is gone but the record still exists (e.g. you do a search and manually delete the file in Windows Explorer afterwards).

> One time I clicked on "Show report" and got this error:

> Clicked again and then it worked.

 

Hopefully gone too, I think it was related to code I changed while implementing the second context menu entry ;)...

> But probably a better solution would be that you don't let SMF add the pics again on the bottom to get to the Limit of 50. If you don't, the paging button would work more as I would expect it to :)

 

I see what you mean, but a) it's quite hard to implement and b) the report should stay as generic as possible, because duplicate deletion is just one use case. What I've done now is add three more buttons, "+1", "+10" & "+100", to dynamically increase the number of displayed items. So if you know the last group is cut off (displaying only 1/3 and not 2/3 and 3/3), you can now easily add those to the displayed records too.

> Then there were some confusing moments; I'm still not sure how often they happen or whether this will be a standing issue.

> - The first time I saved a custom style, it didn't ask for thumb dimensions and another time it did. Maybe that was related to settings I had at the time, but I'm pretty sure I had thumbs displaying the first time.

 

I cannot reproduce that right now; let me know if you find a way to reproduce it.

> I'm also confused how some of these styles seem to start from different pics (but I have not yet looked deep into that).

> It would also be great if the column widths were saved  ;).

 

Sorting direction was not saved properly, should be fixed now. Column widths are now saved too (only for report view styles).

> No pressure, but did you overlook this or do you simply disagree?

> I'll explain that one again too. Let's say I have 3 duplicate pics near the bottom of the list, but only two are shown and one is just beyond the Limit and thus not showing... If I then delete one of the visible ones, I get to see that there was still another one, and maybe my decision about which one to delete would have been different... So I would slightly disregard the Limit here and always group all duplicates occurring at the bottom of the list, including (possibly) a few pics beyond it.

 

I did not overlook it but was not sure how to implement it. Hopefully it works now :). I've added a checkbox "Prevent Duplicate Occurrence Cut-Off". It only works if "Duplicate Occurrence" is set to NOT NULL and the list is sorted either by Filesize or by Hash. In this case items are added to the end of the list (in addition to the limit) until the last one in the current group is displayed (e.g. you limit to 250 and the last one is 1/3, then two more are added and the limit is automatically increased to 252)... hope that solves it :)...
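The cut-off prevention can be sketched as a small function that extends the limit until the group containing the last visible row is complete. It assumes, as the checkbox condition above implies, that rows are sorted by file size or hash so the members of a group sit next to each other; `effective_limit` is a hypothetical name, not SMF's.

```python
def effective_limit(group_keys, offset, limit):
    """Grow `limit` so the last displayed duplicate group is not cut off.

    group_keys holds the sort key (e.g. file size or hash) of every row in
    report order; duplicates are adjacent because the list is sorted by it.
    """
    end = offset + limit
    if limit <= 0 or end >= len(group_keys):
        return limit  # the page already reaches the end of the list
    last_key = group_keys[end - 1]
    while end < len(group_keys) and group_keys[end] == last_key:
        end += 1  # pull in the rest of the last group
    return end - offset
```

With the example from the post (limit 250, last visible row being 1/3 of its group), this returns 252. Sorting by size alone can place two different groups of equal size next to each other, in which case the sketch extends over both.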

Give the latest v10 Beta a try and let me know if it works for you ;)...

Edited by KaFu
Posted (edited)

Thanks! :) Just been testing for a few minutes so far...
A few preliminary findings...:

I had indeed unwantedly made a portable install, so that issue is fixed.

Seems the previous SMF erased my database this time, but I had a similar backup.

- AFAIK, I have not changed any picture folder names.

The thumbs show normally, but the main problem is now that the preview is only giving me fully black pictures... (or also stays white and "seems dead").

Doesn't help to click it off and on, or change thumbnail sizes.

- One time when I was typing a new offset, this one appeared:
[Screenshot: error dialog]

- The Limit "+1" button always seems to add 2 here.  [further down I realized why this may be (a good thing)]

Gonna do a restart and see if that solves anything  :)

Edited by Guy_
Posted

A restart didn't solve the preview problem for the most part.

However, I'm now in a state where 25 to 50% of pics *do* preview, others remain black, many have scrambled info on the edge and a few look kinda like this:

[Screenshot: example of a scrambled preview]

 

Did have the "Error allocating memory" dialog when testing for a limit of 500.

To make sure it has nothing to do with disappeared pics and generic jpg icons, I may rebuild the database and get back to you one of the next days :)

Posted (edited)

Rebuilt the database and I must have included some folders I previously didn't.

I now have 15.600 duplicates for 372.000 files... (previously around 9000)

The database is 325 MB, previously 55 MB, which *is* a puzzling difference (but I could be to blame).

I had the dialog mentioned 2 posts before again while deleting a number in the offset field.

Maybe it trips up if you delete too slowly or in a certain timing...

No "Error allocating memory" yet.

The rest of the problems persist, but I think I found a clue for the preview:

It seems to render many more previews correctly if I select "Show no icons".

Still about half of them remain black though...

On offset 0, I just put the Limit to 50 and preview goes dead (remains gray screen).

The Limit also auto-updates to 51 (sry, this may be because of your new "duplicates at end of Limit" detection? :) )

Edited by Guy_
