
SMF - The fastest duplicate files finder... [Updated 2024-Oct-20]



Posted

2023-Jun-03, Changelog v14 > v15

  • Updated - SQLite Dll to 3.42.0
  • Updated - MediaInfo Dll to 23.04
  • Updated - TrID Definitions to version 2023-May-14
  • Updated - Added dHash visual similarity check
  • Updated - Improved thumbnail creation speed
  • Fixed - Caching of hashes to speed up duplicate search
  • Fixed - Report crash on stats change
  • Fixed - CTRL+ALT+F explorer integration
  • Updated - Lots of other small bug fixes and style changes
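For readers wondering what a dHash ("difference hash") check does: the image is shrunk to a tiny grayscale grid, each pixel is compared with its right-hand neighbour, and the resulting bits form a 64-bit fingerprint. Visually similar images give fingerprints that differ in only a few bits. The following is a minimal Python sketch of that general technique, not SMF's own code (SMF is written in AutoIt). It assumes the image is already decoded to rows of 0-255 gray values and uses nearest-neighbour shrinking instead of the averaging resize that real implementations usually use.

```python
def dhash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Difference hash of a grayscale image given as rows of 0-255 values."""
    height, width = len(pixels), len(pixels[0])
    cols, rows = hash_size + 1, hash_size
    # Shrink to (hash_size + 1) x hash_size by nearest-neighbour sampling
    # (real implementations usually use an averaging/antialiased resize).
    small = [
        [pixels[r * height // rows][c * width // cols] for c in range(cols)]
        for r in range(rows)
    ]
    # One bit per horizontally adjacent pair: 1 if the left pixel is brighter.
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values mean visually similar images."""
    return bin(a ^ b).count("1")


img = [[x * 3 + y for x in range(64)] for y in range(48)]   # horizontal gradient
brighter = [[min(255, v + 10) for v in row] for row in img]  # same picture, brighter
mirrored = [row[::-1] for row in img]                         # flipped left-right

print(hamming(dhash(img), dhash(brighter)))  # 0
print(hamming(dhash(img), dhash(mirrored)))  # 64
```

Because only the relative brightness of neighbours matters, a uniformly brightened copy hashes identically, while a mirrored image flips every bit. A duplicate finder would treat two images as "visually similar" when the Hamming distance is below some small threshold; the exact threshold SMF uses is not stated here.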

Source and Executable are available at https://funk.eu

Best Regards
Updated first Post... Enjoy :)...

  • 10 months later...
Posted

Would this program have the ability to search for multiple files using a set of searches?

i.e.

*filename1234.txt
*filename1235.txt
*filename1247.txt
*filename1248.txt
*filename2240.txt

Find all 5 of these files, then give you an option to copy them to another folder?

Posted (edited)

Yes, SMF can do that. Select the folders to search, do a search and open the report.

Go to the yellow filter box above the "FileName Long" column and enter this:

filename1234.txt|filename1235.txt|filename1247.txt|filename1248.txt|filename2240.txt

The "|" sign is a delimiter for an OR condition in the LIKE statement. You can also select "LIKE Expressions" in the drop-down above the yellow field and enter the filenames one per line.
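Since SMF's reports are backed by SQLite, a pipe-separated filter like the one above can be thought of as several LIKE conditions joined with OR. The sketch below illustrates that idea in Python with the standard `sqlite3` module; it is not SMF's actual query code, and the table `files` and column `filename_long` are hypothetical stand-ins for the report's "FileName Long" column. Whether SMF adds `%` wildcards around each term automatically is not stated in the thread, so the example writes them explicitly.

```python
import sqlite3


def like_any(column: str, expression: str) -> tuple[str, list[str]]:
    """Turn 'a|b|c' into '(column LIKE ? OR column LIKE ? OR ...)' plus params.

    `column` is interpolated into the SQL, so it must come from trusted code,
    never from user input; the patterns themselves are bound as parameters.
    """
    patterns = [p.strip() for p in expression.split("|") if p.strip()]
    if not patterns:
        return "1", []  # empty filter: match everything
    clause = " OR ".join(f"{column} LIKE ?" for _ in patterns)
    return f"({clause})", patterns


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (filename_long TEXT)")  # hypothetical report table
con.executemany(
    "INSERT INTO files VALUES (?)",
    [(r"C:\docs\filename1234.txt",), (r"C:\docs\filename9999.txt",),
     (r"D:\backup\filename2240.txt",)],
)
where, params = like_any("filename_long", "%filename1234.txt|%filename2240.txt")
rows = con.execute(
    f"SELECT filename_long FROM files WHERE {where} ORDER BY filename_long", params
).fetchall()
for (name,) in rows:
    print(name)
# C:\docs\filename1234.txt
# D:\backup\filename2240.txt
```

Two SQLite LIKE details worth keeping in mind: `_` matches any single character (so a term containing an underscore can over-match), and matching is case-insensitive for ASCII letters by default.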

Select all required files in the report and either Ctrl+C copy them or use the right mouse click menu for more complex copy operations (e.g. "Copy to...").

Edited by KaFu
Posted

Hi @KaFu, I've a real-world use case for detecting and managing duplicates (same content but different filenames) in a large folder tree.

My case is as follows: I gather payware technical documentation for a very large number of appliances from many makers. Each model has its own set of docs, but very often the maker issues the same docs for a number of similar machines in the same series. Many makers use several brands, often offering almost identical machines under different names and brand names. I organize the docs by brand, then type, then model number. Hence I store many dups, either in the same folder or in different folders. I also store datasheets of electronic components, but this is a much smaller volume and large duplicates are rare. All this accumulates over the years and I'd like to minimize storage and backup sizes. Currently there are 85833 files in 14667 folders, taking 13 GB of disk storage.

The way I picture it: given a number of docs with the same content, SMF would let me decide which file remains the source, and would convert the dups to links (hard or soft) while retaining the path and name of each dup.

Of course I could write a dedicated program to do precisely that, but if SMF could do that by itself, I wouldn't object 😁

Do you think it's both possible and a valuable feature to other users?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Posted

Hi @jchd, SMF can do all the initial steps, but it cannot replace the detected duplicates with (soft or hard) links. The only workaround would be to run a search and then export the results, including the hash values, to CSV as input for a separate script, which replaces all but the first match in each group of identical files with links.
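The workaround described above could look roughly like this minimal Python sketch. It assumes the exported CSV has one row per file with a `Hash` column and a `FileName Long` column holding the full path; those header names are assumptions, so adjust them to whatever SMF actually exports. The first file of each hash group is kept as the source, and each remaining duplicate is replaced by a hard link under its original path and name. As a safety net it re-compares the bytes before linking instead of trusting the hash alone.

```python
import csv
import filecmp
import os
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: header names of the exported CSV; change them to match SMF's export.
HASH_COL = "Hash"
PATH_COL = "FileName Long"


def duplicate_groups(csv_path: str) -> dict[str, list[Path]]:
    """Group file paths by hash, keeping only hashes that occur more than once."""
    groups: dict[str, list[Path]] = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            groups[row[HASH_COL]].append(Path(row[PATH_COL]))
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def link_duplicates(paths: list[Path], dry_run: bool = True) -> list[Path]:
    """Replace every file after the first with a hard link to the first.

    Returns the paths that were (or, in dry-run mode, would be) replaced.
    """
    source, *dups = paths
    replaced = []
    for dup in dups:
        if os.path.samefile(source, dup):
            continue  # already the same file (e.g. linked on a previous run)
        if not filecmp.cmp(source, dup, shallow=False):
            continue  # hash matched but bytes differ: never link those
        replaced.append(dup)
        if dry_run:
            continue
        tmp = dup.with_name(dup.name + ".linktmp")
        os.link(source, tmp)  # hard links only work within one volume
        os.replace(tmp, dup)  # swap the link in under the duplicate's name
    return replaced
```

Running it with `dry_run=True` first gives a list to review before anything on disk changes. Choosing which file stays the source amounts to reordering each group's path list before calling `link_duplicates`. Keep in mind that hard links share a single copy of the data, so editing one "copy" changes all of them, and that creating symbolic links on Windows usually requires administrator rights or Developer Mode.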

Posted

Understood, that's what I was heading to.


  • KaFu changed the title to SMF - The fastest duplicate files finder... [Updated 2024-Oct-13]
Posted (edited)

2024-Oct-9, Changelog v15 > v16

  • Updated - SQLite Dll to 3.46.1
  • Updated - MediaInfo Dll to 24.06
  • Updated - TrID Definitions to version 2024-Oct-9
  • Updated - dHash > added support for video files, if ffmpeg.exe exists in directory "SMF_Files\Bins"
  • Updated - dHash > added support for WebP image files, based on WebP UDF by UEZ
  • Updated - Lots of other small bug fixes and style changes

Source and Executable are available at https://funk.eu

Best Regards
Updated first Post... Enjoy :)...

Edited by KaFu
  • KaFu changed the title to SMF - The fastest duplicate files finder... [Updated 2024-Oct-20]
Posted

2024-Oct-20, Changelog v16 > v17

  • Fixed - "Datatype mismatch" crash in "Hash Cache" function of duplicates search

Sorry for any inconvenience caused, typical case of over-optimization.
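As background on what a hash cache does in general: a file's hash is stored together with its size and modification time, and it is recomputed only when either of those changes. The Python sketch below shows that idea with an SQLite table. SMF's real schema and hash function are not described in this thread, so the table layout and the choice of SHA-256 are assumptions. One case where SQLite reports "datatype mismatch" is writing a non-integer value into an `INTEGER PRIMARY KEY` column. The sketch sidesteps that by keying on a `TEXT` path and binding sizes and timestamps as integers. This is an illustration only, not a description of the actual SMF bug.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical hash-cache table keyed by file path."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS hash_cache ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " mtime_ns INTEGER NOT NULL,"
        " sha256 TEXT NOT NULL)"
    )
    return con


def file_hash(con: sqlite3.Connection, path: str) -> str:
    """Return the file's SHA-256, reusing the cached value if size/mtime match."""
    st = os.stat(path)
    row = con.execute(
        "SELECT size, mtime_ns, sha256 FROM hash_cache WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return row[2]  # cache hit: file looks unchanged, skip reading it
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    digest = h.hexdigest()
    con.execute(
        "INSERT OR REPLACE INTO hash_cache VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, digest),
    )
    # With a file-backed database, the caller should con.commit() periodically.
    return digest
```

The trade-off of this design is that a file whose content changes without any change in size or modification time would keep its stale hash. That is usually accepted in exchange for not re-reading unchanged files on every search.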

Source and Executable are available at https://funk.eu

Best Regards
Updated first Post... Enjoy :)...
