gcue Posted June 8, 2016

Here is my script. It is part of a larger script, but this component is taking forever to process.

    #include <File.au3>
    #include <Crypt.au3>
    #include <Array.au3>

    $msg_normal = 0

    $source_parent_dir = "C:\"

    Local $full_array[1]

    ;get full file paths array for all files and in all dirs and sub dirs - this is taking very long
    $files_array = _FileListToArrayRec($source_parent_dir, "*", $FLTAR_FILES, 1, $FLTAR_SORT, $FLTAR_FULLPATH)

    Debug("pause 1")

    ;i need the array to be in the following format for later script processing - this is taking very long
    For $x = 1 To UBound($files_array) - 1
        $file_name_extension = StringRegExpReplace($files_array[$x], "^.*\\", "")
        $file_path_dir = StringReplace($files_array[$x], "\" & $file_name_extension, "")

        ReDim $full_array[UBound($full_array) + 1][4]

        $full_array[UBound($full_array) - 1][0] = $file_path_dir
        $full_array[UBound($full_array) - 1][1] = $file_name_extension
    Next

    Debug("pause 2")

    ;here's how i find duplicates - this is taking very long
    $final_array = _GetFileDupes($full_array)

    If UBound($final_array) - 1 = 0 Then
        MsgBox($msg_normal, @ScriptName, "There are NO duplicate files found.")
        Exit
    EndIf

    Debug($final_array)

    Func _GetFileDupes($full_array)
        _Crypt_Startup()

        For $x = 1 To UBound($full_array) - 1
            $path = $full_array[$x][0]
            $file_name = $full_array[$x][1]

            $sha1 = _Crypt_HashFile($path & "\" & $file_name, $CALG_SHA1)

            $full_array[$x][2] = $sha1
        Next

    ;~  Debug($full_array)

        _Crypt_Shutdown()

        Local $final_array[1]

        For $x = 1 To UBound($full_array) - 1
            $search = _ArrayFindAll($full_array, $full_array[$x][2], 1, 0, 0, 0, 2)

            If UBound($search) = 1 Then ContinueLoop

            For $y = 0 To UBound($search) - 1
                $index = $search[$y]

                If $full_array[$index][3] <> "DUPLICATE" Then
                    $full_array[$index][3] = "DUPLICATE"

                    ReDim $final_array[UBound($final_array) + 1][3]

                    $final_array[UBound($final_array) - 1][0] = $full_array[$index][0]
                    $final_array[UBound($final_array) - 1][1] = $full_array[$index][1]
                    $final_array[UBound($final_array) - 1][2] = $full_array[$index][2]
                EndIf
            Next
        Next

    ;~  Debug($final_array)

        Return $final_array
    EndFunc   ;==>_GetFileDupes

    Func Debug($variable1 = "", $variable2 = "", $variable3 = "", $variable4 = "", $variable5 = "")
    ;~  #include <array.au3>
    ;~  $msg_normal = 0

        If IsArray($variable1) Or IsArray($variable2) Then
            If IsArray($variable1) Then _ArrayDisplay($variable1, $variable2)
            If IsArray($variable2) Then _ArrayDisplay($variable2, $variable1)
        Else
            $variable = ""

            If $variable1 <> "" Then $variable &= $variable1 & @CRLF
            If $variable2 <> "" Then $variable &= $variable2 & @CRLF
            If $variable3 <> "" Then $variable &= $variable3 & @CRLF
            If $variable4 <> "" Then $variable &= $variable4 & @CRLF
            If $variable5 <> "" Then $variable &= $variable5 & @CRLF

            $variable = StringStripWS($variable, 2)

            ClipPut($variable)

            MsgBox($msg_normal, "Debug", $variable)
        EndIf
    EndFunc   ;==>Debug

Any help is greatly appreciated!

spudw2k Posted June 8, 2016

Perhaps you could use the FindFile functions (FileFindFirstFile / FileFindNextFile) to build the array yourself and populate it in the desired format from the get-go, rather than populating the array first and updating each entry afterwards. That could help cut some time down.
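
As a rough illustration of this suggestion, here is a minimal sketch that walks a directory tree with FileFindFirstFile / FileFindNextFile and writes the [directory, file name] pairs straight into the 4-column layout the original script uses. The helper name `_ListFilesInto`, the starting size of 1024 rows, and the use of `@ScriptDir` as the root are assumptions made for the example, not anything posted in the thread. Rows here start at index 0, unlike the 1-based loops in the original script.

```autoit
; Hypothetical helper (not from the thread): walks $sDir recursively and writes
; one [directory, file name] row per file into $aRows, doubling it when full.
Func _ListFilesInto(ByRef $aRows, ByRef $iCount, $sDir)
    $sDir = StringRegExpReplace($sDir, "\\+$", "") ; drop any trailing backslash
    Local $hSearch = FileFindFirstFile($sDir & "\*")
    If $hSearch = -1 Then Return ; empty or unreadable directory

    Local $sName, $bIsDir
    While 1
        $sName = FileFindNextFile($hSearch)
        If @error Then ExitLoop ; no more entries
        $bIsDir = (@extended = 1) ; @extended is 1 when the entry is a directory

        If $bIsDir Then
            _ListFilesInto($aRows, $iCount, $sDir & "\" & $sName)
        Else
            ; Grow rarely (by doubling) instead of once per file
            If $iCount = UBound($aRows) Then ReDim $aRows[UBound($aRows) * 2][4]
            $aRows[$iCount][0] = $sDir
            $aRows[$iCount][1] = $sName
            $iCount += 1
        EndIf
    WEnd
    FileClose($hSearch)
EndFunc   ;==>_ListFilesInto

; Usage: rows 0 to $iCount - 1 are filled; columns 2 and 3 stay free for later use
Local $aRows[1024][4], $iCount = 0
_ListFilesInto($aRows, $iCount, @ScriptDir)
ConsoleWrite($iCount & " files found" & @CRLF)
```

One caveat with a plain recursion like this: it will also descend into directory junctions, so on a whole system drive it may need an extra check to avoid visiting the same tree twice.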

iamtheky Posted June 8, 2016

MD5 might be faster. And why are you splitting them into name and extension? Wouldn't simply hashing the first array tell you if there were dupes?

gcue (Author) Posted June 8, 2016

I need the name and extension later, so that's why I am processing them. I'll try MD5, but I still can't get away from the first part being slow.

gcue (Author) Posted June 8, 2016

24 minutes ago, spudw2k said: "Perhaps you could use the FindFile functions to build your array yourself and populate it with the desired format from the get go, versus populating the array and updating each array entity afterwards. That could help cut some time down."

Let me try that =)

Danyfirex Posted June 8, 2016 (edited)

Hello. My suggestions are:

- Build your own recursive file list routine.
- Build your formatted array inside the recursion.
- To speed up your compare routine, first check whether the file sizes are equal. Only if they are do you go on to the hash check.

Saludos

Edited June 8, 2016 by Danyfirex
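
The size check in the last suggestion could be sketched roughly as follows. This is only an illustration under assumptions: the root folder `@ScriptDir`, the variable names, and the use of a Scripting.Dictionary to count sizes are choices made for the example. The idea is that a file whose size is unique cannot have a duplicate, so it never needs to be hashed.

```autoit
#include <File.au3>
#include <Crypt.au3>

; Assumed root folder for the example; the thread uses "C:\"
Local $aFiles = _FileListToArrayRec(@ScriptDir, "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
If @error Then Exit

; Pass 1: count how many files share each size
Local $oSizeCount = ObjCreate("Scripting.Dictionary")
Local $aSizes[$aFiles[0] + 1]
For $i = 1 To $aFiles[0]
    $aSizes[$i] = String(FileGetSize($aFiles[$i]))
    If $oSizeCount.Exists($aSizes[$i]) Then
        $oSizeCount.Item($aSizes[$i]) = $oSizeCount.Item($aSizes[$i]) + 1
    Else
        $oSizeCount.Add($aSizes[$i], 1)
    EndIf
Next

; Pass 2: hash only the files whose size occurs more than once
Local $sHash
_Crypt_Startup()
For $i = 1 To $aFiles[0]
    If $oSizeCount.Item($aSizes[$i]) < 2 Then ContinueLoop ; unique size, cannot be a duplicate
    $sHash = String(_Crypt_HashFile($aFiles[$i], $CALG_MD5))
    ConsoleWrite($sHash & "  " & $aFiles[$i] & @CRLF)
Next
_Crypt_Shutdown()
```

The hashes printed in pass 2 would still need to be grouped to find the actual duplicates; the dictionary approach further down the thread is one way to do that.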

iamtheky Posted June 8, 2016 (edited)

And you can scrap the sort, since you are testing the hashes. That will no doubt speed it up (just over 3x faster in my testing just now).

Edited June 8, 2016 by iamtheky
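
To check this on your own machine, a quick timing comparison along these lines may help. It is a sketch only: the root folder is an assumption, and because Windows caches directory data, whichever call runs second tends to look faster regardless of sorting, so the unsorted call is deliberately run first here and the numbers should be read with that in mind.

```autoit
#include <File.au3>

Local $sRoot = @ScriptDir ; assumed test folder, pick one with many files
Local $hTimer, $aList

; Unsorted listing (run first, so disk caching does not flatter it)
$hTimer = TimerInit()
$aList = _FileListToArrayRec($sRoot, "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
ConsoleWrite("unsorted: " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)

; Sorted listing, as in the original script
$hTimer = TimerInit()
$aList = _FileListToArrayRec($sRoot, "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
ConsoleWrite("sorted:   " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)
```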

Beege Posted June 8, 2016

Doing a ReDim on the array for each item within that first loop has a tax too and should not be needed.

    ;get full file paths array for all files and in all dirs and sub dirs - this is taking very long
    $files_array = _FileListToArrayRec($source_parent_dir, "*", $FLTAR_FILES, 1, $FLTAR_SORT, $FLTAR_FULLPATH)

    Debug("pause 1")

    Local $full_array[UBound($files_array)][4]

    ;i need the array to be in the following format for later script processing - this is taking very long
    For $x = 1 To UBound($files_array) - 1
        $file_name_extension = StringRegExpReplace($files_array[$x], "^.*\\", "")
        $file_path_dir = StringReplace($files_array[$x], "\" & $file_name_extension, "")

        ;ReDim $full_array[UBound($full_array) + 1][4]

        $full_array[$x][0] = $file_path_dir
        $full_array[$x][1] = $file_name_extension
    Next

gcue (Author) Posted June 8, 2016

5 minutes ago, Beege said: "Doing a Redim on the array for each item within that first loop has a tax too and should not be needed"

Then how do I add another record to the array without doing that?

Beege Posted June 8, 2016

Just like I posted should work. For that portion of code you already know how large the array is going to need to be, so it's better to create the whole array just once, then walk through it and fill in the elements.
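
For cases where the final size is not known in advance (the duplicates list in `_GetFileDupes`, for example), one common workaround is to grow the array by doubling and trim it once at the end. The sketch below is an illustration with made-up data, not code from the thread; the starting size of 16 and the placeholder values are assumptions.

```autoit
; Start small, double when full, trim once at the end
Local $aRows[16][4], $iCount = 0

For $i = 1 To 1000
    ; ReDim only when the array is full, so it happens a handful of times, not 1000
    If $iCount = UBound($aRows) Then ReDim $aRows[UBound($aRows) * 2][4]

    $aRows[$iCount][0] = "dir" & $i              ; placeholder data
    $aRows[$iCount][1] = "file" & $i & ".txt"    ; placeholder data
    $iCount += 1
Next

ReDim $aRows[$iCount][4] ; single final trim to the number of rows actually used
ConsoleWrite(UBound($aRows) & @CRLF) ; prints 1000
```

With doubling, the array is resized only seven times on the way from 16 to 1024 rows, instead of once per record.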

gcue (Author) Posted June 8, 2016

Sorry, I overlooked that. Great suggestion! =) Thank you!

gcue (Author) Posted June 8, 2016

Wow, just that alone was incredibly faster. Beege, thank you so much for catching and suggesting it. I'm playing with some of the other suggestions. Thank you, everyone! =)

mrider Posted June 8, 2016 (edited)

You'll need to change how this displays, but this is noticeably faster than using arrays. Note also that I had to download the SQLite DLLs from https://www.autoitscript.com/autoit3/pkgmgr/sqlite/ and put the files in the script directory in order to get this to work, since my computer is behind a proxy.

My code:

    #include <File.au3>
    #include <Crypt.au3>
    #include <Array.au3>
    #include <SQLite.au3>
    #include <SQLite.dll.au3>

    _SQLite_Startup()
    _SQLite_Open()
    _SQLite_Exec(-1, "CREATE TABLE HashSums (Count, Sum, Path);")
    _Crypt_Startup()

    Local $source_parent_dir = "C:\"
    Local $row, $count, $path
    Local $files = _FileListToArrayRec($source_parent_dir, "*", $FLTAR_FILES, 1, $FLTAR_SORT, $FLTAR_FULLPATH)

    For $i = 1 To $files[0]
        Local $sum = _Crypt_HashFile($files[$i], $CALG_SHA1)
        _SQLite_QuerySingleRow(-1, "SELECT Count, Path FROM HashSums WHERE Sum = '" & $sum & "';", $row)
        If @error Then
            $path = _SQLite_FastEscape(@LF & $files[$i])
            _SQLite_Exec(-1, "INSERT INTO HashSums (Count, Sum, Path) VALUES (1, '" & $sum & "', " & $path & ");")
        Else
            $count = Int($row[0])+1
            $path = _SQLite_FastEscape($row[1] & @LF & $files[$i])
            _SQLite_Exec(-1, "UPDATE HashSums SET Count = '" & $count & "', Path = " & $path & " WHERE SUM = '" & $sum & "';")
            If @error Then
                ConsoleWrite("@error = " & @error & @CRLF)
            EndIf
        EndIf
    Next

    Local $query
    _SQLite_Query(-1, "SELECT Sum, Path FROM HashSums WHERE Count > 1;", $query)
    While _SQLite_FetchData($query, $row) = $SQLITE_OK
        ; Change this to display however you wish...
        ConsoleWrite($row[0])
        ConsoleWrite($row[1] & @LF & @LF & @LF)
    WEnd

    _Crypt_Shutdown()
    _SQLite_Shutdown()

Edited June 8, 2016 by mrider: Oops, left experimental path in place instead of the default "C:"

AspirinJunkie Posted June 9, 2016 (edited)

It is bound to take very long if you really want to hash every file on "C:\"! Better to reduce the list of files to hash by using the filtering in the _FileListToArrayRec function.

Then you need a fast comparison method to find existing matches. Two nested For loops are very inefficient. A dictionary is a possible solution for this. The SQLite solution from mrider also goes in that direction.

So here's my solution for this:

    #include <File.au3>
    #include <Crypt.au3>
    #include <Array.au3>

    Global $s_Path_Parent = "C:\programming\AutoIt"
    Global $o_Hashes = ObjCreate("Scripting.Dictionary")
    Global $o_DoubleHashes = ObjCreate("Scripting.Dictionary")
    Global $s_Hash, $a_Temp, $s_File

    Global $a_Files = _FileListToArrayRec($s_Path_Parent, "*.au3", 1, 1, 0, 2)

    ; Hash all files and create List of double files:
    _Crypt_Startup()
    For $i = 1 To $a_Files[0]
        $s_Hash = String(_Crypt_HashFile($a_Files[$i], $CALG_MD5))

        If $o_Hashes.Exists($s_Hash) Then
            $a_Temp = $o_Hashes($s_Hash)
            If UBound($a_Temp) = 1 Then $o_DoubleHashes($s_Hash) = 0
            _ArrayAdd($a_Temp, $a_Files[$i])
        Else
            Local $a_Temp[] = [$a_Files[$i]]
        EndIf

        $o_Hashes($s_Hash) = $a_Temp
    Next
    _Crypt_Shutdown()

    ; output the doubled files:
    For $s_Hash in $o_DoubleHashes.Keys
        For $s_File in $o_Hashes($s_Hash)
            ConsoleWrite($s_File & @CRLF)
        Next
        ConsoleWrite(@CRLF)
    Next

Edited June 9, 2016 by AspirinJunkie

RTFC Posted June 9, 2016

Check out trancexx's file mapping examples and KaFu's solution of using hashes on parts of the files.
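
The general idea of hashing only part of a file can be sketched as below. This is not KaFu's code, which is not reproduced in the thread; the helper name `_HashFileHead` and the 64 KB default are assumptions for illustration. Two files with different partial hashes are certainly different, while files with matching partial hashes would still need a full hash (or byte comparison) to confirm they are duplicates.

```autoit
#include <Crypt.au3>
#include <FileConstants.au3>

; Hypothetical helper (not from the thread): MD5 of the first $iBytes of a file.
; Returns "" and sets @error if the file cannot be opened or has no content.
Func _HashFileHead($sPath, $iBytes = 65536)
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, "")

    Local $dHead = FileRead($hFile, $iBytes) ; reads at most $iBytes bytes in binary mode
    FileClose($hFile)
    If BinaryLen($dHead) = 0 Then Return SetError(2, 0, "") ; empty or unreadable file

    Return String(_Crypt_HashData($dHead, $CALG_MD5))
EndFunc   ;==>_HashFileHead

; Usage: hash the head of this script itself
_Crypt_Startup()
ConsoleWrite(_HashFileHead(@ScriptFullPath) & @CRLF)
_Crypt_Shutdown()
```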

gcue (Author) Posted June 10, 2016

I tried the SQL way. It wasn't much faster. Still playing with some of the suggestions.

gcue (Author) Posted June 17, 2016

After finding out how much time the ReDim calls took, I went through and removed any extra arrays I was using and worked through the same array. Much faster. I also took your advice, AspirinJunkie, and am only going through certain file types instead of *.

Thank you all!