frank10 Posted April 4, 2022

I have a stored txt file with this line structure:

20211231000000|airCOM|USD|PB air 3 AmSugg||PHRASE|125907|549|0.016|0.59|420.85|||0|0|0|0|49|
20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|0.59|11.79|||0|0|0|0|0|

There are about 80,000 lines, so it's huge. I need to add more data in the same format, but some incoming lines are near-duplicates of existing ones and must update them instead of being appended. For example:

20211231000000|gasCOM|USD|gas 1||BROAD|2142|20|0.033|0.59|11.79|||0|0|0|0|0|

must substitute the old line. To detect whether a line must be updated, I need to check the text up to the 6th "|", i.e.:

"20211231000000|gasCOM|USD|gas 1||BROAD|"

I made this, loading the data all into one column:

Local $aDB, $aNew
_FileReadToArray($reportsADSDir & "\" & "originalData.txt", $aDB)
_FileReadToArray($reportsADSDir & "\" & "newData.txt", $aNew)
Local $tLoop = TimerInit()
For $i = 0 To UBound($aNew) - 1
    If TimerDiff($tLoop) / 1000 > 15 Then ; in 15 seconds ONLY 382 new lines done!!
        ConsoleWrite($i & @CRLF)
        ExitLoop
    EndIf
    Local $sLine = $aNew[$i]
    Local $sNew = StringMid($sLine, 1, StringInStr($sLine, "|", 0, 6))
    Local $res = _ArrayBinarySearch($aDB, $sNew, 0, 0, 0)
    If $res > 0 Then
        $aDB[$res] = $sLine
    Else
        _ArrayAdd($aDB, $sLine, 0, "§")
    EndIf
Next
ConsoleWrite("tdiff:" & TimerDiff($tLoop) / 1000 & @CRLF)
_FileWriteFromArray($reportsADSDir & "\" & "_NEW.txt", $aDB)

It's deadly SLOW!! It processes only about 380 new lines of data (out of 25k) in 15 seconds. Unusable... Any ideas on how to make this search a lot faster?
Nine Posted April 4, 2022 (edited)

How about using SQLite with a unique index consisting of the first six fields? Once loaded, you could reuse the DB for the next update by applying the new data to it and generating a new txt file. I am sure it would be a lot faster... Even if you need to load the data each time, it would still be faster.

Edited April 4, 2022 by Nine

“They did not know it was impossible, so they did it” ― Mark Twain

Spoiler: Block all input without UAC, Save/Retrieve Images to/from Text, Monitor Management (VCP commands), Tool to search in text (au3) files, Date Range Picker, Virtual Desktop Manager, Sudoku Game 2020, Overlapped Named Pipe IPC, HotString 2.0 - Hot keys with string, x64 Bitwise Operations, Multi-keyboards HotKeySet, Recursive Array Display, Fast and simple WCD IPC, Multiple Folders Selector, Printer Manager, GIF Animation (cached), Screen Scraping, Multi-Threading Made Easy
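Nine's SQLite idea — key the table on the first six fields so that "insert or replace" does the update-or-add in one statement — can be sketched in a few lines. The sketch below is in Python rather than AutoIt, purely to illustrate the concept; the function and file names are mine, not from the thread.

```python
import sqlite3

def split_line(line):
    """Split a record into (key, rest): key is the text up to and
    including the 6th '|', the part frank10 compares on."""
    parts = line.split("|")
    key = "|".join(parts[:6]) + "|"
    return key, line[len(key):]

def merge(db_lines, new_lines):
    """Merge new_lines into db_lines keyed on the first six fields.
    INSERT OR REPLACE overwrites an existing key, appends otherwise."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (key TEXT PRIMARY KEY, rest TEXT)")
    con.executemany("INSERT OR REPLACE INTO data VALUES (?, ?)",
                    (split_line(l) for l in db_lines))
    con.executemany("INSERT OR REPLACE INTO data VALUES (?, ?)",
                    (split_line(l) for l in new_lines))
    return [k + r for k, r in con.execute("SELECT key, rest FROM data ORDER BY key")]
```

Because the primary key makes each lookup an index probe, the whole 80k + 25k merge is two bulk inserts and one select, which is why the SQLite route scales so much better than a per-line array scan.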
frank10 Posted April 4, 2022 (edited)

OK, if there are no tricks to speed up the array approach, I'll go with SQLite... I tried a similar process with arrays in JS in a browser: it was blazingly FAST there. It's AutoIt that is slow with long arrays.

Edited April 4, 2022 by frank10
Nine Posted April 4, 2022 (edited)

From what I am seeing, your code should not even work. You would need to use a partial search with _ArraySearch:

#include <Array.au3>

Local $aDB = [ _
    "20211231000000|airCOM|USD|PB air 3 AmSugg||PHRASE|125907|549|0.016|0.59|420.85|||0|0|0|0|49|", _
    "20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|0.59|11.79|||0|0|0|0|0|" ]
Local $str = "20211231000000|gasCOM|USD|gas 1||BROAD|2142|20|0.033|0.59|11.79|||0|0|0|0|0|"
Local $sNew = StringMid($str, 1, StringInStr($str, "|", 1, 6))
ConsoleWrite($sNew & @CRLF)
Local $res = _ArraySearch($aDB, $sNew, 0, 0, 1, 1)
ConsoleWrite($res & @CRLF) ; <<<<<<<<<<<<< working
$res = _ArrayBinarySearch($aDB, $sNew)
ConsoleWrite($res & "/" & @error & @CRLF) ; <<<<<<<<<<<<< NOT working

Making both StringInStr and _ArraySearch case sensitive may speed up your script, but I am quite sure it won't be enough. If the original data is perfectly sorted on the first 6 fields and the new data is also perfectly sorted on the first 6 fields, you can use the previous successful find as the starting point for the next search. It will get faster and faster along the way...

Edited April 4, 2022 by Nine
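Nine's "resume from the previous hit" trick turns the whole merge into a single forward pass over two sorted lists. A Python sketch of the concept (names are mine; it assumes both lists are sorted by the key prefix, as Nine requires):

```python
def prefix(line, n=6):
    """Return the text up to and including the n-th '|'."""
    pos = -1
    for _ in range(n):
        pos = line.index("|", pos + 1)
    return line[:pos + 1]

def merge_sorted(db, new):
    """Single pass: resume scanning db where the previous match left off,
    so the merge is O(len(db) + len(new)) instead of O(len(db) * len(new))."""
    start, added = 0, []
    for line in new:
        p = prefix(line)
        i = start
        while i < len(db) and prefix(db[i]) < p:
            i += 1                # skip db rows whose key sorts before p
        if i < len(db) and prefix(db[i]) == p:
            db[i] = line          # update the existing row in place
            start = i + 1         # next search starts after this hit
        else:
            added.append(line)    # no match: collect, append once at the end
    db.extend(added)
    db.sort()
    return db
```

The key property is exactly what Nine states: because both inputs are sorted, each successful find only ever moves `start` forward, so later searches read just a handful of rows.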
mikell Posted April 4, 2022

4 hours ago, frank10 said: "To detect if a line must be updated I need to check the text up to the 6th '|'"

This could be done easily using a regex (pure text treatment, no array used). Please try this for the concept (raw code, without error checking!):

; create an 80k-line txt file
#cs
$s = "20211231000000|airCOM|USD|PB air 3 AmSugg||PHRASE|125907|549|0.016|0.59|420.85|||0|0|0|0|49|"
$old_line = "20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|0.59|11.79|||0|0|0|0|0|"
Local $txt
For $i = 0 To 80000
    $txt &= $s & @CRLF
Next
$txt &= $old_line & @CRLF
FileWrite(@DesktopDir & "\test.txt", $txt)
#ce

; get from the new line the part to be detected - up to the 6th "|"
$new_line = "20211231000000|gasCOM|USD|gas 1||BROAD|88888|20|0.033|0.59|11.79|||0|0|0|0|0|"
$part = StringRegExpReplace($new_line, '^(.*?\|){6}\K.*', "")
MsgBox(0, "", $part)

; get the corresponding line from the txt
$txt = FileRead(@DesktopDir & "\test.txt")
$checked_line = StringRegExp($txt, '\Q' & $part & '\E.*', 1)[0]
MsgBox(0, "", $checked_line)

; replace it and write the new file
$txt2 = StringReplace($txt, $checked_line, $new_line)
FileWrite(@DesktopDir & "\test_new.txt", $txt2)
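The "everything up to the 6th |" capture that mikell builds with `^(.*?\|){6}\K.*` works the same way in any PCRE-style engine. An equivalent check in Python (matching six field+pipe groups from the start of the line instead of deleting the tail; the sample line is from the thread):

```python
import re

# Six repetitions of "any non-pipe chars followed by a pipe", anchored
# at the start: the match itself is the key part of the record.
KEY_RE = re.compile(r'(?:[^|]*\|){6}')

def key_part(line):
    m = KEY_RE.match(line)
    return m.group(0) if m else None
```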
Nine Posted April 4, 2022 (edited)

@mikell I believe from his OP he needs to perform that same search and replace/add 25k times on an 80k-line txt file! How much time does it take for a single search and replace/add?

@frank10 Could you provide both files, so we can see if there is an easy solution without SQLite?

Edited April 4, 2022 by Nine
frank10 Posted April 4, 2022

Now I must go; tomorrow I will post the files, but they are huge, about 10 MB in total.

@mikell Yes, I thought of testing strings vs arrays; I will try!

For now, I tested your suggestion about starting each search from the last found index, and it is a very GOOD trick to speed up _ArraySearch! BUT the overall timing is still unusable, sadly. The main problem is adding new rows to the array: _ArrayAdd is much too slow! These are the logs with and without $lastFound as the search start (notice how it slows down when adding elements with _ArrayAdd):

1) Only the first 1000 new lines analyzed WITHOUT lastFound:

line:0 Ubound:77756 added:0 lastFound:0 sec:0.117197
line:1000 Ubound:77756 added:0 lastFound:58602 sec:112.7096294

2) This instead, very good with lastFound:

$res = _ArraySearch($aDB, $sNew, $lastFound, 0, 1, 1)
If $res > 0 Then $lastFound = $res
........

line:0 Ubound:77756 added:0 lastFound:0 sec:0.1088671
line:1000 Ubound:77756 added:0 lastFound:58602 sec:0.1388141
line:2000 Ubound:77756 added:0 lastFound:59602 sec:0.1686444
line:3000 Ubound:77756 added:0 lastFound:60602 sec:0.2022854
line:4000 Ubound:77756 added:0 lastFound:61602 sec:0.2330433
line:5000 Ubound:77756 added:0 lastFound:62602 sec:0.2650052
line:6000 Ubound:77756 added:0 lastFound:63602 sec:0.2943784
line:7000 Ubound:77756 added:0 lastFound:64602 sec:0.3224296
line:8000 Ubound:77756 added:0 lastFound:65602 sec:0.3501796
line:9000 Ubound:77756 added:0 lastFound:66602 sec:0.3837926
line:10000 Ubound:77756 added:0 lastFound:67602 sec:0.4179998
line:11000 Ubound:77756 added:0 lastFound:68602 sec:0.4474786
line:12000 Ubound:77756 added:0 lastFound:69602 sec:0.4776257
line:13000 Ubound:77756 added:0 lastFound:70602 sec:0.5143441
line:14000 Ubound:77756 added:0 lastFound:71602 sec:0.5455406
line:15000 Ubound:77869 added:113 lastFound:72489 sec:5.950659
line:16000 Ubound:77990 added:121 lastFound:73368 sec:11.5094973
line:17000 Ubound:77990 added:0 lastFound:74368 sec:11.5393217
line:18000 Ubound:77990 added:0 lastFound:75368 sec:11.5747518
line:19000 Ubound:77990 added:0 lastFound:76368 sec:11.6063726
line:20000 Ubound:77990 added:0 lastFound:77368 sec:11.6438742
line:21000 Ubound:78604 added:614 lastFound:77754 sec:34.7205401
line:22000 Ubound:79604 added:1000 lastFound:77754 sec:73.8953093
line:23000 Ubound:80604 added:1000 lastFound:77754 sec:115.3478616
line:24000 Ubound:81604 added:1000 lastFound:77754 sec:160.3937781
line:25000 Ubound:82604 added:1000 lastFound:77754 sec:208.6512129
25708
tdiff:244.3590916

But sadly it takes 244 seconds to finish!! So _ArrayAdd is slowing things down.
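The slowdown frank10 measures once `added` starts climbing is the classic cost of growing an array one element at a time: each _ArrayAdd re-dimensions and copies the whole array, so 5k additions to an 80k array cost O(n·m) copies in total. A tiny Python sketch of the two patterns (the naive version deliberately copies the list on every append to mimic a per-row ReDim; the batched version is what frank10 later switches to):

```python
def add_rows_naive(db, rows):
    """Mimics per-row _ArrayAdd: every append rebuilds the whole list,
    O(n) per row, O(n * m) overall - this is what makes the loop crawl."""
    for r in rows:
        db = db + [r]          # full copy per row, like ReDim
    return db

def add_rows_batched(db, rows):
    """The fix: buffer the new rows and grow the array once at the end."""
    return db + rows           # one copy total
```

Both produce the same result; only the growth pattern differs, which is why deferring the additions to a single _ArrayAdd after the loop cuts the runtime so dramatically.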
Nine Posted April 4, 2022

5 minutes ago, frank10 said: "I tested your suggestion about starting from the last search and it was a very GOOD trick to speed up ArraySearch"

Glad you tested my idea. But if you discard the _ArrayAdd, how much time does it take? If it is good enough without _ArrayAdd, there might be a better solution: I believe there is a way to perform the additions without it...
mikell Posted April 4, 2022

40 minutes ago, Nine said: "he needs to perform 25k times the same search"

Hmm... I misunderstood that tiny detail. To add/change 25k lines in one shot with a reasonable delay, I can't think of anything other than SQLite indeed. Any string or array approach will be too slow...
frank10 Posted April 5, 2022 (edited)

12 hours ago, Nine said: "Glad you tested my idea. But if you discard the _ArrayAdd, how much time does it take? In the case it is good enough without _ArrayAdd, there might be a better solution. I believe there is a way to perform the addition without it..."

The search alone, without _ArrayAdd, is fast! I got it working by collecting the new lines in a single string and doing only one _ArrayAdd at the end of the For loop. Now I'm at 7.5 s for everything, quite acceptable:

2x initial sort of both arrays sec:4.2354846
line:0 Ubound:77756 added:0 lastFound:0 sec:4.3428845
line:1000 Ubound:77756 added:0 lastFound:58602 sec:4.3722679
line:2000 Ubound:77756 added:0 lastFound:59602 sec:4.4027041
line:3000 Ubound:77756 added:0 lastFound:60602 sec:4.4331181
line:4000 Ubound:77756 added:0 lastFound:61602 sec:4.4637452
line:5000 Ubound:77756 added:0 lastFound:62602 sec:4.4933318
line:6000 Ubound:77756 added:0 lastFound:63602 sec:4.5226008
line:7000 Ubound:77756 added:0 lastFound:64602 sec:4.5527228
line:8000 Ubound:77756 added:0 lastFound:65602 sec:4.5823687
line:9000 Ubound:77756 added:0 lastFound:66602 sec:4.6140867
line:10000 Ubound:77756 added:0 lastFound:67602 sec:4.6438304
line:11000 Ubound:77756 added:0 lastFound:68602 sec:4.6738332
line:12000 Ubound:77756 added:0 lastFound:69602 sec:4.7021799
line:13000 Ubound:77756 added:0 lastFound:70602 sec:4.7331963
line:14000 Ubound:77756 added:0 lastFound:71602 sec:4.7626579
line:15000 Ubound:77756 added:113 lastFound:72489 sec:5.9211965
line:16000 Ubound:77756 added:121 lastFound:73368 sec:7.1149558
line:17000 Ubound:77756 added:0 lastFound:74368 sec:7.1440564
line:18000 Ubound:77756 added:0 lastFound:75368 sec:7.1778125
line:19000 Ubound:77756 added:0 lastFound:76368 sec:7.2093452
line:20000 Ubound:77756 added:0 lastFound:77368 sec:7.2420617
line:21000 Ubound:77756 added:614 lastFound:77754 sec:7.2783153
line:22000 Ubound:77756 added:1000 lastFound:77754 sec:7.3140007
line:23000 Ubound:77756 added:1000 lastFound:77754 sec:7.3461758
line:24000 Ubound:77756 added:1000 lastFound:77754 sec:7.3777684
line:25000 Ubound:77756 added:1000 lastFound:77754 sec:7.4103058
25708
sec:7.5072707

To be more precise: most of the new lines that match on the first part are perfectly identical to the existing DB lines (so they would not even need to be re-copied). Only some of them vary slightly in the second part of the line; that's why I check the first part to find every line identical in the first part, then copy them all, even when unnecessary, simply because overwriting is fast. Just in case you have some ideas, maybe using _ArrayUnique first...

Anyway, if you want to try something even more efficient (faster is always better), I uploaded the files zipped: testFiles.zip

Edited April 5, 2022 by frank10
Nine Posted April 5, 2022

5.8 secs with:

#include <Array.au3>
#include <File.au3>

Local $hTimer = TimerInit()
Local $aDB = FileReadToArray("OrigDbSort.txt")
Local $aNew = FileReadToArray("newLinesSort.txt")
Local $aAdd[10000]
Local $part, $res, $start = 0, $ind = 0
For $i = 0 To UBound($aNew) - 1
    $part = StringMid($aNew[$i], 1, StringInStr($aNew[$i], "|", 1, 6))
    $res = _ArraySearch($aDB, $part, $start, 0, 1, 1)
    If $res >= 0 Then
        $aDB[$res] = $aNew[$i]
        $start = $res + 1
    Else
        $aAdd[$ind] = $aNew[$i]
        $ind += 1
    EndIf
Next
Local $hFile = FileOpen("TempDB.txt", $FO_OVERWRITE)
_FileWriteFromArray($hFile, $aDB)
_FileWriteFromArray($hFile, $aAdd, 0, $ind)
FileClose($hFile)
RunWait(@ComSpec & " /c sort TempDB.txt /o NewDB.txt", "", @SW_HIDE)
FileDelete("TempDB.txt")
ConsoleWrite(TimerDiff($hTimer) & @CRLF)

I sorted the input files beforehand to make sure...
636C65616E Posted April 5, 2022 (edited)

Hey, just a little guess: you could use a CSV UDF. I haven't tested it, but I bet it will be faster than using regexp, _ArraySearch, etc. As it is based on SQLite, you can update your file with SQL requests (i.e. using _CSV_Exec($csv_handle, $csv_query)).

Edited April 5, 2022 by 636C65616E
jchd Posted April 5, 2022 (edited)

Using SQLite (almost directly): first make the input files look like CSV by inserting a comma after the 6th bar, then use the SQLite CLI to import both files as tables and merge them; finally output the result as a text file.

#include <SQLite.au3>

Local $s = "Part1,Part2" & @CRLF & FileRead("origdb.txt")
$s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
Local $h = FileOpen("origdb1.txt", $FO_OVERWRITE)
FileWrite($h, $s)
FileClose($h)
$s = "Part1,Part2" & @CRLF & FileRead("newlines.txt")
$s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
$h = FileOpen("newlines1.txt", $FO_OVERWRITE)
FileWrite($h, $s)
FileClose($h)
_SQLite_SQLiteExe("Merging.sq3", _
    "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
    "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
    ".import --csv origdb1.txt Origin" & @CRLF & _
    ".import --csv newlines1.txt Extras" & @CRLF & _
    "insert or replace into Origin select * from Extras;" & @CRLF & _
    ".output Result.txt" & @CRLF & _
    "select part1 || part2 from Origin;" & @CRLF & _
    ".quit", $s)

This runs in 2.9 s from SciTE on my old rig.

EDIT: of course it's possible to split each data line into as many columns as required, making the table(s) easier to look up later, but that requires more typing and time than I can spend right now.

Edited April 5, 2022 by jchd

- This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must-have in your bookmarks.
- Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
- RegExp tutorial: enough to get started
- PCRE v8.33 regexp documentation: latest available release and currently implemented in AutoIt beta
- SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
- SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
- An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
- SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
- A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
- SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
pixelsearch Posted April 5, 2022 (edited)

SQLite... I really should try it at least once, as it looks powerful.

@Nine: meanwhile, just curious about the array method, I have an untested idea. We know that _ArrayBinarySearch is damn fast: it should retrieve an element in a sorted array of 98,304 rows with at most 17 tests, as this little empirical example shows (it assumes we need at most 2 tests for a 3-row array - 2 tests when the searched element isn't found within the array bounds, just tested):

Local $i = 3, $j = 0
ConsoleWrite("2 3" & @CRLF)
For $j = 3 To 17
    $i *= 2
    ConsoleWrite($j & " " & $i & @CRLF)
Next

2 3
3 6
4 12
5 24
6 48
7 96
8 192
9 384
10 768
11 1536
12 3072
13 6144
14 12288
15 24576
16 49152
17 98304

So why not grab the short code from _ArrayBinarySearch (the 1D part), adapt it, and place it in your script? It would require building a variable corresponding to what is searched (the "6 |" prefix), because natively _ArrayBinarySearch matches the full content of a row in the chosen column (no partial search). This could be called "a prefixed partial search adapted to _ArrayBinarySearch". Maybe it would be faster than the slower _ArraySearch (though you already made that one faster with $start = $res + 1). Just my 2 cts, if you think it's worth a try...

Edited April 5, 2022 by pixelsearch
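pixelsearch's idea — a binary search that compares only the key prefix — is easy to express when the array is sorted by that prefix. A hedged Python sketch (the function names are mine; keys are precomputed so plain bisect works on any Python version):

```python
import bisect

def prefix(line, n=6):
    """Return the text up to and including the n-th '|'."""
    pos = -1
    for _ in range(n):
        pos = line.index("|", pos + 1)
    return line[:pos + 1]

def find_by_prefix(keys, want):
    """Binary search over the precomputed key prefixes: ~log2(n) probes,
    so a 98,304-row array needs at most ~17 comparisons, matching
    pixelsearch's estimate. Returns the row index, or -1 if absent."""
    i = bisect.bisect_left(keys, want)
    if i < len(keys) and keys[i] == want:
        return i
    return -1
```

The trade-off Nine points out still applies: each binary search restarts over the whole array, so with two sorted inputs and a high match rate, the resume-from-last-hit linear scan wins overall.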
Nine Posted April 5, 2022

Not a bad idea, @pixelsearch. But unfortunately it is slower, because a binary search always searches through the whole array, while - with both input files sorted and a very high match rate - _ArraySearch finds the value rapidly (it only needs to read a few records). Nevertheless, your 2 cents enlightened me to another way to perform the search, even faster:

#include <Array.au3>
#include <File.au3>

Local $hTimer = TimerInit()
Local $aDB = FileReadToArray("OrigDbSort.txt")
Local $aNew = FileReadToArray("newLinesSort.txt")
Local $aAdd[10000]
Local $part, $res, $start = 0, $ind = 0
For $i = 0 To UBound($aNew) - 1
    $part = GetPart($aNew[$i])
    $res = FastSearch($aDB, $part, $start)
    ;$res = _ArraySearch($aDB, $part, $start, 0, 1, 1)
    If $res >= 0 Then
        $aDB[$res] = $aNew[$i]
        $start = $res + 1
    Else
        $aAdd[$ind] = $aNew[$i]
        $ind += 1
    EndIf
Next
ConsoleWrite($ind & @CRLF)
;Local $hFile = FileOpen("TempDB.txt", $FO_OVERWRITE)
;_FileWriteFromArray($hFile, $aDB)
;_FileWriteFromArray($hFile, $aAdd, 0, $ind)
;FileClose($hFile)
;RunWait(@ComSpec & " /c sort TempDB.txt /o NewDB.txt", "", @SW_HIDE)
;FileDelete("TempDB.txt")
ConsoleWrite(TimerDiff($hTimer) & @CRLF)

Func GetPart($string)
    Return StringMid($string, 1, StringInStr($string, "|", 1, 6))
EndFunc

Func FastSearch(ByRef $array, $str, $begin)
    Local $txt
    For $i = $begin To UBound($array) - 1
        $txt = GetPart($array[$i])
        If $txt == $str Then Return $i
        If $txt > $str Then Return -1 ; sorted input: stop as soon as we pass the key
    Next
    Return -1
EndFunc

Now I am under 1 second!!! To make sure the search was done correctly, I compared the number of new records added with both methods, and they are identical.
mikell Posted April 5, 2022 (edited)

Another try, using Scripting.Dictionary. Still trying to avoid arrays:

$t = TimerInit()
$txt1 = StringStripWS(FileRead("origDB.txt"), 3)
$txt2 = StringStripWS(FileRead("newLines.txt"), 3)
$txt = $txt1 & @CRLF & $txt2
$a = StringSplit($txt, @CRLF, 3)
Local $sda = ObjCreate("Scripting.Dictionary")
For $i = 0 To UBound($a) - 1
    $tmp1 = StringMid($a[$i], 1, StringInStr($a[$i], "|", 1, 6))
    $tmp2 = StringReplace($a[$i], $tmp1, "")
    $sda.Item($tmp1) = $tmp2
Next
$result = _list($sda)
FileWrite("result.txt", $result)
MsgBox(0, "", Round(TimerDiff($t)/1000, 2))

Func _list($dic)
    Local $res = ""
    Local $keys = $dic.Keys
    For $i = 0 To $dic.Count - 1
        $k = $keys[$i]
        $res &= $k & $dic.Item($k) & @CRLF
    Next
    Return $res
EndFunc

Edit: using StringSplit, StringMid and a little change in Func _list... 25% faster

Edited April 5, 2022 by mikell
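mikell's Scripting.Dictionary works because a hash map makes each update-or-add O(1): the key is the first-six-fields prefix, and assigning to an existing key overwrites the old remainder while a new key simply appends. The same logic in Python (a dict preserves insertion order, so untouched DB lines keep their original position; names are mine):

```python
def merge_with_dict(db_lines, new_lines):
    """Dict-based merge: one pass over all lines, later lines with the
    same key prefix overwrite earlier ones - exactly what
    $sda.Item($tmp1) = $tmp2 does with Scripting.Dictionary."""
    table = {}
    for line in db_lines + new_lines:
        parts = line.split("|")
        key = "|".join(parts[:6]) + "|"   # text up to and including the 6th '|'
        table[key] = line[len(key):]
    return [key + rest for key, rest in table.items()]
```

Note the one caveat the thread also hits: unlike the sorted-array approaches, the output comes back in insertion order, not sorted, so a final sort is needed if ordering matters.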
frank10 Posted April 5, 2022 (edited)

Thank you ALL, you're great! I tested all your methods (except SQLite - I don't remember well how to use it, I must refresh my memory... it does not save correctly...). The time results of all methods:

Load + 2xSort: 4407.1842 -- arrSearch added:5555
arrSearch: 3577.0698 (same time with $aAdd[10000] or a single string sent to _ArrayAdd)
Load + 2xSort: 4413.0143 -- getPart added:5555
GetPart: 1333.3216
Load + 2xSort: 4441.1356
Scripting: 1884.2522
Load + 2xSort: 4447.0858
SQLite: 396.9025 (but not working so far...)

The arrSearch method does not change between creating $aAdd[10000] and accumulating the values in a string followed by a single _ArrayAdd, so I prefer the latter: I can't know in advance how many rows to create in the array, and I would also have to delete the unused ones at the end.

And the winner is... GetPart by Nine!! Followed by Scripting by mikell (even if I don't understand how it works, hehe - never used it). Also note that my data is NOT sorted, so the sorting step has to be counted in as well.

I'll update when I get SQLite to work (I posted my adapted SQL code below, in case you spot an error... maybe the dir path, maybe some "\" problem?), but anyway I think I will continue to use the array/txt way, given the amazing speed results.

Let me elaborate on my pipeline:
1) I have my origDB as txt.
2) I get some xlsx from the net with updated lines to substitute/add, so I use the Excel UDF to obtain a 2D AutoIt array.
3) After updating the txt DB, I must send the data with WebDriver to Chrome for further elaboration in JS.

I discovered that sending huge arrays from AutoIt to JS with WebDriver is slow (as is converting them to JSON - impossible), so the trick is to send txt data and convert it to JSON with JS's very fast text parsing. So I need txt, not an array, at the end of this process. That's also why I save the arrays (1D or 2D) and load them back as txt with FileRead: converting them on the fly with _ArrayToString is, you guessed it, slow!

So the process seems to be: Excel -> 2D array -> save -> load 1D -> sort -> GetPart -> save array (or Scripting with a final txt, BUT NOT sorted... anyway I could sort it quickly in JS) -> load txt -> send to JS.

My test code:

#include <File.au3>
#include <Array.au3>
#include <SQLite.au3>

Global $tLoop
Global $myDir = "C:\my path..."
Global $sFileDB = $myDir & "\origDB.txt"
Global $sFileNew = $myDir & "\newLines.txt"

_loadAndSort()
_arrSearch(1)
_loadAndSort()
_getPart()
_loadAndSort()
_Scripting()
_loadAndSort()

Global $sSQliteDll = _SQLite_Startup()
If @error Then
    ConsoleWrite("SQLite err: can't load startup " & @error)
EndIf
_sqlDB()

Func _Scripting()
    $tLoop = TimerInit()
    $txt1 = StringStripWS(FileRead($sFileDB), 3)
    $txt2 = StringStripWS(FileRead($sFileNew), 3)
    $txt = $txt1 & @CRLF & $txt2
    $a = StringRegExp($txt, '\N+', 3)
    Local $sda = ObjCreate("Scripting.Dictionary")
    For $i = 0 To UBound($a) - 1
        $tmp1 = StringRegExpReplace($a[$i], '^(.*?\|){6}\K.*', "")
        $tmp2 = StringReplace($a[$i], $tmp1, "")
        $sda.Item($tmp1) = $tmp2
    Next
    $result = _list($sda) ; txt to send to JS! BUT NOT SORTED!!
    ConsoleWrite("Scripting: " & TimerDiff($tLoop) & @CRLF)
    FileWrite($myDir & "\" & "scriptingResult.txt", $result)
EndFunc

Func _list($dic)
    Local $res = ""
    Local $keys = $dic.Keys
    Local $items = $dic.Items
    Local $count = $dic.Count
    For $i = 0 To $count - 1
        $res &= $keys[$i] & $items[$i] & @CRLF
    Next
    Return $res
EndFunc

Func _sqlDB()
    $tLoop = TimerInit()
    Local $s = "Part1,Part2" & @CRLF & FileRead($sFileDB)
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    Local $h = FileOpen($myDir & "\" & "origdb1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)
    $s = "Part1,Part2" & @CRLF & FileRead($sFileNew)
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    $h = FileOpen($myDir & "\" & "newlines1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)
    _SQLite_SQLiteExe("Merging.sq3", _
        "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT,) WITHOUT ROWID;" & @CRLF & _
        ".import " & $myDir & "\" & "origdb1.txt Origin --csv" & @CRLF & _
        ".import " & $myDir & "\" & "newlines1.txt Extras --csv" & @CRLF & _
        "insert or replace into origin select * from extras;" & @CRLF & _
        ".output " & $myDir & "\" & "sqlResult.txt" & @CRLF & _
        "select part1 || part2 as "" from Origin;" & @CRLF & _
        ".quit", $s)
    Local $sFinal = FileRead($myDir & "\" & "sqlResult.txt")
    ConsoleWrite("SQLlite: " & TimerDiff($tLoop) & @CRLF)
EndFunc

Func _loadAndSort()
    $tLoop = TimerInit()
    _FileReadToArray($myDir & "\" & $aMarkets[0][0] & "_ADStarget_all.txt", $aDB)
    _ArrayDelete($aDB, 0)
    _ArraySort($aDB, 0, Default, Default, 0)
    _FileReadToArray($myDir & "\" & $aMarkets[0][0] & "_TestToRead.txt", $aNew)
    _ArrayDelete($aNew, 0)
    _ArraySort($aNew, 0, Default, Default, 0)
    ConsoleWrite("Load + 2xSort: " & TimerDiff($tLoop) & @CRLF)
EndFunc

Func _getPart()
    $tLoop = TimerInit()
    Local $aAdd[10000]
    Local $part, $res, $start = 0, $ind = 0
    For $i = 0 To UBound($aNew) - 1
        $part = GetPart($aNew[$i])
        $res = FastSearch($aDB, $part, $start)
        ;$res = _ArraySearch($aDB, $part, $start, 0, 1, 1)
        If $res >= 0 Then
            $aDB[$res] = $aNew[$i]
            $start = $res + 1
        Else
            $aAdd[$ind] = $aNew[$i]
            $ind += 1
        EndIf
    Next
    ConsoleWrite("-- getPart added:" & $ind & @CRLF)
    _ArrayAdd($aDB, $aAdd)
    Local $hFile = FileOpen($myDir & "\" & "getPartFinal.txt", $FO_OVERWRITE)
    _FileWriteFromArray($hFile, $aDB)
    FileRead($myDir & "\" & "getPartFinal.txt")
    ConsoleWrite("GetPart: " & TimerDiff($tLoop) & @CRLF)
EndFunc

Func GetPart($string)
    Return StringMid($string, 1, StringInStr($string, "|", 1, 6))
EndFunc

Func FastSearch(ByRef $array, $str, $begin)
    Local $txt
    For $i = $begin To UBound($array) - 1
        $txt = GetPart($array[$i])
        If $txt == $str Then Return $i
        If $txt > $str Then Return -1
    Next
    Return -1
EndFunc

Func _arrSearch($type = 1)
    $tLoop = TimerInit()
    Local $aAdd[10000]
    Local $part, $res, $start = 0, $ind = 0, $sAdd = ''
    For $i = 0 To UBound($aNew) - 1
        $part = StringMid($aNew[$i], 1, StringInStr($aNew[$i], "|", 1, 6))
        $res = _ArraySearch($aDB, $part, $start, 0, 1, 1)
        If $res >= 0 Then
            $aDB[$res] = $aNew[$i]
            $start = $res + 1
        Else
            If $type = 1 Then
                $aAdd[$ind] = $aNew[$i]
            Else
                $sAdd &= $aNew[$i] & @CRLF
            EndIf
            $ind += 1
        EndIf
    Next
    ConsoleWrite("-- arrSearch added:" & $ind & @CRLF)
    If $type = 1 Then
        _ArrayAdd($aDB, $aAdd)
    Else
        _ArrayAdd($aDB, $sAdd, 0, @CRLF)
    EndIf
    Local $hFile = FileOpen($myDir & "\" & "arraySearchFinal.txt", $FO_OVERWRITE)
    _FileWriteFromArray($hFile, $aDB)
    FileRead($myDir & "\" & "arraySearchFinal.txt")
    ConsoleWrite("arrSearch: " & TimerDiff($tLoop) & @CRLF)
EndFunc

See you tomorrow. Thanks for now!

Edited April 5, 2022 by frank10
jchd Posted April 5, 2022

5 minutes ago, frank10 said: "I'll update when I'll got SQLite to work... (I posted my adapted SQL code below, if you see some err... maybe the dir path, maybe some "\" problem?)"

Add some error checking. Maybe you don't have sqlite3.exe installed. Download a recent version from sqlite.org.
frank10 Posted April 6, 2022 (edited)

15 hours ago, jchd said: "Add some error checking. Maybe you don't have sqlite3.exe installed. Download a recent version from sqlite.org."

The problem is exactly that: it can't find sqlite3.exe. I put it in @ScriptDir and also in System32, but it still isn't found... yet the files are there:

Local $files = _FileListToArray(@ScriptDir, "sqlite*", 1)
ConsoleWrite(_ArrayToString($files, @CRLF))

->12:02:02 AU3Check ended. rc:1
>Running: (3.3.14.5): C:\Program Files (x86)\AutoIt3\autoit3.exe "c:\AutoitProgs\TEST__autoitScripts___.au3" /errorstdout
+>Setting Hotkeys... --> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop.
Load + 2xSort: 5664.183
3
sqlite3.def
sqlite3.dll
sqlite3.exe
+>12:02:14 AutoIt3.exe ended. rc:0
+>12:02:14 AutoIt3Wrapper Finished. Process exited with code 0

Edited April 6, 2022 by frank10
Jos (Developers) Posted April 6, 2022

48 minutes ago, frank10 said: "I put sqlite3.exe in @scriptdir and also in system32"

Maybe you downloaded the x64 version?

SciTE4AutoIt3 Full installer: Download page - Beta files
Read before posting - How to post script source - Forum etiquette - Forum Rules
Live for the present, Dream of the future, Learn from the past.