
very slow search on long array



I have a stored txt file whose lines have this structure:

20211231000000|airCOM|USD|PB air 3 AmSugg||PHRASE|125907|549|0.016|0.59|420.85|||0|0|0|0|49|
20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|0.59|11.79|||0|0|0|0|0|

there are about 80,000 lines... so it's huge.

I need to add more data in the same format, but some incoming lines match an existing line and should update it rather than be duplicated:

e.g.

20211231000000|gasCOM|USD|gas 1||BROAD|2142|20|0.033|0.59|11.79|||0|0|0|0|0|

that needs to replace the old line. To detect whether a line must be updated, I compare the text up to the 6th "|", i.e. "20211231000000|gasCOM|USD|gas 1||BROAD|".
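For clarity, this is the dedup key extraction in a short Python sketch (illustrative only, not AutoIt; `key_of` is a made-up name): the key is everything up to and including the 6th "|".

```python
def key_of(line: str) -> str:
    """Return the line's prefix up to and including the 6th '|'."""
    parts = line.split("|")
    # rejoin the first 6 fields, then restore the trailing '|'
    return "|".join(parts[:6]) + "|"

old = "20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|0.59|11.79|||0|0|0|0|0|"
new = "20211231000000|gasCOM|USD|gas 1||BROAD|2142|20|0.033|0.59|11.79|||0|0|0|0|0|"
print(key_of(old) == key_of(new))  # -> True: same key, so the new line replaces the old
```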

 

I made this, loading the data into a single column:
 

local $aDB, $aNew
_FileReadToArray($reportsADSDir & "\" & "originalData.txt", $aDB)
_FileReadToArray($reportsADSDir & "\" & "newData.txt", $aNew)

local $tLoop = TimerInit()
for $i = 0 to UBound($aNew) - 1

    if TimerDiff($tLoop) / 1000 > 15 then  ; in 15 seconds ONLY 382 new lines done !!
        ConsoleWrite($i & @CRLF)
        ExitLoop
    EndIf

    local $sLine = $aNew[$i]
    local $sNew = StringMid($sLine, 1, StringInStr($sLine, "|", 0, 6))

    local $res = _ArrayBinarySearch($aDB, $sNew, 0, 0, 0)

    if $res > 0 then
        $aDB[$res] = $sLine
    Else
        _ArrayAdd($aDB, $sLine, 0, "§")
    EndIf
Next
ConsoleWrite("tdiff:" & TimerDiff($tLoop) / 1000 & @CRLF)

_FileWriteFromArray($reportsADSDir & "\" & "_NEW.txt", $aDB)

It's deadly SLOW!
It processes only about 380 new lines (out of 25k...) in 15 seconds! Unusable...

Any ideas on how to make this search a lot faster?


How about using SQLite with a unique index built from the first six fields? Once loaded, you could reuse the DB for the next update by applying the new data, then generate a new txt file. I am sure it would be a lot faster... Even if you have to load the data each time, it would still be faster.
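As an illustration of the concept (a sketch in Python's built-in sqlite3, not AutoIt's _SQLite_* UDF; the table layout and `upsert` name are invented): with the 6-field prefix as PRIMARY KEY, INSERT OR REPLACE does the update-or-add decision for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# key = prefix up to the 6th '|', rest = remainder of the line
conn.execute("CREATE TABLE lines (key TEXT PRIMARY KEY, rest TEXT)")

def upsert(line: str) -> None:
    parts = line.split("|")
    key = "|".join(parts[:6]) + "|"
    rest = line[len(key):]
    # same key -> row replaced; new key -> row inserted
    conn.execute("INSERT OR REPLACE INTO lines VALUES (?, ?)", (key, rest))

upsert("20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|...")
upsert("20211231000000|gasCOM|USD|gas 1||BROAD|2142|20|...")  # same key: replaces
print(conn.execute("SELECT COUNT(*) FROM lines").fetchone()[0])  # -> 1
```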

Edited by Nine

OK, if there are no tricks to speed up the process with arrays, I'll go with SQLite...
I tried a similar process in a browser with JS arrays: they are blazingly FAST there... it's AutoIt with long arrays that is slow...

Edited by frank10

From what I am seeing, your code should not even work. You would need to use a partial search with _ArraySearch:

#include <Array.au3>

Local $aDB = [ _
"20211231000000|airCOM|USD|PB air 3 AmSugg||PHRASE|125907|549|0.016|0.59|420.85|||0|0|0|0|49|", _
"20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|0.59|11.79|||0|0|0|0|0|" ]

Local $str = "20211231000000|gasCOM|USD|gas 1||BROAD|2142|20|0.033|0.59|11.79|||0|0|0|0|0|"

Local $sNew = StringMid($str, 1 , Stringinstr($str, "|", 1,6))
ConsoleWrite($sNew & @CRLF)

Local $res = _ArraySearch ($aDB, $sNew, 0, 0, 1, 1)
ConsoleWrite($res & @CRLF) ; <<<<<<<<<<<<< working

$res = _ArrayBinarySearch ($aDB, $sNew)
ConsoleWrite($res & "/" & @error & @CRLF) ; <<<<<<<<<<<<< NOT working

Making both StringInStr and _ArraySearch case sensitive may speed up your script, but I am quite sure it won't be enough.

If the original data is perfectly sorted on the first 6 fields, and the new data is also perfectly sorted on the first 6 fields, you can use the previous successful find as the starting point for the next search. It will get faster and faster along the way...
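That observation is essentially a sorted-list merge: advance through both files once instead of restarting each search. A sketch of the idea in Python (not AutoIt; `merge_update` and `key_of` are names I made up), running in one linear pass over both lists:

```python
def key_of(line):
    """Prefix up to and including the 6th '|'."""
    return "|".join(line.split("|")[:6]) + "|"

def merge_update(db, new):
    """db and new are lists of lines, both sorted by key_of.
    One forward pass: replace on key match, otherwise insert in order."""
    out, i = [], 0
    for line in new:
        k = key_of(line)
        while i < len(db) and key_of(db[i]) < k:
            out.append(db[i]); i += 1   # keep untouched old lines
        if i < len(db) and key_of(db[i]) == k:
            i += 1                      # skip the old version...
        out.append(line)                # ...and take the new line
    out.extend(db[i:])                  # remaining old lines
    return out

db  = ["a|b|c|d|e|f|old", "m|b|c|d|e|f|keep"]
new = ["a|b|c|d|e|f|NEW", "z|b|c|d|e|f|added"]
print(merge_update(db, new))  # first line replaced, last appended, order kept
```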

Edited by Nine

4 hours ago, frank10 said:

To detect if a line must be updated I need to check the text up to the 6th "|"

This could be done easily using regex (pure text treatment, no array used).
Please try this for the concept (raw code, without error checking!):

; create 80k lines txt file
#cs
$s = "20211231000000|airCOM|USD|PB air 3 AmSugg||PHRASE|125907|549|0.016|0.59|420.85|||0|0|0|0|49|"
$old_line = "20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|0.59|11.79|||0|0|0|0|0|"
Local $txt
For $i = 0 to 80000
    $txt &= $s & @crlf
Next
$txt &= $old_line & @crlf
FileWrite(@desktopdir & "\test.txt", $txt)
#ce

; get from the new line the part to be matched - up to the 6th "|"
$new_line = "20211231000000|gasCOM|USD|gas 1||BROAD|88888|20|0.033|0.59|11.79|||0|0|0|0|0|"
$part = StringRegExpReplace($new_line, '^(.*?\|){6}\K.*', "")
Msgbox(0,"", $part)

; get the corresponding line from txt
$txt = FileRead(@desktopdir & "\test.txt")
$checked_line = StringRegExp($txt, '\Q' & $part & '\E.*', 1)[0]
Msgbox(0,"", $checked_line)

; replace it and write new file
$txt2 = StringReplace($txt, $checked_line, $new_line)
FileWrite(@desktopdir & "\test_new.txt", $txt2)
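For comparison, the same prefix-match-and-replace concept in Python regex terms (illustrative only; `re.escape` plays the role of `\Q...\E`, and the data is abbreviated sample text):

```python
import re

txt = ("20211231000000|airCOM|USD|PB air 3 AmSugg||PHRASE|125907|549|...\n"
       "20211231000000|gasCOM|USD|gas 1||BROAD|20042|20|0.033|...\n")
new_line = "20211231000000|gasCOM|USD|gas 1||BROAD|88888|20|0.033|..."

# the part up to the 6th '|' (what '^(.*?\|){6}\K.*' -> "" extracts in PCRE)
part = re.match(r"(?:[^|]*\|){6}", new_line).group(0)

# find the whole existing line starting with that prefix, then swap it out
old_line = re.search(re.escape(part) + r".*", txt).group(0)
txt = txt.replace(old_line, new_line)
print(new_line in txt)  # -> True
```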

 


@mikell I believe from his OP he needs to perform the same search and replace/add 25k times on an 80k-line txt file! How much time does it take for a single search and replace/add?

@frank10 Could you provide both files for us to see if there is an easy solution without SQLite ?

Edited by Nine

I must go now; tomorrow I will post the files, but they are huge... about 10 MB total...

 

@mikell Yes, I thought about testing strings vs arrays, I will try!

 

For now, I tested your suggestion about starting from the last found index, and it is a very GOOD trick to speed up _ArraySearch! BUT the overall timing is sadly still unusable. The main problem is adding new rows to the array: _ArrayAdd is far too slow!
These are the logs with and without lastFound as the search start (notice how it slows down when adding elements with _ArrayAdd...):

1) Only the first 1000 new lines analyzed WITHOUT lastFound:

line:0  Ubound:77756 added:0 lastFound:0 sec:0.117197
line:1000  Ubound:77756 added:0 lastFound:58602 sec:112.7096294

2) This instead, very good with lastFound:

$res = _ArraySearch ($aDB, $sNew, $lastFound, 0, 1, 1)
if $res > 0 then 
    $lastFound = $res
    ........                
                
line:0  Ubound:77756 added:0 lastFound:0 sec:0.1088671
line:1000  Ubound:77756 added:0 lastFound:58602 sec:0.1388141
line:2000  Ubound:77756 added:0 lastFound:59602 sec:0.1686444
line:3000  Ubound:77756 added:0 lastFound:60602 sec:0.2022854
line:4000  Ubound:77756 added:0 lastFound:61602 sec:0.2330433
line:5000  Ubound:77756 added:0 lastFound:62602 sec:0.2650052
line:6000  Ubound:77756 added:0 lastFound:63602 sec:0.2943784
line:7000  Ubound:77756 added:0 lastFound:64602 sec:0.3224296
line:8000  Ubound:77756 added:0 lastFound:65602 sec:0.3501796
line:9000  Ubound:77756 added:0 lastFound:66602 sec:0.3837926
line:10000  Ubound:77756 added:0 lastFound:67602 sec:0.4179998
line:11000  Ubound:77756 added:0 lastFound:68602 sec:0.4474786
line:12000  Ubound:77756 added:0 lastFound:69602 sec:0.4776257
line:13000  Ubound:77756 added:0 lastFound:70602 sec:0.5143441
line:14000  Ubound:77756 added:0 lastFound:71602 sec:0.5455406
line:15000  Ubound:77869 added:113 lastFound:72489 sec:5.950659
line:16000  Ubound:77990 added:121 lastFound:73368 sec:11.5094973
line:17000  Ubound:77990 added:0 lastFound:74368 sec:11.5393217
line:18000  Ubound:77990 added:0 lastFound:75368 sec:11.5747518
line:19000  Ubound:77990 added:0 lastFound:76368 sec:11.6063726
line:20000  Ubound:77990 added:0 lastFound:77368 sec:11.6438742
line:21000  Ubound:78604 added:614 lastFound:77754 sec:34.7205401
line:22000  Ubound:79604 added:1000 lastFound:77754 sec:73.8953093
line:23000  Ubound:80604 added:1000 lastFound:77754 sec:115.3478616
line:24000  Ubound:81604 added:1000 lastFound:77754 sec:160.3937781
line:25000  Ubound:82604 added:1000 lastFound:77754 sec:208.6512129
25708 tdiff:244.3590916

But sadly 244 seconds to finish!

So _ArrayAdd is slowing things down.
 

 


5 minutes ago, frank10 said:

I tested your suggestion about starting from the last search and it was a very GOOD trick to speed up ArraySearch

Glad you tested my idea. 

But if you discard the _ArrayAdd, how much time does it take? If it is good enough without _ArrayAdd, there might be a better solution. I believe there is a way to perform the addition without it...


40 minutes ago, Nine said:

he needs to perform 25k times the same search

Hmm... I misunderstood this tiny detail :D

To add/change 25k lines in one shot with a reasonable delay, I can't think of anything other than SQLite indeed. Any string or array approach will be too slow...


12 hours ago, Nine said:

Glad you tested my idea. 

But if you discard the _ArrayAdd, how much time does it take? If it is good enough without _ArrayAdd, there might be a better solution. I believe there is a way to perform the addition without it...

Only the search, without _ArrayAdd, is fast!

I got it working by collecting the new lines into a single string and making only one _ArrayAdd at the end of the For loop!

Now I'm at 7.5 s for everything, quite acceptable...

2x initial sort of both array sec:4.2354846
line:0  Ubound:77756 added:0 lastFound:0 sec:4.3428845
line:1000  Ubound:77756 added:0 lastFound:58602 sec:4.3722679
line:2000  Ubound:77756 added:0 lastFound:59602 sec:4.4027041
line:3000  Ubound:77756 added:0 lastFound:60602 sec:4.4331181
line:4000  Ubound:77756 added:0 lastFound:61602 sec:4.4637452
line:5000  Ubound:77756 added:0 lastFound:62602 sec:4.4933318
line:6000  Ubound:77756 added:0 lastFound:63602 sec:4.5226008
line:7000  Ubound:77756 added:0 lastFound:64602 sec:4.5527228
line:8000  Ubound:77756 added:0 lastFound:65602 sec:4.5823687
line:9000  Ubound:77756 added:0 lastFound:66602 sec:4.6140867
line:10000  Ubound:77756 added:0 lastFound:67602 sec:4.6438304
line:11000  Ubound:77756 added:0 lastFound:68602 sec:4.6738332
line:12000  Ubound:77756 added:0 lastFound:69602 sec:4.7021799
line:13000  Ubound:77756 added:0 lastFound:70602 sec:4.7331963
line:14000  Ubound:77756 added:0 lastFound:71602 sec:4.7626579
line:15000  Ubound:77756 added:113 lastFound:72489 sec:5.9211965
line:16000  Ubound:77756 added:121 lastFound:73368 sec:7.1149558
line:17000  Ubound:77756 added:0 lastFound:74368 sec:7.1440564
line:18000  Ubound:77756 added:0 lastFound:75368 sec:7.1778125
line:19000  Ubound:77756 added:0 lastFound:76368 sec:7.2093452
line:20000  Ubound:77756 added:0 lastFound:77368 sec:7.2420617
line:21000  Ubound:77756 added:614 lastFound:77754 sec:7.2783153
line:22000  Ubound:77756 added:1000 lastFound:77754 sec:7.3140007
line:23000  Ubound:77756 added:1000 lastFound:77754 sec:7.3461758
line:24000  Ubound:77756 added:1000 lastFound:77754 sec:7.3777684
line:25000  Ubound:77756 added:1000 lastFound:77754 sec:7.4103058
25708 sec:7.5072707
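The single-batch fix works because each _ArrayAdd call ReDims and copies the whole ~80k-row array, so n one-row adds cost O(n·k), while collecting the rows and adding them once costs O(n + k). The same cost shape, sketched in Python (illustrative only):

```python
# per-item "grow by one, copy everything" - the shape of calling
# _ArrayAdd once per new row (each add copies the whole array):
db = list(range(5))
for x in [10, 11, 12]:
    db = db + [x]            # full copy on every add -> O(n) per row

# batched: collect the additions separately, concatenate once at the end
db2 = list(range(5))
pending = []
for x in [10, 11, 12]:
    pending.append(x)
db2 = db2 + pending          # one copy total -> O(n + k)

print(db == db2)  # -> True: same result, radically different cost at 80k rows
```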

To be more precise, most of the new lines that match on the first part are perfectly identical to the existing lines in the original DB (so they would not really need to be copied over). Only some of them vary slightly in the second part of the line. That's why I compare only the first part to find all the matches; then I overwrite every match, even when not strictly necessary, simply because overwriting is fast...

Just in case, maybe you have some ideas using _ArrayUnique first.

 

Anyway, if you want to try something even more efficient (faster is always better :) ), I've uploaded the files zipped:

testFiles.zip

Edited by frank10

5.8 secs with:

#include <Array.au3>
#include <File.au3>

Local $hTimer = TimerInit()
Local $aDB = FileReadToArray("OrigDbSort.txt")
Local $aNew = FileReadToArray("newLinesSort.txt")
Local $aAdd[10000]

Local $part, $res, $start = 0, $ind = 0

For $i = 0 to UBound($aNew) - 1
  $part = StringMid($aNew[$i], 1 , Stringinstr($aNew[$i], "|", 1,6))
  $res = _ArraySearch ($aDB, $part, $start, 0, 1, 1)
  If $res >= 0 Then
    $aDB[$res] = $aNew[$i]
    $start = $res + 1
  Else
    $aAdd[$ind] = $aNew[$i]
    $ind += 1
  EndIf
Next

Local $hFile = FileOpen("TempDB.txt", $FO_OVERWRITE)
_FileWriteFromArray($hFile, $aDB)
_FileWriteFromArray($hFile, $aAdd, 0, $ind)
FileClose($hFile)
RunWait(@ComSpec & " /c sort TempDB.txt /o NewDB.txt", "", @SW_HIDE)
FileDelete("TempDB.txt")
ConsoleWrite(TimerDiff($hTimer) & @CRLF)

I sorted the input files beforehand to make sure...


Hey,

Just a little guess: you could use a CSV UDF.

I haven't tested that UDF, but I bet it will be faster than using regexp, _ArraySearch, etc.

As it is based on SQLite, you can update your file with SQL requests (i.e. using _CSV_Exec($csv_handle, $csv_query))

Edited by 636C65616E

Using SQLite (almost directly): first make the input files look like CSV by inserting a comma after the 6th bar, then use the SQLite CLI to import both files as tables and merge them; finally, output the result as a text file.

Local $s = "Part1,Part2" & @CRLF & FileRead("origdb.txt")
$s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
Local $h = FileOpen("origdb1.txt", $FO_OVERWRITE)
FileWrite($h, $s)
FileClose($h)

$s = "Part1,Part2" & @CRLF & FileRead("newlines.txt")
$s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
$h = FileOpen("newlines1.txt", $FO_OVERWRITE)
FileWrite($h, $s)
FileClose($h)

_SQLite_SQLiteExe("Merging.sq3", _
    "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
    "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT,) WITHOUT ROWID;" & @CRLF & _
    ".import origdb1.txt Origin --csv" & @CRLF & _
    ".import newlines1.txt Extras --csv" & @CRLF & _
    "insert or replace into origin select * from extras;" & @CRLF & _
    ".output Result.txt" & @CRLF & _
    "select part1 || part2 as "" from Origin;" & @CRLF & _
    ".quit", $s)

This runs in 2.9 s from SciTE on my old rig.

EDIT: of course it's possible to split each data line into as many columns as required, making the table(s) easier to look up later, but that requires more typing and time than I can spend right now.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


SQLite... I really should try it at least once as it looks powerful.

@Nine: meanwhile, just curious about the array method, I have an idea but untested.

We know that _ArrayBinarySearch is damn fast. It should retrieve an element in a sorted array of 98,304 rows with at most 17 tests, as shown by this little empirical example, which assumes that we need at most 2 tests for a 3-row array (2 tests when the searched element isn't found within the array bounds, just tested).

Local $i = 3, $j = 0
ConsoleWrite("2   3" & @crlf)

For $j = 3 To 17
    $i *= 2
    ConsoleWrite($j & "   " & $i & @crlf)
Next
2   3
3   6
4   12
5   24
6   48
7   96
8   192
9   384
10   768
11   1536
12   3072
13   6144
14   12288
15   24576
16   49152
17   98304

So why not grab the short code from _ArrayBinarySearch (the 1D part), adapt it, and drop it into your script?

It would require building a variable corresponding to what is searched (the "6 | thing"), because natively _ArrayBinarySearch matches the full content of a row in a chosen column (no partial search). This could be called "a prefixed partial search adapted to _ArrayBinarySearch".

Maybe it could be faster than the slower _ArraySearch (though you already sped that up with $start = $res + 1).

Just my 2cts, if you think it's worth a try...
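In other languages this "prefixed partial" binary search falls out naturally from searching a list of pre-extracted keys; a sketch with Python's bisect module (illustrative data and names, assuming the DB is sorted and the keys are the 6-field prefixes):

```python
import bisect

db = ["a|b|c|d|e|f|old",
      "m|b|c|d|e|f|keep"]
# pre-extract the 6-field prefix of every row once
keys = ["|".join(l.split("|")[:6]) + "|" for l in db]

def find(prefix):
    """Binary search the sorted prefix list: index or -1, ~log2(n) probes."""
    i = bisect.bisect_left(keys, prefix)
    return i if i < len(keys) and keys[i] == prefix else -1

print(find("m|b|c|d|e|f|"))  # -> 1
print(find("z|b|c|d|e|f|"))  # -> -1
```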

Edited by pixelsearch
typo

Not a bad idea @pixelsearch. But unfortunately it is slower, because a binary search always probes across the whole array.

Given that both input files are sorted, _ArraySearch finds the value more quickly (it only needs to read a few records), since there is a very high rate of matches.

Nevertheless,  your 2 cents enlightened me for another way to perform the search even faster.

#include <Array.au3>
#include <File.au3>

Local $hTimer = TimerInit()
Local $aDB = FileReadToArray("OrigDbSort.txt")
Local $aNew = FileReadToArray("newLinesSort.txt")
Local $aAdd[10000]

Local $part, $res, $start = 0, $ind = 0

For $i = 0 to UBound($aNew) - 1
  $part = GetPart($aNew[$i])
  $res = FastSearch($aDB, $part, $start)
  ;$res = _ArraySearch($aDB, $part, $start, 0, 1, 1)
  If $res >= 0 Then
    $aDB[$res] = $aNew[$i]
    $start = $res + 1
  Else
    $aAdd[$ind] = $aNew[$i]
    $ind += 1
  EndIf
Next

ConsoleWrite($ind & @CRLF)

;Local $hFile = FileOpen("TempDB.txt", $FO_OVERWRITE)
;_FileWriteFromArray($hFile, $aDB)
;_FileWriteFromArray($hFile, $aAdd, 0, $ind)
;FileClose($hFile)
;RunWait(@ComSpec & " /c sort TempDB.txt /o NewDB.txt", "", @SW_HIDE)
;FileDelete("TempDB.txt")
ConsoleWrite(TimerDiff($hTimer) & @CRLF)

Func GetPart($string)
  Return StringMid($string, 1 , Stringinstr($string, "|", 1,6))
EndFunc

Func FastSearch(ByRef $array, $str, $begin)
  Local $txt
  For $i = $begin to UBound($array) - 1
    $txt = GetPart($array[$i])
    If $txt == $str Then Return $i   ; exact key match
    If $txt > $str Then Return -1    ; passed the sorted position: not present
  Next
  Return -1
EndFunc

Now I am under 1 second!!! To make sure the search was done correctly, I compared the number of new records added with both methods. They are identical.

:)


Another try using Scripting.Dictionary :)
Still trying to avoid arrays :P

$t = TimerInit()

$txt1 = StringStripWS(FileRead("origDB.txt"), 3)
$txt2 = StringStripWS(FileRead("newLines.txt"), 3)
$txt = $txt1 & @crlf & $txt2

$a = StringSplit($txt, @crlf, 3)
Local $sda = ObjCreate("Scripting.Dictionary")
For $i = 0 to UBound($a)-1
    $tmp1 = StringMid($a[$i], 1 , Stringinstr($a[$i], "|", 1,6))
    $tmp2 = StringReplace($a[$i], $tmp1, "")
    $sda.item($tmp1) = $tmp2
Next

$result = _list($sda)
FileWrite("result.txt", $result)

Msgbox(0,"", Round(TimerDiff($t)/1000, 2))

Func _list($dic)
    Local $res = ""
    Local $keys = $dic.Keys
    For $i = 0 To $dic.Count-1
        $k = $keys[$i]
        $res &= $k & $dic.Item($k) & @crlf
    Next
    Return $res
EndFunc
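The trick above is last-write-wins on a hash map keyed by the 6-field prefix: insert the old lines first, then the new ones, and duplicate keys overwrite for free. The same idea in Python terms (illustrative names and data):

```python
old_lines = ["a|b|c|d|e|f|old", "m|b|c|d|e|f|keep"]
new_lines = ["a|b|c|d|e|f|NEW", "z|b|c|d|e|f|added"]

merged = {}
for line in old_lines + new_lines:        # new lines last -> they win
    key = "|".join(line.split("|")[:6]) + "|"
    merged[key] = line[len(key):]         # value = the rest of the line

# O(1) per line, no sorting needed; order = first appearance of each key
result = [k + v for k, v in merged.items()]
print(result)
```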

 

Edit
Using StringSplit, StringMid and a little change in Func _list... 25% faster :D

Edited by mikell

Thank you ALL, you're great!

I tested all your methods (except SQLite: I don't remember well how to use it, I must refresh my memory... it does not save correctly...)

The time results of all methods are:

Load + 2xSort: 4407.1842
-- arrSearch added:5555
arrSearch: 3577.0698  (same time with arr[10000] or unique string to send to _ArrayAdd)

Load + 2xSort: 4413.0143
-- getPart added:5555
GetPart: 1333.3216

Load + 2xSort: 4441.1356
Scripting: 1884.2522

Load + 2xSort: 4447.0858
SQLlite: 396.9025 (but not working till now...)

The arrSearch method's time doesn't change between filling $aAdd[10000] and accumulating into a string followed by a single _ArrayAdd, so I prefer the latter, because I can't know in advance how many rows to allocate in the array, and I would also have to delete the unused ones at the end.

 

and the winner is.....

getPart of Nine!!

followed by Scripting by mikell (even if I don't understand how it works, hehe, never used it). Also note that its output is NOT sorted... so that has to be taken into account as an extra step...
 

I'll update when I get SQLite to work... (I posted my adapted SQL code below; if you see some error... maybe the dir path, maybe some "\" problem?)

but anyway I think I will continue to use the arr/txt approach, given the amazing speed results obtained...

 

 

Let me elaborate on my overall workflow:

1) I have my origDB in txt.

2) I have some xlsx files from the net with updated lines to substitute/add, so I use the Excel UDF to get an AutoIt 2D array

3) After updating the txt DB I must send the data with WebDriver to Chrome, for further processing in JS with other things...

I discovered that sending huge arrays from AutoIt to JS via WebDriver is sloooow (and converting them to JSON in AutoIt is impossible), so the trick was sending txt data and converting it to JSON with FAST JS text parsing. So I need a txt file at the end of this process, not an array.

That's also the reason I save the arrays (1D or 2D) and load them back as text with FileRead: converting them on the fly with _ArrayToString is, you guessed it, sssloooow!!

SO,

the process to me seems to be:

Excel -> arr2D -> save -> load 1D -> sort -> GetPart -> save arr (or Scripting with final txt, BUT NOT sorted... anyway I could sort it quickly in JS...) -> load txt -> send to JS

 

 

my code test:

#include <File.au3>
#include <Array.au3>
#include <SQLite.au3>

Global $tLoop, $aDB, $aNew
global $myDir = "C:\my path..."
Global $sFileDB  =   $myDir & "\origDB.txt"
Global $sFileNew =   $myDir & "\newLines.txt"


_loadAndSort()
_arrSearch(1)

_loadAndSort()
_getPart()

 _loadAndSort()
_Scripting()

_loadAndSort()
Global $sSQliteDll = _SQLite_Startup()
If @error Then
    ConsoleWrite("SQlite err can't loaded startup" & @error)
EndIf
_sqlDB()


Func _Scripting()
    $tLoop = TimerInit()

    $txt1 = StringStripWS(FileRead($sFileDB), 3)
    $txt2 = StringStripWS(FileRead($sFileNew), 3)
    $txt = $txt1 & @crlf & $txt2

    $a = StringRegExp($txt, '\N+', 3)
    Local $sda = ObjCreate("Scripting.Dictionary")
    For $i = 0 to UBound($a)-1
        $tmp1 = StringRegExpReplace($a[$i], '^(.*?\|){6}\K.*', "")
        $tmp2 = StringReplace($a[$i], $tmp1, "")
        $sda.item($tmp1) = $tmp2
    Next

    $result = _list($sda) ; txt to send JS!    ; BUT NOT SORTED!!

    ConsoleWrite("Scripting: " & TimerDiff($tLoop) & @CRLF)
    FileWrite( $myDir & "\" & "scriptingResult.txt", $result)

EndFunc
   Func _list($dic)
        Local $res = ""
        Local $keys = $dic.Keys
        Local $items = $dic.Items
        Local $count = $dic.Count
        For $i = 0 To $count-1
            $res &= $keys[$i] & $items[$i] & @crlf
        Next
        Return $res
    EndFunc


Func _sqlDB()

    $tLoop = TimerInit()

    Local $s = "Part1,Part2" & @CRLF & FileRead( $sFileDB )
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    Local $h = FileOpen($myDir & "\" & "origdb1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)

    $s = "Part1,Part2" & @CRLF & FileRead( $sFileNew )
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    $h = FileOpen($myDir & "\" & "newlines1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)

    _SQLite_SQLiteExe("Merging.sq3", _
        "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT,) WITHOUT ROWID;" & @CRLF & _
        ".import " & $myDir & "\" & "origdb1.txt Origin --csv" & @CRLF & _
        ".import " & $myDir & "\" & "newlines1.txt Extras --csv" & @CRLF & _
        "insert or replace into origin select * from extras;" & @CRLF & _
        ".output " & $myDir & "\" & "sqlResult.txt" & @CRLF & _
        "select part1 || part2 as "" from Origin;" & @CRLF & _
        ".quit", $s)

    local $sFinal = FileRead( $myDir & "\" & "sqlResult.txt")
    ConsoleWrite("SQLlite: " & TimerDiff($tLoop) & @CRLF)

EndFunc


Func _loadAndSort()
    $tLoop = TimerInit()

    _FileReadToArray($sFileDB, $aDB)
    _ArrayDelete($aDB, 0)
    _ArraySort($aDB, 0, Default, Default, 0)

    _FileReadToArray($sFileNew, $aNew)
    _ArrayDelete($aNew, 0)
    _ArraySort($aNew, 0, Default, Default, 0)

    ConsoleWrite("Load + 2xSort: " & TimerDiff($tLoop) & @CRLF)

EndFunc

func _getPart()
    $tLoop = TimerInit()

    Local $aAdd[10000]
    Local $part, $res, $start = 0, $ind = 0

    For $i = 0 to UBound($aNew) - 1
    $part = GetPart($aNew[$i])
    $res = FastSearch($aDB, $part, $start)                   
    ;$res = _ArraySearch($aDB, $part, $start, 0, 1, 1)
    If $res >= 0 Then
        $aDB[$res] = $aNew[$i]
        $start = $res + 1
    Else
        $aAdd[$ind] = $aNew[$i]
        $ind += 1
    EndIf
    Next

    ConsoleWrite("-- getPart added:" & $ind & @CRLF)

    _arrayAdd($aDB, $aAdd)
    Local $hFile = FileOpen($myDir & "\" & "getPartFinal.txt", $FO_OVERWRITE)
    _FileWriteFromArray($hFile, $aDB)
    FileRead( $myDir & "\" & "getPartFinal.txt")

    ConsoleWrite("GetPart: " & TimerDiff($tLoop) & @CRLF)

EndFunc
    Func GetPart($string)
        Return StringMid($string, 1 , Stringinstr($string, "|", 1,6))
    EndFunc

    Func FastSearch(ByRef $array, $str, $begin)
        Local $txt
        For $i = $begin to UBound($array) - 1
            $txt = GetPart($array[$i])
            If $txt == $str Then Return $i
            If $txt > $str then return -1
        Next
        Return -1
    EndFunc


Func _arrSearch($type = 1)
    $tLoop = TimerInit()

    Local $aAdd[10000]
    Local $part, $res, $start = 0, $ind = 0, $sAdd = ''

    For $i = 0 to UBound($aNew) - 1
        $part = StringMid($aNew[$i], 1 , Stringinstr($aNew[$i], "|", 1,6))
        $res = _ArraySearch ($aDB, $part, $start, 0, 1, 1)
        If $res >= 0 Then
            $aDB[$res] = $aNew[$i]
            $start = $res + 1
        Else
            if $type = 1 then 
                $aAdd[$ind] = $aNew[$i] 
            Else
                $sAdd &= $aNew[$i] & @CRLF
            EndIf
            $ind += 1
        EndIf
    Next

    ConsoleWrite("-- arrSearch added:" & $ind & @CRLF)

    if $type = 1 then 
        _ArrayAdd($aDB, $aAdd)              
    Else
        _ArrayAdd($aDB,$sAdd,0,@crlf)       
    EndIf

    Local $hFile = FileOpen($myDir & "\" & "arraySearchFinal.txt", $FO_OVERWRITE)
    _FileWriteFromArray($hFile, $aDB)
    FileRead( $myDir & "\" & "arraySearchFinal.txt")

    ConsoleWrite("arrSearch: " & TimerDiff($tLoop) & @CRLF)

EndFunc

 

See you tomorrow

Thanks for now!


Edited by frank10

5 minutes ago, frank10 said:

I'll update when I get SQLite to work... (I posted my adapted SQL code below; if you see some error... maybe the dir path, maybe some "\" problem?)

Add some error checking. Maybe you don't have sqlite3.exe installed. Download a recent version from sqlite.org.



15 hours ago, jchd said:

Add some error checking. Maybe you don't have sqlite3.exe installed. Download a recent version from sqlite.org.

The problem is exactly that...

I put sqlite3.exe in @ScriptDir and also in System32,

but it still doesn't find it...
Yet the files are there:

 
Local $files = _FileListToArray ( @scriptdir , "sqlite*", 1)
Consolewrite(_ArrayToString($files, @crlf))


->12:02:02 AU3Check ended.rc:1
>Running:(3.3.14.5):C:\Program Files (x86)\AutoIt3\autoit3.exe "c:\AutoitProgs\TEST__autoitScripts___.au3"  /errorstdout
+>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop.
Load + 2xSort: 5664.183
3
sqlite3.def
sqlite3.dll
sqlite3.exe
+>12:02:14 AutoIt3.exe ended.rc:0
+>12:02:14 AutoIt3Wrapper Finished.
Process exited with code 0

 

Edited by frank10

48 minutes ago, frank10 said:

I put sqlite3.exe in @scriptdir and also  in system32

Maybe you downloaded the x64 version?


