AspirinJunkie Posted December 21, 2020 (edited)

I made some performance tests to get a rough indication of which data structure is most appropriate and when. For this, I measured the time needed for adding and reading out at different numbers of elements. These are the results (beware of the logarithmic scale on the axes):

Especially the behavior of Map is interesting: at small numbers of elements the Map structure performs very well and scales quite linearly with the number of elements. But then (here at 10,000 elements) there is some sort of cut and the results get bad. Scripting.Dictionary also shows this behavior, but only at a larger number of elements.

Now for my question: I suspect that the AutoIt Map is internally implemented as a hash table and not with sorted trees. As far as I know, the hash algorithm used in a hash table will reach a point where it starts to produce a lot of collisions, which leads to extra treatment. This would explain the observed behavior. But because there is a difference between AutoIt Map, Scripting.Dictionary and System.Collections.Hashtable, I have to assume that these each use different hash algorithms. Does anybody know which exact hash algorithms are used internally for these?

hashtable.au3 DataStructureComparison.zip

Edited December 29, 2020 by AspirinJunkie: added pure AutoIt hash-table
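As a rough illustration of the suspected effect (in Python, purely as a sketch - this is not AutoIt's actual internals), a hash table with a fixed number of buckets degrades from constant-time to linear-time lookups once the chains grow:

```python
# Illustrative sketch: a chained hash table with a FIXED number of buckets.
# Once the element count far exceeds the bucket count, every lookup has to
# scan a long chain, so per-lookup cost grows linearly with n.

class FixedBucketTable:
    def __init__(self, n_buckets=256):
        self.buckets = [[] for _ in range(n_buckets)]

    def put(self, key, value):
        self.buckets[hash(key) % len(self.buckets)].append((key, value))

    def get(self, key):
        # probes counts how many chain entries we touch before finding the key
        probes = 0
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            probes += 1
            if k == key:
                return v, probes
        return None, probes

t = FixedBucketTable()
for i in range(100_000):
    t.put(f"key{i}", i)

# 100,000 elements over 256 buckets -> chains of ~390 entries,
# so a successful lookup scans ~195 entries on average
avg = sum(t.get(f"key{i}")[1] for i in range(100_000)) / 100_000
print(avg)
```

This matches the "cut" in the chart: below a few thousand elements the chains are short and lookups feel constant-time; far above it, the chain scan dominates.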
JockoDundee Posted December 21, 2020

Nice. It would be interesting to see where SQLite would place in the mix.

Code hard, but don’t hard code...
AspirinJunkie (Author) Posted December 21, 2020

Sure - I can add scenarios for SQLite. But not until Wednesday (feel free to add them yourself and run the tests). Anyway - if SQLite is used from AutoIt as a simple key-value data structure, I don't expect very impressive results. In my scenario I add and read every element separately. This means a _SQLite_Exec() resp. _SQLite_Query() each time. I expect that the performance benefits of SQLite's optimized internal implementation would be eaten up again by the overhead of the wrapper functions.

Since SQLite is based on b-trees for its indexes, the runtime behavior during readout should be quite similar to that of the "2D array binary search" (with an absolute offset). For building the data structure, I lack information on how this is implemented internally in SQLite. But since the index has to be built in parallel and kept sorted, I don't expect any miracles here either.

Of course, this is mainly due to the scenario being unfavorable for SQLite: elements are added and read individually instead of in bulk. Especially if you can use the ".import" function with _SQLite_SQLiteExe(), then hardly anything is as fast as that. But this is not my scenario here.

But if you want to include external databases in the comparison, the really interesting candidates would be the specialized in-memory key-value databases like Redis. These can be expected to perform at a high level even with large amounts of data.
JockoDundee Posted December 22, 2020

4 hours ago, AspirinJunkie said:
But if you want to include external databases in the comparison, the really interesting candidates would be the special mem-cached key-value databases like redis.

I mention SQLite only because it is a good reference. Though, loading aside, at some number of rows it might beat the colliding hashes, perhaps?
JockoDundee Posted December 22, 2020

7 hours ago, AspirinJunkie said:
Since sqlite is based on b-trees for its indexes, the runtime behavior during readout should be quite similar to that of the "2D array binary search" (with an absolute offset).

Yes, I would agree. FWIW, the 2D array seems to be holding up well at the end, whereas the Map looks like it’s gonna go asymptotic.
AspirinJunkie (Author) Posted December 22, 2020

8 hours ago, JockoDundee said:
Though, loading aside, at some number of rows, it might beat the colliding hashes, perhaps?

Yes, of course. There are no collisions, because its index is a sorted structure (a b-tree) with one element per index entry. As the number of elements increases, the number of iterations required to find a particular element also increases due to the levels in the tree. However, this is quite harmless and lookups should still be fast even with a really large number of elements.

5 hours ago, JockoDundee said:
the 2D array seems to be holding well at the end, whereas the Map looks like it’s gonna go asymptotic.

Really? Map also looks really stable to me. Hm - OK, this is just my subjective view based on the optics. I didn't do a statistical test for linearity (btw I made a statistics UDF to realize such tests 😅).

To get deeper into the functionality of hash tables I wrote a hash-table UDF in pure AutoIt as an exercise for myself. Here I learned that you need a good balance between the maximum number of collisions per index and as little resizing as possible when the number of elements increases. If the base array is big enough, then a large number of collisions is unlikely. Because there are so many optimization parameters - not only the hash function used - I slowly begin to understand why the 3 hash tables here perform so differently.
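The trade-off described above can be sketched like this (in Python, as a stand-in for the pure-AutoIt UDF - the thresholds are made up for the example): track the load factor, and double the bucket array once the average chain would get too long.

```python
# Minimal chained hash table with resizing - a sketch of the
# collisions-vs-resizing trade-off, not the actual hashtable.au3 code.

class HashTable:
    def __init__(self, n_buckets=16, max_load=4.0):
        self.buckets = [[] for _ in range(n_buckets)]
        self.count = 0
        self.max_load = max_load  # max average chain length before resizing

    def _index(self, key, buckets):
        return hash(key) % len(buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key, self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._resize()

    def _resize(self):
        # doubling keeps the amortized insert cost constant, at the price
        # of rehashing every element each time the table grows
        new = [[] for _ in range(len(self.buckets) * 2)]
        for bucket in self.buckets:
            for k, v in bucket:
                new[self._index(k, new)].append((k, v))
        self.buckets = new

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key, self.buckets)]:
            if k == key:
                return v
        return default

ht = HashTable()
for i in range(10_000):
    ht.put(f"k{i}", i)
print(ht.get("k9999"), len(ht.buckets))  # table grew, so chains stayed short
```

A higher `max_load` means fewer (expensive) resizes but longer chain scans; a lower one means the opposite - which is exactly why different implementations with different parameters can diverge so much in the benchmark.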
Melba23 (Moderator) Posted December 22, 2020

AspirinJunkie,

I went back looking at some old private threads and found that AutoIt Maps currently use the "djb2" hash function and a 256-entry hash table.

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind
AspirinJunkie (Author) Posted December 22, 2020 (edited)

13 hours ago, Melba23 said:
AutoIt Maps currently use the "djb2" hash function

Thank you - I think I will play a little bit with this. Funnily enough, I once stumbled across a collision generator for djb2, where the following statement was made about djb2: "So never ever use this function in hash table." 😅

13 hours ago, Melba23 said:
AutoIt Maps currently use [...] a 256 entry hash table.

Fixed size - no resizing? That would explain the measured behavior. I think linked lists are used for the sub-structures - right? Even with a perfect hash/index function the result is that when you get an element, you have an average linear scan over 2 elements at size 1,000; 20 elements at size 10,000; 200 elements at size 100,000, and so on. So if you read every single element of such a hash table with 1,000,000 elements, you have to iterate over at least 2 billion elements...

There are surely reasons for this concrete implementation. But knowing this particular detail helps me a lot to understand why it behaves as we see. I suspect that Scripting.Dictionary takes a quite similar approach - but with a bigger hash-table size (as stated >>here<<, Scripting.Dictionary does resize). And System.Collections.Hashtable - puh - either the size is also fixed but significantly larger, or they resize dynamically.

Edited December 23, 2020 by AspirinJunkie: found one source that stated that Scripting.Dictionary does resize
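The djb2 function itself is tiny. A Python transcription of the textbook algorithm (assuming the common 32-bit variant - whether AutoIt seeds or truncates it exactly like this is an assumption):

```python
# djb2 (Daniel J. Bernstein): hash = hash * 33 + c, seeded with 5381.
# 32-bit masked variant; mapping onto a 256-entry table as described
# above is then just a modulo.

def djb2(data: bytes) -> int:
    h = 5381
    for c in data:
        h = (h * 33 + c) & 0xFFFFFFFF  # keep the value in 32 bits
    return h

def bucket(key: str, table_size: int = 256) -> int:
    # hypothetical helper: index into a fixed-size bucket array
    return djb2(key.encode()) % table_size

print(djb2(b""))       # the bare seed for empty input
print(bucket("foo"))
```

Note that with a 256-entry table, `% 256` keeps only the low 8 bits of the hash, so the quality of djb2's low bits matters more than its full 32-bit distribution.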
Melba23 (Moderator) Posted December 22, 2020

AspirinJunkie,

I am not a Dev and so have no further knowledge of why or how this particular function or table size was chosen. They only came up in a much wider-ranging private conversation while discussing Maps.

M23
jchd Posted December 23, 2020

It would be interesting to find out what F14 gives: https://engineering.fb.com/2019/04/25/developer-tools/f14/

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must-have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
AspirinJunkie (Author) Posted December 23, 2020 (edited)

@JockoDundee I added SQLite. The results are as expected, because the scenario here is horrible for SQLite. In real life nobody would (at least I hope so) use single inserts and selects for such an amount of elements if it's possible to avoid it. That's why I also added the results for one single insert of all elements, just to show the huge difference, even if it doesn't match the actual scenario here.

12 hours ago, Melba23 said:
I am not a Dev and so have no further knowledge of why or how this particular function or table size were chosen.

I already realized that - but this forum is named "technical discussion", so why not discuss, even if we are not the devs. And your hint already gave me a much deeper understanding. I learned - that was my intention here - so many thanks to you!

@jchd Oh, that looks interesting. Especially the discussion of which parameters of a hash table have what effect. The library is C++ - I don't think it is useful to write a wrapper for it and use it in AutoIt. But maybe I'll make some tests with it in C++ sometime.

Edit: I said I wrote my own hash table in pure AutoIt for a better understanding. I added this structure to the comparison and I am quite surprised how well it does. It's even better than AutoIt Maps and Scripting.Dictionary (reading only) at 1,000,000 elements.

Edited December 23, 2020 by AspirinJunkie
jchd Posted December 23, 2020

Of course SQLite is slower than any other {key, value} library since its overhead for enforcing ACID properties is paramount. Bulk inserts can be made faster by using the following guidelines:

Use a memory DB
If it is not possible to use a memory DB, enclose all inserts in one exclusive transaction
Do not add indices and triggers to your target table; do that after all inserts
Have your input data cleaned up so you can spare constraints
Group new rows in chained insert statements: insert into T ... values (...), (...), (...), ..., (...)
Raise memory and cache limits as fit
Some fine-tune pragmas can speed things up

Even with these steps SQLite can't compete with other libraries on the bulk insert test; the overhead of invoking primitives thru DllCall makes things worse. SQLite isn't bad, it's only poorly suited for the job.
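For comparison outside AutoIt (so without the DllCall overhead mentioned above), the first, second and fifth guidelines can be sketched with Python's sqlite3 module; the table name and row contents are made up for the example:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")  # guideline: use a memory DB
con.execute("create table T (k integer primary key, v text)")

rows = [(i, "abcdef") for i in range(1, 10_001)]

t0 = time.perf_counter()
with con:  # guideline: all inserts inside one transaction (commits on exit)
    # executemany is the API-level equivalent of a chained
    # "insert into T values (...), (...), ..." statement
    con.executemany("insert into T values (?, ?)", rows)
dt = time.perf_counter() - t0

count = con.execute("select count(*) from T").fetchone()[0]
print(count, round(dt, 3))
```

Even in-process, the single-transaction bulk path is orders of magnitude faster than issuing 10,000 autocommitted single-row inserts, which is consistent with the benchmark numbers later in this thread.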
JockoDundee Posted December 23, 2020

4 hours ago, jchd said:
Of course SQLite is slower than any other {key, value} library since its overhead for enforcing ACID properties is paramount.

It’s my fault SQLite was invited to the party. The question wasn’t if it would be slower, just how much slower it would be. And to see at what size the performance might converge (10,000,000 rows?). Useful also as a sanity check on the other values. That said, I agree there may be things to improve its performance.

4 hours ago, jchd said:
Use a memory DB
Have your input data cleaned up so you can spare constraints
Group new rows in chained insert statements: insert into T ... values (...), (...), (...), ..., (...)
Raise memory and cache limits as fit
Some fine-tune pragmas can speed things up

Agree.

4 hours ago, jchd said:
If not possible to use a Memory DB, enclose all inserts in one exclusive transaction

Disagree. Not fair to the one-at-a-time loading scenario set forth by AJ. Also, it's not clear that it would benefit performance to be part of a million-row transaction, since that would imply substantial journaling writes in order to allow rollback.

4 hours ago, jchd said:
Do not add indices and triggers to your target table, do that after all inserts

Agree it’s faster; disagree it’s allowed by the loading rules.

Another thing that *may* speed up SQLite is the use of the WITHOUT ROWID clause, see here.
jchd Posted December 23, 2020

8 minutes ago, JockoDundee said:
Also, not clear that it would benefit performance to be part of a million row transaction, since that would imply substantial journaling writes, in order to rollback.

Try it on a sample and you'll see 😉 Divide your bulk insert into, say, chunks of 10k rows and compare with 10k individual inserts...

9 minutes ago, JockoDundee said:
Another thing that *may* speed up SQLite is the use of the WITHOUT ROWID clause

Most probably not. In a w/o rowid table you must declare a primary key on your own, and either it's an integer (making no difference from a rowid) or something else, which is going to be slower to handle than an integer.
AspirinJunkie (Author) Posted December 23, 2020

4 hours ago, jchd said:
Bulk inserts can be made faster by using the following guidelines:

The results in the first post are already based on these (code is attached). I used an in-memory DB with index creation after all elements are inserted (beware not to define a primary key - this would lead to an implicit index). Also the cache size is increased.

4 hours ago, jchd said:
SQLite isn't bad, it's only poorly suited for the job.

My words exactly.
JockoDundee Posted December 23, 2020

2 hours ago, jchd said:
Try it on a sample and you'll see 😉 Divide your bulk insert in, say, chunks of 10k rows and compare with 10k individual inserts...

No need to test; I wouldn’t "enclose all inserts in one exclusive transaction" or use a bunch of 10,000-row transactions. Both are way slower than just turning off transactions altogether.
jchd Posted December 23, 2020 (edited)

Are you sure? ;-))

#include <SQLite.au3>

Const $SQLITE_DLL = "C:\SQLite\bin\sqlite3.dll" ; <-- Change to the location of your sqlite dll

; Init sqlite
_SQLite_Startup($SQLITE_DLL, False, 1)
If @error Then Exit MsgBox($MB_ICONERROR, "SQLite Error", "Unable to start SQLite. Check existence of DLL")
OnAutoItExitRegister(_SQLite_Shutdown)
ConsoleWrite("SQlite version " & _SQLite_LibVersion() & @LF & @LF) ; optional!

Local $hDB = _SQLite_Open()
If @error Then Exit
OnAutoItExitRegister(_SQ3Close)

; test 1: individual autocommitted inserts
_SQLite_Exec($hDB, "create table T (k integer primary key, v char)")
Local $t = TimerInit()
For $i = 1 To 10000
    _SQLite_Exec($hDB, "insert into T values (" & $i & ", 'abcdef')")
Next
Local $dt = TimerDiff($t)
ConsoleWrite("10k inserts, no transaction, individual inserts took " & $dt / 1000 & "s" & @LF)
_SQLite_Exec($hDB, "drop table T")

; test 2: individual inserts inside one exclusive transaction
_SQLite_Exec($hDB, "create table T (k integer primary key, v char)")
$t = TimerInit()
_SQLite_Exec($hDB, "begin exclusive")
For $i = 1 To 10000
    _SQLite_Exec($hDB, "insert into T values (" & $i & ", 'abcdef')")
Next
_SQLite_Exec($hDB, "end")
$dt = TimerDiff($t)
ConsoleWrite("10k inserts, one transaction, individual inserts took " & $dt / 1000 & "s" & @LF)
_SQLite_Exec($hDB, "drop table T")

; test 3: one chained insert statement (implicit transaction)
_SQLite_Exec($hDB, "create table T (k integer primary key, v char)")
$t = TimerInit()
Local $s = "insert into T values (1, 'abcdef')"
For $i = 2 To 10000
    $s &= ",(" & $i & ", 'abcdef')"
Next
_SQLite_Exec($hDB, $s)
$dt = TimerDiff($t)
ConsoleWrite("10k inserts, one (implicit) transaction, chained inserts took " & $dt / 1000 & "s" & @LF)
_SQLite_Exec($hDB, "drop table T")

Func _SQ3Close()
    _SQLite_Close($hDB)
EndFunc   ;==>_SQ3Close

SQlite version 3.32.3
10k inserts, no transaction, individual inserts took 5.9761773s
10k inserts, one transaction, individual inserts took 5.6808805s
10k inserts, one (implicit) transaction, chained inserts took 0.0446869s

The beta is even much faster (I added a test with a Map):

SQlite version 3.32.3
10k inserts, no transaction, individual inserts took 1.2966753s
10k inserts, one transaction, individual inserts took 1.2246717s
10k inserts, one (implicit) transaction, chained inserts took 0.0381699s
10k inserts in a Map took 0.0103734s

Edited December 23, 2020 by jchd
JockoDundee Posted December 23, 2020

43 minutes ago, jchd said:
Are you sure? ;-))

I’ll check it out. Thanks!
JockoDundee Posted December 23, 2020

1 hour ago, jchd said:
Are you sure? ;-))

What I meant by "transactions off" was this:

SQlite version 3.27.2 PRAGMA journal_mode=OFF
100k inserts, no transaction, individual inserts took 8.3460208s
100k inserts, one transaction, individual inserts took 8.2206863s
100k inserts, one (implicit) transaction, chained inserts took 0.2979496s

SQlite version 3.27.2 WITHOUT PRAGMA journal_mode=OFF
100k inserts, no transaction, individual inserts took 8.5634186s
100k inserts, one transaction, individual inserts took 8.8829882s
100k inserts, one (implicit) transaction, chained inserts took 0.3040778s

Only a minor difference, admittedly. I bumped the 10K to 100K to even out the averaging a bit. Not familiar with "chained inserts" - is it standard SQL? Regardless, it looks to be quite a winner!
AspirinJunkie (Author) Posted December 24, 2020 (edited)

12 hours ago, JockoDundee said:
what i meant by "transactions off" was this:

This pragma (also PRAGMA synchronous=OFF) has an effect mainly on file databases. For in-memory databases like this one, its effect disappears. Change the SQLite open statement to _SQLite_Open(@DesktopDir & "\Test.db") and you will see the effect.

12 hours ago, JockoDundee said:
Not familiar with "chained inserts", is it standard SQL?

I do not have the standard here, but I firmly assume yes. This construct can be found in almost all big databases (except Oracle - they have an extra "insert all" statement). And since PostgreSQL has it as well, there is a lot to be said for it being in the standard too.

Edited December 24, 2020 by AspirinJunkie