DXRW4E Posted March 4, 2013 Share Posted March 4, 2013 It's too slow man, it's too slow! It works fine for 4-5 entries, but I am dealing here almost always with more then a 10 000 entries... I do not have a clear idea of what's going on, but in my experience, more or less Local $i_log = 2 ; log.xlsx reading start ponit Do ; loop $IP = $aArray_log[$i_log][1] ; taking IP from log.xlsx $temp = StringSplit($IP,'.') ; splits up a string into substrings $decIP = ($temp[1]*256*256*256)+($temp[2]*256*256)+($temp[3]*256)+$temp[4] ; converting IP to decimal value, more info https://www.ultratools.com/tools/decimalCalc For $i_geo = 2 To $Max_geo ; geo.xlsx reading start ponit, Loop If $decIP >= $aArray_geo[$i_geo][1] And $decIP <= $aArray_geo[$i_geo][2] Then $dummy = 1 ; commparing decimal IP with values from geo.xlsx and define walue just to script go on Next $i_log = $i_log + 1 Until $i_log = $Max_log ; Do until read all rows in log.xlsx this code should proceed, an average of 1000-10000 item per second, so the maximum will take 10 seconds, and not 150 seconds Link to comment Share on other sites More sharing options...
James Posted March 4, 2013 Share Posted March 4, 2013 Yes. Blog - Seriously epic web hosting - Twitter - GitHub - Cachet HQ Link to comment Share on other sites More sharing options...
tempman Posted March 4, 2013 Author Share Posted March 4, 2013 Ok, I pull the part of code that slow me down and I adapted to work as stand alone to give you the whole picture (geoip.rar). On my Intel® Pentium® DualCore G630 2.7GHz, 4GB DDR3 1333 MHz for that job it took 142 seconds...geoip.rar Link to comment Share on other sites More sharing options...
DXRW4E Posted March 4, 2013 Share Posted March 4, 2013 (edited) First, I work in an old pc x32, very slow (everything is fine, but I want to say is not a PC where you can do test ect ect), with Windows 8 & Office 2013 ect ectAutoit + Excel object does not work well, here I can not open the geo.xlsx, waited 5 minutes and nothing hmmmm it seems that Excel works (CPU 80%) mahhhh nothing seems to want a very long time to open the geo.xlsx, instead _XLSXReadToArray() will serve more or less 30 seconds to open geo.xlsxI left in geo.xlsx only 260 Rows, and Autoit + Excel object opened it in 22 seconds, instead _XLSXReadToArray() opened it in 1.22 secondsthe file geo.xlsx are present 180,000 rows and not 10,000 ehhh, so the loop that you run them and equal to 110.697.900, autoit seems to work almost 300 items per second (in my opinion AutoIt works Very Very fast, they're giving for each loop 1000 array to control ehhhhh), so serve 600 seconds to finish the jobI am not an expert, but for what they saw or generally autoit and very fast, the .NET is not faster in some loop about AutoIt, so I doubt that even C++++ will run this loop in 5 seconds hmmmmmmmmmhowever, code more understandable should be so#include <_XLSXReadToArray.au3> #include <array.au3> Local $inColum, $dIP, $t $t = TimerInit() $aGeo = _XLSXReadToArray(@ScriptDir & "\geo.xlsx", 0, 0, 0) ;~ If @Error Or @Extended < 3 Then ;~ ;ect ect ect ;~ EndIf $aLog = _XLSXReadToArray(@ScriptDir & "\log.xlsx", 0, 0, 0) ;~ If @Error Then ;~ ;ect ect ect ;~ EndIf $inColum = @Extended ReDim $aLog[$aLog[0][0] + 1][$inColum + 1] $t = TimerDiff($t) MsgBox(0, "Time Difference", $t) $t = TimerInit() For $i = 2 To $aLog[0][0] $aIP = StringSplit($aLog[$i][0], ".") $dIP = ($aIP[1] * 256 * 256 * 256) + ($aIP[2] * 256 * 256) + ($aIP[3] * 256) + $aIP[4] For $y = 2 To $aGeo[0][0] If $dIP >= $aGeo[$y][0] And $dIP <= $aGeo[$y][1] Then $aLog[$i][$inColum] = $aGeo[$y][2] Next Next $t = TimerDiff($t) MsgBox(0, "Time Difference", $t) _ArrayDisplay($aLog)Ciao. Edited March 4, 2013 by DXRW4E Link to comment Share on other sites More sharing options...
tempman Posted March 4, 2013 Author Share Posted March 4, 2013 I tested your script and I get 259.378 sec. So you're telling me that my 142 seconds is the fastest solution?!? Link to comment Share on other sites More sharing options...
DXRW4E Posted March 4, 2013 Share Posted March 4, 2013 (edited) I tested your script and I get 259.378 sec. So you're telling me that my 142 seconds is the fastest solution?!? I do not think the script in geoip.rar expandcollapse popup#include <Excel.au3> $Log= @ScriptDir & "\log.xlsx" $oExcel_log= _ExcelBookOpen($Log, 0) $aArray_log = _ExcelReadSheetToArray($oExcel_log, 1, 1, 0, 1) $Max_log = $aArray_log[0][0] $Geo= @ScriptDir & "\geo.xlsx" $oExcel_geo = _ExcelBookOpen($Geo, 0) $aArray_geo = _ExcelReadSheetToArray($oExcel_geo, 1, 1, 0, 3) $Max_geo = $aArray_geo[0][0] Local $begin = TimerInit() Local $i_log = 2 Do $IP = $aArray_log[$i_log][1] $temp = StringSplit($IP,'.') $decIP = ($temp[1]*256*256*256)+($temp[2]*256*256)+($temp[3]*256)+$temp[4] For $i_geo = 2 To $Max_geo If $decIP >= $aArray_geo[$i_geo][1] And $decIP <= $aArray_geo[$i_geo][2] Then _ExcelWriteCell($oExcel_log, $aArray_geo[$i_geo][3], $i_log, 2) Next $i_log = $i_log + 1 Until $i_log = $Max_log + 1 Local $dif = TimerDiff($begin) MsgBox(0, "Time Difference", $dif) ; 142 sec <<<<<<< TO SLOW!!! _ExcelBookSave($oExcel_Log) _ExcelBookClose($oExcel_Log, 1, 0) $oExcel_log.Application.Quit _ExcelBookClose($oExcel_geo, 0, 0) $oExcel_geo.Application.Quit Exit is more faster than #include <_XLSXReadToArray.au3> #include <array.au3> Local $inColum, $dIP, $t $t = TimerInit() $aGeo = _XLSXReadToArray(@ScriptDir & "\geo.xlsx", 0, 0, 0) ;~ If @Error Or @Extended < 3 Then ;~ ;ect ect ect ;~ EndIf $aLog = _XLSXReadToArray(@ScriptDir & "\log.xlsx", 0, 0, 0) ;~ If @Error Then ;~ ;ect ect ect ;~ EndIf $inColum = @Extended ReDim $aLog[$aLog[0][0] + 1][$inColum + 1] $t = TimerDiff($t) MsgBox(0, "Time Difference", $t) $t = TimerInit() For $i = 2 To $aLog[0][0] $aIP = StringSplit($aLog[$i][0], ".") $dIP = ($aIP[1] * 256 * 256 * 256) + ($aIP[2] * 256 * 256) + ($aIP[3] * 256) + $aIP[4] For $y = 2 To $aGeo[0][0] If $dIP >= $aGeo[$y][0] And $dIP <= $aGeo[$y][1] Then $aLog[$i][$inColum] = $aGeo[$y][2] Next Next $t = TimerDiff($t) MsgBox(0, "Time Difference", $t) _ArrayDisplay($aLog) anyway is not autoit that works slow but the same code that performs more than 100 million of loop sorry for my english Ciao. Edited March 4, 2013 by DXRW4E Link to comment Share on other sites More sharing options...
jchd Posted March 5, 2013 Share Posted March 5, 2013 (edited) For the record, I downloaded the large version of the geoip free offer, which includes country, region, city, lattitude, longitude, postal code and area code. It contains 2,261,104 IPv4 ranges for 403,081 locations. I loaded it in SQLite Expert as two tables, converted the full decimal IP values to classical xxx.yyy.zzz.ttt format using an update statement. The resulting DB with its coumpound index is 405 MiB. When I do a simple search like select IPfrom, IPto, Country, City from ipblocks B natural join iplocation L where '109.015.194.154' between ipfrom and ipto; I get the expected result IPfrom IPto country city --------------- --------------- ------- ----------- 109.015.194.000 109.015.194.255 FR La Couronne in ~3.5 s I bet that using the more simple {IP range, country} geoip file, the search will be way faster. It was easy to verify this obvious assumption. Using the geo file you posted, I loaded it into a single table, still using Expert. Again I converted decimal IPs to readable string format (even if I know it will be slower). The resulting table of 178545 rows and its size is 18 Mib. Unsurprisingly, the above query produces the same result in ~235 ms Using only decimal IP values the time decreases to 225 ms. In my biaised view the small gain doesn't justify the burden and inconvenience of full decimal IP unreadable values. EDIT: I don't believe that for routine use loading this small (mostly read-only) DB in :memory: would be much beneficial. It is small enough to be fully kept in both OS and SQLite caches after only few queries, so the only difference would be to avoid double cacheing. Edited March 5, 2013 by jchd This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
tempman Posted March 5, 2013 Author Share Posted March 5, 2013 The guide line to my judgment what is fast and what is not was this .Net app Apache Log Viewer, which analyze Apache logs and it load my log file (access_log-2013-01-18.txt) with 620 IP's and all other information from it in matter of seconds in form like this:It use GeoIP base in form of GeoIP.dat and GeoIPv6.dat files.Since that app not completely satisfy my needs, I decide to adjust it and I succeed in all the cases except in GeoIp loading speed...access_log-2013-01-18.txt Link to comment Share on other sites More sharing options...
sentry07 Posted March 5, 2013 Share Posted March 5, 2013 (edited) Reading 180,000 rows from an Excel spreadsheet using this method is incredibly slow. I converted the spreadsheet into a CSV. Reading in the file and parsing it into an array using StringSplit and StringRegExp takes around 3 seconds. Obviously the bulk of the time you're spending is in this enormous loop. There are tons of array search optimizations you can put in that would help you not have to search the whole array. For example: First off, the array is already sorted for you. You should be using this to your advantage. One timesaver I came up with off the top of my head is an array index. I saved the first value (16777216) and every time the row's low value was greater than a multiple of that number, I saved that position in the Geo array. This means later when I'm searching for the value, I know a good place to start (arrayIndex[Floor(decIP / firstvalue)]) and I should only have to search a fraction of the total array. If you do the math: (total rows / (last low value / first low value)), which in this case is a maximum of about 800 rows per log entry. Second, you need to look at the information you're searching for. There are a lot of duplicate IPs. Save what you've found and look through those first before looping through the big loop. After optimizing your code with only the above improvements, I have cut the time down to this: Geo File Read Time: 0.14s Geo File Parse Time: 2.95s Log Parse Time: 0.09s Total Time: 3.18s IPs Found: 620 Edited March 5, 2013 by sentry07 Link to comment Share on other sites More sharing options...
tempman Posted March 7, 2013 Author Share Posted March 7, 2013 Since you presentate your achieved times, it's obvious that you modify a script. Are you willing to share your script with us? Link to comment Share on other sites More sharing options...
sentry07 Posted March 7, 2013 Share Posted March 7, 2013 You just asked if it could run faster. If I give you my code, you don't learn. j/k...knock yourself out. GeoIP.zip tempman 1 Link to comment Share on other sites More sharing options...
tempman Posted March 7, 2013 Author Share Posted March 7, 2013 (edited) As i told before in this topic, I am not a full time programmer, I use AutoIt occasionally to make my full time job easier. So the script that I wrote was a peak of my AutoIt knowledge. Thank you very much, this is more than I expected for! Edited March 7, 2013 by tempman Link to comment Share on other sites More sharing options...
jchd Posted March 7, 2013 Share Posted March 7, 2013 Here's my contribution, using the included SQLite DB (which contains data from the lite version of GeoIP, just IP range and country). Since you say you may have to process a large number of lines in your logs, I believe the approach used herein is beneficial: it extracts all IPs from the log file, removes duplicates and sorts them in order to use a binary search later. From the timing displayed, the lookup in the database is fairly fast.What was the slow part of your initial code (the geo IP lookup) boils down to circa 2 ms for every distinct IP address in the log file. Overall processing time takes less than 2.5 s now for the same 621 entries log file.Now beware that the IP geo data seems unreliable or at least questionable. For instance the lite version of geoip says 159.253.130.4 is in Croatia, the large version (with city and more details) says it belongs to "SoftLayer Dutch Holdings BV" and other sites like http://www.geoiptool.com/fr/?IP=159.253.130.4 pretend it's in Netherland.Anyway, here's the self-contained ZIP. There is little to no error checking. The country name is simply appended at every line. You may want to reformat the log lines to suit your actual needs.SQLiteGeoIP.zip This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
Moderators Melba23 Posted March 7, 2013 Moderators Share Posted March 7, 2013 sentry07 and jchd, Service above and beyond the call of duty from both of you there - thank you very much for having made the effort! M23 Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind Open spoiler to see my UDFs: Spoiler ArrayMultiColSort ---- Sort arrays on multiple columnsChooseFileFolder ---- Single and multiple selections from specified path treeview listingDate_Time_Convert -- Easily convert date/time formats, including the language usedExtMsgBox --------- A highly customisable replacement for MsgBoxGUIExtender -------- Extend and retract multiple sections within a GUIGUIFrame ---------- Subdivide GUIs into many adjustable framesGUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView itemsGUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeViewMarquee ----------- Scrolling tickertape GUIsNoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxesNotify ------------- Small notifications on the edge of the displayScrollbars ----------Automatically sized scrollbars with a single commandStringSize ---------- Automatically size controls to fit textToast -------------- Small GUIs which pop out of the notification area Link to comment Share on other sites More sharing options...
sentry07 Posted March 7, 2013 Share Posted March 7, 2013 Bah, database querying. That's cheating. Link to comment Share on other sites More sharing options...
tempman Posted March 8, 2013 Author Share Posted March 8, 2013 Now beware that the IP geo data seems unreliable or at least questionable. For instance the lite version of geoip says 159.253.130.4 is in Croatia, the large version (with city and more details) says it belongs to "SoftLayer Dutch Holdings BV" and other sites like http://www.geoiptool.com/fr/?IP=159.253.130.4 pretend it's in Netherland.You have this misfortune to bump to one of the outdated GeoIP sites.IP 159.253.130.4 is really in Croatiahttps://apps.db.ripe.net/search/query.html?searchtext=159.253.130.4#resultsAnchorhttp://whois.domaintools.com/159.253.130.4http://www.infosniper.net/index.php?ip_address=159.253.130.4http://myip.ms/info/whois/159.253.130.4As your script concerning, i didn't have time to try it, I feedback you as soon as possible.Thank you in advance for your contribution! Link to comment Share on other sites More sharing options...
jchd Posted March 8, 2013 Share Posted March 8, 2013 That's bad luck: this site was the first Google.fr came up with and I wanted to double-check a few IPs using it. Their own IP is now in my host file so that I won't make the same blemish again. Sorry for FUD. This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
tempman Posted March 8, 2013 Author Share Posted March 8, 2013 Never mind about that, your SQLiteGeoIP work like a charm! Which app you used to convert csv to db3? If it was SQLite Expert Professional, how did you manage to import csv base in form of: IPfrom IPto Country 002.000.000.000 002.015.255.255 France When I try to import I get this: IPfrom IPto Country 2.0.0.0 2.15.255.255 France because of that my Traced_access_log-2013-01-18.txt have less resolved IP's Link to comment Share on other sites More sharing options...
jchd Posted March 8, 2013 Share Posted March 8, 2013 (edited) Yes, there's the little catch that common format of IPv4 doesn't carry leading zeroes, which are needed for correct collation. Without leading zeroes, collation is guaranteed to get wrong results, so don't even try.I imported using Expert (my all-time favorite SQLite companion) then used a little extension of mine to reformat the IPs with leading zeroes.Of course you can as well do the parsing/reformatting with AutoIt code.Here's the statement I used, where ipmin and ipmax are the decimal values loaded from the CSV file and ipfrom, ipto are two columns with IPv4 as strings as you see them in my example DB:update geo set ipfrom =printf('%03lli', cast(ipmin / (256 * 256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmin % (256 * 256 * 256)) / (256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmin % (256 * 256)) / 256 as integer)) || '.' ||printf('%03lli', ipmin % 256) ,ipto =printf('%03lli', cast(ipmax / (256 * 256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmax % (256 * 256 * 256)) / (256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmax % (256 * 256)) / 256 as integer)) || '.' ||printf('%03lli', ipmax % 256)The printf() extension function comes from the SQLite extension discussed and its loading procedure for use from AutoIt Search unifuzz author jchd in the forum to find more related posts. If you use it be sure to read and understand the lengthy comments at the head of its C source.Feel free to chime if you hit a wall or need any guidance regarding this.EDIT: I now remember than Expert freeware doesn't include extension autoloading. You'll have to load it manually thru an SQL tab before use. Be aware that loading the extension overloads several internal functions and collations besides adding its own set. The overloading versions merely cover unaccentuation of a large subset of Unicode but should be 99.9% compatible with vanilla functions (e.g. german 'ß' and friends: 'ß' LIKE 'ss' is now true). I guess that those linguistic changes don't interfere with your use case. Edited March 8, 2013 by jchd This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
GMK Posted March 8, 2013 Share Posted March 8, 2013 (edited) There probably isn't a person who cares, but I took this as a challenge to see how fast I could get Excel to do this and it's still not as fast as jchd's solution, but I cut it down to under 5 seconds, anyway. water's is required:expandcollapse popup#include <Excel ReWrite.au3> #include <Array.au3> $oExcel = _Excel_Open(False) $Log = @ScriptDir & "\log.xlsx" $oExcel_log = _Excel_BookOpen($oExcel, $Log) $aArray_log = _Excel_RangeRead($oExcel, $oExcel_log) $Geo = @ScriptDir & "\geo.xlsx" $oExcel_geo = _Excel_BookOpen($oExcel, $Geo) $aArray_geo = _Excel_RangeRead($oExcel, $oExcel_geo) _Excel_BookClose($oExcel, $oExcel_geo, False) Global $aiWritten $begin = TimerInit() For $i_log = 1 To UBound($aArray_log) - 1 If Not IsArray($aiWritten) Or _ArraySearch($aiWritten, $i_log) = -1 Then $IP = $aArray_log[$i_log] $temp = StringSplit($IP, '.') $decIP = (Int($temp[1]) * 256 * 256 * 256) + (Int($temp[2]) * 256 * 256) + (Int($temp[3]) * 256) + Int($temp[4]) $aiSearch = _ArrayFindAll($aArray_log, $IP) For $i_geo = 0 To UBound($aArray_geo) - 1 If $decIP >= Int($aArray_geo[$i_geo][0]) And $decIP <= Int($aArray_geo[$i_geo][1]) Then For $iFound = 0 To UBound($aiSearch) - 1 _Excel_RangeWrite($oExcel, $oExcel_log, Default, $aArray_geo[$i_geo][2], $aiSearch[$iFound] + 1, 2) If Not IsArray($aiWritten) Then Global $aiWritten[1] = [$aiSearch[$iFound]] Else _ArrayAdd($aiWritten, $aiSearch[$iFound]) EndIf Next ExitLoop EndIf Next EndIf Next $dif = Round(TimerDiff($begin) / 1000, 2) MsgBox(0, "Time Difference", $dif) _Excel_BookSave($oExcel, $oExcel_log) _Excel_BookClose($oExcel, $oExcel_log) _Excel_Close($oExcel)P.S. You'll need my edit of the page__st__120#entry1060576 function. Edited March 12, 2013 by GMK Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now