baconaise Posted September 7, 2022 Share Posted September 7, 2022 Hello, I am trying to manipulate a large CSV file (Over 5 million lines with 50-300 characters per line). I thought I would start with something like: _FileReadToArray("C:\Folder\Data.csv", $My_Data_Array) But, I get a popup "Error allocating memory." as soon as I hit 2GB of RAM usage. Does anyone have a good way to approach such a task? Link to comment Share on other sites More sharing options...
kurtykurtyboy Posted September 7, 2022 Share Posted September 7, 2022 Could you give more context on what you are trying to do with the data? Do you need all of the data at once or can you process 1 line at a time? Here is a modified example from the help file to loop through each line of the file using FileReadLine. expandcollapse popup#include <FileConstants.au3> #include <MsgBoxConstants.au3> #include <WinAPIFiles.au3> Example() Func Example() Local $timer = TimerInit() ; Create a constant variable in Local scope of the filepath that will be read/written to. Local Const $sFilePath = "testFile.csv" ;~ Local $sStr = "" ;~ For $i=0 to 5000000 ;~ $sStr &= "00000000000000000000000000000000000000000000000" & @CRLF ;~ Next ;~ ; Create a temporary file to read data from. ;~ If Not FileWrite($sFilePath, $sStr) Then ;~ MsgBox($MB_SYSTEMMODAL, "", "An error occurred whilst writing the temporary file.") ;~ Return False ;~ EndIf ;~ ConsoleWrite(TimerDiff($timer) & @CRLF) ;~ $timer = TimerInit() ; Open the file for reading and store the handle to a variable. Local $hFileOpen = FileOpen($sFilePath, $FO_READ) If $hFileOpen = -1 Then MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.") Return False EndIf ; loop through the file lines Local $sFileLine While 1 ;read the next line $sFileLine = FileReadLine($hFileOpen) If @error Then ExitLoop ;process this line WEnd ConsoleWrite(TimerDiff($timer) & @CRLF) ; Close the handle returned by FileOpen. FileClose($hFileOpen) EndFunc ;==>Example baconaise 1 Link to comment Share on other sites More sharing options...
Musashi Posted September 7, 2022 Share Posted September 7, 2022 5 hours ago, baconaise said: Title : Memory limits and massive arrays But, I get a popup "Error allocating memory." as soon as I hit 2GB of RAM usage. You can find the limits for AutoIt e.g. here : LimitsDefaults Excerpt : Maximum number of elements for an array = 16,777,216 Your problem is apparently the default limit of 2,147,483,647 bytes in RAM. @kurtykurtyboy's suggestion to process the .csv file line by line is good, assuming that this is what you need. Another option would be to read the data into a SQLite database and process it with the respective SQL statements. baconaise, TheXman and kurtykurtyboy 3 "In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move." Link to comment Share on other sites More sharing options...
genius257 Posted September 7, 2022 Share Posted September 7, 2022 Hi @baconaise. Have you tried running your script as a 64-bit application? baconaise 1 My highlighted topics: AutoIt Package Manager, AutoItObject Pure AutoIt, AutoIt extension for Visual Studio Code Github: AutoIt HTTP Server, AutoIt HTML Parser Link to comment Share on other sites More sharing options...
baconaise Posted September 8, 2022 Author Share Posted September 8, 2022 22 hours ago, kurtykurtyboy said: Could you give more context on what you are trying to do with the data? Do you need all of the data at once or can you process 1 line at a time? Great question. Each CSV line is formatted: Wholesaler,Retailer,Product_Category,Model_Number,Sale_Date,Transaction_Date I am trying to build something that will tell me about each retailer's orders. I do not believe the sum of data for any individual retailer would encounter the memory cap, so there is probably some merit to the idea that I could read line by line to create an array for each retailer. I set up a test for that last night to see the speed at which it might process using something like this: Global $Lines_Of_Data_In_The_CSV = _FileCountLines("C:\Folder\Data.csv") Global $Sales_Data_Dump_CSV = FileOpen("C:\Folder\Data.csv", $FO_READ) Global $New_Array[$Lines_Of_Data_In_The_CSV] Global $GUI_Reading = GUICtrlCreateLabel($Lines_Of_Data_In_The_CSV,5,5,500,30) For $i = 1 to $Lines_Of_Data_In_The_CSV $New_Array[$i] = FileReadLine($Sales_Data_Dump_CSV,$i) GUICtrlSetData($GUI_Reading, $i & " " & $New_Array[$i]) Next In almost a full day, it's only about 10% of the way through the file. That is concerningly slow. Though, it is still better than doing it by hand. Link to comment Share on other sites More sharing options...
baconaise Posted September 8, 2022 Author Share Posted September 8, 2022 18 hours ago, genius257 said: Hi @baconaise. Have you tried running your script as a 64-bit application? Now there's an idea! I compiled it using the 64-bit Aut2Exe (because I don't know how to do that directly from Scite), and it did not give me the error. It's sitting here occupying 1,805 MB of ram. I'm a little curious what would make it happy to hold the data using 1.8GB in 64-bit while it gave me the 2-GB error in 32-bit. Link to comment Share on other sites More sharing options...
kurtykurtyboy Posted September 8, 2022 Share Posted September 8, 2022 For sure, the GuiCtrlSetData is slowing your loop to a crawl! Here is 1 way to sum one of the CSV columns for each retailer using a dictionary object to store and lookup the sums. This took between 90-160 seconds on my humble laptop to process my test file with 5 million lines. You'll most likely need to adjust the columns for your data, but hopefully this gets you going in the right direction. If you want to do something different with your data, let us know and we'll see what we can come up with. expandcollapse popup#include <FileConstants.au3> #include <MsgBoxConstants.au3> #include <WinAPIFiles.au3> #include <Array.au3> Example() Func Example() Local $timer = TimerInit() ; Create a constant variable in Local scope of the filepath that will be read/written to. Local Const $sFilePath = "testFile.csv" ;~ Local $sStr = "", $j = 1 ;~ For $j = 1 To 5 ;~ For $i = 0 To 1000000 ;~ $sStr &= "wholesaler" & $j & ",retailer" & $j & ",cat1,model1,saledate,transdate," & $j & @CRLF ;~ Next ;~ Next ;~ ; Create a temporary file to read data from. ;~ If Not FileWrite($sFilePath, $sStr) Then ;~ MsgBox($MB_SYSTEMMODAL, "", "An error occurred whilst writing the temporary file.") ;~ Return False ;~ EndIf ;~ ConsoleWrite(TimerDiff($timer) & @CRLF) ;~ $timer = TimerInit() ; Open the file for reading and store the handle to a variable. Local $hFileOpen = FileOpen($sFilePath, $FO_READ) If $hFileOpen = -1 Then MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.") Return False EndIf ;create dictionary object to hold the sums Local $oRetailerSums = ObjCreate("Scripting.Dictionary") Local $sFileLine, $aLineParts, $i = 0 ; loop through the file lines While 1 ;read the next line $sFileLine = FileReadLine($hFileOpen) If @error Then ExitLoop ;process this line $aLineParts = StringSplit($sFileLine, ",") If @error Then ContinueLoop ;add this value to the matching retailer sum $oRetailerSums.Item($aLineParts[2]) = $oRetailerSums.Item($aLineParts[2]) + $aLineParts[7] ;display line after every 100,000 lines to indicate progress $i += 1 If Mod($i, 100000) = 0 Then ConsoleWrite("Line " & $i & " " & $sFileLine & @CRLF) EndIf WEnd ConsoleWrite(TimerDiff($timer) & @CRLF) ; Close the handle returned by FileOpen. FileClose($hFileOpen) ;Create array of retailer/value pairs for display Local $aRetailersFound = $oRetailerSums.Keys() Local $aRetailerSums[UBound($aRetailersFound)][2] For $i = 0 To UBound($aRetailersFound) - 1 $aRetailerSums[$i][0] = $aRetailersFound[$i] $aRetailerSums[$i][1] = $oRetailerSums.Item($aRetailersFound[$i]) Next _ArrayDisplay($aRetailerSums) EndFunc ;==>Example Musashi and baconaise 1 1 Link to comment Share on other sites More sharing options...
genius257 Posted September 8, 2022 Share Posted September 8, 2022 17 hours ago, baconaise said: Now there's an idea! I compiled it using the 64-bit Aut2Exe (because I don't know how to do that directly from Scite), and it did not give me the error. It's sitting here occupying 1,805 MB of ram. I'm a little curious what would make it happy to hold the data using 1.8GB in 64-bit while it gave me the 2-GB error in 32-bit. Well there are a lot of deeper answers if you google it, but basically 32-bit application have 2GB available memory for the application and 64-bit have 8GB i seem to remember. So is it a pretty solution? No. It's a solution that does not solve your problem, should you allocate more again. But if your application never exceeds the 64-bit limit and you don't wish to waste time optimizing if not needed, then 64-bit is the solution baconaise 1 My highlighted topics: AutoIt Package Manager, AutoItObject Pure AutoIt, AutoIt extension for Visual Studio Code Github: AutoIt HTTP Server, AutoIt HTML Parser Link to comment Share on other sites More sharing options...
baconaise Posted September 9, 2022 Author Share Posted September 9, 2022 23 hours ago, kurtykurtyboy said: using a dictionary object to store and lookup I had never delved deeply enough into AutoIT (or any other language for that matter) to come across such things. What a fascinating rabbit-hole! Thank you for pointing me in such a direction. In case this has inspired anyone else to put on a little blue dress: https://www.autoitscript.com/autoit3/docs/intro/ComRef.htm Link to comment Share on other sites More sharing options...
kurtykurtyboy Posted September 9, 2022 Share Posted September 9, 2022 (edited) And if you want to look further into the dictionary object in particular.. See here. I used the dictionary in my example because it is a very fast an easy way to get a sort of associative array. Fast as in development and execution. You could accomplish the same thing by building and searching through regular arrays but it would be much more complicated and likely much slower, depending on how many keys your data ends up with. Edit: I feel obligated to mention there are several alternatives for associative arrays floating around. The native AutoIt map datatype is probably the way to go; I just have an old habit of using the scripting.dictionary object. Edited September 9, 2022 by kurtykurtyboy Link to comment Share on other sites More sharing options...
junkew Posted September 10, 2022 Share Posted September 10, 2022 Try to load in excel 64 bits and make a pivotgraph or table Try it with sqlite.au3 FAQ 31 How to click some elements, FAQ 40 Test automation with AutoIt, Multithreading CLR .NET Powershell CMDLets Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now