Limits to StringSplit?


I wrote a program that crashes at _ArrayAdd when given a 1 GB .csv file. I dug around and have finally come to the conclusion that my file breaks any StringSplit attempt, including those inside _ArrayAdd, causing this error message in the SciTE console:

!>HH:MM:SS AutoIt3.exe ended.rc:-1073741819

Can anyone confirm that this is what happens with a .csv this large?

1,065,804,064 Characters

58,320,000 Lines

Using this should crash it:

#include <Array.au3>
Local $sFilePath = FileOpenDialog("Open", "C:/", "All Files (*.*)") ; returns a path, not a file handle
Local $sString = FileRead($sFilePath)
Local $aArray[0][3]
_ArrayAdd($aArray, $sString, "", ",") ; Crashes here
_ArrayDisplay($aArray)

Crash File:

Link to google drive original file when it's done uploading.

Link to 7z compressed file when it's done compressing.

To be clear it should work on this file attached.

test.csv

Edited by Funtime60
Added Promised Links

That's a hard crash; chasing a hard crash can be the long goal. In the meantime...

-read chunks by get/set position (FileGetPos / FileSetPos)

-maybe some FileReadLine

-maybe let _FileReadToArray take a crack at it

-maybe csvsplit 
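
A minimal sketch of the FileReadLine route from the list above, assuming each line holds exactly three comma-separated fields as in the posted file. It only counts rows and splits one line at a time, so no huge string or array is ever built. The variable names are illustrative.

```autoit
; Sketch only: walk the CSV line by line instead of loading it whole.
Local $sPath = FileOpenDialog("Open", @ScriptDir, "CSV files (*.csv)")
If @error Then Exit ; dialog cancelled

Local $hFile = FileOpen($sPath, 0) ; 0 = read mode
If $hFile = -1 Then Exit MsgBox(16, "Error", "Cannot open " & $sPath)

Local $iRows = 0, $sLine, $aFields
While True
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; end of file or read error
    $aFields = StringSplit($sLine, ",") ; $aFields[0] holds the field count
    If $aFields[0] <> 3 Then ContinueLoop ; skip malformed lines
    $iRows += 1
    ; process $aFields[1], $aFields[2], $aFields[3] here
WEnd
FileClose($hFile)
ConsoleWrite("Rows read: " & $iRows & @CRLF)
```

This will be slow on 58 million lines, but it should not hit the array size limit because only one small array exists at a time.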

Edited by iamtheky

yup, csvsplit with its regex would probably win most races.

In help file, under Language Reference / Variables / Arrays:

You can use up to 64 dimensions in an Array. The total number of entries cannot be greater than 2^24 (16 777 216).

Above, entries actually mean array cells, not rows.
I just posted ticket #3701 in Trac (hard crashing is bad); I didn't locate a previous ticket but I'm unsure about discriminating terms to search for.
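
For the file in question, the limit quoted above works out as follows. This is just the arithmetic, using the row count from the first post and the 3 columns of the posted array:

```autoit
; Arithmetic only: compare the cells needed with the documented 2^24 limit.
Local Const $iMaxCells = 2 ^ 24 ; 16777216 per the help file
Local Const $iRows = 58320000, $iCols = 3 ; figures from the first post

ConsoleWrite("Cells needed: " & $iRows * $iCols & @CRLF) ; 58320000 * 3 = 174960000
ConsoleWrite("Max rows at 3 columns: " & Floor($iMaxCells / $iCols) & @CRLF) ; 5592405
```

So a [58320000][3] array needs roughly ten times more cells than AutoIt allows.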

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

@Funtime60
Try populating an SQLite database instead (look it up in the help file). Creating such a table will take significantly longer, but using it for any purpose will be fast, and it can grow to dozens of TB, provided you have the disk space.
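
A rough sketch of such an import using the standard SQLite.au3 UDF. The database file name, the table name and the column names (elevation, value, grp, idx) are made up for illustration, and it assumes sqlite3.dll is available to _SQLite_Startup and that every field is an integer:

```autoit
#include <SQLite.au3>

; Sketch only: table and column names below are hypothetical.
_SQLite_Startup()
If @error Then Exit MsgBox(16, "SQLite", "sqlite3.dll could not be loaded")
_SQLite_Open(@ScriptDir & "\elevation.db")
_SQLite_Exec(-1, "CREATE TABLE IF NOT EXISTS elevation (value INTEGER, grp INTEGER, idx INTEGER);")

Local $hFile = FileOpen(@ScriptDir & "\test.csv", 0) ; 0 = read mode
If $hFile = -1 Then Exit MsgBox(16, "Error", "Cannot open test.csv")

_SQLite_Exec(-1, "BEGIN;") ; a single transaction keeps the inserts fast
Local $sLine, $aF
While True
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; end of file or read error
    $aF = StringSplit($sLine, ",")
    If $aF[0] <> 3 Then ContinueLoop ; skip malformed lines
    ; Int() forces numeric values, so nothing unexpected ends up in the SQL text
    _SQLite_Exec(-1, "INSERT INTO elevation VALUES (" & Int($aF[1]) & "," & Int($aF[2]) & "," & Int($aF[3]) & ");")
WEnd
_SQLite_Exec(-1, "COMMIT;")

FileClose($hFile)
_SQLite_Close()
_SQLite_Shutdown()
```

Sorting then becomes a query (ORDER BY on whichever column matters) instead of an in-memory array sort.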

Thank you for pointing out this limit with arrays. I don't think that I'll mess with any SQLite database just yet. I think I may try to split the array into smaller component arrays, though that may not work, as I need to sort them all. In the end it may just become a limitation of my final product. (Not actually a product.) I think that just about solves this.

Edited by Funtime60
Edit 1: Grammar, Spelling, and Stuff

Splitting into chunks, sorting, then merging them (into what?) is going to be kind of a mess.

What are your sort criteria, and what kinds of searches/queries/operations will you need with that data?
Also, if you can give meaningful names to the 3 columns, that will help.

Right now I'm downloading the real file and I'll turn it into a DB, with some code to massage the data. The last part is which massage you want to apply ;-)

Using C# and VB Code in AutoIt through .NET Framework is another way to handle large files. I've tested some VB code with the large CSV-file with 58,320,000 rows:

Imports System
Imports System.IO

Class CSVClass
  Public Function CSVFunc() As Integer(,)
    'Load CSV file
    Console.WriteLine( "Load CSV file" )
    'Dim sPath As String = "test.csv"
    Dim sPath As String = "LARGE_elevation.csv"
    Dim aFile() As String = File.ReadAllLines( sPath )
    Dim iRows As Integer = aFile.Length
    Console.WriteLine( "Rows in file = {0}", iRows )

    'To 2d array of integers
    Console.WriteLine( "To 2d array of integers" )
    Dim aLine As Object()
    Dim aData(iRows-1,2) As Integer
    For i As Integer = 0 To iRows-1
      aLine = aFile(i).Split(",")
      For j As Integer = 0 To 2
        aData(i,j) = aLine(j)
      Next
    Next
    Console.WriteLine("First row: [{0}, {1}, {2}]", aData(0,0), aData(0,1), aData(0,2) )
    Console.WriteLine("Last  row: [{0}, {1}, {2}]", aData(iRows-1,0), aData(iRows-1,1), aData(iRows-1,2) )

    'Return 2d array slice
    Console.WriteLine( "Return 2d array slice" )
    iRows = 100000
    Dim aSlice(2,iRows-1) As Integer
    For i As Integer = 0 To iRows-1
      aSlice(0,i) = aData(i,0)
      aSlice(1,i) = aData(i,1)
      aSlice(2,i) = aData(i,2)
    Next
    Return aSlice
  End Function
End Class

#AutoIt3Wrapper_UseX64=y

#include <Array.au3>
#include "DotNetAll.au3"

Opt( "MustDeclareVars", 1 )

Example()

Func Example()
  Local $hTimer = TimerInit()
  Local $oNetCode = DotNet_LoadVBcode( FileRead( "CSVLoad.vb" ), "System.dll" )
  Local $oCSVClass = DotNet_CreateObject( $oNetCode, "CSVClass" )
  Local $aSlice = $oCSVClass.CSVFunc()
  ConsoleWrite( "Time = " & TimerDiff( $hTimer ) & @CRLF )
  _ArrayDisplay( $aSlice )
EndFunc

SciTE output:

Load CSV file
Rows in file = 58320000
To 2d array of integers
First row: [222425, 0, 0]
Last  row: [16721411, 10799, 5399]
Return 2d array slice
Time = 55018.3823743302

That's roughly a million rows per second (58,320,000 rows in about 55 seconds).

All code is in the 7z file: CSVLoad.7z

Edited by LarsJ

Nice way to code. Now the question remains: if the data is going to accumulate in large or huge chunks like this and if the OP needs to more or less regularly perform some querying or processing on global data, then a DB is certainly useful. If all this is a one-time shot then of course such direct fast processing is wonderful.

I've finally found time to create an SQLite DB. The 58320000 rows are in fact 10800 groups numbered 0 to 10799 (column 2), of 5400 entries each numbered 0 to 5399 (column 3). I take it the first column stores a value. The whole thing seems to be an array like $aValue[10800][5400]. SQL isn't the best data store for arrays, yet it all depends on which kind of processing will have to be done and what the future of the data is (transient, permanent, accumulative, etc.).

Edited by jchd
