Jump to content

FileReadtoArray Help(Solved)


Recommended Posts

I am unable to read a file to an array using a delimiter.   I get error 3  File lines have different numbers of fields.  So ok,  I then use $FRTA_INTARRAYS,  create an array of arrays.

I use this code to find the arrays that have more rows.  Then combine the needed rows so that all the arrays will have the same row count. 

Global $aArray
_FileReadToArray ('C:\Temp\temp1.csv', $aArray, $FRTA_INTARRAYS , ",")
For $i = 1 To UBound($aArray) -1
        $Row = UBound($aArray[$i], $UBOUND_ROWS)
    
    If $Row <> 10 Then
        $6Row = _ArrayToString($aArray[$i], "", 6, 6)
        $7Row = _ArrayToString($aArray[$i], "", 7, 7)
        _ArrayInsert($aArray[$i], 6, $6Row & $7Row, 0)
        _ArrayDelete($aArray[$i], 7)
        _ArrayDelete($aArray[$i], 7)
        EndIf
    Next

So now my questions:

1 How do I get the array of arrays back to a single array ?

2 Or am I looking at this totally the wrong way?  

Thank you for any help pointing me in the right direction

 
Link to comment
Share on other sites

do we get the csv or portion thereof, and maybe desired output for the combined rows?

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

iamtheky  I have attached a testdata.csv file.  It is generic but in the same format that I am working with.  In this test file you will see on line 7 is where the issue is.

Someone at the creation of the file thinks it is ok to put a comma between firstname and MiddleInit.  That is causing the error 3 when FileReadToArray is used with a delimiter of , 

The correct format will always be 10 columns.  That extra comma causes that line to have 11 columns. 

Thank you for your help

testdata.csv

Link to comment
Share on other sites

Here is some more information as my mind is a bit slow today.  The file pulls into excel correctly because LastName FirstName MiddleInit is in quotes.

Example:  "LastName FirstName MiddleInit"    Just thinking here maybe some regex to replace the comma between the quotes with a space?

Link to comment
Share on other sites

  • Moderators

xcaliber13,

As there can be no definitive answer as to where an extra delimiter might be within the data on a given line, I can only suggest that when the problem arises you loop through the returned arrays and detect which one is giving you the wrong value (use UBound to do this). Once you have found the line(s) which do not have the correct number of delimiters you can manually amend the line(s) question within the original CSV file.

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Link to comment
Share on other sites

i'd do it like this:

 

#include<array.au3>
#include<file.au3>

local $aArray
_FileReadToArray('testdata.csv', $aArray, $FRTA_INTARRAYS , ",")
$col = 0

For $i = 0 to ubound($aArray) - 1
     $col = ubound($aArray[$i]) > $col ?  ubound($aArray[$i]) : $col
next

local $aFinal[0][$col]

For $j = 0 to ubound($aArray) - 1
    _ArrayAdd($aFinal , _ArrayToString($aArray[$j] , @CRLF) , 0 , @CRLF , ",")
Next

_ArrayDisplay($aFinal)

 

edit: you have an extra line in your testdata file, but you can sanitize your input however needed.

Edited by iamtheky

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

or this would be faster IMO

Global $aArray, $Row
_FileReadToArray ('test.txt', $aArray, $FRTA_INTARRAYS , ",")
Global $aFinal [UBound($aArray)][10]
For $i = 0 To UBound($aArray)-1
  $Row = $aArray[$i]
  For $j = 0 to Ubound($Row)-1
    $aFinal[$i][$j] = $Row[$j]
  Next
Next

_ArrayDisplay ($aFinal)

 

Link to comment
Share on other sites

changed a few things on the above. should be better.

; PREG-based CSV file parser.
; Copyright 2007, Ed Fletcher

; Modifications by FichteFoll, (02-2012):
;   - added $cEnclose parameter for defining the enclosing character (default: ")
;   - have _CSVGetColumn and _CSVGetRow use Const ByRef for the first param to save ressources
;   - allow _CSVGetColumn and _CSVGetRow to use negative numbers as index (parsing backwards)
;     and add some error handling for invalid parameters
;   - fix use of "Die" as debug function since this is not defined
;   - modified _CSVTest to accept the same parameters as _CSVReadFile

; #AutoIt3Wrapper_AU3Check_Parameters=-d -w 3 -w 4 -w 5 -w 6
#include-once

;===============================================================================
;
; Description:      Reads a CSV file into a 2D array
; Parameter(s):     $sPath       - Path to the CSV file to read
;                   $cSeparator  - Separator character, default is comma (,)
;                   $cEnclose    - Character used in enclosings, default is "
; Requirement(s):   None
; Return Value(s):  On Success - 2D CSV array
;                   On Failure - 0  and Set
;                                   @ERROR to:  1 - File not found/openable
;                                               2 - File read error
;                                               3 - CSV format error
; Author(s):        Ed Fletcher
; Note(s):          Pattern based on work by Jeffrey E. F. Friedl in
;                   "Mastering Regular Expressions, 2nd Edition"
;===============================================================================
Func _CSVReadFile( $path, $separator = ',', $enclose = '"' )

    ;; open the file and read the entire CSV dataset into one string.

    Local $hFile = FileOpen( $path, 0 )
    If $hFile == -1 Then
        SetError( 1 )
        Return 0
    EndIf

    Local $sRawData = FileRead( $hFile )
    If @error > 0 Then
        FileClose( $hFile )
        SetError( 2 )
        Return 0
    EndIf

    FileClose( $hFile )

    If StringRight($sRawData, 2) = @CRLF Then $sRawData = StringTrimRight($sRawData, 2)

    ;; parse the string into an array of matched fields

    Local $pattern = '(?m)'                      ; multi-line search mode
    $pattern &= '\G(?:^|[' & $separator & '])'   ; start of line or start of field
;~  $pattern &= '(?:'                            ; one of two options:
    $pattern &= '(?: *'                            ; one of two options: ; mod at https://www.autoitscript.com/forum/topic/50680-simple-quick-csv-parser/?do=findComment&comment=1338047
    $pattern &= $enclose                         ;   a field starting with at double quote
    $pattern &= StringFormat('([^%s]*+(?:%s%s[^%s]*+)*+)', $enclose, $enclose, $enclose, $enclose)
    ;                                            ;   (quote-pairs and any non-quote chars)
    $pattern &= $enclose                         ;   a double quote ending the field
    $pattern &= '(\r?\n?)'                       ;   (any sort of line ending here?)
    $pattern &= '|'                              ; or:
    $pattern &= '([^"' & $separator & '\r\n]*+)' ;   (a simple CSV field, no quotes or commas)
    $pattern &= '(\r?\n?)'                       ;   (any sort of line ending here?)
    $pattern &= ')'                              ; note that we should have 4 captures per CSV element

    Local $aRawData = StringRegExp( $sRawData, $pattern, 4 )
    If @error <> 0 Then
        SetError( 3 )
        Return 0
    EndIf

    $sRawData = ''  ; we're done with this, and it might be large

    ; $aRawData is a 1D array containing every field in the CSV file.  Each element
    ; in $aRawData is an array of 5 strings, like so:
    ; 0 - all of the characters consumed while matching this field
    ; 1 - field contents, if the field was double quoted
    ; 2 - a line ending, if the field was double quoted and this is the end of the line
    ; 3 - field contents, if the field was *not* double quoted
    ; 4 - a line ending, if the field was *not* double quoted and this is the end of the line


    ;; pass through the results once to determine the number of rows and the max number of columns

    Local $i, $aMatch
    Local $colCount = 0, $maxCols = 0
    Local $rowCount = 0

    For $i=0 To UBound($aRawData)-1
        $aMatch = $aRawData[$i]

        If $colCount == 0 Then
            $rowCount += 1          ; we're looking at the first field on the current row
        EndIf

        $colCount += 1

        If $colCount > $maxCols Then
            $maxCols = $colCount    ; longest row so far...
        EndIf

        If $aMatch[2] <> '' OR (UBound($aMatch) > 3 AND $aMatch[4] <> '') Then
            $colCount = 0           ; row complete, we might start a new one
        EndIf
    Next

    ;; we now know how large to make our 2D output array

    Local $aCsvData[$rowCount][$maxCols]


    ;; finally, populate our output array

    Local $row = 0, $col = 0

    For $i=0 To UBound($aRawData)-1
        $aMatch = $aRawData[$i]

        If UBound($aMatch) > 3 AND $aMatch[3] <> '' Then
            ; It was a simple field, no processing required
            $aCsvData[$row][$col] = $aMatch[3]
        Else
            ; It was a quoted value, so take care of embedded double quotes
            $aCsvData[$row][$col] = StringReplace($aMatch[1], '""', '"')
        EndIf

        $col += 1

        ; now look for a line ending that ends the current data row
        If $aMatch[2] <> '' OR (UBound($aMatch) > 3 AND $aMatch[4] <> '') Then
            $row += 1
            $col = 0
        EndIf
    Next

    Return $aCsvData
EndFunc

;===============================================================================
;
; Description:      Pulls a single column out of a 2D array
; Parameter(s):     $aCSV      - 2D array to work with; Const ByRef
;                   $colNum    - Column index, 0-based;
;                                Negative numbers for backwards parsing are allowed
; Requirement(s):   None
; Return Value(s):  On Success - An array of columnar data
;                   On Failure - 0  and Set
;                                   @ERROR to:  1 - Dimension mismatch; only 2D arrays!
;                                               2 - $colNum is invalid
;                                               3 - $colNum exceeds column count
; Note(s):
;
;===============================================================================
Func _CSVGetColumn( Const ByRef $aCSV, $colNum )
    ; test array dimensions
    If UBound($aCSV, 0) <> 2 Then
        SetError( 1 )
        Return 0
    EndIf
    ; test second parameter for validity
    $colNum = Int($colNum) ; cast strings
    If Not IsInt($colNum) Then
        SetError( 2 )
        Return 0
    EndIf
    Local $aBounds[2] = [UBound($aCSV, 1), UBound($aCSV, 2)]
    ; test second parameter for validity (2)
    If $colNum < 0 Then $colNum = $aBounds[1] + $colNum
    If $colNum < 0 Or $colNum > ($aBounds[1] - 1) Then
        SetError( 3 )
        Return 0
    EndIf

    ; start with defining the return array
    Local $aColumn[$aBounds[0]]
    Local $i

    For $i=0 To $aBounds[0]-1
        $aColumn[$i] = $aCSV[$i][$colNum]
    Next

    Return $aColumn
EndFunc

;===============================================================================
;
; Description:      Pulls a single row out of a 2D array
; Parameter(s):     $aCSV      - 2D array to work with; Const ByRef
;                   $rowNum    - Row index, 0-based;
;                                 Negative numbers for backwards parsing are allowed
; Requirement(s):   None
; Return Value(s):  On Success - An array of row data
;                   On Failure - 0  and Set
;                                   @ERROR to:  1 - Dimension mismatch; only 2D arrays!
;                                               2 - $rowNum is invalid
;                                               3 - $rowNum exceeds column count
; Note(s):
;
;===============================================================================
Func _CSVGetRow( Const ByRef $aCSV, $rowNum )
    ; test array dimensions
    If UBound($aCSV, 0) <> 2 Then
        SetError( 1 )
        Return 0
    EndIf
    ; test second parameter for validity
    $colNum = Int($rowNum) ; cast strings
    If Not IsInt($rowNum) Then
        SetError( 2 )
        Return 0
    EndIf
    Local $aBounds[2] = [UBound($aCSV, 1), UBound($aCSV, 2)]
    ; test second parameter for validity (2)
    If $rowNum < 0 Then $rowNum = $aBounds[0] + $rowNum
    If $rowNum < 0 Or $rowNum > ($aBounds[0] - 1) Then
        SetError( 3 )
        Return 0
    EndIf

    ; start with defining the return array
    Local $aRow[$aBounds[1]]
    Local $i

    For $i=0 To $aBounds[1]-1
        $aRow[$i] = $aCSV[$rowNum][$i]
    Next

    Return $aRow
EndFunc

;===============================================================================
;
; Description:      Test function for _CSVReadFile()
; Parameter(s):     $sPath   - Path to the file to read, default is 'test.csv'
;                   $cSeparator  - Separator character, default is comma (,)
;                   $cEnclose    - Character used in enclosings, default is "
; Requirement(s):   None
; Return Value(s):  None
; Note(s):          Dumps out array data via ConsoleWrite()
;
;===============================================================================
Func _CSVTest( $file = "test.csv", $separator = ',', $enclose = '"' )
    Local $output = _CSVReadFile( $file, $separator, $enclose )

    If @error <> 0 Then
        ConsoleWriteError( "Error " & @error & " reading file [" & $file & "]" & @CR )
        Exit
    EndIf

    Local $i, $j
    For $i=0 To UBound($output)-1
        For $j=0 To UBound($output, 2)-1
            ConsoleWrite('[' & $output[$i][$j] & ']')
        Next
        ConsoleWrite(@CR)
    Next
EndFunc

If @ScriptName == 'LibCsv2_mod.au3' Then
    _CSVTest("testdata.csv")
EndIf

 

Follow the link to my code contribution ( and other things too ).
FAQ - Please Read Before Posting.
autoit_scripter_blue_userbar.png

Link to comment
Share on other sites

16 minutes ago, Nine said:

or this would be faster IMO

Seems it, but the problem of hardcoding a value for columns is exhibited in your example with the testdata.csv file xcaliber posted.

 

Edited by iamtheky

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

Link to comment
Share on other sites

the number of things I would swap for a redim penalty is zero.  I think _FRTA is the burden in this script and begets the second loop.

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

with that small sample including a redim at the end, it is still about 8 times faster.

if the actual file is very large, with millions of entries, the difference could be huge...

Link to comment
Share on other sites

  • xcaliber13 changed the title to FileReadtoArray Help(Solved)
3 hours ago, Nine said:

with that small sample including a redim at the end, it is still about 8 times faster.

if the actual file is very large, with millions of entries, the difference could be huge...

 

It is, but if you are going to lead with an assumption of columns and then reduce them, why have any loops at all?

Global $aFinal[0][11]
_ArrayAdd($aFinal , fileread('testdata.csv') , 0 , ',' , @CRLF)
_ArrayDisplay($aFinal)

 

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

1 hour ago, iamtheky said:

why have any loops at all?

Because we are not talking about number of lines, we are talking about speed of execution.

With your last proposal it is about the same speed as the previous one, instead of 8 times slower, it stands at about 7 times slower.  And there is no possible way of redim, unless you include a new loop to check max number of columns.   Lowest number of lines <> highest efficiency 

Link to comment
Share on other sites

Your words don’t make my words less true.  Conceptually the loop is unnecessary, and might want to check how many times I was concerned with speed.  I simply posited that there are multiple loops where 1 and potentially none could exist.

 

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

Another no-ReDim version:

#include <File.au3>

Local $s = FileRead("testdata.csv")
$s = StringStripWS($s, $STR_STRIPLEADING + $STR_STRIPTRAILING )     ; remove any superflous \R
$s = StringRegExpReplace($s, '("[^"\v]*"|[^",\v]*)(,)', '$1' & ChrW(0xFFFD))    ; comma separators -> 0xFFFD (Unicode "Not a Character")
Local $t = _TempFile('.\')
Local $h = FileOpen($t, $FO_OVERWRITE)
FileWrite($h, $s)
FileClose($h)

Local $aData
_FileReadToArray($t, $aData, $FRTA_NOCOUNT, ChrW(0xFFFD))
FileDelete($t)
_ArrayDisplay($aData)

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

was hoping regex would enter the fray,  my initial thoughts were something akin to:

#include<array.au3>

$arex = StringRegExp(fileread('testdata.csv') , "(.*?),|\s\s" , 3)
_ArrayDisplay($arex)

But the endgame is rarely a bigass 1D array with breaks.

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...