Jump to content

Recommended Posts

Posted (edited)

Hi all,

I need to convert a .csv file that after conversion will fill an array for futher usage.

function used/not used: stringsplit(), stringregexp(),stringinstr(), stringreplace()

I know the best way is to use stringsplit given a delimiter (,). But this is not possible here since:

Here is some lines fo the .csv

Game Title,Quantity,For Sale,Platform,Rating,Number,New
After Life,5,2,PC,PG1,23332,Yes
Max Payne,4,0,Wii,PG3,109,No
Fifa 2009,11,2,DS,PG2,976,Yes
"Hellgate, London",3,2,PC,PG2,112,No
Logitech Mouse,12,4,mouse, ,No
Pinball 2010,1,1,PS3,PG1,065,No

As you can see:

"Hellgate, London",3,2,PC,PG2,112,No

will not work with stringsplit() since the "," it's inside the name of the object

Logitech Mouse,12,4,mouse, ,No

Here I have blank spaces between "," and the field "Rating" is missing. This is due to the .csv format.

When Platform <> PC,Wii,DS,PS1,PS2,PS3,XB,X3 then I need to set Platform = Hardware and Rating = the value found in place of platform (in this case Platform = N, Rating = Hardware)

Once this is finished and I have my array built, I need to add 2 more columns to it.

I get the columns _arraysearch()ing between the just created array and another file (.xlm). If i found the item I need I add the prices, if not, skip.

So now I use this code (working):

$exportfile = @ScriptDir & "\filelist.csv"
Dim $mydb
Global $file2ftp = @TempDir & "\temp2db.txt"
Dim $csv_list[1][7]
Global $rows
 
Func _exp2db_x()
    Local $iCounter = 0, $array2db[10000]
    Local $user2db
    Local $k = 0
    Dim $csv_array
    _FileReadToArray($exportfile, $csv_array)
    _FileReadToArray($price_file, $mydb)
    _ArraySort($mydb)
    $begin = TimerInit()
    For $i = 1 To UBound($csv_array) - 1
        If StringInStr($csv_array[$i], ", ,") Then ; i'm looking for a ", ," in lines
            $string = StringReplace($csv_array[$i], ", ,", ",@,", 1) ; if found I change it
        Else
            $string = $csv_array[$i]
        EndIf
        If StringInStr($string, ", ") Then $string = StringReplace($string, ", ", "^", 1); If I find "," inside the name I need to change it
        $name_s = _StringBetween($string, "", ",")
        $name = _cleanfordb($name_s[0]) ; Need to reconvert name with "," and remove " if present
        $string = StringReplace($string, ",", "*", 1) ; since I can't use stringsplit() i need different separators
        $string = StringReplace($string, ",", "#", 1) ; since I can't use stringsplit() i need different separators
        $forsale = StringRegExp($string, "#(.*?),", 1) ; for sale
        If $forsale[0] > 0 Then
            $k +=1
            $string = StringReplace($string, ",", "_", 1) ; since I can't use stringsplit() i need different separators
            $platform = StringRegExp($string, "_(.*?),", 1) ;platform
            $rating = StringRegExp($string, ",(.*?),", 1) ;rating
            If $platform[0] <> "PC" And $platform[0] <> "DS" And $platform[0] <> "Wii" And $platform[0] <> "XB" And $platform[0] <> "X3" And $platform[0] <> "Junk" Then
                $rating[0] = "Hardware" ; fixed value
                $platform[0] = "N" ; fixed value
            EndIf
            If $platform[0] = "Junk" Then
                $rating[0] = $platform[0]
                $platform[0] = "None"
            EndIf
            $new = StringRight($string, 2) ; just lasr 2 chars for last item in row
            If $new = "es" Then $new = "Yes"
            ReDim $csv_list[$k + 1][7]
            $csv_list[$k][0] = $name
            $csv_list[$k][1] = $platform[0]
            If $rating[0] <> "@" Then $csv_list[$k][2] = $rating[0]
            $csv_list[$k][3] = $new
            $csv_list[$k][4] = $forsale[0]
            ; now I build the string to check vs the main db (I need a partial search)
            $string = "<z:row c0='" & _normalize_db($csv_list[$k][0]) & "' Rating='" & $csv_list[$k][2] & "' New='" & $csv_list[$k][3] & "' Platform='" & $csv_list[$k][1] ; _normalize_db just change exotic chars into xml_readable sequence
            $check_string = _ArraySearch($mydb, $string, 39, UBound($mydb), 0, 1) ; first 38 lines is the xml header so I can skip
            If $check_string <> -1 Then $csv_list[$k][5] = _sellp($check_string) ; I can have 2 possible prices in main DB, best price and street price. If bestprice = 0 I'll use StreetPrice
        EndIf
    Next
    ConsoleWrite("Time to complete: " & TimerDiff($begin) & @CRLF)
EndFunc   ;==>_exp2db_x
 
Func _cleanfordb($k)
    If StringInStr($k, "^") Then $k = StringReplace($k, "^", ", ") ; ^
    If StringInStr($k, '"') Then $k = StringMid($k, 2, StringLen($k) - 2)
    Return $k
EndFunc   ;==>_cleanfordb
 
Func _sellp($cc)
    $p = StringRegExp($mydb[$cc], "c6='(.*?)' c7",1)
    If $p[0] = 0 Then $p = StringRegExp($mydb[$cc], "c7='(.*?)'/>",1)
    Return $p[0]
EndFunc   ;==>_sellp

Time to complete is 137,000ms, 2'17'', for 1,243 lines of csv (and 30,000 lines of main DB)

I need to work, during normal phases, with 10,000 to 30,000 lines so time to complete will be really too high.

I ask you, great programmers, if there is a way to optimize this routine (i'm not able to modify the csv since I get it this way from the Store Application

Thanks!

M.

Edited by marko001
  • Moderators
Posted

marko001,

Try this and see if it is faster: ;)

#include <Array.au3> ; Just for display

; Simulate reading the file into an array
Global $aLines[7] = [ _
            'Game Title,Quantity,For Sale,Platform,Rating,Number,New', _
            'After Life,5,2,PC,PG1,23332,Yes', _
            'Max Payne,4,0,Wii,PG3,109,No', _
            'Fifa 2009,11,2,DS,PG2,976,Yes', _
            '"Hellgate, London",3,2,PC,PG2,112,No', _
            'Logitech Mouse,12,4,mouse, , ,No', _
            'Pinball 2010,1,1,PS3,PG1,065,No']

For $i = 0 To UBound($aLines) - 1

    ; Locate the first separator comma
    $iFirst_Comma = StringInStr($aLines[$i], ",", 2, -6)
    ; Create an array from the remaining sections
    $aParts = StringSplit(StringMid($aLines[$i], $iFirst_Comma + 1), ",")
    ; Repalce the count with the first section
    $aParts[0] = StringMid($aLines[$i], 1, $iFirst_Comma - 1)
    ; Display the result
    _ArrayDisplay($aParts)
    ; You can now use the array as necessary

Next

I had to add another blank field to the "Logitech Mouse line - I assume that the real file has the correct number of delimiters in each line? :graduated:

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Posted

@ Melba23: Nope, this is a problem. I can just see that once I have the file, but it comes as you see from the original program.

If the product is not a boxed game title the .csv is written that way

Game Title,Quantity,For Sale,Platform,Rating,Number,New

Logitech Mouse,12,4,mouse, ,No

Title Not Imported, 0,0,Junk, ,No

That's for I use

[font=arial,helvetica,sans-serif]          If $platform[0] <> "PC" And $platform[0] <> "DS" And $platform[0] <> "Wii" And $platform[0] <> "XB" And $platform[0] <> "X3" And $platform[0] <> "Junk" Then
                $rating[0] = "Hardware" ; fixed value
                $platform[0] = "N" ; fixed value
            EndIf
            If $platform[0] = "Junk" Then
                $rating[0] = $platform[0]
                $platform[0] = "None"
            EndIf[/font]
Posted

$csv_array = _ParseCSV($exportfile)
    For $i = 1 To UBound($csv_array) - 1
        if $csv_array[$i][3] = "Junk" Then
            $csv_array[$i][3] ="None"      
            $csv_array[$i][4] = "Junk"      
            $csv_array[$i][6] = "No"    
     elseIf $csv_array[$i][3] <> "PC" And $csv_array[$i][3] <> "Wii" And $csv_array[$i][3] <> "DS" And $csv_array[$i][3] <> "XB" And $csv_array[$i][3] <> "X3" Then
            $csv_array[$i][3] ="N"          
            $csv_array[$i][4] = "Hardware"      
            $csv_array[$i][6] = "No"    
        EndIf
    Next

It's a bit tricky but solves it and allows me to use _ParseCSV()

  • Moderators
Posted

marko001,

If the csv lines are "regularly irregular ( by which I mean that if there is a single blank field it should really be 2) then we could do this: ;)

#include <Array.au3> ; Just for display

; Simulate reading the file into an array
Global $aLines[7] = [ _
            'Game Title,Quantity,For Sale,Platform,Rating,Number,New', _
            'After Life,5,2,PC,PG1,23332,Yes', _
            'Max Payne,4,0,Wii,PG3,109,No', _
            'Fifa 2009,11,2,DS,PG2,976,Yes', _
            '"Hellgate, London",3,2,PC,PG2,112,No', _
            'Logitech Mouse,12,4,mouse, ,No', _
            'Pinball 2010,1,1,PS3,PG1,065,No']

For $i = 0 To UBound($aLines) - 1

    ; Add the missing comma if required <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $sLine = StringReplace($aLines[$i], ", ,", ", , ,")
    ; Locate the first separator comma
    $iFirst_Comma = StringInStr($sLine, ",", 2, -6)
    ; Create an array from the remaining sections
    $aParts = StringSplit(StringMid($sLine, $iFirst_Comma + 1), ",")
    ; Repalce the count with the first section
    $aParts[0] = StringMid($sLine, 1, $iFirst_Comma - 1)
    ; Display the result
    _ArrayDisplay($aParts)
    ; You can now use the array as necessary

Next

Any use? :graduated:

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Posted

Melba. first of all let me say that I always liked your smileys here and there :graduated:

You always put the proper smiley in the right position ...

Then yes, this solves and dramatically reduce first part. (I'll edit later the post to report timing)

What is then terrific is this:

$string = "<z:row c0='" & _normalize_db($csv_array[$i][0]) & "' Rating='" & $csv_array[$i][4] & "' New='" & $csv_array[$i][6] & "' Platform='" & $csv_array[$i][3] ; _normalize_db just change exotic chars into xml_readable sequence
            $check_string = _ArraySearch($mydb, $string, 39, UBound($mydb), 0, 1) ; first 38 lines is the xml header so I can skip
            If $check_string <> -1 Then $csv_price[$i][0] = _sellp($check_card) ; I can have 2 possible prices in main DB, best price and street price. If bestprice = 0 I'll use StreetPrice (_sellp() is in first post)
; I also created after the _ParseCSV() a new array with dim $csv_price[ubound($csv_array)][2]

This solution gives me 2 arrays but index is the same so I don't mind merging them, I can use them anyway. Problem is that _arraysearch() is soooo slow and I can't use _arraybinarysearch() since I need a partial search (,1) and I don't know exact lenght of $check_string (game titles vary)

Posted

You could modify the _ArrayBinarySearch function to do a partial search by using StringInString, look at the way that _ArraySearch does the partial searches for hints on how to go about it.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Posted (edited)

INCREDIBLE! Simply incredible .... from 173,000 ms to 278ms ! (1,000 records vs 40,000 lines of xml to check)

The updated _arraybinarysearch_ext() is this one. If someone feel it interesting it can be added to array.au3

; #FUNCTION# ====================================================================================================================
; Name...........: _ArrayBinarySearch_Ext
; Description ...: Uses the binary search algorithm to search through a 1-dimensional array.
; Syntax.........: _ArrayBinarySearch(Const ByRef $avArray, $vValue[, $iStart = 0[, $iEnd = 0][, $iPartial = 0]])
; Parameters ....: $avArray  - Array to search
;                 $vValue   - Value to find
;                 $iStart   - [optional] Index of array to start searching at
;                 $iEnd  - [optional] Index of array to stop searching at
;               $iPartial - [optional] If set to 1, executes a partial search
; Return values .: Success - Index that value was found at
;                 Failure - -1, sets @error to:
;                 |1 - $avArray is not an array
;                 |2 - $vValue outside of array's min/max values
;                 |3 - $vValue was not found in array
;                 |4 - $iStart is greater than $iEnd
;                 |5 - $avArray is not a 1 dimensional array
; Author ........: Jos van der Zande <jdeb at autoitscript dot com>
; Modified.......: Marko001 - Added Partial Search
; Remarks .......: When performing a binary search on an array of items, the contents MUST be sorted before the search is done.
;                 Otherwise undefined results will be returned.
; Related .......: _ArrayFindAll, _ArraySearch
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ArrayBinarySearch_Ext(Const ByRef $avArray, $vValue, $iStart = 0, $iEnd = 0, $iPartial = 0)
    If Not IsArray($avArray) Then Return SetError(1, 0, -1)
    If UBound($avArray, 0) <> 1 Then Return SetError(5, 0, -1)
 
    Local $iUBound = UBound($avArray) - 1
 
    ; Bounds checking
    If $iEnd < 1 Or $iEnd > $iUBound Then $iEnd = $iUBound
    If $iStart < 0 Then $iStart = 0
    If $iStart > $iEnd Then Return SetError(4, 0, -1)
 
    Local $iMid = Int(($iEnd + $iStart) / 2)
 
    If $avArray[$iStart] > $vValue Or $avArray[$iEnd] < $vValue Then Return SetError(2, 0, -1)
 
    ; Search
    If Not $iPartial Then
        While $iStart <= $iMid And $vValue <> $avArray[$iMid]
            If $vValue < $avArray[$iMid] Then
                $iEnd = $iMid - 1
            Else
                $iStart = $iMid + 1
            EndIf
            $iMid = Int(($iEnd + $iStart) / 2)
        WEnd
    Else
        While $iStart <= $iMid And StringInStr($avArray[$iMid], $vValue) = 0
            If $vValue < $avArray[$iMid] Then
                $iEnd = $iMid - 1
            Else
                $iStart = $iMid + 1
            EndIf
            $iMid = Int(($iEnd + $iStart) / 2)
        WEnd
    EndIf
 
    If $iStart > $iEnd Then Return SetError(3, 0, -1) ; Entry not found
 
    Return $iMid
EndFunc   ;==>_ArrayBinarySearch_Ext
Edited by marko001
Posted

I'll probably add that, or something similar to my version of _ArrayBinarySearch, just because it's missing a key piece of functionality without being able to do a partial search. The version I have is a 2D version of _ArrayBinarySearch which will search a specific subitem/"column" of a 1 or 2D array.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

  • Moderators
Posted

marko001,

173,000 ms to 278ms

Now that it what I call a speed increase! :graduated:

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Posted

I tried now with 24,700 records (vs. 40,000 lines of .xml):

Conversion finished (it took 6.81 secs)

I slightly modified the script with redim($csv_array[ubound($csv_array)][8] so I can add prices directly in the array. Some lines have price = "" (but I expected it, no problem)

I added also (after the previous timing test)

$idx = UBound($csv_array)
    For $x = $idx - 1 to 1 Step -1 ; delete lines with empty prices
        If $csv_array[$X][7]= "" Then
            _ArrayDelete($csv_array,$x)
            $idx -=1
        EndIf
    Next

this sucks, it moves the timedifference to 127,000 secs.

since I now throw the array in an INSERT to fill a remote DB it's maybe better skip lines during the insert phase? Or can I clean the array in a faster way?

  • Moderators
Posted

marko001,

ReDim is one of the slowest AutoIt functions, so avoid it if you can. I would strongly recommend ignoring any lines that do not have a price during the early parsing phase and just not entering them into the array rather than deleting them later. :graduated:

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Posted

I wonder if a StringRegExp() might get rid of a lot of the gyrations you had to put in the program because StringSplit() can't skip over a delimiter within quotes. I'm sure someone could improve upon this:

#include <Array.au3>
$string = '"Hell' & Chr(39) & 's Gate, London",3,2,PC,PG2,112,No'
MsgBox(1,"",$string)
$array = StringRegExp($string, "((?:[^,\x22]|\x22[^\x22]*\x22)+)", 3)
_ArrayDisplay($array)
Posted (edited)

@Melba

I'll take a look at the whole script to follow your hint with redim, to remove it where possible.,

Relating your proposal, I simply skipped lines when creating the query for the insert:

[b]If $csv_array[$x][7] > 0 Then[/b]
                    $query_part2 &= "('','" & shopname
                    $query_part2 &= "','" & _readyforquery($csv_array[$x][0]) ; just to return me '' if the name contains a ' (otherwise query messes up)
                    $query_part2 &= "','" & $csv_array[$x][3]
                    $query_part2 &= "','" & $csv_array[$x][4]
                    $query_part2 &= "','" & $csv_array[$x][6]
                    $query_part2 &= "','" & $csv_array[$x][2]
                    $query_part2 &= "','" & $csv_array[$x][7]
                    $query_part2 &= "','" & $csv_array[$x][8]
                 $query_part2 &= "'),"
                EndIf

Edit: 11 14 secs to execute the query (2 queries of batch_size of 1,000 each) on 1,500 records.

Query it's optimized at maximum (i think...)

Phew, this is the end (of this part...)

Edited by marko001
  • 3 weeks later...
Posted (edited)

Sorry for bumping an old thread. I have just made a CSV to array function, although I may be able to make some improvements. It hasn't really been tested for speed or anything, so I don't know how it compairs. It does read incomplete rows, and ignores inline commas and parses double quotes as expected. It only uses Redim when a larger number of items occur within a row, which is normallly only once, since it reads the column headings first.

Edited by czardas

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...