marko001 Posted September 28, 2011 Posted September 28, 2011 (edited) Hi all, I need to convert a .csv file that after conversion will fill an array for futher usage. function used/not used: stringsplit(), stringregexp(),stringinstr(), stringreplace() I know the best way is to use stringsplit given a delimiter (,). But this is not possible here since: Here is some lines fo the .csv Game Title,Quantity,For Sale,Platform,Rating,Number,New After Life,5,2,PC,PG1,23332,Yes Max Payne,4,0,Wii,PG3,109,No Fifa 2009,11,2,DS,PG2,976,Yes "Hellgate, London",3,2,PC,PG2,112,No Logitech Mouse,12,4,mouse, ,No Pinball 2010,1,1,PS3,PG1,065,No As you can see: "Hellgate, London",3,2,PC,PG2,112,No will not work with stringsplit() since the "," it's inside the name of the object Logitech Mouse,12,4,mouse, ,No Here I have blank spaces between "," and the field "Rating" is missing. This is due to the .csv format. When Platform <> PC,Wii,DS,PS1,PS2,PS3,XB,X3 then I need to set Platform = Hardware and Rating = the value found in place of platform (in this case Platform = N, Rating = Hardware) Once this is finished and I have my array built, I need to add 2 more columns to it. I get the columns _arraysearch()ing between the just created array and another file (.xlm). If i found the item I need I add the prices, if not, skip. So now I use this code (working): expandcollapse popup$exportfile = @ScriptDir & "\filelist.csv" Dim $mydb Global $file2ftp = @TempDir & "\temp2db.txt" Dim $csv_list[1][7] Global $rows Func _exp2db_x() Local $iCounter = 0, $array2db[10000] Local $user2db Local $k = 0 Dim $csv_array _FileReadToArray($exportfile, $csv_array) _FileReadToArray($price_file, $mydb) _ArraySort($mydb) $begin = TimerInit() For $i = 1 To UBound($csv_array) - 1 If StringInStr($csv_array[$i], ", ,") Then ; i'm looking for a ", ," in lines $string = StringReplace($csv_array[$i], ", ,", ",@,", 1) ; if found I change it Else $string = $csv_array[$i] EndIf If StringInStr($string, ", ") Then $string = StringReplace($string, ", ", "^", 1); If I find "," inside the name I need to change it $name_s = _StringBetween($string, "", ",") $name = _cleanfordb($name_s[0]) ; Need to reconvert name with "," and remove " if present $string = StringReplace($string, ",", "*", 1) ; since I can't use stringsplit() i need different separators $string = StringReplace($string, ",", "#", 1) ; since I can't use stringsplit() i need different separators $forsale = StringRegExp($string, "#(.*?),", 1) ; for sale If $forsale[0] > 0 Then $k +=1 $string = StringReplace($string, ",", "_", 1) ; since I can't use stringsplit() i need different separators $platform = StringRegExp($string, "_(.*?),", 1) ;platform $rating = StringRegExp($string, ",(.*?),", 1) ;rating If $platform[0] <> "PC" And $platform[0] <> "DS" And $platform[0] <> "Wii" And $platform[0] <> "XB" And $platform[0] <> "X3" And $platform[0] <> "Junk" Then $rating[0] = "Hardware" ; fixed value $platform[0] = "N" ; fixed value EndIf If $platform[0] = "Junk" Then $rating[0] = $platform[0] $platform[0] = "None" EndIf $new = StringRight($string, 2) ; just lasr 2 chars for last item in row If $new = "es" Then $new = "Yes" ReDim $csv_list[$k + 1][7] $csv_list[$k][0] = $name $csv_list[$k][1] = $platform[0] If $rating[0] <> "@" Then $csv_list[$k][2] = $rating[0] $csv_list[$k][3] = $new $csv_list[$k][4] = $forsale[0] ; now I build the string to check vs the main db (I need a partial search) $string = "<z:row c0='" & _normalize_db($csv_list[$k][0]) & "' Rating='" & $csv_list[$k][2] & "' New='" & $csv_list[$k][3] & "' Platform='" & $csv_list[$k][1] ; _normalize_db just change exotic chars into xml_readable sequence $check_string = _ArraySearch($mydb, $string, 39, UBound($mydb), 0, 1) ; first 38 lines is the xml header so I can skip If $check_string <> -1 Then $csv_list[$k][5] = _sellp($check_string) ; I can have 2 possible prices in main DB, best price and street price. If bestprice = 0 I'll use StreetPrice EndIf Next ConsoleWrite("Time to complete: " & TimerDiff($begin) & @CRLF) EndFunc ;==>_exp2db_x Func _cleanfordb($k) If StringInStr($k, "^") Then $k = StringReplace($k, "^", ", ") ; ^ If StringInStr($k, '"') Then $k = StringMid($k, 2, StringLen($k) - 2) Return $k EndFunc ;==>_cleanfordb Func _sellp($cc) $p = StringRegExp($mydb[$cc], "c6='(.*?)' c7",1) If $p[0] = 0 Then $p = StringRegExp($mydb[$cc], "c7='(.*?)'/>",1) Return $p[0] EndFunc ;==>_sellp Time to complete is 137,000ms, 2'17'', for 1,243 lines of csv (and 30,000 lines of main DB) I need to work, during normal phases, with 10,000 to 30,000 lines so time to complete will be really too high. I ask you, great programmers, if there is a way to optimize this routine (i'm not able to modify the csv since I get it this way from the Store Application Thanks! M. Edited September 28, 2011 by marko001
AdamUL Posted September 28, 2011 Posted September 28, 2011 Have a look at this I have used the functions in it many time when working with CSV files. Adam
Moderators Melba23 Posted September 28, 2011 Moderators Posted September 28, 2011 marko001,Try this and see if it is faster: #include <Array.au3> ; Just for display ; Simulate reading the file into an array Global $aLines[7] = [ _ 'Game Title,Quantity,For Sale,Platform,Rating,Number,New', _ 'After Life,5,2,PC,PG1,23332,Yes', _ 'Max Payne,4,0,Wii,PG3,109,No', _ 'Fifa 2009,11,2,DS,PG2,976,Yes', _ '"Hellgate, London",3,2,PC,PG2,112,No', _ 'Logitech Mouse,12,4,mouse, , ,No', _ 'Pinball 2010,1,1,PS3,PG1,065,No'] For $i = 0 To UBound($aLines) - 1 ; Locate the first separator comma $iFirst_Comma = StringInStr($aLines[$i], ",", 2, -6) ; Create an array from the remaining sections $aParts = StringSplit(StringMid($aLines[$i], $iFirst_Comma + 1), ",") ; Repalce the count with the first section $aParts[0] = StringMid($aLines[$i], 1, $iFirst_Comma - 1) ; Display the result _ArrayDisplay($aParts) ; You can now use the array as necessary NextI had to add another blank field to the "Logitech Mouse line - I assume that the real file has the correct number of delimiters in each line? M23 Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind Open spoiler to see my UDFs: Spoiler ArrayMultiColSort ---- Sort arrays on multiple columnsChooseFileFolder ---- Single and multiple selections from specified path treeview listingDate_Time_Convert -- Easily convert date/time formats, including the language usedExtMsgBox --------- A highly customisable replacement for MsgBoxGUIExtender -------- Extend and retract multiple sections within a GUIGUIFrame ---------- Subdivide GUIs into many adjustable framesGUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView itemsGUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeViewMarquee ----------- Scrolling tickertape GUIsNoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxesNotify ------------- Small notifications on the edge of the displayScrollbars ----------Automatically sized scrollbars with a single commandStringSize ---------- Automatically size controls to fit textToast -------------- Small GUIs which pop out of the notification area
marko001 Posted September 28, 2011 Author Posted September 28, 2011 @ Melba23: Nope, this is a problem. I can just see that once I have the file, but it comes as you see from the original program. If the product is not a boxed game title the .csv is written that wayGame Title,Quantity,For Sale,Platform,Rating,Number,NewLogitech Mouse,12,4,mouse, ,NoTitle Not Imported, 0,0,Junk, ,NoThat's for I use [font=arial,helvetica,sans-serif] If $platform[0] <> "PC" And $platform[0] <> "DS" And $platform[0] <> "Wii" And $platform[0] <> "XB" And $platform[0] <> "X3" And $platform[0] <> "Junk" Then $rating[0] = "Hardware" ; fixed value $platform[0] = "N" ; fixed value EndIf If $platform[0] = "Junk" Then $rating[0] = $platform[0] $platform[0] = "None" EndIf[/font]
marko001 Posted September 28, 2011 Author Posted September 28, 2011 @ Adam: _ParseCSV() works (see post I made to Melba) except the (damn) fact that some lines of the .csv are "irregular" How can I manage them?
marko001 Posted September 28, 2011 Author Posted September 28, 2011 $csv_array = _ParseCSV($exportfile) For $i = 1 To UBound($csv_array) - 1 if $csv_array[$i][3] = "Junk" Then $csv_array[$i][3] ="None" $csv_array[$i][4] = "Junk" $csv_array[$i][6] = "No" elseIf $csv_array[$i][3] <> "PC" And $csv_array[$i][3] <> "Wii" And $csv_array[$i][3] <> "DS" And $csv_array[$i][3] <> "XB" And $csv_array[$i][3] <> "X3" Then $csv_array[$i][3] ="N" $csv_array[$i][4] = "Hardware" $csv_array[$i][6] = "No" EndIf Next It's a bit tricky but solves it and allows me to use _ParseCSV()
Moderators Melba23 Posted September 28, 2011 Moderators Posted September 28, 2011 marko001,If the csv lines are "regularly irregular ( by which I mean that if there is a single blank field it should really be 2) then we could do this: #include <Array.au3> ; Just for display ; Simulate reading the file into an array Global $aLines[7] = [ _ 'Game Title,Quantity,For Sale,Platform,Rating,Number,New', _ 'After Life,5,2,PC,PG1,23332,Yes', _ 'Max Payne,4,0,Wii,PG3,109,No', _ 'Fifa 2009,11,2,DS,PG2,976,Yes', _ '"Hellgate, London",3,2,PC,PG2,112,No', _ 'Logitech Mouse,12,4,mouse, ,No', _ 'Pinball 2010,1,1,PS3,PG1,065,No'] For $i = 0 To UBound($aLines) - 1 ; Add the missing comma if required <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< $sLine = StringReplace($aLines[$i], ", ,", ", , ,") ; Locate the first separator comma $iFirst_Comma = StringInStr($sLine, ",", 2, -6) ; Create an array from the remaining sections $aParts = StringSplit(StringMid($sLine, $iFirst_Comma + 1), ",") ; Repalce the count with the first section $aParts[0] = StringMid($sLine, 1, $iFirst_Comma - 1) ; Display the result _ArrayDisplay($aParts) ; You can now use the array as necessary NextAny use? M23 Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind Open spoiler to see my UDFs: Spoiler ArrayMultiColSort ---- Sort arrays on multiple columnsChooseFileFolder ---- Single and multiple selections from specified path treeview listingDate_Time_Convert -- Easily convert date/time formats, including the language usedExtMsgBox --------- A highly customisable replacement for MsgBoxGUIExtender -------- Extend and retract multiple sections within a GUIGUIFrame ---------- Subdivide GUIs into many adjustable framesGUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView itemsGUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeViewMarquee ----------- Scrolling tickertape GUIsNoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxesNotify ------------- Small notifications on the edge of the displayScrollbars ----------Automatically sized scrollbars with a single commandStringSize ---------- Automatically size controls to fit textToast -------------- Small GUIs which pop out of the notification area
marko001 Posted September 28, 2011 Author Posted September 28, 2011 Melba. first of all let me say that I always liked your smileys here and there You always put the proper smiley in the right position ... Then yes, this solves and dramatically reduce first part. (I'll edit later the post to report timing) What is then terrific is this: $string = "<z:row c0='" & _normalize_db($csv_array[$i][0]) & "' Rating='" & $csv_array[$i][4] & "' New='" & $csv_array[$i][6] & "' Platform='" & $csv_array[$i][3] ; _normalize_db just change exotic chars into xml_readable sequence $check_string = _ArraySearch($mydb, $string, 39, UBound($mydb), 0, 1) ; first 38 lines is the xml header so I can skip If $check_string <> -1 Then $csv_price[$i][0] = _sellp($check_card) ; I can have 2 possible prices in main DB, best price and street price. If bestprice = 0 I'll use StreetPrice (_sellp() is in first post) ; I also created after the _ParseCSV() a new array with dim $csv_price[ubound($csv_array)][2] This solution gives me 2 arrays but index is the same so I don't mind merging them, I can use them anyway. Problem is that _arraysearch() is soooo slow and I can't use _arraybinarysearch() since I need a partial search (,1) and I don't know exact lenght of $check_string (game titles vary)
BrewManNH Posted September 28, 2011 Posted September 28, 2011 You could modify the _ArrayBinarySearch function to do a partial search by using StringInString, look at the way that _ArraySearch does the partial searches for hints on how to go about it. If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag GudeHow to ask questions the smart way! I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from. Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays. - ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script. - Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label. - _FileGetProperty - Retrieve the properties of a file - SciTE Toolbar - A toolbar demo for use with the SciTE editor - GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI. - Latin Square password generator
marko001 Posted September 28, 2011 Author Posted September 28, 2011 (edited) INCREDIBLE! Simply incredible .... from 173,000 ms to 278ms ! (1,000 records vs 40,000 lines of xml to check) The updated _arraybinarysearch_ext() is this one. If someone feel it interesting it can be added to array.au3 expandcollapse popup; #FUNCTION# ==================================================================================================================== ; Name...........: _ArrayBinarySearch_Ext ; Description ...: Uses the binary search algorithm to search through a 1-dimensional array. ; Syntax.........: _ArrayBinarySearch(Const ByRef $avArray, $vValue[, $iStart = 0[, $iEnd = 0][, $iPartial = 0]]) ; Parameters ....: $avArray - Array to search ; $vValue - Value to find ; $iStart - [optional] Index of array to start searching at ; $iEnd - [optional] Index of array to stop searching at ; $iPartial - [optional] If set to 1, executes a partial search ; Return values .: Success - Index that value was found at ; Failure - -1, sets @error to: ; |1 - $avArray is not an array ; |2 - $vValue outside of array's min/max values ; |3 - $vValue was not found in array ; |4 - $iStart is greater than $iEnd ; |5 - $avArray is not a 1 dimensional array ; Author ........: Jos van der Zande <jdeb at autoitscript dot com> ; Modified.......: Marko001 - Added Partial Search ; Remarks .......: When performing a binary search on an array of items, the contents MUST be sorted before the search is done. ; Otherwise undefined results will be returned. ; Related .......: _ArrayFindAll, _ArraySearch ; Link ..........: ; Example .......: No ; =============================================================================================================================== Func _ArrayBinarySearch_Ext(Const ByRef $avArray, $vValue, $iStart = 0, $iEnd = 0, $iPartial = 0) If Not IsArray($avArray) Then Return SetError(1, 0, -1) If UBound($avArray, 0) <> 1 Then Return SetError(5, 0, -1) Local $iUBound = UBound($avArray) - 1 ; Bounds checking If $iEnd < 1 Or $iEnd > $iUBound Then $iEnd = $iUBound If $iStart < 0 Then $iStart = 0 If $iStart > $iEnd Then Return SetError(4, 0, -1) Local $iMid = Int(($iEnd + $iStart) / 2) If $avArray[$iStart] > $vValue Or $avArray[$iEnd] < $vValue Then Return SetError(2, 0, -1) ; Search If Not $iPartial Then While $iStart <= $iMid And $vValue <> $avArray[$iMid] If $vValue < $avArray[$iMid] Then $iEnd = $iMid - 1 Else $iStart = $iMid + 1 EndIf $iMid = Int(($iEnd + $iStart) / 2) WEnd Else While $iStart <= $iMid And StringInStr($avArray[$iMid], $vValue) = 0 If $vValue < $avArray[$iMid] Then $iEnd = $iMid - 1 Else $iStart = $iMid + 1 EndIf $iMid = Int(($iEnd + $iStart) / 2) WEnd EndIf If $iStart > $iEnd Then Return SetError(3, 0, -1) ; Entry not found Return $iMid EndFunc ;==>_ArrayBinarySearch_Ext Edited September 28, 2011 by marko001
BrewManNH Posted September 28, 2011 Posted September 28, 2011 I'll probably add that, or something similar to my version of _ArrayBinarySearch, just because it's missing a key piece of functionality without being able to do a partial search. The version I have is a 2D version of _ArrayBinarySearch which will search a specific subitem/"column" of a 1 or 2D array. If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag GudeHow to ask questions the smart way! I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from. Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays. - ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script. - Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label. - _FileGetProperty - Retrieve the properties of a file - SciTE Toolbar - A toolbar demo for use with the SciTE editor - GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI. - Latin Square password generator
Moderators Melba23 Posted September 28, 2011 Moderators Posted September 28, 2011 marko001,173,000 ms to 278msNow that it what I call a speed increase! M23 Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind Open spoiler to see my UDFs: Spoiler ArrayMultiColSort ---- Sort arrays on multiple columnsChooseFileFolder ---- Single and multiple selections from specified path treeview listingDate_Time_Convert -- Easily convert date/time formats, including the language usedExtMsgBox --------- A highly customisable replacement for MsgBoxGUIExtender -------- Extend and retract multiple sections within a GUIGUIFrame ---------- Subdivide GUIs into many adjustable framesGUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView itemsGUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeViewMarquee ----------- Scrolling tickertape GUIsNoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxesNotify ------------- Small notifications on the edge of the displayScrollbars ----------Automatically sized scrollbars with a single commandStringSize ---------- Automatically size controls to fit textToast -------------- Small GUIs which pop out of the notification area
marko001 Posted September 28, 2011 Author Posted September 28, 2011 I tried now with 24,700 records (vs. 40,000 lines of .xml): Conversion finished (it took 6.81 secs) I slightly modified the script with redim($csv_array[ubound($csv_array)][8] so I can add prices directly in the array. Some lines have price = "" (but I expected it, no problem) I added also (after the previous timing test) $idx = UBound($csv_array) For $x = $idx - 1 to 1 Step -1 ; delete lines with empty prices If $csv_array[$X][7]= "" Then _ArrayDelete($csv_array,$x) $idx -=1 EndIf Next this sucks, it moves the timedifference to 127,000 secs. since I now throw the array in an INSERT to fill a remote DB it's maybe better skip lines during the insert phase? Or can I clean the array in a faster way?
Moderators Melba23 Posted September 28, 2011 Moderators Posted September 28, 2011 marko001, ReDim is one of the slowest AutoIt functions, so avoid it if you can. I would strongly recommend ignoring any lines that do not have a price during the early parsing phase and just not entering them into the array rather than deleting them later. M23 Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind Open spoiler to see my UDFs: Spoiler ArrayMultiColSort ---- Sort arrays on multiple columnsChooseFileFolder ---- Single and multiple selections from specified path treeview listingDate_Time_Convert -- Easily convert date/time formats, including the language usedExtMsgBox --------- A highly customisable replacement for MsgBoxGUIExtender -------- Extend and retract multiple sections within a GUIGUIFrame ---------- Subdivide GUIs into many adjustable framesGUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView itemsGUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeViewMarquee ----------- Scrolling tickertape GUIsNoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxesNotify ------------- Small notifications on the edge of the displayScrollbars ----------Automatically sized scrollbars with a single commandStringSize ---------- Automatically size controls to fit textToast -------------- Small GUIs which pop out of the notification area
Spiff59 Posted September 28, 2011 Posted September 28, 2011 I wonder if a StringRegExp() might get rid of a lot of the gyrations you had to put in the program because StringSplit() can't skip over a delimiter within quotes. I'm sure someone could improve upon this: #include <Array.au3> $string = '"Hell' & Chr(39) & 's Gate, London",3,2,PC,PG2,112,No' MsgBox(1,"",$string) $array = StringRegExp($string, "((?:[^,\x22]|\x22[^\x22]*\x22)+)", 3) _ArrayDisplay($array)
marko001 Posted September 28, 2011 Author Posted September 28, 2011 (edited) @Melba I'll take a look at the whole script to follow your hint with redim, to remove it where possible., Relating your proposal, I simply skipped lines when creating the query for the insert: [b]If $csv_array[$x][7] > 0 Then[/b] $query_part2 &= "('','" & shopname $query_part2 &= "','" & _readyforquery($csv_array[$x][0]) ; just to return me '' if the name contains a ' (otherwise query messes up) $query_part2 &= "','" & $csv_array[$x][3] $query_part2 &= "','" & $csv_array[$x][4] $query_part2 &= "','" & $csv_array[$x][6] $query_part2 &= "','" & $csv_array[$x][2] $query_part2 &= "','" & $csv_array[$x][7] $query_part2 &= "','" & $csv_array[$x][8] $query_part2 &= "')," EndIf Edit: 11 14 secs to execute the query (2 queries of batch_size of 1,000 each) on 1,500 records.Query it's optimized at maximum (i think...) Phew, this is the end (of this part...) Edited September 28, 2011 by marko001
czardas Posted October 19, 2011 Posted October 19, 2011 (edited) Sorry for bumping an old thread. I have just made a CSV to array function, although I may be able to make some improvements. It hasn't really been tested for speed or anything, so I don't know how it compairs. It does read incomplete rows, and ignores inline commas and parses double quotes as expected. It only uses Redim when a larger number of items occur within a row, which is normallly only once, since it reads the column headings first. Edited October 19, 2011 by czardas operator64 ArrayWorkshop
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now