czardas Posted October 27, 2013 Posted October 27, 2013 (edited) This UDF introduces one or two small tweaks to the functions I posted in Snippet Dump (nothing major). Terminating the CSV with a break is now the default behaviour (although I think it's just a lazy convention). The inclusion of a new function _ArrayToSubItemCSV() got me excited. Thanks for an awesome idea Chimaera. I have included the option to sort the returned CSV formatted strings ascending using any column. ; expandcollapse popup#include-once #include <Array.au3> ; #INDEX# ======================================================================================================================= ; Title .........: CSVSplit ; AutoIt Version : 3.3.8.1 ; Language ......: English ; Description ...: CSV related functions ; Notes .........: CSV format does not have a general standard format, however these functions allow some flexibility. ; The default behaviour of the functions applies to the most common formats used in practice. ; Author(s) .....: czardas ; =============================================================================================================================== ; #CURRENT# ===================================================================================================================== ;_ArrayToCSV ;_ArrayToSubItemCSV ;_CSVSplit ; =============================================================================================================================== ; #INTERNAL_USE_ONLY#============================================================================================================ ; __GetSubstitute ; =============================================================================================================================== ; #FUNCTION# ==================================================================================================================== ; Name...........: _ArrayToCSV ; Description ...: Converts a two dimensional array to CSV format ; Syntax.........: _ArrayToCSV ( $aArray [, $sDelim [, $sNewLine [, $bFinalBreak ]]] ) ; Parameters ....: $aArray - The array to convert ; $sDelim - Optional - Delimiter set to comma by default (see comments) ; $sNewLine - Optional - New Line set to @LF by default (see comments) ; $bFinalBreak - Set to true in accordance with common practice => CSV Line termination ; Return values .: Success - Returns a string in CSV format ; Failure - Sets @error to: ; |@error = 1 - First parameter is not a valid array ; |@error = 2 - Second parameter is not a valid string ; |@error = 3 - Third parameter is not a valid string ; |@error = 4 - 2nd and 3rd parameters must be different characters ; Author ........: czardas ; Comments ......; One dimensional arrays are returned as multiline text (without delimiters) ; ; Some users may need to set the second parameter to semicolon to return the prefered CSV format ; ; To convert to TSV use @TAB for the second parameter ; ; Some users may wish to set the third parameter to @CRLF ; =============================================================================================================================== Func _ArrayToCSV($aArray, $sDelim = Default, $sNewLine = Default, $bFinalBreak = True) If Not IsArray($aArray) Or Ubound($aArray, 0) > 2 Or Ubound($aArray) = 0 Then Return SetError(1, 0 ,"") If $sDelim = Default Then $sDelim = "," If $sDelim = "" Then Return SetError(2, 0 ,"") If $sNewLine = Default Then $sNewLine = @LF If $sNewLine = "" Then Return SetError(3, 0 ,"") If $sDelim = $sNewLine Then Return SetError(4, 0, "") Local $iRows = UBound($aArray), $sString = "" If Ubound($aArray, 0) = 2 Then ; Check if the array has two dimensions Local $iCols = UBound($aArray, 2) For $i = 0 To $iRows -1 For $j = 0 To $iCols -1 If StringRegExp($aArray[$i][$j], '["\r\n' & $sDelim & ']') Then $aArray[$i][$j] = '"' & StringReplace($aArray[$i][$j], '"', '""') & '"' EndIf $sString &= $aArray[$i][$j] & $sDelim Next $sString = StringTrimRight($sString, StringLen($sDelim)) & $sNewLine Next Else ; The delimiter is not needed For $i = 0 To $iRows -1 If StringRegExp($aArray[$i], '["\r\n' & $sDelim & ']') Then $aArray[$i] = '"' & StringReplace($aArray[$i], '"', '""') & '"' EndIf $sString &= $aArray[$i] & $sNewLine Next EndIf If Not $bFinalBreak Then $sString = StringTrimRight($sString, StringLen($sNewLine)) ; Delete any newline characters added to the end of the string Return $sString EndFunc ;==> _ArrayToCSV ; #FUNCTION# ==================================================================================================================== ; Name...........: _ArrayToSubItemCSV ; Description ...: Converts an array to multiple CSV formated strings based on the content of the selected column ; Syntax.........: _ArrayToSubItemCSV($aCSV, $iCol [, $sDelim [, $bHeaders [, $iSortCol [, $bAlphaSort ]]]]) ; Parameters ....: $aCSV - The array to parse ; $iCol - Array column used to search for unique content ; $sDelim - Optional - Delimiter set to comma by default ; $bHeaders - Include csv column headers - Default = False ; $iSortCol - The column to sort on for each new CSV (sorts ascending) - Default = False ; $bAlphaSort - If set to true, sorting will be faster but numbers won't always appear in order of magnitude. ; Return values .: Success - Returns a two dimensional array - col 0 = subitem name, col 1 = CSV data ; Failure - Returns an empty string and sets @error to: ; |@error = 1 - First parameter is not a 2D array ; |@error = 2 - Nothing to parse ; |@error = 3 - Invalid second parameter Column number ; |@error = 4 - Invalid third parameter - Delimiter is an empty string ; |@error = 5 - Invalid fourth parameter - Sort Column number is out of range ; Author ........: czardas ; Comments ......; @CRLF is used for line breaks in the returned array of CSV strings. ; ; Data in the sorting column is automatically assumed to contain numeric values. ; ; Setting $iSortCol equal to $iCol will return csv rows in their original ordered sequence. ; =============================================================================================================================== Func _ArrayToSubItemCSV($aCSV, $iCol, $sDelim = Default, $bHeaders = Default, $iSortCol = Default, $bAlphaSort = Default) If Not IsArray($aCSV) Or UBound($aCSV, 0) <> 2 Then Return SetError(1, 0, "") ; Not a 2D array Local $iBound = UBound($aCSV), $iNumCols = UBound($aCSV, 2) If $iBound < 2 Then Return SetError(2, 0, "") ; Nothing to parse If IsInt($iCol) = 0 Or $iCol < 0 Or $iCol > $iNumCols -1 Then Return SetError(3, 0, "") ; $iCol is out of range If $sDelim = Default Then $sDelim = "," If $sDelim = "" Then Return SetError(4, 0, "") ; Delimiter can not be an empty string If $bHeaders = Default Then $bHeaders = False If $iSortCol = Default Or $iSortCol == False Then $iSortCol = -1 If IsInt($iSortCol) = 0 Or $iSortCol < -1 Or $iSortCol > $iNumCols -1 Then Return SetError(5, 0, "") ; $iSortCol is out of range If $bAlphaSort = Default Then $bAlphaSort = False Local $iStart = 0 If $bHeaders Then If $iBound = 2 Then Return SetError(2, 0, "") ; Nothing to parse $iStart = 1 EndIf Local $sTestItem, $iNewCol = 0 If $iSortCol <> -1 And ($bAlphaSort = False Or $iSortCol = $iCol) Then ; In this case we need an extra Column for sorting ReDim $aCSV [$iBound][$iNumCols +1] ; Populate column If $iSortCol = $iCol Then For $i = $iStart To $iBound -1 $aCSV[$i][$iNumCols] = $i Next Else For $i = $iStart To $iBound -1 $sTestItem = StringRegExpReplace($aCSV[$i][$iSortCol], "\A\h+", "") ; Remove leading horizontal WS If StringIsInt($sTestItem) Or StringIsFloat($sTestItem) Then $aCSV[$i][$iNumCols] = Number($sTestItem) Else $aCSV[$i][$iNumCols] = $aCSV[$i][$iSortCol] EndIf Next EndIf $iNewCol = 1 $iSortCol = $iNumCols EndIf _ArraySort($aCSV, 0, $iStart, 0, $iCol) ; Sort on the selected column Local $aSubItemCSV[$iBound][2], $iItems = 0, $aTempCSV[1][$iNumCols + $iNewCol], $iTempIndex $sTestItem = Not $aCSV[$iBound -1][$iCol] For $i = $iBound -1 To $iStart Step -1 If $sTestItem <> $aCSV[$i][$iCol] Then ; Start a new csv instance If $iItems > 0 Then ; Write to main array ReDim $aTempCSV[$iTempIndex][$iNumCols + $iNewCol] If $iSortCol <> -1 Then _ArraySort($aTempCSV, 0, $iStart, 0, $iSortCol) If $iNewCol Then ReDim $aTempCSV[$iTempIndex][$iNumCols] $aSubItemCSV[$iItems -1][0] = $sTestItem $aSubItemCSV[$iItems -1][1] = _ArrayToCSV($aTempCSV, $sDelim, @CRLF) EndIf ReDim $aTempCSV[$iBound][$iNumCols + $iNewCol] ; Create new csv template $iTempIndex = 0 $sTestItem = $aCSV[$i][$iCol] If $bHeaders Then For $j = 0 To $iNumCols -1 $aTempCSV[0][$j] = $aCSV[0][$j] Next $iTempIndex = 1 EndIf $iItems += 1 EndIf For $j = 0 To $iNumCols + $iNewCol -1 ; Continue writing to csv $aTempCSV[$iTempIndex][$j] = $aCSV[$i][$j] Next $iTempIndex += 1 Next ReDim $aTempCSV[$iTempIndex][$iNumCols + $iNewCol] If $iSortCol <> -1 Then _ArraySort($aTempCSV, 0, $iStart, 0, $iSortCol) If $iNewCol Then ReDim $aTempCSV[$iTempIndex][$iNumCols] $aSubItemCSV[$iItems -1][0] = $sTestItem $aSubItemCSV[$iItems -1][1] = _ArrayToCSV($aTempCSV, $sDelim, @CRLF) ReDim $aSubItemCSV[$iItems][2] Return $aSubItemCSV EndFunc ;==> _ArrayToSubItemCSV ; #FUNCTION# ==================================================================================================================== ; Name...........: _CSVSplit ; Description ...: Converts a string in CSV format to a two dimensional array (see comments) ; Syntax.........: CSVSplit ( $aArray [, $sDelim ] ) ; Parameters ....: $aArray - The array to convert ; $sDelim - Optional - Delimiter set to comma by default (see 2nd comment) ; Return values .: Success - Returns a two dimensional array or a one dimensional array (see 1st comment) ; Failure - Sets @error to: ; |@error = 1 - First parameter is not a valid string ; |@error = 2 - Second parameter is not a valid string ; |@error = 3 - Could not find suitable delimiter replacements ; Author ........: czardas ; Comments ......; Returns a one dimensional array if the input string does not contain the delimiter string ; ; Some CSV formats use semicolon as a delimiter instead of a comma ; ; Set the second parameter to @TAB To convert to TSV ; =============================================================================================================================== Func _CSVSplit($string, $sDelim = ",") ; Parses csv string input and returns a one or two dimensional array If Not IsString($string) Or $string = "" Then Return SetError(1, 0, 0) ; Invalid string If Not IsString($sDelim) Or $sDelim = "" Then Return SetError(2, 0, 0) ; Invalid string $string = StringRegExpReplace($string, "[\r\n]+\z", "") ; [Line Added] Remove training breaks Local $iOverride = 63743, $asDelim[3] ; $asDelim => replacements for comma, new line and double quote For $i = 0 To 2 $asDelim[$i] = __GetSubstitute($string, $iOverride) ; Choose a suitable substitution character If @error Then Return SetError(3, 0, 0) ; String contains too many unsuitable characters Next $iOverride = 0 Local $aArray = StringRegExp($string, '\A[^"]+|("+[^"]+)|"+\z', 3) ; Split string using double quotes delim - largest match $string = "" Local $iBound = UBound($aArray) For $i = 0 To $iBound -1 $iOverride += StringInStr($aArray[$i], '"', 0, -1) ; Increment by the number of adjacent double quotes per element If Mod ($iOverride +2, 2) = 0 Then ; Acts as an on/off switch $aArray[$i] = StringReplace($aArray[$i], $sDelim, $asDelim[0]) ; Replace comma delimeters $aArray[$i] = StringRegExpReplace($aArray[$i], "(\r\n)|[\r\n]", $asDelim[1]) ; Replace new line delimeters EndIf $aArray[$i] = StringReplace($aArray[$i], '""', $asDelim[2]) ; Replace double quote pairs $aArray[$i] = StringReplace($aArray[$i], '"', '') ; Delete enclosing double quotes - not paired $aArray[$i] = StringReplace($aArray[$i], $asDelim[2], '"') ; Reintroduce double quote pairs as single characters $string &= $aArray[$i] ; Rebuild the string, which includes two different delimiters Next $iOverride = 0 $aArray = StringSplit($string, $asDelim[1], 2) ; Split to get rows $iBound = UBound($aArray) Local $aCSV[$iBound][2], $aTemp For $i = 0 To $iBound -1 $aTemp = StringSplit($aArray[$i], $asDelim[0]) ; Split to get row items If Not @error Then If $aTemp[0] > $iOverride Then $iOverride = $aTemp[0] ReDim $aCSV[$iBound][$iOverride] ; Add columns to accomodate more items EndIf EndIf For $j = 1 To $aTemp[0] If StringLen($aTemp[$j]) Then If Not StringRegExp($aTemp[$j], '[^"]') Then ; Field only contains double quotes $aTemp[$j] = StringTrimLeft($aTemp[$j], 1) ; Delete enclosing double quote single char EndIf $aCSV[$i][$j -1] = $aTemp[$j] ; Populate each row EndIf Next Next If $iOverride > 1 Then Return $aCSV ; Multiple Columns Else For $i = 0 To $iBound -1 If StringLen($aArray[$i]) And (Not StringRegExp($aArray[$i], '[^"]')) Then ; Only contains double quotes $aArray[$i] = StringTrimLeft($aArray[$i], 1) ; Delete enclosing double quote single char EndIf Next Return $aArray ; Single column EndIf EndFunc ;==> _CSVSplit ; #INTERNAL_USE_ONLY# =========================================================================================================== ; Name...........: __GetSubstitute ; Description ...: Searches for a character to be used for substitution, ie one not contained within the input string ; Syntax.........: __GetSubstitute($string, ByRef $iCountdown) ; Parameters ....: $string - The string of characters to avoid ; $iCountdown - The first code point to begin checking ; Return values .: Success - Returns a suitable substitution character not found within the first parameter ; Failure - Sets @error to 1 => No substitution character available ; Author ........: czardas ; Comments ......; This function is connected to the function _CSVSplit and was not intended for general use ; $iCountdown is returned ByRef to avoid selecting the same character on subsequent calls to this function ; Initially $iCountown should be passed with a value = 63743 ; =============================================================================================================================== Func __GetSubstitute($string, ByRef $iCountdown) If $iCountdown < 57344 Then Return SetError(1, 0, "") ; Out of options Local $sTestChar For $i = $iCountdown To 57344 Step -1 $sTestChar = ChrW($i) $iCountdown -= 1 If Not StringInStr($string, $sTestChar) Then Return $sTestChar EndIf Next Return SetError(1, 0, "") ; Out of options EndFunc ;==> __GetSubstitute ; See examples in post #8 below. ; Edited March 20, 2015 by czardas hudsonhock, BigDaddyO and TimRude 1 2 operator64 ArrayWorkshop
guinness Posted October 27, 2013 Posted October 27, 2013 This is incorrect use of a character class. [(\r\n)\r\n] It's basically [\r\n] UDF List: _AdapterConnections() • _AlwaysRun() • _AppMon() • _AppMonEx() • _ArrayFilter/_ArrayReduce • _BinaryBin() • _CheckMsgBox() • _CmdLineRaw() • _ContextMenu() • _ConvertLHWebColor()/_ConvertSHWebColor() • _DesktopDimensions() • _DisplayPassword() • _DotNet_Load()/_DotNet_Unload() • _Fibonacci() • _FileCompare() • _FileCompareContents() • _FileNameByHandle() • _FilePrefix/SRE() • _FindInFile() • _GetBackgroundColor()/_SetBackgroundColor() • _GetConrolID() • _GetCtrlClass() • _GetDirectoryFormat() • _GetDriveMediaType() • _GetFilename()/_GetFilenameExt() • _GetHardwareID() • _GetIP() • _GetIP_Country() • _GetOSLanguage() • _GetSavedSource() • _GetStringSize() • _GetSystemPaths() • _GetURLImage() • _GIFImage() • _GoogleWeather() • _GUICtrlCreateGroup() • _GUICtrlListBox_CreateArray() • _GUICtrlListView_CreateArray() • _GUICtrlListView_SaveCSV() • _GUICtrlListView_SaveHTML() • _GUICtrlListView_SaveTxt() • _GUICtrlListView_SaveXML() • _GUICtrlMenu_Recent() • _GUICtrlMenu_SetItemImage() • _GUICtrlTreeView_CreateArray() • _GUIDisable() • _GUIImageList_SetIconFromHandle() • _GUIRegisterMsg() • _GUISetIcon() • _Icon_Clear()/_Icon_Set() • _IdleTime() • _InetGet() • _InetGetGUI() • _InetGetProgress() • _IPDetails() • _IsFileOlder() • _IsGUID() • _IsHex() • _IsPalindrome() • _IsRegKey() • _IsStringRegExp() • _IsSystemDrive() • _IsUPX() • _IsValidType() • _IsWebColor() • _Language() • _Log() • _MicrosoftInternetConnectivity() • _MSDNDataType() • _PathFull/GetRelative/Split() • _PathSplitEx() • _PrintFromArray() • _ProgressSetMarquee() • _ReDim() • _RockPaperScissors()/_RockPaperScissorsLizardSpock() • _ScrollingCredits • _SelfDelete() • _SelfRename() • _SelfUpdate() • _SendTo() • _ShellAll() • _ShellFile() • _ShellFolder() • _SingletonHWID() • _SingletonPID() • _Startup() • _StringCompact() • _StringIsValid() • _StringRegExpMetaCharacters() • _StringReplaceWholeWord() • _StringStripChars() • _Temperature() • _TrialPeriod() • _UKToUSDate()/_USToUKDate() • _WinAPI_Create_CTL_CODE() • _WinAPI_CreateGUID() • _WMIDateStringToDate()/_DateToWMIDateString() • Au3 script parsing • AutoIt Search • AutoIt3 Portable • AutoIt3WrapperToPragma • AutoItWinGetTitle()/AutoItWinSetTitle() • Coding • DirToHTML5 • FileInstallr • FileReadLastChars() • GeoIP database • GUI - Only Close Button • GUI Examples • GUICtrlDeleteImage() • GUICtrlGetBkColor() • GUICtrlGetStyle() • GUIEvents • GUIGetBkColor() • Int_Parse() & Int_TryParse() • IsISBN() • LockFile() • Mapping CtrlIDs • OOP in AutoIt • ParseHeadersToSciTE() • PasswordValid • PasteBin • Posts Per Day • PreExpand • Protect Globals • Queue() • Resource Update • ResourcesEx • SciTE Jump • Settings INI • SHELLHOOK • Shunting-Yard • Signature Creator • Stack() • Stopwatch() • StringAddLF()/StringStripLF() • StringEOLToCRLF() • VSCROLL • WM_COPYDATA • More Examples... Updated: 22/04/2018
czardas Posted October 27, 2013 Author Posted October 27, 2013 This is incorrect use of a character class. [(\r\n)\r\n] It's basically [\r\n] Thanks guinness. operator64 ArrayWorkshop
guinness Posted October 27, 2013 Posted October 27, 2013 No probs -_0 UDF List: _AdapterConnections() • _AlwaysRun() • _AppMon() • _AppMonEx() • _ArrayFilter/_ArrayReduce • _BinaryBin() • _CheckMsgBox() • _CmdLineRaw() • _ContextMenu() • _ConvertLHWebColor()/_ConvertSHWebColor() • _DesktopDimensions() • _DisplayPassword() • _DotNet_Load()/_DotNet_Unload() • _Fibonacci() • _FileCompare() • _FileCompareContents() • _FileNameByHandle() • _FilePrefix/SRE() • _FindInFile() • _GetBackgroundColor()/_SetBackgroundColor() • _GetConrolID() • _GetCtrlClass() • _GetDirectoryFormat() • _GetDriveMediaType() • _GetFilename()/_GetFilenameExt() • _GetHardwareID() • _GetIP() • _GetIP_Country() • _GetOSLanguage() • _GetSavedSource() • _GetStringSize() • _GetSystemPaths() • _GetURLImage() • _GIFImage() • _GoogleWeather() • _GUICtrlCreateGroup() • _GUICtrlListBox_CreateArray() • _GUICtrlListView_CreateArray() • _GUICtrlListView_SaveCSV() • _GUICtrlListView_SaveHTML() • _GUICtrlListView_SaveTxt() • _GUICtrlListView_SaveXML() • _GUICtrlMenu_Recent() • _GUICtrlMenu_SetItemImage() • _GUICtrlTreeView_CreateArray() • _GUIDisable() • _GUIImageList_SetIconFromHandle() • _GUIRegisterMsg() • _GUISetIcon() • _Icon_Clear()/_Icon_Set() • _IdleTime() • _InetGet() • _InetGetGUI() • _InetGetProgress() • _IPDetails() • _IsFileOlder() • _IsGUID() • _IsHex() • _IsPalindrome() • _IsRegKey() • _IsStringRegExp() • _IsSystemDrive() • _IsUPX() • _IsValidType() • _IsWebColor() • _Language() • _Log() • _MicrosoftInternetConnectivity() • _MSDNDataType() • _PathFull/GetRelative/Split() • _PathSplitEx() • _PrintFromArray() • _ProgressSetMarquee() • _ReDim() • _RockPaperScissors()/_RockPaperScissorsLizardSpock() • _ScrollingCredits • _SelfDelete() • _SelfRename() • _SelfUpdate() • _SendTo() • _ShellAll() • _ShellFile() • _ShellFolder() • _SingletonHWID() • _SingletonPID() • _Startup() • _StringCompact() • _StringIsValid() • _StringRegExpMetaCharacters() • _StringReplaceWholeWord() • _StringStripChars() • _Temperature() • _TrialPeriod() • _UKToUSDate()/_USToUKDate() • _WinAPI_Create_CTL_CODE() • _WinAPI_CreateGUID() • _WMIDateStringToDate()/_DateToWMIDateString() • Au3 script parsing • AutoIt Search • AutoIt3 Portable • AutoIt3WrapperToPragma • AutoItWinGetTitle()/AutoItWinSetTitle() • Coding • DirToHTML5 • FileInstallr • FileReadLastChars() • GeoIP database • GUI - Only Close Button • GUI Examples • GUICtrlDeleteImage() • GUICtrlGetBkColor() • GUICtrlGetStyle() • GUIEvents • GUIGetBkColor() • Int_Parse() & Int_TryParse() • IsISBN() • LockFile() • Mapping CtrlIDs • OOP in AutoIt • ParseHeadersToSciTE() • PasswordValid • PasteBin • Posts Per Day • PreExpand • Protect Globals • Queue() • Resource Update • ResourcesEx • SciTE Jump • Settings INI • SHELLHOOK • Shunting-Yard • Signature Creator • Stack() • Stopwatch() • StringAddLF()/StringStripLF() • StringEOLToCRLF() • VSCROLL • WM_COPYDATA • More Examples... Updated: 22/04/2018
czardas Posted October 27, 2013 Author Posted October 27, 2013 (edited) I can still see some ways to clean it up: which I will do shortly (I wrote much of this code a while ago). For all intents and purposes the resulting code will not behave differently. I am using 'private use' code point range characters as delimiters (first time implementation by me). Edited October 27, 2013 by czardas operator64 ArrayWorkshop
czardas Posted October 28, 2013 Author Posted October 28, 2013 (edited) I have updated the first post. The only changes made were in error checking. To get a good idea of what's going on in the example, make sure your csv has duplicated entries in col 1 (2nd column) otherwise you will get a new csv for every row - aargh. Edited October 28, 2013 by czardas operator64 ArrayWorkshop
czardas Posted November 24, 2013 Author Posted November 24, 2013 (edited) This replacement for the function _ArrayToSubItemCSV (see first post) still needs testing and perhaps a tweak or two. I have made some tricky choices and I wanted to know your opinions. Because csv fields commonly contain numeric data, the preferance has been set to sort strings by magnitude when numeric content is encountered. This is the default behaviour for sorting in the function below, although it involes more parsing. Setting $bAlphaSort to True negates the default behaviour, and the code will run slightly faster. One problem encountered is that number conversions sometimes alter string format so an extra column is needed to do the sorting. This avoids any potential data corruption within the newly created csv strings, but again adds extra overhead. Finally I considered placing the rows (in each new csv after parsing) back in the same order they were originally encountered in the source array. One column that is guaranteed to contain the same value in every row after splitting is the splitting column itself. So setting the sorting column parameter equal to the splitting column parameter is basically saying do nothing at all. I didn't want to add another parameter and I am already using the value $iSortCol = False when any order will do - in which case $iSortCol becomes -1 in the code (so as not to be confused with zero). Sorry it's hard to explain. Basically there are 4 choices: 1. Return data in any order ==> $iSortCol = False (fastest default method) 2. Quick Sort ==> Set $bAlphaSort = True (sorting method) 3. Original sequence ==> Set $iSortCol = $iCol (slower) 4. Numeric Sort ==> $bAlphaSort = False (slowest default sorting method) If _ArraySort() were to return duplicates in the order in which they are encountered (this is not a criticism), most of these complications wouldn't occur. Of course there are other ways to parse the data. CODE MOVED TO THE UDF (1st post) The modifications are the result of a feature request; so any alternative methods, suggestions about default behaviour or other ideas would be greatly appreciated. Edited November 26, 2013 by czardas operator64 ArrayWorkshop
czardas Posted November 24, 2013 Author Posted November 24, 2013 (edited) Finished testing and there were one or two bugs needed fixing. The default behaviour is non conventional but suitable for practical reasons, as mentioned above - although I expected differences of opinion. Examples to illustrate the functionality of the new version of _ArrayToSubItemCSV() Here is test.csv used for the example: Date, Details,Amount 01-02-2013,contra, 10 03-02-2013,withdrawal, 2 04-02-2013,lodgement,12.0 05-02-2013,transfer,.56 08-02-2013,contra,2000 21-02-2013,lodgement,0.20 04-03-2013,withdrawal,8 09-03-2013,contra,2 18-03-2013,lodgement,0.20 24-03-2013,lodgement,8 31-03-2013,transfer,10 04-04-2013,withdrawal, .01 05-04-2013,transfer,.56 08-05-2013,transfer,20.00 21-07-2013,lodgement,0.20 04-08-2013,withdrawal,8 04-09-2013,lodgement,12.0 05-09-2013,transfer,.56 08-09-2013,contra,2000 ; And here is the code used for testing: expandcollapse popup#include <CSVSplit.au3> Local $sFilePath = @ScriptDir & "\test.csv" ; Change this to your own csv file (in the same directory as the script) Local $hFile = FileOpen($sFilePath) If $hFile = -1 Then MsgBox(0, "", "Unable to open file") Exit EndIf Local $sCSV = FileRead($hFile) If @error Then MsgBox(0, "", "Unable to read file") FileClose($hFile) Exit EndIf FileClose($hFile) Local $aCSV = _CSVSplit($sCSV) ; Create the main array If @error Then ConsoleWrite("Error = " & @error & @LF) Exit EndIf ; Examples 0f _ArrayToSubItemCSV() Local $aParam[4][5] = [ _ ; Test parameters [1,Default,True,2,Default], _ ; Test 1 = Numeric sort - with headers [1,Default,True,2,True], _ ; Test 2 = Alpha Sort - with headers [1,Default,Default,Default,Default], _ ; Test 3 = Any order - no headers [1,Default,True,1,Default]] ; Test 4 = Original sequence - with headers Local $aSubItemCSV, $sFolderName, $sNewPath, $iSuccess For $i = 0 To 3 ; Create an array of csv strings $aSubItemCSV = _ArrayToSubItemCSV($aCSV, $aParam[$i][0], $aParam[$i][1], $aParam[$i][2], $aParam[$i][3], $aParam[$i][4]) If @error Then ConsoleWrite("Error = " & @error & @LF) Exit EndIf ; Now let's write each csv to file $sFolderName = StringTrimRight(StringReplace($sFilePath, @ScriptDir & "\", ""), 4) & " " & $i +1 For $j = 0 To UBound($aSubItemCSV) -1 If StringRegExp($aSubItemCSV[$j][0], '[\/\?<>\\\:\|\*"]') Then MsgBox(0, "Invalid file name", "You can not use the following characters in a file name" & @CRLF & '/ ? < > \ : | * "') Exit EndIf $sNewPath = @ScriptDir & "\" & $sFolderName & "\" & $aSubItemCSV[$j][0] & ".csv" $hFile = FileOpen($sNewPath, BitOr(8,1)) If $hFile = -1 Then MsgBox(0, "", "Unable to open file") Exit EndIf $iSuccess = FileWrite($hFile, $aSubItemCSV[$j][1]) FileClose($hFile) If $iSuccess = 0 Then MsgBox(0, "", "Unable to write to file") Exit EndIf Next Next ; Notice the differences in the output on each test. Look at the Date and Amount columns. Notice that if you use Excel to view the files, numbers are formatted for display purposes - tricky and misleading. No formatting or data corruption actually occurs in the csv. - Excel just does its own thing. Test 1 = Numeric sort on the Amount column with headers Test 2 = Quick sort on the Amount column with headers Test 3 = Any order without headers Test 4 = Original sequence with headers Edited November 24, 2013 by czardas operator64 ArrayWorkshop
Myicq Posted November 26, 2013 Posted November 26, 2013 Perhaps it would be an idea to ask the Windows locale for default split character ? Some parts of the world (at least mine) uses ; to split since we have comma as decimal. So we have Bob;Doe;1234,49 John;Doe;9948,12 Jane;Doe;1999,00 Welcomed UDF though ! WiValdiBB and czardas 2 I am just a hobby programmer, and nothing great to publish right now.
czardas Posted November 26, 2013 Author Posted November 26, 2013 (edited) Myicq, thanks for trying it. Change the parameter $sDelim to anything you want - although I have only seen semicolon used as an alternative. Use @TAB if you wish tsv. I think it should be the developer's job to make sure the delimiter is suitable. Edited November 26, 2013 by czardas operator64 ArrayWorkshop
czardas Posted November 26, 2013 Author Posted November 26, 2013 Update to first post: Bugfix in _ArrayToCSV(). New lines contained within fields were being ignored - ouch! Thanks Chimaera for reporting this. operator64 ArrayWorkshop
Chimaera Posted November 26, 2013 Posted November 26, 2013 Many thanks for the update If Ive just helped you ... miracles do happen. Chimaera CopyRobo() * Hidden Admin Account Enabler * Software Location From Registry * Find Display Resolution * _ChangeServices()
czardas Posted November 27, 2013 Author Posted November 27, 2013 (edited) You're welcome. The bug was a stupid oversight on my part. I've been using some of this code for a while already without noticing it. When I get time, I will probably make some trivial (non script breaking) changes. I can see a difference between the code I wrote a while back and the code I wrote recently. It feels good to finally create something useful to someone else, besides myself. Edited November 27, 2013 by czardas Xandy 1 operator64 ArrayWorkshop
Chimaera Posted February 16, 2014 Posted February 16, 2014 Been looking at this again Is it possible to have a second sort So Main Split / First Sort Column / Second Sort Column Company_code,Company_name,Stock_code,Short_desc,Discontinued,Preferred_supplier,Stock_type_l2,Stock_id,Line Number,26,Page_No,2128768,91914 ABRUTT,Abrutt Ltd,SKR3187,Swingback stepladder 4 tread,0,-1,Facility,27037,1,,560,,0 ABRUTT,Abrutt Ltd,SKR3188,Swingback stepladder 5 tread,0,-1,Facility,27038,2,,560,,0 ABRUTT,Abrutt Ltd,SKR3189,Swingback stepladder 6 tread,0,-1,Facility,27039,3,,560,,0 ABRUTT,Abrutt Ltd,SKR3190,Swingback stepladder 8 tread,0,-1,Facility,27040,4,,560,,0 ABRUTT,Abrutt Ltd,SKR3191,Swingback stepladder 10 tread,0,-1,Facility,27041,5,,560,,0 ABRUTT,Abrutt Ltd,SKR3192,Swingback stepladder 12 tread,0,-1,Facility,27042,6,,560,,0 ABRUTT,Abrutt Ltd,SKR3193,Swingback stepladder 14 tread,0,-1,Facility,27043,7,,560,,0 ABRUTT,Abrutt Ltd,SKR3194,Platform stepladder 4 tread,0,-1,Facility,27044,8,,561,,0 ABRUTT,Abrutt Ltd,SKR3195,Platform stepladder 5 tread,0,-1,Facility,27045,9,,561,,0 ABRUTT,Abrutt Ltd,SKR3196,Platform stepladder 6 tread,0,-1,Facility,27046,10,,561,,0 ABRUTT,Abrutt Ltd,SKR3197,Platform stepladder 8 tread,0,-1,Facility,27047,11,,561,,0 ABRUTT,Abrutt Ltd,SKR3198,Platform stepladder 10 tread,0,-1,Facility,27048,12,,561,,0 ABRUTT,Abrutt Ltd,SKR3199,Platform stepladder 12 tread,0,-1,Facility,27049,13,,561,,0 In this instance Page_No / Company_Code / Line_Number As im trying to refine the data to make it easier for the booklet im producing Just a thought thx If Ive just helped you ... miracles do happen. Chimaera CopyRobo() * Hidden Admin Account Enabler * Software Location From Registry * Find Display Resolution * _ChangeServices()
czardas Posted February 17, 2014 Author Posted February 17, 2014 (edited) I consider a second sort option to be more of an array, or database specific, type function. This UDF can be used for that purpose already, but it may be more convenient to simply use a multi-column array sorting function. The whole idea of creating an array from csv is to use array functions to manipulate the data. Here is a method using _ArrayToSubItemCSV(). Once you have done one split, you can split each csv string again using the main sorting column as the split, and sort each new csv on the second column. Then concatenate the results in the correct sequence. Remember there are potential issues with numeric data when sorting arrays. This is taken into consideration in the example below. I had to corrupt your example csv because it was too ordely to use as a test. The example below splits on column 10, followed by sorting on column 2, followed by sorting on column 7. The headers are removed and put back later. If there are no headers, then that part of the code will need to be modified. ; expandcollapse popup#include 'CSVSplit.au3' Local $sCSV = _ "Company_code,Company_name,Stock_code,Short_desc,Discontinued,Preferred_supplier,Stock_type_l2,Stock_id,Line Number,26,Page_No,2128768,91914" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3188,Swingback stepladder 5 tread,0,-1,Facility,27038,2,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3193,Platform stepladder 6 tread,0,-1,Facility,27076,10,,561,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 10 tread,0,-1,Facility,2741,5,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3188,Swingback stepladder 12 tread,0,-1,Facility,27092,6,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3193,Swingback stepladder 14 tread,0,-1,Facility,27043,7,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 6 tread,0,-1,Facility,27036,3,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 8 tread,0,-1,Facility,27049,4,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 12 tread,0,-1,Facility,2049,13,,561,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 4 tread,0,-1,Facility,2744,8,,561,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3195,Platform stepladder 5 tread,0,-1,Facility,70495,9,,561,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3187,Swingback stepladder 4 tread,0,-1,Facility,27437,1,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3197,Platform stepladder 8 tread,0,-1,Facility,27037,11,,561,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 14 tread,0,-1,Facility,27043,7,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 6 tread,0,-1,Facility,27036,3,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 8 tread,0,-1,Facility,27019,4,,560,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 12 tread,0,-1,Facility,20492,13,,561,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3187,Platform stepladder 4 tread,0,-1,Facility,2744,8,,561,,0" & @LF & _ "ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 10 tread,0,-1,Facility,27068,12,,561,,0" Local $iSplitCol = 10, $iFirstSortCol = 2, $iSecondSortCol = 7 Local $aFirstSplit = _CSVSplit($sCSV) ; Get rid of the headers to simplify the multiple splitting operations. Local $aHeader[1][UBound($aFirstSplit, 2)] For $i = 0 To UBound($aFirstSplit, 2) -1 $aHeader[0][$i] = $aFirstSplit[0][$i] Next Local $sCSVHeader = _ArrayToCSV($aHeader) ; We'll add the headers back later _ArrayDelete($aFirstSplit, 0) ; Delete Headers $aFirstSplit = _ArrayToSubItemCSV($aFirstSplit, $iSplitCol) ; Do not sort on the first run Local $aSecondSplit, $vTestItem, $vOutputDisplay For $i = 0 To UBound($aFirstSplit) -1 $aSecondSplit = _CSVSplit($aFirstSplit[$i][1]) ; The sub sort occurs here $aSecondSplit = _ArrayToSubItemCSV($aSecondSplit, $iFirstSortCol, Default, Default, $iSecondSortCol) ; Now we test to see if the data in the main sort is numeric. For $j = 0 To UBound($aSecondSplit) -1 $vTestItem = StringRegExpReplace($aSecondSplit[$j][0], "\A\h+", "") ; Remove leading horizontal WS If StringIsInt($vTestItem) Or StringIsFloat($vTestItem) Then $aSecondSplit[$j][0] = Number($aSecondSplit[$j][0]) ; Conversion to numeric data. EndIf Next _ArraySort($aSecondSplit) ; The main sort occurs here. ; Now combine the sorted csv strings in the correct order $vOutputDisplay = $sCSVHeader For $j = 0 To UBound($aSecondSplit) -1 $vOutputDisplay &= $aSecondSplit[$j][1] ; The final csv string is assembled. Next ; Add your code here below ; Write $vOutputDisplay string to file ; Remove the next two lines which are for testing purposes only. $vOutputDisplay = _CSVSplit($vOutputDisplay) _ArrayDisplay($vOutputDisplay) Next Edited February 17, 2014 by czardas operator64 ArrayWorkshop
czardas Posted March 20, 2015 Author Posted March 20, 2015 (edited) I discovered a bug in CSVSplit() and added a patch. In some instances fields only containing double quotes were returning incorrect results. This bug turned out to be a ghost in the machine. Edited March 20, 2015 by czardas operator64 ArrayWorkshop
JohnOne Posted March 20, 2015 Posted March 20, 2015 I've been using this for a while now, are there any script breaking changes? czardas 1 AutoIt Absolute Beginners Require a serial Pause Script Video Tutorials by Morthawt ipify Monkey's are, like, natures humans.
czardas Posted March 20, 2015 Author Posted March 20, 2015 No script breaking changes. operator64 ArrayWorkshop
JohnOne Posted March 20, 2015 Posted March 20, 2015 Ace, cheers. AutoIt Absolute Beginners Require a serial Pause Script Video Tutorials by Morthawt ipify Monkey's are, like, natures humans.
UEZ Posted March 20, 2015 Posted March 20, 2015 (edited) This line seems not to work: $sLine = '10,20,"Blah,Blah",50,"Test1,10" Monitor,40",50",100' Output should be 10|20|Blah,Blah|50|Test1,10" Monitor,40|50"|100 But is 10|20|Blah,Blah|50|Test1,10 Monitor|40,50|100 Edited March 20, 2015 by UEZ Please don't send me any personal message and ask for support! I will not reply! Selection of finest graphical examples at Codepen.io The own fart smells best! ✌Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!¯\_(ツ)_/¯ ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now