_ArrayUnique on multiple columns

Jewtus · March 31, 2015

I'm wondering if there is a UDF or a function that can get the distinct values from a 2D array using multiple columns. I'm trying setup an excel output that can be trimmed and the distinct records recorded on the excel spreadsheet.

I was messing around with _arrayunique but trying to get the distinct records for multiple columns, but I'm not really getting anywhere. Any suggestions?

This is an example of the code:

If $TrimOpt='Op1' then $aWriteVal=_ArrayUnique($AllPro,0)
If $TrimOpt='Op2' then $aWriteVal=_ArrayUnique($AllPro,1)
If $TrimOpt='Op3' then $aWriteVal=_ArrayUnique($AllPro,2)
If $TrimOpt='Op4' then $aWriteVal=_ArrayUnique($AllPro,3)
If $TrimOpt='Op5' then $aWriteVal=_ArrayUnique($AllPro,4)
If $TrimOpt='Op6' then $aWriteVal=_ArrayUnique($AllPro,5)
If $TrimOpt='Op7' then $aWriteVal=_ArrayUnique($AllPro,6)
$oExcel = ObjCreate("Excel.Application")
$oExcel.Visible = True
$oWorkbook=$oExcel.WorkBooks.Open ($filePath)
$worksheet=$oExcel.ActiveWorkbook.Worksheets("AllProcesses")
   For $g=0 to UBound($aWriteVal)-1
    $worksheet.Range("A"&$g+2).Value=$aWriteVal[$g][0]
    If UBound($aWriteVal,2) >0 then $worksheet.Range("B"&$g+2).Value=$aWriteVal[$g][1]
    If UBound($aWriteVal,2) >1 then $worksheet.Range("C"&$g+2).Value=$aWriteVal[$g][2]
    If UBound($aWriteVal,2) >2 then $worksheet.Range("D"&$g+2).Value=$aWriteVal[$g][3]
    If UBound($aWriteVal,2) >3 then $worksheet.Range("E"&$g+2).Value=$aWriteVal[$g][4]
    If UBound($aWriteVal,2) >4 then $worksheet.Range("F"&$g+2).Value=$aWriteVal[$g][5]
    If UBound($aWriteVal,2) >5 then $worksheet.Range("G"&$g+2).Value=$aWriteVal[$g][6]
    Next

Gianni · March 31, 2015

hi Jewtus

not sure about your goal, but this simple draft function could be of help.
If you pass an 2D array loaded with whatever values, it returns that array with only unique values for each column.
(no error checking is performed if a correct array is passed to function)

#include <Array.au3>

Local $aArray[30][10]
For $i = 0 To 29
    For $x = 0 To 9
        $aArray[$i][$x] = Random(0, 50, 1)
    Next
Next
_ArrayDisplay($aArray, "Input array")

Local $aUniques = _Array2DUnique($aArray) ; returns only unique values in each column
_ArrayDisplay($aUniques, "Output array")

; this function accepts an 1D or 2D array
; and returns only unique values in each column
Func _Array2DUnique(ByRef $aSource)
    If UBound($aSource, 0) = 2 Then
        Local $aOutput[1][UBound($aSource, 2)], $aTemp
        For $iColumn = 0 To UBound($aSource, 2) - 1
            $aTemp = _ArrayUnique($aSource, $iColumn, 0, 0, 0)
            If UBound($aOutput) < UBound($aTemp) Then ReDim $aOutput[UBound($aTemp)][UBound($aSource, 2)]
            For $iRow = 0 To UBound($aTemp) - 1
                $aOutput[$iRow][$iColumn] = $aTemp[$iRow]
            Next
        Next
    Else
        Local $aOutput = _ArrayUnique($aSource)
    EndIf
    Return $aOutput
EndFunc   ;==>_Array2DUnique

Jewtus · April 1, 2015

This is kind of what I was looking for, but I'm trying to keep each line tied together. The array is a result set from SQL so I need to make sure each line stays together but I only grab unique versions of each line.

Example:

Row1 Row2 Row3 Row4

Hello Cruel World Now

Hello Cruel World This

Hello Cruel World Is

Hello Cruel World What

Hello Cruel World I

Hello Cruel World Need

So if I ignore row 4 the unique record is Hello|Cruel|World

The only way I have been able to make this work is perform a different SQL query dropping one column for each query, I'll just have to build 7 queries.

Gianni · April 4, 2015

So, you would remove all duplicates by checking data on only some columns and then keeping only one of those duplicated rows,
well, maybe the following function can do
just pass your array as first parameter and a string with the numbers of the columns that you want to check as comma separated values as second parameter (columns numbers are 0 based)

#include <Array.au3>

Local $aArray[6][4] = [ _
        ['Hello', 'Cruel', 'World', 'Now'], _
        ['Hello', 'Cruel', 'World', 'This'], _
        ['Hello', 'Cruel', 'World', 'Is'], _
        ['Hello', 'Cruel', 'World', 'What'], _
        ['Hello', 'Cruel', 'World', 'I'], _
        ['Hello', 'Cruel', 'World', 'Need']]

_ArrayDisplay($aArray, "Input array")
; returns only unique rows based on columns 0, 1 and 2
Local $aUniques = _ArrayUnique2D_Ex($aArray, "0,1,2")
_ArrayDisplay($aUniques, "Output array")


Func _ArrayUnique2D_Ex(ByRef $aSource, $sColumns = "*")
    ; check wanted columns
    If $sColumns = "*" Then
        Local $aColumns[UBound($aSource, 2)]
        For $i = 0 To UBound($aColumns) - 1
            $aColumns[$i] = $i
        Next
    Else
        Local $aColumns = StringSplit($sColumns, ",", 2)
    EndIf
    ; chain fields to check
    Local $aChainFileds[UBound($aSource, 1)][2]
    For $iRow = 0 To UBound($aSource, 1) - 1
        $aChainFileds[$iRow][1] = 0
        For $iField = 0 To UBound($aColumns) - 1
            $aChainFileds[$iRow][0] &= $aSource[$iRow][$aColumns[$iField]]
        Next
    Next
    ; uniqe from chain
    $aTemp = _ArrayUnique($aChainFileds, 0, 0, 0, 1) ; remove duplicate records (if any)
    Local $aUniques[UBound($aTemp)][UBound($aSource, 2)]
    $aUniques[0][0] = 0 ; pointer to next free row to fill
    If UBound($aChainFileds) <> $aTemp[0] Then ; there are some duplicate
        Local $aDuplicates[UBound($aChainFileds, 1) - $aTemp[0] + 1][UBound($aSource, 2)] ; will hold only duplicate
        $aDuplicates[0][0] = 0 ; pointer to next free row to fill

        For $iRow = 0 To UBound($aChainFileds, 1) - 1
            If Not $aChainFileds[$iRow][1] Then ; this record still not checked
                $aTemp = _ArrayFindAll($aChainFileds, $aChainFileds[$iRow][0]) ; find duplicates (if any)
                For $i = 0 To UBound($aTemp) - 1
                    $aChainFileds[$aTemp[$i]][1] = UBound($aTemp) ; mark this record as a duplicate
                Next
                $aUniques[0][0] += 1
                For $iField = 0 To UBound($aSource, 2) - 1
                    $aUniques[$aUniques[0][0]][$iField] = $aSource[$aTemp[0]][$iField]
                Next
                If UBound($aTemp) > 1 Then ; there are duplicates of this record
                    For $i = 1 To UBound($aTemp) - 1
                        $aDuplicates[0][0] += 1
                        For $iField = 0 To UBound($aSource, 2) - 1
                            $aDuplicates[$aDuplicates[0][0]][$iField] = $aSource[$aTemp[$i]][$iField]
                        Next
                    Next
                EndIf
            EndIf
        Next
        ; _ArrayDisplay($aUniques, "Those are unique elements")
        ; _ArrayDisplay($aDuplicates, "These are duplicates discarded")
    Else
        ; there are not duplicates in source array
        ; return passed array unchanged
        Return $aSource
    EndIf
    _ArrayDelete($aUniques, 0) ; remove the count row
    Return $aUniques
EndFunc   ;==>_ArrayUnique2D_Ex

iamtheky · April 4, 2015

What you are explaining is also "only return the items that exist in every column"?

#include <Array.au3>

Local $aArr[6][4] = [ _
        ['Hello', 'Cruel', 'World', 'Now'], _
        ['Hello', 'Cruel', 'World', 'This'], _
        ['Hello', 'Cruel', 'World', 'Is'], _
        ['Hello', 'Cruel', 'World', 'What'], _
        ['Hello', 'Cruel', 'World', 'I'], _
        ['Hello', 'Cruel', 'World', 'Need']]

_ArrayDisplay($aArr, "Input array")
_ArrayDisplay(_ArrayCommon($aArr), "CommonValue")
_ArrayDisplay(_ArrayMultiValue($aArr), "MultiValue")

Func _ArrayCommon($aArray)

Local $aOut[1][ubound($aArray , 2)]

    for $k = 0 to ubound($aArray , 2) - 1
        $aMatch =  _ArrayFindAll($aArray , $aArray[0][$k] , 0 , 0 , 0 , 0 , $k)
        If ubound($aMatch) = UBound($aArray) Then $aOut[0][$k] = $aArray[0][$k]
    next

return $aOut

EndFunc

Func _ArrayMultiValue($aArray)

Local $aOut[ubound($aArray)][ubound($aArray , 2)]
Local $aSkip[0]

    for $i = 0 to ubound($aArray) - 1
        for $k = 0 to ubound($aArray , 2) - 1
            _ArrayFindAll($aSkip , $k)
                If @Error = 0 Then ContinueLoop
                $aMatch =  _ArrayFindAll($aArray , $aArray[$i][$k] , 0 , 0 , 0 , 0 , $k)
                    If ubound($aMatch) <> UBound($aArray) Then
                        $aOut[$i][$k] = $aArray[$i][$k]
                    Else
                        _ArrayAdd($aSkip , $k)
                    
                    EndIf
        next
    next

return $aOut

EndFunc

Edit: Also, did the opposite and returned columns with multiple values just to learn...cleaned it up so it didnt re-search columns already determined to be all matches.

Edited April 4, 2015 by boththose

Gianni · April 4, 2015

What you are explaining is also "only return the items that exist in every column"?

#include <Array.au3>

Local $aArr[6][4] = [ _
        ['Hello', 'Cruel', 'World', 'Now'], _
        ['Hello', 'Cruel', 'World', 'This'], _
        ['Hello', 'Cruel', 'World', 'Is'], _
        ['Hello', 'Cruel', 'World', 'What'], _
        ['Hello', 'Cruel', 'World', 'I'], _
        ['Hello', 'Cruel', 'World', 'Need']]

_ArrayDisplay($aArr, "Input array")
_ArrayDisplay(_ArrayCommon($aArr), "CommonValue")
_ArrayDisplay(_ArrayMultiValue($aArr), "MultiValue")

Func _ArrayCommon($aArray)

Local $aOut[1][ubound($aArray , 2)]

    for $k = 0 to ubound($aArray , 2) - 1
        $aMatch =  _ArrayFindAll($aArray , $aArray[0][$k] , 0 , 0 , 0 , 0 , $k)
        If ubound($aMatch) = UBound($aArray) Then $aOut[0][$k] = $aArray[0][$k]
    next

return $aOut

EndFunc

Func _ArrayMultiValue($aArray)

Local $aOut[ubound($aArray)][ubound($aArray , 2)]
Local $aSkip[0]

    for $i = 0 to ubound($aArray) - 1
        for $k = 0 to ubound($aArray , 2) - 1
            _ArrayFindAll($aSkip , $k)
                If @Error = 0 Then ContinueLoop
                $aMatch =  _ArrayFindAll($aArray , $aArray[$i][$k] , 0 , 0 , 0 , 0 , $k)
                    If ubound($aMatch) <> UBound($aArray) Then
                        $aOut[$i][$k] = $aArray[$i][$k]
                    Else
                        _ArrayAdd($aSkip , $k)
                    
                    EndIf
        next
    next

return $aOut

EndFunc

Edit: Also, did the opposite and returned columns with multiple values just to learn...cleaned it up so it didnt re-search columns already determined to be all matches.

your _ArrayCommon function, check if a column has the same value in all the rows and in that case it returns that value

while _ArrayMultiValue returns the whole column if one or more values are different

but I think that this is not what OP need

for example if you have in input an array like this:

Local $aArray[6][4] = [ _
        ['Hello', 'Cruel', 'World', 'Now'], _
        ['Hello', 'Nice', 'World', 'This'], _
        ['Hello', 'Cruel', 'World', 'Is'], _
        ['Hello', 'Nice', 'World', 'What'], _
        ['Hello', 'Cruel', 'World', 'I'], _
        ['Hello', 'Wonder', 'World', 'Need']]

and you want to search for duplicated data only by columns 0 and 1 for example

the function _ArrayUnique2d should return something like this:

Hello    Cruel   World    Now
Hello    Nice    World    This
Hello    Wonder  World    Need

that is: when you find more rows with the same values in column 0 and 1, then return only one occurrence of the duplicated rows (remove the duplicates and keep only one)

this is a little more complicated to achieve than what your functions are intended to do.

edit:

changed array data

Edited April 4, 2015 by Chimp

mikell · April 4, 2015

for example if you have in input an array like this:
Local $aArray[6][4] = [ _
        ['Hello', 'Cruel', 'World', 'Now'], _
        ['Hello', 'Nice', 'World', 'This'], _
        ['Hello', 'Cruel', 'World', 'Is'], _
        ['Hello', 'Nice', 'World', 'What'], _
        ['Hello', 'Cruel', 'World', 'I'], _
        ['Hello', 'Wonder', 'World', 'Need']]
and you want to search for duplicated data only by columns 0 and 1 for example
the function _ArrayUnique2d should return something like this:
Hello    Cruel   World    Now
Hello    Nice    World    This
Hello    Wonder  World    Need
that is: when you find more rows with the same values in column 0 and 1, then return only one occurrence of the duplicated rows (remove the duplicates and keep only one)

In this case for return#1 , why would row#1 be returned rather than row#3 or row#5 ? should the non concerned columns be totally omitted ?

Gianni · April 4, 2015

In this case for return#1 , why would row#1 be returned rather than row#3 or row#5 ? should the non concerned columns be totally omitted ?

The first row of many repeated is the one that is kept, not other constraints are specified by op (however, suggestions for improvement are welcome)

anyway, in my function Is also available an array with all the discarded rows ($aDuplicates), even if not returned, it can be used if needed.

Jewtus · April 6, 2015

Ok, Let me clear this up a little because I think my first post was a little unclear.

This is what I'm trying to do.. I have a SQL query that returns a ton of duplicate data in the first few columns because of the join (its a parent child table so its a table joined to itself). I'd like to be able to use the same result query and trim it based on the column I select. So for example, If I wanted all of the records, I could use the base result. If I only wanted the first 2 columns worth of data, all of the other columns would be deleted, then the distinct records left from the first two columns would show.

I tried to using Chimps sample, but I didn't quite understand why when I ran my array against it, it would consistently only show the first row. I did however try to use Boththose _ArrayCommon function, and its much closer to what I'm trying to do. I did some a minor mod, that seems to work... sort of. My actual result array is 7 columns wide and can have duplicates in potentially any column.

The following code works up to field 6, which is where I would like to see two results:

#include <Array.au3>

Local $aArr[7][7] = [ _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result2'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result3'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result4'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result5'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result6'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result7']]



_ArrayDisplay(_ArrayCommon($aArr,6), "CommonValue")


Func _ArrayCommon($aArray,$colNum='')
    If $colNum <> '' Then
        For $z=UBound($aArray,2) to $colNum step -1
            _ArrayColDelete($aArray,$z)
        Next
    EndIf
    Local $aOut[1][ubound($aArray , 2)]
        for $k = 0 to ubound($aArray , 2) - 1
            $aMatch =  _ArrayFindAll($aArray , $aArray[0][$k] , 0 , 0 , 0 , 0 , $k)
            If ubound($aMatch) = UBound($aArray) Then $aOut[0][$k] = $aArray[0][$k]
        next
    return $aOut
EndFunc

The result I'd like to see is:

['Field1','Field2','Field3','Field4','Field5','Field6']

['Field1','Field2','Field3','Field4','Field5','Field6.1']

SadBunny · April 6, 2015

Why not just do another SQL query (or a couple of them) with a SELECT DISTINCT?

iamtheky · April 6, 2015

#include <Array.au3>

Local $aArr[7][7] = [ _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result2'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result3'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result4'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result5'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result6'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result7']]

_ArrayDisplay($aArr, "Input array")
_ArrayDisplay(_ArrayCommon($aArr), "CommonValue")


Func _ArrayCommon($aArray)

Local $aOut[ubound($aArray)][ubound($aArray , 2)]
Local $o = 0

for $i = 0 to ubound($aArray) - 1
    for $k = 0 to ubound($aArray , 2) - 1
        $aMatch =  _ArrayFindAll($aArray , $aArray[$i][$k] , 0 , 0 , 0 , 0 , $k)
        If ubound($aMatch) > 1 Then
          If $o = 0 Then
              $aOut[$o][$k] = $aArray[$i][$k]
              $flag = 1
              EndIf
          If $o > 0 AND $aArray[$i][$k] <> $aOut[$o - 1][$k] Then
                for $p = 0 to ubound($aArray , 2) - 1
                    $aLineMatch =  _ArrayFindAll($aArray , $aArray[$i][$p] , $i , 0 , 0 , 0 , $p)
              If ubound($aLineMatch) > 1 Then $aOut[$o][$p] = $aArray[$i][$p]
              next
              $flag = 1
          EndIf
        EndIf
    next
            If $flag = 1 Then
                $o += 1
                $flag = 0
            EndIf
Next

return $aOut

EndFunc

jchd · April 6, 2015

Why not just do another SQL query (or a couple of them) with a SELECT DISTINCT?

That, or use a where or having clause to restrict the resultset, while requesting explicitly the columns needed (instead of select * from...).

SQL in general is powerful enough for most precise queries and working out useful queries is always simpler than writing tons of application code to extract useful data from a "grab all" query. This also helps to keep your application simpler and easier to code/debug/maintain.

Edited April 6, 2015 by jchd

Gianni · April 6, 2015

Ok, Let me clear this up a little because I think my first post was a little unclear.

This is what I'm trying to do.. I have a SQL query that returns a ton of duplicate data in the first few columns because of the join (its a parent child table so its a table joined to itself). I'd like to be able to use the same result query and trim it based on the column I select. So for example, If I wanted all of the records, I could use the base result. If I only wanted the first 2 columns worth of data, all of the other columns would be deleted, then the distinct records left from the first two columns would show.

I tried to using Chimps sample, but I didn't quite understand why when I ran my array against it, it would consistently only show the first row. I did however try to use Boththose _ArrayCommon function, and its much closer to what I'm trying to do. I did some a minor mod, that seems to work... sort of. My actual result array is 7 columns wide and can have duplicates in potentially any column.

The following code works up to field 6, which is where I would like to see two results:
#include <Array.au3>

Local $aArr[7][7] = [ _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result2'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result3'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result4'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result5'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result6'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result7']]



_ArrayDisplay(_ArrayCommon($aArr,6), "CommonValue")


Func _ArrayCommon($aArray,$colNum='')
    If $colNum <> '' Then
        For $z=UBound($aArray,2) to $colNum step -1
            _ArrayColDelete($aArray,$z)
        Next
    EndIf
    Local $aOut[1][ubound($aArray , 2)]
        for $k = 0 to ubound($aArray , 2) - 1
            $aMatch =  _ArrayFindAll($aArray , $aArray[0][$k] , 0 , 0 , 0 , 0 , $k)
            If ubound($aMatch) = UBound($aArray) Then $aOut[0][$k] = $aArray[0][$k]
        next
    return $aOut
EndFunc
The result I'd like to see is:

['Field1','Field2','Field3','Field4','Field5','Field6']

['Field1','Field2','Field3','Field4','Field5','Field6.1']

it seems to me that my function returns exactly what you ask,

the only difference from your wanted output is that it returns all the columns of the input array instead of only the ones on which is performed the "Unique" check, but it can easly modified

can you try this and say if I understand well the goal?

#include <Array.au3>

Local $aArray[7][7] = [ _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result2'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result3'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result4'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result5'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6.1', 'Result6'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6.1', 'Result7']]

_ArrayDisplay($aArray, "Input array")
; returns only unique rows based on columns 0, 1 and 2
Local $aUniques = _ArrayUnique2D_Ex($aArray, "0,1,2,3,4,5")
_ArrayDisplay($aUniques, "Output array")


Func _ArrayUnique2D_Ex(ByRef $aSource, $sColumns = "*")
    ; check wanted columns
    If $sColumns = "*" Then
        Local $aColumns[UBound($aSource, 2)]
        For $i = 0 To UBound($aColumns) - 1
            $aColumns[$i] = $i
        Next
    Else
        Local $aColumns = StringSplit($sColumns, ",", 2)
    EndIf
    ; chain fields to check
    Local $aChainFileds[UBound($aSource, 1)][2]
    For $iRow = 0 To UBound($aSource, 1) - 1
        $aChainFileds[$iRow][1] = 0
        For $iField = 0 To UBound($aColumns) - 1
            $aChainFileds[$iRow][0] &= $aSource[$iRow][$aColumns[$iField]]
        Next
    Next
    ; uniqe from chain
    $aTemp = _ArrayUnique($aChainFileds, 0, 0, 0, 1) ; remove duplicate records (if any)
    Local $aUniques[UBound($aTemp)][UBound($aSource, 2)]
    $aUniques[0][0] = 0 ; pointer to next free row to fill
    If UBound($aChainFileds) <> $aTemp[0] Then ; there are some duplicate
        Local $aDuplicates[UBound($aChainFileds, 1) - $aTemp[0] + 1][UBound($aSource, 2)] ; will hold only duplicate
        $aDuplicates[0][0] = 0 ; pointer to next free row to fill

        For $iRow = 0 To UBound($aChainFileds, 1) - 1
            If Not $aChainFileds[$iRow][1] Then ; this record still not checked
                $aTemp = _ArrayFindAll($aChainFileds, $aChainFileds[$iRow][0]) ; find duplicates (if any)
                For $i = 0 To UBound($aTemp) - 1
                    $aChainFileds[$aTemp[$i]][1] = UBound($aTemp) ; mark this record as a duplicate
                Next
                $aUniques[0][0] += 1
                For $iField = 0 To UBound($aSource, 2) - 1
                    $aUniques[$aUniques[0][0]][$iField] = $aSource[$aTemp[0]][$iField]
                Next
                If UBound($aTemp) > 1 Then ; there are duplicates of this record
                    For $i = 1 To UBound($aTemp) - 1
                        $aDuplicates[0][0] += 1
                        For $iField = 0 To UBound($aSource, 2) - 1
                            $aDuplicates[$aDuplicates[0][0]][$iField] = $aSource[$aTemp[$i]][$iField]
                        Next
                    Next
                EndIf
            EndIf
        Next
        ; _ArrayDisplay($aUniques, "Those are unique elements")
        ; _ArrayDisplay($aDuplicates, "These are duplicates discarded")
    Else
        ; there are not duplicates in source array
        ; return passed array unchanged
        Return $aSource
    EndIf
    _ArrayDelete($aUniques, 0) ; remove the count row
    Return $aUniques
EndFunc   ;==>_ArrayUnique2D_Ex

edit:

p.s.

of course I agree with SadBunny and jchd form post #10 and #12. For sure with just an SQL query you can have only the data that you need,

anyway this function is just for "the fun in programming"

Edited April 6, 2015 by Chimp

Gianni · April 6, 2015

a small change to my _ArrayUnique2D_Ex() function that allows, by adding a parameter,

to choose if the returned array must contain all of the columns of the input array or only those columns relating to the check of "Uniqueness"
that is (for example):

$aUniques = _ArrayUnique2D_Ex($aArray, "0,1,2,3,4,5", False)
$aArray :       The input array
"0,1,2,3,4,5": [optional] The columns you want for "uniqueness"
False/True :   [optional] True (default) returns same columns as in input (reurn all columns?)
                           False returns only the columns as in previous param

Here the modified version:

#include <Array.au3>

Local $aArray[7][7] = [ _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result2'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result3'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result4'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result5'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6.1', 'Result6'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6.1', 'Result7']]

_ArrayDisplay($aArray, "Input array")
; returns only unique rows based on columns 0, 1, 2, 3, 4, 5 and returned array contains only those columns
Local $aUniques = _ArrayUnique2D_Ex($aArray, "0,1,2,3,4,5", False)
_ArrayDisplay($aUniques, "Output array")


Func _ArrayUnique2D_Ex(ByRef $aSource, $sColumns = "*", $iReturnAllCols = True)
    ; check wanted columns
    If $sColumns = "*" Then
        Local $aColumns[UBound($aSource, 2)]
        For $i = 0 To UBound($aColumns) - 1
            $aColumns[$i] = $i
        Next
    Else
        Local $aColumns = StringSplit($sColumns, ",", 2) ; NO count in element 0
    EndIf

    ; chain fields to check
    Local $aChainFileds[UBound($aSource, 1)][2]
    For $iRow = 0 To UBound($aSource, 1) - 1
        $aChainFileds[$iRow][1] = 0
        For $iField = 0 To UBound($aColumns) - 1
            $aChainFileds[$iRow][0] &= $aSource[$iRow][$aColumns[$iField]]
        Next
    Next
    ; uniqe from chain
    $aTemp = _ArrayUnique($aChainFileds, 0, 0, 0, 1) ; remove duplicate records (if any)
    If $iReturnAllCols Then
        Local $aUniques[UBound($aTemp)][UBound($aSource, 2)] ; Return all columns
    Else
        Local $aUniques[UBound($aTemp)][UBound($aColumns)] ; Return only checked columns
    EndIf
    $aUniques[0][0] = 0 ; pointer to next free row to fill
    If UBound($aChainFileds) <> $aTemp[0] Then ; there are some duplicate
        Local $aDuplicates[UBound($aChainFileds, 1) - $aTemp[0] + 1][UBound($aSource, 2)] ; will hold only duplicate
        $aDuplicates[0][0] = 0 ; pointer to next free row to fill

        For $iRow = 0 To UBound($aChainFileds, 1) - 1
            If Not $aChainFileds[$iRow][1] Then ; this record still not checked
                $aTemp = _ArrayFindAll($aChainFileds, $aChainFileds[$iRow][0]) ; find duplicates (if any)
                For $i = 0 To UBound($aTemp) - 1
                    $aChainFileds[$aTemp[$i]][1] = UBound($aTemp) ; mark this record as a duplicate
                Next
                $aUniques[0][0] += 1
                If $iReturnAllCols Then
                    For $iField = 0 To UBound($aSource, 2) - 1
                        $aUniques[$aUniques[0][0]][$iField] = $aSource[$aTemp[0]][$iField]
                    Next
                Else
                    For $iField = 0 To UBound($aColumns) - 1
                        $aUniques[$aUniques[0][0]][$iField] = $aSource[$aTemp[0]][$aColumns[$iField]]
                    Next
                EndIf
                If UBound($aTemp) > 1 Then ; there are duplicates of this record
                    For $i = 1 To UBound($aTemp) - 1
                        $aDuplicates[0][0] += 1
                        For $iField = 0 To UBound($aSource, 2) - 1
                            $aDuplicates[$aDuplicates[0][0]][$iField] = $aSource[$aTemp[$i]][$iField]
                        Next
                    Next
                EndIf
            EndIf
        Next
        ; _ArrayDisplay($aUniques, "Those are unique elements")
        ; _ArrayDisplay($aDuplicates, "These are duplicates discarded")
    Else
        ; there are not duplicates in source array
        ; return passed array unchanged
        Return $aSource
    EndIf
    _ArrayDelete($aUniques, 0) ; remove the count row
    Return $aUniques
EndFunc   ;==>_ArrayUnique2D_Ex

@boththose

It seems to me that your function has a bug and a limit:
1) the bug: If you pass the following array (note the "FieldX" in middle of column3) your function does not catch it

Local $aArr[7][7] = [ _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result2'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result3'], _
    ['Field1','Field2','FieldX','Field4','Field5','Field6','Result4'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6','Result5'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result6'], _
    ['Field1','Field2','Field3','Field4','Field5','Field6.1','Result7']]

2) the limit: if you want to check only few columns, say 3 and 5, you can't.

Jewtus · April 6, 2015

@SadBunny

That is the way I have it right now, but I'm trying to avoid spamming queries as well as making it so if I change the schema, I don't need to update 7 queries to reflect the changes

@Chimp

I think the issue I had with your script was that when it returned the array (EX when I put '0,1,2') it returned all 7 columns when I need it to return just the 3 columns. I did take another look and I tweaked your script to make it do what I wanted:

#include <Array.au3>


Local $aArray[7][7] = [ _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result2'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result3'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result4'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6', 'Result5'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6.1', 'Result6'], _
        ['Field1', 'Field2', 'Field3', 'Field4', 'Field5', 'Field6.1', 'Result7']]


_ArrayDisplay($aArray, "Input array")
; returns only unique rows based on columns 0, 1 and 2
Local $aUniques = _ArrayUnique2D_Ex($aArray, "0,1,2,3,4,5,6")
_ArrayDisplay($aUniques, "Output array")




Func _ArrayUnique2D_Ex(ByRef $aSource, $sColumns = "*")
    ; check wanted columns
    If $sColumns = "*" Then
        Local $aColumns[UBound($aSource, 2)]
        For $i = 0 To UBound($aColumns) - 1
            $aColumns[$i] = $i
        Next
    Else
        Local $aColumns = StringSplit($sColumns, ",", 2)
    EndIf
    ; chain fields to check
    Local $aChainFileds[UBound($aSource, 1)][2]
    For $iRow = 0 To UBound($aSource, 1) - 1
        $aChainFileds[$iRow][1] = 0
        For $iField = 0 To UBound($aColumns) - 1
            $aChainFileds[$iRow][0] &= $aSource[$iRow][$aColumns[$iField]]
        Next
    Next
    ; uniqe from chain
    $aTemp = _ArrayUnique($aChainFileds, 0, 0, 0, 1) ; remove duplicate records (if any)
    Local $aUniques[UBound($aTemp)][UBound($aSource, 2)]
    $aUniques[0][0] = 0 ; pointer to next free row to fill
    If UBound($aChainFileds) <> $aTemp[0] Then ; there are some duplicate
        Local $aDuplicates[UBound($aChainFileds, 1) - $aTemp[0] + 1][UBound($aSource, 2)] ; will hold only duplicate
        $aDuplicates[0][0] = 0 ; pointer to next free row to fill


        For $iRow = 0 To UBound($aChainFileds, 1) - 1
            If Not $aChainFileds[$iRow][1] Then ; this record still not checked
                $aTemp = _ArrayFindAll($aChainFileds, $aChainFileds[$iRow][0]) ; find duplicates (if any)
                For $i = 0 To UBound($aTemp) - 1
                    $aChainFileds[$aTemp[$i]][1] = UBound($aTemp) ; mark this record as a duplicate
                Next
                $aUniques[0][0] += 1
                For $iField = 0 To UBound($aSource, 2) - 1
                    $aUniques[$aUniques[0][0]][$iField] = $aSource[$aTemp[0]][$iField]
                Next
                If UBound($aTemp) > 1 Then ; there are duplicates of this record
                    For $i = 1 To UBound($aTemp) - 1
                        $aDuplicates[0][0] += 1
                        For $iField = 0 To UBound($aSource, 2) - 1
                            $aDuplicates[$aDuplicates[0][0]][$iField] = $aSource[$aTemp[$i]][$iField]
                        Next
                    Next
                EndIf
            EndIf
        Next
        ; _ArrayDisplay($aUniques, "Those are unique elements")
        ; _ArrayDisplay($aDuplicates, "These are duplicates discarded")
    Else
        ; there are not duplicates in source array
        ; return passed array unchanged
        Return $aSource
    EndIf
    _ArrayDelete($aUniques, 0) ; remove the count row
$trimOutput=StringSplit($sColumns,",")
$colNum=$trimOutput[UBound($trimOutput)-1]
If $colNum <> '' Then
For $z=UBound($aUniques,2) to $colNum+1 step -1
_ArrayColDelete($aUniques,$z)
Next
    EndIf
    Return $aUniques
EndFunc   ;==>_ArrayUnique2D_Ex

Edited April 6, 2015 by Jewtus

jchd · April 6, 2015

If you ever need to update the schema, you're essentially forced to modify the filtering code. But that's up to you.

Gianni · April 6, 2015

I posted the same time you posted....

see my post #14 above for a little improvement that allows you to check any column and return only those columns if you don't want all the other

Jewtus · April 6, 2015

@Chimp

Ya I saw that and changed the accepted solve. Thanks!

@jchd

Why would you say that I have to modify the filtering code? If I was to update the schema, I'd also update the initial query, which would make the overall array update, and that is what I'm looking for the uniqueness on... the original array result

iamtheky · April 6, 2015

Based off the explanations so far, I would think that Fieldx in that instance would not be desired, since it is just as unique as the items in the Results column. But it works fine when there is another, since I am only finding commonalities, not uniques. And I never saw #2 as a requirement.

Glad you made it out without Sql though, I dont even get to take guesses in those threads.

WildByDesign · December 31, 2024

On 4/6/2015 at 10:05 AM, Gianni said:

a small change to my _ArrayUnique2D_Ex() function that allows, by adding a parameter,

to choose if the returned array must contain all of the columns of the input array or only those columns relating to the check of "Uniqueness"
that is (for example):

$aUniques = _ArrayUnique2D_Ex($aArray, "0,1,2,3,4,5", False)

@Gianni I understand that this is a pretty old thread to reply to, but your _ArrayUnique2D_Ex() function is pure gold!

The program that I am making right now has 5 columns in the array. Each row required that all of the columns (except column 0) were filtered together for uniqueness. Column 0 contains a time stamp for each row, making the uniqueness filtering even more complicated.

I had achieved what I wanted via 4 separate functions, including _FileWriteFromArray to be consumed by PowerShell, filtered for uniqueness, and _FileReadToArray back to the program. Needless to say, it was complex, slow and likely more error prone.

I spent 2-3 days searching this forum for a more efficient solution and trying different things so that I could keep it all within the program and avoid PowerShell. That is when I finally came across your _ArrayUnique2D_Ex() function.

The fact that I was able to specify which columns to filter for uniqueness with your function, it made the entire thing incredibly simple and lightning fast.

Since I had to avoid the uniqueness filtering of column 0 because of the time stamp, I used the following:

$aUniques = _ArrayUnique2D_Ex($aArray1, "1,2,3,4", True)

Your _ArrayUnique2D_Ex() function is absolute magic and I really had to reply here to thank you for sharing this. I would imagine that this function could benefit a lot of people when it comes to filtering uniqueness on 2D arrays. And it does so with simplicity due to allow to specific columns.

This function really should become part of AutoIT. There were hundreds (likely many more) of threads with users asking for how to filter uniqueness on 2D arrays throughout this forum that I had to search through and read for a few days until I found your solution.

Have a wonderful New Year! Cheers!

Edited December 31, 2024 by WildByDesign
spelling

Sign In

_ArrayUnique on multiple columns

Recommended Posts

Jewtus

Gianni

Jewtus

Gianni

iamtheky

Gianni

mikell

Gianni

Jewtus

SadBunny

iamtheky

jchd

Gianni

Gianni

Jewtus

jchd

Gianni

Jewtus

iamtheky

WildByDesign

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members

Browse

AutoIt Resources

Release

Beta