Pattern Matching

JRowe · March 30, 2010

A function I rewrote from some PHP code written by a guy I met on the freenode #ai IRC channel.

; #FUNCTION# ;===============================================================================
;
; Name...........: _VectorDelta
; Description ...: Returns the Euclidean distance between two lists (lower means more similar)
; Syntax.........: _VectorDelta($aDatasetA, $aDatasetB)
; Parameters ....: $aDatasetA, $aDatasetB
; Return values .: Success - Distance between the lists, compared over their overlapping positions.
;   Failure - Returns 0 if no pairs were compared.
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Not to be used directly, for use by _Similarity()
; Related .......: _Similarity
; Link ..........;
; Example .......; Yes
;
; ;==========================================================================================
Func _VectorDelta($aDatasetA, $aDatasetB)
    ;count
    Local $iCount = 0
    ;return
    Local $return = 0
    ;temp value and its square
    Local $tempValue = 0
    Local $tempValueSquared = 0
    ;index
    Local $index = 0
    ;value
    Local $value = 0

    ;iterate through each value in $aDatasetA and compare to values in $aDatasetB
    ;Iterate comparisons from here...
    For $value In $aDatasetA
        ;increment index
        $index += 1
        ;check if index is lesser than or equal to the size of $aDatasetB
        If $index <= UBound($aDatasetB) Then
            $iCount += 1
            $tempValue = $aDatasetB[$index - 1] - $value
            $tempValueSquared = $tempValue * $tempValue
            $return += $tempValueSquared
        EndIf
    Next
    ;... to here.

    ;Check the count of compared dataset pairs, return the square root of the summed comparisons or else 0
    If $iCount > 0 Then
        If $return > 0 Then
            $return = Sqrt($return)
        EndIf
    EndIf

    ;Return the result.
    Return $return
EndFunc ;==>_VectorDelta

; #FUNCTION# ;===============================================================================
;
; Name...........: _Similarity
; Description ...: Returns a similarity score between a list of elements and a set of other lists
; Syntax.........: _Similarity($aArrayH, $iIndexA, $iIndexB)
; Parameters ....: $aArrayH, $iIndexA, $iIndexB
; Return values .: Success - Similarity score (0 to 1) for $aArrayH[$iIndexA] and $aArrayH[$iIndexB], judged against the distance from each of them to the other arrays.
;   Failure - Division by zero when no other arrays are tallied ($tally = 0).
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Compares element to element, doesn't do iterative correlation.
; Related .......: _VectorDelta
; Link ..........;
; Example .......; Yes
;
; ;==========================================================================================
Func _Similarity($aArrayH, $iIndexA, $iIndexB)
    ;return
    Local $return = 0
    ;tally
    Local $tally = 0
    ;Vector delta of A to B
    Local $similarityOfAToB = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$iIndexB])
    Local $index = 0

    ;Iterate through each array, comparing similarity of every array
    For $iIndexC In $aArrayH
        $index += 1
        ;skip the entries where the 1-based $index equals $iIndexA or $iIndexB
        ;(note that $iIndexA and $iIndexB themselves are 0-based positions in $aArrayH)
        If ($index <> $iIndexA) And ($index <> $iIndexB) Then
            ;increment tally of comparisons
            $tally += 1
            ;Get Vector Delta of array[A] and array[index-1]
            Local $similarityOfAToList = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$index - 1])
            ;Get Vector Delta of array[B] and array[index-1]
            Local $similarityOfBToList = _VectorDelta($aArrayH[$iIndexB], $aArrayH[$index - 1])
            ;increment $return if A is closer to this list than it is to B
            If $similarityOfAToB > $similarityOfAToList Then $return += 1
            ;increment $return if B is closer to this list than it is to A
            If $similarityOfAToB > $similarityOfBToList Then $return += 1
        EndIf
    Next
    ;return 1 minus ($return / 2) divided by the number of tallied comparisons
    Return 1-($return / 2 / $tally)
EndFunc ;==>_Similarity

Example:

#include "_CorrelativeAnalysis.au3"
;Example

;1,2,3,4 representing up(1) down(2) left(3) and right(4) respectively
;[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] is a line going straight up, for example.

;Dataset for patterns
Global $testSet1[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3]
Global $testSet2[16] = [1,1,1,1,4,4,4,4,4,4,4,4,1,1,1,1]
Global $testSet3[16] = [1,1,1,4,1,1,1,1,1,1,1,1,3,3,3,3]
Global $testSet4[16] = [2,2,2,2,3,3,3,3,3,3,3,3,1,1,1,1]
Global $testSet5[16] = [3,3,3,3,2,2,2,2,4,4,4,4,1,1,1,1]
Global $testSet6[16] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

;pattern we want to test
Global $MatchSet[16] = [1,4,1,1,1,4,4,4,3,2,1,2,3,3,4,1]

Global $comparison[7] = [$MatchSet, $testSet1, $testSet2, $testSet3, $testSet4, $testSet5, $testSet6]

ConsoleWrite("Similarity to 1: " & _Similarity($comparison, 0, 1) & @CRLF)
ConsoleWrite("Similarity to 2: " & _Similarity($comparison, 0, 2) & @CRLF)
ConsoleWrite("Similarity to 3: " & _Similarity($comparison, 0, 3) & @CRLF)
ConsoleWrite("Similarity to 4: " & _Similarity($comparison, 0, 4) & @CRLF)
ConsoleWrite("Similarity to 5: " & _Similarity($comparison, 0, 5) & @CRLF)
ConsoleWrite("Similarity to 6: " & _Similarity($comparison, 0, 6) & @CRLF)

This performs element-to-element matching. It doesn't handle nonlinear data sets, only linear clusters. It will detect similarities between pixel colors in the same position, for example, but it won't detect similarities between a pixel and its neighbors. That requires cycling through iterations and transformations of the data.

In the example, I laid out a set of arbitrary paths that could be seen as input from a mouse gesture. $MatchSet is the pattern being tested against the data set. The script returns 91% similarity for the correct match (test set 1) and lower similarity for each other set.
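For anyone following the arithmetic, here is one reading of what the two functions compute, written out as formulas. The symbols d, S, r and t are labels chosen for this sketch, not names from the code.

```latex
% d: what _VectorDelta returns for lists a and b
% (only the overlapping positions are compared)
d(a, b) = \sqrt{\sum_{i=1}^{\min(|a|,\,|b|)} (b_i - a_i)^2}

% S: what _Similarity returns, where t is the number of lists C tallied by the loop
% and r counts how often d(A, B) exceeds d(A, C) or d(B, C)
S(A, B) = 1 - \frac{r}{2t}, \qquad
r = \sum_{C} \Big( [\, d(A,B) > d(A,C) \,] + [\, d(A,B) > d(B,C) \,] \Big)

% Example run for $MatchSet against test set 1: d(A, B) = \sqrt{25} = 5,
% the loop as written tallies t = 6 lists and finds r = 1
S = 1 - \frac{1}{2 \cdot 6} = \frac{11}{12} \approx 0.917
```

So a lower d means two lists are closer, and S drops each time some other list sits closer to A or B than they sit to each other.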

Sets can be weighted by repeated inclusion. You can match against incomplete sets, but the data must be correctly aligned.
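A small sketch of the alignment point, assuming the _CorrelativeAnalysis.au3 include from the example. The arrays $aFull, $aPartial and $aShifted are made up for illustration. _VectorDelta only compares positions that exist in both lists, so a shorter list is measured against the start of the longer one.

```autoit
; Assumes _VectorDelta is available from the include used in the example
#include "_CorrelativeAnalysis.au3"

; Illustrative data only (not from the original example)
Global $aFull[8] = [1, 1, 1, 1, 4, 4, 4, 4]
Global $aPartial[4] = [1, 1, 1, 1] ; incomplete, but aligned with the start of $aFull
Global $aShifted[4] = [4, 4, 4, 4] ; matches the tail of $aFull, but is not aligned with it

; Only the first 4 positions are compared in both calls
ConsoleWrite("Aligned partial: " & _VectorDelta($aFull, $aPartial) & @CRLF) ; prints 0
ConsoleWrite("Shifted partial: " & _VectorDelta($aFull, $aShifted) & @CRLF) ; prints 6
```

The shifted list scores worse even though its values do occur in $aFull, which is why the incomplete data has to line up position for position.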

If anyone is interested, I'd really love some help in array manipulation so that this could be used on nonlinear data. Things like facial recognition and feature detection are possible, but I'm not the greatest at matrix manipulation.

Think of this as N-dimensional Venn diagrams. The similarity scores represent the percentage of overlap between each element in each list.

Thanks to Keal for laying this out. This is really a very robust and powerful piece of code.

_CorrelativeAnalysis.au3

gazai · July 23, 2010

Cool!

Can it be used for some nonparametric correlation use (such as ranked correlation)?

Thanks for the script.

Xibalba · July 29, 2010

Seems like a nice one,

Would it be possible to breakdown strings to arrays (with Asc() ?), and get a match score?

For example: string1 = "Arnold Schwarzenegger", string2 = "Arnold Shwarseneger" - match about ~80% ?

Or perhaps some other function/UDF would be better for that?

Ty

JRowe · August 4, 2010

This would be a good use for that. Also, you can cycle through it, so asdArnold Scwharzenjikksdfg would also show a high match. This requires throwing the comparison in a loop, iterating over the offset string, and returning the highest match.
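A rough sketch of that loop, under some assumptions: it uses AutoIt's built-in StringToASCIIArray() to turn each string into a list of character codes, and it calls _VectorDelta directly. The helper name _BestWindowDelta is hypothetical and not part of the UDF. Since _VectorDelta returns a distance rather than a percentage, the best match here is the lowest value, with 0 meaning an identical window.

```autoit
; Assumes _VectorDelta is available from the include used in the example
#include "_CorrelativeAnalysis.au3"

; Hypothetical helper (not part of the original UDF): slide $sNeedle across
; $sHaystack and return the lowest _VectorDelta found over all windows.
Func _BestWindowDelta($sHaystack, $sNeedle)
    Local $aNeedle = StringToASCIIArray($sNeedle)
    Local $iLen = StringLen($sNeedle)
    Local $iBest = -1
    Local $aWindow, $iDelta

    ; Try every window of the needle's length inside the haystack
    For $iOffset = 1 To StringLen($sHaystack) - $iLen + 1
        $aWindow = StringToASCIIArray(StringMid($sHaystack, $iOffset, $iLen))
        $iDelta = _VectorDelta($aNeedle, $aWindow)
        ; Keep the smallest distance seen so far
        If $iBest < 0 Or $iDelta < $iBest Then $iBest = $iDelta
    Next

    ; Stays at -1 if the haystack is shorter than the needle
    Return $iBest
EndFunc   ;==>_BestWindowDelta

ConsoleWrite("Best delta: " & _BestWindowDelta("asdArnold Scwharzenjikksdfg", "Arnold Schwarzenegger") & @CRLF)
```

To get a percentage-style score instead, the windows could be collected into an array alongside some reference strings and passed to _Similarity, at the cost of more comparisons per offset.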
