JRowe Posted March 30, 2010

A pair of functions I rewrote from some PHP code written by a guy I met in the #ai channel on freenode IRC.

```autoit
; #FUNCTION# ;===============================================================================
;
; Name...........: _VectorDelta
; Description ...: Returns the Euclidean distance between two lists (0 means identical)
; Syntax.........: _VectorDelta($aDatasetA, $aDatasetB)
; Parameters ....: $aDatasetA, $aDatasetB - 1D arrays of numbers
; Return values .: Success - Square root of the summed squared differences between elements at the
;                            same position; positions beyond the end of the shorter array are ignored.
;                  Failure - None; returns 0 if no positions were compared.
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Not to be used directly, for use by _Similarity()
; Related .......: _Similarity
; Link ..........:
; Example .......: Yes
;
;==========================================================================================
Func _VectorDelta($aDatasetA, $aDatasetB)
    Local $iCount = 0 ; number of compared element pairs
    Local $return = 0 ; running sum of squared differences
    Local $tempValue = 0 ; difference for the current pair
    Local $index = 0 ; 1-based position in $aDatasetA
    Local $value = 0
    ; Compare each value in $aDatasetA to the value at the same position in $aDatasetB
    For $value In $aDatasetA
        $index += 1
        ; Only compare positions that exist in both arrays
        If $index <= UBound($aDatasetB) Then
            $iCount += 1
            $tempValue = $aDatasetB[$index - 1] - $value
            $return += $tempValue * $tempValue
        EndIf
    Next
    ; Return the square root of the summed squares, or 0 if nothing was compared
    If $iCount > 0 Then Return Sqrt($return)
    Return 0
EndFunc   ;==>_VectorDelta

; #FUNCTION# ;===============================================================================
;
; Name...........: _Similarity
; Description ...: Returns a similarity score (0 to 1) for $aArrayH[$iIndexA] and $aArrayH[$iIndexB],
;                  measured against every other list in $aArrayH
; Syntax.........: _Similarity($aArrayH, $iIndexA, $iIndexB)
; Parameters ....: $aArrayH - array whose elements are 1D arrays (the lists)
;                  $iIndexA, $iIndexB - 0-based indices of the two lists to compare
; Return values .: Success - 1 minus the fraction of comparisons in which A and B are farther apart
;                            than A (or B) is from one of the other lists.
;                  Failure - 0 and sets @error to 1 if $aArrayH holds no lists other than A and B
;                            (the score would otherwise divide by zero).
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Compares element to element, doesn't do iterative correlation.
; Related .......: _VectorDelta
; Link ..........:
; Example .......: Yes
;
;==========================================================================================
Func _Similarity($aArrayH, $iIndexA, $iIndexB)
    Local $return = 0 ; comparisons in which A and B are the farther-apart pair
    Local $tally = 0 ; number of other lists compared
    Local $similarityOfAToList, $similarityOfBToList
    ; Distance (vector delta) between A and B
    Local $similarityOfAToB = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$iIndexB])
    ; Compare A and B against every other list in $aArrayH
    For $index = 0 To UBound($aArrayH) - 1
        ; Don't include self comparisons (indices are 0-based)
        If ($index <> $iIndexA) And ($index <> $iIndexB) Then
            $tally += 1
            ; Vector delta of array[A] and of array[B] to this list
            $similarityOfAToList = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$index])
            $similarityOfBToList = _VectorDelta($aArrayH[$iIndexB], $aArrayH[$index])
            ; Count each case where A and B are farther apart than A (or B) is from this list
            If $similarityOfAToB > $similarityOfAToList Then $return += 1
            If $similarityOfAToB > $similarityOfBToList Then $return += 1
        EndIf
    Next
    If $tally = 0 Then Return SetError(1, 0, 0)
    ; Each list contributes two comparisons, so divide by 2 * $tally
    Return 1 - ($return / 2 / $tally)
EndFunc   ;==>_Similarity
```
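Written out as formulas, the two functions compute roughly the following. This is a reconstruction from the code, not something stated in the original PHP: $d$ stands for _VectorDelta, $n$ is the number of positions both lists share, $C$ is the set of lists in $aArrayH other than A and B (self comparisons excluded, as the loop comment intends), and $[\cdot]$ is 1 when the condition holds and 0 otherwise.

```latex
d(X, Y) = \sqrt{\sum_{i=1}^{n} \left(Y_i - X_i\right)^2}, \qquad n = \min(|X|, |Y|)

S(A, B) = 1 - \frac{1}{2\,|C|} \sum_{c \in C} \Big( [\, d(A,B) > d(A,c) \,] + [\, d(A,B) > d(B,c) \,] \Big)
```

So $S(A, B) = 1$ when A and B are at least as close to each other as either is to any other list, and every comparison in which another list sits closer to A (or to B) lowers the score by $1 / (2|C|)$.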
Example:

```autoit
#include "_CorrelativeAnalysis.au3"

; Example
; 1, 2, 3 and 4 represent up, down, left and right respectively,
; so [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] is a line going straight up.

; Dataset of patterns
Global $testSet1[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3]
Global $testSet2[16] = [1,1,1,1,4,4,4,4,4,4,4,4,1,1,1,1]
Global $testSet3[16] = [1,1,1,4,1,1,1,1,1,1,1,1,3,3,3,3]
Global $testSet4[16] = [2,2,2,2,3,3,3,3,3,3,3,3,1,1,1,1]
Global $testSet5[16] = [3,3,3,3,2,2,2,2,4,4,4,4,1,1,1,1]
Global $testSet6[16] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

; Pattern we want to test
Global $MatchSet[16] = [1,4,1,1,1,4,4,4,3,2,1,2,3,3,4,1]

; Index 0 is the pattern, indices 1-6 are the test sets
Global $comparison[7] = [$MatchSet, $testSet1, $testSet2, $testSet3, $testSet4, $testSet5, $testSet6]

ConsoleWrite("Similarity to 1: " & _Similarity($comparison, 0, 1) & @CRLF)
ConsoleWrite("Similarity to 2: " & _Similarity($comparison, 0, 2) & @CRLF)
ConsoleWrite("Similarity to 3: " & _Similarity($comparison, 0, 3) & @CRLF)
ConsoleWrite("Similarity to 4: " & _Similarity($comparison, 0, 4) & @CRLF)
ConsoleWrite("Similarity to 5: " & _Similarity($comparison, 0, 5) & @CRLF)
ConsoleWrite("Similarity to 6: " & _Similarity($comparison, 0, 6) & @CRLF)
```

This performs element-to-element matching. It doesn't handle nonlinear data sets, only linear clusters: it will detect similarities between pixel colors in the same position, for example, but not between a pixel and its neighbors. That requires cycling through iterations and transformations of the data.

In the example, I laid out a set of arbitrary paths that could be seen as input from a mouse gesture. $MatchSet is the pattern being tested against the data set. The correct match (test set 1) gets the highest similarity score, and every other set scores lower. Sets can be weighted by including them more than once. You can match against incomplete sets, since _VectorDelta only compares the positions both arrays have, but the data must be correctly aligned.
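To illustrate the alignment point, here is a minimal standalone sketch. _PrefixDistance is a hypothetical copy of the _VectorDelta logic (so the snippet runs without the include); like _VectorDelta, it only compares the positions both arrays share. The short arrays are the first and last four moves of $MatchSet and $testSet1 from the example above.

```autoit
; Hypothetical standalone copy of the _VectorDelta logic: Euclidean distance
; over the positions both arrays share (the length of the shorter array).
Func _PrefixDistance($aA, $aB)
    Local $iN = UBound($aA)
    If UBound($aB) < $iN Then $iN = UBound($aB)
    Local $fSum = 0, $fDiff
    For $i = 0 To $iN - 1
        $fDiff = $aB[$i] - $aA[$i]
        $fSum += $fDiff * $fDiff
    Next
    Return Sqrt($fSum)
EndFunc   ;==>_PrefixDistance

Global $aTestSet1[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3] ; same values as $testSet1
Global $aMatchHead[4] = [1,4,1,1] ; first four moves of $MatchSet
Global $aMatchTail[4] = [3,3,4,1] ; last four moves of $MatchSet
Global $aSet1Tail[4] = [3,3,3,3]  ; last four moves of $testSet1

; A partial gesture that starts where the full one starts is lined up with it: Sqrt(9) = 3
ConsoleWrite(_PrefixDistance($aMatchHead, $aTestSet1) & @CRLF)
; The tail is compared against the start of $testSet1, the wrong part: Sqrt(17), about 4.12
ConsoleWrite(_PrefixDistance($aMatchTail, $aTestSet1) & @CRLF)
; Aligned with the matching part of $testSet1, the tail is much closer: Sqrt(5), about 2.24
ConsoleWrite(_PrefixDistance($aMatchTail, $aSet1Tail) & @CRLF)
```

In other words, the caller has to line partial data up with the right positions; the distance itself does not search for them.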
If anyone is interested, I'd really love some help with array manipulation so that this could be used on nonlinear data. Things like facial recognition and feature detection are possible, but I'm not the greatest at matrix manipulation.

Think of this as N-dimensional Venn diagrams. Loosely, the similarity score measures overlap: it is the share of comparisons in which A and B are at least as close to each other as A (or B) is to one of the other lists.

Thanks to Keal for laying this out. This is really a very robust and powerful piece of code.

_CorrelativeAnalysis.au3

However, like ninjas, cyber warriors operate in silence.
AutoIt Chat Engine (+Chatbot), Link Grammar for AutoIt, Simple Speech Recognition, Artificial Neural Networks UDF, Bayesian Networks UDF, Pattern Matching UDF, Transparent PNG GUI Elements, Au3Irrlicht 2, Advanced Mouse Events Monitor, Grammar Database Generator, Transitions & Tweening UDF, Poker Hand Evaluator
gazai Posted July 23, 2010

Cool! Can it be used for nonparametric correlation (such as rank correlation)? Thanks for the script.
Xibalba Posted July 29, 2010

Seems like a nice one. Would it be possible to break strings down into arrays (with Asc()?) and get a match score? For example, would string1 = "Arnold Schwarzenegger" and string2 = "Arnold Shwarseneger" match at about 80%? Or would some other function/UDF be better for that? Ty
JRowe (Author) Posted August 4, 2010

This would be a good use for it. You can also cycle through offsets, so that "asdArnold Scwharzenjikksdfg" would also show a high match. This requires putting the comparison in a loop, iterating over the offset string, and returning the highest match.
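A rough sketch of that loop, under a few assumptions: each string is turned into an array of character codes with Asc(), the shorter array is slid across the longer one, and the lowest Euclidean distance (the same measure _VectorDelta uses) is kept as the best match. _StringToCodes, _WindowDistance and _BestOffsetDistance are hypothetical names, and ranking by raw distance stands in for _Similarity, which needs a set of reference lists to score against.

```autoit
; Hypothetical helper: convert a string to an array of character codes via Asc().
; Returns 0 and sets @error for an empty string (an AutoIt array needs at least one element).
Func _StringToCodes($sText)
    Local $iLen = StringLen($sText)
    If $iLen = 0 Then Return SetError(1, 0, 0)
    Local $aCodes[$iLen]
    For $i = 1 To $iLen
        $aCodes[$i - 1] = Asc(StringMid($sText, $i, 1))
    Next
    Return $aCodes
EndFunc   ;==>_StringToCodes

; Hypothetical helper: Euclidean distance between $aShort and the equally long
; window of $aLong starting at $iOffset (the caller keeps the window in range).
Func _WindowDistance($aShort, $aLong, $iOffset)
    Local $fSum = 0, $fDiff
    For $i = 0 To UBound($aShort) - 1
        $fDiff = $aLong[$iOffset + $i] - $aShort[$i]
        $fSum += $fDiff * $fDiff
    Next
    Return Sqrt($fSum)
EndFunc   ;==>_WindowDistance

; Slide the shorter string's codes across the longer string's codes and keep the
; smallest distance (0 = exact match). The best offset is returned in @extended.
; Assumes both strings are non-empty.
Func _BestOffsetDistance($sA, $sB)
    Local $aShort = _StringToCodes($sA), $aLong = _StringToCodes($sB), $aSwap
    If UBound($aShort) > UBound($aLong) Then
        $aSwap = $aShort
        $aShort = $aLong
        $aLong = $aSwap
    EndIf
    Local $fBest = -1, $iBestOffset = 0, $fDist
    For $iOffset = 0 To UBound($aLong) - UBound($aShort)
        $fDist = _WindowDistance($aShort, $aLong, $iOffset)
        If $fBest < 0 Or $fDist < $fBest Then
            $fBest = $fDist
            $iBestOffset = $iOffset
        EndIf
    Next
    Return SetExtended($iBestOffset, $fBest)
EndFunc   ;==>_BestOffsetDistance

; Usage with the noisy string from the post above
Global $fBestDist = _BestOffsetDistance("Arnold Schwarzenegger", "asdArnold Scwharzenjikksdfg")
ConsoleWrite("Best distance: " & $fBestDist & " at offset " & @extended & @CRLF)
```

Sliding handles extra characters at either end of a string, as in the "asd...sdfg" case, but not characters inserted or dropped in the middle ("Schwarzenegger" vs "Shwarseneger"): after the first missing character every later position is shifted, which a position-by-position distance cannot absorb.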