Stumpii Posted February 8, 2007 Share Posted February 8, 2007 (edited) I found some great code called SimMetrics that mathematically calculates the similarity of two strings, returning a value of 0.0 (not a match) to 1.0 (exact match).It is written in .Net 2.0, so I wrote a COM wrapper and installation package so I could use the feature in Excel. It could also be used in AutoIt.The installation package (which installs the SimMetrics dll, wrapper dll and example au3 code) can be downloaded from here: SimMetrics COM WrapperSome examples of use could be:Custom spell checkerSearch a music database for similar named songsFind the best match to a name in an address bookCross reference part numbers in two list that may have typosThe example given in the download shows the user a list of names and then asks the user to enter a name. The program picks the name that most closely matches the name the user enters.At the moment only the Levenshtein Distance Metric is included in the wrapper. If anyone finds another metric that they think would be useful, I can add it to the wrapper. For those interested, the SourceForge page for the SimMetrics program is here.Thanks go to sohfeyr for helping register the .Net dlls without the MS installer.Here is the example code for those interested:CODE#include <Array.au3>$MySimMetrics = ObjCreate("SimMetricsCOMWrapper.SimMetrics")If IsObj($MySimMetrics) = 0 Then MsgBox(16, "SimMetrics COM Wrapper Example", "Could not create SimMetrics object.") ExitEndIfDim $avArray$avArray = _ArrayCreate("JPM", "Holger", "Jon", "Larry", "Jeremy", "Valik", "Cyberslug", "Nutster", "Tylo", "JdeB")_ArrayDisplay($avArray, "List of users")$Value = InputBox("SimMetrics COM Wrapper Example", "Enter a user to search for (guess/mis-spell a name)")$MostSimilar = ""$MostSimilarSimilarity = 0For $i = 0 To UBound($avArray) - 1 $Similarity = $MySimMetrics.LevensteinSimilarity ($Value, $avArray[$i]) If Number($Similarity) > Number($MostSimilarSimilarity) Then $MostSimilar = $avArray[$i] $MostSimilarSimilarity = $Similarity EndIfNextMsgBox(64, "SimMetrics COM Wrapper Example", "Closest match to '" & $Value & "' is '" & $MostSimilar & "' with " & _ $MostSimilarSimilarity * 100 & "% similarity.") Edited February 8, 2007 by Stumpii Give a man a script; you have helped him for today. Teach a man to script; and you will not have to hear him whine for help.AutoIt4UE - Custom AutoIt toolbar and wordfile for UltraEdit/UEStudio users.AutoIt Graphical Debugger - A graphical debugger for AutoIt.SimMetrics COM Wrapper - Calculate string similarity. Link to comment Share on other sites More sharing options...
Eusebio Posted February 8, 2007 Share Posted February 8, 2007 Here my old code in au3: http://www.autoitscript.com/forum/index.ph...st&p=224962 WhoIsYouTube Video downloaderSoundex - SoundexEx - Levenshtein Distance (algorithms)[font="Arial"]I3osé[/font][font="Arial"]AutoIT[/font] Link to comment Share on other sites More sharing options...
Moderators SmOke_N Posted February 8, 2007 Moderators Share Posted February 8, 2007 (edited) Couldn't resist... I have no idea how close in concept this is though.MsgBox(0, '', 'Case Sensitive: ' & _StringLikePerc('I am a strinG.', 'I am a string.', 1) & '%' & @CR & _ 'Not Sensitive: ' & _StringLikePerc('I am a strinG.', 'I am a string.') & '%') Func _StringLikePerc($sString1, $sString2, $nCase = 0) Local $nLen1 = StringLen($sString1), $nLen2 = StringLen($sString2) Local $nMinLen = $nLen1, $nMaxLen = $nLen2, $iAdd If $nLen2 < $nLen1 Then $nMinLen = $nLen2 $nMaxLen = $nLen1 EndIf If $nCase Then For $iCC = 1 To $nMinLen If StringMid($sString1, $iCC, 1) == StringMid($sString2, $iCC, 1) Then $iAdd += 1 Next Else For $iCC = 1 To $nMinLen If StringMid($sString1, $iCC, 1) = StringMid($sString2, $iCC, 1) Then $iAdd += 1 Next EndIf Return Round(($iAdd / $nMaxLen) * 100, 2) EndFunc Edit: Now that I think about it, the above isn't a good example, it would only do a % based on if the first char was the same or not... if that varied, then it would return a 0% match.... ahh well. Edited February 8, 2007 by SmOke_N ChesTeRcs 1 Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer. Link to comment Share on other sites More sharing options...
WideBoyDixon Posted February 10, 2009 Share Posted February 10, 2009 The following implements two "similarity" functions. The first is the traditional Levenshtein method. The second is an algorithm adapted from http://siderite.blogspot.com/2007/01/super...-algorithm.html and re-coded for AutoIt. You can choose which implementation to use by passing a non-zero third parameter. This is my first code post so be kind!expandcollapse popup#include <Math.au3> Dim Const $maxOffset = 5 ; Change according to how far you want to search; a bigger value means a slower comparison ; Example usage MsgBox(0, "Levenshtein", _Similarity("The cat sat on the mat", "The dog sat on the cat")) MsgBox(0, "Similarity", _Similarity("The cat sat on the mat", "The dog sat on the cat", 1)) Exit ; Compute similarity between two strings Func _Similarity($s1, $s2, $alg = 0) Local $dis If ($alg = 0) Then $dis = _Levenshtein($s1, $s2) Else $dis = _Distance($s1, $s2) EndIf Local $maxLen = _Max(StringLen($s1), StringLen($s2)) If ($maxLen = 0) Then Return 100 Return (1 - $dis / $maxLen) * 100 EndFunc ; Compute Levenshtein Distance Func _Levenshtein($s, $t) Local $m, $n, $i, $j, $s_i, $t_j, $cost $n = StringLen($s) $m = StringLen($t) If $n * $m = 0 Then Return $m + $n Local $d[$n + 1][$m + 1] For $i = 0 To $n $d[$i][0] = $i Next For $j = 0 To $m $d[0][$j] = $j Next For $i = 1 To $n $s_i = StringMid($s, $i, 1) For $j = 1 To $m $t_j = StringMid($t, $j, 1) If $s_i = $t_j Then $cost = 0 Else $cost = 1 EndIf $d[$i][$j] = _Minimum3($d[$i - 1][$j] + 1, $d[$i][$j - 1] + 1, $d[$i - 1][$j - 1] + $cost) Next Next Return $d[$n][$m] EndFunc ; Get minimum of three values Func _Minimum3($a, $b, $c) Local $mi $mi = $a If $b < $mi Then $mi = $b If $c < $mi Then $mi = $c Return $mi EndFunc ; Efficient calculation of similarity Func _Distance($s1, $s2) Local $ls1 = StringLen($s1), $ls2 = StringLen($s2) If $ls1 * $ls2 = 0 Then Return $ls1 + $ls2 Local $c = 0, $offset1 = 0, $offset2 = 0, $dist = 0, $i While (($c + $offset1 < $ls1) And ($c + $offset2 < $ls2)) If (StringMid($s1, $c + $offset1 + 1, 1) <> StringMid($s2, $c + $offset2 + 1, 1)) Then $offset1 = 0 $offset2 = 0 For $i = 0 to $maxOffset - 1 If (($c + $i < $ls1) And (StringMid($s1, $c + $i + 1, 1) = StringMid($s2, $c + 1, 1))) Then If ($i > 0) Then $dist += 1 $offset1 = $i EndIf $dist -= 1 ExitLoop EndIf If (($c + $i < $ls2) And (StringMid($s1, $c + 1, 1) = StringMid($s2, $c + $i + 1, 1))) Then If ($i > 0) Then $dist += 1 $offset2 = $i EndIf $dist -= 1 ExitLoop EndIf Next $dist += 1 EndIf $c += 1 Wend Return $dist + ($ls1 - $offset1 + $ls2 - $offset2) / 2 - $c EndFuncWBD Seminko 1 [center]Wide by name, Wide by nature and Wide by girth[u]Scripts[/u]{Hot Folders} {Screen Calipers} {Screen Crosshairs} {Cross-Process Subclassing} {GDI+ Clock} {ASCII Art Signatures}{Another GDI+ Clock} {Desktop Goldfish} {Game of Life} {3D Pie Chart} {Stock Tracker}[u]UDFs[/u]{_FileReplaceText} {_ArrayCompare} {_ToBase}~ My Scripts On Google Code ~[/center] Link to comment Share on other sites More sharing options...
Malkey Posted September 17, 2009 Share Posted September 17, 2009 Here is another example using the Levenshtein Distance or Edit distance algorithm. Also, a string alignment, longest common subsequence. or _TraceBack function which uses the same table/matrix created by the _EditDistance() function. The results of the TraceBack function, eg.:- vintner- | || writ-ers No. of Differences: 5 appropriate m-eaning ||||| ||||| ||| approximate matching No. of Differences: 7are written to console. MsgBox does not format the display very well. In the above two examples - referring to the middle line between the two strings - the vertical lines indicate matches, the spaces are differences. expandcollapse popup; #include <Math.au3> #include <array.au3> ;Edit distance, Levenshtein distance, or Dynamic time warping ; http: www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/ ; String alignment, sequence alignment, longest common subsequence. or _TraceBack approach. ; http: books.google.com.au/books?id=STGlsyqtjYMC&pg=PA215&lpg=PA215&dq=string+difference+edit+distane&source=bl&ots=kiQ8vAL_v3&sig=TovNa041KRZDRGIa5bVI9HBMp1E&hl=en&ei=nd-iSt-SGIGYkQXd3OWBBA&sa=X&oi=book_result&ct=result&resnum=9#v=onepage&q=&f=false ; http: en.wikipedia.org/wiki/Sequence_alignment ; http: en.wikipedia.org/wiki/Longest_common_subsequence_problem ; [url="http://www.autoitscript.com/forum/index.php?showtopic=40843&view=findpost&p=303848"]http://www.autoitscript.com/forum/index....p?showtopic=40843&view=findpost&p=303848[/url] Local $sD Local $s1 = "appropriate meaning", $s2 = "approximate matching" ;Local $s1 = "vintner", $s2 = "writers" ;Local $s1='test', $s2='jesT' ;Local $s1 = "456043", $s2 = "1234567" Local $aArr = _EditDistance($s1, $s2) _TraceBack('', '', '', $aArr, StringLen($s1), StringLen($s2), $s1, $s2, $sD) ConsoleWrite(@CRLF & $sD & @CRLF & "No. of Differences: " & $aArr[StringLen($s1)][StringLen($s2)] & @CRLF) ;MsgBox(0, "", $sD & @CRLF & @CRLF & "No. of Differences: " & $aArr[StringLen($s1)][StringLen($s2)]) ;_ArrayDisplay($aArr) Func _EditDistance($s1, $s2) Local $m[StringLen($s1) + 1][StringLen($s2) + 1], $i, $j $m[0][0] = 0; boundary conditions For $j = 1 To StringLen($s2) $m[0][$j] = $m[0][$j - 1] + 1; boundary conditions Next For $i = 1 To StringLen($s1) $m[$i][0] = $m[$i - 1][0] + 1; boundary conditions Next For $j = 1 To StringLen($s2); outer loop For $i = 1 To StringLen($s1) ; inner loop If (StringMid($s1, $i, 1) = StringMid($s2, $j, 1)) Then $diag = 0; Else $diag = 1 EndIf $m[$i][$j] = _Min($m[$i - 1][$j] + 1, _ ; insertion (_Min($m[$i][$j - 1] + 1, _ ; deletion $m[$i - 1][$j - 1] + $diag))) ; substitution Next Next ;_TraceBack('', '', '', $m, StringLen($s1), StringLen($s2), $s1, $s2, $Dis); ;_ArrayDisplay($m) Return $m ; $m[StringLen($s1)][StringLen($s2)] EndFunc ;==>_EditDistance Func _TraceBack($row1, $row2, $row3, $m, $i, $j, $s1, $s2, ByRef $sD) Local $diag, $diagCh If ($i > 0) And ($j > 0) Then $diag = $m[$i - 1][$j - 1] $diagCh = '|'; If (StringMid($s1, $i, 1) <> StringMid($s2, $j, 1)) Then $diag += 1 $diagCh = ' '; } EndIf If ($m[$i][$j] >= $diag) Then _TraceBack(StringMid($s1, $i, 1) & $row1, $diagCh & $row2, StringMid($s2, $j, 1) & $row3, $m, $i - 1, $j - 1, $s1, $s2, $sD); change or match ElseIf $m[$i][$j] = ($m[$i - 1][$j] + 1) Then ; delete _TraceBack(StringMid($s1, $i, 1) & $row1, $diagCh & $row2, '-' & $row3, $m, $i - 1, $j, $s1, $s2, $sD); Else _TraceBack("-" & $row1, $diagCh & $row2, StringMid($s2, $j, 1) & $row3, $m, $i, $j - 1, $s1, $s2, $sD); insertion EndIf ElseIf ($i > 0) Then _TraceBack(StringMid($s1, $i, 1) & $row1, " " & $row2, '-' & $row3, $m, $i - 1, $j, $s1, $s2, $sD); ElseIf ($j > 0) Then _TraceBack("-" & $row1, " " & $row2, StringMid($s2, $j, 1) & $row3, $m, $i, $j - 1, $s1, $s2, $sD); Else ; $i==0 and $j==0 ;ConsoleWrite( @CRLF & $row1 & @CRLF & $row2 & @CRLF & $row3 & @CRLF) $sD = $row1 & @CRLF & $row2 & @CRLF & $row3 & @CRLF EndIf EndFunc ;==>_TraceBack ; mLipok 1 Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now