sambalec posted on April 1, 2013:

Hello,

How can I compare two strings to get a similarity percentage?

Example:
String 1: "Hello Worlds !"
String 2: "Hello my World !!!"

I need a percentage result, for example: 70% similar.

Many thanks! :-)
FireFox posted on April 1, 2013 (edited April 1, 2013):

Hi,

You can do it with the StringCompare function and some math tricks. Here you go:

```autoit
; Positional comparison: character $i of $s1 is compared with character $i of $s2
Local Const $s1 = "toto"
Local Const $s2 = "tata"

Local Const $a1 = StringSplit($s1, ""), $a2 = StringSplit($s2, "")
Local Const $iLen1 = StringLen($s1), $iLen2 = StringLen($s2)
Local $iMin = $iLen1, $iMax = $iLen2
If $iLen1 > $iLen2 Then
    $iMin = $iLen2
    $iMax = $iLen1
EndIf

; Characters beyond the end of the shorter string count as differences
Local $iDiffCount = $iMax - $iMin
For $i = 1 To $iMin
    ; Flag 2 = case-insensitive comparison
    If StringCompare($a1[$i], $a2[$i], 2) <> 0 Then $iDiffCount += 1
Next

; Guard against division by zero when both strings are empty
If $iMax > 0 Then ConsoleWrite("Diff: " & $iDiffCount / $iMax * 100 & "%" & @CRLF)
```

Br, FireFox.
jdelaney posted on April 1, 2013 (edited April 1, 2013):

Wish I could cite the source:

```autoit
Func _Typos(Const $st1, Const $st2, $anychar = '_', $anytail = '%')
    ; Get amount of typos between two strings
    Local $s1, $s2, $pen, $del, $ins, $subst
    If Not IsString($st1) Then Return SetError(-1, -1, -1)
    If Not IsString($st2) Then Return SetError(-2, -2, -1)
    If $st2 = '' Then Return StringLen($st1)
    If $st2 == $anytail Then Return 0
    If $st1 = '' Then
        Return (StringInStr($st2 & $anytail, $anytail, 1) - 1)
    EndIf
;~  $s1 = StringSplit(_LowerUnaccent($st1), "", 2) ;; _LowerUnaccent() addon function not available here
;~  $s2 = StringSplit(_LowerUnaccent($st2), "", 2) ;; _LowerUnaccent() addon function not available here
    $s1 = StringSplit(StringLower($st1), "", 2)
    $s2 = StringSplit(StringLower($st2), "", 2)
    Local $l1 = UBound($s1), $l2 = UBound($s2)
    Local $r[$l1 + 1][$l2 + 1]
    For $x = 0 To $l2 - 1
        Switch $s2[$x]
            Case $anychar
                If $x < $l1 Then
                    $s2[$x] = $s1[$x]
                EndIf
            Case $anytail
                $l2 = $x
                If $l1 > $l2 Then
                    $l1 = $l2
                EndIf
                ExitLoop
        EndSwitch
        $r[0][$x] = $x
    Next
    $r[0][$l2] = $l2
    For $x = 0 To $l1
        $r[$x][0] = $x
    Next
    For $x = 1 To $l1
        For $y = 1 To $l2
            $pen = Not ($s1[$x - 1] == $s2[$y - 1])
            $del = $r[$x - 1][$y] + 1
            $ins = $r[$x][$y - 1] + 1
            $subst = $r[$x - 1][$y - 1] + $pen
            If $del > $ins Then $del = $ins
            If $del > $subst Then $del = $subst
            $r[$x][$y] = $del
            If ($pen And $x > 1 And $y > 1 And $s1[$x - 1] == $s2[$y - 2] And $s1[$x - 2] == $s2[$y - 1]) Then
                If $r[$x][$y] >= $r[$x - 2][$y - 2] Then $r[$x][$y] = $r[$x - 2][$y - 2] + 1
                $r[$x - 1][$y - 1] = $r[$x][$y]
            EndIf
        Next
    Next
    Return ($r[$l1][$l2])
EndFunc   ;==>_Typos

;~ ; usage
;~ Local $reference = "lexicographically"
;~ Local $Words[11][2] = [ _
;~         [$reference], _
;~         ["Lexicôgraphicaly"], _
;~         ["lexkographicaly"], _
;~         ["Lexico9raphically"], _
;~         ["lexioo9asdasraphically"], _
;~         ["Lexicographical"], _
;~         ["lexicographlcally"], _
;~         ["Lex1cogr@phically"], _
;~         ["lexic0graphïca1yl"], _
;~         ["lexIcOgraphically"], _
;~         ["Lexlcographically"] _
;~         ]
;~ For $i = 0 To UBound($Words) - 1
;~     $Words[$i][1] = _Typos($Words[$i][0], $reference)
;~ Next
;~ _ArrayDisplay($Words, "Number of typos")
;~ ConsoleWrite("Usage of '_' and '%' wildcards in pattern:" & @LF & @TAB & "_Typos('lex1c0gr@fhlâofznho', 'LEx_c_gr%') = " & _Typos('lex1c0gr@fhlofznho', 'lex_c_gr%') & @LF)
;~ ConsoleWrite("Does not always return the absolute minimum edit distance:" & @LF & @TAB & "_Typos('bdac', 'abcd') = " & _Typos('bdac', 'abcd') & @LF)
```

Got it: jchd.

jdelaney's signature: IEbyXPATH - Grab IE DOM objects by XPATH | IEscriptRecord - Makings of an IE script recorder | ExcelFromXML - Create Excel docs without Excel installed | GetAllWindowControls - Output all control data on a given window.
sambalec (topic author) posted on April 1, 2013:

Nice! I'm waiting for it! Many thanks!
sambalec (topic author) posted on April 2, 2013:

Thanks FireFox and jdelaney for your help.

When I try your script (Mr FireFox) with:

```autoit
Local Const $s1 = "pizza service"
Local Const $s2 = "Pizza Service"
```

the result is 0% (perfect for me). But with:

```autoit
Local Const $s1 = "pizza service"
Local Const $s2 = "the pizza Service"
```

the result is 100%, which is not good: I would like about 20% difference.
water posted on April 2, 2013:

If you need it case-sensitive, just change this line in the example FireFox provided:

```autoit
If StringCompare($a1[$i], $a2[$i], 2) <> 0 Then $iDiffCount += 1
```

to this:

```autoit
If StringCompare($a1[$i], $a2[$i], 1) <> 0 Then $iDiffCount += 1
```

water's signature: My UDFs and Tutorials:
UDFs: Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki
Standard UDFs: Excel - Example Scripts - Wiki; Word - Wiki
Tutorials: ADO - Wiki; WebDriver - Wiki
sambalec (topic author) posted on April 2, 2013:

Thanks, water. My problem is not case sensitivity. The problem is:

```autoit
Local Const $s1 = "pizza service"
Local Const $s2 = "the pizza Service"
```

The result is 100% difference, which is not good for me: I would like about 20% difference.
FireFox posted on April 2, 2013:

Quote (sambalec): "result is 100% of difference (is not good for me, I would like about 20% of difference)"

Yes, because it compares from left to right. I don't know which algorithm would best fit your need. Maybe run a second check from the opposite direction and keep the smaller difference?

Br, FireFox.
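FireFox's "second check from the opposite direction" idea can be sketched as follows, assuming the strings are aligned at their right ends for the second pass and the smaller mismatch count is kept. `_PositionalDiff()` and `_PositionalDiffPercent()` are hypothetical helper names, not part of any UDF; as in the fixed example above, characters with no counterpart in the shorter string count as differences and the total is divided by the longer length.

```autoit
; Hypothetical helper: count mismatching characters when the strings are aligned
; at their left ends ($bFromRight = False) or at their right ends ($bFromRight = True).
; Characters with no counterpart in the shorter string count as differences.
Func _PositionalDiff($sA, $sB, $bFromRight)
    Local $iLenA = StringLen($sA), $iLenB = StringLen($sB)
    Local $iShort = $iLenA, $iLong = $iLenB
    If $iLenA > $iLenB Then
        $iShort = $iLenB
        $iLong = $iLenA
    EndIf
    If $bFromRight Then
        ; Keep only the last $iShort characters of each string
        $sA = StringRight($sA, $iShort)
        $sB = StringRight($sB, $iShort)
    EndIf
    Local $iDiff = $iLong - $iShort
    For $i = 1 To $iShort
        ; Flag 2 = case-insensitive, as in FireFox's example
        If StringCompare(StringMid($sA, $i, 1), StringMid($sB, $i, 1), 2) <> 0 Then $iDiff += 1
    Next
    Return $iDiff
EndFunc   ;==>_PositionalDiff

; Hypothetical helper: percent difference, keeping the better of the two alignments
Func _PositionalDiffPercent($sA, $sB)
    Local $iLong = StringLen($sA)
    If StringLen($sB) > $iLong Then $iLong = StringLen($sB)
    If $iLong = 0 Then Return 0 ; two empty strings are identical
    Local $iBest = _PositionalDiff($sA, $sB, False)
    Local $iRight = _PositionalDiff($sA, $sB, True)
    If $iRight < $iBest Then $iBest = $iRight
    Return Round($iBest / $iLong * 100, 2)
EndFunc   ;==>_PositionalDiffPercent

ConsoleWrite(_PositionalDiffPercent("pizza service", "Pizza Service") & "%" & @CRLF)
ConsoleWrite(_PositionalDiffPercent("pizza service", "the pizza Service") & "%" & @CRLF)
```

For sambalec's pair, the right-aligned pass finds only the 4 extra characters of "the ", so the result is 4 / 17, about 23.53%. This only helps when the extra text sits entirely at one end; an insertion in the middle still shifts every following character.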
water posted on April 2, 2013:

My best bet: search for an algorithm written in Visual Basic and then translate it to AutoIt.
FireFox posted on April 2, 2013 (edited April 2, 2013):

Thanks for your link, water. As I said, there are different algorithms to check the similarity of strings, and from this search I came up with the link below.

@sambalec: Can you choose an algorithm from this page? I or someone else will be glad to translate it for you.

Br, FireFox.
jdelaney posted on April 2, 2013:

```autoit
#include <Array.au3> ; for _ArrayDisplay()

Local $reference = "pizza service"
Local $Words[4][4] = [ _
        [$reference], _
        ["the pizza service"], _
        ["tha piza service"], _
        ["pitza sarvace"]]
For $i = 0 To UBound($Words) - 1
    $Words[$i][1] = _Typos($Words[$i][0], $reference)
    $Words[$i][2] = (StringLen($reference) - $Words[$i][1]) / StringLen($reference)
    $Words[$i][3] = Abs(1 - (StringLen($reference) - $Words[$i][1]) / StringLen($reference))
Next
_ArrayDisplay($Words, "Number of typos")
Exit

Func _Typos(Const $st1, Const $st2, $anychar = '_', $anytail = '%')
    ; Get amount of typos between two strings
    Local $s1, $s2, $pen, $del, $ins, $subst
    If Not IsString($st1) Then Return SetError(-1, -1, -1)
    If Not IsString($st2) Then Return SetError(-2, -2, -1)
    If $st2 = '' Then Return StringLen($st1)
    If $st2 == $anytail Then Return 0
    If $st1 = '' Then
        Return (StringInStr($st2 & $anytail, $anytail, 1) - 1)
    EndIf
    $s1 = StringSplit(StringLower($st1), "", 2)
    $s2 = StringSplit(StringLower($st2), "", 2)
    Local $l1 = UBound($s1), $l2 = UBound($s2)
    Local $r[$l1 + 1][$l2 + 1]
    For $x = 0 To $l2 - 1
        Switch $s2[$x]
            Case $anychar
                If $x < $l1 Then
                    $s2[$x] = $s1[$x]
                EndIf
            Case $anytail
                $l2 = $x
                If $l1 > $l2 Then
                    $l1 = $l2
                EndIf
                ExitLoop
        EndSwitch
        $r[0][$x] = $x
    Next
    $r[0][$l2] = $l2
    For $x = 0 To $l1
        $r[$x][0] = $x
    Next
    For $x = 1 To $l1
        For $y = 1 To $l2
            $pen = Not ($s1[$x - 1] == $s2[$y - 1])
            $del = $r[$x - 1][$y] + 1
            $ins = $r[$x][$y - 1] + 1
            $subst = $r[$x - 1][$y - 1] + $pen
            If $del > $ins Then $del = $ins
            If $del > $subst Then $del = $subst
            $r[$x][$y] = $del
            If ($pen And $x > 1 And $y > 1 And $s1[$x - 1] == $s2[$y - 2] And $s1[$x - 2] == $s2[$y - 1]) Then
                If $r[$x][$y] >= $r[$x - 2][$y - 2] Then $r[$x][$y] = $r[$x - 2][$y - 2] + 1
                $r[$x - 1][$y - 1] = $r[$x][$y]
            EndIf
        Next
    Next
    Return ($r[$l1][$l2])
EndFunc   ;==>_Typos
```

Output, measured against the expected (reference) string; the "percent" columns are ratios, where 1 = 100%:

```
Row | String            | Count wrong | Percent correct   | Percent wrong
[0] | pizza service     | 0           | 1                 | 0
[1] | the pizza service | 4           | 0.692307692307692 | 0.307692307692308
[2] | tha piza service  | 5           | 0.615384615384615 | 0.384615384615385
[3] | pitza sarvace     | 3           | 0.769230769230769 | 0.230769230769231
```

Or switch the comparison to be against the actual string, using:

```autoit
Local $reference = "pizza service"
Local $Words[4][4] = [ _
        [$reference], _
        ["the pizza service"], _
        ["tha piza service"], _
        ["pitza sarvace"]]
For $i = 0 To UBound($Words) - 1
    $Words[$i][1] = _Typos($Words[$i][0], $reference)
    $Words[$i][2] = (StringLen($Words[$i][0]) - $Words[$i][1]) / StringLen($Words[$i][0])
    $Words[$i][3] = Abs(1 - (StringLen($Words[$i][0]) - $Words[$i][1]) / StringLen($Words[$i][0]))
Next
_ArrayDisplay($Words, "Number of typos")
```

Output:

```
Row | String            | Count wrong | Percent correct   | Percent wrong
[0] | pizza service     | 0           | 1                 | 0
[1] | the pizza service | 4           | 0.764705882352941 | 0.235294117647059
[2] | tha piza service  | 5           | 0.6875            | 0.3125
[3] | pitza sarvace     | 3           | 0.769230769230769 | 0.230769230769231
```
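jdelaney's two tables give different ratios for the same 4 typos because the denominator changes (13 characters for the reference, 17 for "the pizza service"). One common way to get a single, symmetric number between 0% and 100% is to divide the edit distance by the length of the longer string. The sketch below assumes that normalization; `_Levenshtein()` and `_SimilarityPercent()` are hypothetical helper names, and `_Levenshtein()` is a plain Levenshtein distance (no transposition rule and no wildcards, unlike `_Typos()`) that lower-cases both strings the way `_Typos()` does.

```autoit
; Sketch: similarity % = (1 - edit distance / length of the longer string) * 100
; _Levenshtein() and _SimilarityPercent() are hypothetical helper names.

; Plain Levenshtein distance (insert, delete, substitute), case-insensitive
Func _Levenshtein($sA, $sB)
    $sA = StringLower($sA)
    $sB = StringLower($sB)
    Local $iLenA = StringLen($sA), $iLenB = StringLen($sB)
    If $iLenA = 0 Then Return $iLenB
    If $iLenB = 0 Then Return $iLenA

    ; Two rolling rows of the dynamic-programming table
    Local $aPrev[$iLenB + 1], $aCurr[$iLenB + 1]
    Local $sCh, $iCost, $iBest
    For $j = 0 To $iLenB
        $aPrev[$j] = $j
    Next
    For $i = 1 To $iLenA
        $aCurr[0] = $i
        $sCh = StringMid($sA, $i, 1)
        For $j = 1 To $iLenB
            $iCost = 1
            If $sCh == StringMid($sB, $j, 1) Then $iCost = 0
            $iBest = $aPrev[$j] + 1 ; deletion
            If $aCurr[$j - 1] + 1 < $iBest Then $iBest = $aCurr[$j - 1] + 1 ; insertion
            If $aPrev[$j - 1] + $iCost < $iBest Then $iBest = $aPrev[$j - 1] + $iCost ; substitution
            $aCurr[$j] = $iBest
        Next
        $aPrev = $aCurr ; AutoIt copies arrays on assignment
    Next
    Return $aPrev[$iLenB]
EndFunc   ;==>_Levenshtein

; Symmetric similarity in percent, rounded to 2 decimals
Func _SimilarityPercent($sA, $sB)
    Local $iMax = StringLen($sA)
    If StringLen($sB) > $iMax Then $iMax = StringLen($sB)
    If $iMax = 0 Then Return 100 ; two empty strings are identical
    Return Round((1 - _Levenshtein($sA, $sB) / $iMax) * 100, 2)
EndFunc   ;==>_SimilarityPercent

ConsoleWrite(_SimilarityPercent("pizza service", "the pizza Service") & "%" & @CRLF)
ConsoleWrite(_SimilarityPercent("the pizza Service", "pizza service") & "%" & @CRLF)
```

Both calls print the same value (76.47% similar, so about 23.5% different), so the argument order no longer matters. `_Typos()` could be dropped in for `_Levenshtein()` if its transposition and wildcard handling are wanted.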
kylomas posted on April 2, 2013 (edited April 2, 2013):

sambalec,

What percent similar are the following two sets of strings (by your definition of similar)?

"abcd" and "acbd"

and

"zzzz" and "kylomas"

Also these strings:

the boy
the boy

kylomas's signature: Forum Rules | Procedure for posting code | "I like pigs. Dogs look up to us. Cats look down on us. Pigs treat us as equals." - Sir Winston Churchill
jchd posted on April 2, 2013:

A percentage is not necessarily the most informative measure, since it depends on the length of the strings. My function returns the number of edits required to change string1 into string2.

jchd's signature:
This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must-have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your copy of up-to-date pcretest.exe and pcregrep.exe here.
RegExp tutorial: enough to get started.
PCRE v8.33 regexp documentation: latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well).
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt).
sambalec (topic author) posted on April 2, 2013:

Many thanks for your help! :-)

@jdelaney: your script is very good for me... only handling of reversed word order is missing. Example:

"the Pizza Service"
"Service Pizza the"

I need to get the same result for both :-)
jdelaney posted on April 2, 2013:

Quote (sambalec): "the Pizza Service / Service Pizza the: I need to get same result"

These wouldn't return the same result... these would:

"the Pizza Service"
"Pizza Service the"

```autoit
#include <Array.au3> ; for _ArrayDisplay()
; Requires the _Typos() function from the previous posts

Local $reference = "Pizza Service"
Local $Words[4][4] = [ _
        [$reference], _
        ["the Pizza Service"], _
        ["Pizza Service the"], _
        ["Service Pizza the"]]
For $i = 0 To UBound($Words) - 1
    $Words[$i][1] = _Typos($Words[$i][0], $reference)
    $Words[$i][2] = (StringLen($Words[$i][0]) - $Words[$i][1]) / StringLen($Words[$i][0])
    $Words[$i][3] = Abs(1 - (StringLen($Words[$i][0]) - $Words[$i][1]) / StringLen($Words[$i][0]))
Next
_ArrayDisplay($Words, "Number of typos")
```

Output:

```
Row | String            | Count wrong | Percent correct   | Percent wrong
[0] | Pizza Service     | 0           | 1                 | 0
[1] | the Pizza Service | 4           | 0.764705882352941 | 0.235294117647059
[2] | Pizza Service the | 4           | 0.764705882352941 | 0.235294117647059
[3] | Service Pizza the | 14          | 0.176470588235294 | 0.823529411764706
```
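If word order really should not matter, one option (an assumption about what sambalec wants, not something proposed in the thread) is to normalize both strings before comparing them: lower-case the text, split it into words and sort the words. `_SortWords()` is a hypothetical helper name; it uses `_ArraySort()` and `_ArrayToString()` from the standard Array.au3 UDF.

```autoit
#include <Array.au3>

; Hypothetical helper: lower-case the text, split it into words and sort them,
; so "the Pizza Service" and "Service Pizza the" become the same string.
Func _SortWords($sText)
    ; Flag 3 = return an array of all matches; \S+ = a run of non-space characters
    Local $aWords = StringRegExp(StringLower($sText), "\S+", 3)
    If @error Then Return "" ; no words at all
    _ArraySort($aWords)
    Return _ArrayToString($aWords, " ")
EndFunc   ;==>_SortWords

ConsoleWrite(_SortWords("the Pizza Service") & @CRLF)
ConsoleWrite(_SortWords("Service Pizza the") & @CRLF)
```

Both lines print "pizza service the", so passing the normalized strings to `_Typos()` gives 0 typos for that pair. The trade-off is that word order is discarded entirely: "dog bites man" and "man bites dog" also become identical.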
kylomas posted on April 2, 2013:

Quote (jchd): "A percentage is not obviously the most informative measure since it depends on the length of the string. My function returns the number of edits required to change string1 into string2."

I know, I'm trying to understand the OP's rules...

sambalec,

Try this:

```autoit
; Blank out every character that appears in both strings, then report what is left over.
; Note: "=" compares case-insensitively, and '_' is used as the "already matched" marker.
local $str1 = 'the pizaa service', $init_len1 = stringlen($str1)
local $str2 = 'pizaa service the', $init_len2 = stringlen($str2)
for $1 = 1 to stringlen($str1)
    for $2 = 1 to stringlen($str2)
        if stringmid($str1,$1,1) = stringmid($str2,$2,1) then
            ; StringReplace without a count replaces ALL occurrences of the matched character
            $str2 = stringreplace($str2,stringmid($str2,$2,1),'_')
            $str1 = stringreplace($str1,stringmid($str1,$1,1),'_')
        endif
    next
next
; Whatever is not '_' had no match in the other string
$str1 = stringreplace($str1,'_','')
$str2 = stringreplace($str2,'_','')
ConsoleWrite('String1 is ' & round( (stringlen($str1)/$init_len2)*100,2 ) & '% different from string2' & @LF)
ConsoleWrite('String2 is ' & round( (stringlen($str2)/$init_len1)*100,2 ) & '% different from string1' & @LF)
```

kylomas
jchd posted on April 2, 2013:

My remark was not aimed at you, kylomas.

Fuzzy question, fuzzy answer.
kylomas posted on April 3, 2013:

Quote (jchd): "My remark was not towards you kylomas. Fuzzy question, fuzzy answer."

Yes, I know; I've been trying to get specifications.

@sambalec, please define exactly what you want.

kylomas
kylomas posted on April 3, 2013 (edited April 3, 2013):

sambalec, (follow-up from 04/02/2013)

The code that I posted simply eliminates matching letters from each subject string. Therefore, this will produce differences of 0 percent:

```autoit
;local $str1 = 'the pizaa service', $init_len1 = stringlen($str1)
;local $str2 = 'pizaa service the', $init_len2 = stringlen($str2)
local $str1 = 'zzzzzz', $init_len1 = stringlen($str1)
local $str2 = 'z', $init_len2 = stringlen($str2)
for $1 = 1 to stringlen($str1)
    for $2 = 1 to stringlen($str2)
        if stringmid($str1,$1,1) = stringmid($str2,$2,1) then
            ; Replaces ALL occurrences, so one 'z' in $str2 wipes out every 'z' in $str1
            $str2 = stringreplace($str2,stringmid($str2,$2,1),'_')
            $str1 = stringreplace($str1,stringmid($str1,$1,1),'_')
        endif
    next
next
$str1 = stringreplace($str1,'_','')
$str2 = stringreplace($str2,'_','')
ConsoleWrite('String1 is ' & round( (stringlen($str1)/$init_len2)*100,2 ) & '% different from string2' & @LF)
ConsoleWrite('String2 is ' & round( (stringlen($str2)/$init_len1)*100,2 ) & '% different from string1' & @LF)
```

Do you see why we are asking for further specifications?

kylomas

Edit: additional info. This version removes only one occurrence per match, so duplicate characters are left over: "zzzzzz" compared to "z" is 500% different (because there are 5 "z"s left over).

```autoit
;local $str1 = 'the pizaa service', $init_len1 = stringlen($str1)
;local $str2 = 'pizaa service the', $init_len2 = stringlen($str2)
local $str1 = 'zzzzzz', $init_len1 = stringlen($str1)
local $str2 = 'z', $init_len2 = stringlen($str2)
for $1 = 1 to stringlen($str1)
    for $2 = 1 to stringlen($str2)
        if stringmid($str1,$1,1) = stringmid($str2,$2,1) then
            ; Count = 1: replace only the first occurrence of the matched character
            $str2 = stringreplace($str2,stringmid($str2,$2,1),'_',1)
            $str1 = stringreplace($str1,stringmid($str1,$1,1),'_',1)
        endif
    next
next
$str1 = stringreplace($str1,'_','')
$str2 = stringreplace($str2,'_','')
ConsoleWrite('String1 is ' & round( (stringlen($str1)/$init_len2)*100,3 ) & '% different from string2' & @LF)
ConsoleWrite('String2 is ' & round( (stringlen($str2)/$init_len1)*100,3 ) & '% different from string1' & @LF)
```
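kylomas's leftover count can exceed 100% because the leftovers of one string are divided by the length of the other. A bounded variant of the same "common characters, order ignored" idea is a Dice-style ratio: twice the number of matched characters divided by the total length of both strings. This swaps in a different formula than kylomas used, and `_CommonCharPercent()` is a hypothetical helper name; it matches each character at most once and searches case-insensitively, like the `=` comparison above.

```autoit
; Hypothetical helper: Dice-style similarity over characters, ignoring order.
; Each character of $sA can be matched with at most one character of $sB.
Func _CommonCharPercent($sA, $sB)
    Local $iTotal = StringLen($sA) + StringLen($sB)
    If $iTotal = 0 Then Return 100 ; two empty strings are identical
    Local $sRest = $sB, $iCommon = 0, $iPos
    For $i = 1 To StringLen($sA)
        ; StringInStr is case-insensitive by default
        $iPos = StringInStr($sRest, StringMid($sA, $i, 1))
        If $iPos > 0 Then
            $iCommon += 1
            ; Remove only that one occurrence so it cannot be matched again
            $sRest = StringLeft($sRest, $iPos - 1) & StringMid($sRest, $iPos + 1)
        EndIf
    Next
    ; 100 = same characters, 0 = nothing in common
    Return Round(2 * $iCommon / $iTotal * 100, 2)
EndFunc   ;==>_CommonCharPercent

ConsoleWrite(_CommonCharPercent("zzzzzz", "z") & "% similar" & @CRLF)
ConsoleWrite(_CommonCharPercent("the pizaa service", "pizaa service the") & "% similar" & @CRLF)
```

This reports 28.57% similarity for "zzzzzz" and "z" instead of 500% difference, and 100% for the reordered pizza strings. Like kylomas's version it ignores order completely, so "abcd" and "acbd" also come out 100% similar, which is exactly the kind of case kylomas asked sambalec to rule on.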