fgthhhh Posted April 26, 2010

I have a main string and several sub-strings, and I want to find the sub-string that is most like the main string. I don't know how to do it.

Example:

main-string: aqwert
sub-strings:
+qwerb
+gfdgf
+qwbt

result:
+qwerb: 80% (shares "qwer")
+gfdgf: 0% (nothing in common)
+qwbt: 20% (shares "qw")

Please help me, thanks.
water Posted April 26, 2010

I think "StringRegExp" will do what you need. I'm no expert, so the following example only checks whether each character of the main string appears anywhere in the pattern, and therefore doesn't give the result you need:

    #include <array.au3>
    $string = "aqwert"
    $pattern = "qwbt"
    ; Build a character class from the pattern; flag 3 returns an array of all matches.
    $R = StringRegExp($string, "[" & $pattern & "]", 3)
    If IsArray($R) Then
        ; Number of matching characters relative to the pattern length.
        MsgBox(0, "", UBound($R) * 100 / StringLen($pattern) & "% match")
    Else
        MsgBox(0, "", "0% match")
    EndIf

Maybe some RegExp guru can jump in and give you the correct expression.
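Because a character class ignores order and position, the percentage above counts characters rather than shared runs such as "qwer". One possible reading of the original example is "length of the longest run of consecutive characters the two strings share, relative to the sub-string length". The following is only a minimal sketch under that assumption; `_LongestCommonRun` is a hypothetical helper name, not something from this thread or from the AutoIt standard UDFs.

```autoit
; Sketch only: scores each candidate by its longest shared run of characters.
Local $main = "aqwert"
Local $candidates[3] = ["qwerb", "gfdgf", "qwbt"]

For $k = 0 To UBound($candidates) - 1
    Local $run = _LongestCommonRun($main, $candidates[$k])
    Local $percent = Round(100 * $run / StringLen($candidates[$k]))
    ConsoleWrite($candidates[$k] & ": " & $percent & "%" & @CRLF)
Next

; Hypothetical helper: length of the longest run of consecutive characters
; that occurs in both $s1 and $s2 (case-insensitive comparison).
Func _LongestCommonRun($s1, $s2)
    Local $n1 = StringLen($s1), $n2 = StringLen($s2), $best = 0
    If $n1 = 0 Or $n2 = 0 Then Return 0
    ; $len[$i][$j] = length of the common run ending at char $i of $s1 and char $j of $s2
    Local $len[$n1 + 1][$n2 + 1]
    For $i = 0 To $n1
        $len[$i][0] = 0
    Next
    For $j = 0 To $n2
        $len[0][$j] = 0
    Next
    For $i = 1 To $n1
        For $j = 1 To $n2
            If StringMid($s1, $i, 1) = StringMid($s2, $j, 1) Then
                ; Extend the run that ended at the previous pair of characters.
                $len[$i][$j] = $len[$i - 1][$j - 1] + 1
                If $len[$i][$j] > $best Then $best = $len[$i][$j]
            Else
                $len[$i][$j] = 0
            EndIf
        Next
    Next
    Return $best
EndFunc   ;==>_LongestCommonRun
```

With the strings from the first post this reports 80% for qwerb and 0% for gfdgf, but 50% rather than 20% for qwbt, so it is at best an approximation of what was asked for. It also ignores typos inside a run, which is where an edit-distance function is the better tool.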
fgthhhh Author Posted April 26, 2010

StringRegExp worked like magic, but I still don't understand how it works. Approximate string matching turned out to be much more complicated than I expected. Anyway, thank you both so much; I will need to research more.
jchd Posted April 26, 2010

You can use my Typos() fuzzy comparison function: Typos.au3

It computes the edit distance between two strings, that is, the number of omissions, insertions, changes or swaps of letters necessary to transform one string into the other. If you compare several strings in succession and keep the one with the fewest errors (typos), you'll be home.

Optionally, you can use two distinct wildcards in the second string: _ and % (the same characters as in SQL LIKE).

_ is a single-character joker, much like ? in Windows filename patterns.
% may represent zero or more characters, like Windows * (but % may only appear at the end of the second parameter).

Try it and post again if you have problems using it.
fgthhhh Author Posted April 29, 2010

Hi jchd, you wrote an awesome script, but I don't understand what the function returns. Does 0 mean the strings are the same? Does a higher number mean more mistakes?

I tried:

    $asd = _Typos("aqwert", "qwertb")
    MsgBox(0,"",$asd)

It returns 2. What does that mean?

Edited April 29, 2010 by fgthhhh
czardas Posted April 29, 2010

I took a quick look at jchd's code. It seems that the return value 2 means that two changes are needed to convert one string into the other. The changes are as follows:

1. Delete the first character (a).
2. Add a character at the end (b).

This converts one string into the other in 2 steps. jchd will be able to tell you if I'm wrong about this.

Edited April 29, 2010 by czardas
jchd Posted April 29, 2010

That's correct.

    If typos($str1, $str2) = 0 Then MsgBox(0, "", $str1 & ' and ' & $str2 & ' are identical (ignoring case).')

    ; Computes the number of typos (Damerau-Levenshtein distance) between two strings.
    ; Four types of differences are counted:
    ;   insertion of a character,     abcd    ab#cd
    ;   deletion of a character,      abcd    acd
    ;   exchange of a character       abcd    ab$d
    ;   inversion of adjacent chars   abcd    acbd
    ;
    ; This function does NOT satisfy the so-called "triangle inequality", which means
    ; more simply that it makes NO attempt to compute the MINIMUM edit distance in all
    ; cases. If you need that, you should use more complex algorithms.
    ;
    ; This simple function allows a fuzzy compare for e.g. recovering from typical
    ; human typos in short strings like names, addresses, cities... while getting rid of
    ; minor scripting differences (accents, ligatures).
    ;
    ; Strings are unaccented then lowercased.
    ; String $st2 can be used as a pattern similar to the SQL 'LIKE' operator:
    ; '_' and trailing '%' act as in LIKE. These wildcards can be passed as parameters
    ; but % should appear at most once for the function to work properly.

Another comment, which comes from the C version I use as an SQLite extension:

    ** TYPOS($str1, $str2)
    ** returns the "Damerau-Levenshtein distance" between StringLower(str1) and
    ** StringLower(str2). This is the number of insertions, omissions, changes
    ** and transpositions (of adjacent letters only).
    **
    ** If the reference string is 'abcdef', it will return 1 (one typo) for
    **   'abdef'    missing c
    **   'abcudef'  u inserted
    **   'abzdef'   c changed into z
    **   'abdcef'   c & d exchanged
    **
    ** Only one level of "typo" is considered, e.g. the function will
    ** consider the following transformations to be 3 typos:
    **   'abcdef'   reference
    **   'abdcef'   c & d exchanged
    **   'abdzcef'  z inserted inside (c & d exchanged)
    ** In this case, it will return 3. Technically, it does not
    ** always return the minimum edit distance and doesn't satisfy
    ** the "triangle inequality" in all cases. It is nonetheless
    ** very useful to anyone having to look up simple entries subject to
    ** user typos (e.g. name or city name).
    **
    ** It will also accept '_' and a trailing '%' in str2, both acting
    ** as in the SQL LIKE operator.
    **
    ** You can use it this way:
    **   $str = "Leiwenschtein"
    **   If typos($str, 'leivencht%') <= 2 Then
    ** or this way:
    **   $nbErrors = typos($str1, $str2)
    **
    ** NOTE: the implementation may seem naive but is open to several
    **       evolutions. Due to the complexity in O(n*m) you
    **       should reserve its use for _short_ fields only. There
    **       are much better algorithms for large fields (most of
    **       which are terrible for small strings). The choice made
    **       reflects the typical need to match names, surnames,
    **       street addresses, cities or similar data prone to typos
    **       in user input. Flexibility has been chosen over mere
    **       performance, because fuzzy search is _slow_ anyway.
    **       So you are better off with a 380% slower algo that retrieves
    **       the data you're after than a 100% slow algo that misses
    **       it most of the time.
    **
    ** | DO NOT use TYPOS in case StringInStr would do! For instance, if
    ** | your data contains a fixed substring (without typo),
    ** | then use:
    ** |   If StringInStr($cityname, 'angel') Then
    ** | It will match 'Los Angeles' without question. If you try:
    ** |   If typos($cityname, 'angel%') <= 4 Then
    ** | you will be overwhelmed with data from everywhere, since up
    ** | to 4 typos allows for typically _many_ values (cities, here).

Hope this clears some mud. If you still have practical problems using it in the real world, post here.
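To make the four edit types concrete, here is a minimal sketch of a restricted Damerau-Levenshtein distance (the textbook "optimal string alignment" variant). It is an assumption-laden illustration, not jchd's Typos(): `_TypoCount` is a hypothetical name, and the '_' and '%' wildcards and the unaccenting step are left out.

```autoit
; Sketch only: each sample differs from the reference by exactly one edit type.
Local $ref = "abcdef"
Local $samples[4] = ["abdef", "abcudef", "abzdef", "abdcef"]

For $k = 0 To UBound($samples) - 1
    ConsoleWrite($samples[$k] & " -> " & _TypoCount($ref, $samples[$k]) & @CRLF)
Next

; Hypothetical helper: counts insertions, deletions, changes and swaps of
; adjacent characters needed to turn $s1 into $s2 (strings are lowercased first).
Func _TypoCount($s1, $s2)
    Local $a = StringLower($s1), $b = StringLower($s2)
    Local $n = StringLen($a), $m = StringLen($b)
    Local $d[$n + 1][$m + 1], $cost, $c1, $c2
    For $i = 0 To $n
        $d[$i][0] = $i ; deleting the first $i characters of $a
    Next
    For $j = 0 To $m
        $d[0][$j] = $j ; inserting the first $j characters of $b
    Next
    For $i = 1 To $n
        $c1 = StringMid($a, $i, 1)
        For $j = 1 To $m
            $c2 = StringMid($b, $j, 1)
            $cost = 1
            If $c1 == $c2 Then $cost = 0
            $d[$i][$j] = $d[$i - 1][$j - 1] + $cost ; change (or exact match)
            If $d[$i - 1][$j] + 1 < $d[$i][$j] Then $d[$i][$j] = $d[$i - 1][$j] + 1 ; deletion
            If $d[$i][$j - 1] + 1 < $d[$i][$j] Then $d[$i][$j] = $d[$i][$j - 1] + 1 ; insertion
            If $i > 1 And $j > 1 Then
                ; swap of two adjacent characters
                If $c1 == StringMid($b, $j - 1, 1) And StringMid($a, $i - 1, 1) == $c2 Then
                    If $d[$i - 2][$j - 2] + 1 < $d[$i][$j] Then $d[$i][$j] = $d[$i - 2][$j - 2] + 1
                EndIf
            EndIf
        Next
    Next
    Return $d[$n][$m]
EndFunc   ;==>_TypoCount
```

Each of the four samples should print 1. Because a swap is only recognized when the two characters are adjacent in both strings, 'abdzcef' against 'abcdef' comes out as 3 rather than 2, which matches the "only one level of typo" remark above.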
fgthhhh Author Posted April 30, 2010

Thanks for the answer, mate. I want to ask a question: can I use your script to auto-correct a word? If yes, can you show me an example?

For example, how can "thraa" be auto-corrected to "three"? Can I compare "thraa" with some possible words and choose the best one?

Edited April 30, 2010 by fgthhhh
jchd Posted April 30, 2010

You may have some (relative) success in doing so, but mostly for limited cases. For instance, this function works well for selecting words from a list which have a spelling close to a given word. It was designed for this purpose, as an extension to a database engine.

In your example, only a human brain or a really "smart" program can choose which of threw, three, tharm (for instance) should be the replacement for thraa. To make the (right) correction by program, you have to identify the context, the grammar and the partial semantics, and devise a target global semantics to infer the right correction.

For spelling or grammar correction, you'll have a much better time using one of the available libraries specialized in those tasks.
fgthhhh Author Posted April 30, 2010

All my words are limited to the numbers from one to twenty (1 -> 20), so there will be no threw or tharm. Can you show me a way to correct the word? I really need an example to understand the code.

Edited April 30, 2010 by fgthhhh
jchd Posted April 30, 2010

Do you mean the numbers 1 to 20 in plain text? If so, place the text in an array $A and find the minimum of typos($A[$i], $word), if any. Try to come up with something of your own.
fgthhhh Author Posted April 30, 2010

Please help me check whether this is OK:

    $numeros[0]="one"
    $numeros[1]="two"
    $numeros[2]="three"
    $numeros[3]="four"
    $numeros[4]="five"
    $numeros[5]="six"
    $numeros[6]="seven"
    $numeros[7]="eight"
    $numeros[8]="nine"
    $numeros[9]="ten"
    $numeros[10]="eleven"
    $numeros[11]="twelve"
    $numeros[12]="thirteen"
    $numeros[13]="fourteen"
    $numeros[14]="fifteen"
    $numeros[15]="sixteen"
    $numeros[16]="seventeen"
    $numeros[17]="eighteen"
    $numeros[18]="nineteen"
    $numeros[19]="twenty"
    $test_word = "thraa"
    dim $result[20]
    for $k = 0 to 19
        $result[$k] = typos($numeros[$k], $test_word)
    next
    _ArraySort($result) ; or _ArrayMin($result)
    ; then I can get the lowest result, but I can't get the correct word

I'm stuck here; I don't know how to get the correct answer.

Edited April 30, 2010 by fgthhhh
jchd Posted April 30, 2010

Hey, calm down. There is no need to brag like you do! Use something along these lines:

    #include <String.au3>
    ; Typos() itself must also be available to this script.

    Local Const $numeros[20] = [ _
            "one", "two", "three", "four", "five", _
            "six", "seven", "eight", "nine", "ten", _
            "eleven", "twelve", "thirteen", "fourteen", "fifteen", _
            "sixteen", "seventeen", "eighteen", "nineteen", "twenty" _
            ]

    Local $test_word = "thraa"
    ; Start from the word length, so only candidates with fewer typos than that count as a match.
    Local $bestMatch = StringLen($test_word), $bestMatchIdx = -1, $typos
    For $k = 0 To UBound($numeros) - 1
        $typos = Typos($numeros[$k], $test_word)
        If $typos < $bestMatch Then
            ; Remember both the score and the index, so the word itself can be retrieved.
            $bestMatch = $typos
            $bestMatchIdx = $k
        EndIf
    Next
    If $bestMatchIdx >= 0 Then
        ConsoleWrite(StringFormat("Best match for '%s' is %s (%u) with %u spelling errors.\n", $test_word, $numeros[$bestMatchIdx], $bestMatchIdx + 1, $bestMatch))
    Else
        ConsoleWrite(StringFormat("No close match found for '%s'.\n", $test_word))
    EndIf
fgthhhh Author Posted April 30, 2010

Great, you are my hero. That's exactly what I need.
Malkey Posted April 30, 2010

_EditDistance() function from here, appears to be another version of the Typos() function from post #5, this thread.

    #include <String.au3>
    #include <Array.au3>
    #include <Math.au3>

    Local Const $numeros[21] = ["zero", "one", "two", "three", "four", "five", "six", _
            "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", _
            "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"]

    Local $test_word = "thraa"
    Local $bestMatch = StringLen($test_word), $bestMatchIdx, $typos
    For $k = 0 To UBound($numeros) - 1
        $typos1 = Typos($numeros[$k], $test_word)
        ConsoleWrite("Typos => " & $typos1 & " ")
        $typos = _EditDistance($numeros[$k], $test_word)
        ConsoleWrite($typos & " <= _EditDistance" & @CRLF)
        If $typos < $bestMatch Then
            $bestMatch = $typos
            $bestMatchIdx = $k
        EndIf
    Next
    ConsoleWrite(StringFormat("Best match for '%s' is '%s' with %u different, non-matching characters.\n", $test_word, $numeros[$bestMatchIdx], $bestMatch))

    Func _EditDistance($s1, $s2)
        Local $m[StringLen($s1) + 1][StringLen($s2) + 1], $i, $j
        $m[0][0] = 0; boundary conditions
        For $j = 1 To StringLen($s2)
            $m[0][$j] = $m[0][$j - 1] + 1; boundary conditions
        Next
        For $i = 1 To StringLen($s1)
            $m[$i][0] = $m[$i - 1][0] + 1; boundary conditions
        Next
        For $j = 1 To StringLen($s2); outer loop
            For $i = 1 To StringLen($s1) ; inner loop
                If (StringMid($s1, $i, 1) = StringMid($s2, $j, 1)) Then
                    $diag = 0;
                Else
                    $diag = 1
                EndIf
                $m[$i][$j] = _Min($m[$i - 1][$j] + 1, _ ; insertion
                        (_Min($m[$i][$j - 1] + 1, _ ; deletion
                        $m[$i - 1][$j - 1] + $diag))) ; substitution
            Next
        Next
        Return $m[StringLen($s1)][StringLen($s2)] ; $m ;
    EndFunc   ;==>_EditDistance

    Func Typos(Const $st1, Const $st2, $anychar = '_', $anytail = '%')
        Local $s1, $s2, $pen, $del, $ins, $subst
        If Not IsString($st1) Then Return SetError(-1, -1, -1)
        If Not IsString($st2) Then Return SetError(-2, -2, -1)
        If $st2 = '' Then Return StringLen($st1)
        If $st2 == $anytail Then Return 0
        If $st1 = '' Then
            Return (StringInStr($st2 & $anytail, $anytail, 1) - 1)
        EndIf
        ;~ $s1 = StringSplit(_LowerUnaccent($st1)), "", 2) ;; _LowerUnaccent() addon function not available here
        ;~ $s2 = StringSplit(_LowerUnaccent($st2)), "", 2) ;; _LowerUnaccent() addon function not available here
        $s1 = StringSplit(StringLower($st1), "", 2)
        $s2 = StringSplit(StringLower($st2), "", 2)
        Local $l1 = UBound($s1), $l2 = UBound($s2)
        Local $r[$l1 + 1][$l2 + 1]
        For $x = 0 To $l2 - 1
            Switch $s2[$x]
                Case $anychar
                    If $x < $l1 Then
                        $s2[$x] = $s1[$x]
                    EndIf
                Case $anytail
                    $l2 = $x
                    If $l1 > $l2 Then
                        $l1 = $l2
                    EndIf
                    ExitLoop
            EndSwitch
            $r[0][$x] = $x
        Next
        $r[0][$l2] = $l2
        For $x = 0 To $l1
            $r[$x][0] = $x
        Next
        For $x = 1 To $l1
            For $y = 1 To $l2
                $pen = Not ($s1[$x - 1] == $s2[$y - 1])
                $del = $r[$x - 1][$y] + 1
                $ins = $r[$x][$y - 1] + 1
                $subst = $r[$x - 1][$y - 1] + $pen
                If $del > $ins Then $del = $ins
                If $del > $subst Then $del = $subst
                $r[$x][$y] = $del
                If ($pen And $x > 1 And $y > 1 And $s1[$x - 1] == $s2[$y - 2] And $s1[$x - 2] == $s2[$y - 1]) Then
                    If $r[$x][$y] >= $r[$x - 2][$y - 2] Then $r[$x][$y] = $r[$x - 2][$y - 2] + 1
                    $r[$x - 1][$y - 1] = $r[$x][$y]
                EndIf
            Next
        Next
        Return ($r[$l1][$l2])
    EndFunc   ;==>Typos