stamandster Posted August 28, 2016

20 hours ago, czardas said:
If you can ring the number and someone answers, you have a complete number - not a partial number. The numbers may have been gathered from various sources, manually written or otherwise. It is a real world scenario.

I'm not sure that's completely accurate :-) A number doesn't have to belong to someone to be a "real number". There is, however, a difference between an assigned number and an unassigned number. The challenge (at least from what I understood) wasn't specifically to find real/fake or assignable/unassignable numbers; it was to match numbers from the reference list by the best possible means.
czardas (Author) Posted August 28, 2016

1 minute ago, stamandster said:
A number doesn't have to belong to someone to be a "real number". There is, however, a difference between an assigned number and an unassigned number.

You are absolutely right and I take back what I said. An unassigned number might be assigned at any point in the future. DUH!
argumentum Posted August 29, 2016 (edited)

5 hours ago, czardas said:
I am constantly learning new things from you

Thank you ( may be out of context, but I'm still laughing )

Edited August 29, 2016 by argumentum
orbs Posted August 29, 2016 (edited)

9 hours ago, czardas said:
One thing is for certain: there are some talented individuals around here and so far the discussion has been of value in many ways.

hear, hear!

i am still of the mind that the solution should be a general one, rather than needing to accommodate each and every domestic and international country-by-country telephone number format, past, present and future.

and if, per the problem description, typos are not to be considered, then the "string difference" method (suggested by jchd in post #47 in SQLite, and presented in AutoIt by stamandster in post #62) is overkill, and quite a heavy one. especially since - as i demonstrate hereunder - typos can be accommodated (to some extent) by specifying a more lenient match score.

about the optional task presented in post #4:

+44208.....missing numbers [optional task]

i assume this was not thought thru, since if you search for that, you end up with a LOT of numbers...

if this is indeed the intention, then i elaborate on my suggestion at post #36 to form this:

#include <Array.au3>

; tester input
Global $aPhone = [ _
        '+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Global $aQuery = [ _
        '882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '+44208', _
        '0015417543012']
Global $iScoreThreshold = 0.7
Global $bCheckBothSides = True

; declare a global var to temporarily store the match score for any specific match
Global $iScore

; declare the match results array: rows = phone numbers, columns = queries
Global $aMatch[UBound($aPhone) + 1][UBound($aQuery) + 1]

; populate headers (rows and columns) and strip non-numeric characters
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = _StringStripNonNumeric($aPhone[$iPhone])
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = _StringStripNonNumeric($aQuery[$iQuery])
Next

; match
For $iPhone = 1 To UBound($aMatch) - 1
    For $iQuery = 1 To UBound($aMatch, 2) - 1
        $iScore = _StringMatch($aMatch[$iPhone][0], $aMatch[0][$iQuery], $bCheckBothSides)
        If $iScore > $iScoreThreshold Then $aMatch[$iPhone][$iQuery] = String(Round($iScore * 100)) & '%'
    Next
Next

; re-populate headers with original values for display
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = $aPhone[$iPhone]
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = $aQuery[$iQuery]
Next

; display match results
_ArrayDisplay($aMatch)

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric

Func _StringMatch($sString, $sSubstr, $bCheckBothSides)
    Local $iScoreMax = 0
    Local $iScoreNow = 0
    Local $iScorePerChar = 1 / StringLen($sSubstr)
    ; check end-first
    For $i = StringLen($sSubstr) To 1 Step -1
        $iScoreNow = $iScorePerChar * $i
        If StringInStr($sString, StringRight($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
    Next
    If $bCheckBothSides Then
        ; check start-first
        For $i = StringLen($sSubstr) To 1 Step -1
            $iScoreNow = $iScorePerChar * $i
            If StringInStr($sString, StringLeft($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
        Next
    EndIf
    ; done
    Return $iScoreMax
EndFunc   ;==>_StringMatch

note that checking the start of the string can be disabled by setting $bCheckBothSides = False, and the score threshold can also be set. i've set it to 0.7, which i find adequate for the given data sets - other data sets will probably benefit from different threshold values, although i believe not too different.

note: this version shows the match score (in percentage) instead of just the "MATCH!" notice.

run it, let me know what you think.

EDIT: i leave it to you to run it, since the _ArrayDisplay GUI is too big to fit in a screenshot...

Edited August 29, 2016 by orbs
czardas (Author) Posted August 29, 2016 (edited)

3 hours ago, orbs said:
i assume this was not thought thru,

You are right, and that's why I said not to bother with it unless you want to. Any number starting with +44208 is in London. I would only expect numbers that match 0208 xxx xxxx or +44208... in the results. You should probably also allow 0011 44 208 xxx xxxx and similar.

Edited August 29, 2016 by czardas
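The rule czardas describes for the optional task could be sketched as a simple prefix test. This is only an illustration: the helper name _IsLondon208 is made up, the prefix list is an assumption (0044 is added as one guess at "and similar"), and the last test number is invented.

```autoit
; Hypothetical helper: reports whether a number is written with one of the
; 0208 / +44 208 style prefixes discussed above. The prefix list is an assumption.
Func _IsLondon208($sNumber)
    ; keep digits and '+', drop spaces, dashes and brackets
    Local $sClean = StringRegExpReplace($sNumber, "[^0-9+]", "")
    ; with the default flag, StringRegExp returns 1 on a match and 0 otherwise
    Return StringRegExp($sClean, "^(0208|\+44208|0044208|001144208)") = 1
EndFunc   ;==>_IsLondon208

; the first two numbers come from the thread's database, the third is invented
Local $aTest = ['020 8568 6646', '+44 207 882 8565', '0011 44 208 123 4567']
For $i = 0 To UBound($aTest) - 1
    ConsoleWrite($aTest[$i] & ' -> ' & _IsLondon208($aTest[$i]) & @CRLF)
Next
```

Keeping the '+' while cleaning matters here: once it is stripped, +44208 can no longer be told apart from a national number that merely starts with 44208.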
stamandster Posted August 29, 2016 (edited)

6 hours ago, orbs said:
hear, hear! ... +44208.....missing numbers [optional task] i assume this was not thought thru, since if you search for that, you end up with a LOT of numbers... if this is indeed the intention, then i elaborate on my suggestion at post #36 to form this:

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric

You can use StringRegExpReplace in place of your _StringStripNonNumeric :-)

Also, while I think what you're doing with the script is awesome :-) can we really assume that because someone has typed 882 8565 they also mean to add the (01) prefix? The match shows 100% of 882 8565 for both (01) 882 8565 and +44 207 882 8565, which is not completely accurate. What does the percentage indicate? That 100% of the challenge phone number is within the database number?

Let's look at 00441619346534, which shows as 79% similar to 00441619346434. I would say that it's 92.86% similar, because only one digit is different. The math is 100-($nDiff*(100/StringLen($cNum))), where $nDiff is the number of digits that differ between the two and $cNum is the challenge number. It gets a little more complicated if the lengths of the compared numbers vary at all, though.

My own function for checking accuracy is based on a couple of different elements, not the above example, which is why against the same number I show 99% instead of 92%.

+44208 shows a match to +442078828565 @ 80%, which in my mind would be a 66.67% match.

The more I play with the information, the more I realize that one solution for finding a number does not work for all iterations of numbers and whatever data you expect to return.

Latest code below... it factors in partial matches from left to right and right to left of the challenge number if it is under 7 digits long.

#include <Array.au3>
#include "typos.au3"

#cs
    looking for
    882 8565
    123 8762
    7543010
    07843 543287
    00441619346534
    +44208.....missing numbers [optional task]
    44208
    0800275002 ; too short, japan local?
    08000225649 ; 11 chars
    08457128276 ; 11 chars
    0015417543012
#ce

GLOBAL $refNumT

Local $aArray = _
        ['+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']

Local $findnumb = _
        ['882 8565','123 8762','7543010','07843 543287','00441619346534','+44208','0015417543012']

Consolewrite('---------------------------------------------------------------------------------------------------------------------'& @CRLF _
        & '---------------------------------------------------------------------------------------------------------------------'& @CRLF)

For $i = 0 to Ubound($findnumb)-1 ; find these numbers!
    $reference = StringRegExpReplace($findnumb[$i],"[^0-9]","") ; Sanitize Numbers
    For $a = 0 to ubound($aArray)-1
        GLOBAL $m = 0
        $dbnumbers = StringRegExpReplace($aArray[$a],"[^0-9]","") ; Sanitize Numbers
        $refNumT1 = $reference
        $refNumT2 = $reference
        if $reference = $dbnumbers Then
            Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [--'& @CRLF)
            Consolewrite('+> Exact Match to --] '& $aArray[$a] & ' [-- row '& $a & @CRLF)
        EndIf
        IF StringLen($reference) < 7 then ; Find Partial Match at beginning of database number, through cyclical deleting of one character at a time from left or right of challenge number
            Do ; matches left to right of challenge number against db number
                IF StringLeft($dbnumbers,StringLen($refNumT1)) = $refNumT1 then
                    Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [-- using LAST '& StringLen($refNumT1) & ' digits'& @CRLF)
                    consolewrite('+> Partial Match, matching FIRST '& StringLen($refNumT1)&' numbers ('& $refNumT2 &') of --] ' & $aArray[$a] & ' [-- row '& $a & @CRLF)
                EndIf
                $refNumT1 = StringTrimLeft($refNumT1,1)
            Until StringLen($refNumT1) = 1 OR StringLen($dbnumbers) = 10
            Do ; matches right to left of challenge number against db number
                IF StringLeft($dbnumbers,StringLen($refNumT2)) = $refNumT2 then
                    Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [-- using FIRST '& StringLen($refNumT2) & ' digits'& @CRLF)
                    consolewrite('+> Partial Match, matching FIRST '& StringLen($refNumT2)&' numbers ('& $refNumT2 &') of --] ' & $aArray[$a] & ' [-- row '& $a & @CRLF)
                    exitloop
                EndIf
                $refNumT2 = StringTrimRight($refNumT2,1)
            Until StringLen($refNumT2) = 1 OR StringLen($dbnumbers) = 10
        endif
        if StringInStr($dbnumbers,$reference) then ; Find Partial Match within the numbers database
            Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [--'& @CRLF)
            Consolewrite('+> Partial Match, within larger number --] '& $aArray[$a] & ' [-- row '& $a & @CRLF)
            ;ContinueLoop
        EndIf
        $typos = _Typos($dbnumbers, $reference) ; Find Similar numbers based on limits
        $stringlen = Stringlen($dbnumbers) / StringLen($reference)
        $similarity = Stringleft(100-($stringlen*$typos),6)
        IF $similarity > 97.5 then
            Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [--'& @CRLF)
            consolewrite('+> Similarity Match, '& $similarity &'% similar to number --] '& $aArray[$a] &' [-- row '& $a & @CRLF)
        EndIf
    Next
Next

Consolewrite('---------------------------------------------------------------------------------------------------------------------'& @CRLF _
        & '---------------------------------------------------------------------------------------------------------------------'& @CRLF)

Output

---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
> Phone Number --] 882 8565 [--
+> Partial Match, within larger number --] (01) 882 8565 [-- row 16
> Phone Number --] 882 8565 [--
+> Partial Match, within larger number --] +44 207 882 8565 [-- row 32
> Phone Number --] 7543010 [--
+> Partial Match, within larger number --] +441347543010 [-- row 26
> Phone Number --] 00441619346534 [--
+> Similarity Match, 99% similar to number --] 00441619346434 [-- row 17
> Phone Number --] 00441619346534 [--
+> Similarity Match, 97.642% similar to number --] 0161 934 6534 [-- row 30
> Phone Number --] +44208 [-- using LAST 2 digits
+> Partial Match, matching FIRST 2 numbers (44208) of --] 08457 128276 [-- row 5
> Phone Number --] +44208 [-- using LAST 2 digits
+> Partial Match, matching FIRST 2 numbers (44208) of --] 0-800-022-5649 [-- row 8
> Phone Number --] +44208 [-- using FIRST 2 digits
+> Partial Match, matching FIRST 2 numbers (44) of --] +441347543010 [-- row 26
> Phone Number --] +44208 [-- using FIRST 4 digits
+> Partial Match, matching FIRST 4 numbers (4420) of --] +44 207 882 8565 [-- row 32
> Phone Number --] +44208 [-- using FIRST 4 digits
+> Partial Match, matching FIRST 4 numbers (4420) of --] +44 20 7946 0321 [-- row 41
> Phone Number --] 0015417543012 [--
+> Similarity Match, 98.307% similar to number --] +1-541-754-3012 [-- row 25
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------

Edited August 29, 2016 by stamandster
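The formula quoted in this post, 100-($nDiff*(100/StringLen($cNum))), could be sketched as below for two numbers of equal length. The function name _DigitSimilarity is hypothetical and is not the _Typos function from typos.au3; returning -1 for unequal lengths is just one assumed way to sidestep the "lengths vary" case the post mentions.

```autoit
; Hypothetical helper: position-by-position similarity of two digit strings.
; Only defined here for strings of equal, non-zero length (an assumption).
Func _DigitSimilarity($cNum, $dbNum)
    If StringLen($cNum) = 0 Or StringLen($cNum) <> StringLen($dbNum) Then Return SetError(1, 0, -1)
    Local $nDiff = 0
    For $i = 1 To StringLen($cNum)
        ; count the positions where the two numbers carry a different digit
        If StringMid($cNum, $i, 1) <> StringMid($dbNum, $i, 1) Then $nDiff += 1
    Next
    Return 100 - ($nDiff * (100 / StringLen($cNum)))
EndFunc   ;==>_DigitSimilarity

; one differing digit out of 14, the case discussed in the post
ConsoleWrite(Round(_DigitSimilarity('00441619346534', '00441619346434'), 2) & @CRLF)
```

For the pair above this gives 100 - 100/14, the 92.86% figure quoted in the post.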
orbs Posted August 29, 2016 (edited)

1 hour ago, stamandster said:
You can use StringRegExpReplace in place of your _StringStripNonNumeric :-)

i never got the hang of RegExp...

1 hour ago, stamandster said:
Let's look at 00441619346534, shows 79% similar to 00441619346434

the score is based on the longest successive substring taken from the start or the end of the query, which in this case is the first 11 digits:

phone: 00441619346434
query: 00441619346534

(don't you just love monospace fonts? )

now, 11 characters out of 14 is (rounded to) 79%. assuming what we're assuming about typos (i.e. they should be ignored), i feel that a match score of 79% is more sound than 93%. @czardas?

also, what i failed to highlight is that once the array contains match scores, it can be sorted - which makes it quite easy for the end user to distinguish between the results. one can easily overlook the dissimilarity demonstrated above, but when they see it's only 79%, they will (hopefully) double-check.

especially when - which is more troubling - the match score for 0161 934 6534 is only 71%. the 79% match score takes into account both sides of the query string, which as i mentioned, can (and should) be disabled, so the 71% match score becomes the only result, which happens to be the required result. it scores low because the query is so long. perhaps i should cross-check the shorter string against the longer one, whichever they may be.

...

ok, i introduced the cross-check. now the 71% match score reevaluates to 91%. yey!

also, another match score that was 85% now reevaluates to 100%, as it should - it matches the query 0015417543012 to the phone +1-541-754-3012. double yey!

this is the updated code:

#include <Array.au3>
#include <Math.au3>

; tester input
Global $aPhone = [ _
        '+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Global $aQuery = [ _
        '882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '+44208', _
        '0015417543012']
Global $iScoreThreshold = 0.7
Global $bCheckBothSides = True

; declare a global var to temporarily store the match score for any specific match
Global $iScore

; declare the match results array: rows = phone numbers, columns = queries
Global $aMatch[UBound($aPhone) + 1][UBound($aQuery) + 1]

; populate headers (rows and columns) and strip non-numeric characters
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = _StringStripNonNumeric($aPhone[$iPhone])
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = _StringStripNonNumeric($aQuery[$iQuery])
Next

; match (cross-check: score the query against the phone and the phone against the query, keep the higher)
For $iPhone = 1 To UBound($aMatch) - 1
    For $iQuery = 1 To UBound($aMatch, 2) - 1
        $iScore = _Max(_StringMatch($aMatch[$iPhone][0], $aMatch[0][$iQuery], $bCheckBothSides), _StringMatch($aMatch[0][$iQuery], $aMatch[$iPhone][0], $bCheckBothSides))
        If $iScore > $iScoreThreshold Then $aMatch[$iPhone][$iQuery] = String(Round($iScore * 100)) & '%'
    Next
Next

; re-populate headers with original values for display
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = $aPhone[$iPhone]
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = $aQuery[$iQuery]
Next

; display match results
_ArrayDisplay($aMatch)

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric

Func _StringMatch($sString, $sSubstr, $bCheckBothSides)
    Local $iScoreMax = 0
    Local $iScoreNow = 0
    Local $iScorePerChar = 1 / StringLen($sSubstr)
    ; check end-first
    For $i = StringLen($sSubstr) To 1 Step -1
        $iScoreNow = $iScorePerChar * $i
        If StringInStr($sString, StringRight($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
    Next
    If $bCheckBothSides Then
        ; check start-first
        For $i = StringLen($sSubstr) To 1 Step -1
            $iScoreNow = $iScorePerChar * $i
            If StringInStr($sString, StringLeft($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
        Next
    EndIf
    ; done
    Return $iScoreMax
EndFunc   ;==>_StringMatch

oh, and let's round the percentage. no-one cares for the 0.86% in the 92.86%. just put 93%, ok?

1 hour ago, stamandster said:
The match shows 100% of 882 8565 for both (01) 882 8565 and +44 207 882 8565, which is not completely accurate.

why not? the query number appears in both phones exactly.

Edited August 29, 2016 by orbs
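The point about sorting by match score could look roughly like this. It is a minimal sketch, assuming the results for one query have been collected into a two-column array; the array name $aResult is made up, and the two scores are the ones quoted in the post for the query 00441619346534.

```autoit
#include <Array.au3>

; column 0 = phone number as stored, column 1 = match score in percent
; (scores taken from the discussion above, not recomputed here)
Local $aResult[2][2] = [['00441619346434', 79], ['0161 934 6534', 91]]

; sort descending on column 1, so the best candidate comes first
_ArraySort($aResult, 1, 0, 0, 1)

For $i = 0 To UBound($aResult) - 1
    ConsoleWrite($aResult[$i][1] & '%  ' & $aResult[$i][0] & @CRLF)
Next
```

Keeping the score as a number rather than as a '79%' string is what lets _ArraySort order it numerically; the '%' sign is only added when printing.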
stamandster Posted August 29, 2016 (edited)

@orbs I like your improvements to the script!

... in regards to the 100% match, the number as a whole doesn't match, but I understand what you mean. 12345 is not 100% the same as 345, even if 345 exists within 12345. See what I mean?

Below is a sample that will sanitize the input variable so that it contains only digits:

$new = StringRegExpReplace($value,"[^0-9]","")

Edited August 29, 2016 by stamandster
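To see what that one-liner does, here is a small runnable sketch applied to three numbers from the thread's database. The array name $aSample is only for illustration.

```autoit
; The pattern [^0-9] matches every character that is not a digit;
; replacing each match with an empty string leaves only the digits.
Local $aSample = ['+44 207 882 8565', '(01) 882 8565', '0-800-022-5649']
For $i = 0 To UBound($aSample) - 1
    ConsoleWrite($aSample[$i] & ' -> ' & StringRegExpReplace($aSample[$i], "[^0-9]", "") & @CRLF)
Next
```

Note that the leading + is stripped along with the spaces and brackets, so +44... and 44... become indistinguishable after sanitizing.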
czardas (Author) Posted August 29, 2016

5 hours ago, orbs said:
assuming what we're assuming about typos (i.e. they should be ignored), i feel that a match score of 79% is more sound than 93%. @czardas?

I very much like the way you are going with this. Finding typos was not the original intention; however, if they get caught in the net as a side effect of the method, I still consider it a valid approach. False positives are to be expected anyway. The (01) prefix was made up, but could easily occur as an outward dialing code from within an internal company network. Perhaps I ought to have verified every number, but unassigned numbers should actually be included. I hope that clears up any current doubts.
stamandster Posted August 30, 2016 (edited)

Ok, I created a different approach to what I've posted earlier... it does away with typos.au3 ;-)

#include <Array.au3>

GLOBAL $MatchPerc = 30 ; no less than % to match on
GLOBAL $MatchLen = 3 ; no smaller than X digits to match on

Local $adbPNum = _
        ['+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']

Local $arPNum = _
        ['882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '+44208', _
        '0015417543012']

Consolewrite('--> Matching Threshold '& $MatchPerc &'% of Reference Number | No less than '& $MatchLen & ' digits' & @CRLF)

For $i = 0 to Ubound($arPNum)-1 ; find these numbers!
    $rPNum = StringRegExpReplace($arPNum[$i],"[^0-9]","") ; Sanitize Numbers
    For $a = 0 to ubound($adbPNum)-1
        $dbPNum = StringRegExpReplace($adbPNum[$a],"[^0-9]","") ; Sanitize Numbers
        $pM = _PhoneMatch($rPNum,$dbPNum,$MatchPerc,$MatchLen)
        IF $pM <> 0 then
            $sPM = StringSplit($pM,'|')
            Consolewrite($sPM[1] &'% of Reference Number '& $arPNum[$i] &' matches '& $sPM[2] & '% of DB Phone Number '& $adbPNum[$a] & ' -- Accuracy of ' & $sPM[3] &'%'& @CRLF)
        EndIf
    Next
Next

FUNC _PhoneMatch($_refNum,$_dbNum, $_pMatch = 50, $_refNumLen = 3)
    LOCAL $_refNumC = $_refNum
    LOCAL $_dbNumC = $_dbNum
    LOCAL $C = 0
    LOCAL $swap = 0
    ; always trim the shorter number and search for it inside the longer one
    IF Stringlen($_refNum) > Stringlen($_dbNum) Then
        $swap = 1
        $_refNumC = $_dbNum
        $_dbNumC = $_refNum
    EndIf
    $_refNumR = $_refNumC ; cached Right
    $_refNumL = $_refNumC ; cached Left
    Do
        IF $c <> 0 then
            $_refNumR = StringTrimRight($_refNumR,1)
            $_refNumL = StringTrimLeft($_refNumL,1)
        endif
        $percDbNum = (StringLen($_refNumR)*(100/StringLen($_dbNumC)))
        $percRefNum = (StringLen($_refNumR)*(100/StringLen($_refNumC)))
        Select
            Case (StringInStr($_dbNumC,$_refNumL) OR StringInStr($_dbNumC,$_refNumR)) and $percDbNum >= $_pMatch
                if $swap = 1 then
                    $_PMAcc = (StringLeft($percDbNum,5) + StringLeft($percRefNum,5))/2
                    Return StringLeft($percDbNum,5) &'|'& StringLeft($percRefNum,5) &'|'& StringLeft($_PMAcc,5)
                else
                    $_PMAcc = (StringLeft($percDbNum,5) + StringLeft($percRefNum,5))/2
                    Return StringLeft($percRefNum,5) &'|'& StringLeft($percDbNum,5) &'|'& StringLeft($_PMAcc,5)
                endif
        EndSelect
        $c = $c + 1
    until StringLen($_refNumL) = $_refNumLen OR StringLen($_refNumR) = $_refNumLen
EndFunc

Output

--> Matching Threshold 30% of Reference Number | No less than 3 digits
100% of Reference Number 882 8565 matches 77.77% of DB Phone Number (01) 882 8565 -- Accuracy of 88.88%
100% of Reference Number 882 8565 matches 58.33% of DB Phone Number +44 207 882 8565 -- Accuracy of 79.16%
85.71% of Reference Number 7543010 matches 54.54% of DB Phone Number +1-541-754-3012 -- Accuracy of 70.12%
100% of Reference Number 7543010 matches 58.33% of DB Phone Number +441347543010 -- Accuracy of 79.16%
78.57% of Reference Number 00441619346534 matches 78.57% of DB Phone Number 00441619346434 -- Accuracy of 78.57%
71.42% of Reference Number 00441619346534 matches 90.90% of DB Phone Number 0161 934 6534 -- Accuracy of 81.16%
80% of Reference Number +44208 matches 33.33% of DB Phone Number +44 207 882 8565 -- Accuracy of 56.66%
80% of Reference Number +44208 matches 33.33% of DB Phone Number +44 20 7946 0321 -- Accuracy of 56.66%
84.61% of Reference Number 0015417543012 matches 100% of DB Phone Number +1-541-754-3012 -- Accuracy of 92.30%

Edited August 31, 2016 by stamandster: fixed percentages of referenced number when swapped, tried to add accuracy percentage
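The "Accuracy" figure in the output above is the average of two coverage percentages: how much of the reference number the matched run covers, and how much of the database number it covers. A stripped-down sketch of just that arithmetic follows; the helper name _CoverageAccuracy and its parameters are hypothetical and not part of the posted script.

```autoit
; Hypothetical helper: average of the two coverage percentages.
; $iMatched = length of the matched run of digits
; $iRefLen  = digits in the reference (query) number
; $iDbLen   = digits in the database number
Func _CoverageAccuracy($iMatched, $iRefLen, $iDbLen)
    Local $fRef = $iMatched * 100 / $iRefLen ; share of the reference number covered
    Local $fDb = $iMatched * 100 / $iDbLen   ; share of the database number covered
    Return ($fRef + $fDb) / 2
EndFunc   ;==>_CoverageAccuracy

; 882 8565 (7 digits) found whole inside (01) 882 8565 (9 digits)
ConsoleWrite(Round(_CoverageAccuracy(7, 7, 9), 2) & @CRLF)
```

This sketch rounds to 88.89, while the posted script cuts each value off with StringLeft(..., 5) and therefore reports 77.77% and 88.88% for the same pair.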
czardas (Author) Posted August 30, 2016

Just one more day left for late entries. Those who submitted more than one code example will have the last version taken as their official entry. Thanks to all those participating and for the support of others. Good luck to everyone.
markyrocks Posted August 31, 2016

i've been working on this for days..... it's still not quite right, but i just FINALLY got it to do remotely what i wanted it to. It's a mess and damn near impossible to decipher, but I'm proud of it. It finds all the matches, i believe. I didn't get into partial matches; I just matched the search criteria against each phone number, checking that all the digits were there and in the correct order. I'm sure i could clean it up and refine it even further, but i've already spent too much time on this. I was not able to do this without using 2d arrays. I will say i'm much better at using and understanding arrays now. I tried to use A LOT of notes so that anyone who looks at it has some kind of guide.

#include <Array.au3>

Local $aArray = _
        ['+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Global $data
local $search=['882 8565', '123 8762', '7543010', '07843 543287', '00441619346534', '+44208', '0015417543012']
Local $compare='0123456789',$bool,$temp,$newtemp='',$bs,$result

for $x=0 to UBound($aArray) -1 ;shuffle through array
    $aArray[$x]=_Sort($aArray[$x])
Next
for $x=0 to UBound($search) -1 ;shuffle through array
    $search[$x]=_Sort($search[$x])
Next
;~ _ArrayDisplay($search)

;so so now everything is broken down lets compare
for $x=0 to UBound($aArray)-1
    for $y=0 to UBound($search)-1
        $stringinstr=StringInStr($aArray[$x],$search[$y])
        if $stringinstr<>0 Then ;success
            MsgBox('','success','search criteria=' & $search[$y] & 'phone number result=' & $aArray[$x])
        EndIf
    Next
Next

;search for missing numbers.....
_RefinedSort(1)

Func _RefinedSort($data)
    Local $newsearcharray[ubound($search)][1] ;$newsearcharray[$x][0]=number of digits in the search criteria
    ;~ _ArrayDisplay($search)
    for $x=0 to UBound($search)-1
        for $y=0 to StringLen($search[$x])
            if stringlen($search[$x]) > UBound($newsearcharray, 2) Then
                ReDim $newsearcharray[UBound($search)][StringLen($search[$x])+1] ;this section breaks down the search criteria into individual digits and also saves the length
            EndIf
            ;~ MsgBox('','',stringlen($search[$x]))
            ;~ msgbox('','',UBound($newsearcharray,2))
            if $y=0 Then
                $newsearcharray[$x][$y]=StringLen($search[$x])
            EndIf
            if $y>=1 Then
                $newsearcharray[$x][$y]=StringMid($search[$x],$y,1)
            EndIf
        Next
    Next
    ;~ _ArrayDisplay($newsearcharray) ;working good to here...
    ;~ _ArrayDisplay($aArray)
    Local $temparray[UBound($aArray)] ;this just saves the original array to a temp that can be manipulated
    for $x=0 to UBound($aArray)-1
        $temparray[$x]=$aArray[$x]
    Next
    ;lets search the temparray to see if the search numbers actually exist in the aArray
    for $w=0 to UBound($temparray)-1
        Local $count=0
        for $x=0 to UBound($newsearcharray,1)
            for $y=1 to UBound($newsearcharray,2)
                $stringinstr=StringInStr($temparray[$w],$newsearcharray[$x][$y])
                if $stringinstr<>0 Then
                    $count+=1
                Else
                    $temparray[$w]=''
                    ExitLoop 2
                EndIf
                if $count=$newsearcharray[$x][0] Then
                    ;~ msgbox('','','partial match found.....')
                    ExitLoop 2
                EndIf
            Next
        Next
    Next
    ;the temp array should be at least trimmed down at this point.....
    ;~ _ArrayDisplay($temparray) ;good good not sure how accurate the result is to this point but progress nonetheless
    ;***************************************************************************************************************************************
    for $w=0 to UBound($temparray)-1 ;starts the loop to shuffle through $temparray(trimmed)
        for $x=0 to UBound($newsearcharray,1)-1 ;starts the loop to shuffle through $newsearcharray[search criteria][digit], $newsearcharray[$x][0]=number of digits in the search
            ;~ MsgBox('','',$newsearcharray[$x][0]) ;testing area
            ;~ _ArrayDisplay($newsearcharray)
            ;~ _ArrayDisplay($charpos)
            Local $occurrence=0, $count=0, $tempoccurrence=1, $temp=""
            Local $charpos[2][$newsearcharray[$x][0]]
            $occurrence=0
            if $newsearcharray[$x][0]>UBound($charpos) Then
                ReDim $charpos[2][$newsearcharray[$x][0]] ;the row saves how many times the digit occurs in the possible match
            EndIf
            $occurrence=0
            for $y=1 to $newsearcharray[$x][0] ; shuffles through the digits in the search criteria $newsearcharray[$x][0]=number of digits in the search
                if $temparray[$w]='' then ;skips any entry in temparray that is blank
                    ExitLoop 2
                EndIf
                ;~ _ArrayDisplay($newsearcharray)
                for $k=1 to $newsearcharray[$x][0] ;can probably make it so this is skipped if its already verified that the digits exist
                    $stringinstr=StringInStr($temparray[$w],$newsearcharray[$x][$y],0,$k) ;this should figure out if a particular digit occurs more than once
                    ;~ MsgBox('','stringinstr, temparray,newsearcharray', $stringinstr & ' ' & $temparray[$w] & ' ' & $newsearcharray[$x][$y])
                    if $stringinstr=0 Then
                        ExitLoop
                    EndIf
                    if $stringinstr<>0 then
                        $occurrence+=1
                        ;~ MsgBox('','','occurrence +1')
                    EndIf
                next
                ;~ _ArrayDisplay($charpos)
                if $occurrence=0 Then ;this should make it so if the digit doesn't exist at this point then exit loop and move on to the next search term
                    ExitLoop
                EndIf
                ;so at this point we should be sure that all digits in the search term exist
                $charpos[0][$y-1]=StringInStr($temparray[$w],$newsearcharray[$x][$y],0,$tempoccurrence) ;the digit exists and saves its position
                ;~ _ArrayDisplay($charpos)
                $charpos[1][$y-1]=$occurrence ;saves how many times the digit occurs
                ;~ _ArrayDisplay($charpos)
                $occurrence=0 ;reset the counter for the next digit
                ;*****************************************looks good to this point&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
                ;if a position is listed more than once but only has 1 occurrence then that means it's in the search criteria more than once so it is not possible to be a match
                if $y=$newsearcharray[$x][0] Then
                    $jesuschrist=1
                    for $a=0 to UBound($charpos,2)-1 ;first position............
                        $jesuschrist=1
                        for $b=$a+1 to UBound($charpos,2)-1
                            if $charpos[0][$a]=$charpos[0][$b] then
                                $jesuschrist+=1
                                ;~ _ArrayDisplay($charpos)
                            EndIf
                            if $charpos[0][$a]=$charpos[0][$b] and $jesuschrist>$charpos[1][$a] then
                                ;~ MsgBox('','charA, charB, char[1][a], Jesuschrist', $charpos[0][$a] & ' ' & $charpos[0][$b] & ' ' & $charpos[1][$a] & ' ' & $jesuschrist)
                                ;~ MsgBox('','','exiting loop')
                                ExitLoop 3
                            EndIf
                        Next
                        $jesuschrist=1
                    Next
                EndIf
                ;**********************************************looks good to here...............jezzzzz=====================================================================
                if $y=$newsearcharray[$x][0] Then
                    $pp=2
                    for $a=0 to UBound($charpos,2)-1 ;this basically sorts through the positions and makes sure that each occurrence of a number has a unique position...lol wtf
                        for $b=$a+1 to UBound($charpos,2)-1
                            if $charpos[1][$a]=1 Then ;exits the loop if the occurrence is one
                                ExitLoop
                            endif
                            ;~ _ArrayDisplay($charpos)
                            if $charpos[0][$a]=$charpos[0][$b] Then
                                ;~ MsgBox('','charpos[0][$b] newsearcharray temparray, pp', $charpos[0][$b] & " " & $newsearcharray[$x][$a+1] & " " & $temparray[$w] & ' ' & $pp)
                                $charpos[0][$b]=StringInStr($temparray[$w],$newsearcharray[$x][$a+1],0,$pp) ;this is so that if a digit occurs more than once
                                ;~ MsgBox('','charpos[0][$b] newsearcharray temparray, pp', $charpos[0][$b] & " " & $newsearcharray[$x][$a+1] & " " & $temparray[$w] & ' ' & $pp)
                            EndIf
                            if $charpos[0][$b]=0 then
                                ;~ MsgBox('','','exiting loop')
                                ExitLoop 3
                            EndIf
                            ;~ _ArrayDisplay($charpos)
                            if $charpos[1][$a]>$pp Then
                                $pp+=1
                            EndIf
                        Next
                        $pp=2
                    Next
                    ;~ MsgBox('','','made it')
                EndIf
                ; so now that every digit is accounted for and has a unique position... we're getting really close.....
                ;~ MsgBox('','',$temparray[$w])
                if $y=$newsearcharray[$x][0] Then
                    for $a=UBound($charpos,2)-1 to 0 step -1
                        for $b=$a-1 to 0 step -1 ;charpos corresponds with the digits in search in order charpos[0][0] is the first digit in newsearcharray
                            ;~ MsgBox('','charposA, charposB',$charpos[0][$a] & ' ' & $charpos[0][$b])
                            if $charpos[0][$a]<$charpos[0][$b] Then ;looks at the last digit first so if any of the digits behind it are positioned ahead of it then they're out of order
                                ;~ MsgBox('','','exiting loop')
                                ExitLoop 3
                            EndIf
                            ;~ MsgBox("","",$temparray[$w])
                            if $a=1 Then ;if it makes it this far then we should have a result...........
                                for $g=1 to UBound($newsearcharray,2)-1 ;this puts the matching search criteria back together
                                    $temp&=$newsearcharray[$x][$g]
                                Next
                                ;~ MsgBox('','',$w)
                                ;~ _ArrayDisplay($temparray)
                                MsgBox('','Match','search criteria=' & $temp & 'match=' & $temparray[$w])
                            EndIf
                            ;~ MsgBox('','a',$a)
                        Next
                    Next
                EndIf
            Next ;end of the digit shuffle loop ******************************************************************
        Next
    Next
EndFunc
;********************************************************************************************
Func _Sort($data)
    local $length=StringLen($data) ;finds the length of the string
    Local $temp[$length+1]
    for $y=1 to $length
        $temp[$y]=stringmid($data,$y,1) ;breaks the string down to individual chars
    Next
    ;_ArrayDisplay($temp) ;seems good to here
    for $z=1 to $length
        $stringinstr=StringInStr('0123456789',$temp[$z]) ;working to here......
if $stringinstr <> 0 Then ;saves the numbers to a new string with no spaces or random characters $newtemp &= $temp[$z] EndIf $data=$newtemp Next $newtemp="" Return $data EndFunc stamandster 1 Spoiler "I Believe array math to be potentially fatal, I may be dying from array math poisoning" Link to comment Share on other sites More sharing options...
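For anyone comparing approaches: going by its comments, _RefinedSort works toward one check, namely that every digit of the search term appears in the candidate number, each at its own position and in the same order. That check can also be written as a single left-to-right scan. What follows is only a minimal sketch under stated assumptions, not the code from the entry above: the function names _DigitsOnly and _IsOrderedMatch are hypothetical, and it assumes gaps between matched digits are acceptable.

```autoit
; Hypothetical helpers for illustration only; they are not part of the entry above.

; Strip everything that is not a decimal digit (the job _Sort does character by character).
Func _DigitsOnly($sText)
    Return StringRegExpReplace($sText, "\D", "")
EndFunc

; Return True when every digit of $sSearch occurs in $sNumber in the same order.
; Gaps between the matched digits are allowed (an ordered subsequence test).
Func _IsOrderedMatch($sSearch, $sNumber)
    Local $iStart = 1, $iPos = 0
    For $i = 1 To StringLen($sSearch)
        ; Look for the next search digit at or after the current scan position.
        $iPos = StringInStr($sNumber, StringMid($sSearch, $i, 1), 0, 1, $iStart)
        If $iPos = 0 Then Return False ; digit missing or out of order
        $iStart = $iPos + 1 ; the next digit has to come later in the number
    Next
    Return True
EndFunc

; Example call with made-up input.
ConsoleWrite(_IsOrderedMatch("4420", _DigitsOnly("+44 (0)20 8123 4567")) & @CRLF)
```

Taking the leftmost possible position for each digit is enough here: if any in-order placement exists, the leftmost one leaves the most room for the remaining digits. Note that an empty search term matches everything, so callers would probably want to reject empty terms first.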
czardas Posted August 31, 2016 Author @markyrocks Thanks for your contribution.
czardas Posted September 1, 2016 Author This challenge is now closed for further entries. There will be a short pause before the announcement of the winner.
TheSaint Posted September 1, 2016 Damn! I've been working all week on a solution as per a request from Somerset, but now he won't pay me, because I missed the deadline. Wait until I work out which of his phone numbers is the correct one, then I will give him a piece of my mind. Spoiler Just Kidding .... September Fool's Day!
argumentum Posted September 1, 2016 2 hours ago, TheSaint said: but now he won't pay me. Y'all know I've won this due to my superior coding skills and unchallenged intelligence, but relax, I'll share the bounty. PS: I won't be able to do this when the prize is a coffee cup.
stamandster Posted September 1, 2016 Must take a long time to make those digital medals ;-)
czardas Posted September 1, 2016 Author 7 minutes ago, stamandster said: Must take a long time to make those digital medals ;-) Sorry for the delay.
iamtheky Posted September 2, 2016 "Doesnt matter who wins cause they're all losers" - trolololol
czardas Posted September 2, 2016 Author After some deliberation, I have come to a decision to call this a draw between orbs and stamandster. Both your examples are better than my attempt, although tweaking some regular expressions would improve mine to a degree. This was a deceptively difficult challenge and I think your examples are as good as anything some MVPs could have created. Not to be put off by their lack of enthusiasm for this: the only person who volunteered to look at your examples was Jos. To be fair, some people I asked are not able to do so for one reason or another. Despite some confusion over the first post, you have demonstrated, to me at least, that my description was clear enough. Perhaps I could have elaborated more. I wasn't quite sure how to approach this problem myself, and I am most impressed by the winners. I think orbs quickly got on the right track, and stamandster put in great effort to refine his code. Both examples passed further tests with flying colours, although I had to lower orbs' score threshold to get through to Argentina. I declare you both champions of the unofficial August 2016 AutoIt code challenge. Many thanks to all who participated here. @Somerset Better luck next time.