stamandster, posted August 28, 2016

czardas said (just now): "Can they? I really doubt that. South American countries have some odd rules, but the last 7 or 8 digits remain unchanged."

That doesn't mean that the number you're searching for has been entered correctly enough to match what you're hoping to find ;-) and keeping a rule for every country becomes cumbersome. Because of human error, deviations can occur anywhere on the input side.
czardas (Author), posted August 28, 2016

Human error is a different challenge! Not that I don't like your idea.

Edited August 28, 2016 by czardas
operator64 ArrayWorkshop
stamandster, posted August 28, 2016

czardas said (2 minutes ago): "Human error is a different challenge! Not that I don't like your idea."

Lol, yeah ;-) The challenge numbers are in error, as some don't exist in their current form.
czardas (Author), posted August 28, 2016

stamandster said (2 minutes ago): "the challenge numbers are in error as some don't exist in their current form"

They might exist somewhere.
Somerset, posted August 28, 2016

czardas said (23 minutes ago): 'In this case "Easy" is deceptive. Concept is quite simple though. Anyway, when are you going to write some AutoIt?'

I am taking the political view on this challenge: I will outsource it, and then claim I did it.
jchd, posted August 28, 2016

czardas said (51 minutes ago): "Typos.au3 is unsuitable for this. The method returns matches which are obviously wrong. It's an interesting idea, but fails terribly."

Do you mean that there are no errors in the list of queries? If so, I agree that this kind of fuzzy search isn't the right tool and that I misunderstood your goal. But then a regex would do. If, however, you expect to match a "local" or short query number (no country code, no area code) against a real-world long list of actual numbers, then you'll get erroneous or misleading results as well.

I see the problem as ill-posed: "I failed to know my data at the right time and I'm now facing an intractable mess." Phone numbers aren't handed out by Martians or by a random lottery; they come from some source along with a meaning. They aren't data, but information: e.g. this is the phone number of a customer in Denmark, this other one is for a friend in India, and so on. Failing to turn the raw data (the series of digits and signs) into valid, usable information in the first place is the actual issue.

The same applies to the query list: you're supposed to know where you live and to distinguish between 123456 being the local UK number of your neighbour, 123456 being the middle part of the number of a company in Singapore, and the same 123456 being the real local number of your aunt in Colorado. With no information, just a pile of raw data, exact matches could possibly be obtained "reliably" with a regexp, but they would be meaningless for actual processing.

Another big catch is that phone numbers continuously change over time all around the world. You must be some telco entity to track those changes reliably and adjust your list accordingly.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
czardas (Author), posted August 28, 2016

jchd said (20 minutes ago): "You must be some telco entity to track those changes reliably and adjust your list accordingly."

That is a problem. For the rest, you have no idea where the number originated; I thought I had made that clear in the first post. If someone travels around the globe, they may have contacts with duplicate phone number entries. On the other hand, if someone doesn't travel abroad, they won't include country codes (sometimes there is no area code either). If these people use an application to search for a phone number, you could simply use StringInStr(). However, many people find phone numbers on the Internet and add them to databases, etc., so there is a potential need to automate recognition. Naturally, there will always be some false positives.

Edited August 28, 2016 by czardas
czardas (Author), posted August 28, 2016

czardas said (on 8/23/2016 at 6:15 PM): "The tricky part is that you don't know anything about the number's location, user location or number formatting."

Read the specs again. Why is this not clear?

Edited August 28, 2016 by czardas
jchd, posted August 28, 2016

But that's exactly my point! Gathering partial (unspecified) numbers from around the globe in a database is just piling up meaningless data. Would you sensibly store 123456 in three distinct phonebook entries without associating them with some clues about location and number owner? If so, how are you going to call your aunt while you're on holiday in Germany, or selling insurance contracts while in London? It's just guesswork to me.
czardas (Author), posted August 28, 2016

If you can ring the number and someone answers, you have a complete number, not a partial number. The numbers may have been gathered from various sources, handwritten or otherwise. It is a real-world scenario.

Edited August 28, 2016 by czardas
argumentum, posted August 28, 2016

czardas said (12 minutes ago): "It is a real world scenario."

A dumpster-diving scenario, but nevertheless a good exercise.

Edited August 28, 2016 by argumentum
Follow the link to my code contribution ( and other things too ). FAQ - Please Read Before Posting.
czardas (Author), posted August 28, 2016

argumentum said (3 minutes ago): "dumpster diving scenario"

Not everyone is a council worker.

Edited August 28, 2016 by czardas
czardas (Author), posted August 28, 2016

I have made some modifications to fix the false negative results in my previous code. @jchd is right to point out that a 100% reliable match is impossible: you can't differentiate between short country codes and long local area codes. My code below does not differentiate between countries; instead, it only checks that country code formatting is the likely cause of a mismatch before returning the result as a potential match.

It is in no way perfect, but it's also not very likely that many people will store international telephone numbers that collide. Unless you are a telephone company, having the same subscriber number (in different countries) appearing in your contacts is likely to be a fluke coincidence that happens once in a lifetime. You can also see that they have different prefixes as soon as you check the search results.

```autoit
MsgBox(0, "Malta +356", TelCompare('21 12345678', '0011 356 12345678'))
MsgBox(0, "Argentina +54", TelCompare('0 22 15 12345678', '010 54 9 22 12345678'))
MsgBox(0, "Hungary +36", TelCompare('06 12345678', '+36 12345678'))

Func TelCompare($sTelNum1, $sTelNum2, $iMinMatch = 3) ; Maximum Length = 25 probably
    ; get rid of typical delimiters
    $sTelNum1 = StringRegExpReplace($sTelNum1, '[ \+\(\)\-]', '')
    $sTelNum2 = StringRegExpReplace($sTelNum2, '[ \+\(\)\-]', '')
    If $sTelNum1 = $sTelNum2 Then Return True ; no need to go any further

    Local $iLen1 = StringLen($sTelNum1), $iLen2 = StringLen($sTelNum2)
    If $iLen2 < $iLen1 Then ; make $sTelNum1 the shorter number
        Local $vTemp = $iLen1
        $iLen1 = $iLen2
        $iLen2 = $vTemp
        $vTemp = $sTelNum1
        $sTelNum1 = $sTelNum2
        $sTelNum2 = $vTemp
    EndIf

    If $iLen1 <= $iMinMatch Then Return False ; insufficient information
    If StringRight($sTelNum1, $iMinMatch) <> StringRight($sTelNum2, $iMinMatch) Then Return False ; minimum match failed

    $sTelNum1 = StringReverse($sTelNum1) ; to simplify parsing later
    $sTelNum2 = StringReverse($sTelNum2) ; ditto

    ; the algorithm [international dialing codes all begin with zero]
    Local $sDigit1, $sDigit2
    For $i = $iMinMatch + 1 To $iLen1
        $sDigit1 = StringMid($sTelNum1, $i, 1)
        $sDigit2 = StringMid($sTelNum2, $i, 1)
        If $sDigit1 <> $sDigit2 Then ; let's find out why
            Local $iOffSet = $iLen2 - $iLen1
            If $i = $iLen1 Then ; we have reached the first digit
                ; test the first single digit omission theory with country codes (reversed)
                ; maybe omitted in $sTelNum2 or different international dialing code
                Return (StringRegExp($sDigit1, '[078]') And StringRegExp(StringRight($sTelNum2, $iOffSet + 1), '(\d){1,3}(00|1100|010|110)?'))
            Else ; odd exceptions or differences in international dialing codes (reversed)
                If $i = $iLen1 - 1 Then ; we have reached the penultimate digit
                    Local $sSub = StringRight($sTelNum1, 2)
                    If $sSub = '12' And StringRegExp(StringRight($sTelNum2, $iOffSet + 2), '(653)(00|1100|010|110)?') Then Return True ; Malta +356
                    If $sSub = '22' And StringRegExp(StringRight($sTelNum2, $iOffSet + 2), '(132)(00|1100|010|110)?') Then Return True ; Liberia +231
                EndIf
                ; check Latin American exceptions
                If StringRegExp($sTelNum2, '(75)(00|1100|010|110)?\z') And $i > 7 Then Return True ; Colombia +57
                If StringRegExp($sTelNum2, '(45|55)(00|1100|010|110)?\z') And $i > 8 Then Return True ; Argentina +54, Brazil +55
                If StringRegExp($sTelNum2, '(25)(00|1100|010|110)?\z') And $i > 10 Then Return True ; Mexico +52
                ; check for international dialing code discrepancies (reversed)
                If StringRegExp($sDigit1, '[01]') Or StringRegExp($sDigit2, '[01]') Then
                    $sTelNum1 = StringRight($sTelNum1, $iLen1 - $i)
                    $sTelNum2 = StringRight($sTelNum2, $iLen2 - $i)
                    Return (StringRegExp($sTelNum1, '\A(0|00|10|100)\z') And StringRegExp($sTelNum2, '\A(0|00|10|100)\z'))
                EndIf
            EndIf
        EndIf
    Next
    Return True
EndFunc ;==>TelCompare
```

The code is based on the information I posted earlier (which may be subject to change): http://www.onesimcard.com/how-to-dial/

I haven't thoroughly tested it yet. It shouldn't miss any possible matches now.

Edit: Although it works for the examples given, this code is based on some false assumptions.

Edited August 31, 2016 by czardas: removed 1 line of code + small modification.
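The full TelCompare logic depends on country-specific exceptions, but the idea it starts from (strip the delimiters, then compare the numbers from the right) can be shown on its own. The Python sketch below is a deliberately simplified heuristic, not czardas's algorithm, and Python is used only because it is easy to test. It treats two numbers as a potential match when they share at least `min_tail` trailing digits; the default of 7 is an assumption loosely based on the "last 7 or 8 digits remain unchanged" remark earlier in the thread. Everything in front of the shared tail (trunk, international, country or area codes) is ignored, so collisions between countries go undetected.

```python
import re

# Formatting characters to drop; the same set TelCompare strips
# (spaces, '+', parentheses and hyphens).
_DELIMITERS = re.compile(r"[ +()\-]")


def digits_only(number: str) -> str:
    """Remove common formatting characters from a phone number."""
    return _DELIMITERS.sub("", number)


def common_tail_length(a: str, b: str) -> int:
    """Count how many trailing characters two strings have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def suffix_match(num1: str, num2: str, min_tail: int = 7) -> bool:
    """Simplified heuristic (an assumption, not TelCompare): report a
    potential match when the numbers share at least `min_tail` trailing
    digits. Leading prefixes are not checked, so false positives are
    expected."""
    a, b = digits_only(num1), digits_only(num2)
    if not a or not b:
        return False
    if a == b:
        return True
    return common_tail_length(a, b) >= min_tail


if __name__ == "__main__":
    # The three pairs from czardas's MsgBox examples.
    print(suffix_match("21 12345678", "0011 356 12345678"))
    print(suffix_match("0 22 15 12345678", "010 54 9 22 12345678"))
    print(suffix_match("06 12345678", "+36 12345678"))
```

All three example pairs share at least their last eight digits, so each call reports a potential match. Unlike TelCompare, the sketch makes no attempt to check that the differing leading digits actually look like dialling prefixes.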
pluto41, posted August 28, 2016

```autoit
; Input check for valid phone numbers
; Documentation about phone number conventions: https://en.wikipedia.org/wiki/National_conventions_for_writing_telephone_numbers
#include <array.au3>
Opt('MustDeclareVars', 1)

Local $findnumb = _
        ['882 8565', '123 8762', '7543010', '07843 543287', '00441619346534', '+44208', '0015417543012']
Local $strPhoneNumber, $intPhoneNumberLength

For $i = 0 To UBound($findnumb) - 1
    ; keep digits only
    ; * don't remove any starting 0 digit. They are used in some City Codes *
    $strPhoneNumber = StringRegExpReplace($findnumb[$i], "[^0-9]", "")
    $intPhoneNumberLength = StringLen($strPhoneNumber)
    Switch $intPhoneNumberLength
        Case 12 ; to 15 [a maximum of 15 digits is reserved for use; not sure if any country currently uses numbers longer than 12 digits]
            ConsoleWrite($strPhoneNumber & " [Valid Telephone Number with Country and City code]" & @CRLF)
        Case 10 ; Twenty-four countries and territories share the North American Numbering Plan (NANP), with a single country code.
            ; It is a closed telephone numbering plan in which all telephone numbers consist of 10 digits,
            ; with the first three digits representing the area code
            ConsoleWrite($strPhoneNumber & " [Valid Telephone Number with City Code]" & @CRLF)
        ;Case 9 ; Belgian telephone numbers: land lines are always 9 digits long
        ;    ConsoleWrite($strPhoneNumber & " [Valid Belgian Telephone Number With City Code]" & @CRLF)
        ;Case 8 ; Danish telephone numbers are eight digits long
        ;    ConsoleWrite($strPhoneNumber & " [Valid Danish Telephone Number]" & @CRLF)
        Case 7 ; 7-digit numbers: most codes retain these rules today; in these areas, phone numbers continue to be written as 7-digit numbers
            ConsoleWrite($strPhoneNumber & " [Valid Local Telephone Number]" & @CRLF)
        ;Case 6 ; Hungary: the standard length for area codes is two; subscribers' numbers are six digits long
        ;    ConsoleWrite($strPhoneNumber & " [Valid Hungary Local Telephone Number]" & @CRLF)
        Case 3
            ConsoleWrite($strPhoneNumber & " [Valid Service Number]" & @CRLF)
        Case Else
            ConsoleWrite($strPhoneNumber & " [Invalid Phone Number]" & @CRLF)
    EndSwitch
Next
```

Nice to see all the good ideas you people come up with. Although I don't have much programming time, the whole idea of phone number checking sounds great to me. This morning I thought about it a little more and concluded that, since we don't have a UDF with all country and city codes and conventions, it's perhaps best to stick with some simple string length checking.
czardas (Author), posted August 28, 2016

pluto41 said (2 hours ago): "Nice to see all the good ideas you people come up with."

I absolutely agree. This is one of five search algorithms I am implementing in a program of mine. The others are eBay-type (word sequence) searches and string-type (exact or partial string) searches. For my own purposes, these combined search algorithms will find anything I might ever type, and this one is the final piece of the puzzle.

I'm not taking part in the challenge, but I'll leave it open a while for anyone else who wants to have a try. Your comments and ideas are invaluable to me and often quite entertaining. Those who wanted, or expected, a foolproof solution are naturally going to be disappointed. The challenge is to find the best approach, nothing more and nothing less. I'll look at every entry and ask someone to pick what they think is the most inventive solution. Failing that, I'll pick one myself.

The number of times I write phone numbers on scraps of paper is so annoying. People also give me numbers all the time. Now I can clear my drawer full of scrap paper without duplicating anything I might already have logged, and without worrying about number format (however rough and ready the solution might be). It should be borne in mind that 99.9% of the world's population do not have a systematic way to type phone numbers, nor do they even know what a regular expression is.

Edited August 28, 2016 by czardas
czardas (Author), posted August 28, 2016

@pluto41 I didn't check every result from your last post; however, these are valid number formats:

07843543287 [Invalid Phone Number] ==> Actually, this looks like a UK mobile number.
00441619346534 [Invalid Phone Number] ==> Actually, this looks like someone calling the UK from within Europe.

Edited August 28, 2016 by czardas
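Both readings above only work once a UK context is assumed. The Python sketch below (Python chosen only because it is easy to test; the function name and its defaults are hypothetical) makes that assumption explicit: it rewrites a number into a "+<country code>..." form, assuming the writer dials from the UK (country code 44, trunk prefix 0) and uses 00 as the international access code, which is common in Europe but not universal. Numbers with neither prefix are left as bare digits, because their area code can't be recovered.

```python
import re


def to_international(number: str, home_cc: str = "44",
                     intl_prefix: str = "00", trunk_prefix: str = "0") -> str:
    """Rewrite a dialled number into '+<country code><number>' form.

    The defaults assume a UK writer (country code 44, international
    access code 00, trunk prefix 0). That is an assumption about who
    wrote the number, not something the digits themselves reveal.
    """
    digits = re.sub(r"\D", "", number)
    if number.strip().startswith("+"):
        return "+" + digits                      # already international
    if digits.startswith(intl_prefix):           # checked before the trunk prefix,
        return "+" + digits[len(intl_prefix):]   # since '00' also starts with '0'
    if digits.startswith(trunk_prefix):
        return "+" + home_cc + digits[len(trunk_prefix):]  # national format
    return digits  # local number: area code unknown, left as-is


if __name__ == "__main__":
    print(to_international("07843 543287"))    # UK mobile in national format
    print(to_international("00441619346534"))  # UK number dialled from abroad
    print(to_international("0161 934 6534"))   # same number, national format
```

Under these assumptions, "00441619346534" and "0161 934 6534" normalise to the same string, which is the kind of match czardas describes. For a writer in the US or Canada, you would instead pass home_cc="1", intl_prefix="011" and trunk_prefix="1". Choosing those defaults is exactly the information jchd argues is missing from raw data.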
pluto41, posted August 28, 2016

That is correct, @czardas: the Switch statement I wrote is incomplete. It was only meant to show another approach to number checking, following the KIS principle (Keep It Simple). That's also why I commented out some lines. If I were to put that code into production, I think I would include all the country rules in the Switch statement.

When production requirements are really high, I would (personally) create an array for every country and every city that exists. As I live in the Netherlands, I would start with some arrays like this:

Netherlands = +31
CityName1 = 051
CityName2 = 038
...
LandLineLength = 8
MobileNumberPrefixLength1 = 06
MobileNumberSuffix = 8 chars

This then has to be done consistently for every country and city, including all exceptions [a hell of a job]. That's exactly why I thought KIS and wrote some example code in that direction. Again, it's merely an approach, and I think the right way to go depends on the requirements.
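pluto41's per-country record could be stored as data instead of as extra branches in the Switch. Below is a hypothetical Python sketch (Python 3.9+; the names CountryRules, NETHERLANDS and classify_national are invented for illustration). It carries only the Netherlands values from the post: country code 31, the placeholder area codes 051 and 038 for CityName1 and CityName2, and mobile numbers written as the prefix 06 followed by 8 digits. Treat these as placeholders, not a verified Dutch numbering plan. LandLineLength = 8 is left out because the post doesn't say whether that length includes the area code.

```python
from dataclasses import dataclass, field


@dataclass
class CountryRules:
    """Hypothetical per-country record along the lines pluto41 describes."""
    name: str
    country_code: str
    area_codes: dict[str, str] = field(default_factory=dict)  # prefix -> place
    mobile_prefix: str = ""
    mobile_suffix_len: int = 0


# Placeholder values copied from pluto41's example, not a verified plan.
NETHERLANDS = CountryRules(
    name="Netherlands",
    country_code="31",
    area_codes={"051": "CityName1", "038": "CityName2"},
    mobile_prefix="06",
    mobile_suffix_len=8,
)


def classify_national(digits: str, rules: CountryRules) -> str:
    """Classify a digits-only number written in national format."""
    if (rules.mobile_prefix
            and digits.startswith(rules.mobile_prefix)
            and len(digits) == len(rules.mobile_prefix) + rules.mobile_suffix_len):
        return "mobile"
    # Landline check is prefix-only: no length rule is applied (see lead-in).
    for prefix, place in rules.area_codes.items():
        if digits.startswith(prefix):
            return f"landline ({place})"
    return "unknown"
```

Adding a country then means adding one record rather than touching the classification logic. That is the "hell of a job" moved into data, not removed. country_code is carried for completeness; it would only be needed once numbers in international form are handled.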
czardas (Author), posted August 28, 2016

Uitstekend! ("Excellent!" in Dutch.) I had to use Google to check my spelling. LOL

Checking validity is interesting. I believe you will find a window of ambiguity: a certain range (number of digits) where the uncertainty can't be eliminated. The question is: how large is that gap? Mexican mobile and landline numbers contain at least 10 digits (so I believe). I think validity checking is a much tougher challenge.

Edited August 28, 2016 by czardas
stamandster, posted August 28, 2016

There seem to be two lines of reasoning for finding telephone numbers. The first is to find only exact matches of the reference numbers, or to match them within a larger number (even if the reference numbers are mistyped or incorrect). The second is to define every type of phone number via rules and a table of known area, national, regional, etc. calling codes, without looking for the reference numbers at all.

Based on the original request of the challenge, I've adjusted my code to find all the reference numbers and, as I understood it, to find the most similar matches up to a certain similarity limit (the whole "can have false positives" part).

One issue I noticed is that the search for +44208... doesn't match ANYTHING unless you suppose that the numbers 08000225649 and 08457128276 are supposed to have +442 added to the beginning.

I also added a piece that matches the original reference number against the leftmost characters of the database numbers. It removes characters from the front of the reference number one at a time, trying to match the beginning of the database number. The reference number must be shorter than a certain length (7 digits in the code below) for this step to run. I may do the same in reverse, but haven't found it necessary yet.

```autoit
#include <Array.au3>
#include "typos.au3"

#cs
looking for
882 8565
123 8762
7543010
07843 543287
00441619346534
+44208.....missing numbers [optional task]
44208
0800275002 ; too short, japan local?
08000225649 ; 11 chars
08457128276 ; 11 chars
0015417543012
#ce

Global $refNumT
Local $aArray = _
        ['+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Local $findnumb = _
        ['882 8565', '123 8762', '7543010', '07843 543287', '00441619346534', '+44208.....missing numbers [optional task]', '0015417543012']

ConsoleWrite('---------------------------------------------------------------------------------------------------------------------' & @CRLF)
ConsoleWrite('---------------------------------------------------------------------------------------------------------------------' & @CRLF)
For $i = 0 To UBound($findnumb) - 1 ; find these numbers!
    $reference = StringRegExpReplace($findnumb[$i], "[^0-9]", "") ; sanitize numbers
    For $a = 0 To UBound($aArray) - 1
        $dbnumbers = StringRegExpReplace($aArray[$a], "[^0-9]", "") ; sanitize numbers
        $refNumT = $reference
        If $reference = $dbnumbers Then ; exact match
            ConsoleWrite('> Reference Phone Number --] ' & $findnumb[$i] & ' [--' & @CRLF)
            ConsoleWrite('+> ^ Exact Match to --] ' & $aArray[$a] & ' [-- row ' & $a & @CRLF)
        EndIf
        If StringLen($reference) < 7 Then ; find partial match at the beginning of the database number
            Do
                If StringLeft($dbnumbers, StringLen($refNumT)) = $refNumT Then
                    ConsoleWrite('> Reference Phone Number --] ' & $findnumb[$i] & ' [-- using last ' & StringLen($refNumT) & ' digits' & @CRLF)
                    ConsoleWrite('+> ^ Partial Match, matching first ' & StringLen($refNumT) & ' numbers of --] ' & $aArray[$a] & ' [-- row ' & $a & @CRLF)
                EndIf
                $refNumT = StringTrimLeft($refNumT, 1) ; drop the leftmost digit and try again
            Until StringLen($refNumT) <= 1 Or StringLen($dbnumbers) = 10 ; "<=" rather than "=" so a 0- or 1-digit reference cannot loop forever
        EndIf
        If StringInStr($dbnumbers, $reference) Then ; find partial match within the database number
            ConsoleWrite('> Reference Phone Number --] ' & $findnumb[$i] & ' [--' & @CRLF)
            ConsoleWrite('+> ^ Partial Match, within larger number --] ' & $aArray[$a] & ' [-- row ' & $a & @CRLF)
            ;ContinueLoop
        EndIf
        $typos = _Typos($dbnumbers, $reference) ; find similar numbers based on limits
        $stringlen = StringLen($dbnumbers) / StringLen($reference)
        $similarity = StringLeft(100 - ($stringlen * $typos), 6)
        If $similarity > 97.5 Then
            ConsoleWrite('> Reference Phone Number --] ' & $findnumb[$i] & ' [--' & @CRLF)
            ConsoleWrite('+> ^ Similarity Match, ' & $similarity & '% similar to number --] ' & $aArray[$a] & ' [-- row ' & $a & @CRLF)
        EndIf
    Next
Next
ConsoleWrite('---------------------------------------------------------------------------------------------------------------------' & @CRLF)
ConsoleWrite('---------------------------------------------------------------------------------------------------------------------' & @CRLF)
```

My output looks like this:

```
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
> Reference Phone Number --] 882 8565 [--
+> ^ Partial Match, within larger number --] (01) 882 8565 [-- row 16
> Reference Phone Number --] 882 8565 [--
+> ^ Partial Match, within larger number --] +44 207 882 8565 [-- row 32
> Reference Phone Number --] 7543010 [--
+> ^ Partial Match, within larger number --] +441347543010 [-- row 26
> Reference Phone Number --] 00441619346534 [--
+> ^ Similarity Match, 99% similar to number --] 00441619346434 [-- row 17
> Reference Phone Number --] 00441619346534 [--
+> ^ Similarity Match, 97.642% similar to number --] 0161 934 6534 [-- row 30
> Reference Phone Number --] +44208.....missing numbers [optional task] [-- using last 2 digits
+> ^ Partial Match, matching first 2 numbers of --] 08457 128276 [-- row 5
> Reference Phone Number --] +44208.....missing numbers [optional task] [-- using last 2 digits
+> ^ Partial Match, matching first 2 numbers of --] 0-800-022-5649 [-- row 8
> Reference Phone Number --] 0015417543012 [--
+> ^ Similarity Match, 98.307% similar to number --] +1-541-754-3012 [-- row 25
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
```

Edited August 29, 2016 by stamandster
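stamandster's loop runs several independent tests on each pair of numbers: exact, contained within a larger number, and similar. Below is a rough Python rendering of that tiering for anyone who wants to experiment outside AutoIt. It differs deliberately in a few ways. difflib.SequenceMatcher.ratio() from the Python standard library stands in for _Typos from typos.au3, so its scores are not comparable with the percentages in the output above. The 0.9 threshold is an arbitrary assumption. Each entry is reported at most once, under the strongest kind of match. The prefix-trimming step is left out.

```python
import re
from difflib import SequenceMatcher


def digits(s: str) -> str:
    """Keep only the digits, like StringRegExpReplace(..., "[^0-9]", "")."""
    return re.sub(r"\D", "", s)


def find_matches(query: str, database: list[str], threshold: float = 0.9):
    """Yield (row, kind, entry) for each database entry that equals the
    query, contains it, or has a SequenceMatcher ratio >= threshold
    (a stand-in for _Typos, not the same measure)."""
    q = digits(query)
    if not q:
        return
    for row, entry in enumerate(database):
        d = digits(entry)
        if d == q:
            yield row, "exact", entry
        elif q in d:
            yield row, "within larger number", entry
        elif SequenceMatcher(None, q, d).ratio() >= threshold:
            yield row, "similar", entry


if __name__ == "__main__":
    sample = ['(01) 882 8565', '+44 207 882 8565', '00441619346434',
              '0161 934 6534', '+1-541-754-3012', '991']
    for query in ['882 8565', '00441619346534', '991']:
        for row, kind, entry in find_matches(query, sample):
            print(f"{query} -> {kind}: {entry} (row {row})")
```

On this small sample, '00441619346534' is reported only against '00441619346434' (ratio about 0.93). '0161 934 6534' scores 0.88 and falls just below the cutoff, which shows how much the result depends on the chosen threshold.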
czardas (Author), posted August 28, 2016

I'll leave this open until Thursday, UK time. That gives three more days for any late entries. Perhaps you can come up with a new approach or improve on the ideas put forward already. One thing is certain: there are some talented individuals around here, and so far the discussion has been valuable in many ways. I am constantly learning new things from you people!