Formatting phone numbers with regex


tecc

Since our telephone system sucks and can only dial numbers from Outlook contacts that nobody bothers to maintain, I could use a small script that lets me mark a number in a mail or on a webpage, copy it, and paste it into the phone client. Of course the strings need to be formatted properly, like this:

+49 (0) 1234 567890 -> 01234 567890
+49 1234 567890 -> 01234 567890
01234 567890 -> 01234 567890
+34 (0) 1234 567890 -> 0034 1234 567890
+41 1234 567890 -> 0041 1234 567890


The added wrinkle here is that every occurrence of +49 should be replaced by a single 0, while any other country code starting with a + should be replaced by two 0s followed by the code. Any ideas how to do this?

Edited by tecc

This works for all your examples. It's probably possible to come up with something more elegant, but it does the job:

Local $sFormatted = ''
Local $aPhoneNumbers[5]
$aPhoneNumbers[0] = "+49 (0) 1234 567890"
$aPhoneNumbers[1] = "+49 1234 567890"
$aPhoneNumbers[2] = "01234 567890"
$aPhoneNumbers[3] = "+34 (0) 1234 567890"
$aPhoneNumbers[4] = "+41 1234 567890"

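For $i = 0 To 4
    ; Pass 1: "+NN", optional "(0)", 4-digit area code, 6-digit number -> "00NN AAAA NNNNNN"
    $sFormatted = StringRegExpReplace($aPhoneNumbers[$i], '^\+([^4][^9]|[0-9]{2})[ 0)(]*([0-9]{4}) ?([0-9]{6})$', '00\1 \2 \3')
    ; Pass 2: German numbers get a single leading zero: "0049 AAAA ..." -> "0AAAA ..."
    $sFormatted = StringRegExpReplace($sFormatted, '^0049 (.*)$', '0\1')
    ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $sFormatted = ' & $sFormatted & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console
Next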

 

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook


Regex is amazing, but please note that you don't need regex to do that  :)

#Include <Array.au3>

Local $aPhoneNumbers[5]
$aPhoneNumbers[0] = "+49 (0) 1234 567890"
$aPhoneNumbers[1] = "+49 1234 567890"
$aPhoneNumbers[2] = "01234 567890"
$aPhoneNumbers[3] = "+34 (0) 1234 567890"
$aPhoneNumbers[4] = "+41 1234 567890"

For $i = 0 To 4
   $tmp = StringSplit($aPhoneNumbers[$i], " ")
   If $tmp[0] = 2 Then ContinueLoop
   If StringInStr($tmp[1], "49") Then 
       $aPhoneNumbers[$i] = "0" & $tmp[$tmp[0]-1] & " " & $tmp[$tmp[0]]
   Else
       $aPhoneNumbers[$i] = "00" & StringReplace($tmp[1], "+", "") & " " & $tmp[$tmp[0]-1] & " " & $tmp[$tmp[0]]
   EndIf
Next
_ArrayDisplay($aPhoneNumbers)

 


Creating edge cases to challenge myself: this one is for the case where the spaces are not reliable. And because nesting ternary statements is fun for everybody.

*I just have to figure out how to tuck that (0) string replace into the dirty one-liner :)

#Include <Array.au3>

Local $aPhoneNumbers[5]
$aPhoneNumbers[0] = "+49 (0) 1234 567 890"
$aPhoneNumbers[1] = "+49 1234 5678 90"
$aPhoneNumbers[2] = "01234 56789 0"
$aPhoneNumbers[3] = "+34 (0) 1234 5 67890"
$aPhoneNumbers[4] = "+41 1234 567 890"

For $i = 0 To 4
    $aPhoneNumbers[$i] = StringReplace($aPhoneNumbers[$i], "(0)", "")
    $aPhoneNumbers[$i] = StringLeft($aPhoneNumbers[$i], 1) = "+" ? StringMid($aPhoneNumbers[$i], 2, 2) = "49" ? StringReplace($aPhoneNumbers[$i], StringLeft($aPhoneNumbers[$i], 3), "0") : StringReplace($aPhoneNumbers[$i], StringLeft($aPhoneNumbers[$i], 1), "00") : $aPhoneNumbers[$i]
Next

_ArrayDisplay($aPhoneNumbers)

 



@mikell
Right, I just thought it would be more elegant and I'm also quite amazed at the seeming complexity of such an expression.

@Bowmore
Works like a charm, two things though:
- can you expand your expression to keep only digits?

- I've just encountered this pattern in a customer's signature which isn't handled yet by your expression:
  '+49 1234 567890' should also be formatted to '01234567890'

@all
I'm thinking about what's the best way to activate this function. Should I add a hotkey to Windows to fire up the compiled script or have the script resident in memory sleeping and waiting for the designated hotkey to be pressed to copy the number over to clipboard? I guess with today's CPU speeds polling for the hotkey won't tax resources too much?


11 hours ago, tecc said:

the seeming complexity of such an expression

It's complex because in the '49' cases you want to replace the prefix with a single leading zero, and in the other cases with two leading zeros. This requires 2 passes, as Bowmore did:

#Include <Array.au3>

Local $aPhoneNumbers[5]
$aPhoneNumbers[0] = "+49 (0) 1234 567890"
$aPhoneNumbers[1] = "+49 1234 567890"
$aPhoneNumbers[2] = "01234 567890"
$aPhoneNumbers[3] = "+34 (0) 1234 567890"
$aPhoneNumbers[4] = "+41 1234 567890"

For $i = 0 To 4
  $aPhoneNumbers[$i] = StringRegExpReplace($aPhoneNumbers[$i], '^(\+(\d\d)(\h*\(0\))?)', "00$2")
  $aPhoneNumbers[$i] = StringRegExpReplace($aPhoneNumbers[$i], '(^0\K049|\h*)', "")
Next
_ArrayDisplay($aPhoneNumbers)

Without this constraint, building a working one-liner would be quite easy.

Personally, to use such a script I would include ClipGet() and ClipPut(), so I could copy the raw number, run the script once, and then paste the formatted result.
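
A minimal sketch of that clipboard round trip, assuming the two-pass expressions from this post are used unchanged. `_FormatNumber` is a name made up for this illustration; ClipGet() and ClipPut() are the built-in clipboard functions.

```autoit
; Hypothetical helper: wraps the two StringRegExpReplace passes shown above.
Func _FormatNumber($sRaw)
    ; Pass 1: turn "+NN" plus an optional "(0)" into "00NN".
    Local $s = StringRegExpReplace($sRaw, '^(\+(\d\d)(\h*\(0\))?)', "00$2")
    ; Pass 2: collapse a leading "0049" to "0" and strip horizontal whitespace.
    Return StringRegExpReplace($s, '(^0\K049|\h*)', "")
EndFunc

Local $sClip = ClipGet()
If @error Then
    ; ClipGet() sets @error when the clipboard is empty or holds no text.
    MsgBox(16, "Dial helper", "No text found in the clipboard.")
    Exit
EndIf

; Write the formatted number back so it can be pasted into the phone client.
ClipPut(_FormatNumber($sClip))
```

Compiled and started from a shortcut, this runs once per number: copy, start the script, paste.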
 


Slightly modified version to output without spaces:

Local $sFormatted = ''
Local $aPhoneNumbers[6]
$aPhoneNumbers[0] = "+49 (0) 1234 567890"
$aPhoneNumbers[1] = "+49 1234 567890"
$aPhoneNumbers[2] = "01234 567890"
$aPhoneNumbers[3] = "+34 (0) 1234 567890"
$aPhoneNumbers[4] = "+41 1234 567890"
$aPhoneNumbers[5] = "+34 1234 567890"
For $i = 0 To 5
    $sFormatted = StringRegExpReplace(StringRegExpReplace($aPhoneNumbers[$i],'^\+([^4][^9]|[0-9]{2})[ 0)(]*([0-9]{4}) ?([0-9]{6})$','00\1\2\3'),'^(?:0049[0-9]|(0[^0])) ?([0-9]{3}) ?([0-9]{6})$','0\1\2\3')
    ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $sFormatted = ' & $sFormatted & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console
next


Maybe I'm pasting this wrong into my script (where I simply get one number from clipboard) but

+49 1234 567890 formats to 0234567890

and

+49 2345 55555 formats to +49 2345 55555

Maybe because of my preferred handling of +49 we actually need more than one expression as mikell suggested?


Hi tecc

My mistake, I was not looking at the output closely enough. Are you sure the last number, +49 2345 55555, is correct? The last group only has 5 digits. No matter, I have amended my regular expression to allow 5 or 6 digits in the last group.

Local $sFormatted = ''
Local $aPhoneNumbers[11]

$aPhoneNumbers[0] = "+49 (0) 1234 567890"
$aPhoneNumbers[1] = "+49 1234 567890"
$aPhoneNumbers[2] = "01234 567890"
$aPhoneNumbers[3] = "01234567890"
$aPhoneNumbers[4] = "+34 (0) 1234 567890"
$aPhoneNumbers[5] = "+34 1234 567890"
$aPhoneNumbers[6] = "+41 1234 567890"
$aPhoneNumbers[7] = "+411234567890"
$aPhoneNumbers[8] = "+41 1234567890"
$aPhoneNumbers[9] = "+49 1234 567890"
$aPhoneNumbers[10] = "+49 2345 55555"

For $i = 0 To 10
    $sFormatted = StringRegExpReplace(StringRegExpReplace($aPhoneNumbers[$i],'^(?:\+(\d{2}) ?(?:\(0\))? ?|(0))(\d{4}) ?(\d{5,6})$','00\1\2\3\4'),'^(?:0049|000)(\d+)$','0\1')
    ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $sFormatted = ' & $sFormatted & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console
next



 

Edited by Bowmore

 

18 hours ago, mikell said:

Did you try this ?

Well, you can make a one-liner with this, which includes the two SRER ... maybe more elegant ? blah  :rolleyes:

Yes, and I like it, so I tried it like this:

$n = ClipGet()
$n = StringRegExpReplace($n, '^(\s*\+(\d\d)(\h*\(0\))?)', "00$2")
$n = StringRegExpReplace($n, '(^0\K049|\s*)', "")
$n = StringRegExpReplace($n, '[^0-9]',"")

If StringRegExp($n, '^00\d\d0') = 1 Then
    $n = StringReplace($n,5,"",1,2)
EndIf

Msgbox(0,"", $n)
Exit


I've added those two lines because, while sorting my inbox in the meantime, I stumbled on two more formats being used in customers' signatures:

02134/55564-0
+41 01234 567890

So my first added line gets rid of all non-digits, and my If statement is supposed to delete the extra zero in a number like '0041 01234 567890'. Curiously enough, I have to replace it with another character or a space, because replacing it with "" just doesn't work. I could of course switch the order of those lines of code, but I wonder why that is.
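
A possible explanation, going by the AutoIt help for StringReplace: when the second argument is a number it is treated as a start position, not a search string, and the number of characters overwritten equals the length of the replacement string. With "" as the replacement nothing is overwritten, which would match what is described here. A small sketch that deletes the fifth character by slicing instead; the sample value is made up for illustration.

```autoit
Local $n = "004101234567890" ; sample value, digits only

; Cut the string around position 5 instead of overwriting that position.
If StringRegExp($n, '^00\d\d0') Then
    $n = StringLeft($n, 4) & StringMid($n, 6)
EndIf

ConsoleWrite($n & @CRLF) ; 00411234567890
```

A single `StringRegExpReplace($n, '^(00\d\d)0', '$1')` should do the same in one call.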
 

Edited by tecc

The addition of successive requirements is a typical feature of endless stories.
Regex hates this. Regex wants (needs) ALL requirements to be clearly defined from the beginning.
So please check ALL possible wrong formats before asking a question.

BTW, this is exactly why I first suggested a more versatile traditional (not all-regex) approach.
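
For illustration, a rough sketch of what such a mostly regex-free approach could look like. `_NormalizeNumber` is a hypothetical helper, not code from this thread, and it rests on two assumptions taken from the examples so far: country codes have exactly two digits, and a 0 directly after the country code is a trunk prefix to be dropped.

```autoit
; Hypothetical helper, not taken from the thread.
Func _NormalizeNumber($sRaw)
    Local $sTrimmed = StringStripWS($sRaw, 3)                 ; 3 = strip leading and trailing whitespace
    Local $bIntl = (StringLeft($sTrimmed, 1) = "+")           ; international notation?
    Local $sDigits = StringRegExpReplace($sTrimmed, "\D", "") ; keep digits only
    If Not $bIntl Then Return $sDigits                        ; national number: just the digits

    Local $sCountry = StringLeft($sDigits, 2)                 ; assumption: two-digit country code
    Local $sRest = StringMid($sDigits, 3)
    ; Drop one trunk zero; covers both "(0)" and "+41 01234 ...".
    If StringLeft($sRest, 1) = "0" Then $sRest = StringMid($sRest, 2)

    If $sCountry = "49" Then Return "0" & $sRest              ; German number: one leading zero
    Return "00" & $sCountry & $sRest                          ; everything else: 00 + country code
EndFunc

ConsoleWrite(_NormalizeNumber("+49 (0) 1234 567890") & @CRLF) ; 01234567890
ConsoleWrite(_NormalizeNumber("+41 01234 567890") & @CRLF)    ; 00411234567890
ConsoleWrite(_NormalizeNumber("02134/55564-0") & @CRLF)       ; 02134555640
```

Each rule sits on its own line, so a newly discovered format usually means touching one line rather than rewriting the whole pattern.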


Edit
for the moment, this one-liner still works  :blink:

#Include <Array.au3>

Local $aPhoneNumbers[] = [ _
    "+49 (0) 1234 567890", _
    "+49 1234 567890", _
    "+491234 567890", _
    "+49 2345 55555", _
    "+34 (0) 1234 567890", _
    "+34 1234 567890", _
    "+41 1234 567890", _
    "+411234567890", _
    "+41 1234567890", _
    "+41 01234 567890", _
    "01234 567890", _
    "01234567890", _
    "02134/55564-0" ]

For $i = 0 To UBound($aPhoneNumbers)-1
    $aPhoneNumbers[$i] = StringRegExpReplace("0" & $aPhoneNumbers[$i] , _ 
            '^(?:0\+49|0\d|(0)\+(\d\d))(?:\h*\(0\))?\D*0?(\d+)\D*(\d+)\D*', "0$1$2$3$4")
Next
_ArrayDisplay($aPhoneNumbers)

 

Edited by mikell

I get that and I appreciate everyone's help on this. I really didn't expect some more weird phone number formats to pop up and since this is my first time at coding/scripting stuff, it's a bit much. I had to read up on what an array is and I'm really glad your regex one-liner works. I'm still confused as to why my workaround

StringReplace($n,5,"",1,2)

didn't. I also haven't found a way to turn off caps-lock after I've used it as a hotkey to copy the number to clipboard like this:

HotKeySet("{CAPSLOCK}", "GetNumber")

While 1
    Sleep(100)
WEnd

Func GetNumber()
    $n = ClipGet()
    $n = StringRegExpReplace("0" & $n, _
            '^(?:0\+49|0\d|(0)\+(\d\d))(?:\h*\(0\))?\D*0?(\d+)\D*(\d+)\D*', "0$1$2$3$4")
    MsgBox(0, "", $n)
EndFunc
Exit

When I replace MsgBox with a Send call that should turn Caps Lock off, Caps Lock stays on and the LED remains lit.
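
One way to sidestep the Caps Lock state, rather than fix it, is to bind the function to a key combination that has no toggle state. A sketch under that assumption: Ctrl+Alt+D and Ctrl+Alt+Q are arbitrary picks, `Quit` is a helper added for illustration, and writing the result back with ClipPut() follows mikell's earlier suggestion. It does not explain why the LED stays lit.

```autoit
; Assumption: Ctrl+Alt+D and Ctrl+Alt+Q are not used by anything else on this machine.
HotKeySet("^!d", "GetNumber") ; ^ = Ctrl, ! = Alt
HotKeySet("^!q", "Quit")      ; hypothetical second hotkey to end the script

; Keep the script resident; the loop itself does no work.
While 1
    Sleep(100)
WEnd

Func GetNumber()
    Local $n = ClipGet()
    If @error Then Return ; clipboard empty or not text
    ; mikell's one-liner from above, unchanged.
    $n = StringRegExpReplace("0" & $n, _
            '^(?:0\+49|0\d|(0)\+(\d\d))(?:\h*\(0\))?\D*0?(\d+)\D*(\d+)\D*', "0$1$2$3$4")
    ClipPut($n) ; formatted number is ready to paste into the phone client
EndFunc

Func Quit()
    Exit
EndFunc
```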
