AutID Posted January 1, 2015

As the title says, I am trying to find a pattern that gets the URLs out of different strings. I want to capture everything from the protocol, which is http in most cases, to the suffix of the domain name. The tricky part is that URLs use many different domain suffixes.

At first I wrote something like this:

    #include <Array.au3>

    Local $sUrl = "http://www.google.com/(random expanded link)"
    Local $aArray = StringRegExp($sUrl, '(?i)http://(.*?).com', 2)
    If Not @error Then
        _ArrayDisplay($aArray)
    EndIf

However, if the suffix is something other than .com, for example .net or .org, this pattern fails. I then thought of building an array of the most popular suffixes and looping over it to get all the domain names, but that would take a lot of code that better regex skills could avoid.

Finally I came up with this pattern, but I am not 100% sure that it captures everything, and sometimes I get weird results:

    $pattern = "(?<Protocol>\w+):\/\/(?<Domain>[\w.]+\/?)\S*"

Does anyone have any better ideas?

Edit: Hmm, I tried this simple pattern and it seems to work pretty well:

    Local $aArray = StringRegExp($sUrl, '(?i)http://(.*?)/', 2)

Either I'm very tired or it was very simple. Any opinions?

Edited January 1, 2015 by AutID

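One possible source of the weird results is the unescaped dot in the first pattern. The sketch below compares the two simple patterns from the post on a made-up URL; the test URL and the variable names `$aLoose` and `$aSlash` are assumptions for illustration, not part of the original post.

```autoit
; Hypothetical test URL (not from the thread): a .net host that happens to contain "com".
Local $sUrl = "http://www.telecom.net/page"

; The unescaped "." matches any character, so ".com" also matches "ecom" inside "telecom".
Local $aLoose = StringRegExp($sUrl, '(?i)http://(.*?).com', 2)
If Not @error Then ConsoleWrite("Loose pattern captured: " & $aLoose[1] & @CRLF)

; Stopping at the first "/" does not depend on the suffix at all.
Local $aSlash = StringRegExp($sUrl, '(?i)http://(.*?)/', 2)
If Not @error Then ConsoleWrite("Slash pattern captured: " & $aSlash[1] & @CRLF)
```

With flag 2, element 0 holds the full match and element 1 the first group, so the first call should report `www.tel` and the second `www.telecom.net`. Note that the slash version needs a `/` after the host, so a bare `http://www.server.com` would not match it.
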
mikell Posted January 1, 2015

    '(?i)http://([^/]+)'

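A minimal sketch of how the pattern above might be wrapped for reuse. The helper name `_GetHost` and the sample URLs are hypothetical, and the added `s?` (so that https also matches) is an assumption that goes beyond the posted pattern.

```autoit
; Hypothetical helper (not from the thread): returns the text between "://" and the
; next "/", or an empty string with @error = 1 when the pattern does not match.
Func _GetHost($sUrl)
    ; Same idea as the pattern above; "s?" is an added assumption to accept https.
    Local $aMatch = StringRegExp($sUrl, '(?i)https?://([^/]+)', 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0] ; with flag 1, element 0 is the first capturing group
EndFunc

ConsoleWrite(_GetHost("http://www.google.com/some/path") & @CRLF)
ConsoleWrite(_GetHost("https://example.org") & @CRLF)
```

Because `[^/]+` only stops at a slash, the captured text still includes any port, user name, or query string that comes before the first `/`, so this fits simple URLs best.
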
jguinch Posted January 2, 2015

For more complex URLs, you can use a regex like this:

    Local $aUrl[9] = ["http://server:12345/path/blabla", _
            "http://server.com:1234/path?query_string#fragment_id", _
            "ftp://user:password@server:1234/path", _
            "ftp://user@server:1234/path", _
            "http://www.server.com", _
            "www.server.com/path", _
            "server.com", _
            "http://user@server.com:1234/path?query_string#fragment_id", _
            "user@server.com:1234"]

    Local $sPattern = "^(?i)(?:(?:[a-z]+):\/\/)?" & _ ; Protocol
            "(?:(?:(?:[^@:]+))" & _ ; Username
            "(?::(?:[^@]+))?@)?" & _ ; Password
            "([^\/:]+)" & _ ; Host
            "(?::(?:\d+))?" & _ ; Port
            "(?:\/(?:[^?]+)?)?" & _ ; Path
            "(?:\?\N+)?" ; Query

    Local $aHost
    For $i = 0 To UBound($aUrl) - 1
        ; Flag 1 returns an array of the captured groups; only the host is captured.
        $aHost = StringRegExp($aUrl[$i], $sPattern, 1)
        If @error Then ContinueLoop ; no match, so $aHost is not an array
        ConsoleWrite($aHost[0] & @TAB & $aUrl[$i] & @CRLF)
    Next

https://regex101.com/r/yB3dO1/1

Edited January 2, 2015 by jguinch

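The pattern above is anchored with `^`, so it handles one URL per string. For the original goal of pulling hosts out of a longer piece of text, a rough sketch using flag 3 (global match) could look like the following. The sample text and the simplified pattern are assumptions for illustration, and the pattern requires an explicit `scheme://` prefix, unlike the one above.

```autoit
; Hypothetical sample text, not from the thread.
Local $sText = "See http://www.google.com/search?q=x and https://example.org/page, " & _
        "or ftp://user:password@files.example.net:21/pub"

; scheme://, optional user info ending in "@", then the host up to "/", ":", "?", "#" or whitespace.
Local $sPattern = '(?i)\b[a-z][a-z0-9+.-]*://(?:[^@/\s]+@)?([^/:\s?#]+)'

; Flag 3 returns every match of the capturing group in one flat array.
Local $aHosts = StringRegExp($sText, $sPattern, 3)
If @error Then
    ConsoleWrite("No URLs found" & @CRLF)
Else
    For $i = 0 To UBound($aHosts) - 1
        ConsoleWrite($aHosts[$i] & @CRLF)
    Next
EndIf
```

For the sample text this should list `www.google.com`, `example.org`, and `files.example.net`. Bare hosts such as `www.server.com/path` are deliberately not matched here, since without a scheme almost any word in free text would qualify.
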
SmOke_N (Moderator) Posted January 2, 2015

Nice, jguinch, but you might want to make it case insensitive.

jguinch Posted January 2, 2015

You are right, SmOke_N... edited.