Jos (Developers), posted November 3, 2018

Try this regex line:

    Local $aResult = StringRegExp($sData, "https?:\/\/(?:www.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)

Jos

youtuber (Author), posted November 3, 2018 (edited)

11 minutes ago, Jos said:
    Try this regex line:
    Local $aResult = StringRegExp($sData, "https?:\/\/(?:www.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)

Yes, this is what I want. This works great, thanks.

Edit: but what about something like this, where the extension changes: https://www.autoit.script.com.us

iamtheky, posted November 3, 2018

This is essentially a string-between operation, but I have to drop a non-regex way:

    #include <Array.au3>
    #include <Constants.au3>

    example()

    Func example()
        Local $aData = [ _
            "http://autoit.script.com/blabla1/", _
            "http://autoit-script.com/blabla2/blabla/", _
            "http://autoitscript.com/bla-bla-bla3/", _
            "http://autoit%20script.com", _
            "http://autoit_script.com/", _
            "https://www.autoit.script.com.us" _
        ]

        For $each In $aData
            ; Splitting on "/" gives "http:", "", host, ... so element [2] is the host
            Local $sHost = StringSplit($each, "/", 2)[2]
            ; Find the last dot within the host and trim it and everything after it
            MsgBox(0, '', StringTrimRight($sHost, StringLen($sHost) - StringInStr($sHost, ".", 0, -1) + 1))
        Next
    EndFunc

TheXman, posted November 3, 2018 (edited November 4, 2018 by TheXman: corrected first regex)

I'm no longer sure what you want. Is it what your title says, which is the first word after the "//", or is it the host name excluding the top-level domain? The script below does both:

    #include <Array.au3>
    #include <Constants.au3>

    example()

    Func example()
        Local $aResult
        Local $aData = [ _
            "http://host_no_tld/blah", _
            "http://autoit.script.com/blabla1/", _
            "http://www.autoit-script.com/blabla2/blabla/", _
            "http://autoitscript.com/bla-bla-bla3/", _
            "http://autoit%20script.com", _
            "http://autoit%20script.scripts.com", _
            "http://autoit_script.com/", _
            "https://www.autoit.script.com.us" _
        ]
        Local $sData = _ArrayToString($aData, @CRLF)

        ; Host name excluding top level domain
        $aResult = StringRegExp($sData, "https?://(?:www\.)?([-.\w~%]+)(?:\.[-.\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH)
        If IsArray($aResult) Then _ArrayDisplay($aResult, "host name excluding top level domain")

        ; Only 1st word
        $aResult = StringRegExp($sData, "https?://(?:www\.)?([-\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH)
        If IsArray($aResult) Then _ArrayDisplay($aResult, "Only 1st word")
    EndFunc

mikell, posted November 4, 2018

9 hours ago, youtuber said:
    I wonder why it doesn't match here

It's because, with these test strings, the expression is not correct. The trailing slash must be optional:

    \/\/(.+)\.\w+\/?

mikell, posted November 4, 2018

7 hours ago, TheXman said:
    I'm no longer sure what you want.

No way you can ever be sure, IMHO.

youtuber (Author), posted November 4, 2018

10 hours ago, TheXman said:
    I'm no longer sure what you want. Is it what your title says, which is the first word after the "//" or is it the host name excluding the top level domain? Below has both:

I changed the subject title. The regex pattern I want is like these:

    https?:\/\/(?:www.)?(.+)\.\w+\/?

and

    https?:\/\/(?:www.)?(.*)\.\w*[\/]*

If the URL structure changes, as in https://www.autoit.script.com.us, I would like the output to be:

    autoit.script

Jos (Developers), posted November 4, 2018 (edited)

4 minutes ago, youtuber said:
    If the URL structure changes, as in https://www.autoit.script.com.us, I would like the output to be autoit.script

So you are expecting miracles? The previously posted regex simply strips the last ".xxx", and when there are suffixes with a dot inside, you will have to hardcode them. One option could be to create an array with all possible hardcoded domain suffixes and use that to strip the end of the domain name.

Jos

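Editor's sketch of the hardcoded-suffix idea described above, under stated assumptions: `_StripKnownSuffix` and `$aSuffixes` are hypothetical names that do not come from the thread, and the suffix list is a tiny illustrative sample (".com.us" is taken from the thread's example URL), not a complete list. It is one possible way to do it, not a definitive implementation.

```autoit
; Sketch only: _StripKnownSuffix and $aSuffixes are hypothetical names,
; and the suffix list is a small illustrative sample, not a complete list.
Func _StripKnownSuffix($sUrl)
    ; Longest suffixes first, so ".com.us" is tried before ".com" or ".us"
    Local $aSuffixes = [".com.us", ".co.uk", ".com", ".net", ".org", ".us"]

    ; Capture the host: everything after "//" (and an optional "www.") up to the next "/"
    Local $aHost = StringRegExp($sUrl, "https?://(?:www\.)?([^/]+)", 1)
    If @error Then Return SetError(1, 0, "")

    Local $sHost = $aHost[0]
    For $sSuffix In $aSuffixes
        ; Strip the first suffix that the host ends with
        If StringRight($sHost, StringLen($sSuffix)) = $sSuffix Then
            Return StringTrimRight($sHost, StringLen($sSuffix))
        EndIf
    Next
    Return $sHost ; no known suffix: return the host unchanged
EndFunc

ConsoleWrite(_StripKnownSuffix("https://www.autoit.script.com.us") & @CRLF)
ConsoleWrite(_StripKnownSuffix("http://autoit-script.com/blabla2/blabla/") & @CRLF)
```

Ordering the array from longest to shortest suffix is the design choice that matters here: with ".com" listed before ".com.us", the first URL would never reach the two-part suffix.
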
mikell, posted November 4, 2018

Regex is magic, but not that much.

iamtheky, posted November 4, 2018

3 hours ago, youtuber said:
    If the URL structure changes, as in https://www.autoit.script.com.us, I would like the output to be autoit.script

This changes the rules completely, both of the regex and of subdomaining in general. If you always want it to end at 'script', I would specify that in the expression. It's not a cool regex, but it seems to meet all criteria.

Jos (Developers), posted November 4, 2018

A complete public suffix domain list can be found here: https://publicsuffix.org/list/public_suffix_list.dat

So you can imagine the "can of worms" you are dabbling in.

Jos

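Editor's sketch of how a list in that file's format might be used, under stated assumptions: `_FindPublicSuffix` is a hypothetical helper, `$sSample` stands in for the downloaded file, and its rules are illustrative rather than quoted from the real list. The sketch handles plain rules only and skips the file's "//" comment lines, "*." wildcard rules, and "!" exception rules, so it is not a full implementation of the list's matching algorithm.

```autoit
; Sketch only: _FindPublicSuffix is a hypothetical helper, not part of any UDF.
; Returns the longest plain rule in $sListText that $sHost ends with, or "".
Func _FindPublicSuffix($sHost, $sListText)
    Local $sBest = ""
    Local $aLines = StringSplit(StringStripCR($sListText), @LF, 2) ; 2 = no count element
    For $sLine In $aLines
        Local $sRule = StringStripWS($sLine, 3) ; trim leading and trailing whitespace
        ; Skip blank lines, "//" comments, and wildcard/exception rules (not handled here)
        If $sRule = "" Or StringRegExp($sRule, "^(//|\*|!)") Then ContinueLoop
        ; A rule matches when the host ends with "." & rule; keep the longest match
        If (StringRight($sHost, StringLen($sRule) + 1) = ("." & $sRule)) And (StringLen($sRule) > StringLen($sBest)) Then
            $sBest = $sRule
        EndIf
    Next
    Return $sBest
EndFunc

; Illustrative sample in the list's format (assumed contents, not the real file)
Local $sSample = "// sample rules" & @LF & "com" & @LF & "us" & @LF & "com.us" & @LF & "*.ck" & @LF
Local $sHost = "autoit.script.com.us"
Local $sSuffix = _FindPublicSuffix($sHost, $sSample)
; Only trim when a suffix was found; the extra 1 removes the dot before it
If $sSuffix <> "" Then ConsoleWrite(StringTrimRight($sHost, StringLen($sSuffix) + 1) & @CRLF)
```

Picking the longest matching rule is what lets "com.us" win over "us" for the thread's example host.
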
youtuber (Author), posted November 4, 2018

If it can't really be solved with just a regex pattern, then this should be the only way:

    ; Known suffixes to strip; each dot is escaped so it matches a literal "."
    Local $pattern = '(\.com|\.net|\.org|\.info|\.biz|\.eu|\.fr|\.ch|\.kr|\.edu|\.us)(.*)'
    Local $aDomain = "https://www.autoit.script.com.us"
    ; Flag 3 returns an array of all captured groups
    Local $aRegex = StringRegExp($aDomain, "https?:\/\/(?:www\.)?(.+)\.\w+\/?", 3)
    If IsArray($aRegex) Then
        ConsoleWrite(StringRegExpReplace($aRegex[0], $pattern, '') & @CRLF)
    EndIf