Jos (Developers) Posted November 3, 2018
Try this regex line:
    Local $aResult = StringRegExp($sData, "https?:\/\/(?:www.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)
Jos
SciTE4AutoIt3 Full installer Download page - Beta files Read before posting How to post scriptsource Forum etiquette Forum Rules
Live for the present, Dream of the future, Learn from the past.
youtuber (Author) Posted November 3, 2018 (edited)
On 11/3/2018 at 10:34 PM, Jos said:
    Try this regex line:
    Local $aResult = StringRegExp($sData, "https?:\/\/(?:www.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)
    Jos
Yes, this is what I want. It works great, thanks.
Edit: but what about a case like this, where the extension changes? https://www.autoit.script.com.us
Edited November 3, 2018 by youtuber
iamtheky Posted November 3, 2018
Essentially a string-between operation, but I have to drop in a non-regex way:
    #include <Array.au3>
    #include <Constants.au3>

    example()

    Func example()
        Local $aData = [ _
            "http://autoit.script.com/blabla1/", _
            "http://autoit-script.com/blabla2/blabla/", _
            "http://autoitscript.com/bla-bla-bla3/", _
            "http://autoit%20script.com", _
            "http://autoit_script.com/", _
            "https://www.autoit.script.com.us" _
        ]
        For $each In $aData
            MsgBox(0, '', StringTrimRight(StringSplit($each, "/", 2)[2], StringLen(StringSplit($each, "/", 2)[2]) - StringInStr($each, ".", 0, -1)))
        Next
    EndFunc
TheXman Posted November 3, 2018 (edited)
I'm no longer sure what you want. Is it what your title says, which is the first word after the "//", or is it the host name excluding the top level domain? Below has both:
    #include <Array.au3>
    #include <Constants.au3>

    example()

    Func example()
        Local $aResult
        Local $aData = [ _
            "http://host_no_tld/blah", _
            "http://autoit.script.com/blabla1/", _
            "http://www.autoit-script.com/blabla2/blabla/", _
            "http://autoitscript.com/bla-bla-bla3/", _
            "http://autoit%20script.com", _
            "http://autoit%20script.scripts.com", _
            "http://autoit_script.com/", _
            "https://www.autoit.script.com.us" _
        ]
        Local $sData = _ArrayToString($aData, @CRLF)

        ; Host name excluding top level domain
        $aResult = StringRegExp($sData, "https?://(?:www\.)?([-.\w~%]+)(?:\.[-.\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH)
        If IsArray($aResult) Then _ArrayDisplay($aResult, "host name excluding top level domain")

        ; Only 1st word
        $aResult = StringRegExp($sData, "https?://(?:www\.)?([-\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH)
        If IsArray($aResult) Then _ArrayDisplay($aResult, "Only 1st word")
    EndFunc
Edited November 4, 2018 by TheXman: Corrected first regex
CryptoNG UDF: Cryptography API: Next Gen | jq UDF: Powerful and Flexible JSON Processor | jqPlayground: An Interactive JSON Processor | Xml2Json UDF: Transform XML to JSON | HttpApi UDF: HTTP Server API | Roku Remote: Example Script
About Me | How To Ask Good Questions On Technical And Scientific Forums (Detailed) | How to Ask Good Technical Questions (Brief)
"Any fool can know. The point is to understand." -Albert Einstein
"If you think you're a big fish, it's probably because you only swim in small ponds." ~TheXman
mikell Posted November 4, 2018
On 11/3/2018 at 10:18 PM, youtuber said:
    I wonder why it doesn't match here
It's because, with these test strings, the expression is not correct. The trailing slash must be optional:
    \/\/(.+)\.\w+\/?
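A minimal sketch of how the optional trailing slash could be tried out, assuming each URL is tested on its own rather than as one joined string. The two sample URLs come from the test data earlier in the thread; the variable names are illustrative only, and the slashes are written unescaped, which is equivalent here.

```autoit
#include <StringConstants.au3>

; Two URLs from the thread's test data: one with a path, one without a trailing slash.
Local $aUrls = ["http://autoit.script.com/blabla1/", "http://autoit%20script.com"]

For $sUrl In $aUrls
    ; "/?" makes the slash after the last ".xxx" optional, so both forms can match.
    Local $aMatch = StringRegExp($sUrl, "//(.+)\.\w+/?", $STR_REGEXPARRAYMATCH)
    If Not @error Then ConsoleWrite($sUrl & " -> " & $aMatch[0] & @CRLF)
Next
```

Note that the greedy `(.+)` backtracks to the last dot in the whole string, so a URL whose path itself contains a dot (for example a file name with an extension) would capture part of the path as well.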
mikell Posted November 4, 2018
On 11/3/2018 at 11:59 PM, TheXman said:
    I'm no longer sure what you want.
No way you can ever be sure, IMHO.
youtuber (Author) Posted November 4, 2018
On 11/3/2018 at 11:59 PM, TheXman said:
    I'm no longer sure what you want. Is it what your title says, which is the first word after the "//" or is it the host name excluding the top level domain? Below has both:
I changed the subject title. The regex pattern I want is like this:
    https?:\/\/(?:www.)?(.+)\.\w+\/?
and
    https?:\/\/(?:www.)?(.*)\.\w*[\/]*
If the URL structure changes, as in https://www.autoit.script.com.us, I would like the output to be: autoit.script
Jos (Developers) Posted November 4, 2018 (edited)
On 11/4/2018 at 10:28 AM, youtuber said:
    If you change the url structure like https://www.autoit.script.com.us you would like to output it here autoit.script
So you are expecting miracles? The previously posted RegEx simply strips the last ".xxx", and when there are suffixes with a dot inside, you will have to hardcode them.
One option could be to create an array with all possible hardcoded domain suffixes and use that to strip the end of the domain name.
Jos
Edited November 4, 2018 by Jos
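The suffix-array idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the suffix list is a short, made-up sample (a real one would be far longer), multi-label suffixes are listed before their shorter endings so they are tried first, and `_HostWithoutSuffix` is a hypothetical helper name, not an existing UDF.

```autoit
#include <StringConstants.au3>

; Hypothetical, deliberately short sample list. Multi-label suffixes come first
; so that ".com.us" is tried before ".us".
Global $g_aSuffixes = [".com.us", ".co.uk", ".com", ".net", ".org", ".us"]

ConsoleWrite(_HostWithoutSuffix("https://www.autoit.script.com.us") & @CRLF)
ConsoleWrite(_HostWithoutSuffix("http://autoit-script.com/blabla2/blabla/") & @CRLF)

; Hypothetical helper: returns the host name without "www." and without a known suffix.
Func _HostWithoutSuffix($sUrl)
    ; Capture the host: everything after "//" (and an optional "www.") up to "/", ":", "?" or "#".
    Local $aHost = StringRegExp($sUrl, "^https?://(?:www\.)?([^/:?#]+)", $STR_REGEXPARRAYMATCH)
    If @error Then Return SetError(1, 0, "")

    Local $sHost = $aHost[0]
    For $sSuffix In $g_aSuffixes
        ; The "=" operator compares strings case-insensitively.
        If StringRight($sHost, StringLen($sSuffix)) = $sSuffix Then
            Return StringTrimRight($sHost, StringLen($sSuffix))
        EndIf
    Next

    Return $sHost ; no known suffix found: return the host unchanged
EndFunc
```

With this sample list, the two calls should write autoit.script and autoit-script to the console. Any suffix missing from the list is simply left in place, which is exactly the maintenance burden described above.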
iamtheky Posted November 4, 2018
On 11/4/2018 at 10:28 AM, youtuber said:
    If you change the url structure like https://www.autoit.script.com.us you would like to output it here autoit.script
This changes the rules completely, both of the regex and of subdomaining in general. If you always want it to end at 'script' I would specify that in the expression. It's not a cool regex, but seems to meet all criteria.
Jos (Developers) Posted November 4, 2018
A complete public suffix domain list can be found here: https://publicsuffix.org/list/public_suffix_list.dat
So you can imagine the "can of worms" you are dabbling in.
Jos
youtuber (Author) Posted November 4, 2018
If it really can't be solved with a regex pattern alone, then this seems to be the only way:
    ; Known suffixes; every dot is escaped so that it only matches a literal "."
    Local $pattern = '(\.com|\.net|\.org|\.info|\.biz|\.eu|\.fr|\.ch|\.kr|\.edu|\.us)(.*)'
    $aDomain = "https://www.autoit.script.com.us"
    ; First strip the scheme, an optional "www." and the last ".xxx"
    $aRegex = StringRegExp($aDomain, "https?:\/\/(?:www\.)?(.+)\.\w+\/?", 3)
    If IsArray($aRegex) Then
        ; Then remove the first known suffix and everything after it
        ConsoleWrite(StringRegExpReplace($aRegex[0], $pattern, '') & @CRLF)
    EndIf