youtuber Posted March 2, 2018

I would like to extract only this part of the URL: https://autoit.com/test1, but all my attempts are failing. Thanks.

    $url = "https://autoit.com/test1/test2/"
    $Reg = StringRegExp($url, 'https?://*.*[^/]+', 3)
    ConsoleWrite($Reg[0] & @CRLF)

ripdad Posted March 3, 2018

Much simpler than straining with a regex. Took me 2 minutes and I know it's accurate.

    Local $url = 'https://autoit.com/test1/test2/'
    Local $str = StringLeft($url, StringInStr($url, '/', 0, 4) - 1)
    MsgBox(0, '', $str)

Bilgus Posted March 3, 2018

Took me 5, and it's more flexible.

    $url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, ftp://autovt.com/test7/test8/ "
    $Reg = StringRegExp($url, '(?i).*?(..?tp.?://[^/]*.?[^/]*)', 3)
    ; (?i)   caseless matching
    ; .*?    0 or more characters (lazy), matched but not kept
    ; (      start a capturing group
    ; .      any character, 'f' or 'h'
    ; .?     0 or 1 character, 't' or none
    ; tp     literally tp
    ; .?     0 or 1 character, the 's' in https
    ; :      literal colon
    ; //     literal slashes
    ; [^/]*  all characters up to a slash
    ; .?     0 or 1 character
    ; [^/]*  all characters up to a slash (again)
    ; )      end capturing group
    ConsoleWrite($Reg[0] & @CRLF & $Reg[1] & @CRLF & $Reg[2] & @CRLF)

Bilgus Posted March 3, 2018

Oh, outputs:

    https://autoit.com/test1
    http://autoet.com/test5
    ftp://autovt.com/test7

youtuber Posted March 3, 2018 (Author)

@ripdad Your approach is not suitable for me, because I will be doing URL verification at the same time.

@Bilgus Your regex worked well for me, thank you.

    (?i).*?(..?tp.?://[^/]*.?[^/]*)

Maybe there is another alternative, @mikell?

mikell Posted March 3, 2018

For the fun:

    $sReg = StringRegExpReplace($url, '(.*?/){3}[^/]+\K.*', "")

But the usual way is more understandable/accurate/secure:

    $aReg = StringRegExp($url, 'https?://[^/]+/[^/]*', 1)

iamtheky Posted March 3, 2018

2 hours ago, youtuber said: "is also not suitable for me because I will do url verification at the same time."

Are you doing an InetGet on the return value to verify the URL, or are you just validating that the first 8 characters are https://? Either way, it's plenty suitable within the existing parameters.

mikell Posted March 3, 2018

2 hours ago, iamtheky said: "either way it's plenty suitable within the existing parameters"

I totally agree. The code from ripdad in post #2 works the same way as my SRER (StringRegExpReplace): find the 4th occurrence of "/" and grab all the characters to its left. A less classy look maybe (though this could be discussed, for sure), but equal efficiency.

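For readers following along, the equivalence described above can be checked with a short sketch. It reuses ripdad's StringLeft/StringInStr line and mikell's StringRegExpReplace pattern from the posts above; the variable names $sUrl, $sLeft and $sSrer are illustrative and not from the thread.

```autoit
; Sketch: both approaches keep everything to the left of the 4th "/".
Local $sUrl = "https://autoit.com/test1/test2/"

; ripdad's approach: locate the 4th "/" and take the characters before it.
Local $sLeft = StringLeft($sUrl, StringInStr($sUrl, "/", 0, 4) - 1)

; mikell's approach: skip three "/"-terminated chunks plus the next segment,
; reset the match start with \K, then replace the remainder with nothing.
Local $sSrer = StringRegExpReplace($sUrl, '(.*?/){3}[^/]+\K.*', "")

ConsoleWrite($sLeft & @CRLF) ; https://autoit.com/test1
ConsoleWrite($sSrer & @CRLF) ; https://autoit.com/test1
```

Neither version checks that the string is actually a URL; they only count slashes, which is the point iamtheky and mikell are discussing.
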
Jury Posted March 4, 2018

    $url = "https://autoit.com/test1/test2/"
    $Reg = StringRegExpReplace($url, '(https://.*?/\w+).*?$', '$1')
    ConsoleWrite($Reg & @CRLF)

Bilgus Posted March 4, 2018 (edited)

The OP said he wanted to validate the URL at the same time, so how about we start getting crazy:

    $url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "
    $aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)(?&host)(?&path)(?&path)', 3)
    For $i = 0 To UBound($aReg) - 1
        ConsoleWrite($aReg[$i] & @CRLF)
    Next

Output:

    https://autoit.com/test1/test2/
    http://autoet.com/test5/test6/
    ftp://autovt.com/test7/test8/
    http://autoet.com/test9/test10/

Edited March 4, 2018 by Bilgus (reason: forgot the outputs)

youtuber Posted March 4, 2018 (Author, edited)

@Bilgus Thank you for your help.

@mikell and @Bilgus I would like to ask you another question: what should the pattern be to cut off the http:// or https:// and the www. at the beginning?

    https://autoit.com/test1/test2/
    http://aut-oet.com/test5/test6/
    https://www.autovt.com/test7/test8/
    http://aut.oet.com/test9/test10/

    (?:http[s]?:\/\/)?(?:www\.)?([^\/]+\/[^\/]*)

Edited March 4, 2018 by youtuber

Bilgus Posted March 4, 2018 (edited)

You have to turn them into non-capturing groups.

    $url = "test.com/test0/ https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "
    ;This one is wrong
    ;$aReg = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)
    $aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)\K(?&host)(?&path)', 3)
    For $i = 0 To UBound($aReg) - 1
        ConsoleWrite($aReg[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)
    $bReg = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)
    For $i = 0 To UBound($bReg) - 1
        ConsoleWrite($bReg[$i] & @CRLF)
    Next

Output:

    autoit.com/test1/
    autoet.com/test5/
    blank.com/Nope/
    autovt.com/test7/
    autoet.com/test9/

    autoit.com/test1
    autoet.com/test5
    blank.com, http:/
    autovt.com/test7
    autoet.com/test9

Edited March 4, 2018 by Bilgus (reason: fixed)

Bilgus Posted March 4, 2018

So, for the top one: first off, you can see the validation is a bit more robust. I added \K to restart the match after matching the scheme. As far as I know, you can't use a named group as a capturing group, but we want the scheme to be there for a valid match, so we match it and then reset the match start position to just after it.

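The effect of \K described above can be seen in isolation with a minimal sketch. The patterns here are deliberately simplified and are not the ones from the thread; $sUrl, $aFull and $aReset are illustrative names.

```autoit
; Sketch: \K keeps the scheme as a requirement but drops it from the result.
Local $sUrl = "https://autoit.com/test1/test2/"

; Without \K, the scheme is part of the returned match.
Local $aFull = StringRegExp($sUrl, '(?i)https?://[^/]+/[^/]+', 3)

; With \K, "https://" must still match, but the reported match starts after it.
Local $aReset = StringRegExp($sUrl, '(?i)https?://\K[^/]+/[^/]+', 3)

ConsoleWrite($aFull[0] & @CRLF)  ; https://autoit.com/test1
ConsoleWrite($aReset[0] & @CRLF) ; autoit.com/test1
```

A capturing group around the part after the scheme would return the same text; \K simply avoids the need for a group when the whole remaining match is what you want.
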
Bilgus Posted March 4, 2018

Really, for what you want, it could be shortened to

    $aReg = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)

or

    $aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE)(?<path>[^/\s,]+/))..?tps?://\K\w+\.\w{3}/?(?&path)(?&path)', 3)

the second being just in case you decide you want more than a single portion of the path.

iamtheky Posted March 4, 2018

Or you could just do this:

    ;~ $str = "https://autoit.com/test1/test2/"
    ;~ $str = "http://aut-oet.com/test5/test6/"
    $str = "https://www.autovt.com/test7/test8/"
    ;~ $str = "http://aut.oet.com/test9/test10/"
    $sMid = StringMid(StringLeft($str, StringInStr($str, "/", 0, 4) - 1), StringInStr($str, "aut"))
    MsgBox(0, '', $sMid)

youtuber Posted March 4, 2018 (Author, edited)

@Bilgus Unfortunately, some of your regex patterns fail: certain domains are skipped or cut off!

    $url = "https://autoitscript.com/test1/test2/" & @CRLF & _
            "http://aut-oit.com/test5/test6/" & @CRLF & _
            "https://www.autoit.com/test7/test8/" & @CRLF & _
            "http://blog.autoitscript.com/test9/test10/"
    $aReg1 = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)
    $aReg2 = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)
    $aReg3 = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)
    ConsoleWrite("----" & "$aReg1 " & "----" & @CRLF)
    For $i = 0 To UBound($aReg1) - 1
        ConsoleWrite($aReg1[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)
    ConsoleWrite("----" & "$aReg2 " & "----" & @CRLF)
    For $i = 0 To UBound($aReg2) - 1
        ConsoleWrite($aReg2[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)
    ConsoleWrite("----" & "$aReg3 " & "----" & @CRLF)
    For $i = 0 To UBound($aReg3) - 1
        ConsoleWrite($aReg3[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)

Console:

    ----$aReg1 ----
    autoitscript.com/test1/
    www.autoit.com/
    blog.autoitscript.com/

    ----$aReg2 ----
    autoitscript.com/test1
    aut-oit.com/test5
    www.autoit.com/test7    ;----> here, I do not want the www. part
    blog.autoitscript.com/test9

    ----$aReg3 ----
    autoitscript.com/test1/
    oit.com/test5/
    www.autoit.com/
    blog.autoitscript.com/

This looks better:

    $aReg4 = StringRegExp($url, '(?i).*?..?tps?://?(?:www\.)?([^/]*.?[^/]*)', 3)

Edited March 4, 2018 by youtuber

Bilgus Posted March 4, 2018

I updated the post; you might want to recheck it. Sorry, I realized while I was writing a follow-up post that it wasn't right.

youtuber Posted March 4, 2018 (Author, edited)

@Bilgus @mikell Or is this better? I do not know.

    (?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)

Edited March 4, 2018 by youtuber

Bilgus Posted March 5, 2018 (edited)

    $url = "https://autoitscript.com/test1/test2/" & @CRLF & _
            "http://aut-oit.com/test5/test6/" & @CRLF & _
            "https://www.autoit.com/test7/test8/" & @CRLF & _
            "https://google.com/test1/test2/" & @CRLF & _
            "http://qwerty.org/test5/test?/whatabout_a_page.html" & @CRLF & _
            "https://www.tungsten.com/test7/test8/metalisbetter.htm" & @CRLF & _
            "https://falsedomains.com/test1/test2/" & @CRLF & _
            "http://who%20is%20this/test5/test6/" & @CRLF & _ ;NO
            "https://www.autochecker.com/test7/test8/" & @CRLF & _
            "https://autobanks.com" & @CRLF & _ ;NO
            "http://aut_ofitbaddomain" & @CRLF & _ ;NO
            "https://www.autoit.biz/test7/test8/" & @CRLF
    $aReg = StringRegExp($url, '(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/', 3)
    For $i = 0 To UBound($aReg) - 1
        ConsoleWrite($aReg[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)
    $bReg = StringRegExp($url, '(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)', 3)
    For $i = 0 To UBound($bReg) - 1
        ConsoleWrite($bReg[$i] & @CRLF)
    Next

It's easy enough to test:

    ;'(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/'
    autoitscript.com/test1/
    aut-oit.com/test5/
    autoit.com/test7/
    google.com/test1/
    qwerty.org/test5/
    tungsten.com/test7/
    falsedomains.com/test1/
    autochecker.com/test7/
    autoit.biz/test7/

    ;'(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)'
    autoitscript.com/test1
    aut-oit.com/test5
    autoit.com/test7
    google.com/test1
    qwerty.org/test5
    tungsten.com/test7
    falsedomains.com/test1
    who%20is%20this/test5
    autochecker.com/test7
    autobanks.com
    http:/
    autoit.biz/test7

As you can see, the first one is still more robust, although I had to change it a bit to match your data. URIs have a defined list of valid characters, but I don't know that I'd want to build a regex for them, lol.

Edited March 5, 2018 by Bilgus

youtuber Posted March 5, 2018 (Author)

I do not want the trailing slash (/) at the end:

    (?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/