
Questions about regex



Try this regex line:

Local $aResult = StringRegExp($sData, "https?:\/\/(?:www\.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)

Jos

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)


11 minutes ago, Jos said:

Try this regex line:

Local $aResult = StringRegExp($sData, "https?:\/\/(?:www\.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)

Jos

Yes, this is what I want. This works great, thanks :huggles:

Edit: but what about something like this, where the extension changes? :D

https://www.autoit.script.com.us

 

Edited by youtuber

Essentially a string-between operation, but I have to drop in a non-regex way :)

#include <Array.au3>
#include <Constants.au3>

example()

Func example()
    Local $aData = [ _
                    "http://autoit.script.com/blabla1/", _
                    "http://autoit-script.com/blabla2/blabla/", _
                    "http://autoitscript.com/bla-bla-bla3/", _
                    "http://autoit%20script.com", _
                    "http://autoit_script.com/", _
                    "https://www.autoit.script.com.us" _
                   ]

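    For $each In $aData
        ; Take the host part (third element of the split), then cut it at its last dot
        Local $sHost = StringSplit($each, "/", 2)[2]
        Local $iLastDot = StringInStr($sHost, ".", 0, -1)
        MsgBox(0, '', $iLastDot ? StringLeft($sHost, $iLastDot - 1) : $sHost)
    Next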


EndFunc

 


I'm no longer sure what you want.  Is it what your title says, which is the first word after the "//" or is it the host name excluding the top level domain?  Below has both:   :)

#include <Array.au3>
#include <Constants.au3>

example()

Func example()
    Local $aResult
    Local $aData = [ _
                    "http://host_no_tld/blah", _
                    "http://autoit.script.com/blabla1/", _
                    "http://www.autoit-script.com/blabla2/blabla/", _
                    "http://autoitscript.com/bla-bla-bla3/", _
                    "http://autoit%20script.com", _
                    "http://autoit%20script.scripts.com", _
                    "http://autoit_script.com/", _
                    "https://www.autoit.script.com.us" _
                   ]
    Local $sData = _ArrayToString($aData, @CRLF)

    $aResult = StringRegExp($sData, "https?://(?:www\.)?([-.\w~%]+)(?:\.[-.\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH) ; host name excluding top level domain
    If IsArray($aResult) Then _ArrayDisplay($aResult, "host name excluding top level domain")

    $aResult = StringRegExp($sData, "https?://(?:www\.)?([-\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH) ;Only 1st word
    If IsArray($aResult) Then _ArrayDisplay($aResult, "Only 1st word")
EndFunc

 

Edited by TheXman
Corrected first regex

10 hours ago, TheXman said:

I'm no longer sure what you want.  Is it what your title says, which is the first word after the "//" or is it the host name excluding the top level domain?  Below has both:   :)

 

 

I changed the subject title.

The regex pattern I want is like this:

https?:\/\/(?:www\.)?(.+)\.\w+\/?
and
https?:\/\/(?:www\.)?(.*)\.\w*[\/]*


If the URL structure changes to something like

https://www.autoit.script.com.us

I would like the output to be

autoit.script

4 minutes ago, youtuber said:

If the URL structure changes to something like


https://www.autoit.script.com.us

I would like the output to be

autoit.script

So you are expecting miracles? :)

The previously posted RegEx simply strips the last ".xxx". When a suffix contains a dot, you will have to hardcode it. One option is to create an array of all the domain suffixes you expect and use it to strip the end of the domain name.
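A minimal sketch of that array idea, under a few assumptions: the suffix list is only a small hand-picked sample, the helper name _StripSuffix is made up for illustration, and the host has already been cut out of the URL (without "www."). Longer suffixes are listed first so that ".com.us" is tried before ".us".

```autoit
; Sketch only: the suffixes below are a small sample, not a complete list.
; Longer suffixes come first so ".com.us" wins over ".us".
Local $aSuffixes = [".com.us", ".co.uk", ".com", ".net", ".org", ".us"]

ConsoleWrite(_StripSuffix("autoit.script.com.us", $aSuffixes) & @CRLF) ; autoit.script
ConsoleWrite(_StripSuffix("autoitscript.com", $aSuffixes) & @CRLF)     ; autoitscript

; Hypothetical helper: returns $sHost without the first listed suffix it ends with.
Func _StripSuffix($sHost, ByRef $aList)
    For $i = 0 To UBound($aList) - 1
        Local $iLen = StringLen($aList[$i])
        ; "=" compares strings case-insensitively in AutoIt
        If StringRight($sHost, $iLen) = $aList[$i] Then
            Return StringTrimRight($sHost, $iLen)
        EndIf
    Next
    Return $sHost ; no known suffix found, leave the host unchanged
EndFunc
```

Any suffix that is not in the array simply leaves the host untouched, so the list has to be maintained by hand.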

Jos

Edited by Jos


3 hours ago, youtuber said:

If the URL structure changes to something like


https://www.autoit.script.com.us

I would like the output to be

autoit.script

 

This changes the rules completely, both of the regex and of subdomaining in general. If you always want it to end at 'script' I would specify that in the expression.  It's not a cool regex, but seems to meet all criteria.


A complete public suffix domain list can be found here: https://publicsuffix.org/list/public_suffix_list.dat

So you can imagine the "can of worms" you are dabbling in. :) 
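A rough sketch of how that list could be used, with several assumptions: the file was downloaded beforehand and saved next to the script as public_suffix_list.dat, the function names _LoadSuffixes and _StripPublicSuffix are made up for illustration, and wildcard ("*.") and exception ("!") rules are simply skipped, so the result is only an approximation of the real matching rules.

```autoit
; Sketch only. Assumes a local copy of the list saved next to the script.
Local $oSuffixes = _LoadSuffixes(@ScriptDir & "\public_suffix_list.dat")
If @error Then
    ConsoleWrite("Could not read the suffix list" & @CRLF)
Else
    ConsoleWrite(_StripPublicSuffix("autoit.script.com.us", $oSuffixes) & @CRLF)
EndIf

; Hypothetical helper: reads the plain rules of the list into a dictionary.
Func _LoadSuffixes($sFile)
    Local $aLines = FileReadToArray($sFile)
    If @error Then Return SetError(1, 0, 0)
    Local $oDict = ObjCreate("Scripting.Dictionary")
    For $sLine In $aLines
        Local $sRule = StringStripWS($sLine, 3) ; 3 = strip leading and trailing whitespace
        If $sRule = "" Or StringLeft($sRule, 2) = "//" Then ContinueLoop ; blank line or comment
        If StringRegExp($sRule, "^[*!]") Then ContinueLoop ; wildcard/exception rules are ignored here
        $oDict.Item($sRule) = True
    Next
    Return $oDict
EndFunc

; Hypothetical helper: cuts the longest listed suffix off the end of $sHost.
Func _StripPublicSuffix($sHost, $oDict)
    $sHost = StringLower($sHost) ; the dictionary keys are lowercase and case-sensitive
    Local $iPos = StringInStr($sHost, ".")
    While $iPos > 0
        ; Everything after the current dot is the candidate suffix; the first dot gives the longest one
        If $oDict.Exists(StringMid($sHost, $iPos + 1)) Then Return StringLeft($sHost, $iPos - 1)
        $iPos = StringInStr($sHost, ".", 0, 1, $iPos + 1)
    WEnd
    Return $sHost ; nothing matched
EndFunc
```

What this prints depends entirely on which suffixes the downloaded list contains. A leading "www." would still have to be removed separately.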

Jos


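If it really can't be solved with just a regex pattern, then this should be the only way :)

; Known suffixes; every dot is escaped so that it only matches a literal "."
Local $sPattern = '(\.com|\.net|\.org|\.info|\.biz|\.eu|\.fr|\.ch|\.kr|\.edu|\.us)(.*)'
Local $sDomain = "https://www.autoit.script.com.us"
; Flag 3 returns an array of global matches; the capture group is the host without its last ".xxx"
Local $aRegex = StringRegExp($sDomain, "https?:\/\/(?:www\.)?(.+)\.\w+\/?", 3)
If IsArray($aRegex) Then
    ; Remove the first known suffix and everything after it
    ConsoleWrite(StringRegExpReplace($aRegex[0], $sPattern, '') & @CRLF)
EndIf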

 
