Posted (edited)
  On 11/3/2018 at 10:34 PM, Jos said:

Try this regex line:

Local $aResult = StringRegExp($sData, "https?:\/\/(?:www\.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)

Jos


Yes, this is what I want. It works great, thanks :huggles:

Edit: but what about something like this, where the extension changes? :D

https://www.autoit.script.com.us

 

Edited by youtuber
Posted

Essentially a string-between operation, but I have to drop in a non-regex way :)

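#include <Array.au3>
#include <Constants.au3>

example()

Func example()
    Local $aData = [ _
                    "http://autoit.script.com/blabla1/", _
                    "http://autoit-script.com/blabla2/blabla/", _
                    "http://autoitscript.com/bla-bla-bla3/", _
                    "http://autoit%20script.com", _
                    "http://autoit_script.com/", _
                    "https://www.autoit.script.com.us" _
                   ]
    Local $sHost

    For $sUrl In $aData
        ; "http://host/path" split on "/" gives "http:", "", "host", ... (flag 2 = no count element),
        ; so the host is element 2. A leading "www." is not removed here.
        $sHost = StringSplit($sUrl, "/", 2)[2]
        ; Find the last dot inside the host (not the whole URL) and trim from that dot to the end.
        MsgBox(0, '', StringTrimRight($sHost, StringLen($sHost) - StringInStr($sHost, ".", 0, -1) + 1))
    Next
EndFunc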

 


Posted (edited)

I'm no longer sure what you want.  Is it what your title says, which is the first word after the "//" or is it the host name excluding the top level domain?  Below has both:   :)

#include <Array.au3>
#include <Constants.au3>

example()

Func example()
    Local $aResult
    Local $aData = [ _
                    "http://host_no_tld/blah", _
                    "http://autoit.script.com/blabla1/", _
                    "http://www.autoit-script.com/blabla2/blabla/", _
                    "http://autoitscript.com/bla-bla-bla3/", _
                    "http://autoit%20script.com", _
                    "http://autoit%20script.scripts.com", _
                    "http://autoit_script.com/", _
                    "https://www.autoit.script.com.us" _
                   ]
    Local $sData = _ArrayToString($aData, @CRLF)

    $aResult = StringRegExp($sData, "https?://(?:www\.)?([-.\w~%]+)(?:\.[-.\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH) ; host name excluding top level domain
    If IsArray($aResult) Then _ArrayDisplay($aResult, "host name excluding top level domain")

    $aResult = StringRegExp($sData, "https?://(?:www\.)?([-\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH) ;Only 1st word
    If IsArray($aResult) Then _ArrayDisplay($aResult, "Only 1st word")
EndFunc

 

Edited by TheXman
Corrected first regex
Posted
  On 11/3/2018 at 11:59 PM, TheXman said:

I'm no longer sure what you want.  Is it what your title says, which is the first word after the "//" or is it the host name excluding the top level domain?  Below has both:   :)

 

 


I changed the subject title.

The regex pattern I want is like this:

https?:\/\/(?:www\.)?(.+)\.\w+\/?
and
https?:\/\/(?:www\.)?(.*)\.\w*[\/]*


If the URL structure changes to something like

https://www.autoit.script.com.us

I would like the output to be

autoit.script

  • Developers
Posted (edited)
  On 11/4/2018 at 10:28 AM, youtuber said:

If the URL structure changes to something like

https://www.autoit.script.com.us

I would like the output to be

autoit.script


So you are expecting miracles? :)

The previously posted RegEx simply strips the last ".xxx". When a suffix contains a dot itself (like ".com.us"), you will have to hardcode it. One option could be to create an array with all possible hardcoded domain suffixes and use that to strip the end of the domain name.
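
A rough sketch of that array idea, in case it helps. The suffix list here is a made-up, tiny sample and `_StripSuffix` is just an illustrative helper name, not anything from the UDFs; longer suffixes are listed first so ".com.us" is tried before ".us".

```autoit
Example()

Func Example()
    ; Hypothetical sample list: multi-part suffixes first so they win over their tails
    Local $aSuffixes = [".com.us", ".co.uk", ".com", ".org", ".net", ".us"]
    Local $aData = [ _
            "http://autoit.script.com/blabla1/", _
            "http://www.autoit-script.com/blabla2/blabla/", _
            "https://www.autoit.script.com.us" _
            ]

    For $sUrl In $aData
        ConsoleWrite($sUrl & " -> " & _StripSuffix($sUrl, $aSuffixes) & @CRLF)
    Next
EndFunc

; Returns the host of $sUrl without scheme, leading "www." and the first matching suffix.
Func _StripSuffix($sUrl, ByRef $aSuffixes)
    ; Flag 1 returns the capture groups of the first match as an array
    Local $aHost = StringRegExp($sUrl, "^https?://(?:www\.)?([^/:?#]+)", 1)
    If @error Then Return SetError(1, 0, "")

    Local $sHost = $aHost[0]
    For $sSuffix In $aSuffixes
        ; "=" compares strings case-insensitively in AutoIt
        If StringRight($sHost, StringLen($sSuffix)) = $sSuffix Then
            Return StringTrimRight($sHost, StringLen($sSuffix))
        EndIf
    Next
    Return $sHost ; no known suffix, leave the host untouched
EndFunc
```

With this sample list, both the ".com" and the ".com.us" variants of the URL should come out as autoit.script. Anything not in the list is returned with its suffix still attached, which is the price of hardcoding.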

Jos

Edited by Jos

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

Posted
  On 11/4/2018 at 10:28 AM, youtuber said:

If the URL structure changes to something like

https://www.autoit.script.com.us

I would like the output to be

autoit.script


 

This changes the rules completely, both of the regex and of subdomaining in general. If you always want it to end at 'script' I would specify that in the expression.  It's not a cool regex, but seems to meet all criteria.
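
To illustrate what "specify that in the expression" could look like, here is one possible pattern. It is only a guess at the idea, it assumes every host really does contain the literal word "script", and `_HostUpToScript` is a hypothetical helper name.

```autoit
#include <StringConstants.au3>

Example()

Func Example()
    Local $aData = [ _
            "http://autoit.script.com/blabla1/", _
            "http://autoitscript.com/bla-bla-bla3/", _
            "https://www.autoit.script.com.us" _
            ]

    For $sUrl In $aData
        ConsoleWrite($sUrl & " -> " & _HostUpToScript($sUrl) & @CRLF)
    Next
EndFunc

; Returns the host part up to and including the first "script", or "" with @error set.
Func _HostUpToScript($sUrl)
    ; [^/]*? is lazy and cannot cross a "/", so the capture ends at the first "script" in the host
    Local $aMatch = StringRegExp($sUrl, "https?://(?:www\.)?([^/]*?script)", $STR_REGEXPARRAYMATCH)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc
```

Because the match is lazy, a host like autoit%20script.scripts.com would stop at the first "script" as well. Hosts without the word simply do not match.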


  • Developers
Posted

A complete public suffix domain list can be found here: https://publicsuffix.org/list/public_suffix_list.dat

So you can imagine the "can of worms" you are dabbling in. :) 

Jos

