Regex: when I say be lazy, it gets greedy (and vice versa)


Hello everyone

I'm having difficulty in understanding lazy vs greedy matching.  The help file for StringRegExp tells me regex in AutoIt is always greedy unless you tell it to be lazy, but I seem to be getting the opposite effect: when I add (?U), it matches *more*, not less.

$mystring = ' extype="myEXTYPE" match-quality="100%" origin="myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE"'

$one = StringRegExp ($mystring, '(?U)(origin=")(.+?)(")', 1) ; uses (?U), so it's lazy

So, I expect
$one[0] >> 1
$one[1] >> origin="myORIGIN"

Instead, I get
$one[0] >> origin="
$one[1] >> myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE

$mystring = ' extype="myEXTYPE" match-quality="100%" origin="myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE"'

$two = StringRegExp ($mystring, '(origin=")(.+?)(")', 1) ; does NOT use (?U), so it's greedy

So, I expect
$two[0] >> 1
$two[1] >> origin="myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE"

Instead, I get
$two[0] >> origin="
$two[1] >> myORIGIN

Also, the fact that I get:
$two[0] >> origin="
makes NO sense to me.  It's supposed to give me an array of matches, so (depending on what kind of array is created -- the helpfile for StringRegExp doesn't say), [0] must be either "1" or [0] must be the first item in the array, and the way I understand the helpfile for StringRegExp, first item in the array is supposed to be:
origin="myORIGIN"

Samuel

Edited by leuce

1 hour ago, leuce said:

'(?U)(origin=")(.+?)(")', 1) ; uses (?U), so it's lazy

Hmmm, no.  .+  is greedy (it takes all chars up to the last quote in the text), while  .+?  is lazy (it takes all chars up to the next quote).
(?U) swaps these two behaviours, so with (?U) in front your  .+?  becomes greedy, which is why that pattern matched more. I personally never use (?U) because it's confusing (and not really needed...)
So this

StringRegExp ($mystring, 'origin="(.+?)"', 1)

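will give you an array which contains only one element, myORIGIN, because the pattern has only one capturing group. With flag 1 the array holds the captured groups, starting at index [0]; your pattern had three groups, which is why you got origin=" in [0].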
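
To make the point about capturing groups concrete, here is a small sketch you can run in SciTE. The variable names and the shortened test string are mine, just for illustration. The last call uses flag 2, which, as far as I understand the help file, puts the full match in [0] followed by the groups; treat that part as an assumption and check it on your version.

```autoit
; Shortened test string (an assumption, for illustration only)
Local $sText = ' match-quality="100%" origin="myORIGIN" xtm:id="myXTMID"'

; Three capturing groups + flag 1: one array element per group
Local $aThree = StringRegExp($sText, '(origin=")(.+?)(")', 1)
ConsoleWrite(UBound($aThree) & @CRLF) ; 3
ConsoleWrite($aThree[0] & @CRLF)      ; origin="
ConsoleWrite($aThree[1] & @CRLF)      ; myORIGIN
ConsoleWrite($aThree[2] & @CRLF)      ; "

; One capturing group + flag 1: a single element
Local $aOne = StringRegExp($sText, 'origin="(.+?)"', 1)
ConsoleWrite($aOne[0] & @CRLF)        ; myORIGIN

; Flag 2 (assumption, see above): full match first, then the groups
Local $aFull = StringRegExp($sText, 'origin="(.+?)"', 2)
If Not @error Then ConsoleWrite($aFull[0] & @CRLF)
```

So the array never starts with a count or a success value; [0] is already the first piece of captured text.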
You might also use

$myarray = StringRegExp ($mystring, 'origin="([^"]+)', 1)

to get "one or more non-quote characters right after the string origin="
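
If you want to see greedy, lazy and (?U) side by side, something like the loop below should do it. It is only a sketch: the test string is shortened and the variable names are made up. I would expect the first and fourth patterns to capture everything up to the last quote in the text, and the other three to capture just myORIGIN.

```autoit
; Shortened test string (an assumption, for illustration only)
Local $sText = ' origin="myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT"'

; 1: greedy  2: lazy  3: (?U) makes .+ lazy  4: (?U) makes .+? greedy  5: negated class
Local $aPatterns[5] = ['origin="(.+)"', _
        'origin="(.+?)"', _
        '(?U)origin="(.+)"', _
        '(?U)origin="(.+?)"', _
        'origin="([^"]+)']

For $sPattern In $aPatterns
    Local $aMatch = StringRegExp($sText, $sPattern, 1)
    If @error Then ContinueLoop ; no match or bad pattern, skip it
    ConsoleWrite($sPattern & '  ->  ' & $aMatch[0] & @CRLF)
Next
```

The negated class in the last pattern does not depend on greediness at all, which is why I tend to prefer it for "text between quotes".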

Was it clear ?  :)
