RegEx lazy quantifier


Go to solution Solved by Factfinder,

Posted

I must have a misunderstanding on how lazy quantifiers work.

My expected return from below would be: "Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469"

But I'm picking up almost the full string.  I'm likely either using incorrect syntax or made a typo somewhere that I keep overlooking.  Any help would be great :)

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)
  • Moderators
Posted

DW1,

This works for me: :)

Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>][<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>]'
Local $sExtract = StringRegExpReplace($sString, '.*href="(.*)">Yarp.*', "$1")
ConsoleWrite($sExtract & " - Extracted" & @CRLF)
ConsoleWrite("Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469 - Required" & @CRLF)
M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

  • Moderators
Posted

DW1,

You need a guru for that - and I certainly do not qualify! :D

M23


Posted

You sell yourself short, sir!

I would have expected either of these to work:

Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
Local $aTemp = StringRegExp($sString, '(?U)href="(.*)">Yarp', 3)

 

but it seems that I cannot get it to return a lazy result, just the greedy result... I'm super confused about this, and am hoping somebody can teach me to fish here.  I have workarounds, but more than anything, I'd like to clear up my own confusion, as I'm likely doing something wrong.
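For what it's worth, those two patterns should behave identically. StringRegExp is built on PCRE, where (?U) swaps the default greediness, so (.*) under (?U) acts like (.*?). A minimal sketch on a shortened, made-up string ($sDemo and the variable names are only for illustration, not from the original script):

```autoit
; Hypothetical shortened input: two links, only the second is labelled Yarp.
Local $sDemo = '<A href="a.aspx">Nope</A> <A href="b.aspx">Yarp</A>'

; Flag 1 returns an array with the captures of the first match only.
Local $aQuestionMark = StringRegExp($sDemo, 'href="(.*?)">Yarp', 1)
Local $aUngreedy = StringRegExp($sDemo, '(?U)href="(.*)">Yarp', 1)

; Both spellings of "lazy" start at the first href=" and capture the same text:
; a.aspx">Nope</A> <A href="b.aspx
If IsArray($aQuestionMark) Then ConsoleWrite("(.*?)   : " & $aQuestionMark[0] & @CRLF)
If IsArray($aUngreedy) Then ConsoleWrite("(?U)(.*): " & $aUngreedy[0] & @CRLF)
```

So switching between the two forms would not be expected to change the result; whatever causes the long capture has to be something other than the greediness setting.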

Posted (edited)

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="([^"\r\n]*)">Yarp', 3)
_ArrayDisplay($aTemp)
;or
$aTemp = StringRegExp($sString, 'href="?([^"\r\n\>\<]*)"?>Yarp', 3)
_ArrayDisplay($aTemp)

 

Ciao.

Edited by DXRW4E


Posted

Thank you.  More valid workarounds.

I'm still trying to get somebody to teach me to fish here though on why the lazy quantifier isn't working the way I expect it to.  I am open to it being user error, I just want to know what the error is.

Posted (edited)

It works OK; it is the pattern that is not OK, not the RegExp engine.

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

So the engine finds the first href=" (at the end of '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="') and does not stop until it reaches ">Yarp.

Ciao.

Edited by DXRW4E


Posted

Your original script would work with a little change: instead of (.*?) use ([^>]*?), like this:

$sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
$aTemp = StringRegExp($sString, 'href="([^>]*?)">Yarp', 1)
If IsArray($aTemp) Then MsgBox(0, "", $aTemp[0])
Posted

I understand that, and I have workarounds, however this doesn't address my question, as a lazy quantifier should be returning as little as possible while still matching, yet I'm still seeing the same result as a greedy quantifier.  That's what I'm hoping somebody can correct me on.

Posted

To clarify for anybody wondering what I'm on about...

I have plenty of workarounds to accomplish my task.  What I am asking is why the lazy quantifier is not working as I thought it did in the following:

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

 

Yes, this should match the entire string, however, I thought that adding the "?" after the quantifier "*" would make the match lazy, and grab as little as possible to match the expression.  My question is, where is my syntax error or my misunderstanding.  I understand how all of the workarounds are working.  What I don't understand is why the lazy quantifier isn't working the way I thought it did.  As I said previously, this is likely just a misunderstanding of mine, or a syntax error, but if somebody could answer how to get the lazy quantifier to work in this scenario, I'd appreciate it.

I would expect a greedy quantifier (as much as possible while still matching) to return as it is in my above script:

Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469

 

I would expect a lazy quantifier (as little as possible while still matching) to return the following:

Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469
Posted (edited)

Yes, right, but the '">Yarp' it stops at is already the first one in the string, so the lazy quantifier is doing its job.

 

try

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Yarp</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

or tell RegExp to find the last 'href='

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, '.*href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

Ciao.
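Why the leading .* helps, as far as I understand PCRE: the greedy prefix first swallows the whole line and then backtracks, so the href=" it settles on is the last one that can still be followed by ">Yarp. A small sketch on a made-up three-link string ($sDemo and the variable names are assumptions for illustration only):

```autoit
; Hypothetical shortened input: the Yarp link sits in the middle.
Local $sDemo = '<A href="a.aspx">Nope</A> <A href="b.aspx">Yarp</A> <A href="c.aspx">Nada</A>'

; Without the prefix the match starts at the first href=" in the string.
Local $aPlain = StringRegExp($sDemo, 'href="(.*?)">Yarp', 1)
; With the greedy prefix, .* backtracks from the end of the line until the
; rest of the pattern fits, which lands on the href=" right before ">Yarp.
Local $aPrefixed = StringRegExp($sDemo, '.*href="(.*?)">Yarp', 1)

If IsArray($aPlain) Then ConsoleWrite("Plain:    " & $aPlain[0] & @CRLF)       ; a.aspx">Nope</A> <A href="b.aspx
If IsArray($aPrefixed) Then ConsoleWrite("Prefixed: " & $aPrefixed[0] & @CRLF) ; b.aspx
```

Two caveats: by default the dot does not match newlines, so the prefix only reaches back within one line, and if several links are labelled Yarp on the same line this form only returns the last of them.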

Edited by DXRW4E


  • Solution
Posted (edited)

The script I suggested is not a workaround. It is the correct script.

Your script doesn't work because the ? only applies forward: it affects the part of the string that comes after the first href=", up to the first ">Yarp. So if you had a second ">Yarp in the string, the ? would make it match only up to the first ">Yarp.

As DXRW4E mentioned, your script starts at the first href=" and ends at the first ">Yarp because ? doesn't work backwards. To skip every href=" in the string except the one preceding ">Yarp, you should use the script I suggested.
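To see the "forward only" behaviour in isolation, here is a minimal sketch on a shortened, made-up string with two Yarp links ($sDemo and the variable names are assumptions for illustration; the [^"]* variant is the one DXRW4E posted earlier and works on the same principle as [^>]*?):

```autoit
; Hypothetical shortened input: three links, the last two labelled Yarp.
Local $sDemo = '<A href="a.aspx">Nope</A> <A href="b.aspx">Yarp</A> <A href="c.aspx">Yarp</A>'

; Flag 1 returns an array with the captures of the first match only.
; Lazy: starts at the first href=" and stops at the FIRST ">Yarp.
Local $aLazy = StringRegExp($sDemo, 'href="(.*?)">Yarp', 1)
; Greedy: same starting point, but runs on to the LAST ">Yarp.
Local $aGreedy = StringRegExp($sDemo, 'href="(.*)">Yarp', 1)
; Negated class: the capture cannot cross a quote, so the attempt at the
; first href=" fails and the engine moves on to the next one.
Local $aClass = StringRegExp($sDemo, 'href="([^"]*)">Yarp', 1)

If IsArray($aLazy) Then ConsoleWrite("Lazy:   " & $aLazy[0] & @CRLF)     ; a.aspx">Nope</A> <A href="b.aspx
If IsArray($aGreedy) Then ConsoleWrite("Greedy: " & $aGreedy[0] & @CRLF) ; a.aspx">Nope</A> <A href="b.aspx">Yarp</A> <A href="c.aspx
If IsArray($aClass) Then ConsoleWrite("Class:  " & $aClass[0] & @CRLF)   ; b.aspx
```

Lazy and greedy differ only in where the capture ends. Neither changes where it begins, which is why restricting what the capture may contain is what fixes the original pattern.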

Edited by Factfinder
Posted (edited)
because ? doesn't work backwards

This is what you and DXRW4E have both been pointing out to me, which is now clear.

I wasn't understanding why the capturing group was not lazy, because I wasn't putting together the fact that the "?" doesn't work backwards.  This makes perfect sense to me now, thank you both!

EDIT: Marking Factfinder's post as the solution, however DXRW4E, I understand you were pointing out the same thing to me, I just didn't get it until his post spelled it out for me.  Thank you both!

Edited by DW1
