
[Solved] StringRegExp with offset and start of string anchor



I'm having an issue with StringRegExp when using the offset parameter together with the start-of-string anchor (^): the match fails if the offset is greater than 1.

Is this a bug, or is it supposed to work like that?

See example below

StringRegExp("abc", "^[a-z]", 1, 1)
ConsoleWrite(@error&@CRLF);success
StringRegExp("abc", "^[a-z]", 1, 2)
ConsoleWrite(@error&@CRLF);failure

Thanks in advance

Edited by genius257

They should both error; the caret goes on the inside:

StringRegExp("abc", "[^a-z]", 1, 1)
ConsoleWrite(@error&@CRLF);failure
StringRegExp("abc", "[^a-z]", 1, 2)
ConsoleWrite(@error&@CRLF);failure

 

StringRegExp("abc", "abc", 1, 1)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "bc", 1, 2)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "c", 1, 3)
ConsoleWrite(@error&@CRLF)

;errors

StringRegExp("abc", "ab", 1, 3)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "a", 1, 2)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "[^abc]", 1, 1)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "[^bc]", 1, 2)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "[^c]", 1, 3)
ConsoleWrite(@error&@CRLF)

 


NO.

First RegEx is to get the "a", second RegEx is to get the "b"

From the documentation:

Quote
Outside a character class, the caret matches at the start of the subject text, and also just after a non-final newline sequence if option (?m) is active. By default the newline sequence is @CRLF.
Inside a character class, a leading ^ complements the class (excludes the characters listed there).

 


Ahh, I reversed it. Context-free is tough, but that's a start point, so it gets 'abc' and then 'bc'.

Edit: tested real quick, and with dashes I'm getting the largest subset, not the smallest subset of the group. Running more tests.

Edited by iamtheky


Where'd you get the quote from? If you put the caret there, you only get the first character, and only if it's a letter, and only if it's lowercase. What is the desired end goal?

 

#include <Array.au3>

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 1)
_ArrayDisplay($aMatch)

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 2)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 3)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 4)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("A0bc", "^[a-z]", 3, 1)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("A0bc", "^[a-z]", 3, 4)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

 

Edited by iamtheky


The quote is from the StringRegExp documentation, in the Anchors table under Remarks.

I'm iterating through a string, looking for exact matches:

Global $Types[][] = [ _
    ['^("[^"]*"|''''[^'''']*'''')',"String"], _
    ["^\$[_a-zA-Z0-9]+","Variable"] _
]

$sOutput = ""

$sInput = '$var = "this is a test"'
$iOffset = 1

#include <Array.au3>

While 1
    StringRegExp($sInput, "^\s*(\S)", 1, $iOffset)
    If @error<>0 Then ExitLoop
    $iOffset = @extended

    For $i=0 To UBound($Types, 1)-1
        $a = StringRegExp($sInput, $Types[$i][0], 1, $iOffset-1)
        If @error=0 Then
            $iOffset=@extended
            $sOutput&=$Types[$i][1]&";"
            ExitLoop
        EndIf
    Next
WEnd

I know there are better ways of doing this; I'm just wondering whether it's supposed to fail when using "^" with an offset greater than 1.

Edited by genius257

5 minutes ago, genius257 said:

I'm just wondering if it's supposed to fail when using "^" and offset greater than 1

Obviously yes!
^ matches at the start of the subject text, while offset is the string position at which to start the match.
The start of the subject text is offset 1, so with any other offset (> 1) the ^ anchor can't match, unless you use a workaround  :)


Thanks @mikell.

It seems silly to me. As I see it, the offset would define where the string gets trimmed and matched, but I guess not.

Guess I'll have to substring it myself and just add the @extended to the offset... >.>

Edited by genius257

14 minutes ago, genius257 said:

guess I'll have to substring it myself and just add the return to the offset...

This is the workaround indeed  :)
Using offset, you force the position where the match starts, so you'll run into trouble if you combine it with the ^ anchor in the pattern

$offset = 1
$res = StringRegExp("a123b456c", "[a-z]", 1, $offset)
$offset = @extended
ConsoleWrite($res[0]&@CRLF)
$res = StringRegExp("a123b456c", "[a-z]", 1, $offset)
$offset = @extended
ConsoleWrite($res[0]&@CRLF)
$res = StringRegExp("a123b456c", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)

 

Edited by mikell

14 minutes ago, mikell said:

This is the workaround indeed  :)
Using offset, you force the position where the match starts, so you'll run into trouble if you combine it with the ^ anchor in the pattern

$offset = 1
$res = StringRegExp("abc", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)
$offset += StringLen($res[0])
$res = StringRegExp("abc", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)
$offset += StringLen($res[0])
$res = StringRegExp("abc", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)

 

kinda.

more like:

StringRegExp(StringMid($sInput, $offset), "^[a-z]", 1)

but it works now i guess..

Edited by genius257
forgot the anchor in the pattern

The main problem with your solution is that without the anchor, it will match anywhere in the string. That makes it useless if the purpose is to iterate through the string and process every char, or do something else should the match(es) fail.

I appreciate all the help ;)

Anyway this is my result (I think my offset calculation will be wrong in some cases and should be adjusted at a later time, but it works for now :))

Global $Types[][] = [ _
    ['^("[^"]*"|''''[^'''']*'''')',"String"], _
    ["^\$[_a-zA-Z0-9]+","Variable"] _
]

$sOutput = ""

$sInput = FileRead(@ScriptFullPath)
$sInput = '$var="this is a test" &"test"'
$iOffset = 1

While 1
    StringRegExp(StringMid($sInput, $iOffset), "^\s*(\S)", 1)
    If @error<>0 Then ExitLoop
    $iOffset += @extended-1

    ConsoleWrite(StringMid($sInput, $iOffset-1)&@CRLF)

    $bMatch=False
    For $i=0 To UBound($Types, 1)-1
        $a = StringRegExp(StringMid($sInput, $iOffset-1), $Types[$i][0], 1)
        If @error=0 Then
            $iOffset+=@extended-2
            $sOutput&=$Types[$i][1]&";"
            $bMatch=True
            ExitLoop
        EndIf
    Next
    If Not $bMatch Then $sOutput&="Unknown"&";"
WEnd

MsgBox(0, "", $sOutput)

 


2 minutes ago, AspirinJunkie said:

Maybe I misunderstood something, but if you use an offset in StringRegExp and want to match from the beginning of the current position, then you have to use \G instead of ^:

StringRegExp("abc", "\G[a-z]", 1, 1)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "\G[a-z]", 1, 2)
ConsoleWrite(@error&@CRLF)

 

Ah! you are right!

Thank you ^^' Totally missed that.
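
For anyone landing here later, a rough sketch of how the iteration from earlier in the thread might look with \G and the offset parameter, so no StringMid is needed. It assumes \G anchors at the offset passed to StringRegExp (as the two calls above suggest) and that @extended holds the offset just past the match. The token table is trimmed to double-quoted strings and variables, and the names ($aTypes, $bMatch) are only illustrative.

```autoit
; Sketch only: the token table and variable names are illustrative assumptions.
Local $aTypes[2][2] = [ _
    ['\G"[^"]*"', "String"], _
    ['\G\$[_a-zA-Z0-9]+', "Variable"] _
]
Local $sInput = '$var = "this is a test"'
Local $sOutput = ""
Local $iOffset = 1
Local $bMatch

While $iOffset <= StringLen($sInput)
    ; Skip whitespace that starts exactly at the current offset.
    StringRegExp($sInput, '\G\s+', 1, $iOffset)
    If @error = 0 Then $iOffset = @extended
    If $iOffset > StringLen($sInput) Then ExitLoop

    $bMatch = False
    For $i = 0 To UBound($aTypes, 1) - 1
        ; \G only matches at $iOffset, so a hit is a token at this position.
        StringRegExp($sInput, $aTypes[$i][0], 1, $iOffset)
        If @error = 0 Then
            $iOffset = @extended ; offset just past the match
            $sOutput &= $aTypes[$i][1] & ";"
            $bMatch = True
            ExitLoop
        EndIf
    Next

    If Not $bMatch Then
        ; No pattern matched here: record it and step over one character.
        $sOutput &= "Unknown;"
        $iOffset += 1
    EndIf
WEnd

ConsoleWrite($sOutput & @CRLF)
```

Because every pattern starts with \G, a failed match means "no token of this type at this position" rather than "none anywhere after it", which is what the ^ version was trying to express.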
