steve8tch Posted October 13, 2009

Please consider the following code:

    ; RegExp error?
    $str = "Error"
    $ptrn = "(A.*?or)|(E.*?or)"
    $a = StringRegExp($str, $ptrn, 3) ; flag 3 = return an array of global matches
    $out = ""
    If @error = 0 Then
        For $i = 0 To UBound($a) - 1
            $out &= $i & " => " & $a[$i] & @CRLF
        Next
        MsgBox(0, "Found these matches", $out)
    Else
        MsgBox(0, "", "Shouldn't get here")
    EndIf

My pattern is looking for a word starting with "E" OR "A" and ending with "or". I am expecting just 1 match: the first part of the OR statement should fail, and the second part should give me a match. But in fact I get 2 matches:

    0 =>
    1 => Error

Does anyone have any thoughts?

Steve
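The result Steve describes can be reproduced outside AutoIt. The following is a minimal Python sketch, not AutoIt's implementation: it assumes, based on the output reported in this thread, that StringRegExp flag 3 appends, for each match, capture groups 1 through the highest-numbered group that took part in that match, with non-participating groups reported as empty strings. The helper name `flag3_matches` is hypothetical, and Python's `re` module is not PCRE, so treat it as an approximation.

```python
import re


def flag3_matches(pattern, text):
    """Hypothetical approximation of AutoIt's StringRegExp(text, pattern, 3).

    For every non-overlapping match, append groups 1..k, where k is the
    highest-numbered group that participated in that match; groups that
    did not take part are reported as "" (assumption based on this thread).
    """
    result = []
    for m in re.finditer(pattern, text):
        used = [i for i in range(1, m.re.groups + 1) if m.group(i) is not None]
        last = max(used, default=0)
        result.extend(m.group(i) or "" for i in range(1, last + 1))
    return result


print(flag3_matches(r"(A.*?or)|(E.*?or)", "Error"))  # ['', 'Error']
```

Group 1, `(A.*?or)`, never matches, but because group 2 did, group 1 still occupies slot 0 as an empty string.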
Malkey Posted October 13, 2009 (edited)

Using this pattern appears to work in all cases. Edit: Except lower case.

    $ptrn = "((?:A|E).*?or)"

2nd Edit: Use this next pattern for case-insensitive matching.

    $ptrn = "(?i)((?:A|E)[^\h]*?or)" ; Case insensitive

Edited October 13, 2009 by Malkey
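A quick check of Malkey's two patterns, sketched in Python's `re` (an assumption: Python stands in for PCRE here). Both patterns have a single capturing group, so `re.findall` returns exactly one string per match. Python's `re` has no `\h` escape, so the second pattern swaps `[^\h]` for `\S`; that is an approximation, because `\S` also rejects newlines while `[^\h]` does not.

```python
import re

# Malkey's first pattern: one capturing group around the whole alternation.
case_sensitive = r"((?:A|E).*?or)"
print(re.findall(case_sensitive, "Error"))  # ['Error']
print(re.findall(case_sensitive, "error"))  # [] (lower-case "e" is not matched)

# Case-insensitive variant; \S replaces [^\h] because Python's re lacks \h.
case_insensitive = r"(?i)((?:A|E)\S*?or)"
print(re.findall(case_insensitive, "error Actor"))  # ['error', 'Actor']
```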
Authenticity Posted October 13, 2009 (edited)

... My pattern is looking for a word starting with "E" OR "A" and ending with "or" ...

First, your expression is not restricted to words: it looks for any "A" or "E", anywhere on the line, that is followed later on that line by "or". The empty match appears because, even though the first group (the first pair of parentheses) does not participate in the match, it is still captured, as an empty value, at its own index (the first one, or wherever it sits in the pattern). So it is enough for the overall pattern to match for that group to be captured.

Another alternative to Malkey's pattern is:

    ([AE].*?or)

Or, using word boundaries:

    \b([AE].*?or)\b

Hope that's clear.

Edit: Just to make this more accurate, assume you have this pattern:

    (\w)(\d)

If you read about back-references, you'll come to understand this:

    (\w)(\d)(?(1)\w)(?(2)\d)

...so if the first group matches, the second captures nothing, but that much is obvious. If the second group matches, you want the second conditional sub-pattern to refer to it by its own number, 2, rather than having to guess which index it would end up at if the first group were not captured. So if you have 9 groups, you definitely want all of them to keep their slots, even if only one contains a value, because of this back-reference mechanism.

Edited October 13, 2009 by Authenticity
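Authenticity's point, that a group keeps its number whether or not it takes part in a match, can be seen in Python's `re` module, which numbers groups the same way. This is a sketch with Python standing in for PCRE: Python reports a non-participating group as `None`, where AutoIt shows an empty string. The last line also shows why the pattern is not limited to single words: `.*?` happily crosses spaces.

```python
import re

# Group 1 does not take part, but group 2 is still group 2.
m = re.search(r"(A.*?or)|(E.*?or)", "Error")
print(m.groups())  # (None, 'Error')
print(m.group(2))  # Error

# With a single capturing group there is only one slot per match.
print(re.findall(r"([AE].*?or)", "Error"))  # ['Error']

# .*? is not limited to word characters, so one match can span several words.
print(re.findall(r"\b([AE].*?or)\b", "Error Actor Aardvark color"))
# ['Error', 'Actor', 'Aardvark color']
```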
steve8tch (Author) Posted October 13, 2009

Authenticity, Malkey - thanks for your detailed replies. I realised later that the example I gave did not fully represent the issue I am having. A better example would be this:

    ; RegExp error?
    $str = "Error Success"
    $ptrn = "(Suc.*?cess)|(E.*?or)"
    $a = StringRegExp($str, $ptrn, 3) ; flag 3 = return an array of global matches
    $out = ""
    If @error = 0 Then
        For $i = 0 To UBound($a) - 1
            $out &= $i & " => " & $a[$i] & @CRLF
        Next
        MsgBox(0, "Found these matches", $out)
    Else
        MsgBox(0, "", "Shouldn't get here")
    EndIf

The result here is:

    0 =>
    1 => Error
    2 => Success

I understand (I think) why the regular expression engine needs to know what the previous match is (for back-referencing purposes), but back-referencing is not used here, and the results should not include failed matches. I have passed this pattern through online PCRE engines and RegexBuddy, and they all give the expected results.

Just to let you know, I have tried to create an example here to show the issue. In practice I am examining inputs and outputs to machine tools. The results are not words, but sequences of characters and white space. The indexing is important because I query the tools for a certain number of values and I am looking for a certain number of replies. AutoIt returning a value for a failed OR branch is not helpful, because each reply's index has to line up with the query it answers.

I am really quite keen to get a solution - I have a few million lines of log files to examine for a particular issue. Thanks in advance for any assistance you can give me.
PsaltyDS Posted October 13, 2009 (edited)

The result you got matches what I see in online PCRE tests. For example, here:

    Matches (Index followed by matched text):
    1. Full pattern match: Error at offset 0
       Sub-string #1:
       Sub-string #2: Error
    2. Full pattern match: Success at offset 6
       Sub-string #1: Success

Here is what's happening: because you used an "or" expression (the pipe symbol), each time one of the alternatives matches, the result lists the sub-groups in order, up to and including the one that matched. When it reads "Error", that matches the second of the two alternatives, so it returns both sub-groups: "[UNDEFINED]" for (Suc.*?cess) and "Error" for (E.*?or). That's why you get the empty entry in [0] of the returned array. This doesn't happen when it gets to "Success", because it only lists sub-groups up to the one that matched, and this time that was the first sub-group.

To see it even more clearly, use:

    $ptrn = "(NeverMatch)|(Suc.*?cess)|(E.*?or)"

And then you get:

    0 = ""
    1 = ""
    2 = Error
    3 = ""
    4 = Success

To fix it, make the sub-groups non-capturing and return only the overall result of the alternation as a single group:

    $ptrn = "((?:Suc.*?cess)|(?:E.*?or))"

Edited October 13, 2009 by PsaltyDS
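PsaltyDS's rule ("only lists sub-groups up to the one that matched") can be modelled in a few lines. This is a Python sketch under an assumption, not AutoIt's source: the hypothetical helper `flag3_matches` emits groups 1 through the highest-numbered participating group for each match and reports missing groups as empty strings, which reproduces the outputs quoted in this thread for all three patterns.

```python
import re


def flag3_matches(pattern, text):
    # Hypothetical emulation of StringRegExp(..., 3): per match, emit groups
    # 1..k, where k is the highest-numbered participating group; missing -> "".
    result = []
    for m in re.finditer(pattern, text):
        used = [i for i in range(1, m.re.groups + 1) if m.group(i) is not None]
        result.extend(m.group(i) or "" for i in range(1, max(used, default=0) + 1))
    return result


text = "Error Success"
print(flag3_matches(r"(Suc.*?cess)|(E.*?or)", text))
# ['', 'Error', 'Success']
print(flag3_matches(r"(NeverMatch)|(Suc.*?cess)|(E.*?or)", text))
# ['', '', 'Error', '', 'Success']
print(flag3_matches(r"((?:Suc.*?cess)|(?:E.*?or))", text))
# ['Error', 'Success']
```

Because the fixed pattern has exactly one capturing group, and that group takes part in every match, each match contributes exactly one element, so the array index follows the order of the matches in the input.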
steve8tch (Author) Posted October 13, 2009

PsaltyDS, thank you for this help. The explanation you gave (and the solution) is brilliant. Thanks again to Authenticity and Malkey for looking at this. Hopefully the explanations you have given will be as helpful to others as they have been to me.

Steve