Jon (Administrators), posted October 3, 2006

> As buggy as David's implementation was, at least simple patterns I expect to work... do. Is PCRE really just retarded or am I missing something completely obvious?

I wish I knew enough about expressions to comment. I'm pretty much limited to using the API based on the documentation and then relying on you guys and the test exe to see if it's working OK. But unless I compiled it incorrectly, it should be working as intended.

Deployment Blog: https://www.autoitconsulting.com/site/blog/
SCCM SDK Programming: https://www.autoitconsulting.com/site/sccm-sdk/
Jon (Administrators), posted October 3, 2006

http://www.regextester.com/
Jon (Administrators), posted October 3, 2006

This one as well: http://www.lumadis.be/regex/test_regex.php?lang=en

Hmm, do we not need the old array[0] value?

      re> /(Foo)*/g
    data> FooFooFoo
     0: FooFooFoo
     1: Foo
     0: 

So the 0 value (which we are now throwing away) does indeed match the entire thing, and the first captured subpattern is the single Foo?
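As a cross-check of the pcretest output above, here is a small sketch using Python's `re` module (Python 3.7 or later). It is not PCRE, but it follows the same rules as Perl for this pattern: group 0 is the whole match, group 1 keeps only the last `Foo` the repeated group captured, and a global scan finds a second, empty match at the end of the string in which group 1 did not take part.

```python
import re

# Single match: group 0 is the entire match; group 1 holds only the
# LAST iteration of the repeated capturing group (Foo)*.
m = re.search(r"(Foo)*", "FooFooFoo")
print(m.group(0), m.group(1))  # FooFooFoo Foo

# Global scan (like pcretest's /g): after the whole string is consumed,
# an empty match is still possible at the end. Group 1 did not
# participate in that empty match, so Python reports None for it.
matches = [(mm.group(0), mm.group(1)) for mm in re.finditer(r"(Foo)*", "FooFooFoo")]
print(matches)  # [('FooFooFoo', 'Foo'), ('', None)]
```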
trids, posted October 3, 2006

> I found what I think might be an issue. Here is the code: ..

The output you got (1 match) is what I expect, because what matched was the entire pattern (i.e. including the "Unique\s*").
thomasl, posted October 3, 2006

A general remark about REs: it is easy to produce meaningless patterns that actually crash an RE engine, a sort of "while 1 ... wend" in RE syntax. What happens depends on the actual implementation. Therefore such an effect is not necessarily a bug in PCRE or in Jon's implementation of PCRE; it all depends on the pattern.

As to (.*?): this is a pattern that matches 0 or more of whatever (that's .*) but is not greedy (the ?). So on any string (say "test") it first matches at position 0 and returns an empty string. Then it matches the "t", then the empty string between "t" and "e", then the "e", then the empty string between "e" and "s", and so on, ending with an empty match at the end of the string. That is 5 empty strings plus 4 characters: 9 matches in all.

(.+?) returns exactly the four matches "t", "e", "s", "t", as one would expect.

I agree that REs can be hell, but then again they are a completely logical hell.

EDIT:

> I wish I knew enough expressions to comment. I'm pretty much limited to using the API based on the documentation and then relying on you guys and the test exe to see if it's working OK. But unless I compiled it incorrectly then it should be working as intended.

PCRE does work as intended. It is used in dozens of high-profile apps. The fact that patterns don't do what people expect probably reflects more on their understanding of REs (or lack thereof) than on actual errors in PCRE. (Note the "probably": this is not to say that PCRE has no bugs; it surely has. But if used correctly, it tends to work correctly.)

One of the good things about PCRE (and Perl REs in general) is that they are well documented, so it shouldn't be too difficult to get the hang of them. Much of what has been written in this thread is a classic case of RTFM.

As to the patterns Nutster's code accepted (and the results it delivered), I would take these with a pinch of salt. They were definitely not Perl compatible.

Edited October 3, 2006 by thomasl
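The counting above can be checked with a short sketch in Python's `re` module, used here as a stand-in for PCRE. This assumes Python 3.7 or later, whose empty-match rule (a match may not be empty at the same position as the previous empty match) gives the same results as Perl for these two patterns.

```python
import re

s = "test"

# Lazy "zero or more": at each position the empty match is tried first.
# The next match at that same position must be non-empty, so it takes
# one character, and the cycle repeats. The string end adds one more "".
lazy_star = re.findall(r"(.*?)", s)
print(len(lazy_star), lazy_star)  # 9 ['', 't', '', 'e', '', 's', '', 't', '']

# Lazy "one or more": every match must consume at least one character.
lazy_plus = re.findall(r"(.+?)", s)
print(len(lazy_plus), lazy_plus)  # 4 ['t', 'e', 's', 't']
```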
thomasl, posted October 3, 2006

I have now downloaded the newest build and played a bit with it. My batch of patterns still works (though that's mostly ...Replace() stuff with backreferences etc.). What doesn't work at all is StringRegExp() with flag=3, i.e. global match:

    $s = "test"
    $b = StringRegExp($s, "(.*?)", 3)
    For $i = 0 To UBound($b) - 1
        ConsoleWrite("!" & $b[$i] & "!" & @CRLF)
    Next

This should return the nine strings detailed in my other post above. Simpler patterns like a lone . also don't work.

Edited October 3, 2006 by thomasl
Jon (Administrators), posted October 3, 2006

> What doesn't work at all is StringRegExp(), flag=3, ie global match. [...] This should return the nine strings as detailed in my other post, above. Simpler patterns like a lone . also don't work.

It's working here (9 strings). I'm about to upload a new build in 10 mins, so try again with that.
Jon (Administrators), posted October 3, 2006

OK, new build: http://www.autoitscript.com/autoit3/files/...utoIt3-pcre.exe

I added options 2 and 4:

Option 2: same as option 1, but it also returns the full match in array[0] (like preg_match()).
Option 4: same as option 3, but it returns an array of arrays. Each sub-array is like the single return value from option 2. This is like the PHP preg_match_all() return value.

Examples:

    ; Option 2, single return, php/preg_match() style
    $array = StringRegExp('<test>a</test> <test>b</test> <test>c</Test>', '<(?i)test>(.*?)</(?i)test>', 2)
    For $i = 0 To UBound($array) - 1
        MsgBox(0, "Option 2 - " & $i, $array[$i])
    Next

    ; Option 3, global return, old AutoIt style
    $array = StringRegExp('test', '(.*?)', 3)
    For $i = 0 To UBound($array) - 1
        MsgBox(0, "Option 3 - " & $i, $array[$i])
    Next

    ; Option 4, global return, php/preg_match_all() style
    $array = StringRegExp('<test>a</test> <test>b</test> <test>c</Test>', '<(?i)test>(.*?)</(?i)test>', 4)
    For $i = 0 To UBound($array) - 1
        $match = $array[$i]
        For $j = 0 To UBound($match) - 1
            MsgBox(0, "Option 4 - " & $i & ',' & $j, $match[$j])
        Next
    Next
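To make the return shapes of options 2 and 4 concrete, here is a hedged sketch in Python's `re`. The helpers `option2_like` and `option4_like` are hypothetical names that only mimic the shapes Jon describes; they are not AutoIt's implementation. Python rejects a bare `(?i)` in the middle of a pattern (an error since 3.11), so the scoped form `(?i:test)` stands in for PCRE's `<(?i)test>`; for these sample strings the effect is the same.

```python
import re

def option2_like(subject, pattern):
    """First match only: [full match, group 1, ...] (preg_match style)."""
    m = re.search(pattern, subject)
    return [m.group(0), *m.groups()] if m else []

def option4_like(subject, pattern):
    """Every match: a list of [full match, group 1, ...] lists (preg_match_all style)."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, subject)]

subject = "<test>a</test> <test>b</test> <test>c</Test>"
pattern = r"<(?i:test)>(.*?)</(?i:test)>"  # scoped case-insensitivity on the tag name

print(option2_like(subject, pattern))
# ['<test>a</test>', 'a']
print(option4_like(subject, pattern))
# [['<test>a</test>', 'a'], ['<test>b</test>', 'b'], ['<test>c</Test>', 'c']]
```', 'a']
assert option4_like(subject, pattern) == [['<test>a</test>', 'a'], ['<test>b</test>', 'b'], ['<test>c</Test>', 'c']]
assert option2_like("nothing here", pattern) == []
</test>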
MHz, posted October 3, 2006

> This one as well. http://www.lumadis.be/regex/test_regex.php?lang=en

The Regex Coach is a nice, free, PCRE-compatible regex test program that may help with patterns and results.
thomasl, posted October 3, 2006

> Ok, new build: http://www.autoitscript.com/autoit3/files/...utoIt3-pcre.exe I added option 2 and 4.

Thx. (The previous build did work after all, after I got me flaming paths sorted.)

Now all is well. Well, almost... StringRegExp("F1oF2oF3o", "(F.o)*?", 3) should give seven matches. AU3 gives only three, omitting the four empty matches (the other example, "(.*?)", works):

    --
    AU3 :F1o
    AU3 :F2o
    AU3 :F3o
    Perl:
    Perl:F1o
    Perl:
    Perl:F2o
    Perl:
    Perl:F3o
    Perl:
    --

I will continue to throw REs at it.

EDIT: Mode 4 hangs with StringRegExp("test", "(.*?)", 4).

Edited October 3, 2006 by thomasl
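The seven Perl results can be reproduced with Python's `re` (3.7 or later), offered here only as a reference implementation of the same empty-match rule, not as PCRE itself. The hang in mode 4 is the classic failure of a global-match loop: if each new search simply restarts at the end of the previous match, an empty match leaves the offset unchanged and the loop never advances. Python's `finditer` guards against that internally; PCRE's own demo program (pcredemo.c, in PCRE 8.x) instead retries at the same offset with `PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED` and advances one character if that fails.

```python
import re

# Lazy repetition of a group: an empty match is tried first at each
# position; the next match at that same position must be non-empty,
# so it consumes one "F.o". The string end contributes a final "".
found = [m.group(0) for m in re.finditer(r"(F.o)*?", "F1oF2oF3o")]
print(len(found), found)  # 7 ['', 'F1o', '', 'F2o', '', 'F3o', '']
```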
Jon (Administrators), posted October 3, 2006

Fixed: http://www.autoitscript.com/autoit3/files/...utoIt3-pcre.exe
thomasl, posted October 3, 2006

> Fixed

Here's another thing to chew over:

    $s = "test" & @CRLF & "test"
    ConsoleWrite($s & @CRLF)
    $s = StringRegExpReplace($s, ".", "!")
    ConsoleWrite($s & @CRLF)

This replaces everything with the exception of the LF (i.e. it also replaces the CR):

    test
    test
    !!!!!
    !!!!

Now this whole CR/LF handling is a thorny problem anyway. Perl REs have an option that switches between \n (which Perl assumes to be "\n" under *x and "\r\n" under Win32) being treated like a string terminator (i.e. not matched by a .) or as just another character. Your code seems to work under the assumption that \n is a terminator, not a normal character, which is fine for most matches and replaces (though at some point there should be an option to switch this off). But I am not sure about the semantics in terms of coding for AU3: if LF is not replaced, perhaps CR shouldn't be either.

EDIT: Here's more. StringRegExp("test" & @CRLF & "test", ".", 3) works as expected: nine matches (2*4 for the two "test"s and 1 for the CR). OTOH, StringRegExp("test" & @CRLF & "test", "(.*?)", 3) simply stops matching after the LF.

Edited October 3, 2006 by thomasl
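The same CR/LF asymmetry shows up in Python's `re`, sketched below as an analogy rather than a statement about AutoIt's build: by default `.` matches everything except LF, so the CR is replaced and the LF survives. The switch thomasl mentions corresponds to `re.DOTALL` in Python and to `PCRE_DOTALL` (or inline `(?s)`) in PCRE.

```python
import re

s = "test\r\ntest"  # 10 characters: 4 + CR + LF + 4

# Default: "." matches any character except "\n", so the CR becomes "!"
# but the LF is left alone.
plain = re.sub(r".", "!", s)
print(repr(plain))   # '!!!!!\n!!!!'

# With DOTALL, "." also matches "\n", so all 10 characters are replaced.
dotall = re.sub(r".", "!", s, flags=re.DOTALL)
print(repr(dotall))  # '!!!!!!!!!!'

# Global match with ".": 4 + 1 (the CR) + 4 = 9 matches.
count = len(re.findall(r".", s))
print(count)         # 9
```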
Valik, posted October 3, 2006

> The output you got (1 match) is what I expect ... because what matched was the entire pattern (i.e. including the "Unique\s*").

Please explain to me what's going on, then, because from what I understand about regular expressions, it should start matching on Unique, and once that part of the pattern matches, it moves to the first Foo, which also matches the pattern. Then, because of the repetition operator, it should move to the next and final Foo in the string, which still matches because we are repeatedly capturing Foos.

What I was trying to do was find a unique position in a string which is then followed by one or more lines of data, followed by an empty line. I wanted to capture the lines of data individually. An example of the string:

    Unique
    Data Line 1
    Data Line 2

Note that the example is basically like my AutoIt code above. Also, the "s*" should be "\s*". I don't know why, but the forum stripped the escape sequence.
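Because a single match returns only the last iteration of a repeated group, one common way to collect each data line is a two-step approach: match the whole block, then split it. The sketch below uses Python's `re` and a hypothetical sample string built from Valik's description (CRLF line endings, a blank line after the data); it illustrates the technique and is not the code from the thread.

```python
import re

# Hypothetical sample: marker line, data lines, then an empty line.
text = "Unique\r\nData Line 1\r\nData Line 2\r\n\r\nSomething else"

# Step 1: capture the block of non-empty lines after the marker,
# stopping at the first empty line. [^\r\n]+ keeps CRs out of the lines.
m = re.search(r"Unique\r?\n((?:[^\r\n]+\r?\n)+)\r?\n", text)

# Step 2: split the captured block into individual lines.
lines = m.group(1).splitlines() if m else []
print(lines)  # ['Data Line 1', 'Data Line 2']
```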
Jon (Administrators), posted October 3, 2006

> Your code seems to work under the assumption that \n is a terminator, not a normal character, which is fine for most matches and replaces (though at some point there should be an option to switch this off). But I am not sure about the semantics in terms of coding for AU3: if LF is not replaced, perhaps CR shouldn't either.

Can't comment on the other stuff (I'm right at the limit of my knowledge now), but I found an option in pcrelib, set at compile time, that says you can specify a newline as \n or \r (a single char); it doesn't seem to have any option for \r\n. Our library was compiled with \n specified. It may be that when using CRLF sequences you have to strip them with StringStripCR() first to get the expected results. Dunno.
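The effect of an LF-only newline convention on line anchors, and why stripping CRs first helps, can be sketched in Python's `re`, which also treats only `\n` as the line terminator. Here `s.replace("\r", "")` plays the role of AutoIt's StringStripCR(); this is an analogy, not a test of the AutoIt build.

```python
import re

s = "test\r\ntest"

# In MULTILINE mode "$" matches only before "\n" (or at the end), so the
# CR at the end of the first line stops "^test$" from matching there.
first = re.findall(r"^test$", s, flags=re.MULTILINE)
print(first)   # ['test']

# After removing the CRs the line endings are plain LF, and both lines match.
second = re.findall(r"^test$", s.replace("\r", ""), flags=re.MULTILINE)
print(second)  # ['test', 'test']
```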
Jon (Administrators), posted October 3, 2006

> Please explain to me what's going on then because from what I understand about regular expressions, it

Is this any closer? *makes straw-grasping motion*

      re> /(?U)(Foo)*/g
    data> FooFooFoo
Valik, posted October 3, 2006

Jon, I think that option sets what character(s) \n means. It can be either LF (probably the default), CR or CRLF. I'm pretty sure I saw a flag in the documentation that sets it to CRLF, too. IMO, leaving \n to mean LF is fine because we can build a CRLF sequence with \r\n. However, it shouldn't affect \s, which is what I used above, because \s matches all whitespace characters, and because of the repetition it'll catch both CR and LF.
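Valik's point about `\s` can be illustrated in Python's `re` (as an analogy for PCRE): `\s` is a whitespace character class that includes both CR and LF, so a repeated `\s*` consumes a whole CRLF, or several, regardless of what counts as the newline for `.`, `^` and `$`.

```python
import re

# \s matches space, tab, CR and LF alike.
ws = re.findall(r"\s", " \t\r\n")
print(ws)       # [' ', '\t', '\r', '\n']

# So \s* crosses CRLF line breaks and blank lines in one go.
crossed = bool(re.match(r"Unique\s*Foo", "Unique\r\n\r\nFoo"))
print(crossed)  # True
```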
Jon (Administrators), posted October 3, 2006

The code is looking pretty good to me (the problems are differences in RE behavior rather than buggy code). So the important question is: who can write the help file page on this? Because I certainly can't!
trids, posted October 3, 2006

> .. What I was trying to do was find a unique position in a string which is then followed by one or more lines of data followed by an empty line. I wanted to capture the lines of data individually. [...] Note that the example is basically like my AutoIt code above. ..

OK, then with this RE, "Unique\s*(?(Foo)\s*))*", using PCRE calls via thomasl's wrapper, I get two "Foo"s. HTH
Jon (Administrators), posted October 3, 2006

I swear anyone who understands this stuff is clinically insane.
Valik, posted October 3, 2006

> ok .. then this RE "Unique\s*(?(Foo)\s*))*" .. with PCRE calls via Thomasl's wrapper i get two "Foo"s HTH

Alright, that works. Now explain to me why. All you did was add another capture. How does that magically get it working?

Edit: And I can simplify that to "Unique\s*((Foo)\s*)*" and it still works, further adding to my confusion. If that simplified form works, why does the non-capturing form not work?

Edited October 3, 2006 by Valik
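One way to see what the simplified pattern actually returns is to run it in Python's `re`, which applies the same rule as Perl and PCRE here: in a single match, each capturing group keeps only the text from its last iteration. With hypothetical data (a marker followed by two "Foo" lines), both groups come from the second "Foo", which may be why two "Foo"s appear even though the lines are not captured individually. Collecting every "Foo" takes a separate global match over the matched span.

```python
import re

# Hypothetical data: a marker, two "Foo" lines, a blank line, more text.
text = "Unique\nFoo\nFoo\n\nOther"

m = re.search(r"Unique\s*((Foo)\s*)*", text)
# Group 1 = last iteration of ((Foo)\s*), group 2 = last iteration of (Foo).
print(m.groups())  # ('Foo\n\n', 'Foo')

# To get every "Foo", run a global match over the text the pattern matched.
every_foo = re.findall(r"Foo", m.group(0))
print(every_foo)   # ['Foo', 'Foo']
```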