Imbuter2000 Posted April 22, 2016

I'm really going mad over a regular expression that stops working if I add an alternation, or just invert its order. Try this script:

    $screen = "" _
        & " AFASN0M A N A G R A F E INTERROGAZIONE SINTETICA - ABC 22/04/16 " _
        & @CRLF & "=============================================================================== " _
        & @CRLF & "Ndg: 12345678 Tipo: 10100 P.F. - EFF/COMPL. Sorvegl. " _
        & @CRLF & "GHILARDI ABCDEF " _
        & @CRLF & "VIA IV OTTOBRE 12 c/o " _
        & @CRLF & "12345 PONTENUCOLA BG Cittad: ITALIANA " _
        & @CRLF & "Nato il: 12/12/1912 A: MARISOLE Prov.: BG Sesso: M " _
        & @CRLF & "C.F. ABCDEFGHM14I858O P.I. Tel. " _
        & @CRLF & "SAE/RAE 200/ Data Visura: 12/12/1912 Prof. 000007 " _
        & @CRLF & "Cod. C.R. 1234567890 Data: 01/2000 Cod. C.R.A. Data: " _
        & @CRLF & "Segmento cliente: 12 MASS " _
        & @CRLF & "Informazioni da altre societa' del gruppo "
    local $regex_one="\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
        & "\r\n"
    local $regex_two="\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +"
    $regex_test_one = "(?s:"&$regex_one&"|"&$regex_two&")" ; it DOESN'T MATCH :(
    $regex_test_two = "(?s:"&$regex_two&"|"&$regex_one&")" ; it MATCHES :)
    $regex_test_simple = $regex_two ; it MATCHES :)
    If StringRegexp($screen,$regex_test_one) = 1 Then
        Msgbox(0,"","match")
    Else
        Msgbox(0,"","NOT match")
    EndIf

What the hell is the reason why $regex_test_one doesn't match while $regex_test_two and $regex_test_simple match, when they should all match????!
jchd Posted April 22, 2016

Your code doesn't behave the way you seem to believe it does. A more verbose version (subject and expressions unchanged):

    #include <Array.au3>
    $screen = "" _
        & " AFASN0M A N A G R A F E INTERROGAZIONE SINTETICA - ABC 22/04/16 " _
        & @CRLF & "=============================================================================== " _
        & @CRLF & "Ndg: 12345678 Tipo: 10100 P.F. - EFF/COMPL. Sorvegl. " _
        & @CRLF & "GHILARDI ABCDEF " _
        & @CRLF & "VIA IV OTTOBRE 12 c/o " _
        & @CRLF & "12345 PONTENUCOLA BG Cittad: ITALIANA " _
        & @CRLF & "Nato il: 12/12/1912 A: MARISOLE Prov.: BG Sesso: M " _
        & @CRLF & "C.F. ABCDEFGHM14I858O P.I. Tel. " _
        & @CRLF & "SAE/RAE 200/ Data Visura: 12/12/1912 Prof. 000007 " _
        & @CRLF & "Cod. C.R. 1234567890 Data: 01/2000 Cod. C.R.A. Data: " _
        & @CRLF & "Segmento cliente: 12 MASS " _
        & @CRLF & "Informazioni da altre societa' del gruppo "
    local $regex_one="\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
        & "\r\n"
    local $regex_two="\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +"
    $regex_test_one = "(?s:"&$regex_one&"|"&$regex_two&")" ; it DOESN'T MATCH :(
    $regex_test_two = "(?s:"&$regex_two&"|"&$regex_one&")" ; it MATCHES :)
    $regex_test_simple = $regex_two ; it MATCHES :)
    Local $a = StringRegexp($screen,$regex_test_one, 1)
    If @error Then
        Msgbox(0,"Error", @error)
    Else
        _ArrayDisplay($a)
    EndIf
    $a = StringRegexp($screen,$regex_test_two, 1)
    If @error Then
        Msgbox(0,"Error", @error)
    Else
        _ArrayDisplay($a)
    EndIf
    $a = StringRegexp($screen,$regex_test_simple, 1)
    If @error Then
        Msgbox(0,"Error", @error)
    Else
        _ArrayDisplay($a)
    EndIf

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks. Another excellent RegExp tutorial.
Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Imbuter2000 (Author) Posted April 22, 2016

Jchd, your code gives the same wrong results as mine: as far as I can see it doesn't match with $regex_test_one, while it matches with $regex_test_two and $regex_test_simple. It's like saying that the regex "A|B" doesn't match the text "B" (while the regex "B|A" or just "B" correctly matches). Do you understand why, in my example, the regex "A|B" doesn't match the text "B"?
czardas Posted April 22, 2016 (edited)

    $regex_test_one = "(?s:"&$regex_one&")|("&$regex_two&")" ; it MATCHES :)
    $regex_test_two = "(?s:"&$regex_two&")|("&$regex_one&")" ; it MATCHES :)

or maybe

    $regex_test_one = "(?s:)("&$regex_one&")|("&$regex_two&")" ; it MATCHES :)
    $regex_test_two = "(?s:)("&$regex_two&")|("&$regex_one&")" ; it MATCHES :)

Don't ask me - what's the difference?

Edited April 22, 2016 by czardas

operator64 ArrayWorkshop
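One difference between these variants and the original is how far the dotall option reaches: "(?s:X|Y)" applies it to both alternatives, "(?s:X)|(Y)" only to X, and "(?s:)" is an empty group that applies it to nothing. The sketch below illustrates this with made-up one-line patterns (the subject and the three pattern names are assumptions, not taken from the thread), and assumes the default newline handling in which a plain dot does not match a line feed.

```autoit
; Sketch: scope of the dotall option in three groupings.
; Subject and patterns are invented for illustration only.
Local $sSubject = "c" & @LF & "d"

; Dotall covers both alternatives: the dot in "c.d" may match the line feed.
Local $sWhole = "(?s:a.b|c.d)"
; Dotall covers only "a.b"; "(c.d)" sits outside the option group.
Local $sFirstOnly = "(?s:a.b)|(c.d)"
; "(?s:)" is an empty group, so dotall applies to no part of the pattern.
Local $sEmpty = "(?s:)(a.b)|(c.d)"

; Flag 0 (the default) returns 1 for a match and 0 for no match.
ConsoleWrite("whole group: " & StringRegExp($sSubject, $sWhole) & @CRLF)
ConsoleWrite("first only:  " & StringRegExp($sSubject, $sFirstOnly) & @CRLF)
ConsoleWrite("empty group: " & StringRegExp($sSubject, $sEmpty) & @CRLF)
```

Under that assumption only the first pattern should report a match. So the regrouped expressions are not the same test as the original: they change which alternative runs with dotall, which may be why they behave differently.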
Imbuter2000 (Author) Posted April 22, 2016

I think that it's a bug in the regular expression engine or its implementation. What do you think?
czardas Posted April 22, 2016 (edited)

I think regular expressions must have been invented by aliens. I would expect the same result as you, but I don't know if it's inconsistent implementation or my misconception. I guess someone will be able to fathom it out and hopefully cast some light on it. I tend to separate groups as above, which seems to do the trick in certain cases.

Edited April 22, 2016 by czardas
mikell Posted April 22, 2016

It's probably because regex_one is wrong:

    #include <Array.au3>
    $screen = "" _
        & " AFASN0M A N A G R A F E INTERROGAZIONE SINTETICA - ABC 22/04/16 " _
        & @CRLF & "=============================================================================== " _
        & @CRLF & "Ndg: 12345678 Tipo: 10100 P.F. - EFF/COMPL. Sorvegl. " _
        & @CRLF & "GHILARDI ABCDEF " _
        & @CRLF & "VIA IV OTTOBRE 12 c/o " _
        & @CRLF & "12345 PONTENUCOLA BG Cittad: ITALIANA " _
        & @CRLF & "Nato il: 12/12/1912 A: MARISOLE Prov.: BG Sesso: M " _
        & @CRLF & "C.F. ABCDEFGHM14I858O P.I. Tel. " _
        & @CRLF & "SAE/RAE 200/ Data Visura: 12/12/1912 Prof. 000007 " _
        & @CRLF & "Cod. C.R. 1234567890 Data: 01/2000 Cod. C.R.A. Data: " _
        & @CRLF & "Segmento cliente: 12 MASS " _
        & @CRLF & "Informazioni da altre societa' del gruppo "
    local $regex_one="\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
        & "\r\n"
    local $regex_two="\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +"
    $regex_test_one = "(?s:"&$regex_one&"|"&$regex_two&")" ; it DOESN'T MATCH :(
    $regex_test_two = "(?s:"&$regex_two&"|"&$regex_one&")" ; it MATCHES :)
    $regex_test_simple1 = $regex_one ;
    $regex_test_simple2 = $regex_two ; it MATCHES :)
    Local $a = StringRegexp($screen,$regex_test_one, 1)
    If @error Then
        Msgbox(0,"Error", @error)
    Else
        _ArrayDisplay($a, "$regex_test_one")
    EndIf
    $a = StringRegexp($screen,$regex_test_two, 1)
    If @error Then
        Msgbox(0,"Error", @error)
    Else
        _ArrayDisplay($a, "$regex_test_two")
    EndIf
    $a = StringRegexp($screen,$regex_test_simple1, 1)
    If @error Then
        Msgbox(0,"Error", @error)
    Else
        _ArrayDisplay($a, "$regex_test_simple1")
    EndIf
    $a = StringRegexp($screen,$regex_test_simple2, 1)
    If @error Then
        Msgbox(0,"Error", @error)
    Else
        _ArrayDisplay($a, "$regex_test_simple2")
    EndIf
jchd Posted April 22, 2016

Is this a joke guys? Here I just added a title to ArrayDisplays:
jchd Posted April 22, 2016

@Imbuter2000, If you meant that regex_one (note: not regex_test_one) alone doesn't match, this is no surprise since the final \r\n in the expression can never match. Hence this part of the alternation is pointless. Your regexps are terrible and imply a whole lot of useless backtracking. Besides, it is unclear what you actually intend to capture, how the various fields are formatted and whether they are mandatory or not.
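The point about the final \r\n can be checked on a single line. The sketch below takes the last segment of regex_one and one line of the posted subject (as it appears in the thread, terminated with @CRLF). The variable names are made up for illustration.

```autoit
; Sketch: why the tail of regex_one cannot match the posted line.
Local $sLine = "12345 PONTENUCOLA BG Cittad: ITALIANA" & @CRLF

; Tail of regex_one: two word characters, spaces, then a line break.
Local $sWithBreak = "(\d{2,5}) ([^ ].*?) +(\w\w) +\r\n"
; Same segment without the trailing line break.
Local $sWithoutBreak = "(\d{2,5}) ([^ ].*?) +(\w\w) +"

; "BG" is followed by "Cittad:", not by a line break, and the text right
; before the line break ("ITALIANA") is not followed by a space.
ConsoleWrite("with \r\n:    " & StringRegExp($sLine, $sWithBreak) & @CRLF)    ; 0
ConsoleWrite("without \r\n: " & StringRegExp($sLine, $sWithoutBreak) & @CRLF) ; 1
```

This only covers the case without dotall. Inside "(?s:...)" the lazy ".*?" may run across line breaks, so the same segment can behave differently there.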
iamtheky Posted April 23, 2016

9 hours ago, jchd said: "Your regexps are terrible"

I vote this to replace 'resistance is obligatory'.
Imbuter2000 (Author) Posted April 26, 2016

Guys, the regexps that I wrote, and that seem terrible to you, were simplified by me to the minimum necessary to display the bug. The original regexps that I'm using are the following. They're used in alternation because yes, only one at a time matches: on the $screen of the example only $regex_two matches, and further on in my script I have an "if" that does different things depending on which of the two matches.

    local $regex_one=" .* A N A G R A F E INTERROGAZIONE SINTETICA.* " _
        & "\r\n=============================================================================== " _
        & "\r\nNdg: " & $ndg & " +Tipo: \d{5} +.* Sorvegl\. .*" _
        & "\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
        & "\r\n"
    local $regex_two=" .* A N A G R A F E INTERROGAZIONE SINTETICA.* " _
        & "\r\n=============================================================================== " _
        & "\r\nNdg: " & $ndg & " +Tipo: \d{5} +(P\.F\. -|COINT -) .* Sorvegl\. .*" _
        & "\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +" _
        & "\r\nNato il: (\d\d/\d\d/\d\d\d\d) A: ([^ ].*?) +Prov\.: (\w\w) +Sesso: (\w) *" _
        & "\r\nC\.F\. +(\w+|) +P\.I\. +Tel\..*" _
        & "\r\n(?s:.*)" _
        & "\r\nCODTX: ____ .*" _
        & "\r\n"

So, is there a valid reason why the order in the following alternations makes the difference between not matching and matching over $screen?

    this doesn't match: "(?s:"&$regex_one&"|"&$regex_two&")"
    this matches: "(?s:"&$regex_two&"|"&$regex_one&")"

Why??? I was also always confident that there was no need to add parentheses like czardas suggested. I learned not to use parentheses when they're not needed. Should I start using them again around every element in every alternation, to be sure to bypass this bug/weird limitation of regexes in AutoIt?
P.S.: I have been using regular expressions like these for 20 years to read and parse AS/400 terminal screens. If you think that my regexps are terrible, can you please explain what the problem with them is?
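Since the script branches on which layout matched, one alternative to a single alternation is to test the two patterns one after the other. The sketch below is a simplified illustration under stated assumptions: the one-line subject and the two short patterns ($sRegexFull, $sRegexShort) are hypothetical stand-ins, not the real expressions from the post.

```autoit
; Sketch: test each layout separately instead of "(?s:A|B)".
; Subject and patterns are shortened, hypothetical stand-ins.
Local $sScreen = "12345 PONTENUCOLA BG Cittad: ITALIANA"
Local $sRegexFull = "^(\d{2,5}) (\S+) +(\w\w) +Cittad: +(\S+) *$"
Local $sRegexShort = "^(\d{2,5}) (\S+) +(\w\w) *$"

; Flag 1 returns the captured groups as an array; @error is 0 on a match.
Local $aFields = StringRegExp($sScreen, $sRegexFull, 1)
If @error = 0 Then
    ConsoleWrite("full layout, citizenship = " & $aFields[3] & @CRLF)
Else
    $aFields = StringRegExp($sScreen, $sRegexShort, 1)
    If @error = 0 Then
        ConsoleWrite("short layout, town = " & $aFields[1] & @CRLF)
    Else
        ConsoleWrite("unexpected screen layout" & @CRLF)
    EndIf
EndIf
```

Testing separately makes it obvious which pattern matched, and each pattern keeps its own capture-group numbering instead of sharing one numbering across both alternatives.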
jchd Posted April 26, 2016

@Imbuter2000, You still didn't answer post #8, nor other subsequent questions. Making the thread a moving target doesn't simplify things. The reason why the following (at this time):

    local $regex_one="\r\n(.*?) +" _
        & "\r\n(.*?) +c/o .*" _
        & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
        & "\r\n"

doesn't match is due to the extra "\r\n" at the tail of the regexp, something that you could have checked by yourself. Also, option dotall (?s) makes dot (.) match \r\n, \r and \n since the implementation uses (*ANYCRLF) internally. Using the unbounded sequence .* implies a lot of backtracking. Use site http://regex101.com/ to see how things work.

From post #8+ the question is still pending: which fields do you want to capture and are some fields optional? Since you now tell us that the subjects come from screen captures, I believe the subjects' format is fixed and extracting fields with StringMid() would probably be much simpler and more reliable.
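As a rough illustration of the StringMid() suggestion, the sketch below pulls fields out of one fixed-width line. The column positions are invented for the example (the real screen layout is not given in the thread), and the sample line is built with _StringRepeat() so the padding is explicit.

```autoit
#include <String.au3>

; Sketch: fixed-column extraction. The columns are assumptions:
; postal code in 1-5, town in 7-26, province in 27-28.
Local $sLine = "12345 " & "PONTENUCOLA" & _StringRepeat(" ", 9) & "BG"

Local $sZip = StringMid($sLine, 1, 5)
; Flag 2 strips trailing whitespace from the padded town field.
Local $sTown = StringStripWS(StringMid($sLine, 7, 20), 2)
Local $sProv = StringMid($sLine, 27, 2)

; A cheap sanity check on one field before trusting the rest.
If StringIsDigit($sZip) Then
    ConsoleWrite($sZip & " | " & $sTown & " | " & $sProv & @CRLF)
Else
    ConsoleWrite("unexpected content in the postal code column" & @CRLF)
EndIf
```

This only works if the columns never move. It does no backtracking at all, but by itself it checks much less of the screen than a full pattern does.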
Imbuter2000 (Author) Posted April 26, 2016

jchd, I didn't answer post #8 because the second image is broken (not visible) and I didn't understand your points. If you read my first message again, the question is not why $regex_one doesn't match. That's correct, it doesn't have to match in this example. The question, as you can read in my first post, is why $regex_test_one, which is an alternation of both the non-matching and the matching versions, doesn't match!

P.S.: Validating a text screen with StringMid is impossible, and parsing multiple elements in a whole text screen would require multiple StringMid calls and minutes spent counting the position and length of every field. With a single regex, built in less time, I validate and parse the multiple elements. Validation is necessary because many pages could show unexpected fields and rows. That's why the two regexes are in alternation: to work well in both situations, where StringMid would blindly fail.
jchd Posted April 26, 2016

AFAICT the three window captures in post #8 are visible here with both Firefox and Chrome. I don't know what I can do about that. Run the script yourself and see that your statement "$regex_test_one doesn't match!" is wrong.
Imbuter2000 (Author) Posted April 26, 2016

I just copied and pasted the code of my first message into my AutoIt, pressed F5 and I got a msgbox with the text "NOT match". Do you get "match" instead? (!?)
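When two people get different results from the same script, it can help to print the interpreter version next to the raw return value and error codes, so the two runs can be compared line by line. This is only a diagnostic sketch: $sSubject and $sPattern are placeholders, to be replaced with $screen and $regex_test_one from the first post.

```autoit
; Diagnostic sketch: version plus raw StringRegExp() result.
; $sSubject and $sPattern are placeholders, not the thread's values.
Local $sSubject = "B"
Local $sPattern = "(?s:A|B)"

; Flag 0 returns 1 or 0; @error 2 signals an invalid pattern.
Local $iResult = StringRegExp($sSubject, $sPattern)
Local $iError = @error, $iExtended = @extended

ConsoleWrite("AutoIt version: " & @AutoItVersion & @CRLF)
ConsoleWrite("return = " & $iResult & ", @error = " & $iError _
    & ", @extended = " & $iExtended & @CRLF)
```

If the two machines report different versions, the regex engine bundled with each release is one plausible source of the discrepancy. The thread itself does not confirm this.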
jchd Posted April 26, 2016

Definitely:
jchd Posted April 26, 2016

Use a different way to post your screenshot, this one isn't visible.