
regexp and ANSI escape sequences



...I'm still stuck... (damn regular expressions...)
This regexp by Mr. @mikell

StringRegExpReplace($s2, '(?=[\r\n])', "*")

places an asterisk before each CR or LF, but I also need to place another asterisk right after it, so that the matched character is surrounded by two asterisks (one before and one after).
Could some good soul skilled in regexp provide such a pattern?

And, kindly, also another regexp to strip those control chars from the main string?

Any help is greatly appreciated, thanks :)


 

Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....


2 hours ago, Chimp said:

... so that the matched character should be surrounded by two asterisks (one before and one after)

The easiest way is

$s = "abcde12345abcde12345"
$s = StringRegExpReplace($s, '([c3])', "*$1*")
Msgbox(0,"", $s)
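Applied to the CR/LF case from the question, here is a minimal sketch of the difference (the test string is made up for illustration). The lookahead '(?=[\r\n])' is zero-width: it matches only the position in front of each CR or LF and captures nothing, so there is nothing to put a second asterisk after. Capturing the char with '([\r\n])' lets the replacement "*$1*" write it back between two asterisks.

```autoit
; Illustrative subject with a lone CR and a lone LF
Local $sIn = "line1" & @CR & "line2" & @LF & "end"

; Zero-width lookahead: only inserts "*" in front of each CR/LF
Local $sBefore = StringRegExpReplace($sIn, '(?=[\r\n])', "*")

; Capture the char, then write it back between two asterisks
Local $sWrapped = StringRegExpReplace($sIn, '([\r\n])', "*$1*")

; Make CR/LF visible in the console
ConsoleWrite(StringReplace(StringReplace($sWrapped, @CR, "<CR>"), @LF, "<LF>") & @CRLF)
```

With a CRLF pair this wraps each char separately ("*CR**LF*"); if the pair should be wrapped as one unit, '(\r\n|[\r\n])' tries the two-char form first.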

 

2 hours ago, Chimp said:

also another regexp to clean the main string from those control chars?

For this, you can just add an alternation to the 2nd pattern from jchd :)

Local $sClean = StringRegExpReplace($s, "(?x) (\x1B \[ (?:\s*\d*\s*;?)* [[:alpha:]]) | [\x1E\x0A]", "")
MsgBox(0, "Pure text", $sClean)
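The class [\x1E\x0A] above lists only two specific control chars. If the goal is to strip every C0 control char (0x01-0x1F) together with the ANSI sequences, one option is simply to widen the class. A sketch, with an illustrative subject string:

```autoit
; Illustrative subject: two ANSI sequences, CR/LF, a TAB and a final BEL
Local $sRaw = "Hello, " & Chr(27) & "[ 40;31mred" & @CR & @LF & _
        Chr(27) & "[0mplain" & @TAB & "text" & Chr(7)

; Remove a whole ESC [ ... letter sequence, or any single char in 0x01-0x1F
Local $sAllClean = StringRegExpReplace($sRaw, "(?x) \x1B \[ (?:\s*\d*\s*;?)* [[:alpha:]] | [\x01-\x1F]", "")
ConsoleWrite($sAllClean & @CRLF)
```

A lone ESC that does not start a valid sequence is removed by the class alternative, and the "[..." text after it stays. To keep line breaks, narrow the class, e.g. [\x01-\x09\x0B\x0C\x0E-\x1F].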

 


Ta-daaa... @mikell to the rescue (as always). Thank you!
I had tried the same replacement string as the one you used, but both asterisks still ended up before the character (I didn't know that I had to change the search pattern as well: the lookahead captures nothing, so $1 was empty).
The fact that in regular expressions every little variation in the pattern can completely change the result both fascinates and scares me...

Thanks again

 

Chimp

Try this:

#Include <Array.au3>

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & chr(13) & _
            Chr(27) & "[ 47;32m And this is green on white" & chr(10) & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition," & @TAB & "albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI " & Chr(27) & "sequence" & _
            Chr(27) & "[ z but this one is OK and final bell!" & Chr(7)

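Local Const $_Not_A_Char = '*'
; Step 1: surround every ANSI escape sequence or control char with $_Not_A_Char and "dress" it with _ConstiFy().
; The replacement is AutoIt source evaluated by Execute(), so a single quote in $s would break it.
Local $s2 = Execute("'" & StringRegExpReplace($s, "(?x)" & _
                            "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
                            "( \x1B (?&ANSI_Escape) | [\x00-\x1F] )", _
                            "' & $_Not_A_Char & _Constify('$2') & $_Not_A_Char & '") & "'")
; Step 2: split into "marked" items and runs of plain text.
Local $aRes = StringRegExp($s2, "(?x) (\" & $_Not_A_Char & ".+? \" & $_Not_A_Char & " | [^\" & $_Not_A_Char & "]+ )", 3)
_ArrayDisplay($aRes, "Mixed results")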

; example of "dressing" control characters and ANSI sequences
Func _ConstiFy($c)
    Local Static $aK = [ _
        'NUL', 'SOH', 'STX', 'ETX', 'EOT', 'ENQ', 'ACK', 'BEL', _
        'BS',  'TAB', 'LF',  'VT',  'FF',  'CR',  'SO',  'SI', _
        'DLE', 'DC1', 'DC2', 'DC3', 'DC4', 'NAK', 'SYN', 'ETB', _
        'CAN', 'EM',  'SUB', 'ESC', 'FS',  'GS',  'RS',  'US' _
    ]
    Return('@' & $aK[AscW(StringLeft($c, 1))] & StringMid($c, 2))
EndFunc
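One detail in the code above that is easy to miss: the named group declared inside (?(DEFINE)...) is still numbered as capture group 1, even though DEFINE never matches on its own. The alternation that does the real work is therefore group 2, hence '$2' in the replacement. A small check with flag 1 (the sample string is illustrative):

```autoit
Local $sSample = "x" & Chr(27) & "[1my"
Local $aGroups = StringRegExp($sSample, "(?x)" & _
        "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
        "( \x1B (?&ANSI_Escape) )", 1)
; $aGroups[0] holds group 1 (ANSI_Escape): empty, DEFINE never captures by itself
; $aGroups[1] holds group 2: the escape sequence itself
ConsoleWrite("groups returned: " & UBound($aGroups) & @CRLF)
```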

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your up-to-date copy of pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation (latest available release, currently implemented in AutoIt beta).

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Yeah, \x00 in the subject was a problem with these old PCRE1 versions. Things have changed since then with the current PCRE2 (which current AutoIt releases don't use yet).

If we really need to support Chr(0) in the subject (an anomaly in 99.999% of use cases), we can still do a first pass with StringReplace, but I didn't feel pressed to do so.
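A minimal sketch of such a first pass, assuming the only goal is to keep Chr(0) away from the regexp engine: map it to a placeholder with StringReplace, run the regexps, then map it back if needed. The placeholder ChrW(0x2400) (SYMBOL FOR NULL) is a hypothetical, arbitrary choice; any char known not to occur in the data would do.

```autoit
; Hypothetical placeholder for NUL: pick any char that cannot occur in your data
Local Const $sNulPlaceholder = ChrW(0x2400)

Local $sSubject = "abc" & Chr(0) & "def"

; First pass: hide NUL from the regexp engine
Local $sSafe = StringReplace($sSubject, Chr(0), $sNulPlaceholder)

; ... run StringRegExp / StringRegExpReplace on $sSafe here ...

; Optional last pass: put NUL back into the results
Local $sRestored = StringReplace($sSafe, $sNulPlaceholder, Chr(0))
ConsoleWrite(StringLen($sSafe) & " / " & StringLen($sRestored) & @CRLF)
```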


Hello again, and thanks for the previous help. I'm back to ask for a little more help...
I've made some tests using this last pattern and it works well, but parsing the main string in 2 steps forces you to sacrifice at least one char as a "separator" (the one chosen in the $_Not_A_Char variable). As a result, the pattern may fail if that char also occurs in the main text.
Also, "dressed" control codes are very nice, but in this context it's easier to work with "nude" control chars, so that I can check whether an array element contains an escape sequence, a control char or just normal text with something like this:

For $n = 0 To UBound($aRes) - 1
    If StringLen($aRes[$n]) = 1 And AscW($aRes[$n]) > 0 And AscW($aRes[$n]) < 32 Then ; a raw control char?
        MsgBox(0, 0, "Element " & $n & " contains a control char")
    ElseIf StringLeft($aRes[$n], 2) == Chr(27) & "[" Then ; an ESCAPE sequence?
        MsgBox(0, 0, "Element " & $n & " contains an ESCAPE sequence")
    Else ; normal text
        MsgBox(0, 0, "Element " & $n & " contains normal text")
    EndIf
Next


Since the first step works well on escape sequences, is it possible to add a second "filter" to the first regexp so that it catches an <ESCAPE sequence> OR a <control char> [from Chr(1) to Chr(31)] in just one step? (I hope I'm not being too boring.)
p.s.
My ANSI file viewer is beginning to work, but it still has some issues. As soon as I have an at least working version I will post it.
Thanks


 

Chimp

Hum :)
You might (maybe...) try this, using 0xFFFD (the Unicode replacement character) as the "separator":

#Include <Array.au3>

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & chr(13) & _
            Chr(27) & "[ 47;32m And this is green on white" & chr(10) & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition," & @TAB & "albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI " & Chr(27) & "sequence" & _
            Chr(27) & "[ z but this one is OK and final bell!" & Chr(7)

Local $aRes = StringRegExp($s, "(?x)" & _
        "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
        "(?| \x1B (?&ANSI_Escape) | [\x01-\x1A\x1C-\x1F] | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
; _ArrayDisplay($aRes, "Mixed results")

$char = chrw(0xfffd)

$s2 = _ArrayToString($aRes, $char)
$s2 = Execute("'" & StringRegExpReplace($s2, "([\x01-\x1A\x1C-\x1F])", "' & $char & ascw('$1') & '") & "'") 
$aRes2 = StringSplit($s2, $char, 2)
_ArrayDisplay($aRes2, "Mixed results2")

 


Hi @mikell, thanks very much for your reply

...running the example, it seems that the control chars are stored in the array elements as decimal numbers instead of as the raw control char itself,

and in the _ArrayDisplay I see that the @TAB control char is not split into its own element (see element [11]) but remains part of the following string "albeit probably.....

Is something wrong?

 


 

Chimp
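Both effects can be traced in the code above. The decimal numbers come from ascw('$1') in the second replacement, which turns each control char into its code. The TAB ends up attached to the following text because the first pattern's text branch (?:[^\x1B] ...)+ only excludes ESC, so once a text run has started it swallows any control char it meets; the second step then inserts a separator only in front of the TAB, not after it. A reduced sketch of the first-step effect, with an illustrative string:

```autoit
; Reduced test case: a TAB in the middle of a text run
Local $sText = "definition," & @TAB & "albeit"

; Text branch that only excludes ESC: the TAB is swallowed into one text run
Local $aOnlyEsc = StringRegExp($sText, "[^\x1B]+", 3)

; Control chars as their own alternative, text branch excluding all of 0x01-0x1F
Local $aSplit = StringRegExp($sText, "[\x01-\x1F]|[^\x01-\x1F]+", 3)

ConsoleWrite(UBound($aOnlyEsc) & " element(s) vs " & UBound($aSplit) & " element(s)" & @CRLF)
```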

I'm posting this after Mikell already suggested the same idea: I got sidetracked in the meantime.

3 minutes ago, Chimp said:

I've made some tests using this last pattern and it works well, but parsing the main string in 2 steps forces you to sacrifice at least one char as a "separator" (the one chosen in the $_Not_A_Char variable). As a result, the pattern may fail if that char also occurs in the main text.

Feel free to use $_Not_A_Char = ChrW(0xFFFD): it is the Unicode replacement character, which decoders normally produce only when they meet invalid input, so you have practically zero chance of hitting it in a normal source stream.

Whether to decorate control chars and escape sequences or not is up to you: your needs govern what to do here. That's why I put the decoration in a separate function.

7 minutes ago, Chimp said:

Since the first step works well on escape sequences, is it possible to add a second "filter" to the first regexp so that it catches an <ESCAPE sequence> OR a <control char> [from Chr(1) to Chr(31)] in just one step?

This is precisely what the first regexp does: it catches both.


9 hours ago, jchd said:

...This is precisely what the first regexp does: it catches both.

Yes, I'm looking for the horse and I don't see that I'm already on horseback...
I see that your last good pattern splits "ANSI sequences" AND "control chars", each into its own array element... but I haven't been able to modify that pattern to get an array with the ANSI sequences and control codes "undressed", that is, without the (nice) @mnemonic and without the two asterisks, just the raw ctrl-code chars.
So, how do I modify that regexp so that it returns raw ANSI sequences, ctrl-codes and normal text, each in its own array element? (Sorry for my poor, nonexistent regexp skills...)

Thanks again.

 

 

Chimp

Well, let's try this one
(compare the result in _ArrayDisplay with the one from jchd's last code)

#Include <Array.au3>

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & chr(13) & _
            Chr(27) & "[ 47;32m And this is green on white" & chr(10) & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition," & @TAB & "albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI " & Chr(27) & "sequence" & _
            Chr(27) & "[ z but this one is OK and final bell!" & Chr(7)

Local $aRes = StringRegExp($s, "(?x)" & _
        "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
        "(?| \x1B(?&ANSI_Escape) | [\x01-\x1A\x1C-\x1F] | \x1B(?!(?&ANSI_Escape)) | (?:[^\x01-\x1F] (?!(?&ANSI_Escape)))+ )", 3)

_ArrayDisplay($aRes, "Mixed results")

;MsgBox(0, "", "chr(" & Asc($aRes[3]) & ")" & @CR & "chr(" & Asc($aRes[6]) & ")")
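To tie this back to the classification snippet from earlier in the thread, here is a sketch that runs the pattern above on a short made-up subject and sorts the raw elements using a length check plus AscW(). A lone ESC that does not start a valid sequence comes out as a one-char element, so it is counted with the control chars.

```autoit
; Made-up subject: text, one ANSI sequence, a TAB and a lone ESC
Local $sSubject = "Hi " & Chr(27) & "[1mbold" & @TAB & "tab" & Chr(27) & "x"
Local $aParts = StringRegExp($sSubject, "(?x)" & _
        "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
        "(?| \x1B(?&ANSI_Escape) | [\x01-\x1A\x1C-\x1F] | \x1B(?!(?&ANSI_Escape)) | (?:[^\x01-\x1F] (?!(?&ANSI_Escape)))+ )", 3)

Local $iCtrl = 0, $iEsc = 0, $iText = 0
For $n = 0 To UBound($aParts) - 1
    If StringLen($aParts[$n]) = 1 And AscW($aParts[$n]) > 0 And AscW($aParts[$n]) < 32 Then
        $iCtrl += 1 ; a single raw control char (a lone ESC lands here too)
    ElseIf StringLeft($aParts[$n], 2) == Chr(27) & "[" Then
        $iEsc += 1 ; a raw ANSI escape sequence
    Else
        $iText += 1 ; plain text
    EndIf
Next
ConsoleWrite("ctrl=" & $iCtrl & " esc=" & $iEsc & " text=" & $iText & @CRLF)
```

One caveat, if I read the pattern correctly: the text branch refuses a char that is followed by something shaped like an escape body ("[", optional digits/semicolons, then a letter). In plain text such as "ab[x" with no ESC, the "b" matches no alternative and is silently skipped, which is worth keeping in mind for real files.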

 


1 hour ago, mikell said:

Well let's try this one

I don't know how or why, but this way it seems OK :)
Thank you both!
p.s.
...to fill my regexp gaps one isn't enough: I need 2 MVPs to the rescue...
Thanks a lot to both of you for your brains...

Jokes apart, your help is very much appreciated.
Thank you.

 

Chimp