czardas, posted March 30, 2013 (edited March 30, 2013)

This may be nothing more than a help file issue. The help file states that \x represents ASCII codes. Let's test this assumption.

    Local $sTestString = ""
    For $i = 0 To 255
        $sTestString &= Chr($i)
    Next
    ; $sTestString has a string Length of 255 characters

    ; Remove all characters
    $sTestString = StringRegExpReplace($sTestString, "[\x00-\xFF]", "")
    ; Now $sTestString has a string Length of 27 characters

    ; What went wrong?
    For $i = 1 To StringLen($sTestString)
        ; The following 27 characters were not replaced
        ConsoleWrite(Asc(StringMid($sTestString, $i, 1)) & @LF)
    Next

In conclusion, either regexp is broken, my machine is broken, or the help file is wrong about what \x actually does.

czardas, posted March 30, 2013 (edited March 30, 2013)

Here's a workaround if anyone needs it.

    Local $sTestString = ""
    For $i = 0 To 255
        $sTestString &= Chr($i)
    Next
    ; $sTestString has a string Length of 255 characters

    ; Build the upper half of the character class from the same Chr() values
    Local $sSRE = ""
    For $i = 128 To 255
        $sSRE &= Chr($i)
    Next

    ; Remove all characters
    $sTestString = StringRegExpReplace($sTestString, "[" & $sSRE & "\x00-\x7F]", "")
    ; Now $sTestString has a string Length of 0 characters
    MsgBox(0, "", StringLen($sTestString))

DXRW4E, posted March 30, 2013 (edited March 30, 2013)

Hi czardas, it will have to be:

    $sTestString = StringRegExpReplace($sTestString, "[[:ascii:]\x80-\xff]+", "")

Ciao.

AZJIO, posted March 30, 2013

Quoting DXRW4E: "Hi czardas, will have to be"

    $sTestString = StringRegExpReplace($sTestString, "[[:ascii:]\x80-\xff]+", "")

czardas, posted March 30, 2013 (edited March 30, 2013)

Quoting DXRW4E: "Hi czardas, will have to be"

    $sTestString = StringRegExpReplace($sTestString, "[[:ascii:]\x80-\xff]+", "")

What's the difference? That's inconsistent with the function Chr(), which will sometimes return other characters. It's only a help file description issue. For \x, the help file says: "Match the ascii character whose code is given in hexadecimal." \x80-\xFF is not consistent with the table of ASCII characters in my help file: ASCII code page win-1252.

DXRW4E, posted March 30, 2013

Look here for more: http://www.autoitscript.com/autoit3/pcrepattern.html

Ciao.

czardas, posted March 30, 2013 (edited March 30, 2013)

\x is in fact Unicode. See the link to my post in the help file thread.

DXRW4E, posted March 30, 2013 (edited March 30, 2013)

I don't think so. Is this example wrong then, given that French is not Unicode?

http://www.autoitscript.com/autoit3/pcrepattern.html

"[...] if character tables for a French locale are in use, [\xc8-\xcb] matches accented E characters in both cases [...]"

Ciao.

jchd, posted March 30, 2013 (edited March 30, 2013)

All of our strings are Unicode; the issue isn't there. Indeed, the help file should say "US-ASCII", which refers to the range \x00-\x7F.

Now remember that the current implementation of PCRE in AutoIt painfully converts strings (subject and patterns) to UTF-8 before submitting them to the engine. This is no problem with US-ASCII, since this range is common to Unicode and all code pages.

The issue arises with code points > 0x7F, as you can see:

    For $i = 128 To 255
        ConsoleWrite(Hex($i, 2) & ' = ' & StringToBinary(Chr($i), 4) & @LF)
    Next

None of those characters is represented by a single byte, thanks to the UTF-8 representation.

In the pattern, \x00-\xFF is taken literally and compiled into the engine verbatim. EDIT: that's untrue.

Are things clearer now?

czardas, posted March 30, 2013

Yes it clarifies what is happening, thanks. I think it's easy to take things you read at face value. ASCII code pages include an extended range. Perhaps the term ASCII is sometimes used too freely, and the small change you suggest would hint that there's something more going on. Looking at my ASCII code page of characters is going to be misleading.

jchd, posted March 30, 2013

That's why ASCII (originally 7-bit) ⊊ ANSI.

⊊ means "is a subset of but not equal to".

trancexx, posted March 30, 2013 (edited March 30, 2013)

Quoting jchd: "All of our strings are Unicode, the issue isn't there. [...] In the pattern, \x00-\xFF is taken literally and compiled into the engine verbatim. Are things clearer now?"

Where did you get that information from? Or from whom?

Edit: Oh I see, it's in the help file. Never mind.

trancexx, posted March 30, 2013 (edited March 30, 2013)

Shit, I just read every line of the help file regarding regexp and everything looks fine. czardas, which help file are you talking about? Could you check which AutoIt version that help file was written for?

jchd, posted March 30, 2013

I finally got a few minutes to dig further. In fact I was talking bullshit (but I was not the only one). Run this simple test and you'll see what "fiat lux" means:

    Local $sTestString = ""
    For $i = 0 To 255
        $sTestString &= Chr($i)
    Next
    ; $sTestString has a string Length of 255 characters

    ; Remove all characters
    $sTestString = StringRegExpReplace($sTestString, "[\x00-\xFF]", "")
    ; Now $sTestString has a string Length of 27 characters

    ; What went wrong?
    ConsoleWrite("Using Chr($i)" & @LF)
    For $i = 1 To StringLen($sTestString)
        ; The following 27 characters were not replaced
        ConsoleWrite(Asc(StringMid($sTestString, $i, 1)) & @LF)
    Next

    $sTestString = ""
    For $i = 0 To 255
        $sTestString &= ChrW($i) ; this is where the difference lies (pun intended)
    Next
    ; $sTestString has a string Length of 255 characters

    ; Remove all characters
    $sTestString = StringRegExpReplace($sTestString, "[\x00-\xFF]", "")
    ConsoleWrite("Using ChrW($i)" & @LF)
    For $i = 1 To StringLen($sTestString)
        ConsoleWrite(Asc(StringMid($sTestString, $i, 1)) & @LF)
    Next

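To see why the Chr() version leaves characters behind, it may help to print the Unicode code point of each survivor next to its ANSI code. The following is a minimal sketch, not part of jchd's script. It assumes the ANSI code page is Windows-1252 (other code pages will give a different list), and the variable names $sLeft and $sChar are only illustrative.

```autoit
; Sketch: show each leftover character as both an ANSI code and a Unicode code point.
; Assumption: the system ANSI code page is Windows-1252.
Local $sTestString = ""
For $i = 0 To 255
    $sTestString &= Chr($i) ; Chr() maps the ANSI code through the code page
Next

; Same pattern as in the thread: remove everything in the range \x00-\xFF
Local $sLeft = StringRegExpReplace($sTestString, "[\x00-\xFF]", "")

Local $sChar
For $i = 1 To StringLen($sLeft)
    $sChar = StringMid($sLeft, $i, 1)
    ; Asc() converts back to the ANSI code, AscW() returns the Unicode code point
    ConsoleWrite("Asc = 0x" & Hex(Asc($sChar), 2) & "   AscW = U+" & Hex(AscW($sChar), 4) & @LF)
Next
```

Under that assumption, every survivor should report an AscW() value above U+00FF (for example, ANSI 0x80 is the euro sign, U+20AC), which would explain why [\x00-\xFF] does not match it, while the ChrW() string contains only code points 0 to 255.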
czardas, posted March 30, 2013

Sorry trancexx, it's an old one. I apologize. I'll unpack a box and take a look at the current version. Oops.

trancexx, posted March 30, 2013

Quoting jchd: "I finally got a few minutes to dig further. [...]" (followed by jchd's Chr()/ChrW() test script, quoted in full)

You didn't say much wrong. The issue is indeed conversion between encodings. But the confusion is created by the Chr() function. I'm not sure what help file issues you both are referring to.

czardas, posted March 30, 2013 (edited March 30, 2013)

Lol you're right, it's different. I was right - it needed changing, but it has been done already. I feel quite embarrassed. The current help file was in storage. I didn't intend to waste anyone's time. I got a lot of good help today.

@JCHD I stumbled upon the same thing testing with ChrW, but I got the string length wrong. I wrote 255 instead of 256 in the comments.

czardas, posted March 31, 2013

The implications of this are beginning to dawn on me. Any function which uses ANSI, such as _HexToString(), is liable to fail under certain circumstances. For example: hard-coded hex strings converted using an inappropriate code page. Ouch!

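The code page dependence can be illustrated with the built-in BinaryToString(), decoding hard-coded bytes once as ANSI and once as UTF-8. This is only a sketch: it does not show how _HexToString() is implemented, and the variable names are illustrative.

```autoit
; Sketch: the same hard-coded data decoded with two explicit encodings.
Local $dAnsi = Binary("0x80")     ; one byte whose meaning depends on the ANSI code page
Local $dUtf8 = Binary("0xE282AC") ; UTF-8 byte sequence for U+20AC (euro sign)

; Flag 1 = ANSI: the result depends on the code page of the machine running the script
Local $sAnsi = BinaryToString($dAnsi, 1)
ConsoleWrite("ANSI decode:  U+" & Hex(AscW($sAnsi), 4) & @LF)

; Flag 4 = UTF-8: the result does not depend on the ANSI code page
Local $sUtf8 = BinaryToString($dUtf8, 4)
ConsoleWrite("UTF-8 decode: U+" & Hex(AscW($sUtf8), 4) & @LF)
```

On a Windows-1252 system both lines would be expected to report U+20AC, but only the UTF-8 line should give the same answer on a machine with a different ANSI code page.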
jchd, posted March 31, 2013

Correct. That's precisely what motivated Unicode: getting out of the hell of ANSI and other incomplete code pages. Unicode isn't free of difficulties, but it brings too many advantages to be dismissed over drawbacks that only show up in dark corners.

czardas, posted March 31, 2013 (edited March 31, 2013)

Now I think I need to redesign one or two things. To begin with - my win-1252 keyboard could be made to work with any code page. I never thought to use Unicode to represent the ANSI characters, but it's an intriguing idea. At the moment it requires win-1252 to be the default code page. It's also possible to design a number of similar extended ASCII code page keyboards which will work on any Windows machine. In some ways it might seem a strange thing to do, but I quite like the idea.

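One way to sketch the idea of representing win-1252 through Unicode is a lookup table for the 0x80-0x9F block, the only part of Windows-1252 that differs from the first 256 Unicode code points. The function name _Cp1252ToChar is hypothetical, and mapping the five undefined bytes to the code point of the same value is an assumption made here for simplicity.

```autoit
; Prints 20AC regardless of the machine's ANSI code page
ConsoleWrite(Hex(AscW(_Cp1252ToChar(0x80)), 4) & @LF)

; Hypothetical helper: convert a Windows-1252 byte value (0-255) to a character
; without going through the ANSI code page of the running machine.
Func _Cp1252ToChar($iByte)
    ; Unicode code points for bytes 0x80-0x9F; 0 marks bytes undefined in Windows-1252
    Local Const $aMap[32] = [ _
            0x20AC, 0, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, _
            0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017D, 0, _
            0, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, _
            0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0, 0x017E, 0x0178]

    ; Outside 0x80-0x9F the byte value and the Unicode code point coincide
    If $iByte < 0x80 Or $iByte > 0x9F Then Return ChrW($iByte)

    Local $iCode = $aMap[$iByte - 0x80]
    ; Assumption: undefined bytes fall back to the code point of the same value
    If $iCode = 0 Then Return ChrW($iByte)
    Return ChrW($iCode)
EndFunc   ;==>_Cp1252ToChar
```

The table has 27 non-zero entries, which matches the 27 characters that the original [\x00-\xFF] test failed to remove. A similar table could be written for any other single-byte code page.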