Xenobiologist, posted February 9, 2010

Hi,

first of all, I'm no encoding expert. Is this correct? Should the function show negative numbers?

    #include <Array.au3> ; needed for _ArrayToString

    ; Judging from the output below, the first character of the test string is U+0081, which may not be visible here.
    Local $a = StringToASCIIArray(' „ ”', Default, Default, 0)
    ConsoleWrite('UTF-16 ' & @TAB & _ArrayToString($a, @TAB) & @CRLF)
    Local $b = StringToASCIIArray(' „ ”', Default, Default, 1)
    ConsoleWrite('ANSI ' & @TAB & _ArrayToString($b, @TAB) & @CRLF)
    Local $c = StringToASCIIArray(' „ ”', Default, Default, 2)
    ConsoleWrite('UTF-8 ' & @TAB & _ArrayToString($c, @TAB) & @CRLF)
    ConsoleWrite('! - - - - - - - - - - - - - - - - - - - - ' & @CRLF)
    ConsoleWrite( @TAB & AscW("") & @TAB)
    ConsoleWrite(AscW(" ") & @TAB)
    ConsoleWrite(AscW("„") & @TAB)
    ConsoleWrite(AscW(" ") & @TAB)
    ConsoleWrite(AscW("”") & @CRLF)

Output:

    UTF-16  129   32   8222  32   8221
    ANSI    -127  32   -124  32   -108
    UTF-8   -62   -127 32    -30  -128
    ! - - - - - - - - - - - - - - - - - - - -
            129   32   8222  32   8221

If all is correct then I'm fine.

Mega

Scripts & functions:
- Organize Includes: Let Scite organize the include files
- Yahtzee: The game "Yahtzee" (Kniffel, DiceLion)
- LoginWrapper: Secure scripts by adding a query (authentication)
- _RunOnlyOnThis UDF: Make sure that a script can only be executed on ... (Windows / HD / ...)
- Internet-Café Server/Client Application: Open CD, Start Browser, Lock remote client, etc.
- MultipleFuncsWithOneHotkey: Start different funcs by hitting one hotkey different times

jchd, posted February 9, 2010 (edited February 9, 2010)

I consider the ANSI and UTF-8 results incorrect. It looks like StringToASCIIArray wrongly treats the individual codes as signed for both ANSI and UTF-8.

Let's try it for UTF-16 with some codepoints that have the MSB set in their UTF-16 representation. $str contains five full width characters:

    Local $str = "ABCDE"
    Local $a = StringToASCIIArray($str, Default, Default, 0)
    ConsoleWrite('UTF-16 ' & @TAB & _ArrayToString($a, @TAB) & @CRLF)

The result is unsigned and OK. That was v3.3.5.1 under XP SP3 x86.

Looks like you're ready for a ticket!

EDIT: it turns out to be a little more buggy than that: $str2 contains the four "wind" mahjong tiles (codepoints > 0x10000) and gives wrong results. This also shows up in the result of your first post.

In short, we have several bugs here:

- StringLen doesn't count Unicode characters but counts every 16-bit position as a character. I admit that codepoints >= 0x10000 are not in routine use for everyone (except for those using the new Asian blocks!), but it's nonetheless wrong behavior w.r.t. Unicode.
- StringToASCIIArray($string, 0) returns StringLen($string) 16-bit codes from the UTF-16LE encoding instead of returning actual Unicode codepoints. Two errors here.
- StringToASCIIArray($string, 1) returns the first StringLen($string) 8-bit codes (as signed values) of ANSI characters corresponding to the first StringLen($string) 16-bit codes from the UTF-16LE encoding. Three errors here.
- StringToASCIIArray($string, 2) returns the first StringLen($string) 8-bit values (as signed values) of StringLen($string) 8-bit codes from the UTF-8 encoding. Two errors here.

I hope I didn't misrepresent the issues.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial.
Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Xenobiologist, posted February 9, 2010

Thanks! Hopefully some of the mods or devs will find the topic and reply. Otherwise, I may try my luck with a ticket tomorrow.

Mega
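
The first item in jchd's list turns on the difference between 16-bit code units and Unicode codepoints. The following is a minimal sketch in Python rather than AutoIt, so that it can serve as a reference independent of the functions under discussion. The helper name `utf16_units` is made up for illustration, and the string assumes the four "wind" tiles are U+1F000 to U+1F003.

```python
# Python sketch (not AutoIt): code units versus codepoints.
# Assumption: the four mahjong "wind" tiles are U+1F000..U+1F003.
winds = "\U0001F000\U0001F001\U0001F002\U0001F003"


def utf16_units(text):
    """Return the UTF-16 code units of text as unsigned 16-bit integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


codepoints = [ord(ch) for ch in winds]
units = utf16_units(winds)

print(len(codepoints))          # number of Unicode characters
print(len(units))               # number of 16-bit positions
print([hex(u) for u in units])  # each tile occupies two code units
```

Each tile needs two code units, so a length function that counts 16-bit positions reports twice the number of characters for this string.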

Xenobiologist, posted February 10, 2010

I created a ticket to get a comment. Ticket

Valik (former developer), posted February 10, 2010

My first comment is this: Read the fucking guidelines when you create a new ticket. I didn't write those to have them ignored.

Valik, posted February 10, 2010 (edited February 10, 2010)

Second comment: What are the correct UTF-8 values? Are they:

    UTF-8  194  129  32  226  128
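
One way to cross-check the values Valik proposes is to encode the same text outside AutoIt. This is a minimal Python sketch with two assumptions: the test string is U+0081, space, U+201E, space, U+201D (as the UTF-16 row in the first post suggests), and "ANSI" means Windows-1252. The helper `to_unsigned` is hypothetical and only shows how the reported negative values map to byte values.

```python
# Python sketch (not AutoIt). Assumption: the test string is
# U+0081, space, U+201E, space, U+201D.
text = "\u0081 \u201e \u201d"

utf8_bytes = list(text.encode("utf-8"))
print(utf8_bytes)


def to_unsigned(value):
    """Reinterpret a signed 8-bit value (-128..127) as an unsigned byte (0..255)."""
    return value & 0xFF


reported_utf8 = [-62, -127, 32, -30, -128]  # UTF-8 row from the first post
reported_ansi = [-127, 32, -124, 32, -108]  # ANSI row from the first post

unsigned_utf8 = [to_unsigned(v) for v in reported_utf8]
unsigned_ansi = [to_unsigned(v) for v in reported_ansi]
print(unsigned_utf8)
print(unsigned_ansi)

# Assumption: the ANSI code page is Windows-1252. U+0081 is left out because
# Python's cp1252 codec has no mapping for it.
ansi_quotes = list("\u201e \u201d".encode("cp1252"))
print(ansi_quotes)
```

Under those assumptions the full UTF-8 sequence is ten bytes, 194 129 32 226 128 158 32 226 128 157, and the five values quoted above are its first five. The reported negative numbers are the same bytes read as signed 8-bit integers.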

Valik, posted February 10, 2010

The sign stuff was trivial and obvious to fix. Think no more about that.

With that out of the way, that's not all the UTF-8 data, it's only half of it. It's complicated, though, or at least it will be if it is done correctly. All input is UTF-16 LE because that's what AutoIt stores things as internally. I didn't take into consideration that a single UTF-16 character might expand to 2+ UTF-8 characters. That's why you see the length capped at the length of the input string (and why only half the UTF-8 data is present). The problem is, it has to be done this way or the function can't honor the length parameter.

The proper fix is to parse the UTF-8 characters in order to return the correct number of characters even if it's a lot more bytes. Rather obvious, but a pain since this was supposed to be a simple function. Ugh.
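
The idea of honoring a length given in characters while returning however many bytes those characters need can be sketched as follows. This is Python, not AutoIt, and not necessarily how the actual fix works; `utf8_prefix` is a hypothetical helper that assumes well-formed UTF-8 input, and the test string is the same assumed one as in the first post.

```python
def utf8_prefix(data, max_chars):
    """Return the leading bytes of well-formed UTF-8 data that encode at most max_chars codepoints."""
    count = 0
    for index, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a new character.
        if (byte & 0xC0) != 0x80:
            if count == max_chars:
                return data[:index]
            count += 1
    return data


text = "\u0081 \u201e \u201d"   # assumed test string, five characters
encoded = text.encode("utf-8")   # ten bytes

byte_capped = list(encoded[:5])              # capping at five BYTES cuts a character short
char_capped = list(utf8_prefix(encoded, 5))  # capping at five CHARACTERS keeps all ten bytes
first_three = list(utf8_prefix(encoded, 3))  # first three characters only

print(byte_capped)
print(char_capped)
print(first_three)
```

Capping by bytes stops in the middle of the three-byte sequence for U+201E, which matches the truncated output in the first post. Capping by characters scans for lead bytes instead, so the result can be longer than the requested length.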

Xenobiologist, posted February 10, 2010

Where did I ignore the (your) guideline? From my point of view, there is a problem with that function. At the least, the description needs to be updated.

jchd, posted February 10, 2010

Valik wrote: "I didn't take into consideration that a single UTF-16 character might expand to 2+ UTF-8 characters."

Valik, I understand what you mean, but this is not correct the way you put it. A codepoint (= a single Unicode character) may need from 1 to 4 ubyte in UTF-8, from 1 to 2 ushort in UTF-16*E and 1 ulong in UTF-32. Here's a visual representation that I found useful when dealing with UTF (it comes from SQLite.c):

    ** Notes on UTF-8:
    **
    ** Byte-0   Byte-1   Byte-2   Byte-3   ___ Value
    ** 0xxxxxxx ______________________________ 00000000 00000000 0xxxxxxx
    ** 110yyyyy 10xxxxxx _____________________ 00000000 00000yyy yyxxxxxx
    ** 1110zzzz 10yyyyyy 10xxxxxx ____________ 00000000 zzzzyyyy yyxxxxxx
    ** 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx ___ 000uuuuu zzzzyyyy yyxxxxxx
    **
    **
    ** Notes on UTF-16: (with wwww+1==uuuuu)
    **
    ** Word-0            Word-1            ___ Value
    ** 110110ww wwzzzzyy 110111yy yyxxxxxx ___ 000uuuuu zzzzyyyy yyxxxxxx
    ** zzzzyyyy yyxxxxxx _____________________ 00000000 zzzzyyyy yyxxxxxx

(I had to put underlines so the alignment doesn't vanish.)

Valik wrote: "The proper fix is to parse the UTF-8 characters in order to return the correct number of characters even if it's a lot more bytes. Rather obvious, but a pain since this was supposed to be a simple function. Ugh."

I know! See what Jon just said about it. And, fortunately, we don't even imagine dealing with anything other than C-form normalized strings.

Never, ever, let an ounce of normalization stuff get inside AutoIt, let alone grapheme parsing... If you do then I suggest fixing a large amount of padding on the table in front of you right in the spot where you will be repeatedly banging your head*.

* Roger Binns on SQLite list, about XML.
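
The bit layouts jchd quotes can be turned into code fairly directly. The following Python sketch (not AutoIt, and not taken from SQLite) encodes a single codepoint by hand; the function names `to_utf8_bytes` and `to_utf16_units` are made up for illustration, and the input is assumed to be a valid codepoint outside the surrogate range.

```python
def to_utf8_bytes(cp):
    """Encode one codepoint as 1 to 4 UTF-8 bytes, following the bit layout above."""
    if cp < 0x80:
        return [cp]                                   # 0xxxxxxx
    if cp < 0x800:
        return [0xC0 | (cp >> 6),                     # 110yyyyy
                0x80 | (cp & 0x3F)]                   # 10xxxxxx
    if cp < 0x10000:
        return [0xE0 | (cp >> 12),                    # 1110zzzz
                0x80 | ((cp >> 6) & 0x3F),            # 10yyyyyy
                0x80 | (cp & 0x3F)]                   # 10xxxxxx
    return [0xF0 | (cp >> 18),                        # 11110uuu
            0x80 | ((cp >> 12) & 0x3F),               # 10uuzzzz
            0x80 | ((cp >> 6) & 0x3F),                # 10yyyyyy
            0x80 | (cp & 0x3F)]                       # 10xxxxxx


def to_utf16_units(cp):
    """Encode one codepoint as 1 or 2 UTF-16 code units."""
    if cp < 0x10000:
        return [cp]                                   # single word
    offset = cp - 0x10000                             # the "wwww + 1 == uuuuu" rule
    return [0xD800 | (offset >> 10),                  # 110110ww wwzzzzyy
            0xDC00 | (offset & 0x3FF)]                # 110111yy yyxxxxxx


for cp in (0x41, 0x81, 0x201E, 0x1F000):
    print(hex(cp), to_utf8_bytes(cp), [hex(u) for u in to_utf16_units(cp)])
```

Subtracting 0x10000 before splitting the value is what the note "wwww+1==uuuuu" expresses: the five high bits of the codepoint are stored as a four-bit field that is one less.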

Valik, posted February 10, 2010

It's fine, I've talked to Jon about it and he showed me how to use stuff he's written in the past few months to fix the function. However, if there are still problems with multi-word UTF-16 characters, that's an AutoIt-wide issue.

jchd, posted February 10, 2010

Thank you for caring.

OTOH, multi-code-unit characters (codepoints >= 0x10000) may be seen today as a pedantic "extension" (e.g. Aztec, Deseret or Byzantine music symbols), and indeed many editors (Scite, PsPad, NotePad++) display, or rather hash, them as per the UCS-2 standard, which is no less than 12 years old! But investing in support for the full UTF representations is certainly a safe bet for the future, as we see more blocks in planes 1, 2 and 14 being increasingly used (for instance, large enhancements to unified CJK and the use of language tags). Worldwide exchange of documents and the need for support of multi-language data[base] can only push the trend forward.