Stomp, posted June 7, 2010

Hello,

I'm writing my first script with AutoIt and I like it very much so far. I've encountered a problem that I can't solve. The following script converts a string from Unicode to UTF-8 and back. It should produce the same string, but it doesn't. The first ConsoleWrite produces "ウィキペディアはオープンコンテントの百科事典です。", but the second produces "ウィキペディアはオープンコンチEトE百科事Eです". As you can see, the resulting string is broken. Why does this happen? Here is my script:

    #include <WinAPI.au3>

    _Main()

    Func _Main()
        Local $str = "ウィキペディアはオープンコンテントの百科事典です。" ; a Japanese string
        ConsoleWrite($str & @CRLF)
        $str = _WinAPI_WideCharToMultiByte($str, 65001)
        $str = _WinAPI_MultiByteToWideChar($str, 65001, 0, 1)
        ConsoleWrite($str & @CRLF)
    EndFunc

I'm on Windows XP SP3 with AutoIt 3.3.6.1. The code page for non-Unicode applications is set to Japanese.

jchd, posted June 7, 2010

AFAIK the console outputs UTF-8 correctly when you set the SciTE code page parameter to Unicode. From SciTE, choose Options > Open Global Options File, then locate and edit the code.page parameter:

    # Internationalisation
    # Japanese input code page 932 and ShiftJIS character set 128
    #code.page=932
    #character.set=128
    # Unicode
    code.page=65001      <<-- here
    #code.page=0
    #character.set=204

I believe that if you send Japanese ANSI (which is a non-Unicode double-byte encoding) to the console you'll see, as your example demonstrates, things interpreted wrongly. So choose a code page in SciTE and then send either Japanese double-byte ANSI _or_ UTF-8; you can't have both displayed correctly with the same settings. Also, be sure to work with scripts saved in UTF-8 + BOM encoding to get consistent display.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Stomp (author), posted June 8, 2010

I don't think ConsoleWrite is the problem here. The following script, which uses MsgBox instead, has exactly the same problem. I tried doing it directly in C and it works. I also tried saving my script as both UTF-8 and ANSI, and it made no difference.

    #include <WinAPI.au3>

    _Main()

    Func _Main()
        Local $str = "ウィキペディアはオープンコンテントの百科事典です。"
        MsgBox(0, "", $str & @CRLF)
        $str = _WinAPI_WideCharToMultiByte($str, 65001)
        $str = _WinAPI_MultiByteToWideChar($str, 65001, 0, 1)
        MsgBox(0, "", $str & @CRLF)
    EndFunc

jchd, posted June 8, 2010

There is something wrong here. Can you run the same as this and tell us any difference:

I don't have the oriental language pack installed, so I can't use ConsoleWrite to write it in Japanese ANSI, but the UTF-8 version works as expected for me, as you can see.

Stomp (author), posted June 8, 2010

I changed the code page for non-Unicode programs back to German and the script worked. When I changed it back to Japanese I had the same problem. It seems AutoIt has problems with the Japanese code page. I tried running this script with the Japanese code page:

    #include <WinAPI.au3>

    _Main()

    Func _Main()
        Local $str = "ウィキペディアはオープンコンテントの百科事典です。" ; a Japanese string
        ConsoleWrite(StringToBinary($str, 4) & @CRLF)
        $str = _WinAPI_WideCharToMultiByte($str, 65001)
        ConsoleWrite(Binary($str) & @CRLF)
        $str = _WinAPI_MultiByteToWideChar($str, 65001, 0, 1)
        ConsoleWrite(StringToBinary($str, 4) & @CRLF)
    EndFunc

And this is what I got:

    0xE382A6E382A3E382ADE3839AE38387E382A3E382A2E381AFE382AAE383BCE38397E383B3E382B3E383B3E38386E383B3E38388E381AEE799BEE7A791E4BA8BE585B8E381A7E38199E38082
    0xE382A6E382A3E382ADE3839AE38387E382A3E382A2E381AFE382AAE383BCE38397E383B3E382B3E383B3E383
    0xE382A6E382A3E382ADE3839AE38387E382A3E382A2E381AFE382AAE383BCE38397E383B3E382B3E383B3E3838145E3838845E799BEE7A791E4BA8B45E381A7E38199

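For reference, a minimal sketch (not from the thread) of locating where two such dumps diverge. Reading the three dumps above by hand, they agree on the first 44 bytes; the intermediate dump stops there, and the round-tripped one has 0x81 where the original has 0x86 at byte 45, which is the third byte of テ. `_FirstDiffByte` is a hypothetical helper name, and the two inputs are short excerpts copied from the first and third dumps around that point. It assumes that `Binary()` accepts a "0x..." hex string and that `String()` renders a binary value as hex text.

```autoit
; Hypothetical helper (not part of AutoIt or of the thread): returns the 1-based
; position of the first byte at which two binary values differ, or 0 if they
; agree over their common length.
Func _FirstDiffByte($bA, $bB)
    Local $iLen = BinaryLen($bA)
    If BinaryLen($bB) < $iLen Then $iLen = BinaryLen($bB)
    For $i = 1 To $iLen
        ; String() renders a binary value as "0x..", so single bytes can be compared as text
        If String(BinaryMid($bA, $i, 1)) <> String(BinaryMid($bB, $i, 1)) Then Return $i
    Next
    Return 0
EndFunc

; Excerpts from the dumps above: original UTF-8 vs. round-tripped UTF-8
Local $bOriginal = Binary("0xE383B3E38386E383B3")
Local $bRoundTrip = Binary("0xE383B3E3838145")
ConsoleWrite("First differing byte: " & _FirstDiffByte($bOriginal, $bRoundTrip) & @CRLF)
```

Running the same helper on the full dumps would be one way to confirm that the damage starts inside a multi-byte sequence rather than at a character boundary.
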
jchd, posted June 8, 2010

While it's fairly likely that Germany would beat Japan at the soccer World Cup, I see no good reason for the behavior you show. Sorry to ask again: are you positive that the script was saved in UTF-8 encoding? Note that it's not enough to change the encoding in SciTE; you have to make some change to the file and save it for the new encoding to take effect (a dummy change will do).

Stomp (author), posted June 8, 2010

Yes, I'm sure that the script is in UTF-8 encoding. When I open it in SciTE, it says that it is in UTF-8.

jchd, posted June 8, 2010

Sorry for the delay in answering. Can you please tell us precisely which Japanese code page you're using?

    932    shift_jis
    10001  x-mac_japanese (Mac!)
    20290  IBM290 (EBCDIC!)
    20932  EUC-JP (JIS X 0208-1990 and JIS X 0212-1990)
    50220  iso-2022-jp
    50221  csISO2022JP
    50222  iso-2022-jp (JIS X 0201-1989)
    50930  (EBCDIC!)
    50931  (EBCDIC!)
    50939
    51932  euc-jp

Stomp (author), posted June 8, 2010

I meant that the language for non-Unicode programs is set to Japanese in Regional and Language Options in Control Panel. Here is a link to a Microsoft page explaining how to set the language. Sorry if it was not clear what I meant.

Stomp (author), posted June 11, 2010

Can someone reproduce this problem? I can't use AutoIt if this doesn't work.

jchd, posted June 11, 2010

I'm sorry, but I just can't reproduce that issue here. I nonetheless believe that your problem has a simple solution, as there are a significant number of Asian users of AutoIt on this forum and on the Chinese forum (most of whom also use a double-byte code page, Big5). You might have a better chance of attracting seasoned Asian users with a thread subject that is more explicit about the character set you use. Try opening a new one titled "Help with Japanese charset" or something close.

I must confess I have no clue at the moment about what happens in your case, even though I've tackled some issues with Unicode, UTF-*, Asian charsets, ... and AutoIt before. Don't give up too quickly, there _must_ be a way out.

As soon as I can over the weekend, I'll try to set up a vanilla XP SP3 x86 machine with Asian support enabled and find out what the issue is.

Stomp (author), posted June 11, 2010 (edited June 11, 2010)

Thanks a lot for trying to help me. I tried once again and found a workaround. The following script works.

    #include <WinAPI.au3>

    _Main()

    Func _Main()
        Local $str = "ウィキペディアはオープンコンテントの百科事典です。" ; a Japanese string
        MsgBox(0, "", $str & @CRLF)
        $str = _WinAPI_WideCharToMultiByte($str, 65001, 0)
        $str = _WinAPI_MultiByteToWideChar($str, 65001, 0, 1)
        MsgBox(0, "", $str & @CRLF)
    EndFunc

I changed the call to _WinAPI_WideCharToMultiByte so that it returns a struct instead of a string. When I pass a multibyte string to a C function, I have to declare the argument as "ptr" rather than "str" and pass a pointer to the struct obtained with DllStructGetPtr.

jchd, posted June 11, 2010

You're right that a "str" argument to DllCall will be silently converted to ANSI by AutoIt. The actual issue here is that your UTF-8 output from the first conversion was converted to the ANSI double-byte code page (of which I'm not a big fan). I've had this "str" type problem as well in the SQLite UDF, which uses its own wide-to-multibyte routines (using structs).

But what I don't see is how your last script differs from the previous one(s).

    $str = _WinAPI_WideCharToMultiByte($str, 65001, 0)

is equivalent to

    $str = _WinAPI_WideCharToMultiByte($str, 65001)

as the third parameter (conversion flags) defaults to 0. The _fourth_ parameter is the switch that selects a string or struct return value. I believe you used this instead:

    $str = _WinAPI_WideCharToMultiByte($str, 65001, 0, 0)

Anyway, glad to know you have things working now, and don't hesitate to post again if you encounter other Unicode issues.

BTW, you may want to know that current AutoIt only handles the UCS-2 character set rather than genuine UTF-16LE. In practice, that means that only Unicode plane 0 characters can be dealt with in AutoIt. In other words, native AutoIt strings consist of 16-bit encoding units, with one unit = one character. Unicode characters in the upper planes need two 16-bit units (a surrogate pair) each. While most characters in those upper planes are not widely used, there are recent additions that map Chinese extensions there. As a consequence, they can't be used with today's AutoIt, and if you have a data source using characters in that range, you'll need to find workarounds to avoid data loss. I'm unable to tell you how widely used those problematic ranges are today.

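To make the "two 16-bit units" point concrete, here is a minimal sketch (not from the thread) of the standard UTF-16 surrogate-pair arithmetic. `_StringFromCodePoint` is a hypothetical helper name, and U+20BB7 is just an example of a CJK character outside plane 0. The sketch only builds the two units and asks AutoIt how long it considers the result; it makes no claim about how a given AutoIt version displays or processes such a string.

```autoit
; Hypothetical helper (not part of AutoIt): builds a string for any Unicode
; code point, using a UTF-16 surrogate pair above plane 0.
Func _StringFromCodePoint($iCodePoint)
    ; Plane 0 characters fit in a single 16-bit unit
    If $iCodePoint < 0x10000 Then Return ChrW($iCodePoint)
    ; Upper planes: subtract 0x10000, then split the remaining 20 bits in two
    Local $iOffset = $iCodePoint - 0x10000
    Local $iHigh = 0xD800 + BitShift($iOffset, 10)  ; positive shift = shift right
    Local $iLow = 0xDC00 + BitAND($iOffset, 0x3FF)  ; low 10 bits
    Return ChrW($iHigh) & ChrW($iLow)
EndFunc

; U+20BB7 becomes the pair 0xD842 0xDFB7
Local $sChar = _StringFromCodePoint(0x20BB7)
; StringLen counts 16-bit units, not Unicode characters
ConsoleWrite("Units reported by StringLen: " & StringLen($sChar) & @CRLF)
```

Any code that indexes or truncates strings by unit count (StringLeft, StringMid and so on) could split such a pair in the middle, which is one way the data loss mentioned above can arise.
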
Stomp (author), posted June 11, 2010

jchd wrote:
> You're right that a "str" argument to DllCall will be silently converted to ANSI by AutoIt. The actual issue here is that your UTF-8 output from the first conversion was converted to the ANSI double-byte code page (of which I'm not a big fan). I've had this "str" type problem as well in the SQLite UDF, which uses its own wide-to-multibyte routines (using structs).
>
> But what I don't see is how your last script differs from the previous one(s).
>
>     $str = _WinAPI_WideCharToMultiByte($str, 65001, 0)
>
> is equivalent to
>
>     $str = _WinAPI_WideCharToMultiByte($str, 65001)
>
> as the third parameter (conversion flags) defaults to 0. The _fourth_ parameter is the switch that selects a string or struct return value. I believe you used this instead:
>
>     $str = _WinAPI_WideCharToMultiByte($str, 65001, 0, 0)

Yes, that's what the documentation says, but the function is declared like this in WinAPI.au3:

    Func _WinAPI_WideCharToMultiByte($pUnicode, $iCodePage = 0, $bRetString = True)

Maybe AutoIt shouldn't convert multibyte strings to ANSI; that would make things much easier.

jchd, posted June 11, 2010 (edited June 11, 2010)

Geez, good catch. I didn't use this function myself and never got into this trap. I'll take care of reporting/fixing this.

Edit: that's now ticket #1671.

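For readers who only need the UTF-8 bytes of an AutoIt string and back, a minimal sketch (not from the thread) that avoids the WinAPI UDF and its "str" conversion altogether. It uses the built-in StringToBinary and BinaryToString with flag 4 (UTF-8), the same flag used in the diagnostic script earlier in the thread. It assumes the script file is saved as UTF-8 with BOM so that the literal is read correctly; the variable names are illustrative.

```autoit
; Round trip through UTF-8 using only built-in functions.
; The bytes stay in a binary variant, so no ANSI code page is involved.
Local $sText = "ウィキペディアはオープンコンテントの百科事典です。"

Local $bUtf8 = StringToBinary($sText, 4)  ; 4 = encode as UTF-8
Local $sBack = BinaryToString($bUtf8, 4)  ; 4 = decode from UTF-8

ConsoleWrite("UTF-8 bytes: " & $bUtf8 & @CRLF)
; == is the case-sensitive string comparison
If $sBack == $sText Then
    ConsoleWrite("Round trip preserved the string" & @CRLF)
Else
    ConsoleWrite("Round trip changed the string" & @CRLF)
EndIf
```

When the bytes must be handed to a DLL function, the approach described in the thread still applies: copy them into a struct and pass its pointer as "ptr" rather than passing a string as "str".
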