Posted (edited)

The docs say that native AutoIt strings use UCS-2. 

But when I run this code:

$sString1 = "Hello"
MsgBox(0, '', $sString1)

$sString2 = $sString1 & " こんにちは"
MsgBox(0, '', $sString2)

I get this:

[Screenshot of the resulting message box]

So isn't the string already in UTF8 and not UCS-2?

Also, referring back to those linked docs:

[Screenshot of the help-file excerpt]

But isn't $sMyString already in UTF8? It seems that $sUTF8String is actually a conversion to ANSI since the output gives:

[Screenshot of the output]

Which makes sense because BinaryToString() is using option 1 for ANSI.


So does the latest version of AutoIt run in UTF8+BOM by default, and the docs are wrong? Or am I not understanding what a "native" string is in AutoIt?

Edited by lowbattery
  • lowbattery changed the title to Is AutoIt UTF8 and Not UCS-2?
Posted

In the source file, the string "こんにちは" is UTF-8. At runtime, the string is read and converted to UCS-2 for storage in memory and for use by Windows primitives.
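The runtime step described above can be illustrated with a minimal Python sketch. Python only stands in for the encodings here; this is not how AutoIt is implemented. The literal starts as UTF-8 bytes in the source file and ends up as 2-byte UTF-16-LE units in memory (UCS-2 is the BMP-only subset of UTF-16).

```python
# The literal as it sits in a UTF-8 source file: 3 bytes per kana character.
source_bytes = "こんにちは".encode("utf-8")

# Decoding at runtime yields text; storing it as UTF-16-LE takes
# 2 bytes per character for BMP characters like these.
text = source_bytes.decode("utf-8")
in_memory = text.encode("utf-16-le")

print(len(source_bytes))    # 15
print(len(in_memory))       # 10
print(len(in_memory) // 2)  # 5 UTF-16 code units, one per character
```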

UCS-2 is the restriction of UTF16-LE to the first 64K codepoints, aka the BMP (it doesn't handle surrogates per se). Yet you can still compose a native AutoIt3 string containing codepoints beyond U+FFFF by entering them as UTF8 in the source file: they end up stored as surrogate pairs. The UTF16-LE Windows renderer detects those pairs and renders the upper-plane codepoints, provided the chosen font has glyphs for them. The caveat is that each surrogate pair (high + low surrogate) counts as 2 characters in AutoIt string functions.

For instance, here is a string of Phoenician codepoints, which renders correctly with the DejaVu or Segoe UI Historic fonts. AutoIt string functions see it as 56 "characters", while it has only 28 codepoints (each stored as a surrogate pair). The Phoenician block spans U+10900 - U+1091F.

#include <Array.au3> ; required for _ArrayDisplay()
Local $s = "𐤐𐤁𐤕𐤃𐤈𐤊𐤂𐤒𐤀𐤖𐤚𐤛𐤎𐤆𐤑𐤔𐤇𐤏𐤄𐤗𐤘𐤙𐤌𐤍𐤅𐤓𐤋𐤉"
_ArrayDisplay(StringToASCIIArray($s), "Length = " & StringLen($s))
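Why the count doubles can be sketched in Python, assuming the standard UTF-16 surrogate formula. The sample below is a hypothetical contiguous run of 28 Phoenician codepoints (U+10900 to U+1091B), not the exact string above, and `utf16_units` is an illustrative helper, not an AutoIt function.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code unit(s) for one code point."""
    if cp <= 0xFFFF:
        return [cp]                      # BMP: a single unit
    v = cp - 0x10000                     # 20-bit offset above the BMP
    return [0xD800 + (v >> 10),          # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate: bottom 10 bits

# 28 consecutive Phoenician code points
phoenician = "".join(chr(cp) for cp in range(0x10900, 0x1091C))

print(len(phoenician))                           # 28 code points
print(len(phoenician.encode("utf-16-le")) // 2)  # 56 UTF-16 code units
print([hex(u) for u in utf16_units(0x10900)])    # ['0xd802', '0xdd00']
```

The 56 units are what StringLen() and StringToASCIIArray() count, since they work on UTF-16 units rather than codepoints.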

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Posted (edited)
11 hours ago, lowbattery said:

But isn't $sMyString already in UTF8? It seems that $sUTF8String is actually a conversion to ANSI since the output gives:

You are confusing two situations:
1) a string in a UTF8 source file: it is decoded and stored as UCS2 in memory at runtime, then processed as UTF16-LE by OS primitives.
2) a UTF8 string stored in memory, possibly received from or to be sent to an external process: this is a string of bytes, not UCS2 encoding units. To be correctly decoded, processed by OS string primitives and rendered, it first needs to be converted to UCS2 (and is then seen as UTF16-LE). Conversely, if you need to send a native AutoIt string to an external process that requires UTF8 data, the reverse conversion applies.
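Situation 2 can be sketched in Python as an illustration only. The "external process" is simulated with a bytes literal (an assumption; real data would come from a pipe or a file), and Latin-1 decoding stands in for "treating each byte as a character".

```python
# Incoming: an external process hands over raw UTF-8 bytes ("Привет").
received = b"\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82"

# Treating each byte as a character gives the wrong length and garbage text.
wrong = received.decode("latin-1")
# Decoding as UTF-8 gives the real characters.
right = received.decode("utf-8")

print(len(wrong), len(right))  # 12 6

# Outgoing: a native string must be encoded before it is sent
# to a process that expects UTF-8.
to_send = right.encode("utf-8")
assert to_send == received
```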

I'm the author of this text in the help file and I'm not a native English speaker; if you find that some wording needs rework or clarification, just propose changes.

Edited by jchd

Posted (edited)

Thank you for your response jchd. It is much appreciated!

I have one follow-up question based on what you said.

6 hours ago, jchd said:

2) a UTF8 string stored in memory, possibly received from or to be sent to an external process: this is a string of bytes, not UCS2 encoding units. To be correctly decoded, processed by OS string primitives and rendered, it first needs to be converted to UCS2 (and is then seen as UTF16-LE). Conversely, if you need to send a native AutoIt string to an external process that requires UTF8 data, the reverse conversion applies.

I receive JSON from curl run on the command line, which sends it to my script in UTF8 format. Upon receipt, I have been running BinaryToString($sReceivedData, $SB_UTF8). So is it correct to assume that the data is in UTF8 format when I use the BinaryToString() conversion I just mentioned? It seems that way.

6 hours ago, jchd said:

I'm the author of this text in help and I'm not a native English speaker; if you find that some wording needs rework/clarification, just propose.

Your English is better than mine and I'm a native speaker! But the proposal would be:

If you only need to convert UTF8 to/from native AutoIt strings, you can use this:

$sUTF8String = "Hello Χαίρετε こんにちは Привет xin chào हैलो مرحبا 你好 שלום வணக்கம்"
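; native string -> UTF-8 bytes (flag 4), then read back as one ANSI character per byte (flag 1)
$sANSIString = BinaryToString(StringToBinary($sUTF8String & @LF, 4), 1)

; reverse conversion: ANSI characters -> the same bytes (flag 1), then decode them as UTF-8 (flag 4)
$sBackToUTF8 = BinaryToString(StringToBinary($sANSIString, 1), 4)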

Is the @LF needed in StringToBinary() as it seems to work without it? But I'm also not very smart lol, so I probably am missing something.

Also, is $sANSIString basically UCS2? I don't see any conversion functions to/from UCS2, but again, not very smart, so I'm probably missing something.

Edited by lowbattery
  • Solution
Posted
1 hour ago, lowbattery said:

So it is correct to assume that the data is in UTF8 format?

Yes: you received a string of bytes (each UTF8 character is 1 to 4 bytes), which needs converting into UCS2 for AutoIt processing.
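The "1 to 4 bytes" point and the decode-then-use order can be sketched in Python. The payload below is a hypothetical stand-in for curl's output, and Python's `json` module plays the role of whatever JSON parser the script uses.

```python
import json

# Bytes per character in UTF-8: 1 to 4 depending on the code point.
for ch in ["A", "\u00e9", "\u3053", "\U00010900"]:
    print(hex(ord(ch)), len(ch.encode("utf-8")))
# prints 0x41 1, 0xe9 2, 0x3053 3, 0x10900 4 (one pair per line)

# Hypothetical payload standing in for curl's output: raw UTF-8 bytes.
payload = '{"greeting": "こんにちは"}'.encode("utf-8")

# Convert the bytes to text first, then parse.
data = json.loads(payload.decode("utf-8"))
print(data["greeting"])
```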

 

1 hour ago, lowbattery said:

But is the @LF needed in StringToBinary() as it seems to work without it?

No, it's a remnant of displaying the thing. My bad.
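The two help-file conversion lines quoted above can be emulated in Python to check that the round trip works without a trailing @LF. Assumption: Latin-1 stands in for the ANSI codepage because it maps every byte to exactly one character; Python's cp1252 codec rejects some byte values (e.g. 0x81) that occur inside UTF-8 data. The helper names are hypothetical.

```python
def utf8_as_ansi(native: str) -> str:
    """Emulates BinaryToString(StringToBinary($s, 4), 1)."""
    raw = native.encode("utf-8")   # StringToBinary(..., 4): text -> UTF-8 bytes
    return raw.decode("latin-1")   # BinaryToString(..., 1): one char per byte

def ansi_to_native(ansi: str) -> str:
    """Emulates BinaryToString(StringToBinary($s, 1), 4)."""
    raw = ansi.encode("latin-1")   # back to the very same bytes
    return raw.decode("utf-8")     # decode them as UTF-8

s = "Hello こんにちは"
a = utf8_as_ansi(s)
print(len(s), len(a))  # 11 21
assert ansi_to_native(a) == s  # no trailing @LF needed
```

Note that the reverse step takes the ANSI-form string as input, not the original native string.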

 

1 hour ago, lowbattery said:

Also, is $sANSIString UCS2? I don't see any conversions functions to/from UCS2

In BinaryToString and StringToBinary, the String part refers to a "native UCS2 AutoIt string" and the Binary part refers to "a string of bytes using this or that codepage".
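That naming can be mirrored in a Python sketch: "string" is decoded text, "binary" is bytes in a chosen encoding, selected by the same flag values the help file documents (1 = ANSI, 2 = UTF16 LE, 3 = UTF16 BE, 4 = UTF8). The function names are hypothetical analogues, and ANSI is assumed to be cp1252; the real ANSI codepage depends on the system locale.

```python
# Flag values as documented for StringToBinary/BinaryToString.
# ANSI is assumed to be cp1252 here.
CODECS = {1: "cp1252", 2: "utf-16-le", 3: "utf-16-be", 4: "utf-8"}

def string_to_binary(s: str, flag: int = 1) -> bytes:
    """Hypothetical analogue: native text -> bytes in the chosen encoding."""
    return s.encode(CODECS[flag], errors="replace")  # unmappable chars -> '?'

def binary_to_string(b: bytes, flag: int = 1) -> str:
    """Hypothetical analogue: bytes in the chosen encoding -> native text."""
    return b.decode(CODECS[flag], errors="replace")

text = "Hello こんにちは"
for flag in (2, 3, 4):
    assert binary_to_string(string_to_binary(text, flag), flag) == text

print(string_to_binary(text, 1))  # b'Hello ?????'
```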

You may find the code example just before it in that help text, showing _StringToCodepage() and _CodepageToString(), clearer.
