Leaderboard

Popular Content

Showing content with the highest reputation on 07/23/2022 in all areas

  1. ConsoleWrite() silently "converts" Unicode text to ANSI, replacing almost all non-ANSI characters with question marks. This doesn't work well with non-Latin languages. Below, CW() is a homebrew Unicode-aware ConsoleWrite():

         ; Mixed language strings
         $s = "Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة"
         CW($s)
         ConsoleWrite($s & @LF)

         ; A family with different Fitzpatrick settings = only one glyph
         $s = "Our family " & ChrW(0xD83D) & ChrW(0xDC68) & ChrW(0xD83C) & ChrW(0xDFFB) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC69) & ChrW(0xD83C) & ChrW(0xDFFF) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC66) & ChrW(0xD83C) & ChrW(0xDFFD)
         CW($s)
         ConsoleWrite($s & @LF)

     Result:

         Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة
         ??????? ???????? ??? ???? ?????? ????? ?????
         Our family 👨🏻‍👩🏿‍👦🏽
         Our family ??????????????

     I don't know which charset this legacy version of ACDSee handles for filenames. You can remove emojis, or a wider range of the Unicode charset, explicitly using a regexp.

     BUT there is a pitfall: the AutoIt charset is UCS2, a restriction of Unicode UTF-16 to the BMP (Unicode plane 0) using 16-bit encoding units. And there is more: Unicode codepoints in planes 1..16 are represented using pairs of surrogate values. For instance 😭 is represented in UCS2 (AutoIt string) as ChrW(0xD83D) & ChrW(0xDE2D).

     You might think: pretty easy, just use a regexp pattern to match and replace these values, using

         StringRegExpReplace($s, "[\x{D800}-\x{DFFF}]", "-")

     NO! That fails because PCRE (the regexp engine used by AutoIt) internally merges the two surrogates into the actual 😭 codepoint 0x1F62D (LOUDLY CRYING FACE), so the pattern never sees the individual surrogate values.
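     To make the surrogate arithmetic concrete, here is a minimal sketch of how a codepoint above U+FFFF maps to the two 16-bit values stored in an AutoIt string. The helper name _CodepointToString() is hypothetical and not part of the original post or of AutoIt's standard UDFs.

```autoit
; Hypothetical helper: return the AutoIt string for any Unicode codepoint.
; Codepoints above 0xFFFF are split into a high/low surrogate pair.
Func _CodepointToString($iCP)
    If $iCP < 0x10000 Then Return ChrW($iCP)
    $iCP -= 0x10000 ; 20 bits remain
    Local $iHigh = 0xD800 + BitShift($iCP, 10) ; top 10 bits (positive shift = right shift)
    Local $iLow = 0xDC00 + BitAND($iCP, 0x3FF) ; bottom 10 bits
    Return ChrW($iHigh) & ChrW($iLow)
EndFunc   ;==>_CodepointToString

; 0x1F62D (LOUDLY CRYING FACE) should give the pair quoted above
Local $sPair = _CodepointToString(0x1F62D)
ConsoleWrite(Hex(AscW(StringMid($sPair, 1, 1)), 4) & " " & Hex(AscW(StringMid($sPair, 2, 1)), 4) & @LF) ; prints "D83D DE2D"
```

     The string holds two 16-bit units, yet the regexp engine sees a single codepoint, which is why a pattern has to target the 0x10000+ range rather than the surrogate range.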
     This replaces every run of non-BMP codepoints (here, those in plane 1, U+10000 to U+1FFFF) with an underscore:

         $s = "Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة Test Title 😭" & @LF
         $s &= "Our family " & ChrW(0xD83D) & ChrW(0xDC68) & ChrW(0xD83C) & ChrW(0xDFFB) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC69) & ChrW(0xD83C) & ChrW(0xDFFF) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC66) & ChrW(0xD83C) & ChrW(0xDFFD)
         CW($s)
         $t = StringRegExpReplace($s, "[\x{10000}-\x{1FFFF}]+", "_")
         CW($t)

     Result:

         Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة Test Title 😭
         Our family 👨🏻‍👩🏿‍👦🏽
         Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة Test Title _
         Our family _‍_‍_

     Note that in the last line there are 3 "people" joined with ChrW(0x200D) [Zero Width Joiner], hence three underscores.

     Yet I suspect that your image viewer will bark at codepoints outside the default 8-bit codepage of your system. If you still get question marks in the last example above, then your best bet is to convert the characters into their 8-bit codepage counterparts, or into a usable substitution character when that is impossible.

         Func _StringToCodepage($sStr, $iCodepage = Default)
             If $iCodepage = Default Then $iCodepage = 65001 ; or Int(RegRead("HKLM\SYSTEM\CurrentControlSet\Control\Nls\Codepage", "OEMCP"))
             ; First call: no output buffer, returns the required size in bytes
             Local $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sStr, "int", StringLen($sStr), _
                     "ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
             Local $tCP = DllStructCreate("char[" & $aResult[0] & "]")
             ; Second call: perform the conversion into the buffer
             $aResult = DllCall("Kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sStr, "int", StringLen($sStr), _
                     "struct*", $tCP, "int", $aResult[0], "ptr", 0, "ptr", 0)
             Return DllStructGetData($tCP, 1)
         EndFunc   ;==>_StringToCodepage

     Invoke this conversion function with the codepage ID which suits your needs. See https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers
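     CW() is called throughout this post, but its definition is not included here. The following is a minimal sketch of one common idiom, not necessarily the author's original function. It assumes the console or SciTE output pane is configured for UTF-8, and that the script file is saved as UTF-8 with BOM so the string literals survive.

```autoit
; Sketch of a Unicode-aware console write (assumption: the receiving console
; or SciTE output pane interprets the bytes it gets as UTF-8).
Func CW($sText = "")
    ; Flag 4: encode the UTF-16 AutoIt string as UTF-8 bytes.
    ; Flag 1: reinterpret those bytes as an ANSI string, one character per byte,
    ; so that ConsoleWrite()'s own ANSI conversion should emit the same bytes again.
    ConsoleWrite(BinaryToString(StringToBinary($sText & @LF, 4), 1))
EndFunc   ;==>CW

CW("Большая проблема 大问题")
```

     If the output pane is not in UTF-8 mode, the same call shows mojibake rather than question marks, which at least confirms that no characters were dropped.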
    1 point
  2. New version in first post.
     Version: 2022.07.22 - Support scripts with the same name but different content in different directories.
    1 point