ConsoleWrite() silently "converts" Unicode text to ANSI, replacing almost all non-ANSI characters by question marks. This doesn't work fairly with non-latin languages. Below CW() is a homebrew Unicode-aware ConsoleWrite():
; Mixed language strings
$s = "Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة"
CW($s)
ConsoleWrite($s & @LF)
; A familly with different Fitzpatrick settings = only one glyph
$s = "Our familly " & ChrW(0xD83D) & ChrW(0xDC68) & ChrW(0xD83C) & ChrW(0xDFFB) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC69) & ChrW(0xD83C) & ChrW(0xDFFF) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC66) & ChrW(0xD83C) & ChrW(0xDFFD)
CW($s)
ConsoleWrite($s & @LF)
Result:
Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة
??????? ???????? ??? ???? ?????? ????? ?????
Our familly 👨🏻👩🏿👦🏽
Our familly ??????????????
I don't know which charset this legacy version of ACDSee handles for filenames. You can remove emojis or a wider range of Unicode charset explicitely using a regexp.
BUT there is a pitfall however: AutoIt charset is UCS2, a limitation of Unicode UTF16 to the BMP (Unicode plane 0) using 16-bit encoding units. But there is more: Unicode codepoints in planes 1..16 use surrogate values to represent. For instance 😭 is represented in UCS2 (AutoIt string) as ChrW(0xD83D) & ChrW(0xDE2D).
You might think: pretty easy, just use a regexp pattern to match and replace these values, using StringRegExpReplace($s, "[\x{D800}-\x{DFFF}]", "-")
NO! Just because PCRE (the regexp engine used by AutoIt) invokation internally merges the two surrogates into the actual 😭 codepoint 0x1F62D (LOUDLY CRYING FACE).
This will replace all series of non-BMP codepoints by an underscore:
$s = "Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة Test Title 😭" & @LF
$s &= "Our familly " & ChrW(0xD83D) & ChrW(0xDC68) & ChrW(0xD83C) & ChrW(0xDFFB) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC69) & ChrW(0xD83C) & ChrW(0xDFFF) & ChrW(0x200D) & ChrW(0xD83D) & ChrW(0xDC66) & ChrW(0xD83C) & ChrW(0xDFFD)
CW($s)
$t = StringRegExpReplace($s, "[\x{10000}-\x{1FFFF}]+", "_")
CW($t)
Result:
Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة Test Title 😭
Our familly 👨🏻👩🏿👦🏽
Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة Test Title _
Our familly ___
Note that in the last line, there are 3 "people" joined with ChrW(0x200D) [Zero Width Joiner] hence three underscores.
Yet I suspect that your image viewer will bark at codepoints outside the default 8-bit codepage of your system. If you still get question marks in the last example above, then your only bet is to correctly convert characters into their 8-bit codepage counterpoint, or by a useable substitution character when impossible.
Func _StringToCodepage($sStr, $iCodepage = Default)
If $iCodepage = Default Then $iCodepage = 65001 ; or Int(RegRead("HKLM\SYSTEM\CurrentControlSet\Control\Nls\Codepage", "OEMCP"))
Local $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sStr, "int", StringLen($sStr), _
"ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
Local $tCP = DllStructCreate("char[" & $aResult[0] & "]")
$aResult = DllCall("Kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sStr, "int", StringLen($sStr), _
"struct*", $tCP, "int", $aResult[0], "ptr", 0, "ptr", 0)
Return DllStructGetData($tCP, 1)
EndFunc ;==>_StringToCodepage
Invoke this conversion function with the codepage ID which suits your needs. See https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers