_WinGetTextFixed & _StringFixANSIInWide

Update: Text problems no longer exist as of v3.3.9.11

Sick of getting rows of '?' from WinGetText, ControlGetText, and other AutoIt functions? The problem lies in the way the text is encoded: AutoIt 'sees' a Windows wide string (UTF-16), but in reality it grabs a string of 8-bit ANSI characters and shoves them into a 16-bit wide string. Here's a small example:

Original string "x4" with null-term as an ANSI byte sequence:
0x78 0x34 0x00
Same string interpreted as Unicode (the reversal is due to little-endian encoding):
0x3478 0x00

The latter produces the Unicode character "㑸" (hopefully that shows up properly as an Asian character). So, as you can see, from the software side, "x4" compacted into a UTF-16 string still produces an acceptable string, even though it was never intended to be an Asian character.

The '?'s show up during conversion of that same string back to ANSI. You can see the result by doing a ConsoleWrite(), which itself causes an implicit Wide-to-ANSI conversion. Pasting the text into another text editor that uses an ANSI code page will give the same results. (A workaround for the conversion is to set the encoding to Unicode.)

Of course, even if you can display the wide characters correctly, it's not the intended result: you want what the window in question actually produces. Plus, many invalid UTF-16 sequences can be produced this way, and certain values will be interpreted as halves of a UTF-16 surrogate pair, which just gets messy.

So how do we fix this? The easiest workaround, when the string is known beforehand to be ANSI, is:

$sStr = BinaryToString(StringToBinary($sStr, 2), 1)

However, determining whether a window or control will give you ANSI or Wide characters is another issue. I had written the _StringFixANSIInWide function below to try to detect ANSI-in-Wide situations and return the corrected string.
Unfortunately, it's a bit naive in its implementation (see the update in the next paragraph), but it works okay for most situations. I can see it failing only when a Wide string contains nothing but characters above 8 bits. True Wide strings usually contain plenty of 7-bit ASCII characters, each stored with a zero upper byte, so that is an unlikely situation unless the string has no ASCII characters at all.

Update - _WinGetTextFixed() alternative:
I've now identified that the core issue is with the call to SendMessage (using WM_GETTEXT): the function will return either ANSI or Unicode (wide) characters, and it's up to our code to determine, based on the length, whether it is an ANSI or wide string. We can then return the correct string. See _WinGetTextFixed() below.

Note that AutoIt's WinGetText() function returns text for more than just the window: it returns text for various controls as well. While this can be beneficial, it also introduces issues when different controls return Wide or ANSI characters. The result is basically a contaminated string, one that packs ANSI characters into UTF-16 sequences and includes legitimate Wide characters alongside them. This is unacceptable and really hard to work with. That's one reason there's now a 'force conversion' parameter in _StringFixANSIInWide.

I would really emphasize that it's best not to use WinGetText() and to opt for _WinGetTextFixed instead. I may suggest this behavior be fixed in the AutoIt source code, but it looks like all the Devs are gone.

Anyway, enjoy!
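To make the "x4" example and the BinaryToString/StringToBinary workaround concrete, here is a minimal sketch of the same byte shuffling. It is written in Python rather than AutoIt purely as an illustration, and the function names and the choice of cp1252 as the ANSI code page are assumptions, not part of the original post.

```python
# Illustration only: reproduces the byte-level effect described in the post.
# Assumption: the ANSI code page is cp1252 (Western European Windows).
ANSI_CODEPAGE = "cp1252"


def pack_ansi_into_wide(text: str) -> str:
    """Reinterpret ANSI bytes as UTF-16LE code units (the faulty behaviour)."""
    raw = text.encode(ANSI_CODEPAGE)
    if len(raw) % 2:
        raw += b"\x00"  # an odd byte count borrows the null terminator
    return raw.decode("utf-16-le", errors="replace")


def unpack_wide_to_ansi(mangled: str) -> str:
    """Rough equivalent of BinaryToString(StringToBinary($sStr, 2), 1)."""
    raw = mangled.encode("utf-16-le")
    return raw.decode(ANSI_CODEPAGE, errors="replace").rstrip("\x00")


mangled = pack_ansi_into_wide("x4")
print(hex(ord(mangled)))                                # 0x3478
print(mangled.encode(ANSI_CODEPAGE, errors="replace"))  # b'?'
print(unpack_wide_to_ansi(mangled))                     # x4
```

The second print shows where the '?' comes from: U+3478 has no representation in the ANSI code page, so the wide-to-ANSI conversion substitutes a question mark.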
Example: Notepad 2

$hWnd = WinGetHandle("[CLASS:Notepad2]")
$hControl = ControlGetHandle($hWnd,"","[CLASS:Scintilla]")
; Previous fix:
;~ $sText = _StringFixANSIInWide(ControlGetText($hWnd,"",$hControl))
$sText = _WinGetTextFixed($hControl)
ConsoleWrite("Text:"&$sText&@CRLF)

Example: SciTE

$hWnd = WinGetHandle("[CLASS:SciTEWindow]")
$hControl=ControlGetHandle($hWnd,"","[CLASS:Scintilla; INSTANCE:1]")
; Previous fix:
;~ $sText = _StringFixANSIInWide(ControlGetText($hWnd,"",$hControl))
$sText = _WinGetTextFixed($hControl)
ConsoleWrite("Text:"&$sText&@CRLF)

Example: Programmer's Notepad 2

$hWnd = WinGetHandle("[REGEXPTITLE:Programmer's Notepad]")
$hControl = ControlGetHandle($hWnd, "", "[CLASS:ScintillaWindowImpl; INSTANCE:1]")
; Previous fix:
;~ $sText = _StringFixANSIInWide(ControlGetText($hWnd,"",$hControl))
$sText = _WinGetTextFixed($hControl)
ConsoleWrite("Text:"&$sText&@CRLF)

; ===================================================================================================
; Func _WinGetTextFixed($hWnd)
;
; Function to get Text of a window or control
;   This is an alternative to AutoIt's 'WinGetTitle', 'WinGetText', and 'ControlGetText',
;   which have issues with reading ANSI text from some windows
;
; Author: Ascend4nt
; ===================================================================================================
Func _WinGetTextFixed($hWnd)
    If Not IsHWnd($hWnd) Then Return SetError(1,0,'')
    Local $aRet, $stWideText, $stANSIText, $sText
    Local $nGetTextLen, $nHalfLen

    ; WM_GETTEXTLENGTH 0x0E
    $aRet = DllCall("user32.dll", "long", "SendMessageW", "hwnd", $hWnd, "uint", 0x0E, "wparam", 0, "lparam", 0)
    If @error Then Return SetError(2, @error, '')
    If Not $aRet[0] Then Return SetError(3, 0, '')
    $nGetTextLen = $aRet[0]
;~  ConsoleWrite("WM_GETTEXTLENGTH return:"&$nGetTextLen&@CRLF)

    ; Create a union structure, add 2 characters - 1 for null-term, 1 to handle odd-count cases
    $stWideText = DllStructCreate("wchar["&($nGetTextLen + 2)&"]")
    If @error Then Return SetError(4, 0, '')
    ; Second struct overlays the same memory, giving a byte-wise (ANSI) view of the buffer
    $stANSIText = DllStructCreate("char["&($nGetTextLen+2)*2&"]", DllStructGetPtr($stWideText))

    ; WM_GETTEXT 0x0D
    $aRet = DllCall("user32.dll", "long", "SendMessageW", "hwnd", $hWnd, "uint", 0x0D, "wparam", $nGetTextLen + 1, "ptr", DllStructGetPtr($stWideText))
    If @error Then Return SetError(2, @error, '')
    If Not $aRet[0] Then Return SetError(3, 0, '')
    $nGetTextLen = $aRet[0]

    ; Get text as WIDE characters 1st
    $sText = DllStructGetData($stWideText, 1)
;~  ConsoleWrite("$nGetTextLen = "&$nGetTextLen&", $nHalfLen = "&$nHalfLen&", StringLen() = "&StringLen($sText)&@CRLF)

    ; Determine if the wide string length is half the supposed returned text length
    ;  - If so, it's an ANSI string
    ; Caveat: a genuine wide string of only 2-3 characters also gives a difference below 2
    ;  and so takes the ANSI branch
    $nHalfLen = ($nGetTextLen + BitAND($nGetTextLen, 1) ) / 2
    If (StringLen($sText) - $nHalfLen < 2) Then
        ; Retrieve text correctly as ANSI
        $sText = DllStructGetData($stANSIText, 1)
    EndIf
    Return $sText
EndFunc

; ===================================================================================================
; Func _StringFixANSIInWide($sStr, $bForceCnvt = False)
;
; Function to fix a common issue where ANSI text is embedded in UTF-16 strings
;   Problem occurs in 'WinGetText', 'ControlGetText', 'WinGetTitle'
;   and some COM functions using 'bstr' types
;
; Easiest method, when you *know* the text is ANSI:
;   BinaryToString(StringToBinary($sStr, 2), 1)
;
; *However*, if it is unknown what the string holds, we need to look
;   for null characters (0's) in the string
;
; Alternatives: 'WideCharToMultiByte' API call, which does the same replacements as below
;   However, on Vista+, WC_ERR_INVALID_CHARS can be used to error-out on illegal characters
;
; Author: Ascend4nt
; ===================================================================================================
Func _StringFixANSIInWide($sStr, $bForceCnvt = False)
    If $sStr = '' Then Return ''
    Local $nLen, $stStrVer, $stBinVer, $sTmp, $nReplacements

    ; This fails to work in many mixed-ANSI/UTF-16 scenarios (as seen in WinGetText):
;~  If $bForceCnvt Then
;~      Return BinaryToString(StringToBinary($sStr, 2), 1)
;~  EndIf

    $nLen = StringLen($sStr)
    ; Struct for string (+1 for null-term)
    $stStrVer = DllStructCreate("wchar [" & $nLen + 1 & "]")
    ; Create a union, granting us binary 1-byte access to the wide chars
    $stBinVer = DllStructCreate("byte [" & $nLen * 2 & "]", DllStructGetPtr($stStrVer))
    ; Set String in structure
    DllStructSetData($stStrVer, 1, $sStr)

    ; Load string as binary data, convert to ANSI string
    ;   AND Replace 0's with 0xFFFD (the Unicode 'REPLACEMENT CHARACTER')
    $sTmp = StringReplace(BinaryToString(DllStructGetData($stBinVer, 1)), ChrW(0), ChrW(0xFFFD), 0, 2)
    $nReplacements = @extended

    ; Trim off null-terminator and any other trailing 0's at the end (all converted to 0xFFFD's)
    While (StringRight($sTmp, 1) = ChrW(0xFFFD))
        $sTmp = StringTrimRight($sTmp, 1)
        $nReplacements -= 1
    WEnd

    ; If no replacements remain, then every byte contains data, so it's a safe bet the string is ANSI
    ; Also, in mixed-ANSI/UTF-16 situations (sometimes seen in WinGetText), allow a force
    If ($nReplacements = 0 Or $bForceCnvt) Then
        Return $sTmp
        ; Same result as:
        ;Return BinaryToString(StringToBinary($sStr, 2), 1)
    Else
        Return $sStr
    EndIf
EndFunc   ;==>_StringFixANSIInWide

*updates:
- Added a 'force' parameter for scenarios where WinGetText() will return a mix of ANSI and Unicode text in the same string. The result will contain some '?'s in these scenarios, but there's really nothing you can do without modifying the AutoIt source code.

update: problem no longer exists as of v3.3.9.11 (see BugTracker ticket # 2362)
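
The null-byte test at the heart of _StringFixANSIInWide can be restated outside AutoIt. The following is only a rough Python sketch of that idea, not a port of the function: the name fix_ansi_in_wide, the codepage parameter, and the cp1252 default are assumptions made for illustration.

```python
# Sketch of the heuristic: if the UTF-16LE bytes of a string contain no zero
# bytes (ignoring trailing nulls), assume it is ANSI text packed into wide chars.
# Assumption: cp1252 stands in for the system ANSI code page.


def fix_ansi_in_wide(text: str, force: bool = False, codepage: str = "cp1252") -> str:
    if not text:
        return text
    # Byte-wise view of the wide string, with trailing nulls trimmed
    raw = text.encode("utf-16-le", errors="surrogatepass").rstrip(b"\x00")
    if force or b"\x00" not in raw:
        # Every byte carries data (or the caller insists): read the bytes as ANSI
        # and mark any remaining nulls with U+FFFD, as the AutoIt version does
        return raw.decode(codepage, errors="replace").replace("\x00", "\ufffd")
    # Zero bytes inside the data suggest a genuine wide string: leave it alone
    return text


print(ascii(fix_ansi_in_wide("\u3478")))        # packed "x4" is unpacked
print(ascii(fix_ansi_in_wide("Hello")))         # genuine wide text is untouched
print(ascii(fix_ansi_in_wide("\u65e5\u672c")))  # wide text without ASCII is misread
```

The last call shows the failure case mentioned in the post: a genuine wide string with no ASCII characters has no zero bytes, so the heuristic wrongly treats it as ANSI.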