Showing results for tags 'utf-8'.
-
I am trying to find information on using UTF-8 strings in AutoIt. After searching extensively, I cannot find anything conclusive on this topic. What I need to do is FileRead() into a string variable (or array) and keep the UTF-8 encoding. Some articles, and even the Help documentation on FileOpen(), suggest that current versions of AutoIt can read and store UTF-8 internally, but my test of reading a web page containing UTF-8 encoded characters into a variable fails.

Does/can AutoIt use strings encoded as UTF-8, and if so, how? If not, does anyone know of a UDF or a C/WinAPI routine that allows a UTF-8 array to be used in AutoIt? What does AutoIt use internally for strings? Is it converting the UTF-8 file to a UCS-2 string in the variable?

The following is an example which fails for me.

    ;UTF-8 Tests
    #include <FileConstants.au3>
    #include <MsgBoxConstants.au3>
    #include <WinAPIFiles.au3>

    ;https://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html
    ;Also all checked in Notepad++ UTF-8 Encoding (Many Characters are scrambled)
    Local $sFile1 = "UTF-8 test file.htm"; 414 Lines | 76,412 characters. "UTF-8 test file.htm" = "/UTF-8-demo.html"
    Local $sFile2 = "test2.html"

    Local $hfile1 = FileOpen($sFile1, BitOr($FO_READ, $FO_UTF8_NOBOM))
    If $hfile1 = -1 Then ; FileOpen() signals failure by returning -1, not by setting @error
        MsgBox($MB_SYSTEMMODAL, "FileOpen1", "FileOpen() failed (returned -1) for: " & $sFile1)
    EndIf

    Local $sAm_I_UFT_8 = FileRead($hfile1, -1);Does not appear to read UTF-8 characters correctly from the "UTF-8 test file.htm"
    If @error Then
        MsgBox($MB_SYSTEMMODAL, "FileRead", "Value of @error is: " & @error & @CRLF & "Value of @extended is: " & @extended)
    EndIf
    FileClose($hfile1)

    Local $sAm_I_Still_UTF_8 = $sAm_I_UFT_8 ;Are these two strings stored internally as UTF-8 ?
    If @error Then ; a plain assignment never sets @error, so this check never fires
        MsgBox($MB_SYSTEMMODAL, "String=String", "Value of @error is: " & @error & @CRLF & "Value of @extended is: " & @extended)
    EndIf

    Local $iStrLen1 = StringLen($sAm_I_UFT_8)
    Local $iStrLen2 = StringLen($sAm_I_Still_UTF_8)
    MsgBox($MB_SYSTEMMODAL, "String Length of $sAm_I_UFT_8", $iStrLen1); 414 Lines | 70,174 characters
    MsgBox($MB_SYSTEMMODAL, "String Length of $sAm_I_Still_UTF_8", $iStrLen2); 414 Lines | 70,174 characters

    Local $hfile2 = FileOpen($sFile2, BitOR($FO_OVERWRITE, $FO_BINARY))
    If $hfile2 = -1 Then ; same as above: check the returned handle
        MsgBox($MB_SYSTEMMODAL, "FileOpen2", "FileOpen() failed (returned -1) for: " & $sFile2)
    EndIf

    Local $iWritten = FileWrite($hfile2, $sAm_I_Still_UTF_8) ;If $sAm_I_Still_UTF_8 is actual UTF-8 it should be an exact copy of the original "UTF-8 test file.htm"
    If $iWritten = 0 Then ; FileWrite() returns 0 on failure
        MsgBox($MB_SYSTEMMODAL, "FileWrite", "FileWrite() failed (returned 0)")
    EndIf
    FileClose($hfile2)
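One possible reason for the gap between 76,412 and 70,174 in the post is that StringLen() counts characters while the file size counts bytes, and UTF-8 uses more than one byte for non-ASCII characters. The sketch below only illustrates that difference on a short made-up sample string; it assumes, as far as I know from the AutoIt help, that strings are held as UTF-16 internally and that UTF-8 exists only as binary data produced by StringToBinary(). It is not taken from the original thread.

```autoit
#include <StringConstants.au3>

; Sample text (an assumption for illustration): "Zürich €", built with ChrW()
; so the encoding of the script file itself does not matter.
Local $sText = "Z" & ChrW(252) & "rich " & ChrW(0x20AC)

; Encode to UTF-8 bytes and decode back again.
Local $bUtf8 = StringToBinary($sText, $SB_UTF8)
Local $sBack = BinaryToString($bUtf8, $SB_UTF8)

; 8 characters, but 11 UTF-8 bytes: ü takes 2 bytes and € takes 3.
ConsoleWrite("Characters:  " & StringLen($sText) & @CRLF)
ConsoleWrite("UTF-8 bytes: " & BinaryLen($bUtf8) & @CRLF)
ConsoleWrite("Round trip identical: " & ($sBack == $sText) & @CRLF)
```

If a byte-for-byte copy of the file is the goal, reading it with $FO_BINARY and writing the resulting binary back out avoids any text conversion at all.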
-
I need to read log files into an array to search for errors. However, when I display the array I get garbage or "Chinese characters". Our developers say they are using UTF-8, but FileGetEncoding says the logs are "2048", or $FO_UTF16_BE_NOBOM (2048) = Use Unicode UTF16 Big Endian (without BOM), from the encoding codes in FileOpen(). There is an app called Detenc that detects the encoding used by files. You have to guess, but it returns correctly when I set the encoder to UTF-8. I understand encoding is not etched in stone, but the first character of the file is a capital B according to the HxD hex editor. I even have another topic here about running PowerShell to re-encode the file so AutoIt will store the file properly in the array. So I am trying to figure out why AutoIt thinks my logs are not UTF-8. Here is sample code:

    #include <array.au3>
    #include <File.au3>

    Local $aRetArrayFile
    _FileReadToArray("C:\Logs\Myplayer1.log", $aRetArrayFile)
    _ArrayDisplay($aRetArrayFile)

I won't post the results as they are illegible, but I did attach a screenshot of the _ArrayDisplay results, and this is the first line of the log file:

    BANNER 10/10/2017 15:56:00 ======================================================================

And the hex from the beginning of the file:

    42 41 4E 4E 45 52 20 31 30 2F 31 30 2F 32 30 31 37 20 31 34 3A 33 31 3A 33 35 20 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 0D 0A 42 41 4E 4E 45 52 20

So I don't understand why AutoIt thinks the file is UTF16 BE. If I can get the PowerShell script running, I have a workaround. BTW, none of my other arrays display as garbage, just the log files. Weird.

Rereading my post, what seems to be missing is the question. I guess my question is: does anyone know why these logs are being displayed incorrectly?

Cheers
Jibs
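One thing that may be worth trying here is to stop relying on encoding detection and open the log with an explicit UTF-8 mode before reading the lines. The sketch below is only an assumption about what could help, not a confirmed fix from the thread; `_ReadLinesAsUtf8` is a hypothetical helper name, while FileOpen(), FileReadToArray() and the $FO_* constants are standard AutoIt.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: read all lines of a file while asking FileOpen() for
; UTF-8 (no BOM) instead of relying on automatic encoding detection.
Func _ReadLinesAsUtf8($sPath)
    Local $hFile = FileOpen($sPath, BitOR($FO_READ, $FO_UTF8_NOBOM))
    If $hFile = -1 Then Return SetError(1, 0, 0) ; FileOpen() returns -1 on failure

    ; FileReadToArray() accepts a file handle and returns a zero-based array.
    Local $aLines = FileReadToArray($hFile)
    Local $iError = @error ; keep the value before FileClose() can reset it
    FileClose($hFile)

    If $iError Then Return SetError(2, $iError, 0)
    Return $aLines
EndFunc

; Usage with the log path from the post.
Local $aLog = _ReadLinesAsUtf8("C:\Logs\Myplayer1.log")
If @error Then
    ConsoleWrite("Could not read the log, @error = " & @error & @CRLF)
Else
    ConsoleWrite("First line: " & $aLog[0] & @CRLF)
EndIf
```

If the lines come back readable this way, the problem is likely in the detection step rather than in the file itself.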
-
I need help with the Unicode character ü. I get some text from online JSON, but if I try to read, for example, Zürich, I have Zürich. How can I convert Unicode to clear, readable characters with AutoIt? Thanks
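A common cause of this symptom is that UTF-8 bytes from the download are decoded as ANSI. The following is a minimal sketch under that assumption (the post does not show how the JSON is fetched): it decodes the same bytes with BinaryToString() using both flags so the difference is visible. The byte sequence is a hand-written sample, not data from the original post.

```autoit
#include <StringConstants.au3>

; UTF-8 bytes of "Zürich" (ü is the two bytes C3 BC). In a real script these
; would come from the download, e.g. the binary data returned by InetRead().
Local $bRaw = Binary("0x5AC3BC72696368")

; Decoding as ANSI treats each byte as one character, which garbles ü.
Local $sAsAnsi = BinaryToString($bRaw, $SB_ANSI)
; Decoding as UTF-8 combines C3 BC into the single character ü.
Local $sAsUtf8 = BinaryToString($bRaw, $SB_UTF8)

ConsoleWrite("Decoded as ANSI:  " & StringLen($sAsAnsi) & " characters" & @CRLF)
ConsoleWrite("Decoded as UTF-8: " & StringLen($sAsUtf8) & " characters" & @CRLF)
ConsoleWrite("Code of 2nd character (UTF-8 decode): " & AscW(StringMid($sAsUtf8, 2, 1)) & @CRLF)
```

The UTF-8 decode yields 6 characters with code 252 (ü) in second position; the ANSI decode yields more characters because each byte is mapped separately.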
-
Hello, I need to save files with ANSI encoding. Since AutoIt 3.3.14.2 it doesn't work, whatever I try. I tried the following:

    #include <FileConstants.au3>
    FileDelete(@ScriptDir&"\Test.txt")
    $o = FileOpen(@ScriptDir&"\Test.txt", BitOR($FO_BINARY,$FO_ANSI,$FO_OVERWRITE))
    FileWrite($o, "Test")
    FileClose($o)

Or

    #include <FileConstants.au3>
    FileDelete(@ScriptDir&"\Test.txt")
    $o = FileOpen(@ScriptDir&"\Test.txt", 514)
    FileWrite($o, "Test")
    FileClose($o)

Both create UTF-8 encoded files. What am I doing wrong? Thank you!
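One thing to keep in mind is that the word "Test" is pure ASCII, and pure ASCII is byte-for-byte identical in ANSI and in UTF-8 without a BOM, so an editor cannot tell the two apart from this content and may simply label the file UTF-8. A way to check what was really written is to include a non-ASCII character and look at the raw bytes. The sketch below does that; the temporary file name and the sample text are assumptions for illustration.

```autoit
#include <FileConstants.au3>

Local $sPath = @TempDir & "\ansi_check.txt" ; hypothetical scratch file
; "Zürich", built with ChrW() so the encoding of the script file does not matter.
Local $sText = "Z" & ChrW(252) & "rich"

; Write as text in ANSI mode (no $FO_BINARY).
Local $hOut = FileOpen($sPath, BitOR($FO_OVERWRITE, $FO_ANSI))
If $hOut = -1 Then Exit 1
FileWrite($hOut, $sText)
FileClose($hOut)

; Read the raw bytes back so no decoding is involved.
Local $hIn = FileOpen($sPath, BitOR($FO_READ, $FO_BINARY))
If $hIn = -1 Then Exit 1
Local $bBytes = FileRead($hIn)
FileClose($hIn)
FileDelete($sPath)

; 6 bytes would mean one byte per character (ANSI);
; 7 bytes would mean ü was stored as two bytes (UTF-8).
ConsoleWrite("Bytes on disk: " & String($bBytes) & " (" & BinaryLen($bBytes) & " bytes)" & @CRLF)
```

The result depends on the system code page, so the byte count is the thing to look at rather than what an editor reports.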
-
Hello! I've been lurking around for a loooong time... and I decided to finally share a little. I do a lot of internet stuff, mostly machine to machine for work (instrumentation), so I have quite a few "RFC" scripts. Disclaimer: these work for me... but I sometimes use... "shortcuts" based on my particular requirements. For example, the Base64 encoding snippet might not be too good for binary data: I pad the original data with spaces to avoid the "==" padding of Base64.

So... first is the Base64 encoding snippet. It is not in a function; it was in a sequential program, used only once! It encodes $Graph to $SMTPMessage:

    ; For _ArrayAdd
    #include <Array.au3>

    ; Create the base64 encoding table
    Dim $Base64EncodingTable[0]
    For $Cpt = Asc("A") to Asc("Z")
        _ArrayAdd($Base64EncodingTable, Chr($Cpt))
    Next
    For $Cpt = Asc("a") to Asc("z")
        _ArrayAdd($Base64EncodingTable, Chr($Cpt))
    Next
    For $Cpt = Asc("0") to Asc("9")
        _ArrayAdd($Base64EncodingTable, Chr($Cpt))
    Next
    _ArrayAdd($Base64EncodingTable, "+")
    _ArrayAdd($Base64EncodingTable, "/")

    ; Pad the SVG Graph to attach with space(s). Lazy way to avoid base64 == padding
    While Mod(StringLen($Graph), 3) <> 0
        $Graph &= " "
    WEnd

    ; Start from the first character
    $Cpt = 1
    Do
        ; Extract the 3 characters to encode
        $Char1 = Asc(StringMid($Graph, $Cpt, 1))
        $Char2 = Asc(StringMid($Graph, $Cpt+1, 1))
        $Char3 = Asc(StringMid($Graph, $Cpt+2, 1))

        ; Encode them to 4 characters
        $SMTPMessage &= $Base64EncodingTable[BitShift(BitAND($Char1, 252), 2)]
        $SMTPMessage &= $Base64EncodingTable[BitShift(BitAND($Char1, 3), -4) + BitShift(BitAND($Char2, 240), 4)]
        $SMTPMessage &= $Base64EncodingTable[BitShift(BitAND($Char2, 15), -2) + BitShift(BitAND($Char3, 192), 6)]
        $SMTPMessage &= $Base64EncodingTable[BitAND($Char3, 63)]

        ; Increment the counter, and if required, add a @CRLF to split in multiple lines
        $Cpt += 3
        If Mod($Cpt, 57) = 1 Then $SMTPMessage &= @CRLF

    ; Do this until all the graph has been encoded
    Until $Cpt >= StringLen($Graph)

Second... I just finished this one and was already thinking about sharing it... so it's been encapsulated into functions a bit more. I use it to decode email subjects in a system where you can update something by email. I separated the Base64Decode function so it can be grabbed more easily. Please note that it returns a hex string, so if it's a string you would still need to convert it with BinaryToString or whatever suits your needs. It can be copied as is and run directly... it includes my test strings! (Yes... I'm French!)

    ; For the $SB_UTF8 and $SB_ANSI Variable
    #include <StringConstants.au3>
    ; For _ArrayAdd and _ArraySearch used in the Base64 decoder
    #include <Array.au3>

    ; Various test sentences...
    ;$text = "=?UTF-8?Q?Ce=c3=a7i_est_un_autre_test!_h=c3=a9h=c3=a9!?=" ; Normal UTF-8
    ;$text = "=?UTF-8?Q?Encore_=3d_un_autre_test_=c3=a9_?=" ; "=" added
    ;$text = "=?UTF-8?Q?un_autre_test_=5f_=c3=a9?=" ; "_" added
    ;$text = "=?UTF-8?B?Q2XDp2kgZXN0IHVuIGF1dHJlIHRlc3QhID0gXyBow6low6kh?=" ; UTF-8 Base64
    $text = "=?UTF-8?B?ZcOnaSBlc3QgdW4gYXV0cmUgdGVzdCEgPSBfIGjDqWjDqSE=?=" ; UTF-8 Base64 with padding
    ;$text = "=?iso-8859-1?Q?Ce=E7i_est_un_test!?=" ; iso-8859-1

    MsgBox(0, "", DecodeHeader($text))

    Func DecodeHeader($lString)
        ; Check and store encoding type
        If StringInStr($lString, "?Q?") Then
            ; Quoted printable content
            $lType = "?Q?"
        ElseIf StringInStr($lString, "?B?") Then
            ; Base64 encoding
            $lType = "?B?"
        Else
            ; No encoding (or unknown encoding)
            return($lString)
        EndIf

        ; Start of the charset string
        $lStart = StringInStr($lString, "=?") + 2
        ; End of the charset string
        $lStop = StringInStr($lString, $lType)
        ; Charset variable, storing "UTF-8" or "iso-8859-1"
        $lEncoding = StringMid($lString, $lStart, $lStop-$lStart)

        ; Change encoding type for the BinaryToString flag
        If $lEncoding = "UTF-8" Then
            $lEncoding = $SB_UTF8
        ElseIf $lEncoding = "iso-8859-1" Then
            $lEncoding = $SB_ANSI
        Else
            MsgBox(0, "", "Unknown character set")
            Exit
        EndIf

        ; Start of the actual encoded content
        $lStart = $lStop + 3
        ; End of the actual encoded content
        $lStop = StringInStr($lString, "?=")
        ; Actual content to decode
        $lString = StringMid($lString, $lStart, $lStop-$lStart)

        ; For Quoted printable content
        If $lType == "?Q?" Then
            ; Restore underscore encoded spaces
            $lString = StringReplace($lString, "_", " ")
            ; Starting with the first character of the string
            $lCpt = 1
            ; "=XX" search and convert loop
            While 1
                ; There will be 0 characters to convert in that block unless...
                $lConvertableLenght = 0
                ; That character, and another one 3 bytes over... and the next, and the next...
                For $lCpt2 = 0 to 100
                    ; Is equal to "="
                    If StringMid($lString, $lCpt+($lCpt2*3), 1) == "=" Then
                        ; In that case, yes, we will have to convert 3 more characters
                        $lConvertableLenght += 3
                    Else
                        ; But if we fail to find or reach the end of a block of encoded characters, exit the search
                        ExitLoop
                    EndIf
                Next
                ; If we did in fact find some encoded characters
                If $lConvertableLenght > 0 Then
                    ; Extract that block of encoded characters
                    $lConvertableString = StringMid($lString, $lCpt, $lConvertableLenght)
                    ; Convert it
                    $lConvertedString = BinaryToString("0x" & StringReplace($lConvertableString, "=", ""), $lEncoding)
                    ; Replace it in the original
                    $lString = StringReplace($lString, $lConvertableString, $lConvertedString)
                EndIf
                ; Increment the "=XX" search and convert loop counter
                $lCpt += 1
                ; If we reached the end of the string, exit the "=XX" search and convert loop
                If $lCpt >= StringLen($lString) Then ExitLoop
            ; Continue searching in the "=XX" search and convert loop
            WEnd
        ; For Base64 encoded strings
        Else
            ; Use the separate Base64Decode function
            $lString = Base64Decode($lString)
            $lString = BinaryToString($lString, $lEncoding)
        EndIf

        return($lString)
    EndFunc

    Func Base64Decode($lEncoded)
        ; Create the base64 encoding table
        Dim $Base64EncodingTable[0]
        For $Cpt = Asc("A") to Asc("Z")
            _ArrayAdd($Base64EncodingTable, Chr($Cpt))
        Next
        For $Cpt = Asc("a") to Asc("z")
            _ArrayAdd($Base64EncodingTable, Chr($Cpt))
        Next
        For $Cpt = Asc("0") to Asc("9")
            _ArrayAdd($Base64EncodingTable, Chr($Cpt))
        Next
        _ArrayAdd($Base64EncodingTable, "+")
        _ArrayAdd($Base64EncodingTable, "/")

        ; Start from the first character
        $Cpt = 1
        $Decoded = "0x"
        Do
            ; Extract the 4 characters to decode
            $Char1 = StringMid($lEncoded, $Cpt, 1)
            $Char2 = StringMid($lEncoded, $Cpt+1, 1)
            $Char3 = StringMid($lEncoded, $Cpt+2, 1)
            $Char4 = StringMid($lEncoded, $Cpt+3, 1)

            ; Decode them
            $Decoded &= Hex(BitShift(_ArraySearch($Base64EncodingTable, $Char1, 0, 0, 1), -2) + BitShift(BitAnd(_ArraySearch($Base64EncodingTable, $Char2, 0, 0, 1), 48), 4), 2)
            If $Char3 <> "=" Then $Decoded &= Hex(BitShift(BitAnd(_ArraySearch($Base64EncodingTable, $Char2, 0, 0, 1), 15), -4) + BitShift(BitAnd(_ArraySearch($Base64EncodingTable, $Char3, 0, 0, 1), 60), 2), 2)
            If $Char4 <> "=" Then $Decoded &= Hex(BitShift(BitAnd(_ArraySearch($Base64EncodingTable, $Char3, 0, 0, 1), 3), -6) + _ArraySearch($Base64EncodingTable, $Char4, 0, 0, 1), 2)

            ; Increment the counter
            $Cpt += 4

        ; Do this until all the encoded string has been decoded
        Until $Cpt >= StringLen($lEncoded)

        return($Decoded)
    EndFunc

Last thing... I may update it into a better format for you, like a standalone telnet program with GUI. It is my telnet options negotiation loop. The basic concept is to systematically deny all requests for special options and keep it "raw". If the server says "Will", I reply "Don't". If it says "Do", I reply "Won't"... unless it's the terminal type subnegotiation, in which case I reply xterm! $Data for now needs to be Global. You still need to know what you're doing: opening sockets and making a basic communication loop or something.

    Global $T_Is = Chr(0)
    Global $T_Send = Chr(1)
    Global $T_TerminalType = Chr(24)
    Global $T_SE = Chr(240)
    Global $T_SB = Chr(250)
    Global $T_Will = Chr(251)
    Global $T_Wont = Chr(252)
    Global $T_Do = Chr(253)
    Global $T_Dont = Chr(254)
    Global $T_IAC = Chr(255)

    Func NegotiateTelnetOptions()
        $NegotiationCommandsToSendBack = ""
        While StringInStr($Data, $T_IAC)
            $IACPosition = StringInStr($Data, $T_IAC)
            Switch StringMid($Data, $IACPosition+1, 1)
                Case $T_Will
                    $NegotiationCommandsToSendBack &= CraftReply_CleanUpData($IACPosition, $T_Dont)
                Case $T_Do
                    If StringMid($Data, $IACPosition+2, 1) = $T_TerminalType Then
                        $NegotiationCommandsToSendBack &= CraftReply_CleanUpData($IACPosition, $T_Will)
                    Else
                        $NegotiationCommandsToSendBack &= CraftReply_CleanUpData($IACPosition, $T_Wont)
                    EndIf
                Case $T_SB
                    If StringMid($Data, $IACPosition, 6) = ($T_IAC & $T_SB & $T_TerminalType & $T_Send & $T_IAC & $T_SE) Then
                        $NegotiationCommandsToSendBack &= $T_IAC & $T_SB & $T_TerminalType & $T_Is & "xterm" & $T_IAC & $T_SE
                        $Data = StringReplace($Data, StringMid($Data, $IACPosition, 6), "")
                    Else
                        MsgBox(0, "", "Unknown Subnegotiation...") ; Should never happen.
                        Exit
                    EndIf
            EndSwitch
        WEnd
        Return $NegotiationCommandsToSendBack
    EndFunc

    Func CraftReply_CleanUpData($IACPosition, $Reply)
        $PartialCommandToSendBack = $T_IAC & $Reply & StringMid($Data, $IACPosition+2, 1)
        $Data = StringReplace($Data, StringMid($Data, $IACPosition, 3), "")
        Return $PartialCommandToSendBack
    EndFunc
-
Hello, I have lots of text like this. I have no idea what this \u stuff means; after googling it I found that it is some sort of UTF encoding. How could I write this text to a file without losing any characters while getting rid of the \u? EDIT: after more digging I found that I need "Converting Unicode Entities to Unicode Text". Any ideas how to do that with AutoIt?
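A minimal sketch of one way to do this, assuming the text uses JSON-style escapes of the form \uXXXX with exactly four hex digits (the sample text from the post is not shown, so the input string below is made up). `_UnescapeUnicode` is a hypothetical helper name; StringRegExp(), Dec(), ChrW() and StringReplace() are standard AutoIt functions. It does not handle an escaped backslash such as \\u0041, which a real JSON parser would treat as literal text.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: replace every \uXXXX escape with the character it names.
Func _UnescapeUnicode($sText)
    ; Flag 3 returns an array with every captured group (the four hex digits).
    Local $aCodes = StringRegExp($sText, "\\u([0-9A-Fa-f]{4})", 3)
    If @error Then Return $sText ; no escapes found, nothing to do

    For $i = 0 To UBound($aCodes) - 1
        ; Dec() converts the hex digits to a number, ChrW() to a UTF-16 character.
        $sText = StringReplace($sText, "\u" & $aCodes[$i], ChrW(Dec($aCodes[$i])))
    Next
    Return $sText
EndFunc

; Made-up sample input; backslashes are literal inside AutoIt strings.
Local $sClean = _UnescapeUnicode("Z\u00fcrich, caf\u00e9")
ConsoleWrite("Length after unescaping: " & StringLen($sClean) & @CRLF)

; Write the result as UTF-8 so no character is lost.
Local $hOut = FileOpen(@TempDir & "\unescaped.txt", BitOR($FO_OVERWRITE, $FO_UTF8))
If $hOut <> -1 Then
    FileWrite($hOut, $sClean)
    FileClose($hOut)
EndIf
```

Opening the output file with an explicit Unicode mode such as $FO_UTF8 is what keeps the converted characters intact on disk.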