jusb3 Posted May 13, 2014 (edited)
How can I decode a string containing Unicode escapes? For example, "mer\u00e4\u00e4" would turn out as "merää". Is there any built-in function for that?
Edited May 14, 2014 by jusb3
guinness Posted May 13, 2014
Are you sure that's correct? As I thought 34 decimal was ".

    Local $sString = "mer\u0034\u0034"
    Local $aSRE = StringRegExp($sString, "\\u(\d+)", 3)
    For $i = 0 To UBound($aSRE) - 1
        $sString = StringReplace($sString, "\u" & $aSRE[$i], ChrW(Int($aSRE[$i])))
    Next
    ConsoleWrite($sString & @CRLF)
    ClipPut($sString)
jusb3 Posted May 14, 2014 Author
guinness said: Are you sure that's correct? As I thought 34 decimal was ".
Oh sorry, it should have been \u00e4; it's a hex value. But I will take a look at your example.
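Since the escapes carry hex digits, the loop posted above needs two small changes: the pattern has to accept hex digits rather than only \d, and the captured digits have to be converted with Dec() rather than Int(). The following is a minimal sketch of that adjustment, not a built-in; the function name _DecodeUnicodeEscapes is made up for illustration, and the sketch assumes every escape has exactly four hex digits.

```autoit
; Sketch only: replace every \uXXXX (four hex digits) with the character it encodes.
; _DecodeUnicodeEscapes is a hypothetical helper name, not part of AutoIt.
Func _DecodeUnicodeEscapes($sString)
    ; Flag 3 returns all captured groups in a flat array; @error is set when nothing matches.
    Local $aSRE = StringRegExp($sString, "\\u([[:xdigit:]]{4})", 3)
    If @error Then Return $sString
    For $i = 0 To UBound($aSRE) - 1
        ; Dec() reads the digits as hexadecimal; Int() would treat them as decimal.
        ; Occurrence 0 replaces all occurrences, casesense 1 keeps the match exact.
        $sString = StringReplace($sString, "\u" & $aSRE[$i], ChrW(Dec($aSRE[$i])), 0, 1)
    Next
    Return $sString
EndFunc   ;==>_DecodeUnicodeEscapes

ConsoleWrite(_DecodeUnicodeEscapes("mer\u00e4\u00e4") & @CRLF)
```

The returned string is "merää"; how it looks in the console depends on the console's encoding settings. One known limitation of this approach: it does not understand an escaped backslash, so input such as "\\u00e4" (a literal backslash followed by the text u00e4) would be decoded when it should not be.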
czardas Posted May 14, 2014 (edited)
Where can I find such strange strings? It doesn't look like any standard format I have seen before. Did you invent it ('mer\u00e4\u00e4')? BTW the exact referenced string can be encoded as ANSI. It looks like a regular expression. The following variant would appear to be more consistent: '\u006D\u0065\u0072\u00E4\u00E4'. Perhaps you (or someone) could enlighten me where this format is used.
Edited May 15, 2014 by czardas
jchd Posted May 15, 2014
czardas said: Perhaps you (or someone) could enlighten me where this format is used.
JSON
czardas Posted May 15, 2014 (edited)
Okay, thanks. Are the encodings actually mixed and matched like that: half ANSI, half code points? I suppose they do more or less the same thing in HTML. The reason I was asking was to get a clearer picture. 'mer\u00e4\u00e4' doesn't reveal enough information to infer the formatting rules, and the OP seemed to think people would already have a solution, indicating it was a recognized format. There would need to be a way to escape the backslash character, like there is in regexp.
Edited May 15, 2014 by czardas
jchd Posted May 15, 2014
It's an initial BOM + Unicode (UTF-8 by default) + Unicode escapes in UTF-16 and 2-character escapes similar to those in C. A backslash would be \u005C or \\. It's a real mess where "extensions" are allowed, probably to increase the odds of non-interoperability and hidden bugs. It's no wonder, since JSON means JavaScript Object Notation. They even pooped an RFC.
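Because a backslash can itself be escaped, a decoder has to consume escapes strictly from left to right; otherwise the text after an escaped backslash can be mistaken for a Unicode escape. Below is a minimal sketch of such a single-pass decoder. The name _JsonUnescape is hypothetical, the sketch performs no validation, and anything it does not recognise after a backslash is simply copied through as the bare character.

```autoit
; Sketch only: decode JSON-style string escapes in one left-to-right pass.
; _JsonUnescape is a hypothetical helper name; the input is not validated.
Func _JsonUnescape($sIn)
    Local $sOut = "", $sEsc, $sHex, $i = 1, $iLen = StringLen($sIn)
    While $i <= $iLen
        If StringMid($sIn, $i, 1) <> "\" Or $i = $iLen Then
            ; Ordinary character (or a lone trailing backslash): copy it through.
            $sOut &= StringMid($sIn, $i, 1)
            $i += 1
            ContinueLoop
        EndIf
        $sEsc = StringMid($sIn, $i + 1, 1)
        $sHex = StringMid($sIn, $i + 2, 4)
        ; == is the case-sensitive comparison, so only a lowercase u starts a Unicode escape.
        If $sEsc == "u" And StringLen($sHex) = 4 And StringIsXDigit($sHex) Then
            $sOut &= ChrW(Dec($sHex)) ; one UTF-16 code unit
            $i += 6
            ContinueLoop
        EndIf
        Select
            Case $sEsc == "n"
                $sOut &= @LF
            Case $sEsc == "r"
                $sOut &= @CR
            Case $sEsc == "t"
                $sOut &= @TAB
            Case $sEsc == "b"
                $sOut &= Chr(8)
            Case $sEsc == "f"
                $sOut &= Chr(12)
            Case Else
                ; Escaped backslash, quote or slash; unknown escapes also end up here.
                $sOut &= $sEsc
        EndSelect
        $i += 2
    WEnd
    Return $sOut
EndFunc   ;==>_JsonUnescape
```

With this sketch, _JsonUnescape("mer\u00e4\u00e4") returns "merää", while _JsonUnescape("C:\\u00e4") returns the literal text C:\u00e4, because the doubled backslash is consumed first. For real JSON data a full parser is the safer choice; this only illustrates the escape rules described above.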
czardas Posted May 15, 2014
LMAO
Jury Posted May 18, 2014 (edited)
Depending on the PCRE version, Jan Goyvaerts says: "Perl and PCRE do not support the \uFFFF syntax. They use \x{FFFF} instead. You can omit leading zeros in the hexadecimal number between the curly braces."
So, omitting the leading zeros, what you have for merää can be written as \x6D\x65\x72\xE4\xE4, but for a code point above U+00FF, such as €, you can use the \x{FFFF} format:

    Local $sString = "€"
    Local $aSRE = StringRegExp($sString, "\x{20AC}", 3)
    For $i = 0 To UBound($aSRE) - 1
        ConsoleWrite($aSRE[$i] & @CRLF)
    Next

Edited May 18, 2014 by Jury