lgwapnitsky (thread author), posted May 19, 2010 (edited)

Quote: "No problem, but I'm beginning to think you might want to do another replacement after the loop, and that is to delete invalid characters. That way you probably wouldn't need to validate. I can whip up that SRER quickly since I already had it earlier but didn't think to save it."

That's what this is for:

    #include <Array.au3> ; for _ArrayToString()
    ; Flag 3 returns every run of allowed characters as an array of matches
    $Testarray = StringRegExp($sStr, "([[:alnum:]\-\s\x27\x9e\x9f\xc9-\xff]+)", 3)
    ; Join the runs back together with no separator
    MsgBox(0, "Result2", _ArrayToString($Testarray, ""))

Actually, this code just concatenates the allowed runs and so leaves out the bad characters. My SRER is working fine right now... for the characters I need.

Edited May 19, 2010 by lgwapnitsky
GEOSoft, posted May 19, 2010

Instead of bothering to create the array with the RegEx, you might want to think about just deleting anything except the allowable characters.

    $sUserName = StringRegExpReplace($sUserName, "[^[:alnum:]\-\s\x27\x9e\x9f\xc9-\xff]", "")

In this one, the ^ right after the opening "[" negates the class: anything listed between that first "[" and the last "]" is excluded from deletion, and every other character is replaced with "".

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums. Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.*** The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number. Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else. "Old age and treachery will always overcome youth and skill!"
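The two approaches above (collect the allowed runs and join them, versus delete everything outside a negated class) give the same result. Here is a minimal sketch in Python rather than AutoIt so it can be run stand-alone. Python's `re` has no POSIX `[:alnum:]` class, so `A-Za-z0-9` is an assumed stand-in, and the sketch uses `\xc0` as the lower bound of the accented range. The sample string is made up for illustration.

```python
import re

# Assumed stand-in for PCRE's [[:alnum:]] (Python's re has no POSIX classes),
# plus hyphen, whitespace, apostrophe (\x27) and the Latin-1 range \xc0-\xff.
ALLOWED = r"A-Za-z0-9\-\s\x27\xc0-\xff"

def keep_by_findall(text: str) -> str:
    # First post's approach: find every run of allowed characters, then join.
    return "".join(re.findall(f"[{ALLOWED}]+", text))

def keep_by_sub(text: str) -> str:
    # GEOSoft's approach: the leading ^ negates the class, so every
    # character NOT listed is replaced with an empty string.
    return re.sub(f"[^{ALLOWED}]", "", text)

sample = "O'Brien, Zoë #1"
print(keep_by_findall(sample))  # O'Brien Zoë 1
print(keep_by_sub(sample))      # O'Brien Zoë 1
```

The negated-class version does the job in a single call with no intermediate array, which is the point GEOSoft makes.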
lgwapnitsky (thread author), posted May 19, 2010

Quote (GEOSoft): "Instead of bothering to create the array with the RegEx you might want to think about just deleting anything except the allowable characters."

Very clean... I like it. Thanks.
lgwapnitsky (thread author), posted May 19, 2010

Quote: "Very clean... I like it. Thanks."

BTW, shouldn't it be \xc0 instead of \xc9? That's where A with grave (À) is...
GEOSoft, posted May 19, 2010

Quote (lgwapnitsky): "BTW, shouldn't it be \xc0 instead of \xc9? That's where A with grave is..."

Should it? I just copied the characters you were using; however, \xc0 is correct. Easily tested if you follow that link I gave you.

George
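To see what the \xc9 lower bound costs, here is a quick Python check (used here only because it runs stand-alone) listing the code points U+00C0 to U+00C8, which a `\xc9-\xff` range excludes and a `\xc0-\xff` range keeps.

```python
import unicodedata

# Code points U+00C0..U+00C8: kept by a \xc0-\xff range,
# but excluded (and so deleted) by a \xc9-\xff range.
lost = "".join(chr(cp) for cp in range(0xC0, 0xC9))
print(lost)                         # ÀÁÂÃÄÅÆÇÈ
print(unicodedata.name(chr(0xC0)))  # LATIN CAPITAL LETTER A WITH GRAVE
print(unicodedata.name(chr(0xC9)))  # LATIN CAPITAL LETTER E WITH ACUTE
```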
Ascend4nt, posted May 19, 2010

Though I don't want to threadjack this, erm, thread, I'm curious... If we want to check for specific Unicode characters, we can do it as long as we know the range of hex codes. For Cyrillic, the main range is 0x400-0x4FF, with some more in the 0x500 range. I just did a test on Cyrillic characters mixed with Japanese and English, and the following PCRE picked out the correct Cyrillic characters. But I'm kinda lost, because I can't get hex codes for all character sets.

Grab (a subset of) Cyrillic Unicode characters:

    "[\x{0400}-\x{04FF}]"

GEOSoft, do you know of any good sites which list the hex-code ranges for different languages? I just sorta stumbled on what I did, and it's really a general overview. And I can't get my brain around this 'U+' notation. Plus, now I'm reading that 16 bits isn't going to capture the entire Unicode character set (there are 32-bit versions, though...)

My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity) UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne) (All personal code/wrappers centrally located at Ascend4nt's AutoIT Code)
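On the 'U+' notation: "U+041F" is simply the code point 0x041F written the way Unicode charts write it, and in PCRE the same code point is written `\x{041F}`, as in the class above. Below is a short Python sketch (Python is used only because it runs stand-alone; its patterns spell the range `\u0400-\u04FF` instead). The mixed sample string is made up. The last lines show why 16 bits are not enough: code points above U+FFFF take two 16-bit units (a surrogate pair) in UTF-16.

```python
import re

def u_plus(ch: str) -> str:
    # Format a character's code point in Unicode's U+XXXX notation.
    return f"U+{ord(ch):04X}"

# Same range as the PCRE class "[\x{0400}-\x{04FF}]" (the Cyrillic block only).
CYRILLIC_BASIC = re.compile("[\u0400-\u04FF]")

text = "Hello Привет こんにちは"
print(u_plus("П"))                            # U+041F
print("".join(CYRILLIC_BASIC.findall(text)))  # Привет

# Above U+FFFF one 16-bit unit is not enough: UTF-16 uses a surrogate
# pair, i.e. two 16-bit units = 4 bytes.
emoji = "\U0001F600"
print(u_plus(emoji), len(emoji.encode("utf-16-le")))  # U+1F600 4
```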
GEOSoft, posted May 19, 2010

Quote (Ascend4nt): "GEOSoft, do you know of any good sites which list the hex-code ranges for different languages?"

I don't have a link to the page I found when I first went looking for this information. I searched the forums earlier to find the script I did for getting the Unicode characters (the full set, as I recall) using a RegEx. I still haven't found it, and I think that may be because I either didn't have the correct search term or the limit on the number of search results cut off the list.

George
GEOSoft, posted May 19, 2010

Wait a second here. I do remember that I found the correct codes for the sets some place on http://unicode.org/

George
jchd, posted May 20, 2010

Ascend4nt,

The right spot for much Unicode stuff is there. Be sure to visit UniView, Converters and the other shortcuts Ishida offers. You can also benefit from looking there to discover in detail which script maps where and in which Unicode plane. Finally, a Unicode primer, not very tech but correct. While you are there, take a few hours to read the 1075 articles Joel posted on his blog. Most are worth it.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks. Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here. RegExp tutorial: enough to get started. PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try. SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager. An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work. SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well). A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages! SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
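The "which script maps where" question comes down to a lookup in the Unicode block table (Blocks.txt in the Unicode Character Database). Below is a minimal Python sketch using a small, hand-copied subset of that table. It is illustrative only, not complete, and the full file should be used for real work.

```python
# Hand-copied subset of Blocks.txt (Unicode Character Database);
# illustrative only, many blocks are missing.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0500, 0x052F, "Cyrillic Supplement"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]

def block_of(ch: str) -> str:
    # Linear scan of the (start, end, name) ranges; inclusive on both ends.
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "unknown (not in this subset)"

for ch in "AéПこ漢™":
    print(f"U+{ord(ch):04X} {block_of(ch)}")
```

Each range in the table converts directly into a PCRE class, e.g. Hiragana becomes `[\x{3040}-\x{309F}]`.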
Ascend4nt, posted May 20, 2010

Awesome, thanks for the links, guys!
GEOSoft, posted May 20, 2010

Quote (Ascend4nt): "Awesome, thanks for the links guys!"

In case you are interested, I PM'd you another link that you might want to keep track of. Also, there was a minor update to the toolkit today, same link.

George
lgwapnitsky (thread author), posted May 20, 2010 (edited)

Quote (GEOSoft): "Should it? I just copied the characters you were using, however \xc0 is correct. Easily tested if you follow that link I gave you."

Strange - I thought I posted my final testing code yesterday, but I don't see it now:

    #include <String.au3> ; for _StringProper()

    ; 17 rows of [character-class contents, ASCII replacement]
    Global $UnicodeReplacements[17][2] = [["\xc0-\xc5", "A"], ["\xe0-\xe5", "a"], ["\xc6", "AE"], ["\xc7", "C"], ["\xe7", "c"], _
            ["\xc8-\xcb", "E"], ["\xe8-\xeb", "e"], ["\xcc-\xcf", "I"], ["\xec-\xef", "i"], _
            ["\xd1", "N"], ["\xf1", "n"], ["\xd2-\xd6\xd8", "O"], ["\xf2-\xf6\xf8", "o"], _
            ["\xd9-\xdc", "U"], ["\xf9-\xfc", "u"], ["\xdd", "Y"], ["\xfd", "y"]]

    Local $ib = InputBox("Test", "enter test string")
    ; Delete everything except the allowed characters (\xc0 lower bound, so À..È survive)
    $ib = StringRegExpReplace($ib, "[^[:alnum:]\-\s\x27\x9e\x9f\xc0-\xff]", "")
    ; Transliterate each accented letter group to its ASCII replacement
    For $ur = 0 To UBound($UnicodeReplacements) - 1
        $ib = StringRegExpReplace($ib, "[" & $UnicodeReplacements[$ur][0] & "]", $UnicodeReplacements[$ur][1])
    Next
    ConsoleWrite(_StringProper($ib) & @CRLF)

Also, I'm curious about the toolkit link you PM'd to Ascend4nt...

Edited May 20, 2010 by lgwapnitsky
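The same delete-then-transliterate pipeline can be sketched in Python, which makes it easy to try sample names outside AutoIt. The sketch uses the same 17 (class, replacement) pairs as the table above. `A-Za-z0-9` is an assumed stand-in for `[:alnum:]`, the `\x9e\x9f` entries are left out, and the `_StringProper()` step is omitted. The function name `normalize_name` is hypothetical.

```python
import re

# Keep-list, negated: everything else is deleted first. A-Za-z0-9 is an
# assumed stand-in for [:alnum:]; \xc0 is the lower bound so À..È survive.
DISALLOWED = re.compile(r"[^A-Za-z0-9\-\s\x27\xc0-\xff]")

# Same 17 (character-class contents, replacement) pairs as the AutoIt table.
REPLACEMENTS = [
    (r"\xc0-\xc5", "A"), (r"\xe0-\xe5", "a"), (r"\xc6", "AE"),
    (r"\xc7", "C"), (r"\xe7", "c"),
    (r"\xc8-\xcb", "E"), (r"\xe8-\xeb", "e"),
    (r"\xcc-\xcf", "I"), (r"\xec-\xef", "i"),
    (r"\xd1", "N"), (r"\xf1", "n"),
    (r"\xd2-\xd6\xd8", "O"), (r"\xf2-\xf6\xf8", "o"),
    (r"\xd9-\xdc", "U"), (r"\xf9-\xfc", "u"),
    (r"\xdd", "Y"), (r"\xfd", "y"),
]

def normalize_name(name: str) -> str:  # hypothetical helper name
    name = DISALLOWED.sub("", name)
    for char_class, ascii_text in REPLACEMENTS:
        # Wrap each entry in [...] exactly as the AutoIt loop does.
        name = re.sub(f"[{char_class}]", ascii_text, name)
    return name

print(normalize_name("Renée Müller-Åberg!"))  # Renee Muller-Aberg
print(normalize_name("Øystein Æsop"))         # Oystein AEsop
```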
GEOSoft, posted May 20, 2010 (edited)

I think you did, and if I'm correct it's way back on page 1 of this thread. Your array is different, and I'm guessing that you already allowed for the changes.

ConsoleWrite(), contrary to popular belief, is not a great method of testing. Use MsgBoxes instead. For me it doesn't matter, because it's only a single click to convert them anyway, and in your case there is only one to be done. This portion of the code seems to be fine when I test it, but I'm curious to know how it works with the rest of your script.

As for the PM: it was just a link to the revisions for the latest version of PCRE, which AutoIt isn't using yet.

George

Edited May 20, 2010 by GEOSoft
lgwapnitsky (thread author), posted May 20, 2010

Quote (GEOSoft): "...It was just a link to the revisions for the latest version of PCRE which AutoIt isn't using yet."

Ahhh... I'm not awake yet.

One more question... for now... \x2122 doesn't seem to be matching ™ (and I'm also still trying to figure out why I can't type it on my keyboard). This is for a different portion of the utility.

Thanks
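A likely explanation, not identified in the thread itself: in PCRE, `\xhh` takes at most two hex digits, so `\x2122` is read as `\x21` ("!") followed by the literal text "22". A code point with more digits needs braces, `\x{2122}`. Python's `re` has the same two-digit rule for `\x`, so the behaviour can be demonstrated there. Python has no `\x{...}` form and spells the code point `\u2122` instead. It is worth confirming the `\x{2122}` form in AutoIt itself.

```python
import re

tm = "\u2122"  # the trade mark sign, ™

# \x takes exactly two hex digits in Python's re: r"\x2122" means
# "\x21" (the "!" character) followed by the literal text "22".
print(re.search(r"\x2122", tm))                 # None
print(re.search(r"\x2122", "!22") is not None)  # True

# The four-digit code point has to be written differently;
# Python uses \u2122 (PCRE would use \x{2122}).
print(re.search(r"\u2122", tm) is not None)     # True
```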
GEOSoft, posted May 20, 2010

Quote (lgwapnitsky): "One more question... for now... \x2122 doesn't seem to be matching ™."

Give me a while to get some caffeine into me and I'll look at that.

George
jchd, posted May 20, 2010

You can type it on your keyboard with the sequence Alt+0153 (hold Alt and type 0153 on the numeric keypad), and it indeed is U+2122: hex 2122, or 8482 decimal.
lgwapnitsky (thread author), posted May 20, 2010

Quote (jchd): "You can type it with your keyboard using the compose sequence Alt0153 and it indeed is \x2122 or 8482 decimal."

That works... some help this is: http://www.fileformat.info/info/unicode/char/2122/index.htm
GEOSoft, posted May 20, 2010

Actually, that should be \x99, which also doesn't work. I suspect it's a bug in the PCRE engine and not in AutoIt.

George
jchd, posted May 20, 2010

Hi George,

Beware that \x99 is the Windows-1252 ("ANSI" Latin 1) code for ™; the Unicode code point is U+2122.

Charmap.exe shows it nicely once you check the "advanced view" checkbox (I don't know what it's called in non-French Windows). You can then switch the displayed character set between Unicode, ANSI (select your codepage), OEM, ...
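jchd's point can be checked with Python's codecs (Python is used only because it runs stand-alone). Byte 0x99 means ™ only in the Windows-1252 codepage. In a Unicode string, code point U+0099 is a C1 control character, so a Unicode-aware pattern written as `\x99` looks for that control character, not for ™.

```python
import unicodedata

tm = "\u2122"
print(tm.encode("cp1252"))                # b'\x99'
print(b"\x99".decode("cp1252") == tm)     # True

# As a Unicode code point, 0x99 is a C1 control character (category Cc).
print(unicodedata.category(chr(0x99)))    # Cc
print(ord(tm), hex(ord(tm)))              # 8482 0x2122
```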
lgwapnitsky (thread author), posted May 20, 2010

Quote (GEOSoft): "Actually that should be \x99 which also doesn't work. I suspect it's a bug in the PCRE engine and not in AutoIt."

I also tried the literal UTF-8 bytes (\xe2\x84\xa2), which did not work either.
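Why the byte sequence is unlikely to match: `E2 84 A2` is how ™ is stored in UTF-8, but a decoded string (AutoIt strings are UTF-16) holds the single code point U+2122. A pattern `\xe2\x84\xa2` therefore asks for three separate characters. The Python sketch below shows this, together with the familiar "â„¢" you get when those bytes are misread as Windows-1252.

```python
import re

tm = "\u2122"
utf8 = tm.encode("utf-8")
print(utf8)                            # b'\xe2\x84\xa2'

# The same three bytes misread as Windows-1252 become three characters.
print(utf8.decode("cp1252"))           # â„¢

# A three-code-point pattern cannot match the single code point U+2122.
print(re.search(r"\xe2\x84\xa2", tm))  # None
```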