lgwapnitsky Posted May 20, 2010 (edited)

I also tried the literal (\xe2\x84\xa2), which did not work either.

Just got it working... I entered something "slightly" wrong... I love and hate regexp at the same time.

Edit: never mind... the replacement isn't working just yet. This code is not successful right now:

    $teststring = ChrW(Dec(2122))
    $teststring = StringRegExpReplace($teststring, "[\x99]", "TM")
    ConsoleWrite($teststring)

Edited May 20, 2010 by lgwapnitsky

GEOSoft Posted May 20, 2010

> I also tried the literal (\xe2\x84\xa2) which did not work either.

I'm thinking that PCRE isn't picking up certain characters. There have been 2 updates to that engine since the last time it was updated for AutoIt, both involving bug fixes, so I'll try to find out if any of those fixes could be for this issue. I think I know of a test I can run to see if there are any others that are not being picked up.

And consider that comment about "it should be \x99" to be a brain dead moment. That's what you had written. Caffeine straight, double up please. I might have to start taking my morning coffee by injection.

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.
*** The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. It is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"

GEOSoft Posted May 20, 2010

Interestingly enough, this works:

    $s = Chr(153)
    Local $i = StringRegExp($s, "™")
    ConsoleWrite(($i <> 0) & @CRLF)

Which would seem to indicate a problem in Hex. Now, is that problem in AutoIt or PCRE? I suspect PCRE.

George

lgwapnitsky Posted May 20, 2010

> Interestingly enough this works
>
>     $s = Chr(153)
>     Local $i = StringRegExp($s, "™")
>     ConsoleWrite(($i <> 0) & @CRLF)
>
> Which would seem to indicate a problem in Hex. Now, is that problem in AutoIt or PCRE. I suspect PCRE.

I found the same thing and am still puzzling it over...

GEOSoft Posted May 20, 2010

> I found the same thing and am still puzzling it over...

I think I'll go ahead and report it as a bug, although I still say it's in PCRE and not AutoIt. I'll also recommend in the report that the engine be updated from the current 8.0.0 to 8.0.2, which is the latest.

George

GEOSoft Posted May 20, 2010 (edited)

> Hi George,
>
> Beware that \x99 is the Latin1 ANSI assignment for ™ (the Unicode code point \U2122).
>
> Charmap.exe shows it nicely, once you check the "advanced view" checkbox (I don't know how it gets called in non frenchy Windows). You can switch the displayed charset between Unicode, Ansi (select your codepage), OEM, ...
>
> Sorry I missed your replies earlier.

You're correct, but I'm thinking that it should also pick up on the Hex value, which is \x99, and that seems to be where it fails. We can't check \U2122 because the Unicode support doesn't work, and I would still like to know why. It's been a pain since day 1.

Edit: In English it's called "Advanced View"; not sure what it's called in the frenchy windows though.

Edited May 20, 2010 by GEOSoft

George

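The ANSI-versus-Unicode distinction raised in the quote can be checked directly. The following is a minimal sketch, assuming a Unicode build of AutoIt and a system ANSI code page of Windows-1252 (other code pages may assign 0x99 differently). It prints the code point behind Chr(0x99) next to the one behind ChrW(0x2122); the variable names are illustrative only.

```autoit
; Assumption: the system ANSI code page is Windows-1252.
Local $sAnsi = Chr(0x99)      ; character built from the ANSI byte 0x99
Local $sWide = ChrW(0x2122)   ; character built from the code point U+2122

; AscW() reports the Unicode code point of the first character.
ConsoleWrite("AscW(Chr(0x99))    = 0x" & Hex(AscW($sAnsi), 4) & @CRLF)
ConsoleWrite("AscW(ChrW(0x2122)) = 0x" & Hex(AscW($sWide), 4) & @CRLF)

; == is a case-sensitive string comparison.
ConsoleWrite("Same character?    " & ($sAnsi == $sWide) & @CRLF)
```

If both lines report the same code point, the string being searched holds U+2122 rather than a character numbered 0x99, which would explain why a pattern written as \x99 does not find it.
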
GEOSoft Posted May 20, 2010

Here is a complete list of the characters that fail:

    \x80 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8A \x8B \x8C \x8E \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9A \x9B \x9C \x9E \x9F

Now for the good news. I finally got my SRE mentor involved in this, and it looks like he has a workaround. Just one short function, which I'm testing right now. As soon as it's tested I'll come back and post it.

George

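A list like the one above can be reproduced with a short loop. This is a rough sketch, assuming a Unicode build of AutoIt and a Windows-1252 ANSI code page; it is not the test GEOSoft actually ran. For each byte from 0x80 to 0x9F it builds the character with Chr(), shows the code point AutoIt stores for it, and reports whether the two-digit \x escape matches.

```autoit
; Assumption: Windows-1252 ANSI code page; results may differ elsewhere.
For $i = 0x80 To 0x9F
    Local $sHex = Hex($i, 2)                 ; two-digit hex, e.g. "99"
    Local $sChar = Chr($i)                   ; character for the ANSI byte
    Local $iMatch = StringRegExp($sChar, "\x" & $sHex) ; 1 = match, 0 = no match
    ConsoleWrite("\x" & $sHex & "  code point 0x" & Hex(AscW($sChar), 4) & _
            "  match = " & $iMatch & @CRLF)
Next
```

The five values missing from the list (81, 8D, 8F, 90, 9D) are the positions Windows-1252 leaves unassigned, while every listed value is assigned a character whose code point lies above U+00FF. That would be consistent with \x## being compared against code points rather than ANSI bytes.
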
lgwapnitsky Posted May 20, 2010

> Here is a complete list of the characters that fail.
>
>     \x80 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8A \x8B \x8C \x8E \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9A \x9B \x9C \x9E \x9F
>
> Now for the good news. I finally got my SRE mentor involved in this and it looks like he has a work around. Just one short function which I'm testing right now. as soon as it's tested I'll come back and post it.

Can you PM me as well as reply if you get it? SPAM is killing me today no matter how many times I correct it...

GEOSoft Posted May 20, 2010

@lgwapnitsky I'll PM you a copy of this as well.

Well, again thanks to SmOke_N, we have a solution that works for MOST of those on the list. It's still a bit buggy on the few that remain, but I'll work away at that. I'm posting the test code along with the function.

    Local $s_str = "€agZ˜™berƒ„…†‡‰Š‹ŒŽ‘’“”"

    ; Test 1: bracketed range with a trailing quantifier
    Local $s_pattern = "(\w" & _SRE_HexToChar("[\x98-\x99]+") & ")"
    Local $a_sre = StringRegExp($s_str, $s_pattern, 1)
    If IsArray($a_sre) Then ConsoleWrite("1 is good" & @CRLF)

    ; Test 2: \x prefix on the lower bound only
    $s_pattern = "(\w" & _SRE_HexToChar("\x80-FF") & ")"
    $a_sre = StringRegExp($s_str, $s_pattern, 1)
    If IsArray($a_sre) Then ConsoleWrite("2 is good" & @CRLF)

    ; Test 3: bare hex values
    $s_pattern = "(\w" & _SRE_HexToChar("80-FF") & ")"
    $a_sre = StringRegExp($s_str, $s_pattern, 1)
    If IsArray($a_sre) Then ConsoleWrite("3 is good" & @CRLF)

    ; Test 4: \x prefix on the upper bound only
    $s_pattern = "(\w" & _SRE_HexToChar("81-\xFF") & ")"
    $a_sre = StringRegExp($s_str, $s_pattern, 1)
    If IsArray($a_sre) Then ConsoleWrite("4 is good" & @CRLF)

    ; Test 5: bare range with a trailing quantifier
    $s_pattern = "(\w" & _SRE_HexToChar("80-FF*") & ")"
    $a_sre = StringRegExp($s_str, $s_pattern, 1)
    If IsArray($a_sre) Then ConsoleWrite("5 is good" & @CRLF)

    ; Converts a hex value or hex range into literal characters for use in a pattern.
    Func _SRE_HexToChar($s_str)
        Local $s_ret_str = ""
        ; Captures: low bound, high bound, optional quantifier (*, + or ?)
        Local $s_pattern = "(?:\[)?(?:\\x)?([[:xdigit:]]+)-" & _
                "(?:\\x)?([[:xdigit:]]+)(?:\])?((?:\*|\+|\?))?"
        Local $a_range = StringRegExp($s_str, $s_pattern, 1)
        If Not @error Then
            ; Range: build a non-capturing alternation of every character in it
            $s_ret_str = "(?:"
            For $i = Dec($a_range[0]) To Dec($a_range[1])
                $s_ret_str &= Chr($i) & "|"
            Next
            $s_ret_str = StringTrimRight($s_ret_str, 1) ; drop the trailing "|"
            If Not $s_ret_str Then Return ""
            If UBound($a_range) = 3 Then Return $s_ret_str & ")" & $a_range[2]
            Return $s_ret_str & ")"
        EndIf

        ; Single value: return just that one character
        Local $a_hex = StringRegExp($s_str, "[[:xdigit:]]+", 1)
        If @error Then Return ""
        Return Chr(Dec($a_hex[0]))
    EndFunc

It will allow you to do ranges too, with * or + or ?. You can use [\x##] or \x## or ##.

George

lgwapnitsky Posted May 20, 2010

That is simultaneously beautiful and ugly... I'll test it on my stuff and report back.

Thanks for all the hard work.

GEOSoft Posted May 20, 2010

> That is simultaneously beautiful and ugly...I'll test it on my stuff and report back.

Are you talking about SmOke_N or the function?

> Thanks for all the hard work.

Don't thank me, thank SmOke. The hardest thing I had to do was IM him and point out a simple error in the function, which he promptly fixed.

George

Ascend4nt Posted May 20, 2010

> One more question...for now...\x2122 doesn't seem to be matching ™ (which I'm also still trying to figure out why I can't type it on my keyboard). This is for a different portion of the utility.

It does match if you wrap anything greater than 2 hex digits in curly braces, like this:

    "\x{2122}"

My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery
Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity)
UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne)
(All personal code/wrappers centrally located at Ascend4nt's AutoIT Code)

SmOke_N (Moderator) Posted May 20, 2010 (edited)

The _SRE_HexToChar() func can be called as such for your hex query string:

    _SRE_HexToChar("[\x##-\x##]")
    _SRE_HexToChar("[\x##-##]")
    _SRE_HexToChar("[##-\x##]")
    _SRE_HexToChar("[##-##]")
    _SRE_HexToChar("[##]")
    _SRE_HexToChar("##")

Where ## represents the 2 character hex value.

Also, anything in square brackets can be followed by a "*", "+", or "?".

Edited May 20, 2010 by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

lgwapnitsky Posted May 20, 2010

> Are you talking about SmOke_N or the function?
>
> Don't thank me, thank SmOke. The hardest thing I had to do was IM him and point out a simple error in the function, which he promptly fixed.

Found something odd. This does not work in finding my "TM" symbol:

    $array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & "]+", 3)

but this does:

    $array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & _SRE_HexToChar("99") & "]+", 3)

The former does not show my "TM"s, but the latter does.

lgwapnitsky Posted May 20, 2010

> It does match if you wrap anything greater than 2 hex digits in curly braces like this: "\x{2122}"

DAGNabbit!!! That works!

GEOSoft Posted May 20, 2010

> It does match if you wrap anything greater than 2 hex digits in curly braces like this: "\x{2122}"

That doesn't surprise me too much. \x is only supposed to work with a maximum of 2 hex characters. SmOke_N and I agreed this morning that the documentation is in error on that point as well: it says 2 digits, and that clearly is not correct. Hex is not always just digits. It should be xdigits, as in [a-fA-F0-9].

George

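Applied to the replacement that started the thread, the braced form looks roughly like this. It is a minimal sketch, assuming the subject string really contains U+2122; the sample string and the "(TM)" replacement text are made up for illustration.

```autoit
; Build a sample string ending in the trademark sign (U+2122).
Local $sTest = "AutoIt" & ChrW(0x2122)

; \x## takes at most two hex digits; longer values go inside braces.
Local $sFixed = StringRegExpReplace($sTest, "\x{2122}", "(TM)")

; @extended holds the number of replacements performed.
ConsoleWrite($sFixed & " (" & @extended & " replacement(s))" & @CRLF)
```

The same braced escape should also be usable inside a character class, for example "[\x{2122}]", when the trademark sign is only one of several characters to match.
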
Ascend4nt Posted May 20, 2010 (edited)

Here's a quick little converter for getting Unicode 4-character code values. Anything with 00 at the end can be put as a simple \x##, or its ANSI equivalent.

*edit: see fixed version in -> this post

Edited May 20, 2010 by Ascend4nt

GEOSoft Posted May 20, 2010

> Here's a quick little converter for getting Unicode 4-character code values. Anything with 00 at the end can be put as a simple \x##, or its ANSI equivalent.
>
>     $sInputStr=InputBox("Unicode-Hex converter","Enter Unicode character(s) in box to see Hexadecimal equivalents","","",360,160)
>     If $sInputStr<>"" Then MsgBox(0,"Hex equivalent","Original string:"&@CRLF&$sInputStr&@CRLF&"Hexadecimal equivalents:"&@CRLF& _
>         StringRegExpReplace(StringTrimLeft(StringToBinary($sInputStr,2),2),"(.{4})","$1,"))

Now there is another nice toy. Thank you.

George

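For comparison, the code points can also be listed one character at a time with AscW(), which gives them in the digit order a \x{####} escape expects. This is only a sketch of that idea, not Ascend4nt's fixed version (which is in a post not shown here); the sample input is made up.

```autoit
; Hypothetical sample input: plain text followed by the trademark sign.
Local $sInput = "AutoIt" & ChrW(0x2122)
Local $sOut = ""

; Append the 4-digit hex code point of each character, comma separated.
For $i = 1 To StringLen($sInput)
    $sOut &= Hex(AscW(StringMid($sInput, $i, 1)), 4) & ","
Next

; Drop the trailing comma and print the list.
ConsoleWrite(StringTrimRight($sOut, 1) & @CRLF)
```

For this input the last entry is 2122, which can be pasted straight into a pattern as \x{2122}.
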
SmOke_N (Moderator) Posted May 20, 2010 (edited)

> Found something odd. This does not work in finding my "TM" symbol:
>
>     $array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & "]+", 3)
>
> but this does:
>
>     $array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & _SRE_HexToChar("99") & "]+", 3)
>
> The former does not show my "TM"s, but the latter does.

No idea what your test string is, or what you feel the issue is. Clearly hex 99 is less than hex c9, so I'm left to guess that there's another issue?

Edit: Remember, ##-## is a range from least to greatest. Your "TM" symbol (hex 99) is not even in that first pattern.

Edited May 20, 2010 by SmOke_N

lgwapnitsky Posted May 20, 2010

> No idea what your test string is, or what you feel the issue is. Clearly hex 99 is less than hex c9, so I'm left to guess that there's another issue?
>
> Edit: Remember: ##-## is a range from least to greatest, your "TM" symbol ( hex 99 ), is not even in that first pattern.

You're right... I've been working at this too long. Regexp has fried my brain today.

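The range mix-up is easy to check numerically before building a pattern. A small sketch follows; _InHexRange() is a hypothetical helper written for this illustration and is not part of _SRE_HexToChar() or any AutoIt UDF.

```autoit
; Hypothetical helper: True if $sValue lies within $sLow..$sHigh (all hex strings).
Func _InHexRange($sValue, $sLow, $sHigh)
    Local $iValue = Dec($sValue)
    Return $iValue >= Dec($sLow) And $iValue <= Dec($sHigh)
EndFunc

ConsoleWrite(_InHexRange("99", "c9", "ff") & @CRLF) ; False: 0x99 is below 0xC9
ConsoleWrite(_InHexRange("99", "80", "ff") & @CRLF) ; True: 0x99 is inside 0x80-0xFF
```

Widening the range passed to _SRE_HexToChar() so that it starts at or below 99 would therefore cover the "TM" character without the separate _SRE_HexToChar("99") call.
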