Posted (edited)

  On 5/20/2010 at 2:28 PM, lgwapnitsky said:

I also tried the literal (\xe2\x84\xa2) which did not work either.

Just got it working... I entered something "slightly" wrong. I love and hate regexp at the same time.

Edit: never mind... the replacement isn't working just yet. This code does not succeed right now:

$teststring = ChrW(Dec(2122))
$teststring = StringRegExpReplace($teststring, "[\x99]", "TM")
consolewrite($teststring)
Edited by lgwapnitsky

Posted

  On 5/20/2010 at 2:28 PM, lgwapnitsky said:

I also tried the literal (\xe2\x84\xa2) which did not work either.

I'm thinking that PCRE isn't picking up certain characters. There have been 2 updates to that engine since it was last updated for AutoIt, both involving bug fixes, so I'll try to find out whether any of those fixes address this issue. I think I know of a test I can run to see whether any other characters are not being picked up.

And consider that comment about "it should be \x99" to be a brain dead moment. That's what you had written. Caffeine straight, double up please. I might have to start taking my morning coffee by injection.

George

  Reveal hidden contents
Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

Posted

Interestingly enough this works

$s = Chr(153)
Local $i = StringRegExp($s, "™")

ConsoleWrite(($i <> 0) & @CRLF)

Which would seem to indicate a problem in Hex. Now, is that problem in AutoIt or PCRE? I suspect PCRE.

George

Posted

  On 5/20/2010 at 3:35 PM, GEOSoft said:

Interestingly enough this works

$s = Chr(153)
Local $i = StringRegExp($s, "™")

ConsoleWrite(($i <> 0) & @CRLF)

Which would seem to indicate a problem in Hex. Now, is that problem in AutoIt or PCRE? I suspect PCRE.

I found the same thing and am still puzzling it over...
Posted

  On 5/20/2010 at 4:01 PM, lgwapnitsky said:

I found the same thing and am still puzzling it over...

I think I'll go ahead and report it as a bug, although I still say it's in PCRE and not AutoIt. I'll also recommend in the report that the engine be updated from the current 8.00 to 8.02, which is the latest.

George

Posted (edited)

  On 5/20/2010 at 2:24 PM, jchd said:

Hi George,

Beware that \x99 is the Windows-1252 (ANSI) assignment for ™ (Unicode code point U+2122).

Charmap.exe shows it nicely, once you check the "advanced view" checkbox (I don't know how it gets called in non frenchy Windows). You can switch the displayed charset between Unicode, Ansi (select your codepage), OEM, ...

Sorry I missed your replies earlier.

You're correct but I'm thinking that it should also pick up on the Hex value which is \x99 and that seems to be where it fails. We can't check \U2122 because the Unicode support doesn't work and I would still like to know why. It's been a pain since day 1.

Edit: In English it's called "Advanced View", not sure what it's called in the frenchy windows though.

Edited by GEOSoft

George

Posted

Here is a complete list of the characters that fail.

\x80 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8A \x8B \x8C \x8E
\x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9A \x9B \x9C \x9E \x9F

Now for the good news. I finally got my SRE mentor involved in this and it looks like he has a workaround: just one short function, which I'm testing right now. As soon as it's tested I'll come back and post it.
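
One way to see why exactly these values fail is to check which character Chr() actually produces for each byte. The sketch below assumes that Chr() converts the byte through the active ANSI code page, as jchd's note about \x99 and ™ suggests. If that holds, then on a Windows-1252 system these bytes become code points such as U+2122, which \x99 in a pattern presumably does not address. The variable names are illustrative only.

```autoit
; For each byte 0x80..0x9F, print the UTF-16 code unit of the character Chr() returns.
; Assumption: Chr() maps the byte through the active ANSI code page.
For $i = 0x80 To 0x9F
    Local $iCodeUnit = AscW(Chr($i))
    ConsoleWrite("\x" & Hex($i, 2) & " -> U+" & Hex($iCodeUnit, 4) & @CRLF)
Next
```

Any line where the value on the right differs from the byte on the left is a candidate for the failing list above.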

George

Posted

  On 5/20/2010 at 5:41 PM, GEOSoft said:

Here is a complete list of the characters that fail.

Now for the good news. I finally got my SRE mentor involved in this and it looks like he has a workaround: just one short function, which I'm testing right now. As soon as it's tested I'll come back and post it.

Can you PM me as well as reply if you get it? SPAM is killing me today no matter how many times I correct it...

Posted

@lgwapnitsky I'll PM you a copy of this as well.

Well, again thanks to SmOke_N, we have a solution that works for MOST of those on the list. It's still a bit buggy on the few that remain but I'll work away at that. I'm posting the test code along with the function.

Local $s_str = "€agZ˜™berƒ„…†‡‰Š‹ŒŽ‘’“”"
Local $s_pattern = "(\w" & _SRE_HexToChar("[\x98-\x99]+") & ")"
Local $a_sre = StringRegExp($s_str, $s_pattern, 1)
If IsArray($a_sre) Then ConsoleWrite("1 is good" & @CRLF)

$s_pattern = "(\w" & _SRE_HexToChar("\x80-FF") & ")"
$a_sre = StringRegExp($s_str, $s_pattern, 1)
If IsArray($a_sre) Then ConsoleWrite("2 is good" & @CRLF)

$s_pattern = "(\w" & _SRE_HexToChar("80-FF") & ")"
$a_sre = StringRegExp($s_str, $s_pattern, 1)
If IsArray($a_sre) Then ConsoleWrite("3 is good" & @CRLF)

$s_pattern = "(\w" & _SRE_HexToChar("81-\xFF") & ")"
$a_sre = StringRegExp($s_str, $s_pattern, 1)
If IsArray($a_sre) Then ConsoleWrite("4 is good" & @CRLF)

$s_pattern = "(\w" & _SRE_HexToChar("80-FF*") & ")"
$a_sre = StringRegExp($s_str, $s_pattern, 1)
If IsArray($a_sre) Then ConsoleWrite("5 is good" & @CRLF)

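; Converts a hex value or hex range into literal characters for use in a pattern.
; Accepts "[\x##-\x##]", "\x##-##", "##-##" (optionally followed by *, + or ?), or a single "##".
Func _SRE_HexToChar($s_str)

    Local $s_ret_str = ""
    ; Captures: [0] range start, [1] range end, [2] optional trailing quantifier.
    Local $s_pattern = "(?:\[)?(?:\\x)?([[:xdigit:]]+)-" & _
        "(?:\\x)?([[:xdigit:]]+)(?:\])?((?:\*|\+|\?))?"
    Local $a_range = StringRegExp($s_str, $s_pattern, 1)
    If Not @error Then
        ; Range form: build a non-capturing alternation of every character in the range.
        ; Characters are inserted as-is, so regex metacharacters in the range are not escaped.
        $s_ret_str = "(?:"
        For $i = Dec($a_range[0]) To Dec($a_range[1])
            $s_ret_str &= Chr($i) & "|"
        Next
        ; Drop the trailing "|".
        $s_ret_str = StringTrimRight($s_ret_str, 1)
        If Not $s_ret_str Then Return ""
        ; Re-attach the quantifier when one was captured.
        If UBound($a_range) = 3 Then Return $s_ret_str & ")" & $a_range[2]
        Return $s_ret_str & ")"
    EndIf

    ; Single-value form: return the one character for the first hex run found.
    Local $a_hex = StringRegExp($s_str, "[[:xdigit:]]+", 1)
    If @error Then Return ""
    Return Chr(Dec($a_hex[0]))
EndFunc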

It also lets you follow a range with *, + or ?.

You can write each value as [\x##], \x## or ##.

George

Posted

  On 5/20/2010 at 6:47 PM, GEOSoft said:

@lgwapnitsky I'll PM you a copy of this as well.

Well, again thanks to SmOke_N, we have a solution that works for MOST of those on the list. It's still a bit buggy on the few that remain but I'll work away at that. I'm posting the test code along with the function.

It also lets you follow a range with *, + or ?.

You can write each value as [\x##], \x## or ##.

That is simultaneously beautiful and ugly...I'll test it on my stuff and report back.

Thanks for all the hard work.

Posted

  On 5/20/2010 at 6:57 PM, lgwapnitsky said:

That is simultaneously beautiful and ugly...I'll test it on my stuff and report back.

Are you talking about SmOke_N or the function?

  On 5/20/2010 at 6:57 PM, lgwapnitsky said:

Thanks for all the hard work.

Don't thank me, thank SmOke. The hardest thing I had to do was IM him and point out a simple error in the function, which he promptly fixed.

George

Posted

  On 5/20/2010 at 1:48 PM, lgwapnitsky said:

One more question...for now...

\x2122 doesn't seem to be matching ™ (I'm also still trying to figure out why I can't type it on my keyboard). This is for a different portion of the utility.

It does match if you wrap anything longer than 2 hex digits in curly braces, like this:

"\x{2122}"

:idea:
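
Applied to the replacement that was failing at the start of the thread, the curly-brace form might look like the sketch below. It assumes the string holds the character U+2122, built here with ChrW(0x2122). The sample text "Acme" is made up for illustration.

```autoit
; Build a test string ending in the trade mark sign (U+2122).
Local $sTest = "Acme" & ChrW(0x2122)

; \x{2122} addresses the code point directly, whereas "[\x99]" was the form that failed earlier.
Local $sResult = StringRegExpReplace($sTest, "\x{2122}", "TM")
ConsoleWrite($sResult & @CRLF)
```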

My contributions:

  Reveal hidden contents

Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFsProcess CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen)Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery

Wrappers/Modifications of others' contributions:

_DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity)

UDF's added support/programming to:

_ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne)

(All personal code/wrappers centrally located at Ascend4nt's AutoIT Code)

  • Moderators
Posted (edited)

The _SRE_HexToChar() func can be called as such for your hex query string:

_SRE_HexToChar("[\x##-\x##]")
_SRE_HexToChar("[\x##-##]")
_SRE_HexToChar("[##-\x##]")
_SRE_HexToChar("[##-##]")
_SRE_HexToChar("[##]")
_SRE_HexToChar("##")

Where ## represents a two-character hex value.

Also, anything in square brackets can be followed by a "*", "+", or "?".

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Posted

  On 5/20/2010 at 7:03 PM, GEOSoft said:

Are you talking about SmOke_N or the function?

Don't thank me, thank SmOke. The hardest thing I had to do was IM him and point out a simple error in the function, which he promptly fixed.

Found something odd. This does not work in finding my "TM" symbol:

$array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & "]+", 3)

but this does:

$array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & _SRE_HexToChar("99") & "]+", 3)

The former does not show my "TM"s, but the latter does.

Posted

  On 5/20/2010 at 7:44 PM, Ascend4nt said:

It does match if you wrap anything longer than 2 hex digits in curly braces, like this:

"\x{2122}"

:idea:

DAGNabbit!!! That works!

Posted

  On 5/20/2010 at 7:44 PM, Ascend4nt said:

It does match if you wrap anything longer than 2 hex digits in curly braces, like this:

"\x{2122}"

:idea:

That doesn't surprise me too much. \x is only supposed to work with a maximum of 2 hex characters. SmOke_N and I agreed this morning that the documentation is in error on that as well. It says 2 digits, and that clearly is not correct: hex is not always just digits. It should say xdigits, as in [a-fA-F0-9].

George

Posted (edited)

Here's a quick little converter for getting Unicode 4-character code values. Anything with 00 at the end can be put as a simple \x##, or its ANSI equivalent.

*edit: see fixed version in -> this post

Edited by Ascend4nt

Posted

  On 5/20/2010 at 8:10 PM, Ascend4nt said:

Here's a quick little converter for getting Unicode 4-character code values. Anything with 00 at the end can be put as a simple \x##, or its ANSI equivalent.

$sInputStr=InputBox("Unicode-Hex converter","Enter Unicode character(s) in box to see Hexadecimal equivalents","","",360,160)
If $sInputStr<>"" Then MsgBox(0,"Hex equivalent","Original string:"&@CRLF&$sInputStr&@CRLF&"Hexadecimal equivalents:"&@CRLF& _
    StringRegExpReplace(StringTrimLeft(StringToBinary($sInputStr,2),2),"(.{4})","$1,"))

Now there is another nice toy. Thank you.

George
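
The quoted one-liner asks StringToBinary() for UTF-16 little-endian bytes (flag 2), so each four-digit group shows the low byte first; ™ would appear as 2221 rather than 2122. A minimal sketch of an alternative that reports code points directly is below. _CodePointsToPattern is a hypothetical helper name, not something from this thread, and it only handles characters that fit in a single UTF-16 code unit.

```autoit
; Hypothetical helper (not from the thread): express each character of $sInput as \x{HHHH}.
Func _CodePointsToPattern($sInput)
    Local $sOut = ""
    For $i = 1 To StringLen($sInput)
        ; AscW returns the UTF-16 code unit of the first character of its argument.
        $sOut &= "\x{" & Hex(AscW(StringMid($sInput, $i, 1)), 4) & "}"
    Next
    Return $sOut
EndFunc

; Example: the letter A followed by the trade mark sign.
ConsoleWrite(_CodePointsToPattern("A" & ChrW(0x2122)) & @CRLF)
```

For "A" followed by ™ this should print \x{0041}\x{2122}, which can be pasted into a pattern as-is.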

  • Moderators
Posted (edited)

  On 5/20/2010 at 7:55 PM, lgwapnitsky said:

Found something odd. This does not work in finding my "TM" symbol:

$array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & "]+", 3)

but this does:

$array = StringRegExp($teststring, "[[:alnum:]\-\s\x27\x9e\x9f" & _SRE_HexToChar("c9-ff") & _SRE_HexToChar("99") & "]+", 3)

The former does not show my "TM"s, but the latter does.

No idea what your test string is, or what you feel the issue is. Clearly hex 99 is less than hex c9, so I'm left to guess that there's another issue?

Edit:

Remember:

##-## is a range from least to greatest, your "TM" symbol ( hex 99 ), is not even in that first pattern.
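
A related point: _SRE_HexToChar() returns an alternation group such as (?:É|Ê|...) for a range, so placing its result inside [...] adds the characters (, ?, :, | and ) to the class as literals. One alternative is to write the range and the extra code point directly in the class, as in the minimal sketch below. It assumes \x{...} escapes are accepted inside a character class in the same way Ascend4nt showed for a bare pattern. The test string is made up.

```autoit
; Made-up test string containing U+00E9 (inside C9-FF) and U+2122 (outside it).
Local $sTest = "Caf" & ChrW(0xE9) & " Brand" & ChrW(0x2122) & " #1"

; One class: ASCII alphanumerics, hyphen, whitespace, apostrophe,
; the range U+00C9..U+00FF, and U+2122 listed explicitly.
Local $sClass = "[[:alnum:]\-\s\x27\x{C9}-\x{FF}\x{2122}]+"

Local $aMatches = StringRegExp($sTest, $sClass, 3)
If IsArray($aMatches) Then
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite("[" & $aMatches[$i] & "]" & @CRLF)
    Next
EndIf
```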

Edited by SmOke_N

Posted

  On 5/20/2010 at 8:18 PM, SmOke_N said:

No idea what your test string is, or what you feel the issue is. Clearly hex 99 is less than hex c9, so I'm left to guess that there's another issue?

Edit:

Remember:

##-## is a range from least to greatest, your "TM" symbol ( hex 99 ), is not even in that first pattern.

You're right... I've been working at this too long. Regexp has fried my brain today.
