Kyan, posted March 1, 2015 (edited)

Hi,

I have never needed an "if" condition in StringRegExp (SRE) before. How can I discard the users with an empty value from a list and capture only the ones that have a value?

Example:

    Alice has $200
    Bob has $10
    John has null

I tried this:

    $tex='Alice has $200'&@CRLF&'Bob has $10'&@CRLF&'John has null'
    $x=StringRegExp($tex,'(?i)^(.+?)\hhas\h(?:(?=:null)|(.+?))$',3)
    _ArrayDisplay($x)
    Exit

But it doesn't work. Is this even possible with regex? I saw on these websites that conditional patterns exist:

http://www.regular-expressions.info/conditional.html
http://www.rexegg.com/regex-conditionals.html

jguinch, posted March 1, 2015

It seems you don't need a conditional pattern for this. Just use (?m) and capture the values from the lines that contain a price (\$\d+):

    #include <Array.au3>

    $sString = "Alice has $200" & @CRLF & _
            "Bob has $10" & @CRLF & _
            "John has null"

    ; Flag 3 returns a flat array: name, price, name, price, ...
    $aValues = StringRegExp($sString, "(?im)^([a-z-]+).*\$(\d+)", 3)

    ; Rebuild the flat array as a 2D array with one row per user
    Local $aResult[UBound($aValues) / 2][2]
    For $i = 0 To UBound($aValues) - 1 Step 2
        $aResult[$i / 2][0] = $aValues[$i]
        $aResult[$i / 2][1] = $aValues[$i + 1]
    Next
    _ArrayDisplay($aResult)

Malkey, posted March 1, 2015

Here is another way.

    #include <Array.au3>

    $tex = 'Alice has $200' & @CRLF & 'Bob has $10' & @CRLF & 'John has null'
    $x = StringRegExp($tex, '(?i).+\hhas\h(?!null).+', 3)
    _ArrayDisplay($x)

Kyan (author), posted March 1, 2015

@jguinch, yeah, that's a good idea.

@Malkey, can you tell me why it splits things up when I add groups around ".+"? If it only works that way, does it mean I need to add non-capturing groups to everything else that matches something?

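On the question of why groups split the result: with flag 3, StringRegExp returns whole matches only when the pattern has no capturing groups. Once the pattern contains capturing groups, each group gets its own array element and the whole match is no longer returned. A minimal sketch reusing Malkey's pattern (the names $sText, $aWhole, $aGroups and $aValue are made up for illustration):

```autoit
#include <Array.au3>

Local $sText = 'Alice has $200' & @CRLF & 'Bob has $10' & @CRLF & 'John has null'

; No capturing groups: every element of the result is one whole match.
Local $aWhole = StringRegExp($sText, '(?i).+\hhas\h(?!null).+', 3)
If Not @error Then _ArrayDisplay($aWhole, 'No groups')

; Two capturing groups: name and value come back as separate, consecutive elements.
Local $aGroups = StringRegExp($sText, '(?i)(.+)\hhas\h(?!null)(.+)', 3)
If Not @error Then _ArrayDisplay($aGroups, 'Two capturing groups')

; (?:...) groups without capturing, so only the value is returned.
Local $aValue = StringRegExp($sText, '(?i)(?:.+)\hhas\h(?!null)(.+)', 3)
If Not @error Then _ArrayDisplay($aValue, 'One capturing group')
```

A group is only needed where a quantifier or an alternation has to apply to several tokens at once; a plain .+ can stay ungrouped, so there is usually no need to wrap "everything else" in (?:...).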
SadBunny, posted March 1, 2015 (edited)

If your example data is representative, and if you want the full lines returned, it seems like you can just grab anything that doesn't end in an 'l' (lowercase L):

    $x = StringRegExp($tex, "(?m)^.*[^l]$", 3)
    _ArrayDisplay($x)

Kyan (author), posted March 1, 2015

@SadBunny, something like "(?m).+\hhas\h.+(?!null)$". The point is that you want to get all those values, but not when they are null; in the real case there is no other workaround.

I can post the real case, I guess it can be posted here. Text:

    <a href="#" class="ValuesLst"...

I want to exclude all the links with "#" through the regex itself (excluding a post Do Loop). So how can I capture all the links except the ones with "#"?

mikell, posted March 1, 2015

Kyan said: "I can post the real case, guess it can be posted here"

You should do it... When dealing with regex, the description of the initial text, the requirements and the expected results must be as precise as possible.

kylomas, posted March 1, 2015

What does "(excluding post Do Loop)" mean?

mikell, posted March 1, 2015

And how is "John has null" relevant to the <a href="#" links?

Kyan (author), posted March 2, 2015

@kylomas, it is like saying "post-work". I wrote it without checking whether it has the same meaning in English, but Google Translate says so. By it I meant "without needing a Do loop after executing the regex", like this:

    $x = StringRegExp(....)
    Local $i = 0
    Do
        If $x[$i] = 'null' Then
            ; After a delete the next element moves into index $i, so don't advance
            _ArrayDelete($x, $i)
        Else
            $i += 1
        EndIf
    Until $i > (UBound($x) - 1)

@mikell, I explained it in post #6. I'm doing this: '<a href="(.+?)" class="ValuesLst"', but I don't want to capture the "#" links. The way of thinking is the same: you want to capture everything except one case.

SadBunny, posted March 2, 2015 (edited)

So which part of the link do you want captured exactly? Please give a couple of examples of actual input and the actual output you desire from those lines. Do you want the entire line? Do you just want the href target?

Also, what is it exactly you want to skip? Any line containing anchors exactly equal to href="#"? Or any <a href...>...</a> construct containing that? (If there's more than one per line...)

Finally: at the beginning of this thread we didn't know you were parsing HTML. Regex is not a good tool to parse HTML. Read this stackoverflow answer for a very eloquent and linguistically dexterous explanation of that fact. Unless you know exactly what your HTML is going to look like and that it's going to be valid, you will run into problems by doing this.

Kyan (author), posted March 2, 2015 (edited)

All links from .ValuesLst except the ones with "#":

    #include <Array.au3>

    $b64HTML='CTx0ZD48YSBocmVmPSJodHRwOi8vaW1ndXIuY29tL2tpWWFvdzEiIHRhcmdldD0iX2JsYW5rIiBjbGFzcz0iVmFsdWVzTHN0Ij48aW1' & _
            'nIHNyYz0iL2ltYWdlcy9MaXN0SWNvbi5wbmciIGJvcmRlcj0nMCcgLz48L2E+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlP' & _
            'SJTdGF0dXMiIGhyZWY9IiMiPjwvYT48L3RkPgkNCgkgPC90cj48dHI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2x' & _
            'hc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+PHRkPjxhIGNsY' & _
            'XNzPSJ0b29sdGlwIiB0aXRsZT0iU3RhdHVzIiBocmVmPSIjIj48L2E+PC90ZD4JDQoJIDwvdHI+PHRyICBjbGFzcz0ib2RkIj4NCgkJPHR' & _
            'kPjxpbWcgc3JjPSIvaW1hZ2VzL25hbi5naWYiIC8+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlPSJTdGF0dXMiIGhyZWY9I' & _
            'iMiPjxpbWcgc3JjPSIvaW1hZ2VzL3VuYXZhaWxhYmxlLnBuZyIgLz48L2E+PC90ZD4JDQoJIDwvdHI+PHRyPg0KCQk8dGQ+PGEgaHJlZj0' & _
            'iL2dyYXBocy8zOTI3MzczLnBuZyIgdGFyZ2V0PSJfYmxhbmsiPjxpbWcgc3JjPSIvaW1hZ2VzL0xpc3RJY29uLnBuZyIgYm9yZGVyPScwJ' & _
            'yAvPjwvYT48L3RkPg0KCSA8L3RyPjx0ciAgY2xhc3M9Im9kZCI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M' & _
            '9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+DQoJIDwvdHI+PHRyP' & _
            'gkNCgkJPHRkPjxhIGhyZWY9Ii80MkNCODlBQjA0QUI4MzRCIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM' & _
            '9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+'
    $sHTML = BinaryToString(_Base64Decode($b64HTML))

    $rex=StringRegExp($sHTML,'(?im)href="(.+?)"[^>]+?target="[^"]+?"[^>]+?class="ValuesLst"',3)
    _ArrayDisplay($rex)
    Exit

    Func _Base64Decode($input_string) ; by trancexx
        Local $struct = DllStructCreate('int')
        Local $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', 0, 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
        If @error Or Not $a_Call[0] Then Return SetError(1, 0, '')
        Local $a = DllStructCreate('byte[' & DllStructGetData($struct, 1) & ']')
        $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', DllStructGetPtr($a), 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
        If @error Or Not $a_Call[0] Then Return SetError(2, 0, '')
        Return DllStructGetData($a, 1)
    EndFunc   ;==>_Base64Decode

Output:

    Row|Col 0
    [0]|http://imgur.com/kiYaow1
    [1]|#
    [2]|#
    [3]|/42CB89AB04AB834B

I don't want to capture [1] and [2] :/

EDIT: This should have worked: '(?im)href="(?<!:#)(.*?)"[^>]+?target="[^"]+?"[^>]+class="ValuesLst"'. I don't know how to do it :|

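The (?<!:#) in the edit above is a negative lookbehind for the two characters ":#". At that position the preceding text is always =" so the assertion always succeeds and filters nothing. A negative lookahead, (?!#), placed at the same spot tests the character that comes next instead. A minimal sketch on a made-up two-link sample ($sSample and $aLinks are illustrative names, and this pattern is only one possible way to write it):

```autoit
#include <Array.au3>

; Made-up sample shaped like the links in the decoded HTML above.
Local $sSample = '<a href="/page1" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="#" target="_blank" class="ValuesLst">'

; (?!#) fails as soon as the href value starts with "#", so that link is skipped.
Local $aLinks = StringRegExp($sSample, '(?i)href="(?!#)([^"]+)"[^>]+?class="ValuesLst"', 3)
If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    _ArrayDisplay($aLinks, 'Negative lookahead')
EndIf
```

With this sample the array should hold only /page1. Note that (?!#) rejects every href that starts with "#", including ones like "#top"; writing (?!#") instead would reject only the bare "#" placeholder.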
Malkey, posted March 2, 2015 (solution)

Try this RE pattern:

    '(?i)href="([^#"]+).+?class="ValuesLst"'

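A quick way to see what this pattern does is to run it on a small sample. The sketch below uses made-up links ($sSample and $aLinks are illustrative names); the third link carries a fragment to show one side effect of excluding "#" through the character class:

```autoit
#include <Array.au3>

; Made-up sample: a normal link, a "#" placeholder, and a link with a fragment.
Local $sSample = '<a href="/page1" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="#" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="/page2#top" target="_blank" class="ValuesLst">'

; [^#"]+ needs at least one character that is neither # nor ", so href="#" cannot match.
Local $aLinks = StringRegExp($sSample, '(?i)href="([^#"]+).+?class="ValuesLst"', 3)
If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    _ArrayDisplay($aLinks, 'Character class')
EndIf
```

The bare "#" link should be skipped. Because the capture stops at the first "#" and .+? then consumes the rest of the attribute, the third link should come back as /page2 with its fragment cut off. That is harmless if the real data never contains fragments, which is an assumption worth checking.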
Kyan (author), posted March 2, 2015

It works! Thank you.

Wouldn't it be possible to do this with an "if" in the regex, like '(?i)href="(?(?<!#).+?)" class="ValuesLst"'?

mikell, posted March 2, 2015

    $rex=StringRegExp($sHTML,'(?i)href="(?(?!#)([^"]+)|mikell_was_here).+?ValuesLst', 3)

A little overcomplicated.

Kyan (author), posted March 2, 2015

Nice, nice: if the next character is not a "#", capture everything except quotes, else match mikell_was_here xD

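The conditional in mikell's pattern has the form (?(condition)yes|no): here the condition is the lookahead (?!#), the "yes" branch is the capture ([^"]+), and the "no" branch is the literal mikell_was_here, which never occurs in the input and therefore makes the match fail for "#" links. A minimal sketch on a made-up sample ($sSample and $aLinks are illustrative names):

```autoit
#include <Array.au3>

; Made-up sample: two real links around a "#" placeholder.
Local $sSample = '<a href="/page1" class="ValuesLst">one</a>' & @CRLF & _
        '<a href="#" class="ValuesLst">two</a>' & @CRLF & _
        '<a href="/page2" class="ValuesLst">three</a>'

; (?(?!#)A|B): if the next character is not "#", try A (the capture), otherwise try B.
; B is a literal that is absent from the sample, so the "#" link produces no match.
Local $aLinks = StringRegExp($sSample, '(?i)href="(?(?!#)([^"]+)|mikell_was_here).+?ValuesLst', 3)
If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    _ArrayDisplay($aLinks, 'Conditional pattern')
EndIf
```

With this sample the array should contain /page1 and /page2. Since the "no" branch exists only to fail, a plain (?!#) lookahead without the conditional filters the same links, which is why the conditional is overcomplicated for this case.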
jguinch, posted March 2, 2015

?

    $rex=StringRegExp($sHTML,'href="([^"#]+)"',3)