Gianni, posted February 1, 2015 (edited)

How can I extract the number that follows a word and the = sign at an arbitrary position within a string? Between the word, the = sign and the number there can be no spaces or any number of spaces, and the number can be bare or enclosed within " " or ' '.

For example, I need the number after ROWSPAN in

    <TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>

or

    <td rowspan= " 2 ">

or

    <td style="width:400px;" rowspan = '5 ' ...</td>

Thanks for any suggestions.

Edited February 1, 2015 by Chimp
Melba23 (Moderator), posted February 1, 2015 (edited)

Chimp,

This seems to do the trick:

    #include <Array.au3>

    $sText = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF & _
            '<td rowspan= " 2 ">' & @CRLF & _
            '<td style="width:400px;" colspan = "5 " ...</td>'

    $aExtract = StringRegExp($sText, '(?i)span\s*=\s*"?\s?(\d+)', 3)

    _ArrayDisplay($aExtract, "", Default, 8)

M23

Edit: SRE decode (sorry I did not provide one last night):

    (?i)  - Case insensitive (because we have ~SPAN and ~span)
    span  - Look for the word "span",
    \s*   - possibly followed by any number of spaces,
    =     - but certainly by "=".
    \s*   - Then there might be some more spaces,
    "?    - and even a '"',
    \s?   - with another possible space.
    (\d+) - Finally, capture the digits that come along!
    3     - Produce an array of every match found

For me a Regex is indispensable here as it allows you to get around the fact that the number of spaces (and even their very existence) is completely variable.

Edited February 2, 2015 by Melba23 (added decode)

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.
Gianni (Author), posted February 1, 2015 (edited)

Melba23 wrote: "This seems to do the trick: [...]"

Thanks Melba23, it's nearly what I need! I wrote colspan on one line by mistake; the word is exactly rowspan. Also, I need to parse one line at a time, not multiline. How can the regexp be adapted to parse a single line at a time and to match an exact word (not *span)?

Thanks

Edit: also, the number may be unquoted, or enclosed in single or double quotes.

Edited February 1, 2015 by Chimp
JohnOne, posted February 1, 2015 (edited)

That code should work on a single line or on multiple lines.

    #include <Array.au3>

    $sText = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF & _
            '<td rowspan= " 2 ">' & @CRLF & _
            '<td style="width:400px;" colspan = "5 " ...</td>'

    $aExtract = StringRegExp($sText, '(?i)rowspan\s*=\s*"?\s?(\d+)', 3)

    _ArrayDisplay($aExtract, "", Default, 8)

Edited February 1, 2015 by JohnOne
Gianni (Author), posted February 1, 2015

Thanks JohnOne... So just writing exactly "rowspan" in the pattern is enough? What does (?i) stand for?
JohnOne, posted February 1, 2015

Don't know. Seriously, I just guessed that rowspan, I'm terrible at regex. I think (?i) means not case sensitive.
Gianni (Author), posted February 1, 2015

JohnOne wrote: "... I'm terrible at regex. ..."

Me too. I think I should decide to study RegExp one of these days... (the regexp patterns scare me)
JohnOne, posted February 1, 2015

I've said that to myself plenty of times, but when I get around to starting I just stare at the screen, like a monkey with an iPod.
SadBunny, posted February 2, 2015 (edited)

Yes, "(?i)" indicates case-insensitive mode. From the StringRegExp manual:

    Caseless: matching becomes case-insensitive from that point on. By default, matching is case-sensitive. When UCP is enabled casing applies to the entire Unicode plane 0, else applies by default to ASCII letters A-Z and a-z only.

Regex is soooo sweet. I used to have this instinctive fear of it as well, but once I started using it daily it really took off. It's unbelievably powerful, especially when used with scripting languages like awk/sed/perl on linux machines. I now need and use it pretty much daily for work and have used it countless times in private life as well to perform all kinds of data mining tasks. Getting over that small difficulty spike at the beginning is well worth it.

By the way, just install the nifty little freeware (thought it was freeware, apparently not, my company just has a license) tool called "expresso", and you can toy around with your data set and the regex JohnOne/Melba provided, see a breakdown of what every item in the pattern actually does, and get immediate results. Great for tweaking those longer regex patterns. (There are many other tools like that but IMHO expresso is just unbeatable.)

Also, I suggest getting a pillow cover printed with this regex cheatsheet. I once ordered a mousemat with it and it has served me well.

Edited February 2, 2015 by SadBunny
TheSaint, posted February 2, 2015

Of course, you could have also just done a StringSplit on rowspan= followed by a StringSplit on spaces, to get the number.

Obviously for those who struggle with RegExp ... or don't want to expend the brainpower necessary ... like me.
SadBunny, posted February 2, 2015 (marked as solution)

Here, a quick test in expresso. I also refined your pattern somewhat, and included a line in your test set with multiple rowspans on the same line. Don't know what you want to do if you encounter those, don't know if you ever would encounter those, but still, it came to mind.

One thing you should realize is why this actually gets the numbers: it's because the \d+, i.e. the 1 or more digits you are looking for, is (inside parentheses). That's a "capture group". StringRegExp with mode 3 ($STR_REGEXPARRAYGLOBALMATCH) returns an array of the substrings matched by that capture group.

One problem with regex is that there's quite a variation in the default behaviour of parsers, so if it's important, you always want to test as many scenarios as possible.

In AU3:

    #include <Array.au3>

    $s = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF
    $s = $s & '<td rowspan= " 2 ">' & @CRLF
    $s = $s & '<td style="width:400px;" rowspan = '' 5 '' ...</td>' & @CRLF
    $s = $s & '<td rowspan = '' 6''</td><td rowspan = '' 7''</td>'

    _ArrayDisplay(StringRegExp($s, "(?i)rowspan\s*=\s*[""']?\s*(\d+)", 3))

Note, just to be complete in case you didn't already know: when including a " in a "string", or a ' in a 'string', like in this pattern and in this example string, you need to double the quote to "escape" it and not break the string, otherwise you'll get syntax errors. So:

    $s = "This string ""contains"" double doublequotes."
    $s = 'This string ''contains'' double singlequotes.'

Hope this helps a bit in your understanding.
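Since the original question asked for parsing one line at a time, the pattern above can be wrapped in a small function. This is only a minimal sketch, not code from the thread: the function name _GetRowspan and the -1 "not found" return value are assumptions made for illustration. It uses flag 1 ($STR_REGEXPARRAYMATCH), which stops at the first match in the line.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (name and -1 return value are assumptions, not from the thread).
; Returns the first number that follows "rowspan =" in a single line, or -1 if none.
Func _GetRowspan($sLine)
    Local $aMatch = StringRegExp($sLine, "(?i)rowspan\s*=\s*[""']?\s*(\d+)", $STR_REGEXPARRAYMATCH)
    If @error Then Return -1 ; @error is set when the pattern does not match
    Return Number($aMatch[0]) ; element 0 holds the text of the capture group
EndFunc

; Feed the lines one at a time
Local $aLines[3] = ['<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>', _
        '<td rowspan= " 2 ">', _
        '<td style="width:400px;">no span here</td>']

For $i = 0 To UBound($aLines) - 1
    ConsoleWrite($aLines[$i] & " -> " & _GetRowspan($aLines[$i]) & @CRLF)
Next
```

If a line can hold several rowspan attributes, as in the last test line above, flag 3 is still the better fit because it returns every match.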
SadBunny, posted February 2, 2015

TheSaint wrote: "Of course, you could have also just done a StringSplit on rowspan= followed by a StringSplit on spaces, to get the number. Obviously for those who struggle with RegExp ... or don't want to expend the brainpower necessary ... like me."

When trying to get specific substrings from unpredictable input it's often well worth the trouble. Before you know it you're spending much more brainpower and coding time on things like "how to stringsplit if there's maybe a single or double quote between the rowspan= and the number".

Furthermore, I have literally built regex patterns in my dreams. Choosing Regex is not a question of calculating whether it's worth expending the brainpower or not, it's a way of life.
TheSaint, posted February 2, 2015

Well, I've survived quite well up until now, rarely using it.

A simple replace for quotes does the trick, and you can also include a StringIsDigit if you want. Barely any thought in doing any of that.

But hey, if others want to use RegExp, go right ahead ... I'll even sit back and admire how clever you are, while still rarely bothering myself. In fact, I see it as almost a challenge, to not use RegExp these days, as so many seem so proficient at it.

For those who struggle though, especially newbies to the finer art of programming, it always pays to give them a simple alternative too. Let them pick the one they are most comfortable with, especially if there is a need to adapt.
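For comparison, a regex-free version along the lines suggested here could look roughly like the sketch below. It is an illustration only, not code posted in the thread: the function name _GetRowspanNoRegex and the -1 "not found" value are assumptions, and it swaps StringSplit for StringInStr plus step-by-step trimming so that the optional spaces and quotes are handled. It only inspects the first "rowspan" in the line.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (name and -1 return value are assumptions, not from the thread).
Func _GetRowspanNoRegex($sLine)
    Local Const $sWord = "rowspan"
    Local $iPos = StringInStr($sLine, $sWord) ; case-insensitive by default
    If $iPos = 0 Then Return -1

    ; Text after the word, with leading whitespace removed; it must start with "="
    Local $sRest = StringStripWS(StringMid($sLine, $iPos + StringLen($sWord)), $STR_STRIPLEADING)
    If StringLeft($sRest, 1) <> "=" Then Return -1

    ; Drop the "=", then an optional opening quote, skipping spaces after each
    $sRest = StringStripWS(StringTrimLeft($sRest, 1), $STR_STRIPLEADING)
    Local $sFirst = StringLeft($sRest, 1)
    If $sFirst = '"' Or $sFirst = "'" Then
        $sRest = StringStripWS(StringTrimLeft($sRest, 1), $STR_STRIPLEADING)
    EndIf

    ; Collect the leading run of digits
    Local $sDigits = "", $sChar
    For $i = 1 To StringLen($sRest)
        $sChar = StringMid($sRest, $i, 1)
        If Not StringIsDigit($sChar) Then ExitLoop
        $sDigits &= $sChar
    Next
    If $sDigits = "" Then Return -1
    Return Number($sDigits)
EndFunc

ConsoleWrite(_GetRowspanNoRegex('<td style="width:400px;" rowspan = ''5 '' ...</td>') & @CRLF)
```

The sketch runs to some twenty lines for what the regex does in one, which is the trade-off the two posters are debating.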
SadBunny, posted February 2, 2015

TheSaint wrote: "Well, I've survived quite well up until now, rarely using it. [...] Let them pick the one they are most comfortable with, especially if there is a need to adapt."

Obviously. Of course, to each his own. But I just can't imagine life without regex any more. A large part of my job, the part where I parse and process customer input (inherently unpredictable, weird character sets, possible injection attempts, accidental copypastes of the entire King James bible, etc.) would be literally impossible without it.
oapjr, posted February 2, 2015

I found this the other day: http://regex.inginf.units.it/#

It automatically creates a regular expression pattern. It's not going to work every time but it might help.
SmOke_N (Moderator), posted February 2, 2015

Lots of ways to create a working expression:

    "(?is)(?:row|col)span\h*=\h*(?:'|"")?\h*(\d+)"

    (?is)       = case insensitive, and let . match across newlines (html is famous for not working as expected without using this)
    (?:row|col) = non-capturing group to select rowspan or colspan
    \h*         = work through any horizontal space if it exists
    (?:'|")?    = non-capturing group to look for a single or double quote; the "?" after it is basically saying... may or may not exist
    (\d+)       = capture group (give me my digits please)

Btw, didn't I provide a pattern in the udf funcs I did for you with _htmlraw_* that did something like this?
mikell, posted February 2, 2015

What about negated types?

    $word = "rowspan"
    $res = StringRegExp($s, '(?i)\Q' & $word & '\E\s*=\D*(\d+)', 3)
Gianni (Author), posted February 2, 2015 (edited)

@Melba23
Thanks again for your help; the "SRE decode" is also very instructive. Thanks

@SadBunny
I think your regexp is exactly what I was looking for:

    "(?i)rowspan\s*=\s*[""']?\s*(\d+)"

It also catches numbers when they are double quoted or single quoted. The extra illustrated explanation is appreciated as well... Thanks

@SmOke_N
The listing in that post you provided was the first place where I searched, but the RegExp contained therein

    "(?is)<\s*(?:td|th)\h+rowspan=(?:\x22|\x27)(\d+)(?:\x22|\x27)\s*>"

is a bit complicated for my knowledge and maybe not general purpose? (P.S. I'm still working on that table extraction function and I'm nearly at my wanted result. I will post there shortly, spare time allowing.) Thanks

mikell wrote: "What about negated types?"

    $word = "rowspan"
    $res = StringRegExp($s, '(?i)\Q' & $word & '\E\s*=\D*(\d+)', 3)

.... emmm .... maybe yes, thanks

@JohnOne @TheSaint @oapjr
Thanks for the appreciated contributions

Edited February 2, 2015 by Chimp
mikell, posted February 2, 2015

Gianni wrote: ".... emmm .... maybe yes, thanks"

Oh sorry for the lack of comments, my f* laziness...

You want to get digits, so you can use \D* (0 or more non-digit chars) after the "=" to match spaces and quotes.
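One thing worth checking before relying on \D*: it accepts any non-digit characters, not just spaces and quotes, so it can run past an empty attribute value into the next attribute. The snippet below is a small sketch of that difference; the test string with an empty rowspan is made up for illustration and does not come from the thread.

```autoit
#include <StringConstants.au3>

; Made-up input: rowspan has an empty value and is followed by a colspan
Local $sTricky = '<td rowspan="" colspan="4">'

; Loose pattern from the post above: \D* can consume '"" colspan="' before the digits
Local $aLoose = StringRegExp($sTricky, '(?i)\Qrowspan\E\s*=\D*(\d+)', $STR_REGEXPARRAYGLOBALMATCH)
Local $sLoose = @error ? "no match" : $aLoose[0]

; Stricter pattern: only spaces and one optional quote may sit between "=" and the digits
Local $aStrict = StringRegExp($sTricky, "(?i)rowspan\s*=\s*[""']?\s*(\d+)", $STR_REGEXPARRAYGLOBALMATCH)
Local $sStrict = @error ? "no match" : $aStrict[0]

ConsoleWrite("Loose:  " & $sLoose & @CRLF)
ConsoleWrite("Strict: " & $sStrict & @CRLF)
```

The loose pattern should report 4 (the colspan value) while the stricter one reports no match. For well-formed input like the examples in this thread both patterns give the same numbers, so which one to use depends on how much the input can be trusted.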