jchd Posted October 31, 2013

You're misled by your assumptions. A new version of some server component, or a parameter change, can insert or remove any number of whitespace characters in the flow, at just about every point where they are allowed (yet meaningless) but uncommon. That breaks regexps unless you insert [\h\s]* (in UTF mode) almost everywhere in your patterns, without ever missing a place where whitespace can appear.

I had a few live examples of web services that would return differing versions almost every time you made the same request, even several times in a row within the same session. While the active content was exactly identical, the flow was incredibly different, with sometimes dozens or hundreds of spaces, linefeeds, tabs or other meaningless allowed whitespace at a place where another run had nothing at all. I was totally unaware such a thing could happen and had to change hundreds of regexps to keep things working.

IE functions are completely immune to such behavior.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
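To make the whitespace problem concrete, here is a minimal sketch. The two sample strings and both patterns are made up for illustration and are not taken from any real service. It uses plain \s*, which covers ASCII whitespace only; matching Unicode spaces as well would need extra pattern options.

```autoit
; Two hypothetical responses carrying the same content with different whitespace.
Local $sCompact = '<td>Sunset</td><td>16:34</td>'
Local $sPadded = '<td>  Sunset' & @CRLF & @TAB & '</td>' & @CRLF & '   <td> 16:34 </td>'

; A rigid pattern assumes there is never any whitespace between the tokens.
Local $sRigid = '<td>Sunset</td><td>(\d\d:\d\d)</td>'
; A tolerant pattern allows optional whitespace at every point where it may legally appear.
Local $sTolerant = '<td>\s*Sunset\s*</td>\s*<td>\s*(\d\d:\d\d)\s*</td>'

Local $aInputs[2] = [$sCompact, $sPadded]
For $sText In $aInputs
    ; With the default flag, StringRegExp returns 1 on a match and 0 otherwise.
    ConsoleWrite("rigid: " & StringRegExp($sText, $sRigid) & _
            "  tolerant: " & StringRegExp($sText, $sTolerant) & @CRLF)
Next
```

The rigid pattern should match only the compact string, while the tolerant one should match both. Forgetting a single \s* in the tolerant pattern is enough to make it fail on the padded input, which is the maintenance burden described above.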
Gianni Posted November 1, 2013 (edited)

An easy way to find the word after (and also the word before, if needed) within the whole text content of the web page. Example:

    #include <IE.au3>

    Local $oIE = _IECreate("http://www.bbc.co.uk/weather/2643743")
    Local $source = _IEBodyReadText($oIE) ; retrieves the whole text content of the web page
    _IEQuit($oIE)

    $substring = "Sunset" ; <-- change the reference word here
    $x = _WordAfter($source, $substring)
    MsgBox(0, "Word after " & $substring, $x)

    Func _WordBefore($source, $substring) ; returns the word before
        Local $a = StringInStr($source, $substring) ; position where the found string starts
        ; Local $b = $a + StringLen($substring) ; position just after the last character of the searched string
        Local $c = StringInStr($source, " ", 0, -2, $a) ; position of the space preceding the word before
        Local $d = StringMid($source, $c + 1, $a - $c - 2) ; word before
        Return $d
    EndFunc   ;==>_WordBefore

    Func _WordAfter($source, $substring) ; returns the word after
        Local $a = StringInStr($source, $substring) ; position where the found string starts
        Local $b = $a + StringLen($substring) ; position just after the last character of the searched string
        Local $c = StringInStr($source, " ", 0, 2, $b) ; position beyond the word after (second space)
        Local $d = StringMid($source, $b + 1, $c - $b - 1) ; word after
        Return $d
    EndFunc   ;==>_WordAfter

Edited November 1, 2013 by PincoPanco

Chimp
small minds discuss people
average minds discuss events
great minds discuss ideas
.... and use AutoIt....
molotofc Posted June 1, 2014 (Author)

This is great, thanks, but how do I get it to work on words that have multiple occurrences? For example, the word 'Range'.
mikell Posted June 1, 2014

molotofc,
It would be much easier if you pointed out exactly what you want to get.

    $code = BinaryToString(InetRead("http://www.bbc.co.uk/weather/2643743"))
    ; isolate the needed part
    $a = StringRegExpReplace($code, '(?s).+?environmental-summary.+?<h4>(.+?)[^>]+?clear.+', '$1')
    ;Msgbox(0,"", $a)
    ; remove tags and strip multiple spaces
    $res = StringStripWS(StringRegExpReplace($a, '(?s)<.*?>', ""), 4)
    Msgbox(0,"", $res)
jchd Posted June 1, 2014

It would be smart of people asking for help to specify completely, exactly and up front what they want to achieve and the context they are working in, rather than refining their "specifications" right after every answer given to a previous incomplete request. Otherwise I'm afraid that helpers will become more and more reluctant to provide advice, noting that their efforts are systematically off the ever-moving targets imposed by those they help.
Gianni Posted June 1, 2014

I agree with jchd when questions are the fruit of laziness. Anyway, sometimes it is not even clear to ourselves what we want to achieve, and the goal becomes clear only as we go along. Learning by mistakes is also a way to learn (when we act in good faith and not out of laziness).

Here is a slightly revised version of those functions, a little debugged and with an optional "occurrence" parameter added, to be used when needed:

    _WordBefore($source, $substring, [$occurrence = 1])
    _WordAfter($source, $substring, [$occurrence = 1])

    #include <IE.au3>
    #include <StringConstants.au3>

    Local $oIE = _IECreate("http://www.bbc.co.uk/weather/2643743") ; ,0,0) add this to hide browser
    ; Local $source = _IEBodyReadText($oIE) ; retrieves the whole text content of the web page
    Local $source = _IEPropertyGet($oIE, "innertext")
    _IEQuit($oIE) ; quit the browser

    $substring = "Sunset" ; <-- change the reference word here

    $x = _WordAfter($source, $substring)
    MsgBox(0, "Check Word after", "Word after " & $substring & " is:" & @CRLF & @CRLF & $x)

    $x = _WordBefore($source, $substring)
    MsgBox(0, "Check Word before", "Word before " & $substring & " is:" & @CRLF & @CRLF & $x)

    Func _WordBefore($source, $substring, $occurrence = 1) ; returns the word before
        ;
        ; ------------ clean up the $source string a bit
        ; replace characters Chr(9) thru Chr(13) (HorizontalTab, LineFeed, VerticalTab, FormFeed and CarriageReturn) with a space
        For $i = 9 To 13
            $source = StringReplace($source, Chr($i), " ")
        Next
        ; remove leading/trailing/double spaces
        $source = StringStripWS($source, $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
        ; ------------
        If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
        Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
        If $a = 1 Or Not $a Then Return SetError(1, 0, "") ; searched word is the first in the string, or was not found
        Local $c = StringInStr($source, " ", 0, -2, $a) ; position of the space preceding the word before
        Local $d = StringMid($source, $c + 1, $a - $c - 1) ; word before
        Return $d
    EndFunc   ;==>_WordBefore

    Func _WordAfter($source, $substring, $occurrence = 1) ; returns the word after
        ;
        ; ------------ clean up the $source string a bit
        ; replace characters Chr(9) thru Chr(13) (HorizontalTab, LineFeed, VerticalTab, FormFeed and CarriageReturn) with a space
        For $i = 9 To 13
            $source = StringReplace($source, Chr($i), " ")
        Next
        ; remove leading/trailing/double spaces
        $source = StringStripWS($source, $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
        ; ------------
        If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
        Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
        If Not $a Then Return SetError(1, 0, "") ; searched word not found
        Local $b = $a + StringLen($substring) ; position just after the last character of the searched string
        Local $c = StringInStr($source, " ", 0, 2, $b) ; position beyond the word after (second space)
        Local $d = StringMid($source, $b + 1, $c - $b - 1) ; word after
        Return $d
    EndFunc   ;==>_WordAfter

Please, could someone who knows how to use regexps tell me how to translate this snippet into a single regexp?

    ; ------------ clean up the $source string a bit
    ; replace characters Chr(9) thru Chr(13) (HorizontalTab, LineFeed, VerticalTab, FormFeed and CarriageReturn) with a space
    For $i = 9 To 13
        $source = StringReplace($source, Chr($i), " ")
    Next
    ; remove leading/trailing/double spaces
    $source = StringStripWS($source, $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
    ; ------------

Thanks.
mikell Posted June 1, 2014

Chimp,
StringStripWS: "WS includes Chr(9) thru Chr(13)", so

    $source = StringStripWS($source, 7)

is enough.
Gianni Posted June 1, 2014 (edited June 1, 2014 by Chimp)

Hi mikell,
I want to replace those Chr() characters with spaces, not remove them.
Thanks.
mikell Posted June 1, 2014

Oh, I understand, you don't want to keep the newlines.

    StringRegExpReplace($source, '\s', " ")

should work.
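As a sketch of how the whole clean-up step could be folded into one expression, the helper below collapses each run of whitespace into a single space and then trims both ends. The name _NormalizeSpaces is hypothetical and not part of the thread or of AutoIt. The vertical tab is listed explicitly on the assumption that the PCRE build in use does not include it in \s.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): collapse whitespace runs, then trim.
Func _NormalizeSpaces($sText)
    ; \x0B (vertical tab) is added explicitly in case \s does not cover it.
    Local $sOut = StringRegExpReplace($sText, '[\s\x0B]+', ' ')
    ; Only leading and trailing spaces are left to strip at this point.
    Return StringStripWS($sOut, $STR_STRIPLEADING + $STR_STRIPTRAILING)
EndFunc   ;==>_NormalizeSpaces

Local $sRaw = "  Sunrise" & @TAB & "06:51" & @CRLF & "Sunset   16:34  "
ConsoleWrite("[" & _NormalizeSpaces($sRaw) & "]" & @CRLF)
```

Because the pattern uses + rather than matching one character at a time, a CR/LF pair becomes one space instead of two, so no separate pass for double spaces is needed.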
Gianni Posted June 1, 2014

Thanks mikell, it works. It replaces only Chr 9, 10, 12 and 13 (not 11), but I think that will do anyway. This simplifies my code a bit.

    #include <IE.au3>
    #include <StringConstants.au3>

    Local $oIE = _IECreate("http://www.bbc.co.uk/weather/2643743") ; ,0,0) add this to hide browser
    ; Local $source = _IEBodyReadText($oIE) ; retrieves the whole text content of the web page
    Local $source = _IEPropertyGet($oIE, "innertext")
    _IEQuit($oIE) ; quit the browser

    $substring = "sunset" ; <-- change the reference word here

    $x = _WordAfter($source, $substring, 1) ; 1 means: find the first occurrence of the substring and return the word after it
    MsgBox(0, "Check Word after", "Word after " & $substring & " is:" & @CRLF & @CRLF & $x, 5)

    $x = _WordBefore($source, $substring, 1) ; 1 means: find the first occurrence of the substring and return the word before it
    MsgBox(0, "Check Word before", "Word before " & $substring & " is:" & @CRLF & @CRLF & $x, 5)

    Func _WordBefore($source, $substring, $occurrence = 1) ; returns the word before
        ; replace whitespace characters (Chr 9, 10, 12, 13) with spaces, then strip leading/trailing/double spaces
        $source = StringStripWS(StringRegExpReplace($source, '\s', " "), $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
        If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
        Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
        If $a = 1 Or Not $a Then Return SetError(1, 0, "") ; searched word is the first in the string, or was not found
        Local $c = StringInStr($source, " ", 0, -2, $a) ; position of the space preceding the word before
        Local $d = StringMid($source, $c + 1, $a - $c - 1) ; word before
        Return $d
    EndFunc   ;==>_WordBefore

    Func _WordAfter($source, $substring, $occurrence = 1) ; returns the word after
        ; replace whitespace characters (Chr 9, 10, 12, 13) with spaces, then strip leading/trailing/double spaces
        $source = StringStripWS(StringRegExpReplace($source, '\s', " "), $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
        If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
        Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
        If Not $a Then Return SetError(1, 0, "") ; searched word not found
        Local $b = $a + StringLen($substring) ; position just after the last character of the searched string
        Local $c = StringInStr($source, " ", 0, 2, $b) ; position beyond the word after (second space)
        Local $d = StringMid($source, $b + 1, $c - $b - 1) ; word after
        Return $d
    EndFunc   ;==>_WordAfter

Thanks again.
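For comparison, the "word after the Nth occurrence" lookup can also be sketched with a single regular expression. The function name _WordAfterRE and the sample text are hypothetical, and this is an alternative technique, not the method used in the functions above. It only counts occurrences that are followed by whitespace and another word, so its numbering can differ from the StringInStr version.

```autoit
; Hypothetical regex-based variant; the name and approach are not from the thread.
Func _WordAfterRE($sSource, $sWord, $iOccurrence = 1)
    ; \Q...\E quotes the search word so that regex metacharacters in it are taken literally
    ; (this assumes $sWord itself does not contain the sequence \E).
    ; (?i) makes the search case-insensitive; flag 3 returns an array of all captured groups.
    Local $aHits = StringRegExp($sSource, '(?i)\Q' & $sWord & '\E\s+(\S+)', 3)
    If @error Or $iOccurrence < 1 Or $iOccurrence > UBound($aHits) Then Return SetError(1, 0, "")
    Return $aHits[$iOccurrence - 1]
EndFunc   ;==>_WordAfterRE

; Made-up sample text standing in for the page content.
Local $sText = "Sunrise 06:51 Sunset 16:34 Range 5-9 Range 7-12"
ConsoleWrite(_WordAfterRE($sText, "Range", 2) & @CRLF)
```

With the sample text, the second occurrence of "Range" should yield 7-12. Since \s+ accepts any run of whitespace between the two words, no separate normalization pass is needed before searching.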
molotofc Posted June 2, 2014 (Author)

Thanks a lot. I agree: sometimes I cannot find the solution myself because I am not actually clear about what I want, and I realise that being clear would help me get the appropriate help. What Chimp and mikell have posted is pretty much what I was looking for. Many thanks.