molotofc, posted October 26, 2013:

Dear all, I'm not sure if I can explain this correctly. Take this for example: http://www.bbc.co.uk/weather/2643743 On this webpage there is a part which says 'Sunrise' and then gives the time of sunrise (at the moment 07:43). The sunrise value obviously changes every day. I haven't tried it, but StringInStr should have no problem finding the word 'Sunrise'; it cannot, however, find a value that changes. What's the easiest and quickest way to find and return the sunrise value, or any other 'string of varying value'? I think you can do something like this in VBA; I'm not sure if AutoIt can.

jchd, posted October 26, 2013:

Look up StringRegExp, especially the version that comes with the beta.

mikell, posted October 26, 2013:

Because in the beta, there is a brand new and pretty niiiice help page for StringRegExp

; Download the page source and convert it from binary to text
$code = BinaryToString(InetRead("http://www.bbc.co.uk/weather/2643743"))
; Replace the whole page with the text captured right after ">Sunrise"
$res = StringRegExpReplace($code, '(?s).+>Sunrise\h*([^<]+).+', "$1")
MsgBox(0, "", "Sunrise at : " & $res)

abberration, posted October 26, 2013:

I haven't tried this UDF, but it looks like you can calculate sunrise/sunset yourself without consulting a website.

mikell, posted October 26, 2013:

abberration, the sunrise thing was just an example; the real purpose is to find an unknown string which follows a known string. In this case the regular expression finds the characters other than '<' that follow the string '>Sunrise '.

abberration, posted October 26, 2013:

That's cool. I suggested that because I have seen people ask about something specific because they did not know another way was possible.

molotofc (author), posted October 27, 2013:

Indeed it was just an example, but thanks for trying anyway.

mikell: that works great! But could you please explain:

$res = StringRegExpReplace($code, '(?s).+>Sunrise\h*([^<]+).+', "$1")

How exactly does it specify the next word to return, and if I want to go further, how could I look for the word after that? Thank you.

Jury, posted October 28, 2013:

It's this bit: ([^<]+), which is captured in backreference $1. The meaning is:

Match the regular expression below and capture its match into backreference number 1 «([^<]+)»
    Match any character that is NOT a "<" «[^<]+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character «.+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

mikell, posted October 28, 2013:

For this, the relevant help file page becomes your best friend, as there are several ways to build a regular expression. In this particular case the string you want to get is matched by the part of the expression between parentheses.

>Sunrise\h* : the string from which the search begins, meaning '>Sunrise' followed by 0 or more horizontal whitespace characters
[^<]+ : "one or more characters which are not the '<' character"; this definition obviously includes the digits and the colon, and lets the match stop at the first '<' character found

So you understand that the expression will probably need to be adapted if you need to match something different.

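To make the explanation concrete, and to address the question about getting the value after the next one, here is a minimal sketch. It assumes a made-up HTML fragment (the `$sHtml` string and the sunset time are illustrative, not taken from the real page) and uses StringRegExp with flag 1, which returns the captured groups as an array. A second pair of parentheses captures a second unknown string after a second known marker.

```autoit
; Hypothetical HTML fragment standing in for the real page source.
Local $sHtml = '<span class="sunrise">Sunrise 07:43</span><span class="sunset">Sunset 17:46</span>'

; Flag 1 returns an array holding the text of each capturing group.
; Group 1: characters after ">Sunrise" up to the next "<".
; Group 2: characters after ">Sunset" up to the next "<".
Local $aMatch = StringRegExp($sHtml, '(?s)>Sunrise\h*([^<]+).*?>Sunset\h*([^<]+)', 1)
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite("Sunrise: " & $aMatch[0] & @CRLF)
    ConsoleWrite("Sunset: " & $aMatch[1] & @CRLF)
EndIf
```

Checking @error before reading the array matters here, because a page change that breaks the pattern would otherwise cause an array access on a non-array value.
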
JohnFall, posted October 29, 2013:

Hi all,

This thread is looking at a problem that seems very similar to one I'm trying to solve myself at the moment. Apologies if this is a hijack, but perhaps this discussion could also help the original poster. I'm brand new to AutoIt but seem to have got to grips with it pretty quickly, and I'm really pleased to see such an active user base.

In my work we buy client information from a third party. They are market leaders and unfortunately they know it; consequently their site hasn't been enhanced in over 10 years. It works fine, but given the amount of data we buy from them an API would be a great help. Currently we have to enter this data manually, which is a complete waste of time for the member of staff doing it, so I've been trying to automate it.

The provider emails us a snippet of the customer data we've bought; the full data is on their site, which we have to log in to view and then copy to our DB. My solution so far uses a VBA macro in Outlook to read the unique ID of this 'snippet' email and write it to a text file. It then calls the AutoIt executable, which loads IE, logs in, and navigates to the customer data we want by reading the unique ID from the text file and clicking the link that matches the ID. So far so good.

However, the full customer data is displayed in a human-readable format that is not so great for programmatic integration. Luckily, on each page containing the customer data the tags on either side are static. Here's an example:

<span id="NameLabel">Mr Joe Bloggs</span>

What I need to do is read the first and last name and write them to a file; I don't need the title. Obviously the first and last name can vary in length. I have been looking at StringRegExp and believe the answer lies there.

So I guess I need to find the first group of characters after the first whitespace and before the second whitespace, then the second group of characters after the second whitespace and before the '<' character. Another potential problem is that the title will differ (Mr/Mrs/Miss/Dr./Father etc.), so perhaps it needs to find the '<span id="NameLabel">' and then ignore the characters after this string until the first whitespace?

Any and all help graciously received.

John.

jchd, posted October 29, 2013:

While grabbing data through regexps is a possibility, it won't be as robust as doing the same with the _IE* functions, which I strongly recommend.

mikell, posted October 29, 2013:

Using regex, the safest way here would be to grab the full identity string

$res = StringRegExpReplace($code, '(?s).+<span id="NameLabel">([^<]+).+', "$1")

then parse the result with the String* funcs, StringSplit or so, to prevent possible errors with names like "Sir John William Smith", "Mrs Dr. H. Smith" etc.

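The two-step approach suggested above (capture the whole name, then split it) might look roughly like the sketch below. The `$sHtml` fragment is a stand-in for the downloaded page, and the rule "first part is the title, second part is the first name, last part is the surname" is an assumption that will not hold for every name, which is exactly why the full string is captured first.

```autoit
; Hypothetical page fragment; in practice $sHtml would hold the downloaded source.
Local $sHtml = '<div><span id="NameLabel">Mr Joe Bloggs</span></div>'

; Capture everything between the opening tag and the next "<".
Local $aMatch = StringRegExp($sHtml, '<span id="NameLabel">([^<]+)', 1)
If @error Then
    ConsoleWrite("NameLabel not found" & @CRLF)
Else
    ; Trim leading/trailing whitespace (flag 3), then split on spaces.
    ; $aParts[0] holds the number of parts.
    Local $aParts = StringSplit(StringStripWS($aMatch[0], 3), " ")
    If $aParts[0] >= 3 Then
        ; Assumption: part 1 is the title, part 2 the first name,
        ; and the final part the surname.
        ConsoleWrite("First name: " & $aParts[2] & @CRLF)
        ConsoleWrite("Last name: " & $aParts[$aParts[0]] & @CRLF)
    Else
        ConsoleWrite("Unexpected name format: " & $aMatch[0] & @CRLF)
    EndIf
EndIf
```

Names with fewer than three parts fall through to the last branch, so they can be logged and handled by hand rather than being split wrongly.
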
JohnFall, posted October 31, 2013:

Thanks guys. I haven't had time to return to this project just yet, but I'm hoping to have some time for it later today. jchd, regarding the _IE* functions: I was looking at them but couldn't find anything that would specifically read what I needed from the page. That's probably my lack of experience, so I'll read up on those sections again. Thanks also to mikell for providing that line of code.

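Since the span in question has an id, one way the _IE* route could read it is with _IEGetObjById followed by _IEPropertyGet. This is only a sketch: the URL is a placeholder, and a real script would reuse the $oIE object it already holds after logging in.

```autoit
#include <IE.au3>

; Placeholder URL; the real script would already have $oIE on the customer page.
Local $oIE = _IECreate("http://www.example.com/customer-page")

; Look the element up by its id attribute.
Local $oName = _IEGetObjById($oIE, "NameLabel")
If @error Then
    ConsoleWrite("NameLabel element not found" & @CRLF)
Else
    ; "innertext" returns the text content without the surrounding tags.
    ConsoleWrite(_IEPropertyGet($oName, "innertext") & @CRLF)
EndIf

_IEQuit($oIE)
```

Because the lookup goes through the DOM, it does not depend on attribute order, quoting style, or whitespace inside the tag.
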
Xenobiologist, posted October 31, 2013:

#include <IE.au3>
$oIE = _IECreate("http://www.bbc.co.uk/weather/2643743")
ConsoleWrite(_getText('span', 'sunrise') & @CRLF)
ConsoleWrite(_getText('span', 'sunset') & @CRLF)

Func _getText($tag = 'div', $className = '')
    Local $oCorrectObj = ''
    Local $tags = $oIE.document.GetElementsByTagName($tag)
    For $tag In $tags
        ;~ $class_value = $tag.GetAttribute("class")
        $class_value = $tag.className
        If String($class_value) = $className Then
            $oCorrectObj = $tag
            ;~ MsgBox(0, "Level: ", "Level found :)")
            ExitLoop
        EndIf
    Next
    If IsObj($oCorrectObj) Then
        Return _IEPropertyGet($oCorrectObj, "innertext")
    EndIf
    Return -1
EndFunc ;==>_getText

mikell, posted October 31, 2013:

However you do it, the _IE* way is (on my PC) at least about 12 times slower than the InetRead+regex one ^^

Xenobiologist, posted October 31, 2013:

But it also works. :-) It was just a try to get it done.

jchd, posted October 31, 2013:

True, but the _IE* functions are immune to meaningless changes in the HTML flow, each of which can easily break regexp-based code.

Gianni, posted October 31, 2013:

Based on StringInStr and related functions:

$source = "On this webpage there is a part which says 'Sunrise' and then it gives the time of sunrise (at the moment 07:43). The sunrise value changes everyday obviously."
$substring = 'moment'
ConsoleWrite('The word after "' & $substring & '" is this -->' & @TAB & StringMid($source, (StringInStr($source, $substring) + StringLen($substring) + 1), StringInStr($source, " ", 0, 2, StringInStr($source, $substring) + StringLen($substring)) - (StringInStr($source, $substring) + StringLen($substring))) & @CRLF)

Just a proof of concept.

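The same StringInStr idea can be wrapped in a small helper that returns whatever sits between a known marker and a known terminator. The function name `_TextAfter` and its parameters are hypothetical, chosen for this sketch only; the sample string is the span from earlier in the thread.

```autoit
; Returns the text between $sAfter and the next occurrence of $sUntil,
; or an empty string if either marker is missing.
Func _TextAfter($sSource, $sAfter, $sUntil)
    Local $iPos = StringInStr($sSource, $sAfter)
    If $iPos = 0 Then Return ""
    ; First character after the known marker.
    Local $iStart = $iPos + StringLen($sAfter)
    ; First occurrence of the terminator at or after $iStart.
    Local $iEnd = StringInStr($sSource, $sUntil, 0, 1, $iStart)
    If $iEnd = 0 Then Return ""
    Return StringMid($sSource, $iStart, $iEnd - $iStart)
EndFunc

ConsoleWrite(_TextAfter('<span id="NameLabel">Mr Joe Bloggs</span>', '<span id="NameLabel">', '<') & @CRLF)
```

Naming the start and end positions keeps the arithmetic readable and makes the "marker not found" cases explicit, which the one-line version leaves unchecked.
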
jdelaney, posted October 31, 2013 (edited):

If you use the UDF in my sig (IEbyXPATH, which grabs IE DOM objects by XPath):

$oIE = _IECreate("http://www.bbc.co.uk/weather/2643743", 0, 0)
_IELoadWait($oIE)
$aSunrise = BGe_IEGetDOMObjByXPathWithAttributes($oIE,"//span[@class='sunrise']")
$sTime = StringRegExpReplace($aSunrise[0].innertext,"[^\d:]","")
ConsoleWrite($sTime & @CRLF)
_IEQuit($oIE)

or, with just ie.au3:

#include <ie.au3>
$oIE = _IECreate("http://www.bbc.co.uk/weather/2643743", 0, 1)
_IELoadWait($oIE)
$oSpans = _IETagNameGetCollection($oIE, "span")
For $oSpan In $oSpans
    If String($oSpan.className) = "sunrise" Then
        $sFullText = String($oSpan.innertext)
        ConsoleWrite($sFullText & @CRLF)
        $sTime = StringRegExpReplace($sFullText,"[^\d:]","")
        ConsoleWrite($sTime & @CRLF)
        ExitLoop
    EndIf
Next
_IEQuit($oIE)

haha, just noticed an example was already provided...

mikell, posted October 31, 2013:

jc, if such changes are likely to occur, I can hardly imagine any really immune solution.