shaggy89 Posted May 30, 2014

Hi all, I'm trying to scrape some info off a website's API. When I open the link it's XML, and the info I want is as follows:

    <availability>
    <members date="2014-05-30" count="2" day="2" night="1" OOA="0" na="0" />
    </availability>

The info I want is the day= number. Would I use _IEGetObjByName or _IEGetObjById, with day as the id or name?

Cheers. P.S. First time using XML and an API.

jchd Posted May 30, 2014

Assuming you don't need more from this XML, you can probably use this:

    Local $sXML = 'Blah ... <availability>' & @CRLF & '<members date="2014-05-30" count="2" day="2" night="1" OOA="0" na="0" />' & @CRLF & '</availability> ... more blah'
    Local $day = StringRegExpReplace($sXML, '(?is)(?:.*<availability>.*? day=")(\d+)(?:".*)', '$1')
    ConsoleWrite($day & @LF)

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Geir1983 Posted May 30, 2014

You could use the _StringBetween function to get the text between day=" and ". (Use it first to get the text between <availability> and </availability>.) I'm sure there is also some XML UDF on this forum you could use, or even the functions you are referring to; I don't really know them.

shaggy89 Posted May 30, 2014 (Author)

Thanks guys for the replies.

@jchd: That is all the info I need from the XML.

@Geir1983: I did look at the UDF, but it seems all the examples were reading from a file on the HDD, whereas this would be read directly off the web using the IE.au3 library.

Again, thanks. Are there any tips for reading XML directly off the net?

shaggy89 Posted May 31, 2014 (Author)

@jchd: That works well, but I forgot to mention one part as I was in a rush. I use _IECreate (pointing to the website with the API key), so now I need to extract that "day" number from that website. The "day" number always changes; it's not a static number. Any ideas?

jchd Posted May 31, 2014

The regular expression will extract whatever unsigned integer number is after day="

shaggy89 Posted May 31, 2014 (Author)

jchd said: The regular expression will extract whatever unsigned integer number is after day="

Right, so what I have tried is:

    #include <IE.au3>
    #include <Constants.au3>
    #include <String.au3>
    Global $oIE = _IECreate("website ")
    _IELoadWait ($oIE)
    Local $sXML ='<availability>' & @CRLF & '<members date="2014-05-30" count="2" day="2" night="1" OOA="0" na="0" />' & @CRLF & '</availability>'
    Local $day = StringRegExpReplace($oIE,$sXML, '(?is)(?:.*<availability>.*? night=")(\d+)(?:".*)', '$1')
    ConsoleWrite($day & @LF)

Then all I get is an error:

    Local $sXML = $oIE,'<availability>' & @CRLF & '<members date="2014-05-30" count="2" day="2" night="1" OOA="0" na="0" />' & @CRLF & '</availability>'
    Local $sXML = $oIE,^ ERROR
    >Exit code: 1 Time: 0.907

My goal is to load the website's API and from there find the "day" number. In the end I will do an If $day < 2, so that day number has to come off that website, as the "day" number changes.

Thanks thus far.

Danp2 Posted May 31, 2014

The error shown doesn't appear to match the code you posted. However, it looks like you are trying to use the $oIE object directly instead of calling one of its functions. To retrieve the XML from the webpage, you can try this:

    Local $sXML = _IEBodyReadHTML($oIE)

You may also want to review the _IEBody* and _IEDoc* functions to see if one of them will work better for your given situation.

Latest Webdriver UDF Release Webdriver Wiki FAQs

shaggy89 Posted May 31, 2014 (Author)

Thanks @Danp2: I shall try that code shortly. After looking at the code I just posted, I can see what I did wrong: I added $oIE on the front, thinking it would read from that source. I had looked at _IEBodyReadHTML but assumed it was for HTML only, then realized XML is just code too (I'm new to XML and web scraping). So I will try it later. Thanks for all the help.

mikell Posted May 31, 2014

If you know the address of the XML, then you can get its text without using _IE* (faster). Here is an example:

    $sXML = BinaryToString(InetRead("http://api.openweathermap.org/data/2.5/weather?q=London&mode=xml"))
    Msgbox(0,"content", $sXML)
    $clouds = StringRegExpReplace($sXML, '(?is).*<clouds.*?name="([^"]+).*', '$1')
    Msgbox(0,"clouds", $clouds)

shaggy89 Posted May 31, 2014 (Author)

Thanks, that worked like a charm, and you're right, it works a lot faster than _IE. What UDF was this from, so I can read and learn more?

mikell Posted May 31, 2014

The Inet* funcs are built in; find them in the helpfile under Function Reference / Network Management (nice examples inside). Otherwise you can look at the Inet.au3 UDF.

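For anyone following along, here is a minimal sketch of how the built-in InetRead could be combined with the day="..." extraction discussed earlier, with basic error checks added. The URL is a placeholder and not the real API address, and the pattern assumes the feed looks like the sample in the first post. It uses StringRegExp with flag 1 rather than StringRegExpReplace, because StringRegExpReplace hands back the whole input unchanged when nothing matches, which is easy to mistake for a result.

```autoit
; Sketch only: $sUrl is a placeholder, not the real API address.
Local $sUrl = "http://example.com/api/availability.xml"

; InetRead returns binary data and sets @error if the download fails.
Local $dData = InetRead($sUrl)
If @error Then
    ConsoleWrite("Download failed" & @CRLF)
    Exit 1
EndIf
Local $sXML = BinaryToString($dData)

; Flag 1 returns an array of captured groups and sets @error when nothing matches.
Local $aMatch = StringRegExp($sXML, '(?is)<availability>.*?\bday="(\d+)"', 1)
If @error Then
    ConsoleWrite("No day attribute found" & @CRLF)
    Exit 1
EndIf

; Convert the captured text to a number so it can be compared later.
Local $iDay = Number($aMatch[0])
ConsoleWrite("day = " & $iDay & @CRLF)
```

Converting with Number() up front means a later check such as If $iDay < 2 compares numbers rather than strings.
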
shaggy89 Posted June 5, 2014 (Author)

Hi all,

The script has been working great up until my XML had more data added. In my original post I had one date, but now that I have added more dates, the script is finding information for all dates, not just the current one. How can I make it read only today's date?

XML example:

    <availability>
    <members date="2014-06-6" count="2" day="2" night="1" OOA="0" na="0" />
    <members date="2014-06-7" count="6" day="5" night="1" OOA="0" na="0" />
    <members date="2014-06-8" count="8" day="4" night="1" OOA="0" na="0" />
    <members date="2014-06-9" count="9" day="9" night="1" OOA="0" na="0" />
    </availability>

Cheers, Shane

Solution: mikell Posted June 5, 2014

    $sXML = '<availability>' & @crlf & _
        '<members date="2014-06-6" count="2" day="2" night="1" OOA="0" na="0" />' & @crlf & _
        '<members date="2014-06-7" count="6" day="5" night="1" OOA="0" na="0" />' & @crlf & _
        '<members date="2014-06-8" count="8" day="4" night="1" OOA="0" na="0" />' & @crlf & _
        '<members date="2014-06-9" count="9" day="9" night="1" OOA="0" na="0" />' & @crlf & _
        '</availability>'

    $day = StringRegExpReplace($sXML, '(?is).*<availability.*?day="([^"]+).*</availability.*', '$1') ; gets the first one (2)
    Msgbox(0,"day", $day)

    $day = StringRegExpReplace($sXML, '(?is).*<availability.*day="([^"]+).*?</availability.*', '$1') ; gets the last one (9)
    Msgbox(0,"day", $day)

shaggy89 Posted June 6, 2014 (Author)

I should explain: the XML is a calendar that shows dates for 2 weeks at a time. What I'm trying to achieve is to get the data for the current day, e.g. get today's data today, tomorrow's data tomorrow, etc.

I did try:

    $date = _Date_Time_SystemTimeToDateStr($tDate, 1)
    $day = StringRegExpReplace($sXML, & $date & '(?is).*<availability.*?day="([^"]+).*</availability.*', '$1')

but got this error:

    $day = StringRegExpReplace($sXML, & $date & '(?is).*<members.*?day="([^"]+).*', '$1')
    $day = StringRegExpReplace($sXML, ^ ERROR

Edited June 6, 2014 by shaggy89

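The syntax error in the attempt above comes from the & placed straight after the comma; the date has to be concatenated into the pattern string itself. What follows is only a sketch of one way to do that, under two assumptions taken from the sample feed: the date attribute comes before day inside the same <members> tag, and dates are written with a zero-padded month but an unpadded day (2014-06-6). The sample XML and the variable names are made up for illustration.

```autoit
; Build today's date in the feed's format, e.g. 2014-06-6 (month padded, day not).
Local $sToday = @YEAR & "-" & @MON & "-" & Number(@MDAY)

; Sample data in the same shape as the feed; the first row is stamped with today's date.
Local $sXML = '<availability>' & @CRLF & _
        '<members date="' & $sToday & '" count="2" day="2" night="1" OOA="0" na="0" />' & @CRLF & _
        '<members date="1999-01-1" count="6" day="5" night="1" OOA="0" na="0" />' & @CRLF & _
        '</availability>'

; [^>]* cannot cross a ">" so the date and day are taken from the same <members> tag.
Local $sPattern = '(?i)<members\b[^>]*\bdate="' & $sToday & '"[^>]*\bday="(\d+)"'

; Flag 1 returns the captured group in an array and sets @error when nothing matches.
Local $aMatch = StringRegExp($sXML, $sPattern, 1)
If @error Then
    ConsoleWrite("No entry for " & $sToday & @CRLF)
Else
    ConsoleWrite("day for " & $sToday & " = " & $aMatch[0] & @CRLF)
EndIf
```

If the real feed pads the day as well (2014-06-06), $sToday would need @MDAY as-is instead of Number(@MDAY).
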
jdelaney Posted June 6, 2014

XMLDOM would be an easier route (my opinion):

    #include <File.au3>
    $file = @DesktopDir & "\some.xml"
    _FileCreate($file)
    FileWrite($file,'<SomeXML>' & @CRLF & _
        '<availability>' & @CRLF & _
        '<members date="2014-06-6" count="2" day="2" night="1" OOA="0" na="0" />' & @CRLF & _
        '<members date="2014-06-7" count="6" day="5" night="1" OOA="0" na="0" />' & @CRLF & _
        '<members date="2014-06-8" count="8" day="4" night="1" OOA="0" na="0" />' & @CRLF & _
        '<members date="2014-06-9" count="9" day="9" night="1" OOA="0" na="0" />' & @CRLF & _
        '</availability>' & @CRLF & _
        '</SomeXML>')
    $oXML = ObjCreate("Microsoft.XMLDOM")
    $oXML.Load($file)
    $oMembers= $oXML.selectNodes('//availability/members')
    For $oMember In $oMembers
        ConsoleWrite("date=" & $oMember.getAttribute("date") & _
            "; count=" & $oMember.getAttribute("count") & _
            "; day=" & $oMember.getAttribute("day") & _
            "; night=" & $oMember.getAttribute("night") & _
            "; OOA=" & $oMember.getAttribute("OOA") & _
            "; na=" & $oMember.getAttribute("na") & @CRLF)
    Next
    Exit

Output:

    date=2014-06-6; count=2; day=2; night=1; OOA=0; na=0
    date=2014-06-7; count=6; day=5; night=1; OOA=0; na=0
    date=2014-06-8; count=8; day=4; night=1; OOA=0; na=0
    date=2014-06-9; count=9; day=9; night=1; OOA=0; na=0

In the loop, you can add a condition to check that the date is today. You might also need to format the day of the month when it's a single digit, since @MDAY is zero-padded and the feed's day is not:

    If String($oMember.getAttribute("date")) = @YEAR & "-" & @MON & "-" & StringRegExpReplace(@MDAY,"(0)(\d+)","\2") Then
        ConsoleWrite ( @TAB & "This date ^ is for today" & @CRLF)
    EndIf

Or you can just add it to the XPath, and only that day will be returned. The date value has to be quoted inside the predicate so that it is compared as a string:

    $oMembers= $oXML.selectNodes('//availability/members[@date="' & @YEAR & "-" & @MON & "-" & StringRegExpReplace(@MDAY,"(0)(\d+)","\2") & '"]')

The power of XMLDOM.

Edited June 6, 2014 by jdelaney

IEbyXPATH-Grab IE DOM objects by XPATH
IEscriptRecord-Makings of an IE script recorder
ExcelFromXML-Create Excel docs without excel installed
GetAllWindowControls-Output all control data on a given window.

shaggy89 Posted June 6, 2014 (Author)

The XML is off a web page. The program runs every hour, and once the day number drops below, say, 3, an email is sent. That's why I need it to read only the current day.

jdelaney Posted June 6, 2014

You can use _ie functions, and load the html into the xmldom (generally; it's not a 1:1 conversion). I've added more contributions, above. Or, just use the _ie functions, and you can do similar to what I've done. I thought this was an actual XML document, not just HTML source.

Edited June 6, 2014 by jdelaney

shaggy89 Posted June 6, 2014 (Author)

jdelaney said: You can use _ie functions, and load the html into the xmldom (generally; it's not a 1:1 conversion). I've added more contributions, above. Or, just use the _ie functions, and you can do similar to what I've done. I thought this was an actual XML document, not just HTML source.

I was going to use the IE functions, but then I was told about BinaryToString(InetRead($Site)), and I would like to keep using this as it's a lot faster than opening IE. I'm sure there is a way to make it read only the current day's information.

jdelaney Posted June 6, 2014

Try loading that output into XMLDOM and see if it works. Note that the date value is quoted inside the XPath predicate so that it is compared as a string:

    $oXML = ObjCreate("Microsoft.XMLDOM")
    $oXML.LoadXML(BinaryToString(InetRead($Site)))
    $oMembers= $oXML.selectNodes('//availability/members[@date="' & @YEAR & "-" & @MON & "-" & StringRegExpReplace(@MDAY,"(0)(\d+)","\2") & '"]')
    For $oMember In $oMembers
        ConsoleWrite("date=" & $oMember.getAttribute("date") & _
            "; count=" & $oMember.getAttribute("count") & _
            "; day=" & $oMember.getAttribute("day") & _
            "; night=" & $oMember.getAttribute("night") & _
            "; OOA=" & $oMember.getAttribute("OOA") & _
            "; na=" & $oMember.getAttribute("na") & @CRLF)
        If String($oMember.getAttribute("date")) = @YEAR & "-" & @MON & "-" & StringRegExpReplace(@MDAY,"(0)(\d+)","\2") Then
            ConsoleWrite ( @TAB & "This date ^ is for today" & @CRLF)
        EndIf
    Next

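To tie this back to the hourly check described earlier (send an email once the day number drops below about 3), here is a rough sketch of how the XMLDOM approach might be wrapped with error handling and a threshold test. Everything specific in it is an assumption: the inline sample XML stands in for BinaryToString(InetRead($Site)), $iThreshold is a made-up name, and the email step is only marked with a console message because the thread never says how the mail is sent.

```autoit
; Build today's date in the feed's format, e.g. 2014-06-6 (month padded, day not).
Local $sToday = @YEAR & "-" & @MON & "-" & Number(@MDAY)

; Sample XML stands in for the downloaded feed; the first row carries today's date.
Local $sXML = '<availability>' & _
        '<members date="' & $sToday & '" count="2" day="2" night="1" OOA="0" na="0" />' & _
        '<members date="1999-01-1" count="6" day="5" night="1" OOA="0" na="0" />' & _
        '</availability>'

Local $iThreshold = 3 ; hypothetical limit, taken from "below, say, 3"

Local $oXML = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXML) Then Exit 1

; loadXML returns False when the text is not well-formed XML.
If Not $oXML.loadXML($sXML) Then
    ConsoleWrite("Parse error: " & $oXML.parseError.reason & @CRLF)
    Exit 1
EndIf

; The date is quoted inside the predicate so it is compared as a string.
Local $oMember = $oXML.selectSingleNode('//availability/members[@date="' & $sToday & '"]')
If Not IsObj($oMember) Then
    ConsoleWrite("No entry for " & $sToday & @CRLF)
    Exit 1
EndIf

; getAttribute returns text, so convert before comparing against the threshold.
Local $iDay = Number($oMember.getAttribute("day"))
If $iDay < $iThreshold Then
    ConsoleWrite("day = " & $iDay & ", below " & $iThreshold & ": send the email here" & @CRLF)
Else
    ConsoleWrite("day = " & $iDay & ", nothing to do" & @CRLF)
EndIf
```

selectSingleNode is used instead of selectNodes because only one row per date is expected; if the feed can repeat a date, looping over selectNodes as in the post above would be the safer choice.
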