vij Posted May 1, 2013 (edited)

I would like to read the HTML body and then find the text that follows a particular string. I am starting like so:

    #include <IE.au3>
    $ieObj = _IECreate("https://ahrefs.com/index.php")
    $str = _IEBodyReadHTML($ieObj)

I need to find the string that follows "a.src=document.location.protocol+" in the HTML body, which is:

    "//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);

Does anyone already have a function that does this? Thanks
jdelaney Posted May 1, 2013 (edited)

Use _IELinkGetCollection, loop through until you find the obj.src you need, then use the proper _IEPropertyGet or _IEFormElementGetValue to grab your data.

IEbyXPATH - Grab IE DOM objects by XPATH
IEscriptRecord - Makings of an IE script recorder
ExcelFromXML - Create Excel docs without Excel installed
GetAllWindowControls - Output all control data on a given window
vij (Author) Posted May 1, 2013 (edited)

The data I am looking for is not part of a link or HTML tag. It is part of the JavaScript text within the page https://ahrefs.com/index.php. I need to find the string that follows "a.src=document.location.protocol+".

    #include <IE.au3>
    $ieObj = _IECreate("https://ahrefs.com/index.php")
    $str = _IEBodyReadHTML($ieObj)
jdelaney Posted May 1, 2013 (edited)

Same difference, but use _IEGetObjByName (script is the name) rather than _IELinkGetCollection. Inside the loop, run StringInStr on obj.innertext.
vij (Author) Posted May 1, 2013 (edited)

I don't understand. How can that get the text after "a.src=document.location.protocol+", which is //dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js? It comes from this part of the page source:

    function MentionsCheckForm() {
        $('div.alert').remove();
        $('input.alert-border').removeClass('alert-border');
        var Ret = true;
        var request = $.trim(MentionRequestObj.val());
        if (request == '') {
            MentionRequestObj.addClass('alert-border').after('<div class="alert font90">Incorrect Request</div>');
            Ret = false;
        }
        if (Ret) { ProcessObj.show(); } else { ProcessObj.hide(); }
        return Ret;
    }
    </script>
    <script type="text/javascript">
    setTimeout(function(){var a=document.createElement("script");
    var b=document.getElementsByTagName("script")[0];
    a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);
    a.async=true;a.type="text/javascript";b.parentNode.insertBefore(a,b)}, 1);
    </script></body>
    </html>
Melba23 (Moderators) Posted May 1, 2013

vij,

Putting that text into a file (so that you do not have to worry about the single/double quote mix when getting it into a string) makes it very easy to extract what you want:

    $sText = FileRead("Text.txt")
    $aExtract = StringRegExp($sText, "(?i)protocol\+\x22(.*js)\?", 3)
    ConsoleWrite($aExtract[0] & @CRLF)

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
ArrayMultiColSort - Sort arrays on multiple columns
ChooseFileFolder - Single and multiple selections from specified path treeview listing
Date_Time_Convert - Easily convert date/time formats, including the language used
ExtMsgBox - A highly customisable replacement for MsgBox
GUIExtender - Extend and retract multiple sections within a GUI
GUIFrame - Subdivide GUIs into many adjustable frames
GUIListViewEx - Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx - Check/clear parent and child checkboxes in a TreeView
Marquee - Scrolling tickertape GUIs
NoFocusLines - Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify - Small notifications on the edge of the display
Scrollbars - Automatically sized scrollbars with a single command
StringSize - Automatically size controls to fit text
Toast - Small GUIs which pop out of the notification area
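For readers who would rather test the pattern without creating Text.txt, here is a minimal sketch. It assumes the script line is pasted into a single-quoted AutoIt string, which works here because that line contains only double quotes. The variable names and the sample line are taken from the posts above; the @error check is an addition for illustration, not part of the original suggestion.

```autoit
; Sample line copied from the page source quoted earlier in the thread.
; A single-quoted AutoIt string lets the embedded double quotes stand unescaped.
Local $sText = 'a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);'

; Flag 3 returns an array of all captured groups; \x22 is the double-quote character.
Local $aExtract = StringRegExp($sText, "(?i)protocol\+\x22(.*js)\?", 3)
If @error Then
    ; StringRegExp sets @error when nothing matches, so $aExtract is not an array here.
    ConsoleWrite("Pattern not found" & @CRLF)
Else
    ConsoleWrite($aExtract[0] & @CRLF)
EndIf
```

Checking @error before indexing the result avoids a subscript error when the page changes and the pattern no longer matches.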
jdelaney Posted May 1, 2013 (edited)

    #include <IE.au3>
    $oIE = _IECreate("https://ahrefs.com/index.php")
    _IELoadWait($oIE)
    ; Collect every <script> element on the page
    $oScriptCol = $oIE.document.GetElementsByTagName("Script")
    ConsoleWrite($oScriptCol.length & @CRLF)
    For $oScript In $oScriptCol
        ; Only look at the script block that contains the marker text
        If StringInStr($oScript.innertext, "a.src=document.location.protocol+") Then
            $adata = StringRegExp($oScript.innertext, "(?i)protocol\+\x22(.*js.*\);\s)", 3)
            ; Earlier StringInStr/StringMid attempt, left commented out
            ;~ $start = StringInStr($oScript.innertext, "a.src=document.location.protocol+")
            ;~ $end = StringInStr($oScript.innertext, "; ",default,Default,$start+1)
            ;~ ConsoleWrite($oScript.innertext & @CRLF & $end & @CRLF & StringMid($oScript.innertext,$start+StringLen("a.src=document.location.protocol+"),$end-10) & @CRLF)
            ConsoleWrite($adata[0] & @CRLF)
        EndIf
    Next
vij (Author) Posted May 2, 2013 (edited)

Thank you very much, Melba23 and jdelaney. I was getting into a rut and you guys opened things up for me. This is what I ended up with:

    #include <File.au3>
    #include <IE.au3>
    $oIE = _IECreate("https://ahrefs.com/index.php")
    $str = _IEBodyReadHTML($oIE)
    $answer = StringRegExp($str, "(?<=a\.src=document\.location\.protocol\+).*(?=\+Math\.floor\(new\ Date\(\)\.getTime\(\)/3600000\);)", 1)
    ConsoleWrite($answer[0])

For those who find regexp daunting, there are regex designers. I used the one that comes with ZennoPoster; it helps you construct regular expressions quickly.
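One detail worth noting about the lookaround pattern above: the text between the two anchors still includes the JavaScript string quotes. The following is a minimal sketch, assuming the sample line from the page source quoted earlier in the thread, that runs the same pattern on an in-memory string and then strips the quotes. The variable names $sLine, $aMatch and $sUrl are made up for illustration, and the behaviour of flag 1 without a capture group is stated as I understand it, not as a guarantee.

```autoit
; Sample line copied from the page source quoted earlier in the thread.
Local $sLine = 'a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);'

; Same pattern as in the post above. It has no capture group, so the
; array is expected to hold the whole match in element 0.
Local $aMatch = StringRegExp($sLine, "(?<=a\.src=document\.location\.protocol\+).*(?=\+Math\.floor\(new\ Date\(\)\.getTime\(\)/3600000\);)", 1)
If @error Then
    ConsoleWrite("Pattern not found" & @CRLF)
Else
    ; The match still carries the surrounding double quotes; remove them.
    Local $sUrl = StringReplace($aMatch[0], '"', "")
    ConsoleWrite($sUrl & @CRLF)
EndIf
```

Compared with the capture-group pattern posted earlier, this variant keeps the trailing "?" of the URL, so which one to use depends on whether the query marker is wanted.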