
Posted

Howdy,

  I would like to find the screen coordinates (x, y) of an element within an IE browser so that I may be able to automate 'scrolling' of the browser window to that particular coordinate. 

  I know I can use the '_IEGetObjByID' and '_IEGetObjByName' commands to return an 'object variable' which I can then use to access the 'browserx' and 'browsery' properties of an element.

  Question is how do I get an 'object variable' if there is no 'Name' or 'ID' to use with those commands?  Not all elements on a page have 'Name' or 'ID' attributes associated with them.  For example if I search for the string "49 Albert St" on a page...this may be in the form of various tags (<p>, <h2>, <span>, etc) but have no 'Name' or 'ID' attributes assigned...so how can I obtain the 'object variable' for an element such as this...?

  I figure if I can get the 'object variable' I should then be able to get the browser coordinates where that element lies on a webpage...then I should hopefully be able to automate the process of scrolling the page down to that location of the element...but how does one go about sourcing the 'object variable' when there are no attributes to 'key' on...?  I thank you in advance for any advice.  Regards.

Posted

Normally, if it has absolutely no tags, attributes or innertext, I'll use the previous element that has something distinguishable and then use something like $oObject.NextSibling...
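A minimal sketch of the NextSibling idea. It assumes the page has some element you *can* identify sitting just before the one you want; the id "addressLabel" and the helper name _NextElementSibling are made up for illustration. A plain .nextSibling can return whitespace text nodes, so the helper skips ahead until it reaches an element node:

```autoit
#include <IE.au3>

; Hypothetical helper: returns the next *element* after $oNode, skipping the
; whitespace/text nodes that a plain .nextSibling can return.
; Sets @error = 1 for a bad input and @error = 2 if there is no following element.
Func _NextElementSibling($oNode)
    If Not IsObj($oNode) Then Return SetError(1, 0, 0)
    Local $oNext = $oNode.nextSibling
    ; nodeType 1 = element node (AutoIt short-circuits And, so .nodeType is only read on objects)
    While IsObj($oNext) And $oNext.nodeType <> 1
        $oNext = $oNext.nextSibling
    WEnd
    If Not IsObj($oNext) Then Return SetError(2, 0, 0)
    Return $oNext
EndFunc

; Usage sketch (assumes an IE window titled "Example" and a hypothetical id="addressLabel"):
; Local $oIE = _IEAttach("Example")
; Local $oLabel = _IEGetObjById($oIE, "addressLabel")
; Local $oTarget = _NextElementSibling($oLabel)
; If IsObj($oTarget) Then MsgBox(4096, "Found", $oTarget.tagName & ": " & $oTarget.innerText)
```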

Posted

Hi,

  Thanks for the reply. In my example "49 Albert St" would be the innertext (for example <h2>49 Albert St</h2>)...can the innertext be used with the '_IEGetObjByName' command to return the 'object variable'...? If so, that would be helpful...

Posted

No, you would use tags instead, for example:

;~ Get all H2 Elements
$oH2Tags = _IETagNameGetCollection($oIE, "h2")
;~ Loop through each H2 Object Element and check InnerText
For $oH2Tag In $oH2Tags
    If $oH2Tag.InnerText = "49 Albert St" Then MsgBox(4096, "Address Info", "Address : " & $oH2Tag.InnerText)
Next

 

Posted (edited)

OK, thanks for the information...I think I understand that...seems a bit inefficient, especially if there are many tags/innertexts to check...but I guess I can live with that...so is that used as a 'name' or an 'ID'...so I can get the x/y coordinates of that element?  example:

;~ Get all H2 Elements
$oH2Tags = _IETagNameGetCollection($oIE, "h2")
;~ Loop through each H2 Object Element and check InnerText
For $oH2Tag In $oH2Tags
    If $oH2Tag.InnerText = "49 Albert St" Then 
    MsgBox(4096, "Address Info", "Address : " & $oH2Tag.InnerText)
    $_reference = _IEGetObjByName($oIE, $oH2Tag)  ;get the 'object variable'...???
    
    $_ref_X = _IEPropertyGet($_reference, "browserx") ;x coordinate of <h2>49 Albert St</h2>...???
    $_ref_Y = _IEPropertyGet($_reference, "browsery") ;y coordinate of <h2>49 Albert St</h2>...???    
    EndIf  
Next

 

thank you again so much for the feedback

Edited by Burgs
Posted
1 hour ago, Burgs said:

so is that used as a 'name' or an 'ID'...so I can get the x/y coordinates of that element?

You don't need to do anything "extra". Just use $oH2Tag, which has the reference to the element, ie --

$_ref_X = _IEPropertyGet($oH2Tag, "browserx") ; x coordinate of <h2>49 Albert St</h2>
$_ref_Y = _IEPropertyGet($oH2Tag, "browsery") ; y coordinate of <h2>49 Albert St</h2>
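To get from the coordinates to the actual scrolling, a minimal sketch, assuming $oH2Tag came from a loop like the one above. _ScrollToElement is a made-up helper name; scrollIntoView, ownerDocument, parentWindow and scrollTo are standard IE DOM members. As far as I know, "browsery" is the element's offset from the top of the page (not the visible window), which is what scrollTo expects:

```autoit
#include <IE.au3>

; Hypothetical helper: scrolls the page so that $oElement (e.g. $oH2Tag) is visible.
; $bUseCoordinates = True scrolls to the "browsery" value instead of using scrollIntoView.
Func _ScrollToElement($oElement, $bUseCoordinates = False)
    If Not IsObj($oElement) Then Return SetError(1, 0, False)
    If $bUseCoordinates Then
        ; Vertical position of the element on the page
        Local $iY = _IEPropertyGet($oElement, "browsery")
        ; Scroll the window that owns the element to that position
        $oElement.ownerDocument.parentWindow.scrollTo(0, $iY)
    Else
        ; Let IE work out the scroll position itself
        $oElement.scrollIntoView()
    EndIf
    Return True
EndFunc

; Usage inside the earlier loop:
; If $oH2Tag.InnerText = "49 Albert St" Then _ScrollToElement($oH2Tag)
```

scrollIntoView is usually the simpler choice; the coordinate route is there if you want to offset the position yourself (e.g. scroll a little above the element).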

 

Posted

Hello,

  Oh OK thanks, I understand.  So the " $oH2Tag " IS the 'object variable' itself. 

  It seems that works fine when the proper tag is called to use in the _IETagNameGetCollection...however what if the tag is unknown?  I can think of a way or two to do it...using 'StringInStr' and possibly 'StringRegExp' to get the tag which can then be plugged into the '_IETagNameGetCollection'...however is there a more elegant or efficient way of doing it...?   Probably not...just asking.
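A rough sketch of the StringRegExp route described here, assuming the text is the entire content of a single element. _FindEnclosingTag is a hypothetical helper. Note that _IEDocReadHTML returns the HTML as IE currently holds it, which can differ from the raw page source (tag case, entities), so the pattern is case-insensitive and matches the search text literally:

```autoit
#include <IE.au3>

; Hypothetical helper: returns the name of the tag that directly wraps $sText in $sHTML
; (e.g. "h2" for "<h2>49 Albert St</h2>"), or "" with @error = 1 if no such tag is found.
; Only handles text that is the whole content of one element.
Func _FindEnclosingTag($sHTML, $sText)
    ; \Q...\E makes PCRE treat the search text literally (assumes $sText contains no "\E")
    Local $sPattern = "(?i)<([a-z][a-z0-9]*)\b[^>]*>\s*\Q" & $sText & "\E\s*</\1\s*>"
    ; Flag 1 = return the capture groups of the first match
    Local $aMatch = StringRegExp($sHTML, $sPattern, 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

; Usage sketch (assumes $oIE is an attached browser object):
; Local $sTag = _FindEnclosingTag(_IEDocReadHTML($oIE), "49 Albert St")
; If $sTag <> "" Then
;     Local $oTags = _IETagNameGetCollection($oIE, $sTag)
;     ; ...then loop over $oTags and compare InnerText as in the earlier example
; EndIf
```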

Thanks again for the responses...!

Posted

Well, there's always _IETagNameAllGetCollection, but that is going to require you to traverse the entire DOM. Other possibilities --

  • Use Regex with _IEDocReadHTML
  • Use NextSibling (as suggested by @Subz) or something similar
  • Use the Xpath UDF
  • etc
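A sketch of the "search everything" option, assuming you don't know the tag at all. The hypothetical _FindElementByText walks every element via the DOM's getElementsByTagName("*"), much the same set that _IETagNameAllGetCollection returns for a whole page, and keeps the last exact InnerText match. A wrapper such as a <div> can have the same InnerText as the <h2> inside it, so the last match in document order is the most deeply nested one. Two caveats: AutoIt's = comparison is case-insensitive, and if the same text appears more than once on the page, the last occurrence wins:

```autoit
#include <IE.au3>

; Hypothetical helper: returns the innermost element whose trimmed InnerText equals $sText.
; $oRoot can be a document (e.g. _IEDocGetObj($oIE)) or any container element.
Func _FindElementByText($oRoot, $sText)
    If Not IsObj($oRoot) Then Return SetError(1, 0, 0)
    ; Every element under $oRoot, in document order
    Local $oAll = $oRoot.getElementsByTagName("*")
    Local $oFound = 0
    For $oEl In $oAll
        ; Ancestors can share the same InnerText, so keep overwriting:
        ; the last hit is the most deeply nested matching element
        If StringStripWS($oEl.innerText, 3) = $sText Then $oFound = $oEl
    Next
    If Not IsObj($oFound) Then Return SetError(2, 0, 0)
    Return $oFound
EndFunc

; Usage sketch (assumes $oIE is an attached browser object):
; Local $oAddr = _FindElementByText(_IEDocGetObj($oIE), "49 Albert St")
; If IsObj($oAddr) Then ConsoleWrite($oAddr.tagName & " at y=" & _IEPropertyGet($oAddr, "browsery") & @CRLF)
```

This is the "traverse the entire DOM" cost mentioned above, so on large pages it is worth narrowing $oRoot to a container first.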

There are lots of ways to skin the cat. Maybe we could help you identify the best option if you posted some HTML that demonstrates the scenario you are trying to solve.

Posted

Hello,

  Thanks for the reply.  Well actually I cannot offer any HTML because I'm attempting to generate something of a more 'universal' solution.  I have a need to scrape data, mainly addresses and other related information...off a great many websites.  Due to the large number (millions) of websites on the Internet...there are countless different ways to display it using so many different HTML tags.

  Schematically what I'm doing is basically:

  •   ...reading in the HTML source code from a page. ( _IEDocReadHTML )
  •   ...using 'StringInStr' to locate certain desired text I'm looking for...since I don't know what tags will be used however I generally do know what address information I am seeking from a website.    
  •   ...using additional 'StringInStr' commands to 'strip out' the tag(s) associated with the 'innertext' I have located.
  •   ...performing any needed additional processing (like discovering attributes, parent nodes, child and sibling nodes, etc).

  That is why I asked about removing the need to perform a loop in order to isolate the tag I want to work with...I was simply looking for ways to simplify the process and reduce iterations of code...I apologize I cannot be more specific with actual HTML however there are endless examples I could come up with if you needed to see...right now I was just looking for ideas on how to streamline my process...because I thought the loop is a bit cumbersome however I normally do not have 'ID' or 'name' attributes available to work with so it seems that is the only way to get an 'object variable'.

  I will look further into those ideas you mentioned as well...thanks so much again for the feedback from each who posted replies.  I appreciate it.

 

Posted

If you only want text, you can use _IEBodyReadText; it drops all of the HTML tags. Scraping a website will always be specific to a particular web page because:

a. hardly anyone keeps to standards, which means the code will generally differ from website to website
b. it depends on how the pages are rendered: unless a page is straight HTML, you're bound to encounter a number of issues when mixing JavaScript or server-side rendered pages.
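A minimal sketch of the _IEBodyReadText route, assuming you only need the line of text that contains the address; _LineContaining is a hypothetical helper name:

```autoit
#include <IE.au3>

; Hypothetical helper: returns the first line of plain text that contains $sNeedle,
; trimmed of surrounding whitespace, or "" with @error = 1 if it isn't found.
Func _LineContaining($sText, $sNeedle)
    ; Normalise CRLF/LF line breaks to LF, then split into lines ($aLines[0] = count)
    Local $aLines = StringSplit(StringStripCR($sText), @LF)
    For $i = 1 To $aLines[0]
        If StringInStr($aLines[$i], $sNeedle) Then Return StringStripWS($aLines[$i], 3)
    Next
    Return SetError(1, 0, "")
EndFunc

; Usage sketch (assumes $oIE is an attached browser object):
; Local $sLine = _LineContaining(_IEBodyReadText($oIE), "Albert St")
; If Not @error Then ConsoleWrite("Found: " & $sLine & @CRLF)
```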

One thing about objects is that you can usually read everything within that object. For example, given the HTML code below, the following code should only loop through the <h2> tags inside Div3, not the entire document.

Anyway, it's Friday night, need to go and have a drink

Ciao

$oDiv3 = _IEGetObjById($oIE, "Div3")
$oH2Tags = _IETagNameGetCollection($oDiv3, "h2")
;~ Loop through each H2 Object Element and check InnerText
For $oH2Tag In $oH2Tags
    If $oH2Tag.InnerText = "49 Albert St" Then MsgBox(4096, "Address Info", "Address : " & $oH2Tag.InnerText)
Next
<body>
<div id="Div1">
<div id="Div2">
<h2>Blah</h2>
<p>BlahBlah</p>
<h2>SomeOtherData</h2>
</div> <!-- End of Div2 -->
<div id="Div3">
<h2>49 Albert St</h2>
</div> <!-- End of Div3 -->
</div> <!-- End of Div1 -->
</body>

 

Posted (edited)

Yes OK I see how you are confining the loop to only cycle through the "h2" elements within the "Div3" tag only...that is useful for sure to reduce processing...thanks for the tip.

You mentioned some other concerns I had as well about server-side and JavaScript pages...I don't suppose there is anything that can be done about them...? These techniques only work on the HTML source...correct?

Thanks again for the information...have a drink for me as well!  haha

Edited by Burgs
