I have the following question: I am trying to extract the content from a specific div. I know that there is a command "_IEGetObjById($oIE, "divID")",

but I end up with too much garbage, because the div that has the ID contains a couple of child divs which only have a class. So I wondered: is it possible to look for a class name instead of an ID?


Provide a snippet of the HTML, and I can show you how to use my signature... or you can use DOM methods to loop through the children of the parent returned by: _IEGetObjById($oIE, "divID")


You can use _IETagNameGetCollection for DIV tags, starting from the DIV you already have, and use the 0-based index to specify the nested DIV you want. Or you can leave out the index, loop through the DIV collection that is returned, and look at attributes like className for a match.
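Both approaches described above can be sketched roughly as follows. This is only an illustration: the URL, the id "divID" and the class name "myClass" are placeholders, and it assumes, as the reply suggests, that your version of IE.au3 accepts an element object (not only the browser object) as the first argument of _IETagNameGetCollection.

```autoit
#include <IE.au3>

; Hypothetical page and id: replace with your own values.
Local $oIE = _IECreate("http://www.example.com/")
Local $oParent = _IEGetObjById($oIE, "divID")
If @error Then Exit ConsoleWrite("Parent div not found" & @CRLF)

; Option 1: pick one nested DIV by its 0-based index.
Local $oFirst = _IETagNameGetCollection($oParent, "div", 0)
If IsObj($oFirst) Then ConsoleWrite($oFirst.innerText & @CRLF)

; Option 2: loop over every nested DIV and match on the class name.
Local $oDivs = _IETagNameGetCollection($oParent, "div")
For $oDiv In $oDivs
    If $oDiv.className = "myClass" Then ; hypothetical class name
        ConsoleWrite($oDiv.innerText & @CRLF)
        ExitLoop
    EndIf
Next
```

Option 2 is usually the more robust of the two, since an index breaks as soon as the page adds or removes a nested DIV.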


Hi, here is the part of the HTML page I am interested in.

<div id="rt-main" class="mb8-sa4">
<div class="rt-mainsection">
<div class="rt-mainrow">
<div class="rt-grid-8 rt-alpha">
<div class="rt-block component-block">
<div class="component-content">
<div class="item-page">
<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">
[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>

<dl class="article-info">
<dt class="article-info-term">Gegevens</dt>
[b]<dd class="category-name">
Categorie: <a href="/nokia">Nokia</a> </dd>
<dd class="published">
Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]

[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]

This site provides the latest news on smartphones. I want to be able to extract the data of each post to provide it as a news feed in a project I'm working on; for the record, the source will be credited.

The information that the program needs to extract is in the bold-marked code. Do you have any suggestions on how to accomplish this?



Kludgy, non-IE solution (FWIW)

#include <array.au3>

local $str = '<div id="rt-main" class="mb8-sa4"> ' & _
'<div class="rt-mainsection">' & _
'<div class="rt-mainrow">' & _
'<div class="rt-grid-8 rt-alpha">' & _
'<div class="rt-block component-block">' & _
'<div class="component-content">' & _
'<div class="item-page">' & _
'<h2>' & _
'<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">' & _
'[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>' & _
'</h2>' & _
'<dl class="article-info">' & _
'<dt class="article-info-term">Gegevens</dt>' & _
'[b]<dd class="category-name">' & _
'Categorie: <a href="/nokia">Nokia</a> </dd>' & _
'<dd class="published">' & _
'Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]' & _
'</dl>' & _
'[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]'

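; Strip the forum's [b] and [/b] markers, then capture the text between each ">" and "<"
$str = StringRegExpReplace($str, '\[/?b\]', '')
Local $ret = StringRegExp($str, '>(.*?)<', 3)

; Walk the array backwards so that deleting an entry does not shift the ones still to be checked
For $i = UBound($ret) - 1 To 0 Step -1
    If StringLen(StringStripWS($ret[$i], 3)) = 0 Then _ArrayDelete($ret, $i)
Next

; Show what is left (the non-empty text fragments)
_ArrayDisplay($ret)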



This might give you some ideas for an IE solution.

#include <array.au3>
#include <ie.au3>

local $str = '<div id="rt-main" class="mb8-sa4"> ' & _
'<div class="rt-mainsection">' & _
'<div class="rt-mainrow">' & _
'<div class="rt-grid-8 rt-alpha">' & _
'<div class="rt-block component-block">' & _
'<div class="component-content">' & _
'<div class="item-page">' & _
'<h2>' & _
'<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">' & _
'[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>' & _
'</h2>' & _
'<dl class="article-info">' & _
'<dt class="article-info-term">Gegevens</dt>' & _
'[b]<dd class="category-name">' & _
'Categorie: <a href="/nokia">Nokia</a> </dd>' & _
'<dd class="published">' & _
'Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]' & _
'</dl>' & _
'[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]'

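; Create an in-memory HTML document (no browser window needed)
Local $ohtml = ObjCreate('HTMLFILE')
If Not IsObj($ohtml) Then Exit 1

; Load the snippet into the document (this step is not in the code as posted; without it the collection is empty)
$ohtml.write($str)

Local $odivs = _IETagNameGetCollection($ohtml, 'div')
If Not IsObj($odivs) Then Exit 2

; Print the identifying attributes and the text of every DIV
For $odiv In $odivs
    ConsoleWrite('!  Classname = ' & $odiv.classname & '   title = ' & $odiv.title & '   id = ' & $odiv.id & @LF)
    ConsoleWrite('>' & @TAB & @TAB & $odiv.innertext & @LF)
Next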


Your answer is in the reply I left for you. If you don't understand it, do some reading and ask questions, but I wouldn't suggest you just ignore it, bump, and hope someone writes it for you.


Hi guys,

Sorry for the late response, but I did indeed find my answer with _IETagNameGetCollection.

I bumped the topic because I couldn't get it to work the first time with _IETagNameGetCollection and other commands, but now I have it working. Thank you for the tip.

Here is what I have so far:

#include <IE.au3>
#include <String.au3> ; needed for _StringBetween

; $oIE is the browser object created earlier in the script
$divs = _IETagNameGetCollection($oIE, "div")

For $div In $divs
    Local $Title, $ImgSource, $Publised
    If $div.className == "item-page" Then
        $htmlcontent = String($div.innerHTML)

        ; _StringBetween returns an array of every match
        $Title = _StringBetween($htmlcontent, '">', '')

        $ImgSource = _StringBetween($htmlcontent, 'src="', '"')

        $rawPublised = _StringBetween($htmlcontent, '<dd class="published">', '</dd>')
    EndIf
Next

; A Func cannot be declared inside a loop, so it lives at the top level
Func _formatdate($sString) ; " Gepubliceerd op woensdag, 02 januari 2013 "
    $day = StringRegExp($sString, '([0-9]{2})', 1)
    $year = StringRegExp($sString, '([0-9]{4})', 1)
    $month = StringRegExp($sString, '([a-z]{3-9})(:? [0-9]{4})', 1) ; month pattern, not working yet
    ConsoleWrite($day[0] & ' ' & $month[0] & ' ' & $year[0])
EndFunc

I now have a question about StringRegExp; I hope you can help me with it.

As you can see, $sString contains the value "Gepubliceerd op woensdag, 02 januari 2013", and I was able to extract the 02 and 2013 values.

But now I am trying to extract the month, and my pattern isn't working. My idea was to treat the string as "02 ", then the part to extract, then " 2013". Any ideas?


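Never mind, I found what I was looking for: StringRegExp($sString, '([a-z]{3,9})(?: ([0-9]{4}))', 1). The quantifier needs a comma ({3,9}, not {3-9}), and a non-capturing group is written (?: and not (:?.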
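For anyone who finds this later, the three separate patterns could also be folded into a single one. The following is only a sketch of that idea, not the code from my project: the function name _FormatDateSketch is made up for this example, and it assumes the date always appears as two digits, a lowercase month name, and four digits, separated by single spaces.

```autoit
; Hypothetical helper: pulls day, month and year out of the raw "published" text in one pass.
Func _FormatDateSketch($sString)
    ; Flag 1 returns an array holding the captured groups of the first match.
    Local $aParts = StringRegExp($sString, '(\d{2}) ([a-z]{3,9}) (\d{4})', 1)
    If @error Then Return SetError(1, 0, "") ; no date found in the string
    Return $aParts[0] & ' ' & $aParts[1] & ' ' & $aParts[2]
EndFunc

ConsoleWrite(_FormatDateSketch(" Gepubliceerd op woensdag, 02 januari 2013 ") & @CRLF)
```

Checking @error before indexing the array matters here, because StringRegExp does not return an array when nothing matches.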

I think I can finish it from here.

Thank you, guys, for the tips.

Topic can be closed.

