Hi Chimp!
Thanks a lot for your example - it saved me a lot of work!
I had to parse a table with almost 1400 rows (and lots of rowspans) in a 1.5 MB HTML file and ran into some performance issues. Here is how I solved them:
First, I changed the HTML tag position search in _ParseTags so that each search starts from the position of the last tag found. That way StringInStr doesn't need to count through thousands of "<tr" tags on every iteration. After that, _ArraySort failed (too many rows...). So, to get the tag list pre-sorted, I look up the first opening and the first closing tag. If the opening tag comes before the closing tag, I write it to $aThisTagsPositions and find the next opening tag; if the closing tag comes before the next opening tag, I write the closing tag to $aThisTagsPositions and find the next closing tag.
This made it possible to read that huge HTML file in less than 90 seconds.
Just replace the code on lines 208-216 with this:
Local $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1)
Local $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1)
Local $iOpenCount = 1
; 2) find in the HTML the positions of the $sOpening <tag and $sClosing </tag> tags
For $i = 1 To $iNrOfThisTag * 2 ; search all the opening and closing tags
    If ($iNextOpenPosition < $iNextClosePosition) And $iNextOpenPosition <> 0 Then
        ; the next tag in document order is an opening tag
        $aThisTagsPositions[$i][0] = $iNextOpenPosition
        $aThisTagsPositions[$i][1] = $sOpening ; marks which kind of tag this is
        $aThisTagsPositions[$i][2] = $iOpenCount ; number of this tag
        $iOpenCount += 1
        ; continue searching right after the tag just recorded
        $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1, $aThisTagsPositions[$i][0] + 1)
    Else
        ; the next tag in document order is a closing tag; store the position of its last character
        $aThisTagsPositions[$i][0] = $iNextClosePosition + StringLen($sClosing) - 1
        $aThisTagsPositions[$i][1] = $sClosing ; marks which kind of tag this is
        $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1, $aThisTagsPositions[$i][0] + 1)
    EndIf
Next
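
If it helps to see the first optimisation on its own, here is a small stand-alone sketch (not part of the original script; the sample string and the loop variable names are made up for the demo). It compares asking StringInStr for the Nth occurrence, which has to rescan from the start of the string each time, with always asking for the first occurrence but starting just after the previous hit:

```autoit
; Hypothetical demo data, only for illustration
Local $sHtml = "<tr><td>a</td></tr><tr><td>b</td></tr><tr><td>c</td></tr>"
Local $sOpening = "<tr"

; Slow variant: the 4th parameter (occurrence) grows,
; so each call has to walk past all earlier matches again
For $n = 1 To 3
    ConsoleWrite("occurrence " & $n & " at: " & StringInStr($sHtml, $sOpening, 0, $n) & @CRLF)
Next

; Faster variant: always take the 1st occurrence,
; but use the 5th parameter (start) to resume after the previous match
Local $iPos = StringInStr($sHtml, $sOpening, 0, 1)
While $iPos <> 0
    ConsoleWrite("found at: " & $iPos & @CRLF)
    $iPos = StringInStr($sHtml, $sOpening, 0, 1, $iPos + 1)
WEnd
```

Both loops should report the same positions; the second one just does much less work per call on a big file, which is what made the difference for me.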