Gianni
Posted January 11, 2015

Referring to this post from this topic, I ran that listing against a raw HTML file in order to extract the content of a table. It works nearly well for my purpose, but it extracts the content of all the tables of the HTML page into just one array.

    #include <Array.au3>
    #include <IE.au3> ; just for HTML extraction from the page of the table example
    ;
    Local $oie = _IE_Example("table")
    Local $sHtml = _IEBodyReadHTML($oie) ; extract whole HTML
    ;
    Local $aResult = _TableWriteToArrayFromHTML($sHtml, 0) ; extracts table contents
    ;
    _ArrayDisplay($aResult)

    Func _TableWriteToArrayFromHTML($sHtml, $iTableNr = 0) ; second parameter should indicate which table
        Local $aRes = StringRegExp($sHtml, "(?isU)(?|<(/)tr>\s*|<t[dh].*>(.*)</t[dh]>)", 3)
        Local $aTempResult[UBound($aRes)][UBound($aRes)]
        Local $iRow = 0, $iCol = 0, $iMaxRow = 0
        For $i = 0 To UBound($aRes) - 1
            If $aRes[$i] = "/" Then
                $iRow += 1
                $iCol = 0
            Else
                $aTempResult[$iRow][$iCol] = $aRes[$i]
                $iCol += 1
                If $iCol > $iMaxRow Then $iMaxRow = $iCol
            EndIf
        Next
        ReDim $aTempResult[$iRow][$iMaxRow]
        Return $aTempResult
    EndFunc   ;==>_TableWriteToArrayFromHTML

I would like to use that script (or some other approach) in a way similar to the _IETableWriteToArray() function, but on raw HTML instead of on an IE object instance. How could it be modified to extract only one of the tables on the page, with the possibility of choosing which one? Maybe, for example, by starting the extraction at the n-th occurrence of the <table> tag and ending at </table>.

Any help will be appreciated. Thanks.

Chimp
small minds discuss people, average minds discuss events, great minds discuss ideas.... and use AutoIt....
mikell
Posted January 11, 2015

I would first extract the "<table> till </table>" contents into an array; the n-th occurrence of any table will then be its index in this array.
Gianni
Posted January 11, 2015

Thanks mikell, I was thinking about something like that too. Here is a possible solution using _StringBetween():

    #include <Array.au3>
    #include <IE.au3> ; just for HTML extraction from the page of the table example
    #include <String.au3>
    ;
    ; Local $oie = _IECreate("http://www.w3schools.com/html/html_tables.asp")
    Local $oie = _IE_Example("table")
    Local $sHtml = _IEBodyReadHTML($oie) ; extract whole HTML
    $aTables = _StringBetween($sHtml, "<table", "</table>") ; each table goes into the array elements
    $iWantedTable = 1 ; second table (zero based)
    Local $aResult = _TableWriteToArrayFromHTML($aTables[$iWantedTable]) ; extracts table contents
    ;
    _ArrayDisplay($aResult, "Table nr." & $iWantedTable)
    _IEQuit($oie)

    Func _TableWriteToArrayFromHTML($sHtml)
        Local $aRes = StringRegExp($sHtml, "(?isU)(?|<(/)tr>\s*|<t[dh].*>(.*)</t[dh]>)", 3)
        ; _ArrayDisplay($aRes)
        Local $aTempResult[UBound($aRes)][UBound($aRes)]
        Local $iRow = 0, $iCol = 0, $iMaxRow = 0
        For $i = 0 To UBound($aRes) - 1
            If $aRes[$i] = "/" Then
                $iRow += 1
                $iCol = 0
            Else
                $aTempResult[$iRow][$iCol] = $aRes[$i]
                $iCol += 1
                If $iCol > $iMaxRow Then $iMaxRow = $iCol
            EndIf
        Next
        ReDim $aTempResult[$iRow][$iMaxRow]
        Return $aTempResult
    EndFunc   ;==>_TableWriteToArrayFromHTML

Does someone have an alternative solution that uses a regular expression instead of _StringBetween()?

Thanks everybody.
mikell
Posted January 11, 2015 (edited)

This should be exactly the same:

    $aTables = StringRegExp($sHtml, '(?s)<table.*?</table>', 3)

Edit: Fixed... sorry

Edited January 11, 2015 by mikell
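A small note on "exactly the same": _StringBetween() returns only the text between the two delimiters, while a pattern without a capturing group returns each whole match, tags included. The sketch below uses a made-up one-table string (an assumption, not taken from the page in the thread) to show both variants with StringRegExp alone.

```autoit
; Hypothetical sample input, used only for illustration.
Local $sHtml = '<table id="t1"><tr><td>a</td></tr></table>'

; No capturing group: flag 3 returns the full text of every match.
Local $aWhole = StringRegExp($sHtml, '(?s)<table.*?</table>', 3)

; One capturing group: flag 3 returns only what the group captured,
; i.e. the text between "<table" and "</table>".
Local $aInner = StringRegExp($sHtml, '(?s)<table(.*?)</table>', 3)

ConsoleWrite($aWhole[0] & @CRLF) ; <table id="t1"><tr><td>a</td></tr></table>
ConsoleWrite($aInner[0] & @CRLF) ;  id="t1"><tr><td>a</td></tr>
```

Either form can be passed on to a cell-extraction function, since that function only looks at the <tr>, <td> and <th> tags.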
Gianni
Posted January 11, 2015

mikell wrote:

    This should be exactly the same
    $aTables = StringRegExp($sHtml, '(?s)<table>.*?</table>', 3)

... nope, I don't get an array...

P.S. The first string is <table (without the closing >).
mikell
Posted January 11, 2015

Edited
Gianni
Posted January 11, 2015 (edited)

mikell wrote:

    Edited

...it is still not an array. The code below reports Int32 as the variable type:

    $aTables = StringRegExp($sHtml, '(?s)<table.*?</table>', 3)
    MsgBox(0, 0, VarGetType($aTables))

Edited January 11, 2015 by Chimp
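One possible cause, offered as an assumption since the thread does not confirm it: the pattern is case-sensitive, so it finds nothing if the HTML read from the browser contains uppercase tags such as <TABLE>. With flag 3, StringRegExp does not return an array when there is no match and sets @error instead, which would explain the Int32. The sketch below uses a made-up sample string, adds (?i), and checks @error before touching the result.

```autoit
; Hypothetical sample input; the first table uses uppercase tags on purpose.
Local $sHtml = '<TABLE id="a"><TR><TD>1</TD></TR></TABLE>' & @CRLF & _
        '<table id="b"><tr><td>2</td></tr></table>'

; (?i) makes the match case-insensitive, (?s) lets "." span line breaks.
Local $aTables = StringRegExp($sHtml, '(?is)<table.*?</table>', 3)
If @error Then
    ; No match (or a bad pattern): $aTables is not an array here.
    ConsoleWrite("No table found, @error = " & @error & @CRLF)
Else
    ConsoleWrite("Tables found: " & UBound($aTables) & @CRLF)
EndIf
```

Testing @error (or IsArray()) right after the call avoids indexing a non-array later in the script.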
SmOke_N (Moderators)
Posted January 12, 2015

I went from trying a regex or two to this... but keep in mind, this is not a great solution. Nested tables, frames/iframes, and HTML strings with regex data in them could all break these methods.

    #include <Array.au3>

    Global $gsStr = _myHTML()
    Global $gaTables = _htmlraw_GetTables($gsStr)
    _ArrayDisplay($gaTables)
    Global $gaTable1Rows = _htmlraw_GetTableRows($gaTables[0])
    _ArrayDisplay($gaTable1Rows)
    Global $gaTable1Row1Cols = _htmlraw_GetTableCols($gaTable1Rows[0])
    _ArrayDisplay($gaTable1Row1Cols)

    Func _htmlraw_GetTables($sHTML)
        ; return an array of tables
        If Not StringLen($sHTML) Then
            Return SetError(1, 0, 0)
        EndIf
        ; some of the below pattern isn't necessary, but I code it as I think about conditions
        ; problem is with nested tables, this is not a good solution
        Local $sPatt = "(?si)<\s*table(?:\s*|\s.+?)>.*?<\s*/\s*table\s*>"
        Local $aReg = StringRegExp($sHTML, $sPatt, 3)
        If @error Then
            Return SetError(2, @error, 0)
        EndIf
        Return $aReg
    EndFunc

    Func _htmlraw_GetTableRows($sTable)
        ; believe it or not </tr> is not necessary
        ; though most use it, so better look for </table> too
        ; then there's the fun of not having nested tables
        ; but I don't have the brain power to think through all that today, so simple it is
        Local $sPatt = "(?si)<\s*tr(?:\s*|\s.+?)>.*?<\s*/\s*tr\s*>"
        Local $aReg = StringRegExp($sTable, $sPatt, 3)
        If @error Then
            Return SetError(1, 0, 0)
        EndIf
        Return $aReg
    EndFunc

    Func _htmlraw_GetTableCols($sData)
        ; I've talked about nesting issues, just going to do it simple
        ; th/td
        Local $sPatt = "(?si)(?:<\s*th(?:\s*|\s.+?)>.*?<\s*/\s*th\s*>|" & _
                "<\s*td(?:\s*|\s.+?)>.*?<\s*/\s*td\s*>)+"
        Local $aReg = StringRegExp($sData, $sPatt, 3)
        If @error Then
            Return SetError(1, 0, 0)
        EndIf
        Return $aReg
    EndFunc

    Func _myHTML()
        Local $sHTML
        $sHTML &= "0x3C68746D6C3E0D0A3C626F64793E0D0A3C7461626C65207374796C653D"
        $sHTML &=
"2277696474683A31303025223E0D0A20203C74723E0D0A202020203C7468" $sHTML &= "3E4E616D653A3C2F74683E0D0A202020203C74643E42696C6C2047617465" $sHTML &= "733C2F74643E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C" $sHTML &= "746820726F777370616E3D2232223E54656C6570686F6E653A3C2F74683E" $sHTML &= "0D0A202020203C74643E353535203737203835343C2F74643E0D0A20203C" $sHTML &= "2F74723E0D0A20203C74723E0D0A202020203C74643E3535352037372038" $sHTML &= "35353C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D0A3C74" $sHTML &= "61626C65207374796C653D2277696474683A31303025223E0D0A20203C63" $sHTML &= "617074696F6E3E4D6F6E74686C7920736176696E67733C2F63617074696F" $sHTML &= "6E3E0D0A20203C74723E0D0A202020203C74683E4D6F6E74683C2F74683E" $sHTML &= "0D0A202020203C74683E536176696E67733C2F74683E0D0A20203C2F7472" $sHTML &= "3E0D0A20203C74723E0D0A202020203C74643E4A616E756172793C2F7464" $sHTML &= "3E0D0A202020203C74643E243130303C2F74643E0D0A20203C2F74723E0D" $sHTML &= "0A20203C74723E0D0A202020203C74643E46656272756172793C2F74643E" $sHTML &= "0D0A202020203C74643E2435303C2F74643E0D0A20203C2F74723E0D0A3C" $sHTML &= "2F7461626C653E0D0A3C7461626C65207374796C653D2277696474683A31" $sHTML &= "303025223E0D0A20203C74723E0D0A202020203C74683E4E616D653C2F74" $sHTML &= "683E0D0A202020203C746820636F6C7370616E3D2232223E54656C657068" $sHTML &= "6F6E653C2F74683E0D0A20203C2F74723E0D0A20203C74723E0D0A202020" $sHTML &= "203C74643E42696C6C2047617465733C2F74643E0D0A202020203C74643E" $sHTML &= "353535203737203835343C2F74643E0D0A202020203C74643E3535352037" $sHTML &= "37203835353C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D" $sHTML &= "0A3C7461626C652069643D22743031223E0D0A20203C74723E0D0A202020" $sHTML &= "203C74683E46697273746E616D653C2F74683E0D0A202020203C74683E4C" $sHTML &= "6173746E616D653C2F74683E200D0A202020203C74683E506F696E74733C" $sHTML &= "2F74683E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C7464" $sHTML &= "3E4576653C2F74643E0D0A202020203C74643E4A61636B736F6E3C2F7464" $sHTML &= 
"3E200D0A202020203C74643E39343C2F74643E0D0A20203C2F74723E0D0A" $sHTML &= "3C2F7461626C653E0D0A3C2F626F64793E0D0A3C2F68746D6C3E" Return BinaryToString($sHTML) EndFunc Anyway, have fun. Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer. Link to comment Share on other sites More sharing options...
Gianni
Posted January 24, 2015

Hi SmOke_N, I'm back on this and trying to get some results. I came up with a way to extract all the tables from a web page, even when they are nested. I've also seen that my function to extract data from a given table works quite well: it returns a 2D array containing the table's data (although it does not cope well with tables that are not rectangular).

Trying your two functions, I see that _htmlraw_GetTableCols returns all the cells in a 1D array, while _htmlraw_GetTableRows returns each row in a single element of a 1D array. I would like to merge the two functions so that they return a 2D array. Suggestions on how to achieve this are welcome.

Here is my base code for experimenting with tables:

    #include <IE.au3>
    #include <String.au3>
    #include <Array.au3>
    ;
    ; 1) open an html page containing tables (also nested)
    ; it's an hodgepodge of tables just to make tests
    Local $oie = _IECreate()
    _IEDocWriteHTML($oie, MyHTML()) ; just to show the tables on the browser
    Do
        Sleep(250)
    Until IsObj($oie)
    Local $sHtml = _IEBodyReadHTML($oie) ; extract whole raw HTML of the page
    ;
    Local $aTables = ParseTables($sHtml) ; each table in each element of the array
    ;
    Local $iWantedTable, $sError, $aResult
    Do
        $iWantedTable = InputBox("select a table", "Please enter the nr. of the table to get data from (1 based)")
        $sError = @error
        If Not $sError Then
            $aResult = _TableWriteToArrayFromHTML($aTables[$iWantedTable]) ; extracts table contents in a 2D array
            ; $aResult = _htmlraw_GetTableRows($aTables[$iWantedTable]) ; by SmOke_N
            ; $aResult = _htmlraw_GetTableCols($aTables[$iWantedTable]) ; by SmOke_N
            $sError = @error
            _ArrayDisplay($aResult, "Content of table nr."
& $iWantedTable) EndIf Until $sError ; ; ----------------------------------------------------------------- ; returns an array containing positions of <table and </table> tags ; ----------------------------------------------------------------- Func ParseTables($sHtml) ; finds how many tables are on the HTML page (tables collection) StringReplace($sHtml, "<table", "<table") ; in @xtended nr. of occurences Local $iNrOfTableTags = @extended ; ConsoleWrite(@CRLF & "Debug: This page contains " & $iNrOfTableTags & " tables." & @CRLF) ; I assume that <table and </table> tags are balanced (as should be) ; (so NO check is made to see if they are actually balanced) If $iNrOfTableTags Then ; if at least one table exists ; $aTableTagsPositions array will contain the positions of the ; starting <table and ending </table> tags within the HTML Local $aTableTagsPositions[$iNrOfTableTags * 2 + 1][3] ; 1 based (make room for all open and close tags) ; 2) find in the HTML the positions of the <table and </table> tags For $i = 1 To $iNrOfTableTags $aTableTagsPositions[$i][0] = StringInStr($sHtml, "<table", 0, $i) ; start position of $i occurrence of <table opening tag $aTableTagsPositions[$i][1] = "<table" ; mark tag of this location $aTableTagsPositions[$i][2] = $i ; nr of table $aTableTagsPositions[$iNrOfTableTags + $i][0] = StringInStr($sHtml, "</table>", 0, $i) + 7 ; end position of $i occurrence of </table> closing tag $aTableTagsPositions[$iNrOfTableTags + $i][1] = "</table>" ; mark tag of this location Next _ArraySort($aTableTagsPositions, 0, 1) ; now all opening and closing tags are in the same sequence as them appears in the HTML Local $aTables = ExtractTables($aTableTagsPositions, $sHtml) ; $aTables array will contains a table in each element If Not @error Then Return $aTables Return SetError(2, 0, 0) Else Return SetError(1, 0, 0) ; No tables in HTML EndIf EndFunc ;==>ParseTables ; --------------------------------------------------- ; returns an array containing a table in 
each element ; --------------------------------------------------- Func ExtractTables(ByRef $aTableTagsPositions, $html) Local $aStack[UBound($aTableTagsPositions)][2] Local $aTables[Ceiling(UBound($aTableTagsPositions) / 2)] ; will contains the collection of tables For $i = 1 To UBound($aTableTagsPositions) - 1 If $aTableTagsPositions[$i][1] = "<table" Then ; opening tag $aStack[0][0] += 1 $aStack[$aStack[0][0]][0] = "<table" $aStack[$aStack[0][0]][1] = $i ElseIf $aTableTagsPositions[$i][1] = "</table>" Then ; a closing tag was found If Not $aStack[0][0] Or Not ArePair($aStack[$aStack[0][0]][0], $aTableTagsPositions[$i][1]) Then Return SetError(1, 0, 0) ; False ; something is not ok Else ; pair detected (the reciprocal tag) ; now get coordinates of the 2 tags ; 1) extract this table from the html to the array $aTables[$aTableTagsPositions[$aStack[$aStack[0][0]][1]][2]] = StringMid($html, $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0], 1 + $aTableTagsPositions[$i][0] - $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0]) ; 2) remove that table from the html $html = StringLeft($html, $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0] - 1) & StringMid($html, $aTableTagsPositions[$i][0] + 1) ; 3) adjust the references to the new positions of remaining tags For $ii = $i To UBound($aTableTagsPositions) - 1 $aTableTagsPositions[$ii][0] -= StringLen($aTables[$aTableTagsPositions[$aStack[$aStack[0][0]][1]][2]]) Next $aStack[0][0] -= 1 EndIf EndIf Next If Not $aStack[0][0] Then Return $aTables Else Return SetError(1, 0, 0) EndIf EndFunc ;==>ExtractTables Func ArePair($sOpening, $sClosing) If ($sOpening = '<table' And $sClosing = '</table>') Then Return True Return False EndFunc ;==>ArePair ; ------------------------------------ ; copy content of cells into the array ; ------------------------------------ Func _TableWriteToArrayFromHTML($sHtml) Local $aRes = StringRegExp($sHtml, "(?isU)(?|<(/)tr>\s*|<t[dh].*>(.*)</t[dh]>)", 3) ; _ArrayDisplay($aRes) Local 
$aTempResult[UBound($aRes)][UBound($aRes)] Local $iRow = 0, $iCol = 0, $iMaxRow = 0 For $i = 0 To UBound($aRes) - 1 If $aRes[$i] = "/" Then $iRow += 1 $iCol = 0 Else $aTempResult[$iRow][$iCol] = $aRes[$i] $iCol += 1 If $iCol > $iMaxRow Then $iMaxRow = $iCol EndIf Next ReDim $aTempResult[$iRow][$iMaxRow] Return $aTempResult EndFunc ;==>_TableWriteToArrayFromHTML Func MyHTML() Local $sData = '0x' & _ '3C5441424C4520626F726465723D223122206267436F6C6F723D233030666630303E0D0A202020203C54523E0D0A20202020202020203C54443E5461626C6520' & _ '31202872316331293C7461626C6520626F726465723D223122206267436F6C6F723D236666303030303E0D0A20203C74723E0D0A202020203C74683E5461626C' & _ '6520322028743272316331293C2F74683E0D0A202020203C74683E5461626C65203220726F77203120436F6C756D6E20323C2F74683E0D0A202020203C74683E' & _ '5432523143323C2F74683E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C74643E5432523243313C2F74643E0D0A202020203C74643E0D0A202020' & _ '2020203C7461626C6520626F726465723D223122206267436F6C6F723D236666666630303E0D0A20202020202020203C74723E0D0A202020202020202020203C' & _ '74643E5461626C652033206E6573746564207461626C6520636F6C756D6E20313C2F74643E0D0A202020202020202020203C74643E6E6573746564207461626C' & _ '6520636F6C756D6E20323C2F74643E0D0A20202020202020203C2F74723E0D0A2020202020203C2F7461626C653E0D0A202020203C2F74643E0D0A202020203C' & _ '74643E5432523243333C2F74643E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C74643E5432523343313C2F74643E0D0A202020203C74643E5432' & _ '523343323C2F74643E0D0A202020203C74643E5432523343333C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E203C2F54443E0D0A20202020202020' & _ '203C54443E5431523143323C2F54443E0D0A20202020202020203C2F54523E0D0A202020203C54523E0D0A20202020202020203C54443E5431523243310D0A20' & _ '202020202020202020203C7461626C6520626F726465723D31206267436F6C6F723D233939303030302020414C49474E3D43454E5445523E200D0A2020202020' & _ 
'2020202020203C74723E3C74643E205461626C652034204162636465663C2F74643E3C74643E7434723163323C2F74643E3C74643E7434723163333C2F74643E' & _ '3C74643E7434723163343C2F74643E3C74643E7434723163350D0A2020202020202020202020202020202020203C7461626C652020626F726465723D31206267' & _ '436F6C6F723D233939393930303E0D0A2020202020202020202020202020202020203C74723E3C74643E205461626C652035204768696A6B3C2F74643E3C7464' & _ '3E7435723163323C2F74643E3C74643E7435723163333C2F74643E3C74643E7435723163340D0A20202020202020202020202020202020202020202020202020' & _ '3C7461626C6520626F726465723D31206267436F6C6F723D233939393939393E0D0A202020202020202020202020202020202020202020202020203C74723E3C' & _ '74643E205461626C652036204C6D6E6F70713C2F74643E3C74643E7435723163323C2F74643E3C74643E7435723163330D0A2020202020202020202020202020' & _ '20202020202020202020202020202020202020202020203C7461626C652020626F726465723D31206267436F6C6F723D234545303045453E203C74723E3C7464' & _ '3E205461626C6520372052737475767778797A3C2F74643E3C74643E7437723163323C2F74643E3C74643E7437723163333C2F74643E3C2F74723E0D0A202020' & _ '202020202020202020202020202020202020202020202020202020202020202020203C74723E3C74643E7437723263313C2F74643E3C74643E7437723263323C' & _ '2F74643E3C74643E7437723263330D0A202020202020202020202020202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F74' & _ '61626C653E0D0A202020202020202020202020202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E0D0A2020' & _ '20202020202020202020202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E0D0A2020202020202020202020' & _ '202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E3C2F54443E0D0A20202020202020203C54443E54315232' & _ '43323C5441424C4520626F726465723D223122206267436F6C6F723D233030666666663E0D0A202020202020202020202020202020203C54523E0D0A20202020' & _ 
'202020202020202020202020202020203C54443E5461626C6520380D0A2020202020202020202020202020202020202020202020203C5441424C4520626F7264' & _ '65723D223122206267436F6C6F723D233030303066663E0D0A202020202020202020202020202020202020202020202020202020203C54523E0D0A2020202020' & _ '2020202020202020202020202020202020202020202020202020203C54443E5461626C6520393C2F54443E0D0A20202020202020202020202020202020202020' & _ '202020202020202020202020203C54443E54392052314332203C2F54443E0D0A2020202020202020202020202020202020202020202020202020202020202020' & _ '3C2F54523E0D0A202020202020202020202020202020202020202020202020202020203C54523E0D0A2020202020202020202020202020202020202020202020' & _ '2020202020202020203C54443E543920523243313C2F54443E0D0A20202020202020202020202020202020202020202020202020202020202020203C54443E54' & _ '3920523243323C2F54443E0D0A20202020202020202020202020202020202020202020202020202020202020203C2F54523E0D0A202020202020202020202020' & _ '202020202020202020202020202020203C2F5441424C453E0D0A2020202020202020202020202020202020202020202020203C2F54443E0D0A20202020202020' & _ '202020202020202020202020203C54443E543820523143323C2F54443E0D0A20202020202020202020202020202020202020203C2F54523E0D0A202020202020' & _ '202020202020202020203C54523E0D0A20202020202020202020202020202020202020203C54443E543820523243313C2F54443E0D0A20202020202020202020' & _ '202020202020202020203C54443E543820523243323C2F54443E0D0A20202020202020202020202020202020202020203C2F54523E0D0A202020202020202020' & _ '202020202020203C2F5441424C453E0D0A2020202020202020202020203C2F54443E0D0A20202020202020203C2F54523E0D0A3C54523E3C54443E5431205233' & _ '204331202D20412073696E676C652063656C6C20726F772028576974686F75742063656C6C70616464696E67293C2F54443E3C2F54523E0D0A20203C74723E0D' & _ '0A202020203C746420636F6C7370616E3D323E0D0A20202020202068656C6C6F2C2049276D20543152344331202873696E676C652063656C6C20576974682063' & _ 
'656C6C70616464696E673D32290D0A3C7461626C6520626F726465723D332063656C6C70616464696E673D3520414C49474E3D4C454654206267436F6C6F723D' & _ '233636363630303E0D0A20203C74723E0D0A202020203C746420636F6C7370616E3D323E0D0A2020202020205461626C6520313020524F573120434F4C554D4E' & _ '310D0A202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E0D0A202020202020436F6E74656E742066726F6D20543130523243310D' & _ '0A202020203C2F74643E3C74643E0D0A202020202020436F6E74656E742066726F6D20543130523243320D0A202020203C2F74643E0D0A20203C2F74723E3C74' & _ '723E0D0A202020203C74643E0D0A202020202020436F6E74656E742066726F6D20543130523343310D0A202020203C2F74643E3C74643E0D0A20202020202043' & _ '6F6E74656E742066726F6D20543130523343320D0A202020203C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D0A3C7461626C6520626F72646572' & _ '3D332063656C6C70616464696E673D313020414C49474E3D43454E544552206267436F6C6F723D233939393939393E0D0A3C74723E0D0A20203C74642076616C' & _ '69676E3D746F703E0D0A2020202054616220313120723163310D0A20203C2F74643E3C74643E0D0A2020202054616220313120723163323C703E0D0A0909090D' & _ '0A202020203C7461626C6520626F726465723D31206267436F6C6F723D233030393939393E0D0A202020203C74723E0D0A2020202020203C74643E5431325231' & _ '43313C2F74643E0D0A2020202020203C74643E543132523143323C2F74643E0D0A202020203C2F74723E3C74723E0D0A2020202020203C74643E543132523243' & _ '313C2F74643E0D0A2020202020203C74643E543132523243323C2F74643E0D0A202020203C2F74723E0D0A202020203C2F7461626C653E3C703E0D0A0909090D' & _ '0A2020202054616220313120723163320D0A20203C2F74643E0D0A3C2F74723E0D0A3C2F7461626C653E0D0A3C7461626C6520626F726465723D3320414C4947' & _ '4E3D5249474854206267436F6C6F723D233939303039393E0D0A20203C74723E0D0A202020203C746420726F777370616E3D333E0D0A20202020202054414231' & _ '3320433120726F777370616E3D330D0A202020203C2F74643E3C74643E0D0A202020202020543133523143320D0A202020203C2F74643E0D0A20203C2F74723E' & _ 
'3C74723E0D0A202020203C74643E0D0A202020202020543133523243320D0A202020203C2F74643E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C' & _ '74643E0D0A202020202020543133523343320D0A202020203C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D0A202020203C2F74643E0D0A20203C' & _ '2F74723E3C74723E0D0A202020203C74643E0D0A2020202020205461626C653120726F773520636F6C756D6E310D0A202020203C2F74643E3C74643E0D0A2020' & _ '202020205461626C653120726F773520636F6C756D6E320D0A202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E0D0A2020202020' & _ '205461626C653120726F773620636F6C756D6E310D0A202020203C2F74643E3C74643E0D0A2020202020205461626C653120726F773620636F6C756D6E320D0A' & _ '202020203C2F74643E0D0A20203C2F74723E0D0A202020203C2F5441424C453E' Return BinaryToString($sData) EndFunc ;==>MyHTML ; ------------------------------------ ; following functions are from SmOke_N ; ------------------------------------ Func _htmlraw_GetTables($sHtml) ; return an array of tables If Not StringLen($sHtml) Then Return SetError(1, 0, 0) EndIf ; some of the below pattern isn't necessary, but I code it as I think about conditions ; problem is with nested tables, this is not a good solution Local $sPatt = "(?si)<\s*table(?:\s*|\s.+?)>.*?<\s*/\s*table\s*>" Local $aReg = StringRegExp($sHtml, $sPatt, 3) If @error Then Return SetError(2, @error, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTables Func _htmlraw_GetTableRows($sTable) ; believe it or not </tr> is not necessary ; though most use it, so better look for </table too> ; then there's the fun of not having nested tables ; but I don't have the brain power to think through all that today, so simple it is Local $sPatt = "(?si)<\s*tr(?:\s*|\s.+?)>.*?<\s*/\s*tr\s*>" Local $aReg = StringRegExp($sTable, $sPatt, 3) If @error Then Return SetError(1, 0, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTableRows Func _htmlraw_GetTableCols($sData) ; I've talked about nesting issues, just going to do it simple ; th/td Local $sPatt = 
"(?si)(?:<\s*th(?:\s*|\s.+?)>.*?<\s*/\s*th\s*>|" & _ "<\s*td(?:\s*|\s.+?)>.*?<\s*/\s*td\s*>)+" Local $aReg = StringRegExp($sData, $sPatt, 3) If @error Then Return SetError(1, 0, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTableCols Chimp small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt.... Link to comment Share on other sites More sharing options...
SmOke_N (Moderators)
Posted January 24, 2015

Do you mean like this:

    #include <Array.au3>

    Global $gsStr = _myHTML()
    Global $gaTables = _htmlraw_GetTables($gsStr)
    ;~ _ArrayDisplay($gaTables)
    Global $gaTableData = _htmlraw_TableToArray($gaTables[0])
    _ArrayDisplay($gaTableData)

    Func _htmlraw_TableToArray($sTable)
        If Not StringLen($sTable) Then
            Return SetError(1, 0, 0)
        EndIf
        Local $aRows = _htmlraw_GetTableRows($sTable)
        If Not IsArray($aRows) Then
            Return SetError(2, 0, 0)
        EndIf
        Local $iUBRow = UBound($aRows)
        Local $aRet[$iUBRow][1]
        Local $aCols, $iEnum = 0, $iUBCol
        For $i = 0 To $iUBRow - 1
            $aCols = _htmlraw_GetTableCols($aRows[$i])
            $iUBCol = UBound($aCols)
            If Not $iUBCol Then ContinueLoop
            If $iUBCol > UBound($aRet, 2) Then
                ReDim $aRet[$iUBRow][$iUBCol]
            EndIf
            For $j = 0 To $iUBCol - 1
                $aRet[$iEnum][$j] = $aCols[$j]
            Next
            $iEnum += 1
        Next
        Return $aRet
    EndFunc

    Func _htmlraw_GetTables($sHTML)
        ; return an array of tables
        If Not StringLen($sHTML) Then
            Return SetError(1, 0, 0)
        EndIf
        ; some of the below pattern isn't necessary, but I code it as I think about conditions
        ; problem is with nested tables, this is not a good solution
        Local $sPatt = "(?si)<\s*table(?:\s*|\s.+?)>.*?<\s*/\s*table\s*>"
        Local $aReg = StringRegExp($sHTML, $sPatt, 3)
        If @error Then
            Return SetError(2, @error, 0)
        EndIf
        Return $aReg
    EndFunc

    Func _htmlraw_GetTableRows($sTable)
        ; believe it or not </tr> is not necessary
        ; though most use it, so better look for </table> too
        ; then there's the fun of not having nested tables
        ; but I don't have the brain power to think through all that today, so simple it is
        Local $sPatt = "(?si)<\s*tr(?:\s*|\s.+?)>.*?<\s*/\s*tr\s*>"
        Local $aReg = StringRegExp($sTable, $sPatt, 3)
        If @error Then
            Return SetError(1, 0, 0)
        EndIf
        Return $aReg
    EndFunc

    Func _htmlraw_GetTableCols($sData)
        ; I've talked about nesting issues, just going to do it simple
        ; th/td
        Local $sPatt = "(?si)(?:<\s*th(?:\s*|\s.+?)>.*?<\s*/\s*th\s*>|" & _
"<\s*td(?:\s*|\s.+?)>.*?<\s*/\s*td\s*>)+" Local $aReg = StringRegExp($sData, $sPatt, 3) If @error Then Return SetError(1, 0, 0) EndIf Return $aReg EndFunc Func _myHTML() Local $sHTML $sHTML &= "0x3C68746D6C3E0D0A3C626F64793E0D0A3C7461626C65207374796C653D" $sHTML &= "2277696474683A31303025223E0D0A20203C74723E0D0A202020203C7468" $sHTML &= "3E4E616D653A3C2F74683E0D0A202020203C74643E42696C6C2047617465" $sHTML &= "733C2F74643E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C" $sHTML &= "746820726F777370616E3D2232223E54656C6570686F6E653A3C2F74683E" $sHTML &= "0D0A202020203C74643E353535203737203835343C2F74643E0D0A20203C" $sHTML &= "2F74723E0D0A20203C74723E0D0A202020203C74643E3535352037372038" $sHTML &= "35353C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D0A3C74" $sHTML &= "61626C65207374796C653D2277696474683A31303025223E0D0A20203C63" $sHTML &= "617074696F6E3E4D6F6E74686C7920736176696E67733C2F63617074696F" $sHTML &= "6E3E0D0A20203C74723E0D0A202020203C74683E4D6F6E74683C2F74683E" $sHTML &= "0D0A202020203C74683E536176696E67733C2F74683E0D0A20203C2F7472" $sHTML &= "3E0D0A20203C74723E0D0A202020203C74643E4A616E756172793C2F7464" $sHTML &= "3E0D0A202020203C74643E243130303C2F74643E0D0A20203C2F74723E0D" $sHTML &= "0A20203C74723E0D0A202020203C74643E46656272756172793C2F74643E" $sHTML &= "0D0A202020203C74643E2435303C2F74643E0D0A20203C2F74723E0D0A3C" $sHTML &= "2F7461626C653E0D0A3C7461626C65207374796C653D2277696474683A31" $sHTML &= "303025223E0D0A20203C74723E0D0A202020203C74683E4E616D653C2F74" $sHTML &= "683E0D0A202020203C746820636F6C7370616E3D2232223E54656C657068" $sHTML &= "6F6E653C2F74683E0D0A20203C2F74723E0D0A20203C74723E0D0A202020" $sHTML &= "203C74643E42696C6C2047617465733C2F74643E0D0A202020203C74643E" $sHTML &= "353535203737203835343C2F74643E0D0A202020203C74643E3535352037" $sHTML &= "37203835353C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D" $sHTML &= "0A3C7461626C652069643D22743031223E0D0A20203C74723E0D0A202020" $sHTML &= 
"203C74683E46697273746E616D653C2F74683E0D0A202020203C74683E4C" $sHTML &= "6173746E616D653C2F74683E200D0A202020203C74683E506F696E74733C" $sHTML &= "2F74683E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C7464" $sHTML &= "3E4576653C2F74643E0D0A202020203C74643E4A61636B736F6E3C2F7464" $sHTML &= "3E200D0A202020203C74643E39343C2F74643E0D0A20203C2F74723E0D0A" $sHTML &= "3C2F7461626C653E0D0A3C2F626F64793E0D0A3C2F68746D6C3E" Return BinaryToString($sHTML) EndFunc ? Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer. Link to comment Share on other sites More sharing options...
Gianni
Posted January 24, 2015

Hi SmOke_N, thanks for the reply.

Yes, your _htmlraw_TableToArray() function extracts the data into a 2D array (could the HTML tags be removed?), but it has the same issue as my function: if you extract table 13, for example, rows 2 and 3 of column 2 end up in column 1 of the array.

I am posting the code above again, now including your new function, to show the issue on table 13. Thanks.

    #include <IE.au3>
    #include <String.au3>
    #include <Array.au3>
    ;
    ; 1) open an html page containing tables (also nested)
    ; it's an hodgepodge of tables just to make tests
    Local $oie = _IECreate()
    _IEDocWriteHTML($oie, MyHTML()) ; just to show the tables on the browser
    Do
        Sleep(250)
    Until IsObj($oie)
    Local $sHtml = _IEBodyReadHTML($oie) ; extract whole raw HTML of the page
    ;
    Local $aTables = ParseTables($sHtml) ; each table in each element of the array
    ;
    Local $iWantedTable, $sError, $aResult
    Do
        $iWantedTable = InputBox("select a table", "Please enter the nr. of the table to get data from (1 based)")
        $sError = @error
        If Not $sError Then
            ; $aResult = _TableWriteToArrayFromHTML($aTables[$iWantedTable]) ; extracts table contents in a 2D array
            ; $aResult = _htmlraw_GetTableRows($aTables[$iWantedTable]) ; by SmOke_N
            ; $aResult = _htmlraw_GetTableCols($aTables[$iWantedTable]) ; by SmOke_N
            $aResult = _htmlraw_TableToArray($aTables[$iWantedTable])
            $sError = @error
            _ArrayDisplay($aResult, "Content of table nr." & $iWantedTable)
        EndIf
    Until $sError
    ;
    ; -----------------------------------------------------------------
    ; returns an array containing positions of <table and </table> tags
    ; -----------------------------------------------------------------
    Func ParseTables($sHtml)
        ; finds how many tables are on the HTML page (tables collection)
        StringReplace($sHtml, "<table", "<table") ; in @xtended nr.
of occurences Local $iNrOfTableTags = @extended ; ConsoleWrite(@CRLF & "Debug: This page contains " & $iNrOfTableTags & " tables." & @CRLF) ; I assume that <table and </table> tags are balanced (as should be) ; (so NO check is made to see if they are actually balanced) If $iNrOfTableTags Then ; if at least one table exists ; $aTableTagsPositions array will contain the positions of the ; starting <table and ending </table> tags within the HTML Local $aTableTagsPositions[$iNrOfTableTags * 2 + 1][3] ; 1 based (make room for all open and close tags) ; 2) find in the HTML the positions of the <table and </table> tags For $i = 1 To $iNrOfTableTags $aTableTagsPositions[$i][0] = StringInStr($sHtml, "<table", 0, $i) ; start position of $i occurrence of <table opening tag $aTableTagsPositions[$i][1] = "<table" ; mark tag of this location $aTableTagsPositions[$i][2] = $i ; nr of table $aTableTagsPositions[$iNrOfTableTags + $i][0] = StringInStr($sHtml, "</table>", 0, $i) + 7 ; end position of $i occurrence of </table> closing tag $aTableTagsPositions[$iNrOfTableTags + $i][1] = "</table>" ; mark tag of this location Next _ArraySort($aTableTagsPositions, 0, 1) ; now all opening and closing tags are in the same sequence as them appears in the HTML Local $aTables = ExtractTables($aTableTagsPositions, $sHtml) ; $aTables array will contains a table in each element If Not @error Then Return $aTables Return SetError(2, 0, 0) Else Return SetError(1, 0, 0) ; No tables in HTML EndIf EndFunc ;==>ParseTables ; --------------------------------------------------- ; returns an array containing a table in each element ; --------------------------------------------------- Func ExtractTables(ByRef $aTableTagsPositions, $html) Local $aStack[UBound($aTableTagsPositions)][2] Local $aTables[Ceiling(UBound($aTableTagsPositions) / 2)] ; will contains the collection of tables For $i = 1 To UBound($aTableTagsPositions) - 1 If $aTableTagsPositions[$i][1] = "<table" Then ; opening tag $aStack[0][0] += 1 
$aStack[$aStack[0][0]][0] = "<table" $aStack[$aStack[0][0]][1] = $i ElseIf $aTableTagsPositions[$i][1] = "</table>" Then ; a closing tag was found If Not $aStack[0][0] Or Not ArePair($aStack[$aStack[0][0]][0], $aTableTagsPositions[$i][1]) Then Return SetError(1, 0, 0) ; False ; something is not ok Else ; pair detected (the reciprocal tag) ; now get coordinates of the 2 tags ; 1) extract this table from the html to the array $aTables[$aTableTagsPositions[$aStack[$aStack[0][0]][1]][2]] = StringMid($html, $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0], 1 + $aTableTagsPositions[$i][0] - $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0]) ; 2) remove that table from the html $html = StringLeft($html, $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0] - 1) & StringMid($html, $aTableTagsPositions[$i][0] + 1) ; 3) adjust the references to the new positions of remaining tags For $ii = $i To UBound($aTableTagsPositions) - 1 $aTableTagsPositions[$ii][0] -= StringLen($aTables[$aTableTagsPositions[$aStack[$aStack[0][0]][1]][2]]) Next $aStack[0][0] -= 1 EndIf EndIf Next If Not $aStack[0][0] Then Return $aTables Else Return SetError(1, 0, 0) EndIf EndFunc ;==>ExtractTables Func ArePair($sOpening, $sClosing) If ($sOpening = '<table' And $sClosing = '</table>') Then Return True Return False EndFunc ;==>ArePair ; ------------------------------------ ; copy content of cells into the array ; ------------------------------------ Func _TableWriteToArrayFromHTML($sHtml) Local $aRes = StringRegExp($sHtml, "(?isU)(?|<(/)tr>\s*|<t[dh].*>(.*)</t[dh]>)", 3) ; _ArrayDisplay($aRes) Local $aTempResult[UBound($aRes)][UBound($aRes)] Local $iRow = 0, $iCol = 0, $iMaxRow = 0 For $i = 0 To UBound($aRes) - 1 If $aRes[$i] = "/" Then $iRow += 1 $iCol = 0 Else $aTempResult[$iRow][$iCol] = $aRes[$i] $iCol += 1 If $iCol > $iMaxRow Then $iMaxRow = $iCol EndIf Next ReDim $aTempResult[$iRow][$iMaxRow] Return $aTempResult EndFunc ;==>_TableWriteToArrayFromHTML Func MyHTML() Local $sData = '0x' & _ 
'3C5441424C4520626F726465723D223122206267436F6C6F723D233030666630303E0D0A202020203C54523E0D0A20202020202020203C54443E5461626C6520' & _ '31202872316331293C7461626C6520626F726465723D223122206267436F6C6F723D236666303030303E0D0A20203C74723E0D0A202020203C74683E5461626C' & _ '6520322028743272316331293C2F74683E0D0A202020203C74683E5461626C65203220726F77203120436F6C756D6E20323C2F74683E0D0A202020203C74683E' & _ '5432523143323C2F74683E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C74643E5432523243313C2F74643E0D0A202020203C74643E0D0A202020' & _ '2020203C7461626C6520626F726465723D223122206267436F6C6F723D236666666630303E0D0A20202020202020203C74723E0D0A202020202020202020203C' & _ '74643E5461626C652033206E6573746564207461626C6520636F6C756D6E20313C2F74643E0D0A202020202020202020203C74643E6E6573746564207461626C' & _ '6520636F6C756D6E20323C2F74643E0D0A20202020202020203C2F74723E0D0A2020202020203C2F7461626C653E0D0A202020203C2F74643E0D0A202020203C' & _ '74643E5432523243333C2F74643E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C74643E5432523343313C2F74643E0D0A202020203C74643E5432' & _ '523343323C2F74643E0D0A202020203C74643E5432523343333C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E203C2F54443E0D0A20202020202020' & _ '203C54443E5431523143323C2F54443E0D0A20202020202020203C2F54523E0D0A202020203C54523E0D0A20202020202020203C54443E5431523243310D0A20' & _ '202020202020202020203C7461626C6520626F726465723D31206267436F6C6F723D233939303030302020414C49474E3D43454E5445523E200D0A2020202020' & _ '2020202020203C74723E3C74643E205461626C652034204162636465663C2F74643E3C74643E7434723163323C2F74643E3C74643E7434723163333C2F74643E' & _ '3C74643E7434723163343C2F74643E3C74643E7434723163350D0A2020202020202020202020202020202020203C7461626C652020626F726465723D31206267' & _ '436F6C6F723D233939393930303E0D0A2020202020202020202020202020202020203C74723E3C74643E205461626C652035204768696A6B3C2F74643E3C7464' & _ 
'3E7435723163323C2F74643E3C74643E7435723163333C2F74643E3C74643E7435723163340D0A20202020202020202020202020202020202020202020202020' & _ '3C7461626C6520626F726465723D31206267436F6C6F723D233939393939393E0D0A202020202020202020202020202020202020202020202020203C74723E3C' & _ '74643E205461626C652036204C6D6E6F70713C2F74643E3C74643E7435723163323C2F74643E3C74643E7435723163330D0A2020202020202020202020202020' & _ '20202020202020202020202020202020202020202020203C7461626C652020626F726465723D31206267436F6C6F723D234545303045453E203C74723E3C7464' & _ '3E205461626C6520372052737475767778797A3C2F74643E3C74643E7437723163323C2F74643E3C74643E7437723163333C2F74643E3C2F74723E0D0A202020' & _ '202020202020202020202020202020202020202020202020202020202020202020203C74723E3C74643E7437723263313C2F74643E3C74643E7437723263323C' & _ '2F74643E3C74643E7437723263330D0A202020202020202020202020202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F74' & _ '61626C653E0D0A202020202020202020202020202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E0D0A2020' & _ '20202020202020202020202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E0D0A2020202020202020202020' & _ '202020202020202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E3C2F54443E0D0A20202020202020203C54443E54315232' & _ '43323C5441424C4520626F726465723D223122206267436F6C6F723D233030666666663E0D0A202020202020202020202020202020203C54523E0D0A20202020' & _ '202020202020202020202020202020203C54443E5461626C6520380D0A2020202020202020202020202020202020202020202020203C5441424C4520626F7264' & _ '65723D223122206267436F6C6F723D233030303066663E0D0A202020202020202020202020202020202020202020202020202020203C54523E0D0A2020202020' & _ '2020202020202020202020202020202020202020202020202020203C54443E5461626C6520393C2F54443E0D0A20202020202020202020202020202020202020' & _ 
'202020202020202020202020203C54443E54392052314332203C2F54443E0D0A2020202020202020202020202020202020202020202020202020202020202020' & _ '3C2F54523E0D0A202020202020202020202020202020202020202020202020202020203C54523E0D0A2020202020202020202020202020202020202020202020' & _ '2020202020202020203C54443E543920523243313C2F54443E0D0A20202020202020202020202020202020202020202020202020202020202020203C54443E54' & _ '3920523243323C2F54443E0D0A20202020202020202020202020202020202020202020202020202020202020203C2F54523E0D0A202020202020202020202020' & _ '202020202020202020202020202020203C2F5441424C453E0D0A2020202020202020202020202020202020202020202020203C2F54443E0D0A20202020202020' & _ '202020202020202020202020203C54443E543820523143323C2F54443E0D0A20202020202020202020202020202020202020203C2F54523E0D0A202020202020' & _ '202020202020202020203C54523E0D0A20202020202020202020202020202020202020203C54443E543820523243313C2F54443E0D0A20202020202020202020' & _ '202020202020202020203C54443E543820523243323C2F54443E0D0A20202020202020202020202020202020202020203C2F54523E0D0A202020202020202020' & _ '202020202020203C2F5441424C453E0D0A2020202020202020202020203C2F54443E0D0A20202020202020203C2F54523E0D0A3C54523E3C54443E5431205233' & _ '204331202D20412073696E676C652063656C6C20726F772028576974686F75742063656C6C70616464696E67293C2F54443E3C2F54523E0D0A20203C74723E0D' & _ '0A202020203C746420636F6C7370616E3D323E0D0A20202020202068656C6C6F2C2049276D20543152344331202873696E676C652063656C6C20576974682063' & _ '656C6C70616464696E673D32290D0A3C7461626C6520626F726465723D332063656C6C70616464696E673D3520414C49474E3D4C454654206267436F6C6F723D' & _ '233636363630303E0D0A20203C74723E0D0A202020203C746420636F6C7370616E3D323E0D0A2020202020205461626C6520313020524F573120434F4C554D4E' & _ '310D0A202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E0D0A202020202020436F6E74656E742066726F6D20543130523243310D' & _ 
'0A202020203C2F74643E3C74643E0D0A202020202020436F6E74656E742066726F6D20543130523243320D0A202020203C2F74643E0D0A20203C2F74723E3C74' & _ '723E0D0A202020203C74643E0D0A202020202020436F6E74656E742066726F6D20543130523343310D0A202020203C2F74643E3C74643E0D0A20202020202043' & _ '6F6E74656E742066726F6D20543130523343320D0A202020203C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D0A3C7461626C6520626F72646572' & _ '3D332063656C6C70616464696E673D313020414C49474E3D43454E544552206267436F6C6F723D233939393939393E0D0A3C74723E0D0A20203C74642076616C' & _ '69676E3D746F703E0D0A2020202054616220313120723163310D0A20203C2F74643E3C74643E0D0A2020202054616220313120723163323C703E0D0A0909090D' & _ '0A202020203C7461626C6520626F726465723D31206267436F6C6F723D233030393939393E0D0A202020203C74723E0D0A2020202020203C74643E5431325231' & _ '43313C2F74643E0D0A2020202020203C74643E543132523143323C2F74643E0D0A202020203C2F74723E3C74723E0D0A2020202020203C74643E543132523243' & _ '313C2F74643E0D0A2020202020203C74643E543132523243323C2F74643E0D0A202020203C2F74723E0D0A202020203C2F7461626C653E3C703E0D0A0909090D' & _ '0A2020202054616220313120723163320D0A20203C2F74643E0D0A3C2F74723E0D0A3C2F7461626C653E0D0A3C7461626C6520626F726465723D3320414C4947' & _ '4E3D5249474854206267436F6C6F723D233939303039393E0D0A20203C74723E0D0A202020203C746420726F777370616E3D333E0D0A20202020202054414231' & _ '3320433120726F777370616E3D330D0A202020203C2F74643E3C74643E0D0A202020202020543133523143320D0A202020203C2F74643E0D0A20203C2F74723E' & _ '3C74723E0D0A202020203C74643E0D0A202020202020543133523243320D0A202020203C2F74643E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C' & _ '74643E0D0A202020202020543133523343320D0A202020203C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E0D0A202020203C2F74643E0D0A20203C' & _ '2F74723E3C74723E0D0A202020203C74643E0D0A2020202020205461626C653120726F773520636F6C756D6E310D0A202020203C2F74643E3C74643E0D0A2020' & _ 
'202020205461626C653120726F773520636F6C756D6E320D0A202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E0D0A2020202020' & _ '205461626C653120726F773620636F6C756D6E310D0A202020203C2F74643E3C74643E0D0A2020202020205461626C653120726F773620636F6C756D6E320D0A' & _ '202020203C2F74643E0D0A20203C2F74723E0D0A202020203C2F5441424C453E' Return BinaryToString($sData) EndFunc ;==>MyHTML ; ------------------------------------ ; following functions are from SmOke_N ; ------------------------------------ Func _htmlraw_GetTables($sHtml) ; return an array of tables If Not StringLen($sHtml) Then Return SetError(1, 0, 0) EndIf ; some of the below pattern isn't necessary, but I code it as I think about conditions ; problem is with nested tables, this is not a good solution Local $sPatt = "(?si)<\s*table(?:\s*|\s.+?)>.*?<\s*/\s*table\s*>" Local $aReg = StringRegExp($sHtml, $sPatt, 3) If @error Then Return SetError(2, @error, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTables Func _htmlraw_GetTableRows($sTable) ; believe it or not </tr> is not necessary ; though most use it, so better look for </table too> ; then there's the fun of not having nested tables ; but I don't have the brain power to think through all that today, so simple it is Local $sPatt = "(?si)<\s*tr(?:\s*|\s.+?)>.*?<\s*/\s*tr\s*>" Local $aReg = StringRegExp($sTable, $sPatt, 3) If @error Then Return SetError(1, 0, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTableRows Func _htmlraw_GetTableCols($sData) ; I've talked about nesting issues, just going to do it simple ; th/td Local $sPatt = "(?si)(?:<\s*th(?:\s*|\s.+?)>.*?<\s*/\s*th\s*>|" & _ "<\s*td(?:\s*|\s.+?)>.*?<\s*/\s*td\s*>)+" Local $aReg = StringRegExp($sData, $sPatt, 3) If @error Then Return SetError(1, 0, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTableCols Func _htmlraw_TableToArray($sTable) If Not StringLen($sTable) Then Return SetError(1, 0, 0) EndIf Local $aRows = _htmlraw_GetTableRows($sTable) If Not IsArray($aRows) Then Return SetError(2, 0, 0) 
    EndIf
    Local $iUBRow = UBound($aRows)
    Local $aRet[$iUBRow][1]
    Local $aCols, $iEnum = 0, $iUBCol
    For $i = 0 To $iUBRow - 1
        $aCols = _htmlraw_GetTableCols($aRows[$i])
        $iUBCol = UBound($aCols)
        If Not $iUBCol Then ContinueLoop
        If $iUBCol > UBound($aRet, 2) Then
            ReDim $aRet[$iUBRow][$iUBCol]
        EndIf
        For $j = 0 To $iUBCol - 1
            $aRet[$iEnum][$j] = $aCols[$j]
        Next
        $iEnum += 1
    Next
    Return $aRet
EndFunc   ;==>_htmlraw_TableToArray
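To see where the column shift on table 13 comes from, it can help to count the cells that each row of that table actually contains. The following is only a minimal sketch: it uses a shortened copy of table 13 and simpler patterns than the ones in the functions above, and the variable names are chosen for this illustration.

```autoit
; Shortened copy of table 13: the first cell spans three rows.
Local $sTable = '<table><tr><td rowspan=3>TAB13 C1</td><td>T13R1C2</td></tr>' & _
        '<tr><td>T13R2C2</td></tr>' & _
        '<tr><td>T13R3C2</td></tr></table>'

; Split the table into rows, then count the <td>/<th> cells found in each row.
Local $aRows = StringRegExp($sTable, '(?is)<tr\b.*?</tr>', 3)
Local $aCells
For $i = 0 To UBound($aRows) - 1
    $aCells = StringRegExp($aRows[$i], '(?is)<t[dh]\b.*?</t[dh]>', 3)
    ConsoleWrite('Row ' & ($i + 1) & ': ' & UBound($aCells) & ' cell(s)' & @CRLF)
Next
```

Rows 2 and 3 should report a single cell each, because the spanning cell exists only in the markup of row 1. Any extraction that fills each row from column 0 will therefore place T13R2C2 and T13R3C2 in the first column unless it keeps track of the rowspan.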
Moderators SmOke_N Posted January 25, 2015 (edited)

<table border=3 ALIGN=RIGHT bgColor=#990099>
  <tr>
    <td rowspan=3>
      TAB13 C1 rowspan=3
    </td><td>
      T13R1C2
    </td>
  </tr><tr>
    <td>
      T13R2C2
    </td>
  </tr>
  <tr>
    <td>
      T13R3C2
    </td>
  </tr>
</table>

The array returns exactly what I'd expect.

[0][0] = TAB13 C1 rowspan=3
[0][1] = T13R1C2
[1][0] = T13R2C2
[2][0] = T13R3C2

As for your other question (removing the HTML tags):

Func _htmlraw_TableToArray($sTable)
    If Not StringLen($sTable) Then
        Return SetError(1, 0, 0)
    EndIf
    Local $aRows = _htmlraw_GetTableRows($sTable)
    If Not IsArray($aRows) Then
        Return SetError(2, 0, 0)
    EndIf
    Local $iUBRow = UBound($aRows)
    Local $aRet[$iUBRow][1]
    Local $aCols, $iEnum = 0, $iUBCol
    For $i = 0 To $iUBRow - 1
        $aCols = _htmlraw_GetTableCols($aRows[$i])
        $iUBCol = UBound($aCols)
        If Not $iUBCol Then ContinueLoop
        If $iUBCol > UBound($aRet, 2) Then
            ReDim $aRet[$iUBRow][$iUBCol]
        EndIf
        For $j = 0 To $iUBCol - 1
            $aRet[$iEnum][$j] = StringRegExpReplace($aCols[$j], _
                    "^(?i)\h*<(?:th|td).*?(?<!>)>\h*|(?:\h*<\h*/\h*(?:th|td)\h*>\h*)$", "")
        Next
        $iEnum += 1
    Next
    Return $aRet
EndFunc   ;==>_htmlraw_TableToArray

Edit: Perhaps you meant to write your table code like this:

<table border=3 ALIGN=RIGHT bgColor=#990099>
  <tr>
    <td rowspan=3>
      TAB13 C1 rowspan=3
    </td><td>
      T13R1C2
    </td>
  </tr><tr>
    <td></td>
    <td>
      T13R2C2
    </td>
  </tr>
  <tr>
    <td></td>
    <td>
      T13R3C2
    </td>
  </tr>
</table>

If you did that, then I believe this comes out as you expect:

#include <IE.au3>
#include <String.au3>
#include <Array.au3>
;
; 1) open an html page containing tables (also nested)
; it's a hodgepodge of tables just to make tests
Local $oie = _IECreate()
_IEDocWriteHTML($oie, MyHTML()) ; just to show the tables on the browser
Do
    Sleep(250)
Until IsObj($oie)

Local $sHtml = _IEBodyReadHTML($oie) ; extract whole raw HTML of the page
;
Local $aTables = ParseTables($sHtml) ; each table in each element of the array
;
Local $iWantedTable, $sError, $aResult
Do
    $iWantedTable = InputBox("select a table", "Please enter the nr. of the table to get data from (1 based)")
    $sError = @error
    If Not $sError Then
        ; $aResult = _TableWriteToArrayFromHTML($aTables[$iWantedTable]) ; extracts table contents in a 2D array
        ; $aResult = _htmlraw_GetTableRows($aTables[$iWantedTable]) ; by SmOke_N
        ; $aResult = _htmlraw_GetTableCols($aTables[$iWantedTable]) ; by SmOke_N
        $aResult = _htmlraw_TableToArray($aTables[$iWantedTable])
        $sError = @error
        _ArrayDisplay($aResult, "Content of table nr." & $iWantedTable)
    EndIf
Until $sError
;
; -----------------------------------------------------------------
; returns an array containing positions of <table and </table> tags
; -----------------------------------------------------------------
Func ParseTables($sHtml)
    ; finds how many tables are on the HTML page (tables collection)
    StringReplace($sHtml, "<table", "<table") ; in @extended nr. of occurrences
    Local $iNrOfTableTags = @extended
    ; ConsoleWrite(@CRLF & "Debug: This page contains " & $iNrOfTableTags & " tables." & @CRLF)
    ; I assume that <table and </table> tags are balanced (as should be)
    ; (so NO check is made to see if they are actually balanced)
    If $iNrOfTableTags Then ; if at least one table exists
        ; $aTableTagsPositions array will contain the positions of the
        ; starting <table and ending </table> tags within the HTML
        Local $aTableTagsPositions[$iNrOfTableTags * 2 + 1][3] ; 1 based (make room for all open and close tags)
        ; 2) find in the HTML the positions of the <table and </table> tags
        For $i = 1 To $iNrOfTableTags
            $aTableTagsPositions[$i][0] = StringInStr($sHtml, "<table", 0, $i) ; start position of $i occurrence of <table opening tag
            $aTableTagsPositions[$i][1] = "<table" ; mark tag of this location
            $aTableTagsPositions[$i][2] = $i ; nr of table
            $aTableTagsPositions[$iNrOfTableTags + $i][0] = StringInStr($sHtml, "</table>", 0, $i) + 7 ; end position of $i occurrence of </table> closing tag
            $aTableTagsPositions[$iNrOfTableTags + $i][1] = "</table>" ; mark tag of this location
        Next
        _ArraySort($aTableTagsPositions, 0, 1) ; now all opening and closing tags are in the same sequence as they appear in the HTML
        Local $aTables = ExtractTables($aTableTagsPositions, $sHtml) ; $aTables array will contain a table in each element
        If Not @error Then Return $aTables
        Return SetError(2, 0, 0)
    Else
        Return SetError(1, 0, 0) ; No tables in HTML
    EndIf
EndFunc   ;==>ParseTables

; ---------------------------------------------------
; returns an array containing a table in each element
; ---------------------------------------------------
Func ExtractTables(ByRef $aTableTagsPositions, $html)
    Local $aStack[UBound($aTableTagsPositions)][2]
    Local $aTables[Ceiling(UBound($aTableTagsPositions) / 2)] ; will contain the collection of tables
    For $i = 1 To UBound($aTableTagsPositions) - 1
        If $aTableTagsPositions[$i][1] = "<table" Then ; opening tag
            $aStack[0][0] += 1
            $aStack[$aStack[0][0]][0] = "<table"
            $aStack[$aStack[0][0]][1] = $i
        ElseIf $aTableTagsPositions[$i][1] = "</table>" Then ; a closing tag was found
            If Not $aStack[0][0] Or Not ArePair($aStack[$aStack[0][0]][0], $aTableTagsPositions[$i][1]) Then
                Return SetError(1, 0, 0) ; False ; something is not ok
            Else ; pair detected (the reciprocal tag)
                ; now get coordinates of the 2 tags
                ; 1) extract this table from the html to the array
                $aTables[$aTableTagsPositions[$aStack[$aStack[0][0]][1]][2]] = StringMid($html, $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0], 1 + $aTableTagsPositions[$i][0] - $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0])
                ; 2) remove that table from the html
                $html = StringLeft($html, $aTableTagsPositions[$aStack[$aStack[0][0]][1]][0] - 1) & StringMid($html, $aTableTagsPositions[$i][0] + 1)
                ; 3) adjust the references to the new positions of remaining tags
                For $ii = $i To UBound($aTableTagsPositions) - 1
                    $aTableTagsPositions[$ii][0] -= StringLen($aTables[$aTableTagsPositions[$aStack[$aStack[0][0]][1]][2]])
                Next
                $aStack[0][0] -= 1
            EndIf
        EndIf
    Next
    If Not $aStack[0][0] Then
        Return $aTables
    Else
        Return SetError(1, 0, 0)
    EndIf
EndFunc   ;==>ExtractTables

Func ArePair($sOpening, $sClosing)
    If ($sOpening = '<table' And $sClosing = '</table>') Then Return True
    Return False
EndFunc   ;==>ArePair

; ------------------------------------
; copy content of cells into the array
; ------------------------------------
Func _TableWriteToArrayFromHTML($sHtml)
    Local $aRes = StringRegExp($sHtml, "(?isU)(?|<(/)tr>\s*|<t[dh].*>(.*)</t[dh]>)", 3)
    ; _ArrayDisplay($aRes)
    Local $aTempResult[UBound($aRes)][UBound($aRes)]
    Local $iRow = 0, $iCol = 0, $iMaxRow = 0
    For $i = 0 To UBound($aRes) - 1
        If $aRes[$i] = "/" Then
            $iRow += 1
            $iCol = 0
        Else
            $aTempResult[$iRow][$iCol] = $aRes[$i]
            $iCol += 1
            If $iCol > $iMaxRow Then $iMaxRow = $iCol
        EndIf
    Next
    ReDim $aTempResult[$iRow][$iMaxRow]
    Return $aTempResult
EndFunc   ;==>_TableWriteToArrayFromHTML

Func MyHTML()
    Local $sData = "0x3C5441424C4520626F726465723D223122206267436F6C6F723D233030666630303E0D0A2" & _
"02020203C54523E0D0A20202020202020203C54443E5461626C652031202872316331293C74" & _ "61626C6520626F726465723D223122206267436F6C6F723D236666303030303E0D0A20203C7" & _ "4723E0D0A202020203C74683E5461626C6520322028743272316331293C2F74683E0D0A2020" & _ "20203C74683E5461626C65203220726F77203120436F6C756D6E20323C2F74683E0D0A20202" & _ "0203C74683E5432523143323C2F74683E0D0A20203C2F74723E0D0A20203C74723E0D0A2020" & _ "20203C74643E5432523243313C2F74643E0D0A202020203C74643E0D0A2020202020203C746" & _ "1626C6520626F726465723D223122206267436F6C6F723D236666666630303E0D0A20202020" & _ "202020203C74723E0D0A202020202020202020203C74643E5461626C652033206E657374656" & _ "4207461626C6520636F6C756D6E20313C2F74643E0D0A202020202020202020203C74643E6E" & _ "6573746564207461626C6520636F6C756D6E20323C2F74643E0D0A20202020202020203C2F7" & _ "4723E0D0A2020202020203C2F7461626C653E0D0A202020203C2F74643E0D0A202020203C74" & _ "643E5432523243333C2F74643E0D0A20203C2F74723E0D0A20203C74723E0D0A202020203C7" & _ "4643E5432523343313C2F74643E0D0A202020203C74643E5432523343323C2F74643E0D0A20" & _ "2020203C74643E5432523343333C2F74643E0D0A20203C2F74723E0D0A3C2F7461626C653E2" & _ "03C2F54443E0D0A20202020202020203C54443E5431523143323C2F54443E0D0A2020202020" & _ "2020203C2F54523E0D0A202020203C54523E0D0A20202020202020203C54443E54315232433" & _ "10D0A20202020202020202020203C7461626C6520626F726465723D31206267436F6C6F723D" & _ "233939303030302020414C49474E3D43454E5445523E200D0A20202020202020202020203C7" & _ "4723E3C74643E205461626C652034204162636465663C2F74643E3C74643E7434723163323C" & _ "2F74643E3C74643E7434723163333C2F74643E3C74643E7434723163343C2F74643E3C74643" & _ "E7434723163350D0A2020202020202020202020202020202020203C7461626C652020626F72" & _ "6465723D31206267436F6C6F723D233939393930303E0D0A202020202020202020202020202" & _ "0202020203C74723E3C74643E205461626C652035204768696A6B3C2F74643E3C74643E7435" & _ "723163323C2F74643E3C74643E7435723163333C2F74643E3C74643E7435723163340D0A202" & _ 
"020202020202020202020202020202020202020202020203C7461626C6520626F726465723D" & _ "31206267436F6C6F723D233939393939393E0D0A20202020202020202020202020202020202" & _ "0202020202020203C74723E3C74643E205461626C652036204C6D6E6F70713C2F74643E3C74" & _ "643E7435723163323C2F74643E3C74643E7435723163330D0A2020202020202020202020202" & _ "02020202020202020202020202020202020202020202020203C7461626C652020626F726465" & _ "723D31206267436F6C6F723D234545303045453E203C74723E3C74643E205461626C6520372" & _ "052737475767778797A3C2F74643E3C74643E7437723163323C2F74643E3C74643E74377231" & _ "63333C2F74643E3C2F74723E0D0A20202020202020202020202020202020202020202020202" & _ "0202020202020202020202020203C74723E3C74643E7437723263313C2F74643E3C74643E74" & _ "37723263323C2F74643E3C74643E7437723263330D0A2020202020202020202020202020202" & _ "02020202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E" & _ "0D0A202020202020202020202020202020202020202020202020202020202020202020203C2" & _ "F74643E3C2F74723E203C2F7461626C653E0D0A202020202020202020202020202020202020" & _ "202020202020202020202020202020203C2F74643E3C2F74723E203C2F7461626C653E0D0A2" & _ "020202020202020202020202020202020202020202020202020202020202020203C2F74643E" & _ "3C2F74723E203C2F7461626C653E3C2F54443E0D0A20202020202020203C54443E543152324" & _ "3323C5441424C4520626F726465723D223122206267436F6C6F723D233030666666663E0D0A" & _ "202020202020202020202020202020203C54523E0D0A2020202020202020202020202020202" & _ "0202020203C54443E5461626C6520380D0A2020202020202020202020202020202020202020" & _ "202020203C5441424C4520626F726465723D223122206267436F6C6F723D233030303066663" & _ "E0D0A202020202020202020202020202020202020202020202020202020203C54523E0D0A20" & _ "202020202020202020202020202020202020202020202020202020202020203C54443E54616" & _ "26C6520393C2F54443E0D0A2020202020202020202020202020202020202020202020202020" & _ "2020202020203C54443E54392052314332203C2F54443E0D0A2020202020202020202020202" & _ 
"0202020202020202020202020202020202020203C2F54523E0D0A2020202020202020202020" & _ "20202020202020202020202020202020203C54523E0D0A20202020202020202020202020202" & _ "020202020202020202020202020202020203C54443E543920523243313C2F54443E0D0A2020" & _ "2020202020202020202020202020202020202020202020202020202020203C54443E5439205" & _ "23243323C2F54443E0D0A202020202020202020202020202020202020202020202020202020" & _ "20202020203C2F54523E0D0A202020202020202020202020202020202020202020202020202" & _ "020203C2F5441424C453E0D0A2020202020202020202020202020202020202020202020203C" & _ "2F54443E0D0A20202020202020202020202020202020202020203C54443E543820523143323" & _ "C2F54443E0D0A20202020202020202020202020202020202020203C2F54523E0D0A20202020" & _ "2020202020202020202020203C54523E0D0A202020202020202020202020202020202020202" & _ "03C54443E543820523243313C2F54443E0D0A20202020202020202020202020202020202020" & _ "203C54443E543820523243323C2F54443E0D0A2020202020202020202020202020202020202" & _ "0203C2F54523E0D0A202020202020202020202020202020203C2F5441424C453E0D0A202020" & _ "2020202020202020203C2F54443E0D0A20202020202020203C2F54523E0D0A3C54523E3C544" & _ "43E5431205233204331202D20412073696E676C652063656C6C20726F772028576974686F75" & _ "742063656C6C70616464696E67293C2F54443E3C2F54523E0D0A20203C74723E0D0A2020202" & _ "03C746420636F6C7370616E3D323E0D0A20202020202068656C6C6F2C2049276D2054315234" & _ "4331202873696E676C652063656C6C20576974682063656C6C70616464696E673D32290D0A3" & _ "C7461626C6520626F726465723D332063656C6C70616464696E673D3520414C49474E3D4C45" & _ "4654206267436F6C6F723D233636363630303E0D0A20203C74723E0D0A202020203C7464206" & _ "36F6C7370616E3D323E0D0A2020202020205461626C6520313020524F573120434F4C554D4E" & _ "310D0A202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E0D0A2" & _ "02020202020436F6E74656E742066726F6D20543130523243310D0A202020203C2F74643E3C" & _ "74643E0D0A202020202020436F6E74656E742066726F6D20543130523243320D0A202020203" & _ 
"C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E0D0A202020202020436F" & _ "6E74656E742066726F6D20543130523343310D0A202020203C2F74643E3C74643E0D0A20202" & _ "0202020436F6E74656E742066726F6D20543130523343320D0A202020203C2F74643E0D0A20" & _ "203C2F74723E0D0A3C2F7461626C653E0D0A3C7461626C6520626F726465723D332063656C6" & _ "C70616464696E673D313020414C49474E3D43454E544552206267436F6C6F723D2339393939" & _ "39393E0D0A3C74723E0D0A20203C74642076616C69676E3D746F703E0D0A202020205461622" & _ "0313120723163310D0A20203C2F74643E3C74643E0D0A202020205461622031312072316332" & _ "3C703E0D0A0909090D0A202020203C7461626C6520626F726465723D31206267436F6C6F723" & _ "D233030393939393E0D0A202020203C74723E0D0A2020202020203C74643E54313252314331" & _ "3C2F74643E0D0A2020202020203C74643E543132523143323C2F74643E0D0A202020203C2F7" & _ "4723E3C74723E0D0A2020202020203C74643E543132523243313C2F74643E0D0A2020202020" & _ "203C74643E543132523243323C2F74643E0D0A202020203C2F74723E0D0A202020203C2F746" & _ "1626C653E3C703E0D0A0909090D0A2020202054616220313120723163320D0A20203C2F7464" & _ "3E0D0A3C2F74723E0D0A3C2F7461626C653E0D0A3C7461626C6520626F726465723D3320414" & _ "C49474E3D5249474854206267436F6C6F723D233939303039393E0D0A20203C74723E0D0A20" & _ "2020203C746420726F777370616E3D333E0D0A202020202020544142313320433120726F777" & _ "370616E3D330D0A202020203C2F74643E3C74643E0D0A202020202020543133523143320D0A" & _ "202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E3C2F74643E0" & _ "D0A202020203C74643E0D0A202020202020543133523243320D0A202020203C2F74643E0D0A" & _ "20203C2F74723E0D0A20203C74723E0D0A202020203C74643E3C2F74643E0D0A202020203C7" & _ "4643E0D0A202020202020543133523343320D0A202020203C2F74643E0D0A20203C2F74723E" & _ "0D0A3C2F7461626C653E0D0A202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202" & _ "020203C74643E0D0A2020202020205461626C653120726F773520636F6C756D6E310D0A2020" & _ "20203C2F74643E3C74643E0D0A2020202020205461626C653120726F773520636F6C756D6E3" & _ 
"20D0A202020203C2F74643E0D0A20203C2F74723E3C74723E0D0A202020203C74643E0D0A20" & _ "20202020205461626C653120726F773620636F6C756D6E310D0A202020203C2F74643E3C746" & _ "43E0D0A2020202020205461626C653120726F773620636F6C756D6E320D0A202020203C2F74" & _ "643E0D0A20203C2F74723E0D0A3C2F5441424C453E" Return BinaryToString($sData) EndFunc ;==>MyHTML ; ------------------------------------ ; following functions are from SmOke_N ; ------------------------------------ Func _htmlraw_GetTables($sHtml) ; return an array of tables If Not StringLen($sHtml) Then Return SetError(1, 0, 0) EndIf ; some of the below pattern isn't necessary, but I code it as I think about conditions ; problem is with nested tables, this is not a good solution Local $sPatt = "(?si)<\s*table(?:\s*|\s.+?)>.*?<\s*/\s*table\s*>" Local $aReg = StringRegExp($sHtml, $sPatt, 3) If @error Then Return SetError(2, @error, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTables Func _htmlraw_GetTableRows($sTable) ; believe it or not </tr> is not necessary ; though most use it, so better look for </table too> ; then there's the fun of not having nested tables ; but I don't have the brain power to think through all that today, so simple it is Local $sPatt = "(?si)<\s*tr(?:\s*|\s.+?)>.*?<\s*/\s*tr\s*>" Local $aReg = StringRegExp($sTable, $sPatt, 3) If @error Then Return SetError(1, 0, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTableRows Func _htmlraw_GetTableCols($sData) ; I've talked about nesting issues, just going to do it simple ; th/td Local $sPatt = "(?si)(?:<\s*th(?:\s*|\s.+?)>.*?<\s*/\s*th\s*>|" & _ "<\s*td(?:\s*|\s.+?)>.*?<\s*/\s*td\s*>)+" Local $aReg = StringRegExp($sData, $sPatt, 3) If @error Then Return SetError(1, 0, 0) EndIf Return $aReg EndFunc ;==>_htmlraw_GetTableCols Func _htmlraw_TableToArray($sTable) If Not StringLen($sTable) Then Return SetError(1, 0, 0) EndIf Local $aRows = _htmlraw_GetTableRows($sTable) If Not IsArray($aRows) Then Return SetError(2, 0, 0) EndIf Local $iUBRow = UBound($aRows) Local 
$aRet[$iUBRow][1]
    Local $aCols, $iEnum = 0, $iUBCol
    For $i = 0 To $iUBRow - 1
        $aCols = _htmlraw_GetTableCols($aRows[$i])
        $iUBCol = UBound($aCols)
        If Not $iUBCol Then ContinueLoop
        If $iUBCol > UBound($aRet, 2) Then
            ReDim $aRet[$iUBRow][$iUBCol]
        EndIf
        For $j = 0 To $iUBCol - 1
            $aRet[$iEnum][$j] = StringRegExpReplace($aCols[$j], _
                    "^(?is)\h*<(?:th|td).*?(?<!>)>\s*|(?:\s*<\h*/\h*(?:th|td)\h*>\h*)$", "")
        Next
        $iEnum += 1
    Next
    Return $aRet
EndFunc   ;==>_htmlraw_TableToArray

Edited January 25, 2015 by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
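The tag-stripping pattern passed to StringRegExpReplace() in _htmlraw_TableToArray() can also be tried on a single cell in isolation. This is a minimal sketch, assuming a cell string shaped like the first cell of table 13; $sCell and $sText are names chosen only for this illustration.

```autoit
; One cell as _htmlraw_GetTableCols() would return it: opening tag, padded text, closing tag.
Local $sCell = '<td rowspan=3>' & @CRLF & '      TAB13 C1 rowspan=3' & @CRLF & '    </td>'

; Same pattern as in the post above: the first alternative removes the opening
; <td ...>/<th ...> tag and the whitespace after it, the second removes the
; closing tag and the whitespace before it.
Local $sText = StringRegExpReplace($sCell, _
        "^(?is)\h*<(?:th|td).*?(?<!>)>\s*|(?:\s*<\h*/\h*(?:th|td)\h*>\h*)$", "")

; Brackets make leftover whitespace visible.
ConsoleWrite('[' & $sText & ']' & @CRLF)
```

Only the outer cell tags are removed here; any markup inside the cell, such as <strong> or <br />, stays in the text.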
Kyan Posted January 25, 2015

Nice work you've done here. How can I fix this output? This is what the table looks like:

#include <Array.au3>
#NoTrayIcon
$b64tbl = "PHRhYmxlIGNsYXNzPSJkYXRhLWJvcmRlcmVkIG1heCI+DQogIDx0Ym9keT4NCiAgICA8dHI+DQogICAgICA8dGggd2lkdGg9IjcxIiBzY29wZT0iY29sIj5QbGF0Zm9ybTwvdGg+DQogICAgICA8dGggd2lkdGg9Ijg4IiBzY29wZT0iY29sIj5Ccm93c2Vy" & _
        "PC90aD4NCiAgICAgIDx0aCB3aWR0aD0iMTAxIiBzY29wZT0iY29sIj5QbGF5ZXImbmJzcDt2ZXJzaW9uPC90aD4NCiAgICA8L3RyPg0KICAgIDx0cj4NCiAgICAgIDx0ZCByb3dzcGFuPSI0Ij48c3Ryb25nPldpbmRvd3M8L3N0cm9uZz48L3RkPg0KICAgICA" & _
        "gPHRkPkludGVybmV0IEV4cGxvcmVyIC0gQWN0aXZlWDwvdGQ+DQogICAgICA8dGQ+MTYuMC4wLjI4NzwvdGQ+DQogICAgPC90cj4NCiAgICANCiAgICAgPHRyPg0KICAgICAgPHRkPkludGVybmV0IEV4cGxvcmVyIChXaW5kb3dzIDgueCkgLSBBY3RpdmVYPC" & _
        "90ZD4NCiAgICAgIDx0ZD4xNi4wLjAuMjg3PC90ZD4NCiAgICA8L3RyPg0KICAgIA0KICAgIDx0cj4NCiAgICAgIDx0ZD5GaXJlZm94LCBNb3ppbGxhIC0gTlBBUEk8L3RkPg0KICAgICAgPHRkPjE2LjAuMC4yODc8L3RkPg0KICAgIDwvdHI+DQogICAgPHRyP" & _
        "g0KICAgICAgPHRkPkNocm9tZSAoZW1iZWRkZWQpLCBPcGVyYSwgQ2hyb21pdW0tYmFzZWQgYnJvd3NlcnMgLSBQUEFQSTwvdGQ+DQogICAgICA8dGQ+MTYuMC4wLjI4NzwvdGQ+DQogICAgPC90cj4NCiAgICA8dHI+DQogICAgICA8dGQgcm93c3Bhbj0iMiI+" & _
        "PHN0cm9uZz5NYWNpbnRvc2g8YnIgLz5PUyBYPC9zdHJvbmc+PC90ZD4NCiAgICAgIDx0ZD5GaXJlZm94LCBTYWZhcmkgLSBOUEFQSTwvdGQ+DQogICAgICA8dGQ+MTYuMC4wLjI4NzwvdGQ+DQogICAgPC90cj4NCiAgICA8dHI+DQogICAgICA8dGQ+Q2hyb21" & _
        "lIChlbWJlZGRlZCksIE9wZXJhLCBDaHJvbWl1bS1iYXNlZCBicm93c2VycyAtIFBQQVBJPC90ZD4NCiAgICAgIDx0ZD4xNi4wLjAuMjg3PC90ZD4NCiAgICA8L3RyPg0KICAgIDx0cj4NCiAgICAgIDx0ZCByb3dzcGFuPSIyIj48c3Ryb25nPkxpbnV4PC9zdH" & _
        "Jvbmc+PC90ZD4NCiAgICAgIDx0ZD5Nb3ppbGxhLCBGaXJlZm94IC0gTlBBUEkgKEV4dGVuZGVkIFN1cHBvcnQgUmVsZWFzZSk8L3RkPg0KICAgICAgPHRkPjExLjIuMjAyLjQzODwvdGQ+DQogICAgPC90cj4NCiAgICA8dHI+DQogICAgICA8dGQ+Q2hyb21lI" & _
        "ChlbWJlZGRlZCksIENocm9taXVtLWJhc2VkIGJyb3dzZXJzIC0gUFBBUEk8L3RkPg0KICAgICAgPHRkPjE2LjAuMC4yOTE8L3RkPg0KICAgIDwvdHI+DQogICAgPHRyPg0KICAgICAgPHRkPjxzdHJvbmc+U29sYXJpczwvc3Ryb25nPjwvdGQ+DQogICAgICA8" & _
        "dGQ+Rmxhc2ggUGxheWVyIDExLjIuMjAyLjIyMyBpcyB0aGUgbGFzdCBzdXBwb3J0ZWQgRmxhc2ggUGxheWVyIHZlcnNpb24gZm9yIFNvbGFyaXMuPC90ZD4NCiAgICAgIDx0ZD4xMS4yLjIwMi4yMjM8L3RkPg0KICAgIDwvdHI+DQogIDwvdGJvZHk+DQo8L3R" & _
        "hYmxlPg=="
$stbl = BinaryToString(_Base64Decode($b64tbl))
$table = _htmlraw_TableToArray($stbl)
_ArrayDisplay($table)
Exit

Func _Base64Decode($input_string) ; by trancexx
    Local $struct = DllStructCreate('int')
    Local $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', 0, 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(1, 0, '')
    Local $a = DllStructCreate('byte[' & DllStructGetData($struct, 1) & ']')
    $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', DllStructGetPtr($a), 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(2, 0, '')
    Return DllStructGetData($a, 1)
EndFunc   ;==>_Base64Decode

Func _htmlraw_TableToArray($sTable)
    If Not StringLen($sTable) Then Return SetError(1, 0, 0)
    Local $aRows = _htmlraw_GetTableRows($sTable)
    If Not IsArray($aRows) Then Return SetError(2, 0, 0)
    Local $iUBRow = UBound($aRows), $aRet[$iUBRow][1]
    Local $aCols, $iEnum = 0, $iUBCol
    For $i = 0 To $iUBRow - 1
        $aCols = _htmlraw_GetTableCols($aRows[$i])
        $iUBCol = UBound($aCols)
        If Not $iUBCol Then ContinueLoop
        If $iUBCol > UBound($aRet, 2) Then ReDim $aRet[$iUBRow][$iUBCol]
        For $j = 0 To $iUBCol - 1
            $aRet[$iEnum][$j] = StringRegExpReplace($aCols[$j], "^(?i)\h*<(?:th|td).*?(?<!>)>\h*|(?:\h*<\h*/\h*(?:th|td)\h*>\h*)$", "")
        Next
        $iEnum += 1
    Next
    Return $aRet
EndFunc   ;==>_htmlraw_TableToArray

Func _htmlraw_GetTableRows($sTable)
    Local $sPatt = "(?si)<\s*tr(?:\s*|\s.+?)>.*?<\s*/\s*tr\s*>"
    Local $aReg = StringRegExp($sTable, $sPatt, 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aReg
EndFunc   ;==>_htmlraw_GetTableRows

Func _htmlraw_GetTableCols($sData)
    Local $sPatt = "(?si)(?:<\s*th(?:\s*|\s.+?)>.*?<\s*/\s*th\s*>|" & _
            "<\s*td(?:\s*|\s.+?)>.*?<\s*/\s*td\s*>)+"
    Local $aReg = StringRegExp($sData, $sPatt, 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aReg
EndFunc   ;==>_htmlraw_GetTableCols

Heroes, there is no such thing
One day I'll discover what IE.au3 has of special for so many users using it. C'mon there's InetRead and WinHTTP, way better
SmOke_N (Moderators) Posted January 25, 2015 (edited)

What do you mean "clean it up"? The functions produce exactly what the data shows.

Do you mean the HTML tags (&nbsp;, <strong></strong>, <br />)? Well, you'd strip them (I'd imagine before sending them to the table functions).

Are you talking about the format? Well, that's something the CSS is taking care of, I'm sure, through the scope/width/columns etc., which is totally outside what I understood was trying to be accomplished.

Edit: I guess you mean the rowspan... ugh

Edited January 25, 2015 by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
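The "strip them before sending them to the table functions" step could look roughly like the sketch below. `_Sketch_StripInlineTags` is a hypothetical helper, not one of the thread's functions, and the list of inline tags it removes is an assumption based on the sample table; it leaves `<tr>`, `<td>` and `<th>` alone so the row/column functions still work.

```autoit
; Hypothetical helper (not part of the thread's functions): removes inline
; formatting so that only structural tags and cell text reach the table parser.
Func _Sketch_StripInlineTags($sHtml)
    ; non-breaking space entity -> plain space
    $sHtml = StringReplace($sHtml, "&nbsp;", " ")
    ; <br>, <br/> and <br /> -> one space, so "Macintosh<br />OS X" keeps a separator
    $sHtml = StringRegExpReplace($sHtml, "(?i)<br\s*/?>", " ")
    ; drop opening and closing <strong>, <b>, <i>, <em> and <font ...> tags
    Return StringRegExpReplace($sHtml, "(?i)</?(?:strong|b|i|em|font)\b[^>]*>", "")
EndFunc   ;==>_Sketch_StripInlineTags

; Usage: clean one cell taken from the sample table
ConsoleWrite(_Sketch_StripInlineTags('<td rowspan="2"><strong>Macintosh<br />OS X</strong></td>') & @CRLF)
; writes: <td rowspan="2">Macintosh OS X</td>
```

The `\b` after the tag names keeps the pattern from touching longer names such as `<tbody>` or `<img>`.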
SmOke_N (Moderators) Posted January 25, 2015

```autoit
Func _htmlraw_TableToArray($sTable)

    If Not StringLen($sTable) Then
        Return SetError(1, 0, 0)
    EndIf

    Local $aRows = _htmlraw_GetTableRows($sTable)
    If Not IsArray($aRows) Then
        Return SetError(2, 0, 0)
    EndIf

    Local $iUBRow = UBound($aRows)
    Local $aRet[$iUBRow][1]
    Local $aCols, $iEnum = 0, $iUBCol
    Local $aRowSpan
    Local Const $sRowSpanPatt = "(?is)<\s*(?:td|th)\h+rowspan=" & _
            "(?:\x22|\x27)(\d+)(?:\x22|\x27)\s*>"
    Local Const $sRemoveTagPatt = "^(?is)\h*<(?:th|td).*?(?<!>)>" & _
            "\s*|(?:\s*<\h*/\h*(?:th|td)\h*>\h*)$"
    Local $iRowCount = -1, $aTmp

    For $i = 0 To $iUBRow - 1

        $aCols = _htmlraw_GetTableCols($aRows[$i])
        $iUBCol = UBound($aCols)
        If Not $iUBCol Then ContinueLoop

        ; take care of rowspan
        If $iRowCount > -1 Then
            Dim $aTmp[$iUBCol + 1]
            For $j = 0 To $iUBCol - 1
                $aTmp[$j + 1] = $aCols[$j]
            Next
            $aCols = $aTmp
            $iUBCol = UBound($aCols)
            $iRowCount -= 1
        EndIf

        If $iUBCol > UBound($aRet, 2) Then
            ReDim $aRet[$iUBRow][$iUBCol]
        EndIf

        For $j = 0 To $iUBCol - 1
            If $iRowCount = -1 Then
                $aRowSpan = StringRegExp($aRows[$i], $sRowSpanPatt, 1)
                $iRowCount = ((Not @error) ? $aRowSpan[0] - 2 : -1)
            EndIf
            $aRet[$iEnum][$j] = StringRegExpReplace($aCols[$j], $sRemoveTagPatt, "")
        Next
        $iEnum += 1
    Next

    Return $aRet
EndFunc   ;==>_htmlraw_TableToArray
```
Kyan Posted January 25, 2015

Works flawlessly, thank you SmOke_N!
Gianni (Author) Posted January 26, 2015

SmOke_N's _htmlraw_TableToArray() function from the post above is something similar to what I'm trying to accomplish.

I'd like to manage both COLSPAN and ROWSPAN so that only the first cell of the array area corresponding to the colspan or rowspan is filled and the other cells are left empty (repeating the same value in all cells of the array corresponding to the col/rowspan area could also be an option).

Also, excluding any tags between <td> and </td> and keeping only the data contained within the cell should give cleaner data.

Since my regexp skill is nearly zero, I am not able to modify your regexp to achieve my goal, so I will try to achieve this result using string functions instead.

Thanks for your sample code.
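The goal described above (place each cell at its real grid position, fill only the first cell of a spanned area or optionally repeat the value, and drop inner tags) can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the thread: `_Sketch_TableToGrid` and `_Sketch_SpanAttr` are hypothetical names, the input is assumed to be the HTML of a single table with no nested tables, every cell is assumed to have a closing `</td>` or `</th>`, and a span value below 1 is simply treated as 1.

```autoit
#include <Array.au3>

; Demo input (made up for illustration): one rowspan and one colspan
Local $sDemo = '<table><tr><td rowspan="2">A</td><td colspan="2"><b>B</b></td></tr>' & _
        '<tr><td>C</td><td>D</td></tr></table>'
_ArrayDisplay(_Sketch_TableToGrid($sDemo), "first cell only")
_ArrayDisplay(_Sketch_TableToGrid($sDemo, True), "value repeated")

; Hypothetical helper: reads rowspan/colspan from an opening <td>/<th> tag,
; regardless of attribute order or quoting. Returns 1 when the attribute is absent.
Func _Sketch_SpanAttr($sTag, $sName)
    Local $aM = StringRegExp($sTag, "(?i)\s" & $sName & "\s*=\s*[\x22\x27]?(\d+)", 1)
    If @error Then Return 1
    Local $iSpan = Int($aM[0])
    If $iSpan < 1 Then $iSpan = 1
    Return $iSpan
EndFunc   ;==>_Sketch_SpanAttr

; Hypothetical sketch: $bFill = False fills only the top-left cell of a span,
; $bFill = True repeats the value in every cell the span covers.
Func _Sketch_TableToGrid($sTable, $bFill = False)
    Local $aRows = StringRegExp($sTable, "(?is)<tr\b[^>]*>.*?</tr>", 3)
    If @error Then Return SetError(1, 0, 0)
    Local $iRows = UBound($aRows), $iCols = 1
    Local $aGrid[$iRows][$iCols], $aUsed[$iRows][$iCols]
    Local $aCells, $sTag, $sText, $iCol, $iRS, $iCS
    For $r = 0 To $iRows - 1
        $aCells = StringRegExp($aRows[$r], "(?is)<t[dh]\b[^>]*>.*?</t[dh]>", 3)
        If @error Then ContinueLoop ; row without cells
        $iCol = 0
        For $c = 0 To UBound($aCells) - 1
            ; skip slots already claimed by a rowspan from an earlier row
            While $iCol < $iCols And $aUsed[$r][$iCol]
                $iCol += 1
            WEnd
            ; opening tag (for the span attributes) and inner text without any tags
            $sTag = StringRegExpReplace($aCells[$c], "(?is)^(<t[dh]\b[^>]*>).*$", "$1")
            $sText = StringRegExpReplace($aCells[$c], "(?is)^<t[dh]\b[^>]*>|</t[dh]>$", "")
            $sText = StringStripWS(StringRegExpReplace($sText, "<[^>]+>", ""), 3)
            $iRS = _Sketch_SpanAttr($sTag, "rowspan")
            $iCS = _Sketch_SpanAttr($sTag, "colspan")
            ; grow both arrays when the cell reaches past the current width
            If $iCol + $iCS > $iCols Then
                $iCols = $iCol + $iCS
                ReDim $aGrid[$iRows][$iCols]
                ReDim $aUsed[$iRows][$iCols]
            EndIf
            ; claim every slot the cell covers, writing the text where requested
            For $dr = 0 To $iRS - 1
                If $r + $dr >= $iRows Then ExitLoop
                For $dc = 0 To $iCS - 1
                    $aUsed[$r + $dr][$iCol + $dc] = True
                    If $bFill Or ($dr = 0 And $dc = 0) Then $aGrid[$r + $dr][$iCol + $dc] = $sText
                Next
            Next
            $iCol += $iCS
        Next
    Next
    Return $aGrid
EndFunc   ;==>_Sketch_TableToGrid
```

The separate `$aUsed` array is a design choice: it records which slots are covered by a span even when they are deliberately left empty, so cells of later rows are shifted to the right column either way.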
SmOke_N (Moderators) Posted January 26, 2015 (edited)

@Chimp

I can't find an example where colspan would matter here. Do you have table code that would make it worth pursuing?

Edit: Maybe I found one: extending the columns out by one more when colspan is used on, say, the 2nd of 3 columns (e.g. colspan="2" on the second column while there is still a 3rd to process).

Edited January 26, 2015 by SmOke_N
Gianni (Author) Posted January 26, 2015 (edited)

Hi SmOke_N,

here are some tables. You can also see that if the <td> tag contains extra attributes besides rowspan, like this for example,

<td bgcolor="#d3d3d3" align="center" valign="middle" rowspan="2">

then your regexp-based rowspan handling fails.

```autoit
#include <Array.au3>
#include <ie.au3>

$stbl = MyHtml()
ConsoleWrite(@CRLF & $stbl & @CRLF)

Local $oie = _IECreate()
_IEDocWriteHTML($oie, $stbl) ; just to show the tables on the browser
Do
    Sleep(250)
Until IsObj($oie)

$table = _htmlraw_TableToArray($stbl)
_ArrayDisplay($table)
Exit

Func _htmlraw_GetTableRows($sTable)
    Local $sPatt = "(?si)<\s*tr(?:\s*|\s.+?)>.*?<\s*/\s*tr\s*>"
    Local $aReg = StringRegExp($sTable, $sPatt, 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aReg
EndFunc   ;==>_htmlraw_GetTableRows

Func _htmlraw_GetTableCols($sData)
    Local $sPatt = "(?si)(?:<\s*th(?:\s*|\s.+?)>.*?<\s*/\s*th\s*>|" & _
            "<\s*td(?:\s*|\s.+?)>.*?<\s*/\s*td\s*>)+"
    Local $aReg = StringRegExp($sData, $sPatt, 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aReg
EndFunc   ;==>_htmlraw_GetTableCols

Func _htmlraw_TableToArray($sTable)

    If Not StringLen($sTable) Then
        Return SetError(1, 0, 0)
    EndIf

    Local $aRows = _htmlraw_GetTableRows($sTable)
    ; _ArrayDisplay($aRows, '_htmlraw_GetTableRows')
    If Not IsArray($aRows) Then
        Return SetError(2, 0, 0)
    EndIf

    Local $iUBRow = UBound($aRows)
    Local $aRet[$iUBRow][1]
    Local $aCols, $iEnum = 0, $iUBCol
    Local $aRowSpan
    Local Const $sRowSpanPatt = "(?is)<\s*(?:td|th)\h+rowspan=" & _
            "(?:\x22|\x27)(\d+)(?:\x22|\x27)\s*>"
    Local Const $sRemoveTagPatt = "^(?is)\h*<(?:th|td).*?(?<!>)>" & _
            "\s*|(?:\s*<\h*/\h*(?:th|td)\h*>\h*)$"
    Local $iRowCount = -1, $aTmp

    For $i = 0 To $iUBRow - 1

        $aCols = _htmlraw_GetTableCols($aRows[$i])
        ; _ArrayDisplay($aCols, '_htmlraw_GetTableCols')
        $iUBCol = UBound($aCols)
        If Not $iUBCol Then ContinueLoop

        ; take care of rowspan
        If $iRowCount > -1 Then
            Dim $aTmp[$iUBCol + 1]
            For $j = 0 To $iUBCol - 1
                $aTmp[$j + 1] = $aCols[$j]
            Next
            $aCols = $aTmp
            $iUBCol = UBound($aCols)
            $iRowCount -= 1
        EndIf

        If $iUBCol > UBound($aRet, 2) Then
            ReDim $aRet[$iUBRow][$iUBCol]
        EndIf

        For $j = 0 To $iUBCol - 1
            If $iRowCount = -1 Then
                $aRowSpan = StringRegExp($aRows[$i], $sRowSpanPatt, 1)
                $iRowCount = ((Not @error) ? $aRowSpan[0] - 2 : -1)
            EndIf
            $aRet[$iEnum][$j] = StringRegExpReplace($aCols[$j], $sRemoveTagPatt, "")
        Next
        $iEnum += 1
    Next

    Return $aRet
EndFunc   ;==>_htmlraw_TableToArray

Func MyHtml()
    Local $sHTML = ""
    $sHTML &= @CRLF & '<table border=1 class="data-bordered max">'
    $sHTML &= @CRLF & '<tbody>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<th width="71" scope="col">Platform</th>'
    $sHTML &= @CRLF & '<th width="88" scope="col">Browser</th>'
    $sHTML &= @CRLF & '<th width="101" scope="col">Player version</th>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td rowspan="4"><strong>Windows</strong></td>'
    $sHTML &= @CRLF & '<td>Internet Explorer - ActiveX</td>'
    $sHTML &= @CRLF & '<td>16.0.0.287</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td>Internet Explorer (Windows 8.x) - ActiveX</td>'
    $sHTML &= @CRLF & '<td>16.0.0.287</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td>Firefox, Mozilla - NPAPI</td>'
    $sHTML &= @CRLF & '<td>16.0.0.287</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td>Chrome (embedded), Opera, Chromium-based browsers - PPAPI</td>'
    $sHTML &= @CRLF & '<td>16.0.0.287</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td rowspan="2"><strong>Macintosh<br />OS X</strong></td>'
    $sHTML &= @CRLF & '<td>Firefox, Safari - NPAPI</td>'
    $sHTML &= @CRLF & '<td>16.0.0.287</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td>Chrome (embedded), Opera, Chromium-based browsers - PPAPI</td>'
    $sHTML &= @CRLF & '<td>16.0.0.287</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td rowspan="2"><strong>Linux</strong></td>'
    $sHTML &= @CRLF & '<td>Mozilla, Firefox - NPAPI (Extended Support Release)</td>'
    $sHTML &= @CRLF & '<td>11.2.202.438</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td>Chrome (embedded), Chromium-based browsers - PPAPI</td>'
    $sHTML &= @CRLF & '<td>16.0.0.291</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr>'
    $sHTML &= @CRLF & '<td><strong>Solaris</strong></td>'
    $sHTML &= @CRLF & '<td>Flash Player 11.2.202.223 is the last supported Flash Player version for Solaris.</td>'
    $sHTML &= @CRLF & '<td>11.2.202.223</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '</tbody>'
    $sHTML &= @CRLF & '</table>'
    $sHTML &= @CRLF & '<br><br>'
    $sHTML &= @CRLF & '<TABLE BORDER=1 CELLPADDING=4>'
    $sHTML &= @CRLF & '<tbody>'
    $sHTML &= @CRLF & '<TR>'
    $sHTML &= @CRLF & '<TD rowspan=''3''>Production</TD>'
    $sHTML &= @CRLF & '<TD>Raha Mutisya</TD> <TD>1493</TD>'
    $sHTML &= @CRLF & '</TR>'
    $sHTML &= @CRLF & '<TR>'
    $sHTML &= @CRLF & '<TD>Shalom Buraka</TD> <TD>3829</TD> '
    $sHTML &= @CRLF & '</TR>'
    $sHTML &= @CRLF & '<TR>'
    $sHTML &= @CRLF & '<TD>Brandy Davis</TD> <TD>0283</TD>'
    $sHTML &= @CRLF & '</TR>'
    $sHTML &= @CRLF & '<TR>'
    $sHTML &= @CRLF & '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>'
    $sHTML &= @CRLF & '<TD>Claire Horne</TD> <TD>4827</TD>'
    $sHTML &= @CRLF & '</TR>'
    $sHTML &= @CRLF & '<TR>'
    $sHTML &= @CRLF & '<TD>Bruce Eckel</TD> <TD>7246</TD>'
    $sHTML &= @CRLF & '</TR>'
    $sHTML &= @CRLF & '<TR>'
    $sHTML &= @CRLF & '<TD>Danny Zeman</TD> <TD>5689</TD>'
    $sHTML &= @CRLF & '</TR>'
    $sHTML &= @CRLF & '</tbody>'
    $sHTML &= @CRLF & '</TABLE>'
    $sHTML &= @CRLF & '<br><br>'
    $sHTML &= @CRLF & '<TABLE BORDER=2 CELLPADDING=4>'
    $sHTML &= @CRLF & '<TR> <TH COLSPAN=2>Production2</TH> </TR>'
    $sHTML &= @CRLF & '<TR> <TD>Raha Mutisya</TD> <TD>1493</TD> </TR>'
    $sHTML &= @CRLF & '<TR> <TD>Shalom Buraka</TD> <TD>3829</TD> </TR>'
    $sHTML &= @CRLF & '<TR> <TD>Brandy Davis</TD> <TD>0283</TD> </TR>'
    $sHTML &= @CRLF & '<TR> <TH COLSPAN=2>Sales</TH> </TR>'
    $sHTML &= @CRLF & '<TR> <TD>Claire Horne</TD> <TD>4827</TD> </TR>'
    $sHTML &= @CRLF & '<TR> <TD>Bruce Eckel</TD> <TD>7246</TD> </TR>'
    $sHTML &= @CRLF & '<TR> <TD>Danny Zeman</TD> <TD>5689</TD> </TR>'
    $sHTML &= @CRLF & '<TR> <TD></TD> <TD></TD> </TR>'
    $sHTML &= @CRLF & '</TABLE>'
    $sHTML &= @CRLF & '<br><br>'
    $sHTML &= @CRLF & '<table border="1" cellpadding="0" cellspacing="0">'
    $sHTML &= @CRLF & '<tr height="50">'
    $sHTML &= @CRLF & ' <td align="center" width="150" rowspan="2">State of Health</td>'
    $sHTML &= @CRLF & ' <td align="center" width="300" colspan="2">Fasting Value</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">After Eating</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr height="50">'
    $sHTML &= @CRLF & ' <td align="center" width="150">Minimum</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">Maximum</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">2 hours after eating</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr height="50">'
    $sHTML &= @CRLF & ' <td align="center" width="150">Healthy</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">70</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">100</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">Less than 140</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr height="50">'
    $sHTML &= @CRLF & ' <td align="center" width="150">Pre-Diabetes</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">101</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">126</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">140 to 200</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '<tr height="50">'
    $sHTML &= @CRLF & ' <td align="center" width="150">Diabetes</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">More than 126</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">N/A</td>'
    $sHTML &= @CRLF & ' <td align="center" width="150">More than 200</td>'
    $sHTML &= @CRLF & '</tr>'
    $sHTML &= @CRLF & '</table>'
    $sHTML &= @CRLF & '<br><br>'
    $sHTML &= @CRLF & '<table width="400" cellpadding="10" cellspacing="0" border="1">'
    $sHTML &= @CRLF & '<tr><td bgcolor="#fa8072" align="center" valign="middle">'
    $sHTML &= @CRLF & '<font size="2" color="#000000" face="verdana">'
    $sHTML &= @CRLF & '<b>Cell One</b></font>'
    $sHTML &= @CRLF & '</td><td bgcolor="#d3d3d3" align="center" valign="middle" rowspan="2">'
    $sHTML &= @CRLF & '<font size="2" color="#000000" face="verdana">'
    $sHTML &= @CRLF & '<b>Cell Two</b></font>'
    $sHTML &= @CRLF & '</td>'
    $sHTML &= @CRLF & '<td bgcolor="#fa8072" align="center" valign="middle">'
    $sHTML &= @CRLF & '<font size="2" color="#000000" face="verdana">'
    $sHTML &= @CRLF & '<b>Cell Three</b></font>'
    $sHTML &= @CRLF & '</td></tr>'
    $sHTML &= @CRLF & '<tr><td bgcolor="#90ee90" align="center" valign="middle">'
    $sHTML &= @CRLF & '<font size="2" color="#000000" face="verdana">'
    $sHTML &= @CRLF & '<b>Cell Four</b></font>'
    $sHTML &= @CRLF & '</td>'
    $sHTML &= @CRLF & '<td bgcolor="#90ee90" align="center" valign="middle">'
    $sHTML &= @CRLF & '<font size="2" color="#000000" face="verdana">'
    $sHTML &= @CRLF & '<b>Cell Five</b></font>'
    $sHTML &= @CRLF & '</td></tr></table>'
    Return $sHTML
EndFunc   ;==>MyHtml
```

Edited January 26, 2015 by Chimp
SmOke_N (Moderators) Posted January 26, 2015

Ahh, well this may fix that... not sure... I don't have time at the moment to mess around.

```autoit
Func _htmlraw_TableToArray($sTable)

    If Not StringLen($sTable) Then
        Return SetError(1, 0, 0)
    EndIf

    Local $aRows = _htmlraw_GetTableRows($sTable)
    If Not IsArray($aRows) Then
        Return SetError(2, 0, 0)
    EndIf

    Local $iUBRow = UBound($aRows)
    Local $aRet[$iUBRow][1]
    Local $aCols, $iEnum = 0, $iUBCol
    Local $aRowSpan
    Local Const $sRowSpanPatt = "(?is)<\s*(?:td|th)\h+.*?(?<!>)\hrowspan=" & _
            "(?:\x22|\x27)(\d+)(?:\x22|\x27).*?(?<!>)>"
    Local Const $sRemoveTagPatt = "^(?is)\h*<(?:th|td).*?(?<!>)>" & _
            "\s*|(?:\s*<\h*/\h*(?:th|td)\h*>\h*)$"
    Local $iRowCount = -1, $aTmp

    For $i = 0 To $iUBRow - 1

        $aCols = _htmlraw_GetTableCols($aRows[$i])
        $iUBCol = UBound($aCols)
        If Not $iUBCol Then ContinueLoop

        ; take care of rowspan
        If $iRowCount > -1 Then
            Dim $aTmp[$iUBCol + 1]
            For $j = 0 To $iUBCol - 1
                $aTmp[$j + 1] = $aCols[$j]
            Next
            $aCols = $aTmp
            $iUBCol = UBound($aCols)
            $iRowCount -= 1
        EndIf

        If $iUBCol > UBound($aRet, 2) Then
            ReDim $aRet[$iUBRow][$iUBCol]
        EndIf

        For $j = 0 To $iUBCol - 1
            If $iRowCount = -1 Then
                $aRowSpan = StringRegExp($aRows[$i], $sRowSpanPatt, 1)
                $iRowCount = ((Not @error) ? $aRowSpan[0] - 2 : -1)
            EndIf
            $aRet[$iEnum][$j] = StringRegExpReplace($aCols[$j], $sRemoveTagPatt, "")
        Next
        $iEnum += 1
    Next

    Return $aRet
EndFunc   ;==>_htmlraw_TableToArray
```
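Since the post itself says "not sure", a quick check against the tag variants from the test tables may help. As far as can be told from reading the revised pattern, its `\h+.*?...\hrowspan=` part needs two separate whitespace characters before `rowspan`, so a tag where rowspan is the first attribute (such as `<td rowspan="4">`) may no longer match, and both versions require quotes around the value, which `ROWSPAN=3` lacks. The looser pattern below is a hypothetical alternative, not something proposed in the thread; it only assumes the attribute sits somewhere inside the opening tag.

```autoit
; Sample opening tags copied from the test tables in this thread
Local $aTags[3] = ['<td rowspan="4">', _
        '<td bgcolor="#d3d3d3" align="center" valign="middle" rowspan="2">', _
        '<TD ROWSPAN=3 BGCOLOR="#99CCFF">']

; Hypothetical, looser pattern: rowspan anywhere in the tag, quotes optional
Local Const $sLoosePatt = "(?i)<\s*t[dh]\b[^>]*?\srowspan\s*=\s*[\x22\x27]?(\d+)"

Local $aHit
For $i = 0 To UBound($aTags) - 1
    $aHit = StringRegExp($aTags[$i], $sLoosePatt, 1)
    If @error Then
        ConsoleWrite($aTags[$i] & " -> no rowspan found" & @CRLF)
    Else
        ConsoleWrite($aTags[$i] & " -> rowspan " & $aHit[0] & @CRLF)
    EndIf
Next
```

Swapping `$sLoosePatt` for either `$sRowSpanPatt` from the thread in this loop shows which of the three tags each version recognises.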