tommytx Posted March 1, 2013

Normally I load HTML files into the browser and grab the HTML using the $oIE object model, then run the DOM functions on each file loaded into the browser. The problem is that I need to process several hundred HTML files of around 100 KB each, and loading each one into the browser takes a long time.

So instead I load each HTML file into a string, for example $str, and try to run the DOM functions on that string. Of course, a string is not an object, so it won't cooperate. Is there a way to fake it out, or to convert the string $str into an object so the DOM functions can do their thing?

I load $str the old-fashioned way: open the file and read it line by line, appending each line with $str = $str & $line. That way a file loads in 1 or 2 seconds, while loading the same file in the browser may take 7 or 8 seconds.

Can anyone help?
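For comparison, AutoIt's FileRead can return a whole file as one string in a single call, which avoids the line-by-line concatenation described above. This is a minimal sketch, not a measured speed-up: the sample path and page contents are hypothetical, and the file is written first only so the snippet runs on its own.

```autoit
; Sketch: load a whole HTML file into a string with one FileRead call
; instead of appending it line by line. The path below is hypothetical.
Local $sPath = @TempDir & '\sample_page.html'

; Create a small sample file so the sketch is self-contained (2 = overwrite mode).
Local $hFile = FileOpen($sPath, 2)
FileWrite($hFile, '<html><head><title>Sample</title></head><body><p>Hello</p></body></html>')
FileClose($hFile)

; A single call returns the entire file contents as a string.
Local $sHtml = FileRead($sPath)
If @error Then
    ConsoleWrite('Could not read ' & $sPath & @CRLF)
Else
    ConsoleWrite('Read ' & StringLen($sHtml) & ' characters' & @CRLF)
EndIf
```

The resulting string still has to be loaded into a document object before any DOM calls will work on it.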
kylomas Posted March 1, 2013

tommytx,

This will get you started. This is some code that I use to test scraping routines. The source is a simple text file downloaded with InetGet().

#include <StaticConstants.au3>
#include <WindowsConstants.au3>
#include <IE.au3>
#include <array.au3>
#include <string.au3>
#AutoIt3Wrapper_Add_Constants=n

local $fln = 'k:\sd\sd0100\nba\boxes\400440940' ; this is a file downloaded with inetget

; write the extracted text to a temp file and open it
filedelete(@tempdir & '\tmp.txt')
filewrite(@tempdir & '\tmp.txt', _do_tbls(fileread($fln)))
shellexecute(@tempdir & '\tmp.txt')

func _do_tbls($html)
    ; strip line breaks from the raw source
    $html = stringreplace($html, @crlf, '')
    $html = stringreplace($html, @cr, '')

    ; load the string into an in-memory HTML document (no browser window)
    Local $o_htmlfile = ObjCreate('HTMLFILE'), $str
    If Not IsObj($o_htmlfile) Then Return SetError(-1)
    $o_htmlfile.open()
    $o_htmlfile.write($html)
    $o_htmlfile.close()

    ; the IE.au3 collection functions are called on this document object
    Local $otbls = _IETagnameGetCollection($o_htmlfile, 'TABLE')
    if not isobj($otbls) then return seterror(-2)

    Local $otitles = _IETagnameGetCollection($o_htmlfile, 'TITLE')
    if not isobj($otitles) then return seterror(-3)
    for $otitle in $otitles
        ConsoleWrite($otitle.innertext & @LF)
    next

    Local $odivs = _IETagnameGetCollection($o_htmlfile, 'DIV')
    if not isobj($odivs) then return seterror(-4)
    for $odiv in $odivs
        ConsoleWrite('!---- ' & 'id = ' & $odiv.id & ' classname= ' & $odiv.classname & ' title = ' & $odiv.title & @LF)
        $str &= $odiv.innertext & @LF
    next

    ; dump every table as backtick-delimited rows
    Local $a10
    for $otbl in $otbls
        ConsoleWrite(stringformat('ID = %-30sTITLE = %-30sSUMMARY = %-30s', $otbl.id, $otbl.title, $otbl.summary) & @lf)
        $a10 = _IETableWriteToArray($otbl, true)
        if not isarray($a10) then continueloop
        _arraydisplay($a10) ; debug view, blocks until closed
        for $i = 0 to ubound($a10, 1) - 1
            for $j = 0 to ubound($a10, 2) - 1
                $str &= $a10[$i][$j] & '`'
            Next
            $str &= @LF
        next
    next
    return $str
endfunc

kylomas
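For the several-hundred-file case in the question, the same in-memory HTMLFILE idea can be wrapped in a loop. This is a sketch under assumptions: Windows with the MSHTML HTMLFILE COM class available, a hypothetical folder of saved pages, and a hypothetical helper _CountTables that calls the plain DOM method getElementsByTagName instead of the IE.au3 wrappers, purely to show the document object working without a browser window.

```autoit
#include <File.au3>

; Hypothetical helper: parse an HTML string in an in-memory HTMLFILE
; document and return the number of <table> elements (-1 on failure).
Func _CountTables($sHtml)
    Local $oDoc = ObjCreate('HTMLFILE')
    If Not IsObj($oDoc) Then Return SetError(1, 0, -1)
    $oDoc.open()
    $oDoc.write($sHtml)
    $oDoc.close()
    ; Plain DOM call on the in-memory document.
    Return $oDoc.getElementsByTagName('table').length
EndFunc

; Hypothetical folder holding the saved .htm/.html files.
Local $sDir = @TempDir & '\html_batch'
Local $aFiles = _FileListToArray($sDir, '*.htm*', 1) ; 1 = files only

If @error Then
    ConsoleWrite('No HTML files found in ' & $sDir & @CRLF)
Else
    Local $sHtml, $iCount
    For $i = 1 To $aFiles[0] ; element [0] holds the file count
        $sHtml = FileRead($sDir & '\' & $aFiles[$i])
        If @error Then ContinueLoop
        $iCount = _CountTables($sHtml)
        ConsoleWrite($aFiles[$i] & ': ' & $iCount & ' table(s)' & @CRLF)
    Next
EndIf
```

A fresh document object is created per file here to keep each parse isolated; whether reusing one object is faster has not been checked.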