Brickoneer Posted June 22, 2008

Hey guys, I've been using IE.au3 for just about everything, but for my most recent project I'm having a bit of trouble. My script needs to read the text from a massive webpage (40-50 thousand lines of text). It either reads the entire page in less than 5 seconds or locks up the entire machine. (Sometimes it recovers from the lock-up after a few minutes, sometimes not.) All I'm using is the basic _IEBodyReadText($oIE), nothing fancy. Is there something I can do to make it "play nicer" with big webpages?

Thanks!
Brickoneer
DaleHohm Posted June 22, 2008

There is no explanation in IE.au3 that I could offer for this. Two things come to mind as likely sources of your trouble.

First, are you certain that _IEBodyReadText is the source of the slowdown? Ensure that AutoIt is not instead waiting for the page to finish loading before it executes the _IEBodyReadText function. SciTe Debug mode is one way to check this; sprinkling in some ConsoleWrite commands is another.

If it is not the first issue, the next most likely cause is a process or system performance bottleneck not directly related to IE.au3. Use the System Monitor to try to figure out what is consuming CPU or memory during this time. If you have multiple IE windows going, this could also contribute. Especially if it works quickly sometimes and not others, it is unlikely that IE.au3 is directly the source of the trouble.

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model Automate input type=file (Related) Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better? IE.au3 issues with Vista - Workarounds SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead? Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
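One way to separate the two suspects Dale names (the page-load wait versus the _IEBodyReadText call itself) is to time each step and print the result with ConsoleWrite. The following is only a minimal sketch: the URL is a placeholder, and it assumes an IE.au3 version where the fourth _IECreate argument controls whether the call waits for the page to load.

```autoit
#include <IE.au3>

; Placeholder URL (an assumption): substitute the large page being read.
Local $sUrl = "http://www.example.com/bigpage.html"

; Create the browser without waiting (4th argument = 0),
; so the page load can be timed as its own step.
Local $oIE = _IECreate($sUrl, 0, 1, 0)

; Step 1: how long does the page take to finish loading?
Local $hTimer = TimerInit()
_IELoadWait($oIE)
ConsoleWrite("Page load wait: " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)

; Step 2: how long does reading the body text take once the page is loaded?
$hTimer = TimerInit()
Local $sText = _IEBodyReadText($oIE)
ConsoleWrite("_IEBodyReadText: " & Round(TimerDiff($hTimer)) & " ms, " & _
        StringLen($sText) & " characters" & @CRLF)
```

If the second number dominates, the time is going into the read itself; if the first does, the script is mostly waiting on the page load.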
Brickoneer Posted June 29, 2008 (edited)

OK, here's what I've found. The actual slowdown is in _IEBodyReadText, specifically the line:

Return $o_object.document.body.innerText

(Of course, that is basically the entire function right there, but that is where the slowdown is.) Monitoring the machine shows that as soon as the script enters the _IEBodyReadText function, the IE process hits a constant 99% CPU load and memory for the IE browser hits 60,000 K. Now it is semi-reliably reading the text in about 8.5 seconds. If I can get it to "break" again I guess I'll be back.

Brick

[edit] It failed again. I finished typing out this post, clicked over to the IE window my script was trying to read, and it was frozen, not responding, and locked at 99% CPU. I gave it a few minutes to see if it would recover and it never did. I'm really not sure what the problem is... maybe I should try to do it at a lower level, like the HTTP UDF. (I even failed miserably at that: I managed to get the first 2-7 bytes of the webpage HTML and that's it.)

[edit 2] I was wrong. It did manage to pull out. The _IEBodyReadText function took just over 10 minutes to read about 30-40,000 lines of text.

Edited June 29, 2008 by Brickoneer
DaleHohm Posted June 29, 2008

Just because you found an instruction that takes a long time to complete does not mean that you have identified the root cause. As I suggested, there is likely some other performance bottleneck that you've hit that is exacerbated by this operation. In particular, look for things that may be consuming a lot of memory. It may also be that the amount of text you are reading is depleting the memory on your system.

You can also try starting IE without any add-ons (Start, Programs, Accessories, System Tools) and see if it is possibly a destructive interaction with something you have added to IE.

If the text you are reading is really huge, there are more efficient ways of accessing it. With _IEBodyReadText you must first get the page loaded into IE and then make a copy of its text in AutoIt, so you consume double the resources. Something like INetGet or TCP communication would only require you to hold it in memory once.

Dale
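As a rough illustration of the INetGet route Dale mentions, the page can be downloaded straight to a file and read from there without loading it into IE at all. This is a sketch under assumptions: the URL and temp file name are placeholders, and the return-value check assumes an AutoIt version where a non-background InetGet returns 0 on failure. Note that this yields the raw HTML source, not the rendered text that innerText returns, so any tags would still need to be stripped.

```autoit
; Placeholder URL and temp file name (assumptions for illustration only).
Local $sUrl = "http://www.example.com/bigpage.html"
Local $sFile = @TempDir & "\bigpage.html"

; Download directly to disk: option 1 forces a reload from the remote site,
; background 0 makes the call wait until the download completes.
Local $iResult = InetGet($sUrl, $sFile, 1, 0)
If @error Or $iResult = 0 Then
    ConsoleWrite("Download failed" & @CRLF)
    Exit 1
EndIf

; Read the saved HTML into a single string. IE never holds a second copy.
Local $sHtml = FileRead($sFile)
ConsoleWrite("Read " & StringLen($sHtml) & " characters of HTML" & @CRLF)

; Clean up the temporary file.
FileDelete($sFile)
```

This only works for pages that can be fetched with a plain HTTP request; a page that depends on a logged-in browser session or on script-generated content would still need IE.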