
speed up loading web page


frank10

I'm trying to collect some information from Amazon pages.

I'm using IE.au3, as in this example:

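#include <IE.au3>

Local $url = "https://www.amazon.com/Very-Stable-Genius-Testing-America-ebook/dp/B07WQQRMGP/ref=zg_bs_157325011_5?_encoding=UTF8&psc=1&refRID=R5Z6WM52CT1EK1QA91YR"

; Create a hidden IE instance (no attach, not visible)
Local $oIE = _IECreate('', 0, 0)

; Navigate and (by default) wait until the page has finished loading
_IENavigate($oIE, $url)

; Read the HTML inside the page's <body>
Local $sHTML = _IEBodyReadHTML($oIE)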

 

So I tried _INetGetSource, but the response contains no readable characters...

I also tried a WinHTTP GET request, and the page body contains only this:

<!--
        To discuss automated access to Amazon data please contact api-services-support@amazon.com.
        For information about migrating to our APIs refer to our Marketplace APIs at  .......
-->

So what can I do to load the page faster? I also tried disabling image loading through a registry key, and that helps, but I would like it even faster.
Right now it takes 2-4 seconds to get the HTML...
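A minimal sketch of the registry tweak mentioned above, assuming the setting is the `Display Inline Images` REG_SZ value ("yes"/"no") under `HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main` (the post doesn't say which key was used). `_SetIEImages` is a hypothetical helper that returns the previous value so it can be restored, and the load is timed with `TimerInit`/`TimerDiff` so the gain can actually be measured. The change may only affect IE instances created after it is written.

```autoit
#include <IE.au3>

; Assumption: IE reads this REG_SZ value ("yes"/"no") to decide whether to load images.
Global Const $g_sIEMainKey = "HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main"

; Hypothetical helper: show (True) or hide (False) inline images in IE.
; Returns the previous value ("" if it was not set) so the caller can restore it.
Func _SetIEImages($bShow)
    Local $sOld = RegRead($g_sIEMainKey, "Display Inline Images")
    RegWrite($g_sIEMainKey, "Display Inline Images", "REG_SZ", $bShow ? "yes" : "no")
    Return $sOld
EndFunc

Local $url = "https://www.amazon.com/Very-Stable-Genius-Testing-America-ebook/dp/B07WQQRMGP/ref=zg_bs_157325011_5?_encoding=UTF8&psc=1&refRID=R5Z6WM52CT1EK1QA91YR"

Local $sOldImages = _SetIEImages(False)      ; disable images before creating IE
Local $oIE = _IECreate("about:blank", 0, 0)  ; hidden instance

Local $hTimer = TimerInit()
_IENavigate($oIE, $url)                      ; waits for the full page load
Local $sHTML = _IEBodyReadHTML($oIE)
ConsoleWrite("Loaded " & StringLen($sHTML) & " characters in " & Round(TimerDiff($hTimer) / 1000, 2) & " s" & @CRLF)

_IEQuit($oIE)
; Restore the previous setting ("yes" if the value did not exist before)
RegWrite($g_sIEMainKey, "Display Inline Images", "REG_SZ", ($sOldImages = "") ? "yes" : $sOldImages)
```

Running it once with `_SetIEImages(False)` and once with `_SetIEImages(True)` gives a like-for-like comparison of the two load times.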


Try specifying a browser's User-Agent, so that the Amazon website believes you are browsing with a browser rather than with a script:

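#include <InetConstants.au3>
#include <StringConstants.au3>

; Send a browser-like User-Agent with InetRead/InetGet requests
HttpSetUserAgent('Mozilla/5.0')

Local $url = "https://www.amazon.com/Very-Stable-Genius-Testing-America-ebook/dp/B07WQQRMGP/ref=zg_bs_157325011_5?_encoding=UTF8&psc=1&refRID=R5Z6WM52CT1EK1QA91YR"
; Bypass the local cache and ignore SSL certificate errors; InetRead returns binary data
Local $dData = InetRead($url, BitOR($INET_FORCERELOAD, $INET_IGNORESSL))
; Decode the bytes as UTF-8 text before printing
ConsoleWrite(BinaryToString($dData, $SB_UTF8) & @CRLF)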

 

 

Chimp


Thank you Chimp, that works, but it's not faster than IE.

As Bert said, a filter could probably be created to speed it up.

 

Instead, I found a way to make IE faster:

_IENavigate($oIE, $url, 0) ; 0 = return immediately instead of waiting for the page to load

This way it does not wait for the whole page to load, and I can check whether the part I need has loaded with:

$oIE.document.body.innerHTML
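A sketch building on the `_IENavigate($oIE, $url, 0)` approach above: poll the page until some text that only exists once the wanted part has loaded shows up, with a timeout. `_WaitForMarker` is a hypothetical helper, and the marker `productTitle` is an assumption about Amazon's markup, not something confirmed in this thread. A COM error handler is registered because, while the page is still loading, reading `document` can fail; without a handler that failure would stop the script.

```autoit
#include <IE.au3>

; With a COM error handler registered, a failed property access sets @error
; instead of terminating the script.
Global $g_oComErr = ObjEvent("AutoIt.Error", "_ComErrHandler")
Func _ComErrHandler($oError)
    ; Nothing to do here: _WaitForMarker checks IsObj() on each step instead.
EndFunc

; Hypothetical helper: poll the (possibly still loading) page until $sMarker
; appears in body.innerHTML. Returns the HTML, or "" with @error = 1 on timeout.
Func _WaitForMarker(ByRef $oIE, $sMarker, $iTimeoutMs = 10000)
    Local $hTimer = TimerInit(), $oDoc, $oBody, $sHTML
    While TimerDiff($hTimer) < $iTimeoutMs
        ; Reset each pass so a failed access can't leave a stale object behind
        $oDoc = 0
        $oBody = 0
        $sHTML = ""
        $oDoc = $oIE.document             ; may fail mid-navigation
        If IsObj($oDoc) Then
            $oBody = $oDoc.body           ; not an object until <body> exists
            If IsObj($oBody) Then
                $sHTML = $oBody.innerHTML
                If StringInStr($sHTML, $sMarker) Then Return $sHTML
            EndIf
        EndIf
        Sleep(100)                        ; poll every 100 ms
    WEnd
    Return SetError(1, 0, "")
EndFunc

Local $url = "https://www.amazon.com/Very-Stable-Genius-Testing-America-ebook/dp/B07WQQRMGP/ref=zg_bs_157325011_5?_encoding=UTF8&psc=1&refRID=R5Z6WM52CT1EK1QA91YR"
Local $oIE = _IECreate("about:blank", 0, 0)
_IENavigate($oIE, $url, 0)                           ; return immediately
Local $sHTML = _WaitForMarker($oIE, "productTitle")  ; assumed marker
If @error Then
    ConsoleWrite("Timed out waiting for the marker" & @CRLF)
Else
    ConsoleWrite("Got " & StringLen($sHTML) & " characters" & @CRLF)
EndIf
_IEQuit($oIE)
```

One caveat: right after `_IENavigate(..., 0)` returns, `$oIE.document` can still be the previous page. When looping over several books, a generic marker would match the previous book's page, so the marker should be something only the new page can contain.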

 


One of the things I've noticed with many web pages today is the order in which a page loads. You will see the ads load first, and often you will also see "waiting on..." while the rest of the page loads. Meanwhile, you are staring at the ad, waiting for the page. I firmly believe the wait times are deliberate, just to make you look at the ad.

That's why I use an ad blocker. I would not have nearly as much of an issue with ads if it weren't for this "feature". When ads are blocked, the page loads fast; with ads, it loads slowly.

My ad blocker of choice works at the DNS server level, so all of your devices are covered.

Edited by Bert

Hmm, after some successful tests, from time to time I get this again:

robot.txt

<p class="a-last">Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.</p>

It's a CAPTCHA check...

So I can't do it this way. Oh well...
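Rather than parsing a bot-check page as if it were a product page, a script can detect it first and stop. A minimal sketch, with a hypothetical helper `_IsBlockedPage` that only looks for the two texts quoted in this thread (the CAPTCHA paragraph and the `api-services-support@amazon.com` comment); Amazon may change that wording at any time.

```autoit
; Hypothetical helper: True if $sHTML looks like one of the bot-check pages
; quoted in this thread rather than a normal product page.
Func _IsBlockedPage($sHTML)
    Return StringInStr($sHTML, "not a robot") > 0 _
        Or StringInStr($sHTML, "api-services-support@amazon.com") > 0
EndFunc

; Sample taken from the CAPTCHA response above ('' is an escaped single quote)
Local $sSample = '<p class="a-last">Sorry, we just need to make sure you''re not a robot.</p>'
ConsoleWrite(_IsBlockedPage($sSample) & @CRLF)  ; prints True
```

In a loop over several books, a True result is a signal to stop (or wait a long time) rather than keep requesting pages, since further requests are likely to be blocked as well.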

Edited by frank10

I've used many Chrome extensions for Amazon, and they parse Amazon pages flawlessly...

Do you know how Chrome extensions for Amazon can parse every Amazon page without these annoying CAPTCHAs? Is there a way to make AutoIt + IE behave like a Chrome extension when parsing data?

Edited by frank10

You didn't provide any examples of Chrome extensions, so I can only assume that they are reading the HTML from a page loaded in the browser. If that's correct, then IMO it's no different from what you are doing with the _IE commands. You could use the WebDriver UDF to do the same thing, but it may not be any faster than what you already have using IE.

To avoid the "robot check", you may need to add some random pauses to your script. However, you may want to review their TOS, as I suspect you may be running afoul of it.
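A minimal sketch of adding random pauses between page loads. `_RandomPause` is a hypothetical helper, and the 2-6 second range is an arbitrary assumption, not a value known to avoid Amazon's robot check. Pages are fetched with `InetRead` and a browser User-Agent, as in Chimp's reply.

```autoit
#include <InetConstants.au3>
#include <StringConstants.au3>

; Hypothetical helper: sleep a random whole number of milliseconds in
; [$iMinMs, $iMaxMs] and return how long it slept.
Func _RandomPause($iMinMs, $iMaxMs)
    Local $iMs = Random($iMinMs, $iMaxMs, 1)  ; flag 1 = integer result
    Sleep($iMs)
    Return $iMs
EndFunc

HttpSetUserAgent('Mozilla/5.0')

; Book pages to visit (only the URL from this thread; add your own)
Local $aURLs[1] = ["https://www.amazon.com/Very-Stable-Genius-Testing-America-ebook/dp/B07WQQRMGP/ref=zg_bs_157325011_5?_encoding=UTF8&psc=1&refRID=R5Z6WM52CT1EK1QA91YR"]
Local $sHTML
For $i = 0 To UBound($aURLs) - 1
    $sHTML = BinaryToString(InetRead($aURLs[$i], $INET_FORCERELOAD), $SB_UTF8)
    ; ... extract the data you need from $sHTML here ...
    ConsoleWrite($aURLs[$i] & ": " & StringLen($sHTML) & " chars" & @CRLF)
    ; Pause between requests, but not after the last one (assumed 2-6 s gap)
    If $i < UBound($aURLs) - 1 Then _RandomPause(2000, 6000)
Next
```

Random gaps trade speed for a less regular request pattern, so they work against the goal of roughly 1 second per book.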


For example, there is the KDspy extension, which you launch from an Amazon page that lists 20-50 books.

The extension loads each book's page URL in the background, extracts some data from it, and displays a summary in its window. It takes about 1 second per book.
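For comparison, a rough sketch of the "load each book page in the background" idea in AutoIt, assuming plain HTTP downloads are enough (no IE): `InetGet` with `$INET_DOWNLOADBACKGROUND` returns a handle immediately, so several pages can download at once, and `InetGetInfo` reports when each one is complete. `_FetchAllInBackground`, the `book_N.html` file names and the 30 s timeout are hypothetical choices. Sending many requests at once is likely to trigger the robot check more often, not less, so this only illustrates the mechanism.

```autoit
#include <InetConstants.au3>

; Hypothetical helper: start one background download per URL (at least one URL),
; wait until all finish or $iTimeoutMs passes, and return an array of
; True/False success flags. Pages are saved as $sDir\book_N.html.
Func _FetchAllInBackground(ByRef $aURLs, $sDir, $iTimeoutMs = 30000)
    Local $iCount = UBound($aURLs)
    Local $aHandles[$iCount], $aOK[$iCount]
    For $i = 0 To $iCount - 1
        ; The 4th parameter makes InetGet return a handle immediately
        $aHandles[$i] = InetGet($aURLs[$i], $sDir & "\book_" & $i & ".html", $INET_FORCERELOAD, $INET_DOWNLOADBACKGROUND)
        $aOK[$i] = False
    Next

    Local $hTimer = TimerInit(), $bAllDone
    Do
        Sleep(100)
        $bAllDone = True
        For $i = 0 To $iCount - 1
            If $aHandles[$i] And Not InetGetInfo($aHandles[$i], $INET_DOWNLOADCOMPLETE) Then $bAllDone = False
        Next
    Until $bAllDone Or TimerDiff($hTimer) > $iTimeoutMs

    For $i = 0 To $iCount - 1
        If $aHandles[$i] Then
            $aOK[$i] = InetGetInfo($aHandles[$i], $INET_DOWNLOADSUCCESS)
            InetClose($aHandles[$i])  ; also aborts any download still running
        EndIf
    Next
    Return $aOK
EndFunc

HttpSetUserAgent('Mozilla/5.0')
Local $aURLs[1] = ["https://www.amazon.com/Very-Stable-Genius-Testing-America-ebook/dp/B07WQQRMGP/ref=zg_bs_157325011_5?_encoding=UTF8&psc=1&refRID=R5Z6WM52CT1EK1QA91YR"]
Local $aOK = _FetchAllInBackground($aURLs, @TempDir)
For $i = 0 To UBound($aOK) - 1
    ConsoleWrite($aURLs[$i] & " -> " & ($aOK[$i] ? "ok" : "failed") & @CRLF)
Next
```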

With my IE method I got a similar speed, but now these CAPTCHA pages start appearing... and they appear even if I keep the load wait, which slows it down to 3-4 seconds per book.

I would like to reproduce KDspy's behaviour if possible. Here is a random video, just to see it in action:

But yes, after searching a bit, it seems one is supposed to get the data through their API on AWS instead of scraping the normal HTML pages...

KDspy.jpg
