_INetGetSource doesn't work for this Yahoo URL

BlazerV60 · April 17, 2024

Hello,

I'm creating a program to help me analyze stocks. So a big part of my tool is web scraping.

As of yesterday, _INetGetSource seems to have stopped working for the Yahoo Finance page that shows a single stock's info (the link is in the code below). Here is a simplified version of the code:

#include <Inet.au3>

ConsoleWrite(_INetGetSource('https://finance.yahoo.com/quote/AAPL'))

It's strange because until yesterday it was pulling the source from those pages correctly. _INetGetSource still works for most other Yahoo pages, even the finance home page (finance.yahoo.com), but not for the page that shows a specific stock's info.

Does anyone know why it stopped giving me the source code for those pages?

SOLVE-SMART · April 17, 2024

Hi @BlazerV60,

What does your output look like? What exactly are you searching for?
This site uses iframes on several sections - this might be a problem.

6 hours ago, BlazerV60 said:

So a big part of my tool is web scraping.

Which tool(s) do you use for the web scraping?

Are you only interested in getting the whole page content (source) and then parsing out the data you need?
I guess it could be better to fetch only the expected data directly (e.g. with WebDriver (au3WebDriver project) or with UIA).

6 hours ago, BlazerV60 said:

Does anyone know why it stopped giving me the source code for those pages?

How could we? Only the maintainer(s)/developer(s) of the page know whether anything changed.

I am interested in helping you. So please provide more context 🤝 .

Best regards
Sven

Edited April 17, 2024 by SOLVE-SMART

BlazerV60 · April 18, 2024

The output comes out as weird characters, like symbols, as shown in the attached image.

I'm trying to get the whole page content (source) and then parse the data I need.

I'm just wondering if you know a way to get the source code for that page again. I could use the _IECreate with _IEDocReadHTML method, but that makes my tool run slower because it has to open a hidden browser, take the source code from it, and then close that hidden browser. If there's no solution the _INetGetSource way, then I'll do it.
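For reference, the hidden-browser fallback described above might look roughly like the sketch below. It uses only the standard IE.au3 functions named in the post plus _IEQuit; the helper name _GetPageHtmlViaIE is hypothetical, and whether Internet Explorer can still render this particular page is an assumption that has not been checked here.

```autoit
#include <IE.au3>

; Hypothetical helper (not from the thread): load a page in a hidden
; Internet Explorer instance and return the HTML it ended up with.
Func _GetPageHtmlViaIE($sUrl)
    ; Arguments: URL, try-attach = 0, visible = 0 (hidden), wait for load = 1, take focus = 0
    Local $oIE = _IECreate($sUrl, 0, 0, 1, 0)
    If @error Then Return SetError(1, 0, "")

    Local $sHtml = _IEDocReadHTML($oIE)
    Local $iErr = @error ; save it, because _IEQuit resets @error

    ; Always close the hidden browser so instances do not pile up.
    _IEQuit($oIE)

    If $iErr Then Return SetError(2, 0, "")
    Return $sHtml
EndFunc

Local $sHtml = _GetPageHtmlViaIE("https://finance.yahoo.com/quote/AAPL")
If @error Then
    ConsoleWrite("Could not read the page (error " & @error & ")" & @CRLF)
Else
    ConsoleWrite(StringLen($sHtml) & " characters read" & @CRLF)
EndIf
```

If the startup cost is the main concern, one option is to create the browser object once and call _IENavigate on it for each ticker, so the open/close overhead is paid a single time rather than per stock.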

SOLVE-SMART · April 18, 2024

Several of my questions are still unanswered, @BlazerV60. So I can only answer this ...

7 hours ago, BlazerV60 said:

I'm just wondering if you know a way to get the source code for that page again.

... with a simple yes: use WebDriver or try UIA. Both come up many times here on the forum if you use the search box.

7 hours ago, BlazerV60 said:

I could use the _IECreate with _IEDocReadHTML method, but that makes my tool run slower because it has to open a hidden browser, take the source code from it, and then close that hidden browser [...]

The same would be true for WebDriver or UIA. But in headless mode it's not that bad (it's quick enough, I would say).

Best regards
Sven

BlazerV60 · April 18, 2024

6 hours ago, SOLVE-SMART said:

Several of my questions are still unanswered, @BlazerV60. So I can only answer this ...

... with a simple yes: use WebDriver or try UIA. Both come up many times here on the forum if you use the search box.

The same would be true for WebDriver or UIA. But in headless mode it's not that bad (it's quick enough, I would say).

Best regards
Sven

Yeah, but my tool usually looks at over 100 stocks on any given day, so paying the extra _IECreate lag 100 times does make things slower. Still, it looks like it's the only way for me to go about this.

So I guess the reason _INetGetSource stopped working on the page I referenced is that the page now uses iframes?

argumentum · April 18, 2024

That page is a cluster f... I've looked at it with [attached image] and I have no idea how you did the scraping with just InetRead().

SOLVE-SMART · April 18, 2024

That's what I also thought at first glance at the DOM structure, @argumentum.

1 hour ago, BlazerV60 said:

Yeah, but my tool usually looks at over 100 stocks on any given day, so paying the extra _IECreate lag 100 times does make things slower. Still, it looks like it's the only way for me to go about this.

I understand this, but my guess is that if you scrape only your target information instead of trying to get the whole page, it shouldn't be very slow.
You can also run multiple instances of the chromedriver to do the scraping in "parallel".

Once again, if you could specify which data you need from which page, we could possibly make other/better suggestions.
Besides that, give the au3WebDriver project a chance. For a quick start I refer to this post.
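As a rough idea of what the au3WebDriver route could look like, here is a minimal sketch for a single headless Chrome session. It assumes wd_core.au3 and its dependencies (the JSON and WinHttp UDFs) are on the include path and that a chromedriver.exe matching the installed Chrome sits next to the script. The CSS selector is hypothetical and would have to be checked against the real page.

```autoit
#include "wd_core.au3" ; from the au3WebDriver project

; Assumption: chromedriver.exe is in the script directory.
_WD_Option('Driver', 'chromedriver.exe')
_WD_Option('Port', 9515)
_WD_Option('DriverParams', '--port=9515')

; Ask Chrome to run headless so no browser window is shown.
Local $sCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"args": ["--headless"]}}}}'

_WD_Startup()
Local $sSession = _WD_CreateSession($sCapabilities)
If @error Then
    ConsoleWrite("Could not create a WebDriver session" & @CRLF)
    _WD_Shutdown()
    Exit 1
EndIf

_WD_Navigate($sSession, 'https://finance.yahoo.com/quote/AAPL')

; Hypothetical selector: look up only the element of interest instead of the whole page.
Local $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByCSSSelector, '[data-field="regularMarketPrice"]')
If @error Then
    ConsoleWrite("Element not found, the selector probably needs adjusting" & @CRLF)
Else
    ConsoleWrite("Text: " & _WD_ElementAction($sSession, $sElement, 'text') & @CRLF)
EndIf

; Clean up the session and stop the driver process.
_WD_DeleteSession($sSession)
_WD_Shutdown()
```

Keeping one session open and navigating it to each ticker in turn avoids restarting the driver per stock, which is where most of the per-page overhead would otherwise come from.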

Best regards
Sven

BlazerV60 · April 18, 2024

Sure, I'm specifically only trying to grab the share price from the page, so right now it's showing around $166.

I'll look into au3WebDriver.
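Once the HTML is available by whatever method, pulling out a single value such as the share price can be done with StringRegExp. The sketch below runs on a hard-coded sample string; the markup (a fin-streamer element with data-field and value attributes) is an assumption for illustration, and the real page may be structured differently.

```autoit
; Hypothetical sample markup, assumed for illustration only; the real
; page structure may differ and can change at any time.
Local $sHtml = '<fin-streamer data-field="regularMarketPrice" value="166.23">166.23</fin-streamer>'

; Capture the value attribute of the element tagged as the market price.
; Flag 1 returns an array of captured groups, or sets @error if nothing matches.
Local $aMatch = StringRegExp($sHtml, 'data-field="regularMarketPrice"[^>]*\svalue="([\d.,]+)"', 1)
If @error Then
    ConsoleWrite("Price not found" & @CRLF)
Else
    ConsoleWrite("Price: " & $aMatch[0] & @CRLF)
EndIf
```

Checking @error before reading the array matters here, because a layout change on the page would otherwise surface as an array-access error instead of a clear "not found" message.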

SOLVE-SMART · April 18, 2024

As a completely different approach:
This article might be helpful, though I can't say for sure. It covers using an API (Yahoo Finance) to get the information you want.
https://algotrading101.com/learn/yahoo-finance-api-guide/

💡 I just searched for "yahoo finance api" on Google and found several API ideas.

Best regards
Sven

Edited April 18, 2024 by SOLVE-SMART
