Posted

Hello,

I'm creating a program to help me analyze stocks. So a big part of my tool is web scraping.

As of yesterday, _INetGetSource seems to have stopped working for the Yahoo Finance page that shows a stock's info (the link is in the code below). Here is a simplified version of the code:

#include <Inet.au3>

; Fetch the page source as a string and print it to the console
ConsoleWrite(_INetGetSource('https://finance.yahoo.com/quote/AAPL'))

It's strange because until yesterday it was pulling the source from those pages correctly. _INetGetSource still works for most other Yahoo pages, even the finance home page (finance.yahoo.com), but not for the page that shows a specific stock's info.

Does anyone know why it stopped giving me the source code for those pages? 

Posted (edited)

Hi @BlazerV60,

What does your output look like? What exactly are you searching for?
This site uses iframes in several sections, which might be a problem.
 

  On 4/17/2024 at 1:33 AM, BlazerV60 said:

So a big part of my tool is web scraping.


Which tool(s) do you use for the web scraping?

Are you only interested in getting the whole page content (source) and then parsing the data you need from it?
I guess it could be better to get only the data you need directly, e.g. with WebDriver (au3WebDriver project) or with UIA.

  On 4/17/2024 at 1:33 AM, BlazerV60 said:

Does anyone know why it stopped giving me the source code for those pages? 


How should we know? Only the maintainers/developers of the page know whether anything changed.

I am interested in helping you. So please provide more context 🤝 .

Best regards
Sven

Edited by SOLVE-SMART

==> AutoIt related: 🔗 GitHub, 🔗 Discord Server

Posted

The output comes out as weird characters, like the symbol shown in the attached image.
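One way to narrow down where the odd characters come from is to look at the raw bytes before any string conversion. The sketch below is only a diagnostic idea, not a confirmed fix: it assumes the response might be compressed or in a different encoding than expected, which nobody in this thread has verified. It uses the built-in InetRead instead of _INetGetSource so the bytes can be inspected directly.

```autoit
; Diagnostic sketch: inspect the raw bytes instead of the converted string.
Local Const $sUrl = 'https://finance.yahoo.com/quote/AAPL'

; InetRead returns binary data; option 1 forces a reload from the remote site.
Local $dData = InetRead($sUrl, 1)
If @error Then
    ConsoleWrite('InetRead failed, @error = ' & @error & @CRLF)
    Exit 1
EndIf

ConsoleWrite('Bytes received: ' & BinaryLen($dData) & @CRLF)

; Print the first 16 bytes as a hex string (0x...).
; Readable HTML usually starts with "<" (0x3C); a gzip stream starts with 0x1F8B.
ConsoleWrite('First bytes: ' & String(BinaryMid($dData, 1, 16)) & @CRLF)

; Convert explicitly as UTF-8 (flag 4) and show the start of the text.
ConsoleWrite(StringLeft(BinaryToString($dData, 4), 200) & @CRLF)
```

If the first bytes do not look like text, the problem is in what the server sends back rather than in the parsing step.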

 

I'm trying to get the whole page content (source) and then parse the data I need. 

I'm just wondering if you know a way to get the source code for that page again. I could use _IECreate with _IEDocReadHTML, but that makes my tool slower because it has to open a hidden browser, read the source from it, and then close it again. If there's no solution with _INetGetSource, I'll do it that way.
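For reference, the slower fallback described above could look roughly like this. It is a minimal sketch using the standard IE.au3 UDF; whether Internet Explorer still renders this particular page correctly is an assumption that has not been checked here.

```autoit
#include <IE.au3>

; Sketch of the slower fallback: load the page in a hidden Internet Explorer instance.
Local Const $sUrl = 'https://finance.yahoo.com/quote/AAPL'

; Parameters: URL, $iTryAttach = 0, $iVisible = 0 (hidden), $iWait = 1 (wait for load).
Local $oIE = _IECreate($sUrl, 0, 0, 1)
If @error Then
    ConsoleWrite('_IECreate failed, @error = ' & @error & @CRLF)
    Exit 1
EndIf

; Read the HTML of the loaded document, then close the browser again.
Local $sHtml = _IEDocReadHTML($oIE)
_IEQuit($oIE)

ConsoleWrite(StringLen($sHtml) & ' characters read' & @CRLF)
```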

image.png

Posted

Several questions are still unanswered, @BlazerV60. So I can only answer this ...

  On 4/18/2024 at 1:14 AM, BlazerV60 said:

I'm just wondering if you know a way to get the source code for that page again.


... simply with yes: use WebDriver or try UIA. Both are covered in many threads here on the forum; the search box will find them.

  On 4/18/2024 at 1:14 AM, BlazerV60 said:

I'd do it the _IECreate /w _IEDocReadHTML method but that makes my tool run slower because it technically has to open a hidden browser and then take the source code from that and then close that hidden browser [...]


That would also be the case with WebDriver or UIA. But in headless mode it's not that bad (quick enough, I would say).

Best regards
Sven

Posted
  On 4/18/2024 at 9:12 AM, SOLVE-SMART said:

There are several questions not answered so far @BlazerV60. So I have to answer to this ...

... simply with yes, use the WebDriver or try to use UIA. Both can be found several times here at the forum by the search box.

This would also be the case for using WebDriver or UIA too. But in headless mode it's not that bad (it's quick enough I would say).

Best regards
Sven


Yeah, but my tool usually looks at over 100 stocks on any given day, so 100x the extra lag from _IECreate does make things a little slower. Still, it looks like it's the only way for me to go about this.

So I guess the reason _INetGetSource stopped working on the page I referenced is that the page now uses iframes?
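If the hidden-browser route is used for 100+ stocks, part of the lag can probably be avoided by starting the browser once and reusing it, instead of creating and closing it per stock. The sketch below shows that idea with the standard IE.au3 functions; the ticker list is a hypothetical placeholder, and the actual time saved has not been measured.

```autoit
#include <IE.au3>

; Hypothetical ticker list; replace with the symbols the tool actually tracks.
Local $aTickers[3] = ['AAPL', 'MSFT', 'GOOG']

; Start one hidden browser on a blank page and reuse it for every ticker.
Local $oIE = _IECreate('about:blank', 0, 0, 1)
If @error Then Exit 1

For $sTicker In $aTickers
    ; _IENavigate waits for the page to finish loading by default.
    _IENavigate($oIE, 'https://finance.yahoo.com/quote/' & $sTicker)
    If @error Then
        ConsoleWrite($sTicker & ': navigation failed' & @CRLF)
        ContinueLoop
    EndIf
    Local $sHtml = _IEDocReadHTML($oIE)
    ConsoleWrite($sTicker & ': ' & StringLen($sHtml) & ' characters' & @CRLF)
Next

; Close the browser only once, after all tickers are processed.
_IEQuit($oIE)
```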

Posted

That's what I also thought at first glance at the DOM structure, @argumentum.

  On 4/18/2024 at 3:42 PM, BlazerV60 said:

Yeah but my tool usually looks at over 100 stocks on any given day so 100x the extra lag time on using _IECreate does make things a little slower but it looks like it's the only way for me to go about this. 


I understand this, but my guess is that if you scrape only your target information instead of trying to get the whole page, it shouldn't be very slow.
You can also run multiple instances of chromedriver to do the scraping in "parallel".

Once again, if you could specify which data you need from which page, we could possibly make other/better suggestions.
Besides that, give the au3WebDriver project a chance. For a quick start I refer to this post.

Best regards
Sven

Posted

Sure, I'm specifically only trying to grab the share price from the page, so right now it's showing around $166.
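Once the page source is available as a string, pulling out a single value like the share price can be done with StringRegExp. The sketch below runs against a made-up HTML fragment: the `data-field="regularMarketPrice"` attribute, the surrounding markup, and the value 166.23 are all assumptions for illustration, so the pattern has to be adapted to whatever the real page contains.

```autoit
; Hypothetical sample markup; the real page structure may differ and can change.
Local $sHtml = '<span class="price" data-field="regularMarketPrice">166.23</span>'

; Capture the number that follows the (assumed) data-field attribute.
; Flag 1 returns an array of the captured groups and sets @error if nothing matches.
Local $aMatch = StringRegExp($sHtml, 'data-field="regularMarketPrice"[^>]*>([\d.,]+)<', 1)
If @error Then
    ConsoleWrite('Price not found' & @CRLF)
Else
    ConsoleWrite('Share price: ' & $aMatch[0] & @CRLF)
EndIf
```

Matching on one attribute keeps the pattern short, but it breaks as soon as the site changes its markup, so the "not found" branch is worth keeping.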

 

I'll look into the au3webdriver.

Posted (edited)

As a completely different approach:
This article might be helpful (I haven't verified it). It covers using an API (Yahoo Finance) to get the information you want.
https://algotrading101.com/learn/yahoo-finance-api-guide/

💡 I was just searching for "yahoo finance api" on google ... several API ideas.

Best regards
Sven

Edited by SOLVE-SMART
