I can no longer scrape this page, can you?

https://steamdb.info/app/264710/info/

In my program the ID number is replaced by a variable; for these tests I'm using a single page.

I've been scraping these pages for over a year without issue. Now I can't even use _IEDocReadHTML: it just returns "0". _INetGetSource followed by FileWrite now gives me an empty file, and $WinHttpReq = ObjCreate("WinHttp.WinHttpRequest.5.1") gets me a redirected page that is NOT the page requested.

Some examples that do NOT work for me (and this one worked fine yesterday!):

Quote

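#include <Array.au3>
#include <Inet.au3>
#include <String.au3>

Local $URLSteam = "https://steamdb.info/app/264710/info/"

_INetGetSource($URLSteam) ; return value is not used here
Local $WinHttpReq = ObjCreate("WinHttp.WinHttpRequest.5.1")
If Not @error Then
    $WinHttpReq.Open("GET", $URLSteam, False)
    $WinHttpReq.Send()
    Local $Data = $WinHttpReq.ResponseText

    ; Extract everything between the avatar src attribute and the alt attribute
    Local $Array_32bit_Icons = _StringBetween($Data, 'avatar" src=', '" alt=')
    _ArrayDisplay($Array_32bit_Icons)
EndIf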

This produces no output because the array is empty. Change the _StringBetween parameters to "" and "" and you get data showing a page whose header mentions redirecting robots, followed by some kind of code that I don't recognize.
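
One way to see what the server actually sent back is to print the status code and response headers before parsing anything. The following is only a rough sketch: _ClassifyResponse and _DiagnoseFetch are hypothetical helper names (not part of any UDF), and it assumes the Status, StatusText, and GetAllResponseHeaders members of the WinHttp.WinHttpRequest.5.1 COM object behave as documented by Microsoft. No COM error handler is installed, so a network failure will still stop the script.

```autoit
; Hypothetical helper: classify a response by its status code and by
; whether the marker the scraper depends on appears in the body.
Func _ClassifyResponse($iStatus, $sBody, $sMarker)
    If $iStatus <> 200 Then Return "http-error"
    If StringInStr($sBody, $sMarker) = 0 Then Return "unexpected-page"
    Return "ok"
EndFunc

; Hypothetical helper: fetch a URL, print status and headers to the
; console, and return the classification of the response.
Func _DiagnoseFetch($sURL, $sMarker)
    Local $oHttp = ObjCreate("WinHttp.WinHttpRequest.5.1")
    If @error Then Return SetError(1, 0, "")
    $oHttp.Open("GET", $sURL, False)
    $oHttp.Send()
    ConsoleWrite("Status:  " & $oHttp.Status & " " & $oHttp.StatusText & @CRLF)
    ConsoleWrite("Headers: " & @CRLF & $oHttp.GetAllResponseHeaders() & @CRLF)
    Return _ClassifyResponse($oHttp.Status, $oHttp.ResponseText, $sMarker)
EndFunc
```

Calling _DiagnoseFetch("https://steamdb.info/app/264710/info/", 'avatar" src=') and writing the result to the console would show whether the request was rejected outright or answered with a page other than the one expected.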

Quote

$Install_DIR = @WorkingDir & "\"

Local $URL = _INetGetSource("https://steamdb.info/app/264710/info/")
FileWrite($Install_DIR & "Temp\url.html", $URL)

Results in an empty html file.

Quote

$URLSteam = "https://steamdb.info/app/264710/info/"

Local $sHTML = _IEDocReadHTML($URLSteam)
MsgBox($MB_SYSTEMMODAL, "Document Source", $sHTML)

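This produces the console error "_IEDocReadHTML, $_IESTATUS_InvalidDataType". (That particular error does not depend on the site: _IEDocReadHTML takes a browser object such as the one returned by _IECreate, not a URL string.)

And of course InetRead does NOT read this page, but I already knew that.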

 

Help? This is driving me nuts! I can't even save the file. Wtf am I missing? The page exists and I can view and save it manually, just not with any AutoIt script anymore!

Edited by Strydr

Unsure why your prior code stopped working. Have you recently updated the Windows install?

FWIW, the following works for me:

#include <String.au3>
#include <Array.au3>
#include <WinHttp.au3> ; https://www.autoitscript.com/forum/topic/84133-winhttp-functions/

Local $sResponseText, $iResult = 0
Local $sURL = "https://steamdb.info/app/264710/info/"
Local $Array_32bit_Icons

Local $aURL = _WinHttpCrackUrl($sURL)
Local $hOpen = _WinHttpOpen()

; Get connection handle
Local $hConnect = _WinHttpConnect($hOpen, $aURL[2], $aURL[3])

If @error Then
    $iResult = 1
Else
    $sResponseText = _WinHttpSimpleSSLRequest($hConnect, "GET", $aURL[6])

    If @error Then
        $iResult = 2
    Else
        $Array_32bit_Icons = _StringBetween($sResponseText, 'avatar" src=', '" alt=')
        _ArrayDisplay($Array_32bit_Icons)
    EndIf
EndIf

_WinHttpCloseHandle($hConnect)
_WinHttpCloseHandle($hOpen)

ConsoleWrite("$iResult=" & $iResult & @CRLF)
ConsoleWrite("$sResponseText=" & $sResponseText & @CRLF)

 


Well that works!  Thank you!  Thank you for including the "WinHttp.au3" info!!! 

Now to see if I can figure out what it's doing! :D It really bugs me that such a simple script will no longer work on that site; it's just a public information page!

Edit: no, I haven't had any updates since last night. I did find a reference to MS security certificates expiring. Can anyone else confirm whether the simple commands I was using still work for them? If so, then it's probably an IE setting somewhere.

Edited by Strydr

Hello. This works for me:

#include <Array.au3>
#include <String.au3>

Local $URLSteam = "https://steamdb.info/app/264710/info/"
Local $oHttp = ObjCreate("WinHttp.WinHttpRequest.5.1")
$oHttp.Open("GET", $URLSteam, False)
$oHttp.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 4.0.20506)")
$oHttp.Send()
Local $sData = $oHttp.ResponseText

ConsoleWrite($sData & @CRLF)

Local $aArray_32bit_Icons = _StringBetween($sData, 'avatar" src=', '" alt=')
_ArrayDisplay($aArray_32bit_Icons)

Regards


Yay! That works, too! AND it answers my question of whether I was using "User-Agent" properly: nope! Gots more learnin' to do! Thank you!!

Edit: and it partially answers what's going on: something to do with IE or IE settings.

Edit2: adding that User-Agent line in the appropriate place makes my program work again as well! Freakin' awesome! Thank you again!
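
For anyone landing here later, this is roughly how the fix could be folded into a program where the app ID is a variable. It is a minimal sketch, not the actual program: _SteamDbInfoUrl and _FetchWithUserAgent are hypothetical helper names, the URL pattern is simply the one from the first post, and the User-Agent string is whatever browser-like value you choose to pass in.

```autoit
; Hypothetical helper: build the info-page URL for a given app ID.
Func _SteamDbInfoUrl($iAppID)
    Return "https://steamdb.info/app/" & $iAppID & "/info/"
EndFunc

; Hypothetical helper: GET a page with an explicit User-Agent header.
; Returns the body on HTTP 200; otherwise sets @error and returns "".
Func _FetchWithUserAgent($sURL, $sUserAgent)
    Local $oHttp = ObjCreate("WinHttp.WinHttpRequest.5.1")
    If @error Then Return SetError(1, 0, "")
    $oHttp.Open("GET", $sURL, False)
    $oHttp.SetRequestHeader("User-Agent", $sUserAgent)
    $oHttp.Send()
    ; On a non-200 response, @extended carries the HTTP status code
    If $oHttp.Status <> 200 Then Return SetError(2, $oHttp.Status, "")
    Return $oHttp.ResponseText
EndFunc
```

The returned text can then go through the same _StringBetween call as before. Checking @error after the call makes it obvious when the site answers with something other than the page itself.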

Edited by Strydr