Seminko, posted September 3, 2019 (edited)

I have this code:

```autoit
#include <Inet.au3>
#include <Debug.au3>
#include <Array.au3>

$pageSource = _INetGetSource("https://www.csfd.cz/film/508676-triple-threat/")
If @error Then
    MsgBox(16, "", "inet error")
ElseIf StringLen($pageSource) < 50 Then
    MsgBox(16, "short source", $pageSource)
ElseIf StringLen($pageSource) >= 50 Then
    MsgBox(1, "", StringLen($pageSource)) ; shows 12681 characters
    $arr = StringRegExp($pageSource, '(?is)(?:<h1 itemprop="name">\s*|<li>\s*<img[^>]+>\s*<h3>)(.*?)\s*<', 3)
    If @error Then
        MsgBox(16, "regex error", $pageSource) ; shows only �‹�
    EndIf
    ClipPut($pageSource) ; puts only �‹� into the clip
    MsgBox(1, "", BinaryToString(StringToBinary(_ArrayToString($arr), 1), 4))
EndIf
```

When I check the length of the string returned by _INetGetSource, it is 12681 characters. However, when I put $pageSource into a MsgBox or onto the clipboard, it shows only three characters, '�‹�', and the regex fails, triggering the "regex error" branch. When I open the page manually and copy its source, the regex works. Any ideas what might be causing this?

EDIT: I should add that this happens only for a handful of URLs out of the set, and which ones appears to be random.

Edited September 3, 2019 by Seminko
jchd, posted September 3, 2019

Most of the time the site returns a variable bunch of what look like spurious characters, including runs of NULs (0x00). You should get the page as binary and convert that to a string yourself before use (otherwise the conversion is forced implicitly by the functions you call):

```autoit
#include <Inet.au3>

; Second parameter 0 (False): return the raw binary instead of a string
Local $pageSource = _INetGetSource("https://www.csfd.cz/film/508676-triple-threat/", 0)
If @error Then
    MsgBox(16, "", "inet error")
    Exit
EndIf
; Print the byte count, then the data itself (binary is printed as 0x... hex)
ConsoleWrite(BinaryLen($pageSource) & @LF & $pageSource & @LF)
```

The length varies a lot, and so does the content. Sample successive runs:

```
13319
0x1F8B0800000000000003ED7D5B731B57B6DEB3FC2BF6608E87E40C1AF70B419174C9926C59B6648D45DB734652B...

12947
0x1F8B0800000000000003ED7D49931B4796E699FC152E54AB32B384C0BE24929929E326519448B1446A2991B4340...

12592
0x1F8B0800000000000003ED7D5B731B4796E6B3F42BB2D1E3263946E17E2128920E5D6CCBB225AB2D4AEEB6A4602...

13568
0x1F8B0800000000000003ED9D4B731C579698D7D4AFC8AE1935C0E97A3F0190808222A5564B22C51129A9A745062...
```

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. 
Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Seminko (author), posted September 4, 2019 (edited)

Thanks @jchd. Have you ever encountered something like this? Any idea what might be causing it, and whether it can be avoided? The first thing that comes to mind is to try a plain GET request.

EDIT: WinHTTP shows the same behavior, and a basic curl GET doesn't return anything.

Edited September 4, 2019 by Seminko
jchd, posted September 4, 2019

I've no idea. Maybe compressed content or a picture or... who knows? Browsers know how to handle that, but only web gurus can tell.
Seminko (author), posted September 4, 2019

Alright, thanks anyway!
Seminko (author), posted September 4, 2019

I think I figured it out. The response is most likely gzipped. I'm not sure how to "unzip" it with _INetGetSource, but I do know how to do that with curl.
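A minimal sketch of that idea, under a few assumptions: curl.exe is on the PATH (recent Windows 10 and 11 builds ship it; otherwise point the command at your own copy), the server labels the body with a Content-Encoding: gzip header, and the page is UTF-8. jchd's dumps all begin with the bytes 1F 8B, the gzip magic number, so the sketch fetches the raw binary with `_INetGetSource($sUrl, False)`, checks for that signature, and only when it is present re-requests the page through curl with `--compressed`, which makes curl ask for a compressed response and decode it before saving. The helper names `_IsGzip`, `_CurlGetText` and `_GetPageText` are hypothetical.

```autoit
#include <File.au3>
#include <FileConstants.au3>
#include <Inet.au3>

; True when $bData begins with the gzip magic number 0x1F 0x8B (RFC 1952).
; String() renders binary as "0x" followed by hex digits.
Func _IsGzip($bData)
    Return StringLeft(String($bData), 6) = "0x1F8B"
EndFunc   ;==>_IsGzip

; Hypothetical helper: downloads $sUrl with curl.exe (assumed to be on the PATH).
; --compressed makes curl request gzip/deflate and decode the body itself,
; which relies on the server sending a Content-Encoding header.
Func _CurlGetText($sUrl)
    Local $sTmp = _TempFile()
    Local $iExit = RunWait('curl.exe --silent --compressed -o "' & $sTmp & '" "' & $sUrl & '"', "", @SW_HIDE)
    If @error Or $iExit <> 0 Then
        FileDelete($sTmp)
        Return SetError(1, $iExit, "")
    EndIf
    ; Read the saved body as UTF-8 text (assumed page encoding).
    Local $hFile = FileOpen($sTmp, $FO_READ + $FO_UTF8_NOBOM)
    Local $sText = FileRead($hFile)
    FileClose($hFile)
    FileDelete($sTmp)
    Return $sText
EndFunc   ;==>_CurlGetText

; Hypothetical drop-in for _INetGetSource: fetch raw bytes first and only fall
; back to curl when the response turns out to be gzipped.
Func _GetPageText($sUrl)
    Local $bRaw = _INetGetSource($sUrl, False) ; False = return binary, not a string
    If @error Then Return SetError(1, 0, "")
    If Not _IsGzip($bRaw) Then Return BinaryToString($bRaw, 4) ; 4 = UTF-8
    Local $sText = _CurlGetText($sUrl)
    Return SetError(@error, @extended, $sText)
EndFunc   ;==>_GetPageText
```

It would replace the original call as `$pageSource = _GetPageText("https://www.csfd.cz/film/508676-triple-threat/")`. Re-requesting through curl costs a second round trip; calling `_CurlGetText` for every URL avoids that, at the price of always depending on curl.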
jchd, posted September 4, 2019

Probably, but what's strange is that the received binary is always one of the 4 variants I dumped above. Maybe it's a hidden ad or something.
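For what it's worth, the part the dumps have in common is the fixed 10-byte gzip member header from RFC 1952: 1F 8B (magic number), 08 (compression method 8, deflate), 00 (no flags), 00000000 (no modification time), 00 (no extra-flags hint) and 03 (OS = Unix). The bytes after that are the deflate-compressed page, which is why they differ from run to run. The sketch below decodes that header; `_ByteAt`, `_GzipHeaderInfo` and the output format are made up for illustration.

```autoit
; Hypothetical helper: returns byte $iIndex (1-based) of a binary value as a number.
Func _ByteAt($bData, $iIndex)
    ; String() gives "0x" + two hex digits per byte, so byte n starts at 3 + 2*(n-1).
    Return Dec(StringMid(String($bData), 3 + ($iIndex - 1) * 2, 2))
EndFunc   ;==>_ByteAt

; Decodes the fixed 10-byte gzip member header (RFC 1952, section 2.3).
Func _GzipHeaderInfo($bData)
    If BinaryLen($bData) < 10 Or _ByteAt($bData, 1) <> 0x1F Or _ByteAt($bData, 2) <> 0x8B Then
        Return SetError(1, 0, "")
    EndIf
    ; MTIME is a 4-byte little-endian Unix timestamp; 0 means "not set".
    Local $iMTime = _ByteAt($bData, 5) + _ByteAt($bData, 6) * 0x100 + _ByteAt($bData, 7) * 0x10000 + _ByteAt($bData, 8) * 0x1000000
    Return "CM=" & _ByteAt($bData, 3) & " FLG=" & _ByteAt($bData, 4) & " MTIME=" & $iMTime & _
            " XFL=" & _ByteAt($bData, 9) & " OS=" & _ByteAt($bData, 10)
EndFunc   ;==>_GzipHeaderInfo

; Header bytes shared by all four dumps above.
ConsoleWrite(_GzipHeaderInfo(Binary("0x1F8B0800000000000003ED7D")) & @LF) ; prints CM=8 FLG=0 MTIME=0 XFL=0 OS=3
```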