vinnyMS1 (Author) Posted July 6, 2022
help plz
Melba23 (Moderator) Posted July 6, 2022
vinnyMS1,
You have already been told not to bump your own threads within 24 hrs - please do not do it again.
M23
pixelsearch Posted July 6, 2022
Does InetRead work on any website? I'm not really sure of that. If it doesn't work on the required website, then one way to do it is to write this kind of automated script:
1) Get the 1st link from the text file and copy it to the clipboard: https://www.example.org/list/?page=1
2) Activate the browser's window and paste the clipboard into the browser's URL box (the browser should be opened before the script is run), then send Enter to display the web page.
3) When the web page is fully displayed, send Ctrl+U to get the HTML source opened in a new tab of the browser (Ctrl+U works with Chrome, Firefox, Opera...).
4) Ctrl+A to select the whole source of the web page, then Ctrl+C to copy the source to the clipboard.
5) Use the regex part to retrieve the desired links from the clipboard (i.e. from the source).
6) Write those links to your output text file.
7) Close the tabs in the browser.
8) Loop back to 1) for the next link: https://www.example.org/list/?page=2
It's not as simple as InetRead, but it works if accurate Sleep() calls are added to the script, giving the pages time to load, etc. But you really need to check every part of the process precisely, adding tests and error checking to make sure everything goes fine and nothing hangs. As you have 374 links to check, it's worth spending the time to write the script, which will loop 374 times and repeat all parts from 1) to 8) automatically. The script may also help you later when you reuse it... and it's a good/fun way to learn AutoIt! Good luck
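A rough, untested sketch of this approach - the window class (here Firefox), the file names, the Sleep() delays and the member-link regex are all placeholders to adapt, and duplicates are not filtered here:

$aLinks = FileReadToArray("links.txt")          ; one url per line
If @error Then Exit MsgBox(16, "Error", "Could not read links.txt")
$hOut = FileOpen("results.txt", 1)              ; 1 = append mode

For $i = 0 To UBound($aLinks) - 1
    ClipPut($aLinks[$i])                        ; 1) url -> clipboard
    WinActivate("[CLASS:MozillaWindowClass]")   ; 2) activate the browser window
    Send("^l")                                  ;    focus the address bar
    Send("^v{ENTER}")                           ;    paste the url and load the page
    Sleep(5000)                                 ;    give the page time to load
    Send("^u")                                  ; 3) open the html source in a new tab
    Sleep(2000)
    Send("^a^c")                                ; 4) select all + copy the source to the clipboard
    Sleep(500)
    $sSource = ClipGet()
    $aItems = StringRegExp($sSource, 'forum/members/(\w+\.\d+)', 3)   ; 5) extract the links
    If Not @error Then
        For $k = 0 To UBound($aItems) - 1
            FileWriteLine($hOut, $aItems[$k])   ; 6) write them to the output file
        Next
    EndIf
    Send("^w")                                  ; 7) close the source tab
    Sleep(500)
Next                                            ; 8) next link
FileClose($hOut)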
mikell Posted July 6, 2022 (edited)
Hmmm, yes. The main problem is the way you can get the source of the page. InetRead doesn't always work indeed, so you can use the way pixelsearch mentioned, or use curl, etc. After that the regex must obviously be adapted to fit the search for the required data.
For example, using a txt file containing the source of the page 1 link you provided, this code works for me:

;https://www.lomcn.org/forum/members/list/?page=1
#Include <Array.au3>

$txt = FileRead("site_page1.txt") ; source
$list = ""
$items = StringRegExp($txt, 'forum/members/(\w+\.\d+)', 3)
$items = _ArrayUnique($items)
For $k = 0 to UBound($items)-1
    $list &= $items[$k] & @crlf
Next
Msgbox(0,"", $list)
;FileWrite("results.txt", $list)

Edit: @pixelsearch please try this

;https://www.lomcn.org/forum/members/list/?page=1
#Include <Array.au3>

$out = _XP_Read("https://www.lomcn.org/forum/members/list/?page=1")
;ConsoleWrite($out & @crlf)
$list = ""
$items = StringRegExp($out, 'forum/members/(\w+\.\d+)', 3)
$items = _ArrayUnique($items)
For $k = 0 to UBound($items)-1
    $list &= $items[$k] & @crlf
Next
Msgbox(0,"", $list)
;FileWrite("results.txt", $list)

Func _XP_Read($url)
    Local $cmd = "curl -L -s -k -A 'Mozilla/4.0(compatible;MSIE7.0;WindowsNT5.1)' " & $url
    Local $iPID = Run($cmd, "", @SW_HIDE, 2) ; 2 = $STDOUT_CHILD
    ProcessWaitClose($iPID)
    Local $output = StdoutRead($iPID)
    Return $output
EndFunc

I love curl
Edited July 6, 2022 by mikell
pixelsearch Posted July 6, 2022
@mikell I've spent the last couple of hours dealing with curl (I never used it before and discovered it in your last post, before you edited it). So I went to the curl website, trying to find the right version to download for Windows. Finally I got it working, thanks to the syntax you described in this post, and it worked on the test link provided yesterday by the OP (as InetRead didn't make it for me): https://www.lomcn.org/forum/members/list/?page=1
The HTML source code filled the AutoIt console, no error!
Thanks also to this post, where "VIP (I'm trong)" had a problem with InetGet and Error 13, which was nicely solved by using HttpSetUserAgent - this could help users sometimes.
Only now do I see your edited post above (which will definitely give a correct result, as it already worked for me using the syntax in the link I mentioned above).
Guess I'm starting to like curl too, it's never too late to learn something new!
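For reference, a minimal sketch of the HttpSetUserAgent idea mentioned above - the agent string and url are just example values, and whether this is enough depends on the site:

HttpSetUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
$bData = InetRead("https://www.lomcn.org/forum/members/list/?page=1", 1) ; 1 = force a fresh download
If @error Then
    ConsoleWrite("InetRead failed, @error = " & @error & @CRLF)
Else
    $sHtml = BinaryToString($bData, 4)              ; 4 = UTF-8
    ConsoleWrite(StringLeft($sHtml, 500) & @CRLF)   ; show the start of the source
EndIf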
pixelsearch Posted July 6, 2022
@mikell great job, this is the result I got: [screenshot of the results]
vinnyMS1 (Author) Posted July 6, 2022 (edited)
Very good. How do I get multiple pages? With the latest version there's only one page address (page 1). I have this:

;https://www.lomcn.org/forum/members/list/?page=1
#Include <Array.au3>

$base_url = "https://www.lomcn.org/forum/members/list/"
For $i = 1 to 374
    $out = _XP_Read($base_url & "?page=" & $i)
    ;ConsoleWrite($out & @crlf)
    $list = ""
    $items = StringRegExp($out, '\Q' & $base_url & '\E(\w+\.\d+)', 3)
    $items = _ArrayUnique($items)
    For $k = 0 to UBound($items)-1
        $list &= $items[$k] & @crlf
    Next
    FileWrite("results.txt", $list)
    ;FileWrite("results.txt", $list)
Next

Func _XP_Read($url)
    Local $cmd = "curl -L -s -k -A 'Mozilla/4.0(compatible;MSIE7.0;WindowsNT5.1)' " & $url
    Local $iPID = Run($cmd, "", @SW_HIDE, 2) ; 2 = $STDOUT_CHILD
    ProcessWaitClose($iPID)
    Local $output = StdoutRead($iPID)
    Return $output
EndFunc

Edited July 6, 2022 by vinnyMS1
mikell Posted July 7, 2022
@vinnyMS1 Hey, you may think a little. If you don't try to understand what you read, you will never learn anything.
As written in my "roadmap" code and as pixelsearch said, you have to loop through the numbered pages. This works for me to process the first 3 pages:

#Include <Array.au3>

$base_url = "https://www.lomcn.org/"
$sub_url = "forum/members/"
$list = ""
For $i = 1 to 3
    $out = _XP_Read($base_url & $sub_url & "list/?page=" & $i)
    ;ConsoleWrite($out & @crlf)
    $items = StringRegExp($out, $sub_url & '(\w+\.\d+)', 3)
    $items = _ArrayUnique($items, 0, 0, 0, 0)
    For $k = 0 to UBound($items)-1
        $list &= $items[$k] & @crlf
    Next
Next
Msgbox(0,"", $list)
;FileWrite("results.txt", $list)

Func _XP_Read($url)
    Local $cmd = "curl -L -s -k -A 'Mozilla/4.0(compatible;MSIE7.0;WindowsNT5.1)' " & $url
    Local $iPID = Run($cmd, "", @SW_HIDE, 2) ; 2 = $STDOUT_CHILD
    ProcessWaitClose($iPID)
    Local $output = StdoutRead($iPID)
    Return $output
EndFunc
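If the loop is extended to all 374 pages, it may also be worth adding a simple retry and a pause between requests - a sketch along these lines (retry count and delay are arbitrary), to be called in place of _XP_Read() directly:

Func _XP_ReadRetry($url, $iTries = 3)
    For $n = 1 To $iTries
        Local $out = _XP_Read($url)
        If StringLen($out) > 0 Then Return $out
        Sleep(2000)                 ; wait a little before retrying
    Next
    Return SetError(1, 0, "")       ; give up after $iTries attempts
EndFunc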
pixelsearch Posted July 10, 2022
@mikell just to let you know the minimal switches I just tried for a successful file download using curl:

Local $url = "https://www.autoitscript.com/autoit3/scite/download/Au3Stripper.zip"
Local $cmd = "C:\curl\curl.exe -O " & $url
Local $iPID = Run($cmd, "", @SW_HIDE, 2) ; 2 = $STDOUT_CHILD
ProcessWaitClose($iPID)
Local $output = StdoutRead($iPID)

The script downloads Au3Stripper.zip and saves it under the same name in the script folder because of the -O switch:
-O, --remote-name : Write output to a local file named like the remote file we get [...]
But if we had to download this file:

Local $url = "https://www.autoitscript.com/cgi-bin/getfile.pl?autoit3/autoit-v3.zip"

then it would download only a 1 KB zip file (instead of 17 MB!) because of the missing -L switch. In this case, the proper syntax should be:

Local $cmd = "C:\curl\curl.exe -O -L " & $url

So it always seems good to include the -L switch (as you indicated), no matter the url:
-L, --location : (HTTP) If the server reports that the requested page has moved to a different location [...] this option will make curl redo the request on the new place.
I avoided the -k (--insecure) switch after I read this in the help file: "WARNING: using this option makes the transfer insecure."
Without the -s/--silent switch, we can see nice progress lines in the AutoIt console, great.
Finally, the -A/--user-agent <agent string> switch will certainly be useful depending on the site, but I just wanted to test the minimal mandatory switches needed for a successful download. Both versions of curl did it: the old "curl_7_46_0_openssl_nghttp2_x86" I told you about, and the recent "curl-7.83.1_7-win32-mingw.zip" (May 2022).
Also, as you guessed, I found the following info on the official curl website: "Microsoft ships curl too: curl is also shipped by Microsoft as part of Windows 10 and 11."
You definitely knew all this but it's very new for me! Have a great Sunday and thanks for making us discover curl.
mikell Posted July 10, 2022
9 hours ago, pixelsearch said: "I just wanted to test the minimal mandatory switches needed for a successful download."
I got some failures when using curl without the -k and/or -A switches... For convenience I use a little UDF with curl 'read' and 'get' functions inside, and I need them to work in as many cases as possible. That's the reason why I also use -o instead of -O (a matter of versatility).
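For completeness, the -o variant mikell mentions lets the local file name be chosen explicitly, roughly like this (the curl path and output path are example values):

$url = "https://www.autoitscript.com/autoit3/scite/download/Au3Stripper.zip"
$cmd = 'C:\curl\curl.exe -L -s -o "C:\temp\Au3Stripper.zip" ' & $url
$iPID = Run($cmd, "", @SW_HIDE, 2) ; 2 = $STDOUT_CHILD
ProcessWaitClose($iPID)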