vinnyMS1 Posted July 4, 2022

I have a list of links in a text file:

https://www.example.org/list/?page=2
https://www.example.org/list/?page=3
https://www.example.org/list/?page=4
https://www.example.org/list/?page=5
etc.

Each page has 20 clickable links, and those links are what I'm after, for example https://www.example.org/list/item.9308/

From each page I want to extract the last part of each address ("item.9308") and save it as item:9308 in a text file. I have a list of 1000 page links, and there are 20000 item links on them. It can work with regex: it should read the txt file I attached and save the items and their numbers to a text file as a list, like in the screenshot.

Attached: links sources.txt

vinnyMS1 Posted July 4, 2022

Please help.

Jos (Developers) Posted July 4, 2022

In a rush or something? It would be polite to wait at least 24 hours before bumping your question. In the meantime, maybe share what you have already, or are you expecting the code to be served on a silver platter? 😉

vinnyMS1 Posted July 4, 2022

I can describe programs, but I don't program. I can't.

Jos (Developers) Posted July 4, 2022

20 minutes ago, vinnyMS1 said: I can describe programs, but I don't program. I can't.

Well, you are in an AutoIt3 support forum, not a "make the code for free for me" forum. So come back when you have an actual AutoIt3 question.

vinnyMS1 Posted July 4, 2022

How do I read a file line by line, determine that the lines are web page links, make the script visit each of those pages, and extract with regex the links that end in a word, a period and a number?

In conclusion, the AutoIt questions are:

1. How do I read text lines from a text file?
2. How do I visit each link detected and find the 20 links on that page, based on how those links end, like https://www.example.org/list/item.9308/?
3. How do I extract with regex the links that end in a string, a period and a number?
4. How do I write the last part of each link to a text file as string, colon and number?

For https://www.example.org/list/item.9308/ I want to extract the last part of the address ("item.9308") and save it as item:9308, etc.

Danp2 Posted July 4, 2022

1. FileReadToArray is one option.
2. It depends on the browser involved.
3. Ever heard of a help file? Or maybe use the forum's search functionality?
4. See above answer.

P.S. Show a little effort (meaning learn the language and write some code) and you're likely to receive further help 😏
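
For reference, a minimal sketch of the FileReadToArray option for question 1. It assumes the list file holds one URL per line; the function name _ReadLinkLines is hypothetical, and the file name is the one from the first post.

```autoit
; Returns a 0-based array of the lines in $sFile that look like web links.
; On failure sets @error (1 = file unreadable or empty, 2 = no links found).
Func _ReadLinkLines($sFile)
    Local $aLines = FileReadToArray($sFile)
    If @error Then Return SetError(1, 0, "")
    Local $sKept = ""
    For $sLine In $aLines
        $sLine = StringStripWS($sLine, 3) ; 3 = strip leading and trailing whitespace
        If StringRegExp($sLine, "(?i)^https?://") Then $sKept &= $sLine & @LF
    Next
    If $sKept = "" Then Return SetError(2, 0, "")
    Return StringSplit(StringTrimRight($sKept, 1), @LF, 2) ; 2 = no count in element 0
EndFunc   ;==>_ReadLinkLines

; Usage: assumes "links sources.txt" sits next to the script.
Global $aLinks = _ReadLinkLines(@ScriptDir & "\links sources.txt")
If @error Then
    ConsoleWrite("No links could be read" & @CRLF)
Else
    For $sLink In $aLinks
        ConsoleWrite($sLink & @CRLF)
    Next
EndIf
```

Filtering on the "http://" or "https://" prefix is only one way to decide that a line is a web link; a stricter check may be needed for real data.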

vinnyMS1 Posted July 4, 2022

I have this regex:

    $aWords = StringRegExp(FileRead($aFiles[$i]), "(?mi)^\s*(@.*)$", 3)

How do I make it capture item.9308, meaning a string, a period and a number, everywhere it occurs?

Danp2 Posted July 4, 2022

I am not a regex guru, but this is closer than your poor attempt:

    (?mi)\/(item\..*)\/
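
As an illustration of the regex step, here is a sketch that captures any "name.number" segment at the end of a link and rewrites it as "name:number". The pattern and the function name _ExtractItems are assumptions, not taken from the thread; the sketch assumes each link is followed by a quote, whitespace or the end of the text, as in typical HTML.

```autoit
; Returns a 0-based array of "name:number" strings found in $sText.
; Sets @error to 1 if nothing matches.
Func _ExtractItems($sText)
    ; Group 1 = "name.number" right after a slash, followed by an optional
    ; trailing slash and then a quote, whitespace or the end of the text.
    Local $aHits = StringRegExp($sText, '/([\w-]+\.\d+)/?(?=["''\s]|$)', 3)
    If @error Then Return SetError(1, 0, "")
    For $i = 0 To UBound($aHits) - 1
        $aHits[$i] = StringReplace($aHits[$i], ".", ":") ; item.9308 -> item:9308
    Next
    Return $aHits
EndFunc   ;==>_ExtractItems

; Usage on a made-up HTML fragment.
Global $sSample = '<a href="https://www.example.org/list/item.9308/">x</a> <a href="/list/other-thing.17">y</a>'
Global $aItems = _ExtractItems($sSample)
If Not @error Then
    For $sItem In $aItems
        ConsoleWrite($sItem & @CRLF) ; item:9308, then other-thing:17
    Next
EndIf
```

Unlike `.*`, the character class `[\w-]+` cannot run across further slashes, so several links on one line are matched separately.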

Subz Posted July 4, 2022

You could just use String functions, for example:

    Global $g_iLinks, $g_aLinks[] = ["https://www.example.org/list/item.9308/","https://www.example.org/list/item.9309","https://www.example.org/list/item.9310","https://www.example.org/list/item.9311","https://www.example.org/list/item.9312"]
    For $i = 0 To UBound($g_aLinks) - 1
        ; Count slashes from the right: skip one extra slash if the link ends with "/"
        $g_iLinks = StringRight($g_aLinks[$i], 1) = "/" ? -2 : -1
        ConsoleWrite(StringTrimLeft($g_aLinks[$i], StringInStr($g_aLinks[$i], "/", 0, $g_iLinks)) & @CRLF)
    Next

vinnyMS1 Posted July 4, 2022

I don't have the links I'm trying to extract (item.9308). I only have the links (https://www.example.org/list/?page=1) to the pages that contain the extractable links (item.9308): 1 link, 20 extractions.

I don't know how the links end. It could be any random string with a period and any number.

vinnyMS1 Posted July 4, 2022

I have this complete code that needs only a regex and a way to browse all the links from "links sources.txt":

    #include <File.au3>
    #include <Array.au3>
    #include <MsgBoxConstants.au3>
    #include <IE.au3>

    _Example()

    Func _Example()
        ; Error monitoring. This will trap all COM errors while alive.
        ; This particular object is declared as local, meaning after the function returns it will not exist.
        Local $oErrorHandler = ObjEvent("AutoIt.Error", "_ErrFunc")
        Local $oIE = _IE_Example("basic")
        Local $aWordst = _IEBodyReadText($oIE)
        Local $oDictionary = ObjCreate("Scripting.Dictionary")
        Local $mypath = @ScriptDir
        Local $aFiles = _FileListToArrayRec($mypath, "links sources.txt", 1, 1)
        If @error Then
            MsgBox($MB_SYSTEMMODAL, "Error", "No files found")
            Exit
        Else
            MsgBox($MB_SYSTEMMODAL, "Found", $aFiles[0] & " files")
        EndIf
        Local $aWords
        For $i = 1 To $aFiles[0]
            $aWords = StringRegExp(FileRead($aFiles[$i]), "(?mi)^\s*(@.*)$", 3) ; change pattern to fit your definition of "word"
            Local $iError = @error
            If $iError = 0 Then
                For $Word In $aWords
                    $oDictionary.add($Word, $Word)
                Next
            Else
                ;;MsgBox($MB_SYSTEMMODAL, "Error", $aFiles[$i] & " - " & $i & @CRLF & "error: " & $iError)
            EndIf
        Next
        $aWords = $oDictionary.Items
        FileWrite("saved result2.txt", _ArrayToString($aWords, @CRLF))
    EndFunc   ;==>_Example

    ; User's COM error function. Will be called if COM error occurs
    Func _ErrFunc($oError)
        ; Do anything here.
        ConsoleWrite(@ScriptName & " (" & $oError.scriptline & ") : ==> COM Error intercepted !" & @CRLF & _
                @TAB & "err.number is: " & @TAB & @TAB & "0x" & Hex($oError.number) & @CRLF & _
                @TAB & "err.windescription:" & @TAB & $oError.windescription & @CRLF & _
                @TAB & "err.description is: " & @TAB & $oError.description & @CRLF & _
                @TAB & "err.source is: " & @TAB & @TAB & $oError.source & @CRLF & _
                @TAB & "err.helpfile is: " & @TAB & $oError.helpfile & @CRLF & _
                @TAB & "err.helpcontext is: " & @TAB & $oError.helpcontext & @CRLF & _
                @TAB & "err.lastdllerror is: " & @TAB & $oError.lastdllerror & @CRLF & _
                @TAB & "err.scriptline is: " & @TAB & $oError.scriptline & @CRLF & _
                @TAB & "err.retcode is: " & @TAB & "0x" & Hex($oError.retcode) & @CRLF & @CRLF)
    EndFunc   ;==>_ErrFunc

mikell Posted July 5, 2022

So simple...

Use InetRead to get the source code of the numbered pages. You don't need a txt file, just use a For/Next loop. Then use a regex to extract the data you want from these texts.

It could be something like this (obviously untested, and raw: no error checking etc.):

    $list = ""
    $base_url = "https://www.example.org/list/"
    For $i = 1 to 374
        ; InetRead returns binary data, so convert it to text before applying the regex
        $txt = BinaryToString(InetRead($base_url & "?page=" & $i))
        $items = StringRegExp($txt, '\Q' & $base_url & '\E(\w+\.\d+)', 3)
        For $k = 0 to UBound($items)-1
            $list &= $items[$k] & @crlf
        Next
    Next
    FileWrite(".\results.txt", $list)

vinnyMS1 Posted July 5, 2022

2 hours ago, mikell said: [full quote of the previous post, including the code]

What do I add at $list = ""?

mikell Posted July 5, 2022

As I don't know the site you are dealing with, the code I provided is nothing but a roadmap. You have to understand what the various instructions mean, so have a look at the documentation.

To test, try first on page 1 only, using this:

    $list = ""
    $base_url = "https://www.example.org/list/"
    ; InetRead returns binary data, so convert it to text before applying the regex
    $txt = BinaryToString(InetRead($base_url & "?page=1"))
    $items = StringRegExp($txt, '\Q' & $base_url & '\E(\w+\.\d+)', 3)
    For $k = 0 to UBound($items)-1
        $list &= $items[$k] & @crlf
    Next
    FileWrite("results.txt", $list)

If it works as intended, then try the next step on several pages. If it doesn't, then the helpfile is your best friend.
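
One detail matters when re-running such a test: FileWrite called with a file name appends to an existing file, so results from earlier runs pile up. The sketch below opens the file in overwrite mode instead. The function name _SaveItems and the sample data are hypothetical.

```autoit
#include <FileConstants.au3>

; Writes the elements of $aItems to $sFile, one per line, replacing old content.
; Returns True on success; sets @error to 1 if the file cannot be opened.
Func _SaveItems($sFile, $aItems)
    Local $hFile = FileOpen($sFile, $FO_OVERWRITE)
    If $hFile = -1 Then Return SetError(1, 0, False)
    For $sItem In $aItems
        FileWriteLine($hFile, $sItem) ; adds the line break itself
    Next
    FileClose($hFile)
    Return True
EndFunc   ;==>_SaveItems

; Usage with made-up sample data.
Global $aDemo[2] = ["item:9308", "item:9309"]
If Not _SaveItems(@ScriptDir & "\results.txt", $aDemo) Then
    ConsoleWrite("Could not open results.txt for writing" & @CRLF)
EndIf
```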

vinnyMS1 Posted July 5, 2022 (edited)

It waits 1 minute, then creates an empty results.txt. It is the same with the other version.

Edited July 5, 2022 by vinnyMS1

Melba23 (Moderators) Posted July 5, 2022

vinnyMS1,

When you reply in future, please use the "Reply to this topic" button at the top of the thread or the "Reply to this topic" editor at the bottom rather than the "Quote" button. Responders know what they wrote, and it just pads the thread unnecessarily.

Thanks in advance for your cooperation.

M23

mikell Posted July 5, 2022

The 1 minute delay probably means that InetRead doesn't work. To say more I need to test, so please provide the name of the site.
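
A failing InetRead can be told apart from a regex that finds nothing by checking @error right after the call. The following is a minimal sketch, assuming the page is UTF-8 encoded; the function name _FetchPage is hypothetical and the URL is the placeholder used earlier in the thread.

```autoit
; Downloads $sUrl and returns its source as text; sets @error to 1 on failure.
Func _FetchPage($sUrl)
    Local $dData = InetRead($sUrl, 1) ; 1 = force a reload instead of using the cache
    If @error Then Return SetError(1, 0, "")
    Return BinaryToString($dData, 4) ; 4 = treat the bytes as UTF-8 (an assumption)
EndFunc   ;==>_FetchPage

Global $sUrl = "https://www.example.org/list/?page=1"
Global $sHtml = _FetchPage($sUrl)
If @error Then
    ConsoleWrite("InetRead failed for " & $sUrl & @CRLF)
Else
    ConsoleWrite("Got " & StringLen($sHtml) & " characters" & @CRLF)
EndIf
```

If the failure message appears, the empty results.txt comes from the download step, not from the pattern.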

vinnyMS1 Posted July 5, 2022

I found a random website: https://www.lomcn.org/forum/members/list/?page=1