babyjoe Posted November 1, 2006 (edited)

Hello, I would appreciate some advice on my project.

Purpose: I have to make a crawler that reads one single web page and visits every link on that page. For each visited link, I just need to copy and paste the page content into a DB file, then go back to the central page and visit the next link.

I have some mediocre programming skills in Java, but have just discovered AutoIt. It seems to me that the simplest approach is to send keystrokes to the browser, especially for the copy/paste.

However, this is my problem: how can I direct the browser to the next link? 'Tab' doesn't work as an advancing keystroke, so I do not know how to extract the links. Should I try to find out which pixels are blue (only the hyperlinks are blue) and click those pixels?

Has anyone had a similar project and some advice to share? Thanks!

Edited November 1, 2006 by babyjoe
babyjoe (Author) Posted November 1, 2006

I forgot to mention something significant: it is an HTTPS site, so you cannot save it, and I think keystrokes are the only option. Is there any other browser helper software around that can automatically select the next link?
_Kurt Posted November 1, 2006 (edited)

So what you want to do is:
- Browse through X pages (or links)

Question: Are the links preset? Or is it more like: browse through www.example.com\ plus any related pages (not preset), i.e. www.example.com, www.example.com\page1, www.example.com\page2, etc.

Edited November 1, 2006 by _Kurt

Awaiting Diablo III..
babyjoe (Author) Posted November 1, 2006 (edited)

Browse through all links on a page (they do not link to other physical pages, they are DB generated).

I have come up with the following, but there is one caveat. I used Firefox, because you can use the cursor to select text. However, when you push the back button, you are back on the page but the link you clicked is not selected (in IE, when you go back, the last link you clicked is the active item). If I could make FF select the last visited link, I could keep using the down key to go to the next link. As it is, I have to keep a counter that tracks how many times I went down, and add 1.

    ; Press F9 to process the next link
    HotKeySet('{F9}', 'Zoek')

    ; Idle loop: keeps the script alive until the hotkey fires
    While 1
        Sleep(100)
    WEnd

    Func Zoek()
        Opt("WinTitleMatchMode", 2) ; match any substring of the window title
        ; Move down to the next link and open it
        Send("{DOWN}")
        Sleep(50)
        Send("{ENTER}")
        Sleep(50)
        Send("^F")
        Sleep(50)
        ; Select the page content and copy it to the clipboard
        Send("^a")
        Sleep(50)
        Send("{SHIFTDOWN}")
        Sleep(50)
        Send("{END}")
        Sleep(50)
        Send("{SHIFTUP}")
        Sleep(50)
        Send("^c")
        Sleep(50)
        ; Paste into WordPad, then return to the browser
        WinActivate("WordPad")
        Send("^v")
        Sleep(50)
        Send("{ENTER}")
        Sleep(50)
        WinActivate("Firefox")
    EndFunc

Edited November 1, 2006 by babyjoe
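The counter idea mentioned in the post above could look roughly like the following. This is only a minimal sketch, not the poster's actual code: the variable $iLinkIndex and the function name NextLink are made-up names, and it assumes that after going back in Firefox the cursor starts above the first link, so that pressing DOWN n times reaches the n-th link.

```autoit
; Hypothetical sketch of the "keep a counter" workaround.
; Assumption: after going back, the cursor sits above the first link.
Global $iLinkIndex = 0 ; how many links have been visited so far (made-up name)

HotKeySet("{F9}", "NextLink")

; Idle loop: keeps the script alive until the hotkey fires
While 1
    Sleep(100)
WEnd

Func NextLink()
    ; One more link than last time
    $iLinkIndex += 1
    ; The selection is lost after "back", so step down from the top again
    For $i = 1 To $iLinkIndex
        Send("{DOWN}")
        Sleep(50)
    Next
    ; Open the link that is now selected
    Send("{ENTER}")
EndFunc
```

The copy and paste steps from the original Zoek() function would follow the final Send("{ENTER}"); they are left out here to keep the counter logic visible.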
big_daddy (Moderators) Posted November 1, 2006

See if this is close to what you want.

    #include <IE.au3>

    $sURL = "http://www.google.com"
    $oIE = _IECreate($sURL)

    ; Collect every link on the start page
    $oLinks = _IELinkGetCollection($oIE)

    For $oLink In $oLinks
        $sHREF = $oLink.href
        ; Open the link in a new, hidden IE window and read its body text
        $oIE2 = _IECreate($sHREF, 0, 0)
        $sText = _IEBodyReadText($oIE2)
        ConsoleWrite("<<<<<<<<<<>>>>>>>>>>" & @CR)
        ConsoleWrite($sText & @CR)
        ConsoleWrite(">>>>>>>>>><<<<<<<<<<" & @CR & @CR)
        _IEQuit($oIE2)
    Next
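Since the original question asks for the page content to end up in a file rather than on the console, the script above could be adapted along these lines. This is a sketch under assumptions: the output path pages.txt is a placeholder, the plain text file stands in for whatever "DB file" format is actually needed, and the "=== ... ===" separator line is invented for illustration.

```autoit
#include <IE.au3>

$sURL = "http://www.google.com"          ; start page, as in the post above
$sOutFile = @ScriptDir & "\pages.txt"    ; placeholder output path (assumption)

; Open the output file in append mode (1); FileOpen returns -1 on failure
$hFile = FileOpen($sOutFile, 1)
If $hFile = -1 Then
    MsgBox(16, "Error", "Could not open " & $sOutFile)
    Exit
EndIf

$oIE = _IECreate($sURL)
$oLinks = _IELinkGetCollection($oIE)

For $oLink In $oLinks
    ; Load each link in a new, hidden IE window
    $oIE2 = _IECreate($oLink.href, 0, 0)
    ; Write a separator line with the URL, then the page text
    FileWriteLine($hFile, "=== " & $oLink.href & " ===")
    FileWriteLine($hFile, _IEBodyReadText($oIE2))
    _IEQuit($oIE2)
Next

FileClose($hFile)
_IEQuit($oIE)
```

Because this drives IE directly instead of sending keystrokes, there is no need to track which link is selected after going back.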
Confuzzled Posted November 7, 2006

Would something like WebReaper or WebStripper do the job?