Mcky, posted July 14, 2004 (edited):
After a month of intensive testing and debugging, Mcky's Web Extractor is out. The script extracts http, ftp, https, news, e-mail and file:/// links from text files and local HTML documents (the file type can be customized). It can also do batch processing, that is, extract links from all text and HTML files inside a folder. The links are saved to an HTML file whose appearance you can easily customize in the configuration file, WECFG.INI. See the TODO list at the bottom.

Direct from the help file:
- The first and only link extraction program that saves the links to an HTML file, complete with text and background formatting.
- Extracts http, https, news, ftp, e-mail and file links and saves them to a specified HTML file.
- Does not touch the registry. All settings are kept in the configuration file, WECFG.INI, which you can edit easily.
- In just a few clicks you can build an HTML links file from the file you extract the links from.
- Easy-to-use, wizard-like interface, suitable for beginners and advanced users alike.
- Extracts an unlimited number of links. There is no limit, just be patient.
- Supports folder processing: extracts links from all text and HTML files inside a folder.
- Supports custom link types, for example gopher, res, aol, etc.
- Supports a custom file filter, so links can be extracted from file types other than HTML and text (applies to single-file operation).
- Extracts links from text and HTML files and removes white space and other unnecessary characters, including quotation marks (") and spaces. (See the FAQ if the links don't work.)
- Supports a "BaseURL" setting that extracts only the base URL of each link, stripping everything else.
- The resulting HTML page can be fully customized in the configuration file, WECFG.INI.
- The resulting HTML file can be imported into any HTML or WYSIWYG editor for further customization.
- The resulting HTML page's code is clean and easy to understand.
- Gives a clear, detailed report at the end of processing, which can be saved to a log file.
- ESC stops the program immediately, even while it is processing data.
- Nicely laid-out, HTML-based help file.

Right-click the link and select "Save Target As" to download the script (zipped). Take a look at the demo folder to see how the links are stored. Please post your comments!

TODO list:
- Improve processing speed
- Folder recursion (extract links from folders inside a folder)
- GUI
- Cleaner, leaner, easier-to-understand code
- Remove duplicate links
- Sort links alphabetically

Edited July 15, 2004 by Mcky

My website (lots of AutoIt compiled programs + GameMaker games): http://mcky.sitesled.com
My AutoIt projects: Mcky's CalEntry (calendar scheduling), Mcky's Web Extractor (web page links extractor), Mcky's Appkey (powerful hotkey-listing tool)
[quote]I wish I was never born. I am just a lonely soul in this world... :([/quote]
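Two of the TODO items, removing duplicate links and sorting them alphabetically, fit naturally into the step that writes the HTML links page. The following is a minimal Python sketch of that step, not Mcky's AutoIt code: `build_links_page` and its `text_color`/`bg_color` parameters are hypothetical stand-ins for whatever formatting WECFG.INI actually controls.

```python
import html


def build_links_page(links, title="Extracted links", text_color="#000000", bg_color="#FFFFFF"):
    """Return an HTML page listing each unique link once, sorted alphabetically.

    text_color and bg_color are hypothetical settings, standing in for
    whatever formatting options the real configuration file offers.
    """
    # set() drops duplicate links; the case-insensitive key gives alphabetical order.
    unique_sorted = sorted(set(links), key=str.lower)

    items = []
    for url in unique_sorted:
        # Escape &, <, > and quotes so a link cannot break the generated markup.
        safe = html.escape(url, quote=True)
        items.append(f'<li><a href="{safe}">{safe}</a></li>')

    safe_title = html.escape(title)
    style = f"color:{html.escape(text_color)}; background-color:{html.escape(bg_color)}"
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{safe_title}</title></head>\n"
        f'<body style="{style}">\n'
        f"<h1>{safe_title}</h1>\n"
        "<ol>\n" + "\n".join(items) + "\n</ol>\n"
        "</body></html>\n"
    )
```

For example, passing `["http://b.example/", "http://a.example/?x=1&y=2", "http://b.example/"]` produces a page with two list items, the a.example link first and its `&` written as `&amp;`. The case-insensitive sort key is only one possible reading of "alphabet sorting".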
bobheart, posted July 14, 2004:
Bad zip file. Just put the code here.
CyberSlug, posted July 14, 2004:
[quote]Bad zip file . just put the code here .[/quote]
Yeah. When I open the zip, I see the contents of my desktop.

Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!
SlimShady, posted July 14, 2004:
This file is hosted by Tripod, a Lycos® Network Site, and is not available for download.
this-is-me, posted July 14, 2004 (edited):
Has everyone here gone batty? The message SlimShady got is EXACTLY why Mcky put in instructions like [quote]Right click on the link and select "Save Target As":[/quote] (kinda duh...)
Edited July 14, 2004 by this-is-me

Who else would I be?
SlimShady, posted July 14, 2004:
Summary: Lycos web hosting sucks.
CyberSlug, posted July 14, 2004 (edited):
[quote]Has everyone here gone batty? The message SlimShady got is EXACTLY why Mcky put in instructions[/quote]
That's what I did, but when I try to open the 10.7 KB zip file, it appears to be corrupted...
EDIT: See this post if Lycos web hosting doesn't suffice.
Edited July 14, 2004 by CyberSlug
Mcky (author), posted July 14, 2004:
In Internet Explorer, right-click the link and select "Save Target As". Then you can save the zip file (85 KB). This is a limitation of Tripod's remote loading service.
bobheart, posted July 14, 2004:
Bad zip
SlimShady, posted July 14, 2004:
The file is 79 KB here and I can edit webextract.au3.
[quote]Bad zip[/quote]
Tip: Clear your IE cache and download again.
bobheart, posted July 14, 2004:
Got it.
emmanuel, posted July 14, 2004:
[quote]Tip: Clear your IE cache and download again.[/quote]
Looks to somehow be an issue with Firefox; I tried from IE and it downloaded the proper zip file. I suggest the "IE View" extension for Firefox for situations like this.

"I'm not even supposed to be here today!" -Dante (Hicks)
pekster, posted July 14, 2004:
[quote]Looks to somehow be a issue with Firefox, I tried from IE and it downloaded the proper zip file.[/quote]
Worked for me under Firefox. I click it, the link opens in a new window, and it asks me where I want to save my zip... Using Firefox 0.9.2.

[font="Optima"]"Standing in the rain, twisted and insane, we are holding onto nothing. Feeling every breath, holding no regrets, we're still looking out for something."[/font]
Note: my projects are offline until I can spend more time making them compatible with syntax changes.
Mcky (author), posted July 15, 2004 (edited):
Well, post some comments on the script. I will keep improving Web Extractor if I get enough comments, suggestions and bug reports. Thanks in advance!
By the way, if you want to extract only the base link, open the configuration file, WECFG.INI, and set BaseURL=1.
Also, try this: open the script and choose to extract links from an internet web page. Enter the address http://www.mnsi.net/~jhlavac/nps/ and it will extract over 400 links in 4-6 seconds.
New download location (for people who can't download it from the Tripod server): http://www.autoitscript.com/fileman/users/public/Mcky/webext_src.zip
Edited July 15, 2004 by Mcky
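The BaseURL=1 setting mentioned above reduces every link to its base URL. Below is a minimal Python sketch of how such a switch could work, assuming "base URL" means scheme plus host (for example http://www.mnsi.net/ for the address above). The `[Settings]` section name and both function names are hypothetical; only the BaseURL key and its value of 1 come from the post.

```python
import configparser
from urllib.parse import urlsplit


def base_url(link):
    """Reduce a link to scheme://host[:port]/, dropping path, query and fragment."""
    parts = urlsplit(link)
    if not parts.scheme or not parts.netloc:
        # mailto:, news: and file:/// links have no host part; keep them unchanged.
        return link
    return f"{parts.scheme}://{parts.netloc}/"


def apply_base_url_setting(links, ini_text):
    """Apply base_url() to every link when the INI text contains BaseURL=1."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    # "Settings" is a hypothetical section name; getboolean() treats "1" as True
    # and falls back to False if the section or key is missing.
    if config.getboolean("Settings", "BaseURL", fallback=False):
        return [base_url(link) for link in links]
    return list(links)
```

With `"[Settings]\nBaseURL=1\n"` as the INI text, `https://example.com/a/b?c=1` becomes `https://example.com/`; with `BaseURL=0` the links pass through untouched.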
tutor2000, posted July 15, 2004:
[quote]Bad zip file . just put the code here .[/quote]
The zip file is fine. Did you right-click on it as he suggested?
Rick
bobheart, posted July 15, 2004:
[quote]Zip file is fine. Did you right mouse click on it as he suggested? Rick[/quote]
See the post above that says "Got it".
Mcky (author), posted July 15, 2004:
Comments on the actual script would be appreciated. By the way, the new download link is: http://www.autoitscript.com/fileman/users/public/Mcky/webext_src.zip
emmanuel, posted July 15, 2004:
[quote]comments on the actual script is appreciable. Btw, the new new download link is at: http://www.autoitscript.com/fileman/users/public/Mcky/webext_src.zip[/quote]
OK, here's one... I'm not sure exactly what I'd use this for. What kinds of pages would you parse with it to make the output valuable to you? I thought about news pages, but without the surrounding context, the links are fairly useless... Educate me.
peethebee, posted July 8, 2005:
Hi! For me it only extracts one link per line of the file. Please fix that. By the way, is it possible to generate a plain text file with the links in it?
peethebee

German forums: http://www.autoit.de | German help file: http://autoit.de/hilfe
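The bug peethebee reports (only one link per line) is what happens when the search loop stops at the first match on each line. Below is a minimal Python sketch that keeps every match on a line and also writes the plain-text output he asks about. It is not the script's AutoIt code: the scheme list mirrors the first post, treating e-mail links as `mailto:` is an assumption, and all names are hypothetical.

```python
import re

# Schemes listed in the first post; "mailto" standing in for e-mail links is an assumption.
DEFAULT_SCHEMES = ["http", "https", "ftp", "news", "file", "mailto"]


def link_pattern(schemes):
    """Compile one regex that matches a link of any of the given schemes."""
    # Longest scheme first, so the alternation tries "https" before "http".
    alternatives = "|".join(re.escape(s) for s in sorted(schemes, key=len, reverse=True))
    # A link runs from "scheme:" up to the next whitespace, quote or angle bracket.
    return re.compile(rf"\b(?:{alternatives}):[^\s\"'<>]+", re.IGNORECASE)


def extract_links(lines, schemes=DEFAULT_SCHEMES):
    """Return every link on every line, in the order they appear."""
    pattern = link_pattern(schemes)
    found = []
    for line in lines:
        # findall() keeps scanning after the first match, so several links per line are kept.
        found.extend(pattern.findall(line))
    return found


def save_as_text(links, path):
    """Write the links to a plain text file, one link per line."""
    with open(path, "w", encoding="utf-8") as out:
        for link in links:
            out.write(link + "\n")
```

A custom link type such as gopher can be added by passing `DEFAULT_SCHEMES + ["gopher"]` as the `schemes` argument. Links followed directly by punctuation (a closing parenthesis or full stop) keep that character in this simple pattern.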
ParoXsitiC, posted July 12, 2005:
Nice. Yeah, you do need to fix the inability to extract multiple links from one line. I took this code, cut it down a lot, and used it to extract Myspace usernames/IDs... then made an auto friend adder. I did the same for a few other sites. You should have two filters for the "custom filters": one for the start and one for the end.
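ParoXsitiC's start/end filter idea can be sketched as a scan that collects every substring between a start marker and the next end marker, resuming after each match so that several hits on one line are all found. This is a minimal Python sketch, not code from the script; the function name and the sample markup below are hypothetical.

```python
def extract_between(text, start, end):
    """Return every substring that sits between a start marker and the next end marker."""
    if not start or not end:
        # Empty markers would make the scan below loop forever.
        raise ValueError("start and end markers must be non-empty")
    results = []
    pos = 0
    while True:
        s = text.find(start, pos)
        if s == -1:
            break  # no further start marker
        s += len(start)
        e = text.find(end, s)
        if e == -1:
            break  # start marker without a matching end marker: stop scanning
        results.append(text[s:e])
        # Continue after this match so later ones on the same line are found too.
        pos = e + len(end)
    return results
```

For example, `extract_between('<a href="/u/1">x</a><a href="/u/22">y</a>', 'href="', '"')` returns `['/u/1', '/u/22']`. Using plain string search instead of a regex keeps the two filters literal, so users would not need to escape special characters in them.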