littleclown Posted March 15, 2010 (edited)

Hello. I think this is my first post in this subforum. This is quick-and-dirty code; I am not a programmer, and I am sure there are better ways to do this. The main plus of this code is its speed: it uses SQLite, which makes this script faster than any other example here.

Features:
- Finds all URLs on one domain
- Finds all e-mails on one domain (you can modify this to collect URLs and e-mails outside the domain too)
- Can be stopped and restarted - it resumes from the last URL
- Depth-level option
- Settings file
- Progress bar

TODO:
- Interface

Everything is stored in an SQLite DB, but you can convert it to whatever you need; I use SQLite Database Browser to review the db. If you need to reset the tool, just remove or rename the database file.

Edited March 13, 2012 by Valik
JRowe Posted March 15, 2010

You should make sure you're abiding by the robots.txt standard: http://www.robotstxt.org/ Otherwise, neat.

However, like ninjas, cyber warriors operate in silence. AutoIt Chat Engine (+Chatbot), Link Grammar for AutoIt, Simple Speech Recognition, Artificial Neural Networks UDF, Bayesian Networks UDF, Pattern Matching UDF, Transparent PNG GUI Elements, Au3Irrlicht 2, Advanced Mouse Events Monitor, Grammar Database Generator, Transitions & Tweening UDF, Poker Hand Evaluator
littleclown Posted March 15, 2010 (edited)

I created this for an internal site at my company that has no robots.txt, but it is a good idea to check that file. Please see the attached file. The changes are:
- option to limit the depth of the search
- option to set whether you want to search outside the domain
- the console writes a little more info
- a db file for every site

SQLite is cool.

Edited March 13, 2012 by Valik
JohnOne Posted March 15, 2010

I've no idea what this is, but would like to know. Any chance of an example?

AutoIt Absolute Beginners  Require a serial  Pause Script  Video Tutorials by Morthawt  ipify  Monkey's are, like, natures humans.
littleclown Posted March 15, 2010 (edited)

This is a simple robot (crawler, spider). You set a start URL, for example autoitscript.com (without http://), and the script finds all plain-text e-mails on that site. There are some variables you can change:

$firsturl="strelki.info"
$max_level=0
$domain_only=1

This means: scan the strelki.info site with no maximum depth (a full site scan) and follow only internal links (don't follow ads or other external links). The robot "visits" the main URL, scans it for URLs and writes them to the db file, then scans it for e-mails and writes those too. After that it goes to the second URL (the first URL found on the main page), and so on. When it finishes, you have every e-mail published on the site and a list of all internal URLs. That's all. Spammers usually use this technique to get victims' e-mails. If you edit the code, you can analyze the source for other kinds of information: image lists, external-link lists, the size of the HTML, images, or videos - whatever you need.

Edited March 15, 2010 by littleclown
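The visit-scan-enqueue loop described above is a breadth-first crawl. The thread's script is AutoIt; the Python sketch below is only an illustration of the same logic, and the in-memory site dict (with made-up pages and addresses) stands in for real HTTP fetches:

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def crawl(first_url, fetch, max_level=0, domain_only=True):
    """Breadth-first crawl. fetch(url) returns the page's HTML.

    max_level=0 means "no depth limit", mirroring the script's
    $max_level=0 convention; domain_only skips external links.
    """
    domain = urlparse(first_url).netloc
    queue = deque([(first_url, 0)])          # (url, depth) pairs
    seen, emails = {first_url}, set()
    while queue:
        url, level = queue.popleft()
        html = fetch(url)
        emails.update(EMAIL_RE.findall(html))
        if max_level and level >= max_level: # depth limit reached
            continue
        for link in re.findall(r'href="([^"]+)"', html):
            link = urljoin(url, link)        # resolve relative links
            if domain_only and urlparse(link).netloc != domain:
                continue                     # external link: skip it
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return seen, emails

# Toy in-memory "site" so the sketch runs without touching the network.
site = {
    "http://strelki.info/": '<a href="/about">about</a> admin@strelki.info',
    "http://strelki.info/about": '<a href="http://ads.example.com/">ad</a> info@strelki.info',
}
urls, emails = crawl("http://strelki.info/", lambda u: site.get(u, ""))
```

Note how the external ads.example.com link is never enqueued when domain_only is set, which is exactly the "don't follow any ads" behaviour described above.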
JohnOne Posted March 15, 2010

So if I use this on, say, my Googlemail account while logged in, I can find, for example, duplicate email addresses? Sounds cool.
littleclown Posted March 15, 2010

I think Gmail can protect e-mail information from robots really well.
jvanegmond Posted March 15, 2010

"So if I use this on say my googlemail account while logged in, I can find for example duplicate email addresses? Sounds cool"

No, being logged in in Internet Explorer (or any browser) is not going to automatically log you in in this crawler. There is no simple way to add support for sessions, etc., either.

github.com/jvanegmond
JohnOne Posted March 15, 2010

I see, so this is probably more useful for your own server. Thanks.
littleclown Posted March 16, 2010

I will try to add streams for faster downloads. Another idea is to make an image downloader.
littleclown Posted March 21, 2010

Check out the latest version (in the main post). A progress bar was added, some bugs were fixed, and a settings file was added. I tried adding streams, and it works, but it is not significantly faster than the old version and it makes the code complicated, so I will leave that idea for now. Here is a sample settings.ini file:

[DEFAULT]
FIRST_URL=strelki.info
MAX_LEVEL=0
DOMAIN_ONLY=1
LOG_EXTERNAL=0

Has nobody tried this yet? Any suggestions?
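For illustration, the four keys above map directly onto crawler options once parsed. This Python configparser sketch shows that mapping (the actual script is AutoIt and presumably reads the file with IniRead; that is an assumption):

```python
import configparser

SAMPLE = """\
[DEFAULT]
FIRST_URL=strelki.info
MAX_LEVEL=0
DOMAIN_ONLY=1
LOG_EXTERNAL=0
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)      # or cfg.read("settings.ini") for the real file
section = cfg["DEFAULT"]

first_url = section["FIRST_URL"]                 # start domain
max_level = section.getint("MAX_LEVEL")          # 0 = no depth limit
domain_only = section.getboolean("DOMAIN_ONLY")  # 1 = internal links only
log_external = section.getboolean("LOG_EXTERNAL")
```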
SagePourpre Posted March 23, 2010 (edited)

Nice. I was not sure how to make it work before your example of a valid settings.ini. I'm currently trying it (it will take a while to finish). I don't really have a use for collecting everybody's emails, but I find this script very interesting.

Edit: Since the operation can take a while, it would be nice if the progress-bar window did not have the always-on-top attribute and were movable. (I would then put it on my second display to keep watching the progress without blocking the view on my first display.)

Edited March 23, 2010 by SagePourpre
IchBistTod Posted March 23, 2010

This is an email harvester, used to spam email addresses.
JRowe Posted March 23, 2010

This is a spider template, used to harvest information off a web page. Spamming would require significantly more work, and anyone capable of that work would be more than capable of writing something like this. It's a pattern-matching tool, nothing more.
IchBistTod Posted March 23, 2010

He said he made it for a company, yes? It is designed right now to harvest emails out of web pages, yes? He even stated that "spammers usually use this way to get victims' emails." And more difficult code? I think not. PHP mail() + GET + AutoIt _INetGetSource(), or a TCP variant, would effectively send mass emails in a loop - under 50 lines of code, under 10 minutes of coding. I really don't think this should be openly available when it is this easy to use it to build a spammer. As I stated, it is not more work to make a spammer, and any noob could write that code; the spider and the pattern matching are much more complex than the spamming component, so your statement does not hold true. One would not know how to make an email harvester like this just because they could make the spammer; this gives them the code they need for a spammer plus auto-harvester. It should at least have the email-harvesting ability stripped.
JRowe Posted March 24, 2010

Heh. You're both overestimating the capacity of noobs to make code work and underestimating the difficulty of this task. This converts the data to an SQLite format, which presumes a level of technical experience that noobs don't have. You need an interface to access the data, or you need to be able to modify the code to make the data more accessible; both require a level of proficiency that makes this task trivial. It's not like this outputs a ready-to-use text file. You need to be able to interface with the code somehow... and any idiots trying to do that will migrate to the help forum, ask "how can I use this to collect email addresses," and Jos or Valik or Smoke_N will ban them.
IchBistTod Posted March 24, 2010

Simply reading the AutoIt help file plus the SQLite documentation would show any user how to retrieve the email addresses from the db with a few lines of code, using a SELECT * query.
littleclown Posted March 24, 2010 (edited)

I think there are already plenty of tools for spammers. I am not sure how somebody will become a spammer because of this code; if somebody wants to be a spammer, he will find a way, and I think this script would be his last choice. Yes, spammers use tools like this, but I think they already have what they need. You can find many scripts here and use them for "bad" actions. I published this script because there is nothing like it here, and maybe it will be helpful for somebody. I hope nobody will get more spam because of this simple script. Anybody can edit it to make a mass image downloader, a website downloader, or some analysis such as word counting (which would be useful for SEO projects).

Sorry for the on-top progress bar. Be aware that a normal website has many, many pages, and if you need all URLs or all e-mails, this can cost days of work. The word "Fast" is in the title because this is the fastest example here: I don't use 3rd-party DLLs, and I use SQLite, which I believe makes the script faster and more reliable.

Some tips: if you stop the script, it will resume from the last checked URL. If you need to reset the search, just remove or rename the db file.

It was interesting for me how simply this can be done - I developed the level system in 3 minutes! I am not a programmer, so if there is some stupid solution in my code, or you know a way to speed up the process or make the code simpler or more reliable, please comment.

Edited March 24, 2010 by littleclown
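The stop/resume and reset behaviour in the tips above follows naturally from keeping the URL queue in the database: a per-URL "checked" flag means a restarted process simply picks up at the first unchecked row, and deleting the db file resets everything. A minimal Python/SQLite sketch of that idea (the table and column names here are invented for illustration, not taken from the actual script):

```python
import sqlite3

# One db file per site; ":memory:" here so the sketch is self-contained.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE IF NOT EXISTS urls  (url TEXT PRIMARY KEY, level INTEGER, checked INTEGER DEFAULT 0);
    CREATE TABLE IF NOT EXISTS emails(email TEXT PRIMARY KEY);
""")

def enqueue(url, level):
    # INSERT OR IGNORE deduplicates URLs that were already seen.
    con.execute("INSERT OR IGNORE INTO urls(url, level) VALUES (?, ?)", (url, level))

def next_unchecked():
    # After a restart this naturally resumes at the first unchecked URL,
    # because the queue state lives in the db file, not in memory.
    return con.execute(
        "SELECT url, level FROM urls WHERE checked=0 ORDER BY rowid LIMIT 1"
    ).fetchone()

def mark_checked(url):
    con.execute("UPDATE urls SET checked=1 WHERE url=?", (url,))

# Simulate a partial crawl: root page done, one link still queued.
enqueue("http://strelki.info/", 0)
enqueue("http://strelki.info/about", 1)
con.execute("INSERT OR IGNORE INTO emails(email) VALUES (?)", ("admin@strelki.info",))
mark_checked("http://strelki.info/")

url, level = next_unchecked()   # where a restarted crawl would continue
emails = [r[0] for r in con.execute("SELECT * FROM emails")]
```

This is also all a reader needs to pull the harvested addresses out afterwards: the final SELECT * is the whole "interface".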
lsakizada Posted March 25, 2010

"I am not a programmer, and if there is some stupid solution in my code and you know a way to speed up the process or to make the code more simple or more reliable - please comment."

First, thanks for sharing. Your script gave me an idea that could be useful for my project; it will absolutely not be used for spamming. You may want to pass the data through a URI-encode function before saving it into the database, to make sure local URLs are saved properly. You can use these functions, for example:

Func _URIDecode($sData) ; Prog@ndy
    Local $aData = StringSplit(StringReplace($sData, "+", " ", 0, 1), "%")
    $sData = ""
    For $i = 2 To $aData[0]
        $aData[1] &= Chr(Dec(StringLeft($aData[$i], 2))) & StringTrimLeft($aData[$i], 2)
    Next
    Return BinaryToString(StringToBinary($aData[1], 1), 4)
EndFunc ;==>_URIDecode

Func _URIEncode($sData) ; Thanks to Prog@ndy
    Local $aData = StringSplit(BinaryToString(StringToBinary($sData, 4), 1), "")
    Local $nChar
    $sData = ""
    For $i = 1 To $aData[0]
        $nChar = Asc($aData[$i])
        Switch $nChar
            Case 45, 46, 48 To 57, 65 To 90, 95, 97 To 122, 126
                $sData &= $aData[$i]
            Case 32
                $sData &= "+"
            Case Else
                $sData &= "%" & Hex($nChar, 2)
        EndSwitch
    Next
    Return $sData
EndFunc ;==>_URIEncode

Be Green Now or Never (BGNN)!
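For comparison, the same plus-for-space percent-encoding scheme that _URIEncode/_URIDecode implement exists in Python's standard library as quote_plus/unquote_plus, which by default keep exactly the unreserved characters listed in the Switch above (the sample string is made up):

```python
from urllib.parse import quote_plus, unquote_plus

# quote_plus keeps letters, digits and '-', '.', '_', '~', turns spaces
# into '+', and percent-encodes everything else as UTF-8 bytes - the
# same behaviour as the AutoIt pair above.
original = "пример path/to файл.html"   # a made-up non-ASCII local URL
encoded = quote_plus(original)
roundtrip = unquote_plus(encoded)
```

Round-tripping through encode/decode returns the original string, which is what makes the pair safe to apply before writing URLs into the db.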
PhilHibbs Posted March 25, 2010

"Simply stating the autoit help file + mysql documentation would easily show any user how to retrieve the email addresses out of the db with a few lines of code using a query with the 'select *' command."

Oh come on. 200 lines of AutoIt script, most of which is database code, is hardly a major spamming tool. Spammers have tools that crack CAPTCHAs and create user accounts to get full access to forums that require log-in; this is very basic stuff and does not constitute a hacking tool.