Need easy help to retrieve proxies


Solved by computergroove

Posted

Hi all,

I am getting better at web scraping, but I still have trouble with elements that are identified only by a class attribute... they hate me.

http://proxyipchecker.com/ has a really neat area on its page that shows the most recently checked IPs, and I would like to retrieve them. I would love it if anyone could help! Here is the interesting part of the page source:

<div class="innertube">

<div class="innertube">
<div class="hovermenu">
<ul>
<li><a href="/check-my-proxy-ip.html" title="Check my Proxy IP">Check my Proxy IP</a></li>
<li><a href="/proxy-headers-checker.html" title="Proxy Headers Checker">Proxy Headers Checker</a></li>
<li><a href="/proxy-checker-online.html" title="Proxy Checker Online">Proxy Checker Online</a></li>
<li><a href="/buy-proxies-proxy-buy.html" title="Buy Proxies - Proxy Buy" style="background-color:#5c63ff;color:#ffffff">Buy Proxies - Proxy Buy</a></li>
<li><a href="/api.html" title="Proxy Checker API - Proxy List API">Proxy Checker API - Proxy List API</a></li>
</ul>
</div>
</div>

<h2>Latest open proxy servers, fast, checked and alive! Fresh proxies IP address and port continuously updated!</h2>

<ul class="freshproxies">
<li class="down">190.74.203.4 : 8080</li><li class="medium lowbw">118.26.142.5 : 80  </li><li class="medium lowbw">111.11.14.174 : 80  </li><li class="down">41.207.116.233 : 3128</li><li class="fast lowbw">195.40.6.43 : 8080  fresh</li><li class="fast lowbw">200.27.79.74 : 8080  open</li><li class="down">190.37.62.240 : 8080</li><li class="veryfast lowbw">77.243.2.171 : 80  up</li></ul>
</div>

Under "freshproxies", near the bottom, there is a list of recently checked proxies with their IP and port. I would like some simple code that collects every entry whose class is not "down" into an array. Would anyone mind helping me with this? The code ought to be so simple that I am not sure I need to show how I have fared so far, but here it is in case it amuses you :)

Local $oIE = _IECreate("http://proxyipchecker.com/")
;~ Local $fresh = _IEGetObjById($oIE, "rightcolumn")
$tags = $oIE.document.GetElementsByTagName("li")
For $tag in $tags
    $class_value = $tag.className("class")
    If $class_value = "freshproxies" Then
       ConsoleWrite($class_value & @LF)
    EndIf
Next

Thanks again!

  • Solution
Posted (edited)

#include <array.au3>
#include <File.au3>
#include <String.au3>
#include <IE.au3>

Local $oIE = _IECreate("http://proxyipchecker.com/")
WinWait("Online Proxy Checker - IP Checker - Check Proxy - Internet Explorer")
Local $HTML = _IEDocReadHTML($oIE);Gets all HTML
Local $LeftCount = StringInStr($HTML,'<ul class="freshproxies">');find the count of characters that come before the first string you want to find
Local $temp = StringTrimLeft($HTML,$LeftCount + 25);removes all characters before the first ipaddress
Local $RightLocation = StringInStr($temp,"</li></ul>");position of the end of the ip address section in the html
Local $RawData = StringMid($temp,1,$RightLocation - 1);unedited datablock of ip address information
Local $SplitRaw = StringSplit($RawData,'</li>',1)
Local $TempArray[0][3]
For $i = 1 To Ubound($SplitRaw) - 1
    Local $M = StringReplace($SplitRaw[$i],'<li class="',"");remove leading text
    Local $N = StringReplace($M,'">',";");remove unwanted characters
    Local $O = StringReplace($N,":",";")
    _ArrayAdd($TempArray,$O,0,";")
Next
_ArrayDisplay($TempArray)

This just needs the description removed and it is ready to use.

Edit - For anyone who wants to chime in on this one: an optional description sometimes follows the port number and ends up in the same string (it is optional when the entry is submitted on the website, so it does not always appear). I cannot figure out how to trim the description from my array.
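
One possible way to drop the description is to capture only the class, IP and port with a regular expression, so that whatever follows the port is never stored. The following is a minimal sketch, not tested against the live site: the sample string is copied from the markup quoted in the question, the variable names are made up for illustration, and the pattern assumes the HTML comes back with quoted class attributes as shown there (the string returned by _IEDocReadHTML may be formatted differently). It also skips the "down" entries, as the original question asked.

```autoit
#include <Array.au3>

; Sample data copied from the <li> markup quoted in the question (assumption:
; the live page still uses this format).
Local $sRawData = '<li class="down">190.74.203.4 : 8080</li>' & _
        '<li class="medium lowbw">118.26.142.5 : 80  </li>' & _
        '<li class="fast lowbw">195.40.6.43 : 8080  fresh</li>'

; Three capture groups per entry: class, IP address, port.
; Text after the port (the optional description) is not captured.
Local $sPattern = '(?i)<li class="([^"]*)">\s*(\d{1,3}(?:\.\d{1,3}){3})\s*:\s*(\d+)'

; Flag 3 returns all matches in one flat array: class, IP, port, class, IP, port, ...
Local $aMatches = StringRegExp($sRawData, $sPattern, 3)
If @error Then
    ConsoleWrite("No proxy entries matched." & @CRLF)
    Exit
EndIf

Local $aProxies[0][3]
For $i = 0 To UBound($aMatches) - 1 Step 3
    ; Skip entries whose class marks them as down.
    If $aMatches[$i] = "down" Then ContinueLoop
    ; "|" is the default column delimiter of _ArrayAdd for 2D arrays.
    _ArrayAdd($aProxies, $aMatches[$i] & "|" & $aMatches[$i + 1] & "|" & $aMatches[$i + 2])
Next

_ArrayDisplay($aProxies)
```

To use it with the script above, $RawData could be passed in place of the sample string.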

Edited by computergroove

Posted

Big thanks! :) There is a bug in the port column, but I think I can manage it from here. I dare say a function like "_IEGetObjectBy($type (class, name, ID etc.), $oObject, $sName, $iIndex [optional])" would be quite handy, but I digress. Perhaps it would already exist if it were possible, who knows.

Again, computergroove, thanks!
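
For reference, selecting by class can also be approximated through the DOM without parsing the HTML string. This is only a sketch under assumptions: it has not been run against the live page, and it relies on the list still being a ul with class "freshproxies" whose li children carry the status classes shown in the question. Note that className is a property rather than a method, and that "freshproxies" is the class of the parent ul, not of the li elements themselves.

```autoit
#include <IE.au3>

; _IECreate waits for the page to finish loading by default.
Local $oIE = _IECreate("http://proxyipchecker.com/")

For $oItem In $oIE.document.getElementsByTagName("li")
    ; Keep only <li> elements that sit directly inside <ul class="freshproxies">.
    Local $oParent = $oItem.parentNode
    If $oParent.className <> "freshproxies" Then ContinueLoop

    ; Skip proxies marked as down.
    If $oItem.className = "down" Then ContinueLoop

    ; innerText holds "IP : port" plus the optional description; trim outer whitespace.
    ConsoleWrite($oItem.className & " -> " & StringStripWS($oItem.innerText, 3) & @CRLF)
Next
```

The same loop could collect the text into an array instead of writing it to the console.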
