Sodori Posted December 9, 2014

Hi all,

I am making real progress in the art of web scraping, but I still have trouble with elements that are identified only by a class... they hate me.

http://proxyipchecker.com/ has a really neat area on its page that shows the most recently checked IPs. I would like to collect them, and I would love it if anyone could be the sweetest and help me! Here is the interesting bit of the page:

    <div class="innertube">
      <div class="innertube">
        <div class="hovermenu">
          <ul>
            <li><a href="/check-my-proxy-ip.html" title="Check my Proxy IP">Check my Proxy IP</a></li>
            <li><a href="/proxy-headers-checker.html" title="Proxy Headers Checker">Proxy Headers Checker</a></li>
            <li><a href="/proxy-checker-online.html" title="Proxy Checker Online">Proxy Checker Online</a></li>
            <li><a href="/buy-proxies-proxy-buy.html" title="Buy Proxies - Proxy Buy" style="background-color:#5c63ff;color:#ffffff">Buy Proxies - Proxy Buy</a></li>
            <li><a href="/api.html" title="Proxy Checker API - Proxy List API">Proxy Checker API - Proxy List API</a></li>
          </ul>
        </div>
      </div>
      <h2>Latest open proxy servers, fast, checked and alive! Fresh proxies IP address and port continuously updated!</h2>
      <ul class="freshproxies">
        <li class="down">190.74.203.4 : 8080</li><li class="medium lowbw">118.26.142.5 : 80 </li><li class="medium lowbw">111.11.14.174 : 80 </li><li class="down">41.207.116.233 : 3128</li><li class="fast lowbw">195.40.6.43 : 8080 fresh</li><li class="fast lowbw">200.27.79.74 : 8080 open</li><li class="down">190.37.62.240 : 8080</li><li class="veryfast lowbw">77.243.2.171 : 80 up</li></ul>
    </div>

Under "freshproxies", near the bottom, there is a list of recently checked proxies with their IP and port. I would like some simple code that fetches every entry that does not have class="down" into an array. Would anyone mind helping me with this? The code ought to be so simple that I don't know if I really need to show how I have fared with it. But I shall, in case it humours you:

    Local $oIE = _IECreate("http://proxyipchecker.com/")
    ;~ Local $fresh = _IEGetObjById($oIE, "rightcolumn")
    $tags = $oIE.document.GetElementsByTagName("li")
    For $tag in $tags
        $class_value = $tag.className("class")
        If $class_value = "freshproxies" Then
            ConsoleWrite($class_value & @LF)
        EndIf
    Next

Thanks again!
Solution computergroove Posted December 10, 2014 (edited)

    #include <array.au3>
    #include <File.au3>
    #include <String.au3>
    #include <IE.au3>

    Local $oIE = _IECreate("http://proxyipchecker.com/")
    WinWait("Online Proxy Checker - IP Checker - Check Proxy - Internet Explorer")
    Local $HTML = _IEDocReadHTML($oIE) ; gets all HTML
    Local $LeftCount = StringInStr($HTML, '<ul class="freshproxies">') ; position where the freshproxies list starts
    Local $temp = StringTrimLeft($HTML, $LeftCount + 25) ; drops everything up to and including the 25-character <ul> tag and the character after it
    Local $RightLocation = StringInStr($temp, "</li></ul>") ; position of the end of the IP address section in the HTML
    Local $RawData = StringMid($temp, 1, $RightLocation - 1) ; unedited data block of IP address information
    Local $SplitRaw = StringSplit($RawData, '</li>', 1) ; one element per list entry
    Local $TempArray[0][3]

    For $i = 1 To UBound($SplitRaw) - 1
        Local $M = StringReplace($SplitRaw[$i], '<li class="', "") ; remove leading text
        Local $N = StringReplace($M, '">', ";") ; turn the end of the tag into a column delimiter
        Local $O = StringReplace($N, ":", ";") ; split IP and port into separate columns
        _ArrayAdd($TempArray, $O, 0, ";")
    Next

    _ArrayDisplay($TempArray)

This just needs the description removed and it is ready to use.

Edit - For anyone who wants to chime in on this one: a description sometimes follows the port number and becomes part of the string (it is optional when the data is entered on the website, so it does not always show up). I cannot figure out how to trim the description from my array.

Edited December 10, 2014 by computergroove
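One possible way to drop the trailing description, and to skip the "down" entries the question asked about, is to capture only the class, IP and port with a regular expression instead of splitting the string by hand. The following is a minimal sketch, not the poster's method: the sample markup is copied from the question, the variable names ($sHTML, $aHits, $aProxies) are made up for illustration, and it assumes the string returned by _IEDocReadHTML keeps the class attribute quoted the way it appears in the question.

```autoit
#include <Array.au3>

; Sample markup copied from the question. In practice this string would come
; from _IEDocReadHTML($oIE) (assumption: the attribute quoting is preserved).
Local $sHTML = '<ul class="freshproxies"> <li class="down">190.74.203.4 : 8080</li>' & _
        '<li class="medium lowbw">118.26.142.5 : 80 </li>' & _
        '<li class="fast lowbw">195.40.6.43 : 8080 fresh</li>' & _
        '<li class="veryfast lowbw">77.243.2.171 : 80 up</li></ul>'

; Capture class, IP and port. Anything after the port digits (the optional
; description) is not part of any capture group, so it never reaches the array.
Local $sPattern = '(?i)<li class="([^"]*)">\s*(\d{1,3}(?:\.\d{1,3}){3})\s*:\s*(\d+)'
Local $aHits = StringRegExp($sHTML, $sPattern, 3) ; 3 = return all matches as one flat array
If @error Then
    ConsoleWrite("No proxy entries found" & @LF)
    Exit
EndIf

Local $aProxies[0][3]
For $i = 0 To UBound($aHits) - 1 Step 3
    ; The class attribute can hold several names ("fast lowbw"), so test for
    ; "down" as a whole word and skip those entries.
    If StringInStr(" " & $aHits[$i] & " ", " down ") Then ContinueLoop
    _ArrayAdd($aProxies, $aHits[$i] & ";" & $aHits[$i + 1] & ";" & $aHits[$i + 2], 0, ";")
Next

_ArrayDisplay($aProxies, "class / IP / port")
```

Because the pattern only captures digits for the port, words such as "fresh", "open" or "up" after the port are ignored without any extra trimming step.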
Sodori Posted December 10, 2014 Author

Big thanks! There is a bug in the port section, but I think I can manage it from here. I dare say a function like "_IEGetObjectBy($type (class,name, ID etc), $oObject, $sName, $iIndex [optional])" would be kinda handy, but I digress. Perhaps it would have been done already if it were possible, who knows. Again, computergroove, thanks!
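For readers who want to filter by class without parsing the HTML string, the elements can also be inspected through the IE document object directly. This is a rough sketch under stated assumptions rather than a tested solution for this site: it assumes the page still has the structure shown in the question, and it relies on the className, parentNode and innerText properties that Internet Explorer exposes on its elements. $oItems and $aProxies are illustrative names.

```autoit
#include <IE.au3>
#include <Array.au3>

; _IECreate waits for the page to finish loading by default.
Local $oIE = _IECreate("http://proxyipchecker.com/")
Local $oItems = _IETagNameGetCollection($oIE, "li")
Local $aProxies[0]

For $oItem In $oItems
    ; Only keep <li> elements whose parent is <ul class="freshproxies">;
    ; this skips the menu links at the top of the page.
    If $oItem.parentNode.className <> "freshproxies" Then ContinueLoop
    ; className is a property holding the whole attribute, e.g. "fast lowbw".
    ; Skip entries marked "down".
    If StringInStr(" " & $oItem.className & " ", " down ") Then ContinueLoop
    ; innerText is e.g. "195.40.6.43 : 8080 fresh"; strip leading and trailing whitespace.
    _ArrayAdd($aProxies, StringStripWS($oItem.innerText, 3))
Next

_ArrayDisplay($aProxies)
```

Note that the class lives on the <li> elements themselves ("down", "fast lowbw", and so on), while "freshproxies" is the class of the enclosing <ul>, which is why the sketch checks the parent element first.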