youtuber Posted September 19, 2018

Friends, I'm having trouble: either there is a problem in my loop or I'm not getting the regex right, but I can't get the right data. My code is as follows:

$aRegexGet = _HttpGetRegexTest("https://autoitscripttr.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500")

;a href="https://www.nofollow.com/" rel="nofollow" target="_blank"
;a href="https://www.dofollow.com/" target="_blank"

$RegExp0 = StringRegExp($aRegexGet, "(?i)href=(?:'|(?:&quot;))(.*)nofollow", 3)
$RegExp1 = StringRegExp($aRegexGet, "(?i)href=(?:'|(?:&quot;))([^'&]+)", 3)

If IsArray($RegExp0) Then
    For $i = 0 To UBound($RegExp0) - 1
        ConsoleWrite($RegExp1[$i] & " " & "Nofollow" & @CRLF)
    Next
Else
    For $c = 0 To UBound($RegExp1) - 1
        ConsoleWrite($RegExp1[$i] & " " & "Dofollow" & @CRLF)
    Next
EndIf

Func _HttpGetRegexTest($aUrl)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    $oHTTP.Open("GET", $aUrl, False)
    $oHTTP.Send()
    If @error Then
        $error = 1
        $oHTTP = 0
        Return SetError(1)
    EndIf
    If $oHTTP.Status = 200 Then
        Local $sReceived = $oHTTP.ResponseText
        $oHTTP = Null
        Return $sReceived
    EndIf
    $oHTTP = Null
    Return -1
EndFunc   ;==>_HttpGetRegexTest

BrewManNH Posted September 19, 2018

What data are you trying to get, and what data are you actually getting? We can't help you if you don't tell us what you need and what you get; that should be blatantly obvious to you by this time.

youtuber Posted September 20, 2018

I want to parse the link structures shown below.

This is a nofollow URL:
;a href="https://www.nofollow.com/" rel="nofollow" target="_blank"

This is a dofollow URL:
;a href="https://www.dofollow.com/" target="_blank"

BigDaddyO Posted September 20, 2018

Is there a reason you can't just use a much simpler StringInStr?

If StringLeft($sString, 7) = "a href=" Then
    If StringInStr($sString, ";nofollow") > 0 Then
        ConsoleWrite("This is a NO follow" & @CRLF)
    Else
        ConsoleWrite("This is a FOLLOW" & @CRLF)
    EndIf
EndIf

youtuber Posted September 20, 2018

Your code doesn't work:

$aRegexGet = _HttpGetRegexTest("https://autoitscripttr.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500")

$RegExp1 = StringRegExp($aRegexGet, "(?i)href=(?:'|(?:&quot;))([^'&]+)", 3)

If StringLeft($aRegexGet, 7) = "a href=" Then
    If StringInStr($aRegexGet, ";nofollow") > 0 Then
        ConsoleWrite($RegExp1[0] & "This is a NO follow" & @CRLF)
    Else
        ConsoleWrite($RegExp1[0] & "This is a FOLLOW" & @CRLF)
    EndIf
EndIf

Func _HttpGetRegexTest($aUrl)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    $oHTTP.Open("GET", $aUrl, False)
    $oHTTP.Send()
    If @error Then
        $error = 1
        $oHTTP = 0
        Return SetError(1)
    EndIf
    If $oHTTP.Status = 200 Then
        Local $sReceived = $oHTTP.ResponseText
        $oHTTP = Null
        Return $sReceived
    EndIf
    $oHTTP = Null
    Return -1
EndFunc

BigDaddyO Posted September 20, 2018

It works fine for the 2 example strings you posted above. If it's not putting anything in the console, then the string is not an a href line and it's ignored.

youtuber Posted September 22, 2018

It also needs to extract the URL, but it doesn't.

mikell Posted September 22, 2018

The webmaster who wrote the source code of the page in question should be fired.

#Include <Array.au3>

$aRegexGet = _HttpGetRegexTest("https://autoitscripttr.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500")

$RegExp0 = StringRegExp($aRegexGet, '(https?[^;"]+)(?=(?:&quot;|")[\w=\h"]*rel=(?:&quot;|")?nofollow)', 3)
_ArrayDisplay($RegExp0, "nofollow")

youtuber Posted September 27, 2018

@mikell How should this HTML be shaped? Is my pattern right?

nofollow
<a href="https://testregex.nofollow.com" rel="nofollow">https://testregex.nofollow.com</a>
<a href="https://testregex.nofollowopennewtab.com" target="_blank" rel="nofollow noopener">https://testregex.nofollowopennewtab.com</a>
(https?[^"]+)(?=(?:|")[\w=\h"]*rel=(?:|")?nofollow)

dofollow
<a href="https://testregex.dofollowopennewtab.com" target="_blank" rel="noopener">https://testregex.dofollowyenisekme.com</a>
<a href="https://testregex.dofollow.com">https://testregex.dofollow.com</a>
<a\s+(?:[^>]*?\s+)?href="(?:[^>]*)>

or dofollow regex pattern
<a\s+(?:[^>]*?\s+)?href="(https?[^"]+)(?:["^>]*)>

mikell Posted September 28, 2018

22 hours ago, youtuber said: "Is my pattern right?"

I don't know, because I don't know exactly what you want to do, what precisely the expected results should be, etc. You can play with this snippet, which will (maybe...) fit your needs:

#Include <Array.au3>

$txt = 'nofollow' & _
    '<a href="https://testregex.nofollow.com" rel="nofollow">https://testregex.nofollow.com</a>' & _
    '<a href="https://testregex.nofollowopennewtab.com" target="_blank" rel="nofollow noopener">"https://testregex.nofollowopennewtab.com</a>' & _
    'dofollow' & _
    '<a href="https://testregex.dofollowopennewtab.com" target="_blank" rel="noopener">https://testregex.dofollowyenisekme.com</a>' & _
    '<a href="https://testregex.dofollow.com">https://testregex.dofollow.com</a>'
;Msgbox(0,"", $txt)

$link = '(https?[^"<>]++)"'
$check = '[\w=\h"]*rel="nofollow'

$a = StringRegExp($txt, $link & '(?=' & $check & ')', 3)
$b = StringRegExp($txt, $link & '(?!' & $check & ')', 3)
_ArrayDisplay($a, "nofollow")
_ArrayDisplay($b, "follow")

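A possible alternative to the two lookahead patterns above, offered only as a minimal sketch: collect every opening <a ...> tag first, then read href and rel from each tag separately, so the result does not depend on the order of the attributes. The sample string $sHtml, the variable names, and the assumption that the input is plain HTML with double-quoted attributes (not the &quot;-escaped text of the feed) are illustrative and not taken from the posts above.

```autoit
; Hypothetical sample input: plain, unescaped HTML with double-quoted attributes.
Local $sHtml = '<a href="https://testregex.nofollow.com" rel="nofollow">link 1</a>' & _
        '<a href="https://testregex.dofollow.com" target="_blank">link 2</a>'

; Flag 3 with no capturing group returns every full match, here each opening <a ...> tag.
Local $aTags = StringRegExp($sHtml, '(?i)<a\s[^>]*>', 3)

If IsArray($aTags) Then
    Local $aHref, $sType
    For $i = 0 To UBound($aTags) - 1
        ; Flag 1 returns the captured groups of the first match; @error is set when nothing matches.
        $aHref = StringRegExp($aTags[$i], '(?i)\shref="([^"]*)"', 1)
        If @error Then ContinueLoop ; tag without a double-quoted href, skip it

        ; Default flag 0 returns 1 or 0: does the rel attribute contain the word nofollow?
        $sType = StringRegExp($aTags[$i], '(?i)\srel="[^"]*\bnofollow\b') ? "Nofollow" : "Dofollow"
        ConsoleWrite($aHref[0] & " " & $sType & @CRLF)
    Next
Else
    ConsoleWrite("No links found" & @CRLF)
EndIf
```

Splitting the work into one pattern per attribute keeps each regex short and avoids pairing up two separately built result arrays by index. For the escaped feed text, the patterns would need to accept &quot; in place of the double quotes.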