jcpetu Posted August 14, 2020
Hi guys, I'm trying to extract all the URLs between href=" and " from a concatenated string, but after several tests with StringRegExpReplace I'm not able to do it. I'd appreciate any help. This is what I have so far:
$concatenate = 'class="contingut_noticies"><div class="tags_noticies"></div><div class="titol_noticies"> <a href="https://jcpe.com/sea-suma-cuatro-goles-en-tres-partidos-ante-el-bayern/">SEA SUMA CUATRO GOLES EN TRES PARTIDOS ANTE EL BAYERN</a></div><div class="desc_noticies"><p>Sea Jcpe suma cuatro goles en tres enfrentamientos contra el Bayern de Múnich en la Liga de Campeones: dos en […]</p></div></div></div><div class="post_grid_noticies jcpe_noti_4"><div class="contenidor-zoom-out"><a href="https://jcpe.com/sea-marca-en-la-eliminacion-del-napoli/"><img width="2560" height="2560" src="https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-scaled.jpg?v=1596923563" class="img_grid_notis wp-post-image" alt="" srcset="https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-scaled.jpg?v=1596923563 2560w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-300x300.jpg?v=1596923563 300w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-1024x1024.jpg?v=1596923563 1024w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-150x150.jpg?v=1596923563 150w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-768x768.jpg?v=1596923563 768w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-1536x1536.jpg?v=1596923563 1536w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-2048x2048.jpg?v=1596923563 2048w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-75x75.jpg?v=1596923563 75w" sizes="(max-width: 2560px) 100vw, 2560px" /></a></div><div class="contingut_noticies"><div class="tags_noticies"></div><div class="titol_noticies"> <a
href="https://jcpe.com/sea-marca-en-la-eliminacion-del-napoli/">SEA JCPE MARCA EN LA CLASIFICACIÓN CONTRA EL NAPOLI</a></div><div class="desc_noticies"><p>Sea Jcpe ha marcado un gol en la victoria del Equipo ante el Napoli por 3-1, que supone la clasificación […]</p></div></div></div><div class="post_grid_noticies jcpe_noti_5"><div class="contenidor-zoom-out"><a href="https://jcpe.com/el-equipo-a-por-los-cuartos-de-final-de-la-champions/"><img width="2560" height="2560" src="https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-scaled.jpg?v=1596709556" class="img_grid_notis wp-post-image" alt="" srcset="https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-scaled.jpg?v=1596709556 2560w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-300x300.jpg?v=1596709556 300w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-1024x1024.jpg?v=1596709556 1024w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-150x150.jpg?v=1596709556 150w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-768x768.jpg?v=1596709556 768w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-1536x1536.jpg?v=1596709556 1536w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-2048x2048.jpg?v=1596709556 2048w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-75x75.jpg?v=1596709556 75w" sizes="(max-width: 2560px) 100vw, 2560px" /></a></div><div class="contingut_noticies"><div class="tags_noticies"></div><div class="titol_noticies"> <a href="https://jcpe.com/el-equipo-a-por-los-cuartos-de-final-de-la-champions/">EL EQUIPO, A POR LOS CUARTOS DE FINAL DE LA CHAMPION...</a></div><div class="desc_noticies"><p>El Equipo buscará este sábado en el Camp Nou la clasificación para los cuartos de final de la Liga de […]</p></div></div></div></div></div></div><div class="mas-noticias mes-noticies"> <a href="noticias">Más noticias' $result = 
StringRegExpReplace($concatenate, "(?i)href=[""'](.*?)[""']|\z;", 3)
_ArrayDisplay($result)
TheXman Posted August 14, 2020 (edited)
1 hour ago, jcpetu said: $result = StringRegExpReplace($concatenate, "(?i)href=[""'](.*?)[""']|\z;", 3)
Your title and example refer to StringRegExpReplace. Why would you use StringRegExpReplace to extract the hrefs in this particular case? That isn't even the correct syntax for StringRegExpReplace; it's the syntax for StringRegExp. So how did you come up with the idea that you needed to use StringRegExpReplace? Here are a couple of ways that it could be done:
#include <Constants.au3>
#include <String.au3>
#include <Debug.au3>
$gsHTML = 'class="contingut_noticies"><div class="tags_noticies"></div><div class="titol_noticies"> <a href="https://jcpe.com/sea-suma-cuatro-goles-en-tres-partidos-ante-el-bayern/">SEA SUMA CUATRO GOLES EN TRES PARTIDOS ANTE EL BAYERN</a></div><div class="desc_noticies"><p>Sea Jcpe suma cuatro goles en tres enfrentamientos contra el Bayern de Múnich en la Liga de Campeones: dos en […]</p></div></div></div><div class="post_grid_noticies jcpe_noti_4"><div class="contenidor-zoom-out"><a href="https://jcpe.com/sea-marca-en-la-eliminacion-del-napoli/"><img width="2560" height="2560" src="https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-scaled.jpg?v=1596923563" class="img_grid_notis wp-post-image" alt="" srcset="https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-scaled.jpg?v=1596923563 2560w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-300x300.jpg?v=1596923563 300w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-1024x1024.jpg?v=1596923563 1024w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-150x150.jpg?v=1596923563 150w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-768x768.jpg?v=1596923563 768w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-1536x1536.jpg?v=1596923563 1536w,
https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-2048x2048.jpg?v=1596923563 2048w, https://static.jcpe.com/wp-content/uploads/2020/08/Crónica-Napoli-75x75.jpg?v=1596923563 75w" sizes="(max-width: 2560px) 100vw, 2560px" /></a></div><div class="contingut_noticies"><div class="tags_noticies"></div><div class="titol_noticies"> <a href="https://jcpe.com/sea-marca-en-la-eliminacion-del-napoli/">SEA JCPE MARCA EN LA CLASIFICACIÓN CONTRA EL NAPOLI</a></div><div class="desc_noticies"><p>Sea Jcpe ha marcado un gol en la victoria del Equipo ante el Napoli por 3-1, que supone la clasificación […]</p></div></div></div><div class="post_grid_noticies jcpe_noti_5"><div class="contenidor-zoom-out"><a href="https://jcpe.com/el-equipo-a-por-los-cuartos-de-final-de-la-champions/"><img width="2560" height="2560" src="https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-scaled.jpg?v=1596709556" class="img_grid_notis wp-post-image" alt="" srcset="https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-scaled.jpg?v=1596709556 2560w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-300x300.jpg?v=1596709556 300w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-1024x1024.jpg?v=1596709556 1024w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-150x150.jpg?v=1596709556 150w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-768x768.jpg?v=1596709556 768w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-1536x1536.jpg?v=1596709556 1536w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-2048x2048.jpg?v=1596709556 2048w, https://static.jcpe.com/wp-content/uploads/2020/08/Previa-Champions-75x75.jpg?v=1596709556 75w" sizes="(max-width: 2560px) 100vw, 2560px" /></a></div><div class="contingut_noticies"><div class="tags_noticies"></div><div class="titol_noticies"> <a href="https://jcpe.com/el-equipo-a-por-los-cuartos-de-final-de-la-champions/">EL 
EQUIPO, A POR LOS CUARTOS DE FINAL DE LA CHAMPION...</a></div><div class="desc_noticies"><p>El Equipo buscará este sábado en el Camp Nou la clasificación para los cuartos de final de la Liga de […]</p></div></div></div></div></div></div><div class="mas-noticias mes-noticies"> <a href="noticias">Más noticias'
; Option 1: a regular expression with one capture group, returning every match
$gaResult = StringRegExp($gsHTML, 'href="([^"]+)', $STR_REGEXPARRAYGLOBALMATCH)
If IsArray($gaResult) Then _DebugArrayDisplay($gaResult)
; Option 2: _StringBetween() with the opening and closing delimiters
$gaResult = _StringBetween($gsHTML, 'href="', '"')
If Not @error Then _DebugArrayDisplay($gaResult)
Edited August 14, 2020 by TheXman: Added _StringBetween() example
CryptoNG UDF: Cryptography API: Next Gen | jq UDF: Powerful and Flexible JSON Processor | jqPlayground: An Interactive JSON Processor
Xml2Json UDF: Transform XML to JSON | HttpApi UDF: HTTP Server API | Roku Remote: Example Script
About Me
How To Ask Good Questions On Technical And Scientific Forums (Detailed) | How to Ask Good Technical Questions (Brief)
"Any fool can know. The point is to understand." -Albert Einstein
"If you think you're a big fish, it's probably because you only swim in small ponds." ~TheXman
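The difference between the two functions that were mixed up can be illustrated with a minimal sketch. The one-link sample string and the variable names below are assumptions made for illustration; they are not the data from this thread.

```autoit
#include <StringConstants.au3>

; Hypothetical one-link sample, not the HTML posted in the thread.
Local $sSample = '<a href="https://example.com/page">Example</a>'

; StringRegExp(test, pattern, flag): with the global-match flag it returns
; an array holding the text captured by the group in every match.
Local $aLinks = StringRegExp($sSample, 'href="([^"]+)', $STR_REGEXPARRAYGLOBALMATCH)
If IsArray($aLinks) Then ConsoleWrite("Extracted: " & $aLinks[0] & @CRLF)

; StringRegExpReplace(test, pattern, replace): the third argument is the
; replacement text, and the return value is the modified string, not an array.
Local $sChanged = StringRegExpReplace($sSample, 'href="[^"]+"', 'href="#"')
ConsoleWrite("Replaced:  " & $sChanged & @CRLF)
```

In the original call, the 3 was therefore taken as the replacement text rather than as a return-mode flag, which is why no array came back.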
jcpetu (Author) Posted August 14, 2020
TheXman, thanks a lot for your quick response. I'm sorry for misusing the function; I was trying both functions and, after a lot of trial and error, I mixed them up. Your approach works only partially, because it doesn't return all the possible results. Searching for href= in the site's view-source finds 85 matches, while your approach returns 72. Thanks again.
#include <array.au3>
#include <Debug.au3>
#include <String.au3>
#include "WinHttp.au3"
Local $hOpen = _WinHttpOpen()
If @error Then
    MsgBox(48, "Error", "Error initializing the usage of WinHTTP functions.")
    Exit
EndIf
Local $Host = "messi.com"
Local $hConnect = _WinHttpConnect($hOpen, $Host) ; <- yours here
If @error Then
    MsgBox(48, "Error", "Error specifying the initial target server of an HTTP request.")
    _WinHttpCloseHandle($hOpen)
    Exit
EndIf
Local $req = _WinHttpOpenRequest($hConnect)
If @error Then
    MsgBox(48, "Error", "Error creating an HTTP request handle.")
    _WinHttpCloseHandle($hConnect)
    _WinHttpCloseHandle($hOpen)
    Exit
EndIf
_WinHttpSendRequest($req)
If @error Then
    MsgBox(48, "Error", "Error sending specified request.")
    _WinHttpCloseHandle($req)
    _WinHttpCloseHandle($hConnect)
    _WinHttpCloseHandle($hOpen)
    Exit
EndIf
_WinHttpReceiveResponse($req) ; Wait for the response
If @error Then
    MsgBox(48, "Error", "Error waiting for the response from the server.")
    _WinHttpCloseHandle($req)
    _WinHttpCloseHandle($hConnect)
    _WinHttpCloseHandle($hOpen)
    Exit
EndIf
Local $sChunk, $gsHTML
If _WinHttpQueryDataAvailable($req) Then ; See if there is data to read
    While 1
        $sChunk = _WinHttpReadData($req)
        If @error Then ExitLoop
        $gsHTML &= $sChunk
    WEnd
    ConsoleWrite($gsHTML & @CRLF) ; print to console
    $gaResult = StringRegExp($gsHTML, 'href="([^"]+)', $STR_REGEXPARRAYGLOBALMATCH)
    If IsArray($gaResult) Then _DebugArrayDisplay($gaResult)
Else
    MsgBox(48, "Error", "Site is experiencing problems.")
EndIf
_WinHttpCloseHandle($req)
_WinHttpCloseHandle($hConnect)
_WinHttpCloseHandle($hOpen)
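A discrepancy like 85 versus 72 can also be checked in the script itself instead of by searching the page source by hand. The sketch below is only an illustration: the three-link sample string and the variable names are assumptions, and it relies on StringReplace() reporting the number of replacements in @extended.

```autoit
#include <StringConstants.au3>

; Hypothetical sample with three href attributes, not the page from the thread.
Local $sHtml = '<a href="a.html">A</a> <a href=''b.html''>B</a> <a href="c.html">C</a>'

; Replacing a substring with itself leaves the text unchanged, but @extended
; then holds the number of occurrences that were found.
StringReplace($sHtml, "href=", "href=")
Local $iRaw = @extended

; Count what the double-quote-only pattern actually captures.
Local $aMatches = StringRegExp($sHtml, 'href="([^"]+)', $STR_REGEXPARRAYGLOBALMATCH)
Local $iMatched = IsArray($aMatches) ? UBound($aMatches) : 0

ConsoleWrite("Occurrences of href=: " & $iRaw & @CRLF)
ConsoleWrite("Regex matches:        " & $iMatched & @CRLF)
```

When the two numbers differ, some href attributes are written in a form the pattern does not cover.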
TheXman Posted August 14, 2020 (edited)
My example was as accurate as the data that you provided. The discrepancy arises because the website that you referenced (messi.com) has some hrefs enclosed in double quotes and others in single quotes. I only looked for double quotes because that is what was in the data that you provided. Also, my example was given to point you in the right direction, not to give you a fully working solution. Since this is kind of a weird one, here's an example that will get both:
#include <Constants.au3>
#include <InetConstants.au3>
#include <Debug.au3>
; Download the page, bypassing the local cache
$gsHTML = InetRead("https://messi.com", $INET_FORCEBYPASS)
If @error Then Exit MsgBox($MB_ICONERROR, "ERROR", "Unable to retrieve website")
$gsHTML = BinaryToString($gsHTML)
; Accept either a double or a single quote after href=
$gaResult = StringRegExp($gsHTML, 'href=(?:"|'')([^"'']+)', $STR_REGEXPARRAYGLOBALMATCH)
If IsArray($gaResult) Then _DebugArrayDisplay($gaResult)
Edited August 14, 2020 by TheXman
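One possible refinement, offered only as a sketch and not as part of the answer above: the character class [^"'']+ stops at whichever quote comes first, so a URL that contains an apostrophe inside double quotes would be cut short. A backreference can require the closing quote to be the same character as the opening one. The sample string and variable names are assumptions for illustration.

```autoit
#include <StringConstants.au3>

; Hypothetical sample: one double-quoted URL containing an apostrophe,
; and one single-quoted URL.
Local $sHtml = '<a href="o''neil.html">A</a> <a href=''b.html''>B</a>'

; Group 1 captures the opening quote, group 2 the URL, and \1 demands the
; same quote character at the end.
Local $aParts = StringRegExp($sHtml, 'href=(["''])(.*?)\1', $STR_REGEXPARRAYGLOBALMATCH)

; In global-match mode both groups are returned, so the array alternates
; quote, URL, quote, URL; the URLs sit at the odd indexes.
If IsArray($aParts) Then
    For $i = 1 To UBound($aParts) - 1 Step 2
        ConsoleWrite($aParts[$i] & @CRLF)
    Next
EndIf
```

The trade-off is the extra quote entries in the returned array, which the loop skips by stepping two elements at a time.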
jcpetu (Author) Posted August 14, 2020
OK, thanks a lot.
TheXman Posted August 14, 2020 (edited)
I updated my previous post with a more accurate example based on your actual data.
Edited August 14, 2020 by TheXman
jcpetu (Author) Posted August 14, 2020
I appreciate it a lot, TheXman.
TheXman Posted August 14, 2020
You're welcome!
jcpetu (Author) Posted August 16, 2020
Hi people, in some cases I need unique records, so I apply _ArrayUnique to the resulting array. Is there any option to StringRegExp to get unique records and avoid using _ArrayUnique?
$gaResult = StringRegExp($gsHTML, 'href=(?:"|'')([^"'']+)', $STR_REGEXPARRAYGLOBALMATCH)
If IsArray($gaResult) Then
    $gaResult = _ArrayUnique($gaResult)
    _DebugArrayDisplay($gaResult)
EndIf
TheXman Posted August 16, 2020 (edited)
3 hours ago, jcpetu said: Is there any option to StringRegExp to get unique records and avoid using _ArrayUnique?
A regular expression that returns unique hrefs is certainly possible. One way would be to use a negative lookahead. But compared to using _ArrayUnique(), it would be MUCH slower and less efficient because of all the backtracking that the regular expression engine would need to do. _ArrayUnique() uses a scripting dictionary to remove duplicates, which is lightning fast compared to most other AutoIt methods, assuming you are dealing with a 1D or 2D array and only need to remove duplicates based on a single column. Why do you want to avoid using _ArrayUnique()?
Edited August 16, 2020 by TheXman
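The two approaches described above can be compared side by side. This is a sketch under stated assumptions: the short sample string and the variable names are made up for illustration, and the lookahead pattern is just one way such a regular expression could be written, not necessarily the one the answer had in mind.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Hypothetical sample in which a.html appears twice.
Local $sHtml = '<a href="a.html">1</a> <a href="b.html">2</a> <a href="a.html">3</a>'

; Regex-only: the negative lookahead rejects an href whose value occurs again
; later in the text, so only the last occurrence of each URL is kept.
; (?s) lets the dot cross line breaks. For every candidate match the lookahead
; has to scan the remainder of the string, which is where the cost comes from.
Local $aRegexOnly = StringRegExp($sHtml, '(?s)href="([^"]+)"(?!.*href="\1")', $STR_REGEXPARRAYGLOBALMATCH)

; Plain extraction followed by _ArrayUnique(). By default _ArrayUnique() stores
; the element count in [0]; $ARRAYUNIQUE_NOCOUNT returns only the values.
Local $aAll = StringRegExp($sHtml, 'href="([^"]+)', $STR_REGEXPARRAYGLOBALMATCH)
Local $aUnique = _ArrayUnique($aAll, 0, 0, 0, $ARRAYUNIQUE_NOCOUNT)

If IsArray($aRegexOnly) Then ConsoleWrite("Lookahead:    " & _ArrayToString($aRegexOnly, ", ") & @CRLF)
If IsArray($aUnique) Then ConsoleWrite("_ArrayUnique: " & _ArrayToString($aUnique, ", ") & @CRLF)
```

Because the lookahead version keeps the last occurrence of each URL, the order of its results may differ from what _ArrayUnique() returns.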
jcpetu (Author) Posted August 16, 2020
Hi TheXman, I'm not trying to avoid _ArrayUnique(); I was just curious. But based on what you said, I'll keep using _ArrayUnique(). Thank you very much.