youtuber Posted February 11, 2018 (edited)

I'm not sure my regex for pulling out the site title is right. From these titles I want title1, title2, title3:

    <title>title1 | wb1</title>
    <title>title2 – wb2</title>
    <title>title3 - wb3</title>
    <title>title4 _ wb4</title>

    #include <Array.au3>

    $sSource = "<title>title1 | wb1</title>" & @CRLF & _
            "<title>title2 – ; wb2</title>" & @CRLF & _
            "<title>title3 – wb3</title>" & @CRLF & _
            "<title>title4 _ wb4</title>"

    $aRegEx = StringRegExp($sSource, '<title>?([^|\-\–\_]+)', 3)
    _ArrayDisplay($aRegEx)
ripdad Posted February 11, 2018

?

    $aRegEx = StringRegExp($sSource, '<title>(.*?)\W.*</title>', 3)
Simpel Posted February 11, 2018

Or a pattern like this:

    "<title>([a-zA-Z0-9]+).+<\/title>"

Regards, Conrad
TheXman Posted February 11, 2018 (edited)

Or

    StringRegExp($sSource, "(?i)<title>\s*([^ ]*)", 3)

Basically it says: barring any initial whitespace, capture everything after <title> until you encounter a space. The (?i) makes the search case-insensitive. If you know that <title> will ALWAYS be lowercase, then you can remove it from the regular expression. The same is true for the \s*. If you are sure that there will never be one or more spaces between <title> and the title, then you can remove that too.
ripdad Posted February 11, 2018 (edited)

They will probably all fail, since titles generally have spaces in them. For example:

    The quick brown fox - wb1
TheXman Posted February 11, 2018 (edited)

ripdad said: "They will probably all fail since titles generally have spaces in them. ie: The quick brown fox - wb1"

No, I don't think you understand the regular expression. I said that it will remove any INITIAL whitespace. If you test it, I think you will see that it will not fail. The OP said he wanted just title1, title2, title3... He didn't say that everything between the <title> tags was wanted. If everything between <title> and </title> is required, then it is even simpler:

    StringRegExp($sSource, "(?i)<title>(.*?)</title>", 3)
ripdad Posted February 11, 2018

TheXman said: "The OP said he wanted just title1, title2, title3... It didn't say that everything between the <title> tag was wanted."

I know. But, like all things with SRE, you have to think of what you overlooked. He wants the titles, but not anything extra in the title. And since titles are generally more than one word separated by spaces, the previous patterns will fail. The exception is the last one you posted, which is not what he asked for.
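The two positions in this exchange can be checked side by side. The following is only a minimal sketch, assuming a two-line sample that mixes one single-word title with ripdad's multi-word example; the variable names are made up for illustration. It runs TheXman's first pattern and prints what is captured.

```autoit
#include <Array.au3>

; Assumed sample: one single-word title and ripdad's multi-word example.
Local $sSource = "<title>title1 | wb1</title>" & @CRLF & _
        "<title>The quick brown fox - wb1</title>"

; TheXman's pattern: skip leading whitespace, then capture up to the first space.
Local $aFirstWord = StringRegExp($sSource, "(?i)<title>\s*([^ ]*)", 3)
If @error Then Exit 1 ; 1 = no match, 2 = bad pattern

; Print one captured value per line instead of opening a GUI window.
ConsoleWrite(_ArrayToString($aFirstWord, @CRLF) & @CRLF)
```

For this sample the pattern matches both lines, but the second capture stops at the first space, so only the first word of the multi-word title comes back. Whether that counts as a failure depends on which reading of the original question is intended.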
TheXman Posted February 11, 2018 (edited)

youtuber said: "I want title1, title2, title3"

I am taking what the OP asked for literally, which is all one can do unless he/she adds more detail. I think you are making an assumption that was not stated in the request, namely that more than just title1, title2, title3 is wanted. You could be right or I could be right. I'm just working off of what was requested, not what I think the OP meant.
ripdad Posted February 11, 2018

Okay.
OldGuyWalking Posted February 11, 2018

youtuber, thanks for asking the question. Your example helped me understand the $STR_REGEXPARRAYGLOBALMATCH (3) flag better. That will come in handy for one-off extraction of data from XML and NZB files. I love accidental learning situations.
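For readers who want to see what that flag changes, here is a small sketch comparing flag 1 ($STR_REGEXPARRAYMATCH) with flag 3 ($STR_REGEXPARRAYGLOBALMATCH). The sample data is a simplified, ASCII-only variant of the titles in this thread, and the variable names are assumptions for illustration.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Simplified sample data (ASCII separators only), assumed for illustration.
Local $sSource = "<title>title1 | wb1</title>" & @CRLF & _
        "<title>title2 - wb2</title>" & @CRLF & _
        "<title>title3 _ wb3</title>"

; Flag 1: stop after the first match and return its capture groups.
Local $aFirst = StringRegExp($sSource, '<title>(\w+)', $STR_REGEXPARRAYMATCH)

; Flag 3: keep scanning and return the captures from every match.
Local $aAll = StringRegExp($sSource, '<title>(\w+)', $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite("Flag 1: " & _ArrayToString($aFirst, ", ") & @CRLF)
ConsoleWrite("Flag 3: " & _ArrayToString($aAll, ", ") & @CRLF)
```

With flag 1 the array holds only the capture from the first <title>, while flag 3 collects one entry per <title> in the string, which is why every snippet in this thread passes 3.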
TheXman Posted February 11, 2018 (edited: added error checking example and comments)

Although I don't like to make assumptions, assuming @ripdad is correct that the OP wants everything between the <title> tags except certain characters or strings, here's a possible solution. It is not meant to be exhaustive, just an example of one way to achieve the goal.

    #include <Array.au3>
    #include <Constants.au3>

    test()

    ;==========================================================================
    ; Extract every title, then scrub unwanted characters from each one
    ;==========================================================================
    Func test()
        Const $kSource = "<title>title1 | wb1</title>" & @CRLF & _
                "<title>title2 – ; wb2</title>" & @CRLF & _
                "<title>title3 – wb3</title>" & @CRLF & _
                "<title>title4 _ wb4</title>"
        Local $aTitles

        ;Parse titles into an array
        $aTitles = StringRegExp($kSource, "(?i)<title>(.*?)</title>", 3)
        Switch @error
            Case 1 ; No matches found
                MsgBox($MB_ICONWARNING, "Test", "No matches found - check regular expression")
                Exit 1
            Case 2 ; Invalid regex
                MsgBox($MB_ICONERROR, "Test", StringFormat("Invalid regular expression - error at position %s", @extended))
                Exit 1
        EndSwitch
        _ArrayDisplay($aTitles, "Raw Titles")

        ;Remove unwanted characters from each title in the array
        For $i = 0 To UBound($aTitles) - 1
            $aTitles[$i] = StringReplace($aTitles[$i], "-", "")
            $aTitles[$i] = StringReplace($aTitles[$i], "–", "")
            $aTitles[$i] = StringReplace($aTitles[$i], "|", "")
            $aTitles[$i] = StringReplace($aTitles[$i], "_", "")
            $aTitles[$i] = StringReplace($aTitles[$i], ";", "")
            $aTitles[$i] = StringRegExpReplace($aTitles[$i], "&#\d{4}", "")
            $aTitles[$i] = StringRegExpReplace($aTitles[$i], " +", " ") ;remove extra spaces
        Next
        _ArrayDisplay($aTitles, "Scrubbed Titles")
    EndFunc
iamtheky Posted February 11, 2018

Why so complex for:

    #include <Array.au3>

    $sSource = "<title>title1 | wb1</title>" & @CRLF & _
            "<title>title22 – ; wb2</title>" & @CRLF & _
            "<title>title333 – wb3</title>" & @CRLF & _
            "<title>title4444 _ wb4</title>"

    ;just titlenumber
    $aRegEx = StringRegExp($sSource, '<title>(\w+)', 3)
    _ArrayDisplay($aRegEx)

    ;everything
    $aRegEx = StringRegExp($sSource, '<title>(.*?)<', 3)
    _ArrayDisplay($aRegEx)
TheXman Posted February 11, 2018

iamtheky said: "why so complex for:"

Because there is a question as to what the OP actually wants. If it is everything between the <title> tags except for certain characters (as it appears he/she may have been trying to do in their example), then your snippet would not achieve that goal. If it is just pulling out the title, then there are a few examples of that too, including yours.
iamtheky Posted February 11, 2018 (edited)

My snippet does both things... The point was how few words it uses to show both examples, rather than having a long-ass discussion about the two things it could mean. I see now multiple similar examples scattered throughout; still a wordy af thread.
TheXman Posted February 11, 2018

iamtheky said: "my snippet does both things..."

Are you sure about that? Your "everything" result still contains all of the characters that the OP appeared to be trying to exclude with their regex. Also, I wasn't aware that we were having a contest on who could use the fewest lines of code.
ripdad Posted February 11, 2018

This is as close as I can get without making my brain bleed. You will have to check for blanks in a For loop, e.g. If $aRegEx[$i] <> '' Then ...

    #include <Array.au3>

    $sSource = "<title>title1 | wb1</title>" & @CRLF & _
            "<title>the quick brown fox</title>" & @CRLF & _
            "<title>title3 – wb3</title>" & @CRLF & _
            "<title>title4 _ wb4</title>"

    $aRegEx = StringRegExp($sSource, '<title>(.*?)[\|\-\_\&\#\–].*</title>|<title>(.*?)</title>', 3)
    _ArrayDisplay($aRegEx)

I'm sure an SRE guru will come along with a simpler solution.
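The blank check mentioned above could look roughly like the sketch below. It is only one way to do it: the sample data is reduced to ASCII separators, the trimming of the trailing space is an added assumption rather than something ripdad specified, and the names $aClean and $iCount are made up for illustration.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Assumed ASCII-only sample; the second title has no separator at all.
Local $sSource = "<title>title1 | wb1</title>" & @CRLF & _
        "<title>the quick brown fox</title>" & @CRLF & _
        "<title>title4 _ wb4</title>"

; ripdad's two-branch pattern (en dash left out of the class for this sample).
Local $aRegEx = StringRegExp($sSource, '<title>(.*?)[\|\-\_\&\#].*</title>|<title>(.*?)</title>', 3)
If @error Then Exit 1

; Copy the non-blank entries into a second array, trimming stray whitespace.
Local $aClean[UBound($aRegEx)]
Local $iCount = 0
Local $sTitle
For $i = 0 To UBound($aRegEx) - 1
    $sTitle = StringStripWS($aRegEx[$i], $STR_STRIPLEADING + $STR_STRIPTRAILING)
    If $sTitle <> '' Then
        $aClean[$iCount] = $sTitle
        $iCount += 1
    EndIf
Next
If $iCount = 0 Then Exit 1 ; nothing usable was captured
ReDim $aClean[$iCount]

ConsoleWrite(_ArrayToString($aClean, @CRLF) & @CRLF)
```

The blanks appear because a title matched by the second branch leaves the first capture group empty, so filtering on an empty string is enough to drop them.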
kylomas Posted February 11, 2018

ripdad, ?

    #include <array.au3>

    local $str = '<title>title1 | wb1</title>' & @CRLF & _
            '<title>title2 – wb2</title>' & @CRLF & _
            '<title>title3 - wb3</title>' & @CRLF & _
            '<title>title4 _ wb4</title>'

    ;msgbox(0,'',$str)
    msgbox(0,'',_arraytostring(stringregexp($str,'title>([^<]+).*',3),@CRLF))

kylomas
ripdad Posted February 11, 2018

kylomas, I wish it were that simple. If you study the code in Post #1, you will see he is trying to remove everything after certain characters within the title. He just wants the initial title.
OldGuyWalking Posted February 11, 2018

My take on this: a slight tweak on the original, and it returns title1, title2, title3, title4.

    #include <Array.au3>

    $sSource = "<title>title1 | wb1</title>" & @CRLF & _
            "<title>title2 – ; wb2</title>" & @CRLF & _
            "<title>title3 – wb3</title>" & @CRLF & _
            "<title>title4 _ wb4</title>"

    $aRegEx = StringRegExp($sSource, '<title>?(\w[^|\-\–\_\s]+)', 3)
    _ArrayDisplay($aRegEx)
youtuber Posted February 11, 2018 Author (edited)

To be clear, I don't want the symbol characters on the left side. What I need is:

    title1
    title2
    title3
    title4
    title5
    title6

    #include <Array.au3>

    $sSource = "<title>»title1 | wb1</title>" & @CRLF & _
            "<title>®☺title2 – ; wb2</title>" & @CRLF & _
            "<title>●title3 – wb3</title>" & @CRLF & _
            "<title>-title4 _ wb4</title>" & @CRLF & _
            "<title>_title5 _ wb5</title>" & @CRLF & _
            "<title> _ title6 _ wb6</title>"

    $aRegEx = StringRegExp($sSource, "<title>.*?([a-zA-Z0-9]+).+<\/title>", 3)
    _ArrayDisplay($aRegEx)

And what if this is what I want?

    title-1
    tit-le2
    tit@le3
    title_4
    tit-le5
    ti_tle6

    #include <Array.au3>

    $sSource = "<title>»title-1 | wb1</title>" & @CRLF & _
            "<title>®☺tit-le2 – ; wb2</title>" & @CRLF & _
            "<title>●tit@le3 – wb3</title>" & @CRLF & _
            "<title>-title_4 _ wb4</title>" & @CRLF & _
            "<title>_tit-le5 _ wb5</title>" & @CRLF & _
            "<title> _ ti_tle6 _ wb6</title>"

    $aRegEx = StringRegExp($sSource, "<title>.*?([a-zA-Z0-9]+).+<\/title>", 3)
    _ArrayDisplay($aRegEx)

For this second case I don't know what the pattern should look like.
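One possible direction for the second case, offered only as a sketch and not as an answer given in the thread: skip every leading character that is not a letter or digit, then capture up to the first whitespace or "<". The pattern below is an assumption built from the sample titles above; ChrW(187) stands in for the » character so the script does not depend on the file's encoding.

```autoit
#include <Array.au3>

; Assumed sample, modelled on the titles above. ChrW(187) is the » character.
Local $sSource = "<title>" & ChrW(187) & "title-1 | wb1</title>" & @CRLF & _
        "<title>tit@le3 - wb3</title>" & @CRLF & _
        "<title>-title_4 _ wb4</title>" & @CRLF & _
        "<title> _ ti_tle6 _ wb6</title>"

; [^a-z0-9<]*  skips leading symbols and spaces (case-insensitive via (?i)).
; ([^\s<]+)    captures the first run of non-space characters, stopping at "<".
Local $aTitles = StringRegExp($sSource, "(?i)<title>[^a-z0-9<]*([^\s<]+)", 3)
If @error Then Exit 1

ConsoleWrite(_ArrayToString($aTitles, @CRLF) & @CRLF)
```

Because the capture only requires its first character to be a letter or digit, characters such as "-", "_" and "@" inside the title are kept. As noted earlier in the thread, a pattern of this kind still returns only the first word of a multi-word title.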