neogia Posted March 24, 2006 (edited)

Here's a smallish guide on unravelling the seeming mysteries of StringRegExp().

StringRegExp( "test", "pattern" [, flag ] )

"test" = The string to search through for matches.
"pattern" = A string consisting of certain key characters that let the function know PRECISELY what you want to match. No ifs, ands, or buts: it's a match or it isn't.
flag [optional] = Tells the function if you just want to know if the "pattern" is found, or if you want it to return the first match, or if you want it to return all the matches in the "test" string.

The Very Basics
------------------
As you may have figured out, the "pattern" string is the only difficult part of calling StringRegExp() (henceforth: SRE). I find it best to think of patterns as telling the function to match a string character by character. There are different ways to find a certain character. If you want to match the string "test", that should be simple enough. You want to tell SRE to first search the string for a "t". If it finds one, then it assumes it has a match, and the rest of the pattern is used to try to prove that what it's found is not a match. So, if the next character is an "e", it could still be a match. Let's say the next letter is an "x". SRE knows immediately that it hasn't found a match because the third character you tell it to look for is an "s".

Example 1
MsgBox(0, "SRE Example 1 Result", StringRegExp("text", 'test'))

In this example, the message box should read "0", which means the pattern "test" was not found in the test string "text". I know this seems pretty simple, but now you know why it wasn't found.

The next way of specifying a pattern is by using a set ("[ ... ]"). You can equate a set to the logic function "OR". Let's use the previous example. We want to find either the string "test" or the string "text".
So, the way I start looking for a pattern is to think like SRE would think: the first character I want to match is "t", then the letter "e"; this is the same for both strings we want to match. Now we want to match "s" OR "x", so we can use a set as a substitute: "[sx]" means match an "s" or an "x". Then the last letter is a "t" again.

Example 2
MsgBox(0, "SRE Example 2 Result", StringRegExp("text", 'te[sx]t'))
MsgBox(0, "SRE Example 2 Result", StringRegExp("test", 'te[sx]t'))

These should both provide the result "1", because the pattern should match both "test" and "text".

You can also specify how many times to match each character by using "{number of matches}", or you can specify a range by using "{min,max}". The first example below is redundant, but shows what I mean:

Example 3
MsgBox(0, "SRE Example 3 Result", StringRegExp("text", 't{1}e{1}[sx]{1}t{1}'))
MsgBox(0, "SRE Example 3 Result", StringRegExp("aaaabbbbcccc", 'b{4}'))

The Not-So Basics
--------------------
Right now you're probably thinking "Isn't this just a glorified StringInStr() function?". Well, using a "flag" value of 0, most of the time you're right. But SRE is much more powerful than that. As you use SREs more and more, you'll find you might know less and less about the type of pattern you are looking for. There are ways to be less and less specific about each character you wish to specify in the pattern. Take, for example, a line from the chat log of a game: "Gnarly Monster hits you for 18 damage." You want to find out how much damage Gnarly Monster hit you for. Well, you can't use StringInStr() because you aren't looking for "18", you're looking for "????", where ? could be any digit.

Here's how I would assemble this pattern. Look at what you do and do not know about what you want to find:
1) You know that it will ALWAYS contain nothing but digits.
2) You know that it will SOMETIMES be 2 characters long.
2a) You know from playing the game that the maximum damage a monster can do is 999.
2b) You know that the minimum damage a monster can do is 0.
3) You know that it will ALWAYS be between 1 and 3 characters long.
4) You know that there are no other digits in the test string.

At this point, I'd like to introduce the FLAG value of "1" and the grouping characters "()". The flag value of "1" means that SRE will not only match your pattern, but also return an array, with each element of the array consisting of a captured "group" of characters. So without veering off course too much, take this example:

Example 4
$asResult = StringRegExp("This is a test example", '(test)', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 4 Result", $asResult[0])
EndIf
$asResult = StringRegExp("This is a test example", '(te)(st)', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 4 Result", $asResult[0] & "," & $asResult[1])
EndIf

So, first the pattern must match somewhere in the test string. If it does, then SRE is told to "capture" any groups ("()") and store them in the return array. You can use multiple captures, as demonstrated by the second piece of code in Example 4.

Ok, back to the Gnarly Monster. Now that we know how to "capture" text, let's construct our pattern. Since you know what you're looking for is digits, there are 3 ways to specify "match any digit": "[:digit:]", "[0-9]", and "\d". The first is probably the easiest to understand. There are a few classes (digit, alnum, space, etc. Check the helpfile for a full list) you can use to specify sets of characters, one of them being digit. "[0-9]" just specifies a range of all the digits 0 through 9. "\d" is just a special character that means the same as the first two. There is no difference between the three, and with all SREs there are usually at least a couple of ways to construct any pattern.

So, first we know we want to capture the digits, so indicate that with the opening parenthesis "(". Next, we know we want to capture between 1 and 3 characters, all consisting of digits, so our pattern now looks like "([0-9]{1,3}".
And finally close it off with the closing parenthesis to indicate the end of our group: "([0-9]{1,3})". Let's try it:

Example 5
$asResult = StringRegExp("Gnarly Monster hits you for 18 damage.", '([0-9]{1,3})', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 5 Result", $asResult[0])
EndIf

There you go, the message box correctly displays "18".

Next we need to cover non-capturing groups. The way you indicate these groups is by opening the group with "(?:" instead of just "(". Let's say your log says "You deflect 36 of Gnarly Monster's 279 damage." Now if you run Example 5's SRE on this, you'll come up with "36" instead of "279". Now what I like to do here is just determine what's different between the numbers. One difference that jumps out at me is that the second number is always followed by a space and then the word "damage". We could just modify our previous pattern to be "([0-9]{1,3} damage)", but what if our script is just looking for the amount of damage, without " damage" tacked onto the end of the number? Here's where you can use a non-capturing group to accomplish this.

Example 6
$asResult = StringRegExp("You deflect 36 of Gnarly Monster's 279 damage.", '([0-9]{1,3})(?: damage)', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 6 Result", $asResult[0])
EndIf

This could get lengthy, but mostly I just wanted to lay out the foundation for how regular expressions work, and mainly how SRE "thinks". A few things to keep in mind:
- Remember to think about the pattern one character at a time.
- The StringRegExp() function finds the first character in the pattern, then it's your job to provide enough evidence to "prove" whether or not it truly is a match. Example 6 is a good display of this.
- Remember [ ... ] means OR ([xyz] matches an "x", a "y", OR a "z").

If you have any other questions, consult the help file first! It explains in detail all of the nitty gritty syntax that comes along with SREs. One thing to look at in particular is the section on "Repeating Characters".
It can make your pattern more readable by substituting certain characters for ranges. For example: "*" is equivalent to {0,}, or the range from 0 to any number of characters.

Good luck! Regular expressions can greatly decrease the length of your code, and make it easier to modify later. Corrections and feedback are welcome!

Resources
------------
Wikipedia Article - Regular Expressions - Thanks blindwig.
StringRegExpGUI.au3 (GUI for testing various StringRegExp() patterns) - Thanks steve8tch. Credit: w0uter

Edited March 24, 2006 by neogia

My UDFs: Coroutine Multithreading UDF Library, StringRegExp Guide, Random Encryptor, ArrayToDisplayString
"The Brain, expecting disaster, fails to find the obvious solution." -- neogia

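As a rough sketch of the guide's closing note on repeating characters: "*" stands in for "{0,}", and, assuming the engine follows the usual regular-expression conventions (the guide itself only mentions "*"), "+" stands in for "{1,}". Both calls below reuse the Gnarly Monster line from Example 5 and should capture the same digits.

```autoit
; Explicit range: one or more digits, followed by " damage".
$asLong = StringRegExp("Gnarly Monster hits you for 18 damage.", '([0-9]{1,}) damage', 1)
If @error == 0 Then
    MsgBox(0, "Explicit {1,}", $asLong[0])
EndIf

; Shorthand: "+" is assumed here to mean the same as "{1,}".
$asShort = StringRegExp("Gnarly Monster hits you for 18 damage.", '([0-9]+) damage', 1)
If @error == 0 Then
    MsgBox(0, "Shorthand +", $asShort[0])
EndIf
```

If the two message boxes ever disagree, the help file's "Repeating Characters" section is the place to check which shorthand the installed version supports.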
SmOke_N (Moderator) Posted March 24, 2006

This is GREAT... neogia, thanks for sharing.

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Xenobiologist Posted March 24, 2006

> This is GREAT... neogia, thanks for sharing.

Hi,
I agree! Thank you very much! I have to play around with it.
So long,
Mega

Scripts & functions
Organize Includes: Let Scite organize the include files
Yahtzee: The game "Yahtzee" (Kniffel, DiceLion)
LoginWrapper: Secure scripts by adding a query (authentication)
_RunOnlyOnThis UDF: Make sure that a script can only be executed on ... (Windows / HD / ...)
Internet-Café Server/Client Application: Open CD, Start Browser, Lock remote client, etc.
MultipleFuncsWithOneHotkey: Start different funcs by hitting one hotkey different times

PsaltyDS Posted March 24, 2006

> Here's a smallish guide on unravelling the seeming mysteries of StringRegExp().

Nice job. Thanks.

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

blindwig Posted March 24, 2006

Just an FYI to anyone who is new to regular expressions: these are not a new concept, nor are they special to AutoIt. Anyone who has experience with Unix/Posix, Perl, or Tcl, for example, would be familiar with regular expressions.

Here's the Wikipedia page for regular expressions:
http://en.wikipedia.org/wiki/Regular_expression

The point is, if you learn regular expressions, that knowledge will be useful to you beyond just AutoIt.

My UDF Threads:
Pseudo-Hash: Binary Trees, Flat Tables
Files: Filter by Attribute, Tree List, Recursive Find, Recursive Folders Size, exported to XML
Arrays: Nested, Pull Common Elements, Display 2d
System: Expand Environment Strings, List Drives, List USB Drives
Misc: Multi-Layer Progress Bars, Binary Flags
Strings: Find Char(s) in String, Find String in Set
Other UDF Threads I Participated: Base64 Conversions

seandisanti Posted March 24, 2006

awesome. i think i'll give SRE another try now. i've put them off a few times. thanks for the effort putting this together

slightly_abnormal Posted March 24, 2006

you think this will make it into the help file?

steve8tch Posted March 24, 2006

I have attached a utility called StrRegExpGUI.au3. It was originally written by @w0uter and modified. I use it all the time to make up patterns and quickly test and retest to see if the pattern is going to do what I want. A number of people have already downloaded it. I would urge people who are learning this function to use it.

StrRegExpGUI.au3

neogia (Author) Posted March 24, 2006

Thanks for all the great feedback, guys. I've added the resources listed by blindwig and steve8tch to the first post.

billmez Posted March 25, 2006

> Here's a smallish guide on unravelling the seeming mysteries of StringRegExp().

Great job, neogia!

One thing I would add is a mention of the importance of escaping special characters in the pattern with a backslash, as mentioned in the help file. Unescaped special characters are often the reason a match fails.

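A minimal sketch of the escaping point, using the "." character (which the patterns elsewhere in this thread, such as ".*", rely on as "any character"). The test strings are made up for illustration.

```autoit
; Unescaped "." is a wildcard, so the "x" in "3x14" satisfies it.
MsgBox(0, "Unescaped dot", StringRegExp("3x14", '3.14'))

; "\." asks for a literal period, so "3x14" should no longer match...
MsgBox(0, "Escaped dot, 3x14", StringRegExp("3x14", '3\.14'))

; ...while "3.14" still does.
MsgBox(0, "Escaped dot, 3.14", StringRegExp("3.14", '3\.14'))
```

The same backslash treatment applies to the other characters the help file lists as special, for example parentheses and square brackets.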
Uten Posted April 13, 2006

Hi neogia,

Seems to me you know your way around RegExps in AutoIt, so I seek your advice. I have posted this in the forum, but there seem to be no takers.

I have used ^ and $ quite a lot in vim, sed and awk but can't get them to work in AutoIt. They definitely work if you pass a single line to the StringRegExp function. But if you pass several lines (@CRLF separated), ^ and $ do not seem to work. Do you have any comments on this? Spot any obvious mistake in my code?

I'm having trouble matching patterns restricted to one line. This sample code is rather specific and I can extract the lines by using some of the other words, but that is not the point. I want to understand how I can write my pattern to match something starting on a line and ending on the same line.

#include <Array.au3>
; Sample found in nutsters RegExp_Test_4.au3
if not StringRegExp("Theee always was work.", "^The*\s+\w+ was \w+[.]$", 0 ) then
    msgbox(16, "ERROR:","StringRegExp: Failed")
    Exit
endif
; === My code =====================
$data = "This is a expected test line" & @CRLF & _
        "And This is Not a expected line" & @CRLF & _
        "This is the second expected msg" & @CRLF & _
        "This is not wanted"
$regexp ="(^This.*$)"
$arr = StringRegExp($data, $regexp,3)
ConsoleWrite("@error:=" & @error & ", @extended:=" & @extended & " , Ubound($arr):=" & UBound($arr) & @CRLF)
_ArrayDisplay($arr, "MSG")
ConsoleWrite("EXIT")

According to nutster, ^ and $ were planned to be documented, but I can't find them in the help file for v3.1.1.118 (beta). And according to this post, ^ and $ should work, but do they when you want to get the matches in an array?

Please keep your sig. small! Use the help file. Search the forum. Then ask unresolved questions :)
Script plugin demo, Simple Trace udf, TrayMenuEx udf, IOChatter demo, freebasic multithreaded dll sample, PostMessage, Aspell, Code profiling

neogia (Author) Posted April 14, 2006 (edited)

Thanks for the compliment, I love creating RegExps. Well, I'm not completely familiar with all other languages' regular expression syntax, but I know that with AutoIt "^" means "only match if this is the first character in the test string" and likewise, "$" means "only match if this is the last character in the test string". I'm afraid it's not supposed to work on a per-line basis. If you do need to do this type of thing, though, I've created a workaround for you, with one discrepancy: you list the last line in the test string as "This is not wanted", but I don't see a difference that will be discerned by your test pattern. Is this a typo, or am I missing something? My script recognizes that line, along with the first and third, as matches.

#include <Array.au3>
$data = "This is a expected test line" & @CRLF & _
        "And This is Not a expected line" & @CRLF & _
        "This is the second expected msg" & @CRLF & _
        "This is not wanted"
; Pad the data so that every line is bracketed by @CRLF on both sides
If StringLeft($data, 2) <> @CRLF Then
    $data = @CRLF & $data
EndIf
If StringRight($data, 2) <> @CRLF Then
    $data = $data & @CRLF
EndIf
$results = ""
$match = StringRegExp($data, '\r\n(This.*?)(\#)\r\n', 1)
While @extended == 1
    ; Collect the captured line, creating the array on the first hit
    If Not IsArray($results) Then
        $results = _ArrayCreate($match[0])
    Else
        _ArrayAdd($results, $match[0])
    EndIf
    ; Trim the text already searched, using the value returned by the second group
    $data = StringTrimLeft($data, $match[1])
    $match = StringRegExp($data, '\r\n(This.*?)(\#)\r\n', 1)
WEnd
_ArrayDisplay($results, "")

Also, if anyone else would like help with a certain pain-in-the-butt RegExp, post it here, and I'll make one for you. I'm in the process of lengthening my guide, and would like more examples to include.

Edited April 14, 2006 by neogia

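For readers on a later AutoIt release: the sketch below assumes a StringRegExp() backed by the PCRE engine, where the inline option "(?m)" lets "^" match at the start of every line. That option is an assumption about newer versions and may not exist in the beta discussed in this thread. Under that assumption, flag 3 should collect every line beginning with "This" in a single call, with no loop needed.

```autoit
#include <Array.au3>

$data = "This is a expected test line" & @CRLF & _
        "And This is Not a expected line" & @CRLF & _
        "This is the second expected msg" & @CRLF & _
        "This is not wanted"

; (?m) is assumed to switch on multi-line mode, so ^ anchors at each line start.
; [^\r\n]* stops at the line break, so no trailing @CR or @LF is captured.
$aLines = StringRegExp($data, '(?m)^(This[^\r\n]*)', 3)
If @error == 0 Then
    _ArrayDisplay($aLines, "Lines starting with This")
EndIf
```

As with the workaround above, the fourth line also starts with "This", so it would be returned alongside the first and third.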
GrungeRocker Posted April 15, 2006

thanks now i did understand it!....i think....

In work:
1. InstallIt
2. New version of SpaceWar

Uten Posted April 15, 2006 (edited)

> I'm afraid it's not supposed to work on a per-line basis. If you do need to do this type of thing, though, I've created a workaround for you, with one discrepancy: you list the last line in the test string as "This is not wanted", but I don't see a difference that will be discerned by your test pattern. Is this a typo, or am I missing something? My script recognizes that line, along with the first and third, as matches.

Very well explained, and you're right: the last line in my test string is a typo, as I was thinking of getting all lines starting with a T and ending with a @CRLF (and probably @LF and @CR).

After I posted I realized that the AutoIt RegExp functions work differently than I expected. My expectations were indeed influenced by my knowledge of sed and awk (which take what's passed and parse it line by line).

> Also, if anyone else would like help with a certain pain-in-the-butt RegExp, post it here, and I'll make one for you. I'm in the process of lengthening my guide, and would like more examples to include.

I have to run now, but I know that there are a few places denoted "sed one liners" and "awk one liners" with lots of straightforward, but sometimes complicated, samples. I think AutoIt would benefit from such a collection of samples. I'll volunteer to work with you to get such an AutoIt collection into your guide, if you like.

Thanks for the sample, explanation and your very nice guide.

EDIT: Did not have a spell checker available when posting.

Edited April 15, 2006 by Uten

Uten Posted April 15, 2006

Sed one liners
Awk one liners
One line programs

I'm not claiming that all of these one liners apply to tasks done in AutoIt. But I think most of them are good starting points to get a grip on RegExp patterns. Especially if we translate them (many or some) into AutoIt samples.

Obviously it would be a crime not to mention http://www.autoitscript.com/fileman/users/Nutster/

CharlieK Posted May 3, 2006

I feel like a dunce. I've looked over the online documents, the help file, release notes, FAQ, I've searched on this forum for example code, etc - but I can't seem to find out where StringRegExp is. I ran the sample code that uses the function, from this thread, and AutoIT says "unknown function name". Yet, there seems to be plenty of sample code up here using the function.

I used the sample code example:

$asResult = StringRegExp("Gnarly Monster hits you for 18 damage.", '([0-9]{1,3})', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 5 Result", $asResult[0])
EndIf

I'm using v3.1.0. Can someone tell me what I am missing?

seandisanti Posted May 3, 2006

> I'm using v3.1.0. Can someone tell me what I am missing?

it sounds like you're missing the beta. the stringregexp function requires the beta. the beta has a lot more added functionality than that too, and has improved many times over on the production version 3.1

SmOke_N (Moderator) Posted May 3, 2006

> I'm using v3.1.0. Can someone tell me what I am missing?

You're missing Beta is all: AutoIt Beta

yair Posted June 15, 2006 (edited)

hi, love this forum. i use expresso to learn and train my regexp strings, but it is a little different than the AutoIt regexp engine. could someone explain which other regexp engines' syntax AutoIt is most compatible with?

Edited June 15, 2006 by yair

BigDaddyO Posted June 15, 2006 (edited)

Here are some more examples. I will edit this post and add more as I finish them.

;=====================================================================
; Verify that an e-mail address is properly formatted
; This was modified from a string located on Regular-Expressions.info
;=====================================================================
$EmailAdds = "email adds here"
if StringRegExp($EmailAdds, "\<[A-Za-z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\>") Then
    MsgBox(0, "E-Mail", "The E-Mail address entered is Valid")
Else
    MsgBox(0, "E-Mail", "Please enter a Valid e-mail address")
EndIf

;======================================================================
; Search text and return all valid e-mail addresses that are in there
;======================================================================
$Text = FileRead("email.txt")
$EmailFound = StringRegExp($Text, "([A-Za-z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})", 3)
if @extended = 1 Then
    for $i = 0 to UBound($EmailFound) - 1
        MsgBox(0, "E-Mail", $EmailFound[$i])
    Next
Else
    MsgBox(0, "E-Mail", "No E-Mail addresses found in the supplied text")
EndIf

;edit: Here is my current script to format a last name to insert into a database
#include <String.au3> ; provides _StringProper()
$LastName = "Last'Name-Here"
MsgBox(0, "Formatted for Database insertion", _StringProper(StringRegExpReplace($LastName, "[^-a-zA-Z0-9]|Jr\.*\>|jr\.*\>", "")))

I would like to make a Proper Case string using StringRegExpReplace, but I can't figure out how to replace a lowercase letter with the exact same letter in uppercase. Any ideas?

Mike

Edited June 15, 2006 by MikeOsdx
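One possible way around the proper-case question, sketched without StringRegExpReplace since nothing in this thread shows a replacement changing case: walk the string one character at a time and let StringUpper() and StringLower() do the conversion, using a small pattern only to decide what counts as a letter. _ProperCaseSketch is a hypothetical helper name for illustration, not an existing UDF.

```autoit
; Should display "Last-Name Here"
MsgBox(0, "Proper case", _ProperCaseSketch("last-name here"))

; Hypothetical helper: upper-case the first letter of each run of letters,
; lower-case the rest, and copy every other character unchanged.
Func _ProperCaseSketch($sText)
    Local $sOut = "", $bStart = True, $sChar
    For $i = 1 To StringLen($sText)
        $sChar = StringMid($sText, $i, 1)
        If StringRegExp($sChar, "[A-Za-z]") Then
            If $bStart Then
                $sOut &= StringUpper($sChar)
            Else
                $sOut &= StringLower($sChar)
            EndIf
            $bStart = False
        Else
            ; Any non-letter (space, hyphen, digit) starts a new word
            $sOut &= $sChar
            $bStart = True
        EndIf
    Next
    Return $sOut
EndFunc
```

Treating every non-letter as a word break is a design choice of this sketch; names such as "O'Brien" or "McDonald" would need rules of their own.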