avery, posted December 15, 2009

Does anyone know whether this is impossible to do with RegEx? (Warning: headache material.)

I want to populate an array like this (the row of digits under each line marks which array element each character should go into):

Example #1
04/19/2005 09:16 AM 16,384 BUILTIN\Administrators filename.doc
1111111111 22222222 333333 44444444444444444444444444444444444

Example #2
04/19/2005 09:16 AM 16,384 BUILTIN\Administrators filename.doc
1111111111 22222222 333333 4444444444444444444444 555555555555

The blank areas are not tabs; unforgivably, they are spaces. I also understand that a login name could include spaces, so I figured "Example #1" is impossible or has too high a probability of producing bad results. I figured "Example #2" might be doable if I understood regex better. I tried StringSplit, but the delimiters are not consistent enough for me to get good results.

Please, if there are any regex gurus out there, help me. These things hurt my head worse than anything else. I understand I am asking for a lot of help. I'll donate $10 to jon@autoitscript.com to help with his hosting bills if anyone is willing to try and help me out with my regex.

Thanks for reading my post.

Respectfully,
Avery Howell

Merry Christmas or Happy Holidays!

The autoitscript.com domain runs on its own physical and dedicated server and currently handles 30GB of traffic per day. The hosting fees are paid for by user donations and my own money. Please make a donation if you feel AutoIt is worth supporting. No amount is too small; it all helps.
Thanks,
Jon

www.abox.org
Avery Howell
Visit My AutoIt Website: http://www.abox.org
PsaltyDS, posted December 15, 2009

Oh, c'mon, it wasn't that har... uhmm... I mean... That was tough! Here you go:

#include <Array.au3>

Global $aInput[3] = ["03/18/2007 08:16 AM 987 BUILTIN\Users SmallFile.doc", _
        "05/20/2008 12:01 PM 16,384 BUILTIN\Administrators filename.doc", _
        "04/19/2005 09:16 AM 2,316,384 BUILTIN\Guests BigFile.doc"]
Global $sRegExp = "(\d{2}/\d{2}/\d{4})(?:\s+)(\d{2}:\d{2}\s[[:alpha:]]{2})(?:\s+)([0-9,]+)(?:\s+)(.+)"

For $n = 0 To UBound($aInput) - 1
    $aRET = StringRegExp($aInput[$n], $sRegExp, 3)
    If IsArray($aRET) Then
        _ArrayDisplay($aRET, $n & ": $aRET")
    Else
        ConsoleWrite($n & ": Error" & @LF)
    EndIf
Next

Make that donation to AutoIt commensurate with the extreme effort this required!

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law
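PsaltyDS's pattern can also be tried outside AutoIt. Below is a minimal Python sketch of the same four-group idea, offered as a translation rather than as AutoIt's own behavior. Two assumptions: Python's `re` module has no POSIX bracket classes, so `[[:alpha:]]` is swapped for `[A-Za-z]`, and the helper name `split_listing_line` is hypothetical.

```python
import re

# Python translation of the four-group pattern above. Assumption: Python's re
# module has no POSIX bracket classes, so [[:alpha:]] becomes [A-Za-z].
LISTING_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4})\s+"         # 1: date, e.g. 04/19/2005
    r"(\d{2}:\d{2}\s[A-Za-z]{2})\s+"  # 2: time, e.g. 09:16 AM
    r"([0-9,]+)\s+"                   # 3: size, e.g. 2,316,384
    r"(.+)"                           # 4: owner and file name together
)


def split_listing_line(line):
    """Return [date, time, size, rest] (roughly what StringRegExp flag 3
    gives for one matching line), or None if the line does not match."""
    m = LISTING_RE.match(line)
    return list(m.groups()) if m else None


if __name__ == "__main__":
    samples = [
        "03/18/2007 08:16 AM 987 BUILTIN\\Users SmallFile.doc",
        "05/20/2008 12:01 PM 16,384 BUILTIN\\Administrators filename.doc",
        "04/19/2005 09:16 AM 2,316,384 BUILTIN\\Guests BigFile.doc",
    ]
    for n, line in enumerate(samples):
        print(n, split_listing_line(line))
```

Because the fourth group is just `(.+)`, the owner and file name come back together, which corresponds to Example #1 rather than Example #2.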
SmOke_N (Moderator), posted December 16, 2009

I was just going to offer another pattern example to achieve the same thing... however, I had a thought that maybe this is one larger fileread or string. So...

#include <Array.au3>

Global $s_string = "04/19/2005 09:16 AM 16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 44444444444444444444444444444444444" & @CRLF
$s_string &= "08/24/2006 11:23 PM 6 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 4444444444444444444444 555555555555"

; If we have a large string, we can do this in two parts (or one if you want to step 4)
; Get just the lines that are valid
Global $a_just_lines = _myString_GetValidLinesArray($s_string)
If IsArray($a_just_lines) = 0 Then Exit
_ArrayDisplay($a_just_lines)

; If we are not skipping the above (not using Step 4)
; then we can send each individual line and get the 4 parts of the values returned
Global $a_sep_data
For $i = 0 To UBound($a_just_lines) - 1
    $a_sep_data = _myString_GetValidDataArray($a_just_lines[$i])
    _ArrayDisplay($a_sep_data)
Next

Func _myString_GetValidLinesArray($s_string)
    Local $s_pattern = "(\d{2}/\d{2}/\d{4}\s+\d+:\d+\s+(?:AM|PM)\s+[\d,]+\s+.+?)(?:\v|\z)"
    Return StringRegExp($s_string, $s_pattern, 3)
EndFunc

Func _myString_GetValidDataArray($s_string)
    Local $s_pattern = "(\d{2}/\d{2}/\d{4})\s+(\d+:\d+\s+(?:AM|PM))\s+([\d,]+)\s+(.+?)(?:\v|\z)"
    Return StringRegExp($s_string, $s_pattern, 3)
EndFunc

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
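SmOke_N's two-pass idea (first pull the valid lines out of a big string, then split each one) can be sketched in Python as follows. This is an illustrative translation, not the AutoIt functions themselves: `get_valid_lines` and `get_fields` are hypothetical names, and `[ \t]+` / `[^\r\n]+` stand in for `\s+` and the `(?:\v|\z)` terminator so that each match stays on one line when the text has CRLF endings.

```python
import re

# Pass 1: pull whole listing lines out of a larger block of text. Lines that do
# not start with date/time/size (such as the digit template rows) are skipped.
# [ \t]+ and [^\r\n]+ keep each match on a single line even with CRLF endings.
VALID_LINE_RE = re.compile(
    r"^\d{2}/\d{2}/\d{4}[ \t]+\d+:\d+[ \t]+(?:AM|PM)[ \t]+[\d,]+[ \t]+[^\r\n]+",
    re.MULTILINE,
)

# Pass 2: split one valid line into its four parts.
FIELDS_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4})[ \t]+(\d+:\d+[ \t]+(?:AM|PM))[ \t]+([\d,]+)[ \t]+(.+)"
)


def get_valid_lines(text):
    # The pattern has no capture groups, so findall returns whole matches.
    return VALID_LINE_RE.findall(text)


def get_fields(line):
    m = FIELDS_RE.match(line)
    return list(m.groups()) if m else []


if __name__ == "__main__":
    text = (
        "04/19/2005 09:16 AM 16,384 BUILTIN\\Administrators filename.doc\r\n"
        "1111111111 22222222 333333 44444444444444444444444444444444444\r\n"
        "08/24/2006 11:23 PM 6 BUILTIN\\Administrators filename.doc\r\n"
        "1111111111 22222222 333333 4444444444444444444444 555555555555"
    )
    for line in get_valid_lines(text):
        print(get_fields(line))
```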
PsaltyDS, posted December 16, 2009

Quote (SmOke_N): "I was just going to offer another pattern example to achieve the same thing... however, I had a thought that maybe this is one larger fileread or string. So..."

Don't forget to emphasize what a huge level of effort this requires. We'd hate to see avery feel like a Scrooge at Christmas, now wouldn't we?
enaiman, posted December 16, 2009

Not the worst case to work with: only 1 group out of 4 is "unknown" to you (it may or may not contain spaces). You know for sure that the first and third groups do not contain any spaces, and that the second group contains exactly one space.

It can easily be done without StringRegExp (easy for me, because StringRegExp is still a matter of trial and error for me) this way:
- StringStripWS with flag 4 (strip double or more spaces between words)
- StringSplit on " " (a single space); element [0] holds the count
- [1] is the first group (date)
- [2] & [3] are the time
- [4] is the size
- whatever is left ([5] onward, joined back with spaces) is the last group

It could have been worse: other groups might or might not contain spaces, or might or might not be present... and you were talking about headaches.

SNMP_UDF ... for SNMPv1 and v2c so far, GetBulk and a new example script
wannabe "Unbeatable" Tic-Tac-Toe
Paper-Scissor-Rock ... try to beat it anyway :)
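The same steps translate almost one-to-one into Python; this minimal sketch is only an illustration, and the function name `split_without_regex` is hypothetical. Python's argument-less `str.split()` collapses runs of whitespace and splits in one call, and Python lists are 0-based, so AutoIt's `[1]`..`[4]` become `[0]`..`[3]`.

```python
def split_without_regex(line):
    """Split a listing line into [date, time, size, rest] without regex.

    str.split() with no argument splits on runs of whitespace, covering both
    the StringStripWS(..., 4) and the StringSplit(..., " ") steps at once.
    """
    parts = line.split()
    if len(parts) < 5:
        return None  # not enough pieces for date, time (2 parts), size and the rest
    date = parts[0]
    time = parts[1] + " " + parts[2]  # "09:16" + "AM"
    size = parts[3]
    rest = " ".join(parts[4:])        # owner + file name, rejoined with single spaces
    return [date, time, size, rest]


if __name__ == "__main__":
    print(split_without_regex(
        "04/19/2005   09:16 AM     16,384 BUILTIN\\Administrators   filename.doc"))
```

As with StringStripWS flag 4, any run of spaces inside the owner or file name comes back as a single space.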
Malkey, posted December 16, 2009

Here is another attempt at using the string of numbers in each example as a template for the entries of an array.

#include <Array.au3>

Global $s_string = "04/19/2005 09:16 AM 16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 44444444444444444444444444444444444" & @CRLF
$s_string &= "08/24/2006 11:23 PM 16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 4444444444444444444444 555555555555"

Local $temp
$aInput = StringSplit(StringRegExpReplace($s_string, "([ ]+)", " "), @CRLF, 3)

For $Ex = 0 To UBound($aInput) - 1 Step 2
    Local $Pat = StringRegExp($aInput[$Ex + 1], "([^ ]+)", 3)
    Local $aArray[UBound($Pat)]
    ConsoleWrite($aInput[$Ex] & @CRLF)
    $Num = 1
    For $i = 0 To StringLen($aInput[$Ex + 1] & " ")
        If StringMid($aInput[$Ex + 1] & " ", $i, 1) = $Num Then
            $temp &= StringMid($aInput[$Ex], $i, 1)
        EndIf
        If StringMid($aInput[$Ex + 1] & " ", $i, 1) = " " Then
            $aArray[$Num - 1] = $temp
            $Num += 1
            ConsoleWrite($Num & " " & $temp & @CRLF)
            $temp = ""
        EndIf
    Next
    _ArrayDisplay($aArray)
Next
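The template idea can be sketched in Python as below: each data character is assigned to the field named by the digit directly under it, so the space inside "09:16 AM" stays in the time field while a space in the template row ends a field. This assumes the data row and the template row line up character for character; `split_by_template` is a hypothetical name, and the approach only helps when such a template row actually exists.

```python
def split_by_template(data_line, template_line):
    """Assign each character of data_line to the field named by the character
    at the same position in template_line. Spaces in the template separate
    fields; spaces in the data that sit under a digit are kept."""
    fields = {}  # dicts keep insertion order (Python 3.7+), i.e. template order
    for ch, tag in zip(data_line, template_line):
        if tag != " ":
            fields[tag] = fields.get(tag, "") + ch
    return list(fields.values())


if __name__ == "__main__":
    data = "04/19/2005 09:16 AM 16,384 BUILTIN\\Administrators filename.doc"
    template = "1111111111 22222222 333333 " + "4" * 22 + " " + "5" * 12
    print(split_by_template(data, template))
```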
Skruge, posted December 16, 2009

Quote (PsaltyDS): "Don't forget to emphasize what a huge level of effort this requires. We'd hate to see avery feel like a Scrooge at Christmas, now wouldn't we?"

You rang?

Seriously though, my contribution to this matter is this: the given output looks exactly like the output of the "dir /q" command. If that is correct, then the owner field is fixed at 23 characters (longer names run straight into the file name with no space between them; shorter names are padded with spaces).

"Tougher than the toughies and smarter than the smarties"
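If the fixed-width claim holds, the owner can be cut out by position instead of by spaces, which sidesteps login names that contain spaces. The Python sketch below takes the 23-character width from the post above as an unverified assumption and also assumes exactly one space between the size and the owner column; `OWNER_WIDTH` and `parse_dir_q_line` are hypothetical names, and this is not a specification of the real `dir /q` layout.

```python
import re

# Assumption taken from the post above, not verified here: "dir /q" pads the
# owner column to a fixed 23 characters. Adjust if your output differs.
OWNER_WIDTH = 23
OWNER_COLUMN = r"(.{%d})" % OWNER_WIDTH  # owner sliced by width, not by spaces

DIR_Q_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4})\s+"    # date
    r"(\d{2}:\d{2}\s[AP]M)\s+"   # time
    r"([\d,]+) "                 # size, then one separator space (assumed)
    + OWNER_COLUMN +
    r"(.+)"                      # file name, which may contain spaces
)


def parse_dir_q_line(line):
    m = DIR_Q_RE.match(line)
    if m is None:
        return None
    date, time, size, owner, name = m.groups()
    return [date, time, size, owner.rstrip(), name]  # drop the owner padding


if __name__ == "__main__":
    line = ("04/19/2005  09:16 AM  16,384 "
            + "DOMAIN\\John Smith".ljust(OWNER_WIDTH) + "my file.doc")
    print(parse_dir_q_line(line))
```

Check `OWNER_WIDTH` and the single separator space against your actual `dir /q` output before relying on this.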
Mison, posted December 16, 2009 (edited December 16, 2009 by Mison)

Regex pattern, single line:
[\d/:,]+(?:\s[AP]M)?|[A-Z]+\\.*(?=\s)|[a-z.]+
This doesn't work if the login name has spaces. Fixed, multiline mode:
(*ANYCRLF)(?m)[\d/:,]+(?:\s[AP]M)?|[A-Z]+\\.*(?=\s\S+$)|[a-z.]+

Hi ;)
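Here is a Python sketch of the multiline tokenizing pattern, using `(?:\s[AP]M)?` for the time suffix so that both " AM" and " PM" are captured. Python's `re` has no `(*ANYCRLF)`, so as an assumption the sketch normalizes CRLF to LF and relies on `re.MULTILINE` in place of `(?m)`; `tokenize` is a hypothetical name. The owner alternative assumes the file name itself has no spaces, and the last alternative only accepts lower-case file names.

```python
import re

# Tokenizing version of the multiline pattern. re.MULTILINE stands in for (?m);
# (*ANYCRLF) has no Python equivalent, so CRLF is normalized to LF first.
TOKEN_RE = re.compile(
    r"[\d/:,]+(?:\s[AP]M)?"   # date, time (with AM/PM) or size
    r"|[A-Z]+\\.*(?=\s\S+$)"  # DOMAIN\owner: everything up to the last space on the line
    r"|[a-z.]+",              # lower-case file name
    re.MULTILINE,
)


def tokenize(text):
    # The pattern has no capture groups, so findall returns each full match in order.
    return TOKEN_RE.findall(text.replace("\r\n", "\n"))


if __name__ == "__main__":
    text = ("04/19/2005 09:16 AM 16,384 BUILTIN\\Administrators filename.doc\r\n"
            "08/24/2006 11:23 PM 6 BUILTIN\\Administrators filename.doc")
    print(tokenize(text))
```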
Malkey, posted December 16, 2009

Another attempt.

#include <Array.au3>

Global $s_string = "04/19/2005 09:16 AM 16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 44444444444444444444444444444444444" & @CRLF
$s_string &= "08/24/2006 11:23 PM 16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 4444444444444444444444 555555555555"

$aInput = StringSplit(StringRegExpReplace($s_string, "([ ]+)", " "), @CRLF, 3)

For $Ex = 0 To 1
    Local $aResult = StringRegExp($aInput[$Ex], "(.{10}) *(.{8}) *(.{6}) *(.*)", 3)
    _ArrayDisplay($aResult)
Next

For $Ex = 2 To 3
    Local $aResult2 = StringRegExp($aInput[$Ex], "(.{10}) *(.{8}) *(.{6}) *(.*?) (.*)", 3)
    _ArrayDisplay($aResult2)
Next
Anteaus, posted December 17, 2009

Just a point: unless strings containing spaces are enclosed in quotes (which it looks like they aren't), I don't think #2 can be done by any method. #1 should be feasible if times and dates are assumed to be in a regular format.

For example, if you have "domain\user name file.txt", there is no way of telling which section 'name' belongs to, so you cannot separate #4 from #5.
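The ambiguity is easy to demonstrate. The Python sketch below lists every place where `domain\user name file.txt` could be cut into owner and file name, assuming only that the cut is a single space somewhere after the backslash; `candidate_splits` is a hypothetical helper. With two candidates and no quoting or fixed column width, nothing in the text picks one.

```python
def candidate_splits(rest):
    """Return every (owner, file_name) pair obtained by cutting `rest` at one
    space that comes after the backslash of a DOMAIN\\user owner.

    More than one pair means the text alone cannot tell where the owner ends.
    """
    slash = rest.find("\\")  # -1 if there is no backslash; then any space qualifies
    return [
        (rest[:i], rest[i + 1:])
        for i, ch in enumerate(rest)
        if ch == " " and i > slash
    ]


if __name__ == "__main__":
    print(candidate_splits("domain\\user name file.txt"))
    print(candidate_splits("BUILTIN\\Administrators filename.doc"))
```

For the thread's sample text, `BUILTIN\Administrators filename.doc`, there is only one candidate, which is why the examples parse cleanly.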
avery (thread author), posted December 17, 2009

Thanks, guys. I still think it was a hard regex.

In my examples, the rows of digits under the data were the array index numbers I wanted the regex to create; the 111, 222, 333 rows are not in the original data source I'm looking to parse. I'm pretty sure these awesome regexes would work either way, correct?

I will make the donation just as I promised, and it was totally worth it, even though some of you think it was easy. I've always struggled with regex for some reason. Maybe someone will buy me a regex book for Christmas; it was on my list to Santa.
enaiman, posted December 17, 2009

Told you it can be done without using RegEx. I agree that RegEx results in shorter and faster code, and for the RegEx gurus nothing is easier, but there are always workarounds. There is always at least one other way to do it.
danielkza, posted December 18, 2009

@avery: Have you checked http://www.regular-expressions.info? RegEx looked like voodoo to me as well until I took the time to read a good deal of its material. They have pretty clear examples of all the features (including more advanced topics like lookarounds and greediness), along with explanations of how each match is performed.

PS: Some motivation for you, if you need it: "Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee."
PsaltyDS, posted December 18, 2009

@danielkza: Your linky seemed to be a mashup of Cameron Laird's personal notes on "Regular Expressions" and Regular-Expressions.info.