guinness Posted September 15, 2012

The following code is fully working, but my question is: how would you have written the regular expression? I currently have two groups, one for the MD5 checksum and one for the file path:

    (\w{32}) - Match a word of 32 characters.
    \s\* - Match the space-asterisk delimiter.
    (\V+) - Match anything which isn't vertical whitespace, e.g. @LF or @CRLF.

Any suggestions for improvements are welcome.

    #include <Array.au3>

    Local $sMD5FileData = '405b5ad0eb79a9736d6bf8a76c0153f1 *Example.exe' & @CRLF & _
            'e5c13b7e919d8a36c308cf25e11efef4 *Example.exe' & @CRLF & _
            '087102d22e8a313a3bc192f3fa6e19b6 *Example.exe' & @CRLF & _
            '0637924f96cc8243fe49d811cf603784 *Example.exe' & @CRLF & _
            'c43282ed2ce63a31d015dd1f6e98e1c6 *C:\Example\Example.exe' & @CRLF

    ; Create an array like the following:
    ; $aArray[0] = MD5 checksum
    ; $aArray[1] = FilePath
    ; $aArray[2] = MD5 checksum
    ; $aArray[3] = FilePath
    ; $aArray[n] = MD5 checksum
    ; $aArray[n + 1] = FilePath
    Local $aSRE = StringRegExp($sMD5FileData, '(\w{32})\s\*(\V+)', 3)
    _ArrayDisplay($aSRE, 'SRE: ' & @error)

UDF List: _AdapterConnections() • _AlwaysRun() • _AppMon() • _AppMonEx() • _ArrayFilter/_ArrayReduce • _BinaryBin() • _CheckMsgBox() • _CmdLineRaw() • _ContextMenu() • _ConvertLHWebColor()/_ConvertSHWebColor() • _DesktopDimensions() • _DisplayPassword() • _DotNet_Load()/_DotNet_Unload() • _Fibonacci() • _FileCompare() • _FileCompareContents() • _FileNameByHandle() • _FilePrefix/SRE() • _FindInFile() • _GetBackgroundColor()/_SetBackgroundColor() • _GetConrolID() • _GetCtrlClass() • _GetDirectoryFormat() • _GetDriveMediaType() • _GetFilename()/_GetFilenameExt() • _GetHardwareID() • _GetIP() • _GetIP_Country() • _GetOSLanguage() • _GetSavedSource() • _GetStringSize() • _GetSystemPaths() • _GetURLImage() • _GIFImage() • _GoogleWeather() • _GUICtrlCreateGroup() • _GUICtrlListBox_CreateArray() • _GUICtrlListView_CreateArray() • _GUICtrlListView_SaveCSV() • _GUICtrlListView_SaveHTML() • 
_GUICtrlListView_SaveTxt() • _GUICtrlListView_SaveXML() • _GUICtrlMenu_Recent() • _GUICtrlMenu_SetItemImage() • _GUICtrlTreeView_CreateArray() • _GUIDisable() • _GUIImageList_SetIconFromHandle() • _GUIRegisterMsg() • _GUISetIcon() • _Icon_Clear()/_Icon_Set() • _IdleTime() • _InetGet() • _InetGetGUI() • _InetGetProgress() • _IPDetails() • _IsFileOlder() • _IsGUID() • _IsHex() • _IsPalindrome() • _IsRegKey() • _IsStringRegExp() • _IsSystemDrive() • _IsUPX() • _IsValidType() • _IsWebColor() • _Language() • _Log() • _MicrosoftInternetConnectivity() • _MSDNDataType() • _PathFull/GetRelative/Split() • _PathSplitEx() • _PrintFromArray() • _ProgressSetMarquee() • _ReDim() • _RockPaperScissors()/_RockPaperScissorsLizardSpock() • _ScrollingCredits • _SelfDelete() • _SelfRename() • _SelfUpdate() • _SendTo() • _ShellAll() • _ShellFile() • _ShellFolder() • _SingletonHWID() • _SingletonPID() • _Startup() • _StringCompact() • _StringIsValid() • _StringRegExpMetaCharacters() • _StringReplaceWholeWord() • _StringStripChars() • _Temperature() • _TrialPeriod() • _UKToUSDate()/_USToUKDate() • _WinAPI_Create_CTL_CODE() • _WinAPI_CreateGUID() • _WMIDateStringToDate()/_DateToWMIDateString() • Au3 script parsing • AutoIt Search • AutoIt3 Portable • AutoIt3WrapperToPragma • AutoItWinGetTitle()/AutoItWinSetTitle() • Coding • DirToHTML5 • FileInstallr • FileReadLastChars() • GeoIP database • GUI - Only Close Button • GUI Examples • GUICtrlDeleteImage() • GUICtrlGetBkColor() • GUICtrlGetStyle() • GUIEvents • GUIGetBkColor() • Int_Parse() & Int_TryParse() • IsISBN() • LockFile() • Mapping CtrlIDs • OOP in AutoIt • ParseHeadersToSciTE() • PasswordValid • PasteBin • Posts Per Day • PreExpand • Protect Globals • Queue() • Resource Update • ResourcesEx • SciTE Jump • Settings INI • SHELLHOOK • Shunting-Yard • Signature Creator • Stack() • Stopwatch() • StringAddLF()/StringStripLF() • StringEOLToCRLF() • VSCROLL • WM_COPYDATA • More Examples... 
Updated: 22/04/2018
dany Posted September 15, 2012

Depends. If this is just for parsing it's good, although I'd + the \s because some implementations put two spaces between the MD5 and the file path. Also, the asterisk might not be there. If it's also for validating then it's not strict enough.

    ((?i)[a-f0-9]{32}) - Match a hexadecimal value of 32 characters, case-insensitive.
    \s+\*? - Match one or more delimiter spaces and an optional asterisk.
    (\V+) - Match anything which isn't vertical whitespace, e.g. @LF or @CRLF.

Spiderskank
GetOpt Parse command line options UDF | AU3Text Program internationalization UDF | Identicon visual hash UDF
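A minimal sketch of the stricter pattern described above, assuming the backslashes read as \s, \* and \V. The two sample lines (one with an uppercase checksum and a double-space delimiter) and the variable names are illustrative, not taken from the thread.

```autoit
#include <Array.au3>

; Illustrative data: space-asterisk on the first line, double space on the second.
Local $sData = '405b5ad0eb79a9736d6bf8a76c0153f1 *Example.exe' & @CRLF & _
        'E5C13B7E919D8A36C308CF25E11EFEF4  Example.exe' & @CRLF

; (?i) makes [a-f0-9] accept both cases, \s+ demands at least one whitespace
; character and \*? makes the asterisk optional.
Local $aSRE = StringRegExp($sData, '((?i)[a-f0-9]{32})\s+\*?(\V+)', 3)
If @error Then
    ConsoleWrite('No match, @error = ' & @error & @CRLF)
Else
    _ArrayDisplay($aSRE, 'Strict pattern')
EndIf
```

Both lines should end up in the array as checksum/file path pairs. Note that \s also matches @CR and @LF, so this pattern is still more permissive than a literal space would be.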
guinness Posted September 15, 2012 (edited)

Hey dany,

Thanks for replying. I like the idea of ((?i)[a-f0-9]{32}), though in terms of optimisation isn't this better?

    ([A-Fa-f0-9]{32}) - Match a hexadecimal value of 32 characters.

I should have mentioned that this is an example for parsing an MD5 checksum file created by the application getmd5checker.com, which uses \s\* as the delimiter. The standard options for a delimiter are space-asterisk, pipe and double-space, so if this is to be a general parser, perhaps the following is better:

    #include <Array.au3>

    Local $aDelimiter[3] = [' *', '|', '  '], $iDelimiter = Random(0, 2, 1)
    Local $sMD5FileData = '405b5ad0eb79a9736d6bf8a76c0153f1' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
            'e5c13b7e919d8a36c308cf25e11efef4' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
            '087102d22e8a313a3bc192f3fa6e19b6' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
            '0637924f96cc8243fe49d811cf603784' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
            'c43282ed2ce63a31d015dd1f6e98e1c6' & $aDelimiter[$iDelimiter] & 'C:\Example\Example.exe' & @CRLF
    ConsoleWrite('Delimiter option was - ' & StringReplace($aDelimiter[$iDelimiter], ' ', '<space>') & @CRLF)

    ; Create an array like the following:
    ; $aArray[0] = MD5 checksum
    ; $aArray[1] = FilePath
    ; $aArray[2] = MD5 checksum
    ; $aArray[3] = FilePath
    ; $aArray[n] = MD5 checksum
    ; $aArray[n + 1] = FilePath
    Local $aSRE = StringRegExp($sMD5FileData, '([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)', 3) ; Or is this better? ([A-Fa-f0-9]{32})\s{0,2}(?:\*|\|)?(\V+)
    _ArrayDisplay($aSRE, 'SRE: ' & @error)

Edited September 15, 2012 by guinness
dany Posted September 15, 2012

> ([A-Fa-f0-9]{32}) - Match hexadecimal value of 32 characters.

Yeah, that's optimized imho; I imagine the regexp machinery does this internally anyway. It's also less ambiguous when reading code.

\s means roughly [ \t\r\n\f], so maybe use ' [ |*]'. And I wouldn't have used * quantifiers or ? but +: you know they're there and you want them, so demand them.

    ([A-Fa-f0-9]{32}) [ |*](\V+)

    ([A-Fa-f0-9]{32}) - Match a hexadecimal value of 32 characters, upper- or lowercase.
    ' [ |*]' - Match 2 delimiter characters: two spaces, or a space followed by an asterisk or pipe.
    (\V+) - Match anything which isn't vertical whitespace, e.g. @LF or @CRLF.

> ; Or is this better? ([A-Fa-f0-9]{32})\s{0,2}(?:\*|\|)?(\V+)

Character classes are a little faster than alternating groups and in this case they're a better choice imho. Sometimes you can't do without alternating groups, but they can be quite costly:

    $sRegExp = '(autoit|aut2exe|au3check)' ; Ignoring actual casing in favor of example.
    $sTest = 'au3check'

$sTest will be checked against 'autoit' first. Index 1 is a match. Index 2 as well. Index 3 fails. Back to index 1 and start matching against 'aut2exe'. Again it matches up to index 2 but fails at index 3. Back to index 1 and test against 'au3check'. Now it matches up to the last index. Return True.

This pattern would be marginally better:

    $sRegExp = 'au(3check|toit|t2exe)' ; Also note reordering to fail sooner.

As I said, sometimes you can't do without them, but if you just want to match a single position then character classes are much faster:

    $sRegExp = 'gr(a|e)y'
    $sRegExp = 'gr[ae]y' ; Optimized.
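The single-position case above can be checked with a short sketch. It only shows that the two patterns accept the same inputs; it makes no claim about how large the speed difference is. The word list is illustrative.

```autoit
; Both patterns accept the same strings; the character class simply avoids
; an alternation group for a one-character choice.
Local $aWords[3] = ['gray', 'grey', 'groy']

For $i = 0 To UBound($aWords) - 1
    ; With the default flag (0), StringRegExp returns 1 for a match and 0 otherwise.
    ConsoleWrite($aWords[$i] & ': gr(a|e)y = ' & StringRegExp($aWords[$i], 'gr(a|e)y') & _
            ', gr[ae]y = ' & StringRegExp($aWords[$i], 'gr[ae]y') & @CRLF)
Next
```

Both patterns should report a match for gray and grey and none for groy.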
ProgAndy Posted September 15, 2012

You could also use ([[:xdigit:]]{32}) I think. It might be slightly faster since it is a predefined character class.

*GERMAN* [note: you are not allowed to remove author / modified info from my UDFs]
My UDFs: [_SetImageBinaryToCtrl] [_TaskDialog] [AutoItObject] [Animated GIF (GDI+)] [ClipPut for Image] [FreeImage] [GDI32 UDFs] [GDIPlus Progressbar] [Hotkey-Selector] [Multiline Inputbox] [MySQL without ODBC] [RichEdit UDFs] [SpeechAPI Example] [WinHTTP]
UDFs included in AutoIt: FTP_Ex (as FTPEx), _WinAPI_SetLayeredWindowAttributes
guinness Posted September 15, 2012

> \s means [ \t\r\n], so maybe use ' [ |*]'. And I wouldn't have used * quantifiers or ? but +, you know they're there and you want them, so demand them.

Good tip, though I need the + quantifier for the class, otherwise the array returns incorrect values.

> $sTest will be checked against 'autoit' first. Index 1 is a match. Index 2 as well. Index 3 fails. Back to index 1 and start matching against 'aut2exe'. Again matches up to index 2 but fails at index 3. Back to index 1 and test against 'au3check'. Now matches up to last index. Return True. This pattern would be marginally better:

Thanks for the lesson. Optimisation is one thing I'm starting to learn with regular expressions. Just like in AutoIt, it takes time to know what is good and what is not, or whether it's worth overcomplicating something just for a couple of milliseconds.

> You could also use ([[:xdigit:]]{32}) I think. It might be slightly faster since it is a predefined character class.

I completely forgot about that character class. Thanks.

I ran each of the following regular expressions through a loop 1000 times and then divided the total time by 1000. These were the results:

    0.0405123961529231 - ([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)
    0.0374282045654834 - ([A-Fa-f0-9]{32})[ *|](\V+)
    0.0377551973027254 - ([[:xdigit:]]{32})[ *|](\V+)
jchd Posted September 15, 2012

You hit one subtle barrier. Yes, your timing is probably correct, but PCRE works in two passes (which are hidden in the current implementation of regexp support within AutoIt). First PCRE compiles the expression, then it runs it. If you had access to a handle of the compiled expression (supposedly to be run 100 or so times) you might well find that different patterns run at various speeds once compiled.

Pattern compilation is what eats most of the time, unless the pattern is applied to a huge string (e.g. a large text file).

A similar two-step process occurs with SQLite, where queries are first compiled into a runnable VM (yes, a virtual machine) and only then run for real.

In demanding PCRE and SQLite applications, common practice is to compile the expression or query once, then run it as many times as needed inside the application. This can be done for SQLite (albeit not recommended for beginners), but current AutoIt doesn't expose PCRE's two-step behavior, hence forcing a recompile of the pattern at every invocation.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial.
Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
dany Posted September 15, 2012 (edited)

> Good tip, though I need the quantifiers + for the class, otherwise the array returns incorrect values.

What I said only applied to this little bit here between the capturing groups, "\s{0,2}[*|]*". It will also match an empty string. Or in your case it would match a SHA-256 checksum, delimiters and filename as well. The SHA-256 checksum would be split halfway (after 32 of its 64 hexadecimal characters) and the rest would be in the second array index along with the whitespace and the filename. That's why I said: if you know what to expect at an index, demand it in your regexp.

Also, you're using quote rather than code tags and it looks like you're missing a space:

    ([A-Fa-f0-9]{32}) [ *|](\V+)
                     ^ whitespace!

edit: typo

Edited September 16, 2012 by dany
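The SHA-256 point can be reproduced with a small sketch. The 64-character line is made-up data, not a real checksum, and the (?m)^ anchor in the second pattern is an assumption added here for illustration; it is not part of the pattern dany posted.

```autoit
; Illustrative SHA-256 style line: 64 hex characters, space-asterisk, file name.
Local $sLine = '0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef *Example.exe'

; Permissive delimiter: \s{0,2}[*|]* is allowed to match nothing at all.
Local $aLoose = StringRegExp($sLine, '([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)', 3)
If Not @error Then
    ConsoleWrite('Loose [0]: ' & $aLoose[0] & @CRLF) ; first 32 hex characters
    ConsoleWrite('Loose [1]: ' & $aLoose[1] & @CRLF) ; remaining hex, delimiter and file name
EndIf

; Anchored at the line start with a required delimiter character (assumed variant).
Local $aStrict = StringRegExp($sLine, '(?m)^([A-Fa-f0-9]{32}) ?[ *|](\V+)', 3)
ConsoleWrite('Strict @error: ' & @error & @CRLF) ; non-zero means the line was rejected
```

Without the line-start anchor, a required delimiter alone would not reject this line: the engine could start the match 32 characters in and capture the second half of the checksum instead.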
guinness Posted September 15, 2012 (edited)

> Also you're using quote rather than code and it looks like you're missing a space: ([A-Fa-f0-9]{32}) [ *|](\V+) ^ whitespace!

That extra space should really be in the group, as the pipe delimiter will otherwise fail, due to it being formatted as MD5|FILEPATH (no whitespace).

Final results and regular expressions:

    0.257127732943223 - ([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)
    0.408735544990339 - ([A-Fa-f0-9]{32})[ *|](\V+)
    0.455178289246338 - ([[:xdigit:]]{32})[ *|](\V+)

dany, your help and input have been very much appreciated. It proves I'm learning every day and am willing to accept when I don't know something.

> You hit one subtle barrier. Yes, your timing is probably correct, but PCRE works in two passes (which are hidden in the current implementation of regexp support within AutoIt). First PCRE compiles the expression then it runs it. If you had access to a handle of the compiled expression (supposedly to be run 100 or so times) you might well find that different patterns run at various speeds once compiled. Pattern compilation is what eats most of the time, unless it is applied to a huge string (e.g. large text file). A similar two-step process occurs with SQLite, where queries are first compiled into a runnable VM (yes, a virtual machine) then only run for real. In both PCRE and SQLite demanding applications, common practice is to compile the expression or query once, then run it as many times as needed inside the application. This can be done for SQLite (albeit not recommended for beginners) but current AutoIt doesn't expose PCRE's two-step behavior, hence forcing a recompile of the pattern at every invocation.

Suffice to say your input and knowledge have increased my understanding of how AutoIt processes regular expressions.
Edited September 15, 2012 by guinness
dany Posted September 16, 2012 (edited)

> That extra space should be in the group really, as the pipe delimiter will fail, due to it being formatted as MD5|FILEPATH (no whitespace.)

Ok, but this is not a group but a character class. [aa] is the same as [a] or [aaa]; it will match a single a only. I'm rather curious about your actual matches because, as I see it, the space will be matched, moving the engine on to the next part of the pattern, the (\V+), which will then gobble up the second space or asterisk. It wouldn't exactly fail, but it isn't the result you want. I would suggest this:

    ([A-Fa-f0-9]{32}) ?[ *|](\V+) ; make it optional.

Coming from an MVP I'm rather humbled and also glad my input isn't wasted. On the other hand...

> Final results and regular expressions.
> 0.257127732943223 - ([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)
> 0.408735544990339 - ([A-Fa-f0-9]{32})[ *|](\V+)
> 0.455178289246338 - ([[:xdigit:]]{32})[ *|](\V+)

Funny how the first expression is now the fastest where it was the slowest...

edit: Yeah, jchd is absolutely right. That will slow things down, and it's something you can't optimize against.

Edited September 16, 2012 by dany
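A minimal sketch of the difference described above, assuming the three delimiters named earlier in the thread (space-asterisk, pipe, double-space). It prints the second capture group for the single-class pattern and for the variant with the optional leading space; the variable names are illustrative.

```autoit
Local $sHash = '405b5ad0eb79a9736d6bf8a76c0153f1'
Local $aDelimiter[3] = [' *', '|', '  '] ; space-asterisk, pipe, double-space
Local $sLine, $aSingle, $aOptional

For $i = 0 To UBound($aDelimiter) - 1
    $sLine = $sHash & $aDelimiter[$i] & 'Example.exe'

    ; One delimiter character only: a second one leaks into the file path group.
    $aSingle = StringRegExp($sLine, '([A-Fa-f0-9]{32})[ *|](\V+)', 3)

    ; Optional space followed by exactly one delimiter character.
    $aOptional = StringRegExp($sLine, '([A-Fa-f0-9]{32}) ?[ *|](\V+)', 3)

    ConsoleWrite('Delimiter "' & $aDelimiter[$i] & '": [ *|] gives "' & $aSingle[1] & _
            '", optional space gives "' & $aOptional[1] & '"' & @CRLF)
Next
```

With the single class, the file path should come back as "*Example.exe" for space-asterisk and " Example.exe" for the double space; with the optional space it should be "Example.exe" in all three cases.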
AZJIO Posted September 16, 2012 (edited)

    #include <Array.au3>

    Local $sMD5FileData = '405b5ad0eb79a9736d6bf8a76c0153f1 *Example.exe' & @CRLF & _
            'e5c13b7e919d8a36c308cf25e11efef4 *Example.exe' & @CRLF & _
            '087102d22e8a313a3bc192f3fa6e19b6 *Example.exe' & @CRLF & _
            '0637924f96cc8243fe49d811cf603784 *Example.exe' & @CRLF & _
            'c43282ed2ce63a31d015dd1f6e98e1c6 *C:\Example\Example.exe' & @CRLF
    ; MsgBox(0, 'Message', $sMD5FileData)

    $timer = TimerInit()
    For $i = 1 To 10000
        Local $aSRE = StringRegExp($sMD5FileData, '([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)', 3)
    Next
    MsgBox(0, "???", 'time : ' & Round(TimerDiff($timer), 2) & ' msec')
    _ArrayDisplay($aSRE, 'SRE: ' & @error)

    Local $aSRE = StringRegExp($sMD5FileData, '(?m)^([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)\r?$', 3)

    Local $aSRE = StringRegExp($sMD5FileData, '(?m)^(.{32})\h+\*(.+)\r$', 3)

Edited September 16, 2012 by AZJIO

My other projects or all
guinness Posted September 16, 2012

Thanks AZJIO. I think it just goes to show there are plenty of ways to go about this.

> Ok, but this is not a group but a character class. [aa] is the same as [a] or [aaa], it will match a single a only. I'm rather curious about your actual matches because as I see it, the space will be matched moving the engine on to the next part in the pattern, the (\V+), which will then gobble up the second space or asterisk. It wouldn't exactly fail but it isn't the result you want. I would suggest this: ([A-Fa-f0-9]{32}) ?[ *|](\V+) ; make it optional.

Of course, making it optional. Duh! Sometimes the obvious things are the obvious things. I also see your point about the character class, but it did match when I used a double space.

> Coming from an MVP i'm rather humbled and also glad my input isn't wasted. On the other hand...

Well, credit where credit is due.
guinness Posted September 16, 2012 Author Share Posted September 16, 2012 (edited) dany, I was wrong, you were right, don't know what I saw last night but the double-space in the character class didn't work. Learnt something today, always double check. Edit: Here are the results, the last results were wrong because I didn't reset the timer! Delimiter option was - <space><space> 0.0425633102418133 - ([A-Fa-f0-9]{32})s{0,2}[*|]*(V+) 0.044208049495654 - ([A-Fa-f0-9]{32}) ?[ *|](V+) 0.0386496617409211 - ([[:xdigit:]]{32}) ?[ *|](V+) 0.0430682183099314 - ([[:xdigit:]]{32})h*[ *|]([^*?/|"<>[:cntrl:]]+) Edited September 16, 2012 by guinness UDF List: _AdapterConnections() • _AlwaysRun() • _AppMon() • _AppMonEx() • _ArrayFilter/_ArrayReduce • _BinaryBin() • _CheckMsgBox() • _CmdLineRaw() • _ContextMenu() • _ConvertLHWebColor()/_ConvertSHWebColor() • _DesktopDimensions() • _DisplayPassword() • _DotNet_Load()/_DotNet_Unload() • _Fibonacci() • _FileCompare() • _FileCompareContents() • _FileNameByHandle() • _FilePrefix/SRE() • _FindInFile() • _GetBackgroundColor()/_SetBackgroundColor() • _GetConrolID() • _GetCtrlClass() • _GetDirectoryFormat() • _GetDriveMediaType() • _GetFilename()/_GetFilenameExt() • _GetHardwareID() • _GetIP() • _GetIP_Country() • _GetOSLanguage() • _GetSavedSource() • _GetStringSize() • _GetSystemPaths() • _GetURLImage() • _GIFImage() • _GoogleWeather() • _GUICtrlCreateGroup() • _GUICtrlListBox_CreateArray() • _GUICtrlListView_CreateArray() • _GUICtrlListView_SaveCSV() • _GUICtrlListView_SaveHTML() • _GUICtrlListView_SaveTxt() • _GUICtrlListView_SaveXML() • _GUICtrlMenu_Recent() • _GUICtrlMenu_SetItemImage() • _GUICtrlTreeView_CreateArray() • _GUIDisable() • _GUIImageList_SetIconFromHandle() • _GUIRegisterMsg() • _GUISetIcon() • _Icon_Clear()/_Icon_Set() • _IdleTime() • _InetGet() • _InetGetGUI() • _InetGetProgress() • _IPDetails() • _IsFileOlder() • _IsGUID() • _IsHex() • _IsPalindrome() • _IsRegKey() • _IsStringRegExp() • _IsSystemDrive() • _IsUPX() • 
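The timer mistake mentioned above is easy to avoid by calling TimerInit() once per pattern. Below is a minimal sketch of such a comparison, not the script that produced the posted numbers: the sample data, the iteration count and the variable names ($sData, $aPatterns, $iRuns) are made up for illustration, and the patterns are written with their backslash escapes (\V).

```autoit
; Hypothetical benchmark sketch: sample data and iteration count are illustrative.
Local $sData = '405b5ad0eb79a9736d6bf8a76c0153f1  Example.exe' & @CRLF & _
        'c43282ed2ce63a31d015dd1f6e98e1c6  C:\Example\Example.exe' & @CRLF

; Two of the candidate patterns, differing only in the hexadecimal class.
Local $aPatterns[2] = ['([A-Fa-f0-9]{32}) ?[ *|](\V+)', _
        '([[:xdigit:]]{32}) ?[ *|](\V+)']

Local $iRuns = 1000, $hTimer, $aSRE
For $i = 0 To UBound($aPatterns) - 1
    $hTimer = TimerInit() ; Start a fresh timer for every pattern.
    For $j = 1 To $iRuns
        $aSRE = StringRegExp($sData, $aPatterns[$i], 3)
    Next
    ; TimerDiff() returns milliseconds; report the average per call.
    ConsoleWrite(TimerDiff($hTimer) / $iRuns & ' - ' & $aPatterns[$i] & @CRLF)
Next
```

Averaging over many runs smooths out noise from a single call, but the absolute figures will still vary between machines and AutoIt versions, so only the relative order is meaningful.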
ProgAndy Posted September 16, 2012 (edited)

0.0199695295556956 - ([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)

This pattern only accepts at least one space followed by one star as the delimiter. Replace \h+\* with \h*[ *|] to accept horizontal whitespace followed by a space, star or pipe.

Edited September 16, 2012 by ProgAndy

*GERMAN* [note: you are not allowed to remove author / modified info from my UDFs]
My UDFs: [_SetImageBinaryToCtrl] [_TaskDialog] [AutoItObject] [Animated GIF (GDI+)] [ClipPut for Image] [FreeImage] [GDI32 UDFs] [GDIPlus Progressbar] [Hotkey-Selector] [Multiline Inputbox] [MySQL without ODBC] [RichEdit UDFs] [SpeechAPI Example] [WinHTTP]
UDFs included in AutoIt: FTP_Ex (as FTPEx), _WinAPI_SetLayeredWindowAttributes
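To see the difference between the two delimiter forms, here is a rough sketch that runs both against one line per delimiter style. The file names and the variables ($sData, $aStrict, $aRelaxed) are invented for the example, and the escapes are assumed to be \h+\* and \h*[ *|].

```autoit
; Sample data is made up for illustration: one line per delimiter style.
Local $sData = '405b5ad0eb79a9736d6bf8a76c0153f1 *Star.exe' & @CRLF & _
        'e5c13b7e919d8a36c308cf25e11efef4  Space.exe' & @CRLF & _
        '087102d22e8a313a3bc192f3fa6e19b6 |Pipe.exe' & @CRLF

; Strict: at least one horizontal whitespace character, then a literal star.
Local $aStrict = StringRegExp($sData, '([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)', 3)
; Relaxed: optional horizontal whitespace, then a space, star or pipe.
Local $aRelaxed = StringRegExp($sData, '([[:xdigit:]]{32})\h*[ *|]([^*?/|"<>[:cntrl:]]+)', 3)

; Each matched line contributes two elements: the checksum and the file path.
ConsoleWrite('Strict: ' & UBound($aStrict) / 2 & ' line(s)' & @CRLF)
ConsoleWrite('Relaxed: ' & UBound($aRelaxed) / 2 & ' line(s)' & @CRLF)
```

With this data the strict form should only pick up the star-delimited line, while the relaxed form should pick up all three, which is why checking the output matters as much as the timing.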
guinness Posted September 16, 2012 (edited)

ProgAndy said:
0.0199695295556956 - ([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)
This pattern only accepts at least one space followed by one star as the delimiter. Replace \h+\* with \h*[ *|] to accept horizontal whitespace followed by a space, star or pipe.

I didn't check the output, so I didn't notice this and took it at face value. Cheers for that.

Edit: I changed the results above. It seems you were right, ProgAndy, that the hexadecimal character class is an optimised approach.

Edited September 16, 2012 by guinness