guinness (thread author), posted May 24, 2012:

This is now considered solved. Thanks to jchd & UEZ, as well as Spiff59, who participated in providing additional examples.

UDF List: _AdapterConnections() • _AlwaysRun() • _AppMon() • _AppMonEx() • _ArrayFilter/_ArrayReduce • _BinaryBin() • _CheckMsgBox() • _CmdLineRaw() • _ContextMenu() • _ConvertLHWebColor()/_ConvertSHWebColor() • _DesktopDimensions() • _DisplayPassword() • _DotNet_Load()/_DotNet_Unload() • _Fibonacci() • _FileCompare() • _FileCompareContents() • _FileNameByHandle() • _FilePrefix/SRE() • _FindInFile() • _GetBackgroundColor()/_SetBackgroundColor() • _GetConrolID() • _GetCtrlClass() • _GetDirectoryFormat() • _GetDriveMediaType() • _GetFilename()/_GetFilenameExt() • _GetHardwareID() • _GetIP() • _GetIP_Country() • _GetOSLanguage() • _GetSavedSource() • _GetStringSize() • _GetSystemPaths() • _GetURLImage() • _GIFImage() • _GoogleWeather() • _GUICtrlCreateGroup() • _GUICtrlListBox_CreateArray() • _GUICtrlListView_CreateArray() • _GUICtrlListView_SaveCSV() • _GUICtrlListView_SaveHTML() • _GUICtrlListView_SaveTxt() • _GUICtrlListView_SaveXML() • _GUICtrlMenu_Recent() • _GUICtrlMenu_SetItemImage() • _GUICtrlTreeView_CreateArray() • _GUIDisable() • _GUIImageList_SetIconFromHandle() • _GUIRegisterMsg() • _GUISetIcon() • _Icon_Clear()/_Icon_Set() • _IdleTime() • _InetGet() • _InetGetGUI() • _InetGetProgress() • _IPDetails() • _IsFileOlder() • _IsGUID() • _IsHex() • _IsPalindrome() • _IsRegKey() • _IsStringRegExp() • _IsSystemDrive() • _IsUPX() • _IsValidType() • _IsWebColor() • _Language() • _Log() • _MicrosoftInternetConnectivity() • _MSDNDataType() • _PathFull/GetRelative/Split() • _PathSplitEx() • _PrintFromArray() • _ProgressSetMarquee() • _ReDim() • _RockPaperScissors()/_RockPaperScissorsLizardSpock() • _ScrollingCredits • _SelfDelete() • _SelfRename() • _SelfUpdate() • _SendTo() • _ShellAll() • _ShellFile() • _ShellFolder() • _SingletonHWID() • _SingletonPID() • _Startup() • _StringCompact() • _StringIsValid() • _StringRegExpMetaCharacters() • _StringReplaceWholeWord() • _StringStripChars() • _Temperature() • _TrialPeriod() • _UKToUSDate()/_USToUKDate() • _WinAPI_Create_CTL_CODE() • _WinAPI_CreateGUID() • _WMIDateStringToDate()/_DateToWMIDateString() • Au3 script parsing • AutoIt Search • AutoIt3 Portable • AutoIt3WrapperToPragma • AutoItWinGetTitle()/AutoItWinSetTitle() • Coding • DirToHTML5 • FileInstallr • FileReadLastChars() • GeoIP database • GUI - Only Close Button • GUI Examples • GUICtrlDeleteImage() • GUICtrlGetBkColor() • GUICtrlGetStyle() • GUIEvents • GUIGetBkColor() • Int_Parse() & Int_TryParse() • IsISBN() • LockFile() • Mapping CtrlIDs • OOP in AutoIt • ParseHeadersToSciTE() • PasswordValid • PasteBin • Posts Per Day • PreExpand • Protect Globals • Queue() • Resource Update • ResourcesEx • SciTE Jump • Settings INI • SHELLHOOK • Shunting-Yard • Signature Creator • Stack() • Stopwatch() • StringAddLF()/StringStripLF() • StringEOLToCRLF() • VSCROLL • WM_COPYDATA • More Examples... Updated: 22/04/2018
Robjong, posted May 25, 2012 (edited May 25, 2012 by Robjong):

Hey,
It might be solved, but since the point of the thread is to improve your knowledge of regular expressions, I thought I would share anyway.
Here is my take on it; have a read. I explained most of it in the comments, though I admit it could have been clearer.

#include <Array.au3>

#cs
    Tested hosts files:
        1. http://winhelp2002.mvps.org/hosts.txt
        2. http://support.it-mate.co.uk/downloads/HOSTS.txt
        3. http://remember.mine.nu/Hosts
        4. http://www.autoitscript.com/forum/topic/140724-solved-improving-pcre-knowledge-parsing-hosts-file-for-hostnames/page__view__findpost__p__989981
        5. local hosts file

    Timers
        samples:          1 (14446)        | 2 (183763)       | 3 (70595)        | 4 (15)
        Example:          75.9701664916754 | 881.994120293985 | 340.773511324434 | 0.254597270136493
        Example_guinness: 159.703579821009 | 1934.34539773011 | 733.858217089146 | 0.265321733913304
        Example_jchd:     failed           | failed           | failed           | 0.241597920103995
#ce

Local $sData = FileRead(@SystemDir & '\drivers\etc\HOSTS') ; 'HOSTS.txt'
Local $timer, $aArray

$timer = TimerInit()
$aArray = Example($sData)
ConsoleWrite(StringFormat("%-20s %-20s \t%d\n", 'Example', TimerDiff($timer), UBound($aArray, 1)))
;~ _ArrayDisplay($aArray, "Example")

$timer = TimerInit()
$aArray = Example_guinness($sData)
ConsoleWrite(StringFormat("%-20s %-20s \t%d\n", 'Example_guinness', TimerDiff($timer), UBound($aArray, 1)))
;~ _ArrayDisplay($aArray, "Example_guinness")

; Run last to avoid breaking the script if it fails.
$timer = TimerInit()
$aArray = Example_jchd($sData) ; please note that this one does not work for some hosts files
ConsoleWrite(StringFormat("%-20s %-20s \t%d\n", 'Example_jchd', TimerDiff($timer), UBound($aArray, 1)))
;~ _ArrayDisplay($aArray, "Example_jchd")

#cs
    - Combined the replace patterns and added stripping of IPv6 entries (::1 ...).
    - Removed an unnecessary group and unnecessary escape backslashes.
      Most metacharacters lose their special meaning inside a character class (some take on a different one).
      The exception here is the - (dash), which expresses a character range; if we list it last there is no need to escape it.
#ce
Func Example_guinness($sData)
    $sData = StringRegExpReplace($sData, '#.*|(?m)^\h*(::\d.*|[\d.]{7,15})', '')
    Return StringRegExp($sData, '[\w/.-]{3,}', 3)
EndFunc   ;==>Example_guinness

#cs
    jchd had the right idea: since we cannot repeatedly capture a group, even with global matching,
    we have to make the whole pattern match repeatedly, which is done by skipping the irrelevant parts before each match.
    This is where the \G sequence comes into play. On the first match it anchors at the beginning of the string, just like ^,
    but from the second match on it anchors at the end of the previous match.
    If we used a plain ^ (without the multiline option) instead of \G, it would match only the first entry.

    We also notice that this pattern is too restrictive when parsing some hosts files;
    it does not match hostnames starting with digits, for example.
#ce
Func Example_jchd($sData)
    ;~ Return StringRegExp($sData, "(?im)\G(?:(?:\s*#.*$\s*)*|(?:\s*)*)*(?:|(?:\d{1,3}\.){3}\d{1,3}\s+)((?:(?:\d{1,3}\.){3}\d{1,3}\.)?[[:alpha:]][\w\.\-/]{2,})\s*", 3) ; jchd's pattern (original)
    Return StringRegExp($sData, "(?im)\G(?:\s*(?:#.*$\s*)*)*(?:(?:\d{1,3}\.){3}\d{1,3}\s+)?((?:(?:\d{1,3}\.){4})?[a-z][\w/.-]{2,})\s*", 3) ; slightly modified version of jchd's pattern
EndFunc   ;==>Example_jchd

#cs
    We can write this (jchd's) pattern more cleanly and more efficiently if we drop the extraneous parts and simplify the rest.
    We start with the \G anchor, then optionally match, but do not capture, parts preceded by a pound sign or colon (comments/IPv6),
    followed by 7-15 digits and dots (an IPv4 address, most likely), and then we capture the next word, which may contain slashes, dots and dashes.
    Looks pretty simple now, right?
#ce
Func Example($sData)
    Return StringRegExp($sData, "(?im)\G(?:\h*[#:].*$\s*)*(?:[\d.]{7,15}\h+)?([\w/.-]{2,})\s*", 3) ; simplified pattern
EndFunc   ;==>Example
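
To see what the \G anchor contributes in the simplified pattern above, here is a minimal sketch that runs the same pattern body twice, once anchored with \G and once with ^. The sample text, the hostname alias.example and the variable names are made up for illustration; nothing here comes from the tested hosts files.

```autoit
#include <Array.au3>

; Made-up sample: a comment, an IPv4 entry, an IPv6 entry, and a line
; carrying two hostnames plus a trailing comment.
Local $sSample = "# comment line" & @LF & _
        "127.0.0.1 localhost" & @LF & _
        "::1 localhost6" & @LF & _
        "127.0.0.1 08.185.87.4.liveadvert.com alias.example # trailing" & @LF

; Pattern body shared by both runs; only the leading anchor differs.
Local $sBody = "(?:\h*[#:].*$\s*)*(?:[\d.]{7,15}\h+)?([\w/.-]{2,})\s*"

Local $aWithG = StringRegExp($sSample, "(?im)\G" & $sBody, 3)
Local $aWithCaret = StringRegExp($sSample, "(?im)^" & $sBody, 3)

ConsoleWrite("\G: " & _ArrayToString($aWithG, ", ") & @CRLF)
ConsoleWrite("^ : " & _ArrayToString($aWithCaret, ", ") & @CRLF)
```

On a sample like this, the \G run should also pick up the second hostname on the last line, because each match may start wherever the previous one ended, whereas ^ in multiline mode can only start a match at the beginning of a line. Note that the IPv6 line is skipped as a whole, since the pattern treats a leading colon like a comment marker.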
jchd, posted May 25, 2012 (edited May 25, 2012 by jchd):

You couldn't know, but there has been a bit of discussion in PMs about this. I agree that the RE could be simpler, and while roaming I made a couple of dumb mistakes (like not allowing hosts with leading digits). I also explicitly excluded IPv6 forms (that can easily be fixed once the rest works). I kept guinness' subpatterns as they were and added extras accordingly, since they perform a useful (?) format validation (albeit an elementary one), so I didn't look at simplifying this part any further.

The actual format guinness uses looks partly like this:

127.0.0.1 08.185.87.4.liveadvert.com 08.185.87.40.liveadvert.com 08.185.87.41.liveadvert.com

and I even checked the validity of the host's IP part, which, as your simplification shows, can be matched as part of the hostname in a single go without any attempt at breaking it down into pieces.

Edit: is line 11092 of example 2 valid (hostname ending with a dot)?

127.0.0.1 albatross.cz.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
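
For entries like the trailing-dot hostname questioned above, one option is to normalise the captured names after matching rather than complicating the pattern. In DNS notation a trailing dot marks a fully qualified name, so stripping it leaves the same host. A minimal sketch, assuming the hostnames have already been captured into an array (the array contents and the variable name $aHosts are illustrative only):

```autoit
; Illustrative input: one name with a trailing dot, two without.
Local $aHosts[3] = ["albatross.cz.", "08.185.87.4.liveadvert.com", "localhost"]

For $i = 0 To UBound($aHosts) - 1
    ; Drop one or more dots at the very end of the name; leave the rest untouched.
    $aHosts[$i] = StringRegExpReplace($aHosts[$i], "\.+$", "")
    ConsoleWrite($aHosts[$i] & @CRLF)
Next
```

Doing this as a separate step keeps the capturing pattern simple; whether a trailing dot should be accepted at all in a hosts file is a separate question that this sketch does not settle.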
guinness (thread author), posted May 25, 2012:

It seems hphosts have made a mistake along the way with that hostname and the trailing . (dot). Plus, I never knew about the remember.mine.nu site.
Thanks, Robjong, for adding to the discussion; certainly an enlightening insight into regular expressions. I will leave jchd & UEZ to continue, as they know more than I do.
Robjong, posted May 25, 2012:

The pattern is not optimal if you want to make sure the file is parsed correctly (there is too much room for error, you have been warned); the implementation of regular expressions in AutoIt turned out to be too limited for my initial approach.
However, guinness stated he wanted to improve his knowledge of PCRE, which I think my script helps with a bit. I will have a look at the standards/rules for the hosts file (any good links?) and see what I come up with.
BTW, is the \G sequence for SRE not documented in the help file? Or do I just need to update again?
@jchd So that's where all the good discussions are, hehe. Did anything worth mentioning come up there? About the entry: I think it is not valid, since dots are separators, but it does work.
guinness (thread author), posted May 25, 2012:

Quoting Robjong: "BTW, is the \G sequence for SRE not documented in the help file? Or do I just need to update again?"

It isn't, but I'm afraid to update StringRegExp if \G isn't officially supported in the version of PCRE AutoIt uses.
jchd, posted May 25, 2012:

Oh yes, \G has been part of PCRE from the start. It's mandatory for doing things like this, where splits may occur outside line breaks.
There is a large number of options and metacharacters in PCRE that the help file doesn't discuss. OTOH, an exact discussion of many options would require lengthy explanations.
Official PCRE man pages for the latest release are available here. I also make available a link to download the v8.30 HTML counterpart, which I find easier to use.
Remember that AutoIt's PCRE may be a few versions behind, but the differences are mostly in incredibly dark corners with sharp angles where most of us are likely never going to wander.
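
The point about \G being needed when the pieces are not separated by line breaks can be illustrated with a small sketch. It assumes a made-up input of key=value pairs on a single line, with a malformed chunk in the middle; the input string and variable names are hypothetical.

```autoit
#include <Array.au3>

; Made-up input: two well-formed pairs, a malformed chunk, then another pair.
Local $sInput = "a=1;b=2;??;c=3"

; Anchored: each match must begin exactly where the previous one ended.
Local $aAnchored = StringRegExp($sInput, "\G(\w+)=(\w+);?", 3)
; Unanchored: the engine is free to skip ahead to the next place that fits.
Local $aFree = StringRegExp($sInput, "(\w+)=(\w+);?", 3)

ConsoleWrite("with \G   : " & _ArrayToString($aAnchored, " ") & @CRLF)
ConsoleWrite("without \G: " & _ArrayToString($aFree, " ") & @CRLF)
```

The anchored run should stop at the malformed chunk, while the unanchored one can skip over it and also report the last pair. That contiguity guarantee is what makes \G useful for walking through a string piece by piece.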
GEOSoft, posted May 25, 2012:

I think we are far enough behind now that an update should be in order. I really don't think it's necessary to update with every new version, but every 4 or 5 wouldn't be so bad.

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.
*** The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"