DeltaRocked Posted November 2, 2012 Posted November 2, 2012 (edited) Hi, Type String 1 : <META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh> Type String 2 : <META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com"> Present Regex: (?i)(?:<[\s*]{0,1}meta http-equiv[\s*]{0,1}=[\s*]{0,1}["']{0,1}refresh["']{0,1}[^>]*)content[\s*]?=[\s*]?\"(.*?)\" This regex extract "Content" from string 2 but fails while processing String 1. Is there any way , by what of which I can have 1 regex which will extract "Content" from both the strings? regards Edited November 2, 2012 by DeltaRocked
Chance Posted November 2, 2012 Posted November 2, 2012 (edited) pffff #include <Array.au3> Global $Test[2] $Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' $Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">' For $I = 0 To 1 $Result = StringRegExp($test[$I], "(?i)<META.*?CONTENT=""(.*?)"".*?>", 3) _ArrayDisplay($Result) Next In case its the url ur after #include <Array.au3> Global $Test[2] $Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' $Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' For $I = 0 To 1 $Result = StringRegExp($test[$I], "(?i)<META.*?CONTENT=[""'](?:d{0,3};s?url=)(.*?)[""'].*?>", 3) _ArrayDisplay($Result) Next Edited November 2, 2012 by FlutterShy
blityon Posted November 2, 2012 Posted November 2, 2012 (edited) $arrayDatos = StringRegExp($datos, '<META .*? CONTENT="(.*?)".*?>', 3) Edited November 2, 2012 by blityon
Robjong Posted November 2, 2012 Posted November 2, 2012 Try to avoid using a lazy dot ".*?", instead use a negating character class "[^"]*" whenever possible. This avoids unnecessary backtracking and thus increases performance.#include <Array.au3> Global $sString = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _ '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' & @LF $aResult = StringRegExp($sString, '(?i)<meta[^>]+contents*=s*"([^"]+)"[^>]*>', 3) ; content $aResult = StringRegExp($sString, '(?i)<meta[^>]+contents*=s*"d+;s*url=([^"]+)"[^>]*>', 3) ; URL only _ArrayDisplay($aResult)
Chance Posted November 2, 2012 Posted November 2, 2012 Try to avoid using a lazy dot ".*?", instead use a negating character class "[^"]*" whenever possible. This avoids unnecessary backtracking and thus increases performance. #include <Array.au3> Global $sString = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _ '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' & @LF $aResult = StringRegExp($sString, '(?i)<meta[^>]+contents*=s*"([^"]+)"[^>]*>', 3) ; content $aResult = StringRegExp($sString, '(?i)<meta[^>]+contents*=s*"d+;s*url=([^"]+)"[^>]*>', 3) ; URL only _ArrayDisplay($aResult) Note taken, just opened my eyes to some flaws in a project of mine, thanks for the contribution!
guinness Posted November 2, 2012 Posted November 2, 2012 Isn't using ? next to * pointless? So .*? is the same as .* but not .+?. UDF List: _AdapterConnections() • _AlwaysRun() • _AppMon() • _AppMonEx() • _ArrayFilter/_ArrayReduce • _BinaryBin() • _CheckMsgBox() • _CmdLineRaw() • _ContextMenu() • _ConvertLHWebColor()/_ConvertSHWebColor() • _DesktopDimensions() • _DisplayPassword() • _DotNet_Load()/_DotNet_Unload() • _Fibonacci() • _FileCompare() • _FileCompareContents() • _FileNameByHandle() • _FilePrefix/SRE() • _FindInFile() • _GetBackgroundColor()/_SetBackgroundColor() • _GetConrolID() • _GetCtrlClass() • _GetDirectoryFormat() • _GetDriveMediaType() • _GetFilename()/_GetFilenameExt() • _GetHardwareID() • _GetIP() • _GetIP_Country() • _GetOSLanguage() • _GetSavedSource() • _GetStringSize() • _GetSystemPaths() • _GetURLImage() • _GIFImage() • _GoogleWeather() • _GUICtrlCreateGroup() • _GUICtrlListBox_CreateArray() • _GUICtrlListView_CreateArray() • _GUICtrlListView_SaveCSV() • _GUICtrlListView_SaveHTML() • _GUICtrlListView_SaveTxt() • _GUICtrlListView_SaveXML() • _GUICtrlMenu_Recent() • _GUICtrlMenu_SetItemImage() • _GUICtrlTreeView_CreateArray() • _GUIDisable() • _GUIImageList_SetIconFromHandle() • _GUIRegisterMsg() • _GUISetIcon() • _Icon_Clear()/_Icon_Set() • _IdleTime() • _InetGet() • _InetGetGUI() • _InetGetProgress() • _IPDetails() • _IsFileOlder() • _IsGUID() • _IsHex() • _IsPalindrome() • _IsRegKey() • _IsStringRegExp() • _IsSystemDrive() • _IsUPX() • _IsValidType() • _IsWebColor() • _Language() • _Log() • _MicrosoftInternetConnectivity() • _MSDNDataType() • _PathFull/GetRelative/Split() • _PathSplitEx() • _PrintFromArray() • _ProgressSetMarquee() • _ReDim() • _RockPaperScissors()/_RockPaperScissorsLizardSpock() • _ScrollingCredits • _SelfDelete() • _SelfRename() • _SelfUpdate() • _SendTo() • _ShellAll() • _ShellFile() • _ShellFolder() • _SingletonHWID() • _SingletonPID() • _Startup() • _StringCompact() • _StringIsValid() • _StringRegExpMetaCharacters() • _StringReplaceWholeWord() • _StringStripChars() • _Temperature() • _TrialPeriod() • _UKToUSDate()/_USToUKDate() • _WinAPI_Create_CTL_CODE() • _WinAPI_CreateGUID() • _WMIDateStringToDate()/_DateToWMIDateString() • Au3 script parsing • AutoIt Search • AutoIt3 Portable • AutoIt3WrapperToPragma • AutoItWinGetTitle()/AutoItWinSetTitle() • Coding • DirToHTML5 • FileInstallr • FileReadLastChars() • GeoIP database • GUI - Only Close Button • GUI Examples • GUICtrlDeleteImage() • GUICtrlGetBkColor() • GUICtrlGetStyle() • GUIEvents • GUIGetBkColor() • Int_Parse() & Int_TryParse() • IsISBN() • LockFile() • Mapping CtrlIDs • OOP in AutoIt • ParseHeadersToSciTE() • PasswordValid • PasteBin • Posts Per Day • PreExpand • Protect Globals • Queue() • Resource Update • ResourcesEx • SciTE Jump • Settings INI • SHELLHOOK • Shunting-Yard • Signature Creator • Stack() • Stopwatch() • StringAddLF()/StringStripLF() • StringEOLToCRLF() • VSCROLL • WM_COPYDATA • More Examples... Updated: 22/04/2018
Robjong Posted November 2, 2012 Posted November 2, 2012 No. .* will match 0 or more occurrences of any character (except for newline, assuming single line mode), .+ would match 1 or more occurrences of any character, they would consume the largest possible match, this is called greedy. But in .*? the lazy operator (?) will tell it to return the smallest possible match, which would match nothing or 1 character. Because the pattern ".*?" starts and ends with a quote the engine will look for 0 or more character between 2 quotes, this would match against "", as well as "1". If we had the pattern ".+?" and matched it against the subject "" it would fail, because it needs at least 1 character, so it would match "1". Example: $sSubject = 'A string with a "quoted part", and a separate " floating in there.' ; Greedy $aResult = StringRegExp($sSubject, '".*"', 3) ; matches : "quoted part", and a separate " _ArrayDisplay($aResult, "Greedy") ; Lazy $aResult = StringRegExp($sSubject, '".*?"', 3) ; matches: "quoted part" _ArrayDisplay($aResult, "Lazy")
guinness Posted November 2, 2012 Posted November 2, 2012 I was having a dumb moment then. Thanks Robjong. UDF List: _AdapterConnections() • _AlwaysRun() • _AppMon() • _AppMonEx() • _ArrayFilter/_ArrayReduce • _BinaryBin() • _CheckMsgBox() • _CmdLineRaw() • _ContextMenu() • _ConvertLHWebColor()/_ConvertSHWebColor() • _DesktopDimensions() • _DisplayPassword() • _DotNet_Load()/_DotNet_Unload() • _Fibonacci() • _FileCompare() • _FileCompareContents() • _FileNameByHandle() • _FilePrefix/SRE() • _FindInFile() • _GetBackgroundColor()/_SetBackgroundColor() • _GetConrolID() • _GetCtrlClass() • _GetDirectoryFormat() • _GetDriveMediaType() • _GetFilename()/_GetFilenameExt() • _GetHardwareID() • _GetIP() • _GetIP_Country() • _GetOSLanguage() • _GetSavedSource() • _GetStringSize() • _GetSystemPaths() • _GetURLImage() • _GIFImage() • _GoogleWeather() • _GUICtrlCreateGroup() • _GUICtrlListBox_CreateArray() • _GUICtrlListView_CreateArray() • _GUICtrlListView_SaveCSV() • _GUICtrlListView_SaveHTML() • _GUICtrlListView_SaveTxt() • _GUICtrlListView_SaveXML() • _GUICtrlMenu_Recent() • _GUICtrlMenu_SetItemImage() • _GUICtrlTreeView_CreateArray() • _GUIDisable() • _GUIImageList_SetIconFromHandle() • _GUIRegisterMsg() • _GUISetIcon() • _Icon_Clear()/_Icon_Set() • _IdleTime() • _InetGet() • _InetGetGUI() • _InetGetProgress() • _IPDetails() • _IsFileOlder() • _IsGUID() • _IsHex() • _IsPalindrome() • _IsRegKey() • _IsStringRegExp() • _IsSystemDrive() • _IsUPX() • _IsValidType() • _IsWebColor() • _Language() • _Log() • _MicrosoftInternetConnectivity() • _MSDNDataType() • _PathFull/GetRelative/Split() • _PathSplitEx() • _PrintFromArray() • _ProgressSetMarquee() • _ReDim() • _RockPaperScissors()/_RockPaperScissorsLizardSpock() • _ScrollingCredits • _SelfDelete() • _SelfRename() • _SelfUpdate() • _SendTo() • _ShellAll() • _ShellFile() • _ShellFolder() • _SingletonHWID() • _SingletonPID() • _Startup() • _StringCompact() • _StringIsValid() • _StringRegExpMetaCharacters() • _StringReplaceWholeWord() • _StringStripChars() • _Temperature() • _TrialPeriod() • _UKToUSDate()/_USToUKDate() • _WinAPI_Create_CTL_CODE() • _WinAPI_CreateGUID() • _WMIDateStringToDate()/_DateToWMIDateString() • Au3 script parsing • AutoIt Search • AutoIt3 Portable • AutoIt3WrapperToPragma • AutoItWinGetTitle()/AutoItWinSetTitle() • Coding • DirToHTML5 • FileInstallr • FileReadLastChars() • GeoIP database • GUI - Only Close Button • GUI Examples • GUICtrlDeleteImage() • GUICtrlGetBkColor() • GUICtrlGetStyle() • GUIEvents • GUIGetBkColor() • Int_Parse() & Int_TryParse() • IsISBN() • LockFile() • Mapping CtrlIDs • OOP in AutoIt • ParseHeadersToSciTE() • PasswordValid • PasteBin • Posts Per Day • PreExpand • Protect Globals • Queue() • Resource Update • ResourcesEx • SciTE Jump • Settings INI • SHELLHOOK • Shunting-Yard • Signature Creator • Stack() • Stopwatch() • StringAddLF()/StringStripLF() • StringEOLToCRLF() • VSCROLL • WM_COPYDATA • More Examples... Updated: 22/04/2018
DeltaRocked Posted November 2, 2012 Author Posted November 2, 2012 wow. learn't a lot from these posts. thank you everyone. now need to test this regex in real world pages - which incidentally contain phishing / malwares / clean too . regards
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now