Graywalker Posted January 24, 2012 (edited)
I need to remove everything that is not a letter, a number, a non-breaking space, or one of / . ? : -
I am trying this:
$result = StringRegExpReplace($string, "[^[:word:][:blank:]/.?:-]", "")
But it is not working. I am still getting all kinds of stuff like "@÷XÉ€ @·RÒð»3”%€ð‘ŽF" left in the string.
I could also use some assistance or pointers on replacing "/something.ext" with "/something.ext-cut-" so that I can trim everything after the extension in a URL.
GEOSoft Posted January 24, 2012
It would help immensely if you would post a couple of example strings and what you expect to be returned. The strings should be complete lines that contain the URLs.
George
Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***
The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"
Xenobiologist Posted January 24, 2012
$string = "13123213@÷XÉ€ @·RÒð»3”%€ð34543erfTest‘ŽF"
ConsoleWrite(StringRegExpReplace($string, '[^a-zA-Z0-9/.?:-]', '') & @LF)
Scripts & functions
Organize Includes: Let Scite organize the include files
Yahtzee: The game "Yahtzee" (Kniffel, DiceLion)
LoginWrapper: Secure scripts by adding a query (authentication)
_RunOnlyOnThis UDF: Make sure that a script can only be executed on ... (Windows / HD / ...)
Internet-Café Server/Client Application: Open CD, Start Browser, Lock remote client, etc.
MultipleFuncsWithOneHotkey: Start different funcs by hitting one hotkey different times
Graywalker Posted January 24, 2012 (edited)
Some of the lines I am trying to pull URLs and text-only from cause problems when posting, but I will try...
A line would look like:
"®•mÁÖÌÅ`h”3@¼}3@¼}ï¾Þres://ieframe.dll/background_gradient.jpgÞbackground_gradient[2]Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾Þï¾ÞREDR€ˆ€T2http://servername/Sumtotal/lang-en/management/registrationex/LMS_Registration_Post.aspae9ddabdÞï¾Þï¾"
I want to pull "res://ieframe.dll/background_gradient.jpg" and "http://servername/Sumtotal/lang-en/management/registrationex/LMS_Registration_Post.asp" out of that line.
To trim "_Post.aspae9ddabd" down to just "_Post.asp", I was hoping for something like:
$url = StringRegExpReplace($url, "/\w*.[:alpha:]{0,4}", "$1-cut-")
$cut = StringInStr($url, "-cut-", 1)
$len = StringLen($url)
$url = StringTrimRight($url, $len - $cut)
It is not working as I would expect from reading the help file, and I can also see having to add something to match .asp or .aspx or .html or .htm or .js and on and on...
EDIT:
$url = StringRegExpReplace($string, "(.)(jpg|gif|asp|htm|html|aspx|js|java|php|xhtml|xml|png)", "$1$2" & "-cut-")
This seems to be working fine for that.
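The marker-and-trim steps in the post above can also be collapsed into a single replacement. What follows is only a minimal sketch, assuming the string has already been reduced to plain ASCII and that the URL is the last thing in it. The variable names ($sUrl, $sPattern, $sTrimmed) and the extension list are illustrative and not taken from the original script.

```autoit
; Sketch only: the sample URL is the cleaned-up one from the post above;
; the extension list is illustrative, not exhaustive.
Local $sUrl = "http://servername/Sumtotal/lang-en/management/registrationex/LMS_Registration_Post.aspae9ddabd"

; Longer extensions come first (aspx before asp, html before htm) so the
; alternation does not stop early. \w*$ matches the trailing junk up to
; the end of the string, and "$1" puts back only the dot and extension.
Local $sPattern = "(\.(?:aspx|asp|xhtml|html|htm|java|jpg|gif|js|php|png|xml))\w*$"
Local $sTrimmed = StringRegExpReplace($sUrl, $sPattern, "$1")

ConsoleWrite($sTrimmed & @LF)
; prints http://servername/Sumtotal/lang-en/management/registrationex/LMS_Registration_Post.asp
```

This remains a heuristic: if the junk after ".asp" happened to start with "x", nothing in the pattern could tell it apart from a genuine ".aspx".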
Graywalker Posted January 24, 2012
Xenobiologist wrote:
$string = "13123213@÷XÉ€ @·RÒð»3”%€ð34543erfTest‘ŽF"
ConsoleWrite(StringRegExpReplace($string, '[^a-zA-Z0-9/.?:-]', '') & @LF)
Thank you! That is working better than:
$url = StringRegExpReplace($url,"[^[:word:][:blank:]/.?:-]f", "")
Xenobiologist Posted January 24, 2012
If you want to grep blanks too then you can replace a-zA-Z0-9 by \w
GEOSoft Posted January 24, 2012
Or if you don't want the blanks you can use [:alnum:]
If you look in my sig there is a toolkit that will allow you to load a web page as HTML and test against it.
George
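To make the difference between the classes suggested in the replies above concrete, here is a small sketch. It assumes a made-up ASCII-only sample string ($sSample is hypothetical, not data from the thread). In PCRE, \w covers letters, digits and the underscore but not the space, so a blank only survives if it is added to the class explicitly. How characters above 127 are classified depends on the PCRE build and options, so they are deliberately left out here.

```autoit
; Sketch only: $sSample is a made-up test string, not data from the thread.
Local $sSample = "ab_c 12@#/x.y?z:-"

; [:alnum:] keeps letters and digits; underscore and space are removed.
ConsoleWrite(StringRegExpReplace($sSample, "[^[:alnum:]/.?:-]", "") & @LF)
; prints abc12/x.y?z:-

; \w additionally keeps the underscore; the space is still removed.
ConsoleWrite(StringRegExpReplace($sSample, "[^\w/.?:-]", "") & @LF)
; prints ab_c12/x.y?z:-

; Adding a literal space to the class keeps blanks as well.
ConsoleWrite(StringRegExpReplace($sSample, "[^\w /.?:-]", "") & @LF)
; prints ab_c 12/x.y?z:-
```

In all three patterns the hyphen is placed last inside the brackets so that it is read as a literal character rather than as a range operator.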
jchd Posted January 24, 2012
Is it me or do the URLs have a character encoding issue? I suspect it would be safer to solve that root cause instead.
This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Graywalker Posted January 24, 2012 (edited)
jchd wrote:
"Is it me or do the urls have a character encoding issue? I suspect it would be safer to solve that root cause instead."
Quite possibly. The whole script finds the index.dat file in Temporary Internet Files and tries to pull the info from there. I have it working fairly well: it is getting a LOT of info, but it is not pulling all the URLs from the file, and it is giving me some blank entries, even though I have
If StringInStr($ln, "://") And StringLen($ln) > 6 Then
before returning the entry. So the entry has "://" and is longer than six characters, but it's displaying in MsgBox and reports (CSV) as blank?? I haven't even tried getting the time stamps yet...
It is eventually going to be a function in another script, but if anyone wants to play with index.dat and getting info from it, here is what I've got so far.

; Some sample remote IE History locations
;
;\\ThatComputer\c$\Users\SomeUser\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.IE5\index.dat
;\\TheOtherPuter\c$\Documents and Settings\SomeUser\Local Settings\Temporary Internet Files\Content.IE5\index.dat
;
;Registry keys of use
;
;HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders - gives all user folders
;HKEY_USERS\.DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders - gives the paths for user folders
#include <Array.au3>
$strComputer = @ComputerName
$user = @UserName ; the User Name for the folder may have the domain attached to it in some rare cases, ie "UserName.mybiz"
Global $logfile = FileOpen("Report.csv", 2)
$rprefix = "\\" & $strComputer & "\"
$ProfilesPath = RegRead($rprefix & "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders", "Common Documents")
; This gives something like C:\Users\Public\Documents or C:\Documents and Settings\All Users\Documents - we just want the first part.
; So, find the second "\" and remove everything after that
$2slash = StringInStr($ProfilesPath, "\", 0, 2)
$len = StringLen($ProfilesPath)
$ProfilesPath = StringTrimRight($ProfilesPath, $len - $2slash)
$ProfilesPath = StringReplace($ProfilesPath, ":", "$")
; get the IE history folder path
$historypath = RegRead($rprefix & "HKEY_USERS\.DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders", "Cache")
; This gives us something like %USERPROFILE%\AppData\Local\Microsoft\Windows\Temporary Internet Files or
; %USERPROFILE%\Local Settings\Temporary Internet Files
; Strip out the %USERPROFILE% and we're in business!
$historypath = StringReplace($historypath, "%USERPROFILE%", "")
$indexdatpath = $rprefix & $ProfilesPath & $user & $historypath & "\content.ie5\index.dat"
$test = FileExists($indexdatpath)
;MsgBox(0,"Index.dat Path",$indexdatpath & @CRLF & $test)
_ParseIndexdat($indexdatpath)
;_OBOD_ParseIndexdat($indexdatpath)
FileClose($logfile)
Exit

Func _ParseIndexdat($indexdatpath)
    ; Parse index.dat file for useable info
    ; The tools I've seen don't grab all the info I want :(
    $Bindexdat = FileOpen($indexdatpath, 16)
    $indexdat = FileRead($Bindexdat);$indexdatpath)
    $strIndexdat = BinaryToString($indexdat, 1)
    $strIndexdat = StringReplace($strIndexdat, @CRLF, @CR)
    $strIndexdat = StringReplace($strIndexdat, @LF, @CR)
    ;$strIndexdat = StringRegExpReplace($strIndexdat,"\f|\t|\e","")
    ;$strIndexdat = StringReplace($strIndexdat,Chr(0)," ")
    $FileArray = StringSplit($strIndexdat, "URL", 1)
    ;This may get complex...
    Dim $r = 1 ; to count the records
    Dim $e = 0; to count the entries
    Dim $urls, $record = "", $ln
    Dim $ResultArray[2][5], $i
    ; Start reading from line 1
    $i = 1
    For $line In $FileArray
        ; Get the URLs
        $linearray = StringSplit($line, @CR, 1)
        For $ln In $linearray
            ;$ln = StringRegExpReplace($ln, '[^\w/.?:-]', '') ;Enable this here and NOTHING is returned. ?
            ;$ln = StringStripWS($ln,7)
            $urls = StringRegExp($ln, "(http|https|res:|file|ftp)")
            Select
                Case $urls = 1
                    $aurls = StringSplit($ln, "REDR", 1)
                    If $aurls[0] > 1 Then
                        For $url In $aurls
                            $url = StringRegExpReplace($url, '[^\w/.?:-]', '');"[^[:word:][:blank:]/.?:-]f", "")
                            $url = StringRegExpReplace($url, "(http|https|res:|file|ftp)", "-Start-" & "$1")
                            $httppos = StringInStr($url, "-start-", 0)
                            $url = StringTrimLeft($url, $httppos + 6)
                            If StringInStr($url, "?") Then
                                $ques = StringInStr($url, "?")
                                $len = StringLen($url)
                                $url = StringTrimRight($url, $len - $ques + 1)
                            EndIf
                            If StringInStr($url, "://") And StringLen($url) > 6 Then
                                $url = StringRegExpReplace($url, "(.)(jpg|gif|asp|htm|html|aspx|js|java|php|xhtml|xml|png)", "$1$2" & "-cut-")
                                $cut = StringInStr($url, "-cut-", 1)
                                $len = StringLen($url)
                                $url = StringTrimRight($url, $len - $cut + 1)
                                $ResultArray[$i][1] = $ResultArray[$i][1] & $url & ","
                            EndIf
                        Next
                    Else
                        $ln = StringRegExpReplace($ln, '[^\w/.?:-]', '');"[^[:word:][:blank:]/.?:-]f", "")
                        $ln = StringRegExpReplace($ln, "(http|https|res:|file|ftp)", "-Start-" & "$1")
                        If StringInStr($ln, "-Start-") Then
                            $httppos = StringInStr($ln, "-Start-", 0)
                            $ln = StringTrimLeft($ln, $httppos + 6)
                        Else
                            $httppos = StringInStr($ln, "http", 0)
                            $ln = StringTrimLeft($ln, $httppos - 1)
                        EndIf
                        If StringInStr($ln, "?") Then
                            $ques = StringInStr($ln, "?")
                            $len = StringLen($ln)
                            $url = StringTrimRight($ln, $len - $ques + 1)
                        EndIf
                        If StringInStr($ln, "://") And StringLen($ln) > 6 Then
                            $ln = StringRegExpReplace($ln, "(.)(jpg|gif|asp|htm|html|aspx|js|java|php|xhtml|xml|png)", "$1$2" & "-cut-")
                            $cut = StringInStr($ln, "-cut-", 1)
                            $len = StringLen($ln)
                            $ln = StringTrimRight($ln, $len - $cut + 1)
                            $ResultArray[$i][1] = $ResultArray[$i][1] & $ln & ","
                        EndIf
                    EndIf
                Case StringInStr($ln, "Content-Type:") ; this is an entry I want
                    $ln = StringStripWS($ln, 7)
                    $ResultArray[$i][2] = $ln
                Case StringInStr($ln, "X-Powered-By:") ; this is an entry I want
                    $ln = StringStripWS($ln, 7)
                    $ResultArray[$i][3] = $ln
                Case StringInStr($ln, "~U:") ; this is an entry I want and it marks the end of a record
                    $ln = StringReplace($ln, "~U:", "")
                    $ln = StringStripWS($ln, 7)
                    $ResultArray[$i][4] = $ln
                    FileWriteLine($logfile, $ResultArray[$i][1] & "," & $ResultArray[$i][2] & "," & _
                            $ResultArray[$i][3] & "," & $ResultArray[$i][4])
                    $i = $i + 1
                    ReDim $ResultArray[$i + 1][5]
                    ;_ArrayDisplay($ResultArray)
                Case Else
                    ; do nothing with the line
            EndSelect
        Next
    Next
EndFunc   ;==>_ParseIndexdat
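One detail that stands out when reading the script above: in the branch without "REDR", the query-string trim assigns its result to $url rather than $ln, so $ln itself is left untrimmed there. Separately, the "-Start-" marker and the position arithmetic could possibly be replaced by letting StringRegExp return the URLs directly. The following is only a sketch of that idea, assuming the line has already been cleaned to ASCII. $sLine is a made-up sample and $aUrls and $sUrl are hypothetical names. This is not a tested drop-in for the script.

```autoit
; Sketch only: $sLine is a made-up, already-ASCII sample, not real index.dat data.
Local $sLine = "xxT2http://servername/a/Post.aspae9d REDR res://ieframe.dll/bg.jpg yy"

; Flag 3 returns an array of all matches. With no capturing group in the
; pattern, each array element is the full text of one match.
Local $aUrls = StringRegExp($sLine, "(?:https?|ftp|file|res)://[\w/.?:-]+", 3)

If @error Then
    ; @error is non-zero when the pattern matched nothing
    ConsoleWrite("No URL found" & @LF)
Else
    For $sUrl In $aUrls
        ConsoleWrite($sUrl & @LF)
    Next
EndIf
; prints http://servername/a/Post.aspae9d
; prints res://ieframe.dll/bg.jpg
```

Each match starts at the scheme, so leading junk such as "T2" is dropped without any position arithmetic. Trailing junk after the extension is still present and would need a separate trimming step.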